Download the llamafile
Make the llamafile executable and run the server:
chmod +x TinyLlama-1.1B.llamafile
./TinyLlama-1.1B.llamafile --server --host 0.0.0.0 --port 1234
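Before wiring up a client, you can smoke-test the server from the shell. A minimal check, assuming the server is running with the defaults above (the `model` value is a placeholder; llamafile's OpenAI-compatible endpoint accepts it as-is):

curl http://127.0.0.1:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "avi-llmsky", "messages": [{"role": "user", "content": "ping"}]}'

A JSON response with a `choices` array confirms the endpoint is reachable.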
Use the LLM with the OpenAI SDK:
from openai import OpenAI

# Point the SDK at the local llamafile server. By default the server does not
# validate the API key, so any placeholder value works.
client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="test")
prompt = "Hi, tell me something new about AppSec"
stream = client.chat.completions.create(
    model="avi-llmsky",
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
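Each streamed chunk carries only a fragment (delta) of the reply; concatenating the non-empty deltas reconstructs the full text. A minimal sketch of that accumulation, using plain dicts with made-up data in place of the SDK's response objects:

```python
# Simulated stream: each chunk's delta holds a fragment of the reply.
# The final chunk carries no content, mirroring the None check above.
chunks = [
    {"choices": [{"delta": {"content": "App"}}]},
    {"choices": [{"delta": {"content": "Sec"}}]},
    {"choices": [{"delta": {"content": None}}]},
]

reply = ""
for chunk in chunks:
    delta = chunk["choices"][0]["delta"]["content"]
    if delta is not None:
        reply += delta

print(reply)  # AppSec
```

This is why the loop above checks `delta.content is not None` before printing: some chunks (notably the last) contain no text.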