Download the llamafile
Make the llamafile executable and run the server:
chmod +x TinyLlama-1.1B.llamafile
./TinyLlama-1.1B.llamafile --server --host 0.0.0.0 --port 1234
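Before wiring up a client, you can smoke-test the server from the shell. A minimal check, assuming the server is running with the defaults above (the `model` value is a placeholder; llamafile's OpenAI-compatible endpoint accepts it as-is):

curl http://127.0.0.1:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "avi-llmsky", "messages": [{"role": "user", "content": "ping"}]}'

A JSON response with a `choices` array confirms the endpoint is reachable.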
Use the LLM with the OpenAI SDK:
from openai import OpenAI

# Point the SDK at the local llamafile server. By default the server does not
# validate the API key, so any placeholder value works.
client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="test")
prompt = "Hi, tell me something new about AppSec"
stream = client.chat.completions.create(
    model="avi-llmsky",
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
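Each streamed chunk carries only a fragment (delta) of the reply; concatenating the non-empty deltas reconstructs the full text. A minimal sketch of that accumulation, using plain dicts with made-up data in place of the SDK's response objects:

```python
# Simulated stream: each chunk's delta holds a fragment of the reply.
# The final chunk carries no content, mirroring the None check above.
chunks = [
    {"choices": [{"delta": {"content": "App"}}]},
    {"choices": [{"delta": {"content": "Sec"}}]},
    {"choices": [{"delta": {"content": None}}]},
]

reply = ""
for chunk in chunks:
    delta = chunk["choices"][0]["delta"]["content"]
    if delta is not None:
        reply += delta

print(reply)  # AppSec
```

This is why the loop above checks `delta.content is not None` before printing: some chunks (notably the last) contain no text.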