Andrew Reed
andrewrreed's activity
Thanks! And yes, several people have pointed out the light mode color issue... will push a fix when I get the chance
Inspired by the awesome work from @mlabonne, I created a Space to monitor the narrowing gap between open and proprietary LLMs as scored by the LMSYS Chatbot Arena Elo ratings 🤗
The goal is to have a continuously updated place to easily visualize these rapidly evolving industry trends 📈
🔗 Open LLM Progress Tracker: andrewrreed/closed-vs-open-arena-elo
🔗 Source of Inspiration: https://www.linkedin.com/posts/maxime-labonne_arena-elo-graph-updated-with-new-models-activity-7187062633735368705-u2jB/
For RAG use cases, responses directly include inline citations, making source attribution an inherent part of generation rather than an afterthought
Who's working on an open dataset like this for the HF community to fine-tune on?
🔗 Command R+ Docs: https://docs.cohere.com/docs/retrieval-augmented-generation-rag
🔗 Model on the 🤗 Hub: CohereForAI/c4ai-command-r-plus
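For anyone curious what this looks like in practice, here's a minimal sketch against Cohere's chat endpoint, following the linked RAG docs. The document fields, model name, and response attributes are taken from that documentation, but treat them as assumptions; the documents themselves are made up for illustration.

import cohere

co = cohere.Client("<COHERE_API_KEY>")  # replace with your Cohere API key

# hypothetical documents to ground the response in
documents = [
    {"title": "PTO policy", "snippet": "Employees accrue 2 days of PTO per month."},
    {"title": "Holidays", "snippet": "The company observes 10 public holidays per year."},
]

response = co.chat(
    model="command-r-plus",
    message="How many days off do employees get in a year?",
    documents=documents,
)

print(response.text)       # grounded answer
print(response.citations)  # spans of the answer linked back to document ids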
Thanks for sharing! I'm guessing this is related to the .strip() call in the template?
I've shared this feedback internally and we'll push a fix soon! cc @Rocketknight1 @Xenova
The latter, just those with a chat template set
Cool project, thanks for sharing! Reminds me of a similar effort by notdiamond
Correct!
Details on the API spec can be found here: https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference/chat_completions
Hugging Face's TGI now supports an OpenAI-compatible Chat Completion API
This means you can transition code that uses OpenAI client libraries (or frameworks like LangChain 🦜 and LlamaIndex 🦙) to run open models by changing just two lines of code 🤗
Here's how:
from openai import OpenAI

# initialize the client, but point it to TGI
client = OpenAI(
    base_url="<ENDPOINT_URL>" + "/v1/",  # replace with your endpoint url
    api_key="<HF_API_TOKEN>",  # replace with your token
)

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is open-source software important?"},
    ],
    stream=True,
    max_tokens=500,
)

# iterate and print the stream (the final chunk's delta content may be None)
for message in chat_completion:
    print(message.choices[0].delta.content or "", end="")
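And because the endpoint is OpenAI-compatible, frameworks that wrap the OpenAI client work the same way. Here's a minimal sketch with LangChain's ChatOpenAI; the endpoint URL and token placeholders are the same assumptions as above:

from langchain_openai import ChatOpenAI

# point LangChain's OpenAI wrapper at the TGI endpoint
llm = ChatOpenAI(
    model="tgi",
    base_url="<ENDPOINT_URL>" + "/v1/",  # replace with your endpoint url
    api_key="<HF_API_TOKEN>",  # replace with your token
    max_tokens=500,
)

print(llm.invoke("Why is open-source software important?").content)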
🔗 Blog post ➡ https://huggingface.co/blog/tgi-messages-api
🔗 TGI docs ➡ https://huggingface.co/docs/text-generation-inference/en/messages_api
For models like Mixtral that don't have an explicit "system prompt" in the chat template, how is the system prompt handled? Is it just prepended to the first input from the user?
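One way to check for yourself is to render the chat template directly with transformers. A quick sketch, using the Mixtral model id from the question; the exact behavior depends on the template version shipped with the tokenizer, so this just shows how to inspect it:

from transformers import AutoTokenizer
from jinja2.exceptions import TemplateError

tok = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

# inspect the raw Jinja template to see how roles are handled
print(tok.chat_template)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# depending on the template version, the system message is either merged
# into the first user turn or rejected outright with a TemplateError
try:
    print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
except TemplateError as e:
    print(f"Template rejected the system role: {e}")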