Andrew Reed
andrewrreed's activity
Thanks! And yes, several people have pointed out the light mode color issue... will push a fix when I get the chance
Inspired by the awesome work from @mlabonne, I created a Space to monitor the narrowing gap between open and proprietary LLMs as scored by the LMSYS Chatbot Arena Elo ratings 🤗
The goal is to have a continuously updated place to easily visualize these rapidly evolving industry trends 📈
🔗 Open LLM Progress Tracker: andrewrreed/closed-vs-open-arena-elo
🔗 Source of Inspiration: https://www.linkedin.com/posts/maxime-labonne_arena-elo-graph-updated-with-new-models-activity-7187062633735368705-u2jB/
For RAG use cases, responses directly include inline citations, making source attribution an inherent part of generation rather than an afterthought
Who's working on an open dataset like this for the HF community to fine-tune on?
🔗 Command R+ Docs: https://docs.cohere.com/docs/retrieval-augmented-generation-rag
🔗 Model on the 🤗 Hub: CohereForAI/c4ai-command-r-plus
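For anyone curious what this looks like in practice, here's a minimal sketch against Cohere's chat endpoint, following the linked RAG docs. The document fields, model name, and response attributes are taken from that documentation, but treat them as assumptions; the documents themselves are made up for illustration.

import cohere

co = cohere.Client("<COHERE_API_KEY>")  # replace with your Cohere API key

# hypothetical documents to ground the response in
documents = [
    {"title": "PTO policy", "snippet": "Employees accrue 2 days of PTO per month."},
    {"title": "Holidays", "snippet": "The company observes 10 public holidays per year."},
]

response = co.chat(
    model="command-r-plus",
    message="How many days off do employees get in a year?",
    documents=documents,
)

print(response.text)       # grounded answer
print(response.citations)  # spans of the answer linked back to document ids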
Thanks for sharing! I'm guessing this is related to the .strip() call in the template?
I've shared this feedback internally and we'll push a fix soon! cc @Rocketknight1 @Xenova
The latter, just those with a chat template set
Cool project, thanks for sharing! Reminds me of a similar effort by notdiamond
Correct!
Details on the API spec can be found here: https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference/chat_completions
Hugging Face's TGI now supports an OpenAI-compatible Chat Completion API
This means you can transition code that uses OpenAI client libraries (or frameworks like LangChain 🦜 and LlamaIndex 🦙) to run open models by changing just two lines of code 🤗
Here's how:
from openai import OpenAI

# initialize the client, but point it to TGI
client = OpenAI(
    base_url="<ENDPOINT_URL>" + "/v1/",  # replace with your endpoint url
    api_key="<HF_API_TOKEN>",  # replace with your token
)

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is open-source software important?"},
    ],
    stream=True,
    max_tokens=500,
)

# iterate and print the stream (the final chunk's delta content may be None)
for message in chat_completion:
    print(message.choices[0].delta.content or "", end="")
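And because the endpoint is OpenAI-compatible, frameworks that wrap the OpenAI client work the same way. Here's a minimal sketch with LangChain's ChatOpenAI; the endpoint URL and token placeholders are the same assumptions as above:

from langchain_openai import ChatOpenAI

# point LangChain's OpenAI wrapper at the TGI endpoint
llm = ChatOpenAI(
    model="tgi",
    base_url="<ENDPOINT_URL>" + "/v1/",  # replace with your endpoint url
    api_key="<HF_API_TOKEN>",  # replace with your token
    max_tokens=500,
)

print(llm.invoke("Why is open-source software important?").content)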
🔗 Blog post ➡ https://huggingface.co/blog/tgi-messages-api
🔗 TGI docs ➡ https://huggingface.co/docs/text-generation-inference/en/messages_api
For models like Mixtral that don't have an explicit "system prompt" in the chat template, how is the system prompt handled? Is it just prepended to the first input from the user?
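One way to check for yourself is to render the chat template directly with transformers. A quick sketch, using the Mixtral model id from the question; the exact behavior depends on the template version shipped with the tokenizer, so this just shows how to inspect it:

from transformers import AutoTokenizer
from jinja2.exceptions import TemplateError

tok = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

# inspect the raw Jinja template to see how roles are handled
print(tok.chat_template)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# depending on the template version, the system message is either merged
# into the first user turn or rejected outright with a TemplateError
try:
    print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
except TemplateError as e:
    print(f"Template rejected the system role: {e}")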