Hugging Face org Sep 23

•

Inference Playground

This discussion is dedicated to providing feedback on the Inference Playground and Serverless Inference API.

About the Inference Playground:

The Inference Playground is a user interface designed to simplify testing our serverless inference API with chat models. It lists the available models and lets you experiment with each model's settings, test models through the UI, and copy code snippets.

To view all available settings, refer to the Serverless Inference for Chat Completion documentation.
Browse available chat models here.
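
For readers who want to see roughly what a copied snippet does, here is a minimal sketch of a single chat-completion call. It assumes the `huggingface_hub` package and its `InferenceClient.chat_completion` method; the model id, the token placeholder, and the helper names `build_messages` and `ask` are illustrative assumptions, not what the Playground itself generates.

```python
from typing import Any, Dict, List, Optional


def build_messages(user_prompt: str, system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
    """Build an OpenAI-style message list for a chat model."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def ask(client: Any, model: str, user_prompt: str,
        max_tokens: int = 256, temperature: float = 0.7) -> str:
    """Send one chat turn and return the reply text.

    `client` is expected to behave like huggingface_hub.InferenceClient.
    """
    response = client.chat_completion(
        messages=build_messages(user_prompt),
        model=model,
        max_tokens=max_tokens,
        temperature=temperature,
    )
    return response.choices[0].message.content


# Usage against the real API (needs `pip install huggingface_hub` and a token):
#   from huggingface_hub import InferenceClient
#   client = InferenceClient(token="hf_...")  # placeholder token
#   print(ask(client, "meta-llama/Llama-3.2-3B-Instruct", "Hello!"))  # example model id
```

Passing the client in as an argument keeps the helper easy to try with a stub before spending any of the hourly quota.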

If you need more usage, you can subscribe to PRO.

User Tier	Rate Limit
Unregistered Users	1 request per hour
Signed-up Users	50 requests per hour
PRO and Enterprise Users	500 requests per hour
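
As a rough illustration of what these limits mean in practice, the sketch below converts each hourly limit into an average spacing between requests. The tier keys are made-up names, and the post does not say how the limit window is enforced, so even spacing is only a conservative assumption.

```python
# Hourly request limits copied from the table above.
# The dictionary keys are illustrative names, not official tier identifiers.
RATE_LIMITS_PER_HOUR = {
    "unregistered": 1,
    "signed_up": 50,
    "pro_or_enterprise": 500,
}


def min_seconds_between_requests(tier: str) -> float:
    """Average spacing that keeps a steady stream of requests under the hourly limit."""
    return 3600 / RATE_LIMITS_PER_HOUR[tier]
```

For example, a signed-up user sending requests in a loop would stay under the limit by waiting about 72 seconds between calls.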

Upcoming Features:

Continuous UI improvements
A dedicated UI for function calling
Support for vision language models
A feature to easily compare models

victor pinned discussion Sep 23

maharshpatelx

Sep 26

Add a place to change API keys in the Playground.

victor

Hugging Face org Sep 26

•

edited Sep 26

Add a place to change API keys in the Playground.

Yes, I'll try to add this today. Edit: I added it.

not-lain

Sep 27

support tool use🛠️

maharshpatelx

Sep 27

How can I use this 🤗 InferenceClient with LangChain or LlamaIndex?

victor

Hugging Face org Sep 30

How can I use this 🤗 InferenceClient with LangChain or LlamaIndex?

What do you mean? This is just a UI for easier testing and getting the code to do Inference on HF models.

Smorty100

Oct 9

Being able to use text completion like in Open Web UI would be great.

Testing function calling output would also be very appreciated. Actually calling the functions doesn't make any sense in this case, but generating the json for it would be very useful.

I am aware that this probably takes some work, as the templates need to be evaluated to implement it, but it would be great even if just for popular models (like the recent Llama 3B).
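
To make the request concrete, here is a minimal sketch of asking a model for the function-call JSON without executing anything. It assumes `huggingface_hub.InferenceClient.chat_completion` with its `tools` and `tool_choice` parameters; the `get_weather` tool and the helper names are hypothetical, and not every model or backend returns tool calls.

```python
import json
from typing import Any, Dict

# Hypothetical tool definition in the OpenAI-style JSON-schema format.
GET_WEATHER_TOOL: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def parse_tool_arguments(raw_arguments: Any) -> Dict[str, Any]:
    """Return arguments as a dict, whether they arrive as a JSON string or a mapping."""
    if isinstance(raw_arguments, str):
        return json.loads(raw_arguments)
    return dict(raw_arguments)


def request_tool_call(client: Any, model: str, user_prompt: str) -> Dict[str, Any]:
    """Ask the model for a tool call and return its name and arguments.

    Nothing is executed; the function only reports what the model generated.
    `client` is expected to behave like huggingface_hub.InferenceClient.
    """
    response = client.chat_completion(
        messages=[{"role": "user", "content": user_prompt}],
        model=model,
        tools=[GET_WEATHER_TOOL],
        tool_choice="auto",
        max_tokens=256,
    )
    tool_calls = response.choices[0].message.tool_calls
    if not tool_calls:
        raise ValueError("The model answered in plain text instead of a tool call.")
    call = tool_calls[0].function
    return {"name": call.name, "arguments": parse_tool_arguments(call.arguments)}
```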

victor

Hugging Face org Oct 9

•

edited Oct 9

Being able to use text completion like in Open Web UI would be great.

What do you mean? I think this is HuggingChat, no?

Testing function calling output would also be very appreciated. Actually calling the functions doesn't make any sense in this case, but generating the json for it would be very useful.

Yes, we'll add that.

cfahlgren1

Hugging Face org Oct 10

This is awesome! One small nit: on an iPhone, compare mode doesn't show both models well.

Maybe a carousel-type component (swipe) to show the different models could work well there.

victor

Hugging Face org Oct 15

It should be a bit better on mobile now @cfahlgren1

Smorty100

26 days ago

•

edited 26 days ago

@victor Maybe I am missing something, but as far as I know, HuggingChat does not have a text-completion feature.
What I am referring to is a feature where you provide some text and the model completes it, like base models tend to do. Like this:

User input

The following article will discuss the differences between lemon and carrot cake:
# Lemon cake vs Carrot cake!
**Lemon cake**
A delicious cake made of flour, some sugar and some other stuff too
**Carrot cake**

AI output

Another cake also made from flour, but with a carroty twist!
[...]

In Open WebUI, one can simply pick any text model and use it to extend the given input text.
I hope this clears up what I mean.
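
For comparison, a raw completion request outside the Playground could look like the sketch below. It assumes `huggingface_hub.InferenceClient.text_generation`, which sends the prompt as plain text rather than as chat messages; the helper name `complete_text` is illustrative, and whether a given model is served for this task is not guaranteed.

```python
from typing import Any

# The "User input" from the example above, as one plain-text prompt.
PROMPT = (
    "The following article will discuss the differences between lemon and carrot cake:\n"
    "# Lemon cake vs Carrot cake!\n"
    "**Lemon cake**\n"
    "A delicious cake made of flour, some sugar and some other stuff too\n"
    "**Carrot cake**\n"
)


def complete_text(client: Any, prompt: str, model: str, max_new_tokens: int = 100) -> str:
    """Return the model's continuation of `prompt`.

    `client` is expected to behave like huggingface_hub.InferenceClient.
    """
    return client.text_generation(prompt, model=model, max_new_tokens=max_new_tokens)
```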

EDIT: I have a question: does this inference playground "subtract" from the amount of compute we can spend on Spaces, or does it draw on a separate, seemingly endless supply of compute, like HuggingChat does? (As far as I can tell, there is no limit to how many generations one can do in HuggingChat. I use it daily and haven't encountered any limit yet.)

victor

Hugging Face org 25 days ago

Hi @Smorty100 ,

The playground is only available for instruct models at the moment (I don't know if we will add support for base models).
Yes, when you use the playground you're calling the endpoint with your HF tokens, so it's subtracting from your quotas. We'll be clarifying the limits soon.

Smorty100

23 days ago

Hi @victor ,

Instruction-tuned models can also complete text, just like base models. I just tested this with the granite3-moe-1B-instruct model using Ollama (in the Godot game engine).

Here is the test:

The model is prompted without any template at all; it simply continues what has already been said in a somewhat logical way.

The model I picked is not the best, but I chose it for its speed in this demo.
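
A template-free prompt like this can be reproduced with a small script. This is only a sketch: it assumes a local Ollama server at its default address and uses the `raw` flag of Ollama's `/api/generate` endpoint to skip the prompt template; the model tag in the usage comment is a guess at the local name.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_raw_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a generate request that bypasses the model's prompt template."""
    payload = {"model": model, "prompt": prompt, "raw": True, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete_raw(model: str, prompt: str) -> str:
    """Send the request and return the generated continuation."""
    with urllib.request.urlopen(build_raw_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Usage (needs a running Ollama server; the model tag is an assumption):
#   print(complete_raw("granite3-moe:1b", "Once upon a time"))
```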

PSM272

23 days ago

Do you have any plans to add pay-as-you-go pricing per token?

And, would it be possible to support Qwen2.5-7B or 14B?

Smorty100

12 days ago

Currently, when accessing the playground from HuggingChat, it sometimes gives an Error 500 code. This happens when a model is not available on the playground but still has the button on HuggingChat.

Here is an example link that leads from HuggingChat Llama 3.2 vision right to the playground.

Maybe put a check in place on the playground so that it defaults to a certain model when the one in the URL isn't available, and show some kind of message like "This model isn't available anymore."
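
The suggested check could be as small as the sketch below. It is written in Python purely for illustration; the playground's real code and the names `resolve_model`, `available`, and `default` are assumptions.

```python
from typing import Iterable, Optional, Tuple


def resolve_model(requested: Optional[str], available: Iterable[str],
                  default: str) -> Tuple[str, Optional[str]]:
    """Pick the model to load and an optional notice to show the user.

    Falls back to `default` when the model named in the URL is not available.
    """
    if requested is None:
        return default, None
    if requested in set(available):
        return requested, None
    notice = f"This model isn't available anymore: {requested}. Showing {default} instead."
    return default, notice
```

Returning the notice alongside the model lets the UI decide how to display it instead of failing with a server error.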

Smorty100

12 days ago

[FEATURE REQUEST]

Please give us the option to write a prefix for the model response by typing something into its message field and then letting the LLM complete it.
This would give us the ability to steer the models even better.

Here is a short post by me about why prefixes for LLMs can be really useful.
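
In API terms, a response prefix usually means ending the message list with a partial assistant turn. The sketch below only builds such a list; whether a backend continues that turn or starts a new one depends on the chat template and the server, so treat that behaviour as an assumption. The helper names are illustrative.

```python
from typing import Dict, List


def build_prefilled_messages(user_prompt: str, response_prefix: str) -> List[Dict[str, str]]:
    """End the conversation with a partial assistant message for the model to continue."""
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": response_prefix},
    ]


def join_prefix(response_prefix: str, continuation: str) -> str:
    """Combine the typed prefix and the generated continuation into the full reply."""
    return response_prefix + continuation
```

For instance, a prefix of `{"answer":` nudges the model toward JSON output, and `join_prefix` restores the complete text for display.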

John6666

12 days ago

Currently, when accessing the playground from HuggingChat, it sometimes gives an Error 500 code.

This may be a variant of the problem described here:
https://huggingface.co/posts/Tonic/169924015276572

Smorty100

10 days ago

We have the qwen2.5-1.5B base model (not instruction tuned) on the playground but we don't have any kind of text completion interface yet.
So as it stands right now, the LLM is probably being fed with all sorts of chat tokens like <|im_start|> and <|im_end|>, which it doesn't know what to do with.

I tried to give it some start of a sentence, but it just ended up token-dumping on me.
So either we need a text completion interface like I requested previously, or we need to ban base models from the playground.
I would prefer a text completion interface ~
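
To illustrate what the base model is probably seeing, here is a simplified sketch of ChatML-style rendering. It is an assumption about the format rather than the exact template the playground applies, and real templates may also inject a default system prompt.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages into a ChatML-style prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Opens the assistant turn that the model is expected to fill in.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([{"role": "user", "content": "Once upon a time"}]))
```

A base model that was never trained on these markers receives them as ordinary text in front of the sentence it was meant to continue, which would fit the token-dumping described above.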

Smorty100

10 days ago

Aaand here I am again with another problem.
When using Zephyr beta for chat, it usually responds with <user> and then writes a prompt a user might ask. It is surprising that Zephyr can even do that; I always thought we only train the LLM on the response side and never on the input side.

It seems like chat templates in general are broken for many models. Some models simply don't have a chat template and give an error; some kind of work but not really, like Zephyr beta here; and sometimes even base models get a chat template, even though they haven't been instruction-tuned and thus don't know what to do with these chat tokens.

Is the playground pulling the templates for these from somewhere? Maybe some of the repos don't include the correct template?
But especially with Zephyr by HuggingFace I would expect the template to be correct...
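
One way to check a repo yourself: Transformers tokenizers commonly store the chat template under the `chat_template` key of `tokenizer_config.json`. The sketch below inspects a downloaded copy of that file; whether the playground reads its templates from there is not stated in this thread, and the helper names are illustrative.

```python
import json
from typing import Any, Dict


def load_tokenizer_config(path: str) -> Dict[str, Any]:
    """Read a locally downloaded tokenizer_config.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def describe_chat_template(tokenizer_config: Dict[str, Any]) -> str:
    """Classify the chat template entry of a tokenizer config."""
    template = tokenizer_config.get("chat_template")
    if template is None:
        return "missing"
    if isinstance(template, str):
        return "single" if template.strip() else "missing"
    if isinstance(template, list):
        return "named"  # several named templates stored as a list of entries
    return "unknown"
```

A result of "missing" on a chat model would match the errors described above, while a template on a base model would explain the stray chat tokens.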
