See below: I've gotten 105k impressions since I started regularly posting Hub Posts, closing in on my 275k on Twitter!
Computed with the great dataset maxiw/hf-posts. Thanks to Qwen2.5-Coder-32B for showing me how to access dict attributes in a SQL query!
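For reference, a minimal sketch of that kind of query using DuckDB from Python (not the exact query Qwen2.5-Coder-32B produced); the hf:// glob and the column name `author.fullname` are assumptions about the maxiw/hf-posts layout.

```python
# Minimal sketch: read maxiw/hf-posts with DuckDB and access a nested
# dict/struct attribute directly in SQL via dot syntax.
# The hf:// glob and the author.fullname column name are assumptions.
import duckdb

con = duckdb.connect()
top_posters = con.sql("""
    SELECT
        author.fullname AS author,   -- reads an attribute of the 'author' struct
        count(*)        AS n_posts
    FROM 'hf://datasets/maxiw/hf-posts/**/*.parquet'  -- recent DuckDB (httpfs) reads hf:// paths
    GROUP BY author.fullname
    ORDER BY n_posts DESC
    LIMIT 10
""").df()
print(top_posters)
```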
The Synthetic Data Generator now directly integrates with Argilla, so you can generate and curate your own high-quality datasets from pure natural language!
Up next -> dataset generation for text classification. Other suggestions? Let us know.
You can now build a custom text classifier without days of human labeling!
LLMs work reasonably well as text classifiers, but they are expensive to run at scale and their performance drops in specialized domains.
Purpose-built classifiers have low latency and can even run on CPU, but they require labeled training data.
Combine the best of both worlds: the automatic labeling capabilities of LLMs and the high-quality annotations from human experts to train and deploy a specialized model.
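To make the automatic-labeling half of that workflow concrete, here is a minimal sketch (my illustration, not part of the original post) using huggingface_hub's InferenceClient; the model ID, label set, and example texts are placeholders.

```python
# Minimal sketch of the "LLM as automatic labeler" step.
# Model ID, labels, and texts are placeholders.
from huggingface_hub import InferenceClient

LABELS = ["positive", "negative", "neutral"]  # hypothetical label set
client = InferenceClient("meta-llama/Llama-3.1-70B-Instruct")  # placeholder model

def llm_label(text: str) -> str:
    """Ask the LLM to pick one label; fall back to 'neutral' if the answer is off-list."""
    prompt = (
        f"Classify the following text as one of {LABELS}. "
        f"Answer with the label only.\n\nText: {text}"
    )
    out = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=5,
    )
    answer = out.choices[0].message.content.strip().lower()
    return answer if answer in LABELS else "neutral"

draft_labels = [{"text": t, "label": llm_label(t)} for t in [
    "I love this product!",
    "The delivery was late and the box was damaged.",
]]
print(draft_labels)
```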
Import any dataset from the Hub and configure your labeling tasks without needing any code!
Really excited about extending the Hugging Face Hub integration with many more streamlined features and workflows. We would love to hear your feedback and ideas, so don't be shy and reach out!
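If you prefer scripting the same workflow instead of the UI, here is a minimal sketch with the Argilla Python SDK for pushing LLM-drafted labels so human experts can review them; it assumes an Argilla 2.x server, and the URL, API key, dataset name, and labels are placeholders carried over from the sketch above.

```python
# Minimal sketch: push LLM-drafted labels into Argilla for human review.
# Assumes an Argilla 2.x server; URL, API key, dataset name, and labels are placeholders.
import argilla as rg

client = rg.Argilla(api_url="https://my-argilla.example", api_key="<api_key>")

settings = rg.Settings(
    fields=[rg.TextField(name="text")],
    questions=[rg.LabelQuestion(name="label", labels=["positive", "negative", "neutral"])],
)
dataset = rg.Dataset(name="llm-drafted-sentiment", settings=settings)
dataset.create()

# Log the draft records; annotators then curate them in the Argilla UI.
dataset.records.log([
    {"text": "I love this product!", "label": "positive"},
    {"text": "The delivery was late and the box was damaged.", "label": "negative"},
])
```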
By far the coolest release of the day! The Open LLM Leaderboard, the most comprehensive suite for comparing open LLMs on many benchmarks, just released a comparator tool that lets you dig into the details of the differences between any two models.
Here's me checking how the new Llama-3.1-Nemotron-70B that we've heard so much about compares to the original Llama-3.1-70B.
On Thursday 10 October at 17:00 CEST, I will show a good way to get started with a text classification project on the Hugging Face Hub using Argilla and SetFit.
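If you can't make it, here is roughly the shape of what I'll show: a minimal SetFit sketch, assuming a small curated dataset with "text" and "label" columns (the dataset ID and model ID below are placeholders).

```python
# Minimal SetFit sketch: few-shot training of a small classifier on a curated dataset.
# Dataset ID, model ID, and the 64-example slice are placeholder assumptions.
from datasets import load_dataset
from setfit import SetFitModel, Trainer, TrainingArguments

dataset = load_dataset("my-org/my-curated-dataset")  # hypothetical repo id with "text"/"label" columns
train_ds = dataset["train"].select(range(64))        # a few dozen curated examples go a long way

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = Trainer(
    model=model,
    args=TrainingArguments(batch_size=16, num_epochs=1),
    train_dataset=train_ds,
)
trainer.train()

# The trained classifier is small enough to run on CPU.
print(model.predict(["This works surprisingly well for our domain."]))
```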
Why is argilla/FinePersonas-v0.1 great for synthetic data generation? You can use it to synthesise realistic and diverse data for the customer personas your company is interested in!
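A minimal sketch of what that could look like; the "persona" column name, the generation model, and the task prompt are assumptions for illustration, not a fixed recipe.

```python
# Minimal sketch: condition synthetic data generation on FinePersonas personas.
# The 'persona' column name, the model ID, and the prompt are assumptions.
from datasets import load_dataset
from huggingface_hub import InferenceClient

# Stream so we don't download the full (very large) persona dataset.
personas = load_dataset("argilla/FinePersonas-v0.1", split="train", streaming=True)
client = InferenceClient("meta-llama/Llama-3.1-70B-Instruct")  # placeholder model

synthetic_rows = []
for row in personas.take(3):
    prompt = (
        f"You are this customer: {row['persona']}\n"
        "Write a short, realistic support ticket about a billing issue."
    )
    out = client.chat_completion(
        messages=[{"role": "user", "content": prompt}], max_tokens=200
    )
    synthetic_rows.append({"persona": row["persona"], "ticket": out.choices[0].message.content})

print(synthetic_rows[0])
```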