David Berenstein

davidberenstein1957

AI & ML interests

Everything NLP and knowledge graphs

Articles

Organizations

davidberenstein1957's activity

replied to m-ric's post 1 day ago
reacted to m-ric's post with 👀❤️🔥 1 day ago
๐—ง๐—ต๐—ฒ ๐—ป๐—ฒ๐˜…๐˜ ๐—ฏ๐—ถ๐—ด ๐˜€๐—ผ๐—ฐ๐—ถ๐—ฎ๐—น ๐—ป๐—ฒ๐˜๐˜„๐—ผ๐—ฟ๐—ธ ๐—ถ๐˜€ ๐—ป๐—ผ๐˜ ๐Ÿฆ‹, ๐—ถ๐˜'๐˜€ ๐—›๐˜‚๐—ฏ ๐—ฃ๐—ผ๐˜€๐˜๐˜€! [INSERT STONKS MEME WITH LASER EYES]

See below: I got 105k impressions since regularly posting Hub Posts, coming close to my 275k on Twitter!

โš™๏ธ Computed with the great dataset maxiw/hf-posts
โš™๏ธ Thanks to Qwen2.5-Coder-32B for showing me how to access dict attributes in a SQL request!

cc @merve, who's far ahead of me
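
(For reference, a query in this spirit can be run with DuckDB directly against the Hub. The sketch below is illustrative only: the Parquet path and the column names (`author`, `impressions`) are assumptions about the maxiw/hf-posts schema, not its documented layout.)

```python
# Illustrative sketch only: total impressions per author from maxiw/hf-posts.
# The schema (an `author` struct with a `name` field, an `impressions` column)
# is assumed for illustration; adjust to the dataset's real columns.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs;")  # the httpfs extension enables hf:// paths
con.execute("LOAD httpfs;")

df = con.execute("""
    SELECT
        author.name AS author_name,          -- dot notation reads a struct/dict field
        SUM(impressions) AS total_impressions
    FROM 'hf://datasets/maxiw/hf-posts/**/*.parquet'
    GROUP BY author_name
    ORDER BY total_impressions DESC
    LIMIT 10
""").df()
print(df)
```

If the dataset exposes Parquet files, DuckDB can scan them remotely like this without downloading the whole dataset first.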
reacted to their post with 👀 13 days ago
reacted to their post with ➕❤️ 13 days ago
posted an update 13 days ago
Import any dataset from the Hub and configure your labeling tasks without needing any code!

Really excited about extending the Hugging Face Hub integration with many more streamlined features and workflows. We'd love to hear your feedback and ideas, so don't be shy and reach out 🫶🏽

https://huggingface.co/blog/argilla-ui-hub
reacted to their post with 👀🚀🤗 13 days ago
posted an update 13 days ago
Vector Search (most) datasets on the Hugging Face Hub 🔦

Powered by: Polars, DuckDB, Gradio and model2vec (lightning-fast embeddings by Stéphan Tulkens).

Should work fast enough for datasets up to 100K rows.

davidberenstein1957/vectorsearch-hub-datasets
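
The Space's code isn't reproduced here, but a minimal sketch of the underlying idea looks roughly like this: embed a text column with model2vec and run a brute-force cosine-similarity lookup. The dataset, column name and checkpoint below are placeholders, not what the Space actually uses.

```python
# Minimal sketch (not the Space's implementation): model2vec embeddings +
# brute-force cosine similarity over a Hub dataset. Dataset, column and
# checkpoint names are placeholders.
import numpy as np
from datasets import load_dataset
from model2vec import StaticModel

model = StaticModel.from_pretrained("minishlab/potion-base-8M")  # example static-embedding model

ds = load_dataset("ag_news", split="train[:1000]")  # placeholder dataset with a "text" column
corpus = ds["text"]
corpus_emb = model.encode(corpus)                   # (n_rows, dim) numpy array

def search(query: str, k: int = 5):
    q = model.encode([query])[0]
    sims = corpus_emb @ q / (
        np.linalg.norm(corpus_emb, axis=1) * np.linalg.norm(q) + 1e-9
    )
    top = np.argsort(-sims)[:k]
    return [(float(sims[i]), corpus[i]) for i in top]

print(search("stock markets rally after earnings"))
```

A brute-force scan like this is plausible for corpora in the ~100K-row range mentioned above; much larger datasets would typically call for an approximate-nearest-neighbour index.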
posted an update 19 days ago
โšก๏ธ LLMs do a good job at NER, but don't you want to do learn how to do more with less?

Go from 🐢 -> 🐇

If you want a small model to perform well on your problem, you need to fine-tune it.

Bootstrap with a teacher model.

Correct potential mistakes to get high-quality data.

Fine-tune your student model.

End up with a model that is more accurate and more efficient.

Free signup: https://lu.ma/zx2t7irs
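
As a rough illustration of the "bootstrap with a teacher model" step (GLiNER is used here purely as an example teacher; the session may use a different stack), pre-annotations can be generated like this and then corrected by a human before fine-tuning the student:

```python
# Illustrative only: use a zero-shot NER teacher (here GLiNER, an assumed
# example) to pre-annotate texts; humans then correct these suggestions
# before a small student model is fine-tuned on the cleaned spans.
from gliner import GLiNER

teacher = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")  # example checkpoint
labels = ["person", "organization", "location"]

texts = ["Tim Cook announced new products at Apple headquarters in Cupertino."]

pre_annotations = []
for text in texts:
    entities = teacher.predict_entities(text, labels, threshold=0.5)
    # keep spans as suggestions: (surface form, label, char start, char end)
    pre_annotations.append([(e["text"], e["label"], e["start"], e["end"]) for e in entities])

print(pre_annotations[0])
```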
posted an update about 1 month ago
You can now build a custom text classifier without days of human labeling!

๐Ÿ‘ LLMs work reasonably well as text classifiers.
๐Ÿ‘Ž They are expensive to run at scale and their performance drops in specialized domains.

๐Ÿ‘ Purpose-built classifiers have low latency and can potentially run on CPU.
๐Ÿ‘Ž They require labeled training data.

Combine the best of both worlds: the automatic labeling capabilities of LLMs and the high-quality annotations from human experts to train and deploy a specialized model.

Blog: https://huggingface.co/blog/sdiazlor/custom-text-classifier-ai-human-feedback
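
The blog walks through the full workflow; as a hedged sketch of the first half (LLM-generated labels that humans then review), something like the following works with a hosted chat model. The model id, label set and prompt are illustrative rather than the blog's exact setup, and an HF token with access to the model is assumed.

```python
# Sketch of the "LLM as labeler" step: ask a hosted chat model for a label per
# text, store the answers as suggestions for human review, then train a small
# classifier on the curated result. Model id and prompt are illustrative.
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")  # example model, may require access

def llm_label(text: str) -> str:
    prompt = (
        "Classify the following customer message as 'positive' or 'negative'. "
        "Answer with a single word.\n\n" + text
    )
    out = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=5,
    )
    return out.choices[0].message.content.strip().lower()

print(llm_label("The delivery was late and the box arrived damaged."))
```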
reacted to nroggendorff's post with 😎 about 1 month ago
100 followers? When did that happen?
reacted to m-ric's post with 👀 about 1 month ago
By far the coolest release of the day!
> The Open LLM Leaderboard, the most comprehensive suite for comparing open LLMs on many benchmarks, just released a comparator tool that lets you dig into the details of the differences between any two models.

Here's me checking how the new Llama-3.1-Nemotron-70B that we've heard so much about compares to the original Llama-3.1-70B. 🤔🔎

Try it out here 👉 open-llm-leaderboard/comparator
posted an update about 1 month ago
The Synthetic Data Generator now directly integrates with Argilla, so you can generate and curate your own high-quality datasets from pure natural language!

Up next -> dataset generation for text classification.
Other suggestions? Let us know.

Space: argilla/synthetic-data-generator
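
To give a feel for the Argilla side of that integration, here is a minimal sketch (assuming an Argilla instance, e.g. on a Hugging Face Space, and a generated dataset already on the Hub; the URL, API key, names and label set are placeholders): push the generated examples into an Argilla dataset so they can be reviewed before training.

```python
# Minimal sketch: log generated examples into Argilla for human curation.
# URL, API key, dataset names, fields and labels are all placeholders.
import argilla as rg
from datasets import load_dataset

client = rg.Argilla(api_url="https://<your-argilla-space>.hf.space", api_key="<api-key>")

settings = rg.Settings(
    fields=[rg.TextField(name="text")],
    questions=[rg.LabelQuestion(name="label", labels=["good", "bad"])],
)
dataset = rg.Dataset(name="synthetic-data-review", settings=settings, client=client)
dataset.create()

generated = load_dataset("<user>/<generated-dataset>", split="train")  # placeholder
dataset.records.log([{"text": row["text"]} for row in generated])
```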


posted an update about 1 month ago
Don't use an LLM when you can use a much cheaper model.

The problem is that no one tells you how to actually do it.

Just picking a pre-trained model (e.g., BERT) and throwing it at your problem won't work!

If you want a small model to perform well on your problem, you need to fine-tune it.

And to fine-tune it, you need data.

The good news is that you don't need a lot of data, just high-quality data for your specific problem.

In the latest livestream, I showed you guys how to get started with Argilla on the Hub! Hope to see you at the next one.

https://www.youtube.com/watch?v=BEe7shiG3rY
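
As a sketch of the "much cheaper model" route (placeholders throughout: the dataset stands in for your curated examples, and the base checkpoint is just one common choice), a small SetFit classifier can be fine-tuned on a handful of curated examples and then served without any LLM calls:

```python
# Sketch: fine-tune a small SetFit classifier on curated examples so inference
# no longer needs an LLM. The dataset and base checkpoint are placeholders.
from datasets import load_dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# stands in for your own curated dataset with "text" and "label" columns
train_ds = load_dataset("imdb", split="train").shuffle(seed=0).select(range(100))

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
args = TrainingArguments(batch_size=16, num_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()

print(model.predict(["a gripping, well-acted thriller", "tedious and overlong"]))
```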
posted an update about 1 month ago
On Thursday 10 October at 17:00 CEST, I will show a good way to get started with a text classification project on the Hugging Face Hub with Argilla and SetFit.

Sign up here: https://lu.ma/31mecp34
reacted to their post with 🔥 about 2 months ago