Yacine Jernite

yjernite

AI & ML interests

Technical, community, and regulatory tools of AI governance @HuggingFace

Recent Activity

liked a Space about 9 hours ago
akhaliq/olmo-anychat
liked a dataset 1 day ago
CohereForAI/include-base-44

Organizations

Hugging Face, Society & Ethics, BigScience Workshop, GEM benchmark, BigScience Catalogue Data, BigScience Data, HF Task Exploration, HuggingFaceM4, BigCode, Stable Bias, Hugging Face H4, 🤗 H4 Community, BigCode Data, Stable Diffusion Bias Eval, Librarian Bots, Blog-explorers, Evaluating Social Impacts of Generative AI, llm-values, Bias Leaderboard Development, AI Energy Score Project, Journalists on Hugging Face, Social Post Explorers

yjernite's activity

upvoted an article 8 days ago

Let's make a generation of amazing image generation models

By burtenshaw • 33 upvotes
reacted to cfahlgren1's post with ❤️ 13 days ago
You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of three steps (sketched in SQL below):
- Joining the separate splits together and adding a split column
- Converting the string-encoded messages into a list of structs
- Removing rows with empty system prompts
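
Here is a minimal sketch of what those three steps might look like in DuckDB-flavored SQL (the dialect behind the Hub's SQL console). The split names and the JSON-encoded messages column are assumptions for illustration, not the exact code from the linked post.

```sql
-- Hypothetical sketch; table and column names are assumptions.

-- 1. Join the separate splits together and add a split column.
CREATE TABLE combined AS
SELECT *, 'creative_content' AS split FROM creative_content
UNION ALL
SELECT *, 'text_modification' AS split FROM text_modification;
-- ...repeat the UNION ALL for each remaining split.

SELECT
    split,
    -- 2. Convert the JSON-encoded message string into a list of structs.
    from_json(messages, '[{"role": "VARCHAR", "content": "VARCHAR"}]') AS parsed_messages
FROM combined
-- 3. Remove rows whose system prompt (the first message) is empty.
WHERE json_extract_string(messages, '$[0].content') <> '';
```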

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned
reacted to fdaudens's post with 🔥 21 days ago
Fascinating point from @thomwolf at Web Summit: AI misuse (deepfakes, fake news) is actually easier to carry out with closed models than with open-source ones.

This challenges the common narrative that open-source AI is inherently more dangerous. The reality is more nuanced: while open-source models may seem technically easier to misuse, the accessibility and product-focused design of closed models appear to be driving more actual harm.

Important context for current AI safety discussions and regulation debates.

Do you agree? 👇
upvoted an article 21 days ago

Releasing the largest multilingual open pretraining dataset

By Pclanglais • 97 upvotes