Stefano Fiorucci PRO

anakin87

AI & ML interests

Contributing to Haystack LLM framework ๐Ÿ—๏ธ. Language Models: orchestration, post-training, synthetic data...

Recent Activity

Articles

Organizations

deepset's profile picture Blog-explorers's profile picture ZeroGPU Explorers's profile picture Hugging Face Discord Community's profile picture

anakin87's activity

replied to their post about 5 hours ago
New activity in anakin87/tulu-3-sft-mixture-with-language about 15 hours ago

[bot] Conversion to Parquet

#1 opened about 15 hours ago by parquet-converter
posted an update about 20 hours ago
view post
Post
714
Tulu 3 SFT Mixture by AllenAI is a massive, good, multilingual dataset for fine-tuning Language Models.

Unfortunately, it was missing the "language" column.

I added it using the good old fastText.

Check out the dataset here ๐Ÿ‘‰ anakin87/tulu-3-sft-mixture-with-language

  • 1 reply
ยท