davanstrien
posted an update May 7
Introducing CosmoChat, a multi-turn chat dataset based on Cosmopedia that I'm working on in the open on the Hub.

🎯 Goals:
💬 Create multi-turn chats seeded from Cosmopedia
🎓 Customize questions for different audience levels
🔍 Evaluate the model's ability to elaborate and clarify
🤓 (I want to learn more about creating valuable synthetic datasets, and I learn best by doing stuff rather than reading stuff).

CosmoChat is created using the excellent distilabel library.

🔗 Explore the current version of the dataset: davanstrien/cosmochat
📝 Read more: https://huggingface.co/blog/davanstrien/cosmochat

Awesome work! Here is a tool I crafted and use myself for synthetic datasets; maybe it could be of some use for your project: https://github.com/severian42/Vodalus-Expert-LLM-Forge

Thanks, I'm currently using distilabel, which is working very well for me, but I will take a look at your tool!