Xiaosen Zheng

xszheng2020

AI & ML interests

Data-Centric AI and AI Safety.

xszheng2020's activity

upvoted an article 21 days ago

SmolLM - blazingly fast and remarkably powerful

258
upvoted an article 29 days ago

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

65
upvoted 2 articles 4 months ago

How NuminaMath Won the 1st AIMO Progress Prize

99

RegMix: Data Mixture as Regression for Language Model Pre-training

10