Indic Datasets
List of text and voice datasets to train and fine-tune Indic LLMs.

ai4bharat/sangraha • Updated 12 days ago • 268M • 12.9k • 28
uonlp/CulturaX • Updated Jul 23 • 7.18B • 17.3k • 472
pary/hind_encorp • Updated Jan 18 • 205 • 1
PleIAs/YouTube-Commons • Updated Jun 26 • 898 • 314

Alignment Dataset
English and other model alignment datasets.

H-D-T/Buzz-8b-Large-v0.5 • Text Generation • Updated May 14 • 33 • 29
allenai/WildChat-1M • Updated 16 days ago • 838k • 1.17k • 276
nvidia/ChatQA-Training-Data • Updated Jun 4 • 442k • 1.2k • 159
nvidia/ChatRAG-Bench • Updated May 24 • 34.6k • 1.33k • 97