Data, embeddings, and index for MassiveDS, from "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore"
Rulin Shao
rulins
Recent Activity
upvoted a paper 3 days ago: OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
updated a dataset 6 days ago: OpenScholar/OpenScholar-DataStore-V3
updated a dataset 8 days ago: rulins/pes2o_v3
Collections: 1
Datasets: 10
rulins/pes2o_v3 • Viewer • Updated • 150M • 27
rulins/raw_data • Viewer • Updated • 514M • 2.62k
rulins/MassiveDS-1.4T • Updated • 298 • 8
rulins/MassiveDS-1.4T-raw-data • Viewer • Updated • 514M • 336 • 6
rulins/mmlu_searched_results_from_massiveds • Viewer • Updated • 33.5k • 301
rulins/bright-expanded • Viewer • Updated • 1.64M • 314
rulins/bright-fork • Viewer • Updated • 1.4M • 123
rulins/MassiveDS-140B • Viewer • Updated • 3.08M • 4.2k • 5
rulins/FineWeb-Edu-1MT • Viewer • Updated • 1k • 46
rulins/FineWeb-Edu-1BT • Viewer • Updated • 665k • 64