Update README.md
README.md
CHANGED
@@ -12,7 +12,7 @@ This organization hosts the 🍷 FineWeb datasets, a collection of text datasets
 
 The creation of 🍷 FineWeb involved careful processing and filtering of large amounts of web data with the aim of lowering the barriers to entry to anyone intending to pretrain high-performance large language models.
 
-All code and artefacts needed for reproduction are public and built on top of open source libraries,
+All code and artefacts needed for reproduction are public and built on top of open source libraries, such as the 🤗 libraries [`datatrove`](https://github.com/huggingface/datatrove/), [`nanotron`](https://github.com/huggingface/nanotron/) or [`lighteval`](https://github.com/huggingface/lighteval/).
 
 
 _Currently releasing v1_