davidbrandfonbrener committed on
Commit 6d38ed3
1 Parent(s): 1307814
Update README.md
README.md CHANGED
@@ -2,6 +2,9 @@
 
 See accompanying code at: https://github.com/davidbrandfonbrener/color-filter-olmo
 
+If you only want to download the filtered, untokenized data, see: https://huggingface.co/datasets/davidbrandfonbrener/color-filtered-c4
+
+## Usage
 
 To download the data, we recommend using the huggingface-cli.
 
@@ -18,10 +21,10 @@ If you only want to download some files (e.g. just the models), use the cli. For
 If you use this code in your research, please cite the following paper:
 
 ```bibtex
-@
-title={},
-author={},
-
-year={}
+@article{brandfonbrener2024color,
+title={CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training},
+author={Brandfonbrener, David and Zhang, Hanlin and Kirsch, Andreas and Schwarz, Jonathan Richard and Kakade, Sham M},
+journal={arXiv preprint arXiv:XXXX.XXXXX},
+year={2024}
 }
 ```
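
The README recommends the huggingface-cli for downloads. As an alternative, here is a minimal sketch that uses the `huggingface_hub` Python API instead of the CLI. It targets the filtered dataset repo linked in the diff above. The helper names `build_download_kwargs` and `download` are hypothetical, and any file pattern you pass is your own assumption, because the repo's file layout is not shown in this commit.

```python
from typing import Optional, Sequence

# Dataset repo linked in the README change above.
REPO_ID = "davidbrandfonbrener/color-filtered-c4"


def build_download_kwargs(
    local_dir: str,
    patterns: Optional[Sequence[str]] = None,
) -> dict:
    """Assemble keyword arguments for huggingface_hub.snapshot_download.

    Hypothetical helper for illustration; it is not part of the README.
    """
    kwargs = {
        "repo_id": REPO_ID,
        "repo_type": "dataset",  # the linked repo is a dataset, not a model
        "local_dir": local_dir,
    }
    if patterns:
        # Restrict the download to files matching these glob patterns.
        kwargs["allow_patterns"] = list(patterns)
    return kwargs


def download(local_dir: str, patterns: Optional[Sequence[str]] = None) -> str:
    """Download a snapshot of the repo and return the local directory path."""
    # Imported lazily so build_download_kwargs works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_kwargs(local_dir, patterns))
```

Calling `download("./color-filtered-c4")` would fetch the whole dataset repo. Passing `patterns` limits the download to matching files, which mirrors the README's advice to download only some files when you do not need everything.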