davidbrandfonbrener committed on
Commit
6d38ed3
1 Parent(s): 1307814

Update README.md

Files changed (1)
  1. README.md +8 -5
README.md CHANGED
@@ -2,6 +2,9 @@
 
 See accompanying code at: https://github.com/davidbrandfonbrener/color-filter-olmo
 
+If you only want to download the filtered, untokenized data, see: https://huggingface.co/datasets/davidbrandfonbrener/color-filtered-c4
+
+## Usage
 
 To download the data, we recommend using the huggingface-cli.
 
@@ -18,10 +21,10 @@ If you only want to download some files (e.g. just the models), use the cli. For
 If you use this code in your research, please cite the following paper:
 
 ```bibtex
-@inproceedings{,
-title={},
-author={},
-booktitle={},
-year={},
+@article{brandfonbrener2024color,
+title={CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training},
+author={Brandfonbrener, David and Zhang, Hanlin and Kirsch, Andreas and Schwarz, Jonathan Richard and Kakade, Sham M},
+journal={arXiv preprint arXiv:XXXX.XXXXX},
+year={2024}
 }
 ```