amphora committed on
Commit
1b2b3f1
1 Parent(s): 28bc9a7

Update README.md

  1. README.md +5 -0
README.md CHANGED
@@ -12,6 +12,11 @@ pinned: false
 
  We are a non-profit research lab focused on the interpretability and evaluation of Korean language models. Our mission is to advance the field with insightful benchmarks and tools. Below is an overview of our projects.
 
+ ## High-Quality Korean Corpora
+ - [Korean WebText](https://huggingface.co/datasets/HAERAE-HUB/KOREAN-WEBTEXT) : A collection of 2B tokens of Korean text collected from the web.
+ - [Korean SyntheticText](https://huggingface.co/datasets/HAERAE-HUB/KOREAN-SyntheticText-1.5B) : A collection of 1.5B tokens of Korean text synthetically generated.
+
+
  ## Evaluation Benchmarks
  - **HAE_RAE_BENCH Series**:
  - [HAE_RAE_BENCH_1.0](https://huggingface.co/datasets/HAERAE-HUB/HAE_RAE_BENCH_1.0): An evaluation suite for Korean knowledge. See [paper](https://arxiv.org/abs/2309.02706) for further information.