amphora committed on Jul 19

Commit 1b2b3f1 • 1 Parent(s): 28bc9a7

Update README.md

Files changed (1)

README.md

@@ -12,6 +12,11 @@ pinned: false
 We are a non-profit research lab focused on the interpretability and evaluation of Korean language models. Our mission is to advance the field with insightful benchmarks and tools. Below is an overview of our projects.
+## High-Quality Korean Corpora
+- [Korean WebText](https://huggingface.co/datasets/HAERAE-HUB/KOREAN-WEBTEXT): A collection of 2B tokens of Korean text collected from the web.
+- [Korean SyntheticText](https://huggingface.co/datasets/HAERAE-HUB/KOREAN-SyntheticText-1.5B): A collection of 1.5B tokens of synthetically generated Korean text.
 ## Evaluation Benchmarks
 - **HAE_RAE_BENCH Series**:
   - [HAE_RAE_BENCH_1.0](https://huggingface.co/datasets/HAERAE-HUB/HAE_RAE_BENCH_1.0): An evaluation suite for Korean knowledge. See [paper](https://arxiv.org/abs/2309.02706) for further information.