Update README.md
README.md
CHANGED
@@ -12,6 +12,11 @@ pinned: false
 
 We are a non-profit research lab focused on the interpretability and evaluation of Korean language models. Our mission is to advance the field with insightful benchmarks and tools. Below is an overview of our projects.
 
+## High-Quality Korean Corpora
+- [Korean WebText](https://huggingface.co/datasets/HAERAE-HUB/KOREAN-WEBTEXT) : A collection of 2B tokens of Korean text collected from the web.
+- [Korean SyntheticText](https://huggingface.co/datasets/HAERAE-HUB/KOREAN-SyntheticText-1.5B) : A collection of 1.5B tokens of Korean text synthetically generated.
+
+
 ## Evaluation Benchmarks
 - **HAE_RAE_BENCH Series**:
 - [HAE_RAE_BENCH_1.0](https://huggingface.co/datasets/HAERAE-HUB/HAE_RAE_BENCH_1.0): An evaluation suite for Korean knowledge. See [paper](https://arxiv.org/abs/2309.02706) for further information.