Pretrained BART in Korean
This is a BART model pretrained on multiple Korean datasets.
I used multiple datasets so that the model generalizes to both colloquial and written text.
Training was supported by the TPU Research Cloud program.
The script used to pre-train the model is here.
When you use the inference API, you must wrap the sentence with [BOS] and [EOS], as in the example below.
[BOS] 안녕하세요? 반가워요~~ [EOS]
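The wrapping step can be sketched as a small helper. This is only an illustration, not part of the model's own tooling: `wrap_sentence` is a hypothetical name, and the sketch assumes the special tokens are the literal strings [BOS] and [EOS], separated from the text by single spaces as in the example above.

```python
# Assumption: the special tokens are passed as plain text in the input string.
BOS_TOKEN = "[BOS]"
EOS_TOKEN = "[EOS]"


def wrap_sentence(sentence: str) -> str:
    """Surround a sentence with [BOS] and [EOS] (hypothetical helper)."""
    text = sentence.strip()
    # Add each token only if it is missing, so wrapping twice is harmless.
    if not text.startswith(BOS_TOKEN):
        text = f"{BOS_TOKEN} {text}"
    if not text.endswith(EOS_TOKEN):
        text = f"{text} {EOS_TOKEN}"
    return text


print(wrap_sentence("안녕하세요? 반가워요~~"))
```

The helper leaves already wrapped input unchanged, so it can be applied defensively before every request.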
You can also test mask filling performance using the [MASK] token, like this:
[BOS] [MASK] 먹었어? [EOS]
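A mask-filling input like the one above could be built from a complete sentence roughly as follows. This is a sketch under assumptions: `build_mask_input` is a hypothetical helper, the word "밥" is an arbitrary filler chosen for illustration, and the [MASK], [BOS], and [EOS] tokens are assumed to be plain strings in the input text.

```python
# Assumption: special tokens are written literally into the input string.
MASK_TOKEN = "[MASK]"


def build_mask_input(sentence: str, target: str) -> str:
    """Replace the first occurrence of `target` with [MASK] and wrap the result.

    Hypothetical helper for preparing a mask-filling query.
    """
    if target not in sentence:
        raise ValueError(f"{target!r} does not occur in the sentence")
    masked = sentence.replace(target, MASK_TOKEN, 1)
    return f"[BOS] {masked.strip()} [EOS]"


# "밥" is an illustrative word, not taken from the model card.
print(build_mask_input("밥 먹었어?", "밥"))  # [BOS] [MASK] 먹었어? [EOS]
```

Comparing the model's prediction for the masked position with the word that was removed gives a quick qualitative check of mask filling.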
Benchmark
Dataset | Metric | Score
---|---|---
KLUE NLI dev | Acc | 0.639
NSMC test | Acc | 0.8721
QuestionPair test | Acc | 0.905
KLUE TC dev | Acc | 0.8551
KLUE TC dev | F1 | 0.8515
KLUE STS dev | F1 | 0.7406
KLUE STS dev | Pearson | 0.7593
KLUE STS dev | Spearman | 0.7551
KorSTS dev | F1 | 0.7897
KorSTS dev | Pearson | 0.7269
KorSTS dev | Spearman | 0.7037
HateSpeech dev | Bias Acc | 0.8068
HateSpeech dev | Hate Acc | 0.5966
- Performance was measured on Colab using the notebooks here.
Used Datasets
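모두의 말뭉치 (Modu Corpus)
- 일상 대화 말뭉치 2020 (Everyday Conversation Corpus 2020)
- 구어 말뭉치 (Spoken Corpus)
- 문어 말뭉치 (Written Corpus)
- 신문 말뭉치 (Newspaper Corpus)
AIhub
- 개방데이터 전문분야말뭉치 (Open Data: Specialized-Domain Corpus)
- 개방데이터 한국어대화요약 (Open Data: Korean Dialogue Summarization)
- 개방데이터 감성 대화 말뭉치 (Open Data: Emotional Dialogue Corpus)
- 개방데이터 한국어 음성 (Open Data: Korean Speech)
- 개방데이터 한국어 SNS (Open Data: Korean SNS)
세종 말뭉치 (Sejong Corpus)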