---
license: apache-2.0
---

## Model

base_model: yanolja/KoSOLAR-10.7B-v0.2

## Dataset

* Collected from publicly available data
* Deduplicated using the algorithm from "Deduplicating Training Data Makes Language Models Better"
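
The card does not say which variant of the paper's method was used or with what settings. The paper describes both exact-substring matching and approximate matching based on MinHash. As an illustration only, here is a minimal sketch of the approximate idea using the standard library. The function names, the word 3-gram shingles, the 64 hash functions, and the 0.8 threshold are all assumptions made for this example, not the configuration used to build this model's dataset.

```python
import hashlib


def shingles(text, n=3):
    """Return the set of word n-grams in `text` (assumed shingle size: 3)."""
    words = text.lower().split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def minhash_signature(items, num_hashes=64):
    """Build a MinHash signature: the minimum hash value per salted hash function."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def deduplicate(docs, threshold=0.8):
    """Keep the first document of each group of near-duplicates.

    Hypothetical helper: compares each document against every kept one,
    which is quadratic and only suitable for small collections.
    """
    kept, kept_signatures = [], []
    for doc in docs:
        signature = minhash_signature(shingles(doc))
        if all(estimated_jaccard(signature, other) < threshold for other in kept_signatures):
            kept.append(doc)
            kept_signatures.append(signature)
    return kept
```

Calling `deduplicate(list_of_texts)` returns the texts in their original order with later near-duplicates dropped. A production pipeline would typically replace the pairwise comparison with locality-sensitive hashing so that only likely matches are compared.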

## Code

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Hugging Face Hub repository ID of this model.
model_name = "jingyeom/KoSoLAR-10.7B-v0.2_1.3_dedup"

# Load the model weights and the matching tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

## Benchmark

**[Ko-LLM-Leaderboard](https://huggingface.co/spaces/upstage/open-ko-llm-leaderboard)**