---
license: apache-2.0
language:
- ko
---

# MPTK-1B

MPTK-1B is a 1.3B-parameter decoder-only transformer language model trained on Korean, English, and code datasets.

This model was trained on Cloud TPUs provided through Google's [TPU Research Cloud (TRC)](https://sites.research.google/trc/about/).

## Model Details

### Model Description

The model is based on MPT, an architecture that makes a few modifications to the standard decoder-only transformer:

- It uses [ALiBi (Attention with Linear Biases)](https://arxiv.org/abs/2108.12409).
- It does not use bias terms.
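The linear-bias scheme above can be sketched as follows. This is an illustrative sketch only, not this model's actual code: the per-head slopes follow the geometric sequence from the ALiBi paper, and each head adds a fixed penalty to the attention scores that grows linearly with the query-key distance.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Build the additive ALiBi bias: slope_h * (j - i) for key j, query i."""
    # Per-head slopes form a geometric sequence, e.g. 2^-0.5, 2^-1, ... for 16 heads
    start = 2.0 ** (-8.0 / n_heads)
    slopes = torch.tensor([start ** (h + 1) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    # (j - i) is 0 on the diagonal and increasingly negative for older keys,
    # so attention to distant tokens is penalized linearly
    distance = pos[None, :] - pos[:, None]
    return slopes[:, None, None] * distance[None, :, :]  # (n_heads, seq, seq)

# With this model's shape: 16 heads, 2048-token context
bias = alibi_bias(n_heads=16, seq_len=2048)
# The bias is added to the raw attention scores before the causal mask and softmax
```

Because the bias depends only on relative distance, no learned positional embeddings are needed, which is why ALiBi-based models can extrapolate to sequence lengths beyond those seen in training.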
| Hyperparameter  | Value |
|-----------------|-------|
| n_parameters    | 1.3B  |
| n_layers        | 24    |
| n_heads         | 16    |
| d_model         | 2048  |
| vocab size      | 50432 |
| sequence length | 2048  |
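As a rough sanity check, these hyperparameters land close to the stated 1.3B parameters. The sketch below assumes a 4× MLP expansion ratio and tied input/output embeddings, and ignores layer-norm parameters; these are assumptions for illustration, not confirmed details of this model.

```python
d_model, n_layers, vocab = 2048, 24, 50432

embedding = vocab * d_model            # token embedding (output head assumed tied)
attention = 4 * d_model * d_model      # Q, K, V and output projections, no biases
mlp = 2 * d_model * (4 * d_model)      # up and down projections (4x expansion assumed)

total = embedding + n_layers * (attention + mlp)
print(f"{total / 1e9:.2f}B parameters")  # ≈ 1.31B
```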
## Uses

## How to Get Started with the Model

NaNs can occur when running in fp16, so running in fp32 or bf16 is recommended.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("team-lucid/mptk-1b")
model = AutoModelForCausalLM.from_pretrained("team-lucid/mptk-1b")

pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

# Autocast to bf16 to avoid the NaN issue seen with fp16
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe(
            '대한민국의 수도는',
            max_new_tokens=100,
            do_sample=True,
        )
    )
```

## Training Details

### Training Data

The model was trained on Korean data such as [OSCAR](https://oscar-project.org/), mC4, Wikipedia, and Namuwiki, with subsets of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) and [The Stack](https://huggingface.co/datasets/bigcode/the-stack) added.
#### Training Hyperparameters

| **Hyperparameter** | **Value** |
|--------------------|-----------|
| Precision          | bfloat16  |
| Optimizer          | Lion      |
| Learning rate      | 2e-4      |
| Batch size         | 1024      |
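For reference, Lion keeps a single momentum buffer and applies a sign-based update, so the step magnitude is the learning rate regardless of gradient scale. A minimal sketch of one update, not the actual training code; the betas and weight-decay value here are illustrative defaults, not this model's training settings:

```python
import numpy as np

def lion_step(param, grad, m, lr=2e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update: sign of interpolated momentum, then momentum refresh."""
    update = np.sign(beta1 * m + (1.0 - beta1) * grad)
    param = param - lr * (update + weight_decay * param)  # decoupled weight decay
    m = beta2 * m + (1.0 - beta2) * grad
    return param, m

# Single scalar step: the parameter moves by exactly lr in the update's direction
p, m = lion_step(np.float64(1.0), np.float64(0.5), np.float64(0.0))
```

Storing only one buffer (versus Adam's two) is part of why Lion is attractive for memory-constrained training.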
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_team-lucid__mptk-1b).
| Metric              | Value |
|---------------------|-------|
| Avg.                | 17.88 |
| ARC (25-shot)       | 22.7  |
| HellaSwag (10-shot) | 25.48 |
| MMLU (5-shot)       | 27.11 |
| TruthfulQA (0-shot) | 0.0   |
| Winogrande (5-shot) | 49.72 |
| GSM8K (5-shot)      | 0.0   |
| DROP (3-shot)       | 0.17  |
|