sudy-super committed
Commit • fe24a97 • 1 Parent(s): 34d3b55
Update README.md
README.md CHANGED
@@ -1,3 +1,82 @@
|
|
1 |
-
---
|
2 |
-
license: apache-2.0
|
3 |
-
|
1 |
+
---
|
2 |
+
license: apache-2.0
|
3 |
+
datasets:
|
4 |
+
- llm-jp/oasst1-21k-ja
|
5 |
+
- llm-jp/oasst2-33k-ja
|
6 |
+
- HachiML/Hachi-Alpaca
|
7 |
+
- Aratako/Rosebleu-1on1-Dialogues-RP
|
8 |
+
- baobab-trees/wikipedia-human-retrieval-ja
|
9 |
+
- aixsatoshi/Longcontext-aozora-summary
|
10 |
+
- aixsatoshi/Longcontext-aozora-instruction
|
11 |
+
- kunishou/amenokaku-code-instruct
|
12 |
+
- HachiML/Evol-hh-rlhf-gen3-1k
|
13 |
+
- Kendamarron/jimba-wiki-instruction-calm3
|
14 |
+
- Manual-Dataset-Creation-Project/Malum-130
|
15 |
+
- sudy-super/CoTangent
|
16 |
+
- minnade/chat-daily
|
17 |
+
---
|
18 |
+
# Yamase-12B
|
19 |
+
### Description
|
20 |
+
Yamase-12B is a model fine-tuned from [Mistral-Nemo-Instruct](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) on roughly 110,000 examples to improve its Japanese-language ability.
|
21 |
+
|
22 |
+
### Usage
|
23 |
+
```python
|
24 |
+
import torch
|
25 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
26 |
+
B_INST, E_INST = "[INST]", "[/INST]"
|
27 |
+
text = "旅行に行くと高層ビルがたくさん建っていました。これからどのようなことが推測できますか?"  # "When I went on a trip, there were many high-rise buildings. What can be inferred from this?"
|
28 |
+
model_name = "sudy-super/Yamase-12B"
|
29 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
30 |
+
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
|
31 |
+
if torch.cuda.is_available():
|
32 |
+
    model = model.to("cuda")
|
33 |
+
prompt = "{bos_token}{b_inst}{prompt}{e_inst}".format(
|
34 |
+
    bos_token=tokenizer.bos_token,
|
35 |
+
    b_inst=B_INST,
|
36 |
+
    prompt=text,
|
37 |
+
    e_inst=E_INST,
|
38 |
+
)
|
39 |
+
with torch.no_grad():
|
40 |
+
    token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
|
41 |
+
    output_ids = model.generate(
|
42 |
+
        token_ids.to(model.device),
|
43 |
+
        max_new_tokens=256,
|
44 |
+
        pad_token_id=tokenizer.pad_token_id,
|
45 |
+
        eos_token_id=tokenizer.eos_token_id,
|
46 |
+
    )
|
47 |
+
output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1) :], skip_special_tokens=True)
|
48 |
+
print(output)
|
49 |
+
"""
|
50 |
+
|
51 |
+
"""
|
52 |
+
```
|
53 |
+
|
54 |
+
### Chat Template
|
55 |
+
```
|
56 |
+
<s>[INST]明日の東京の天気は何ですか?[/INST]晴れです。</s>[INST]大阪はどうですか?[/INST]雨です。</s>
|
57 |
+
```
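
The template above can also be assembled programmatically. The following is a minimal sketch, assuming the layout is exactly as shown: a single `<s>` at the start, each user turn wrapped in `[INST]...[/INST]`, and each assistant reply closed by `</s>`. `build_prompt` is a hypothetical helper written for illustration; it is not part of the model repository or of `transformers`.

```python
from typing import List, Optional, Tuple

B_INST, E_INST = "[INST]", "[/INST]"


def build_prompt(
    turns: List[Tuple[str, Optional[str]]],
    bos_token: str = "<s>",
    eos_token: str = "</s>",
) -> str:
    """Format (user, assistant) turns in the chat layout shown above.

    Pass None as the assistant reply of the last turn to leave the prompt
    open for generation. Hypothetical helper, for illustration only.
    """
    parts = [bos_token]  # BOS appears once, at the very start
    for user, assistant in turns:
        parts.append(f"{B_INST}{user}{E_INST}")
        if assistant is not None:
            # Each completed assistant reply is closed by EOS.
            parts.append(f"{assistant}{eos_token}")
    return "".join(parts)


if __name__ == "__main__":
    history = [("Hello", "Hi there."), ("How are you?", None)]
    print(build_prompt(history))
```

Because the returned string already contains the BOS token, it would be encoded with `add_special_tokens=False`, as in the Usage snippet.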
|
58 |
+
|
59 |
+
|
60 |
+
### Hyperparameters
|
61 |
+
```
|
62 |
+
num_train_epochs: 5
|
63 |
+
per_device_train_batch_size: 2
|
64 |
+
per_device_eval_batch_size: 2
|
65 |
+
gradient_accumulation_steps: 128
|
66 |
+
learning_rate: 2e-5
|
67 |
+
lr_scheduler_kwargs: {"min_lr": 2e-6}
|
68 |
+
lr_scheduler_type: "cosine_with_min_lr"
|
69 |
+
warmup_ratio: 0.1
|
70 |
+
dataloader_pin_memory: True
|
71 |
+
gradient_checkpointing: True
|
72 |
+
bf16: True
|
73 |
+
optim: "adamw_torch_fused"
|
74 |
+
weight_decay: 0.0
|
75 |
+
max_grad_norm: 1.0
|
76 |
+
adam_beta2: 0.99
|
77 |
+
label_smoothing_factor: 0.0
|
78 |
+
seed: 42
|
79 |
+
```
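
To make the listed values more concrete, the sketch below derives the effective batch size, an approximate number of optimizer steps, and the learning rate at a given step. It rests on assumptions the README does not state: a single device (`NUM_DEVICES = 1` is hypothetical), a dataset of about 110,000 examples (from the description), and a schedule shaped as linear warmup followed by cosine decay to `min_lr`. It is written from the listed values only and is not the training script or the `transformers` implementation.

```python
import math

# Values taken from the hyperparameter list above.
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 128
NUM_EPOCHS = 5
PEAK_LR = 2e-5
MIN_LR = 2e-6
WARMUP_RATIO = 0.1

# Assumptions, not stated in the README.
NUM_DEVICES = 1          # hypothetical: the number of GPUs is not given
NUM_EXAMPLES = 110_000   # "roughly 110,000" from the description


def effective_batch_size(num_devices: int = NUM_DEVICES) -> int:
    """Sequences consumed per optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices


def total_steps(num_examples: int = NUM_EXAMPLES, num_devices: int = NUM_DEVICES) -> int:
    """Approximate number of optimizer steps over all epochs."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size(num_devices))
    return steps_per_epoch * NUM_EPOCHS


def lr_at(step: int, num_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay down to MIN_LR."""
    warmup_steps = int(num_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, num_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


if __name__ == "__main__":
    steps = total_steps()
    print("effective batch size:", effective_batch_size())
    print("approx. optimizer steps:", steps)
    for s in (0, steps // 10, steps // 2, steps):
        print(s, f"{lr_at(s, steps):.2e}")
```

With more than one device the effective batch size scales linearly and the step count shrinks accordingly, so the numbers printed here describe only the single-device assumption.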
|
80 |
+
|
81 |
+
### Author
|
82 |
+
[Rakuto Suda](https://huggingface.co/sudy-super)
|