---
library_name: transformers
datasets:
- shareAI/ShareGPT-Chinese-English-90k
- FreedomIntelligence/ShareGPT-CN
language:
- zh
pipeline_tag: question-answering
tags:
- chat
- llm
- llama2
- chatgpt
---
- GitHub: https://github.com/CrazyBoyM/llama2-Chinese-chat
  
Updates:
- 2023-07-19: First llama2 13b Chinese chat version released.
- 2023-07-23: Second training epoch completed and released; testing shows a noticeably better conversational experience.
- 2023-08-03: Branch version bimoGPT released, with self-identity awareness and solid code Q&A ability. Download: https://huggingface.co/shareAI/bimoGPT-llama2-13b
- 2023-08-21: Moved up on the world model leaderboard, passing a paid model from a community that bills itself as the "official Chinese Llama2" by more than ten places.



Download the fully merged weights: https://www.codewithgpu.com/m/file/llama2-13b-Chinese-chat

- Training dataset: https://huggingface.co/datasets/shareAI/ShareGPT-Chinese-English-90k
- llama2 training QQ discussion group: 443064756
  
This model is llama2 Chinese chat 13b, trained on the Chinese ShareGPT dataset. To keep the repository size down, only the adapter weights are released here; pull https://huggingface.co/TheBloke/Llama-2-13B-fp16 as the base weights.
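If you do not already have the base model and adapter on disk, here is a minimal download sketch using huggingface_hub (the local paths are illustrative and match the merge script below; the adapter repo id is assumed to be shareAI/llama2-13b-Chinese-chat):

```python
from huggingface_hub import snapshot_download

# Illustrative local paths, chosen to match the merge script below.
snapshot_download('TheBloke/Llama-2-13B-fp16', local_dir='/data/TheBloke/Llama-2-13B-fp16')
# Assumed repo id for this adapter; adjust if it differs.
snapshot_download('shareAI/llama2-13b-Chinese-chat', local_dir='/data/llama2-13b-Chinese-chat')
```

With both on disk, run the following script to merge the adapter into the base model and obtain a full working set of weights: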

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name_or_path = '/data/TheBloke/Llama-2-13B-fp16'  # base model weights
adapter_name_or_path = '/data/llama2-13b-Chinese-chat'  # this repo's adapter weights
save_path = '/data/llama2-13b-Chinese-chat_v1'          # output path for the merged model

tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map='auto'
)
print("load model success")
model = PeftModel.from_pretrained(model, adapter_name_or_path)
print("load adapter success")
model = model.merge_and_unload()  # fold the adapter weights into the base model
print("merge success")

tokenizer.save_pretrained(save_path)
model.save_pretrained(save_path)
print("save done.")
```
After merging, try chatting with the model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch


def main():
    model_name = '/data/llama2-13b-Chinese-chat_v1'

    device = 'cuda'
    max_new_tokens = 500     # max tokens generated per turn
    history_max_len = 2000   # max length, in tokens, of the history the model keeps
    top_p = 0.9
    temperature = 0.35       # higher values make the model more freewheeling
    repetition_penalty = 1.2 # raise this if the model starts repeating itself

    # load the model
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
        device_map='auto'  # accelerate places the weights, so no extra .to(device) call is needed
    ).eval()
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        trust_remote_code=True,
        # llama does not support the fast tokenizer
        use_fast=False if model.config.model_type == 'llama' else True
    )
    # token ids of the full conversation history
    history_token_ids = tokenizer('<s>', return_tensors="pt").input_ids

    # start the interactive chat loop
    user_input = input('User:')
    while True:
        user_input = '{}</s>'.format(user_input)  # prompt template: each turn ends with </s>
        user_input_ids = tokenizer(user_input, return_tensors="pt", add_special_tokens=False).input_ids
        history_token_ids = torch.concat((history_token_ids, user_input_ids), dim=1)
        model_input_ids = history_token_ids[:, -history_max_len:].to(device)
        with torch.no_grad():
            outputs = model.generate(
                input_ids=model_input_ids, max_new_tokens=max_new_tokens, do_sample=True, top_p=top_p,
                temperature=temperature, repetition_penalty=repetition_penalty, eos_token_id=tokenizer.eos_token_id
            )
        model_input_ids_len = model_input_ids.size(1)
        response_ids = outputs[:, model_input_ids_len:]
        history_token_ids = torch.concat((history_token_ids, response_ids.cpu()), dim=1)
        response = tokenizer.batch_decode(response_ids)
        print("Bot:" + response[0].strip().replace('</s>', ""))
        user_input = input('User:')


if __name__ == '__main__':
    main()

```
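For a quick one-off generation without the interactive loop, here is a minimal sketch that reuses `model` and `tokenizer` as loaded in the script above (the prompt string is just an example, under the same `<s>`/`</s>` template):

```python
# Assumes `model` and `tokenizer` were loaded as in the chat script above.
prompt = '<s>' + '你好，请介绍一下你自己。' + '</s>'  # example prompt: "Hi, please introduce yourself."
input_ids = tokenizer(prompt, return_tensors='pt', add_special_tokens=False).input_ids.to('cuda')
with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids, max_new_tokens=200, do_sample=True,
        top_p=0.9, temperature=0.35, repetition_penalty=1.2,
        eos_token_id=tokenizer.eos_token_id
    )
print(tokenizer.decode(output_ids[0, input_ids.size(1):]).replace('</s>', '').strip())
```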
We recommend continuing with a second round of fine-tuning to tailor the conversational behavior to your use case; a minimal starting sketch follows.
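A sketch that wraps the merged model in a fresh LoRA adapter with peft as a starting point for continued training (the hyperparameters and target modules here are illustrative, not the ones used for this release):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the merged model produced above; the path is illustrative.
model = AutoModelForCausalLM.from_pretrained(
    '/data/llama2-13b-Chinese-chat_v1', torch_dtype=torch.float16
)
# Illustrative LoRA hyperparameters; tune r, alpha, and targets for your data.
lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```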
## Training procedure


The following `bitsandbytes` quantization config was used during training (a `transformers` mapping sketch follows the list):
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: float16
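To reproduce this 4-bit (QLoRA-style) setup, the list above maps onto `transformers`' `BitsAndBytesConfig` roughly as follows:

```python
import torch
from transformers import BitsAndBytesConfig

# Mirrors the quantization settings listed above; defaults cover the rest.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
# Pass it when loading: AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)
```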
### Framework versions


- PEFT 0.4.0.dev0

Trained for 1 epoch to a loss of about 0.9. In our own (purely subjective) testing, the Chinese chat experience is better than baichuan13b. There is still plenty of headroom, so we suggest pulling the weights and continuing to fine-tune on top of this as a base model.

Thanks to:
- LLaMA2
- the Firefly project
- the builders of the Chinese ShareGPT dataset