---
library_name: transformers
datasets:
- shareAI/ShareGPT-Chinese-English-90k
- FreedomIntelligence/ShareGPT-CN
language:
- zh
pipeline_tag: question-answering
tags:
- chat
- llm
- llama2
- chatgpt
---
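- GitHub: https://github.com/CrazyBoyM/llama2-Chinese-chat

Updates:
- 2023-07-19: Released the first Chinese chat version of Llama 2 13B.
- 2023-07-23: Released the checkpoint from the second training epoch; testing shows a better chat experience.
- 2023-08-03: Released a branch version, bimoGPT, which has self-identity awareness and good code question-answering ability. Download: https://huggingface.co/shareAI/bimoGPT-llama2-13b
- 2023-08-21: Updated the world model leaderboard standing; this model ranks more than ten places above a paid model from a community that bills itself as the "official Chinese Llama2".
- Fully merged weights download: https://www.codewithgpu.com/m/file/llama2-13b-Chinese-chat
- Training dataset: https://huggingface.co/datasets/shareAI/ShareGPT-Chinese-English-90k
- Llama 2 training discussion QQ group: 443064756

This project is a Llama 2 Chinese chat 13B model trained on a Chinese ShareGPT dataset. To keep the download small, only the adapter weights are released here.

Pull https://huggingface.co/TheBloke/Llama-2-13B-fp16 as the base weights, then run the following script to merge the adapter into them and obtain the complete working weights: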
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Local paths: base model, downloaded adapter, and output directory for the merged model
model_name_or_path = '/data/TheBloke/Llama-2-13B-fp16'
adapter_name_or_path = '/data/llama2-13b-Chinese-chat'
save_path = '/data/llama2-13b-Chinese-chat_v1'

tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map='auto'
)
print("load model success")

# Attach the PEFT adapter, then fold its weights into the base model
model = PeftModel.from_pretrained(model, adapter_name_or_path)
print("load adapter success")
model = model.merge_and_unload()
print("merge success")

# Save the tokenizer and the merged weights side by side
tokenizer.save_pretrained(save_path)
model.save_pretrained(save_path)
print("save done.")
```
After merging, try a chat session:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch


def main():
    model_name = '/data/llama2-13b-Chinese-chat_v1'
    device = 'cuda'
    max_new_tokens = 500  # maximum number of tokens generated per turn
    history_max_len = 2000  # maximum number of history tokens the model sees
    top_p = 0.9
    temperature = 0.35  # higher values make the output more random
    repetition_penalty = 1.2  # adjust this if the model starts repeating itself

    # Load the model; device_map='auto' already places the weights,
    # so no extra .to(device) call is made on the model
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
        device_map='auto'
    ).eval()
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        trust_remote_code=True,
        # the fast tokenizer is not used for llama
        use_fast=False if model.config.model_type == 'llama' else True
    )

    # Running record of the whole conversation
    history_token_ids = tokenizer('<s>', return_tensors="pt").input_ids

    # Start the conversation
    user_input = input('User:')
    while True:
        user_input = '{}</s>'.format(user_input)
        user_input_ids = tokenizer(user_input, return_tensors="pt", add_special_tokens=False).input_ids
        history_token_ids = torch.cat((history_token_ids, user_input_ids), dim=1)
        # Feed only the most recent history_max_len tokens to the model
        model_input_ids = history_token_ids[:, -history_max_len:].to(device)
        with torch.no_grad():
            outputs = model.generate(
                input_ids=model_input_ids, max_new_tokens=max_new_tokens, do_sample=True, top_p=top_p,
                temperature=temperature, repetition_penalty=repetition_penalty, eos_token_id=tokenizer.eos_token_id
            )
        # Keep only the newly generated tokens and append them to the history
        model_input_ids_len = model_input_ids.size(1)
        response_ids = outputs[:, model_input_ids_len:]
        history_token_ids = torch.cat((history_token_ids, response_ids.cpu()), dim=1)
        response = tokenizer.batch_decode(response_ids)
        print("Bot:" + response[0].strip().replace('</s>', ""))
        user_input = input('User:')


if __name__ == '__main__':
    main()
```
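
The chat loop above implies a simple conversation layout: the history starts with `<s>`, each user turn is terminated by `</s>`, each model reply normally ends with `</s>` as well, and only the most recent `history_max_len` tokens are fed to the model. The following is a minimal string-level sketch of that layout. The helper names `build_prompt` and `truncate_history` are hypothetical and are not part of this project; in the real script the token ids come from the tokenizer.

```python
BOS, EOS = "<s>", "</s>"


def build_prompt(turns):
    """Join alternating user/bot utterances as <s>u1</s>b1</s>u2</s>..."""
    return BOS + "".join(text + EOS for text in turns)


def truncate_history(token_ids, history_max_len=2000):
    """Keep only the most recent history_max_len token ids (must be > 0)."""
    return token_ids[-history_max_len:]


if __name__ == "__main__":
    # Two finished turns plus a new user message awaiting a reply
    print(build_prompt(["Hello", "Hi, how can I help?", "Tell me about yourself"]))
    # prints: <s>Hello</s>Hi, how can I help?</s>Tell me about yourself</s>
```

Note that truncation from the left can drop the leading `<s>` once the conversation grows past `history_max_len` tokens, which matches what the slicing in the script does.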
Further fine-tuning is recommended to tune the dialogue quality for your specific use case.
## Training procedure
The following `bitsandbytes` quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: float16
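
The values listed above correspond to keyword arguments of `transformers.BitsAndBytesConfig`. The sketch below shows one way to reproduce them when continuing training; it is not taken from this project's training code. The names `QUANT_KWARGS` and `make_bnb_config` are hypothetical, and the heavy imports are kept inside the function so the settings can be inspected without `torch`, `transformers`, or `bitsandbytes` installed.

```python
# Quantization settings copied from the list above.
QUANT_KWARGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "float16",
}


def make_bnb_config():
    """Build a BitsAndBytesConfig from QUANT_KWARGS.

    Assumes torch, transformers, and bitsandbytes are installed.
    """
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = dict(QUANT_KWARGS)
    # Convert the dtype name into the corresponding torch dtype object
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**kwargs)
```

The resulting object would typically be passed as `quantization_config=make_bnb_config()` to `AutoModelForCausalLM.from_pretrained` when loading the base model for further training.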
### Framework versions
- PEFT 0.4.0.dev0
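After 1 epoch of training the loss was 0.9. In hands-on testing, the Chinese chat experience was better than baichuan13b (a subjective impression only). There is still a lot of headroom, so we suggest pulling the files and using them as a base for further fine-tuning.

Acknowledgements:
- LLaMA2
- The Firefly project
- The builders of the Chinese shareGPT datasets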