---
license: apache-2.0
---
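This is a model quantized with the Auto-GPTQ framework. The source model is huatuoGPT2-7B, a fine-tuned model whose base model is Baichuan-7B.
Parameters:
Original model size: 16 GB; quantized model size: 5 GB.
Inference accuracy has not been tested yet, so use with caution.
During quantization, the calibration data was taken from the fine-tuning training set, Medical Fine-tuning Instruction (GPT-4).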
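
As a rough illustration of how calibration text can be prepared from instruction data, the sketch below samples records and joins each question/answer pair into one string. The field names `instruction` and `output`, the helper `build_calibration_texts`, the join template, and the toy records are all hypothetical; the actual dataset schema and the prompt format used for this model are not stated here.

```python
import random


def build_calibration_texts(records, n_samples=512, seed=0):
    """Pick up to n_samples records and join each pair into one string."""
    rng = random.Random(seed)
    if len(records) <= n_samples:
        chosen = list(records)
    else:
        chosen = rng.sample(records, n_samples)
    # Hypothetical template: the model's real chat template may differ.
    return [f"{r['instruction']}\n{r['output']}" for r in chosen]


# Toy stand-in records; real ones would come from the fine-tuning set.
toy_records = [
    {"instruction": f"question {i}", "output": f"answer {i}"}
    for i in range(1000)
]
calibration_texts = build_calibration_texts(toy_records)
print(len(calibration_texts))
```

Each string would then be tokenized into `input_ids` and `attention_mask`, which is the form of calibration example that AutoGPTQ's quantization step works on.
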
Usage example:
Make sure bitsandbytes is installed:
```
pip install bitsandbytes
```
Make sure auto-gptq is installed (the commands below use notebook syntax):
```
!git clone https://github.com/AutoGPTQ/AutoGPTQ
%cd AutoGPTQ
!pip install -e .
```
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("jiangchengchengNLP/huatuo_AutoGPTQ_7B4bits", use_fast=True, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("jiangchengchengNLP/huatuo_AutoGPTQ_7B4bits", device_map="auto", torch_dtype="auto", trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained("jiangchengchengNLP/huatuo_AutoGPTQ_7B4bits")
messages = []
# Prompt: "What should I do about a stomachache?"
messages.append({"role": "user", "content": "肚子疼怎么办?"})
response = model.HuatuoChat(tokenizer, messages)
print(response)
```
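
For follow-up questions, one possible pattern is to keep the whole conversation in `messages` and append each reply before asking again. The sketch below is only an illustration: the helper `ask`, the stand-in `fake_chat`, and the `"assistant"` role name for stored replies are assumptions, not something this model card confirms.

```python
def ask(chat_fn, messages, question):
    """Append a user turn, call the model, then record its reply."""
    messages.append({"role": "user", "content": question})
    response = chat_fn(messages)
    # Assumption: replies are stored under the "assistant" role.
    messages.append({"role": "assistant", "content": response})
    return response


def fake_chat(messages):
    # Stand-in for the model so the sketch runs without loading weights.
    return f"(reply to: {messages[-1]['content']})"


history = []
first = ask(fake_chat, history, "肚子疼怎么办?")
second = ask(fake_chat, history, "需要去医院吗?")
print(len(history))
```

With the loaded model, `fake_chat` would be replaced by something like `lambda m: model.HuatuoChat(tokenizer, m)`.
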
More quantization details:
Quantization environment: two T4 GPUs
Calibration size: 512 training pairs
Quantization config:
```
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,  # 4 or 8
    group_size=128,
    damp_percent=0.01,
    desc_act=False,  # False can significantly speed up inference, but perplexity may be slightly worse
    static_groups=False,
    sym=True,
    true_sequential=True,
    model_name_or_path=None,
    model_file_base_name="model",
)
```
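
To give a feel for what `bits=4`, `group_size=128`, and `sym=True` mean for stored weights, here is a minimal pure-Python sketch of group-wise symmetric quantization. It uses plain round-to-nearest, not GPTQ's error-compensating update, and the helper names and the fixed mid-range zero point are assumptions chosen for illustration rather than a copy of the AutoGPTQ implementation.

```python
import random


def quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group of weights."""
    maxq = 2 ** bits - 1      # largest code, 15 for 4 bits
    zero = (maxq + 1) // 2    # fixed mid-range zero point, 8 for 4 bits
    absmax = max(abs(w) for w in weights)
    scale = 2 * absmax / maxq if absmax > 0 else 1.0
    codes = [min(maxq, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate float weights."""
    return [scale * (c - zero) for c in codes]


def quantize_row(row, bits=4, group_size=128):
    """Split a weight row into groups; each group gets its own scale."""
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]


rng = random.Random(0)
row = [rng.gauss(0.0, 0.02) for _ in range(256)]  # toy weight row
groups = quantize_row(row)
restored = [
    w
    for codes, scale, zero in groups
    for w in dequantize_group(codes, scale, zero)
]
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), max_error)
```

A smaller `group_size` gives each scale fewer weights to cover, which tends to lower the rounding error at the cost of storing more scales.
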