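---
license: apache-2.0
---

This is a quantized model produced with the AutoGPTQ framework. The source model is HuatuoGPT2-7B, a fine-tuned model whose base model is Baichuan-7B.

Parameters:

Original model size: 16 GB. Quantized model size: 5 GB.

Inference accuracy has not been tested yet, so please use this model with caution.

The calibration data used during quantization was taken from the fine-tuning training set, Medical Fine-tuning Instruction (GPT-4).

Usage example:

Make sure bitsandbytes is installed:

```
pip install bitsandbytes
```

Make sure auto-gptq is installed:

```
git clone https://github.com/AutoGPTQ/AutoGPTQ
cd AutoGPTQ
pip install -e .
```

Load the model and generate a response:

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

# trust_remote_code=True lets transformers load the repository's custom model code,
# which is where the HuatuoChat method used below comes from.
tokenizer = AutoTokenizer.from_pretrained("jiangchengchengNLP/huatuo_AutoGPTQ_7B4bits", use_fast=True, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("jiangchengchengNLP/huatuo_AutoGPTQ_7B4bits", device_map="auto", torch_dtype="auto", trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained("jiangchengchengNLP/huatuo_AutoGPTQ_7B4bits")

messages = []
messages.append({"role": "user", "content": "肚子疼怎么办?"})  # "What should I do about a stomachache?"
response = model.HuatuoChat(tokenizer, messages)
print(response)
```

More quantization details:

Quantization environment: two T4 GPUs.

Calibration set size: 512 training pairs.

Quantization configuration:

```
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,  # 4 or 8
    group_size=128,
    damp_percent=0.01,
    desc_act=False,  # False can significantly speed up inference, at the cost of slightly worse perplexity
    static_groups=False,
    sym=True,
    true_sequential=True,
    model_name_or_path=None,
    model_file_base_name="model"
)
```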
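
For readers who want to reproduce a similar quantization run, the sketch below shows one way the configuration above could be combined with calibration pairs using AutoGPTQ's `AutoGPTQForCausalLM.from_pretrained`, `quantize`, and `save_quantized` calls. It is not the exact script used for this model. The helper names `build_calibration_examples` and `quantize_and_save`, the newline used to join each instruction with its answer, and the `max_length` of 2048 are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple


def build_calibration_examples(
    tokenizer: Callable[..., Dict[str, Any]],
    pairs: List[Tuple[str, str]],
    max_length: int = 2048,  # assumed cap; the card does not state one
) -> List[Dict[str, Any]]:
    """Tokenize (instruction, answer) pairs into GPTQ calibration examples."""
    examples = []
    for instruction, answer in pairs:
        # Assumed formatting: the card does not say how each pair was joined.
        text = instruction + "\n" + answer
        encoded = tokenizer(text, truncation=True, max_length=max_length)
        examples.append(
            {
                "input_ids": encoded["input_ids"],
                "attention_mask": encoded["attention_mask"],
            }
        )
    return examples


def quantize_and_save(
    source_model: str, pairs: List[Tuple[str, str]], save_dir: str
) -> None:
    """Quantize `source_model` with AutoGPTQ and write the result to `save_dir`."""
    # Imported lazily so the helper above can be used without auto-gptq installed.
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
    from transformers import AutoTokenizer

    # Same values as the configuration listed in this card.
    quantize_config = BaseQuantizeConfig(
        bits=4,
        group_size=128,
        damp_percent=0.01,
        desc_act=False,
        static_groups=False,
        sym=True,
        true_sequential=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(
        source_model, use_fast=True, trust_remote_code=True
    )
    model = AutoGPTQForCausalLM.from_pretrained(
        source_model, quantize_config, trust_remote_code=True
    )
    # Calibration: GPTQ uses these examples to collect layer activations.
    model.quantize(build_calibration_examples(tokenizer, pairs))
    model.save_quantized(save_dir, use_safetensors=True)
    tokenizer.save_pretrained(save_dir)
```

To use it, one would call `quantize_and_save` with a local path or hub ID for the unquantized HuatuoGPT2-7B weights, a list of 512 (instruction, answer) pairs from the calibration set, and an output directory. All three arguments are for the reader to supply.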
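
As a rough sanity check on the reported sizes, the sketch below estimates checkpoint size for group-wise quantization with `bits=4` and `group_size=128`. It assumes each group stores one 16-bit scale and one zero point packed at `bits` bits, and that any layers left unquantized are kept at 16 bits. It ignores smaller buffers such as group indices. The function names are hypothetical, and the parameter counts are inputs the reader must supply, since this card does not state how many parameters remain unquantized.

```python
def bits_per_weight(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Average storage per quantized weight, including per-group metadata."""
    # Each group of `group_size` weights shares one scale and one zero point.
    # Assumption: scales take `scale_bits` bits, zero points take `bits` bits.
    return bits + (scale_bits + bits) / group_size


def estimate_size_gb(
    quantized_params: float,
    fp16_params: float,
    bits: int = 4,
    group_size: int = 128,
) -> float:
    """Rough checkpoint size in GB (10**9 bytes)."""
    quantized_bytes = quantized_params * bits_per_weight(bits, group_size) / 8
    fp16_bytes = fp16_params * 2  # 16-bit weights take 2 bytes each
    return (quantized_bytes + fp16_bytes) / 1e9
```

With these settings the metadata adds about 0.16 bits per weight on top of the 4-bit payload, so the quantized layers shrink to roughly a quarter of their 16-bit size. Layers kept at 16 bits do not shrink at all, which is one plausible reason a quantized checkpoint ends up larger than a quarter of the original.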