
Quantization requires an A100 GPU to complete.

Original base model: chenshake/Llama-2-7b-chat-hf

Conversion process: quantize_llama-2-7b-chat_with_autogptq

This is intended for learning purposes. After quantization, the model shrinks from about 13 GB to roughly 4 GB.
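Below is a minimal sketch of what such a conversion can look like, assuming the auto-gptq library together with transformers; the bit width, group size, calibration text, and the output directory `Llama-2-7b-chat-gptq` are illustrative choices, not the exact settings used for this card.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "chenshake/Llama-2-7b-chat-hf"
out_dir = "Llama-2-7b-chat-gptq"  # hypothetical output directory

# 4-bit GPTQ configuration (illustrative values)
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)

# A tiny calibration set for illustration; real runs typically use many more samples.
examples = [tokenizer("AutoGPTQ quantizes Llama-2-7b-chat to 4-bit weights.")]

model.quantize(examples)  # this calibration/quantization step is what needs the A100
model.save_quantized(out_dir, use_safetensors=True)
tokenizer.save_pretrained(out_dir)
```

The size drop from ~13 GB to ~4 GB comes from storing the FP16 weights as 4-bit integers plus per-group scales.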

For inference, an A100 is no longer needed; a T4 is sufficient.

Inference test
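A minimal inference sketch on a single GPU such as a T4, assuming the quantized weights live in the hypothetical directory `Llama-2-7b-chat-gptq` produced above:

```python
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM

quantized_dir = "Llama-2-7b-chat-gptq"  # hypothetical path to the GPTQ weights

tokenizer = AutoTokenizer.from_pretrained(quantized_dir, use_fast=True)
# Load the 4-bit weights directly onto the GPU; ~4 GB fits comfortably on a T4.
model = AutoGPTQForCausalLM.from_quantized(
    quantized_dir, device="cuda:0", use_safetensors=True
)

pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipe("What is GPTQ quantization?", max_new_tokens=64)[0]["generated_text"])
```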
