The quantization experiment requires an A100 to complete.
Original model: chenshake/Llama-2-7b-chat-hf
Conversion notebook: quantize_llama-2-7b-chat_with_autogptq
This is for learning purposes. After quantization, the model shrinks from about 13 GB to roughly 4 GB.
For inference, an A100 is no longer needed; a T4 is enough.
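Loading the quantized model for inference might look like the sketch below. The directory name is a hypothetical placeholder; `from_quantized` loads the 4-bit weights, which is why a T4-class GPU suffices.

```python
# Hypothetical sketch of inference with the quantized model on a T4.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

quantized_dir = "Llama-2-7b-chat-gptq-4bit"  # hypothetical path

def generate(prompt: str) -> str:
    tokenizer = AutoTokenizer.from_pretrained(quantized_dir)
    # Loads the 4-bit GPTQ weights directly onto the GPU.
    model = AutoGPTQForCausalLM.from_quantized(quantized_dir, device="cuda:0")
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
    out = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Hello, how are you?"))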