pytorch_model.bin
Hello, thanks for your collaboration.
I got this error when trying to load the 4-bit model:
OSError: TehVenom/oasst-sft-6-llama-33b-xor-MERGED-4bit-GPTQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
My code:
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = 'TehVenom/oasst-sft-6-llama-33b-xor-MERGED-4bit-GPTQ'
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
This is a 33B model that has been converted and quantized down to 4-bit with the GPTQ library, so it is not natively supported by transformers.
Please see the GPTQ repository for code examples on how to run inference or quantize other models, or use an inference backend with existing 4-bit support,
such as Oobabooga's Text Generation UI,
or OccamRazor's 4bit fork of KoboldAI:
https://github.com/0cc4m/KoboldAI
Alternatively, if you have the necessary compute, you can use the native transformers version that runs at normal precision (fp16) without changing your current code:
https://huggingface.co/TehVenom/oasst-sft-6-llama-33b-xor-MERGED-16bit
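For reference, a minimal sketch of loading that fp16 repo, essentially your original snippet pointed at the 16-bit model (torch_dtype=torch.float16 is optional, but keeps the weights in half precision instead of upcasting them to fp32):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'TehVenom/oasst-sft-6-llama-33b-xor-MERGED-16bit'

# Load the full-precision checkpoint; keep it in fp16 to halve the memory footprint
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)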
Thanks for your quick response as well. I will try loading the 16-bit model and then quantizing it down myself.
I believe bitsandbytes (BnB) supports 8-bit inference on most hardware, so that's also an option for you.
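If you go that route, a rough sketch using the fp16 repo (assumes bitsandbytes and accelerate are installed; on newer transformers versions the same thing is spelled with BitsAndBytesConfig(load_in_8bit=True)):

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'TehVenom/oasst-sft-6-llama-33b-xor-MERGED-16bit'

# Quantize the weights to int8 at load time via bitsandbytes and let
# accelerate spread the layers across the available GPU(s)/CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(model_name)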
Hey, just wondering: I am running the model on https://github.com/0cc4m/KoboldAI with an NVIDIA A4500, which has 24 GB of VRAM. Is that supposed to be sufficient, given the model is 16 GB? Because I am getting a
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 238551040 bytes.
error on the CPU, even though I am loading it completely on the GPU.
You're running out of CPU memory, i.e. RAM. Set up a swap space or get more RAM. The model is first loaded from disk into RAM, and only then offloaded into VRAM for inference.
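If you end up loading it through transformers directly instead of KoboldAI, here is a hedged sketch of two standard from_pretrained options that lower the peak RAM needed during that load step (both need accelerate installed; nothing here is specific to this repo):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    'TehVenom/oasst-sft-6-llama-33b-xor-MERGED-16bit',
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,  # avoid materializing a second full copy of the weights in RAM
    device_map='auto',       # place layers directly on the GPU as they are loaded
)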