pytorch_model.bin
Hello, thanks for your collaboration.
I got this error when trying to load the 4-bit model:
OSError: TehVenom/oasst-sft-6-llama-33b-xor-MERGED-4bit-GPTQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
My code:
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = 'TehVenom/oasst-sft-6-llama-33b-xor-MERGED-4bit-GPTQ'
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
This is a 33B model that has been converted and quantized down to 4-bit with the GPTQ library, so it is not natively supported by transformers.
Please see the GPTQ repository for code examples on how to run inference or quantize other models, or use an inference backend with existing 4-bit support,
such as Oobabooga's Text Generation UI,
or OccamRazor's 4bit fork of KoboldAI:
https://github.com/0cc4m/KoboldAI
Alternatively, if you have the necessary compute, you can use the native transformers version that runs at normal precision (fp16) without changing your current code:
https://huggingface.co/TehVenom/oasst-sft-6-llama-33b-xor-MERGED-16bit
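For reference, a minimal sketch of loading that fp16 repo, essentially your original snippet pointed at the 16-bit model (torch_dtype=torch.float16 is optional, but keeps the weights in half precision instead of upcasting them to fp32):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'TehVenom/oasst-sft-6-llama-33b-xor-MERGED-16bit'

# Load the full-precision checkpoint; keep it in fp16 to halve the memory footprint
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)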
Thanks for your quick response as well. I will try loading the 16-bit model and then quantizing it down myself.
I believe bitsandbytes (BnB) supports 8-bit inference on most hardware, so that's also an option for you.
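If you go that route, a rough sketch using the fp16 repo (assumes bitsandbytes and accelerate are installed; on newer transformers versions the same thing is spelled with BitsAndBytesConfig(load_in_8bit=True)):

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'TehVenom/oasst-sft-6-llama-33b-xor-MERGED-16bit'

# Quantize the weights to int8 at load time via bitsandbytes and let
# accelerate spread the layers across the available GPU(s)/CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(model_name)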
Hey, just wondering: I am running the model on https://github.com/0cc4m/KoboldAI with an NVIDIA A4500, which has 24 GB of VRAM. Is that supposed to be sufficient, given the model is 16 GB? Because I am getting a
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 238551040 bytes.
error on the CPU, even though I am loading it completely on the GPU.
You're running out of CPU memory, i.e. RAM. Set up a swap space or get more RAM. The model is first loaded from disk into RAM, and only then offloaded into VRAM for inference.
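If you end up loading it through transformers directly instead of KoboldAI, here is a hedged sketch of two standard from_pretrained options that lower the peak RAM needed during that load step (both need accelerate installed; nothing here is specific to this repo):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    'TehVenom/oasst-sft-6-llama-33b-xor-MERGED-16bit',
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,  # avoid materializing a second full copy of the weights in RAM
    device_map='auto',       # place layers directly on the GPU as they are loaded
)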