16-bit version?
Do you have plans to upload a 16-bit version of your model? That would make it a lot more accessible for inference on smaller GPUs.
@dirkgr can correct me, but I am not aware of such plans. You should be able to load the model and then call, say, `model = model.bfloat16()` to convert the weights to 16 bits. You may need to load the model on the CPU, downcast it to 16 bits, and then move it to the GPU. An alternative with higher memory requirements (which we used while training the model) is to use `torch.autocast` with a 16-bit dtype.
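For reference, here is a minimal sketch of the CPU-downcast approach described above, assuming the `allenai/OLMo-7B` checkpoint (substitute whichever checkpoint you are using); depending on your transformers version, loading OLMo may require `trust_remote_code=True`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B"  # example checkpoint, swap in the one you need

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Load on the CPU in full precision, downcast to bf16, then move to the GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model = model.bfloat16()
model = model.to("cuda")

# Alternative with higher memory requirements (used during training): keep the
# fp32 weights and run the forward pass under autocast with a 16-bit dtype.
# with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
#     outputs = model(**inputs)
```

Passing `torch_dtype=torch.bfloat16` to `from_pretrained` may also avoid the intermediate fp32 copy, but I haven't verified that path with this checkpoint.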
@shanearora I completely get that, but if I'm loading the model with vLLM, I get OOM errors before any conversion can happen. I guess I could convert it and upload it myself, but it would be a bit more official if you all had a 16-bit version uploaded. Same goes for quantised and GGUF versions, as those are required by other applications like llama.cpp and LM Studio. But it's up to you - feel free to close this issue if you're not planning on it 🙂
@akshitab Do you know about OLMo plans in relation to vLLM?
vLLM integration for OLMo is currently in progress here: https://github.com/vllm-project/vllm/issues/2763