Will there be quantized versions? (GGUF)
Can this model be quantized and converted to the GGUF format for use with llama.cpp?
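For reference, if llama.cpp supported the architecture, the usual convert-then-quantize flow would look roughly like the sketch below. It is driven from Python only for illustration; the script name, the quantize binary name, the quantization type string, and all paths are assumptions that vary across llama.cpp versions.

```python
# Hedged sketch of the standard llama.cpp GGUF workflow.
# All paths and names here are assumptions, not a tested recipe.
import subprocess

MODEL_DIR = "OLMo-7B"             # hypothetical local HF checkpoint
F16_GGUF = "olmo-7b-f16.gguf"     # full-precision intermediate GGUF
Q4_GGUF = "olmo-7b-q4_k_m.gguf"   # quantized output

# 1. Convert the HF checkpoint to GGUF (only works once the arch is supported).
subprocess.run(
    ["python", "convert-hf-to-gguf.py", MODEL_DIR, "--outfile", F16_GGUF],
    check=True,
)

# 2. Quantize with the binary built from the llama.cpp repo
#    (named `quantize` or `llama-quantize` depending on the version).
subprocess.run(["./quantize", F16_GGUF, Q4_GGUF, "q4_k_m"], check=True)
```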
Did you check Visual Studio for an AI extension to convert it?
@TheBloke
please, thanks!
We'll work on more code integrations; if anything specific is wrong, let us know.
I have been trying to hack it into working this morning. I added the new arch "OlmoModelForCausalLM", but I'm not sure whether there is an existing compatible one like MODEL_ARCH.LLAMA.
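Roughly, the change I tried looks like the sketch below, written against the registration pattern in llama.cpp's convert-hf-to-gguf.py (where `Model` and `gguf` are already in scope). The class body and the choice of MODEL_ARCH.LLAMA are guesses, not a working port:

```python
# Sketch only: register OLMo's HF class with the converter and reuse the
# LLaMA graph as a first guess at a compatible architecture.
@Model.register("OlmoModelForCausalLM")
class OlmoModel(Model):
    model_arch = gguf.MODEL_ARCH.LLAMA  # assumption; OLMo may need its own arch
```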
Likely as a result, I am running into deeper model issues with llama.cpp. For example:
```
Loading model: OLMo-7B
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
The repository for /backup_disks/OLMo-7B contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//backup_disks/OLMo-7B.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] y
gguf: Adding 50009 merge(s).
gguf: Setting special token type eos to 50279
gguf: Setting special token type pad to 1
Exporting model to 'olmo.gguf'
gguf: loading model part 'pytorch_model.bin'
Can not map tensor 'model.transformer.wte.weight'
```
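That last "Can not map tensor" error suggests the converter's tensor-name table simply doesn't know OLMo's names yet. Below is a hedged sketch of the kind of entry the token-embedding mapping would need, written in the style of gguf-py/gguf/tensor_mapping.py; the surrounding entries are abbreviated from memory, and only the OLMo line comes from the error above:

```python
# Sketch: teach the token-embedding mapping about OLMo's tensor name.
# Every other OLMo tensor would need a similar entry.
mappings_cfg = {
    MODEL_TENSOR.TOKEN_EMBD: (
        "model.embed_tokens",     # llama-style
        "transformer.wte",        # gpt2-style
        "model.transformer.wte",  # olmo (assumed, from the error above)
    ),
    # ...
}
```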
Transformers needs to be updated to the latest version from GitHub, but ai2-olmo seems to need a version of torch that is hard to resolve. I will give it one last attempt with torch-2.3.0a0+git52b679d. But I fear a proper arch needs to be added to llama.cpp, and all my attempts have been to no avail. In that regard, I think I am simply trying to use llama.cpp's convert_hf_to_gguf.py too early at this point.
Interested to see if anyone can get GGUF working for the OLMo arch.
Any news about the GGUF versions? Could someone finally make them?
I tried, but I couldn't add this architecture to llama.cpp and make the required changes.
I hope they add this feature to llama.cpp soon.
Awesome, thanks @eleius! Shall we do it, or has it been done already?