Will there be quantized versions? (GGUF)
Can this model be quantized and converted to the GGUF format for use with llama.cpp?
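For reference, if llama.cpp supported the architecture, the usual convert-then-quantize flow would look roughly like the sketch below. It is driven from Python only for illustration; the script name, the quantize binary name, the quantization type string, and all paths are assumptions that vary across llama.cpp versions.

```python
# Hedged sketch of the standard llama.cpp GGUF workflow.
# All paths and names here are assumptions, not a tested recipe.
import subprocess

MODEL_DIR = "OLMo-7B"             # hypothetical local HF checkpoint
F16_GGUF = "olmo-7b-f16.gguf"     # full-precision intermediate GGUF
Q4_GGUF = "olmo-7b-q4_k_m.gguf"   # quantized output

# 1. Convert the HF checkpoint to GGUF (only works once the arch is supported).
subprocess.run(
    ["python", "convert-hf-to-gguf.py", MODEL_DIR, "--outfile", F16_GGUF],
    check=True,
)

# 2. Quantize with the binary built from the llama.cpp repo
#    (named `quantize` or `llama-quantize` depending on the version).
subprocess.run(["./quantize", F16_GGUF, Q4_GGUF, "q4_k_m"], check=True)
```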
Did you check Visual Studio for an AI extension to convert it?
@TheBloke
please, thanks!
We'll work on more code integrations; if anything specific is wrong, let us know.
I have been trying to hack it into working this morning. I added the new arch "OlmoModelForCausalLM", but I'm not sure whether there is an existing compatible one like MODEL_ARCH.LLAMA.
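Roughly, the change I tried looks like the sketch below, written against the registration pattern in llama.cpp's convert-hf-to-gguf.py (where `Model` and `gguf` are already in scope). The class body and the choice of MODEL_ARCH.LLAMA are guesses, not a working port:

```python
# Sketch only: register OLMo's HF class with the converter and reuse the
# LLaMA graph as a first guess at a compatible architecture.
@Model.register("OlmoModelForCausalLM")
class OlmoModel(Model):
    model_arch = gguf.MODEL_ARCH.LLAMA  # assumption; OLMo may need its own arch
```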
Likely as a result, I am running into deeper model issues with llama.cpp. For example:
```
Loading model: OLMo-7B
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
The repository for /backup_disks/OLMo-7B contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//backup_disks/OLMo-7B.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] y
gguf: Adding 50009 merge(s).
gguf: Setting special token type eos to 50279
gguf: Setting special token type pad to 1
Exporting model to 'olmo.gguf'
gguf: loading model part 'pytorch_model.bin'
Can not map tensor 'model.transformer.wte.weight'
```
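That last "Can not map tensor" error suggests the converter's tensor-name table simply doesn't know OLMo's names yet. Below is a hedged sketch of the kind of entry the token-embedding mapping would need, written in the style of gguf-py/gguf/tensor_mapping.py; the surrounding entries are abbreviated from memory, and only the OLMo line comes from the error above:

```python
# Sketch: teach the token-embedding mapping about OLMo's tensor name.
# Every other OLMo tensor would need a similar entry.
mappings_cfg = {
    MODEL_TENSOR.TOKEN_EMBD: (
        "model.embed_tokens",     # llama-style
        "transformer.wte",        # gpt2-style
        "model.transformer.wte",  # olmo (assumed, from the error above)
    ),
    # ...
}
```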
Transformers needs to be updated to the latest version from GitHub, but ai2-olmo seems to need a version of torch that is hard to resolve. I will give it one last attempt with torch-2.3.0a0+git52b679d. But I fear a proper arch needs to be added to llama.cpp, and all my attempts have been to no avail. In that regard, I think I am simply trying to use llama.cpp's convert_hf_to_gguf.py too early at this point.
Interested to see if anyone can get GGUF working for the OLMo arch.
Any news about the GGUF versions? Could someone finally make them?
I tried, but I couldn't add this architecture to llama.cpp and make the required changes.
I hope they add this feature to llama.cpp soon.
Awesome, thanks @eleius! Shall we do it, or has it been done already?