Update hardcoded filenames

#1
by Wauplin (HF staff) - opened

Related to https://github.com/mistralai/mistral-inference/pull/191 and discussion on Slack.
This would allow something like this:

```python
from mamba_ssm import MambaLMHeadModel

model = MambaLMHeadModel.from_pretrained("mistralai/mamba-codestral-7B-v0.1")
```

to work out of the box.

It would also enable the download counter on the model page (which currently shows "Downloads are not tracked for this model.").

Wauplin changed pull request status to open

This would also help keep convert_hf_to_gguf.py (from llama.cpp) simple for this model. That script currently assumes all relevant safetensors files match the glob model*.safetensors (i.e. a model prefix and a .safetensors suffix, which also allows multi-part models). Renaming the tokenizer from tokenizer.model.v3 to tokenizer.model would also help, assuming it's a SentencePiece tokenizer (if it's not, never mind).
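To illustrate that glob, here is a quick sketch of which filenames the convert script would pick up; the non-matching filename below is a hypothetical example, not necessarily the actual name in this repo:

```python
import fnmatch

# Candidate filenames; "consolidated.safetensors" is a hypothetical example
# of a hardcoded name that would NOT match the convert script's glob.
files = [
    "model.safetensors",
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "consolidated.safetensors",
]

# convert_hf_to_gguf.py expects weight files matching this pattern,
# which covers both single-file and multi-part checkpoints.
matches = [f for f in files if fnmatch.fnmatch(f, "model*.safetensors")]
print(matches)
```

Only the first three names match, so anything outside that convention needs special-casing in the script.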

config.json should ideally contain an architectures list, like "architectures": [ "Mamba2ForCausalLM" ], or something like that, at least to let the convert script know that this is a Mamba2 model (all model architectures supported by convert_hf_to_gguf.py are identified with the architectures list from config.json).
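For example, the addition to config.json could look like the fragment below (other fields elided; the exact architecture name is a suggestion, not confirmed by the repo):

```json
{
  "architectures": [
    "Mamba2ForCausalLM"
  ]
}
```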

Cannot merge
This branch has merge conflicts in the following files:
  • config.json
