Inference on finetuned Mamba model
by Kartik305
Using the [draft script](https://huggingface.co/docs/transformers/main/en/model_doc/mamba2) shared here, I have finetuned the mamba-codestral 7B model on custom data.
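For context, the save step was just the standard HF call (a minimal sketch; `<path_to_model>` is a placeholder for my output directory):

```python
# Sketch of the save step after full finetuning; "<path_to_model>" is a
# placeholder for the output directory used throughout this post.
model.save_pretrained("<path_to_model>")
tokenizer.save_pretrained("<path_to_model>")
```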
After saving the model using HF's `save_pretrained` method, I am unable to use the `generate_mamba` inference method due to the following error:
```python
mamba_output = generate([tokens], model=Mamba.from_folder("<path_to_model>"), max_tokens=200, temperature=0.1)
```

```
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[12], line 1
----> 1 mamba_output = generate([tokens], model=Mamba.from_folder(model_id), max_tokens=200, temperature=0.1)

File ~/.local/lib/python3.10/site-packages/mistral_inference/mamba.py:71, in Mamba.from_folder(folder, max_batch_size, num_pipeline_ranks, device, dtype)
     63 @staticmethod
     64 def from_folder(
     65     folder: Union[Path, str],
    (...)
     69     dtype: Optional[torch.dtype] = None,
     70 ) -> "Mamba":
---> 71     with open(Path(folder) / "params.json", "r") as f:
     72         model_args = MambaArgs.from_dict(json.load(f))
     74     with torch.device("meta"):

FileNotFoundError: [Errno 2] No such file or directory: '<path_to_model>/params.json'
```
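As far as I can tell, `save_pretrained` writes the HF-style layout rather than the files `Mamba.from_folder` looks for. A quick way to confirm what actually got written (a sketch; the path is the same placeholder as above):

```python
import os

# List what save_pretrained wrote to the output directory.
print(sorted(os.listdir("<path_to_model>")))
# Typically this shows HF-style files: config.json, generation_config.json,
# model.safetensors (possibly sharded), plus tokenizer files -- but no
# params.json and no consolidated.safetensors, which are the two files
# that mistral_inference's Mamba.from_folder requires.
```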
If I copy the `params.json` from the base model to the finetuned model directory, I get another error like so:
```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[24], line 1
----> 1 mamba_output = generate([tokens], model=Mamba.from_folder("<path_to_model>"), max_tokens=200, temperature=0.1)

File ~/.local/lib/python3.10/site-packages/mistral_inference/mamba.py:79, in Mamba.from_folder(folder, max_batch_size, num_pipeline_ranks, device, dtype)
     75 model = Mamba(model_args)
     77 model_file = Path(folder) / "consolidated.safetensors"
---> 79 assert model_file.exists(), f"Make sure {model_file} exists."
     80 loaded = safetensors.torch.load_file(str(model_file))
     82 model.load_state_dict(loaded, assign=True, strict=True)

AssertionError: Make sure /<path_to_model>/consolidated.safetensors exists.
```
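I suspect that even merging the HF shards into a single `consolidated.safetensors` would not be enough, since the strict `load_state_dict` in `from_folder` will fail if the HF key names differ from what mistral-inference expects. For completeness, a hypothetical consolidation attempt (assumes safetensors shards named `model*.safetensors`):

```python
import glob

import safetensors.torch

# Hypothetical: merge HF safetensors shards into the single file that
# Mamba.from_folder expects. Even if this passes the exists() assert,
# load_state_dict(strict=True) may still fail on mismatched key names.
state = {}
for shard in sorted(glob.glob("<path_to_model>/model*.safetensors")):
    state.update(safetensors.torch.load_file(shard))
safetensors.torch.save_file(state, "<path_to_model>/consolidated.safetensors")
```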
Having followed the draft script, is there a way to load the trained model for inference, either with mistral-inference or with transformers? The only difference from the script is that I did full finetuning of the model instead of PEFT.
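For reference, this is the transformers-side loading I would expect to work with the `save_pretrained` output (a minimal sketch, assuming the `Mamba2ForCausalLM` class from the linked doc page; the prompt is just an example):

```python
import torch
from transformers import AutoTokenizer, Mamba2ForCausalLM

# Load the finetuned checkpoint in HF format; "<path_to_model>" is the
# placeholder directory used above.
tokenizer = AutoTokenizer.from_pretrained("<path_to_model>")
model = Mamba2ForCausalLM.from_pretrained("<path_to_model>", torch_dtype=torch.bfloat16)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.1)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If transformers is the intended inference path here, I can stick with it, but I would still like to know whether the mistral-inference route is supported for a fully finetuned checkpoint.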