model type llava_mistral is unrecognised

#1
by shshwtv - opened

ValueError: The checkpoint you are trying to load has model type llava_mistral but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

How to add this model type in Transformers?

Microsoft org

Hi - please check our repo (https://github.com/microsoft/LLaVA-Med?tab=readme-ov-file#contents) for the use of LLaVA-Med v1.5.

Hi, thanks for your response. I want to fine-tune your base model on my own data. Could you let me know whether this will be possible in the near future? Thank you.

Hi, did you solve the problem? I ran into the same issue where Transformers does not recognize this architecture. I downloaded the model files and run them offline.

No, it's not possible. We switched to the original LLaVA model.

Hello. I want to fine-tune LLaVA-Med on my own dataset. Is it possible? Did you find a solution?

I can load it successfully; the steps are as follows:

  1. clone the repository from https://github.com/microsoft/LLaVA-Med and create a virtual environment
  2. download the parameters from this repository
  3. use the following code to load the model:

from llava.model.builder import load_pretrained_model

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path='<path_to_downloaded_repository(this)>',
    model_base=None,
    model_name='llava-med-v1.5-mistral-7b',
)

Then I can use this model like any other model in the Hugging Face Transformers library.
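For example, a minimal inference sketch along the lines of the repo's llava/eval scripts looks like this. The conversation mode "mistral_instruct", the helper imports, and the image path are assumptions on my side and may differ between repo versions:

import torch
from PIL import Image
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates
from llava.mm_utils import tokenizer_image_token

# build the prompt: the image placeholder token followed by the question
question = DEFAULT_IMAGE_TOKEN + "\nWhat abnormality is visible in this image?"
conv = conv_templates["mistral_instruct"].copy()  # assumed conv mode for the Mistral base
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

# preprocess a local image (hypothetical path) and tokenize the prompt
image = Image.open("example.jpg").convert("RGB")
image_tensor = image_processor.preprocess(image, return_tensors="pt")["pixel_values"][0]
input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).cuda()

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor.unsqueeze(0).half().cuda(),
        do_sample=False,
        max_new_tokens=256,
    )
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip())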

Thanks for the update @mizukiQ

Also, I wanted to ask: what's the maximum image resolution that can be used? ViT-L/14 supports 224 x 224.
And what are the various strategies for handling CT/MR images?

I am also willing to join a Discord or Zoom call to catch up and exchange notes with other builders in the space.

Best,
Shash

Thank you!
I also want to ask how I should prepare my dataset. I have images and captions. How should I convert them for fine-tuning on LLaVA-Med?
Is there any tutorial?

I am not one of the official LLaVA-Med researchers, but here is some configuration from their code:
LLaVA-Med uses CLIPImageProcessor to handle images; its crop_size is (336, 336), and each image is split into a (24, 24) grid of patches, where each patch is 14 x 14.
LLaVA-Med handles images from different modalities (CT or MR) in the same way.
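If you want to double-check the preprocessing numbers, the processor config can be inspected directly. This is just a sketch, assuming the vision tower is the openai/clip-vit-large-patch14-336 checkpoint; in older Transformers versions crop_size is an int rather than a dict:

from transformers import CLIPImageProcessor

processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
print(processor.crop_size)  # {'height': 336, 'width': 336}
# 336 / 14 = 24, so the vision tower sees a 24 x 24 grid of 14 x 14 patches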

I published some data loading code here (the code is from my reproduction of LLaVA-Med, so it may not be good enough).
You can replace the SlakeDataset class with your own dataset class (since image captioning and VQA have a similar form, I + Q -> T); just keep the two interfaces the same and fine-tune the model.
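For reference, here is a rough sketch of turning image-caption pairs into the LLaVA-style conversation JSON that the fine-tuning scripts consume. The field names follow the LLaVA training data format; the file names, caption, and question text are only placeholders:

import json

# hypothetical image-caption pairs
pairs = [{"image": "xray_001.png", "caption": "No acute cardiopulmonary abnormality."}]

records = []
for i, p in enumerate(pairs):
    records.append({
        "id": str(i),
        "image": p["image"],
        "conversations": [
            {"from": "human", "value": "<image>\nDescribe the findings in this image."},
            {"from": "gpt", "value": p["caption"]},
        ],
    })

with open("train_llava_format.json", "w") as f:
    json.dump(records, f, indent=2)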

I actually want to convert my image-text pair dataset into a QA dataset like in the paper.
So I need to generate questions and answers from my texts first.
Maybe after that I can use the code in the repo without changes.
Do you have any recommendation for that? The "meta-llama/Llama-3-8b-chat-hf" and "meta-llama/Meta-Llama-3-8B-Instruct" models seem suitable for this task.
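Something like the following is what I have in mind. It is only a rough sketch using the chat-style input that recent Transformers text-generation pipelines accept; the prompt wording and output handling are illustrative, not a tested recipe:

import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

caption = "Chest X-ray showing mild cardiomegaly with clear lung fields."  # hypothetical caption
messages = [
    {"role": "system", "content": "You write visual question-answer pairs for medical images."},
    {"role": "user", "content": f"Caption: {caption}\nWrite one question a clinician might ask about this image and its answer, formatted as 'Q: ...' and 'A: ...'."},
]
out = generator(messages, max_new_tokens=128, do_sample=False)
print(out[0]["generated_text"][-1]["content"])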

Hey, awesome solution!
I have a bug where my path does not appear to have a file named config.json.
Any help?
Thanks

I have a bug where my path does not appear to have a file named config.json.
Any help? Did you solve this error?

Can anyone send the updated code? I am not able to load the model now.

@marinasam @satheeshkola532

I am confused about your problem. This repository (https://huggingface.co/microsoft/llava-med-v1.5-mistral-7b) does contain config.json.
How about re-cloning this repository and checking file integrity?
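If a plain git clone keeps missing files (for example because git-lfs is not installed), downloading the full snapshot with huggingface_hub should also work, as a sketch:

from huggingface_hub import snapshot_download

# downloads config.json and all weight shards; pass the returned path as model_path
local_dir = snapshot_download("microsoft/llava-med-v1.5-mistral-7b")
print(local_dir)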

Easiest way to load

git clone https://github.com/microsoft/LLaVA-Med

Then, in this directory, run this:

from llava.model.builder import load_pretrained_model

model_path = 'microsoft/llava-med-v1.5-mistral-7b'
model_base = None
model_name = 'llava-med-v1.5-mistral-7b'

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, model_base, model_name,
    load_8bit=False, load_4bit=False, device="cuda"
)

For quantized loading, though, this doesn't work and fails with strange errors such as llava_mistral being unrecognized.
This can be bypassed by:

import torch  # needed for bnb_4bit_compute_dtype
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from llava.model import LlavaMistralForCausalLM

model_path = "microsoft/llava-med-v1.5-mistral-7b"
kwargs = {"device_map": "auto"}
kwargs['load_in_4bit'] = True
kwargs['quantization_config'] = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type='nf4'
)
# model = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
model = LlavaMistralForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

*AutoModelForCausalLM did work for me sometimes, but I can't seem to reproduce that now.

Whenever I try to follow the previously mentioned instructions, I always get "Some weights of the model checkpoint at llava-med-v1.5-mistral-7b were not used when initializing LlavaMistralForCausalLM: ['model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight'...". My understanding is that the pretrained vision tower from llava-med-v1.5-mistral-7b is not being used and that, instead, another pretrained vision tower from the LLaVA-Med repository is being used. Can anyone clarify?

Before loading the model, could someone please explain how to download the parameters from the repository? I don't understand this step.

It is easier to follow this .py file: LLaVA-Med/llava/eval/model_vqa.py

It would be cool if someone converted the weights to the Transformers-native model class, LlavaForConditionalGeneration.

Here's the conversion script: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/convert_llava_weights_to_hf.py.
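After such a conversion, loading would then be as simple as the sketch below; the "your-username/llava-med-v1.5-mistral-7b-hf" repo id is hypothetical:

from transformers import AutoProcessor, LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained("your-username/llava-med-v1.5-mistral-7b-hf")
processor = AutoProcessor.from_pretrained("your-username/llava-med-v1.5-mistral-7b-hf")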

Hi, could you give me sample code that takes an image and a question and generates an answer directly? My code keeps raising errors and fails to generate anything.
