tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B

	---
	license: apache-2.0
	pipeline_tag: image-text-to-text
	---

	### TinyLLaVA

	We trained a TinyLLaVA model with 3.1B parameters using the same training settings as [TinyLLaVA](https://github.com/DLCV-BUAA/TinyLLaVABench). The language model is [Phi-2](https://huggingface.co/microsoft/phi-2) and the vision model is [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384). The connector between them is a 2-layer MLP. The model was trained on the [ShareGPT4V](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/docs/Data.md) dataset.
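
To make the role of the 2-layer MLP connector concrete, here is a minimal NumPy sketch of how image patch features could be projected into the language model's embedding space. It is an illustration only, not the repository's implementation: the class name `MLPConnector`, the GELU activation, the hidden width, the random weights, and the feature sizes (1152 for the SigLIP features, 2560 for Phi-2) are assumptions that do not come from this model card.

```python
import math
import numpy as np


def gelu(x):
    # tanh approximation of GELU; the activation choice is an assumption
    return 0.5 * x * (1.0 + np.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))


class MLPConnector:
    """Hypothetical 2-layer MLP: vision features -> language-model embeddings."""

    def __init__(self, vision_dim, lm_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights stand in for trained parameters.
        self.w1 = (rng.standard_normal((vision_dim, lm_dim)) * 0.02).astype(np.float32)
        self.b1 = np.zeros(lm_dim, dtype=np.float32)
        self.w2 = (rng.standard_normal((lm_dim, lm_dim)) * 0.02).astype(np.float32)
        self.b2 = np.zeros(lm_dim, dtype=np.float32)

    def __call__(self, feats):
        hidden = gelu(feats @ self.w1 + self.b1)  # layer 1 + activation
        return hidden @ self.w2 + self.b2         # layer 2


# A 384x384 image cut into 14x14 patches gives a 27x27 grid of patch features.
image_size, patch_size = 384, 14
num_patches = (image_size // patch_size) ** 2

vision_dim, lm_dim = 1152, 2560  # assumed feature sizes, see the note above
rng = np.random.default_rng(1)
feats = rng.standard_normal((num_patches, vision_dim)).astype(np.float32)

tokens = MLPConnector(vision_dim, lm_dim)(feats)
print(num_patches, tokens.shape)  # 729 (729, 2560)
```

Each row of `tokens` plays the role of one image token placed alongside the text embeddings before the language model runs.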

	### Usage
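	Run the following test code. It needs a CUDA-capable GPU, and `trust_remote_code=True` executes the model code shipped in the repository: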
	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	hf_path = 'tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B'
	# Load the model together with its custom code from the Hub, then move it to the GPU.
	model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True)
	model.cuda()

	# The tokenizer settings are read from the model config.
	config = model.config
	tokenizer = AutoTokenizer.from_pretrained(
	    hf_path,
	    use_fast=False,
	    model_max_length=config.tokenizer_model_max_length,
	    padding_side=config.tokenizer_padding_side,
	)

	prompt = "What are these?"
	image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
	# chat() returns the generated text and the generation time.
	output_text, generation_time = model.chat(prompt=prompt, image=image_url, model=model, tokenizer=tokenizer)

	print('model output: ', output_text)
	print('running time: ', generation_time)
	```
	### Results

	| Model | VQAv2 | GQA | SQA | TextVQA | MM-Vet | POPE | MME | MMMU |
	| :---: | --- | --- | --- | --- | --- | --- | --- | --- |
	| [bczhou/TinyLLaVA-3.1B](https://huggingface.co/bczhou/TinyLLaVA-3.1B) | 79.9 | 62.0 | 69.1 | 59.1 | 32.0 | 86.4 | 1464.9 | - |
	| [tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B](https://huggingface.co/tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B) | 80.1 | 62.1 | 73.0 | 60.3 | 37.5 | 87.2 | 1466.4 | 38.4 |
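
For a quick side-by-side reading of the table above, the per-benchmark differences can be computed with a short script. This is only a convenience sketch: the scores are copied from the table, while the helper name `score_deltas` is made up for illustration. A positive delta means this model scored higher than bczhou/TinyLLaVA-3.1B; MMMU has no baseline value, so no delta is reported for it.

```python
# Scores copied from the table above; None marks the missing MMMU baseline.
baseline = {"VQAv2": 79.9, "GQA": 62.0, "SQA": 69.1, "TextVQA": 59.1,
            "MM-Vet": 32.0, "POPE": 86.4, "MME": 1464.9, "MMMU": None}
this_model = {"VQAv2": 80.1, "GQA": 62.1, "SQA": 73.0, "TextVQA": 60.3,
              "MM-Vet": 37.5, "POPE": 87.2, "MME": 1466.4, "MMMU": 38.4}


def score_deltas(new, old):
    """Return new - old per benchmark, rounded to one decimal; None if old is missing."""
    deltas = {}
    for name, new_score in new.items():
        old_score = old.get(name)
        deltas[name] = None if old_score is None else round(new_score - old_score, 1)
    return deltas


deltas = score_deltas(this_model, baseline)
for name, delta in deltas.items():
    text = "n/a" if delta is None else f"{delta:+.1f}"
    print(f"{name:>8}: {text}")
```

Note that MME is reported on a different scale from the percentage-style benchmarks, so its delta should not be compared directly with the others.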