# Model Card for LLaVa-Phi-2-3B-GGUF

## Model Details

### Model Description
Quantized version of llava-phi-2-3b. Quantization was performed with llama.cpp; a sketch of what that step looks like is shown after the list below.
- Developed by: LAION, SkunkworksAI & Ontocord
- Model type: LLaVA is an open-source chatbot trained by fine-tuning Phi-2 on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture.
- Finetuned from model: Phi-2
- License: MIT
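
For reference, a minimal sketch of a llama.cpp quantization step. The binary name (`quantize` in older llama.cpp releases, `llama-quantize` in newer ones), the file names, and the `Q4_K_M` type are assumptions for illustration, not necessarily the exact settings used for this release:

```bash
# Illustrative only: build llama.cpp and quantize an F16 GGUF to a 4-bit variant.
# File names and the Q4_K_M quantization type are assumptions.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
./quantize ../ggml-model-f16.gguf ../ggml-model-q4_k_m.gguf Q4_K_M
```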
### Model Sources

## Usage
make && ./llava-cli -m ../ggml-model-f16.gguf --mmproj ../mmproj-model-f16.gguf --image /path/to/image.jpg
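
The same binary also accepts a prompt and sampling options, and works with a quantized model file. The quantized file name and the `-p`/`--temp` values below are illustrative, and exact flags can vary between llama.cpp versions:

```bash
# Describe an image with a quantized model file (file name is illustrative).
./llava-cli -m ../ggml-model-q4_k_m.gguf \
    --mmproj ../mmproj-model-f16.gguf \
    --image /path/to/image.jpg \
    -p "Describe this image in detail." \
    --temp 0.1
```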
## Evaluation

### Benchmarks
| Model | Parameters | SQA | GQA | TextVQA | POPE |
|---|---|---|---|---|---|
| LLaVA-1.5 | 7.3B | 68.0 | 62.0 | 58.3 | 85.3 |
| MC-LLaVA-3B | 3B | - | 49.6 | 38.59 | - |
| LLaVA-Phi | 3B | 68.4 | - | 48.6 | 85.0 |
| moondream1 | 1.6B | - | 56.3 | 39.8 | - |
| llava-phi-2-3b | 3B | 69.0 | 51.2 | 47.0 | 86.0 |
### Image Captioning (MS COCO)
| Model | BLEU_1 | BLEU_2 | BLEU_3 | BLEU_4 | METEOR | ROUGE_L | CIDEr | SPICE |
|---|---|---|---|---|---|---|---|---|
| llava-1.5-7b | 75.8 | 59.8 | 45 | 33.3 | 29.4 | 57.7 | 108.8 | 23.5 |
| llava-phi-2-3b | 67.7 | 50.5 | 35.7 | 24.2 | 27.0 | 52.4 | 85.0 | 20.7 |