Update README.md
README.md CHANGED
@@ -18,31 +18,31 @@ model-index:
   metrics:
   - type: GQA
     name: GQA
-    value: 0.
+    value: 0.531
   - type: MME Cog.
     name: MME Cog.
-    value:
+    value: 236
   - type: MME Per.
     name: MME Per.
-    value:
+    value: 1130
   - type: MM-Vet
     name: MM-Vet
-    value:
+    value: 17.7
   - type: POPE Acc.
     name: POPE Acc.
-    value: 0.
+    value: 0.850
   - type: POPE F1
     name: POPE F1
     value: 0.839
   - type: VQAv2
     name: VQAv2
-    value:
+    value: 70.7
   - type: MMVP
     name: MMVP
-    value:
+    value: 0.287
   - type: ScienceQA Image
     name: ScienceQA Image
-    value: 0.
+    value: 0.564
 library_name: transformers
 pipeline_tag: image-text-to-text
 ---
@@ -165,8 +165,8 @@ Performance of LLaVA-Gemma models across seven benchmarks. Highlighted box indic
 
 | LM Backbone | Vision Model | Pretrained Connector | GQA | MME cognition | MME perception | MM-Vet | POPE accuracy | POPE F1 | VQAv2 | ScienceQA Image | MMVP |
 | ----------- | ------------ | -------------------- | ----- | ------------- | -------------- | ------ | ------------- | ------- | ----- | --------------- | ----- |
-| gemma-2b-it | CLIP | Yes | 0.531 | 236 | 1130 | 17.7 | 0.850 |<mark>0.839</mark>| 70.65 | 0.564 | 0.287 |
-|
+| **gemma-2b-it** | CLIP | Yes | 0.531 | 236 | 1130 | 17.7 | 0.850 |<mark>0.839</mark>| 70.65 | 0.564 | 0.287 |
+| gemma-2b-it | CLIP | No | 0.481 | 248 | 935 | 13.1 | 0.784 | 0.762 | 61.74 | 0.549 | 0.180 |
 | gemma-2b-it | DinoV2 | Yes |<mark>0.587</mark>| 307| <mark>1133</mark> |<mark>19.1</mark>| <mark>0.853</mark> | 0.838 |<mark>71.37</mark>| 0.555 | 0.227 |
 | gemma-2b-it | DinoV2 | No | 0.501 | <mark>309</mark>| 959 | 14.5 | 0.793 | 0.772 | 61.65 | 0.568 | 0.180 |
 | | | | | | | | | | | | |
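
The metric values this commit adds to the model-index block are taken from the gemma-2b-it + CLIP + pretrained-connector row of the benchmark table (with VQAv2 rounded from 70.65 to 70.7). A minimal consistency-check sketch, with both dicts transcribed by hand from the diff rather than parsed from the file:

```python
# Values added to the model-index metadata in this commit
# (transcribed from the "+" side of the YAML hunk).
yaml_metrics = {
    "GQA": 0.531,
    "MME Cog.": 236,
    "MME Per.": 1130,
    "MM-Vet": 17.7,
    "POPE Acc.": 0.850,
    "POPE F1": 0.839,
    "VQAv2": 70.7,  # the table reports 70.65; the YAML rounds to one decimal
    "MMVP": 0.287,
    "ScienceQA Image": 0.564,
}

# The matching table row: gemma-2b-it backbone, CLIP vision model,
# pretrained connector = Yes.
table_row = {
    "GQA": 0.531,
    "MME Cog.": 236,
    "MME Per.": 1130,
    "MM-Vet": 17.7,
    "POPE Acc.": 0.850,
    "POPE F1": 0.839,
    "VQAv2": 70.65,
    "MMVP": 0.287,
    "ScienceQA Image": 0.564,
}

# Every metadata value should agree with the table to within rounding.
mismatches = {
    name: (yaml_metrics[name], table_row[name])
    for name in yaml_metrics
    if abs(yaml_metrics[name] - table_row[name]) > 0.05
}
assert not mismatches, f"metadata disagrees with table: {mismatches}"
```

This kind of check is only a transcription aid; the authoritative numbers remain the table in the README body.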