chenlin committed
Commit 6f8da97
1 Parent(s): 3a82f72
Files changed (2)
  1. README.md +38 -1
  2. config.json +1 -1
README.md CHANGED
@@ -1,3 +1,40 @@
  ---
- license: apache-2.0
+ inference: false
  ---
+ <br>
+ <br>
+
+ # ShareGPT4V-13B Model Card
+
+ ## Model details
+
+ **Model type:**
+ ShareGPT4V-13B is an open-source chatbot trained by fine-tuning the CLIP vision tower and LLaMA/Vicuna on GPT4-Vision-assisted [ShareGPT4V](https://huggingface.co/datasets/Lin-Chen/ShareGPT4V) data and LLaVA instruction-tuning data.
+
+ **Model date:**
+ ShareGPT4V-13B was trained in Nov 2023.
+
+ **Paper or resources for more information:**
+ [[Project](https://ShareGPT4V.github.io/)] [[Paper](https://huggingface.co/papers/2311.12793)] [[Code](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V)]
+
+ ## Usage
+ You can use this model directly, as described in our [[repository](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V)]. Alternatively, change the architecture name from "Share4VLlamaForCausalLM" to "LLaVALlamaForCausalLM" and the model_type keyword from "share4v" to "llava" in our config file to load the model seamlessly in the [[LLaVA repository](https://github.com/haotian-liu/LLaVA)] (a sketch of this edit appears after this diff).
+
+ ## License
+ Llama 2 is licensed under the LLAMA 2 Community License,
+ Copyright (c) Meta Platforms, Inc. All Rights Reserved.
+
+ ## Intended use
+ **Primary intended uses:**
+ The primary use of ShareGPT4V-13B is research on large multimodal models and chatbots.
+
+ **Primary intended users:**
+ The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
+
+ ## Training dataset
+ - 1.2M high-quality image-text pairs, i.e., ShareGPT4V-PT data
+ - 100K GPT4-Vision-generated image-text pairs
+ - LLaVA instruction-tuning data
+
+ ## Evaluation dataset
+ A collection of 11 benchmarks
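
The config edit described in the Usage section above can be scripted. Below is a minimal sketch, not the authors' tooling: it assumes a local copy of this checkpoint at a hypothetical `./ShareGPT4V-13B` path whose `config.json` matches the layout shown in this commit, and the replacement names are copied from the Usage text (check their exact capitalization against the LLaVA codebase you run).

```python
import json
from pathlib import Path

# Hypothetical local path to a downloaded copy of this checkpoint.
config_path = Path("./ShareGPT4V-13B/config.json")

config = json.loads(config_path.read_text())

# Rename the architecture and model_type as described in the Usage section,
# so the LLaVA repository recognizes the checkpoint.
config["architectures"] = ["LLaVALlamaForCausalLM"]
config["model_type"] = "llava"

config_path.write_text(json.dumps(config, indent=2))
```

With those two fields changed, the checkpoint directory should load through LLaVA's usual model-loading path instead of the ShareGPT4V code.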
config.json CHANGED
@@ -21,7 +21,7 @@
    "mm_use_im_start_end": false,
    "mm_vision_select_feature": "patch",
    "mm_vision_select_layer": -2,
-   "mm_vision_tower": "pretrained/vision_encoder/ShareGPT4V-13B_Pretrained_vit-large336-l12",
+   "mm_vision_tower": "Lin-Chen/ShareGPT4V-13B_Pretrained_vit-large336-l12",
    "model_type": "share4v",
    "num_attention_heads": 40,
    "num_hidden_layers": 40,