weizhiwang committed • 35089c0
Parent(s): 3bd23e6
Create README.md

README.md ADDED
---
inference: false
datasets:
- liuhaotian/LLaVA-CC3M-Pretrain-595K
---

# llava-v1.5-llama-3-8b-pretrain Model Card

This is a pretrained checkpoint containing the MLP connector after LLaVA stage-1 pretraining; you can use it to instruction-tune your own multimodal models.
Please follow my reproduced implementation [LLaVA-Llama-3](https://github.com/Victorwz/LLaVA-Llama-3/) for more details on fine-tuning the LLaVA model with Llama-3 as the foundation LLM.
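
If you only want to pull the connector weights and inspect or reuse them, here is a minimal sketch. It assumes the checkpoint follows the standard LLaVA layout, where the stage-1 connector is saved as `mm_projector.bin`; the repo id and file name below are assumptions, so check this repository's file list.

```python
# Minimal sketch: download the stage-1 projector weights and inspect them.
# Assumes the usual LLaVA checkpoint layout (mm_projector.bin); the actual
# file name in this repo may differ.
import torch
from huggingface_hub import hf_hub_download

projector_path = hf_hub_download(
    repo_id="weizhiwang/llava-v1.5-llama-3-8b-pretrain",  # assumed repo id
    filename="mm_projector.bin",                           # assumed file name
)

# The file is a plain state_dict of the vision-language MLP connector.
state_dict = torch.load(projector_path, map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```

In the LLaVA training scripts, a stage-1 projector like this is typically handed to the instruction-tuning stage via the `--pretrain_mm_mlp_adapter` argument; see the fine-tuning scripts in LLaVA-Llama-3 for the exact invocation.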

## Training dataset
- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.

## Architecture
- LLM: llama-3-8b (Frozen)
- Vision-Language Adapter: MLP (sketched below)
- Vision Encoder: CLIP-ViT-L-336px (Frozen)
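
For reference, a minimal PyTorch sketch of the vision-language adapter, assuming the standard LLaVA-1.5 `mlp2x_gelu` design (two linear layers with a GELU in between) and the usual hidden sizes (1024 for CLIP-ViT-L-336px, 4096 for Llama-3-8B); the exact configuration of this checkpoint may differ.

```python
# Sketch of the MLP vision-language adapter, assuming the LLaVA-1.5
# "mlp2x_gelu" layout: Linear -> GELU -> Linear, mapping CLIP patch features
# (assumed hidden size 1024) into the LLM embedding space (assumed 4096).
import torch
import torch.nn as nn

class VisionLanguageProjector(nn.Module):
    def __init__(self, vision_hidden_size: int = 1024, llm_hidden_size: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_hidden_size, llm_hidden_size),
            nn.GELU(),
            nn.Linear(llm_hidden_size, llm_hidden_size),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_hidden_size) from the frozen CLIP encoder
        return self.proj(image_features)

# Example: project a dummy batch of 576 patch tokens (24x24 grid for a 336px image).
dummy = torch.randn(1, 576, 1024)
print(VisionLanguageProjector()(dummy).shape)  # torch.Size([1, 576, 4096])
```

During stage-1 pretraining only this adapter is updated, while the CLIP encoder and the LLM stay frozen.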