weizhiwang committed • 35089c0
Parent(s): 3bd23e6
Create README.md

README.md ADDED
---
inference: false
datasets:
- liuhaotian/LLaVA-CC3M-Pretrain-595K
---

# llava-v1.5-llama-3-8b-pretrain Model Card

This is a pretrained checkpoint containing the MLP connector after LLaVA stage-1 pretraining; you can use it to instruction-tune your own multimodal models.
Please follow my reproduced implementation [LLaVA-Llama-3](https://github.com/Victorwz/LLaVA-Llama-3/) for more details on fine-tuning the LLaVA model with Llama-3 as the foundation LLM.
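
If you only want to pull the connector weights and inspect or reuse them, here is a minimal sketch. It assumes the checkpoint follows the standard LLaVA layout, where the stage-1 connector is saved as `mm_projector.bin`; the repo id and file name below are assumptions, so check this repository's file list.

```python
# Minimal sketch: download the stage-1 projector weights and inspect them.
# Assumes the usual LLaVA checkpoint layout (mm_projector.bin); the actual
# file name in this repo may differ.
import torch
from huggingface_hub import hf_hub_download

projector_path = hf_hub_download(
    repo_id="weizhiwang/llava-v1.5-llama-3-8b-pretrain",  # assumed repo id
    filename="mm_projector.bin",                           # assumed file name
)

# The file is a plain state_dict of the vision-language MLP connector.
state_dict = torch.load(projector_path, map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```

In the LLaVA training scripts, a stage-1 projector like this is typically handed to the instruction-tuning stage via the `--pretrain_mm_mlp_adapter` argument; see the fine-tuning scripts in LLaVA-Llama-3 for the exact invocation.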

## Training dataset
- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.

## Architecture
- LLM: llama-3-8b (Frozen)
- Vision-Language Adapter: MLP (sketched below)
- Vision Encoder: CLIP-ViT-L-336px (Frozen)
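
For reference, a minimal PyTorch sketch of the vision-language adapter, assuming the standard LLaVA-1.5 `mlp2x_gelu` design (two linear layers with a GELU in between) and the usual hidden sizes (1024 for CLIP-ViT-L-336px, 4096 for Llama-3-8B); the exact configuration of this checkpoint may differ.

```python
# Sketch of the MLP vision-language adapter, assuming the LLaVA-1.5
# "mlp2x_gelu" layout: Linear -> GELU -> Linear, mapping CLIP patch features
# (assumed hidden size 1024) into the LLM embedding space (assumed 4096).
import torch
import torch.nn as nn

class VisionLanguageProjector(nn.Module):
    def __init__(self, vision_hidden_size: int = 1024, llm_hidden_size: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_hidden_size, llm_hidden_size),
            nn.GELU(),
            nn.Linear(llm_hidden_size, llm_hidden_size),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_hidden_size) from the frozen CLIP encoder
        return self.proj(image_features)

# Example: project a dummy batch of 576 patch tokens (24x24 grid for a 336px image).
dummy = torch.randn(1, 576, 1024)
print(VisionLanguageProjector()(dummy).shape)  # torch.Size([1, 576, 4096])
```

During stage-1 pretraining only this adapter is updated, while the CLIP encoder and the LLM stay frozen.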