czczup committed
Commit 2217075
1 Parent(s): 0593a76

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -33,8 +33,7 @@ InternVL-Chat-V1.2-Plus uses the same model architecture as [InternVL-Chat-V1.2]
 - **Training Strategy:**
   - Pretraining Stage
     - Learnable Component: MLP
-    - Data: Trained on 8192x4800=39.3M samples, including COYO, LAION, CC12M, CC3M, SBU, Wukong, GRIT, Objects365, OpenImages, and OCR data.
-    - Note: In this stage, we load the pretrained weights of [InternViT-6B-448px-V1-2](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2). Moreover, in order to reduce the number of visual tokens, we use a pixel shuffle to reduce 1024 tokens to 256 tokens.
+    - Data: Trained on 8192x4800=39.3M samples, including COYO, LAION, CC12M, CC3M, SBU, Wukong, GRIT, Objects365, OpenImages, and OCR data. In this stage, we first load the pre-trained weights of [InternViT-6B-448px-V1-0](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0) and connect it to Nous-Hermes-2-Yi-34B. After pre-training, the extracted ViT is published as [InternViT-6B-448px-V1-2](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2). Moreover, in order to reduce the number of visual tokens, we use a pixel shuffle to reduce 1024 tokens to 256 tokens.
   - Supervised Finetuning Stage
     - Learnable Component: ViT + MLP + LLM
     - Data: 12 million SFT samples.
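
The pixel shuffle mentioned in the updated Data bullet is a spatial-to-channel rearrangement: the ViT's 32x32 grid of 1024 tokens is merged in 2x2 neighborhoods into 256 tokens with 4x the channel width, before the MLP projects them into the LLM embedding space. Below is a minimal PyTorch sketch of that operation, not the repository's actual implementation; the function name `pixel_shuffle_tokens` is illustrative, and the 3200-dim channel width is an assumption based on InternViT-6B's hidden size.

```python
import torch

def pixel_shuffle_tokens(x: torch.Tensor, factor: int = 2) -> torch.Tensor:
    """Merge each `factor` x `factor` neighborhood of visual tokens into one
    token by stacking the merged features along the channel axis.

    x: (batch, n, c) ViT token sequence, where n is a perfect square.
    Returns: (batch, n // factor**2, c * factor**2).
    """
    b, n, c = x.shape
    h = w = int(n ** 0.5)
    # Restore the 2D token grid, then split it into factor x factor blocks.
    x = x.view(b, h, w, c)
    x = x.view(b, h // factor, factor, w // factor, factor, c)
    # Move the two block-local axes next to the channel axis and fold them in.
    x = x.permute(0, 1, 3, 2, 4, 5).reshape(b, (h * w) // factor**2, c * factor**2)
    return x

# 1024 tokens (a 32x32 grid) become 256 tokens with 4x wider features.
tokens = torch.randn(1, 1024, 3200)  # 3200-dim features: assumed, not from the commit
print(pixel_shuffle_tokens(tokens).shape)  # torch.Size([1, 256, 12800])
```

Note that this rearranges features rather than pooling them, so the 4x reduction in token count loses no information; the cost is absorbed by the MLP connector, which maps the wider tokens to the LLM's embedding dimension.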