Weiyun1025 committed • Commit 19fbddb • 1 parent: 18824f4

Upload README.md with huggingface_hub
# InternVL2-8B-MPO
[\[GitHub\]](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/internvl2.0_mpo) [\[Blog\]](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/) [\[Paper\]](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/) [\[Documents\]](https://internvl.readthedocs.io/en/latest/internvl2.0/preference_optimization.html)

[Switch to the Chinese version](#简介)

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/619507e7b74b6c591f794340/sy8aVC1Y5wtAjG-OQzrDI.jpeg)
## Introduction
Existing open-source multimodal large language models (MLLMs) generally follow a training process involving pre-training and supervised fine-tuning. However, these models suffer from distribution shifts, which limit their multimodal reasoning, particularly their Chain-of-Thought (CoT) performance.

To address this, we introduce a preference optimization (PO) process to enhance the multimodal reasoning capabilities of MLLMs. Specifically, (1) on the data side, we design an automated preference data construction pipeline to create [MMPR](https://huggingface.co/datasets/OpenGVLab/MMPR), a high-quality, large-scale multimodal reasoning preference dataset; and (2) on the model side, we explore integrating PO with MLLMs, developing a simple yet effective method, termed Mixed Preference Optimization (MPO), which boosts multimodal CoT performance.
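
At a high level, MPO trains the model with a weighted combination of three terms: a preference loss that captures the relative ranking between a chosen and a rejected response, a quality loss that captures the absolute quality of individual responses, and a generation loss that preserves the ability to generate the chosen response. The sketch below uses generic symbols ($w_{p}$, $w_{q}$, $w_{g}$ and the three loss names) rather than the exact notation and weight values from the paper; see the blog and documentation linked above for the precise formulation:

$$
\mathcal{L}_{\text{MPO}} = w_{p}\,\mathcal{L}_{\text{preference}} + w_{q}\,\mathcal{L}_{\text{quality}} + w_{g}\,\mathcal{L}_{\text{generation}}
$$

In the accompanying work, the preference term is instantiated with a DPO-style loss, the quality term with a BCO-style loss, and the generation term with the standard SFT (language-modeling) loss on the chosen response.
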
Our approach demonstrates improved performance across multiple benchmarks, particularly in multimodal reasoning tasks. Notably, our model, [InternVL2-8B-MPO](https://huggingface.co/OpenGVLab/InternVL2-8B), achieves an accuracy of 67.0 on MathVista, outperforming InternVL2-8B by 8.7 points and performing comparably to the 10$\times$ larger InternVL2-76B. We hope this study can inspire further advancements in MLLMs.
## Model Details