openbmb/MiniCPM-Reranker

@@ -2,33 +2,33 @@
 language:
 - zh
 - en
-base_model: openbmb/MiniCPM-2B-dpo-bf16
 ---
-## RankCPM-R
-**RankCPM-R** 是面壁智能与清华大学自然语言处理实验室（THUNLP）共同开发的中英双语言文本重排序模型，有如下特点：
 - 出色的中文、英文重排序能力。
 - 出色的中英跨语言重排序能力。
-RankCPM-R 基于 [MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) 训练，结构上采取双向注意力。采取多阶段训练方式，共使用包括开源数据、机造数据、闭源数据在内的约 600 万条训练数据。
 欢迎关注 RAG 套件系列：
-- 检索模型：[RankCPM-E](https://huggingface.co/openbmb/RankCPM-E)
-- 重排模型：[RankCPM-R](https://huggingface.co/openbmb/RankCPM-R)
 - 面向 RAG 场景的 LoRA 插件：[MiniCPM3-RAG-LoRA](https://huggingface.co/openbmb/MiniCPM3-RAG-LoRA)
-**RankCPM-R** is a bilingual & cross-lingual text re-ranking model developed by ModelBest Inc. and THUNLP, featuring:
 - Exceptional Chinese and English re-ranking capabilities.
 - Outstanding cross-lingual re-ranking capabilities between Chinese and English.
-RankCPM-R is trained based on [MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) and incorporates bidirectional attention in its architecture. The model underwent multi-stage training using approximately 6 million training examples, including open-source, synthetic, and proprietary data.
 We also invite you to explore the RAG toolkit series:
-- Retrieval Model: [RankCPM-E](https://huggingface.co/openbmb/RankCPM-E)
-- Re-ranking Model: [RankCPM-R](https://huggingface.co/openbmb/RankCPM-R)
 - LoRA Plugin for RAG scenarios: [MiniCPM3-RAG-LoRA](https://huggingface.co/openbmb/MiniCPM3-RAG-LoRA)
 ## 模型信息 Model Information
@@ -45,7 +45,7 @@ We also invite you to explore the RAG toolkit series:
 本模型支持指令，输入格式如下：
-RankCPM-R supports instructions in the following format:
 ```
 <s>Instruction: {{ instruction }} Query: {{ query }}</s>{{ document }}
@@ -65,7 +65,7 @@ For example:
 也可以不提供指令，即采取如下格式：
-RankCPM-R also works in instruction-free mode in the following format:
 ```
 <s>Query: {{ query }}</s>{{ document }}
@@ -89,7 +89,7 @@ from transformers import AutoModel, AutoTokenizer, AutoModelForSequenceClassific
 import torch
 import numpy as np
-model_name = "openbmb/RankCPM-R"
 tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
 tokenizer.padding_side = "right"
 model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True, attn_implementation="flash_attention_2", torch_dtype=torch.float16).to("cuda")
@@ -152,7 +152,7 @@ We re-rank top-100 documents from `bge-large-zh-v1.5` in C-MTEB/Retrieval and fro
 | bge-reranker-v2-minicpm-28 | 73.51             | 59.86         |
 | bge-reranker-v2-gemma      | 71.74             | 60.71         |
 | bge-reranker-v2.5-gemma2   | -                 | **63.67**     |
-| RankCPM-R                 | **76.79**         | 61.32        |
 ### 中英跨语言重排序结果 CN-EN Cross-lingual Re-ranking Results
@@ -166,14 +166,14 @@ We re-rank top-100 documents from `bge-m3` (Dense).
 | jina-reranker-v2-base-multilingual | 69.33              | 36.66              | 50.03              |
 | bge-reranker-v2-m3                 | 69.75              | 40.98              | 49.67              |
 | gte-multilingual-reranker-base     | 68.51              | 38.74              | 45.3              |
-| RankCPM-R                         | **71.73**          | **43.65**          | **50.59**          |
 ## 许可证 License
 - 本仓库中代码依照 [Apache-2.0 协议](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE)开源。
-- RankCPM-R 模型权重的使用则需要遵循 [MiniCPM 模型协议](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md)。
-- RankCPM-R 模型权重对学术研究完全开放。如需将模型用于商业用途，请填写[此问卷](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g)。
 * The code in this repo is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
-* The usage of RankCPM-R model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
-* The models and weights of RankCPM-R are completely free for academic research. After filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, RankCPM-R weights are also available for free commercial use.

 language:
 - zh
 - en
+base_model: openbmb/MiniCPM-2B-sft-bf16
 ---
+## MiniCPM-Reranker
+**MiniCPM-Reranker** 是面壁智能与清华大学自然语言处理实验室（THUNLP）共同开发的中英双语言文本重排序模型，有如下特点：
 - 出色的中文、英文重排序能力。
 - 出色的中英跨语言重排序能力。
+MiniCPM-Reranker 基于 [MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) 训练，结构上采取双向注意力。采取多阶段训练方式，共使用包括开源数据、机造数据、闭源数据在内的约 600 万条训练数据。
 欢迎关注 RAG 套件系列：
+- 检索模型：[MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding)
+- 重排模型：[MiniCPM-Reranker](https://huggingface.co/openbmb/MiniCPM-Reranker)
 - 面向 RAG 场景的 LoRA 插件：[MiniCPM3-RAG-LoRA](https://huggingface.co/openbmb/MiniCPM3-RAG-LoRA)
+**MiniCPM-Reranker** is a bilingual & cross-lingual text re-ranking model developed by ModelBest Inc. and THUNLP, featuring:
 - Exceptional Chinese and English re-ranking capabilities.
 - Outstanding cross-lingual re-ranking capabilities between Chinese and English.
+MiniCPM-Reranker is trained based on [MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) and incorporates bidirectional attention in its architecture. The model underwent multi-stage training using approximately 6 million training examples, including open-source, synthetic, and proprietary data.
 We also invite you to explore the RAG toolkit series:
+- Retrieval Model: [MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding)
+- Re-ranking Model: [MiniCPM-Reranker](https://huggingface.co/openbmb/MiniCPM-Reranker)
 - LoRA Plugin for RAG scenarios: [MiniCPM3-RAG-LoRA](https://huggingface.co/openbmb/MiniCPM3-RAG-LoRA)
 ## 模型信息 Model Information
 本模型支持指令，输入格式如下：
+MiniCPM-Reranker supports instructions in the following format:
 ```
 <s>Instruction: {{ instruction }} Query: {{ query }}</s>{{ document }}
 也可以不提供指令，即采取如下格式：
+MiniCPM-Reranker also works in instruction-free mode in the following format:
 ```
 <s>Query: {{ query }}</s>{{ document }}
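The two input templates can be assembled with a small helper. This is a minimal sketch, not part of the model card: the function name `build_reranker_input` is hypothetical, and it assumes the `<s>` and `</s>` markers are written literally as in the templates. In practice the tokenizer may insert these special tokens itself, so check its behavior before passing such strings to the model.

```python
from typing import Optional


def build_reranker_input(query: str, document: str, instruction: Optional[str] = None) -> str:
    """Hypothetical helper that fills the input templates shown in the model card."""
    if instruction:
        # Instruction mode: "<s>Instruction: ... Query: ...</s>document"
        prefix = f"Instruction: {instruction} Query: {query}"
    else:
        # Instruction-free mode: "<s>Query: ...</s>document"
        prefix = f"Query: {query}"
    return f"<s>{prefix}</s>{document}"


# Illustrative inputs only; these strings are not taken from the model card.
with_instruction = build_reranker_input("q", "d", instruction="i")
without_instruction = build_reranker_input("q", "d")
print(with_instruction)     # <s>Instruction: i Query: q</s>d
print(without_instruction)  # <s>Query: q</s>d
```

Keeping both modes in one function makes it easy to compare scores with and without an instruction for the same query and document.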
 from transformers import AutoModel, AutoTokenizer, AutoModelForSequenceClassification
 import torch
 import numpy as np
+model_name = "openbmb/MiniCPM-Reranker"
 tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
 tokenizer.padding_side = "right"
 model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True, attn_implementation="flash_attention_2", torch_dtype=torch.float16).to("cuda")
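Once the model has produced one relevance score per query-document pair, re-ranking reduces to sorting the candidate documents by score. The sketch below assumes the scores are already available as plain floats. The function name `rerank_by_score` and the sample values are hypothetical and were not produced by the model.

```python
from typing import List, Sequence, Tuple


def rerank_by_score(docs: Sequence[str], scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Hypothetical helper: return (document, score) pairs, highest score first."""
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    # sorted() is stable, so documents with equal scores keep their input order.
    return sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)


# Placeholder scores for illustration only.
ranked = rerank_by_score(["doc a", "doc b", "doc c"], [0.12, 2.5, -1.0])
for doc, score in ranked:
    print(f"{score:6.2f}  {doc}")
```

Returning the scores alongside the documents keeps the ranking inspectable, which helps when comparing against the first-stage retriever's order.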
 | bge-reranker-v2-minicpm-28 | 73.51             | 59.86         |
 | bge-reranker-v2-gemma      | 71.74             | 60.71         |
 | bge-reranker-v2.5-gemma2   | -                 | **63.67**     |
+| MiniCPM-Reranker           | **76.79**         | 61.32         |
 ### 中英跨语言重排序结果 CN-EN Cross-lingual Re-ranking Results
 | jina-reranker-v2-base-multilingual | 69.33              | 36.66              | 50.03              |
 | bge-reranker-v2-m3                 | 69.75              | 40.98              | 49.67              |
 | gte-multilingual-reranker-base     | 68.51              | 38.74              | 45.3               |
+| MiniCPM-Reranker                   | **71.73**          | **43.65**          | **50.59**          |
 ## 许可证 License
 - 本仓库中代码依照 [Apache-2.0 协议](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE)开源。
+- MiniCPM-Reranker 模型权重的使用则需要遵循 [MiniCPM 模型协议](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md)。
+- MiniCPM-Reranker 模型权重对学术研究完全开放。如需将模型用于商业用途，请填写[此问卷](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g)。
 * The code in this repo is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
+* The usage of MiniCPM-Reranker model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
+* The models and weights of MiniCPM-Reranker are completely free for academic research. After filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, MiniCPM-Reranker weights are also available for free commercial use.