Kaguya-19 committed on
Commit
a72acc2
1 Parent(s): 28dc150

Update README.md

Files changed (1)
  1. README.md +19 -19
README.md CHANGED
@@ -2,33 +2,33 @@
2
  language:
3
  - zh
4
  - en
5
- base_model: openbmb/MiniCPM-2B-dpo-bf16
6
  ---
7
- ## RankCPM-R
8
 
9
- **RankCPM-R** 是面壁智能与清华大学自然语言处理实验室(THUNLP)共同开发的中英双语言文本重排序模型,有如下特点:
10
  - 出色的中文、英文重排序能力。
11
  - 出色的中英跨语言重排序能力。
12
 
13
- RankCPM-R 基于 [MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) 训练,结构上采取双向注意力。采取多阶段训练方式,共使用包括开源数据、机造数据、闭源数据在内的约 600 万条训练数据。
14
 
15
  欢迎关注 RAG 套件系列:
16
 
17
- - 检索模型:[RankCPM-E](https://huggingface.co/openbmb/RankCPM-E)
18
- - 重排模型:[RankCPM-R](https://huggingface.co/openbmb/RankCPM-R)
19
  - 面向 RAG 场景的 LoRA 插件:[MiniCPM3-RAG-LoRA](https://huggingface.co/openbmb/MiniCPM3-RAG-LoRA)
20
 
21
- **RankCPM-R** is a bilingual & cross-lingual text re-ranking model developed by ModelBest Inc. and THUNLP, featuring:
22
 
23
  - Exceptional Chinese and English re-ranking capabilities.
24
  - Outstanding cross-lingual re-ranking capabilities between Chinese and English.
25
 
26
- RankCPM-R is trained based on [MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) and incorporates bidirectional attention in its architecture. The model underwent multi-stage training using approximately 6 million training examples, including open-source, synthetic, and proprietary data.
27
 
28
  We also invite you to explore the RAG toolkit series:
29
 
30
- - Retrieval Model: [RankCPM-E](https://huggingface.co/openbmb/RankCPM-E)
31
- - Re-ranking Model: [RankCPM-R](https://huggingface.co/openbmb/RankCPM-R)
32
  - LoRA Plugin for RAG scenarios: [MiniCPM3-RAG-LoRA](https://huggingface.co/openbmb/MiniCPM3-RAG-LoRA)
33
 
34
  ## 模型信息 Model Information
@@ -45,7 +45,7 @@ We also invite you to explore the RAG toolkit series:
45
 
46
  本模型支持指令,输入格式如下:
47
 
48
- RankCPM-R supports instructions in the following format:
49
 
50
  ```
51
  <s>Instruction: {{ instruction }} Query: {{ query }}</s>{{ document }}
@@ -65,7 +65,7 @@ For example:
65
 
66
  也可以不提供指令,即采取如下格式:
67
 
68
- RankCPM-R also works in instruction-free mode in the following format:
69
 
70
  ```
71
  <s>Query: {{ query }}</s>{{ document }}
@@ -89,7 +89,7 @@ from transformers import AutoModel, AutoTokenizer, AutoModelForSequenceClassific
89
  import torch
90
  import numpy as np
91
 
92
- model_name = "openbmb/RankCPM-R"
93
  tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
94
  tokenizer.padding_side = "right"
95
  model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True,attn_implementation="flash_attention_2", torch_dtype=torch.float16).to("cuda")
@@ -152,7 +152,7 @@ We re-rank top-100 documents from `bge-large-zh-v1.5` in C-MTEB/Retrieval and fro
152
  | bge-reranker-v2-minicpm-28 | 73.51 | 59.86 |
153
  | bge-reranker-v2-gemma | 71.74 | 60.71 |
154
  | bge-reranker-v2.5-gemma2 | - | **63.67** |
155
- | RankCPM-R | **76.79** | 61.32 |
156
 
157
  ### 中英跨语言重排序结果 CN-EN Cross-lingual Re-ranking Results
158
 
@@ -166,14 +166,14 @@ We re-rank top-100 documents from `bge-m3` (Dense).
166
  | jina-reranker-v2-base-multilingual | 69.33 | 36.66 | 50.03 |
167
  | bge-reranker-v2-m3 | 69.75 | 40.98 | 49.67 |
168
  | gte-multilingual-reranker-base | 68.51 | 38.74 | 45.3 |
169
- | RankCPM-R | **71.73** | **43.65** | **50.59** |
170
 
171
  ## 许可证 License
172
 
173
  - 本仓库中代码依照 [Apache-2.0 协议](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE)开源。
174
- - RankCPM-R 模型权重的使用则需要遵循 [MiniCPM 模型协议](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md)。
175
- - RankCPM-R 模型权重对学术研究完全开放。如需将模型用于商业用途,请填写[此问卷](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g)。
176
 
177
  * The code in this repo is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
178
- * The usage of RankCPM-R model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
179
- * The models and weights of RankCPM-R are completely free for academic research. After filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, RankCPM-R weights are also available for free commercial use.
 
2
  language:
3
  - zh
4
  - en
5
+ base_model: openbmb/MiniCPM-2B-sft-bf16
6
  ---
7
+ ## MiniCPM-Reranker
8
 
9
+ **MiniCPM-Reranker** 是面壁智能与清华大学自然语言处理实验室(THUNLP)共同开发的中英双语言文本重排序模型,有如下特点:
10
  - 出色的中文、英文重排序能力。
11
  - 出色的中英跨语言重排序能力。
12
 
13
+ MiniCPM-Reranker 基于 [MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) 训练,结构上采取双向注意力。采取多阶段训练方式,共使用包括开源数据、机造数据、闭源数据在内的约 600 万条训练数据。
14
 
15
  欢迎关注 RAG 套件系列:
16
 
17
+ - 检索模型:[MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding)
18
+ - 重排模型:[MiniCPM-Reranker](https://huggingface.co/openbmb/MiniCPM-Reranker)
19
  - 面向 RAG 场景的 LoRA 插件:[MiniCPM3-RAG-LoRA](https://huggingface.co/openbmb/MiniCPM3-RAG-LoRA)
20
 
21
+ **MiniCPM-Reranker** is a bilingual & cross-lingual text re-ranking model developed by ModelBest Inc. and THUNLP, featuring:
22
 
23
  - Exceptional Chinese and English re-ranking capabilities.
24
  - Outstanding cross-lingual re-ranking capabilities between Chinese and English.
25
 
26
+ MiniCPM-Reranker is trained based on [MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) and incorporates bidirectional attention in its architecture. The model underwent multi-stage training using approximately 6 million training examples, including open-source, synthetic, and proprietary data.
27
 
28
  We also invite you to explore the RAG toolkit series:
29
 
30
+ - Retrieval Model: [MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding)
31
+ - Re-ranking Model: [MiniCPM-Reranker](https://huggingface.co/openbmb/MiniCPM-Reranker)
32
  - LoRA Plugin for RAG scenarios: [MiniCPM3-RAG-LoRA](https://huggingface.co/openbmb/MiniCPM3-RAG-LoRA)
33
 
34
  ## 模型信息 Model Information
 
45
 
46
  本模型支持指令,输入格式如下:
47
 
48
+ MiniCPM-Reranker supports instructions in the following format:
49
 
50
  ```
51
  <s>Instruction: {{ instruction }} Query: {{ query }}</s>{{ document }}
 
65
 
66
  也可以不提供指令,即采取如下格式:
67
 
68
+ MiniCPM-Reranker also works in instruction-free mode in the following format:
69
 
70
  ```
71
  <s>Query: {{ query }}</s>{{ document }}
 
89
  import torch
90
  import numpy as np
91
 
92
+ model_name = "openbmb/MiniCPM-Reranker"
93
  tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
94
  tokenizer.padding_side = "right"
95
  model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True,attn_implementation="flash_attention_2", torch_dtype=torch.float16).to("cuda")
 
152
  | bge-reranker-v2-minicpm-28 | 73.51 | 59.86 |
153
  | bge-reranker-v2-gemma | 71.74 | 60.71 |
154
  | bge-reranker-v2.5-gemma2 | - | **63.67** |
155
+ | MiniCPM-Reranker | **76.79** | 61.32 |
156
 
157
  ### 中英跨语言重排序结果 CN-EN Cross-lingual Re-ranking Results
158
 
 
166
  | jina-reranker-v2-base-multilingual | 69.33 | 36.66 | 50.03 |
167
  | bge-reranker-v2-m3 | 69.75 | 40.98 | 49.67 |
168
  | gte-multilingual-reranker-base | 68.51 | 38.74 | 45.3 |
169
+ | MiniCPM-Reranker | **71.73** | **43.65** | **50.59** |
170
 
171
  ## 许可证 License
172
 
173
  - 本仓库中代码依照 [Apache-2.0 协议](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE)开源。
174
+ - MiniCPM-Reranker 模型权重的使用则需要遵循 [MiniCPM 模型协议](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md)。
175
+ - MiniCPM-Reranker 模型权重对学术研究完全开放。如需将模型用于商业用途,请填写[此问卷](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g)。
176
 
177
  * The code in this repo is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
178
+ * The usage of MiniCPM-Reranker model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
179
+ * The models and weights of MiniCPM-Reranker are completely free for academic research. After filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, MiniCPM-Reranker weights are also available for free commercial use.
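
The two input formats quoted in the README (with and without an instruction) can be sketched as a small string-building helper. This is only an illustration of the literal templates: `format_rerank_input` is a hypothetical name that does not appear in the repository, and whether the tokenizer already prepends `<s>` on its own is not stated in this diff, so check that before reusing the string as-is.

```python
from typing import Optional


def format_rerank_input(query: str, document: str, instruction: Optional[str] = None) -> str:
    """Fill in the literal templates quoted in the README (hypothetical helper).

    With an instruction:    <s>Instruction: {instruction} Query: {query}</s>{document}
    Without an instruction: <s>Query: {query}</s>{document}
    """
    if instruction:
        prefix = f"<s>Instruction: {instruction} Query: {query}</s>"
    else:
        prefix = f"<s>Query: {query}</s>"
    # The document text follows the closing </s> directly, with no separator.
    return prefix + document


if __name__ == "__main__":
    # Placeholder strings, chosen only to make the template visible.
    print(format_rerank_input("example query", "example document"))
    print(format_rerank_input("example query", "example document", instruction="example instruction"))
```

Keeping the template in one function makes it easy to switch between the instruction and instruction-free modes without editing strings at each call site.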