zsytony committed
Commit dca5b9e
1 Parent(s): 65d0a45

Update README.md

Files changed (1):
  1. README.md +13 -7
README.md CHANGED
@@ -6,7 +6,7 @@ base_model:
 # CompassJudger-1
 
 <p align="center">
-🤗 <a href="https://huggingface.co/opencompass">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/opencompass">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="">Paper</a> &nbsp&nbsp
+🤗 <a href="https://huggingface.co/opencompass">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/opencompass">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://arxiv.org/pdf/2410.16256">Paper</a> &nbsp&nbsp | &nbsp&nbsp 🎖️ <a href="https://huggingface.co/spaces/opencompass/judgerbench_leaderboard">Leaderboard</a> &nbsp&nbsp
 <br>
 
 </p>
@@ -27,7 +27,7 @@ Here provides a code to show you how to load the tokenizer and model and how to
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-model_name = "opencompass/CompassJudger-1"
+model_name = "opencompass/CompassJudger-1-7B-Instruct"
 
 model = AutoModelForCausalLM.from_pretrained(
     model_name,
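
Reviewer's aside: the hunk only shows the corrected model id, so here is a minimal end-to-end sketch of the loading-and-generation flow this snippet belongs to, assuming the standard transformers chat-template API. The prompt and generation parameters are illustrative, not the README's exact code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model id as corrected by this commit
model_name = "opencompass/CompassJudger-1-7B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Illustrative prompt; the README's own examples follow the same pattern
messages = [{"role": "user", "content": "Hello, can you help me to judge something?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens before decoding the judge's reply
output_ids = generated[0][inputs.input_ids.shape[1]:]
response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(response)
```
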
@@ -63,7 +63,7 @@ print(response)
 
 
 We also provide some examples for different usage situations:
-### Gneral Chat
+### General Chat
 
 ```
 **Input**: Hello, can you help me to judge something?
@@ -173,7 +173,9 @@ cd opencompass
 pip install -e .
 python run.py configs/eval_judgerbench.py --mode all --reuse latest
 ```
+We also provided a leaderboard for JudgerBench: https://huggingface.co/spaces/opencompass/judgerbench_leaderboard
 
+If you want to add your model to this leaderboard, welcome to add an issue in this Repository.
 
 ## Use CompassJudger-1 to Test Subjective Datasets in OpenCompass
 
@@ -224,8 +226,8 @@ infer = dict(
 judge_models = [dict(
     dict(
         type=TurboMindModelwithChatTemplate,
-        abbr='CompassJudger-1-7B,
-        path='Opencompass/CompassJudger-1-7B',
+        abbr='CompassJudger-1-7B-Instruct',
+        path='opencompass/CompassJudger-1-7B-Instruct',
         engine_config=dict(session_len=16384, max_batch_size=16, tp=1),
         gen_config=dict(top_k=1, temperature=1e-6, top_p=0.9, max_new_tokens=2048),
         max_seq_len=16384,
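
For context on this fix (the old line had an unterminated string and a miscased org name), here is a self-contained sketch of how such a judge-model entry typically sits in an OpenCompass config. The import path and the fields beyond those shown in the diff (max_out_len, batch_size, run_cfg) follow common OpenCompass conventions and are assumptions, not part of this commit:

```python
# Hypothetical, minimal OpenCompass judge-model config sketch.
# Fields not shown in the diff are assumptions and may differ by version.
from opencompass.models import TurboMindModelwithChatTemplate

judge_models = [
    dict(
        type=TurboMindModelwithChatTemplate,
        abbr='CompassJudger-1-7B-Instruct',
        # Corrected by this commit: closed quote, lowercase org,
        # full model id with the -Instruct suffix.
        path='opencompass/CompassJudger-1-7B-Instruct',
        engine_config=dict(session_len=16384, max_batch_size=16, tp=1),
        # Near-greedy decoding keeps judge verdicts reproducible.
        gen_config=dict(top_k=1, temperature=1e-6, top_p=0.9,
                        max_new_tokens=2048),
        max_seq_len=16384,
        max_out_len=2048,
        batch_size=16,
        run_cfg=dict(num_gpus=1),
    )
]
```
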
@@ -253,13 +255,17 @@ For more detailed subjective evaluation guidelines, please refer to: https://git
 
 To facilitate better comparisons within the community, we have tested the subjective performance of some models using CompassJudger-1.
 
+See in: [https://huggingface.co/spaces/opencompass/judgerbench_leaderboard](https://huggingface.co/spaces/opencompass/compassjudger_subj_eval_leaderboard)
+
+If you want to add your model to this leaderboard, welcome to add an issue in this Repository.
+
 ## Citation
 
 ```bib
 @article{cao2024compass,
   title={CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution},
   author={Maosong Cao, Alexander Lam, Haodong Duan, Hongwei Liu, Songyang Zhang, Kai Chen},
-  journal={arXiv preprint arXiv:2410.xxxxxx},
+  journal={arXiv preprint arXiv:2410.16256},
   year={2024}
 }
 ```
@@ -269,4 +275,4 @@ To facilitate better comparisons within the community, we have tested the subjec
 - https://github.com/open-compass/opencompass
 - https://github.com/InternLM/InternLM
 - https://github.com/QwenLM/Qwen2.5
-- https://github.com/InternLM/xtuner
+- https://github.com/InternLM/xtuner