zsytony committed
Commit 2461a46
1 Parent(s): deaccd3

Update README.md

Files changed (1)
  1. README.md +13 -7
README.md CHANGED
@@ -7,7 +7,7 @@ base_model:
# CompassJudger-1

<p align="center">
- 🤗 <a href="https://huggingface.co/opencompass">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/opencompass">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="">Paper</a> &nbsp&nbsp
+ 🤗 <a href="https://huggingface.co/opencompass">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/opencompass">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://arxiv.org/pdf/2410.16256">Paper</a> &nbsp&nbsp | &nbsp&nbsp 🎖️ <a href="https://huggingface.co/spaces/opencompass/judgerbench_leaderboard">Leaderboard</a> &nbsp&nbsp
<br>

</p>
@@ -28,7 +28,7 @@ Here provides a code to show you how to load the tokenizer and model and how to
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

- model_name = "opencompass/CompassJudger-1"
+ model_name = "opencompass/CompassJudger-1-7B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
model_name,
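For readers skimming the diff, a minimal end-to-end sketch of the snippet this hunk edits may help; everything past the renamed checkpoint (dtype, device placement, chat-template generation) is an assumption based on the standard transformers workflow, not text from this commit:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "opencompass/CompassJudger-1-7B-Instruct"  # renamed checkpoint from this commit

# Assumed loading options: use the checkpoint's native dtype and auto-place weights.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat prompt and generate a judgment; mirrors the README's later `print(response)`.
messages = [{"role": "user", "content": "Hello, can you help me to judge something?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```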
@@ -64,7 +64,7 @@ print(response)


We also provide some examples for different usage situations:
- ### Gneral Chat
+ ### General Chat

```
**Input**: Hello, can you help me to judge something?
@@ -174,7 +174,9 @@ cd opencompass
pip install -e .
python run.py configs/eval_judgerbench.py --mode all --reuse latest
```
+ We also provide a leaderboard for JudgerBench: https://huggingface.co/spaces/opencompass/judgerbench_leaderboard

+ If you want to add your model to this leaderboard, feel free to open an issue in this repository.

## Use CompassJudger-1 to Test Subjective Datasets in OpenCompass

@@ -225,8 +227,8 @@ infer = dict(
judge_models = [dict(
dict(
type=TurboMindModelwithChatTemplate,
- abbr='CompassJudger-1-7B,
- path='Opencompass/CompassJudger-1-7B',
+ abbr='CompassJudger-1-7B-Instruct',
+ path='opencompass/CompassJudger-1-7B-Instruct',
engine_config=dict(session_len=16384, max_batch_size=16, tp=1),
gen_config=dict(top_k=1, temperature=1e-6, top_p=0.9, max_new_tokens=2048),
max_seq_len=16384,
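For context, here is the judge-model entry this hunk corrects, written out as a complete, indented config; the import path and the single-entry list shape are assumptions based on common OpenCompass configs (the doubled `dict(` in the fragment looks like a typo in the README itself, so it is written here as one entry):

```python
from opencompass.models import TurboMindModelwithChatTemplate  # assumed import path

judge_models = [
    dict(
        type=TurboMindModelwithChatTemplate,
        abbr='CompassJudger-1-7B-Instruct',              # fixed: closing quote restored
        path='opencompass/CompassJudger-1-7B-Instruct',  # fixed: lowercase org name
        engine_config=dict(session_len=16384, max_batch_size=16, tp=1),
        gen_config=dict(top_k=1, temperature=1e-6, top_p=0.9, max_new_tokens=2048),
        max_seq_len=16384,
    )
]
```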
@@ -254,13 +256,17 @@ For more detailed subjective evaluation guidelines, please refer to: https://git

To facilitate better comparisons within the community, we have tested the subjective performance of some models using CompassJudger-1.

+ See: [https://huggingface.co/spaces/opencompass/compassjudger_subj_eval_leaderboard](https://huggingface.co/spaces/opencompass/compassjudger_subj_eval_leaderboard)
+
+ If you want to add your model to this leaderboard, feel free to open an issue in this repository.
+
## Citation

```bib
@article{cao2024compass,
title={CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution},
author={Maosong Cao and Alexander Lam and Haodong Duan and Hongwei Liu and Songyang Zhang and Kai Chen},
- journal={arXiv preprint arXiv:2410.xxxxxx},
+ journal={arXiv preprint arXiv:2410.16256},
year={2024}
}
```
@@ -270,4 +276,4 @@ To facilitate better comparisons within the community, we have tested the subjec
- https://github.com/open-compass/opencompass
- https://github.com/InternLM/InternLM
- https://github.com/QwenLM/Qwen2.5
- - https://github.com/InternLM/xtuner
+ - https://github.com/InternLM/xtuner