Bowieee committed
Commit 963e536
1 Parent(s): 81630e4
Files changed (1):
  1. text_content.py (+2 −1)
text_content.py CHANGED
@@ -4,7 +4,7 @@ This is the official leaderboard for 🏅StructEval benchmark. Starting from an
 Please refer to 🐱[StructEval repository](https://github.com/c-box/StructEval) for model evaluation and 📖[our paper]() for experimental analysis.
 
 🚀 **_Latest News_**
-* [2024.8.6] We released the first version of StructEval leaderboard, which includes 22 open-sourced language models, more datasets and models as comming soon🔥🔥🔥.
+* [2024.8.6] We released the first version of StructEval leaderboard, which includes 22 open-sourced language models, more datasets and models are comming soon🔥🔥🔥.
 
 * [2024.7.31] We regenerated the StructEval Benchmark based on the latest [Wikipedia](https://www.wikipedia.org/) pages (20240601) using [GPT-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) model, which could minimize the impact of data contamination🔥🔥🔥.
 """
@@ -36,6 +36,7 @@ Inspired from the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/Hugg
 
 
 NOTES_TEXT = """
+* Base benchmark refers to the original dataset, while struct benchmarks refer to the benchmarks constructed using StructEval with these base benchmarks as seed data.
 * On most models on base MMLU, we collected the results for their official technical report. For the models that have not been reported, we use opencompass for evaluation.
 * For other 2 base benchmarks and all 3 structured benchmarks: for chat models, we evaluate them under 0-shot setting; for completion model, we evaluate them under 0-shot setting with ppl. And we keep the prompt format consistent across all benchmarks.
 """