Commit ead04f7, committed by yanglet
2 Parent(s): e35301b 61d4fa3

Merge pull request #15 from miragecoa/main

Update README.md with contribution guidelines.

Files changed (2)
  1. README.md +44 -25
  2. src/about.py +6 -8
README.md CHANGED
@@ -37,31 +37,50 @@ OFLL provides a specialized evaluation framework tailored specifically to the fi
  The Open Financial LLM Leaderboard aims to set a new standard in evaluating the capabilities of language models in the financial domain, offering a specialized, real-world-focused benchmarking solution.
 
 
- # Start the configuration
-
- Most of the variables to change for a default leaderboard are in `src/env.py` (replace the path for your leaderboard) and `src/about.py` (for tasks).
-
- Results files should have the following format and be stored as json files:
- ```json
- {
- "config": {
- "model_dtype": "torch.float16", # or torch.bfloat16 or 8bit or 4bit
- "model_name": "path of the model on the hub: org/model",
- "model_sha": "revision on the hub",
- },
- "results": {
- "task_name": {
- "metric_name": score,
- },
- "task_name2": {
- "metric_name": score,
- }
- }
- }
- ```
-
- Request files are created automatically by this tool.
-
+ # Contribute to OFLL
+
+ To make the leaderboard more accessible for external contributors, we offer clear guidelines for adding tasks, updating result files, and other maintenance activities.
+
+ 1. **Primary Files**:
+ - `src/env.py`: Modify variables like repository paths for customization.
+ - `src/about.py`: Update task configurations here to add new datasets.
+
+ 2. **Adding New Tasks**:
+ - Navigate to `src/about.py` and specify new tasks in the `Tasks` enum section.
+ - Each task requires details such as `benchmark`, `metric`, `col_name`, and `category`. For example:
+ ```python
+ taskX = Task("DatasetName", "MetricType", "ColumnName", category="Category")
+ ```
+
+ 3. **Updating Results Files**:
+ - Results files should be in JSON format and structured as follows:
+ ```json
+ {
+ "config": {
+ "model_dtype": "torch.float16",
+ "model_name": "path of the model on the hub: org/model",
+ "model_sha": "revision on the hub"
+ },
+ "results": {
+ "task_name": {
+ "metric_name": score
+ },
+ "task_name2": {
+ "metric_name": score
+ }
+ }
+ }
+ ```
+
+ 4. **Updating Leaderboard Data**:
+ - When a new task is added, ensure that the results JSON files reflect this update. This process will be automated in future releases.
+ - Access the current results at [Hugging Face Datasets](https://huggingface.co/datasets/TheFinAI/results/tree/main/demo-leaderboard).
+
+ 5. **Useful Links**:
+ - [Hugging Face Leaderboard Documentation](https://huggingface.co/docs/leaderboards/en/leaderboards/building_page)
+ - [OFLL Demo on Hugging Face](https://huggingface.co/spaces/finosfoundation/Open-Financial-LLM-Leaderboard)
+
+
  If you encounter problem on the space, don't hesitate to restart it to remove the create eval-queue, eval-queue-bk, eval-results and eval-results-bk created folder.
 
  # Code logic for more complex edits
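
Note on the README change above: step 4 asks contributors to make sure results JSON files reflect newly added tasks, and says this is not yet automated. A minimal sketch of such a check follows. It is not code from this repository. The `Task` dataclass only mirrors the four fields named in the README (`benchmark`, `metric`, `col_name`, `category`), the two enum entries are hypothetical, the helper `missing_entries` is an invented name, and the assumption that `results` is keyed by `benchmark` and then by `metric` is inferred from the JSON layout in step 3.

```python
import json
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Task:
    # Field names follow the README text; the real class in src/about.py may differ.
    benchmark: str
    metric: str
    col_name: str
    category: str = ""


class Tasks(Enum):
    # Hypothetical entries, for illustration only.
    task0 = Task("DatasetA", "Accuracy", "Dataset A", category="Category1")
    task1 = Task("DatasetB", "F1", "Dataset B", category="Category2")


def missing_entries(results_json: str, tasks=Tasks) -> list[str]:
    """Return "benchmark/metric" keys that the tasks expect but the file lacks."""
    data = json.loads(results_json)
    # The README lists these three keys under "config".
    for key in ("model_dtype", "model_name", "model_sha"):
        if key not in data.get("config", {}):
            raise ValueError(f"config is missing {key!r}")
    results = data.get("results", {})
    missing = []
    for task in tasks:
        t = task.value
        # Assumption: results[benchmark][metric] holds the score.
        if t.metric not in results.get(t.benchmark, {}):
            missing.append(f"{t.benchmark}/{t.metric}")
    return missing


example = json.dumps({
    "config": {
        "model_dtype": "torch.float16",
        "model_name": "org/model",
        "model_sha": "main",
    },
    "results": {"DatasetA": {"Accuracy": 0.5}},
})
print(missing_entries(example))  # ['DatasetB/F1']
```

Running a check like this over each results file after extending the `Tasks` enum would surface files that still lack the new task's score.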
src/about.py CHANGED
@@ -194,12 +194,10 @@ If everything is done, check you can launch the EleutherAIHarness on your model
 
  CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
  CITATION_BUTTON_TEXT = r"""
- @misc{xie2024finben,
- title={The FinBen: An Holistic Financial Benchmark for Large Language Models},
- author={Qianqian Xie and Weiguang Han and Zhengyu Chen and Ruoyu Xiang and Xiao Zhang and Yueru He and Mengxi Xiao and Dong Li and Yongfu Dai and Duanyu Feng and Yijing Xu and Haoqiang Kang and Ziyan Kuang and Chenhan Yuan and Kailai Yang and Zheheng Luo and Tianlin Zhang and Zhiwei Liu and Guojun Xiong and Zhiyang Deng and Yuechen Jiang and Zhiyuan Yao and Haohang Li and Yangyang Yu and Gang Hu and Jiajia Huang and Xiao-Yang Liu and Alejandro Lopez-Lira and Benyou Wang and Yanzhao Lai and Hao Wang and Min Peng and Sophia Ananiadou and Jimin Huang},
- year={2024},
- eprint={2402.12659},
- archivePrefix={arXiv},
- primaryClass={cs.CL}
- }
+ @article{Xie2024FinBen,
+ title={FinBen: A Holistic Financial Benchmark for Large Language Models},
+ author={Qianqian Xie and Weiguang Han and Zhengyu Chen and Ruoyu Xiang and Xiao Zhang and Yueru He and Mengxi Xiao and Dong Li and Yongfu Dai and Duanyu Feng and Yijing Xu and Haoqiang Kang and Ziyan Kuang and Chenhan Yuan and Kailai Yang and Zheheng Luo and Tianlin Zhang and Zhiwei Liu and Guojun Xiong and Zhiyang Deng and Yuechen Jiang and Zhiyuan Yao and Haohang Li and Yangyang Yu and Gang Hu and Jiajia Huang and Xiao-Yang Liu and Alejandro Lopez-Lira and Benyou Wang and Yanzhao Lai and Hao Wang and Min Peng and Sophia Ananiadou and Jimin Huang},
+ journal={NeurIPS, Special Track on Datasets and Benchmarks},
+ year={2024},
+ }
  """