Spaces: finosfoundation/Open-Financial-LLM-Leaderboard


yanglet committed 3 days ago

Commit ead04f7 (2 parents: e35301b, 61d4fa3)

Merge pull request #15 from miragecoa/main


Update README.md with contribution guidelines.

Files changed (2)

README.md +44 -25
src/about.py +6 -8

README.md CHANGED

@@ -37,31 +37,50 @@ OFLL provides a specialized evaluation framework tailored specifically to the fi
 The Open Financial LLM Leaderboard aims to set a new standard in evaluating the capabilities of language models in the financial domain, offering a specialized, real-world-focused benchmarking solution.
-# Start the configuration
-Most of the variables to change for a default leaderboard are in `src/env.py` (replace the path for your leaderboard) and `src/about.py` (for tasks).
-Results files should have the following format and be stored as json files:
-```json
-{
-    "config": {
-        "model_dtype": "torch.float16", # or torch.bfloat16 or 8bit or 4bit
-        "model_name": "path of the model on the hub: org/model",
-        "model_sha": "revision on the hub",
-    },
-    "results": {
-        "task_name": {
-            "metric_name": score,
-        },
-        "task_name2": {
-            "metric_name": score,
-        }
-    }
-}
-```
-Request files are created automatically by this tool.
 If you encounter a problem on the Space, don't hesitate to restart it; this removes the created eval-queue, eval-queue-bk, eval-results and eval-results-bk folders.
 # Code logic for more complex edits
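
Note on the results-file layout shown in this hunk: the sketch below is one way a contributor might sanity-check a results file before uploading it. It is not part of the leaderboard code. `validate_results` is a hypothetical helper, and it assumes that the three `config` keys are required strings and that every `score` is a number; the README does not state either rule explicitly.

```python
import json


def validate_results(payload: dict) -> list[str]:
    """Return a list of problems found in a results payload (empty if none)."""
    problems = []

    # Assumption: all three config keys are required and must be strings.
    config = payload.get("config")
    if not isinstance(config, dict):
        problems.append("missing 'config' object")
    else:
        for key in ("model_dtype", "model_name", "model_sha"):
            if not isinstance(config.get(key), str):
                problems.append(f"config.{key} must be a string")

    # Assumption: results maps task name -> {metric name -> numeric score}.
    results = payload.get("results")
    if not isinstance(results, dict) or not results:
        problems.append("missing or empty 'results' object")
    else:
        for task, metrics in results.items():
            if not isinstance(metrics, dict):
                problems.append(f"results.{task} must be an object")
                continue
            for metric, score in metrics.items():
                # bool is a subclass of int, so reject it explicitly.
                if isinstance(score, bool) or not isinstance(score, (int, float)):
                    problems.append(f"results.{task}.{metric} must be a number")

    return problems


# Illustrative payload; the score 0.5 is a made-up placeholder value.
example = json.loads("""
{
    "config": {
        "model_dtype": "torch.float16",
        "model_name": "org/model",
        "model_sha": "main"
    },
    "results": {
        "task_name": {"metric_name": 0.5}
    }
}
""")

problems = validate_results(example)
```

Running the check on a file loaded with `json.load` and refusing to upload when the returned list is non-empty would catch the most common layout mistakes, such as a trailing comma or a score left as a string.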

 The Open Financial LLM Leaderboard aims to set a new standard in evaluating the capabilities of language models in the financial domain, offering a specialized, real-world-focused benchmarking solution.
+# Contribute to OFLL
+To make the leaderboard more accessible for external contributors, we offer clear guidelines for adding tasks, updating result files, and other maintenance activities.
+1. **Primary Files**:
+   - `src/env.py`: Modify variables like repository paths for customization.
+   - `src/about.py`: Update task configurations here to add new datasets.
+2. **Adding New Tasks**:
+   - Navigate to `src/about.py` and specify new tasks in the `Tasks` enum section.
+   - Each task requires details such as `benchmark`, `metric`, `col_name`, and `category`. For example:
+     ```python
+     taskX = Task("DatasetName", "MetricType", "ColumnName", category="Category")
+     ```
+3. **Updating Results Files**:
+   - Results files should be in JSON format and structured as follows:
+     ```json
+     {
+         "config": {
+             "model_dtype": "torch.float16",
+             "model_name": "path of the model on the hub: org/model",
+             "model_sha": "revision on the hub"
+         },
+         "results": {
+             "task_name": {
+                 "metric_name": score
+             },
+             "task_name2": {
+                 "metric_name": score
+             }
+         }
+     }
+     ```
+4. **Updating Leaderboard Data**:
+   - When a new task is added, ensure that the results JSON files reflect this update. This process will be automated in future releases.
+   - Access the current results at [Hugging Face Datasets](https://huggingface.co/datasets/TheFinAI/results/tree/main/demo-leaderboard).
+5. **Useful Links**:
+   - [Hugging Face Leaderboard Documentation](https://huggingface.co/docs/leaderboards/en/leaderboards/building_page)
+   - [OFLL Demo on Hugging Face](https://huggingface.co/spaces/finosfoundation/Open-Financial-LLM-Leaderboard)
 If you encounter a problem on the Space, don't hesitate to restart it; this removes the created eval-queue, eval-queue-bk, eval-results and eval-results-bk folders.
 # Code logic for more complex edits
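
Note on the "Adding New Tasks" step in this hunk: the following is a minimal, self-contained sketch of how a `Tasks` enum built from `Task` entries could look. The field names `benchmark`, `metric`, `col_name`, and `category` come from the README text; the dataclass definition, the second entry, and the `columns_by_category` helper are assumptions made for illustration and may not match the actual `src/about.py`.

```python
from dataclasses import dataclass
from enum import Enum


# Assumed shape of Task, inferred from the README's field list.
@dataclass(frozen=True)
class Task:
    benchmark: str   # dataset key as it appears in the results JSON
    metric: str      # metric key to read for that dataset
    col_name: str    # column header shown on the leaderboard
    category: str    # grouping used to organize columns


class Tasks(Enum):
    # First entry mirrors the README example; the second is hypothetical.
    task0 = Task("DatasetName", "MetricType", "ColumnName", category="Category")
    task1 = Task("OtherDataset", "MetricType", "OtherColumn", category="Category")


def columns_by_category(tasks) -> dict[str, list[str]]:
    """Group leaderboard column names by task category (hypothetical helper)."""
    grouped: dict[str, list[str]] = {}
    for member in tasks:
        grouped.setdefault(member.value.category, []).append(member.value.col_name)
    return grouped


grouped = columns_by_category(Tasks)
```

Under these assumptions, adding a dataset is a one-line change: append a new enum member, then make sure the results JSON files contain a matching `benchmark` and `metric` key.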

src/about.py CHANGED

@@ -194,12 +194,10 @@ If everything is done, check you can launch the EleutherAIHarness on your model
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
 CITATION_BUTTON_TEXT = r"""
-@misc{xie2024finben,
-          title={The FinBen: An Holistic Financial Benchmark for Large Language Models},
-          author={Qianqian Xie and Weiguang Han and Zhengyu Chen and Ruoyu Xiang and Xiao Zhang and Yueru He and Mengxi Xiao and Dong Li and Yongfu Dai and Duanyu Feng and Yijing Xu and Haoqiang Kang and Ziyan Kuang and Chenhan Yuan and Kailai Yang and Zheheng Luo and Tianlin Zhang and Zhiwei Liu and Guojun Xiong and Zhiyang Deng and Yuechen Jiang and Zhiyuan Yao and Haohang Li and Yangyang Yu and Gang Hu and Jiajia Huang and Xiao-Yang Liu and Alejandro Lopez-Lira and Benyou Wang and Yanzhao Lai and Hao Wang and Min Peng and Sophia Ananiadou and Jimin Huang},
-          year={2024},
-          eprint={2402.12659},
-          archivePrefix={arXiv},
-          primaryClass={cs.CL}
-        }
 """

 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
 CITATION_BUTTON_TEXT = r"""
+@article{Xie2024FinBen,
+  title={FinBen: A Holistic Financial Benchmark for Large Language Models},
+  author={Qianqian Xie and Weiguang Han and Zhengyu Chen and Ruoyu Xiang and Xiao Zhang and Yueru He and Mengxi Xiao and Dong Li and Yongfu Dai and Duanyu Feng and Yijing Xu and Haoqiang Kang and Ziyan Kuang and Chenhan Yuan and Kailai Yang and Zheheng Luo and Tianlin Zhang and Zhiwei Liu and Guojun Xiong and Zhiyang Deng and Yuechen Jiang and Zhiyuan Yao and Haohang Li and Yangyang Yu and Gang Hu and Jiajia Huang and Xiao-Yang Liu and Alejandro Lopez-Lira and Benyou Wang and Yanzhao Lai and Hao Wang and Min Peng and Sophia Ananiadou and Jimin Huang},
+  journal={NeurIPS, Special Track on Datasets and Benchmarks},
+  year={2024},
+}
 """