ABOUT_TEXT = """# What is RACE benchmark? | |
RACE is a multi-dimensional benchmark for code generation that focuses on **R**eadability, m**A**intainability, **C**orrectness, and **E**fficiency. | |
Its goal is to evaluate LLM's ability to generate code that is correct and meets the requirements of real-world development scenarios. | |
The benchmark is designed with various real-world demands across different **_demand-dependent_** dimensions, making it more applicable to practical scenarios. | |
# What are the specific aspects to be evaluated?
We have summarized representative influencing factors in real-world scenarios for each dimension and designed various requirements for each factor.
These have been incorporated into the task description to prompt the LLM to generate code that is correct and meets the specified requirements.
The specific factors are as follows (a sketch of mechanically checking one such factor follows the list):
- **Readability**: The code should be easy to read and understand.
    - `Comment`
    - `Naming Convention`
    - `Code Length`
- **Maintainability**: The code should be easy to maintain and extend.
    - `MI Metric`
    - `Modularity`
- **Efficiency**: The code should be efficient in terms of time and space complexity.
    - `Time Complexity`
    - `Space Complexity`
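
For instance, a customized `Naming Convention` requirement can be checked mechanically. As a hedged sketch (not the RACE implementation; `uses_snake_case` is a hypothetical helper), the functions defined in a candidate solution can be verified to use snake_case names:

```python
import ast
import re

SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")

def uses_snake_case(code: str) -> bool:
    # True only if every function defined in `code` has a snake_case name.
    tree = ast.parse(code)
    return all(
        SNAKE_CASE.match(node.name)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    )
```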
# How to evaluate?
To facilitate evaluation on the RACE benchmark, we provide the evaluation data and easy-to-use evaluation scripts in our 🏎️ RACE GitHub repository.
Additionally, factors that require execution-based evaluation are assessed in a virtual environment to keep the evaluation secure.
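
As a rough illustration of such sandboxing (a minimal sketch, not the actual RACE evaluation scripts; `run_candidate` is a hypothetical helper), a candidate solution can be executed in a separate process with a timeout so that hangs or crashes cannot affect the evaluator itself:

```python
import subprocess
import sys

def run_candidate(code: str, timeout_s: float = 10.0) -> bool:
    # Execute the candidate solution in a fresh Python process.
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # treat hangs as failures
    # A zero exit code means the embedded test assertions passed.
    return result.returncode == 0
```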
# Contact
If you have any questions, feel free to reach out to us at [[email protected]](mailto:[email protected]).
# Citation Information
```bibtex
```
"""
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""
"""
ACKNOWLEDGEMENT_TEXT = """
Inspired by the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
"""
NOTES_TEXT = """ | |
**Notes:** | |
- `π― RACE Score` denotes the final evaluation result based on ποΈRACE benchmark, which is the average of the scores in the four dimensions: `β Correctness`, `π Readability`, `π¨ Maintainability`, and `π Efficiency`. | |
- All fine-grained evaluation results are provided in `β¬ Hidden Columns`. `π R` denotes code **R**eadability, `π¨ M` denotes code **M**aintainability, and `π E` denotes code **E**fficiency. `*` denotes the correctness of the code in the corresponding dimension. More details about the abbreviations are as follows: | |
    - `📖 R*`: The code accuracy (baseline).
    - `📖 RN`: The proportion of code that is both functionally correct and follows customized instructions related to `Naming Convention`.
    - `📖 RL`: The proportion of code that is both functionally correct and follows customized instructions related to `Code Length`.
    - `📖 RC`: The proportion of code that is both functionally correct and follows customized instructions related to `Comment`.
    - `🎨 MI*`: The code accuracy related to `Maintainability Index` (baseline).
    - `🎨 MI`: The proportion of code that is both functionally correct and follows customized instructions related to `MI Metric`.
    - `🎨 MC*`: The code accuracy related to `Modularity` (baseline).
    - `🎨 MC`: The proportion of code that is both functionally correct and follows customized instructions related to `Modularity`.
    - `🚀 E*`: The code accuracy (baseline).
    - `🚀 E_NI_T`: The proportion of code that is both functionally correct and follows customized instructions related to `Time Complexity`.
    - `🚀 E_NI_S`: The proportion of code that is both functionally correct and follows customized instructions related to `Space Complexity`.
- Regarding the types of evaluation results, `🎨 MI`, `🚀 E_NI_T`, and `🚀 E_NI_S` are scalar values ranging from 0 to 100, while the remaining metrics are percentages.
- For more explanation, check the 📝 About section.
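
As a minimal sketch of how these numbers fit together (hypothetical helpers assuming per-sample boolean results, not the official RACE scoring code):

```python
def joint_pass_rate(correct: list[bool], follows: list[bool]) -> float:
    # Percentage of samples that both pass the functional tests AND
    # follow the customized instruction (e.g. `📖 RN` or `🎨 MC`).
    return 100.0 * sum(c and f for c, f in zip(correct, follows)) / len(correct)

def race_score(correctness: float, readability: float,
               maintainability: float, efficiency: float) -> float:
    # 🎯 RACE Score: the plain average of the four dimension scores (each 0-100).
    return (correctness + readability + maintainability + efficiency) / 4
```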
""" |