ABOUT_TEXT = """# What is RACE benchmark? RACE is a multi-dimensional benchmark for code generation that focuses on **R**eadability, m**A**intainability, **C**orrectness, and **E**fficiency. Its goal is to evaluate LLM's ability to generate code that is correct and meets the requirements of real-world development scenarios. The benchmark is designed with various real-world demands across different **_demand-dependent_** dimensions, making it more applicable to practical scenarios. # What are the specific aspects to be evaluated? We have summarized representative influencing factors in real-world scenarios for different dimensions and designed various requirements for each factor. These have been incorporated into the task description to prompt the LLM to generate code that is correct and meets the specified requirements. The specific factors are as follows: - **Readability**: The code should be easy to read and understand. - `Comment` - `Naming Convention` - `Code Length` - **Maintainability**: The code should be easy to maintain and extend. - `MI Metric` - `Modularity` - **Efficiency**: The code should be efficient in terms of time and space complexity. - `Time Complexity` - `Space Complexity` # How to evaluate? To facilitate evaluation on the RACE benchmark, we provide the evaluation data and easy-to-use evaluation scripts in our 🏎️RACE GitHub repository. Additionally, factors involving execution-based evaluation are conducted in a virtual environment to ensure evaluation security. # Contact If you have any questions, feel free to reach out to us at [zhengjiasheng2022@iscas.ac.cn](mailto:zhengjiasheng2022@iscas.ac.cn). # Citation Information ```bibtex ``` """ CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results" CITATION_BUTTON_TEXT = r""" """ ACKNOWLEDGEMENT_TEXT = """ Inspired from the [πŸ€— Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). """ NOTES_TEXT = """ **Notes:** - `πŸ’― RACE Score` denotes the final evaluation result based on 🏎️RACE benchmark, which is the average of the scores in the four dimensions: `βœ… Correctness`, `πŸ“– Readability`, `πŸ”¨ Maintainability`, and `πŸš€ Efficiency`. - All fine-grained evaluation results are provided in `⏬ Hidden Columns`. `πŸ“– R` denotes code **R**eadability, `πŸ”¨ M` denotes code **M**aintainability, and `πŸš€ E` denotes code **E**fficiency. `*` denotes the correctness of the code in the corresponding dimension. More details about the abbreviations are as follows: - `πŸ“– R*`: The code accuracy (baesline). - `πŸ“– RN`: The proportion of code that is both functionally correct and follows customized instructions related to `Naming Convention`. - `πŸ“– RL`: The proportion of code that is both functionally correct and follows customized instructions related to `Code Length`. - `πŸ“– RC`: The proportion of code that is both functionally correct and follows customized instructions related to `Comment`. - `πŸ”¨ MI*`: The code accuracy related to `Maintainability Index` (baesline). - `πŸ”¨ MI`: The proportion of code that is both functionally correct and follows customized instructions related to `MI Metric`. - `πŸ”¨ MC*`: The code accuracy related to `Modularity` (baesline). - `πŸ”¨ MC`: The proportion of code that is both functionally correct and follows customized instructions related to `Modularity`. - `πŸš€ E*`: The code accuracy (baesline). - `πŸš€ E_NI_T`: The proportion of code that is both functionally correct and follows customized instructions related to `Time Complexity`. 
  - `🚀 E_NI_S`: The proportion of code that is both functionally correct and follows customized instructions related to `Space Complexity`.
- Regarding the types of evaluation results, `🔨 MI`, `🚀 E_NI_T`, and `🚀 E_NI_S` are scalar values ranging from 0 to 100, while the remaining metrics are percentages.
- For more explanation, check the 📝 About section.
"""
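
# Illustrative sketch, not part of the official RACE tooling: the notes above state that
# the overall RACE Score is the average of the four dimension scores. The helper below is
# a hypothetical function (name and signature are assumptions for illustration only) that
# makes that aggregation explicit.
def compute_race_score(correctness: float, readability: float,
                       maintainability: float, efficiency: float) -> float:
    """Return the overall RACE Score as the mean of the four dimension scores (each 0-100)."""
    return (correctness + readability + maintainability + efficiency) / 4.0

# Example: compute_race_score(85.0, 70.0, 60.0, 75.0) -> 72.5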