ABOUT_TEXT = """# What is RACE benchmark? | |
RACE is a multi-dimensional benchmark for code generation that focuses on **R**eadability, m**A**intainability, **C**orrectness, and **E**fficiency. | |
Its goal is to evaluate LLM's ability to generate code that is correct and meets the requirements of real-world development scenarios. | |
The benchmark is designed with various real-world demands across different **_demand-dependent_** dimensions, making it more applicable to practical scenarios. | |
# What are the specific aspects to be evaluated?
We have summarized representative influencing factors in real-world scenarios for each dimension and designed various requirements for each factor.
These have been incorporated into the task description to prompt the LLM to generate code that is correct and meets the specified requirements.
The specific factors are as follows (a sketch of mechanically checking one such factor follows the list):
- **Readability**: The code should be easy to read and understand.
    - `Comment`
    - `Naming Convention`
    - `Code Length`
- **Maintainability**: The code should be easy to maintain and extend.
    - `MI Metric`
    - `Modularity`
- **Efficiency**: The code should be efficient in terms of time and space complexity.
    - `Time Complexity`
    - `Space Complexity`
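
For instance, a customized `Naming Convention` requirement can be checked mechanically. As a hedged sketch (not the RACE implementation; `uses_snake_case` is a hypothetical helper), the functions defined in a candidate solution can be verified to use snake_case names:

```python
import ast
import re

SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")

def uses_snake_case(code: str) -> bool:
    # True only if every function defined in `code` has a snake_case name.
    tree = ast.parse(code)
    return all(
        SNAKE_CASE.match(node.name)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    )
```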
# How to evaluate?
To facilitate evaluation on the RACE benchmark, we provide the evaluation data and easy-to-use evaluation scripts in our 🏎️ RACE GitHub repository.
Additionally, factors that require execution-based evaluation are assessed in a virtual environment to keep the evaluation secure.
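
As a rough illustration of such sandboxing (a minimal sketch, not the actual RACE evaluation scripts; `run_candidate` is a hypothetical helper), a candidate solution can be executed in a separate process with a timeout so that hangs or crashes cannot affect the evaluator itself:

```python
import subprocess
import sys

def run_candidate(code: str, timeout_s: float = 10.0) -> bool:
    # Execute the candidate solution in a fresh Python process.
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # treat hangs as failures
    # A zero exit code means the embedded test assertions passed.
    return result.returncode == 0
```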
# Contact
If you have any questions, feel free to reach out to us at [[email protected]](mailto:[email protected]).
# Citation Information
```bibtex
```
"""
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""
"""
ACKNOWLEDGEMENT_TEXT = """
Inspired by the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
"""
NOTES_TEXT = """ | |
**Notes:** | |
- `π― RACE Score` denotes the final evaluation result based on ποΈRACE benchmark, which is the average of the scores in the four dimensions: `β Correctness`, `π Readability`, `π¨ Maintainability`, and `π Efficiency`. | |
- All fine-grained evaluation results are provided in `β¬ Hidden Columns`. `π R` denotes code **R**eadability, `π¨ M` denotes code **M**aintainability, and `π E` denotes code **E**fficiency. `*` denotes the correctness of the code in the corresponding dimension. More details about the abbreviations are as follows: | |
    - `📖 R*`: The code accuracy (baseline).
    - `📖 RN`: The proportion of code that is both functionally correct and follows customized instructions related to `Naming Convention`.
    - `📖 RL`: The proportion of code that is both functionally correct and follows customized instructions related to `Code Length`.
    - `📖 RC`: The proportion of code that is both functionally correct and follows customized instructions related to `Comment`.
    - `🎨 MI*`: The code accuracy related to `Maintainability Index` (baseline).
    - `🎨 MI`: The proportion of code that is both functionally correct and follows customized instructions related to `MI Metric`.
    - `🎨 MC*`: The code accuracy related to `Modularity` (baseline).
    - `🎨 MC`: The proportion of code that is both functionally correct and follows customized instructions related to `Modularity`.
    - `🚀 E*`: The code accuracy (baseline).
    - `🚀 E_NI_T`: The proportion of code that is both functionally correct and follows customized instructions related to `Time Complexity`.
    - `🚀 E_NI_S`: The proportion of code that is both functionally correct and follows customized instructions related to `Space Complexity`.
- Regarding the types of evaluation results, `🎨 MI`, `🚀 E_NI_T`, and `🚀 E_NI_S` are scalar values ranging from 0 to 100, while the remaining metrics are percentages.
- For more explanation, check the 📝 About section.
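
As a minimal sketch of how these numbers fit together (hypothetical helpers assuming per-sample boolean results, not the official RACE scoring code):

```python
def joint_pass_rate(correct: list[bool], follows: list[bool]) -> float:
    # Percentage of samples that both pass the functional tests AND
    # follow the customized instruction (e.g. `📖 RN` or `🎨 MC`).
    return 100.0 * sum(c and f for c, f in zip(correct, follows)) / len(correct)

def race_score(correctness: float, readability: float,
               maintainability: float, efficiency: float) -> float:
    # 🎯 RACE Score: the plain average of the four dimension scores (each 0-100).
    return (correctness + readability + maintainability + efficiency) / 4
```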
""" |