Space: jszheng/RACE_leaderboard


Jason Zheng committed on Jul 24

Commit 310a5d6 • Parent: 77e8689

add latest news


Files changed (2)

app.py +2 -18
text_content.py +10 -0

app.py CHANGED

@@ -4,7 +4,7 @@ import gradio as gr
 import pandas as pd
 from css_html import custom_css
-from text_content import ABOUT_TEXT, CITATION_BUTTON_TEXT, CITATION_BUTTON_LABEL, ACKNOWLEDGEMENT_TEXT, NOTES_TEXT
+from text_content import ABOUT_TEXT, CITATION_BUTTON_TEXT, CITATION_BUTTON_LABEL, ACKNOWLEDGEMENT_TEXT, NOTES_TEXT, HEAD_TEXT
 from utils import (
     AutoEvalColumn,
     fields,
@@ -66,23 +66,7 @@ with demo:
             elem_classes="markdown-text",
         )
-        gr.Markdown(
-            """
-            Based on the 🏎️RACE benchmark, we demonstrated the ability of different LLMs to generate code that is **_correct_** and **_meets the requirements of real-world development scenarios_**.
-            More details about how to evalute the LLM are available in the [🏎️RACE GitHub repository](https://github.com/jszheng21/RACE). For a complete description of RACE benchmark and related experimental analysis, please refer to the paper: [**Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models**](https://arxiv.org/abs/2407.11470). [![](https://img.shields.io/badge/arXiv-2407.11470-b31b1b.svg)](https://arxiv.org/abs/2407.11470)
-""",
-            elem_classes="markdown-text",
-        )
-#         gr.Markdown(
-#             """<div style="text-align: center;"><h1> 🏎️RACE Leaderboard</h1></div>\
-#             <br>\
-#             <p>Based on the 🏎️RACE benchmark, we demonstrated the ability of different LLMs to generate code that is <b><i>correct</i></b> and <b><i>meets the requirements of real-world development scenarios</i></b>.</p>
-#             <p>More details about how to evalute the LLM are available in the <a href="https://github.com/jszheng21/RACE">🏎️RACE GitHub repository</a>. For a complete description of RACE benchmark and related experimental analysis, please refer to the paper: Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models</p>
-# """,
-#             elem_classes="markdown-text",
-#         )
+        gr.Markdown(HEAD_TEXT, elem_classes="markdown-text")
     with gr.Tabs(elem_classes="tab-buttons") as tabs:
         with gr.Column():

text_content.py CHANGED

@@ -1,3 +1,13 @@
+HEAD_TEXT = """
+Based on the 🏎️RACE benchmark, we demonstrated the ability of different LLMs to generate code that is **_correct_** and **_meets the requirements of real-world development scenarios_**.
+More details about how to evalute the LLM are available in the [🏎️RACE GitHub repository](https://github.com/jszheng21/RACE). For a complete description of RACE benchmark and related experimental analysis, please refer to the paper: [Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models](https://arxiv.org/abs/2407.11470). [![](https://img.shields.io/badge/arXiv-2407.11470-b31b1b.svg)](https://arxiv.org/abs/2407.11470)
+**_Latest News_** 🔥
+- [24/07/24] We add the evaluation results of `claude-3.5-sonnet` and `Qwen2-72B-Instruct` in [RACE leaderboard](https://huggingface.co/spaces/jszheng/RACE_leaderboard).
+- [24/07/16] We release our RACE benchmark, leaderboard and paper.
+"""
 ABOUT_TEXT = """# What is RACE benchmark?
 RACE is a multi-dimensional benchmark for code generation that focuses on **R**eadability, m**A**intainability, **C**orrectness, and **E**fficiency.
 Its goal is to evaluate LLM's ability to generate code that is correct and meets the requirements of real-world development scenarios.
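The new `HEAD_TEXT` literal hand-writes each "Latest News" bullet with a `[YY/MM/DD]` date prefix. As the list grows, one option is to keep the entries as data and generate the Markdown from them. The sketch below is an assumption, not code from the RACE repository: `NEWS_ITEMS` and `render_news` are hypothetical names, and the sketch simply reproduces the two bullets added in this commit, newest first.

```python
from datetime import date

# Hypothetical structured source for the "Latest News" list (not part of the RACE repo).
# Each entry is a (date, Markdown text) pair; the text is copied from this commit.
NEWS_ITEMS = [
    (
        date(2024, 7, 24),
        "We add the evaluation results of `claude-3.5-sonnet` and `Qwen2-72B-Instruct` in "
        "[RACE leaderboard](https://huggingface.co/spaces/jszheng/RACE_leaderboard).",
    ),
    (date(2024, 7, 16), "We release our RACE benchmark, leaderboard and paper."),
]


def render_news(items):
    """Render (date, text) pairs as Markdown bullets, newest first, dated [YY/MM/DD]."""
    # Sort by date so the newest entry always appears at the top of the list.
    ordered = sorted(items, key=lambda item: item[0], reverse=True)
    # %y/%m/%d yields the two-digit year/month/day prefix used in HEAD_TEXT, e.g. 24/07/24.
    lines = [f"- [{d.strftime('%y/%m/%d')}] {text}" for d, text in ordered]
    return "**_Latest News_** 🔥\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_news(NEWS_ITEMS))
```

Keeping the entries as data puts the date format and ordering in one place; the cost is one more level of indirection than the plain string literal the commit uses. The rendered string could be appended to the header text that `app.py` passes to `gr.Markdown`.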