Madhavan Iyengar committed 8b5c523 (1 parent: 94f782b)
update about and submit pages
Files changed:
- app.py +2 -2
- src/about.py +5 -5
app.py
CHANGED
@@ -282,8 +282,8 @@ with demo:
 
 with gr.Row():
     model_name_textbox = gr.Textbox(label="Model name")
-    model_zip_file = gr.File(label="Upload model ZIP file")
-    model_link_textbox = gr.Textbox(label="
+    model_zip_file = gr.File(label="Upload model prediction result ZIP file")
+    model_link_textbox = gr.Textbox(label="Link to model page")
 with gr.Row():
     gr.Column()
     with gr.Column(scale=2):
src/about.py
CHANGED
@@ -39,7 +39,7 @@ NUM_FEWSHOT = 0 # Change with your few shot
 
 
 # Your leaderboard name
-TITLE = """<h1 align="center" id="space-title"
+TITLE = """<h1 align="center" id="space-title">🏠💬 3D-POPE Leaderboard 🏅</h1>
 <p><center>
 <a href="https://3d-grand.github.io/" target="_blank">[Project Page]</a>
 <a href="https://www.dropbox.com/scl/fo/5p9nb4kalnz407sbqgemg/AG1KcxeIS_SUoJ1hoLPzv84?rlkey=weunabtbiz17jitfv3f4jpmm1&dl=0" target="_blank">[3D-GRAND Data]</a>
@@ -49,7 +49,7 @@ TITLE = """<h1 align="center" id="space-title">3D-POPE Leaderboard</h1>
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-#### This is the official leaderboard for the 3D Polling-based Object Probing Evaluation (3D-POPE) benchmark.
+#### This is the official leaderboard for the 3D Polling-based Object Probing Evaluation (3D-POPE) benchmark. 3D-POPE is a benchmark to evaluate object hallucination in 3D LLMs from the work [3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination](https://3d-grand.github.io/).
 """
 
 # Which evaluations are you running? how can people reproduce what you have?
@@ -58,13 +58,13 @@ LLM_BENCHMARKS_TEXT = f"""
 ### To systematically evaluate the hallucination behavior of 3D-LLMs, we introduce the 3D Polling-based Object Probing Evaluation (3D-POPE) benchmark. 3D-POPE is designed to assess a model's ability to accurately identify the presence or absence of objects in a given 3D scene.
 
 ## Dataset
-To facilitate the 3D-POPE benchmark, we curate a dedicated dataset from the ScanNet dataset, utilizing the semantic classes from ScanNet200. Specifically, we use the ScanNet validation set as the foundation for evaluating 3D-LLMs on the 3D-POPE benchmark.
+To facilitate the 3D-POPE benchmark, we curate a dedicated dataset from the [ScanNet](https://arxiv.org/abs/1702.04405) dataset, utilizing the semantic classes from [ScanNet200](https://arxiv.org/abs/2204.07761). Specifically, we use the ScanNet validation set as the foundation for evaluating 3D-LLMs on the 3D-POPE benchmark.
 
-Benchmark design. 3D-POPE consists of a set of triples, each comprising a 3D scene, a posed question, and a binary answer (“Yes” or “No”) indicating the presence or absence of an object
+Benchmark design. 3D-POPE consists of a set of triples, each comprising a 3D scene, a posed question, and a binary answer (“Yes” or “No”) indicating the presence or absence of an object. To ensure a balanced dataset, we maintain a 1:1 ratio of existent to nonexistent objects when constructing these triples. For the selection of negative samples (i.e., nonexistent objects), we employ three distinct sampling strategies:
 
 • Random Sampling: Nonexistent objects are randomly selected from the set of objects not present in the 3D scene.\n
 • Popular Sampling: We select the top-k most frequent objects not present in the 3D scene, where k equals the number of objects currently in the scene.\n
-• Adversarial Sampling: For each positively identified object in the scene, we rank objects that are not present and have not been used as negative samples based on their frequency of co-occurrence with the positive object in the training dataset. The highest-ranking co-occurring object is then selected as the adversarial sample. This approach differs from the original POPE
+• Adversarial Sampling: For each positively identified object in the scene, we rank objects that are not present and have not been used as negative samples based on their frequency of co-occurrence with the positive object in the training dataset. The highest-ranking co-occurring object is then selected as the adversarial sample. This approach differs from the original [POPE](https://arxiv.org/abs/2305.10355) to avoid adversarial samples mirroring popular samples, as indoor scenes often contain similar objects.\n
 These sampling strategies are designed to challenge the model's robustness and assess its susceptibility to different levels of object hallucination.
 
 ## Metrics
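Note on the three negative sampling strategies described in LLM_BENCHMARKS_TEXT: the sketch below is one possible reading of that text, not the benchmark authors' code. The function names, the `frequency` counter, and the nested `cooccurrence` dictionary are all hypothetical, and the alphabetical tie-breaking in the adversarial case is an assumption the text does not specify.

```python
import random
from collections import Counter


def random_negatives(present, vocabulary, k=None, rng=None):
    """Random sampling: draw k objects uniformly from those absent in the scene."""
    rng = rng or random.Random()
    k = len(present) if k is None else k  # 1:1 ratio of existent to nonexistent
    absent = sorted(set(vocabulary) - set(present))
    return rng.sample(absent, min(k, len(absent)))


def popular_negatives(present, frequency, k=None):
    """Popular sampling: the k most frequent objects absent from the scene."""
    k = len(present) if k is None else k
    present_set = set(present)
    ranked = [obj for obj, _ in frequency.most_common() if obj not in present_set]
    return ranked[:k]


def adversarial_negatives(present, cooccurrence):
    """Adversarial sampling: for each present object, pick the absent and
    not-yet-used object that co-occurs with it most often."""
    present_set = set(present)
    used = set()
    negatives = []
    for obj in present:
        candidates = [
            (count, other)
            for other, count in cooccurrence.get(obj, {}).items()
            if other not in present_set and other not in used
        ]
        if not candidates:
            continue  # no unused absent object co-occurs with this one
        # Highest co-occurrence count wins; ties are broken alphabetically (assumption).
        _, best = min(candidates, key=lambda c: (-c[0], c[1]))
        used.add(best)
        negatives.append(best)
    return negatives
```

For a toy scene containing `chair` and `table`, with `sofa` and `lamp` as the two most frequent absent classes, popular sampling returns `sofa` and `lamp`. Adversarial sampling can return a different second negative (for example `sink`), because `sofa` is already used once it has been paired with `chair`.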