Where is the test dataset?
Checking https://github.com/rohan598/ConTextual/blob/main/data/contextual_all.csv and https://huggingface.co/datasets/ucla-contextual/contextual_all respectively, I only found the full dataset, not the test dataset. Where is the test dataset?
The description is ambiguous about which split (train, test, or full) the leaderboard uses to report its evaluation results. It would help to state the evaluation dataset explicitly in the leaderboard's documentation.
Hi @zhiminy,
Apologies for the confusion.
We have two leaderboards: (val) and (test).
For the val leaderboard, please use contextual_val.csv.
For the test leaderboard, please use contextual_all.csv.
Note: this is an evaluation-only benchmark, so there are no training samples. The "train" split shown in that image is a naming convention of the platform (we will look into changing it).
The val leaderboard gives you a quick idea of how well your model might perform on the overall dataset and how well it understands these contextual tasks on text-rich images.
The test leaderboard is the final evaluation of your model's performance on all the samples in this dataset.
To prevent over-engineering of the benchmark, we release only part of the (image, instruction, response) triplets (100 out of 506) for validation, while keeping the rest hidden.
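For reference, here is a minimal sketch of loading the two splits mentioned above with pandas. The raw-file URLs are assumptions derived from the GitHub blob link earlier in this thread, and contextual_val.csv is assumed to sit alongside contextual_all.csv in the data/ directory.
```python
# Minimal sketch (assumption: the raw URLs below mirror the GitHub blob link
# shared in this thread, and contextual_val.csv lives in the same data/ folder).
import pandas as pd

BASE = "https://raw.githubusercontent.com/rohan598/ConTextual/main/data"

val = pd.read_csv(f"{BASE}/contextual_val.csv")   # 100 released samples -> val leaderboard
full = pd.read_csv(f"{BASE}/contextual_all.csv")  # all 506 samples -> test leaderboard

print(len(val), len(full))  # expected per the thread: 100 and 506
```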
Thanks for your explanation! Considering that "all" actually refers to "test," it would be beneficial to standardize the terminology to avoid any potential confusion among users.
Thanks for spotting this and for the suggestion. We have updated all ConTextual resources for consistency. If you still find something misaligned, feel free to reopen this issue!