Evaluation is not consistent with local evaluation

#29
by morenolq - opened

Hi, I tried to use the Space to evaluate some models on the summarization task. However, the results I got are very different from those I got when running the evaluation on my local machine. I used the same metric configuration (rouge) and, of course, the same package (evaluate) and the same test sets.

Is this a known issue? Is there any way to fix it?

Hi! Can you share your setup in both evaluate and the configuration used here so we can reproduce? Thanks!
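To answer this, here is a minimal sketch of how the local side of the setup could be captured. It assumes the local run used `evaluate.load("rouge")`. `rouge_types`, `use_stemmer`, and `use_aggregator` are keyword arguments of that metric's `compute` method, but the values below are hypothetical placeholders. The list of packages whose versions get recorded is also an assumption.

```python
import importlib.metadata
import json
import platform

# Hypothetical example configuration: keyword arguments passed to evaluate's
# ROUGE metric. Replace these values with the ones actually used locally.
ROUGE_CONFIG = {
    "rouge_types": ["rouge1", "rouge2", "rougeL", "rougeLsum"],
    "use_stemmer": True,
    "use_aggregator": True,
}

# Assumed list of packages whose versions may influence ROUGE scores.
PACKAGES = ["evaluate", "rouge_score", "nltk"]


def package_version(name: str) -> str:
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return "not installed"


def describe_setup(config: dict) -> dict:
    """Collect the details needed to reproduce a local ROUGE run."""
    return {
        "python": platform.python_version(),
        "packages": {name: package_version(name) for name in PACKAGES},
        "rouge_config": config,
    }


def run_local_rouge(predictions: list, references: list, config: dict) -> dict:
    """Compute ROUGE locally with exactly the keyword arguments being reported."""
    import evaluate  # imported lazily so describe_setup works without it

    rouge = evaluate.load("rouge")
    return rouge.compute(predictions=predictions, references=references, **config)


if __name__ == "__main__":
    # Print the setup as JSON so it can be pasted into the discussion.
    print(json.dumps(describe_setup(ROUGE_CONFIG), indent=2))
```

You can post the printed JSON alongside the scores you got locally and the settings chosen in the Space. Anyone can then rerun `run_local_rouge` with identical arguments and compare the two results directly.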
