Request for the prompts used for the evaluation results
#5 by sh0416 - opened
I tried to reproduce the MBPP+ result and got a score of 0.4385, which differs from the reported score (0.474).
I suspect the difference comes from the prompt I used in my experiment.
My prompt is as follows:
Problem: Write a function to find the shared elements from the given two lists.
Test:
assert set(similar_elements((3, 4, 5, 6),(5, 7, 4, 10))) == set((4, 5))
assert set(similar_elements((1, 2, 3, 4),(5, 4, 3, 7))) == set((3, 4))
assert set(similar_elements((11, 12, 14, 13),(17, 15, 14, 13))) == set((13, 14))
Implementation:
```python
Which prompt did you use for the reported evaluation?
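To make the comparison easier, here is a minimal sketch of how a prompt in the layout shown above can be assembled. The helper name `build_prompt` and its parameters are illustrative only and are not taken from any evaluation harness. Whether the real prompt ends with a trailing newline is unknown, so this sketch adds none.

```python
FENCE = "`" * 3  # three backticks, built this way so no literal fence appears in the code


def build_prompt(problem: str, tests: list[str]) -> str:
    """Assemble a prompt in the Problem / Test / Implementation layout (hypothetical helper)."""
    lines = [f"Problem: {problem}", "Test:"]
    lines.extend(tests)
    lines.append("Implementation:")
    # The prompt ends with an open Python fence so the model continues with code.
    lines.append(f"{FENCE}python")
    return "\n".join(lines)


prompt = build_prompt(
    "Write a function to find the shared elements from the given two lists.",
    [
        "assert set(similar_elements((3, 4, 5, 6),(5, 7, 4, 10))) == set((4, 5))",
        "assert set(similar_elements((1, 2, 3, 4),(5, 4, 3, 7))) == set((3, 4))",
        "assert set(similar_elements((11, 12, 14, 13),(17, 15, 14, 13))) == set((13, 14))",
    ],
)
print(prompt)
```

If the reported score was obtained with a different layout (for example a chat template or an instruction prefix), only this function would need to change on my side.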
Thank you