multiple_choice_score: there are 12032 tasks in prompt
multiple_choice_score: reading tasks
multiple_choice_score: failed to read task 1 of 12032