laion/larger_clap_music · Equally weighted output when using the HF transformers `pipeline` API

When I run the model through the transformers `pipeline` API, every candidate label gets a score of roughly 0.5:

```python
from transformers import pipeline

# Piano recording hosted in the transformers.js docs dataset
audio = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/piano.wav'

audio_classifier = pipeline(task="zero-shot-audio-classification", model="laion/larger_clap_music")
output = audio_classifier(audio, candidate_labels=['calm piano music', 'heavy metal'])
print(output)
# [{'score': 0.5001561641693115, 'label': 'calm piano music'}, {'score': 0.49984389543533325, 'label': 'heavy metal'}]
```

Is this a limitation of the model, or a usage error on my part? If the latter, could you update the model card with example code that shows the intended usage?

The example code from the README has the same issue:

```python
from datasets import load_dataset
from transformers import pipeline

# Last clip of the ESC-50 train split, as a raw NumPy array
dataset = load_dataset("ashraq/esc50")
audio = dataset["train"]["audio"][-1]["array"]

audio_classifier = pipeline(task="zero-shot-audio-classification", model="laion/larger_clap_music")
output = audio_classifier(audio, candidate_labels=["Sound of a dog", "Sound of vaccum cleaner"])
print(output)
# [{'score': 0.5000357627868652, 'label': 'Sound of a dog'}, {'score': 0.49996429681777954, 'label': 'Sound of vaccum cleaner'}]
```
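For reference: as far as I can tell from the transformers source, the zero-shot audio pipeline applies a softmax over the per-label logits (`logits_per_audio`). With two labels, a score p therefore pins down the gap between the two logits as log(p / (1 - p)). The plain-Python sketch below (no model download) plugs in the piano score reported above. The multipliers in the loop are purely hypothetical; they only illustrate how much larger the gap would have to be to give a confident score, and I have not checked what logit scale this checkpoint actually uses.

```python
import math


def two_way_softmax(logit_a: float, logit_b: float) -> tuple[float, float]:
    """Softmax over two logits, shifted by the max for numerical stability."""
    m = max(logit_a, logit_b)
    ea = math.exp(logit_a - m)
    eb = math.exp(logit_b - m)
    total = ea + eb
    return ea / total, eb / total


def implied_logit_gap(p: float) -> float:
    """With two labels, a softmax score p implies a logit gap of log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Score the pipeline reported for 'calm piano music' on piano.wav
reported = 0.5001561641693115
gap = implied_logit_gap(reported)
print(f"implied logit gap: {gap:.6f}")

# Hypothetical multipliers, only to show how large the gap must be for a confident score
for factor in (1.0, 100.0, 10000.0):
    p_piano, p_metal = two_way_softmax(gap * factor, 0.0)
    print(f"gap x {factor:g}: calm piano music={p_piano:.4f}, heavy metal={p_metal:.4f}")
```

The implied gap comes out at roughly 0.0006, so the model is scoring both prompts almost identically. Since the softmax just reflects the logits it receives, this suggests the problem lies in the logits themselves (the embeddings, or the scale applied to their similarity) rather than in the pipeline's post-processing.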