[
{
"id": "rhythm_audio_0",
"input_path": "/input/rhythm/audio_0.mp3",
"text": "Say the following sentence very slowly: \"The quick brown fox jumps over the lazy dog.\"",
"task": "Rhythm control capabilities",
"task_description": "Can the model adjust the output pace, speaking faster or slower as required?",
"output_path_4o": "/output/ChatGPT-4o/rhythm/audio_0/audio_0.wav",
"output_path_miniomni": "/output/Mini-Omni/rhythm/00.wav",
"output_path_speechgpt": "/output/SpeechGPT/rhythm/answer_0.wav",
"output_path_funaudio": "/output/FunAudioLLM/rhythm/audio_0.wav",
"text_cn": "非常缓慢地说以下句子:“快速的棕狐跳过了懒狗。”",
"language": "English",
"category": "Education",
"output_path_4o_cascade": "/output/cascade/rhythm/audio_0.wav",
"output_path_4o_llama_omni": "/output/LLaMA_omni/rhythm/audio_0.wav",
"level": "L2"
},
{
"id": "rhythm_audio_3",
"input_path": "/input/rhythm/audio_3.mp3",
"text": "Say the first half of this sentence quickly and the second half slowly: \"In a world full of challenges, we need to find creative solutions to make things better.\"",
"task": "Rhythm control capabilities",
"task_description": "Can the model adjust the output pace, speaking faster or slower as required?",
"output_path_4o": "/output/ChatGPT-4o/rhythm/audio_3/audio_3.wav",
"output_path_miniomni": "/output/Mini-Omni/rhythm/01.wav",
"output_path_speechgpt": "/output/SpeechGPT/rhythm/answer_3.wav",
"output_path_funaudio": "/output/FunAudioLLM/rhythm/audio_1.wav",
"text_cn": "快速说出这句话的前半部分,慢速说出后半部分:“在一个充满挑战的世界中,我们需要找到创造性的解决方案来使事情变得更好。”",
"language": "English",
"category": "Education",
"output_path_4o_cascade": "/output/cascade/rhythm/audio_3.wav",
"output_path_4o_llama_omni": "/output/LLaMA_omni/rhythm/audio_3.wav",
"level": "L2"
},
{
"id": "rhythm_audio_7",
"input_path": "/input/rhythm/audio_7.mp3",
"text": "First, say this sentence as fast as you can, then repeat it very slowly: \"The rain in Spain falls mainly in the plain.\"",
"task": "Rhythm control capabilities",
"task_description": "Can the model adjust the output pace, speaking faster or slower as required?",
"output_path_4o": "/output/ChatGPT-4o/rhythm/audio_7/audio_7.wav",
"output_path_miniomni": "/output/Mini-Omni/rhythm/02.wav",
"output_path_speechgpt": "/output/SpeechGPT/rhythm/answer_7.wav",
"output_path_funaudio": "/output/FunAudioLLM/rhythm/audio_2.wav",
"text_cn": "首先,尽可能快地说这句话,然后非常缓慢地重复:“西班牙的降雨主要落在平原上。”",
"language": "English",
"category": "Education",
"output_path_4o_cascade": "/output/cascade/rhythm/audio_7.wav",
"output_path_4o_llama_omni": "/output/LLaMA_omni/rhythm/audio_7.wav",
"level": "L2"
},
{
"id": "rhythm_audio_18",
"input_path": "/input/rhythm/audio_18.mp3",
"text": "Repeat the phrase 'Practice makes perfect' three times, each time at a different speed: first very slowly, then moderately, and finally very fast.",
"task": "Rhythm control capabilities",
"task_description": "Can the model adjust the output pace, speaking faster or slower as required?",
"output_path_4o": "/output/ChatGPT-4o/rhythm/audio_18/audio_18.wav",
"output_path_miniomni": "/output/Mini-Omni/rhythm/03.wav",
"output_path_speechgpt": "/output/SpeechGPT/rhythm/answer_18.wav",
"output_path_funaudio": "/output/FunAudioLLM/rhythm/audio_3.wav",
"text_cn": "将“熟能生巧”这句话重复三次,每次速度不同:首先非常缓慢,然后中速,最后非常快。",
"language": "English",
"category": "Education",
"output_path_4o_cascade": "/output/cascade/rhythm/audio_18.wav",
"output_path_4o_llama_omni": "/output/LLaMA_omni/rhythm/audio_18.wav",
"level": "L2"
},
{
"id": "rhythm_audio_22",
"input_path": "/input/rhythm/audio_22.mp3",
"text": "Say the following sentence as if you're whispering: \"This is a secret, you can't tell anyone.\"",
"task": "Rhythm control capabilities",
"task_description": "Can the model adjust the output pace, speaking faster or slower as required?",
"output_path_4o": "/output/ChatGPT-4o/rhythm/audio_22/audio_22.wav",
"output_path_miniomni": "/output/Mini-Omni/rhythm/04.wav",
"output_path_speechgpt": "/output/SpeechGPT/rhythm/answer_22.wav",
"output_path_funaudio": "/output/FunAudioLLM/rhythm/audio_4.wav",
"text_cn": "说以下句子,好像您在窃窃私语:“这是一个秘密,您不能告诉任何人。”",
"language": "English",
"category": "Education",
"output_path_4o_cascade": "/output/cascade/rhythm/audio_22.wav",
"output_path_4o_llama_omni": "/output/LLaMA_omni/rhythm/audio_22.wav",
"level": "L2"
},
{
"id": "rhythm_audio_23",
"input_path": "/input/rhythm/audio_23.mp3",
"text": "Repeat this sentence very loudly as if you are trying to get someone's attention: \"Watch out! There's something coming your way!\"",
"task": "Rhythm control capabilities",
"task_description": "Can the model adjust the output pace, speaking faster or slower as required?",
"output_path_4o": "/output/ChatGPT-4o/rhythm/audio_23/audio_23.wav",
"output_path_miniomni": "/output/Mini-Omni/rhythm/05.wav",
"output_path_speechgpt": "/output/SpeechGPT/rhythm/answer_23.wav",
"output_path_funaudio": "/output/FunAudioLLM/rhythm/audio_5.wav",
"text_cn": "非常大声地重复这句话,好像您想引起某人的注意:“当心!有东西朝您这边过来了!”",
"language": "English",
"category": "Education",
"output_path_4o_cascade": "/output/cascade/rhythm/audio_23.wav",
"output_path_4o_llama_omni": "/output/LLaMA_omni/rhythm/audio_23.wav",
"level": "L2"
},
{
"id": "rhythm_input_audio_3",
"input_path": "/input/rhythm/input_audio_3.wav",
"text": "Say the following sentence at my speed first, then say it again very slowly: \"Artificial intelligence is changing the world in many ways.\"",
"task": "Rhythm control capabilities",
"task_description": "Can the model adjust the output pace, speaking faster or slower as required?",
"output_path_4o": "/output/ChatGPT-4o/rhythm/4o_audio_3.wav",
"output_path_miniomni": "/output/Mini-Omni/rhythm/mini-omni_03.wav",
"output_path_speechgpt": "/output/SpeechGPT/rhythm/SpeechGPT_answer_3.wav",
"output_path_funaudio": "/output/FunAudioLLM/rhythm/FunAudio_audio_3.1.wav",
"text_cn": "先按我的速度说出下面这句话,然后再非常缓慢地说一遍:“人工智能正在以多种方式改变世界。”",
"language": "English",
"category": "Education",
"output_path_4o_cascade": "/output/cascade/rhythm/input_audio_3.wav",
"output_path_4o_llama_omni": "/output/LLaMA_omni/rhythm/input_audio_3.wav",
"level": "L3"
}
]