metadata
license: apache-2.0
metrics:
- accuracy
pipeline_tag: audio-classification
Speech Detection
This model targets the sounds that occur in phone-call scenarios, such as:
sound | description |
---|---|
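bell | ringing tone |
music | music |
mute | silence (no sound at all) |
noise | noise (relatively loud noise) |
noise_mute | ambient sound (also noise, but quieter) |
voice | speech (the user speaking; far-field speech is treated as ambient sound) |
voicemail | voicemail (the carrier's voicemail announcement) |
white_noise | white noise (usually caused by the phone line; a humming sound) |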
This model classifies the sounds above into two labels, "non_voice" and "voice", as follows:
sound | label |
---|---|
bell | non_voice |
music | non_voice |
mute | non_voice |
noise | non_voice |
noise_mute | non_voice |
voice | voice |
voicemail | voice |
white_noise | voice |
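The mapping in the table above can be written down as a simple lookup. The following is a minimal sketch, not code shipped with the model: the names `SOUND_TO_LABEL` and `to_binary_label` are hypothetical, and the dictionary entries are copied verbatim from the table.

```python
# Sound-to-label mapping, copied row by row from the table above.
# The names SOUND_TO_LABEL and to_binary_label are illustrative only.
SOUND_TO_LABEL = {
    "bell": "non_voice",
    "music": "non_voice",
    "mute": "non_voice",
    "noise": "non_voice",
    "noise_mute": "non_voice",
    "voice": "voice",
    "voicemail": "voice",
    "white_noise": "voice",
}


def to_binary_label(sound: str) -> str:
    """Return "voice" or "non_voice" for one of the eight sound categories."""
    try:
        return SOUND_TO_LABEL[sound]
    except KeyError:
        raise ValueError(f"unknown sound category: {sound!r}") from None
```

A lookup like this could be used to collapse fine-grained annotations into the two training labels, for example `to_binary_label("voicemail")` gives `"voice"`.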
Accuracy:
label | accuracy |
---|---|
non_voice | 95.27% |
voice | 95.48% |
total | 95.35% |
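The card does not state how these figures were computed. One common reading, assumed here, is that each per-label value is the fraction of samples with that true label that were predicted correctly, and `total` is the fraction correct over all samples. The sketch below shows that calculation; the function name `accuracy_by_label` is hypothetical, and the evaluation data itself is not part of this card.

```python
from collections import Counter


def accuracy_by_label(y_true, y_pred):
    """Per-label and overall accuracy (assumed definition, see lead-in).

    y_true and y_pred are equal-length sequences of label strings.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    # Number of samples per true label.
    totals = Counter(y_true)
    # Number of correctly predicted samples per true label.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    report = {label: correct[label] / n for label, n in totals.items()}
    report["total"] = sum(correct.values()) / len(y_true)
    return report
```

Under this definition, `total` is a sample-count-weighted average of the per-label values, which is consistent with it falling between the two per-label figures in the table.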