stereoplegic's Collections
Large-Scale Automatic Audiobook Creation (arXiv:2309.03926)
UniAudio: An Audio Foundation Model Toward Universal Audio Generation (arXiv:2310.00704)
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts (arXiv:2309.11977)
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models (arXiv:2308.16692)
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining (arXiv:2308.05734)
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer (arXiv:2308.06873)
Fewer-token Neural Speech Codec with Time-invariant Codes (arXiv:2310.00014)
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation (arXiv:2306.00789)
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models (arXiv:2309.10707)
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling (arXiv:2311.00430)
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data (arXiv:2309.13876)
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models (arXiv:2309.15701)
Massive End-to-end Models for Short Search Queries (arXiv:2309.12963)
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition (arXiv:2310.06434)
MSTRE-Net: Multistreaming Acoustic Modeling for Automatic Lyrics Transcription (arXiv:2108.02625)
TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models (arXiv:2309.01947)
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency (arXiv:2311.02772)
FLAP: Fast Language-Audio Pre-training (arXiv:2311.01615)
UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation (arXiv:2303.05668)
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification (arXiv:2305.17394)
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models (arXiv:2305.17651)
PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations (arXiv:2203.16965)
Task-Agnostic Structured Pruning of Speech Representation Models (arXiv:2306.01385)
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation (arXiv:2305.11685)
Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition (arXiv:2303.13072)
SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts (arXiv:2105.03036)
Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition (arXiv:2307.05956)
Continual Learning for Monolingual End-to-End Automatic Speech Recognition (arXiv:2112.09427)
From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation (arXiv:2304.08953)
Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition (arXiv:2209.15176)
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation (arXiv:2309.08876)
Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system (arXiv:2211.01571)
E-Branchformer: Branchformer with Enhanced merging for speech recognition (arXiv:2210.00077)
Semi-Autoregressive Streaming ASR With Label Context (arXiv:2309.10926)
Augmenting text for spoken language understanding with Large Language Models (arXiv:2309.09390)
Audiobox: Unified Audio Generation with Natural Language Prompts (arXiv:2312.15821)
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis (arXiv:2312.03491)
Efficient Monotonic Multihead Attention (arXiv:2312.04515)
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models (arXiv:2311.07919)
Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data (arXiv:2311.06753)
DITTO: Diffusion Inference-Time T-Optimization for Music Generation (arXiv:2401.12179)
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion (arXiv:2401.11053)
FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder (arXiv:2401.10032)
BAE-Net: A Low complexity and high fidelity Bandwidth-Adaptive neural network for speech super-resolution (arXiv:2312.13722)
Incremental FastPitch: Chunk-based High Quality Text to Speech (arXiv:2401.01755)
CoMoSVC: Consistency Model-based Singing Voice Conversion (arXiv:2401.01792)
Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction (arXiv:2401.06387)
Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder (arXiv:2311.14957)
ModaVerse: Efficiently Transforming Modalities with LLMs (arXiv:2401.06395)
Boosting Large Language Model for Speech Synthesis: An Empirical Study (arXiv:2401.00246)
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition (arXiv:2209.08326)
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit (arXiv:2312.09911)
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer (arXiv:2401.16658)
SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems (arXiv:2401.03945)
Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis (arXiv:2103.03541)
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity (arXiv:2402.08846)
Bilingual End-to-End ASR with Byte-Level Subwords (arXiv:2205.00485)
Speak While You Think: Streaming Speech Synthesis During Text Generation (arXiv:2309.11210)