---
license: apache-2.0
language:
- en
tags:
- audio-captioning
- audiocaps
- clotho
- dcase-challenge
- icassp-24
---
## Summary
This repo contains the config & pretrained weights of the model described in the following paper:
- **Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation**  
  Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François Germain, Jonathan Le Roux, and Shinji Watanabe  
  Int. Conf. on Acoustics, Speech, and Signal Processing (**ICASSP**) 2024  
  [[arXiv page](https://arxiv.org/abs/2309.17352)]
## GitHub Repository
To use this model, please refer to our code published at:
- https://github.com/slSeanWU/beats-conformer-bart-audio-captioner
## Training Data
- Pretrain
  - **AudioCaps**: https://github.com/cdjkim/audiocaps/tree/master
  - **ChatGPT mix-ups from Clotho**: https://huggingface.co/datasets/slseanwu/clotho-chatgpt-mixup-50K
- Finetune
  - **Clotho (V2)**: https://zenodo.org/records/4783391
## BibTeX
If you find our model useful, please consider citing our paper. Thanks!
```bibtex
@inproceedings{wu2024improving,
  title={Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation},
  author={Wu, Shih-Lun and Chang, Xuankai and Wichern, Gordon and Jung, Jee-weon and Germain, Fran{\c{c}}ois and Le Roux, Jonathan and Watanabe, Shinji},
  booktitle={Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2024}
}
```