|
---
license: apache-2.0
language:
- en
tags:
- audio-captioning
- audiocaps
- clotho
- dcase-challenge
- icassp-24
---
|
## Summary |
|
This repository contains the configuration and pretrained weights of the model described in the following paper:
|
- **Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation** |
|
Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François Germain, Jonathan Le Roux, and Shinji Watanabe |
|
Int. Conf. on Acoustics, Speech, and Signal Processing (**ICASSP**) 2024 |
|
[[arXiv page](https://arxiv.org/abs/2309.17352)] |
|
## GitHub Repository |
|
To use this model, please refer to our code published at: |
|
- https://github.com/slSeanWU/beats-conformer-bart-audio-captioner |
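
For convenience, the config and checkpoint files hosted here can be fetched with `huggingface_hub` before following the setup and inference instructions in the GitHub repository. The snippet below is only a minimal sketch; the `repo_id` is an assumption, so use the ID shown at the top of this model page if it differs.

```python
# Minimal sketch: download this repo's config & pretrained weights, then
# follow the GitHub README for environment setup and captioning inference.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    # Assumed repo ID -- replace with the ID shown on this model page.
    repo_id="slseanwu/beats-conformer-bart-audio-captioner",
)
print(f"Config & checkpoint downloaded to: {local_dir}")
```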
|
## Training Data |
|
- Pretraining
  - **AudioCaps**: https://github.com/cdjkim/audiocaps/tree/master
  - **ChatGPT mix-ups from Clotho**: https://huggingface.co/datasets/slseanwu/clotho-chatgpt-mixup-50K (a loading sketch follows this list)
- Finetuning
  - **Clotho (V2)**: https://zenodo.org/records/4783391
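
As a quick sanity check, the ChatGPT mix-up captions can be inspected with the `datasets` library. This is only a sketch and assumes the dataset files are in a format that `datasets` can load automatically; check the dataset card for the exact schema and splits.

```python
# Minimal sketch: peek at the Clotho ChatGPT mix-up dataset used for pretraining.
# Assumes the standard `datasets` auto-loader works for this repo; the splits
# and column names are not documented here, so we just print what is available.
from datasets import load_dataset

mixups = load_dataset("slseanwu/clotho-chatgpt-mixup-50K")
print(mixups)  # shows the available splits, columns, and row counts
```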
|
## BibTeX
|
If you find our model useful, please consider citing our paper. Thanks! |
|
```bibtex
@inproceedings{wu2024improving,
  title={Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation},
  author={Wu, Shih-Lun and Chang, Xuankai and Wichern, Gordon and Jung, Jee-weon and Germain, Fran{\c{c}}ois and Le Roux, Jonathan and Watanabe, Shinji},
  booktitle={Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2024}
}
```