metadata
license: apache-2.0
language:
  - en
tags:
  - audio-captioning
  - audiocaps
  - clotho
  - dcase-challenge
  - icassp-24

Summary

This repository contains the configuration and pretrained weights of the model described in the following paper:

  • Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
    Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François Germain, Jonathan Le Roux, and Shinji Watanabe
    Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP) 2024
    [arXiv page]

GitHub Repository

To use this model, please refer to our code published at:

Training Data

BibTeX

If you find our model useful, please consider citing our paper. Thanks!

@inproceedings{wu2024improving,
  title={Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation},
  author={Wu, Shih-Lun and Chang, Xuankai and Wichern, Gordon and Jung, Jee-weon and Germain, Fran{\c{c}}ois and Le Roux, Jonathan and Watanabe, Shinji},
  booktitle={Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2024}
}