slseanwu/beats-conformer-bart-audio-captioner

slseanwu committed on Jan 4
Commit ca4f21e • 1 Parent(s): 0f1ea92

add readme; gh repo to be added

Files changed (1)

README.md +28 -0

@@ -1,3 +1,31 @@
 ---
 license: apache-2.0
+language:
+- en
+library_name: transformers
+tags:
+- audio-captioning
+- audiocaps
+- clotho
+- dcase-challenge
+- icassp-24
 ---
+## Summary
+This repo contains the config & pretrained weights of the model described in the following paper:
+- **Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation**
+  Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François Germain, Jonathan Le Roux, and Shinji Watanabe
+  Int. Conf. on Acoustics, Speech, and Signal Processing (**ICASSP**) 2024
+  [[arXiv page](https://arxiv.org/abs/2309.17352)]
+## GitHub Repository
+To use this model, please refer to our code published at:
+- TBA
+## BibTeX
+If you find our model useful, please consider citing our paper. Thanks!
+```
+@inproceedings{wu2024improving,
+  title={Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation},
+  author={Wu, Shih-Lun and Chang, Xuankai and Wichern, Gordon and Jung, Jee-weon and Germain, Fran{\c{c}}ois and Le Roux, Jonathan and Watanabe, Shinji},
+  booktitle={Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
+  year={2024}
+}
+```
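
The README added in this commit says the repo holds the config and pretrained weights, but the usage code is still listed as TBA. Until that code is published, the files can at least be fetched locally. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that the repo id is the one shown in the page header; `fetch_checkpoint` and `list_files` are hypothetical helper names, not part of the authors' code. It only downloads and lists files; it does not attempt to load the model, since the README does not document a loading API.

```python
from pathlib import Path
from typing import List, Optional

# Repo id taken from the page header; adjust if the repo is renamed.
REPO_ID = "slseanwu/beats-conformer-bart-audio-captioner"


def fetch_checkpoint(repo_id: str = REPO_ID, revision: Optional[str] = None) -> Path:
    """Download every file in the model repo and return the local folder."""
    # Imported lazily so the module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(repo_id=repo_id, revision=revision)
    return Path(local_dir)


def list_files(root: Path) -> List[str]:
    """Return the relative paths of all files under root, sorted."""
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Requires network access; files land in the local Hugging Face cache.
    checkpoint_dir = fetch_checkpoint()
    for name in list_files(checkpoint_dir):
        print(name)
```

Listing the downloaded files is a quick way to check which config and weight files are present before wiring them into the authors' code once it is released.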