add readme; gh repo to be added
README.md
---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- audio-captioning
- audiocaps
- clotho
- dcase-challenge
- icassp-24
---

## Summary

This repo contains the configuration and pretrained weights of the model described in the following paper:

- **Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation**  
  Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François Germain, Jonathan Le Roux, and Shinji Watanabe  
  Int. Conf. on Acoustics, Speech, and Signal Processing (**ICASSP**) 2024  
  [[arXiv page](https://arxiv.org/abs/2309.17352)]

## GitHub Repository

To use this model, please refer to our code published at:

- TBA

## BibTeX

If you find our model useful, please consider citing our paper. Thanks!

```bibtex
@inproceedings{wu2024improving,
  title={Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation},
  author={Wu, Shih-Lun and Chang, Xuankai and Wichern, Gordon and Jung, Jee-weon and Germain, Fran{\c{c}}ois and Le Roux, Jonathan and Watanabe, Shinji},
  booktitle={Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2024}
}
```