metadata
datasets:
- DeSTA-ntu/DeSTA2-Llama3-8B-Instruct
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
- openai/whisper-small
DeSTA2
Paper | Website | GitHub | Model | Dataset
Quickstart
from transformers import AutoModel
HF_TOKEN = "hf_..."  # your Hugging Face token, needed to download Llama 3 from the official Meta repo
model = AutoModel.from_pretrained("DeSTA-ntu/DeSTA2-8B-beta", trust_remote_code=True, token=HF_TOKEN)
messages = [
    {"role": "system", "content": "You are a helpful voice assistant."},
    {"role": "audio", "content": "<path_to_audio_file>"},
    {"role": "user", "content": "Describe the audio."}
]
generated_ids = model.chat(
    messages,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,
    top_p=0.9
)
response = model.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
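The message list in the Quickstart can be built programmatically when you want to run several prompts over different audio files. The following is a minimal sketch, not part of the DeSTA2 API: `build_messages` and its `check_exists` flag are hypothetical names introduced here for illustration, and the only thing taken from the Quickstart is the `system` / `audio` / `user` message layout.

```python
from pathlib import Path
from typing import Dict, List


def build_messages(
    audio_path: str,
    prompt: str,
    system_prompt: str = "You are a helpful voice assistant.",
    check_exists: bool = True,
) -> List[Dict[str, str]]:
    """Hypothetical helper: build the system/audio/user message list
    in the same layout as the Quickstart example."""
    # Fail early with a clear error instead of passing a bad path to the model.
    if check_exists and not Path(audio_path).is_file():
        raise FileNotFoundError(f"Audio file not found: {audio_path}")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "audio", "content": str(audio_path)},
        {"role": "user", "content": prompt},
    ]
```

The returned list can be passed as the `messages` argument of `model.chat(...)` exactly as in the Quickstart, for example `model.chat(build_messages("<path_to_audio_file>", "Describe the audio."), max_new_tokens=128)`.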
Citation
If you find our work useful, please consider citing the following papers:
@article{lu2024developing,
  title={Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data},
  author={Lu, Ke-Han and Chen, Zhehuai and Fu, Szu-Wei and Yang, Chao-Han Huck and Balam, Jagadeesh and Ginsburg, Boris and Wang, Yu-Chiang Frank and Lee, Hung-yi},
  journal={arXiv preprint arXiv:2409.20007},
  year={2024}
}
@inproceedings{lu24c_interspeech,
  title = {DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment},
  author = {Ke-Han Lu and Zhehuai Chen and Szu-Wei Fu and He Huang and Boris Ginsburg and Yu-Chiang Frank Wang and Hung-yi Lee},
  year = {2024},
  booktitle = {Interspeech 2024},
  pages = {4159--4163},
  doi = {10.21437/Interspeech.2024-457},
  issn = {2958-1796},
}