poeroz committed
Commit 887709d • 1 Parent(s): 2c21386

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -10,11 +10,11 @@ tags:
 - speech-to-speech
 ---
 
-# 🎧 LLaMA-Omni: Seamless Speech Interaction with Large Language Models
+# 🦙🎧 LLaMA-Omni: Seamless Speech Interaction with Large Language Models
 
 > **Authors: [Qingkai Fang](https://fangqingkai.github.io/), [Shoutao Guo](https://scholar.google.com/citations?hl=en&user=XwHtPyAAAAAJ), [Yan Zhou](https://zhouyan19.github.io/zhouyan/), [Zhengrui Ma](https://scholar.google.com.hk/citations?user=dUgq6tEAAAAJ), [Shaolei Zhang](https://zhangshaolei1998.github.io/), [Yang Feng*](https://people.ucas.edu.cn/~yangfeng?language=en)**
 
-[[Paper]](https://arxiv.org/abs/xxxx.xxxxx) [[Model]](https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni) [[Code]](https://github.com/ictnlp/LLaMA-Omni)
+[[Paper]](https://arxiv.org/abs/2409.06666) [[Model]](https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni) [[Code]](https://github.com/ictnlp/LLaMA-Omni)
 
 LLaMA-Omni is a speech-language model built upon Llama-3.1-8B-Instruct. It supports low-latency and high-quality speech interactions, simultaneously generating both text and speech responses based on speech instructions.
 
@@ -22,13 +22,13 @@ LLaMA-Omni is a speech-language model built upon Llama-3.1-8B-Instruct. It suppo
 
 ## 💡 Highlights
 
-💪 **Built on Llama-3.1-8B-Instruct, ensuring high-quality responses.**
+- 💪 **Built on Llama-3.1-8B-Instruct, ensuring high-quality responses.**
 
-🚀 **Low-latency speech interaction with a latency as low as 226ms.**
+- 🚀 **Low-latency speech interaction with a latency as low as 226ms.**
 
-🎧 **Simultaneous generation of both text and speech responses.**
+- 🎧 **Simultaneous generation of both text and speech responses.**
 
-♻️ **Trained in less than 3 days using just 4 GPUs.**
+- ♻️ **Trained in less than 3 days using just 4 GPUs.**
 
 
 <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65b7573482d384513443875e/dr4XWUxzuVQ52lBuzNBTt.mp4"></video>
@@ -126,7 +126,7 @@ If our work is useful for you, please cite as:
 @article{fang-etal-2024-llama-omni,
   title={LLaMA-Omni: Seamless Speech Interaction with Large Language Models},
  author={Fang, Qingkai and Guo, Shoutao and Zhou, Yan and Ma, Zhengrui and Zhang, Shaolei and Feng, Yang},
-  journal={arXiv preprint arXiv:xxxx.xxxxx},
+  journal={arXiv preprint arXiv:2409.06666},
  year={2024}
 }
 ```
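
For context beyond the diff itself: the [[Model]] link points to the ICTNLP/Llama-3.1-8B-Omni checkpoint on the Hugging Face Hub. Below is a minimal sketch of fetching that checkpoint with the stock `huggingface_hub` client; it is not part of this commit, and the printed path is purely illustrative.

```python
# Minimal sketch: download the checkpoint referenced by the README's [Model] link.
# Assumes `pip install huggingface_hub` and enough disk space for an 8B-parameter model.
from huggingface_hub import snapshot_download

# snapshot_download fetches every file in the repo and returns the local directory.
local_path = snapshot_download(repo_id="ICTNLP/Llama-3.1-8B-Omni")
print(f"LLaMA-Omni checkpoint available at: {local_path}")
```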