Update README.md
README.md (CHANGED)
@@ -10,11 +10,11 @@ tags:
 - speech-to-speech
 ---
 
-#
+# 🦙🎧 LLaMA-Omni: Seamless Speech Interaction with Large Language Models
 
 > **Authors: [Qingkai Fang](https://fangqingkai.github.io/), [Shoutao Guo](https://scholar.google.com/citations?hl=en&user=XwHtPyAAAAAJ), [Yan Zhou](https://zhouyan19.github.io/zhouyan/), [Zhengrui Ma](https://scholar.google.com.hk/citations?user=dUgq6tEAAAAJ), [Shaolei Zhang](https://zhangshaolei1998.github.io/), [Yang Feng*](https://people.ucas.edu.cn/~yangfeng?language=en)**
 
-[[Paper]](https://arxiv.org/abs/
+[[Paper]](https://arxiv.org/abs/2409.06666) [[Model]](https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni) [[Code]](https://github.com/ictnlp/LLaMA-Omni)
 
 LLaMA-Omni is a speech-language model built upon Llama-3.1-8B-Instruct. It supports low-latency and high-quality speech interactions, simultaneously generating both text and speech responses based on speech instructions.
 
@@ -22,13 +22,13 @@ LLaMA-Omni is a speech-language model built upon Llama-3.1-8B-Instruct. It suppo
 
 ## 💡 Highlights
 
-💪 **Built on Llama-3.1-8B-Instruct, ensuring high-quality responses.**
+- 💪 **Built on Llama-3.1-8B-Instruct, ensuring high-quality responses.**
 
-🚀 **Low-latency speech interaction with a latency as low as 226ms.**
+- 🚀 **Low-latency speech interaction with a latency as low as 226ms.**
 
-🎧 **Simultaneous generation of both text and speech responses.**
+- 🎧 **Simultaneous generation of both text and speech responses.**
 
-♻️ **Trained in less than 3 days using just 4 GPUs.**
+- ♻️ **Trained in less than 3 days using just 4 GPUs.**
 
 
 <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65b7573482d384513443875e/dr4XWUxzuVQ52lBuzNBTt.mp4"></video>
@@ -126,7 +126,7 @@ If our work is useful for you, please cite as:
 @article{fang-etal-2024-llama-omni,
   title={LLaMA-Omni: Seamless Speech Interaction with Large Language Models},
   author={Fang, Qingkai and Guo, Shoutao and Zhou, Yan and Ma, Zhengrui and Zhang, Shaolei and Feng, Yang},
-  journal={arXiv preprint arXiv:
+  journal={arXiv preprint arXiv:2409.06666},
   year={2024}
 }
 ```
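For readers landing on the updated model card, here is a minimal sketch of how one might fetch the released checkpoint linked above. This is an illustrative assumption rather than part of the README diff: it only downloads the `ICTNLP/Llama-3.1-8B-Omni` repository with `huggingface_hub`; the actual speech-to-speech inference pipeline is run through the companion code at https://github.com/ictnlp/LLaMA-Omni.

```python
# Hypothetical sketch (not from the README diff): download the released
# LLaMA-Omni checkpoint from the Hugging Face Hub. Running speech-in /
# speech-out inference itself follows the instructions in the companion
# repo (https://github.com/ictnlp/LLaMA-Omni).
from huggingface_hub import snapshot_download

# Fetch all files of the model repository into the local HF cache and
# return the local directory path.
local_dir = snapshot_download(repo_id="ICTNLP/Llama-3.1-8B-Omni")
print(f"Checkpoint downloaded to: {local_dir}")
```

The downloaded directory can then be pointed to by the LLaMA-Omni inference scripts as described in the linked code repository.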