---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- multimodal
- aria
---
<p align="center">
Aria
</p>

<p align="center">
🔗 <a href="https://huggingface.co" target="_blank">Try Aria!</a> · 📖 <a href="https://huggingface.co" target="_blank">Blog</a> · 📌 <a href="https://huggingface.co" target="_blank">Paper</a> ·
🖤 <a href="https://huggingface.co" target="_blank">GitHub</a> · 💜 <a href="https://huggingface.co" target="_blank">Discord</a> ·
💙 <a href="https://huggingface.co" target="_blank">Twitter</a>
</p>

# Highlights

- Aria is the **first open multimodal native MoE** model, capable of seamlessly handling various input modalities within a single MoE architecture.
- Aria performs **on par with GPT-4o mini and Gemini 1.5 Flash** across a range of multimodal tasks while maintaining strong performance on **text**-only tasks.
- Compared to similar or even larger models, Aria offers **faster speeds** and **lower costs**. This efficiency stems from its ability to activate only 3.9B parameters during inference, the **fewest** among models with comparable performance.

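The activated-parameter figure above comes from sparse expert routing: per token, only a few experts run, so only their weights count as "active". The sketch below illustrates generic top-k MoE routing; the expert count, dimensions, and router are hypothetical and are not Aria's actual configuration.

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Route a token to its top-k experts only (illustrative sparse MoE).

    x: (d,) token hidden state; experts: list of (d, d) expert weight matrices;
    gate_w: (d, n_experts) router weights. Only k expert matmuls run per token.
    """
    logits = x @ gate_w                # router score for each expert
    top = np.argsort(logits)[-k:]     # indices of the k highest-scoring experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()              # softmax over the selected experts only
    y = sum(p * (x @ experts[i]) for p, i in zip(probs, top))
    return y, top

rng = np.random.default_rng(0)
d, n = 8, 16                          # hypothetical sizes, not Aria's
experts = [rng.normal(size=(d, d)) for _ in range(n)]
gate_w = rng.normal(size=(d, n))
x = rng.normal(size=d)
y, used = moe_forward(x, experts, gate_w, k=2)
# Only 2 of the 16 experts ran, so only 2/16 of the expert parameters
# were active for this token; the rest were skipped entirely.
```

An activated count like 3.9B is then the sum of the always-on components (attention, visual encoder, router) plus the weights of just the k selected experts.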
# Key features

- **Robust multimodal understanding**: Aria processes various input modalities, including video, images, code, and text. It demonstrates strong performance across diverse downstream tasks such as long-context video understanding, image understanding, and OCR. Moreover, it excels at instruction following.
- **Flexible image handling**: Aria supports variable image sizes and aspect ratios while maintaining high quality.
- **Extended context capacity**: Aria can manage multiple images within a long context window of 64k tokens.
- **Advanced text understanding**: Aria demonstrates competitive performance across language and coding tasks.

# Model Info

| Model | Download | Parameters | Context Length |
| :---- | :------- | :--------- | :------------- |
| Aria | < HF link - TBD> | • Activation: 3.9B (3.5B MoE + 0.4B Visual Encoder) <br> • Total: 25.3B | 64K |

# Benchmark

# Quick Start

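A minimal inference sketch, assuming the checkpoint follows the standard `transformers` Auto-class pattern with `trust_remote_code=True`; the model ID is a placeholder (the official link above is still TBD), and the chat-message layout is an assumption, not a confirmed API.

```python
# Sketch only: assumes the standard AutoProcessor / AutoModelForCausalLM
# pattern with trust_remote_code=True. The model ID is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "<hf-model-id-TBD>"  # replace with the released checkpoint

def build_messages(question):
    """Build a single-turn chat with one image slot plus a text question."""
    return [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": question}]}]

def describe(image):
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, device_map="auto", torch_dtype=torch.bfloat16,
        trust_remote_code=True)
    prompt = processor.apply_chat_template(
        build_messages("Describe this image."), add_generation_prompt=True)
    inputs = processor(text=prompt, images=image,
                       return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    return processor.decode(out[0], skip_special_tokens=True)
```

With the 64k-token context, the same message-building pattern extends to multiple images per turn by adding more `{"type": "image"}` entries.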
# License

This repo is released under the Apache 2.0 License.