---
license: mit
license_link: https://huggingface.co/microsoft/phi-2/resolve/main/LICENSE
language:
- en
pipeline_tag: text-generation
tags:
- nlp
- code
datasets:
- LLM360/AmberDatasets
---
# MobiLlama-05B

<center><img src="MobileLLaMa.png" alt="mobillama logo" width="300"/></center>

MobiLlama-05B is a Small Language Model (SLM) with **0.5 billion** parameters. It was trained on the [Amber dataset](https://huggingface.co/datasets/LLM360/AmberDatasets).


## Model Summary

"Bigger the better" has been the predominant trend in recent Large Language Model (LLM) development. However, LLMs are not well suited to scenarios that require on-device processing, energy efficiency, a low memory footprint, and fast responses. These requirements are crucial for privacy, security, and sustainable deployment. This paper explores the ‘less is more’ paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource-constrained devices. Our primary contribution is an accurate and fully transparent open-source 0.5 billion (0.5B) parameter SLM, named MobiLlama, which targets resource-constrained computing with an emphasis on enhanced performance at reduced resource demands. MobiLlama is an SLM design that starts from a larger model and applies a careful parameter sharing scheme to reduce both the pre-training and the deployment cost. Our work strives not only to bridge the gap in open-source SLMs but also to ensure full transparency: the complete training data pipeline, training code, model weights, and over 300 checkpoints, along with the evaluation code, are available on our [GitHub](https://github.com/mbzuai-oryx/MobiLlama).

[Arxiv Paper Link](https://arxiv.org/abs/2402.16840)

## Model Description

- **Model type:** Small Language Model (SLM) built using the architecture design of LLaMA-7B 
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Resources for more information:**
  - [Training Code](https://github.com/mbzuai-oryx/MobiLlama)
  - [Data Preparation](https://github.com/LLM360/amber-data-prep)
  - [Fully processed Amber pretraining data](https://huggingface.co/datasets/LLM360/AmberDatasets)

 
## How to Use

```python
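import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers load the custom model code shipped with the checkpoint.
model = AutoModelForCausalLM.from_pretrained("MBZUAI/MobiLlama-05B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("MBZUAI/MobiLlama-05B", trust_remote_code=True)

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

text = "I was walking towards the river when "
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids, max_length=1000, repetition_penalty=1.2, pad_token_id=tokenizer.eos_token_id)

# Decode only the continuation: skip the prompt tokens and the final token (normally EOS).
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())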
```
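The slice `outputs[:, input_ids.shape[1]:-1]` in the snippet above drops the prompt tokens and then always drops the last generated token. That last token is the end-of-sequence marker only when generation stopped on its own; if it stopped at `max_length`, a real token is lost. The following is a minimal sketch of a more defensive trim, written against plain Python lists so it runs without the model. The helper name `continuation_ids` and the token ids are hypothetical and chosen for illustration only.

```python
from typing import List


def continuation_ids(output_ids: List[int], prompt_len: int, eos_id: int) -> List[int]:
    """Return the generated tokens without the prompt and without a trailing EOS."""
    generated = output_ids[prompt_len:]
    # Strip the final token only if it really is the EOS marker.
    if generated and generated[-1] == eos_id:
        generated = generated[:-1]
    return generated


EOS = 2  # hypothetical EOS id, for illustration only
prompt = [1, 306, 471]  # hypothetical prompt token ids

stopped_on_eos = prompt + [10, 11, 12, EOS]
hit_max_length = prompt + [10, 11, 12, 13]

print(continuation_ids(stopped_on_eos, len(prompt), EOS))
print(continuation_ids(hit_max_length, len(prompt), EOS))
```

With a real tokenizer, the same check could be applied to `outputs[0].tolist()` using `tokenizer.eos_token_id` before decoding.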

## Training Data Mix
| Subset      | Tokens (Billion) |
| ----------- | ----------- |
| Arxiv      | 30.00       |
| Book   | 28.86        |
| C4   | 197.67        |
| Refined-Web   | 665.01        |
| StarCoder   | 291.92        |
| StackExchange   | 21.75        |
| Wikipedia   | 23.90        |
| Total | 1259.13 |
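
To see how the mix is weighted, the token counts in the table can be turned into percentage shares. The short sketch below uses only the figures listed above; the variable names are illustrative. The seven rows as printed sum to 1259.11 billion tokens, marginally below the listed total of 1259.13, which is presumably a rounding effect in the per-subset figures.

```python
# Token counts in billions, copied from the table above.
data_mix = {
    "Arxiv": 30.00,
    "Book": 28.86,
    "C4": 197.67,
    "Refined-Web": 665.01,
    "StarCoder": 291.92,
    "StackExchange": 21.75,
    "Wikipedia": 23.90,
}

total = sum(data_mix.values())

# Share of each subset as a percentage of the summed rows.
shares = {name: 100.0 * tokens / total for name, tokens in data_mix.items()}

for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<14} {data_mix[name]:>8.2f}B  {share:5.1f}%")
print(f"{'Sum of rows':<14} {total:>8.2f}B")
```

Refined-Web and StarCoder together account for roughly three quarters of the tokens, so web text and code dominate the mix.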

## Hyperparameters
| Hyperparameter      | Value |
| ----------- | ----------- |
| Total Parameters      | 0.52B       |
| Hidden Size   | 2048        |
| Intermediate Size (MLPs)   | 5632        |
| Number of Attention Heads   | 32        |
| Number of Hidden Layers  | 22        |
| RMSNorm ɛ  | 1e-5        |
| Max Seq Length   | 2048        |
| Vocab Size | 32000 |
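
As a rough sanity check on these numbers, the sketch below estimates parameter counts from the table. It rests on several assumptions that this card does not confirm: a LLaMA-style layer layout (no biases, four attention projections of size hidden × hidden, a gated MLP with three projections, two RMSNorm weight vectors per layer), untied input and output embeddings, and a reading of the "parameter sharing scheme" in which a single MLP block is reused by every layer. Treat it as an illustration of why sharing shrinks the model, not as a description of the actual implementation.

```python
# Values from the hyperparameter table above.
hidden_size = 2048
intermediate_size = 5632
num_heads = 32
num_layers = 22
vocab_size = 32000

head_dim = hidden_size // num_heads  # per-head dimension

# ASSUMPTION: LLaMA-style blocks without bias terms.
attn_params = 4 * hidden_size * hidden_size        # q, k, v, o projections
mlp_params = 3 * hidden_size * intermediate_size   # gate, up, down projections
norm_params = 2 * hidden_size                      # two RMSNorm weights per layer

# ASSUMPTION: untied input embedding and output head, plus a final RMSNorm.
embed_params = 2 * vocab_size * hidden_size
final_norm = hidden_size

# Every layer owns its own MLP.
unshared = num_layers * (attn_params + mlp_params + norm_params) + embed_params + final_norm

# ASSUMPTION: one MLP block shared by all layers.
shared = num_layers * (attn_params + norm_params) + mlp_params + embed_params + final_norm

print(f"head_dim           = {head_dim}")
print(f"unshared MLPs      = {unshared / 1e9:.2f}B parameters")
print(f"one shared MLP     = {shared / 1e9:.2f}B parameters")
```

Under these assumptions the shared variant lands close to, though not exactly on, the 0.52B reported in the table, so the real architecture likely differs in details this sketch does not capture.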


## Evaluation

| Evaluation Benchmark | MobiLlama-0.5B | MobiLlama-0.8B | MobiLlama-1.2B |
| ----------- | ----------- | ----------- | ----------- |
| HellaSwag | 52.52 | 54.09 | 62.99 |
| MMLU | 26.45 | 26.92 | 24.23 | 
| ARC Challenge | 29.52 | 30.20 | 34.55 |
| TruthfulQA | 38.05 | 38.48 | 35.57 |
| CrowS-Pairs | 64.03 | 64.82 | 68.12 |
| PIQA | 72.03 | 73.17 | 75.29 |
| RACE | 33.68 | 33.37 | 35.31 |
| SIQA | 40.22 | 41.60 | 41.96 |
| Winogrande | 57.53 | 57.45 | 61.08 |
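
The table does not include an aggregate, so a quick way to compare the three models is the unweighted mean over the nine benchmarks. The sketch below computes it from the scores listed above, in table row order. An unweighted mean is one simple choice of summary and the variable names are illustrative.

```python
# Scores copied from the evaluation table above, in row order
# (HellaSwag, MMLU, ARC Challenge, TruthfulQA, CrowS-Pairs, PIQA, RACE, SIQA, Winogrande).
scores = {
    "MobiLlama-0.5B": [52.52, 26.45, 29.52, 38.05, 64.03, 72.03, 33.68, 40.22, 57.53],
    "MobiLlama-0.8B": [54.09, 26.92, 30.20, 38.48, 64.82, 73.17, 33.37, 41.60, 57.45],
    "MobiLlama-1.2B": [62.99, 24.23, 34.55, 35.57, 68.12, 75.29, 35.31, 41.96, 61.08],
}

# Unweighted mean across the nine benchmarks for each model.
averages = {model: sum(values) / len(values) for model, values in scores.items()}

for model, average in averages.items():
    print(f"{model}: {average:.2f}")
```

The mean rises with model size, although the 1.2B model scores lower than the smaller models on MMLU and TruthfulQA.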


## Citation
**BibTeX:**

```bibtex
@misc{thawakar2024mobillama,
      title={MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT}, 
      author={Omkar Thawakar and Ashmal Vayani and Salman Khan and Hisham Cholakkal and Rao Muhammad Anwer and Michael Felsberg and Timothy Baldwin and Eric P. Xing and Fahad Shahbaz Khan},
      year={2024},
      eprint={2402.16840},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
} 
```