srvm committed
Commit 4fd2193
Parent: 2fb6086

Add initial README

Files changed (1):
  1. README.md +60 -0

README.md CHANGED
@@ -4,3 +4,63 @@ license_name: nvidia-open-model-license
license_link: >-
  https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
---

# Minitron 4B Base

Minitron is a family of small language models (SLMs) obtained by pruning NVIDIA's [Nemotron-4 15B](https://arxiv.org/abs/2402.16819) model. We prune the model's embedding dimension, number of attention heads, and MLP intermediate dimension, and then perform continued training with distillation to arrive at the final models.

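As an illustration of the distillation step, the sketch below shows a standard temperature-scaled logit-distillation loss in PyTorch. It is a minimal, hypothetical example (the function, tensor shapes, and temperature are placeholders), not the actual Minitron training code; see the paper for the exact recipe.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a temperature-scaled logit-distillation loss
# (illustration only; not the actual Minitron training code).
def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 as is standard for distillation.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Toy tensors standing in for per-token vocabulary logits of shape (batch, vocab).
student_logits = torch.randn(8, 32000)
teacher_logits = torch.randn(8, 32000)
print(distillation_loss(student_logits, teacher_logits))
```
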
Deriving the Minitron 8B and 4B models from the base 15B model using our approach requires up to **40x fewer training tokens** per model compared to training from scratch; this results in **compute cost savings of 1.8x** for training the full model family (15B, 8B, and 4B). Minitron models exhibit up to a 16% improvement in MMLU scores compared to training from scratch, perform comparably to other community models such as Mistral 7B, Gemma 7B, and Llama-3 8B, and outperform state-of-the-art compression techniques from the literature. Please refer to our [arXiv paper]() for more details.

Minitron models are for research and development only.

## HuggingFace Quickstart

The [PR](https://github.com/huggingface/transformers/pull/31699) to add support for our models in Hugging Face Transformers is under review and expected to be merged soon. In the meantime, you can install from this [branch](https://github.com/suiyoubi/transformers/tree/aot/nemotron-support) to use our Minitron models:

```bash
git clone git@github.com:suiyoubi/transformers.git
cd transformers
git checkout aot/nemotron-support
pip install .
```

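As an optional sanity check (not part of the original instructions, and it assumes the branch exposes a `NemotronForCausalLM` class following the usual Transformers naming convention), you can confirm which `transformers` build is active:

```python
import transformers

# Print the installed version; the branch install above should be the one in use.
print(transformers.__version__)
# True if the Nemotron architecture class is available in this build.
print(hasattr(transformers, "NemotronForCausalLM"))
```
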
The following code provides an example of how to load the Minitron-4B model and use it to perform text generation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
model_path = "nvidia/Minitron-4B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_path)

device = "cuda"
dtype = torch.bfloat16
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=dtype, device_map=device)

# Prepare the input text
prompt = "To be or not to be,"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

# Generate the output
output_ids = model.generate(input_ids, max_length=50, num_return_sequences=1)

# Decode and print the output
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output_text)
```

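Alternatively (an illustrative sketch, not part of the original card), the same prompt can be run through the high-level `pipeline` API with the same model ID and dtype:

```python
import torch
from transformers import pipeline

# Illustrative alternative using the text-generation pipeline.
generator = pipeline(
    "text-generation",
    model="nvidia/Minitron-4B-Base",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
print(generator("To be or not to be,", max_new_tokens=40)[0]["generated_text"])
```
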
## License

Minitron is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).

## Citation

If you find our work helpful, please consider citing our paper:

```bibtex
@article{minitron2024,
  title={Compact Language Models via Pruning and Knowledge Distillation},
  author={Saurav Muralidharan and Sharath Turuvekere Sreenivas and Raviraj Joshi and Marcin Chochowski and Mostofa Patwary and Mohammad Shoeybi and Bryan Catanzaro and Jan Kautz and Pavlo Molchanov},
  journal={arXiv preprint arXiv:XXX},
  year={2024}
}
```