Delta-Vector committed on
Commit 93ea84e
1 Parent(s): 74b3681

Update README.md

Files changed (1)
  1. README.md +57 -68
README.md CHANGED
@@ -1,21 +1,53 @@
 
 
 
  ---
- library_name: transformers
- license: other
- base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
- tags:
- - generated_from_trainer
- model-index:
- - name: outputs/out
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>

- axolotl version: `0.4.1`
  ```yaml
  base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
  model_type: AutoModelForCausalLM
@@ -63,10 +95,10 @@ liger_rms_norm: true
  liger_swiglu: true
  liger_fused_linear_cross_entropy: true

- wandb_project: Ohashi4b
  wandb_entity:
  wandb_watch:
- wandb_name: Ohashi4b
  wandb_log_model:

  gradient_accumulation_steps: 32
@@ -107,66 +139,23 @@ fsdp_config:
  special_tokens:
    pad_token: <|finetune_right_pad_id|>

-
  ```

  </details><br>

- # outputs/out
-
- This model is a fine-tuned version of [IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml](https://huggingface.co/IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.0118
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 2
- - gradient_accumulation_steps: 32
- - total_train_batch_size: 64
- - total_eval_batch_size: 2
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 6
- - num_epochs: 2
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.4992 | 0.0278 | 1 | 1.6105 |
- | 1.2866 | 0.25 | 9 | 1.2680 |
- | 1.1737 | 0.5 | 18 | 1.1396 |
- | 1.1355 | 0.75 | 27 | 1.0766 |
- | 1.1065 | 1.0 | 36 | 1.0408 |
- | 0.9673 | 1.2370 | 45 | 1.0272 |
- | 0.9526 | 1.4870 | 54 | 1.0167 |
- | 0.9653 | 1.7370 | 63 | 1.0126 |
- | 0.958 | 1.9870 | 72 | 1.0118 |
-
- ### Framework versions
-
- - Transformers 4.45.0.dev0
- - Pytorch 2.4.0+cu121
- - Datasets 2.19.1
- - Tokenizers 0.19.1
 
+
+
+
  ---
+ license: apache-2.0
+ language:
+ - en
+ pipeline_tag: text-generation
+ base_model: nvidia/Llama-3.1-Minitron-4B-Width-Base
+ tags:
+ - chat
  ---

+ ![image/png]()
+ This model continues my previous work on anthracite-org/magnum-4b: a small model made for creative writing and general assistant tasks, fine-tuned on top of [Intervitens](link)'s base. It is meant to be more coherent and generally better than the 4B at both writing and assistant tasks.
+
+ ## Prompting
+ The model has been instruct-tuned with ChatML formatting. A typical input looks like this:
+
+ ```py
+ """<|im_start|>system
+ system prompt<|im_end|>
+ <|im_start|>user
+ Hi there!<|im_end|>
+ <|im_start|>assistant
+ Nice to meet you!<|im_end|>
+ <|im_start|>user
+ Can I ask a question?<|im_end|>
+ <|im_start|>assistant
+ """
+ ```
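+
+ If you prefer to build prompts programmatically, here is a minimal sketch using `tokenizer.apply_chat_template` from `transformers` (assuming the model ships a ChatML chat template; the repo id below is a placeholder):
+
+ ```py
+ from transformers import AutoTokenizer
+
+ # Placeholder repo id; substitute the actual model repository.
+ tokenizer = AutoTokenizer.from_pretrained("your-org/your-4b-finetune")
+
+ messages = [
+     {"role": "system", "content": "system prompt"},
+     {"role": "user", "content": "Hi there!"},
+ ]
+
+ # add_generation_prompt appends the trailing <|im_start|>assistant turn.
+ prompt = tokenizer.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ print(prompt)
+ ```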
+
+ ## Support
+
+ To run inference on this model, you'll need to use Aphrodite, vLLM, or EXL2/tabbyAPI, as llama.cpp hasn't yet merged the pull request required to fix the Llama 3.1 rope_freqs issue with custom head dimensions.
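+
+ As a minimal illustration, a sketch of inference with the vLLM Python API is below; the repo id is a placeholder and the sampling settings are only examples:
+
+ ```py
+ from vllm import LLM, SamplingParams
+
+ # Placeholder repo id; point this at the actual model repository.
+ llm = LLM(model="your-org/your-4b-finetune")
+
+ # Example sampling settings; tune these for your use case.
+ params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
+
+ # Pass a fully formatted ChatML prompt (see the Prompting section above).
+ prompt = (
+     "<|im_start|>system\nsystem prompt<|im_end|>\n"
+     "<|im_start|>user\nHi there!<|im_end|>\n"
+     "<|im_start|>assistant\n"
+ )
+
+ outputs = llm.generate([prompt], params)
+ print(outputs[0].outputs[0].text)
+ ```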
+
+ However, you can work around this by quantizing the model yourself to create a functional GGUF file. Note that until [this PR](https://github.com/ggerganov/llama.cpp/pull/9141) is merged, the context will be limited to 8k tokens.
+
+ To create a working GGUF file, make the following adjustments:
+
+ 1. Remove the `"rope_scaling": {}` entry from `config.json`
+ 2. Change `"max_position_embeddings"` to `8192` in `config.json`
+
+ These modifications should allow you to use the model with llama.cpp, albeit with the mentioned context limitation.
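+
+ For reference, a minimal sketch of those two `config.json` edits in Python (the path is a placeholder for your local copy of the model):
+
+ ```py
+ import json
+
+ # Placeholder path; point this at the config.json of your local model copy.
+ config_path = "path/to/model/config.json"
+
+ with open(config_path) as f:
+     config = json.load(f)
+
+ # 1. Remove the rope_scaling entry.
+ config.pop("rope_scaling", None)
+ # 2. Cap the context length at 8192 until the llama.cpp PR is merged.
+ config["max_position_embeddings"] = 8192
+
+ with open(config_path, "w") as f:
+     json.dump(config, f, indent=2)
+ ```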
+
+ ## Axolotl config

  <details><summary>See axolotl config</summary>

+ Axolotl version: `0.4.1`
  ```yaml
  base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
  model_type: AutoModelForCausalLM

  liger_swiglu: true
  liger_fused_linear_cross_entropy: true

+ wandb_project:
  wandb_entity:
  wandb_watch:
+ wandb_name:
  wandb_log_model:

  gradient_accumulation_steps: 32

  special_tokens:
    pad_token: <|finetune_right_pad_id|>

  ```

  </details><br>

+ ## Credits
+
+ - [anthracite-org/kalo-opus-instruct-22k-no-refusal](https://huggingface.co/datasets/anthracite-org/kalo-opus-instruct-22k-no-refusal)
+ - [NewEden/Gryphe-3.5-16k-Subset](https://huggingface.co/datasets/NewEden/Gryphe-3.5-16k-Subset)
+ - [Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned](https://huggingface.co/datasets/Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned)
+ - [lodrick-the-lafted/OpusStories](https://huggingface.co/datasets/lodrick-the-lafted/OpusStories)
+
+ I couldn't have made this model without the help of [Kubernetes_bad](https://huggingface.co/kubernetes-bad) and the support of [Lucy Knada](https://huggingface.co/lucyknada).

+ ## Training
+ The training was done for 2 epochs. We used 2x [RTX 6000](https://store.nvidia.com/en-us/nvidia-rtx/products/nvidia-rtx-6000-ada-generation/) GPUs, graciously provided by [Kubernetes_Bad](https://huggingface.co/kubernetes-bad), for the full-parameter fine-tuning of the model.
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+
+ ## Safety
+ ...