luigi86 committed on
Commit 5efe2dd
1 Parent(s): 5eef712

Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,174 @@
---
license: other
license_name: mrl
language:
- en
tags:
- chat
pipeline_tag: text-generation
library_name: transformers
---

# MLX Format and Quantizations for Magnum v4 22b

Quantized to 8-bit precision and tested with the `mlx_lm` utility on an M1 Max with 64 GiB of unified memory.

See the [original model](https://huggingface.co/anthracite-org/magnum-v4-22b) for further details.

# Original model card

![image/png](https://cdn-uploads.huggingface.co/production/uploads/658a46cbfb9c2bdfae75b3a6/WvQykcYiK13x7sMI93T6e.png)

This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus.

This model is fine-tuned on top of [Mistral-Small-Instruct-2409](https://huggingface.co/mistralai/Mistral-Small-Instruct-2409).

## Prompting
A typical input would look like this:

```
<s>[INST] SYSTEM MESSAGE
USER MESSAGE[/INST] ASSISTANT MESSAGE</s>[INST] USER MESSAGE[/INST]
```
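The template above can be assembled with a short helper. This is a minimal sketch: `build_prompt` is illustrative, not part of any library, and in practice the tokenizer's built-in chat template should be preferred and should produce the same layout.

```python
# Illustrative helper for the Mistral-style prompt format shown above.
# The system message is folded into the first [INST] block, and each
# completed assistant turn is closed with </s>.
def build_prompt(system, turns):
    """turns: list of (user, assistant) pairs; the final assistant may be None."""
    out = "<s>"
    first = True
    for user, assistant in turns:
        inst = f"{system}\n{user}" if first and system else user
        first = False
        out += f"[INST] {inst}[/INST]"
        if assistant is not None:
            out += f" {assistant}</s>"
    return out

prompt = build_prompt("You are a helpful assistant.",
                      [("Hello!", "Hi there."), ("How are you?", None)])
print(prompt)
```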

## SillyTavern templates

Below are Instruct and Context templates for use within SillyTavern.

<details><summary>context template</summary>

```yaml
default SillyTavern template works fine
```

</details><br>
<details><summary>instruct template</summary>

```yaml
default SillyTavern template works fine
```

</details><br>

## Axolotl config

<details><summary>See axolotl config</summary>

```yaml
base_model: /workspace/models/Mistral-Small-Instruct-2409
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

hub_model_id: anthracite-org/magnum-v4-22b-r4
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
#liger_cross_entropy: true
liger_fused_linear_cross_entropy: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: anthracite-org/c2_logs_32k_mistral-v3_v1.2_no_system
    type: custommistralv2v3
  - path: anthracite-org/kalo-opus-instruct-22k-no-refusal-no-system
    type: custommistralv2v3
  - path: anthracite-org/kalo-opus-instruct-3k-filtered-no-system
    type: custommistralv2v3
  - path: anthracite-org/nopm_claude_writing_fixed
    type: custommistralv2v3
  - path: anthracite-org/kalo_opus_misc_240827_no_system
    type: custommistralv2v3
  - path: anthracite-org/kalo_misc_part2_no_system
    type: custommistralv2v3
#chat_template: mistral_v2v3
shuffle_merged_datasets: true
#default_system_message: "You are an assistant that responds to the user."
dataset_prepared_path: /workspace/data/magnum-22b-data
val_set_size: 0.0
output_dir: /workspace/data/22b-r4-fft-out

sequence_len: 32768
sample_packing: true
pad_to_sequence_len: true

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:

wandb_project: 22b-magnum-fft
wandb_entity:
wandb_watch:
wandb_name: v4-r4-attempt-01
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.000004

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 40
evals_per_epoch:
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 2
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
```
</details><br>
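For reference, the effective batch size implied by this config can be computed directly. The 8-GPU count is taken from the Training section of this card; note that with `sample_packing: true` each 32k-token sequence is densely packed, so the tokens-per-step figure is approximate.

```python
# Effective global batch size implied by the axolotl config above,
# assuming the 8x H100 setup described in the Training section.
gradient_accumulation_steps = 2
micro_batch_size = 1
num_gpus = 8
sequence_len = 32768

global_batch = gradient_accumulation_steps * micro_batch_size * num_gpus
tokens_per_step = global_batch * sequence_len
print(global_batch, tokens_per_step)  # 16 sequences, 524288 tokens per optimizer step
```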

## Credits

We'd like to thank Recursal / Featherless for sponsoring the compute for this training run. Featherless has hosted our Magnum models since the first 72B release, giving thousands of people access to our models and helping us grow.

We would also like to thank all members of Anthracite who made this finetune possible.

## Datasets
- [anthracite-org/c2_logs_32k_mistral-v3_v1.2_no_system](https://huggingface.co/datasets/anthracite-org/c2_logs_32k_mistral-v3_v1.2_no_system)
- [anthracite-org/kalo-opus-instruct-22k-no-refusal-no-system](https://huggingface.co/datasets/anthracite-org/kalo-opus-instruct-22k-no-refusal-no-system)
- [anthracite-org/kalo-opus-instruct-3k-filtered-no-system](https://huggingface.co/datasets/anthracite-org/kalo-opus-instruct-3k-filtered-no-system)
- [anthracite-org/nopm_claude_writing_fixed](https://huggingface.co/datasets/anthracite-org/nopm_claude_writing_fixed)
- [anthracite-org/kalo_opus_misc_240827_no_system](https://huggingface.co/datasets/anthracite-org/kalo_opus_misc_240827_no_system)
- [anthracite-org/kalo_misc_part2_no_system](https://huggingface.co/datasets/anthracite-org/kalo_misc_part2_no_system)

## Training
The training ran for 2 epochs on 8 [H100](https://www.nvidia.com/en-us/data-center/h100/) GPUs graciously provided by [Recursal AI](https://recursal.ai/) / [Featherless AI](https://featherless.ai/) for a full-parameter fine-tune of the model.

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

## Safety
...
config.json ADDED
@@ -0,0 +1,34 @@
{
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 6144,
  "initializer_range": 0.02,
  "intermediate_size": 16384,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 48,
  "num_hidden_layers": 56,
  "num_key_value_heads": 8,
  "quantization": {
    "group_size": 64,
    "bits": 8
  },
  "quantization_config": {
    "group_size": 64,
    "bits": 8
  },
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.45.0.dev0",
  "use_cache": false,
  "vocab_size": 32768
}
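A few relationships in this config can be sanity-checked: the query heads times the head dimension span the hidden size exactly, and attention is grouped-query with several query heads sharing each KV head.

```python
# Sanity checks on the config.json values above.
num_attention_heads = 48
head_dim = 128
hidden_size = 6144
num_key_value_heads = 8

# 48 query heads x 128 dims per head = 6144 hidden dims.
assert num_attention_heads * head_dim == hidden_size

# Grouped-query attention: each KV head is shared by a group of query heads.
gqa_group = num_attention_heads // num_key_value_heads
print(gqa_group)  # 6 query heads per KV head
```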
model-00001-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:93e910370f2b09add0b556750777baeb8f1a30d7c2ad8b3d0d63a4a9fef71dfa
size 5281218788
model-00002-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d5dbc8c527a395b86a2b2eea44c8852efb93ca6f292da3112c56faf9313532a5
size 5348090850
model-00003-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:482f7aded0f3f6e2c3d33715571f719c18755712b79e62a6d53d40867447ef22
size 5334720870
model-00004-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a73c988f0b83cd5a8d0437b240a36b6e6d27f2bdc5fe8e5cbccf5d490ea0d76d
size 5281219385
model-00005-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:24f9468ad4450e2403c894e202d369dad44e9adc934be66fd1c00cf8cddb3a5a
size 2393286348
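Summing the five shard sizes gives the on-disk footprint of the 8-bit weights: roughly one byte per parameter for a 22B model, plus per-group quantization scales (group size 64), which fits comfortably in the 64 GiB of unified memory mentioned at the top of this card.

```python
# Total size of the quantized weights, summed from the LFS pointer
# sizes of the five safetensors shards above.
shard_sizes = [5281218788, 5348090850, 5334720870, 5281219385, 2393286348]
total = sum(shard_sizes)
print(total, round(total / 2**30, 1))  # 23638536241 bytes, ~22.0 GiB
```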
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
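Note that this map reuses the EOS token `</s>` as the pad token, a common choice for Mistral-family models that define no dedicated pad token. A quick check on the content above:

```python
import json

# Condensed copy of the special_tokens_map.json content shown above.
text = """{
  "bos_token": {"content": "<s>"},
  "eos_token": {"content": "</s>"},
  "pad_token": {"content": "</s>"},
  "unk_token": {"content": "<unk>"}
}"""
m = json.loads(text)

# Padding and end-of-sequence resolve to the same token.
assert m["pad_token"]["content"] == m["eos_token"]["content"]
print(m["pad_token"]["content"])  # </s>
```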
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:59f95e28944c062244741268596badc900df86c7f5ded05088d2da22a7379e06
size 587583
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff