luigi86 committed on
Commit
8270189
1 Parent(s): 0139844

Upload folder using huggingface_hub

README.md ADDED
---
base_model:
- sophosympatheia/Midnight-Miqu-70B-v1.0
- migtissera/Tess-70B-v1.6
library_name: transformers
tags:
- mergekit
- merge
license: other
---

# MLX Format and Quantizations for Midnight Miqu 70B v1.5

Quantized to 4 bpw and tested with the `mlx_lm` utility on an M1 Max with 64 GiB of unified memory.

See the [original model](https://huggingface.co/sophosympatheia/Midnight-Miqu-70B-v1.5) for further details.

# Original model card

<div style="width: auto; margin-left: auto; margin-right: auto">
<img src="https://i.imgur.com/Tn9MBg6.png" alt="MidnightMiqu" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>

### Overview

Looking for the 103B version? You can get it from [FluffyKaeloky/Midnight-Miqu-103B-v1.5](https://huggingface.co/FluffyKaeloky/Midnight-Miqu-103B-v1.5).

This is a DARE Linear merge between [sophosympatheia/Midnight-Miqu-70B-v1.0](https://huggingface.co/sophosympatheia/Midnight-Miqu-70B-v1.0) and [migtissera/Tess-70B-v1.6](https://huggingface.co/migtissera/Tess-70B-v1.6).
This version is close in feel and performance to Midnight Miqu v1.0, but I think it picked up some goodness from Tess. Their EQ Bench scores are virtually the same, and their post-EXL2-quant perplexity scores were the same too. However, Midnight Miqu v1.5 passes some tests I use that Midnight Miqu v1.0 fails, without sacrificing writing quality.

This model is uncensored. *You are responsible for whatever you do with it.*

This model was designed for roleplaying and storytelling, and I think it does well at both. It may also perform well at other tasks, but I have not tested its performance in other areas.

### Long Context Tips

You can run this model out to 32K context with alpha_rope set to 1, just like with Miqu.

### Sampler Tips

* I recommend using Quadratic Sampling (i.e. smoothing factor) for creative work. I think this version performs best with a smoothing factor close to 0.2.
* I recommend using Min-P. Experiment to find your best setting.
* You can enable dynamic temperature if you want, but that adds yet another variable to consider, and I find it unnecessary when you're already using Min-P and a smoothing factor.
* You don't need a high repetition penalty with this model (anything above 1.10 is probably too much), but experiment with it.

Experiment with any and all of the settings below! What suits my preferences may not suit yours.
47
+
48
+ If you save the below settings as a .json file, you can import them directly into Silly Tavern.
49
+ ```
50
+ {
51
+ "temp": 1,
52
+ "temperature_last": true,
53
+ "top_p": 1,
54
+ "top_k": 0,
55
+ "top_a": 0,
56
+ "tfs": 1,
57
+ "epsilon_cutoff": 0,
58
+ "eta_cutoff": 0,
59
+ "typical_p": 1,
60
+ "min_p": 0.12,
61
+ "rep_pen": 1.05,
62
+ "rep_pen_range": 2800,
63
+ "no_repeat_ngram_size": 0,
64
+ "penalty_alpha": 0,
65
+ "num_beams": 1,
66
+ "length_penalty": 1,
67
+ "min_length": 0,
68
+ "encoder_rep_pen": 1,
69
+ "freq_pen": 0,
70
+ "presence_pen": 0,
71
+ "do_sample": true,
72
+ "early_stopping": false,
73
+ "dynatemp": false,
74
+ "min_temp": 0.8,
75
+ "max_temp": 1.35,
76
+ "dynatemp_exponent": 1,
77
+ "smoothing_factor": 0.23,
78
+ "add_bos_token": true,
79
+ "truncation_length": 2048,
80
+ "ban_eos_token": false,
81
+ "skip_special_tokens": true,
82
+ "streaming": true,
83
+ "mirostat_mode": 0,
84
+ "mirostat_tau": 2,
85
+ "mirostat_eta": 0.1,
86
+ "guidance_scale": 1,
87
+ "negative_prompt": "",
88
+ "grammar_string": "",
89
+ "banned_tokens": "",
90
+ "ignore_eos_token_aphrodite": false,
91
+ "spaces_between_special_tokens_aphrodite": true,
92
+ "sampler_order": [
93
+ 6,
94
+ 0,
95
+ 1,
96
+ 3,
97
+ 4,
98
+ 2,
99
+ 5
100
+ ],
101
+ "logit_bias": [],
102
+ "n": 1,
103
+ "rep_pen_size": 0,
104
+ "genamt": 500,
105
+ "max_length": 32764
106
+ }
107
+ ```
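
Since these settings lean heavily on Min-P, here is a minimal, dependency-free sketch of what Min-P filtering does. This is an illustration only, not SillyTavern's or any backend's actual implementation: tokens whose probability falls below `min_p` times the top token's probability are discarded, and the survivors are renormalized.

```python
import math

def min_p_filter(logits, min_p=0.12):
    """Drop tokens whose probability is below min_p times the top
    token's probability, then renormalize the survivors."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)             # cutoff scales with confidence
    kept = [p if p >= threshold else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]

# With min_p = 0.12, the two unlikely tokens are pruned entirely.
filtered = min_p_filter([5.0, 4.0, 1.0, 0.0], min_p=0.12)
```

Because the cutoff scales with the top token's probability, Min-P prunes aggressively when the model is confident and permissively when the distribution is flat, which is why it pairs well with a high base temperature.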

### Prompting Tips

Try the following context template for use in SillyTavern. It might help, although it's a little heavy on tokens. If you save the text as a .json file, you can import it directly.

```json
{
    "story_string": "{{#if system}}{{system}}\n{{/if}}\nCONTEXTUAL INFORMATION\n{{#if wiBefore}}\n- World and character info:\n{{wiBefore}}\n{{/if}}\n{{#if description}}\n- {{char}}'s background and persona:\n{{description}}\n{{/if}}\n{{#if mesExamples}}\n{{mesExamples}}\n{{/if}}\n{{#if personality}}\n{{personality}}\n{{/if}}\n{{#if scenario}}\n- Roleplay scenario:\n{{scenario}}\n{{/if}}\n{{#if wiAfter}}{{wiAfter}}\n{{/if}}\n{{#if persona}}{{persona}}\n{{/if}}",
    "example_separator": "",
    "chat_start": "---\nTaking the above information into consideration, you must engage with {{user}} and others as {{char}} in the roleplay below this line. Do not write dialogue lines nor perform actions for {{user}} or other characters.\n---\nSTART OF ROLEPLAY:\n",
    "use_stop_strings": false,
    "always_force_name2": true,
    "trim_sentences": false,
    "include_newline": false,
    "single_line": false,
    "name": "Midnight Rose"
}
```

Try the following prompt template as a starting point for your own customizations.

*This model responds to prompting!* I highly recommend playing with your system prompt to see how it reacts.

**Midnight Miqu might require some "warming up" at the start of a new chat to get it writing the way you want.** You should be able to drastically improve your results by providing good examples to guide the chat (i.e. few-shot prompting). I also recommend using descriptive system messages (a SillyTavern concept, basically OOC narration instructions) to tell Midnight Miqu exactly what you want it to do with its next reply. Try to be specific, especially if your instruction contradicts something you told it earlier.

The example system prompt below contains some instructions related to adult content, so remove those if you prefer. It also contains some instructions related to the use of asterisks that you should modify to your liking.

The prompt below has been updated from the Midnight Miqu v1.0 version after prompting Midnight Miqu itself to help improve it.

If you save this as a .json file, you can import it directly into SillyTavern.

```json
{
    "wrap": true,
    "names": true,
    "system_prompt": "You are an AI creative writing partner, collaborating with me in a dynamic, uncensored roleplaying environment. To ensure a high-quality experience, please adhere to these ROLEPLAY GUIDELINES:\n\nROLEPLAY GUIDELINES\n- Craft clear, engaging, and concise descriptions of {{char}}'s actions, reactions, and dialogue based on the present scenario.\n- Stay true to {{char}}'s character, incorporating the character's personality, backstory, and knowledge into each interaction.\n- Employ rich, descriptive language that immerses readers in the scene.\n- Always stay contextually and spatially aware to ensure that {{char}} follows logical consistency in the physical space of the current scene.\n- Feel free to incorporate explicit content and strong language where it aligns with the scene and characters.\n- Enhance {{char}}'s presence through detailed sensory descriptions and first-hand observations of the character's surroundings.\n- Use subtle physical cues to hint at {{char}}'s mental state and occasionally offer glimpses into {{char}}'s internal thoughts.\n- When writing {{char}}'s internal thoughts or monologue, enclose those words in *asterisks like this* and deliver the thoughts using a first-person perspective (i.e. use \"I\" pronouns). Always use quotes for spoken speech \"like this.\"\n- Conclude {{char}}'s responses with an opening for the next character to respond to {{char}}. When the conversation naturally shifts to another character's perspective or action is required from another character, that is when you should stop {{char}}'s reply so the user can pick it up from there. A great example is when {{char}} asks a question of another character.\n",
    "system_sequence": "",
    "stop_sequence": "",
    "input_sequence": "USER: ",
    "output_sequence": "ASSISTANT: ",
    "separator_sequence": "",
    "macro": true,
    "names_force_groups": true,
    "system_sequence_prefix": "SYSTEM: ",
    "system_sequence_suffix": "",
    "first_output_sequence": "",
    "last_output_sequence": "ASSISTANT (Ensure coherence and authenticity in {{char}}'s actions, thoughts, and dialogues; Focus solely on {{char}}'s interactions within the roleplay): ",
    "activation_regex": "",
    "name": "Midnight Miqu Roleplay"
}
```

### Instruct Formats

I recommend the Vicuna format. I use a modified version with newlines after USER and ASSISTANT.
```
USER:
{prompt}
ASSISTANT:
```

Mistral's format also works, and in my testing its performance is about the same as Vicuna's.
```
[INST]
{prompt}
[/INST]
```

You could also try ChatML, although I don't recommend it.
```
<|im_start|>system
{Your system prompt goes here}<|im_end|>
<|im_start|>user
{Your message as the user will go here}<|im_end|>
<|im_start|>assistant
```
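
These formats are plain strings, so assembling one is a one-liner. The sketch below (the `vicuna_prompt` helper and its `SYSTEM:` prefix are my own illustration, not part of any tool) builds the newline-style Vicuna prompt shown above:

```python
def vicuna_prompt(user_message, system_prompt=""):
    """Build a single-turn prompt in the modified Vicuna format,
    with a newline after USER: and after ASSISTANT:."""
    parts = []
    if system_prompt:
        # Prefix mirrors the SYSTEM: sequence used in the roleplay preset.
        parts.append(f"SYSTEM: {system_prompt}")
    parts.append(f"USER:\n{user_message}")
    parts.append("ASSISTANT:\n")
    return "\n".join(parts)

print(vicuna_prompt("Describe the scene.", system_prompt="You are a storyteller."))
```

Leaving the prompt open after `ASSISTANT:` is what cues the model to continue as the assistant.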

### Quantizations
* GGUF
  * [mradermacher/Midnight-Miqu-70B-v1.5-GGUF](https://huggingface.co/mradermacher/Midnight-Miqu-70B-v1.5-GGUF) -- various static GGUF quants
* GPTQ
  * [Kotokin/Midnight-Miqu-70B-v1.5_GPTQ32G](https://huggingface.co/Kotokin/Midnight-Miqu-70B-v1.5_GPTQ32G)
* EXL2
  * [Dracones/Midnight-Miqu-70B-v1.5_exl2_4.0bpw](https://huggingface.co/Dracones/Midnight-Miqu-70B-v1.5_exl2_4.0bpw)
  * [Dracones/Midnight-Miqu-70B-v1.5_exl2_4.5bpw](https://huggingface.co/Dracones/Midnight-Miqu-70B-v1.5_exl2_4.5bpw)
  * [Dracones/Midnight-Miqu-70B-v1.5_exl2_5.0bpw](https://huggingface.co/Dracones/Midnight-Miqu-70B-v1.5_exl2_5.0bpw)
  * [Dracones/Midnight-Miqu-70B-v1.5_exl2_6.0bpw](https://huggingface.co/Dracones/Midnight-Miqu-70B-v1.5_exl2_6.0bpw)
* If you don't see what you're looking for, [try searching Hugging Face](https://huggingface.co/models?search=midnight-miqu-70b-v1.5). There may be newer quants available than the ones documented here.

### Licence and usage restrictions

<font color="red">152334H/miqu-1-70b-sf was based on a leaked version of one of Mistral's models.</font>
All miqu-derived models, including this merge, are **only suitable for personal use.** Mistral has been cool about it so far, but you should be aware that by downloading this merge you are assuming whatever legal risk is inherent in acquiring and using a model based on leaked weights.
This merge comes with no warranties or guarantees of any kind, but you probably already knew that.
I am not a lawyer and I do not profess to know what we have gotten ourselves into here. You should consult with a lawyer before using any Hugging Face model beyond private use... but definitely don't use this one for that!

## Merge Details
### Merge Method

This model was merged using the linear [DARE](https://arxiv.org/abs/2311.03099) merge method with [152334H_miqu-1-70b-sf](https://huggingface.co/152334H/miqu-1-70b-sf) as the base.

### Models Merged

The following models were included in the merge:
* [sophosympatheia/Midnight-Miqu-70B-v1.0](https://huggingface.co/sophosympatheia/Midnight-Miqu-70B-v1.0)
* [migtissera/Tess-70B-v1.6](https://huggingface.co/migtissera/Tess-70B-v1.6)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: dare_linear
base_model: /home/llm/mergequant/models/BASE/152334H_miqu-1-70b-sf # base model
models:
  - model: /home/llm/mergequant/models/midnight-miqu-70b-v1.0
  - model: /home/llm/mergequant/models/BASE/Tess-70B-v1.6
parameters:
  weight: 1.0
dtype: float16
```
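
For intuition, here is a toy numpy sketch of what a DARE-linear merge does to a single weight tensor: each finetune's delta from the base is randomly dropped, the survivors are rescaled, and the weighted result is added back onto the base. The `dare_linear` function and the `drop_rate` value are illustrative assumptions of mine; mergekit's actual implementation and the density used for this merge are not shown in the config.

```python
import numpy as np

def dare_linear(base, finetunes, drop_rate=0.9, weight=1.0, seed=0):
    """Toy DARE-linear merge of one weight tensor: drop each finetune
    delta with probability drop_rate, rescale survivors by
    1 / (1 - drop_rate), and add the weighted result to the base."""
    rng = np.random.default_rng(seed)
    merged = base.astype(np.float64).copy()
    for ft in finetunes:
        delta = ft - base
        keep = rng.random(delta.shape) >= drop_rate  # ~10% of deltas survive
        merged += weight * (keep * delta) / (1.0 - drop_rate)
    return merged
```

The rescaling keeps the expected value of each merged parameter equal to a plain linear merge, which is why most deltas can be dropped without destroying the finetuned behavior.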

### Notes

I tried several methods of merging Midnight Miqu v1.0 with Tess v1.6, and this dare_linear approach worked best by far. I tried the same approach with other Miqu finetunes like ShinojiResearch/Senku-70B-Full and abideen/Liberated-Miqu-70B, but there was a huge difference in performance; the merge with Tess was the best one.
I also tried the SLERP approach I used to create Midnight Miqu v1.0, only using Tess instead of 152334H_miqu-1-70b in that config, and that result was nowhere near as good either.
config.json ADDED
{
    "architectures": [
        "LlamaForCausalLM"
    ],
    "attention_bias": false,
    "attention_dropout": 0.0,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "hidden_act": "silu",
    "hidden_size": 8192,
    "initializer_range": 0.02,
    "intermediate_size": 28672,
    "max_position_embeddings": 32764,
    "model_type": "llama",
    "num_attention_heads": 64,
    "num_hidden_layers": 80,
    "num_key_value_heads": 8,
    "pad_token_id": 0,
    "pretraining_tp": 1,
    "quantization": {
        "group_size": 64,
        "bits": 4
    },
    "quantization_config": {
        "group_size": 64,
        "bits": 4
    },
    "rms_norm_eps": 1e-05,
    "rope_scaling": null,
    "rope_theta": 1000000,
    "tie_word_embeddings": false,
    "torch_dtype": "float16",
    "transformers_version": "4.36.2",
    "use_cache": true,
    "vocab_size": 32000
}
model-00001-of-00008.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:c31a97eb397b273029d39aeec1ac8f6a7c6ccbf81b61da046519715c4d42211e
size 5309951919
model-00002-of-00008.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:9116c4c13943980aec16a2e62f968e7ac5db0e9080f5406d91692d7135cead26
size 5294649703
model-00003-of-00008.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:30d79f7dc380881d4f4f2fabee369dda79b394f9f9e4f6cc7cbf109ccfa63f7c
size 5294649733
model-00004-of-00008.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:2253d715e86c543123830aef1e5cc101187ae8d33040999dab3bf6a81569c40c
size 5294649723
model-00005-of-00008.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:dba1da3d414cfcdb301d81ed8db1bb00d3a3ca7a851e798df11dd3b6ca4614ae
size 5294649747
model-00006-of-00008.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:3be2ddd57fb3297cbccea2414b19f85677c70ac521559201b9d6831282964454
size 5294649737
model-00007-of-00008.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:bfb664d226be465cef2e1a3e3c1ee6afb3fec047ae308380e7db2e48c57598ef
size 5294649713
model-00008-of-00008.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:fab47d998733f8763f48381101f66c016a79569026b3097a1761e4061b96d1cb
size 1723621992
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
{
    "bos_token": {
        "content": "<s>",
        "lstrip": false,
        "normalized": true,
        "rstrip": false,
        "single_word": false
    },
    "eos_token": {
        "content": "</s>",
        "lstrip": false,
        "normalized": true,
        "rstrip": false,
        "single_word": false
    },
    "pad_token": {
        "content": "<unk>",
        "lstrip": false,
        "normalized": true,
        "rstrip": false,
        "single_word": false
    },
    "unk_token": {
        "content": "<unk>",
        "lstrip": false,
        "normalized": true,
        "rstrip": false,
        "single_word": false
    }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
tokenizer_config.json ADDED
{
    "add_bos_token": true,
    "add_eos_token": false,
    "add_prefix_space": null,
    "added_tokens_decoder": {
        "0": {
            "content": "<unk>",
            "lstrip": false,
            "normalized": true,
            "rstrip": false,
            "single_word": false,
            "special": true
        },
        "1": {
            "content": "<s>",
            "lstrip": false,
            "normalized": true,
            "rstrip": false,
            "single_word": false,
            "special": true
        },
        "2": {
            "content": "</s>",
            "lstrip": false,
            "normalized": true,
            "rstrip": false,
            "single_word": false,
            "special": true
        }
    },
    "bos_token": "<s>",
    "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
    "clean_up_tokenization_spaces": false,
    "eos_token": "</s>",
    "legacy": false,
    "model_max_length": 1000000000000000019884624838656,
    "pad_token": "<unk>",
    "sp_model_kwargs": {},
    "spaces_between_special_tokens": false,
    "tokenizer_class": "LlamaTokenizer",
    "unk_token": "<unk>",
    "use_default_system_prompt": false
}