Added weights
- README.md +89 -0
- config.json +39 -0
- generation_config.json +8 -0
- job_new.json +0 -0
- measurement.json +0 -0
- output.safetensors +3 -0
- special_tokens_map.json +24 -0
- tokenizer.model +3 -0
- tokenizer_config.json +44 -0
README.md
ADDED
@@ -0,0 +1,89 @@
---
license: other
license_name: freeuse
license_link: LICENSE
tags:
- not-for-all-audiences
---
This is a 4bpw (bits per weight) EXL2 quantization of [TheDrummer/Moistral-11B-v1](https://huggingface.co/TheDrummer/Moistral-11B-v1), made with the default calibration dataset.
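For a quick start, here is a minimal loading sketch with the exllamav2 library, mirroring its example scripts; the local directory name is an assumption, so download the repo files there first:

```python
# Minimal exllamav2 loading sketch (a sketch, not the author's method).
# Assumes the repo files sit in ./Moistral-11B-v1-exl2-4bpw (hypothetical path).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./Moistral-11B-v1-exl2-4bpw"
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # stream weights in, splitting across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()  # default sampling settings
print(generator.generate_simple("Once upon a time,", settings, 200))
```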
# Original Model card:

# Moistral 11B v1 💦

*An eRP model with a rich vocabulary of moist. Finetuned by yours truly.*

GGUF: https://huggingface.co/TheDrummer/Moistral-11B-v1-GGUF/tree/main

![image/gif](https://cdn-uploads.huggingface.co/production/uploads/65f2fd1c25b848bd061b5c2e/JLzCB9m6U-8rICy7ZRQPx.gif)

## Description

Moistral 11B is an *aggressive* **finetune** of the legendary [Fimbulvetr v2](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2) model.

If you haven't heard of that model, then your poor-GPU ass is missing out!
It is an amazing substitute for those heavyweight models, and it won't disappoint.
*(That's coming from someone who daily drives a Venus / Goliath 120B)*

...and if you're looking for something *more moist*, then you've come to the right place.

You can expect the same sort of performance here! It's the Honda Civic of eRP models!

### Pimp my ride
Turn any old story into a **Moist**ral masterpiece.
In fact, we recommend that you leave out any hints of *moist* in your story to _let Moistral cook_.

### Drive it like it's hot
When in Instruct Mode, Moistral will treat you like the director of your own fantasy ride.
Be a controlling navigator or a passenger princess; it won't matter - you're in for a good ride!

<details>
<summary>Moistral Sample</summary>
<b>ASSISTANT:</b><br/>
When the doors to the throne room finally opened, she saw him there - Dik, the sorcerer prince sitting on his throne. He looked majestic, just as Princess Lilia expected. In a robe embroidered with silver yarn and covered with magical runes. His courtiers stood on either side of him, all looking at the princess with interest. Princess Lilia felt her determination leaving her, but pulled herself together and once again reminded herself that she should not be afraid. She approached the throne intending to show the dignity of all Iliac by personal example.<br/>
<b>USER:</b><br/>
She bows slightly<br/>
<b>ASSISTANT:</b><br/>
Princess Lilia bowed slightly in front of Dik, the sorcerer prince. Her eyes fell to the floor out of respect, but soon rose again. She would show herself to be a dignified woman!<br/>
"I greet you, your highness," Princess Lilia said. "I am here on behalf of my father, King Edmund Ruhr, to serve as a hostage and envoy."<br/>
Dik nodded, but didn't say anything.<br/>
</details>

### It's got the mileage
Moistral has been trained on many long-form texts, a nice chunk of which are 8K tokens in length.
It is capable of going far and long without passing it back to you. This is not your typical chibi RP model.

### Parameters
If Moistral starts to underperform and spit out broken tokens, I've noticed that lowering the parameters makes it coherent again. Here's what worked for me:
```yaml
temperature: 0.66
repetition_penalty: 1.1
top_p: 0.64
rp_slp: 1
```
I encourage you to play around with the parameters yourself to see what works for you.
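If you run the exllamav2 sketch from earlier, those values map onto its sampler roughly like this (a sketch under the same assumptions; `rp_slp` is presumably a frontend repetition-penalty slope and has no direct field here):

```python
# Applying the suggested parameters to exllamav2's sampler settings.
from exllamav2.generator import ExLlamaV2Sampler

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.66
settings.top_p = 0.64
settings.token_repetition_penalty = 1.1
# "rp_slp" looks frontend-specific (repetition-penalty slope) and is
# omitted here; set it in your frontend if it exposes one.
```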
## What's next?
Moistral 11B is my first attempt at finetuning a capable model (Sorry, CreamPhi-2).
It's coherent and creative enough to let me understand the impact of my dataset & training.
Playing around with it has already given me a better idea of the do's and don'ts.
I will most likely make a version 2 with some improvements:
1. Remove any glitchy texts that come from my dataset. Sanitize, sanitize, sanitize!
2. Balance out the themes in the dataset for a richer, more diverse experience.
3. Consider extending the context window.
4. Add a 'monologue' dataset that forces the model to keep talking without much interaction from the `user`.
5. Maybe, just maybe, expose it to dry stuff to let Moistral cook.

GGUF: https://huggingface.co/TheDrummer/Moistral-11B-v1-GGUF/tree/main

I have to acknowledge that I'm standing on the shoulders of giants.
Thank you Sao for sharing your finetune config along with tips on getting started.
Thanks to everyone in the Finetuning channel for entertaining my every question.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65f2fd1c25b848bd061b5c2e/Ll8CA5RR7ugTi72P2HBb8.png)
config.json
ADDED
@@ -0,0 +1,39 @@
{
  "_name_or_path": "Sao10K/Fimbulvetr-11B-v2",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 48,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": false,
  "vocab_size": 32000,
  "quantization_config": {
    "quant_method": "exl2",
    "version": "0.0.16",
    "bits": 4.0,
    "head_bits": 6,
    "calibration": {
      "rows": 100,
      "length": 2048,
      "dataset": "(default)"
    }
  }
}
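A quick way to confirm the quantization settings from a local copy (the directory path below is an assumption):

```python
# Inspect the EXL2 quantization block recorded in config.json
# (./Moistral-11B-v1-exl2-4bpw is a hypothetical local path).
import json

with open("./Moistral-11B-v1-exl2-4bpw/config.json") as f:
    cfg = json.load(f)

quant = cfg["quantization_config"]
print(f"{quant['bits']} bpw, {quant['head_bits']}-bit head,"
      f" calibrated on {quant['calibration']['rows']} rows"
      f" of length {quant['calibration']['length']}")
```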
generation_config.json
ADDED
@@ -0,0 +1,8 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "transformers_version": "4.38.2",
  "use_cache": false
}
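These generation defaults can be read back with transformers if you prefer (a sketch; the local path is an assumption):

```python
# Read the shipped generation defaults
# (./Moistral-11B-v1-exl2-4bpw is a hypothetical local path).
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("./Moistral-11B-v1-exl2-4bpw")
print(gen_cfg.do_sample)                           # True: sampling on by default
print(gen_cfg.bos_token_id, gen_cfg.eos_token_id)  # 1, 2
```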
job_new.json
ADDED
The diff for this file is too large to render.
measurement.json
ADDED
The diff for this file is too large to render.
output.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ee4fb164e5134201d499ceb4708f85943d13713bba344117483054929ba8bc36
size 5601411052
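To check that a download matches this LFS pointer, hash the file and compare it to the oid (a sketch; assumes the file sits in the current directory):

```python
# Verify output.safetensors against the LFS pointer's sha256 oid.
import hashlib

sha = hashlib.sha256()
with open("output.safetensors", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
        sha.update(chunk)

expected = "ee4fb164e5134201d499ceb4708f85943d13713bba344117483054929ba8bc36"
assert sha.hexdigest() == expected, "hash mismatch: re-download the file"
```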
special_tokens_map.json
ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
size 493443
tokenizer_config.json
ADDED
@@ -0,0 +1,44 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": true,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": true,
  "use_fast": true
}
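A short sanity check of the tokenizer files with transformers (a sketch; the local path is an assumption):

```python
# Load the tokenizer and confirm the special-token setup above
# (./Moistral-11B-v1-exl2-4bpw is a hypothetical local path).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("./Moistral-11B-v1-exl2-4bpw")
print(tok.bos_token, tok.eos_token, tok.unk_token)  # <s> </s> <unk>
print(tok.pad_token)           # </s>: EOS doubles as the pad token
print(tok("Hello").input_ids)  # starts with 1, since add_bos_token is true
```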