kromeurus committed on
Commit 28fb952
1 Parent(s): 3bd680f

Update README.md

Files changed (1): README.md (+28 -5)

README.md CHANGED
@@ -26,7 +26,7 @@ in, ended up with that and solid intruct following. Took a week and a half, but
 
 [OG Q8 GGUF](https://huggingface.co/kromeurus/L3-Horizon-Anteros-v0.1-13B-Q8-GGUF) by me.
 
-Other quants are not available, yet.
 
 ### Details & Recommended Settings
 
@@ -45,8 +45,8 @@ Rec. Settings:
 Template: Model Default
 Temperature: 1.25
 Min P: 0.1
-Repeat Penelty: 1.05
-Repeat Penelty Tokens: 256
 ```
 
 ### Models Merged & Merge Theory
@@ -60,7 +60,30 @@ The following models were included in the merge:
 * [ArliAI/ArliAI-Llama-3-8B-Formax-v1.0](https://huggingface.co/ArliAI/ArliAI-Llama-3-8B-Formax-v1.0)
 * [nothingiisreal/L3-8B-Celeste-V1.2](https://huggingface.co/nothingiisreal/L3-8B-Celeste-V1.2)
 
-too tired rn, will update later
 
 ### Config
 
@@ -199,7 +222,7 @@ parameters:
 int8_mask: true
 merge_method: passthrough
 dtype: bfloat16
-name: himerus.c # Himerus Basis.C, available on it's own.
 ----
 models:
 - model: himerus.c
 
 
 [OG Q8 GGUF](https://huggingface.co/kromeurus/L3-Horizon-Anteros-v0.1-13B-Q8-GGUF) by me.
 
+[GGUFs](https://huggingface.co/backyardai/L3-Horizon-Anteros-v0.1-13B-GGUF) by [BackyardAI](https://huggingface.co/backyardai)
 
 ### Details & Recommended Settings
 
 
 Template: Model Default
 Temperature: 1.25
 Min P: 0.1
+Repeat Penalty: 1.05
+Repeat Penalty Tokens: 256
 ```
 
 ### Models Merged & Merge Theory
 
 * [ArliAI/ArliAI-Llama-3-8B-Formax-v1.0](https://huggingface.co/ArliAI/ArliAI-Llama-3-8B-Formax-v1.0)
 * [nothingiisreal/L3-8B-Celeste-V1.2](https://huggingface.co/nothingiisreal/L3-8B-Celeste-V1.2)
 
+It looks like a lot; I swear it's not that bad.
+
+The general idea was a more story-like RP merge, so I picked models aimed at either basic RP or story writing. Originally it was going to be only six models split into two
+groups to min-max their attributes, then someone on the BackyardAI discord asked if someone could fit Formax into an RP model and I said 'bet'. Small problem: Formax is a
+4k context model while the rest are 8k. As expected, some frankenmerges straight up failed to quant. But I managed to finagle it in.
+
+The first merge (anteros.b) is the more story-writing-heavy base of Anteros, and also the part with the majority of the *moist and spice* in it. Given that the models in it are all NSFW and one
+was trained off of r/DirtyWritingPrompts, who's surprised? Since Instruct DWP is a story-writing model, the top end had to be sole RP models to balance it out, as did the
+very end. But I wanted to keep that 'human' writing style, so I still capped the end with Instruct DWP. What came out was a merge that rambled a ton but was verbose and narrative driven.
+
+The second merge (anteros.c) is the RP-forward merge with Formax at the start, still keeping that 'human' style by capping the end with Celeste v1.2 while stuffing the mids with Niitama
+and Tahsin, two very competent models in their own right. I took a page out of [@matchaaaaa](https://huggingface.co/matchaaaaa)'s recent models with the splice method, where
+you take a slice of the same layers out of two models and merge them. I took it a step further, splicing wherever there was major (4+) layer overlap, streamlining the merge overall.
+The resulting merge is almost snippy, but surprisingly concise and coherent, though it doesn't pick up nuance well.
+
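The splice idea can be sketched as a mergekit-style config; the model names, layer range, and blend value below are placeholders for illustration, not the actual recipe:

```yaml
# Stage 1 of a hypothetical splice: merge the overlapping layer range
# (here 12-20) of two placeholder models into a single spliced slice.
slices:
  - sources:
      - model: modelA            # placeholder, not the actual recipe
        layer_range: [12, 20]
      - model: modelB            # placeholder
        layer_range: [12, 20]
merge_method: slerp
base_model: modelA
parameters:
  t: 0.5                         # even blend of the shared layers
dtype: bfloat16
# Stage 2 would then passthrough-stack modelA[0:12], this result,
# and modelB[20:32] into the final frankenmerge.
```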
+As for how I managed to get Formax in: I did some testing and found that you can get away with merging roughly the first quarter of its layers without much error.
+It's the last few layers that cause trouble; trying to put its end layers at the end of the merge makes the model unquantable. But I don't know how much that affects the
+overall context length of the final model; that still needs more testing.
+
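As a sketch of that layer-subset trick (assuming Llama-3-8B's 32 layers, so "first quarter" is roughly layers 0-8; the second model here is a placeholder):

```yaml
# Hypothetical passthrough stack that keeps only Formax's early layers;
# its troublesome end layers are never included.
slices:
  - sources:
      - model: ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
        layer_range: [0, 8]      # roughly the first quarter of 32 layers
  - sources:
      - model: someRPmodel       # placeholder for the rest of the stack
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```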
+The final merge was a DELLA merge of both anteros.b and anteros.c with multiple gradients. DELLA is a very new merge method as of this model's release; you can read more about it [here](https://arxiv.org/abs/2406.11617).
+I wanted to keep anteros.b's narrative affinity and nuance while also keeping anteros.c's coherence, so I used a high weight of anteros.c with low density at the start, then eased
+the weight down toward the end, where it went up again. I maintained an average density around ~0.4 and an epsilon around 0.05.
+
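A mergekit-style DELLA config in that shape might look like the following; the gradient values are rough illustrations of "high weight and low density at the start, easing down, rising again at the end", not the exact numbers used:

```yaml
models:
  - model: anteros.b             # narrative base
  - model: anteros.c
    parameters:
      # high weight at the start, eased down, back up at the end
      weight: [0.7, 0.5, 0.3, 0.3, 0.6]
      # low density at the start, averaging out around ~0.4
      density: [0.2, 0.4, 0.5, 0.45, 0.4]
merge_method: della
base_model: anteros.b
parameters:
  epsilon: 0.05                  # window around each density for adaptive drops
  lambda: 1.0
dtype: bfloat16
```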
 
 ### Config
 
 int8_mask: true
 merge_method: passthrough
 dtype: bfloat16
+name: anteros.c # Himerus Basis.C, available on its own.
 ----
 models:
 - model: himerus.c