Lambent committed on
Commit f07dc20
1 Parent(s): 54df78a

Update README.md

Files changed (1): README.md (+14, -2)
README.md CHANGED

```diff
@@ -6,7 +6,7 @@ library_name: transformers
 tags:
 - mergekit
 - merge
-
+- not-for-all-audiences
 ---
 
 <img src="https://cdn.midjourney.com/f62c61c7-c5b4-4fa2-8d8c-ece4c65808a4/0_0.jpeg"></img>
@@ -15,6 +15,16 @@ This is a merge of pre-trained language models created using [mergekit](https://
 
 ## Merge Details
 
+WARNING: There's actually a *reason* for the not-for-all-audiences tag on this one.
+
+Qwen2.5 was much more refusal-censored in the first place compared to Mistral Nemo, but abliteration adjusts that.
+
+(It's still probably more prudish. Humanlike style points and successful instruction following aren't really a pointer away from that.)
+
+Given it's at least half-abliterated, I can't even promise it'll refuse with a guardrailed system prompt.
+
+(I suspect it will due to the healing and re-integration of the base model, but may be more jailbreakable than fully intact refusal features.)
+
 v1.1 was based on *approximately* the same steps as v1, but based on the abliterated version of Qwen-Instruct.
 
 Presuming this dealt some damage, this version heals it with the middle layers of v1.
@@ -22,6 +32,8 @@ It's still less 'refusal-censored' than v1, though be sure to calibrate the syst
 EQ-bench testing had some syntax issues still but tested at 76.1336 (with Qwen prompt that I plan on removing).
 Not too bad given at least half of it's been through abliteration and DPO.
 
+
+
 ### Merge Method
 
 This model was merged using the SLERP merge method.
@@ -47,4 +59,4 @@ parameters:
 dtype: bfloat16
 
 
-```
+```
```
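For readers unfamiliar with the SLERP merge method named in the README, the snippet below is a minimal sketch of spherical linear interpolation between corresponding weight tensors of two checkpoints, assuming PyTorch. The function `slerp` and the interpolation factor `t` are illustrative assumptions, not mergekit's actual code; the real merge was produced by mergekit from the config whose body is elided in the diff above.

```python
# Minimal SLERP sketch, assuming PyTorch. Illustrative only -- the actual
# merge was produced by mergekit's SLERP implementation, not this code.
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors a and b at factor t."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)
    # Angle between the two weight vectors on the unit sphere.
    omega = torch.acos(torch.clamp(torch.dot(a_unit, b_unit), -1.0, 1.0))
    so = torch.sin(omega)
    if so.abs() < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        out = (1.0 - t) * a_flat + t * b_flat
    else:
        out = (torch.sin((1.0 - t) * omega) / so) * a_flat + (torch.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape).to(a.dtype)

# Hypothetical usage: blend a layer's weights from two checkpoints,
# e.g. leaning toward one model's middle layers as the card describes.
# merged_weight = slerp(v1_1_layer_weight, v1_layer_weight, t=0.5)
```

SLERP walks along the arc between the two weight vectors rather than the straight line between them, which is why it is often preferred over plain averaging when blending fine-tuned checkpoints.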