Update README.md
README.md
CHANGED
@@ -12,6 +12,18 @@ tags:
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

Re-injected the base model into the instruct model in the intermediate layers while keeping the input and output layers the same (sophosympatheia gradient).
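
For reference, a sophosympatheia-style gradient merge can be expressed in mergekit along the lines of the sketch below. This is only an illustration of the idea, not the configuration used for this model: the model names, layer count, and t values are placeholders, and the actual gradient may differ.

```yaml
# Illustrative sketch only: placeholder model names, layer count, and t values.
# SLERP between the instruct model (kept unchanged at the input/output ends, t = 0)
# and the base model (blended back in toward the middle of the stack).
slices:
  - sources:
      - model: org/model-instruct   # placeholder: the instruct model
        layer_range: [0, 32]
      - model: org/model-base       # placeholder: the pre-trained base model
        layer_range: [0, 32]
merge_method: slerp
base_model: org/model-instruct
parameters:
  t:
    # Interpolated across the layer range: 0 at both ends keeps the input and
    # output layers pure instruct; the mid-list peak re-injects the base model.
    - value: [0.0, 0.3, 0.5, 0.3, 0.0]
dtype: bfloat16
```

With mergekit installed, a config like this is run with `mergekit-yaml config.yml ./merged-model`.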

While this did degrade the model's overall score in EQ-Bench testing compared to the instruct model (76.9195 down to 73.8068), it removed the instruct model's issue with misspelling some of the emotion responses, and the score remains notably higher than the base model's (60.1027, though without any syntax errors). It did throw in one non-misspelled "didn't match reference" syntax error; I presume it replaced the emotion entirely or used a similar, grammatically correct one.

Looking at this as research evidence, it seems like the instruct model picked up something specifically in its intermediate layers that occasionally hurts spelling.

I don't know whether there is any other gain from this merge compared to using one or both of its components; this was done out of curiosity.
It might still be useful as more compact merge material if you wanted both the base and instruct models anyway.

## Merge Details

### Merge Method