Commit ae418f7 by HiroseKoichi
Parent(s): ab6c593

Update README.md

README.md CHANGED
@@ -20,7 +20,7 @@ tags:
 I made a really stupid mistake and uploaded two models instead of one. I uploaded the files for both and was going to decide which one to release today, but I got up at 4-5am, immediately got on my PC, and then just set both to public after writing more of the model card. Hopefully no one downloaded it, but if you did, then I'm sorry for the inconvenience.
 
 # Llama-3-8B-Stroganoff-4.0
-Since V3, I tested a lot of old models, looked at some new ones, and used every merge method available in mergekit. This one is from experiments I was doing on model order, which is why all the models use the same parameters, but it was good enough that I decided to upload
+Since V3, I tested a lot of old models, looked at some new ones, and used every merge method available in mergekit. This one is from experiments I was doing on model order, which is why all the models use the same parameters, but it was good enough that I decided to upload it. If you've been doing merges yourself, then most or all of the following information will be redundant, but some of it was not at all apparent to me, so I hope it will help others looking for more information.
 
 Ties is not better than Task-Arithmetic, and Task-Arithmetic is not better than Ties; they both have certain advantages that make them better in different situations. Ties aims to reduce model interference by keeping weights that agree with each other and zeroing out the rest. If you try to use Ties with a bunch of models that do different things, then some aspects of the models might get erased if it doesn't have a strong enough presence. The order of the models does not matter with a Ties merge because all of the merging happens in one step, and changing the model order will produce identical hashes, assuming you're not using Dare or Della, which adds randomness to the merge.
 
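The Ties behaviour described in the README text above (keep the weights that agree, zero out the rest, order does not matter) can be illustrated with a small sketch. This is a simplified toy version in plain Python, not mergekit's implementation: the function names `trim` and `ties_merge`, the three-value "models", and the choice of equal weights are all assumptions made for illustration.

```python
from itertools import permutations


def trim(delta, density):
    """Keep roughly the largest-magnitude `density` fraction of a task vector."""
    k = max(1, round(len(delta) * density))
    threshold = sorted((abs(d) for d in delta), reverse=True)[k - 1]
    return [d if abs(d) >= threshold else 0.0 for d in delta]


def ties_merge(base, models, density=1.0):
    """Toy Ties-style merge: trim, elect a sign per weight, average the agreeing values."""
    # Task vectors: what each fine-tuned model changed relative to the base.
    deltas = [trim([m - b for m, b in zip(model, base)], density) for model in models]
    merged = []
    for i, b in enumerate(base):
        column = [d[i] for d in deltas]
        total = sum(column)
        sign = 1 if total > 0 else -1 if total < 0 else 0
        # Values that disagree with the elected sign are dropped ("zeroed out").
        agreeing = [v for v in column if v * sign > 0]
        merged.append(b + (sum(agreeing) / len(agreeing) if agreeing else 0.0))
    return merged


# Hypothetical three-weight "models" on top of an all-zero base.
base = [0.0, 0.0, 0.0]
model_a = [0.75, 0.5, -0.5]
model_b = [0.5, 0.5, 0.25]
model_c = [0.25, -0.25, 0.0]

result = ties_merge(base, [model_a, model_b, model_c])
print(result)  # [0.5, 0.5, -0.5]

# Every ordering of the inputs gives the same merged weights.
for order in permutations([model_a, model_b, model_c]):
    assert ties_merge(base, list(order)) == result
```

In the second weight, `model_c`'s change of -0.25 is outvoted by the two positive changes and disappears from the result, which is the "erased if it doesn't have a strong enough presence" effect. The order check passes because all models are combined in a single step; the values here were chosen so that floating-point sums are exact, and no random pruning (as in Dare or Della) is involved.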