Update README.md
---
tags:
- yuvraj17/Llama-3-8B-spectrum-25
- ruggsea/Llama3-stanford-encyclopedia-philosophy-QA
- arcee-ai/Llama-3.1-SuperNova-Lite
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# Llama3-8B-SuperNova-Spectrum-dare_ties

Llama3-8B-SuperNova-Spectrum-dare_ties is a `DARE_TIES` merge of the following models, created with [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [yuvraj17/Llama-3-8B-spectrum-25](https://huggingface.co/yuvraj17/Llama-3-8B-spectrum-25)
* [ruggsea/Llama3-stanford-encyclopedia-philosophy-QA](https://huggingface.co/ruggsea/Llama3-stanford-encyclopedia-philosophy-QA)
* [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite)

## DARE_TIES Merging

### TIES Merging

[TIES](https://arxiv.org/abs/2306.01708) merging, introduced by Yadav et al. (2023), is a method for merging multiple specialized models into one general-purpose model. It addresses two key challenges:

* **Redundancy Removal**: identifies and eliminates overlapping or insignificant parameter changes across the fine-tuned models, keeping only the most important updates.
* **Conflict Resolution**: reconciles sign disagreements between models by electing a unified sign vector that represents the dominant direction of change across all models.
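The trim, elect, and merge steps above can be sketched on a single weight tensor as follows. This is a minimal NumPy illustration, not mergekit's actual implementation; the function name `ties_merge` and the `density` parameter (the fraction of deltas kept after trimming) are chosen here for illustration:

```python
import numpy as np

def ties_merge(base, finetuned, density=0.5):
    """Toy TIES merge of several fine-tuned tensors onto one base tensor."""
    deltas = [w - base for w in finetuned]
    trimmed = []
    for d in deltas:
        # Trim: keep only the top `density` fraction of deltas by magnitude.
        k = max(1, int(np.ceil(density * d.size)))
        thresh = np.sort(np.abs(d).ravel())[-k]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack(trimmed)
    # Elect: the dominant sign per parameter across all trimmed deltas.
    elected = np.sign(stacked.sum(axis=0))
    # Merge: average only the deltas whose sign agrees with the elected one.
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    counts = agree.sum(axis=0)
    merged = (stacked * agree).sum(axis=0) / np.maximum(counts, 1)
    return base + merged
```

Note how a parameter where the two models pull in opposite directions (a sign conflict) contributes nothing to the merge unless one direction dominates.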

### DARE Merging

Introduced by Yu et al. (2023), [DARE](https://arxiv.org/abs/2311.03099) takes an approach similar to TIES, with two main differences:

* **Weight Pruning**: randomly resets a fraction of the fine-tuned weight deltas back to their original base-model values.
* **Weight Scaling**: rescales the surviving deltas before combining them with the base model's weights, so that expected performance stays consistent.
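The drop-and-rescale step can be sketched on a single tensor as follows (again a simplified illustration, not mergekit's implementation; `dare_prune` and its parameters are named here for the example, with `p` as the drop rate):

```python
import numpy as np

def dare_prune(base, finetuned, p=0.9, seed=0):
    """Toy DARE step: randomly drop deltas, rescale the survivors by 1/(1-p)."""
    rng = np.random.default_rng(seed)
    delta = finetuned - base
    mask = rng.random(delta.shape) >= p        # keep each delta with prob 1 - p
    return base + (delta * mask) / (1.0 - p)   # rescale to preserve the expected delta
```

Because survivors are scaled by `1/(1-p)`, the expected value of the pruned delta matches the original delta even at high drop rates.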

Mergekit's implementation of this method has two flavours: with the sign-election step of TIES (`dare_ties`) or without it (`dare_linear`).

For more information, refer to [Merge Large Language Models with MergeKit by Maxime Labonne](https://towardsdatascience.com/merge-large-language-models-with-mergekit-2118fb392b54).
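For orientation, a `dare_ties` merge in mergekit is typically described by a YAML file along these lines. This is an illustrative sketch only: the model names and the `density`/`weight` values are placeholders, and this model's actual configuration is the one given in the Configuration section:

```yaml
# Illustrative dare_ties layout (placeholder values, not this model's config)
models:
  - model: meta-llama/Meta-Llama-3-8B
    # base model: no parameters entry needed
  - model: yuvraj17/Llama-3-8B-spectrum-25
    parameters:
      density: 0.5   # fraction of deltas kept
      weight: 0.4    # contribution to the merge
merge_method: dare_ties
base_model: meta-llama/Meta-Llama-3-8B
dtype: bfloat16
```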

## 🧩 Configuration

```yaml
...
```

```python
...
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

## 🏆 Evaluation Scores

Coming soon.

## Special thanks & Reference

- Maxime Labonne for the easy-to-use Colab notebook [Merging LLMs with MergeKit](https://github.com/mlabonne/llm-course/blob/main/Mergekit.ipynb) and the accompanying [blog post](https://towardsdatascience.com/merge-large-language-models-with-mergekit-2118fb392b54)
- The authors of [Mergekit](https://github.com/arcee-ai/mergekit)