---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: text-generation
tags:
- mergekit
- text-generation
- merge
---

# Mistral-NeuralHermes-Merge-7B-slerp

## Model Description
`Mistral-NeuralHermes-Merge-7B-slerp` merges two models built on the Mistral-7B architecture, `OpenPipe/mistral-ft-optimized-1218` and `mlabonne/NeuralHermes-2.5-Mistral-7B`, using spherical linear interpolation (SLERP). SLERP does not average each pair of corresponding weight tensors along a straight line. It interpolates along the arc between them. The aim is a single model that combines the strong general language ability of the former with the nuanced understanding of the latter.
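The following minimal pure-Python sketch shows what SLERP does to a single pair of weight tensors. Each tensor is treated as a flat list of floats. This is not mergekit's implementation. The helper names (`slerp`, `lerp`, `norm`) and the 0.9995 colinearity threshold are assumptions chosen for illustration.

```python
import math
from typing import List

Vector = List[float]


def norm(v: Vector) -> float:
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in v))


def lerp(t: float, v0: Vector, v1: Vector) -> Vector:
    """Plain linear interpolation (a weighted average)."""
    return [(1 - t) * a + t * b for a, b in zip(v0, v1)]


def slerp(t: float, v0: Vector, v1: Vector, dot_threshold: float = 0.9995) -> Vector:
    """Spherical linear interpolation between two flattened weight tensors.

    t = 0 returns v0 and t = 1 returns v1. The angle is measured between the
    normalized directions, but the weights are applied to the original vectors.
    """
    denom = norm(v0) * norm(v1)
    if denom == 0.0:
        # A zero vector has no direction, so fall back to a straight average.
        return lerp(t, v0, v1)
    # Cosine of the angle between the two directions, clamped for acos.
    cos_theta = max(-1.0, min(1.0, sum(a * b for a, b in zip(v0, v1)) / denom))
    if abs(cos_theta) > dot_threshold:
        # Nearly colinear: sin(theta) is close to 0 and SLERP is unstable, so use LERP.
        return lerp(t, v0, v1)
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


# Two orthogonal unit vectors: compare the midpoints of SLERP and LERP.
e1, e2 = [1.0, 0.0], [0.0, 1.0]
mid_slerp = slerp(0.5, e1, e2)
mid_lerp = lerp(0.5, e1, e2)
print(f"slerp midpoint norm: {norm(mid_slerp):.4f}")
print(f"lerp midpoint norm:  {norm(mid_lerp):.4f}")
```

For two orthogonal unit vectors, the SLERP midpoint stays on the unit circle and has a norm of 1.0000. The linear midpoint shrinks to about 0.7071. Because SLERP keeps interpolated weights close to the norm of the originals, it is usually preferred over plain averaging.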
## Configuration

The merge applies SLERP to every layer of both source models, with `OpenPipe/mistral-ft-optimized-1218` as the base model. `layer_range: [0, 32]` is a half-open range that covers all 32 transformer layers (0 to 31) of each Mistral-7B model. Below is the YAML configuration used for merging:

```yaml
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
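Here `t` is the SLERP interpolation factor. At `t = 0` a tensor is kept from the base model (`OpenPipe/mistral-ft-optimized-1218`), and at `t = 1` it is taken from `mlabonne/NeuralHermes-2.5-Mistral-7B`. A list of values defines a gradient spread across the layer stack. Self-attention tensors start fully on the base model in the first layer and end fully on NeuralHermes in the last layer, passing through the intermediate anchors 0.5, 0.3 and 0.7. MLP (multi-layer perceptron) tensors follow the complementary schedule, where each value is 1 minus the corresponding self-attention value. Every other tensor, such as the layer norms, is merged with a constant `t = 0.5`.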
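The Python sketch below computes the `t` value assigned to a given tensor. `t_for`, `gradient_value` and `SCHEDULE` are hypothetical names invented for this example. The sketch rests on three assumptions: mergekit spaces the anchor values evenly across the layer stack, it interpolates linearly between them, and a `filter` matches when it is a substring of the tensor name. The exact position formula mergekit uses may differ slightly. The tensor names follow the Hugging Face Mistral naming scheme.

```python
from typing import List, Optional, Tuple

# Hypothetical transcription of the `t` entries in the YAML configuration:
# (filter substring, gradient anchors); None is the catch-all entry.
SCHEDULE: List[Tuple[Optional[str], List[float]]] = [
    ("self_attn", [0.0, 0.5, 0.3, 0.7, 1.0]),
    ("mlp", [1.0, 0.5, 0.7, 0.3, 0.0]),
    (None, [0.5]),
]


def gradient_value(anchors: List[float], pos: float) -> float:
    """Linearly interpolate evenly spaced anchors at a position in [0, 1]."""
    if len(anchors) == 1:
        return anchors[0]  # a scalar applies to every layer
    x = pos * (len(anchors) - 1)
    i = min(int(x), len(anchors) - 2)  # index of the anchor to the left
    frac = x - i
    return anchors[i] + frac * (anchors[i + 1] - anchors[i])


def t_for(tensor_name: str, layer_idx: int, num_layers: int = 32) -> float:
    """SLERP factor for one tensor: 0 keeps the base model, 1 takes the other."""
    # Assumption: the first layer maps to 0.0 and the last layer to 1.0.
    pos = layer_idx / (num_layers - 1)
    for name_filter, anchors in SCHEDULE:
        # Assumption: a filter matches when it is a substring of the tensor name.
        if name_filter is None or name_filter in tensor_name:
            return gradient_value(anchors, pos)
    raise ValueError(f"no schedule entry matches {tensor_name!r}")


for layer in (0, 8, 16, 24, 31):
    attn = t_for(f"model.layers.{layer}.self_attn.q_proj.weight", layer)
    mlp = t_for(f"model.layers.{layer}.mlp.up_proj.weight", layer)
    other = t_for(f"model.layers.{layer}.input_layernorm.weight", layer)
    print(f"layer {layer:2d}: self_attn t={attn:.3f}  mlp t={mlp:.3f}  other t={other:.3f}")
```

Each MLP anchor is 1 minus the corresponding self-attention anchor, so the two factors sum to 1 at every layer. Wherever the attention weights lean toward one parent, the MLP weights lean toward the other.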