---
base_model:
  - anthracite-org/magnum-v3-9b-customgemma2
  - nbeerbower/gemma2-gutenberg-9B
  - grimjim/Magnolia-v1-Gemma2-8k-9B
  - UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
  - BeaverLegacy/Smegmma-Deluxe-9B-v1
  - ifable/gemma-2-Ifable-9B
library_name: transformers
tags:
  - mergekit
  - merge
---

# Aster-G2-9B-v1

This is a merge of pre-trained language models created using [mergekit](https://github.com/arcee-ai/mergekit).

## Merge Details

### Merge Method

This model was merged in two stages: the SLERP merge method first produced an intermediate model, and the Model Stock merge method then used that intermediate model as its base to mix in additional models.
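
For reference, SLERP interpolates each pair of weight tensors along the arc between them rather than along a straight line; the `t` values in the configuration below are the interpolation factors (0 keeps the base model's tensor, 1 takes the other model's). A standard statement of the formula, included here for context rather than taken from mergekit itself, is:

$$\operatorname{slerp}(p, q; t) = \frac{\sin\big((1-t)\,\Omega\big)}{\sin \Omega}\, p + \frac{\sin(t\,\Omega)}{\sin \Omega}\, q, \qquad \cos \Omega = \frac{p \cdot q}{\lVert p \rVert\,\lVert q \rVert}.$$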

The idea was to make a nice and smart base model and add in a few pinches of spice.
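
As a rough sketch of how the two-stage pipeline could be reproduced with mergekit's Python API (the config filenames below are hypothetical; the `MergeConfiguration`/`run_merge` calls follow mergekit's documented usage at the time of writing):

```python
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge


def merge_from_yaml(config_path: str, out_path: str) -> None:
    # Parse a mergekit YAML config into a MergeConfiguration.
    with open(config_path, "r", encoding="utf-8") as fp:
        config = MergeConfiguration.model_validate(yaml.safe_load(fp))
    # Run the merge and write the merged weights to out_path.
    run_merge(
        config,
        out_path,
        options=MergeOptions(
            cuda=torch.cuda.is_available(),
            copy_tokenizer=True,
            lazy_unpickle=True,
        ),
    )


# Stage 1: SLERP the two base models into an intermediate model.
merge_from_yaml("slerp-intermediate.yaml", "output/intermediate")
# Stage 2: Model Stock merge using the intermediate model as the base.
merge_from_yaml("model-stock-aster.yaml", "output/aster-g2-9b-v1")
```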

For some reason mergekit wouldn't let me use any other merge method for the second stage: every method except Model Stock threw ModelReference errors about my intermediary model. I'll see if I can fix it and upload my intended task-arithmetic version as a v2.

This is the only one of my roughly 700 merges that I think uses something novel or interesting enough in its creation to merit an upload.

### Models Merged

The following models were included in the merge:

- [anthracite-org/magnum-v3-9b-customgemma2](https://huggingface.co/anthracite-org/magnum-v3-9b-customgemma2)
- [nbeerbower/gemma2-gutenberg-9B](https://huggingface.co/nbeerbower/gemma2-gutenberg-9B)
- [BeaverLegacy/Smegmma-Deluxe-9B-v1](https://huggingface.co/BeaverLegacy/Smegmma-Deluxe-9B-v1)
- [ifable/gemma-2-Ifable-9B](https://huggingface.co/ifable/gemma-2-Ifable-9B)
- [UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3)
- [grimjim/Magnolia-v1-Gemma2-8k-9B](https://huggingface.co/grimjim/Magnolia-v1-Gemma2-8k-9B)

### Configuration

The following YAML configurations were used to produce this model (the intermediary config is shown commented out above the final config):

```yaml
# THIS YAML CONFIGURATION WAS USED TO CREATE THE INTERMEDIARY MODEL.
# slices:
#   - sources:
#     - model: anthracite-org/magnum-v3-9b-customgemma2
#       layer_range: [0, 42]
#     - model: nbeerbower/gemma2-gutenberg-9B
#       layer_range: [0, 42]
# merge_method: slerp
# base_model: nbeerbower/gemma2-gutenberg-9B
# parameters:
#   t:
#     - filter: self_attn
#       value: [0.2, 0.5, 0.4, 0.7, 1]
#     - filter: mlp
#       value: [1, 0.5, 0.3, 0.4, 0.2]
#     - value: 0.5
# dtype: float16

# THIS YAML CONFIGURATION WAS USED TO CREATE ASTER. The E: model is the intermediate
# model created in the previous config.
models:
  - model: E:/models/mergekit/output/intermediate/
  - model: BeaverLegacy/Smegmma-Deluxe-9B-v1
    parameters:
      weight: 0.3
  - model: ifable/gemma-2-Ifable-9B
    parameters:
      weight: 0.3
  - model: UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
    parameters:
      weight: 0.15
  - model: grimjim/Magnolia-v1-Gemma2-8k-9B
    parameters:
      weight: 0.25
merge_method: model_stock
base_model: E:/models/mergekit/output/intermediate/
dtype: float16
```
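
For completeness, a minimal sketch of loading the merged model with transformers (the repo id here is an assumption based on this card's name; substitute wherever the weights actually live):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; adjust to the actual location of the merged weights.
model_id = "twosmoothslateslabs/Aster-G2-9B-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the merge was produced in float16
    device_map="auto",
)

# Gemma-2 models use the Gemma chat template.
messages = [{"role": "user", "content": "Write a two-sentence story about a comet."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```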

Alright, now back to smashing models together and seeing what happens...