Llama-3-11.5B-Instruct-attenuated
The core idea came from @jukofyork; see this issue: https://github.com/arcee-ai/mergekit/issues/198
As I understand it, the idea is to make the model "think twice" while still leaping the same distance as the original. But why 0.7071067812?
The scale factor x is applied to both q_proj and k_proj, so the query-key dot product is scaled by x^2. Solving x^2 = 1/2 gives x = 1/sqrt(2) ≈ 0.7071067812.
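To make the arithmetic concrete, here is a minimal sketch in plain Python. It assumes an attention logit of the form q·k / sqrt(d) and uses made-up toy vectors; `attention_logit`, `q`, and `k` are illustrative names and are not part of mergekit or the model.

```python
import math

SCALE = 1 / math.sqrt(2)  # ≈ 0.7071067812, the value used in the config


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_logit(q, k):
    """Unnormalised attention score for one query/key pair: q·k / sqrt(d)."""
    return dot(q, k) / math.sqrt(len(q))


# Hypothetical toy vectors standing in for one head's query and key.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

original = attention_logit(q, k)
# Scaling q_proj and k_proj by SCALE scales q and k themselves by SCALE.
attenuated = attention_logit([SCALE * x for x in q], [SCALE * x for x in k])

ratio = attenuated / original
print(f"{ratio:.4f}")  # 0.5000
```

Because both sides of the dot product carry the factor, the logit is multiplied by SCALE squared, which is 1/2. Scaling only one of the two projections by 0.5 would give the same logit; splitting the factor evenly across q_proj and k_proj is the choice the config makes.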
Merge Details
Merge Method
This model was merged using the passthrough merge method.
Models Merged
The following models were included in the merge:
- kuotient/Meta-Llama-3-8B-Instruct
Configuration
The following YAML configuration was used to produce this model:
###############################
# llama-3-attenuated.yaml     #
###############################

# Use: mergekit-yaml --clone-tensors ./llama-3-attenuated.yaml ./llama-3-attenuated
# See: https://github.com/arcee-ai/mergekit/issues/198 for discussion/reasoning behind this idea.

# ---

# The scale factor to use, eg: solve x^2 = 1/2 --> x = 1/sqrt(2) ≈ 0.7071067812
const_tag: &scale_factor 0.7071067812 # 1/sqrt(2)

# The filter parameters of a scaled block.
attenuate-env: &attenuated_env
  parameters:
    scale:
      - filter: q_proj
        value: *scale_factor
      - filter: k_proj
        value: *scale_factor
      - value: 1.0

# ---

slices:

  ###########################
  # Block 1: miqu-1 [0, 16] #
  ###########################
  - sources:
    - model: kuotient/Meta-Llama-3-8B-Instruct
      layer_range: [0, 8]  # The first 8 layers of Block 1 are not duplicated
  - sources:
    - model: kuotient/Meta-Llama-3-8B-Instruct
      layer_range: [8, 16] # The last 8 layers of Block 1 are duplicated twice
      <<: *attenuated_env

  ###########################
  # Block 2: miqu-1 [8, 24] #
  ###########################
  - sources:
    - model: kuotient/Meta-Llama-3-8B-Instruct
      layer_range: [8, 24] # All the layers of Block 2 are duplicated twice
      <<: *attenuated_env

  ############################
  # Block 3: miqu-1 [16, 32] #
  ############################
  - sources:
    - model: kuotient/Meta-Llama-3-8B-Instruct
      layer_range: [16, 24] # The first 8 layers of Block 3 are duplicated twice
      <<: *attenuated_env
  - sources:
    - model: kuotient/Meta-Llama-3-8B-Instruct
      layer_range: [24, 32] # The last 8 layers of Block 3 are not duplicated

merge_method: passthrough
dtype: bfloat16
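The following is a rough sketch of what this configuration produces, written in Python rather than mergekit itself. It expands the slices into the merged layer stack, shows which tensors would receive the scale factor, and estimates the parameter count. Several things here are assumptions and not taken from this card: `SLICES`, `scale_for`, and `stack` are hypothetical names; `layer_range` is treated as half-open; a filter is assumed to match when it is a substring of the tensor name; and the Llama-3-8B dimensions come from the published base-model configuration.

```python
# Slices from the config as (start, end, attenuated); layer_range is treated
# as half-open, so [0, 8] means layers 0..7.
SLICES = [
    (0, 8, False),
    (8, 16, True),
    (8, 24, True),
    (16, 24, True),
    (24, 32, False),
]
SCALE = 0.7071067812  # value of *scale_factor in the config


def scale_for(tensor_name, attenuated):
    """Scale applied to one tensor: q_proj/k_proj are scaled, the rest get 1.0.

    Assumption: a filter matches when it is a substring of the tensor name.
    """
    if attenuated and ("q_proj" in tensor_name or "k_proj" in tensor_name):
        return SCALE
    return 1.0


# Expand the slices into the merged stack of (source_layer, attenuated) pairs.
stack = [(layer, att) for start, end, att in SLICES for layer in range(start, end)]
total_layers = len(stack)
attenuated_layers = sum(1 for _, att in stack if att)

# Assumed Llama-3-8B dimensions (not stated in this card).
HIDDEN, KV_DIM, FFN, VOCAB = 4096, 1024, 14336, 128256
per_layer = (
    2 * HIDDEN * HIDDEN      # q_proj and o_proj
    + 2 * HIDDEN * KV_DIM    # k_proj and v_proj
    + 3 * HIDDEN * FFN       # gate_proj, up_proj, down_proj
    + 2 * HIDDEN             # two norm weights
)
embeddings = 2 * VOCAB * HIDDEN + HIDDEN  # input embedding, lm_head, final norm
total_params = total_layers * per_layer + embeddings

print(total_layers, attenuated_layers)          # 48 32
print(f"{total_params / 1e9:.2f}B parameters")  # 11.52B parameters
```

Under these assumptions the 32 source layers become a 48-layer stack in which every layer from 8 to 23 appears twice with attenuated q_proj and k_proj, which is consistent with the "11.5B" in the model name.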
Model tree for kuotient/Llama-3-11B-Instruct-attenuated
Base model
kuotient/Meta-Llama-3-8B-Instruct