This model is intended as a base for fine-tuning; it should not be used for inference as is. It is a pruned version of Meta-Llama-3-70B-Instruct.

Meta-Llama-3-70B-Instruct has 70.6 billion parameters, while Drobeta-Turnu-Severin has 44.9 billion (44.9/70.6 ≈ 63.6% of the original parameter count).

Steps to replicate:

Use `laserQlora.ipynb` from cognitivecomputations/laserRMT to determine which layers should be eliminated.

Adapt the script for Meta-Llama-3-70B-Instruct by replacing `model_name = "mistralai/Mistral-7B-v0.1"` with `model_name = "Meta-Llama-3-70B-Instruct"` and `layer_numbers = list(range(31, -1, -1))` with `layer_numbers = list(range(79, -1, -1))`, 79 being the index of the last decoder layer in Meta-Llama-3-70B-Instruct (the model has 80 decoder layers, indexed 0-79).
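For reference, the two adapted assignments would look like this in the notebook (a minimal sketch; only these two lines change, the rest of `laserQlora.ipynb` stays as-is):

```python
# Adapted lines in laserQlora.ipynb for Meta-Llama-3-70B-Instruct
model_name = "Meta-Llama-3-70B-Instruct"  # was "mistralai/Mistral-7B-v0.1"

# Iterate over all 80 decoder layers (indexes 0-79) in reverse order,
# mirroring the original range(31, -1, -1) used for Mistral-7B's 32 layers.
layer_numbers = list(range(79, -1, -1))
```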

Then look for the layer indexes where the `self_attn.v_proj` SNR is infinite and eliminate those layers using mergekit. These are the layer indexes that were eliminated: 11, 17, 37, 40, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 53, 54, 55, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69.
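The `layer_range` slices in the config below follow directly from that list: every contiguous run of surviving layers becomes one slice. A small helper like the following (hypothetical, not part of laserRMT or mergekit) reproduces the ranges:

```python
# Layers whose self_attn.v_proj SNR came out infinite in the adapted notebook.
ELIMINATED = {11, 17, 37, 40, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51,
              53, 54, 55, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69}
NUM_LAYERS = 80  # Meta-Llama-3-70B-Instruct decoder layers, indexed 0-79

# Collect contiguous runs of kept layers as [start, end) ranges,
# matching mergekit's exclusive-end layer_range convention.
ranges, start = [], None
for i in range(NUM_LAYERS + 1):
    keep = i < NUM_LAYERS and i not in ELIMINATED
    if keep and start is None:
        start = i
    elif not keep and start is not None:
        ranges.append([start, i])
        start = None

print(ranges)
# -> [[0, 11], [12, 17], [18, 37], [38, 40], [47, 48], [52, 53], [56, 57], [70, 80]]
```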

Here is the mergekit config:

```yaml
slices:
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [0, 11]
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [12, 17]
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [18, 37]
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [38, 40]
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [47, 48]
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [52, 53]
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [56, 57]
  - sources:
    - model: "meta-llama/Meta-Llama-3-70B-Instruct"
      layer_range: [70, 80]
merge_method: passthrough
dtype: bfloat16
```
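To produce the pruned model, save the config to a file and run it through mergekit's CLI, e.g. `mergekit-yaml prune.yml ./Drobeta-Turnu-Severin` (the file and output names here are illustrative; `--cuda` can be added to run the merge on GPU). The `passthrough` merge method copies the selected layer ranges verbatim and concatenates them, which is what makes it suitable for layer pruning.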