
llama 3 1b

wip effort to make a merge-compatible llama model

comparison to palmer-004

| Component | palmer-004 | llama 3 1b | How to Make Second Similar to First |
|---|---|---|---|
| Total Layers | 22 (0 to 21) | 16 (0 to 15) | Add 6 more layers (16 to 21) with identical structure to existing layers |
| Embedding Layer | model.embed_tokens.weight | model.embed_tokens.weight | Already identical |
| Self-Attention Layers | 22 sets of (q_proj, k_proj, v_proj, o_proj) weights | 16 sets of (q_proj, k_proj, v_proj, o_proj) weights | Add 6 more sets of self-attention weights |
| MLP Layers | 22 sets of (gate_proj, up_proj, down_proj) weights | 16 sets of (gate_proj, up_proj, down_proj) weights | Add 6 more sets of MLP weights |
| Layer Normalization | 22 sets of (input_layernorm, post_attention_layernorm) weights | 16 sets of (input_layernorm, post_attention_layernorm) weights | Add 6 more sets of layer normalization weights |
| Final Normalization | model.norm.weight | model.norm.weight | Already identical |
| Language Model Head | lm_head.weight | lm_head.weight | Already identical |
| Layer Structure | Consistent across all 22 layers | Consistent across all 16 layers | Maintain the same structure when adding new layers |
| Hidden Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same hidden size |
| Attention Heads | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same number of attention heads |
| Intermediate MLP Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same intermediate MLP size |
| Position Embeddings | Not explicitly mentioned (might be part of embed_tokens) | Not explicitly mentioned (might be part of embed_tokens) | Ensure position embeddings support the maximum sequence length of the first model |
| Vocabulary Size | Determined by embed_tokens and lm_head dimensions | Determined by embed_tokens and lm_head dimensions | Already identical (assuming dimensions match) |
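To that end, the attempted merge is a mergekit passthrough over the 16-layer model. The sketch below is roughly what such a config looks like, written out from Python for convenience; the repeated `layer_range` values and the output path are illustrative assumptions, not the exact config used here.

```python
from pathlib import Path

# Hypothetical passthrough config: stack the 16-layer model with a repeat of its
# last few layers to reach 22 layers (palmer-004's depth). The layer_range
# values below are illustrative, not the actual ranges used in this repo.
config = """\
merge_method: passthrough
dtype: bfloat16
slices:
  - sources:
      - model: meta-llama/Llama-3.2-1B
        layer_range: [0, 16]
  - sources:
      - model: meta-llama/Llama-3.2-1B
        layer_range: [10, 16]
"""

Path("passthrough.yml").write_text(config)
# then: mergekit-yaml passthrough.yml ./merged-llama-22l
```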

further investigation

there are no differences between these models, but for some reason i'm constantly facing this error when doing a passthrough merge:

```
Traceback (most recent call last):
  File "/home/zeus/miniconda3/envs/cloudspace/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/teamspace/studios/this_studio/mergekit/mergekit/options.py", line 82, in wrapper
    f(*args, **kwargs)
  File "/teamspace/studios/this_studio/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/teamspace/studios/this_studio/mergekit/mergekit/merge.py", line 96, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "/teamspace/studios/this_studio/mergekit/mergekit/graph.py", line 197, in run
    res = task.execute(**arguments)
  File "/teamspace/studios/this_studio/mergekit/mergekit/io/tasks.py", line 86, in execute
    raise RuntimeError(
RuntimeError: Tensor lm_head.weight required but not present in model meta-llama/Llama-3.2-1B
```

which seems odd, given the following tensors listed for llama-3-1b:

```
model.embed_tokens.weight
model.layers.0.self_attn.q_proj.weight
model.layers.0.self_attn.k_proj.weight
model.layers.0.self_attn.v_proj.weight
model.layers.0.self_attn.o_proj.weight
model.layers.0.mlp.gate_proj.weight
model.layers.0.mlp.up_proj.weight
model.layers.0.mlp.down_proj.weight
model.layers.0.input_layernorm.weight
model.layers.0.post_attention_layernorm.weight
model.layers.1.self_attn.q_proj.weight
model.layers.1.self_attn.k_proj.weight
model.layers.1.self_attn.v_proj.weight
model.layers.1.self_attn.o_proj.weight
model.layers.1.mlp.gate_proj.weight
model.layers.1.mlp.up_proj.weight
model.layers.1.mlp.down_proj.weight
model.layers.1.input_layernorm.weight
model.layers.1.post_attention_layernorm.weight
model.layers.2.self_attn.q_proj.weight
model.layers.2.self_attn.k_proj.weight
model.layers.2.self_attn.v_proj.weight
model.layers.2.self_attn.o_proj.weight
model.layers.2.mlp.gate_proj.weight
model.layers.2.mlp.up_proj.weight
model.layers.2.mlp.down_proj.weight
model.layers.2.input_layernorm.weight
model.layers.2.post_attention_layernorm.weight
model.layers.3.self_attn.q_proj.weight
model.layers.3.self_attn.k_proj.weight
model.layers.3.self_attn.v_proj.weight
model.layers.3.self_attn.o_proj.weight
model.layers.3.mlp.gate_proj.weight
model.layers.3.mlp.up_proj.weight
model.layers.3.mlp.down_proj.weight
model.layers.3.input_layernorm.weight
model.layers.3.post_attention_layernorm.weight
model.layers.4.self_attn.q_proj.weight
model.layers.4.self_attn.k_proj.weight
model.layers.4.self_attn.v_proj.weight
model.layers.4.self_attn.o_proj.weight
model.layers.4.mlp.gate_proj.weight
model.layers.4.mlp.up_proj.weight
model.layers.4.mlp.down_proj.weight
model.layers.4.input_layernorm.weight
model.layers.4.post_attention_layernorm.weight
model.layers.5.self_attn.q_proj.weight
model.layers.5.self_attn.k_proj.weight
model.layers.5.self_attn.v_proj.weight
model.layers.5.self_attn.o_proj.weight
model.layers.5.mlp.gate_proj.weight
model.layers.5.mlp.up_proj.weight
model.layers.5.mlp.down_proj.weight
model.layers.5.input_layernorm.weight
model.layers.5.post_attention_layernorm.weight
model.layers.6.self_attn.q_proj.weight
model.layers.6.self_attn.k_proj.weight
model.layers.6.self_attn.v_proj.weight
model.layers.6.self_attn.o_proj.weight
model.layers.6.mlp.gate_proj.weight
model.layers.6.mlp.up_proj.weight
model.layers.6.mlp.down_proj.weight
model.layers.6.input_layernorm.weight
model.layers.6.post_attention_layernorm.weight
model.layers.7.self_attn.q_proj.weight
model.layers.7.self_attn.k_proj.weight
model.layers.7.self_attn.v_proj.weight
model.layers.7.self_attn.o_proj.weight
model.layers.7.mlp.gate_proj.weight
model.layers.7.mlp.up_proj.weight
model.layers.7.mlp.down_proj.weight
model.layers.7.input_layernorm.weight
model.layers.7.post_attention_layernorm.weight
model.layers.8.self_attn.q_proj.weight
model.layers.8.self_attn.k_proj.weight
model.layers.8.self_attn.v_proj.weight
model.layers.8.self_attn.o_proj.weight
model.layers.8.mlp.gate_proj.weight
model.layers.8.mlp.up_proj.weight
model.layers.8.mlp.down_proj.weight
model.layers.8.input_layernorm.weight
model.layers.8.post_attention_layernorm.weight
model.layers.9.self_attn.q_proj.weight
model.layers.9.self_attn.k_proj.weight
model.layers.9.self_attn.v_proj.weight
model.layers.9.self_attn.o_proj.weight
model.layers.9.mlp.gate_proj.weight
model.layers.9.mlp.up_proj.weight
model.layers.9.mlp.down_proj.weight
model.layers.9.input_layernorm.weight
model.layers.9.post_attention_layernorm.weight
model.layers.10.self_attn.q_proj.weight
model.layers.10.self_attn.k_proj.weight
model.layers.10.self_attn.v_proj.weight
model.layers.10.self_attn.o_proj.weight
model.layers.10.mlp.gate_proj.weight
model.layers.10.mlp.up_proj.weight
model.layers.10.mlp.down_proj.weight
model.layers.10.input_layernorm.weight
model.layers.10.post_attention_layernorm.weight
model.layers.11.self_attn.q_proj.weight
model.layers.11.self_attn.k_proj.weight
model.layers.11.self_attn.v_proj.weight
model.layers.11.self_attn.o_proj.weight
model.layers.11.mlp.gate_proj.weight
model.layers.11.mlp.up_proj.weight
model.layers.11.mlp.down_proj.weight
model.layers.11.input_layernorm.weight
model.layers.11.post_attention_layernorm.weight
model.layers.12.self_attn.q_proj.weight
model.layers.12.self_attn.k_proj.weight
model.layers.12.self_attn.v_proj.weight
model.layers.12.self_attn.o_proj.weight
model.layers.12.mlp.gate_proj.weight
model.layers.12.mlp.up_proj.weight
model.layers.12.mlp.down_proj.weight
model.layers.12.input_layernorm.weight
model.layers.12.post_attention_layernorm.weight
model.layers.13.self_attn.q_proj.weight
model.layers.13.self_attn.k_proj.weight
model.layers.13.self_attn.v_proj.weight
model.layers.13.self_attn.o_proj.weight
model.layers.13.mlp.gate_proj.weight
model.layers.13.mlp.up_proj.weight
model.layers.13.mlp.down_proj.weight
model.layers.13.input_layernorm.weight
model.layers.13.post_attention_layernorm.weight
model.layers.14.self_attn.q_proj.weight
model.layers.14.self_attn.k_proj.weight
model.layers.14.self_attn.v_proj.weight
model.layers.14.self_attn.o_proj.weight
model.layers.14.mlp.gate_proj.weight
model.layers.14.mlp.up_proj.weight
model.layers.14.mlp.down_proj.weight
model.layers.14.input_layernorm.weight
model.layers.14.post_attention_layernorm.weight
model.layers.15.self_attn.q_proj.weight
model.layers.15.self_attn.k_proj.weight
model.layers.15.self_attn.v_proj.weight
model.layers.15.self_attn.o_proj.weight
model.layers.15.mlp.gate_proj.weight
model.layers.15.mlp.up_proj.weight
model.layers.15.mlp.down_proj.weight
model.layers.15.input_layernorm.weight
model.layers.15.post_attention_layernorm.weight
model.norm.weight
lm_head.weight
```
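One way to dig further is to compare what the loaded module reports against what is actually stored in the checkpoint file mergekit reads. The sketch below is a minimal check, assuming access to meta-llama/Llama-3.2-1B and a single-shard model.safetensors; a tied output head (`tie_word_embeddings`) would show up in the in-memory state_dict but not as a separate tensor on disk, which could explain the error above.

```python
from huggingface_hub import hf_hub_download
from safetensors import safe_open
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Llama-3.2-1B"  # the model named in the error above

# If the config ties the output head to the embeddings, lm_head.weight is not
# stored as a separate tensor in the checkpoint.
cfg = AutoConfig.from_pretrained(model_id)
print("tie_word_embeddings:", getattr(cfg, "tie_word_embeddings", None))

# Keys of the in-memory module (presumably how the list above was produced);
# a tied lm_head still appears here.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
print("lm_head.weight in state_dict:", "lm_head.weight" in model.state_dict())

# Keys actually present in the file mergekit reads. Assumes a single-shard
# "model.safetensors"; a sharded checkpoint would need its index JSON instead.
path = hf_hub_download(model_id, "model.safetensors")
with safe_open(path, framework="pt") as f:
    print("lm_head.weight on disk:", "lm_head.weight" in f.keys())
```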