EVA LLaMA 3.33 70B v0.0
An RP/storywriting specialist model: a full-parameter finetune of Llama-3.3-70B-Instruct on a mixture of synthetic and natural data.
It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve the versatility, creativity, and "flavor" of the resulting model.
This model was built with Llama by Meta.
The prompt format is Llama 3.
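For reference, a single Llama 3 exchange is wrapped in header and end-of-turn tokens like this (the system and user text are placeholders):

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```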
Recommended sampler values:
- Temperature: 1
- Min-P: 0.05
- Repetition Penalty: 1.03
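As a usage sketch, these values can be passed through an OpenAI-compatible endpoint; the snippet below assumes a local server (e.g. vLLM or TabbyAPI) that accepts `min_p` and `repetition_penalty` as extra sampling parameters, and the URL, API key, and messages are placeholders:

```python
from openai import OpenAI

# Placeholder endpoint/key for a local OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0",
    messages=[
        {"role": "system", "content": "You are a creative roleplay and storywriting partner."},
        {"role": "user", "content": "Continue the scene from where we left off."},
    ],
    temperature=1.0,  # recommended temperature
    extra_body={
        "min_p": 0.05,              # recommended Min-P (non-standard OpenAI field)
        "repetition_penalty": 1.03, # recommended repetition penalty (non-standard OpenAI field)
    },
)
print(response.choices[0].message.content)
```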
Recommended SillyTavern preset (via Virt-io):
Training data:
- Celeste 70B 0.1 data mixture minus the Opus Instruct subset. See that model's card for details.
- Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.
- A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe
- A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe
- Synthstruct and SynthRP datasets by Epiculous
- A subset from Dolphin-2.9.3, including a filtered version of not_samantha and a small subset of systemchat.
Training time and hardware:
- 10 hours on 8xH100 SXM
The model was created by Kearm, Auri and Cahvay.
Special thanks:
- to Cahvay for his work on dataset filtering.
- to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous, and CognitiveComputations for the data,
- and to Allura-org for support, feedback, beta-testing, and quality control of EVA models.
Licensing
Llama-3.3-70B-Instruct by Meta is licensed under the Llama 3.3 Community License Agreement (hereafter the L3.3 license) and is subject to the Acceptable Use Policy for Llama Materials.
This derivative is free for personal, research, and commercial use under the terms of the L3.3 license, with one extra clause:
- Infermatic Inc. and any of its employees or paid associates may not utilize, distribute, download, or otherwise make use of EVA models for any purpose.
See axolotl config
axolotl version: 0.4.1
base_model: meta-llama/Llama-3.3-70B-Instruct
plugins:
- axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true
strict: false
chat_template: llama3
datasets:
- path: datasets/Celeste_Filtered_utf8fix.jsonl
  type: sharegpt
- path: datasets/deduped_not_samantha_norefusals.jsonl
  type: sharegpt
- path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
  type: sharegpt
- path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
  type: sharegpt
- path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
  type: sharegpt
- path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
  type: sharegpt
- path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
  type: sharegpt
- path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
  type: sharegpt
dataset_prepared_path: last_run_prepared
val_set_size: 0.001
output_dir: /dev/shm/EVA-LLaMA-3.33-70B-v0.1
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
wandb_project: EVA-LLaMA-3.33-70B
wandb_entity:
wandb_watch:
wandb_name: Unit-v0.1
wandb_log_model:
unfrozen_parameters:
- ^lm_head.weight$
- ^model.embed_tokens.weight$
# mlp.down_proj layers
- model.layers.40.mlp.down_proj
- model.layers.44.mlp.down_proj
- model.layers.45.mlp.down_proj
- model.layers.46.mlp.down_proj
- model.layers.43.mlp.down_proj
- model.layers.52.mlp.down_proj
- model.layers.47.mlp.down_proj
- model.layers.39.mlp.down_proj
- model.layers.48.mlp.down_proj
- model.layers.49.mlp.down_proj
- model.layers.38.mlp.down_proj
- model.layers.53.mlp.down_proj
- model.layers.35.mlp.down_proj
- model.layers.41.mlp.down_proj
- model.layers.51.mlp.down_proj
- model.layers.42.mlp.down_proj
- model.layers.37.mlp.down_proj
- model.layers.50.mlp.down_proj
- model.layers.76.mlp.down_proj
- model.layers.60.mlp.down_proj
- model.layers.36.mlp.down_proj
- model.layers.54.mlp.down_proj
- model.layers.57.mlp.down_proj
- model.layers.56.mlp.down_proj
- model.layers.59.mlp.down_proj
- model.layers.55.mlp.down_proj
- model.layers.77.mlp.down_proj
- model.layers.61.mlp.down_proj
- model.layers.58.mlp.down_proj
- model.layers.65.mlp.down_proj
- model.layers.75.mlp.down_proj
- model.layers.64.mlp.down_proj
- model.layers.62.mlp.down_proj
- model.layers.68.mlp.down_proj
- model.layers.19.mlp.down_proj
- model.layers.73.mlp.down_proj
- model.layers.66.mlp.down_proj
- model.layers.67.mlp.down_proj
- model.layers.63.mlp.down_proj
- model.layers.74.mlp.down_proj
# mlp.gate_proj layers
- model.layers.70.mlp.gate_proj
- model.layers.71.mlp.gate_proj
- model.layers.67.mlp.gate_proj
- model.layers.58.mlp.gate_proj
- model.layers.55.mlp.gate_proj
- model.layers.57.mlp.gate_proj
- model.layers.56.mlp.gate_proj
- model.layers.66.mlp.gate_proj
- model.layers.72.mlp.gate_proj
- model.layers.52.mlp.gate_proj
- model.layers.69.mlp.gate_proj
- model.layers.54.mlp.gate_proj
- model.layers.62.mlp.gate_proj
- model.layers.60.mlp.gate_proj
- model.layers.59.mlp.gate_proj
- model.layers.74.mlp.gate_proj
- model.layers.51.mlp.gate_proj
- model.layers.68.mlp.gate_proj
- model.layers.61.mlp.gate_proj
- model.layers.53.mlp.gate_proj
- model.layers.73.mlp.gate_proj
- model.layers.63.mlp.gate_proj
- model.layers.48.mlp.gate_proj
- model.layers.49.mlp.gate_proj
- model.layers.64.mlp.gate_proj
- model.layers.50.mlp.gate_proj
- model.layers.65.mlp.gate_proj
- model.layers.47.mlp.gate_proj
- model.layers.44.mlp.gate_proj
- model.layers.45.mlp.gate_proj
- model.layers.75.mlp.gate_proj
- model.layers.46.mlp.gate_proj
- model.layers.43.mlp.gate_proj
- model.layers.77.mlp.gate_proj
- model.layers.41.mlp.gate_proj
- model.layers.40.mlp.gate_proj
- model.layers.42.mlp.gate_proj
- model.layers.32.mlp.gate_proj
- model.layers.30.mlp.gate_proj
- model.layers.39.mlp.gate_proj
# mlp.up_proj layers
- model.layers.70.mlp.up_proj
- model.layers.67.mlp.up_proj
- model.layers.66.mlp.up_proj
- model.layers.69.mlp.up_proj
- model.layers.62.mlp.up_proj
- model.layers.63.mlp.up_proj
- model.layers.65.mlp.up_proj
- model.layers.68.mlp.up_proj
- model.layers.71.mlp.up_proj
- model.layers.64.mlp.up_proj
- model.layers.61.mlp.up_proj
- model.layers.58.mlp.up_proj
- model.layers.59.mlp.up_proj
- model.layers.57.mlp.up_proj
- model.layers.55.mlp.up_proj
- model.layers.72.mlp.up_proj
- model.layers.54.mlp.up_proj
- model.layers.56.mlp.up_proj
- model.layers.60.mlp.up_proj
- model.layers.73.mlp.up_proj
- model.layers.50.mlp.up_proj
- model.layers.51.mlp.up_proj
- model.layers.53.mlp.up_proj
- model.layers.52.mlp.up_proj
- model.layers.74.mlp.up_proj
- model.layers.49.mlp.up_proj
- model.layers.30.mlp.up_proj
- model.layers.47.mlp.up_proj
- model.layers.46.mlp.up_proj
- model.layers.34.mlp.up_proj
- model.layers.48.mlp.up_proj
- model.layers.38.mlp.up_proj
- model.layers.45.mlp.up_proj
- model.layers.43.mlp.up_proj
- model.layers.29.mlp.up_proj
- model.layers.42.mlp.up_proj
- model.layers.75.mlp.up_proj
- model.layers.35.mlp.up_proj
- model.layers.44.mlp.up_proj
- model.layers.31.mlp.up_proj
# self_attn.k_proj layers
- model.layers.72.self_attn.k_proj
- model.layers.75.self_attn.k_proj
- model.layers.71.self_attn.k_proj
- model.layers.74.self_attn.k_proj
- model.layers.44.self_attn.k_proj
- model.layers.31.self_attn.k_proj
- model.layers.33.self_attn.k_proj
- model.layers.34.self_attn.k_proj
- model.layers.76.self_attn.k_proj
- model.layers.78.self_attn.k_proj
- model.layers.77.self_attn.k_proj
- model.layers.60.self_attn.k_proj
- model.layers.56.self_attn.k_proj
- model.layers.22.self_attn.k_proj
- model.layers.2.self_attn.k_proj
- model.layers.18.self_attn.k_proj
- model.layers.17.self_attn.k_proj
- model.layers.21.self_attn.k_proj
- model.layers.19.self_attn.k_proj
- model.layers.23.self_attn.k_proj
- model.layers.52.self_attn.k_proj
- model.layers.73.self_attn.k_proj
- model.layers.35.self_attn.k_proj
- model.layers.15.self_attn.k_proj
- model.layers.27.self_attn.k_proj
- model.layers.29.self_attn.k_proj
- model.layers.36.self_attn.k_proj
- model.layers.28.self_attn.k_proj
- model.layers.20.self_attn.k_proj
- model.layers.25.self_attn.k_proj
- model.layers.37.self_attn.k_proj
- model.layers.30.self_attn.k_proj
- model.layers.41.self_attn.k_proj
- model.layers.16.self_attn.k_proj
- model.layers.32.self_attn.k_proj
- model.layers.68.self_attn.k_proj
- model.layers.26.self_attn.k_proj
- model.layers.38.self_attn.k_proj
- model.layers.39.self_attn.k_proj
- model.layers.70.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.50.self_attn.o_proj
- model.layers.61.self_attn.o_proj
- model.layers.46.self_attn.o_proj
- model.layers.53.self_attn.o_proj
- model.layers.54.self_attn.o_proj
- model.layers.19.self_attn.o_proj
- model.layers.42.self_attn.o_proj
- model.layers.41.self_attn.o_proj
- model.layers.49.self_attn.o_proj
- model.layers.68.self_attn.o_proj
- model.layers.18.self_attn.o_proj
- model.layers.45.self_attn.o_proj
- model.layers.11.self_attn.o_proj
- model.layers.48.self_attn.o_proj
- model.layers.51.self_attn.o_proj
- model.layers.67.self_attn.o_proj
- model.layers.64.self_attn.o_proj
- model.layers.13.self_attn.o_proj
- model.layers.14.self_attn.o_proj
- model.layers.16.self_attn.o_proj
- model.layers.17.self_attn.o_proj
- model.layers.47.self_attn.o_proj
- model.layers.0.self_attn.o_proj
- model.layers.20.self_attn.o_proj
- model.layers.63.self_attn.o_proj
- model.layers.5.self_attn.o_proj
- model.layers.15.self_attn.o_proj
- model.layers.21.self_attn.o_proj
- model.layers.52.self_attn.o_proj
- model.layers.12.self_attn.o_proj
- model.layers.10.self_attn.o_proj
- model.layers.56.self_attn.o_proj
- model.layers.62.self_attn.o_proj
- model.layers.22.self_attn.o_proj
- model.layers.6.self_attn.o_proj
- model.layers.7.self_attn.o_proj
- model.layers.43.self_attn.o_proj
- model.layers.38.self_attn.o_proj
- model.layers.9.self_attn.o_proj
- model.layers.44.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.2.self_attn.q_proj
- model.layers.4.self_attn.q_proj
- model.layers.46.self_attn.q_proj
- model.layers.5.self_attn.q_proj
- model.layers.7.self_attn.q_proj
- model.layers.6.self_attn.q_proj
- model.layers.9.self_attn.q_proj
- model.layers.10.self_attn.q_proj
- model.layers.1.self_attn.q_proj
- model.layers.18.self_attn.q_proj
- model.layers.62.self_attn.q_proj
- model.layers.8.self_attn.q_proj
- model.layers.15.self_attn.q_proj
- model.layers.14.self_attn.q_proj
- model.layers.31.self_attn.q_proj
- model.layers.17.self_attn.q_proj
- model.layers.16.self_attn.q_proj
- model.layers.19.self_attn.q_proj
- model.layers.12.self_attn.q_proj
- model.layers.33.self_attn.q_proj
- model.layers.35.self_attn.q_proj
- model.layers.21.self_attn.q_proj
- model.layers.13.self_attn.q_proj
- model.layers.27.self_attn.q_proj
- model.layers.56.self_attn.q_proj
- model.layers.34.self_attn.q_proj
- model.layers.11.self_attn.q_proj
- model.layers.52.self_attn.q_proj
- model.layers.28.self_attn.q_proj
- model.layers.54.self_attn.q_proj
- model.layers.30.self_attn.q_proj
- model.layers.29.self_attn.q_proj
- model.layers.20.self_attn.q_proj
- model.layers.75.self_attn.q_proj
- model.layers.37.self_attn.q_proj
- model.layers.44.self_attn.q_proj
- model.layers.23.self_attn.q_proj
- model.layers.64.self_attn.q_proj
- model.layers.60.self_attn.q_proj
- model.layers.36.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.11.self_attn.v_proj
- model.layers.17.self_attn.v_proj
- model.layers.37.self_attn.v_proj
- model.layers.40.self_attn.v_proj
- model.layers.41.self_attn.v_proj
- model.layers.42.self_attn.v_proj
- model.layers.43.self_attn.v_proj
- model.layers.44.self_attn.v_proj
- model.layers.45.self_attn.v_proj
- model.layers.46.self_attn.v_proj
- model.layers.48.self_attn.v_proj
- model.layers.49.self_attn.v_proj
- model.layers.50.self_attn.v_proj
- model.layers.51.self_attn.v_proj
- model.layers.53.self_attn.v_proj
- model.layers.54.self_attn.v_proj
- model.layers.55.self_attn.v_proj
- model.layers.57.self_attn.v_proj
- model.layers.58.self_attn.v_proj
- model.layers.59.self_attn.v_proj
- model.layers.60.self_attn.v_proj
- model.layers.61.self_attn.v_proj
- model.layers.62.self_attn.v_proj
- model.layers.63.self_attn.v_proj
- model.layers.64.self_attn.v_proj
- model.layers.65.self_attn.v_proj
- model.layers.66.self_attn.v_proj
- model.layers.67.self_attn.v_proj
- model.layers.69.self_attn.v_proj
- model.layers.75.self_attn.v_proj
- model.layers.18.self_attn.v_proj
- model.layers.78.self_attn.v_proj
- model.layers.68.self_attn.v_proj
- model.layers.47.self_attn.v_proj
- model.layers.38.self_attn.v_proj
- model.layers.39.self_attn.v_proj
- model.layers.71.self_attn.v_proj
- model.layers.19.self_attn.v_proj
- model.layers.36.self_attn.v_proj
- model.layers.20.self_attn.v_proj
gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00003
max_grad_norm: 2
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: "unsloth"
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 20
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.2
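For reference, a config like this is normally launched through axolotl's standard entry point, e.g. `accelerate launch -m axolotl.cli.train config.yaml` (with `config.yaml` standing in for the file above); the `deepspeed` key points the run at the ZeRO-3 bf16 config shipped with axolotl.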