
This is just an experiment, similar to the one done on chargoddard/llama3-42b-v0: a block of middle layers is pruned out of Meta-Llama-3-8B-Instruct, and the pruned model (~4.98B parameters) is then fine-tuned, or "healed," with QLoRA using ORPO on the code DPO dataset AlekseyKorshuk/evol-codealpaca-v1-dpo. Due to limitations, this was only trained on 3150/4935 (~64%) of the steps. I had to restart the training about halfway through, so the logs are split in two. I am still unsure whether the tokenizer is correct.

Loss: ~1.2

mergekit.yaml

```yaml
slices:
  - sources:
      - model: ./Meta-Llama-3-8B-Instruct/
        layer_range: [0, 15]
  - sources:
      - model: ./Meta-Llama-3-8B-Instruct/
        layer_range: [29, 32]

merge_method: passthrough
dtype: bfloat16
```
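
Since `layer_range` is end-exclusive in mergekit, this keeps layers 0–14 and 29–31, i.e. 18 of the original 32 layers, which is consistent with the ~4.98B parameter count. A minimal sketch of running the merge with mergekit's Python API, assuming the config above is saved as `mergekit.yaml` (the output path is a placeholder, and the `MergeOptions` flags are reasonable defaults rather than the ones originally used):

```python
# Sketch: run the passthrough merge programmatically with mergekit.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Parse the YAML config shown above.
with open("mergekit.yaml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path="./pruned-llama-3",  # placeholder output directory
    options=MergeOptions(copy_tokenizer=True, lazy_unpickle=True),
)
```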

ORPOConfig

```python
from trl import ORPOConfig

orpo_config = ORPOConfig(
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_length=1024,                # max total sequence length
    max_prompt_length=512,          # prompt portion of the sequence
    overwrite_output_dir=False,
    beta=0.1,                       # weight of the odds-ratio term in the ORPO loss
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size of 8 per device
    optim="paged_adamw_8bit",
    num_train_epochs=1,
    evaluation_strategy="steps",
    eval_steps=0.02,                # evaluate every 2% of total steps
    logging_steps=1,
    warmup_steps=50,
    report_to="wandb",
    output_dir=out_dir_folder,      # defined elsewhere in the training script
    fp16=True,
    save_steps=50,
)
```
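
For context, here is a minimal sketch of how this config could be wired into the QLoRA healing run with TRL and PEFT. The LoRA hyperparameters, 4-bit quantization settings, dataset split, and model path are illustrative assumptions, not values taken from the original training script:

```python
# Sketch of the healing run: 4-bit base model + LoRA adapters, trained with
# ORPO on the preference dataset. Everything below that does not appear in
# the ORPOConfig above is an assumption for illustration.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOTrainer

model_path = "./pruned-llama-3"  # placeholder: output of the mergekit step

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # matches fp16=True above
)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

peft_config = LoraConfig(  # assumed LoRA settings
    task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05
)

# ORPO expects prompt/chosen/rejected preference pairs; the split here
# is an assumed 99/1 train/eval split to satisfy evaluation_strategy="steps".
dataset = load_dataset("AlekseyKorshuk/evol-codealpaca-v1-dpo", split="train")
dataset = dataset.train_test_split(test_size=0.01, seed=42)

trainer = ORPOTrainer(
    model=model,
    args=orpo_config,  # the ORPOConfig instance defined above
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Since the run was restarted about halfway through, the second leg would correspond to calling `trainer.train(resume_from_checkpoint=True)`, picking up from the latest checkpoint written every 50 steps.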
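For local inference, a minimal sketch that also doubles as a sanity check on the chat template, given the tokenizer uncertainty mentioned above (the model path is a placeholder; substitute this repository's id or a local directory):

```python
# Quick local generation test, assuming the standard Llama 3 Instruct
# chat template survived the merge.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./healed-llama-3"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a Python function that reverses a string."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```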