This is an uncensored version of Phi-3-mini-4k-instruct.

It was abliterated following the guide here: https://huggingface.co/blog/mlabonne/abliteration, and then fine-tuned with DPO on mlabonne/orpo-dpo-mix-40k.
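Below is a minimal usage sketch with Transformers; the prompt and generation settings are illustrative rather than taken from the original card:

```python
# Minimal usage sketch. The generation settings here are illustrative assumptions,
# not values from the original model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cowWhySo/Phi-3-mini-4k-instruct-Friendly"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain what abliteration does to a model."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```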
See axolotl config

axolotl version: `0.4.0`

```yaml
base_model: cowWhySo/Phi-3-mini-4k-instruct-Friendly
trust_remote_code: true
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: phi_3
load_in_8bit: false
load_in_4bit: true
strict: false
save_safetensors: true
rl: dpo
datasets:
  - path: mlabonne/orpo-dpo-mix-40k
    split: train
    type: chatml.intel
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./out
sequence_len: 4096
sample_packing: false
pad_to_sequence_len: false
adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 32
lora_dropout: 0.1
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_name: phi3-mini-4k-instruct-Friendly
wandb_log_model:
gradient_accumulation_steps: 8
micro_batch_size: 4
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: linear
learning_rate: 5e-6
train_on_inputs: false
group_by_length: false
bf16: auto
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: True
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 150
evals_per_epoch: 0
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero3.json
weight_decay: 0.01
max_grad_norm: 1.0
resize_token_embeddings_to_32x: true
```
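For reference, here is a rough PEFT/bitsandbytes equivalent of the QLoRA settings above. The `target_modules` value is an assumption (axolotl's `lora_target_linear: true` targets the linear projection layers); this is a sketch, not the exact training code:

```python
# Hypothetical PEFT/bitsandbytes equivalent of the QLoRA settings above;
# the target_modules choice is an assumption, not part of the original config.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load_in_4bit: true
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16: auto
)

model = AutoModelForCausalLM.from_pretrained(
    "cowWhySo/Phi-3-mini-4k-instruct-Friendly",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,                         # lora_r
    lora_alpha=32,                # lora_alpha
    lora_dropout=0.1,             # lora_dropout
    task_type="CAUSAL_LM",
    target_modules="all-linear",  # approximates lora_target_linear: true
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```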
## Quants

- GGUF: https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly-gguf
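A quick sketch for running one of the GGUF quants locally with `llama-cpp-python`; the file name below is a placeholder, so check the GGUF repo for the actual quant names:

```python
# Sketch only: the GGUF file name is a placeholder, pick a real one from the repo above.
from llama_cpp import Llama

llm = Llama(
    model_path="Phi-3-mini-4k-instruct-Friendly-Q4_K_M.gguf",  # hypothetical file name
    n_ctx=4096,  # matches the model's 4k context window
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a limerick about quantization."}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```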
## Benchmarks

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Phi-3-mini-4k-instruct-Friendly | 41 | 67.56 | 46.36 | 39.3 | 48.56 |
### AGIEval

| Task | Version | Metric | Value |   | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 22.05 | ± | 2.61 |
|  |  | acc_norm | 22.05 | ± | 2.61 |
| agieval_logiqa_en | 0 | acc | 41.01 | ± | 1.93 |
|  |  | acc_norm | 41.32 | ± | 1.93 |
| agieval_lsat_ar | 0 | acc | 22.17 | ± | 2.75 |
|  |  | acc_norm | 22.17 | ± | 2.75 |
| agieval_lsat_lr | 0 | acc | 45.69 | ± | 2.21 |
|  |  | acc_norm | 45.88 | ± | 2.21 |
| agieval_lsat_rc | 0 | acc | 59.48 | ± | 3.00 |
|  |  | acc_norm | 56.51 | ± | 3.03 |
| agieval_sat_en | 0 | acc | 75.24 | ± | 3.01 |
|  |  | acc_norm | 70.39 | ± | 3.19 |
| agieval_sat_en_without_passage | 0 | acc | 39.81 | ± | 3.42 |
|  |  | acc_norm | 37.86 | ± | 3.39 |
| agieval_sat_math | 0 | acc | 33.64 | ± | 3.19 |
|  |  | acc_norm | 31.82 | ± | 3.15 |

Average: 41.0%
### GPT4All

| Task | Version | Metric | Value |   | Stderr |
|---|---|---|---|---|---|
| arc_challenge | 0 | acc | 49.74 | ± | 1.46 |
|  |  | acc_norm | 50.43 | ± | 1.46 |
| arc_easy | 0 | acc | 76.68 | ± | 0.87 |
|  |  | acc_norm | 73.23 | ± | 0.91 |
| boolq | 1 | acc | 79.27 | ± | 0.71 |
| hellaswag | 0 | acc | 57.91 | ± | 0.49 |
|  |  | acc_norm | 77.13 | ± | 0.42 |
| openbookqa | 0 | acc | 35.00 | ± | 2.14 |
|  |  | acc_norm | 43.80 | ± | 2.22 |
| piqa | 0 | acc | 77.86 | ± | 0.97 |
|  |  | acc_norm | 79.54 | ± | 0.94 |
| winogrande | 0 | acc | 69.53 | ± | 1.29 |

Average: 67.56%
### TruthfulQA

| Task | Version | Metric | Value |   | Stderr |
|---|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 31.21 | ± | 1.62 |
|  |  | mc2 | 46.36 | ± | 1.55 |

Average: 46.36%
### Bigbench

| Task | Version | Metric | Value |   | Stderr |
|---|---|---|---|---|---|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 54.74 | ± | 3.62 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 66.67 | ± | 2.46 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 29.46 | ± | 2.84 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 11.98 | ± | 1.72 |
|  |  | exact_str_match | 0.00 | ± | 0.00 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 28.00 | ± | 2.01 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 17.14 | ± | 1.43 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 45.67 | ± | 2.88 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 24.40 | ± | 1.92 |
| bigbench_navigate | 0 | multiple_choice_grade | 53.70 | ± | 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 68.10 | ± | 1.04 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 31.03 | ± | 2.19 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 15.93 | ± | 1.16 |
| bigbench_snarks | 0 | multiple_choice_grade | 77.35 | ± | 3.12 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 52.64 | ± | 1.59 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 51.50 | ± | 1.58 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 19.52 | ± | 1.12 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 13.89 | ± | 0.83 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 45.67 | ± | 2.88 |

Average: 39.3%

Average score: 48.56%
## Training Summary

```json
{
  "train/loss": 0.299,
  "train/grad_norm": 0.9337566701340533,
  "train/learning_rate": 0,
  "train/rewards/chosen": 0.08704188466072083,
  "train/rewards/rejected": -2.835820436477661,
  "train/rewards/accuracies": 0.84375,
  "train/rewards/margins": 2.9228620529174805,
  "train/logps/rejected": -509.9840393066406,
  "train/logps/chosen": -560.8234252929688,
  "train/logits/rejected": 1.6356163024902344,
  "train/logits/chosen": 1.7323706150054932,
  "train/epoch": 1.002169197396963,
  "train/global_step": 231,
  "_timestamp": 1717711643.3345022,
  "_runtime": 22808.557655334473,
  "_step": 231,
  "train_runtime": 22809.152,
  "train_samples_per_second": 1.944,
  "train_steps_per_second": 0.01,
  "total_flos": 0,
  "train_loss": 0.44557410065745895,
  "_wandb": {
    "runtime": 22810
  }
}
```
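The `train/rewards/*` entries follow the usual DPO convention: the implicit reward of a completion is beta times the gap between policy and reference log-probabilities. Below is a generic sketch of how such metrics are typically computed; it is not axolotl's internal code, and the beta value is an assumption since the config above does not set it:

```python
# Generic illustration of DPO reward metrics; not axolotl's implementation.
import torch

beta = 0.1  # assumed DPO beta; not specified in the config above

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps):
    # Implicit rewards: beta * (log p_policy - log p_reference)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    # "accuracies" = fraction of pairs where the chosen completion out-scores the rejected one
    accuracies = (chosen_rewards > rejected_rewards).float().mean()
    # DPO loss: -log sigmoid(reward margin)
    loss = -torch.nn.functional.logsigmoid(margins).mean()
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), margins.mean(), accuracies
```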