
tFINE-680m-e32-d16-infinity_instruct-L2

This is an instruction-tuned version of a pretrained T5 model that uses grouped-query attention (GQA).

Model description

This model is a fine-tuned version of BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L1 on the pszemraj/infinity-instruct-7m-T2T_en dataset (config deduped-L2).

It achieves the following results on the evaluation set:

  • Loss: 1.3139
  • Num Input Tokens Seen: 361724696

Usage

Prerequisite: you need the t5-gqa fork of transformers installed, along with accelerate.

from transformers import pipeline

# device_map="auto" requires accelerate to be installed
pipe = pipeline(
    "text2text-generation",
    model="BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2",
    device_map="auto",
)
prompt = "Write me a python fn that demonstrates an advanced sorting algorithm"
res = pipe(
    prompt, max_new_tokens=384, num_beams=4, early_stopping=True, repetition_penalty=1.1
)
print(res[0]["generated_text"])
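If you prefer explicit control over loading (for example, to pick the dtype yourself), the standard auto classes can be used instead of the pipeline. This is a minimal sketch assuming the t5-gqa fork exposes the usual AutoTokenizer / AutoModelForSeq2SeqLM entry points; the bfloat16 dtype mirrors the eval setup below.

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference, matching the eval dtype below
    device_map="auto",           # requires accelerate
)

prompt = "Write me a python fn that demonstrates an advanced sorting algorithm"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=384,
    num_beams=4,
    early_stopping=True,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))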

Quick eval

Quick eval for: BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2

hf (pretrained=BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2,trust_remote_code=True,dtype=bfloat16), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

| Tasks         | Version | Filter | n-shot | Metric     | Value  |   | Stderr |
|---------------|---------|--------|--------|------------|--------|---|--------|
| boolq         | 2       | none   | 0      | acc ↑      | 0.6364 | ± | 0.0084 |
| openbookqa    | 1       | none   | 0      | acc ↑      | 0.1480 | ± | 0.0159 |
|               |         | none   | 0      | acc_norm ↑ | 0.2860 | ± | 0.0202 |
| piqa          | 1       | none   | 0      | acc ↑      | 0.6083 | ± | 0.0114 |
|               |         | none   | 0      | acc_norm ↑ | 0.6132 | ± | 0.0114 |
| social_iqa    | 0       | none   | 0      | acc ↑      | 0.3854 | ± | 0.0110 |
| tinyArc       | 0       | none   | 25     | acc_norm ↑ | 0.3122 | ± | N/A    |
| tinyHellaswag | 0       | none   | 10     | acc_norm ↑ | 0.3356 | ± | N/A    |
| tinyMMLU      | 0       | none   | 0      | acc_norm ↑ | 0.2793 | ± | N/A    |
| winogrande    | 1       | none   | 0      | acc ↑      | 0.5201 | ± | 0.0140 |
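The header line above is the lm-evaluation-harness summary; a rough sketch of reproducing the run through the harness's Python API is shown below. The task list is inferred from the table, the tiny* task configs carry their own few-shot settings, and exact numbers may differ slightly between harness versions.

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2,"
        "trust_remote_code=True,dtype=bfloat16"
    ),
    # task names taken from the table above
    tasks=[
        "boolq", "openbookqa", "piqa", "social_iqa",
        "tinyArc", "tinyHellaswag", "tinyMMLU", "winogrande",
    ],
    batch_size=8,
)
print(results["results"])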

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch mirroring them follows the list):

  • learning_rate: 2.5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 17868
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 256
  • total_eval_batch_size: 8
  • optimizer: paged_ademamix_32bit (no additional optimizer arguments)
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.02
  • num_epochs: 1.0
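For reference, here is a rough Seq2SeqTrainingArguments configuration mirroring the list above. This is a sketch, not the exact training script: output_dir is a placeholder, bf16=True is an assumption (the hyperparameter list does not state the training precision), and the paged_ademamix_32bit optim string requires a transformers + bitsandbytes install that supports it.

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="tFINE-680m-infinity-instruct-L2",  # placeholder path
    learning_rate=2.5e-5,
    per_device_train_batch_size=4,   # 4 per device x 2 GPUs x 32 accumulation = 256 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=32,
    seed=17868,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.02,
    num_train_epochs=1.0,
    optim="paged_ademamix_32bit",    # needs bitsandbytes and a recent transformers
    bf16=True,                       # assumption: not stated in the hyperparameter list
)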

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| 1.4008        | 0.2534 | 1000 | 1.4020          | 91375832          |
| 1.3456        | 0.5068 | 2000 | 1.3669          | 182939052         |
| 1.3437        | 0.7602 | 3000 | 1.3378          | 274855796         |
