mega-ar-350m-v0.13
Model description
Continued training of BEE-spoke-data/mega-ar-350m-L3t-v0.08-ultraTBfw on a few additional datasets.
It achieves the following results on the evaluation set (BEE-spoke-data/UltraTextbooks-2.1-fw_mix):
- Loss: 1.9926
- Accuracy: 0.5885
- Num Input Tokens Seen: 3468165120
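The evaluation loss can also be read as a perplexity. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language models trained with the Hugging Face Trainer. The evaluation set here differs from the lambada_openai data below, so this number is not directly comparable with the lambada_openai perplexity.

```python
import math

# Reported evaluation loss, assumed to be mean per-token cross-entropy (natural log).
EVAL_LOSS = 1.9926

# Perplexity is the exponential of the mean per-token cross-entropy.
eval_perplexity = math.exp(EVAL_LOSS)
print(f"eval perplexity ~ {eval_perplexity:.2f}")  # eval perplexity ~ 7.33
```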
Quick eval
Quick eval for: pszemraj/mega-ar-350m-v0.13
hf (pretrained=pszemraj/mega-ar-350m-v0.13,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_easy | 1 | none | 0 | acc | 0.4491 | ± 0.0102 |
|  |  | none | 0 | acc_norm | 0.4061 | ± 0.0101 |
| boolq | 2 | none | 0 | acc | 0.5367 | ± 0.0087 |
| lambada_openai | 1 | none | 0 | perplexity | 55.3308 | ± 2.3100 |
|  |  | none | 0 | acc | 0.3113 | ± 0.0065 |
| openbookqa | 1 | none | 0 | acc | 0.1760 | ± 0.0170 |
|  |  | none | 0 | acc_norm | 0.2680 | ± 0.0198 |
| piqa | 1 | none | 0 | acc | 0.6366 | ± 0.0112 |
|  |  | none | 0 | acc_norm | 0.6213 | ± 0.0113 |
| winogrande | 1 | none | 0 | acc | 0.5036 | ± 0.0141 |
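This is a minimal sketch for rerunning this evaluation and reading the results with error bars. It assumes lm-evaluation-harness 0.4.x is installed, which exposes `lm_eval.simple_evaluate`. `run_quick_eval` mirrors the settings in the `hf (pretrained=...)` line above the table. `normal_ci` is a hypothetical helper that is not part of lm-eval. It turns a Value and Stderr pair into an approximate 95% interval under a normal approximation.

```python
# Settings copied from the quick-eval header line (model args, tasks, batch size).
MODEL_ID = "pszemraj/mega-ar-350m-v0.13"
MODEL_ARGS = f"pretrained={MODEL_ID},trust_remote_code=True,dtype=float"
TASKS = ["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"]

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def run_quick_eval(batch_size: int = 8) -> dict:
    """Re-run the zero-shot eval; assumes lm-evaluation-harness 0.4.x."""
    # Lazy import: heavy dependency that also downloads the model on first use.
    import lm_eval

    return lm_eval.simple_evaluate(
        model="hf",
        model_args=MODEL_ARGS,
        tasks=TASKS,
        num_fewshot=None,
        batch_size=batch_size,
        limit=None,
    )


def normal_ci(value: float, stderr: float, z: float = Z_95) -> tuple[float, float]:
    """Hypothetical helper: approximate CI, assuming the metric is roughly normal."""
    return value - z * stderr, value + z * stderr
```

For example, `normal_ci(0.5036, 0.0141)` gives roughly (0.476, 0.531) for winogrande. That interval contains 0.5, which is the chance level for winogrande's two answer options.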
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 80085
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 32
- total_train_batch_size: 96
- total_eval_batch_size: 3
- optimizer: Adam with betas=(0.9,0.985) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0
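The batch-size totals follow from the per-device settings:

- Training: 1 per device × 3 GPUs × 32 accumulation steps = 96.
- Evaluation: 1 × 3 = 3.

The sketch below recomputes both totals. It also shows the learning-rate curve that a `linear` scheduler with `warmup_ratio: 0.05` would produce. This assumes the schedule has the same shape as `get_linear_schedule_with_warmup` in Hugging Face Transformers, with warmup lasting `ceil(0.05 × total_steps)` optimizer steps. The card does not state the total step count, so `total_steps` is a hypothetical parameter.

```python
import math

# Values from the hyperparameter list.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 1  # per device
EVAL_BATCH_SIZE = 1   # per device
NUM_DEVICES = 3
GRAD_ACCUM_STEPS = 32
WARMUP_RATIO = 0.05

# Effective batch sizes: per-device batch x devices (x accumulation for training).
total_train_batch = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = EVAL_BATCH_SIZE * NUM_DEVICES


def linear_warmup_lr(step: int, total_steps: int,
                     base_lr: float = LEARNING_RATE,
                     warmup_ratio: float = WARMUP_RATIO) -> float:
    """LR at an optimizer step: linear ramp from 0, then linear decay to 0.

    `total_steps` is hypothetical; the real step count is not given on the card.
    """
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

Consider a hypothetical 1,000-step run. Warmup lasts 50 steps, the rate peaks at 5e-05 at step 50, and it decays linearly to 0 at step 1,000.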