---
language: en
license: mit
library_name: pytorch
---
# Plainly Optimized Network

Dataset: BIGBENCH

Trainer Hyperparameters:

- `lr` = 5e-05
- `per_device_batch_size` = 1
- `gradient_accumulation_steps` = 4
- `weight_decay` = 1e-09
- `seed` = 42
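
The hyperparameters above imply an effective batch size larger than the per-device value, because gradients are accumulated over several steps before each optimizer update. The following is a minimal sketch of that arithmetic. The `TrainerHyperparameters` class and the `num_devices` argument are hypothetical names introduced for illustration; the card does not state how many devices were used, so a single device is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainerHyperparameters:
    """Hypothetical container for the values listed on this card."""

    lr: float = 5e-05
    per_device_batch_size: int = 1
    gradient_accumulation_steps: int = 4
    weight_decay: float = 1e-09
    seed: int = 42

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Samples contributing to one optimizer update.
        # num_devices=1 is an assumption; the card does not report it.
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * num_devices
        )


hparams = TrainerHyperparameters()
print(hparams.effective_batch_size())  # 1 * 4 * 1 = 4
```

Under the single-device assumption, each optimizer step sees 4 samples. With more devices the value scales linearly.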

|eval_loss|eval_accuracy|epoch|
|--|--|--|
|66.323|0.063|1.0|
|59.935|0.055|2.0|
|60.344|0.056|3.0|
|58.559|0.054|4.0|
|56.373|0.051|5.0|
|58.011|0.053|6.0|
|64.814|0.059|7.0|
|54.974|0.048|8.0|
|59.489|0.055|9.0|
|55.248|0.049|10.0|
|51.685|0.044|11.0|
|54.073|0.048|12.0|
|57.350|0.051|13.0|
|54.031|0.048|14.0|
|53.526|0.048|15.0|
|53.041|0.047|16.0|
|55.731|0.050|17.0|
|52.224|0.045|18.0|
|52.757|0.046|19.0|
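
To pick a checkpoint from the results above, one option is to scan the rows for the lowest `eval_loss` or the highest `eval_accuracy`. The sketch below does both, using values copied from the table. The card does not say which criterion, if any, was used to select the published weights, so the names `RESULTS`, `best_by_loss`, and `best_by_accuracy` are illustrative only.

```python
# (eval_loss, eval_accuracy, epoch) rows copied from the results table.
RESULTS = [
    (66.323, 0.063, 1), (59.935, 0.055, 2), (60.344, 0.056, 3),
    (58.559, 0.054, 4), (56.373, 0.051, 5), (58.011, 0.053, 6),
    (64.814, 0.059, 7), (54.974, 0.048, 8), (59.489, 0.055, 9),
    (55.248, 0.049, 10), (51.685, 0.044, 11), (54.073, 0.048, 12),
    (57.350, 0.051, 13), (54.031, 0.048, 14), (53.526, 0.048, 15),
    (53.041, 0.047, 16), (55.731, 0.050, 17), (52.224, 0.045, 18),
    (52.757, 0.046, 19),
]

# Row with the lowest evaluation loss.
best_by_loss = min(RESULTS, key=lambda row: row[0])

# Row with the highest evaluation accuracy.
best_by_accuracy = max(RESULTS, key=lambda row: row[1])

print("lowest eval_loss:", best_by_loss)
print("highest eval_accuracy:", best_by_accuracy)
```

In this table the two criteria disagree: the lowest loss is at epoch 11, while the highest accuracy is at epoch 1, so the choice of selection metric matters here.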