---
language: en
license: mit
library_name: pytorch
---

# Plainly Optimized Network

Dataset: BIGBENCH

Trainer Hyperparameters:
- `lr` = 5e-05
- `per_device_batch_size` = 1
- `gradient_accumulation_steps` = 4
- `weight_decay` = 1e-09
- `seed` = 42

|eval_loss|eval_accuracy|epoch|
|--|--|--|
|58.940|0.054|1.0|
|54.182|0.049|2.0|
|56.362|0.051|3.0|
|52.705|0.046|4.0|
|55.357|0.050|5.0|
|53.973|0.048|6.0|
|56.034|0.050|7.0|
|51.731|0.045|8.0|
|54.661|0.048|9.0|
|50.378|0.043|10.0|
|51.579|0.044|11.0|
|51.193|0.044|12.0|
|52.724|0.046|13.0|
|52.055|0.045|14.0|
|51.406|0.044|15.0|
|51.539|0.045|16.0|
|52.422|0.046|17.0|
|50.304|0.043|18.0|
|50.937|0.044|19.0|
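
The hyperparameters and evaluation table above can be read programmatically. The following is a minimal sketch, not the training code behind this card. It derives the effective batch size and picks out the epochs with the lowest `eval_loss` and the highest `eval_accuracy`. The names `hparams`, `num_devices`, and `eval_rows` are hypothetical, and the single-device setting is an assumption, since the card does not state the device count.

```python
# Hyperparameters as listed in this card.
hparams = {
    "lr": 5e-05,
    "per_device_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "weight_decay": 1e-09,
    "seed": 42,
}

# Assumption: training ran on a single device (not stated in the card).
num_devices = 1

# Number of samples contributing to each optimizer step.
effective_batch_size = (
    hparams["per_device_batch_size"]
    * hparams["gradient_accumulation_steps"]
    * num_devices
)

# (epoch, eval_loss, eval_accuracy) rows copied from the table in this card.
eval_rows = [
    (1, 58.940, 0.054), (2, 54.182, 0.049), (3, 56.362, 0.051),
    (4, 52.705, 0.046), (5, 55.357, 0.050), (6, 53.973, 0.048),
    (7, 56.034, 0.050), (8, 51.731, 0.045), (9, 54.661, 0.048),
    (10, 50.378, 0.043), (11, 51.579, 0.044), (12, 51.193, 0.044),
    (13, 52.724, 0.046), (14, 52.055, 0.045), (15, 51.406, 0.044),
    (16, 51.539, 0.045), (17, 52.422, 0.046), (18, 50.304, 0.043),
    (19, 50.937, 0.044),
]

# Epoch with the lowest evaluation loss.
best_loss_row = min(eval_rows, key=lambda row: row[1])
# Epoch with the highest evaluation accuracy.
best_acc_row = max(eval_rows, key=lambda row: row[2])

print(f"effective batch size: {effective_batch_size}")
print(f"lowest eval_loss: {best_loss_row[1]} at epoch {best_loss_row[0]}")
print(f"highest eval_accuracy: {best_acc_row[2]} at epoch {best_acc_row[0]}")
```

With the table as given, the lowest `eval_loss` (50.304) falls on epoch 18 and the highest `eval_accuracy` (0.054) on epoch 1, so the two metrics point to different checkpoints. Which one to prefer depends on the selection criterion, which the card does not specify.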