---
language: en
license: mit
library_name: pytorch
---

# Plainly Optimized Network

Dataset: BIGBENCH

Trainer Hyperparameters:
- `lr` = 5e-05
- `per_device_batch_size` = 1
- `gradient_accumulation_steps` = 4
- `weight_decay` = 1e-09
- `seed` = 42

|eval_loss|eval_mse|epoch|
|--|--|--|
|58.741|0.055|1.0|
|60.624|0.058|2.0|
|60.765|0.057|3.0|
|55.858|0.051|4.0|
|57.271|0.053|5.0|
|56.004|0.051|6.0|
|60.246|0.056|7.0|
|55.218|0.049|8.0|
|55.261|0.049|9.0|
|54.730|0.049|10.0|
|58.137|0.052|11.0|
|53.927|0.048|12.0|
|56.143|0.051|13.0|
|54.604|0.049|14.0|
|53.596|0.048|15.0|
|54.241|0.049|16.0|
|55.500|0.050|17.0|
|53.256|0.047|18.0|
|53.139|0.047|19.0|
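
The hyperparameters and evaluation results listed in this card can be read together with a small script. The sketch below is illustrative only and is not part of the original training code. It computes the number of samples that contribute to each optimizer step on a single device, then finds the epoch with the lowest `eval_loss`. The variable names are hypothetical, and the device count is not stated in this card, so the total batch size across devices is left out.

```python
# Values copied from the hyperparameter list in this card.
per_device_batch_size = 1
gradient_accumulation_steps = 4

# Samples contributing to one optimizer step on a single device.
# The number of devices is not given in the card, so it is not included.
effective_batch_per_device = per_device_batch_size * gradient_accumulation_steps

# (epoch, eval_loss, eval_mse) rows copied from the evaluation table.
eval_rows = [
    (1, 58.741, 0.055),
    (2, 60.624, 0.058),
    (3, 60.765, 0.057),
    (4, 55.858, 0.051),
    (5, 57.271, 0.053),
    (6, 56.004, 0.051),
    (7, 60.246, 0.056),
    (8, 55.218, 0.049),
    (9, 55.261, 0.049),
    (10, 54.730, 0.049),
    (11, 58.137, 0.052),
    (12, 53.927, 0.048),
    (13, 56.143, 0.051),
    (14, 54.604, 0.049),
    (15, 53.596, 0.048),
    (16, 54.241, 0.049),
    (17, 55.500, 0.050),
    (18, 53.256, 0.047),
    (19, 53.139, 0.047),
]

# Row with the lowest eval_loss.
best_by_loss = min(eval_rows, key=lambda row: row[1])

# eval_mse is reported to three decimals, so several epochs can tie.
best_mse = min(row[2] for row in eval_rows)
epochs_at_best_mse = [row[0] for row in eval_rows if row[2] == best_mse]

print("effective batch per device:", effective_batch_per_device)
print("lowest eval_loss:", best_by_loss[1], "at epoch", best_by_loss[0])
print("lowest eval_mse:", best_mse, "at epochs", epochs_at_best_mse)
```

With the values above, the lowest `eval_loss` falls on the last reported epoch, and `eval_mse` ties at its minimum over the final two epochs.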