alexbalandi
Upload PPO LunarLander-v2 trained agent, trained for 1 million more steps with a looser variance hyperparameter.
3120398
{"mean_reward": 286.0182618528187, "std_reward": 16.23159898013778, "is_deterministic": true, "n_eval_episodes": 10, "eval_datetime": "2023-03-13T10:49:57.212941"}