---
tags:
- LunarLander-v2
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: 15.03 +/- 87.72
      name: mean_reward
      verified: false
---
# PPO Agent Playing LunarLander-v2
This is a trained model of a PPO agent playing LunarLander-v2.
To learn how to code your own PPO agent and train it, see Unit 8 of the Deep Reinforcement Learning Class: https://github.com/huggingface/deep-rl-class/tree/main/unit8
# Hyperparameters
```python
{'exp_name': 'ppo',
 'seed': 1,
 'torch_deterministic': True,
 'cuda': False,
 'track': False,
 'wandb_project_name': 'cleanRL',
 'wandb_entity': 'KeWangRL',
 'capture_video': False,
 'env_id': 'LunarLander-v2',
 'total_timesteps': 1000000,
 'learning_rate': 0.00035,
 'num_envs': 8,
 'num_steps': 128,
 'anneal_lr': True,
 'gae': True,
 'gamma': 0.99,
 'gae_lambda': 0.95,
 'num_minibatches': 4,
 'update_epochs': 4,
 'norm_adv': True,
 'clip_coef': 0.2,
 'clip_vloss': True,
 'ent_coef': 0.015,
 'vf_coef': 0.5,
 'max_grad_norm': 0.5,
 'target_kl': 0.015,
 'repo_id': 'kewangRL/LunarLander-v2',
 'batch_size': 1024,
 'minibatch_size': 256}
```
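The `batch_size` and `minibatch_size` entries are derived from other hyperparameters rather than set directly. The following minimal sketch recomputes them from `num_envs`, `num_steps`, and `num_minibatches`, and shows one way `anneal_lr` is typically applied. It assumes a CleanRL-style PPO training loop (suggested by `wandb_project_name: 'cleanRL'`), in which the learning rate decays linearly toward zero over `total_timesteps // batch_size` updates. The function name `annealed_lr` is hypothetical.

```python
# Hyperparameters copied from the card (only those needed for this sketch).
hp = {
    "total_timesteps": 1_000_000,
    "learning_rate": 0.00035,
    "num_envs": 8,
    "num_steps": 128,
    "num_minibatches": 4,
    "anneal_lr": True,
}

# One rollout collects num_steps transitions from each of num_envs parallel envs.
batch_size = hp["num_envs"] * hp["num_steps"]        # 8 * 128 = 1024
# Each update epoch splits the rollout batch into num_minibatches equal chunks.
minibatch_size = batch_size // hp["num_minibatches"]  # 1024 // 4 = 256
# Number of policy updates that fit in the step budget (integer division).
num_updates = hp["total_timesteps"] // batch_size     # 1_000_000 // 1024 = 976


def annealed_lr(update: int) -> float:
    """Hypothetical helper: linearly decay the learning rate toward 0.

    Assumes a CleanRL-style schedule where `update` is 1-based, as in
    `for update in range(1, num_updates + 1)`.
    """
    if not hp["anneal_lr"]:
        return hp["learning_rate"]
    frac = 1.0 - (update - 1.0) / num_updates
    return frac * hp["learning_rate"]


if __name__ == "__main__":
    print(f"batch_size={batch_size}, minibatch_size={minibatch_size}, num_updates={num_updates}")
    print(f"lr at first update: {annealed_lr(1):.6f}")
    print(f"lr at last update:  {annealed_lr(num_updates):.3e}")
```

The recomputed values match the `batch_size` and `minibatch_size` entries listed in the card. Under the assumed schedule, integer division yields 976 updates, so the run would collect 976 * 1024 = 999,424 environment steps rather than exactly 1,000,000.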