Commit 5e2f480 (parent: 9352ef4)

Upload README.md with huggingface_hub
README.md (changed)
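Each hunk header in the diff that follows gives a start line and a line count for the old and new versions of README.md. Those numbers alone summarize this commit. `@@ -5,6 +5,7 @@` grows by one line, which is the new `HumanoidStandup-v4` tag. `@@ -20,64 +21,3 @@` shrinks by 61 lines, which is the whole model-card body after the front matter, and its new-side start moves from 20 to 21 because of the earlier insertion. The minimal Python sketch below parses a header and reports the net change. The helper names are hypothetical, and the sketch assumes the standard unified-diff convention that an omitted count means 1.

```python
import re

# Unified-diff hunk header: "@@ -<old_start>[,<old_len>] +<new_start>[,<new_len>] @@ <section>"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
    """Hypothetical helper: return (old_start, old_len, new_start, new_len)."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_len, new_start, new_len = match.groups()
    return (
        int(old_start),
        int(old_len) if old_len is not None else 1,  # omitted count means 1
        int(new_start),
        int(new_len) if new_len is not None else 1,
    )


def net_line_change(line: str) -> int:
    """Lines added (positive) or removed (negative) overall by one hunk."""
    _, old_len, _, new_len = parse_hunk_header(line)
    return new_len - old_len


if __name__ == "__main__":
    for header in ("@@ -5,6 +5,7 @@ tags:", "@@ -20,64 +21,3 @@ model-index:"):
        print(header, "->", net_line_change(header))
```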
@@ -5,6 +5,7 @@ tags:
 - deep-reinforcement-learning
 - reinforcement-learning
 - stable-baselines3
+- HumanoidStandup-v4
 model-index:
 - name: PPO
   results:
@@ -20,64 +21,3 @@ model-index:
       name: mean_reward
       verified: false
 ---
-
-# **PPO** Agent playing **HumanoidStandup-v2**
-This is a trained model of a **PPO** agent playing **HumanoidStandup-v2**
-using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
-and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).
-
-The RL Zoo is a training framework for Stable Baselines3
-reinforcement learning agents,
-with hyperparameter optimization and pre-trained agents included.
-
-## Usage (with SB3 RL Zoo)
-
-RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
-SB3: https://github.com/DLR-RM/stable-baselines3<br/>
-SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
-
-Install the RL Zoo (with SB3 and SB3-Contrib):
-```bash
-pip install rl_zoo3
-```
-
-```
-# Download model and save it into the logs/ folder
-python -m rl_zoo3.load_from_hub --algo ppo --env HumanoidStandup-v2 -orga qgallouedec -f logs/
-python -m rl_zoo3.enjoy --algo ppo --env HumanoidStandup-v2 -f logs/
-```
-
-If you installed the RL Zoo3 via pip (`pip install rl_zoo3`), from anywhere you can do:
-```
-python -m rl_zoo3.load_from_hub --algo ppo --env HumanoidStandup-v2 -orga qgallouedec -f logs/
-python -m rl_zoo3.enjoy --algo ppo --env HumanoidStandup-v2 -f logs/
-```
-
-## Training (with the RL Zoo)
-```
-python -m rl_zoo3.train --algo ppo --env HumanoidStandup-v2 -f logs/
-# Upload the model and generate video (when possible)
-python -m rl_zoo3.push_to_hub --algo ppo --env HumanoidStandup-v2 -f logs/ -orga qgallouedec
-```
-
-## Hyperparameters
-```python
-OrderedDict([('batch_size', 32),
-             ('clip_range', 0.3),
-             ('ent_coef', 3.62109e-06),
-             ('gae_lambda', 0.9),
-             ('gamma', 0.99),
-             ('learning_rate', 2.55673e-05),
-             ('max_grad_norm', 0.7),
-             ('n_envs', 1),
-             ('n_epochs', 20),
-             ('n_steps', 512),
-             ('n_timesteps', 10000000.0),
-             ('normalize', True),
-             ('policy', 'MlpPolicy'),
-             ('policy_kwargs',
-              'dict( log_std_init=-2, ortho_init=False, activation_fn=nn.ReLU, '
-              'net_arch=dict(pi=[256, 256], vf=[256, 256]) )'),
-             ('vf_coef', 0.430793),
-             ('normalize_kwargs', {'norm_obs': True, 'norm_reward': False})])
-```
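The hyperparameters that this commit removes from the model card determine how much data PPO gathers and how often it updates. The sketch below works through that arithmetic under assumed Stable Baselines3 PPO semantics. Each rollout collects `n_steps * n_envs` transitions. The policy then makes `n_epochs` passes over that buffer in minibatches of `batch_size`. Training keeps collecting full rollouts until `n_timesteps` is reached. The helper functions are hypothetical. `gae_effective_horizon` is only a rough rule of thumb: the sum of the `(gamma * gae_lambda) ** l` weights that GAE places on future TD errors.

```python
import math

# Values taken from the removed "Hyperparameters" section of the model card.
N_ENVS = 1
N_STEPS = 512
BATCH_SIZE = 32
N_EPOCHS = 20
N_TIMESTEPS = 10_000_000
GAMMA = 0.99
GAE_LAMBDA = 0.9


def ppo_schedule(n_envs: int, n_steps: int, batch_size: int,
                 n_epochs: int, n_timesteps: int) -> dict[str, int]:
    """Hypothetical helper: derive PPO data/update counts (assumed SB3 semantics)."""
    rollout_size = n_envs * n_steps  # transitions collected before each update
    # The last minibatch is smaller if batch_size does not divide rollout_size.
    minibatches_per_epoch = math.ceil(rollout_size / batch_size)
    grad_steps_per_rollout = minibatches_per_epoch * n_epochs
    # Full rollouts are collected until the timestep budget is met or exceeded.
    n_rollouts = math.ceil(n_timesteps / rollout_size)
    return {
        "rollout_size": rollout_size,
        "minibatches_per_epoch": minibatches_per_epoch,
        "grad_steps_per_rollout": grad_steps_per_rollout,
        "n_rollouts": n_rollouts,
        "total_grad_steps": n_rollouts * grad_steps_per_rollout,
    }


def gae_effective_horizon(gamma: float, gae_lambda: float) -> float:
    """Sum over l >= 0 of (gamma * lambda)**l, the GAE weights on future TD errors."""
    return 1.0 / (1.0 - gamma * gae_lambda)


if __name__ == "__main__":
    schedule = ppo_schedule(N_ENVS, N_STEPS, BATCH_SIZE, N_EPOCHS, N_TIMESTEPS)
    for name, value in schedule.items():
        print(f"{name}: {value}")
    print(f"GAE horizon ~ {gae_effective_horizon(GAMMA, GAE_LAMBDA):.2f} steps")
```

With the listed values, the sketch gives 512-transition rollouts, 16 minibatches per epoch, 320 gradient steps per rollout, and 19,532 rollouts over the 10M-step budget. It also gives a GAE horizon of roughly 9 steps.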