Quentin Gallouédec committed on
Commit
263af70
•
1 Parent(s): 0660028

text and tuto

Files changed (4)
  1. app.py +9 -86
  2. texts/about.md +53 -0
  3. texts/getting_my_agent_evaluated.md +133 -0
  4. texts/heading.md +3 -0
app.py CHANGED
@@ -209,90 +209,6 @@ Be the first to [submit your model]()!
"""


- HEADING = """
- # 🥇 Open RL Leaderboard 🥇
-
- Welcome to the Open RL Leaderboard! This is a community-driven benchmark for reinforcement learning models.
- """
-
- ABOUT_TEXT = r"""
- The Open RL Leaderboard is a community-driven benchmark for reinforcement learning models.
-
- ## 🔌 How to have your agent evaluated?
-
- The Open RL Leaderboard constantly scans the 🤗 Hub to detect new models to be evaluated. For your model to be evaluated, it must meet the following criteria.
-
- 1. The model must be public on the 🤗 Hub
- 2. The model must contain an `agent.pt` file.
- 3. The model must be [tagged](https://huggingface.co/docs/hub/model-cards#model-cards) `reinforcement-learning`
- 4. The model must be [tagged](https://huggingface.co/docs/hub/model-cards#model-cards) with the name of the environment you want to evaluate (for example `MountainCar-v0`)
-
- Once your model meets these criteria, it will be automatically evaluated on the Open RL Leaderboard. It usually takes a few minutes for the evaluation to be completed.
- That's it!
-
- ## 🏗️ How do I build the `agent.pt`?
-
- The `agent.pt` file is a [TorchScript module](https://pytorch.org/docs/stable/jit.html#). It must be loadable using `torch.jit.load`.
- The module must take batch of observations as input and return batch of actions. To check if your model is compatible with the Open RL Leaderboard, you can run the following code:
-
- ```python
- import gymnasium as gym
- import numpy as np
- import torch
-
- agent_path = "path/to/agent.pt"
- env_id = ... # e.g. "MountainCar-v0"
-
- agent = torch.jit.load(agent_path)
- env = gym.make(env_id)
- observations = np.array([env.observation_space.sample()])
- observations = torch.from_numpy(observations)
- actions = agent(observations)
- actions = actions.numpy()[0]
- assert env.action_space.contains(actions)
- ```
-
- ## 🕵 How are the models evaluated?
-
- The evaluation is done by running the agent on the environment for 100 episodes.
-
- For further information, please refer to the [Open RL Leaderboard evaulation script](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/blob/main/src/evaluation.py).
-
- ### The particular case of Atari environments
-
- Atari environments are evaluated on the `NoFrameskip-v4` version of the environment. For example, to evaluate an agent on the `Pong` environment, you must tag your model with `PongNoFrameskip-v4`. The environment is then wrapped to match the standard Atari preprocessing pipeline.
-
- - No-op reset with a maximum of 30 no-ops
- - Max and skip with a skip of 4
- - Episodic life (although the reported score is for the full episode, not the life)
- - Fire reset
- - Clip reward (although the reported score is not clipped)
- - Resize observation to 84x84
- - Grayscale observation
- - Frame stack of 4
-
- ## 🚑 Troubleshooting
-
- If you encounter any issue, please [open an issue](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/discussions/new) on the Open RL Leaderboard repository.
-
- ## 🏃 Next steps
-
- We are working on adding more environments and metrics to the Open RL Leaderboard.
- If you have any suggestions, please [open an discussion](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/discussions/new) on the Open RL Leaderboard repository.
-
- ## 📜 Citation
-
- ```bibtex
- @misc{open-rl-leaderboard,
-   author = {Quentin Gallouédec and TODO},
-   title = {Open RL Leaderboard},
-   year = {2024},
-   publisher = {Hugging Face},
-   howpublished = "\url{https://huggingface.co/spaces/open-rl-leaderboard/leaderboard}",
- }
- ```
- """
-
css = """
.generating {
    border: none;
@@ -325,7 +241,8 @@ def refresh():


with gr.Blocks(css=css) as demo:
-     gr.Markdown(HEADING)
+     with open("texts/heading.md") as fp:
+         gr.Markdown(fp.read())
    with gr.Tabs(elem_classes="tab-buttons") as tabs:
        with gr.TabItem("🏅 Leaderboard"):
            all_gr_dfs = {}
@@ -365,8 +282,14 @@ with gr.Blocks(css=css) as demo:
            # Load the first video of the first environment
            demo.load(refresh_one_video(df, env_ids[0]), outputs=[all_gr_videos[env_ids[0]]])

+
+         with gr.TabItem("🚀 Getting my agent evaluated"):
+             with open("texts/getting_my_agent_evaluated.md") as fp:
+                 gr.Markdown(fp.read())
+
        with gr.TabItem("📝 About"):
-             gr.Markdown(ABOUT_TEXT)
+             with open("texts/about.md") as fp:
+                 gr.Markdown(fp.read())

    demo.load(refresh, outputs=list(all_gr_dfs.values()) + list(all_gr_winners.values()))

texts/about.md ADDED
@@ -0,0 +1,53 @@
+ The Open RL Leaderboard is a community-driven benchmark for reinforcement learning models.
+
+ ## 🔌 How to have your agent evaluated?
+
+ The Open RL Leaderboard constantly scans the 🤗 Hub to detect new models to be evaluated. For your model to be evaluated, it must meet the following criteria:
+
+ 1. The model must be public on the 🤗 Hub.
+ 2. The model must contain an `agent.pt` file.
+ 3. The model must be [tagged](https://huggingface.co/docs/hub/model-cards#model-cards) `reinforcement-learning`.
+ 4. The model must be [tagged](https://huggingface.co/docs/hub/model-cards#model-cards) with the name of the environment you want to evaluate (for example `MountainCar-v0`).
+
+ Once your model meets these criteria, it will be automatically evaluated on the Open RL Leaderboard. It usually takes a few minutes for the evaluation to be completed.
+ That's it!
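+
+ As a quick illustration (this snippet is not part of the leaderboard code), you can check criteria 2-4 for an already-uploaded model with `huggingface_hub`, assuming a hypothetical repository `username/my-agent`:
+
+ ```python
+ from huggingface_hub import HfApi
+
+ repo_id = "username/my-agent"  # hypothetical repository id
+
+ info = HfApi().model_info(repo_id)
+ assert "reinforcement-learning" in info.tags, "missing the reinforcement-learning tag"
+ assert "MountainCar-v0" in info.tags, "missing the environment tag"
+ assert any(s.rfilename == "agent.pt" for s in info.siblings), "agent.pt not found in the repository"
+ ```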
+
+ ## 🕵 How are the models evaluated?
+
+ The evaluation is done by running the agent on the environment for 100 episodes.
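+
+ For intuition only, the protocol amounts to something like the following sketch; the evaluation script linked below is the authoritative implementation, and `MountainCar-v0` is just an example:
+
+ ```python
+ import gymnasium as gym
+ import numpy as np
+ import torch
+
+ agent = torch.jit.load("agent.pt")
+ env = gym.make("MountainCar-v0")
+
+ episodic_returns = []
+ for _ in range(100):
+     observation, _ = env.reset()
+     terminated = truncated = False
+     episodic_return = 0.0
+     while not (terminated or truncated):
+         actions = agent(torch.from_numpy(observation[None]))  # batch of observations in, batch of actions out
+         observation, reward, terminated, truncated, _ = env.step(actions.numpy()[0])
+         episodic_return += reward
+     episodic_returns.append(episodic_return)
+
+ print(f"Mean episodic return over 100 episodes: {np.mean(episodic_returns):.2f}")
+ ```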
+
+ For further information, please refer to the [Open RL Leaderboard evaluation script](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/blob/main/src/evaluation.py).
+
+ ### The particular case of Atari environments
+
+ Atari environments are evaluated on the `NoFrameskip-v4` version of the environment. For example, to evaluate an agent on the `Pong` environment, you must tag your model with `PongNoFrameskip-v4`. The environment is then wrapped to match the standard Atari preprocessing pipeline:
+
+ - No-op reset with a maximum of 30 no-ops
+ - Max and skip with a skip of 4
+ - Episodic life (although the reported score is for the full episode, not the life)
+ - Fire reset
+ - Clip reward (although the reported score is not clipped)
+ - Resize observation to 84x84
+ - Grayscale observation
+ - Frame stack of 4
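+
+ In terms of Gymnasium wrappers, this preprocessing is roughly equivalent to the sketch below; it is not the exact evaluation code, and fire reset and reward clipping are handled by dedicated wrappers in the evaluation script:
+
+ ```python
+ import gymnasium as gym
+
+ env = gym.make("PongNoFrameskip-v4")
+ # No-op reset, frame skip of 4, 84x84 resize, grayscale and life-loss handling
+ env = gym.wrappers.AtariPreprocessing(env, noop_max=30, frame_skip=4, screen_size=84, terminal_on_life_loss=True, grayscale_obs=True)
+ env = gym.wrappers.FrameStack(env, 4)  # stack the last 4 frames (FrameStackObservation in Gymnasium >= 1.0)
+ ```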
+
+ ## 🚑 Troubleshooting
+
+ If you encounter any issues, please [open an issue](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/discussions/new) on the Open RL Leaderboard repository.
+
+ ## 🏃 Next steps
+
+ We are working on adding more environments and metrics to the Open RL Leaderboard.
+ If you have any suggestions, please [open a discussion](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/discussions/new) on the Open RL Leaderboard repository.
+
+ ## 📜 Citation
+
+ ```bibtex
+ @misc{open-rl-leaderboard,
+   author = {Quentin Gallouédec and TODO},
+   title = {Open RL Leaderboard},
+   year = {2024},
+   publisher = {Hugging Face},
+   howpublished = "\url{https://huggingface.co/spaces/open-rl-leaderboard/leaderboard}",
+ }
+ ```
texts/getting_my_agent_evaluated.md ADDED
@@ -0,0 +1,133 @@
+ In this guide, we explain how to get your agent evaluated by the [Open RL Leaderboard](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard). For the sake of demonstration, we'll train a simple agent, but if you already have a trained agent, you can of course skip the training step.
+
+ ## 🛠️ Prerequisites
+
+ Ensure you have the necessary packages installed:
+
+ ```bash
+ pip install torch huggingface-hub
+ ```
+
+ ## 🏋️‍♂️ Training the agent (optional, just for demonstration)
+
+ Here is a simple example of training a reinforcement learning agent using the `CartPole-v1` environment from Gymnasium. You can skip this step if you already have a trained model. For this example, you'll also need the `gymnasium` package:
+
+ ```bash
+ pip install gymnasium
+ ```
+
+ Now, let's train the agent with a simple policy gradient algorithm:
+
+ ```python
+ import gymnasium as gym
+ import torch
+ from torch import nn, optim
+ from torch.distributions import Categorical
+
+ # Environment setup
+ env_id = "CartPole-v1"
+ env = gym.make(env_id, render_mode="human")
+ env = gym.wrappers.RecordEpisodeStatistics(env)
+
+ # Agent setup
+ policy = nn.Sequential(
+     nn.Linear(4, 128),
+     nn.Dropout(p=0.6),
+     nn.ReLU(),
+     nn.Linear(128, 2),
+     nn.Softmax(-1),
+ )
+ optimizer = optim.Adam(policy.parameters(), lr=1e-2)
+
+ # Training loop
+ global_step = 0
+ for episode_idx in range(10):
+     log_probs = torch.zeros((env.spec.max_episode_steps + 1))
+     returns = torch.zeros((env.spec.max_episode_steps + 1))
+     observation, info = env.reset()
+     terminated = truncated = False
+     step = 0
+     while not terminated and not truncated:
+         probs = policy(torch.tensor(observation))
+         distribution = Categorical(probs)  # Create distribution
+         action = distribution.sample()  # Sample action
+         log_probs[step] = distribution.log_prob(action)  # Store log probability
+         action = action.cpu().numpy()  # Convert to numpy array
+         observation, reward, terminated, truncated, info = env.step(action)
+         step += 1
+         global_step += 1
+         returns[:step] += 0.99 ** torch.flip(torch.arange(step), (0,)) * reward  # return = sum(gamma^i * reward_i)
+
+     episodic_return = info["episode"]["r"][0]
+     print(f"Episode: {episode_idx} Global step: {global_step} Episodic return: {episodic_return:.2f}")
+
+     batch_returns = returns[:step]
+     batch_log_probs = log_probs[:step]
+     batch_returns = (batch_returns - batch_returns.mean()) / (batch_returns.std() + 10**-5)
+     policy_loss = torch.sum(-batch_log_probs * batch_returns)
+     optimizer.zero_grad()
+     policy_loss.backward()
+     optimizer.step()
+ ```
+
+ That's it! You've trained a simple policy gradient agent. Now let's see how to upload the agent to the 🤗 Hub so that the [Open RL Leaderboard](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard) can evaluate it.
+
+ ## 🤖 From policy to agent
+
+ To be compatible with the Open RL Leaderboard, your model must take a batch of observations as input and return a batch of actions. Here's how you can wrap your policy model into an agent class:
+
+ ```python
+ class Agent(nn.Module):
+     def __init__(self, policy):
+         super().__init__()
+         self.policy = policy
+
+     def forward(self, observations):
+         probs = self.policy(observations)
+         distribution = Categorical(probs)
+         return distribution.sample()
+
+
+ agent = Agent(policy)  # instantiate the agent
+
+ # A few tests to check that the agent is working
+ observations = torch.tensor(env.observation_space.sample()).unsqueeze(0)  # dummy batch of observations
+ actions = agent(observations)
+ actions = actions.numpy()[0]
+ assert env.action_space.contains(actions)
+ ```
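+
+ The forward pass above samples from the action distribution. If you prefer a deterministic agent, a variant could return the most likely action instead (a design choice, not a leaderboard requirement):
+
+ ```python
+ class DeterministicAgent(Agent):
+     def forward(self, observations):
+         probs = self.policy(observations)
+         return probs.argmax(dim=-1)  # most likely action instead of a sample
+ ```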
+
+ ## 💾 Saving the agent
+
+ For the Open RL Leaderboard to evaluate your agent, you need to save it as a [TorchScript module](https://pytorch.org/docs/stable/jit.html#) under the name `agent.pt`.
+ It must be loadable using `torch.jit.load`. Then you can push it to the 🤗 Hub.
+
+ ```python
+ from huggingface_hub import metadata_save, HfApi
+
+ # Save the model along with its card
+ metadata_save("model_card.md", {"tags": ["reinforcement-learning", env_id]})
+ dummy_input = torch.tensor(env.observation_space.sample()).unsqueeze(0)  # dummy batch of observations
+ agent = torch.jit.trace(agent.eval(), dummy_input)
+ agent = torch.jit.freeze(agent)  # required so that the model does not depend on the training library
+ agent = torch.jit.optimize_for_inference(agent)
+ torch.jit.save(agent, "agent.pt")
+
+ # Upload the model and card to the 🤗 Hub
+ api = HfApi()
+ repo_id = "username/REINFORCE-CartPole-v1"  # can be any name
+ model_path = api.create_repo(repo_id, repo_type="model")
+ api.upload_file(path_or_fileobj="agent.pt", path_in_repo="agent.pt", repo_id=repo_id)
+ api.upload_file(path_or_fileobj="model_card.md", path_in_repo="README.md", repo_id=repo_id)
+ ```
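+
+ Before relying on the uploaded file, it is worth double-checking that the serialized agent loads back with `torch.jit.load` and still passes the same interface check as above:
+
+ ```python
+ # Reload the TorchScript agent, exactly as the leaderboard would
+ reloaded_agent = torch.jit.load("agent.pt")
+ observations = torch.tensor(env.observation_space.sample()).unsqueeze(0)  # dummy batch of observations
+ actions = reloaded_agent(observations)
+ assert env.action_space.contains(actions.numpy()[0])
+ ```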
+
+ Now, you can find your agent on the 🤗 Hub at `https://huggingface.co/username/REINFORCE-CartPole-v1`.
+
+ ## 📊 Open RL Leaderboard evaluation
+
+ At this point, all you have to do is wait for the Open RL Leaderboard to evaluate your agent. It usually takes less than 10 minutes.
+ Speaking of which, our agent has just appeared on the leaderboard:
+
+ ![Leaderboard](img.png)
+
+ Last place 😢. Next time, our agent will do better 💪!
texts/heading.md ADDED
@@ -0,0 +1,3 @@
+ # 🥇 Open RL Leaderboard 🥇
+
+ Welcome to the Open RL Leaderboard! This is a community-driven benchmark for reinforcement learning models.