|
--- |
|
license: mit |
|
library_name: stable-baselines3 |
|
tags: |
|
- dqn |
|
- Reinforcement Learning |
|
- Atari |
|
- Pac-Man |
|
pipeline_tag: reinforcement-learning |
|
|
|
model-index: |
|
- name: DQN |
|
results: |
|
- task: |
|
type: reinforcement-learning |
|
name: reinforcement-learning |
|
dataset: |
|
name: ALE/Pacman-v5 |
|
type: ALE/Pacman-v5 |
|
metrics: |
|
- type: mean_reward |
|
value: 455.60 +/- 40.10 |
|
name: mean_reward |
|
verified: false |
|
--- |
|
|
|
# *Agent using DQN to play ALE/Pacman-v5* |
|
|
|
## Update 20 May 2024: Latest DQN model is version 2.8 |
|
***NOTE:** The video preview shows the best model from version 2.8 playing for 10,000 steps. Evaluation metrics are self-reported, based on 10 evaluation episodes, and can be found in `agents/dqn_v2-8/evals.txt`.*
|
|
|
This agent was trained using Stable Baselines3 as part of the capstone project for South Hills School in Spring 2024. The goals of the project were to gain familiarity with reinforcement learning concepts and tools, and to train an agent that scores in the 400-500 point range in Pac-Man.
|
|
|
--- |
|
|
|
## *How to use this repository* |
|
|
|
The primary purpose of this repository is to provide a basic introduction to reinforcement learning. Helpful documentation to consult includes [Stable Baselines3](https://stable-baselines3.readthedocs.io/en/master/) (particularly the section on [Deep Q-Networks](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html)), [RL Zoo](https://rl-baselines3-zoo.readthedocs.io/en/master/), and [Gymnasium](https://gymnasium.farama.org/). You can also reach out to me and I will do my best to assist you! I may be reached on [Twitter](https://x.com/ledmands) or [LinkedIn](https://linkedin.com/in/lucasedmands), or you may start a discussion in the Community section of this repository. Please download, explore, and have fun!
|
|
|
### *Prerequisites* |
|
A basic understanding of the Python programming language, package management tools, and the command line will help you make effective use of this repository. Please refer to [python.org](https://www.python.org/), [pip.pypa.io](https://pip.pypa.io/en/stable/installation/), or [this introduction to the command line](https://www.freecodecamp.org/news/how-to-use-the-cli-beginner-guide/) if needed. Depending on your operating system, additional dependencies may need to be installed.
|
|
|
Python and `pip` should be installed in your environment. Using a virtual environment to manage dependencies is recommended, but not required. You can learn more about Python virtual environments [here](https://packaging.python.org/en/latest/tutorials/installing-packages/#creating-and-using-virtual-environments) or about Conda [here](https://conda.io/projects/conda/en/latest/user-guide/index.html).
|
|
|
To use the scripts and explore the files on your local device, first ensure you have [git](https://git-scm.com/downloads) and [git-lfs](https://git-lfs.com/) installed and initialized. Then clone this repository to your local device: |
|
```bash |
|
git clone https://huggingface.co/ledmands/ALE-Pacman-v5 |
|
``` |
|
After cloning the repository to your local device, run: |
|
```bash |
|
pip install -r requirements.txt |
|
``` |
|
***NOTE:** The `requirements.txt` file installs all of the extra dependencies for Stable Baselines3 and the full TensorFlow package. This keeps Stable Baselines3 easy to use and ensures that the extra data points and tools are available in TensorBoard. If you prefer to install dependencies as needed, you can skip the `requirements.txt` file and install packages via `pip` as desired.*
|
|
|
--- |
|
|
|
### *Repository Structure* |
|
|
|
The root directory contains this README, license information, and scripts, as well as a `.gitattributes` file necessary for `git-lfs`. The `replay.mp4` file provides the video preview of the agent on the Hugging Face model card. Other branches are used for development purposes; feel free to explore them, but note that they may not be up to date or fully functional.
|
|
|
#### Agents |
|
|
|
- The files for the trained Pac-Man agents are located in the agents directory. Each agent directory is prefixed with the algorithm used for training, and if multiple versions of an agent exist, version information is appended to the directory name. That is, if an agent was trained for a certain amount of time and then trained further, there are two versions of that agent. Agent versioning loosely follows semantic versioning; e.g., version 2.5 of an agent trained with DQN is named `dqn_v2-5`.
|
|
|
- Each agent directory contains the files relevant to that agent. The model produced by that version's training run is stored in the `.zip` file named after the environment identifier. That is, `ALE-Pacman-v5.zip` contains the agent that has learned to play Pac-Man in the Arcade Learning Environment, as specified by the environment version in Gymnasium. You can refer to the [section on Atari environments in the Gymnasium documentation](https://gymnasium.farama.org/environments/atari/) for more information.
|
|
|
- The `best_model.zip` file contains the agent that performed the best during training even if there was a drop off in performance by the end of the training run. That is, `best_model.zip` contains the best-performing model over the entire course of a training run and `ALE-Pacman-v5.zip` contains the model that resulted at the end of the training run. In this sense, `best_model.zip` is sometimes a better measure of an agent’s performance than the resultant model. |
|
|
|
- The `evaluations.npz` file is a compressed NumPy archive containing evaluation metrics automatically collected by Stable Baselines3 during training. These files are used by the `plot_improvement.py` script to visualize agent performance over time; a short loading sketch follows this list.
|
- `tfevents` files are accessed by TensorBoard in order to visualize training metrics and contain more data points than the `evaluations.npz` files. If you are unable to see the TensorBoard dashboard in the Hugging Face repository, you may need to clone the repository to your local device and launch a TensorBoard instance from that directory. You will need a full installation of TensorFlow to access the full feature set of TensorBoard.
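
To inspect an `evaluations.npz` file directly, a minimal sketch with NumPy looks like the following. The array names assume the default layout written by Stable Baselines3's `EvalCallback`, and the agent path is only an example:

```python
import numpy as np

# Load the evaluation log written by EvalCallback during training.
# The path is an example; point it at any agent directory in this repo.
data = np.load("agents/dqn_v2-8/evaluations.npz")

print(data.files)              # typically ['timesteps', 'results', 'ep_lengths']
timesteps = data["timesteps"]  # timestep at which each evaluation was run
results = data["results"]      # per-episode rewards, shape (n_evals, n_eval_episodes)

# Mean reward per evaluation point, useful for a quick improvement check.
for step, mean_reward in zip(timesteps, results.mean(axis=1)):
    print(f"{step:>10} steps: mean reward {mean_reward:.1f}")
```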
|
|
|
#### Charts |
|
|
|
- The charts directory contains various TensorBoard and NumPy charts from agents in the agents directory. |
|
|
|
#### Notebooks |
|
|
|
- The notebooks directory contains the notebooks used to train each agent. These notebooks were run in Kaggle, using prior Kaggle notebooks as input in order to load an agent to continue training. While somewhat inconsistent, the versioning should mostly match the agents directory structure. E.g. the notebook `dqn_pacmanv5_run2v3.ipynb` corresponds to the agent located in the directory `dqn_v2-3`. |
|
|
|
#### Videos |
|
|
|
- The videos directory contains various videos of some of the agents playing Pac-Man in an evaluation environment. |
|
|
|
--- |
|
|
|
### *Training An Agent* |
|
|
|
To train an agent of your own, run either the `dqn_pacmanv5_run1.ipynb` or `dqn_pacmanv5_run2.ipynb` notebook. It is possible to run these locally in Visual Studio Code or Jupyter; however, you may find it easiest to use Google Colab or Kaggle. The later notebook versions depend on the output of a prior notebook to construct the model, a consequence of how Kaggle handles inputs, so they will need slight modification to run outside that setup. Even so, these notebooks should give a good idea of how to load a trained agent and continue training with different hyperparameters.
|
|
|
#### To train an agent in Kaggle *(recommended)*: |
|
1. You will need to sign up for a [Kaggle](https://www.kaggle.com/) account. |
|
|
|
2. Create a new notebook by clicking on the Code section of the menu on the left. You will then see an option for *“New Notebook.”* This will start a Kaggle notebook with default settings. |
|
|
|
3. From the File menu in the top left of the screen, under the notebook name, select *“Import Notebook.”* Then select the notebook you wish to use to train the agent. If this is your first training run, use either `dqn_pacmanv5_run1.ipynb` or `dqn_pacmanv5_run2.ipynb`. You may name these notebooks however you see fit.
|
|
|
4. For the first training run, no input is needed. In the notebook options on the right of the screen, select the accelerator of your choosing in the Session Options menu. Either of the GPUs is sufficient. I used GPU T4 x2 in my testing. Using another option may affect your outcomes, training times, or settings. Make sure to toggle the *“Internet”* option to on. This ensures that you can close your browser and the run will continue. |
|
|
|
5. In the second cell of the notebook, you can configure the training run to your liking. The `MODEL_FILE_NAME` is the name of the zip file that will be saved when training is finished. This file will contain the policy and training information that the agent has learned. The `BUFFER_FILE_NAME` and `POLICY_FILE_NAME` are the file names for the replay buffer and policy the agent will have at the end of the training run. The buffer file is important if you wish to continue training as this is the agent’s recent memory. Due to the size of the file, which can exceed 14GB in this case, the buffer must always be saved independently from the model if you intend to use it later. |
|
|
|
6. The `NUM_TIMESTEPS` variable is the total number of training steps the model will train for, i.e. how many loops through the algorithm to learn the environment. `EVAL_CALLBACK_FREQ` and `VIDEO_CALLBACK_FREQ` denote the intervals, in timesteps, at which the model will be evaluated and at which a video of the agent will be recorded, respectively. Play with the rest of the hyperparameters! The hyperparameters listed as variables are the ones that have been modified. If no hyperparameter is specified when creating the model in cell #6, the defaults will be used. The default hyperparameters can be found [here](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html#parameters). A sketch of how these variables fit together appears after this list.
|
|
|
7. To run the notebook cells in order and save the notebook version, click *“Save Version”* in the top right corner of the interface. The configuration as described above should take a little over 2 hours for 1.5 million timesteps. |
|
|
|
8. When the training is completed, you can access the output of the notebook by navigating to *"Code" --> "Your Work"* and then selecting the appropriate notebook. Selecting the *“Output”* tab on the notebook’s page will show all the output files that were generated during the training run. You can download them as you wish and place them in a local directory for analysis. A good practice I've found is to keep all the files related to a particular agent in one directory, so as not to lose track of which files relate to which agent. |
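
For reference, here is a minimal sketch of how these pieces fit together outside of Kaggle. It is not the exact notebook code: only a subset of the notebook variables is shown, the hyperparameter values and file names are examples, and it assumes `gymnasium` plus the Atari extras (`ale-py` and the ROMs) are installed so that `ALE/Pacman-v5` is registered:

```python
import gymnasium as gym
from stable_baselines3 import DQN
from stable_baselines3.common.callbacks import EvalCallback

# Example values; the notebooks define similar variables near the top.
MODEL_FILE_NAME = "ALE-Pacman-v5"
BUFFER_FILE_NAME = "dqn_replay_buffer_pacmanv5"   # assumed name
POLICY_FILE_NAME = "dqn_policy_pacmanv5"          # assumed name
NUM_TIMESTEPS = 1_500_000
EVAL_CALLBACK_FREQ = 25_000                       # assumed interval

env = gym.make("ALE/Pacman-v5")
eval_env = gym.make("ALE/Pacman-v5")

# Any hyperparameter not passed here falls back to the SB3 defaults.
model = DQN(
    "CnnPolicy",
    env,
    buffer_size=400_000,        # example override
    exploration_fraction=0.3,   # example override
    tensorboard_log="./",
    verbose=1,
)

eval_callback = EvalCallback(
    eval_env,
    eval_freq=EVAL_CALLBACK_FREQ,
    n_eval_episodes=10,
    log_path="./",              # writes evaluations.npz here
)

model.learn(total_timesteps=NUM_TIMESTEPS, callback=eval_callback)

# Save the model, the replay buffer, and the policy separately.
model.save(MODEL_FILE_NAME)
model.save_replay_buffer(BUFFER_FILE_NAME)
model.policy.save(POLICY_FILE_NAME)
```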
|
|
|
#### Loading and continuing training in Kaggle: |
|
|
|
1. To continue an agent’s training in Kaggle, you can use a prior notebook’s output as input to the new notebook. In this fashion, you can chain notebooks together to create a flow of agent training with checkpoints. |
|
|
|
2. Create a new notebook and select the Input drop-down in the options menu on the right side of the screen. Then select *“Add input.”* |
|
|
|
3. Select the notebook that contains the agent you wish to continue training. You can filter these selections using the *“Your Work”* option or search for the notebook by name. ***NOTE:** If you upload files to Kaggle for input, they will be unzipped upon upload. Stable Baselines looks for a zip file to load an agent, and thus will not be able to recognize an agent uploaded from another source. There may be a way around this, but loading a prior Kaggle notebook was the solution I found, as it preserves the zip file.* |
|
|
|
4. You can use `dqn_pacmanv5_run2v2.ipynb` or any later notebook version as a template for this. Modify any of the hyperparameters that you wish to change for the continued training. ***NOTE:** The `CUSTOM_OBJECTS` dictionary is necessary for modifying and overriding hyperparameters in a loaded agent. Without it, the model will be created using the hyperparameters defined in the prior training run.*
|
|
|
5. In cell 6, you will need to copy and paste the path to the input agent and the replay buffer. You can achieve this by navigating to the notebook that was added as input on the right side, expanding the selection to see all the files added, hovering over the agent’s zip file, and then selecting the *“Copy file path”* option for that input file. Please note that the .zip extension is required for loading the model, but the .pkl extension must be omitted for loading the replay buffer. |
|
|
|
6. Note that in cell 8, `reset_num_timesteps` is set to False in the `model.learn()` call. This is essential for keeping the TensorBoard logs consistent. By default, the timestep counter resets each time training starts, so every TensorBoard chart would begin at 0, making it much harder to analyze an agent’s improvement over time. With resetting disabled, the timesteps accumulate across runs, and TensorBoard still shows each run as its own line, continuing from where the previous run ended. A sketch of this load-and-continue pattern follows this list.
|
|
|
7. Run the notebook by saving the version as described in the **To train an agent in Kaggle** section above. The output files can be accessed in the same way as well.
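
Here is a minimal sketch of the load-and-continue pattern described above. The paths and the overridden hyperparameters are placeholders, not the exact notebook values:

```python
import gymnasium as gym
from stable_baselines3 import DQN

env = gym.make("ALE/Pacman-v5")

# Values in CUSTOM_OBJECTS replace those stored in the saved model;
# anything omitted keeps its value from the prior training run.
CUSTOM_OBJECTS = {
    "exploration_initial_eps": 0.05,  # example override
    "learning_rate": 1e-4,            # example override
}

# As noted above: keep the .zip extension for the model,
# but omit the .pkl extension for the replay buffer.
model = DQN.load(
    "/kaggle/input/prior-notebook/ALE-Pacman-v5.zip",  # placeholder path
    env=env,
    custom_objects=CUSTOM_OBJECTS,
)
model.load_replay_buffer("/kaggle/input/prior-notebook/dqn_replay_buffer_pacmanv5")

# reset_num_timesteps=False keeps the timestep counter accumulating across
# runs so the TensorBoard charts line up end to end.
model.learn(total_timesteps=1_500_000, reset_num_timesteps=False)
```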
|
|
|
--- |
|
|
|
### *Using TensorBoard* |
|
TensorBoard is a useful tool for evaluating agent performance. While Matplotlib is handy for plotting custom functions and evaluations, I found TensorBoard more useful because you can easily filter different agents, change the color of their chart lines, and log custom values to the TensorBoard file.
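
As an example of logging a custom value, a small callback can record extra metrics into the same TensorBoard event file that Stable Baselines3 already writes. This is a minimal sketch, and the metric and tag are only illustrative:

```python
from stable_baselines3.common.callbacks import BaseCallback

class LogExplorationRateCallback(BaseCallback):
    """Records DQN's current exploration rate every 1,000 steps."""

    def _on_step(self) -> bool:
        if self.n_calls % 1_000 == 0:
            # "custom/exploration_rate" is an example tag; any name works.
            self.logger.record("custom/exploration_rate", self.model.exploration_rate)
        return True

# Pass it alongside other callbacks, e.g.:
# model.learn(total_timesteps=NUM_TIMESTEPS,
#             callback=[eval_callback, LogExplorationRateCallback()])
```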
|
|
|
When training is completed, if TensorBoard logging was enabled (it is in all the example notebooks here), an output file beginning with `events.out.tfevents.` and ending with `.0` will be generated. The `.0` may change if you have multiple runs in the same Kaggle session to allow TensorBoard to differentiate between runs. |
|
|
|
To start a TensorBoard instance, navigate to the root directory of this repository in a terminal on your local device and run: |
|
```bash |
|
tensorboard --logdir agents/ |
|
``` |
|
This will launch a local TensorBoard instance pointed at the directory specified by `--logdir`. TensorBoard walks the directory recursively and displays any `tfevents` files it finds on the dashboard. It usually serves on port 6006, so opening `localhost:6006` in a browser window should display the dashboard.
|
|
|
On the dashboard, the TensorBoard files will be organized by directory, thus allowing you to filter out different runs. If you have multiple agents, you can compare their performance. If you have one agent with many different runs, you can see the improvement over time. However, each run will be its own color and may not show an exact continuous line. To show a continuous line for one agent over all of its training runs, you will need to move all the TensorBoard files into one directory. Use caution when moving files and naming directories to ensure that an errant TensorBoard file from another model doesn’t end up in an incorrect directory. |
|
More information about TensorBoard integration in Stable Baselines3 can be found [here](https://stable-baselines3.readthedocs.io/en/master/guide/tensorboard.html).
|
|
|
--- |
|
|
|
### *About the Scripts* |
|
|
|
To run a script, navigate to the root directory of the repository in a terminal shell, then run: |
|
```bash |
|
python <script_name> [options] |
|
``` |
|
For a list of available options, run: |
|
```bash |
|
python <script_name> --help |
|
``` |
|
|
|
***Please note that these scripts are intended to be educational and are still considered prototypes. They are by no means bug-free and may not work as intended in every environment.*** |
|
|
|
##### *watch_agent.py* |
|
|
|
- This will render the specified agent in real-time. Does not save any evaluation information. |
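
For reference, here is a minimal sketch of the kind of loop such a script runs. It is not the actual `watch_agent.py`: the path is an example, and it assumes the agent was trained on the unwrapped `ALE/Pacman-v5` observations:

```python
import gymnasium as gym
from stable_baselines3 import DQN

# Example path; point this at any agent zip in the agents directory.
model = DQN.load("agents/dqn_v2-8/ALE-Pacman-v5.zip")
env = gym.make("ALE/Pacman-v5", render_mode="human")

obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
env.close()
```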
|
|
|
##### *evaluate_agent.py* |
|
|
|
- This will evaluate a specified agent and append the results to a specified log file. |
|
|
|
##### *get_config.py* |
|
|
|
- This will pull configuration information from the specified agent and save it in JSON format. The script reads the data file inside the agent's zip archive and strips out the serialized entries to make the output more human-readable. By default, the output file is saved to the directory from which the command is run; best practice is to save it to the agent's directory.
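
Conceptually, the script does something like the following minimal sketch. This is not the actual `get_config.py`: the paths are examples, and it assumes the standard Stable Baselines3 save format, where the zip archive contains a JSON `data` file with non-serializable objects stored under `:serialized:` keys:

```python
import json
import zipfile

# Example paths; adjust to the agent you want to inspect.
with zipfile.ZipFile("agents/dqn_v2-8/ALE-Pacman-v5.zip") as archive:
    config = json.loads(archive.read("data"))

# Drop entries that are just base64-serialized Python objects,
# leaving the human-readable hyperparameters.
readable = {
    key: value
    for key, value in config.items()
    if not (isinstance(value, dict) and ":serialized:" in value)
}

with open("agents/dqn_v2-8/config.json", "w") as f:
    json.dump(readable, f, indent=4)
```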
|
|
|
##### *plot_improvement.py* |
|
|
|
- This plots the average score and standard deviation of the `dqn_v2` agent over all evaluation episodes in each training run as a bar graph, with one bar per training run. The lowest and highest episode scores are removed from each evaluation.
|
|
|
##### *record_video.py* |
|
|
|
- This will record a video of a specified agent being evaluated. Does not save any evaluation information. *Currently under major development; located in the development branch.*
|
|
|
##### *plot_evaluations.py* |
|
|
|
- This will plot the evaluation data gathered during the training run of the specified agent using Matplotlib. Charts can be saved to a directory of the user's choosing. *Currently under major development; located in the development branch.*
|
|
|
--- |
|
|
|
## *External References* |
|
|
|
- [Foundations of Deep RL -- 6-lecture series by Pieter Abbeel](https://www.youtube.com/playlist?list=PLwRJQ4m4UJjNymuBM9RdmB3Z9N5-0IlY0). *This is an excellent introduction to some of the concepts behind Deep RL Algorithms. Pieter Abbeel is a machine learning and robotics researcher at UC Berkeley.* |
|
- [Training AI to Play Pokemon with Reinforcement Learning](https://www.youtube.com/watch?v=DcYLT37ImBY). *Peter Whidden's video of using Proximal Policy Optimization was a major inspiration for this project and has some fantastic visualizations of the agent learning.* |
|
- [Frame Skipping and Pre-Processing for Deep Q-Networks on Atari 2600 Games](https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/). *Daniel Takeshi wrote an excellent post that helped me better understand some of the terminology around frame skipping.* |
|
- [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602). *This paper on Deep Q Networks is a landmark in the field of reinforcement learning.* |
|
- [Hugging Face Deep Reinforcement Learning Course](https://huggingface.co/learn/deep-rl-course/unit0/introduction). *Another inspiration for this project and a great place to get hands-on experience.* |
|
- [Stable Baselines3](https://stable-baselines3.readthedocs.io/en/master/) |
|
- [RL Zoo](https://rl-baselines3-zoo.readthedocs.io/en/master/) |
|
- [Gymnasium](https://gymnasium.farama.org/) |
|
|
|
--- |
|
|
|
## *Contact* |
|
|
|
Please feel free to contact me on [Twitter](https://x.com/ledmands) or [LinkedIn](https://linkedin.com/in/lucasedmands) or in the Discussion section on the Community tab of this repository! |