This model serves as the baseline for the Ocean Plastic Collection environment, trained and tested on task 0
using the Proximal Policy Optimization (PPO) algorithm.
Environment: Ocean Plastic Collection
Task: 0
Algorithm: PPO
Episode Length: 5000
Training max_steps: 3000000
Testing max_steps: 150000
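To put the step budgets in perspective, the arithmetic below converts them into a count of full-length episodes. This is only a sketch: it assumes that `max_steps` and Episode Length count the same kind of step for a single environment instance, which the card does not state. With several agents or parallel environments the count may differ. The helper name `full_episodes` is hypothetical.

```python
# Values copied from the metadata above.
EPISODE_LENGTH = 5000
TRAIN_MAX_STEPS = 3_000_000
TEST_MAX_STEPS = 150_000


def full_episodes(max_steps: int, episode_length: int) -> int:
    """Return how many complete episodes fit into a step budget.

    Assumption: both arguments count the same kind of step.
    """
    return max_steps // episode_length


train_episodes = full_episodes(TRAIN_MAX_STEPS, EPISODE_LENGTH)
test_episodes = full_episodes(TEST_MAX_STEPS, EPISODE_LENGTH)
print(train_episodes, test_episodes)  # 600 30
```

Under that assumption, training covers about 600 episodes and testing about 30.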
Train & Test Scripts
Download the Environment
Evaluation results
- Cumulative Reward on hivex-ocean-plastic-collection (self-reported): 823.2983947753906 +/- 197.42713024318527
- Global Reward on hivex-ocean-plastic-collection (self-reported): 285.8913818359375 +/- 84.43798423128938
- Local Reward on hivex-ocean-plastic-collection (self-reported): 158.69510040283203 +/- 32.16273712262643
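Each result is reported as a value followed by `+/-` and a spread. The card does not say how the spread is computed. The sketch below assumes it is the mean and the population standard deviation over test episodes, and shows how such a summary could be produced. The function `summarize` and the reward values are hypothetical and are not results of this model.

```python
import statistics


def summarize(rewards):
    """Return (mean, population standard deviation) of per-episode rewards."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return mean, std


# Hypothetical per-episode rewards for illustration only,
# NOT measurements from this model.
example_rewards = [700.0, 800.0, 900.0]

mean, std = summarize(example_rewards)
print(f"{mean} +/- {std}")
```

If the reported spread is instead a sample standard deviation, `statistics.stdev` would be the matching call.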