Based on the paper: Proximal Policy Optimization Algorithms (arXiv:1707.06347).
This is a trained Proximal Policy Optimization (PPO) agent playing LunarLander-v2, built with the stable-baselines3 library.
The agent learns to land a lunar module in the LunarLander-v2 environment from Gymnasium (formerly OpenAI Gym) by controlling its main engine and side thrusters while managing fuel consumption and landing precision.
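For reference, the environment interface the agent operates in is small: an 8-dimensional observation (position, velocity, angle, angular velocity, and two leg-contact flags) and 4 discrete actions (do nothing, fire left thruster, fire main engine, fire right thruster). A minimal sketch for inspecting it, assuming Gymnasium's Box2D extra is installed:

```python
import gymnasium as gym

# Requires the Box2D extra: pip install "gymnasium[box2d]"
env = gym.make("LunarLander-v2")
print(env.observation_space)  # Box with shape (8,): x/y position, x/y velocity, angle, angular velocity, leg contacts
print(env.action_space)       # Discrete(4): do nothing, fire left, fire main engine, fire right

obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())  # one random step
env.close()
```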
The model was trained with the following PPO hyperparameters:
| Parameter | Value |
|---|---|
| Policy | MlpPolicy |
| n_steps | 1024 |
| batch_size | 64 |
| n_epochs | 4 |
| gamma (discount factor) | 0.999 |
| gae_lambda | 0.98 |
| ent_coef (entropy coefficient) | 0.01 |
Evaluation Results:
This performance indicates the agent has successfully learned to land the lunar module; LunarLander-v2 is conventionally considered solved at an average episode return of 200 or more.
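The exact evaluation numbers are not reproduced here; as a sketch, the checkpoint can be re-evaluated locally with stable-baselines3's evaluate_policy helper (the repo_id and filename below match the usage example that follows):

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor
from huggingface_sb3 import load_from_hub

# Download the checkpoint from the Hub and load it into a PPO model
checkpoint = load_from_hub(
    repo_id="Adilbai/ppo-LunarLander-v2",
    filename="ppo-LunarLander-v2.zip",
)
model = PPO.load(checkpoint)

# Monitor records episode returns so evaluate_policy can aggregate them
eval_env = Monitor(gym.make("LunarLander-v2"))
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward = {mean_reward:.2f} +/- {std_reward:.2f}")
```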
Using the model in the LunarLander-v2 environment:
```python
import gymnasium as gym
from stable_baselines3 import PPO
from huggingface_sb3 import load_from_hub

# Download the checkpoint from the Hugging Face Hub and load it into a PPO model
checkpoint = load_from_hub(
    repo_id="Adilbai/ppo-LunarLander-v2",
    filename="ppo-LunarLander-v2.zip",
)
model = PPO.load(checkpoint)

# Create the environment with on-screen rendering
env = gym.make("LunarLander-v2", render_mode="human")

# Run the trained agent
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```
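To save a rollout instead of rendering it live, Gymnasium's RecordVideo wrapper can be used; this is only a sketch, assuming moviepy is installed, and the "videos" folder name is an arbitrary choice:

```python
import gymnasium as gym
from gymnasium.wrappers import RecordVideo
from stable_baselines3 import PPO
from huggingface_sb3 import load_from_hub

checkpoint = load_from_hub(repo_id="Adilbai/ppo-LunarLander-v2", filename="ppo-LunarLander-v2.zip")
model = PPO.load(checkpoint)

# Record every episode as an .mp4 under ./videos (rgb_array rendering is required for recording)
env = RecordVideo(
    gym.make("LunarLander-v2", render_mode="rgb_array"),
    video_folder="videos",
    episode_trigger=lambda episode_id: True,
)

obs, info = env.reset()
done = False
while not done:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```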
The PPO agent uses a Multi-Layer Perceptron (MLP) policy, stable-baselines3's MlpPolicy: a fully connected actor-critic network that maps the 8-dimensional observation vector to a probability distribution over the 4 discrete actions and to a state-value estimate.
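The exact layer sizes can be checked by printing the loaded policy module; a small sketch reusing the checkpoint loaded above:

```python
from stable_baselines3 import PPO
from huggingface_sb3 import load_from_hub

checkpoint = load_from_hub(repo_id="Adilbai/ppo-LunarLander-v2", filename="ppo-LunarLander-v2.zip")
model = PPO.load(checkpoint)

# Prints the actor-critic modules: shared feature extractor, policy (actor) head, and value (critic) head
print(model.policy)
```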
To reproduce this model's training:
```python
from stable_baselines3 import PPO
import gymnasium as gym

env = gym.make("LunarLander-v2")

model = PPO(
    policy='MlpPolicy',
    env=env,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    verbose=1,
)
model.learn(total_timesteps=500000)  # Adjust based on your training duration
```
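Continuing from the training snippet, the trained policy can be saved as a zip archive and reloaded later; the filename below mirrors the checkpoint hosted in this repo and is only an assumption:

```python
# Save the trained policy (filename mirrors the checkpoint hosted in this repo; adjust as needed)
model.save("ppo-LunarLander-v2")

# Reload it later without retraining
reloaded_model = PPO.load("ppo-LunarLander-v2")
```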
If you use this model, please cite:
```bibtex
@misc{ppo_lunarlander_2024,
  title={PPO Agent for LunarLander-v2},
  author={[Your Name]},
  year={2024},
  publisher={Hugging Face Hub},
  url={https://huggingface.co/Adilbai/ppo-LunarLander-v2}
}
```