Soft Actor Critic (Visualized) Part 2: Lunar Lander Example from Scratch in Torch
Introduction
As in the previous example, which used the CartPole environment, we will be working with an environment from OpenAI Gym, this time Lunar Lander. The goal of this example is to implement the Soft Actor Critic (SAC) algorithm from scratch using PyTorch. SAC is a model-free, off-policy actor-critic algorithm that uses a stochastic policy and a value function to learn optimal policies in continuous action spaces.
As before, I will be using notation that matches the original paper (Haarnoja et al., 2018), and the code is structured in a similar way to the previous example. The main differences are the environment and the corresponding changes to the algorithm.
Since the paper's notation is critical to understanding the code, I highly recommend reading it alongside (or before) diving into the code.
Part 1 of this series provides extensive details linking the theory to the code. In this part, we will focus on the implementation of the SAC algorithm in PyTorch for Lunar Lander.
https://github.com/FranciscoRMendes/soft-actor-critic/blob/main/lunar-lander/LL_main_sac.py
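Before opening the file, it may help to see the shape of the actor SAC trains for this task. The sketch below is not the repository code; the class name, layer sizes, and clamping constants are illustrative assumptions, but it shows the standard squashed-Gaussian policy from the paper, producing the lander's two continuous actions and their log-probability.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianPolicy(nn.Module):
    """Squashed-Gaussian actor: outputs a 2-D action in [-1, 1] and its log-probability."""

    def __init__(self, state_dim=8, action_dim=2, hidden_dim=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden_dim, action_dim)
        self.log_std_head = nn.Linear(hidden_dim, action_dim)

    def forward(self, state):
        h = self.body(state)
        mu = self.mu_head(h)
        log_std = self.log_std_head(h).clamp(-20, 2)   # keep the std numerically sane
        dist = Normal(mu, log_std.exp())
        u = dist.rsample()                             # reparameterised sample
        action = torch.tanh(u)                         # squash into [-1, 1]
        # change-of-variables correction for the tanh squash
        log_prob = dist.log_prob(u) - torch.log(1.0 - action.pow(2) + 1e-6)
        return action, log_prob.sum(dim=-1, keepdim=True)
```

The tanh squash keeps both actions in [-1, 1], and the subtracted log-term is the correction the paper derives for that squashing; the two output dimensions correspond to the main engine and the lateral thrusters described below.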
Example Data
| Action: Main | Action: Lateral | Reward | State: x | y | v_x | v_y | angle | angular velocity | left contact | right contact | Done | Next state: x | y | v_x | v_y | angle | angular velocity | left contact | right contact |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.66336113 | -0.485024 | -1.56 | 0.00716772 | 1.4093536 | 0.7259957 | -0.06963848 | -0.0082988 | -0.16444895 | 0 | 0 | False | 0.01442766 | 1.4081073 | 0.73378086 | -0.05545701 | -0.01600615 | -0.15416077 | 0 | 0 |
| 0.87302077 | 0.8565877 | -2.85810149 | 0.01442766 | 1.4081073 | 0.73378086 | -0.05545701 | -0.01600615 | -0.15416077 | 0 | 0 | False | 0.02185297 | 1.4071543 | 0.7518369 | -0.04247425 | -0.02521554 | -0.18420467 | 0 | 0 |
| 0.4880578 | 0.18216014 | -2.248854395 | 0.02185297 | 1.4071543 | 0.7518369 | -0.04247425 | -0.02521554 | -0.18420467 | 0 | 0 | False | 0.02941189 | 1.4065428 | 0.7646336 | -0.02735517 | -0.03385869 | -0.17287907 | 0 | 0 |
| 0.0541396 | -0.70224154 | -0.765160122 | 0.02941189 | 1.4065428 | 0.7646336 | -0.02735517 | -0.03385869 | -0.17287907 | 0 | 0 | False | 0.03697386 | 1.4056652 | 0.7634756 | -0.03918146 | -0.04105976 | -0.14403483 | 0 | 0 |
Lunar Lander Dataset Explanation
This dataset captures the experience of an agent in the Lunar Lander environment from OpenAI Gym. Each row represents a single transition (state, action, reward, next state, done) in the environment.
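Because SAC is off-policy, these transitions are stored in a replay buffer and sampled in random minibatches during training. A minimal sketch of such a buffer is shown below; the class name, capacity, and batch size are illustrative assumptions rather than the repository's exact implementation.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) tuples and samples random batches."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, float(done)))

    def sample(self, batch_size=256):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.asarray, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```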
Environment Details
Action
- Main Engine: The thrust applied to the main engine.
- Lateral Thruster: The thrust applied to the left/right thrusters. (The snippet below shows how these appear in the environment's action space.)
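To see these two components in the environment itself, the continuous version of Lunar Lander exposes them as a 2-dimensional action box. The snippet below assumes the classic Gym API and the LunarLanderContinuous-v2 environment ID; newer Gymnasium releases use a slightly different API and version suffix.

```python
import gym

env = gym.make("LunarLanderContinuous-v2")
print(env.action_space)                 # a Box with 2 components, each in [-1, 1]

obs = env.reset()                       # classic Gym API; Gymnasium returns (obs, info)
action = env.action_space.sample()      # action[0]: main engine, action[1]: lateral thruster
next_obs, reward, done, info = env.step(action)
```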
Reward
- The reward received in this step. It is based on:
  - Proximity to the landing pad.
  - Smoothness of the landing.
  - Fuel consumption.
  - Avoiding crashes.
State
- x, y: Position coordinates.
- v_x, v_y: Velocity components.
- theta (angle): The lander's rotation angle.
- omega (angular velocity): The rate of change of the angle.
- left contact, right contact: Binary indicators (0 or 1) showing whether the lander has made contact with the ground. (The snippet below shows how these map onto the raw observation vector.)
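These names map onto the raw observation, which the environment returns as a flat 8-dimensional vector in the order listed above. A small illustrative unpacking (again assuming the classic Gym API):

```python
import gym

env = gym.make("LunarLanderContinuous-v2")
obs = env.reset()                       # flat vector of 8 floats (classic Gym API)

(x, y, v_x, v_y,
 angle, angular_velocity,
 left_contact, right_contact) = obs
```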
Done
- True: The episode has ended (either a successful landing or a crash).
- False: The episode is still ongoing. (The sketch below shows how this flag masks the bootstrapped target.)
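The done flag is what stops bootstrapping at the end of an episode: in SAC's soft Q-target it shows up as a (1 - done) mask. The helper below is a sketch of that computation; the function name, signature, and default hyperparameters are assumptions, with the actor and the two target critics passed in as callables.

```python
import torch

def soft_q_target(rewards, dones, next_states, actor, target_q1, target_q2,
                  gamma=0.99, alpha=0.2):
    """Bootstrapped SAC target; (1 - done) stops bootstrapping at episode ends."""
    with torch.no_grad():
        next_action, next_log_prob = actor(next_states)          # a' ~ pi(.|s'), log pi(a'|s')
        q_next = torch.min(target_q1(next_states, next_action),
                           target_q2(next_states, next_action))  # clipped double-Q estimate
        soft_value = q_next - alpha * next_log_prob              # soft value of the next state
        return rewards + gamma * (1.0 - dones) * soft_value
```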
Next State
- The same attributes as State, but after the action has been applied.
Sample Game Play
Gameplay over 500 games (embedded YouTube video).