Sb3, the Swiss Army Knife of Applied RL | by James Koh, PhD | Oct, 2023


Your choice of model, with any environment

James Koh, PhD
Towards Data Science
Image created by DALL·E 3 based on the prompt “Create a realistic looking image of an opened swiss army knife.”

Stable-Baselines3 (sb3) is like a Swiss Army knife: a multi-function tool that can be used for many purposes. And just as a Swiss Army knife can save your life if you are stranded in a jungle, sb3 can save your life in the office when you have seemingly impossible deadlines to meet.

This guide uses gymnasium==0.28.1 and stable-baselines3==2.1.0. If you use different versions, or follow older guides, you may not get the results below. But fret not: an installation guide is given here as well. I guarantee you can get the results if you follow my instructions.
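Before going further, it can help to confirm which versions are actually installed (on PyPI the packages are `gymnasium` and `stable-baselines3`, installable with `pip install gymnasium==0.28.1 stable-baselines3==2.1.0`). The stdlib-only sketch below does that check; `EXPECTED` and `check_versions` are hypothetical names chosen for illustration, and the pinned versions are simply the ones this guide states.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions this guide was written against (as stated in the text).
EXPECTED = {"gymnasium": "0.28.1", "stable-baselines3": "2.1.0"}


def check_versions(expected: dict[str, str]) -> dict[str, str]:
    """Return a status per package: 'ok', 'missing', or 'found <version>'."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)  # reads installed package metadata
        except PackageNotFoundError:
            report[package] = "missing"
            continue
        report[package] = "ok" if installed == wanted else f"found {installed}"
    return report


if __name__ == "__main__":
    for package, status in check_versions(EXPECTED).items():
        print(f"{package}: {status}")
```

Anything other than `ok` is a hint that your results may differ from the ones shown here.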

Stable-Baselines3 is easy to use. It is also well documented, and you can follow its tutorials on your own. But…

  • Have you followed older guides (perhaps ones using gym), only to hit errors on your machine?
  • Can you always ensure version compatibility?
  • What if you want to use one of gymnasium's environments but modify, say, its rewards?
  • Do you know how to wrap your own tasks so that SOTA models can be applied in a few lines?

Answering these questions is the objective of this article! After reading this guided demonstration, you will…

  1. Solve classic environments with sb3 models, visualize the results, and save (or load) the trained model in a few lines of code. [Section 3.1]
  2. Understand how to check the action space and observation space for compatibility. [Section 3.2]
  3. Learn how to wrap gymnasium environments so that any sb3 model can be used, regardless of whether the environment uses Box or Discrete spaces. [Section 4.1]
  4. Learn how to wrap gymnasium environments for reward shaping. [Section 4.2]
  5. Learn how to wrap your own custom environments to be compatible with sb3, with minimal changes to your original code, which may follow a different structure. [Section 5]
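As a taste of the compatibility check in point 2, the sketch below encodes which action-space types the core sb3 algorithms accept, following the algorithm table in the Stable-Baselines3 documentation (observation spaces are not covered). `ACTION_SPACE_SUPPORT`, `space_name` and `compatible_algorithms` are hypothetical helpers, not part of the sb3 API.

```python
from typing import Any

# Action-space support of the core SB3 algorithms, transcribed from the
# algorithm table in the Stable-Baselines3 documentation.
# Observation-space support is NOT covered by this sketch.
ACTION_SPACE_SUPPORT: dict[str, set[str]] = {
    "A2C": {"Box", "Discrete", "MultiDiscrete", "MultiBinary"},
    "PPO": {"Box", "Discrete", "MultiDiscrete", "MultiBinary"},
    "DQN": {"Discrete"},
    "DDPG": {"Box"},
    "TD3": {"Box"},
    "SAC": {"Box"},
}


def space_name(space: Any) -> str:
    """Accept a gymnasium space instance or a plain class-name string."""
    # Matching on the class name keeps this helper import-free;
    # a custom subclass of Box, for example, would not be recognised.
    return space if isinstance(space, str) else type(space).__name__


def compatible_algorithms(action_space: Any) -> list[str]:
    """Return the algorithms in ACTION_SPACE_SUPPORT that accept this action space."""
    name = space_name(action_space)
    return sorted(algo for algo, supported in ACTION_SPACE_SUPPORT.items() if name in supported)


if __name__ == "__main__":
    for name in ("Discrete", "Box", "MultiDiscrete"):
        print(f"{name}: {compatible_algorithms(name)}")
```

With gymnasium installed, passing `gym.make("CartPole-v1").action_space` (a `Discrete(2)` space) should give `['A2C', 'DQN', 'PPO']`, while the `Box` action space of `Pendulum-v1` swaps DQN out for DDPG, SAC and TD3.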

Create a virtual environment and set up the relevant dependencies. I cater to the majority: this guide was created using Windows…




This post originally appeared on TechToday.