If you’re looking to get started with Reinforcement Learning, OpenAI Gym is undeniably the most popular choice for implementing environments to train your agents. It ships with a wide range of environments that are used as benchmarks for proving the efficacy of new research methods, and it also provides a simple API for implementing your own environments.
In this article, I will introduce the basic building blocks of OpenAI Gym: installing Gym, the Env class with its observation and action spaces, interacting with an environment, wrappers, and running multiple environments in parallel.
The first thing to do is make sure we have the latest version of Gym installed. You can use either conda or pip to install Gym; in our case, we’ll use pip.
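A typical install command looks like this; the pinned version in the comment is only an example, in case you want to match the classic (pre-0.26) API used throughout this article:

```
pip install gym
# optionally pin an older release to match the classic API, e.g.
# pip install "gym==0.21.0"
```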
The fundamental building block of OpenAI Gym is the Env class. It is a Python class that implements a simulator of the environment you want to train your agent in. OpenAI Gym comes packed with many environments: one where you can move a car up a hill, one where you balance a swinging pendulum, various Atari games, and more. Gym also lets you create custom environments of your own.
We start with an environment called MountainCar, where the objective is to drive a car up a mountain. The car is on a one-dimensional track, positioned between two “mountains”. The goal is to drive up the mountain on the right; however, the car’s engine is not strong enough to scale the mountain in a single go. Therefore, the only way to succeed is to drive back and forth to build up momentum.
The goal of the Mountain Car Environment is to gain momentum and reach the flag.
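To follow along, we first create this environment. A minimal sketch, assuming the standard registration id "MountainCar-v0" and the classic Gym API:

```python
import gym

# Create the MountainCar environment
env = gym.make("MountainCar-v0")
```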
The basic structure of the environment is described by the observation_space and the action_space attributes of the Gym Env class.
The observation_space defines the structure as well as the legitimate values for the observation of the state of the environment. The observation can be different things for different environments. The most common form is a screenshot of the game. There can be other forms of observations as well, such as certain characteristics of the environment described in vector form.
Similarly, the Env class also defines an attribute called the action_space, which describes the numerical structure of the legitimate actions that can be applied to the environment.
The observation for the mountain car environment is a vector of two numbers representing velocity and position. The middle point between the two mountains is taken to be the origin, with right being the positive direction and left being the negative direction.
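Continuing with the env object created above, we can inspect both attributes directly; the exact printed representation may vary between Gym versions:

```python
# Inspect the structure of observations and actions
print("Observation space:", env.observation_space)
print("Action space:", env.action_space)

# Sample output (may differ slightly by version):
# Observation space: Box(2,)
# Action space: Discrete(3)
```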
We see that the observation space and the action space are represented by classes called Box and Discrete, respectively. These are two of the data structures provided by Gym for implementing observation and action spaces for different kinds of scenarios (discrete action spaces, continuous action spaces, etc.). We will dig further into these later in the article.
In this section, we cover the functions of the Env class that help the agent interact with the environment. Two such important functions are:

- reset(): resets the environment to its initial state and returns the initial observation.
- step(action): applies the given action to the environment, advances it by one timestep, and returns the new observation, the reward, a boolean flag indicating whether the episode has ended, and an info dictionary.
Let us now see an example that illustrates the concepts discussed above. We first begin by resetting the environment, then we inspect an observation. We then apply an action and inspect the new observation.
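A sketch of that sequence, again assuming the pre-0.26 Gym API in which step returns four values:

```python
import gym

env = gym.make("MountainCar-v0")

# Reset the environment and look at the initial observation
obs = env.reset()
print("The initial observation is {}".format(obs))

# Sample a random action from the action space
random_action = env.action_space.sample()

# Apply the action and look at the new observation
new_obs, reward, done, info = env.step(random_action)
print("The new observation is {}".format(new_obs))
```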
In this case, our observation is not a screenshot of the task being performed. In many other environments (like Atari, as we will see), the observation is a screenshot of the game. In either scenario, if you want to see how the environment looks in its current state, you can use the render method.
This should display the environment in its current state in a pop-up window. You can close the window using the close function.
If you want to see a screenshot of the game as an image, rather than as a pop-up window, you should set the mode argument of the render function to rgb_array.
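Continuing with the env from the previous snippet, both ways of rendering look roughly like this, assuming the older Gym API where render accepts a mode argument:

```python
# Display the environment in a pop-up window
env.render(mode="human")

# ... or obtain the current frame as an RGB NumPy array
screen = env.render(mode="rgb_array")
print(screen.shape)   # e.g. (height, width, 3)

# Close the rendering window when done
env.close()
```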
Collecting all the little blocks of code we have covered so far, the typical code for running your agent inside the MountainCar environment would look like the following. In our case, we just take random actions, but you can have an agent that does something more intelligent based on the observations you get.
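A minimal random-agent loop might look like this; the number of steps and the sleep interval are arbitrary choices:

```python
import time
import gym

env = gym.make("MountainCar-v0")

# Number of steps to run the agent for
num_steps = 1500

obs = env.reset()

for step in range(num_steps):
    # Take a random action; replace this with your agent's policy,
    # e.g. action = policy(obs)
    action = env.action_space.sample()

    # Apply the action
    obs, reward, done, info = env.step(action)

    # Render the environment
    env.render()
    time.sleep(0.001)

    # If the episode is over, start another one
    if done:
        obs = env.reset()

# Close the rendering window
env.close()
```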
The observation_space for our environment was Box(2,), and the action_space was Discrete(3). These are both types of data structures called Spaces, provided by the Gym library, which describe the valid values for observations and actions in reinforcement learning environments. All of these structures inherit from the gym.Space base class.
The Box(n,) space represents an n-dimensional continuous space. In our case, n = 2, so the observation space is a 2-dimensional continuous space. This space is bounded, meaning it has specified upper and lower limits for each dimension. These bounds define the range of legitimate values that observations can take, and can be accessed using the high and low attributes of the Box space. They typically correspond to the maximum and minimum values of features such as positions or velocities in the environment.
You can set these upper/lower limits while defining your space, as well as when you are creating an environment.
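For example, continuing with the MountainCar env, we can read the bounds off the observation space and define a Box of our own; the custom bounds below are arbitrary illustrative values:

```python
import numpy as np
from gym.spaces import Box

# Upper and lower bounds of the MountainCar observation space
print("Upper bound:", env.observation_space.high)
print("Lower bound:", env.observation_space.low)

# Defining a custom 2-D Box space with explicit bounds
custom_space = Box(low=np.array([-1.0, -2.0]),
                   high=np.array([1.0, 2.0]),
                   dtype=np.float32)
print(custom_space.sample())
```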
The Discrete(n) space describes a discrete space with n possible values, 0 through n-1. In our case, n = 3, meaning our actions can take a value of 0, 1, or 2. Unlike Box, Discrete does not have high and low attributes, since, by definition, it is already clear which values are allowed.
If you try to pass an invalid value to the step function of our environment (in our case, say, 4), it will lead to an error.
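A quick way to see this, assuming the environment enforces its action space (classic Gym raises an assertion error here):

```python
# Passing an action outside Discrete(3) should fail
try:
    env.step(4)
except Exception as e:
    print("Invalid action rejected:", e)
```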
There are multiple other spaces available for various use cases, such as MultiDiscrete, which allows you to use more than one discrete variable for your observation and action space.
The Wrapper class in OpenAI Gym provides you with the functionality to modify various parts of an environment to suit your needs. Why might such a need arise? Maybe you want to normalize your pixel input, or maybe you want to clip your rewards. While typically you could accomplish the same by making another class that subclasses your environment Env class, the Wrapper class allows us to do it more systematically.
But before we begin, let’s switch to a more complex environment that will help us appreciate the utility that Wrapper brings to the table. This complex environment is going to be the Atari game Breakout.
Before we begin, we will install the Atari components of Gym.
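One way to install them with pip; depending on your Gym version, you may also need the ROM license extra noted in the comment:

```
pip install "gym[atari]"
# on newer Gym releases you may additionally need:
# pip install "gym[accept-rom-license]"
```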
If you run into an error along the lines of AttributeError: module ‘enum’ has no attribute ‘IntFlag’, you might need to uninstall the enum package and then re-attempt the install.
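Assuming the conflict is caused by the enum34 backport (the usual source of this error on Python 3.6 and later), the commands would look like this:

```
pip uninstall -y enum34
pip install "gym[atari]"
```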
Gameplay of Atari Breakout
Let’s now run the environment with random actions.
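A sketch using the id "BreakoutNoFrameskip-v4"; other Breakout variants exist, and which ids are registered depends on your Gym/ALE version:

```python
import gym

env = gym.make("BreakoutNoFrameskip-v4")

print("Observation space:", env.observation_space)   # e.g. Box(210, 160, 3)
print("Action space:", env.action_space)             # e.g. Discrete(4)

obs = env.reset()
for _ in range(1000):
    # Random policy
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()
env.close()
```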
Our observation space is a continuous space of dimensions (210, 160, 3), corresponding to an RGB pixel observation of the same size. Our action space contains 4 discrete actions (Left, Right, Do Nothing, Fire).
Now that we have our environment loaded, let us suppose we have to make certain changes to the Atari environment. It’s a common practice in deep RL to construct each observation by concatenating the past k frames. We have to modify the Breakout environment so that both our reset and step functions return concatenated observations.
For this, we define a class of type gym.Wrapper to override the reset and step functions of the Breakout Env. The Wrapper class, as the name suggests, is a wrapper on top of an Env class that modifies some of its attributes and functions.
The __init__ function takes the Env for which the wrapper is written and the number of past frames to concatenate. Note that we also need to redefine the observation space, since we are now using concatenated frames as our observations. (We modify the observation space from (210, 160, 3) to (210, 160, 3 * num_past_frames).)
In the reset function, since we don’t have any previous observations to concatenate when the environment is initialized, we simply concatenate copies of the initial observation.
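A sketch of such a wrapper following the description above; the names ConcatObs and k are my own, and the deque-based bookkeeping is just one reasonable way to implement it:

```python
from collections import deque

import gym
import numpy as np
from gym import spaces


class ConcatObs(gym.Wrapper):
    def __init__(self, env, k):
        super().__init__(env)
        self.k = k
        self.frames = deque([], maxlen=k)

        # Redefine the observation space: the channel dimension grows by a
        # factor of k, e.g. (210, 160, 3) -> (210, 160, 3 * k)
        low = np.repeat(env.observation_space.low, k, axis=-1)
        high = np.repeat(env.observation_space.high, k, axis=-1)
        self.observation_space = spaces.Box(low=low, high=high,
                                            dtype=env.observation_space.dtype)

    def reset(self):
        obs = self.env.reset()
        # No history yet, so fill the buffer with copies of the first observation
        for _ in range(self.k):
            self.frames.append(obs)
        return self._get_obs()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return self._get_obs(), reward, done, info

    def _get_obs(self):
        # Concatenate the k most recent frames along the channel axis
        return np.concatenate(list(self.frames), axis=-1)
```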
Now, to effectively get our modified environment, we wrap our environment Env in the wrapper we just created.
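Wrapping the Breakout environment with the ConcatObs class sketched above, using k = 4 past frames:

```python
env = gym.make("BreakoutNoFrameskip-v4")
wrapped_env = ConcatObs(env, 4)
print("The new observation space is", wrapped_env.observation_space)
# e.g. Box(210, 160, 12)
```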
Let us now verify whether the observations are indeed concatenated or not.
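One way to check is to compare the shapes returned by reset and step against the original (210, 160, 3) frames:

```python
# Reset the wrapped environment and inspect the stacked observation
obs = wrapped_env.reset()
print("Initial observation shape:", obs.shape)          # expected (210, 160, 12)

# Take one step and inspect the new observation
obs, reward, done, info = wrapped_env.step(wrapped_env.action_space.sample())
print("Observation shape after one step:", obs.shape)   # expected (210, 160, 12)
```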
There is more to Wrappers than the vanilla Wrapper class. Gym also provides you with specific wrappers that target specific elements of the environment, such as observations, rewards, and actions. Their use is demonstrated in the following section.
Let us suppose that we have to make the following changes to our environment:
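For illustration, suppose the changes are: normalize the pixel observations to the [0, 1] range, clip rewards to [0, 1], and occasionally replace the agent’s chosen action with a random one (these particular choices are just examples). Each change maps onto one of Gym’s targeted wrapper classes; a sketch:

```python
import random

import gym
import numpy as np


class NormalizeObservation(gym.ObservationWrapper):
    def observation(self, obs):
        # Scale pixel values from [0, 255] to [0, 1]
        return obs.astype(np.float32) / 255.0


class ClipReward(gym.RewardWrapper):
    def reward(self, reward):
        # Clip the reward to the [0, 1] range
        return float(np.clip(reward, 0.0, 1.0))


class RandomActionNoise(gym.ActionWrapper):
    def action(self, action):
        # With a small probability, ignore the chosen action and act randomly
        if random.random() < 0.1:
            return self.env.action_space.sample()
        return action
```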
Now we apply all of these wrappers to our environment in a single line of code to get a modified environment. Then, we verify that all of our intended changes have been applied to the environment.
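Applying the three illustrative wrappers defined above in one line, then spot-checking the effects:

```python
env = RandomActionNoise(ClipReward(NormalizeObservation(gym.make("BreakoutNoFrameskip-v4"))))

obs = env.reset()
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    # Observations and rewards should now both lie in [0, 1]
    assert obs.min() >= 0.0 and obs.max() <= 1.0
    assert 0.0 <= reward <= 1.0
    if done:
        obs = env.reset()
print("All checks passed")
```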
In case you want to recover the original Env after applying wrappers to it, you can use the unwrapped attribute of an Env class. While the Wrapper class may look like just any other class that subclasses from Env, it does maintain a list of wrappers applied to the base Env.
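For instance, with the wrapped env from above:

```python
# Strip away all wrappers and get back the base environment
print("Wrapped env:", env)
print("Base env:", env.unwrapped)
```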
A lot of Deep RL algorithms (like Asynchronous Actor Critic Methods) use parallel threads, where each thread runs an instance of the environment to both speed up the training process and improve efficiency.
Now we will use another library from OpenAI, called baselines. This library provides performant implementations of many standard deep RL algorithms, which serve as reference points to compare any novel algorithm against. In addition to these implementations, baselines provides many other features that let us prepare our environments in accordance with the way they were used in OpenAI’s experiments.
One of these features is a set of wrappers that allow you to run multiple environments in parallel with a single function call. Before we begin, we install baselines by running the following commands in a terminal.
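The steps below follow the baselines README (cloning the repository and installing it in editable mode); note that baselines also expects TensorFlow to be available:

```
git clone https://github.com/openai/baselines
cd baselines
pip install -e .
```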
You may need to restart your Jupyter notebook before the installed package becomes available.
The wrapper of interest here is SubprocVecEnv, which runs all the environments asynchronously in separate processes. We first create a list of functions, each of which returns the environment we want to run. In the code, I have used a lambda to create an anonymous function that returns the Gym environment.
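A sketch, assuming the import path used by OpenAI Baselines; the number of parallel environments is an arbitrary choice:

```python
import gym
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv

# Number of parallel environments
num_envs = 3

# A list of functions, each of which returns a fresh copy of the environment
env_fns = [lambda: gym.make("BreakoutNoFrameskip-v4") for _ in range(num_envs)]

# Run the environments in separate subprocesses
# (on some platforms this needs to be guarded by `if __name__ == "__main__":`)
envs = SubprocVecEnv(env_fns)
```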
This envs object now acts like a single environment on which we can call reset and step. However, these functions now work with batches: reset returns an array of observations (one per environment), and step expects an array of actions and returns arrays of observations, rewards, and done flags rather than single values.
Calling the render function on the vectorized envs displays screenshots of the games in a tiled fashion.
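Continuing with the envs object from above (baselines tiles the frames from all environments into one window when rendering):

```python
import numpy as np

# Reset all environments at once; obs has shape (num_envs, 210, 160, 3)
obs = envs.reset()
print("Batched observation shape:", obs.shape)

for _ in range(1000):
    # One action per environment
    actions = np.array([envs.action_space.sample() for _ in range(num_envs)])

    # Sub-environments whose episodes end are reset automatically
    obs, rewards, dones, infos = envs.step(actions)

    # Display all environments tiled in a single window
    envs.render()

envs.close()
```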
The following screen plays out.
render for the SubprocVecEnv environment. Learn more about vectorized environments in Gym by exploring the vectorized environments documentation.
Given what we’ve covered so far, you should now be able to start training your reinforcement learning agents using environments from OpenAI Gym. But what if the environment you need isn’t readily available? You’re in luck for a couple of reasons! First, OpenAI Gym allows you to create custom reinforcement learning environments. Second, that’s exactly what we have covered in Part 2 of this series. Until then, enjoy exploring the world of reinforcement learning with OpenAI Gym environments!