This article is the second of a three-part series that focuses on the core components of the Gym library. Gym is a Python library developed and maintained by OpenAI, whose purpose is to house a rich collection of environments for Reinforcement Learning (RL) experiments behind a unified interface.
If you have only just started your journey with the Gym framework, the official Gym documentation is a great place to start. However, the information it offers is limited. The articles in this series aim to expand this information and provide a deeper understanding of each of the four components Gym offers. In this article, we will focus on Environments.
The Env class is the main building block of Gym. It supports both partially and fully observed environments, with arbitrary dynamics running behind the scenes. A list of the available environments can be found in the envs folder of the Gym GitHub repository. The class has three attributes:
action_space – a space object describing the possible actions an agent can take. The action space is not limited to a single type of space object, so actions can be discrete, continuous, or a combination of both.
observation_space – a space object describing the observations the environment provides to the agent. As with the action space, it is not limited to a single type of space object: observations can range from a single integer or a short vector of values to a multi-dimensional array containing colour images.
reward_range – a tuple defining the minimum and maximum reward values the agent can receive. Defaults to (-inf, +inf).
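To make the three attributes concrete, here is a minimal, dependency-free sketch. It does not import gym: the Discrete class is a hypothetical stand-in for Gym's discrete space, and CorridorEnv is an illustrative environment, not one shipped with Gym.

```python
import math
import random


class Discrete:
    """Hypothetical stand-in for a discrete space: the integers 0..n-1."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        # Draw a uniformly random element of the space
        return random.randrange(self.n)

    def contains(self, x):
        # True if x is a valid element of the space
        return isinstance(x, int) and 0 <= x < self.n


class CorridorEnv:
    """Illustrative environment: an agent walking along a corridor of 5 cells."""

    # Two actions: 0 = move left, 1 = move right
    action_space = Discrete(2)
    # Observations are the agent's cell index, 0..4
    observation_space = Discrete(5)
    # Mirrors the default of an unbounded reward range
    reward_range = (-math.inf, math.inf)


env = CorridorEnv()
print(env.action_space.n, env.observation_space.n, env.reward_range)  # 2 5 (-inf, inf)
```

An agent only needs these attributes to know what it may do (sample or validate actions) and what it will see, without knowing anything about the environment's internals.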
Every Gym environment has a unique ID consisting of the environment name followed by a v and a version number (for example, CartPole-v1).
An instance of the Env class can be created through the gym package's make() function, which takes the environment ID, e.g. gym.make("CartPole-v1").
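The naming convention can be illustrated with a small helper that splits an ID into its name and version. This is a hypothetical sketch, not Gym's own ID parser; the accepted character set in the pattern is an assumption.

```python
import re

# Hypothetical pattern for "<name>-v<version>" IDs such as "CartPole-v1"
ENV_ID_PATTERN = re.compile(r"^(?P<name>[\w:.-]+)-v(?P<version>\d+)$")


def parse_env_id(env_id):
    """Split an environment ID into (name, version); raise if it lacks a version."""
    match = ENV_ID_PATTERN.match(env_id)
    if match is None:
        raise ValueError(f"{env_id!r} does not follow the <name>-v<version> format")
    return match.group("name"), int(match.group("version"))


print(parse_env_id("CartPole-v1"))     # ('CartPole', 1)
print(parse_env_id("MountainCar-v0"))  # ('MountainCar', 0)
```

Keeping the version in the ID means a change to an environment's dynamics can ship under a new version number without silently altering results obtained with the old one.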
A basic example of creating an environment can be found in the official Gym documentation. Additionally, the Env class has five functions:
The step(action) function is the central piece of any environment. It takes an action from the agent and uses it to run one timestep of the environment's dynamics, which can be divided into four stages:
- Firstly, we tell the environment which action to execute on the next step.
- Next, we retrieve the new observation from the environment after the performed action.
- Thirdly, we obtain the reward that the agent has gained during the current timestep.
- And lastly, we obtain a Boolean value that determines if the episode has ended.
Once these stages have concluded, step(action) returns a tuple of (observation, reward, done, info). Each component has its own type and meaning:
observation (object) – an environment-specific object, such as an array or matrix, holding the agent's observation of the environment at the current timestep.
reward (float) – a floating-point number giving the amount of reward earned by the previous action.
done (boolean) – a Boolean indicating whether the episode has ended. If True, the episode is over, and further step(action) calls will return undefined results until reset() is called.
info (dict) – a dictionary containing auxiliary diagnostic information. It can help with debugging but is rarely used in agent training, and most RL algorithms can safely ignore it.
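The four stages and the returned tuple can be sketched with a toy environment. This is an illustrative, gym-free example: CorridorEnv, its reward values (+1 at the goal, -0.1 per step otherwise) and the "position" key in info are all assumptions made for the sketch.

```python
class CorridorEnv:
    """Illustrative 5-cell corridor; reaching the right-most cell ends the episode."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def step(self, action):
        # Stage 1: apply the action chosen by the agent (0 = left, 1 = right)
        if action not in (0, 1):
            raise ValueError(f"invalid action: {action!r}")
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        # Stage 2: build the new observation (here, just the cell index)
        observation = self.position
        reached_goal = self.position == self.length - 1
        # Stage 3: reward of +1 for reaching the goal, a small penalty otherwise
        reward = 1.0 if reached_goal else -0.1
        # Stage 4: the episode ends once the goal cell is reached
        done = reached_goal
        # Auxiliary diagnostics go in the info dict
        info = {"position": self.position}
        return observation, reward, done, info


env = CorridorEnv()
observation, reward, done, info = env.step(1)
print(observation, reward, done, info)  # 1 -0.1 False {'position': 1}
```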
The reset() function returns the environment to an initial state and returns the initial observation. Call it before the first step and again at the end of every episode.
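A typical episode loop pairs reset() with repeated step() calls until done is True. The sketch below is gym-free and assumes a hypothetical CorridorEnv with a random start cell and a random agent; run_episode is likewise an illustrative helper.

```python
import random


class CorridorEnv:
    """Illustrative corridor environment with a random start cell (hypothetical)."""

    def __init__(self, length=5):
        self.length = length
        self.position = None

    def reset(self):
        # Return to an initial state (any non-goal cell) and hand back the first observation
        self.position = random.randrange(self.length - 1)
        return self.position

    def step(self, action):
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1
        return self.position, reward, done, {}


def run_episode(env, max_steps=100):
    """Run one episode with a random agent and return the total reward."""
    observation = env.reset()  # always reset before the first step of an episode
    total_reward = 0.0
    for _ in range(max_steps):
        action = random.choice([0, 1])
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break  # stepping further would give undefined results
    return total_reward


env = CorridorEnv()
returns = [run_episode(env) for _ in range(3)]  # each episode starts with a fresh reset()
print(returns)
```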
The render(mode) function is optional and accepts one of three modes: human, rgb_array, or ansi. Each mode renders the environment in a different way:
human – renders the environment to the current display (or terminal) for a person to watch. This is the function's default mode.
rgb_array – returns an array with the shape (x, y, 3), representing the RGB (red, green, blue) values of an x-by-y pixel image, suitable for turning into a video.
ansi – returns a string (str) or StringIO.StringIO containing a terminal-style text representation.
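The three modes can be illustrated with a hypothetical, gym-free environment. The nested list of RGB tuples stands in for the NumPy array a real environment would typically return, and raising NotImplementedError for unknown modes is a design choice of this sketch.

```python
class CorridorEnv:
    """Illustrative corridor showing the three render modes (hypothetical)."""

    def __init__(self, length=5, position=2):
        self.length = length
        self.position = position

    def _text(self):
        # One character per cell: "A" marks the agent, "." an empty cell
        return "".join("A" if i == self.position else "." for i in range(self.length))

    def render(self, mode="human"):
        if mode == "human":
            # Draw to the terminal for a person to watch; returns nothing
            print(self._text())
            return None
        if mode == "rgb_array":
            # A 1 x length x 3 nested list: red for the agent, white elsewhere
            white, red = (255, 255, 255), (255, 0, 0)
            return [[red if i == self.position else white for i in range(self.length)]]
        if mode == "ansi":
            # The same text, returned as a string instead of printed
            return self._text() + "\n"
        raise NotImplementedError(f"unsupported render mode: {mode!r}")


env = CorridorEnv()
env.render()                     # prints ..A..
frame = env.render("rgb_array")  # frame[0][2] == (255, 0, 0)
text = env.render("ansi")        # "..A..\n"
```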
The close() function performs any necessary clean-up of the environment, such as closing rendering windows or releasing other resources. Environments also close themselves automatically when they are garbage collected or when the program exits.
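As a sketch of what close() is for, the hypothetical LoggingEnv below holds a temporary file that must be released. Python's standard contextlib.closing() calls close() automatically when the with block exits, even if an exception is raised; making close() safe to call twice is an assumption of this example.

```python
import contextlib
import tempfile


class LoggingEnv:
    """Hypothetical environment that writes a log file and must release it when done."""

    def __init__(self):
        # A resource that needs explicit clean-up
        self.log = tempfile.TemporaryFile(mode="w+")
        self.closed = False

    def step(self, action):
        self.log.write(f"action={action}\n")
        return 0, 0.0, False, {}

    def close(self):
        # Release resources; safe to call more than once
        if not self.closed:
            self.log.close()
            self.closed = True


# contextlib.closing() calls env.close() when the block exits
with contextlib.closing(LoggingEnv()) as env:
    env.step(1)

print(env.closed, env.log.closed)  # True True
```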
The seed(seed) function sets the seed for the environment's random number generator(s), which helps make results reproducible. Some environments use multiple pseudorandom number generators.
Thus, it is recommended that all of their seeds are managed within this function to avoid accidental correlations between generators. The function returns a list of the seeds used by the environment's random number generators, where the first value is the "main" seed (the value a reproducer should pass to seed()).
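The contract can be sketched with a hypothetical environment that keeps a single random.Random generator. Drawing a fresh seed when none is given is an assumption of this sketch; the point is that the same seed reproduces the same sequence of start states, and that the returned list begins with the main seed.

```python
import random


class NoisyCorridorEnv:
    """Hypothetical environment whose start cell is random, so seeding matters."""

    def __init__(self, length=5):
        self.length = length
        self.seed()

    def seed(self, seed=None):
        # Pick a seed if none was given, then (re)create the environment's only generator
        if seed is None:
            seed = random.SystemRandom().randrange(2**32)
        self.rng = random.Random(seed)
        # The first (and here only) entry is the "main" seed to pass back for reproduction
        return [seed]

    def reset(self):
        self.position = self.rng.randrange(self.length)
        return self.position


def start_cells(seed, episodes=5):
    """Collect the start cell of several episodes for a given seed."""
    env = NoisyCorridorEnv()
    env.seed(seed)
    return [env.reset() for _ in range(episodes)]


print(start_cells(42) == start_cells(42))  # True: same seed, same start cells
```

Routing every source of randomness through the generator created in seed() is what makes a single seed sufficient to replay an experiment.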