This is the third and final article in a three-part series that focuses on the core components of the Gym library. Gym is a Python library developed and maintained by OpenAI; its purpose is to house a rich collection of environments for Reinforcement Learning (RL) experiments behind a unified interface.
If you have only just started your journey with the Gym framework, the official Gym documentation is a great place to start. However, the information it offers is limited. The articles in this series aim to expand this information and provide a deeper understanding of each of the four components Gym offers. In this article, we will focus on Wrappers & Monitors.
Wrappers are a method to modularly extend an RL environment's functionality while keeping the original classes separate. Given an environment with a set of observations, a wrapper can provide the ability to accumulate the observations within a replay buffer and output the agent's \(N\) last observations without editing the original class.
Typically, wrappers are "wrapped" around an existing environment to add extra functionality. Gym provides a framework for these situations through the `Wrapper` class.
There are two sets of locations for wrappers on the Gym GitHub repository. The first is in the core.py file, which contains the main template wrappers, all of which are discussed in this section. The second is inside the wrappers folder that houses a wide variety of utility wrappers for different environment requirements and highlights the following quick tips for creating wrappers:
- Remember to call `super().__init__(env)` if you override the wrapper's `__init__` function.
- The inner environment is accessible with `self.unwrapped`.
- The previous environment layer is accessed using `self.env`.
- The variables `metadata`, `action_space`, `observation_space`, `reward_range`, and `spec` are copied to `self` from the previous environment.
- Wrapped classes require at least one of the following functions: `__init__(self, env)`, `step`, `reset`, `render`, `close`, or `seed`.
- A wrapped (layered) function should take its input from the previous layer (`self.env`) and/or the inner layer (`self.unwrapped`).
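As a minimal illustration of these tips, the sketch below defines a do-nothing wrapper (the class name `PassThroughWrapper` is my own, not part of Gym) that calls `super().__init__(env)` and forwards `step` to the previous layer:

```python
import gym


class PassThroughWrapper(gym.Wrapper):
    """A do-nothing wrapper that simply forwards calls to the inner layers."""

    def __init__(self, env: gym.Env) -> None:
        # Always call super().__init__(env) when overriding __init__.
        super().__init__(env)

    def step(self, action):
        # A layered function takes its input from the previous layer (self.env).
        return self.env.step(action)


env = PassThroughWrapper(gym.make("CartPole-v1"))
# action_space, observation_space, etc. are copied from the previous layer,
# and self.unwrapped reaches the innermost environment.
assert env.action_space == env.env.action_space
assert env.unwrapped is env.env.unwrapped
```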
The `Wrapper` class inherits from the `Env` class and accepts a single parameter: the instance of the `Env` class to wrap. The `Wrapper` class has three child classes that allow filtration of the environment's core information. These child wrappers are the `ObservationWrapper`, the `RewardWrapper`, and the `ActionWrapper`.
Each wrapper has a unique version of the `reset()` and `step(action)` functions found in the `Env` class. Both of these functions are explained in more detail in the OpenAI Gym: Environments article. Furthermore, each child wrapper requires an additional class function, highlighted within its respective section below.
`ObservationWrapper` is a wrapper that focuses on observations. It requires a single function, `observation(obs)`, where the `obs` argument is a single observation from the wrapped environment. The observation is passed through the function, updated based on the developer's requirements, and returned.
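As a sketch of how this might look in practice (the class name and the scale factor are my own, hypothetical choices), the wrapper below rescales every observation and casts it to `float32`:

```python
import gym
import numpy as np


class ScaledObservationWrapper(gym.ObservationWrapper):
    """Scales each observation by a constant factor and casts it to float32."""

    def __init__(self, env: gym.Env, scale: float = 0.5) -> None:
        super().__init__(env)
        self.scale = scale

    def observation(self, obs):
        # Every observation from the wrapped environment passes through here.
        return (np.asarray(obs) * self.scale).astype(np.float32)
```

Calling `env.observation(obs)` directly shows the transformation; during normal interaction, Gym applies it automatically to the observations produced by `reset()` and `step()`.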
`RewardWrapper` requires a single function, `reward(rew)`. However, this function focuses on agent rewards rather than observations. The `rew` argument is a single reward value that gets updated within the function and then returned.
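A common use for this wrapper is reward clipping, as popularized by the DQN literature. A minimal sketch (the class name is my own, not part of Gym):

```python
import gym
import numpy as np


class ClippedRewardWrapper(gym.RewardWrapper):
    """Clips every reward from the wrapped environment into [-1.0, 1.0]."""

    def reward(self, rew: float) -> float:
        # Every reward from the wrapped environment passes through here.
        return float(np.clip(rew, -1.0, 1.0))
```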
The last child wrapper, `ActionWrapper`, focuses on the third critical component of RL algorithms: actions. Similar to the others, it requires a single function, `action(act)`. The `act` argument is a single agent action that passes through the function, is updated, and then returned.
For instance, imagine a situation where we want to amend the stream of actions sent by the agent and change them so that with a probability of 10%, the current action is replaced with a random one. Using this approach, the agent will explore the environment more, assisting in solving the exploration/exploitation problem. In practice, the example would look like the following:
```python
import gym
import random
from typing import TypeVar

Action = TypeVar('Action')


class RandomActionWrapper(gym.ActionWrapper):
    """A basic ActionWrapper representation for including random actions."""

    def __init__(self, env: gym.Env, epsilon: float = 0.1) -> None:
        super().__init__(env)
        self.epsilon = epsilon

    def action(self, action: Action) -> Action:
        """
        Handles the agent action functionality. Accepts a current action
        and returns a random action if a random value is less than
        self.epsilon; otherwise, the current action is taken.
        """
        if random.random() < self.epsilon:
            return self.env.action_space.sample()
        return action
```
The `Monitor` class is implemented like the `Wrapper` class and is used to write information about the agent's performance into a file. Monitors are useful for reviewing an agent's life inside its environment. In total, there are eight parameters available for the `Monitor` class: two are required and the remaining six are optional. The required parameters are:
- `env` – an `Env` class object denoting the environment to monitor.
- `directory` – a string value containing the name of a (not yet created) directory for storing the monitor information.
The six optional arguments are as follows:
- `video_callable` – accepts one of `[function, None, False]`. If a custom function is provided, it must take in the index of an episode and output a Boolean indicating whether a video is recorded for that episode. The default value is `None`, which assigns the variable to the monitor's `capped_cubic_video_schedule(episode_id)` function. The parameter can be set to `False` to disable video recording.
- `force` – a Boolean value that works in conjunction with the `directory` argument and has a default value of `False`. If set to `True`, all existing training data within the given directory gets deleted, and a prefix of "openaigym" is applied to each file.
- `resume` – a Boolean value that is set to `False` by default. If set to `True`, the training data already in the given `directory` is retained and merged with the new data.
- `write_upon_reset` – a Boolean value set to `False` by default. When set to `True`, the monitor writes the manifest (JSON) file on each reset. Warning: this can be computationally expensive.
- `uid` – a string value representing a unique ID, used as part of the suffix for the JSON file. If set to `None`, one is generated automatically using `os.getpid()`.
- `mode` – a string value that is set to `None` by default. The parameter accepts `'evaluation'` or `'training'` as values to help distinguish between varying episodes when reviewing the results.
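For instance, a custom `video_callable` function could record a video only for every tenth episode. A small sketch (the function name is my own, hypothetical choice):

```python
def record_every_tenth(episode_id: int) -> bool:
    """Return True when a video should be recorded for the given episode."""
    return episode_id % 10 == 0
```

The function would then be passed to the monitor as `video_callable=record_every_tenth`.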
Additionally, the `Monitor` class requires two external components. The first is the FFmpeg utility, which converts captured observations into an output video file. If the utility isn't available, `Monitor` will raise an exception.
The second required component is a method for video recording that allows the monitor to take screenshots of the window drawn by the environment. The recommended approach is to use Xvfb (X11 virtual framebuffer), which starts a virtual graphical display server and forces the program to draw inside it.
On Linux, a standard set of commands to install and run Xvfb is shown below:

```shell
sudo apt install xvfb python-opengl ffmpeg
xvfb-run -s "-screen 0 640x480x24" python filename.py
```
A `Monitor` instance can be easily created using the Gym library via the following: