OpenAI Gym: Spaces

This article is the first of a three-part series that focuses on the core components of the Gym library. Gym is a Python library developed and maintained by OpenAI that houses a rich collection of environments for Reinforcement Learning (RL) experiments behind a unified interface.

If you have only just started your journey with the Gym framework, the official Gym documentation is a great place to start. However, the information it offers is limited. The articles in this series aim to expand on that information and provide a deeper understanding of each of the four components Gym offers. In this article, we will focus on Spaces.

Article Contents

  1. Spaces

Spaces

Figure 1.1 Spaces class hierarchy

Spaces define the format of valid observations and actions within an RL environment. Six main types derive from the Space(shape=None, dtype=None) abstract class: Discrete, Box, Dict, Tuple, MultiBinary, and MultiDiscrete. All of them can be found in the spaces folder of the Gym GitHub repository. The Space abstract class can be inherited from directly, though it is highly recommended to use one of the six existing space classes. The spaces folder also contains a file dedicated to utility functions, which is discussed at the end of this article.

There are four main functions that the Space class provides:

  • sample() – randomly samples an element from the space and returns it.
  • contains(x) – returns True or False depending on whether x is a valid member of the space.
  • to_jsonable(sample_n) – converts a given batch (list) of samples into a JSON-serializable form.
  • from_jsonable(sample_n) – converts a given batch (list) of JSON data back into a batch of samples for the space.

Every child class that derives from Space exposes these four functions in the same manner, with the same parameters. The Discrete class is a partial exception: it overrides only sample() and contains(x) and relies on the inherited defaults for the other two.
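To make the shared interface concrete, here is a minimal sketch that uses only the standard library. ToySpace and ToyDiscrete are hypothetical stand-ins written for illustration, not Gym's classes; only the four method names mirror the ones listed above.

```python
import json
import random


class ToySpace:
    """Hypothetical stand-in for an abstract space (not Gym's Space)."""

    def sample(self):
        raise NotImplementedError

    def contains(self, x):
        raise NotImplementedError

    def to_jsonable(self, sample_n):
        # Default: assume the samples are already JSON-friendly.
        return list(sample_n)

    def from_jsonable(self, sample_n):
        return list(sample_n)


class ToyDiscrete(ToySpace):
    """Integers 0..n-1; overrides only sample() and contains()."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        return random.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


space = ToyDiscrete(4)
batch = [space.sample() for _ in range(3)]
encoded = json.dumps(space.to_jsonable(batch))      # batch -> JSON text
decoded = space.from_jsonable(json.loads(encoded))  # JSON text -> batch
print(all(space.contains(x) for x in decoded))      # True
```

The subclass inherits the two JSON helpers unchanged, which is the pattern described for Discrete above.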

Additionally, the Space class accepts two parameters, both of which default to None:

  • shape – a tuple describing the dimensions of the space.
  • dtype – the data type of the elements, such as float32 or int32.

Discrete

The Discrete class represents a set of mutually exclusive items, numbered from 0 to \(n - 1\), where \(n\) is the number of items. For example, Discrete(n=4) could signify an action space with four directions to move in, [left, right, up, down], associated with the numbers 0 to 3, respectively. A simple code example:

  import gym
  env = gym.make("FrozenLake-v0")
  print(env.action_space) 
  # > Discrete(4)
  print(env.action_space.sample()) 
  # > 2
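As a hedged illustration of how such indices might be consumed, the sketch below maps a sampled integer to a direction name. The ACTIONS list and sample_action helper are hypothetical and use only the standard library; the ordering follows the [left, right, up, down] example above, not any particular environment's convention.

```python
import random

# Hypothetical label order, taken from the example in the text.
ACTIONS = ["left", "right", "up", "down"]


def sample_action(n):
    """Mimic sampling from a discrete space of size n: an int in [0, n - 1]."""
    return random.randrange(n)


action = sample_action(len(ACTIONS))
print(action, ACTIONS[action])
```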

Box

The Box class represents an \(n\)-dimensional tensor of real numbers, each bounded by an interval [low, high]. For instance, this could be an accelerator pedal with a single value between 0 and 1, presented as Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32), creating a one-dimensional tensor with a single float value.

Another example of Box could represent an Atari screen observation space, which is an RGB (red, green, blue) image of size 210x160: Box(low=0, high=255, shape=(210, 160, 3), dtype=np.uint8). In this instance, shape is a tuple with three elements stating the height, width, and colour channels (red, green, blue), respectively. Overall, every observation consists of a three-dimensional tensor with 100,800 bytes.
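The byte count follows directly from the shape, since each uint8 value occupies one byte. A quick check with the standard library (the variable names are illustrative):

```python
import math

shape = (210, 160, 3)   # height, width, colour channels
bytes_per_value = 1     # a uint8 value is stored in one byte
total_bytes = math.prod(shape) * bytes_per_value
print(total_bytes)      # 100800
```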

These examples use identical bounds for every dimension. Another common use case gives each dimension its own bounds, such as Box(low=np.array([-1.0, -2.0]), high=np.array([2.0, 4.0]), dtype=np.float32), where the shape is inferred from the low and high parameters.

The class has one additional function:

  • is_bounded(manner) – accepts below, above, or both and reports whether the Box is bounded, i.e. whether its limits are finite rather than -inf or +inf. With below, it checks that every low value is greater than -inf; with above, that every high value is less than +inf; with both, that both conditions hold. The function returns True or False accordingly.
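The boundedness check can be sketched over plain lists of floats. The function below is a hypothetical, standard-library re-creation of the idea, not Gym's implementation; its name and the default value of manner are assumptions made for this example.

```python
import math


def is_bounded_sketch(low, high, manner="both"):
    """Report whether the lower and/or upper limits are all finite."""
    below = all(value > -math.inf for value in low)
    above = all(value < math.inf for value in high)
    if manner == "below":
        return below
    if manner == "above":
        return above
    if manner == "both":
        return below and above
    raise ValueError("manner must be 'below', 'above', or 'both'")


low = [0.0, -math.inf]
high = [1.0, 5.0]
print(is_bounded_sketch(low, high, "below"))  # False: one lower limit is -inf
print(is_bounded_sketch(low, high, "above"))  # True: every upper limit is finite
print(is_bounded_sketch(low, high, "both"))   # False
```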

Code example for identical and independent bound boxes:

  import numpy as np
  from gym.spaces import Box

  b1 = Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32) # identical bounds
  b2 = Box(low=np.array([-1.0, -2.0]), high=np.array([2.0, 4.0]), dtype=np.float32) # independent bounds
  print(b1)
  # > Box(0.0, 1.0, (1,), float32)
  print(b2)
  # > Box(-2.0, 4.0, (2,), float32) - low.min(), high.max(), shape, dtype
  print(b2.sample())
  # > [0.58 2.52] - random; values differ on every call
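To show what independent bounds mean for sampling and membership, here is a small standard-library sketch. The helper names are hypothetical, and drawing each value uniformly is an assumption made for illustration; the point is only that each dimension is checked against its own [low, high] pair.

```python
import random

low = [-1.0, -2.0]
high = [2.0, 4.0]


def sample_box(low, high):
    """Draw one value per dimension, each uniform within its own bounds."""
    return [random.uniform(lo, hi) for lo, hi in zip(low, high)]


def box_contains(low, high, x):
    """True if x has the right length and every value is within bounds."""
    return len(x) == len(low) and all(
        lo <= value <= hi for lo, hi, value in zip(low, high, x)
    )


point = sample_box(low, high)
print(box_contains(low, high, point))       # True
print(box_contains(low, high, [3.0, 0.0]))  # False: 3.0 exceeds the first upper bound
```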

Dict

The Dict class is the first of two ways to store multiple Space instances in a single container. Each space is indexed by a key, which makes it easy to organise and look up individual spaces.

For example, a simple Dict space with a position that takes one of two discrete values and a velocity that takes one of three is created as follows:

  Dict({
    "position": Discrete(2), 
    "velocity": Discrete(3)
  })

One powerful feature of the Dict class is that we can nest a Dict inside another Dict to make more comprehensive storage containers for large projects. For example:

  Dict({
    'sensors':  Dict({
      'front_cam': Tuple((
          Box(low=0, high=1, shape=(10, 10, 3)),
          Box(low=0, high=1, shape=(10, 10, 3))
      )),
      'rear_cam': Box(low=0, high=1, shape=(10, 10, 3))
    }),
    'ext_controller': MultiDiscrete((5, 2, 2))
  })
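A sample drawn from a nested container mirrors the container's structure. The sketch below imitates that behaviour with plain dicts, tuples, and zero-argument sampler functions from the standard library. Everything in it is a hypothetical stand-in, and each camera is reduced to a single float for brevity.

```python
import random


def sample_nested(spec):
    """Recursively sample a spec made of dicts, tuples, and callables."""
    if isinstance(spec, dict):
        return {key: sample_nested(sub) for key, sub in spec.items()}
    if isinstance(spec, tuple):
        return tuple(sample_nested(sub) for sub in spec)
    return spec()  # leaf: a zero-argument sampler


def unit_float():
    """Stand-in for a bounded continuous space with limits 0 and 1."""
    return random.random()


def make_discrete(n):
    """Stand-in for a discrete space with n items."""
    return lambda: random.randrange(n)


spec = {
    "sensors": {
        "front_cam": (unit_float, unit_float),
        "rear_cam": unit_float,
    },
    "ext_controller": (make_discrete(5), make_discrete(2), make_discrete(2)),
}

sample = sample_nested(spec)
print(sorted(sample))             # ['ext_controller', 'sensors']
print(sorted(sample["sensors"]))  # ['front_cam', 'rear_cam']
```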

Tuple

The Tuple class is the second way to store several Space instances inside a single container. Its spaces are indexed by position rather than by key, which makes it a more minimal container than Dict.

Imagine we are creating an environment for a car. The car requires a continuous action space for three components: steering wheel controls, brake pedal positions, and accelerator pedal positions. Additionally, the car needs two discrete action spaces: a turn signal (off, right, or left) and a horn (on or off).

These controls can be represented in the form of a Tuple class as follows:

  Tuple(spaces=(
      Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32), 
      Discrete(n=3), 
      Discrete(n=2)
    )
  )
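A sample from such a container is itself a tuple whose elements line up with the spaces by position. The standard-library sketch below imitates that layout; the helper name, the label lists, and the order of the horn labels are assumptions made for illustration.

```python
import random

TURN_SIGNALS = ["off", "right", "left"]  # order from the text
HORN = ["off", "on"]                     # hypothetical order: index 0 = off


def sample_controls():
    """Sample (continuous controls, turn signal index, horn index)."""
    continuous = tuple(random.uniform(-1.0, 1.0) for _ in range(3))
    return (continuous, random.randrange(3), random.randrange(2))


(steering, brake, accelerator), signal, horn = sample_controls()
print(f"steering={steering:+.2f} brake={brake:+.2f} accel={accelerator:+.2f}")
print("signal:", TURN_SIGNALS[signal], "| horn:", HORN[horn])
```

Positional unpacking works here because the order of the elements is fixed by the container definition.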

MultiBinary

The MultiBinary class represents an \(n\)-dimensional binary space containing 0s and 1s. MultiBinary accepts one parameter, \(n\), which is the number of items to add to the binary space. n can be a single number, a list of numbers, or a tuple of numbers.

For example:

  from gym.spaces import MultiBinary

  mb1 = MultiBinary(5)
  mb2 = MultiBinary([3, 2])
  print(mb1.sample())
  # > [0 1 0 1 0]
  print(mb2.sample())
  # > [[0 0]
  #    [0 1]
  #    [1 1]]
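The way n determines the shape of a sample can be sketched with nested lists. The helper below is a hypothetical, standard-library imitation that accepts either a single number or a list/tuple of numbers, as described above.

```python
import random


def sample_multibinary(shape):
    """Return nested lists of random 0/1 values with the given shape."""
    if isinstance(shape, int):
        shape = (shape,)
    if len(shape) == 1:
        return [random.randint(0, 1) for _ in range(shape[0])]
    return [sample_multibinary(tuple(shape[1:])) for _ in range(shape[0])]


flat = sample_multibinary(5)
grid = sample_multibinary([3, 2])
print(len(flat))                # 5
print(len(grid), len(grid[0]))  # 3 2
```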

MultiDiscrete

The MultiDiscrete class represents a series of discrete spaces, each of which can have a different number of elements. It suits devices such as game controllers or keyboards, where every button or key group needs its own discrete action space.

The class accepts two parameters:

  • nvec – a list of numbers describing the number of actions in each discrete space.
  • dtype – the format data type of the discrete spaces, defaults to int64.

For example, we can conceptualize a Nintendo Game Controller that has three discrete action spaces, where NOOP stands for no operation:

  1. Arrow keys (5 actions) – NOOP[0], UP[1], RIGHT[2], DOWN[3], LEFT[4] – min: 0, max: 4.
  2. Button A (2 actions) – NOOP[0], PRESSED[1] – min: 0, max: 1.
  3. Button B (2 actions) – NOOP[0], PRESSED[1] – min: 0, max: 1.

The controller is initialized using:

  MultiDiscrete(nvec=[5, 2, 2], dtype=np.int64)
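A sample from this space is a vector of three integers, one per control. The standard-library sketch below shows how such a vector could be drawn and decoded into the labels listed above; the helper names are hypothetical.

```python
import random

NVEC = [5, 2, 2]
ARROWS = ["NOOP", "UP", "RIGHT", "DOWN", "LEFT"]
BUTTON = ["NOOP", "PRESSED"]


def sample_multidiscrete(nvec):
    """One independent integer per component, each in [0, n - 1]."""
    return [random.randrange(n) for n in nvec]


def decode(action):
    """Translate [arrow, a, b] indices into readable labels."""
    arrow, button_a, button_b = action
    return {"arrows": ARROWS[arrow], "A": BUTTON[button_a], "B": BUTTON[button_b]}


print(decode([2, 1, 0]))  # {'arrows': 'RIGHT', 'A': 'PRESSED', 'B': 'NOOP'}
print(decode(sample_multidiscrete(NVEC)))
```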

Utility Functions

Within the spaces folder, there is a file called utils.py that contains four utility functions:

  • flatdim(space) – returns an integer giving the number of dimensions that the flattened equivalent of the given space would have.
  • flatten(space, x) – converts a data point x from a given space into a one-dimensional array and returns the 1D array. The function is useful for passing data points from a space into a neural network that accepts only flattened arrays.
  • unflatten(space, x) – reverses the transformation applied by flatten(space, x) and returns a data point with a structure that matches the space.
  • flatten_space(space) – converts a given space into a one-dimensional Box, containing the exact number of dimensions specified with flatdim(space).
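The relationship between these functions can be sketched for the simplest case, a Box-like tensor stored as nested lists. The helpers below are hypothetical, standard-library stand-ins that take a shape rather than a space object, so they illustrate the idea and not Gym's actual signatures.

```python
import math


def flatdim_of(shape):
    """Number of values in a flattened array of the given shape."""
    return math.prod(shape)


def flatten_nested(x):
    """Flatten arbitrarily nested lists into one flat list (row-major)."""
    if not isinstance(x, list):
        return [x]
    return [value for item in x for value in flatten_nested(item)]


def unflatten_nested(shape, flat):
    """Rebuild nested lists of the given shape from a flat list."""
    if len(shape) == 1:
        return list(flat)
    step = math.prod(shape[1:])
    return [unflatten_nested(shape[1:], flat[i * step:(i + 1) * step])
            for i in range(shape[0])]


shape = (2, 3)
point = [[1, 2, 3], [4, 5, 6]]
flat = flatten_nested(point)
print(flatdim_of(shape))                       # 6
print(flat)                                    # [1, 2, 3, 4, 5, 6]
print(unflatten_nested(shape, flat) == point)  # True
print(flatdim_of((210, 160, 3)))               # 100800
```

The round trip through flatten and unflatten returns the original data point, which is the property that makes flattened inputs safe to feed to a network and map back afterwards.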

Each of these functions can be used in any RL project by importing the file with the following statement:

  import gym.spaces.utils as gym_utils

A code example:

  import numpy as np
  from gym.spaces import Box

  b = Box(low=0, high=255, shape=(210, 160, 3), dtype=np.uint8)
  b_flat = gym_utils.flatten_space(b) # Box(0, 255, (100800,), uint8)
  b_flat_count = gym_utils.flatdim(b) # 100800

  print(b.sample()[0][:2])
  # > [[18 200 33], [123 139 28]] - first 2 pixels of the first row
  print(b_flat.sample()[:5])
  # > [33 53 252 177 102] - first 5 items