Overview

To communicate with the game simulator and make it easy to apply various reinforcement learning (RL) methods, we have set up an OpenAI Gym environment for the game. The main entry points of this implementation are the step and reset methods.

  • step simulates performing a tap on a board piece and returns the observation, reward, done flag and info dictionary.
  • reset starts the current level and returns the first observation of the level. Additional information and methods are detailed in the Environment description section below.

This environment is provided to help you get started quickly, but you are free to modify it (such as the reward function or observation space) for this competition in any way you want.

Environment description

Action space

Since the game board has 9 rows and 13 columns, there are 9 * 13 = 117 possible actions. The actions are mapped onto the game board as follows:

[Figure: Action space overview]
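
As an illustration, converting between a flat action id and a board coordinate could look like the sketch below. The row-major ordering here is an assumption; verify the actual ordering against the figure above.

N_ROWS, N_COLS = 9, 13

def action_to_coord(action):
    # Assumed row-major mapping: action 0 -> (0, 0), action 13 -> (1, 0), ...
    row, col = divmod(action, N_COLS)
    return row, col

def coord_to_action(row, col):
    # Inverse of the mapping above
    return row * N_COLS + col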

Observation space

Board representation

Since there are about 40 distinct game board pieces, plus colour permutations, we describe the game board using attributes of the pieces in each position. For this, we have chosen the m=24 mechanics listed below, leading to a 3D observation space with shape (9 x 13 x m). The hitpoints of a board piece are used as the value in the respective layer. If multiple pieces are stacked on top of each other, the hitpoints in the respective cells are simply added together.

| Channel | Name | Description |
| --- | --- | --- |
| 0 | Cell | Activated parts of the game board where board pieces can fill up the space. |
| 1-6 | Colours | One layer for each of the 6 colors. All colors are mechanically the same. |
| 7-8 | Clickable true/false | Whether a board piece may be clickable. Clickable board pieces are basic tiles and power pieces, but being clickable does NOT imply that tapping the piece is a valid move. All other board pieces, such as rocks and other kinds of blockers, are non-clickable. |
| 9 | Collectgoal | Remaining collectgoal at the specific coordinate. |
| 10 | Basic piece | Standard board piece. |
| 11 | Bomb | Bomb power piece. |
| 12 | Flask | Flask power piece. |
| 13-14 | Rocket horizontal/vertical | Rocket power piece, one layer per direction. |
| 15 | Gravity | Whether a board piece has gravity. If so, the board piece falls down when the cell below is empty; otherwise it stays in place. |
| 16 | Spreadable | Typically grass pieces, which can spread if no spreadable piece was attacked in the previous step. |
| 17 | Healable | Regenerates 1 HP if not attacked, so two or more consecutive attacks are needed to clear it. |
| 18 | Spawner | Spawns a board piece (typically a collectgoal) when attacked. |
| 19 | Colorable | Takes on the color of the piece it is attacked with, or a random color if attacked with a power piece. |
| 20-23 | Hittable by piece / neighbor / power / cluster | How a board piece can be attacked. Piece: tap on a single piece. Neighbor: matching adjacent pieces. Power: hittable by power pieces. Cluster: tap on 2+ pieces together. |
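
For reference, a minimal sketch of how this observation space could be declared with gym.spaces. The upper bound of 255 is an assumption, since the table does not bound hitpoints:

import numpy as np
from gym import spaces

# One 9 x 13 plane per channel; values are (summed) hitpoints, so
# non-negative. The high value of 255 is an assumed upper bound.
observation_space = spaces.Box(low=0, high=255, shape=(9, 13, 24), dtype=np.float32)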

As an example, a basic red piece with 1 hitpoint in index position (0, 0) [lower left corner] that is also a collectgoal will be represented by:

obs[0, 0, 1] = 1   # colour channel 1 (red in this example)
obs[0, 0, 7] = 1   # clickable = true
obs[0, 0, 9] = 1   # collectgoal remaining at this cell
obs[0, 0, 15] = 1  # has gravity
obs[0, 0, 22] = 1  # hittable by power pieces
obs[0, 0, 23] = 1  # hittable by cluster

while a rock with no gravity and 2 hitpoints is represented as:

obs[0, 0, 8] = 2   # clickable = false (value is the 2 hitpoints)
obs[0, 0, 21] = 2  # hittable by neighboring matches
obs[0, 0, 22] = 2  # hittable by power pieces

Game state

In addition to the 3D representation of the game board, the simulator output also provides the following game-state information.

| Name | Type | Description |
| --- | --- | --- |
| boardSize | tuple<int> | Board size in the simulator, used to verify the size. |
| normList | list<int> | Suggested normalisation values for each channel. |
| collectGoalRemaining | int | Number of collectgoals remaining. Does not differentiate between types. |
| collectGoalGoal | int | Number of collectgoals required to complete the level. Only valid in the 0th step (reset). |
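
A typical use of normList is to scale each channel before feeding the observation to a network. A minimal sketch, assuming norm_list is the normList entry from the game state and lines up with the channel order above:

import numpy as np

norm = np.asarray(norm_list, dtype=np.float32)  # `norm_list` = normList from the game state
norm[norm == 0] = 1.0                           # guard against division by zero
obs_normalised = obs / norm                     # broadcasts over the 9 x 13 board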

Info dictionary

When taking a step, additional information is provided in the info dictionary:

| Name | Type | Description |
| --- | --- | --- |
| valid_steps | int | Number of valid steps taken in the environment. |
| total_steps | int | Total number of steps taken in the environment. |
| successful_click | bool | Whether the previous action was valid or not. |
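
These entries can, for example, be used to spot invalid taps during a rollout; a short sketch:

obs, reward, done, info = env.step(action)
if not info["successful_click"]:
    # The tap was not a valid move, so valid_steps did not increase
    print("invalid tap:", info["valid_steps"], "valid of", info["total_steps"], "total steps")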

Reward

The current reward function gives the following rewards per step:

  • +1 per collectgoal
  • -0.1 per step
  • -0.5 if invalid step
  • +5 for winning level
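
Put together, the per-step reward corresponds to the following sketch (a hypothetical helper for illustration, not the actual implementation inside the environment):

def compute_reward(goals_collected, valid_step, level_won):
    reward = 1.0 * goals_collected  # +1 per collectgoal
    reward -= 0.1                   # -0.1 per step
    if not valid_step:
        reward -= 0.5               # -0.5 if invalid step
    if level_won:
        reward += 5.0               # +5 for winning the level
    return reward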

Example gameplay

The GitHub repository contains an example of a random agent playing the game:

import gym
from game.simulator import Simulator

# Loading in simulator and environment
sim = Simulator(host="http://localhost:8090")
env = gym.make('lg-competition-v0', level=1, simulator=sim)

# To start a new playthrough, reset the environment first to get the initial observation
obs = env.reset(seed=42)

# Taking a random step in env:
action = env.action_space.sample()  # OpenAI Gym built-in random sampling

obs, reward, done, info_dict = env.step(action)

# Continuing until the level is complete, accumulating the episode return:
rewards = reward
while not done:
    action = env.action_space.sample()  # OpenAI Gym built-in random sampling
    obs, reward, done, info_dict = env.step(action)
    rewards += reward
...

Render

A visualisation method has not been implemented yet.
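
Until one is added, a quick way to inspect the board is to plot individual observation channels yourself. A minimal sketch using matplotlib (an extra dependency, not part of the environment), assuming obs is the array returned by reset or step:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, (channel, name) in zip(axes, [(0, "Cell"), (7, "Clickable"), (9, "Collectgoal")]):
    ax.imshow(obs[:, :, channel], origin="lower")  # (0, 0) is the lower-left corner
    ax.set_title(name)
plt.show()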