Hello World: Your First Agent¶
With the minerl
package installed on your system you can
now make your first agent in Minecraft!
To get started, let’s first import the necessary packages
import gym
import minerl
Creating an environment¶
Now we can choose any one of the many environments included
in the minerl
package. To learn more about the environments
checkout the environment documentation.
For this tutorial we’ll choose the MineRLNavigateDense-v0
environment. In this task, the agent is challenged with using
a first-person perspective of a random Minecraft map and
navigating to a target.
To create the environment, simply invoke gym.make
env = gym.make('MineRLNavigateDense-v0')
Caution
Currently minerl
only supports environment rendering in headed environments
(servers with monitors attached).
In order to run minerl
environments without a head use a software renderer
such as xvfb
:
xvfb-run python3 <your_script.py>
Note
The first time you run this command to complete, it will take a while as it is recompiling Minecraft with the MineRL simulator mod!
If you’re worried and want to make sure something is happening behind the scenes install a logger before you create the envrionment.
import logging
logging.basicConfig(level=logging.DEBUG)
env = gym.make('MineRLNavigateDense-v0')
Taking actions¶
As a warm up let’s create a random agent. 🧠
Now we can reset this environment to its first position and get our first observation from the agent by resetting the environment.
obs = env.reset()
The obs
variable will be a dictionary containing the following
observations returned by the environment. In the case of the
MineRLNavigate-v0
environment, three observations are returned:
pov
, an RGB image of the agent’s first person perspective;
compassAngle
, a float giving the angle of the agent to its
(approximate) target; and inventory
, a dictionary containing
the amount of 'dirt'
blocks in the agent’s inventory (this
is useful for climbing steep inclines).
{
'pov': array([[[ 63, 63, 68],
[ 63, 63, 68],
[ 63, 63, 68],
...,
[ 92, 92, 100],
[ 92, 92, 100],
[ 92, 92, 100]],,
...,
[[ 95, 118, 176],
[ 95, 119, 177],
[ 96, 119, 178],
...,
[ 93, 116, 172],
[ 93, 115, 171],
[ 92, 115, 170]]], dtype=uint8),
'compassAngle': -63.48639,
'inventory': {'dirt': 0}
}
Note
To see the exact format of observations returned from
and the exact action format expected by env.step
for any environment refer to the environment reference documentation!
Now let’s take actions through the environment until time runs out
or the agent dies. To do this, we will use the normal OpenAI Gym env.step
method.
done = False
while not done:
action = env.action_space.sample()
obs, reward, done, _ = env.step(action)
After running this code you should see your agent move sporadically until the
done
flag is set to true. To confirm that our agent is at least qualitatively
acting randomly, on the right is a plot of the compass angle over the course of the experiment.
No-op actions and a better policy¶
Now let’s make a hard-coded agent that actually runs towards the target. 🧠🧠🧠
To do this at every step of the environment we will take the noop
action with a few modifications; in particular, we will only move forward,
jump, attack, and change the agent’s direction to minimize
the angle between the agent’s movement direction and it’s target, compassAngle
.
import minerl
import gym
env = gym.make('MineRLNavigateDense-v0')
obs = env.reset()
done = False
net_reward = 0
while not done:
action = env.action_space.noop()
action['camera'] = [0, 0.03*obs["compassAngle"]]
action['back'] = 0
action['forward'] = 1
action['jump'] = 1
action['attack'] = 1
obs, reward, done, info = env.step(
action)
net_reward += reward
print("Total reward: ", net_reward)
After running this agent, you should notice marekedly less sporadic
behaviour. Plotting both the compassAngle
and the
net reward over the episode confirm that this policy performs
better than our random policy.
Congratulations! You’ve just made your first agent using the
minerl
framework!