minerl.data¶
The minerl.data package provides a unified interface for sampling data from the MineRL-v0 dataset. Data is accessed by making a dataset from one of the MineRL environments and iterating over it using one of the iterators provided by minerl.data.DataPipeline.
The following is a description of the methods included in the package, along with basic usage examples. For more detailed descriptions and tutorials on how to use the data API, see our getting started manuals.
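In outline, usage is `data = minerl.data.make(environment_name)` followed by iteration with one of the iterators below. Since running that requires the downloaded dataset, the sketch below instead uses a tiny synthetic generator (the name `fake_sarsd_stream` and its contents are illustrative, not part of minerl) to show the (state, action, reward, next_state, done) tuple convention the iterators follow:

```python
from typing import Dict, Iterator, Tuple

# Illustrative stand-in for the tuples the real iterators yield; a real
# pipeline is created with minerl.data.make("MineRLObtainDiamond-v0").
def fake_sarsd_stream(episode_len: int = 5) -> Iterator[Tuple[Dict, Dict, float, Dict, bool]]:
    for t in range(episode_len):
        state = {"pov": f"frame-{t}"}           # observation dict
        action = {"camera": (0.0, 0.0)}         # action dict
        reward = 1.0 if t == episode_len - 1 else 0.0
        next_state = {"pov": f"frame-{t + 1}"}
        done = t == episode_len - 1             # terminal flag on the last step
        yield state, action, reward, next_state, done

total = sum(reward for _, _, reward, _, _ in fake_sarsd_stream())
print(total)  # 1.0
```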
MineRLv0¶
- class minerl.data.DataPipeline(data_directory: os.path, environment: str, num_workers: int, worker_batch_size: int, min_size_to_dequeue: int, random_seed=42)¶
Bases:
object
Creates a data pipeline object used to iterate through the MineRL-v0 dataset
- property action_space¶
- Returns
The action space of the current MineRL environment.
- batch_iter(batch_size: int, seq_len: int, num_epochs: int = -1, preload_buffer_size: int = 2, seed: Optional[int] = None)¶
Returns batches of size batch_size containing sequences of length seq_len. The iterator produces batches sequentially. If an element of a batch reaches the end of its episode, it will be appended with a new episode.
If you wish to obtain metadata of the episodes, consider using load_data instead.
- Parameters
batch_size (int) – The batch size.
seq_len (int) – The size of sequences to produce.
num_epochs (int, optional) – The number of epochs to iterate over the data. Defaults to -1.
preload_buffer_size (int, optional) – Increase to improve performance. The data iterator uses a queue to prevent blocking; the queue size is the number of trajectories to load into the buffer. Adjust based on memory constraints. Defaults to 2.
seed (int, optional) – NOT IMPLEMENTED. Defaults to None.
- Returns
A generator that yields batches of (state, action, reward, next_state, done) tuples
- Return type
Generator
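The refill behavior described above ("appended with a new episode") can be pictured as batch_size parallel cursors over a queue of episodes: each slot emits seq_len steps at a time, and whenever its current episode is exhausted it splices in a fresh one. The helper below is a hypothetical toy model of that rule (not part of minerl), using lists of ints in place of real trajectories:

```python
from typing import Iterator, List, Sequence

def batched_sequences(episodes: Sequence[List[int]],
                      batch_size: int, seq_len: int) -> Iterator[List[List[int]]]:
    """Toy model of batch_iter's refill rule: each of the batch_size
    slots emits seq_len steps at a time, splicing in a fresh episode
    whenever its current one ends."""
    queue = list(episodes)
    slots = [queue.pop(0) for _ in range(batch_size)]  # one episode per slot
    while True:
        batch = []
        for i in range(batch_size):
            while len(slots[i]) < seq_len:   # episode ended mid-sequence
                if not queue:                # no more episodes to splice in;
                    return                   # the partial tail is dropped
                slots[i] = slots[i] + queue.pop(0)  # append a new episode
            batch.append(slots[i][:seq_len])
            slots[i] = slots[i][seq_len:]
        yield batch
```

For example, with episodes `[[1, 2, 3], [4, 5], [6, 7, 8, 9], [10, 11, 12]]`, `batch_size=2`, and `seq_len=2`, the second batch's first sequence `[3, 6]` crosses an episode boundary, mirroring the documented behavior.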
- get_trajectory_names()¶
Gets all the trajectory names
- Returns
A list of trajectory names.
- Return type
list
- load_data(stream_name: str, skip_interval=0, include_metadata=False, include_monitor_data=False)¶
Iterates over an individual trajectory named stream_name.
- Parameters
stream_name (str) – The stream name desired to be iterated through.
skip_interval (int, optional) – How many slices should be skipped. Defaults to 0.
include_metadata (bool, optional) – Whether or not metadata about the loaded trajectory should be included. Defaults to False.
include_monitor_data (bool, optional) – Whether to include all of the monitor data from the environment. Defaults to False.
- Yields
A tuple of (state, player_action, reward_from_action, next_state, is_next_state_terminal). These tuples are yielded in episode order.
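As a sketch of load_data's yield contract, the hypothetical helper below (not part of minerl) iterates one synthetic trajectory in episode order, with the terminal flag set only on the last step, and optionally extends each tuple with a trailing metadata dict as include_metadata does:

```python
from typing import Dict, Iterator, List, Optional, Tuple

def iterate_trajectory(steps: List[Tuple[Dict, Dict, float]],
                       include_metadata: bool = False,
                       metadata: Optional[Dict] = None) -> Iterator[tuple]:
    """Toy model of load_data's yield contract: SARSD tuples in episode
    order, optionally extended with the trajectory's metadata dict."""
    n = len(steps)
    for t, (obs, act, rew) in enumerate(steps):
        nxt = steps[t + 1][0] if t + 1 < n else obs  # last step repeats obs
        done = t == n - 1                            # terminal only at the end
        tup = (obs, act, rew, nxt, done)
        yield tup + (metadata,) if include_metadata else tup
```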
- property observation_space¶
- Returns
The observation space of the current MineRL environment.
- static read_frame(cap)¶
- sarsd_iter(num_epochs=-1, max_sequence_len=32, queue_size=None, seed=None, include_metadata=False)¶
Returns a generator for iterating through (state, action, reward, next_state, is_terminal) tuples in the dataset. Loads num_workers files at once, as defined in minerl.data.make(), and returns up to max_sequence_len consecutive samples wrapped in a dict observation space.
- Parameters
num_epochs (int, optional) – number of epochs to iterate over or -1 to loop forever. Defaults to -1
max_sequence_len (int, optional) – maximum number of consecutive samples - may be less. Defaults to 32
seed (int, optional) – seed for the random directory walk. Note that specifying a seed together with a finite num_epochs will cause the ordering of examples to be the same after every call to sarsd_iter
queue_size (int, optional) – maximum number of elements to buffer at a time, each worker may hold an additional item while waiting to enqueue. Defaults to 16*self.number_of_workers or 2* self.number_of_workers if max_sequence_len == -1
include_metadata (bool, optional) – adds an additional member to the tuple containing metadata about the stream the data was loaded from. Defaults to False
- Yields
A tuple of (state, player_action, reward_from_action, next_state, is_next_state_terminal, (metadata)). Each element is in the format of the environment action/state/reward space and contains as many samples are requested.
- seq_iter(num_epochs=-1, max_sequence_len=32, queue_size=None, seed=None, include_metadata=False)¶
DEPRECATED method for sampling data from the MineRL dataset. This function has been replaced by
DataPipeline.batch_iter()
- property spec: minerl.herobraine.env_spec.EnvSpec¶
- minerl.data.download(directory=None, resolution='low', texture_pack=0, update_environment_variables=True, disable_cache=False, environment=None, competition=None)¶
Downloads MineRLv0 to specified directory. If directory is None, attempts to download to $MINERL_DATA_ROOT. Raises ValueError if both are undefined.
- Parameters
directory (os.path) – destination root for downloading MineRLv0 datasets
resolution (str, optional) – one of [ ‘low’, ‘high’ ] corresponding to video resolutions of [ 64x64,1024x1024 ] respectively (note: high resolution is not currently supported). Defaults to ‘low’.
texture_pack (int, optional) – 0: default Minecraft texture pack, 1: flat semi-realistic texture pack. Defaults to 0.
update_environment_variables (bool, optional) – enables / disables exporting of MINERL_DATA_ROOT environment variable (note: for some os this is only for the current shell) Defaults to True.
disable_cache (bool, optional) – downloads temporary files to local directory. Defaults to False
environment (str, optional) – specify the desired MineRL environment to download; only data for this environment will be downloaded. Note that there is no hash verification for individual environments
competition (str, optional) – One of [‘diamond’, ‘basalt’].
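The destination rule above (explicit directory, else $MINERL_DATA_ROOT, else ValueError) can be sketched as follows. The function name `resolve_data_root` is a hypothetical illustration, not part of the minerl API:

```python
import os
from typing import Optional

def resolve_data_root(directory: Optional[str] = None) -> str:
    """Toy model of download()'s destination rule: an explicit directory
    wins, else $MINERL_DATA_ROOT, else a ValueError is raised."""
    if directory is not None:
        return directory
    env_root = os.environ.get("MINERL_DATA_ROOT")
    if env_root:
        return env_root
    raise ValueError("directory is None and MINERL_DATA_ROOT is not set")
```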
- minerl.data.make(environment=None, data_dir=None, num_workers=4, worker_batch_size=32, minimum_size_to_dequeue=32, force_download=False)¶
Initializes the data loader with the chosen environment
- Parameters
environment (string) – desired MineRL environment
data_dir (string, optional) – specify alternative dataset location. Defaults to None.
num_workers (int, optional) – number of files to load at once. Defaults to 4.
force_download (bool, optional) – specifies whether or not the data should be downloaded if missing. Defaults to False.
- Returns
An initialized data pipeline.
- Return type
DataPipeline