We are holding a competition on sample-efficient reinforcement learning using human priors. Standard methods require months to years of game time to attain human performance in complex games such as Go and StarCraft. In our competition, participants develop a system to obtain a diamond in Minecraft using only four days of training time.
The MineRL Diamond competition offers a set of Gym environments paired with human demonstrations, giving participants the tools to tackle this difficult Minecraft task sample-efficiently. This year we have two tracks:
- Research Track - Continues the challenge from last year: the action and observation spaces are vectorized and obfuscated to prevent participants from using domain knowledge to solve the ObtainDiamond task.
- Intro Track - Removes the obfuscation and allows for any creative solution to solving the task, whether entirely scripted, entirely learned, or a hybrid approach.
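One popular baseline for the obfuscated Research Track is to turn the continuous obfuscated action space (a 64-dimensional vector in past MineRL competitions) into a small discrete set drawn from the human demonstrations, then pick actions by nearest neighbor. A minimal stdlib-only sketch, with random vectors standing in for demonstration data (a real baseline would cluster the demo actions, e.g. with k-means):

```python
import math
import random

def nearest(vec, centers):
    """Return the index of the center closest to vec (Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: math.dist(vec, centers[i]))

# Hypothetical stand-in for 64-dim obfuscated action vectors extracted
# from the human demonstration dataset.
random.seed(0)
demo_actions = [[random.gauss(0, 1) for _ in range(64)] for _ in range(200)]

# Choose k representative actions as the agent's discrete action set.
# (Sampling keeps this stdlib-only; k-means centroids work better.)
k = 8
centers = random.sample(demo_actions, k)

# The agent now learns a policy over k discrete choices; each choice maps
# back to a full obfuscated vector that the environment accepts.
choice = nearest(demo_actions[0], centers)
discrete_to_env_action = centers[choice]
```

The appeal of this trick is that the agent never has to reason about the obfuscated space directly: every action it can take is one a human demonstrator actually took.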
Sample snippets of the dataset.
- Participants train their agents to play Minecraft. During the round, they submit trained models for evaluation to determine leaderboard ranks.
- At the end of the round, participants submit source code. The models at the top of the leaderboard are re-trained (from scratch) for four days to compute the final score used for ranking.
- 20 participants move on to the second round: 15 from the Research Track and 5 from the Intro Track.
- Participants may submit code up to four times. Each submission is trained for four days to compute its score; final ranking is based on each participant's best submission.
- The top participants will present their work at a workshop at NeurIPS 2021.
The Task: Obtain Diamond in Minecraft
Minecraft is a 3D, first-person, open-world game centered around the gathering of resources and creation of structures and items. These structures and items have prerequisite tools and materials required for their creation. As a result, many items require the completion of a series of natural subtasks.
The procedurally generated world is composed of discrete blocks that allow modification. Over the course of gameplay, players change their surroundings by gathering resources and constructing structures.
In this competition, the goal is to obtain a diamond. The agent begins in a random starting location with an empty inventory, and receives rewards for obtaining items that are prerequisites for the diamond.
The stages of obtaining a diamond.
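The reward structure behind these stages can be sketched as a milestone table: the agent is rewarded once for each prerequisite item the first time it is obtained, with values roughly doubling at each stage. The values below are the ones documented for the MineRLObtainDiamond environment (treat the exact numbers as an assumption here):

```python
# Milestone rewards in the MineRLObtainDiamond environment; each item
# rewards the agent the first time it is obtained.
# (Exact values are an assumption; check the MineRL docs.)
MILESTONE_REWARDS = {
    "log": 1,
    "planks": 2,
    "stick": 4,
    "crafting_table": 4,
    "wooden_pickaxe": 8,
    "cobblestone": 16,
    "furnace": 32,
    "stone_pickaxe": 32,
    "iron_ore": 64,
    "iron_ingot": 128,
    "iron_pickaxe": 256,
    "diamond": 1024,
}

def episode_score(items_obtained):
    """Score an episode: sum the reward of each distinct milestone reached."""
    return sum(MILESTONE_REWARDS[item] for item in set(items_obtained))

# An agent that stalls after mining stone scores far below one that
# progresses all the way to the diamond.
early = episode_score(["log", "planks", "stick", "crafting_table",
                       "wooden_pickaxe", "cobblestone"])
full = episode_score(MILESTONE_REWARDS)  # every milestone reached
```

The exponentially increasing values reward progress through the subtask chain, so even agents that never reach a diamond receive a useful learning signal from the earlier milestones.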
Top-ranking teams in round 2 will receive rewards from our sponsors. Details will be announced as we finalize agreements.
The organizing team consists of:
- William H. Guss (OpenAI and Carnegie Mellon University)
- Alara Dirik (Boğaziçi University)
- Byron V. Galbraith (Talla)
- Brandon Houghton (OpenAI)
- Anssi Kanervisto (University of Eastern Finland)
- Noboru Sean Kuno (Microsoft Research)
- Stephanie Milani (Carnegie Mellon University)
- Sharada Mohanty (AIcrowd)
- Karolis Ramanauskas
- Ruslan Salakhutdinov (Carnegie Mellon University)
- Rohin Shah (UC Berkeley)
- Nicholay Topin (Carnegie Mellon University)
- Steven H. Wang (UC Berkeley)
- Cody Wild (UC Berkeley)
The advisory committee consists of:
- Manuela Veloso (Carnegie Mellon University and JPMorgan Chase)
- Oriol Vinyals (DeepMind)
- More TBA
If you have any questions, please feel free to contact us.