AI plays hide-and-seek, finds new strategies along the way

By Natasha Doyle
Thursday, 19 September, 2019

They say there’s a fine line between madness and genius, and if I had to do something 380 million times I’d probably go mad. But after 380 million games of hide-and-seek, this artificial intelligence (AI) did something ‘genius’: unexpected, and not even thought possible. It learned to surf.

On 17 September, OpenAI published a blog post about a recent experiment, pitting its AI against other versions of itself in a small arena to see what hide-and-seek strategies they would learn.

The technique is called ‘reinforcement learning’ and it’s just like training a puppy. In the beginning, the AI takes random actions. When it completes the desired action, it gets a reward. With a lot of trial, error and treats, the AI can learn some pretty sophisticated tricks.
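
For readers who like to see the puppy training in code, here is a minimal sketch of that trial, error and treats loop. It is a toy illustration only and not OpenAI's method: the function name train, the four "tricks", the choice of trick 2 as the rewarded one and all of the parameter values are assumptions made up for this example. The learner mostly repeats whichever trick has earned the most treats so far, and occasionally tries one at random.

```python
import random


def train(num_actions=4, rewarded_action=2, episodes=2000,
          epsilon=0.1, step=0.1, seed=0):
    """Toy trial-and-error learner (hypothetical, for illustration only)."""
    rng = random.Random(seed)
    # Running estimate of how many treats each trick tends to earn.
    values = [0.0] * num_actions
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: try a trick at random.
            action = rng.randrange(num_actions)
        else:
            # Exploit: repeat the trick that looks best so far.
            action = max(range(num_actions), key=lambda a: values[a])
        # The treat: 1 for the desired trick, 0 for anything else.
        reward = 1.0 if action == rewarded_action else 0.0
        # Nudge the estimate for the chosen trick towards the reward received.
        values[action] += step * (reward - values[action])
    return values


values = train()
best_action = values.index(max(values))
print("Learned favourite trick:", best_action)
```

The learner is never told which trick earns the treat. It stumbles on it through random exploration, and from then on that trick's estimate keeps rising while the others stay at zero.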

In a simple world with only a few hiders and seekers and some walls, boxes and ramps, the researchers expected four strategies to emerge in turn. First, the players learn to run around and chase each other. Next, the hiders learn to block doorways and build forts with boxes, locking them in place so the seekers can’t move them to get in. The seekers then discover they can push ramps and use them to jump into the forts. Finally, the hiders learn to hoard the ramps in their forts so the seekers can’t use them.

But then genius (or madness) struck. The seekers learned they could jump on unlocked boxes, ‘surf’ over to the forts and jump in that way. According to OpenAI, the game’s physics engine allowed this because the agents move by applying forces to themselves, although the researchers didn’t know beforehand that surfing was possible. The hiders didn’t stand for this, though, and it only took them 70 million rounds to start locking down all the boxes to prevent surfing.

The AI wasn’t totally immune to madness, even if you don’t count the surfing as such. In cases where there was no ‘discipline’ or penalty for leaving the play area, players would run endlessly, and sometimes they’d even take a box with them (kind of like when your dog bolts out the front door with a sock, despite having an ‘endless’ supply of food, warmth and toys).

Hiders also learned that they could abuse contact physics and push ramps out of the play area (okay, maybe that’s more ‘genius’), but perhaps the craziest thing the seekers learned was that “if they run at a wall with a ramp at a right angle, they can launch themselves upward”, according to OpenAI.

While this kind of improvisation is all well and good in a controlled play environment, an IEEE Spectrum blog post suggested it could spell trouble in the real world. The post references a 2014 interview with Nick Bostrom, who discussed how an AI dedicated to making as many paper clips as possible in a paper clip factory might react to people, considering them either obstructions to its work or, potentially, a source of atoms that could be used to make “some very nice paper clips”.

IEEE Spectrum also quoted OpenAI research team member Bowen Baker, who noted that the unexpected behaviours could be a safety problem when AI is put into more complex environments.

Despite this, Baker told IEEE Spectrum that there’s a “hopeful takeaway” from the surprises given by the hide-and-seek experiment, citing the AI’s potential to solve problems people may not be able to imagine solutions to.

