How does modern AI work? – Math for my mom

This is part of a series of posts aimed at helping my mom, who is not a scientist, understand what I’m up to as a mathematician.

Lately, Artificial Intelligence (AI) has reached some remarkable milestones. There are computers that are better than humans at the strategy board game Go and at poker. Computers can turn pictures into short moving clips and can “enhance” blurry pictures as in television crime shows. They can also produce new music in the style of Bach or customized to your tastes. It’s all very exciting, and it feels pretty surreal; remember back when Skype video calling felt like the future?

I’m going to give you a broad overview of how these types of AI work, and how they learn. There won’t be any equations or algebra.

Let’s play

Before we jump into the computer stuff, let’s make our very first AI. Well, this will be more “I” than “AI”, because I want you to play a game. You are going to be the “AI” that’s going to learn a task!

I want you to play Zrist for about 5 minutes (or longer if you like it). It’s a fun little platform game. See how far you can get. My best score was 37 400. We’ll use this experience to help describe how AI works. Okay, go play now!

Welcome back! I hope you had fun playing that game.

I want you to think about these questions, and give an answer to each of them. (It’s not a test, there are no wrong answers.)

1. What was the goal of the game?
2. How did you know you were doing well at the game?
3. How did you adapt to the rule changes? Did you get them on the first try?
4. How did you make decisions about what to do next? (What did you look for, and what did you ignore?)

Mar I/O plays Mario

We’ll come back to your answers in a moment. For now, I want you to watch a bit of a video of an AI (called Mar I/O) learning to play the original 1985 Super Mario Brothers. Watch maybe the first 4 or 5 minutes, and then skip to the middle of the video. You only need to watch a little bit to get the sense of what’s going on.

(If you like this, you can watch a livestream of Mar I/O’s attempts to beat the game level by level.)

First of all, this program starts off only knowing a couple of things:

1. It can see a simplified version of the screen (that’s what appears in the top left of the video).
2. It can press any buttons on a normal controller.
3. It knows that it wants to increase its “fitness” score, which goes up the further Mario gets in the level, but it doesn’t know why its fitness score increases.

Here are some things it doesn’t know:

1. It doesn’t know the rules of the game or what the buttons on the controller do.
2. It doesn’t know that it controls Mario.
3. It doesn’t know that touching an enemy will kill Mario.

If you’re interested, Mar I/O is a Recurrent Neural Net. There are other types of AI, but this is the one we’ll look at today.

“See how many envelopes you can lick in an hour, then try to break that record!”

So at first it tries random stuff to increase its fitness score: jumping, standing still, ducking, running left. None of these seem to increase its fitness. Then, when it presses right, Mario starts progressing in the level and its fitness score goes up.

This is called training the AI. It measures its progress against a fitness score, and it reinforces behaviour that increases that score. That is, it starts to favour pressing right because that seems to increase its fitness score.

This works great until it gets to the first enemy and Mario runs right into it and dies. After a couple more tries, it starts to experiment some more (just like it was trying random things at the beginning of the level). Around the 2:20 mark of the video, Mar I/O presses the jump button right before the enemy and successfully clears it, allowing Mario to move further right and increase its fitness score.

To recap:

1. The AI tries random things, until something increases its “fitness score”.
2. When it does something that increases its fitness score, it tries to do more of that in the future. (Training and learning.)
3. It keeps repeating this procedure over and over. Training takes time!
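
For the curious, here is a tiny Python sketch of that recap. It is not Mar I/O’s actual code: the toy level, the enemy squares and all the names (ENEMIES, GOAL, fitness, train) are made up for illustration, and the learning rule is the simplest one I could think of (change one button press at random, and keep the change if the fitness score doesn’t get worse).

```python
import random

ACTIONS = ["left", "right", "jump", "wait"]
ENEMIES = {5, 11}  # hypothetical toy level: enemies stand on these squares
GOAL = 15          # hypothetical finish line

def fitness(plan):
    """Play one attempt and return how far right the player got."""
    position = 0
    furthest = 0
    for action in plan:
        if action == "right":
            position += 1
        elif action == "jump":
            position += 2  # a jump skips over one square
        elif action == "left":
            position = max(0, position - 1)
        # "wait" leaves the position unchanged
        if position in ENEMIES:
            break  # touched an enemy: the attempt ends here
        furthest = max(furthest, position)
        if position >= GOAL:
            break  # reached the end of the level
    return min(furthest, GOAL)

def train(attempts=2000, plan_length=20, seed=0):
    """Try random changes and keep the ones that don't lower the fitness."""
    rng = random.Random(seed)
    plan = [rng.choice(ACTIONS) for _ in range(plan_length)]
    score = fitness(plan)
    for _ in range(attempts):
        candidate = list(plan)
        candidate[rng.randrange(plan_length)] = rng.choice(ACTIONS)
        candidate_score = fitness(candidate)
        if candidate_score >= score:
            plan, score = candidate, candidate_score
    return plan, score

if __name__ == "__main__":
    plan, score = train()
    print("best fitness:", score)
    print("button presses:", plan)
```

Notice that the program is never told what an enemy is. It only ever sees the number that fitness hands back, which is the same position Mar I/O is in.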

Back to Zrist

Let’s go back to the platform game you played and look at how you learned to play the game.

What was the goal of the game?
How did you know you were doing well at the game?

I asked you to get as far in the level as you could; that was your goal. The game kept track of it by telling you your current high score. That was your fitness score!

How did you adapt to the rule changes? Did you get them on the first try?

If you’re anything like me, when the rules changed for the first time you thought, “Oh crap, what’s this?”, and then promptly died when the screen said “Mode: lag”. What were you supposed to do?! No one told you what to do!

When my character turned invisible, the screen stopped scrolling and I wasn’t sure what to do. At that point I just pressed buttons until it started to scroll again. In other words, I tried random things when I got stuck. As I continued to get stuck and unstuck, I recognized that I was getting stuck at the short walls, and that jumping over them saved me. Trying the same trick saved me again when I was invisible. In other words, I was training on the short walls.

This is very similar to how Mar I/O trains and learns.

For comparison, here’s a video of one of the best Mario players in the world, CarlSagan42, taking 18 hours to beat an extremely difficult fan-made level. (Warning: there are a bunch of swear words.)

Notice a couple things:

1. When faced with a new obstacle, he often just “tries something”. Maybe it works, maybe it doesn’t; either way he gets information.
2. The more he plays a section the easier it gets; he’s training on the earlier sections more.
3. The sections near the end are harder to get to, sometimes taking 10+ minutes just to reach that section again. That means those sections don’t get trained on very much.

Mar I/O has all of these in common with him.

Neurons and memory; an artificial brain

How did you make decisions about what to do next? (What did you look for, and what did you ignore?)

In Zrist, you were probably looking for gaps (to jump over), those horrible red death blocks, and big walls to slide under. For each of these you developed a reaction: “When I see a gap, then I press C (to jump over it)”.

For each of these you had to remember a task: If I see a gap, then I jump over it.

For AI like Mar I/O, it stores these tasks by associating visual cues and inputs with button presses. For example, when it sees a wide open space it learns to press the right button. When it sees a gap in the ground it learns to press the A button (to jump).

Now Mar I/O doesn’t have any extra code which tells it “this is what a pit looks like” or “this is what a pipe is” or anything like that. (Although it can see enemies as black tiles, it doesn’t know what an enemy is.)

Each time it succeeds at increasing its fitness score it strengthens the connections between the visual cues and the sequence of button presses that got it there. Each connection like this is stored in the AI as an “artificial neuron”. So when you were playing Zrist, you probably developed a neuron relating to gaps (“If gap, then jump”), one for tall walls (“If tall wall, then slide”), and many others.
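
If you’d like to see the “strengthening connections” idea in code, here is a deliberately oversimplified Python sketch. Everything in it is hypothetical (the CueMemory class, the cue names, the button names), and a real neural net stores its connections as numbers between neurons rather than in a lookup table like this one. The only point is to show that “If gap, then jump” can be stored as a connection that gets stronger each time it pays off.

```python
from collections import defaultdict

class CueMemory:
    """Toy memory linking what is seen (a cue) to what is pressed (a button)."""

    def __init__(self):
        # (cue, button) -> strength of that connection; starts at 0.0
        self.strength = defaultdict(float)

    def reinforce(self, cue, button, reward=1.0):
        """Strengthen a connection after it raised the fitness score."""
        self.strength[(cue, button)] += reward

    def choose(self, cue, buttons):
        """Pick the button with the strongest connection to this cue.

        On a tie (for example a cue never seen before), the first
        button in the list is returned.
        """
        return max(buttons, key=lambda button: self.strength[(cue, button)])

if __name__ == "__main__":
    memory = CueMemory()
    memory.reinforce("gap", "jump")
    memory.reinforce("gap", "jump")
    memory.reinforce("gap", "right")
    print(memory.choose("gap", ["right", "jump"]))
```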

“Features don’t have any deep meaning. They’re just stupid drawings that give you a cheap laugh”

The very cool thing about modern AI is that you typically don’t need to tell it what or how many artificial neurons to make ahead of time; Mar I/O adds neurons as it learns. It’s just like how you didn’t need to know how many types of obstacles you would face in Zrist; you built up a list as you went. This is very powerful!

The flip side to this is that after Mar I/O learns to beat a level, we humans will have a hard time understanding what it’s using to make its decisions. It won’t always be clear to us what visual elements (called “features”) it’s using to make its decisions.

Human learning

Hopefully you see some of the parallels between the way AIs learn things and the way humans learn things. There are a lot of similarities. Mimicking human learning has been very useful for creating AIs.

I’m going to point out a couple other ways that humans learn that help illustrate ways in which AI can learn.

1. Muscle memory.
2. Learning from others.

Have you ever driven somewhere familiar and then forgotten how you got there? You were on autopilot. Similarly, have you ever been doing something with your hands, like playing the piano, only to find that when you stop to think about what you’re actually doing, the task suddenly becomes much harder? This sort of muscle memory is very similar to what Mar I/O is doing. It learns sequences of moves and button presses, but there is no underlying reasoning.

I skipped over a big part of Mar I/O’s learning, which is that it actually contains many different “styles” of players (called species); it’s not just a single Mario learning. After each species completes about 10 attempts at beating the level, we rank the species by which achieved the highest fitness. We then delete the bottom 10% of the species and replace them by blending some of the best species (in a process called breeding). This ensures that if one of the mediocre species discovers something useful (like the fact that shooting fireballs can kill enemies) it still has a chance to give that idea to the best performers. Similarly, the best performers get to share their ideas with the mediocre performers.

One round of this process is called a generation. For easy levels, Mar I/O only needed 40 or so generations. For difficult levels, Mar I/O needed over 250 generations! It can take a long time for these random mutations to produce helpful effects.
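
Here is a rough Python sketch of that rank, delete and breed loop, for anyone who wants to see it spelled out. It is a toy, not Mar I/O’s real code: each “species” is just a string of button presses, the level (LEVEL) is a made-up list of the button that keeps you alive at each step, and the names fitness, breed and evolve, the 10% mutation rate and the choice to breed from the top quarter are all my own assumptions.

```python
import random

BUTTONS = "RJLD"        # right, jump, left, duck
LEVEL = "RRRRJRRRJRRR"  # hypothetical level: the button that survives each step

def fitness(plan):
    """Count how many steps the plan survives before a wrong button."""
    distance = 0
    for pressed, needed in zip(plan, LEVEL):
        if pressed != needed:
            break
        distance += 1
    return distance

def breed(parent_a, parent_b, rng, mutation_rate=0.1):
    """Blend two plans step by step, with an occasional random mutation."""
    child = []
    for a, b in zip(parent_a, parent_b):
        gene = rng.choice([a, b])
        if rng.random() < mutation_rate:
            gene = rng.choice(BUTTONS)
        child.append(gene)
    return "".join(child)

def evolve(generations=250, population_size=20, seed=0):
    """Run the rank / delete bottom 10% / breed loop; return best plan and history."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(BUTTONS) for _ in LEVEL)
        for _ in range(population_size)
    ]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        cut = max(1, population_size // 10)  # size of the bottom 10%
        survivors = population[:-cut]
        best = survivors[: max(2, len(survivors) // 4)]
        children = []
        for _ in range(cut):
            parent_a, parent_b = rng.sample(best, 2)
            children.append(breed(parent_a, parent_b, rng))
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    return population[0], history

if __name__ == "__main__":
    best_plan, history = evolve()
    print("best plan:", best_plan, "fitness:", fitness(best_plan))
```

Because the top performers are never deleted, the best fitness in this sketch can only stay the same or go up from one generation to the next. It can still sit at the same value for a long time while it waits for a lucky mutation.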

If this feels a lot like evolution, well that’s because it is! These AIs learn by evolving and refining their strategies. This is a very deep and powerful idea, but I’ve already gone on long enough, so I’ll save it for another time.

The limits of AI

The advancement of AI evokes many feelings: awe and wonder, but also fear and skepticism. So I’ll end this post by talking about what the future might look like.

AIs are machines. Artificial intelligence might be better described as “artificial skill”. Mar I/O is only able to maximize a fitness score. It’s quite good at that, but that’s the only thing it can do. This AI is highly specialized to Super Mario Brothers. While it’s possible that the underlying Mar I/O code can be adapted to other games (like Mario Kart), it requires human knowledge, judgement and skill to adapt it to other settings.

We don’t expect that Mar I/O will ever turn into a killer robot. At its core, Mar I/O is a (complicated) machine that presses buttons and is good at increasing a number (its fitness score).