Introduction to the world of AI-Challenges

Bitjitsu, is a AI-Challenge. So, what is an AI-Challenge? I’ll explain this with an example. Lets take Chess, it’s a pretty cool game. Normally Chess is played between two players. Or a player and a computer, lets call the part of computer playing the game as a bot. This bot plays the moves for the chess game, that is its sole purpose. The AI-Challenge is where two or more bots play against each other, in case of chess there are only two bots fighting each other. In AI-Challenge, participants are expected to write bots, the bots compete against each other and the winner eventually emerges. So what did I have to do with all this? I had to write the game platform (game-engine and the game), maybe some sloppy docs and a tutorial.

Now that we know what an AI-Challenge is, we will dive head first into the interesting stuff, the architecture of the platform. The entire thing runs as a single process, that is the engine and the game. The game is like a set of API that the engine uses to run the whole process. So the game dependent parts are in the game. The game independent parts are in the engine. Functions of the engine (in chronological order)

  • Load the map
  • Spawn bot processes
  • Check if they are actually bots. For my engine, the bots have to write I’m Poppy! to their STDOUT, else the bot is set invalid. For those who don’t know who Poppy is, she is the queen of the Internet. Click here to get enlightened.
  • Suspend all the bots
  • The map data and the details of the bots that are validated are passed on to the game, it returns a new game-state.
  • All the bots that are set invalid are killed
  • The engine, writes the game-state to each of the bots and suspend them again.
  • Now for each bot, resume the bot, perform a non blocking read after the fixed time quantum.
  • validate the move, if the move is invalid, the bot is set invalid
  • after the moves of the bots are collected, then it is passed on to the game to create a new game-state, then back to step six.

Fairness of the Game

One thing that we have included for fairness is every bot will get an equal amount of wall time to make a move. You may ask why wall time and not CPU time. We thought the same, it’s a very bad idea to give the users a fixed amount of CPU time. Lets say that our bot prints I’m Poppy! ten thousand times, how much CPU time so you think that it would have taken? NONE! NADA! One more thing, if the bot goes to sleep, then it uses not CPU time. That is it, so we cannot find a better way to shoot ourselves in the foot than waiting for specific amount of CPU time on a random process. So wall time may not ensure fairness to the participants but we cannot do better in that department. But if we add one more constraint, run only one process at a time (excluding the init and some other system processes of course), this ensures fairness to some extent.

Some interesting problems

We had some interesting problems. The IPC, what would be the easiest and the laziest way for the bot to communicate with the engine? Probably it is the bot’s own STDIN and STDOUT. So in order for that to happen, the engine has to establish pipes between the engine and the bot, also the STDOUT and the STDIN of the bot has to be piped to some file from the engine’s perspective. This can be easily achieved with the help of popen from subprocess module (builtin), but we did not use it. What we did would be very displeasing for many of you. We used the os module. We explicitly used fork, dup2 and pipe system calls. The thing is that popen does these things for you. But why did we not use popen?

To the next problem, We have to allow random strangers run their code on our server. What could be worse than this? What could they do? They could launch a fork bomb for starters! It is a problem that we have to solve. We used the python version of seccomp-bpf filter. When this filter is employed on a process, the process can run only selected syscalls. Lets say we ban fork on the bot process, now if the bot tries to launch a fork bomb, it will be killed before you know it. There is another problem, we have to launch this filter at the start of the process. We can put this on the bot size of the code, in the API or so but that would be bad. What if the person who submits the code decides to remove the code for the seccomp filter from his side of the code. He certainly has the freedom to do as he pleases. One more thing that we could do is launch this filter immediately after the fork and before the execl or the family of syscalls.

The next problem was the amount of memory that each participant must be allowed to use. We used prlimit to solve this problem. We explored cgroupspy but we stuck with easier solution. Anyway, we set the upper limit of memory consumption for each bot as one gigabyte.

Would the execl syscall not wipe out the seccomp filter?

Answer: The answer to that question would be no. That is simply because of how the seccomp filter operates, the filter operates at the kernel level and acts as a trap, then the validation of the syscall happens and appropriate action is taken, that is either to execute the syscall or to kill the process making the syscall (or send another signal and raise an exception). Not having libseccomp available via pip was a pain, but I did get through it. I’ve written about it here.

Now we have the reason why we did not use popen that would have made our lives so much more easier. Yeah, yeah I get it, stop with the claps already.

Now, to the next problem, what kind of reads and writes are we supposed to do? Remember, we give only fixed amount of time for the bots to make each of their moves. So we have to do a non blocking read whether the bot finishes writing its move to its STDOUT or not. If it writes half its move or if it makes an invalid move the bot is killed (that was the policy decision we made).

Now how do wait for exactly some fixed time quantum? Before we answer that question we are running the processes serially. Only one process runs at a time, either one of the bots or the engine. Like we discussed before remember we are using the wall time. One easy way to do this would be to make the engine sleep for the fixed time quantum, wake up and the collect the move made by the bot. This would be the easiest way to do it. We have one problem with this solution, even if the bot finishes its work, the engine is going to sit idle and do nothing about it. We can do better. Another way to do it is busy waiting and yeah not so elegant but solves the problem. In a search for elegant solution, I discovered select. select allows you to wait on a file descriptor for a selected amount of time, if the write has been performed, then it will return the file descriptor else it returns an empty list. This is great, we almost solved it. What is the problem now? Think about it, how is a write action determined? For the reader the write action has been performed when the buffer is flushed by the writer. What if the size of the buffer is small, or it is line buffered? We have a problem. One thing that we could do is increase the size of the buffer. In Linux this can be done with fcntl and some parameters. This feature is not even available in BSD that is why I said Linux, but I try my best so that it runs in other Unix computers at least. Very well now that we have set the buffer size big enough, the bots have to explicitly flush their buffers else their moves will not be counted. If we are giving the API we can make sure of the explicit flush.

Speculations

We have more problems to solve. Had we used the Network stack for IPC, we would not have faced some of the problems, still we could not have made it cross platform. After the AI-Challenge, we explored Docker. Running one process per container is not what we wanted to do. There was phusion/passenger to the rescue. They have some problems solved for people like me. Click here for reading about that interesting blog. I have not created or published the image for the game yet. Also the idea of running an instance of the game in a container was something that we discussed earlier but, we skipped it at that time because we had to learn about it. This idea was better than running the game with the seccomp filter. In the sense of protecting the server, but we still need seccomp filter to kill the specific bot for making illegal system calls.

The Game

The game that we chose was agar.io. I was watching my soap House of Cards, in one episode, Kevin Spacey was introduced to agar.io, that was when I thought to myself, that would be a great game for Bitjitsu. Unfortunately agar.io is a popular game and people have written AI-bots for it already. We had to change some of the game mechanics for the participants to be not able to submit the same AI-Scripts.

There was one interesting thing about the game, validation of the moves. We used jsonschema for that. It was very interesting library. Other than that there was only a couple of collision handling and scoring. The scoring was within the game to determine the winner and not for the points on the leader-board, so I did not go through the ELO rating system.

So when is the AI-Challenge?

It was canceled due to lack of participation and buggy platform. I did not implement some parts of the game. We removed those features from the game.

Acknowledgment

I would like to thank Ananya Bahadur for his guidance on this interesting journey. I would also like to thank Paul Moore, for helping me out on the libseccomp mailing list. I would also like to thank the #python irc channel on Freenode for pointing out to use select with pipes. I would also like to thank the StackOverflow community for answering my questions. I became a karma whore on StackOverflow, as a result of working on this project.

Edit: Check our code base here.