Using reinforcement learning from scratch to teach a computer to play Tic-Tac-Toe
It seems everyone in the AI sector is currently honing their Reinforcement Learning (RL) skills, especially Q-learning, following the recent rumours about OpenAI’s new AI model, Q*, and I’m joining in too. However, rather than speculating about Q* or rehashing old papers and examples, I’ve decided to use my enthusiasm for board games to give a hands-on introduction to Q-learning 🤓
In this blog post, I will build a simple programme from scratch that teaches a model to play Tic-Tac-Toe (TTT). I will refrain from using any RL libraries like Gym or Stable Baselines; everything is hand-coded in plain Python, and the script is merely 100 lines long. If you’re curious how to teach an AI to play games, keep reading.
You can find all the code on GitHub at https://github.com/marshmellow77/tictactoe-q.
Teaching an AI to play Tic-Tac-Toe (TTT) might not seem all that important. However, it provides a (hopefully) clear and accessible introduction to Q-learning and RL, which could matter for the field of Generative AI (GenAI): there has been speculation that stand-alone GenAI models such as GPT-4 are insufficient for significant further advancements, because they can only ever predict the next token and cannot truly reason. RL is believed to be able to address this limitation and potentially improve the responses of GenAI models.
Whether you’re aiming to brush up on your RL skills in anticipation of these advancements or simply looking for an engaging introduction to Q-learning, this tutorial is designed for both 🤗
At its core, Q-learning is an algorithm that learns the value of an action in a particular state, and then uses this information to find the best action. Let’s consider the example of the Frozen Lake game, a popular single-player game used to demonstrate Q-learning.
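Before we get to that example, it can help to see those two ideas in code: a table of learned action values, and a rule for updating them from experience. The sketch below is illustrative rather than the actual script from the repo; the function names, hyperparameter values, and state/action representation are my own assumptions.

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate: how strongly each update shifts the estimate
GAMMA = 0.9    # discount factor: how much future rewards count
EPSILON = 0.1  # exploration rate for epsilon-greedy action selection

# Q-table: maps (state, action) pairs to estimated long-term value.
# defaultdict(float) means unseen pairs start at 0.0.
q_table = defaultdict(float)

def choose_action(state, legal_actions):
    """Epsilon-greedy: usually pick the best-known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: q_table[(state, a)])

def q_update(state, action, reward, next_state, next_legal_actions):
    """Core Q-learning update: nudge Q(s, a) towards the observed reward
    plus the discounted value of the best action in the next state."""
    best_next = max((q_table[(next_state, a)] for a in next_legal_actions),
                    default=0.0)
    td_target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])
```

Everything that follows, whether in Frozen Lake or in our TTT script, is essentially a loop around these two functions: pick an action, observe the reward and the next state, and update the table.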