For our TinyML (ESE 3600) final project, we built a fully self-contained, physical Connect 4 system that allows a human to play against an AI with no laptop or cloud connection required. Running entirely on a Seeed Studio XIAO ESP32-S3, the system uses an onboard camera and a lightweight CNN to recognize the board state, then applies a Minimax algorithm with Alpha–Beta pruning to select the best move. The AI’s move is displayed directly on an LED board, showcasing an end-to-end integration of embedded vision and decision-making on a single microcontroller.
Special thanks to my teammate Nathan for collaborating on this project.
Motivation
This project was motivated by two goals: creating a screen-free, hands-on game experience and demonstrating the power of modern edge AI. We wanted a tabletop game that feels tactile and interactive while still featuring an intelligent computer opponent, all running efficiently on a single low-power microcontroller. At the same time, the project showcases what is possible with TinyML by performing onboard image recognition and game decision-making directly on the ESP32-S3.
Data Collection
To ensure reliable performance, we created our own dataset using the ESP32-S3's onboard camera. We captured 150 images of the board while a script displayed random Connect 4 game states on the LED matrix. Collecting the data directly on the device ensured the images matched the exact lighting and camera view the AI sees during real gameplay.
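A minimal sketch of how such a capture script can generate random but reachable board states, alternating moves and respecting gravity (the grid size, piece encoding, and function names here are illustrative, not our exact script):

```python
import random

ROWS, COLS = 8, 8        # matches the 8x8 LED matrix (assumption)
EMPTY, RED, BLUE = 0, 1, 2

def random_board(max_moves=20):
    """Generate a random but reachable game state by playing
    alternating random moves, respecting gravity."""
    board = [[EMPTY] * COLS for _ in range(ROWS)]
    player = RED
    for _ in range(random.randint(0, max_moves)):
        open_cols = [c for c in range(COLS) if board[0][c] == EMPTY]
        if not open_cols:
            break
        col = random.choice(open_cols)
        # Drop the piece to the lowest empty row in that column.
        for row in range(ROWS - 1, -1, -1):
            if board[row][col] == EMPTY:
                board[row][col] = player
                break
        player = BLUE if player == RED else RED
    return board
```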
Preprocessing
We preprocessed the board images to make them usable by a small CNN. We first tried to classify the entire board position in a single pass, but accuracy was poor, most likely because 150 full-board images were too few training examples. We then switched strategies and split each board image into 64 tiles, one per LED. After cropping and segmenting, we had over 9,500 labeled tile images, each resized to 32×32 pixels and sorted into three classes: Empty, Red (human), and Blue (AI). The dataset was balanced across classes, which helped the model learn accurately without bias.
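A minimal sketch of the tile-splitting step, assuming the photo has already been cropped so the LED grid fills the frame (shown here with Pillow; names are illustrative):

```python
from PIL import Image

GRID = 8          # 8x8 LED matrix -> 64 tiles
TILE_SIZE = 32    # CNN input resolution

def split_into_tiles(board_image_path):
    """Crop a board photo into 64 tiles, one per LED, each 32x32.
    Tile order is row-major (top-left to bottom-right)."""
    img = Image.open(board_image_path)
    w, h = img.size
    tile_w, tile_h = w / GRID, h / GRID
    tiles = []
    for row in range(GRID):
        for col in range(GRID):
            box = (int(col * tile_w), int(row * tile_h),
                   int((col + 1) * tile_w), int((row + 1) * tile_h))
            tiles.append(img.crop(box).resize((TILE_SIZE, TILE_SIZE)))
    return tiles
```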
CNN Architecture
We chose a small Convolutional Neural Network because CNNs handle image data well and can run efficiently on microcontrollers. A model like MobileNet would exceed the microcontroller's memory, while a fully dense network would not capture spatial patterns.
The model has two convolution layers, one depthwise separable layer, pooling layers, a 64-neuron dense layer, and a 3-class softmax output.
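A Keras sketch of this architecture (the filter counts shown are illustrative, not necessarily our exact values):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model():
    """Small CNN for 32x32 RGB tiles -> {Empty, Red, Blue}.
    Structure matches the text: two standard convolutions, one
    depthwise separable convolution, pooling layers, a 64-neuron
    dense layer, and a 3-class softmax output."""
    return tf.keras.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Conv2D(8, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.SeparableConv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(3, activation="softmax"),
    ])
```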
Quantization
We converted the model to TensorFlow Lite with 8-bit dynamic range quantization. This reduced the model size from 305 KB to about 127 KB and sped up inference without hurting accuracy.
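The conversion itself is a few lines of TensorFlow (a sketch; `model` is the trained Keras model and the filename is illustrative):

```python
import tensorflow as tf

# Dynamic range quantization: weights are stored as 8-bit integers,
# activations stay float at runtime, so no representative dataset
# is required.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("connect4_tile_model.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting .tflite file is then embedded as a C array (e.g., via `xxd -i`) so it can be compiled into the firmware.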
CNN Training Results
We trained the model in Google Colab using an 85/15 train/validation split with the Adam optimizer (learning rate 0.001). Early stopping and learning-rate reduction helped prevent overfitting, and the model converged in about 50 epochs; a sketch of this training setup follows the per-class results below. On the validation set, it achieved 99.86% accuracy and a loss of 0.0065. Breaking this down by class reveals the model's robustness:
Empty Tiles: 99.7% accuracy (730 samples)
Red Tiles: 100.0% accuracy (351 samples)
Blue Tiles: 100.0% accuracy (350 samples)
The model reliably distinguished all classes, even with variations in LED brightness and minor light bleeding.
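A minimal sketch of the training setup described above, assuming `x_train`/`y_train` and `x_val`/`y_val` hold the tile images and integer labels (the patience values and loss choice are illustrative):

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

callbacks = [
    # Stop once validation loss stops improving; keep the best weights.
    tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
    # Halve the learning rate when validation loss plateaus.
    tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5),
]

history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=100,
    callbacks=callbacks,
)
```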
On-Device Performance
Using EloquentTinyML, the quantized model ran in a 100 KB tensor arena. For each turn, the system captures a frame, splits it into 64 tiles, and runs the CNN on each one. All 64 inferences took about 13 seconds (roughly 200 ms per tile), but the game remained playable. In the future, we hope to speed this up with a more optimized model or faster image-processing techniques.
Deployment and Hardware
The system uses a Seeed Studio XIAO ESP32-S3 with a built-in camera, an 8×8 WS2812B LED matrix, and 8 push buttons for player input. To save pins, all 8 buttons are read through a single analog pin using a voltage-divider ladder, with the software mapping voltage ranges to the correct column. This setup keeps the hardware simple while still supporting full gameplay.
(voltage-divider circuit diagram)
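The firmware itself is C++, but the column-decoding logic is simple enough to sketch in Python. The ADC band centers and tolerance below are hypothetical and would be calibrated against the actual resistor ladder:

```python
# Illustrative mapping from a single ADC reading to a pressed column.
ADC_MAX = 4095  # the ESP32-S3's ADC is 12-bit

# One calibrated band per button: (expected reading, tolerance).
# These centers are hypothetical; real values depend on the ladder.
BANDS = [(center, 150) for center in
         (300, 800, 1300, 1800, 2300, 2800, 3300, 3800)]

def column_for_reading(adc_value):
    """Return the pressed column (0-7), or None if nothing is pressed."""
    for col, (center, tolerance) in enumerate(BANDS):
        if abs(adc_value - center) <= tolerance:
            return col
    return None
```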
Software and AI Strategy
The software is written in C++ using the Arduino framework. After the CNN predicts each LED’s state, the full board is built and fed into a Minimax algorithm with Alpha-Beta pruning to choose the AI’s move.
The search depth is set to 5 turns ahead, balancing intelligence with calculation speed. The evaluation function uses a positional weight matrix to prioritize the center columns, which statistically offer more winning opportunities, and scans the board for 4-cell "windows" to score potential threats and opportunities.
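A condensed Python sketch of the search and evaluation logic (the firmware is C++; the column weights and window scores here are illustrative stand-ins for our tuned values):

```python
import math

ROWS, COLS = 8, 8
EMPTY, HUMAN, AI = 0, 1, 2
SEARCH_DEPTH = 5

# Positional weights favoring the center columns (values illustrative).
COL_WEIGHT = [1, 2, 3, 4, 4, 3, 2, 1]

def legal_moves(board):
    return [c for c in range(COLS) if board[0][c] == EMPTY]

def apply_move(board, col, player):
    new = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):      # gravity: lowest empty row
        if new[r][col] == EMPTY:
            new[r][col] = player
            break
    return new

def windows(board):
    """Yield every horizontal, vertical, and diagonal 4-cell window."""
    for r in range(ROWS):
        for c in range(COLS):
            if c + 3 < COLS:
                yield [board[r][c + i] for i in range(4)]
            if r + 3 < ROWS:
                yield [board[r + i][c] for i in range(4)]
            if c + 3 < COLS and r + 3 < ROWS:
                yield [board[r + i][c + i] for i in range(4)]
            if c - 3 >= 0 and r + 3 < ROWS:
                yield [board[r + i][c - i] for i in range(4)]

def score_window(w):
    if w.count(AI) == 4: return 10000      # AI win
    if w.count(HUMAN) == 4: return -10000  # human win
    if w.count(AI) == 3 and w.count(EMPTY) == 1: return 50
    if w.count(HUMAN) == 3 and w.count(EMPTY) == 1: return -80
    return 0

def evaluate(board):
    score = sum(score_window(w) for w in windows(board))
    for r in range(ROWS):
        for c in range(COLS):
            if board[r][c] == AI: score += COL_WEIGHT[c]
            elif board[r][c] == HUMAN: score -= COL_WEIGHT[c]
    return score

def minimax(board, depth, alpha, beta, maximizing):
    """Depth-limited minimax with alpha-beta pruning."""
    moves = legal_moves(board)
    if depth == 0 or not moves or abs(evaluate(board)) >= 10000:
        return evaluate(board), None
    best_move = moves[0]
    if maximizing:                          # AI's turn
        best = -math.inf
        for col in moves:
            score, _ = minimax(apply_move(board, col, AI),
                               depth - 1, alpha, beta, False)
            if score > best:
                best, best_move = score, col
            alpha = max(alpha, best)
            if alpha >= beta:               # opponent avoids this line: prune
                break
    else:                                   # human's turn
        best = math.inf
        for col in moves:
            score, _ = minimax(apply_move(board, col, HUMAN),
                               depth - 1, alpha, beta, True)
            if score < best:
                best, best_move = score, col
            beta = min(beta, best)
            if alpha >= beta:
                break
    return best, best_move

# Root call each turn:
# score, move = minimax(board, SEARCH_DEPTH, -math.inf, math.inf, True)
```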
Demo
Results
Across 22 test games, the AI defeated the human player 21 times, a win rate of about 95%.
Future Improvements
Looking forward, we aim to improve inference speed, enhance vision robustness under different lighting conditions, and explore replacing the Minimax algorithm with a Deep Reinforcement Learning approach. We also plan to expand the user experience with sound effects or voice feedback and move the hardware off the breadboard onto a perfboard or custom PCB to create a more compact and polished system.