Pigeons on the Edge

Deploy pigeon detection on low-power and low-cost edge devices




In this article, we will talk about deploying state-of-the-art computer vision object detection on low-power, low-cost edge devices. At the end, we present a budget hardware setup that can detect pigeons in almost real-time.

State of the AI

In recent years, there has been a huge advance in AI with the introduction of Large Language Models. These methods have also been adopted by the computer vision community: images are converted to tokens and passed to a language model for classification, detection, or other tasks. Such models are called Vision Language Models. Although these methods show better accuracy and precision, they require far more processing power than their older relatives, the Convolutional Neural Network based models. The need for processing power becomes an even larger issue when deploying a model on the edge.

  • Large Language Model (LLM), e.g. ChatGPT, Llama 2

  • Vision Language Model (VLM), e.g. GPT-4V, Swin-L, or ViT-L

  • Convolutional Neural Network (CNN), e.g. YOLO or EfficientNet

Object Detection

Object detection is a task in computer vision where we take an image as input and both localize and classify the objects within it. The de facto metric to assess the quality of an object detector is the mean Average Precision (mAP). It is computed from precision-recall curves, where a prediction counts as correct when the Intersection over Union (IoU) between the predicted bounding box and the annotated bounding box exceeds a threshold; mAP (50-95) averages this over IoU thresholds from 0.5 to 0.95.
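The IoU term above is easy to compute by hand. Here is a minimal sketch; the box coordinates are made-up illustrative values:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping region (empty if the boxes do not intersect)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A prediction shifted halfway off the annotation overlaps by 1/3 of the union
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

A perfect prediction gives IoU 1.0, and disjoint boxes give 0.0; a typical "correct detection" threshold is 0.5.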

| Model | mAP (50-95) COCO | Model Size (#params) |
| --- | --- | --- |
| ViT-L (Co-DETR) | 65.9 | 304M |

YOLO (Ultralytics on GitHub)

YOLO (You Only Look Once) is the state-of-the-art convolutional-neural-network-based method for object detection; the newer YOLO versions can also perform other tasks such as segmentation, pose estimation, and classification.

As we see from the table above, VLM methods outperform CNN methods but require a far higher number of trainable parameters. The parameter count is an indication of both computational cost and memory usage, though multiple factors influence inference speed. YOLO also comes in different sizes, indicated by the last character in the model name (n, s, m, l, x). The smaller a model is, the faster the inference and the lower the memory consumption, but the accuracy decreases as well.
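As an illustration of this size tradeoff, here is a small hypothetical helper that picks the largest YOLOv8 detection variant fitting a parameter budget. The parameter counts (in millions) follow the published Ultralytics model table; the helper itself is not part of any library:

```python
# Parameter counts in millions for the YOLOv8 detection variants (Ultralytics docs)
YOLOV8_PARAMS_M = {"n": 3.2, "s": 11.2, "m": 25.9, "l": 43.7, "x": 68.2}

def pick_variant(budget_m_params):
    """Return the largest YOLOv8 variant with at most `budget_m_params` million params."""
    fitting = [(p, v) for v, p in YOLOV8_PARAMS_M.items() if p <= budget_m_params]
    if not fitting:
        raise ValueError("no variant fits the budget")
    return max(fitting)[1]  # largest parameter count among those that fit

print(pick_variant(12.0))  # "s": 11.2M is the largest variant under 12M params
```

On a memory-starved edge device you would typically end up at the nano ("n") end of this table, which is exactly the variant used in the benchmarks below.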

YOLOv8 performance plots

Edge Devices

Edge devices are limited in processing capability and power consumption. They bring AI capabilities to IoT devices, sensors, cameras, drones, and smartphones. There are low-end solutions that cost around $20, but one can also choose high-end System-on-Chip (SoC) solutions above $1000. The rule of thumb is: the more it costs, the higher the processing power and the power consumption. When choosing the hardware, there will be a tradeoff between price and performance. The following tables show a few examples of devices as of February 2024.

| Low-End Edge | Coral AI | NXP iMX8 Plus | Hailo-8 |
| --- | --- | --- | --- |
| Type | Dedicated Chip (NPU) | SoC (arm64) + GPU + NPU | Dedicated Chip (NPU) |
| Instruction Set SoC | N/A | FP, INT | N/A |
| Instruction Set GPU | N/A | FP32 | N/A |
| Instruction Set NPU | INT8 | INT8 | INT8 |
| Form | USB, PCIe, m.2, Chip | Board, Chip | PCIe, m.2, Chip |
| Power Usage | ~2 Watt | ~5-15 Watt | ~2.5 Watt |
| Price | ~$20 | ~$60 | ~$140 |
| High-End Edge | Intel Ultra | Qualcomm Snapdragon v3 | Nvidia Jetson Orin |
| --- | --- | --- | --- |
| Type | SoC (x64) + GPU + NPU | SoC (arm64) + GPU + NPU | SoC (arm64) + GPU + NPU |
| Instruction Set SoC | FP, INT, BF16 | TBA | FP, INT |
| Instruction Set GPU | FP16 | TBA | FP32, FP16 |
| Instruction Set NPU | INT8 | INT4, TBA | INT8 |
| Form | Chip | Chip | Board, Chip |
| Power Usage | ~18-64 Watt | TBA | ~7-60 Watt |
| Price | ~$375 | ~$900 TBA | ~$400-1100 |

Deploying a quantized YOLO

In this post, we assume that we already have a quantized model. In short, quantization is the step where the model's numerical precision is reduced, for example from FP32 to INT8. On many hardware platforms this improves inference speed, but the detection accuracy decreases. We discuss quantization in more detail in the next chapter, Quantized YOLO for Edge Solutions.
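To build intuition, here is a minimal sketch of what symmetric INT8 quantization does to a single weight tensor. The values are illustrative, and real toolchains additionally calibrate activation ranges on sample data:

```python
def quantize_int8(weights):
    """Symmetric INT8 quantization of a list of float weights (illustrative)."""
    scale = max(abs(w) for w in weights) / 127.0  # largest magnitude maps to 127
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from INT8 codes."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003]
q, scale = quantize_int8(weights)
print(q, dequantize(q, scale))  # small values below the scale round away to 0
```

Note how the smallest weight is rounded to zero: this loss of resolution is where the accuracy drop in the benchmark table below comes from.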

Model deployment on edge hardware is different for each device. Let's discuss a few cases:

  • Coral AI works only with INT8 precision. This means the model weights are stored in INT8 and inference is performed with integer arithmetic.

  • NXP is likewise most efficient when using only INT8 precision.

  • Intel can execute inference in different ways, depending on which CPU generation is used. In general, the chip consists of a CPU that can execute in FP32 or combine FP32 with INT8, an iGPU that can execute in FP16, and an NPU that can execute only in INT8.
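As a sketch of the Coral case, running an Edge TPU-compiled INT8 TFLite model with the pycoral library might look as follows. The model path and the preprocessed input image are assumptions, and the model must have been compiled with `edgetpu_compiler` beforehand:

```python
def detect_on_edgetpu(model_path, image, score_threshold=0.4):
    """Run an INT8 TFLite detection model on a Coral Edge TPU (sketch).

    `image` is assumed to be already resized to the model's input shape.
    Imports are kept inside the function so the sketch loads without a Coral device.
    """
    from pycoral.utils.edgetpu import make_interpreter
    from pycoral.adapters import common, detect

    interpreter = make_interpreter(model_path)  # binds the TPU delegate
    interpreter.allocate_tensors()
    common.set_input(interpreter, image)
    interpreter.invoke()
    # Returns a list of Object(id, score, bbox) above the threshold
    return detect.get_objects(interpreter, score_threshold)
```

The same structure applies to the other devices, only the runtime changes (e.g. an OpenVINO or TFLite interpreter instead of the Edge TPU delegate).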

Now let's test the inference speed on the following hardware:

  • *Intel i7-9750H

  • **Raspberry Pi4 + Coral AI Edge TPU

| Model | mAP50-95 | Inference | Avg Speed |
| --- | --- | --- | --- |
| yolov8n FP32 | 37.4 | Unquantized, Intel* | 24.40 ms ~ 40 FPS |
| yolov8n INT8+FP32 | 37.1 | Quantized, Intel* | 15.18 ms ~ 66 FPS |
| yolov8n FULL INT8 | 32.9 | Quantized, Coral** | 61.00 ms ~ 16 FPS |

The Setup

This is a homemade setup that detects pigeons in almost real-time: a Raspberry Pi 4 connected to a battery pack and two Coral AI Edge TPUs, one for bird detection and the other for bird classification. Since deploying this setup, together with two plastic crows, no pigeons have landed on my balcony in the last 1.5 years. Check out my repo to make sure no pigeons make your balcony dirty: GitHub Repository


For more technical details on quantization, continue reading the next chapter, Quantized YOLO for Edge Solutions.