In this article, we discuss deploying state-of-the-art computer vision object detection on low-power, low-cost edge devices. Finally, we present a budget hardware setup that can detect pigeons in near real-time.
State of the AI
In recent years, there has been a huge advance in AI with the introduction of Large Language Models. These methods have also been adopted by the computer vision community, where images are converted to tokens and passed to a language model for classification, detection, and other tasks; these are called Vision Language Models. Although these methods achieve better accuracy and precision, they require far more processing power than their older relatives, the Convolutional Neural Network based methods. The need for processing power becomes an even larger issue when deploying a model on the edge.
Large Language Model (LLM), e.g. ChatGPT, Llama 2
Vision Language Model (VLM), e.g. GPT-4V, Swin-L, or ViT-L
Convolutional Neural Network (CNN), e.g. YOLO or EfficientNet
Object Detection
Object detection is a computer vision task where we take an image as input and localize and classify objects within it. The de facto metric to assess the quality of an object detector is the mean Average Precision (mAP). A prediction counts as correct when the Intersection over Union (IoU), the overlap between the predicted and annotated bounding boxes divided by the area of their union, exceeds a threshold; mAP (50-95) averages the precision over IoU thresholds from 0.5 to 0.95.
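To make the metric concrete, here is a minimal IoU sketch in plain Python (the box coordinates are made up for illustration):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# ~0.39, so this prediction would be a miss even at the lowest 0.5 threshold:
print(iou((10, 10, 50, 50), (20, 20, 60, 60)))
```

The table below compares a few detectors by this metric on the COCO dataset.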
| Model | mAP (50-95) COCO | Model Size (#params) |
| --- | --- | --- |
| Swin-L (DINO) | 63.2 | 218M |
| ViT-L (Co-DETR) | 65.9 | 304M |
| YOLOv8x | 53.9 | 68.2M |
| YOLOv8s | 44.9 | 11.2M |
| YOLOv8n | 37.3 | 3.2M |
YOLO (Ultralytics on GitHub)
YOLO (You Only Look Once) is the state-of-the-art CNN-based method for object detection, and the newer YOLO versions can also perform other tasks such as segmentation, pose estimation, and classification.
As we see from the table above, VLM methods outperform CNN methods but require a much higher number of trainable parameters. The parameter count is an indication of both computational cost and memory usage, although multiple factors influence inference speed. YOLO also comes in different sizes, indicated by the last character of the model name (n, s, x, ...). The smaller the model, the faster the inference and the lower the memory consumption, but the accuracy also decreases.
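For reference, running a pretrained YOLOv8 model takes only a few lines with the Ultralytics Python API; this is a minimal sketch, and the image path is a placeholder:

```python
# pip install ultralytics
from ultralytics import YOLO

# Load the pretrained nano model (~3.2M parameters) and run detection.
model = YOLO("yolov8n.pt")
results = model("pigeon.jpg")  # placeholder image path

for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)  # class id, confidence, box coordinates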
Edge Devices
Edge devices are limited in processing capability and power consumption. They bring AI capabilities to IoT devices, sensors, cameras, drones, and smartphones. There are low-end solutions that cost around $20, but one can also choose high-end System-on-Chip (SoC) solutions above $1,000. As a rule of thumb, a more expensive device offers more processing power but also consumes more power, so choosing hardware is a tradeoff between price and performance. The following tables show a few examples of devices as of February 2024.
| Low-End Edge | Coral AI | NXP i.MX 8M Plus | Hailo-8 |
| --- | --- | --- | --- |
| Type | Dedicated Chip (NPU) | SoC (arm64) + GPU + NPU | Dedicated Chip (NPU) |
| Instruction Set SoC | N/A | FP, INT | N/A |
| Instruction Set GPU | N/A | FP32 | N/A |
| Instruction Set NPU | INT8 | INT8 | INT8 |
| Form | USB, PCIe, M.2, Chip | Board, Chip | PCIe, M.2, Chip |
| TOPS (INT) | 4 TOPS | 2.3 TOPS | 26 TOPS |
| FLOPS (FP) | N/A | 7.2 GFLOPS | N/A |
| Power Usage | ~2 W | ~5-15 W | ~2.5 W |
| Price | ~$20 | ~$60 | ~$140 |
| High-End Edge | Intel Core Ultra | Qualcomm Snapdragon v3 | Nvidia Jetson Orin |
| --- | --- | --- | --- |
| Type | SoC (x64) + GPU + NPU | SoC (arm64) + GPU + NPU | SoC (arm64) + GPU + NPU |
| Instruction Set SoC | FP, INT, BF16 | TBA | FP, INT |
| Instruction Set GPU | FP16 | TBA | FP32, FP16 |
| Instruction Set NPU | INT8 | INT4, TBA | INT8 |
| Form | Chip | Chip | Board, Chip |
| TOPS (INT) | max 34 TOPS | 75 TOPS (TBA) | 20-275 TOPS |
| FLOPS (FP) | max 4.5 TFLOPS | TBA | max 5.3 TFLOPS |
| Power Usage | ~18-64 W | TBA | ~7-60 W |
| Price | ~$375 | ~$900 (TBA) | ~$400-1,100 |
Deploying a Quantized YOLO
In this post, we assume that we already have a quantized model. In short, quantization is the step where the model's numerical precision is decreased, for example from FP32 to INT8. On many hardware platforms this improves inference speed, but the detection accuracy decreases. We discuss quantization in more detail in the next chapter, Quantized YOLO for Edge Solutions.
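As an illustration, here is a minimal sketch of how such a model could be produced with the Ultralytics export API. The calibration dataset coco128.yaml is just an example; INT8 post-training quantization needs some representative images to calibrate the value ranges.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Post-training INT8 quantization: the dataset referenced by `data`
# is used for calibration (example dataset, swap in your own).
model.export(format="tflite", int8=True, data="coco128.yaml")

# For the Coral accelerator, Ultralytics also offers an Edge TPU target
# (Linux only, requires the Edge TPU compiler to be installed):
# model.export(format="edgetpu")
```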
Model deployment on edge hardware is different for each device. Let's discuss a few cases:
Coral AI works only with INT8 precision: the model weights are stored as INT8 and inference is performed in integer arithmetic.
NXP also works most efficiently when using only INT8 precision.
Intel can execute inference in several ways, depending on which CPU generation is used. In general, the package consists of a CPU that can execute in FP32 or combine FP32 with INT8, an iGPU that can execute in FP16, and an NPU that can execute only in INT8; the execution unit can be selected explicitly, as in the sketch below.
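On Intel hardware, one way to pick the execution unit is through OpenVINO. This is a minimal sketch, assuming the detector has already been exported to OpenVINO IR; "model.xml" is a placeholder path:

```python
# pip install openvino
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU'], depending on the chip

# Placeholder for an OpenVINO IR export of the detector.
model = core.read_model("model.xml")

# Pick the execution unit explicitly; precision support differs per device.
compiled_cpu = core.compile_model(model, "CPU")  # FP32, or FP32 mixed with INT8
compiled_gpu = core.compile_model(model, "GPU")  # iGPU, FP16
compiled_npu = core.compile_model(model, "NPU")  # INT8 only
```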
Now let's test the inference speed on the following hardware:
*Intel i7-9750H
**Raspberry Pi 4 + Coral AI Edge TPU
| Model | mAP (50-95) | Inference Setup | Avg Speed |
| --- | --- | --- | --- |
| YOLOv8n FP32 | 37.4 | Unquantized, Intel* | 24.40 ms ~ 40 FPS |
| YOLOv8n INT8+FP32 | 37.1 | Quantized, Intel* | 15.18 ms ~ 66 FPS |
| YOLOv8n full INT8 | 32.9 | Quantized, Coral** | 61.00 ms ~ 16 FPS |
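The measurement loop behind such numbers can be as simple as warming the model up once and averaging over repeated runs; here is a rough sketch (the image path is a placeholder, and the results will of course vary with your hardware):

```python
import time
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model("test.jpg", verbose=False)  # warm-up: the first call includes setup overhead

N = 100
start = time.perf_counter()
for _ in range(N):
    model("test.jpg", verbose=False)
elapsed = (time.perf_counter() - start) / N

print(f"{elapsed * 1000:.2f} ms ~ {1 / elapsed:.0f} FPS")
```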
The Setup
This is a home-made setup that detects pigeons in near real-time: a Raspberry Pi 4 connected to a battery pack and two Coral AI Edge TPUs, one for bird detection and the other for bird classification. After deploying this setup, together with two plastic crows, no pigeons have landed on my balcony in the last 1.5 years. Check out my repo to keep pigeons from making your balcony dirty: GitHub Repository
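For a flavor of what the detection loop looks like on the Edge TPU, here is a minimal pycoral sketch. The model and image paths are placeholders (the model name follows Coral's example models); note that detect.get_objects expects SSD-style postprocessed output tensors, so a raw YOLO export would need its own output decoding instead.

```python
# pip install pycoral (on a device with an Edge TPU attached)
from pycoral.utils.edgetpu import make_interpreter
from pycoral.adapters import common, detect
from PIL import Image

# Placeholder model path, following Coral's example detection models.
interpreter = make_interpreter("ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite")
interpreter.allocate_tensors()

# Resize the input image to the model's expected input size and run inference.
image = Image.open("balcony.jpg").resize(common.input_size(interpreter))
common.set_input(interpreter, image)
interpreter.invoke()

for obj in detect.get_objects(interpreter, score_threshold=0.5):
    print(obj.id, obj.score, obj.bbox)  # class id, confidence, bounding box
```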
For more technical details on the quantization, continue reading the next chapter, Quantized YOLO for Edge Solutions.