MiGa: Multi-Chicken Gait Assessment

1University of Arkansas    2Cobb-Vantress, Inc

Qualitative examples of our multi‑chicken gait‑monitoring system on unseen test videos. Bounding boxes are color‑coded by identity; the overlaid skeleton depicts detected keypoints. The text above each box shows the persistent ID and the predicted Kestin gait score.

Abstract

Broiler chicken production is a major agricultural industry, yet it faces persistent challenges related to animal welfare—most notably, lameness caused by selective breeding for rapid growth. Traditional gait assessment methods, such as Kestin’s scoring system, obstacle tests, and latency-to-lie, have been valuable, but they are typically limited to single-bird evaluations in controlled environments, require trained personnel, and are slow due to their manual nature. In this work, we introduce MiGa, a multi-chicken gait assessment system that leverages computer vision and machine learning to automatically evaluate the gait of multiple birds simultaneously in more naturalistic settings. Our approach integrates four components: a multi-bird detector, a pose estimator, a tracking module, and a gait-score regressor. To support development and benchmarking, we introduce the GAIT dataset suite, which includes dedicated datasets for detection, pose extraction, tracking, and gait-score prediction. This system enables scalable, automated locomotion assessment in realistic multi-bird scenarios, contributing toward improved welfare monitoring in broiler production.

System Overview

We aim to build a gait monitoring system for multiple chickens in a pen. Specifically, the system is designed to detect and track multiple chickens over time. For each chicken, the system predicts a Kestin gait score, which indicates the degree of lameness in the chicken’s legs. We propose a system composed of five key modules: a Feature Extractor, a Multi-Bird Detector, a Pose Extractor, a Tracking Module, and a Gait-Score Regressor.
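The per-frame flow through these modules can be sketched as follows. This is a minimal illustration, not the authors' actual interface: the data structure and function names are hypothetical, and the shared Feature Extractor is assumed to be folded into the detector and pose models, as in backbone-sharing YOLO-style networks.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class BirdObservation:
    """One tracked bird in one frame (illustrative structure only)."""
    track_id: int
    bbox: Tuple[float, float, float, float]      # (x1, y1, x2, y2) in pixels
    keypoints: List[Tuple[float, float, float]]  # (x, y, confidence) per joint
    gait_score: Optional[int] = None             # Kestin score from the regressor

def process_frame(frame, detector, pose_extractor, tracker, regressor, history):
    """One pass of the pipeline over a single video frame."""
    boxes = detector(frame)                # Multi-Bird Detector: per-bird boxes
    tracks = tracker.update(boxes)         # Tracking Module: persistent IDs
    birds = []
    for track_id, bbox in tracks:
        kps = pose_extractor(frame, bbox)  # Pose Extractor on each bird
        history.setdefault(track_id, []).append(kps)
        # Gait-Score Regressor consumes the accumulated pose history per ID
        score = regressor(history[track_id])
        birds.append(BirdObservation(track_id, bbox, kps, score))
    return birds
```

The key design point this sketch captures is that detection and pose run per frame, while the gait score is predicted from each identity's pose history accumulated by the tracker.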


Experimental Setup


Experimental setup of the proposed system. Left: Top‑down schematic of the pen, with the camera mounted diagonally opposite the feeder. Right: Example frame captured from the camera perspective, which serves as input to the system.

Dataset Statistics


Qualitative Examples


Qualitative examples of our multi-chicken gait-monitoring system on unseen test videos. Bounding boxes are color-coded by identity; the overlaid skeleton depicts detected keypoints. The text above each box shows the persistent ID and the predicted Kestin gait score. Videos are included in the Supplementary Material.

Quantitative Results

Table 1 reports the multi-bird detection results. While YOLO11x achieves slightly higher AP50 and AP75 scores, YOLO11n demonstrates a superior overall mAP(50:95) and significantly better computational efficiency: it requires only 6.6 GFLOPs and reaches an inference speed of 1,250 FPS (frames per second), compared to 196.0 GFLOPs and 163 FPS for YOLO11x. Given this substantial difference in speed and efficiency, we select YOLO11n as the backbone detector in our system, as it best balances accuracy and runtime performance for real-time deployment.
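The magnitude of this trade-off is easy to quantify from the reported figures alone; the arithmetic below uses only the GFLOPs and FPS numbers quoted above.

```python
# Back-of-the-envelope comparison of the two detector variants,
# using the GFLOPs and FPS figures reported in Table 1.
yolo11n = {"gflops": 6.6, "fps": 1250}
yolo11x = {"gflops": 196.0, "fps": 163}

compute_ratio = yolo11x["gflops"] / yolo11n["gflops"]  # fewer FLOPs per frame
speed_ratio = yolo11n["fps"] / yolo11x["fps"]          # throughput advantage
latency_ms = 1000.0 / yolo11n["fps"]                   # per-frame time budget

print(f"YOLO11n: {compute_ratio:.1f}x fewer FLOPs, "
      f"{speed_ratio:.1f}x faster, {latency_ms:.2f} ms/frame")
# → YOLO11n: 29.7x fewer FLOPs, 7.7x faster, 0.80 ms/frame
```

In other words, the nano model trades a small amount of localization accuracy for roughly an order of magnitude in compute, which is what makes multi-bird, real-time operation practical.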
The performance of the Pose Extractor is summarized in Table 2. While YOLO11x-Pose slightly surpasses it in mAP(50:95), the YOLO11n-Pose model achieves highly competitive performance at significantly lower computational cost, retaining 91.83% mAP(50:95) while reaching 1,250 FPS with only 6.6 GFLOPs. Therefore, to maintain system efficiency without compromising performance, we choose YOLO11n-Pose as the backbone for our keypoint estimation module.
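The pose mAP(50:95) figures follow the COCO-style evaluation protocol, in which a predicted skeleton is matched to ground truth via Object Keypoint Similarity (OKS) rather than box IoU. A minimal sketch of OKS, assuming a single falloff constant shared by all joints (the COCO evaluator uses joint-specific constants):

```python
import math

def oks(pred, gt, area, k=0.05, visible=None):
    """Object Keypoint Similarity between a predicted and a ground-truth
    skeleton.

    pred, gt: lists of (x, y) joint coordinates.
    area:     ground-truth bounding-box area (the scale term s^2).
    k:        per-joint falloff constant (one illustrative value here).
    visible:  optional per-joint visibility flags; hidden joints are skipped.
    """
    sims = []
    for i, ((px, py), (gx, gy)) in enumerate(zip(pred, gt)):
        if visible is not None and not visible[i]:
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        sims.append(math.exp(-d2 / (2 * area * k ** 2)))
    return sum(sims) / len(sims) if sims else 0.0
```

A perfect prediction scores 1.0, and the score decays with keypoint error relative to bird size; mAP(50:95) then averages precision over OKS thresholds from 0.50 to 0.95.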

MY ALT TEXT

Table 3 summarizes the results of three popular multi-object tracking algorithms (SORT, ByteTrack, and OC-SORT) on our Tracking Dataset. Among the contenders, OC-SORT achieves the best overall accuracy, posting the highest MOTA (0.93) and IDF1 (0.91) as well as the leading Recall (0.95), all while matching the top Precision (0.98). These results validate our choice of OC-SORT as the tracking backbone for downstream gait analysis, offering the best balance between detection accuracy and identity continuity in dense, visually challenging poultry-house scenes.
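For reference, the two headline metrics follow directly from their standard definitions. The sketch below computes them from the error counts produced by the tracker-vs-ground-truth matching step (false negatives, false positives, identity switches, and the identity-matched true/false counts).

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT,
    where GT is the total number of ground-truth object instances."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

def idf1(idtp, idfp, idfn):
    """Identity F1: harmonic mean of identity precision and recall,
    computed over identity-consistent matches (IDTP, IDFP, IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)
```

MOTA penalizes every missed bird, false alarm, and identity switch equally, while IDF1 rewards keeping the same ID on the same bird over time; reading both together is what justifies the "identity continuity" claim above.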
Table 4 compares the performance of conventional logistic regression (LR) and a recurrent LSTM model. A moderate temporal context (window length L=15) combined with a narrow stride (W=5) delivers the strongest single-label (Top-1) performance for LR at 62.82%. In contrast, the same configuration yields the highest Top-2 accuracy for the LSTM at 92.30%, indicating that the recurrent architecture more reliably ranks the correct class within its top two predictions, an attractive property when minor mis-ordering is acceptable in practice.
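The windowing scheme and the Top-k criterion can be sketched as follows, assuming the regressor consumes per-frame pose features sliced into overlapping windows of length L with stride W (the helper names are illustrative, not the authors' code):

```python
def make_windows(frames, length=15, stride=5):
    """Slice a per-frame feature sequence into overlapping windows of
    `length` frames, advancing the start index by `stride` each time."""
    return [frames[i:i + length]
            for i in range(0, len(frames) - length + 1, stride)]

def topk_accuracy(scores, labels, k=2):
    """Fraction of windows whose true class is among the k highest-scored
    classes. `scores` is one row of per-class scores per window."""
    hits = 0
    for row, true_class in zip(scores, labels):
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += true_class in topk
    return hits / len(labels)
```

With L=15 and W=5, consecutive windows share ten frames, so each gait cycle is seen several times at slightly shifted phases, which is one plausible reason this configuration provides the most stable temporal context.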