Enhanced Kalman with Adaptive Appearance Motion SORT for Grounded Generic Multiple Object Tracking

Sample overview

Comparison between OneShot-GMOT (OS-GMOT) (left) and our Grounded-GMOT (right)

Introduction

In response to the limitations of traditional Multiple Object Tracking (MOT) systems, particularly their struggle with diverse and dynamic environments such as surveillance and autonomous driving, our research introduces three advances. Our motivation stems from the need for more adaptable, inclusive tracking technologies capable of handling the complexities of real-world scenarios. With this goal in mind, we present our contributions:

Pipeline Overview

Grounded-GMOT: This novel tracking paradigm leverages natural language processing, enabling the tracking of objects with varied attributes across different settings. This approach significantly expands the applicability of MOT by using descriptive language as a flexible, powerful tool for tracking.
G2MOT dataset: We unveil a large-scale dataset that surpasses existing collections in both diversity and size. The G2MOT dataset is designed to support the nuanced needs of Grounded-GMOT, providing a rich resource for developing and testing advanced tracking systems.
KAM-SORT: Our innovative tracking methodology combines camera motion analysis with adjustments in the balance between motion and appearance information. KAM-SORT represents a significant improvement in tracking accuracy and robustness, addressing the dynamic challenges of real-world object tracking.

Through these contributions, we aim to push the boundaries of what's possible in object tracking, offering solutions that are not only technically advanced but also broadly accessible and adaptable to the evolving demands of various applications.

G2MOT Dataset

Ensuring a fair assessment of GMOT methods demands a dataset of consistent quality, free from annotator bias, and with a clearly defined problem setup. To offer comprehensive coverage of real-world scenarios across different research domains, our released dataset embodies two characteristics:

  1. Diversity: integrating diverse object categories from various sources, encompassing a broad spectrum of classes and diverse properties such as motion, occlusion, appearance similarity, and density. Additionally, it employs high-level semantics like player, athlete, referee, etc., to describe objects in complex contexts, rather than using generic terms like person.
  2. Fine-Grained Annotation: alongside capturing detailed visual attributes like color, texture, and attachments, it offers extensive textual descriptions, pairing each caption with existing synonyms.
Sample overview

Examples to illustrate the efficacy of IE-Strategy. Left: Output from pre-trained VLM. Right: Output from IE-Strategy.

Module 1: Diversity. The dataset integrates a wide array of object categories from various sources. This diversity encompasses a broad spectrum of classes and properties, including motion, occlusion, appearance similarity, and density. Notably, it uses high-level semantics (like player, athlete, referee) to describe objects in complex contexts, moving beyond generic descriptors.

wordcloud

Statistical information of our proposed G2MOT dataset.



Module 2: Fine-Grained Annotation. Beyond visual attributes (color, texture, attachments), the dataset provides extensive textual descriptions and synonyms. These annotations offer a granular view of each object, enhancing the dataset's utility for rigorous GMOT method assessment.

anno_exp

Demonstration of superset and subset from horse_4 and car-3 in our proposed G2MOT dataset.
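To make the fine-grained annotation concrete, here is a hypothetical sketch of what one annotation entry might look like and how its attributes could be composed into a grounded tracking prompt. The field names and values are illustrative assumptions, not the dataset's released schema.

```python
# Hypothetical sketch of a G2MOT-style annotation entry; field names are
# illustrative assumptions, not the dataset's released format.
annotation = {
    "video": "horse_4",
    "caption": "brown horses running on a grass field",
    "class": "horse",                      # high-level semantic category
    "synonyms": ["horse", "equine", "pony"],
    "attributes": {                        # fine-grained visual attributes
        "color": "brown",
        "texture": "short-haired",
        "attachment": "saddle",
    },
}

def build_text_query(anno):
    """Compose a grounded tracking prompt from the attributes and class name."""
    attrs = " ".join(anno["attributes"].values())
    return f"{attrs} {anno['class']}"
```

A query such as `build_text_query(annotation)` would then yield a descriptive phrase ("brown short-haired saddle horse") that a grounded tracker can consume instead of a generic class label.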



KAM-SORT

The KAM-SORT method addresses tracking within the Generic Multiple Object Tracking (GMOT) setting. It computes the cost matrix between existing tracks 𝒯 and new detections 𝒟 using a novel approach that combines motion and visual appearance cues.
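The following is a minimal sketch of one plausible way to blend motion and appearance cues into a single cost matrix, using IoU for motion and cosine distance for appearance with a tunable weight `lam`. The function names and the specific weighting scheme are assumptions for illustration; they are not the paper's exact Equation 5.

```python
import numpy as np

def iou_matrix(tracks, dets):
    """Pairwise IoU between track and detection boxes in (x1, y1, x2, y2)."""
    t = tracks[:, None, :]  # shape (T, 1, 4)
    d = dets[None, :, :]    # shape (1, D, 4)
    x1 = np.maximum(t[..., 0], d[..., 0])
    y1 = np.maximum(t[..., 1], d[..., 1])
    x2 = np.minimum(t[..., 2], d[..., 2])
    y2 = np.minimum(t[..., 3], d[..., 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    area_d = (d[..., 2] - d[..., 0]) * (d[..., 3] - d[..., 1])
    return inter / (area_t + area_d - inter + 1e-9)

def combined_cost(track_boxes, det_boxes, track_emb, det_emb, lam=0.5):
    """Blend motion cost (1 - IoU) with appearance cost (cosine distance).

    lam weights motion vs. appearance; an adaptive tracker could shift lam
    toward motion when appearances are near-identical, as in GMOT.
    """
    motion_cost = 1.0 - iou_matrix(track_boxes, det_boxes)
    t = track_emb / np.linalg.norm(track_emb, axis=1, keepdims=True)
    d = det_emb / np.linalg.norm(det_emb, axis=1, keepdims=True)
    app_cost = 1.0 - t @ d.T  # cosine distance
    return lam * motion_cost + (1.0 - lam) * app_cost
```

The resulting matrix can be fed directly to a bipartite matcher such as `scipy.optimize.linear_sum_assignment` to produce track-detection assignments.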

Key contributions and advantages of this method include:

Kalman++: an enhanced Kalman association that uses an uncertainty revise factor α to validate second-round matches against the predicted error covariance.
Adaptive appearance-motion balance: the tracker adjusts the weight between motion and appearance cues, leaning on whichever signal is more reliable when generic objects look highly similar.
Camera motion analysis: compensating for camera movement keeps motion predictions stable under rapid scene changes.

Overall, the KAM-SORT method exhibits a strategic balance between visual and motion cues, addressing the challenges presented by the high similarity of objects in GMOT. This advancement represents a significant step forward in tracking technology, especially in scenarios with rapid motion and deformation.

kam-sort_compare

Tracking comparison on fast-moving objects between our KAM-SORT and SORT, OC-SORT, and DeepOCSORT on the video “insect-3”.


Algorithm: Kalman++
Data:
D, T: set of detection boxes at the current frame and tracks at the previous frame.
α: uncertainty revise factor.
Model:
C: score matrix defined in Equation 5.
M: bipartite matching function.
Kp, Ku: Kalman filter predict and update steps.
BC, IoU: functions computing the box center and the IoU.
Output:
T': set of updated tracks.
Algorithm Steps:
  1. ^x, P = Kp(T); // Estimated locations and error covariances for all tracks.
  2. S = C(^x, D); // Matching scores between estimates and detections.
  3. DTm, Dr, Tr = M(S); // 1st-round association: matched pairs DTm, unmatched detections Dr, unmatched tracks Tr.
  4. SIoU = IoU(Dr, Tr); // 2nd-round association over the remaining ones.
  5. DTr = M(SIoU); // Rematched pairs from the remaining detections and tracks.
  6. For (id, it) in DTr do
    • // id: detection index, it: track index.
    • cmin = ^x_it[:2] − α · sqrt(P_it[:2]);
    • cmax = ^x_it[:2] + α · sqrt(P_it[:2]);
    • c = BC(D_id);
    • If cmin < c < cmax (element-wise) then
    • DTm = DTm ∪ {(id, it)};
  7. T' = Ku(DTm); // Update the matched tracks.
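The uncertainty gate in step 6 can be sketched as follows: a rematched pair is accepted only if the detection's center lies within α standard deviations of the track's predicted center on both axes. The helper names here are illustrative, not the released implementation.

```python
import numpy as np

def box_center(box):
    """Center (cx, cy) of an (x1, y1, x2, y2) box."""
    return np.array([(box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0])

def uncertainty_gate(pred_center, P_diag_xy, det_box, alpha=2.0):
    """Accept a rematched pair only when the detection center falls inside
    pred_center ± alpha * sqrt(P) on both axes (Kalman++ step 6).

    pred_center: predicted (cx, cy) from the Kalman predict step.
    P_diag_xy:   diagonal of the error covariance for (cx, cy).
    """
    half = alpha * np.sqrt(P_diag_xy)
    cmin, cmax = pred_center - half, pred_center + half
    c = box_center(det_box)
    return bool(np.all(c > cmin) and np.all(c < cmax))
```

Pairs that pass the gate are appended to the matched set before the Kalman update; pairs that fail are discarded as likely spurious IoU matches.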