Comparison between OneShot-GMOT (OS-GMOT) (left) and our Grounded-GMOT (right)
Traditional Multiple Object Tracking (MOT) systems struggle in diverse and dynamic environments such as surveillance and autonomous driving. Our work is motivated by the need for more adaptable tracking technologies that can handle the complexity of real-world scenarios. With this goal in mind, we present three contributions:
Grounded-GMOT: This novel tracking paradigm leverages natural language processing, enabling the tracking of objects with varied attributes across different settings. This approach significantly expands the applicability of MOT by using descriptive language as a flexible, powerful tool for tracking.
G2MOT dataset: We unveil a large-scale dataset that surpasses existing collections in both diversity and size. The G2MOT dataset is designed to support the nuanced needs of Grounded-GMOT, providing a rich resource for developing and testing advanced tracking systems.
KAM-SORT: Our innovative tracking methodology combines camera motion analysis with adjustments in the balance between motion and appearance information. KAM-SORT represents a significant improvement in tracking accuracy and robustness, addressing the dynamic challenges of real-world object tracking.
Through these contributions, we aim to push the boundaries of what's possible in object tracking, offering solutions that are not only technically advanced but also broadly accessible and adaptable to the evolving demands of various applications.
Ensuring a fair assessment of GMOT methods demands a dataset of consistent quality, free from annotator bias, and with a clearly defined problem setup. To offer comprehensive coverage of real-world scenarios across different research domains, our released dataset embodies two characteristics:
Examples to illustrate the efficacy of IE-Strategy. Left: Output from pre-trained VLM. Right: Output from IE-Strategy.
Module 1: Diversity. The dataset integrates a wide array of object categories from various sources. This diversity encompasses a broad spectrum of classes and properties, including motion, occlusion, appearance similarity, and density. Notably, it uses high-level semantics (like player, athlete, referee) to describe objects in complex contexts, moving beyond generic descriptors.
Statistical information of our proposed G2MOT dataset.
Module 2: Fine-Grained Annotation. Beyond visual attributes (color, texture, attachments), the dataset provides extensive textual descriptions and synonyms. These annotations offer a granular view of each object, enhancing the dataset's utility for rigorous GMOT method assessment.
Demonstration of superset and subset from horse_4 and car-3 in our proposed G2MOT dataset.
KAM-SORT:
The KAM-SORT method addresses tracking within the Generic Multiple Object Tracking (GMOT) setting. It computes the cost matrix between existing tracks 𝒯 and new detections 𝒟 using a novel approach that combines motion and visual appearance cues.
Key contributions and advantages of this method include:
Overall, the KAM-SORT method exhibits a strategic balance between visual and motion cues, addressing the challenges presented by the high similarity of objects in GMOT. This advancement represents a significant step forward in tracking technology, especially in scenarios with rapid motion and deformation.
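As a rough illustration of a cost matrix between tracks 𝒯 and detections 𝒟 that mixes motion and appearance cues, the sketch below blends an IoU-based motion cost with a cosine-distance appearance cost. It is not the KAM-SORT formulation: the function names, the fixed weight w_app, and the toy boxes and embeddings are all assumptions made for this example, and camera motion analysis and any adaptive balancing are left out.

```python
import numpy as np

def iou_matrix(track_boxes, det_boxes):
    """Pairwise IoU between boxes given as [x1, y1, x2, y2]."""
    t = track_boxes[:, None, :]  # shape (T, 1, 4)
    d = det_boxes[None, :, :]    # shape (1, D, 4)
    inter_w = np.clip(np.minimum(t[..., 2], d[..., 2]) - np.maximum(t[..., 0], d[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(t[..., 3], d[..., 3]) - np.maximum(t[..., 1], d[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    area_d = (d[..., 2] - d[..., 0]) * (d[..., 3] - d[..., 1])
    union = area_t + area_d - inter
    return inter / np.maximum(union, 1e-9)

def cosine_distance(track_feats, det_feats):
    """Pairwise cosine distance between appearance embeddings."""
    a = track_feats / np.maximum(np.linalg.norm(track_feats, axis=1, keepdims=True), 1e-9)
    b = det_feats / np.maximum(np.linalg.norm(det_feats, axis=1, keepdims=True), 1e-9)
    return 1.0 - a @ b.T

def combined_cost(track_boxes, det_boxes, track_feats, det_feats, w_app=0.5):
    """Blend motion and appearance costs.

    w_app is a hypothetical fixed weight used only for this sketch;
    it stands in for whatever balancing the actual method applies.
    """
    motion_cost = 1.0 - iou_matrix(track_boxes, det_boxes)
    appearance_cost = cosine_distance(track_feats, det_feats)
    return (1.0 - w_app) * motion_cost + w_app * appearance_cost

# Toy data (made up for illustration): two tracks, two detections in swapped order.
track_boxes = np.array([[0.0, 0.0, 10.0, 10.0], [20.0, 20.0, 30.0, 30.0]])
det_boxes = np.array([[21.0, 21.0, 31.0, 31.0], [1.0, 1.0, 11.0, 11.0]])
track_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
det_feats = np.array([[0.0, 1.0], [1.0, 0.0]])

cost = combined_cost(track_boxes, det_boxes, track_feats, det_feats)
best_det_per_track = cost.argmin(axis=1)  # lowest-cost detection for each track
```

In practice a matrix like this would be passed to an assignment solver such as the Hungarian algorithm rather than a per-row argmin. When objects look nearly identical, as is common in GMOT, a smaller w_app leans the match toward motion.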
Tracking comparison on fast-moving objects between our KAM-SORT and SORT, OC-SORT, and DeepOCSORT on the video “insect-3”.