Qualitative Result

We conduct extensive experiments to empirically evaluate our proposed Grounded-GMOT on the G²MOT problem, covering both detection with GroundingDINO and association with KAM-SORT.

Quantitative Results

We present a detailed comparison in the six tables below. The first table contrasts our Grounded-GMOT approach with the existing one-shot GMOT (OS-GMOT) and YOLOv8 on the G²MOT dataset. The second table shows the advantage of our KAM-SORT tracker over other state-of-the-art (SOTA) trackers. The third table evaluates our method under various settings. The fourth table examines the effectiveness and generalization of KAM-SORT by comparing it with other SOTA MOT methods on the MOT20 test set. The fifth table ablates each proposed component of KAM-SORT, and the sixth table ablates two hyper-parameters: θ (theta), which governs the similarity between two vectors, and α (alpha), which is used for uncertainty revision during tracking.

Tracking performance comparison of multiple trackers under various settings of MOT with YOLOv8, OS-GMOT (averaged over five runs), and our proposed Grounded-GMOT on the G²MOT dataset. The best score is in bold.

| Trackers | Detection | Settings | HOTA ↑ | MOTA ↑ | IDF1 ↑ | DetA ↑ | AssA ↑ |
|---|---|---|---|---|---|---|---|
| SORT | YOLOv8 | Fully-train | 5.48 | -145.61 | 0.80 | 5.78 | 6.47 |
| SORT | OS | Five runs of OS | 24.77 | 7.09 | 24.90 | 30.22 | 20.70 |
| SORT | Grounded-GMOT | Zero-shot | 40.73 | 46.57 | 44.52 | 45.13 | 37.26 |
| DeepSORT | YOLOv8 | Fully-train | 5.21 | -156.20 | 0.74 | 5.88 | 5.82 |
| DeepSORT | OS | Five runs of OS | 22.59 | -0.20 | 21.66 | 29.30 | 17.89 |
| DeepSORT | Grounded-GMOT | Zero-shot | 36.01 | 43.30 | 37.54 | 43.94 | 29.96 |
| ByteTrack | YOLOv8 | Fully-train | 6.02 | -140.81 | 0.84 | 5.80 | 7.53 |
| ByteTrack | OS | Five runs of OS | 25.16 | 8.02 | 26.46 | 29.38 | 21.94 |
| ByteTrack | Grounded-GMOT | Zero-shot | 39.89 | 45.83 | 45.65 | 43.35 | 37.12 |
| OC-SORT | YOLOv8 | Fully-train | 5.48 | -127.30 | 0.76 | 6.53 | 6.78 |
| OC-SORT | OS | Five runs of OS | 25.17 | 12.62 | 25.96 | 29.66 | 21.67 |
| OC-SORT | Grounded-GMOT | Zero-shot | 41.84 | 46.32 | 45.92 | 44.49 | 39.92 |
| Deep OC-SORT | YOLOv8 | Fully-train | 5.72 | -145.60 | 0.81 | 5.80 | 6.94 |
| Deep OC-SORT | OS | Five runs of OS | 25.65 | 7.06 | 25.92 | 30.47 | 21.92 |
| Deep OC-SORT | Grounded-GMOT | Zero-shot | 40.53 | 46.12 | 43.08 | 46.01 | 36.27 |
| MOTRv2 | YOLOv8 | Fully-train | 3.06 | 0.48 | 0.85 | 0.45 | 20.71 |
| MOTRv2 | OS | Five runs of OS | 28.69 | 14.18 | 29.43 | 26.32 | 34.88 |
| MOTRv2 | Grounded-GMOT | Zero-shot | 42.02 | 41.68 | 45.91 | 41.81 | 42.54 |

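The strongly negative MOTA values in the YOLOv8 rows are consistent with the standard CLEAR MOT definition, MOTA = 1 − (FN + FP + IDSW) / GT, which falls below zero once the total number of errors exceeds the number of ground-truth boxes. The following is a minimal Python sketch of that formula; the function name `mota` and the counts are illustrative assumptions and are not taken from the G²MOT evaluation.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """CLEAR MOT accuracy as a percentage; it can be negative."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 100.0 * (1.0 - errors / num_gt)


# Hypothetical counts: a detector that floods the tracker with false
# positives makes more errors than there are ground-truth boxes.
noisy = mota(false_negatives=900, false_positives=1500,
             id_switches=50, num_gt=1000)
clean = mota(false_negatives=100, false_positives=50,
             id_switches=10, num_gt=1000)
print(f"noisy detector: {noisy:.2f}")
print(f"clean detector: {clean:.2f}")
```

Because the error count is not bounded by the number of ground-truth boxes, MOTA has no lower limit, which is why it can reach values such as those reported for the fully trained YOLOv8 setting.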
Tracking performance comparison between the existing trackers and our proposed KAM-SORT tracker on the G²MOT dataset. The best score is in bold.

| Trackers | Settings | HOTA ↑ | MOTA ↑ | IDF1 ↑ | DetA ↑ | AssA ↑ |
|---|---|---|---|---|---|---|
| SORT [10] | Grounded-GMOT | 40.73 | 46.57 | 44.52 | 45.13 | 37.26 |
| DeepSORT [11] | Grounded-GMOT | 36.01 | 43.30 | 37.54 | 43.94 | 29.96 |
| ByteTrack [12] | Grounded-GMOT | 39.89 | 45.83 | 45.65 | 43.35 | 37.12 |
| OC-SORT [13] | Grounded-GMOT | 41.84 | 46.32 | 45.92 | 44.49 | 39.92 |
| Deep OC-SORT [14] | Grounded-GMOT | 40.53 | 46.12 | 43.08 | 46.01 | 36.27 |
| MOTRv2 [15] | Partly-trained | 42.02 | 41.68 | 45.91 | 41.81 | 42.54 |
| KAM-SORT (Ours) | Grounded-GMOT | 43.03 | 46.60 | 47.13 | 46.05 | 40.80 |

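As a reading aid for the HOTA, DetA, and AssA columns: in the standard HOTA definition, the score at a single localization threshold is the geometric mean of the detection and association accuracies at that threshold, and the reported value averages over thresholds. The geometric mean of the final DetA and AssA is therefore only an approximation of the reported HOTA. The sketch below checks how close that approximation is for the rows of the table above; the dictionary name `rows` and the helper `geo_mean` are illustrative.

```python
import math

# (HOTA, DetA, AssA) copied from the G²MOT tracker comparison table.
rows = {
    "SORT": (40.73, 45.13, 37.26),
    "DeepSORT": (36.01, 43.94, 29.96),
    "ByteTrack": (39.89, 43.35, 37.12),
    "OC-SORT": (41.84, 44.49, 39.92),
    "Deep OC-SORT": (40.53, 46.01, 36.27),
    "MOTRv2": (42.02, 41.81, 42.54),
    "KAM-SORT": (43.03, 46.05, 40.80),
}


def geo_mean(det_a: float, ass_a: float) -> float:
    """Geometric mean of detection and association accuracy."""
    return math.sqrt(det_a * ass_a)


# Gap between the approximation and the reported HOTA, per tracker.
gaps = {name: geo_mean(det_a, ass_a) - hota
        for name, (hota, det_a, ass_a) in rows.items()}

for name, gap in gaps.items():
    print(f"{name:13s} sqrt(DetA*AssA) - HOTA = {gap:+.2f}")
```

The small, consistently positive gaps illustrate that a tracker needs both good detection and good association to score well on HOTA.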
Tracking performance of KAM-SORT on G²MOT with various settings.

| Settings | HOTA ↑ | MOTA ↑ | IDF1 ↑ | DetA ↑ | AssA ↑ |
|---|---|---|---|---|---|
| attribute + classname | 42.20 | 43.26 | 45.29 | 44.73 | 40.15 |
| definition | 34.04 | 26.45 | 35.83 | 34.00 | 34.49 |
| caption | 43.03 | 46.60 | 47.13 | 46.05 | 40.80 |

Ablation study on the effectiveness of KAM-SORT on the MOT20 test set for the MOT task. Because ByteTrack and OC-SORT use different thresholds for the test-set sequences together with an offline interpolation procedure, we also report their scores with these disabled. Because Deep OC-SORT uses separate weights for YOLOX, we also report its scores after retraining YOLOX on the MOT20 training set, as in Deep OC-SORT.

| Trackers | HOTA ↑ | MOTA ↑ | IDF1 ↑ |
|---|---|---|---|
| MeMOT [16] | 54.1 | 63.7 | 66.1 |
| FairMOT [17] | 54.6 | 61.8 | 67.3 |
| GSDT [18] | 53.6 | 67.1 | 67.5 |
| CSTrack [19] | 54.0 | 66.6 | 68.6 |
| ByteTrack [20] | 61.3 | 77.8 | 75.2 |
| OC-SORT [21] | 62.4 | 75.7 | 76.3 |
| Deep OC-SORT [22] | 63.9 | 75.6 | 79.2 |
| ByteTrack [20] | 60.4 | 74.2 | 74.5 |
| OC-SORT [21] | 60.5 | 73.1 | 74.4 |
| Deep OC-SORT [22] | 59.6 | 75.3 | 75.2 |
| KAM-SORT (Ours) | 62.6 | 75.2 | 76.9 |

An ablation study conducted on the G²MOT dataset to demonstrate the impact of each proposed component within KAM-SORT.
Exp. Appearance-Motion Balance Kalman++ HOTA ↑ MOTA ↑ IDF1 ↑ DetA ↑ AssA ↑
#1 40.53 46.12 43.08 46.01 37.27
#2 41.90 46.35 45.27 46.02 38.71
#3 43.03 46.60 47.12 46.05 40.79
#4 43.03 46.60 47.13 46.05 40.80
Ablation study on the hyper-parameters of KAM-SORT. Left columns: vector similarity θ. Right columns: uncertainty revision α.

| θ | HOTA ↑ | MOTA ↑ | IDF1 ↑ | α | HOTA ↑ | MOTA ↑ | IDF1 ↑ |
|---|---|---|---|---|---|---|---|
| 22.5° | 42.963 | 46.586 | 47.013 | 0.5 | 43.026 | 46.601 | 47.123 |
| 45° | 43.010 | 46.600 | 47.091 | 1 | 43.027 | 46.601 | 47.126 |
| 67.5° | 43.020 | 46.601 | 47.122 | 2 | 43.026 | 46.602 | 47.131 |
| 80° | 43.027 | 46.601 | 47.126 | 3 | 43.026 | 46.602 | 47.131 |
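
To make the role of an angular threshold such as θ concrete, the sketch below computes the angle between two 2-D vectors and compares it with a threshold in degrees. This section does not specify how KAM-SORT applies θ internally, so the function names `angle_deg` and `is_similar`, the use of 2-D direction vectors, and the "angle at most θ" rule are all assumptions made for illustration only.

```python
import math


def angle_deg(u, v):
    """Angle in degrees between two non-zero 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    norm = math.hypot(u[0], u[1]) * math.hypot(v[0], v[1])
    if norm == 0.0:
        raise ValueError("vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))


def is_similar(u, v, theta_deg=80.0):
    """Assumed rule: vectors count as similar if their angle is <= theta."""
    return angle_deg(u, v) <= theta_deg


# Hypothetical direction vectors that are roughly 45 degrees apart.
track_dir = (1.0, 0.0)
det_dir = (1.0, 1.0)
print(is_similar(track_dir, det_dir, theta_deg=67.5))  # looser threshold
print(is_similar(track_dir, det_dir, theta_deg=22.5))  # stricter threshold
```

Under this assumed rule, a larger θ accepts more vector pairs as similar. The default of 80° is simply the largest value tested in the ablation.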