Dataset

Google Drive link
Comparison of existing SOT, MOT, GSOT, and GMOT datasets. "#" denotes the count of the respective item. Cat. and Vid. denote categories and videos. Obj.: average number of objects per frame. App.: appearance similarity (%) between objects in a frame, calculated as the average cosine similarity of objects in the same frame. Den.: density of objects in a frame, computed as the maximum number of objects at the same pixel. Occ.: occlusion between objects in a frame, represented by the average IoU ratio (%) of the bounding boxes in the same frame. Mot.: motion speed of objects in a video, calculated as the average IoU ratio (%) of the bounding boxes in the same track in consecutive frames.
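| Datasets     | Task | NLP    | #Cat. | #Vid. | #Frames | #Tracks | #Boxes | Obj.    | App.   | Den.       | Occ.   | Mot.   |
|--------------|------|--------|-------|-------|---------|---------|--------|---------|--------|------------|--------|--------|
| OTB2013      | SOT  |        | 10    | 51    | 29K     | 51      | 29K    | --      | --     | --         | --     | --     |
| VOT2017      | SOT  |        | 24    | 60    | 21K     | 60      | 21K    | --      | --     | --         | --     | --     |
| TrackingNet  | SOT  |        | 21    | 31K   | 14M     | 31K     | 14M    | --      | --     | --         | --     | --     |
| MOT17        | MOT  |        | 1     | 14    | 11.2K   | 1.3K    | 0.3M   | 39(35)  | 62(10) | 3.85(1.50) | 14(16) | 94(11) |
| MOT20        | MOT  |        | 1     | 8     | 13.41K  | 3.45K   | 1.65M  | 150(70) | 68(8)  | 6.42(1.20) | 15(15) | 96(4)  |
| Omni-MOT     | MOT  |        | 1     | --    | 14M+    | 250K    | 110M   | --      | --     | --         | --     | --     |
| DanceTrack   | MOT  |        | 1     | 100   | 105K    | 990     | --     | 9(5)    | 77(7)  | 2.67(0.99) | 21(17) | 90(9)  |
| TAO          | MOT  |        | 833   | 2.9K  | 2.6M    | 17.2K   | 333K   | 3(2)    | 69(7)  | 1.82(0.76) | 11(14) | 49(34) |
| SportsMOT    | MOT  |        | 1     | 240   | 150K    | 3.4K    | 1.62M  | 11(3)   | 73(8)  | 2.44(0.80) | 18(17) | 80(16) |
| GOT-10k      | GSOT |        | 563   | 10K   | 1.5M    | 10K     | 1.5M   | --      | --     | --         | --     | --     |
| Fish         | GSOT |        | 1     | 1.6K  | 527.2K  | 8.25K   | 516K   | --      | --     | --         | --     | --     |
| AnimalTrack  | GMOT |        | 10    | 58    | 24.7K   | 1.92K   | 429K   | 17(9)   | 72(8)  | 3.13(1.22) | 15(15) | 91(11) |
| GMOT-40      | GMOT |        | 10    | 40    | 9K      | 2.02K   | 256K   | 24(17)  | 71(9)  | 2.56(0.88) | 11(12) | 43(44) |
| LaSOT        | SOT  | coarse | 70    | 1.4K  | 3.52M   | 1.4K    | 3.52M  | --      | --     | --         | --     | --     |
| TNL2K        | SOT  | coarse | --    | 2K    | 1.24M   | 2K      | 1.24M  | --      | --     | --         | --     | --     |
| Refer-KITTI  | MOT  | coarse | 2     | 18    | 6.65K   | 637     | 28.72K | 5(4)    | 65(6)  | 1.78(0.74) | 11(11) | 73(21) |
| G2MOT (Ours) | GMOT | fine   | 20    | 253   | 157.2K  | 5.84K   | 1.87M  | 12(5)   | 74(8)  | 2.65(0.95) | 18(16) | 84(14) |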
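
The Occ. and Mot. columns are both described as averages of bounding-box IoU, so a small sketch may help make the definitions concrete. The code below is one possible reading, not the authors' evaluation script: it assumes boxes are given as (x_left, y_top, width, height), that Occ. averages IoU over all pairs of boxes in a frame, and that Mot. averages IoU over consecutive boxes of one track. The function names are hypothetical.

```python
from itertools import combinations


def iou(a, b):
    """IoU of two boxes given as (x_left, y_top, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def frame_occlusion(boxes):
    """Assumed Occ.: mean pairwise IoU (%) among the boxes of one frame."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 0.0
    return 100.0 * sum(iou(a, b) for a, b in pairs) / len(pairs)


def track_motion(track_boxes):
    """Assumed Mot.: mean IoU (%) between one track's boxes in consecutive frames."""
    steps = list(zip(track_boxes, track_boxes[1:]))
    if not steps:
        return 0.0
    return 100.0 * sum(iou(a, b) for a, b in steps) / len(steps)
```

Under this reading, a higher Mot. value means that consecutive boxes of a track overlap more, i.e. the object moves less between frames.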

Combining datasets in object tracking offers strategic advantages. First, each individual tracking dataset focuses on a specific set of challenges. Second, merging tracking datasets yields diverse challenges that require tracking models to perform efficiently in varied scenarios. By combining datasets, we can therefore evaluate a tracking model's ability to handle diverse scenarios, e.g., object motion, density, similar appearance, and occlusion, which is in line with the goal of the GMOT challenge. Our ultimate objective is to propose a new paradigm for GMOT and to create a challenging benchmark dataset covering various demanding real-world scenarios.

Each video in these datasets has been carefully annotated with a text label and a track label. The annotations are formatted in JSON, and we provide examples to illustrate how they are structured. This data, prepared by 4 annotators, will be shared publicly.

Text label for referring with specific attributes
{
    "id": "",
    "video_id": "",
    "is_eval": "",
    "type": "",
    "superset_idx": "",
    "class_name": "",
    "synonyms": [],
    "definition": "",
    "attributes": [],
    "track_path": "",
    "caption": ""
}
                    
Track label for associating objects' IDs through time
1, 1, xl, yt, w, h, 1, 1, 1
1, 2, xl, yt, w, h, 1, 1, 1
1, 3, xl, yt, w, h, 1, 1, 1
2, 1, xl, yt, w, h, 1, 1, 1
2, 2, xl, yt, w, h, 1, 1, 1
2, 3, xl, yt, w, h, 1, 1, 1
3, 1, xl, yt, w, h, 1, 1, 1
3, 2, xl, yt, w, h, 1, 1, 1
3, 3, xl, yt, w, h, 1, 1, 1
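
A short sketch of reading such a track file may be useful. The source does not name the columns, so the code assumes the common MOTChallenge-style order (frame index, object ID, left x, top y, width, height, followed by three extra fields that are ignored here). The function name `parse_tracks` and the numeric values in `sample` are purely illustrative.

```python
import csv
import io
from collections import defaultdict


def parse_tracks(text):
    """Group boxes by object ID: {object_id: [(frame, xl, yt, w, h), ...]}.

    Assumed column order: frame, object ID, xl, yt, w, h, then three
    fields that this sketch does not interpret.
    """
    tracks = defaultdict(list)
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:  # skip blank lines
            continue
        frame, obj_id = int(row[0]), int(row[1])
        xl, yt, w, h = (float(v) for v in row[2:6])
        tracks[obj_id].append((frame, xl, yt, w, h))
    for boxes in tracks.values():
        boxes.sort()  # order each track by frame index
    return dict(tracks)


# Made-up coordinates standing in for xl, yt, w, h.
sample = """1, 1, 10, 20, 30, 40, 1, 1, 1
1, 2, 50, 60, 30, 40, 1, 1, 1
2, 1, 12, 21, 30, 40, 1, 1, 1
"""
tracks = parse_tracks(sample)
```

Grouping by object ID rather than by frame keeps each trajectory contiguous, which is convenient when associating IDs through time.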
                        
Example annotation for video "airplane-1":
{
    "video": "airplane-1",
    "label": {
        "class_name": "helicopter",
        "class_synonyms": ["airplane", "aircraft", "jet", "plane"],
        "definition": "a vehicle designed for flight in the air",
        "include_attributes": ["black", "flying"],
        "exclude_attributes": [],
        "caption": "Track all black flying helicopters",
        "track_path": "airplane_01.txt"
    }
}
            
Example annotation for video "car-1":
{
    "video": "car-1",
    "label": {
        "class_name": "car",
        "class_synonyms": ["vehicle", "automobile", "auto", "transport", "transportation"],
        "definition": "mechanical device designed for transportation, powered by an engine or motor, equipped with four wheels",
        "include_attributes": ["white headlight", "oncoming traffic"],
        "exclude_attributes": ["red taillight", "opposite traffic"],
        "caption": "Track white headlight cars while excluding red taillight cars",
        "track_path": "car_01.txt"
    }
}
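
As a rough illustration of how a text label could be consumed, the sketch below parses one annotation written as strict JSON and pulls out the fields a tracker would likely need. It is an assumption-laden example rather than an official loader: `load_label`, `REQUIRED_KEYS`, and the `track_dir` argument are hypothetical, and the input is an abridged copy of the "car-1" example with quoted keys.

```python
import json
from pathlib import Path

# Assumption: these keys are treated as mandatory for illustration only.
REQUIRED_KEYS = {"class_name", "caption", "track_path"}


def load_label(raw, track_dir="."):
    """Parse one annotation string and return the fields used for tracking."""
    entry = json.loads(raw)
    label = entry["label"]
    missing = REQUIRED_KEYS - set(label)
    if missing:
        raise ValueError(f"label is missing keys: {sorted(missing)}")
    return {
        "video": entry["video"],
        "caption": label["caption"],
        "include": label.get("include_attributes", []),
        "exclude": label.get("exclude_attributes", []),
        # track_dir is a hypothetical folder holding the track files.
        "track_file": Path(track_dir) / label["track_path"],
    }


raw = """
{
  "video": "car-1",
  "label": {
    "class_name": "car",
    "include_attributes": ["white headlight", "oncoming traffic"],
    "exclude_attributes": ["red taillight", "opposite traffic"],
    "caption": "Track white headlight cars while excluding red taillight cars",
    "track_path": "car_01.txt"
  }
}
"""
info = load_label(raw, track_dir="tracks")
```

Keeping the include and exclude attribute lists separate mirrors the annotation itself, where the caption names both the objects to track and the ones to leave out.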