Tracking Methods

Simple Online Realtime Tracking (SORT)

Simple Online Realtime Tracking (SORT) associates objects detected in video with individual tracks. It also mitigates some of the problems caused by occlusion (when an object is hidden) by predicting the location of the hidden object and re-identifying it as it re-emerges. This is useful because reasonable tracking can be obtained even with an imperfect object detection method [6]. The algorithm consists of four steps:

1. Detection

An object detection method is applied to the current frame, returning the bounding box coordinates of every detected object [6].

2. Estimation

The algorithm attempts to predict the new location of each tracked object. It does this using a linear constant velocity model [6].
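As a rough illustration of what a constant velocity prediction looks like, the sketch below advances a track's box centre by its current velocity. The class name TrackState, its fields, and the function predict are hypothetical and chosen only for this example; the model described in [6] may carry additional state (such as box size) that is omitted here.

```python
from dataclasses import dataclass


@dataclass
class TrackState:
    # Hypothetical, simplified state: box centre and per-frame velocity.
    cx: float
    cy: float
    vx: float
    vy: float


def predict(state: TrackState, dt: float = 1.0) -> TrackState:
    """Advance the state by dt frames, assuming the velocity does not change."""
    return TrackState(
        cx=state.cx + state.vx * dt,
        cy=state.cy + state.vy * dt,
        vx=state.vx,
        vy=state.vy,
    )


if __name__ == "__main__":
    state = TrackState(cx=100.0, cy=50.0, vx=2.0, vy=-1.0)
    print(predict(state))
```

Because the velocity is held fixed between frames, the prediction is only as good as the assumption that the object keeps moving in a straight line at constant speed.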

3. Association

The predicted location of each tracked object is matched with the object detections from step 1. For this, an assignment cost matrix is computed from the intersection-over-union (IOU) between each detected and each predicted bounding box. A parameter IOUmin sets the minimum IOU a pair must reach to be considered for assignment. The assignment is then solved optimally using the Hungarian algorithm. After assignment, a Kalman filter corrects the linear velocity model so that its predictions align with the assigned detections [6].
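The matching step can be sketched as follows. This is a minimal illustration, not the implementation from [6]: boxes are assumed to be (x1, y1, x2, y2) tuples, the function names and the iou_min value of 0.3 are illustrative, and the optimal assignment is found by brute-force search over permutations as a stand-in for the Hungarian algorithm, so it is only practical for a handful of objects. The Kalman correction is not shown.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(predicted, detections, iou_min=0.3):
    """Return (matches, unmatched_track_indices, unmatched_detection_indices)."""
    n_t, n_d = len(predicted), len(detections)
    scores = [[iou(p, d) for d in detections] for p in predicted]

    # Enumerate every one-to-one pairing of the smaller set with the larger one.
    # Stand-in for the Hungarian algorithm: same optimum, factorial cost.
    if n_t <= n_d:
        candidates = (list(zip(range(n_t), perm))
                      for perm in permutations(range(n_d), n_t))
    else:
        candidates = (list(zip(perm, range(n_d)))
                      for perm in permutations(range(n_t), n_d))

    best_pairs, best_total = [], -1.0
    for pairs in candidates:
        total = sum(scores[t][d] for t, d in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total

    # Reject pairs whose overlap falls below the minimum IOU.
    matches = [(t, d) for t, d in best_pairs if scores[t][d] >= iou_min]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_tracks = [t for t in range(n_t) if t not in matched_t]
    unmatched_dets = [d for d in range(n_d) if d not in matched_d]
    return matches, unmatched_tracks, unmatched_dets


if __name__ == "__main__":
    predicted = [(0, 0, 10, 10), (20, 20, 30, 30)]
    detections = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
    print(associate(predicted, detections))
```

In this example the third detection overlaps no predicted box, so it is returned as unmatched and would become a candidate for a new track.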

4. Creation and Deletion of Track Identities

If a new object enters the frame and its detected bounding box has an IOU below IOUmin with every tracked object, it is assigned a new track.

Conversely, if a tracked object is not detected for a number of frames (governed by the parameter TLost), the track is terminated [6].
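The bookkeeping for creating and deleting tracks might look like the sketch below. The names Track, TrackManager, and t_lost are hypothetical, and the rule used here (delete once a track has gone t_lost consecutive frames without a detection) is one reading of the TLost parameter rather than a confirmed detail of [6].

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Track:
    track_id: int
    box: tuple
    frames_lost: int = 0  # consecutive frames without a matched detection


class TrackManager:
    def __init__(self, t_lost):
        self.t_lost = t_lost
        self.tracks = []
        self._ids = count(1)  # source of fresh track identities

    def update(self, matches, unmatched_tracks, unmatched_boxes):
        """matches: (track_index, box) pairs; unmatched_tracks: track indices;
        unmatched_boxes: detected boxes that matched no existing track."""
        for idx, box in matches:
            self.tracks[idx].box = box
            self.tracks[idx].frames_lost = 0
        for idx in unmatched_tracks:
            self.tracks[idx].frames_lost += 1
        # Terminate tracks that have been missing for t_lost frames.
        self.tracks = [t for t in self.tracks if t.frames_lost < self.t_lost]
        # Every unmatched detection starts a new track.
        for box in unmatched_boxes:
            self.tracks.append(Track(next(self._ids), box))


if __name__ == "__main__":
    manager = TrackManager(t_lost=2)
    manager.update([], [], [(0, 0, 10, 10)])      # first detection: track 1 created
    manager.update([], [0], [])                   # track 1 missed once, kept
    manager.update([], [0], [(40, 40, 50, 50)])   # missed twice: deleted, track 2 created
    print(manager.tracks)
```

A larger t_lost lets a track survive longer occlusions, at the cost of keeping stale tracks alive that may later be matched to the wrong object.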

Simple Online Realtime Tracking with a Deep Association Metric (DeepSORT)

While SORT has some capacity for object re-identification (when an occluded object is redetected and re-assigned to the correct track) through its Estimation step, the linear constant velocity model used to predict an object's location is, on its own, a poor predictor of the erratic and unpredictable motion of a fruit fly. DeepSORT attempts to remedy this by extending the SORT algorithm to track based not only on spatial distance but also on visual distance: it compares the appearance of a detected object with the previous appearances of tracked objects. This visual distance is derived from an appearance descriptor, a large vector generated by a deep convolutional network and associated with each tracked object [6].
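To make the idea of a visual distance concrete, the sketch below compares a new detection's descriptor with a track's stored descriptors. Cosine distance and the smallest-distance rule are assumptions made for this illustration, since the text above does not name the metric, and the short vectors stand in for the much larger descriptors a convolutional network would produce. The function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def appearance_distance(descriptor, gallery):
    """Smallest cosine distance between a detection's descriptor and any
    descriptor previously stored for a track (the track's gallery)."""
    return min(cosine_distance(descriptor, stored) for stored in gallery)


if __name__ == "__main__":
    # Toy 3-element descriptors; real ones would be far longer.
    gallery = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
    same_fly = [0.85, 0.15, 0.05]
    other_fly = [0.0, 0.2, 0.9]
    print(appearance_distance(same_fly, gallery))
    print(appearance_distance(other_fly, gallery))
```

A detection whose descriptor lies close to a track's earlier appearances yields a small distance, which lets the tracker favour that track even when the motion prediction is unreliable.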