Simple Online and Realtime Tracking (SORT) associates objects detected in video with individual tracks. It also alleviates some of the issues caused by occlusion (when an object is hidden) by predicting the location of the hidden object so that it can be re-identified when it re-emerges. This is useful because it means reasonable tracking can be obtained with an imperfect object detection method [6]. The algorithm consists of four steps:
1. Detection
An object detection method is applied to the current frame, returning the bounding box coordinates of all detected objects [6].
2. Estimation
3. Association
4. Creation and Deletion of Track Identities
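As a rough illustration of how the four steps fit together, the following is a minimal sketch of a single per-frame update, not the reference SORT implementation. It makes several simplifying assumptions: detections arrive as `(x1, y1, x2, y2)` tuples from some external detector, Estimation is a plain constant-velocity shift rather than a Kalman filter, and Association is a greedy best-overlap match rather than an optimal assignment. The names `Track`, `step`, `iou_min` and `max_misses`, and their default values, are hypothetical and chosen only for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center(box):
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


class Track:
    def __init__(self, track_id, box):
        self.id = track_id
        self.box = box
        self.prev_box = box
        self.velocity = (0.0, 0.0)  # per-frame displacement in pixels
        self.misses = 0             # consecutive frames without a match

    def predict(self):
        """Estimation: shift the box by the last estimated velocity."""
        self.prev_box = self.box
        vx, vy = self.velocity
        x1, y1, x2, y2 = self.box
        self.box = (x1 + vx, y1 + vy, x2 + vx, y2 + vy)

    def update(self, box):
        """Correct the track with a matched detection."""
        (cx, cy), (px, py) = center(box), center(self.prev_box)
        self.velocity = (cx - px, cy - py)
        self.box = box
        self.misses = 0


def step(tracks, detections, next_id, iou_min=0.3, max_misses=1):
    """Process one frame; returns the surviving tracks and the next free id."""
    # Estimation: predict where every existing track should be now.
    for t in tracks:
        t.predict()

    # Association (simplified): take pairs in order of decreasing overlap.
    pairs = sorted(
        ((iou(t.box, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    used_t, used_d = set(), set()
    for score, ti, di in pairs:
        if score < iou_min:
            break
        if ti in used_t or di in used_d:
            continue
        tracks[ti].update(detections[di])
        used_t.add(ti)
        used_d.add(di)

    # Unmatched tracks keep their predicted box and accumulate a miss.
    for ti, t in enumerate(tracks):
        if ti not in used_t:
            t.misses += 1

    # Creation: every unmatched detection starts a new track identity.
    for di, d in enumerate(detections):
        if di not in used_d:
            tracks.append(Track(next_id, d))
            next_id += 1

    # Deletion: drop tracks that have gone unmatched for too long.
    tracks = [t for t in tracks if t.misses <= max_misses]
    return tracks, next_id
```

Calling `step` once per frame with that frame's detections keeps an object's identity through a short gap in detection, provided the object keeps moving roughly as predicted.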
While SORT has some capacity for object re-identification (when an occluded object is redetected and re-assigned to the correct track) in the form of Estimation, the linear constant velocity model used to predict the location of an object is, on its own, a poor predictor of the erratic motion of a fruit fly. DeepSORT attempts to remedy this by extending the SORT algorithm to track based not only on spatial distance but also on visual distance, which compares the appearance of a detected object to the previous appearances of tracked objects. This visual distance is derived from an appearance descriptor: a large vector (generated by a deep convolutional network) associated with each tracked object [6].
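To make the idea of visual distance concrete, here is a small sketch of one common way to compare appearance descriptors: take the smallest cosine distance between a new detection's descriptor and the stored descriptors of a track. The short toy vectors stand in for the output of the convolutional network, and the function names, the `weight` parameter and the way the two distances are blended in `combined_cost` are assumptions made for illustration rather than details taken from this project.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 1.0  # treat an all-zero descriptor as maximally dissimilar
    return 1.0 - dot / norm


def appearance_distance(descriptor, gallery):
    """Smallest distance between a detection and a track's past appearances."""
    return min(cosine_distance(descriptor, g) for g in gallery)


def combined_cost(spatial, visual, weight=0.5):
    """Blend spatial and visual distance (hypothetical weighting)."""
    return weight * spatial + (1.0 - weight) * visual


# Toy example: two tracks, each with a gallery of earlier descriptors.
gallery_a = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
gallery_b = [[0.0, 1.0, 0.0]]
detection = [1.0, 0.0, 0.0]

dist_a = appearance_distance(detection, gallery_a)
dist_b = appearance_distance(detection, gallery_b)
best = "a" if dist_a < dist_b else "b"
```

Because the comparison uses every stored appearance of a track, a detection can still be matched to the right track after the motion prediction has drifted, as long as the object still looks similar to one of its earlier appearances.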