Results

Comparison of Viola-Jones models

Figure 7.a. contains a side-by-side comparison of video inference with the curated and enriched Viola-Jones object detection models. Both models correctly detect well-lit, well-dispersed fruit flies. However, both appear to consistently misidentify:

1. Flies at boundaries of intensity
2. Flies grouped together in the shaded corners (particularly in the lower right)
3. A portion of the experimental apparatus as a false positive throughout the video.

The performance metrics in Figure 7.b. indicate that the enriched model, with a smaller Absolute Deviation, is more effective than the curated model. However, both maintain a similar Mean Fly Count that underpredicts the true count of 41 by roughly 3.5 to 4 flies. This implies the gain in efficacy is minor, since object detection appears to fail in the same edge cases. This is further reinforced by Figure 7.c., which depicts the deviation per frame of both models. Although the plot shows that the enriched model has a lower deviation, both curves appear to follow the same pattern.

Figure 7.a. Side-by-side comparison of Viola-Jones video inference. The predicted bounding boxes for each frame from each detection method are overlaid on the test video.

Figure 7.b. Performance metrics for Viola-Jones video inference.

Model	Absolute Deviation	Mean Fly Count (True count = 41)
Enriched	14757	37.503
Curated	16530	37.077

Figure 7.c. Deviation plots for Viola-Jones video inference. The deviation plots show the difference between the number of predicted flies and the true number of flies for each frame of video.
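The two metrics in Figure 7.b. and the per-frame deviation in Figure 7.c. can be sketched as follows. This is a minimal illustration, not the code used to produce the results: it assumes that Absolute Deviation is the sum over all frames of |predicted count - true count|, and the function names and sample counts are hypothetical.

```python
from typing import List, Sequence

TRUE_COUNT = 41  # true number of flies in the test video, as stated in the tables


def per_frame_deviation(predicted_counts: Sequence[int],
                        true_count: int = TRUE_COUNT) -> List[int]:
    """Signed difference (predicted - true) for each frame, as plotted in a deviation plot."""
    return [p - true_count for p in predicted_counts]


def absolute_deviation(predicted_counts: Sequence[int],
                       true_count: int = TRUE_COUNT) -> int:
    """Assumed definition: sum of |predicted - true| over all frames."""
    return sum(abs(d) for d in per_frame_deviation(predicted_counts, true_count))


def mean_fly_count(predicted_counts: Sequence[int]) -> float:
    """Average number of detections per frame."""
    return sum(predicted_counts) / len(predicted_counts)


# Hypothetical detection counts for five frames (not taken from the experiment).
counts = [38, 37, 41, 36, 39]
print(per_frame_deviation(counts))  # [-3, -4, 0, -5, -2]
print(absolute_deviation(counts))   # 14
print(mean_fly_count(counts))       # 38.2
```

Under this assumed definition, Absolute Deviation grows with the number of frames, so it is only comparable between models evaluated on the same video.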

Comparison of CNN models to Viola-Jones

Figure 8.a. contains a side-by-side comparison of video inference with the Enriched Viola-Jones, Faster R-CNN and YOLOv4 object detection models. Faster R-CNN visibly outperforms the Enriched Viola-Jones model: it more consistently detects flies at boundaries of intensity and distinguishes individual flies within crowded clusters. This is reflected in the performance metrics in Figure 8.b., where Faster R-CNN has an Absolute Deviation more than three times smaller than the Enriched Viola-Jones method (4704 versus 14757). It also predicts a mean fly count almost identical to the true count (40.763 versus 41), making it well suited to this object detection problem.

The YOLOv4 model also appears to improve upon the Enriched Viola-Jones framework: Figure 8.a. indicates it can more accurately detect flies in crowds and in the four corner wells. This is reflected in the performance metrics, where YOLOv4 has a considerably smaller Absolute Deviation than Viola-Jones (8535 versus 14757) and a mean fly count closer to the true fly count (40.257 versus 37.503). However, the model also consistently predicts a high number of false positives in Figure 8.a., meaning these performance metrics may be misleading. If corrected for these false positives, the mean fly count is likely to underpredict the true number of flies and the Absolute Deviation will rise.
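A count-based metric cannot tell a correct detection from a spurious one, which is why false positives can flatter the figures. The sketch below illustrates this with hypothetical numbers for a single frame; the function names and values are illustrative only and are not measurements from the experiment.

```python
def count_deviation(true_positives: int, false_positives: int, true_count: int) -> int:
    """Deviation of the raw detection count, which includes false positives."""
    predicted = true_positives + false_positives
    return predicted - true_count


def corrected_deviation(true_positives: int, true_count: int) -> int:
    """Deviation once false positives are removed from the detection count."""
    return true_positives - true_count


# Hypothetical frame: 41 flies present, 37 detected correctly, 3 spurious boxes.
tp, fp, true_count = 37, 3, 41
print(count_deviation(tp, fp, true_count))  # -1
print(corrected_deviation(tp, true_count))  # -4
```

In this example the raw count is off by one fly, while the corrected count is off by four: the false positives partly mask the missed detections.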

Figure 8.a. Side-by-side comparison of video inference for Enriched Viola-Jones, Faster R-CNN and YOLOv4. Though the video output for YOLOv4 displays the DeepSORT tracking IDs for each bounding box, these detections are created only through YOLOv4 video inference.

Figure 8.b. Performance metrics for Viola-Jones, Faster R-CNN and YOLOv4.

Model	Absolute Deviation	Mean Fly Count (True count = 41)
Viola-Jones Enriched	14757	37.503
Faster R-CNN	4704	40.763
YOLOv4	8535	40.257

Evaluation of Tracking Methods

Figure 9.a. indicates that both SORT pipelines track more accurately than the DeepSORT pipeline: the Faster R-CNN and Viola-Jones pipelines maintain larger mean track lengths, with unique track counts closer to the true fly count, than YOLOv4/DeepSORT. Figure 9.b. expands on this, showing a gradual decrease in frequency with increasing track length for both the Faster R-CNN and Viola-Jones SORT pipelines, but a much steeper decline with DeepSORT (with very few tracks of length > 200).

While the Faster R-CNN + SORT pipeline may track more accurately than the Viola-Jones + SORT pipeline, it has a drastically lower detection framerate (2.49 FPS versus 35.68 FPS), preventing it from being deployed in a real-time tracking scenario.

Figure 9.a. Performance metrics for Tracking. An accurate model will have a large mean track length and a number of unique tracks close to the number of flies in the footage. An efficient model will have a high framerate, reducing the computational time for tracking.

Model	Mean Track Length	Number of Unique Tracks (True count = 41)	Detection Framerate (FPS)
Viola-Jones Enriched & SORT	748.981	209	35.68
Faster R-CNN & SORT	949.427	178	2.49
YOLOv4 & DeepSORT	368.401	468	0.40

Figure 9.b. Track length frequency distributions. A well-performing model will have a greater frequency of longer track lengths.
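The tracking metrics in Figure 9.a. and the distribution in Figure 9.b. can be derived from the tracker output along the following lines. This is a minimal sketch, assuming a track's length is the number of frames in which its ID appears and that the tracker output is available as (frame index, track ID) pairs; the function names, the binning scheme and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def track_lengths(observations: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Map each track ID to the number of frames it appears in (assumed length definition)."""
    return dict(Counter(track_id for _, track_id in observations))


def mean_track_length(lengths: Dict[int, int]) -> float:
    """Average track length over all unique track IDs."""
    return sum(lengths.values()) / len(lengths)


def length_histogram(lengths: Dict[int, int], bin_width: int) -> Dict[int, int]:
    """Number of tracks per length bin; each key is the lower bound of its bin."""
    hist = Counter((n // bin_width) * bin_width for n in lengths.values())
    return dict(sorted(hist.items()))


# Hypothetical output for two flies over six frames: fly A keeps ID 1 throughout,
# fly B is tracked as ID 2 for three frames and then re-identified as ID 3.
observations = (
    [(f, 1) for f in range(6)]
    + [(f, 2) for f in range(3)]
    + [(f, 3) for f in range(3, 6)]
)

lengths = track_lengths(observations)
print(len(lengths))                            # 3 unique tracks for 2 flies
print(mean_track_length(lengths))              # 4.0
print(length_histogram(lengths, bin_width=2))  # {2: 2, 6: 1}
```

The example shows why the two metrics move together: each identity switch adds a unique track and shortens the mean track length.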