The Viola-Jones Object Detection Framework

The Viola-Jones Object Detection Framework is an efficient supervised machine learning approach to object detection. It locates objects by searching for distinctive visual features; in the algorithm, these features are large differences in pixel intensity that correspond to lines and edges.

The algorithm is particularly fast because it uses an integral image to calculate features (see stage 1). Its speed allows it to run in real time without advanced hardware, making it an ideal candidate for object detection on the Ethoscope platform [2].

Training the classifier

The classifier is trained on a bank of positive images (images containing the object) and negative images (images containing only the background). Training involves three stages, described below [2].

1. Detecting distinguishing features in an object

Features are identified using a feature window - a sliding window that passes over the image to test for regions of differing pixel intensity. Example two-, three- and four-feature windows are shown below (Figure 1.a). For each window, the sum of pixel intensities within the white region is subtracted from that within the grey region.
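As a rough illustration (not from the source), a two-feature window of this kind can be evaluated directly with NumPy. The function name and the left/right split are assumptions for the sketch:

```python
import numpy as np

def two_feature_value(img, x, y, w, h):
    """Evaluate a vertical two-feature window whose top-left pixel is
    (x, y). The w x h window is split into a white left half and a grey
    right half; the value is the grey sum minus the white sum."""
    half = w // 2
    white = img[y:y + h, x:x + half].sum()     # left (white) region
    grey = img[y:y + h, x + half:x + w].sum()  # right (grey) region
    return grey - white
```

A strong vertical edge inside the window gives a value of large magnitude, while a flat region gives a value near zero; this is what makes the feature a useful edge detector.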

The Viola-Jones algorithm generates all possible two-, three- and four-feature windows and applies them to each of the positive and negative images. This requires a very large number of calculations - a 24x24 image alone requires over 180,000 individual feature calculations. To keep this tractable, the algorithm uses an Integral Image that makes each region sum cheap to compute.

The Integral Image is a transformation of the original image in which each pixel's value is the sum of the intensity values above and to the left of it (inclusive). I.e. for a pixel at location (x, y):

ii(x, y) = \sum_{x' \leq x,\; y' \leq y} i(x', y')

where ii is the integral image and i is the original image.

This vastly simplifies finding the sum of intensities within a region: instead of calculating a series of column sums, only the four integral-image values at the corners of the rectangular region need to be read (Figure 1.b) [2].

Figure 1.a. Example two-, three- and four-feature windows. Windows A and B are used to detect the presence of vertical and horizontal edges. Window C is used to detect vertical lines while Window D is used to detect diagonal lines.

Figure 1.b. Demonstrating the efficiency of the integral image. The sum of intensities in area D can be found using only the corner pixels bounding the region in the integral image (positions 1, 2, 3 & 4).
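A minimal sketch of both ideas in NumPy, assuming an image stored as a 2-D array indexed [row, column]; the function names are illustrative:

```python
import numpy as np

def integral_image(img):
    """ii(x, y): sum of all pixels above and to the left of (x, y),
    inclusive - just a pair of cumulative sums."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of intensities in the w x h rectangle with top-left pixel
    (x, y), using only four corner look-ups (positions 1-4 in
    Figure 1.b) instead of summing every pixel."""
    top_left  = ii[y - 1, x - 1] if x > 0 and y > 0 else 0
    top_right = ii[y - 1, x + w - 1] if y > 0 else 0
    bot_left  = ii[y + h - 1, x - 1] if x > 0 else 0
    bot_right = ii[y + h - 1, x + w - 1]
    return bot_right - top_right - bot_left + top_left

# Self-check on a random 24x24 image: the four-corner formula agrees
# with a direct pixel sum over the same region.
img = np.random.randint(0, 256, (24, 24))
ii = integral_image(img)
assert rect_sum(ii, 3, 4, 5, 6) == img[4:10, 3:8].sum()
```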

2. Construction of weak classifiers from identified features

This stage searches the features calculated in the previous step and selects those that are most important for object classification. This is achieved by finding, for each feature, the threshold that best separates the positive and negative images. The best-performing features can then be used as ‘weak’ classifiers [2].
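The source does not spell out the search procedure, but a common formulation is an AdaBoost-style decision stump: try candidate thresholds for one feature and keep the one with the lowest weighted error. A sketch under those assumptions:

```python
import numpy as np

def best_stump(feature_values, labels, weights):
    """Find the threshold and polarity that best separate positive
    (label +1) from negative (label -1) training images for one feature.

    feature_values: this feature's value on every training image.
    weights: per-image weights (uniform initially; boosting re-weights
    them between rounds). Returns (threshold, polarity, error)."""
    best = (None, 1, np.inf)
    for threshold in np.unique(feature_values):
        for polarity in (1, -1):
            # Predict positive when polarity * value < polarity * threshold.
            pred = np.where(polarity * feature_values < polarity * threshold, 1, -1)
            error = weights[pred != labels].sum()  # weighted misclassification
            if error < best[2]:
                best = (threshold, polarity, error)
    return best
```

Any stump with error noticeably below 0.5 is useful: individually it is only a ‘weak’ classifier, but boosting combines many of them into a strong one.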

3. Creating a cascade of classifiers

The strength of classification is greatly improved when multiple weak classifiers are used in conjunction - i.e. to classify an image, the conditions of multiple classifiers must be met. A form of decision tree known as a cascade is constructed that strings together multiple classifiers: a candidate region must pass every stage to be classified as the object [2].
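A sketch of how such a cascade might be evaluated on one candidate window; the data layout and names here are assumptions, but the early-rejection behaviour is the defining property:

```python
def cascade_classify(window, stages):
    """Pass one candidate window through the cascade.

    stages: list of (weak_classifiers, stage_threshold) pairs, where
    weak_classifiers is a list of (classifier, alpha) pairs and each
    classifier maps a window to 1 (object-like) or 0 (background)."""
    for weak_classifiers, stage_threshold in stages:
        score = sum(alpha * clf(window) for clf, alpha in weak_classifiers)
        if score < stage_threshold:
            return False  # rejected early: most background windows stop here
    return True           # accepted by every stage: report a detection
```

Because the early stages contain only a handful of cheap features, the vast majority of background windows are discarded after very little work, which is a large part of the framework's speed.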

Applying the framework to a test image

Once the cascade is trained and constructed, it can be applied to an image using a sliding 24x24 window. At each position, the window is tested against the combinations of features dictated by the cascade [2].
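In practice the framework is rarely re-implemented from scratch; OpenCV ships trained Haar cascades together with a detector that handles the window scan (and rescaling, so objects larger than 24x24 are still found). A sketch using that API, where frame.png and the bundled face model are stand-ins for an actual image and cascade:

```python
import cv2

# Load a trained cascade; an application-specific cascade trained as
# above would be loaded from its own XML file in the same way.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("frame.png")
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Slide the detection window over the image at multiple scales.
for (x, y, w, h) in cascade.detectMultiScale(grey, scaleFactor=1.1,
                                             minNeighbors=5):
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```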
