Before detection, images are integrated. Each pixel (x, y) of the integral image holds the sum of all pixels of the initial image from (0, 0) to (x, y). This makes computing the sum over any rectangular region much faster; in addition, convolution at any scale then has the same time complexity. This step is necessary only when box filters are used for detection.
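The integral image idea above can be sketched as follows (an illustrative Python sketch, not MatchPoint's actual implementation): the sum over any rectangle takes only four lookups, regardless of its size.

```python
import numpy as np

def integral_image(img):
    """Build the integral image: ii[y, x] = sum of img[0..y, 0..x]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def region_sum(ii, y0, x0, y1, x1):
    """Sum of the original image over the inclusive rectangle
    (y0, x0)..(y1, x1), using at most four lookups in the
    integral image (inclusion-exclusion on the corners)."""
    s = ii[y1, x1]
    if y0 > 0:
        s -= ii[y0 - 1, x1]
    if x0 > 0:
        s -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        s += ii[y0 - 1, x0 - 1]
    return s
```

Because every region sum costs the same four lookups, a box filter of any size has the same cost, which is why convolution at any scale has equal time complexity.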
Points are detected with a Hessian detector using box filters. Below is a description of the detection process over time. It may not match the algorithm's actual flow exactly, because some parts run simultaneously (e.g. the first two steps).
Convolution of pixels
Convolution of a pixel at a certain scale is carried out with an approximation of the 2D Gaussian function; this is called box filter detection. Each kernel consists of 4 convolution filters (3 of them unique, since the xy and yx filters are identical). The resulting value is the determinant of the Hessian matrix whose elements are the convolution values of these filters.
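Given the filter responses at one pixel and scale, the determinant computation is a one-liner. A minimal sketch, assuming the three unique responses Dxx, Dyy, Dxy are already available; the weight 0.9, which compensates for the box-filter approximation, comes from the SURF literature and is an assumption here, since the document does not state the weight MatchPoint uses:

```python
def hessian_determinant(dxx, dyy, dxy, w=0.9):
    """Approximate Hessian determinant from box-filter responses.
    dxx, dyy, dxy are the convolution values of the second-order
    filters; w is the relative weight of the mixed term (0.9 is an
    assumption borrowed from the SURF paper)."""
    return dxx * dyy - (w * dxy) ** 2
```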
Kernel sizes for convolution are:
9,15,21,27, (1st octave)
15,27,39,51, (2nd octave)
21,45,69,93 (3rd octave)
MatchPoint also offers two ways of performing the convolution:
- box filter: the faster and preferred way
- sliding window (implemented for convenience but still needs debugging): slower and more accurate, but also more sensitive to noise
In order to achieve invariance to scaling, detection needs to find maxima of the Hessian determinant across many scales. To handle this, octaves were introduced. An octave interpolates determinants between scales (usually two). MatchPoint offers detection at up to 3 octaves (set via a parameter). E.g. in the first octave a point can be detected at scale 1.6 (= ((9+15)/2 * 1.2)/9, where 1.2 is the initial scale and 9 the initial kernel size) or 3.2 (= ((21+27)/2 * 1.2)/9). The keypoint's main scale is then the one with the highest value of the Hessian determinant.
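The scale formula above can be written out directly (an illustrative sketch; the function name is hypothetical):

```python
def octave_scale(k_small, k_large, base_scale=1.2, base_kernel=9):
    """Scale assigned to a determinant interpolated between two
    neighbouring kernel sizes: the mean kernel size, expressed
    relative to the initial 9x9 kernel at base scale 1.2."""
    return ((k_small + k_large) / 2.0) * base_scale / base_kernel

octave_scale(9, 15)    # first octave, kernels 9 and 15  -> 1.6
octave_scale(21, 27)   # first octave, kernels 21 and 27 -> 3.2
```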
The next step is to select points with high determinant values (regardless of the scale at which they were detected); these represent the invariant keypoints used for further processing. This is achieved with a fixed threshold, which should be set low (otherwise it may happen that no points are detected at all).
Then non-maxima suppression is applied: only points whose determinant is a local maximum are kept.
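The thresholding and suppression steps together can be sketched as follows. This is a common 3x3-neighbourhood formulation, assumed here for illustration; the document does not specify the exact neighbourhood MatchPoint uses:

```python
import numpy as np

def non_maxima_suppression(det, threshold):
    """Keep pixels whose determinant exceeds the threshold and is a
    strict maximum within its 3x3 neighbourhood. 'det' is a 2D array
    of Hessian determinants; returns a list of (y, x) coordinates."""
    h, w = det.shape
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = det[y, x]
            if v <= threshold:
                continue
            patch = det[y - 1:y + 2, x - 1:x + 2]
            # strict maximum: the value occurs exactly once in the patch
            if v >= patch.max() and (patch == v).sum() == 1:
                points.append((y, x))
    return points
```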
At this point we have a list of points of varying length, because the threshold from the previous step is hardcoded. For larger images this can yield over 200,000 points, which is too many (regardless of image size we need the same number of control points for all images, at least 3 control points per pair of overlapping images), so we need to trim the list to below 10,000 points (this number was chosen intuitively, based on processing time). (Note: this is work in progress.)
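One straightforward way to trim the list, sketched here as an assumption since the trimming strategy is still work in progress, is to keep the points with the strongest determinants:

```python
def strip_points(points, max_points=10000):
    """Keep the strongest interest points: sort by Hessian
    determinant (descending) and truncate. 'points' is a list of
    (determinant, y, x) tuples; the 10,000 cap mirrors the
    intuitively chosen limit mentioned above."""
    return sorted(points, key=lambda p: p[0], reverse=True)[:max_points]
```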
Rotation invariance is achieved by assigning a main orientation to each interest point, using the algorithm proposed by Herbert Bay. The efficiency of this algorithm has not been tested yet, so the MatchPoint executable does not use any orientation assignment.
Shape Context descriptors
Each interest point is assigned a 36-element descriptor. The elements of this descriptor are organized according to the Shape Context descriptor proposed by ... and adapted by ... for the purpose of finding control points. Steps in creating a point's descriptor:
- detect edges in the surrounding region. MatchPoint uses vigra's API (Canny edge detection) for this.
- based on the relative location of edge elements, create a histogram with 36 values, using log-polar binning. An edge point's contribution to the histogram is based on its magnitude and orientation.
See reference papers for details.
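The steps above can be sketched as a log-polar histogram build. This is an illustrative sketch only: the document fixes the total of 36 bins but not their split, so the 12 angular x 3 radial layout, the function name, and the radius cutoff are all assumptions.

```python
import math

def shape_context_descriptor(center, edge_points, r_max,
                             n_angle=12, n_radius=3):
    """Build a 36-bin log-polar histogram around an interest point.
    'edge_points' is a list of (y, x, magnitude) edge elements from
    the Canny step; the 12x3 angle/radius split is an assumption.
    Each edge element adds its magnitude to the bin given by its
    angle and log-distance relative to 'center'."""
    cy, cx = center
    hist = [0.0] * (n_angle * n_radius)
    for y, x, mag in edge_points:
        dy, dx = y - cy, x - cx
        r = math.hypot(dy, dx)
        if r == 0 or r > r_max:
            continue  # skip the point itself and far-away edges
        a_bin = int((math.atan2(dy, dx) + math.pi)
                    / (2 * math.pi) * n_angle) % n_angle
        # log-polar: radial bin boundaries grow logarithmically
        r_bin = min(n_radius - 1,
                    max(0, int(n_radius * math.log(r) / math.log(r_max))))
        hist[r_bin * n_angle + a_bin] += mag
    return hist
```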