# Difference between revisions of "MatchPoint"

(Really needs a rewrite) |
m (update some text formatting to use correct Wiki syntax, move example usage to usage section, correct some typographical errors) |
||

Line 7: | Line 7: | ||

If you want to use MatchPoint in the process of creating panoramic images now(when not all features are implemented) you will have to use autopano-sift also. | If you want to use MatchPoint in the process of creating panoramic images now(when not all features are implemented) you will have to use autopano-sift also. | ||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

== Command line usage == | == Command line usage == | ||

Line 26: | Line 14: | ||

Parameters: | Parameters: | ||

;-v: verbose output | ;-v: verbose output | ||

− | ;-t: generate keypoint file for | + | ;-t: generate keypoint file for MATLAB test suite (file name is generated using formula: image1.jpg.key) |

Arguments: | Arguments: | ||

;image1.jpg: Path to image to be analyzed. | ;image1.jpg: Path to image to be analyzed. | ||

;output.key: Output keypoint file. | ;output.key: Output keypoint file. | ||

+ | |||

+ | === Example Workflow === | ||

+ | |||

+ | # first extract features from the first image and output them to the image1.key: | ||

+ | MatchPoint image1.jpg image1.key | ||

+ | |||

+ | # for second image: | ||

+ | MatchPoint image2.jpg image2.key | ||

+ | |||

+ | # match features from the two generated files using autopano: | ||

+ | autopano project.pto image1.key image2.key | ||

+ | |||

+ | # open the project file in Hugin: | ||

+ | hugin project.pto | ||

+ | |||

== Algorithm description == | == Algorithm description == | ||

Line 36: | Line 39: | ||

Before detection process images are integrated. Each element-pixel (x,y) of integral image represents sum of pixels from (0,0) to (x,y) on initial image. This makes calculating sum of a region much faster. In addition, convolution at any scale has equal time complexity. | Before detection process images are integrated. Each element-pixel (x,y) of integral image represents sum of pixels from (0,0) to (x,y) on initial image. This makes calculating sum of a region much faster. In addition, convolution at any scale has equal time complexity. | ||

− | This part is | + | This part is necessary only when using box filters for detection. |

=== Feature detection === | === Feature detection === | ||

Line 42: | Line 45: | ||

Points are detected with Hessian Detector using box filters. Here is a description of detection process over time. This may not be entirely compatible flow with code's flow, because some parts are simultaneous(e.q. first two steps). | Points are detected with Hessian Detector using box filters. Here is a description of detection process over time. This may not be entirely compatible flow with code's flow, because some parts are simultaneous(e.q. first two steps). | ||

− | + | ==== Convolution of Pixels ==== | |

+ | |||

Convolution of a pixel at certain scale is carried out with approximation of Gauss 2D function - this is called box filter detection. Each kernel has 4 convolution filters(3 of them are unique - xy and yx filters are the same). The resulting value is then the determinant of Hessian matrix where elements represent convolution values of 4 filters. | Convolution of a pixel at certain scale is carried out with approximation of Gauss 2D function - this is called box filter detection. Each kernel has 4 convolution filters(3 of them are unique - xy and yx filters are the same). The resulting value is then the determinant of Hessian matrix where elements represent convolution values of 4 filters. | ||

− | Kernel sizes for convolution are:<br /> | + | Kernel sizes for convolution are:<br /> |

9,15,21,27, (1st octave)<br /> | 9,15,21,27, (1st octave)<br /> | ||

15,27,39,51, (2nd octave)<br /> | 15,27,39,51, (2nd octave)<br /> | ||

Line 54: | Line 58: | ||

* sliding window(implemented for convenience but needs debug): slower, more accurate but also more sensitive to noise | * sliding window(implemented for convenience but needs debug): slower, more accurate but also more sensitive to noise | ||

− | + | ==== Finding Maxima ==== | |

+ | |||

In order to achieve invariance to scaling, detection needs to find maxima of Hessian determinant across many scales. To handle this, octaves were introduced. Octaves are interpolation of determinants at various scales(usually two scales). MatchPoint offers detection at max 3 octaves(by setting a parameter). E.q. at first octave a point can be detected at scale 1.6(=((9+15/2)*1.2)/9 where 1.2 is initial scale and 9 is initial kernel size) or 3.2(=((21+27/2)*1.2)/9. Keypoint's main scale is then selected according to the highest value of Hessian determinant. | In order to achieve invariance to scaling, detection needs to find maxima of Hessian determinant across many scales. To handle this, octaves were introduced. Octaves are interpolation of determinants at various scales(usually two scales). MatchPoint offers detection at max 3 octaves(by setting a parameter). E.q. at first octave a point can be detected at scale 1.6(=((9+15/2)*1.2)/9 where 1.2 is initial scale and 9 is initial kernel size) or 3.2(=((21+27/2)*1.2)/9. Keypoint's main scale is then selected according to the highest value of Hessian determinant. | ||

− | + | ==== Selecting Points ==== | |

Next step is to select points with high values of their determinants(regardless of scale where points where detected) that represent invariant keypoints that will be used further processing. This is achieved using a fixed threshold which should be set low(because otherwise it may happen that no points would be detected). | Next step is to select points with high values of their determinants(regardless of scale where points where detected) that represent invariant keypoints that will be used further processing. This is achieved using a fixed threshold which should be set low(because otherwise it may happen that no points would be detected). | ||

Line 63: | Line 68: | ||

Then non-maxima suppression is applied(only points with local maxima of determinants are selected). | Then non-maxima suppression is applied(only points with local maxima of determinants are selected). | ||

− | At this point we have a list of points that can vary in length, because threshold from previous step is | + | At this point we have a list of points that can vary in length, because threshold from previous step is hard-coded. This can result in over 200.000 points for larger images and that is too much(regardless of image size we need the same amount of control points for all images - min 3 control point/overlapping images), so we need to strip down the list below 10.000 points (this number is set intuitively and it is based on consumed time). (Note: this work is progress). |

=== Feature description === | === Feature description === | ||

− | + | ==== Orientation Assignment ==== | |

+ | |||

Scale invariance is achieved by assigning main orientation of the interest point using special algorithm proposed by Herbert Bay. Efficiency of this algorithm has not been yet tested, therefore the executable of MatchPoint does not use any orientation assignment. | Scale invariance is achieved by assigning main orientation of the interest point using special algorithm proposed by Herbert Bay. Efficiency of this algorithm has not been yet tested, therefore the executable of MatchPoint does not use any orientation assignment. | ||

− | + | ==== Shape Context descriptors ==== | |

+ | |||

To each interest point a 36 element descriptor is assigned. Elements of this descriptors are organized according to Shape Context descriptor proposed by [http://www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/sc_digits.html S. Belongie, J. Malik and J. Puzicha] and adapted by [http://www.robots.ox.ac.uk/~vgg/research/affine/index.html K. Mikolajczyk] for the purpose of finding control points. | To each interest point a 36 element descriptor is assigned. Elements of this descriptors are organized according to Shape Context descriptor proposed by [http://www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/sc_digits.html S. Belongie, J. Malik and J. Puzicha] and adapted by [http://www.robots.ox.ac.uk/~vgg/research/affine/index.html K. Mikolajczyk] for the purpose of finding control points. | ||

Steps in creating descriptors: | Steps in creating descriptors: | ||

− | # | + | # Detect edges in the region around the interest point(region size depends on the scale at which point was detected). MatchPoint uses vigra's API(Canny edge detection) for this. |

− | # | + | # Based on relative location of edge elements create a histogram with 36 values. Use log-polar values. Edge point contribution to the histogram is based on it's magnitude and orientation. |

See reference papers for details. | See reference papers for details. | ||

− | ==References== | + | == References == |

* [http://www.vision.ee.ethz.ch/~surf/ Speeded Up Robust Features - SURF] | * [http://www.vision.ee.ethz.ch/~surf/ Speeded Up Robust Features - SURF] | ||

* [http://www.robots.ox.ac.uk/~vgg/research/affine Matlab suite for testing, papers for detection, descriptors and evaluation by K. Mikolajczyk] | * [http://www.robots.ox.ac.uk/~vgg/research/affine Matlab suite for testing, papers for detection, descriptors and evaluation by K. Mikolajczyk] |

## Latest revision as of 19:08, 19 December 2014

## Contents

## Introduction

MatchPoint is a next generation CP generator. The result of a GSoC2007 project SoC2007_project_Feature_Descriptor. Currently it offers only keypoint detection but not matching (matching was part of other GSoC 07 project that was not carried out) and can currently be used only as a replacement for generatekeys from autopano-sift or autopano-sift-C.

A current version of matchpoint can be obtained via the hugin project.

Goal of this MatchPoint is to create a complete control point suite that will be used by hugin as a replacement(or at least an alternative) to existing autopano suites.

If you want to use MatchPoint in the process of creating panoramic images now(when not all features are implemented) you will have to use autopano-sift also.

## Command line usage

Usage:

- MatchPoint [options] image1.jpg output.key

Parameters:

- -v
- verbose output
- -t
- generate keypoint file for MATLAB test suite (file name is generated using formula: image1.jpg.key)

Arguments:

- image1.jpg
- Path to image to be analyzed.
- output.key
- Output keypoint file.

### Example Workflow

# first extract features from the first image and output them to the image1.key: MatchPoint image1.jpg image1.key # for second image: MatchPoint image2.jpg image2.key # match features from the two generated files using autopano: autopano project.pto image1.key image2.key # open the project file in Hugin: hugin project.pto

## Algorithm description

### Integral images

Before detection process images are integrated. Each element-pixel (x,y) of integral image represents sum of pixels from (0,0) to (x,y) on initial image. This makes calculating sum of a region much faster. In addition, convolution at any scale has equal time complexity. This part is necessary only when using box filters for detection.

### Feature detection

Points are detected with Hessian Detector using box filters. Here is a description of detection process over time. This may not be entirely compatible flow with code's flow, because some parts are simultaneous(e.q. first two steps).

#### Convolution of Pixels

Convolution of a pixel at certain scale is carried out with approximation of Gauss 2D function - this is called box filter detection. Each kernel has 4 convolution filters(3 of them are unique - xy and yx filters are the same). The resulting value is then the determinant of Hessian matrix where elements represent convolution values of 4 filters.

Kernel sizes for convolution are:

9,15,21,27, (1st octave)

15,27,39,51, (2nd octave)

21,45,69,93 (3rd octave)

MatchPoint features also offers two ways of convolution:

- box filter: faster and preferable way
- sliding window(implemented for convenience but needs debug): slower, more accurate but also more sensitive to noise

#### Finding Maxima

In order to achieve invariance to scaling, detection needs to find maxima of Hessian determinant across many scales. To handle this, octaves were introduced. Octaves are interpolation of determinants at various scales(usually two scales). MatchPoint offers detection at max 3 octaves(by setting a parameter). E.q. at first octave a point can be detected at scale 1.6(=((9+15/2)*1.2)/9 where 1.2 is initial scale and 9 is initial kernel size) or 3.2(=((21+27/2)*1.2)/9. Keypoint's main scale is then selected according to the highest value of Hessian determinant.

#### Selecting Points

Next step is to select points with high values of their determinants(regardless of scale where points where detected) that represent invariant keypoints that will be used further processing. This is achieved using a fixed threshold which should be set low(because otherwise it may happen that no points would be detected).

Then non-maxima suppression is applied(only points with local maxima of determinants are selected).

At this point we have a list of points that can vary in length, because threshold from previous step is hard-coded. This can result in over 200.000 points for larger images and that is too much(regardless of image size we need the same amount of control points for all images - min 3 control point/overlapping images), so we need to strip down the list below 10.000 points (this number is set intuitively and it is based on consumed time). (Note: this work is progress).

### Feature description

#### Orientation Assignment

Scale invariance is achieved by assigning main orientation of the interest point using special algorithm proposed by Herbert Bay. Efficiency of this algorithm has not been yet tested, therefore the executable of MatchPoint does not use any orientation assignment.

#### Shape Context descriptors

To each interest point a 36 element descriptor is assigned. Elements of this descriptors are organized according to Shape Context descriptor proposed by S. Belongie, J. Malik and J. Puzicha and adapted by K. Mikolajczyk for the purpose of finding control points.

Steps in creating descriptors:

- Detect edges in the region around the interest point(region size depends on the scale at which point was detected). MatchPoint uses vigra's API(Canny edge detection) for this.
- Based on relative location of edge elements create a histogram with 36 values. Use log-polar values. Edge point contribution to the histogram is based on it's magnitude and orientation.

See reference papers for details.