3D single-object recognition in photographs
The method of recognizing a 3D object depends on the properties of an object. For simplicity, many existing algorithms have focused on recognizing rigid objects consisting of a single part, that is, objects whose spatial transformation is aPattern recognition approaches
These methods use appearance information gathered from pre-captured or pre-computed projections of an object to match the object in the potentially cluttered scene. However, they do not take the 3D geometric constraints of the object into consideration during matching, and typically also do not handle occlusion as well as feature-based approaches. See urase and Nayar 1995and elinger and Nelson 1999Feature-based geometric approaches
Feature-based approaches work well for objects which have distinctive features. Thus far, objects which have good edge features or blob features have been successfully recognized; for example detection algorithms, see Harris affine region detector and SIFT, respectively. Due to lack of the appropriate feature detectors, objects without textured, smooth surfaces cannot currently be handled by this approach. Feature-based object recognizers generally work by pre-capturing a number of fixed views of the object to be recognized, extracting features from these views, and then in the recognition process, matching these features to the scene and enforcing geometric constraints. As an example of a prototypical system taking this approach, we will present an outline of the method used by othganger et al. 2004 with some detail elided. The method starts by assuming that objects undergo globally rigid transformations. Because smooth surfaces are locally planar, affine invariant features are appropriate for matching: the paper detects ellipse-shaped regions of interest using both edge-like and blob-like features, and as per owe 2004 finds the dominant gradient direction of the ellipse, converts the ellipse into a parallelogram, and takes a SIFT descriptor on the resulting parallelogram. Color information is used also to improve discrimination over SIFT features alone. Next, given a number of camera views of the object (24 in the paper), the method constructs a 3D model for the object, containing the 3D spatial position and orientation of each feature. Because the number of views of the object is large, typically each feature is present in several adjacent views. The center points of such matching features correspond, and detected features are aligned along the dominant gradient direction, so the points at (1, 0) in the local coordinate system of the feature parallelogram also correspond, as do the points (0, 1) in the parallelogram's local coordinates. Thus for every pair of matching features in nearby views, three point pair correspondences are known. Given at least two matching features, a multi-view affineReferences
* Murase, H. and S. K. Nayar: 1995, ''Visual Learning and Recognition of 3-D Objects from Appearance''. International Journal of Computer Vision 14, 5–24See also
*