Triangulation (computer vision)

In computer vision, triangulation refers to the process of determining a point in 3D space given its projections onto two or more images. In order to solve this problem it is necessary to know the parameters of the camera projection function from 3D to 2D for the cameras involved, in the simplest case represented by the camera matrices. Triangulation is sometimes also referred to as reconstruction or intersection.

The triangulation problem is in principle trivial. Each point in an image corresponds to a line in 3D space, such that all points on that line are projected to the same image point. If a pair of corresponding points in two or more images can be found, they must be the projection of a common 3D point x. The set of lines generated by the image points must intersect at x, and the algebraic formulation of the coordinates of x can be computed in a variety of ways, as presented below.

In practice, however, the coordinates of image points cannot be measured with arbitrary accuracy. Instead, various types of noise, such as geometric noise from lens distortion or interest point detection error, lead to inaccuracies in the measured image coordinates. As a consequence, the lines generated by the corresponding image points do not always intersect in 3D space. The problem, then, is to find a 3D point which optimally fits the measured image points. In the literature there are multiple proposals for how to define optimality and how to find the optimal 3D point. Since they are based on different optimality criteria, the various methods produce different estimates of the 3D point x when noise is involved.


Introduction

In the following, it is assumed that triangulation is made on corresponding image points from two views generated by pinhole cameras. Generalizations of these assumptions are discussed in the literature.

The image to the left illustrates the epipolar geometry of a pair of stereo cameras of pinhole model. A point \mathbf{x} in 3D space is projected onto the respective image plane along a line (green) which goes through the camera's focal point, \mathbf{O}_1 and \mathbf{O}_2, resulting in the two corresponding image points \mathbf{y}_1 and \mathbf{y}_2. If \mathbf{y}_1 and \mathbf{y}_2 are given and the geometry of the two cameras is known, the two projection lines (green) can be determined and it must be the case that they intersect at point \mathbf{x}. Using basic linear algebra, that intersection point can be determined in a straightforward way.

The image to the right shows the real case. The position of the image points \mathbf{y}_1 and \mathbf{y}_2 cannot be measured exactly. The reason is a combination of factors such as:

* Geometric distortion, for example lens distortion, which means that the 3D to 2D mapping of the camera deviates from the pinhole camera model. To some extent these errors can be compensated for, leaving a residual geometric error.
* A single ray of light from \mathbf{x} is dispersed in the lens system of the cameras according to a point spread function. The recovery of the corresponding image point from measurements of the dispersed intensity function in the images introduces errors.
* In a digital camera, the image intensity function is only measured at discrete sensor elements. Inexact interpolation of the discrete intensity function has to be used to recover the true one.
* The image points \mathbf{y}'_1 and \mathbf{y}'_2 used for triangulation are often found using various types of feature extractors, for example of corners or interest points in general. There is an inherent localization error for any type of feature extraction based on neighborhood operations.

As a consequence, the measured image points are \mathbf{y}'_1 and \mathbf{y}'_2 instead of \mathbf{y}_1 and \mathbf{y}_2, and their projection lines (blue) do not have to intersect in 3D space or come close to \mathbf{x}. In fact, these lines intersect if and only if \mathbf{y}'_1 and \mathbf{y}'_2 satisfy the epipolar constraint defined by the fundamental matrix. Given the measurement noise in \mathbf{y}'_1 and \mathbf{y}'_2, it is rather likely that the epipolar constraint is not satisfied and the projection lines do not intersect.

This observation leads to the problem which is solved in triangulation: which 3D point \mathbf{x}_\text{est} is the best estimate of \mathbf{x} given \mathbf{y}'_1, \mathbf{y}'_2 and the geometry of the cameras? The answer is often found by defining an error measure which depends on \mathbf{x}_\text{est} and then minimizing this error. In the following sections, some of the various methods for computing \mathbf{x}_\text{est} presented in the literature are briefly described.

All triangulation methods produce \mathbf{x}_\text{est} = \mathbf{x} in the case that \mathbf{y}_1 = \mathbf{y}'_1 and \mathbf{y}_2 = \mathbf{y}'_2, that is, when the epipolar constraint is satisfied (except for singular points, see below). It is what happens when the constraint is not satisfied which differs between the methods.


Properties

A triangulation method can be described in terms of a function \tau such that

: \mathbf{x} \sim \tau(\mathbf{y}'_1, \mathbf{y}'_2, \mathbf{C}_1, \mathbf{C}_2)

where \mathbf{y}'_1, \mathbf{y}'_2 are the homogeneous coordinates of the detected image points, \mathbf{C}_1, \mathbf{C}_2 are the camera matrices, and \mathbf{x} is the homogeneous representation of the resulting 3D point. The \sim sign implies that \tau is only required to produce a vector which is equal to \mathbf{x} up to a multiplication by a non-zero scalar, since homogeneous vectors are involved.

Before looking at the specific methods, that is, specific functions \tau, there are some general concepts related to the methods that need to be explained. Which triangulation method is chosen for a particular problem depends to some extent on these characteristics.
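The equality up to scale expressed by \sim is easy to overlook when comparing triangulation results numerically. As a minimal illustration, the following Python/NumPy sketch (the function name is illustrative, not from any particular library) tests whether two homogeneous vectors represent the same point:

    import numpy as np

    def same_up_to_scale(a, b, tol=1e-9):
        # Two homogeneous vectors represent the same point when they are
        # equal up to a non-zero scalar factor. Normalizing both to unit
        # length reduces the comparison to a sign ambiguity.
        a = np.asarray(a, float) / np.linalg.norm(a)
        b = np.asarray(b, float) / np.linalg.norm(b)
        return np.allclose(a, b, atol=tol) or np.allclose(a, -b, atol=tol)

    # Example: (1, 2, 3, 1) and (-2, -4, -6, -2) are the same 3D point.
    assert same_up_to_scale([1, 2, 3, 1], [-2, -4, -6, -2])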


Singularities

Some of the methods fail to correctly compute an estimate of \mathbf{x} if it lies in a certain subset of 3D space, corresponding to some combination of \mathbf{y}'_1, \mathbf{y}'_2, \mathbf{C}_1, \mathbf{C}_2. A point in this subset is then a ''singularity'' of the triangulation method. The reason for the failure can be that some equation system to be solved is under-determined or that the projective representation of \mathbf{x}_\text{est} becomes the zero vector for the singular points.


Invariance

In some applications, it is desirable that the triangulation is independent of the coordinate system used to represent 3D points; if the triangulation problem is formulated in one coordinate system and then transformed into another, the resulting estimate \mathbf{x}_\text{est} should transform in the same way. This property is commonly referred to as ''invariance''. Not every triangulation method assures invariance, at least not for general types of coordinate transformations.

For a homogeneous representation of 3D coordinates, the most general transformation is a projective transformation, represented by a 4 \times 4 matrix \mathbf{T}. If the homogeneous coordinates are transformed according to

: \bar{\mathbf{x}} \sim \mathbf{T} \, \mathbf{x}

then the camera matrices must transform as

: \bar{\mathbf{C}}_k \sim \mathbf{C}_k \, \mathbf{T}^{-1}

to produce the same homogeneous image coordinates

: \bar{\mathbf{y}}_k \sim \bar{\mathbf{C}}_k \, \bar{\mathbf{x}} = \mathbf{C}_k \, \mathbf{x}

If the triangulation function \tau is invariant to \mathbf{T} then the following relation must be valid:

: \bar{\mathbf{x}}_\text{est} \sim \mathbf{T} \, \mathbf{x}_\text{est}

from which follows that

: \tau(\mathbf{y}'_1, \mathbf{y}'_2, \mathbf{C}_1, \mathbf{C}_2) \sim \mathbf{T}^{-1} \, \tau(\mathbf{y}'_1, \mathbf{y}'_2, \mathbf{C}_1 \, \mathbf{T}^{-1}, \mathbf{C}_2 \, \mathbf{T}^{-1}),   for all \mathbf{y}'_1, \mathbf{y}'_2

For each triangulation method, it can be determined if this last relation is valid. If it is, it may be satisfied only for a subset of the projective transformations, for example, rigid or affine transformations.
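The transformation rule for the cameras can be checked numerically. The following sketch (Python with NumPy; all variable names are illustrative) verifies that transforming the point by \mathbf{T} and the camera by \mathbf{T}^{-1} leaves the homogeneous image coordinates unchanged:

    import numpy as np

    rng = np.random.default_rng(0)

    x = np.append(rng.normal(size=3), 1.0)   # homogeneous 3D point (4-vector)
    C1 = rng.normal(size=(3, 4))             # a (random) 3x4 camera matrix
    T = rng.normal(size=(4, 4))              # a random projective transformation

    x_bar = T @ x                            # transformed point
    C1_bar = C1 @ np.linalg.inv(T)           # correspondingly transformed camera

    # The projected image coordinates agree: C1_bar @ x_bar == C1 @ x.
    assert np.allclose(C1_bar @ x_bar, C1 @ x)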


Computational complexity

The function \tau is only an abstract representation of a computation which, in practice, may be relatively complex. Some methods result in a \tau which is a closed-form continuous function, while others need to be decomposed into a series of computational steps involving, for example, singular value decomposition (SVD) or finding the roots of a polynomial. Yet another class of methods results in a \tau which must rely on iterative estimation of some parameters. This means that both the computation time and the complexity of the operations involved may vary between the different methods.


Methods


Mid-point method

Each of the two image points \mathbf{y}'_1 and \mathbf{y}'_2 has a corresponding projection line (blue in the right image above), here denoted as \mathbf{L}'_1 and \mathbf{L}'_2, which can be determined given the camera matrices \mathbf{C}_1, \mathbf{C}_2. Let d be a distance function between a 3D line \mathbf{L} and a 3D point \mathbf{x} such that d(\mathbf{L}, \mathbf{x}) is the Euclidean distance between \mathbf{L} and \mathbf{x}. The ''midpoint method'' finds the point \mathbf{x}_\text{est} which minimizes

: d(\mathbf{L}'_1, \mathbf{x})^2 + d(\mathbf{L}'_2, \mathbf{x})^2

It turns out that \mathbf{x}_\text{est} lies exactly at the middle of the shortest line segment which joins the two projection lines.
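As a concrete sketch, assume each projection line is represented by an origin (the camera's focal point) and a direction vector; the mid-point can then be computed in closed form (Python with NumPy; the function name is illustrative):

    import numpy as np

    def midpoint_triangulate(o1, d1, o2, d2):
        # Lines are o1 + t1*d1 and o2 + t2*d2. Solve the normal equations
        # for the closest points on the two lines, then return their mean.
        d1 = d1 / np.linalg.norm(d1)
        d2 = d2 / np.linalg.norm(d2)
        b = o2 - o1
        c = d1 @ d2
        denom = 1.0 - c ** 2        # zero for parallel lines (a singularity)
        t1 = (b @ d1 - (b @ d2) * c) / denom
        t2 = ((b @ d1) * c - b @ d2) / denom
        p1 = o1 + t1 * d1           # closest point on line 1
        p2 = o2 + t2 * d2           # closest point on line 2
        return 0.5 * (p1 + p2)      # middle of the shortest segment

Note that the parallel-line case, where the denominator vanishes, is exactly a singularity of this method in the sense described above.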


Direct linear transformation
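
In outline, the direct linear transformation (DLT) approach rewrites the two projection equations as a homogeneous linear system in \mathbf{x}, using the fact that \mathbf{y}'_k \sim \mathbf{C}_k \mathbf{x} implies \mathbf{y}'_k \times (\mathbf{C}_k \mathbf{x}) = 0, and solves it in a least-squares sense via SVD. A minimal Python/NumPy sketch of this idea (the function name is illustrative, not from any particular library):

    import numpy as np

    def triangulate_dlt(y1, y2, C1, C2):
        # y1, y2: homogeneous image points (3-vectors); C1, C2: 3x4 cameras.
        # Each view contributes two independent rows of y x (C @ X) = 0.
        rows = []
        for (u, v, w), C in ((y1, C1), (y2, C2)):
            rows.append(u * C[2] - w * C[0])
            rows.append(v * C[2] - w * C[1])
        A = np.vstack(rows)
        # The homogeneous solution is the right singular vector belonging
        # to the smallest singular value of A.
        _, _, Vt = np.linalg.svd(A)
        return Vt[-1]               # homogeneous 4-vector, defined up to scale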


Via the essential matrix

The problem to be solved here is how to compute (x_1, x_2, x_3) given corresponding normalized image coordinates (y_1, y_2) and (y'_1, y'_2). Here (x_1, x_2, x_3) are the coordinates of the 3D point in the coordinate system of the first camera, the second camera's coordinate system is related to it by \mathbf{x}' = \mathbf{R} \, (\mathbf{x} - \mathbf{t}), and the normalized image coordinates are given by y_k = x_k / x_3 and y'_k = x'_k / x'_3. If the essential matrix is known and the corresponding rotation \mathbf{R} and translation \mathbf{t} have been determined, this algorithm (described in Longuet-Higgins' paper) provides a solution.

Let \mathbf{r}_k denote row ''k'' of the rotation matrix \mathbf{R}:

: \mathbf{R} = \begin{pmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \end{pmatrix}

Combining the above relations between 3D coordinates in the two coordinate systems and the mapping between 3D and 2D points gives

: y'_1 = \frac{x'_1}{x'_3} = \frac{\mathbf{r}_1 \cdot (\mathbf{x} - \mathbf{t})}{\mathbf{r}_3 \cdot (\mathbf{x} - \mathbf{t})} = \frac{\mathbf{r}_1 \cdot (x_3 \, \mathbf{y} - \mathbf{t})}{\mathbf{r}_3 \cdot (x_3 \, \mathbf{y} - \mathbf{t})}

where \mathbf{y} = (y_1, y_2, 1), or

: x_3 = \frac{(\mathbf{r}_1 - y'_1 \, \mathbf{r}_3) \cdot \mathbf{t}}{(\mathbf{r}_1 - y'_1 \, \mathbf{r}_3) \cdot \mathbf{y}}

Once x_3 is determined, the other two coordinates can be computed as

: \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_3 \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}

The above derivation is not unique. It is also possible to start with an expression for y'_2 and derive an expression for x_3 according to

: x_3 = \frac{(\mathbf{r}_2 - y'_2 \, \mathbf{r}_3) \cdot \mathbf{t}}{(\mathbf{r}_2 - y'_2 \, \mathbf{r}_3) \cdot \mathbf{y}}

In the ideal case, when the camera maps the 3D points according to a perfect pinhole camera and the resulting 2D points can be detected without any noise, the two expressions for x_3 are equal. In practice, however, they are not, and it may be advantageous to combine the two estimates of x_3, for example, in terms of some sort of average.

There are also other types of extensions of the above computations which are possible. They started with an expression of the primed image coordinates and derived 3D coordinates in the unprimed system. It is also possible to start with unprimed image coordinates and obtain primed 3D coordinates, which finally can be transformed into unprimed 3D coordinates. Again, in the ideal case the result should be equal to the above expressions, but in practice they may deviate.

A final remark relates to the fact that if the essential matrix is determined from corresponding image coordinates, which often is the case when 3D points are determined in this way, the translation vector \mathbf{t} is known only up to an unknown positive scaling. As a consequence, the reconstructed 3D points, too, are undetermined with respect to a positive scaling.


See also

* Bundle adjustment
* Line–line intersection


References

* Richard Hartley and Andrew Zisserman (2003). ''Multiple View Geometry in Computer Vision''. Cambridge University Press. ISBN 978-0-521-54051-3.
* H. Christopher Longuet-Higgins (1981). "A computer algorithm for reconstructing a scene from two projections". ''Nature''. 293: 133–135.


External links


* Two-view and multi-view triangulation in Matlab