In computer vision, a camera matrix or (camera) projection matrix is a $3 \times 4$ matrix which describes the mapping of a pinhole camera from 3D points in the world to 2D points in an image. Let $\mathbf{x}$ be a representation of a 3D point in homogeneous coordinates (a 4-dimensional vector), and let $\mathbf{y}$ be a representation of the image of this point in the pinhole camera (a 3-dimensional vector). Then the following relation holds:

$$\mathbf{y} \sim \mathbf{C} \, \mathbf{x}$$

where $\mathbf{C}$ is the camera matrix and the $\sim$ sign implies that the left and right hand sides are equal except for a multiplication by a non-zero scalar $k \neq 0$:

$$\mathbf{y} = k \, \mathbf{C} \, \mathbf{x}.$$

Since the camera matrix $\mathbf{C}$ is involved in the mapping between elements of two projective spaces, it too can be regarded as a projective element. This means that it has only 11 degrees of freedom since any multiplication by a non-zero scalar results in an equivalent camera matrix.
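The scale invariance described above can be checked numerically. The following is a minimal sketch in plain Python; the matrix entries, the 3D point, the scalar `k`, and the helper names `matvec` and `dehomogenize` are illustrative assumptions, not part of the original text.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dehomogenize(y):
    """Convert homogeneous image coordinates (y1, y2, y3) to (y1/y3, y2/y3)."""
    return (y[0] / y[2], y[1] / y[2])

# Hypothetical 3x4 camera matrix and homogeneous 3D point, for illustration only.
C = [[2.0, 0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
x = [1.0, 2.0, 4.0, 1.0]

k = -3.5  # any non-zero scalar
kC = [[k * c for c in row] for row in C]

y = matvec(C, x)          # homogeneous image point from C
y_scaled = matvec(kC, x)  # homogeneous image point from k * C

image_point = dehomogenize(y)
image_point_scaled = dehomogenize(y_scaled)
print(image_point, image_point_scaled)
```

Both matrices give the same image point after division by the third component, which is why only 11 of the 12 entries count as degrees of freedom.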

Derivation

The mapping from the coordinates of a 3D point P to the 2D image coordinates of the point's projection onto the image plane, according to the pinhole camera model, is given by

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \frac{f}{x_3} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

where $(x_1, x_2, x_3)$ are the 3D coordinates of P relative to a camera-centered coordinate system, $(y_1, y_2)$ are the resulting image coordinates, and ''f'' is the camera's focal length, for which we assume ''f'' > 0. Furthermore, we also assume that $x_3 > 0$. To derive the camera matrix, the expression above is rewritten in terms of homogeneous coordinates. Instead of the 2D vector $(y_1, y_2)$ we consider the projective element (a 3D vector) $\mathbf{y} = (y_1, y_2, 1)$, and instead of equality we consider equality up to scaling by a non-zero number, denoted $\sim$. First, we write the homogeneous image coordinates as expressions in the usual 3D coordinates:

$$\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{f}{x_3} x_1 \\ \frac{f}{x_3} x_2 \\ 1 \end{pmatrix} \sim \begin{pmatrix} x_1 \\ x_2 \\ \frac{x_3}{f} \end{pmatrix}$$

The last step multiplies all three components by the non-zero factor $x_3 / f$.

Finally, the 3D coordinates are also expressed in a homogeneous representation $\mathbf{x}$, and this is how the camera matrix appears:

$$\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{f} & 0 \end{pmatrix} \, \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 1 \end{pmatrix}$$

or

$$\mathbf{y} \sim \mathbf{C} \, \mathbf{x}$$

where $\mathbf{C}$ is the camera matrix, which here is given by

$$\mathbf{C} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{f} & 0 \end{pmatrix} \sim \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

The last step is a consequence of $\mathbf{C}$ itself being a projective element. The camera matrix derived here may appear trivial in the sense that it contains very few non-zero elements. This depends to a large extent on the particular coordinate systems which have been chosen for the 3D and 2D points. In practice, however, other forms of camera matrices are common, as will be shown below.
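As a sanity check on the derivation, the matrix form can be compared with the direct pinhole formula. This is a minimal sketch in plain Python; the focal length, the sample point, and the helper names `matvec` and `camera_matrix` are assumptions chosen for illustration.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def camera_matrix(f):
    """3x4 camera matrix for focal length f, in the form with f on the diagonal."""
    return [[f, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]

f = 2.0
x1, x2, x3 = 3.0, -1.0, 4.0  # hypothetical camera-centered point with x3 > 0

# Matrix route: homogeneous result (f*x1, f*x2, x3), then divide by the last entry.
y = matvec(camera_matrix(f), [x1, x2, x3, 1.0])
projected = (y[0] / y[2], y[1] / y[2])

# Direct route: (y1, y2) = (f / x3) * (x1, x2).
direct = (f / x3 * x1, f / x3 * x2)
print(projected, direct)
```

For these values both routes agree, as the derivation predicts.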

Camera position

The camera matrix $\mathbf{C}$ derived in the previous section has a null space which is spanned by the vector

$$\mathbf{n} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$

This is also the homogeneous representation of the 3D point with coordinates (0,0,0), that is, the "camera center" (also known as the entrance pupil; the position of the pinhole of a pinhole camera), which lies at the origin O. This means that the camera center (and only this point) cannot be mapped to a point in the image plane by the camera (or equivalently, it maps to all points on the image as every ray on the image goes through this point). For any other 3D point with $x_3 = 0$, the result $\mathbf{y} \sim \mathbf{C} \, \mathbf{x}$ is well-defined and has the form $\mathbf{y} = (y_1 \, y_2 \, 0)^\top$. This corresponds to a point at infinity in the projective image plane (even though, if the image plane is taken to be a Euclidean plane, no corresponding intersection point exists).
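Both special cases can be illustrated numerically. The sketch below, in plain Python, assumes a focal length of 2 and a sample point with a zero third coordinate; these values and the helper name `matvec` are illustrative only.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

f = 2.0  # hypothetical focal length
C = [[f, 0.0, 0.0, 0.0],
     [0.0, f, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

# The camera center (0, 0, 0) in homogeneous form spans the null space of C.
n = [0.0, 0.0, 0.0, 1.0]
center_image = matvec(C, n)

# A point with x3 = 0 (other than the camera center) maps to a vector whose
# last component is zero, i.e. a point at infinity in the image plane.
x_parallel = [3.0, -1.0, 0.0, 1.0]
y_inf = matvec(C, x_parallel)
print(center_image, y_inf)
```

The all-zero result for the camera center is not a valid homogeneous point, while the second result cannot be divided by its last component to obtain Euclidean image coordinates.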

Normalized camera matrix and normalized image coordinates

The camera matrix derived above can be simplified even further if we assume that ''f'' = 1:

$$\mathbf{C}_0 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} = \left( \begin{array}{c|c} \mathbf{I} & \mathbf{0} \end{array} \right)$$

where $\mathbf{I}$ here denotes a $3 \times 3$ identity matrix. Note that the $3 \times 4$ matrix $\mathbf{C}_0$ here is divided into a concatenation of a $3 \times 3$ matrix and a 3-dimensional vector. The camera matrix $\mathbf{C}_0$ is sometimes referred to as a ''canonical form''.

So far all points in the 3D world have been represented in a ''camera centered'' coordinate system, that is, a coordinate system which has its origin at the camera center (the location of the pinhole of a pinhole camera). In practice, however, the 3D points may be represented in terms of coordinates relative to an arbitrary coordinate system (X1', X2', X3'). Assuming that the camera coordinate axes (X1, X2, X3) and the axes (X1', X2', X3') are of Euclidean type (orthogonal and isotropic), there is a unique Euclidean 3D transformation (rotation and translation) between the two coordinate systems. In other words, the camera is not necessarily at the origin looking along the ''z'' axis.

The two operations of rotation and translation of 3D coordinates can be represented as the two $4 \times 4$ matrices

$$\left( \begin{array}{c|c} \mathbf{R} & \mathbf{0} \\ \hline \mathbf{0} & 1 \end{array} \right) \quad \text{and} \quad \left( \begin{array}{c|c} \mathbf{I} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array} \right)$$

where $\mathbf{R}$ is a $3 \times 3$ rotation matrix and $\mathbf{t}$ is a 3-dimensional translation vector. When the first matrix is multiplied onto the homogeneous representation of a 3D point, the result is the homogeneous representation of the rotated point, and the second matrix performs instead a translation. Performing the two operations in sequence, i.e. first the rotation and then the translation (with translation vector given in the already rotated coordinate system), gives a combined rotation and translation matrix

$$\left( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array} \right)$$

Assuming that $\mathbf{R}$ and $\mathbf{t}$ are precisely the rotation and translation which relate the two coordinate systems (X1,X2,X3) and (X1',X2',X3') above, this implies that

$$\mathbf{x} = \left( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array} \right) \mathbf{x}'$$

where $\mathbf{x}'$ is the homogeneous representation of the point P in the coordinate system (X1',X2',X3'). Assuming also that the camera matrix is given by $\mathbf{C}_0$, the mapping from the coordinates in the (X1,X2,X3) system to homogeneous image coordinates becomes

$$\mathbf{y} \sim \mathbf{C}_0 \, \mathbf{x} = \left( \begin{array}{c|c} \mathbf{I} & \mathbf{0} \end{array} \right) \, \left( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array} \right) \mathbf{x}' = \left( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \end{array} \right) \, \mathbf{x}'$$

Consequently, the camera matrix which relates points in the coordinate system (X1',X2',X3') to image coordinates is

$$\mathbf{C}_N = \left( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \end{array} \right)$$

a concatenation of a 3D rotation matrix and a 3-dimensional translation vector. This type of camera matrix is referred to as a ''normalized camera matrix''. It assumes focal length = 1 and that image coordinates are measured in a coordinate system where the origin is located at the intersection between axis X3 and the image plane and has the same units as the 3D coordinate system. The resulting image coordinates are referred to as ''normalized image coordinates''.
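The construction of a normalized camera matrix from a rotation and a translation can be sketched as follows. The pose (a 90 degree rotation about the X3 axis followed by a translation), the sample point, and the helper names `matvec` and `normalized_camera_matrix` are assumptions made for illustration.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def normalized_camera_matrix(R, t):
    """Concatenate a 3x3 rotation R and a translation t into the 3x4 matrix (R | t)."""
    return [row + [ti] for row, ti in zip(R, t)]

# Hypothetical pose: rotation by 90 degrees about X3, then translation by t.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = [1.0, 2.0, 3.0]

C_N = normalized_camera_matrix(R, t)
x_world = [1.0, 0.0, 1.0, 1.0]  # point P in the (X1', X2', X3') system

# One-step route: apply (R | t) directly.
y = matvec(C_N, x_world)

# Two-step route: 4x4 Euclidean transform, then the canonical matrix (I | 0).
T = C_N + [[0.0, 0.0, 0.0, 1.0]]
C_0 = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0]]
y_two_step = matvec(C_0, matvec(T, x_world))

normalized_coords = (y[0] / y[2], y[1] / y[2])
print(y, y_two_step, normalized_coords)
```

Both routes yield the same homogeneous vector, and dividing by its last component gives the normalized image coordinates.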

The camera position

Again, the null space of the normalized camera matrix $\mathbf{C}_N$ described above is spanned by the 4-dimensional vector

$$\mathbf{n} = \begin{pmatrix} -\mathbf{R}^{-1} \, \mathbf{t} \\ 1 \end{pmatrix} = \begin{pmatrix} \tilde{\mathbf{n}} \\ 1 \end{pmatrix}$$

This is also, again, the coordinates of the camera center, now relative to the (X1',X2',X3') system. This can be seen by applying first the rotation and then the translation to the 3-dimensional vector $\tilde{\mathbf{n}}$; the result is the homogeneous representation of 3D coordinates (0,0,0). This implies that the camera center (in its homogeneous representation) lies in the null space of the camera matrix, provided that it is represented in terms of 3D coordinates relative to the same coordinate system as the camera matrix refers to. The normalized camera matrix $\mathbf{C}_N$ can now be written as

$$\mathbf{C}_N = \mathbf{R} \, \left( \begin{array}{c|c} \mathbf{I} & \mathbf{R}^{-1} \, \mathbf{t} \end{array} \right) = \mathbf{R} \, \left( \begin{array}{c|c} \mathbf{I} & -\tilde{\mathbf{n}} \end{array} \right)$$

where $\tilde{\mathbf{n}}$ denotes the 3D coordinates of the camera center relative to the (X1',X2',X3') system.
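Recovering the camera center from a normalized camera matrix can be sketched as below. The rotation, translation, and helper names `matvec` and `transpose` are illustrative assumptions; the sketch also relies on the standard fact that the inverse of a rotation matrix equals its transpose.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

# Hypothetical pose: rotation by 90 degrees about X3, then translation by t.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = [1.0, 2.0, 3.0]

# For a rotation matrix, the inverse is the transpose.
R_inv = transpose(R)

# Camera center in the (X1', X2', X3') system: n_tilde = -R^{-1} t.
n_tilde = [-v for v in matvec(R_inv, t)]

# The homogeneous camera center must lie in the null space of (R | t).
C_N = [row + [ti] for row, ti in zip(R, t)]
residual = matvec(C_N, n_tilde + [1.0])
print(n_tilde, residual)
```

The zero residual confirms that the computed point is mapped to no valid image point, which characterizes the camera center.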

General camera matrix

Given the mapping produced by a normalized camera matrix, the resulting normalized image coordinates can be transformed by means of an arbitrary 2D homography. This includes 2D translations and rotations as well as scaling (isotropic and anisotropic) but also general 2D perspective transformations. Such a transformation can be represented as a $3 \times 3$ matrix $\mathbf{H}$ which maps the homogeneous normalized image coordinates $\mathbf{y}$ to the homogeneous transformed image coordinates $\mathbf{y}'$:

$$\mathbf{y}' = \mathbf{H} \, \mathbf{y}$$

Inserting the above expression for the normalized image coordinates in terms of the 3D coordinates gives

$$\mathbf{y}' = \mathbf{H} \, \mathbf{C}_N \, \mathbf{x}'$$

This produces the most general form of camera matrix:

$$\mathbf{C} = \mathbf{H} \, \mathbf{C}_N = \mathbf{H} \, \left( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \end{array} \right)$$
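The composition of a 2D homography with a normalized camera matrix can be sketched as follows. The particular homography used here (anisotropic scaling plus a 2D translation), the pose, the sample point, and the helper names `matvec` and `matmul` are assumptions for illustration; the text allows any invertible 3x3 homography.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical homography: scale by (100, 120), then shift by (320, 240).
H = [[100.0,   0.0, 320.0],
     [  0.0, 120.0, 240.0],
     [  0.0,   0.0,   1.0]]

# Hypothetical normalized camera matrix (R | t).
C_N = [[0.0, -1.0, 0.0, 1.0],
       [1.0,  0.0, 0.0, 2.0],
       [0.0,  0.0, 1.0, 3.0]]

# General camera matrix: C = H (R | t).
C = matmul(H, C_N)

x_world = [1.0, 0.0, 1.0, 1.0]
y_prime = matvec(C, x_world)                      # single 3x4 matrix
y_two_step = matvec(H, matvec(C_N, x_world))      # normalize, then transform

transformed = (y_prime[0] / y_prime[2], y_prime[1] / y_prime[2])
print(y_prime, y_two_step, transformed)
```

Applying the combined matrix gives the same result as projecting to normalized coordinates first and applying the homography afterwards.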

References

* Richard Hartley and Andrew Zisserman (2003). ''Multiple View Geometry in Computer Vision''. Cambridge University Press. ISBN 0-521-54051-8.
