In the fields of
computing
Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, e ...
and
computer vision
Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the hum ...
, pose (or spatial pose) represents the
position
Position often refers to:
* Position (geometry), the spatial location (rather than orientation) of an entity
* Position, a job or occupation
Position may also refer to:
Games and recreation
* Position (poker), location relative to the dealer
* ...
and
orientation
Orientation may refer to:
Positioning in physical space
* Map orientation, the relationship between directions on a map and compass directions
* Orientation (housing), the position of a building with respect to the sun, a concept in building de ...
of an object, usually in
three dimensions
Three-dimensional space (also: 3D space, 3-space or, rarely, tri-dimensional space) is a geometric setting in which three values (called ''parameters'') are required to determine the position of an element (i.e., point). This is the informa ...
. Poses are often stored internally as
transformation matrices. The term “pose” is largely synonymous with the term “transform”, but a transform may often include
scale, whereas pose does not.
In computer vision, the pose of an object is often estimated from camera input by the process of ''
pose estimation
3D pose estimation is a process of predicting the transformation of an object from a user-defined reference pose, given an image or a 3D scan. It arises in computer vision or robotics where the pose or transformation of an object can be used for ...
''. This information can then be used, for example, to allow a robot to manipulate an object or to avoid moving into the object based on its perceived position and orientation in the environment.
Pose estimation
The specific task of determining the pose of an object in an image (or stereo images, image sequence) is referred to as ''pose estimation''. The pose estimation problem can be solved in different ways depending on the image sensor configuration, and choice of methodology. Three classes of methodologies can be distinguished:
* Analytic or geometric methods: Given that the image sensor (camera) is calibrated and the mapping from 3D points in the scene and 2D points in the image is known. If also the geometry of the object is known, it means that the projected image of the object on the camera image is a well-known function of the object's pose. Once a set of control points on the object, typically corners or other feature points, has been identified, it is then possible to solve the pose transformation from a set of equations which relate the 3D coordinates of the points with their 2D image coordinates. Algorithms that determine the pose of a
point cloud
Point or points may refer to:
Places
* Point, Lewis, a peninsula in the Outer Hebrides, Scotland
* Point, Texas, a city in Rains County, Texas, United States
* Point, the NE tip and a ferry terminal of Lismore, Inner Hebrides, Scotland
* Poin ...
with respect to another point cloud are known as
point set registration
In computer vision, pattern recognition, and robotics, point-set registration, also known as point-cloud registration or scan matching, is the process of finding a spatial transformation (''e.g.,'' scaling, rotation and translation) that aligns t ...
algorithms, if the correspondences between points are not already known.
*
Genetic algorithm
In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to gene ...
methods: If the pose of an object does not have to be computed in real-time a
genetic algorithm
In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to gene ...
may be used. This approach is robust especially when the images are not perfectly calibrated. In this particular case, the pose represent the
genetic representation
In computer programming, genetic representation is a way of presenting solutions/individuals in evolutionary computation methods. Genetic representation can encode appearance, behavior, physical qualities of individuals. Designing a good genetic r ...
and the error between the projection of the object control points with the image is the
fitness function {{no footnotes, date=May 2015
A fitness function is a particular type of objective function that is used to summarise, as a single figure of merit, how close a given design solution is to achieving the set aims. Fitness functions are used in geneti ...
.
* Learning-based methods: These methods use artificial learning-based system which learn the mapping from 2D image features to pose transformation. In short, this means that a sufficiently large set of images of the object, in different poses, must be presented to the system during a learning phase. Once the learning phase is completed, the system should be able to present an estimate of the object's pose given an image of the object.
Camera pose
See also
*
Gesture recognition
Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. It is a subdiscipline of computer vision. Gestures can originate from any bodily motion or sta ...
*
Homography (computer vision)
*
Camera calibration
*
Structure from motion
Structure from motion (SfM) is a photogrammetric range imaging technique for estimating three-dimensional structures from two-dimensional image sequences that may be coupled with local motion signals. It is studied in the fields of computer visio ...
*
Essential matrix
In computer vision, the essential matrix is a 3 \times 3 matrix, \mathbf that relates corresponding points in stereo images assuming that the cameras satisfy the pinhole camera model.
Function
More specifically, if \mathbf and \mathbf' ...
and
Trifocal tensor
In computer vision, the trifocal tensor (also tritensor) is a 3×3×3 array of numbers (i.e., a tensor) that incorporates all projective geometric relationships among three views. It relates the coordinates of corresponding points or lines in thr ...
(relative pose)
References
{{Reflist
Computer vision
Geometry in computer vision
Robot control