In computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects.
This process can be accomplished by either active or passive methods. If the model is allowed to change its shape over time, this is referred to as non-rigid or spatio-temporal reconstruction.
Motivation and applications
3D reconstruction has long been a difficult research goal. Using 3D reconstruction, one can determine any object's 3D profile, as well as the 3D coordinates of any point on the profile. The 3D reconstruction of objects is a general scientific problem and a core technology in a wide variety of fields, such as computer-aided geometric design (CAGD), computer graphics, computer animation, computer vision, medical imaging, computational science, virtual reality and digital media. For instance, the lesion information of patients can be presented in 3D on the computer, which offers a new and accurate approach to diagnosis and thus has vital clinical value.
Digital elevation models can be reconstructed using methods such as airborne laser altimetry or synthetic aperture radar.
Active methods
Active methods, i.e. range data methods, take the depth map as given and reconstruct the 3D profile by numerical approximation, building the object in the scene based on a model. These methods actively interfere with the reconstructed object, either mechanically or radiometrically using rangefinders, in order to acquire the depth map, e.g. structured light, laser range finders and other active sensing techniques. A simple example of a mechanical method would use a depth gauge to measure the distance to a rotating object placed on a turntable. More widely applicable radiometric methods emit radiance towards the object and then measure its reflected part. Examples range from moving light sources, colored visible light and time-of-flight lasers to microwaves or 3D ultrasound. See 3D scanning for more details.
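As a rough illustration of the mechanical turntable example, the sketch below converts depth-gauge readings into 3D surface points. The geometry is an assumption made for illustration only: the gauge sits at a fixed, known distance from the turntable's rotation axis and points straight at that axis, so the surface radius at each angle is the gauge-to-axis distance minus the reading. The function name and parameters are hypothetical.

```python
import math

def turntable_points(readings, gauge_to_axis, height=0.0):
    """Convert (angle_deg, measured_distance) readings into 3D points.

    Assumed setup (not from the article): the gauge points at the rotation
    axis from a distance `gauge_to_axis`, and the object is rotated by
    `angle_deg` for each reading.
    """
    points = []
    for angle_deg, distance in readings:
        radius = gauge_to_axis - distance      # surface distance from the axis
        theta = math.radians(angle_deg)
        # Express the surface point in the object's own (rotating) frame;
        # the sign of theta depends on the chosen rotation direction.
        points.append((radius * math.cos(theta),
                       radius * math.sin(theta),
                       height))
    return points

# A cylinder of radius 2 seen from a gauge 10 units away reads 8 at every angle.
profile = turntable_points([(a, 8.0) for a in range(0, 360, 90)], gauge_to_axis=10.0)
```

Repeating the sweep at several heights would give a stack of such profiles, i.e. a coarse point cloud of the object.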
Passive methods
Passive methods of 3D reconstruction do not interfere with the reconstructed object; they only use a sensor to measure the radiance reflected or emitted by the object's surface to infer its 3D structure through image understanding. Typically, the sensor is an image sensor in a camera sensitive to visible light, and the input to the method is a set of digital images (one, two or more) or video. In this case we talk about image-based reconstruction, and the output is a 3D model. By comparison to active methods, passive methods can be applied to a wider range of situations.
Monocular cues methods
Monocular cues methods refer to using one or more images from one viewpoint (camera) to proceed to 3D reconstruction. They make use of 2D characteristics (e.g. silhouettes, shading and texture) to measure 3D shape, which is why they are also named Shape-From-X, where X can be silhouettes, shading, texture, etc. 3D reconstruction through monocular cues is simple and quick, and only one appropriate digital image is needed, so a single camera is adequate. Technically, it avoids stereo correspondence, which is fairly complex.
Shape-from-shading: By analyzing the shading information in the image and assuming Lambertian reflectance, the depth and normal information of the object surface is restored for reconstruction.
Photometric stereo: This approach is more sophisticated than the shape-from-shading method. Images taken under different lighting conditions are used to solve for the depth information. It is worth mentioning that more than one image is required by this approach.
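A minimal sketch of how several lighting conditions can pin down the surface orientation at one pixel. It assumes a Lambertian surface, three known, non-coplanar distant light directions and no shadows, none of which are specified in the text. Under these assumptions each intensity is I = albedo * (l · n), so three intensities give a 3x3 linear system for the scaled normal; depth would then have to be obtained by integrating the normals, which is not shown. All names are illustrative.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def normal_from_three_lights(lights, intensities):
    """Solve lights * g = intensities for g = albedo * normal (Cramer's rule)."""
    d = det3(lights)
    if abs(d) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    g = []
    for col in range(3):
        # Replace one column of the light matrix with the intensity vector.
        m = [row[:] for row in lights]
        for r in range(3):
            m[r][col] = intensities[r]
        g.append(det3(m) / d)
    albedo = math.sqrt(sum(c * c for c in g))
    if albedo == 0.0:
        raise ValueError("pixel is completely dark; normal is undefined")
    normal = tuple(c / albedo for c in g)
    return albedo, normal

# Synthetic pixel: albedo 0.5, normal pointing along +z.
lights = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]]
true_normal = (0.0, 0.0, 1.0)
observed = [0.5 * sum(l * n for l, n in zip(row, true_normal)) for row in lights]
albedo, normal = normal_from_three_lights(lights, observed)
```

With more than three images the same system would typically be solved in the least-squares sense, which makes the estimate less sensitive to noise.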
Shape-from-texture: Suppose an object with a smooth surface is covered by replicated texture units, and its projection from 3D to 2D causes distortion and perspective effects. The distortion and perspective measured in 2D images provide the hint for inversely solving for the depth and normal information of the object surface.
Stereo vision
Stereo vision obtains the three-dimensional geometric information of an object from multiple images, based on research on the human visual system. The results are presented in the form of depth maps. Images of an object acquired by two cameras simultaneously from different viewing angles, or by a single camera at different times from different viewing angles, are used to restore its 3D geometric information and reconstruct its 3D profile and location. This is more direct than monocular methods such as shape-from-shading.
The binocular stereo vision method requires two identical cameras with parallel optical axes to observe the same object, acquiring two images from different points of view. In terms of trigonometric relations, depth information can be calculated from disparity. The binocular stereo vision method is well developed and stably contributes to favorable 3D reconstruction, leading to better performance compared to other 3D reconstruction methods. Unfortunately, it is computationally intensive, and it performs rather poorly when the baseline distance is large.
Problem statement and basics
The approach of using binocular stereo vision to acquire an object's 3D geometric information is based on visual disparity. The following picture provides a simple schematic diagram of horizontally sighted binocular stereo vision, where b is the baseline between the projective centers of the two cameras.
The origin of the camera's coordinate system is at the optical center of the camera's lens, as shown in the figure. Actually, the camera's image plane is behind the optical center of the camera's lens. However, to simplify the calculation, images are drawn in front of the optical center of the lens at distance $f$. The u-axis and v-axis of the image's coordinate system are in the same direction as the x-axis and y-axis of the camera's coordinate system, respectively. The origin of the image's coordinate system is located at the intersection of the imaging plane and the optical axis. Suppose a world point $P$ whose corresponding image points are $P_1(u_1, v_1)$ and $P_2(u_2, v_2)$ on the left and right image planes, respectively. Assume the two cameras are in the same plane; then the y-coordinates of $P_1$ and $P_2$ are identical, i.e., $v_1 = v_2$. According to trigonometric relations,

$$u_1 = f\frac{x_p}{z_p}, \qquad u_2 = f\frac{x_p - b}{z_p}, \qquad v_1 = v_2 = f\frac{y_p}{z_p}$$

where $(x_p, y_p, z_p)$ are the coordinates of $P$ in the left camera's coordinate system and $f$ is the focal length of the camera.
Visual disparity is defined as the difference in image point location of a certain world point acquired by the two cameras,

$$d = u_1 - u_2 = f\frac{b}{z_p}$$

based on which the coordinates of $P$ can be worked out.
Therefore, once the coordinates of the image points are known, along with the parameters of the two cameras, the 3D coordinates of the point can be determined:

$$x_p = \frac{b\,u_1}{d}, \qquad y_p = \frac{b\,v_1}{d}, \qquad z_p = \frac{b\,f}{d}$$
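The recovery of a 3D point from a pair of matched image points can be sketched in code. The block assumes the usual rectified pinhole model: image coordinates measured from the intersection of the optical axis and the image plane, a focal length f in the same units as the image coordinates, and a baseline b along the x-axis of the left camera, so that the disparity d = u1 - u2 gives z = f*b/d, x = b*u1/d and y = b*v1/d. Function and variable names are illustrative.

```python
def triangulate(u1, v1, u2, focal_length, baseline):
    """Recover (x, y, z) in the left camera frame from a rectified match."""
    disparity = u1 - u2
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    x = baseline * u1 / disparity
    y = baseline * v1 / disparity
    z = baseline * focal_length / disparity
    return x, y, z

# Forward-project a known point into both cameras, then recover it.
f, b = 500.0, 0.1
xp, yp, zp = 0.2, -0.05, 2.0
u1, v1 = f * xp / zp, f * yp / zp      # left image point
u2 = f * (xp - b) / zp                 # right image point (same v)
point = triangulate(u1, v1, u2, f, b)
```

Because depth is inversely proportional to disparity, a fixed error in the measured disparity produces a larger depth error for distant points than for near ones.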
3D reconstruction consists of the following steps:
Image acquisition
2D digital image acquisition is the information source of 3D reconstruction. Commonly, 3D reconstruction is based on two or more images, although it may employ only one image in some cases. There are various methods for image acquisition, depending on the occasion and purpose of the specific application. Not only must the requirements of the application be met, but the visual disparity, illumination, camera performance and features of the scene should also be considered.
Camera calibration
Camera calibration in binocular stereo vision refers to the determination of the mapping relationship between the image points $P_1(u_1, v_1)$ and $P_2(u_2, v_2)$ in the left and right images, and the space coordinate $P(x_p, y_p, z_p)$ in the 3D scene. Camera calibration is a basic and essential part of 3D reconstruction via binocular stereo vision.
Feature extraction
The aim of feature extraction is to obtain the characteristics of the images, on which stereo correspondence operates. As a result, the characteristics of the images are closely linked to the choice of matching method. There is no universally applicable theory for feature extraction, which has led to a great diversity of stereo correspondence approaches in binocular stereo vision research.
Stereo correspondence
Stereo correspondence establishes the correspondence between primitive factors in the images, i.e. it matches the image points $P_1(u_1, v_1)$ and $P_2(u_2, v_2)$ of the same world point in the two images. Certain interference factors in the scene should be taken into account, e.g. illumination, noise, surface physical characteristics, etc.
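One simple way to establish such matches, offered here only as an illustration and not as the method the article has in mind, is window-based matching along a scanline of a rectified image pair using the sum of absolute differences (SAD). The function name, window size and search range are assumptions.

```python
def match_along_scanline(left_row, right_row, u_left, half_window=1, max_disparity=8):
    """Return the disparity whose right-image window best matches the left window."""
    def window(row, center):
        return row[center - half_window:center + half_window + 1]

    if not half_window <= u_left < len(left_row) - half_window:
        raise ValueError("window around u_left must lie inside the row")
    reference = window(left_row, u_left)
    best_disparity, best_cost = None, float("inf")
    for d in range(max_disparity + 1):
        u_right = u_left - d                 # the match shifts left in the right image
        if u_right - half_window < 0:
            break                            # window would leave the image
        candidate = window(right_row, u_right)
        cost = sum(abs(a - b) for a, b in zip(reference, candidate))
        if cost < best_cost:
            best_disparity, best_cost = d, cost
    return best_disparity

# Toy intensity rows: the right row is the left row shifted by 3 pixels.
left = [10, 10, 10, 10, 10, 80, 200, 90, 10, 10, 10, 10]
right = left[3:] + [10, 10, 10]
disparity = match_along_scanline(left, right, u_left=6)
```

A raw intensity comparison like this is exactly where the interference factors listed above cause trouble: a change in illumination or added noise alters the cost of every candidate window, and textureless regions make many candidates equally good.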
Restoration
Given precise correspondence, combined with the camera location parameters, 3D geometric information can be recovered without difficulty. Because the accuracy of 3D reconstruction depends on the precision of the correspondence, the error of the camera location parameters and so on, the previous procedures must be done carefully to achieve relatively accurate 3D reconstruction.
3D Reconstruction of medical images
Clinical routines such as diagnosis, patient follow-up, computer-assisted surgery and surgical planning are facilitated by accurate 3D models of the desired part of the human anatomy. The main motivations behind 3D reconstruction include:
* Improved accuracy due to multi-view aggregation.
* Detailed surface estimates.
* Can be used to plan, simulate, guide, or otherwise assist a surgeon in performing a medical procedure.
* The precise position and orientation of the patient's anatomy can be determined.
* Helps in a number of clinical areas, such as radiotherapy planning and treatment verification, spinal surgery, hip replacement, neurointerventions and aortic stenting.
Applications:
3D reconstruction has applications in many fields, including:
* Pavement engineering
* Medicine
* Free-viewpoint video reconstruction
* Robotic mapping
* City planning
* Tomographic reconstruction
* Gaming (Mortara, Michela, et al. "Learning cultural heritage by serious games." Journal of Cultural Heritage 15.3 (2014): 318-325.)
* Virtual environments and virtual tourism
* Earth observation
* Archaeology
* Augmented reality
* Reverse engineering
* Motion capture
* 3D object recognition, gesture recognition and hand tracking
Problem Statement:
Most algorithms available for 3D reconstruction are extremely slow and cannot be used in real time. Although the algorithms presented are still in their infancy, they have the potential for fast computation.
Existing Approaches:
Delaunay and alpha-shapes
* The Delaunay method involves extraction of tetrahedron surfaces from the initial point cloud. The idea of ‘shape’ for a set of points in space is given by the concept of alpha-shapes. Given a finite point set S and a real parameter alpha, the alpha-shape of S is a polytope (the generalization to any dimension of a two-dimensional polygon and a three-dimensional polyhedron) which is neither necessarily convex nor necessarily connected. For a large value of alpha, the alpha-shape is identical to the convex hull of S. The algorithm proposed by Edelsbrunner and Mücke eliminates all tetrahedra whose circumscribing sphere has a radius larger than α. The surface is then obtained from the external triangles of the resulting tetrahedra.
* Another algorithm, called Tight Cocone, labels the initial tetrahedra as interior or exterior. The triangles separating interior from exterior tetrahedra generate the resulting surface.
Both methods have recently been extended to reconstruct point clouds with noise.
With these methods, the quality of the points determines feasibility. Because the whole point cloud is used for precise triangulation, points on the surface with an error above the threshold will be explicitly represented in the reconstructed geometry.
Zero set Methods
Reconstruction of the surface is performed using a distance function which assigns to each point in space a signed distance to the surface S. A contouring algorithm is used to extract a zero-set, which is used to obtain a polygonal representation of the object. Thus, the problem of reconstructing a surface from a disorganized point cloud is reduced to the definition of an appropriate function f with a zero value for the sampled points and a non-zero value for the rest. An algorithm called marching cubes established the use of such methods. There are different variants of the algorithm: some use a discrete function f, while others use a polyharmonic radial basis function to adjust the initial point set. Functions such as moving least squares, basis functions with local support, and functions based on the Poisson equation have also been used. Loss of geometric precision in areas with extreme curvature, i.e. corners and edges, is one of the main issues encountered. Furthermore, pretreatment of the information by applying some kind of filtering technique also affects the definition of the corners by softening them. There are several studies related to post-processing techniques used in the reconstruction for the detection and refinement of corners, but these methods increase the complexity of the solution.
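To make the zero-set idea concrete, the sketch below samples a signed distance function on a regular grid and finds the grid edges where the sign changes, placing a surface point on each by linear interpolation. This is only the edge-crossing step that contouring algorithms such as marching cubes build on, not a full implementation, and the unit sphere used as f is an illustrative assumption rather than a function fitted to a point cloud.

```python
import math

def sphere_sdf(x, y, z, radius=1.0):
    """Signed distance to a sphere centred at the origin (negative inside)."""
    return math.sqrt(x * x + y * y + z * z) - radius

def zero_crossings(f, lo=-1.5, hi=1.5, n=13):
    """Return points where f changes sign along axis-aligned grid edges."""
    step = (hi - lo) / (n - 1)
    coords = [lo + i * step for i in range(n)]
    values = {(i, j, k): f(coords[i], coords[j], coords[k])
              for i in range(n) for j in range(n) for k in range(n)}
    points = []
    for (i, j, k), a in values.items():
        for di, dj, dk in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            neighbour = (i + di, j + dj, k + dk)
            if neighbour not in values:
                continue                      # edge would leave the grid
            b = values[neighbour]
            if (a < 0) != (b < 0):
                t = a / (a - b)               # linear interpolation to f = 0
                points.append((coords[i] + t * di * step,
                               coords[j] + t * dj * step,
                               coords[k] + t * dk * step))
    return points

surface_points = zero_crossings(sphere_sdf)
```

A full contouring algorithm would additionally connect these edge points into polygons cell by cell. The grid spacing limits how sharply corners and edges can be represented, which is one source of the precision loss mentioned above.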
VR Technique
The entire volume transparency of the object is visualized using the volume rendering (VR) technique. Images are produced by projecting rays through the volume data. Along each ray, opacity and color need to be calculated at every voxel. The information calculated along each ray is then aggregated to a pixel on the image plane. This technique helps us to see comprehensively the entire compact structure of the object. The technique needs an enormous amount of computation, which requires powerful computers, and is appropriate for low-contrast data. Two main methods for projecting rays can be considered:
* Object-order method: Projecting rays go through the volume from back to front (from volume to image plane).
* Image-order or ray-casting method: Projecting rays go through the volume from front to back (from image plane to volume).
There exist other methods to composite the image; the appropriate method depends on the user's purposes. Some usual methods in medical imaging are MIP (maximum intensity projection), MinIP (minimum intensity projection), AC (alpha compositing) and NPVR (non-photorealistic volume rendering).
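The per-ray aggregation can be sketched as follows. The block composites the samples along a single ray in front-to-back order using alpha compositing, and also computes the MIP and MinIP values for the same ray. The sample values, and the assumption that each sample already carries a scalar color and an opacity (i.e. that some transfer function has been applied), are illustrative.

```python
def composite_front_to_back(samples):
    """Alpha-composite (color, opacity) samples ordered from the image plane into the volume."""
    color, alpha = 0.0, 0.0
    for sample_color, sample_opacity in samples:
        weight = (1.0 - alpha) * sample_opacity   # light not yet blocked by nearer samples
        color += weight * sample_color
        alpha += weight
        if alpha >= 0.999:                        # early ray termination
            break
    return color, alpha

def max_intensity(samples):
    """MIP: keep the brightest sample along the ray."""
    return max(c for c, _ in samples)

def min_intensity(samples):
    """MinIP: keep the darkest sample along the ray."""
    return min(c for c, _ in samples)

# One ray with four samples; the third is fully opaque and hides the fourth.
ray = [(0.2, 0.1), (0.9, 0.5), (0.4, 1.0), (1.0, 1.0)]
pixel_color, pixel_alpha = composite_front_to_back(ray)
```

Front-to-back order allows the loop to stop as soon as the accumulated opacity saturates, which is one practical reason to prefer it when the computation is expensive.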
Voxel Grid
In this filtering technique, the input space is sampled using a grid of 3D voxels to reduce the number of points. For each voxel, a centroid is chosen as the representative of all points. There are two approaches: selecting the voxel centroid, or selecting the centroid of the points lying within the voxel. Averaging the internal points has a higher computational cost but offers better results. Thus, a subset of the input space is obtained that roughly represents the underlying surface. The Voxel Grid method presents the same problems as other filtering techniques: the impossibility of defining the final number of points that represent the surface, geometric information loss due to the reduction of the points inside a voxel, and sensitivity to noisy input spaces.
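A minimal sketch of the second approach (the centroid of the points lying within each voxel), assuming an axis-aligned grid anchored at the origin and a user-chosen voxel size; the function name and parameters are illustrative.

```python
import math
from collections import defaultdict

def voxel_grid_filter(points, voxel_size):
    """Replace all points falling in the same voxel by their centroid."""
    buckets = defaultdict(list)
    for x, y, z in points:
        # Integer voxel index along each axis.
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        buckets[key].append((x, y, z))
    centroids = []
    for members in buckets.values():
        n = len(members)
        centroids.append(tuple(sum(p[axis] for p in members) / n for axis in range(3)))
    return centroids

# Three points share one voxel; the fourth lies in a different voxel.
cloud = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (0.2, 0.4, 0.1), (1.2, 0.1, 0.1)]
reduced = voxel_grid_filter(cloud, voxel_size=0.5)
```

The number of output points depends on how many voxels happen to be occupied, not on a target count, which illustrates the first limitation listed above.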
See also
* 3D modeling
* 3D data acquisition and object reconstruction
* 3D reconstruction from multiple images
* 3D scanner
* 3D SEM surface reconstruction
* 4D reconstruction
* Depth map
* Kinect
* Photogrammetry
* Stereoscopy
* Structure from motion
External links
* Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes with Deep Generative Networks: generate and reconstruct 3D shapes via modeling multi-view depth maps or silhouettes.
* http://www.nature.com/subjects/3d-reconstruction#news-and-comment
* http://6.869.csail.mit.edu/fa13/lectures/lecture11shapefromX.pdf
* http://research.microsoft.com/apps/search/default.aspx?q=3d+reconstruction
* https://research.google.com/search.html#q=3D%20reconstruction