Egocentric vision
   HOME

TheInfoList



OR:

Egocentric vision or first-person vision is a sub-field of computer vision that entails analyzing images and videos captured by a wearable camera, which is typically worn on the head or on the chest and naturally approximates the visual field of the camera wearer. Consequently, visual data capture the part of the scene on which the user focuses to carry out the task at hand and offer a valuable perspective to understand the user's activities and their context in a naturalistic setting. The wearable camera looking forwards is often supplemented with a camera looking inward at the user's eye and able to measure a user's eye gaze, which is useful to reveal attention and to better understand the user's activity and intentions.


History

The idea of using a wearable camera to gather visual data from a first-person perspective dates back to the 70s, when Steve Mann invented "Digital Eye Glass", a device that, when worn, causes the human eye itself to effectively become both an electronic camera and a television display. Subsequently, wearable cameras were used for health-related applications in the context of Humanistic Intelligence and Wearable AI. Egocentric vision is best done from the point-of-eye, but may also be done by way of a neck-worn camera when eyeglasses would be in-the-way. This neck-worn variant was popularized by way of the
Microsoft SenseCam Microsoft's SenseCam is a lifelogging camera with fisheye lens and trigger sensors, such as accelerometers, heat sensing, and audio, invented by Lyndsay Williams, patent granted in 2009. Usually worn around the neck, Sensecam is used for the MyLif ...
in 2006 for experimental health research works.Doherty, A. R., Hodges, S. E., King, A. C., Smeaton, A. F., Berry, E., Moulin, C. J., ... & Foster, C. (2013)
Wearable cameras in health.
American Journal of Preventive Medicine, 44(3), 320-323.
The interest of the computer vision community into the egocentric paradigm has been arising slowly entering the 2010s and it is rapidly growing in recent years, boosted by both the impressive advanced in the field of
wearable technology Wearable technology is any technology that is designed to be used while worn. Common types of wearable technology include smartwatches and smartglasses. Wearable electronic devices are often close to or on the surface of the skin, where they detec ...
and by the increasing number of potential applications. The prototypical first-person vision system described by Kanade and Hebert, in 2012 is composed by three basic components: a localization component able to estimate the surrounding, a recognition component able to identify object and people, and an
activity recognition Activity recognition aims to recognize the actions and goals of one or more agents from a series of observations on the agents' actions and the environmental conditions. Since the 1980s, this research field has captured the attention of several c ...
component, able to provide information about the current activity of the user. Together, these three components provide a complete situational awareness of the user, which in turn can be used to provide assistance to the itself or to the caregiver. Following this idea, the first computational techniques for egocentric analysis focused on hand-related activity recognition and social interaction analysis.Fathi, A., Hodgins, J. K., & Rehg, J. M. (2012, June)
Social interactions: A first-person perspective.
In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (pp. 1226-1233). IEEE.
Also, given the unconstrained nature of the video and the huge amount of data generated, temporal segmentation and summarization where among the first problems addressed. After almost ten years of egocentric vision (2007 - 2017), the field is still undergoing diversification. Emerging research topics include: * Social saliency estimation * Multi-agent egocentric vision systems * Privacy preserving techniques and applications * Attention-based activity analysis * Social interaction analysis * Hand pose analysis * Ego graphical User Interfaces (EUI) * Understanding social dynamics and attention * Revisiting robotic vision and
machine vision Machine vision (MV) is the technology and methods used to provide imaging-based automatic inspection and analysis for such applications as automatic inspection, process control, and robot guidance, usually in industry. Machine vision refers to ...
as egocentric sensing * Activity forecasting


Technical challenges

Today's wearable cameras are small and lightweight digital recording devices that can acquire images and videos automatically, without the user intervention, with different resolutions and frame rates, and from a first-person point of view. Therefore, wearable cameras are naturally primed to gather visual information from our everyday interactions since they offer an intimate perspective of the visual field of the camera wearer. Depending on the frame rate, it is common to distinguish between photo-cameras (also called lifelogging cameras) and video-cameras. * The former (e.g., Narrative Clip and
Microsoft SenseCam Microsoft's SenseCam is a lifelogging camera with fisheye lens and trigger sensors, such as accelerometers, heat sensing, and audio, invented by Lyndsay Williams, patent granted in 2009. Usually worn around the neck, Sensecam is used for the MyLif ...
), are commonly worn on the chest, and are characterized by a very low frame rate (up to 2fpm) that allows to capture images over a long period of time without the need of recharging the battery. Consequently, they offer considerable potential for inferring knowledge about e.g. behaviour patterns, habits or lifestyle of the user. However, due to the low frame-rate and the free motion of the camera, temporally adjacent images typically present abrupt appearance changes so that motion features cannot be reliably estimated. * The latter (e.g.,
Google Glass Google Glass, or simply Glass, is a brand of smart glasses developed and sold by Google. It was developed by X (previously Google X), with the mission of producing an ubiquitous computer. Google Glass displays information to the wearer using ...
, GoPro), are commonly mounted on the head, and capture conventional video (around 35fps) that allows to capture fine temporal details of interactions. Consequently, they offer potential for in-depth analysis of daily or special activities. However, since the camera is moving with the wearer head, it becomes more difficult to estimate the global motion of the wearer and in the case of abrupt movements, the images can result blurred. In both cases, since the camera is worn in a naturalistic setting, visual data present a huge variability in terms of illumination conditions and object appearance. Moreover, the camera wearer is not visible in the image and what he/she is doing has to be inferred from the information in the visual field of the camera, implying that important information about the wearer, such for instance as pose or facial expression estimation, is not available.


Applications

A collection of studies published in a special theme issue of the American Journal of Preventive Medicine has demonstrated the potential of lifelogs captured through wearable cameras from a number of viewpoints. In particular, it has been shown that used as a tool for understanding and tracking lifestyle behaviour, lifelogs would enable the prevention of noncommunicable diseases associated to unhealthy trends and risky profiles (such as obesity, depression, etc.). In addition, used as a tool of re-memory cognitive training, lifelogs would enable the prevention of cognitive and functional decline in elderly people. More recently, egocentric cameras have been used to study human and animal cognition, human-human social interaction, human-robot interaction, human expertise in complex tasks. Other applications include navigation/assistive technologies for the blind, monitoring and assistance of industrial workflows,Edmunds, S. R., Rozga, A., Li, Y., Karp, E. A., Ibanez, L. V., Rehg, J. M., & Stone, W. L. (2017)
Brief Report: Using a Point-of-View Camera to Measure Eye Gaze in Young Children with Autism Spectrum Disorder During Naturalistic Social Interactions: A Pilot Study.
Journal of Autism and Developmental Disorders, 47(3), 898-904.
and augmented reality interfaces.


See also

*
Eye tracking Eye tracking is the process of measuring either the point of gaze (where one is looking) or the motion of an eye relative to the head. An eye tracker is a device for measuring eye positions and eye movement. Eye trackers are used in research ...
*
Lifelog A lifelog is a personal record of one's daily life in a varying amount of detail, for a variety of purposes. The record contains a comprehensive dataset of a human's activities. The data could be used to increase knowledge about how people liv ...
*
Quantified self The quantified self refers both to the cultural phenomenon of self-tracking with technology and to a community of users and makers of self-tracking tools who share an interest in "self-knowledge through numbers". Quantified self practices overlap ...
*
Smartglasses Smartglasses or smart glasses are eye or head-worn wearable computers that offer useful capabilities to the user. Many smartglasses include displays that add information alongside or to what the wearer sees. Alternatively, smartglasses are som ...
*
Sousveillance Sousveillance ( ) is the recording of an activity by a member of the public, rather than a person or organisation in authority, typically by way of small wearable or portable personal technologies. The term, coined by Steve Mann, stems from th ...


References

{{reflist Computer vision