Video frames of the Parallel Bars action category

Visual temporal attention is a special case of visual attention that involves directing attention to a specific instant of time. Like its spatial counterpart, visual spatial attention, temporal attention modules have been widely implemented in video analytics in computer vision to provide enhanced performance and human-interpretable explanations of deep learning models. Just as the visual spatial attention mechanism allows human and/or computer vision systems to focus on semantically more substantial regions in space, visual temporal attention modules enable machine learning algorithms to place more emphasis on critical video frames in video analytics tasks, such as human action recognition. In convolutional neural network-based systems, the prioritization introduced by the attention mechanism is regularly implemented as a linear weighting layer with parameters determined by labeled training data.
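As a rough illustration of such a weighting layer, the sketch below scores each frame with a linear function of its features and pools the frames by the resulting weights. The function names, the `attention_params` vector, and the use of a softmax to normalize the scores are assumptions made for this example, not details taken from any particular published system.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, attention_params):
    """Pool T per-frame feature vectors into one video-level vector.

    frame_features: list of T vectors, each a list of D floats.
    attention_params: list of D floats (hypothetical learned parameters)
        used to compute one linear relevance score per frame.
    Returns (pooled_vector, temporal_weights).
    """
    # One linear score per frame: dot product of parameters and features.
    scores = [
        sum(w * x for w, x in zip(attention_params, frame))
        for frame in frame_features
    ]
    # Assumed normalization: softmax turns scores into weights that sum to 1.
    weights = softmax(scores)
    dim = len(frame_features[0])
    # Weighted sum over time, computed independently for each feature dimension.
    pooled = [
        sum(weights[t] * frame_features[t][d] for t in range(len(frame_features)))
        for d in range(dim)
    ]
    return pooled, weights
```

With all parameters set to zero, every frame receives the same weight and the pooling reduces to a plain average over time; training would move the parameters away from that baseline so that more informative frames dominate.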

Application in Action Recognition

Recent video segmentation algorithms often exploit both spatial and temporal attention mechanisms. Research in human action recognition has accelerated significantly since the introduction of powerful tools such as Convolutional Neural Networks (CNNs). However, effective methods for incorporating temporal information into CNNs are still being actively explored. Motivated by the popular recurrent attention models in natural language processing, the Attention-aware Temporal Weighted CNN (ATW CNN) has been proposed for videos; it embeds a visual attention model into a temporal weighted multi-stream CNN. This attention model is implemented as temporal weighting, and it effectively boosts the recognition performance of video representations. In addition, each stream in the ATW CNN framework is capable of end-to-end training, with both network parameters and temporal weights optimized by stochastic gradient descent (SGD) with back-propagation. Experimental results show that the ATW CNN attention mechanism contributes substantially to the performance gains by focusing on the more discriminative, more relevant video snippets.
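The idea of learning temporal weights by gradient descent can be sketched on a toy scale. The example below is a simplified illustration and does not reproduce the published ATW CNN architecture: it assumes that each snippet already has a vector of class scores, fuses those scores with one scalar weight per snippet, and updates only the weights with a plain gradient step on a cross-entropy loss. All function names and the learning rate are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(snippet_scores, weights):
    """Video-level class scores as a weighted sum of per-snippet scores."""
    n_classes = len(snippet_scores[0])
    return [
        sum(w * s[c] for w, s in zip(weights, snippet_scores))
        for c in range(n_classes)
    ]


def loss_and_grad(snippet_scores, weights, label):
    """Cross-entropy loss and its gradient with respect to the temporal weights."""
    probs = softmax(fuse(snippet_scores, weights))
    loss = -math.log(probs[label])
    # d(loss)/d(fused_c) = p_c - 1[c == label]
    err = [p - (1.0 if c == label else 0.0) for c, p in enumerate(probs)]
    # d(fused_c)/d(w_k) = snippet_scores[k][c], so chain the two together.
    grad = [sum(e * s[c] for c, e in enumerate(err)) for s in snippet_scores]
    return loss, grad


def sgd_step(weights, grad, lr=0.1):
    """One gradient descent update of the temporal weights."""
    return [w - lr * g for w, g in zip(weights, grad)]


# Three snippets, two classes; the true class is 0.
scores = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = [1.0 / 3.0] * 3
for _ in range(20):
    loss, grad = loss_and_grad(scores, weights, label=0)
    weights = sgd_step(weights, grad)
```

In this toy setting the weight of the first snippet, whose scores favor the true class, grows, while the weight of the second snippet shrinks. In a full system the snippet scores would themselves come from CNN streams trained jointly with the weights.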

References

"NIPS 2017". Interpretable ML Symposium. 2017-10-20. http://interpretable.ml/. Retrieved 2018-09-12.

Zang, Jinliang; Wang, Le; Liu, Ziyi; Zhang, Qilin; Hua, Gang; Zheng, Nanning (2018). "Attention-Based Temporal Weighted Convolutional Neural Network for Action Recognition". IFIP Advances in Information and Communication Technology. Cham: Springer International Publishing. pp. 97–108. arXiv:1803.07179. doi:10.1007/978-3-319-92007-8_9. ISBN 978-3-319-92006-1. ISSN 1868-4238. S2CID 4058889.

Wang, Le; Zang, Jinliang; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning (2018-06-21). "Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network". Sensors. MDPI AG. 18 (7): 1979. Bibcode:2018Senso..18.1979W. doi:10.3390/s18071979. ISSN 1424-8220. PMC 6069475. PMID 29933555. https://qilin-zhang.github.io/_pages/pdfs/sensors-18-01979-Action_Recognition_by_an_Attention-Aware_Temporal_Weighted_Convolutional_Neural_Network.pdf. Material was copied from this source, which is available under a Creative Commons Attribution 4.0 International License.

Center, UCF (2013-10-17). "UCF101 - Action Recognition Data Set". CRCV. http://crcv.ucf.edu/data/UCF101.php. Retrieved 2018-09-12.
