
Video browsing, also known as exploratory video search, is the interactive process of skimming through video content in order to satisfy some information need or to interactively check whether the video content is relevant. While it was originally proposed to help users inspect a single video through visual thumbnails, modern video browsing tools enable users to quickly find desired information in a video archive through iterative human–computer interaction, following an exploratory search approach. Many of these tools assume a knowledgeable user who wants features for interactively inspecting video content as well as automatic content-filtering features. For that purpose, they usually provide several video interaction features, such as sophisticated navigation within a video or search by a content-based query. Video browsing tools often build on lower-level video content analysis, such as shot transition detection, keyframe extraction and semantic concept detection, and use it to create a structured content overview of the video file or video archive. Furthermore, they usually provide sophisticated navigation features, such as advanced timelines, visual seeker bars or a list of selected thumbnails, as well as means for content querying. Examples of content queries are shot filtering through visual concepts (e.g., only shots showing cars), through specific characteristics (e.g., color or motion filtering), through user-provided sketches (e.g., a visually drawn sketch), or through content-based similarity search.
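As an illustration of the color filtering named among the content queries, the following minimal Python sketch keeps only those shots whose keyframe contains enough pixels close to a query color. It is not taken from any particular video browsing tool: the `Shot` structure, the Euclidean RGB distance, and the `tolerance` and `min_fraction` parameters are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from math import dist
from typing import Sequence

RGB = tuple[int, int, int]


@dataclass
class Shot:
    """Hypothetical shot record: an id plus the pixels of one keyframe."""
    shot_id: int
    keyframe: Sequence[RGB]


def color_fraction(pixels: Sequence[RGB], target: RGB, tolerance: float) -> float:
    """Fraction of pixels whose RGB value lies within `tolerance` of `target`.

    Uses plain Euclidean distance in RGB space (an assumption; real tools
    often use perceptual color spaces or histograms instead).
    """
    if not pixels:
        return 0.0
    close = sum(1 for p in pixels if dist(p, target) <= tolerance)
    return close / len(pixels)


def filter_shots_by_color(
    shots: Sequence[Shot],
    target: RGB,
    tolerance: float = 60.0,    # hypothetical default
    min_fraction: float = 0.3,  # hypothetical default
) -> list[int]:
    """Return ids of shots whose keyframe is sufficiently covered by `target`."""
    return [
        s.shot_id
        for s in shots
        if color_fraction(s.keyframe, target, tolerance) >= min_fraction
    ]


if __name__ == "__main__":
    red = (220, 30, 30)
    shots = [
        Shot(0, [(230, 20, 25)] * 6 + [(10, 10, 10)] * 4),     # mostly red
        Shot(1, [(20, 120, 30)] * 10),                         # green
        Shot(2, [(200, 40, 40)] * 2 + [(250, 250, 250)] * 8),  # little red
    ]
    print(filter_shots_by_color(shots, red))  # -> [0]
```

Summarizing each shot by a single keyframe keeps the filter cheap, which matches the interactive, iterative use described above; a more robust variant could aggregate over several keyframes per shot.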


History

Video browsing was originally proposed by Iranian engineer Farshid Arman, Taiwanese computer scientist Arding Hsu, and computer scientist Ming-Yee Chiu while they were working at Siemens, and it was presented at the ACM International Conference on Multimedia in August 1993. They described a shot detection algorithm for compressed video encoded with discrete cosine transform (DCT) based coding standards such as JPEG, MPEG and H.26x. The basic idea was that, since the DCT coefficients are mathematically related to the spatial domain and represent the content of each frame, they can be used to detect differences between video frames. In the algorithm, a subset of blocks in a frame and a subset of DCT coefficients for each block are used as a vector representation of the frame. Because it operates directly on the compressed DCT representation, the algorithm avoids full decompression, which significantly reduces its computational requirements and makes effective video browsing possible. The algorithm represents each shot of a video sequence by an r-frame, a thumbnail of the shot framed by a motion tracking region. A variation of this concept was later adopted for QBIC video content mosaics, in which each r-frame is a salient still from the shot it represents.
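The DCT-domain frame comparison described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the original authors' implementation: it assumes each frame's 8×8 DCT coefficient blocks have already been entropy-decoded from the bitstream, and the selected coefficient positions (`COEFF_POSITIONS`), the block subsampling step, the cosine-based frame difference, and the cut `threshold` are all hypothetical choices.

```python
from math import sqrt
from typing import Sequence

Block = Sequence[Sequence[float]]  # one 8x8 block of DCT coefficients
Frame = Sequence[Block]            # blocks of one frame in raster order

# Hypothetical subset of coefficients: the DC term plus three low-frequency AC terms.
COEFF_POSITIONS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def frame_vector(frame: Frame, block_step: int = 4) -> list[float]:
    """Build a compact frame vector from every `block_step`-th block.

    Only a subset of blocks and a subset of coefficients per block are used,
    so no inverse DCT (i.e. no full decompression) is needed.
    """
    vec: list[float] = []
    for block in frame[::block_step]:
        vec.extend(block[u][v] for u, v in COEFF_POSITIONS)
    return vec


def frame_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed difference measure: 1 - |cosine similarity| (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0 if norm_a == norm_b else 1.0
    return 1.0 - abs(dot) / (norm_a * norm_b)


def detect_cuts(
    frames: Sequence[Frame], threshold: float = 0.2, block_step: int = 4
) -> list[int]:
    """Return indices i where a cut is declared between frames i-1 and i."""
    vectors = [frame_vector(f, block_step) for f in frames]
    return [
        i
        for i in range(1, len(vectors))
        if frame_difference(vectors[i - 1], vectors[i]) > threshold
    ]


def make_block(dc: float, ac: float) -> list[list[float]]:
    """Hypothetical test helper: 8x8 block with only DC and one AC term set."""
    block = [[0.0] * 8 for _ in range(8)]
    block[0][0], block[0][1] = dc, ac
    return block


if __name__ == "__main__":
    a = [make_block(100, 10)] * 16
    a_noisy = [make_block(102, 10)] * 16
    b = [make_block(100, -80)] * 16
    print(detect_cuts([a, a_noisy, a, b, b]))  # -> [3]
```

Only a few numbers per sampled block enter the comparison and no inverse DCT is computed, which is where the computational savings come from. A single threshold on consecutive frames, as assumed here, mainly targets abrupt cuts; gradual transitions such as fades would need additional handling.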


Video Notebook

Modern video browsing solutions include Video Notebook, a Menlo Park startup founded in 2021 by Mike Lanza, which uses computer vision to extract slides, and optical character recognition and speech recognition to facilitate video search. The software can be used either on the client side (via a browser extension), where slides and text are extracted while the video is being watched (e.g., on a video platform like YouTube or Udemy), or on the server side. Processed videos, which can be viewed in the Video Notebook web app, feature a video browsing user interface with extracted timestamped slides, a search bar for querying the video (or a collection of videos), and text chapters. Video Notebook customers include organisations such as Ernst & Young.


Video Browser Showdown

The Video Browser Showdown (VBS) is an annual live evaluation competition for exploratory video search tools, in which international researchers use video browsing tools to solve ad-hoc video search tasks on a moderately large data set as fast as possible. The main goal of the VBS, which started in 2012 at the International Conference on MultiMedia Modeling (MMM), is to advance the performance of video browsing tools. Since 2016, the VBS has also collaborated with TRECVID, an academic benchmark initiative by NIST. The aim of the VBS is to evaluate video browsing tools for efficiency at known-item search (KIS) tasks on a well-defined data set, in direct comparison with other tools (Schöffmann, Klaus; Bailer, Werner: "Video browser showdown", ACM SIGMultimedia Records, vol. 4, no. 2, pp. 1–2, 2012-07-24, doi:10.1145/2350204.2350205).

