Optical music recognition (OMR) is a field of research that investigates how to computationally read musical notation in documents. The goal of OMR is to teach the computer to read and interpret sheet music and produce a machine-readable version of the written music score. Once captured digitally, the music can be saved in commonly used file formats, e.g. MIDI (for playback) and MusicXML (for page layout). In the past it has, misleadingly, also been called "music optical character recognition". Due to significant differences, this term should no longer be used.

History


Optical music recognition of printed sheet music started in the late 1960s at the Massachusetts Institute of Technology, when the first image scanners became affordable for research institutes. Due to the limited memory of early computers, the first attempts were limited to only a few measures of music. In 1984, a Japanese research group from Waseda University developed a specialized robot, called WABOT (WAseda roBOT), which was capable of reading the music sheet in front of it and accompanying a singer on an electric organ. Early research in OMR was conducted by Ichiro Fujinaga, Nicholas Carter, Kia Ng, David Bainbridge, and Tim Bell. These researchers developed many of the techniques that are still being used today. The first commercial OMR application, MIDISCAN (now SmartScore), was released in 1991 by Musitek Corporation. The availability of smartphones with good cameras and sufficient computational power paved the way to mobile solutions where the user takes a picture with the smartphone and the device directly processes the image.

Relation to other fields

Optical music recognition relates to other fields of research, including computer vision, document analysis, and music information retrieval. It is relevant for practicing musicians and composers, who could use OMR systems as a means to enter music into the computer and thus ease the process of composing, transcribing, and editing music. In a library, an OMR system could make music scores searchable, and for musicologists it would make it possible to conduct quantitative musicological studies at scale.

OMR vs. OCR

Optical music recognition has frequently been compared to optical character recognition. The biggest difference is that music notation is a featural writing system. This means that while the alphabet consists of well-defined primitives (e.g., stems, noteheads, or flags), it is their configuration (how they are placed and arranged on the staff) that determines the semantics and how the notation should be interpreted. The second major distinction is that while an OCR system does not go beyond recognizing letters and words, an OMR system is expected to also recover the semantics of the music: the user expects the vertical position of a note (a graphical concept) to be translated into its pitch (a musical concept) by applying the rules of music notation. There is no proper equivalent of this step in text recognition. By analogy, recovering the music from an image of a music sheet can be as challenging as recovering the HTML source code from a screenshot of a website. The third difference comes from the character set used. Although writing systems like Chinese have extraordinarily complex character sets, the set of primitives for OMR spans a much greater range of sizes, from tiny elements such as a dot to big elements, such as a brace, that potentially span an entire page. Some symbols, such as slurs, have a nearly unrestricted appearance: they are only defined as more-or-less smooth curves that may be interrupted anywhere. Finally, music notation involves ubiquitous two-dimensional spatial relationships, whereas text can be read as a one-dimensional stream of information once the baseline is established.
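The translation from vertical position to pitch can be illustrated with a minimal Python sketch. It assumes a hypothetical encoding in which a notehead's position is given as a staff step (0 for the bottom staff line, 1 for the space above it, negative values below the staff) and applies only two rules of music notation: the clef and the key signature. Explicit accidentals, transposing instruments, and other rules are deliberately left out, and the name `staff_step_to_pitch` is illustrative rather than part of any OMR library.

```python
from typing import Optional

STEP_NAMES = ["C", "D", "E", "F", "G", "A", "B"]

# Diatonic index (counting letter names upward from C0) of the bottom staff
# line for two common clefs: E4 in treble clef, G2 in bass clef.
BOTTOM_LINE = {"treble": 4 * 7 + 2, "bass": 2 * 7 + 4}


def staff_step_to_pitch(step: int, clef: str = "treble",
                        key_accidentals: Optional[dict[str, str]] = None) -> str:
    """Translate a notehead's staff step into a pitch name such as "F#5".

    `step` counts half line-spacings upward from the bottom staff line
    (0 = bottom line, 1 = first space, negative = below the staff).
    `key_accidentals` maps letter names to the accidental imposed by the key
    signature, e.g. {"F": "#"} for G major. Explicit accidentals are ignored.
    """
    key_accidentals = key_accidentals or {}
    # The clef fixes which pitch sits on the bottom line; each step moves
    # one letter name, and every seventh letter starts a new octave.
    octave, letter = divmod(BOTTOM_LINE[clef] + step, 7)
    name = STEP_NAMES[letter]
    return f"{name}{key_accidentals.get(name, '')}{octave}"
```

In the treble clef, step 0 yields "E4" and step -2 (the first ledger line below the staff) yields "C4"; under a bass clef the same graphical position means G2 and E2 instead. This dependence on context elsewhere on the page has no counterpart in reading letters.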

Approaches to OMR

The process of recognizing music scores is typically broken down into smaller steps that are handled with specialized pattern recognition algorithms. Many competing approaches have been proposed, most of them sharing a pipeline architecture in which each step performs a certain operation, such as detecting and removing staff lines, before moving on to the next stage. A common problem with this approach is that errors and artifacts made in one stage are propagated through the system and can heavily affect performance. For example, if the staff line detection stage fails to correctly identify the existence of the music staffs, subsequent steps will probably ignore that region of the image, leading to missing information in the output. Optical music recognition is frequently underestimated due to the seemingly easy nature of the problem: given a perfect scan of typeset music, the visual recognition can be solved with a sequence of fairly simple algorithms, such as projections and template matching. However, the process gets significantly harder for poor scans or handwritten music, which many systems fail to recognize altogether. Even if all symbols were detected perfectly, it would still be challenging to recover the musical semantics due to ambiguities and frequent violations of the rules of music notation (see the example of Chopin's Nocturne). Donald Byrd and Jakob Simonsen argue that OMR is difficult because modern music notation is extremely complex. Donald Byrd has also collected a number of interesting and extreme examples of music notation that demonstrate its sheer complexity.
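As an illustration of how simple the clean typeset case can be, the following Python sketch locates staff lines with a horizontal projection profile: it counts the black pixels in each row and keeps rows that are mostly black. It assumes an already binarized, deskewed page stored as a list of rows (1 = black, 0 = white); the function `detect_staff_lines` and its 50% threshold are hypothetical choices for this example, not taken from a particular OMR system.

```python
def detect_staff_lines(image: list[list[int]], threshold: float = 0.5) -> list[int]:
    """Return the centre row index of each horizontal staff-line candidate.

    A row counts as part of a staff line when at least `threshold` of its
    pixels are black (the default value is an assumption, not a standard).
    """
    if not image:
        return []
    width = len(image[0])
    # Horizontal projection profile: number of black pixels in each row.
    profile = [sum(row) for row in image]
    candidate_rows = [y for y, count in enumerate(profile) if count >= threshold * width]

    # Staff lines are often several pixels thick, so merge adjacent rows
    # into one line and report the centre row of each group.
    lines: list[int] = []
    group: list[int] = []
    for y in candidate_rows:
        if group and y != group[-1] + 1:
            lines.append(sum(group) // len(group))
            group = []
        group.append(y)
    if group:
        lines.append(sum(group) // len(group))
    return lines
```

On a page whose staff lines are two pixels thick, each line is reported once, while stems and noteheads, which blacken only a few pixels per row, are ignored. A skewed, curved, or hand-drawn staff breaks the "mostly black row" assumption, which is one way an early-stage error can enter the pipeline and propagate.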

Outputs of OMR systems

Typical applications for OMR systems include the creation of an audible version of the music score (referred to as replayability). A common way to create such a version is by generating a MIDI file, which can be synthesised into an audio file. MIDI files, though, are not capable of storing engraving information (how the notes were laid out) or enharmonic spelling. If the music scores are recognized with the goal of human readability (referred to as reprintability), the structured encoding has to be recovered, which includes precise information on the layout and engraving. Suitable formats to store this information include MEI and MusicXML. Apart from those two applications, it might also be useful to just extract metadata from the image or to enable searching. In contrast to the first two applications, a lower level of comprehension of the music score might be sufficient to perform these tasks.
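The loss of enharmonic spelling in MIDI can be shown with a short Python sketch. It computes MIDI key numbers with the common convention that middle C (C4) is 60 and, for comparison, serializes a MusicXML `<pitch>` element, whose `<step>`, `<alter>`, and `<octave>` children keep the written spelling. The helper names `midi_note_number` and `musicxml_pitch` are hypothetical; only the standard-library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Semitone offset of each natural note name above C.
STEP_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def midi_note_number(step: str, alter: int, octave: int) -> int:
    """MIDI key number, assuming the convention that C4 (middle C) = 60."""
    return 12 * (octave + 1) + STEP_SEMITONES[step] + alter


def musicxml_pitch(step: str, alter: int, octave: int) -> str:
    """Serialize a MusicXML <pitch> element, which keeps the written spelling."""
    pitch = ET.Element("pitch")
    ET.SubElement(pitch, "step").text = step
    if alter:  # <alter> is omitted for natural notes
        ET.SubElement(pitch, "alter").text = str(alter)
    ET.SubElement(pitch, "octave").text = str(octave)
    return ET.tostring(pitch, encoding="unicode")
```

Both C♯4 (`"C", 1, 4`) and D♭4 (`"D", -1, 4`) map to MIDI key 61, so a MIDI file cannot tell them apart, whereas the two MusicXML `<pitch>` elements differ in their `<step>` and `<alter>` values.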

General framework (2001)

In 2001, David Bainbridge and Tim Bell published their work on the challenges of OMR, in which they reviewed previous research and extracted a general framework for OMR. Their framework has been used by many systems developed after 2001. The framework has four distinct stages with a heavy emphasis on the visual detection of objects. They noticed that the reconstruction of the musical semantics was often omitted from published articles because the operations used were specific to the output format.

Refined framework (2012)

Optical Music Recognition Architecture by Rebelo (2012)

In 2012, Ana Rebelo et al. surveyed techniques for optical music recognition. They categorized the published research and refined the OMR pipeline into four stages: preprocessing, music symbol recognition, musical notation reconstruction, and final representation construction. This framework became the de facto standard for OMR and is still being used today (although sometimes with slightly different terminology). For each block, they give an overview of techniques that are used to tackle that problem. This publication is the most cited paper on OMR research as of 2019.

Deep learning (since 2016)

With the advent of deep learning, many computer vision problems have shifted from imperative programming with hand-crafted heuristics and feature engineering towards machine learning. In optical music recognition, the staff processing stage, the music object detection stage, and the music notation reconstruction stage have all seen successful attempts to solve them with deep learning. Even completely new approaches have been proposed, including solving OMR in an end-to-end fashion with sequence-to-sequence models that take an image of music scores and directly produce the recognized music in a simplified format.
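End-to-end models of this kind are often trained with a connectionist temporal classification (CTC) loss, which lets the network emit one prediction per image column without knowing where each symbol starts. This is one common choice rather than something stated above. The Python sketch below shows greedy (best-path) CTC decoding, which turns those per-column predictions into a symbol sequence. The function name, the blank index 0, and the token names in the usage note are hypothetical and only loosely modelled on simplified score encodings.

```python
BLANK = 0  # index reserved for the CTC "no symbol" class (an assumption)


def ctc_greedy_decode(frame_scores: list[list[float]], vocabulary: list[str]) -> list[str]:
    """Greedy (best-path) CTC decoding.

    frame_scores[t][k] is the model's score for class k at image column t.
    The best class is taken per column, consecutive repeats are merged, and
    blanks are dropped; a blank between two equal classes keeps both.
    """
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded: list[str] = []
    previous = None
    for k in best_path:
        if k != previous and k != BLANK:
            decoded.append(vocabulary[k])
        previous = k
    return decoded
```

With a vocabulary `["<blank>", "clef-G2", "note-C4_quarter", "note-E4_quarter"]` and a best path of classes `1, 1, 0, 2, 2, 0, 2, 3, 3`, the decoder yields the clef, two separate C4 quarter notes (kept apart by the blank), and an E4 quarter note.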

Notable scientific projects

Staff removal challenge

For systems that were developed before 2016, staff detection and removal posed a significant obstacle. A scientific competition was organized to improve the state of the art and advance the field. Due to excellent results and modern techniques that made the staff removal stage obsolete, this competition was discontinued. However, the freely available CVC-MUSCIMA dataset that was developed for this challenge is still highly relevant for OMR research as it contains 1000 high-quality images of handwritten music scores, transcribed by 50 different musicians. It has been further extended into the MUSCIMA++ dataset, which contains detailed annotations for 140 out of 1000 pages.
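To illustrate what staff removal involves, the Python sketch below applies a naive heuristic: on each detected staff-line row, a black pixel is erased only if the pixels directly above and below it are white, so that stems and noteheads crossing the line survive. It assumes a binarized page (1 = black) with one-pixel-thick lines whose row indices are already known; the function `remove_staff_lines` is hypothetical and is not one of the methods submitted to the competition. Thick or uneven lines, symbols that only touch a line, and handwritten staves all defeat this rule, which hints at why the stage was such an obstacle.

```python
def remove_staff_lines(image: list[list[int]], line_rows: list[int]) -> list[list[int]]:
    """Erase one-pixel-thick staff lines while keeping pixels of crossing symbols."""
    height = len(image)
    out = [row[:] for row in image]  # work on a copy; the input is left untouched
    for y in line_rows:
        for x, px in enumerate(image[y]):
            if not px:
                continue
            # Neighbours are read from the original image so that erasures
            # made earlier in the loop do not influence later decisions.
            above = image[y - 1][x] if y > 0 else 0
            below = image[y + 1][x] if y + 1 < height else 0
            # White above and below: assume the pixel belongs to the line only.
            if not above and not below:
                out[y][x] = 0
    return out
```

For a single staff line crossed by a vertical stem, the line disappears except for the one pixel where the stem passes through it, leaving the stem intact.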

SIMSSA

The Single Interface for Music Score Searching and Analysis project (SIMSSA) is probably the largest project that attempts to teach computers to recognize musical scores and make them accessible. Several sub-projects have already been successfully completed, including the Liber Usualis and Cantus Ultimus.

TROMPA

Towards Richer Online Music Public-domain Archives (TROMPA) is an international research project, sponsored by the European Union that investigates how to make public-domain digital music resources more accessible.

Datasets

The development of OMR systems benefits from test datasets of sufficient size and diversity to ensure that the system being developed works under various conditions. However, for legal reasons and because of potential copyright violations, it is challenging to compile and publish such a dataset. The most notable datasets for OMR are referenced and summarized by the OMR Datasets project and include the CVC-MUSCIMA, MUSCIMA++, DeepScores, PrIMuS, HOMUS, and SEILS datasets, as well as the Universal Music Symbol Collection. The French company Newzik took a different approach in the development of its OMR technology, Maestria, by using random score generation. Using synthetic data helped avoid copyright issues and made it possible to train the artificial intelligence algorithms on musical cases that rarely occur in the actual repertoire, ultimately resulting (according to the company's claims) in more accurate music recognition.

Software

Academic and open-source software

Many OMR projects have been realized in academia, but only a few of them reached a mature state and were successfully deployed to users. These systems are:
* Aruspix
* Audiveris
* CANTOR
* MusicStaves toolkit for Gamera
* DMOS
* OpenOMR
* Rodan

Commercial software

Most of the commercial desktop applications that were developed in the last 20 years have since been shut down due to a lack of commercial success, leaving only a few vendors that are still developing, maintaining, and selling OMR products. Some of these products claim extremely high recognition rates of up to 100% accuracy but fail to disclose how those numbers were obtained, making it nearly impossible to verify them and compare different OMR systems.
* capella-scan
* FORTE by Forte Notation
* MIDI-Connections Scan by Composing & Arranging Systems
* NoteScan bundled with Nightingale
* Myriad SARL
** OMeR (Optical Music easy Reader) Add-on for Harmony Assistant and Melody Assistant: Myriad Software
** PDFtoMusic Pro
* PhotoScore by Neuratron. The Light version of PhotoScore is used in Sibelius; PhotoScore uses the SharpEye SDK.
* Scorscan by npcImaging
* SmartScore by Musitek. Formerly packaged as "MIDISCAN". (SmartScore Lite has been used in previous versions of Finale.)
* ScanScore (also available as a bundle with Forte Notation)
* Soundslice PDF/image importer. AI-based OMR system released in beta in September 2022.
* Maestria by Newzik. Released in May 2021, Maestria is an example of new-generation OMR technology based on deep learning. The company claims it not only brings better results but also means "it becomes more accurate with each conversion".

Mobile apps

Better cameras and increases in processing power have enabled a range of mobile applications, both on the Google Play Store and the Apple App Store. Frequently the focus is on sight-playing (see sight-reading): converting the sheet music into sound that is played on the device.
* iSeeNotes by Gear Up AB
* NotateMe Now by Neuratron
* Notation Scanner by Song Zhang
* PlayScore 2 by Organum Ltd
* SmartScore NoteReader by Musitek
* Newzik app


External links

* Recording of the ISMIR 2018 tutorial "Optical Music Recognition for Dummies"
* https://web.archive.org/web/20131218222736/http://www.informatics.indiana.edu/donbyrd/OMRSystemsTable.html OMR (Optical Music Recognition) Systems: comprehensive table of OMR systems (last updated 30 January 2007)