HTML audio is the part of the HTML specification that covers audio in the browser, including speech to text.
<audio> element
The <audio> element represents a sound or an audio stream. It is commonly used to play back a single audio file within a web page, showing a GUI widget with play/pause/volume controls.
The element has these attributes:
* global attributes (accesskey; class; contenteditable; contextmenu; dir; draggable; dropzone; hidden; id; lang; spellcheck; style; tabindex; title; translate)
* autoplay = "autoplay" or "" (empty string) or empty
Instructs the User-Agent to automatically begin playback of the audio stream as soon as it can do so without stopping.
* preload = "none" or "metadata" or "auto" or "" (empty string) or empty
Represents a hint to the User-Agent about whether optimistic downloading of the audio stream itself or its metadata is considered worthwhile.
** "none": Hints to the User-Agent that the user is not expected to need the audio stream, or that minimizing unnecessary traffic is desirable.
** "metadata": Hints to the User-Agent that the user is not expected to need the audio stream, but that fetching its metadata (duration and so on) is desirable.
** "auto": Hints to the User-Agent that optimistically downloading the entire audio stream is considered desirable.
* controls = "controls" or "" (empty string) or empty
Instructs the User-Agent to expose a user interface for controlling playback of the audio stream.
* loop = "loop" or "" (empty string) or empty
Instructs the User-Agent to seek back to the start of the audio stream upon reaching the end.
* mediagroup = string
Instructs the User-Agent to link multiple videos and/or audio streams together.
* muted = "muted" or "" (empty string) or empty
Represents the default state of the audio stream, potentially overriding user preferences.
* src = non-empty URL potentially surrounded by spaces
The URL for the audio stream.
Example:
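A minimal sketch of how the attributes above combine into markup, written as a small JavaScript helper. The helper name `audioMarkup` and the file name `song.ogg` are hypothetical and used only for illustration. Boolean attributes such as controls or loop are switched on simply by being present, which is why the helper emits them without a value.

```javascript
// Boolean attributes of <audio>, as listed above: present means "on".
const BOOLEAN_ATTRIBUTES = ["autoplay", "controls", "loop", "muted"];
const PRELOAD_VALUES = ["none", "metadata", "auto"];

// Escape characters that are unsafe inside a double-quoted attribute value.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Hypothetical helper: serialize an options object into <audio> markup.
function audioMarkup(options) {
  // src must be non-empty; surrounding spaces are allowed and trimmed here.
  const src = String((options && options.src) || "").trim();
  if (!src) {
    throw new Error("src must be a non-empty URL");
  }
  const parts = [`src="${escapeAttribute(src)}"`];
  if (options.preload !== undefined) {
    if (!PRELOAD_VALUES.includes(options.preload)) {
      throw new Error(`invalid preload value: ${options.preload}`);
    }
    parts.push(`preload="${options.preload}"`);
  }
  for (const name of BOOLEAN_ATTRIBUTES) {
    if (options[name]) parts.push(name);
  }
  return `<audio ${parts.join(" ")}></audio>`;
}

console.log(audioMarkup({ src: "song.ogg", controls: true, preload: "metadata" }));
// prints: <audio src="song.ogg" preload="metadata" controls></audio>
```

The printed markup can be pasted into a page as-is; a browser that supports the element should then show its built-in playback controls.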
Supporting browsers
On PC:
* Google Chrome
* Internet Explorer 9
* Firefox 3.5
* Opera 10.5
* Safari 3.1
On mobile devices:
* Android Browser 2.3
* Google Chrome
* Internet Explorer Mobile 9
* Safari 4
* Firefox
* Opera Mobile 11
Supported audio coding formats
The adoption of HTML audio, as with HTML video, has become polarized between proponents of free and patent-encumbered formats. In 2007, the recommendation to use Vorbis was retracted from the HTML5 specification by the W3C together with that to use Ogg Theora, citing the lack of a format accepted by all the major browser vendors.
Apple and Microsoft support the ISO/IEC-defined formats AAC and the older MP3. Mozilla and Opera support the free and open, royalty-free Vorbis format in Ogg and WebM containers, and criticize the patent-encumbered nature of MP3 and AAC, which are guaranteed to be “non-free”. Google has so far provided support for all common formats.
Most AAC files with finite length are wrapped in an MPEG-4 container (.mp4, .m4a), which is supported natively in Internet Explorer, Safari, and Chrome, and supported by the OS in Firefox and Opera. Most AAC live streams with infinite length are wrapped in an Audio Data Transport Stream container (.aac, .adts), which is supported by Chrome, Safari, Firefox and Edge.
Many browsers also support uncompressed PCM audio in a WAVE container.
In 2012, the free and open, royalty-free Opus format was released and standardized by the IETF. It is supported by Mozilla, Google, Opera and Edge.
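Because format support differs between browsers, pages commonly probe for it at run time. The following is a minimal sketch using the media element's `canPlayType()` method, which answers "probably", "maybe", or an empty string. The helper name `pickSource` and the `clip.*` file names are hypothetical; the MIME type strings follow common usage and may need adjusting for a particular encoder.

```javascript
// Candidate sources, ordered by preference. File names are hypothetical.
const CANDIDATES = [
  { url: "clip.opus", type: 'audio/ogg; codecs="opus"' },
  { url: "clip.ogg", type: 'audio/ogg; codecs="vorbis"' },
  { url: "clip.m4a", type: 'audio/mp4; codecs="mp4a.40.2"' },
  { url: "clip.mp3", type: "audio/mpeg" },
  { url: "clip.wav", type: "audio/wav" },
];

// canPlayType() returns "probably", "maybe", or "" (empty string means no).
// Prefer the first "probably"; otherwise fall back to the first "maybe".
function pickSource(player, candidates) {
  let fallback = null;
  for (const candidate of candidates) {
    const answer = player.canPlayType(candidate.type);
    if (answer === "probably") return candidate;
    if (answer === "maybe" && fallback === null) fallback = candidate;
  }
  return fallback; // null when nothing is playable
}

// In a browser, probe a real <audio> element; elsewhere this is skipped.
if (typeof document !== "undefined") {
  const player = document.createElement("audio");
  const chosen = pickSource(player, CANDIDATES);
  if (chosen) player.src = chosen.url;
}
```

An alternative that needs no script is to nest several source elements inside the audio element and let the browser choose the first one it can play.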
This table documents the current support for audio coding formats by the <audio> element.
Web Audio API and MediaStream Processing API
The Web Audio API specification developed by the W3C describes a high-level JavaScript API for processing and synthesizing audio in web applications. The primary paradigm is of an audio routing graph, where a number of AudioNode objects are connected together to define the overall audio rendering. The actual processing will primarily take place in the underlying implementation (typically optimized Assembly / C / C++ code), but direct JavaScript processing and synthesis is also supported.
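The routing-graph idea can be illustrated with a very small graph: an oscillator node feeding a gain node, which feeds the context's destination. This is a sketch under stated assumptions, not a reference implementation: the helper name `buildToneGraph` is hypothetical, and the 440 Hz frequency and 0.2 gain are arbitrary example values.

```javascript
// Build a small routing graph: oscillator -> gain -> destination.
// `context` is expected to behave like an AudioContext.
function buildToneGraph(context, frequencyHz, volume) {
  const oscillator = context.createOscillator(); // source node
  const gain = context.createGain();             // processing node

  oscillator.type = "sine";
  oscillator.frequency.value = frequencyHz;
  gain.gain.value = volume;

  // Connecting nodes defines the rendering path.
  oscillator.connect(gain);
  gain.connect(context.destination);
  return { oscillator, gain };
}

// Browsers generally require a user gesture before audio may start,
// so the graph is created inside a one-time click handler.
if (typeof window !== "undefined" && typeof window.AudioContext === "function") {
  document.addEventListener("click", () => {
    const context = new AudioContext();
    const { oscillator } = buildToneGraph(context, 440, 0.2);
    oscillator.start();
    oscillator.stop(context.currentTime + 1); // play for one second
  }, { once: true });
}
```

Keeping the graph construction in a function that receives the context makes it possible to exercise the wiring with a stand-in object outside a browser.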
Mozilla's Firefox browser has included a similar Audio Data API extension since version 4 (implemented in 2010 and released in 2011), but Mozilla warns that it is non-standard and deprecated, and recommends the Web Audio API instead.
Some JavaScript audio processing and synthesis libraries, such as Audiolet, support both APIs.
The W3C Audio Working Group is also considering the MediaStream Processing API specification developed by Mozilla. In addition to audio mixing and processing, it covers more general media streaming, including synchronization with HTML elements, capture of audio and video streams, and peer-to-peer routing of such media streams.
Supporting browsers
On PC:
* Google Chrome 10 (enabled by default since 14)
* Firefox 23 (enabled by default since 25)
* Opera 15
* Safari 6
* Microsoft Edge 12
* Opera GX 36
On mobile devices:
* Google Chrome for Android 28 (enabled by default since 29) and Apple iPads
* Safari 6 (with restrictions on use: muted unless triggered by a user action)
* Firefox 23 (enabled by default since 25)
* Tizen
Web Speech API
The Web Speech API aims to provide an alternative input method for web applications (without using a keyboard). With this API, developers can give web apps the ability to transcribe voice to text from the computer's microphone. The recorded audio is sent to speech servers for transcription, after which the text is typed out for the user. The API itself is agnostic of the underlying speech recognition implementation and can support both server-based and embedded recognizers.
The HTML Speech Incubator group has proposed the implementation of audio-speech technology in browsers in the form of uniform, cross-platform APIs. The API contains both:
* Speech Input API
* Text to Speech API
Google integrated this feature into Google Chrome in March 2011, letting its users search the web with their voice.
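Voice input of this kind can be sketched in JavaScript roughly as follows. This is not Google's original code, which is not reproduced here; it is a minimal illustration of the speech input side of the API. The helper names `getRecognitionConstructor` and `listenOnce` are hypothetical, the "en-US" language tag is an example value, and the prefixed `webkitSpeechRecognition` lookup is included on the assumption that some browsers expose the constructor only under that name.

```javascript
// Look up the recognition constructor; some browsers expose it
// only under a webkit prefix.
function getRecognitionConstructor(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// Hypothetical helper: listen for one phrase and pass the transcript
// to a callback. Returns null when recognition is unavailable.
function listenOnce(scope, language, onTranscript) {
  const Recognition = getRecognitionConstructor(scope);
  if (!Recognition) return null;

  const recognition = new Recognition();
  recognition.lang = language;
  recognition.interimResults = false; // deliver final results only
  recognition.onresult = (event) => {
    // results[0][0] is the top alternative of the first result.
    onTranscript(event.results[0][0].transcript);
  };
  recognition.start();
  return recognition;
}

// In a browser, start listening after a click, since microphone access
// typically involves a permission prompt.
if (typeof window !== "undefined") {
  document.addEventListener("click", () => {
    listenOnce(window, "en-US", (text) => console.log("Heard:", text));
  }, { once: true });
}
```

The transcript handed to the callback could then be placed into a search field and submitted, which is the kind of voice search described above.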
Supporting browsers
* Safari 14.1 and up
* Google Chrome 25 and up
* Firefox Desktop 44.0 and up (Linux and Mac) / 45.0 and up (Windows) (partial: speech synthesis only; no recognition; enabled by default since 49.0)
See also
* HTML video
* Use of Ogg formats in HTML5
External links
* HTML/Elements/audio – W3C Wiki
* HTML5 audio element – W3C
* Web Audio API – W3C
* MediaStream Processing API – W3C
* Web Audio DAW – GitHub
* Mozilla's Web Audio API