HOME

TheInfoList



OR:

The Real-time Transport Protocol (RTP) is a
network protocol A communication protocol is a system of rules that allows two or more entities of a communications system to transmit information via any kind of variation of a physical quantity. The protocol defines the rules, syntax, semantics and synchroniza ...
for delivering audio and video over IP networks. RTP is used in communication and entertainment systems that involve
streaming media Streaming media is multimedia that is delivered and consumed in a continuous manner from a source, with little or no intermediate storage in network elements. ''Streaming'' refers to the delivery method of content, rather than the content it ...
, such as
telephony Telephony ( ) is the field of technology involving the development, application, and deployment of telecommunication services for the purpose of electronic transmission of voice, fax, or data, between distant parties. The history of telephony is i ...
,
video teleconference Videotelephony, also known as videoconferencing and video teleconferencing, is the two-way or multipoint reception and transmission of audio and video signals by people in different locations for real time communication.McGraw-Hill Concise Ency ...
applications including
WebRTC WebRTC (Web Real-Time Communication) is a free and open-source project providing web browsers and mobile applications with real-time communication (RTC) via application programming interfaces (APIs). It allows audio and video communication to wor ...
, television services and web-based
push-to-talk Push-to-talk (PTT), also known as press-to-transmit, is a method of having conversations or talking on half-duplex communication lines, including two-way radio, using a momentary button to switch from voice reception mode to transmit mode. H ...
features. RTP typically runs over
User Datagram Protocol In computer networking, the User Datagram Protocol (UDP) is one of the core communication protocols of the Internet protocol suite used to send messages (transported as datagrams in packets) to other hosts on an Internet Protocol (IP) network. ...
(UDP). RTP is used in conjunction with the
RTP Control Protocol The RTP Control Protocol (RTCP) is a sister protocol of the Real-time Transport Protocol (RTP). Its basic functionality and packet structure is defined in RFC 3550. RTCP provides out-of-band statistics and control information for an RTP session. ...
(RTCP). While RTP carries the media streams (e.g., audio and video), RTCP is used to monitor transmission statistics and
quality of service Quality of service (QoS) is the description or measurement of the overall performance of a service, such as a telephony or computer network, or a cloud computing service, particularly the performance seen by the users of the network. To quantitat ...
(QoS) and aids
synchronization Synchronization is the coordination of events to operate a system in unison. For example, the conductor of an orchestra keeps the orchestra synchronized or ''in time''. Systems that operate with all parts in synchrony are said to be synchronou ...
of multiple streams. RTP is one of the technical foundations of
Voice over IP Voice over Internet Protocol (VoIP), also called IP telephony, is a method and group of technologies for the delivery of speech, voice communications and multimedia sessions over Internet Protocol (IP) networks, such as the Internet. The terms In ...
and in this context is often used in conjunction with a
signaling protocol A signaling protocol is a type of communications protocol for encapsulating the signaling between communication endpoints and switching systems to establish or terminate a connection and to identify the state of connection. The following is a list ...
such as the
Session Initiation Protocol The Session Initiation Protocol (SIP) is a signaling protocol used for initiating, maintaining, and terminating communication sessions that include voice, video and messaging applications. SIP is used in Internet telephony, in private IP telepho ...
(SIP) which establishes connections across the network. RTP was developed by the Audio-Video Transport Working Group of the
Internet Engineering Task Force The Internet Engineering Task Force (IETF) is a standards organization for the Internet and is responsible for the technical standards that make up the Internet protocol suite (TCP/IP). It has no formal membership roster or requirements and a ...
(IETF) and first published in 1996 as which was then superseded by in 2003.


Overview

The Internet Engineering Task Force (
IETF The Internet Engineering Task Force (IETF) is a standards organization for the Internet and is responsible for the technical standards that make up the Internet protocol suite (TCP/IP). It has no formal membership roster or requirements and a ...
) began developing RTP starting in 1992, along with the
Session Announcement Protocol The Session Announcement Protocol (SAP) is an experimental protocol for advertising multicast session information. SAP typically uses Session Description Protocol (SDP) as the format for Real-time Transport Protocol (RTP) session descriptions. Anno ...
(SAP), the
Session Description Protocol The Session Description Protocol (SDP) is a format for describing multimedia communication sessions for the purposes of announcement and invitation. Its predominant use is in support of streaming media applications, such as voice over IP (VoIP) ...
(SDP), and the
Session Initiation Protocol The Session Initiation Protocol (SIP) is a signaling protocol used for initiating, maintaining, and terminating communication sessions that include voice, video and messaging applications. SIP is used in Internet telephony, in private IP telepho ...
(SIP). RTP is designed for end-to-end,
real-time Real-time or real time describes various operations in computing or other processes that must guarantee response times within a specified time (deadline), usually a relatively short time. A real-time process is generally one that happens in defined ...
transfer of
streaming media Streaming media is multimedia that is delivered and consumed in a continuous manner from a source, with little or no intermediate storage in network elements. ''Streaming'' refers to the delivery method of content, rather than the content it ...
. The protocol provides facilities for
jitter In electronics and telecommunications, jitter is the deviation from true periodicity of a presumably periodic signal, often in relation to a reference clock signal. In clock recovery applications it is called timing jitter. Jitter is a significa ...
compensation and detection of
packet loss Packet loss occurs when one or more packets of data travelling across a computer network fail to reach their destination. Packet loss is either caused by errors in data transmission, typically across wireless networks, or network congestion.Kur ...
and
out-of-order delivery In computer networking, out-of-order delivery is the delivery of data packets in a different order from which they were sent. Out-of-order delivery can be caused by packets following multiple paths through a network, by lower-layer retransmissio ...
, which are common especially during UDP transmissions on an IP network. RTP allows data transfer to multiple destinations through
IP multicast IP multicast is a method of sending Internet Protocol (IP) datagrams to a group of interested receivers in a single transmission. It is the IP-specific form of multicast and is used for streaming media and other network applications. It uses spec ...
. RTP is regarded as the primary standard for audio/video transport in IP networks and is used with an associated profile and payload format. The design of RTP is based on the architectural principle known as
application-layer framing Application-layer framing or application-level framing (ALF) is a method of allowing an application to use its semantics for the design of its network protocols. This procedure was first proposed by D. D. Clark and David L. Tennenhouse.Clark, D ...
where protocol functions are implemented in the application as opposed to the operating system's
protocol stack The protocol stack or network stack is an implementation of a computer networking protocol suite or protocol family. Some of these terms are used interchangeably but strictly speaking, the ''suite'' is the definition of the communication protoco ...
. Real-time
multimedia Multimedia is a form of communication that uses a combination of different content forms such as text, audio, images, animations, or video into a single interactive presentation, in contrast to tradition ...
streaming applications require timely delivery of information and often can tolerate some packet loss to achieve this goal. For example, loss of a packet in audio application may result in loss of a fraction of a second of audio data, which can be made unnoticeable with suitable
error concealment Error concealment is a technique used in signal processing that aims to minimize the deterioration of signals caused by missing data, called packet loss. A signal is a message sent from a transmitter to a receiver in multiple small packets. Packet ...
algorithms. The
Transmission Control Protocol The Transmission Control Protocol (TCP) is one of the main protocols of the Internet protocol suite. It originated in the initial network implementation in which it complemented the Internet Protocol (IP). Therefore, the entire suite is commonly ...
(TCP), although standardized for RTP use, is not normally used in RTP applications because TCP favors reliability over timeliness. Instead the majority of the RTP implementations are built on the
User Datagram Protocol In computer networking, the User Datagram Protocol (UDP) is one of the core communication protocols of the Internet protocol suite used to send messages (transported as datagrams in packets) to other hosts on an Internet Protocol (IP) network. ...
(UDP). Other transport protocols specifically designed for multimedia sessions are
SCTP The Stream Control Transmission Protocol (SCTP) is a computer networking communications protocol in the transport layer of the Internet protocol suite. Originally intended for Signaling System 7 (SS7) message transport in telecommunication, the p ...
and DCCP, although, , they are not in widespread use. RTP was developed by the Audio/Video Transport working group of the IETF standards organization. RTP is used in conjunction with other protocols such as
H.323 H.323 is a recommendation from the ITU Telecommunication Standardization Sector (ITU-T) that defines the protocols to provide audio-visual communication sessions on any packet network. The H.323 standard addresses call signaling and control, m ...
and
RTSP The Real Time Streaming Protocol (RTSP) is an application-level network protocol designed for multiplexing and packetizing multimedia transport streams (such as interactive media, video and audio) over a suitable transport protocol. RTSP i ...
. The RTP specification describes two protocols: RTP and RTCP. RTP is used for the transfer of multimedia data, and the RTCP is used to periodically send control information and QoS parameters. The data transfer protocol, RTP, carries real-time data. Information provided by this protocol includes timestamps (for synchronization), sequence numbers (for packet loss and reordering detection) and the payload format which indicates the encoded format of the data. The control protocol, RTCP, is used for quality of service (QoS) feedback and synchronization between the media streams. The bandwidth of RTCP traffic compared to RTP is small, typically around 5%. RTP sessions are typically initiated between communicating peers using a signaling protocol, such as H.323, the
Session Initiation Protocol The Session Initiation Protocol (SIP) is a signaling protocol used for initiating, maintaining, and terminating communication sessions that include voice, video and messaging applications. SIP is used in Internet telephony, in private IP telepho ...
(SIP), RTSP, or
Jingle A jingle is a short song or tune used in advertising and for other commercial uses. Jingles are a form of sound branding. A jingle contains one or more hooks and meaning that explicitly promote the product or service being advertised, usually t ...
(
XMPP Extensible Messaging and Presence Protocol (XMPP, originally named Jabber) is an open communication protocol designed for instant messaging (IM), presence information, and contact list maintenance. Based on XML (Extensible Markup Language), it ...
). These protocols may use the
Session Description Protocol The Session Description Protocol (SDP) is a format for describing multimedia communication sessions for the purposes of announcement and invitation. Its predominant use is in support of streaming media applications, such as voice over IP (VoIP) ...
to specify the parameters for the sessions. An RTP session is established for each multimedia stream. Audio and video streams may use separate RTP sessions, enabling a receiver to selectively receive components of a particular stream. The RTP and RTCP design is independent of the transport protocol. Applications most typically use UDP with port numbers in the unprivileged range (1024 to 65535). The Stream Control Transmission Protocol (SCTP) and the
Datagram Congestion Control Protocol In computer networking, the Datagram Congestion Control Protocol (DCCP) is a message-oriented transport layer protocol. DCCP implements reliable connection setup, teardown, Explicit Congestion Notification (ECN), congestion control, and feature ne ...
(DCCP) may be used when a reliable transport protocol is desired. The RTP specification recommends even port numbers for RTP, and the use of the next odd port number for the associated RTCP session. A single port can be used for RTP and RTCP in applications that multiplex the protocols. RTP is used by real-time multimedia applications such as
voice over IP Voice over Internet Protocol (VoIP), also called IP telephony, is a method and group of technologies for the delivery of speech, voice communications and multimedia sessions over Internet Protocol (IP) networks, such as the Internet. The terms In ...
,
audio over IP Audio over IP (AoIP) is the distribution of digital audio across an IP network such as the Internet. It is used increasingly to provide high-quality audio feeds over long distances. The application is also known as audio contribution over IP (ACI ...
,
WebRTC WebRTC (Web Real-Time Communication) is a free and open-source project providing web browsers and mobile applications with real-time communication (RTC) via application programming interfaces (APIs). It allows audio and video communication to wor ...
and
Internet Protocol television Internet Protocol television (IPTV) is the delivery of television content over Internet Protocol (IP) networks. This is in contrast to delivery through traditional terrestrial, satellite, and cable television formats. Unlike downloaded media, ...
.


Profiles and payload formats

RTP is designed to carry a multitude of multimedia formats, which permits the development of new formats without revising the RTP standard. To this end, the information required by a specific application of the protocol is not included in the generic RTP header. For each class of application (e.g., audio, video), RTP defines a ''profile'' and associated ''payload formats''. Every instantiation of RTP in a particular application requires a profile and payload format specifications. The profile defines the codecs used to encode the payload data and their mapping to payload format codes in the protocol field ''Payload Type'' (PT) of the RTP header. Each profile is accompanied by several payload format specifications, each of which describes the transport of particular encoded data. Examples of audio payload formats are
G.711 G.711 is a narrowband audio codec originally designed for use in telephony that provides toll-quality audio at 64 kbit/s. G.711 passes audio signals in the range of 300–3400 Hz and samples them at the rate of 8,000 samples per second ...
,
G.723 G.723 is an ITU-T standard speech codec using extensions of G.721 providing voice quality covering 300 Hz to 3400 Hz using Adaptive Differential Pulse Code Modulation (ADPCM) to 24 and 40 kbit/s for digital circuit multiplication equip ...
,
G.726 G.726 is an ITU-T ADPCM speech codec standard covering the transmission of voice at rates of 16, 24, 32, and 40  kbit/s. It was introduced to supersede both G.721, which covered ADPCM at 32 kbit/s, and G.723, which described ADPCM for ...
,
G.729 G.729 is a royalty-free narrow-band vocoder-based audio data compression algorithm using a frame length of 10 milliseconds. It is officially described as ''Coding of speech at 8 kbit/s using code-excited linear prediction'' speech coding (CS-ACEL ...
,
GSM The Global System for Mobile Communications (GSM) is a standard developed by the European Telecommunications Standards Institute (ETSI) to describe the protocols for second-generation ( 2G) digital cellular networks used by mobile devices such ...
,
QCELP Qualcomm code-excited linear prediction (QCELP), also known as Qualcomm PureVoice, is a speech codec developed in 1994 by Qualcomm to increase the speech quality of the IS-96A codec earlier used in CDMA networks. It was later replaced with EVRC ...
,
MP3 MP3 (formally MPEG-1 Audio Layer III or MPEG-2 Audio Layer III) is a coding format for digital audio developed largely by the Fraunhofer Society in Germany, with support from other digital scientists in the United States and elsewhere. Origin ...
, and
DTMF Dual-tone multi-frequency signaling (DTMF) is a telecommunication signaling system using the voice-frequency band over telephone lines between telephone equipment and other communications devices and switching centers. DTMF was first developed ...
, and examples of video payloads are
H.261 H.261 is an ITU-T video compression standard, first ratified in November 1988. It is the first member of the H.26x family of video coding standards in the domain of the ITU-T Study Group 16 Video Coding Experts Group (VCEG, then Specialists Gro ...
, H.263,
H.264 Advanced Video Coding (AVC), also referred to as H.264 or MPEG-4 Part 10, is a video compression standard based on block-oriented, motion-compensated coding. It is by far the most commonly used format for the recording, compression, and distr ...
,
H.265 H is the eighth letter of the Latin alphabet. H may also refer to: Musical symbols * H number, Harry Halbreich reference mechanism for music by Honegger and Martinů * H, B (musical note) * H, B major People * H. (noble) (died after 12 ...
and
MPEG-1 MPEG-1 is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to about 1.5 Mbit/s (26:1 and 6:1 compression ratios respectively) without excessive quality loss, making ...
/
MPEG-2 MPEG-2 (a.k.a. H.222/H.262 as was defined by the ITU) is a standard for "the generic video coding format, coding of moving pictures and associated audio information". It describes a combination of Lossy compression, lossy video compression and ...
. The mapping of
MPEG-4 MPEG-4 is a group of international standards for the compression of digital audio and visual data, multimedia systems, and file storage formats. It was originally introduced in late 1998 as a group of audio and video coding formats and related tec ...
audio/video streams to RTP packets is specified in , and H.263 video payloads are described in . Examples of RTP profiles include: * The ''RTP profile for Audio and video conferences with minimal control'' () defines a set of static payload type assignments, and a dynamic mechanism for mapping between a payload format, and a PT value using
Session Description Protocol The Session Description Protocol (SDP) is a format for describing multimedia communication sessions for the purposes of announcement and invitation. Its predominant use is in support of streaming media applications, such as voice over IP (VoIP) ...
(SDP). * The
Secure Real-time Transport Protocol The Secure Real-time Transport Protocol (SRTP) is a profile for Real-time Transport Protocol (RTP) intended to provide encryption, message authentication and integrity, and replay attack protection to the RTP data in both unicast and multicas ...
(SRTP) () defines an RTP profile that provides
cryptographic Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or '' -logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adve ...
services for the transfer of payload data. * The experimental ''Control Data Profile for RTP'' (RTP/CDP) for
machine-to-machine Machine to machine (M2M) is direct communication between devices using any communications channel, including wired and wireless. Machine to machine communication can include industrial instrumentation, enabling a sensor or meter to communicate the ...
communications.


Packet header

RTP packets are created at the application layer and handed to the transport layer for delivery. Each unit of RTP media data created by an application begins with the RTP packet header. The RTP header has a minimum size of 12 bytes. After the header, optional header extensions may be present. This is followed by the RTP payload, the format of which is determined by the particular class of application. The fields in the header are as follows: * Version: (2 bits) Indicates the version of the protocol. Current version is 2. * P (Padding): (1 bit) Used to indicate if there are extra padding bytes at the end of the RTP packet. Padding may be used to fill up a block of certain size, for example as required by an encryption algorithm. The last byte of the padding contains the number of padding bytes that were added (including itself). * X (Extension): (1 bit) Indicates presence of an ''extension header'' between the header and payload data. The extension header is application or profile specific. * CC (CSRC count): (4 bits) Contains the number of CSRC identifiers (defined below) that follow the SSRC (also defined below). * M (Marker): (1 bit) Signaling used at the application level in a profile-specific manner. If it is set, it means that the current data has some special relevance for the application. * PT (Payload type): (7 bits) Indicates the format of the payload and thus determines its interpretation by the application. Values are profile specific and may be dynamically assigned. * Sequence number: (16 bits) The sequence number is incremented for each RTP data packet sent and is to be used by the receiver to detect packet loss and to accommodate
out-of-order delivery In computer networking, out-of-order delivery is the delivery of data packets in a different order from which they were sent. Out-of-order delivery can be caused by packets following multiple paths through a network, by lower-layer retransmissio ...
. The initial value of the sequence number should be randomized to make
known-plaintext attack The known-plaintext attack (KPA) is an attack model for cryptanalysis where the attacker has access to both the plaintext (called a crib), and its encrypted version (ciphertext). These can be used to reveal further secret information such as secr ...
s on
Secure Real-time Transport Protocol The Secure Real-time Transport Protocol (SRTP) is a profile for Real-time Transport Protocol (RTP) intended to provide encryption, message authentication and integrity, and replay attack protection to the RTP data in both unicast and multicas ...
more difficult. * Timestamp: (32 bits) Used by the receiver to play back the received samples at appropriate time and interval. When several media streams are present, the timestamps may be independent in each stream. The granularity of the timing is application specific. For example, an audio application that samples data once every 125 μs (8 kHz, a common sample rate in digital telephony) would use that value as its clock resolution. Video streams typically use a 90 kHz clock. The clock granularity is one of the details that is specified in the RTP profile for an application.Peterson,
432
/ref> * SSRC: (32 bits) ''Synchronization source identifier'' uniquely identifies the source of a stream. The synchronization sources within the same RTP session will be unique. * CSRC: (32 bits each, the number of entries is indicated by the ''CSRC count'' field) ''Contributing source IDs'' enumerate contributing sources to a stream which has been generated from multiple sources. * Header extension: (optional, presence indicated by ''Extension'' field) The first 32-bit word contains a profile-specific identifier (16 bits) and a length specifier (16 bits) that indicates the length of the extension in 32-bit units, excluding the 32 bits of the extension header. The extension header data follows.


Application design

A functional multimedia application requires other protocols and standards used in conjunction with RTP. Protocols such as SIP,
Jingle A jingle is a short song or tune used in advertising and for other commercial uses. Jingles are a form of sound branding. A jingle contains one or more hooks and meaning that explicitly promote the product or service being advertised, usually t ...
, RTSP, H.225 and H.245 are used for session initiation, control and termination. Other standards, such as H.264, MPEG and H.263, are used for encoding the payload data as specified by the applicable RTP profile. An RTP sender captures the multimedia data, then encodes, frames and transmits it as RTP packets with appropriate timestamps and increasing timestamps and sequence numbers. The sender sets the ''payload type'' field in accordance with connection negotiation and the RTP profile in use. The RTP receiver detects missing packets and may reorder packets. It decodes the media data in the packets according to the payload type and presents the stream to its user.


Standards documents

* , Standard 64, ''RTP: A Transport Protocol for Real-Time Applications'' * , Standard 65, ''RTP Profile for Audio and Video Conferences with Minimal Control'' * , ''Media Type Registration of RTP Payload Formats'' * , ''Media Type Registration of Payload Formats in the RTP Profile for Audio and Video Conferences'' * , ''A Taxonomy of Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources'' * , ''RTP Payload Format for 12-bit DAT Audio and 20- and 24-bit Linear Sampled Audio'' * , ''RTP Payload Format for
H.264 Advanced Video Coding (AVC), also referred to as H.264 or MPEG-4 Part 10, is a video compression standard based on block-oriented, motion-compensated coding. It is by far the most commonly used format for the recording, compression, and distr ...
Video'' * , ''RTP Payload Format for Transport of MPEG-4 Elementary Streams'' * , ''RTP Payload Format for
MPEG-4 MPEG-4 is a group of international standards for the compression of digital audio and visual data, multimedia systems, and file storage formats. It was originally introduced in late 1998 as a group of audio and video coding formats and related tec ...
Audio/Visual Streams'' * , ''RTP Payload Format for
MPEG1 MPEG-1 is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to about 1.5 Mbit/s (26:1 and 6:1 compression ratios respectively) without excessive quality loss, making ...
/ MPEG2 Video'' * , ''RTP Payload Format for Uncompressed Video'' * , '' RTP Payload Format for MIDI'' * , ''An Implementation Guide for RTP MIDI'' * , ''RTP Payload Format for the
Opus ''Opus'' (pl. ''opera'') is a Latin word meaning "work". Italian equivalents are ''opera'' (singular) and ''opere'' (pl.). Opus or OPUS may refer to: Arts and entertainment Music * Opus number, (abbr. Op.) specifying order of (usually) publicatio ...
Speech and Audio Codec'' * , ''RTP Payload Format for
High Efficiency Video Coding High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is a video compression standard designed as part of the MPEG-H project as a successor to the widely used Advanced Video Coding (AVC, H.264, or MPEG-4 Part 10). In compar ...
(HEVC)''


See also

*
Real Data Transport Real Data Transport (RDT) is a proprietary transport protocol for the actual audio-video data, developed by RealNetworks in the 1990s. It is commonly used in companion with a control protocol for streaming media like the IETF's Real Time Streaming ...
*
Real Time Streaming Protocol The Real Time Streaming Protocol (RTSP) is an application-level network protocol designed for multiplexing and packetizing multimedia transport streams (such as interactive media, video and audio) over a suitable transport protocol. RTSP is us ...
*
ZRTP ZRTP (composed of Z and Real-time Transport Protocol) is a cryptographic key-agreement protocol to negotiate the keys for encryption between two end points in a Voice over IP (VoIP) phone telephony call based on the Real-time Transport Protocol. ...


Notes


References

* *


Further reading

*


External links


Henning Schulzrinne's RTP page
(includin


GNU ccRTP

JRTPLIB, a C++ RTP library

Managed Media Aggregation
.NET C# RFC compliant implementation of RTP / RTCP written in completely managed code. * {{DEFAULTSORT:Real-Time Transport Protocol Streaming Application layer protocols VoIP protocols Audio network protocols