Real-Time Transport Protocol (RTP)

Live streaming voice and video presents many challenges. RTP addresses them, ensuring media stream integrity and maintaining playback synchronization.

What is Real-Time Transport Protocol (RTP)?

RTP is a standard protocol for sending live or real-time video or voice data over the Internet. It's designed not to bother with error correction and expects packet loss, skipping lost or damaged packets to keep the stream synchronized with the source.

RTP sends small packets of live or real-time video or voice data over the internet and reassembles them upon receipt. Those packets can:

follow different paths;
arrive at different times;
arrive in the wrong order; or
be lost in transmission.

To support real-time communication, RTP prioritizes the timely reassembly and delivery of data packets over ensuring that every packet is received in perfect condition.

How Does RTP Work?

When broadcasting a multimedia stream using RTP, the sending application first encodes it in sections using a codec, which is identified in the Payload type (PT) field of the RTP header. Encoding compresses the media to make it small enough to be transported over an internet connection.

After each part of the multimedia stream has been encoded, it is broken into RTP packets. The size of the packets depends on the type of media. The packets are then transmitted for the receiving stations to reassemble in the correct sequence, decode, and then play.

RTP maintains the integrity of the data stream with tools like timestamping and sequence numbering, which also let the receiver detect packet loss and apply a recovery technique. This allows receiving devices to rebuild the media content accurately while staying in sync with the source without negatively impacting call quality.

This is possible because the protocol emphasizes sending packets quickly rather than ensuring all the data is received. This helps prevent buffering and stop-start playback, which keeps streams consistent and uninterrupted.
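As a rough illustration of the reassembly step, the sketch below puts received packets back into sequence order and marks gaps instead of waiting for missing data. The function names `seq_offset` and `playout_order` are hypothetical, and the code assumes the receiver already knows the sequence number of the first packet it wants to play. It is a simplified sketch, not a full jitter buffer.

```python
from typing import Iterable, List, Optional, Tuple


def seq_offset(seq: int, first_seq: int) -> int:
    """Signed distance from first_seq to seq, allowing for 16-bit wraparound."""
    return ((seq - first_seq + 0x8000) & 0xFFFF) - 0x8000


def playout_order(packets: Iterable[Tuple[int, bytes]],
                  first_seq: int) -> List[Optional[bytes]]:
    """Return payloads in sequence order; None marks a packet that never arrived."""
    by_offset = {}
    for seq, payload in packets:
        offset = seq_offset(seq, first_seq)
        if offset >= 0:
            # Keep the first copy of each packet and ignore duplicates.
            by_offset.setdefault(offset, payload)
    if not by_offset:
        return []
    # Missing offsets become None, so the player can skip or conceal them.
    return [by_offset.get(i) for i in range(max(by_offset) + 1)]
```

A player would typically hand the `None` entries to a loss concealment routine rather than stall the stream.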

The RTP Packet Format

All voice and video data packets use the same RTP header format:

| Field | Bits | Description |
|---|:---:|---|
| Version (V) | 2 | RTP version number, currently 2 |
| Padding (P) | 1 | Value of 1 = end-of-packet padding; value of 0 = no padding |
| eXtension (X) | 1 | Value of 1 = extension header following the fixed header; value of 0 = no extension header |
| CSRC count (CC) | 4 | Number of CSRC identifiers (max 15) following the fixed header |
| Marker (M) | 1 | Demarcates significant events in the packet stream |
| Payload type (PT) | 7 | Indicates the specific encoding of the audio/video payload |
| Sequence number | 16 | The first packet’s sequence number is random, and the sequence number of each packet after that is incremented by 1 |
| Timestamp | 32 | The first packet’s timestamp is random; each later timestamp reflects the sampling instant of the first byte in that packet, counted in ticks of the media clock |
| Synchronization Source Identifier (SSRC) | 32 | A random number selected by the source itself to identify the stream; choosing it randomly makes it unlikely that two sources in the same session share an identifier |
| Contributing Source Identifiers (CSRC) | 32 each | Zero to 15 entries, used for source identification when multiple sources are present in the session |
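To make the layout concrete, here is a minimal sketch that packs and parses the 12-byte fixed header with Python's standard `struct` module. The names `RtpHeader`, `pack_rtp_header`, and `parse_rtp_header` are hypothetical, and the packer assumes no padding, no extension header, and no CSRC entries.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def pack_rtp_header(payload_type: int, sequence_number: int, timestamp: int,
                    ssrc: int, marker: bool = False) -> bytes:
    """Build a fixed header with V=2, P=0, X=0, CC=0 (assumed for simplicity)."""
    first_byte = 2 << 6
    second_byte = (int(marker) << 7) | (payload_type & 0x7F)
    # "!" selects network (big-endian) byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes.
    return struct.pack("!BBHII", first_byte, second_byte,
                       sequence_number & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the first 12 bytes of a packet into named fields."""
    if len(packet) < 12:
        raise ValueError("an RTP fixed header needs at least 12 bytes")
    first_byte, second_byte, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first_byte >> 6,
        padding=bool(first_byte & 0x20),
        extension=bool(first_byte & 0x10),
        csrc_count=first_byte & 0x0F,
        marker=bool(second_byte & 0x80),
        payload_type=second_byte & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

Packing a header and parsing it back should return the same field values, which is a quick way to sanity-check the bit layout.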

Real-Time Transport Control Protocol (RTCP)

A key Quality of Service (QoS) mechanism is the Real-Time Transport Control Protocol (RTCP), which operates together with RTP. Where RTP delivers the actual data, RTCP exchanges control packets between senders and receivers. Its principal function is to give feedback on the QoS provided by RTP.

RTP has no single fixed port. A commonly used range for sending and receiving RTP on User Datagram Protocol (UDP) is 16384 to 32767. By convention, the RTP stream is sent and received on an even port number, while the associated RTCP traffic uses the next odd port (e.g., 25654 and 25655). Some vendors narrow the range, so check your documentation. Generally, RTP and RTCP will be configured to operate on a pair of neighboring ports.
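The even/odd pairing can be sketched in a few lines. The helper name `allocate_port_pair` is hypothetical, and its default range simply mirrors the one quoted above; it only picks numbers and does not check whether the ports are actually free on your machine.

```python
import random


def allocate_port_pair(low: int = 16384, high: int = 32767, rng=None):
    """Pick an even RTP port in [low, high] and pair it with the next odd port."""
    rng = rng or random
    first_even = low + (low % 2)  # round an odd lower bound up to an even port
    if first_even + 1 > high:
        raise ValueError("range is too small for an RTP/RTCP port pair")
    rtp_port = rng.randrange(first_even, high, 2)  # even values only
    return rtp_port, rtp_port + 1
```

A real application would still need to bind both sockets and retry with another pair if either port is already in use.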

RTCP provides statistics and metadata. The types of data that RTCP exchanges include octet and packet counts, jitter, and round-trip time. Applications use that metadata to control QoS parameters and can select a different video codec, for example.
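As an example of one such statistic, the sketch below follows the interarrival jitter estimate described in RFC 3550: each new packet's change in transit time is folded into a running average with a gain of 1/16. The function name `interarrival_jitter` is hypothetical, and the sketch assumes arrival times have already been converted to the same clock units as the RTP timestamps.

```python
from typing import Iterable, Tuple


def interarrival_jitter(samples: Iterable[Tuple[int, int]]) -> float:
    """Estimate jitter from (arrival_time, rtp_timestamp) pairs in RTP clock ticks."""
    jitter = 0.0
    prev_transit = None
    for arrival, rtp_timestamp in samples:
        transit = arrival - rtp_timestamp
        if prev_transit is not None:
            # D is the change in transit time between consecutive packets.
            d = abs(transit - prev_transit)
            # Smooth the estimate: J = J + (|D| - J) / 16.
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter
```

Evenly spaced arrivals produce a jitter of zero, while a single late packet nudges the estimate up and later on-time packets let it decay again.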

RTCP's main limitation is that it lacks encryption and authentication methods, but you can get those by using the Secure Real-time Transport Protocol (SRTP), whose specification also defines a secure form of RTCP (SRTCP).

RTP vs RTSP

Now that you know the roles of RTP and RTCP, we need to discuss where Real-Time Streaming Protocol (RTSP) fits into the picture. The three protocols share a common foundation in enabling real-time multimedia transmission over IP networks.

While RTP and RTCP work together to ensure synchronized media streaming between sources and receivers, RTSP allows clients to initiate, control, and terminate streaming sessions. In simple terms, it adds DVD-player-like functionality to your real-time multimedia streaming.

In short, the difference between RTP and RTSP comes down to the application. RTP is most appropriate for voice and video two-way comms, while RTSP is better for streaming broadcasts.

TCP vs UDP

The most common way to transmit data over the internet is the Transmission Control Protocol (TCP). TCP guarantees that data arrives complete and in order. Because data packets can take different routes to the destination and arrive out of order, the receiving device must rearrange them correctly, and wait for any lost packet to be retransmitted, before processing the data. As you can imagine, the resulting delays can create problems for live voice or video transmissions.

That's where UDP comes in. Where TCP is connection-based, UDP is connectionless: it sends packets without establishing a session, acknowledging receipt, or retransmitting losses, making it much faster but less reliable. Packets can still arrive out of order, get lost, or be malformed, which is why RTP adds sequence numbers and timestamps on top of UDP and why several packet loss handling mechanisms are available.
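To show what "connectionless" means in practice, here is a small sketch using Python's standard `socket` module. The helper names `open_receiver` and `send_datagram` are hypothetical, and the example assumes both ends run on the local machine.

```python
import socket


def open_receiver(host: str = "127.0.0.1", timeout: float = 2.0) -> socket.socket:
    """Open a UDP socket on a free port chosen by the operating system."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.settimeout(timeout)  # give up instead of waiting forever
    receiver.bind((host, 0))      # port 0 asks the OS for any free port
    return receiver


def send_datagram(payload: bytes, address) -> None:
    """Send one datagram; there is no handshake and no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        sender.sendto(payload, address)
```

Note that the sender never calls `connect` or waits for an acknowledgment. If the datagram is lost, nothing at this layer notices, which is the gap RTP's sequence numbers are meant to expose.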

Packet Loss Recovery

There are several codified packet loss recovery techniques that you can use in your implementation of RTP. Which one you choose depends on the nature of your application and your preferred trade-off between streaming quality and playback continuity. There are also two classes of recovery techniques: sender-based and receiver-based. Your options include:

Sender-based options
Automatic Repeat reQuest (ARQ): lost packets are retransmitted
Forward Error Correction (FEC): the sender adds redundant data so the receiver can reconstruct lost packets
Interleaving: spreads consecutive media units across packets so that one lost packet causes several small gaps rather than one large one
Receiver-based alternatives
Insertion: filling the gap with a substitute packet, such as silence, noise, or a repeat of the previous packet
Interpolation: estimating the missing packet from the packets around it
Regeneration: synthesizing lost packets using codec parameters
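As one illustration of the sender-based approach, the sketch below shows a very simple form of FEC: the sender transmits one extra parity packet that is the XOR of a group of media packets, and the receiver can rebuild any single missing packet in that group. The function names `xor_parity` and `recover_missing` are hypothetical, and the sketch assumes all packets in a group have the same length. Production FEC schemes are considerably more elaborate.

```python
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """XOR equal-length packets together, byte by byte."""
    length = len(packets[0])
    if any(len(p) != length for p in packets):
        raise ValueError("this sketch assumes equal-length packets")
    parity = bytearray(length)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single packet marked None from the survivors and the parity."""
    if sum(p is None for p in received) != 1:
        raise ValueError("XOR parity can repair exactly one missing packet")
    survivors = [p for p in received if p is not None]
    # XOR-ing the parity with every surviving packet cancels them out,
    # leaving only the bytes of the missing packet.
    return xor_parity(survivors + [parity])
```

The trade-off is extra bandwidth: one parity packet per group protects against one loss in that group, without any retransmission delay.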

Methods for Ensuring QoS in RTP Streams

Network bandwidth is finite and can be tightly constrained on some routes, such as international links to regions with limited connectivity. So, the goal of QoS is to prioritize data packets and maximize the use of the available bandwidth without compromising the performance of critical applications.

For example, data packets associated with a video chat would be prioritized over those related to an email download because the first is time-sensitive, and the second isn't. If data packets are delayed or dropped during the video call, users might experience jitter or latency, disrupting the call.

But if packets are dropped or delayed in transmitting the email, it doesn't matter because the message is only delivered once all the packets are received and ordered correctly. There are several techniques you can use to achieve this prioritization:

Traffic shaping allows you to control the volume and rate of traffic sent on the network to ensure a more consistent flow of traffic
Packet prioritization assigns priorities to different types of data
Resource reservation conserves capacity for high-priority traffic
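Traffic shaping is often explained with a token bucket, and the sketch below takes that approach. The class name `TokenBucket` and its parameters are hypothetical. The caller passes the current time explicitly, which keeps the example deterministic; a real shaper would read a monotonic clock instead.

```python
class TokenBucket:
    """Allow bursts up to burst_bytes while holding the average to rate_bytes_per_s."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # time of the previous call, in seconds

    def allow(self, packet_size: int, now: float) -> bool:
        """Return True if a packet of packet_size bytes may be sent at time now."""
        # Refill tokens for the time that has passed, capped at the bucket size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if packet_size <= self.tokens:
            self.tokens -= packet_size
            return True
        return False  # the caller can queue, delay, or drop the packet
```

What happens to a rejected packet is a policy decision: queuing it smooths the flow at the cost of latency, while dropping it protects latency at the cost of loss.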

Benefits of Using QoS in RTP

There are several ways QoS techniques and tools can enhance the performance of your network:

Bandwidth management. QoS ensures your critical applications get the bandwidth they require when required. This also avoids network congestion and keeps essential applications online and available.
Latency reduction. Traffic for real-time communications and interactive applications is prioritized, enhancing application responsiveness and user experiences.
Jitter control. When packet arrival times diverge widely, it disrupts real-time apps and communication. QoS helps control this by stabilizing packet delivery rates and providing a consistent stream of data.
Packet loss minimization. Some packet loss is unavoidable, but it doesn't have to affect the user experience of your applications. QoS helps minimize packet loss for applications that require a reliable data flow.
Improved UX. QoS allows you to ensure that your real-time and other critical applications are prioritized and get the resources they need, which can only improve the user experience.

Typical Applications of RTP

For Video APIs and Voice APIs, RTP is critical for efficient live streams, video conferencing, and VoIP. Typical RTP applications include:

VoIP Telephony. Because it's now possible to build VoIP calling functionality into applications, developers are doing so to provide both internal and external support for those applications. Live video calling also uses VoIP infrastructure.
Video Conferencing. RTP is ideal when a stream must be delivered to and received by multiple users in real time.
Live Streaming. A one-to-many scenario, such as live streaming, deploys RTP to minimize jitter and buffering.
IPTV. As with live streaming, IPTV uses RTP to ensure uninterrupted streams for subscribers.
Video-on-Demand. Timing and delivery of a constant stream is also critical for video-on-demand applications.
Online Gaming. One of the most demanding applications for RTP is online gaming, where the actions and responses of multiple different players must be delivered and received in real time.

Limitations of RTP

The biggest weakness of Real-time Transport Protocol is that its packets are not encrypted. This means a third party can eavesdrop undetected on an audio or video call. It makes sense not to use plain RTP when the subject or content of a conversation is sensitive or must be private.

In those cases, using either SRTP or the Secure Reliable Transport (SRT) protocol is best. That extra security puts additional pressure on the network, so where bandwidth is not a problem, you can use SRTP. Where bandwidth is an issue and using a lower bitrate doesn't help enough, consider SRT, which was designed to deliver low-latency video and other media across unpredictable network conditions.

History of RTP

RTP was developed in the 1990s by the Audio-Video Transport Working Group of the Internet Engineering Task Force and first published in 1996 as RFC 1889. It was initially intended to provide a standardized protocol for moving real-time audio and video over IP networks.

In 2003, the specification was revised and republished as RFC 3550. The revision refined both RTP and its companion, the RTP Control Protocol (RTCP), which had been part of the original specification. RTCP works alongside RTP, providing statistics and feedback about the quality of service of real-time sessions. It supports better monitoring and management of real-time communication, leading to improved user experiences.

A more recent milestone came in 2021, when Web Real-Time Communication (WebRTC), which uses RTP to carry media, was published as an official W3C and IETF standard. WebRTC enables real-time communication directly within web browsers, allowing real-time video conferencing and voice calls.

Frequently Asked Questions

Why Is RTP Important?

Because it enables real-time communication across the internet, RTP helps shape how we connect and interact. It allows us to communicate instantaneously with colleagues, friends, and family through voice and video calls. It lets us collaborate, make decisions, and resolve issues in our business and private lives without unnecessary delays.

How Does RTP Enhance Voice and Video Communication?

The main purpose of RTP streaming is to provide a reliable framework for delivering real-time communication. The framework ensures the delivery of a smooth and synchronized audio or video stream using features like packetization, timestamping, and sequence numbering. This lets receiving stations provide continuous, consistent user experiences.

What Are the Technical Details of RTP?

RTP can be used with TCP or UDP, but UDP is preferred because it’s designed for speed and simplicity. TCP emphasizes reliability at the expense of speed, which doesn’t work well for time-sensitive data transmission. You can use any port number with RTP, but the best practice is to use a pair of neighboring ports in the high-port range (1024-65535). The even-numbered port will be allocated to RTP, while RTCP will use the odd-numbered port.

RTP data packets include:

A sequence number (for detecting lost packets)
Payload identification (to describe the media codec)
Frame indication (marking the start and end of the frame)
Source identification (the originator)
Timestamps (detects delays and enables receiving systems to compensate for them)
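To illustrate how timestamps support playback timing, the sketch below converts an RTP timestamp into elapsed media time. The function name `media_time_seconds` is hypothetical, and the clock rate is an assumption that depends on the payload: 8000 Hz is typical for narrowband audio and 90000 Hz for video.

```python
def media_time_seconds(timestamp: int, first_timestamp: int, clock_rate: int) -> float:
    """Seconds of media elapsed since the first packet, allowing for 32-bit wraparound."""
    # The timestamp is a 32-bit counter with a random start, so subtract modulo 2**32.
    elapsed_ticks = (timestamp - first_timestamp) & 0xFFFFFFFF
    return elapsed_ticks / clock_rate
```

A receiver can compare this media time against its own clock to decide when each packet should be played, which is how it compensates for variable network delay.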

How Do SSRC and CSRC Work With RTP?

The synchronization source (SSRC) is the randomly chosen 32-bit identifier of the source of an RTP stream. Receivers can apply a throttling mechanism to SSRC changes that works by flipping between two states: normal mode and throttling mode. When the SSRC changes, the receiver flips into throttling mode and restricts further SSRC changes, dropping any packets with unexpected SSRCs. It flips back into normal mode only after the SSRC has remained constant for a given time.
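The two-state behavior described above can be sketched as a small state machine. Everything here is an assumption made for illustration: the class name `SsrcThrottle`, the `hold_seconds` parameter, and the choice to start the hold timer at the moment the SSRC change is accepted. Actual implementations may differ in all of these details.

```python
class SsrcThrottle:
    """Accept one SSRC change, then reject further changes for hold_seconds."""

    def __init__(self, hold_seconds: float):
        self.hold_seconds = hold_seconds  # hypothetical tuning parameter
        self.current_ssrc = None
        self.throttling = False
        self.changed_at = 0.0

    def accept(self, ssrc: int, now: float) -> bool:
        """Return True to keep the packet, False to drop it."""
        if self.current_ssrc is None:
            self.current_ssrc = ssrc  # first packet defines the stream
            return True
        # Leave throttling mode once the SSRC has been stable long enough.
        if self.throttling and now - self.changed_at >= self.hold_seconds:
            self.throttling = False
        if ssrc == self.current_ssrc:
            return True
        if self.throttling:
            return False  # unexpected SSRC while throttling: drop the packet
        # Normal mode: accept the change and start throttling further changes.
        self.current_ssrc = ssrc
        self.throttling = True
        self.changed_at = now
        return True
```

The caller supplies `now` explicitly so the behavior is easy to reason about; in a live receiver it would come from a monotonic clock.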

The contributing source (CSRC) list is used when the RTP stream combines media from multiple sources. The header carries between 0 and 15 CSRC identifiers, with the first element in the CSRC list representing the dominant speaker. If there are more than 15 contributing sources, only 15 may be identified. This allows receivers to implement special treatment for the dominant speaker, usually through a speaker selection algorithm on the mixer.
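Reading the CSRC list from a packet might look like the sketch below. The function name `parse_csrc_list` is hypothetical. It relies on the CSRC count stored in the low four bits of the first header byte and on the list starting immediately after the 12-byte fixed header.

```python
import struct
from typing import List


def parse_csrc_list(packet: bytes) -> List[int]:
    """Return the CSRC identifiers that follow the 12-byte fixed header."""
    if len(packet) < 12:
        raise ValueError("an RTP fixed header needs at least 12 bytes")
    csrc_count = packet[0] & 0x0F  # 4-bit CC field, so at most 15 entries
    end = 12 + 4 * csrc_count      # each identifier is 32 bits
    if len(packet) < end:
        raise ValueError("packet is shorter than its CSRC count implies")
    return list(struct.unpack(f"!{csrc_count}I", packet[12:end]))
```

Because the count field is only four bits wide, the 15-entry limit is enforced by the header format itself rather than by the parser.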

What Is SRTP and How Does it Work?

One weakness of RTP is that it lacks encryption to secure streams against packet sniffing and spoofing attacks. That’s where Secure RTP (SRTP) fits in. It’s an extension with enhanced security measures, including message authentication, confidentiality, integrity, encryption, and replay protection. SRTP is most often used in VoIP applications.