Introduction to Peer-to-Peer Architecture
The peer-to-peer (P2P) architecture in WebRTC allows browsers to communicate directly, without routing media through an intermediate server. Communication is therefore distributed and decentralized. This brings benefits such as media that stays encrypted between the peers, with no server needing to decrypt it, and the absence of a central media server that could act as a single point of failure. The P2P architecture is a powerful feature of WebRTC and enables video calls at a far lower cost than server-based calls. This lesson covers the basic P2P WebRTC architecture. It does not go into additional optimizations, as they are not part of the WebRTC specification.
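Note: STUN and TURN servers may be used while setting up a call. In the typical case, no server forwards or processes the streams of a P2P call. A STUN server only helps peers discover their public addresses, and a TURN server relays media only as a fallback when the peers cannot connect directly.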
The P2P architecture of WebRTC relies on Interactive Connectivity Establishment (ICE), covered in the previous module. ICE is a protocol for establishing P2P connections between two devices over the internet. It gathers candidate addresses, including addresses discovered through STUN and relayed addresses allocated on a TURN server. It then tests them to find the best path that works through NATs and firewalls. In a P2P network, every device is connected to every other device and receives the media streams of all the others directly. This avoids a central bottleneck, and there is no single point of failure. However, each device must also send its own media stream separately to every other device on the network, which increases the total number of streams.
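To make the last point concrete, here is a minimal Python sketch that counts the streams in a full-mesh call. It assumes each device sends exactly one media stream. The function name `mesh_stream_counts` is hypothetical and not part of any WebRTC API.

```python
def mesh_stream_counts(devices: int) -> tuple[int, int, int]:
    """Return (sent per device, received per device, total streams) for a full-mesh
    P2P call in which every device sends one media stream to every other device."""
    if devices < 1:
        raise ValueError("a call needs at least one device")
    peers = devices - 1            # every other device on the call
    sent_per_device = peers        # one separate copy of the stream per peer
    received_per_device = peers    # one stream arriving from each peer
    total_streams = devices * sent_per_device  # n * (n - 1)
    return sent_per_device, received_per_device, total_streams


for n in (2, 4, 8):
    sent, received, total = mesh_stream_counts(n)
    print(f"{n} devices: each sends {sent} and receives {received}; {total} streams in total")
```

A single device's send and receive counts grow linearly with the call size. The total number of streams grows as n(n - 1), roughly with the square of the number of devices.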
Scalability
With only a few devices, P2P consumes resources comparably to other architectures, but it does not scale as well. Each device in a P2P network maintains a connection with every other device. Managing many outgoing streams and individual connections quickly goes from reasonable to untenable. The benefit is that no server is needed to carry the streams, but each device carries a considerable load.
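For a simple scalability test, assume one connection is a bidirectional link between two devices. With n devices, each device links to the other n - 1, and every link is shared by two devices, so the network has n(n - 1)/2 connections. The following table shows how the number of connections grows with the number of devices: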
Devices | Connections |
---|---|
2 | 1 |
3 | 3 |
5 | 10 |
10 | 45 |
25 | 300 |
50 | 1225 |
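The table follows from the formula n(n - 1)/2. A short Python sketch reproduces it. The function name is hypothetical.

```python
def mesh_connections(devices: int) -> int:
    """Bidirectional links in a full mesh: each device links to the other
    devices - 1, and each link is shared by two devices, hence the division by 2."""
    if devices < 0:
        raise ValueError("device count cannot be negative")
    return devices * (devices - 1) // 2


# Reproduce the rows of the table above.
for n in (2, 3, 5, 10, 25, 50):
    print(f"{n:>3} devices -> {mesh_connections(n):>5} connections")
```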
In most cases, each device sends several output streams of different media types and video resolutions, which increases the load even further in a P2P architecture.
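One rough way to see this effect is to estimate the upload bitrate a single device needs when every track goes separately to every peer. The sketch below is minimal. The track names and bitrates are hypothetical placeholders, not measurements or WebRTC defaults.

```python
# Hypothetical per-track bitrates in kbps, chosen only for illustration; real values
# depend on codec, resolution, content, and network conditions.
TRACKS_KBPS = {"audio": 50, "camera_video": 1200, "screen_share": 750}


def upload_kbps(devices: int, tracks_kbps: dict[str, int]) -> int:
    """Upload bitrate one device needs in a full mesh, assuming every track is sent
    separately to each of the other devices (no server fans the streams out)."""
    per_peer = sum(tracks_kbps.values())
    return (devices - 1) * per_peer


for n in (2, 4, 6):
    print(f"{n} devices: ~{upload_kbps(n, TRACKS_KBPS) / 1000:.1f} Mbps upload per device")
```

Whatever the real bitrates are, the upload requirement is multiplied by the number of peers. This is why adding tracks hurts more in P2P than in a server-based architecture.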
Poor network connections can also cause packets to be dropped, resulting in audio and video glitches. WebRTC includes error-correction mechanisms for these situations that rely on feedback messages sent back to the sender. Examples are NACKs (negative acknowledgements, which report lost RTP packets so they can be retransmitted) and, on data channels, SACKs (SCTP selective acknowledgements, which report gaps in the received data). However, with the large number of connections in a P2P architecture, this extra feedback and retransmission traffic can degrade the video quality for everyone on the call.
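The core of NACK-based recovery is noticing gaps in RTP sequence numbers. These numbers are 16-bit and wrap around from 65535 to 0. The simplified Python sketch below, whose function name is hypothetical, only detects the gaps a receiver would report. It assumes packets arrive in order and does not build an actual RTCP feedback packet. Browsers implement this logic internally.

```python
def missing_sequence_numbers(received: list[int]) -> list[int]:
    """Return the RTP sequence numbers missing between consecutive received packets.

    Simplified sketch: assumes in-order arrival (no reordering) and handles the
    16-bit wraparound of RTP sequence numbers (65535 -> 0)."""
    missing = []
    for prev, cur in zip(received, received[1:]):
        gap = (cur - prev) % 65536  # forward distance modulo 2**16
        for step in range(1, gap):  # gap of 1 (next packet) or 0 (duplicate) adds nothing
            missing.append((prev + step) % 65536)
    return missing


print(missing_sequence_numbers([100, 101, 104, 105]))  # [102, 103]
print(missing_sequence_numbers([65534, 1]))            # [65535, 0]
```

In a mesh, each device runs this kind of bookkeeping for every incoming stream. It must also answer retransmission requests on every outgoing connection, so the feedback traffic grows with the size of the call.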
Performance
In most architectures, the computational tasks involved in a call are split between the servers and the devices. These tasks include managing the peers in the network and encoding and decoding media streams. In the P2P architecture, these tasks are carried out mostly by the individual devices.
Latency: The latency between devices is often lower in small P2P calls, since call data is not routed through a server.
Compute power: Every device decodes the streams it receives from each peer and encodes the streams it sends, so the computational load on each device increases considerably as participants are added. Going beyond a small group (5-6 participants) requires further optimizations or architectural changes.
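The sketch below models per-device compute in Python with abstract cost units. It assumes one decode per incoming stream and, by default, one encode per outgoing connection (each connection adapting its own bitrate). The function names and all cost numbers are hypothetical placeholders, not measurements. The cap it prints depends entirely on those placeholders.

```python
def per_device_load(devices: int, encode_cost: float, decode_cost: float,
                    encoder_per_connection: bool = True) -> float:
    """Abstract compute load on one device in a full-mesh call (arbitrary units)."""
    peers = devices - 1
    if encoder_per_connection:
        encoders = peers  # assumption: each peer connection encodes separately
    else:
        encoders = 1 if peers > 0 else 0  # one shared encoding sent to every peer
    return encoders * encode_cost + peers * decode_cost  # one decode per incoming stream


def max_participants(budget: float, encode_cost: float, decode_cost: float) -> int:
    """Largest call size whose per-device load stays within `budget`."""
    if encode_cost <= 0 or decode_cost <= 0:
        raise ValueError("costs must be positive")
    n = 1
    while per_device_load(n + 1, encode_cost, decode_cost) <= budget:
        n += 1
    return n


# Arbitrary placeholder numbers, not measurements.
print(max_participants(budget=100, encode_cost=12, decode_cost=8))
```

The point is the shape of the curve. Every extra participant adds a fixed chunk of work to every device, so any fixed device budget is exhausted after a handful of participants.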
Cost
The P2P architecture likely has the lowest cost of the architectures described in the lesson, as there is no need for setting up any intermediate server, and most processing happens on the device.
However, there may be a few additional costs:
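Signaling server: Relays the session descriptions (offers and answers) and ICE candidates that participants exchange in order to connect to each other.
TURN server: If participants cannot connect to each other directly, their traffic must be relayed through a TURN server.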
While these costs are relatively low compared to the costs of setting up a Selective Forwarding Unit (SFU) or Multipoint Control Unit (MCU) architecture, it is important to acknowledge the limitations of the P2P architecture and use it only for the use cases it suits.
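WebRTC does not define how signaling messages are transported. The application must deliver SDP offers, answers, and ICE candidates between peers itself, often over WebSockets. The Python sketch below models only the routing logic, in memory. The class and method names are hypothetical. A real server would add a network transport, authentication, and cleanup. The relay never touches media, which is why signaling stays cheap.

```python
from collections import defaultdict


class SignalingRelay:
    """Hypothetical in-memory signaling relay for one or more call rooms.

    It forwards offers, answers, and ICE candidates between peers in the same
    room; it never sees or forwards any media."""

    def __init__(self) -> None:
        self.rooms: dict[str, set[str]] = defaultdict(set)
        self.inboxes: dict[str, list[dict]] = defaultdict(list)

    def join(self, room: str, peer: str) -> None:
        """Register a peer in a room so other peers can address it."""
        self.rooms[room].add(peer)

    def send(self, room: str, sender: str, to: str, message: dict) -> None:
        """Queue a signaling message for another peer in the same room."""
        members = self.rooms.get(room, set())
        if sender not in members or to not in members:
            raise ValueError("both peers must have joined the room")
        self.inboxes[to].append({"from": sender, **message})

    def receive(self, peer: str) -> list[dict]:
        """Return and clear all messages waiting for a peer."""
        messages, self.inboxes[peer] = self.inboxes[peer], []
        return messages


relay = SignalingRelay()
relay.join("call-1", "alice")
relay.join("call-1", "bob")
relay.send("call-1", "alice", "bob", {"type": "offer", "sdp": "<alice's SDP offer>"})
relay.send("call-1", "bob", "alice", {"type": "answer", "sdp": "<bob's SDP answer>"})
relay.send("call-1", "alice", "bob", {"type": "candidate", "candidate": "<ICE candidate>"})
for msg in relay.receive("bob"):
    print(msg["from"], "->", "bob:", msg["type"])
```

In a mesh, every pair of participants runs its own offer/answer exchange, so signaling traffic also grows with the n(n - 1)/2 connections.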
Advantages
Here are the advantages associated with using P2P architecture:
- Lower costs, since no costly processing or forwarding infrastructure is needed (other than signaling, STUN, and TURN servers).
- Latency is typically better than with an MCU or an SFU without cascading. It depends on the network, but it usually isn't as good as with a cascaded SFU.
Disadvantages
Here are the disadvantages of using P2P architecture:
- Scaling challenges appear as soon as a call has more than three users.
- Packet loss and poor reliability on weak networks. Scenarios such as long-distance calls over mobile networks perform poorly.
- IP address leaks and related security issues, since each peer learns the IP addresses of the others.
- Latency can be worse than with cascaded SFUs, which benefit from network optimization.
- WebRTC has many edge cases, race conditions, and difficult-to-debug errors. From an engineering cost perspective, building directly on WebRTC is expensive.
- Developer experience on mobile is poor.
- Simulcast, that is, sending high quality to one user and lower quality to another, is difficult to implement.
- More effort is required to maintain connections between devices in the network.
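With an SFU and simulcast, the sender uploads a few encodings once, and the server forwards the appropriate one to each receiver. In a mesh, the sender must make that choice itself for every connection and produce every distinct quality it picks. The minimal Python sketch below uses a hypothetical quality ladder and hypothetical per-peer bandwidth estimates. The function names are illustrative, not a WebRTC API.

```python
from typing import Optional

# Hypothetical quality ladder (name, kbps), highest first; values are illustrative only.
LADDER = [("720p", 1500), ("360p", 600), ("180p", 200)]


def pick_layer(available_kbps: float) -> Optional[str]:
    """Highest layer whose bitrate fits the estimated bandwidth, or None (video off)."""
    for name, kbps in LADDER:
        if kbps <= available_kbps:
            return name
    return None


def plan_encodings(peer_bandwidth_kbps: dict[str, float]) -> dict[str, Optional[str]]:
    """Choose a layer for every peer connection; the sender must produce each distinct one."""
    return {peer: pick_layer(bw) for peer, bw in peer_bandwidth_kbps.items()}


plan = plan_encodings({"bob": 2500, "carol": 700, "dave": 150})
print(plan)
distinct = {layer for layer in plan.values() if layer is not None}
print(f"distinct encodings to produce: {len(distinct)}")
```

Each distinct layer in the plan is an extra encode on the sending device, on top of the per-connection upload cost.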
Owing to these points, P2P is rarely used on large production systems for video calls. It is very difficult to scale beyond a few users and has several limitations. This is one of the reasons Stream Video opted for the SFU architecture with cascading.
When to Use Peer-to-Peer
- One-to-one calls under good network conditions.
- Quality is less important than the price per call.
- A high number of calls, so that the low cost per call outweighs the extra development effort.
- Supporting only the web is acceptable.