With growing consumer demand for multimedia applications over Ethernet, traditional Ethernet fundamentally cannot meet the requirements of real-time transmission of voice, video, and other dynamic content. AVB was created to address this.
3.1. The main problems facing audio and video transmission
Think about the problems you often run into when watching online video: stuttering, corrupted pictures (screen artifacts), latency in live streams, audio and video out of sync, and so on. Assuming the audio and video source itself is fine, what causes these problems?
Stuttering: the network is congested, so data cannot arrive in time.
Corrupted picture: video data is lost or contains errors. However, this is only the surface cause; the root cause is again that the network is congested and data cannot arrive in time.
(Live) latency: the network is congested, so data cannot arrive in time.
Audio and video out of sync: data failing to arrive in time because of network congestion can cause this; in addition, media clocks that are out of sync can also cause it.
[coderhuo20]
In short, the main factors that degrade the experience of watching audio and video over a network are the following two:
1. The network is congested, so data is lost or does not arrive in time.
2. The media clocks are out of sync.
[coderhuo20]
These are also the main problems that AVB sets out to solve. The difference is that online video is generally transmitted over a WAN, whereas AVB addresses audio and video transmission within a LAN.
[coderhuo20]
3.2. Solution
How can these problems be solved? AVB technology provides a good solution.
3.2.1. Network transmission issues
Traditional Ethernet is a contention-based, "best-effort" packet network. Some might point out that TCP is reliable. True, but its reliability comes from retransmission, which has the side effect of increasing latency. In addition, intermediate forwarding nodes (such as switches and routers) significantly affect how packets are delivered. For example:
1. During transmission, if the buffer queue of an intermediate forwarding node is full, subsequent packets are dropped.
2. Transmission delay depends on the buffer state of the forwarding node: the more data waiting in the buffer queue, the larger the delay; the less data, the smaller the delay.
[coderhuo20]
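To make these two effects concrete, here is a minimal Python sketch of a single switch egress queue. The class name `TailDropQueue`, the 10-frame capacity, and the 100 Mbit/s link rate are illustrative assumptions, not taken from any real switch; the model ignores priorities, simply drops arrivals when the queue is full, and reports how long a frame would wait behind the frames already queued.

```python
from collections import deque


class TailDropQueue:
    """Toy model of one switch egress queue (hypothetical, for illustration)."""

    def __init__(self, capacity_frames: int, link_rate_bps: float):
        self.capacity = capacity_frames
        self.link_rate_bps = link_rate_bps
        self.frames = deque()  # sizes (in bytes) of frames waiting to be sent
        self.dropped = 0

    def enqueue(self, frame_bytes: int) -> bool:
        # Effect 1: when the queue is full, the arriving frame is simply lost.
        if len(self.frames) >= self.capacity:
            self.dropped += 1
            return False
        self.frames.append(frame_bytes)
        return True

    def queuing_delay_us(self) -> float:
        # Effect 2: a frame must wait until every frame ahead of it has been
        # serialized onto the link, so delay grows with queue occupancy.
        queued_bits = sum(self.frames) * 8
        return queued_bits * 1e6 / self.link_rate_bps
```

With twelve 1500-byte frames arriving back to back, the last two are dropped, and the ten frames that remain represent 1.2 ms of serialization time on a 100 Mbit/s link before anything behind them can leave.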
If we could build a network that reserves bandwidth for audio and video data (similar to a dedicated emergency lane on a highway), it would not only eliminate contention but also ensure that audio and video data travel from source to destination within a specified time (the idea behind a time-sensitive network). Indeed, bandwidth reservation is an important part of AVB.
[coderhuo20]
3.2.2. Media clock synchronization issues
Audio and video transmitted over Ethernet are discrete digital signals, whereas in the real world they are continuous analog signals. In audio and video capture equipment, analog signals must be sampled, quantized, and encoded to convert them into digital signals (analog-to-digital, or A/D, conversion). A/D conversion involves a sampling frequency, i.e., how often a sample is taken. The receiving end must play the data back at the same frequency; keeping the two rates matched is media clock synchronization.
[Wikipedia21]
The media clock needs a reference clock so that it knows how much time has passed since the last sample and whether the next sample is due. Some capture systems use the system clock as the reference, while others use an external signal. Different reference clocks are likely to disagree: for example, 1 ms on clock A may correspond to 0.9 ms in the real world, while 1 ms on clock B may correspond to 1.1 ms. In addition, clocks are not stable; temperature, running time, and other factors cause them to drift (just as some watches run slow over time and others run fast).
[Avnu21]
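A quick back-of-the-envelope calculation shows why even tiny clock differences matter. The sketch below assumes, purely for illustration, a 48 kHz audio stream whose sender clock runs 50 ppm fast and whose receiver clock runs 50 ppm slow (far smaller errors than the 10 percent in the example above); the function names are hypothetical.

```python
def buffer_drift_samples(rate_hz: float, sender_ppm: float,
                         receiver_ppm: float, seconds: float) -> float:
    """Net samples that pile up in (>0) or drain from (<0) the receiver's buffer.

    A clock that is N ppm fast ticks N millionths more often than nominal, so the
    sender produces rate_hz * (1 + sender_ppm / 1e6) samples per real second while
    the receiver consumes rate_hz * (1 + receiver_ppm / 1e6); the difference accumulates.
    """
    return rate_hz * seconds * (sender_ppm - receiver_ppm) / 1e6


def drift_seconds(rate_hz: float, sender_ppm: float,
                  receiver_ppm: float, seconds: float) -> float:
    """The same drift expressed as playback time."""
    return buffer_drift_samples(rate_hz, sender_ppm, receiver_ppm, seconds) / rate_hz


if __name__ == "__main__":
    print(buffer_drift_samples(48_000, 50, -50, 3600), drift_seconds(48_000, 50, -50, 3600))
```

After one hour the receiver's buffer has gained 17,280 samples, i.e. 0.36 s of audio: enough to overflow a typical playback buffer and, across several receivers, to put audio and video visibly out of step.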
The question, then, is how the receiver can ensure that its playback rate matches the sampling frequency at the capture side (i.e., how it can recover the media clock). This is difficult to do without a common reference clock. If the reference clocks are out of sync, audio that should have been played first may be played later, or video that should have been shown first may be shown later, which the audience perceives as audio and video being out of sync.
[Avnu21]
The precise clock synchronization technology in AVB keeps the clocks of all nodes in sync, which lays the groundwork for media clock synchronization.
[Avnu21]
3.3. AVB system
AVB is referred to as a time-sensitive network, which has two main implications:
1. Low latency and good real-time performance.
2. Latency is controllable: even in the worst case, it does not exceed a defined threshold.
[IEEE802]
3.3.1. Protocol framework
AVB, short for Audio Video Bridging, is a series of IEEE standards designed to transmit audio and video data efficiently over a local area network. The AVB protocol stack is shown in Table 1.
AVB operates mainly at the link layer and coexists side by side with the traditional TCP/IP protocol stack. Besides the bandwidth reservation and clock synchronization protocols mentioned earlier, the AVB stack includes several auxiliary protocols. The main protocols are described below.
[coderhuo20]
3.3.1.1. IEEE Std 802.1AS-2011
The Generalized Precision Time Protocol (gPTP) synchronizes the clocks of all nodes in the network to a common master clock. Building on the Precision Time Protocol (PTP) of IEEE 1588 (specifically IEEE 1588-2008), it defines a network-wide clock synchronization mechanism: master clock selection and negotiation, path delay measurement and compensation, and clock frequency matching and adjustment. By exchanging standard Ethernet messages, PTP devices synchronize the time of every node in the network to a common master clock.
[AS11]
IEEE 802.1AS is a simplified profile of IEEE 1588; the biggest difference between the two is that gPTP operates entirely at Layer 2 and is not routed over IP. Like IEEE 1588, it defines a method for automatically negotiating the network master clock, the Best Master Clock Algorithm (BMCA). BMCA defines the underlying negotiation and signaling mechanism that identifies the master clock (the Grandmaster) within the AVB LAN. Once the Grandmaster is selected, the PTP devices on all LAN nodes use it as their time reference; if the Grandmaster changes, the AVB network uses BMCA to settle on a new master clock as quickly as possible, so the whole network stays synchronized.
[AS11]
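The core of BMCA is an ordered comparison of the attributes each candidate clock advertises. The Python sketch below is a simplified illustration, not a conforming implementation: it assumes the comparison order of the IEEE 1588 / 802.1AS dataset comparison (priority1, clockClass, clockAccuracy, offsetScaledLogVariance, priority2, then clockIdentity as the final tie-breaker, with the lower value winning in each field) and leaves out steps removed, port identities, and the message exchange itself. The names `ClockInfo` and `select_grandmaster` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockInfo:
    """Attributes a candidate clock advertises (lower is better in every field)."""
    priority1: int
    clock_class: int
    clock_accuracy: int
    offset_scaled_log_variance: int
    priority2: int
    clock_identity: bytes  # 8-byte EUI-64; final tie-breaker

    def rank_key(self) -> tuple:
        # Tuple comparison checks fields left to right, mirroring the BMCA order.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.offset_scaled_log_variance, self.priority2,
                self.clock_identity)


def select_grandmaster(candidates: list) -> ClockInfo:
    """Pick the best clock; re-running this after a clock leaves models re-election."""
    return min(candidates, key=ClockInfo.rank_key)
```

If the winning clock disappears (for instance, its node is unplugged), calling `select_grandmaster` again over the remaining candidates yields the next-best clock, which is the re-election behavior described above.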
At the heart of 802.1AS is the timestamping mechanism. When PTP messages enter or leave a port with 802.1AS capability, the protocol triggers a sample of the local real-time clock (RTC). The node compares its RTC value with information from the master clock (Master) associated with that port and, using path delay measurement and compensation, aligns its RTC with the time of the PTP domain. When PTP synchronization covers the entire AVB LAN, the periodic exchange of PTP messages lets the clock adjustment and frequency matching algorithms run precisely between network nodes. Eventually, all PTP nodes are synchronized to the same wall-clock time, the Grandmaster time. In a network of up to 7 hops, PTP can in theory keep clock synchronization error below 1 µs.
[AS11]
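The path delay measurement and compensation described above boils down to simple arithmetic on timestamps. This sketch assumes the 802.1AS peer-delay exchange (Pdelay_Req sent at t1 and received at t2, Pdelay_Resp sent at t3 and received at t4), a symmetric link, and neighboring clocks that tick at the same rate; real implementations also scale by the measured rate ratio and apply the Sync message's correction field. The function names are hypothetical.

```python
def mean_link_delay_ns(t1: int, t2: int, t3: int, t4: int) -> float:
    """Peer-delay estimate of the one-way link delay.

    t1 and t4 are read from the requester's clock, t2 and t3 from the responder's.
    Subtracting the responder's turnaround (t3 - t2) from the round trip (t4 - t1)
    cancels any constant offset between the two clocks.
    """
    return ((t4 - t1) - (t3 - t2)) / 2


def offset_from_master_ns(origin_ns: int, local_rx_ns: int,
                          link_delay_ns: float, correction_ns: int = 0) -> float:
    """Local clock minus master time at the instant a Sync message arrives."""
    master_time_at_rx = origin_ns + correction_ns + link_delay_ns
    return local_rx_ns - master_time_at_rx
```

For example, `mean_link_delay_ns(0, 1_500, 11_500, 11_000)` models a true 500 ns link delay with the responder's clock 1000 ns ahead and a 10 µs turnaround, and still returns 500.0 because the clock offset cancels. A node whose RTC reads 1_002_500 when a Sync stamped 1_000_000 arrives over that link gets an offset of 2000 ns, which it then corrects.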
3.3.1.2. IEEE Std 802.1Qat
The Stream Reservation Protocol (SRP), which handles bandwidth reservation, is now consolidated into Clause 35 of IEEE Std 802.1Q-2011.
[Qat10]
Traditional IEEE 802 networks, by their nature, cannot give time-sensitive stream traffic priority over ordinary asynchronous traffic. To provide guaranteed quality of service (QoS), the Stream Reservation Protocol (SRP) ensures end-to-end bandwidth availability between AV streaming devices. If the required bandwidth is available along the path, every device on the path, including switches and end devices, locks that resource.
[Qat10]
SRP uses the Multiple Registration Protocol (MRP) of IEEE 802.1ak to exchange bandwidth descriptions of AV streams and to reserve bandwidth resources. SRP-compliant switches can allocate up to 75 percent of the available bandwidth on each link to AVB streams, leaving the remaining 25 percent for traditional Ethernet traffic. In SRP, the provider of a stream is called the Talker, and the recipient is called the Listener.
[Qat10]
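The 75 percent rule can be illustrated with a per-port admission check. The sketch below is illustrative only: it assumes class A streams send a given number of frames every 125 µs (8000 intervals per second) and class B streams every 250 µs (4000 per second), and it charges each frame an assumed 42 bytes of on-the-wire overhead (preamble and SFD, MAC header, VLAN tag, FCS, and inter-frame gap) on top of the stream's frame size. The names and the 224-byte example frame are hypothetical.

```python
INTERVALS_PER_SECOND = {"A": 8000, "B": 4000}  # 125 us and 250 us class intervals
PER_FRAME_OVERHEAD_BYTES = 42  # assumed: 8 preamble/SFD + 14 header + 4 VLAN + 4 FCS + 12 IFG


def stream_bandwidth_bps(frame_bytes: int, frames_per_interval: int, sr_class: str) -> int:
    """Bandwidth a stream needs, counting per-frame overhead on the wire."""
    wire_bits = (frame_bytes + PER_FRAME_OVERHEAD_BYTES) * 8
    return wire_bits * frames_per_interval * INTERVALS_PER_SECOND[sr_class]


class PortReservations:
    """Tracks reserved stream bandwidth on one egress port (hypothetical helper)."""

    def __init__(self, link_rate_bps: int, max_fraction: float = 0.75):
        self.limit_bps = link_rate_bps * max_fraction  # e.g. 75% for AVB streams
        self.reserved_bps = 0

    def try_reserve(self, bandwidth_bps: int) -> bool:
        """Lock the bandwidth if it fits under the limit; otherwise refuse."""
        if self.reserved_bps + bandwidth_bps > self.limit_bps:
            return False
        self.reserved_bps += bandwidth_bps
        return True
```

On a 100 Mbit/s port, each such class A stream needs about 17 Mbit/s, so four of them (about 68 Mbit/s) fit under the 75 Mbit/s ceiling and a fifth is refused, leaving the remaining capacity for ordinary traffic.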
A single Talker stream can be received by multiple Listeners at the same time, and SRP guarantees only one-way traffic from Talker to Listener. The Talker can start providing real-time AV services as soon as bandwidth on the path from the Talker to any one of its Listeners has been negotiated and locked. SRP's internal periodic state machine maintains Talker and Listener registration information, monitors the state of network nodes dynamically, and updates its internal registration database to follow changes in the network topology. Talkers and Listeners can join or leave the AVB network at any time without an unrecoverable impact on the overall function and state of the AVB network. SRP consists of registration and reservation: the Talker negotiates the reservation of the bandwidth its AV stream requires, and the Listener registers for and receives the AV stream it needs.
[Qat10]
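The interplay of Talker advertisements and Listener registrations can be sketched as a small per-bridge table. This is a heavily simplified, hypothetical model: it keeps only the idea that bandwidth on a port is locked when a Listener on that port registers for a stream a Talker has advertised, and released when that Listener leaves. Real SRP runs over MRP with periodic declarations and timers, aggregates Listeners behind a port, and reports a failed reservation as a Talker Failed declaration; none of that is modeled here.

```python
class SrpBridgeSketch:
    """Hypothetical, simplified SRP bookkeeping for one bridge."""

    def __init__(self, port_limit_bps: int):
        self.port_limit_bps = port_limit_bps
        self.talkers: dict = {}       # stream_id -> bandwidth the Talker advertised
        self.reserved: dict = {}      # port -> bandwidth currently locked
        self.forwarding: set = set()  # (stream_id, port) pairs with a reservation

    def talker_advertise(self, stream_id: int, bandwidth_bps: int) -> None:
        self.talkers[stream_id] = bandwidth_bps

    def listener_ready(self, stream_id: int, port: str) -> bool:
        """A Listener on `port` registers for a stream; lock bandwidth if possible."""
        if stream_id not in self.talkers:
            return False  # nothing advertised yet, so nothing to reserve
        if (stream_id, port) in self.forwarding:
            return True   # already reserved on this port
        needed = self.talkers[stream_id]
        used = self.reserved.get(port, 0)
        if used + needed > self.port_limit_bps:
            return False  # real SRP would signal Talker Failed instead
        self.reserved[port] = used + needed
        self.forwarding.add((stream_id, port))
        return True

    def listener_leave(self, stream_id: int, port: str) -> None:
        """Release the reservation when the Listener on `port` leaves."""
        if (stream_id, port) in self.forwarding:
            self.forwarding.remove((stream_id, port))
            self.reserved[port] -= self.talkers[stream_id]
```

In a short run, one Talker's stream reaching Listeners on two ports shows a single Talker serving several Listeners, and a second 40 Mbit/s stream is refused on a port with a 75 Mbit/s limit until the first stream's Listener on that port leaves.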
3.3.1.3. IEEE Std 802.1Qav
Forwarding and Queuing Enhancements for Time-Sensitive Streams (FQTSS), the traffic shaping protocol, governs how traffic is queued and forwarded and is now integrated into Clause 34 of IEEE Std 802.1Q-2011.
[Qav09]
The purpose of the Qav queuing and forwarding protocol is to ensure that traditional asynchronous Ethernet traffic does not interfere with AVB's real-time audio and video streams. Time-sensitive AV streams are forwarded in a pseudo-synchronous mode, a mechanism that relies on SRP to provide reserved bandwidth along the route and on an 8 kHz clock based on PTP time. In each 125 µs interval, isochronous 802.3 Ethernet frames carrying stream data are forwarded. At the same time, to prevent ordinary data traffic and AVB traffic from competing for network resources, the AVB switch handles the two differently: isochronous and asynchronous frames are placed in separate queues, and isochronous frames are given the highest priority.
[Qav09]
Ordinary asynchronous traffic continues to be served as long as isochronous frames keep their priority and their transmission is guaranteed; this is Qav's prioritization and traffic shaping. Although both end stations and switches need mechanisms to guarantee up to 75 percent of the bandwidth for AVB applications, most of the 802.1Qav implementation falls to AVB switches.
[Qav09]
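The traffic shaping in 802.1Qav is a credit-based shaper. As a simplified sketch (assuming a single stream queue that always has frames waiting, and ignoring interference from lower-priority frames already on the wire), credit grows at `idle_slope` while a frame waits and falls at `send_slope = idle_slope - port_rate` while a frame is being sent, and a frame may start only when credit is zero or positive. The function name and the 75 Mbit/s idle slope on a 100 Mbit/s port are illustrative assumptions.

```python
def cbs_start_times_us(n_frames: int, frame_bits: int,
                       port_rate_bps: float, idle_slope_bps: float) -> list:
    """Start times (in microseconds) of a burst of stream frames queued at t = 0."""
    send_slope = idle_slope_bps - port_rate_bps  # negative: credit drains while sending
    t = 0.0       # seconds
    credit = 0.0  # bits
    starts = []
    for _ in range(n_frames):
        if credit < 0:
            # Frames are waiting, so credit climbs at idle_slope back up to zero.
            t += -credit / idle_slope_bps
            credit = 0.0
        starts.append(t * 1e6)
        tx_time = frame_bits / port_rate_bps
        credit += send_slope * tx_time
        t += tx_time
    return starts
```

With 10,000-bit frames, each 100 µs transmission is followed by roughly 33 µs of enforced idle time, so the stream occupies 75 percent of the link on average and the gaps remain available to ordinary asynchronous traffic.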
3.3.1.4. IEEE Std 1722-2016
Audio Video Transport Protocol (AVTP, also referred to as AVBTP). AVBTP defines the Layer 2 packet format needed to provide real-time audio and video streaming within a local area network, along with the protocol for establishing, controlling, and shutting down AV streams. AVBTP establishes low-latency virtual links between physically separate audio and video codecs, transmitting and synchronizing AV data using the same stream formats as IEEE 1394.
[IEEE16]
The IEC 61883 (and IIDC) formats used by AVBTP are as follows:
IEC 61883-2: SD-DVCR data transmission
IEC 61883-4: MPEG2-TS data transmission
IEC 61883-6: Audio and music data transmission protocol
IEC 61883-7: Transmission of ITU-R BO.1294 System B
IEC 61883-8: Transmission of ITU-R BT.601 style Digital Video Data
IIDC 1394-based Digital Camera Specification
[IEEE16]
Compressed and uncompressed audio and video streams are packaged by AVBTP (filled in with the stream ID reserved through SRP, a PTP-based timestamp, and related information such as the media type), encapsulated in Ethernet frames with the AVBTP-specific EtherType, sent by the stream source (Talker), forwarded by AVB switches, and finally received, unpacked, decoded, and output by the recipient (Listener) that has registered for the AV stream.
[IEEE16]
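To make the packaging step concrete, the sketch below packs and parses a 24-byte AVTP common stream header with Python's `struct` module. The field layout (subtype; the sv, version, mr, gv, and tv flag bits; sequence number; tu bit; 64-bit stream ID; 32-bit timestamp; gateway info; stream data length; a format-specific field) and the EtherType 0x22F0 are assumed from IEEE 1722 and should be checked against the standard before use. Reserved and format-specific bits are left at zero, and the function names are hypothetical.

```python
import struct

AVTP_ETHERTYPE = 0x22F0  # EtherType carried by AVTP frames

# Assumed layout, network byte order, 24 bytes total:
# subtype(8) | sv(1) version(3) mr(1) r(1) gv(1) tv(1) | sequence_num(8) | reserved(7) tu(1)
# stream_id(64) | avtp_timestamp(32) | gateway_info(32) | stream_data_length(16) | format_specific(16)
_STREAM_HDR = struct.Struct("!BBBBQIIHH")


def pack_stream_header(subtype: int, sequence_num: int, stream_id: int,
                       avtp_timestamp: int, stream_data_length: int) -> bytes:
    """Build a header with sv=1 (stream_id valid), version 0, tv=1 (timestamp valid)."""
    flags = (1 << 7) | (0 << 4) | 1
    return _STREAM_HDR.pack(subtype, flags, sequence_num & 0xFF, 0,
                            stream_id, avtp_timestamp & 0xFFFFFFFF, 0,
                            stream_data_length, 0)


def unpack_stream_header(data: bytes) -> dict:
    """Parse the fields a Listener needs to match and schedule a packet."""
    (subtype, flags, seq, _tu, stream_id, ts,
     _gateway, length, _fmt) = _STREAM_HDR.unpack_from(data)
    return {"subtype": subtype, "sv": flags >> 7, "version": (flags >> 4) & 0x7,
            "tv": flags & 0x1, "sequence_num": seq, "stream_id": stream_id,
            "avtp_timestamp": ts, "stream_data_length": length}
```

The 8-bit sequence number wraps every 256 packets (300 is sent as 44), and a stream ID such as `0x0011223344550001` illustrates the common convention of a 48-bit MAC address followed by a 16-bit index.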
3.3.1.5. IEEE Std 1722.1-2013
The Audio Video Discovery, Enumeration, Connection Management, and Control protocol (AVDECC) handles device discovery, enumeration of node capabilities, connection management, and other control functions.
[IEEE13]
3.3.1.6. IEEE Std 1733-2011
IEEE 1733, an AVB-based RTP/RTCP transport protocol (Layer 3 Transport Protocol for Time-Sensitive Applications in Local Area Networks), is used mainly to coordinate multiple devices in a network. For example, the network may contain multiple Talkers located on different devices, and the receiving end needs to synchronize the audio and video data from these Talkers.
[IEEE11]
RTP is one of the most widely used real-time streaming protocols. Unlike IEEE 802.1 AVB, which is based entirely on Layer 2 standards, RTP runs over UDP/IP at Layer 3. To bring the performance of Layer 2 AVB to IP-based Layer 3 applications, IEEE 1733 extends RTP so that real-time audio and video streams carried over bridged and routed paths within the local area network gain time synchronization, latency guarantees, and bandwidth reservation. It covers the packet format as well as the protocols for establishing, controlling, synchronizing, and closing streams.
[IEEE11]
3.3.2. Network topology
Figure 3 shows the AVB network topology.
It consists of two main types of nodes:
1. End Station: sometimes called an endpoint; end stations are divided into Listeners and Talkers:
A Listener is a node that receives audio and video data.
A Talker is a node that sends audio and video data.
2. AV Bridge: an AVB-capable bridge (switch) that forwards audio and video streams between end stations.
[coderhuo20]
Two characteristics are worth noting:
1. A node can contain more than one Talker, i.e., it can produce multiple audio and video streams.
2. A node can be both a Talker and a Listener, sending its own streams while receiving data from other nodes.
[coderhuo20]
3.3.3. Typical scenario
At present, AVB is mainly used in time-sensitive scenarios such as in-vehicle infotainment systems. Here is a simple example.
As shown in Figure 4, the in-car DVD Player plays an audio and video file; the audio is sent to the speakers and the video to two rear-seat monitors. How can the speakers and the two monitors be kept in step so that audio and video stay synchronized?
[coderhuo20]
First, gPTP ensures that the clocks of all nodes in the network are synchronized. Second, SRP reserves bandwidth for the audio and video data so that it reaches its destinations within a bounded time. Finally, how does the DVD Player make sure that audio and video data arriving at different destinations are played at the same moment? That is the job of the timestamp in the AVTP protocol.
[coderhuo20]
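The timestamp mechanism can be sketched as follows. In this simplified model (an assumption, not quoted from the standard), the Talker stamps each packet with a presentation time equal to the gPTP time at capture plus a maximum transit time, here an assumed class A default of 2 ms, truncated to the low 32 bits of nanoseconds; every Listener buffers the data until its own gPTP clock reaches that time. Because gPTP keeps all clocks synchronized, the speakers and both monitors release the same instant's data together even though the packets reach them at different times. The function names are hypothetical.

```python
MOD_32 = 1 << 32  # avtp_timestamp holds the low 32 bits of gPTP time in ns


def presentation_timestamp(capture_gptp_ns: int, max_transit_ns: int = 2_000_000) -> int:
    """Talker side: when (in gPTP time) the sample must be presented."""
    return (capture_gptp_ns + max_transit_ns) % MOD_32


def wait_before_presenting_ns(avtp_timestamp: int, now_gptp_ns: int) -> int:
    """Listener side: how long to hold the sample, handling 32-bit wraparound.

    Assumes the true difference lies within +/- 2**31 ns (about 2.1 s).
    A negative result means the sample arrived too late to present on time.
    """
    diff = (avtp_timestamp - now_gptp_ns) % MOD_32
    return diff - MOD_32 if diff >= MOD_32 // 2 else diff
```

If the speaker receives a packet 300 µs after capture and a rear monitor receives its packet 900 µs after capture, they wait 1.7 ms and 1.1 ms respectively, so both present at exactly capture time plus 2 ms.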