Video streaming is omnipresent, and recent global events have only increased the number of people watching streamed video at home. The main issue is that the network conditions under which users stream video are not always ideal. This lowers visual quality, because adaptive bitrate (ABR) algorithms try to select a video quality whose bitrate, and thus amount of data, is small enough to be streamed under the current network conditions. As these algorithms are not perfect, visual quality can degrade unnecessarily. In the worst case, the ABR misjudges the network conditions, the video is not downloaded in time, and playback stalls, typically presenting the viewer with a spinning indicator until enough new video has been downloaded to continue. While there is a vast literature on optimizing video streaming, virtually all prior work follows a piecemeal approach: either "tweaking" the transport layer or making the client "smarter."
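To make the rate selection described above concrete, here is a minimal sketch of a generic throughput-based ABR rule. It is not VOXEL's quality-aware ABR: it estimates throughput as the harmonic mean of recent per-segment measurements and picks the highest bitrate below that estimate times a safety margin. The bitrate ladder, the 0.9 safety factor, and the name `select_bitrate` are hypothetical.

```python
from statistics import harmonic_mean


def select_bitrate(ladder_kbps, recent_throughput_kbps, safety=0.9):
    """Pick the highest bitrate that fits under a conservative throughput estimate.

    Hypothetical, generic throughput-based ABR rule (not VOXEL's algorithm).
    """
    if not recent_throughput_kbps:
        return min(ladder_kbps)  # no measurements yet: start conservatively
    # The harmonic mean damps the effect of short throughput spikes.
    estimate = safety * harmonic_mean(recent_throughput_kbps)
    feasible = [b for b in ladder_kbps if b <= estimate]
    # If even the lowest rung exceeds the estimate, fall back to it anyway;
    # this is the situation in which stalls become likely.
    return max(feasible) if feasible else min(ladder_kbps)


ladder = [300, 750, 1200, 2350, 4300]  # hypothetical bitrate ladder in kbps
print(select_bitrate(ladder, [3000, 2800, 1500]))  # prints 1200
```

If the estimate is too optimistic, the chosen segment arrives late and playback stalls; if it is too pessimistic, quality is lowered unnecessarily. These are the two failure modes that motivate VOXEL.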
With our system, which we call VOXEL, we follow a more holistic approach. First, we recognize that some video frames are more important than others, i.e., dropping certain frames does not degrade visual quality and thus does not influence end users’ quality of experience (QoE). We start at the transport layer, where we avoid TCP’s requirement to deliver every single byte even when this causes head-of-line blocking.
We go further: rather than only distinguishing video frames by type, we analyze the entire video to rank each individual frame by its actual influence on the overall visual quality of the video. With this fine-grained information, instead of blindly reducing the video bitrate and hoping the visual impact does not degrade the QoE, we can reduce the amount of data to precisely what the network conditions allow while knowing exactly what the impact on the QoE will be. We therefore created a new kind of ABR that aims to maximize visual quality rather than bitrate. The combination of a transport tailored to video streaming, a one-time in-depth video analysis, and a visual-quality-aware ABR allows VOXEL to reduce rebuffering by at least 25% and up to 90%, even in challenging network conditions, while providing visual quality at least on par with state-of-the-art streaming solutions.
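Given per-frame scores from an offline analysis like the one described above, one simple way to fit a segment into a byte budget is to drop first the frames that cost the least quality per byte saved. The sketch below is a hypothetical greedy heuristic, not VOXEL's actual ABR. It assumes that per-frame quality losses are independent and additive, and that frames other frames depend on are marked non-droppable. The `Frame` fields and the name `plan_drops` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int
    size_bytes: int
    quality_loss: float  # hypothetical: estimated quality drop if this frame is dropped
    droppable: bool = True  # e.g., False for frames that other frames reference


def plan_drops(frames, budget_bytes):
    """Greedily drop the frames with the lowest quality loss per byte saved
    until the segment fits into budget_bytes (hypothetical sketch).

    Returns (kept indices, dropped indices, estimated total loss), or None if
    the budget cannot be met even after dropping every droppable frame.
    """
    total = sum(f.size_bytes for f in frames)
    # Frames that are cheapest to lose, relative to the bytes they save, come first.
    candidates = sorted(
        (f for f in frames if f.droppable and f.size_bytes > 0),
        key=lambda f: f.quality_loss / f.size_bytes,
    )
    dropped, loss = [], 0.0
    for f in candidates:
        if total <= budget_bytes:
            break
        dropped.append(f.index)
        total -= f.size_bytes
        loss += f.quality_loss  # assumes losses are independent and additive
    if total > budget_bytes:
        return None
    dropped_set = set(dropped)
    kept = [f.index for f in frames if f.index not in dropped_set]
    return kept, sorted(dropped), loss


segment = [
    Frame(0, 5000, 1.0, droppable=False),  # I-frame: always kept
    Frame(1, 1000, 0.05),
    Frame(2, 1000, 0.02),
    Frame(3, 500, 0.001),
    Frame(4, 500, 0.002),
]
print(plan_drops(segment, 7000))
```

Unlike a bitrate switch, the returned loss estimate tells the sender in advance how much quality the chosen drops are expected to cost.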
In addition to measuring QoE with objective quality metrics such as SSIM, VMAF, and PSNR, we conducted a user study in which 54 participants from different universities watched short video clips recorded from streaming experiments with VOXEL and with the state of the art under identical, challenging network conditions. Overall, 84% of the participants preferred the version streamed with VOXEL. When asked whether they would continue watching a stream that behaved like the clips shown, 74% of participants would have abandoned the video streamed with the state of the art, whereas only 36% would have stopped watching the VOXEL clip.
The participants confirmed that one reason for this preference is the vastly reduced rebuffering. Because dropping frames in VOXEL can introduce visual artifacts, the Mean Opinion Scores (MOS) for "glitches" and "clarity" were slightly lower for VOXEL; the MOS for the overall watching experience, however, was much higher for VOXEL. Lastly, to ease adoption, VOXEL is fully backward compatible with existing streaming solutions, and each component can be deployed incrementally.
Coordinator: Mirko Palmer
We want to apply VOXEL to 360-degree video, commonly referred to as VR video. Avoiding rebuffering is a primary goal there, so as not to confuse or discomfort users when the video suddenly stops. The main problem is that, unlike regular video, a 360-degree video is not a single flat video stream but a spherical projection of several so-called tiles, arranged in a grid, each of which is a video itself; the tiles are stitched together to form the 360-degree sphere. This vastly increases the complexity of selecting the quality of each individual tile or, in the case of VOXEL, of deciding which frames to drop in which tile to avoid rebuffering.
Another difference from regular video is that viewers can freely rotate their heads and thus focus on different parts of the 360-degree scene. On the one hand, this eases streaming, as video data that is, so to speak, behind the user’s head does not need to be transferred in the highest quality. On the other hand, if the user suddenly turns, they expect the quality to be as high as possible.
As a result, to avoid rebuffering, we have to anticipate where the user will look next and maximize the quality of each tile given the current network situation. That is, we must use the available network transfer budget to select tile qualities and, again, decide what fraction of frames need not be transferred at all because their absence would not negatively influence the user’s quality of experience (QoE).
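The per-tile decision described above resembles a budgeted allocation problem. One simple heuristic, sketched here purely as an assumption and not as part of VOXEL, starts every tile at the lowest quality. It then repeatedly upgrades the tile with the largest expected quality gain per extra unit of bitrate, where the gain is weighted by a predicted probability that the tile will be in view. The viewport probabilities, the shared quality ladder, and the name `allocate_tiles` are hypothetical; per-tile frame dropping is not modeled.

```python
def allocate_tiles(view_prob, ladder, budget):
    """Greedy per-tile quality selection for 360-degree video (hypothetical sketch).

    view_prob: predicted probability that each tile will be in the viewport.
    ladder:    (bitrate, quality) levels shared by all tiles, in ascending order.
    budget:    total bitrate available for all tiles together.
    Returns the chosen ladder level for each tile, or None if even the lowest
    level for every tile exceeds the budget.
    """
    levels = [0] * len(view_prob)
    used = ladder[0][0] * len(view_prob)
    if used > budget:
        return None
    while True:
        best, best_score = None, 0.0
        for tile, prob in enumerate(view_prob):
            lvl = levels[tile]
            if lvl + 1 >= len(ladder):
                continue  # tile already at the top quality
            extra = ladder[lvl + 1][0] - ladder[lvl][0]
            if used + extra > budget:
                continue  # upgrade would exceed the transfer budget
            # Expected quality gain per extra unit of bitrate.
            score = prob * (ladder[lvl + 1][1] - ladder[lvl][1]) / extra
            if score > best_score:
                best, best_score = tile, score
        if best is None:
            return levels  # no affordable upgrade with a positive gain remains
        used += ladder[levels[best] + 1][0] - ladder[levels[best]][0]
        levels[best] += 1


quality_ladder = [(1, 0.5), (2, 0.8), (4, 0.95)]  # hypothetical (Mbps, quality)
print(allocate_tiles([0.6, 0.3, 0.1, 0.0], quality_ladder, budget=8))  # [2, 1, 0, 0]
```

Tiles the user is unlikely to face stay at the lowest quality but are still transferred, so a sudden head turn shows degraded video rather than a stall.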
Today, content consumption on the internet is omnipresent. Since the global pandemic and the shift towards working from home, content consumption, specifically video streaming, has increased substantially. However, network conditions are not always good enough to support the high throughput that content consumption requires. The state-of-the-art solution for overcoming insufficient throughput for video streaming is to employ some form of adaptive bitrate (ABR) algorithm. An ABR algorithm selects a specific video quality whose throughput requirement is lower than the available throughput, and repeats this selection every few seconds to account for changes in the available throughput. These algorithms, however, are not perfect: they can misjudge the network conditions and either download a lower quality than necessary, impairing users’ quality of experience (QoE), or select a quality that requires more data than the current network conditions allow, resulting in stalls because the video is not delivered in time. The latter significantly degrades users’ QoE.
Virtually all prior work follows a piecemeal approach: either “tweaking” the fully reliable transport layer or making the client “smarter.” Departing from prior work, we follow a holistic approach and design a cross-layer video-streaming solution called VOXEL. We use VOXEL to demonstrate how to combine application-provided “insights” with a partially reliable protocol to optimize video streaming. First, we recognize that some video frames are less important than others, i.e., intelligently dropping specific frames does not degrade visual quality and thus does not affect end users’ QoE. We rank the individual frames constituting each video segment by their impact on the overall quality of the video and use this ranking to determine when (and where) reliable delivery is required.
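VOXEL ranks frames by their measured impact on visual quality. As a much coarser, purely structural proxy, which is our assumption and not VOXEL's method, one can count how many frames directly or indirectly reference each frame, and send only frames that have dependents reliably. The reference lists, the threshold `min_dependents`, and the function names below are hypothetical.

```python
def transitive_dependents(refs):
    """refs[i] lists the frames that frame i is predicted from.

    Returns, for every frame, how many frames directly or indirectly
    depend on it (i.e., would be damaged if it were lost).
    """
    children = [[] for _ in refs]
    for frame, parents in enumerate(refs):
        for p in parents:
            children[p].append(frame)
    counts = []
    for start in range(len(refs)):
        seen, stack = set(), list(children[start])
        while stack:  # depth-first walk over all descendants
            f = stack.pop()
            if f not in seen:
                seen.add(f)
                stack.extend(children[f])
        counts.append(len(seen))
    return counts


def split_delivery(refs, min_dependents=1):
    """Hypothetical policy: frames with at least min_dependents dependents are
    sent reliably; all others are sent best-effort. Both lists are ordered by
    decreasing number of dependents."""
    counts = transitive_dependents(refs)
    order = sorted(range(len(refs)), key=lambda i: -counts[i])
    reliable = [i for i in order if counts[i] >= min_dependents]
    best_effort = [i for i in order if counts[i] < min_dependents]
    return reliable, best_effort


# Hypothetical GOP in decode order: I0, P1, B2, P3, B4
gop_refs = [[], [0], [0, 1], [1], [1, 3]]
print(transitive_dependents(gop_refs))  # [4, 3, 0, 1, 0]
print(split_delivery(gop_refs))  # ([0, 1, 3], [2, 4])
```

A dependency count ignores content: a B-frame in a static scene and one in a fast pan look equally unimportant here, which is exactly the gap that a quality-based ranking like VOXEL's is meant to close.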
Building on this ranking, we present a novel ABR algorithm that explicitly trades off losses to improve end users’ video-watching experience. The combination of a transport tailored to video streaming, a one-time in-depth video analysis, and a visual-quality-aware ABR allows VOXEL to reduce 90th-percentile rebuffering by up to 97%, even in challenging network conditions, while providing visual quality at least on par with state-of-the-art streaming solutions. We evaluated VOXEL’s ability to reduce rebuffering extensively in a full end-to-end system, with experiments ranging from emulating a diverse set of network conditions in the lab to streaming video over the internet from a datacenter in France. The results of all experiments show that VOXEL performs at least on par with the state of the art and, in most cases, outperforms it. To evaluate the objective visual impact of dropping frames, we used SSIM, as it offers a practical trade-off between robustness and computational complexity.
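For reference, SSIM compares two images through their means, variances, and covariance. With constants C1 = (K1·L)² and C2 = (K2·L)², where L is the dynamic range and K1 = 0.01, K2 = 0.03 are the usual defaults, SSIM(x, y) = ((2μxμy + C1)(2σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)). The sketch below evaluates this formula over a single global window on flat lists of luma values. Standard implementations instead average it over local (for example, Gaussian-weighted 11×11) windows, so its values will differ from library SSIM scores. The function name and the flat-list input format are assumptions for illustration.

```python
def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized lists of pixel values.

    Simplified sketch: uses one global window and population (1/N) statistics.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and of equal size")
    n = len(x)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


x = [10, 50, 90, 130, 170, 210]
print(global_ssim(x, x))  # identical inputs: 1.0 up to floating-point rounding
print(global_ssim(x, [v + 20 for v in x]))  # brightness shift lowers the score below 1
```

When assessing a dropped frame, x could be the luma plane of the original frame and y the frame the player actually shows in its place (for example, a repeated earlier frame, if the player conceals drops that way).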
Investigators: Mirko Palmer, Qi Guo, and Anja Feldmann, in cooperation with Balakrishnan Chandrasekaran (Vrije Universiteit Amsterdam), Ramesh K. Sitaraman and Kevin Spiteri (UMass Amherst, USA), and Malte Appel (Internet Initiative Japan)
•  M. Palmer, M. Appel, K. Spiteri, B. Chandrasekaran, A. Feldmann, and R. K. Sitaraman. VOXEL: Cross-layer optimization for video streaming with imperfect transmission. In CoNEXT ’21, 17th International Conference on Emerging Networking Experiments And Technologies, Virtual Event, Germany, 2021, pp. 359–374. ACM.