AV1 Codec – Keeping Video on Track for Viewers and Services
Bitmovin was founded in 2013 by co-founders of the MPEG-DASH video standard - Stefan Lederer, Christopher Müller and Christian Timmerer. Since then the company has focussed on online video by developing a fast, API-driven, cloud-based video encoding service and an HTML5 Player for MPEG-DASH and HLS that allows adaptive content to be played on any device, in any browser, without buffering.
The rise of online video services generally, and OTT in particular, has led to more competition and a need for higher video quality to meet consumer expectations for content on demand, on all platforms. Apart from more competition, David Godfrey, Bitmovin Vice President Asia Pacific, believes that service providers, broadcasters and others delivering video to viewers face a new challenge regarding video codecs. Traditionally, broadcasters only needed systems that could deliver the best picture quality possible for a specific delivery method, for example, cable, terrestrial or satellite television.
“Now, with more networks used to access content and an increase in the amount of data associated with high quality content, new approaches to video compression are necessary,” David said. “Ongoing efforts to address the issue of bandwidth include initiatives such as MPEG-4 and HEVC. However, licensing and low adoption rates, particularly regarding streaming, have slowed progress in this area.
“More recently, the Alliance for Open Media (AOMedia) launched AV1 in March 2018 as an alternative for delivering video over the internet. Because it is open-source and royalty-free, service providers can use AV1 in social, mobile video, VR and online TV developments without having to pay historical IP owners. So far, companies including Apple, Google, Microsoft, Facebook, Intel, Cisco and more recently Vimeo have shown support for AV1.”
By visibly improving on HEVC for higher resolution and higher bitrates, AV1 is a useful option for video streaming. Video coding engineer at Bitmovin Christian Feldmann notes that the codec has other advantages as well, and opens up five practical encoding and decoding techniques to providers. Two of these deal directly with coding, making the process more efficient by accepting non-binary symbols in video, and using coding to help scale compression along with resolution. One neatly prevents digital grain effects from creating noise in compressed video, and two techniques address the familiar video issues of filters and motion compensation.
All video codecs rely on various combinations of filters as a means of improving the visual quality of the encoded video. Filtering occurs mainly along the outlines of each of the blocks that divide each picture into smaller sub-units during compression. Most of the filters contained in AV1 are derived from existing codecs, and the constrained directional enhancement filter (CDEF) is among the most influential of them.
Christian said, “This filter merges two existing filters - a directional de-ringing filter as used in the Daala video codec and the constrained low pass filter (CLPF) from the Thor video codec. CLPF is applied to filter out artefacts related to quantization errors that have not been corrected during the preceding application of a de-blocking filter. The directional de-ringing filter works by recognising edges within each block and identifying their orientation. It then conditionally applies a directional low-pass filter along those edges, resulting in a smoother, higher quality picture.”
CDEF in AV1 merges the two filters, and works by analysing the contents of each block and smoothing out the artefacts along edges that remain after de-blocking. The filter runs first in the encoder, so that the encoder's reference frames match the ones the decoder will reconstruct. The search for the filtering parameters - direction and variance - is then repeated on the decoder end, on the decoded picture, after the video has already been encoded. Since this part of the filtering operation can run on the consumer’s hardware, the parameters it produces do not have to be sent in the bitstream, which reduces both the required network bandwidth and the traffic load.
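The two steps described above, finding the orientation of an edge and then low-pass filtering along it, can be illustrated with a heavily simplified sketch. This is not the AV1 CDEF algorithm: the four candidate directions, the `strength` clamp and every function name below are assumptions chosen for illustration only.

```python
# Hypothetical, simplified directional filter. Not the AV1 CDEF algorithm.
DIRECTIONS = {
    "horizontal": (0, 1),
    "vertical": (1, 0),
    "diagonal_down": (1, 1),
    "diagonal_up": (-1, 1),
}

def find_direction(block):
    """Return the direction along which neighbouring pixels differ least."""
    size = len(block)
    best_name, best_cost = None, None
    for name, (dy, dx) in DIRECTIONS.items():
        cost, count = 0, 0
        for y in range(size):
            for x in range(size):
                ny, nx = y + dy, x + dx
                if 0 <= ny < size and 0 <= nx < size:
                    cost += (block[y][x] - block[ny][nx]) ** 2
                    count += 1
        mean_cost = cost / count
        if best_cost is None or mean_cost < best_cost:
            best_name, best_cost = name, mean_cost
    return best_name

def constrain(diff, strength):
    """Clamp a neighbour difference so that large steps are not blurred."""
    return max(-strength, min(strength, diff))

def filter_block(block, strength=4):
    """Low-pass filter each pixel along the block's dominant direction."""
    size = len(block)
    dy, dx = DIRECTIONS[find_direction(block)]
    out = [row[:] for row in block]
    for y in range(size):
        for x in range(size):
            total = 0
            for sy, sx in ((dy, dx), (-dy, -dx)):
                ny, nx = y + sy, x + sx
                if 0 <= ny < size and 0 <= nx < size:
                    total += constrain(block[ny][nx] - block[y][x], strength)
            out[y][x] = block[y][x] + int(total / 4)
    return out

# An 8x8 block with a vertical edge and one "ringing" pixel next to it.
block = [[10] * 4 + [100] * 4 for _ in range(8)]
block[3][1] = 14
filtered = filter_block(block)
print(find_direction(block), block[3][1], "->", filtered[3][1])
```

Because the sketch filters only along the detected edge and clamps each difference, the ringing pixel is pulled towards its neighbours while the edge between the dark and bright halves stays sharp.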
Motion Compensation - Warped and Global
Predicting and compensating for motion is another factor common to all types of video compression because it helps weed out redundant information that would add unnecessarily to the data transmitted through the bitstream. Motion compensation recognises and accurately anticipates movement patterns within frames and blocks, thereby reducing the amount of video data needed for the coding process.
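As a rough illustration of the basic idea, the sketch below performs a brute-force block search: for one block of the current frame it looks for the offset into the previous frame that predicts it best. The frame contents, the function names and the use of the sum of absolute differences as the cost are assumptions for the example, not a description of how an AV1 encoder searches.

```python
def make_frame(top, left, height=8, width=8):
    """Black frame with a bright 2x2 object at (top, left)."""
    frame = [[0] * width for _ in range(height)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            frame[y][x] = 200
    return frame

def best_vector(reference, current, top, left, size, search):
    """Find the (dy, dx) offset into `reference` that best predicts a block.

    Returns (cost, dy, dx), where cost is the sum of absolute differences.
    """
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if (y < 0 or x < 0 or y + size > len(reference)
                    or x + size > len(reference[0])):
                continue  # candidate falls outside the reference frame
            cost = sum(
                abs(current[top + i][left + j] - reference[y + i][x + j])
                for i in range(size)
                for j in range(size)
            )
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best

reference = make_frame(2, 2)  # object position in the previous frame
current = make_frame(2, 4)    # object has moved two pixels to the right
cost, dy, dx = best_vector(reference, current, top=2, left=4, size=2, search=3)
print(cost, dy, dx)
```

When the cost is zero, as it is here, the block can be described by the offset alone and none of its pixel data needs to be coded again.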
Warped motion compensation is an interesting part of AV1's motion processing because it anticipates movement patterns in three dimensions, thus making it possible to predict paths of movement through space within videos. Based on the calculated predictions, redundant information is identified and omitted from the encode, resulting in a significant reduction in required data.
Global motion compensation predicts motion over the entire frame, including camera moves, zooming sequences and so on. The results of these analyses are used to condense the transmitted information to statements such as 'move all blocks right' or 'pan this block', again saving on the total amount of data in the bitstream.
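The saving from a statement like 'move all blocks right' can be sketched in a few lines. The example below assumes that a motion vector has already been found for every block, treats the most common vector as the global motion, and keeps only the blocks that deviate from it. The single translation vector and the function name are simplifications made for illustration; real global motion models are richer than this.

```python
from collections import Counter

def split_global_motion(block_vectors):
    """Return the most common vector and the blocks that deviate from it."""
    global_vector, _ = Counter(block_vectors.values()).most_common(1)[0]
    exceptions = {
        position: vector
        for position, vector in block_vectors.items()
        if vector != global_vector
    }
    return global_vector, exceptions

# A 4x4 grid of blocks, all panning 3 pixels right as (dy, dx) = (0, 3) ...
block_vectors = {(row, col): (0, 3) for row in range(4) for col in range(4)}
# ... except one block containing an object that moves on its own.
block_vectors[(2, 1)] = (-1, 0)

global_vector, exceptions = split_global_motion(block_vectors)
print(len(block_vectors), "per-block vectors reduce to 1 global vector plus",
      len(exceptions), "exception")
```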
So far, motion compensation algorithms have been developed for use on a two-dimensional level. AV1 is unusual for implementing non-planar motion compensation. Due to a steady increase in processing power of consumer devices, this technique is now in a good position for use in wider markets. Christian said, “These techniques work very well for predicting large area movements, like background motion or camera movement. They also handle consistent backgrounds and colour schemes effectively, one reason why animated videos often give great encoding results, even with very high levels of compression.”
Synthesising Film Grain
Film grain is a natural characteristic of content shot on film that many people find pleasing, and it becomes visible in highly enlarged images. Digital film grain can also be applied to video footage in post as an artistic effect. However, video encoders find it hard to recognise grain simply as an effect and treat it as picture detail, resulting in constant 'noise' that creates considerable traffic in the bitstream and leads to high bitrate requirements. Because our eyes tend to filter out visual noise to some extent in any case, the transmitted information is of little actual value for the perceived quality.
Therefore, AV1's developers sought a way to avoid transferring the grain information with the bitstream, and to re-apply it later instead. Called film grain synthesis, their technique de-noises the initial content before encoding it and then re-adds the grain effect before output during the decoding process. This way, the 'noise' does not have to be transmitted at all and the overall load of data in the stream can be kept to a minimum. The potential bandwidth saving is especially relevant for content providers working with older, 'noisy' digitized video footage or with videos that use film grain for artistic purposes.
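The de-noise, transmit and re-synthesise flow described above can be sketched on a one-dimensional signal. Everything here is an assumption made for illustration: the moving-average de-noiser, the single `strength` parameter and the seeded Gaussian generator stand in for the far more elaborate grain model a real AV1 encoder and decoder would use.

```python
import random
import statistics

def denoise(samples, radius=2):
    """Moving-average filter standing in for a real de-noiser."""
    smoothed = []
    for i in range(len(samples)):
        window = samples[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

def estimate_grain_strength(noisy, clean):
    """Standard deviation of what the de-noiser removed."""
    return statistics.pstdev([n - c for n, c in zip(noisy, clean)])

def synthesise_grain(clean, strength, seed):
    """Decoder side: re-add pseudo-random grain from two small parameters."""
    rng = random.Random(seed)
    return [c + rng.gauss(0, strength) for c in clean]

# Encoder side: a gentle ramp with synthetic "grain" (illustrative data).
source_rng = random.Random(1)
noisy = [0.1 * i + source_rng.gauss(0, 5) for i in range(1000)]
clean = denoise(noisy)
strength = estimate_grain_strength(noisy, clean)

# Only the smoother `clean` signal plus (strength, seed) would be sent.
GRAIN_SEED = 7
output = synthesise_grain(clean, strength, GRAIN_SEED)
print(f"estimated grain strength: {strength:.2f}")
```

The re-created grain is not the original grain sample for sample. It only has a similar statistical character, which is the point of the technique: the viewer sees a comparable texture while the bitstream carries two numbers instead of the noise itself.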
Larger Coding Block Sizes
As video resolutions increase, it becomes necessary to scale the compression process along with high resolution content. Larger block size is an effective way of doing this, based on the conventional method of partitioning each frame into individual coding units, or blocks, which are then processed individually during coding. Lower resolutions like 1280×720 (720p) can be divided into 64×64 blocks easily, but the same block size becomes unwieldy for large resolutions like 7680×4320, or 8K UHD, because of the sheer number of blocks it produces.
Since bigger units mean fewer blocks per frame, use of larger coding units is necessary to achieve more efficient compression that still preserves visual quality for high resolution content. Fewer blocks to process, combined with lower signaling rates per block, reduce coding delay for large resolutions. Larger block size also enables the use of bigger prediction and transform units, which again makes large resolution content easier to handle.
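A quick calculation shows the scale of the difference. The helper below simply counts how many square blocks are needed to cover a frame, rounding up at the right and bottom edges. The 128×128 size is used as an example of a larger block, and the function name is hypothetical.

```python
import math

def block_count(width, height, block_size):
    """Number of square blocks needed to cover a frame, padding the edges."""
    return math.ceil(width / block_size) * math.ceil(height / block_size)

for label, width, height in (("720p", 1280, 720), ("8K UHD", 7680, 4320)):
    for block_size in (64, 128):
        count = block_count(width, height, block_size)
        print(f"{label:7} {block_size:3}x{block_size:<3} -> {count} blocks")
```

With 64×64 blocks, a 720p frame needs 240 blocks while an 8K UHD frame needs 8,160. Moving to 128×128 blocks brings the 8K figure down to 2,040.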
Non-binary Arithmetic Coding
This technique diverges from other recent codecs like HEVC or AVC that require every data symbol entered into the arithmetic coding engine to be binary. When using AV1, non-binary symbols can also be entered, which means they can have up to 16 possible values instead of just two. A typical engine would normally have to 'binarize' the symbol, converting it into a binary code, before arithmetic coding.
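The difference between the two approaches can be sketched by counting coding steps. The example below assumes an illustrative alphabet of eight values and a plain fixed-length binarisation, which is simpler than the binarisation schemes real binary coders use. The last helper is the standard information-theory estimate of what a symbol costs, which is why a very likely symbol can cost less than one bit. All names are hypothetical.

```python
import math

def binarise(symbol, alphabet_size):
    """Fixed-length binarisation: the symbol as a list of bits, MSB first."""
    bits = max(1, math.ceil(math.log2(alphabet_size)))
    return [(symbol >> shift) & 1 for shift in reversed(range(bits))]

def coding_steps(symbols, alphabet_size, multi_symbol):
    """Sequential passes through the arithmetic coding engine."""
    if multi_symbol:
        return len(symbols)  # one pass per symbol
    return sum(len(binarise(s, alphabet_size)) for s in symbols)

def ideal_cost_bits(probability):
    """Ideal arithmetic-coding cost of a symbol with the given probability."""
    return -math.log2(probability)

ALPHABET_SIZE = 8  # illustrative alphabet size
symbols = [5, 0, 7, 2]

print("binarised 5:", binarise(5, ALPHABET_SIZE))
print("binary engine steps:", coding_steps(symbols, ALPHABET_SIZE, False))
print("multi-symbol engine steps:", coding_steps(symbols, ALPHABET_SIZE, True))
print("cost of a symbol with p = 0.9:", round(ideal_cost_bits(0.9), 3), "bits")
```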
But in the case of AV1, the arithmetic coding engine processes the symbols directly, and produces a binary bitstream as output. Christian said, “Both ends, encoder and decoder, operate using probability calculations to estimate how many output bits will be created from a given symbol. Theoretically, any given input symbol could therefore produce multiple bits or even just a fraction of a bit.
“Although non-binary coding makes the coding process more complex by combining multiple values into a single symbol, processing overall is less complex and more flexible than when dealing with one bit per symbol. One major benefit lies in the possibility of processing more symbols per clock cycle using this procedure. As clock cycles have to be performed serially, non-binary coding improves on efficiency by allowing multiple symbols to be handled during each serial cycle.” bitmovin.com