For example, the Intel® Xeon® Processor E3 family, introduced in 2011, supported hardware-based H.264 encoding for the first time. At the same time, Intel introduced Intel® Quick Sync Video technology, which enables hardware-accelerated media processing using the on-chip graphics processing capabilities of select Intel® Xeon® processors with Intel® Graphics Technology. In 2015, Intel launched the Intel Xeon Processor E3 v5 family, which enabled accelerated HEVC support. A comparative timeline of the evolution of Intel Quick Sync Video technology against the progress of video standards demonstrates the rapid pace at which platforms are keeping up with emerging standards.
The evolution of processor platforms has thus provided our industry with two predominant choices to implement video encoding.
The table above shows that the speed vs. quality trade-off remains a key factor in the implementation decision. However, with HEVC high dynamic range (HDR) encoding becoming a popular requirement for higher resolutions like 4K, the lack of 10/12-bit support in Intel Quick Sync Video pushes the industry towards pure software encoding for such applications. This creates a dilemma. Since performance requirements are amplified at higher resolutions, complex software video encoding becomes slow and expensive. If we compromise quality for speed, the gains we can typically achieve with standards like HEVC diminish.
Improve Video Encoding with Integrated GPUs
Alongside Intel Quick Sync Video hardware encoding, Intel introduced programmable GPUs in its processors in 2010 (Intel® HD Graphics), updating to higher performance versions in 2013 (Intel® Iris® Graphics, Intel® Iris® Pro Graphics). These on-chip graphics capabilities, available in both desktop platforms (Intel Core i5/i7 processors) and server/workstation platforms (Intel Xeon E3 processors), provide us opportunities for accelerating software video encoding.
- Intel GPU Execution Units (EUs) support dual single instruction multiple data (SIMD) and floating point operations up to 32 bits, which can be leveraged in video encoding
- A typical 40-EU Intel Xeon Processor E3 with integrated Intel Iris Pro graphics can provide up to 832 GCPS of processing power when clocked at the maximum frequency of 1300 MHz
- The GPU supports multiple fixed-function video coding blocks and pre-processing blocks (like scaling and de-interlacing), making it a rich, media-centric processor
- Intel GPU SIMD can be easily scaled to support higher bit depths (10, 12 bit profiles), enabling support for Main 10 and HDR encoding
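The 832 GCPS figure in the list above can be reproduced with simple arithmetic. The sketch below reads GCPS as giga-cycles per second and assumes 16 SIMD operations per EU per clock. Both are assumptions: the factor of 16 is chosen because it matches the quoted number, not taken from an Intel specification, and the function name is hypothetical.

```python
def gpu_throughput_gcps(num_eus: int, ops_per_eu_per_clock: int, clock_mhz: float) -> float:
    """Peak throughput in giga-operations per second (GCPS, as assumed here)."""
    # EUs x operations per EU per clock x clock (MHz), converted from mega to giga.
    return num_eus * ops_per_eu_per_clock * clock_mhz / 1000.0


# Figures from the text: 40 EUs at 1300 MHz.
# ops_per_eu_per_clock=16 is an assumption that reproduces the quoted 832 GCPS.
peak = gpu_throughput_gcps(num_eus=40, ops_per_eu_per_clock=16, clock_mhz=1300)
print(f"Peak GPU throughput: {peak:.0f} GCPS")
```

Under the same reading, the 16 Intel Xeon E5 cores at 3 GHz mentioned later amount to 48 GCPS of raw CPU capacity, so the integrated GPU offers a much larger nominal budget, provided the workload maps well to SIMD.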
Rate Distortion Optimal: Improving Speed without Compromising Quality
Video encoders that leverage CPUs with integrated graphics processing are uniquely positioned to provide a good balance between the lower power of hardwired solutions and the flexibility of CPU-only solutions. However, many GPU-accelerated solutions are handicapped by lower quality implementations and limited features, and are therefore not deemed suitable for high quality applications in traditional broadcast and online video.
A key factor in increasing processing speed without compromising quality is the inclusion of rate distortion optimal (RDO) encoding in a hybrid CPU + GPU implementation. The inability of traditional GPU designs to perform RDO encoding has led to the perception that GPU implementations deliver lower quality. What we need, therefore, is a hybrid design that incorporates RDO and also makes effective use of the powerful on-chip GPU Execution Units and Hardware Video Processing Blocks.
Accelerate Processing Speed
If the encoding quality of a GPU-accelerated implementation remains identical to that of its pure software counterpart, how much do we stand to gain by redesigning and accelerating the encoder for a hybrid implementation?
Let us consider a specific operating point: HEVC encoding of a 1080p stream at 30 fps and 4 Mbps. Typical high quality encoders can realize this with approximately 16 Intel Xeon Processor E5 cores clocked at 3 GHz each (using about 45 GCPS). The video encoding process can be further subdivided into three broad categories:
- Estimation - comprising motion, intra and mode estimation
- Decision - rate, distortion or RDO decision making
- Prediction - containing reconstruction and entropy coding
Intel’s on-chip graphics processing offers several fixed-function OpenCL calls in addition to the SIMD processing capabilities. By utilizing the SIMD processing and fixed functions, the estimation can be offloaded to the integrated GPU with a processing load well within the 33.3-millisecond window available for each frame at 30 fps.
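The 33.3 millisecond window follows directly from the 30 fps frame rate. A minimal sketch of that budget check is shown below; the function names and the example stage time are hypothetical and are not measurements from any encoder.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, for real-time encoding."""
    return 1000.0 / fps


def fits_in_budget(stage_ms: float, fps: float) -> bool:
    """True if a stage taking stage_ms per frame keeps up with the frame rate."""
    return stage_ms <= frame_budget_ms(fps)


budget = frame_budget_ms(30)
print(f"Per-frame budget at 30 fps: {budget:.1f} ms")

# Hypothetical GPU estimation time per frame, for illustration only.
example_estimation_ms = 25.0
print("Estimation fits in budget:", fits_in_budget(example_estimation_ms, 30))
```

The same check at 60 fps leaves only about 16.7 ms per frame, which is why the headroom on the GPU matters as frame rates and resolutions rise.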
Offloading the estimation loop and parts of the prediction loop to the integrated GPU can move 50-70% of the processing off the CPU on average, with minimal or no drop in quality compared to CPU-only implementations. In fact, we can expect further quality improvements, since the GPU enables a more exhaustive approach to estimation.
Reduce Power Consumption of Video Solutions
Hybrid HEVC encoders also help us go green. With a computation offload of 50-70% depending upon the operating quality or algorithms, integrated graphics processing provides 2-3x gains in speed. Consider an example to understand what this means in terms of power.
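The link between a 50-70% offload and a 2-3x speed gain can be sketched with a simple model. The code below assumes the offloaded work runs on the GPU fully in parallel with the CPU and that the remaining CPU work is the bottleneck; this is a simplifying assumption, not a description of any particular encoder, and the function name is hypothetical.

```python
def cpu_bound_speedup(offload_fraction: float) -> float:
    """Speedup when a fraction of the work leaves the CPU and the CPU remains the bottleneck."""
    if not 0.0 <= offload_fraction < 1.0:
        raise ValueError("offload_fraction must be in [0, 1)")
    # The CPU only has to process the share of the work that was not offloaded.
    return 1.0 / (1.0 - offload_fraction)


for fraction in (0.5, 0.6, 0.7):
    print(f"offload {fraction:.0%} -> about {cpu_bound_speedup(fraction):.1f}x")
```

Under this model, a 50% offload gives 2x and a 70% offload gives a little over 3x, which is roughly in line with the 2-3x range quoted above. Real gains depend on how well the CPU and GPU stages overlap.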
A 4-core CPU with Intel Iris Pro Graphics (for example, the Intel Xeon Processor E3-1585L v5) can match what a 16-core, dual-socket Intel Xeon Processor E5-2667 v4 achieves for H.265 encoding, delivering identical video quality. Considering just the processor power, this means that a 270W processing requirement can be reduced to 45W using a hybrid design, which translates to about a 6x reduction in power.
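The power comparison above is a straightforward ratio. The sketch below uses only the two processor power figures quoted in the text; the constant and function names are hypothetical, and the rest of the system (memory, storage, cooling) is ignored.

```python
CPU_ONLY_WATTS = 270.0  # dual-socket Intel Xeon E5-2667 v4, as quoted in the text
HYBRID_WATTS = 45.0     # Intel Xeon E3-1585L v5 with Iris Pro Graphics, as quoted


def power_reduction_factor(baseline_w: float, hybrid_w: float) -> float:
    """How many times less processor power the hybrid design draws."""
    return baseline_w / hybrid_w


def power_saved_percent(baseline_w: float, hybrid_w: float) -> float:
    """Share of the baseline processor power that is saved, in percent."""
    return 100.0 * (baseline_w - hybrid_w) / baseline_w


factor = power_reduction_factor(CPU_ONLY_WATTS, HYBRID_WATTS)
saved = power_saved_percent(CPU_ONLY_WATTS, HYBRID_WATTS)
print(f"Reduction: {factor:.0f}x, saving {saved:.1f}% of processor power")
```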
Need more information on how you can gain the triple benefits of improved quality, speed and power consumption with hybrid HEVC encoders? Contact us at firstname.lastname@example.org
Explore Ittiam’s i265 family of H.265 codecs at https://www.ittiam.com/products/software-ips/video/h265-hevc/