
H265 / HEVC, hardware accelerated, in #Webrtc – DONE … well, almost.

source link: http://webrtcbydralex.com/index.php/2020/04/03/h265-hevc-hardware-accelerated-in-webrtc-done-well-almost/


This is really a great week, with so many of the projects CoSMo has been working on or helping with for the past years coming out almost at the same time. During the last IETF Hackathon, at the WebRTC table, and then at the CoSMo offices in Singapore, INTEL and Apple came together to add HEVC support to WebRTC.

INTEL chips have supported HEVC encoding and decoding for some time now. They support it in dedicated non-GPU hardware, making it a big deal for devices that can’t afford a full discrete GPU. Most Apple devices integrate those INTEL chips, so adding H265 hardware acceleration support to WebRTC de facto enabled H265 on all Apple devices in one shot (without legal issues). Of course, the more support there is for INTEL features in the media stacks above, the more hardware they sell. Win-win.

INTEL (the WebRTC group in Shanghai) had an implementation for desktop, Android, and iOS. CoSMo acted as a catalyst between the two teams, and the two languages/cultures (Chinese and French :-)).

This blog post is about the technical details of H265 support in particular, and of hardware-accelerated codec implementations in libwebrtc in general.
Updated on April 7th to add info about GPU acceleration support, in addition to INTEL CPU hardware acceleration (*) support.

(*) CPU hardware acceleration means the CPU has a dedicated silicon circuit implementing the function, as opposed to running a software implementation on the generic x86 cores. Even though both use the CPU, the former is much faster.

First things first: if you want to see the commit in WebKit, here it is. It should appear in the next Safari Technology Preview release, or the one after. There will not be a specific announcement on the Apple/WebKit/Safari blog, so you will have to look at the SDP yourself 🙂

This is the commit about CPU Hardware Acceleration:
https://trac.webkit.org/changeset/259452/webkit

This is the commit about GPU Hardware Acceleration:
https://trac.webkit.org/changeset/259568/webkit

Now let’s take a step back. Codecs (AV1, H265) are a little bit like any other big feature, e.g. end-to-end encryption. While I’m happy to know what Apple, or Zoom, have or do not have, the most important question is: how do I do it myself? So here it is, an introductory course on default codecs, injectable codecs, and hardware acceleration support in native libwebrtc.

INTRODUCTION AND DESIGN

The native webrtc stack, satellite view

Leaving the negotiation of the media and codecs aside, the flow of media through the WebRTC stack is pretty much linear and represents the normal data flow in any media engine.

[Figure: the native WebRTC stack, satellite view]

The design related to codecs is mainly in the codec and RTP (segmentation / fragmentation) sections. Everything before (frame capture) and after (encryption, ICE, network) is pretty much codec-agnostic. The codec and RTP part of the code, for historical reasons, is referred to in libwebrtc as the “Call” API, which is illustrated below.

[Figure: the Call API]

Image 1: The Call API. The PeerConnection on top acts as the controller. Images come from the capturer (top left), go down to the encoder, and then to RTP packetization. Upon reception, a packet is decrypted then sent to the depacketizer (bottom right), and once the encoded frame is reconstructed, it is sent up to the decoder.

For the sake of simplicity we will only focus on the sending/encoding/packetizing side of things, and only for video. The receiving/decoding/aggregation is completely symmetric and can be inferred from this guide. 

A. PeerConnection level

Local codec support needs to be advertised during the handshake, so all codecs need to be known BEFORE the handshake starts, and cannot vary over time. As usual in C++, information that does not change during the lifetime of an object is passed through the constructor of that object.

Here, the libwebrtc design gives applications the capacity to inject their own codec support via a factory on top of libwebrtc. A default, “internal” video codec factory is otherwise provided. This is how, for example, Chrome reuses its own GPU acceleration support in libwebrtc, extending the standalone libwebrtc capacity (see footnote).

The PeerConnection factory can take a single external video factory as a constructor argument. The video factory is then passed to each peer connection object when created, the same way (through the constructor).

webrtc::CreatePeerConnectionFactory(
    networkThread,
    networkThread,      // worker thread (here reusing the network thread)
    signalingThread,
    audioModule,
    webrtc::CreateBuiltinAudioEncoderFactory(),
    webrtc::CreateBuiltinAudioDecoderFactory(),
    createEncoderFactory(),  // injectable video encoder factory
    createDecoderFactory(),  // injectable video decoder factory
    nullptr,                 // audio mixer
    nullptr                  // audio processing
);

B. VideoEncoderFactory

https://cs.chromium.org/chromium/src/third_party/webrtc/api/video_codecs/video_encoder_factory.h

The VideoEncoderFactory takes a VideoCodecInfo structure as input and returns the corresponding VideoEncoder if the VideoCodecType is supported. This can be done multiple times, with different inputs, during the lifetime of the factory (i.e. of the peer connection), so it is done through the createEncoder() API:

public VideoEncoder createEncoder(VideoCodecInfo input)

There can be more than one encoder for a given codec type, for example to support software fallback when hardware encoders fail or do not support all the possible profiles and options users could request.
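
To make the factory and fallback idea concrete, here is a minimal, self-contained C++ sketch; all type and class names below are stand-ins for illustration, not the actual libwebrtc interfaces (where the entry point is CreateVideoEncoder and a dedicated software-fallback wrapper exists):

```cpp
#include <memory>
#include <string>

// Minimal stand-ins for the libwebrtc types discussed above (hypothetical).
struct VideoCodecInfo {
  std::string name;  // e.g. "H264", "H265"
};

struct VideoEncoder {
  virtual ~VideoEncoder() = default;
  virtual std::string ImplementationName() const = 0;
};

struct HardwareH265Encoder : VideoEncoder {
  std::string ImplementationName() const override { return "hw-h265"; }
};

struct SoftwareH264Encoder : VideoEncoder {
  std::string ImplementationName() const override { return "sw-h264"; }
};

// A factory that hands out a hardware encoder when the platform supports
// the requested codec, and a software one (or nothing) otherwise.
class InjectableVideoEncoderFactory {
 public:
  explicit InjectableVideoEncoderFactory(bool hw_h265_available)
      : hw_h265_available_(hw_h265_available) {}

  std::unique_ptr<VideoEncoder> CreateEncoder(const VideoCodecInfo& info) {
    if (info.name == "H265" && hw_h265_available_)
      return std::make_unique<HardwareH265Encoder>();
    if (info.name == "H264")
      return std::make_unique<SoftwareH264Encoder>();
    return nullptr;  // codec not supported by this factory
  }

 private:
  bool hw_h265_available_;
};
```

Injecting such a factory through the PeerConnectionFactory constructor, as shown earlier, is what lets an application extend codec support without touching libwebrtc itself.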

C. VideoEncoder level

https://cs.chromium.org/chromium/src/third_party/webrtc/api/video_codecs/video_encoder.h

Each video encoder can set a certain number of flags that let the code know not only the type of codec it supports, but also whether it supports HW encoding (native frame type => texture on GPU). See below for the settings in the case of the H265 HW accelerated codec in the original INTEL implementation, for illustration.

EncoderInfo info;
// Disable texture support, take I420 or I444 as input
info.supports_native_handle = false;
info.is_hardware_accelerated = true;
info.has_internal_source = false;                           // external capturer to feed us with frame
info.implementation_name = "IntelMediaSDK";
// Disable frame-dropper for MSDK.
info.has_trusted_rate_controller = true;
// Disable SVC / Simulcast for MSDK.
info.scaling_settings = VideoEncoder::ScalingSettings::kOff;  
return info;

The following call sequence diagram represents the full call path from the media capture to the encoder. It assumes the JSEP handshake has been done and the VideoCodecInfo structure has been populated from the info found in the SDP offers and answers.

[Figure: call sequence diagram from media capture to encoder]

The codec section is in charge of taking in a raw frame (VideoFrame) and generating an encoded frame (EncodedFrame) using a given VideoCodec. The VideoCodec is only passed a raw frame (whose dimensions can change dynamically during the call depending on the available bandwidth) and a target bitrate. The details of this part, especially when it comes to hardware support, are provided in section 1.

The RTP section implements the RTP protocol and the specific RTP payload standards that correspond to the supported codecs. It takes an encoded frame as input and generates several RTP packets. The details of this part are provided in section 2.
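
As a rough illustration of the packetizer’s job, the sketch below splits an encoded frame into payload-sized chunks. It deliberately ignores the real H265 payload rules (the RFC 7798 aggregation and fragmentation units, payload headers, marker bit), which the actual rtp_format_h265 code handles:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch: split an encoded frame into payload-sized chunks.
// A real packetizer also emits per-packet payload headers, respects NAL
// unit boundaries, and sets the RTP marker bit on the last packet.
std::vector<std::vector<uint8_t>> Packetize(const std::vector<uint8_t>& frame,
                                            size_t max_payload) {
  std::vector<std::vector<uint8_t>> packets;
  for (size_t offset = 0; offset < frame.size(); offset += max_payload) {
    size_t len = std::min(max_payload, frame.size() - offset);
    packets.emplace_back(frame.begin() + offset, frame.begin() + offset + len);
  }
  return packets;
}
```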

Those are then handed down to the encryption layer to generate Secure RTP packets. Encryption, double encryption (SFrame), ICE, and network transport are all out of scope for this post.

Intermediate classes manage bitrate adaptation. This is quite complicated and includes a lot of heuristics, so we are not going to detail it here either.

Section 1: Adding a codec implementation for H264, VP8 or VP9

Here we do not need to extend internal support with a new VideoCodecType, nor with a new RTP payload type. This is the simpler case, and also the case most people are apparently interested in.

[Figure: adding a codec implementation through the VideoEncoderFactory]

A. OWN VideoEncoderFactory

The main way to achieve this is to create a new VideoEncoderFactory which supports a new, hardware-accelerated type of VideoEncoder. You can then inject it through the PeerConnectionFactory constructor. Your VideoEncoderFactory code can reside in the app, in which case you do not have to modify libwebrtc code!

INTEL has its own hardware-accelerated H.264 and H.265 MSDKVideoDecoderFactory that does not reside in its copy of libwebrtc, but in its app ().

B. libwebrtc iOS Hardware acceleration support

The iOS hardware acceleration support in libwebrtc () is a direct extension of the VideoEncoderFactory design:

[Figure: iOS hardware acceleration class diagram]

C. libwebrtc Android Hardware Acceleration Support

The Android hardware acceleration support is pretty simple to read:

https://chromium.googlesource.com/external/webrtc/+/master/sdk/android/api/org/webrtc/HardwareVideoEncoderFactory.java

The global class diagram can appear complicated, which is a direct result of both the higher fragmentation of Android hardware compared to iOS’s, and the fact that one level of indirection is needed to mix Java and C++ code, which is not the case when mixing Objective-C and C++. We provide it with annotations for the reader to follow.

[Figure: annotated Android hardware acceleration class diagram]

D. Examples of some H264 HW implementations out there

INTEL H264 Android HW acceleration support

sdk/android/api/org/webrtc/HardwareVideoEncoderFactory.java
sdk/android/src/java/org/webrtc/HardwareVideoEncoder.java
sdk/android/src/java/org/webrtc/MediaCodecUtils.java
sdk/android/src/java/org/webrtc/VideoCodecType.java
sdk/android/src/jni/android_media_codec_common.h

NVIDIA NvEnc, NvPipe, Video Codec SDK implementations

https://github.com/WonderMediaProductions/webrtc-dotnet-core/blob/master/webrtc-native/NvEncoderH264.cpp

Microsoft 3D Streaming Toolkit (old)

https://github.com/3DStreamingToolkit/3DStreamingToolkit.git
https://github.com/3DStreamingToolkit/webrtc-extensions-3dstk

modules/video_coding/codecs/h264/h264_encoder_impl.h

You can see in the repositories above that the original implementation (same file name as in the libwebrtc repository) was replaced by an NVIDIA-compatible one by Microsoft.

MS MixedReality Toolkit (new)

You can see traces of the same UWP_H264_encoder in the new toolkit.

Section 2: Add support for new codecs, like H265 and AV1

[Figure: simplified class diagram for adding AV1 support]

INTRODUCTION

Here we have a completely new codec, so we need to extend all the codec structures and add support for the corresponding RTP layer.

The changes percolate through the entire pipeline, instead of being relatively limited to the VideoEncoder and VideoEncoderFactory as before.

You can see in the simplified drawing above, which represents adding support for AV1, that not only have the VideoEncoderFactory and VideoEncoder classes been modified and a dedicated AV1 encoder added, but the RTP packetizer also had to be extended to support the corresponding RTP payload, along with the EncodedImageCallback, which links the encoder with the packetizer.

H.265 example from INTEL [win, mac, ios, android]

The video codec and factory themselves are here:

[win] – https://github.com/open-webrtc-toolkit/owt-client-native/tree/master/talk/owt/sdk/base/win

I. BUILD SYSTEM

Protect your code by using a GN variable, and translate it into a C++ define to guard the corresponding code with preprocessor checks. By tradition, all the GN variables specific to webrtc are prefixed with “rtc_”.

In BUILD.gn
if (!rtc_use_h265) {
  defines += [ "DISABLE_H265" ]
}

You can then protect it and only enable it on platforms that support it.

build_overrides/build.gni 
 if (is_win || is_ios || is_android) {
   rtc_use_h265 = true
 } else {
   rtc_use_h265 = false
 }

Specific Codec Implementation code
common_video/BUILD.gn

Codec RTP Payload code
modules/rtp_rtcp/BUILD.gn

If building for mobile, or a MacOS framework:
sdk/BUILD.gn

Extending the android External Video Factory:
sdk/android/api/org/webrtc/HardwareVideoEncoderFactory.java
sdk/android/src/java/org/webrtc/HardwareVideoEncoder.java
sdk/android/src/java/org/webrtc/MediaCodecUtils.java
sdk/android/src/java/org/webrtc/VideoCodecType.java
sdk/android/src/jni/android_media_codec_common.h

II. ADD CODEC SUPPORT – BASE

The first thing you want to do is extend the list of supported codecs. For this you just add a codec entry in the VideoCodecType enum. A good way to get a feeling for how many changes are needed is to grep the source code for any instance of one of those enum fields.

In api/video/video_codec_type.h

   kVideoCodecVP8,
   kVideoCodecVP9,
   kVideoCodecH264,
 #ifndef DISABLE_H265
   kVideoCodecH265,
 #endif
   kVideoCodecMultiplex,
};

In api/video_codecs/video_codec.h

#ifndef DISABLE_H265
 struct VideoCodecH265 {
   bool operator==(const VideoCodecH265& other) const;
   bool operator!=(const VideoCodecH265& other) const {
     return !(*this == other);
   }
   bool frameDroppingOn;
   int keyFrameInterval;
   const uint8_t* vpsData;
   size_t vpsLen;
   const uint8_t* spsData;
   size_t spsLen;
   const uint8_t* ppsData;
   size_t ppsLen;
 };
 #endif
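
The header above only declares operator==; its definition lives elsewhere. A plausible, self-contained sketch of such a definition (ours, not the one from the INTEL patch) compares the scalar fields and the parameter-set buffers byte by byte:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same fields as the VideoCodecH265 struct declared above.
struct VideoCodecH265 {
  bool frameDroppingOn;
  int keyFrameInterval;
  const uint8_t* vpsData;
  size_t vpsLen;
  const uint8_t* spsData;
  size_t spsLen;
  const uint8_t* ppsData;
  size_t ppsLen;

  bool operator==(const VideoCodecH265& other) const {
    // Two parameter-set buffers match when their lengths and bytes match.
    auto same_buf = [](const uint8_t* a, size_t alen,
                       const uint8_t* b, size_t blen) {
      return alen == blen && (alen == 0 || std::memcmp(a, b, alen) == 0);
    };
    return frameDroppingOn == other.frameDroppingOn &&
           keyFrameInterval == other.keyFrameInterval &&
           same_buf(vpsData, vpsLen, other.vpsData, other.vpsLen) &&
           same_buf(spsData, spsLen, other.spsData, other.spsLen) &&
           same_buf(ppsData, ppsLen, other.ppsData, other.ppsLen);
  }
};
```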

Then in the same file, extend the VideoCodecUnion

union VideoCodecUnion {
   VideoCodecVP8 VP8;
   VideoCodecVP9 VP9;   
   VideoCodecH264 H264;    
 #ifndef DISABLE_H265
   VideoCodecH265 H265;
 #endif
 };

In the same file, expose const and non-const accessors on the VideoCodec class:

class RTC_EXPORT VideoCodec {
[...]
   const VideoCodecVP9& VP9() const;               
   VideoCodecH264* H264();
   const VideoCodecH264& H264() const;           
 #ifndef DISABLE_H265
   VideoCodecH265* H265();
   const VideoCodecH265& H265() const;
 #endif
[...]
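
A plausible definition for those accessors, sketched with simplified stand-in types (the real ones check that the stored codecType matches before exposing the union member; everything below is ours, for illustration):

```cpp
#include <cassert>

// Simplified stand-ins for the types above.
struct VideoCodecH265 { int keyFrameInterval; };
enum VideoCodecType { kVideoCodecH264, kVideoCodecH265 };

// The union member may only be read when codecType matches; the real
// accessors enforce this with RTC_DCHECK, sketched here with assert.
class VideoCodec {
 public:
  VideoCodecType codecType = kVideoCodecH265;

  VideoCodecH265* H265() {
    assert(codecType == kVideoCodecH265);
    return &codec_specific_.H265;
  }
  const VideoCodecH265& H265() const {
    assert(codecType == kVideoCodecH265);
    return codec_specific_.H265;
  }

 private:
  union CodecUnion { VideoCodecH265 H265; } codec_specific_ = {};
};
```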

III. ADD NEW RTP PAYLOAD, OPTIONS, AND SIGNALLING

Payload names need to include a new string:

static const char* kPayloadNameVp8 = "VP8";
static const char* kPayloadNameVp9 = "VP9";
static const char* kPayloadNameH264 = "H264";
#ifndef DISABLE_H265
static const char* kPayloadNameH265 = "H265";
#endif
static const char* kPayloadNameGeneric = "Generic";
static const char* kPayloadNameMultiplex = "Multiplex";
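
Those strings feed a name-to-type mapping used when matching the SDP against the enum. A simplified, self-contained sketch of such a lookup (the helper name and exact shape are assumptions, not the actual libwebrtc code):

```cpp
#include <string>

enum VideoCodecType {
  kVideoCodecVP8,
  kVideoCodecVP9,
  kVideoCodecH264,
  kVideoCodecH265,  // in libwebrtc this entry is behind #ifndef DISABLE_H265
  kVideoCodecGeneric,
};

// Map an SDP payload name to a codec type, falling back to Generic for
// anything unknown. Illustrative sketch only.
VideoCodecType PayloadStringToCodecType(const std::string& name) {
  if (name == "VP8") return kVideoCodecVP8;
  if (name == "VP9") return kVideoCodecVP9;
  if (name == "H264") return kVideoCodecH264;
  if (name == "H265") return kVideoCodecH265;
  return kVideoCodecGeneric;
}
```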

Management of the corresponding fmtp options in the SDP:
media/base/media_constants.cc

Main RTP/RTCP codec switch to be extended: modules/rtp_rtcp/source/rtp_format.cc

Specific Packetizer to be used from the switch above: modules/rtp_rtcp/source/rtp_format_h265.h

IV. BITRATE ALLOCATOR (external)

If adding to the builtin video encoder factory, you can either let the bitrate allocator fall through to the default, or (recommended) handle your codec explicitly in the corresponding switch statement:
api/video/builtin_video_bitrate_allocator_factory.cc
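
A hedged sketch of what handling your codec in that switch means: map each codec type to the rate allocator that should serve it. The allocator class names below match our reading of libwebrtc, but the function itself (returning names rather than objects) is illustrative only:

```cpp
#include <string>

enum VideoCodecType {
  kVideoCodecVP8,
  kVideoCodecVP9,
  kVideoCodecH264,
  kVideoCodecH265,
};

// Sketch of the allocator selection: SVC-capable codecs get the SVC
// allocator, everything else (including our new H265 entry) falls back
// to the simulcast-aware allocator.
std::string AllocatorFor(VideoCodecType type) {
  switch (type) {
    case kVideoCodecVP9:
      return "SvcRateAllocator";
    case kVideoCodecH265:
      return "SimulcastRateAllocator";  // handle the new codec explicitly
    default:
      return "SimulcastRateAllocator";  // fall-through default
  }
}
```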

ACKNOWLEDGEMENTS

Acknowledging people who did the work, and/or provided resources which enabled it, is, IOHO, a basic courtesy.

We would like to thank the Google team first for coming up with the original webrtc implementation and maintaining it.

We would also like to thank the INTEL WebRTC group in Shanghai for providing an H.265 hardware acceleration implementation and spending time going through the code with us. Special thanks to manager Lei for making it happen, and to Qiu, the original implementor of the H265 HW acceleration, for spending time with us.

We would like to thank the Apple team, and especially Youenn Fablet for additional help on the MacOS side of things.

Some of this work has been done within the scope of the free IETF Hackathon, especially IETF 106 in Singapore in November 2019, for which we have Cisco to thank. Special thanks to Charles Eckel for the Hackathon vision and kind leadership. Thanks to all the usual participants and volunteers, including but not limited to Haral T., Lorenzo M., Bernard A., Jonathan L., Sergio M.

Finally, thanks to all the people who are sharing projects out there for other people to learn from. In this case, we took a look at the nicotyze project. We have surely missed many, and can’t cite them all anyway, but we would like to thank everyone nonetheless.

Posted in Codec


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK