
A VR Frame’s Life


Oculus Developer Blog
Posted by Jian Zhang and Rémi Palandri
May 20, 2021
Quest

“VR applications are rendering two images per frame, one for the left eye and one for the right.” Oftentimes, this is how VR rendering is explained, and while it’s not wrong, it can be overly simplistic. For Quest developers, it is beneficial to understand the complete picture so you can make your applications more performant and visually appealing, and easily troubleshoot and resolve issues.

This blog post will take you through a VR frame’s lifecycle, explaining the end-to-end process from frame generation to final display. This journey can be divided into three stages:

  1. Frame generation to submission - how applications render the frame, including application APIs and frame timing model.

  2. Frame submission to compositor - how the frame data is shared between application and compositor.

  3. Compositing to display - the compositor's responsibility and how the final images are displayed on the HMD display.

Stage 1: Frame generation to submission

For Quest applications, we use VrApi / OpenXR to communicate with the HMD. With respect to rendering, these APIs are responsible for the following:

  • Pose Prediction - Unlike traditional 3D apps, VR rendering is designed around latency reduction. To set up a render camera for a specific frame in VR, knowing the current HMD pose isn’t enough; we also need to know when the frame is expected to be shown on the HMD screen, called the PredictedDisplayTime. We can then use that time to predict the HMD pose at display time and render the frame with the predicted pose, which greatly reduces rendering error. (A toy illustration of this idea follows this list.)

  • Frame Synchronization - The VR Runtime is responsible for frame synchronization. Our SDK provides APIs that control when a frame starts; they prevent the application from running faster than the desired frame rate, which usually matches the display refresh rate. The app doesn’t need to (and shouldn’t) insert manual waits or synchronization for frame pacing.
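
To make pose prediction concrete, here is a purely illustrative sketch; the runtime’s actual prediction model is internal and more sophisticated. It simply extrapolates an orientation forward by assuming constant angular velocity over the interval between “now” and the predicted display time. The Vec3/Quat types and the PredictOrientation helper are hypothetical.

```cpp
#include <cmath>

// Minimal types for illustration only; real apps would use their engine's math
// library or the pose types returned by the SDK.
struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };

static Quat Multiply(const Quat& a, const Quat& b) {
    return {a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
            a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
            a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
            a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z};
}

// Extrapolate an orientation forward by dtSeconds, assuming constant angular
// velocity (radians/second, world frame). dtSeconds would be the interval from
// "now" until the predicted display time.
static Quat PredictOrientation(const Quat& current, const Vec3& angularVelocity,
                               float dtSeconds) {
    const float wx = angularVelocity.x * dtSeconds;
    const float wy = angularVelocity.y * dtSeconds;
    const float wz = angularVelocity.z * dtSeconds;
    const float angle = std::sqrt(wx * wx + wy * wy + wz * wz);
    if (angle < 1e-6f) {
        return current;  // negligible rotation over this interval
    }
    const float s = std::sin(angle * 0.5f) / angle;
    const Quat delta = {wx * s, wy * s, wz * s, std::cos(angle * 0.5f)};
    return Multiply(delta, current);
}
```

The larger the extrapolation interval, the more any change in head motion invalidates the prediction, which is why sampling the tracking state as late as possible improves accuracy.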

Depending on whether an application uses VrApi or OpenXR, the behavior can be different, so we’ll address each separately.

VrApi Application

Here is what a typical multi-threaded VrApi application’s frame looks like:

  • Start the Frame: The main thread will call vrapi_WaitFrame to start the main thread frame and vrapi_BeginFrame to start the render thread frame.

  • Get the Poses: Applications usually need to know the HMD and controllers’ poses in the simulation thread (Main Thread) so that game logic and physics calculations can be executed correctly. To acquire this information, we call vrapi_GetPredictedDisplayTime and use the returned time to call vrapi_GetPredictedTracking2 to get the poses.

  • Rendering: In the rendering thread, we can use the HMD/controller poses obtained from the main thread to finish rendering. However, many applications (like UE4) choose to call vrapi_GetPredictedDisplayTime / vrapi_GetPredictedTracking2 again at the beginning of the render-thread frame. This is a latency-reduction optimization: we are predicting where the HMD pose will be at the predicted display time, and the later we call those sensor-sampling APIs, the shorter the prediction interval and the more accurate the prediction.

  • Submit Frame: After the rendering thread finishes submitting all draw calls, the application should call vrapi_SubmitFrame2 to tell the VR Runtime that the application has finished the frame’s CPU work. This call hands the frame’s information over to the VR Runtime (note: GPU work may still be in progress, which we will address later). The submit frame API then does the following (a condensed code sketch of the whole frame loop follows this list):

    • Frame Synchronization: If the frame finishes too quickly, the call blocks to keep the next frame from starting too early, which guarantees that the app won’t run at a higher FPS than the system-desired one (e.g. 72 FPS on Quest by default).

    • Check Texture Swap Chain Availability: Blocks if the next eye texture in the swapchain is still in use by the runtime. This blocking is often triggered by a stale frame, since the runtime has to reuse an old frame for one more display cycle.

    • Advance Frame: Increments the frame index and determines the next frame’s predicted display time; your next frame’s vrapi_GetPredictedDisplayTime call therefore depends on vrapi_SubmitFrame2 having been called.
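
Below is a condensed, single-threaded sketch of the VrApi calls described above, loosely following the pattern used in the Mobile SDK samples. Swapchain creation, draw calls, and the ovrMobile/frame-index bookkeeping are assumed to exist elsewhere, and struct fields may vary between SDK versions; treat it as an outline rather than drop-in code.

```cpp
#include <VrApi.h>
#include <VrApi_Helpers.h>

// Condensed sketch of a VrApi frame (assumptions noted above).
void RenderOneFrame(ovrMobile* ovr, ovrTextureSwapChain* colorSwapChain,
                    int swapChainIndex, long long frameIndex) {
    // Start the frame (newer VrApi versions; some early apps only call SubmitFrame2).
    vrapi_WaitFrame(ovr, frameIndex);
    vrapi_BeginFrame(ovr, frameIndex);

    // Get the poses: predict where the HMD will be when this frame is displayed.
    const double predictedDisplayTime = vrapi_GetPredictedDisplayTime(ovr, frameIndex);
    const ovrTracking2 tracking = vrapi_GetPredictedTracking2(ovr, predictedDisplayTime);

    // ... render both eye views into colorSwapChain[swapChainIndex] using
    //     tracking.Eye[eye].ViewMatrix / ProjectionMatrix ...

    // Submit the frame: describe the projection layer and hand it to the runtime.
    ovrLayerProjection2 layer = vrapi_DefaultLayerProjection2();
    layer.HeadPose = tracking.HeadPose;
    for (int eye = 0; eye < VRAPI_FRAME_LAYER_EYE_MAX; eye++) {
        layer.Textures[eye].ColorSwapChain = colorSwapChain;
        layer.Textures[eye].SwapChainIndex = swapChainIndex;
        layer.Textures[eye].TexCoordsFromTanAngles =
            ovrMatrix4f_TanAngleMatrixFromProjection(&tracking.Eye[eye].ProjectionMatrix);
    }

    const ovrLayerHeader2* layers[] = {&layer.Header};
    ovrSubmitFrameDescription2 frameDesc = {};
    frameDesc.Flags = 0;
    frameDesc.SwapInterval = 1;
    frameDesc.FrameIndex = frameIndex;
    frameDesc.DisplayTime = predictedDisplayTime;
    frameDesc.LayerCount = 1;
    frameDesc.Layers = layers;

    // May block here for frame synchronization and swapchain availability checks.
    vrapi_SubmitFrame2(ovr, &frameDesc);
}
```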

This is how the majority of VrApi applications work. However, there are two additional comments worth mentioning:

  • Due to historical reasons, vrapi_BeginFrame / vrapi_WaitFrame were added later, so some early applications only have access to vrapi_SubmitFrame2.

  • We released PhaseSync as an opt-in feature for VrApi, which moved frame synchronization into vrapi_WaitFrame for better latency management. With PhaseSync enabled, frame behavior is more similar to that of an OpenXR application, which we will talk about next.

OpenXR Application

With OpenXR applications, there are a few key differences to be aware of compared to VrApi applications.

  • Start the Frame: With OpenXR, PhaseSync is always enabled and xrWaitFrame will take on the responsibility of frame synchronization and latency optimization so the API can block the calling thread. Additionally, developers do not need to call a special API to get predictedDisplayTime. This value is returned from xrWaitFrame through XrFrameState::predictedDisplayTime.

  • Get the Poses: To get tracking poses, developers can call xrLocateViews, which is similar to vrapi_GetPredictedTracking2.

  • Rendering: It is important to understand that OpenXR has dedicated APIs to manage the swapchain; xrAcquireSwapchainImage / xrWaitSwapchainImage should be called before rendering content into the swapchain. xrWaitSwapchainImage can block the render thread if the swapchain image hasn't been released by the compositor.

  • Submit Frame: xrEndFrame is responsible for frame submission, but unlike vrapi_SubmitFrame2, it doesn’t need to do frame synchronization or swapchain availability checking, as this function won’t block the render thread. (A condensed sketch of the OpenXR frame loop follows this list.)
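
Here is an equivalent condensed, single-threaded sketch of the OpenXR calls above. Session, swapchain, and reference-space creation are assumed to happen elsewhere; the actual rendering, per-view pose/FOV/subImage setup, and error handling are omitted for brevity.

```cpp
#include <openxr/openxr.h>

// Condensed sketch of an OpenXR frame (assumptions noted above).
void RenderOneFrame(XrSession session, XrSwapchain swapchain, XrSpace appSpace) {
    // Start the frame: xrWaitFrame blocks for frame synchronization and returns
    // the predicted display time for this frame.
    XrFrameWaitInfo waitInfo{XR_TYPE_FRAME_WAIT_INFO};
    XrFrameState frameState{XR_TYPE_FRAME_STATE};
    xrWaitFrame(session, &waitInfo, &frameState);

    XrFrameBeginInfo beginInfo{XR_TYPE_FRAME_BEGIN_INFO};
    xrBeginFrame(session, &beginInfo);

    // Get the poses: locate both views at the predicted display time.
    XrViewLocateInfo locateInfo{XR_TYPE_VIEW_LOCATE_INFO};
    locateInfo.viewConfigurationType = XR_VIEW_CONFIGURATION_TYPE_PRIMARY_STEREO;
    locateInfo.displayTime = frameState.predictedDisplayTime;
    locateInfo.space = appSpace;
    XrViewState viewState{XR_TYPE_VIEW_STATE};
    XrView views[2] = {{XR_TYPE_VIEW}, {XR_TYPE_VIEW}};
    uint32_t viewCount = 0;
    xrLocateViews(session, &locateInfo, &viewState, 2, &viewCount, views);

    // Rendering: acquire and wait on the swapchain image before drawing to it.
    uint32_t imageIndex = 0;
    XrSwapchainImageAcquireInfo acquireInfo{XR_TYPE_SWAPCHAIN_IMAGE_ACQUIRE_INFO};
    xrAcquireSwapchainImage(swapchain, &acquireInfo, &imageIndex);
    XrSwapchainImageWaitInfo imageWaitInfo{XR_TYPE_SWAPCHAIN_IMAGE_WAIT_INFO};
    imageWaitInfo.timeout = XR_INFINITE_DURATION;
    xrWaitSwapchainImage(swapchain, &imageWaitInfo);  // may block if the compositor still holds it

    // ... draw both views into the acquired image, then fill projViews[] with
    //     each view's pose, fov, and subImage (swapchain + imageRect) ...
    XrCompositionLayerProjectionView projViews[2] = {
        {XR_TYPE_COMPOSITION_LAYER_PROJECTION_VIEW},
        {XR_TYPE_COMPOSITION_LAYER_PROJECTION_VIEW}};

    XrSwapchainImageReleaseInfo releaseInfo{XR_TYPE_SWAPCHAIN_IMAGE_RELEASE_INFO};
    xrReleaseSwapchainImage(swapchain, &releaseInfo);

    // Submit the frame: xrEndFrame hands the layer to the compositor and does
    // not block the render thread.
    XrCompositionLayerProjection layer{XR_TYPE_COMPOSITION_LAYER_PROJECTION};
    layer.space = appSpace;
    layer.viewCount = viewCount;
    layer.views = projViews;
    const XrCompositionLayerBaseHeader* layers[] = {
        reinterpret_cast<const XrCompositionLayerBaseHeader*>(&layer)};
    XrFrameEndInfo endInfo{XR_TYPE_FRAME_END_INFO};
    endInfo.displayTime = frameState.predictedDisplayTime;
    endInfo.environmentBlendMode = XR_ENVIRONMENT_BLEND_MODE_OPAQUE;
    endInfo.layerCount = 1;
    endInfo.layers = layers;
    xrEndFrame(session, &endInfo);
}
```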

A typical multithreading OpenXR application’s frame might look like the following diagram:

Overall, whether you are developing a VrApi or an OpenXR application, there are two main sources of blocking: frame synchronization and swapchain availability checking. If you capture a systrace of your application, you will see both of these blocks. When your application runs at full FPS, those sleeps are expected; like traditional vsync functions such as eglSwapBuffers, they simply prevent the application from rendering faster than the display allows, on top of optimizing latency.

When your application can’t reach the target FPS, the situation is more complicated. For example, the compositor might still be using the previously submitted images because the new frame is late. This makes the swapchain availability check block longer, and frame synchronization may block as well to adjust frame timing for the next frame. This is why applications still spend time blocked even when they are already slow. For these reasons, we do not recommend using FPS as a performance profiling metric, as it often does not accurately reflect the application workload. GPU systrace and Perfetto are better tools for measuring the performance of your application, on both the CPU and GPU side.

Stage 2: From frame submission to compositor

Our VR Runtime is designed around the concept of Out of Process Composition (OOPC). We have an independent process, VR Compositor, that runs in the background while gathering frame submission information from all clients and then compositing and displaying.

The VR application is one of the clients from which frame information is collected. The submitted frame data is sent to the VR Compositor through inter-process communication (IPC). We don’t need to send a copy of the eye buffer to the compositor process, which would be a large amount of data. Instead, the eye buffer’s memory is owned by the compositor process from the moment the swapchain is allocated, so only the swapchain handle and swapchain index are required. However, we do need to guarantee that access to the data is safe: the compositor should only read the data after the application has finished rendering, and the application shouldn’t modify the data while the compositor is using it. This is done through the FenceChecker and Frame Retirement systems.
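
To make this concrete, a frame submission conceptually only needs to reference the shared swapchain rather than carry pixel data. The struct below is a hypothetical illustration of that idea, not the runtime’s actual IPC format.

```cpp
#include <cstdint>

// Hypothetical illustration only (not the runtime's actual IPC format): because
// the compositor already owns the swapchain memory, a frame submission only
// needs to reference an image, never copy it.
struct FrameSubmissionMessage {
    uint64_t frameIndex;       // which application frame this is
    double   displayTime;      // predicted display time the frame was rendered for
    uint64_t swapchainHandle;  // identifies the compositor-allocated swapchain
    uint32_t swapchainIndex;   // which image in that swapchain holds the frame
    // ... per-layer pose, FOV, and flags would follow ...
};
```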

FenceChecker

Quest GPUs (Qualcomm Adreno 540 on Quest 1, Adreno 650 on Quest 2) are tile-based architectures that don’t start rendering until the draw calls are submitted and the command stream is flushed, explicitly or implicitly. When an application calls SubmitFrame, the GPU has usually only just started rendering the corresponding eye texture (most engines explicitly flush the GPU right before calling SubmitFrame). If the compositor immediately read the submitted image at this point, it would receive unfinished data, leading to graphics corruption and tearing artifacts.

To solve this issue, we insert a fence object into the GPU command stream at the end of the frame (vrapi_SubmitFrame / xrEndFrame) and then kick off an asynchronous thread (FenceChecker) to wait on it. The fence is a GPU-to-CPU sync primitive that tells the CPU when GPU processing has reached the fence. Since the fence sits at the end of the frame, when it signals we know the GPU frame is finished and can notify the compositor that it is now safe to use the frame.
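
The sketch below shows the general fence pattern; it is an assumed structure for illustration, not the runtime’s actual implementation. It uses the EGL 1.5 core sync functions; on older stacks the equivalent EGL_KHR_fence_sync entry points (eglCreateSyncKHR, etc.) would be used instead, and the “notify the compositor” step is left as a comment.

```cpp
#include <EGL/egl.h>    // assumes EGL 1.5; older stacks use EGL_KHR_fence_sync
#include <GLES2/gl2.h>  // for glFlush
#include <thread>

// Sketch of the FenceChecker pattern (illustrative structure only).
void OnFrameSubmitted(EGLDisplay display) {
    // Called on the render thread after the frame's last draw call has been issued.
    EGLSync fence = eglCreateSync(display, EGL_SYNC_FENCE, nullptr);
    glFlush();  // ensure the queued commands, including the fence, reach the GPU

    std::thread([display, fence]() {
        // "FenceChecker" thread: block until the GPU has executed everything
        // queued before the fence, i.e. the eye textures are fully rendered.
        eglClientWaitSync(display, fence, EGL_SYNC_FLUSH_COMMANDS_BIT, EGL_FOREVER);
        eglDestroySync(display, fence);
        // At this point it is safe to tell the compositor it may read the frame
        // (how that notification happens is runtime-internal).
    }).detach();
}
```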

This sequence is also captured in systrace, as shown below:

Tip: For the majority of applications, the FenceChecker marker’s length is roughly the same as the application’s GPU cost.

Frame Retirement

FenceChecker helps transfer the eye texture’s ownership from the application to the compositor, but this is only half the cycle. After the frame is finished displaying, the compositor also needs to hand over the data’s ownership back to the application so it can use the eye texture again, which is called “Frame Retirement.”

The VR compositor is designed to handle late (stalled) frames by reusing the previous frame and reprojecting it onto the display again if the expected frame is not delivered on time (see TimeWarp). Since we don’t know whether the next frame will arrive in time for the next compositing cycle, we have to wait until the compositor has picked up a newer frame before releasing the current one. Once the compositor confirms it does not need the frame again, it marks the frame as “retired” so the client knows the frame has been released by the compositor.

This can also be viewed with systrace if you zoom into the TimeWarp thread. Before TimeWarp reads a new frame, the corresponding frame’s client-side FenceChecker must have returned, confirming that GPU rendering is complete.

Stage 3: From compositing to displaying

At this point, the frame (eye textures) has arrived at the Compositor, and needs to be shown on the VR display. Depending on the hardware, there are many steps occurring with roughly the following components involved:

  • Layer Composition - Responsible for blending different compositor layers together. Layers can come from one or more clients

  • TimeWarp - Our reprojection technique to reduce HMD rotation latency

  • Distortion Correction - VR lenses distort the image to increase the perceived FOV. A counter-distortion is necessary so that users see an undistorted world (a toy counter-distortion sketch follows this list)

  • Other post-processing - There are also other post-processing steps, such as chromatic aberration correction (CAC).
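
As an illustration of counter-distortion, the snippet below applies a generic polynomial radial-distortion model to a lens-centered texture coordinate. The model and the k1/k2 coefficients are standard examples for lens distortion in general, not necessarily what the Quest compositor uses.

```cpp
// Illustrative only: a generic polynomial radial-distortion model. The
// compositor renders the eye texture through a counter-distorting mesh/shader
// so that, after the physical lens distorts the image, the user perceives
// straight lines as straight.
struct Uv { float u, v; };

// 'centered' is a texture coordinate relative to the lens center; k1/k2 are
// lens-specific coefficients that would come from calibration (hypothetical here).
Uv CounterDistort(Uv centered, float k1, float k2) {
    const float r2 = centered.u * centered.u + centered.v * centered.v;
    const float scale = 1.0f + k1 * r2 + k2 * r2 * r2;  // radial scale factor
    return { centered.u * scale, centered.v * scale };
}
```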

From a developer’s point of view, most of this is done automatically as part of the display pipeline and it is ok to treat them as black boxes. After all this hard work is completed, the screen is lit up at PredictedDisplayTime and people see your app displayed.

Given the importance of the compositor's work (without it, the screen would effectively freeze), it runs in a higher-priority context on the GPU and will preempt other workloads, such as your rendering work, when it needs to execute. You can see its effect as Preempt blocks in GPU systrace. On both Quest 1 and Quest 2, its per-frame work is split in two for latency optimization, so it typically preempts your application twice per frame, running roughly every 7 ms (half of a ~13.9 ms frame at 72 Hz).

[Image: Echo VR screenshot]

Conclusion

We hope this overview helps Quest developers have a deeper understanding of the system and enables them to build better VR applications. We covered a typical VR frame’s lifecycle, starting from the application rendering and finishing on the VR display. We explained the data flow between the client application and the compositor server. This post represents the internal designs as they are today; we are continually working towards building more optimizations in the future. If you have questions or feedback, let us know in the comments below or in the Developer Forums.

