GPU coprocessing for virtual reality (remote rendering + local reprojection)


Post by Chief Blur Buster » 06 Apr 2021, 19:21

On March 11th, 2021, I published one of my ideas on HardForum -- one that, as I've since learned, other VR scientists have also come up with independently. Several people are converging on the same kind of idea already:

This article is for people who are familiar with Oculus ASW 2.0 -- a frame rate amplification technology.

This is not just for remote VR: it is also useful for "PCVR + Standalone" headsets that already have 3dof coprocessing (Virtual Desktop remote GPU + Quest 2 local 3dof reprojection). What we need is local 6dof reprojection co-processing, whether remote-streamed or nearby-PCVR-streamed:
Chief Blur Buster wrote:
Armenius, post: 1044951923, member: 273744 wrote: In a world where latency, pixilation, and motion clarity are chief concerns of VR gaming, a company sees the future of VR in streaming. Startup PlutoSphere wants to use NVIDIA's cloud-based gaming service to bring games to the Quest 2 over the internet. Good luck with that, is all I have say. ... lus-quest/
How Local Movements Can Become Lagless In Future Streamed VR:

Firstly, let me preface this: I prefer playing on my RTX 3080. But streamed VR has some brilliant innovations.

There are some clever hybrid GPU co-processing tricks (cloud GPU rendering + local GPU reprojection).

Basically, the graphics are streamed, but reprojection is done locally, so that head turns are instantaneous even with laggy streaming.

This eliminates most VR nausea from moving around on a laggy streaming service, since the streamed world laglessly reprojects around you (like a simplified version of the Oculus ASW 2.0 used in the original Oculus Rift to perceptually losslessly convert 45fps to 90fps).
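To make that concrete, here is a minimal Python sketch (all names are my own illustration, not any actual headset API) of 3dof rotational reprojection: the headset warps a stale streamed frame by the rotation the head has made since that frame was rendered, so head turns feel instant even when the stream itself is 100ms behind. For brevity it models yaw only:

```python
import numpy as np

def yaw_matrix(theta):
    """Rotation about the vertical axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, 0.0,  s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0,  c]])

def rotational_reprojection(rendered_yaw, current_yaw):
    """Corrective rotation applied at scan-out: warps the (late)
    streamed frame from the head pose it was rendered at to the
    head pose the user has *right now*."""
    return yaw_matrix(current_yaw - rendered_yaw)

# The stream is ~100ms behind: the frame was rendered when the head
# pointed at 10 degrees, but the head is now at 25 degrees.
correction = rotational_reprojection(np.radians(10), np.radians(25))
corrected_view = correction @ np.array([0.0, 0.0, -1.0])  # warp forward vector
```

The corrective rotation is applied as a cheap full-frame warp just before scan-out, which is why it costs so little local GPU power.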

Those who have read the Frame Rate Amplification Technology (F.R.A.T.) article on Blur Busters (google it, or click the Research tab if you haven't seen it yet) will recognize the method of generating cheap 1000fps at UE5 detail levels from a midrange GPU by the year 2030. This is a similar kind of parallelism, except parts of it run remotely. The GPU parallelism behind this is a bit easier to conceptualize if you've ever tried an original Oculus Rift and its ASW 2.0 trick.

In the future (think 2030s), some frame rate amplification technologies conceptualize a theoretical GPU coprocessor built into the monitor (to convert 100fps to 1,000fps for future 1000Hz displays), doing Oculus ASW 2.0-like tricks but for PC gaming. Technically, it can be cloud-vs-local coprocessing too.

Also, a popular optimization on the original Oculus Rift was to download the third-party Oculus Tray Tool and force the VR game to 45fps permanently, using permanent reprojection (ASW 2.0) to get 90fps on a lower-performance GPU or at higher game detail levels. Some games worked really well with this frame rate amplification, while others did not. But if the game is properly optimized, the frame rate of the GPU is decoupled from the frame rate the eyes are seeing, with no interpolation artifacts, no noticeable extra lag, and no soap opera effect (at least in some games).

The same concept can be thought of as GPU coprocessing (high-power GPU in the cloud, low-power GPU in the headset). And the technology is already in the Quest 2: its GPU is capable of simple 3dof rotational reprojection (spherical rotation) during head turns, though not the elaborate 6dof reprojection (3D/parallax depth-buffer-aware reprojection) of PC Rift ASW 2.0. That is probably good enough for VR streaming services to the Quest 2, and they will probably take advantage of it. It's not as nauseating as you'd think, because of this clever GPU co-processing trick.

The Quest 2 GPU can play a video bitrate of up to 150 megabits per second (essentially perceptually lossless in HEVC), so if the streaming service blasts that much (which needs a gigabit Internet connection), remote streaming over 100ms of latency is, for stationary and slow-moving content, in theory indistinguishable from Oculus Link, with the exception of laggier-looking hand movements (due to the lack of 6dof reprojection), even though head turns stay lagless (thanks to local 3dof reprojection).

The latency will still be observed during actions like shooting an arrow or interacting with other players. But I would hope a Quest 3 could do real-time 6dof reprojection with realtime compressed Z-buffer streaming. If 6dof reprojection is done in a way that completely undoes the 100ms latency, then even sudden local movements and sudden hand movements can become much more perceptually lagless (with minor artifacts) despite being remotely rendered. The lag would then only show in things like remote players, hit registration, etc.

The real question is the proper partitioning of the GPU co-processing frame rate amplification architecture (cloud GPU + local GPU co-operating), and the ratio of GPU power needed (how powerful the cloud GPU needs to be versus how powerful the local GPU needs to be) in how it splits the frame rate amplification tasks. Needless to say, research papers on this incredible stuff are probably pouring out of those companies by now.

Metaphorically, imagine one remote system doing H.264 P-Frames remotely, while a local system does H.264 I-Frames and B-Frames locally. That's a gross simplification of local-remote render partitioning, but the technologies that make it possible to convert 45fps to 90fps laglessly are the same ones that will power tomorrow's 100fps-to-1000fps conversion (for future 1000Hz monitors without unobtainium GPUs), and, coincidentally, de-latencyizing VR cloud rendering, as long as there's a modicum of a local GPU (like the Quest 2's) to do coprocessing tasks such as 3dof or 6dof reprojection to undo the streaming latency. This way, you can use a powerful cloud GPU plus a less powerful local GPU to retroactively 6dof-reproject away the 100ms latency, delivering UE5 graphics perceptually laglessly!

The elephant in the room is the remote-GPU-vs-local-GPU power ratio needed to do a worthwhile co-processing job on the gaming content -- if in-headset GPUs improve more rapidly than cloud GPUs do, this may not work out long term (might as well use the local GPU instead). But today, 3dof reprojection already automatically undoes the 100ms latency for lagless head turns in streamed VR game tests, since 3dof reprojection is built into the Quest 2 stream player (that's how 360-degree 24fps videos still headturn at 90 frames per second; it is also used for Oculus Link as a simplified 3dof version of ASW 2.0).

Which means for VR game streaming to the Quest 2 today, you have lagless head turns (thanks to 3dof reprojection) but laggy sideways movements (due to the lack of 6dof reprojection). Fixing laggy sideways movements will require 6dof reprojection plus retroactive reprojection to undo the streaming latency. This is theoretically possible for a future Quest 3, because it knows the current instantaneous 6dof position and can thus locally reproject a lagged frame to retroactively correct its 6dof position -- as long as the streamed VR game also streams a compressed version of the Z-buffer, for essentially parallax-artifact-free local reprojection. (Oculus ASW 2.0 requires the Z-buffer to do 6dof reprojection today on my original tethered Oculus Rift.) Then local movement latency is completely eliminated even if the VR stream lags by 100-200 milliseconds. With local 6dof reprojection (and compressed Z-buffer streaming) of future VR streams, things like turning your head, crouching, looking under a desk, or leaning to peek around a wall become lagless despite the 100+ms of VR streaming lag.
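A hypothetical per-pixel Python sketch of what Z-buffer-aware (6dof-style) reprojection does (simplified pinhole camera, translation delta only; a real ASW-2.0-style warp also applies the rotation delta and fills disocclusions -- all names here are my own illustration):

```python
import numpy as np

def reproject_pixel(u, v, depth, K, delta_translation):
    """Depth-aware reprojection of a single pixel: unproject into 3D
    using the streamed Z-buffer, shift by how far the head has moved
    since the frame was rendered, then project back into the image.
    K is the 3x3 pinhole camera intrinsics matrix."""
    K_inv = np.linalg.inv(K)
    p3d = depth * (K_inv @ np.array([u, v, 1.0]))  # unproject to camera space
    p3d = p3d - delta_translation                  # camera moved -> scene shifts opposite
    uvw = K @ p3d                                  # project back
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

# Toy intrinsics: 500px focal length, principal point at (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Head slid 10cm sideways since the frame was rendered.
delta = np.array([0.1, 0.0, 0.0])
near_u, _ = reproject_pixel(320.0, 240.0, 1.0,  K, delta)  # near object
far_u,  _ = reproject_pixel(320.0, 240.0, 10.0, K, delta)  # far object
```

The key property the Z-buffer buys you: near pixels shift more than far pixels, which is exactly the parallax that pure 3dof rotational reprojection cannot produce.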

For local-remote GPU co-processing to work, the bitrate increase from the extra data needed (Z-buffer streaming) remains an issue, but it is rapidly falling. Some experiments have already been done with this, and the concept is sound (with the right local:remote GPU coprocessing ratio and sufficient codec/bandwidth performance), though the Quest 2 isn't yet powerful enough to handle streamed high-quality, highly-compressed Z-buffers for local 6dof reprojection. There are minor reprojection artifacts, but much fewer than in the Oculus ASW 1.0 days before ASW 2.0. It does require a few extra tens of megabits per second of compressed Z-buffer streaming to make 6dof reprojection co-processing deliver fully lagless local body movements during VR streaming.

New Z-buffer compression algorithms may be needed, and they may be able to piggyback on HEVC or H.264 codecs (simply compress the monochrome depth maps) to keep the Z-buffer streaming bitrate low enough to avoid very glitchy 6dof reprojection. An HEVC extension supports 16-bit monochrome (48-bit color), and a 16-bit Z-buffer can simply be represented as a monochrome depth map. For streaming, we don't need to worry about Z-fighting: the reprojection artifacts are only visible on near-distance objects (where Z-buffer resolution is much higher anyway), so 16 bits is sufficient precision for the streaming + 6dof reprojection combo. Either way, this allows compressing a Z-buffer through a commodity video codec, making Z-buffers compact enough for Internet streaming, and thereby making local 6dof reprojection possible at roughly ASW 2.0 quality, for lagless local movements despite 100ms of VR streaming lag. VR streaming with Z-buffers then uses perhaps only about 1.5 times more bandwidth (approximately) than without them, but you gain lagless local movements from local 6dof reprojection, without too many objectionable artifacts.
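A sketch of the quantization step, assuming an inverse-depth mapping into the codec's 16-bit monochrome plane (my assumption for illustration; an actual implementation may map depth differently). Inverse depth spends precision on near objects, which is exactly where reprojection artifacts are visible:

```python
import numpy as np

def depth_to_u16(depth, near, far):
    """Pack a float Z-buffer into 16-bit monochrome so it can ride an
    existing video codec. Inverse-depth mapping gives near objects the
    most precision (where parallax reprojection needs it most)."""
    inv = (1.0 / np.clip(depth, near, far) - 1.0 / far) / (1.0 / near - 1.0 / far)
    return np.round(inv * 65535).astype(np.uint16)

def u16_to_depth(q, near, far):
    """Inverse of depth_to_u16: recover metric depth on the headset."""
    inv = q.astype(np.float64) / 65535
    return 1.0 / (inv * (1.0 / near - 1.0 / far) + 1.0 / far)

# Round-trip a few depths (metres) through the 16-bit channel.
depths = np.array([0.2, 1.0, 5.0])
recovered = u16_to_depth(depth_to_u16(depths, 0.1, 100.0), 0.1, 100.0)
```

Quantization is only half the story; the codec's lossy compression on top of this is what the deblocking cleanup below the next paragraph is about.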

This is just a way to stream Z-buffers using existing commodity video compressors that were never designed for Z-buffers. To fix shortcomings like compression artifacts, a local Z-buffer-optimized deblocking filter (shader-based and/or neural-based) can remove most Z-buffer compression artifacts, such as foreground/background pixels erroneously popping far/near at parallax edges. The algorithm can use the visible frame buffer as hinting, possibly with a small amount of AI (much like Quest 2 Passthrough Mode uses the Snapdragon's neural network to AI-stitch the 4 cameras in real time with no seams). By gluing together a few off-the-shelf technologies, realtime 6dof reprojection can probably be achieved in a future Quest 3. I doubt the Quest 2 can do more than 3dof reprojection anyway, but I'm willing to be pleasantly surprised. The neat effect is that local 6dof body-movement latency can remain constant (and low) regardless of varying Internet conditions, with progressively worse latencies simply increasing the amount of reprojection artifacts during fast movements.
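As a toy illustration of frame-buffer-hinted cleanup (purely hypothetical; a real implementation would be a shader or neural filter, not per-pixel Python): smooth the depth only where the color frame is smooth, so genuine parallax edges (which show up as color edges too) survive, while codec block noise in flat regions gets averaged away:

```python
import numpy as np

def deblock_depth(depth, color, color_edge_thresh=0.1):
    """Toy joint filter: average each depth pixel with its horizontal
    neighbours, but only where the colour frame is locally smooth.
    Real parallax edges coincide with colour edges and are preserved;
    codec block noise in flat colour regions is smoothed away."""
    out = depth.copy()
    for y in range(depth.shape[0]):
        for x in range(1, depth.shape[1] - 1):
            if abs(color[y, x + 1] - color[y, x - 1]) < color_edge_thresh:
                out[y, x] = (depth[y, x - 1] + depth[y, x] + depth[y, x + 1]) / 3.0
    return out
```

The same idea extends vertically and to larger windows; the principle is simply that the (higher-quality) visible frame tells you where the depth discontinuities are supposed to be.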

At the end of the day, 100ms VR streaming latencies (for local movements) are eminently correctable via retroactive 6dof reprojection: two streamed video eye views (at 90fps each) + two streamed Z-buffers (at 90fps each) + a local 6dof-reprojection GPU to undo the streamed latency for local movements in supported VR games. You'll definitely need at least ~100 Mbps for comfortable VR streaming, and the Z-buffer has to fit in that too...

Another optional theoretical enhancement in a future headset is automatically building reprojection history over a series of multiple frames, to do parallax fill-ins. For example: moving to hide behind a wall, then suddenly tilting to look again. Streaming latency would, in theory, make the view glitch a bit, even if your local movements were lagless. But if the reprojector retroactively kept a history of the last few seconds of frames (uncompressed pairs of 2D frames + pairs of Z-buffers) and used DLSS-like tricks, you could eliminate most lagged-reprojection artifacts too, for most movement use-cases. The local VR headset would remember the last few seconds of 3D, and the reprojector would jigsaw-stitch them seamlessly to prevent parallax-reveal glitching from streaming latency. Mind you, the VR games would need some minor modifications to improve the local-remote GPU coprocessing experience.
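A hypothetical sketch of that history buffer (all class and method names invented for illustration): keep the last few seconds of frame + Z-buffer + pose triples, and when a fresh reprojection leaves a disoccluded hole, fall back to the stored frame whose render pose is closest to the revealed viewpoint, since it is the least likely to have a hole there:

```python
from collections import deque

class ReprojectionHistory:
    """Hypothetical ring buffer of recent (frame, z_buffer, pose)
    triples for parallax fill-in during reprojection."""

    def __init__(self, seconds=2.0, fps=90):
        # Old entries are evicted automatically once full.
        self.frames = deque(maxlen=int(seconds * fps))

    def push(self, frame, z_buffer, pose):
        self.frames.append((frame, z_buffer, pose))

    def best_source_for(self, target_pose):
        """Toy heuristic: pick the stored frame rendered nearest the
        target pose -- the best candidate for filling disocclusions."""
        return min(self.frames,
                   key=lambda f: sum((a - b) ** 2
                                     for a, b in zip(f[2], target_pose)))

history = ReprojectionHistory(seconds=1.0, fps=90)
history.push("frameA", None, (0.0, 0.0, 0.0))
history.push("frameB", None, (1.0, 0.0, 0.0))
best = history.best_source_for((0.9, 0.0, 0.0))  # closest render pose wins
```

A real implementation would of course score per-pixel coverage rather than whole frames, but the storage-and-lookup structure is the same idea.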

BTW, I am cited in more than 20 different peer-reviewed research papers, including an NVIDIA frame rate amplification related paper -- so I've got some reputation here, plus background knowledge of trends in future GPU workflow parallelization.

Mind you, I prefer the original games installed on my PC. But it's amazing how local movement latency can be decoupled from streaming latency, making cloud VR streaming possible without nausea, in a much-better-than-Stadia experience. The science of frame rate amplification technologies applies to local:remote GPU co-processing too.

UPDATE: I am also reminded of an early Microsoft pre-VR reprojection experiment, Microsoft Hyperlapse (2014 version), which essentially photogrammetry-analyzes video to generate 3D scenery on the GPU, which can then follow a virtual camera path (motion-stabilized) different from the original path. This same "2D-3D-2D" reprojection technique can also be used as a frame rate amplification technology for video, by generating new intermediate camera positions along a camera path (whether the original non-stabilized camera path, or a new Hyperlapse-stabilized one).