Frequent frame repeating on modern GPUs

SubstantialCt8690 · Post by **SubstantialCt8690** » 10 Feb 2024, 19:36

Hello everyone,
This discussion was originally on MonitorTests and posted here by the suggestion of the admin.
I hope you will find the topic and issue interesting and will have some ideas to share.

THE ISSUE:

My company works on AR/VR optics technologies, and for several months we've been trying to solve an issue plaguing our device and every off-the-shelf monitor we have tested. The issue involves frame repeats at the GPU/GPU runtime at >=90Hz.
Right now we are able to compensate for a frame repeat in code in later frames, but that does not fix the visual artifacts that occur during the repeat itself.

By "frame repeat" I do not mean a frame skip: I mean our simple DirectX/Vulkan test programs present a frame to the GPU, and then the GPU doesn't display new presented frames on time and instead sends the earlier presented frame twice or more to the monitor. The issue is not that the program / game engine is not able to provide frames on time: the frame repeat seems to be happening at the DirectX/Vulkan, GPU runtime or GPU stage which we don’t have access or control over. The APIs are telling our minimal test program to wait while our program itself is not busy and isn’t doing anything.

The artifacts are especially noticeable in our case, since we do optical pixel shifting/wobulation to increase resolution. We are not the only company working on such AR/VR tech (see “Digilens T-Rex”).
We understand that the frame repeat artifact may not be completely preventable for 100% of the time, but we still hope it can be reduced greatly so that it doesn’t happen every 40 seconds or so on modern laptop hardware with no user background processes happening. As-is, we can’t really develop our pixel shifting prototypes into viable products if the user is going to see a flicker/shift effect so frequently.
Being a tiny startup, we haven’t been able to discuss this issue with GPU suppliers directly over email. We’ve posted the issue in the Nvidia developer forum, but we are not sure they’ll find it worth their time unless we can narrow down the issue or find the actual root cause.

TEST HARDWARE/SOFTWARE:

Test software:

These are the test programs we've created for detecting and analyzing the issue:

1) Pure DirectX, Vulkan, Unity (DX11) and Panda3D (OpenGL) programs that display red and blue frames in sequence on regular monitors.

2) A SteamVR headset runtime using the VRWorks API that does the same, tested with regular 180-240Hz monitors. The VRWorks API requires an NDA, so it cannot be shared here.

3) A Unity SteamVR program that displays red and blue frames in sequence, run on an existing SteamVR headset (HP Reverb G2) with its own proprietary VR runtime.

All Windows Power Settings and GPU settings have been checked.

The source code and binary for program (1) are provided in the link below.

Even though we have spent many months on tests regarding this, it is quite possible that there is some other way to code this that reduces (even if it does not eliminate) the frame repeat issue.

This is what the test program provided below does:
1) Two white cards are moved one cell each frame in two (top and bottom) 2d grids. Mouse cursor is moved each frame as well.
2) When a frame repeat happens, you notice both the white cards and mouse cursors freezing in place, then:
3) The mouse cursor jumps a position to compensate for the repeat.
4) The top white card moves one cell and only later jumps a cell to compensate. This is because the next frame had already been presented, and the 3d program couldn’t recall that frame from the GPU when it learned that the current frame had been repeated.
5) The bottom white card just resumes as usual after the frame repeat, as it has not been programmed to compensate its position due to a frame repeat.
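
The three behaviours in steps 3 to 5 can be modelled with a short simulation. This is only a sketch in C++ under stated assumptions, not the code from the linked test program: `Positions`, `simulate` and the one-frame-deep queue are hypothetical, and the repeat pattern is supplied by hand rather than measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cell index shown on screen for each of the three moving objects.
struct Positions {
    int cursor;  // compensated immediately after a repeat
    int top;     // compensated one frame late (a frame was already queued)
    int bottom;  // never compensated
};

// repeated[r] is true when refresh r re-shows the previous frame.
// Assumption: exactly one frame is already queued when a repeat is reported.
std::vector<Positions> simulate(const std::vector<bool>& repeated, int cells) {
    assert(cells > 0);
    std::vector<Positions> shown;
    Positions last{0, 0, 0};
    int frame = 0;              // index of the next new frame to reach the display
    int repeats = 0;            // repeats that have happened so far
    int knownWhenRendered = 0;  // repeats the app knew of when that frame was rendered
    for (std::size_t r = 0; r < repeated.size(); ++r) {
        if (repeated[r]) {
            ++repeats;
            shown.push_back(last);  // everything freezes for this refresh
            continue;
        }
        last = Positions{(frame + repeats) % cells,
                         (frame + knownWhenRendered) % cells,
                         frame % cells};
        shown.push_back(last);
        knownWhenRendered = repeats;  // the frame rendered now is displayed next
        ++frame;
    }
    return shown;
}
```

With a single repeat at refresh 2 and 10 cells, this model gives cursor cells 0,1,1,3,4,5, top card cells 0,1,1,2,4,5 and bottom card cells 0,1,1,2,3,4, which matches the freeze, immediate jump, late jump and no-jump behaviour listed above.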

The program code and executable, 480fps camera recordings, program logs and summary spreadsheet of the logs can be found here: https://e.pcloud.link/publink/show?code ... aQc4SgdJY7

Test hardware:

We've tried with 5 PCs and 5 monitors, and the issue exists on all 5 monitors with 3 out of 5 PCs.

Laptops we tested that have this issue:
1) Aorus 15G XC-8US2430SH (2021) (RTX3070)
2) HP Victus 15 (2023) (RTX2050)
3) Asus Rog G752VS (GTX1070)

There's a custom built PC and one national brand laptop we’ve tested which don't have the issue, both using RTX4090. Right now we’re hesitant to limit use of our hardware and software to RTX40xx series users, even if it was determined that these newer GPUs solve/greatly reduce the issue in general and not just the specific models we’ve tested on so far.

Monitors tested on:
1) AOC C27G2Z 27" - 240Hz - FreeSync Premium
2) SAMSUNG 25" Odyssey G4 LS25BG402ENXGO - 240Hz - FreeSync Premium
3) MSI G27C4X 27" - 240Hz - FreeSync Premium
4) AOC 24G15N 24" - 180Hz - FreeSync
5) ASUS TUF Gaming 24” VG249Q1A - 165Hz - FreeSync Premium

HDMI vs DisplayPort, and cheap vs expensive video cables do not seem to make a difference.
Tested both on integrated AMD and Intel, as well as Nvidia GPUs.

(links to these monitors and laptops are available in the cloud folder file “Test hardware info.txt”)

VARIABLE REFRESH RATE: NO IMPACT ON THE ISSUE:

Enabling Variable Refresh Rate (VRR) does not seem to solve the issue.
Our guess is it is due to one of two reasons (or both):
1) The delay induced by the GPU runtime or firmware is much longer than what variable refresh rate monitors can support.
2) There is a bug or limitation in the GPU/GPU runtime that preserves the issue even when VRR is enabled and supported.

This is how we enable VRR in code:
1) Set the swap effect to DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL.
2) Set the application to borderless window mode.
3) Create and resize the swapchain with the DXGI_SWAP_CHAIN_FLAG_ALLOW_TEARING flag.
4) Pass 0 as the sync interval parameter of the Present method.
5) Pass the DXGI_PRESENT_ALLOW_TEARING flag as the flags parameter of the Present method.

WHY THE ISSUE MATTERS:

The issue does not seem to affect only our optical hardware, but also VR use in general and 90Hz+ gaming in general, regardless of the monitor used and whether it supports FreeSync/FreeSync Premium or not. Of course, for VR it’s much more important, because repeated frames cause a mismatch between the VR view currently shown and the user’s real head position/rotation.
The issue seems to mainly come up at 120Hz and much more frequently at 240Hz so I’m not surprised it’s not reported or discussed often.

In case you are wondering why our VR device PCB is not synced with the PC some other way: there’s no reliable way to keep the frame index in sync between the device PCB/optical component and the PC GPU, because (A) the latencies you get with USB and DisplayPort-AUX vary, and (B) the GPU simply does not let us know that it has sent the previously presented frame to the display twice, and kept the next presented frame for later, until after the issue has actually happened.
In theory, each frame could carry its index embedded in the pixel data itself, which the device could read in order to display black when the issue happens. But (A) this would not solve the artifact, only replace the repeating/shifting artifact with a blanking artifact, and (B) it would require an expensive FPGA able to handle high fps video, since no existing video chip can analyze pixel data this way, which would make the product prohibitively expensive.
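
The frame index idea can be illustrated as follows. This is a minimal sketch, assuming a hypothetical layout in which the low bits of the index are written as black or white pixels at the start of the first scanline of a grayscale frame. `embedFrameIndex`, `readFrameIndex`, `isRepeat` and the 8-bit width used in the example are illustrative choices, not part of any existing device or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Write the low `bits` bits of `index` into the first `bits` pixels of a
// grayscale scanline: white (255) for 1, black (0) for 0, LSB first.
void embedFrameIndex(std::vector<std::uint8_t>& scanline, std::uint32_t index, int bits) {
    assert(bits > 0 && bits <= 32);
    assert(scanline.size() >= static_cast<std::size_t>(bits));
    for (int b = 0; b < bits; ++b) {
        scanline[b] = ((index >> b) & 1u) ? 255 : 0;
    }
}

// Recover the index on the receiving side; thresholding at mid-gray
// tolerates small level changes in the video path.
std::uint32_t readFrameIndex(const std::vector<std::uint8_t>& scanline, int bits) {
    assert(bits > 0 && bits <= 32);
    assert(scanline.size() >= static_cast<std::size_t>(bits));
    std::uint32_t index = 0;
    for (int b = 0; b < bits; ++b) {
        if (scanline[b] >= 128) index |= (1u << b);
    }
    return index;
}

// A repeat shows up as the same index arriving on two consecutive refreshes.
bool isRepeat(std::uint32_t previousIndex, std::uint32_t currentIndex) {
    return previousIndex == currentIndex;
}
```

As noted above, a check like this only lets the device detect a repeat after the fact; it does not remove the artifact, and the per-frame pixel inspection is the part that would need the FPGA.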

If you’re wondering why DLP wobulation/pixel shifting does not have this issue:
DLP projectors receive a 4K 60Hz signal, store it in SRAM on the projector PCB, split it into 4x 1080p frames on the PCB and display them in sequence, wobulated. So the PCB does not have to deal with syncing with the GPU or with a frame repeat at the GPU, since it produces the sub-frames itself and can easily sync itself with the optical component. But this kind of architecture introduces a 3-frame-long latency, which is not practical for high fps gaming and AR/VR.
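
To make the split concrete, here is a rough sketch of cutting one buffered frame into four quarter-resolution sub-frames. It assumes each sub-frame takes one pixel out of every 2x2 block; the sampling that real DLP chipsets use is not described in this thread and may differ. `extractSubFrame` and `bufferLatencyMs` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Extract sub-frame `phase` (0..3) from a row-major source image whose
// width and height are both even. Assumption: phase p takes the pixel at
// offset (p % 2, p / 2) inside every 2x2 block.
std::vector<std::uint8_t> extractSubFrame(const std::vector<std::uint8_t>& src,
                                          std::size_t width, std::size_t height,
                                          int phase) {
    assert(phase >= 0 && phase < 4);
    assert(width % 2 == 0 && height % 2 == 0);
    assert(src.size() == width * height);
    const std::size_t dx = static_cast<std::size_t>(phase % 2);
    const std::size_t dy = static_cast<std::size_t>(phase / 2);
    std::vector<std::uint8_t> sub;
    sub.reserve((width / 2) * (height / 2));
    for (std::size_t y = dy; y < height; y += 2) {
        for (std::size_t x = dx; x < width; x += 2) {
            sub.push_back(src[y * width + x]);
        }
    }
    return sub;
}

// Added latency of buffering `frames` whole input frames at `inputHz`.
double bufferLatencyMs(int frames, double inputHz) {
    assert(frames >= 0 && inputHz > 0.0);
    return 1000.0 * frames / inputHz;
}
```

Under this model a 3840x2160 source yields four 1920x1080 sub-frames, and if the three frames of latency are counted at the 60Hz input rate, `bufferLatencyMs(3, 60.0)` comes to 50 ms.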

I hope this discussion will help all of us learn why such frame repeats happen so often on modern GPUs and whether they can be prevented or reduced with better/different code. If this is a GPU driver/hardware issue, maybe we can pinpoint the exact cause and report it to Nvidia, AMD and Intel. If it's a code issue, then we can provide the solution to the Unity and Panda3D engine maintainers.

Thanks

Chief Blur Buster · Post by **Chief Blur Buster** » 24 Feb 2024, 20:38

SubstantialCt8690 wrote: ↑
10 Feb 2024, 19:36
Hello everyone,
This discussion was originally on MonitorTests and posted here by the suggestion of the admin.
I hope you will find the topic and issue interesting and will have some ideas to share.

Do you have variable refresh rate? This behavior is known as LFC (Low Frame Rate Compensation), which is like a DRAM refresh: the display repeat-refreshes when frametimes between refresh cycles get too long, i.e. when the frame rate falls below the minimum Hz rating. This prevents the image from decaying.
WORKAROUND: Turn off VRR, especially if you need precision framerate=Hz
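
The arithmetic behind LFC can be sketched roughly as below. This is a simplified model, assuming only that the panel must be refreshed at least once every 1/minHz seconds; real LFC behaviour is driver- and panel-specific and also has to respect the maximum Hz. `lfcScanoutCount` is a hypothetical helper, not a driver API.

```cpp
#include <cassert>
#include <cmath>

// Number of times one frame is scanned out under a simplified LFC model:
// the panel must be refreshed at least every 1/minHz seconds, so a frame
// that lasts longer is repeated enough times to keep each refresh
// interval within that limit.
int lfcScanoutCount(double frameTimeMs, double minHz) {
    assert(frameTimeMs > 0.0 && minHz > 0.0);
    const double maxIntervalMs = 1000.0 / minHz;
    const int count = static_cast<int>(std::ceil(frameTimeMs / maxIntervalMs));
    return count < 1 ? 1 : count;
}
```

For a panel with a 48Hz minimum, a 10 ms frame is shown once, a 25 ms frame twice and a 50 ms frame three times under this model.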

However, if VRR is turned off, then this is odd, and needs a bit of troubleshooting. Windows DWM (e.g. Borderless Fullscreen) may have some repeat-refresh behaviors as it composites, and if you're using multimonitor (VR and main monitor) you will observe repeat-refresh behaviors because DWM.exe is a single-Hz compositor.
WORKAROUND: Use single monitor mode when debugging (Windows+Shift+P to turn on/off), and/or use Fullscreen exclusive mode

Inaccurate refresh cycle counting can make it hard to get back in sync quickly, if you're trying to generate frames that correspond to a special refresh cycle (e.g. interlacing pattern, wobulation pattern, or a shutter-glasses sequence).
WORKAROUND: Try my open source refresh cycle estimating/counter module. It's also used by TestUFO.

OPEN SOURCE HELPER MODULE:
Keeping track of which frames align to which refresh cycles can also help you get back in sync with a cycling pattern (interlacing, wobulation). I have open-sourced (Apache 2.0) a refresh cycle counter algorithm: https://github.com/blurbusters/RefreshRateCalculator and you can modulus its software-based best-effort refresh cycle counter to more quickly "get back in correct sync". If you use this, please credit us (and Duckware), as per the Apache 2.0 open source license. If you port RefreshRateCalculator.js to a new language, I would request a humble reimbursement in providing the port of the module (even if not your code). The refresh cycle counter is at RefreshRateCalculator.getCount(), which is a monotonically increasing refresh cycle counter since the initialization of the module. So you can just quickly get back in sync (within 1-2 frames of a stutter) with whatever shutter/interlace/wobulation/cyclic pattern via a simple MODULUS (%) -- because it does not count frames but microsecond refresh cycle timestamps.
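
The modulus idea can be sketched in C++ as below. This is not a port of RefreshRateCalculator.js and does not reproduce its estimation logic; it only assumes that a refresh epoch and refresh period have already been measured somehow. `refreshCycleCount` and `patternPhase` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Best-effort refresh cycle counter derived from timestamps rather than
// from counting presented frames, so a stutter cannot make it drift.
std::int64_t refreshCycleCount(std::int64_t nowUs, std::int64_t epochUs, double periodUs) {
    assert(periodUs > 0.0 && nowUs >= epochUs);
    // Truncation equals floor here because the elapsed time is non-negative.
    return static_cast<std::int64_t>(static_cast<double>(nowUs - epochUs) / periodUs);
}

// Position inside a cyclic pattern (e.g. 4 for a four-position wobulation).
int patternPhase(std::int64_t cycleCount, int patternLength) {
    assert(patternLength > 0 && cycleCount >= 0);
    return static_cast<int>(cycleCount % patternLength);
}
```

Because the phase comes from elapsed time and not from the number of frames presented, a frame rendered after a stutter can look up which pattern position the current refresh cycle belongs to and resume from there.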

If this does not tick your problem-solve boxes, let me know.

This is a big rabbit hole, of all the abstractions that the Windows compositor does, the 3D API does, the driver/GPU does, etc. So all these different layers can muck about with the frame presentation workflow as a refresh cycle. Drivers and swapchains may also have a habit of repeat-presenting frames, to try to solve various other problems that occur, creating new problems for some people like you. In some ways, this can sometimes be solved by inventing your own custom swapchain. One that piggybacks off fullscreen exclusive + using a waitable swapchain (microsoft.com), which is also good to reduce VR latency. Having your own custom swapchain, on a fullscreen-exclusive mode, generally gives you more control over whether frames are presented or not. However, this might be a wild goose chase, without understanding the underlying cause. There are reasons for the various kinds of swapchains; a deeper swapchain can reduce stutter, but a shallower swapchain can reduce latency. So a lot of work is needed to get the best of both worlds, sometimes full of workarounds such as making the swapchain thread higher priority than the rendering thread, to make sure there are no blocking behaviours and that the page flip occurs on time at the least latency without stutter (missed vsync). What's happening may be more complex than double buffering or triple buffering. So a very smart implementation can get the best of both worlds (low latency and the perfect framerate=Hz required for VR use cases).

BTW, I myself, as part of Blur Busters, also provide consulting services on contract, see https://services.blurbusters.com -- I've helped a few AR/VR vendors too, both onsite and offsite. Although I do not directly provide code, I have various algorithms for making 3D shutter glasses generically more reliable too.

SubstantialCt8690 · Post by **SubstantialCt8690** » 13 Mar 2024, 20:18

Hello,
First of all, thank you so much for getting back to me on this. Like I mentioned, we are at a loss and unable to get feedback directly from NVidia or anywhere else.
Also apologies for the delay, we prepared a response and checked your open source code first.

I hadn't noticed that you provide professional consulting on the website, sorry, the info seems a little buried. I'll email you about payment/contract, but I really don't mind having the technical discussion here since, like I explained, the issue doesn't seem to be specific to Nvidia and appears to be GPU, CPU and 3d API-agnostic. Others may find it useful.

If you are not able to observe the same issue with your own PC and monitor, I'm ready to supply you with the same monitor and test laptop we use for tests so that you can validate the issue with your own eyes and/or measuring equipment.

Chief Blur Buster wrote: ↑
24 Feb 2024, 20:38
Do you have variable refresh rate? This behavior is known as LFC (Low Frame Rate Compensation), which is like a DRAM refresh: the display repeat-refreshes when frametimes between refresh cycles get too long, i.e. when the frame rate falls below the minimum Hz rating. This prevents the image from decaying.
WORKAROUND: Turn off VRR, especially if you need precision framerate=Hz

So far we have worked in single monitor mode with VRR off when debugging; unfortunately, this does not solve the problem.
In DirectX we disabled VRR by:
1) Passing a sync interval of 1 and flags of 0 to the swapchain Present method (mSwapchain->Present(1,0))
2) Creating the swapchain with its flags set to 0

FreeSync has also been explicitly enabled and then disabled in the monitor settings, with no difference to the results.

However, if VRR is turned off, then this is odd, and needs a bit of troubleshooting. Windows DWM (e.g. Borderless Fullscreen) may have some repeat-refresh behaviors as it composites, and if you're using multimonitor (VR and main monitor) you will observe repeat-refresh behaviors because DWM.exe is a single-Hz compositor.
WORKAROUND: Use single monitor mode when debugging (Windows+Shift+P to turn on/off), and/or use Fullscreen exclusive mode

We have built a test setup running Windows 7, where DWM can be disabled via code, unlike in Windows 8 and later versions. We used the dwmapi.h function DwmEnableComposition(UINT uCompositionAction) with the value DWM_EC_DISABLECOMPOSITION.

However, disabling DWM did not eliminate the problem. I am still hesitant to eliminate DWM from the list, because DWM has one leg inside DXGI.

We have also tried a custom swapchain implementation via NVidia NvAPI, and again no luck there:
1) Used method
