Frequent frame repeating on modern GPUs

Post by SubstantialCt8690 » 10 Feb 2024, 19:36

Hello everyone,
This discussion was originally on MonitorTests and is posted here at the suggestion of the admin.
I hope you will find the topic and issue interesting and will have some ideas to share.


THE ISSUE:

My company works on AR/VR optics technologies, and for several months we've been trying to solve an issue plaguing our device and every off-the-shelf monitor we have tested. The issue involves frame repeats at the GPU/GPU runtime level at refresh rates >=90Hz.
Right now we are able to compensate for frame repeats in code in subsequent frames, but this does not fix the visual artifacts that happen during the frame repeats themselves.

By "frame repeat" I do not mean a frame skip: I mean that our simple DirectX/Vulkan test programs present a frame to the GPU, and the GPU then doesn't display newly presented frames on time and instead sends the earlier presented frame to the monitor twice or more. The issue is not that the program / game engine is unable to provide frames on time: the frame repeat seems to happen at the DirectX/Vulkan, GPU runtime or GPU stage, which we have no access to or control over. The APIs tell our minimal test program to wait even though the program itself is not busy and isn’t doing anything.
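
A minimal sketch of how a repeat, as defined above, could be flagged after the fact. It assumes the program can obtain, for each presented frame, the time at which that frame first reached the display; the function name countRepeats and the input format are hypothetical and are not taken from the shared test programs.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: displayTimes holds the time (in seconds) at which each
// consecutive presented frame first reached the display. The result has one
// entry per frame except the last: the number of EXTRA refresh cycles that
// frame stayed on screen (0 means it was shown exactly once).
std::vector<int> countRepeats(const std::vector<double>& displayTimes,
                              double refreshHz) {
    std::vector<int> repeats;
    const double period = 1.0 / refreshHz;
    for (std::size_t i = 1; i < displayTimes.size(); ++i) {
        const double delta = displayTimes[i] - displayTimes[i - 1];
        // Round to the nearest whole number of refresh cycles.
        const int cycles = static_cast<int>(std::lround(delta / period));
        repeats.push_back(cycles > 1 ? cycles - 1 : 0);
    }
    return repeats;
}
```

With timestamps spaced one refresh apart every entry is 0; a gap of two refresh periods between two frames yields a 1 for the earlier frame, which is the "sent twice" case described above.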

The artifacts are especially noticeable in our case, since we do optical pixel shifting/wobulation to increase resolution. We are not the only company working on such AR/VR tech (see “Digilens T-Rex”).
We understand that the frame repeat artifact may not be completely preventable 100% of the time, but we still hope it can be reduced greatly, so that it doesn’t happen every 40 seconds or so on modern laptop hardware with no user background processes running. As-is, we can’t really develop our pixel shifting prototypes into viable products if the user is going to see a flicker/shift effect so frequently.
Being a tiny startup, we haven’t been able to discuss this issue with GPU suppliers directly over email. We’ve posted the issue in the Nvidia developer forum, but we are not sure they’ll find it worth their time unless we can narrow down the issue or find the actual root cause.


TEST HARDWARE/SOFTWARE:

Test software:

These are the test programs we've created for detecting and analyzing the issue:

1) Pure DirectX, Vulkan, Unity (DX11) and Panda3D (OpenGL) programs that display red and blue frames in sequence on regular monitors.

2) A SteamVR headset runtime utilizing the VRWorks API that does the same, tested with regular 180-240Hz monitors. The VRWorks API requires an NDA, so this cannot be shared here.

3) A Unity SteamVR program that displays red and blue frames in sequence, run on an existing SteamVR headset (HP Reverb G2) with its own proprietary VR runtime.

All Windows Power Settings and GPU settings have been checked.

The source code and binary of program (1) are provided at the link below.

Even though we have spent many months on tests regarding this, it is very much possible that there may be some other way to code this to reduce (even if not eliminate) the frame repeat issue.

This is what the test program provided below does:
1) Two white cards are moved one cell each frame in two (top and bottom) 2D grids. The mouse cursor is moved each frame as well.
2) When a frame repeat happens, you notice both the white cards and the mouse cursor freezing in place, then:
3) The mouse cursor jumps a position to compensate for the repeat.
4) The top white card first moves one cell and only later jumps a cell to compensate. This is because the next frame had already been presented, and the 3D program couldn’t recall that frame from the GPU when it learned that the current frame had been repeated.
5) The bottom white card just resumes as usual after the frame repeat, as it has not been programmed to compensate its position for a frame repeat.
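
The three behaviours in the list above can be modelled in a few lines. This is a simplified model under stated assumptions, not the code of the shared test program: the names Cells and simulateCards are hypothetical, the input is a per-refresh flag saying whether that refresh re-sent the previous frame, and exactly one frame is assumed to be queued ahead when a repeat is discovered.

```cpp
#include <cstddef>
#include <vector>

// Cell index of each on-screen item for one refresh cycle.
struct Cells {
    int cursor;  // compensates immediately after a repeat
    int top;     // compensates one frame late
    int bottom;  // never compensates
};

// Simplified model (an assumption, not the test program's code).
// repeated[r] is true when refresh r re-sent the previous frame.
// The first refresh is assumed not to be a repeat.
std::vector<Cells> simulateCards(const std::vector<bool>& repeated) {
    std::vector<Cells> out;
    Cells prev{-1, -1, -1};
    for (std::size_t r = 0; r < repeated.size(); ++r) {
        Cells cur = prev;  // on a repeat everything freezes in place
        if (!repeated[r]) {
            const int ideal = static_cast<int>(r);  // one cell per refresh
            const bool queuedBeforeRepeat = r > 0 && repeated[r - 1];
            cur.cursor = ideal;
            // A frame queued before the repeat was known cannot be recalled,
            // so the top card only advances by one cell on that frame.
            cur.top = queuedBeforeRepeat ? prev.top + 1 : ideal;
            cur.bottom = prev.bottom + 1;
        }
        out.push_back(cur);
        prev = cur;
    }
    return out;
}
```

For a single repeat at refresh 3, the model freezes all three items at cell 2, then shows the cursor at cell 4 and both cards at cell 3 on refresh 4, and on refresh 5 the top card catches up to cell 5 while the bottom card stays one cell behind.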

The program code and executable, 480fps camera recordings, program logs and a summary spreadsheet of the logs can be found here: https://e.pcloud.link/publink/show?code ... aQc4SgdJY7


Test hardware:

We've tried with 5 PCs and 5 monitors, and the issue exists on all 5 monitors with 3 out of 5 PCs.

Laptops tested on which have this issue:
1) Aorus 15G XC-8US2430SH (2021) (RTX3070)
2) HP Victus 15 (2023) (RTX2050)
3) Asus Rog G752VS (GTX1070)

There's a custom built PC and one national brand laptop we’ve tested which don't have the issue, both using RTX4090. Right now we’re hesitant to limit use of our hardware and software to RTX40xx series users, even if it was determined that these newer GPUs solve/greatly reduce the issue in general and not just the specific models we’ve tested on so far.

Monitors tested on:
1) AOC C27G2Z 27" - 240Hz - FreeSync Premium
2) SAMSUNG 25" Odyssey G4 LS25BG402ENXGO - 240Hz - FreeSync Premium
3) MSI G27C4X 27" - 240Hz - FreeSync Premium
4) AOC 24G15N 24" - 180Hz - FreeSync
5) ASUS TUF Gaming 24" VG249Q1A - 165Hz - FreeSync Premium

HDMI vs DisplayPort, and cheap vs expensive video cables do not seem to make a difference.
Tested both on integrated AMD and Intel, as well as Nvidia GPUs.

(links to these monitors and laptops are available in the cloud folder file “Test hardware info.txt”)


VARIABLE REFRESH RATE: NO IMPACT ON THE ISSUE:

Enabling Variable Refresh Rate (VRR) does not seem to solve the issue.
Our guess is it is due to one of two reasons (or both):
1) The delay induced by the GPU runtime or firmware is much longer than what variable refresh rate monitors can support.
2) There is a bug or limitation in the GPU/GPU runtime that preserves the issue even when VRR is enabled and supported.

This is how we enable VRR in code:
1) Set the swap effect to DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL.
2) Set the application to borderless window mode.
3) Create and resize the swapchain with the DXGI_SWAP_CHAIN_FLAG_ALLOW_TEARING flag.
4) Pass 0 as the sync interval parameter of the Present method.
5) Pass the DXGI_PRESENT_ALLOW_TEARING flag in the flags parameter of the Present method.


WHY THE ISSUE MATTERS:

The issue does not seem to affect only our optical hardware, but also general VR use and general 90Hz+ gaming, regardless of the monitor used and whether it supports FreeSync/FreeSync Premium or not. Of course for VR it’s much more important, since repeated frames cause a mismatch between the currently shown VR view and the user’s real head position/rotation.
The issue seems to come up mainly at 120Hz and much more frequently at 240Hz, so I’m not surprised it’s not reported or discussed often.

In case you are wondering why our VR device PCB is not synced with the PC some other way: there’s no reliable way to keep the frame index data in sync between the device PCB/optical component and the PC GPU, because (A) the latencies you get with USB and DisplayPort-AUX vary and (B) the GPU simply does not let us know that the issue has occurred, i.e. that it has sent the previously presented frame to the display twice and held the next presented frame for later, until after the issue has actually happened.
In theory the frames could have their index embedded in the pixel data itself, which we could read instead and display black when the issue happens, but (A) this would not solve the artifact, only replace the repeating/shifting artifact with a blanking artifact, and (B) it would require an expensive FPGA able to handle high fps video, since no existing video chip can analyze pixel data this way, which would make the product prohibitively expensive.
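
A minimal sketch of the "index embedded in the pixel data" idea, for illustration only. It assumes a tightly packed R8G8B8A8 buffer of at least one pixel and assumes the pixel values arrive at the device bit-exact (no dithering, color conversion or lossy link compression), which a real video link may not guarantee. All function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Write the low 24 bits of frameIndex into the R, G and B bytes of the first
// pixel of a tightly packed R8G8B8A8 buffer. The alpha byte is left untouched.
void embedFrameIndex(std::vector<std::uint8_t>& rgba, std::uint32_t frameIndex) {
    rgba[0] = static_cast<std::uint8_t>(frameIndex & 0xFF);
    rgba[1] = static_cast<std::uint8_t>((frameIndex >> 8) & 0xFF);
    rgba[2] = static_cast<std::uint8_t>((frameIndex >> 16) & 0xFF);
}

// Read the 24-bit index back from the first pixel.
std::uint32_t extractFrameIndex(const std::vector<std::uint8_t>& rgba) {
    return static_cast<std::uint32_t>(rgba[0]) |
           (static_cast<std::uint32_t>(rgba[1]) << 8) |
           (static_cast<std::uint32_t>(rgba[2]) << 16);
}

// A received frame is a repeat when it carries the same index as the
// previously received frame.
bool isRepeatedFrame(std::uint32_t previousIndex, std::uint32_t currentIndex) {
    return previousIndex == currentIndex;
}
```

The receiving side would compare the index of each incoming frame with the previous one. As the post notes, detecting the repeat this way only allows blanking, it does not remove the artifact.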

If you’re wondering why DLP wobulation/pixel shifting does not have this issue:
DLP projectors receive a 4K 60Hz signal, store it in SRAM on the projector PCB, split it into 4x 1080p frames at the PCB and display them in sequence, wobulated. So the PCB does not have to deal with syncing with the GPU or with a frame repeat at the GPU, since it produces the sub-frames itself and can easily sync itself with the optical component. But this kind of architecture introduces a 3-frame latency, which is not practical for high fps gaming and AR/VR.
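
To illustrate the splitting step, here is a sketch that divides one full-resolution image into four half-resolution sub-frames. The sampling pattern (sub-frame k takes every second pixel with a one-pixel offset in x and/or y) is an assumption chosen for illustration; the actual pattern used by a given projector is not stated in this thread. The function name is hypothetical and a single-channel image is used for brevity.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Split a width x height single-channel image into four half-resolution
// sub-frames. Sub-frame k = 2*dy + dx receives the source pixels at
// (2*x + dx, 2*y + dy). Width and height are assumed to be even.
std::array<std::vector<std::uint8_t>, 4> splitIntoSubframes(
    const std::vector<std::uint8_t>& src, int width, int height) {
    const int subW = width / 2;
    const int subH = height / 2;
    std::array<std::vector<std::uint8_t>, 4> subs;
    for (auto& s : subs) {
        s.resize(static_cast<std::size_t>(subW) * subH);
    }
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int k = 2 * (y % 2) + (x % 2);
            const std::size_t dst =
                static_cast<std::size_t>(y / 2) * subW + (x / 2);
            subs[k][dst] = src[static_cast<std::size_t>(y) * width + x];
        }
    }
    return subs;
}
```

Because all four sub-frames are derived from one stored frame on the device, their order relative to the optical shifter is known locally, which is the property the paragraph above relies on.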


I hope this discussion will help all of us learn why such frame repeats happen so often on modern GPUs and whether they can be prevented or reduced with better/different code. If this is a GPU driver/hardware issue, maybe we can pinpoint the exact cause and report it to Nvidia, AMD and Intel. If it's a code issue, then we can provide the solution to the Unity and Panda3D engine maintainers.

Thanks
Last edited by SubstantialCt8690 on 28 Mar 2024, 20:44, edited 1 time in total.

Post by Chief Blur Buster » 24 Feb 2024, 20:38

SubstantialCt8690 wrote:
10 Feb 2024, 19:36
Hello everyone,
This discussion was originally on MonitorTests and is posted here at the suggestion of the admin.
I hope you will find the topic and issue interesting and will have some ideas to share.
Do you have variable refresh rate? This behavior is known as LFC (Low Frame Rate Compensation), which is like a DRAM refresh: the display is repeat-refreshed when frametimes between refresh cycles get too long, i.e. longer than the refresh time of the minimum Hz rating. This prevents the image from decaying.
WORKAROUND: Turn off VRR, especially if you need precision framerate=Hz
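
As a rough illustration of the LFC idea described above: when the frame rate drops below the display's minimum VRR rate, each frame is scanned out more than once so that the effective refresh rate stays inside the supported range. The sketch below uses the simplest possible rule (the smallest whole multiple that reaches the minimum rate). That rule, the function name and the parameters are assumptions; real driver and display implementations may choose differently.

```cpp
#include <cmath>

// Number of times each frame is scanned out under a simple LFC model.
// Returns 1 when the frame rate is already within the VRR range.
int lfcRefreshesPerFrame(double frameRateHz, double vrrMinHz) {
    if (frameRateHz >= vrrMinHz) {
        return 1;
    }
    // Smallest integer n such that n * frameRateHz >= vrrMinHz.
    return static_cast<int>(std::ceil(vrrMinHz / frameRateHz));
}
```

For example, with a 48Hz minimum a 30fps stream would be shown twice per frame (60Hz effective), while a 100fps stream would not be repeated at all under this model.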

However, if VRR is turned off, then this is odd and needs a bit of troubleshooting. Windows DWM (e.g. Borderless Fullscreen) may have some repeat-refresh behaviors as it composites, and if you're using multimonitor (VR and main monitor) you will observe repeat-refresh behaviors because DWM.exe is a single-Hz compositor.
WORKAROUND: Use single monitor mode when debugging (Windows+Shift+P to turn on/off), and/or use Fullscreen exclusive mode

Inaccurate refresh cycle counting can make it hard to get back in sync quickly, if you're trying to generate frames that correspond to a special refresh cycle (e.g. interlacing pattern, wobulation pattern, or a shutter-glasses sequence).
WORKAROUND: Try my open source refresh cycle estimating/counter module. It's also used by TestUFO.

OPEN SOURCE HELPER MODULE:
Keeping track of which frames align to which refresh cycles can also help you get back in sync with a cycling pattern (interlacing, wobulation). I have open-sourced (Apache 2.0) a refresh cycle counter algorithm: https://github.com/blurbusters/RefreshRateCalculator and you can apply a modulus to its software-based, best-effort refresh cycle counter to more quickly "get back in correct sync". If you use this, please credit us (and Duckware), as per the Apache 2.0 open source license. If you port RefreshRateCalculator.js to a new language, I would request a humble reimbursement in providing the port of the module (even if not your code). The refresh cycle counter is at RefreshRateCalculator.getCount(), which is a monotonically increasing refresh cycle counter since the initialization of the module. So you can quickly get back in sync (within 1-2 frames of a stutter) with whatever shutter/interlace/wobulation/cyclic pattern via a simple MODULUS (%), because it does not count frames but microsecond refresh cycle timestamps.
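
The modulus step can be sketched as follows. Here refreshCount stands in for the monotonically increasing value that RefreshRateCalculator.getCount() is described as returning; the C++ function name patternPhase is hypothetical and is not part of that module.

```cpp
#include <cstdint>

// Phase of a cyclic pattern (e.g. a 2- or 4-position wobulation sequence)
// for a given refresh-cycle count. patternLength must be greater than 0.
int patternPhase(std::uint64_t refreshCount, int patternLength) {
    return static_cast<int>(refreshCount %
                            static_cast<std::uint64_t>(patternLength));
}
```

The point of deriving the phase from a refresh-cycle count rather than a frame counter: if 10 frames have been presented but one of them was shown twice, the display is on refresh 11, so a 4-step pattern should be at phase 11 % 4 = 3, whereas a frame counter would give 10 % 4 = 2 and stay out of step.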

If this does not tick your problem-solve boxes, let me know.

This is a big rabbit hole of abstractions: the Windows compositor, the 3D API, the driver/GPU and so on. All these layers can muck about with the frame presentation workflow on its way to a refresh cycle. Drivers and swapchains may also have a habit of repeat-presenting frames to solve various other problems, which creates new problems for people like you. In some cases this can be solved by inventing your own custom swapchain, one that piggybacks off fullscreen exclusive mode plus a waitable swapchain (microsoft.com), which is also good for reducing VR latency. Having your own custom swapchain in fullscreen exclusive mode generally gives you more control over whether frames are presented or not. However, this might be a wild goose chase without understanding the underlying cause.

There are reasons for the various kinds of swapchains: a deeper swapchain can reduce stutter, while a shallower swapchain can reduce latency. So a lot of work is needed to get the best of both worlds, sometimes involving workarounds such as giving the swapchain thread a higher priority than the rendering thread, to make sure there is no blocking behaviour and that the page flip occurs on time, at the lowest latency and without stutter (missed vsync). What's happening may be more complex than double buffering or triple buffering, but a very smart implementation can get the best of both worlds (low latency and the perfect framerate=Hz required for VR use cases).

BTW, as part of Blur Busters I also provide consulting services on contract, see https://services.blurbusters.com. I've helped a few AR/VR vendors too, both onsite and offsite. Although I do not directly provide code, I have various algorithms for making 3D shutter glasses generically more reliable too.
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter

Forum Rules wrote:  1. Rule #1: Be Nice. This is published forum rule #1. Even To Newbies & People You Disagree With!
  2. Please report rule violations If you see a post that violates forum rules, then report the post.
  3. ALWAYS respect indie testers here. See how indies are bootstrapping Blur Busters research!

Post by SubstantialCt8690 » 13 Mar 2024, 20:18

Hello,
First of all, thank you so much for getting back to me on this. Like I mentioned, we are at a loss and unable to get feedback directly from NVidia or anywhere else.
Also, apologies for the delay: we checked your open source code first and then prepared a response.

I hadn't noticed that you provide professional consulting on the website, sorry, the info seems a little buried. I'll email you about payment/contract, but I really don't mind having the technical discussion here since, like I explained, the issue doesn't seem to be specific to Nvidia GPUs, a particular CPU or a particular 3D API. Others may find it useful.

If you are not able to observe the same issue with your own PC and monitor, I'm ready to supply you with the same monitor and test laptop we use for our tests so that you can validate the issue with your own eyes and/or measuring equipment.
Chief Blur Buster wrote:
24 Feb 2024, 20:38
Do you have variable refresh rate? This behavior is known as LFC (Low Frame Rate Compensation), which is like a DRAM refresh: the display is repeat-refreshed when frametimes between refresh cycles get too long, i.e. longer than the refresh time of the minimum Hz rating. This prevents the image from decaying.
WORKAROUND: Turn off VRR, especially if you need precision framerate=Hz
So far we have worked in single monitor mode with VRR off when debugging; unfortunately it does not solve the problem.
In DirectX we disabled VRR by:
1) Passing sync interval 1 and flags 0 to the swapchain presentation method (mSwapchain->Present(1,0))
2) Creating the swapchain with flags set to 0

FreeSync has also been explicitly enabled and then disabled in the monitor settings, with no difference to the results.

Chief Blur Buster wrote:
However, if VRR is turned off, then this is odd and needs a bit of troubleshooting. Windows DWM (e.g. Borderless Fullscreen) may have some repeat-refresh behaviors as it composites, and if you're using multimonitor (VR and main monitor) you will observe repeat-refresh behaviors because DWM.exe is a single-Hz compositor.
WORKAROUND: Use single monitor mode when debugging (Windows+Shift+P to turn on/off), and/or use Fullscreen exclusive mode
We have built a test setup running Windows 7, where DWM can be disabled via code, unlike Windows 8 and later versions. We used the dwmapi.h method DwmEnableComposition(UINT uCompositionAction) with the value DWM_EC_DISABLECOMPOSITION.

However, disabling DWM did not eliminate the problem. I am still hesitant to cross DWM off the list, since DWM has one leg inside DXGI.


We have also tried a custom swapchain implementation via NVidia NvAPI, again with no luck:
1) Used the method NvAPI_D3D_DirectModeCreateSurface to create each surface image.
2) Used OpenSharedResource on the created surface images to get their handles.
3) Created an ID3D11RenderTargetView object with the parameters "Buffer={}", "MipSlice=0", "ViewDimension=D3D11_RTV_DIMENSION_TEXTURE2D", "Format=DXGI_FORMAT_R8G8B8A8_UNORM".
4) Upon presenting the image, called the NvAPI_D3D_DirectModePresent method with the NV_DIRECTMODE_PRESENT_FLAG_VSYNC flag.

Chief Blur Buster wrote:
Inaccurate refresh cycle counting can make it hard to get back in sync quickly, if you're trying to generate frames that correspond to a special refresh cycle (e.g. interlacing pattern, wobulation pattern, or a shutter-glasses sequence).
WORKAROUND: Try my open source refresh cycle estimating/counter module. It's also used by TestUFO.
While your code provides more precision for calculating the frame time than our method, it is not related to the frame repeating issue we see here. Still, we're going ahead and implementing support for your FOSS library to be used by our code.

Chief Blur Buster wrote:
This is a big rabbit hole, of all the abstractions that the Windows compositor does, the 3D API does, the driver/GPU does, etc. So all these different layers can muck about with the frame presentation workflow as a refresh cycle. Drivers and swapchains may also have a habit of repeat-presenting frames, to try to solve various other problems that occur, creating new problems for some people like you. In some ways, this can sometimes be solved by inventing your own custom swapchain. One that piggybacks off fullscreen exclusive + using waitable swapchain (microsoft.com), which is also good to reduce VR latency. Having your own custom swapchain, on a fullscreen-exclusive mode, generally gives you more control over whether frames are presented or not.
We tried a custom swapchain in DirectX as well. The results were similar to those of our current best settings in the shared application/code.
1) We used the DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT flag when creating the swapchain; the same flag is also used in the ResizeBuffers method calls.
2) We set the maximum frame latency to 1 via a call to IDXGISwapChain2::SetMaximumFrameLatency.
3) We get the waitable object via IDXGISwapChain2::GetFrameLatencyWaitableObject.
4) Then we simply wait on the waitable object at the start of every frame.
5) We used DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL for the swap effect.


We need a way of validating which of the items below actually affect the artifact.

1) OS scheduling
2) GPU internal presentation logic
3) 3D API such as Directx11 or Vulkan
4) Drivers
5) Desktop Window Manager (DWM)

Some have been addressed above; for the rest I will list and detail our experience with each item. Please let me know if we have missed something obvious or if you have ideas to test/check.

1) OS scheduling:

We have used the Win32 API to assign priorities, via SetPriorityClass for processes and SetThreadPriority for threads.

For process priority we have used NORMAL_PRIORITY_CLASS, HIGH_PRIORITY_CLASS and REALTIME_PRIORITY_CLASS, in combination with the thread priorities THREAD_PRIORITY_ABOVE_NORMAL, THREAD_PRIORITY_HIGHEST and THREAD_PRIORITY_TIME_CRITICAL.

Setting REALTIME for the process and TIME_CRITICAL for the thread priority helped to some extent but failed to remove the problem completely. We have researched whether there are other methods that can help with or piggyback on the Win32 scheduling system, but we did not find any. We still need to check whether the artifact is caused by limits in OS scheduling precision.
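
One portable way to get a first impression of OS scheduling precision, independent of any graphics API, is to sleep until a series of fixed deadlines and record how late each wake-up is. This is a sketch under assumptions: the function name is hypothetical, it uses only the C++ standard library, and it measures thread wake-up lateness only, not presentation timing, so it can hint at scheduler-induced delays but cannot prove or rule out that they cause the repeats.

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Sleep until successive deadlines spaced periodMs apart and return, for each
// deadline, how many milliseconds after it the thread actually woke up.
std::vector<double> measureWakeLatenessMs(int iterations, double periodMs) {
    using clock = std::chrono::steady_clock;
    const auto period = std::chrono::duration_cast<clock::duration>(
        std::chrono::duration<double, std::milli>(periodMs));
    std::vector<double> lateness;
    auto deadline = clock::now() + period;
    for (int i = 0; i < iterations; ++i) {
        std::this_thread::sleep_until(deadline);
        const auto now = clock::now();
        lateness.push_back(
            std::chrono::duration<double, std::milli>(now - deadline).count());
        deadline += period;  // fixed schedule, so lateness does not accumulate
    }
    return lateness;
}
```

Running it with a period close to the refresh interval (about 4.17 ms at 240Hz) under the different process and thread priorities listed above, and comparing the largest recorded lateness with the refresh interval, would show whether wake-ups alone can overshoot a whole refresh cycle.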

3) 3D API such as Directx11 or Vulkan:

For Directx11:

We have used both the legacy and flip presentation models; neither of them helped.
We have tried 1, 2 and 3 swapchain backbuffers; no difference was observed.
We have used the present swap effects DXGI_SWAP_EFFECT_DISCARD, DXGI_SWAP_EFFECT_SEQUENTIAL, DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL and DXGI_SWAP_EFFECT_FLIP_DISCARD. DISCARD gave us the worst results whereas FLIP_SEQUENTIAL was the most stable one.
We have used the scaling modes DXGI_MODE_SCALING_UNSPECIFIED, DXGI_MODE_SCALING_CENTERED and DXGI_MODE_SCALING_STRETCHED, with no difference.
We have only used the DXGI_MODE_SCANLINE_ORDER_PROGRESSIVE mode for all our tests.
We have only used the DXGI_FORMAT_R8G8B8A8_UNORM texture format for our swapchain buffers.
We have tried taking ownership of the output via IDXGIOutput::TakeOwnership. This did not affect the problem.
Setting IDXGIDevice1::SetMaximumFrameLatency and IDXGISwapChain2::SetMaximumFrameLatency to 1 seems to give more stable results.
We have tried some D3D11_CREATE_DEVICE_FLAG configurations: D3D11_CREATE_DEVICE_SINGLETHREADED and D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS.
The flag D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS actually made the presentation more stable. So it feels like D3D11 internal threads get in the way of precise presentation timing, which also raises the question of exactly how much of a role OS scheduling plays here.

For Vulkan:

Used the presentation modes VK_PRESENT_MODE_IMMEDIATE_KHR, VK_PRESENT_MODE_MAILBOX_KHR, VK_PRESENT_MODE_FIFO_KHR and VK_PRESENT_MODE_FIFO_RELAXED_KHR. None of them fixed the problem.
Swapchain configurations during debugging are listed below.
Image usage: VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | VK_IMAGE_USAGE_STORAGE_BIT
Image color space: VK_COLOR_SPACE_SRGB_NONLINEAR_KHR
Image sharing mode: VK_SHARING_MODE_EXCLUSIVE

DirectX11 vs Vulkan:

In summary, both APIs give the same results. This likely indicates that the artifact does not originate from the graphics API itself.


Please let me know if you have some directions for us.
As a reminder, the issue seems to be present on laptops equipped with the GTX1070, RTX2050 and RTX3070, on both their integrated Intel and AMD GPUs and their NVidia GPUs, but absent with the RTX4090. You should probably start by checking whether you can reproduce the artifact on your own setup.

Post by Chief Blur Buster » 15 Mar 2024, 21:05

Thanks!

This extends into the Blur Busters Consulting Services territory. Continue the conversation with me by email instead, mark [at] blurbusters.com or via www.blurbusters.com/about/contact

Post by SubstantialCt8690 » 25 Apr 2024, 08:34

Chief Blur Buster wrote:
15 Mar 2024, 21:05
Hello,

We are in a contract now and haven't heard back from you for 21 days and you're not responding to emails. What are we expected to assume?
