Frequent frame repeating on modern GPUs
Posted: 10 Feb 2024, 19:36
Hello everyone,
This discussion was originally on MonitorTests and is posted here at the suggestion of the admin.
I hope you will find the topic and issue interesting and will have some ideas to share.
THE ISSUE:
My company works on AR/VR optics technologies and for several months we've been trying to solve an issue plaguing our device and all off-the-shelf tested monitors. The issue involves frame repeats at the GPU/GPU runtime at >=90Hz.
Right now we are able to compensate for frame repeats in the code in the future frames, but it does not fix the visual artifacts that happen during the frame repeats.
By "frame repeat" I do not mean a frame skip: I mean our simple DirectX/Vulkan test programs present a frame to the GPU, and then the GPU doesn't display new presented frames on time and instead sends the earlier presented frame twice or more to the monitor. The issue is not that the program / game engine is not able to provide frames on time: the frame repeat seems to be happening at the DirectX/Vulkan, GPU runtime or GPU stage which we don’t have access or control over. The APIs are telling our minimal test program to wait while our program itself is not busy and isn’t doing anything.
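To make the definition above concrete, here is a minimal sketch of how a repeat could be flagged from two running counters. The `FrameSample` struct and `countRepeats` function are hypothetical names made up for this illustration; they only resemble the kind of present/refresh counters that an API such as DXGI's frame statistics exposes, and the sketch assumes such counters are available and trustworthy on the setup in question. It is not the code from our test program.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sample: two running counters captured once per rendered frame.
struct FrameSample {
    std::uint32_t presentCount;  // new frames actually shown so far
    std::uint32_t refreshCount;  // monitor refreshes elapsed so far
};

// For each consecutive pair of samples, returns how many refreshes elapsed
// beyond the number of new frames shown. 0 means no repeat in that interval;
// 1 means one frame was scanned out twice, and so on.
std::vector<std::uint32_t> countRepeats(const std::vector<FrameSample>& samples) {
    std::vector<std::uint32_t> repeats;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        // Unsigned subtraction stays correct across counter wraparound.
        const std::uint32_t shown = samples[i].presentCount - samples[i - 1].presentCount;
        const std::uint32_t refreshed = samples[i].refreshCount - samples[i - 1].refreshCount;
        repeats.push_back(refreshed > shown ? refreshed - shown : 0u);
    }
    return repeats;
}
```

With samples {10,100}, {11,101}, {12,103}, {13,104}, the second interval has two refreshes for one new frame, so it is reported as one repeat. Note that this only detects the repeat after the fact, which is exactly the limitation described above.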
The artifacts are especially noticeable in our case, since we do optical pixel shifting/wobulation to increase resolution. We are not the only company working on such AR/VR tech (see “Digilens T-Rex”).
We understand that the frame repeat artifact may not be completely preventable for 100% of the time, but we still hope it can be reduced greatly so that it doesn’t happen every 40 seconds or so on modern laptop hardware with no user background processes happening. As-is, we can’t really develop our pixel shifting prototypes into viable products if the user is going to see a flicker/shift effect so frequently.
Being a tiny startup, we haven’t been able to discuss this issue with GPU suppliers directly over email. We’ve posted the issue in the Nvidia developer forum, but are not sure they’ll find it worth their time unless we can narrow down the issue or find the actual root cause.
TEST HARDWARE/SOFTWARE:
Test software:
These are the test programs we've created for detecting and analyzing the issue:
1) Pure DirectX, Vulkan, Unity (DX11) and Panda3D (OpenGL) programs that display red and blue frames in sequence on regular monitors.
2) A SteamVR headset runtime utilizing VRWorks API doing the same and tested with regular 180-240Hz monitors. VRWorks API requires an NDA so cannot be shared here.
3) A Unity SteamVR program displaying red and blue frames in sequence, run on an existing SteamVR headset (HP Reverb G2) with its own proprietary VR runtime.
All Windows Power Settings and GPU settings have been checked.
The source code and binary for program (1) are provided at the link below.
Even though we have spent many months on tests regarding this, it is very much possible that there may be some other way to code this to reduce (even if not eliminate) the frame repeat issue.
This is what the test program provided below does:
1) Two white cards are moved one cell each frame in two (top and bottom) 2d grids. Mouse cursor is moved each frame as well.
2) When a frame repeat happens, you notice both the white cards and mouse cursors freezing in place, then:
3) The mouse cursor jumps a position to compensate for the repeat.
4) The top white card moves a cell, only later jumps a cell to compensate. This is because the next frame was already presented and the 3d program couldn’t recall the next presented frame from the GPU when it learned that the current presented frame had been repeated.
5) The bottom white card just resumes as usual after the frame repeat, as it has not been programmed to compensate its position due to a frame repeat.
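The behaviour in steps 2 to 5 can be modelled with a small simulation. This is only a sketch of the described behaviour, not the actual test program source: the names `Shown` and `simulate` are hypothetical, positions are counted in grid cells, and it assumes a single repeat of one refresh with exactly one frame already presented ahead when the repeat is detected.

```cpp
#include <vector>

// What is visible on screen at one monitor refresh (positions in grid cells).
struct Shown {
    int cursor;  // mouse cursor, compensated immediately
    int top;     // top card, compensated one frame late
    int bottom;  // bottom card, never compensated
};

// Simulates `refreshes` monitor refreshes with one frame repeat at refresh
// `repeatAt` (assumed >= 1). The ideal position at refresh t is cell t.
std::vector<Shown> simulate(int refreshes, int repeatAt) {
    std::vector<Shown> out;
    for (int t = 0; t < refreshes; ++t) {
        // From the repeat onward, the display runs one frame behind.
        const int frameShown = (t < repeatAt) ? t : t - 1;
        // Frames up to index repeatAt were already presented before the
        // repeat became known, so only later frames carry the +1 correction.
        const int topFix = (frameShown > repeatAt) ? 1 : 0;
        const int cursor = (t == repeatAt) ? repeatAt - 1 : t;
        out.push_back({cursor, frameShown + topFix, frameShown});
    }
    return out;
}
```

For `simulate(6, 2)`, everything freezes at cell 1 on refresh 2. On refresh 3 the cursor is back at cell 3 while the top card shows cell 2. On refresh 4 the top card jumps to cell 4, and the bottom card stays one cell behind from then on.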
The program code and executable, 480fps camera recordings, program logs and a summary spreadsheet of the logs can be found here: https://e.pcloud.link/publink/show?code ... aQc4SgdJY7
Test hardware:
We've tried with 5 PCs and 5 monitors, and the issue exists on all 5 monitors with 3 out of 5 PCs.
Laptops tested on which have this issue:
1) Aorus 15G XC-8US2430SH (2021) (RTX3070)
2) HP Victus 15 (2023) (RTX2050)
3) Asus Rog G752VS (GTX1070)
There's a custom built PC and one national brand laptop we’ve tested which don't have the issue, both using RTX4090. Right now we’re hesitant to limit use of our hardware and software to RTX40xx series users, even if it was determined that these newer GPUs solve/greatly reduce the issue in general and not just the specific models we’ve tested on so far.
Monitors tested on:
1) AOC C27G2Z 27" - 240Hz - FreeSync Premium
2) SAMSUNG 25" Odyssey G4 LS25BG402ENXGO - 240Hz - FreeSync Premium
3) MSI G27C4X 27 - 240Hz - FreeSync Premium
4) AOC 24G15N 24" - 180Hz - Freesync
5) ASUS TUF Gaming 24” VG249Q1A - 165Hz - FreeSync Premium
HDMI vs DisplayPort, and cheap vs expensive video cables do not seem to make a difference.
Tested both on integrated AMD and Intel, as well as Nvidia GPUs.
(links to these monitors and laptops are available in the cloud folder file “Test hardware info.txt”)
VARIABLE REFRESH RATE: NO IMPACT ON THE ISSUE:
Enabling Variable Refresh Rate (VRR) does not seem to solve the issue.
Our guess is it is due to one of two reasons (or both):
1) The delay induced by the GPU runtime or firmware is much longer than what variable refresh rate monitors can support.
2) There is a bug or limitation in the GPU/GPU runtime that preserves the issue even when VRR is enabled and supported.
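Guess (1) can be put into numbers with a bit of arithmetic. The sketch below assumes a VRR window with a 48Hz lower bound, which is a made-up example value and not a measured property of the monitors listed above. The helper names `periodMs` and `coveredByVrr` are hypothetical as well.

```cpp
// Duration of one refresh at the given rate, in milliseconds.
double periodMs(double hz) {
    return 1000.0 / hz;
}

// A VRR display can hold a frame at most one period of its minimum refresh
// rate. If the next frame arrives later than that, the previous frame has to
// be scanned out again, so a repeat is visible despite VRR.
bool coveredByVrr(double frameDelayMs, double vrrMinHz) {
    return frameDelayMs <= periodMs(vrrMinHz);
}
```

Under these assumptions, one refresh at 240Hz lasts about 4.17 ms and the display can wait at most about 20.8 ms (one period at 48Hz). A stall of 10 ms would be absorbed by VRR, while a stall of 25 ms would not.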
This is how we enable VRR in code:
1) Set the swap effect to DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL.
2) Set the application to borderless window mode.
3) Create and resize the swapchain with the DXGI_SWAP_CHAIN_FLAG_ALLOW_TEARING flag.
4) Pass 0 as the sync interval parameter of the Present method.
5) Pass the DXGI_PRESENT_ALLOW_TEARING flag as the flags parameter of the Present method.
WHY THE ISSUE MATTERS:
The issue does not seem to only affect our optical hardware, but also general VR use and also general 90Hz+ gaming, regardless of monitor used and whether they support Freesync/Freesync Premium or not. Of course for VR it’s much more important due to repeated frames causing mismatch between the current shown VR view and the user’s real head position/rotation.
The issue seems to mainly come up at 120Hz and much more frequently at 240Hz so I’m not surprised it’s not reported or discussed often.
In case you are wondering why our VR device PCB is not synced with the PC some other way: there is no reliable way to keep the frame index in sync between the device PCB/optical component and the PC GPU, because (A) latencies over USB and DisplayPort-AUX vary, and (B) the GPU does not tell us that it has sent the previously presented frame to the display twice and held back the next presented frame until after this has already happened.
In theory, each frame could carry its index embedded in the pixel data, and the device could read that index and display black when the issue happens. But this would (A) not solve the artifact, only replace the repeating/shifting artifact with a blanking artifact, and (B) require an expensive FPGA able to handle high fps video, since no existing video chip can analyze pixel data this way, which would make the product prohibitively expensive.
If you’re wondering why DLP wobulation/pixel shifting does not have this issue:
DLP projectors receive a 4K 60Hz signal, store it in SRAM on the projector PCB, split it into 4x 1080p frames at the PCB and display them in sequence, wobulated. So the PCB does not have to deal with syncing with the GPU or with a frame repeat at the GPU, since it produces the sub-frames itself and can easily sync itself with the optical component. But this kind of architecture introduces a 3-frame long latency, which is not practical for high fps gaming and AR/VR.
I hope this discussion will help all of us learn why such frame repeats happen on modern GPUs so often, if they can be prevented or reduced with better/different code and if this is a GPU driver/hardware issue, that maybe we can pinpoint the exact cause and report it to Nvidia, AMD and Intel. If it's a code issue, then we can provide the solution to the Unity and Panda3D engine maintainers.
Thanks