Pre-rendered frames etc. (continued from G-Sync 101 article)

Talk about NVIDIA G-SYNC, a variable refresh rate (VRR) technology. G-SYNC eliminates stutters and tearing, and reduces input lag. List of G-SYNC Monitors.
pegnose
Posts: 17
Joined: 29 Dec 2018, 04:22

Re: Pre-rendered frames etc. (continued from G-Sync 101 article)

Post by pegnose » 06 Jan 2019, 16:17

@Chief Blur Buster:
This is seriously getting out of hand!! :D

I will read your post carefully and maybe come back with one or two questions, if I may.

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada
Contact:

Re: Pre-rendered frames etc. (continued from G-Sync 101 article)

Post by Chief Blur Buster » 06 Jan 2019, 21:38

pegnose wrote:So RTSS replaces the present() function with one that is able to wait until the given frame time requirement is fulfilled. Effectively the system is CPU limited (hence 0 pre-rendered frames), even if actually not much work is done.
There are many tactics of waiting. Let's also note the GeDoSaTo technique of delaying inputread+render+flip too, which is not typically done by RTSS. It's possible for a capper to present quickly, followed immediately by a predictive blocking-wait (based on historic frametimes) before returning to the caller's Present(), to put the next inputread-render-flip closer to the VSYNC intervals, like this:

Image
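The capper behaviour described above can be sketched in a few lines. This is a minimal Python toy model, not RTSS's actual code; the class and method names here are made up for illustration:

```python
class PredictiveCapper:
    """Toy model of a latency-saving capper: present immediately, then
    block inside the Present() hook based on historic frametimes, so the
    next inputread+render+flip starts as late as possible."""

    def __init__(self, target_fps, history=8):
        self.frame_budget = 1.0 / target_fps   # seconds per frame at the cap
        self.frametimes = []                   # recent render durations (s)
        self.history = history

    def record_frametime(self, seconds):
        """Remember how long the last frame took to render."""
        self.frametimes.append(seconds)
        self.frametimes = self.frametimes[-self.history:]

    def wait_budget(self):
        """Seconds to block after presenting: the frame budget minus a
        pessimistic prediction of the next frame's render time."""
        if not self.frametimes:
            return 0.0                         # no history yet: don't wait
        predicted = max(self.frametimes)       # worst recent frametime
        return max(0.0, self.frame_budget - predicted)
```

Blocking for `wait_budget()` before returning from the Present() hook pushes the next input read closer to the deadline, trading a little safety margin for lower input lag.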

There is a new framerate capper here too:
viewtopic.php?f=10&t=4724
pegnose wrote:
RealNC wrote:The highest value you can use in NVidia Profile Inspector is 8. Maybe the driver allows even higher values, but there's no point in having an option for that. Even 8 is pretty much useless. Games become unplayable.

Not sure what the default ("app controlled") is when a game doesn't specify a pre-render queue size. I've heard it's 3, and that's decided by DirectX. But not sure. It could be 2. Or it might depend on the number of cores in the CPU. The whole pre-render buffer mechanism and the switch to asynchronous frame buffer flipping came about because mainstream CPUs started to have more than one core in the 2000s.
Makes sense. Wow, 8!
It can be useful in Quad-SLI situations; then it's only 2 per card.

Look at the Quad Titan setup I wrote about five years ago for an eye-opener. Whoo!

Correct me if I am wrong, but it is my understanding that pre-rendered frames can be useful for assisting SLI framepacing to fix microstutters. (Hopefully it's not multiplied by the number of cards, like 8 times 4 equals 32 -- that would be insanely ridiculous.)
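As a rough sanity check on why a queue of 8 is "pretty much useless": in a GPU-bound situation, each queued pre-rendered frame adds roughly one frametime of input lag. A back-of-the-envelope Python helper (the function name is made up here):

```python
def queue_latency_ms(queue_depth, frametime_ms):
    """Rough worst-case added input lag from a pre-render queue: when the
    GPU is the bottleneck, each queued frame waits about one full
    frametime before it reaches the display."""
    return queue_depth * frametime_ms
```

At 60fps (~16.7ms per frame), a queue depth of 8 adds on the order of 133ms of lag, which lines up with "games become unplayable".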
pegnose wrote:@Chief Blur Buster:
This is seriously getting out of hand!! :D
I will read your post carefully and maybe come back with one or two questions, if I may.
No worries, but since people were bringing up the "scanout latencies" discussion, I had to make sure that was covered. Most of this is far beyond the average layperson's ability to conceptualize, but this topic thread has certainly attracted highly skilled discussion.
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter

Image
Forum Rules wrote:  1. Rule #1: Be Nice. This is published forum rule #1. Even To Newbies & People You Disagree With!
  2. Please report rule violations If you see a post that violates forum rules, then report the post.
  3. ALWAYS respect indie testers here. See how indies are bootstrapping Blur Busters research!

jorimt
Posts: 2481
Joined: 04 Nov 2016, 10:44
Location: USA

Re: Pre-rendered frames etc. (continued from G-Sync 101 article)

Post by jorimt » 06 Jan 2019, 22:20

pegnose wrote:From the previous discussion I understood that VRR with V-Sync enabled in NVCP has the same issues as mere V-Sync, correct? Only if you disable V-Sync with VRR, and accept some minor tearing here and there, can you get rid of all the downsides?
I *think* you're talking about the issue (correct me if I assumed wrong here) where G-SYNC + V-SYNC, when within the refresh rate, can exhibit more hitching during frametime spikes or transitions from 0 to x frames (and vice versa) when compared to standalone V-SYNC, no sync, or G-SYNC + V-SYNC "Off"?

Because G-SYNC (even paired with the V-SYNC option) has none of the issues that standalone V-SYNC has within the refresh rate; it has a different issue (and cause) entirely.

If that's what you're referring to, it's because: 1) G-SYNC + V-SYNC appears to have what could be called a brief "re-initialization" or "re-connection" period (between the module and the GPU) when there is an extremely abrupt transition in the framerate (FYI, there were official reports of a 1ms polling rate on the original modules, and it was never confirmed whether that polling rate diminished or was entirely eliminated in later module iterations); and 2) since the scanout speed of the display is static and unchanging, unavoidable timing misalignments between framerate and scanout progress can occur during these abrupt transitions as the module attempts to adhere to the scanout (to completely avoid tearing). This creates frame delivery delay where the other methods would not, since in extreme instances the module may have to hold the next frame and skip part of a scanout cycle (or sometimes even all of one) before delivery, and thus repeat display of the previous frame once or more in the meantime...

At very high frametimes (or when instantly going from a very high frametime to a very low frametime, or vice versa) that can obviously become problematic, causing a visible hitch (or hitches), separate from (but ultimately directly triggered by) the system that causes the frametime spikes (which are still visible with or without any form of syncing) in the first place.

Point "1" has never been fully confirmed (and isn't easy to test for), and as for point "2," even if there isn't a "polling rate" between the monitor module and the GPU, the issue can only really be diminished/avoided by nearly constant, perfect frametimes (which is virtually unachievable the vast majority of the time on all but older/least demanding games paired with the most powerful systems) or by (the more achievable) continued increase of the static scanout speed (240Hz, 480Hz, etc.).
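Point "2" can be illustrated with a deliberately simplified toy model (hypothetical function name; real module behaviour is more complex and partly unconfirmed): a frame that arrives while the panel is still mid-scanout must wait for that scanout to finish, because the module refuses to tear.

```python
def delivery_delay_ms(arrival_ms, scanout_ms):
    """Toy model: arrival_ms is when the new frame is Present()'d,
    measured from the start of the in-progress refresh cycle. A
    tearing-free module can't show it until the current scanout
    completes, so the previous frame stays on screen in the meantime."""
    if arrival_ms >= scanout_ms:
        return 0.0                      # scanout already done: no wait
    return scanout_ms - arrival_ms      # must wait out the rest of it
```

In this model, a faster static scanout (e.g. ~4.2ms at 240Hz versus ~6.9ms at 144Hz) shrinks the worst-case hold, which is exactly the "continued increase of the static scanout speed" mitigation mentioned above.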
(jorimt: /jor-uhm-tee/)
Author: Blur Busters "G-SYNC 101" Series

Displays: ASUS PG27AQN, LG 48CX VR: Beyond, Quest 3, Reverb G2, Index OS: Windows 11 Pro Case: Fractal Design Torrent PSU: Seasonic PRIME TX-1000 MB: ASUS Z790 Hero CPU: Intel i9-13900k w/Noctua NH-U12A GPU: GIGABYTE RTX 4090 GAMING OC RAM: 32GB G.SKILL Trident Z5 DDR5 6400MHz CL32 SSDs: 2TB WD_BLACK SN850 (OS), 4TB WD_BLACK SN850X (Games) Keyboards: Wooting 60HE, Logitech G915 TKL Mice: Razer Viper Mini SE, Razer Viper 8kHz Sound: Creative Sound Blaster Katana V2 (speakers/amp/DAC), AFUL Performer 8 (IEMs)

pegnose
Posts: 17
Joined: 29 Dec 2018, 04:22

Re: Pre-rendered frames etc. (continued from G-Sync 101 article)

Post by pegnose » 13 Jan 2019, 17:03

Thank you very much, this post was extremely interesting and helped my understanding!
Chief Blur Buster wrote: Which means latency is more uniform for the whole screen plane during VSYNC OFF, unlike for VSYNC ON
At the time a particular line appears on the screen, this line is less old than if it had been created with VSYNC ON. So latency is more uniform with respect to when data appears on the screen. But by the time all lines have been drawn and the image is complete, the latency with VSYNC ON is more uniform again from a certain standpoint, because all pixels were created at the same time and have the same latency with respect to the time point of viewing (while with VSYNC OFF the image in fact consists of multiple images).
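This distinction can be put in numbers with a deliberately idealized model (hypothetical function, treating VSYNC OFF as continuous just-in-time slices):

```python
def scanline_age_ms(line, total_lines, scanout_ms, vsync_on):
    """Age of a scanline at the moment it hits the screen.
    VSYNC ON: the whole frame existed before scanout began, so age grows
    with scanout position (top of screen fresh, bottom old).
    VSYNC OFF (idealized): each tear slice comes from a frame rendered
    just before that slice scans out, so every line is near-fresh."""
    position_ms = (line / total_lines) * scanout_ms
    return position_ms if vsync_on else 0.0
```

With VSYNC ON, every pixel shares one render timestamp but ages differently during scanout; with VSYNC OFF, the ages at display time are uniform but the render timestamps differ per slice. Those are the two senses of "uniform latency" in the paragraph above.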
Chief Blur Buster wrote: For VRR, best stutter elimination occurs when refreshtime (the time the photons are hitting eyeballs) is exactly in sync with the time taken to render that particular frame. But VRR sets the current refreshtime equal to the timing of the end of the previous frametime (one frame off)...
Enjoy. :D
I am having trouble understanding this, particularly the last sentence. Do you happen to have a figure to illustrate this more clearly? Which time depends on which time from one frame earlier?

pegnose
Posts: 17
Joined: 29 Dec 2018, 04:22

Re: Pre-rendered frames etc. (continued from G-Sync 101 article)

Post by pegnose » 13 Jan 2019, 17:34

Chief Blur Buster wrote: There are many tactics of waiting. Let's also note the GeDoSaTo technique of delaying inputread+render+flip too, which is not typically done by RTSS. It's possible for a capper to present quickly, followed immediately by a predictive blocking-wait (based on historic frametimes) before returning to the caller's Present(), to put the next inputread-render-flip closer to the VSYNC intervals, like this:

Image
Ah, jorimt and RealNC were talking about that, too. It seems like the optimal solution. So, do some games implement this?
Chief Blur Buster wrote: There is a new framerate capper here too:
viewtopic.php?f=10&t=4724
Nice.
Chief Blur Buster wrote:
pegnose wrote:
RealNC wrote:The highest value you can use in NVidia Profile Inspector is 8. Maybe the driver allows even higher values, but there's no point in having an option for that. Even 8 is pretty much useless. Games become unplayable.

Not sure what the default ("app controlled") is when a game doesn't specify a pre-render queue size. I've heard it's 3, and that's decided by DirectX. But not sure. It could be 2. Or it might depend on the number of cores in the CPU. The whole pre-render buffer mechanism and the switch to asynchronous frame buffer flipping came about because mainstream CPUs started to have more than one core in the 2000s.
Makes sense. Wow, 8!
It can be useful in Quad-SLI situations; then it's only 2 per card.

Look at the Quad Titan setup I wrote about five years ago for an eye-opener. Whoo!
Holy shit! :D
Chief Blur Buster wrote: Correct me if I am wrong, but it is my understanding that pre-rendered frames can be useful for assisting SLI framepacing to fix microstutters. (Hopefully it's not multiplied by the number of cards, like 8 times 4 equals 32 -- that would be insanely ridiculous.)
Interesting, I never tried that. Unfortunately, I sold my 1080 SLI setup in favor of a 2080 Ti. Basically no VR game supported VR SLI.

pegnose
Posts: 17
Joined: 29 Dec 2018, 04:22

Re: Pre-rendered frames etc. (continued from G-Sync 101 article)

Post by pegnose » 13 Jan 2019, 17:42

jorimt wrote:
pegnose wrote:From the previous discussion I understood that VRR with V-Sync enabled in NVCP has the same issues as mere V-Sync, correct? Only if you disable V-Sync with VRR, and accept some minor tearing here and there, can you get rid of all the downsides?
I *think* you're talking about the issue (correct me if I assumed wrong here) where G-SYNC + V-SYNC, when within the refresh rate, can exhibit more hitching during frametime spikes or transitions from 0 to x frames (and vice versa) when compared to standalone V-SYNC, no sync, or G-SYNC + V-SYNC "Off"?

Because G-SYNC (even paired with the V-SYNC option) has none of the issues that standalone V-SYNC has within the refresh rate; it has a different issue (and cause) entirely.

If that's what you're referring to...
Honestly, I can't say anymore without re-reading many of my past comments. I might have only meant the frame stacking issue. But this was much more interesting! Even when you think you know most of the intricate details of a technique by now, there is still so much in between. :)

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada
Contact:

Re: Pre-rendered frames etc. (continued from G-Sync 101 article)

Post by Chief Blur Buster » 14 Jan 2019, 01:58

pegnose wrote:Ah, jorimt and RealNC were talking about that, too. Seems like the optimal solution. So, some games implement this?
Very few. It takes a lot of intentional programmer work to do that. I've seen some emulators with a configurable inputread-delay to reduce input lag.

However, nowadays it's much easier to simply use VRR for that, because an accurate software timer can tighten the inputread+render+display cycle: a VRR display refreshes immediately when a frame is Present()'d to the display output (there's no such thing as a scheduled refresh interval for VRR...)

However, that does piddly squat for fixed-Hz use cases, as well as for those of us who want perfect framepacing at low latencies for ULMB/motion blur reduction. (That's a big raison d'être of the new RTSS scanline sync feature -- it makes great low-lag stutterless ULMB possible in some low-GPU-overhead games.)
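The VRR approach described above -- wait out the slack first, then read input and render as late as possible -- can be sketched like this (all helper names hypothetical; a real implementation would use a higher-precision wait than time.sleep):

```python
import time

def vrr_paced_frame(render, read_input, frame_budget_s, predicted_render_s):
    """One iteration of a low-lag VRR frame loop: sleep out the slack
    first, then do the inputread+render as late as possible. On a VRR
    display the subsequent Present() triggers an immediate refresh, so
    the photons follow the freshest possible input read."""
    slack = max(0.0, frame_budget_s - predicted_render_s)
    time.sleep(slack)                 # stand-in for an accurate timer
    return render(read_input())       # late input read, then render
```

Compare this with the fixed-Hz capper sketch earlier in the thread: on VRR there is no VSYNC deadline to predict around, only the frame budget of the cap itself, which is why this is the easier path.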
pegnose wrote:Interesting, never tried that. Unfortunately, I have sold my 1080 SLI setup in favor of a 2080 Ti. Basically no VR game supported VR SLI.
I've got an Oculus Rift, so I guess I should be replacing my GTX 1080 Ti Xtreme with an RTX 2080 Ti eventually.

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada
Contact:

Re: Pre-rendered frames etc. (continued from G-Sync 101 article)

Post by Chief Blur Buster » 14 Jan 2019, 02:13

jorimt wrote:If that's what you're referring to, it's because: 1) G-SYNC + V-SYNC appears to have what could be called a brief "re-initialization" or "re-connection" period (between the module and the GPU) when there is an extremely abrupt transition in the framerate (FYI, there were official reports of a 1ms polling rate on the original modules, and it was never confirmed whether that polling rate diminished or was entirely eliminated in later module iterations); and 2) since the scanout speed of the display is static and unchanging, unavoidable timing misalignments between framerate and scanout progress can occur during these abrupt transitions as the module attempts to adhere to the scanout (to completely avoid tearing). This creates frame delivery delay where the other methods would not, since in extreme instances the module may have to hold the next frame and skip part of a scanout cycle (or sometimes even all of one) before delivery, and thus repeat display of the previous frame once or more in the meantime...
From what I understand of newer developments:

FreeSync is fundamentally unidirectional, and VRR can be operated unidirectionally. I think some newer G-SYNC implementations are now unidirectional from what I'm seeing, e.g. closer to simplified NVIDIA-specific piggyback enhancements on VESA Adaptive-Sync panels. Further engineering appears to be simplifying VRR implementation, which may mean the difference between G-SYNC and FreeSync is gradually diminishing at the low end and mid-range. High-end G-SYNC probably still requires some two-way behaviours for best operation, but the low end doesn't need it anymore; in fact, NVIDIA already piggybacks on VESA Adaptive-Sync with probable simplifications to VRR-based overdrive algorithms that avoid the need for FPGAs and such. The best FreeSync improved to the point where the Venn diagram of quality begins to overlap the cheaper/basic G-SYNC monitors, and NVIDIA decided it was time to certify certain FreeSync monitors as meeting G-SYNC quality requirements.

Now....
Theoretically (if the driver lets it be precise), the granularity of FreeSync is simply 1 unit of horizontal scanrate, so a 160KHz horizontal scan rate (seen in custom resolution utilities) means 1/160000sec granularity in timing your refresh cycle delivery.

Picture this: the GPU is simply in an "infinite loop" of outputting Vertical Back Porch scanlines until the game Present()s the new refresh cycle (that variable-length blanking interval of VRR). Thereupon the GPU output immediately begins scanning out the new refresh cycle at the very next scanline, right after the current Back Porch pixel row (offscreen) finishes scanning out of the GPU output.

So, with proper driver software design, the 1ms granularity can be reduced to just a few microseconds (the granularity of the horizontal scanrate) -- all but eliminating the granularity, and making it safer to framerate-cap much closer to VRR max-Hz (e.g. 143.9fps instead of 141fps for 144Hz). Newer VRR displays will have to be lag-tested to see if tighter caps are possible without adding input lag.

In practice, there are possibly software precision limitations preventing this, but theoretically you can have few-microsecond precision transitioning from VRR to VSYNC ON and vice versa, and design it so there's never any framebuffer backpressure or monitor refresh-blocking during ultra-tight framerate caps from a microsecond-accurate framerate capper such as RTSS (e.g. a 143.9fps cap during 144Hz VRR operation). That "3fps below" capping would seem unnecessary, but appeared necessary due to the original 1ms polling granularity. Newer VRR tests will have to determine whether tighter caps are lag-free on newer VRR displays.
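The arithmetic behind the "few microseconds" claim (hypothetical helper name):

```python
def vrr_granularity_us(horizontal_scanrate_hz):
    """If the driver can align refresh-cycle delivery to individual
    scanlines, the timing granularity is one horizontal scan period,
    returned here in microseconds."""
    return 1_000_000 / horizontal_scanrate_hz
```

A 160KHz horizontal scan rate gives 6.25µs granularity, versus the ~1000µs (1ms) polling granularity reported for the original module -- roughly a 160x improvement, which is why the traditional margin below max-Hz could in theory shrink so much.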

Basically, graphics drivers and the GPU chip would control repeat-refresh behaviours (LFC algorithms) instead of the monitor, and high framerates would simply throttle at the driver level instead of the display. So, fundamentally, VRR is optimizable into a unidirectional protocol, and analog FreeSync even works on certain CRTs.

pwn
Posts: 60
Joined: 15 Jan 2019, 18:22

Re: Pre-rendered frames etc. (continued from G-Sync 101 article)

Post by pwn » 15 Jan 2019, 18:28

Greetings.
I have one question that has been bothering me a lot; please answer it if you can.
I'll say in advance that my English is not very good (which I regret), and I don't really understand much of your specific technical conversation.
But based on the conversation, I understand that the best configuration for playing CS:GO is a locked 237 FPS (I have a 240Hz monitor supporting Gsink) and not uncapped 300+ FPS?

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada
Contact:

Re: Pre-rendered frames etc. (continued from G-Sync 101 article)

Post by Chief Blur Buster » 15 Jan 2019, 19:35

pwn wrote:But based on the conversation, I understand that the best configuration for playing CS:GO is a locked 237 FPS (I have a 240Hz monitor supporting Gsink) and not uncapped 300+ FPS?
Yes, but only if using G-SYNC (what you called Gsink).

If you use VSYNC OFF instead, use a higher cap (e.g. 300fps or 500fps) to gain the advantages of framerates above refresh rates.

You do have a choice of:
1. G-SYNC + 237fps cap
2. VSYNC OFF + higher cap (300fps or 500fps)
