How does frame pregeneration work, exactly?
Posted: 16 Sep 2024, 06:22
I do not understand why SSYNC is lower latency than vsync with NULL, or why vsync with NULL+Boost isn't superior. When I can't understand something, I flesh it out in writing to help myself understand, so that's what I'm doing here along with asking the question.
From my understanding there are 2 buffers involved: the video buffer and the CPU buffer.
------------------------------------
The video card has 3 buffers. I wouldn't really call it 3 buffers so much as one read buffer & 2 back buffers, but since copying data from a back buffer to the read buffer would cost performance when the display can simply scan out from any of the "buffers" directly, I see why they call it 3 buffers instead of 2.
From what I remember (probably learned it from here lol), when they were running the first video displays there were major issues with reading from a buffer that is simultaneously being written to, so we've basically always had 2 buffers. When vsync came along there was an annoying issue: if your frame rate dropped below your refresh rate, it would read from the same buffer again, i.e. display the same frame twice in a row, which we call stutter. To help alleviate this issue they added another buffer to video cards and introduced triple buffered vsync.

With double buffered vsync your computer has 1 buffer being read from and the second being drawn into, but if it draws fast enough it then has to idle, waiting for the monitor, as both buffers are full. Adding a third buffer lets it keep drawing, meaning if you got 90fps the previous frame but 50 the next, you would not stutter. If both back buffers are full (3), it chooses to display the oldest image; this can GREATLY increase input lag, but IT IS the best way to prevent stuttering.

Then came along Enhanced Sync (AMD) & Fast Sync (NVIDIA). Instead of reading the oldest image of the 2 back buffers it reads the newest, and the computer never idles, as if running an uncapped framerate, because it alternates drawing frames between the 2 back buffers. When the monitor is ready, the newest image becomes the read buffer, the old read buffer switches to a back buffer, and the back buffer that was drawing continues drawing. This results in the lowest input lag of any vsync, but unevenly paced frames, which we call microstutter, something double buffered vsync did not suffer from. Triple buffered vsync did, but if you can perceive microstuttering you can more easily perceive the input lag of triple buffered vsync, and would never have used it lololol.
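To check that I have the two flavors straight, here's a tiny sketch of the buffer-picking logic as I understand it (Python, all names mine, purely illustrative, not any real driver API):

from collections import deque

class TripleBuffer:
    def __init__(self, mode):
        self.mode = mode              # "vsync_queue" (classic triple buffering) or "fast_sync"
        self.completed = deque()      # finished back buffers, oldest at the left
        self.front = None             # buffer the monitor is currently scanning out

    def gpu_finished(self, frame):
        # Classic triple buffering would make the GPU wait here if both back
        # buffers already hold finished frames; Fast Sync never waits and
        # simply throws away the stale frame instead.
        self.completed.append(frame)
        if self.mode == "fast_sync" and len(self.completed) > 2:
            self.completed.popleft()

    def on_vblank(self):
        if not self.completed:
            return self.front                         # nothing new -> repeat the old frame (stutter)
        if self.mode == "vsync_queue":
            self.front = self.completed.popleft()     # oldest finished frame: smooth, but laggy
        else:
            self.front = self.completed.pop()         # newest finished frame: lowest lag, uneven pacing
            self.completed.clear()
        return self.front

The only difference is which finished back buffer gets promoted at vblank, which is exactly the lag vs. pacing trade-off described above.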
------------------------------------
The CPU buffer, or pre-rendered frames, can have I suppose an unlimited number of buffers. It collects input data and calculates everything the CPU+GPU need to generate a frame, like where you are facing and where everything is. Everything needed to draw the picture.
From what I've learned, "1 pre-rendered frame" means one back buffer. So unlike triple buffering (this is referring to the video buffer, a COMPLETELY different buffer), which has a read buffer and 2 back buffers but is called "three", the pre-rendered frames count is called "one" even though it has a read buffer and 1 back buffer, which we'd call double buffered when referring to the video buffer.
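So the way I picture it, the "pre-rendered frames" number is just the depth of a little queue sitting between the game's CPU work and the GPU. A minimal sketch of that mental model (again all names mine, not a real driver interface):

import queue

MAX_PRERENDERED_FRAMES = 1              # the value NULL / "max pre-rendered frames = 1" enforces

cpu_work = queue.Queue(maxsize=MAX_PRERENDERED_FRAMES)

def cpu_loop(sample_input, simulate, build_draw_calls):
    while True:
        frame = build_draw_calls(simulate(sample_input()))
        cpu_work.put(frame)             # blocks once the queue is full, so input can't be
                                        # sampled more than MAX_PRERENDERED_FRAMES ahead

def gpu_loop(render):
    while True:
        render(cpu_work.get())          # GPU always consumes the oldest prepared frame

A deeper queue means the input inside the oldest entry was sampled further in the past by the time it reaches the screen, which is where the extra lag comes from.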
When drawing with vsync off, no frame cap, and 1 pre-rendered frame, the order goes like this: step 1, generate CPU data; step 2, generate CPU data for the next frame while drawing the image using the CPU+GPU; step 2 repeats. If you're getting 300fps on a 60Hz display, i.e. drawing 5 complete images to display 5 one-fifth images (4 tear lines)... I'm going to skip this actually because it's very technical and Chief has a research article on it already, but basically at the top of the screen your input lag will be like a 60Hz monitor and at the bottom it will be closer to 300. I'm also skipping how vblank in the scanout affects this and using 300 to keep it simple.
When drawing with vsync on and 1 pre-rendered frame, the order is the same. As soon as the frame changes, the data is grabbed from the CPU back buffer, and the CPU data for the next frame is generated while the current one draws using the CPU+GPU, resulting in an additional frame of input lag.
Aaaaand I may have just solved my own question. When drawing with SSYNC: step 1, generate CPU data; step 2, generate the image; repeat steps 1 and 2. You get lower fps but lower input lag. Here it is drawn out:
VSYNC
(frame displays when gpu completes)
(there would be a gap between frames if your fps could be much higher than vsync limit)
[CPU1][CPU2] gap[CPU3]
[--------GPU1--------]gap[--------GPU2--------]
(If I had to guess, the CPU starts filling the buffer as soon as the frame ends, at the beginning of the gap I didn't draw, rather than when the GPU kicks in for the next frame, as the former can result in better fps but worse input lag. It's the same issue as UE4 and Unity fighting, but with NVIDIA and AMD fighting over whose cards get more fps, when in reality they're giving us a worse experience (input lag).)
SSYNC
[CPU1][GPU1][CPU2][GPU2][CPU3][GPU3]
I didn't even need 2 lines!
As you can see, fps will be worse, because to draw a frame the CPU data MUST be generated at the start of that frame rather than pulled from a buffer, so each frame takes longer to complete, but you get the lowest possible input lag without resorting to fps prediction.
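Put as back-of-the-envelope numbers (my simplification, with made-up frame times, ignoring vblank, driver overhead, and any overlap inside a frame):

# Rough frame-time/lag math for the two pipelines above. Pure illustration.
cpu_ms, gpu_ms = 4.0, 10.0

pipelined_frame_ms = max(cpu_ms, gpu_ms)    # vsync-style: CPU of frame N+1 overlaps GPU of frame N
ssync_frame_ms = cpu_ms + gpu_ms            # SSYNC-style: CPU then GPU, back to back

pipelined_lag_ms = cpu_ms + gpu_ms + gpu_ms # input sits in the buffer roughly one extra GPU frame
ssync_lag_ms = cpu_ms + gpu_ms              # input sampled right before its own frame is drawn

print(pipelined_frame_ms, ssync_frame_ms)   # 10.0 14.0 -> SSYNC costs fps
print(pipelined_lag_ms, ssync_lag_ms)       # 24.0 14.0 -> SSYNC wins on input lag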
NULL??
I assume NULL works identically to VSYNC; it's just making sure the CPU back buffer count is 1.
NULL+BOOST??
Clearly it does NOT have input lag as low as SSYNC, but it also doesn't hurt fps like SSYNC does.
This means it's filling the back buffer later than immediately after the GPU finishes, like it does with plain vsync, but still sometime before the end of the frame, like SSYNC does.
If I were the engineer, I would make it so that when the GPU kicks in for the next frame, after the gap, the CPU generates simultaneously, so that if the CPU has any downtime while the GPU is working (and I assume there has to be some, even in CPU-bottlenecked games, since the GPU has to do some work between its back-and-forth with the CPU), it could use that off time to generate the next frame. Maybe this is not true and the amount of downtime the CPU has in a CPU-bottlenecked situation is not enough to fill the back buffer, resulting in lower fps. In that case I might do something like take the standard deviation of the gap time over the past 50 frames and predict when to pre-generate the frame. Of course you could probably spend an eternity finding the best formula, or even use a decent one and specialize it per game (who knows, maybe NULL does this too).
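Sketched out, the "standard deviation over the last 50 frames" idea would look something like this (completely hypothetical, just my own idea from the paragraph above, not anything I know NULL or any driver to actually do):

# Hypothetical predictive pre-generation scheduler. Track recent CPU frame
# times, then start the CPU as late as possible while still (probably)
# finishing before the GPU needs the data.
from statistics import mean, stdev

HISTORY = 50            # how many past frames to look at
SAFETY_SIGMAS = 2.0     # padding so one slow frame doesn't blow the deadline and stutter

cpu_times = []          # measured CPU frame times, in seconds

def record_cpu_time(seconds):
    cpu_times.append(seconds)
    if len(cpu_times) > HISTORY:
        cpu_times.pop(0)

def predicted_cpu_time():
    if len(cpu_times) < 2:
        return cpu_times[-1] if cpu_times else 0.0
    return mean(cpu_times) + SAFETY_SIGMAS * stdev(cpu_times)

def cpu_start_time(next_gpu_deadline):
    # Start generating CPU data at: deadline minus the padded estimate.
    return next_gpu_deadline - predicted_cpu_time()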
NOTE: Almost NO games are actually affected by the driver setting for pre-rendered frames or NULL, as it's chosen at the engine level, not the driver level. Nvidia works with devs to set the engine's pre-rendered frames to 1 and calls the in-game setting NULL, but really they're just helping game devs undo what the engine devs set. For example, Unreal Engine 4/5 sets the buffer to 2 by default. This is because you can sometimes achieve higher fps, but it costs you input lag. When you want to show off that your Unreal Engine looks so good and gets better fps than your competitor Unity, you set it to 2. When you want an enjoyable playing experience, you set it to 1. Devs don't know that, but you can edit config files to fix it. The devs missed this in every UE game I've played: ARK, Deep Rock Galactic, the Borderlands series, Sea of Thieves. I've talked to super duper garret cooper, a dev who made Black Ice on the Unity engine, and tried to get him to change the setting, since unlike Unreal Engine, in Unity only the dev can change the variable, I think maybe only before compiling the game. I think it's safe to assume 99% of Unity engine games also have it set to 2. So as I said, Nvidia is just helping game devs change a setting to make their game better, a setting engine devs set to make their engines look better on paper.
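For reference, the config tweak I mean for UE games is (going from memory here, so treat the exact cvar name and whether it maps 1:1 onto the pre-rendered frame count as my assumption, not gospel) adding something like this to the game's Engine.ini:

[SystemSettings]
r.OneFrameThreadLag=0

In Unity I believe the equivalent knob is QualitySettings.maxQueuedFrames, which only the dev can set from script, which is why I had to bug the Black Ice dev instead of just editing a config file.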
------------------------------------
So I guess my questions come down to a few things...
Am I even right about where the frames are pre-generated?
Do you know who ate all the donuts?
Why is NULL+Boost's location of frame pre-generation soooooo much darn worse than SSYNC's? (where is it?)
Is predictive FPS limiting here yet for even lower input lag than SSYNC? i.e.
idle[CPU2][GPU2]
instead of SSYNC
[CPU2][GPU2]idle
------------------------------------
p.s. I usually go back, fix my grammar, and make things easier to understand, but I'm too tired or lazy or something. Sorry I puked my autism all over your forum. lol