GeDoSaTo Dynamic FPS Capping

Sparky · Post by **Sparky** » 13 May 2015, 02:04

Did some more testing. PLR doesn't seem to make much difference to latency between 0.3 and 0.9, but what difference there is seems to be the opposite of what's expected. A PLR of 0.1 causes loss of sync at an 85.2 fps cap (at least, I was unable to reach an apparently steady state within 5 minutes of testing).

The framerate cap value is extremely sensitive: there's a very small window where you can maintain sync with the refresh rate without ballooning latency to 50 ms+ (85.2 fps is inside this window; 85.19 and 85.3 were outside it). I'm not sure about the best way to test resilience to frametime variance.
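To put rough numbers on how narrow that window is: when a fixed cap and the refresh rate differ by some small amount in Hz, the two clocks slip by one whole frame every 1/difference seconds. A minimal Python sketch follows. The function name is made up for illustration, and the 85.2 Hz refresh rate is an assumption (the display's exact refresh rate is not stated in this thread); it is picked only because an 85.2 fps cap held sync.

```python
def seconds_per_frame_slip(cap_fps: float, refresh_hz: float) -> float:
    """Time for a fixed cap to drift one whole frame relative to the refresh."""
    diff = abs(cap_fps - refresh_hz)
    return float("inf") if diff == 0 else 1.0 / diff


ASSUMED_REFRESH_HZ = 85.2  # assumption for illustration, not a measured value

for cap in (85.19, 85.2, 85.3):
    slip = seconds_per_frame_slip(cap, ASSUMED_REFRESH_HZ)
    print(f"cap {cap} fps: one frame of drift every {slip:.0f} s")
```

Under that assumption, a cap that is only 0.01 fps off would slip a full frame in under two minutes, which is consistent with a fixed cap needing feedback from the actual refresh timing to stay locked.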

Basically, you need some stronger feedback from the refresh and frame completion timing.

If you have something else you want me to check, just ask.

Glide · Post by **Glide** » 13 May 2015, 03:48

flood wrote:
Glide wrote:Seems like an interesting approach to reducing input lag
It doesn't reduce input lag compared with the simple way of capping when v-sync is off (after each frame is drawn, sleep until the time is equal to integer/cap).

It does reduce input lag when you're using v-sync, or anything else where there is a time difference between when the frame finishes drawing and when it's actually displayed.

Well no, but surely it is always faster to run uncapped if you have V-Sync off?
I never really understood why people use frame caps with V-Sync off, or why they would intentionally cap below their refresh rate with V-Sync on (which causes the game to stutter, and avoiding stutter is one of the main reasons you would enable V-Sync in the first place!).
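For reference, the "simple way of capping" that flood describes (after each frame is drawn, sleep until the time is equal to integer/cap) could look roughly like the Python sketch below. This is only an illustration of that one sentence, not GeDoSaTo's limiter; `next_deadline`, `run_capped`, and the injectable `clock`/`sleep` parameters are hypothetical names.

```python
import math
import time


def next_deadline(now: float, cap_fps: float) -> float:
    """Return the next instant that is an integer multiple of 1/cap_fps."""
    period = 1.0 / cap_fps
    return (math.floor(now / period) + 1) * period


def run_capped(render_frame, cap_fps: float, frames: int,
               clock=time.perf_counter, sleep=time.sleep) -> None:
    """Render `frames` frames, sleeping after each one until the next deadline."""
    for _ in range(frames):
        render_frame()
        now = clock()
        wait = next_deadline(now, cap_fps) - now
        if wait > 0:
            sleep(wait)
```

Calling `run_capped(draw, 85.0, 1000)` with some `draw` function would pace frames on a fixed 1/85 s grid. Nothing in it knows when the display actually refreshes, which is the gap a refresh-aware limiter is meant to close.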

Sparky · Post by **Sparky** » 13 May 2015, 04:04

Glide wrote:
Well no, but surely you are always faster to be uncapped if you have V-Sync off?
Never really understood why people were trying to use frame caps with V-Sync off, or why they would intentionally cap below their refresh rate with V-Sync on. (which causes the game to stutter - one of the main reasons you would enable V-Sync in the first place!)

The reason you enable v-sync is to get rid of tearing; getting rid of stutter is a side effect of the huge latency jump of a display-bottlenecked render chain. The big reason to cap framerate is to reduce the latency of v-sync, but normally you have to accept some stutter when you do this. A second common reason for framerate caps is noise/heat/power usage, especially on a laptop.

In any case, if you can make this framerate cap work, you get v-sync without stutter, and with much less input lag than uncapped v-sync. I don't think it can be quite as good as g-sync or freesync though, because you have to add enough latency to cover your frametime variance.

Glide · Post by **Glide** » 13 May 2015, 07:11

Sparky wrote:
The reason you enable v-sync is to get rid of tearing, getting rid of stutter would be a side effect of the huge latency jump of a display bottlenecked render chain. The big reason to cap framerate is to reduce the latency of v-sync, but normally you have to take some stutter when you do this. A second common reason for framerate caps is noise/heat/power usage, especially on a laptop.

In any case, if you can make this framerate cap work, you get v-sync without stutter, and with much less input lag than uncapped v-sync. I don't think it can be quite as good as g-sync or freesync though, because you have to add enough latency to cover your frametime variance.

But that's what V-Sync is for. If your framerate is not synchronized to your refresh rate - both of which are prone to some degree of fluctuation - you will get tearing and stutter.

Or is there some trick to forcing the tear to remain fixed at the last line on the display which I am unaware of?
Most of the time the tear seems to end up somewhere in the central portion of the display for me, and gradually shifts position over time due to the frame cap not being in perfect sync with the refresh rate.

Sparky · Post by **Sparky** » 13 May 2015, 08:24

Glide wrote:
But that's what V-Sync is for. If your framerate is not synchronized to your refresh rate - both of which are prone to some degree of fluctuation - you will get tearing and stutter.

The primary function of v-sync is to eliminate tearing, but it often comes at the cost of 50+ ms more latency than a CPU-bottlenecked render pipeline. You don't need 50 ms worth of buffering, and that much latency is intolerable in many games, so you set a framerate cap to cut the latency down. The problem is that most framerate caps can't stay synchronized with the refresh rate, so you occasionally miss frame deadlines and get some stutter. Hence the need for a framerate limiter that can stay in sync with the refresh rate and control exactly how much buffer to keep. Or a variable refresh monitor, which lets you display frames whenever they happen to finish rendering.

Glide wrote:Or is there some trick to forcing the tear to remain fixed at the last line on the display which I am unaware of?
Most of the time the tear seems to end up somewhere in the central portion of the display for me, and gradually shifts position over time due to the frame cap not being in perfect sync with the refresh rate.

There are tricks to get tear free animation without v-sync or input lag, but they're incompatible with modern rendering techniques. If you play some arcade games from the 80s, you might see some of those techniques. It's all hardware specific coding with just in time rendering, but that's not what I was talking about.

Durante · Post by **Durante** » 13 May 2015, 12:31

Sparky wrote:Testing methodology: I'm using an Arduino Micro with a photoresistor to detect a dark-to-light transition that happens when I move the mouse. The Arduino emulates a mouse to the system and measures the timings involved. I modified the USB library to put a timestamp on the USB interrupt, in order to remove the variance of the 1 kHz USB polling. It's very similar to flood's test setup, and there's more detail in that thread.

That's pretty awesome, I need a setup like that for development.

I updated the FPS limiting logic and added a new GPU flushing option (flushGPUEveryFrame true). Can you check if that changes anything for you?

Glide · Post by **Glide** » 13 May 2015, 14:09

Sparky wrote:The primary function of v-sync is to eliminate tearing, but it often comes at the cost of 50+ms more latency than a CPU bottlenecked render pipeline. You don't need 50ms worth of buffering, and that much latency is intolerable in many games, so you set a framerate cap to cut the latency down. The problem is, most framerate caps can't stay synchronized with refresh rate, so you miss frame deadlines occasionally, and get some stutter. Hence the need for a framerate limiter that can stay in sync with refresh rate, and control exactly how much buffer to keep. Or a variable refresh monitor, which lets you display frames whenever they happen to finish rendering.

Is the latency still going to be that high even if you force the flip queue size to 1? (NVIDIA Control Panel: Maximum Pre-Rendered Frames)
I thought that V-Sync (double-buffering) should be 2 frames in an ideal situation, and triple-buffering would be something in the region of 1-2 frames depending on how quickly the system can render the game.

Sparky wrote:There are tricks to get tear free animation without v-sync or input lag, but they're incompatible with modern rendering techniques. If you play some arcade games from the 80s, you might see some of those techniques. It's all hardware specific coding with just in time rendering, but that's not what I was talking about.

Well yes, but that's because those are closed systems and I assume that they are genlocked, running the game and the display off the same clock reference so there is no need for V-Sync.
It's similar in principle to G-Sync and FreeSync/Adaptive-Sync, only used at a fixed rate rather than variable.

Sparky · Post by **Sparky** » 13 May 2015, 16:43

Glide wrote:Is the latency still going to be that high even if you force the flip queue size to 1? (NVIDIA Control Panel: Maximum Pre-Rendered Frames)
I thought that V-Sync (double-buffering) should be 2 frames in an ideal situation, and triple-buffering would be something in the region of 1-2 frames depending on how quickly the system can render the game.

Forcing the flip queue to 1 saves exactly 1 frame in CS:GO. Take a look at the second graph in this post: http://forums.blurbusters.com/viewtopic ... 270#p15668 Keep in mind that's at an 85 Hz refresh rate; there would be a bigger difference at 60 Hz. If you're bottlenecked at the start of the display pipeline, the frame only spends the actual calculation time in each stage of the pipeline. If you're bottlenecked at the end of the pipeline, each frame spends 1/framerate in each stage, regardless of how much computation happens in that stage, because you're always waiting on the next stage of the pipeline to be ready.
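The two cases above can be written down as a very small model. This is a deliberately simplified Python sketch of that description only; the stage timings are invented for illustration, and real pipelines have queues and overlap that this ignores.

```python
def pipeline_latency_ms(stage_work_ms, frame_rate_hz, bottleneck_at_end):
    """Time one frame spends in the pipeline under the simple model above.

    bottleneck_at_end=False: each stage costs only its own work time.
    bottleneck_at_end=True:  each stage costs a full frame period, because
                             the frame waits for the next stage to free up.
    """
    if bottleneck_at_end:
        return len(stage_work_ms) * 1000.0 / frame_rate_hz
    return float(sum(stage_work_ms))


stages = [2.0, 3.0, 4.0]  # hypothetical per-stage work in milliseconds
print(pipeline_latency_ms(stages, 85.0, bottleneck_at_end=False))           # 9.0
print(round(pipeline_latency_ms(stages, 85.0, bottleneck_at_end=True), 1))  # 35.3
```

With the same three stages, the display-bound case costs three full refresh periods no matter how little work each stage does, and the gap grows at 60 Hz because each period is longer.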

Durante wrote:
Sparky wrote:Testing methodology: I'm using an Arduino Micro with a photoresistor to detect a dark-to-light transition that happens when I move the mouse. The Arduino emulates a mouse to the system and measures the timings involved. I modified the USB library to put a timestamp on the USB interrupt, in order to remove the variance of the 1 kHz USB polling. It's very similar to flood's test setup, and there's more detail in that thread.
That's pretty awesome, I need a setup like that for development.

I updated the FPS limiting logic and added a new GPU flushing option (flushGPUEveryFrame true). Can you check if that changes anything for you?

Cool, I'll test that tonight.

Sparky · Post by **Sparky** » 14 May 2015, 00:36

Just doing synthetic tests to start:

With v-sync on:

Flushing the GPU after every frame works well when a framerate limiter is not active, with an average latency of 33.4 ms. That's a significant improvement on normal uncapped v-sync, and it looks very consistent. Definitely something to test more in actual games. It doesn't seem to do anything if the framerate limiter is on (and set low enough to be effective).

As for the updated FPS limiting logic, the threshold seems to have changed a bit: it's capping the framerate slightly lower now. Is it supposed to stay synchronized with the refresh rate, or did I misunderstand the blog post? In any case, the PLR seems to be working closer to what's expected now, with higher values giving lower latency but less consistency. I'll do some more testing on this. A PLR of 1.0 yields a framerate about 2 fps below the specified cap, and has somewhat lower average latency than a PLR of 0.5. Both are significantly less consistent than without a cap.

Glide · Post by **Glide** » 14 May 2015, 08:44

Sparky wrote:Forcing flip queue to 1 saves exactly 1 frame in CS:GO. Take a look at the second graph in this post: http://forums.blurbusters.com/viewtopic ... 270#p15668 Keep in mind that's at 85hz refresh rate, there would be a bigger difference for 60hz. If you're bottlenecked at the start of the display pipeline, the frame only spends actual calculation time in each stage of the pipeline. If you're bottlenecked at the end of the pipeline, each frame spends 1/framerate in each stage of the pipeline, regardless of how much computation happens in that stage, because you're always waiting on the next stage of the pipeline to be ready.

OK, but surely saving at least one frame by default without even having to introduce a cap is a good thing?
The default value if an application does not specify anything is 3, so if it's reducing latency by one frame, it would seem as though CS:GO is setting a value of 2 rather than leaving it undefined. (more on that later)
In most games, which do not specify a value for the flip queue size, it should actually reduce latency by two frames.

Doing this has no undesirable effects, as far as I can tell.
It does not have a negative impact on game smoothness - in my testing it can actually improve frame-pacing, while frame limiting alone does not have as much of an effect.

Is there any reason to not combine this with frame limiting?
It seems as though there should only be positive effects from doing this.

I get the impression that you set it to "1" once, saw that it did not reduce latency as much as a frame cap, and then changed it back to "Use the 3D application setting" without trying it in conjunction with a frame cap.
Or are all your tests with a frame limiter performed with this set to "1" ?
It may be that while there is not much benefit in CS:GO, there may be benefits in other games/engines.

In the post that you linked to, an in-game cap of 85 FPS plus V-Sync seems to have 2 frames of latency.
As I understand it, this is the best possible result for double-buffered V-Sync and shows that CS:GO's FPS cap is working very well. (assuming that gameplay remains smooth/stutter-free)

What we also see is that an external cap via RivaTuner adds one additional frame of latency.
This is also expected because an external cap is always going to add some amount of latency - but not all games have the ability to cap their framerate or do it well, so this may still be lower latency than uncapped V-Sync.

All of the other results are puzzling though - not that I think they are wrong, just that nothing is behaving as I expected.

The uncapped double/triple buffering results have significant latency which should not be there.
You should not be getting three or four additional frames of latency.

An FPS cap—especially an external FPS cap—should not be able to reduce latency by more than 1 frame with V-Sync enabled. Something is not right if it is.
All that an FPS cap should be doing is pushing the start of frame rendering closer to the V-Sync point, as seen in this graphic.

And uncapped triple-buffering seems to have an additional frame of latency over uncapped V-Sync.
Is CS:GO not doing triple-buffering correctly, or were you forcing it via the driver or using an injector to get "triple buffering"?

My understanding (which may be entirely wrong) was that triple buffering added an extra buffer so that there are two "render" buffers and one "display" buffer, instead of one "render" buffer and one "display" buffer.
So with a proper triple-buffering implementation, the game would render as many frames as possible (e.g. 600) switching between the two "render" buffers each time it completes a frame, and then when it is time for the display to refresh it presents the most recent complete frame to display.

Worst-case scenario: there should be 2 frames of latency—the same as double-buffered V-Sync—if it was not able to complete any additional frames between refreshes.
Best-case scenario: your hardware is able to render hundreds of frames between refreshes, so it should be able to present a frame which was rendered much closer to the refresh point, reducing latency to something in-between 1-2 frames, while still avoiding tearing and displaying a smooth image.
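The "present the most recent complete frame" behaviour described above can be checked with a toy simulation. This Python sketch assumes perfectly even frame times and integer rates, and it measures only how old the newest finished frame is at each refresh, not total input-to-display latency; `newest_frame_age_ms` is a hypothetical name.

```python
from fractions import Fraction


def newest_frame_age_ms(render_fps: int, refresh_hz: int, refreshes: int) -> list[float]:
    """Age of the newest completed frame at each of the first `refreshes` refreshes."""
    frame_time = Fraction(1, render_fps)
    ages = []
    for k in range(1, refreshes + 1):
        refresh_at = Fraction(k, refresh_hz)
        completed = refresh_at // frame_time  # frames finished by this refresh
        age = refresh_at - completed * frame_time
        ages.append(float(age * 1000))
    return ages


ages = newest_frame_age_ms(600, 85, 85)
print(f"newest frame age ranges from {min(ages):.2f} ms to {max(ages):.2f} ms")
```

In this idealized model the displayed frame is never older than one render interval (1/600 s), which is the best case described above. Whether a given game or driver actually implements triple buffering this way is a separate question.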

If we stick to the example of a game which can run at 600 FPS uncapped, on an 85Hz display:
600/85=7.06, so it should be able to render 7 complete frames between every refresh.
So instead of 23.5 ms (2 frames at 85 Hz), latency should be reduced to 13.5 ms?
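One way to reproduce those figures, as a Python sketch: take the 2-refresh double-buffered latency and subtract the head start of the last of the 7 frames, which begins 6 render intervals after the first. That reading is an assumption about the intended arithmetic, not a measured result, and `triple_buffer_estimate` is a hypothetical name.

```python
import math


def triple_buffer_estimate(render_fps: float, refresh_hz: float):
    """Return (whole frames per refresh, double-buffered ms, newest-frame ms)."""
    refresh_ms = 1000.0 / refresh_hz
    render_ms = 1000.0 / render_fps
    whole_frames = math.floor(render_fps / refresh_hz)
    double_buffered_ms = 2 * refresh_ms
    # Assumption: the displayed frame is the last of `whole_frames`, so it
    # started (whole_frames - 1) render intervals later than the first one.
    newest_frame_ms = double_buffered_ms - (whole_frames - 1) * render_ms
    return whole_frames, double_buffered_ms, newest_frame_ms


frames, double_ms, newest_ms = triple_buffer_estimate(600.0, 85.0)
print(frames, round(double_ms, 1), round(newest_ms, 1))  # 7 23.5 13.5
```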

Capping the framerate should not improve latency with triple-buffering, that should basically force it into 2 frames of latency as though you were using double-buffering - which is exactly what your results show: capped triple-buffering has the same latency as double-buffering.

Durante wrote:
Sparky wrote:Testing methodology: I'm using an Arduino Micro with a photoresistor to detect a dark-to-light transition that happens when I move the mouse. The Arduino emulates a mouse to the system and measures the timings involved. I modified the USB library to put a timestamp on the USB interrupt, in order to remove the variance of the 1 kHz USB polling. It's very similar to flood's test setup, and there's more detail in that thread.
That's pretty awesome, I need a setup like that for development.

Not for development in my case, but I agree, it would be nice to have a device like this which can measure total round-trip latency from input to display.
I'm not sure what the parts cost is, but I'd pay for a pre-built kit if someone were to put it all together in an easy-to-use package. I'm sure there is at least a small demand for this sort of device.

Rather than modern games like CS:GO, I'm more concerned about latency with emulators like RetroArch for example, and would like to investigate the latency behavior outlined above.
