If you're using a CPU-based framerate cap, it doesn't matter at all, because the flip queue is always empty. I guess I could test it with DFC and RTSS, but one frame doesn't make RadeonPro faster than RTSS, and I'm a bit wary of using both utilities at the same time. In any case, if you're using V-Sync and want stable frame times, you should be using the "flush GPU after every frame" option instead of a framerate limiter. At least, that's what the synthetic test suggests; I still need to test that in-game.

Glide wrote:
OK, but surely saving at least one frame by default, without even having to introduce a cap, is a good thing?

Sparky wrote:
Forcing the flip queue to 1 saves exactly 1 frame in CS:GO. Take a look at the second graph in this post: http://forums.blurbusters.com/viewtopic ... 270#p15668
Keep in mind that's at an 85Hz refresh rate; there would be a bigger difference at 60Hz. If you're bottlenecked at the start of the display pipeline, each frame only spends its actual calculation time in each stage of the pipeline. If you're bottlenecked at the end of the pipeline, each frame spends 1/framerate in each stage of the pipeline, regardless of how much computation happens in that stage, because you're always waiting on the next stage of the pipeline to be ready.
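A quick back-of-the-envelope sketch of that back-pressure effect (the stage times here are purely illustrative, not measured values):

```python
# Illustrative only: latency through a 3-stage display pipeline.
stage_times = [0.002, 0.003, 0.002]  # hypothetical seconds of work per stage

# Bottleneck at the START of the pipeline (CPU-limited, flip queue empty):
# each frame only spends its actual computation time in each stage.
latency_start_bound = sum(stage_times)

# Bottleneck at the END of the pipeline (V-Sync back-pressure at 60 Hz):
# each frame waits a full refresh interval in every stage, regardless of
# how little computation that stage actually does.
refresh = 1 / 60
latency_end_bound = len(stage_times) * refresh

print(round(latency_start_bound * 1000, 1))  # 7.0 ms
print(round(latency_end_bound * 1000, 1))    # 50.0 ms
```

The same three stages go from ~7 ms to ~50 ms of pipeline latency purely because of where the bottleneck sits.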
The default value if an application does not specify anything is 3, so if it's reducing latency by one frame, it would seem as though CS:GO is setting a value of 2 rather than leaving it undefined. (more on that later)
In most games, which do not specify a value for the flip queue size, it should actually reduce latency by two frames.
Doing this has no undesirable effects, as far as I can tell.
It does not have a negative impact on game smoothness - in my testing it can actually improve frame-pacing, while frame limiting alone does not have as much of an effect.
Is there any reason to not combine this with frame limiting?
It seems as though there should only be positive effects from doing this.
I get the impression that you set it to "1" once, saw that it did not reduce latency as much as a frame cap, and then changed it back to "Use the 3D application setting" without trying it in conjunction with a frame cap.
Or were all of your tests with a frame limiter performed with this set to "1"?
It may be that while there is not much benefit in CS:GO, there may be benefits in other games/engines.
That graphic is perhaps overly simplified; there are more than 2 pipeline steps, so you get more than 2 frames of input lag if you're bottlenecked at the end of it.

In the post that you linked to, an in-game cap of 85 FPS plus V-Sync seems to have 2 frames of latency.
As I understand it, this is the best possible result for double-buffered V-Sync and shows that CS:GO's FPS cap is working very well. (assuming that gameplay remains smooth/stutter-free)
What we also see is that an external cap via RivaTuner adds one additional frame of latency.
This is also expected because an external cap is always going to add some amount of latency - but not all games have the ability to cap their framerate or do it well, so this may still be lower latency than uncapped V-Sync.
All of the other results are puzzling though - not that I think they are wrong, just that nothing is behaving as I expected.
The uncapped double/triple buffering results have significant latency which should not be there.
You should not be getting three or four additional frames of latency.
An FPS cap—especially an external FPS cap—should not be able to reduce latency by more than 1 frame with V-Sync enabled. Something is not right if it is.
All that an FPS cap should be doing is pushing the start of frame rendering closer to the V-Sync point, as seen in this graphic.
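That effect can be sketched with some hypothetical numbers (85Hz refresh, 2 ms to render a frame; both figures are assumptions for illustration, not measurements):

```python
# Hypothetical: 85 Hz display, a frame takes 2 ms to render.
refresh = 1 / 85       # ~11.76 ms between V-Sync points
render_time = 0.002

# Uncapped double-buffered V-Sync: input is sampled right after the
# previous flip, then the finished frame sits waiting for the next
# V-Sync point, so sampled-input-to-flip is roughly one full refresh.
uncapped_latency = refresh

# With a well-placed FPS cap, rendering starts late enough that the
# frame finishes just before the V-Sync point, so input is sampled
# much closer to the flip (plus whatever safety margin the cap leaves).
capped_latency = render_time

print(round(uncapped_latency * 1000, 2))  # 11.76 ms
print(round(capped_latency * 1000, 2))    # 2.0 ms
```

In other words, the cap doesn't remove any pipeline stages; it just moves the input-sampling point closer to the flip.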
That's possible to do, but nobody implements it that way. Frames aren't dropped, so you only see a benefit if your average framerate is below your refresh rate. Even then it's not a huge benefit to animation smoothness.

And uncapped triple-buffering seems to have an additional frame of latency over uncapped V-Sync.
Is CS:GO not doing triple-buffering correctly, or were you forcing it via the driver or using an injector to get "triple buffering"?
My understanding (which may be entirely wrong) was that triple buffering added an extra buffer so that there are two "render" buffers and one "display" buffer, instead of one "render" buffer and one "display" buffer.
So with a proper triple-buffering implementation, the game would render as many frames as possible (e.g. 600) switching between the two "render" buffers each time it completes a frame, and then when it is time for the display to refresh it presents the most recent complete frame to display.
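A minimal sketch of that understanding of triple buffering (this is my reading of the description above, not how any particular driver actually implements it):

```python
# Two "render" buffers the game ping-pongs between, plus one "display"
# buffer being scanned out. At each refresh, the most recently completed
# frame is promoted to the display buffer; older frames get overwritten.
class TripleBuffer:
    def __init__(self):
        self.render = [None, None]  # the two back buffers
        self.write_idx = 0          # buffer currently being rendered into
        self.latest = None          # index of the newest completed frame
        self.display = None         # frame currently being scanned out

    def finish_frame(self, frame):
        # Game finished a frame: this buffer now holds the newest frame,
        # and rendering switches to the other buffer.
        self.render[self.write_idx] = frame
        self.latest = self.write_idx
        self.write_idx ^= 1

    def refresh(self):
        # Display refresh: present the most recent complete frame.
        if self.latest is not None:
            self.display = self.render[self.latest]
        return self.display

tb = TripleBuffer()
for f in range(7):       # e.g. 7 frames completed between two refreshes
    tb.finish_frame(f)
print(tb.refresh())      # presents the newest frame: 6
```

The game never blocks waiting for a buffer, and the display always gets the freshest complete frame, which is why this scheme shouldn't add latency over double buffering.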
Parts cost is about $50 if you're starting with absolutely nothing. (Should be a similar cost for flood's diode setup, but you probably won't find a kit with everything you need in it. I think he has a BOM in his thread.)

Worst-case scenario: there should be 2 frames of latency, the same as double-buffered V-Sync, if it was not able to complete any additional frames between refreshes.
Best-case scenario: your hardware is able to render hundreds of frames between refreshes, so it should be able to present a frame which was rendered much closer to the refresh point, reducing latency to something in-between 1-2 frames, while still avoiding tearing and displaying a smooth image.
If we stick to the example of a game which can run at 600 FPS uncapped, on an 85Hz display:
600/85=7.06, so it should be able to render 7 complete frames between every refresh.
So instead of 23.5ms (2 frames at 85Hz) latency should be reduced to 13.5ms?
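Checking that arithmetic (assuming the newest complete frame is at most one render time old when the refresh happens):

```python
refresh_hz = 85
fps = 600

frame_time = 1 / fps        # ~1.67 ms to render one frame
refresh = 1 / refresh_hz    # ~11.76 ms per refresh interval

frames_per_refresh = fps / refresh_hz
print(round(frames_per_refresh, 2))   # 7.06 complete frames per refresh

# Double-buffered V-Sync worst case: ~2 refresh intervals of latency.
print(round(2 * refresh * 1000, 1))   # 23.5 ms

# Ideal triple buffering best case: one refresh interval plus the age
# of the newest complete frame (at most one frame time).
best_case = refresh + frame_time
print(round(best_case * 1000, 1))     # 13.4 ms
```

So ~13.4 ms, which matches the rough 13.5 ms figure above.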
Capping the framerate should not improve latency with triple-buffering; that should basically force it into 2 frames of latency as though you were using double-buffering, which is exactly what your results show: capped triple-buffering has the same latency as double-buffering.

Not for development in my case, but I agree, it would be nice to have a device like this which can measure total round-trip latency from input to display.

Durante wrote:
That's pretty awesome, I need a setup like that for development.

Sparky wrote:
Testing methodology: I'm using an Arduino Micro with a photoresistor to detect a dark-to-light transition that happens when I move the mouse. The Arduino emulates a mouse to the system, and measures the timings involved. I modified the USB library to put a timestamp on the USB interrupt, in order to remove the variance of the 1kHz USB polling. It's very similar to flood's test setup, and there's more detail in that thread.
I'm not sure what the parts cost is, but I'd pay for a pre-built kit if someone were to put it all together in an easy-to-use package. I'm sure there is at least a small demand for this sort of device.
Rather than modern games like CS:GO, I'm more concerned about latency with emulators like RetroArch for example, and would like to investigate the latency behavior outlined above.
$20 for the microcontroller (either a Teensy with pins, or an Arduino Micro with pins, so you can plug it directly into the breadboard).
$25 for a starter kit with photoresistor, breadboard, jumper wires, resistors, a switch/button, and a potentiometer.
$5 for a USB to micro-USB cable.
I also used a soldering iron to attach some leads to the photoresistor, so I could put it directly in front of the monitor.
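Once the device logs its timestamps, the host-side analysis is just subtraction. A hypothetical sketch (the sample pairs and their layout are made up for illustration, not Sparky's actual firmware output):

```python
# Hypothetical log: (mouse_move_us, light_detected_us) microsecond pairs,
# with the USB-interrupt timestamping removing most of the 1 kHz polling
# jitter on the mouse-move side.
samples = [
    (1_000_000, 1_023_500),
    (2_000_000, 2_019_800),
    (3_000_000, 3_027_100),
]

latencies_ms = [(hit - move) / 1000 for move, hit in samples]
avg = sum(latencies_ms) / len(latencies_ms)

print([round(x, 1) for x in latencies_ms])  # per-sample latency in ms
print(round(avg, 1))                        # average round-trip latency
```

With enough samples you could also look at the distribution rather than just the average, which is where V-Sync and flip-queue effects would show up as distinct modes.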