If you're using a CPU-based framerate cap, it doesn't matter at all, because the flip queue is always empty. I guess I could test it with DFC and RTSS, but one frame doesn't make RadeonPro faster than RTSS, and I'm a bit wary of using both utilities at the same time. In any case, if you're using V-Sync and want stable frame times, you should be using the "flush GPU after every frame" option instead of a framerate limiter. At least, that's what the synthetic test suggests; I still need to test that in-game.

Glide wrote:
OK, but surely saving at least one frame by default, without even having to introduce a cap, is a good thing?

Sparky wrote:
Forcing the flip queue to 1 saves exactly 1 frame in CS:GO. Take a look at the second graph in this post: http://forums.blurbusters.com/viewtopic ... 270#p15668
Keep in mind that's at an 85Hz refresh rate; there would be a bigger difference at 60Hz. If you're bottlenecked at the start of the display pipeline, each frame only spends its actual calculation time in each stage of the pipeline. If you're bottlenecked at the end of the pipeline, each frame spends 1/framerate in each stage of the pipeline, regardless of how much computation happens in that stage, because you're always waiting on the next stage of the pipeline to be ready.
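A quick back-of-the-envelope sketch of that back-pressure effect (the stage times here are purely illustrative, not measured values):

```python
# Illustrative only: latency through a 3-stage display pipeline.
stage_times = [0.002, 0.003, 0.002]  # hypothetical seconds of work per stage

# Bottleneck at the START of the pipeline (CPU-limited, flip queue empty):
# each frame only spends its actual computation time in each stage.
latency_start_bound = sum(stage_times)

# Bottleneck at the END of the pipeline (V-Sync back-pressure at 60 Hz):
# each frame waits a full refresh interval in every stage, regardless of
# how little computation that stage actually does.
refresh = 1 / 60
latency_end_bound = len(stage_times) * refresh

print(round(latency_start_bound * 1000, 1))  # 7.0 ms
print(round(latency_end_bound * 1000, 1))    # 50.0 ms
```

The same three stages go from ~7 ms to ~50 ms of pipeline latency purely because of where the bottleneck sits.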
The default value if an application does not specify anything is 3, so if it's reducing latency by one frame, it would seem as though CS:GO is setting a value of 2 rather than leaving it undefined. (more on that later)
In most games, which do not specify a value for the flip queue size, it should actually reduce latency by two frames.
Doing this has no undesirable effects, as far as I can tell.
It does not have a negative impact on game smoothness - in my testing it can actually improve frame-pacing, while frame limiting alone does not have as much of an effect.
Is there any reason to not combine this with frame limiting?
It seems as though there should only be positive effects from doing this.
I get the impression that you set it to "1" once, saw that it did not reduce latency as much as a frame cap, and then changed it back to "Use the 3D application setting" without trying it in conjunction with a frame cap.
Or were all of your tests with a frame limiter performed with this set to "1"?
It may be that while there is not much benefit in CS:GO, there may be benefits in other games/engines.
That graphic is perhaps overly simplified; there are more than 2 pipeline steps, so you get more than 2 frames of input lag if you're bottlenecked at the end of it.

In the post that you linked to, an in-game cap of 85 FPS plus V-Sync seems to have 2 frames of latency.
As I understand it, this is the best possible result for double-buffered V-Sync and shows that CS:GO's FPS cap is working very well. (assuming that gameplay remains smooth/stutter-free)
What we also see is that an external cap via RivaTuner adds one additional frame of latency.
This is also expected because an external cap is always going to add some amount of latency - but not all games have the ability to cap their framerate or do it well, so this may still be lower latency than uncapped V-Sync.
All of the other results are puzzling though - not that I think they are wrong, just that nothing is behaving as I expected.
The uncapped double/triple buffering results have significant latency which should not be there.
You should not be getting three or four additional frames of latency.
An FPS cap—especially an external FPS cap—should not be able to reduce latency by more than 1 frame with V-Sync enabled. Something is not right if it is.
All that an FPS cap should be doing is pushing the start of frame rendering closer to the V-Sync point, as seen in this graphic.
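That effect can be sketched with some hypothetical numbers (85Hz refresh, 2 ms to render a frame; both figures are assumptions for illustration, not measurements):

```python
# Hypothetical: 85 Hz display, a frame takes 2 ms to render.
refresh = 1 / 85       # ~11.76 ms between V-Sync points
render_time = 0.002

# Uncapped double-buffered V-Sync: input is sampled right after the
# previous flip, then the finished frame sits waiting for the next
# V-Sync point, so sampled-input-to-flip is roughly one full refresh.
uncapped_latency = refresh

# With a well-placed FPS cap, rendering starts late enough that the
# frame finishes just before the V-Sync point, so input is sampled
# much closer to the flip (plus whatever safety margin the cap leaves).
capped_latency = render_time

print(round(uncapped_latency * 1000, 2))  # 11.76 ms
print(round(capped_latency * 1000, 2))    # 2.0 ms
```

In other words, the cap doesn't remove any pipeline stages; it just moves the input-sampling point closer to the flip.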
That's possible to do, but nobody implements it that way. Frames aren't dropped, so you only see a benefit if your average framerate is below your refresh rate. Even then it's not a huge benefit to animation smoothness.

And uncapped triple-buffering seems to have an additional frame of latency over uncapped V-Sync.
Is CS:GO not doing triple-buffering correctly, or were you forcing it via the driver or using an injector to get "triple buffering"?
My understanding (which may be entirely wrong) was that triple buffering added an extra buffer so that there are two "render" buffers and one "display" buffer, instead of one "render" buffer and one "display" buffer.
So with a proper triple-buffering implementation, the game would render as many frames as possible (e.g. 600) switching between the two "render" buffers each time it completes a frame, and then when it is time for the display to refresh it presents the most recent complete frame to display.
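A minimal sketch of that understanding of triple buffering (this is my reading of the description above, not how any particular driver actually implements it):

```python
# Two "render" buffers the game ping-pongs between, plus one "display"
# buffer being scanned out. At each refresh, the most recently completed
# frame is promoted to the display buffer; older frames get overwritten.
class TripleBuffer:
    def __init__(self):
        self.render = [None, None]  # the two back buffers
        self.write_idx = 0          # buffer currently being rendered into
        self.latest = None          # index of the newest completed frame
        self.display = None         # frame currently being scanned out

    def finish_frame(self, frame):
        # Game finished a frame: this buffer now holds the newest frame,
        # and rendering switches to the other buffer.
        self.render[self.write_idx] = frame
        self.latest = self.write_idx
        self.write_idx ^= 1

    def refresh(self):
        # Display refresh: present the most recent complete frame.
        if self.latest is not None:
            self.display = self.render[self.latest]
        return self.display

tb = TripleBuffer()
for f in range(7):       # e.g. 7 frames completed between two refreshes
    tb.finish_frame(f)
print(tb.refresh())      # presents the newest frame: 6
```

The game never blocks waiting for a buffer, and the display always gets the freshest complete frame, which is why this scheme shouldn't add latency over double buffering.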
Parts cost is about $50 if you're starting with absolutely nothing. (Should be a similar cost for flood's diode setup, but you probably won't find a kit with everything you need in it. I think he has a BOM in his thread.)

Worst-case scenario: there should be 2 frames of latency, the same as double-buffered V-Sync, if it was not able to complete any additional frames between refreshes.
Best-case scenario: your hardware is able to render hundreds of frames between refreshes, so it should be able to present a frame which was rendered much closer to the refresh point, reducing latency to something in-between 1-2 frames, while still avoiding tearing and displaying a smooth image.
If we stick to the example of a game which can run at 600 FPS uncapped, on an 85Hz display:
600/85=7.06, so it should be able to render 7 complete frames between every refresh.
So instead of 23.5ms (2 frames at 85Hz) latency should be reduced to 13.5ms?
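Checking that arithmetic (assuming the newest complete frame is at most one render time old when the refresh happens):

```python
refresh_hz = 85
fps = 600

frame_time = 1 / fps        # ~1.67 ms to render one frame
refresh = 1 / refresh_hz    # ~11.76 ms per refresh interval

frames_per_refresh = fps / refresh_hz
print(round(frames_per_refresh, 2))   # 7.06 complete frames per refresh

# Double-buffered V-Sync worst case: ~2 refresh intervals of latency.
print(round(2 * refresh * 1000, 1))   # 23.5 ms

# Ideal triple buffering best case: one refresh interval plus the age
# of the newest complete frame (at most one frame time).
best_case = refresh + frame_time
print(round(best_case * 1000, 1))     # 13.4 ms
```

So ~13.4 ms, which matches the rough 13.5 ms figure above.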
Capping the framerate should not improve latency with triple-buffering; that should basically force it into 2 frames of latency as though you were using double-buffering, which is exactly what your results show: capped triple-buffering has the same latency as double-buffering.

Not for development in my case, but I agree, it would be nice to have a device like this which can measure total round-trip latency from input to display.

Durante wrote:
That's pretty awesome, I need a setup like that for development.

Sparky wrote:
Testing methodology: I'm using an Arduino Micro with a photoresistor to detect a dark-to-light transition that happens when I move the mouse. The Arduino emulates a mouse to the system, and measures the timings involved. I modified the USB library to put a timestamp on the USB interrupt, in order to remove the variance of the 1kHz USB polling. It's very similar to flood's test setup, and there's more detail in that thread.
I'm not sure what the parts cost is, but I'd pay for a pre-built kit if someone were to put it all together in an easy-to-use package. I'm sure there is at least a small demand for this sort of device.
Rather than modern games like CS:GO, I'm more concerned about latency with emulators like RetroArch for example, and would like to investigate the latency behavior outlined above.
$20 for the microcontroller (either a Teensy with pins, or an Arduino Micro with pins, so you can plug it directly into the breadboard).
$25 for a starter kit with photoresistor, breadboard, jumper wires, resistors, a switch/button, and a potentiometer.
$5 for a USB to micro-USB cable.
I also used a soldering iron to attach some leads to the photoresistor, so I could put it directly in front of the monitor.
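Once the device logs its timestamps, the host-side analysis is just subtraction. A hypothetical sketch (the sample pairs and their layout are made up for illustration, not Sparky's actual firmware output):

```python
# Hypothetical log: (mouse_move_us, light_detected_us) microsecond pairs,
# with the USB-interrupt timestamping removing most of the 1 kHz polling
# jitter on the mouse-move side.
samples = [
    (1_000_000, 1_023_500),
    (2_000_000, 2_019_800),
    (3_000_000, 3_027_100),
]

latencies_ms = [(hit - move) / 1000 for move, hit in samples]
avg = sum(latencies_ms) / len(latencies_ms)

print([round(x, 1) for x in latencies_ms])  # per-sample latency in ms
print(round(avg, 1))                        # average round-trip latency
```

With enough samples you could also look at the distribution rather than just the average, which is where V-Sync and flip-queue effects would show up as distinct modes.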