Alright, there's some very promising news when it comes to NVidia's frame limiter. It seems NVidia added a low latency mode setting to the limiter, and Profile Inspector just got updated to support the new driver setting:
http://forums.guru3d.com/showpost.php?p ... stcount=10
You need the latest NVidia driver for this (it needs to be based on driver branch R381). The latest Profile Inspector build is provided here:
https://ci.appveyor.com/project/Orbmu2k ... /artifacts
I haven't tested this myself yet (will do ASAP), but if this new NVAPI setting actually does what it claims to do, it could invalidate the current "don't use Profile Inspector" advice. Ideally, we'd need to test whether it now has lower latency than RTSS.
Edit:
Also, the NVidia driver now seems to support drawing real-time performance graphs when the appropriate NVAPI settings are set, and Profile Inspector added configuration options for them (the "Flip Indicator" setting), which is very useful for performance analysis:
GRAPH_FLIP_FPS - FPS graph, measured on display hw flip
GRAPH_PRESENT_FPS - FPS graph, measured when the user mode driver starts processing present
GRAPH_APP_PRESENT_FPS - FPS graph, measured on app present
DISPLAY_PAGING - Add red paging indicator bars to the GRAPH_PRESENT_FPS graph
DISPLAY_APP_THREAD_WAIT - Add app thread wait time indicator bars to the GRAPH_APP_PRESENT_FPS graph
Enabled - Enable everything
So it seems you can get some perf data at different stages of the frame output chain. Not sure yet what they mean exactly (I'm not a graphics developer), but they do sound very useful.
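To illustrate why graphs taken at different stages can look different, here's a rough sketch of how an FPS value could be derived from event timestamps at two stages. This is only my own illustration, not how the driver actually computes its graphs: the `fps_series` helper and all timestamps are made up, and I'm assuming each graph simply plots 1000 / (time between consecutive events in ms).

```python
def fps_series(timestamps_ms):
    """Turn a list of event timestamps (in ms) into per-frame FPS values.

    Hypothetical helper: assumes FPS is 1000 divided by the gap between
    consecutive events. Non-increasing timestamps are skipped.
    """
    return [
        1000.0 / (later - earlier)
        for earlier, later in zip(timestamps_ms, timestamps_ms[1:])
        if later > earlier
    ]


# Made-up timestamps for four frames, for illustration only.
# The app submits frames unevenly, while the display flips at a steady pace.
app_present_ms = [0.0, 12.0, 36.0, 48.0]
flip_ms = [16.0, 32.0, 48.0, 64.0]

app_fps = fps_series(app_present_ms)   # uneven: the app's own pacing
flip_fps = fps_series(flip_ms)         # steady: what reaches the display

print("app present FPS:", [round(v, 1) for v in app_fps])
print("flip FPS:       ", [round(v, 1) for v in flip_fps])
```

With numbers like these, the app-side graph would jump around while the flip graph stays flat, which is the kind of difference the separate graphs could presumably reveal.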
Edit 2:
There are also some "hidden" settings (visible only if they're set in a profile, or if the "show all settings" option is activated). One of them stands out: an option to force DX9 or DX10 Present() calls (the function call that flips your render buffers) to be synchronous or asynchronous. Again, as a non-graphics dev, I don't know what the effect of that is. It just sounds like it might be a setting that affects latency and/or pacing.