Oh, and if we use GSYNC, we can bypass this problem completely. We'd simply frame-cap, and it would look exactly like VSYNC ON, but with much lower lag than VSYNC ON.
GSYNC + fps_max capping -- looks exactly like VSYNC ON -- but with less lag -- because GSYNC already does maximum-DisplayPort-speed frame delivery regardless of the current refresh rate, dynamically varying the interval between refresh cycles. The frame becomes visible as soon as it is delivered. So the magic sauce is probably GSYNC + fps_max for a low-lag "VSYNC ON" experience, assuming the frametime jitter isn't too bad.
However, varying rendertimes insert varying lag into frame-capped GSYNC, even when the game runs flat-out at a consistent frame cap, because rendertimes can still vary quite a lot.
For consistent lag during GSYNC + fps_max, you would want the frame-capping utility to do an intentional busywait whenever rendering completes very fast. e.g. If your fps_max target is 120 and you're running a game engine capable of 500fps, you'd target a 2ms render window, and if a render finishes in less time than that, you'd busywait out the rest of the window.
I wonder if any frame-capping utility developers (or NVIDIA, or game developers who write the in-game frame cap) have already done this sort of thing -- i.e. the insertion of intentional, precise busywaiting (e.g. while (MicrosecondTimer() < TargetTime) { /* busywait; loop until target time */ } ) during rendering to force exact rendertimes (as long as GPU rendertime is less than the user-specified rendertime target), no matter how much faster than expected the frame renders. That would in theory allow exact input lag (<1ms latency jitter in theory) + GSYNC + fps_max, for a low-latency VSYNC ON experience with consistent, fixed (non-varying) input lag -- perhaps a much flatter graph. Basically using capped-out GSYNC instead of VSYNC ON: using GSYNC with a consistent (capped) framerate as a method of lowering VSYNC ON input lag.
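Here's a minimal sketch of what that busywait primitive could look like in C++ (a hypothetical helper; std::chrono's steady_clock stands in for the "MicrosecondTimer()" above):

    #include <chrono>

    // Spin until the target time point is reached. Sub-millisecond precision,
    // at the cost of burning a CPU core while waiting (unlike a thread sleep).
    inline void BusyWaitUntil(std::chrono::steady_clock::time_point target)
    {
        while (std::chrono::steady_clock::now() < target) {
            // busywait; loop until target time
        }
    }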
Adding input lag intentionally is anathema to a lot of programmers, but when we're doing 500fps+ in CounterStrike:GO on Titans and 1000 series GeForces -- it's just a 2ms rendertime window we're aiming for.
So if rendertime is 1.4ms, we intentionally busywait for 0.6ms (either on the GPU or on the CPU) to maintain perfect timing of the input read for the next frame render (potentially microsecond-accurate), forcing an exact 2.0ms rendertime.
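A rough sketch of that loop, assuming we control the render loop ourselves (ReadInput / RenderFrame / PresentFrame are hypothetical placeholders, the 2ms rendertime target and 120fps cap are just the example numbers above, and BusyWaitUntil is the helper sketched earlier):

    #include <chrono>

    using Clock = std::chrono::steady_clock;

    void ReadInput();      // placeholder: sample mouse/keyboard
    void RenderFrame();    // placeholder: the actual GPU work (e.g. ~1.4ms)
    void PresentFrame();   // placeholder: page flip; GSYNC displays it immediately

    void FrameLoop()
    {
        const auto renderTarget  = std::chrono::microseconds(2000); // force exact 2.0ms rendertime
        const auto frameInterval = std::chrono::microseconds(8333); // fps_max 120 cap
        auto nextFrame = Clock::now();

        for (;;) {
            ReadInput();
            const auto renderStart = Clock::now();
            RenderFrame();
            BusyWaitUntil(renderStart + renderTarget); // pad e.g. 0.6ms -> constant 2.0ms
            PresentFrame();
            nextFrame += frameInterval;
            BusyWaitUntil(nextFrame);                  // hold the frame cap
        }
    }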
If a frame-capping programmer wanted to go the elaborate Full Monty and target the Holy Grail -- injecting a realtime shader modification into existing games (ugh, this would trip anti-bot/anti-cheat guards, and wouldn't be allowed in online competitive gaming) -- one could in theory do it. You'd use high-precision GPU timestamps (e.g. OpenGL timer queries via glQueryCounter(GL_TIMESTAMP), or whatever the DirectX equivalent is) and loop on them to force precise rendertimes on the GPU. Every frame that takes less than exactly 2ms would be forced to render in exactly 2ms (plus or minus a few microseconds), producing amazingly flat FRAPS graphs in certain games: a perfectly predictable, perfectly constant 2ms of input lag, which is often much easier to aim with than a varying input lag randomly jumping between 0 and 4ms. Assuming all other factors weren't adding microstutter...

The game engine will be a big problem, but what you want is a precise time between the input read and the VSYNC time, so any trick that forces that would be a benefit. Some game engines do an input read almost immediately after returning from VSYNC, so you could force a busywait into that logic -- either a CPU busywait loop or a shader busywait that executes upon the VSYNC page-flip call -- to impose a precise fixed delay after VSYNC before returning to the game app, so the game does its input read closer to the next VSYNC interval. It depends on the game engine's timing of the input read; it would work mainly with engines that read input (for the next frame) immediately upon returning from a page flip. This would in theory produce consistently low-latency VSYNC ON (if you disable multicore rendering and use the shortest possible frame queue) or consistently low-latency fixed-framerate GSYNC.
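For the CPU-side variant of that last trick only (no shader injection), a sketch might look like this -- assuming a game engine that reads input immediately after its swap call returns, and assuming you can wrap that swap call; SwapBuffersReal() and the 6ms delay figure are hypothetical:

    #include <chrono>

    void SwapBuffersReal(); // placeholder for the engine's real page-flip / Present call

    // Hold the engine inside the "swap" for a fixed delay after the flip returns,
    // so the input read it performs immediately afterwards lands at a consistent,
    // later point in the refresh cycle (closer to the next VSYNC).
    void SwapBuffersWithFixedDelay()
    {
        SwapBuffersReal();
        const auto resume = std::chrono::steady_clock::now()
                          + std::chrono::milliseconds(6); // assumed delay within a ~8.3ms refresh
        while (std::chrono::steady_clock::now() < resume) {
            // busywait for precise timing (a thread sleep would be too coarse)
        }
    }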
I'd dare bet that competition gamers would prefer latency-variability reduction -- given the choice, preferring an exact +2ms added latency penalty (easier aiming) over a variable +0ms-to-+4ms added latency penalty (which throws off aiming). 1ms of unpredictable latency adds 2 pixels of mis-aiming error during 2000 pixels/second fast-aiming. So a 4ms range of unpredictable latency variability (e.g. latency varying from 20ms through 24ms) can mean mis-aiming by as much as 8 pixels (math: 4/1000ths of 2000 pixels = 8 pixels) during 2000 pixels/second fast-aiming -- that's roughly one screen width per second. Meaning you won't correctly shoot that far-away enemy as quickly, because you have to do back-and-forth aiming corrections before pulling the trigger...
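To make that arithmetic explicit (the aim speed and jitter numbers are just the examples above):

    #include <cstdio>

    int main()
    {
        const double aimSpeedPxPerSec = 2000.0; // fast flick: roughly one screen width per second
        const double latencyJitterSec = 0.004;  // 4ms of unpredictable latency variation
        // Mis-aim = how far the crosshair travels during the unpredictable part of the lag.
        std::printf("worst-case mis-aim: %.0f pixels\n", aimSpeedPxPerSec * latencyJitterSec); // 8 pixels
        return 0;
    }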
Now if only monitor manufacturers would release strobed GSYNC... getting blur reduction at the same time as GSYNC. Frame-capped GSYNC (capped below GSYNC max) has much lower latency than traditional VSYNC ON, so this would be a good way of getting the majority of the VSYNC ON experience without the lag.