Blur Busters Forums

Posted: **26 Dec 2017, 00:54**

Sparky wrote:
In theory, is there a solution that developers can implement to ensure responsive V-sync in all but the most unusual cases (possibly by reading the EDID to get a more precise refresh rate)?
In theory yes, but to get down to the minimum of latency you need the game developer to implement a synchronous framerate cap, which uses feedback from the GPU as to the timing of both frame completion and v_blank. You need the feedback in order to control the amount of buffering: every bit of buffering adds latency, but if you don't have enough you'll drop frames and reintroduce stutter.

So glFinish [glutSwapBuffers] should supposedly clear the buffer and keep it under control to prevent worst-case scenarios. The instant lock to half the frame rate on a missed sync rules this out for anything demanding, and while it does lower the lag at high frame rates, it still seems to be up to two frames behind. I tested this by binding a VSYNC toggle to a key, strafing while tracking a wall with my eyes, and noting how many frames ahead the image jumps when VSYNC is switched off. It defaults to being one or zero frames behind (depending on the tear line's position), which is optimal, but if I increase the frame rate and let it drift, it will eventually be two frames behind. I am unable to visualize in my head why that is. If the buffer is totally cleared before the rendering proceeds, shouldn't it be within schedule?

Posted: **26 Dec 2017, 02:22**

silikone wrote:It defaults to being one or zero frames behind (depending on the tear line's position), which is optimal, but if I increase the frame rate and let it drift, it will eventually be two frames behind. I am unable to visualize in my head why that is. If the buffer is totally cleared before the rendering proceeds, shouldn't it be within schedule?

-- EDITED TO FIX ERRORS --

OpenGL implementations and graphics driver implementations vary a lot. As I understand it, in many of them glutSwapBuffers() during VSYNC ON waits for VSYNC and does the page flip before returning. In many modern graphics drivers, though, it isn't a simple traditional "double buffer" but a "frame queue" implementation. Graphics subsystems have become increasingly opaque black boxes, so what is described as double buffering does not behave exactly as it did 20 years ago in the old 3dfx / Riva days. Basically there is a back buffer, a queued buffer (called the "front buffer" but really a queued buffer), and a real front buffer (in the GPU) transmitting the pixels out to the display:

- Back Buffer (rendering to it)
- Queued Buffer (can be described as "front buffer" from OpenGL's perspective)
- Front Buffer (the actual, real, driver-based front buffer the GPU transmits pixel-by-pixel to the display)

Not all implementations do this, but some do. The queue can be shallower (no queued buffer) or deeper (a bigger frame queue, e.g. the NVInspector frame queue depth), which means lag can pile up to more frames if you run unthrottled.

Two frames of lag occur because of excessively early input reads. When you attempt the page flip (glutSwapBuffers), the driver blocks until the flip occurs before returning to your software. If you read input immediately, you render a frame which, at the next glutSwapBuffers call, is forced to wait 1 full refresh cycle before it's added to the frame queue (1 frame of input lag). Then that frame waits in the frame queue (another frame of input lag) before its pixels finally get transmitted to the display. Ugh, 2 frames of input lag.

So in that case, glutSwapBuffers() behaves as "complete the rendering, and then give the buffer to the driver". Technically the graphics driver would block immediately (waiting for VSYNC), but that's not necessarily true when there's a 1-frame queue (OpenGL back buffer, driver-based 1-frame queue, and displayed front buffer). I may have my terminology mixed up, but this is the blocking kind of triple buffer rather than a traditional lower-lag double buffer.

So if there are no frames in the queue, glutSwapBuffers returns immediately after the render finishes, because the driver exhibits no "wait-on-VSYNC" behaviour despite VSYNC ON. This is because the "front buffer" (from OpenGL's perspective) is now parked in a 1-frame buffer queue. You can see this by timing the start/end of glutSwapBuffers() when it is called at a rate lower than the refresh rate: it returns immediately in some of these implementations. The driver will later automatically flip the queued buffer onto the screen (à la front buffer).

But if there's already a buffer queued in the driver, glutSwapBuffers() blocks until VSYNC, that is, until the queued frame buffer flips onto the front buffer. So if you're rendering even very slightly faster than a refresh cycle, you'll eventually hit the 2-frames-behind state.
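To put rough numbers on the two-frame case, here is a minimal Python sketch of a simplified latency model. It assumes a fixed refresh period and a finished frame that is accepted by the driver at the next vblank and then waits a fixed number of further refresh cycles in the queue. The function name and parameters are made up for illustration; real drivers will not be this tidy.

```python
def display_latency_us(input_time_us, render_us, refresh_us, queued_frames=1):
    """Time from input read to start of scanout, in a simplified model.

    All times are integer microseconds after a vblank at t=0.
    Assumption: the finished frame is accepted at the next vblank,
    then waits `queued_frames` more refresh cycles before scanout.
    """
    finished = input_time_us + render_us
    # Round the finish time up to the next vblank (integer ceiling division).
    accepted = -(-finished // refresh_us) * refresh_us
    shown = accepted + queued_frames * refresh_us
    return shown - input_time_us


REFRESH_US = 16667  # roughly 60 Hz

# Input read right after the swap returns: about two refresh cycles of lag.
early = display_latency_us(0, 2000, REFRESH_US)

# Input read delayed until shortly before the next vblank: about one cycle less.
late = display_latency_us(13667, 2000, REFRESH_US)
```

In this model the early read costs 33334 us and the delayed read 19667 us, which is the "about 1 frame" saving discussed in this thread.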

Use a high-precision clock to measure the time between entry to and exit from glutSwapBuffers (or, for Direct3D, Present()) to observe its blocking behaviour; this may be the effect you are seeing. Such behaviour depends on many factors (queue depth in NVInspector, specific drivers, or even NVIDIA vs AMD), but needless to say, the "1 frame queue" is sometimes an annoying spin on traditional double buffering. It reduces stutters quite a lot, at the penalty of a bit more lag...
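A minimal sketch of that measurement in Python, assuming the swap call is passed in as a plain callable (in a real program it would wrap glutSwapBuffers or Present). The name `timed_swap` and the 1 ms "blocked" threshold are illustrative guesses, not anything a driver defines.

```python
import time


def timed_swap(swap_buffers, block_threshold_s=0.001):
    """Call swap_buffers() and return (elapsed_seconds, looks_blocked).

    `swap_buffers` is any zero-argument callable standing in for the
    real page-flip call. The threshold is an assumed cutoff: anything
    slower is treated as "the driver made us wait".
    """
    start = time.perf_counter()
    swap_buffers()
    elapsed = time.perf_counter() - start
    return elapsed, elapsed >= block_threshold_s
```

Logging `elapsed` every frame should show whether the call returns almost instantly (queue not full) or settles into waits of up to a refresh cycle (queue full).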

Users can solve this via a frame rate cap microscopically smaller than refresh rate, see Low Lag VSYNC ON.

Or instead, use a variable refresh rate display such as G-SYNC, where a refresh cycle begins the moment glutSwapBuffers() is called. Basically, refresh cycles are software-triggered on VRR displays, so a software frame rate cap is effectively a software-controlled refresh rate. It can be asynchronous (variable frame time, with refresh cycles staying in sync, as long as you stay within the VRR range).

This may not work (from a game developer perspective), but if you want VSYNC ON with 1 frame less lag, intentionally make your next glutSwapBuffers call almost exactly one refresh cycle later. If it was blocking due to VSYNC, your return from the last glutSwapBuffers call coincides well with VSYNC. It's easy to measure the display refresh rate if you're going full throttle: the page flip rate is your refresh rate as long as frametimes are below a refresh cycle. But if you hold back, you can no longer easily measure the current refresh rate (except by querying the display's refresh rate).

Instead, upon return from glutSwapBuffers, intentionally wait almost a full refresh cycle before reading input and rendering the next frame (this could be timed from entry into the previous glutSwapBuffers, or from its exit; you will get different stutter mechanics). Now you've got 1 frame less latency. You will be more sensitive to stuttering (missed vblanks), so you may need an adjustment (e.g. subtract X ms, or use measured render-time feedback) to call glutSwapBuffers slightly early. Also, timer events aren't precise, so try a conservative timer event 1-2 ms early, followed immediately by a precision busywait to line up almost perfectly, followed by the input-read/render/page-flip cycle.
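The "conservative timer, then busywait" idea might be sketched like this in Python. `wait_until` and its 2 ms spin margin are assumptions for illustration, with `time.sleep` standing in for whatever timer event your framework provides.

```python
import time


def wait_until(deadline_s, spin_margin_s=0.002):
    """Wait until time.perf_counter() reaches deadline_s.

    Sleeps coarsely until about `spin_margin_s` before the deadline,
    then busy-waits the rest. Returns the time at which it woke up.
    """
    remaining = deadline_s - time.perf_counter()
    if remaining > spin_margin_s:
        # Coarse wait: cheap on CPU, but may wake up late.
        time.sleep(remaining - spin_margin_s)
    while time.perf_counter() < deadline_s:
        pass  # Precise wait: burns CPU for roughly spin_margin_s at most.
    return time.perf_counter()
```

A bigger spin margin tolerates sloppier timers at the cost of more CPU burned per frame, so it is probably worth making configurable.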

If you do this unthrottled:

[input read][render][glutSwapBuffers][input read][render][glutSwapBuffers][input read][render][glutSwapBuffers]

If it's going full throttle, then eventually the glutSwapBuffers calls are blocking. It then settles to darn near exactly 1 refresh cycle between exits from glutSwapBuffers() if you measure using a high-precision clock (e.g. one accurate to 1/10,000,000 sec). That's your refresh rate metronome you can derive from.
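As a rough illustration of deriving that metronome, the sketch below takes a list of swap-exit timestamps (in seconds) and estimates the cycle time. Using the median to shrug off the occasional late wake-up is my own choice here, not something prescribed above, and the sample timestamps are invented.

```python
from statistics import median


def estimate_cycle_time(swap_exit_times):
    """Return the median interval between consecutive swap exits, or None."""
    intervals = [b - a for a, b in zip(swap_exit_times, swap_exit_times[1:])]
    return median(intervals) if intervals else None


# Made-up exit timestamps near 60 Hz, with one late return followed by a short one.
exits = [0.0, 0.0167, 0.0333, 0.0500, 0.0667, 0.0900, 0.1000]
cycle = estimate_cycle_time(exits)
```

With these sample numbers the estimate stays close to 16.7 ms despite the one disturbed pair of intervals.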

Either way, there are many game developer algorithms to reduce VSYNC ON input lag by about 1 frame. The name of the game is to delay the "[input read]": put a delay between the return from glutSwapBuffers and the next input read. In an ideal situation, the input read is right before the next VSYNC (if you can render fast enough).

This is essentially what was said earlier: "which uses feedback from the GPU as to the timing of both frame completion and v_blank."

Adaptive lag-reducing VSYNC ON algorithm for game developers
Roughly, using precision clock-measurement variables:
-- Measure your frametimes (how long [render] took)
-- Measure your glutSwapBuffers times (how long [glutSwapBuffers] took)
-- Measure the times between exits of glutSwapBuffers (call this [cycle time]; run unthrottled, it eventually settles to a full refresh cycle apart)
-- After return from [glutSwapBuffers], intentionally delay the input read by approximately:
...... [cycle time] - [frame time] - [glutSwapBuffers time] - tiny padding margin (~1ms)
Or you may also test out the algorithm:
...... [cycle time] - [frame time] - margin
The latter leaves the glutSwapBuffers time out; the [cycle time] may end up being sufficient info about glutSwapBuffers. If your frame times and glutSwapBuffers times are extremely tiny, you'll end up waiting almost a full refresh cycle before the next input read -- reducing your input lag by 1 frame! If your frame times are longer than a refresh cycle, you'll call the next glutSwapBuffers immediately, no problem (and you should: this makes the algorithm compatible with VRR).
-- The tiny padding margin gives you an allowance for frametime variances from frame to frame. Smaller padding will stutter more (more chances of missed VSYNCs) during frametime changes, while bigger padding will have more lag. Make this a configurable constant (or config file setting), and/or dynamically change this value based on the standard deviation of your frametimes (e.g. consistent, slowly changing frametimes on fast systems can automatically create smaller padding margins, while slower systems will produce more erratic frame times, leading to a larger margin)
-- Timer events aren't always high precision. Busywaits are much higher precision but consume CPU. As a compromise, one can use a timer to about 1-2ms prior, then busywait until the exact time. There are many solutions, depending on how you're frame pacing.
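The delay formula and the adaptive padding idea could be sketched as below. All times are in seconds. `input_read_delay` follows the first formula in the list; `adaptive_margin` is only one hypothetical way to turn frametime variance into a padding value, and its `base` and `k` constants are arbitrary starting points, not tuned values.

```python
from statistics import pstdev


def input_read_delay(cycle_time, frame_time, swap_time, margin=0.001):
    """Seconds to wait after the swap returns before reading input.

    Implements [cycle time] - [frame time] - [swap time] - margin,
    clamped at zero so slow frames simply run without any delay.
    """
    return max(0.0, cycle_time - frame_time - swap_time - margin)


def adaptive_margin(recent_frame_times, base=0.0005, k=2.0):
    """Hypothetical padding rule: a base margin plus k standard deviations."""
    if len(recent_frame_times) < 2:
        return base
    return base + k * pstdev(recent_frame_times)
```

Passing `swap_time=0.0` gives the second variant of the formula, which leaves the glutSwapBuffers time out.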

[cycle time] will fluctuate due to system imperfections, and it changes if the user switches refresh rates (which can be as simple as switching between windowed and fullscreen). Experiment with making it non-averaged (use the last [cycle time]) versus using a trailing average over the last 2 or 3 frames. You might or might not see better behaviours.
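One way to experiment with the averaging window is a small tracker like the following sketch. The class name and default window are assumptions; a window of 1 gives the non-averaged "last [cycle time]" behaviour.

```python
from collections import deque


class CycleTimeTracker:
    """Trailing average of the most recent cycle-time samples."""

    def __init__(self, window=3):
        # deque(maxlen=...) silently drops the oldest sample when full.
        self.samples = deque(maxlen=window)

    def add(self, cycle_time):
        self.samples.append(cycle_time)

    def value(self):
        """Return the current average, or None before any sample arrives."""
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)
```

A short window also means the estimate recovers within a few frames after a refresh rate switch, since stale samples fall out of the deque quickly.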

Some modifications to this frame pacing algorithm may be needed, but this is the gist of how a game developer can create an adaptive software-based frame pacing algorithm that reduces input lag by about 1 frame. It is one of many possible VSYNC ON lag-reducing algorithms.

The above algorithm also tends to work with GSYNC/VRR, since it effectively becomes a self-capping algorithm. It prevents much of the input-lag-surge effect when the frame rate maxes out on a variable refresh rate display and glutSwapBuffers begins blocking on VSYNC (i.e. VRR max Hz behaves like VSYNC ON), unlike at frame rates below the VRR display's max Hz.

(As readers recall -- way back in GSYNC Preview #2 a few years ago -- we were the first website in the world to discover the input-lag-surge effect of framerates hitting the maximum on a VRR display. This was also reconfirmed during Jorim's Blur Busters GSYNC 101 tests.)

Posted: **27 Dec 2017, 18:13**

If you are using an NVIDIA card you may want to play around with the threaded optimization setting.

Posted: **27 Dec 2017, 18:14**

Very informative. Gold star for you!

The suggested solution of using RivaTuner to cap the rate below the display frequency does suffer from the problem of eventually running into a microstutter. In practice, desktop activity is probably going to be the main culprit of imperfect pacing for most, but it's still worth addressing.
Measuring (and adjusting?) glFinish [glutSwapBuffers] does seem like a very fine way to maximize responsiveness and avoid pathological cases. The variable of timer accuracy does come into question, and I wonder how common it is for PCs to suffer in this area, if at all.

Also, glFinish [glutSwapBuffers] right before or after rendering?
While the latter does ensure that it doesn't get in the way between input and rendering, it leaves no room to breathe after the commands have been issued. If it's set before the rendering, there will be a window when the game is running simulations for the GPU to finish up its work implicitly.

Posted: **27 Dec 2017, 22:54**

For most single-thread-heavy games the NVIDIA driver will run multiple threads to try to keep performance high. But this can add another level of buffering and, as explained in the 2nd post or so, every buffering step can back up while the end of the chain waits for a refresh. Playing with the threaded optimization setting can help, depending on the game (or application), the CPU, and even the GPU. This may be less relevant for newer GPUs, but I am unsure. The reason it was implemented in the first place was that NVIDIA's drivers had to play more of a role in managing the queue of operations, but I believe the 10 series is finally better at this. It's for sure worth a try if you are thinking about input lag with VSYNC on.

Posted: **30 Dec 2017, 21:46**

silikone wrote:Very informative. Gold star for you!

Thanks!

I need to stress that while I am very familiar with display refreshing behaviour and VSYNC ON / VSYNC OFF / GSYNC / FreeSync, my knowledge of OpenGL API call blocking behaviour is a little rusty. Basically I know a lot about what goes on in the pipeline from the graphics drivers to the display screen. We're intimately familiar with a different part of the display chain...

After I replied to your post, I realized that what I described as glFinish is more appropriately the timing of the glutSwapBuffers call.

Basically, the behaviour of that glutSwapBuffers call is more predictable:
-- It returns immediately on VSYNC OFF (or Fast Sync) but blocks on VSYNC ON.
-- For variable refresh rate displays, that call returns immediately if the graphics drivers successfully triggered a new refresh cycle immediately. But that call begins blocking if your VRR display is still in the process of refreshing the last frame (e.g. framerates above VRR display's maximum Hz)

So I've edited my post to clarify it is "glutSwapBuffers".

silikone wrote:The suggested solution of using RivaTuner to cap the rate below the display frequency does suffer from the problem of eventually running into a microstutter. In practice, desktop activity is probably going to be the main culprit of imperfect pacing for most, but it's still worth addressing.

Yes, desktop activity will be the main culprit. The slower the system, the more prone, but the faster the system, the easier 60fps frame pacing becomes.

Content that uses only a tiny fraction (e.g. 25% or less) of the CPU/GPU during VSYNC ON (e.g. Half Life 2) tends to have excellent 60fps frame pacing nowadays, almost no matter how much desktop activity there is in the background, as long as it's the foreground application. The CPU and GPU headroom you have left gets used up by background applications, and if you have no long freezes (e.g. a pure SSD system with a multicore CPU), I've seen 60fps content that is unbothered by background activity.

silikone wrote:Measuring (and adjusting?) glFinish [glutSwapBuffers] does seem like a very fine way to maximize responsiveness and avoid pathological cases. The variable of timer accuracy does come into question, and I wonder how common it is for PCs to suffer in this area, if at all.

Most computers have highly accurate clocks that tick about 10,000,000 times a second (approx). Those are exposed through the high-precision time / performance counter APIs, my go-to for ultra-precise time.

However, timer events usually don't have such precision -- my experience is they can be late by a millisecond or two. You can get much more accurate than that (e.g. multimedia timers, etc). Anyway, a good buffer margin is necessary -- trigger your timer slightly early. If you need precision without CPU consumption, trigger a timer event a millisecond early, then simply busywait to "close that early gap" if you need some event to happen at an even more precise time. But it depends on *what* you need a timer event for...

In other words, if you have something that demands precision, trigger your timer a millisecond early and then busywait to your exact target time. Then you get uncannily precise results in low-level programming languages (C++) and relatively precise results in higher-level ones (e.g. C#). Most of the time, in practice, you do not need such demanding precision...
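If you want a rough feel for how late timers wake up on a particular machine, something like this Python sketch may help. It uses `time.sleep` as a stand-in for a timer event, and the function name and defaults are made up; results in an interpreted language will be noisier than in C++.

```python
import time


def sleep_overshoot(requested_s=0.001, trials=20):
    """Return, per trial, how many seconds later than requested sleep() woke."""
    results = []
    for _ in range(trials):
        start = time.perf_counter()
        time.sleep(requested_s)
        results.append(time.perf_counter() - start - requested_s)
    return results
```

The largest value in the returned list is a reasonable first guess for how early the timer needs to fire before the busywait takes over.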

silikone wrote:Also, glFinish [glutSwapBuffers] right before or after rendering?

(See my talk about glutSwapBuffers above first).

____

The [url=https://msdn.microsoft.com/en-us/librar ... s.85).aspx]RasterStatus.InVBlank API flag[/url] is another alternative for low-lag VSYNC ON tricks for game developers... but it's Direct3D.

For Direct3D, some developers use RasterStatus.InVBlank to detect the VSYNC, which essentially allows you to do tearing-free VSYNC OFF. It's another low-lag VSYNC ON trick that a game developer can do -- basically call Direct3D Present() when RasterStatus.InVBlank == true. That puts the tearline right below the bottom edge of the screen, essentially simulating a VSYNC ON page flip while being in VSYNC OFF mode. The lag savings are roughly similar, and sometimes it's easier to check this flag than to try to "guess" the timings of VSYNC from the timing of returns from blocked Present() calls (when it begins blocking if you do it at an unthrottled rate during VSYNC ON).
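The shape of that polling trick, stripped of any real Direct3D calls, might look like the sketch below. Both `in_vblank` and `present` are hypothetical callables you would wire up to the raster status query and to Present() yourself, and the timeout is an assumed safety net so the loop cannot hang if the flag never reads true.

```python
import time


def present_in_vblank(in_vblank, present, timeout_s=0.05):
    """Spin until in_vblank() is true, then call present().

    Returns True if vblank was observed, False if the timeout expired
    (in which case present() is still called once as a fallback).
    """
    deadline = time.perf_counter() + timeout_s
    while time.perf_counter() < deadline:
        if in_vblank():
            present()
            return True
    present()  # Fall back rather than stalling the game loop.
    return False
```

Since this spins on the CPU, it probably makes sense to combine it with a coarse sleep that ends shortly before the expected vblank.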

silikone wrote:While the latter does ensure that it doesn't get in the way between input and rendering, it leaves no room to breathe after the commands have been issued. If it's set before the rendering, there will be a window when the game is running simulations for the GPU to finish up its work implicitly

Yes, lack of breathing room -- long processing/simulation/rendering times versus shorter ones. But you can compensate afterwards by varying your wait before beginning your next input read and render (from smart knowledge of the timing of returns from glutSwapBuffers calls). That's what the mentioned "margin" is for: to time the next call to glutSwapBuffers(). Basically the perfect scenario is timing your call of glutSwapBuffers() in a way where it reliably exits as quickly as possible (minimal waits for VSYNC).

By default, call it right away until it begins blocking, and if it blocks, monitor the return times to predict the best time to make the next call (taking into account the whole input-read-render-and-flip cycle).

That way, none of this matters during VSYNC OFF if you're trying to render as many frames as quickly as possible, but it matters during VSYNC ON (and VRR too, with VSYNC ON behaviours occurring at framerates above max Hz).

Good framepacing logic that doesn't break during VSYNC OFF, VSYNC ON, FastSync, VRR, is extremely hard to do. Be careful that your frame pacing logic doesn't break in different modes. I've seen good VSYNC ON framepacing lead to terrible VSYNC OFF mechanics, and vice-versa, since optimizing for one or the other can be mutually opposing goals (in ways) if you don't do it carefully...

Posted: **01 Jan 2018, 01:56**

Are you sure that glutSwapBuffers will block? The "prerendered frames" setting on nvidia also applies to opengl. In that case, wouldn't the call return if the buffer queue wasn't full yet?

Posted: **08 Jan 2018, 13:43**

RealNC wrote:Are you sure that glutSwapBuffers will block? The "prerendered frames" setting on nvidia also applies to opengl. In that case, wouldn't the call return if the buffer queue wasn't full yet?

That's correct!

If you render flat out as quickly as you can, it will eventually block once the prerendered frames queue is full.

When it does, the blocking releases at exactly one refresh interval. That is timing information that your videogame app under development can use to reduce framebuffer backpressure latency by almost 1 frame -- but only via intentional game developer coding as described above. Basically a latency-adapting algorithm.

When the blocking is detected, that's a full frame queue condition. You then self-throttle the NEXT input read intentionally upon return of a blocked glutSwapBuffers. But you do this ONLY when you detect that glutSwapBuffers had just blocked.

This puts the input read much closer to the next blockage, reducing the latency between the input read and the glutSwapBuffers call.

Do the input read about (average rendertime + padding margin) prior to the next call of glutSwapBuffers.

If your frames are simple (e.g. Quake like), rendertimes are a tiny fraction of a refresh cycle. Add a small padding margin for rendertime jittering (varying frame rates) and you've reduced your average input lag of VSYNC ON, via game developer programming techniques.
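Put together, the "throttle only after a blocked swap" rule could be sketched like this. Times are in seconds. The function name and the 1 ms threshold for deciding that a swap "blocked" are assumptions, and the cycle time and average render time are expected to come from your own measurements.

```python
def next_input_delay(last_swap_time, cycle_time, avg_render_time,
                     margin=0.001, block_threshold=0.001):
    """Delay before the next input read; zero unless the last swap blocked.

    If the previous swap returned quickly, the frame queue was not full,
    so keep running flat out. If it blocked, aim the input read at about
    (avg_render_time + margin) before the next expected release.
    """
    if last_swap_time < block_threshold:
        return 0.0
    return max(0.0, cycle_time - avg_render_time - margin)
```

For a 60 Hz display with 2 ms render times and the default margin, a blocked swap would lead to a wait of roughly 13.7 ms before the next input read.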

Not just for latency-lowering of VSYNC ON -- this can also work great for G-SYNC, reducing the need for the end user to intentionally cap the frame rate below VRR max. (An in-game frame rate cap still performs better, but you can probably at least match RTSS without needing to install RTSS.)

This technique only works if your rendertimes are a small fraction of a refresh cycle time. So mainly useful for things like simple graphics, Quake style games, emulators, etc.

What exactly causes V-sync to introduce input lag?
