silikone wrote:It defaults to being one or zero frames behind (depending on the tear line's position), which is optimal, but if I increase the frame rate and let it drift, it will eventually be two frames behind. I am unable to visualize in my head why that is. If the buffer is totally cleared before the rendering proceeds, shouldn't it be within schedule?
-- EDITED TO FIX ERRORS --
OpenGL implementations and graphics driver implementations vary a lot. However, I understand that in many of them, with VSYNC ON, glutSwapBuffers() waits for VSYNC and does the page flip before returning. That said, many modern graphics drivers don't implement a simple traditional "double buffer" but a "frame queue". Graphics subsystems have become increasingly opaque black boxes, so what is described as double buffering does not behave exactly the same as it did 20 years ago in the old 3dfx / Riva days. Basically, there is a back buffer, a queued buffer (called the "front buffer" but really a queued buffer), and a real front buffer (in the GPU) transmitting the pixels out to the display:
- Back Buffer (rendering to it)
- Queued Buffer (can be described as "front buffer" from OpenGL's perspective)
- Front Buffer (the actual, real, driver-based front buffer the GPU transmits pixel-by-pixel to the display)
Not all implementations do this, but some do. The queue can be shallower (no queued buffer) or deeper (a bigger frame queue, e.g. NVInspector frame queue depth), which means lag can pile up to more frames if you run unthrottled.
2 frames of lag occur because of excessively early input reads. When you attempt to page flip (glutSwapBuffers), the driver blocks until the flip occurs before returning to your software. If you read input immediately, you've rendered a frame which, upon the next glutSwapBuffers call, is forced to wait 1 full refresh cycle before it's added to the frame queue (1 frame of input lag). Then that frame waits in the frame queue (another frame of input lag) before its pixels finally get transmitted to the display. Ugh, 2 frames of input lag.
So in that case, glutSwapBuffers() behaves as "complete the rendering, and then give the buffer to the driver". Technically the graphics driver will immediately block (wait for VSYNC), but that's not necessarily true when there's a 1-frame queue (OpenGL back buffer, driver-based 1-frame queue, and displayed front buffer). I may have my terminology mixed up, but this is the blocking kind of triple buffering, rather than a traditional lower-lag double buffer.
So if there are no frames in the queue, glutSwapBuffers returns immediately after the render finishes, and the driver exhibits no "Wait-on-VSYNC" behavior despite VSYNC ON. This is because the front buffer is now parked in a 1-frame-buffer queue. You can see this by timing the start/end of glutSwapBuffers() when it is called at an interval lower than the refresh rate -- it returns immediately in some of these implementations. The driver will later automatically flip the queued buffer onto the screen (à la front buffer).
But now, if there's already a buffer queued in the driver, glutSwapBuffers() blocks until VSYNC, i.e. until that queued frame buffer flips onto the front buffer. So if you're rendering even very slightly faster than a refresh cycle, eventually you'll hit the 2-frames-behind state.
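The drift into the 2-frames-behind state can be reproduced with a toy model. This is only a sketch under stated assumptions: a single-slot driver queue, a fixed render time, VSYNC at exact multiples of the refresh period, and a swap that returns as soon as the frame is queued. The function name `simulate_queue_lag` and all its parameters are made up for illustration; real drivers are black boxes and may not behave like this.

```python
def simulate_queue_lag(refresh_us, render_us, frames):
    """Toy model of VSYNC ON with a 1-frame driver queue.

    Returns, per frame, the time in microseconds from input read to the
    VSYNC at which that frame starts being scanned out.
    Assumes render_us is constant and VSYNC occurs at multiples of refresh_us.
    """
    latencies = []
    now = 0            # application clock (time the previous swap returned)
    slot_free_at = 0   # time at which the single queue slot becomes empty

    for _ in range(frames):
        input_time = now                  # unthrottled: read input immediately
        done = input_time + render_us     # rendering finishes

        # Swap: returns immediately if the slot is empty, otherwise blocks
        # until the queued frame flips to the front buffer at a VSYNC.
        queued_at = max(done, slot_free_at)

        # The queued frame flips at the first VSYNC strictly after queuing.
        flip_at = (queued_at // refresh_us + 1) * refresh_us
        slot_free_at = flip_at

        latencies.append(flip_at - input_time)
        now = queued_at                   # swap returns once the frame is queued
    return latencies


if __name__ == "__main__":
    # 100 Hz display (10,000 us per refresh), 9,000 us render time.
    for i, lag in enumerate(simulate_queue_lag(10_000, 9_000, 14)):
        print(f"frame {i:2d}: {lag / 10_000:.1f} refresh cycles behind")
```

In this model the lag starts near 1 refresh cycle, creeps up a little each frame while the swap is still returning immediately, and then pins at 2 refresh cycles once the swap starts blocking.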
Use a high precision clock to measure the time between entry to and exit from glutSwapBuffers (or for Direct3D, Present() ...) to see its blocking behaviour; this may be the effect you are seeing. Such behaviours will be dependent on many factors (like queue depth in NVInspector, specific drivers, or even NVIDIA vs AMD) but needless to say, sometimes the "1 frame queue" is an annoying spin on traditional double buffering. It reduces stutters quite a lot, at the penalty of a bit more lag...
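A minimal sketch of that measurement, in Python for brevity. The `swap` argument is a stand-in for whatever call presents the frame (glutSwapBuffers, Present(), etc.), and the helper name `time_swap` and the 1 ms threshold are assumptions for illustration, not part of any API.

```python
import time


def time_swap(swap, block_threshold_s=0.001):
    """Call swap() and report how long it took.

    swap: any zero-argument callable that presents the frame.
    Returns (elapsed_seconds, blocked), where blocked is True if the call
    took at least block_threshold_s (an arbitrary cutoff; tune per system).
    """
    start = time.perf_counter()   # high resolution monotonic clock
    swap()
    elapsed = time.perf_counter() - start
    return elapsed, elapsed >= block_threshold_s


if __name__ == "__main__":
    # Fake swaps: one that returns at once, one that "waits for VSYNC".
    print(time_swap(lambda: None))
    print(time_swap(lambda: time.sleep(0.005)))
```

Logging these values every frame should show whether the swap returns immediately (empty queue) or blocks for most of a refresh cycle (queue already full).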
Users can solve this via a frame rate cap microscopically smaller than the refresh rate, see Low Lag VSYNC ON.
Or instead, use a variable refresh rate display such as G-SYNC, where refresh cycles are displayed immediately at the moment of glutSwapBuffers(). Basically, refresh cycles are software-triggered on VRR displays, so a software frame rate cap is effectively a software-controlled refresh rate. It can be asynchronous (variable frame times, with refresh cycles staying in sync, as long as you stay within the VRR range).
This may not work (from a game developer perspective), but: if you want VSYNC ON with 1 frame less lag, intentionally make your next glutSwapBuffers call almost exactly one refresh cycle later. If the call was blocking due to VSYNC, your return from the last glutSwapBuffers call coincides well with VSYNC. It's easy to measure a display's refresh rate if you're going full throttle -- the page flip rate is your refresh rate as long as frametimes are below a refresh cycle. But if you hold back, you can no longer easily measure the current refresh rate (except by querying the display's refresh rate).
Instead, upon return of glutSwapBuffers, intentionally wait almost a full refresh cycle before reading input and rendering next frame (could be timed from entry into previous glutSwapBuffers, or from exit of glutSwapBuffers -- you will get different stutter mechanics). Now you've got 1 less frame latency. You will be more sensitive to stuttering (missed vblanks) so you may need an adjustment (e.g. subtract Xms or measured render-time feedback) to call glutSwapBuffers slightly early. Also timer events aren't precise, so try a conservative timer event to 1-2ms prior, followed immediately by a precision busywait to attempt to line up almost perfectly, followed by the inputread-render-pageflip cycle.
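The "coarse timer, then precision busywait" idea can be sketched as below. This is an illustration only: `wait_until` and `spin_window_s` are hypothetical names, and Python's `time.sleep` / `time.perf_counter` stand in for whatever timer and high precision clock your platform provides.

```python
import time


def wait_until(deadline, spin_window_s=0.002):
    """Wait until time.perf_counter() reaches deadline.

    Sleeps (cheap, imprecise) until about spin_window_s before the deadline,
    then busy-waits (precise, burns CPU) for the remainder.
    Returns the clock reading taken right after the wait ends.
    """
    remaining = deadline - time.perf_counter()
    if remaining > spin_window_s:
        time.sleep(remaining - spin_window_s)   # conservative coarse wait
    while time.perf_counter() < deadline:       # precision busywait
        pass
    return time.perf_counter()


if __name__ == "__main__":
    target = time.perf_counter() + 0.010
    woke = wait_until(target)
    print(f"overshoot: {(woke - target) * 1e6:.1f} us")
```

A larger `spin_window_s` is more robust against timers that oversleep, at the cost of more CPU time spent spinning.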
If you do this unthrottled:
[input read][render][glutSwapBuffers][input read][render][glutSwapBuffers][input read][render][glutSwapBuffers]
Then, if it's going full throttle, eventually the glutSwapBuffers calls are blocking. It then settles to darn near exactly 1 refresh cycle between exits from glutSwapBuffers() if you measure using a high precision clock (e.g. a multimedia timer accurate to 1/10,000,000sec). That's your refresh rate metronome you can derive from.
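As a rough sketch of deriving that metronome: record a timestamp at each exit from the swap call while running unthrottled, then take the typical interval. Using the median (rather than the mean) is a choice made here so that an occasional missed VSYNC doesn't skew the estimate; the function name is hypothetical.

```python
import statistics


def estimate_refresh_period(swap_exit_times):
    """Estimate the refresh period from timestamps taken at swap exits.

    swap_exit_times: increasing timestamps in seconds, recorded while the
    swap call is blocking on VSYNC (i.e. running unthrottled).
    Returns the median interval between consecutive exits, in seconds.
    """
    if len(swap_exit_times) < 2:
        raise ValueError("need at least two timestamps")
    intervals = [b - a for a, b in zip(swap_exit_times, swap_exit_times[1:])]
    return statistics.median(intervals)


if __name__ == "__main__":
    # Made-up sample: roughly 60 Hz with one missed VSYNC in the middle.
    exits = [0.0, 0.0167, 0.0333, 0.0500, 0.0900, 0.1067]
    period = estimate_refresh_period(exits)
    print(f"period {period * 1000:.2f} ms, about {1.0 / period:.1f} Hz")
```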
Either way, there are many game developer algorithms to reduce VSYNC ON input lag by about 1 frame. The name of the game is to delay the "[input read]" -- putting a delay between return from glutSwapBuffers before the next input read. In an ideal situation, the input read is right before the next VSYNC (if you can render fast enough)
This is essentially what was said earlier in.... "which uses feedback from the GPU as to the timing of both frame completion and v_blank."
Adaptive lag-reducing VSYNC ON algorithm for game developers
Roughly, use precision clock-measurement variables:
-- Measure your frametimes (how long [render] took)
-- Measure your glutSwapBuffers times (how long [glutSwapBuffers] took)
-- Measure the times between exits of glutSwapBuffers (calling this [cycle time]. Run unthrottled, it eventually settles to a full refresh cycle apart)
-- After return from [glutSwapBuffers] intentionally delay input read approximately:
...... [cycle time] - [frame time] - [glutSwapBuffers time] - tiny padding margin (~1ms)
Or you may also test out algorithm:
...... [cycle time] - [frame time] - margin
The latter leaves the glutSwapBuffers time out; the [cycle time] may end up being sufficient info about glutSwapBuffers. If your frame times and glutSwapBuffers times are extremely tiny, you'll end up waiting almost a full refresh cycle before the next input read -- reducing your input lag by 1 frame! If your frame times are longer than a refresh cycle, you'll call the next glutSwapBuffers immediately, no problem (and you should: this makes the algorithm compatible with VRR).
-- The tiny padding margin gives you an allowance for dynamic frametime variances from frame-to-frame. Smaller padding will stutter more (more chances of missed VSYNC's) during frametime changes, while bigger padding will have more lag. Make this a configurable constant (or config file), and/or dynamically change this value based on the standard deviation of your frametime variances (e.g. consistently slow frametime changes on fast systems can automatically create smaller padding margins, while slower systems will create more erratic frame times, leading to the generation of a larger margin)
-- Timer events aren't always high precision. Busywaits are much higher precision but consume CPU. As a compromise, one can use a timer to about 1-2ms prior, then busywait until the exact precise time. Many solutions, depending on how you're frame pacing.
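The delay formula and the adaptive padding margin from the list above can be sketched as follows. Everything here is illustrative: the function names are made up, all times are in seconds, and "base margin plus k standard deviations" is just one assumed way to turn frametime variance into a margin, not a tested recipe.

```python
import statistics


def adaptive_margin(frame_times, base_margin=0.001, k=2.0):
    """Padding margin that grows with frametime jitter.

    frame_times: recent [frame time] samples in seconds.
    Returns base_margin plus k sample standard deviations (assumed formula).
    """
    if len(frame_times) < 2:
        return base_margin
    return base_margin + k * statistics.stdev(frame_times)


def input_delay(cycle_time, frame_time, swap_time=0.0, margin=0.001):
    """Seconds to wait after the swap returns before the next input read.

    Implements [cycle time] - [frame time] - [glutSwapBuffers time] - margin.
    Pass swap_time=0.0 for the variant that leaves the swap time out.
    Clamped at zero so slow frames go straight to the next frame.
    """
    return max(0.0, cycle_time - frame_time - swap_time - margin)


if __name__ == "__main__":
    recent = [0.0020, 0.0022, 0.0019, 0.0021]   # made-up frame times
    margin = adaptive_margin(recent)
    delay = input_delay(1 / 60, recent[-1], swap_time=0.0005, margin=margin)
    print(f"margin {margin * 1000:.3f} ms, delay {delay * 1000:.3f} ms")
```

The clamp to zero is what makes slow frames fall through to an immediate swap, which is the behaviour the list describes for frame times longer than a refresh cycle.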
[cycle time] will fluctuate due to system imperfections. And if user switches refresh rates (as simple as switching between windowed/fullscreen) it changes. Experiment with making it non-averaged (use last [cycle time]) versus using a 1-frame, 2-frame or 3-frame trailing average [cycle time]. You might or might not see better behaviours.
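To experiment with last-value versus trailing-average [cycle time], a small tracker like the one below may help. It is a sketch with hypothetical names; `window=1` gives the non-averaged "last [cycle time]" behaviour, while 2 or 3 give short trailing averages that still react quickly to a refresh rate change.

```python
from collections import deque


class CycleTimeTracker:
    """Trailing average of the time between swap exits."""

    def __init__(self, window=3):
        self.samples = deque(maxlen=window)   # keeps only the newest samples
        self.last_exit = None

    def on_swap_exit(self, now):
        """Record a swap exit timestamp (seconds from a high precision clock)."""
        if self.last_exit is not None:
            self.samples.append(now - self.last_exit)
        self.last_exit = now

    def cycle_time(self):
        """Average of the retained intervals, or None before two exits."""
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)


if __name__ == "__main__":
    tracker = CycleTimeTracker(window=2)
    for t in (0.000, 0.010, 0.030, 0.060):    # made-up exit timestamps
        tracker.on_swap_exit(t)
    print(tracker.cycle_time())
```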
Some modifications to this frame pacing algorithm may be needed, but this is the gist of a game developer's skills to create an adaptive software-based frame pacing algorithm that reduces input lag by about 1 frame.
This is one of many possible VSYNC ON lag-reducing algorithms that a game developer can do.
The above algorithm also tends to work with GSYNC/VRR, since it effectively becomes a self-capping algorithm. It reduces the input-lag-surge effect that occurs when frame rates max out on a variable refresh rate display and glutSwapBuffers begins blocking on VSYNC (e.g. VRR max Hz = VSYNC ON), unlike at frame rates lower than a VRR display's max Hz.
(As readers recall -- way back in GSYNC Preview #2 a few years ago -- we're the first website in the world to discover the input-lag-surge effect of framerates hitting maximum on a VRR. And this was also reconfirmed during Jorim's Blur Busters GSYNC 101 tests)