by Chief Blur Buster » 24 Dec 2013, 13:21
Although I may not know game programming as well as the pros do, I know the inner workings of displays better than most game programmers, so I analyze things from a display perspective. That lets me reason about whole-chain latency, from code all the way to photons hitting eyeballs.
Your code is very simple. Its simplicity suggests approximately one refresh cycle of input latency, measured relative to the start of scan-out.
-- This is because the rendering runs virtually instantaneously (very rudimentary rendering), so Present() would have near-zero render overhead.
-- As is typical of VSYNC ON in most APIs, I assume Present() finishes the frame, waits for VSYNC, and then returns practically right at the page flip (during the blanking interval). That means Present() returns at the beginning of a refresh.
-- This means the next input read occurs very early in the new refresh cycle, because your loop repeats right after Present() returns.
-- Therefore, the results of that input read won't be displayed until after the current refresh cycle ends (the next call to Present() has to wait out the entire current refresh cycle before the new frame can be displayed).
-- When the next refresh cycle finally begins, scan-out starts at the top of the screen. That's a full frame cycle after the input read.
-- The bottom edge of the screen has more lag, because it is scanned out last. That's another full frame cycle of input lag.
-- In short, the display is scanning out the frame from the previous Present() call while the next call to Present() waits for that scan-out to finish. Present() then returns immediately at the blanking interval, and the frame it submitted starts scanning out just as your while loop begins its next iteration.
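The timeline above can be checked with a small simulation. This is a minimal Python sketch under stated assumptions, not a call into any real graphics API: it assumes a 60Hz refresh, near-instant rendering (0.1ms), a Present() that blocks until the next VSYNC and returns right at the page flip, and scan-out that fills the whole refresh period (blanking interval ignored). `next_vblank_after` and `simulate_naive_loop` are hypothetical helpers.

```python
import math

REFRESH_HZ = 60.0                 # assumed refresh rate
FRAME_MS = 1000.0 / REFRESH_HZ    # one refresh cycle, about 16.67ms


def next_vblank_after(t_ms: float) -> float:
    """Time of the first VSYNC (refresh boundary) strictly after t_ms."""
    return (math.floor(t_ms / FRAME_MS) + 1) * FRAME_MS


def simulate_naive_loop(iterations: int, render_ms: float = 0.1):
    """Simulate `read input -> render -> Present()` with VSYNC ON.

    Assumes the loop starts exactly at a refresh boundary and that Present()
    returns right at the page flip. Returns one (top_edge_lag_ms,
    bottom_edge_lag_ms) pair per frame.
    """
    t = 0.0
    lags = []
    for _ in range(iterations):
        input_read = t                     # input sampled right after Present() returned
        done = t + render_ms               # near-instant rendering finishes
        flip = next_vblank_after(done)     # Present() waits out the current refresh
        top_lag = flip - input_read        # top line is scanned out right after the flip
        bottom_lag = top_lag + FRAME_MS    # bottom line is scanned out about one refresh later
        lags.append((top_lag, bottom_lag))
        t = flip                           # Present() returns at the flip; loop repeats
    return lags


for top, bottom in simulate_naive_loop(3):
    print(f"top edge: {top:.1f} ms, bottom edge: {bottom:.1f} ms")
```

Every iteration prints "top edge: 16.7 ms, bottom edge: 33.3 ms", matching the reasoning above: the input read always happens at the start of a refresh that is already committed to the previous frame.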
So, assuming a 60Hz refresh (1/60 sec = 16.7ms) on a CRT with a zero-latency analog connection (VGA), your input lag would be 16.7ms (1/60 sec) for the top edge of the screen and 33.3ms (2/60 sec) for the bottom edge. If you're using one of the fastest LCDs on the market (a modern ASUS/BenQ 120Hz/144Hz TN panel with real-time scan-out, no frame buffering, and 1-2ms transition time), add about two to three milliseconds on top of this.
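The arithmetic above reduces to one line: (refresh cycles before scan-out + vertical position on screen) × refresh period + display latency. The sketch below is a rough model under exactly those assumptions; `estimated_lag_ms` is a hypothetical helper, and the 2.5ms panel figure is simply the midpoint of the two-to-three-millisecond estimate above, not a measurement.

```python
def estimated_lag_ms(refresh_hz: float, screen_position: float,
                     frames_before_scanout: float = 1.0,
                     panel_ms: float = 0.0) -> float:
    """Rough whole-chain input lag estimate for a VSYNC ON loop.

    screen_position: 0.0 = top edge, 1.0 = bottom edge.
    frames_before_scanout: refresh cycles between the input read and the
        start of scan-out (about 1.0 for the simple loop discussed here).
    panel_ms: extra display latency (0 for a CRT on VGA; assumed ~2-3ms
        for a fast TN LCD with real-time scan-out).
    """
    frame_ms = 1000.0 / refresh_hz
    return (frames_before_scanout + screen_position) * frame_ms + panel_ms


print(f"{estimated_lag_ms(60, 0.0):.1f}")                # CRT, top edge: 16.7
print(f"{estimated_lag_ms(60, 1.0):.1f}")                # CRT, bottom edge: 33.3
print(f"{estimated_lag_ms(60, 1.0, panel_ms=2.5):.1f}")  # fast LCD, bottom edge: 35.8
```

Keeping the refresh count as a parameter makes it easy to see that the one-frame wait before scan-out, not the rendering, dominates the total in this loop.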
If your code executes a blocking busy-wait loop (e.g. a 15ms pause, so that it resumes less than 2ms before VSYNC), you can reduce your input latency by practically a full frame. Realistically, you can get down to approximately 1/10th of a refresh cycle of input latency before the start of scan-out. Basically, you want to wait until the raster is near the bottom of the current refresh, then read input, quickly render the next frame, and present it to the display, so the input read is much fresher. In that situation, you can reduce your absolute minimum input lag to almost 0ms for the top edge of the screen and almost 16.7ms for the bottom edge. If Present() returns almost instantly, you made it in time for VSYNC and successfully reduced input lag via the "wait-till-right-before-VSYNC-before-reading-input" technique. If Present() blocks for almost a full refresh cycle, you've missed VSYNC.
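Here is a simulated sketch of one just-in-time frame, which also shows the Present() timing check described above. It assumes a 60Hz refresh, that the busy-wait ends exactly `margin_ms` before VSYNC, and that a late frame is flipped at the next VSYNC after rendering finishes. No real timing source or graphics API is used; `jit_frame` is a hypothetical helper.

```python
import math

FRAME_MS = 1000.0 / 60.0  # assumed 60Hz refresh


def jit_frame(margin_ms: float, render_ms: float):
    """Simulate one just-in-time frame under VSYNC ON.

    The timeline starts at a VSYNC (t = 0). The loop busy-waits until
    margin_ms before the next VSYNC, reads input, renders for render_ms,
    then calls Present(), which blocks until a VSYNC at or after that point.
    Returns (top_edge_lag_ms, present_block_ms, missed_vsync).
    """
    target_vsync = FRAME_MS
    input_read = target_vsync - margin_ms          # busy-wait ends; read input now
    done = input_read + render_ms                  # rendering finishes
    flip = math.ceil(done / FRAME_MS) * FRAME_MS   # first VSYNC not earlier than `done`
    missed = flip > target_vsync                   # flipped later than intended?
    present_block = flip - done                    # how long Present() blocks
    return flip - input_read, present_block, missed


margin = FRAME_MS / 10  # wake up about 1/10th of a refresh before VSYNC
for render_ms in (0.5, 3.0):
    lag, block, missed = jit_frame(margin, render_ms)
    print(f"render {render_ms} ms: top-edge lag {lag:.1f} ms, "
          f"Present() blocked {block:.1f} ms, missed VSYNC: {missed}")
```

With 0.5ms of rendering, the top edge lags by about 1.7ms and Present() blocks only about 1.2ms. With 3ms of rendering, the frame misses VSYNC: Present() blocks for about 15.3ms, and the top-edge lag jumps to about 18.3ms, worse than the simple loop.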
The problem with waiting until right before VSYNC to read input and render is that you might miss VSYNC. But if your render time is predictable (e.g. certain emulators), this is a great technique to massively reduce VSYNC ON input latency (without buying a G-SYNC monitor). I know that certain emulators, such as WinUAE, have a command line option to pull off this feat and reduce VSYNC ON input latency. This is far less practical for games with highly variable frame rates, though. "Just-in-time" rendering right before VSYNC is one way to reduce VSYNC ON input latency, but without a forgiving variable-refresh-rate monitor, missing VSYNC gives you an instant penalty of one full refresh cycle of input lag. So that's the fine line: rendering at the very last minute before VSYNC is a "gamble".
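The "gamble" can be quantified for a list of render times. This sketch assumes a 60Hz display without variable refresh, counts every frame that finishes after its VSYNC as a miss, and charges one full extra refresh cycle for each VSYNC missed. `jit_stats` is a hypothetical helper, and the render-time lists are made-up illustrations, not measurements from any emulator or game.

```python
import math

FRAME_MS = 1000.0 / 60.0  # assumed 60Hz refresh, no variable refresh rate


def jit_stats(margin_ms, render_times_ms):
    """Return (miss_rate, mean_top_edge_lag_ms) for just-in-time rendering.

    Each frame wakes margin_ms before VSYNC, reads input, and renders for
    the given time. Finishing in time flips at that VSYNC; finishing late
    adds one full refresh cycle per VSYNC missed.
    """
    if not render_times_ms:
        raise ValueError("need at least one render time")
    misses = 0
    total_lag = 0.0
    for render_ms in render_times_ms:
        late_ms = render_ms - margin_ms
        refreshes_missed = max(0, math.ceil(late_ms / FRAME_MS))
        if refreshes_missed > 0:
            misses += 1
        total_lag += margin_ms + refreshes_missed * FRAME_MS
    n = len(render_times_ms)
    return misses / n, total_lag / n


steady = [1.0, 1.1, 1.0, 1.2, 1.1]    # made-up, emulator-like render times (ms)
variable = [1.0, 4.0, 2.5, 1.2, 6.0]  # made-up, game-like render times (ms)
for margin in (FRAME_MS / 10, 8.0):
    for name, times in (("steady", steady), ("variable", variable)):
        rate, mean = jit_stats(margin, times)
        print(f"margin {margin:.1f} ms, {name}: miss rate {rate:.0%}, "
              f"mean top-edge lag {mean:.1f} ms")
```

With a margin of 1/10th of a refresh (about 1.7ms), the steady list never misses, while the variable list misses VSYNC on 3 of 5 frames and averages about 11.7ms, worse than a safer 8ms margin that never misses. A bigger margin is safer but gives back part of the latency gain, which is why the technique suits predictable render times.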
1980s History Note: Did you know? Old arcade games and 8-bit computer games could read input at the very last minute. They had very low VSYNC ON input latency because they could read and process input during the blanking interval, pre-position sprites and scrolling registers, or write a small amount of character-set-based data (e.g. a few bytes of new graphics data at the edge of the screen during scrolling). You could actually COUNT the number of machine language instruction execution cycles and successfully fit everything into the vertical blanking interval (about 1 millisecond on NTSC televisions), so input reads were ultra-fresh at the beginning of display scan-out. That was back in the 8-bit Atari and Nintendo era, the golden days of arcade games. All of them always ran VSYNC ON. So the input lag of Super Mario Brothers was never bad, even though it essentially always ran VSYNC ON. In fact, input was sometimes read inside raster interrupts, with sprites pre-positioned just before scan-out began displaying them. In some extremely optimized 1980s games, input lag was sometimes only a few hundred microseconds!
Today, 3D graphics have render times that are difficult to predict, so we've moved to buffering schemes, which unfortunately add input lag, often a full frame's worth. In the 21st century, VSYNC ON has a bad reputation among competitive gamers because of input lag. The few remaining programmers who still understand rasters closely (e.g. from Atari 2600 programming or 1980s raster interrupt programming) are better positioned to understand whole-chain input latency issues than the average 21st-century 3D game programmer, who never played on CRTs and has no idea how displays are refreshed.