The software I've used is AMD's OCAT, Nvidia's FrameView and CapFrameX. These programs give you detailed data for every single frame they capture, such as when it started (was presented by the CPU), when its render completed (sent to the back buffer) and when it was displayed (sent to the front buffer). They also let you view frametime graphs, and CapFrameX additionally offers input lag approximation and until-displayed-time graphs.
I would like to point out that while overlay monitoring software such as Afterburner seems to measure frame time as the delta between two consecutive frames being submitted to the GPU, the programs I used measure it as the delta between two consecutive frames being presented by the CPU. I believe this is the more appropriate approach and a better representation of frame pacing since, assuming game time and render time are in sync, the data being processed by the CPU determines what will be displayed in a frame. This is also why Afterburner shows perfectly stable frame times when using the RTSS framerate cap: RTSS operates by blocking the CPU from delivering a new frame to the GPU until a specified time interval has elapsed.
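The difference between the two measurements can be illustrated with a quick sketch. The timestamps below are made up for the example, but they have the shape of the per-frame data these tools log:

```python
# Hypothetical per-frame timestamps (in ms), the kind of data OCAT,
# FrameView and CapFrameX log. "present" = when the CPU presented the
# frame; "gpu_done" = when the GPU finished rendering it.
frames = [
    {"present": 0.0,  "gpu_done": 12.0},
    {"present": 16.7, "gpu_done": 30.1},
    {"present": 33.4, "gpu_done": 45.9},
    {"present": 50.1, "gpu_done": 63.0},
]

# Frametime as the delta between consecutive CPU presents
# (what OCAT/FrameView/CapFrameX report):
present_deltas = [round(b["present"] - a["present"], 1)
                  for a, b in zip(frames, frames[1:])]

# Frametime as the delta between consecutive GPU completions
# (closer to what an overlay like Afterburner appears to measure):
gpu_deltas = [round(b["gpu_done"] - a["gpu_done"], 1)
              for a, b in zip(frames, frames[1:])]

print(present_deltas)  # [16.7, 16.7, 16.7] -> perfectly paced on the CPU side
print(gpu_deltas)      # [18.1, 15.8, 17.1] -> same frames look uneven GPU-side
```

The same capture can therefore look perfectly flat in one tool and jittery in another, which is exactly the Afterburner-plus-RTSS-cap situation described above.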
I used the data from these three programs to cross-check the results. I would like to post screenshots, but since they are a lot of cramped numerical data I don't think that would be useful; instead, I encourage anyone who's interested to try these tests on their own.
I used the following list of games to test Scanline Sync with different engines and APIs:
- Assassin's Creed Unity
- Assassin's Creed Syndicate
- Battlefield 1
- Battlefield V
- Doom 2016
- Doom Eternal
- Black Mesa
- Crysis 1, 2 & 3
- Dark Souls 1, 2 & 3
- Hollow Knight
- Mad Max
- Call of Duty Modern Warfare 2019 (I got a permanent ban from Activision's servers; I think they mistakenly detect monitoring software as cheats, so I don't recommend testing this game)
- Sleeping Dogs
- Skyrim Special Edition
- The Witcher 3
- Ghost Recon Wildlands
- Wolfenstein The New Order
- World War Z
From what I can gather, Scanline Sync makes the GPU try to push a new frame to the front buffer (or to the back buffer in combination with other sync methods) when the display reaches a certain scanline (the index you enter in RTSS), and at the same time lets the CPU present the next frame. This means that, not taking other sources of input lag into account, you will only get the equivalent of one refresh cycle of latency, but also that your hardware must be able to render frames faster than that.
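My mental model of that wait can be sketched as follows. This is only a toy simulation of the idea: RTSS actually queries the driver for the real raster position, while here the scanline is estimated from elapsed time, and the 1125-total-scanlines figure (1080 visible lines plus VBI at 60 Hz) is an assumption for illustration:

```python
import time

# Assumed timings: 1080p at 60 Hz with ~1125 total scanlines per refresh
# (1080 visible + vertical blanking). Both numbers are illustrative.
TOTAL_LINES = 1125
REFRESH_PERIOD = 1.0 / 60.0
LINE_TIME = REFRESH_PERIOD / TOTAL_LINES

def next_scanline_time(now, t0, target_line):
    """Time at which the raster next reaches target_line, assuming the
    first refresh started at t0 and scanlines advance at a fixed rate."""
    phase = (now - t0) % REFRESH_PERIOD        # where we are in this refresh
    target_phase = target_line * LINE_TIME     # where we want to flip
    wait = (target_phase - phase) % REFRESH_PERIOD
    return now + wait

def frame_loop(render_frame, flip, target_line=1, t0=None):
    """Minimal scanline-sync-style loop: render, then hold the flip (and
    therefore the next CPU present) until the chosen scanline is reached."""
    t0 = t0 if t0 is not None else time.perf_counter()
    while True:
        render_frame()                              # CPU+GPU work for this frame
        deadline = next_scanline_time(time.perf_counter(), t0, target_line)
        while time.perf_counter() < deadline:       # spin until the raster arrives
            pass
        flip()                                      # swap at the chosen raster line
```

This also makes the trade-off visible: if `render_frame` takes longer than one refresh period minus the target offset, the flip lands in the visible area and you get a tear (or a skipped refresh, depending on the sync method combined with it).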
VSYNC adds a back buffer, adding up to another refresh cycle of input lag, and swaps it with the front buffer, usually at the beginning of the Vertical Blanking Interval (also known as VBI or VBlank), but sometimes at the end of the VBI depending on the game's implementation.
But that's not all. VSYNC also involves CPU pre-rendered frames. The majority of games use a configuration of 3 pre-rendered frames. This means that when frame 1 begins being displayed (i.e., is sent to the front buffer), the CPU has already rendered frames 2, 3 and 4, and when frame 1 is halfway through its refresh, the CPU will begin rendering frame 5. That's 3.5 refresh cycles of input lag (around 60 ms at 60 Hz!).
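The 3.5-refresh figure is just arithmetic on the refresh period. A quick sanity check, assuming the 3-frame queue described above:

```python
REFRESH_HZ = 60
refresh_ms = 1000 / REFRESH_HZ          # ~16.67 ms per refresh cycle

prerendered = 3                         # typical pre-rendered frame queue depth
# Frame 5's input is sampled halfway through frame 1's refresh, and frame 5
# is displayed 4 full refreshes after frame 1 started, so the input-to-display
# latency is 4 - 0.5 = 3.5 refresh cycles:
latency_ms = (prerendered + 0.5) * refresh_ms
print(f"{latency_ms:.1f} ms")           # 58.3 ms -> "around 60 ms at 60 Hz"
```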
This is where Scanline Sync comes into play. Scanline Sync prevents VSYNC from pre-rendering too many frames and alleviates its effect on input lag. With a scanline index of 1 (that is, right at the start of your display's VBI), just as frame 1 begins being displayed, the CPU will start rendering frame 3: exactly 2 pre-rendered frames. The higher the positive scanline index, the later the CPU will start rendering new frames, down to only 1 pre-rendered frame and a configuration similar to stand-alone Scanline Sync; but again, your hardware will have less headroom to complete the rendering before the next VBI. I should note that Scanline Sync is not always 100% accurate, and the numbers regarding the scanline index may vary.
Unfortunately, Scanline Sync is not an all-encompassing solution. Every game works differently (different VSYNC implementation, number of pre-rendered frames, buffer swap at the start or end of the VBI), and Scanline Sync may interact negatively with it. Sometimes it causes stuttering and a sawtooth input lag effect, similar to Low-Lag VSYNC ON. I also think it interferes with input reading in some games, because in some situations I get bad camera judder when using keyboard and mouse, but not with a controller.
Another thing I would like to discuss is the behaviour of both sync methods when there is a framerate drop. When the frame buffer queue is emptied, VSYNC speeds up rendering in order to fill it again. Common sense leads me to think that this causes a slow-motion effect, since I've observed that after a framerate drop, VSYNC stabilizes by rendering the next frames 3-8 ms apart, while they are still displayed every 16.7 ms.
Scanline Sync, on the other hand, halts the CPU until the next refresh if it misses a sync. That means the next frame will start rendering ~33 ms after the previous one, so you may get more frequent and noticeable stutters, but less input lag and better frame pacing.
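That missed-sync behaviour is easy to model. The sketch below is a toy version of what I've observed, under the assumption that a frame which finishes in time syncs to the next refresh (~16.7 ms gap) and a frame which misses skips one entirely (~33.3 ms gap); the render times are made up:

```python
REFRESH_MS = 1000 / 60          # ~16.7 ms per refresh at 60 Hz

def scanline_sync_gaps(render_times_ms):
    """For each frame's render time, return the gap until the next frame
    start, assuming (as observed) that a missed sync halts the CPU until
    the refresh after next instead of tearing or queueing frames."""
    gaps = []
    for t in render_times_ms:
        if t <= REFRESH_MS:
            gaps.append(REFRESH_MS)        # made the sync: steady pacing
        else:
            gaps.append(2 * REFRESH_MS)    # missed it: skip a whole refresh
    return [round(g, 1) for g in gaps]

print(scanline_sync_gaps([12.0, 15.0, 18.0, 14.0]))
# -> [16.7, 16.7, 33.3, 16.7]: one isolated stutter, then pacing recovers
```

Contrast this with the VSYNC queue-refill behaviour above: VSYNC smears the drop across several fast-rendered frames (the slow-motion effect), while Scanline Sync pays for it immediately with a single ~33 ms gap.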
What I still don't get is how Scanline Sync achieves perfectly glass-floor frametimes, even when the scanline jumps erratically. While VSYNC usually does a good job of keeping input lag and until-displayed-time consistent, the time between frame starts fluctuates between 15-18 ms. With Scanline Sync, it stays at 16.680-16.690 ms. I'm not sure if it interferes with frame presentation or game-time/render-time synchronization, but I haven't seen any other framerate limiter (in-game or external) manage to do that.
All these observations led me to wonder how console developers manage to alleviate these issues. I know that many games just use traditional double or triple buffering and suffer from stuttering when they fall below the target framerate, but there are others that feel smoother and better paced than on PC even while fluctuating below 60 fps or capped at 30 fps. If you cap a game at 30 fps and use VSYNC on PC, you will still get heavy stuttering, since frame rendering and delivery will not be in sync with the display.
I think I found the answer here:
https://developer.android.com/games/sdk/frame-pacing
If I'm not mistaken, you can achieve behaviour similar to the non-pipeline mode described on the Android site just by using Scanline Sync + Enhanced Sync/Fast Sync/borderless windowed (to force triple buffering). For 30 fps, Scanline/2 + VSYNC/borderless works too.
Nvidia users might just use Adaptive VSYNC (if they prefer tearing instead of stuttering) or half-refresh-rate VSYNC, but these features don't seem to work well in all games and Radeon users can't benefit from them.
So, my final conclusions and recommendations are:
- In games (or emulators) in which VSYNC is forced ON, or when you don't care about input lag and just want something that works, just let VSYNC do its thing.
- For competitive or fast paced games in which reaction speed is key, use stand-alone Scanline Sync.
- For graphically demanding games in which you want a smooth, tear-free image while avoiding the input lag and frame pacing issues that come with traditional VSYNC, use Scanline Sync in combination with Enhanced Sync/Fast Sync/borderless and give it a generous margin (scanline index in the middle or even at the top of the display, the equivalent of 1.5 or 2 pre-rendered frames instead of 3.5).
- If you can't get a stable frame rate that matches your display's refresh rate, use Scanline/2 + VSYNC/borderless, again, with a generous margin.