HOWTO: Quick Frame Transport (QFT) - Large Vertical Totals (reduce lag, reduce crosstalk)

YouTube · Post by **Chief Blur Buster** » 17 Aug 2021, 00:55

elexor wrote: ↑
16 Aug 2021, 20:53
Can also reduce lag with strobed lcd because less strobedelay is needed for crosstalk reduction, but this does not apply to oled.

A manual 60Hz QFT-2x may speed up an LG OLED BFI by up to 8ms (time difference of 1/60sec and 1/120sec).

Basically modify the 1080p VT1125 120Hz signal into a 1080p VT2250 60Hz signal for QFT-2x at 1080p (or 4K VT2250 120Hz signal into a 4K VT4500 60Hz signal for QFT-2x at 4K).
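The VT numbers above follow from keeping the horizontal scan rate (and pixel clock) fixed, so the refresh rate scales inversely with the Vertical Total. A minimal sketch of that arithmetic, where `qft_vertical_total` is a hypothetical helper name and not part of any tool; real modes may need slightly different totals to hit exact refresh rates such as 59.94Hz:

```python
def qft_vertical_total(base_vt: int, base_hz: float, target_hz: float) -> int:
    """Vertical Total that yields target_hz while keeping the base mode's
    horizontal scan rate (lines per second) unchanged."""
    if target_hz <= 0 or target_hz > base_hz:
        raise ValueError("target_hz must be positive and no higher than base_hz")
    lines_per_second = base_vt * base_hz  # horizontal scan rate, held constant
    return round(lines_per_second / target_hz)


print(qft_vertical_total(1125, 120, 60))  # 1080p: VT1125 at 120Hz -> VT for 60Hz
print(qft_vertical_total(2250, 120, 60))  # 4K:    VT2250 at 120Hz -> VT for 60Hz
```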

Here the lag reduction serves a different purpose: it comes from faster delivery of GPU memory to the OLED's buffer memory, so that the panel can color-process sooner and begin its rolling scanout sooner.

My custom in-house photodiode tester shows surprising latency reductions with undocumented QFT on many VRR-compatible displays. Sometimes bugs happen and lag reductions do not happen. But lag reductions happened anyway on some displays (OLED, LCD) with undocumented QFT support. This was because of their buffering behaviors, i.e. buffering a slow-delivering 60Hz signal for a fast scanout to a fixed horizontal scanrate panel. QFTing it to max scanrate essentially sped up the buffering (the transfer of GPU frame buffer to the TV frame buffer).

Many TVs and monitors have a frame buffer in them for internal scan conversion (buffering low-Hz signals for fast scanout at max-Hz velocity).

The sheer fact that VRR was invented was borderline miraculous for accidental, undocumented QFT support in low fixed-Hz modes. QFT is essentially a 0Hz-range VRR mode -- metaphorically an EDID-side framerate cap. Long-time VRR users, such as emulator users and console gamers, know that a 60fps cap at 120Hz VRR has much lower lag than a traditional 60Hz fixed-Hz mode. From a signal-timings and latency-benchmarking perspective, QFT is metaphorically a hardware fixed-Hz EDID version of a software low-capped VRR mode.

YouTube · Post by **Chief Blur Buster** » 17 Aug 2021, 01:00

elexor wrote: ↑
15 Aug 2021, 20:40
also Quick Frame Transport currently only reduces lag if you simultaneously use RTSS Scanline Sync (with negative number tearline indexes) to move Present() from beginning of VBI to the end of VBI.

Actually. In certain cases, you still get lag reductions without RTSS Scanline Sync.

With ordinary Microsoft/NVIDIA "VSYNC ON", I get lag reductions with these:

- Strobed VSYNC OFF on any panel (QFT allows strobe to occur earlier in monitor backlight firmware)
- Strobed VSYNC ON on any panel (QFT allows strobe to occur earlier in monitor backlight firmware)
- Nonstrobed Low-Hz VSYNC OFF on fixed-scanrate panels (QFT bypasses low-Hz buffering delay in scaler/TCON)
- Nonstrobed Low-Hz VSYNC ON on fixed-scanrate panels (QFT bypasses low-Hz buffering delay in scaler/TCON)

RTSS Scanline Sync is needed for lag reductions on non-strobed, flexible-scanrate panels.
Kaldaien, visiting here, I presume, is now working on reproducing something similar (hopefully) in Special K.

Nonstrobed VSYNC OFF (at max Hz, or on a flexible-scanrate panel) gets no further lag improvement from QFT, since there is no buffering (realtime scanout via a tight rolling window) and tearlines are randomized throughout the entire refresh cycle (VBI and non-VBI), regardless of VBI size. It still averages out to the same latency regardless of VBI size for any specific Hz.

The net result is cumulative input lag savings from two separate lag-saving stages:

(1) Speed up buffer transfer from GPU memory to monitor memory; AND
(immediate lag savings for monitors that are in full frame-buffering mode rather than realtime cable=panel scanout)

(2) Inputdelay subsequent input reads to the beginning of QFT-accelerated refresh cycle
(Lag savings requires custom software modifications, or RTSS Scanline Sync, or modifications to Special K)

We can still get lag savings with just (1) or (2), but combining (1) and (2) is superior. Combined, lag savings can actually exceed a full refresh cycle, because step (1) saves 12.5ms of lag when running 60Hz on buffered 240Hz monitors (1/60sec minus 1/240sec), and (2) saves yet another 12.5ms by inputdelaying 12.5ms closer to the next VBI, for a total lag savings of about 25ms at the screen bottom edge in 60Hz fighting games and 60Hz emulators!
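The arithmetic behind the two stages can be checked with a short sketch. It assumes stage (1) saves the difference between one low-Hz refresh period and one max-Hz period, and that stage (2) can save up to roughly the same amount again; `buffer_transfer_saving_ms` is a hypothetical helper name, and real displays will deviate from these idealized numbers:

```python
def buffer_transfer_saving_ms(signal_hz: float, scanout_hz: float) -> float:
    """Stage (1): time saved by delivering a frame over the cable in
    1/scanout_hz seconds instead of 1/signal_hz seconds."""
    return 1000.0 / signal_hz - 1000.0 / scanout_hz


stage1 = buffer_transfer_saving_ms(60, 240)
stage2 = stage1  # assumption: input reads can be delayed by about the same amount
print(f"stage 1: {stage1:.1f} ms, combined: {stage1 + stage2:.1f} ms")
print(f"60Hz QFT-2x on a 120Hz panel: {buffer_transfer_saving_ms(60, 120):.1f} ms")
```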

Kaldaien · Post by **Kaldaien** » 21 Aug 2021, 02:31

Chief Blur Buster wrote: ↑
17 Aug 2021, 00:45

Kaldaien wrote: ↑
16 Aug 2021, 16:00
Oh, that is interesting. I have code in place to under/overshoot VBLANK, but it is relative to the beginning of VBLANK. I move the target just slightly before VBI fires off so that the present never gets rescheduled to the next refresh cycle.
This is a hugely annoying part of Windows behavior, its compositing pipeline is not QFT-friendly (yet...) (...deep sigh...)

Auto Low Latency Mode (ALLM) has nothing to do with Quick Frame Transport (QFT) -- they are two independent latency-lowering standards. ALLM could technically negotiate QFT in a roundabout way, but ALLM can still lower latency by itself by automatically disabling lag-increasing processing (e.g. interpolation or enhanced color processing modes), and QFT can lower latency by itself via faster signal transport.

Yeah, I realize this now. The setting seems like the ideal way to present a computer with an alternate EDID, and given G-Sync support is not recognized until turned on, I was under the impression the setting did more than it actually does.

Chief Blur Buster wrote: ↑
17 Aug 2021, 00:45

Kaldaien wrote: ↑
16 Aug 2021, 16:00
I guess it goes without saying the discussion assumes VSYNC is off. A present that occurs in the middle of blanking tends to be penalized with a 1 frame delay if you let Windows decide where the deadline is, and you really have to arrive even earlier than that if DWM composition is active. So I've been moving things the other way forever.
In custom frame rate capping software (using tearingless VSYNC OFF algorithms to emulate bufferless VSYNC ON) you really want to delay Present() to the end of the VBI, to gain the proper latency reductions. This inputdelays whatever your game/software is doing closer to frame presentation timing.

You still should have an option to select end-of-VBI presentation even with Fast Sync or Enhanced Sync algorithms, since certain versions of some graphics drivers are reasonably well-designed to allow frame presentation to flush to the screen as long as presentation occurs before end of VBI. But it must be made optional; only VSYNC OFF is the way to guarantee it.

In my experience, tearingless VSYNC OFF is reliable on all modern GPUs in High Performance Mode (microsecond timers enabled) and GPU power management is force-disabled by making sure GPU is revved up at least 1-2ms before a microsecond-precision-delayed Present(). However, with an ultra large VBI, you can Present() early in VBI, but don't return from Present() until end of VBI. So the "simulated VSYNC ON" blocking behavior is now end-of-VBI. So you might theoretically modify a frame rate capping algorithm to the following:

1. Continue your existing algorithm internally (present to screen just before end of VBI), to let compositor work properly
2. But block returning to your application software until just exactly after the VBI.
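The two steps above can be sketched as a small timing calculation. This is only an illustration under stated assumptions: `QftTiming` and `plan_frame` are hypothetical names, time zero is taken as the first visible scanline of a refresh cycle, the 0.5ms safety margin is arbitrary, and the example mode (1080 active lines, VT4500 at 60Hz) is made up for the demonstration. A real frame rate limiter would read the display's actual timings and use precise waits:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QftTiming:
    refresh_hz: float  # e.g. 60.0 for a QFT 60Hz mode
    v_active: int      # visible lines, e.g. 1080
    v_total: int       # Vertical Total including blanking, e.g. 4500

    @property
    def period_s(self) -> float:
        return 1.0 / self.refresh_hz

    @property
    def vbi_start_s(self) -> float:
        # Offset from the first visible scanline to the start of blanking.
        return self.period_s * self.v_active / self.v_total


def plan_frame(t: QftTiming, cycle_start_s: float, margin_s: float = 0.0005):
    """Return (present_at, unblock_at) for the cycle starting at cycle_start_s."""
    vbi_start = cycle_start_s + t.vbi_start_s
    vbi_end = cycle_start_s + t.period_s
    present_at = max(vbi_start, vbi_end - margin_s)  # step 1: flip just before end of VBI
    unblock_at = vbi_end                             # step 2: release the app after the VBI
    return present_at, unblock_at


timing = QftTiming(refresh_hz=60.0, v_active=1080, v_total=4500)
present_at, unblock_at = plan_frame(timing, cycle_start_s=0.0)
print(f"VBI starts at {timing.vbi_start_s * 1000:.2f} ms")
print(f"present at {present_at * 1000:.2f} ms, unblock at {unblock_at * 1000:.2f} ms")
```

With a large vertical total, most of the refresh period is blanking, which is what leaves room to delay the application's next input read toward the end of the VBI.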

This provides an immediate instant universal 12ms lag reduction for 60Hz single-strobed emulators including XG2431's single strobe support: www.blurbusters.com/xg2431 ... No software BFI needed; it's true native 60 Hz single strobe support.

This forces an inputdelay to the software since the next inputread is often done immediately upon return of Present(). So this would end up becoming a VSYNC ON compatible QFT, as a framerate capper "present timing modification" to make Windows QFT-friendly.

It's my understanding Special K frame rate capper already has a built-in inputdelayer, so adding QFT support to Special K probably is easy.

If you get one of those new 60Hz single strobed LCDs that I have worked on and released this summer, download www.blurbusters.com/strobe-utility-viewsonic (ViewSonic XG2431)

Thanks for the encouragement,

After a few hours of work, I was able to get the general timing right to do tear-free V-Sync OFF. A feat I was conditioned to believe over the last 20 years should have become completely impractical in modern graphics APIs.

It is just as you theorize, doing this means decomposing my framerate limiter into two waits, one to schedule Present / SwapBuffers on the necessary alternate blanking boundary and a second to actually block the game's render loop and implement steady frame pacing.

I use a single serialized sequence of events in my current implementation, and rendering suffers from the removal of CPU work-ahead.

Present blocks until VBLANK and deviations lead to dropped frames with no cushioning. Given the way V-Sync normally works, this is unsurprising. It only becomes a blocking call after multiple of them go incomplete, and up until the graphics API's render-ahead limit is hit, no push-back from VBLANK is expected.

I think to keep this asynchronous behavior, I will need to move the game's Present calls onto a separate thread and implement my own render-ahead. Doing that will increase latency a variable amount, but losing CPU/GPU parallelism by serializing Present -> VBLANK -> Input on the same thread is kryptonite for real-world games. Those things must occasionally proceed out of order or it cascades into stutter.
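One way to picture the idea of a separate present thread with its own render-ahead is a bounded producer/consumer queue. This is a generic sketch, not Special K's implementation: `run_pipeline` is a hypothetical name, and `present` is a stand-in callable for the real Present / SwapBuffers call:

```python
import queue
import threading


def run_pipeline(frame_count: int, render_ahead: int, present) -> None:
    """Render on the calling thread, present on a worker thread.

    `present` stands in for the real Present()/SwapBuffers call.
    """
    frames: queue.Queue = queue.Queue(maxsize=render_ahead)

    def presenter() -> None:
        while True:
            frame = frames.get()
            if frame is None:  # sentinel: no more frames
                return
            present(frame)     # a real call would block here until the flip deadline

    worker = threading.Thread(target=presenter)
    worker.start()
    for frame in range(frame_count):
        frames.put(frame)      # blocks only when render_ahead frames are already queued
    frames.put(None)
    worker.join()


presented = []
run_pipeline(frame_count=5, render_ahead=2, present=presented.append)
print(presented)
```

The queue depth is the render-ahead limit: a larger value gives more cushioning against missed deadlines at the cost of additional latency, which is the trade-off described above.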

The render queue, or lack thereof, is the final piece of the puzzle I must solve.

Is there a reason I might need to expose user configurable scanline targets the way Scanline Sync does? My understanding is there should be an ideal calculable time to flip buffers for scan-out, and the end-user should not need to be burdened with this stuff since I already know the active display's timings.

YouTube · Post by **Chief Blur Buster** » 22 Aug 2021, 16:41

Kaldaien wrote: ↑
21 Aug 2021, 02:31
Is there a reason I might need to expose user configurable scanline targets the way Scanline Sync does? My understanding is there should be an ideal calculable time to flip buffers for scan-out, and the end-user should not need to be burdened with this stuff since I already know the active display's timings.

Yes, what you do should be made easier than RTSS Scanline Sync by default if possible.

Well, advanced users may prefer control. But it should be more automatic than RTSS Scanline Sync. I know there’s algorithms to fully automate this, and you should automatically have a checkbox [X] to decide automatic treatment for missed VSYNC’s — e.g. tear the next refresh cycle, or WAIT for the next VBI (flip on beginning of VBI, flip on end of VBI).

So you should have “Auto” by default with an optional “Manual” mode.

Also, consistent latency can be important, as inconsistency royally screws with aim training: even a 1ms unexpected latency change can throw off existing esports training. Please read my article-post The Amazing Human Visible Benefits Of The Millisecond. Not all milliseconds are human visible, but I assure you, SOME of them definitely are.

Caveat emptor with assumptions on Blur Busters: they are unceremoniously shot down around here, thanks to the Vicious Cycle Effect. Higher resolution, bigger displays, wider FOV, and higher refresh rates all combine to make the effects of single milliseconds more and more visible.

Oh, and precise frame pacing of unblocks can help reduce stutter: have an option where you don’t vary the time interval between unblocks if you can, if you are stutter-priority. I can teach amazing things here: even jitter (70Hz ultrafast microstutter occurring during 410fps on a 480Hz display) blends to 1-pixel motion blur, which is also why things look clearer on a 360Hz display with the Razer 8KHz mouse rather than an everyday 1000Hz mouse, because of the elimination of high-frequency jitter blending to motion blur. Stutter and blur are a continuum, the same thing (see www.testufo.com/vrr and www.testufo.com/eyetracking#speed=-1), and it’s quite obvious that stutter and persistence blur are the same thing, like slow (vibrating) vs fast (blurry) guitar or harp strings. So if gametime:photontime jitters by 1ms, I can see the stutter caused by that 1ms, as long as display persistence is less than the stutter-error. So a 1ms MPRT strobed display massively amplifies the visibility of tiny 2ms framepacing errors if the framepacing (even if hidden by VSYNC ON) has the gametime inside the frames jittering by 2ms, so that rendered object positions inside these frames are off by 2ms despite being perfectly VSYNC ON framepaced.

If you ever wondered "why is strobing so jittery compared to non-strobing"? -- THIS IS IT. Manage your Present() unblock pacing. If you want things to be EVEN more accurate, clock the beginning of the renders too -- so gametime:presenttime is constant despite varying rendertimes. You probably already know this.

Ideally game rendering algorithms should be smarter than that, but game developers do so much crazy shit, so you need to include precision modes for keeping the Present() unblocks microsecond accurate in advanced mode, with easy mode being more realtime floating (in a selective way).

So if MPRT is equal to or less than the gametime:photontime relativity error, I can see the human-visible stutters. Sometimes the next gametime timestamp is captured during present-time unblocks in some engines, which means things start to stutter if you vary that time-relativity. Coincidentally, that’s why a 4ms stutter is hidden in a 33ms 30fps nonstrobed display (MPRT barely 10% of frametime), but is a huge stutterjump during 240fps 240Hz VSYNC ON strobed (MPRT is frametime). Which is why strobing amplifies jittering/stuttering. Which is why I recommend framerate=Hz. Which is also why strobe lovers also love RTSS Scanline Sync and QFT for low-lag strobing. The smaller the MPRT (due to sheer Hz or from strobing), the more visible frametime errors are in the rendered frames.
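The rule of thumb stated at the start of this paragraph can be written as a one-line comparison. This is a simplified sketch: `stutter_is_visible` is a hypothetical name, the MPRT values are assumed example numbers, and actual visibility also depends on motion speed and the viewer:

```python
def stutter_is_visible(mprt_ms: float, timing_error_ms: float) -> bool:
    """Rule of thumb from the post: a gametime:photontime error shows up as
    stutter once display persistence (MPRT) is equal to or less than it."""
    return mprt_ms <= timing_error_ms


# Assumed ~33ms persistence (30fps non-strobed) against a 4ms error.
print(stutter_is_visible(mprt_ms=33.3, timing_error_ms=4.0))
# Assumed 1ms persistence (strobed) against the same 4ms error.
print(stutter_is_visible(mprt_ms=1.0, timing_error_ms=4.0))
```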

Ideally, you want perfect relative-time sync between gametime, rendertime, frametime, presenttime, photontime.

So, bottom line, have both an automatic and a manual setting, please.

Automatic: Easy for users, maybe two settings “Smoothness priority automatic” and “Latency priority automatic”. You would do a real-time automatic calculation as you do.

Manual: Easy to optimize for a sweet spot compromise for a specific game or specific emulator, perhaps loaded as a profile on a per-game basis or per-refreshrate/VT basis. Having a manual phase offset (e.g. fixed inputdelay) can keep things consistent.

And in the manual mode, please have a setting that enables/disables force flush e.g. Flush() after Present(), including both soft-flush and hard-flush. Hard flush will slow framerates by 50% but make present timing microsecond-accurate, which can be useful for lowering latency of low-GPU games such as emulators where low lag precise framepacing is more critical.

You can use waitable swapchains + full screen exclusive mode, to get full control over guaranteeing that your current Present() hits the next refresh cycle, regardless of GPU.

Remember to get the handle of the correct monitor for both the scanline check and the waitable swapchains, so your software works correctly on a different-Hz multimonitor system; RTSS Scanline Sync still has a small bug with that. (Also show an error message if mirroring is occurring: this makes a beamraced VSYNC algorithm unreliable, but at least it’s possible to query the system for whether a multimonitor system is currently in mirrored mode.)

Theoretically you can detect VRR in a generic roundabout way on Windows by measuring Present()-vs-D3DKMTGetScanLine() if you wanted to go overkill on plug-n-play automaticness on any-sync multimonitor systems, for the ultimate GPU-agnostic plug-and-play scanline sync.

Display warnings about scan line sync unreliability every time:
- You detect monitor surround mode; or
- You detect monitor mirroring mode; or
- You detect battery saver power management mode; or
- You detect fullscreen exclusive mode is unavailable (unless you do the Present-before-VBI method); or
- (optional) You detect you’re running in a VM; or
- (optional) You detect VRR is enabled (via direct API call, or via heuristics on Present()-unblock-vs-D3DKMTGetScanLine())

Sufficient APIs exist to do all the above, if one wished for something vastly easier than RTSS Scanline Sync.

Also, profiles will be useful to automatically disable the algorithm for games and VR apps that use their own built-in beam-raced-VSYNC algorithms. I have contributed my lagless VSYNC algorithm (emulator raster synchronized with real raster) to a few emulators, and it is already implemented in WinUAE: https://blurbusters.com/blur-busters-la ... evelopers/

As the resident expert of Present()-to-Photons (P2P) black box, I’m happy to answer questions about the black box. Don’t be afraid to ask more questions needed to become a Tearline Jedi, since what you’re doing requires an adequate understanding of P2P.

BTRY B 529th FA BN · Post by **BTRY B 529th FA BN** » 15 Sep 2021, 21:30

How do I do this for my PG259QNR? I've opened CRU, clicked Add, and this shows up. I'm assuming the refresh rate should say 360Hz. I have the NVCP set to Fixed Refresh. Monitor OSD info says 360Hz. I've manually set the CRU to 360Hz and the sans serif turns red.

YouTube · Post by **Chief Blur Buster** » 19 Sep 2021, 13:36

BTRY B 529th FA BN wrote: ↑
15 Sep 2021, 21:30
How do I do this for my PG259QNR? I've opened CRU, clicked Add, and this shows up. I'm assuming the refresh rate should say 360Hz. I have the NVCP set to Fixed Refresh. Monitor OSD info says 360Hz. I've manually set the CRU to 360Hz and the sans serif turns red.

You've correctly loaded up the max-Hz timings and changed to Manual. Now you need to:

1. Move first radio button from "Back porch" to "Total"
2. Move second radio button from "Refresh rate" to either "Horizontal" or "Pixel Clock"
3. Now the ONLY box you want to edit is the Vertical Total (the number "1125" under Vertical and to right of Total)

Increasing that will lower the refresh rate. In all situations, your refresh cycle will be transmitted over the video cable at max Hz (1/360sec). Doubling the VT will create a half-Hz mode with a max-Hz DP/HDMI cable transmit velocity. Quadrupling the VT will create a quarter-Hz DP/HDMI mode with a max-Hz cable transmit velocity.

Don't edit other numbers; the Hz will automatically recalculate every time you edit the Vertical Total (increasing the "1125" to a bigger number).
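The automatic Hz calculation follows from the horizontal scan rate staying fixed while only the Vertical Total changes. A small sketch of what to expect when starting from a 360Hz mode with the "1125" Vertical Total mentioned above; `refresh_hz_for_vt` is a hypothetical helper, and whether a given monitor accepts each resulting mode is not guaranteed:

```python
def refresh_hz_for_vt(max_hz: float, max_hz_vt: int, new_vt: int) -> float:
    """Refresh rate after changing only the Vertical Total, with the
    horizontal scan rate (and pixel clock) held at the max-Hz mode's value."""
    if new_vt < max_hz_vt:
        raise ValueError("QFT only increases the Vertical Total")
    return max_hz * max_hz_vt / new_vt


for multiplier in (1, 2, 4, 6):
    vt = 1125 * multiplier
    print(f"VT {vt}: {refresh_hz_for_vt(360, 1125, vt):.1f} Hz")
```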

BTRY B 529th FA BN · Post by **BTRY B 529th FA BN** » 19 Sep 2021, 15:26

Chief Blur Buster wrote: ↑
19 Sep 2021, 13:36

BTRY B 529th FA BN wrote: ↑
15 Sep 2021, 21:30
How do I do this for my PG259QNR? I've opened CRU, clicked Add, and this shows up. I'm assuming the refresh rate should say 360Hz. I have the NVCP set to Fixed Refresh. Monitor OSD info says 360Hz. I've manually set the CRU to 360Hz and the sans serif turns red.
You've correctly loaded up the max-Hz timings and changed to Manual. Now you need to:

1. Move first radio button from "Back porch" to "Total"
2. Move second radio button from "Refresh rate" to either "Horizontal" or "Pixel Clock"
3. Now the ONLY box you want to edit is the Vertical Total (the number "1125" under Vertical and to right of Total)

Increasing that will lower the refresh rate. In all situations, your refresh cycle will be transmitted over the video cable at max Hz (1/360sec). Doubling the VT will create a half-Hz mode with a max-Hz DP/HDMI cable transmit velocity. Quadrupling the VT will create a quarter-Hz DP/HDMI mode with a max-Hz cable transmit velocity.

Don't edit other numbers; the Hz will automatically recalculate every time you edit the Vertical Total (increasing the "1125" to a bigger number).

Thanks. I did read the directions several times through in the opening post. But am I supposed to manually set the refresh rate to 360Hz and then start doubling the Vertical Total? Because the refresh rate starts at 60Hz. Shouldn't it start at 360Hz? If I leave the refresh rate at 60Hz and start doubling the Vertical Total, the refresh rate drops to 30Hz, lol. It seems like I'm doing something wrong. I'm assuming it should detect the 360Hz refresh rate. If not, why does it start at 60Hz?

YouTube · Post by **Chief Blur Buster** » 19 Sep 2021, 16:27

BTRY B 529th FA BN wrote: ↑
19 Sep 2021, 15:26
Thanks. I did read the directions several times through in the opening post. But am I supposed to manually set the refresh rate to 360Hz and then start doubling the Vertical Total? Because the refresh rate starts at 60Hz. Shouldn't it start at 360Hz? If I leave the refresh rate at 60Hz and start doubling the Vertical Total, the refresh rate drops to 30Hz, lol. It seems like I'm doing something wrong. I'm assuming it should detect the 360Hz refresh rate. If not, why does it start at 60Hz?

I don’t understand your question, but the ToastyX interface is indeed confusing.

You’re supposed to start at max Hz. The plug and play list may be in scrambled or reverse order; simply load the existing highest-Hz mode. It’s an existing 360 Hz mode in the “plug and play” signalling of the monitor. Then modify VT to lower the refresh rate while keeping 1/360sec frame transport over the cable.

It may be in a different part of ToastyX (e.g. the CEA-861 Extension Block) if a 360 Hz mode is missing in one part of ToastyX, as there have been multiple plug-and-play standards (EDID, E-EDID, DisplayID, extensions like CEA-861, etc.), so the soup of resolutions and refresh rates is all over the place.

bhoff · Post by **bhoff** » 21 Oct 2021, 17:41

I tried this with my BenQ XL2411P but I get a black screen, is it not compatible with this monitor?

I tried doing 144 to 60 & 120 to 60 but always get a black screen. See attached image

YouTube · Post by **Chief Blur Buster** » 22 Oct 2021, 09:59

bhoff wrote: ↑
21 Oct 2021, 17:41
I tried this with my BenQ XL2411P but I get a black screen, is it not compatible with this monitor?

I tried doing 144 to 60 & 120 to 60 but always get a black screen. See attached image

To give advice, I need a screenshot of the original pre-tested working 144 Hz non-QFT mode too. Thanks!

Blur Busters Forums
