Pre-Rendered Frames (scam!?)

Everything about latency. Tips, testing methods, mouse lag, display lag, game engine lag, network lag, whole input lag chain, VSYNC OFF vs VSYNC ON, and more! Input Lag Articles on Blur Busters.
RealNC
Site Admin
Posts: 3730
Joined: 24 Dec 2013, 18:32

Re: Pre-Rendered Frames (scam!?)

Post by RealNC » 19 Mar 2021, 09:27

Kamen Rider Blade wrote:
19 Mar 2021, 02:04
Chief Blur Busters, do most monitors wait for 1 entire frame of data before starting the rasterization process across each line?
Absolutely not. That would be awful. Most monitors only need a couple dozen scanlines, or a couple hundred at worst. However, this assumes the monitor is running at its native res. If you use a non-native res with display scaling rather than GPU scaling, then many monitors will indeed need more scanlines before they start displaying anything, since they need to upscale the source signal and that requires processing. Even then, however, it's rather rare that they need to buffer a whole frame.
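
For a rough sense of scale, here is a toy calculation (illustrative numbers only; the 1125-line vertical total is a common but assumed value):

Code: Select all

#include <cstdio>

// Toy calculation of the lag added by buffering a small number of scanlines.
// Assumes a 1080p 144Hz panel with 1125 total lines per refresh (illustrative).
int main() {
    const double refreshHz  = 144.0;
    const int    totalLines = 1125;                    // 1080 active + blanking
    const double scanrate   = refreshHz * totalLines;  // ~162,000 lines/sec
    const int    buffered[] = {24, 200};               // "couple dozen" vs "couple hundred"
    for (int lines : buffered)
        printf("%3d buffered scanlines = %.2f ms of added lag\n",
               lines, lines / scanrate * 1000.0);
}

Even the couple-hundred-line worst case comes out well under 2ms, far from the ~6.9ms of a full 144Hz frame.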

TVs are a different story. It's common there to see not only one, but several frames being buffered. This is usually circumvented by the TV's "game mode" setting.

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: Pre-Rendered Frames (scam!?)

Post by Chief Blur Buster » 19 Mar 2021, 18:02

Rasterization = Scanout, And Yes, Both Are Real Time Nowadays
Kamen Rider Blade wrote:
19 Mar 2021, 02:04
Chief Blur Busters, do most monitors wait for 1 entire frame of data before starting the rasterization process across each line?
Nearly all of them have done subrefresh sync for the last 10 years. It's been a decade since most of them stopped adding full-framebuffer latency.

First, At The Root Of Common Misconceptions: Lag Testing Methods Vary

The latency numbers you see on the Internet that measure more than a frame are simply an artifact of latency measurement methodology. Some people measure using a VSYNC ON latency tester, which can never show subrefresh latency for the screen's bottom edge, while others use VSYNC OFF latency testers. So one site may measure 20ms and a different site 3ms. The moral of the story is that latency benchmarks have different stopwatching methods. The stopwatch start can vary between Present() and start-of-VBI (making the latency stopwatch start sensitive to sync technology), and the latency stopwatch end can vary too (at which GtG % it stops, since a pixel fades in gradually -- GtG is simply a slow fade). But if you use a 1000fps VSYNC OFF latency tester, all current gaming monitors measure subrefresh Present()-to-photons latency nowadays -- even if a different site measures 20ms of lag because they used a 1080p 60Hz VSYNC ON lag tester device (e.g. Leo Bodnar) on a 240Hz monitor, creating 3 or 4 concurrent latency weak links, not applicable to your gaming, from the flawed lag testing method alone. That doesn't help educate people that subrefresh latency has been routine on desktop gaming LCDs for over ten years.

(Because different websites use different latency stopwatching methods, always compare lag numbers against numbers on the same website, and try to determine whether their latency testing method uses the same sync technology you plan to use. LagTester.com and TomsHardware.com use VSYNC ON lag test devices, while RTINGS.com / TFTCentral.co.uk use VSYNC OFF lag test devices.)
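
As a toy model of how the same 240Hz monitor can produce a ~20ms number on one site and a ~3ms number on another (every component number below is an illustrative assumption, not a measurement):

Code: Select all

#include <cstdio>

// Toy model: the same monitor measured two ways. Every component number here
// is an illustrative assumption, not a measurement.
int main() {
    const double processing = 1.5;  // scaler/dejitter buffering (assumed)
    const double gtg        = 1.5;  // GtG fade until the sensor trips (assumed)

    // Method A: 1080p60 VSYNC ON tester (Leo Bodnar style), bottom-edge sensor.
    // The stopwatch starts at the top of the 60Hz signal, but the bottom edge
    // is not even transmitted until nearly a full 1/60sec scanout later.
    const double scanout60 = 1000.0 / 60.0;   // ~16.7 ms
    printf("60Hz VSYNC ON, bottom edge: ~%.0f ms\n", scanout60 + processing + gtg);

    // Method B: 1000fps VSYNC OFF tester at native 240Hz. The frameslice with
    // the flash reaches the sensor within a fraction of one refresh cycle.
    const double sliceDelivery = 0.5;  // cable delivery of the frameslice (assumed)
    printf("240Hz VSYNC OFF:            ~%.1f ms\n", sliceDelivery + processing + gtg);
}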

All the big names (ASUS, Acer, ViewSonic, BenQ, etc.) reliably do that now for their gaming monitors, as do most 144Hz-to-240Hz LCDs when run at their native refresh rate.

Next, More About Modern Sub-Refresh Latency Capabilities

You might want to re-read my post above to understand this better, as well as the Custom Resolution Utility Glossary.

Blur Busters Terminology Glossary
"Rasterization" = "Scanout"
(same thing)

So you can re-read our scanout-related articles as rasterization-related. Cable rasterization is identical to panel rasterization on most panels, so they're usually sync'd (at least at max Hz). I talk a lot about "cable=panel scanout sync" and such.

So most gaming monitors (144Hz and up) now sync cable scanout to panel scanout, since low lag is important. This is now common for modern IPS and TN panels. They don't need to framebuffer a full frame; they only need to linebuffer a few lines (just enough for HDMI/DP micropacket dejitter + color processing + overdrive processing + etc). You can also study the Area 51 forums to learn more about how modern LCDs operate.

Now, each pixel in this diagram has subrefresh latency on most 240Hz LCDs. Present()-to-photons latency is mostly only cable-transmission and GtG latency in this specific case; I've seen as little as ~1.5ms of latency between the software API Present() and the light glowing from the pixels:

Image

The first scanline at the top edge of a frameslice has microseconds of latency on an analog VGA output (like an old GTX 680 connected via VGA to a CRT), but that grows to about 2-3ms of latency on, say, a 240Hz 1ms IPS panel or 240Hz 1ms TN panel. This is because of codec and packet latency (modern digital outputs such as DisplayPort/HDMI are essentially a sort of modem with packetization), and because the display motherboard has to buffer a few rows of pixels to dejitter the packets first. And pixel transitions (LCD GtG) take time too.

But the bottom line is that's very sub-refresh; the cable is streaming almost straight onto the panel "in a manner of speaking" with only a small rolling window of a few pixel rows.

There are situations where lag can appear if your panel is rasterizing at a fixed speed (e.g. fixed horizontal scanrate), which means some 240Hz panels have high 60Hz lag -- unless you use Quick Frame Transport tricks, which are simply a fast scanout followed by a long blanking interval. With QFT, a 60Hz refresh cycle can be transmitted top-to-bottom over the cable in 1/240sec, while simultaneously being synchronously refreshed top-to-bottom onto the screen in 1/240sec, for lower VSYNC ON latency -- if you need a low-lag low-Hz mode on a high-Hz-capable panel, for example.

VRR is rasterized too (which is why I can get FreeSync working on certain CRTs via an HDMI-to-VGA adaptor trick) -- it's just a varying-size blanking interval between refresh cycles. We've been using the same raster method for 100 years, from the first 1920s TVs to current 2020s DisplayPort LCDs, which still use the same raster sequence (in digital form) regardless of refresh rate or VRR technology.
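
Since a VRR refresh cycle is just a normal scanout followed by however much blanking the frametime dictates, the variable VBI is easy to compute (a sketch, assuming a panel that always scans at an illustrative 240Hz x 1125-line velocity):

Code: Select all

#include <cstdio>

// Sketch: VRR as a variable-size blanking interval. Assumes the signal always
// scans at max-Hz velocity (240Hz x 1125 total lines -- illustrative numbers).
int main() {
    const double scanrate    = 240.0 * 1125;  // 270,000 lines/sec, fixed
    const int    normalLines = 1125;          // lines in one max-Hz refresh cycle
    const double frametimes[] = {4.17, 10.0, 16.7};   // 240fps, 100fps, 60fps
    for (double ms : frametimes) {
        int total = (int)(scanrate * ms / 1000.0);
        printf("%5.2f ms frametime -> %4d extra blanking lines inserted\n",
               ms, total - normalLines);
    }
}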

GOOD: Any 240Hz panel run with 240Hz source (usually sync'd rasterization. Cable=panel realtime scanout)
BAD: Fixed-horizontal-scanrate 240Hz panel run at ATSC HDTV 60Hz (out-of-sync rasterization)
GOOD: Variable-horizontal-scanrate 240Hz panel run at ATSC HDTV 60 Hz (sync'd rasterization. Cable=panel realtime scanout)
BAD: Fixed-horizontal-scanrate 240Hz panel run with XBox/PS4 120Hz (out-of-sync rasterization)
GOOD: Variable-horizontal-scanrate 240Hz panel run with XBox/PS4 120Hz (sync'd rasterization. Cable=panel realtime scanout)
GOOD: Fixed-horizontal-scanrate 240Hz panel run with Quick Frame Transport 60Hz (Vertical Total 4500 to allow 60Hz refresh cycles transmitted in 1/240sec over cable for panels that can't scan slower than 1/240sec sweep) (sync'd rasterization. Cable=panel realtime scanout)
Etc.
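
The Quick Frame Transport entries above are just timing arithmetic. Here's a minimal sketch of where the "Vertical Total 4500" figure comes from (assuming a 1080p panel whose native 240Hz signal uses 1125 total lines -- a common but illustrative value):

Code: Select all

#include <cstdio>

// Minimal sketch: deriving a Quick Frame Transport (large Vertical Total)
// signal for a low-Hz mode on a panel that can't sweep slower than 1/240sec.
int main() {
    const double maxHz    = 240.0;
    const int    nativeVT = 1125;   // total lines of the native 240Hz signal (assumed)
    const double targetHz = 60.0;   // desired low-lag low-Hz mode

    // Keep the horizontal scanrate at native velocity...
    const double scanrate = maxHz * nativeVT;     // 270,000 lines/sec

    // ...and stretch the vertical total so each refresh lasts 1/60sec, with the
    // 1080 active lines still delivered in only 1/240sec of it.
    const int qftVT = (int)(scanrate / targetHz); // = 4500
    printf("QFT Vertical Total for %.0fHz: %d lines (%d of them blanking)\n",
           targetHz, qftVT, qftVT - 1080);
}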
Chief Blur Buster wrote:Crossposting an educational thread.
hmukos wrote:
22 May 2020, 19:02
I would understand this if multiple frameslices to be scanned out in this cycle would be somehow preremembered and scanned only after as a whole. But doesn't scanout happen in realtime and show frame as soon as it is rendered?
There's a cable scanout and a panel scanout, and each can be faster or slower than the other.

Jorim nailed most of it at the panel scanout level, though there should be two separate scanout diagrams to help understand the context (a scanout diagram for the cable, and a scanout diagram for the panel) whenever the two scanouts are different velocities.

However, there are some fundamental clarifications that are needed.

The frameslices are still compressed together because the frameslices are injected at the cable level, but the monitor motherboard is buffering the 60Hz refresh cycle to scan it out in 1/240sec.

Fixed-Scanrate Panels

Fixed-scanrate panels create input lag at refresh rates lower than max-Hz, unless Quick Frame Transport is used to compensate.
I would bottom-align the 60Hz like this, however:

Image

Scaler/TCON scan conversion "compresses the scanout downwards" towards the delivery time of the final pixel row. So about 3/4ths of the 60Hz scanout is delivered before the panel begins refreshing at full 1/240sec velocity.
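
That 3/4ths figure is plain arithmetic (a sketch using the same timings):

Code: Select all

#include <cstdio>

// Sketch of the bottom-aligned scan conversion math: buffer the 60Hz cable
// scanout just long enough that a native-velocity 1/240sec panel sweep
// finishes exactly as the final cable pixel row arrives.
int main() {
    const double cableMs = 1000.0 / 60.0;    // 60Hz cable scanout: ~16.7 ms
    const double panelMs = 1000.0 / 240.0;   // panel sweep at native speed: ~4.2 ms
    const double waitMs  = cableMs - panelMs;
    printf("Panel refresh starts %.1f ms in -- after %.0f%% of the cable scanout\n",
           waitMs, 100.0 * waitMs / cableMs);
}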

Also, this is sometimes done intentionally by a panel with a strobed backlight or a scanning backlight, to artificially increase the size of the VBI and reduce strobe crosstalk (double-image effects), by creating a VBI large enough to hide the LCD GtG pixel response between refresh cycles (hiding GtG while the backlight is OFF).

Flexible Scanrate Panels

However, some panels are scanrate multisync, such as the ASUS XG248, which has excellent low 60Hz console lag:

Image

Learn more about Quick Frame Transport

For more information about compensating for buffering lag, you can use Quick Frame Transport (Large Vertical Totals) to lower latency of low refresh rates on 240Hz panels: Custom Quick Frame Transport Signals.

The Quick Frame Transport creates this situation:

Image

This can dramatically reduce strobe lag, but Microsoft and NVIDIA need to fix their graphics drivers to use end-of-VBI frame Present(). Look at the large green block: frame Present() needs to be at the END of the green block, to be closer to the NEXT refresh (less lag!).

Microsoft / NVIDIA Limitation Preventing QFT Lag Reductions

Unfortunately, Quick Frame Transport currently only reduces lag if you simultaneously use RTSS Scanline Sync (with negative tearline indexes) to move Present() from the beginning of the VBI to the end of the VBI. So hacks have often been needed.

This simulates VSYNC ON with an input-delayed Present() as late as possible into the vertical blanking interval.

The software API, Present(), is built into all graphics drivers and Windows to present a frame from software to the GPU. During VSYNC ON, Present() blocks (doesn't return to the calling software) until the blanking interval -- but it only blocks until the very beginning of the VBI (right after the final scan line) before releasing. Many video games do the next keyboard/mouse read at that instant, right after Present() returns. So it's in our favour to delay the return from Present() until the very end of the VBI: that delays input reads closer to the next refresh cycle! Thus, delayed Present() return = lower input lag, because keyboard/mouse input is read closer to the next refresh cycle.
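
Here is a minimal sketch of that game-loop timing dependency (the helper functions are hypothetical stand-ins, not a real renderer):

Code: Select all

#include <chrono>
#include <thread>

// Hypothetical stand-ins for the real game/driver functions:
void readMouseKeyboard() {}     // input sampling
void simulateAndRender() {}     // game logic + GPU submission
void presentVsyncOn() {         // models a blocking VSYNC ON Present()
    // Real drivers unblock at the START of the VBI; QFT wants the END of it,
    // so that the readMouseKeyboard() below samples fresher input.
    std::this_thread::sleep_for(std::chrono::milliseconds(16));
}

int main() {
    for (int frame = 0; frame < 600; ++frame) {
        readMouseKeyboard();    // runs the instant the previous Present() returns
        simulateAndRender();
        presentVsyncOn();       // the later this unblocks, the lower the input lag
    }
}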

A third-party utility, RTSS, has a mode called "Scanline Sync" that can be used for do-it-yourself Quick Frame Transport.

Image
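
Conceptually, a scanline-sync limiter just waits for a chosen raster position before presenting. A sketch of the idea (the raster query below is a simulated, hypothetical stand-in; RTSS reads the real raster position through low-level driver interfaces):

Code: Select all

#include <cstdio>

const int kTotalLines = 4500;        // QFT 60Hz signal (Vertical Total 4500)

int getCurrentScanline() {           // hypothetical stand-in: simulated raster sweep
    static int line = -1;
    return line = (line + 1) % kTotalLines;
}

void presentNow() {}                 // hypothetical stand-in for a VSYNC OFF Present()

int main() {
    // RTSS-style "negative" tearline index: aim a few lines before end-of-VBI,
    // so the tearline lands inside the blanking interval (invisible = no tear)
    // and Present() fires as close to the NEXT refresh as possible.
    const int targetLine = kTotalLines - 10;
    for (int frame = 0; frame < 600; ++frame) {
        while (getCurrentScanline() != targetLine) { /* busy-wait on raster */ }
        presentNow();
        printf("frame %d presented at scanline %d\n", frame, targetLine);
    }
}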

That then dramatically reduces VSYNC ON input lag (and the lag of anything that's not VSYNC OFF) on both fixed-scanrate and flexible-scanrate panels, because the 60Hz scanout velocity becomes the same as the native velocity of 240Hz.

Great for reducing strobe lag, too!

(Not everyone at Microsoft, AMD, and NVIDIA fully understand this.)

We successfully reduced the input lag of the ViewSonic XG270 PureXP+ by 12 milliseconds with this technique, while ALSO reducing strobe crosstalk. ViewSonic XG270 120Hz PureXP+ Quick Frame Transport HOWTO.

Earlier, I tried large Front Porches, hoping that Microsoft delayed the unblocking of Present() until after the Front Porch. But unfortunately, Microsoft/NVIDIA unblock Present() during VSYNC ON at the END of the visible refresh (before the first line of the Front Porch). Arrrrrrgh. Turning Easy QFT into Complex QFT. :(

But Wait! G-SYNC and FreeSync are Natural Quick Frame Transports

Want an easier Quick Frame Transport? Just use a 60fps cap at 240Hz VRR. All VRR GPUs always transmit refresh cycles at maximum scanout velocity. Present() immediately starts delivering the first scanline at that instant (if the monitor is not currently busy refreshing or repeat-refreshing), since the monitor is slaved to the VRR signal.

Image

Present() is effectively permanently connected to the end of the VBI during VRR operation -- unless the monitor is still busy refreshing (frametime faster than max Hz) or busy repeat-refreshing (frametime slower than min Hz). As long as frametimes stay within the panel's VRR range, software 100% controls the timing of the monitor's refresh cycles!
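
In other words, under VRR a frame cap becomes the refresh scheduler. A sketch (vrrPresent() is a hypothetical stand-in for a real swapchain Present() on a VRR display):

Code: Select all

#include <chrono>
#include <thread>

// Hypothetical stand-in: on a VRR display, Present() immediately triggers a
// full-velocity scanout (~4.2 ms on a 240Hz panel) instead of waiting for a
// fixed refresh schedule.
void vrrPresent() {}

int main() {
    using clock = std::chrono::steady_clock;
    const auto frametime = std::chrono::microseconds(16667);   // 60fps cap
    auto next = clock::now();
    for (int frame = 0; frame < 600; ++frame) {
        // read input + render here, as late as possible before the deadline
        std::this_thread::sleep_until(next);
        vrrPresent();       // software, not the monitor, sets the refresh cadence
        next += frametime;
    }
}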

This is why emulator users love high-Hz G-SYNC displays for lower emulator lag.

60fps at 240Hz has much lower latency than a 60Hz monitor, because of the ultrafast 1/240sec scanout automatically included with all 60fps material on all VRR monitors! The magic of delivering AND refreshing a "60Hz" refresh cycle in only 4.2 milliseconds (both cable and panel) means ultra-low latency for capped VRR.

This is why VRR is the world's lowest-latency "non-VSYNC-OFF" sync technology.

It doesn't help when you need to use fixed-Hz (consoles, strobing, non-VRR panels).

This Post Helps You To:
- Understand Flexible-Scanrate LCD panels (most 1080p 144Hz panels, few 1080p 240Hz panels)
- Understand Fixed-Scanrate LCD panels (most 1080p 240Hz panels, most 144Hz 1440p panels)
- Understand Quick Frame Transport
- Understand Quick Frame Transport's ability to workaround low-Hz lag on Fixed-Scanrate Panels
- Understand VRR
- Understand How VRR is similar to Quick Frame Transport
- If you are a software developer, understand that software controls the triggering of a variable refresh rate monitor's refresh cycles via Present()
Variables matter! You can get lag if you don't know how to choose the right gaming monitor to pair up with your device, as not all 240Hz panels can do low-Hz properly.

To keep cable=panel synchronous you want:
(A) Choose the right sync technology for your application. VSYNC OFF, VSYNC ON, RTSS Scanline Sync, GSYNC, FreeSync, etc. They all have pros/cons. CS:GO might work better with VSYNC OFF, while stuttery games might work better with GSYNC (if stutter is so bad it hurts your aiming).
(B) Understand absolute latency is not everything for every single game. You've got other latencies involved such as latency jitter (aka stutter), so you may need to control things a bit via frame rate caps or alternative sync technologies, etc.
(C) Choose the right refresh rate. If you don't want to research which high-Hz panels can do low lag at low-Hz, then try to always use max refresh rate. 60fps at 240Hz is lower lag than 60fps at 60Hz.
(D) If you need lower-Hz low-lag, choose the right workaround (variable-scanrate-capable panel OR quick frame transport). Remember consoles don't do quick frame transport tricks with custom resolution utilities, but PCs can.
(E) Most latency comes from other parts of the latency chain rather than the display nowadays.

TL;DR: With proper user configuration of a typical 240Hz gaming monitor, 90%+ of the latency is no longer the display's fault (for recent 240Hz LCDs run at their native 240Hz refresh rate).

Those who want to read more about the Present()-to-Photons black box can read www.blurbusters.com/area51 and our Area51 forums. We talk a lot about rasterization stuff there (search keyword "scanout").

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: Pre-Rendered Frames (scam!?)

Post by Chief Blur Buster » 19 Mar 2021, 18:17

chenifa wrote:
19 Mar 2021, 07:02
Maybe this one works
https://ufile.io/plu0qygp
Thanks for this. Now I can see it.

Those are just momentary frametime spikes, so I suspect these are just mis-estimates of frame queue depth. I can tell your logging software is trying to estimate queue depth (floating point numbers) rather than reading exact queue depth (exact integers). Ignore those. Besides, these are just instantaneous latency spikes that last only a hundredth of a second and go back to normal immediately after.

If you feel you have sustained latency problems, they are definitely not caused by those brief ~1/100sec moments, as the frametime spikes are only instantaneous latency spikes for a single frame. If you're feeling sustained latency, it's caused by something else.

Accurate frame queue depth estimation is sometimes difficult to do, given that benchmark/logging software may not have full access to the game engine's behaviour or the driver's behaviour, and can only estimate it. So when your logging software says a frame queue depth of 3, it may be a false estimate caused by a frametime spike from something else. The number patterns of the frame queue depth clearly smell like "best effort estimate" rather than "guaranteed exact number" to me. Sometimes software has to estimate because it lacks APIs to get exact data from deeper inside the software it's trying to measure.
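
As a toy illustration of why estimated queue depths jump around (an assumed formula for illustration only -- not necessarily what any particular logging tool does):

Code: Select all

#include <cstdio>

// Toy illustration: dividing a measured latency sample by the frametime
// produces a fractional "queue depth" estimate, and a single momentary
// latency spike briefly reads as a deeper queue. Assumed numbers throughout.
int main() {
    const double frametimeMs = 8.3;                      // steady ~120fps
    const double latencies[] = {17.0, 17.0, 25.0, 17.0}; // one momentary spike
    for (double lat : latencies)
        printf("latency %4.1f ms -> estimated queue depth %.2f\n",
               lat, lat / frametimeMs);
}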

chenifa
Posts: 51
Joined: 31 Aug 2019, 18:07

Re: Pre-Rendered Frames (scam!?)

Post by chenifa » 21 Mar 2021, 07:02

Chief Blur Buster wrote:
19 Mar 2021, 18:17
chenifa wrote:
19 Mar 2021, 07:02
Maybe this one works
https://ufile.io/plu0qygp
Thanks for this. Now I can see it.

Those are just momentary frametime spikes, so I suspect these are just mis-estimates of frame queue depth. I can tell your logging software is trying to estimate queue depth (floating point numbers) rather than reading exact queue depth (exact integers). Ignore those.
It's nvidia frameview, but if it's not accurate, fair enough. Is there another, more accurate benchmark we could do?

RealNC
Site Admin
Posts: 3730
Joined: 24 Dec 2013, 18:32

Re: Pre-Rendered Frames (scam!?)

Post by RealNC » 27 Mar 2021, 08:12

chenifa wrote:
21 Mar 2021, 07:02
It's nvidia frameview, but if it's not accurate, fair enough. Is there another, more accurate benchmark we could do?
I don't know about an "accurate benchmark", but you can see the effects of the "low latency" setting (previously named "pre-rendered frames") without any tools.

Don't use furmark or anything. Just run the game. In my test, I use Witcher 3:
  • Disable G-SYNC in NVCP, because the "ultra" setting for low latency will force an FPS cap and you don't want that.
  • Set vsync to "on" or "off" in NVCP, depending on whether you want to test vsync's effect on this or not.
  • Set "preferred refresh rate" to "highest available" (because disabling g-sync reverts that option back to "app controlled.")
  • If you have a high-end GPU, enable DSR 4x so that you can be sure Witcher 3 will max out your GPU.
  • Set "low latency mode" to "off".
  • Start the game, select the highest resolution, crack all postprocessing and graphics settings to maximum.
  • Disable the "hardware cursor" setting in the game's graphics settings.
Now just move the mouse around. The in-engine rendered mouse cursor should be very laggy.

Next, exit the game, set "low latency mode" to "on" in NVCP, and start the game again. The in-game mouse cursor should now have less lag. With "ultra", it should have even less. The same goes for the game itself, of course: moving the mouse to pan the camera should now be less laggy. I use the game's software mouse cursor because it's quicker to just load the game and test on its title screen than to actually load into a game.

Any other GPU-heavy game will do for this test, of course. I just mentioned Witcher 3 here because it has the option to use a game-engine-rendered mouse cursor, which enables quick testing without having to load up the full game between each test.

In the past, with an old driver where the "low latency" setting was still called "prerendered frames", I did the above test with a setting of 7 prerendered frames through nvidia profile inspector. This produced the expected effect: input lag was extreme, like half a second or so. The lower I set it, the less input lag there was, with a setting of 1 (now called "on" in the NVCP) producing the least lag.
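
That half-second figure is roughly what queue arithmetic predicts (a toy calculation with an assumed GPU-bound framerate):

Code: Select all

#include <cstdio>

// Toy arithmetic: with a saturated GPU, each queued frame adds roughly one
// GPU frametime of input lag. The ~15fps GPU-bound frametime is an assumption
// chosen for illustration.
int main() {
    const double gpuFrametimeMs = 66.7;      // fully GPU-bound at ~15fps (assumed)
    const int depths[] = {1, 3, 7};
    for (int d : depths)
        printf("prerender queue of %d -> ~%3.0f ms of extra lag\n",
               d, d * gpuFrametimeMs);
}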

I can actually see the effect of "on" vs "ultra" when playing GPU-bound games that cannot maintain my target FPS. For example, I set my FPS cap to 100. When the game maxes out my GPU and falls below that, with "on" I can perceive that something is wrong: the near-zero-lag feel of g-sync + 100FPS cap goes away. If I set it to "ultra", I actually cannot tell anymore. 100FPS with 90% GPU load and, say, 90FPS with 99% GPU load feel the same to me. I'm sure there is still a small difference, but nothing I can actually perceive anymore.

timecard
Posts: 65
Joined: 25 Jan 2020, 01:10

Re: Pre-Rendered Frames (scam!?)

Post by timecard » 27 Mar 2021, 09:40

This is demonstrating the user-perceived impact of prerendered frames on the CPU, right?

RealNC
Site Admin
Posts: 3730
Joined: 24 Dec 2013, 18:32

Re: Pre-Rendered Frames (scam!?)

Post by RealNC » 27 Mar 2021, 13:23

timecard wrote:
27 Mar 2021, 09:40
This is demonstrating the user perceived impact of prerendered frames on the CPU right?
The impact on input lag.

I just re-ran this test myself. I set low latency mode to "off" in NVCP for the game, then configured 8 prerendered frames in profile inspector. Lag is still increased a lot, but not as much as when I was still running my old CPU, a 4c/4t i5 2500K. With the 8c/16t R7 3700X I use now, I'm not seeing the ridiculous half-a-second input lag anymore. But it's still bad -- just not ludicrously bad like it was on the old CPU.

It might be worth checking out what happens when disabling CPU cores and SMT/HT in the BIOS and running a modern CPU as 4c/4t.

Kamen Rider Blade
Posts: 61
Joined: 19 Feb 2021, 22:56

Re: Pre-Rendered Frames (scam!?)

Post by Kamen Rider Blade » 27 Mar 2021, 16:18

Should all games and real-time rendering software factor the panel's upper refresh rate limit into their rendering stack, and adjust their rendering to match it? Even if it's some form of VRR/G-SYNC/FreeSync or regular V-Sync?

Would this help with input latency?

chenifa
Posts: 51
Joined: 31 Aug 2019, 18:07

Re: Pre-Rendered Frames (scam!?)

Post by chenifa » 29 Mar 2021, 06:40

RealNC wrote:
27 Mar 2021, 08:12
I don't know about an "accurate benchmark", but you can see the effects of the "low latency" setting (previously named "pre-rendered frames") without any tools.
How do you explain the difference you're seeing between "on" and "ultra" with g-sync disabled? Both should have 1 pre-rendered frame, yet they feel different.
Even more dubious is that I can tell the difference between "off" + max pre-render 1 (set in nvinspector) and "ultra" in a game like kovaak, which doesn't max out the gpu at all.
My theory is that setting max prerender to 1 is only part of what low latency mode does, and that it has other ways to reduce latency (maybe it changes buffering).

RealNC
Site Admin
Posts: 3730
Joined: 24 Dec 2013, 18:32

Re: Pre-Rendered Frames (scam!?)

Post by RealNC » 29 Mar 2021, 10:05

chenifa wrote:
29 Mar 2021, 06:40
How do you explain the difference you're seeing between "on" and "ultra" with g-sync disabled? Both should have 1 pre-rendered frame, yet they feel different.
Even more dubious is that I can tell the difference between "off" + max pre-render 1 (set in nvinspector) and "ultra" in a game like kovaak, which doesn't max out the gpu at all.
My theory is that setting max prerender to 1 is only part of what low latency mode does, and that it has other ways to reduce latency (maybe it changes buffering).
You could always set it to 1 from within the nvidia panel before the "ultra" mode was added. So yes, "ultra" doesn't just mean a value of 1, since we already had that. The "ultra" setting basically acts like a prerender queue of 0: the driver will completely prevent queuing frames before the GPU is actually ready to render them (or at least until shortly before it's ready).

Normally, a frame limiter does that as well -- except when the game is not reaching the limiter's target FPS because GPU load is too high. When that happened, there was a small latency increase, but not so small as to be completely unnoticeable. With the "ultra" setting, it's now, at least for me, unnoticeable.
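
A toy simulation of that behaviour (illustrative numbers; real drivers cap the backlog at the prerender limit rather than letting it grow forever):

Code: Select all

#include <cstdio>

// Toy simulation: a frame limiter submitting every capMs while the GPU drains
// a frame every gpuMs. Below the cap, no queue forms; once GPU-bound, work
// backs up (in reality, only until the prerender limit throttles the CPU).
int main() {
    const double capMs = 10.0;                 // 100fps limiter
    const double gpuLoads[] = {8.0, 12.0};     // reaching the cap vs GPU-bound
    for (double gpuMs : gpuLoads) {
        double backlogMs = 0.0;
        for (int i = 0; i < 100; ++i) {        // simulate 100 submitted frames
            backlogMs += gpuMs - capMs;        // GPU work added minus time elapsed
            if (backlogMs < 0) backlogMs = 0;
        }
        printf("GPU %4.1f ms vs %.0f ms cap -> backlog %5.1f ms (%s)\n",
               gpuMs, capMs, backlogMs,
               backlogMs > 0 ? "frames queue up" : "no queue");
    }
}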

I don't know why this would help in situations where the GPU isn't being maxed out though.
