PresentMon: measure both frame throughput (fps) and latency

Everything about latency. Tips, testing methods, mouse lag, display lag, game engine lag, network lag, whole input lag chain, VSYNC OFF vs VSYNC ON, and more! Input Lag Articles on Blur Busters.
1000WATT
Posts: 391
Joined: 22 Jul 2018, 05:44

Re: PresentMon: measure both frame throughput (fps) and late

Post by 1000WATT » 15 Jul 2019, 01:43

crossjeremiah wrote:
60hz (xl2430t) = 21ms with scanline sync + vsync (0.01) method
60hz (aw2518h) = 16.82ms with scanline sync + vsync (0.01) method
I just ran the program while running Dolphin and it told me the latency.
Can someone explain how the same settings at the same Hz give different results?
I often do not state my thoughts clearly; Google Translate is far from perfect, and I make my own mistakes on top of the translator's. Do not take me seriously.

Vleeswolf
Posts: 37
Joined: 25 Aug 2017, 15:59

Re: PresentMon: measure both frame throughput (fps) and late

Post by Vleeswolf » 15 Jul 2019, 04:04

1000WATT wrote:
crossjeremiah wrote:
60hz (xl2430t) = 21ms with scanline sync + vsync (0.01) method
60hz (aw2518h) = 16.82ms with scanline sync + vsync (0.01) method
I just ran the program while running Dolphin and it told me the latency.
Can someone explain how the same settings at the same Hz give different results?
Assuming the GPU model and NVIDIA/AMD control panel settings were identical, the reason could be that the XL2430T does not have G-SYNC, while the AW2518H does. G-SYNC allows flips to occur as soon as frames are ready, instead of waiting for the fixed-frequency vblank signal.

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: PresentMon: measure both frame throughput (fps) and late

Post by Chief Blur Buster » 15 Jul 2019, 11:47

Remember, PresentMon does NOT measure present-to-photons latency on a per-pixel basis.

Latency gradients can sometimes be more important than absolute lag.

For example, the 21ms reading might include a "plus one refresh" latency immediately above the tearline. If you steer or roll a tearline, remember that the input lag immediately above the tearline is one frame higher than the input lag immediately below it.

PresentMon simply benchmarks one small part of the GPU pipeline: how long Present() takes -- the API call a game uses to deliver a frame to the GPU. This often coincides with the first pixels beginning to be output at the monitor output, but it doesn't take into account things like scanout latency or monitor latency. Scanout latency is a complicated topic with VSYNC OFF, but you can study the high speed videos of scanout latency and also the diagrams to understand that latency can still be distorted in the chain AFTER PresentMon: the true present-to-photons latency.
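As a rough illustration of the scanout part of that chain, here is a minimal sketch. It assumes an idealized fixed-Hz display that paints rows top-to-bottom over one refresh period, with no VBI time and no monitor processing lag; the function name is made up for this example.

```python
# Sketch: scanout latency varies per pixel row on a fixed-Hz display.
# Idealized model: rows are painted top-to-bottom over one refresh period.

def scanout_latency_ms(row, total_rows, refresh_hz):
    """Time from start of scanout until a given row's pixels are emitted."""
    refresh_period_ms = 1000.0 / refresh_hz
    return refresh_period_ms * (row / total_rows)

# At 60 Hz, the bottom of a 1080-row panel is painted ~16.7 ms later
# than the top row -- latency PresentMon never sees:
print(scanout_latency_ms(0, 1080, 60))     # 0.0
print(scanout_latency_ms(1079, 1080, 60))  # ~16.65
```

This is why "per-pixel" latency matters: two pixels in the same frame can differ by almost a full refresh period before any monitor lag is even counted.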

So it is sometimes in one's interest to stick to perfect Scanline Sync (VSYNC + 0.0000000000 perfection), rather than use the 0.01, which causes the rolling-tearline effect and the sawtooth-varying input lag caused by a rolling tearline. I prefer slightly laggier but perfect zero-sawtooth.

The most glassfloor result possible with Scanline Sync is to use a ForceFlush of 1 or 2, make sure that GPU utilization is only roughly 30%, calibrate the VSYNC OFF tearline stationary, and then steer the stationary tearline off the screen. If the tearline jumps around too much or vibrates with a wide amplitude, use a large vertical total that is taller than the jitter amplitude of your tearline, and steer the tearline's jitter completely into the enlarged VBI between the two refresh cycles.
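The large-vertical-total trick can be sanity-checked with back-of-envelope arithmetic. A sketch, with example mode numbers (not recommendations) and a made-up function name:

```python
# Sketch: will a jittering tearline fit inside an enlarged VBI?
# vertical_total, active_lines and jitter_amplitude are in scanlines.

def tearline_fits_in_vbi(vertical_total, active_lines, jitter_amplitude):
    vbi_lines = vertical_total - active_lines
    return jitter_amplitude < vbi_lines

# A 1080-line mode with vertical total 1350 leaves a 270-line VBI,
# enough to absorb a tearline vibrating over ~200 scanlines:
print(tearline_fits_in_vbi(1350, 1080, 200))  # True
print(tearline_fits_in_vbi(1125, 1080, 200))  # False (standard VT 1125)
```

If the jitter amplitude exceeds the blanking interval, the tearline will intermittently poke into the visible picture no matter where you steer it.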

Add the forceflush parameter to the RTSS config file, reduce game detail, reduce refresh rate, and make sure the refresh rate is at the bottom end of your framerate range (e.g. a "fluctuating 100fps-200fps" game may sometimes look much better at 100Hz than at 144Hz). Also make sure GPU utilization is low (30%-ish) until everything clicks into the 0.00000000000000 glass floor, with no differential, and things become magically smooth with consistent latency.

The bonus is that you've got glassfloor latency per-pixel, and it also produces the best strobefeel (ULMB, LightBoost, DyAc, etc.) with the most predictable aimfeel, even though it might be a few milliseconds laggier. That's because you don't want lag randomization -- I prefer glassfloor 10ms consistency over erratic 3ms-9ms -- and you don't want the weird latency-gradient effect of a slowly rolling tearline, where half of the screen on the other side of the tearline is noticeably laggier.

If you're at 240Hz, the refresh-cycle granularity is only about 4ms, but at 100Hz the refresh-cycle granularity is 10ms. So controlling the lag differentials on opposite sides of a tearline is more critical at lower refresh rates (especially since strobed operation, ULMB/LightBoost, tends to look vastly smoother, jitter-free and less crosstalky at lower Hz when you maintain perfect fps=Hz to avoid the amplified microstuttering effect).
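The refresh-cycle granularity mentioned above is just the refresh period, which a trivial sketch makes concrete (helper name is made up):

```python
# Sketch: refresh-cycle granularity, i.e. the worst-case lag penalty of
# landing on the wrong side of a refresh boundary.

def refresh_granularity_ms(hz):
    return 1000.0 / hz

for hz in (60, 100, 144, 240):
    print(f"{hz} Hz -> {refresh_granularity_ms(hz):.2f} ms")
```

At 240Hz a one-refresh error costs about 4.17ms; at 100Hz the same error costs a full 10ms, which is why the lower the refresh rate, the more a tearline's one-frame lag step matters.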

PresentMon is amazing and definitely deserves a place in a great tweaker's toolbox, but one also needs to understand how VSYNC OFF interacts with scanout, and the difference in latency above/below a tearline. This matters less at 300fps, where the tearlines are sprayed randomly all over the place and you just want the lowest average competitive latency, but it matters more if you want the most glassfloor per-pixel latency, by keeping tearlines nearly permanently offscreen in the VBI between refresh cycles.

Buffer queues are used even with VSYNC OFF to make things much smoother, so it's really hard to tweak. The saving grace is GPU overkill nowadays -- some games we play only utilize ~30% of the GPU -- and that presents an opportunity to eliminate the extra queue of framebuffers to save lag without adding stutters. But it's not easy.
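The latency cost of that extra queue of framebuffers can be approximated with a one-liner (a sketch assuming frames drain from the queue at the display framerate; numbers and the function name are illustrative):

```python
# Sketch: extra latency contributed by a queue of pre-rendered frames,
# assuming frames drain from the queue at the display framerate.

def queue_latency_ms(queue_depth, fps):
    return queue_depth * 1000.0 / fps

print(queue_latency_ms(3, 60))  # 50.0 (a 3-deep queue at 60 fps)
print(queue_latency_ms(0, 60))  # 0.0  (no queue, no added lag)
```

This is why eliminating render-ahead buffering saves so much lag at 60Hz: every queued frame costs a full frame time.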

It takes a lot of work to optimize the imperfections away and reach the 0.000000000 differential (it's hard: you need 30% GPU utilization + lower detail + lower Hz + Force Flush enabled), but the 0.0000000 differential is magical if you succeed in tweaking the hard-to-achieve, stutterless, glassfloor 0.0000-perfect framerate=Hz match in a low-lag manner.

However, sometimes that's not your goal: you don't mind the tearing, can't feel the latency inconsistencies, or you're using a high enough framerate / refresh rate that the refresh-rounding effects (4ms at 240Hz) don't matter much to you -- so the 0.01 differentials are a much easier compromise.
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter

Forum Rules wrote:  1. Rule #1: Be Nice. This is published forum rule #1. Even To Newbies & People You Disagree With!
  2. Please report rule violations If you see a post that violates forum rules, then report the post.
  3. ALWAYS respect indie testers here. See how indies are bootstrapping Blur Busters research!

crossjeremiah
Posts: 44
Joined: 14 Aug 2017, 10:21

Re: PresentMon: measure both frame throughput (fps) and late

Post by crossjeremiah » 15 Jul 2019, 15:15

So with VSYNC ON and no Scanline Sync I'm getting 16ms at 60Hz. With VSYNC OFF + Scanline Sync I'm getting 0.25ms (that can't be right), and with Vulkan I'm getting 0.05ms (that doesn't seem right either).

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: PresentMon: measure both frame throughput (fps) and late

Post by Chief Blur Buster » 15 Jul 2019, 17:01

crossjeremiah wrote:So with VSYNC ON and no Scanline Sync I'm getting 16ms at 60Hz. With VSYNC OFF + Scanline Sync I'm getting 0.25ms (that can't be right), and with Vulkan I'm getting 0.05ms (that doesn't seem right either).
They're correct if they're Present() delivery times. That is not button-to-photons latency.

When I was working with an app called "Tearline Jedi" (unreleased demo) to benchmark the time interval of Present(), it was able to execute in less than 1/8000sec -- approximately 0.125ms. Present() takes 0.125ms in this YouTube video of real raster Kefrens Bars on a GeForce GTX 1080 Ti: a precise beam-racing technique, with raster-controlled VSYNC OFF frame slices only a few pixels tall, for realtime streaming of blocky pixels out of the GPU output. It's essentially bruteforcing a GeForce GPU into behaving like an Atari 2600 TIA, which had no framebuffer, so you had to render-and-output scanlines in realtime during the scanout. (Tearlines are just rasters.)

[embedded video]
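For anyone wanting to reproduce a Present()-interval benchmark in spirit, a generic timing harness looks roughly like this. It is only a sketch: the real thing would time a Direct3D Present() call, and since that API isn't reachable here, a no-op stand-in and a made-up helper name (`time_call_us`) are used instead.

```python
# Sketch: a generic harness for timing very short calls, in the spirit
# of benchmarking Present(). A no-op stands in for the real swap call.
import time

def time_call_us(fn, iterations=10000):
    """Average wall-clock time of fn() in microseconds."""
    t0 = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - t0) / iterations * 1e6

fake_present = lambda: None  # stand-in for a real swapchain present
print(f"avg call time: {time_call_us(fake_present):.3f} us")
```

Averaging over many iterations is what makes sub-millisecond intervals like 0.125ms measurable despite timer overhead.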

crossjeremiah
Posts: 44
Joined: 14 Aug 2017, 10:21

Re: PresentMon: measure both frame throughput (fps) and late

Post by crossjeremiah » 16 Jul 2019, 05:14

I get a perfect line when I limit the game to 59.9XXX on a 60fps Dolphin game. When it's uncapped I see the sawtooth. Do we want the sawtooth on the graph or not?
Also, if Scanline Sync is not on, the Present() latency will increase by a small amount, like 0.6ms or so; with it on I'm getting 0.06ms.

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: PresentMon: measure both frame throughput (fps) and late

Post by Chief Blur Buster » 16 Jul 2019, 10:44

There are weird, complex interactions that make a sawtooth sometimes appear and sometimes not.

You don't want varying latency in a sawtooth-shaped graph (latency slewing slowly and then suddenly jumping back, at the beat frequency between fps and Hz), that is for sure.
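The slew-then-jump behavior falls out of simple arithmetic whenever the frame cap and the refresh rate differ slightly. A sketch of an idealized VSYNC ON pipeline (function and variable names are made up for this illustration):

```python
# Sketch: an idealized VSYNC ON pipeline capped slightly below refresh.
# Each frame lands a little later relative to the vblank grid, so the
# wait-for-vblank latency shrinks slowly, then wraps back up: a sawtooth
# at the beat frequency between the cap and the refresh rate.
import math

def vsync_wait_latency(fps_cap, refresh_hz, n_frames):
    frame_period = 1.0 / fps_cap
    refresh_period = 1.0 / refresh_hz
    latencies = []
    for i in range(n_frames):
        present_time = i * frame_period
        # The frame is displayed at the next vblank at or after present:
        vblank = math.ceil(present_time / refresh_period) * refresh_period
        latencies.append((vblank - present_time) * 1000.0)  # milliseconds
    return latencies

lat = vsync_wait_latency(59.9, 60.0, 600)
# Latency sweeps through nearly the full 0..16.7 ms range over ~10 s:
print(round(max(lat) - min(lat), 1))
```

This also shows why a 59.9fps cap on a 60Hz Dolphin game produces exactly the slow sawtooth discussed earlier in the thread: the beat period is roughly ten seconds.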

If you made the sawtooth disappear *AND* you don't see tearing in fast horizontal panning, then you're getting glassfloor low latency with your chosen sync method.

That said, a caveat: it's possible for Present() to sawtooth, and it's also possible for Present() to be glassfloor while the pixels-to-photons latency sawtooths. Different situations cause each to happen. PresentMon cannot measure sawtoothing outside of the Present().

Basically, to avoid sawtoothing both inside Present() *and* outside of Present(), you need to strictly use VSYNC OFF + RTSS Scanline Sync + a high ForceFlush setting + low GPU utilization (~30%).

Occasionally, the sawtooth may be the preferred poison in a "pick your poison" situation, if you can't get low lag and zero sawtooth simultaneously. See this:

[Image: graph comparing consistent latency vs sawtooth-varying latency over time]

The blue is higher lag but consistent...
The red is lower lag but sawtooth-varying...
So sometimes it is a pick-your-poison effect.

Vleeswolf
Posts: 37
Joined: 25 Aug 2017, 15:59

Re: PresentMon: measure both frame throughput (fps) and late

Post by Vleeswolf » 16 Jul 2019, 13:31

I believe (from scanning the source code) that PresentMon does not only measure the time elapsed within a Present() call, but also tracks through GPU work completion and frame display. It does this by consuming Windows ETW traces, instead of intercepting Present() calls like RTSS does. It could be that this is still too limited to measure effects like those you refer to, however.

crossjeremiah
Posts: 44
Joined: 14 Aug 2017, 10:21

Re: PresentMon: measure both frame throughput (fps) and late

Post by crossjeremiah » 01 Sep 2019, 18:26

So I ran PresentMon on my friend's AOC 240Hz (don't remember the model), and it had higher latency than my AW2518H. The AOC ran around 11ms and the AW2518H was at 1-4ms. Can you explain why those readings were different when both were at 240Hz and PresentMon doesn't take monitor latency into account?

Vleeswolf
Posts: 37
Joined: 25 Aug 2017, 15:59

Re: PresentMon: measure both frame throughput (fps) and late

Post by Vleeswolf » 02 Sep 2019, 03:18

Did you connect your friend's monitor to your PC, or did you run it on his/her PC? If the latter, do you have the same GPUs? The same settings in whatever application you were running, in particular frame rate limiting and render-ahead settings? The causes of lag measured by PresentMon are GPU render time, render queuing, sync queueing, and some (minor) API overheads.
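Those components simply add up, which is one way two otherwise-identical 240Hz setups can read very differently in PresentMon. A sketch, where all values are illustrative placeholders rather than real measurements:

```python
# Sketch: lag components visible to PresentMon, summed.
# All values below are illustrative placeholders, not real measurements.

def presentmon_visible_lag_ms(gpu_render, render_queue, sync_queue, api_overhead):
    return gpu_render + render_queue + sync_queue + api_overhead

# A deep render-ahead queue dominates easily:
print(presentmon_visible_lag_ms(3.0, 6.0, 2.0, 0.2))  # 11.2
print(presentmon_visible_lag_ms(1.0, 0.0, 0.5, 0.2))  # 1.7
```

A different GPU, a heavier scene, or a larger render-ahead queue on one PC would explain an 11ms vs 1-4ms gap without the monitors being involved at all.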
