[OLD] Non-CPU/GPU Bound Input Lag Tests

MT_
Posts: 113
Joined: 17 Jan 2017, 15:39

[OLD] Non-CPU/GPU Bound Input Lag Tests

Post by MT_ » 23 Mar 2022, 10:31

We all know what increasing FPS does to our input latency, assuming we prefer uncapped gameplay and shenanigans like pre-rendered frames don't come into play to ruin our lowered latency. I was personally curious how certain tweaks can influence input latency if we are not limited by either CPU or GPU usage, at a fixed framerate. (Let's say you prefer G-SYNC, or cap at your exact refresh rate or a multiple of it, i.e. 120/240/360.)

Can these tweaks still increase or reduce input latency somewhere in the chain regardless?

Testing Method:
__________________

We test with adaptive G-SYNC ON (with a cap of 117 fps and V-SYNC off) so that we don't have to deal with potential tearlines, which makes measurements a bit easier, and we measure the exact moment the gun shows any indication of moving upwards due to recoil. Technically it should be the same as testing a capped CSGO session at 117 fps without G-SYNC; it is purely to make measurements easier by avoiding the tearline.
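To illustrate why the 117 fps cap stays tear-free under G-SYNC, below is a minimal Python sketch comparing the capped frametime with the refresh period. The 120 Hz refresh rate is an assumption, since the panel's maximum refresh isn't stated in this post.

# Minimal sketch: frametime at the 117 fps cap vs. the refresh period.
# REFRESH_HZ is assumed; the post does not state the panel's max refresh.
REFRESH_HZ = 120
FPS_CAP = 117

refresh_period_ms = 1000 / REFRESH_HZ   # one scanout cycle, ~8.33 ms
frametime_ms = 1000 / FPS_CAP           # one rendered frame, ~8.55 ms

print(f"refresh period:   {refresh_period_ms:.2f} ms")
print(f"capped frametime: {frametime_ms:.2f} ms")
print(f"slack per frame:  {frametime_ms - refresh_period_ms:.2f} ms")

Keeping the frametime slightly above the refresh period keeps the cap inside the VRR range, which is consistent with the author only seeing an occasional tearline near the bottom of the screen.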

First sign of the LED turning on, ever so slightly:
[image: red led.png]
First sign of recoil at the tip of the gun, no matter how slight:
[image: recoil.png]
The idea is to test each tweak individually and compare it against the 'default' configuration, so that no tweak can affect another in unexpected ways.

We use a modified G305 (1000 Hz) with an LED attached to the left-button Omron switch. For this test we soldered on a brand-new Omron switch and verified that it registered reliably (including half/soft presses).
(Before this, the switch was pretty worn out and would often double-click, and the LED would flicker/pulse due to bad contact if not pressed firmly.)

For capturing we used an S10+ in super-slow-motion mode with 0.4-second recordings (the 960 fps mode, without software interpolation) and captured 20 samples for each test.

The system was rebooted before every test, whether that seemed necessary or not.

The OS (21H2) was generally conditioned with the following script:
https://github.com/Marctraider/LiveScript-LTSC-21H2


Testing Configuration: (8x MSAA, multicore rendering on)
___________________


- Game under test: CSGO. (Handy, simple, and not as complex as most other games, with extensive console functionality. That's not to say every tweak will affect this game as much as it affects others; other games might even be impacted negatively.)
- G-SYNC with V-SYNC OFF. (Tearline only visible in the bottom 5% of the screen; we want this mode to test ULLM later on without the imposed NVIDIA framerate limit. For this and other reasons I've opted not to use V-SYNC.)
- maxfps 117, effectively (confirmed by CapFrameX) guaranteeing zero tearing in the part of the screen we are testing on.
- Non-maxed GPU and CPU usage, to eliminate (over)load affecting input latency. (Pre-rendered frames are a clear example of this effect.)
- Game affinity pinned to cores #3 to #7, and background processes to cores #1 and #2. (Eliminates/reduces CPU cache thrashing and thread hopping, and showed by far the highest performance in an uncapped, mostly CPU-bound CSGO benchmark; a sketch of this affinity split follows the console command below.)
(This basically also means that core 0 is already pretty well offloaded, so we don't expect fiddling with MSI mode, interrupt affinity, etc. to have any meaningful effect, but that is what these tests are for.)
(It also means that tweaking the Windows CPU scheduler most likely has no effect, as there would be very little context switching going on.)
- Windows 10 LTSC (21H2) with minimal jitter; no undesired processes or surprises going on in the background. The idle system practically sits at <1% CPU usage 24/7.
- Clock speeds of both GPU and CPU fixed at maximum; C-states and other power-saving features disabled. The same goes for SSD timeouts, PCIe ASPM, clock gating, USB link power saving, etc.
- We check the mouse LED for the exact moment current first flows through it (i.e. the first indication that it's turning on).
- We use the exact same weapon (Glock) and observe the very first recoil reaction of the weapon model, where the tip of the weapon moves upwards.
- We test all 'tweaks' individually against the 'system default' results, not combined. We will assume, though this is definitely no guarantee, that stacking the positive tweaks after these tests will yield the lowest possible input lag.
(This obviously depends on how much the individual tweaks interact within the total rendering chain. Testing every combination would be an extremely costly and time-consuming endeavor;
that said, I doubt most of these will have any meaningful or measurable effect, so if only a few turn out to be any good, it might be possible to test a few combinations after all.)
- Console command: (to guarantee exact placement and scenery for each test, to minimize fluctuations between tests, and for ease of testing)
sv_cheats 1; mp_startmoney 10000; mp_roundtime 60; mp_roundtime_defuse 60; setpos 1303.892578 2973.072266 193.093811; setang -0.220073 -132.752014 0.000000
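As referenced in the affinity bullet above, here is a minimal Python sketch of that core split using psutil. The process name and the mechanism are assumptions; the original setup may have used Process Lasso, Task Manager, or shortcut affinity flags instead.

# Hedged sketch of the affinity split: game on cores 3-7, everything else on 1-2.
# csgo.exe as the process name and psutil as the mechanism are assumptions.
import psutil

GAME_NAME = "csgo.exe"
GAME_CORES = [3, 4, 5, 6, 7]
BACKGROUND_CORES = [1, 2]

for proc in psutil.process_iter(["name"]):
    try:
        name = (proc.info["name"] or "").lower()
        proc.cpu_affinity(GAME_CORES if name == GAME_NAME else BACKGROUND_CORES)
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        pass  # protected/system processes can't be changed without elevation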


Tests:
_____


default (you can assume that each of the following tests is the reverse of this default)
default -> GPU Hardware Scheduling to 'On'
default -> Full Screen Optimizations to 'Off'
default -> Interrupt routing #1 (routes the GPU interrupt to the second CPU core, indirectly relieving kernel execution, USB interrupts, etc.)
default -> Ultra low latency mode (Default is Prerendered 1/LLM=On)
default -> DisableDynamicTick yes (applied via bcdedit; see the sketch after this list)
default -> MSI Mode on (the MSISupported registry value; see the same sketch)
default -> MSI Mode on / Interrupt Routing #1 (Combo)
default -> Max_Pending_Cmds_Buffers to 1
default -> MC_HOST_STAGING_BUFFER to 0x10
default -> Timer resolution to 0.5 ms (usually defaults to 1.0 ms when a game or the sound subsystem is active, but it depends on what granularity the OS requests)
default -> Game Mode On (On is actually the Windows default, but we treat Off as the baseline here)
default -> Write Combining On (Registry tweak, unsure if still valid with newer drivers)
default -> Multicore rendering off
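For reference, the sketch below shows how the DisableDynamicTick and MSI-mode toggles from the list above are commonly applied on Windows. The GPU device instance path is a placeholder you would look up in Device Manager, and the script form itself is an assumption (both changes are usually done by hand with bcdedit and regedit).

# Hedged sketch: applying two of the toggles above. Run elevated; reboot afterwards.
import subprocess
import winreg

# DisableDynamicTick via bcdedit (documented boot option).
subprocess.run(["bcdedit", "/set", "disabledynamictick", "yes"], check=True)

# MSI mode: set MSISupported=1 under the device's interrupt-management key.
# The device instance path below is a placeholder, not the author's actual GPU.
DEVICE_INSTANCE = r"PCI\VEN_XXXX&DEV_XXXX&SUBSYS_XXXXXXXX&REV_XX\X&XXXXXXXX&X&XXXX"
KEY_PATH = (r"SYSTEM\CurrentControlSet\Enum\{}\Device Parameters"
            r"\Interrupt Management\MessageSignaledInterruptProperties").format(DEVICE_INSTANCE)
with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "MSISupported", 0, winreg.REG_DWORD, 1)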

Afterword:
_____________


If a tweak is doubtful or within the margin of error (or still leaning towards a positive result), it might be prudent to test frametime stability and see whether the tweak improves that instead. (If so, it might still be worth keeping.)

Another possibility is that a tweak is only beneficial at high to maximum load on certain system components, or when the workload relies heavily on core 0 (in our setup we already established that offloading this particular game from core 0 yields a pretty major performance advantage).

However, if a tweak does not improve anything at all, be it frametimes or input lag, I would strongly advise resetting it to its default state and simply leaving it alone.

The timer resolution test was surprising: higher input lag at 0.5 ms (up from 1.0 ms), yet from experience a 0.5 ms resolution also gives smoother frametimes (in-game fps limiters often rely heavily on it for their accuracy).
So are we looking at a smoothness vs. input lag trade-off here?
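For context on the 0.5 ms test, here is a hedged Python sketch of how a sub-millisecond timer resolution is typically requested. timeBeginPeriod only accepts whole milliseconds, so this goes through the undocumented NtSetTimerResolution export; how the author actually set 0.5 ms isn't stated.

# Hedged sketch: request a 0.5 ms global timer resolution via ntdll.
# NtSetTimerResolution takes units of 100 ns, so 5000 = 0.5 ms.
import ctypes

ntdll = ctypes.WinDLL("ntdll")
current = ctypes.c_ulong()
status = ntdll.NtSetTimerResolution(5000, True, ctypes.byref(current))
print(f"NTSTATUS {status}, resolution is now {current.value / 10000:.2f} ms")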

Since we've already dealt with a lot of the issues (via CPU affinitization) that these tests would likely expose, we're now seeing little to no effect from most of them, especially when it comes to fiddling with the interrupt subsystem.

Game Mode looks to have zero effect on input lag. If this setting still does anything at all (and it has undergone quite a few changes across builds), it must be on very weak systems with few resources to spare when running demanding games. So we can fairly well debunk the claim that you can magically attain ~5 ms of input lag reduction for free just by toggling it on. (That claim could never really be explained theoretically anyway: you can't magically reduce input lag for no reason, even via simple OS prioritization, if there is no contention for resources going on.)

The GPU hardware scheduler seems to do nothing useful here, but if you run a lot of background applications (with hardware acceleration enabled, for instance), you might start to see benefits on weaker systems or in games at maximum GPU load.

Ultra Low Latency Mode didn't appear to affect input latency at all, either positively or negatively. Assuming it behaves the same at a fixed refresh rate as in adaptive G-SYNC mode, we'll assume it does not negatively affect input latency in non-GPU-bound scenarios under any condition. (Either they fixed its behavior, or something iffy was going on earlier. Maybe it's game-dependent after all?)

DisableDynamicTick yes seems to have a small benefit here (actually the tweak that shows the most potential).

I expected turning multicore rendering off to yield some positive results, but I could not confirm this with the current settings/config. Then again, maybe I should have used the mat_queue_mode command instead, as it has two options for single-threaded behavior. Maybe the benefits are only visible at higher load (but not maxed out, e.g. 240 fps capped gameplay), or are simply less observable at a lower fps cap?

Frankly, most of the fluctuations we see here look very much like margin-of-error material; only DisableDynamicTick, Timer Resolution and MC_HOST_STAGING_BUFFER show the largest outliers. I'm also not sure whether there is any intentional visual randomization in when the weapon model actually starts moving, or whether this is tied to the (local) server tickrate, in which case the deviation could be ~16 ms per weapon cycle.


Results are here:
______________


https://docs.google.com/spreadsheets/d/ ... sp=sharing

Mind you, I haven't converted these to millisecond differences; they are literally frame-step counts from the first LED indication to the first sign of recoil.
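For anyone who wants the millisecond figures, a minimal Python sketch of the conversion and a basic summary is below. The sample values are made up for illustration; only the 960 fps capture rate is taken from the post.

# Convert framestep counts (at 960 fps capture) to milliseconds and summarize.
# The frame_steps list is hypothetical; substitute a column from the spreadsheet.
import statistics

CAPTURE_FPS = 960
frame_steps = [22, 23, 21, 24, 22, 23, 22, 25, 21, 23,
               22, 24, 23, 22, 21, 23, 24, 22, 23, 22]

latencies_ms = [steps / CAPTURE_FPS * 1000 for steps in frame_steps]
print(f"mean:  {statistics.mean(latencies_ms):.2f} ms")
print(f"stdev: {statistics.stdev(latencies_ms):.2f} ms")
print(f"range: {min(latencies_ms):.2f} - {max(latencies_ms):.2f} ms")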
Last edited by MT_ on 23 Mar 2022, 21:30, edited 2 times in total.
LTSC 21H2 Post-install Script
https://github.com/Marctraider/LiveScript-LTSC-21H2

System: MSI Z390 MEG Ace - 2080 Super (300W mod) - 9900K 5GHz Fixed Core (De-lid) - 32GB DDR3-3733-CL18 - Xonar Essence STX II

Eonds
Posts: 262
Joined: 29 Oct 2020, 10:34

Re: Non-CPU/GPU Bound Input Lag Tests

Post by Eonds » 23 Mar 2022, 12:10

MT_ wrote:
23 Mar 2022, 10:31
(full opening post quoted above)
I appreciate your detail, but this isn't 100% accurate. Testing on a semi-stock system is highly irresponsible and inaccurate. Not only that, but Windows itself is not necessarily what you'd want to run such tests on. Setting a device to MSI mode was clearly and obviously developed to lower latency. This is undeniable and doesn't need a test, so if you can't measure it, maybe your method isn't accurate enough (it's not). I don't mean to discourage you, but these things are best left to people who own expensive, highly precise equipment and perfect testing environments. There are many things going on behind the scenes which you cannot control without expert-level knowledge/understanding. A simple example would be SMIs causing latency spikes: one factor out of thousands that could easily throw off millisecond/sub-millisecond differences. As for Write Combining on NVIDIA GPUs, on anything past 457.30 it's deprecated from the driver.

jorimt
Posts: 2486
Joined: 04 Nov 2016, 10:44
Location: USA

Re: Non-CPU/GPU Bound Input Lag Tests

Post by jorimt » 23 Mar 2022, 12:52

Eonds wrote:
23 Mar 2022, 12:10
I see you've been at it on the forums again...

Regardless of his particular test methodology or results (which I have not had time to inspect), click-to-photon methodology primarily tests for one thing: how many scanout cycles it takes, on average, for the effects of an input to appear on-screen.

It doesn't matter if that input is a click or swipe on a mouse, a tap on a controller, or a key press on a keyboard, the appearance of any input is ultimately limited by the scanout (the completion time of which is dependent on the max refresh rate; the higher the refresh rate, the more scanout cycles per second and the faster they each complete), which is a fixed and constant process (even with VRR) containing all the given frame information per cycle.

So it stands to reason that if a variable or variables are changed, and it still takes the same amount of average scanout cycles for the input to appear in scenario B vs. A (min/max spread included, and margin-of-error accounted for), that said variable(s) may not have enough impact to be noticed, with the scanout process probably being the bottleneck, which is ultimately all that matters, since it's all the user has for visual feedback of any of their inputs in real-world scenarios.
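As a quick illustration of the scanout point above, the sketch below prints how long one scanout cycle takes at a few example refresh rates (the rates are illustrative, not taken from the thread).

# One scanout cycle completes in roughly 1 / max_refresh seconds,
# so higher refresh rates give more (and shorter) cycles per second.
for refresh_hz in (60, 120, 240, 360):
    scanout_ms = 1000 / refresh_hz
    print(f"{refresh_hz:>3} Hz -> {scanout_ms:5.2f} ms per scanout cycle")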
(jorimt: /jor-uhm-tee/)
Author: Blur Busters "G-SYNC 101" Series

Displays: ASUS PG27AQN, LG 48C4 VR: Beyond, Quest 3, Reverb G2, Index OS: Windows 11 Pro Case: Fractal Design Torrent PSU: Seasonic PRIME TX-1000 MB: ASUS Z790 Hero CPU: Intel i9-13900k w/Noctua NH-U12A GPU: GIGABYTE RTX 4090 GAMING OC RAM: 32GB G.SKILL Trident Z5 DDR5 6400MHz CL32 SSDs: 2TB WD_BLACK SN850 (OS), 4TB WD_BLACK SN850X (Games) Keyboards: Wooting 60HE, Logitech G915 TKL Mice: Razer Viper Mini SE, Razer Viper 8kHz Sound: Creative Sound Blaster Katana V2 (speakers/amp/DAC), AFUL Performer 8 (IEMs)

Eonds
Posts: 262
Joined: 29 Oct 2020, 10:34

Re: Non-CPU/GPU Bound Input Lag Tests

Post by Eonds » 23 Mar 2022, 13:04

jorimt wrote:
23 Mar 2022, 12:52
(quoted post above)
Long time no see.

Key word "margin of error" .

jorimt
Posts: 2486
Joined: 04 Nov 2016, 10:44
Location: USA

Re: Non-CPU/GPU Bound Input Lag Tests

Post by jorimt » 23 Mar 2022, 13:12

Eonds wrote:
23 Mar 2022, 13:04
Key word "margin of error" .
On a proper click-to-photon set-up, margin-of-error between like-for-like scenarios is typically <1ms.

There's a difference between there only being 1ms of latency left, and there being 1ms added on top of, say, an existing 15ms of latency (i.e. 15ms vs 16ms); the former is absolutely noticeable (especially in touch screen applications, for instance), whereas the perceivable benefit of the latter is arguable and dependent on the capabilities and sensitivities of the given user.

Diminishing returns are real and should always be considered, though lower latency is always better (even if not perceptibly impactful to everyone), assuming it doesn't sacrifice consistency (which it usually does).
(jorimt: /jor-uhm-tee/)
Author: Blur Busters "G-SYNC 101" Series

Displays: ASUS PG27AQN, LG 48C4 VR: Beyond, Quest 3, Reverb G2, Index OS: Windows 11 Pro Case: Fractal Design Torrent PSU: Seasonic PRIME TX-1000 MB: ASUS Z790 Hero CPU: Intel i9-13900k w/Noctua NH-U12A GPU: GIGABYTE RTX 4090 GAMING OC RAM: 32GB G.SKILL Trident Z5 DDR5 6400MHz CL32 SSDs: 2TB WD_BLACK SN850 (OS), 4TB WD_BLACK SN850X (Games) Keyboards: Wooting 60HE, Logitech G915 TKL Mice: Razer Viper Mini SE, Razer Viper 8kHz Sound: Creative Sound Blaster Katana V2 (speakers/amp/DAC), AFUL Performer 8 (IEMs)

Eonds
Posts: 262
Joined: 29 Oct 2020, 10:34

Re: Non-CPU/GPU Bound Input Lag Tests

Post by Eonds » 23 Mar 2022, 13:30

jorimt wrote:
23 Mar 2022, 13:12
(quoted post above)
I agree with the diminishing returns part, but I'd only consider that once you're at sub-5ms total system latency. I've seen people on all types of hardware get screen tearing with V-sync (because they are using the wrong timers). Usually it's because they set the wrong bcdedit options as a result of browsing incompetent forum threads. I wouldn't be so confident saying that a C2P test is entirely accurate across all systems, especially when you're talking about different game engines. The simple fact that games still choose to use QPC instead of RDTSC to fetch the TSC is honestly a giggle bob moment.

kriegor
Posts: 29
Joined: 12 May 2014, 23:17

Re: Non-CPU/GPU Bound Input Lag Tests

Post by kriegor » 23 Mar 2022, 13:47

Eonds wrote:
23 Mar 2022, 12:10
Setting a device into MSI mode is clearly and obviously developed in order to lower latency. This is undeniable & doesn't need a test. So if you can't measure it, maybe it's not accurate enough (its not).
He did measure it, and the result was that the difference is pretty much within the margin of error, i.e. effectively zero. Just because you cannot accept that this may be the case doesn't make the measurement invalid.

The rest of your post is pointless and condescending. Leave it to the experts? Like who? You? Perfect testing environments? Who exactly are these experts with their perfectly sterile Windows installations and precise instruments conducting these tests?
Oh wait, you are just coping because the results are forcing you to question your reality.

jorimt
Posts: 2486
Joined: 04 Nov 2016, 10:44
Location: USA

Re: Non-CPU/GPU Bound Input Lag Tests

Post by jorimt » 23 Mar 2022, 14:00

Eonds wrote:
23 Mar 2022, 13:30
I wouldn't be so confident to say that a C2P is entirely accurate across all systems especially when you're talking about different game engines.
Click-to-photon results are entirely system + game-dependent. You of course can't apply the totals of one game's results to another, or from one system to another.

The more reliable metric is the difference between the totals. My article states as much:
https://blurbusters.com/gsync/gsync101- ... ettings/3/
This article does not seek to measure the impact of input lag differences incurred by display, input device, CPU or GPU overclocks, RAM timings, disk drives, drivers, BIOS, OS, or in-game graphical settings. And the baseline numbers represented in the results are not indicative of, and should not be expected to be replicable on, other systems, which will vary in configuration, specs, and the games being run.

This article seeks only to measure the impact V-SYNC OFF, G-SYNC, V-SYNC, and Fast Sync, paired with various framerate limiters, have on frame delivery and input lag, and the differences between them; the results of which are replicable across setups.
There are absolutely limits to latency test methodologies, be it click-to-photon or others, but if contextualized properly, basic information such as how much latency can be reduced with G-SYNC or no-sync over standalone V-SYNC, or how much latency can be reduced in a non-GPU-bound scenario vs. a GPU-bound one, expressed in "frames" (scanout cycles), is typically replicable across systems and games.

For instance, if you run double-buffer V-SYNC uncapped with framerates above the refresh rate, you're going to get at least an additional 2 frames of average latency vs properly configured G-SYNC or no sync, or if you max your GPU usage, you're going to get an additional 1-2 frames of average latency, regardless of sync being enabled/disabled, no matter the game, no matter the system, etc.

The simplest way to ensure you have the lowest possible engine-level latency in any game (counting high polling, low latency input devices) is to run no-sync uncapped on the highest refresh rate monitor available (with the lowest processing latency and average GtG) at the highest possibly achievable framerate without maxing your GPU.

That said, once you get out of the realm of syncing methods and the render queue (aka engine-level render buffering), latency differences become harder to test and prove, but the lion's share of perceivable impact at the user-level occurs at the render queue and display-level where gaming is concerned, so that is what is most focused on and cited. I.E. haul the chunks away and get the tweezers out for the rest.
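To put the "frames of added latency" examples above into milliseconds, here is a rough sketch; the 240 Hz refresh rate and the frame counts are illustrative assumptions, not measurements from this thread.

# Convert "frames of added latency" into milliseconds at an assumed refresh rate.
REFRESH_HZ = 240
frame_ms = 1000 / REFRESH_HZ

scenarios = {
    "double-buffer V-SYNC, uncapped above refresh": 2.0,  # at least ~2 extra frames
    "maxed-out GPU (pre-render queue filling up)":  1.5,  # ~1-2 extra frames
}
for name, frames in scenarios.items():
    print(f"{name}: ~{frames * frame_ms:.1f} ms added on average")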
(jorimt: /jor-uhm-tee/)
Author: Blur Busters "G-SYNC 101" Series

Displays: ASUS PG27AQN, LG 48C4 VR: Beyond, Quest 3, Reverb G2, Index OS: Windows 11 Pro Case: Fractal Design Torrent PSU: Seasonic PRIME TX-1000 MB: ASUS Z790 Hero CPU: Intel i9-13900k w/Noctua NH-U12A GPU: GIGABYTE RTX 4090 GAMING OC RAM: 32GB G.SKILL Trident Z5 DDR5 6400MHz CL32 SSDs: 2TB WD_BLACK SN850 (OS), 4TB WD_BLACK SN850X (Games) Keyboards: Wooting 60HE, Logitech G915 TKL Mice: Razer Viper Mini SE, Razer Viper 8kHz Sound: Creative Sound Blaster Katana V2 (speakers/amp/DAC), AFUL Performer 8 (IEMs)

MT_
Posts: 113
Joined: 17 Jan 2017, 15:39

Re: Non-CPU/GPU Bound Input Lag Tests

Post by MT_ » 23 Mar 2022, 21:28

Eonds wrote:
23 Mar 2022, 12:10
(quoted post above)
Not 100% accurate? On a non-semi-stock, thus fully stock, system (and I mean at all levels) you are dealing with far higher jitter, variability, and the like. C-states? Fluctuating core clocks? Windows doing all sorts of things in the background that can throw off test results? I'm well aware that even 20 samples per test is not sufficient and doesn't give very high confidence, but these tests were primarily meant to catch any significant changes anywhere in the chain from game engine to rendered output.

I'm pretty sure I have no issues with SMIs or other (rare) fluctuating issues; I know my system pretty well. Impossible? Probably not. But probably less likely to matter than the confidence of my tests, for what it's worth.

But thank you for your input. This was mainly a test for myself out of curiosity, and if I had found anything of interest you can bet your ass I would have investigated further and probably rerun a few tests with at least 50 samples each.

In either case, I just wanted to 'share' what I found.

Not to brag here, but I'm pretty confident that if you take 50 randomly 'tweaked' gaming systems, my own system will come out in the top 5 most stable, predictable, and least fluctuating of them all. I can play games with internal FPS limiters at 120/240/360 fps with extremely little tearing. I can assure you that in a G-SYNC (non-V-SYNC) test I can stand around in the map for hours without a single tearline hitting more than the bottom 5-10% of the screen (exactly what you do in these tests). And yes, then we are indeed talking about sub-millisecond territory. That is confidence.

You can verify my script in the original post and come to your own conclusions about my testbed. Sadly you cannot verify my hardware/BIOS configuration, but I'm pretty well versed in that area (having physically flashed BIOS chips on brand-new notebooks to unlock their full potential and expose hundreds of additional options). I also will never claim that my tests and methods are flawless.

Let me ask you this: what do you think defines someone with an actual proper testing platform? Do we have any set guidelines and rules that must be adhered to in the general 'input lag measurement' community? It pretty much seems, just like in the FPS-measuring scene, that everyone just does their own thing. Nobody knows what's actually going on behind the scenes at many of these review sites, only that they are measuring against their own results. Also, what's your definition of 'expensive hardware and testing equipment'? Do you think a cheap 1000 fps camera would do worse than an expensive one built on the same basic principle? You also seem to make claims about my alleged knowledge.

Edit: Looks like I'm actually dealing with a child here, probably from one of those gaming tweak communities. Why do we actually keep such people on the forum? They add absolutely nothing.
LTSC 21H2 Post-install Script
https://github.com/Marctraider/LiveScript-LTSC-21H2

System: MSI Z390 MEG Ace - 2080 Super (300W mod) - 9900K 5GHz Fixed Core (De-lid) - 32GB DDR3-3733-CL18 - Xonar Essence STX II

Eonds
Posts: 262
Joined: 29 Oct 2020, 10:34

Re: Non-CPU/GPU Bound Input Lag Tests

Post by Eonds » 24 Mar 2022, 06:59

MT_ wrote:
23 Mar 2022, 21:28
(quoted post above)

Your system is semi-stock. You're still on Windows, with a high standard deviation that can't measure sub-ms differences concretely. You're no doubt competent to some degree, so I won't be too harsh. The equipment I'm talking about costs hundreds of thousands of dollars and is mostly used by companies like AMD, Intel and NVIDIA. I don't see the point of this post other than to look cool. It's true that most people have terrible systems. I think if you're not an actual expert on the topic you shouldn't be posting results that are inconclusive and inaccurate. Every BIOS (basically) has SMIs. They do cause latency spikes. This is ONE out of THOUSANDS of factors which you cannot change UNLESS you're HIGHLY knowledgeable. You'll only see what they want you to see. They don't care about you or anyone. A 1000 fps camera is not good enough either. Do you think a company that has spent billions of dollars on measuring latency would use a 1000 fps camera to measure system latency ON WINDOWS? "Gaming tweak communities," says the drone shitposting on Blur Busters with useless information and inaccurate results. Please don't talk shit to me, I'm trying to be constructive. In reality we both know why you made this post, and it's cringe. Also, Game Mode itself has changed over time across different Windows versions. You have hundreds of power-saving features and hundreds of other useless clock-gating/power-gating things bogging down your GPU, plus thousands of other factors. It's hilarious, because nothing I said was false, and you still manage to turn into a Twitter tweaker because I pointed out your flawed measurements. This is the problem with science usually: EGO gets in the way. I don't care how you feel, I care about actual measurements. Leave your shitty attitude elsewhere. Talk about adding nothing, look at your post dude.... :lol:

Locked