For random input lag: Hardware Cache Prefetcher? (BIOS)

Jonnyc55
Posts: 10
Joined: 15 Jan 2024, 08:09

For random input lag: Hardware Cache Prefetcher? (BIOS)

Post by Jonnyc55 » 15 Jan 2024, 08:19

I hear some people do lots of tweaking only to be surprised by input lag popping up again. Do you reckon the hardware cache prefetcher, which predicts ahead and loads code into the CPU cache for performance gains, could be the cause?

I've read that if the instructions are linear, simple and predictable, this prefetcher works a treat, as you'd imagine from the nature of it.

If the instructions are random, then the hardware prefetcher could become a hindrance. Others have mentioned, and I agree, that a video game is surely random, as the computer does not know what your next input will be, what you're about to activate in a game, or what you'll look at next (rendering).

So, could the hardware cache prefetcher be a nice thing for basic system processes or linear tasks but not a good idea for gaming? I think if you haven't tuned your PC and slimmed down the background processes, tasks, services etc., then the hardware prefetcher may do a good job ironing that crap out. If you have slimmed down your OS, then the prefetcher becomes somewhat like a guy twiddling his thumbs and getting in the way of gaming.

Thoughts? I haven't looked much at the documentation on this feature. I'm just going off some basic logic of what it does and the nature of certain applications.

RealNC
Site Admin
Posts: 3781
Joined: 24 Dec 2013, 18:32

Re: For random input lag: Hardware Cache Prefetcher? (BIOS)

Post by RealNC » 15 Jan 2024, 09:56

That's not how it works. Player input does not change the code. The result of the execution of the code changes, but the code itself does not.
The views and opinions expressed in my posts are my own and do not necessarily reflect the official policy or position of Blur Busters.

Jonnyc55
Posts: 10
Joined: 15 Jan 2024, 08:09

Re: For random input lag: Hardware Cache Prefetcher? (BIOS)

Post by Jonnyc55 » 15 Jan 2024, 11:16

RealNC wrote:
15 Jan 2024, 09:56
That's not how it works. Player input does not change the code. The result of the execution of the code changes, but the code itself does not.
Ok.

Is there any random nature to gaming, where this prediction might simply do more harm than good?

RealNC
Site Admin
Posts: 3781
Joined: 24 Dec 2013, 18:32

Re: For random input lag: Hardware Cache Prefetcher? (BIOS)

Post by RealNC » 15 Jan 2024, 13:14

Jonnyc55 wrote:
15 Jan 2024, 11:16
Is there any random nature to gaming, where this prediction might simply do more harm than good?
No. In fact, prefetching and speculative execution are what make modern CPUs so fast. Without them, the CPU you bought in 2023 would be even slower than a CPU you bought 15 years ago.

Jonnyc55
Posts: 10
Joined: 15 Jan 2024, 08:09

Re: For random input lag: Hardware Cache Prefetcher? (BIOS)

Post by Jonnyc55 » 14 Feb 2024, 14:08

RealNC wrote:
15 Jan 2024, 09:56
That's not how it works. Player input does not change the code. The result of the execution of the code changes, but the code itself does not.
Isn't the result of the execution of the code, still in itself code?

RealNC
Site Admin
Posts: 3781
Joined: 24 Dec 2013, 18:32

Re: For random input lag: Hardware Cache Prefetcher? (BIOS)

Post by RealNC » 14 Feb 2024, 15:43

Jonnyc55 wrote:
14 Feb 2024, 14:08
Isn't the result of the execution of the code, still in itself code?
No. That statement doesn't even make sense :P It's like asking if the result of executing all the instructions of a cooking recipe is in itself a cooking recipe. It's not. The result is food.

Btw, the analogy to recipes is quite fitting for code. Even if you change the ingredients randomly, the recipe steps remain the same. They're fixed and not affected by randomness. The data the code operates on can vary and even be random, but the code itself does not change.
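
To make the recipe point concrete, here is a minimal sketch in Python. It is only an analogy: Python bytecode stands in for machine code, and the function name `bake` and the doubling step are made up for illustration. The idea is to snapshot the function's compiled instructions, run it on random data many times, and check that the instructions are still byte-for-byte the same while the results vary.

```python
import random

def bake(ingredients):
    """Fixed 'recipe': the same steps run whatever the ingredients are."""
    total = 0.0
    for amount in ingredients:      # step 1: walk the ingredients in order
        total += amount * 2.0       # step 2: apply the same operation to each
    return total

# Snapshot of the function's compiled instructions (the "recipe steps").
code_before = bake.__code__.co_code

# Feed it random data (the "ingredients") many times.
results = [bake([random.random() for _ in range(8)]) for _ in range(100)]

code_after = bake.__code__.co_code

print("instructions unchanged:", code_before == code_after)
print("distinct results:", len(set(results)))
```

The random values only ever show up in `results`. Nothing about them feeds back into what `bake` consists of.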

Jonnyc55
Posts: 10
Joined: 15 Jan 2024, 08:09

Re: For random input lag: Hardware Cache Prefetcher? (BIOS)

Post by Jonnyc55 » 15 Feb 2024, 09:22

1. Add 1 tbsp flour (suddenly becomes carrot instead of flour)

In my head, that's what I see for random ingredient swapping with the step staying the same.

That wouldn't be the step staying the same.

If a player in a game were to stare at a tower and then switch to looking at a bench, to the CPU that would be random. The fundamental code for how in-game assets are drawn is the same, but the sequence of code to draw the bench is different.

The binary that sequences the pixel illumination pattern for the monitor to make a pattern of the bench is a different binary from the one for the tower.

I guess the hardware prefetcher works with just the functions of the code purely then. And the code output from those functions can't be predicted due to user input, obviously.

The prefetcher can't predict ahead of time what binaries the user will elicit for the monitor in the way of pixels and asset loading. Well, not perfectly.

RealNC
Site Admin
Posts: 3781
Joined: 24 Dec 2013, 18:32

Re: For random input lag: Hardware Cache Prefetcher? (BIOS)

Post by RealNC » 15 Feb 2024, 10:18

Jonnyc55 wrote:
15 Feb 2024, 09:22
I guess the hardware prefetcher works with just the functions of the code purely then. And the code output from those functions can't be predicted due to user input, obviously.

The prefetcher can't predict ahead of time what binaries the user will elicit for the monitor in the way of pixels and asset loading. Well, not perfectly.
These things are not meaningfully affected by prefetching and speculative execution. They are too high level. The prefetcher operates on things that happen within nanoseconds. For example, the code that applies geometry transformations when preparing a frame will do that in a loop that walks over all vertices and applies the change. A single multiplication takes a nanosecond or so, but this is going to be much slower if the data needs to be fetched from RAM first, which typically costs tens of nanoseconds. The prefetcher is responsible for loading that data (in this case vertices) into the CPU cache so it's readily available once the CPU needs it. There's no user input or randomness here. All of that has happened on a much higher level, and all decisions about what data to operate on and which code to call have already been made long before the code starts doing the heavy lifting and number crunching.

You can code a graphics demo that has no user input. It just renders the same thing every time. If you add user input later on and make it more like a game, the performance will not change at all. It will run just as fast. The randomness of player input is just too far removed from the level at which CPU optimizations operate.
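
As a rough illustration of why the vertex loop is easy to prefetch for, here is a toy model in Python. It is a sketch under loud assumptions, not how any real CPU implements its prefetcher: the predictor simply guesses "next address = last address + last stride", and the 12-byte vertex layout, the base address and the names `hit_rate` and `vertex_walk` are all made up for the example. The point is that a loop walking the vertex array touches the same address sequence whatever the camera angle is, so the guess is right essentially every time. A shuffled order is included only to show what "random" would have to look like at this level.

```python
import random

VERTEX_SIZE = 12  # bytes per vertex: three 4-byte floats (assumed layout)

def hit_rate(addresses):
    """Fraction of accesses a toy stride predictor guesses correctly.

    The predictor guesses: next address = last address + last stride.
    """
    hits = 0
    predictions = 0
    last_addr = None
    last_stride = None
    for addr in addresses:
        if last_stride is not None:          # we have enough history to guess
            predictions += 1
            if last_addr + last_stride == addr:
                hits += 1
        if last_addr is not None:
            last_stride = addr - last_addr   # remember the most recent stride
        last_addr = addr
    return hits / predictions if predictions else 0.0

def vertex_walk(count, base=0x10000):
    """Addresses touched by a loop that walks a vertex array in order."""
    return [base + i * VERTEX_SIZE for i in range(count)]

# The camera angle changes the values computed from each vertex, not the
# order in which the loop visits them, so this stream is the same every frame.
sequential = vertex_walk(1000)

# Same addresses visited in a scrambled order, for contrast.
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)

print(f"in-order walk hit rate: {hit_rate(sequential):.2f}")
print(f"shuffled walk hit rate: {hit_rate(shuffled):.2f}")
```

Real prefetchers work on cache lines and track several streams at once, so treat the numbers as a cartoon of the idea rather than a measurement.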

Jonnyc55
Posts: 10
Joined: 15 Jan 2024, 08:09

Re: For random input lag: Hardware Cache Prefetcher? (BIOS)

Post by Jonnyc55 » 18 Feb 2024, 17:13

RealNC wrote:
15 Feb 2024, 10:18
Jonnyc55 wrote:
15 Feb 2024, 09:22
I guess the hardware prefetcher works with just the functions of the code purely then. And the code output from those functions can't be predicted due to user input, obviously.

The prefetcher can't predict ahead of time what binaries the user will elicit for the monitor in the way of pixels and asset loading. Well, not perfectly.
These things are not meaningfully affected by prefetching and speculative execution. They are too high level. The prefetcher operates on things that happen within nanoseconds. For example, the code that applies geometry transformations when preparing a frame will do that in a loop that walks over all vertices and applies the change. A single multiplication takes a nanosecond or so, but this is going to be much slower if the data needs to be fetched from RAM first, which typically costs tens of nanoseconds. The prefetcher is responsible for loading that data (in this case vertices) into the CPU cache so it's readily available once the CPU needs it. There's no user input or randomness here. All of that has happened on a much higher level, and all decisions about what data to operate on and which code to call have already been made long before the code starts doing the heavy lifting and number crunching.

You can code a graphics demo that has no user input. It just renders the same thing every time. If you add user input later on and make it more like a game, the performance will not change at all. It will run just as fast. The randomness of player input is just too far removed from the level at which CPU optimizations operate.
With the vertices already loaded, doesn't the fact that a user moves the camera around push anything new to the CPU? I can't accept that all viewing angles of all the vertices are pre-calculated.

Even if the CPU was quickly lining up the rest of the summoned calculations into cache within nanoseconds, it would quickly get swamped with new needs from the new camera angles.

I don't know, I just can't get my head around it. I can't help but think the CPU is surely met with something new every time you move the camera around. Even if things like models are already calculated, you still have to queue the frame of those objects.

Maybe you have already explained it and I'm just not grasping it. No big worry, it's not important. I tested without the hardware prefetcher anyway, and I didn't see or feel much of anything, other than slight sluggishness around Windows.

These settings are always interesting. I do read that professionals give reasons to turn it off:

https://docs.oracle.com/cd/E19962-01/ht ... gljyu.html
Hardware prefetchers work well in workloads that traverse array and other regular data structures. The hardware prefetcher options are disabled by default and should be disabled when running applications that perform aggressive software prefetching or for workloads with limited cache. For example, memory-intensive applications with high bus utilization could see a performance degradation if hardware prefetching is enabled.
https://communities.vmware.com/t5/VI-VM ... is%20doing.
IBM enables the CPU hardware prefetch by default but Intel recommends turning the feature off depending on what the server is doing. Anyone have any preferences?
A guy responds:
I think you would be wrong. Try it and see what happens.

Instruction supply may become a substantial bottleneck in future generation processors that have very long memory latencies and run application workloads with large instruction footprints such as database servers. Prefetching is a well-known technique for improving the effectiveness of the cache hierarchy

employs a hardware-based breadth-first search of future control-flow to cope with weakly-biased future branches, prescient instruction prefetch uses precomputation to resolve which control-flow path to follow. Furthermore, as the precomputation frequently contains load instructions, prescient instruction prefetch often improves performance by prefetching data.

prefetch uses helper threads to perform instruction prefetch on behalf of the main thread.

A key challenge for instruction prefetch is to accurately predict control flow sufficiently in advance of the fetch unit to tolerate the latency of the memory hierarchy. The notion of prescient instruction prefetch was first introduced as a technique that uses helper threads to improve single-threaded application performance by performing judicious and timely instruction prefetch.
And then:
Processor Hardware Prefetcher

When this setting is enabled, (disabled is the default for most systems), the processors is able to prefetch extra cache lines for every memory request. Recent tests in the performance lab have shown that you will get the best performance for most commercial application types if you disable this feature. The performance gain can be as much as 20% depending on the application.

For high-performance computing (HPC) applications, we recommend you turn HW Prefetch enabled and for database workloads, we recommend you leave the HW Prefetch disabled.

Both prefetch settings do decrease the miss rate for the L2/L3 cache when they are enabled but they consume bandwidth on the front-side bus which can reach capacity under heavy load. By disabling both prefetch settings, multi-core setups achieve generally higher performance and scalability.
When I read this back and forth of pros and cons on stuff like this, it gets my imagination going about all the really nuanced stuff going on deep inside, which has me testing.

But there we are.

RealNC
Site Admin
Posts: 3781
Joined: 24 Dec 2013, 18:32

Re: For random input lag: Hardware Cache Prefetcher? (BIOS)

Post by RealNC » 19 Feb 2024, 09:43

Jonnyc55 wrote:
18 Feb 2024, 17:13
With the vertices already loaded, doesn't the fact that a user moves the camera around push anything new to the CPU? I can't accept that all viewing angles of all the vertices are pre-calculated.
Input has already been read. Games don't read input in the middle of rendering a frame and then cancel the frame rendering because there's new input. If that were the case, simply moving the mouse quickly would result in your FPS dramatically decreasing, because the mouse movement would constantly abort the rendering of frames. Once the input is read, the next frame is prepared and then rendered. Nothing is going to change in the middle of rendering a frame. Every code path that needs to be taken and all data that is to be operated on is decided upon right after reading input.

Also keep in mind that games don't read user input constantly. They only poll the state of your input devices once per frame. Once input is polled, the game doesn't care about input anymore until the next frame. And that is a very long time for a CPU. The timescale the prefetcher operates on, on the other hand, is tiny.
Even if the CPU was quickly lining up the rest of the summoned calculations into cache within nanoseconds, it would quickly get swamped with new needs from the new camera angles.
Nothing changes while rendering a frame. Once the frame is rendered, the state of the input devices is read again, and the next frame is prepared and rendered based on that input state. During the frame prepare+render step, the game needs to operate on data, and that data is loaded by the prefetcher into the cache in parallel while the CPU operates on data already in the cache.
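
The frame structure described here can be sketched roughly like this in Python. Everything in it is hypothetical and simplified: `poll_input` just pretends the player keeps turning the camera, `prepare_and_render` only rotates a few vertices around the Y axis, and a real engine obviously does far more. What it shows is the shape of the loop: input is sampled once at the top of the frame, and the number crunching below it runs the same code over the same vertex list with no further input reads.

```python
import math

def poll_input(frame):
    """Stand-in for reading device state; returns a camera yaw in radians."""
    return 0.1 * frame  # hypothetical: pretend the player keeps turning

def prepare_and_render(vertices, yaw):
    """Rotate every vertex around the Y axis using the yaw fixed for this frame."""
    c, s = math.cos(yaw), math.sin(yaw)
    # Same loop, same vertex order, whatever the yaw value happens to be.
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in vertices]

def run(frames, vertices):
    """Run a fixed number of frames; return the poll count and rendered frames."""
    polls = 0
    rendered = []
    for frame in range(frames):
        yaw = poll_input(frame)   # input is sampled once, at the start of the frame
        polls += 1
        rendered.append(prepare_and_render(vertices, yaw))  # no input read in here
    return polls, rendered

points = [(1.0, 1.0, 1.0), (-1.0, 1.0, 1.0), (1.0, -1.0, -1.0)]
polls, frames_out = run(3, points)
print("input polls:", polls, "frames rendered:", len(frames_out))
```

Swapping `poll_input` for a function that always returns the same yaw turns this into the "graphics demo with no input" case, and the work done per frame is identical.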

Also, the BIOS feature you're talking about is not about disabling prefetch entirely. It's an additional setting that mostly tweaks the prefetcher to aggressively load more things into cache. You cannot disable prefetch entirely. It has been a vital optimization in every CPU for many years now, and performance would be crippled without it.
