Some mysterious effects on gameplay

Everything about latency. Tips, testing methods, mouse lag, display lag, game engine lag, network lag, whole input lag chain, VSYNC OFF vs VSYNC ON, and more! Input Lag Articles on Blur Busters.
empleat
Posts: 149
Joined: 28 Feb 2020, 21:06

Re: Some mysterious effects on gameplay

Post by empleat » 11 Apr 2021, 03:36

iceboy wrote:
29 Jan 2021, 20:16
I've learnt a lot in this forum and improved my gameplay by magnitudes. However, there are things that I didn't see discussed and that are hard to explain. They also improve gameplay a lot - I wouldn't say input lag here, because I don't have measurements, but it should be a mix of things including input lag.
I have a comprehensive list of ways to reduce your input lag to a minimum; you'll want to check this: https://www.tenforums.com/gaming/117377 ... ost1454596

Yeah, RAM can affect mouse feel a lot - you always want to buy the lowest-latency RAM.
And BIOS features mostly affect input lag; the problem is that no one is going to test these on every mobo. For example, disabling BCLK spread spectrum is insane, or VRM spread spectrum.
Then it is important to know which timers the mobo will have.
And for DPC latency tests, check AnandTech!

iceboy
Posts: 22
Joined: 04 Oct 2020, 14:22

Re: Some mysterious effects on gameplay

Post by iceboy » 12 Apr 2021, 17:31

empleat wrote:
11 Apr 2021, 03:36
I have comprehensive list to reduce your input lag to minimum, you wanna check this: <ad link>
I have actually gone through your guide and some other guides in the past. Most of the items are good, but there is one key thing I don't agree with.

I have seen in multiple game communities that new players play with default settings; medium players (think gold/diamond rank in League of Legends) play with maximum pre-rendered frames set to 1 (or low-latency mode on/ultra) and very high FPS; and top/pro players play with default settings again (maximum pre-rendered frames set to 3 - I've seen some set to 4 or 8), with very low FPS and with secret FPS-limiting/frame-pacing techniques.

The pre-render queue is just like other producer-consumer queues. When the producer is faster than the consumer, some frames will be stored in the queue, causing delay. By limiting the queue size to 1, the producer blocks on the enqueue operation whenever the previous frame has not yet been dequeued, so when the producer is faster, the delay is constantly 1 frame. While that 1-frame delay can already downgrade a top player to a medium player, the bigger problem is that it blocks the producer thread, both creating contention and producing an extremely unstable delay time.
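
To make the blocking behaviour concrete, here is a minimal producer-consumer sketch with the queue capacity limited to 1 (my own illustration, not the actual driver code): the producer - the game/CPU thread - stalls inside enqueue() until the consumer has taken the previous frame.

Code: Select all

#include <condition_variable>
#include <mutex>
#include <optional>
#include <utility>

// Bounded queue with capacity 1, i.e. the "maximum pre-rendered frames = 1" case:
// enqueue() blocks while the previous item is still waiting to be consumed.
template <typename Frame>
class BoundedQueue {
public:
    void enqueue(Frame f) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return !item_.has_value(); });  // producer stalls here
        item_ = std::move(f);
        not_empty_.notify_one();
    }
    Frame dequeue() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return item_.has_value(); });
        Frame f = std::move(*item_);
        item_.reset();
        not_full_.notify_one();  // wake the blocked producer (the game thread)
        return f;
    }
private:
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    std::optional<Frame> item_;  // room for exactly one frame
};

With the capacity fixed at 1, a fast producer is always exactly one frame ahead; the instability iceboy describes comes from how long the producer thread sits blocked inside enqueue() from frame to frame.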

I still remember how low the latency was with the NVidia 285/295 drivers on a G92 (9800GTX/GTS250) GPU, because they allowed setting the maximum pre-rendered frames to 0. I think that turns off the queue. When Fermi (the GTX 400 series) came along, suddenly no one could play the game well anymore. The darkest time was with the 320 driver, where the lag was extremely high. Things started to improve after the 340 drivers.

Hardware and software developers seem to be ignorant - they only solve the stated requirements, which doesn't necessarily create a better system. Thanks to AR/VR applications, which have low-latency requirements, we have better systems and are able to play games again, although I'm not a fan of AR/VR. The point is that the system is not designed for gaming, and it's like we are reusing commodity hardware for our special needs.

After struggling for many years, I wrote my own frame rate limiter, because none of the existing ones (e.g. RTSS) can do the job without adding latency. It started as negative-feedback frame pacing against the windowed VSync - the windowed VSync is so stable it must be a masterpiece, and it doesn't have any lag compared to full-screen VSync, which seems to be implemented incorrectly. I didn't find a way to create a timer source with stability similar to the windowed VSync without a busy loop, so I pace with it. However, the frame pacing needs to add ~2ms of lag for stability, and then I noticed I can use a busy loop: I can create an accurate delay if I only busy-loop in the last 1ms. When I limit the game to 59.94fps on a 59.94Hz screen, I can see a stable tearline in the middle of the screen - this is achieved without any scanline sync. The pixels just above the tearline have only just been computed, so they have the lowest latency. Now I play at 59.94fps on a 239.76Hz screen for faster scanout.
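
For readers who want to experiment, here is a minimal sketch of the "sleep coarse, busy-wait the last ~1ms" idea described above (my own illustration, using a fixed period from std::chrono; iceboy's actual limiter paces off the windowed VSync instead).

Code: Select all

#include <chrono>
#include <thread>

// Hybrid frame limiter: sleep for most of the frame period, then spin
// the final ~1 ms so the release time is accurate without burning a
// whole core in a busy loop.
void limit_frame(std::chrono::steady_clock::time_point& next_release,
                 std::chrono::nanoseconds period)  // e.g. 1e9 / 59.94 ns
{
    using namespace std::chrono;
    next_release += period;
    // Coarse wait: leave ~1 ms of margin for OS scheduler jitter.
    const auto coarse_target = next_release - milliseconds(1);
    if (steady_clock::now() < coarse_target)
        std::this_thread::sleep_until(coarse_target);
    // Fine wait: busy loop until the exact release time.
    while (steady_clock::now() < next_release) { /* spin */ }
}

Calling limit_frame() once per frame right before Present() gives an accurate, low-jitter cap; pacing against the compositor's VSync, as described above, would replace the fixed period with the measured VBlank interval.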

empleat
Posts: 149
Joined: 28 Feb 2020, 21:06

Re: Some mysterious effects on gameplay

Post by empleat » 14 Apr 2021, 05:35

iceboy wrote:
12 Apr 2021, 17:31
I have seen in multiple game communities that new players play with default settings; medium players (think gold/diamond rank in League of Legends) play with maximum pre-rendered frames set to 1 (or low-latency mode on/ultra) and very high FPS; and top/pro players play with default settings again (maximum pre-rendered frames set to 3 - I've seen some set to 4 or 8), with very low FPS and with secret FPS-limiting/frame-pacing techniques.
I should probably start by addressing pre-rendered frames. I once switched to the 560 driver to try 0 pre-rendered frames and maybe there was a difference - I don't quite remember. Though the GPU needs at least 1 pre-rendered frame, says NVIDIA support; otherwise it wouldn't have anything to work on. 1 pre-rendered frame will generally cause a delay of 1 frame. Though I heard the Chief Blur Buster say (if I am not mistaken) that 3 pre-rendered frames won't always be used - some can be discarded. But still, for lower input lag you should set it to 1!
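
As an aside, a game can also cap this queue itself rather than relying on the driver setting; here is a minimal D3D11 sketch using IDXGIDevice1::SetMaximumFrameLatency, shown purely as an illustration of where the "pre-rendered frames" limit lives in the API (not something from the guide above).

Code: Select all

#include <d3d11.h>
#include <dxgi.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d11.lib")

using Microsoft::WRL::ComPtr;

// Cap the render-ahead queue to a single frame from inside the application.
bool cap_frame_latency(ID3D11Device* device)
{
    ComPtr<IDXGIDevice1> dxgi_device;
    if (FAILED(device->QueryInterface(IID_PPV_ARGS(&dxgi_device))))
        return false;
    // 1 = at most one CPU-prepared frame may be queued ahead of the GPU.
    return SUCCEEDED(dxgi_device->SetMaximumFrameLatency(1));
}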

LOOOOL, can you show me please, e.g. link some stream? I never heard that pros play on low FPS with secret FPS-limiting/frame-pacing techniques - interesting! Because logically you want more FPS; I don't think there would be any advantage to lower FPS, even with secret frame-pacing techniques. G-Sync already reduces tearing drastically. What could you possibly gain from this at low FPS, I wonder?

What exactly do you mean by the low FPS that pros play on, BTW?
iceboy wrote:
12 Apr 2021, 17:31
The pre-render queue is just like other producer-consumer queues. When the producer is faster than the consumer, some frames will be stored in the queue, causing delay. By limiting the queue size to 1, the producer blocks on the enqueue operation whenever the previous frame has not yet been dequeued
I don't think this is relevant, because the CPU is significantly slower - by almost 50% - than the GPU! In BF1 (DX11) and Rainbow Six Siege (Vulkan) my GPU frame time is like 3-4ms max all the time, while the CPU frame time never goes under 6.7ms in BF1. I have never seen the GPU frame time spike, not even once! And I was monitoring it extensively, because I had CPU frame time spikes in BF1.

Whereas if I had 3 pre-rendered frames, I would have a delay of up to 3 frames! I could always tell the difference between 1 and 3-4 pre-rendered frames; 3-4 introduces a significant delay.

Interesting that if the producer is faster (as you say), the frame just waits to be dequeued - I didn't know that. Maybe if someone has a weak GPU, this should be a concern for them, then.

But it makes logical sense, since the pre-render queue is "1", that the CPU waits before it starts preparing a new frame for the GPU. That also means it will draw a more recent frame, but later than if it had started drawing right away. Not sure which would be better.
iceboy wrote:
12 Apr 2021, 17:31
I still remember how low the latency was with the NVidia 285/295 drivers on a G92 (9800GTX/GTS250) GPU, because they allowed setting the maximum pre-rendered frames to 0. I think that turns off the queue.
NVIDIA staff said that you can't really have 0 pre-rendered frames - the GPU needs at least 1 frame from the CPU in order to do work. I don't know if this is true; NVIDIA staff couldn't disclose something that wasn't meant for the public anyway. So what this setting exactly means for the GPU, I don't know. Maybe, like you said: 1 pre-rendered frame, but without waiting on the GPU, so if the CPU was faster, it would start the next frame right away. Or it would send less than 1 frame right away and NVIDIA lied to us :lol: Or it could try to reduce driver latency, which is precisely what Ultra mode is for. But you have to have high GPU usage - someone did a 1000fps camera test, and if you don't have 99% GPU usage, Ultra is worse than Off, I think. I could tell the difference in a low GPU usage game: On felt much better than Ultra/Off.
iceboy wrote:
12 Apr 2021, 17:31
Hardware and software developers seem to be ignorant - they only solve the stated requirements, which doesn't necessarily create a better system. Thanks to AR/VR applications, which have low-latency requirements
Don't even get me started about VR :lol: :D :lol: TELL ME THIS: how is it possible that a 60fps Index with some reprojection - 110-120 interpolated frames (not fixed) - on high/ultra feels like 0 input lag? I've never seen anything more responsive in my life!!!!!!!!!!!! I was afraid VR would be laggy 60Hz gameplay, but this is amazing! Even 120fps with G-Sync feels like shit to me! VR at 60 FPS feels maybe even better than 144Hz with G-Sync, capped. It can also update the controller position into the interpolated frame, but still - you have interpolated frames only! It also doesn't have tearing; I didn't notice any! How is this EVEN possible??? It is the best thing I have ever experienced in terms of input lag and smoothness!
iceboy wrote:
12 Apr 2021, 17:31
After struggling for many years, I wrote my own frame rate limiter, because none of the existing ones (e.g. RTSS) can do the job without adding latency. It started as negative-feedback frame pacing against the windowed VSync - the windowed VSync is so stable it must be a masterpiece, and it doesn't have any lag compared to full-screen VSync, which seems to be implemented incorrectly. I didn't find a way to create a timer source with stability similar to the windowed VSync without a busy loop, so I pace with it. However, the frame pacing needs to add ~2ms of lag for stability, and then I noticed I can use a busy loop: I can create an accurate delay if I only busy-loop in the last 1ms. When I limit the game to 59.94fps on a 59.94Hz screen, I can see a stable tearline in the middle of the screen - this is achieved without any scanline sync. The pixels just above the tearline have only just been computed, so they have the lowest latency. Now I play at 59.94fps on a 239.76Hz screen for faster scanout.
V-Sync, wait what? Even triple-buffered V-Sync causes lag. There is also multi-buffering, which is sort of better - not sure now, but I don't think it is better than G-Sync. Oh, I know why: it adds an extreme amount of VRAM usage (and maybe decreases FPS slightly), even like 12GB I heard. My RTX 3070 has only 8GB LOL!!!

Also, even if you had 0 tearing at 60 FPS, 60 FPS is such a low value:
1. it increases input lag drastically
2. it increases motion blur (some claim at least 180 FPS is needed to see an enemy character when you are rapidly looking around)
For me, even on a 144Hz monitor with low pixel response time, looking around quickly was blurred, and on 144Hz with G-Sync it was also blurred.

60 FPS is such a low value. Pros already played at 200 FPS on CRTs in CS 1.6. Can't believe anyone would play on 60!!!
I have a G-Sync monitor, but I would still prefer a fixed refresh rate for competitive play. Even if it causes tearing, it ultimately has the lowest latency, and at 144fps+ tearing is less of a concern.

Yeah, windowed mode has higher input lag than exclusive. Also, since 460.09 there is the new MPO overlay if you switch to windowed borderless. It has very low delay - tested in Rainbow Six Siege on Vulkan.

500Hz monitors are coming in 1-2 years, and 1000Hz in 3-4 maybe. Can't believe anyone still plays on 60 :D :D :D

deama
Posts: 368
Joined: 07 Aug 2019, 12:00

Re: Some mysterious effects on gameplay

Post by deama » 14 Apr 2021, 10:29

iceboy wrote:
09 Apr 2021, 14:15
Update - tl;dr - I found out a big factor in the difference between single vs dual memory is ... memory size

The memory kit is 2x8G, and dual channel gives 16G total. I limited the memory size to 8G using "bcdedit /set truncatememory", and the gameplay started to feel significantly snappier. This reminds me of the fact that the game uses memory-mapped files to load its assets, and with a large memory size a lot of memory can be used for caching. I then limited the memory down to 4G and 2G, and each step down brought a significant difference. It seems that limiting the memory to just enough can bring the best performance. 2G is enough for me - this machine is only for playing this ancient game; I don't even run a web browser on it.

This also shows that memory is not free - using idle memory to cache data has a cost. It makes everything larger: the page table, kernel memory pools, mm allocators, the "working set" - and it's not always a good idea. This can also partially explain why a 32-bit system performed better (because the memory is capped at 3-4G) and why enabling the pagefile performed better (smaller working set). I've also seen people using ISLC to achieve similar goals, and I'll try to dig down (e.g. play with RAMMap, CacheSet and tune kernel parameters) once I have time.

Edit: With memory limited to 2G, single and dual channel memory still differ: single channel still feels better, as in more precise, while dual channel memory feels "bold, stronger" but tends to go out of control.

Intel 11th gen CPUs are disappointing in core-to-core and memory latency.
So does that mean that if you bought, say, 2x8GB dual-channel sticks without limiting the RAM, so you'd end up with 16GB, versus buying 2x16GB dual channel and limiting the RAM to 16GB - would those two perform the same, or would the 2x16GB sticks actually perform better because you halved the memory, so I guess it's got more "room" or "cache" or whatever?

iceboy
Posts: 22
Joined: 04 Oct 2020, 14:22

Re: Some mysterious effects on gameplay

Post by iceboy » 14 Apr 2021, 16:54

deama wrote:
29 Jan 2021, 23:24
The single memory stick thing seems interesting, you're saying that gameplay feels better by just using 1 single-channel memory stick? What if you used a single dual-channel stick?
deama wrote:
14 Apr 2021, 10:29
So does that mean that if you bought, say, 2x8GB dual-channel sticks without limiting the RAM, so you'd end up with 16GB, versus buying 2x16GB dual channel and limiting the RAM to 16GB - would those two perform the same, or would the 2x16GB sticks actually perform better because you halved the memory, so I guess it's got more "room" or "cache" or whatever?
Not sure what a dual-channel stick looks like - I've only heard of dual-rank sticks, and people say those tend to have better performance while having worse limits on timings and frequency. In the original post I found a huge difference between 1x8GB and 2x8GB, and in the update post I found the difference is smaller when limiting total system memory (e.g. to 2, 4 or 8GB) in both cases. It's not getting more "room" or "cache", but smaller everything. For example, the Windows kernel allocates a different-sized non-paged pool depending on the total memory, and because of less memory usage the page table is smaller. All of this reduces how much L3 cache these "management structures" occupy. In the past I found that turning PAE off can reduce latency - it reduces the page table from 3 to 2 levels. However, there is still a difference when memory is capped. I don't want to buy more sticks with the risk that they may not improve anything. The current setup is already extreme - 40ns memory latency with a 6700K and DDR4-2400. The previous plan was to buy a Rocket Lake system, but the reviews are bad, so I'm giving that up.
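
For context on that 40ns figure, memory latency is usually measured with a pointer-chasing loop in which every load depends on the previous one; a minimal sketch below (my own illustration, not the tool iceboy used).

Code: Select all

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Pointer-chasing latency benchmark: the chain of dependent loads defeats
// out-of-order execution and prefetching, so once the buffer is much larger
// than the L3 cache the time per step approximates DRAM access latency.
int main() {
    const size_t n = 64 * 1024 * 1024 / sizeof(size_t);  // ~64 MiB, well past L3
    std::vector<size_t> next(n);
    std::iota(next.begin(), next.end(), size_t{0});

    // Sattolo's algorithm: one cycle covering every element, so the chase
    // never falls into a short loop that fits in cache.
    std::mt19937_64 rng{42};
    for (size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<size_t> d(0, i - 1);
        std::swap(next[i], next[d(rng)]);
    }

    const size_t steps = 20'000'000;
    size_t idx = 0;
    const auto t0 = std::chrono::steady_clock::now();
    for (size_t i = 0; i < steps; ++i)
        idx = next[idx];                              // serialized, cache-missing loads
    const auto t1 = std::chrono::steady_clock::now();

    const double ns = std::chrono::duration<double, std::nano>(t1 - t0).count() / steps;
    std::printf("~%.1f ns per dependent load (idx=%zu)\n", ns, idx);  // print idx so the loop isn't optimized out
    return 0;
}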

iceboy
Posts: 22
Joined: 04 Oct 2020, 14:22

Re: Some mysterious effects on gameplay

Post by iceboy » 16 Apr 2021, 19:19

empleat wrote:
14 Apr 2021, 05:35
I should probably start by addressing pre-rendered frames...
The different feeling between 0 and 1 pre-rendered frames reminds me of the following (anti-)pattern that is commonly seen in code:

Code: Select all

if (pre_rendered_frames > 0) {
    init_render_queue();
}

// 10,000 lines later ...

if (pre_rendered_frames == 0) {
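    // pre_rendered_frames == 0: render() waits for the rendering of the CURRENT frame to finish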
    render(frame);
} else {
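    // pre_rendered_frames >= 1: enqueue() only waits for the PREVIOUS frame's rendering to finish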
    enqueue(frame);
}
Some people may think pre_rendered_frames 0 and 1 behave the same, but they actually don't - render() waits for the rendering of the current frame to finish, while enqueue() waits for the rendering of the previous frame to finish. One day some guy came up with a brilliant idea - since they look the same, let's remove the branches and simplify the code. The guy ended up removing 10,000 lines of code from the NVidia driver code base and got promoted, but the behavior changed and it took users years to figure out. Some esports players ended their careers, while other people like me became tweakers, who are dissed by the gaming community for supposedly trying to create an unfair advantage - but gaming is really all about system performance, and the battle itself is nonsense.

https://www.nvidia.com/en-us/geforce/fo ... -frames-0/

For this reason, I like Linus Torvalds' opinion on never breaking userland, and also his opinion on NVidia.

[Attachment: gpuview.png]

GPUView is a useful tool for visualizing CPU and GPU usage in games. In my gaming environment, every 1/59.94 second the CPU computes a frame and then the GPU renders it. A separate scanout process reads from the rendered buffer and transmits pixels to the screen every 1/239.76 second. There is no delay.

If you think about it, there is actually no way to accurately schedule a periodic event (say, every 1/60 second) on Windows if a spin loop is not an option. The only two timer sources widely available on Windows are the system tick and the VSync. The system tick, known to gamers as "the timer resolution", when configured to 1ms adds 3% inaccuracy on average to an event at a 60/s rate, and the inaccuracy is proportional to the event rate. Also, when a system tick occurs, so many things are triggered that the CPU is starved, adding more inaccuracy. Comparatively, the VSync used as a timer source is quieter, and its interval is also more suitable for triggering an event like the game tick. However, the VSync is not exported directly and must be used in some indirect form, like actually enabling VSync or playing in windowed mode with D3DPRESENT_INTERVAL_DEFAULT. Most gamers end up trying to solve the problem in a brute-force way, firing frames as fast as they can and giving up accuracy - but the inaccuracy still exists no matter how high the FPS is.
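
There is also a somewhat more direct way to wait on the VBlank without enabling VSync in the game's own present call; a minimal sketch using DXGI's IDXGIOutput::WaitForVBlank as a pacing source (my own illustration - iceboy's limiter uses the windowed-VSync route described above).

Code: Select all

#include <dxgi.h>
#include <wrl/client.h>
#pragma comment(lib, "dxgi.lib")

using Microsoft::WRL::ComPtr;

// Use the display's vertical blank as a timer source: WaitForVBlank blocks
// the calling thread until the next VBlank of the chosen output, which can
// then drive a game tick or release the next frame.
int main() {
    ComPtr<IDXGIFactory> factory;
    if (FAILED(CreateDXGIFactory(IID_PPV_ARGS(&factory)))) return 1;

    ComPtr<IDXGIAdapter> adapter;
    if (FAILED(factory->EnumAdapters(0, &adapter))) return 1;   // first GPU

    ComPtr<IDXGIOutput> output;
    if (FAILED(adapter->EnumOutputs(0, &output))) return 1;     // first monitor

    for (int frame = 0; frame < 600; ++frame) {
        output->WaitForVBlank();  // returns once per refresh, e.g. every 1/239.76 s
        // run the game tick / release the next frame here
    }
    return 0;
}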

https://gaming.stackexchange.com/questi ... indow-mode

I don't know the answer, but if you ask me, I think it's more of a selection process: pro players did not choose to play in windowed mode; rather, people who played in windowed mode were more likely to become pro players.

There is one thing I did not figure out: the DirectX 9 windowed mode seems to add a composition buffer somewhere between Present() and scanout, because I don't see the tearline. If anyone knows how to make them use the same buffer, please tell me!

empleat
Posts: 149
Joined: 28 Feb 2020, 21:06

Re: Some mysterious effects on gameplay

Post by empleat » 17 Apr 2021, 12:39

deama wrote:
29 Jan 2021, 23:24
The single memory stick thing seems interesting, you're saying that gameplay feels better by just using 1 single-channel memory stick? What if you used a single dual-channel stick?
If you use dual RAM sticks you gain slightly more performance, because data can be sent over 2 channels. Same principle with quadruple RAM sticks and more, but then the performance gain is more negligible. 1 vs 2 modules could be like 4 fps, nothing drastic. But yeah, it helps a bit. Sometimes 4 fps is a big deal!
deama wrote:
14 Apr 2021, 10:29
So does that mean that if you bought, say, 2x8GB dual-channel sticks without limiting the RAM, so you'd end up with 16GB, versus buying 2x16GB dual channel and limiting the RAM to 16GB - would those two perform the same, or would the 2x16GB sticks actually perform better because you halved the memory, so I guess it's got more "room" or "cache" or whatever?
Interesting - not sure if there is added latency using dual channel. I've never heard about it; it could be better, because you can send data over 2 channels in parallel!
iceboy wrote:
16 Apr 2021, 19:19
Some people may think pre_rendered_frames 0 and 1 behave the same, but they actually don't - render() waits for the rendering of the current frame to finish, while enqueue() waits for the rendering of the previous frame to finish.
OK, so pre-rendered frames = 0 doesn't wait on the previous frame's rendering to finish, if I understand it correctly. But unless you are bottlenecked by the GPU, this shouldn't matter at all, as the GPU is much faster than the CPU!
iceboy wrote:
16 Apr 2021, 19:19
I don't know the answer, but if you ask me, I think it's more of a selection process: pro players did not choose to play in windowed mode; rather, people who played in windowed mode were more likely to become pro players.
This does not make sense. FSE is better than windowed, as it:
1. definitively has less input lag
2. has less tearing

Also, it will suspend the user GUI and the game uses its own compositor - I think (maybe not in Windows 10) - instead of DWM. And DWM forces V-Sync in windowed mode! However, since the 460.09 NVIDIA drivers you can use windowed borderless mode, and there is a good chance Windows will activate the MPO overlay. It works on DX12/Vulkan, since with DX12/Vulkan devs can now choose not to implement FSE in their games :x ... And it has amazing input lag - I tested it in Rainbow Six Siege on Vulkan!
iceboy wrote:
16 Apr 2021, 19:19
There is one thing I did not figure out: the DirectX 9 windowed mode seems to add a composition buffer somewhere between Present() and scanout, because I don't see the tearline. If anyone knows how to make them use the same buffer, please tell me!
On Windows 10, in windowed mode DWM will force V-Sync, therefore no tearline! Maybe that's why you see input lag. But generally windowed mode has always had higher input lag. I never ever used it; FSE feels much better! Maybe pros use windowed because they are streaming, so they are used to it! There were even some tests showing FSE is better in terms of input lag!

forii
Posts: 218
Joined: 29 Jan 2020, 18:23

Re: Some mysterious effects on gameplay

Post by forii » 17 Apr 2021, 15:25

n1zoo wrote:
01 Feb 2021, 10:11
Shader cache has some impact on hitreg. Off feels better for me.
+ Try unchecking Realtek Ethernet in the MSI utility. Feels like headshots land more in CS.
I don't have Realtek - should I still uncheck my Intel Ethernet? Does it really matter on Windows 20H2?
That's what I have now; I don't remember what the default was, but I think I turned them all to MSI. Which one should remain unchecked? :)

[screenshot of MSI utility settings]

User avatar
n1zoo
Posts: 182
Joined: 04 Feb 2020, 06:26
Location: Lithuania

Re: Some mysterious effects on gameplay

Post by n1zoo » 17 Apr 2021, 16:40

forii wrote:
17 Apr 2021, 15:25
n1zoo wrote:
01 Feb 2021, 10:11
Shader cache has some impact on hitreg. Off feels better for me.
+ Try unchecking Realtek Ethernet in the MSI utility. Feels like headshots land more in CS.
I don't have Realtek - should I still uncheck my Intel Ethernet? Does it really matter on Windows 20H2?
That's what I have now; I don't remember what the default was, but I think I turned them all to MSI. Which one should remain unchecked? :)

[screenshot of MSI utility settings]
Uncheck Intel(R) Ethernet Connection. If you do not notice any effect, you can check it again.

forii
Posts: 218
Joined: 29 Jan 2020, 18:23

Re: Some mysterious effects on gameplay

Post by forii » 18 Apr 2021, 10:08

n1zoo wrote:
17 Apr 2021, 16:40
forii wrote:
17 Apr 2021, 15:25
n1zoo wrote:
01 Feb 2021, 10:11
Shader cache has some impact on hitreg. Off feels better for me.
+ Try unchecking Realtek Ethernet in the MSI utility. Feels like headshots land more in CS.
I don't have Realtek - should I still uncheck my Intel Ethernet? Does it really matter on Windows 20H2?
That's what I have now; I don't remember what the default was, but I think I turned them all to MSI. Which one should remain unchecked? :)

[screenshot of MSI utility settings]
Uncheck Intel(R) Ethernet Connection. If you do not notice any effect, you can check it again.
Thanks mate.
But what does it mean that it can degrade something? Will my mouse get worse because of it?

Isn't it risky to uncheck it?

Quote below:
"the question is only now... for how long it stays like that and when it degrades again?"
