
Our favorite DWM

Posted: 11 Sep 2022, 15:32
by AndreyRGW
Hi all,

I decided to run a test on Windows 11:
TimerBench with and without DWM. The results are interesting.
[Attachment: Without DWM.jpg]
[Attachment: With DWM.jpg]


What do you think?

Re: Our favorite DWM

Posted: 13 Sep 2022, 13:26
by Chief Blur Buster
Interesting!

That being said: many test passes are needed.

Run with DWM, then without, then with, then without, and keep repeating -- preferably at least 10 more times. Alternating the runs also keeps caching effects fairer.

One run of each unfortunately produces data that is too noisy to draw conclusions from.

You need averages to punch through the various sources of noise (random error shrinks roughly with the square root of the number of runs).

Random things like a Windows service or driver might suddenly decide to do more work during one run than another -- intermittent cloud-sync checks, malware scans, memory garbage collection, and other background activity. Even minor run-to-run differences, like moving the mouse 1 inch further than in the previous run, can create dramatic differences in "0.1%-different" results.

Re: Our favorite DWM

Posted: 13 Sep 2022, 14:46
by Anonymous768119
I don't think it's worth it for 0.06ms and 7 FPS

Re: Our favorite DWM

Posted: 13 Sep 2022, 20:07
by Chief Blur Buster
a_c_r_e_a_l wrote:
13 Sep 2022, 14:46
I don't think it's worth it for 0.06ms and 7 FPS
Depends on the machine -- it can be a much bigger difference (or not). It also depends on the Hz you use and what you're doing.

There are times where the overhead is clumped together -- e.g. weird microfreezes between refresh cycles -- which adds mouse jitter, keyboard inputread jitter, etc., affecting lagfeel.

Compositing sometimes interrupts the GPU's asynchronous rendering workflow on certain GPUs, creating a bunching effect. This is especially true if you've enabled RTSS Scanline Sync, whose Force Flush setting also forces the compositing operations to complete before the GPU can render the next frame. You end up needing more GPU% headroom to successfully maintain RTSS Scanline Sync if DWM is turned on.

Sometimes an inefficient DWM (bogged down by flaws, a badly written driver hook, etc.) creates microstutters. Most of the time the compositing overhead is 0.1ms or less on modern fast-memory-bandwidth GPUs -- such as the 0.06ms here -- especially modern RTX GPUs, but sometimes it spikes to, say, 1ms: big enough to create human-visible microstutter in strobed modes.

The top-spec RTX 4090 is rumored to do 1 terabyte per second of memory bandwidth, and compositing framebuffers (megabytes) is child's play with that nuclear-powered damburst of memory bandwidth. DWM on/off would probably be a virtually zero human-perceivable difference, if no other overheads are bogging things down. But coding in the Microsoft kernel has sometimes (ahem) been a problem. Oh, and the drivers cesspool? :D

But in the real world, not everyone will get a 4090, drivers can be bad, the OS can be bad, and 3rd party software adds processing-expensive hooks (RTSS, the Alt+Z GeForce overlay, RGB utilities, etc.), and all those calls between Ring 0 (kernel) and userspace are expensive. Who knows, some shenanigans may move them to DPCs (Deferred Procedure Calls), which can have domino effects on latencies.

As we hit 500Hz and 1000Hz, all those compositing overheads add up. At 1000Hz the frame budget is only 1ms, so a thousand 0.1ms overheads translate to 100ms of every second -- roughly a 10% slowdown in framerate during 1000Hz Microsoft Windows.
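
A back-of-envelope sketch of that arithmetic in C++ (the 0.1ms per-frame compositing cost is just the assumed figure from above):

[code]
#include <cstdio>

// How much of the per-frame time budget a fixed compositing overhead
// eats, and the resulting effective framerate, at various refresh rates.
int main() {
    const double overhead_ms = 0.1;  // assumed per-frame compositing cost
    for (double hz : {60.0, 240.0, 1000.0}) {
        double budget_ms = 1000.0 / hz;  // per-frame time budget
        double effective_fps = 1000.0 / (budget_ms + overhead_ms);
        std::printf("%6.0f Hz: budget %6.3f ms, overhead %4.1f%%, effective %4.0f fps\n",
                    hz, budget_ms, 100.0 * overhead_ms / budget_ms, effective_fps);
    }
}
[/code]

At 60Hz the same 0.1ms is only about 0.6% of the frame budget; at 1000Hz it is 10%.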

Most of the time, DWM versus no-DWM makes a negligible difference in most use cases, especially casual play, and sometimes the pros (e.g. fast Alt+Tabbing) exceed the cons.

But there are exceptions where DWM "unexpectedly" adds 5-10% overhead (e.g. certain ultra-high-Hz scanline sync algorithms on a certain brand of GPU on a certain machine). And when Microsoft introduced Full Screen Optimizations, things became a bit different: sometimes overheads that didn't exist appeared, and other overheads that existed disappeared.

It's a crapshoot per-machine, per-GPU, per-settings, but if you're at ultra-high Hz (390Hz, 500Hz, etc.), it's worth at least double-checking that DWM compositing is not a major % of your system workload. I've seen it as a 0.1% workload and I've seen it as a 10% workload.

All this is mostly academic if you have a modern GPU with fast memory. It's a bigger issue with an integrated GPU sharing DRAM -- that's when DWM compositing gives you a bigger hit, due to less memory bandwidth.

But I'm always interested in more tests of Windows 11, to see what "problems" it might have.

TL;DR: Test it out, verify it's not a major workload difference, and also double-check for frametime spikes.
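
If you want to automate that double-check, here's a minimal sketch of a frametime-spike logger, assuming you can call it once per presented frame in your render loop (the 2ms threshold is an arbitrary example -- tune it to your Hz):

[code]
#include <windows.h>
#include <cstdio>
#pragma comment(lib, "winmm.lib")

// Call once per presented frame; prints any frame that exceeded the
// spike threshold, timed with QueryPerformanceCounter.
void LogFrametimeSpikes() {
    static LARGE_INTEGER freq, last;
    static bool init = false;
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    if (!init) { QueryPerformanceFrequency(&freq); last = now; init = true; return; }
    double ms = 1000.0 * (now.QuadPart - last.QuadPart) / freq.QuadPart;
    last = now;
    if (ms > 2.0)  // spike threshold: arbitrary, tune to your refresh rate
        std::printf("frametime spike: %.3f ms\n", ms);
}

int main() {  // demo: fake render loop with one deliberate spike
    timeBeginPeriod(1);  // raise timer resolution so Sleep(1) is ~1ms
    for (int i = 0; i < 100; ++i) {
        Sleep(i == 50 ? 10 : 1);  // frame 50 simulates a 10ms stall
        LogFrametimeSpikes();
    }
    timeEndPeriod(1);
}
[/code]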

Re: Our favorite DWM

Posted: 20 Sep 2022, 13:44
by BTRY B 529th FA BN
I tested with HPET enabled and disabled, and the 'Timer Calls/s' changed dramatically between the two timers.

HPET Timer Calls
-Windowed Synthetic Test - 964k
-Windowed Game Test - 521k
-Full Screen Synthetic Test - 964k
-Full Screen Game Test - 526k

Invariant TSC Calls
-Windowed Synthetic Test - 28m
-Windowed Game Test - 1.68m
-Full Screen Synthetic Test - 28m
-Full Screen Game Test - 1.77m
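
For context on what a 'Timer Calls/s' figure measures -- I don't know TimerBench's exact method, so this is an assumed approximation -- the idea is to count how many QueryPerformanceCounter() calls complete in one wall-clock second. With an invariant-TSC-backed QPC, each call is a fast CPU instruction; with an HPET-backed QPC, each call traps to a much slower platform timer read, hence the orders-of-magnitude gap above:

[code]
#include <windows.h>
#include <cstdio>

// Count how many QueryPerformanceCounter() calls complete in ~1 second.
// Invariant TSC backing: tens of millions. HPET backing: far fewer.
int main() {
    LARGE_INTEGER freq, start, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    long long calls = 0;
    do {
        QueryPerformanceCounter(&now);
        ++calls;
    } while (now.QuadPart - start.QuadPart < freq.QuadPart);  // 1 second of ticks
    std::printf("QPC calls/s: %lld\n", calls);
}
[/code]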

Re: Our favorite DWM

Posted: 25 Sep 2022, 11:29
by espresso
AndreyRGW wrote:
11 Sep 2022, 15:32
Hi all,

I decided to run a test on Windows 11:
TimerBench with and without DWM. The results are interesting.

What do you think?
How do you "turn off" DWM?

Re: Our favorite DWM

Posted: 25 Sep 2022, 11:41
by BTRY B 529th FA BN
espresso wrote:
25 Sep 2022, 11:29
AndreyRGW wrote:
11 Sep 2022, 15:32
Hi all,

I decided to run a test on Windows 11:
TimerBench with and without DWM. The results are interesting.

What do you think?
How do you "turn off" DWM?
Use Fullscreen mode.
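
Worth noting: since Windows 8, DWM composition can no longer be globally disabled -- DwmIsCompositionEnabled() always reports TRUE on Windows 8/10/11. "Turning it off" per-application means exclusive fullscreen (or independent flip), where the game's swapchain bypasses the compositor. A minimal check, assuming you link against Dwmapi.lib:

[code]
#include <windows.h>
#include <dwmapi.h>
#include <cstdio>
#pragma comment(lib, "Dwmapi.lib")

// On Windows 8+ this always prints "on": DWM itself cannot be disabled.
// Per-app bypass happens via exclusive fullscreen / independent flip.
int main() {
    BOOL enabled = FALSE;
    if (SUCCEEDED(DwmIsCompositionEnabled(&enabled)))
        std::printf("DWM composition: %s\n", enabled ? "on" : "off");
}
[/code]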

Re: Our favorite DWM

Posted: 25 Sep 2022, 23:27
by pox02
https://prnt.sc/AfM0PPyKxlTn

1301.47 fullscreen
1293.60 RDTSC -- a little less FPS, but it gives me peace of mind while gaming :D

Re: Our favorite DWM

Posted: 26 Sep 2022, 07:41
by BTRY B 529th FA BN
pox02 wrote:
25 Sep 2022, 23:27
https://prnt.sc/AfM0PPyKxlTn

1301.47 fullscreen
1293.60 RDTSC -- a little less FPS, but it gives me peace of mind while gaming :D
You tested at 640x480 -- do you game at 640x480?

Re: Our favorite DWM

Posted: 28 Sep 2022, 19:15
by Chief Blur Buster
BTRY B 529th FA BN wrote:
20 Sep 2022, 13:44
I tested with HPET enabled and disabled, and the 'Timer Calls/s' changed dramatically between the two timers.

HPET Timer Calls
-Windowed Synthetic Test - 964k
-Windowed Game Test - 521k
-Full Screen Synthetic Test - 964k
-Full Screen Game Test - 526k

Invariant TSC Calls
-Windowed Synthetic Test - 28m
-Windowed Game Test - 1.68m
-Full Screen Synthetic Test - 28m
-Full Screen Game Test - 1.77m
Oh wow -- that's a big difference. Big enough to create visible domino effects (see below).

In situations where you've created tons of timers, that's enough for a "death by a thousand cuts" effect in human-feelable jitter/lag, depending on how the calls "bunch together".

(e.g. bunched events can cause continuous blockages of lower-priority events. Consecutive surges of timer calls in a higher-priority process can continually delay a specific lower-priority event, such as a softcursor mouseread in game software = mouse jitter = more random cursor gapping.)
____

Related commentary:

I am not sure if this applies to Windows 10 or 11, but I've heard that some driver and operating-system optimizations bunch similar-expiration timer events into a single DPC. That creates a much higher-latency DPC, because multiple timer events have to execute in one DPC. Even 0.5ms to 1ms of total bunching can create human-visible effects now (like a sudden brief stroboscopic-stepping distance change, aka jitter in motion) at the Hz, resolutions, and persistence of the display technologies we're now dealing with.
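
For what it's worth, an application can at least ask the kernel not to coalesce its own timers. A minimal sketch, assuming Windows 10 1803+ for the high-resolution flag (whether a particular driver's DPC behavior cooperates is another matter):

[code]
#include <windows.h>
#include <cstdio>

// Request an uncoalesced, high-resolution timer: the HIGH_RESOLUTION
// flag (Windows 10 1803+) opts out of coarse timer batching, and
// TolerableDelay = 0 tells the kernel not to merge this expiration
// with nearby ones.
int main() {
    HANDLE t = CreateWaitableTimerExW(nullptr, nullptr,
                                      CREATE_WAITABLE_TIMER_HIGH_RESOLUTION,
                                      TIMER_ALL_ACCESS);
    if (!t) return 1;  // flag unsupported on older builds
    LARGE_INTEGER due;
    due.QuadPart = -5000;  // negative = relative time, in 100ns units = 0.5ms
    SetWaitableTimerEx(t, &due, 0, nullptr, nullptr, nullptr,
                       /*TolerableDelay*/ 0);
    WaitForSingleObject(t, INFINITE);
    std::printf("timer fired\n");
    CloseHandle(t);
}
[/code]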

Or in other words, it's like an internal self-DDoS, like a classical NMI loop (the infinite loop of Non-Maskable Interrupts of old 8-bit/16-bit lore). The modern accidental repeat-bunching of Ring 0/1/2 timer events at high priority can cause a freeze in Ring 3 timer events at userspace priority. I've seen the starved-CPU-cycle effect. Like a poorly written driver!

It's like dragging certain "paint-inefficient" apps with an 8KHz mouse -- like an old version of Excel (I think it was Excel 2011) -- at 8000 cursor positions = window positions per second. The Excel window tries to repaint itself (WM_PAINT) about 8000 times a second and essentially DDoSes itself, with window dragging lagging about 2-3 seconds behind until you stop moving the mouse mid-drag. Even though that's not a timer event, it is a real-world cascade effect of how event bunching can cause massive lag spikes. Heck, I would still be able to human-feel a mere 1/100th of that 3-second Microsoft Excel latency (30ms). Most modern apps don't repaint while the window is moved, since all new versions of Windows give them backing-store buffers automatically and don't even fire WM_PAINT during window drags anymore -- but some apps still do, and it really shows with ultra-high-Hz gaming mice in older productivity apps that repaint at every mouse cursor position...
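
The mitigation is actually built into Win32, if apps use it correctly: InvalidateRect() only marks a region dirty, and Windows synthesizes WM_PAINT only when the message queue is empty, so thousands of mouse moves per second collapse into a handful of repaints. The anti-pattern is forcing a synchronous repaint (e.g. UpdateWindow()) from the input path. A minimal sketch:

[code]
#include <windows.h>

static POINT g_mouse;  // input handlers only record state (cheap)

static LRESULT CALLBACK WndProc(HWND h, UINT m, WPARAM w, LPARAM l) {
    switch (m) {
    case WM_MOUSEMOVE:
        g_mouse.x = (short)LOWORD(l);
        g_mouse.y = (short)HIWORD(l);
        // Marks the window dirty; WM_PAINT arrives only once the queue
        // drains, so even an 8KHz mouse causes few actual repaints.
        // Anti-pattern: UpdateWindow(h) here = one sync repaint per move.
        InvalidateRect(h, nullptr, FALSE);
        return 0;
    case WM_PAINT: {
        PAINTSTRUCT ps;
        HDC dc = BeginPaint(h, &ps);
        char buf[64];
        wsprintfA(buf, "cursor: %ld, %ld", g_mouse.x, g_mouse.y);
        TextOutA(dc, 10, 10, buf, lstrlenA(buf));
        EndPaint(h, &ps);
        return 0;
    }
    case WM_DESTROY:
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProcW(h, m, w, l);
}

int WINAPI wWinMain(HINSTANCE hi, HINSTANCE, PWSTR, int show) {
    WNDCLASSW wc = {};
    wc.lpfnWndProc   = WndProc;
    wc.hInstance     = hi;
    wc.lpszClassName = L"CoalesceDemo";
    wc.hCursor       = LoadCursor(nullptr, IDC_ARROW);
    RegisterClassW(&wc);
    HWND hwnd = CreateWindowW(L"CoalesceDemo", L"repaint coalescing demo",
                              WS_OVERLAPPEDWINDOW, CW_USEDEFAULT, CW_USEDEFAULT,
                              400, 200, nullptr, nullptr, hi, nullptr);
    ShowWindow(hwnd, show);
    MSG msg;
    while (GetMessageW(&msg, nullptr, 0, 0)) {
        TranslateMessage(&msg);
        DispatchMessageW(&msg);
    }
    return 0;
}
[/code]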

So metaphorically, bunched timer-event surges are also very jitter-problematic (a lag issue), especially if they're processed in a higher-priority driver (thread priority, process priority) in a higher ring (e.g. Ring 0/1/2) than the game itself (usually Normal process priority, Normal thread priority, Ring 3).

This wasn't a problem at 60fps 60Hz, but it can now be a limiting factor for jitter-free framepacing at 1000fps 1000Hz framerate=Hz with 0ms GtG (i.e. no GtG blur to hide jitter error margins), when trying to maintain perfect gametime:inputread:photontime sync to the very-sub-millisecond error margin necessary to avoid human-visible effects.

Software design and compensation would avoid this (e.g. the optimizations many VR developers already do) -- but I've seen too many crappy apps and drivers that use the Wrong Tool For The Job, e.g. choosing a timer vs busyloop vs semaphore vs mutex vs waitable lock vs whatever -- and accidentally self-DDoS the perfection of mousereads / inputreads / Present() calls, or accidentally delay a timestamping operation (e.g. an unwanted delayed read of RDTSC/QueryPerformanceCounter during a precision-critical timestamping operation) that cascades into visible rendering offsets (aka jitter).
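
As one concrete example of tool choice for a precision-critical wait, a common pattern is a hybrid: sleep through most of the interval (cheap but coarse), then busy-spin on QueryPerformanceCounter for the final stretch (accurate but burns CPU). A minimal sketch, with an arbitrary 2ms spin threshold:

[code]
#include <windows.h>
#pragma comment(lib, "winmm.lib")

// Hybrid wait: Sleep() is only accurate to the system timer resolution,
// and a pure busyloop wastes a whole core, so sleep until ~2ms before
// the deadline and spin the rest for sub-millisecond wakeup accuracy.
void PreciseWaitUntil(LONGLONG deadlineQpc, LONGLONG qpcFreq) {
    const LONGLONG spinTicks = qpcFreq / 500;  // spin the final ~2ms
    LARGE_INTEGER now;
    for (;;) {
        QueryPerformanceCounter(&now);
        LONGLONG remaining = deadlineQpc - now.QuadPart;
        if (remaining <= 0) return;
        if (remaining > spinTicks)
            Sleep(1);            // coarse portion of the wait
        else
            YieldProcessor();    // fine-grained spin to the deadline
    }
}

int main() {
    timeBeginPeriod(1);  // make Sleep(1) actually ~1ms
    LARGE_INTEGER freq, start;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    // demo: wait ~10ms with sub-millisecond precision
    PreciseWaitUntil(start.QuadPart + freq.QuadPart / 100, freq.QuadPart);
    timeEndPeriod(1);
}
[/code]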

In unintentional interactions, it is amazing how those nanoseconds/microseconds can bunch up into milliseconds, and how those milliseconds bunch up into seconds, because of The Way Things Are Designed in the processing workflow of an OS, its drivers, and its applications...

And sometimes it accidentally preempts things in userspace that then don't get executed in time to avoid a human-visible effect like a 1ms microstutter (4000 pixels/sec on a 4K display = a 4-pixel jump for a 1ms microstutter), especially when the stutter error is above MPRT (thanks to modern low-MPRT ultra-high-rez displays such as VR displays).

In other words: stutter error margin bigger than motion blur (MPRT) = results visible above the human perception noisefloor. This was unimportant on low-Hz low-rez displays, but with today's vicious-cycle combination of ultra-high Hz + ultra-high rez + low persistence, sub-millisecond issues have now become human-visible, especially in VR.

Windows is not an RTOS... and unfortunately, even an RTOS can accidentally DDoS itself with event overload.

</ERROR code="1202" system="AGC" year="1969">