Human Benchmark implemented in C++ DirectX9

Everything about latency. Tips, testing methods, mouse lag, display lag, game engine lag, network lag, whole input lag chain, VSYNC OFF vs VSYNC ON, and more! Input Lag Articles on Blur Busters.
Meowchan
Posts: 36
Joined: 17 Jun 2020, 02:06

Human Benchmark implemented in C++ DirectX9

Post by Meowchan » 17 Aug 2020, 18:08

Code: https://github.com/NotThat/Benchmark-DirectX9
Binaries: https://github.com/NotThat/Benchmark-Di ... s/tag/v1.0

Benchmark.DX9.Black.White.exe - changes the screen color to white when pressing the 'w' key and to black when pressing 'b'. Esc to quit.
Benchmark.DX9.Human.exe - uses the mouse buttons (and Esc to quit). The first click turns the screen red; after a random 3-4 second delay it turns green, similar to the popular Human Benchmark website. It keeps stats on min/max/avg delay as well as how many early clicks you had. Needless to say, it is much faster than the web version.
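The timing core of a test like this doesn't depend on DirectX at all. As a rough, portable sketch (names and structure hypothetical, not taken from the repo), the random 3-4 second delay and the min/max/avg bookkeeping might look like:

```cpp
#include <algorithm>
#include <chrono>
#include <random>
#include <vector>

// Hypothetical sketch of the reaction-test bookkeeping: samples in ms,
// plus a counter for clicks that arrived before the green switch.
struct ReactionStats {
    std::vector<double> samples;
    int earlyClicks = 0;

    void add(double ms) { samples.push_back(ms); }
    double min() const { return *std::min_element(samples.begin(), samples.end()); }
    double max() const { return *std::max_element(samples.begin(), samples.end()); }
    double avg() const {
        double sum = 0;
        for (double s : samples) sum += s;
        return sum / samples.size();
    }
};

// The red-to-green switch moment: a uniform random delay in [3000, 4000] ms.
double pickDelayMs(std::mt19937& rng) {
    return std::uniform_real_distribution<double>(3000.0, 4000.0)(rng);
}

// Reaction time: milliseconds elapsed since the screen turned green.
double elapsedMs(std::chrono::steady_clock::time_point greenAt) {
    using namespace std::chrono;
    return duration<double, std::milli>(steady_clock::now() - greenAt).count();
}
```

The real program measures from the frame that was actually presented, which is where the DirectX9 rendering path and the Present() call come in.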

My original goal was to implement a program that changes the color of the screen on a keypress. I first implemented it in Love2D; when I wasn't happy with the performance, I reimplemented it in DirectX9, DirectX12, OpenGL, and Vulkan to see which one was fastest. DirectX9 ended up giving me the best performance, so that's what I went with.

The program runs at ~6.7k-6.9k FPS on my machine, producing full-color, full-screen-resolution frames of a single color. My graphics card shows 82% utilization and I am CPU bound. Quite interesting, since I have a very strong CPU (i9-9900K) and a relatively weak graphics card (GTX 960). At lower resolutions it goes up to 16k FPS or so.

Once I had the program going, implementing the human benchmark on top of it was trivial, and since I figured people might be interested, I am sharing it here.

My focus was on getting everything to run as fast as possible. If anyone has ideas on how to squeeze out more speed, I would love to hear them. Some ideas:
- Lowering the resolution and/or color space. I rather like 1080p and feel that deviating from it makes the task more synthetic and less applicable to real-world situations. It does have its uses, however.
- A different graphics API; which one?
- Using threads. I've implemented an option to use separate threads for input and rendering, but the results are about the same. Almost all of the run time is spent in the Present() call, which is out of my control.
- Not producing frames at all and simply waiting for input before calling Present(). This does make sense and I have tried it, but empirically the best-case results were somehow worse. I suspect a power-saving issue in the graphics card; probably not in the CPU, as the CPU should remain active due to the busy input loop. However, I don't have an Arduino/Teensy yet to test this, and the numbers are from somebody else who does. The only metric I can judge by is the FPS number, and generally, if my implementation produces more FPS, it should perform better as well.
- It is my understanding that at least some of the CPU overhead in calling Present() is due to the context switch of Windows having to go driver-side into kernel mode. I wonder if it's possible to do something about this, perhaps writing the program in driver mode? O_o
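For the threads idea, the structure I'd expect (a hedged, portable sketch; the Present() call is stubbed with a counter here, since the real IDirect3DDevice9::Present is Windows-only) is an input thread flipping an atomic flag while the render loop spins:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Sketch of the separate-input-thread idea (names hypothetical): the input
// thread flips an atomic color flag, while the render loop spins on a
// stubbed Present(). In the real program the loop body would be
// Clear()-to-color followed by IDirect3DDevice9::Present().
std::atomic<bool> whiteScreen{false};
std::atomic<bool> running{true};
std::atomic<long> framesPresented{0};

void inputThread() {
    // Real code would poll raw input / GetAsyncKeyState here.
    whiteScreen.store(true, std::memory_order_relaxed);
    running.store(false, std::memory_order_relaxed);
}

void renderLoop() {
    while (running.load(std::memory_order_relaxed)) {
        bool white = whiteScreen.load(std::memory_order_relaxed);
        (void)white;  // real code: Clear() to white or black, then Present()
        framesPresented.fetch_add(1, std::memory_order_relaxed);
    }
}
```

As the post notes, this structure doesn't help much when nearly all of the frame time is inside Present() itself.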



I'm curious to see some results, as well as comparisons with the web version. Be sure to use at least a 5-click sample size. Here's mine:

Image
Image

deama
Posts: 180
Joined: 07 Aug 2019, 12:00

Re: Human Benchmark implemented in C++ DirectX9

Post by deama » 18 Aug 2020, 09:11

I've noticed a reduction of about 20-30ms from using your program, which I thought was interesting.
Though I've noticed that the delay is always 3-4 seconds or so, which is bad because you can easily get used to it and end up predicting it far too easily, allowing you to sometimes get sub-100ms reaction times.

Alpha
Posts: 96
Joined: 09 Jul 2020, 17:58

Re: Human Benchmark implemented in C++ DirectX9

Post by Alpha » 18 Aug 2020, 09:16

Good topic. I've been interested in this due to how my machine handles USB, and from reading that the mouse should always go into the ports wired to the CPU rather than the ports on the chipset. I still haven't tested that yet. My averages are always about the same. Since I've been on a mouse conquest, it's been interesting playing with the Human Benchmark website. What is impressive about your clicks is the consistency. Mine varies by a significant margin. I tend to average around 145ms, but I'll have a click at 132 and one at 174 in the same session, and never within your margin, which I'd prefer.

At my buddy's house, he has a 7700K and a 1070, and I hit 145 on clicks 1-8, and he has a straight-from-China no-name mouse (wtf).

With my optimized 3900, 32GB CL14 3800 memory, all-PCIe-4.0 NVMe, and a 2080 Ti at +175 core / +1100 mem, there's no way I can match that consistency. That was on an Omen X 25F, and he has a 120Hz Predator widescreen. He was around 225ms, but everyone was watching, so probably nerves.

We tried this at work under worst-case scenarios... wow, easy 200 lol. Hyper-focused and smashing the click like you'd never do in game (since it'd throw off your aim), I did manage a 180ms (1 click hehe), but I probably looked like someone whose life was on the line :lol:

For the package, is this an installer or just an executable (I won't be able to look until later)?

Curious if you've tried this, assuming you're running Win 10 2004, with its new GPU options?

@deama just saw your post. If it's predictable, that would be a massive problem for testing.

Chief Blur Buster
Site Admin
Posts: 8326
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada
Contact:

Re: Human Benchmark implemented in C++ DirectX9

Post by Chief Blur Buster » 18 Aug 2020, 09:49

Welcome!

I can answer immediately, since I am familiar with Present()-to-Photons science of all sync technologies (VSYNC ON, VSYNC OFF, VRR, strobed, non-strobed).

The fastest reaction time will occur with 1000fps+ VSYNC OFF in full screen exclusive mode, non-strobed. However, absolute lag (latency offset), lag gradient (differences in lag between different pixels), and lag consistency (how the lag varies) will all differ. So there are three latency variables. Your benchmark mainly tests absolute lag, at approximately screen center.

If the window is dragged to the top edge, it will have a different reaction time offset than if it is dragged to the bottom, so for consistency, always enforce fullscreen mode, preferably exclusive (to best represent FSE games).

In VSYNC OFF, non-strobed operation, the latency differential of the entire screen surface is equalized to the span of one frametime. In other words, during 1000fps VSYNC OFF, non-strobed, the latency of top = center = bottom within a 1ms error window (+/- 0.5ms).

For all other sync technologies in non-strobed mode, the top edge of the LCD screen will refresh before the bottom edge, as seen in high speed videos at www.blurbusters.com/scanout -- the scanout diagrams help people understand that not all pixels on a screen refresh at the same time.

Video cables and most video screens are simply serializations of 2D data into a 1D wire, raster-style, calendar-style, top-to-bottom, left-to-right. And most lag-optimized gaming monitors stream those pixels almost straight to the screen (with only pixel-row processing).

[video]


This is a 960fps video of a screen swapping between 4 images, one image per refresh cycle. As you can see, the screen refreshes top to bottom. So there's a ~16ms difference between top edge and bottom edge of the screen (in VSYNC ON mode / non-FSE mode). Now when you map the video onto a scanout diagram, it looks like:

Image

And if you've ever played with Custom Resolution Utility and its weird "Porch" numbers, this is how a screen is mapped out, hierarchically. Porches are simply virtual overscan pixels outside the edges of the screen, and syncs are separators (like comma delimiters) between pixel rows and refresh cycles in an infinite loop (like an imaginary "metaphorical filmreel").

Image

(From my Custom Resolution Glossary)

For simplicity, let's ignore displays that have to scan-convert from cable sequence to display, such as plasmas, DLPs, or other displays that scan out differently or out of sync with the cable sequence. However, the cable sequence after Present() is always unchanged; you can generally trust the cable-level sequence to be top-to-bottom for a landscape monitor.

Now that you understand the basics of why not all pixels refresh at the same time, I can tell you some absolute mathematical generalities:

VSYNC ON, non-strobed:
top < center < bottom, with a differential lag of 1 refresh cycle (bottom lag = top lag + 1/Hz sec), plus an absolute lag offset from the frame queue depth and compositing (non-FSE)

VSYNC OFF, non-strobed:
top = center = bottom, with a differential error margin of 1 frametime and an absolute lag offset of +0ms. This is because Present() splices the new frame into mid-scanout, right at the moment of frame presentation (with a little offset from driver overhead, as little as a few rows of pixels). Tearlines can even be beam-raced (Tearline Jedi Demo)

VSYNC ON, strobed:
top = center = bottom, with a differential error margin depending on strobe phase, but best-case 0ms (global flash); the absolute lag offset varies with strobe phase but is typically 1 refresh cycle, PLUS the frame queue depth and compositing (non-FSE)

VSYNC OFF, strobed:
top > center > bottom, with a differential error margin of 1 refresh cycle, but a complex overlapped lag gradient because the global-flash behavior interacts with the frameslice lag gradients.

VRR:
Same as VSYNC ON except it's more deterministic and eliminates the frame queue, since Present() immediately starts the refresh cycle; the monitor is essentially syncing to the software whenever frametimes are within VRR range. So Present() is 0ms lag for the top edge, through one refresh cycle of lag offset for the bottom edge. There's no such thing as rounding off "to the next VSYNC" on a VRR monitor. Generically, the monitor infinite-loops Vertical Back Porch lines (a variable-sized blanking interval (VBI) between refresh cycles) onto the cable until interrupted by Present(), whereupon the first row of visible pixels begins to be transmitted to the monitor. For minimum VRR lag, you want the highest Hz possible. In other words, 50 frames per second on a 240Hz monitor will have only a 1/240sec lag difference between top and bottom edge, despite being 50fps=50Hz at that instant.

Blur Busters understands the Present()-to-Photons black box science, I'm happy to answer your questions definitively. Cheers!

Obviously, there are other lag offsets (e.g. GtG pixel response, a slow fade from one color to the next, and monitor processing, which is simply another absolute lag offset additive to all of the above, done either rolling-window per pixel row or full-framebuffer-based, though most gaming monitors have switched to pixel-row processing).

Now, seeing through all of this:
-- The lowest lag is VSYNC OFF at framerates far beyond the refresh rate. Instead of a latency range of [0...refreshtime], you have a latency range of [0...frametime]. So for games with framerates far beyond the refresh rate, VSYNC OFF is best.
-- If you want the lowest lag for any non-VSYNC-OFF technology, it's a VRR monitor for games whose framerate ranges fall completely within VRR range. So the higher the Hz the merrier, even if your framerate doesn't reach max Hz. 60fps on 360Hz G-SYNC has massively lower lag than 60fps on 144Hz G-SYNC.
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter

       To support Blur Busters:
       • Official List of Best Gaming Monitors
       • List of G-SYNC Monitors
       • List of FreeSync Monitors
       • List of Ultrawide Monitors

Meowchan
Posts: 36
Joined: 17 Jun 2020, 02:06

Re: Human Benchmark implemented in C++ DirectX9

Post by Meowchan » 18 Aug 2020, 10:22

I went with a 1-second window since I found it easier to maintain maximal focus for that interval, and 3-4 seconds from the start since, starting from 2 seconds, I would sometimes not be 'ready' for the change. One can certainly measure over different time spans, and the results would be worse. It would also be a different test, a bit like sprinting vs long-distance running. Having said that, I don't predict or attempt to guess; my clicks are always in response to the change, and if they aren't, I consider the attempt faulty and rerun it.

I found click variance decreases as you get better at taking the test.

One thing that's interesting to me is how, with good gear, results in the 140-150 ms range become common. There's a myth that human reaction time is 200 ms, and some even go as far as to say 250 ms (google "human reaction time"). That is simply not true. I don't know if the scientists who came up with these numbers used laggy hardware or were measuring a different kind of reaction, but eye-to-brain-to-hand-to-system is definitely much shorter than 200 ms. I still have 9-12 ms of latency in my hardware (mostly due to my ViewSonic XG2530 monitor with its inherent ~8 ms latency, which I very much regret buying), so my actual human element is 142 ms or so.

I have another computer that's old and 60 hz and uses an office mouse, and I got 250 ms over 5 runs on the web version.

Would be interesting to see if people who score 220-250+ (on proper hardware - important) can get it down fairly quickly with practice, or whether this is an attribute that varies significantly from person to person, like athletic ability or the ability to dance. I read a study mentioning that the time for a signal to reach the brain from the eye is around 40 ms. I wonder about the remainder: how long does it take the brain to issue the command (and how much of that is first recognizing the change versus sending the action to the hand), and how long does the signal take to travel? Do humans have longer reaction times to their toes than to their hands? What if we activated the button with our tongues or with our eyeballs?

Would be interesting to see more numbers from people.

Chief Blur Buster
Site Admin
Posts: 8326
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada
Contact:

Re: Human Benchmark implemented in C++ DirectX9

Post by Chief Blur Buster » 18 Aug 2020, 10:34

Meowchan wrote:
18 Aug 2020, 10:22
Would be interesting to see more numbers from people.
Once in a while, Blur Busters gets specially curated guest articles, sometimes commissioned (paid research).

I had Marwan (aka spacediver) research esports reaction times for Blur Busters. He did some human reaction time tests and found some sub-100ms results, possibly helped by using specific higher-speed stimuli and predictive cues. Before replying to say that's impossible, read all 4 pages. He acknowledges there may be a priming effect / multi-stimuli effect.

The moral of the story is that there are different cues (sudden audio stimuli, sudden visual stimuli, human brain prediction of future movement, feel, multiple stimuli, etc.), combined with priming effects (predictive): many variables simultaneously influence reaction times.
Meowchan wrote:
18 Aug 2020, 10:22
I went with a 1 second window since I found it easier to maintain maximal focus for that interval, and 3-4 seconds from start since I found starting from 2 seconds I would sometimes be 'not ready' for the change.
Oh, I assume you meant you randomize between 3-4 seconds?

The training effect can become impressive after practice if you keep the delay exactly the same (e.g. exactly 3 seconds or exactly 4 seconds), with players predictively firing at a more exact moment than the stimulus alone would otherwise allow.

Meowchan
Posts: 36
Joined: 17 Jun 2020, 02:06

Re: Human Benchmark implemented in C++ DirectX9

Post by Meowchan » 18 Aug 2020, 14:06

It is a random number between 3 and 4 seconds after the initial click, i.e. a 1-second window.

Funnily enough, the linked article mentions 200 ms as the average human reaction time. That's not to say it isn't right for some tests, but for tests in the form of the Human Benchmark, based on the numbers I have seen from myself and others with proper gear, it is an overestimation. He mentions that people who score 150 ms are probably young; I found that not to be the case either. That's not to say young people don't have faster reaction times than older people (something I am also very curious about), but an average of around 150 ms is reasonable for adults. I can give some anecdotal numbers that I've seen from myself and others, but after spending time on it, replacing some parts of my chain (this program being a notable one), and seeing the numbers go down, I strongly suspect that I'm right on this.

Reaction time to audio stimuli being around 30 ms faster than to visual stimuli is something I've heard before. One thing to keep in mind when it comes to audio latency in video games is that the audio often lags behind the visuals. This means that even if the human brain processes audio signals 30 ms faster than visual ones, you might still react faster to the visuals in competitive games than to the audio cues. That's on software developers to solve, just as hardware manufacturers need to be more latency-aware. And for that to happen, the community at large needs to be more aware of latency and its implications.

People spend large amounts of money on getting a CPU that is 10% faster. They will go out and buy direct-to-die cooling solutions to get their temperatures 2 degrees Celsius lower and consider it worth the time, risk, and effort. What about the precious milliseconds added to their every action? How much is that worth? Until people become aware of the problem and start voting with their wallets, companies will continue to produce software and equipment that lags behind. And when it comes to competitive video games, I don't want to lose a game because the other guy had less latency on his equipment. I don't want to win a game because I had less latency either. Games should be about skill, not about hidden external variables.

dendu
Posts: 25
Joined: 08 Aug 2020, 19:06

Re: Human Benchmark implemented in C++ DirectX9

Post by dendu » 19 Aug 2020, 01:07

win7 with the desktop set to:
1280x960@240hz
xl2540
i5-4460(3.2GHz)/gtx750ti/mx518g

Image
Cool app, thank you for taking the time to make and share.
And when it comes to competitive video games, I don't want to lose a game because the other guy had less latency on his equipment. I don't want to win a game because I had less latency either. Games should be about skill, not about hidden external variables. --Meowchan
I really agree with this as well. Ideally, a competitive game would be optimized and have responsive input even on low-spec hardware, to help equalize the playing field. I feel a lot of us here are willing to go the extra distance, tweaking and tuning to optimize our setups. This gives an advantage where milliseconds matter, but it is also just satisfying to use a machine that responds to inputs. I think, though, any sport or hobby has similar patterns: once you get past a certain threshold of interest, you start investing more time and money into improving as you move along the learning curve. I remember reading an amusing post on OCN about how at first it is so fun to play the game at 60Hz while getting beaten badly; then you upgrade and start trying out different mice, and after some more steps you are still trying to find the reason you are getting beaten.

diakou
Posts: 45
Joined: 09 Aug 2020, 11:28

Re: Human Benchmark implemented in C++ DirectX9

Post by diakou » 21 Aug 2020, 12:47

Image

I'll probably re-attempt some other day with better sleep/nutrition, interesting program for sure.

csplayer
Posts: 13
Joined: 26 Aug 2020, 12:10

Re: Human Benchmark implemented in C++ DirectX9

Post by csplayer » 26 Aug 2020, 12:21

Yep, getting times on average about 20-30 ms faster with this tool than I do with the HB site.
