New low latency nvidia drivers

Nocebo
Posts: 4
Joined: 16 Jul 2017, 12:32

Re: New low latency nvidia drivers

Post by Nocebo » 21 Aug 2019, 16:22

I would also love it if someone did some research on the measurable difference when using this Ultra setting.

pox02
Posts: 259
Joined: 28 Sep 2018, 06:04

Re: New low latency nvidia drivers

Post by pox02 » 21 Aug 2019, 21:58

Nocebo wrote:I would also love it if someone did some research on the measurable difference when using this Ultra setting.
Just tested it. I feel no difference from 1 to Ultra, using driver 436.02 on an RTX 2080 Ti with VSYNC ON and G-SYNC ON. I also tested Overwatch's Reduce Buffering on and off, so I don't know.
monitors xg258q aw2518hf 27GK750F-B pg248q xg240r lg w2363d-pf xb270hu XL2546 XL2546K NXG252R

ad8e
Posts: 68
Joined: 18 Sep 2018, 00:29

Re: New low latency nvidia drivers

Post by ad8e » 22 Aug 2019, 01:19

Their "just in time" technique sounds exactly like what my own vsync algo was doing. Really good for them. I had to do a lot of sophisticated math for frame estimation and sync timing which I doubt they did at all (so their algorithm will probably suck). But having a bad just in time technique is still way better than not having it; it will just be a little more conservative. This is a major step forward and I hope they make my own solution obsolete. They will be able to avoid some tradeoffs that I had to make, and I don't want to maintain my very domain-specific technical code.

"The new mode seems to be a direct response to AMD's recent inclusion of the Radeon Anti-Lag feature on its RX 5700 series graphics cards, which works in a similar fashion."

Well thanks, eurogamer.net. Back when I wanted to know what Radeon Anti-Lag was doing, I could not tell after a lengthy google search because AMD kept all their marketing vague and refused to say anything technical. I had no idea what it was doing or whether it helped for anything. Now I finally know, only after it's compared directly to Nvidia's solution.

If implemented correctly, Nvidia's low latency presentation mode should make Scanline Sync obsolete for gamers who don't like tearlines. (I doubt it's implemented correctly.)

Also, when choosing tearlines, I think it's better to turn VSYNC off, so that a tiny miss causes some small tearing instead of skipping a full frame. VSYNC OFF makes waiting for the tearline harder, but since Nvidia writes the driver, they should have access to a waiting solution that we end developers don't.

HiAlgoBoost
Posts: 4
Joined: 22 Aug 2019, 06:18

Re: New low latency nvidia drivers

Post by HiAlgoBoost » 22 Aug 2019, 06:23

@ad8e That sounds great! Are you around Santa Clara / Bay Area by any chance? We are looking for good people to join our (AMD) driver team...

ad8e
Posts: 68
Joined: 18 Sep 2018, 00:29

Re: New low latency nvidia drivers

Post by ad8e » 22 Aug 2019, 22:05

HiAlgoBoost wrote:@ad8e That sounds great! Are you around Santa Clara / Bay Area by any chance? We are looking for good people to join our (AMD) driver team...
Sorry, I'm in New York, and I am not able to switch jobs until around July next year. Also, I'm unqualified to work on a GPU driver team, because my experience in this specific area (graphics) is quite low. I wouldn't be useful for anything else related to the driver side of graphics. Frame timing is a combination of math and design, and my specialties are math and design (and programming).

If you're looking for the vsync timing algorithm, just ask and I will give it out (privately or publicly). Public domain, no credit required, patent licenses granted (I have no patents).

I'll describe the vsync timing algorithm: it has two parts, a rejection part and an estimation part. The rejection part says, "a vblank can't be reported twice in the same monitor frame, because it is only reported after the vblank completes". The estimation part then recursively checks intervals with a P-only controller, and then does some statistics to find the best guess.
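
To make the structure concrete, here is a minimal illustrative sketch of those two parts (this is not the actual vsync.cpp; the rejection threshold, gains, and names are made up, and the statistics/best-guess step is omitted):

#include <cmath>

// Illustrative sketch of the two-part vsync timing idea described above.
// Timestamps are in seconds on some monotonic clock.
struct VsyncEstimate {
    double period = 1.0 / 60.0;  // initial guess: seconds per refresh
    double phase  = 0.0;         // time of a reference vblank
};

class VsyncEstimator {
    VsyncEstimate est;
    double last_accepted = -1.0;
public:
    // Rejection part: a vblank can't be reported twice in the same monitor
    // frame, so discard reports arriving implausibly soon after the last one.
    bool accept(double t) {
        if (last_accepted >= 0.0 && t - last_accepted < 0.5 * est.period)
            return false;
        last_accepted = t;
        return true;
    }

    // Estimation part: a P-only controller nudging phase (and slowly period)
    // toward the accepted vblank timestamps.
    void update(double t) {
        if (!accept(t)) return;
        double k = std::round((t - est.phase) / est.period);  // which refresh this report belongs to
        double error = t - (est.phase + k * est.period);      // miss versus prediction
        const double kp_phase = 0.1, kp_period = 0.01;        // made-up gains
        est.phase += kp_phase * error;
        if (k != 0.0) est.period += kp_period * (error / k);
    }

    VsyncEstimate current() const { return est; }
};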

From a mathematical perspective, the above algorithm is close to optimal. Rejection is very fast because of a small math lemma, and is optimal assuming sufficiently low clock skew (which holds true). Estimation is close to optimal; other controller types were benchmarked and found inferior. I think Fourier methods are not appropriate here. In practice, it uses <1% CPU and its accuracy is 1 horizontal tearline standard deviation on an Intel HD4000.

However, there's one thing I keep in mind: the driver has direct access to the GPU, while we end developers use D3DKMTWaitForVerticalBlankEvent() to grab vblank times. On Windows, Jerry Jongerius did testing and found that this event is occasionally delayed, and that the reported monitor refresh rate is often wrong. I did some testing, uncovered other annoying problems, and wrote them up in a document. Mark Rejhon (Chief Blur Buster, owner of this forum) was very helpful here and has also diagnosed the origin of some of these hardware/OS issues. The purpose of the vsync algorithm is to turn the noisy D3DKMT() data into an exact vblank period and phase.
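
For reference, this is roughly how an end developer collects those noisy vblank timestamps on Windows (a hedged sketch, not my production code; the D3DKMT prototypes live in d3dkmthk.h and link against gdi32.lib, and depending on the SDK you may need winternl.h for NTSTATUS):

#include <windows.h>
#include <winternl.h>   // NTSTATUS (needed by d3dkmthk.h in some SDKs)
#include <d3dkmthk.h>
#include <cstdio>
#pragma comment(lib, "gdi32.lib")

int main() {
    // Open the adapter that drives the primary display.
    D3DKMT_OPENADAPTERFROMHDC open = {};
    open.hDc = GetDC(nullptr);
    if (D3DKMTOpenAdapterFromHdc(&open) != 0) return 1;

    D3DKMT_WAITFORVERTICALBLANKEVENT wait = {};
    wait.hAdapter = open.hAdapter;
    wait.hDevice = 0;                       // no device handle needed
    wait.VidPnSourceId = open.VidPnSourceId;

    LARGE_INTEGER freq, t;
    QueryPerformanceFrequency(&freq);
    for (int i = 0; i < 100; ++i) {
        D3DKMTWaitForVerticalBlankEvent(&wait);  // returns some time after the vblank
        QueryPerformanceCounter(&t);             // timestamp the (noisy) report
        printf("%.6f\n", double(t.QuadPart) / double(freq.QuadPart));
    }
    return 0;
}

These timestamps are exactly the kind of delayed/jittery data that the vsync algorithm has to clean up.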

If the driver has access to the true monitor refresh rate, then the vsync algorithm simplifies and speeds up greatly, with a small catch that the statistics part is still necessary. (I think most developers won't spot this. For example, Jongerius's algorithm omits it.)
If the driver has access to both exact frame timing and the monitor refresh rate, then only the statistics part of the vsync algorithm is necessary, and the algorithm becomes basically free CPU-wise.

As for the frame estimation algorithm, frame timing accuracy is very nice to have here. We end developers have poor access to accurate frame timing, but the GPU driver might have much better access. Someone with a comfortable grasp of undergraduate statistics should be able to write a proper algorithm with some guidance. There are two cost models: with vsync on, a frame miss causes a full frame of delay; with vsync off, a frame miss causes a tearline. From the past N frame times, it's possible to predict a distribution of expected next frame times, and then choose the right delay to minimize the cost, depending on whether vsync is on or off (a rough sketch follows after the list below). Cost is in two dimensions: jitter and latency.
For continuous rendering and no input (video): latency is fine, jitter is bad.
For sporadic rendering (static webpages, reaction time tests): latency is bad, jitter is fine.
For continuous rendering and responsive input (most games): both are bad, a balance is needed.
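
As an illustration only (not my actual code), the delay-picking step could look something like this, with the percentile chosen by how expensive a miss is:

#include <algorithm>
#include <vector>

// Illustrative sketch: pick how long to wait after a vblank before starting
// to render, given recent frame times. With VSYNC ON a miss costs a whole
// refresh, so aim conservatively; with VSYNC OFF a miss only causes a tearline.
double pick_wait_after_vblank(std::vector<double> recent_frame_times,  // seconds
                              double refresh_period,                   // seconds per refresh
                              bool vsync_on) {
    if (recent_frame_times.empty()) return 0.0;
    std::sort(recent_frame_times.begin(), recent_frame_times.end());
    double quantile = vsync_on ? 0.99 : 0.90;   // made-up numbers
    size_t idx = static_cast<size_t>(quantile * (recent_frame_times.size() - 1));
    double expected_frame_time = recent_frame_times[idx];
    // Start rendering just early enough that a frame at this percentile
    // finishes before the next vblank; never wait a negative amount.
    return std::max(0.0, refresh_period - expected_frame_time);
}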

I have a deeper writeup if you're interested.

HiAlgoBoost
Posts: 4
Joined: 22 Aug 2019, 06:18

Re: New low latency nvidia drivers

Post by HiAlgoBoost » 23 Aug 2019, 01:16

ad8e wrote:Sorry, I'm in New York, and I am not able to switch jobs until around July next year. Also, I'm unqualified to work on a GPU driver team, because my experience in this specific area (graphics) is quite low. I wouldn't be useful for anything else related to the driver side of graphics. Frame timing is a combination of math and design, and my specialties are math and design (and programming).
That sounds very interesting! 1 horizontal tearline std, that is way more accuracy than is required, wow! Let's continue the discussion offline, I am interested - email me at hialgoboost(at)gmail.com.
Thanks,
Eugene.

andrelip
Posts: 160
Joined: 21 Mar 2014, 17:50

Re: New low latency nvidia drivers

Post by andrelip » 23 Aug 2019, 07:03

ad8e wrote:
HiAlgoBoost wrote:@ad8e That sounds great! Are you around Santa Clara / Bay Area by any chance? We are looking for good people to join our (AMD) driver team...
Sorry, I'm in New York, and I am not able to switch jobs until around July next year. Also, I'm unqualified to work on a GPU driver team, because my experience in this specific area (graphics) is quite low. I wouldn't be useful for anything else related to the driver side of graphics. Frame timing is a combination of math and design, and my specialties are math and design (and programming).
...
Amazing! Just out of curiosity, how long does the prediction take to run, and how often is it updated? You said "from the past frame times...", but it seems very uncertain by nature to get only 1 line of deviation if your observables are just the past frame times. Even for good time-series performers like XGBoost and LSTM with a large event window it seems unlikely. I can't imagine such a good result with a simpler algo.

ad8e
Posts: 68
Joined: 18 Sep 2018, 00:29

Re: New low latency nvidia drivers

Post by ad8e » 23 Aug 2019, 13:49

andrelip wrote:Amazing! Just out of curiosity, how long does the prediction take to run, and how often is it updated? You said "from the past frame times...", but it seems very uncertain by nature to get only 1 line of deviation if your observables are just the past frame times. Even for good time-series performers like XGBoost and LSTM with a large event window it seems unlikely. I can't imagine such a good result with a simpler algo.
You misunderstand - what I'm doing is much less impressive than the notion you're thinking of. I am timing the vblank points, not the render frame times. Frame time estimation and vsync timing are two separate parts, both necessary.

The program waits, then renders the frame, then the frame completes, then the program waits some more until the right tearline, then forces the swap. The two waits give the correct tearline as long as you can tell when that tearline is supposed to be. I am not able to predict variable frame times exactly; I can only fit frame times to distributions. I cannot even observe frame times in my current application; my fences fail to trigger properly and I don't know why, since I am a GPU novice. Note that trying to predict the exact frame time is the wrong approach; jitter/latency cost analysis shows that the second wait is a necessary buffer to minimize costs. The correct approach is to have a fitted distribution, as narrow as possible, keeping in mind both the distribution's spread and how it may change in the future.
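
In rough pseudocode, the loop looks something like this sketch (assuming GLFW with VSYNC OFF and an estimated vblank period/phase from the vsync algorithm; render_frame() and the expected render time are placeholders):

#include <GLFW/glfw3.h>
#include <cmath>
#include <thread>

static void render_frame() { /* placeholder for the app's actual rendering */ }

void frame_loop(GLFWwindow* window, double period, double phase) {
    while (!glfwWindowShouldClose(window)) {
        // Target swap moment: the next predicted vblank (offset this value
        // to place the tearline elsewhere in the frame).
        double now = glfwGetTime();
        double k = std::ceil((now - phase) / period);
        double target = phase + k * period;

        // First wait: delay the render so it finishes shortly before the target.
        double expected_render = 0.002;  // placeholder; comes from frame-time statistics
        while (glfwGetTime() < target - expected_render)
            std::this_thread::yield();

        render_frame();

        // Second wait: spin until the exact swap moment, then swap. With
        // VSYNC OFF the swap takes effect immediately, so the splice lands
        // near the target scanline.
        while (glfwGetTime() < target) { /* spin */ }
        glfwSwapBuffers(window);
        glfwPollEvents();
    }
}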

(I looked up xgb; it's GBM, gradient boosting. That is not quite appropriate for frame prediction but would work okay. LSTM is a neural net. I work in a related area, cognition, but not artificial intelligence directly. I think predicting frame times directly with a neural net will be inferior to a statistical approach. As long as the designer has a decent knowledge of statistics, the best a neural net can do is repeat those same statistics, plus do pattern extraction. You could handle the statistics separately, as additional inputs and outputs for the neural net. That would remove the burden on the neural net to do statistics (which it won't do well) and leave it with pattern extraction (which the human designer won't do well).

A human who is good at designing pattern extraction could make the rest of the neural net obsolete too, but I don't know whether that is really achievable. The problem domain is in a sense equivalent to a compression algorithm, so I expect someone with a skillset similar to Yann Collet's to do well.)

The following is what I sent. The code has some sharp edges still and parts are unready, but since I just gave it away privately, a public record is nice. It does only the vsync estimation, not the frame time estimation.

https://drive.google.com/open?id=1V-fr1 ... 8fSGus7Pp0 (this link will disappear after a month)
The code builds with the included Visual Studio project; the result is blurbusters.exe, which is a demo. To get 1 tearline std, set the power plan to High Performance. On my system, the Balanced power plan sometimes gives 2 tearline std and sometimes 40 tearline std.

Controls: right-click drag to control the tearline. There's also a demoscene effect: press 2 to enable it and 1 to switch back. It may crash occasionally (intentionally), because the estimation part has CPU spikes when the P-only controller jumps back and forth too much. The system checks for CPU spikes and, on detection, intentionally crashes to make those spikes noticeable. The crashes are optional. A simple loop-count limiter should be applied to the estimation part, but I didn't bother to do that yet.

The important algorithm is vsync.cpp. The algorithm component is well documented, but it is still largely inscrutable because the math is so heavy. I can understand it perfectly, but that is mainly because I wrote it and because of my math background.
The benchmarking and auto-optimizer are poorly documented, but those are only for determining optimal constants during debugging. They live in the #define blocks of VSYNC_BENCH and FIND_CONSTANTS. All code and algorithms in vsync.cpp are original, so I place them in the public domain (CC0).
The Windows-specific glue is in platform_vsync.cpp. Note the credits at the top of this file, which may raise copyright issues; the full origins of all components of the file are listed in that credits section. I place everything in this file in the public domain (CC0) to the extent that I am able.

There's some interesting documentation in "etc/Blur Busters Forums.html". Jerry Jongerius also did some tests, documented at the top of the source file in http://www.duckware.com/test/chrome/467 ... e-code.zip

...


3. Note that my code doesn't detect variable sync (it could if necessary). A different algorithm should be used for variable sync, depending on the sync's implementation details.
"etc/Blur Busters Forums.html" was a long and fruitful private discussion with ChiefBlurBuster. I'm not comfortable sharing it publicly unless he gives permission, so I removed it from the zip.

ad8e
Posts: 68
Joined: 18 Sep 2018, 00:29

Re: New low latency nvidia drivers

Post by ad8e » 23 Aug 2019, 14:47

I forgot to mention: the 40-tearline standard deviation on the Balanced power plan is not because the algorithm doesn't know the exact vsync period and phase; it still knows them to 1 tearline. Rather, it has significant trouble swapping the framebuffer at the right time. It spinwaits on the time, then calls swap() at the exact correct moment. And even though the swap time is exact, the frame only completes 0-80 random scanlines later (or sometimes 0-4 random scanlines later). Given how simple this task is and how poor the performance still is, it should give an indication of how hard it is for userspace developers to get frame times exact. Maybe AMD/Nvidia/Intel's driver divisions know the underlying quirks of GPU timing and can work around them, but it's not something I can manage with the documentation I currently know about. It is caused by power management, since the High Performance power plan makes the swap() operation consistent.

(I use glfwSwapBuffers().)
Last edited by ad8e on 23 Aug 2019, 15:41, edited 2 times in total.

User avatar
Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada
Contact:

Re: New low latency nvidia drivers

Post by Chief Blur Buster » 23 Aug 2019, 14:49

<Advanced Info>
Some background information for technical readers/programmers who are unfamiliar with our past due diligence on raster experiments on modern GPUs.

This is the basic raster programming understanding behind the popular “RTSS Scanline Sync”, which beam-races VSYNC OFF so that tearlines are pushed offscreen.

The short explanation is: “VSYNC OFF tearlines are just rasters”. Tearlines are essentially framebuffer splices mid-scanout, as seen in the high speed videos at http://www.blurbusters.com/scanout, because frame transmission from computer to display happens top-to-bottom in a raster fashion (this is true whether it's a 1930s analog TV signal or a 2020s DisplayPort signal). We've stuck with the same pixel delivery sequence as a matter of serializing a 2D frame over a 1D cable.

Much human information processing is top-to-bottom (reading, scrolling, text, books), so displays have standardized on a top-to-bottom refresh sequence, as the laws of physics prevent instant transmission of all pixels simultaneously from computer to display. Naturally, not all pixels arrive at the display at the same time, and this is the sequence we've stuck with for practically 100 years of electronic displays of all kinds (with a few exceptions such as sideways or bottom-to-top scanout, but those are still essentially rasterization).

ad8e's excellent work came out of the collaboration/discussions around Tearline Jedi, a still-unreleased experiment I created.
- Tearline Jedi Thread
- Tearline Jedi on pouet.net (with videos)

I was supposed to release it late last year, but running Blur Busters is a full-time endeavour (aka the “putting food on the table” factor!). But thank you for bringing this up; keeping it on the front burner in the coming months / next year is a good idea.

AFAIK, it was the first time true real-raster Kefrens Bars were done through a generic GPU API (Direct3D / OpenGL), aka Amiga-style Alcatraz bars, normally driven by the Copper chip using precise raster programming. It is only the sheer brute power of a GPU that can compensate for the imprecision of a non-realtime OS unsuited to raster interrupts (etc).

For those who are unaware, and who are fascinated by the resurrection of classic raster-based programming (raster interrupts, scanlines, etc), the Tearline Jedi links above are a good starting point.

This was essentially a DIY implementation of something resembling “Max Prerendered Frames 0”, but without other lag-reducing algorithms on top (e.g. busy-waiting closer to VBLANK with VSYNC ON, so that input reads can occur right before a render that completes just before the VSYNC).

Now, for those who want to understand further, understanding the signal structure is key (and how it corresponds to the numbers in a Custom Resolution Utility). Top-to-bottom raster scanout exists even with FreeSync and G-SYNC signals, Quick Frame Transport, large VBIs, 240Hz, 60Hz, 480Hz; these are all just tweaks/variations on a nearly 100-year-old display signal delivery structure.
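
As a concrete illustration of how those Custom Resolution Utility numbers map to a raster position (example values only, not tied to any particular monitor):

#include <cstdio>

int main() {
    // Example: 1920x1080 at 60 Hz with a Vertical Total of 1125 lines
    // (1080 active + 45 blanking, as listed in a Custom Resolution Utility).
    const double refresh_hz = 60.0;
    const int vertical_total = 1125;
    const double line_time = 1.0 / (refresh_hz * vertical_total);  // about 14.8 microseconds per scanline

    double time_since_vblank = 0.004;  // 4 ms into the refresh cycle
    int raster_line = static_cast<int>(time_since_vblank / line_time);
    printf("scanout is at line %d of %d\n", raster_line, vertical_total);  // about line 270
    return 0;
}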

</Advanced Info>
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter

