Re: New term: "Frame Rate Amplification" (1000fps in cheap GPUs)
Posted: 23 Sep 2022, 16:30
More DLSS 3.0 Commentary
I actually expected NVIDIA to increase frame rate amplification capabilities.
Now, ignoring the Ugly pricing part of "The Good, the Bad, and the Ugly"...
The new "Optical Flow Accelerator" (dedicated frame rate amplification silicon!) falls under the category of on-silicon frame rate amplification, that finally breaks the 2:1 barrier. Benchmarks are scoring 3x and 4x frame rate increases now! There are some issues and artifacts but they will get smaller and smaller over time.
I believe they are now running spatial and temporal methods concurrently -- using AI-based enhancement of a low-resolution render into a high-resolution render, while also using information from the previous frame to accelerate rendering of the current frame. There seems to be both spatial and temporal AI built into DLSS 3.0!
We have recently witnessed art-making AIs like MidJourney and DALL-E 2, which can create surprisingly incredible art when rendered at maximum settings with excellent prompts. Stable Diffusion can create a single piece of AI art in 13 seconds on a previous-gen RTX card.
DLSS 3.0 seems to be a hybrid technology that draws upon previous frame rate amplification and AI innovations -- so DLSS 3.0 can be thought of as a technological midpoint between ASW, DLSS 2.0, and an ultra-lightweight art AI.
But modifying a half-resolution GPU render to simply "sharpen" it is a much simpler task that can be done in real time -- mere millisecond(s) with the improved DLSS 3.0 neural training set, depending on the output resolution.
Now, metaphorically and simplistically, imagine it as: "I am a GPU AI on-silicon as part of an RTX 4090. That looks like a diagonal line. And that looks like a curve. Let me sharpen it to 4x resolution. Also, that fuzzy text on the distant sign is the same as the readable text on the same sign in a previous frame, so let me borrow the asset from the previous frame to sharpen it up. And so on." Basically a realtime Photoshopper of each frame that tries to stay as faithful as possible (albeit imperfectly).
I imagine the Optical Flow Accelerator probably has some hidden temporally dense raytracing behaviors built into it, with everything then denoised temporally, to allow something like ~4x the raytracing resolution with the same number of ray calculations. Longtime readers will know my TestUFO was cited on page 2 of NVIDIA's Temporally Dense Raytracing paper, a frame rate amplification technology for raytracing. A descendant of that is likely done concurrently, in a more error-diffusion manner, with a motion-compensated noise-reduction technology, as a method of frame rate amplifying ray tracing on the RTX 4000 series.
I should say I am only speculating about what is under the hood of DLSS 3.0, projecting based on what NVIDIA has publicly discussed in white papers, talks, and conventions -- but this is a natural progression.
Although this explanation over-anthropomorphizes, it helps many people correctly imagine future frame rate amplification technologies, since AI silicon can do this faster than re-rendering billions of triangles per frame. Essentially, it is artificially artisting those sharper frames based on both the spatial and temporal history of previous frames as well as a game-specific AI training model (partially why DLSS is game-specific: the DLSS AI neural network model needs to be trained on the game's own assets).
Right now it seems to be a 1:1 sync between Present() and a new DLSS 3.0 frame; the generated frames have not quite been decoupled from the render rate (at least not in DLSS 3.0).
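For the programmers reading, here's a toy sketch (my own simplification, definitely not NVIDIA's actual pipeline) of what that 1:1 coupling means -- each Present() of a real frame yields exactly one AI-generated frame, so the displayed rate stays locked to a multiple of the render rate instead of being freely decoupled:

```python
# Toy sketch (my simplification, not NVIDIA's pipeline): each rendered frame
# that gets Present()ed also yields exactly one AI-generated in-between frame,
# so the output frame rate is locked to 2x the render rate.

def render_frame(t):
    return f"rendered@{t*1000:.0f}ms"          # stand-in for the game's real render

def generate_frame(prev, curr):
    return f"generated[{prev} -> {curr}]"      # stand-in for optical-flow frame generation

def present(frame):
    print("Present:", frame)

def game_loop(frame_times):
    prev = None
    for t in frame_times:
        curr = render_frame(t)
        if prev is not None:
            present(generate_frame(prev, curr))  # the in-between AI frame
        present(curr)                            # the real frame
        prev = curr

game_loop([0.00, 0.01, 0.02, 0.03])  # 100 fps rendered -> ~200 fps displayed
```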
While I lust after the 4090 to see if it can be milked to 1000fps at 1080p in certain games (if it hasn't hit certain AI or memory bandwidth bottlenecks), the good news is that even the cheapest 4000-series card has the Optical Flow Accelerator silicon, so enabling this feature will still yield major increases -- you could even exceed 4090 frame rates (DLSS OFF) with just a 4070 (DLSS ON), if you're willing to accept the DLSS compromises. That being said, if you've got massive resolution (4K and 1440p) at massive refresh rates (240Hz, 360Hz, and up), you definitely want to consider DLSS 3.0 on a higher-numbered 4000-series card if you want to get a bit closer to blurless sample-and-hold in supported games.
Be advised -- frame rate amplification ratios can be somewhat lower when you're starting from a higher frame rate than when starting from a lower frame rate, since low frame rates are easier to amplify massively than already-high frame rates. So 25fps may easily amplify to 100fps, but 100fps may only amplify to 200 or 300fps. There are many reasons for the tapering-off of ratios, but that still allows us to push our ultra-high-Hz monitors more.
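A quick back-of-envelope model of why the ratios taper (the per-frame cost number is my own hypothetical, not NVIDIA's): if each AI-generated frame costs a roughly fixed slice of time, that fixed cost caps how many generated frames fit per second, and that cap matters more the higher your base frame rate already is:

```python
# Back-of-envelope model of tapering amplification ratios.
# Assumption (hypothetical number): each AI-generated frame costs ~3 ms of
# fixed work, so the frame generator tops out around 1000/3 ~= 333 generated
# frames per second regardless of how fast the game itself renders.

GEN_COST_MS = 3.0
GENERATOR_CAP = 1000.0 / GEN_COST_MS   # max generated frames per second

for base_fps in (25, 100, 240):
    max_ratio = 1 + GENERATOR_CAP / base_fps   # real frames + generated frames
    print(f"{base_fps:>4} fps base -> max amplification ratio ~{max_ratio:.1f}:1")
```

The exact numbers are made up, but the shape of the result matches the point above: low base frame rates leave room for huge ratios, high base frame rates do not.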
Many geeks familiar with modern AI (of the 2020s) understand that many modern AIs are driven by a "training model", and the Optical Flow Accelerator is combined with the neural network processing features. DLSS uses a training model pre-trained on a specific game's assets, and the DLSS "AI" becomes like an AutoComplete for missing details -- real-time rudimentary inpainting of sharper details. The original resolution is transformed by the DLSS AI training model into the destination resolution, infilling extra detail automatically, based on assets and rendering behaviors trained into the DLSS model, probably combined with other knowledge (raytracing, historical frames, motion vectors, etc).

As DLSS improves, I bet it uses more of the motion variables (aka "Optical Flow" suggestiveness) like positional history, controller data, motion vectors, etc -- combining temporal data and spatial data in the basic neural network AI of DLSS. In other words, a simplistic real-time Photoshopper artisting hundreds of frames per second. It's a lot less time-consuming to slightly spice up a low-resolution frame than to create AI art completely from scratch (à la MidJourney, Stable Diffusion, or DALL-E 2) -- DLSS is simply using the low-resolution frame to "trace out" the higher-resolution frames as quickly as possible using known ground truths (game assets + current motion vectors of everything).
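To make the "borrow detail from the previous frame" idea concrete, here is a minimal sketch of plain motion-vector reprojection, assuming per-pixel motion vectors are already available (game engines already hand those to DLSS). The real thing feeds this kind of data into a neural network; this is just the bare warp:

```python
# Minimal sketch of motion-vector reprojection: shift last frame's pixels
# along their motion vectors to predict where that detail lands in the
# current frame. (Simplified illustration only, not DLSS's actual math.)
import numpy as np

def reproject(prev_frame, motion_x, motion_y):
    h, w = prev_frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # For each current pixel, look up where it came from in the previous frame.
    src_x = np.clip(xs - motion_x, 0, w - 1).astype(int)
    src_y = np.clip(ys - motion_y, 0, h - 1).astype(int)
    return prev_frame[src_y, src_x]

# Toy usage: a 4x4 "frame" whose content moved 1 pixel to the right.
prev = np.arange(16).reshape(4, 4)
mx = np.ones((4, 4))    # everything moved +1 pixel in x since last frame
my = np.zeros((4, 4))
print(reproject(prev, mx, my))
```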
DLSS and AI art actually have some loose common ground: they are both trained on art assets. In new art AIs, it's a massive training set often consisting of the entire Internet, while in DLSS it's de facto an ultra-simplistic, easier art AI trained only on the game's assets.
DLSS came years before art AIs simply because it was easier to invent DLSS than to invent art AIs, but the concept is surprisingly similar, in that an art AI can use a reference image to trace/create a new image (whether sharper, cartoonized, de-cartoonized, or translated to a different style). Imagine DLSS as kind of like a very basic proto-art-AI, except it's doing only minor modifications to an existing rendered image, using only a tiny subset of the skills built into a modern art AI, and using a much smaller training set (an AI training model based only on that game's data, plus possibly common elementary data germane to all games and traditional rendering pipelines).
However, fundamentally, the AI concepts are surprisingly similar between DLSS and art AIs in certain areas -- it's simply a different order of magnitude of complexity. DLSS' only responsibility is to slightly "Zoom and Enhance" an existing image by 1.5x or 2x scaling. Where art AIs are creating many thousands of pixels from scratch, DLSS is only infilling tiny amounts -- like sharpening up a diagonal line or turning blurry distant text into clearer text (based on the bitmaps trained into the AI model of DLSS for that specific game). The number of "parameters" required by DLSS is a tiny fraction of those required by art AIs -- and it is a lot 'dumber' -- because you need to re-artist those frames in milliseconds. Render by polygons at low resolution, then render by AI to upscale the low resolution to high resolution as artifactlessly as possible, keeping as faithful as possible to the ground truth of the original traditional polygonal render (albeit imperfectly, currently, mind you). Fewer pixels, fewer rays, while using DLSS AI magic to do the rest.
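The "fewer pixels, fewer rays" arithmetic is easy to check with the 2x-per-axis example above:

```python
# Quick arithmetic: with a 2x-per-axis upscale to 4K, only a quarter of the
# output pixels are ever shaded or raytraced; the upscaler infers the rest.

target = (3840, 2160)                        # 4K output
render = (target[0] // 2, target[1] // 2)    # 2x-per-axis upscale -> 1920x1080 internal

target_px = target[0] * target[1]
render_px = render[0] * render[1]
print(f"Shaded pixels per frame: {render_px:,} of {target_px:,} "
      f"({render_px / target_px:.0%}) -- the AI fills in the other "
      f"{1 - render_px / target_px:.0%}")
```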
Mind you, DLSS is not perfect and it's still problem-ridden (pros and cons), but this is still a major technological step. The artifacts per unit of framerate multiplication will become less and less over time. At 2x ratios, DLSS 3.0 has far fewer artifacts than DLSS 2.0, but if you milk it to 4x ratios the artifacts can reappear -- though not everyone is picky about them.
The good news is that DLSS is tunable: you can adjust the amount of DLSS work towards quality versus performance. So if you tune to the same frame rate amplification ratio as DLSS 2.0, you will be getting better quality with fewer artifacts than before! So there's proof of progress in the framerate:artifacts ratio of frame rate amplification algorithms.
I expect the AI boom to have a major impact on frame rate amplification technologies, since it can make both spatial and temporal frame rate amplification technologies of all kinds much more accurate.
DLSS 3.0 (and beyond) is probably, by now, a complex AI-based soup of simultaneous interpolation, extrapolation, and reprojection -- all of which is doable concurrently with neural networks. And NVIDIA has branded all of that the "Optical Flow Accelerator", highly suggestive of spatial (optical) and temporal (flow) -- a massive hint of what's under the hood.
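For clarity, here's a toy 1D illustration (my own simplification, not NVIDIA's method) of how those three ingredients differ for something as simple as an object's on-screen position:

```python
# Toy 1D illustration of the three ingredients:
#  - interpolation: blend between two real frames (needs the *next* frame -> added latency)
#  - extrapolation: predict forward from past motion only (no added latency, can overshoot)
#  - reprojection:  shift the last real frame by the motion estimate (the warp sketched earlier)

def interpolate(pos_prev, pos_next, t=0.5):
    return pos_prev + t * (pos_next - pos_prev)

def extrapolate(pos_prev, pos_curr, t=0.5):
    velocity = pos_curr - pos_prev
    return pos_curr + t * velocity

prev, curr, nxt = 100.0, 110.0, 121.0   # object position in pixels over three real frames
print("interpolated midpoint:", interpolate(curr, nxt))   # 115.5, but needs frame 'nxt'
print("extrapolated midpoint:", extrapolate(prev, curr))  # 115.0, from past frames only
```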
We all know that for human-visible benefit, pixel response needs to be pushed as close to 0ms GtG as possible (OLED 240Hz is clearer motion than LCD 360Hz!), and frame rates need to be upgraded geometrically, e.g. 60fps -> 120fps -> 240fps -> 480fps -> 1000fps.
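The numbers behind that geometric progression follow from the Blur Busters Law rule of thumb -- roughly 1 pixel of motion blur per 1ms of persistence at 1000 pixels/second of panning, on an ideal 0ms-GtG sample-and-hold display:

```python
# Worked numbers: on an ideal (0 ms GtG) full-persistence sample-and-hold
# display, motion blur ~= persistence x panning speed. Halving persistence
# only halves blur, hence the need for 60 -> 120 -> 240 -> 480 -> 1000 doublings.

SPEED_PX_PER_SEC = 1000   # example panning speed

for hz in (60, 120, 240, 480, 1000):
    persistence_s = 1.0 / hz
    blur_px = SPEED_PX_PER_SEC * persistence_s
    print(f"{hz:>4} Hz -> ~{blur_px:4.1f} px of motion blur at {SPEED_PX_PER_SEC} px/s")
```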
I need to write a sequel article to Blur Busters Law: The Amazing Journey To Future 1000Hz Displays as well as Cheap 1000fps: Frame Rate Amplification Technologies, the latter being an article I wrote a few years ago that predicted the emergence of DLSS 3.0.
I should add that it's probably a better use of money to get a 4070 + OLED than to upgrade only to a 4090, because OLED gives you the equivalent of a 1.5x framerate upgrade in blurless sample-and-hold (due to GtG-zeroing-based blur reductions). But if your wallet is fat, then get the best GPU and the best display.
Due to the difficulty of continuing Moore's Law (in both clock speed and transistor counts), I expect further progress to come from going even more massively multicore -- more performance at less wattage -- by using all kinds of spatial and temporal frame rate amplification tricks in a co-processing manner.
I imagine DLSS 4.0 and 5.0 or beyond will start to approach the 5:1 and 10:1 ratios I predict in my article, necessary for future 8K 1000fps 1000Hz displays at UE5-engine quality. This will be a big challenge, given the prior semiconductor shortage and supply chain delays, so the improvements will be more incremental. And the 3000-series GPU glut from the Ethereum change and crypto crash will probably slow down next-generation GPU sales somewhat for a while. So 4K-8K 1000fps 1000Hz frame rate amplification may be delayed a bit yet, perhaps to "sometime in the 2030s" -- at least for a single-GPU system, unless you're rendering round-robin (custom optimized SLI setups) in a multi-GPU system. The AI boom continues to accelerate unimpeded, and it will play a major role in shortcuts around Moore's Law limitations.
To be fair to other GPU vendors, we welcome all GPU progress, despite NVIDIA being the current gorilla of GPU progress --
I am excited to see what AMD's answer to DLSS 3.0 will be.
I must add I am also very pro-Intel and I want Intel to add better frame rate amplification to their next ARC (version 2.0).