Re: New term: "Frame Rate Amplification" (1000fps in cheap GPUs)
Posted: 23 Sep 2022, 16:30
More DLSS 3.0 Commentary
I actually expected NVIDIA to increase frame rate amplification capabilities.
Now, ignoring the Ugly pricing part of "The Good, the Bad, and the Ugly"...
The new "Optical Flow Accelerator" (dedicated frame rate amplification silicon!) falls under the category of on-silicon frame rate amplification, that finally breaks the 2:1 barrier. Benchmarks are scoring 3x and 4x frame rate increases now! There are some issues and artifacts but they will get smaller and smaller over time.
I believe they are now running spatial and temporal methods concurrently -- using AI-based enhancement of a low-resolution render into a high-resolution render, while also using information from the previous frame to accelerate rendering of the current frame. There seems to be both spatial and temporal AI built into DLSS 3.0!
We have recently witnessed art-making AIs like MidJourney and DALL-E 2, which can create surprisingly incredible art when rendered at maximum settings with excellent prompts. Stable Diffusion can create a single piece of AI art in 13 seconds on a previous-gen RTX card.
DLSS 3.0 seems to be a hybrid technology that draws upon previous frame rate amplification and AI innovations -- so DLSS 3.0 can be thought of as a technological midpoint between ASW, DLSS 2.0, and an ultra-lightweight art AI.
But modifying a half-resolution GPU render to simply "sharpen" it is a much simpler task that can be done in real time -- mere millisecond(s) with the improved DLSS 3.0 neural training set, depending on the output resolution.
Now, metaphorically and simplistically, imagine it as: "I am a GPU AI on-silicon as part of an RTX 4090. That looks like a diagonal line. And that looks like a curve. Let me sharpen it to 4x resolution. Also, that fuzzy text on the distant sign is the same as the readable text on the same sign in a previous frame, so let me borrow the asset from the previous frame to sharpen it up. And so on." Basically a realtime Photoshopper of each frame that tries to stay as faithful as possible (albeit imperfectly).
I imagine the Optical Flow Accelerator probably has some hidden temporally dense raytracing behaviors built into it, with everything then denoised temporally, to allow something like ~4x the raytracing resolution with the same number of ray calculations. Longtime readers will know my TestUFO was cited on page 2 of NVIDIA's Temporally Dense Raytracing paper, a frame rate amplification technology for raytracing. A descendant of that is likely done concurrently, in a more error-diffusion manner, with a motion-compensated noise-reduction technology, as a method of frame rate amplifying ray tracing on the RTX 4000 series.
I should say I am only speculating about what is under the hood of DLSS 3.0, projecting based on what NVIDIA has publicly discussed in white papers, talks, and conventions -- but this is a natural progression.
Although this explanation over-anthropomorphizes, it helps many people correctly imagine future frame rate amplification technologies, since AI silicon can do this faster than re-rendering billions of triangles per frame. Essentially, it is artificially artisting those sharper frames based on both the spatial and temporal history of previous frames as well as a game-specific AI training model (partially why DLSS is game-specific: the DLSS AI neural network model needs to be trained on the game's own assets).
Right now it seems to be a 1:1 sync between Present() and a new DLSS 3.0 frame; the generated frames have not quite been decoupled from the render rate (at least not in DLSS 3.0).
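For the programmers reading, here's a toy sketch (my own simplification, definitely not NVIDIA's actual pipeline) of what that 1:1 coupling means -- each Present() of a real frame yields exactly one AI-generated frame, so the displayed rate stays locked to a multiple of the render rate instead of being freely decoupled:

```python
# Toy sketch (my simplification, not NVIDIA's pipeline): each rendered frame
# that gets Present()ed also yields exactly one AI-generated in-between frame,
# so the output frame rate is locked to 2x the render rate.

def render_frame(t):
    return f"rendered@{t*1000:.0f}ms"          # stand-in for the game's real render

def generate_frame(prev, curr):
    return f"generated[{prev} -> {curr}]"      # stand-in for optical-flow frame generation

def present(frame):
    print("Present:", frame)

def game_loop(frame_times):
    prev = None
    for t in frame_times:
        curr = render_frame(t)
        if prev is not None:
            present(generate_frame(prev, curr))  # the in-between AI frame
        present(curr)                            # the real frame
        prev = curr

game_loop([0.00, 0.01, 0.02, 0.03])  # 100 fps rendered -> ~200 fps displayed
```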
While I lust after the 4090 to see if it can be milked to 1000fps at 1080p in certain games (if it hasn't hit certain AI or memory bandwidth bottlenecks), the good news is that even the cheapest 4000-series card has the Optical Flow Accelerator silicon, so enabling this feature will still yield major increases -- you could even exceed 4090 frame rates (DLSS OFF) with just a 4070 (DLSS ON), if you're willing to accept the DLSS compromises. That being said, if you've got massive resolution (4K and 1440p) at massive refresh rates (240Hz, 360Hz, and up), you definitely want to consider DLSS 3.0 on a higher-numbered 4000-series card if you want to get a bit closer to blurless sample-and-hold in supported games.
Be advised -- frame rate amplification ratios can be somewhat lower when you're starting from a higher frame rate than when starting from a lower frame rate, since low frame rates are easier to amplify massively than already-high frame rates. So 25fps may easily amplify to 100fps, but 100fps may only amplify to 200 or 300fps. There are many reasons for the tapering-off of ratios, but that still allows us to push our ultra-high-Hz monitors more.
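A quick back-of-envelope model of why the ratios taper (the per-frame cost number is my own hypothetical, not NVIDIA's): if each AI-generated frame costs a roughly fixed slice of time, that fixed cost caps how many generated frames fit per second, and that cap matters more the higher your base frame rate already is:

```python
# Back-of-envelope model of tapering amplification ratios.
# Assumption (hypothetical number): each AI-generated frame costs ~3 ms of
# fixed work, so the frame generator tops out around 1000/3 ~= 333 generated
# frames per second regardless of how fast the game itself renders.

GEN_COST_MS = 3.0
GENERATOR_CAP = 1000.0 / GEN_COST_MS   # max generated frames per second

for base_fps in (25, 100, 240):
    max_ratio = 1 + GENERATOR_CAP / base_fps   # real frames + generated frames
    print(f"{base_fps:>4} fps base -> max amplification ratio ~{max_ratio:.1f}:1")
```

The exact numbers are made up, but the shape of the result matches the point above: low base frame rates leave room for huge ratios, high base frame rates do not.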
Many geeks familiar with modern AI (of the 2020s) understand that many modern AIs are driven by a "training model", and the Optical Flow Accelerator is combined with the neural network processing features. DLSS uses a training model pre-trained on a specific game's assets, and the DLSS "AI" becomes like an AutoComplete for missing details -- real-time rudimentary inpainting of sharper details. The original resolution is transformed by the DLSS AI training model into the destination resolution, infilling extra detail automatically, based on assets and rendering behaviors trained into the DLSS model, probably combined with other knowledge (raytracing, historical frames, motion vectors, etc).

As DLSS improves, I bet it uses more of the motion variables (aka "Optical Flow" suggestiveness) like positional history, controller data, motion vectors, etc -- combining temporal data and spatial data in the basic neural network AI of DLSS. In other words, a simplistic real-time Photoshopper artisting hundreds of frames per second. It's a lot less time-consuming to slightly spice up a low-resolution frame than to create AI art completely from scratch (à la MidJourney, Stable Diffusion, or DALL-E 2) -- DLSS is simply using the low-resolution frame to "trace out" the higher-resolution frames as quickly as possible using known ground truths (game assets + current motion vectors of everything).
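To make the "borrow detail from the previous frame" idea concrete, here is a minimal sketch of plain motion-vector reprojection, assuming per-pixel motion vectors are already available (game engines already hand those to DLSS). The real thing feeds this kind of data into a neural network; this is just the bare warp:

```python
# Minimal sketch of motion-vector reprojection: shift last frame's pixels
# along their motion vectors to predict where that detail lands in the
# current frame. (Simplified illustration only, not DLSS's actual math.)
import numpy as np

def reproject(prev_frame, motion_x, motion_y):
    h, w = prev_frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # For each current pixel, look up where it came from in the previous frame.
    src_x = np.clip(xs - motion_x, 0, w - 1).astype(int)
    src_y = np.clip(ys - motion_y, 0, h - 1).astype(int)
    return prev_frame[src_y, src_x]

# Toy usage: a 4x4 "frame" whose content moved 1 pixel to the right.
prev = np.arange(16).reshape(4, 4)
mx = np.ones((4, 4))    # everything moved +1 pixel in x since last frame
my = np.zeros((4, 4))
print(reproject(prev, mx, my))
```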
DLSS and AI art actually have some loose common ground: they are both trained on art assets. In new art AIs, it's a massive training set often consisting of the entire Internet, while in DLSS it's de facto an ultra-simplistic, easier art AI trained only on the game's assets.
DLSS came years before art AIs simply because it was easier to invent DLSS than to invent art AIs, but the concept is surprisingly similar, in that an art AI can use a reference image to trace/create a new image (whether sharper, cartoonized, de-cartoonized, or translated to a different style). Imagine DLSS as kind of like a very basic proto-art-AI, except it's doing only minor modifications to an existing rendered image, using only a tiny subset of the skills built into a modern art AI, and using a much smaller training set (an AI training model based only on that game's data, plus possibly common elementary data germane to all games and traditional rendering pipelines).
However, fundamentally, the AI concepts are surprisingly similar between DLSS and art AIs in certain areas -- it's simply a different order of magnitude of complexity. DLSS' only responsibility is to slightly "Zoom and Enhance" an existing image by 1.5x or 2x scaling. Where art AIs are creating many thousands of pixels from scratch, DLSS is only infilling tiny amounts -- like sharpening up a diagonal line or turning blurry distant text into clearer text (based on the bitmaps trained into the AI model of DLSS for that specific game). The number of "parameters" required by DLSS is a tiny fraction of those required by art AIs -- and it is a lot 'dumber' -- because you need to re-artist those frames in milliseconds. Render by polygons at low resolution, then render by AI to upscale the low resolution to high resolution as artifactlessly as possible, keeping as faithful as possible to the ground truth of the original traditional polygonal render (albeit imperfectly, currently, mind you). Fewer pixels, fewer rays, while using DLSS AI magic to do the rest.
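The "fewer pixels, fewer rays" arithmetic is easy to check with the 2x-per-axis example above:

```python
# Quick arithmetic: with a 2x-per-axis upscale to 4K, only a quarter of the
# output pixels are ever shaded or raytraced; the upscaler infers the rest.

target = (3840, 2160)                        # 4K output
render = (target[0] // 2, target[1] // 2)    # 2x-per-axis upscale -> 1920x1080 internal

target_px = target[0] * target[1]
render_px = render[0] * render[1]
print(f"Shaded pixels per frame: {render_px:,} of {target_px:,} "
      f"({render_px / target_px:.0%}) -- the AI fills in the other "
      f"{1 - render_px / target_px:.0%}")
```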
Mind you, DLSS is not perfect and it's still problem-ridden (pros and cons), but this is still a major technological step. The artifacts per unit of framerate multiplication will become less and less over time. At 2x ratios, DLSS 3.0 has far fewer artifacts than DLSS 2.0, but if you milk it to 4x ratios the artifacts can reappear -- though not everyone is picky about them.
The good news is that DLSS is tunable: you can adjust the amount of DLSS work towards quality versus performance. So if you tune to the same frame rate amplification ratio as DLSS 2.0, you will be getting better quality with fewer artifacts than before! So there's proof of progress in the framerate:artifacts ratio of frame rate amplification algorithms.
I expect the AI boom to have a major impact on frame rate amplification technologies, since it can make both spatial and temporal frame rate amplification technologies of all kinds much more accurate.
DLSS 3.0 (and beyond) is probably, by now, a complex AI-based soup of simultaneous interpolation, extrapolation, and reprojection -- all of which is doable concurrently with neural networks. And NVIDIA has branded all of that the "Optical Flow Accelerator", highly suggestive of spatial (optical) and temporal (flow) -- a massive hint of what's under the hood.
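For clarity, here's a toy 1D illustration (my own simplification, not NVIDIA's method) of how those three ingredients differ for something as simple as an object's on-screen position:

```python
# Toy 1D illustration of the three ingredients:
#  - interpolation: blend between two real frames (needs the *next* frame -> added latency)
#  - extrapolation: predict forward from past motion only (no added latency, can overshoot)
#  - reprojection:  shift the last real frame by the motion estimate (the warp sketched earlier)

def interpolate(pos_prev, pos_next, t=0.5):
    return pos_prev + t * (pos_next - pos_prev)

def extrapolate(pos_prev, pos_curr, t=0.5):
    velocity = pos_curr - pos_prev
    return pos_curr + t * velocity

prev, curr, nxt = 100.0, 110.0, 121.0   # object position in pixels over three real frames
print("interpolated midpoint:", interpolate(curr, nxt))   # 115.5, but needs frame 'nxt'
print("extrapolated midpoint:", extrapolate(prev, curr))  # 115.0, from past frames only
```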
We all know that for human-visible benefit, pixel response needs to be pushed as close to 0ms GtG as possible (OLED 240Hz is clearer motion than LCD 360Hz!), and frame rates need to be upgraded geometrically, e.g. 60fps -> 120fps -> 240fps -> 480fps -> 1000fps.
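The numbers behind that geometric progression follow from the Blur Busters Law rule of thumb -- roughly 1 pixel of motion blur per 1ms of persistence at 1000 pixels/second of panning, on an ideal 0ms-GtG sample-and-hold display:

```python
# Worked numbers: on an ideal (0 ms GtG) full-persistence sample-and-hold
# display, motion blur ~= persistence x panning speed. Halving persistence
# only halves blur, hence the need for 60 -> 120 -> 240 -> 480 -> 1000 doublings.

SPEED_PX_PER_SEC = 1000   # example panning speed

for hz in (60, 120, 240, 480, 1000):
    persistence_s = 1.0 / hz
    blur_px = SPEED_PX_PER_SEC * persistence_s
    print(f"{hz:>4} Hz -> ~{blur_px:4.1f} px of motion blur at {SPEED_PX_PER_SEC} px/s")
```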
I need to write a sequel article to Blur Busters Law: The Amazing Journey To Future 1000Hz Displays as well as Cheap 1000fps: Frame Rate Amplification Technologies, the latter being an article I wrote a few years ago that predicted the emergence of DLSS 3.0.
I should add that it's probably a better use of money to get a 4070 + OLED than to upgrade only to a 4090, because OLED gives you the equivalent of a 1.5x framerate upgrade in blurless sample-and-hold (due to GtG-zeroing-based blur reductions). But if your wallet is fat, then get the best GPU and the best display.
Due to the difficulty of continuing Moore's Law (in both clock speed and transistor counts), I expect further progress to come from going even more massively multicore -- more performance at less wattage -- by using all kinds of spatial and temporal frame rate amplification tricks in a co-processing manner.
I imagine DLSS 4.0 and 5.0 or beyond will start to approach the 5:1 and 10:1 ratios I predict in my article, necessary for future 8K 1000fps 1000Hz displays at UE5-engine quality. This will be a big challenge, given the prior semiconductor shortage and supply chain delays, so the improvements will be more incremental. And the 3000-series GPU glut from the Ethereum change and crypto crash will probably slow down next-generation GPU sales somewhat for a while. So 4K-8K 1000fps 1000Hz frame rate amplification may be delayed a bit yet, perhaps to "sometime in the 2030s" -- at least for a single-GPU system, unless you're rendering round-robin (custom optimized SLI setups) in a multi-GPU system. The AI boom continues to accelerate unimpeded, and it will play a major role in shortcuts around Moore's Law limitations.
To be fair to other GPU vendors, we welcome all GPU progress, despite NVIDIA being the current gorilla of GPU progress --
I am excited to see what AMD's answer to DLSS 3.0 will be.
I must add I am also very pro-Intel and I want Intel to add better frame rate amplification to their next ARC (version 2.0).