Re: Special K can drastically reduce latency
Posted: 08 Oct 2020, 03:23
Just discovered this thread.
I feel weird not being part of the discussion on my own software...
Let me start by saying that the reason I have not rushed to make any bold claims or publish anything is that I want the sweet end-to-end validation that NVIDIA's LDAT tool can offer, instead of doing it all in software. Also, rather hilariously, latency never mattered to me at all until NVIDIA's Reflex PR.
I have been tuning my framerate limiter over the years for the sole purpose of scheduling frames at a constant rate, so that awful console ports with physics tied to framerate work correctly on a wide range of PC hardware. There are some truly stupid games that cannot finish cutscenes if sampled time intervals are not aligned on perfect boundaries (e.g. NPC 0 cannot move from point A to point B without clipping an object in the scene, and NPC 1 only starts doing something when NPC 0 reaches its final position (B)). Making those games work at arbitrary framerates was the reason SK's framerate limiter was created, and up until a month ago I was happy to leave it there.
Simply placing the delay on the correct side of the swapchain Present (...) / GDI SwapBuffers (...) call was adequate to prevent my limiter from _introducing_ latency; that much was determined years and years ago. To be honest, I figured reducing latency could not be boiled down to a two- or three-button process for the end-user, and was content to leave my framerate limiter at "it does not unreasonably increase latency" and consider the design complete.
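A minimal sketch of what "aligned on perfect boundaries" can mean in practice, assuming a limiter that targets fixed boundaries at start + k * interval rather than sleeping for one interval after each frame. The function name and parameters are hypothetical and are not taken from Special K's source.

```python
import math

def next_frame_deadline(start: float, interval: float, now: float) -> float:
    """Return the first frame boundary strictly after `now`.

    Boundaries sit at start + k * interval, so one late frame does not
    shift every boundary that follows it (no accumulated drift).
    """
    elapsed = now - start
    k = math.floor(elapsed / interval) + 1
    return start + k * interval

def naive_deadline(interval: float, now: float) -> float:
    """Counter-example: 'sleep one interval from now' drifts after a late frame."""
    return now + interval

if __name__ == "__main__":
    # A frame that finished late at t=0.6 with a 0.25 s interval:
    print(next_frame_deadline(0.0, 0.25, 0.6))  # stays on the 0.25 s grid
    print(naive_deadline(0.25, 0.6))            # falls off the grid
```

With the grid-based version, game logic that samples time once per frame keeps seeing intervals that are whole multiples of the target, which is the property the cutscene example depends on.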
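The post does not say which side of Present is the correct one, so the following is only a toy model of why the side matters. It assumes a game that samples input at the top of each frame and ignores GPU queueing entirely; the function and its parameters are hypothetical.

```python
def input_age_at_present(frame_interval: float,
                         render_time: float,
                         wait_before_present: bool) -> float:
    """Age of the sampled input (same units as the arguments) when Present is called.

    Toy model: input is sampled once, at the top of the frame, and the
    limiter pads each frame out to `frame_interval`.
    """
    idle = max(frame_interval - render_time, 0.0)
    if wait_before_present:
        # sample input -> render -> limiter wait -> Present
        # The input sits idle for the whole wait before it is shown.
        return render_time + idle
    # Present -> limiter wait -> sample input -> render -> Present
    # The wait happens before sampling, so the input is only as old as the render.
    return render_time

if __name__ == "__main__":
    # 16 ms frame budget, 4 ms of actual work per frame.
    print(input_age_at_present(16.0, 4.0, wait_before_present=True))
    print(input_age_at_present(16.0, 4.0, wait_before_present=False))
```

Under these assumptions the frame rate is identical in both cases; only the age of the input at presentation changes.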
----
To my surprise, consistent timing + reduced latency is possible, and I have spent the last month studying ways to minimize render queue latency without requiring the end-user to make 5 or 6 swapchain adjustments in Special K's control panel.
I have gone so far as to integrate Presentation Statistics (Fullscreen Exclusive / DXGI Flip Model) into my framepacing graph as a histogram showing present delay. It has proved incredibly useful to watch this data in real time rather than rely on PresentMon to collect the information and analyze it later.
Even if achieving lower latency does not eventually boil down to a simple 2 or 3 click procedure, a power-user can watch the histogram and quickly begin to hunt down sources of latency (e.g. Xbox Game Bar adds +1 frame of latency whenever it is visible, and other similar multi-plane overlays behave the same). That's tremendously powerful, and not something that's been done before.
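As a rough illustration of the histogram idea, here is a sketch that buckets present-delay samples by width. It assumes the delays have already been measured somewhere and arrive as plain millisecond values; the function name, bucket width, and sample numbers are made up for the example and do not describe how Special K gathers or draws its data.

```python
from collections import Counter

def present_delay_histogram(delays_ms, bucket_ms=1.0):
    """Count present-delay samples per bucket, keyed by each bucket's lower edge."""
    counts = Counter(int(d // bucket_ms) for d in delays_ms)
    return {b * bucket_ms: counts[b] for b in sorted(counts)}

if __name__ == "__main__":
    # Hypothetical samples: mostly sub-2 ms, plus one outlier roughly a
    # full 60 Hz frame late.
    samples = [0.2, 0.7, 1.5, 16.9]
    for edge, count in present_delay_histogram(samples).items():
        print(f"{edge:5.1f} ms  {'#' * count}")
```

A cluster that suddenly appears about one frame to the right of the usual one is the kind of pattern the overlay example in the post would produce.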
The following video was captured for the purpose of testing NV's new HDR vidcap (works like a dream, BTW) when Special K's HDR functions are activated, but the framepacing widget (top-left) shows what I am discussing.
Sadly, some of the tuning knobs that DXGI allows for lowering render queue latency are not practical in Direct3D 12 / Vulkan, because the driver does not implicitly manage resource allocation for the command queue(s) tied to each swapchain backbuffer in the low-level APIs. No doubt that is the same reason that "Ultra Low Latency" is inapplicable in Vk/D3D12.
Latency and pacing are what they are in D3D12; the best I have been able to do to improve on them is insert a WaitFor...Objects (...) call to delay frame-based game logic until there's an uncontested swapchain backbuffer to draw into -- most D3D12 engines should already be doing this (Horizon: Zero Dawn was not, and a screenshot earlier in this thread illustrates why engines should be doing this).
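The wait-before-game-logic pattern can be sketched without any graphics API at all. In the simulation below a semaphore stands in for whatever handle the real wait call blocks on; the class and function names are hypothetical, and nothing here calls DXGI or D3D12.

```python
import threading

class FakeSwapchain:
    """Stand-in for a swapchain with a fixed number of backbuffers."""

    def __init__(self, backbuffers: int):
        # One permit per backbuffer that is free to draw into.
        self._free = threading.Semaphore(backbuffers)

    def wait_for_free_backbuffer(self, timeout: float) -> bool:
        """Block until a backbuffer is free; return False if the wait times out."""
        return self._free.acquire(timeout=timeout)

    def frame_retired(self) -> None:
        """Called when a previously queued frame has been shown and its buffer freed."""
        self._free.release()

def run_frame(swapchain: FakeSwapchain, game_logic, timeout: float = 1.0) -> bool:
    # Wait FIRST, so input sampling and simulation happen as late as possible
    # instead of piling up behind frames that are still queued.
    if not swapchain.wait_for_free_backbuffer(timeout):
        return False
    game_logic()
    return True
```

If the wait is skipped, the game logic for frame N+1 runs while frame N is still queued, so its results are already stale by the time a backbuffer becomes available.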
-----
TL;DR:
Why wasn't I invited to this party?
I am very pleased to hear that other individuals have requisitioned LDAT hardware from NVIDIA.
Distributing those tools to popular content creators without a formal application process for actual developers, or better still, without reaching out to well-known developers (such as Unwinder), is baffling. GamersNexus and Digital Foundry can undoubtedly put the tools to good use, but I think that's where the list ends, and any other YouTuber who received one was given a fancy tool they will never use.

