RTSS now has new automatic Low-Lag VSYNC ON (raster based)

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

RTSS now has new automatic Low-Lag VSYNC ON (raster based)

Post by Chief Blur Buster » 28 May 2018, 10:28

Everyone,

Thanks to my own suggestion to their team --
RTSS now has a new automatic Low-Lag VSYNC ON mode.

It supports beam-raced page flips now (VBI racing), producing what may be the lowest possible VSYNC ON input lag -- without additional programmer work to reduce input lag further in the gaming software.

(Original ChangeLog: https://forums.guru3d.com/threads/rtss- ... st-5549072 ...)

New feature highlighted in blue.
Unwinder wrote:
RTSS 7.2.0 beta 1 is online:

http://www.guru3d.com/files-details/rts ... nload.html

· Added On-Screen Display performance profiler. Power users may enable it to measure and visualize CPU and GPU performance overhead added by On-Screen Display rendering. Two performance profiling modes are available:
o Compact mode provides basic and the most important CPU prepare (On-Screen Display hypertext formatting, parsing and tessellation), CPU rendering and total CPU times, as well as GPU rendering time (currently supported for Direct3D9+ and OpenGL applications only)
o Full mode provides additional and more detailed per-stage CPU times
· Improved built-in framerate limiter:
o Added power user oriented profile setting, allowing you to specify the limit directly as a target frametime with 1 microsecond precision
o Added power user oriented profile setting, allowing you to adjust throttle time. Throttle time adjustment is aimed to reduce input lag when framerate is below the target limit or without limiting the framerate
o Added power user oriented profile setting, allowing you to synchronize framerate to up to two independent scanline indices per refresh interval. Combining with user configurable scanline wait timeout, those settings provide experienced users low input lag adaptive VSync or FastSync functionality on any hardware

· Various On-Screen Display optimizations and improvements:
o Added adjustable minimum refresh period for On-Screen Display renderer. The period is set to 10 milliseconds by default, so now the On-Screen Display is not allowed to be refreshed more frequently than 100 times per second. Such implementation allows keeping smooth animation when On-Screen Display contents are being updated on each frame (e.g. when displaying realtime frametime graph) without wasting too much CPU time on it
o Added alternate GPU copy based Vector2D On-Screen Display rendering mode implementation for Direct3D1x applications. New mode provides up to 5x Vector2D performance improvement on NVIDIA graphics cards, however it is disabled on AMD hardware due to slow implementation of CopySubresourceRegion in AMD display drivers
o Vector2D rendering mode is now forcibly disabled in Vulkan applications on AMD graphics cards due to insanely slow implementation of vkCmdClearAttachments in AMD display drivers
o Revamped geometry batching and vertex buffer usage strategy in pure Direct3D12 On-Screen Display renderer (currently used in Halo Wars 2 only)
o Added Vector2D rendering mode support to pure Direct3D12 On-Screen Display renderer
o Optimized On-Screen Display hypertext parsing and tessellation implementation
o Optimized state changes in OpenGL On-Screen Display rendering implementation
o Optimized state changes in Direct3D1x On-Screen Display rendering implementation
o Solid rectangles and line primitives in Direct3D8 and Direct3D9 On-Screen Display rendering implementations are now rendered from vertex buffer instead of user memory
o Improved OpenGL framebuffer dimensions detection when framebuffer coordinate space is selected
· Fixed On-Screen Display rendering in wrong colors when Vector2D mode is selected and Direct3D1x applications use 10-bit framebuffer
· Fixed Vulkan fence synchronization issue, which could cause GPU-limited Vulkan applications to hang due to attempt to reuse busy command buffer
· Active busy-wait loop in the framerate limiter module is now forcibly interrupted during unloading the hooks library to minimize the risk of deadlocking 3D application when dynamically closing RivaTuner Statistics Server during 3D application runtime
· Improved synchronization in 32-bit hook uninstallation routines
· Updated profiles list

A few notes about new toys for power users:

New performance profiler

Performance profiler can be enabled by setting PerformanceProfiler field in [OSD] section to 1 (basic mode) or 2 (detailed mode). "Show own statistics" must be enabled in RTSS to see the profiler. The following performance counters are available for detailed mode:

CPU acquire – CPU time spent on acquiring access to the 3D API. This CPU time depends on the 3D API used by the application, and in most cases it is zero. For D3D12 applications displaying OSD in D3D11on12 mode, it is the CPU time spent on acquiring the D3D11on12 wrapper for rendering. In Vulkan applications asynchronously presenting frames from the compute queue (e.g. DOOM or Wolfenstein II on AMD cards), it is the CPU time spent on synchronizing graphics and compute queues. For OpenGL applications it can be nonzero if the application forcibly flushes the pipeline at the end of each frame with glFlush. The CPU acquire stage is executed on each frame.

CPU prepare – CPU time spent on preparing OSD contents for rendering. This CPU time doesn’t depend on the 3D API used by the application; it depends entirely on the amount of text/graphs you’re displaying in the OSD. CPU prepare time is divided into the following substages: init, parse and tessellate. Init is CPU time spent on formatting RTSS's own OSD contents (i.e. formatting its own framerate counters, scanning hypertext and replacing the framerate macro with real formatted framerate values, formatting performance counters, benchmark statistics etc). Parse is CPU time spent on parsing the resulting OSD hypertext (including the hypertext supplied by OSD clients like MSI AB or HwInfo), processing hypertext formatting tags and converting OSD contents into a collection of text with attributes to be tessellated in the next stage. Tessellate is CPU time spent on converting parsed OSD text and attributes to renderable form (a collection of vector rects for each symbol for vector 2D/3D OSD rendering modes, or a collection of textured quads for each symbol for raster 3D mode). The CPU prepare stage is executed only on frames when OSD contents are refreshed, i.e. if you’re displaying an OSD with a framerate counter and the default refresh period in RTSS properties (500 ms), then the OSD is refreshed and this stage is executed just twice per second.

CPU render – CPU time spent on rendering OSD. This CPU time depends on the 3D API used by the application and on the OSD rendering mode selected in RTSS (Vector2D, Vector3D or Raster3D). CPU render time is divided into the following substages: save, submit and restore. Save is CPU time spent on saving the 3D rendering pipeline state before rendering OSD. This substage depends entirely on the 3D API used by the application; for example, state changes are most expensive for Direct3D9 applications (especially pure Direct3D9 ones). Low-level 3D APIs (pure Direct3D12 or Vulkan) do not require saving pipeline state, so this CPU time is zero. Vector2D OSD rendering mode also doesn’t require saving and restoring rendering pipeline state, so it is zero in this case too. Submit is CPU time spent on filling vertex buffers with previously tessellated OSD geometry and submitting it to the 3D API. Restore is CPU time spent on restoring the previously saved 3D rendering pipeline state after drawing OSD. The CPU render stage is executed on each frame.

CPU capture – CPU time spent on capturing framebuffer contents. This stage is executed, and this time is nonzero, during video capture only.

CPU flush – CPU time spent on the final stage of flushing the OSD renderer and returning control to the application’s 3D API. For D3D12 applications displaying OSD in D3D11on12 mode, this is the D3D11on12 wrapper flushing time. For applications using other 3D APIs it is zero. This stage is executed on each frame.

CPU total – total CPU time including all stages listed above.

GPU render – GPU time spent on rendering OSD. This performance counter is currently collected only for Direct3D9, Direct3D10 and Direct3D11 applications, Direct3D12 applications displaying the overlay in D3D11on12 mode, and OpenGL applications. GPU render time profiling is currently not supported for Vulkan and pure Direct3D12 applications.


New scanline sync based framerate limiter

Before you start experimenting with the new sync mode, it is recommended to enable diagnostic scanline sync info in the OSD by setting the SyncInfo field in the [OSD] section to 1. "Show own statistics" must also be enabled in RTSS to see it. The new scanline sync based framerate limiter is controlled by the following values:

SyncDisplay – name of logical display device to be synchronized with. Currently it is a primary display name.

SyncScanline0 – index of the first scanline for framerate synchronization. No synchronization is performed when it is set to zero; otherwise it is treated as a scanline index counted from the top of the frame. E.g. SyncScanline0=1 means that the frame will be synchronized with the top scanline (or more precisely the second scanline, because indices are zero based), and SyncScanline0=1000 means that each frame will be synchronized with scanline 1000 (which is located in the bottom part of the screen if we use 1080p mode with 1125 scanlines total).

SyncScanline1 – index of the second scanline for framerate synchronization. Defining two independent synchronization points per refresh allows us to get functionality similar to NVIDIA's FastSync, where we get a smooth framerate of 2x the refresh rate. No synchronization is performed when it is set to zero; otherwise it is treated as an index counted from the middle of the frame. E.g. with 1125 total scanlines the middle is scanline 562 (1125/2, rounded down), so SyncScanline1=1 means that the frame will be synchronized with scanline 562+1=563, and SyncScanline1=400 means that each frame will be synchronized with scanline 562+400=962 (which is located in the bottom part of the screen if we use 1080p mode with 1125 scanlines total).
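
The index arithmetic for the two settings can be sketched as below. This is only an illustration of the description above, not RTSS code: the function name is hypothetical, and rounding the middle of the frame down (1125/2 = 562) is taken from the worked numbers in the post.

```python
def sync_scanlines(total_scanlines, sync_scanline0, sync_scanline1):
    """Map the two settings to absolute scanline indices.

    A value of 0 disables that synchronization point (returned as None).
    SyncScanline0 counts from the top of the frame; SyncScanline1 counts
    from the middle of the frame (total_scanlines // 2, rounded down).
    """
    first = sync_scanline0 if sync_scanline0 != 0 else None
    half = total_scanlines // 2
    second = (half + sync_scanline1) if sync_scanline1 != 0 else None
    return first, second


if __name__ == "__main__":
    # 1080p mode with 1125 total scanlines, as in the examples above
    print(sync_scanlines(1125, 1, 1))       # (1, 563)
    print(sync_scanlines(1125, 1000, 400))  # (1000, 962)
    print(sync_scanlines(1125, 1, 0))       # (1, None)
```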

SyncTimeout – allows adjusting the timeout for scanline synchronization. The timeout provides functionality similar to NVIDIA’s Adaptive VSync, meaning that synchronization can be forcibly disabled when the framerate drops below the refresh rate. The timeout can be specified either explicitly in microseconds (e.g. SyncTimeout=16667 for 60Hz refresh rate), or you can let RTSS benchmark and calibrate it automatically: when SyncTimeout=N is in the [1,8] range, it is set to 1/N of the refresh time.
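
As a rough sketch of how a SyncTimeout value could be read, the hypothetical helper below turns the setting into microseconds. It rests on two assumptions that are not confirmed by the post: that 0 means "no timeout", and that the nominal refresh rate can stand in for the refresh time RTSS measures itself during calibration.

```python
def resolve_sync_timeout_us(sync_timeout, refresh_hz):
    """Return the effective timeout in microseconds, or None if disabled."""
    if sync_timeout == 0:
        # Assumption: 0 means the limiter always waits for the scanline
        return None
    refresh_period_us = 1_000_000 / refresh_hz
    if 1 <= sync_timeout <= 8:
        # Shorthand form: 1/N of the refresh period. RTSS benchmarks the
        # period itself; this sketch uses the nominal refresh rate instead.
        return round(refresh_period_us / sync_timeout)
    # Anything else is taken as an explicit value in microseconds
    return sync_timeout


if __name__ == "__main__":
    print(resolve_sync_timeout_us(1, 60))      # 16667
    print(resolve_sync_timeout_us(2, 60))      # 8333
    print(resolve_sync_timeout_us(16667, 60))  # 16667
```

The first two results line up with the explicit 60Hz values 16667 and 8333 used in the presets.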

Summarizing, you may start experiments with scanline sync with the following presets:

For traditional VSync with low input lag:

SyncScanline0=1
SyncScanline1=0
SyncTimeout=0
In this case the tearline position is fixed at the top of the frame; you can move it down by increasing the SyncScanline0 value.

For adaptive VSync with low input lag on 60Hz refresh rate:

SyncScanline0=1
SyncScanline1=0
SyncTimeout=16667
or calibrate timeout automatically:

SyncScanline0=1
SyncScanline1=0
SyncTimeout=1
For FastSync (i.e. 2x refresh rate framerate, 120FPS for 60Hz refresh rate)

SyncScanline0=1
SyncScanline1=1
SyncTimeout=0
In this case tearlines will be at the top and in the middle of the frame; you can move them down by increasing the SyncScanline0 and SyncScanline1 values in step. To control the timeout in this case, use either an explicit value:

SyncScanline0=1
SyncScanline1=1
SyncTimeout=8333
or calibrate timeout automatically:

SyncScanline0=1
SyncScanline1=1
SyncTimeout=2
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter

Forum Rules wrote:  1. Rule #1: Be Nice. This is published forum rule #1. Even To Newbies & People You Disagree With!
  2. Please report rule violations If you see a post that violates forum rules, then report the post.
  3. ALWAYS respect indie testers here. See how indies are bootstrapping Blur Busters research!

RealNC
Site Admin
Posts: 3740
Joined: 24 Dec 2013, 18:32

Re: RTSS now has new automatic Low-Lag VSYNC ON (raster base

Post by RealNC » 28 May 2018, 10:37

I get very heavy tearing with this. Tearing is confined to a specific area of the screen, but that area is quite tall (about 30% screen height.)
The views and opinions expressed in my posts are my own and do not necessarily reflect the official policy or position of Blur Busters.

Glide
Posts: 280
Joined: 24 Mar 2015, 20:33

Re: RTSS now has new automatic Low-Lag VSYNC ON (raster base

Post by Glide » 29 May 2018, 06:58

So if I have this correct: this should essentially be setting up the framerate limiter to the best possible value for latency automatically based on your refresh rate, while trying to maintain a fixed position for the tear line.
The "SyncScanline0" option sets the target position for the tear line, with "1" being the very top of the display - though in practice it's typically 10% or so down for me, and can jump as far down as 50% at times.

And you could use this rather than setting a framerate limit of 3 FPS below your refresh rate with a G-Sync display? Or would it be best to leave that as it is?

Re: RTSS now has new automatic Low-Lag VSYNC ON (raster base

Post by RealNC » 29 May 2018, 19:53

Glide wrote:And you could use this rather than setting a framerate limit of 3 FPS below your refresh rate with a G-Sync display? Or would it be best to leave that as it is?
This is not for g-sync. This is for people who don't have a g-sync monitor. (Well, if it worked. Which it doesn't seem to here. Chief didn't reply yet if it actually works for him either. Did anyone other than me actually use it yet?)

Re: RTSS now has new automatic Low-Lag VSYNC ON (raster base

Post by Chief Blur Buster » 29 May 2018, 20:08

RealNC wrote:I get very heavy tearing with this. Tearing is confined to a specific area of the screen, but that area is quite tall (about 30% screen height.)
I have not had time yet to test (this release came suddenly for me).
I can use some extra testers or volunteers to vet this out.

For this, ultra-high-accuracy framepacing becomes extremely critical. A 30% screen height jitter means for 144Hz, (1/144sec * 30%) means a 2 millisecond inaccuracy in pageflipping. Looks like your system might not be doing microsecond sleeping (mine is from other software!) -- millisecond-accuracy thread sleeping will cause too much tearline jitter.

I do need to test this though, and will try to do so soon. However, I'm at a point where I need to hire a freelancer to do these tests (inquire within -- [email protected]) -- volunteering readers can follow up here with tearline-jitter results!

Tearline jitter is a great benchmark on page-flip timing inaccuracy jitter. Just measure the amplitude (e.g. 10% screen height) then multiply that by refresh cycle time, and -- voila -- you got your framepacing error margin!
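
The rule of thumb above amounts to one multiplication. A minimal sketch, with a hypothetical function name, and treating the visible screen height as if it were the whole refresh cycle (i.e. ignoring the blanking interval):

```python
def framepacing_error_ms(jitter_fraction, refresh_hz):
    """Estimate page-flip timing error from tearline jitter amplitude.

    jitter_fraction is the jitter amplitude as a fraction of screen
    height (0.30 for 30%). Simplification: blanking interval is ignored.
    """
    refresh_period_ms = 1000.0 / refresh_hz
    return jitter_fraction * refresh_period_ms


if __name__ == "__main__":
    print(f"{framepacing_error_ms(0.30, 144):.2f} ms")  # 2.08 ms
    print(f"{framepacing_error_ms(0.30, 60):.2f} ms")   # 5.00 ms
    print(f"{framepacing_error_ms(0.10, 60):.2f} ms")   # 1.67 ms
```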

Re: RTSS now has new automatic Low-Lag VSYNC ON (raster base

Post by RealNC » 29 May 2018, 20:11

Chief Blur Buster wrote:A 30% screen height jitter means for 144Hz, (1/144sec * 30%) means a 2 millisecond inaccuracy in pageflipping.
No, that's for 60Hz, not 144.

Re: RTSS now has new automatic Low-Lag VSYNC ON (raster base

Post by Chief Blur Buster » 29 May 2018, 20:17

RealNC wrote:
Chief Blur Buster wrote:A 30% screen height jitter means for 144Hz, (1/144sec * 30%) means a 2 millisecond inaccuracy in pageflipping.
No, that's for 60Hz, not 144.
Even worse. (1/60sec * 30%) = 5ms

I wonder if the jitter is being caused by the game or by RTSS.

I'll follow up at the RTSS forums to inform them about the usefulness of tearline position as a "framepacing accuracy" technique.

Re: RTSS now has new automatic Low-Lag VSYNC ON (raster base

Post by RealNC » 29 May 2018, 20:27

Chief Blur Buster wrote:I wonder if the jitter is being caused by the game or by RTSS.
Didn't you have to work on a prediction algorithm in order to get rid of this jitter? Seems like RTSS is taking the reported scanline positions at face value.

Re: RTSS now has new automatic Low-Lag VSYNC ON (raster base

Post by Chief Blur Buster » 29 May 2018, 20:51

RealNC wrote:
Chief Blur Buster wrote:I wonder if the jitter is being caused by the game or by RTSS.
Didn't you have to work on a prediction algorithm in order to get rid of this jitter? Seems like RTSS is taking the reported scanline positions at face value.
That's only because I'm completely avoiding D3DKMTGetScanLine() in my "Tearline Jedi" beam racing demo.

The values returned by the D3DKMTGetScanLine() call are fairly accurate -- I can get to 4 scanline jitter. It does have some special programming considerations (see the beam racing related threads in the Software Developers forum if you are a programmer). It definitely won't be the cause of your tearline jitter.
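
To put a 4 scanline jitter in time units, here is a small sketch. The 1125 total scanlines figure is the 1080p example from the quoted RTSS notes, and 60Hz is an assumed refresh rate; neither is a measurement from this system, and the function name is hypothetical.

```python
def scanline_jitter_us(jitter_scanlines, total_scanlines, refresh_hz):
    """Convert a jitter measured in scanlines into microseconds."""
    # Time the display spends on one scanline, blanking lines included
    scanline_time_us = 1_000_000 / (refresh_hz * total_scanlines)
    return jitter_scanlines * scanline_time_us


if __name__ == "__main__":
    # 4 scanlines at 60Hz with 1125 total scanlines
    print(round(scanline_jitter_us(4, 1125, 60), 1))  # 59.3
```

Under those assumptions the result is well under a tenth of a millisecond, far smaller than the millisecond-scale errors that a 30% tearline band implies.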

Re: RTSS now has new automatic Low-Lag VSYNC ON (raster base

Post by Chief Blur Buster » 29 May 2018, 21:12

General Guidelines

It's better to key scan lines to bottom edge of screen, not top edge of screen because
-- The GPU is less busy (less background processing) when it's nearly finished scanning-out than when it begins scanning-out
-- The blanking interval (aka VSYNC/VBI) hides a jittering bottom-edge tearline better.

New Testers Welcome Here

Having problems with RTSS scan line tweaking? I am experienced with "beam raced" tearlines & will try to help in figuring out problems.

1. Film your motion problems at 60fps (tearline jittering, etc). Use horizontal panning game motion. Post to YouTube.

2. Paste the YouTube link here & your RTSS config file here.

We may not be able to help but some things are quite simple, since I have programmed beam racing (raster interrupts) and tearlines are simply raster scan line positions. I've gotten very good at interpreting stationary-tearline-jitter behaviour (beam raced tearline problems) -- so a good video of the jitter may allow me to diagnose the problem.