Emulator Developers: Lagless VSYNC ON Algorithm

Talk to software developers and aspiring geeks. Programming tips. Improve motion fluidity. Reduce input lag. Come Present() yourself!
Chief Blur Buster
Site Admin
Posts: 11653
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 20 Mar 2018, 15:41

Thank you, that is very impressive stability, even in this experiment-only version! Scanline-exact stability.

Obviously, you're doing it in higher-precision programming languages, which is much better for that.

QUESTION -- When do you plan to submit your (optionalized, refined, nonbreaking) patch to GroovyMAME or some other venue?

Meanwhile -- I'll also be publishing an open source raster calculator/estimator module (only needs to be fed a VSYNC heartbeat) so anybody can use it as a fallback for when RasterStatus is not available. This will bring cross-platform beam chasing, as long as the platform has a VSYNC OFF tearing compatible API + ability to listen to a VSYNC heartbeat.

The direct use of RasterStatus makes it unnecessary to know the VBI size. That said, I'm trying to make beam chasing cross-platform by avoiding a hard dependency on RasterStatus now (use platform-specific APIs only when available, but fall back to a software-estimated raster position) -- I will be open sourcing my raster estimator module.

So, to prepare for future RasterStatus.ScanLine-free implementations, you might want to wrap your RasterStatus call so it can fall back to software-based raster estimator formulas on macOS or Linux. (The only caveat is that it requires the ability to listen to a VSYNC heartbeat while in VSYNC OFF mode, in order to guess the ScanLine value.) The VSYNC OFF tearline is always raster-based, positioned by the time elapsed since the last VBI, so the exact scanline of the tearing can be estimated from where the swap falls in time between two blanking intervals, minus the blanking interval time. The blanking interval can be assumed/guessed/configured/detected, or you can use the 45/1125 constant as an assumption (usually only a 1% raster-line misguess ratio in the vertical screen dimension). Even a 5% misguess of raster position isn't the end of the world: that is under 1ms of lag at 60Hz for not knowing the exact raster position, and still workable for 10-slice beam chasing.
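The estimate described above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: the VSYNC timestamp is taken to mark the start of the blanking interval, blanking defaults to 45 of 1125 total lines, and the function and parameter names are hypothetical (they are not from the unreleased module).

```cpp
#include <cmath>

// Hypothetical helper: estimate the current scanline from the time elapsed
// since the last VSYNC. Assumes the VSYNC timestamp marks the START of the
// blanking interval, so the first few line-times are spent in blanking.
// Returns a negative value while the raster is still inside blanking.
int estimate_scanline(double seconds_since_vsync,
                      double refresh_period_seconds,
                      int visible_lines,
                      double vbi_fraction = 45.0 / 1125.0)
{
    // Total lines per refresh, including blanking.
    const double total_lines = visible_lines / (1.0 - vbi_fraction);
    const double vbi_lines = total_lines - visible_lines;

    // Position within the current refresh cycle, wrapped into [0, 1).
    double phase = std::fmod(seconds_since_vsync, refresh_period_seconds)
                   / refresh_period_seconds;
    if (phase < 0.0) phase += 1.0;

    // Line counter measured from the start of blanking, shifted so that
    // line 0 is the first visible line.
    return static_cast<int>(std::floor(phase * total_lines - vbi_lines));
}
```

Halfway through a 60Hz refresh of a 1080-line mode this lands around line 517, not 540, because the blanking lines are counted first.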

Also -- WinUAE, the Amiga emulator: Toni (whom I had talked to about my ideas before you posted) said he's eager to implement it for the next release, though it will take time. In your post a few posts ago, you had said you wanted to be first, so by posting your (still experimental, platform-specific) code, you've cemented being the first (unofficial) emulator implementation. I've just fired off a small email to Toni to mention that you had wanted to be first to an actual (official) release, but no guarantees -- I had let him know about my ideas before you posted -- just wanted to give you fair notification.
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter

Forum Rules wrote:  1. Rule #1: Be Nice. This is published forum rule #1. Even To Newbies & People You Disagree With!
  2. Please report rule violations If you see a post that violates forum rules, then report the post.
  3. ALWAYS respect indie testers here. See how indies are bootstrapping Blur Busters research!

Calamity
Posts: 24
Joined: 17 Mar 2018, 10:36

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Calamity » 20 Mar 2018, 17:02

Hi Mark,

I'm not sure when or how I'm going to release this thing. In a way it's already released: you have the binary and the source. As I've said, it's problematic with many drivers, which is the reason why I refrained from releasing it right away. Then you opened this thread and my pride was stronger than my prudence.
Chief Blur Buster wrote:BTW, Toni (Amiga emulator author) has confirmed he has an experiment going on in WinUAE and is expected to release beam chasing in WinUAE in the next release. But should I tell him to hold off until you've officially released your submission?
Why should he hold off? It's great that he's planning to implement this idea. The funny part of this story is that it all started with Dr.Venom from BYOAC testing Toni's beam chasing Amiga program on GroovyMAME and pointing out it could not replicate the real hardware response. (Amiga is one of the MAME drivers that supports the frame slice feature without apparently breaking too much because it's coded with partial screen updates in mind).

EDIT:
Chief Blur Buster wrote:In your post a few posts ago, you had said you wanted to be first so by posting your (still experimental, platform-specific) code, you've cemented being the first (unofficial) emulator implementation.
I was excited to show current frame response for the first time (unaware that anyone else was working on the same idea), but just because it's a super cool thing, rather than to see my nickname carved in marble. I refrained from doing an official release for these reasons:

a) I need specific equipment/setup in order to record a proper video for this task.
b) I got very disappointed because many drivers in MAME are broken due to this feature (I'll explain the reason in a future post).
c) Real life issues need attention before I can properly target a) and b).

---------------------

There's a feature that I think could be a nice addition to your library: an event based waitable raster status to avoid busy loops.

I'm highly interested in your RasterStatus-free implementations. Past experience has made me very skeptical about timers, but things may have changed nowadays.

In Linux we're currently using drm to control vsync directly and then immediately do OpenGL flips with vsync off. It works like a charm, but we're missing the raster position for more advanced control in alignment with the Windows implementation. The issue with Linux is you can't access the raster position from userland. Doozer from BYOAC already suggested using timers instead to avoid a kernel patch solution.

Although, IMHO, you can be either awesome or cross-platform.


Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 20 Mar 2018, 18:39

Calamity wrote:Why should he hold off? It's great that he's planning to implement this idea. The funny part of this story is that it all started with Dr.Venom from BYOAC testing Toni's beam chasing Amiga program on GroovyMAME and pointing out it could not replicate the real hardware response. (Amiga is one of the MAME drivers that supports the frame slice feature without apparently breaking too much because it's coded with partial screen updates in mind).
Great!
I'm glad no hard feelings, I was worried for a moment!

I don't want to steal other people's works of art, but to help spread the low-lag ideas around. Blur Busters visitors are often low-lag nuts like me, so naturally, beam chasing plays well into this sphere!
Calamity wrote:There's a feature that I think could be a nice addition to your library: an event based waitable raster status to avoid busy loops.
That's a great idea! Initially it's just a C# raster calculator module, in a K.I.S.S. approach.
Will extend it to include that "raster interrupt" idea!
Calamity wrote:I'm highly interested in your RasterStatus-free implementations. Past experience has made me very skeptical about timers, but things may have changed nowadays.
They certainly have!
I don't use timers. I simply do math calculations on a monotonically-increasing counter register that increments by 1 exactly every 1/10,000,000 second. So I can have really accurate "timestamping" techniques, including the time since the last VSYNC event.

And you can de-jitter erratic VSYNC events (see http://www.vsynctester.com as an example of de-jittering VSYNC events).

I simply use high performance clock counters (basically the microsecond counters built into the CPU). These are currently accurate to ~1/10,000,000th of a second on most platforms. The Windows API QueryPerformanceCounter() uses these CPU counters, and chips manufactured in the last 10 years tend to have a counter register that is accurate to 1/10,000,000th of a second (though on older CPUs you ideally need to keep thread affinity). Sometimes the resolution is finer than that, but 10,000,000 ticks per second is the current benchmark of accuracy.

Nowadays, the standard timing facilities in most programming languages, including the generic C++/C# objects, piggyback on these ultra-precise monotonically-increasing high performance counters (built into CPUs), which are sufficiently accurate for raster guessing:

-- in C# it is the "Stopwatch" object that now gets 0.1us precision
-- in C++ it is the "std::chrono::high_resolution_clock::now();" that now gets 0.1us precision.

All these industry-standard library calls piggyback off modern high-precision clock-counters. These clock-counters have been continually improving, and their now-brilliant precision is part of what made the Spectre/Meltdown problem so widespread -- those fears only exist because computers now have microsecond-accurate clock-counters accessible to userspace.
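As a rough C++ illustration of timestamping in 1/10,000,000-second ticks: this sketch uses std::chrono::steady_clock rather than high_resolution_clock, because the C++ standard only guarantees steady_clock to be monotonic (high_resolution_clock may be an alias for the adjustable system clock on some implementations). The actual resolution depends on the platform, and the names now_ticks and ticks_to_seconds are made up for this example.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// A duration type counting 100 ns ticks (1/10,000,000 s).
using Ticks100ns = std::chrono::duration<std::int64_t, std::ratio<1, 10000000>>;

// Current monotonic time expressed in 100 ns ticks.
std::int64_t now_ticks()
{
    const auto since_epoch = std::chrono::steady_clock::now().time_since_epoch();
    return std::chrono::duration_cast<Ticks100ns>(since_epoch).count();
}

// Convert a tick count (or a difference of two tick counts) to seconds.
double ticks_to_seconds(std::int64_t ticks)
{
    return static_cast<double>(ticks) / 10000000.0;
}
```

The time since the last VSYNC is then just ticks_to_seconds(now_ticks() - vsync_ticks), where vsync_ticks was captured in the VSYNC callback.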

If you see my YouTube video (using MonoGame, similar to Microsoft XNA) -- I'm only getting 2-scanline jittering from a high-level managed language (C#), using solely calculations on a high performance counter!

To help me out, I use a floating-point clock accumulator variable (doubles) instead of integers, so I avoid the usual rounding-off errors that build up, and I have de-jitter logic to intentionally compensate for late/early executions. But that is something you should do anyway, whether the raster is hardware or software. Basically you make your next "raster interrupt" a little earlier if you're currently unexpectedly late in your current frameslice, to be more scanline-exact. That's what I am doing, to de-jitter on the fly, and it works.

My biggest source of noise is the jitter in VSYNC timestamps. Rasters are just simple-to-calculate offsets off these. But that's also a simple math problem: I was the first to deploy web-based refresh-rate-synchronized motion tests (www.testufo.com) and it's popular among display testers now -- anytime you see the UFO in a display test, it came from me. To predict the refresh rate of a display, I do heuristics on the timestamps of the animations (very jittery, noisy, etc) -- see http://www.testufo.com/animation-time-g ... =rendering (Firefox, Edge and Chrome have different precisions, the color coding is fun to watch!) and www.vsynctester.com -- and pull out a very accurate VBI guess to almost microsecond-exactness after a few seconds of random VBI callbacks (lots of jitter + lots of skipped VSYNCs) -- see www.testufo.com/refreshrate (it can compute a refresh rate to several decimal digits). With this data, one can then compute (from a microsecond-accurate counter) the raster position between two (de-jittered extrapolations of) VBI timestamps.
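The heuristics themselves are not spelled out here, so the following C++ sketch substitutes one simple, assumed technique: take the median interval as a rough period, work out how many refresh periods each interval spans (so skipped VSYNCs count as 2, 3, ... periods), and divide total time by total periods. It assumes fewer than half of the callbacks are missed; estimate_refresh_period is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Estimate the refresh period (seconds) from noisy VSYNC timestamps that
// may contain skipped VSYNCs. Returns 0.0 when no estimate is possible.
double estimate_refresh_period(const std::vector<double>& vsync_times)
{
    if (vsync_times.size() < 2) return 0.0;

    std::vector<double> intervals;
    for (std::size_t i = 1; i < vsync_times.size(); ++i)
        intervals.push_back(vsync_times[i] - vsync_times[i - 1]);

    // The median interval is a rough first guess that ignores outliers.
    std::vector<double> sorted = intervals;
    std::sort(sorted.begin(), sorted.end());
    const double rough = sorted[sorted.size() / 2];
    if (rough <= 0.0) return 0.0;

    // Fold skipped VSYNCs back in: an interval of ~2x the rough period
    // counts as two refresh periods, and so on.
    double total_time = 0.0;
    double total_periods = 0.0;
    for (double dt : intervals) {
        const double periods = std::round(dt / rough);
        if (periods < 1.0) continue;  // duplicate or out-of-order callback
        total_time += dt;
        total_periods += periods;
    }
    return total_periods > 0.0 ? total_time / total_periods : 0.0;
}
```

Because jitter in neighbouring intervals largely cancels in the sum, the estimate tightens as more timestamps are collected.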

My tests show you can get within 2-3 scanlines of exactness using these algorithms concurrently:
- Dejittering/denoising a VSYNC callback event
- Access to microsecond accurate performance counters (typically 1/10,000,000sec accuracy)
- Knowing the size of the blanking interval (video timings)
If you do all three, then you get the raster prediction accuracy as seen in my YouTube video -- that's far more than good enough for 10-tile renderers. And I'm programming in C# rather than C/C++...

Skipping the VBI knowledge, you will get more offset, but using an assumed constant, it is only a roughly 2% error (e.g. raster offset by a few lines). Also, knowing how the VBI is timed is useful -- you need to know whether VSYNC callbacks occur at entry to the VBI or at exit from the VBI, i.e. where the VBI pause is in relation to the callback event, so you can mathematically compensate when generating a predicted raster position. The return of a blocking VSYNC ON Direct3D Present() call occurs at the entry into the VBI (so you've got a small VBI pause until the first scanline at the top edge of the new refresh cycle), and knowing the VBI size allows you to software-calculate the raster position to very good accuracy with modern high performance counters.

(On Windows platforms you can read the VBI size using QueryDisplayConfig(), which reports the vertical total (the totalSize.cy field of DISPLAYCONFIG_VIDEO_SIGNAL_INFO) -- subtract the active vertical resolution from it -- or use the horizontal scanrate (number of scanlines per second) that it returns. On Linux, you can optionally get it from the modeline or other sources. Or by default, just assume a 45/1125 constant, which is a suitable 10-slice-accuracy catchall for >99% of video signals. You'd waterfall from this platform-independent assumption to using platform-specific APIs.) But I have sufficient information to calculate near-scanline-exact rasters without a raster register, as long as I have access to a (noisy) VSYNC callback to signal the VSYNC intervals.
Calamity wrote:Although, IMHO, you can be either awesome or cross-platform.

The ideal goal is both.

Default to using only a VBI heartbeat, but improve accuracy by feeding it more data (optional platform specific hooks to scanline polls, optional platform specific hooks to knowing VBI size, etc). Basically cross-platform main module with optional hooks for platform-specific accuracy improvers.

I'm trying to decide how to architect it in the long term, after my initial "RasterCalculator.cs" release (cross platform calculator).

I'm thinking of
-- Customizable VSYNC notification hook (so you can roll-your-own VSYNC signal listener & feed it into RasterCalculator).
-- Customizable raster-poll hook (if hardware raster is available) that RasterCalculator can use instead of calculating it internally
-- Manage callback events for rasters, essentially a raster interrupt! (meaning, I would handle the event-waiting, busylooping and spinning).

These can be refined with improvements, being opensource. Just want to get something out ASAP (within weeks). Several other projects of mine depend on this code, so I need this code anyway.
Calamity wrote:There's a feature that I think could be a nice addition to your library: an event based waitable raster status to avoid busy loops.
Tiny busy loops will be extremely hard to avoid at least for the 'final wait' but I'm researching options for maximum-precision events without wasting CPU cycles.

But good idea, I'll look to add waitable raster events and/or raster callbacks to RasterCalculator.cs to create the de facto equivalent of raster interrupts. (Again, it's C# initially -- may not be useful to your work just yet.)

I probably will initially implement it as one spinning loop on a separate CPU core with thread affinity enabled, that runs at high priority (as realtime as Windows lets it), to drive the raster callbacks. System sleeping for long multi-millisecond delays, and then busylooping on tiny sub-millisecond delays, automatically as needed (configurable thresholds). I can use a pre-sorted list (with raster line number & event callback attached) with no limit to the number of events possible except computer performance available.

So users of my module will eventually be able to either just poll (spin on my module) or actually create "raster interrupts" (either waitable events or function callbacks upon raster reached). I'll use a standard pre-sorted list to create unlimited raster interrupts (as much as computer performance allows).
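A pre-sorted event list plus a sleep-then-spin wait could look roughly like this in C++. Everything here is a hypothetical sketch (RasterEvents, wait_until and the 1000us spin threshold are assumptions, not the module's API); thread affinity and priority, which are platform-specific, are left out.

```cpp
#include <chrono>
#include <functional>
#include <map>
#include <thread>
#include <utility>

// Hypothetical registry of "raster interrupts": callbacks keyed by scanline.
// std::multimap keeps them sorted, so one pass fires them top-to-bottom.
class RasterEvents {
public:
    void add(int scanline, std::function<void()> callback)
    {
        events_.emplace(scanline, std::move(callback));
    }

    // Fire every callback whose scanline lies in (previous, current].
    // Returns the number of callbacks fired.
    int dispatch(int previous_scanline, int current_scanline) const
    {
        if (current_scanline <= previous_scanline) return 0;
        int fired = 0;
        auto it = events_.upper_bound(previous_scanline);
        const auto end = events_.upper_bound(current_scanline);
        for (; it != end; ++it) {
            it->second();
            ++fired;
        }
        return fired;
    }

private:
    std::multimap<int, std::function<void()>> events_;
};

// Hybrid wait: sleep in short naps while the deadline is far away, then
// busy-wait for the final stretch. The threshold is an assumed, tunable value.
void wait_until(std::chrono::steady_clock::time_point deadline,
                std::chrono::microseconds spin_threshold =
                    std::chrono::microseconds(1000))
{
    using clock = std::chrono::steady_clock;
    while (clock::now() < deadline) {
        if (deadline - clock::now() > spin_threshold)
            std::this_thread::sleep_for(std::chrono::microseconds(500));
        // Otherwise loop again immediately: this is the "final wait" spin.
    }
}
```

A driver thread would call wait_until() for the next event's estimated time and then dispatch() with the previous and current scanline estimates.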

Initially this is in C# since my goal is simply to prove out the concepts quickly first (quick for me in C#) -- so it needs to be ported to emulator-friendly C/C++ sometime after I release it -- I'm more than 50% done with the rasterdemo software (I want to release a version with zero Win32 API calls in the core RasterCalculator.cs), cleaning/refactoring, etc. Real life also takes precedence. Stay tuned!


Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 21 Mar 2018, 13:06

For other forum readers

Oh, and for Blur Busters readers who don't understand the beam racing stuff, I recommend this book:

Amazon Book: Racing the Beam: The Atari Video Computer System

WIRED magazine article: Racing The Beam: How Atari 2600's Crazy Hardware Changed Game Design

Back in the 8-bit days, you had to do things in realtime with the display scanout. Raster interrupts, etc. Changing graphics on the fly as the "beam" scanned out onto a CRT tube, top-to-bottom.

This is probably the simplest, least technical book that gives a fun read to people who aren't familiar with "beam racing" stuff.

All this talk is sort of (approximately) bringing that back to the PC, since PC performance (and clocks) are finally fast and precise enough to nearly-laglessly synchronize the raster of an emulator with the raster of a real-world display, and a creative use of VSYNC OFF (that gamers are familiar with) gives a safe error margin of jittering between the emu/real rasters.


Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 21 Mar 2018, 16:35

Sneak preview of the comment header I wrote (no source code, it's only the top comments part)

Current status: About 1000 lines of code written (excluding the graphics demo part of the software which uses this module).
Still refining at the moment then I'll put it out in an open source format.

Most importantly, I post this solely to show how few references I use (only stdlib-style functions -- no Win32 calls!) and why I'm making the core module platform-independent, and making platform-specific precision-increasers optional.

I leave it to the responsibility of other programmers to provide a vsync heartbeat to the module (call module once every vsync in your vsync event). But other than that, my module does the rest -- telling you the scan line number or executing a callback upon scan line number.

It'll be easy to port to C/C++

Code:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace BlurBusters.UFO
{
    /// <summary>
    /// A cross-platform raster scan line position estimator for beam racing applications.
    /// 
    /// This module gives you:
    ///    * Raster scan line number -- current raster number calculated without hardware raster poll
    ///    * Raster interrupts -- callback function events on scan line number
    ///    * Hook to provide VSYNC timestamps -- Just call me once every VSYNC. 
    ///         - module calculates near-scanline-exact raster just from de-jittered VSYNC timestamps
    ///    * Or hook for optional external hardware raster poll
    ///         - purely optional platform-specific hook to override internal software raster estimator
    ///    
    /// This module helps all kinds of beam-racing applications:
    ///    * Reduce input lag in emulators
    ///    * Reduce input lag in virtual reality
    ///    * Precise control of exact location of tearlines during VSYNC OFF
    /// 
    /// Absolute minimum raster-capability requirement:
    /// 1. Your platform supports high-precision (under 1us) clock counters. Most CPUs from last 10 years do.
    /// 2. You feed this module timestamps of your VSYNC / VBLANK  (jitters & misses ok; module filters them)
    /// 3. This module accurately estimates (to ~1% error) the raster scan line number with only the above.
    /// </summary>
    public class RasterCalculator
    {
        // [about 1000 lines written...check in about one week or so]
    }
}
I'll be leaving the actual graphics stuff to a separate exercise though I'll provide a MonoGame beam chasing demo included with this (Visual Studio Community Edition 2017 + Free MonoGame 3.6 Library).

The goal is simply the world's most accurately guessed raster scan line numbers without access to a direct raster poll. I think I've achieved that thanks to my excellent knowledge about displays, so that's what I want to get out to the public.

Extra optional platform-specific cake frosting (e.g. video timings of the real signal, modelines, vertical totals) only serves to improve accuracy. This data is completely optional and you simply update some getter/setter values if you have that info. Or just leave them at their defaults for a slightly higher error (e.g. ~1-2% screen height offset -- still good enough for approx-10-tile renderers).

It uses so few library functions because it's simply well-understood timing mathematics.

Really, the VSYNC OFF tearline of any API (OpenGL, Direct3D, etc) accurately corresponds to the scan line number guessed by this module! So the raster interrupt function of this module can call your VSYNC OFF buffer swap function, which puts the tearing exactly at the scan line number of the raster interrupt (to within a few scanlines of misposition -- perfectly fine for tile-based beam racing applications). Without need for a hardware raster register poll!

That makes this cross-platform. It doesn't matter what 3D API you use, as long as the 3D API is VSYNC OFF compatible. The module makes it possible for anyone to control (nearly) the exact location of the VSYNC OFF tearline, and once you do that, you're pretty much ready to beam race in your app if you're doing beam raced rendering.

It'll work with RasterCalculator as long as you are able to feed it timestamps of the real display signal's VSYNC (that timing roughly corresponds to the return of a blocking VSYNC ON frame-swap).


Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Calamity » 23 Mar 2018, 13:04

Current frame response is finally possible!

[Embedded video]


Demonstration of the experimental "frame slice" feature. Tear free rendering at 500 fps, 728x567i 49.890 Hz. Emulation of each frame is divided in 10 "slices", synchronized with the physical raster. Input data is polled and processed for each slice. On pressing F11, slices are shown with a color filter, revealing the (low) existing jitter.

Setup:
- Intel i7-4771 3.5 GHz, AMD Radeon R9 270, Windows 8.1 64 bits
- GroovyMAME 0.195 - Direct3D9ex - custom "frame slice" build
- JPAC wired to a microswitch and a 5V LED.

Testing Toni Wilen's "Button test" program for Amiga, emulated by GroovyMAME. This program polls input at a scanline specified by the user (green line). If input is detected, it colors all lines below the green line in red. After that, on the next frame, it colors all lines until the polling line in yellow.

See how, in many instances, the program reacts to input (LED) right in the same frame.

The second part of the video was recorded at 240 fps. The brightness was raised a bit in GroovyMAME to make the raster visible on the black background.

Thanks to Dr.Venom for inspiring this work.


Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 23 Mar 2018, 13:19

Great proof of beam chasing with the concepts from this thread!

Toni tells me he already has a functioning WinUAE internally with this too. He said it was easier than he thought.

Also, I'm crossposting some stuff I wrote about variable refresh rate understandings, and how to beam-chase variable refresh rate cycles.

Not everyone understands exactly how much I already know about variable refresh rate displays, and why GSYNC/FreeSync is no different in beam chasing -- it's just simple understanding to me:
Chief Blur Buster wrote:[OPTIONAL INFO FOR VRR IMPLEMENTATION]

As Toni confirmed implementing beam chasing in WinUAE, I've gotten emails from multiple developers who are monitoring this, for their own respective emulators. So I'm adding this:

It has come to my attention that many software developers do not realize what variable refresh rate does. I keep re-re-re-explaining variable refresh rate to software developers, and also explaining why it reduces input lag of low-framerate material such as emulators. I understand exactly why, but many people don't. So here goes:

A variable refresh rate display (GSYNC, FreeSync), running in variable refresh rate mode, in summary:

* It scans out at a constant velocity corresponding to its maximum refresh rate.
* Only the blanking interval changes, to space out the refresh cycles.
* Horizontal scanrate never changes, regardless of refresh rate.
* A hardware poll is not needed, but I'll mention one only to improve mathematical understanding of VRR
* RasterStatus.ScanLine (and similar ilk) increments at a constant rate on a variable refresh rate
* RasterStatus.ScanLine (and similar ilk) begins incrementing only after Present()
* Your Present() call is the software-triggered beginning of a new refresh cycle.
* Your display is WAITING for your software to Present()
* The scanouts of a variable refresh rate is always at maximum velocity. 60fps on a 144Hz VRR display scans top-to-bottom in ~1/144sec
* That's also why lag is low when running emulators on a VRR display
* That's also why it is also possible to beam-chase a variable refresh rate display as a lag-reducing multiplier effect (having EVEN LESS lag).
* You can optionally query for horizontal scan rate (i.e. how fast RasterStatus.ScanLine will increment) via QueryDisplayConfig() on Windows.
* The graphics card is automatically continuously transmitting VBI lines until Present() at which point the first scan line begins scanning out immediately after.
* Yes. REALLY. Yes, you're controlling the display's exact timing of refresh cycles -- when a display is in variable refresh rate mode. The display is actually idling for YOU and really does begin its scanout when you Present()

Now that you understand VRR better, let me explain how VRR is combined with VSYNC ON and VSYNC OFF simultaneously.

* VRR supports VSYNC ON and VSYNC OFF
* Whenever VRR displays are running at framerates below the maximum refresh rate, the choice between VSYNC ON and VSYNC OFF doesn't matter
* When VRR is running with VSYNC ON, Present() only sometimes blocks.
......Present() while the display is waiting for you, begins a new refresh cycle, and returns immediately. i.e. frame rates below max Hz.
......Present() while the display is still scanning last frame, forces a wait for the refresh cycle to finish, and blocks until then, just like VSYNC ON. i.e. frame rates that wants to be above max Hz.
* When VRR is running with VSYNC OFF, Present() never blocks.
......Present() while the display is waiting for you, begins a new refresh cycle
......Present() while the display is still scanning last frame, behaves like VSYNC OFF. The new frame scans out instead, at the current raster position. It's as if you never were running a variable refresh rate display, at this instant moment (at least for the remainder of the refresh cycle). Refresh cycles that are already started always finish.
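The four cases above collapse into a small decision function. This C++ sketch only models the behaviour as described in this post; the enum and function names are hypothetical and do not correspond to any graphics API.

```cpp
// What a Present() call does on a display running in VRR mode,
// following the cases listed above.
enum class PresentResult {
    StartsNewRefresh,        // display was idling in blanking: scanout begins now
    BlocksUntilScanoutEnds,  // VSYNC ON while a scanout is still in progress
    TearsAtCurrentRaster     // VSYNC OFF while a scanout is still in progress
};

PresentResult vrr_present(bool vsync_on, bool scanout_in_progress)
{
    // Below max Hz the display is waiting for us, so the VSYNC setting
    // makes no difference.
    if (!scanout_in_progress) return PresentResult::StartsNewRefresh;

    // The previous refresh is still scanning out: the VSYNC setting now
    // decides between waiting and tearing.
    return vsync_on ? PresentResult::BlocksUntilScanoutEnds
                    : PresentResult::TearsAtCurrentRaster;
}
```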

Generally, that's why Blur Busters articles advocate frame rate caps slightly below the VRR maximum. With a cap too tight against the maximum, 144fps can mean some frametimes are 1/140sec and other frametimes are 1/150sec, since frame rate caps aren't always perfect in many games. One frame gets the VRR treatment and the other frame gets either the VSYNC ON or VSYNC OFF treatment. From the perspective of emulator development, that means you ideally want to run your emulator at ~59 to ~59.5 frames per second (slow down your emulation slightly) if you're running on a VRR display whose maximum refresh rate is 60 Hz (e.g. 4K 60 Hz FreeSync displays). There have been reported input lag problems trying to run 60fps on a 60Hz VRR display. Fortunately, most VRR displays run above 60Hz, so you probably don't need to worry about this situation; I only mention it so you are familiar with the considerations.

In all situations, Present() can also be glutSwapBuffers() in OpenGL, or any equivalent call that generates a tearline during VSYNC OFF. As long as you have access to an API call on your platform, that can generate a tearing artifact. Controlling the exact location of tearing is simply clock mathematics as an offset from the last VSYNC timestamp.

Tearing artifacts during VSYNC OFF are raster-based. They correspond to the time of the API call to flip a frame buffer. What changed is that today's computers have microsecond-accuracy performance counters these days. This makes it possible for software to control the exact location of tearing artifacts -- just like in my YouTube video: https://www.youtube.com/watch?v=OZ7Loh830Ec

That means, if you want to make your beam-racing algorithm compatible with VRR mode, simply do the first Present() after you render your first tile. Do each following first-tile Present() one emulator frame interval (e.g. 1/60sec) after the previous first-tile Present(). These Present() calls will trigger the beginnings of those refresh cycles, and this is your figurative "starting pistol" for beam racing, because at that point RasterStatus.ScanLine suddenly starts incrementing as a result of your Present() call.
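Under the model described here (scanout starts at Present() and advances at the fixed horizontal scan rate), the raster position on a VRR display can be estimated from the time since that Present() alone. A minimal C++ sketch, with a hypothetical function name and the scan rate assumed to come from a system query or from vertical total times maximum Hz:

```cpp
// Estimated scanline on a VRR display, measured from the Present() that
// started the current refresh cycle. Returns -1 before the Present() or
// once the scanout has finished and the display is idling in blanking.
int vrr_scanline_since_present(double seconds_since_present,
                               double horizontal_scanrate_hz,
                               int visible_lines)
{
    if (seconds_since_present < 0.0) return -1;
    const long line =
        static_cast<long>(seconds_since_present * horizontal_scanrate_hz);
    return line < visible_lines ? static_cast<int>(line) : -1;
}
```

For a 1080-line mode with 1125 total lines at 144Hz (162,000 lines per second), the raster is around line 562 about 1/288sec after Present(), and the scanout is long finished by 1/60sec.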

Variable refresh rate displays also make random-looking framerates smooth. If you've never seen a variable refresh display, it has the uncanny ability to synchronize a random frame rate with a random refresh rate -- the refresh interval can change every single refresh cycle, exactly matching gametime deltas (frame times), keeping things moving smoothly despite erratic framerates. Framerate changes are de-stuttered.

But variable refresh rate displays are also great for emulators for a different reason: they function as a de facto "Quick Frame Transport" mechanism, in the form of fast scanouts between long blanking intervals. HDMI recently standardized the "Quick Frame Transport" (QFT) technique for Version 2.1 of HDMI. This is one of the multiple features that are part of the new Auto-Low-Latency mode for automatic Game Mode operation (which also, supposedly, allows consoles to signal TVs to automatically switch into "Game Mode", and the ability to do VRR and/or QFT). The mathematics of this is the same -- it's simply a higher scanrate signal while keeping Hz the same. This is excellent for reducing input lag for VSYNC ON material (as most consoles are VSYNC ON). Don't worry about this complicated spec stuff. Just try (on a best-effort basis) to query the system's Vertical Total or Horizontal Scan Rate.

The best practice is a waterfall accuracy approach:

* Hook to hardware scanline callback
* Missing hook, we want either Vertical Total or Scan Rate
* Missing that, we assume a 45-line VBI (raster stays 99% accurate on 99% of signals). This is because 1080p has 1125 scanlines in total and 480p has 525, which leaves 45 blanking lines in both cases. So 45 is an excellent default catch-all.
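
The last two tiers of that waterfall could be sketched in Python roughly as follows (the first tier, a hardware scanline hook, needs no vertical total at all). The function name, parameters, and returned labels are hypothetical:

```python
def resolve_vertical_total(visible_lines, refresh_hz,
                           vertical_total=None, scan_rate_hz=None,
                           assumed_vbi_lines=45):
    """Pick the best available vertical total, most accurate source first."""
    if vertical_total is not None:
        return vertical_total, "reported vertical total"
    if scan_rate_hz is not None:
        # Scan rate (lines per second) divided by refreshes per second.
        return round(scan_rate_hz / refresh_hz), "derived from scan rate"
    # Catch-all: assume a 45-line VBI on top of the visible lines.
    return visible_lines + assumed_vbi_lines, "assumed 45-line VBI"
```

With nothing known, resolve_vertical_total(1080, 60.0) falls through to the 45-line assumption and returns 1125 lines.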

In all situations, I still need a hook to a VSYNC event (but it can come from **anything** -- a DWM loop, a 2nd 3D instance in background, a hardware poll, a pre-existing kernel event, etc). But other than that, as long as the 3D API is VSYNC OFF compatible...

That keeps everything cross-platform, while optionally improved-accuracy on platforms with a hardware poll (fixed Hz or VRR) _or_ scanrate knowledge (fixed Hz or VRR) _or_ vertical total knowledge (fixed Hz).

By programming things in a sensible common-sense fashion, your beam chasing algorithm becomes compatible with almost everything. No access to RasterStatus? No problem, as long as I can listen to VSYNC events. No access to modelines? No problem, as long as I can listen to VSYNC events. No access to QueryDisplayConfig? No problem, as long as I can listen to VSYNC events. Etc. Do the calculations correctly, and then everything else is optional cake frosting.
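
A minimal Python sketch of such a fallback, assuming only two VSYNC timestamps and a vertical total are available. This is not the raster estimator module mentioned earlier in the thread; the names are hypothetical, and it assumes the VSYNC event marks the start of a 45-line blanking interval:

```python
def estimate_scanline(now, prev_vsync, last_vsync, vertical_total, vbi_lines=45):
    """Estimate the visible scanline at time `now` from two VSYNC timestamps.

    Returns None while the estimate falls inside the blanking interval.
    """
    refresh_period = last_vsync - prev_vsync
    # Fraction of the current refresh cycle that has elapsed (0.0 to 1.0).
    phase = ((now - last_vsync) % refresh_period) / refresh_period
    raw_line = int(phase * vertical_total)  # counted from start of VBI
    visible_line = raw_line - vbi_lines
    return visible_line if visible_line >= 0 else None

def current_scanline(now, prev_vsync, last_vsync, vertical_total, hw_poll=None):
    """Prefer a hardware raster poll when one exists, else estimate."""
    if hw_poll is not None:
        return hw_poll()
    return estimate_scanline(now, prev_vsync, last_vsync, vertical_total)
```

On Windows, hw_poll could wrap a RasterStatus.ScanLine query; elsewhere it stays None and the estimate is used.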

For developers reading this: all of this is optional. You don't have to make your beam-chasing algorithm compatible with VRR, but it's quite easy to do so if you already have working beam chasing. (It's only a minor modification to an existing beam chasing algorithm once you've gotten it working for fixed-Hz displays.)

[/OPTIONAL INFO FOR VRR IMPLEMENTATION]
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter


Sparky
Posts: 682
Joined: 15 Jan 2014, 02:29

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Sparky » 23 Mar 2018, 21:50

Calamity wrote:Current frame response is finally possible!


Demonstration of the experimental "frame slice" feature. Tear free rendering at 500 fps, 728x567i 49.890 Hz. Emulation of each frame is divided in 10 "slices", synchronized with the physical raster. Input data is polled and processed for each slice. On pressing F11, slices are shown with a color filter, revealing the (low) existing jitter.

Setup:
- Intel i7-4771 3.5 GHz, AMD Radeon R9 270, Windows 8.1 64 bits
- GroovyMAME 0.195 - Direct3D9ex - custom "frame slice" build
- JPAC wired to a microswitch and a 5V LED.

Testing Toni Wilen's "Button test" program for Amiga, emulated by GroovyMAME. This program polls input at a scanline specified by the user (green line). If input is detected, it colors all lines below the green line in red. After that, on the next frame, it colors all lines until the polling line in yellow.

See how, in many instances, the program reacts to input (LED) right in the same frame.

The second part of the video was recorded at 240 fps. The brightness was raised a bit in GroovyMAME to make the raster visible on the black background.

Thanks to Dr.Venom for inspiring this work.
Lots of bounce on that switch at 1:53. If you're going to work this into an actual game controller I'd suggest using the NC contact of the switch for debouncing.

Chief Blur Buster
Site Admin
Posts: 11653
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 23 Mar 2018, 23:21

Nice, the lag is what I expect:

10 slices at 50 Hz = 500 slices a second = 2ms per slice. Chasing 2 slices behind the raster (since the immediately preceding slice is the jitter margin) = up to 4ms lag.

That said, if graphics card performance allows, I'd go 20 slices per refresh cycle (maybe more). That reduces input lag by half and will probably be quite dramatic for high speed video demonstrations.
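
The arithmetic above as a small Python sketch (the function name is hypothetical, and the 2-slice chase distance is the figure from this post):

```python
def slice_lag_ms(refresh_hz, slices_per_refresh, slices_behind=2):
    """Return (milliseconds per slice, worst-case lag in milliseconds)."""
    slice_ms = 1000.0 / (refresh_hz * slices_per_refresh)
    return slice_ms, slices_behind * slice_ms

print(slice_lag_ms(50, 10))  # (2.0, 4.0)
print(slice_lag_ms(50, 20))  # (1.0, 2.0)
```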

Calamity
Posts: 24
Joined: 17 Mar 2018, 10:36

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Calamity » 24 Mar 2018, 04:52

Sparky wrote: Lots of bounce on that switch at 1:53. If you're going to work this into an actual game controller I'd suggest using the NC contact of the switch for debouncing.
Thanks for the suggestion, you're right. I thought I'd post the uncut video material with all its imperfections. Initially I was planning to get a Teensy in order to do these tests, but I was put off by the added complexity.

This JPAC is an old model. I'm not sure how much latency is attributable to its microcontroller, but judging by the results it's not performing badly at all. The JPAC was connected through a USB 3.0 port overclocked to 1000 Hz with hidusbf. I'm not sure if the latter actually made any difference though.
