Emulator Developers: Lagless VSYNC ON Algorithm

Talk to software developers and aspiring geeks. Programming tips. Improve motion fluidity. Reduce input lag. Come Present() yourself!
Calamity
Posts: 24
Joined: 17 Mar 2018, 10:36

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Calamity » 28 Mar 2018, 08:59

Seeing that you're taking so much care in dejittering the vsync pulse, I'm wondering: how bad is vsync jitter, really? In our experience (GM), vsync was far more reliable than CPU timing as a base for emulation throttling. E.g., tying the sound update to vblank-end led to very few or no buffer overflows/underflows, compared to using regular CPU timing. I'm not saying the jitter wasn't there, but it definitely didn't cause major problems.

Polling the raster status directly in a busy loop often gets you 3 or more hits per scanline. That's pretty good, I'd say. By knowing hfreq, you could hang subsequent slices off this initial "hard poll"/vsync pulse by means of CPU timers, without having to calculate vfreq to the sixth decimal. That said, the hfreq figure obtained from the OS is always going to be a rough estimate due to dotclock granularity (the real dotclock is unknown to the system; you can only get at it indirectly, either by measuring the refresh rate or by reverse engineering the PLL divider algorithm in the drivers).

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 28 Mar 2018, 10:12

Excellent question:
The answer is: I'm working to make my raster calculator as cross-platform as possible.

VSYNC timestamps are really bad in some frameworks
Some languages, game engines & frameworks have VSYNC jitter as bad as JavaScript:
https://www.testufo.com/animation-time- ... busywait=0
(Try that in Chrome & Firefox & Edge -- on PC, on Mac, on iPad, on Galaxy, and compare the browsers! Some have glass-floor graphs, others are a royal mess.) It's possible to have near-perfect beam chasing even with bad VSYNC jitter, but you need good de-jittering algorithms.

Example: MonoGame 3.6 VSYNC timestamps
For example, the portable C# game engine on which I base my raster demo -- MonoGame 3.6 (PC/Mac/Linux) -- has extremely bad VSYNC timestamp jitter, on the order of 5% of a refresh cycle -- usually worse than JavaScript (much worse than the graphs at the above link on an AMD/NVIDIA system)! A 1-to-2ms timestamp jitter equals as much as 1/8th of a screen height of random offsetting (out-of-sync emu-vs-real raster).

This is not a problem for plain VSYNC ON, since the buffering smooths it out perfectly, but 1ms of jitter is a mess for beam racing applications. It's solvable, though, by processing the history of timestamps & extrapolating a near-perfect VSYNC timestamp. It's not as simple as just averaging, since you've got missed frames and random floating average offsets (especially under varying background processing loads), so simple averaging creates a "floaty offset" effect.
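A minimal sketch of that idea in C# (my own illustration, not the demo's actual code; all names are hypothetical): keep a timestamp history, fold missed frames back in by dividing each interval by its rounded multiple of the period, reject outliers, and average the rest.

Code:

using System;
using System.Collections.Generic;
using System.Linq;

class VsyncDejitterer
{
    readonly List<double> timestamps = new List<double>(); // seconds

    public double Period { get; private set; } // estimated refresh period

    public void AddTimestamp(double t)
    {
        timestamps.Add(t);
        if (timestamps.Count < 10) return; // need some history first

        // Intervals between consecutive timestamps. A missed frame shows up
        // as ~2x or ~3x the period, so divide by the rounded multiple
        // instead of discarding the sample outright.
        var intervals = new List<double>();
        for (int i = 1; i < timestamps.Count; i++)
            intervals.Add(timestamps[i] - timestamps[i - 1]);

        double rough = Median(intervals);
        var cleaned = intervals
            .Select(dt => dt / Math.Max(1.0, Math.Round(dt / rough)))
            .Where(dt => Math.Abs(dt - rough) < 0.25 * rough) // outlier rejection
            .ToList();
        Period = cleaned.Count > 0 ? cleaned.Average() : rough;
    }

    // Extrapolate the next VSYNC from the newest timestamp.
    public double PredictNextVsync() => timestamps[timestamps.Count - 1] + Period;

    static double Median(List<double> xs)
    {
        var sorted = xs.OrderBy(x => x).ToList();
        return sorted[sorted.Count / 2];
    }
}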

So, essentially, you gotta dejitter better: good VSYNC timestamp dejittering algorithms for universally noisy VSYNC.

But you see, I am the inventor of the TestUFO tests (haven't seen them yet? Try all 30 tests in the pulldown menu at the top of http://www.testufo.com such as Persistence, Eye Tracking, Black Frame Insertion, Scan Skew, etc). And guess how I'm able to inform the user whenever a frame got skipped in TestUFO, to 95%+ accuracy in the Chrome browser? Because I'm doing heuristics on VSYNC timestamps in TestUFO. So does vsynctester.com, whom I often correspond with. And since browsers are cross-platform, these graphs have different shapes and patterns in different browsers & on different platforms.

Random VSYNC timestamps reveal a lot of information (refresh rate, frame dropping, etc) after some data processing -- enough for accurate dropped-frame prediction & accurate refresh rate measurement (to over three decimal digits) despite fuzzy/random/dropped VSYNC timestamps in a highly fluctuating scripting language. I see signals-in-the-noise big enough to predict whenever I think animations will stutter in a browser -- and to inform (invalidate) certain tests, such as the TestUFO frameskipping test, which requires perfect VSYNC or it's invalid. That's the world's most popular test for display overclockers who overclock the refresh rate of their displays; nobody else has succeeded at a reliable online test for monitor overclocking, and thousands of people make forum posts about that particular TestUFO test, so it's the gold-standard test for monitor overclocking.

Needless to say, it can get quite complicated to create a universal VSYNC timestamp dejitterer. But between myself & vsynctester.com, I think we've ended up with the world's best VSYNC timestamp dejittering algorithms, and I'm preparing to open-source at least one of them for the greater good of the beam racing world. This has cross-platform beam racing spinoff applications.
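For the refresh-rate part specifically, one way to get several decimal digits out of noisy, occasionally-dropped timestamps (a sketch of the general principle, not TestUFO's actual source) is to count whole refresh periods across a long baseline and divide; dropped frames only change the count, not the arithmetic.

Code:

using System;
using System.Collections.Generic;

static class RefreshRateEstimator
{
    // timestamps: seconds; roughPeriod: coarse estimate (e.g. median interval).
    public static double EstimateHz(IReadOnlyList<double> timestamps, double roughPeriod)
    {
        double elapsed = timestamps[timestamps.Count - 1] - timestamps[0];
        // Whole refresh cycles spanned -- rounding absorbs per-timestamp
        // jitter and any refreshes that never produced a timestamp.
        long refreshes = (long)Math.Round(elapsed / roughPeriod);
        return refreshes / elapsed; // long baseline -> high precision
    }
}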

I simply create two generic MonoGame instances in one app (a VSYNC OFF foreground thread for displaying frameslices, and a hidden VSYNC ON background thread for timestamping the VSYNC events) -- that's all I need for beam racing. The magic ingredient is VSYNC timestamp dejittering. And it works, as you can see in the YouTube video.
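In MonoGame terms, the hidden VSYNC ON listener could look roughly like this (a reconstruction under stated assumptions -- MonoGame 3.x API, timestamps in seconds -- not the actual demo source):

Code:

using System.Diagnostics;
using Microsoft.Xna.Framework;

// Hidden background Game instance: its only job is to produce one (noisy)
// VSYNC timestamp per refresh for the dejitterer; the VSYNC OFF foreground
// renderer consumes the dejittered results.
class VsyncListenerGame : Game
{
    readonly GraphicsDeviceManager graphics;
    readonly Stopwatch clock = Stopwatch.StartNew();

    public event System.Action<double> VsyncTimestamp; // seconds

    public VsyncListenerGame()
    {
        graphics = new GraphicsDeviceManager(this);
        graphics.SynchronizeWithVerticalRetrace = true; // VSYNC ON
        IsFixedTimeStep = false;                        // run at display rate
    }

    protected override void Draw(GameTime gameTime)
    {
        // Reaching Draw() means the previous frame's Present() unblocked on
        // VSYNC, so "now" approximates a VSYNC timestamp -- noisily.
        VsyncTimestamp?.Invoke(clock.Elapsed.TotalSeconds);
        base.Draw(gameTime);
    }
}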

Cross-platform VSYNC timestamp accuracy is often unexpectedly jittery, just like the TestUFO graphs
It's probably because the engine uses a power-efficient Thread.Sleep rather than busywaiting -- power saving and the like. Because of that, gameTime has noisy timestamps. Yet thanks to VSYNC dejittering, I'm successfully pulling off near-raster-exact beam racing in C# with ZERO Interop calls (no Win32 API calls). I want to enable hardware access wherever possible, but I want to make it optional -- platform-specific stuff should become cake frosting.

You're starting off from the easy stuff (Windows-only RasterStatus.ScanLine) while I'm working on the worst-case "calculation" scenario: a zero-access situation (no access to .INVBlank, no access to .ScanLine), with only an approximate, weak tap into a noisy external VSYNC callback event. Yes, I could call .INVBlank and .ScanLine with API calls, but I'm intentionally avoiding that during my current tests, as a mandatory cross-platform requirement.

If I can do reliable raster scan line prediction from noisy VSYNC timestamps, then I can pretty much beam race almost EVERYTHING without access to a raster register. Many other emulator authors want a cross-platform solution.

A beam racing playground coming
Once I validate in C# -- I'll release the "Hello World" beam chasing demo as an easy sandbox playground for any "Direct3D/OpenGL beam racing newbie" to become a Tearline Jedi.
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter


Calamity
Posts: 24
Joined: 17 Mar 2018, 10:36

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Calamity » 28 Mar 2018, 16:27

I've made some progress today. Now I can finally set an arbitrary number of slices, though I can't seem to get much higher than 20 slices on my current hardware. I'm also trying to implement an algorithm to figure out the right value of the vsync offset on the fly, with little success so far.

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 28 Mar 2018, 22:08

I think a VSYNC offset should probably be implemented as a user-adjustable slider while displaying panning motion. Slide the slider until tearing artifacts disappear. Done.

It looks like the VSYNC offset could initially default to ~1/4 of a frameslice height, since rasters stabilize better with tinier frameslices if GPU performance is sufficient. And because it's 2 frameslices of lag anyway, an extra 1/4 frameslice of lag isn't important, given 1000-frameslices-per-second operation.
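In numbers (my arithmetic, following the suggestion above): at 60 Hz with 16 frameslices per refresh (~1000 slices/sec), that default works out to roughly a quarter of a millisecond.

Code:

double refreshPeriod = 1.0 / 60.0;                        // 60 Hz -> ~16.7 ms
int framesliceCount = 16;                                 // example slice count
double framesliceTime = refreshPeriod / framesliceCount;  // ~1.04 ms per slice
double defaultVsyncOffset = framesliceTime / 4.0;         // ~0.26 ms default offset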
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter


Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 29 Mar 2018, 13:03

Update: Ars Technica has posted an Amiga article, so I've added two beam racing comments (an introductory post and a follow-up post), since Ars has a fairly high software-developer readership. This raises awareness of real-time beam racing a tiny bit, since Amiga-born programmers are likely reading too.

Looks like my introductory posts are currently being well-received there.
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter


Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 01 Apr 2018, 14:51

For detecting a VSYNC offset while de-jittering very jittery VSYNC (when trying to extrapolate rasters):
-- Early VSYNCs pretty much never happen (but outliers can put a few datapoints there, so filter those out)
-- Late VSYNCs push the timestamps later, creating larger VSYNC offsets for dejittered VSYNCs
-- A standard deviation formula actually works well for compensating for VSYNC offset problems
-- A jittery VSYNC listener can become less jittery / more jittery as computer performance fluctuates (e.g. like http://www.testufo.com/animation-time-graph when you open new tabs, browse a different window, etc). This can cause the VSYNC offset to "float downwards" as the average timestamp "line" shifts downwards with a noisier cloud of timestamps. One has to dynamically compensate for this; I'm adding new logic to my VSYNC timestamp de-jitterer.

It's a miracle to pull raster stability out of super-noisy, randomly-missed VSYNC timestamps that also have continually varying noise (standard deviation always changing with the varying levels of background processing).

I can achieve stabilities of <0.1/67500th (67.5 kHz is the horizontal scanrate of 1080p at 60Hz). Meaning, if I stop tracking VSYNC after about 10-to-20 seconds of VSYNC listening (which gives sufficient data to extrapolate many seconds into the future), my rasters stay stable and shift only 1 scanline every 1 to 30 seconds (varying with the quality of the earlier VSYNC timestamps). They stay scanline-exact (typically jittering up/down 1 scanline) as long as the VSYNC timestamps keep coming. I'm able to kill VSYNC listening and the rasters stay miraculously stable on extrapolated information alone, drifting 1 scanline every ~10 seconds (no VSYNC polls, no raster polls -- just memorized dejittered clock information!). So I'm able to resume rasters successfully after a momentary freeze (computer stall), merrily as if nothing happened. It's got very good missed-VSYNC-event tolerance now.
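The extrapolation itself is simple once the period and phase are dejittered; something like this sketch (names hypothetical) keeps answering "what scanline is the beam on?" indefinitely, with no further VSYNC or raster polls:

Code:

using System;

class RasterExtrapolator
{
    public double Period;       // dejittered refresh period, seconds
    public double LastVsync;    // dejittered timestamp of any known VSYNC
    public int TotalScanlines;  // vertical total incl. VBLANK, e.g. 1125 for 1080p

    // Works many refreshes after LastVsync -- drift depends only on how
    // accurate Period is, hence the ~1 scanline per N seconds figures above.
    public int ScanlineAt(double now)
    {
        double intoRefresh = (now - LastVsync) % Period;
        if (intoRefresh < 0) intoRefresh += Period;
        return (int)(intoRefresh / Period * TotalScanlines);
    }
}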

My biggest problem today is ultra-fast tracking of changes to the standard deviation (i.e. detecting realtime changes in the "noisiness" of the noisy VSYNC timestamps), so that I can do realtime VSYNC offset compensation and prevent rasters from floating upwards/downwards during noisiness changes. This is an even tougher mathematical problem than simple vsync-timestamp dejittering.
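A sliding-window standard deviation is one plausible building block for this (my sketch, assuming prediction errors are fed in once per refresh; not the actual de-jitterer):

Code:

using System;
using System.Collections.Generic;
using System.Linq;

class NoiseTracker
{
    readonly Queue<double> errors = new Queue<double>(); // timestamp minus prediction
    const int Window = 120;                              // ~2 seconds at 60 Hz

    public double StdDev { get; private set; }

    public void AddError(double error)
    {
        errors.Enqueue(error);
        if (errors.Count > Window) errors.Dequeue();
        double mean = errors.Average();
        StdDev = Math.Sqrt(errors.Average(e => (e - mean) * (e - mean)));
    }

    // VSYNC timestamps are one-sided (late, never early), so as the noise
    // grows, the average floats later; shift the dejittered phase earlier
    // by a noise-proportional amount to hold the rasters still.
    public double OffsetCompensation(double gain = 1.0) => gain * StdDev;
}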

Given no information about the refresh rate, it takes about 1 second to stabilize when the vsync timestamps are really random (20% jitter error -- e.g. 3-4ms randomization of 60Hz VSYNC timestamps). The picture rolls until the rasters lock into their scanline-exact positions as the RasterCalculator/VsyncCalculator de-jitters the randomized VSYNC timestamps (in a situation with no access to a hardware raster ScanLine API or register). Given prior knowledge of the refresh rate, startup is much quicker. The more random/jittery the VSYNC timestamps, the longer stabilization takes (it can be instantaneous for low-noise VSYNC timestamps, but takes about 1 second for really noisy ones).

Getting better would require calculus/algebra skills for improved grid-fitting of the VSYNC timestamp/interval dots to a line, but I'm trying to avoid that. I just visualize this using geometry instead: fit a straight line across a history of interval lengths (differences between consecutive VSYNC timestamps), reject outliers, then use a standard deviation formula to shift the line upwards to align better with the beginning of VSYNC / scan line #1 -- much simpler mathematics, no worse than high school level. But I still feel like I'm studying for an exam for the first time in a decade!
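The "shift the line upwards" step might look like this (my reconstruction of the geometric description, not the actual module; phase wraparound handling omitted for brevity):

Code:

using System;
using System.Collections.Generic;
using System.Linq;

static class PhaseFitter
{
    // Given a good period estimate, find where VSYNC begins within the cycle.
    public static double FitPhase(IReadOnlyList<double> timestamps, double period, double k = 1.0)
    {
        // Phase of each timestamp within its own refresh cycle.
        var phases = timestamps.Select(t => t % period).ToList();
        double mean = phases.Average();
        double sd = Math.Sqrt(phases.Average(p => (p - mean) * (p - mean)));
        // Timestamps are only ever late, so the mean sits above the true
        // VSYNC edge; subtract a standard-deviation-scaled correction to
        // re-align with the beginning of VSYNC / scan line #1.
        return mean - k * sd;
    }
}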

At least once I'm done, the fun part starts: adding optional hardware hooks for specific platforms (first Windows, then Mac), once I've got the cross-platform C# rasterdemo (using the MonoGame engine).

Then, once the RasterCalculator/VsyncCalculator modules are ported to C++, I think the code could become the gold-standard source code for raster-ScanLine-guessing in "no-access-to-raster-register" cross-platform situations.
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter


Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 01 Apr 2018, 22:28

An example of jittery timestamps versus dejittered timestamps:

(Both videos completely avoid RasterStatus.ScanLine, to show how accurately this can be done from noisy VSYNC timestamps alone.)

Original VSYNC timestamps:
[embedded video]

Dejittered VSYNC timestamps:
[embedded video]


(There are some other code changes, like UFO size, but the jitteriness shows the difference achievable with timestamp dejittering.)
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter


Calamity
Posts: 24
Joined: 17 Mar 2018, 10:36

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Calamity » 03 Apr 2018, 10:33

Your second video (de-jittered vsync) is truly amazing, especially considering how bad the actual vsync is, judging by your first video. I wouldn't have thought that was possible.

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 03 Apr 2018, 11:18

One moment, VSYNC events give me microsecond-accurate timestamps; the next, I'm running on a laptop that enters a new power management mode reducing certain timer precision to 1ms granularity, and boom -- frameslices are jittering all over the place (e.g. in the timestamps returned from MonoGame).

It even varies while the laptop runs (thermal throttling, timer precision changes) -- one moment the timestamps are accurate, the next moment they're jittery. So the art of raster-register-free rasters necessarily requires timestamp dejittering. This is important for cross-platform rasters via VSYNC OFF tearlines.

VSYNC events tend to occur at the end of a refresh cycle (the moment you enter VBLANK, not exit VBLANK) -- that's important knowledge for timing calculation.

Even then it's never 100% perfect, and I sometimes need to calibrate using an offset (both optional .TimeOffset and .ScanLineOffset are provided). MonoGame (the C# game engine) seems happy with ScanLineOffset == 0 (lines) and TimeOffset == 0.0004 (ms), and the rasters align scanline-exact or near-scanline-exact at any Hz/refresh rate/resolution, with the biggest offset distortion (about 5 scanlines) occurring on my Zisworks 4K 120Hz display, and only 1-2 scanlines of offset on my 1080p displays.
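Applying those two knobs (the .TimeOffset / .ScanLineOffset names are from the post above; the surrounding code is my illustration) to the VBLANK-entry fact might look like:

Code:

using System;

static class CalibratedRaster
{
    // lastVsync marks *entry into* VBLANK (end of the previous refresh cycle).
    public static int CurrentScanline(double now, double lastVsync, double period,
                                      int totalScanlines,
                                      double timeOffset, int scanLineOffset)
    {
        double t = (now - (lastVsync + timeOffset)) % period;
        if (t < 0) t += period;
        int line = (int)(t / period * totalScanlines) + scanLineOffset;
        return ((line % totalScanlines) + totalScanlines) % totalScanlines;
    }
}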
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter


Tommy
Posts: 6
Joined: 04 Apr 2018, 14:02

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Tommy » 04 Apr 2018, 15:24

Hi, sorry, another emulator author here, jumping in after a brief email discussion.

My emulator supports a handful of early-'80s machines, but does so with two relevant features:
  • all produce a real video signal, to talk to an emulated CRT. Video output looks like: sync level for n units of time; then this PCM data at this clock rate for m units of time; then this constant level for o units of time; etc. It's not "line based" or any other reductive term like that; it's a real video signal. Many of the machines batch up their PCM periods, but never as much as a whole line. For composite colour machines, the signal is a real NTSC signal that your GPU really decodes, etc;
  • it's predicated on on-demand emulation. The input is always that it's now time X, so would the emulator please catch up to then. Prompting events are currently either audio exhaustion (every 5–10ms, depending on your output rate -- it'd scale up to megahertz if your audio card did, but it permits itself some latency so as to cooperate nicely with power management) or vertical sync.
On the receiving end of the video signal is the emulated CRT. So it maintains an actual pretend flying spot, discerns horizontal and vertical syncs and attempts to phase-lock to them, and paints a pretend CRT surface.

Heresy here: I'm a lot more annoyed by lag and by motion aliasing than I am by blurring, so I prefer blurring to motion aliasing. Also, I'm originally British, and a Mac user who is not in any other way a gamer, which means that most of the time I want to play my 50Hz games on a fixed-60Hz display. So I have an approximation of phosphor decay in there -- which, if I were playing devil's advocate, I'd describe as the deliberate addition of blurring to avoid 50->60 stuttering. It also decreases input lag, though, so I call it a double win. But it's the opposite of Blur Busting. Damn me!

Also, because I'm painting the screen with a bunch of individual bits of geometry to represent each raster sweep, I get deinterlacing that isn't terrible for free: alternate frames sit at different vertical offsets because the machine signalled sync at a different time, and they're blended because I have pretend phosphors.

So, cool, that's me. Pulling around to relevancy:

1. I think the problem of syncing my pretend CRT to discerned incoming syncs is exactly the same problem as syncing to a real machine's vertical syncs based on a noisy trigger: in both cases we're talking about a phase-locked loop with a low-pass filter. So I might be interested in discussing that more closely. My solution mimics what I found in the CRT literature: a flywheel sync. It's always spinning, and it triggers the actual (pretend) flyback upon the completion of each metaphorical revolution. When it receives a sync trigger from the feeding signal, that prompts a comparison with what the flywheel already believes. The flywheel can't change phase, but it can change frequency: it shifts itself in the direction of the error, proportionally to the error, subject to a cap.

That's my specific IIR implementation of a low-pass-filtering phase-locked loop. You could just as easily go FIR; I'm using that approach for the pretend PLL that interprets flux signals when emulating a disk drive, and I have found it to be robust. There's quite a lot of literature in the Atari ST world on exactly how the WD177x implements a digital FIR PLL, none of which I have read. But it's on my mental list.
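In code, the flywheel described above might be sketched like this (C# for consistency with the rest of the thread; my paraphrase of the description, not the emulator's actual source; the gain and cap values are assumed):

Code:

class FlywheelSync
{
    double period;       // current revolution time, seconds
    double nextFlyback;  // when the flywheel next fires retrace

    public FlywheelSync(double nominalPeriod)
    {
        period = nominalPeriod;
        nextFlyback = nominalPeriod;
    }

    // Incoming signal asserts vertical sync at time t: compare with belief.
    public void OnSyncPulse(double t)
    {
        double error = t - nextFlyback;   // signal vs. flywheel
        double correction = 0.1 * error;  // proportional gain (assumed)
        double cap = 0.01 * period;       // cap: never slew too fast
        if (correction > cap) correction = cap;
        if (correction < -cap) correction = -cap;
        period += correction;             // frequency moves; phase never jumps
    }

    // Polled continuously; true when the (pretend) flyback fires.
    public bool Tick(double now)
    {
        if (now < nextFlyback) return false;
        nextFlyback += period;
        return true;
    }
}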

2. For any public library, I would advocate the same sort of event-based API. Ideal for me would be no more than giving you two callbacks and telling you to go, with you calling them periodically, those being:

Code:

outputUpTo(distance down display) -> amount of emulated time I expended to get there
setSpeedMultiplier(multiplier)
A low-level variant would allow me to announce whatever my OS is telling me about vertical syncs and, if available, specific line times, so that I could bind it to my OS of choice; but for anything mainstream, I'd dare imagine it's just a case of asking you to go and responding to the callback. You tell me where you think I should output down to, I tell you how much emulated time I took to get there.

That gives you enough information to pick any slicing strategy, to decide whether I'm close enough to in-sync with the display to justify beam racing at all, and to warn me if you're changing the meaning of time, so that I can adjust my expectations around the other time-based outputs, like audio.
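As a typed reading of that two-callback proposal (C# purely for illustration; the names follow the pseudocode above, everything else is my guess):

Code:

using System;

public interface IBeamRacedEmulator
{
    // "Catch up to this point down the display"; returns how much emulated
    // time was expended getting there.
    TimeSpan OutputUpTo(double distanceDownDisplay);

    // The library announces it is changing the meaning of time, so audio
    // and other time-based outputs can adjust their expectations.
    void SetSpeedMultiplier(double multiplier);
}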

Re: my emulator, I've a few platforms in there with programmable field timing, so if and when I get around to adapting this fantastic new idea in lag elimination, I'm actually going to need to up my game: sync monitoring on both sides, and making appropriate instantaneous decisions about emulated speed to line the two up, if and when the frequencies are appropriately close.

Also, mine is a proper native app, on the Mac at least, which means a common-enough use case is having multiple emulated machines all over your desktop; and something like 90% of Macs are laptops nowadays, so I'm not sure that a busy loop will be acceptable to most of my users. Which makes for a bunch of extra factors.

Sadly it's not priority one so I'm unlikely to act all that soon, but the new idea is really exciting and when I've done it once, it'll naturally flow into all of my current machines and any additional ones I implement from now on. So it will get done.

EDIT: and re: whether to handle 120Hz display of 60Hz content as a double-speed burst for the first frame followed by a repeat or a blank, or to abandon raster racing, I think I'd prefer the latter, because latency is my overriding concern. Yes, more than blur -- sorry! I'm emulating early-'80s machines, so the real experience would have been blurry but lag-free; preferring that is just trying to be accurate. Therefore I'd probably fall back on longer phosphors and not race the beam, just as you'd get if you used my emulator today.

EDIT2: oh, and audio latency too, if you tried to fit 60Hz into 120Hz by taking every other frame off. In summary, my perspective is that there are at least three latency factors at work here -- input, video and audio -- and I don't agree that any one trumps the other two.
