Emulator Developers: Lagless VSYNC ON Algorithm

Chief Blur Buster
Site Admin
Posts: 11647
Joined: 05 Dec 2013, 15:44
Location: Toronto / Hamilton, Ontario, Canada

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 04 Apr 2018, 16:36

Welcome Tommy!
You're the confirmed author of CLK (on GitHub), which emulates the Acorn Electron, Amstrad CPC, Atari 2600, ColecoVision, Commodore VIC-20, MSX 1, Oric and ZX80/81.

It's great that a third emulator author has joined this thread.

Your software is fantastic, so I'm looking forward to seeing all these ideas being cherry-picked for your emulator too.
Tommy wrote:When it receives a sync trigger from the feeding signal, that prompts a comparison with what the flywheel already believes. The flywheel can't change phase but it can change frequency. So it shifts itself in the direction of the error, proportionally to the error, subject to a cap.
Neat. Funnily, it sounds similar to the behaviour of my "NO RASTER REGISTER" raster guesser algorithm at startup. I'm bumping an interval estimate upwards or downwards, in a decaying fashion, based on which direction my error margin leans. If I don't give my algorithm a refresh rate (i.e. it has to make a wild guess at the refresh rate), my tearlines often roll 1 or 2 cycles for the first 1/2 to 1 second (like a VHOLD knob miss) and then suddenly stop rolling (stabilize) once a good history of timestamps is built. I guess it's a figurative digital version of a flywheel spinning up. If I have a genuine raster hook, this doesn't happen.
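For emulator authors who want to try the flywheel idea, here is a minimal Python sketch of a VSYNC-timestamp tracker. It is a hypothetical illustration, not code from CLK or Tearline Jedi, and the class name, gains and cap are assumptions. One deliberate substitution: a loop that only ever corrects frequency, stepped once per discrete VSYNC timestamp, does not settle on its own, so this sketch uses a standard alpha-beta tracker that nudges both the phase and (with a capped step, as in Tommy's description) the period toward each observed VSYNC.

```python
class RefreshFlywheel:
    """Hypothetical alpha-beta tracker for the display's VSYNC phase and period."""

    def __init__(self, initial_period_s, alpha=0.5, beta=0.1, max_step_s=0.0005):
        self.period = initial_period_s  # current refresh-period estimate (seconds)
        self.alpha = alpha              # fraction of the timing error applied to phase
        self.beta = beta                # fraction of the timing error applied to period
        self.max_step = max_step_s      # cap on any single period correction
        self.last_sync = None           # estimated time of the most recent VSYNC

    def on_vsync(self, t):
        """Feed one observed VSYNC timestamp; returns the updated period estimate."""
        if self.last_sync is None:
            self.last_sync = t  # first event: just start the flywheel
            return self.period
        # Whole refresh cycles since the last sync (tolerates a missed VSYNC event).
        cycles = max(1, round((t - self.last_sync) / self.period))
        predicted = self.last_sync + cycles * self.period
        error = t - predicted
        # Phase: move part of the way toward the observation.
        self.last_sync = predicted + self.alpha * error
        # Frequency: nudge proportionally to the per-cycle error, subject to a cap.
        step = self.beta * error / cycles
        self.period += max(-self.max_step, min(self.max_step, step))
        return self.period

    def raster_fraction(self, now):
        """Estimated scanout position (0.0 = top of refresh, 1.0 = next VSYNC)."""
        return ((now - self.last_sync) / self.period) % 1.0


if __name__ == "__main__":
    fw = RefreshFlywheel(initial_period_s=1 / 59)  # wild initial guess
    for k in range(300):
        fw.on_vsync(0.123 + k / 60)                # ideal 60 Hz timestamps
    print(f"{1 / fw.period:.3f} Hz")
```

Fed ideal 60 Hz timestamps from a 59 Hz starting guess, the period estimate should settle at 1/60 s within a few dozen VSYNCs; `raster_fraction()` then gives the kind of raster guess a beam racer needs when no raster register is available.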

I need to refine the VSYNC listener's startup stabilization more, but this will be an open source project soon (titled "Tearline Jedi Demo"), so other people may be able to stabilize the startup better. I'm prioritizing raster stability, but sometimes algorithms for continued stability are 100% contrary to the goal of quick sync (in under 500ms).
Tommy wrote:Also mine is a proper native app on the Mac at least, which makes it a common-enough use case to have multiple emulated machines all over your desktop, and something like 90% of Macs are laptops nowadays
Neat -- my main development machine is a Mac.
(It makes a good Mac + Windows + Linux machine too -- all three cakes in one!)
Though I program mainly in Windows, the beam racing stuff also works on Mac.
Fortunately, the concepts here are 100% cross platform -- fundamentally, VSYNC OFF tearlines are simply rasters.

Make sure you turn off Mac "BeamSync" to get beam racing working on a Mac -- you have to enable tearlines on a Mac before you get the ability to have incredible fun beam-racing the tearlines out of the way. I haven't begun doing that myself, but there's an API call for it...
Tommy wrote:I'm not sure that a busy loop will be acceptable to most of my users. Which makes for a bunch of extra factors
Hmmm....
Most of the time, my Mac laptop is plugged in, so I wouldn't care during those moments.

One possible solution? Maybe give the users a choice between....
-- Low-precision beam racing (4 frame slices can work with millisecond sleeping)
-- High-precision beam racing (10, 20+ frame slices, requires busy waiting).

If you use millisecond sleeping, though, you'll simply need to use larger chase margins/offsets (e.g. a 2ms or 3ms chase distance between real-raster and emu-raster). As long as you use the "forgiving method" (the full-refresh-cycle jitter margin technique), you can gently use millisecond sleepers with beam racing. Now that I think about it, I suppose adding 3-4ms of extra lag to save lots of battery power is worth it! With a refresh cycle being 16.7ms, that gives you lots of play margin to tolerate an imprecise sleep.
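A minimal sketch of that compromise, assuming Python's `time.perf_counter()` as the clock: sleep while the target is far away, then busy-wait only for the last stretch. The function name and the 2ms default are hypothetical; the deadline would come from your raster estimate minus whatever chase margin you choose.

```python
import time


def wait_until(deadline_s, spin_threshold_s=0.002):
    """Hypothetical hybrid waiter for beam racing (times from time.perf_counter()).

    Sleeps while the deadline is far away (cheap on power), then busy-waits
    through the final spin_threshold_s (precise). spin_threshold_s=0 gives a
    sleep-only "power-priority" mode; a threshold larger than a refresh cycle
    gives a busy-wait-only "precision-priority" mode.
    """
    while True:
        remaining = deadline_s - time.perf_counter()
        if remaining <= 0:
            return
        if remaining > spin_threshold_s:
            # Coarse sleep; it may overshoot by the OS timer granularity,
            # which is why sleep-only mode needs a larger chase margin.
            time.sleep(remaining - spin_threshold_s)
        # Otherwise loop straight back and re-read the clock (busy-wait).


if __name__ == "__main__":
    start = time.perf_counter()
    wait_until(start + 0.005)                       # hybrid mode
    wait_until(start + 0.010, spin_threshold_s=0)   # sleep-only mode
    late_ms = (time.perf_counter() - (start + 0.010)) * 1000
    print(f"sleep-only wake-up was {late_ms:.3f} ms late")
```

Setting `spin_threshold_s=0` is the battery-friendly extreme: the wake-up error then lands wherever the OS timer puts it, which is exactly what the larger chase margin has to absorb.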

I am doing some research on more power-efficient sleep methods. Some platforms have a very precise "sleep-until" technique that is almost microsecond-accurate. I've not quite unlocked this feature, but what seems to be happening is that some platforms have a microsecond-accurate nanosleep(). It isn't always that accurate, but it apparently gets really accurate on at least one of my systems! Unfortunately, not all platforms have such accurate microsecond sleeping.

If you gain access to 0.1ms sleep, the good news is that 0.1ms is good enough for 10-frameslice operation. You might need ~0.01ms accuracy for 100-frameslice-per-refresh-cycle operation with a tight beam-racing margin. The determining factor is the horizontal scan rate, since it sets how many scanlines a given timing error spans: 1080p at 60Hz scans at 67.5 kHz (about 14.8 µs per scanline), and 1080p at 120Hz scans at roughly double that.
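The arithmetic behind those numbers, as a small Python sketch. It assumes the standard 1080p vertical total of 1125 lines (active lines plus VBI); substitute the real mode timings of your display.

```python
def slice_timing(refresh_hz, total_lines, slices):
    """Return (ms per frameslice, horizontal scan rate in kHz, µs per scanline).

    total_lines is the vertical total *including* the VBI, e.g. 1125 for the
    standard 1080p timing; read it from your real display mode.
    """
    refresh_ms = 1000.0 / refresh_hz
    scan_khz = refresh_hz * total_lines / 1000.0
    return refresh_ms / slices, scan_khz, 1000.0 / scan_khz


def sleep_error_in_scanlines(error_ms, refresh_hz, total_lines):
    """How many scanlines the real raster moves during a given timing error."""
    return error_ms * refresh_hz * total_lines / 1000.0


if __name__ == "__main__":
    for hz in (60, 120):
        per_slice, khz, us = slice_timing(hz, 1125, 10)
        err = sleep_error_in_scanlines(0.1, hz, 1125)
        print(f"{hz} Hz: {khz:.1f} kHz, {us:.2f} us/line, "
              f"10 slices = {per_slice:.2f} ms each, 0.1 ms error = {err:.2f} lines")
```

At 60Hz, a 0.1ms timing error moves the real raster about 6.75 scanlines: small next to a 10-slice band of over 100 scanlines, but a large chunk of a 100-slice band of about 11 scanlines.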

Perhaps I was too hasty in recommending only busywaits; I did so because it's the only 100% reliable way to do it on all platforms (slow laptops included), a "trust it and forget it" mechanism. But if you're targeting Mac only, I think there's a Mac microsleeper (thanks to Apple's religious approach to power management and all), but I haven't researched that far yet. Macs are (usually) predictable & consistent, so if you find an Apple sub-millisecond microsleeper, it probably works on all or almost all of them. I still need to figure that out.

You could simply have a toggle for power-priority versus precision-priority. On some platforms they're the same (thanks to a practically microsecond-accurate sleeper), but on others they diverge (using the busywait method in precision-priority mode).
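One hypothetical way to drive that toggle automatically: measure how late the sleeper actually wakes up, then pick the largest frameslice count whose slice duration comfortably exceeds that jitter. The safety factor of 4 and the cap of 100 slices are assumptions, not tested values.

```python
import time


def measure_sleep_overshoot_ms(request_ms=1.0, samples=20):
    """Worst observed lateness of time.sleep() on this machine, in ms."""
    worst = 0.0
    for _ in range(samples):
        t0 = time.perf_counter()
        time.sleep(request_ms / 1000.0)
        late = (time.perf_counter() - t0) * 1000.0 - request_ms
        worst = max(worst, late)
    return worst


def choose_slice_count(refresh_hz, jitter_ms, safety=4.0, max_slices=100):
    """Hypothetical heuristic: the most frameslices per refresh whose duration
    is still at least `safety` times the measured wake-up jitter."""
    if jitter_ms <= 0:
        return max_slices
    refresh_ms = 1000.0 / refresh_hz
    return max(1, min(max_slices, int(refresh_ms // (jitter_ms * safety))))


if __name__ == "__main__":
    jitter = measure_sleep_overshoot_ms()
    print(f"sleep jitter {jitter:.3f} ms -> {choose_slice_count(60, jitter)} slices at 60 Hz")
```

With 1ms of sleep jitter at 60Hz this picks 4 slices, matching the low-precision option described earlier; with 0.1ms it picks 41.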
Tommy wrote:EDIT: and re: whether to handle 120Hz display of 60Hz as a double-speed burst for the first frame followed by a repeat or a blank, or to abandon raster racing, I think I'd prefer the latter because latency is my overriding concern.
Einstein is relative: it doesn't add audio latency relative to the joystick button. That actually decreases audio latency slightly, b
Tommy wrote:I'm emulating early-'80s machines so the real experience would have been blurry but lag free
You must mean blurry in the spatial dimension (CRTs) -- rather than blurry in the temporal dimension (CRTs were blur-free for 60 frames per second).

For readers familiar with motion blur reduction strobe backlights: non-strobed 60fps@120Hz doesn't reduce motion blur unless you add black frame insertion. You can still do 60Hz with traditional strobing (à la BenQ XL2720Z), but most gaming monitors only offer CRT-like motion clarity at 120Hz because they don't want to expose end users to painful 60Hz flicker.

While everyone has their legitimate goals & preferences that I certainly respect..... for me, my motto is to give users a choice of which faithfulness priorities to target. That's sometimes useful given that ULMB can actually make emulators look more faithful to original CRTs, but ULMB adds a slight amount of lag. But so do HLSL shaders and fuzzy-scanline renderers! More frametime lag means you're adding lag to become more faithful in a different area, in a pick-your-poison way. Einstein is relative -- give up faithfulness in one area (lag) to improve faithfulness in another area (looks).
Tommy wrote:EDIT2: oh, and audio latency too if you tried to fit 60Hz to 120Hz by taking every other frame off. I think that, in summary, my perspective is that there are at least three latency factors at work here: input, video and audio, and I don't agree that any one trumps the other two.
Agreed. User needs choice.

But audio lag doesn't increase. It's visual lag being reduced (by the ultra-fast-scanout) to the point where audio may feel lag relative to visual stimuli.

However, absolute audio lag never gets longer between joystick FIRE button and the audio. It's just photons hitting eyes sooner.

Yes, you're having to buffer and dejitter the audio, but the absolute time between the joystick FIRE button and the audio stimulus never, ever gets longer, right? Yep. Now you get it -- the apparent audio lag exists only because photons hit the eyes sooner, thanks to the fast-scanout cheat. :D
But that's fixable (see below)

Remember that some TVs have unavoidable input lag, so speeding up frame delivery can partly compensate for a laggy TV and be more faithful (lag-wise). The fast-scanout method can help some laggier 120Hz-compatible TVs overcome television buffer lag a little bit; 120Hz doesn't always mean lower absolute lag. So you can use fast-scan beam racing to compensate for the television's handicap and more closely align lag with the original machine. The cheat compensates for the handicapped television: faithfulness restored, and the audio realigns itself to the display-electronics-lagged photons (no audio lag!).

As Einstein says, it is all relative -- so personally, my approach is simply to be 100% compatible with all scanout velocities, if possible. That gives the widest flexibility to remain faithful to the original machine.

This flexibility also covers unexpected future things thrown in our direction, e.g. a laggy 120Hz display that has a buffer on the monitor side, or a laggy 60Hz display compatible with HDMI 2.1 Quick Frame Transport (where the new HDMI spec creates a fast-scanout "60Hz" cycle to help overcome other things like HDR display panel lag...). Fixed-Hz 60Hz modes aren't always slow-scanout; I've sometimes seen 60Hz signals with a big VBI too.

Tomorrow, say in the year 2020, you might use HDR to improve electron gun emulation (e.g. fringing artifacts, or making shadowmask dots brighter on blacker backgrounds), but find that HDR sometimes adds lag. Then you'll possibly be reconsidering the scanout velocity problem and suddenly find yourself facing 60Hz HDMI 2.1 Quick Frame Transport (QFT), which is basically a 60Hz signal with a humongous VBI.

Some MAME arcade cabinet makers use a 31.5 kHz VGA CRT to simulate 15.7 kHz NTSC scanout by doing double-refresh (120Hz 240p works fine on a 31.5 kHz VGA-only CRT). That creates a double-image effect during 60fps scrolling, as explained by a diagram in my 1000Hz journey article. A few years back, software black frame insertion was added as a GroovyMAME patch on my advice (Calamity here in these forums would probably remember...), originally to improve LightBoost displays.

Sometime after that, someone clever suggested enabling software-based black frame insertion with the arcade CRT, to black out the 2nd (repeat) refresh cycle and eliminate the double-image effect. Very clever. Voilà! Now it looks just like a perfect 15.7 kHz NTSC CRT, even though the VGA CRT can only do 31.5 kHz.
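The BFI trick boils down to a per-frame schedule of display refreshes. A minimal hypothetical sketch (the names are illustrative, not GroovyMAME's implementation):

```python
def bfi_schedule(display_hz, content_hz, visible_refreshes=1):
    """Which display refreshes show the emulated frame and which are blacked out.

    Hypothetical helper: e.g. 240p@60 content double-scanned at 120Hz on a
    31.5 kHz CRT shows the frame once and blacks out the repeat refresh.
    """
    if display_hz % content_hz:
        raise ValueError("display rate must be a whole multiple of the content rate")
    repeats = display_hz // content_hz
    if not 1 <= visible_refreshes <= repeats:
        raise ValueError("visible_refreshes must be between 1 and the repeat count")
    return ["frame"] * visible_refreshes + ["black"] * (repeats - visible_refreshes)


if __name__ == "__main__":
    print(bfi_schedule(120, 60))  # ['frame', 'black']
```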

But it's a fast-scanout signal (1/120sec). Yup. Gotta beam race that fast scanout if you want better "original-lag" faithfulness. Having audio 4ms behind video is the lesser evil compared to audio 30ms ahead of video... Isn't that more faithful indeed?
The audio lag never worsens relative to joystick input.

Now, say, you combine software BFI with fast beam racing to allow low-lag 15.7 kHz emulation on a 31.5 kHz CRT -- you've simply reduced video lag below that of the original machine because of the fast scanout. Yep. Video lag less than the original arcade machine! That means audio is slightly lagged relative to the now too-fast video. But that's fixable, see below:

TIP: You can calibrate the chase distance between emuraster + realraster if you want a "video delay" :D :D :D :D ...
Nitpick fixed, eh? Could be a slider in an on-screen popup menu in your emulator.

(Remember, when rasterplotting on top of the existing emulator framebuffer without clearing between emulator refresh cycles, we have a jitter margin of a full refresh cycle minus one frameslice. That gives you an optional "video delay" adjustment just by adjusting the chase margin between realraster + emuraster. It's fully wraparound; the chase margin varies depending on variables, but it can provide up to ~16ms of video delay adjustment.)
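A sketch of that wraparound chase, assuming the emulator raster runs ahead of the real raster by a user-chosen distance that doubles as the video delay. The function and parameter names are hypothetical; `real_raster_frac` would come from whatever raster estimate your platform provides (a raster register or a VSYNC-timestamp guess).

```python
def emu_raster_target(real_raster_frac, chase_ms, refresh_hz, slices, guard_slices=1):
    """Where the emulator raster should be, as a fraction (0..1) of the frame.

    real_raster_frac: estimated real scanout position (0 = top, 1 = next VSYNC).
    chase_ms: how far the emulator raster runs ahead of the real raster; this
    distance is also the added video delay. It is clamped to the forgiving
    window of [guard_slices, slices - guard_slices] frameslices and wraps
    around the refresh cycle.
    """
    refresh_ms = 1000.0 / refresh_hz
    slice_ms = refresh_ms / slices
    chase_ms = min(max(chase_ms, guard_slices * slice_ms),
                   refresh_ms - guard_slices * slice_ms)
    return (real_raster_frac + chase_ms / refresh_ms) % 1.0


if __name__ == "__main__":
    # 60 Hz, 10 slices, user slider set to 4 ms of video delay.
    for real in (0.0, 0.5, 0.9):
        print(real, round(emu_raster_target(real, 4.0, 60, 10), 3))
```

At 60Hz with 10 slices and one guard slice, the usable range is about 1.67ms to 15ms of delay; more slices widen it toward a full refresh cycle.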

So sometimes, deciding to be compatible with 1/120sec scanout actually improves faithfulness in many ways (as a user-configurable choice) when imperfect hardware is thrown in your direction. (Convinced yet? ;) ...)

No worries -- if you only ever do 60Hz slow-scan beam racing, all good -- that's the most faithful way to do it. I agree. (We all have our own approaches to giving users a choice of abusing settings & modes to reproduce certain faithfulness aspects.)
From what you said, it should be easy to add beam racing to your specific emulator.

I'm looking forward to seeing more emulators implement beam racing techniques!
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter

Forum Rules wrote:  1. Rule #1: Be Nice. This is published forum rule #1. Even To Newbies & People You Disagree With!
  2. Please report rule violations If you see a post that violates forum rules, then report the post.
  3. ALWAYS respect indie testers here. See how indies are bootstrapping Blur Busters research!


Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 04 Apr 2018, 16:47

Tommy wrote:My emulator supports a handful of early-'80s machines, but does so with two relevant features:
  • all produce a real video signal, to talk to an emulated CRT. Video output looks like sync level for n units of time; then this PCM data at this clock rate for m units of time; then this constant level for o units of time; etc. It's not "line based" or any other reductive term like that, it's a real video signal. Many of the machines batch up their PCM periods but it's never as much as a whole line. For composite colour machines, the signal is a real NTSC signal that your GPU really decodes, etc;
  • it's built around on-demand emulation. The input is always that it's now time X, so would the emulator please catch up to then. Prompting events are currently either audio exhaustion (every 5–10ms, depending on your output rate; it'd scale up to megahertz if your audio card did, but permits itself some latency so as to cooperate nicely with power management) or vertical sync.
On the receiving end of the video signal is the emulated CRT. So it maintains an actual pretend flying spot, discerns horizontal and vertical syncs and attempts to phase-lock to them, and paints a pretend CRT surface.
Incredible stuff indeed!

With that, I bet you do composite-color-artifacts emulation too, accurately emulating composite artifacts. The screenshots of your emulators seem to have correct composite color artifacts, so hats off to you, I didn't know you were emulating a CRT at this fine a granularity!
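Tommy's description of video output as timed segments (sync, PCM data at a clock rate, constant levels) rather than whole lines maps naturally onto a few small types. This is a hypothetical Python sketch of such a signal description; the class names and the `clock_divider` representation of the data rate are assumptions, not CLK's actual interfaces.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Sync:
    duration: int       # time units at sync level


@dataclass
class Level:
    duration: int       # time units at a constant level (e.g. blank)
    level: float


@dataclass
class Data:
    samples: bytes      # PCM data in whatever format the machine produces
    clock_divider: int  # time units per sample (the "clock rate")

    @property
    def duration(self) -> int:
        return len(self.samples) * self.clock_divider


Segment = Union[Sync, Level, Data]


def total_duration(segments: List[Segment]) -> int:
    """Total time covered by a run of signal segments."""
    return sum(s.duration for s in segments)


if __name__ == "__main__":
    line = [Sync(4), Level(2, 0.3), Data(bytes(10), 2), Level(3, 0.3)]
    print(total_duration(line))
```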

That means that if any future PC emulator (DOSBox?) implements your CRT emulator module, it could even become fully compatible with the 8088 MPH demo, which runs on a 1981 IBM PC 5150.


1024 colors on a CGA adapter, all achieved by abusing text-mode composite color artifacts. This won Revision 2015, pulling off 1024 colors on a 4-color CGA adapter from 1981... Only a true CRT emulator like yours (with colorburst etc.) would be able to properly reproduce the colors in this demo.


Just incredible stuff, cherrypicking known composite color artifacts of a text mode to generate a brand new CGA graphics "mode" with 1024 pixel colors! Tommy, this is the stuff only people like YOU can understand -- having written a CRT emulator that actually understands the NTSC colorburst.

-------

About beam-racing a simulated curved CRT / HLSL:

Due to the very forgiving full-refresh-cycle jitter margin, it is possible to beam-race an HLSL or CRT emulation, even simulated curved CRTs or other geometry distortion shaders, etc.

You simply make sure your emulated CRT beam is always scanning in a different frameslice from the one the realraster is currently in.

So you might, for example, use a 2-frameslice margin, because curved CRT emulation or fuzzy-lines simulation often has lines that overlap 2 adjacent VSYNC OFF frameslices, or a "2ms time offset" between emuraster + realraster. That ensures that no tearing occurs at the top (topmost) edges of any curved emulated scanlines or fuzzy-blended scanlines you draw.

It's all beamraceable -- it just simply means your jitter margin is one full refresh cycle minus 2 frame slices, instead of one full refresh cycle minus 1 frame slice.
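Generalizing the 2-frameslice rule: if curvature or blur lets an emulated scanline reach some number of real display lines above its nominal position, the guard margin needs one extra slice per slice-height of reach. A hypothetical sketch (the reach value has to come from your own shader geometry):

```python
import math


def guard_slices_needed(slices, screen_lines, reach_above_lines):
    """Frameslices of chase margin so a drawn scanline never touches the slice
    being scanned out.

    reach_above_lines: hypothetical worst-case number of real display lines an
    emulated scanline can extend above its nominal position once curvature or
    blur is applied (0 for crisp, flat scanlines).
    """
    slice_lines = screen_lines / slices
    return 1 + math.ceil(reach_above_lines / slice_lines)


def jitter_margin_ms(refresh_hz, slices, guard):
    """Usable wraparound jitter margin: one refresh minus the guard slices."""
    refresh_ms = 1000.0 / refresh_hz
    return refresh_ms - guard * refresh_ms / slices


if __name__ == "__main__":
    guard = guard_slices_needed(10, 1080, 3)
    print(guard, round(jitter_margin_ms(60, 10, guard), 2))
```

For 10 slices on a 1080-line screen, anything from 1 to 108 lines of reach gives a 2-slice guard, and the usable jitter margin at 60Hz drops to about 13.3ms.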

Tommy
Posts: 6
Joined: 04 Apr 2018, 14:02

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Tommy » 04 Apr 2018, 19:19

Just to close the digression: my CRT is probably about 10 lines shy of being able to handle 8088MPH because the currently-implemented machines just explicitly say "output a colour burst, amplitude x phase y for n units of time", and if the CRT sees that metainformation anywhere in the correct window it obeys it.

It has a test in there that says that if the window is reached and only PCM data is forthcoming then it should apply a PLL to find the colour burst without being told, but doesn't actually yet attempt to do so because it can't actually inspect the PCM data. It's provided in any format the machine likes with a GLSL fragment to map it to a composite stream, but right now that's all opaque to the CPU.

But that's the only missing step; proper colours would appear if I corrected those shortcomings. All the other machines do indeed show the proper composite artefacts, whether the rainbow effect for in-phase machines like the Atari and ColecoVision or chroma crawl for machines that are closer to the proper standards like the Electron or the Oric.

Anyway, sorry for the aside.


Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 04 Apr 2018, 19:25

Tommy wrote:But that's the only missing step; proper colours would appear if I corrected those shortcomings. All the other machines do indeed show the proper composite artefacts, whether the rainbow effect for in-phase machines like the Atari and ColecoVision or chroma crawl for machines that are closer to the proper standards like the Electron or the Oric.
I'm amazed that you emulate NTSC that accurately. Impressive stuff.

No apologies needed, it's all welcome side discussion.... CRT emulation is a fascinating offshoot of the beam racing topic. While not fully in scope of the beam racing software tricks, it's well within the scope of at least understanding the mere concept of beam racing, which was born out of necessity in the CRT days...

Brainstorm Moment! Oh and one can theoretically emulate phosphor fade using 4 subsequent refresh cycles (60fps at 240Hz), as a more advanced CRT-emulation-"blend-o-bfi" at the electron gun / 240Hz granularity. Too bad 240Hz monitors are TN panels and won't look very good, especially with checkerboard inversion artifacts (a property of TN panels). But if 240Hz+ OLEDs come in a few years, then, let's revisit this idea, shall we....
Hmmmmmm..... This may also be useful for future 1000Hz displays to emulate a CRT even more accurately via fuzzy-blended frame slices (like a high speed video of a CRT, but played in realtime). Basically beam-racing at full-refresh-cycle granularity, using one fuzzy band per 1000Hz refresh cycle and completely blacking the rest of the screen, perhaps with phosphor fade chasing behind! True advanced realtime CRT phosphor-fade emulation at 1000Hz granularity, imagine that! Refresh-blending algorithms work better with more refresh cycles too (e.g. mapping 50Hz onto 240Hz) -- I do some frame blending tricks in the http://www.testufo.com/vrr demo of variable refresh rates (emulating variable refresh rate motion on a fixed-Hz display).
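For the 60fps-at-240Hz brainstorm, one hypothetical way to split an emulated frame across its four display refreshes is to weight them by an exponential phosphor decay. The exponential model and the `decay_ms` parameter are assumptions for illustration; real phosphors have their own decay curves.

```python
import math


def phosphor_subframe_weights(display_hz, content_hz, decay_ms):
    """Relative brightness of each display refresh within one emulated frame.

    Hypothetical model: exponential phosphor decay with time constant decay_ms.
    Weights are normalized to sum to 1; scale them to whatever brightness
    budget the display actually allows.
    """
    if display_hz % content_hz:
        raise ValueError("display rate must be a whole multiple of the content rate")
    refreshes = display_hz // content_hz
    refresh_ms = 1000.0 / display_hz
    raw = [math.exp(-i * refresh_ms / decay_ms) for i in range(refreshes)]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    print([round(w, 3) for w in phosphor_subframe_weights(240, 60, 2.0)])
```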

No matter, I'm even getting sidetracked myself. ;)

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 04 Apr 2018, 20:02

UPDATE (for all readers), useful tip posted in WinUAE forums by mark_k. Could increase frameslice throughput.

Right now, VSYNC OFF frame-slice beam racing is mostly bottlenecked by memory bandwidth.

That said, we do care about the rest of the display if we want a huge beam-racing jitter margin (and/or a video delay adjustment). What would be ideal is reusing the existing VSYNC OFF framebuffers and just blitting a new frameslice into the existing GPU buffers without clearing them first.
mark_k wrote:DXGI 1.2 supports partial presentation, see Enhancing presentation with the flip model, dirty rectangles, and scrolled areas on MSDN. With that you'd use DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL or DXGI_SWAP_EFFECT_SEQUENTIAL and specify the dirty rectangle to be the just-rendered strip. (DXGI_SWAP_EFFECT_DISCARD might work as well, if Windows allows it. Since we don't care what's in the rest of the display, just the current strip.)

Also, could partial texture updates help? Use ID3D11DeviceContext::UpdateSubresource() to partially update the texture? Or maybe use multiple textures, one for each strip. Before each Present() you'd only update one texture.

Is there any way to visualise how much GPU bandwidth is being used, in real-time? (That can co-exist on-screen with WinUAE preferably.)
Interesting tip. I'll have to research whether this will increase frameslice throughput enough to achieve single-scanline-height frameslices. :D

At 7000 frameslices per second at 2560x1440 on my computer, that's up to 77 gigabytes per second of memory throughput (24-bit framebuffers being blitted over and over). Memory bandwidth is the bottleneck for VSYNC OFF based beam racing, but it can apparently be avoided, which would also increase frameslice throughput!
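The bandwidth figure checks out: 2560 × 1440 pixels × 3 bytes × 7000 slices per second is about 77.4 GB/s. A small sketch of the calculation, with an optional fraction for redrawing only part of the frame per slice (the 100-slices-per-refresh comparison in the usage lines is hypothetical):

```python
def blit_bandwidth_gb_s(width, height, bytes_per_pixel, slices_per_s, fraction_redrawn=1.0):
    """Memory traffic in GB/s if each frameslice redraws `fraction_redrawn` of the frame."""
    return width * height * bytes_per_pixel * slices_per_s * fraction_redrawn / 1e9


if __name__ == "__main__":
    full = blit_bandwidth_gb_s(2560, 1440, 3, 7000)
    # Hypothetical: 100 slices per refresh, redrawing only two slices each time.
    partial = blit_bandwidth_gb_s(2560, 1440, 3, 7000, fraction_redrawn=2 / 100)
    print(f"full redraw {full:.1f} GB/s, two-slice redraw {partial:.2f} GB/s")
```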

I've had success avoiding that memory bandwidth (to an extent) simply by not pre-clearing the framebuffer before rendering into it. VSYNC OFF still internally uses 2 different frame buffers, alternating between them on each flip: according to my tests, when I leave my buffer uncleared and keep flipping, pre-existing buffer junk alternates between the two buffers between the tearlines. But this is a beneficial behaviour that can be piggybacked upon for more memory-bandwidth-optimized beam racing.

To make beam racing compatible with that, one could skip pre-clearing the buffers and blit only the last two emulator frameslices. That covers both of the uncleared framebuffers that swap back and forth during VSYNC OFF. (The back buffer & the front buffer swap back and forth: once I Present(), my frame buffer suddenly has the contents of the PREVIOUS graphics, which means back<->front swaps happen during VSYNC OFF.) That said, there's no guarantee that all VSYNC OFF implementations universally behave this way, though it's true in the vast majority of cases I've seen (I've seen this behavior on Mac too). This is probably best done as a configuration option, a "memory-bandwidth-reducing" setting. It will help make beam racing more compatible with laptop GPUs running off non-VRAM memory.

We need to reuse the whole emulator frame buffer ideally, if we want a full refresh cycle of jitter safety margin (e.g. variable distance between emulator raster & real raster). But the wonderful thing is that the frame buffer already has the graphics, so we'd only need to blit 2 frameslices (current frame slice and 1 frameslice ago) to update the reused buffer to the state we want! No clearing, no redrawing of the rest of the framebuffer (this will be a bit more difficult for fuzzylines/HLSL kind of stuff, but not impossible)...
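A toy model of why two slices are enough, assuming (as observed, not guaranteed by any API) exactly two buffers that alternate on every Present() and keep their previous contents. It tracks which emulated frame last wrote each frameslice in each buffer and compares every presented buffer against a full redraw.

```python
def simulate_partial_blits(slices, frames, redraw_previous=True):
    """Toy model of reusing two persistent VSYNC OFF buffers without clearing.

    Each buffer entry records which emulated frame last drew that frameslice.
    Assumes exactly two buffers that alternate on every Present() and keep
    their old contents (observed behaviour, not an API guarantee).
    Returns True if every presented buffer matched a full redraw.
    """
    buffers = [[-1] * slices, [-1] * slices]
    ideal = [-1] * slices  # what a full redraw of the emulator framebuffer holds
    present_count = 0
    for frame in range(frames):
        for s in range(slices):
            ideal[s] = frame                   # the emulator just finished slice s
            back = buffers[present_count % 2]  # buffer we are about to present
            back[s] = frame                    # blit the current frameslice
            if redraw_previous:
                prev = (s - 1) % slices        # wraps to the last slice at s == 0
                back[prev] = ideal[prev]       # slice the *other* buffer got last time
            if back != ideal:
                return False
            present_count += 1                 # Present(): buffers trade places
    return True


if __name__ == "__main__":
    print(simulate_partial_blits(8, 3), simulate_partial_blits(8, 3, redraw_previous=False))
```

With `redraw_previous=False` the check fails on the second present, because the buffer coming back around missed the slice that was blitted into the other buffer.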

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Tommy » 05 Apr 2018, 08:21

I don't know how relevant or helpful it is, but on embedded GPUs like those in phones and Intel's various solutions it's often more expensive not to clear a target buffer upon first binding it. The reason is that if you clear it then that puts it into a known valid state without having to restore it from anywhere; if you don't clear it then to be API compliant the previous state has to be restored from a slower memory pool.

Under OpenGL ES 3.0+ and OpenGL 4.3+ you should actually go one better than that and use glInvalidateFramebuffer at the end of each use of a buffer, before you select another. That signals you're fine with its contents being in an undefined state from then on, so you don't just save the cost of restoring it later, but also of storing it in the first place.

GL_EXT_discard_framebuffer is defined to do the same thing under ES 2; I'm not sure whether the Raspberry Pi provides it. Similarly, I think it's missing from the original WebGL specification but arrives as WebGL2RenderingContext.invalidateFramebuffer().

EDIT: whoops, somehow missed replying to the below entirely!
Chief Blur Buster wrote:
Tommy wrote:I'm not sure that a busy loop will be acceptable to most of my users. Which makes for a bunch of extra factors
Hmmm....
Most of the time, my Mac laptop is plugged in, so I wouldn't care during those moments.
Battery life is one issue, but some Macs have very noisy fans so I think the switch from there being no fan running to the fan blasting at full pelt is a concern. Especially if you're using the built-in speakers.
Chief Blur Buster wrote:
One possible solution? Maybe give the users a choice between....
-- Low-precision beam racing (4 frame slices can work with millisecond sleeping)
-- High-precision beam racing (10, 20+ frame slices, requires busy waiting).

If you use millisecond sleeping, though, you'll simply need to use larger chase margins/offsets (e.g. a 2ms or 3ms chase distance between real-raster and emu-raster). As long as you use the "forgiving method" (the full-refresh-cycle jitter margin technique), you can gently use millisecond sleepers with beam racing. Now that I think about it, I suppose adding 3-4ms of extra lag to save lots of battery power is worth it! With a refresh cycle being 16.7ms, that gives you lots of play margin to tolerate an imprecise sleep.
Agreed; it's not an insurmountable obstacle, just an opinion on one way in which a user might actually be less happy with raster chasing. It'd be interesting if there were a graph of something like amount of latency versus percentage of people who are bothered, which I'm sure wouldn't reach zero until you got to 0ms, but I guess we're just going to have to use gut instinct. That being the case, I think that improving from 16.7ms to 3–4ms would indeed be a huge win, and probably good enough for at least 90% of the people for whom 16.7ms isn't already good enough.
Chief Blur Buster wrote:
Tommy wrote:I'm emulating early-'80s machines so the real experience would have been blurry but lag free
You must mean blurry in the spatial dimension (CRTs) -- rather than blurry in the temporal dimension (CRTs were blur-free for 60 frames per second).

For readers familiar with motion blur reduction strobe backlights: non-strobed 60fps@120Hz doesn't reduce motion blur unless you add black frame insertion. You can still do 60Hz with traditional strobing (à la BenQ XL2720Z), but most gaming monitors only offer CRT-like motion clarity at 120Hz because they don't want to expose end users to painful 60Hz flicker.

While everyone has their legitimate goals & preferences that I certainly respect..... for me, my motto is to give users a choice of which faithfulness priorities to target. That's sometimes useful given that ULMB can actually make emulators look more faithful to original CRTs, but ULMB adds a slight amount of lag. But so do HLSL shaders and fuzzy-scanline renderers! More frametime lag means you're adding lag to become more faithful in a different area, in a pick-your-poison way. Einstein is relative -- give up faithfulness in one area (lag) to improve faithfulness in another area (looks).
I am aware that if you filmed a real CRT at 1000fps or some other very high number, what you'd actually see is only some tiny portion of the screen lit at any one time, but lit very brightly so as to create persistence of vision in the receiver. So in that sense there's no temporal blurring, but subjectivity tends to be different. For hand-waving evidence, see the huge number of demos that "get more colours from the hardware" by just alternating two colours. So I think what you'd call my approach, when being the most charitable you could possibly be, is a camera-at-a-screen model. But only trivially.

Anyway, yes, I'm aware that my priorities are likely distinct. They usually are. I thought it would be easier just to state my prejudice up front.
Chief Blur Buster wrote:
Tommy wrote:EDIT2: oh, and audio latency too if you tried to fit 60Hz to 120Hz by taking every other frame off. I think that, in summary, my perspective is that there are at least three latency factors at work here: input, video and audio, and I don't agree that any one trumps the other two.
Agreed. User needs choice.

But audio lag doesn't increase. It's visual lag being reduced (by the ultra-fast-scanout) to the point where audio may feel lag relative to visual stimuli.
That surely depends on how you are generating audio. If tasked with producing classic 44.1 kHz output, I currently get a guaranteed worst-case audio latency of 5.8ms, because audio output exhaustion is one of the triggers to do more emulation. So that's not just sub-frame, but completely detached from the frame rate.

In the hypothetical 60->120 mapping of a surge frame then a fallow frame, the worst case is now the length of the fallow frame. So it's gone up to around 8.3ms. If I were 60->240 mapping by a surge frame and then three fallows, I'm up at 12.5ms. Etc.
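Tommy's arithmetic, as a small sketch. The 256-sample buffer is an assumption chosen because 256 samples at 44.1 kHz is about 5.8ms, matching his figure; his actual buffer size isn't stated. The model follows his framing that emulation (and therefore audio generation) is bunched into the surge refresh, leaving the fallow refreshes as the worst-case gap.

```python
def worst_case_audio_latency_ms(display_hz, emulated_hz, audio_buffer_ms):
    """Surge-then-fallow estimate: the emulator catches up during one display
    refresh, then idles through the remaining "fallow" refreshes, so the worst
    case becomes the longer of the audio buffer and the fallow period."""
    refresh_ms = 1000.0 / display_hz
    fallow_refreshes = display_hz // emulated_hz - 1
    return max(audio_buffer_ms, fallow_refreshes * refresh_ms)


if __name__ == "__main__":
    buffer_ms = 256 / 44100 * 1000  # assumed 256-sample buffer at 44.1 kHz, about 5.8 ms
    for hz in (60, 120, 240):
        print(hz, round(worst_case_audio_latency_ms(hz, 60, buffer_ms), 1))
```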

Or am I labouring under a misapprehension?

Calamity
Posts: 24
Joined: 17 Mar 2018, 10:36

Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Calamity » 05 Apr 2018, 10:45

Chief Blur Buster wrote:UPDATE (for all readers), useful tip posted in WinUAE forums by mark_k. Could increase frameslice throughput.
Interesting tip. I'll have to research whether this will increase frameslice throughput enough to achieve single-scanline-height frameslices. :D
D3D9 also allows dirty rectangles with Present(), but that requires the use of D3DSWAPEFFECT_COPY. In the tests I did, this was painfully slow, so I abandoned it. Maybe it's been improved in the more recent APIs.

Based on our results so far, single-scanline frame slicing is still science fiction nowadays. It could be done, however, using custom hardware that totally bypasses the OS video stack, with 1:1 scaling (this would be my ultimate goal... for next decade?).


Re: Emulator Developers: Lagless VSYNC ON Algorithm

Post by Chief Blur Buster » 05 Apr 2018, 11:48

Calamity wrote:Based on our results so far, single-scanline frame slicing is still science-fiction nowadays.
Yeah, science fiction for VSYNC OFF frameslicing.
But front buffers! (Which NVIDIA re-enabled for VRWorks)

If we could access the front buffer, we could just rasterplot the emulator scanlines directly there instead of into an internal framebuffer.

I think it's probably already doable on high-end computers, if only I could enable front buffer rendering...

Look at how stable my rasters are in my YouTube video; they barely jitter. I'm running in C#, not even at raised process priority, with Dropbox and lots of other processes running in the background, yet my software still keeps rasters stable. And those are 1/1080th-screen-height rasters: even a 5-scanline jitter at 1080p is only about a 1-scanline jitter at ~240p.
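The resolution comparison above is just proportional scaling, since jitter measured in scanlines is really a fraction of screen height. A minimal Python sketch; the 1125-line vertical total used for the time conversion is the standard 1080p60 timing and is an assumption about the display mode, and the function names are hypothetical.

```python
def equivalent_jitter(jitter_lines: float, from_height: int, to_height: int) -> float:
    """Re-express a jitter measured in scanlines at one resolution in scanlines of another.

    Jitter is a fraction of the screen height, so it scales linearly with height."""
    return jitter_lines * to_height / from_height


def jitter_microseconds(jitter_lines: float, refresh_hz: float, vertical_total: int) -> float:
    """Convert a scanline jitter into time, given the refresh rate and the total
    line count per refresh (visible lines plus vertical blanking)."""
    return 1e6 * jitter_lines / (refresh_hz * vertical_total)


print(f"{equivalent_jitter(5, 1080, 240):.2f} scanlines at 240p")  # 1.11 scanlines at 240p
print(f"{jitter_microseconds(5, 60, 1125):.1f} us at 1080p60")     # 74.1 us at 1080p60
```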

Even with 1-scanline-at-a-time output directly to front buffers, I would just need an approximately 10-scanline chase margin between the emulator raster (scaled to the real display) and the real-world raster. The beam-chase margin would be set to the size of the raster jitter for a given system.
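One way to express that chase-margin check, as a minimal sketch: map the emulator's current line onto the real display's rows, measure how far it leads the real raster (with wraparound across the refresh cycle), and treat the position as safe only when that lead stays at least one jitter margin away from both ends of the cycle. The helper names, the nearest-row scaling, and the 1125-line vertical total are illustrative assumptions, not code from any existing emulator.

```python
def emu_to_output_row(emu_line: int, emu_height: int, out_height: int) -> int:
    """Map an emulated scanline onto the first real display row it covers."""
    return emu_line * out_height // emu_height


def lead_lines(emu_out_row: int, real_row: int, vertical_total: int) -> int:
    """How many real scanlines the emulator raster is ahead of the real raster,
    wrapping around the whole refresh cycle (visible area plus blanking)."""
    return (emu_out_row - real_row) % vertical_total


def is_safe(lead: int, margin: int, vertical_total: int) -> bool:
    """Safe if jitter of up to `margin` lines in either direction cannot make the
    real raster overtake the freshly plotted line, or scan it out a refresh early."""
    return margin <= lead <= vertical_total - margin


row = emu_to_output_row(120, 240, 1080)              # 540
print(is_safe(lead_lines(row, 530, 1125), 10, 1125))  # True  (10-line lead)
print(is_safe(lead_lines(row, 535, 1125), 10, 1125))  # False (only 5 lines ahead)
```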

There's a full refresh cycle of jitter margin for the beam chasing if I simply use the forgiving wraparound technique. Someone could then use an on-screen slider (Settings/Options) to adjust the chase margin between realraster and emuraster: you'd watch horizontally panning platformer-style motion, and slide the slider as tight as possible without artifacts for your particular system. The slider can also act as a video delay (to better sync the audio to video), especially when your Active:VBI ratio is very different from the original system's. Easy peasy with a front buffer.
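The slider described above boils down to converting between a chase margin in scanlines and a delay in milliseconds. A minimal sketch, assuming a known refresh rate and vertical total (60 Hz and 1125 lines, i.e. standard 1080p60 timing, are only an example) and clamping the slider to the safe wraparound range; the names are hypothetical.

```python
def chase_delay_ms(chase_lines: int, refresh_hz: float, vertical_total: int) -> float:
    """Video delay added by keeping the emulator raster `chase_lines` ahead of the real raster."""
    return 1000.0 * chase_lines / (refresh_hz * vertical_total)


def chase_lines_for_delay(delay_ms: float, refresh_hz: float,
                          vertical_total: int, jitter_margin: int) -> int:
    """Slider position (in scanlines) for a requested video delay, clamped so the
    chase never comes closer than the jitter margin to either end of the cycle."""
    lines = round(delay_ms / 1000.0 * refresh_hz * vertical_total)
    return max(jitter_margin, min(lines, vertical_total - jitter_margin))


print(f"{chase_delay_ms(10, 60, 1125):.2f} ms")    # 0.15 ms
print(f"{chase_delay_ms(1115, 60, 1125):.2f} ms")  # 16.52 ms
print(chase_lines_for_delay(4.0, 60, 1125, 10))    # 270
```

At 1080p60 one scanline is about 14.8 µs, so a 10-line margin adds only ~0.15 ms of lag, while the far end of the slider approaches a full refresh of deliberate video delay for audio sync.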

But fuggedaboudit, we don't have easy access to a front buffer (yet)....

NVIDIA/AMD, front buffer, pretty-pretty-pretty-please? ;)
Head of Blur Busters - BlurBusters.com | TestUFO.com | Follow @BlurBusters on Twitter

Tommy
Posts: 6
Joined: 04 Apr 2018, 14:02

Post by Tommy » 05 Apr 2018, 12:10

I'd be more optimistic if it weren't for the very specific needs of VR: for them, the ideal scenario would be dynamic reprojection as the raster runs. You'd still supply a single monolithic frame every 1/90th of a second, using your classic rendering pipeline in its heavily-optimised use case, but with a little spare field of view around the current known head position. In an ideal implementation, the headset would then decide instantaneously, at each pixel, which source pixel to look up based on the head's orientation at that moment.

I'm not aware of any devices that yet do that, but its advantage is that it keeps much the same decoupling while still significantly increasing responsiveness.
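A toy version of the raster-time reprojection described above, simplified to yaw-only, per-row (not per-pixel) shifts: the frame is rendered with a guard band of spare pixels on each side, and each output row is cut from it at an offset proportional to how far the head has turned since render time. Head yaw is assumed to change linearly across the scanout, and the sign convention (positive yaw means turning right, so the sampling window slides right) is an assumption; no real headset API is implied.

```python
def yaw_at_row(row: int, rows: int, yaw_start: float, yaw_end: float) -> float:
    """Head yaw when `row` is scanned out, assuming constant angular velocity."""
    return yaw_start + (yaw_end - yaw_start) * row / rows


def reproject_row(src_row, out_width: int, guard_px: int,
                  yaw_render: float, yaw_now: float, px_per_rad: float):
    """Cut one output row from a wider rendered row (out_width + 2 * guard_px pixels)."""
    shift = round((yaw_now - yaw_render) * px_per_rad)
    shift = max(-guard_px, min(guard_px, shift))  # cannot look past the spare field of view
    start = guard_px + shift
    return src_row[start:start + out_width]


def reproject_frame(frame, out_width, guard_px, yaw_render, yaw_start, yaw_end, px_per_rad):
    """Reproject every row using the yaw predicted for that row's scanout time."""
    rows = len(frame)
    return [reproject_row(r, out_width, guard_px, yaw_render,
                          yaw_at_row(i, rows, yaw_start, yaw_end), px_per_rad)
            for i, r in enumerate(frame)]


frame = [list(range(12)), list(range(12))]  # 8 visible pixels plus a 2-pixel guard band each side
print(reproject_frame(frame, 8, 2, 0.0, 0.0, 0.002, 1000.0))
```

Because each row is sampled with the orientation at its own scanout time, the rendering pipeline stays decoupled from the display while the displayed image tracks the head more closely.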

I'm also a bit worried about eGPUs. Has anybody taken any measurements on those? Am I right that when an eGPU drives an internal screen, it's actually still the original GPU generating the display, with the eGPU piping data to it? If so, is there much extra latency there?

I guess it could flip the other way: wanting to provide eGPU support without introducing latency might effectively require regular GPUs to support beam racing without undue overhead?

Calamity
Posts: 24
Joined: 17 Mar 2018, 10:36

Post by Calamity » 05 Apr 2018, 12:19

Chief Blur Buster wrote: But fuggedaboudit, we don't have easy access to a front buffer (yet)....
You want front buffer access? Use DirectDraw :D

I promise, DirectDraw allows front buffer access. Too bad the API has been totally broken/emulated since 8.
NVIDIA/AMD, front buffer, pretty-pretty-pretty-please? ;)
This is something that has to be done on the OS side. We really don't want another proprietary API with pompous branding for something that is a hack.
