<Warning: 1% Chance Answer>
Jorim is likely more right than I am. But, did I hear right that you haven't tested a different mobo?
99% chance my suggestion may not work, but some last-ditch efforts.
I am not 100% sure..... But.... that seems crappy. A 500 microsecond spike to service interrupts? I've seen better reliability even for software timers, chrissakes (via my Tearline Jedi experience.... My realtime Kefrens Bars required literally 1/8000sec permanent software-timer precision to function properly). Your hardware interrupt timing is less accurate than MY software-driven stuff.
There might be some kind of undiagnosed thing in the motherboard. I've seen cards function perfectly, just with latency spikes instead of crashes -- ECC (error-correcting code) might be something related. I wonder if there's a utility to detect whether errors are occurring on the PCIe bus between your graphics card and PC, slowing down texture loads. That can dramatically slow down one random texture load when a 2080 suddenly has to "fight through" a weak/defective PCIe bus lane (RFI-wise).
(Does your PC ever crash during gaming? My system practically never does if I'm not overclocking.)
Those 500 microseconds of fault-induced delay build up easily into 10 milliseconds of delay over the period of 20 DPC-spiked texture loads, creating a human-noticeable stutter. 0.5ms x 20 = 10ms = human noticeable.
It's like thermal throttling, except it's error-corrected throttling occurring on a PCIe bus. Those are VERY HARD to diagnose, but I've heard anecdotes. Maybe even an intermittent avalanche of buildups can cause human-visible stutter.
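To put the 0.5ms x 20 = 10ms math in perspective, here's a quick Python sketch comparing that burst against the time available per frame. The spike size and count are the numbers from this post; the refresh rates (60/144/240 Hz) and the function names are just my assumptions for illustration, and it assumes the worst case where every spike lands back to back.

```python
# Numbers from the post: 0.5 ms per spike, 20 spiked texture loads.
SPIKE_US = 500          # one DPC latency spike, in microseconds
SPIKES_PER_BURST = 20   # texture loads hit by a spike in one burst


def burst_delay_ms(spike_us: float, count: int) -> float:
    """Total added delay in ms if every spike lands back to back (worst case)."""
    return spike_us * count / 1000


def frame_budget_ms(refresh_hz: float) -> float:
    """Time available to render one frame at a given refresh rate."""
    return 1000 / refresh_hz


delay = burst_delay_ms(SPIKE_US, SPIKES_PER_BURST)

# Refresh rates below are assumed examples, not measurements.
for hz in (60, 144, 240):
    budget = frame_budget_ms(hz)
    print(f"{hz} Hz: frame budget {budget:.2f} ms, "
          f"burst adds {delay:.1f} ms ({delay / budget:.1f}x a frame)")
```

At higher refresh rates the same 10ms burst eats more than a whole frame, which is why it shows up as a visible hitch rather than getting lost in the noise.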
(Is there a utility to diagnose what the source of those 500us spikes is?)
You've certainly done a very extensive amount of troubleshooting.
Sooooo....
Did you ever try a different motherboard? That's now my #1 numero uno tip.
You've already swapped GPUs so that's probably not the issue. But did you swap the other endpoint of the PCIe bus: the motherboard?
While it's probably only a "10% chance" of fixing the problem, it's an unturned stone in your extensive troubleshooting list: You're clearly serious about troubleshooting this. Just because a card works in a PC, doesn't mean the motherboard is running the card as flawlessly as possible.
There is definitely computer hardware that adds lag instead of crashing, because something was weak (e.g. abnormally low signal strength on one of the motherboard PCIe lanes, being weakened by RFI surges from nearby objects inside the computer case). Those high end GPUs are huge power hogs with RFI outbursts that can emit quite a nano-scale "EMP" at random moments. You'd be amazed at how much RFI can occur inside a computer case. That can inject just enough ECC garbage "at a surge moment" (like a GPU clocking up/down) to randomly/intermittently lag-down a slightly weaker-than-tolerance manufactured motherboard. It might not be the motherboard's fault. It might be. But even if it is... It may be well below warranty claim thresholds, because it's working and doesn't crash, and the motherboard manufacturer doesn't always cover subtle nuances. Like a single dead pixel on a monitor with a 5-dead-pixel policy. It happens.
Try a 20% underclock of both CPU and GPU, with clockrate changes and all power management disabled, e.g. lock your CPU to 3GHz and lock your GPU 20% lower. Does the freezing stop?
I ask you to do this because multiple clockrates inside a PC create resonant-frequency interference, with multiple random RFIs fighting each other and creating some ECC damage. Like those rare "rogue waves" in an ocean. A perfect storm moment of RFI between all those clock chips and random power surging can create those rare inside-case RFI peaks that suddenly overcome a weak communications link (e.g. PCIe lane with a cold solder joint) -- and trigger an avalanche of faults that builds up to a consecutive series of heavily-delayed interrupts that build up to a noticeable frame time spike.... Let's call it a "resonant cascade" (quite apt here), shall we?
Yes, I am speaking in metaphors, but that's to help people understand how voodoo-like motherboard engineering has become in recent years with ultra-high clockrates at ultra-low voltages, and the huge number of layers of ECC slapped on as bandaids to mop up the huge messes caused by trying to milk Moore's Law further... Fun, eh?
Or sudden clockrate upshifts causing nano-EMP-surges at the beginning of the clockrate shift (like a fridge starting up, etc.). Or lots of other theoreticals. But who knows what the hell is causing it -- I've heard it all, even from motherboard manufacturers, about how weird things can get. There are a lot of "1 in a million" motherboards that manufacturers receive back from their users, and they're quite weird and dandy in how they self-fault themselves.
Literally over 99% (pick any number of nines -- like 99.999%+) of RFI and ECC stuff doesn't create noticeable latency spikes. But RFI/ECC/feedback-loop/etc metaphorical equivalents of rogue waves do exist -- a major RFI/ECC/feedback-loop/etc surge inside a computer can cause a single freeze lasting 0.5 seconds (500,000 microseconds). They exist, even if very rare. Basically a self-cascading event or a feedback event: one RFI/ECC event occurs that self-resonates (feedback loops on itself) up to human timescales. Amazing things show up on an oscilloscope sometimes at a motherboard manufacturer. Basically a fault of some kind that suddenly makes things frozen for a moment. Normally these things used to crash a computer to a hard reset, but the ECC in modern computers has become so amazingly strong. Because of those ultralow voltages and all those ultrahigh clockrates, the manufacturers had to make the computers much more ECC-robust. So a nano-scale RFI EMP/interrupts/conflicts/etc event sometimes causes a temporary freeze instead of a crash. All that happens is simply a computer freezing longer than you'd expect... whether 10ms or 100ms or 1 second. Impossible to diagnose sometimes, but hair-pullingly maddening because so few utilities can tell you the exact problem.
But that utility is showing numbers that make my face literally grimace. Urggg...
I keep my power management enabled but, temporarily for testing's sake:
1. Try locking everything at a fixed, slightly-underclocked frequency with ALL power management turned off. Everything that says "power management", turn that thing off. Eliminate all clockrate shifts temporarily, even if you have to underclock a little bit to prevent thermal-throttling-induced clockrate shifts. Zero out ALL your clockrate shifting, make everything fixed-frequency (the zero-shift) with no power management and no thermal throttle (the underclock), and reduce your power load a bit (the underclock). All of that combined can reduce the RFI load of the computer internals. Possibly enough to stop those latency spikes.
If that fixed the problem, then boom. You've stabilized your ship and battened down the hatches.
If not, then there's a zillion other possibilities, not worth the time to diagnose, goodbye motherboard.
So.
2. I observe you've been sticking to the same motherboard. Right?
So.... Try. A. Different. Motherboard.
Those ultra-tweaker forums elsewhere.... Often you're literally chasing red herrings 99% of the time on a wild goose chase to fix that 1% real problem. For most of us it is a waste of time, but I can respect the great deal of effort as a desire to try to solve that 1% problem.
Once everything is optimized, games are almost always the culprit for sure.
Jorim is probably more correct than I am here....but I'm adding fuel to the fire because I make the observation you haven't tested a different motherboard yet. Am I riiiiiight?
</Warning: 1% Chance Answer>