Blur Busters Forums

Posted: **28 Feb 2015, 23:13**

heh, I'm completely lost now. I'll do some diagrams later and see if that helps me. I'll post them here. I have a feeling I'm just not grasping something obvious.

One last question before I try to tackle this with diagrams - is there any significance to separating out CPU vs GPU for understanding the general way this works? For now, can I just lump them together as "frame calculation time"?

and thank you for your patience

Posted: **01 Mar 2015, 00:55**

spacediver wrote:heh, I'm completely lost now. I'll do some diagrams later and see if that helps me. I'll post them here. I have a feeling I'm just not grasping something obvious.

One last question before I try to tackle this with diagrams - is there any significance to separating out CPU vs GPU for understanding the general way this works? For now, can I just lump them together as "frame calculation time"?

and thank you for your patience

You're thinking of the first frame going into an empty pipeline. Take a look at what happens to the subsequent frames as the pipeline fills up when limited by 60Hz v-sync (it helps to work backwards):

Frame times (in ms) for frames 1, 2, 3, 4
CPU starts at 0, 5, 15, 31.67
CPU finished at 5, 10, 20, 36.67
GPU starts at 5, 15, 31.67, 48.33
GPU finished at 15, 25, 41.67, 58.33
Monitor starts at 15, 31.67, 48.33, 65
Monitor finished at 31.67, 48.33, 65, 81.67

And compare it to what it looks like if you capped the framerate at 60fps in game (assuming a variable refresh monitor here):
CPU starts at 0, 16.67, 33.33, 50
CPU finished at 5, 21.67, 38.33, 55
GPU starts at 5, 21.67, 38.33, 55
GPU finished at 15, 31.67, 48.33, 65
Monitor starts at 15, 31.67, 48.33, 65
Monitor finished at 31.67, 48.33, 65, 81.67

So if you click the mouse at 15.5ms, the first frame that includes it starts scanning out at 65ms in the first scenario (frame 4), and at 31.67ms in the second (frame 2, which finishes scanning out at 48.33ms).

Obviously you don't get perfectly predictable calculation times in real life, but the backpressure principle does apply.
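
A minimal sketch that reproduces the two timing tables above, in case it helps to play with the numbers. The rules are my reading of the tables, not something stated outright in the post: the CPU cannot start a frame until the GPU has picked up the previous one, the GPU cannot start a frame until the previous one has begun scanning out, and the monitor cannot start a new scanout until one refresh period after the last. Input is assumed to be sampled when the CPU starts a frame. The names `simulate` and `first_visible` are made up for this example.

```python
REFRESH_MS = 1000 / 60  # 60 Hz refresh period, about 16.67 ms


def simulate(n_frames, cpu_ms=5.0, gpu_ms=10.0, cap_ms=None):
    """Return per-frame timings for a CPU -> GPU -> monitor pipeline.

    cap_ms=None models the uncapped, v-synced case; cap_ms=REFRESH_MS
    models an in-game 60fps cap (assumed rule: frame i may not start
    before i * cap_ms).
    """
    frames = []
    for i in range(n_frames):
        prev = frames[-1] if frames else None

        # Backpressure on the CPU: wait until the GPU took the last frame.
        cpu_start = 0.0
        if prev is not None:
            cpu_start = max(prev["cpu_end"], prev["gpu_start"])
        if cap_ms is not None:
            cpu_start = max(cpu_start, i * cap_ms)
        cpu_end = cpu_start + cpu_ms

        # Backpressure on the GPU: wait until the last frame is on its
        # way to the screen.
        gpu_start = cpu_end
        if prev is not None:
            gpu_start = max(cpu_end, prev["gpu_end"], prev["mon_start"])
        gpu_end = gpu_start + gpu_ms

        # The monitor starts at most one scanout per refresh period.
        mon_start = gpu_end
        if prev is not None:
            mon_start = max(gpu_end, prev["mon_start"] + REFRESH_MS)

        frames.append({
            "cpu_start": cpu_start, "cpu_end": cpu_end,
            "gpu_start": gpu_start, "gpu_end": gpu_end,
            "mon_start": mon_start, "mon_end": mon_start + REFRESH_MS,
        })
    return frames


def first_visible(frames, click_ms):
    """Scanout start of the first frame whose CPU work began after the click."""
    for frame in frames:
        if frame["cpu_start"] >= click_ms:
            return frame["mon_start"]
    return None


if __name__ == "__main__":
    for label, cap in (("v-sync, uncapped", None), ("60fps cap", REFRESH_MS)):
        frames = simulate(4, cap_ms=cap)
        print(label, [round(f["cpu_start"], 2) for f in frames],
              "click at 15.5 visible from", round(first_visible(frames, 15.5), 2))
```

With a click at 15.5ms this gives a scanout start of 65ms for the uncapped v-sync case and 31.67ms for the capped case (that frame finishes scanning out at 48.33ms). Changing `cpu_ms` and `gpu_ms` shows how quickly the backpressure builds up.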

Posted: **01 Mar 2015, 01:13**

I think I'm missing some key concept that's required to even understand all this. I know that CPU means central processing unit, and GPU means graphics processing unit, but I have no idea why they're being distinguished here. Can you at least give me a hint about why this CPU/GPU distinction is important?

Posted: **01 Mar 2015, 02:23**

Pipelining is basically breaking a task into chunks, like an assembly line, so that you can work on each chunk in a more specialized, or more parallelized, manner. I had a post explaining it in more detail, but this does it better: http://www.anandtech.com/show/6857/amd- ... ap-fraps/2

If you don't want to go into that much detail, the CPU is basically the architect designing a frame and making plans, while the GPU is the army of workers that turns the plans into the finished frame.

Posted: **01 Mar 2015, 02:40**

thanks, lots of reading and learning to do. Given how daft my brain is at certain things, I think I'll need to build a simulation in Matlab of the pipeline to see how it affects input lag. That's usually how I learn tricky concepts these days.

Posted: **01 Mar 2015, 11:54**

I think the analogy Sparky used was pretty convenient.

Think of a supermarket (CPU). Customers (frames) fill their shopping bags with the stuff they need (your input) and then head to the checkout (GPU). If the customers are slower at finding the stuff they need than the checkout is at dealing with them, there will be no line (no frame queue) but also fewer customers per unit of time (FPS in a CPU-limited environment).
If the customers come faster than the checkout can process them, there will be a queue. The checkout speed will now determine the number of customers leaving the store per unit of time (GPU-limited FPS), and since there is a queue, the groceries get less fresh with passing time (input lag).
A limit on customers entering the market (pre-render limit) can help in such a scenario where the checkout would otherwise be overwhelmed. But that again means fewer customers processed per unit of time.
Now imagine that already busy checkout having to deal with customers of two supermarkets (multithreaded rendering). Longer queues, less fresh groceries, and still no more customers processed per unit of time.
Reducing the shopping bag size (resolution) and the complexity of the things the checkout has to do (graphical detail, post-processing) can help speed up the checkout process, preferably to a degree where the number of customers leaving the market is no longer determined by the checkout speed (basically you want to be CPU-limited).
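
To put rough numbers on the queue part of the analogy, here is a small sketch under simplified assumptions: CPU and GPU each take a fixed time per frame, and the "pre-render limit" is modelled as "the CPU may not start frame i until the GPU has started frame i minus the limit". That rule and the function name `input_age_ms` are illustrative, not how any particular driver defines the setting.

```python
def input_age_ms(cpu_ms, gpu_ms, prerender_limit, n_frames=50):
    """Age of the input in the last simulated frame when the GPU finishes it.

    Age is measured from the moment the CPU starts the frame (assumed to be
    when input is sampled) to the moment the GPU finishes rendering it.
    """
    cpu_end, gpu_start, gpu_end = [], [], []
    cpu_start = 0.0
    for i in range(n_frames):
        # The CPU is free once it has finished the previous frame...
        cpu_start = cpu_end[-1] if cpu_end else 0.0
        # ...but may not run more than `prerender_limit` frames ahead of the GPU.
        if i >= prerender_limit:
            cpu_start = max(cpu_start, gpu_start[i - prerender_limit])
        cpu_end.append(cpu_start + cpu_ms)

        # The GPU starts when the frame is ready and the GPU is idle.
        start = max(cpu_end[i], gpu_end[-1] if gpu_end else 0.0)
        gpu_start.append(start)
        gpu_end.append(start + gpu_ms)
    return gpu_end[-1] - cpu_start


if __name__ == "__main__":
    for limit in (1, 2, 3):
        print("GPU-limited, limit", limit, "->", input_age_ms(2, 10, limit), "ms")
        print("CPU-limited, limit", limit, "->", input_age_ms(12, 10, limit), "ms")
```

In the GPU-limited case (2ms CPU, 10ms GPU) the input age grows with every extra queued frame, while in the CPU-limited case (12ms CPU, 10ms GPU) the limit makes no difference because the queue never fills.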

You could extend the analogy to where the supermarket's exit is the monitor. VSync means the door opens, allows 1 customer to pass through, closes, and then opens again for the next customer, creating potentially another queue where groceries get less fresh. Non-VSync means the door stays open and customers can leave at the pace they are processed.

SLI/CrossFire would be adding another checkout to deal with the flood of customers, the problem being that if there are fewer customers than you'd need two checkouts for, there's forced queuing because the checkouts split the load between them (alternate frame rendering: the second GPU starts working on a frame while the first one is half-way finished, meaning there will be two consecutive frames containing old input, whereas a single GPU would have processed both in time and without forcing older input to queue up).
Another problem with SLI would be microstuttering: one of the two checkouts spends a bit more time dealing with a customer, so while the number of customers leaving stays the same, the time between consecutive customers leaving constantly varies. At 100fps, for example, you'd want each frame to be finished 10ms after the previous frame. In SLI/CF, assuming perfect rendering times, GPU1 starts working on frame1 at 0ms and GPU2 starts working on frame2 at 5ms. GPU1 finishes frame1 at 10ms and GPU2 finishes frame2 at 15ms (5ms between frame1 and frame2). Now the pipe has to wait in order to sustain the 10ms average interval, so GPU1 starts working on frame3 at 20ms and finishes at 30ms (15ms after frame2), while GPU2 starts at 25ms and finishes frame4 at 35ms (again 5ms after the previous frame, so the next rendering will be delayed by 5ms again, and so on). You get constantly alternating times of 5ms and 15ms between frames finishing even though the GPU rendering time is precisely 10ms for each frame. With unlimited FPS this is not a problem, but there's still increased microstuttering because instead of having only the inconsistencies in rendering time you encounter with one GPU, you have two.
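
The 5ms/15ms pattern above can be checked with a few lines. This is only a restatement of the idealized example in the post (two GPUs, exactly 10ms per frame, second GPU offset by 5ms, 100fps target), not a model of real SLI/CrossFire scheduling; the function names are made up for the example.

```python
def afr_finish_times(n_frames, render_ms=10.0, offset_ms=5.0, target_ms=10.0):
    """Finish time of each frame under the post's idealized AFR schedule.

    Frames are issued in pairs: GPU1 starts a pair every 2 * target_ms,
    and GPU2 starts its frame offset_ms later.
    """
    finishes = []
    for i in range(n_frames):
        pair, gpu = divmod(i, 2)  # gpu is 0 for GPU1, 1 for GPU2
        start = pair * 2 * target_ms + gpu * offset_ms
        finishes.append(start + render_ms)
    return finishes


def intervals(times):
    """Time between consecutive frame completions."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    finished = afr_finish_times(6)
    print(finished)             # completion times in ms
    print(intervals(finished))  # alternates between 5 and 15
```

The average interval is still 10ms, which is why a framerate counter would report a steady 100fps even though the pacing alternates.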

Posted: **01 Mar 2015, 12:07**

A traffic jam on the highway might be a better analogy for the input lag consequences. You have the same number of cars passing any given point on the highway in a given amount of time (framerate), but the cars after the bottleneck move faster than the cars before the bottleneck. If you limit the number of cars so they don't jam up at the bottleneck, no car ever has to slow down, so each one spends less time on the highway.

Posted: **01 Mar 2015, 14:29**

thanks sparky, stirner. The supermarket analogy is useful, and you fleshed it out really nicely. I'm gonna read through Derek Wilson's input lag piece: http://www.anandtech.com/show/2803 also, and hopefully things will start to make sense.

Once I have a better grasp of things, I'll try to build a Matlab simulation that incorporates USB polling, CPU processing, GPU processing, and display scanning, in a scenario where a constant stream of identical frames is rendered until a user hits a mouse button to change the color of the screen (say from red to green). Then I can start to explore how these various scenarios affect input lag (vsync vs non-vsync, CPU-limited vs GPU-limited). Might have to wait until Thursday as I have a major deadline to meet.
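
For the two simpler ends of such a simulation, a rough sketch in Python rather than Matlab might look like the following. The assumptions are mine: the mouse reports a click at the next polling boundary, and the display scans top to bottom at a constant rate over the whole refresh period (vertical blanking is ignored). `poll_time` and `scanout_time` are hypothetical helper names.

```python
import math


def poll_time(click_ms, poll_hz=1000):
    """Time at which the click reaches the PC: the next USB poll boundary."""
    period_ms = 1000 / poll_hz
    return math.ceil(click_ms / period_ms) * period_ms


def scanout_time(scan_start_ms, y_frac, refresh_hz=60):
    """Time at which a given screen row is drawn.

    y_frac is the vertical position: 0.0 is the top row, 1.0 the bottom.
    Assumes a constant-rate scan that takes one full refresh period.
    """
    return scan_start_ms + y_frac * (1000 / refresh_hz)


if __name__ == "__main__":
    # A click at 15.5ms seen through a 1000Hz mouse and a 125Hz mouse.
    print(poll_time(15.5, 1000), poll_time(15.5, 125))
    # Middle of the screen for a scanout that starts at 65ms on a 60Hz display.
    print(round(scanout_time(65.0, 0.5), 2))
```

The CPU and GPU stages would sit between these two functions: the polled time decides which frame first sees the click, and the start of that frame's scanout feeds into `scanout_time`.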

Posted: **02 Mar 2015, 01:31**

USB polling and display scanning are simple. CPU and GPU... that's a mysterious black box.

Posted: **02 Mar 2015, 04:31**

my plan is just to treat CPU and GPU as processes that each take a fixed amount of time. Right now I'm conceptualizing them as just two consecutive processes. I'm not really considering what they do - Just thinking of them as process A and process B, and seeing how changing their durations affects things.

Not sure if thinking about it this way is missing something essential...

flood's input lag measurements
