i think i understand what's going on
yet another one of my shitty paint drawings:
for single threaded rendering
input lag = (some offset) + [0,1/fps] + 1/fps
for multicore/mqm2
input lag = (the same offset) + [0,1/fps] + 2/fps

let's see how this fits with the data in the above image
for the single thread case where i get 500fps the input lag is distributed between 3.4ms and 5.4ms.
input lag = 1.4ms + [0, 2ms] + 2ms
this offset is higher than what i'd expect but whatever
for the multithreaded case where i get 700fps the input lag is distributed between 4.3 and 5.7ms
input lag = 1.4ms + [0,1.4ms] + 2.9ms
seems to work out!
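
quick sanity check of the model in python. this is only a sketch: the function name and the 1.4ms offset are my own assumptions (the offset is just whatever makes the 500fps numbers line up), and the mean assumes the [0,1/fps] part is uniformly distributed.

```python
def lag_range_ms(fps, queued_frames, offset_ms):
    """Return (min, mean, max) input lag in ms under the model above.

    queued_frames is the fixed frame count: 1 for single threaded,
    2 for multicore/mqm2. offset_ms is the fitted constant offset.
    """
    frame_ms = 1000.0 / fps                      # duration of one frame
    fixed = offset_ms + queued_frames * frame_ms  # offset + N/fps
    # the [0, 1/fps] term adds anywhere from 0 to one full frame,
    # and half a frame on average if it is uniform (assumption)
    return fixed, fixed + frame_ms / 2, fixed + frame_ms


OFFSET_MS = 1.4  # assumed: fitted from the 500fps single thread data

single = lag_range_ms(500, 1, OFFSET_MS)
multi = lag_range_ms(700, 2, OFFSET_MS)

for name, (lo, mean, hi) in (("single", single), ("multi", multi)):
    print(f"{name}: {lo:.1f}ms to {hi:.1f}ms, mean {mean:.1f}ms")
```

with the same offset plugged into both cases, the ranges should land on the measured 3.4 to 5.4ms and 4.3 to 5.7ms after rounding.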
would be interesting to see how this would work with sli... fortunately i've just upgraded to a miniitx build so i have absolutely no excuse to waste money on sli
so what are the implications of this for people like myself with single gpus...
0. it doesn't really matter unless your gpu is a potato
1. there's no disadvantage to multicore IF using it increases your framerate by 67% or more. that's the break-even point where the average input lag of both is equal (1.5/fps for single threaded vs 2.5/fps for multicore, and 2.5/1.5 ≈ 1.67).
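
where the 67% comes from, as a small python sketch. the helper name is made up for illustration, and it assumes the [0,1/fps] term averages out to half a frame and that the offset is the same in both modes (so it cancels).

```python
def mean_lag_frames(queued_frames):
    """Average lag in units of frame times, ignoring the constant offset.

    0.5 is the mean of the [0, 1/fps] term if it is uniform (assumption),
    queued_frames is 1 for single threaded and 2 for multicore/mqm2.
    """
    return 0.5 + queued_frames


# equal average lag: 1.5 / fps_single == 2.5 / fps_multi
# so fps_multi / fps_single == 2.5 / 1.5
breakeven = mean_lag_frames(2) / mean_lag_frames(1)

print(f"multicore needs {breakeven:.2f}x the single threaded framerate")
print(f"that is a {100 * (breakeven - 1):.0f}% increase")
```

so if you get 500fps single threaded, multicore would have to reach about 833fps before its average input lag catches up. the 700fps i measured is only a 40% increase, so multicore comes out slightly behind on average.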