The only workaround I have found is to dispatch far fewer thread groups per call and put a D3D flush between the calls, but this understandably destroys performance and is really not a good solution.
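For anyone who wants to see what that workaround looks like, here is a rough sketch of the chunking logic, not a definitive implementation. The names `DispatchChunk`, `splitDispatch` and `runChunked` are hypothetical helpers made up for illustration. The sketch assumes a 1D dispatch and takes the actual submit and flush steps as callbacks so the splitting can be read and tested without a device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper type: one slice of a larger 1D dispatch,
// covering `count` thread groups starting at group index `firstGroup`.
struct DispatchChunk {
    std::uint32_t firstGroup;
    std::uint32_t count;
};

// Split totalGroups into consecutive chunks of at most maxGroupsPerCall groups.
// Returns an empty list if maxGroupsPerCall is 0, since no valid split exists.
std::vector<DispatchChunk> splitDispatch(std::uint32_t totalGroups,
                                         std::uint32_t maxGroupsPerCall) {
    std::vector<DispatchChunk> chunks;
    if (maxGroupsPerCall == 0) return chunks;
    for (std::uint32_t first = 0; first < totalGroups;) {
        const std::uint32_t count = std::min(maxGroupsPerCall, totalGroups - first);
        chunks.push_back({first, count});
        first += count;
    }
    return chunks;
}

// Submit each chunk and flush after it.
// Assumption: in a D3D11 program, `submit` would write c.firstGroup into a
// constant buffer (so the shader can offset its group ID) and call
// ID3D11DeviceContext::Dispatch(c.count, 1, 1); `flush` would call
// ID3D11DeviceContext::Flush().
void runChunked(const std::vector<DispatchChunk>& chunks,
                const std::function<void(const DispatchChunk&)>& submit,
                const std::function<void()>& flush) {
    assert(submit && flush);  // both callbacks must be callable
    for (const DispatchChunk& c : chunks) {
        submit(c);
        flush();
    }
}
```

The shader has to add the chunk's starting offset to its group ID itself, because each smaller dispatch starts counting groups from zero again.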
I am fortunate, in a way, that with some effort I can split my problem set into multiple buffers that I can run independently and accumulate at the end. But this adds a whole pile of code and complexity that simply would not be needed if Windows allowed compute shaders to run for arbitrary amounts of time without resetting the driver.
I am still left with the problem that I can't tell how much work fits in this magic two-second window. So I have to divide the work into tiny pieces so that low-end GPUs don't choke, which then wastes loads of performance on higher-end hardware and defeats much of the point of using GPU acceleration in the first place.
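One possible way to soften this, offered only as a sketch, is to size the batches at run time instead of guessing up front: time each batch and scale the next one toward a budget well below the timeout. `BatchSizer` and all of its fields are hypothetical names, the 0.25x to 2x step limits are arbitrary choices, and the sketch assumes cost is roughly proportional to the number of thread groups, which may not hold for every workload. It needs C++17 for `std::clamp`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: picks the next batch size (in thread groups)
// from how long the previous batch took.
struct BatchSizer {
    double targetMs;          // time budget per batch, chosen well below the TDR limit
    std::uint32_t minGroups;  // never go below this many groups per batch
    std::uint32_t maxGroups;  // never exceed this many groups per batch
    std::uint32_t current;    // batch size used for the batch that was just measured

    // Scale the batch toward targetMs, changing it by at most 4x down
    // or 2x up per step so one noisy measurement cannot cause a wild jump.
    std::uint32_t update(double measuredMs) {
        assert(targetMs > 0.0);
        assert(minGroups <= maxGroups);
        // If the measurement is unusable (zero or negative), just grow by the cap.
        double scale = (measuredMs > 0.0) ? targetMs / measuredMs : 2.0;
        scale = std::clamp(scale, 0.25, 2.0);
        double next = static_cast<double>(current) * scale;
        next = std::clamp(next,
                          static_cast<double>(minGroups),
                          static_cast<double>(maxGroups));
        current = static_cast<std::uint32_t>(next);  // truncates toward zero
        return current;
    }
};
```

The measured time would have to be GPU time rather than CPU submission time, for example from D3D11 timestamp queries (`D3D11_QUERY_TIMESTAMP` together with `D3D11_QUERY_TIMESTAMP_DISJOINT`). Starting from a small `current` means a slow GPU never sees a large first batch, while a fast GPU doubles its batch size every step until it reaches the budget.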
I know why the TDR feature is there, but surely there must be a better way for DirectX compute shaders to play together with it?