I’m putting this here since I haven’t seen it mentioned elsewhere and it took me forever to figure out:
I was getting timeouts on two different NVMe drives, two different motherboards, and two different CPUs. After I extended the timeout, the errors went away but I was getting stalls instead. The fix was to disable ASPM:
echo performance > /sys/module/pcie_aspm/parameters/policy
Or, to make it persist across reboots, add "pcie_aspm=off" to your kernel command line.
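
If you want to confirm which policy is actually in effect, here is a rough sketch. It relies on the kernel marking the active policy with square brackets when you read the policy file, e.g. "default [performance] powersave powersupersave". The function name active_aspm_policy is just something I made up for this example, and the optional path argument is only there so you can try it against a test file.

```shell
#!/bin/sh
# Hypothetical helper: print the active ASPM policy.
# The kernel lists all policies and wraps the active one in [brackets].
# $1 is an optional path; it defaults to the standard sysfs location.
active_aspm_policy() {
    policy_file="${1:-/sys/module/pcie_aspm/parameters/policy}"
    if [ ! -r "$policy_file" ]; then
        # File missing or unreadable (e.g. kernel built without ASPM support).
        echo "unknown"
        return 1
    fi
    # Keep only the word inside the square brackets.
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$policy_file"
}

# Show the current policy; do not abort if the file is not there.
active_aspm_policy || true
```

After the echo above this should print "performance". If you went the kernel command line route instead, "grep pcie_aspm /proc/cmdline" will show whether the option was actually passed at boot.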