Well, well. That’s an interesting feature you don’t want to miss out on when talking low latency. If the NIC can’t put data into Level 3 cache, you can’t win the race.
Data Direct I/O (DDIO) is a follow-up to Direct Cache Access (DCA) and is enabled on pretty much all recent Intel CPUs. In a latency-sensitive environment (e.g. running packets across PCIe for both RX and TX), this feature brings much better results (a difference of hundreds of nanoseconds). We see this when running our libfsock trading stack, and other people have published about it as well [2,3,5].
Sure, you want to have Zen in your life, but your Zen story (hello AMD…) can have a different outcome if the CPU doesn’t support DDIO.
Make sure your favorite CPU supports DDIO, otherwise it might end up with an “oh oh”. Until then, it’s better to wait for Zen 4.
Update (04/20/2022): Early test results on an AMD Genoa (Zen 4) platform confirm that AMD got it right and that this CPU generation is well suited for low latency (half the latency of earlier Zen generations).
We used the ExaNIC X25 and measured the time it takes for an Ethernet frame to be looped back. The X25 takes both RX and TX timestamps in hardware, which allows for a precise measurement of PCIe latencies.
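To make the measurement idea concrete, here is a minimal Python sketch of how per-frame loopback latency could be derived from such timestamp pairs. It is not our test harness and it does not use the ExaNIC API. It assumes the RX and TX hardware timestamps were already exported as nanosecond integers, one pair per frame, and that each frame is received first and then transmitted back. The function names and the sample values are made up for illustration.

```python
from statistics import median


def loopback_latencies_ns(rx_ts_ns, tx_ts_ns):
    """Per-frame latency: TX hardware timestamp minus RX hardware timestamp.

    Assumes both lists are in nanoseconds and index i of each list
    belongs to the same looped-back frame.
    """
    if len(rx_ts_ns) != len(tx_ts_ns):
        raise ValueError("need exactly one TX timestamp per RX timestamp")
    return [tx - rx for rx, tx in zip(rx_ts_ns, tx_ts_ns)]


def summarize(latencies_ns):
    """Return min, median and max of the measured latencies."""
    ordered = sorted(latencies_ns)
    return {"min": ordered[0], "median": median(ordered), "max": ordered[-1]}


if __name__ == "__main__":
    # Hypothetical timestamps, NOT measured data.
    rx = [1000, 2000, 3000]
    tx = [1850, 2900, 3800]
    print(summarize(loopback_latencies_ns(rx, tx)))
```

Because both timestamps are taken on the NIC itself, the difference covers the trip across PCIe into the host, the software turnaround, and the trip back, with no host clock involved.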
 Zen: https://www.britannica.com/topic/Zen
 FastSockets: libfsock trading stack, http://www.fastsockets.com