I/O connectivity: a new (5th) generation is showing up

For a long time, CPUs were held back by I/O bottlenecks. Those bottlenecks were tackled in various places and gradually removed. First, CPU caches took care of delays in memory access: caches with various levels (L1, …, L3) were introduced, and main memory itself got a lot faster, too. We enjoy DDR5 here in late 2021.

We also saw improvements in storage capacity. I remember the days when we were still juggling 1.44MB floppy disks. My personal data rate was probably a few hundred kilobytes per second when I bicycled to university with 10 disks in my bag. But hey, that was still way faster than the 33kbit/s modem we had for our dial-up connection.

Floppy disks were followed by USB sticks. Those with a capacity of 8GB or more are still handy because you can keep your OS image on them. Formatting them takes quite a while, though, doesn’t it?

But what we see in networking as of today is standard hardware that pushes more than 1GByte per second through your network. Did I mention _per second_? Even on my e-bike I won’t be able to compete with that.

Design Space for Network I/O

Courtesy Bruening, Giloi [4]

For years, we saw a trend towards commercial off-the-shelf (COTS) networks for supercomputers, but recently special-purpose, vendor-locked networks like the Tofu [6] interconnect have become necessary to reach maximum scale in exaflop systems.

This also means that for traditional networking with commodity protocols like Ethernet, the I/O bus is our interface of choice, and we rely on the major vendors to increase its performance: ISA/VESA/PCI/PCI-X/PCI Express… come to mind.

Network I/O Evolution

Two goals have driven the evolution of network I/O:

  1. lowering latency (typically measured by timing a round-trip operation and dividing the elapsed time by 2)
  2. increasing bandwidth (typically the amount of data exchanged per unit of time)
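The two metrics above can be captured in a couple of lines. This is a minimal sketch with function names of my own choosing, not code from any particular benchmark:

```python
def one_way_latency(rtt_seconds: float) -> float:
    """Latency as typically reported: half of a measured round-trip time."""
    return rtt_seconds / 2.0

def bandwidth_gbit_s(n_bytes: int, seconds: float) -> float:
    """Bandwidth: amount of data exchanged per unit of time, in Gbit/s."""
    return n_bytes * 8 / seconds / 1e9

# A 2 microsecond ping-pong implies ~1 microsecond one-way latency;
# moving 125 MB in one second equals 1 Gbit/s.
```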

There were a handful of vendors (Quadrics [QsNet], Dolphinics [Scalable Coherent Interface], Myricom [Myrinet]) and hundreds of startups (Atoll, DIMMnet [5], …) competing on these requirements [3]. During the late nineties Myricom introduced Myrinet 2000 [2], which would eventually be used in more than 50% of the systems listed in the Top500, a respected list of the fastest supercomputers in the world.

In terms of bandwidth, Myrinet 2000 allowed for 2000Mbit/s per port and direction, hence 2+2Gbit/s (or 500MByte/s combined) going through a PCI-X (1.0) [1] bus (which maxed out at 533MB/s in its 64-bit, 66MHz implementation). At Myricom we kept a list of motherboards that were able to achieve that performance, as not all chipset vendors could deliver this throughput, which prevented the NIC from reaching its full potential.
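As a quick sanity check on those numbers, the peak rate of a classic parallel bus like PCI-X is just the bus width times the clock. A back-of-the-envelope sketch (the helper name is mine):

```python
def parallel_bus_mbyte_s(width_bits: int, clock_mhz: float) -> float:
    # Peak throughput of a parallel bus:
    # one word of `width_bits` is transferred per clock cycle.
    # MHz * bytes = 10^6 bytes/s, i.e. MB/s.
    return width_bits / 8 * clock_mhz

# PCI-X 1.0, 64-bit at 66.67 MHz -> ~533 MB/s peak
pci_x_peak = parallel_bus_mbyte_s(64, 66.67)

# Myrinet 2000 full duplex: 2 x 2000 Mbit/s = 500 MByte/s,
# which only just fits under the PCI-X ceiling.
myrinet_2000 = 2 * 2000 / 8
```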

So much for PCI-X. It was superseded by PCI Express. (Yes, there was a PCI-X 2.0 specification and a handful of implementations, but it never got any traction.)

Myrinet 2000 was a good fit at the time. But HPC supercomputers needed even more bandwidth, and the community established InfiniBand as a standard (initially offering Single Data Rate (SDR) = 10Gbit/s signalling → 8Gbit/s data rate). On Myricom’s roadmap was a new NIC targeting 10Gbit/s data rates and allowing both the Myrinet and 10GigE protocols to run on the same device (using 10-Gigabit Ethernet PHYs as the shared physical layer). InfiniBand’s roadmap included not only DDR but QDR, EDR and many more *DRs to come. And with many more DRs available today, there is a need for much more network I/O bandwidth (leaving the GPUs aside on this topic).

Press fast forward… PCIe Gen5

So far, the signalling rate (GT/s) has doubled with each new PCIe generation. As an example, we’ll compute the available bandwidth of an x8 PCIe Gen3 slot: 8GT/s * 8 lanes * 128/130 (encoding) = ~63Gbit/s. What we see here is that this won’t suit a NIC targeting 100Gbit/s data rates (e.g. 100GbE). You would either need to switch to the next PCIe generation (Gen4: an x8 slot at 16GT/s gives ~126Gbit/s) or add more lanes, e.g. use an x16 PCIe slot in combination with an x16 NIC (16 lanes * 8GT/s).
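That computation generalizes to any generation and lane count. A small sketch (my own helper, assuming the 128b/130b encoding used from Gen3 onwards; Gen1/Gen2 used 8b/10b instead):

```python
def pcie_gbit_s(gt_s: float, lanes: int, enc=(128, 130)) -> float:
    # Usable bandwidth = signalling rate x lane count x encoding efficiency.
    # Gen3 and later use 128b/130b encoding; pass enc=(8, 10) for Gen1/Gen2.
    num, den = enc
    return gt_s * lanes * num / den

gen3_x8  = pcie_gbit_s(8, 8)    # ~63 Gbit/s: not enough for 100GbE
gen4_x8  = pcie_gbit_s(16, 8)   # ~126 Gbit/s: next generation, same width
gen3_x16 = pcie_gbit_s(8, 16)   # ~126 Gbit/s: same generation, more lanes
```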

Throwing the gauntlet onto PCIe Gen5: 400GbE

Let’s see if PCIe Gen5 can offer support (in a single-port configuration) for 400GbE: 32GT/s * 16 lanes * 128/130 = ~504Gbit/s.

All good: PCIe Gen5 can certainly deliver 400Gbit/s, and you can enable 100GbE on the 2nd port, too :D

Oh, one more thing: if you are about to RX 50GByte/s (400Gbit/s) of traffic, is your memory ready (capacity-wise and performance-wise) to consume it? DDR5 scales to 8.4GT/s per pin. tbc…
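To put that question in numbers: a rough sketch comparing the 400GbE line rate against the peak bandwidth of a single DDR5 channel (helper name and the DDR5-8400 figure are my assumptions; a DDR5 DIMM carries 64 data bits split into two 32-bit subchannels):

```python
def ddr5_gbyte_s(mt_s: float, bus_bits: int = 64) -> float:
    # Peak DRAM bandwidth per channel: transfers/s x bytes per transfer.
    # mt_s is the data rate in MT/s; result is in GByte/s.
    return mt_s * bus_bits / 8 / 1000

ingress = 400 / 8             # 400GbE line rate: 50 GByte/s inbound
channel = ddr5_gbyte_s(8400)  # DDR5 at 8.4 GT/s: ~67.2 GByte/s peak

# Note: RX data is typically written to DRAM and then read back by the
# application, so the effective memory traffic is at least 2x the line
# rate, which a single channel can no longer absorb.
```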

More on PCIe Gen5 …

References

[2] Myrinet
https://en.wikipedia.org/wiki/Myrinet

[3] BOF on High Speed Interconnects at SC99
http://webserver.ziti.uni-heidelberg.de/atoll/bofsc99.html

[4] Ulrich Brüning, Wolfgang K. Giloi, “Future Building Blocks for Parallel Architectures”, keynote talk at the 2004 International Conference on Parallel Processing (ICPP’04), Montreal, Canada, 2004
http://ieeexplore.ieee.org/document/1327943/

[5] DIMMnet-2, Evaluation of Network Interface Controller on DIMMnet-2 Prototype Board
https://ieeexplore.ieee.org/document/1579028

[6] The Tofu Interconnect D - Fujitsu
https://www.fujitsu.com/hk/imagesgig5/08514929.pdf

[7] CXL
https://www.computeexpresslink.org/post/introduction-to-compute-express-link-cxl-the-cpu-to-device-interconnect-breakthrough

NEIO Systems, Ltd.

http://fastsockets.com || low latency, networking experts, 10GbE++, FPGA trading, Linux and Windows internals gurus