10GbE vs 25GbE Latency or … How to DUT

NEIO Systems, Ltd.
Aug 31, 2021


There is ongoing excitement about moving network connectivity from 10GbE to 25GbE. For some use cases, such as 4K video streaming, this is already a requirement. In finance (trading), we see network upgrades on the south end of things. While 25GbE clearly more than doubles the potential bandwidth, we cannot assume the same holds for latency. Yes, serialization is faster, but how does this affect real-world applications that depend on low latency?
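The serialization gain can be put into rough numbers. The sketch below is a back-of-the-envelope calculation, assuming a minimum-size 64-byte Ethernet frame (a UDP packet with a payload of only a few bytes is padded up to this size) and the nominal 10 and 25 Gb/s data rates; preamble, inter-frame gap, line encoding and FEC overhead are ignored. It estimates only the time to clock one frame onto the wire, not end-to-end latency.

```python
# Back-of-the-envelope serialization delay for one Ethernet frame.
# Assumptions: 64-byte minimum frame (incl. FCS), nominal MAC data rate,
# no preamble, inter-frame gap, line encoding, or FEC overhead.

MIN_FRAME_BYTES = 64


def serialization_ns(frame_bytes: int, rate_gbps: float) -> float:
    """Time in ns to clock frame_bytes onto a link of rate_gbps.

    1 Gb/s is exactly 1 bit per ns, so bits / (Gb/s) yields ns.
    """
    return frame_bytes * 8 / rate_gbps


t10 = serialization_ns(MIN_FRAME_BYTES, 10.0)
t25 = serialization_ns(MIN_FRAME_BYTES, 25.0)
print(f"10GbE: {t10:.2f} ns, 25GbE: {t25:.2f} ns, "
      f"saved per frame: {t10 - t25:.2f} ns")
```

This gives about 51.2 ns versus 20.5 ns, i.e. roughly 31 ns saved each time a minimum-size frame is serialized. A pingpong exchange serializes at least a request and a reply, but how much of that saving shows up end to end depends on the NIC and PHY implementation, which is what the measurements in this article check.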

We can take two approaches to measuring the latency improvement. The first is the traditional pingpong benchmark between two identical hosts: we take a 10/25GbE-capable device, connect the hosts with an SFP28 cable, and run the benchmark twice, a) in 10GbE mode and b) in 25GbE mode.

The second approach uses a method known as device under test (DUT). The benchmark tool supports both modes and is available as part of the FastSockets middleware framework from NEIO Systems, Ltd. [1].

The following results were obtained with the 10GbE and 25GbE modes of the popular ExaNIC X25 from Exablaze (now Cisco) [2].

PingPong Results

The pingpong benchmark is a well-known way of measuring latency: it reports the round-trip time divided by 2. Because the same host both sends the message and receives the reply, it can take one timestamp right before sending and another right after the reply has been received, so both timestamps come from the same clock.
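The half round-trip statistics can be derived from host timestamps roughly as in this minimal sketch. The function name and the nearest-rank definition of the 99th percentile are assumptions for illustration; how fsock_pingpong.exe computes its figures internally is not described here.

```python
import math
import statistics


def half_rtt_stats(send_ns, recv_ns):
    """Summarize pingpong samples as half round-trip times.

    send_ns[i]: host timestamp (ns) taken right before sending message i.
    recv_ns[i]: host timestamp (ns) taken right after its reply arrived.
    Both come from the same host clock, so no clock sync is required.
    """
    half = sorted((r - s) / 2 for s, r in zip(send_ns, recv_ns))
    if not half:
        raise ValueError("no samples")
    # Nearest-rank 99th percentile: smallest sample such that
    # at least 99% of all samples are <= it.
    p99_index = math.ceil(0.99 * len(half)) - 1
    return {
        "min": half[0],
        "mean": statistics.mean(half),
        "median": statistics.median(half),
        "p99": half[p99_index],
    }


# Toy timestamps for illustration only, not measured data.
print(half_rtt_stats([0, 100, 200], [1600, 1800, 2000]))
```

Sorting once makes the minimum, median and percentile lookups trivial; for millions of samples a histogram would use less memory, but the result is the same.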

We run it twice, once in 10GbE mode and once in 25GbE mode, with no switch involved.

Performance for 10GbE

C:\ExaNIC\bin>fsock_pingpong.exe -h 10.0.0.102 -l 10.0.0.103 -E 2
fsock_pingpong.exe[4284] FSOCK 4.002-0007-08222021 Copyright NEIO Systems, Ltd. 2020-2021
> Successfully received Remote MAC Address : 64:3f:5f:01:c0:56
> Successfully received Local MAC Address : 64:3f:5f:01:c0:54
Half Round-Trip latency benchmark [UDP:FSOCK_RECV_DEFAULT]
Size 1: 710 (ns) (mean=830 median=800, 99%=950)
Size 2: 720 (ns) (mean=840 median=805, 99%=950)
Done.

Performance for 25GbE

C:\ExaNIC\bin>fsock_pingpong.exe -h 10.0.0.102 -l 10.0.0.103 -E 2
fsock_pingpong.exe[3024] FSOCK 4.002-0007-08222021 Copyright NEIO Systems, Ltd. 2020-2021
> Successfully received Remote MAC Address : 64:3f:5f:01:c0:56
> Successfully received Local MAC Address : 64:3f:5f:01:c0:54
Half Round-Trip latency benchmark [UDP:FSOCK_RECV_DEFAULT]
Size 1: 652 (ns) (mean=762 median=750, 99%=950)
Size 2: 653 (ns) (mean=763 median=752, 99%=950)
Done.

DUT Results

Another way is to use the device under test (DUT) approach. A device under test, also known as equipment under test (EUT) or unit under test (UUT), is a manufactured product undergoing testing. The benefit is that no internal details of the device need to be known: we can treat it as a black box.

DUT Overview

Using this setup, we measure the DUT's portion of the pingpong latency. That is, we follow the pingpong benchmark approach by sending a message to the (remote) application and waiting for its reply. For the measurement, we take the local time of the measurement device (an ExaNIC/Nexus X25 in this case) when the message is sent (at the PHY level) and when the reply is received back (at the PHY level). This is not exactly half of the latency, as it also includes the cable and transceivers. Our focus here, however, is to analyze the gain of 25GbE over 10GbE. Another benefit of this approach is that we control the measurement quality: we are already well below 1 us latency (even sub-700 ns latencies are common), and timestamping on the host can introduce side effects.

Our measurement device has a timestamp resolution of just 4 ns, which makes it a good choice for measuring the difference between 10GbE and 25GbE.

DUT using ExaNIC X25 relying on its TX and RX hardware timestamps
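The DUT measurement boils down to subtracting two hardware timestamps taken by the same NIC. The sketch below is a hypothetical illustration: the function names, the representation of timestamps as integer tick counts, and the sample values are assumptions and do not reflect the FastSockets or ExaNIC APIs. With a 4 ns tick, every result is a multiple of 4 ns, which is consistent with the DUT figures reported below (748, 752, 828, 696 and 772 ns).

```python
import statistics

# Assumption: raw hardware timestamps are integer counts of a 4 ns tick,
# matching the 4 ns resolution of the measurement NIC.
TICK_NS = 4


def dut_latencies_ns(tx_ticks, rx_ticks):
    """Per-message DUT latency in ns.

    tx_ticks[i]: measurement NIC's TX timestamp of request i (PHY level).
    rx_ticks[i]: same NIC's RX timestamp of the matching reply (PHY level).
    Both stamps come from one NIC clock, so the measuring host's software
    path and clock synchronization do not enter the result. Cable and
    transceiver delay in both directions is still included.
    """
    return [(rx - tx) * TICK_NS for tx, rx in zip(tx_ticks, rx_ticks)]


def summarize(latencies):
    """Minimum, mean and median of a list of latencies (ns)."""
    return {
        "min": min(latencies),
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
    }


# Hypothetical tick values for illustration only, not measured data.
lat = dut_latencies_ns([1000, 5000, 9000], [1187, 5193, 9193])
print(lat, summarize(lat))
```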

Performance for 10GbE

C:\ExaNIC\bin>fsock_pingpong.exe -h 10.0.0.102 -l 10.0.0.103 -E 2 -d
fsock_pingpong.exe[1840] FSOCK 4.002-0007-08222021 Copyright NEIO Systems, Ltd. 2020-2021
> Successfully received Remote MAC Address : 64:3f:5f:01:c0:56
> Successfully received Local MAC Address : 64:3f:5f:01:c0:54
Measuring dev_under_test. time elapsed (NIC TX - RX)
Size 1: 748.00 (ns) (mean=857.75 median=828.00)
Size 2: 752.00 (ns) (mean=856.83 median=828.00)
Done.

Performance for 25GbE

C:\ExaNIC\bin>fsock_pingpong.exe -h 10.0.0.102 -l 10.0.0.103 -E 2 -d
fsock_pingpong.exe[4468] FSOCK 4.002-0007-08222021 Copyright NEIO Systems, Ltd. 2020-2021
> Successfully received Remote MAC Address : 64:3f:5f:01:c0:56
> Successfully received Local MAC Address : 64:3f:5f:01:c0:54
Measuring dev_under_test. time elapsed (NIC TX - RX)
Size 1: 696.00 (ns) (mean=800.57 median=772.00)
Size 2: 696.00 (ns) (mean=801.51 median=772.00)
Done.

We see that switching from 10GbE to 25GbE gives a noticeable improvement of about 50 ns: the Size 1 median drops from 800 to 750 ns in the pingpong benchmark and from 828 to 772 ns in the DUT measurement. That is quite a good amount given how low the latency already is.
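To make the comparison explicit, the sketch below computes the savings from the Size 1 mean and median values reported above (the first, unlabeled figure on each result line is left out). Only the data layout and the helper name are ours; the numbers are copied from the outputs.

```python
# Size 1 mean/median values (ns) copied from the fsock_pingpong.exe output above.
SIZE1 = {
    "pingpong": {"10GbE": {"mean": 830.0, "median": 800.0},
                 "25GbE": {"mean": 762.0, "median": 750.0}},
    "DUT": {"10GbE": {"mean": 857.75, "median": 828.0},
            "25GbE": {"mean": 800.57, "median": 772.0}},
}


def improvements(data):
    """Return {(method, stat): (saved_ns, saved_percent)} for 10GbE -> 25GbE."""
    out = {}
    for method, modes in data.items():
        for stat in ("mean", "median"):
            before = modes["10GbE"][stat]
            after = modes["25GbE"][stat]
            saved = before - after
            out[(method, stat)] = (round(saved, 2), round(100 * saved / before, 1))
    return out


for (method, stat), (saved, pct) in improvements(SIZE1).items():
    print(f"{method:<8} {stat:<6} saved {saved:6.2f} ns ({pct:.1f}%)")
```

The medians drop by 50 ns (pingpong) and 56 ns (DUT), roughly 6 to 7%, in line with the "about 50 ns" stated in the text.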

P.S.: Latency values depend on the CPU; the faster, the better. The results were obtained on a 5 GHz i7 CPU.

Summary

Switching from 10GbE to 25GbE yields a noticeable and measurable improvement. As a rule of thumb, it reduces latency by about 50 ns in this setup.

References:

[1] FastSockets, NEIO Systems, Ltd. http://www.fastsockets.com

[2] ExaNIC X25 and other SmartNICs, Cisco. https://www.cisco.com/c/en/us/products/interfaces-modules/nexus-smartnic/index.html
