Latency & Response Time: Sub-Millisecond IO over 10 Million Requests
When your application communicates with a Brainboxes remote IO device over Ethernet, every command-and-response exchange takes time. This article explains what that response time means, why it matters for industrial IO and PLC applications, how to measure it accurately using HDR Histograms, and what results you can expect from Brainboxes devices.
What is Latency?
Think of latency like taking a bus to a destination. Your total journey time has two distinct parts: the time you spend waiting for the bus to arrive, and the time you spend travelling on the bus. Neither part alone tells you the full story -- your experience of the journey is the total of both combined. A fast bus is no use if you waited 30 minutes at the stop.
In the same way, when your application sends a command to a Brainboxes remote IO device, the total response time (or latency) includes:
- The time for your command to travel from your application, through the software stack, across the network, to the device
- The time for the device to process the command and interact with the IO hardware
- The time for the response to travel all the way back
The response time is the total round-trip time from the moment your application calls SendCommand() to the moment it receives the response.
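To make this concrete, here is a minimal sketch of timing one round trip with the Brainboxes.IO .NET API (the IP address is a placeholder, and the full measurement code appears later in this article):

```csharp
using System;
using System.Diagnostics;
using Brainboxes.IO;

// Time a single command/response round trip to an ED device.
// "192.168.16.100" is a placeholder -- substitute your device's IP address.
var ed = new EDDevice(new TCPConnection("192.168.16.100", 9500, 10000), new ASCIIProtocol());
ed.Connect();

long start = Stopwatch.GetTimestamp();
ed.SendCommand("@AA"); // the @AA command: read all digital IO line states
long elapsedTicks = Stopwatch.GetTimestamp() - start;

// Stopwatch.Frequency is ticks per second, so this converts ticks to milliseconds.
Console.WriteLine($"Round trip: {elapsedTicks * 1000.0 / Stopwatch.Frequency:F3} ms");

ed.Disconnect();
```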
Why Latency Matters for Industrial IO
Control Loop Timing
Industrial IO applications often run in tight control loops: read a sensor, make a decision, actuate an output. The response time of each IO operation directly limits how fast the control loop can run. A device that completes an IO round trip in 0.5 ms can theoretically support 2,000 control cycles per second.
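As a rough sketch of such a loop, assuming `ed` is a connected `EDDevice` (the decision logic here is purely illustrative):

```csharp
// Illustrative read-decide-actuate loop. With one 0.5 ms IO round trip per
// cycle the loop tops out around 2,000 cycles per second; each additional
// IO operation per cycle lowers that ceiling.
bool running = true;
while (running)
{
    int sensor = ed.Inputs[0].Value;         // read a digital input (IO round trip)
    bool activate = sensor == 1;             // decide (illustrative logic)
    ed.Outputs[0].Value = activate ? 1 : 0;  // actuate an output (another round trip)
}
```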
Determinism and Predictability
For many industrial applications, it is not just the average response time that matters but the worst-case response time. A system that averages 1 ms but occasionally spikes to 500 ms may be unsuitable for safety-critical or time-sensitive applications. Predictable, bounded response times are essential.
PLC and SCADA Integration
When a Brainboxes device acts as remote IO for a PLC or SCADA system, the device's response time becomes part of the PLC's scan cycle. If the IO device responds slowly, the PLC must either extend its scan cycle or risk communication timeouts. Faster and more consistent IO response times allow tighter PLC scan cycles and more responsive control.
What Affects Response Time
A command travels through many layers on its way from your application to the IO hardware and back. Each layer adds time to the overall response.
| Segment | What Happens |
|---|---|
| Application layer | Your C# code calls ed.SendCommand() or reads ed.Inputs[0].Value |
| Software stack | .NET runtime serialises the call, OS TCP/IP stack builds packets, NIC driver queues them |
| PC NIC hardware | The Ethernet NIC transmits the packet onto the wire |
| Ethernet network | The packet travels through cables, switches, and routers |
| Brainboxes hardware | The device's Ethernet interface receives and buffers the packet |
| Brainboxes firmware | The command is parsed and the appropriate IO operation is executed |
| IO hardware | Physical reading of inputs or setting of outputs (relays, ADCs, DACs) |
| Return path | The response travels the same chain in reverse back to your application |
Returning to the Bus Analogy
Just as many factors affect a bus journey, many factors affect response time. Consider what can slow a bus down:
| Bus Journey | System Equivalent |
|---|---|
| Traffic lights and junctions | Network switches and routers -- each hop adds processing time |
| Road works | Network congestion, packet retransmissions, or TCP retries |
| Driver's familiarity with the route | Firmware optimisation and protocol efficiency on the device |
| Bus engine maintenance | Hardware condition -- NIC performance, cable quality |
| Time of day and busyness of roads | Network load -- other devices competing for bandwidth |
| Weather | Electromagnetic interference, environmental conditions affecting signal quality |
On a good day with clear roads, your bus arrives quickly and predictably. On a bad day, any single factor can slow the whole journey. The same applies to system latency: every component in the chain contributes, and any one of them can introduce delay.
The total response time is the sum of all these segments in both directions. In a well-configured test environment -- direct cable, static IP, no switches -- network latency is minimised, so the measured time primarily reflects the device's own processing and IO hardware performance.
Latency Does Not Follow a Normal Distribution
A common assumption is that if the average response time is 0.5 ms, most responses will be close to 0.5 ms, with a symmetrical spread above and below. This would be true if latency followed a normal (bell curve) distribution. But it does not.
```
     Normal Distribution              Long-Tail Distribution (Latency)

           ▄███▄                      █
          ▐█████▌                     █▌
         ▐███████▌                    ██
        ▄█████████▄                   ██▌
       ▄█████████████▄                ███
      ▄█████████████████▄             ███▌
     █████████████████████            ████▄▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
     ─────────┼─────────              ──────────────────────────▶
         mean = median                median    99%   99.99%  max
                                      ◀──────── long tail ────────▶

   Symmetrical: most values           Skewed: most values are fast,
   cluster around the mean            but outliers can be 10-100x slower
```
In practice, latency distributions have long tails. The vast majority of responses are fast and tightly clustered, but a small number of outliers take much longer -- sometimes 10x, 50x, or even 100x slower than the median.
Why Outliers Matter
In an industrial system processing millions of IO operations, even a tiny percentage of slow responses can cause real problems. Consider: if 99.99% of responses complete in under 1 ms but the remaining 0.01% take 35 ms, that means every million operations includes about 100 operations that are 70 times slower than normal. For systems where every cycle must complete within a deadline, these outliers define the true worst-case behaviour of the system.
A small number of outliers often has a disproportionate effect on overall system performance. It is not the typical case that causes failures -- it is the exceptional one.
Mean and Standard Deviation Are Not Enough
Because the distribution is not normal, the mean and standard deviation do not give a useful picture of latency behaviour. What you need is a way to see the full percentile distribution -- especially at the extreme tail. You need to be able to answer questions like: "What response time did 99.99% of my requests complete within?" and "How bad is my worst 1-in-a-million case?"
For example, the ED-588 with a single connection shows a mean of 0.496 ms and a standard deviation of just 0.085 ms. These numbers suggest very tight, consistent performance. But looking deeper: 99.9% of requests complete within 0.645 ms, while at 99.999% (five nines) the latency jumps to 11 ms, and the absolute maximum across 10 million requests is 35 ms. The mean alone hides this 70x jump entirely.
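This is exactly the kind of question a percentile histogram answers directly. A sketch using the HdrHistogram NuGet package (introduced in detail below), assuming latencies were recorded in Stopwatch ticks as in the test code later in this article:

```csharp
using System;
using System.Diagnostics;
using HdrHistogram;

// Ask the tail questions directly instead of relying on mean and stddev.
var histogram = new LongHistogram(1, TimeStamp.Hours(1), 3);
// ... record latencies with histogram.RecordValue(elapsedTicks) ...

double ticksPerMs = Stopwatch.Frequency / 1000.0;
Console.WriteLine($"median:  {histogram.GetValueAtPercentile(50) / ticksPerMs:F3} ms");
Console.WriteLine($"p99.9:   {histogram.GetValueAtPercentile(99.9) / ticksPerMs:F3} ms");
Console.WriteLine($"p99.999: {histogram.GetValueAtPercentile(99.999) / ticksPerMs:F3} ms");
Console.WriteLine($"max:     {histogram.GetMaxValue() / ticksPerMs:F3} ms");
```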
The Compounding Effect of Latency
The tail latency problem gets worse when a system must make multiple requests to complete a single operation. The end user does not experience the median latency of an individual request -- they experience the worst request in the chain.
A Simple Example: 10 Friends Meeting for Coffee
Imagine 10 friends agree to meet at a coffee shop at 10am. Each takes a different bus, planning their journey based on the typical (median) travel time. They all expect to arrive right on time.
What are the chances all 10 arrive on time?
If each friend has a 95% chance of being on time -- a pretty good bus service -- the probability that all 10 arrive on time is:
0.95^10 = 59.9%
Two times out of five, someone is late. Even with a 95% reliable journey, the group only has a 60% chance of all being there on time. With a more realistic 90% on-time rate per person, it drops to:
0.90^10 = 34.9%
Two times out of three, the group is waiting for someone. The group's experience is not the median -- it is dominated by the slowest individual.
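The arithmetic is easy to check -- a quick sketch (the helper name is illustrative):

```csharp
using System;

// Probability that all n independent journeys are on time,
// given each individual is on time with probability p.
static double AllOnTime(double p, int n) => Math.Pow(p, n);

Console.WriteLine(AllOnTime(0.95, 10)); // 0.5987... -> ~59.9%
Console.WriteLine(AllOnTime(0.90, 10)); // 0.3487... -> ~34.9%
```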
How This Applies to Systems
Consider a system that must make 100 internal IO requests to present one complete result to the user. The user does not experience the median latency of a single request. They experience the worst of those 100 requests. If a typical user interaction involves viewing 5 such results, they are exposed to the worst of 500 individual requests. Most users will experience worse than the 99th percentile of individual request latency: the chance that all 500 requests stay below the 99th percentile is 0.99^500 ≈ 0.7%.
An Industrial Example
An industrial control system makes 50 remote IO requests per cycle and runs 2,000 cycles per day. That is 100,000 individual IO requests per day.
If 99.99% of requests complete within 1 ms but the remaining 0.01% take 35 ms, that means roughly 10 slow requests per day. For a factory operator who needs tasks to complete within tight time windows, experiencing several noticeable delays every day is almost certain.
The probability of seeing zero slow requests in a day:
0.9999^100,000 ≈ e^-10 ≈ 0.005% -- effectively zero
This is why tail latency matters so much, and why measuring at the 99.99th percentile and beyond is not academic -- it is the reality your users and operators experience every day.
HDR Histogram: A Better Way to Measure Latency
HDR Histogram (High Dynamic Range Histogram) is a data structure designed by Gil Tene to record and analyse latency distributions with precision across a very wide range of values. Unlike a simple average or a standard histogram with fixed-width bins, an HDR Histogram preserves detail at every percentile -- from the median all the way out to the 99.9999th percentile and beyond.
This matters because it lets you answer the questions that actually determine system behaviour: not just "what is the average?", but "what response time did 99.99% of requests complete within?" and "how does my tail latency compare at different concurrency levels?"
Gil Tene's talk How NOT to Measure Latency is essential viewing for anyone working with latency-sensitive systems. It explains why common measurement approaches produce misleading results and how HDR Histograms solve the problem.
The HDR Histogram project provides implementations in many languages including C# (via the HdrHistogram NuGet package). Test results are saved as .hgrm files, which can be plotted interactively using the HdrHistogram online plotter.
How to Read an HDR Percentile Plot
HDR percentile plots have the percentile on the X-axis (scaled logarithmically toward 100%) and latency on the Y-axis. A flat horizontal line means consistent, predictable performance. An upward curve at the right-hand side reveals tail latency -- the behaviour of the slowest responses. The further right the line stays flat before curving up, the more "nines" of consistent performance the system delivers.
Brainboxes Response Time Performance
Brainboxes remote IO devices deliver robust, predictable response times to a very high number of nines. Here are the results from an ED-588 running the @AA command (read all digital IO line states) over the ASCII protocol:
| Concurrent Connections | Mean (ms) | 99th %ile (ms) | 99.99th %ile (ms) | Max (ms) | Total Requests |
|---|---|---|---|---|---|
| 1 | 0.496 | ~0.588 | ~0.866 | 35.123 | 10,000,000 |
| 2 | 0.642 | ~0.805 | ~10.758 | 11.053 | 100,000 |
| 4 | 1.292 | ~1.487 | ~25.843 | 41.677 | 100,000 |
| 8 | 2.487 | ~24.307 | ~79.360 | 99.277 | 100,000 |
Key observations:
- Sub-millisecond median: With a single connection, the median response time is 0.495 ms -- under half a millisecond.
- Extremely tight distribution: From the 50th to the 99th percentile, latency increases by only 0.093 ms (from 0.495 to 0.588 ms). 99 out of 100 requests complete within 19% of the median.
- Very high nines: At 99.9% (three nines), latency is still only 0.645 ms. At 99.99% (four nines), it is 0.866 ms -- still under 1 ms.
- Graceful concurrency scaling: With 2 concurrent connections, the mean rises to 0.642 ms (1.3x). With 4, it reaches 1.292 ms (2.6x). The near-linear scaling shows the device handles concurrent load predictably.
Test Setup
The results above were collected under controlled conditions designed to isolate the device's own response time performance.
- Software: C# test application using the Brainboxes.IO .NET API with the HdrHistogram NuGet package for latency recording.
- Network: The PC's Ethernet NIC is connected directly to the Brainboxes device with a single Ethernet cable. Both use static IP addresses. No switches, routers, or other network devices are in the path.
- Protocol: ASCII protocol on TCP port 9500.
- Timing resolution: `Stopwatch.GetTimestamp()` provides 100-nanosecond resolution -- the highest available in .NET.
- Test duration: The single-connection test runs 10 million iterations. Multi-connection tests run 100,000 iterations each. The 10 million iteration test provides extremely high precision at the tail percentiles (see the sketch after this list).
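To see why the iteration count matters, consider how many samples actually land beyond a given percentile -- a quick sketch (the helper name is illustrative):

```csharp
using System;

// Number of samples that fall beyond a given percentile in an n-sample run.
// A tail estimate is only trustworthy when it rests on enough observations.
static double SamplesBeyond(double percentile, long n) => (1 - percentile / 100.0) * n;

Console.WriteLine(SamplesBeyond(99.999, 10_000_000)); // 100 samples define the tail
Console.WriteLine(SamplesBeyond(99.999, 100_000));    // 1 -- far too few to trust
```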
The Test Code
The Brainboxes.IO .NET API is open source. You can download the code from GitHub and run these latency tests against your own Brainboxes devices using LatencyTestASCII.cs.
Here are the key portions that show how the measurements work.
Histogram Setup
```csharp
// A Histogram covering the range from 1 tick (100ns) to 1 hour
// with a resolution of 3 significant figures:
var histogram = new LongConcurrentHistogram(1, TimeStamp.Hours(1), 3);
```
LongConcurrentHistogram is thread-safe, so it can be written to from multiple parallel connections simultaneously. The range covers 1 tick (100 ns) up to 1 hour, with 3 significant digits of precision -- meaning values are recorded with 0.1% accuracy across the entire range.
The Measurement Loop
(View on GitHub -- lines 221-254)
```csharp
Parallel.For(0, parallelRequests, (p) =>
{
    EDDevice ed = new EDDevice(new TCPConnection(ip, port, 10000), new ASCIIProtocol());
    ed.Connect();
    for (int i = 0; i < requestsPerLoop; i++)
    {
        string newCommand = nextCommand();
        long startTimestamp = Stopwatch.GetTimestamp();
        ed.SendCommand(newCommand);
        long elapsed = Stopwatch.GetTimestamp() - startTimestamp;
        histogram.RecordValue(elapsed);
    }
    ed.Disconnect();
});
```
Each parallel connection creates its own EDDevice with a separate TCPConnection and ASCIIProtocol. The timing wraps only the SendCommand() call -- the full synchronous send-and-receive round trip. The elapsed time in ticks is recorded directly into the concurrent histogram.
Output
(View on GitHub -- lines 257-267)
```csharp
// 'writer' is a TextWriter (e.g. a StringWriter) created earlier in the test method
histogram.OutputPercentileDistribution(
    writer: writer,
    outputValueUnitScalingRatio: OutputScalingFactor.TimeStampToMilliseconds);
string fileName = $"{deviceType}-{commandName}-{parallelRequests}x.hgrm";
File.WriteAllText(fileName, writer.ToString());
```
The histogram outputs its full percentile distribution with values scaled to milliseconds and writes the result to a .hgrm file. These files can be uploaded directly to the HdrHistogram online plotter for visualisation.
Command Generators
(View on GitHub -- full test file)
The test suite exercises several different ASCII commands:
| Test Method | Command | Description |
|---|---|---|
| `AtAALatencyTest` | `@AA` | Read all digital IO line states |
| `AtAADDLatencyTest` | `@AA(Data)` | Set digital outputs to random values |
| `HashAANLatencyTest` | `#AAN` | Read digital input counter for a random channel |
| `AA1N0FLatencyTest` | `#AA1N0F` | Set all 4 analog outputs to random voltages (ED-560) |
| `HashAA1NVVDataAnalogSetMultipleLines` | `#AA1NVVData` | Set multiple analog output channels simultaneously |
Each command generator produces a new command string for every iteration. Commands that accept data parameters use random values to exercise different code paths on the device firmware.
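For illustration only, a generator for the @AA(Data) test might look like the sketch below -- the two-hex-digit payload format for an 8-channel device is an assumption here, not the confirmed wire format:

```csharp
using System;

// Hypothetical command generator: a fresh random output pattern each
// iteration, so consecutive commands exercise different firmware paths.
// The "@AA" prefix and two-hex-digit payload are illustrative assumptions.
var rng = new Random();
Func<string> nextCommand = () => $"@AA{rng.Next(0, 256):X2}"; // e.g. "@AA3F"
```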
Reading the Results
The `.hgrm` output files contain the full percentile distribution. Here is a condensed excerpt from `ED-588-@AA-1x.hgrm` (10 million requests, single connection):
```
       Value     Percentile TotalCount 1/(1-Percentile)

       0.429 0.000000000000          1           1.00
       0.457 0.100000000000    1017078           1.11
       0.495 0.500000000000    5019833           2.00
       0.537 0.900000000000    9013591          10.00
       0.588 0.990625000000    9907099         106.67
       0.645 0.999023437500    9990355        1024.00
       0.866 0.999877929688    9998785        8192.00
      11.162 0.999992370605    9999924      131072.00
      35.123 0.999999904633   10000000    10485760.00

#[Mean = 0.496, StdDeviation = 0.085]
#[Max = 35.123, Total count = 10000000]
```
Each column tells you something specific:
| Column | Meaning |
|---|---|
| Value | The response time in milliseconds at this percentile |
| Percentile | The fraction of all requests that completed at or below this value |
| TotalCount | The cumulative number of requests at or below this value |
| 1/(1-Percentile) | The "one in N" ratio -- how many requests you would need to send before seeing one slower than this value. At the 99.9th percentile (1,024), one in every 1,024 requests exceeds 0.645 ms |
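That last column is simply 1 / (1 - percentile), which also gives the expected number of slow requests in a run of any size -- a quick sketch (the helper name is illustrative):

```csharp
using System;

// "One in N" ratio for a percentile, plus the expected count of requests
// slower than that percentile's value in a run of a given size.
static double OneInN(double percentile) => 1.0 / (1.0 - percentile);

double ratio = OneInN(0.9990234375);         // 1024: one request in every 1,024
double slowInRun = 10_000_000 / ratio;       // ~9,766 of 10M exceed 0.645 ms
Console.WriteLine($"1 in {ratio:F0}; ~{slowInRun:F0} slow requests in 10M");
```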
The HDR Histogram Graph

This graph shows the latency percentile distribution for the ED-588 @AA command with 1, 2, and 4 concurrent connections.
The X-axis shows the percentile on a logarithmic scale stretching from 0% to 99.99999%. The Y-axis shows latency in milliseconds.
Key observations:
- The 1x line (blue) stays flat near 0.5 ms for the first 99.99% of all requests, only rising at the extreme tail to reach 35 ms at the absolute maximum. This flatness demonstrates remarkably consistent performance across 10 million operations.
- The 2x line (orange) runs slightly higher (~0.6 ms) and also remains flat until the extreme tail, reaching ~11 ms.
- The 4x line (red) runs at ~1.3 ms and shows a more gradual rise starting around the 99th percentile, where contention between the four concurrent connections becomes visible.
- The fact that all three lines remain flat for so long -- well past 99% and even past 99.99% for the single connection case -- demonstrates the device's consistent and predictable performance under load.
Comparison to Other Industrial IO Systems
To put these results in context, here is how Brainboxes response times compare to other common industrial IO approaches:
| System Type | Typical Response Time | Notes |
|---|---|---|
| Brainboxes ED-588 (direct Ethernet) | 0.5 ms median, <1 ms at 99th %ile | ASCII protocol over direct Ethernet cable |
| PLC local IO backplane | 0.01 - 0.1 ms | Fastest option, but requires physical co-location |
| Modbus TCP over managed network | 2 - 10 ms | Standard industrial Ethernet with managed switches |
| Modbus RTU (serial RS-485) | 5 - 50 ms | Depends on baud rate and bus utilisation |
| Wi-Fi or wireless IO | 10 - 100+ ms | Highly variable, not suitable for fast control loops |
| Cloud/MQTT-based IO | 50 - 500+ ms | Round trip through internet infrastructure |
Brainboxes devices on a direct Ethernet connection achieve response times approaching those of dedicated fieldbus systems, while maintaining the flexibility and cost advantage of standard Ethernet TCP/IP. The sub-millisecond median with consistent tail latency makes them well suited for demanding real-time control applications.
The test results above use a direct cable connection. In a production deployment with network switches, response times will be somewhat higher but typically remain well under 5 ms on a properly configured industrial Ethernet network.
Published Results for All Devices
Brainboxes will be publishing HDR Histogram response time results for all IO devices in the ED range, covering both the ASCII protocol (TCP port 9500) and Modbus TCP protocol (TCP port 502). This will give a complete picture of device performance across both communication methods.
Currently available results cover the ED-588 (digital IO, @AA command) and ED-560 (analog output, #AA1N0F command) over ASCII, with Modbus TCP results and additional devices to follow.
Further Reading
- Gil Tene - How NOT to Measure Latency -- essential viewing on why percentile distributions matter
- HDR Histogram -- the HDR Histogram project
- HdrHistogram Plotter -- upload `.hgrm` files to visualise results interactively
- Protocols Guide -- details on the ASCII and Modbus TCP protocols used in testing
- Architecture Overview -- how the library's connection and stream layers work