swmike@swm.pp.se (Mikael Abrahamsson) writes:
... Back in the 10 megabit/s days, there were switches that did cut-through: if the output port was idle the instant the packet came in, the switch could start sending the packet on the outgoing port before it had been completely received on the incoming port (once the header was in, the forwarding decision was made and transmission could begin).
had packet sizes scaled with LAN transmission speed, i would agree. but the serialization time for 1500 bytes at 10Mbit/s was ~1.2ms, and went down by a factor of 10 for FastE (~120us), another factor of 10 for GigE (~12us), and another factor of 10 for 10GE (~1.2us). even those of us using jumbograms are getting less serialization delay at 10GE (~7us) than we used to get on a DEC LANbridge 100, which did cut-through after the header (~28us).
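a quick python sketch of that arithmetic (serialization delay is just frame size over line rate; it reproduces the figures above, the LANbridge number being a quoted figure rather than something computed here):

    # serialization delay = frame size in bits / line rate in bits per second
    def serialization_us(frame_bytes, link_bps):
        return frame_bytes * 8 / link_bps * 1e6   # microseconds

    for name, bps in [("10Mbit", 10e6), ("FastE", 100e6), ("GigE", 1e9), ("10GE", 10e9)]:
        print(f"{name:>6}: 1500 bytes = {serialization_us(1500, bps):7.1f} us")
    print(f"  10GE: 9000-byte jumbogram = {serialization_us(9000, 10e9):.1f} us")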
..., it's the store-and-forward architecture used in all modern equipment (that I know of). A packet has to be completely taken in over the wire into a buffer, a lookup has to be done as to where the packet should go, it needs to be sent over a bus or fabric, and then it has to be clocked out of another buffer on the outgoing port. This adds latency at each switch hop along the way.
you may be right about TCAM lookup times having an impact; i don't know whether they've kept pace with transmission speed either. but as someone theorized here yesterday, software (kernel and IP stack) architecture is more likely to be at fault: there are still plenty of "queue it here, it'll go out the next time the device or timer interrupt handler fires" designs, and that delay can be in the ~1ms or even ~10ms range. this doesn't show up on file transfer benchmarks, since packet trains usually do well, but miss an ACK, or send a ping, and you'll see a shelf.
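a hedged sketch of how those pieces add up; the per-hop lookup and fabric numbers below are invented placeholders, not vendor data, but they show why one deferred transmit in the host stack can outweigh a whole path of store-and-forward hops:

    # one-way latency through N store-and-forward hops, in microseconds.
    # lookup_us and fabric_us are illustrative guesses, not measurements.
    def hop_latency_us(frame_bytes, link_bps, lookup_us=1.0, fabric_us=2.0):
        serialization_us = frame_bytes * 8 / link_bps * 1e6
        return serialization_us + lookup_us + fabric_us

    hops = 6
    per_hop = hop_latency_us(1500, 10e9)      # ~4.2 us per hop at 10GE
    print(f"{hops} store-and-forward hops: {hops * per_hop:.1f} us")

    # compare with one "wait for the next timer/device interrupt" in a host
    timer_tick_us = 1000.0                    # a 1 ms tick, as described above
    print(f"one deferred transmit in the stack: {timer_tick_us:.1f} us")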
As Adrian Chadd mentioned in the email sent after yours, this can of course be addressed by modifying existing protocols, or creating new ones, that take latency into account. It's just that, with what is available today, this is a problem. Each directory listing or file access takes a bit longer over NFS with added latency, and this reduces performance with current protocols.
here again it's not just the protocols, it's the application design, that has to be modernized. i've written plenty of code that tries to cut down the number of bytes of RAM that get copied or searched, which ends up not going faster on modern CPUs (or sometimes going slower) because of the minimum transfer size between L2 and DRAM. similarly, a program that sped up on a VAX 780 when i taught it to match the size domain of its disk I/O to the 512-byte size of a disk sector either fails to go faster on modern high-bandwidth I/O and log-structured file systems, or actually goes slower. in other words you don't need NFS/SMB, or E-O-E, or the WAN, to erode what used to be performance gains through efficiency. there's enough new latency (expressed as a factor of clock speed) in the path to DRAM, the path to SATA, and the path through ZFS, to make it necessary for any application that wants modern performance to be re-oriented toward a modern (which in this case means streaming) approach. correspondingly, applications which take this approach don't suffer as much when they move from SATA to NFS or iSCSI.
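a sketch of that point about matching the I/O size domain; the per-request overheads and bandwidths below are assumptions standing in for a syscall, a ZFS record fetch, or an NFS round trip, not measurements:

    # time to move total_bytes when every request pays a fixed overhead.
    # per_request_us and bandwidth figures are illustrative assumptions.
    def transfer_ms(total_bytes, request_bytes, per_request_us, bandwidth_bps):
        requests = total_bytes / request_bytes
        wire_ms = total_bytes * 8 / bandwidth_bps * 1e3
        return wire_ms + requests * per_request_us / 1e3

    one_gb = 1 << 30
    for req in (512, 128 * 1024, 1 << 20):
        local = transfer_ms(one_gb, req, per_request_us=5, bandwidth_bps=3e9)      # local SATA-ish
        remote = transfer_ms(one_gb, req, per_request_us=500, bandwidth_bps=1e9)   # NFS/iSCSI-ish
        print(f"{req:>8}-byte requests: local ~{local:,.0f} ms, remote ~{remote:,.0f} ms")

the streaming (large-request) version wins in both columns; it just wins by a lot more once a round trip is hiding behind every request.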
Programmers who do client/server applications are starting to notice this, and I know of companies that put latency-inducing applications on the development servers so that programmers are exposed to the same conditions in the development environment as in the real world. For some, this means writing more advanced SQL queries that get everything done in a single query, instead of issuing several queries and adjusting the later ones based on the results of the first.
while i agree that turning one's SQL into transactions that are more like applets will take better advantage of modern hardware and software architecture (which in this case means streaming), it's also necessary to teach our SQL servers that ZFS "recordsize=128k" means what it says, for file system reads and writes. (by "more like applets" i mean, for example, sending over the content for a potential INSERT that may not happen depending on some SELECT, because the end-to-end delay of waiting for the SELECT result costs far more than the bandwidth wasted by occasionally sending a useless INSERT.) a lot of SQL users who have moved to a streaming model using a lot of transactions have merely seen their bottleneck move from the network into the SQL server.
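a minimal sketch of that "applet" shape, using python's sqlite3 module as a stand-in for a remote SQL server; the table and column names are made up for the example:

    import sqlite3

    # sqlite3 in memory is just a stand-in; imagine each execute() costing a WAN RTT.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (login TEXT PRIMARY KEY, quota INTEGER)")

    def add_user_chatty(conn, login, quota):
        # two round trips: wait for the SELECT answer before deciding to INSERT
        row = conn.execute("SELECT 1 FROM users WHERE login = ?", (login,)).fetchone()
        if row is None:
            conn.execute("INSERT INTO users (login, quota) VALUES (?, ?)", (login, quota))

    def add_user_streaming(conn, login, quota):
        # one round trip: ship the conditional INSERT and let the server decide;
        # the values may get thrown away, but nobody sat idle waiting to find out
        conn.execute(
            "INSERT INTO users (login, quota) "
            "SELECT ?, ? WHERE NOT EXISTS (SELECT 1 FROM users WHERE login = ?)",
            (login, quota, login),
        )

    add_user_streaming(conn, "mikael", 100)
    add_user_streaming(conn, "mikael", 100)   # second call is a harmless no-op
    print(conn.execute("SELECT * FROM users").fetchall())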
Also, protocols such as SMB and NFS that use message blocks over TCP have to be abandoned and replaced with real streaming protocols and large window sizes. Xmodem wasn't a good idea back then, and it's not a good idea now (even though the blocks are now larger than the 128 bytes of 20-30 years ago).
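a rough back-of-the-envelope on why block-at-a-time transfer hits a ceiling that a windowed stream doesn't; the RTT, block sizes, and line rate below are illustrative assumptions:

    # throughput ceiling for a stop-and-wait protocol vs a windowed stream,
    # ignoring loss; all numbers are illustrative, not measurements.
    def stop_and_wait_mbps(block_bytes, rtt_s):
        return block_bytes * 8 / rtt_s / 1e6

    def windowed_mbps(window_bytes, rtt_s, line_rate_mbps):
        return min(line_rate_mbps, window_bytes * 8 / rtt_s / 1e6)

    rtt = 0.02                       # 20 ms wide-area RTT
    print(f"xmodem-style, 128-byte blocks: {stop_and_wait_mbps(128, rtt):8.3f} Mbit/s")
    print(f"block protocol, 64KB blocks:   {stop_and_wait_mbps(64 * 1024, rtt):8.1f} Mbit/s")
    print(f"streaming, 4MB window:         {windowed_mbps(4 << 20, rtt, 1000):8.1f} Mbit/s")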
i think xmodem and kermit moved enough total data volume (expressed as a factor of transmission speed) back in their day to deserve an honourable retirement. but i'd agree: if an application is moved to a new environment where everything (DRAM timing, CPU clock, I/O bandwidth, network bandwidth, etc.) is 10X faster, yet the application only runs 2X faster, then it's time to rethink more. but the culprit will usually not be new network latency. -- Paul Vixie