swmike@swm.pp.se (Mikael Abrahamsson) writes:
... Back in the 10 megabit/s days, there were switches that did cut-through: if the output port was idle the instant the packet came in, the switch could start sending the packet on the outgoing port before it had been completely received on the incoming port (once the header was in, the forwarding decision was made and transmission could begin).
had packet sizes scaled with LAN transmission speed, i would agree. but the serialization time for 1500 bytes at 10Mbit/s was ~1.2ms, and went down by a factor of 10 for FastE (~120us), another factor of 10 for GigE (~12us), and another factor of 10 for 10GE (~1.2us). even those of us using jumbograms are getting less serialization delay at 10GE (~7us) than we used to get on a DEC LANbridge 100, which did cut-through after the header (~28us).
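a quick python sketch of that arithmetic (serialization delay is just frame size over line rate; it reproduces the figures above, the LANbridge number being a quoted figure rather than something computed here):

    # serialization delay = frame size in bits / line rate in bits per second
    def serialization_us(frame_bytes, link_bps):
        return frame_bytes * 8 / link_bps * 1e6   # microseconds

    for name, bps in [("10Mbit", 10e6), ("FastE", 100e6), ("GigE", 1e9), ("10GE", 10e9)]:
        print(f"{name:>6}: 1500 bytes = {serialization_us(1500, bps):7.1f} us")
    print(f"  10GE: 9000-byte jumbogram = {serialization_us(9000, 10e9):.1f} us")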
..., it's the store-and-forward architecture used in all modern equipment (that I know of). A packet has to be completely taken in over the wire into a buffer, a lookup has to be done as to where the packet should go, it needs to be sent over a bus or fabric, and then it has to be clocked out of another buffer on the outgoing port. This adds latency at each switch hop along the way.
you may be right about TCAM lookup times having an impact; i don't know whether they've kept pace with transmission speed either. but as someone theorized here yesterday, software (kernel and IP stack) architecture is more likely to be at fault: there are still plenty of "queue it here, it'll go out the next time the device or timer interrupt handler fires" designs, and that delay can be in the ~1ms or even ~10ms range. this doesn't show up on file transfer benchmarks, since packet trains usually do well, but miss an ACK, or send a ping, and you'll see a shelf.
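a hedged sketch of how those pieces add up; the per-hop lookup and fabric numbers below are invented placeholders, not vendor data, but they show why one deferred transmit in the host stack can outweigh a whole path of store-and-forward hops:

    # one-way latency through N store-and-forward hops, in microseconds.
    # lookup_us and fabric_us are illustrative guesses, not measurements.
    def hop_latency_us(frame_bytes, link_bps, lookup_us=1.0, fabric_us=2.0):
        serialization_us = frame_bytes * 8 / link_bps * 1e6
        return serialization_us + lookup_us + fabric_us

    hops = 6
    per_hop = hop_latency_us(1500, 10e9)      # ~4.2 us per hop at 10GE
    print(f"{hops} store-and-forward hops: {hops * per_hop:.1f} us")

    # compare with one "wait for the next timer/device interrupt" in a host
    timer_tick_us = 1000.0                    # a 1 ms tick, as described above
    print(f"one deferred transmit in the stack: {timer_tick_us:.1f} us")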
As Adrian Chadd mentioned in the email sent after yours, this can of course be addressed by modifying existing protocols, or creating new ones, that take latency into account. It's just that, with what is available today, this is a problem. Each directory listing or file access takes a bit longer over NFS with added latency, and this reduces performance with current protocols.
here again it's not just the protocols, it's the application design, that has to be modernized. i've written plenty of code that tries to cut down the number of bytes of RAM that get copied or searched, which ends up not going faster on modern CPUs (or sometimes going slower) because of the minimum transfer size between L2 and DRAM. similarly, a program that sped up on a VAX 780 when i taught it to match the size domain of its disk I/O to the 512-byte size of a disk sector either fails to go faster on modern high-bandwidth I/O and log-structured file systems, or actually goes slower. in other words you don't need NFS/SMB, or E-O-E, or the WAN, to erode what used to be performance gains through efficiency. there's enough new latency (expressed as a factor of clock speed) in the path to DRAM, the path to SATA, and the path through ZFS, to make it necessary for any application that wants modern performance to be re-oriented toward a modern (which in this case means streaming) approach. correspondingly, applications which take this approach don't suffer as much when they move from SATA to NFS or iSCSI.
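a sketch of that point about matching the I/O size domain; the per-request overheads and bandwidths below are assumptions standing in for a syscall, a ZFS record fetch, or an NFS round trip, not measurements:

    # time to move total_bytes when every request pays a fixed overhead.
    # per_request_us and bandwidth figures are illustrative assumptions.
    def transfer_ms(total_bytes, request_bytes, per_request_us, bandwidth_bps):
        requests = total_bytes / request_bytes
        wire_ms = total_bytes * 8 / bandwidth_bps * 1e3
        return wire_ms + requests * per_request_us / 1e3

    one_gb = 1 << 30
    for req in (512, 128 * 1024, 1 << 20):
        local = transfer_ms(one_gb, req, per_request_us=5, bandwidth_bps=3e9)      # local SATA-ish
        remote = transfer_ms(one_gb, req, per_request_us=500, bandwidth_bps=1e9)   # NFS/iSCSI-ish
        print(f"{req:>8}-byte requests: local ~{local:,.0f} ms, remote ~{remote:,.0f} ms")

the streaming (large-request) version wins in both columns; it just wins by a lot more once a round trip is hiding behind every request.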
Programmers who do client/server applications are starting to notice this, and I know of companies that put latency-inducing applications on the development servers so that programmers are exposed to the same conditions in the development environment as in the real world. For some, this means writing more advanced SQL queries that get everything done in a single query, instead of issuing several queries and adjusting the later ones based on the results of the first.
while i agree that turning one's SQL into transactions that are more like applets will take better advantage of modern hardware and software architecture (which in this case means streaming), it's also necessary to teach our SQL servers that ZFS "recordsize=128k" means what it says, for file system reads and writes. (by "more like applets" i mean, for example, sending over the content for a potential INSERT that may not happen depending on some SELECT, because the end-to-end delay of waiting for the SELECT result costs far more than the bandwidth wasted by occasionally sending a useless INSERT.) a lot of SQL users who have moved to a streaming model using a lot of transactions have merely seen their bottleneck move from the network into the SQL server.
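a minimal sketch of that "applet" shape, using python's sqlite3 module as a stand-in for a remote SQL server; the table and column names are made up for the example:

    import sqlite3

    # sqlite3 in memory is just a stand-in; imagine each execute() costing a WAN RTT.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (login TEXT PRIMARY KEY, quota INTEGER)")

    def add_user_chatty(conn, login, quota):
        # two round trips: wait for the SELECT answer before deciding to INSERT
        row = conn.execute("SELECT 1 FROM users WHERE login = ?", (login,)).fetchone()
        if row is None:
            conn.execute("INSERT INTO users (login, quota) VALUES (?, ?)", (login, quota))

    def add_user_streaming(conn, login, quota):
        # one round trip: ship the conditional INSERT and let the server decide;
        # the values may get thrown away, but nobody sat idle waiting to find out
        conn.execute(
            "INSERT INTO users (login, quota) "
            "SELECT ?, ? WHERE NOT EXISTS (SELECT 1 FROM users WHERE login = ?)",
            (login, quota, login),
        )

    add_user_streaming(conn, "mikael", 100)
    add_user_streaming(conn, "mikael", 100)   # second call is a harmless no-op
    print(conn.execute("SELECT * FROM users").fetchall())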
Also, protocols such as SMB and NFS that use message blocks over TCP have to be abandoned and replaced with real streaming protocols and large window sizes. Xmodem wasn't a good idea back then, and it's not a good idea now (even though the blocks are now larger than the 128 bytes of 20-30 years ago).
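a rough back-of-the-envelope on why block-at-a-time transfer hits a ceiling that a windowed stream doesn't; the RTT, block sizes, and line rate below are illustrative assumptions:

    # throughput ceiling for a stop-and-wait protocol vs a windowed stream,
    # ignoring loss; all numbers are illustrative, not measurements.
    def stop_and_wait_mbps(block_bytes, rtt_s):
        return block_bytes * 8 / rtt_s / 1e6

    def windowed_mbps(window_bytes, rtt_s, line_rate_mbps):
        return min(line_rate_mbps, window_bytes * 8 / rtt_s / 1e6)

    rtt = 0.02                       # 20 ms wide-area RTT
    print(f"xmodem-style, 128-byte blocks: {stop_and_wait_mbps(128, rtt):8.3f} Mbit/s")
    print(f"block protocol, 64KB blocks:   {stop_and_wait_mbps(64 * 1024, rtt):8.1f} Mbit/s")
    print(f"streaming, 4MB window:         {windowed_mbps(4 << 20, rtt, 1000):8.1f} Mbit/s")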
i think xmodem and kermit moved enough total data volume (expressed as a factor of transmission speed) back in their day to deserve an honourable retirement. but i'd agree: if an application is moved to a new environment where everything (DRAM timing, CPU clock, I/O bandwidth, network bandwidth, etc.) is 10X faster, yet the application only runs 2X faster, then it's time to rethink more. but the culprit will usually not be new network latency. -- Paul Vixie