Re: latency (was: RE: cooling door)
Understandably, some applications fall into a class that requires very short distances for the reasons you cite, although I'm still not comfortable with the setup you've outlined. Why, for example, are you showing two Ethernet switches for the fiber option (which would naturally double the switch-induced latency), but only a single switch for the UTP option?

Now, I'm comfortable in ceding this point. I should have made allowances for this type of exception in my introductory post, but didn't, as I also omitted mention of other considerations for the sake of brevity. For what it's worth, propagation over copper is faster than propagation over fiber, as copper has a higher nominal velocity of propagation (NVP) rating than does fiber, but not so much greater as to cause the difference you've cited.

As an aside, the manner in which o-e-o and e-o-e conversions take place when transitioning from electronic to optical states, and back, affects latency differently across the differing link-assembly approaches used. In cases where 10Gbps or greater is being sent across a "multi-mode" fiber link in a data center or other in-building venue, for instance, "parallel optics" are most often used, i.e., multiple optical channels (either fibers or wavelengths) that undergo multiplexing and de-multiplexing (collectively: inverse multiplexing, or channel bonding) -- as opposed to a single fiber (or a single wavelength) operating at the link's rated wire speed.

By chance, is the "deserialization" you cited earlier perhaps related to this inverse-muxing process? If so, then that would explain the disconnect, and one shouldn't despair, because there is a direct path to avoiding it. In parallel optics, e-o and o-e processing is intensive at both ends of the 10G link, which has the effect of adding more latency than a single-channel approach would. Yet most of the TIA activity taking place today that is geared to increasing data rates over in-building fiber links continues to favor multi-mode and the use of parallel optics, as opposed to specifying single-mode supporting a single channel. But single-mode solutions are also available to those who dare to be different.

I'll look more closely at these issues and your original exception during the coming week, since they represent an important aspect of assessing the overall model. Thanks.

Frank A. Coluccio
DTI Consulting Inc.
212-587-8150 Office 347-526-6788 Mobile

On Sat Mar 29 20:30, Mikael Abrahamsson sent:
On Sat, 29 Mar 2008, Frank Coluccio wrote:
Please clarify. To which network element are you referring in connection with extended lookup times? Is it the collapsed optical backbone switch, or the upstream L3 element, or perhaps both?
I am talking about the fact that the following topology:
server - 5 meter UTP - switch - 20 meter fiber - switch - 20 meter fiber - switch - 5 meter UTP - server
has worse NFS performance than:
server - 25 meter UTP - switch - 25 meter UTP - server
Imagine bringing this into metro with 1-2ms delay instead of 0.1-0.5ms.
This is one of the issues that the server/storage people have to deal with.
-- Mikael Abrahamsson email: swmike@swm.pp.se
On Sat, 29 Mar 2008, Frank Coluccio wrote:
Understandably, some applications fall into a class that requires very short distances for the reasons you cite, although I'm still not comfortable with the setup you've outlined. Why, for example, are you showing two Ethernet switches for the fiber option (which would naturally double the switch-induced latency), but only a single switch for the UTP option?
Yes, I am showing a case where you have switches in each rack so each rack is uplinked with a fiber to a central aggregation switch, as opposed to having a lot of UTP from the rack directly into the aggregation switch.
Now, I'm comfortable in ceding this point. I should have made allowances for this type of exception in my introductory post, but didn't, as I also omitted mention of other considerations for the sake of brevity. For what it's worth, propagation over copper is faster than propagation over fiber, as copper has a higher nominal velocity of propagation (NVP) rating than does fiber, but not so much greater as to cause the difference you've cited.
The 2/3 speed of light in fiber, as opposed to the propagation speed in copper, was not what I had in mind.
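For concreteness, the propagation difference in numbers -- a rough sketch assuming typical NVP figures (~0.70c for Cat5e copper, ~0.67c for silica fiber; assumptions, not measurements of any particular cable):

```python
# Propagation delay over 25 m of copper vs 25 m of fiber.
C = 299_792_458.0  # speed of light in vacuum, m/s

def prop_delay_ns(length_m, nvp):
    """One-way propagation delay in nanoseconds for a medium with the given NVP."""
    return length_m / (nvp * C) * 1e9

for medium, nvp in (("UTP copper", 0.70), ("MM fiber", 0.67)):
    print(f"{medium}: {prop_delay_ns(25, nvp):.0f} ns over 25 m")
# ~119 ns vs ~124 ns: copper wins by about 5 ns, nowhere near enough to
# explain the NFS difference in the example topologies.
```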
As an aside, the manner in which o-e-o and e-o-e conversions take place when transitioning from electronic to optical states, and back, affects latency differently across the differing link-assembly approaches used. In cases where 10Gbps
My opinion is that the major factors in the added end-to-end latency in my example are that the packet has to be serialised three times as opposed to once, and that there are three lookups instead of one. Lookups take time, and putting the packet on the wire takes time. Back in the 10 megabit/s days, there were switches that did cut-through, i.e., if the output port was not being used the instant the packet came in, it could start to send out the packet on the outgoing port before it was completely taken in on the incoming port (when the header was received, the forwarding decision was taken and the equipment would start to send the packet out before it was completely received from the input port).
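A back-of-the-envelope model of the two topologies makes the same point; the per-switch lookup figure below is an assumed constant, not a vendor spec:

```python
# One-way latency for the two topologies, assuming GigE, 1500-byte frames,
# store-and-forward switches, and ~2e8 m/s signal speed in either medium.
FRAME_BITS = 1500 * 8
RATE_BPS = 1e9
SER = FRAME_BITS / RATE_BPS    # ~12 us to clock the frame onto a wire
LOOKUP = 5e-6                  # assumed lookup + fabric transit per switch

def one_way_us(segments_m, n_switches):
    prop = sum(m / 2e8 for m in segments_m)
    # the host serializes once; every store-and-forward switch does it again
    return (SER * (1 + n_switches) + LOOKUP * n_switches + prop) * 1e6

print(f"3 switches: {one_way_us([5, 20, 20, 5], 3):.1f} us one-way")
print(f"1 switch:   {one_way_us([25, 25], 1):.1f} us one-way")
# ~63 us vs ~29 us: the extra serializations and lookups dominate, while
# the cable medium contributes well under a microsecond at these lengths.
```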
By chance, is the "deserialization" you cited earlier, perhaps related to this inverse muxing process? If so, then that would explain the disconnect, and if it is so, then one shouldn't despair, because there is a direct path to avoiding this.
No, it's the store-and-forward architecture used in all modern equipment (that I know of). A packet has to be completely taken in over the wire into a buffer, a lookup has to be done as to where this packet should be put out, it needs to be sent over a bus or fabric, and then it has to be clocked out on the outgoing port from another buffer. This adds latency in each switch hop on the way.

As Adrian Chadd mentioned in the email sent after yours, this can of course be handled by modifying or creating new protocols that handle this fact. It's just that with what is available today, this is a problem. Each directory listing or file access takes a bit longer over NFS with added latency, and this reduces performance in current protocols.

Programmers who do client/server applications are starting to notice this, and I know of companies that put latency-inducing applications in the development servers so that the programmer is exposed to the same conditions in the development environment as in the real world. This means for some that they have to write more advanced SQL queries to get everything done in a single query, instead of issuing multiple queries and changing them depending on what the first query result was.

Also, protocols such as SMB and NFS that use message blocks over TCP have to be abandoned and replaced with real streaming protocols and large window sizes. Xmodem wasn't a good idea back then, and it's not a good idea now (even though the blocks now are larger than the 128 bytes of 20-30 years ago).

-- Mikael Abrahamsson email: swmike@swm.pp.se
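A toy model of why this compounds for chatty, one-operation-per-round-trip protocols (purely illustrative numbers):

```python
# An operation that waits for each reply before issuing the next pays one
# RTT per operation; a pipelined/streaming protocol pays roughly one RTT
# total plus the time to push the requests.
def chatty_s(n_ops, rtt_s):
    return n_ops * rtt_s                 # serialized request/response

def pipelined_s(n_ops, rtt_s, ops_per_s=10_000):
    return rtt_s + n_ops / ops_per_s     # requests kept in flight

for rtt_ms in (0.1, 0.5, 2.0):           # LAN, multi-hop LAN, metro
    rtt = rtt_ms / 1000
    print(f"RTT {rtt_ms:3.1f} ms: 1000 ops chatty {chatty_s(1000, rtt):.2f} s, "
          f"pipelined {pipelined_s(1000, rtt):.2f} s")
# At 2 ms RTT the chatty version takes 2 s for what pipelining does in ~0.1 s.
```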
swmike@swm.pp.se (Mikael Abrahamsson) writes:
... Back in the 10 megabit/s days, there were switches that did cut-through, i.e., if the output port was not being used the instant the packet came in, it could start to send out the packet on the outgoing port before it was completely taken in on the incoming port (when the header was received, the forwarding decision was taken and the equipment would start to send the packet out before it was completely received from the input port).
had packet sizes scaled with LAN transmission speed, i would agree. but the serialization time for 1500 bytes at 10MBit was ~1.2ms, and went down by a factor of 10 for FastE (~120us), another factor of 10 for GigE (~12us) and another factor of 10 for 10GE (~1.2us). even those of us using jumbo grams are getting less serialization delay at 10GE (~7us) than we used to get on a DEC LANbridge 100 which did cutthrough after the header (~28us).
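Spelled out as a quick sanity check of those figures:

```python
# Time to clock a frame onto the wire at each Ethernet generation.
def ser_us(nbytes, rate_bps):
    return nbytes * 8 / rate_bps * 1e6

for name, rate in (("10Mb ", 10e6), ("FastE", 100e6), ("GigE ", 1e9), ("10GE ", 10e9)):
    print(f"{name}: 1500 bytes -> {ser_us(1500, rate):8.2f} us")
print(f"10GE : 9000-byte jumbo -> {ser_us(9000, 10e9):.2f} us")
# 1200 us, 120 us, 12 us, 1.2 us, and ~7.2 us for jumbo frames at 10GE --
# matching the figures cited.
```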
..., it's the store-and-forward architecture used in all modern equipment (that I know of). A packet has to be completely taken in over the wire into a buffer, a lookup has to be done as to where this packet should be put out, it needs to be sent over a bus or fabric, and then it has to be clocked out on the outgoing port from another buffer. This adds latency in each switch hop on the way.
you may be right about the TCAM lookup times having an impact; i don't know if they've kept pace with transmission speed either. but per someone's theory here yesterday, software (kernel and IP stack) architecture is more likely to be at fault: there are still plenty of "queue it here, it'll go out next time the device or timer interrupt handler fires" paths, and these can be in the ~1ms or even ~10ms range. this doesn't show up on file transfer benchmarks since packet trains usually do well, but miss an ACK, or send a ping, and you'll see a shelf.
As Adrian Chadd mentioned in the email sent after yours, this can of course be handled by modifying or creating new protocols that handle this fact. It's just that with what is available today, this is a problem. Each directory listing or file access takes a bit longer over NFS with added latency, and this reduces performance in current protocols.
here again it's not just the protocols, it's the application design, that has to be modernized. i've written plenty of code that tries to cut down the number of bytes of RAM that get copied or searched, which ends up not going faster on modern CPUs (or sometimes going slower) because of the minimum transfer size between L2 and DRAM. similarly, a program that sped up on a VAX 780 when i taught it to match the size domain of its disk I/O to the 512-byte size of a disk sector either fails to go faster on modern high-bandwidth I/O and log-structured file systems, or actually goes slower. in other words you don't need NFS/SMB, or E-O-E, or the WAN, to erode what used to be performance gains through efficiency. there's plenty enough new latency (expressed as a factor of clock speed) in the path to DRAM, the path to SATA, and the path through ZFS, to make it necessary that any application that wants modern performance be re-oriented to take a modern (which in this case means streaming) approach. correspondingly, applications which take this approach don't suffer as much when they move from SATA to NFS or iSCSI.
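The local-I/O version of that argument can be seen with read sizes alone; a minimal sketch, where the file path is a placeholder:

```python
# Per-request overhead, not byte count, dominates small reads.
import time

def read_all(path, bufsize):
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(bufsize):
            pass
    return time.perf_counter() - t0

# On some large local file, compare:
#   read_all("/tmp/bigfile", 512)         # sector-sized reads, many syscalls
#   read_all("/tmp/bigfile", 128 * 1024)  # recordsize-scale reads, few syscalls
```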
Programmers who do client/server applications are starting to notice this, and I know of companies that put latency-inducing applications in the development servers so that the programmer is exposed to the same conditions in the development environment as in the real world. This means for some that they have to write more advanced SQL queries to get everything done in a single query, instead of issuing multiple queries and changing them depending on what the first query result was.
while i agree that turning one's SQL into transactions that are more like applets (such that, for example, you're sending over the content for a potential INSERT that may not happen depending on some SELECT, because the end-to-end delay of getting back the SELECT result is so much higher than the cost of the lost bandwidth from occasionally sending a useless INSERT) will take better advantage of modern hardware and software architecture (which means in this case, streaming), it's also necessary to teach our SQL servers that ZFS "recordsize=128k" means what it says, for file system reads and writes. a lot of SQL users who have moved to a streaming model using a lot of transactions have merely seen their bottleneck move from the network into the SQL server.
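A minimal sketch of that one-round-trip pattern, using sqlite3 as a stand-in for whatever SQL server is actually deployed; the table and column names are hypothetical:

```python
# Attach the condition to the INSERT so the server decides, instead of
# burning an RTT on the SELECT first.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT UNIQUE)")

# Instead of: SELECT ...; inspect the result client-side; then INSERT ...
# (two round trips), send one conditional statement (one round trip):
conn.execute(
    "INSERT INTO users (name) "
    "SELECT ? WHERE NOT EXISTS (SELECT 1 FROM users WHERE name = ?)",
    ("mikael", "mikael"),
)
conn.commit()
```

The occasional wasted INSERT payload costs bandwidth, which is cheap; the avoided round trip costs latency, which isn't.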
Also, protocols such as SMB and NFS that use message blocks over TCP have to be abandoned and replaced with real streaming protocols and large window sizes. Xmodem wasn't a good idea back then, and it's not a good idea now (even though the blocks now are larger than the 128 bytes of 20-30 years ago).
i think xmodem and kermit moved enough total data volume (expressed as a factor of transmission speed) back in their day to deserve an honourable retirement. but i'd agree, if an application is moved to a new environment where everything (DRAM timing, CPU clock, I/O bandwidth, network bandwidth, etc) is 10X faster, but the application only runs 2X faster, then it's time to rethink more. but the culprit will usually not be new network latency.

-- Paul Vixie
-----Original Message----- From: owner-nanog@merit.edu [mailto:owner-nanog@merit.edu] On Behalf Of Paul Vixie Sent: Sunday, March 30, 2008 10:35 AM To: nanog@merit.edu Subject: Re: latency (was: RE: cooling door)
swmike@swm.pp.se (Mikael Abrahamsson) writes:
Programmers who do client/server applications are starting to notice this, and I know of companies that put latency-inducing applications in the development servers so that the programmer is exposed to the same conditions in the development environment as in the real world. This means for some that they have to write more advanced SQL queries to get everything done in a single query, instead of issuing multiple queries and changing them depending on what the first query result was.
while i agree that turning one's SQL into transactions that are more like applets (such that, for example, you're sending over the content for a potential INSERT that may not happen depending on some SELECT, because the end-to-end delay of getting back the SELECT result is so much higher than the cost of the lost bandwidth from occasionally sending a useless INSERT) will take better advantage of modern hardware and software architecture (which means in this case, streaming), it's also necessary to teach our SQL servers that ZFS "recordsize=128k" means what it says, for file system reads and writes. a lot of SQL users who have moved to a streaming model using a lot of transactions have merely seen their bottleneck move from the network into the SQL server.
I have seen first-hand (I worked for a company and diagnosed issues with their applications from a network perspective, prompting a major re-write of the software) how developers work with their SQL servers, application servers, and clients all on the same L2 switch. They often do not duplicate the environment they are going to be deploying the application into, and therefore assume that the "network" is going to perform the same. So, when there are problems, they blame the network. Often the root problem is the architecture of the application itself and not the "network." All the servers and client workstations have Gigabit connections to the same L2 switch, and the developers are honestly astonished when there are issues running the same application over a typical enterprise network with clients of different speeds (10/100/1000, full and/or half duplex). Surprisingly, to me, they even expect the same performance out of a WAN.

Application developers today need a "network" guy on their team: one who can help them understand how their proposed application architecture would perform over various customer networks, and who can make suggestions as to how the architecture can be modified to allow the performance of the application to take advantage of the networks' capabilities. Mikael (seems to) complain that developers have to put latency-inducing applications into the development environment. I'd say that those developers are some of the few who actually have a clue, and are doing the right thing.
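One way such a team can give developers real-world conditions is a latency-injecting TCP proxy between client and server in the dev environment. A crude sketch only -- the hosts, ports, and fixed 20 ms delay are placeholders, not anyone's production setup:

```python
import socket
import threading
import time

ONE_WAY_DELAY = 0.020                 # 20 ms each way, ~40 ms RTT
LISTEN = ("127.0.0.1", 9000)          # clients connect here
TARGET = ("127.0.0.1", 5432)          # the real server lives here

def pump(src, dst):
    # copy bytes one way, delaying each chunk by the configured amount
    try:
        while (chunk := src.recv(65536)):
            time.sleep(ONE_WAY_DELAY)
            dst.sendall(chunk)
        dst.shutdown(socket.SHUT_WR)  # propagate EOF downstream
    except OSError:
        pass

server = socket.create_server(LISTEN)
while True:
    client, _ = server.accept()
    upstream = socket.create_connection(TARGET)
    threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
    threading.Thread(target=pump, args=(upstream, client), daemon=True).start()
```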
Also, protocols such as SMB and NFS that use message blocks over TCP have to be abandoned and replaced with real streaming protocols and large window sizes. Xmodem wasn't a good idea back then, and it's not a good idea now (even though the blocks now are larger than the 128 bytes of 20-30 years ago).
i think xmodem and kermit moved enough total data volume (expressed as a factor of transmission speed) back in their day to deserve an honourable retirement. but i'd agree, if an application is moved to a new environment where everything (DRAM timing, CPU clock, I/O bandwidth, network bandwidth, etc) is 10X faster, but the application only runs 2X faster, then it's time to rethink more. but the culprit will usually not be new network latency. -- Paul Vixie
It may be difficult to switch to a streaming protocol if the underlying data sets are block-oriented.

Fred Reimer, CISSP, CCNP, CQS-VPN, CQS-ISS
Senior Network Engineer
Coleman Technologies, Inc.
954-298-1697
On Sun, 30 Mar 2008, Fred Reimer wrote:
application to take advantage of the networks' capabilities. Mikael (seems to) complain that developers have to put latency-inducing applications into the development environment. I'd say that those developers are some of the few who actually have a clue, and are doing the right thing.
I was definitely not complaining; I brought it up as an example where developers have clue and where they're doing the right thing.

I've too often been involved in customer complaints which ended up being the fault of Microsoft SMB, with the customers having the firm idea that it must be a network problem, since MS is a world standard and that can't be changed. Even proposing to change TCP window settings to get FTP transfers quicker is met with the same scepticism.

Even after describing to them the propagation delay of light in fiber and the physical limitations, they're still very suspicious about it all.

-- Mikael Abrahamsson email: swmike@swm.pp.se
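The window arithmetic behind that advice, for reference (bandwidth-delay product; figures are illustrative):

```python
# A TCP sender can have at most one window of data in flight per RTT, so
# throughput tops out at window/RTT no matter how fast the link is.
def max_mbps(window_bytes, rtt_s):
    return window_bytes * 8 / rtt_s / 1e6

print(f"{max_mbps(64 * 1024, 0.020):.0f} Mb/s")   # 64 KiB window, 20 ms RTT
print(f"{max_mbps(64 * 1024, 0.100):.0f} Mb/s")   # same window, 100 ms RTT
print(f"{max_mbps(1 << 20, 0.100):.0f} Mb/s")     # 1 MiB window, 100 ms RTT
# 26, 5, and 84 Mb/s: on long paths the window, not the wire, is the limit.
```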
Thanks for the clarification; that's why I put the "seems to" in the reply.

Fred Reimer, CISSP, CCNP, CQS-VPN, CQS-ISS
Senior Network Engineer
Coleman Technologies, Inc.
954-298-1697
participants (4):

- Frank Coluccio
- Fred Reimer
- Mikael Abrahamsson
- Paul Vixie