On Sun, Mar 30, 2008, Mikael Abrahamsson wrote:
> On Sat, 29 Mar 2008, Frank Coluccio wrote:
>> Please clarify. To which network element are you referring in connection with extended lookup times? Is it the collapsed optical backbone switch, or the upstream L3 element, or perhaps both?
> I am talking about the fact that the following topology:
>
> server - 5 meter UTP - switch - 20 meter fiber - switch - 20 meter fiber - switch - 5 meter UTP - server
>
> has worse NFS performance than:
>
> server - 25 meter UTP - switch - 25 meter UTP - server
>
> Imagine bringing this into metro with 1-2ms delay instead of 0.1-0.5ms.
> This is one of the issues that the server/storage people have to deal with.
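To put rough numbers on why the extra round-trip time hurts so much: a single outstanding, strictly request/response NFS read moves at most one rsize worth of data per RTT. A minimal sketch in C of that bound (the 32 KB rsize and the RTT values are illustrative assumptions, not figures from this thread):

/* Upper bound on single-stream NFS read throughput when each read is a
 * synchronous request/response with no pipelining or readahead:
 * at most one rsize-worth of data per round trip. */
#include <stdio.h>

int main(void)
{
    const double rsize_bytes = 32.0 * 1024;          /* assumed NFS rsize */
    const double rtt_ms[] = { 0.1, 0.5, 1.0, 2.0 };  /* LAN vs. metro RTTs */
    size_t i;

    for (i = 0; i < sizeof(rtt_ms) / sizeof(rtt_ms[0]); i++) {
        double ops_per_sec = 1000.0 / rtt_ms[i];     /* one read per RTT */
        double mb_per_sec = ops_per_sec * rsize_bytes / (1024 * 1024);
        printf("RTT %.1f ms -> at most %.1f MB/s\n", rtt_ms[i], mb_per_sec);
    }
    return 0;
}

At 0.1 ms the bound is around 312 MB/s; at 2 ms it drops to roughly 16 MB/s, which is why extra switch hops and metro distances show up directly in NFS numbers unless the client keeps many requests in flight.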
That's because the LAN protocols need to be re-jiggled a little to start looking less like LAN protocols and more like WAN protocols. Similar things need to happen for applications.

I helped a friend debug an NFS throughput issue between some Linux servers running Fortran-77 based numerical analysis code and a 10GE storage backend. The storage backend can push 10GE without too much trouble, but the application wasn't poking the kernel in the right way (large fetches and prefetching, basically) to fully utilise the infrastructure.

Oh, and kernel hz tickers can have similar effects on network traffic if the application does dumb stuff. If you're (un)lucky you may see 1 or 2ms of delay between packet input and the application being scheduled to process it. That doesn't matter much over 250ms+ latency links, but it does on 0.1ms - 1ms links.

(Can someone please apply some science to this and publish best practices?)

adrian
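A minimal sketch of the kind of application-side hinting described above: telling the kernel the file will be read sequentially so readahead can hide some of the per-request latency, and reading in large chunks. The file name and the 1 MB chunk size are made-up examples, not details from the case described in the post:

#define _POSIX_C_SOURCE 200112L  /* for posix_fadvise */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd = open("results.dat", O_RDONLY);   /* hypothetical input file */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Hint that we will read sequentially and want the data prefetched,
     * so the kernel can issue large readahead requests ahead of us. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);

    /* Read in large chunks instead of many small records. */
    const size_t chunk = 1024 * 1024;
    char *buf = malloc(chunk);
    if (buf == NULL) {
        close(fd);
        return 1;
    }

    ssize_t n;
    while ((n = read(fd, buf, chunk)) > 0) {
        /* process buf[0..n) here */
    }

    free(buf);
    close(fd);
    return 0;
}

posix_fadvise is only a hint, and whether it helps over NFS depends on the client's readahead behaviour, so measuring before and after is the honest answer to the "apply some science" request.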