
Leo Bicknell writes:
However, if you put 15G down your "20G" path, you have no redundancy. In a cut, dropping 5G on the floor and causing 33% packet loss is not "up"; it might as well be down.
Sorry, it doesn't work like that either. 33% packet loss is an upper limit, not what you'd see in practice. The vast majority of traffic is responsive to congestion and will back off. It is difficult to predict the actual drop rate; it depends heavily on your traffic mix. A million "web mice" (short-lived web transfers) are much less elastic than a dozen bulk transfers. It is true that on average (over all bytes), *throughput* will go down by 33%. But this reduction will not be distributed evenly over all connections.

In an extremely benign case, 6G of the 20G could be 30 NNTP connections, each normally running at 200 Mb/s with a 50 ms RTT. A drop rate of just 0.01% will cause those connections to back off to roughly 20 Mb/s each (0.6 Gb/s total), freeing more than 5 Gb/s. This alone is more than enough to absorb the capacity reduction. All other connections will (absent other QoS mechanisms) see the same 0.01% loss, but this won't cause serious problems for most applications.

What users WILL notice is a sudden 200 ms standing queue caused by the overload. This is a case for using RED (or small router buffers). Another trick would be to preferentially drop "low-value" traffic, so that other users wouldn't experience loss (or even delay, depending on configuration) at all. Conversely, if you have a bounded amount of "high-value" traffic, you could configure protected resources for it.
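To make the NNTP numbers concrete, here is a rough sketch using the well-known Mathis et al. approximation for steady-state TCP throughput under random loss, BW ≈ (MSS / RTT) · C / √p. The choice of model is an assumption, not something stated above: it assumes Reno-style congestion avoidance, a 1460-byte MSS (also assumed), no receive-window limit, no timeouts, and C = √(3/2). The function name `mathis_throughput_bps` is purely illustrative.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float,
                          c: float = math.sqrt(3 / 2)) -> float:
    """Approximate TCP throughput in bit/s: (MSS / RTT) * C / sqrt(p).

    Assumes Reno-style congestion avoidance with random, independent loss;
    ignores timeouts and receive-window limits.
    """
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


# Assumed parameters: 1460-byte MSS (typical Ethernet); RTT, loss rate and
# connection count taken from the NNTP example.
MSS = 1460
RTT = 0.050           # 50 ms
LOSS = 0.0001         # 0.01 %
N_CONNECTIONS = 30
OFFERED_BEFORE = 6e9  # 30 x 200 Mb/s before the cut

per_conn = mathis_throughput_bps(MSS, RTT, LOSS)
total = per_conn * N_CONNECTIONS

print(f"per connection: {per_conn / 1e6:.1f} Mb/s")  # per connection: 28.6 Mb/s
print(f"{N_CONNECTIONS} connections: {total / 1e9:.2f} Gb/s")  # 30 connections: 0.86 Gb/s
print(f"capacity freed: {(OFFERED_BEFORE - total) / 1e9:.2f} Gb/s")
```

With C = √(3/2) this gives about 28.6 Mb/s per connection and 0.86 Gb/s for all 30; with C = 1 it is about 23 Mb/s per connection. Either way it is the same order of magnitude as the 20 Mb/s figure, and the more than 5 Gb/s the NNTP flows give up still covers the 5G shortfall from the cut.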
If your redundancy solution is at Layer 3, you need policies in place ensuring that you don't run much over 10G across your dual 10G links, or you're back to effectively giving up all redundancy.
The recommendation has a good core, but it's not that black and white. Suppose that whatever exceeds 10G is low-value, extremely congestion-responsive traffic. NNTP (server-to-server) and P2P file-sharing traffic are examples of this category. Both application types (NetNews and things like BitTorrent) even have application-level congestion responsiveness beyond what TCP itself provides: when a given connection has poor throughput, the application will prefer other, hopefully less congested, paths.
-- Simon.