unscientific multi-homing survey: results
Results of my unscientific straw poll of multi-homed operators, as requested by a few people. Some people gave multiple answers to some questions, and some didn't answer particular questions at all. Figures in brackets indicate the frequency of particular responses. Like I said, unscientific :) Thanks to those who sent me mail.

Joe

On Tue, Jun 26, 2001 at 09:15:09AM -0400, Joe Abley wrote:
If you currently multi-home between two or more providers using IPv4:
+ why do you do it? What are the high-level goals you are hoping to achieve? Are you finding that you are achieving them?
redundancy (3)
load-balancing (2)
"shorter paths", performance, better throughput for users (4)
geographic redundancy (2)
provider redundancy (3)
we are achieving these objectives (7)
+ how often does your traffic shift between transit providers due to a failure of some kind triggering a re-homing?
a couple of times per day (1)
once per week (1)
once per month (1)
a couple of times per month (2)
we don't notice when it shifts (1)
rarely (1)
once per six months per circuit (1)
+ how often do you manually shift traffic, and why? Just inbound traffic? Or outbound too?
never (1)
normally never (1)
once per month (1)
occasionally (1)
when we get complaints from specific netblocks/ASNs that we can fix (1)
when we need to due to transient heavy traffic loads (e.g. a promotion) (1)
rarely (three or four times a year) (1)
to rebalance costs after some kind of DoS attack (1)
(never) because we are overprovisioned with capacity and traffic shifting isn't necessary (1)
to optimise use of various peering links (1)
manually setting best paths for various ASes (where BGP doesn't see the best path) (1)
to avoid some specific poor connectivity on a short path (1)
both inbound and outbound (2)
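For context on the inbound/outbound asymmetry above: outbound shifts are the easy half because the BGP decision process consults LOCAL_PREF before AS_PATH length, and LOCAL_PREF is set locally. A rough Python sketch of just those first two decision steps, with made-up next-hops and ASNs (the real process continues with origin, MED, eBGP vs iBGP, IGP metric, and so on):

    # Simplified sketch of the first two steps of the BGP best-path
    # decision: highest LOCAL_PREF wins, then shortest AS_PATH.
    # All route data here is hypothetical, for illustration only.
    from dataclasses import dataclass, field

    @dataclass
    class Route:
        next_hop: str
        local_pref: int      # set locally; controls outbound choice
        as_path: list = field(default_factory=list)

    def best_path(routes):
        # Higher LOCAL_PREF first, shorter AS_PATH as the tie-breaker.
        return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))

    # The same prefix learned from two transit providers:
    via_a = Route("192.0.2.1", local_pref=100, as_path=[64501, 64510])
    via_b = Route("198.51.100.1", local_pref=100, as_path=[64502, 64520, 64510])

    print(best_path([via_a, via_b]).next_hop)   # 192.0.2.1 (shorter path)

    # Raising LOCAL_PREF on one session shifts outbound traffic
    # immediately, regardless of path length:
    via_b.local_pref = 200
    print(best_path([via_a, via_b]).next_hop)   # 198.51.100.1

Inbound is the hard half because the equivalent knob belongs to the remote networks; all you can do is make one of your own announcements look less attractive, typically by prepending your AS to it.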
+ what impact does a manual or automatic shift in traffic between providers have to your users? Do their TCP sessions break? Or do you think they normally stay alive, maybe after a delay? What makes you think this?
people don't normally notice the transition (1)
transparent if done right (1)
stuff breaks a bit, but most TCP sessions remain up (1)
minimal impact (1)
for 90% of people the sessions sit for a while and then resume, based on feedback in a support forum (1)
there is a several-minute delay as the BGP tables across the internal core routers adjust (1)
VPN users get kicked out when there is a transition, since we don't have provider-independent space and the addresses associated with the down provider become unreachable (1)
most traffic is HTTP, which is not really affected as long as there isn't too much flapping (1)
assume all drop, and take any that don't as a bonus (1)
no detectable effect (1)
we have never had a ticket opened which resolved to TCP session breakage due to a re-homing transition (1)
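The "sit for a while and then resume" behaviour above is consistent with TCP's retransmission backoff: a stalled sender keeps retrying with exponentially growing timeouts, so a session survives as long as the path heals before the retries are exhausted. A back-of-the-envelope sketch (the 1-second initial RTO, 120-second cap, and retry count are assumptions; real stacks vary, e.g. Linux allows around 15 retransmissions on an established connection):

    # Rough timeline of TCP retransmissions during a total outage,
    # assuming a 1s initial RTO that doubles per retry, capped at 120s.
    initial_rto = 1.0    # seconds (assumed)
    max_rto = 120.0      # cap on the backed-off timer (assumed)

    elapsed, rto = 0.0, initial_rto
    for attempt in range(1, 13):
        elapsed += rto
        print(f"retry {attempt:2d} at ~{elapsed:6.1f}s (waited {rto:.0f}s)")
        rto = min(rto * 2, max_rto)

    # After 12 retries the sender has persisted for roughly 12 minutes,
    # so a convergence delay of a few minutes usually looks like a
    # pause to the application rather than a broken session.

That also fits the VPN data point: sessions die outright when the endpoint address itself becomes permanently unreachable, as in the non-provider-independent case, rather than merely delayed.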
+ what is *bad* about the way that multi-homing works in IPv4?
need more control of incoming traffic (1)
hard to limit incoming via upstream, although outgoing preference is easy (1)
nothing that needs fixing at layer 3 (1)
withdrawing and advertising routes shocks the network too much (1)
synchronisation takes too long, and sucks up too much router CPU (1)
nothing (1)
BGP is unintelligent, and requires too much manual tweaking to find the "best" route (1)
finding providers which will let you multi-home is hard (1)
needing provider-independent address space (1)
it can get ugly (1)
router vendors need to make more boxes with more memory (1)
I would like to have confidence in using RRs more (1)
having to trust and cooperate with your peers (1)
BGP convergence sucks (1)
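Several of these complaints ("need more control of incoming traffic", "hard to limit incoming via upstream") come down to the same limitation: the only standard inbound knob is AS-path prepending, a coarse hint that remote networks are free to override. A hypothetical sketch (all ASNs made up) of how a distant AS sees prepended announcements:

    # Why prepending is a blunt instrument: you can only make one path
    # look longer, and any remote network that sets LOCAL_PREF (e.g.
    # "prefer customer routes") ignores path length entirely.
    MY_AS = 64500

    def announce(base_path, prepends):
        # Prepend our own ASN extra times on one provider's session.
        return [MY_AS] * (1 + prepends) + base_path

    paths_seen = {
        "via_provider_a": [64501] + announce([], prepends=0),  # length 2
        "via_provider_b": [64502] + announce([], prepends=3),  # length 5
    }

    # A remote AS choosing purely on path length avoids the prepended
    # route, which is the best inbound control you can hope for:
    print(min(paths_seen, key=lambda k: len(paths_seen[k])))  # via_provider_a

A remote policy that prefers one of your providers for its own reasons defeats any amount of prepending, which is exactly the lack of inbound control respondents describe.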
+ what is good about it?
it works (4)
it can be done using lots of devices "besides just cisco" (1)
it's pretty obvious how it works, and even level-1 ticket takers at backbone providers understand it most of the time (1)
it allows us to increase performance, and we like redundancy (1)
"The redundancy! The multiple paths! The ability to choose." (1)
reliability and shorter paths (1)
participants (1)
--
Joe Abley