On Fri, Jan 23, 2004 at 11:01:14AM -0800, Richard J. Sears wrote:
> In reality, I learned that BGP is simply not up to the task of handling anything beyond its limited scope - best path routing. In today's world, we need to look beyond best path as it simply has nothing to do with best performance, at least not in 40 to 50% of my traffic routing decisions. You can do that with bodies (if you're a purist) or you can utilize route optimization equipment. In either case, you have to do it.
> I think for the time being, route optimization equipment, and the companies that utilize it, will have an edge over those doing things the manual way. Regardless of which box I could have chosen, the end result is that my backbone engineers and I have far more time on our hands for other tasks and my customers are much happier than they were before.
BGP is relatively good at determining the best path when you are a major carrier with connectivity to "everyone" (i.e. when traffic flows "naturally"), in many locations, and you engineer your network so that you have sufficient capacity to support the traffic flows. However, BGP is relatively BAD at determining the best path when you are the customer of many carriers, some of whom have serious problems on their network that they spend a lot of time and effort trying to hide from you, and when you have a diverse assortment of link speeds. In this setup, traffic does not flow "naturally".

I often find myself spending a fair amount of time talking people down from trying to make their network "better" by buying transit from every carrier they can get their hands on. A single flapping session on a single transit can get you dampened for quite a while, making you only as strong as your weakest link. Also, convergence becomes painfully slow, not to mention flaptacular, as best paths are computed, announced, re-computed, re-announced, re-re-computed, etc. (and if you don't believe me, watch Internap converge some time). Plus, if you are an inbound-heavy network, the localpref increase via certain paths (everyone localprefs their own customers above routes they hear from peers/transits) will cause a skew in traffic that prepending may have little to no influence over.

Bottom line, BGP is most useful when you select paths as naturally as possible, with as few transits as are needed for redundancy, and use equal-sized pipes with sufficient capacity to support the traffic flow (or where you make capacity decisions based on the traffic levels, not the other way around). When you try to force BGP to work with the model you described, it will go kicking and screaming.
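To illustrate the localpref point above, here is a minimal Python sketch of why prepending may not help when the carrier prefers its own customer routes. It models only the first two steps of a simplified best-path comparison (highest local preference, then shortest AS path). The `Route` class, the `LOCAL_PREF` values, and the AS numbers (taken from the documentation range) are assumptions made up for the example, not any carrier's actual policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    learned_from: str   # "customer", "peer", or "transit" (hypothetical labels)
    local_pref: int     # assigned by the receiving carrier's policy
    as_path: tuple      # AS numbers, including any prepends


# Assumed policy values: customers preferred over peers, peers over transits.
LOCAL_PREF = {"customer": 200, "peer": 100, "transit": 80}


def make_route(learned_from, as_path):
    """Build a route with the local preference the carrier would assign."""
    return Route(learned_from, LOCAL_PREF[learned_from], tuple(as_path))


def best_path(routes):
    """Simplified selection: highest local_pref wins, then shortest AS path.

    Real BGP implementations apply several further tie-breakers; they are
    omitted here because they never come into play in this example.
    """
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


# AS 64500 prepends three times on its direct link to the carrier, hoping
# to push inbound traffic onto the path the carrier hears from a peer.
via_customer = make_route("customer", [64500, 64500, 64500, 64500])
via_peer = make_route("peer", [64510, 64500])

chosen = best_path([via_customer, via_peer])
print(chosen.learned_from, len(chosen.as_path))  # prints "customer 4"
```

The carrier still selects the customer-learned route even though its AS path is twice as long, because AS path length is only compared once local preference is equal.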
Now this isn't to say that even the best-run carrier doesn't have its off days, or that there is no potential benefit from having many different carriers to choose from, but it does almost REQUIRE a different system of path selection to be effective. Unfortunately there are some serious problems to overcome in order for any such system to scale, not the least of which are:

* The inability to receive FULL BGP routes from every BGP peer to your optimization box without requiring your transit providers to set up a host of eBGP Multihop sessions (which most refuse to do). This means you will always be stuck assuming that every egress path is a transit and can reach any destination on the Internet until your active or passive probing says otherwise.

* The requirement of deaggregation in order to make best path decisions effective. For example, someone's T3 to genuithree gets congested and the best path to their little /24 of the Internet is through another provider. Do you move 4.0.0.0/8?

* The constant noise of stupid scripts pinging everything on the Internet. Once upon a time I heard some pretty interesting numbers about the amount of traffic a newly routed /8 with no usage received just in Internet noise from all the scanners, hackers, and worms out there. I don't know if it was true or not (though I'm sure someone on this list has done this and can tell us exactly how much traffic it is), but just looking at the amount of noise much smaller blocks receive leads one to the conclusion that active analysis will not scale to support everyone.

etc etc etc. There is certainly room for improvement of traffic engineering in the protocols, but the Perl scripts and Zebra hacks most people are throwing at the problem currently are far from capable of handling it.

-- 
Richard A Steenbergen <ras@e-gerbil.net>
http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)