On Mon, Jan 26, 2004 at 10:58:49AM -0800, Sean Finn wrote:
The issue that you describe does indeed place some constraints on the application of route optimization technology. Within the scope of this issue, though, I think that you would agree that a network which is ALL transit would face no challenge here -- and, more specifically, if there is a routing optimization decision among local transit links, that problem could be solved independently of the existence of "non-transit" links.
Just noting why it will never be anything other than a small-customer, transit-only solution. As long as you are guaranteed by design that your product will never be applicable to large networks or networks with any peering, you know that the odds are VERY slim you'll ever have anyone with real network clue using the product. Under such conditions, snake oil sales flourish.
Applying this technology in the presence of "non-transit" routes requires constraining measurements to only the prefixes appropriate for a given link. It is true that knowing all BGP routes ("BGP Losers") would be a nice way to get this information ... but it's not necessarily the only approach towards the goal. Some solutions may have topological dependencies, but it can be feasible to simply drop all measurements towards "illegal" destinations.
In other cases, it may be possible to define the set of destinations that are legal over a given link, and constrain measurements for that link.
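A minimal sketch of constraining measurements per link, assuming each link's legal destinations are already known as a prefix list. A full-transit link is modelled as 0.0.0.0/0; the link names, the policy dict, and the documentation-range prefixes are all hypothetical:

```python
import ipaddress
from typing import Dict, Iterable, List

def legal_targets(link_prefixes: Dict[str, List[str]], link: str,
                  targets: Iterable[str]) -> List[str]:
    """Keep only measurement targets that may legally be reached over `link`.

    `link_prefixes` is a hypothetical per-link policy: transit links carry
    0.0.0.0/0, peering links carry only the prefixes learned from that peer.
    """
    nets = [ipaddress.ip_network(p) for p in link_prefixes[link]]
    return [t for t in targets
            if any(ipaddress.ip_address(t) in net for net in nets)]

policy = {
    "transitA": ["0.0.0.0/0"],   # full transit: every destination is legal
    "peerX": ["192.0.2.0/24"],   # peer: only the routes it announces to us
}
targets = ["192.0.2.10", "198.51.100.1"]
print(legal_targets(policy, "peerX", targets))     # ['192.0.2.10']
print(legal_targets(policy, "transitA", targets))  # ['192.0.2.10', '198.51.100.1']
```

The linear scan is fine for a handful of prefixes; a peer announcing tens of thousands of routes would need something like a radix tree per link, and keeping those lists current is where the scaling question comes in.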
Good luck making this scale. :)
* The requirement of deaggregation in order to make best path decisions effective. For example, someone's T3 to genuithree gets congested and the best path to their little /24 of the Internet is through another provider. Do you move 4.0.0.0/8?
Perhaps. Yes, it's a /8. But if measurements to the /8 show better collective performance over another link, why NOT move it? Yes, it could be carrying a lot of traffic, and could result in congesting the next link ... so it is necessary to be able to:
- know when links are at/near capacity, and so avoid their use; and
- react quickly in case of congestion
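The capacity requirement above could be expressed as a check run before moving a prefix. This is a hedged sketch: the 80% utilization ceiling and 10 ms minimum gain are hypothetical thresholds, and the per-prefix traffic and RTT figures are assumed to come from whatever measurement system is in use:

```python
from dataclasses import dataclass

@dataclass
class Link:
    name: str
    capacity_mbps: float
    load_mbps: float  # current measured load on the link

def should_move(prefix_mbps: float, current_rtt_ms: float, candidate_rtt_ms: float,
                target: Link, max_util: float = 0.8, min_gain_ms: float = 10.0) -> bool:
    """Return True if moving the prefix to `target` looks worthwhile.

    Small RTT improvements are ignored (a crude guard against chasing noise),
    and the move is refused if the prefix's traffic would push the target
    link past `max_util` of its capacity.
    """
    if current_rtt_ms - candidate_rtt_ms < min_gain_ms:
        return False
    projected_util = (target.load_mbps + prefix_mbps) / target.capacity_mbps
    return projected_util <= max_util

# Hypothetical: a /8 carrying 300 Mbps that measures 25 ms better via another GigE.
print(should_move(300, 80, 55, Link("gigeB", 1000, 600)))  # False: would reach 90%
print(should_move(300, 80, 55, Link("gigeB", 1000, 300)))  # True: lands at 60%
```

The check only covers the local uplink; it says nothing about congestion further along the path.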
What is broken for one provider and fixed at another may very well break something else that was working before at the first provider, yes? Besides the difficulty of assigning a true metric to the overall reachability of a /8, or any aggregate for that matter ("ok, we decreased rtt by 20ms to these 3 destinations doing 15Mbps each, but we increased rtt to this other destination doing 40Mbps by 60ms, so we're better, right?"), do you really want to see the problems you are supposed to be solving with optimized routing popping up and going away again throughout the day?

And yes, you do bring up another valid point: how much of the congestion you're trying to avoid is caused by your own traffic? If the answer is none, you're fine, but this by definition means the failure of your optimized routing product. If it is a success, you will either a) have people with lots of traffic using it, or b) have so many small-traffic users that the collective decisions of your box become the "huge user". The problems then become:

* The quicker you try to react, the more you place yourself at risk of starting a best path flap cycle.
* Congestion does not only happen on your uplink circuit; it can happen at every point along the path, including peers, backbone circuits, and even the end user/site links.

While I find the sales pitches of people touting the horrors of peering to be quite sad (from Internap to the classic MAE Dulles :P), peering capacity is largely based on the ability to predict traffic levels far in advance. It doesn't take that many "large" customers selecting certain destinations through one provider at once to blow up a peer in one region.

Balancing the traffic of a GigE and a couple of FastE transits to keep each one uncongested may be enough functionality to sell some boxes to some low-end users, but this falls into the categories I've described above, and does nothing to address true end-to-end performance.
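To make the metric problem concrete, here is the arithmetic for the rtt example above (three destinations at 15Mbps each gaining 20ms, one at 40Mbps losing 60ms), scored two ways. Weighting by traffic is just one possible metric, chosen here for illustration:

```python
from typing import List, Tuple

def mean_rtt_change(changes: List[Tuple[float, float]]) -> float:
    """Unweighted mean RTT change (ms) across destinations; entries are (mbps, delta_ms)."""
    return sum(delta for _, delta in changes) / len(changes)

def weighted_rtt_change(changes: List[Tuple[float, float]]) -> float:
    """Traffic-weighted mean RTT change (ms); entries are (mbps, delta_ms)."""
    total_mbps = sum(mbps for mbps, _ in changes)
    return sum(mbps * delta for mbps, delta in changes) / total_mbps

changes = [(15, -20), (15, -20), (15, -20), (40, +60)]
print(mean_rtt_change(changes))                # 0.0: looks like a wash
print(round(weighted_rtt_change(changes), 1))  # 17.6: most traffic got slower
```

Per destination the move is neutral; per bit it is about 17.6ms worse (1500 Mbps·ms over 85 Mbps). Whether the box "made things better" depends entirely on which number it optimizes.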
Thus the only real solution to the problem if you actually want to optimize traffic is:
c) Dynamically measure all of the possible deaggregations of all active space, and dynamically determine which prefixes need to be deaggregated to what level.
Note that in any of the above cases, the de-aggregated routes should be marked NO_EXPORT.
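One way to read "determine which prefixes need to be deaggregated to what level" is a recursive split: keep halving an aggregate until every piece has a single preferred link, then inject the pieces tagged with the well-known NO_EXPORT community (65535:65281, RFC 1997) so they never leak past the neighbor they are sent to. A minimal sketch, assuming the measurement system has already produced a hypothetical map from measured /24s to their best link; the announcement record format is also hypothetical, and unmeasured space is treated as "don't care":

```python
import ipaddress
from typing import Dict, List, Tuple

Net = ipaddress.IPv4Network
NO_EXPORT = "65535:65281"  # well-known community, RFC 1997

def deaggregate(prefix: Net, best_link: Dict[Net, str]) -> List[Tuple[Net, str]]:
    """Split `prefix` only as far as needed for each piece to have one best link.

    `best_link` maps measured /24s to the link that performed best for them.
    Pieces containing no measured /24 are dropped (left to normal BGP).
    """
    links = {link for net, link in best_link.items() if net.subnet_of(prefix)}
    if not links:
        return []
    if len(links) == 1:
        return [(prefix, links.pop())]
    pieces: List[Tuple[Net, str]] = []
    for half in prefix.subnets(prefixlen_diff=1):
        pieces.extend(deaggregate(half, best_link))
    return pieces

def announcements(pieces: List[Tuple[Net, str]]) -> List[dict]:
    """Build hypothetical route-injection records, every one tagged NO_EXPORT."""
    return [{"prefix": str(net), "next_hop_link": link, "communities": [NO_EXPORT]}
            for net, link in pieces]

measured = {
    ipaddress.ip_network("4.1.2.0/24"): "transitB",  # the /24 whose best path moved
    ipaddress.ip_network("4.5.0.0/24"): "transitA",
    ipaddress.ip_network("4.200.0.0/24"): "transitA",
}
pieces = deaggregate(ipaddress.ip_network("4.0.0.0/8"), measured)
for route in announcements(pieces):
    print(route["prefix"], route["next_hop_link"], route["communities"])
# 4.0.0.0/14 transitB ['65535:65281']
# 4.4.0.0/14 transitA ['65535:65281']
# 4.128.0.0/9 transitA ['65535:65281']
```

Treating unmeasured space as "don't care" lets the split stop early (here a whole /14 follows the one measured /24 toward transitB), which keeps the injected table small but moves traffic nobody measured; a more conservative policy would stop at the measured /24s themselves.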
Throw away the BGP routing table completely, and build your own based on the topology and metrics you have detected. Of course, this means saying goodbye to the usual failsafe method of keeping the normal BGP routes in the table with a lower localpref so if the box falls over you just fail back to normal BGP path selection. And probably more importantly, there isn't enough scale in the traffic probing system to gather the necessary topology info once for every customer... Maybe if you made everyone's boxes report data back to a central site, you could gather something useful from it.
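The failsafe being given up can be sketched as follows, assuming a simplified best-path step that compares only LOCAL_PREF (real BGP selection has many more tie-breakers, omitted here). The optimizer's injected route wins while it exists; if the box falls over and its routes are withdrawn, the normal BGP route kept at a lower LOCAL_PREF takes over. The link names, LOCAL_PREF values, and the `source` labels are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop_link: str
    local_pref: int
    source: str  # "bgp" for learned routes, "optimizer" for injected ones

def best_path(candidates: List[Route]) -> Optional[Route]:
    """Highest LOCAL_PREF wins (only this one BGP decision step is modelled)."""
    return max(candidates, key=lambda r: r.local_pref, default=None)

learned = Route("4.0.0.0/8", "transitA", 100, "bgp")
injected = Route("4.0.0.0/8", "transitB", 200, "optimizer")

table = [learned, injected]
print(best_path(table).next_hop_link)  # transitB: optimizer route preferred

# The optimizer falls over and its session drops: injected routes are withdrawn.
table = [r for r in table if r.source != "optimizer"]
print(best_path(table).next_hop_link)  # transitA: normal BGP path remains
```

Throwing away the BGP table removes `learned` from the picture, so a dead optimizer leaves nothing to fall back to.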
Pinging the Internet is clearly a wasteful approach. Essentially no one needs optimization to the ENTIRE Internet. Granted, major backbones probably actually use a great deal of the routing table ...
(Quiz for the list readers: What percentage of the Internet routing table does your network actually use?)
... but for many ISPs, hosting facilities, and major multihomed enterprises, our experience shows that only a very small fraction of traffic is seen beyond about 20,000-30,000 routes in a given day.
There is no reason to measure destinations unless they are involved with traffic to your network. Basing measurements on observed traffic, or instrumenting applications to automatically generate their own measurements, are both "clean" options here.
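A minimal sketch of basing measurements on observed traffic, assuming flow records reduced to (destination address, bytes) pairs and a routing table given as a plain list of prefixes (both hypothetical inputs; a full table would need a radix tree instead of this linear longest-prefix scan). It also answers the quiz in the crude sense of counting how many table entries saw any traffic:

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Net = ipaddress.IPv4Network

def longest_match(addr: ipaddress.IPv4Address, table: List[Net]) -> Optional[Net]:
    """Most specific prefix in `table` containing `addr` (linear scan)."""
    matches = [net for net in table if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)

def bytes_per_prefix(flows: Iterable[Tuple[str, int]], table: List[Net]) -> Counter:
    """Attribute each flow's bytes to the route that would carry it."""
    totals: Counter = Counter()
    for dst, nbytes in flows:
        net = longest_match(ipaddress.ip_address(dst), table)
        if net is not None:
            totals[net] += nbytes
    return totals

def measurement_targets(totals: Counter, coverage: float = 0.95) -> List[Net]:
    """Busiest prefixes, in order, until they together carry `coverage` of bytes."""
    grand_total = sum(totals.values())
    chosen: List[Net] = []
    covered = 0
    for net, nbytes in totals.most_common():
        if covered >= coverage * grand_total:
            break
        chosen.append(net)
        covered += nbytes
    return chosen

table = [ipaddress.ip_network(p) for p in
         ["4.0.0.0/8", "4.1.2.0/24", "192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]]
flows = [("4.1.2.7", 900), ("4.9.9.9", 50), ("192.0.2.1", 40), ("198.51.100.5", 10)]

totals = bytes_per_prefix(flows, table)
print(f"{len(totals)} of {len(table)} routes used")   # 4 of 5 routes used
print([str(n) for n in measurement_targets(totals)])  # ['4.1.2.0/24', '4.0.0.0/8']
```

Only the two busiest routes end up as probe targets here, even though four routes carried some traffic; the coverage threshold is the knob that trades measurement load against blind spots.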
The usage numbers sound about right, and targeting only destinations where you actually exchange traffic is certainly a big improvement over not, but it's still going to generate a lot of noise for active traffic destinations. But I guess there are always passive measurement alternatives, like measuring the of a gif customers have to link on their websites *cough*. :)
Manually tweaking routing policies to achieve these goals is a time-honored craft (especially with this crowd :) ... but I suspect that even the most experienced in this area will acknowledge that there is a tier of this problem that may be best automated. (Note that I said "a tier" -- there are clearly additional problems that current route optimization technology DOESN'T solve. :)
I doubt you'll find anyone here who will stand up and admit to enjoying tweaking metrics and policies more often than once a month. The problem with getting interest from most of this crowd (or at least "those of this crowd who actually run networks", which probably doesn't qualify as most any more) is simply that none of the product and very little of the technology applies to the networks they run or the work they have to do.

-- 
Richard A Steenbergen <ras@e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)