Richard, you have made some good points in this thread. One general observation, and then specific responses ... I don't assert that current route optimization technology solves ALL routing problems, but I do think that there are some specific problems that automation can solve effectively and gracefully.
* The inability to receive FULL BGP routes from every BGP peer to your optimization box without requiring your transit providers to set up a host of eBGP multihop sessions (which most refuse to do). This means you will always be stuck assuming that every egress path is a transit and can reach any destination on the Internet until your active or passive probing says otherwise.
The issue that you describe does indeed place some constraints on the application of route optimization technology. Within the scope of this issue, though, I think you would agree that a network which is ALL transit would face no challenge here. More specifically, if there is a routing optimization decision among local transit links, that problem can be solved independently of the existence of "non-transit" links. Applying this technology in the presence of "non-transit" routes requires constraining measurements to only the prefixes appropriate for a given link. It is true that knowing all BGP routes (the "BGP Losers") would be a nice way to get this information ... but it's not necessarily the only way to reach the goal. Some solutions may have topological dependencies, but it can be feasible to simply drop all measurements toward "illegal" destinations. In other cases, it may be possible to define the set of destinations that are legal over a given link, and constrain measurements for that link accordingly.
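As a minimal sketch of the second approach (define the destinations that are legal over each link, then constrain that link's measurements), the Python below assumes a hypothetical per-link policy table: a transit link permits 0.0.0.0/0, while a "non-transit" peering link permits only its peer's prefixes. The link names, the `LEGAL_PREFIXES` table, and the `legal_targets` function are illustrative assumptions, and the prefixes are documentation address ranges, not real configuration.

```python
import ipaddress
from typing import Dict, Iterable, List

# Hypothetical per-link policy: which destination prefixes each egress
# link may carry. A transit link can reach anything; a "non-transit"
# (peering) link only reaches its peer's own prefixes.
LEGAL_PREFIXES: Dict[str, List[ipaddress.IPv4Network]] = {
    "transit-a": [ipaddress.ip_network("0.0.0.0/0")],
    "peer-b": [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("198.51.100.0/24"),
    ],
}


def legal_targets(link: str, targets: Iterable[str]) -> List[str]:
    """Return only the measurement targets that are legal over `link`.

    Destinations outside the link's permitted prefixes are dropped, so
    no probe is ever sent toward an "illegal" destination. An unknown
    link has no permitted prefixes and therefore gets no targets.
    """
    allowed = LEGAL_PREFIXES.get(link, [])
    kept = []
    for target in targets:
        addr = ipaddress.ip_address(target)
        if any(addr in net for net in allowed):
            kept.append(target)
    return kept
```

For example, given targets 192.0.2.10, 203.0.113.5 and 198.51.100.77, the peering link keeps only the first and last, while the transit link keeps all three. Defaulting an unknown link to "no legal destinations" is a deliberately conservative choice: it errs toward not measuring rather than probing over a path that cannot carry the traffic.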
* The requirement of deaggregation in order to make best path decisions effective. For example, someone's T3 to genuithree gets congested and the best path to their little /24 of the Internet is through another provider. Do you move 4.0.0.0/8?
Perhaps. Yes, it's a /8. But if measurements to the /8 show better collective performance over another link, why NOT move it? Yes, it could be carrying a lot of traffic, and moving it could congest the next link ... so it is necessary to be able to:
- know when links are at or near capacity, and so avoid using them; and
- react quickly in case of congestion.
Note that these problems are not specific to /8s, and that traffic loads are dynamic: even if there does appear to be "room" for a prefix on a link, conditions may well change once the route is moved. Any route optimization system needs to deal with these issues for ALL prefixes. There are multiple levels of optimization possible on top of this:
a) If there is a general belief that /8s are simply "too big" to move, they can be manually deaggregated. Our experience shows that by breaking up a /8 into as few as 10 or 15 carefully designed "chunks", the resulting load per (deaggregated) prefix becomes equivalent to that of hundreds of other prefixes.
b) If manually configuring deaggregates is not desirable, automated approaches to deaggregation are possible: "If I see traffic in this range, and a /xx does not exist for the observed traffic, then create the /xx".
c) Dynamically measure all of the possible deaggregations of all active address space, and dynamically determine which prefixes need to be deaggregated, and to what level.
Note that in any of the above cases, the deaggregated routes should be marked NO_EXPORT. I know of solid commercial implementations of (a) and (b). (c) is a more interesting project ... :)
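The rule in option (b) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the `Route` type, the `deaggregate` function and the fixed target length are hypothetical, NO_EXPORT is represented by a boolean flag rather than by attaching a real BGP community, and the observed destinations would in practice come from flow data.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    # Hypothetical stand-in for the NO_EXPORT community; deaggregates
    # must never leak beyond the local AS.
    no_export: bool = False


def deaggregate(routes: Iterable[Route], observed: Iterable[str],
                length: int) -> List[Route]:
    """Rule (b): if traffic is seen in a range and no /`length` route
    exists for it, create that /`length` (marked NO_EXPORT).

    Only space already covered by a shorter aggregate is deaggregated;
    more-specifics of other lengths are not considered.
    """
    routes = list(routes)
    existing: Set[ipaddress.IPv4Network] = {r.prefix for r in routes}
    aggregates = [r.prefix for r in routes if r.prefix.prefixlen < length]
    created: List[Route] = []
    for dst in observed:
        addr = ipaddress.ip_address(dst)
        # Skip destinations we have no covering aggregate for.
        if not any(addr in agg for agg in aggregates):
            continue
        # The /length block that contains this destination.
        chunk = ipaddress.ip_network(f"{addr}/{length}", strict=False)
        if chunk not in existing:
            existing.add(chunk)
            created.append(Route(chunk, no_export=True))
    return created
```

For example, with only 4.0.0.0/8 in the table and traffic seen toward 4.2.2.1, 4.2.9.9 and 4.10.0.1, a call with `length=16` creates 4.2.0.0/16 and 4.10.0.0/16, both flagged NO_EXPORT; traffic toward 203.0.113.1 is ignored because no aggregate covers it.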
* The constant noise of stupid scripts pinging everything on the Internet.
Pinging the whole Internet is clearly a wasteful approach. Essentially no one needs optimization to the ENTIRE Internet. Granted, major backbones probably do use a great deal of the routing table ... (Quiz for the list readers: what percentage of the Internet routing table does your network actually use?) ... but for many ISPs, hosting facilities, and major multihomed enterprises, our experience shows that only a very small fraction of a day's traffic falls outside the busiest 20,000-30,000 routes. There is no reason to measure destinations unless they are involved in your network's traffic. Basing measurements on observed traffic, or instrumenting applications to generate their own measurements automatically, are both "clean" options here.
Companies and ISPs today spend time (= money) managing their connectivity to the Internet. Loop-free connectivity is a basic first step, but in many cases the real connectivity goals include:
- Capacity management (especially in the presence of asymmetrical bandwidth)
- Load management (in the case of usage-based billing)
- Performance management (realizing the 'best possible' performance)
- Maximizing application availability (the fastest possible reroute in the case of congestive failure)
Manually tweaking routing policies to achieve these goals is a time-honored craft (especially with this crowd :) ... but I suspect that even the most experienced in this area will acknowledge that there is a tier of this problem that may be best automated. (Note that I said "a tier": there are clearly additional problems that current route optimization technology DOESN'T solve. :)
cheers
-- Sean