Re: Outbound Route Optimization

26 Jan 2004

      On Mon, Jan 26, 2004 at 10:58:49AM -0800, Sean Finn wrote:
...
The issue that you describe does indeed offer some constraints to the
application of route optimization technology. Within the scope of this
issue, though, I think that you would agree that a network which is ALL
transit would face no challenge here -- and more specifically, if there
is a routing optimization decision among local transit links, that
problem could be solved independently of the existence of "non-transit"
links.
Just noting why it will never be anything other than a small customer
transit-only solution. As long as you are guaranteed by design that your
product will never be applicable to large networks or networks with any
peering, you know that odds are VERY slim you'll ever have anyone with
real network clue using the product. Under such conditions, snake oil
sales flourish.
...
Applying this technology in the presence of "non-transit" routes
requires constraining measurements to only the prefixes appropriate for a
given link. It is true that knowing all BGP routes ("BGP Losers") would
be a nice way to get this information ...  but it's not necessarily the
only approach towards the goal. Some solutions may have topological
dependencies, but it can be feasible to simply drop all measurement
towards "illegal" destinations.
In other cases, it may be possible to define the set of destinations
that are legal over a given link, and constrain measurements for that
link.
Good luck making this scale. :)
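
For illustration, constraining probes to the destinations that are legal
over a given link might look like the minimal sketch below. The function
name, the prefix list and the probe targets are all hypothetical, and the
addresses are documentation ranges. Note that this is a linear scan per
target, which is exactly the part that gets painful with a full table and
many links; a real implementation would presumably want a prefix trie.

```python
import ipaddress

def legal_targets(targets, link_prefixes):
    """Keep only probe targets covered by a prefix that is legal on the link."""
    nets = [ipaddress.ip_network(p) for p in link_prefixes]
    kept = []
    for t in targets:
        addr = ipaddress.ip_address(t)
        # Linear scan over the link's prefixes; fine for a sketch only.
        if any(addr in n for n in nets):
            kept.append(t)
    return kept

# Hypothetical prefixes learned over one non-transit link.
peer_prefixes = ["192.0.2.0/24", "198.51.100.0/24"]
probes = ["192.0.2.10", "203.0.113.5", "198.51.100.77"]

print(legal_targets(probes, peer_prefixes))
```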
...
...
* The requirement of deaggregation in order to make best path decisions 
effective. For example, someone's T3 to genuithree gets congested and the 
best path to their little /24 of the Internet is through another provider. 
Do you move 4.0.0.0/8?
Perhaps. Yes, it's a /8. But if measurements to the /8 show
better collective performance over another link, why NOT 
move it? Yes, it could be carrying a lot of traffic, and 
could result in congesting the next link ... so it is 
necessary to be able to:
- know when links are at/near capacity, 
    and so avoid their use; and
- react quickly in case of congestion
What is broken for one provider and fixed at another may very well break
something else that was working before at the first provider, yes? Besides
the difficulties of assigning a true metric to the overall reachability of
a /8 or any aggregate for that matter ("ok we decreased rtt by 20ms to
these 3 destinations doing 15Mbps each but we increased rtt to this other
destination doing 40Mbps by 60ms so we're better right?"), do you really 
want to see the problems you are supposed to be solving with optimized 
routing popping up and going away again throughout the day?
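
To put rough numbers on the example in quotes above, here is a small
sketch that weights each RTT change by the traffic it affects. Weighting
by Mbps is only one possible metric and is an assumption of this sketch,
not something any product is known to do.

```python
def weighted_rtt_delta(flows):
    """Traffic-weighted mean RTT change in ms; flows are (mbps, delta_ms)."""
    total_mbps = sum(mbps for mbps, _ in flows)
    return sum(mbps * delta for mbps, delta in flows) / total_mbps

# Three destinations at 15 Mbps each improve by 20 ms,
# one destination at 40 Mbps gets worse by 60 ms.
flows = [(15, -20), (15, -20), (15, -20), (40, 60)]

delta = weighted_rtt_delta(flows)
print(round(delta, 1))  # 17.6 (positive means worse overall)
```

Counting destinations, 3 out of 4 improved; weighting by traffic, the
aggregate got about 17.6 ms worse. Which answer is "right" depends
entirely on the metric you picked, which is the difficulty.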

And yes you do bring up another valid point, how much of the congestion
you're trying to avoid is caused by your own traffic? If the answer is 
none you're fine, but this by definition means the failure of your 
optimized routing product. If it is a success you will either a) have 
people with lots of traffic using it, or b) have so many small-traffic 
users that the collective decisions of your box become the "huge user".

The problems then become:

 * The quicker you try to react, the more you place yourself at risk of 
   starting a best path flap cycle.

 * Congestion does not only happen on your uplink circuit, it can happen 
   at every point along the path, including peers, backbone circuits, and 
   even the end user/site links. While I find the sales pitches of people 
   touting the horrors of peering to be quite sad (from Internap to the
   classic MAE Dulles :P), peering capacity is largely based on the 
   ability to predict the traffic levels far in advance. It doesn't take 
   that many "large" customers selecting certain destinations through one 
   provider at once to blow up a peer in one region.
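
The usual defence against a flap cycle is some form of damping. A minimal
sketch, assuming a simple improvement margin plus a hold-down timer (the
class name, the 10 ms margin and the 300 s hold time are arbitrary
illustrative values):

```python
class PathSelector:
    """Switch paths only on a clear gain, and not too soon after a change."""

    def __init__(self, current, margin_ms=10.0, hold_s=300.0):
        self.current = current
        self.margin_ms = margin_ms
        self.hold_s = hold_s
        self.last_change = float("-inf")  # no change has happened yet

    def update(self, now, rtt_ms):
        """rtt_ms maps path name -> measured RTT; returns the path in use."""
        best = min(rtt_ms, key=rtt_ms.get)
        gain = rtt_ms[self.current] - rtt_ms[best]
        held_long_enough = now - self.last_change >= self.hold_s
        if best != self.current and gain >= self.margin_ms and held_long_enough:
            self.current = best
            self.last_change = now
        return self.current

sel = PathSelector("A")
print(sel.update(0, {"A": 50, "B": 45}))    # gain below margin, stays on A
print(sel.update(10, {"A": 50, "B": 30}))   # clear gain, moves to B
print(sel.update(20, {"A": 10, "B": 30}))   # inside hold-down, stays on B
print(sel.update(400, {"A": 10, "B": 30}))  # hold-down expired, back to A
```

This only trades one problem for the other: the longer the hold-down,
the longer traffic sits on a path that has since become congested.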

Balancing the traffic of a GigE and a couple of FastE transits to keep
each one uncongested may be enough functionality to sell some boxes to 
some low end users, but this falls into the categories I've described 
above, and does nothing to address the true end to end performance.

Thus the only real solution to the problem if you actually want to
optimize traffic is:
...
c) Dynamically measure all of the possible 
     deaggregations of all active space, and dynamically
     determine which prefixes need to be deaggregated
     to what level.
Note that in any of the above cases, the de-aggregated 
routes should be marked NO_EXPORT.
Throw away the BGP routing table completely, and build your own based on
the topology and metrics you have detected. Of course, this means saying
goodbye to the usual failsafe method of keeping the normal BGP routes in
the table with a lower localpref so if the box falls over you just fail
back to normal BGP path selection. And probably more importantly, there
isn't enough scale in the traffic probing system to gather the necessary
topology info once for every customer... Maybe if you made everyone's 
boxes report data back to a central site, you could gather something 
useful from it.
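
For anyone unfamiliar with the failsafe mentioned above, here is a toy
sketch of it. It compares local-pref only (real BGP best path selection
continues through AS-path length, origin, MED and so on), and the
local-pref values, link names and "source" field are made up for the
example.

```python
def best_path(routes):
    """Pick the route with the highest local-pref; None if no routes exist."""
    if not routes:
        return None
    # Ties go to the first entry here; real BGP has further tie-breakers.
    return max(routes, key=lambda r: r["local_pref"])

# Candidate routes for one prefix: normal BGP routes at local-pref 100,
# plus a route injected by the optimization box at local-pref 200.
table = [
    {"via": "transit-a", "local_pref": 100, "source": "bgp"},
    {"via": "transit-b", "local_pref": 100, "source": "bgp"},
    {"via": "transit-b", "local_pref": 200, "source": "optimizer"},
]
print(best_path(table)["source"])          # optimizer

# The box falls over and its routes are withdrawn:
after_failure = [r for r in table if r["source"] != "optimizer"]
print(best_path(after_failure)["source"])  # bgp
```

Throwing away the BGP table removes the second half of that example:
there is nothing left to fall back to.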
...
Pinging the Internet is clearly a wasteful approach. Essentially
no one needs optimization to the ENTIRE Internet. Granted, major
backbones probably actually use a great deal of the routing 
table ...
(Quiz for the list readers: 
   What percentage of the Internet routing table does 
   your network actually use?)
... but for many ISP/hosting facility/major multihomed
enterprise, our experience shows that only a very small
fraction of traffic is seen beyond about (20,000-30,000)
routes in a given day.
There is no reason to measure destinations unless they 
are involved with traffic to your network. Basing 
measurements on observed traffic, or having applications 
instrumented to automatically generate their own measurement 
are both "clean" options here.
The usage numbers sound about right, and targeting only destinations
where you actually exchange traffic is certainly a big improvement over
not, but it's still going to generate a lot of noise for active traffic
destinations.
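
For anyone who wants to answer the quiz for their own network, a rough
sketch: given byte counts per routed prefix (from flow export, say),
count how many routes it takes to cover a given share of traffic. The
function name, the 99% threshold and the sample data are all made up.

```python
def routes_for_share(bytes_by_prefix, share=0.99):
    """Smallest number of prefixes (busiest first) carrying `share` of bytes."""
    total = sum(bytes_by_prefix.values())
    running = 0
    ranked = sorted(bytes_by_prefix.values(), reverse=True)
    for count, nbytes in enumerate(ranked, start=1):
        running += nbytes
        if running >= share * total:
            return count
    return 0  # no traffic observed at all

# Made-up sample: five routes with heavily skewed traffic.
sample = {
    "192.0.2.0/24": 900,
    "198.51.100.0/24": 80,
    "203.0.113.0/24": 15,
    "203.0.113.128/25": 4,
    "198.51.100.128/25": 1,
}
print(routes_for_share(sample))  # 3
```

Dividing the result by the size of your full table gives the percentage
the quiz asks about.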

But I guess there are always passive measurement alternatives, like 
measuring the of a gif customers have to link on their websites *cough*. 
:)
...
Manually tweaking routing policies to achieve these goals is a
time-honored craft (especially with this crowd :) ... but I suspect that
even the most experienced in this area will acknowledge that there is a
tier of this problem that may be best automated. (Note that I said "a
tier" -- there are clearly additional problems that current route
optimization technology DOESN'T solve. :)
I doubt you'll find anyone here who will stand up and admit to enjoying
tweaking metrics and policies more often than once a month. The problem
with interest from most of this crowd (or at least "those of this crowd
who actually run networks", which probably doesn't qualify as most any
more) is simply that none of the products and very little of the technology
applies to the networks they run or the work they have to do.

-- 
Richard A Steenbergen <ras@e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)