Outbound Route Optimization
Hello,

I am trying to determine for myself the relevance of Intelligent Routing Devices like Sockeye, Route Science etc. I am not trying to determine who does it better, but rather if the concept of optimizing routes is addressing a significant problem in terms of improved traffic performance (not in cost savings of disparate transit pipes).

I am interested in hearing other views (both for and against) on these devices in the context of optimizing latency for a small multi-homed ISP. I want to make sure I understand their context correctly and have not missed any important points of view.

My questions are these:

"Is sub-optimal routing caused by BGP so pervasive it needs to be addressed?"

"Are these devices able to effectively address the need?"

Thanks,
Jim
On Jan 21, 2004, at 3:27 PM, Jim Devane wrote:
Hello,
I am trying to determine for myself the relevance of Intelligent Routing Devices like Sockeye, Route Science etc. I am not trying to determine who does it better, but rather if the concept of optimizing routes is addressing a significant problem in terms of improved traffic performance ( not in cost savings of disparate transit pipes )
I am interested in hearing other views ( both for and against ) these devices in the context of optimizing latency for a small multi-homed ISP. I want to make sure I understand their context correctly and have not missed any important points of view.
My questions are these:
“Is sub-optimal routing caused by BGP so pervasive it needs to be addressed?”
“Are these devices able to effectively address the need?”
BGP makes no decisions based on "quality" of a route. If you are using anything that's dependent on low latency/packet loss/jitter (e.g., VoIP, games, ssh for someone who gets annoyed by >20ms of latency, etc.), there's lots of room for improvement, especially when you are buying from "bargain" transits.

Everyone I know who's used a device like Sockeye, Route Science, etc., falls into one of two categories:

1) For reasons unrelated to them owning said device, I consider them to be generally lacking clue.
2) They hated it.

I've never used one myself, but based on testimonials like that, I'd tend to say that they generally don't work too well. If you hire a consultant who knows what they're doing, it should be pretty simple to set up a meaningful routing policy which does this for you.

Just my $0.02

--Phil Rosenthal
ISPrime, Inc.
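To make Phil's first point concrete, here is a toy sketch of the BGP decision process (all attribute values and ASNs are invented for illustration); note that measured RTT never appears in the comparison:

```python
# Toy model of the BGP decision process: higher local-pref wins, then
# shorter AS path. Measured latency (rtt_ms) never enters the comparison.
def bgp_best(routes):
    return min(routes, key=lambda r: (-r["local_pref"], len(r["as_path"])))

routes = [
    {"via": "bargain-transit", "local_pref": 100, "as_path": [65001, 3356], "rtt_ms": 180},
    {"via": "premium-transit", "local_pref": 100, "as_path": [65002, 701, 3356], "rtt_ms": 35},
]

best = bgp_best(routes)
print(best["via"])                                    # bargain-transit
print(min(routes, key=lambda r: r["rtt_ms"])["via"])  # premium-transit
```

The short AS path wins even though the other exit is 145 ms faster, which is the entire sales pitch for the boxes under discussion.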
My questions are these:
"Is sub-optimal routing caused by BGP so pervasive it needs to be addressed?"
that depends on your isp, and whether their routing policies (openness or closedness of peering, shortest vs. longest exit, respect for MEDs) are a good match for their technology/tools, skills/experience, and resources/headroom.
"Are these devices able to effectively address the need?"
some of the devices i've seen will address some of the weaknesses in some of the isp's i've seen. however, and more to what i think is the point here, none of the devices i've seen will make an isp better, since (a) tools alone can't help, and (b) this isn't the tool that's missing.

and now for the question you didn't ask... "why not?"

controlling which paths you install based on any kind of observational or predictive metrics is theoretically only going to be as good as those metrics, which is usually not very good. but there's another limit, which is bgp path asymmetry. most tcp implementations are still stone-aged compared to what the ietf recommends in terms of congestion avoidance and output timing, and are therefore pretty dependent on overall isochrony and on symmetric congestion/latency.

let's say that you had ideal metrics for deciding which path to install -- your overall performance would then be limited by what other people chose to install as their path toward you. (experience says they're not going to trust your MEDs even if they're close enough to hear them.)

-- Paul Vixie
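Paul's asymmetry point can be reduced to a toy calculation (all latencies are invented one-way milliseconds): you only ever control the outbound half of the round trip.

```python
# Toy calculation of the path-asymmetry limit: you choose the outbound
# path, but the remote network chooses the return path toward you.
outbound_options = {"transit_a": 20, "transit_b": 45}  # one-way ms, your choice
inbound_chosen_by_them = 80                            # one-way ms, their choice

best_rtt = min(outbound_options.values()) + inbound_chosen_by_them
worst_rtt = max(outbound_options.values()) + inbound_chosen_by_them
print(best_rtt, worst_rtt)  # 100 125
```

Even a perfect outbound decision moves the RTT only from 125 ms to 100 ms here; the 80 ms inbound leg is outside the optimizer's control.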
On Wed, Jan 21, 2004 at 09:05:46PM +0000, Paul Vixie wrote:
My questions are these:
"Is sub-optimal routing caused by BGP so pervasive it needs to be addressed?"
that depends on your isp, and whether their routing policies (openness or closedness of peering, shortest vs. longest exit, respect for MEDs) are a good match for their technology/tools, skills/experience, and resources/headroom.
In practice, all of the above just turn out to be marketing sauce or, in some cases, outright lies. There is no substitute for dollar spend (opex and capex) to make a network perform. There is no magic sauce, there is no silver bullet. If you have adequate resources, you will have adequate performance.
metrics, which is usually not very good. but there's another limit, which is bgp path symmetry. most tcp implementations are still stone-aged
AKA optimizing for outbound doesn't do you any good on optimizing for inbound.
(experience says they're not going to trust your MEDs even if they're close enough to hear them.)
Most people don't trust MEDs for a reason, Paul, and it is not because they want to mess with your customers. /vijay
On Jan 21, 2004, at 4:20 PM, vijay gill wrote:
On Wed, Jan 21, 2004 at 09:05:46PM +0000, Paul Vixie wrote:
My questions are these:
"Is sub-optimal routing caused by BGP so pervasive it needs to be addressed?"
that depends on your isp, and whether their routing policies (openness or closedness of peering, shortest vs. longest exit, respect for MEDs) are a good match for their technology/tools, skills/experience, and resources/headroom.
In practice, all of the above just turn out to be marketing sauce or in some cases, outright lies.
There is no substitute for dollar spend (opex and capex) to make a network perform. There is no magic sauce, there is no silver bullet. You have adequate resources, you will have adequate performance.
I dunno if the last sentence is a typo or not, but it is definitely incorrect in at least some cases. Having "adequate resources" in no way guarantees "adequate performance". (Unless you define "resources" to include the political clout to override business decisions which help the bottom line but hurt performance - e.g. not peering with a network because they are too small.)

OTOH, having inadequate resources does give you a near perfect chance of having inadequate performance.
(experience says they're not going to trust your MEDs even if they're close enough to hear them.)
Most people don't trust MEDs for a reason paul, and it is not because they want to mess with your customers.
There are a variety of reasons for not listening to MEDs, including political reasons which may not be in the best interest of performance, or may even be detrimental to performance. I've found most people willing to put in the time & effort to give you MEDs will give reasonably good MEDs. It also seems the height of hubris to assume you know what is happening inside someone else's network better than the people who run that network. At least IMHO....

In any case, no matter how many resources or black boxes you have, you cannot guarantee good performance on the 'Net. Too many people are involved over which you have no control. Even if you had control, BGP is not the right tool to exert such control in all cases.

-- TTFN, patrick
On Thu, 22 Jan 2004, Patrick W. Gilmore wrote:
In any case, no matter how many resources or black boxes you have, you cannot guarantee good performance on the 'Net. Too many people involved over which you have no control. Even if you had control, BGP is not the right tool to exert such control in all cases.
Even more reason for people to buy the Sugar Mountain RouteMaster5000. No matter how good the claims are, you still end up with humans in the mix dictating "policy" of some sort over packets.
On Wed, Jan 21, 2004 at 12:27:16PM -0800, Jim Devane wrote:
"Are these devices able to effectively address the need?"
Sugar pills effectively address the needs of a great many ailments when given to people who believe that they will work. And if the end result is an addressed need, who are we to say that it wasn't worth paying for. :)

-- 
Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
On Wed, 21 Jan 2004, Richard A Steenbergen wrote:
On Wed, Jan 21, 2004 at 12:27:16PM -0800, Jim Devane wrote:
"Are these devices able to effectively address the need?"
Sugar pills effectively address the needs of a great many ailments when given to people who believe that they will work. And if the end result is an addressed need, who are we to say that it wasn't worth paying for. :)
That sounds like a yes answer. That being said, the Sugar Mountain RouteMaster5000 is probably the best unit out there. It has lots of blinking lights, and sets the "low latency" bits on most types of IP traffic that need high prioritization over regular internet traffic. It can speed up your network traffic by up to 1000%, but the results may vary depending on your packet mix, and in that case, it doesn't change your traffic patterns at all.
On Wed, Jan 21, 2004 at 02:30:19PM -0800, Tom (UnitedLayer) wrote:
On Wed, 21 Jan 2004, Richard A Steenbergen wrote:
On Wed, Jan 21, 2004 at 12:27:16PM -0800, Jim Devane wrote:
"Are these devices able to effectively address the need?"
Sugar pills effectively address the needs of a great many ailments when given to people who believe that they will work. And if the end result is an addressed need, who are we to say that it wasn't worth paying for. :)
That sounds like a yes answer. That being said, the Sugar Mountain RouteMaster5000 is probably the best unit out there. It has lots of blinking lights, and sets the "low latency" bits on most types of IP traffic that needs high prioritization over regular internet traffic. It can speed up your network traffic up to %1000, but the results may vary depending on your packet mix, and in that case, it doesn't change your traffic patterns at all.
I don't know if they're doing the same thing in Cali or not (they probably are, since all the radio stations are owned by the same 2 companies), but here in NoVA land there is currently a massive radio ad campaign for a Rocky Mountain Radar radar-jamming product called the Phazer, which claims to jam police radar (legally, because it doesn't actually put out any RF, it is entirely passive) "or they'll pay for your ticket". When faced with such a deal, I have heard many people say "how could they possibly back up that kind of a promise if it didn't work, they would be losing money left and right paying for people's tickets". Then I recall the quote from the inventor: "I could ship an empty black box with a weight in the bottom and only get 22-24 percent back."

If ever there was a market for voodoo science products, it was IP transit. It is a big-bucks industry, the consumer can almost never see what is really going on behind the scenes with their providers (thanks to loads of NDAs), for the most part they have no real idea what they're doing (but they like to think they do), and they have been misled into thinking that all IP transit is the same -- simply a commodity. A fool and his money, and all that...

Oh well, at least web hosting is still worse (ever notice that EVERY hoster has an OC192 backbone, even the ones with 2 machines and a 10Mbps hub?). :P

-- 
Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
On Wed, 21 Jan 2004, Richard A Steenbergen wrote:
I don't know if they're doing the same thing in Cali or not (they probably are, since all the radio stations are owned by the same 2 companies),
Yeah, NPR and CBS, both monopolistic empires with the same viewpoint :)
but here in NoVA land there is currently a massive radio ad campaign for a Rocky Mountain Radar radar-jamming product called the Phazer, which claims to jam police radar (legally, because it doesn't actually put out any RF, it is entirely passive) "or they'll pay for your ticket".
I've heard of this, and I believe I know some people who've invested. They wanted to diversify from the IP Transit biz, and go into 100% pure sales and marketing. I believe the quote was "man, its hard to sell hosting off of an OC768 with my 2600 powered network, I'm going to greener pastures"
Then I recall the quote from the inventor, "I could ship an empty black box with a weight in the bottom and only get 22-24 percent back."
I've heard that before, but it sounds so much better in russian.
Oh well, at least web hosting is still worse (ever notice that EVERY hoster has an OC192 backbone, even the ones with 2 machines and a 10Mbps hub?). :P
Yeah, the market for OC192 -> 10Mbps Ethernet is really booming! Those puppies fly off the shelves like fleeing rats.
Hello,
I am trying to determine for myself the relevance of Intelligent Routing Devices like Sockeye, Route Science etc. I am not trying to determine who does it better, but rather if the concept of optimizing routes is addressing a significant problem in terms of improved traffic performance ( not in cost savings of disparate transit pipes )
An alternative to using such devices would be to tune the BGP configuration of the routers. See below for a description of an algorithm that determines the optimum configuration of BGP routers for some traffic objectives:

[UBQ03] S. Uhlig, O. Bonaventure, and B. Quoitin. Interdomain traffic engineering with minimal BGP configurations. In 18th International Teletraffic Congress (ITC), September 2003. http://totem.info.ucl.ac.be/publications.html

This work is being pursued in the framework of a government-funded three-year research project, in which we are developing an open-source traffic engineering toolbox. The objective of this toolbox is to provide a set of tools that can be used by ISPs and enterprise networks to optimize their intradomain and interdomain traffic flow. The first release of the toolbox is planned for November 2004.

To help us fit this toolbox to the needs of ISPs and enterprise networks, we would appreciate it if you could fill in our survey at: http://totem.info.ucl.ac.be/te_form.html

Best regards,

Olivier Bonaventure
-- 
CSE Dept. UCL, Belgium - http://www.info.ucl.ac.be/people/OBO/
I have been on a personal crusade for the last 8 months to address this very issue! We identified the exact same issues and questions as we grew from a single backbone to 7 backbones, each of various sizes ranging from gig connections to DS3s. In total I have almost 3Gbps of total available capacity, but two small DS3 links make routing decisions very interesting :-)

It was becoming a nightmare for my engineers to manage the BGP for all of these backbones in such a way that dealt with both the business case as well as the performance case. In the end, it was becoming a customer service problem when we had spikes that saturated some of our smaller links and left our larger links untouched. BGP simply did not care about my capacity issues.

In our specific setting, we are an ISP that buys all of our connectivity, and we have spent a tremendous amount of time searching for total connectivity as opposed to total capacity. While most of our bandwidth per Mb costs the same, our commit levels with our different carriers are different and required constant vigilance to maintain the levels we needed to see without overloading any particular link. We have no private peering at all.

After some very unfortunate dealings with a bandwidth provider in the "performance based routing" business, I decided to do it on my own. It's important to note that in my world, my mandate was simple - get us the best performance from our network that you can possibly get. Worry about cost after performance. We house some large VoIP, Gaming and E-Commerce farms, and cost was the lowest concern on our plate - keeping the customers happy was the primary concern.

I started out by going from 2 backbones to buying backbone bandwidth from a total of 7 carriers, spreading those out among Cisco 7507s and Juniper M20s and basically relying on BGP and my engineering staff to monitor and manage those resources.
In the end I discovered that it was a huge job to keep all of those balls in the air while not upsetting some of our larger customers. I spent months researching and talking to friends that drive some of the largest networks in the world. In the end, it was very clear to see that BGP was not up to the task of dealing with my network requirements. Best path simply did not equate to best performance, and BGP had no provisions for determining saturation on my links.

My engineers and I spent months talking to vendor after vendor about their products, doing research and trying to find the closest thing to a 'silver bullet' that we could find. An engineer friend of mine at Google turned me onto RouteScience, and we put them into the mix of vendors we were testing. Our needs were simple - 100% performance based routing until we came within 15% of max utilization on any given backbone, then next best performance path. In my world, cost based routing was the last thing we needed to deal with.

We enlisted the help of several of our larger data center customers in a kind of blind trial of the various manufacturers, as well as utilizing KeyNote locations around the world for testing. After four months of testing and evaluation, we chose the RouteScience box.

In my mind, the question about utilizing route optimization boxes is moot. Until we build into BGP (or some other method) the ability to sense latency and capacity issues, optimize bandwidth allocation based on our preferences, and maintain service level agreements by keeping our traffic heading down the best performance path automatically, we have to employ and dedicate an increasing number of engineers to these tasks. Route Optimization equipment plays a critical part in keeping my customers happy, and keeping me and my other expensive engineers focused on other tasks more closely related to the bottom line.

No smoke, no mirrors, no BS - these are real world numbers from our network. For me the proof was in the performance.
After four months of baseline reporting, we were seeing an average performance increase (measured as a decrease in latency) of 40 to 50% between the routes my PathControl box is selecting and standard BGP routes. My backbones include carriers such as Level3, UUNet, Qwest, XO, Verio - decent backbones with major connectivity.

In reality, I learned that BGP is simply not up to the task of handling anything beyond its limited scope - best path routing. In today's world, we need to look beyond best path, as it simply has nothing to do with best performance, at least not in 40 to 50% of my traffic routing decisions. You can do that with bodies (if you're a purist) or you can utilize route optimization equipment. In either case, you have to do it.

I think for the time being, route optimization equipment, and the companies that utilize them, will have an edge over those doing things the manual way. Regardless of which box I could have chosen, the end result is that my backbone engineers and I have far more time on our hands for other tasks, and my customers are much happier than they were before.

On Wed, 21 Jan 2004 12:27:16 -0800 "Jim Devane" <jim@powerpulse.cc> wrote:
Hello,
I am trying to determine for myself the relevance of Intelligent Routing Devices like Sockeye, Route Science etc. I am not trying to determine who does it better, but rather if the concept of optimizing routes is addressing a significant problem in terms of improved traffic performance ( not in cost savings of disparate transit pipes )
I am interested in hearing other views ( both for and against ) these devices in the context of optimizing latency for a small multi-homed ISP. I want to make sure I understand their context correctly and have not missed any important points of view.
My questions are these:
"Is sub-optimal routing caused by BGP so pervasive it needs to be addressed?"
"Are these devices able to effectively address the need?"
Thanks,
Jim
******************************************
Richard J. Sears
Vice President
American Digital Network
----------------------------------------------------
rsears@adnc.com
http://www.adnc.com
----------------------------------------------------
858.576.4272 - Phone
858.427.2401 - Fax
----------------------------------------------------
I fly because it releases my mind from the tyranny of petty things . .

"Work like you don't need the money, love like you've never been hurt and dance like you do when nobody's watching."
On Fri, Jan 23, 2004 at 11:01:14AM -0800, Richard J. Sears wrote:
In reality, I learned that BGP is simply not up to the task of handling anything beyond its limited scope - best path routing. In today's world, we need to look beyond best path, as it simply has nothing to do with best performance, at least not in 40 to 50% of my traffic routing decisions. You can do that with bodies (if you're a purist) or you can utilize route optimization equipment. In either case, you have to do it.
I think for the time being, route optimization equipment, and the companies that utilize them will have an edge over those doing things the manual way. Regardless of which box I could have chosen, the end result is that myself and my backbone engineers have far more time on their hands for other tasks and my customers are much happier than they were before.
BGP is relatively good at determining the best path when you are a major carrier with connectivity to "everyone" (i.e. when traffic flows "naturally"), in many locations, and you engineer your network so that you have sufficient capacity to support the traffic flows. However, BGP is relatively BAD at determining the best path when you are the customer of many carriers, some of whom have serious problems on their network that they spend a lot of time and effort trying to hide from you, and when you have a diverse assortment of link speeds. In this setup, traffic does not flow "naturally".

I often find myself spending a fair amount of time talking people down from trying to make their network "better" by buying transit from every carrier they can get their hands on. A single flapping session on a single transit can get you dampened for quite a while, making you only as strong as your weakest link. Also, the convergence becomes painfully slow, not to mention flaptacular, as best paths are computed, announced, re-computed, re-announced, re-re-computed, etc (and if you don't believe me watch Internap converge some time). Plus, if you are an inbound heavy network, the localpref increase via certain paths (everyone localprefs their own customers above routes they hear from peers/transits) will cause a skew in traffic that prepending may have little to no influence over.

Bottom line, BGP is most useful when you select paths as naturally as possible, with as few transits as are needed for redundancy, and use equal-sized pipes with sufficient capacity to support the traffic flow (or where you make capacity decisions based on the traffic levels, not the other way around). When you try to force BGP to work with the model you described, it will go kicking and screaming.
Now this isn't to say that even the best run carrier doesn't have their off days, and that there is potential benefit from having many different carriers to choose from, but it does almost REQUIRE a different system of path selection to be effective. Unfortunately there are some serious problems to overcome in order for any such system to scale, not the least of which are:

* The inability to receive FULL bgp routes from every bgp peer to your optimization box without requiring your transit providers to set up a host of eBGP Multihop sessions (which most refuse to do). This means you will always be stuck assuming that every egress path is a transit and can reach any destination on the Internet until your active or passive probing says otherwise.

* The requirement of deaggregation in order to make best path decisions effective. For example, someone's T3 to genuithree gets congested and the best path to their little /24 of the Internet is through another provider. Do you move 4.0.0.0/8?

* The constant noise of stupid scripts pinging everything on the Internet. Once upon a time I heard some pretty interesting numbers about the amount of traffic a newly routed /8 with no usage received just in Internet noise from all the scanners, hackers, and worms out there. I don't know if it was true or not (though I'm sure someone on this list has done such and can tell us exactly how much traffic it is), but just looking at the amount of noise much smaller blocks receive leads one to the conclusion that active analysis will not scale to support everyone.

etc etc etc. There is certainly room for improvement of traffic engineering in the protocols, but the perl scripts and zebra hacks most people are throwing at the problem currently are far from capable of handling it.

-- 
Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
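The localpref skew ras describes for inbound-heavy networks can be sketched with a toy model (all values invented; in the real decision process LOCAL_PREF is compared before AS-path length, which is why prepending often has no effect):

```python
# Toy sketch of the localpref skew: local-pref is compared before AS-path
# length, so prepending cannot pull traffic off a remote network's
# customer path. All values are invented.
def bgp_best(routes):
    return min(routes, key=lambda r: (-r["local_pref"], r["as_path_len"]))

routes = [
    # their customer's announcement: long path, but local-pref 120
    {"via": "their-customer", "local_pref": 120, "as_path_len": 6},
    # your much shorter announcement heard via a peer, at the default 100
    {"via": "peer", "local_pref": 100, "as_path_len": 2},
]
print(bgp_best(routes)["via"])  # their-customer
```

Path length is never even consulted here, so no amount of prepending on the peer path changes the outcome.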
Richard, you have made some good points in this thread. One general observation, and then specific responses...

I don't assert that current route optimization technology solves ALL routing problems, but I do think that there are some specific problems that automation can effectively and gracefully solve.
* The inability to receive FULL bgp routes from every bgp peer to your optimization box without requiring your transit providers to set up a host of eBGP Multihop sessions (which most refuse to do). This means you will always be stuck assuming that every egress path is a transit and can reach any destination on the Internet until your active or passive probing says otherwise.
The issue that you describe does indeed offer some constraints to the application of route optimization technology. Within the scope of this issue, though, I think that you would agree that a network which is ALL transit would face no challenge here -- and more specifically, if there is a routing optimization decision among local transit links, that problem could be solved independently of the existence of "non-transit" links.

Applying this technology in the presence of "non-transit" routes requires constraining measurements to only the prefixes appropriate for a given link. It is true that knowing all BGP routes ("BGP Losers") would be a nice way to get this information ... but it's not necessarily the only approach towards the goal. Some solutions may have topological dependencies, but it can be feasible to simply drop all measurement towards "illegal" destinations. In other cases, it may be possible to define the set of destinations that are legal over a given link, and constrain measurements for that link.
* The requirement of deaggregation in order to make best path decisions effective. For example, someone's T3 to genuithree gets congested and the best path to their little /24 of the Internet is through another provider. Do you move 4.0.0.0/8?
Perhaps. Yes, it's a /8. But if measurements to the /8 show better collective performance over another link, why NOT move it? Yes, it could be carrying a lot of traffic, and could result in congesting the next link ... so it is necessary to be able to:

- know when links are at/near capacity, and so avoid their use; and
- react quickly in case of congestion

Note that these problems are not specific to /8s, and that traffic loads are dynamic - even if it does look like there is "room" for a prefix on a link, once the route gets changed, conditions could very well change also. Any route optimization system needs to deal with these issues for ALL prefixes.

There are multiple levels of optimization possible on top of this:

a) If there is a general belief that /8s are simply "too big" to move, they can be manually deaggregated. Our experience shows that by breaking up a /8 into as few as (10) or (15) carefully designed "chunks", the resultant load per (deaggregated) prefix becomes equivalent to hundreds of other prefixes.

b) If manually configuring deaggregates is not desirable, automated approaches to deaggregation are possible: "If I see traffic in this range, and a /xx does not exist for the observed traffic, then create the /xx".

c) Dynamically measure all of the possible deaggregations of all active space, and dynamically determine which prefixes need to be deaggregated to what level.

Note that in any of the above cases, the de-aggregated routes should be marked NO_EXPORT. I know of solid commercial implementations of (a) and (b). (c) is a more interesting project ... :)
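As a rough illustration of option (a), Python's ipaddress module can generate such a chunk list. The /8 used, the chunk size, and the string encoding of the community are all invented for illustration; NO_EXPORT itself is the well-known community 65535:65281.

```python
import ipaddress

# Carve a /8 into fixed chunks (here /12s, giving 16 pieces, in the
# spirit of "10 or 15 carefully designed chunks") and pair each with
# NO_EXPORT so the more-specifics never leave your AS.
NO_EXPORT = "65535:65281"  # well-known community, shown here as a string

chunks = ipaddress.ip_network("4.0.0.0/8").subnets(new_prefix=12)
advertisements = [(str(c), NO_EXPORT) for c in chunks]

print(len(advertisements))  # 16
print(advertisements[0])    # ('4.0.0.0/12', '65535:65281')
```

With NO_EXPORT attached, the chunks can be shifted between local exits without polluting the global table with deaggregates.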
* The constant noise of stupid scripts pinging everything on the Internet.
Pinging the Internet is clearly a wasteful approach. Essentially no one needs optimization to the ENTIRE Internet. Granted, major backbones probably actually use a great deal of the routing table ... (Quiz for the list readers: What percentage of the Internet routing table does your network actually use?) ... but for many an ISP/hosting facility/major multihomed enterprise, our experience shows that only a very small fraction of traffic is seen beyond about (20,000-30,000) routes in a given day.

There is no reason to measure destinations unless they are involved with traffic to your network. Basing measurements on observed traffic, or having applications instrumented to automatically generate their own measurements, are both "clean" options here.

Companies and ISPs today spend time(=money) managing their connectivity to the Internet. Loop-free connectivity is a basic first step; but in many cases real connectivity goals include:

- Capacity management (especially in the presence of asymmetrical bandwidth)
- Load management (in the case of usage-based billing)
- Performance management (realizing 'best possible' performance)
- Maximizing application availability (fastest possible reroute, in the case of congestive failure)

Manually tweaking routing policies to achieve these goals is a time-honored craft (especially with this crowd :) ... but I suspect that even the most experienced in this area will acknowledge that there is a tier of this problem that may be best automated. (Note that I said "a tier" -- there are clearly additional problems that current route optimization technology DOESN'T solve. :)

cheers -- Sean
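Sean's "base measurements on observed traffic" idea can be sketched as a simple selection loop (prefixes and byte counts are invented; documentation ranges used for the examples): probe only the busiest prefixes until some share of observed bytes is covered, instead of pinging the whole table.

```python
# Sketch: choose which prefixes to actively measure based on observed
# traffic volume, covering 95% of bytes with as few prefixes as possible.
observed = {
    "198.51.100.0/24": 9000,  # bytes seen toward each prefix (invented)
    "203.0.113.0/24": 800,
    "192.0.2.0/24": 150,
    "100.64.0.0/10": 50,
}

def prefixes_to_measure(traffic, coverage=0.95):
    total = sum(traffic.values())
    chosen, seen = [], 0
    for prefix, volume in sorted(traffic.items(), key=lambda kv: -kv[1]):
        if seen / total >= coverage:
            break  # enough of the observed traffic is already covered
        chosen.append(prefix)
        seen += volume
    return chosen

print(prefixes_to_measure(observed))  # ['198.51.100.0/24', '203.0.113.0/24']
```

Two of four prefixes cover 98% of the bytes here, which mirrors the observation that only 20,000-30,000 routes matter to most networks on a given day.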
On Mon, Jan 26, 2004 at 10:58:49AM -0800, Sean Finn wrote:
The issue that you describe does indeed offer some constraints to the application of route optimization technology. Within the scope of this issue, though, I think that you would agree that a network which is ALL transit would face no challenge here -- and more specifically, if there is a routing optimization decision among local transit links, that problem could be solved independantly of the existance of "non-transit" links.
Just noting why it will never be anything other than a small-customer, transit-only solution. As long as you are guaranteed by design that your product will never be applicable to large networks or networks with any peering, you know that odds are VERY slim you'll ever have anyone with real network clue using the product. Under such conditions, snake oil sales flourish.
Applying this technology in the presence of "non- transit" routes requires constraining measurments to only the prefixes appropriate for a given link. It is true that knowing all BGP routes ("BGP Losers") would be a nice way to get this information ... but it's not necessarily the only approach towards the goal. Some solutions may have topological dependancies, but it can be feasible to simply drop all measurement towards "illegal" destinations.
In other cases, it may be possible to define the set of destinations that are legal over a given link, and constrain measurements for that link.
Good luck making this scale. :)
* The requirement of deaggregation in order to make best path decisions effective. For example, someone's T3 to genuithree gets congested and the best path to their little /24 of the Internet is through another provider. Do you move 4.0.0.0/8?
Perhaps. Yes, it's a /8. But if measurements to the /8 show better collective performance over another link, why NOT move it? Yes, it could be carrying a lot of traffic, and could result in congesting the next link ... so it is necessary to be able to:
- know when links are at/near capacity, and so avoid their use; and
- react quickly in case of congestion
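The two requirements above can be sketched as a capacity-aware egress choice. All names, thresholds, and figures are illustrative assumptions, not any vendor's algorithm:

```python
# Hypothetical sketch: move an aggregate to the best-measured egress
# only if that egress has headroom for the aggregate's observed
# traffic, so the "fix" cannot itself congest the next link.
def pick_egress(links, prefix_rtt_ms, prefix_mbps):
    """links: {name: (capacity_mbps, current_load_mbps)}
    prefix_rtt_ms: {name: measured RTT to the prefix via that link}."""
    candidates = []
    for name, (cap, load) in links.items():
        if load + prefix_mbps <= 0.9 * cap:   # keep 10% headroom
            candidates.append((prefix_rtt_ms[name], name))
    if not candidates:
        return None   # nowhere safe to move; leave routing alone
    return min(candidates)[1]

links = {"transit-a": (1000, 950), "transit-b": (1000, 400)}
rtt = {"transit-a": 35.0, "transit-b": 48.0}
print(pick_egress(links, rtt, 100))  # transit-a is full -> "transit-b"
```

Note the failure mode: the lowest-RTT link is skipped because the move would push it past its headroom, which is exactly the at/near-capacity check the first bullet calls for.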
What is broken for one provider and fixed at another may very well break something else that was working before at the first provider, yes? Besides the difficulties of assigning a true metric to the overall reachability of a /8 or any aggregate for that matter ("ok we decreased rtt by 20ms to these 3 destinations doing 15Mbps each but we increased rtt to this other destination doing 40Mbps by 60ms so we're better right?"), do you really want to see the problems you are supposed to be solving with optimized routing popping up and going away again throughout the day?

And yes you do bring up another valid point, how much of the congestion you're trying to avoid is caused by your own traffic? If the answer is none you're fine, but this by definition means the failure of your optimized routing product. If it is a success you will either a) have people with lots of traffic using it, or b) have so many small-traffic users that the collective decisions of your box become the "huge user". The problems then become:

* The quicker you try to react, the more you place yourself at risk of starting a best path flap cycle.

* Congestion does not only happen on your uplink circuit, it can happen at every point along the path, including peers, backbone circuits, and even the end user/site links. While I find the sales pitches of people touting the horrors of peering to be quite sad (from Internap to the classic MAE Dulles :P), peering capacity is largely based on the ability to predict the traffic levels far in advance. It doesn't take that many "large" customers selecting certain destinations through one provider at once to blow up a peer in one region.

Balancing the traffic of a GigE and a couple of FastE transits to keep each one uncongested may be enough functionality to sell some boxes to some low end users, but this falls into the categories I've described above, and does nothing to address the true end to end performance.
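The aggregate-metric question in the parenthetical above can be made concrete by weighting each destination's RTT change by its traffic. A minimal sketch using the numbers from the text (all values illustrative):

```python
# Traffic-weighted RTT change for the example quoted above: three
# destinations at 15 Mbps each improve by 20 ms, while one at 40 Mbps
# worsens by 60 ms. Weighting by traffic shows the move is a net loss.
flows = [(15, -20.0), (15, -20.0), (15, -20.0), (40, +60.0)]

total_mbps = sum(mbps for mbps, _ in flows)                 # 85 Mbps
weighted_delta = sum(mbps * d for mbps, d in flows) / total_mbps
print(round(weighted_delta, 1))  # prints 17.6 -> ~17.6 ms WORSE overall
```

This is only one possible metric; whether traffic volume is even the right weight (versus, say, application sensitivity) is exactly the ambiguity the quoted question is pointing at.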
Thus the only real solution to the problem if you actually want to optimize traffic is:
c) Dynamically measure all of the possible deaggregations of all active space, and dynamically determine which prefixes need to be deaggregated to what level.
Note that in any of the above cases, the de-aggregated routes should be marked NO_EXPORT.
Throw away the BGP routing table completely, and build your own based on the topology and metrics you have detected. Of course, this means saying goodbye to the usual failsafe method of keeping the normal BGP routes in the table with a lower localpref so if the box falls over you just fail back to normal BGP path selection. And probably more importantly, there isn't enough scale in the traffic probing system to gather the necessary topology info once for every customer... Maybe if you made everyone's boxes report data back to a central site, you could gather something useful from it.
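Option (c) above can be sketched as a recursive split driven by measurement divergence. This is a hypothetical illustration, not any vendor's algorithm; `rtt_of` stands in for whatever probing backend supplies per-prefix measurements:

```python
import ipaddress

# Hypothetical sketch of option (c): recursively split a prefix only
# where measurements to its two halves diverge, so just the parts of
# the table that need distinct best paths get deaggregated.
def deaggregate(prefix, rtt_of, threshold_ms=30.0, max_len=24):
    """rtt_of(network) -> measured RTT in ms, supplied by a prober."""
    net = ipaddress.ip_network(str(prefix))
    if net.prefixlen >= max_len:
        return [net]
    lo, hi = net.subnets(prefixlen_diff=1)
    if abs(rtt_of(lo) - rtt_of(hi)) < threshold_ms:
        return [net]  # halves behave alike; keep the aggregate
    return (deaggregate(lo, rtt_of, threshold_ms, max_len)
            + deaggregate(hi, rtt_of, threshold_ms, max_len))
```

With a measurement function reporting 10 ms to one half of 4.0.0.0/8 and 100 ms to the other, this yields exactly the two /9s and nothing finer; the combinatorial cost RAS objects to shows up in how many probes `rtt_of` must make as prefixes keep splitting.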
Pinging the Internet is clearly a wasteful approach. Essentially no one needs optimization to the ENTIRE Internet. Granted, major backbones probably actually use a great deal of the routing table ...
(Quiz for the list readers: What percentage of the Internet routing table does your network actually use?)
... but for many an ISP/hosting facility/major multihomed enterprise, our experience shows that only a very small fraction of traffic is seen beyond about 20,000-30,000 routes in a given day.
There is no reason to measure destinations unless they are involved with traffic to your network. Basing measurements on observed traffic, or having applications instrumented to automatically generate their own measurement are both "clean" options here.
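The "measure only what you exchange traffic with" point, and the quiz above, can be sketched with a longest-prefix match over observed flow destinations. The table and destinations here are toy RFC 5737 values, purely illustrative:

```python
import ipaddress

# Hypothetical sketch for the quiz above: given observed flow
# destinations and a routing table, count how many table prefixes
# actually carried traffic. Toy data; a real table has ~130k routes
# (2004) and the destinations would come from NetFlow or similar.
table = [ipaddress.ip_network(p) for p in
         ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")]

def longest_match(addr, table):
    addr = ipaddress.ip_address(addr)
    best = None
    for net in table:
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return best

seen = {longest_match(d, table) for d in ("192.0.2.7", "192.0.2.9")}
seen.discard(None)
print(f"{100 * len(seen) / len(table):.0f}% of table used")  # 33% of table used
```

Only the prefixes in `seen` would ever be scheduled for active measurement, which is the "based on observed traffic" option described above.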
The usage numbers sound about right, and targeting only destinations where you actually exchange traffic is certainly a big improvement over not, but it's still going to generate a lot of noise for active traffic destinations. But I guess there are always passive measurement alternatives, like measuring the download of a gif customers have to link on their websites *cough*. :)
Manually tweaking routing policies to achieve these goals is a time-honored craft (especially with this crowd :) ... but I suspect that even the most experienced in this area will acknowledge that there is a tier of this problem that may be best automated. (Note that I said "a tier" -- there are clearly additional problems that current route optimization technology DOESN'T solve. :)
I doubt you'll find anyone here who will stand up and admit to enjoying tweaking metrics and policies more often than once a month. The problem with interest from most of this crowd (or at least "those of this crowd who actually run networks", which probably doesn't qualify as most any more) is simply that none of the product and very little of the technology applies to the networks they run or the work they have to do. -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Richard A Steenbergen wrote:
The issue that you describe does indeed offer some constraints to the application of route optimization technology. Within the scope of this issue, though, I think that you would agree that a network which is ALL transit would face no challenge here -- and more specifically, if there is a routing optimization decision among local transit links, that problem could be solved independently of the existence of "non-transit" links.
Just noting why it will never be anything other than a small customer transit-only solution. As long as you are guaranteed by design that your product will never be applicable to large networks or networks with any peering, you know that odds are VERY slim you'll ever have anyone with real network clue using the product. Under such conditions, snake oil sales flourish.
It appears to me that you've acknowledged that route optimization solves a problem, albeit one that is not a complete solution for your network. The claims of 'snake oil' seem inappropriate in this context. One step further: if you are running a network of this type, then there seems to be a large likelihood that you are selling transit. Thus, your customers may well be using technology of this sort to provide real solutions to THEIR problems. (Specifically, they may be directing traffic towards providers that are to _their_ advantage, and be gaining detailed insight as to the real quality of connectivity being provided to them.) It's not clear to me how you chose to define "real network clue", but I would not suggest that your customers are completely lacking in that area. :)
In other cases, it may be possible to define the set of destinations that are legal over a given link, and constrain measurements for that link.
Good luck making this scale. :)
Granted - it is a limited solution -- but still a solution that does solve a set of real-world problems.
What is broken for one provider and fixed at another may very well break something else that was working before at the first provider, yes? Besides the difficulties of assigning a true metric to the overall reachability of a /8 or any aggregate for that matter ("ok we decreased rtt by 20ms to these 3 destinations doing 15Mbps each but we increased rtt to this other destination doing 40Mbps by 60ms so we're better right?"),
Having measurement traffic that directly correlates to actual traffic makes this problem much more manageable.
The problems then become:
* The quicker you try to react, the more you place yourself at risk of starting a best path flap cycle.
* Congestion does not only happen on your uplink circuit, it can happen at every point along the path, including peers, backbone circuits, and even the end user/site links. While I find the sales pitches of people touting the horrors of peering to be quite sad (from Internap to the classic MAE Dulles :P), peering capacity is largely based on the ability to predict the traffic levels far in advance. It doesn't take that many "large" customers selecting certain destinations through one provider at once to blow up a peer in one region.
Flap control is an important consideration. Note that in the described topology, changing the selection of an egress point does not affect the routing tables of external networks (as opposed to flapping of route advertisements, for inbound traffic.)

I do think that it's useful to compare the behaviour of "mortal" BGP in the conditions you describe ... if BGP selects a path that is, or becomes, congested, BGP has no feedback mechanism to make a change until the overall topology changes, or until manual intervention. An automated route optimization system can evaluate the performance, and current load, of alternate egresses, make an automated change to the egress, and then monitor the success of the change. In most cases, the overall conditions will have been improved.

In the case you describe above, the route change results in suboptimal performance, and a new decision is needed. This process needs to have effective flap control. This is an area in which I've seen a fair amount of development, and have seen good results in years of production use.
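One common flap-control shape (a hypothetical sketch, not a description of any vendor's mechanism) combines a minimum-improvement threshold with a hold-down timer, so measurement noise cannot bounce a prefix between egresses:

```python
import time

# Hypothetical flap-control sketch: only move a prefix's egress if the
# alternate path is better by a minimum margin AND enough time has
# passed since the last move. Thresholds are illustrative assumptions.
MIN_GAIN_MS = 10.0    # ignore improvements smaller than this
HOLD_DOWN_S = 300.0   # no second move within 5 minutes of the last

def should_move(cur_rtt_ms, alt_rtt_ms, last_move_ts, now=None):
    """Return True if switching egress is justified right now."""
    now = time.monotonic() if now is None else now
    if now - last_move_ts < HOLD_DOWN_S:
        return False                      # still in hold-down
    return cur_rtt_ms - alt_rtt_ms >= MIN_GAIN_MS
```

A 15 ms improvement after the hold-down expires triggers a move; a 5 ms improvement, or any improvement inside the hold-down window, does not. This is the "react quickly, but not so quickly you start a best-path flap cycle" trade-off from earlier in the thread.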
Balancing the traffic of a GigE and a couple of FastE transits to keep each one uncongested may be enough functionality to sell some boxes to some low end users, but this falls into the categories I've described above, and does nothing to address the true end to end performance.
It's not clear to me what you mean here by "true end to end performance". I don't pretend that the approach being discussed is a COMPREHENSIVE solution to all the problems that can impair performance; but I do think that for the class of performance problems that are directly observable via inspection of alternate egresses, redirecting the egress does in fact address "true end to end performance".
Thus the only real solution to the problem if you actually want to optimize traffic is:
c) Dynamically measure all of the possible deaggregations of all active space, and dynamically determine which prefixes need to be deaggregated to what level.
Note that in any of the above cases, the de-aggregated routes should be marked NO_EXPORT.
Throw away the BGP routing table completely, and build your own based on the topology and metrics you have detected. Of course, this means saying goodbye to the usual failsafe method of keeping the normal BGP routes in the table with a lower localpref so if the box falls over you just fail back to normal BGP path selection.
This alone seems to make adoption of such technology rather difficult ...
And probably more importantly, there isn't enough scale in the traffic probing system to gather the necessary topology info once for every customer ... Maybe if you made everyone's boxes report data back to a central site, you could gather something useful from it.
IMHO, that approach has demonstrated scalability limitations. Performance, and load information, tends to get stale very quickly.

------------------------------------------------------

While it does seem obvious that a richer palette of routing policy control SHOULD be a core part of the routing fabric, I don't expect to see BGPv4 (or multihoming under IPv6) providing real solutions for this set of problems for the foreseeable future.

cheers -- Sean
RAS> Date: Mon, 26 Jan 2004 15:35:28 -0500
RAS> From: Richard A Steenbergen
RAS>
RAS> On Mon, Jan 26, 2004 at 10:58:49AM -0800, Sean Finn wrote:
RAS>
RAS> > (Quiz for the list readers:
RAS> > What percentage of the Internet routing table does
RAS> > your network actually use?)

Perhaps around 25% for a "moderate"-sized organization, but as low as 5% is not unreasonable for regionals and locals. Discount spam from around the world, and I suspect the numbers drop even more. :-)

RAS> I doubt you'll find anyone here who will stand up and admit
RAS> to enjoying tweaking metrics and policies more often than
RAS> once a month. The problem with interest from most of this

A couple of days a couple of times a year of manual testing and tweaking for "most important" prefixes usually does the trick. Considering industry instability, one probably changes an upstream about as often as one would tune anyway. Considering the difficulty one often has finding a clued rep, one probably spends more time educating sales reps than tuning traffic. ;-)

Eddy
--
Brotsman & Dreger, Inc. - EverQuick Internet Division
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita
_________________________________________________________________
DO NOT send mail to the following addresses: blacklist@brics.com -or- alfra@intc.net -or- curbjmp@intc.net
Sending mail to spambait addresses is a great way to get blocked.
participants (11)
- E.B. Dreger
- Jim Devane
- Olivier Bonaventure
- Patrick W. Gilmore
- Paul Vixie
- Phil Rosenthal
- Richard A Steenbergen
- Richard J. Sears
- Sean Finn
- Tom (UnitedLayer)
- vijay gill