If we need to go to Cisco and say, "Hey, make a GSR that does BGP updates faster", then that's what we need to do! Imposing limitations on end users which make the internet less useful is not a solution to the problem; at best it's a kludge which reduces the headache for backbone providers but doesn't actually solve any long-term problems.
Gee..., why didn't anyone else think of that? Cisco, are you listening? You need to make a GSR that does BGP updates faster, ok? While you're at it, you need to write bug-free software and build routers that never fail, ok? As for the "imposing limitations on end-users which make the (I)nternet less useful", well, personally, I'd trade "less useful" for "available" in a second. Though you should understand that those limitations aren't imposed because folks have nothing better to do; they're imposed because people with experience attempt to design things that are reliable, scale well, and are manageable. Having worked for several large service providers, I assure you, hacks and one-offs don't scale well. -danny
Gee..., why didn't anyone else think of that? Cisco, are you listening? You need to make a GSR that does BGP updates faster, ok? While you're at it, you need to write bug-free software and build routers that never fail, ok?
C'mon. I'm obviously not suggesting it is as easy as "ask and ye shall receive". My point here is that demand drives the market, and if it becomes clear that routers with faster BGP implementations are what is needed, that is just what the vendors will (eventually, at least) develop. Do you think vendors have been madly increasing the throughput of their switches and routers over the last couple years just for the fun of it? I doubt it. It's because increased bandwidth is *what's in demand*. If there's a theoretical limit that prevents routers from processing BGP updates faster than they do today, I'd love to hear an explanation -- then again, I also heard convincing arguments that modems could never get faster than 9600 baud.
As for the "imposing limitations on end-users which make the (I)nternet less useful", well, personally, I'd trade "less useful" for "available" in a second. Though you should understand that those limitations aren't imposed because folks have nothing better to do; they're imposed because people with experience attempt to design things that are reliable, scale well, and are manageable. Having worked for several large service providers, I assure you, hacks and one-offs don't scale well.
I'm not suggesting that the limitations are being imposed for no good reason -- I'm simply saying that imposing that kind of limitation *is not a long-term solution*, if what users of the network actually need is something else. Also, keep in mind that in the case we are talking about, the *exact reason* that users need portable /24s is for reliability -- so at least for those users, I'm sure they would rather each provider be 10% less reliable, if it meant they could multihome to multiple providers.
Gee..., why didn't anyone else think of that? Cisco, are you listening? You need to make a GSR that does BGP updates faster, ok? While you're at it, you need to write bug-free software and build routers that never fail, ok?
C'mon. I'm obviously not suggesting it is as easy as "ask and ye shall receive". My point here is that demand drives the market, and if it becomes clear that routers with faster BGP implementations are what is needed, that is just what the vendors will (eventually, at least) develop.
you're forgetting (or not admitting here) that a corporation's primary motivators are profit and shareholder value. a vendor will surely develop anything that gains them significant market share, or significant increase in profits, or significant increase in revenues. driven by demand alone, a market is not. "routers with faster bgp implementations are what is needed" is what we say, but the question a vendor asks is "does it increase my profit margin, revenues, or market position?". what we "want" is mostly irrelevant. -b
I'm not sure this is really the right forum for this facet of the discussion, but..
"routers with faster bgp implemetations are what is needed" is what we say, but the question a vendor asks is "does it increase my profit margin, revenues, or market position?". what we "want" is mostly irrelevant.
Don't you think introducing something which is A) In significant demand, and B) Nobody else has, generally works to increase one's market share? Sure, for various handy features and widgets, you can tell the vendor how much you want them all day and nothing will get done. Why? Because although it would be damn handy, and although it might be unique in the market, it's not actually something which is going to sway a lot of purchasing decisions -- if it's not a big enough feature that you can convince management you should switch from Vendor A to Vendor B just because Vendor B has the feature, then it's not going to affect the current balance of market share very much. They're better off spending their money on marketing to attract new customers than on developing your new widget. On the other hand, arguments like "it will significantly improve the quality of service we can provide to our customers" or "it will reduce our operating costs by allowing us to more efficiently route traffic" tend to more directly impact purchasing decisions. This then represents a real incentive for vendors. Really, this seems like kind of a silly argument -- do you honestly think routers are going to start getting slower and decreasing in capacity?
On Mon, 15 May 2000 15:05:45 EDT, Chris Williams <chris.williams@third-rail.net> said:
"routers with faster bgp implemetations are what is needed" is what we say, but the question a vendor asks is "does it increase my profit margin, revenues, or market position?". what we "want" is mostly irrelevant. Don't you think introducing something which is A) In significant demand, and B) Nobody else has, generally works to increase one's market share?
You (and a lot of other people) seem to have forgotten a line item: C) Price Point. How much are people willing/able to PAY for such a feature, and how many do you have to sell at that price to offset the R&D costs? Sure, *any* good router vendor can build a router that can handle 100 million routing table entries. The questions are (a) can they do it for a pricetag of under $2M, and (b) how many will they sell? -- Valdis Kletnieks Operating Systems Analyst Virginia Tech
Giving this some thought (something I never do before posting to NANOG, normally), I assume initially only the larger ISPs would opt for a purchase of that magnitude. Let's say these new routers go into production and only the largest ISPs buy them: now we have the largest ISPs listening to the announcements of /24s in space normally disallowed, while over time the other ISPs still refuse to listen to those announcements, except for the ones with these new magical routers. Consider that for a moment. Customers would quickly ditch their ISP for one of the few who will announce and listen to those networks. Imagine the Marketing Dept nightmare, with ISPs claiming to "route more of the net than our competitor". This is why we have policy disallowing this; if ISPs aren't on the same page, this sort of nonsense scenario would happen. Since we do have policy, and an ounce of sense, this scenario should never happen. The net must route, but let's all do it the same way, shall we?
Folks, I've been playing around with inter-backbone latency measurements using an N^2 matrix of Keynote agents. The results were interesting enough that I've put up a website that shows TCPOpen latency numbers (last hour and last 24 hours) for seven US backbones. Have a look at: http://internetpulse.net/ and let me know your thoughts. Please review the "About this site" link at the bottom of the main page for details on measurements and methodology. Note especially that these are *not* web measurements. The values are the number of milliseconds required to establish a TCP connection. I'm especially interested in your opinions on the thresholds and the use of geometric means in the calculations. Please send comments to me directly, as there is no need to clog up the NANOG list. I'll summarize after a few days if there's interest. --Lloyd
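(For those weighing in on the geometric means: the geometric mean of n samples is (x1 * x2 * ... * xn)^(1/n); unlike the arithmetic mean, it damps the effect of a few very large outliers, which is presumably the rationale for using it on latency data.)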
Gee, just when I thought I got the required answers to my "simple" multi-homing questions to/from the comp.protocols.tcp-ip news group, I started seeing these NANOG threads related to multi-homing; now I'm not so sure it's "simple", or even if "simple" multi-homing to > 1 provider is possible... [I've pasted the news posting below]. Our requirements are:
- We need link redundancy (a single location); our web service requires 24x7 availability.
- We can't co-locate our web service to an ISP/hosting provider at the current time.
- Our current provider, UUNET (we're using T1 burstable service), has a single POP in our location (Ottawa, Ontario, Canada). We don't want 2 links to the same provider (POP), or do we?
- As far as I know there is only one provider in this area that has > 1 POP. [It's not financially prudent for us to drop UUNET and go with them -- read: we have a contract].
- We have been allocated a /24 from UUNET.
BTW: Since last Thursday afternoon, we've been having T1 link flakiness and have been intermittently up and down (mostly down). We replaced our router [it was delivered to the guy in charge of our network on Saturday night -- sort of like a pizza delivery: one router please, with a T1 for extra topping...:-)] and the T1 card and some cabling -- the finger-pointing is still fricken going on... So please don't try to tell me that us small /24 guys don't need link redundancy and multi-homing... we do... I basically need to be educated and told whether the configuration described below will work. Based upon these NANOG threads, I'm concerned that when our primary link goes down, our /24 address block will not be globally routable via our backup link/ISP, since some ISPs filter /24s out? Comments? What are the alternatives? In summary, the configuration we "were" considering:
- Use only default routing.
- We need to get our own ASN.
- The primary T1 link will use local-preference so all outgoing traffic will use this link.
- The backup T1 will use AS_path manipulation to insert bogus AS entries to pad out the AS_path length so incoming traffic should prefer the primary T1.
That's all...
Below is the response I got from a "comp.protocols.tcp-ip" news group query, the ">" are my original questions, the other info. is from a person who responded to my questions.... In article <391E3098.CB2C084@home.com>, Todd Sandor <tsandor@home.com> wrote:
Hi, I need some assistance/direction in trying to determine what I need to do in order to multi-home to different providers. It sounds simple enough; please provide help/hints/references. Cheers
I think the bottom line question is how to I reliably multi-home to multiple (2 in this case) providers without a PI (provider-independent IP address) and without an ASN? Maybe someone can direct me to a document that described
If you're going to run BGP, you need an ASN, but you don't need PI addresses. You can advertise your UUNET-assigned address block to your backup provider. As long as you tell UUNET to export it as well, you should be fine. For a good description of how to configure multi-homing with BGP, see <http://www.netaxs.com/~freedman/bgp.html>.
I've done some reading about BGP (e.g. Bassam Halabi's "Internet Routing Architectures"), but have no "hands on" experience. What I would like to be able to do is run BGP to each provider and use one link as a primary link and the second link as a backup. I think I would need to:
- Use default routing [dynamically learn 0/0 from both providers]. I would use the BGP "local preference" attribute (or Cisco's weight parameter) to affect outgoing traffic to use the primary link.
- Use AS_path manipulation to insert bogus AS entries into the AS_path attribute on the backup link to influence inbound traffic [from what I understand this needs to be done all the way up to the NAP -- will my providers help me with this (tell me the # of bogus entries I'll need to add)?].
If your primary ISP is a tier-1 like UUNET, 1-2 levels of padding should be sufficient. You may also need to send a community to your backup provider, if you don't want them using that connection for traffic from their own customers. This is because some providers use local-pref to prefer direct customer links over peering links.
- Would filter inbound routes to only accept 0/0. Filter outbound to only send our address block.
You should be able to ask the providers to only send you default routes.
- We currently have a Cisco 2610 -- is this sufficient? Is there a particular IOS release we should run?
For default routes, this should be fine. Any recent IOS should be OK.
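Pulling the advice in this exchange together, a minimal sketch of the customer side in Cisco IOS might look like this. All AS numbers, neighbor addresses, and prefixes below are placeholders invented for illustration -- substitute whatever your providers actually assign:

  ! customer AS 65001; our block is 10.20.5.0/24 (placeholders)
  router bgp 65001
   network 10.20.5.0 mask 255.255.255.0
   ! primary provider A (AS 64500): raise local-pref so outbound prefers it
   neighbor 10.1.1.1 remote-as 64500
   neighbor 10.1.1.1 route-map PRIMARY-IN in
   neighbor 10.1.1.1 prefix-list DEFAULT-ONLY in
   neighbor 10.1.1.1 prefix-list OUR-BLOCK out
   ! backup provider B (AS 64510): prepend our own AS so inbound prefers A
   neighbor 10.2.2.1 remote-as 64510
   neighbor 10.2.2.1 prefix-list DEFAULT-ONLY in
   neighbor 10.2.2.1 prefix-list OUR-BLOCK out
   neighbor 10.2.2.1 route-map BACKUP-OUT out
  !
  ! static route to Null0 anchors the network statement in the table
  ip route 10.20.5.0 255.255.255.0 Null0 250
  ip prefix-list OUR-BLOCK permit 10.20.5.0/24
  ip prefix-list DEFAULT-ONLY permit 0.0.0.0/0
  !
  route-map PRIMARY-IN permit 10
   set local-preference 200
  route-map BACKUP-OUT permit 10
   set as-path prepend 65001 65001

How many prepends you actually need depends on the path lengths upstream; as noted above, ask both providers.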
- We have been allocated a /24 from UUNET. We are probably not going to be able to justify a /20 from Arin in order to get PI (provider independent) IP address space. I believe we'll need to use an IP address from UUNET or the future "other" provider.
Few organizations other than ISPs can justify /20's and larger.
- It may be difficult to get an ASN (see question #1) .
Questions:
1) The ARIN ASN request information requires verification that you are a multi-homed site -- if you're just planning to become multi-homed, will ARIN still give you an ASN?
They want you to provide contact information for both providers. Once you purchase the service from the second provider, you'll be multi-homed and they'll give you the ASN. Until then, you don't need it.
2) If we were to use a private ASN, both providers would need to strip this off [our IP addresses would seem to be part of each provider's AS]; then the same IP address block (say our /24) would have different ORIGIN attributes -- other than being "illegal", would this cause routing instabilities? Do some providers allow this?
You shouldn't do this.
3) What are some of the reasons why Arin at page "http://www.arin.net/regserv.html" specifies "Provider-independent (portable) addresses obtained directly from ARIN are the least likely to be globally Routable".
Some ISPs filter out advertisements smaller than /19. So even though ARIN will assign /20's, they may not be seen by everyone. --
Let me summarize and ask this in a different way -- the goal is to provide stability to end-users [service availability (HA)], and it requires cooperation between network service providers [a shorter question with a little spin to make the question fall within the charter of NANOG :-)]... The question is: how? When a customer in one location is using a multi-homed setup to two providers A and B, with A being the primary (using one of the primary's /24s they've loaned to the customer) and B being the secondary (advertising via B with a longer AS_path -- the simple case that uses default routes): when the customer's link to A fails, will the /24 that now needs to be globally visible via B (a non-aggregate address for B) NOT be globally visible because of the BGP filtering policies of some other provider somewhere, say C? [I think I know the answer -- which is "it will NOT be globally visible, but it depends... e.g. on who A and B are, etc." -- but this is not the answer customers want... They want a viable/reliable solution and I'm not sure how you go about providing it. If it's not possible, I'm stating the obvious, but this is a problem that is only going to get bigger as more /24 types want/require redundant links... i.e. it's an operational issue, no?]... Todd Sandor wrote:
Gee, just when I thought I got the required answers to my "simple" multi-homing questions ....
In the last year I worked for a company which had multihomed /24s and we never had any problem with parts of the internet being unreachable when our primary provider was down, at least not that anyone noticed. I suspect this is because of which providers were upstream -- the configuration was that we were directly peered with C&W, using C&W address space, and our backup was a tier 2 who peered with UUnet and Sprint. My theory is that when our connection to C&W was down, networks which filtered our /24 advertisement would send traffic destined for us to C&W (who was still advertising large aggregates which our /24s were under), and then once it reached C&W, C&W would use its own peering connections with UUnet and/or Sprint to deliver the traffic. Does this sound plausible, or am I missing something? Do a lot of multihomed /24ers get away with it by this principle? In what situations would something like this _not_ happen, aside from peering directly with a primary provider who would not accept advertisements for your small address block from outside? (which would be kind of pointless, anyway..)
[ On Tuesday, May 16, 2000 at 00:59:55 (-0400), Todd Sandor wrote: ]
Subject: "Simple" Multi-Homing ? (was Re: CIDR Report)
Our requirements are: - We need link redundancy (a single location) -- our web service requires 24x7 availability.
That shouldn't be too hard to achieve if you can get Bell to co-operate. What you might want to look at is provisioning two different services, eg. the T1 you have as well as an ISDN BRI or DSL line for backup (or a second T1 routed through a different telco CO if you absolutely need the bandwidth, but then you'd probably have to be willing to re-locate to a place in the city where such redundancy would be possible at reasonable cost). UUNET should be able to assist in getting full link redundancy.
- We can't co-locate our web service to an ISP/Hosting-Provider at the current time.
OK, but why not? They have the advantage of aggregating many costs together and thus can provide far more reliable services than you can ever hope to do at any similar cost on your own.
- Our current provider, UUNET (we're using T1 burstable service), has a single POP in our location (Ottawa, Ontario Canada). We don't want 2 links to the same provider (POP), or do we?
To decide how "deep" you need redundancy you have to look at where the risks are. You also have to have a long and hard look at exactly where the primary community you serve is located on the Internet. I'd bet that >50% of the risk lies in your own on-premises facilities. About 25% of the risk will be the local loop to UUNET's POP, and the rest of the risk is that UUNET's link to Ottawa will go down. However the perception you no doubt have is that if your link goes down you're dead no matter what else might happen. In that case link-level redundancy to your provider will suffice to eliminate the obvious finger-pointing problems.

Then what remains is either entirely your responsibility (eg. your building burns to the ground, or your disk fries and you learn your backups are all garbage); or UUNET's responsibility (eg. someone digs up the fibre they use to connect to Toronto/Montreal/wherever). If your customers are also regional UUNET customers then having redundancy to another ISP isn't likely going to help you any if UUNET themselves are down for whatever reason. If your customers are mostly in the USA then you should probably think harder about why you haven't moved your servers to a good co-location facility in the USA. If your customers are mostly in Europe, then why aren't your servers there too? If your customers are all over the place then why don't you have multiple servers located in diverse (Internet geography-wise) locations? We're talking about the Internet here, aren't we? It shouldn't matter one iota where your servers are located!

Before you say that you must have 24x7 availability you really need to think awfully hard about just how much money that level of service is worth to you, and then you have to get some expert advice to tell you how much that level of service costs in the real world. If the numbers don't match then you really need to carefully analyze the risks and the costs to mitigate them and then find some balance between what you can afford to spend and what you can do to reduce the biggest risks equally across the board.
- As far as I know there is only one provider in this area that has > 1 POP. [It's not financially prudent for us to drop UUNET and go with them -- read: we have a "contract"].
Oops -- sounds like someone didn't do enough up-front planning for this! ;-)
- We have been allocated a /24 from UUNET.
IP allocation is essentially meaningless in your case. You are not going to benefit from any kind of IP routing redundancy unless you can pull your own fibre down different routes to different locations in the USA. Period. Don't even think about BGP -- it won't help you unless you're willing to pay mega-bucks for your own long-haul links and unless you've got incredibly "secure" facilities. "multi-homing" to two different providers in the same city isn't really going to be any more reliable than simply paying one good provider enough incentive to sign a decent service level agreement with you and let them deal with the redundancy issues to the rest of the Internet. (Does your UUNET contract include a service level agreement that reflects your true requirements? If not, why not?)
BTW: Since last Thursday afternoon, we've been having T1 link flakiness and have been intermittently up and down (mostly down). We replaced our router [it was delivered to the guy in charge of our network on Saturday night -- sort of like a pizza delivery: one router please, with a T1 for extra topping...:-)] and the T1 card and some cabling -- the finger-pointing is still fricken going on... So please don't try to tell me that us small /24 guys don't need link redundancy and multi-homing... we do...
Multi-homing != link-level redundancy. Physical multi-homing is one hell of a lot cheaper than IP multi-homing. Fix the right problem! Don't let "teething problems" fool you into thinking you need something that you don't. -- Greg A. Woods +1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods> Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
IP allocation is essentially meaningless in your case. You are not going to benefit from any kind of IP routing redundancy unless you can pull your own fibre down different routes to different locations in the USA. Period. Don't even think about BGP -- it won't help you unless you're willing to pay mega-bucks for your own long-haul links and unless you've got incredibly "secure" facilities. "multi-homing" to two different providers in the same city isn't really going to be any more reliable than simply paying one good provider enough incentive to sign a decent service level agreement with you and let them deal with the redundancy issues to the rest of the Internet.
Although I agree that co-locating is probably the right solution in this guy's case, I _strongly disagree_ with the assertion that having links to two different providers is not helpful, even if somewhere further up the line there is still a single point of failure (such as the fibre trunk to the US, or whatever). There is a whole class of human-error and miscommunication problems which can cause an interruption of service from one provider, regardless of how technically redundant your links to that provider are. Examples I've directly experienced in the past year:
-- The provider finds a cancellation for an old circuit floating around in their DB, confuses it with your current service, and shuts down your connection (I had this happen with a prominent tier-1 provider -- these kinds of mistakes are not just made by small no-name ISPs).
-- The provider sends your bill to the wrong address, so your accounting dept doesn't pay it, and they suspend your service.
-- A box on your network is cracked and used to send SPAM. The provider shuts you down for spamming without warning, or with warnings sent to the wrong address.
Given, these types of things can be largely prevented by maintaining good lines of communication with your provider. But in the real world, miscommunication and mistakes do happen, especially if you are busy and understaffed, as I think most small high-tech companies are. In the time I recently worked with a small multihomed company (about 2 years), I would say the outages we experienced were 30% downed circuits, 40% incompetence on the part of our tier-2 upstream, 10% our own incompetence, and 20% legitimate miscommunication as described above (if you count by duration of the outage -- if you counted the number of discrete outages, you would see us rebooting our flaky router a lot ;)). If you assume we had gone with a competent backup provider, that would leave 1/3 of all downtime due to "human factors" other than our own foul-ups, rather than actual hardware and/or software failures. Of course, my experience in this type of scenario may not be universal. ;)
In any case, it seems to me we've veered significantly from the original topic. I thought we had pretty well established that "there are some legitimate needs for multihomed /24s" -- the question brought to the list was really "will my /24 work if I multihome it?", not "please tell me I don't know what I want". [BTW, thanks Rob for pointing out my date problem -- somebody here thought it was clever for our login script to get the date from a server with a dead CMOS battery..]
[ On Tuesday, May 16, 2000 at 12:09:16 (-0400), Chris Williams wrote: ]
Subject: Re: "Simple" Multi-Homing ? (was Re: CIDR Report)
Although I agree that co-locating is probably the right solution in this guy's case [...] There is a whole class of human-error and miscommunication problems which can cause an interruption of service from one provider, regardless of how technically redundant your links to that provider are. Examples I've directly experienced in the past year: -- The provider finds a cancellation for an old circuit floating around in their DB, confuses it with your current service, and shuts down your connection (I had this happen with a prominent tier-1 provider -- these kinds of mistakes are not just made by small no-name ISPs). -- The provider sends your bill to the wrong address, so your accounting dept doesn't pay it, and they suspend your service.
Those are examples of contingencies that should be covered in your service level agreement. The SLA should hold the provider responsible for any loss of income, recovery costs, or whatever, should they be the ones to screw up. Assuming you trust the provider to honour the SLA you've worked out with them then your level of risk is mitigated even if it's not done exactly in the way you might prefer under ideal circumstances -- we are talking about (hopefully exceptional) events here, after all! If you don't trust your provider to honour their agreements then I'd humbly suggest you find one you can trust! ;-) All but the last are also examples where basic link-level redundancy will help to avoid total outages. You don't need an ASN and full BGP route peering just to remain connected when your T1 goes down! Please let's solve the right problem here!
-- A box on your network is cracked and used to send SPAM. The provider shuts you down for spamming without warning, or with warnings sent to the wrong address.
If such action is specified in your contract then you've accepted that risk and you should mitigate it appropriately (eg. by regularly testing and securing your servers!). I'd hope that if you did have redundant routing then your other provider would also cut you off for the same reason and at approximately the same time!
In any case, it seems to me we've veered significantly from the original topic. I thought we had pretty well established that "there are some legitimate needs for multihomed /24s" -- the question brought to the list was really "will my /24 work if I multihome it?", not "please tell me I don't know what I want".
I'm not entirely sure I agree yet. There are lots of excuses flying around, but few real reasons. Sure you can concoct fake requirements until they add up to this being the only alternative but I don't think they'll weigh out in the long run. I think there's another alternative that's being missed here too that'll satisfy the majority of needs of quite a few people, if not most. It should be trivial to obtain only the minimum necessary address space from both providers and truly multi-home the servers requiring redundancy! For outgoing connections you simply flip the default route on each server as necessary (perhaps using automated tools) and for incoming connections you just put multiple A RRs in your DNS for each service requiring redundancy. Load balancing opportunities spring to mind here too! I.e. please let's solve the right problem! -- Greg A. Woods +1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods> Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
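As a concrete sketch of the multiple-A-RR half of that suggestion (names, TTL, and addresses invented; the short TTL lets a dead address age out of resolver caches quickly):

  ; zone-file fragment: one A record per provider-assigned address
  www    300    IN    A    10.20.5.10    ; address out of provider A's block
  www    300    IN    A    10.30.7.10    ; address out of provider B's block

Resolvers rotate between the two, which gives the crude load balancing mentioned above, and well-behaved clients will try the second address if the first doesn't answer.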
Those are examples of contingencies that should be covered in your service level agreement. The SLA should hold the provider responsible for any loss of income, recovery costs, or whatever, should they be the ones to screw up. Assuming you trust the provider to honour the SLA you've worked out with them then your level of risk is mitigated even if it's not done exactly in the way you might prefer under ideal circumstances -- we are talking about (hopefully exceptional) events here, after all! If you don't trust your provider to honour their agreements then I'd humbly suggest you find one you can trust! ;-)
Most SLAs I've seen, at least for smaller customers, are of the type "if we're down for a day, you get a free week", which means in general your maximum remedy for an outage is the cost of a T1 for a month. I think it is pretty plausible that a company which only needed a T1 of bandwidth could lose a lot more than $1500 worth of business if they were down for a day or two.
All but the last are also examples where basic link-level redundancy will help to avoid total outages. You don't need an ASN and full BGP route peering just to remain connected when your T1 goes down! Please let's solve the right problem here!
All three were examples of miscommunication causing someone at the provider to intentionally suspend or terminate service. It would hardly matter how many links you had to the provider when they chose to shut you down.
If such action is specified in your contract then you've accepted that risk and you should mitigate it appropriately (eg. by regularly testing and securing your servers!). I'd hope that if you did have redundant routing then your other provider would also cut you off for the same reason and at approximately the same time!
The situation I was trying to highlight was one where such an incident occurs, and the customer quickly and appropriately responds, but one of their providers overreacts and at some point during the process suspends service. It is really not about who is right, but about the fact that any given provider is run by a small group of humans, and that any given group of humans is to some degree unpredictable. If you only have one provider, it only takes one human mishandling a situation to take you offline. I would hope that most reasonable providers would _not_ cut off a customer immediately if they were found to be a source of misbehavior, but first ask them politely to fix the problem (with the exception, of course, of immediately blocking any traffic that was actively interfering with someone else's operation). If you have discovered a way to make a machine guaranteed and perfectly secure, I might reconsider this position. ;P
I think there's another alternative that's being missed here too that'll satisfy the majority of needs of quite a few people, if not most. It should be trivial to obtain only the minimum necessary address space from both providers and truly multi-home the servers requiring redundancy! For outgoing connections you simply flip the default route on each server as necessary (perhaps using automated tools) and for incoming connections you just put multiple A RRs in your DNS for each service requiring redundancy. Load balancing opportunities spring to mind here too!
Although I agree that this is a possible solution, I think at some point it would become awfully hard to manage -- also, it only addresses a subset of the situations requiring multihoming. Do you know of any software to help implement this type of solution? I can imagine how to script up the default-route swapping pretty easily on a Unix box, but AFAIK it would likely require a reboot under NT, and I'm not sure how you would go about automating it even then.. Maybe a good way to go about it would be to set up a box to do reverse NAT for incoming connections to either set of server IPs, and then round-robin between IP spaces for outgoing connections? I think this could be set up with IPF under *BSD/Linux; I'm not familiar enough with NAT under IOS to know how hard it would be to do with a Cisco router.. This would have the advantage of simplifying the server configurations, and there should really be something in the way of a firewall/filter in front of them anyhow. The only real disadvantage I can see of this solution is that the load-balancing is not topology-sensitive -- on the other hand, if you weren't going to receive full views anyway, or if both providers end up connecting to the tier-1 backbones in the same place, this is a moot point, and you are actually better off with round-robin load-balancing.
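For what it's worth, the route-swapping half really is only a few lines of shell on a Unix box. A minimal sketch, assuming BSD-style route(8) and netstat(1), with invented gateway addresses (run it from cron):

  #!/bin/sh
  # Point the default route at the backup gateway when the primary
  # stops answering pings. All addresses are placeholders.
  PRIMARY_GW=10.1.1.1
  BACKUP_GW=10.2.2.1

  if ping -c 3 $PRIMARY_GW >/dev/null 2>&1; then
      GW=$PRIMARY_GW
  else
      GW=$BACKUP_GW
  fi

  # do nothing if the default route already points the right way
  CURRENT=`netstat -rn | awk '$1 == "default" { print $2 }'`
  [ "$CURRENT" = "$GW" ] && exit 0

  route delete default >/dev/null 2>&1
  route add default $GW
  # (on Linux: route del default; route add default gw $GW)

NT is, as you say, another story.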
[ On Monday, May 16, 1988 at 14:49:16 (-0400), Chris Williams wrote: ]
Subject: Re: "Simple" Multi-Homing ? (was Re: CIDR Report)
Most SLAs I've seen, at least for smaller customers, are of the type "if we're down for a day, you get a free week", which means in general your maximum remedy for an outage is the cost of a T1 for a month. I think it is pretty plausible that a company which only needed a T1 of bandwidth could lose a lot more than $1500 worth of business if they were down for a day or two.
An SLA is like any other contract between two parties. If one of them doesn't negotiate what they really need but still signs off, then who's to blame? If you really are in a position to lose a lot of hard cash business, but not enough to justify a more reliable setup, then perhaps you shouldn't be asking your ISP to be your insurance company too, but rather go to a real insurance company for such services instead.
All three were examples of miscommunication causing someone at the provider to intentionally suspend or terminate service. It would hardly matter how many links you had to the provider when they chose to shut you down.
Actually it would probably make a big difference. It could also make a big difference as to how quickly you would be able to get back up and running too. In any case all of the problems you've outlined are also greatly reduced if you forge a good business and working relationship with your provider. It's a lot less likely for a business "partner" to do nasty things to you, even by accident, if you do a bit more than just plug into their router and pay them anonymously every month or whatever.
I would hope that most reasonable providers would _not_ cut off a customer immediately if they were found to be a source of misbehavior, but first ask them politely to fix the problem (with the exception, of course, of immediately blocking any traffic that was actively interfering with someone else's operation). If you have discovered a way to make a machine guaranteed and perfectly secure, I might reconsider this position. ;P
Like I said you do active and very regular testing. If you can't catch an open relay on your own servers before the spammers do (or a smurf amplifier) then something's not working right in your operations department!
Although I agree that this is a possible solution, I think at some point it would become awfully hard to manage -- also, it only addresses a subset of the situations requiring multihoming.
It's one hell of a lot easier to "manage" physical multi-homing than it is to spoof the requirements necessary to have portable address space and an ASN! ;-)
Do you know of any software to help implement this type of solution? I can imagine how to script up the default-route swapping pretty easily on a Unix box, but AFAIK it would likely require a reboot under NT, and I'm not sure how you would go about automating it even then..
I don't do NT and don't cater to those who do! ;-)
Maybe a good way to go about it would be to set up a box to do reverse NAT for incoming connections to either set of server IPs, and then round-robin between IP spaces for outgoing connections? I think this could be set up with IPF under *BSD/Linux, I'm not familiar enough with NAT under IOS to know how hard it would be to do with a Cisco router.. This would have the advantage of simplifying the server configurations, and there should really be something in the way of a firewall/filter in front of them anyhow.
Yes a carefully configured, multi-homed, NAT on your network's "edge" could do all of the redundancy tricks too.... (I don't know enough of the Cisco IOS NAT either to know if it could do this, but I'd bet it can for at least basic scenarios where only a few "well known services" are multi-homed.) The only problem with NAT is that you have to be DAMN careful about how you handle the non-TCP packets too otherwise all kinds of error-handling goes out the window and you may as well not have any redundancy in the first place (eg. if you send ICMP host-unreachable replies when one of the servers is down but they have an RFC-1918 source address and are filtered out before they can make it to the client, well...). Personally I'd just run both networks over the same "wire" (but with totally separate routers) and put interface aliases on all the necessary machines. Except for the redundant connection itself I already do this on my home network! ;-) Of course you could also set up completely twinned servers (perhaps with a private administrative LAN between them on the inside) and thus enjoy the benefits of physical redundancy all the way to the core. However if you do this then why not put one set of servers in SF and the other in NY and be truly dual-homed!?!?!? -- Greg A. Woods +1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods> Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
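For the IPF/ipnat route Chris mentions, the basic TCP case might look like the ipnat.conf sketch below. Interface names and addresses are invented, and per the caveat above this says nothing about getting the ICMP behaviour right:

  # redirect web hits arriving on provider B's address (fxp1)
  # to the server numbered out of provider A's block
  rdr fxp1 10.30.7.10/32 port 80 -> 10.20.5.10 port 80 tcp
  # rewrite outbound connections leaving via B into B's address space
  map fxp1 10.20.5.0/24 -> 10.30.7.0/24 portmap tcp/udp 10000:40000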
briefly speaking, there are two major solutions to a problem like this one, when a constantly increasing number of customers ask "please sell me x" (in our case, x is multihoming of very small networks to two different isps) and you do not have x in stock because you have y. the solutions are: 1) to tell the customers "you do not need x, you need y" (and yes, customers really need y, not x); 2) to work on getting x in stock and selling this x to customers. it is extremely obvious to me that in the free market environment the winners are those who use the second approach. -- dima.
On Tue, May 16, 2000 at 05:15:15PM -0400, Dmitri Krioukov wrote:
it is extremely obvious to me that in the free market environment the winners are those who use the second approach.
yes, but by selling your customers something that is not appropriate, you turn your customers into losers, which eventually will bite you in the ass. -- [ Jim Mercer jim@reptiles.org +1 416 410-5633 ] [ Reptilian Research -- Longer Life through Colder Blood ] [ Don't be fooled by cheap Finnish imitations; BSD is the One True Code. ]
yes, but by selling your customers something that is not appropriate, you turn your customers into losers, which eventually will bite you in the ass.
this ass biting statement is too loose. consider the situation on the ip te front. look at the recent oxc thread on this list. isn't anything biting you? you've been selling ip... -- dima.
Thank you for all the responses -- the info has been educational (special thanks to Chris and Greg) and it's given us lots more things to consider (e.g. our SLA with our provider)... Thanks... We have always planned to co-locate (still a few months away), but we're just building our "web-service" (we're a startup) and in the short term we need our boxes on-site and we need the service as HA (highly available) as possible [the goal is 24x7, but...]. We have considered most of the other alternatives that were discussed in the mails (e.g. run effectively two networks, each providing the "web-service", and use some load-balancing mechanism/products with other tricks/scripts when links fail), but I wanted to flush out what the issues would be if we were to multi-home via BGP to different providers [it's currently planned as a short-term solution] so we could weigh the benefits/drawbacks of the approaches we were considering... My hope was that the BGP approach would be "cleaner"... maybe a little more upfront work, but less ongoing maintenance when a link fails [none of the change-the-default-routes-on-the-fly hacks (and yes, we're running NT), manual/scripted procedures when links fail -- e.g. DNS changes, etc.]. But we need to understand the issues before we can make the decision. This brings me back to my original "operational" question of whether a non-aggregated /24 will be globally visible when our primary link fails... The situation: "When a customer in one location is using a multi-homed setup to two providers A and B, with A being the primary (using a /24 loaned from provider A) and B being the secondary (updates via B would have a longer AS_path -- using default routes with local-pref on the primary): when the customer's link to A fails, will the /24 that now needs to be globally visible via B (a non-aggregate address for B) NOT be globally visible because of the BGP filtering policies of some other provider somewhere, say C?"
Since I've been getting a few private email questions on this -- here is what I was able to find out... Basically it can work, but it's dependent upon selecting appropriate ISP providers, and on those providers peering (or, I would assume, not being too many ASes away)... Chris Williams had it correct all along in his mail: http://www.merit.edu/mail.archives/nanog/msg02315.html Here is how I understand it: Yes, the /24s will be filtered out by some [big "meanies" -- but give'm lots of $$ and they co-loc for ya -- it's for the greater good of the Internet... ;-)...]. If provider A (primary) has a /19 or larger, and you get your /24 from them, then when your link to them goes down, your /24 is withdrawn by A, but provider A's /19 or larger is still globally visible and packets will still be sent to provider A for your /24. If provider B peers with provider A and is advertising your /24, then packets will be routed from A through B to your site. Note: In this case, provider A must not filter /24s from its peers to its own address space. A must also export your /24 (with your AS) when your link is up, in addition to their /19. See the "BGP Multi-Homing to > 1 ISP Help Required" thread in comp.protocols.tcp-ip if you want more info. Cheers... Todd Sandor wrote:
This brings me back to my original "operational" question of whether a non-aggregated /24 will be globally visible when our primary link fails... The situation:
"When a customer in one location is using a multi-homed setup to two providers A and B, with A being the primary (using a /24 from loaned from provider A) and B being the secondary (updates via B would have a longer AS_path - using default routes with local-pref on the primary). When the customers link to A fails, will the /24 that needs to be now globably visible via B (a non-aggregate IP address for B) NOT be globably visible because of the BGP filtering policies of some other provider somewhere, say C ?
[ On Friday, May 19, 2000 at 00:04:30 (-0400), Todd Sandor wrote: ]
Subject: Re: "Simple" Multi-Homing ? (was Re: CIDR Report)
Here is how I understand it:
[[ followed by lots of ifs and buts ]] Yup, and you're still really only protecting just your local loop, and that can be done a hell of a lot more reliably, and probably cheaper, with physical-layer equipment and no handwaving with routing protocols. You'd be far better off with a fall-back ISDN PRI, or wireless, connection to your same primary ISP. At least then you'll still be on the air to 100% of the Internet, albeit at degraded performance (which is something you can manage by reducing the bandwidth requirements of your application when such things happen, perhaps by dropping all the graphics in favour of a single "TV test-screen" logo or something! ;-) -- Greg A. Woods +1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods> Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
brett watson: Monday, May 15, 2000 11:44 AM
C'mon. I'm obviously not suggesting it is as easy as "ask and ye shall receive". My point here is that demand drives the market, and if it becomes clear that routers with faster BGP implementations are what is needed, that is just what the vendors will (eventually, at least) develop.
you're forgetting (or not admitting here) that a corporation's primary motivators are profit and shareholder value. a vendor will surely develop anything that gains them significant market share, or significant increase in profits, or significant increase in revenues. driven by demand alone, a market is not.
"routers with faster bgp implemetations are what is needed" is what we say, but the question a vendor asks is "does it increase my profit margin, revenues, or market position?". what we "want" is mostly irrelevant.
I beg to differ; the real issue is "find what the market wants, give it to them". The problem is that they haven't figured out how to give it to us. We have to help them, if we want it badly enough.

A lot of us, in the new dot-com arena, have a certifiable need for a portable /24. Truthfully, that's about all we do need. What we also have a need for is multi-homing over a very large and diverse area. The usual reason for this (availability/reliability) is not the ONLY reason for this requirement. Actually, I listed three scenarios: small companies with intercontinental locations; large dot-coms with many regional co-locations; virtual companies with large collaboration networks. Under the current system, what we are forced to do is either obtain a /24 for each location (even when there are <16 hosts there), or "engineer" our way into a portable /20 so we can participate in peering. Either method burns IP addresses in huge gulps (one engineering trick is to stop using NAT). Most of us, being more than a little bit socially conscious, cringe at the act of burning the IPs. But the system makes us do it anyway. Alternatively, we let the system dictate our business model and stop developing global dot-coms (not! -- try flying *that* past a VC <heh> -- I'll even let you borrow my flame-suit).

The real short-term answer is to universally allow /24 announcements (I disagree with going below /26). If router capabilities do not support this, then the vendors will have to be encouraged to beef up their equipment. (Having just bought three Cisco Catalyst 6509s, with 3524XL end nodes, in the past six months, another few $20K wouldn't send my CPA into a tail-spin.) However, no amount of hardware is going to change filtering policies, and that's why I bring this up here. What are the operational alternatives?
The real short-term answer is to universally allow /24 announcements (I disagree with going below /26). If router capabilities do not support this, then the vendors will have to be encouraged to beef up their equipment. (Having just bought three Cisco Catalyst 6509s, with 3524XL end nodes, in the past six months, another few $20K wouldn't send my CPA into a tail-spin.) However, no amount of hardware is going to change filtering policies, and that's why I bring this up here. What are the operational alternatives?
Since you mentioned your flame-suit earlier, you may want to lend it out to whoever goes to Verio next to ask that they listen to announcements of networks smaller than /16. Better yet, let me borrow it for this next post. Hey, I'd love it if the ISPs did listen to those announcements. It would solve all our problems, though it may create a new one. The real question to be asking is: can the net handle tables that large? That would be an interesting way to ask an ISP why they refuse any announcement for anything smaller than a /16 -- are they worried their hardware can't take it? If not, why isn't such hardware available? So then the fingers no longer point at the ISP with the nazi filtering policy but at the evil router vendor for not giving us hardware that can handle about 16 million entries.

As for operational alternatives, the only solution I ever see suggested is telling the person complaining that said ISP is being a big meanie and filtering out their /24 network (which happens to be in A-space) to take it up with their own ISP and have them aggregate it properly. I do like this word "properly": it's proper when it's not trying to be broken up over many areas, but when a customer has the issue of being broken up all over (per the example given a few emails back on here), the customer is... well, screwed. The only answer is to burn up the IP blocks -- fewer IPs for the world, but hey, everyone's hardware has fewer routes to handle :). Before I'm flamed for saying this, let me assure you I do not consider the situation ideal at all. Rodney Caston SBC Internet Services
-----Original Message----- From: owner-nanog@merit.edu [mailto:owner-nanog@merit.edu]On Behalf Of Roeland Meyer (E-mail) Sent: Monday, May 15, 2000 3:31 PM To: 'brett watson'; nanog@merit.edu Subject: RE: CIDR Report
Under the current system, what we are forced to do is either obtain a /24 for each location (even when there are <16 hosts there), or
so, basically we have this very common and well-understood situation where there are some customers who do not require large blocks but who want to see their small pi blocks advertised and routable more or less everywhere (if those blocks are pa, how many over there are using the second option from rfc2260? if not that many, what's wrong with that?). among these customers, there is some portion who may be educated that they do not really need what they ask for. the other portion either cannot be educated or does really need that. as for an "interim" :) solution, i cannot see much problem (except significant coordination effort) in having (regional) irs allocating blocks from which longer (than /24) prefixes would be acceptable even by verio. this way, verio would be happy filtering all longer prefixes except from these well-known blocks, and the address space wouldn't be wasted on the aforementioned customers. some tables may even be created matching allowed prefix lengths and the well-known block(s) for them. -- dima.
it looks like i'm reinventing classful routing, though :)
-- dima.
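dima's well-known-blocks idea maps directly onto the prefix-list filters providers already apply; a minimal sketch, with an entirely hypothetical "swamp" block set aside by the registries:

  ! hypothetical: registries reserve 10.96.0.0/12 for small multihomers
  ! inside that block accept prefixes as long as /24; elsewhere /19 or shorter
  ip prefix-list PEERS-IN permit 10.96.0.0/12 le 24
  ip prefix-list PEERS-IN permit 0.0.0.0/0 le 19

which is indeed classful routing reinvented with CIDR-era tools.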
On Mon, 15 May 2000, Danny McPherson wrote:
Cisco, are you listening? You need to make a GSR that does BGP updates faster, ok? While you're at it, you need to write bug-free software....
Those were going to be my words. It seems every time we buy something relatively new & high-end from Cisco, they don't actually have working code available yet...so we either end up suffering with serious bugs or they send us replacement hardware for which there's more stable software available. Latest non-working combo...7206VXR/NPE225/PA-MC-T3. Anyone know of a stable release for this? TAC doesn't seem to. Solution for now is to swap out the NPE225 for an older more widely supported one...since only 2 trains support the combo above and none work. ---------------------------------------------------------------------- Jon Lewis *jlewis@lewis.org*| I route System Administrator | therefore you are Atlantic Net | _________http://www.lewis.org/~jlewis/pgp for PGP public key__________
participants (13)
- brett watson
- Chris Williams
- Danny McPherson
- Dmitri Krioukov
- Jim Mercer
- jlewis@lewis.org
- Lloyd Taylor
- Randy Bush
- Rodney L Caston
- Roeland Meyer (E-mail)
- Todd Sandor
- Valdis.Kletnieks@vt.edu
- woods@weird.com