RIPE "Golden Networks" Document ID - 229/210/178
Hello folks,
This is actually NANOG applicable, despite referring to RIPE... ;-)
How many of you who manage BGP speaking networks implement the RIPE "best practices" regarding dampening parameters for so-called "golden networks"?
See: http://www.ripe.net/ripe/docs/routeflap-damping.html and http://www.qorbit.net/documents/golden-networks (thanks, Steve!)
If you do, what parameters do you use, or do you not dampen the "golden networks" at all?
If you don't implement ripe-229, why not?
If there is enough interest/response (i.e., if anyone besides me feels this is a real operational issue currently and wants to deal with it), I'll work on compiling the responses and producing a report.
Note: A *significant* number of networks appear to *not* follow ripe-229 guidelines at all.
Thanks,
Rodney Joffe
CenterGate Research Group, LLC
http://www.centergate.com
"Technology so advanced, even WE don't understand it"(R)
well....
RIPE is the RIR for Europe. RIPE-229 is, from my viewpoint, arbitrary and capricious. the root servers are -ONE- set of interesting servers. what about the web sites that point to these "important" documents? or the time servers, or my NOC & monitoring machines?
The idea of an Internet Registry stepping into giving routing advice is a leap of faith. An RIR can tell you what was delegated - but presuming to give advice on what is important for everyone that uses IP protocols is over the top.
so no, i don't use this document as a guideline for "golden networks". the advice on dampening is important tho, and it is worthwhile.
On Sep 3, 2004, at 3:44, Rodney Joffe wrote:
Hello folks,
This is actually NANOG applicable, despite referring to RIPE... ;-)
How many of you who manage BGP speaking networks implement the RIPE "best practices" regarding dampening parameters for so-called "golden networks"?
See: http://www.ripe.net/ripe/docs/routeflap-damping.html and http://www.qorbit.net/documents/golden-networks (thanks, Steve!)
If you do, what parameters do you use, or do you not dampen the "golden networks" at all?
If you don't implement ripe-229, why not?
If there is enough interest/response (i.e if anyone besides me feels this is a real operational issue currently and wants to deal with it), I'll work on compiling the responses and producing a report.
Note: A *significant* number of networks appear to *not* follow ripe-229 guidelines at all.
Thanks,
Rodney Joffe CenterGate Research Group, LLC http://www.centergate.com "Technology so advanced, even WE don't understand it"(R)
On Fri, Sep 03, 2004 at 04:06:12AM +1200, Bill Manning wrote:
well....
RIPE is the RIR for Europe. RIPE-229 is, from my viewpoint, arbitrary and capricious. the root servers are -ONE- set of interesting servers. what about the web sites that point to these "important" documents? or the time servers, or my NOC & monitoring machines?
The idea of an Internet Registry stepping into giving routing advice is a leap of faith. An RIR can tell you what was delegated - but presuming to give advice on what is important for everyone that uses IP protocols is over the top.
No. RIPE != RIPE NCC (RIR). This document is a product of the RIPE Routing-WG [1]. Read the reference.
Fred
[1] http://www.ripe.net/ripe/about/index.html
Bill,
I agree with your general line of reasoning, but would likely characterize RIPE as an RIR *and* operator forum... formulating and reviewing recommendations on operational matters makes some sense as a result.
As to the particular set of prefixes, there's a great question as to what criteria make a particular network "important"... one could easily come up with a list of extremely popular commercial sites (CNN, Amazon, etc.) whose absence might be more noticeable if their routes were damped for an hour.
/John
At 4:06 AM +1200 9/3/04, Bill Manning wrote:
RIPE is the RIR for Europe. RIPE-229 is, from my viewpoint, arbitrary and capricious. the root servers are -ONE- set of interesting servers. what about the web sites that point to these "important" documents? or the time servers, or my NOC & monitoring machines?
The idea of an Internet Registry stepping into giving routing advice is a leap of faith. An RIR can tell you what was delegated - but presuming to give advice on what is important for everyone that uses IP protocols is over the top.
Some facts:
"RIPE" is an operator forum, comparable to NANOG, APRICOT, AFNOG, .... (Strictly speaking RIPE pre-dates all of the others if one disregards that NANOG started as the NSFnet regional network meetings. ;-)
"RIPE NCC" is a Regional Internet Registry, comparable to ARIN, APNIC, LACNIC, AFRINIC, .... (The RIPE NCC is the first of the regional registries.)
RIPE is the public forum where RIPE NCC policies and procedures are set; they describe how the RIR allocates and assigns internet numbers.
RIPE NCC policies and procedures are *extremely* careful not to prescribe any inter-domain routing practises and go out of their way to stress that operators have the authority about that.
RIPE also makes general recommendations, which have nothing to do with the RIPE NCC. The "golden networks" recommendations are in this category. They are also just that: recommendations.
-----------
Some opinions:
The goals of the RIPE recommendations are laudable: Make dampening work predictably by aligning parameters. They reflect a general belief in "think globally, act locally" which still permeates RIPE discussions.
However, this is not likely to work because:
- operators have the sole authority about routing
- different local goals
- different capabilities of infrastructure
- different capabilities of staff (design and operation)
- sheer ignorance
(Yes, I tend to be a bit more cynical these days than the average RIPE attender.)
I doubt if a survey such as Rodney tries to perform will yield any useful results because responses will not be universal nor evenly distributed.
------
More personal opinions:
If there is a significant number of operators that want to base any of their decisions on what services are behind specific addresses, this cannot be solved by static documents but must be solved by a registry. I would design this as a voluntary registry which operators may use in any which way they want if they choose to. The *art* in the design of such a registry would be defining the classes of services and agreeing on how to verify that an address houses a service of a given class for some classes such as DNS root and TLD servers.
OTOH for DNS root and TLD servers, determining their addresses is trivial using the DNS itself and the containing prefixes can be found from current BGP tables and/or BGP archives such as the RIS. So the case for such a registry for DNS servers alone is highly questionable.
Daniel
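Daniel's point that the containing prefixes can be found from a BGP table is essentially a longest-prefix match. A minimal Python sketch of that lookup follows, assuming IPv4 only; the function name `covering_prefix`, the sample table, and the sample address are hypothetical (documentation address space), not real root server or RIS data.

```python
from ipaddress import ip_address, ip_network
from typing import Iterable, Optional


def covering_prefix(address: str, table: Iterable[str]) -> Optional[str]:
    """Return the most specific prefix in `table` that contains `address`.

    `table` stands in for prefixes taken from a BGP table dump; how that
    dump is obtained is outside this sketch.
    """
    addr = ip_address(address)
    matches = [ip_network(p) for p in table if addr in ip_network(p)]
    if not matches:
        return None
    # Longest prefix length wins, as in normal route lookup.
    return str(max(matches, key=lambda net: net.prefixlen))


# Hypothetical table and name server address, for illustration only.
sample_table = ["192.0.0.0/16", "192.0.2.0/24", "198.51.100.0/22"]
print(covering_prefix("192.0.2.53", sample_table))
```

The name server addresses themselves would come from ordinary DNS queries for the NS and A records; that step is left out here to keep the sketch free of network access.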
In article <20040906090324.GC3641@reifa.local>, Daniel Karrenberg <daniel.karrenberg@ripe.net> writes
RIPE NCC policies and procedures are *extremely* careful not to prescribe any inter-domain routing practises and go out of their way to stress that operators have the authority about that.
RIPE also makes general recommendations, which have nothing to do with the RIPE NCC. The "golden networks" recommendations are in this category. They are also just that: recommendations.
I think Rodney was worried that RIPE-NCC wasn't following the rule, which he thought odd if RIPE-NCC was part of RIPE. We've had several attempts, including yours just now, to debunk the latter. -- Roland Perry
On Thu, Sep 02, 2004 at 08:44:34AM -0700, Rodney Joffe wrote:
See: http://www.ripe.net/ripe/docs/routeflap-damping.html and http://www.qorbit.net/documents/golden-networks (thanks, Steve!)
If you do, what parameters do you use, or do you not dampen the "golden networks" at all?
If you don't implement ripe-229, why not?
Note: A *significant* number of networks appear to *not* follow ripe-229 guidelines at all.
I think the real question is:
Based on the increased performance of routers these days.. most people running BGP aren't using a 2500 or AGS+ anymore, or at least not getting a full routing table on them.
Is bgp dampening really necessary anymore? Obviously we should dampen people that flap a high number of times in an hour, but the vast majority of the internet operates in a state where dampening causes more pain than benefit, imho.
- jared
-- Jared Mauch | pgp key available via finger from jared@puck.nether.net clue++; | http://puck.nether.net/~jared/ My statements are only mine.
Is bgp dampening really necessary anymore? Obviously we should dampen people that flap a high number of times in an hour, but the vast majority of the internet operates in a state where dampening causes more pain than benefit, imho.
I agree with your line of reasoning. However, if you follow the RIPE document's guidelines [ included below for reference ]...
I don't fundamentally have a problem with any of it. 4 flaps before you start dampening in a time window is a lot of flapping. Which means you are flapping that prefix throughout your internal network views as well, times the number of distributed forwarding line cards you have, etc, etc. It's not necessarily a good thing to leave unmanaged, no matter how slightly.
I don't know if everything needs to be stable for an hour when it takes 4 flaps to bring the wrath of dampening on it in the first place though. Maybe 15-20 minutes of stability on the high end (/24 and longer prefixes). If someone flapped every 30 minutes or so, while not ideal, it's certainly not causing wide-spread network failures and it's keeping you from blackholing a good chunk of their traffic.
I think the idea harkens back to a day when coming up with 100% of your sessions & recalcs could bring your router down as traffic started to flow. So dampening helped you and everyone else stabilize before significant amounts of traffic started flowing through the 2500, 3600, AGS or whathaveyou. Clearly this isn't really the case anymore. If your router needs to protect itself from the big-bad-bgp sessions of its more powerful upstream routers, it can dampen more aggressively.
Just my opinion,
Deepak Jain
AiNET
---
2.2 Description of recommended damping parameters
Basically the recommended values do the following with harsher treatment for /24 and longer prefixes:
* don't start damping until the 4th flap
* /24 and longer prefixes: max=min outage 60 minutes
* /22 and /23 prefixes: max outage 45 minutes; min outage of 30 minutes
* all other prefix lengths: max outage 30 minutes; min outage 10 minutes
If a specific damping implementation does not allow configuration of prefix-dependent parameters the least aggressive set should be used:
* don't start damping before the 4th flap in a row
* max outage 30 minutes; min outage 10 minutes
Sample configurations for different vendors are referenced in Appendix A.2. These samples can be used as a basis for a configuration on other router platforms not listed there.
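The excerpt above reads as a small lookup from prefix length to outage window. A minimal Python sketch of that lookup, assuming IPv4 prefixes only; the function names are hypothetical, and the numbers are copied from the quoted section 2.2 rather than from any vendor configuration.

```python
from ipaddress import ip_network
from typing import Tuple

# Per the excerpt: damping does not start before the 4th flap.
FLAPS_BEFORE_DAMPING = 4


def outage_window_minutes(prefix: str) -> Tuple[int, int]:
    """Return (min_outage, max_outage) in minutes for an IPv4 prefix."""
    length = ip_network(prefix).prefixlen
    if length >= 24:          # /24 and longer: max = min = 60 minutes
        return (60, 60)
    if length in (22, 23):    # /22 and /23: min 30, max 45
        return (30, 45)
    return (10, 30)           # all other prefix lengths: min 10, max 30


def may_damp(flap_count: int) -> bool:
    """True once the flap count has reached the 4th flap."""
    return flap_count >= FLAPS_BEFORE_DAMPING


print(outage_window_minutes("192.0.2.0/24"))
print(may_damp(3), may_damp(4))
```

An implementation without prefix-dependent parameters would, per the excerpt, fall back to the last case for every prefix.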
I don't fundamentally have a problem with any of it. 4 flaps before you start dampening in a time window is a lot of flapping.
you may want to look at http://rip.psg.com/~randy/030226.apnic-flap.pdf randy
On Fri, Sep 03, 2004 at 10:03:26AM +1200, Randy Bush wrote:
I don't fundamentally have a problem with any of it. 4 flaps before you start dampening in a time window is a lot of flapping.
you may want to look at
I've been wondering what the net results would be if one dampened aggressively but only for a max of 7-15 mins. Might that allow for the networks to be properly penalized yet provide the users a minimum amount of time to recover once the prefix is stable? - jared -- Jared Mauch | pgp key available via finger from jared@puck.nether.net clue++; | http://puck.nether.net/~jared/ My statements are only mine.
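Jared's question about a short maximum suppress time can be explored with the usual exponential-decay model of flap damping, where an accumulated penalty halves every half-life and the route is reused once the penalty falls below a reuse limit. The sketch below is only that model; the function names and all numeric values (penalty 3000, reuse limit 750, 15-minute half-life) are illustrative assumptions, not values from this thread or from ripe-229.

```python
import math


def minutes_until_reuse(penalty: float, reuse_limit: float,
                        half_life_min: float) -> float:
    """Time for `penalty` to decay to `reuse_limit` with no further flaps."""
    if penalty <= reuse_limit:
        return 0.0
    # penalty * 0.5 ** (t / half_life) == reuse_limit, solved for t
    return half_life_min * math.log2(penalty / reuse_limit)


def penalty_ceiling(reuse_limit: float, half_life_min: float,
                    max_suppress_min: float) -> float:
    """Largest penalty that still decays to the reuse limit within the cap."""
    return reuse_limit * 2 ** (max_suppress_min / half_life_min)


# Illustrative, assumed values only.
print(minutes_until_reuse(3000, 750, 15))
print(penalty_ceiling(750, 15, 15))
```

Under this model, capping suppression at roughly one half-life means the penalty can never be held above twice the reuse limit, so "aggressive but short" damping constrains how the suppress threshold, half-life, and reuse limit can be chosen together.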
On 2-sep-04, at 23:58, Randy Bush wrote:
If you don't implement ripe-229, why not?
because the golden address space stuff is stupid
Maybe so, but the logic seems rather irrefutable:
- as a rule, shorter prefixes are more important and/or more stable than long ones
- so we dampen long prefixes more aggressively
- the root DNS servers tend to live in long prefixes
- so we exclude the root DNS prefixes
But then again, dampening really doesn't buy you much as it only applies to routes that are flapping beyond the link to the next AS. So if you have an unstable link somewhere, you can't dampen that instability away yourself.
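Iljitsch's four steps amount to an exemption check that runs before any prefix-length rule is applied. A minimal Python sketch, assuming IPv4 only; `GOLDEN_PREFIXES` and `is_exempt_from_damping` are hypothetical names, and the listed prefix is a documentation placeholder, not a real root server prefix.

```python
from ipaddress import ip_network
from typing import Iterable

# Placeholder exemption list; a real one would be maintained by the operator.
GOLDEN_PREFIXES = ("192.0.2.0/24",)


def is_exempt_from_damping(prefix: str,
                           golden: Iterable[str] = GOLDEN_PREFIXES) -> bool:
    """True if `prefix` equals or falls inside any listed golden prefix."""
    net = ip_network(prefix)
    return any(net.subnet_of(ip_network(g)) for g in golden)


print(is_exempt_from_damping("192.0.2.0/24"))
print(is_exempt_from_damping("198.51.100.0/24"))
```

The sketch makes the open question in the thread visible: the check itself is trivial, and all of the disagreement is about what belongs in the list.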
On Fri, 3 Sep 2004 00:15:42 +0200 Iljitsch van Beijnum <iljitsch@muada.com> wrote:
But then again, dampening really doesn't buy you much as it only applies to routes that are flapping beyond the link to the next AS. So if you have an unstable link somewhere, you can't dampen that instability away yourself.
And this is the point: dampening can actually lead to decreased network stability and non-deterministic behavior. Granted, this behavior is exacerbated by not deploying a common dampening policy across all ASes (which is why RIPE-229 was written). This would not be as problematic if dampening could be applied to a path rather than a prefix, since an alternate could then be selected. But since this would require modifications to core aspects of BGP (and additional memory and processor requirements) it does not seem a likely solution.
--On 02 September 2004 16:09 -0700 John Bender <johnbender@speakeasy.net> wrote:
This would not be as problematic if dampening could be applied to a path rather than a prefix, since an alternate could then be selected. But since this would require modifications to core aspects of BGP (and additional memory and processor requirements) it does not seem a likely solution.
Hmmm.... So returning to the illustration Rodney gave Randy about the .foo domain, are we saying that if the .foo domain's DNS is anycast, then as (just from statistics of multiple paths) prefix flaps (as opposed to flaps of individual paths) are going to be more likely [*], route dampening adversely affects such (anycast) sources more than straight unicast?
Or, looking at it the other way around, if in a heavily plural anycast domain prefix route changes (as opposed to route changes of individual paths) are more common than "normal" routes [*] (albeit without - dampening aside - affecting reachability), does this mean route dampening disproportionately harms such routes? i.e. is the answer to Randy "because such networks [might] have a higher tendency to use anycast"?
* = note untested assumption
Alex
On Sat, 4 Sep 2004, Alex Bligh wrote:
> if in a heavily plural anycast domain prefix route changes are more common than "normal" routes (albeit without - dampening aside - affecting reachability), does this mean route dampening disproportionately harms such routes?
This would be an argument in favor of either asking peers to tag anycast-learned routes no-export, as F-root does, or using anycast prefixes which are short enough that they won't make it through many people's filters, and advertising the aggregate from your tunnel-hub (which is presumed to be stable), as we do.
I suspect that a stand-alone prefix, advertised with equal mask length from all instances, without no-export, would be relatively more vulnerable to dampening, as Alex suggests. Topologically, it appears little different than a massively peered or massively multi-homed network of any other sort, as the papers Randy is citing describe.
-Bill
On Fri, 3 Sep 2004, Iljitsch van Beijnum wrote:
> the logic seems rather irrefutable:
> - as a rule, shorter prefixes are more important and/or more stable than long ones
> - so we dampen long prefixes more aggressively
> - the root DNS servers tend to live in long prefixes
> - so we exclude the root DNS prefixes
What about the ccTLD prefixes? There are a lot more of them. And the gTLDs? And exchange points? And Microsoft Update servers? Where do you stop?
-Bill
Bill Woodcock wrote:
On Fri, 3 Sep 2004, Iljitsch van Beijnum wrote:
the logic seems rather irrefutable: - as a rule, shorter prefixes are more important and/or more stable than long ones - so we dampen long prefixes more aggressively - the root DNS servers tend to live in long prefixes - so we exclude the root DNS prefixes
What about the ccTLD prefixes? There are a lot more of them. And the gTLDs? And exchange points? And Microsoft Update servers? Where do you stop?
Pay me to treat your prefixes more nicely? 1/2 :-) Pete
--- Petri Helenius <pete@he.iki.fi> wrote: <snip>
Pay me to treat your prefixes more nicely? 1/2 :-)
Isn't that the difference between transit and peering? Does anyone dampen people who are paying them?
=====
David Barak
-fully RFC 1925 compliant-
Bill Woodcock <woody@pch.net> writes:
What about the ccTLD prefixes? There are a lot more of them. And the gTLDs? And exchange points? And Microsoft Update servers? Where do you stop?
If you simply don't dampen (hooray for adequate CPUs), then you are not only honoring the "golden networks", you aren't setting yourself up for annoyance. ---Rob
Hi Randy, On Sep 2, 2004, at 2:58 PM, Randy Bush wrote:
If you don't implement ripe-229, why not?
because the golden address space stuff is stupid
OK. I'll bite...
Given Network A, which has "golden network" content behind it as described by the RIPE paper (root and tld data), if the network has some combination of events that result in all of their announcements to you being dampened by you, your users can't get "there". For grins, let's say we're talking about .foo, one of the larger gtlds.
You are absolutely right in suggesting that .foo has to get its act together. You may even tell your users that. But you'll be telling every single one of them, because every single one of them is going to attempt to resolve .foo domain names during the hour you have them dampened. And your cost in dealing with those support calls will probably outweigh the benefits of dampening .foo.
I am polling networks so that I can get an idea of who handles their network this way, and who doesn't. I don't know if it is stupid or not, because I don't know enough about the subject yet. What I do know is that dampening these special networks with long prefixes already causes real-world problems. In many cases, the pain is felt by networks who may have a policy of not dampening, but are downstream of a major network that *does* dampen aggressively. Unless they're looking at the routing announcement and withdrawal data and analyzing it, they may never realize why their support infrastructure was overwhelmed.
And Jared has a good point - modern BFRs *can* handle lots of flaps without breaking a sweat, so dampening aggressively, or even at all, may be an artifact whose time has gone.
Notwithstanding the normal response of "If what is on that network is broken, let them fix it", which is tantamount to cutting off your nose to spite your face, saying it is stupid is more of a generalization and opinion, and doesn't give reasons as to why it is stupid, so it has no real value.
What are the reasons you think (or know) it is *stupid*? And what is the technical solution, not including "let them fix it - I'm in the right, so I'm not going to do anything"?
Thanks
/rlj
because the golden address space stuff is stupid
Given Network A, which has "golden network" content behind it as described by the RIPE paper
i don't care. if i had spare time on my hands, i would damp them more quickly for stupidity and greed. again, golden network space is a stupid idea. check out the dns for name to address mapping. randy
On Thu, 2 Sep 2004, Rodney Joffe wrote:
On Sep 2, 2004, at 2:58 PM, Randy Bush wrote:
If you don't implement ripe-229, why not?
because the golden address space stuff is stupid
OK. I'll bite...
Given Network A, which has "golden network" content behind it as described by the RIPE paper (root and tld data), if the network has some combination of events that result in all of their announcements to you being dampened by you, your users can't get "there". For grin's, let's say we're talking about .foo, one of the larger gtld's.
But .foo is announced from 13 IPs globally, allowing for anycast probably 40 nodes. If gtld-A has an incident it may be a good thing to dampen it from the internet as it may not be reachable; the other 12 gtlds will be able to serve responses in a stable manner.
Unless you're suggesting *all* the gtlds are flapping at once?
Steve
On Sep 3, 2004, at 10:46 AM, Stephen J. Wilcox wrote:
Given Network A, which has "golden network" content behind it as described by the RIPE paper (root and tld data), if the network has some combination of events that result in all of their announcements to you being dampened by you, your users can't get "there". For grin's, let's say we're talking about .foo, one of the larger gtld's.
But .foo is announced from 13 IPs globally, allowing for anycast probably 40 nodes. If gtld-A has an incident it may be a good thing to dampen it from the internet as it may not be reachable, the other 12 gtlds will be able to serve responses in a stable manner.
Unless you're suggesting *all* the gtlds are flapping at once?
Sorry. I thought I made that clear, in that "if the network has some combination of events that result in all of their announcements to you being dampened by you". I am not talking about events that happen all of the time, where one of 13 hiccups. .foo may have 13 IPs but they have two upstream providers, and the event causes all of their routes to flap. Rodney Joffe CenterGate Research Group, LLC http://www.centergate.com "Technology so advanced, even WE don't understand it"(R)
On Fri, 3 Sep 2004, Rodney Joffe wrote:
On Sep 3, 2004, at 10:46 AM, Stephen J. Wilcox wrote:
Given Network A, which has "golden network" content behind it as described by the RIPE paper (root and tld data), if the network has some combination of events that result in all of their announcements to you being dampened by you, your users can't get "there". For grin's, let's say we're talking about .foo, one of the larger gtld's.
But .foo is announced from 13 IPs globally, allowing for anycast probably 40 nodes. If gtld-A has an incident it may be a good thing to dampen it from the internet as it may not be reachable, the other 12 gtlds will be able to serve responses in a stable manner.
Unless you're suggesting *all* the gtlds are flapping at once?
Sorry. I thought I made that clear, in that "if the network has some combination of events that result in all of their announcements to you being dampened by you". I am not talking about events that happen all of the time, where one of 13 hiccups. .foo may have 13 IPs but they have two upstream providers, and the event causes all of their routes to flap.
ok so as someone else mentioned this would be a local problem. in a network such as this, you should be concerned for the possibility of having large numbers of prefixes dampened and soften your dampening parameters accordingly. there is nothing special in this scenario about 'golden networks' Steve
On Thu, 2 Sep 2004, Rodney Joffe wrote:
You are absolutely right in suggesting that .foo has to get its act together. You may even tell your users that. But you'll be telling every single one of them, because every single one of them is going to attempt to resolve .foo domain names during the hour you have them dampened. And your cost in dealing with those support calls will probably outweigh the benefits of dampening .foo.
I am polling networks so that I can get an idea of who handles their network this way, and who doesn't. I don't know if it is stupid or not, because I don't know enough about the subject yet. What I do know is that dampening these special networks with long prefixes already causes real-world problems. In many cases, the pain is felt by networks who may have a policy of not dampening, but are downstream of a major
While I'm not going to encourage anybody to avoid doing something to make their network stable because it should be somebody else's problem (just as I wouldn't suggest that somebody cross the street in front of a speeding truck just because pedestrians have the right of way at California crosswalks), this whole discussion strikes me as something that needs to be looked at in the context of DNS diversity.
In the case of the root servers, there are 13 IP addresses, announced from different ASes, most of them by different organizations. Some of them are anycasted; I believe some of them still aren't. As long as a network still has reachability to one of them, things should work. Anything that causes a network to see all 13 of them flapping simultaneously is probably a local problem, and probably leaves much of the rest of the Internet inaccessible from that network.
The same really can't be said for some of the TLDs, either on the qorbit.net Golden Networks list or off (it omits all the ccTLDs, which include some of the most important TLDs in some parts of the world). I suspect many of the TLDs that have only two or three listed name servers are anycasted, and anycast does add a lot of reliability. For most forms of network or server failure, a good anycast implementation can force fail-over to another server, and users not doing traceroutes to the name servers will never notice. But one thing anycast doesn't do is protect against route flapping. If a domain is served from two anycast addresses, and two announced routes, all it takes to make it completely unreachable from some part of the Internet is for the two local servers to start flapping at the same time. If reliability of the individual components is equal, that should be a lot less robust than the root server architecture.
So, it seems to me that there are three questions here:
What is critical infrastructure? DNS for which domains? What about other services? Google? Hotmail or Yahoo? The answer to this presumably varies considerably from place to place.
What should the providers of critical infrastructure be doing to make sure their critical infrastructure remains available?
What should network operators be doing to make sure their networks can access critical infrastructure?
-Steve
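Steve's comparison of two announced routes against thirteen can be put in rough numbers. The Python sketch below assumes that each route is damped independently and with the same probability at any given moment; both the function name and the 1% figure are hypothetical, chosen only to show the shape of the argument.

```python
def p_all_damped(p_single: float, n_routes: int) -> float:
    """Probability that all n routes are damped at once.

    Assumes independent, identically likely damping events per route,
    which is a strong simplification.
    """
    return p_single ** n_routes


# Illustrative, assumed probability that any one route is damped.
p = 0.01
print(p_all_damped(p, 2))
print(p_all_damped(p, 13))
```

The independence assumption is exactly what Rodney's scenario breaks: if all of a TLD's routes sit behind the same two upstream providers, a single event can flap every route together, and the real figure is much closer to the single-route probability than to the product.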
Hi Steve, Steve Gibbard wrote:
On Thu, 2 Sep 2004, Rodney Joffe wrote:
<snip>
So, it seems to me that there are three questions here:
What is critical infrastructure? DNS for which domains? What about other services? Google? Hotmail or Yahoo? The answer to this presumably varies considerably from place to place.
What should the providers of critical infrastructure be doing to make sure their critical infrastructure remains available?
What should network operators be doing to make sure their networks can access critical infrastructure?
The main question I was asking was actually: "How many of you who manage BGP speaking networks implement the RIPE "best practices" regarding dampening parameters for so-called "golden networks"?"
So, while I know your analysis and suggestions are important, this list suffers sufficiently from the "rathole" syndrome for me to respectfully move it back on subject.
My primary question was quite simple, and most of the responses I have had have been just as simple and concise.
For those who care, based on responses and some analysis, it appears that very few networks do follow the ripe-229 recommendations regarding "golden networks", including, oddly enough, parts of RIPE itself.
Thanks to those who did respond.
/rlj
In article <41393B29.6010804@centergate.com>, Rodney Joffe <rjoffe@centergate.com> writes
For those who care, based on responses and some analysis, it appears that very few networks do follow the ripe-229 recommendations regarding "golden networks", including, oddly enough, parts of RIPE itself.
Did you mean "parts of RIPE-NCC"? Sorry to be so pedantic, but this thread started off with a mild diversion caused by confusion between RIPE and RIPE-NCC. -- Roland Perry
Roland Perry wrote:
Did you mean "parts of RIPE-NCC"?
Sorry to be so pedantic, but this thread started off with a mild diversion caused by confusion between RIPE and RIPE-NCC.
You're right - it is a little confusing. According to their joint "about" pages, RIPE-NCC provides the administrative support for RIPE. So I guess it is a part of RIPE. But to answer you properly, I meant parts of RIPE, specifically including RIPE-NCC, do not follow the RIPE-NCC recommendations. ;-)
/rlj
participants (18)
- Alex Bligh
- Bill Manning
- Bill Woodcock
- Daniel Karrenberg
- David Barak
- Deepak Jain
- Frederico A C Neves
- Iljitsch van Beijnum
- Jared Mauch
- John Bender
- John Curran
- Petri Helenius
- Randy Bush
- Robert E. Seastrom
- Rodney Joffe
- Roland Perry
- Stephen J. Wilcox
- Steve Gibbard