It's now been 2.5 business days since Panix was taken out. Do we know what the root cause was? It's hard to engineer a solution until we know what the problem was. --Steven M. Bellovin, http://www.cs.columbia.edu/~smb
On Wed, 25 Jan 2006, Steven M. Bellovin wrote:
It's now been 2.5 business days since Panix was taken out. Do we know what the root cause was? It's hard to engineer a solution until we know what the problem was.
Is it really that hard to engineer this solution? We have several of them proposed already (S-BGP, soBGP, etc.) and a new WG is likely to be formed soon within the IETF to finally work it out. -- William Leibzon Elan Networks william@elan.net
On Wed, 25 Jan 2006, william(at)elan.net wrote:
On Wed, 25 Jan 2006, Steven M. Bellovin wrote:
It's now been 2.5 business days since Panix was taken out. Do we know what the root cause was? It's hard to engineer a solution until we know what the problem was.
Is it really that hard to engineer this solution? We have several of them proposed already (S-BGP, soBGP, etc.) and a new WG is likely to be formed soon within the IETF to finally work it out.
It'd be darn difficult to engineer a solution that would end up being deployed in any reasonable time if we don't know the requirements first. Yes, there's a draft -- draft-ietf-rpsec-bgpsecrec-03.txt -- but it has been woefully lacking in operator & deployment requirements. More people should participate in the effort. -- Pekka Savola / Netcore Oy / Systems. Networks. Security. "You each name yourselves king, yet the kingdom bleeds." -- George R.R. Martin: A Clash of Kings
On Thu, 26 Jan 2006 07:54:30 +0200, Pekka Savola said:
It'd be darn difficult to engineer a solution that would end up being deployed in any reasonable time if we don't know the requirements first.
Fortunately, when we know the requirements and engineer a solution, deployment is straightforward. RFC2827, for example, has a stellar deployment record. In other words - what is the business case for deploying this proposed solution? I may be able to get things deployed at $WORK by arguing that it's The Right Thing To Do, but at most shops an ROI calculation needs to be attached to get movement....
On Thu, 26 Jan 2006, Valdis.Kletnieks@vt.edu wrote:
In other words - what is the business case for deploying this proposed solution? I may be able to get things deployed at $WORK by arguing that it's The Right Thing To Do, but at most shops an ROI calculation needs to be attached to get movement....
Exactly. If $OTHER_FOLKS don't deploy it, cases like Panix may not really be avoided. I think that's what folks proposing perfect -- but practically undeployable -- security solutions are missing. -- Pekka Savola / Netcore Oy / Systems. Networks. Security. "You each name yourselves king, yet the kingdom bleeds." -- George R.R. Martin: A Clash of Kings
In message <Pine.LNX.4.64.0601260832510.15682@netcore.fi>, Pekka Savola writes:
On Thu, 26 Jan 2006, Valdis.Kletnieks@vt.edu wrote:
In other words - what is the business case for deploying this proposed solution? I may be able to get things deployed at $WORK by arguing that it's The Right Thing To Do, but at most shops an ROI calculation needs to be attached to get movement....
Exactly. If $OTHER_FOLKS don't deploy it, cases like Panix may not really be avoided.
I think that's what folks proposing perfect -- but practically undeployable -- security solutions are missing.
That is, of course, why I asked the question -- I'm trying to understand the actual failure modes and feasible fixes. I agree that many of the solutions proposed thus far are hard to deploy; some colleagues and I are working on variants that we think are deployable. But we need data first. --Steven M. Bellovin, http://www.cs.columbia.edu/~smb
In terms of the larger question.... ConEd Communications was recently acquired by RCN. I'm not sure if the transaction has formally closed. I suspect there are serious transition issues occurring. "Financial Stability", "Employee Churn", and "Ownership" are, unfortunately, tough things to factor into BGP algorithms. http://investor.rcn.com/ReleaseDetail.cfm?ReleaseID=181194 Internet access has always been a sideline for CEC - they are more of a provider of transport, and their customers have included some very well known entities in the NY metro area. Perhaps someone from RCN would care to comment? - Dan
"Daniel Golding" <dgolding@burtongroup.com> wrote:
ConEd Communications was recently acquired by RCN. I'm not sure if the transaction has formally closed. I suspect there are serious transition issues occurring. "Financial Stability", "Employee Churn", and "Ownership" are, unfortunately, tough things to factor into BGP algorithms.
I have no idea if this is really related, but the issue occurred the same weekend that ConEd had major network maintenance going on. My ConEd service was down (NYC area) for the entire weekend (about 60 hours) during their planned maintenance window to convert their network to MPLS. I saw their maintenance notice and noticed that the window lasted multiple days. I expected the link to go down - but I never imagined they meant it would stay down for the entire maintenance window. So, I'm speculating that even if there weren't organizational issues, their engineers were probably very busy and distracted by the major technical changes going on.
Steven, all, On Wed, Jan 25, 2006 at 03:04:30PM -0500, Steven M. Bellovin wrote:
It's now been 2.5 business days since Panix was taken out. Do we know what the root cause was? It's hard to engineer a solution until we know what the problem was.
I keep hearing that Con Ed Comm was previously an upstream of Panix ( http://www.renesys.com/blog/2006/01/coned_steals_the_net.shtml#comments ) and that this might have explained why Con Ed had Panix routes in their radb as-27506-transit object. But I checked our records of routing data going back to Jan 1, 2002, and see no evidence of 27506 and 2033 being adjacent to each other in any announcement from any of our peers at any time since then. So I can't really verify that Panix was ever a Con Ed Comm customer. Can anyone else clear this up? So far, it's not making sense.

The supposition was that all of the other affected ASes that are not currently customers of Con Ed Comm were also previously customers. Some appear to have been (Walrus Internet (AS7169), Advanced Digital Internet (AS23011), and NYFIX (AS20282) for sure) but I haven't been able to verify that all of them were.

I know that this isn't really the "root cause" that Steven was asking for, though. The root cause is that filtering is imperfect and frequently out of date. This case is particularly interesting and painful because Verio is known for building good filters automatically. In this case, they did so based on out-of-date information, unfortunately. This is particularly depressing because normally in cases of leaks like this, the propagation is via some provider or peer who doesn't filter at all. In this case, one of the vectors was one of the most responsible filterers on the net. sigh.

So in terms of engineering good solutions, the space is pretty crowded. One camp is of the "total solution" variety that involves new hardware, new protocols, and a public-key approach where originations (or any announcements) are signed and verified. This is obviously a very good and complete approach to the problem, but it's also obviously seeing precious little adoption. And in the meantime we have nothing.

Another set of approaches has been to look at alternate methods of building filters, taking into account more information about the history of routing announcements and dampening or refusing to accept novel, questionable announcements for some fixed, short amount of time. Josh Karlin's paper suggests that, as does some of the stuff that Tom Scholl, Jim Deleskie and I presented at the last nanog. All of this has the disadvantage of being a partial solution, the advantage of being implementable easily and in stages without a network forklift or a protocol upgrade, but the further disadvantage of being nowhere near fully baked.

Clearly more, smarter people need to keep searching for good solutions to this set of problems. Extra credit for solutions that can be implemented by individual autonomous systems without hardware upgrades or major protocol changes, but that may not be possible.

t.

p.s.: wrt comments made previously that imply that moving parts of routing control off of the routers is "Bell-like" or "bell-headed": although the comments are silly and made somewhat in jest, they're obviously not true. anyone who builds prefix filters or access lists off of routers is already generating policy somewhere other than the router. using additional history or smarts to do that and uploading prefix filters more often doesn't change that existing architecture or make the network somehow "bell-like". it might not work well enough to solve the problem, but that's another, interesting objection.
-- _____________________________________________________________________ todd underwood chief of operations & security renesys - internet intelligence todd@renesys.com http://www.renesys.com/blog
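To make the adjacency check Todd describes concrete, here is a minimal Python sketch, assuming archived AS_PATHs are available as whitespace-separated strings. The sample data and function name are invented for illustration, not Renesys's actual tooling:

    def ases_ever_adjacent(paths, as_a, as_b):
        """paths: iterable of AS_PATH strings, e.g. "701 2914 27506 2033"."""
        for path in paths:
            hops = path.split()
            if any({l, r} == {as_a, as_b} for l, r in zip(hops, hops[1:])):
                return True
        return False

    # Invented sample data; real input would be years of archived updates.
    archive = ["701 2914 2033", "701 27506", "3356 2914 2033"]
    print(ases_ever_adjacent(archive, "27506", "2033"))   # False: never adjacent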
Disclaimer: I work for AS2914 On Thu, Jan 26, 2006 at 02:39:59PM -0500, Todd Underwood wrote:
Another set of approaches has been to look at alternate methods of building filters, taking into account more information about history of routing announcements and dampening or refusing to accept novel, questionable announcements for some fixed, short amount of time. Josh Karlin's paper suggests that as does some of the stuff that Tom Scholl, Jim Deleskie and I presented at the last nanog. All of this has the disadvantage of being a partial solution, the advantage of being implementable easily and in stages without a network forklift or a protocol upgrade, but the further disadvantage of being nowhere near fully baked.
Clearly more, smarter people need to keep searching for good solutions to this set of problems. Extra credit for solutions that can be implemented by individual autonomous systems without hardware upgrades or major protocol changes, but that may not be possible.
t.
p.s.: wrt comments made previously that imply that moving parts of routing control off of the routers is "Bell-like" or "bell-headed": although the comments are silly and made somewhat in jest, they're obviously not true. anyone who builds prefix filters or access lists off of routers is already generating policy somewhere other than the router. using additional history or smarts to do that and uploading prefix filters more often doesn't change that existing architecture or make the network somehow "bell-like". it might not work well enough to solve the problem, but that's another, interesting objection.
This is something that (as I mentioned to you in private) some others have thought of as well. We at 2914 build the filters and such off-the-router and load them to the router with sometimes quite large configurations (they have been ~8MB in the past).

I'd love to see some prefix stability data (eg: 129.250/16 has been announced by origin-as 2914 for X years/seconds/whatnot) which can help score the data better. Do we need an origin-as match in our router policies? Does it exist already? What about a way to dampen/delay announcements that don't match the origin-as data that exists? I think a solution like this would help out a number of networks that have these types of problems/challenges. Obviously noticing an origin change and alerting or similar on that would be nice and useful, but would the noise be too much for a NOC display?

- jared

ps. i'm glad our NOC/operations people were able to solve the PANIX issue quickly for them. -- Jared Mauch | pgp key available via finger from jared@puck.nether.net clue++; | http://puck.nether.net/~jared/ My statements are only mine.
The noise of origin changes is fairly heavy, somewhere in the low hundreds of alerts per day given a 3 day history window. Supposing a falsely originated route was delayed, what is the chance of identifying and fixing it before the end of the delay period? Do operators commonly catch misconfigurations on their own or do they usually find out about it from other operators due to service disruption?
On Thu, Jan 26, 2006 at 04:22:29PM -0700, Josh Karlin wrote:
The noise of origin changes is fairly heavy, somewhere in the low hundreds of alerts per day given a 3 day history window. Supposing a falsely originated route was delayed, what is the chance of identifying and fixing it before the end of the delay period? Do operators commonly catch misconfigurations on their own or do they usually find out about it from other operators due to service disruption?
Are the origin changes for a small set of the prefixes that tend to repeat (eg: connexion as planes move), or is it a different set of prefixes day-to-day or week-to-week? I suspect there are the obvious prefixes that don't change (eg: 12/8, 18/8, 35/8, 38/8) but subparts of that may change, but for most people with allocations in the range of 12-17 bits, I suspect they won't change frequently. - jared -- Jared Mauch | pgp key available via finger from jared@puck.nether.net clue++; | http://puck.nether.net/~jared/ My statements are only mine.
I unfortunately don't have answers to those questions, but you've piqued my interest so I will try to look into it within the next couple of days. Josh On 1/26/06, Jared Mauch <jared@puck.nether.net> wrote:
On Thu, Jan 26, 2006 at 04:22:29PM -0700, Josh Karlin wrote:
The noise of origin changes is fairly heavy, somewhere in the low hundreds of alerts per day given a 3 day history window. Supposing a falsely originated route was delayed, what is the chance of identifying and fixing it before the end of the delay period? Do operators commonly catch misconfigurations on their own or do they usually find out about it from other operators due to service disruption?
Are the origin changes for a small set of the prefixes that tend to repeat (eg: connexion as planes move), or is it a different set of prefixes day-to-day or week-to-week?
I suspect there are the obvious prefixes that don't change (eg: 12/8, 18/8, 35/8, 38/8) but subparts of that may change, but for most people with allocations in the range of 12-17 bits, I suspect they won't change frequently.
- jared
-- Jared Mauch | pgp key available via finger from jared@puck.nether.net clue++; | http://puck.nether.net/~jared/ My statements are only mine.
jared, i may have missed the answer to my question. but, as verio was the upstream, and verio is known to use the irr to filter, could you tell us why that approach seemed not to suffice in this case? randy
On Thu, Jan 26, 2006 at 05:41:10PM -0800, Randy Bush wrote:
jared,
i may have missed the answer to my question. but, as verio was the upstream, and verio is known to use the irr to filter, could you tell us why that approach seemed not to suffice in this case?
Sure. What I saw by going through the diffs, etc., that I have available to me is that the prefix was registered to be announced by our customer and hence made it into our automatic IRR filters. It was no longer in there by the time that I personally looked things up in our registry, but I saw diffs go through later in the day (night) removing that prefix from the acl. Someone who has a snapshot of the various IRR data from those days can likely put this together better than I can explain. - jared -- Jared Mauch | pgp key available via finger from jared@puck.nether.net clue++; | http://puck.nether.net/~jared/ My statements are only mine.
All these explanations can only go so far as to show that ConEd and its upstreams may have had these prefixes as something that is allowed (due to previous transit relationships) to be announced. However, presumably all of these were transit arrangements where the ip blocks would have originated from a different ASN, whereas during the accident ConEd actually directly announced the prefixes as originating from its own ASN.

One thing I can think of is that ConEd started doing synchronization, so all eBGP routes were redistributed into ospf or some other igp protocol. This could lead to a situation where some previously configured router that redistributes summarized routes from igp to bgp could think the route needs to be advertised as coming from ConEd, and announced it to Verio. But I think the result of all this should have been that the route would be flapping (i.e. they start announcing, then it gets removed from what they learn from upstream and so is no longer redistributed to igp and no longer announced; back to the beginning) and they weren't. -- William Leibzon Elan Networks william@elan.net
what I saw by going through the diffs, etc.. that I have available to me is that the prefix was registered to be announced by our customer and hence made it into our automatic IRR filters.
i.e., the 'error' was intended, and followed all process. so, what i don't see is how any hacks on routing, such as delay, history, ... will prevent this while not, at the same time, having very undesired effects on those legitimately changing isps. seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol. what am i missing here? randy
On Fri, Jan 27, 2006 at 04:36:28AM -0800, Randy Bush wrote:
what I saw by going through the diffs, etc.. that I have available to me is that the prefix was registered to be announced by our customer and hence made it into our automatic IRR filters.
i.e., the 'error' was intended, and followed all process.
so, what i don't see is how any hacks on routing, such as delay, history, ... will prevent this while not, at the same time, having very undesired effects on those legitimately changing isps.
seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol.
perhaps you mean certified validation of prefix origin and path. Ownership of any given prefix is a dicey concept at best. as a start, i'd want two things for authentication and integrity checks: AS P asserts it is the origin of prefix R and prefix R asserts the true origin AS is P (or Q or some list). Being able to check these assertions and being assured of the authenticity and integrity of the answers goes a long way, at least for me. path validation is something else and a worthwhile goal. --bill
what am i missing here?
randy
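A toy rendering of bill's two assertions, with invented data, just to show the shape of the cross-check; how each mapping would be authenticated and kept tamper-evident is exactly the open problem he names:

    # the two assertions, as toy lookup tables (invented data)
    as_asserts = {"4555": {"198.32.6.0/24"}}        # AS P: "I originate R"
    prefix_asserts = {"198.32.6.0/24": {"4555"}}    # R: "P (or Q...) may originate me"

    def origin_consistent(prefix, origin_as):
        claimed = prefix in as_asserts.get(origin_as, set())
        allowed = origin_as in prefix_asserts.get(prefix, set())
        return claimed and allowed

    print(origin_consistent("198.32.6.0/24", "4555"))   # True: both sides agree
    print(origin_consistent("198.32.6.0/24", "1239"))   # False: "helpful" 1239 fails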
On 27-Jan-2006, at 07:51, bmanning@vacation.karoshi.com wrote:
perhaps you mean certified validation of prefix origin and path.
In the absence of path validation, a method of determining the real origin of a prefix is also required, if the goal is to prevent intentional hijacking as well as unintentional origination. Simply looking at the right-most entry in the AS_PATH doesn't cut it, since anybody can "set as-path prepend P". This suggests to me that either we can't separate origin validation from path validation (which sucks the former into the more difficult problems associated with the latter), or we need a better measure of "origin" (e.g. a PKI and an attribute which carries a signature). Joe
On Fri, Jan 27, 2006 at 10:42:11AM -0500, Joe Abley wrote:
On 27-Jan-2006, at 07:51, bmanning@vacation.karoshi.com wrote:
perhaps you mean certified validation of prefix origin and path.
In the absence of path validation, a method of determining the real origin of a prefix is also required, if the goal is to prevent intentional hijacking as well as unintentional origination. Simply looking at the right-most entry in the AS_PATH doesn't cut it, since anybody can "set as-path prepend P".
but by definition, the right-most entry is the prefix origin... the question becomes, is that the origin the prefix expects? to use an historical example: 198.32.6.0/24 thinks that AS 4555 is the correct origin AS 4555 thinks that it should (and does) originate prefix 198.32.6.0/24 AS 4555 uses AS 226 and 701 as transit providers. AS 1239 wants to be helpful and tells its peers that it is the proper origin for prefix 198.32.0.0/16 -BUT- never tells AS 4555 about this and has no direct means to deliver packets to AS 4555. Or... we see 128.9.160.0/24 as originating from multiple ASNs. there is no requirement for single AS origin - is that "theft" or an engineering tradeoff?
This suggests to me that either we can't separate origin validation from path validation (which sucks the former into the more difficult problems associated with the latter), or we need a better measure of "origin" (e.g. a PKI and an attribute which carries a signature).
i was just interested in the problem of assertion of origination. it needs to be done w/o a centralized repository (imho) because that method has scalability problems. such a technique does open new chances to "confuse" ... e.g. what happens when the prefix is seen from the same apparent AS but w/ two or more different signatures? path validation is (again imho) a problem severable from the prefix/as origin.
Joe
On 27-Jan-2006, at 11:12, bmanning@vacation.karoshi.com wrote:
but by definition, the right-most entry is the prefix origin...
Suppose AS 9327 decides to originate 198.32.6.0/24, but prepends 4555 to the AS_PATH as it does so. Suppose 9327 uses a transit provider which builds prefix filters from the IRR, and the "as9327" aut-num object is modified to include policy which suggests 9327 provides transit for 4555. Suppose this is not actually the case, though, and in fact 9327 is a rogue AS which is trying to capture 4555's traffic. The rest of the world sees a prefix with an AS_PATH attribute which ends with "9327 4555". In this case, from the point of view of those trying to discern legitimacy of advertisements, what is the origin of the prefix? Is it 4555, or 9327? Is it possible to tell, from just the right-most entry in the AS_PATH attribute? Joe [note: 9327 is not a rogue AS, in fact. This is just hypothetical :-)]
On Jan 27, 2006, at 11:39 AM, Joe Abley wrote:
On 27-Jan-2006, at 11:12, bmanning@vacation.karoshi.com wrote:
but by definition, the right-most entry is the prefix origin...
Suppose AS 9327 decides to originate 198.32.6.0/24, but prepends 4555 to the AS_PATH as it does so. Suppose 9327 uses a transit provider which builds prefix filters from the IRR, and the "as9327" aut-num object is modified to include policy which suggests 9327 provides transit for 4555. Suppose this is not actually the case, though, and in fact 9327 is a rogue AS which is trying to capture 4555's traffic.
The rest of the world sees a prefix with an AS_PATH attribute which ends with "9327 4555".
In this case, from the point of view of those trying to discern legitimacy of advertisements, what is the origin of the prefix? Is it 4555, or 9327?
Is it possible to tell, from just the right-most entry in the AS_PATH attribute?
Suggested solutions do not have to solve every possible problem. Knowing the "correct" origin will stop accidental announcements, like the one under discussion in this thread. And, I suspect, most problems we see today of this sort. We are not (yet) to the point where maliciously originated prefixes are as big a problem as accidentally originated prefixes. -- TTFN, patrick
On Fri, Jan 27, 2006 at 11:39:27AM -0500, Joe Abley wrote:
On 27-Jan-2006, at 11:12, bmanning@vacation.karoshi.com wrote:
but by definition, the right-most entry is the prefix origin...
Suppose AS 9327 decides to originate 198.32.6.0/24, but prepends 4555 to the AS_PATH as it does so. Suppose 9327 uses a transit provider which builds prefix filters from the IRR, and the "as9327" aut-num object is modified to include policy which suggests 9327 provides transit for 4555. Suppose this is not actually the case, though, and in fact 9327 is a rogue AS which is trying to capture 4555's traffic.
The rest of the world sees a prefix with an AS_PATH attribute which ends with "9327 4555".
In this case, from the point of view of those trying to discern legitimacy of advertisements, what is the origin of the prefix? Is it 4555, or 9327?
from BGP's perspective, you tell me. being the naive BGP listener/speaker - i think that AS 4555 is the origin. now... what does Prefix 198.32.6.0/24 say is the correct origin?
Is it possible to tell, from just the right-most entry in the AS_PATH attribute?
nope - but you have jumped right into the path question. (what does the as4555 aut-num object say about using 9327 as an upstream AS?)
Joe
[note: 9327 is not a rogue AS, in fact. This is just hypothetical :-)]
sez you :) (reminder to send Cingular the royalty check if you receive the above two characters ":" and ")" as listed above AND you choose to infer mood or intent.) I think -all- AS are run by rogues and pirates. -- (headless) bill
seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol.
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements?
-certified prefix ownership
-certified AS path ownership
-dynamic changes to the above two items
It seems to me that most of the pieces needed to do this already exist: RPSL, IRR software, regional addressing authorities (RIRs). If there are to be certified AS paths in a central database, this also opens the door to special arrangements for AS path routing that go beyond peering, i.e. agreements with the peers of your peers. Seems to me that operational problem solving works better when the problem is not thrown into the laps of the protocol designers. --Michael Dillon
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements?
-certified prefix ownership
-certified AS path ownership
-dynamic changes to the above two items
It seems to me that most of the pieces needed to do this already exist: RPSL, IRR software, regional addressing authorities (RIRs). If there are to be certified AS paths in a central database, this also opens the door to special arrangements for AS path routing that go beyond peering, i.e. agreements with the peers of your peers.
Hasn't that been said for years? Wouldn't perfect IRRs be great? I couldn't agree more. But in the meanwhile, why not protect your own ISP by delaying possible misconfigurations? Our proposed delay does *not* affect reachability: if the only route left is suspicious, it will be chosen regardless. If you are changing providers, which takes a while anyway, just advertise both for a day and you have no problems. Or, if you are concerned about speed, simply withdraw one and the new one will have to be used. If you are anycasting the prefix and a new origin pops up that your view has not seen before, then you might have a temporary load balance issue, but there is absolutely no guarantee of what routers many hops away from you will see anyway. Josh
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements?
-certified prefix ownership
-certified AS path ownership
-dynamic changes to the above two items
It seems to me that most of the pieces needed to do this already exist: RPSL, IRR software, regional addressing authorities (RIRs). If there are to be certified AS paths in a central database, this also opens the door to special arrangements for AS path routing that go beyond peering, i.e. agreements with the peers of your peers.
It is true that most of the pieces do exist. The problem appears to be not a want of tools, but the fact that the tools are not coupled properly---updating records about prefix ownership is, today, performed out-of-band from the routing protocol. This is a losing proposition. The data in the IRR, CA, or any mechanism that is updated out-of-band from the protocol itself will inherently be out-of-sync. A better idea, I think, would be to tie the identifier of the route to something that is inherently bound to some cryptographic information (e.g., a public key), rather than a separate piece of information whose ownership must be "certified" (i.e., an IP prefix, an AS number). I can think of some great ways to do this, but they all involve varying degrees of departure from prefix-based routing. I would certainly be interested in talking offline about this with any forward-thinking types.
Hasn't that been said for years? Wouldn't perfect IRRs be great? I couldn't agree more. But in the meanwhile, why not protect your own ISP by delaying possible misconfigurations? Our proposed delay does *not* affect reachability: if the only route left is suspicious, it will be chosen regardless.
Depending on the threat model, then, one attack would be to cause an AS to damp the non-suspicious route. This seems bad, right? A flapping, correct route seems better than a stable, suspicious one. Perhaps I am missing something, but how does imposing a delay help in ascertaining a route's correctness? Even looking at some of the "suspicious" routes I see by hand in the anomalies we detect, I can't personally tell what's incorrect/actionable vs. simply unusual (again, this goes back to the problem of inaccurate registries). In the case of Panix/ConEd, I can imagine that an operator would have responded to the alarms, checked the registry information and said, "these routes look reasonable; go for it!" Or, as human nature suggests, the operator might have even just ignored the alarms (particularly if origin changes are as frequent as they seem to be). What is really needed, in any case, is a better way to determine the route's veracity. This still requires some auxiliary mechanism to distinguish "unusual" from "suspicious", and, while you're designing that auxiliary mechanism, it might as well be in-band (per the arguments above).
If you are changing providers, which takes awhile anyway,
That process seems to be getting quicker: http://www.equinix.com/prod_serv/network/ed.htm -Nick
Hasn't that been said for years? Wouldn't perfect IRRs be great? I couldn't agree more. But in the meanwhile, why not protect your own ISP by delaying possible misconfigurations? Our proposed delay does *not* affect reachability: if the only route left is suspicious, it will be chosen regardless.
Depending on the threat model, then, one attack would be to cause an AS to damp the non-suspicious route. This seems bad, right? A flapping, correct route seems better than a stable, suspicious one.
A flapping route would only be considered suspicious if it disappears for many consecutive days and no other known route for the prefix originates at the same AS. At which point the attacker has already won. Our primary concern is with keeping BGP stable until its replacement (e.g. sBGP) is ready for deployment.
Perhaps I am missing something, but how does imposing a delay help in ascertaining a route's correctness? Even looking at some of the "suspicious" routes I see by hand in the anomalies we detect, I can't personally tell what's incorrect/actionable vs. simply unusual (again, this goes back to the problem of inaccurate registries). In the case of Panix/ConEd, I can imagine that an operator would have responded to the alarms, checked the registry information and said, "these routes look reasonable; go for it!" Or, as human nature suggests, the operator might have even just ignored the alarms (particularly if origin changes are as frequent as they seem to be).
Ascertaining correctness is only half of the work. If you correctly classify a malicious route, but do not take some measure to prevent its spread, you have just done yourself and your customers harm. In the case of PGBGP, there is a lot that an operator can do to verify correctness. Multiple viewpoints of anomalous routes can be collected into a single database in which operators can, once per day, check to make sure that their own address space is not being announced elsewhere. This can easily be automated for both the NOC and the collection process. Relationship information need not be revealed as only the originator of the suspicious route is needed. If, in the worst case, the route is not detected as malicious before it is considered normal, the next wave of routers will be introduced to the route and consider it suspicious. The first wave will then notice the problem and fix it, still protecting a significant portion of the network. Josh
On Fri, 3 Feb 2006, Josh Karlin wrote:
Our primary concern is with keeping BGP stable until its replacement (e.g. sBGP) is ready for deployment.
veering off course for a tick: "I wonder how well sbgp/sobgp will behave in a world of 1 million routes in the DFZ? 5 million? 10? 20?..." Someone better be thinking about that part of the problem as well, with the coming doom of ipv6 :)
Josh Karlin wrote:
Hasn't that been said for years? Wouldn't perfect IRRs be great? I couldn't agree more. But in the meanwhile, why not protect your own ISP by delaying possible misconfigurations? Our proposed delay does *not* affect reachability: if the only route left is suspicious, it will be chosen regardless. Depending on the threat model, then, one attack would be to cause an AS to damp the non-suspicious route. This seems bad, right? A flapping, correct route seems better than a stable, suspicious one.
A flapping route would only be considered suspicious if it disappears for many consecutive days and no other known route for the prefix originates at the same AS. At which point the attacker has already won.
My point was actually that an adversary could flap a correct route to damp it, to induce a router to select a suspicious one. (This threat also exists today, I believe, but the delay tactic does not solve the problem.)
Ascertaining correctness is only half of the work. If you correctly classify a malicious route, but do not take some measure to prevent its spread, you have just done yourself and your customers harm.
I would say that ascertaining correctness is more than half of the work. If a router can definitively say that a route is bogus, the "measure to prevent its spread" is pretty simple, right? i.e., just drop the route.
In the case of PGBGP, there is a lot that an operator can do to verify correctness. Multiple viewpoints of anomalous routes can be collected into a single database in which operators can, once per day, check to make sure that their own address space is not being announced elsewhere. This can easily be automated for both the NOC and the collection process. Relationship information need not be revealed as only the originator of the suspicious route is needed.
Analysis of multiple vantage points could definitely help in your case. The method for determining what counts as a "suspicious" route is not obvious, though. In the example you present, a router can install route filters to reject incoming announcements for its own address space (many ISPs seem to deploy these types of filters already). Much trickier is determining things like route hijacks, where even a delay won't help much without a reasonable way to ask "Is this route hijacked?" The best way I know of for doing that is to go back to the registry. If there are other ways to do this, I'd certainly be very interested to know about the state of the art. The proposal seems useful in a setting where a collector of measurements from multiple vantage points could run analysis to detect suspicious routes, assuming the detection algorithms could be run quickly enough and the information about suspicious routes could be propagated back out to the network...which might not always be true in an attack scenario. -Nick
On Fri, Feb 03, 2006 at 02:15:45PM -0500, Nick Feamster wrote: [snip]
This is a losing proposition. The data in the IRR, CA, or any mechanism that is updated out-of-band from the protocol itself will inherently be out-of-sync.
Provisioning systems are out of synch with the protocol, but essential for many (most?) networks' connectivity. Many providers who do use the IRR have it as an adjunct/offshoot of their provisioning system. Of course, to some monolithic entities the suggestion of any alteration (or $deity-forbid, a not-invented-here *improvement*) to their system is anathema. [snip some interesting stuff]
If you are changing providers, which takes awhile anyway,
That process seems to be getting quicker: http://www.equinix.com/prod_serv/network/ed.htm
See 'whois -h whois.radb.net rs-ed-ash' and similar objects; great support for "IRR as externally-relevant portion of a provisioning system". Cheers, Joe -- RSUC / GweepNet / Spunk / FnB / Usenix / SAGE
[ SNIP ]
If you are changing providers, which takes awhile anyway,
That process seems to be getting quicker: http://www.equinix.com/prod_serv/network/ed.htm
NOT an ISP product. -M< Martin Hannigan (c) 617-388-2663 Renesys Corporation (w) 617-395-8574 Member of the Technical Staff Network Operations hannigan@renesys.com
Martin Hannigan wrote:
[ SNIP ]
If you are changing providers, which takes awhile anyway,
That process seems to be getting quicker: http://www.equinix.com/prod_serv/network/ed.htm
NOT an ISP product.
Independent of ED, one should be cautious when designing routing protocols based on logistical and business assumptions (e.g., switching providers takes awhile, most business policies are vanilla peering, etc.). These assumptions are certainly not fundamental, and they may not always be true, regardless of what exists today. -Nick
At 02:05 AM 2/6/2006, Nick Feamster wrote:
Martin Hannigan wrote:
[ SNIP ]
If you are changing providers, which takes awhile anyway,
That process seems to be getting quicker: http://www.equinix.com/prod_serv/network/ed.htm
NOT an ISP product.
Independent of ED, one should be cautious when designing routing protocols based on logistical and business assumptions (e.g., switching providers takes awhile, most business policies are vanilla peering, etc.).
These assumptions are certainly not fundamental, and they may not always be true, regardless of what exists today
This is strictly a market-maker product, IMHO, which is different from a transition or provisioning strategy. YMMV. ISPs don't switch providers, typically; enterprises do. ISPs add, move, and drop, so physical layer management is more important, believe it or not. -M< Martin Hannigan (c) 617-388-2663 Renesys Corporation (w) 617-395-8574 Member of the Technical Staff Network Operations hannigan@renesys.com
At 02:05 AM 2/6/2006, Nick Feamster wrote:
Martin Hannigan wrote:
[ SNIP ]
If you are changing providers, which takes awhile anyway,
That process seems to be getting quicker: http://www.equinix.com/prod_serv/network/ed.htm
NOT an ISP product.
Independent of ED, one should be cautious when designing routing protocols based on logistical and business assumptions (e.g., switching providers takes awhile, most business policies are vanilla peering, etc.).
These assumptions are certainly not fundamental, and they may not always be true, regardless of what exists today.
I got some "can you elaborate" comments, so please forgive my second response. What I thought I read was that you thought Equinix had an interesting play in a transitioning and provisioning strategy for ISPs. My answer, in short, was to say that I see it as more of an enterprise play because it's a managed service and the hardest part of provisioning is typically the order cycle. If you are an ISP, you are theoretically multihomed by definition and your providers are going to remain fairly stable (you hope) based on your own needs.

Equinix Direct is a bandwidth commodity in my mind. Anyone remember Invisible Hand? (still in business, btw: http://www.invisiblehand.net/) Equinix handles the software interaction and is the market maker. Customers appear to providers and providers can decide if they want to sell to customers. For example, if you show up at ED and need X gigs, a provider could opt out of the market because you are a high-cap customer. In the end, the market maker gets a piece of the action from the provider and sends the "customer" a bill, since it is theoretically the provider.

I think there's a question about neutrality, but there are no more purely neutral colo houses, so that is somewhat irrelevant unless it's completely bogus, like selling interconnect network or something vs. the ILEC. In an environment like Equinix or S&D, you could attach to the public peering fabric and "make connections", and then if you need someone specific you can hope to get them on ED (in Equinix's case) without buying dedicated transit. In short, it's easy.

With that said, I believe most ISPs would be better suited to overlapped service or TE'ing vs. using commodity markets for b/w, IMHO. Thanks, -M< Martin Hannigan (c) 617-388-2663 Renesys Corporation (w) 617-395-8574 Member of Technical Staff Network Operations hannigan@renesys.com
Martin Hannigan wrote:
My answer, in short, was to say that I see it as more of an enterprise play because it's a managed service and the hardest part of provisioning is typically the order cycle. If you are an ISP, you are theoretically multihomed by definition and your providers are going to remain fairly stable (you hope) based on your own needs.
My point remains: designs based on such assumptions are not a good idea, since these assumptions are by no means fundamental and could certainly change. People get creative with how they announce prefixes, change upstreams, etc., and you can't assume that things like this would stay the way they are. As an aside, another question occurred to me about delaying unusual announcements. Boeing Connexion offers another example of unorthodox prefix announcements. Wouldn't the tactic of delaying unusual announcements cause problems for this service? -Nick
On Tue, 7 Feb 2006, Nick Feamster wrote:
As an aside, another question occurred to me about delaying unusual announcements. Boeing Connexion offers another example of unorthodox prefix announcements. Wouldn't the tactic of delaying unusual announcements cause problems for this service?
I had thought Josh's paper (or maybe not josh, whomever it was) said something along the lines of:
1) if more than one announcement, prefer the 'longer term', 'older', 'more usual' route
2) if only one route, take it and run!
So.. provided Connexion withdraws from 'as-germany' and announces in 'as-atlantic ocean', and so on there would only be 1 route, and you'd fall to step 2. (yes, the paper was more detailed and there were more steps...)
Chris has it! And to be clear, we only require a slow (1 day) provider changeover in the case that you want to announce your old provider's sub-prefix at a new provider. For instance, if you are an AT&T customer using a 12/8 sub-prefix and change providers but keep the prefix, the prefix will look funny coming from another originator for the first day and be delayed. All other methods of changing providers will not be interfered with. Josh
I had thought Josh's paper (or maybe not josh, whomever it was) said something along the lines of:
1) if more than one announcement, prefer the 'longer term', 'older', 'more usual' route
2) if only one route, take it and run!
So.. provided Connexion withdraws from 'as-germany' and announces in 'as-atlantic ocean', and so on there would only be 1 route, and you'd fall to step 2.
(yes, the paper was more detailed and there were more steps...)
On Wed, Feb 08, 2006 at 04:37:31AM +0000, Christopher L. Morrow wrote:
I had thought Josh's paper (or maybe not josh, whomever it was) said something along the lines of:
1) if more than one announcement, prefer the 'longer term', 'older', 'more usual' route
2) if only one route, take it and run!
FWIW, this sort of mechanism was discussed among the IETF RPSEC WG task group that is working on BGP security requirements. On the presumption that some database of stable routes and paths is present, you could bias your preference in your routes for more stable routes and paths. You would also need to decide what to do about more specific routes covered by stable routes. Do you ignore them? This is a harder question. -- Jeff Haas NextHop Technologies
Here is what we propose in PGBGP. If you have a more specific route and its AS Path does not contain any of the less specific route's origins, then ignore it for a day and keep routing to the less specific origin. If it's legitimate, the less specific origin should forward the data on for the day. We see about 30 of these suspicious routes per day. I imagine some of you will not like this scheme. Please let me know why. Josh [A code sketch of this rule appears after the quoted message below.] On 2/8/06, Jeffrey Haas <jhaas@nexthop.com> wrote:
On Wed, Feb 08, 2006 at 04:37:31AM +0000, Christopher L. Morrow wrote:
I had thought Josh's paper (or maybe not josh, whomever it was) said something along the lines of:
1) if more than one announcement, prefer the 'longer term', 'older', 'more usual' route
2) if only one route, take it and run!
FWIW, this sort of mechanism was discussed among the IETF RPSEC WG task group that is working on BGP security requirements.
On the presumption that some database of stable routes and paths is present, you could bias your preference in your routes for more stable routes and paths.
You would also need to decide what to do about more specific routes covered by stable routes. Do you ignore them? This is a harder question.
-- Jeff Haas NextHop Technologies
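To make Josh's more-specific rule concrete, here is a small sketch under invented data structures; known_routes stands in for whatever history table an implementation would keep, and the prefixes and paths echo the Panix example:

    import ipaddress

    def subprefix_suspicious(new_prefix, new_as_path, known_routes):
        """known_routes: dict of prefix -> set of known origin ASNs."""
        net = ipaddress.ip_network(new_prefix)
        for prefix, origins in known_routes.items():
            covering = ipaddress.ip_network(prefix)
            if net != covering and net.subnet_of(covering):
                # a legitimate more-specific should carry a known origin in-path
                if not origins & set(new_as_path):
                    return True
        return False

    known = {"166.84.0.0/16": {"2033"}}
    print(subprefix_suspicious("166.84.1.0/24", ["701", "27506"], known))  # True
    print(subprefix_suspicious("166.84.1.0/24", ["701", "2033"], known))   # False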
At 11:27 PM 2/7/2006, Nick Feamster wrote:
Martin Hannigan wrote:
My answer, in short, was to say that I see it as more of an enterprise play because it's a managed service and the hardest part of provisioning is typically the order cycle. If you are an ISP, you are theoretically multi homed by definition and your providers are going to remain fairly stable (you hope) based on your own needs.
My point remains: designs based on such assumptions are not a good idea, since these assumptions are by no means fundamental and could certainly change. People get creative with how they announce prefixes, change upstreams, etc., and you can't assume that things like this would stay the way they are.
Nick: I wouldn't call them assumptions. I would call them engineering decisions in operational environments. I guess I fail to see where a commodity market with a broker adding a vig resolves a real network problem. I'm thinking tier 1? They aren't buying service from anyone on Equinix Direct, and move/add/drop is just another day on the Internet. I really can't see any provider doing it, but perhaps smaller ones. *shrug*. I don't know why you wouldn't make temporary arrangements via peering fabric, PNI, or transit and eliminate the middle man (point of failure).
As an aside, another question occurred to me about delaying unusual announcements. Boeing Connexion offers another example of unorthodox prefix announcements. Wouldn't the tactic of delaying unusual announcements cause problems for this service?
[ snip ] -M<
-Nick
Martin Hannigan (c) 617-388-2663 Renesys Corporation (w) 617-395-8574 Member of Technical Staff Network Operations hannigan@renesys.com
Thus spake <Michael.Dillon@btradianz.com>
seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol.
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements?
We have such a database (used by Verio and others), but the Panix incident happened anyway due to bit rot. We've got to find a way to fix the layer 8 problems before we can make improvements at layer 3. S Stephen Sprunk "Stupid people surround themselves with smart CCIE #3723 people. Smart people surround themselves with K5SSS smart people who disagree with them." --Aaron Sorkin
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements?
We have such a database (used by Verio and others), but the Panix incident happened anyway due to bit rot. We've got to find a way to fix the layer 8 problems before we can make improvements at layer 3.
If an IRR suffers from bit-rot, then I don't consider it to be "well-operated" and therefore it cannot be considered to be part of a well-operated network of IRRs. The point is that the tools exist. The failing is in how those tools are managed. In other words this is an operational problem on both the scale of a single IRR and on the scale of the IRR system. Is this what you mean by a "layer 8" problem? --Michael Dillon
On Mon, Jan 30, 2006 at 09:48:13AM +0000, Michael.Dillon@btradianz.com wrote:
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements?
We have such a database (used by Verio and others), but the Panix incident happened anyway due to bit rot. We've got to find a way to fix the layer 8 problems before we can make improvements at layer 3.
If an IRR suffers from bit-rot, then I don't consider it to be "well-operated" and therefore it cannot be considered to be part of a well-operated network of IRRs.
The point is that the tools exist. The failing is in how those tools are managed. In other words this is an operational problem on both the scale of a single IRR and on the scale of the IRR system. Is this what you mean by a "layer 8" problem?
Take it up with the people putting data into the system, not the IRR operators. Anyone who is behind an IRR-based provider (like Verio) has motivation to put data into the system ("hey look I do this and now routing works"), but there is no motivation to take stale data OUT of the system. I can't even begin to count the number of networks I know who theoretically "use" IRR who don't even know HOW to remove data, let alone make any active attempt to do so when a customer leaves or a route is returned.

Combine this with the idiots who run around proxy-registering routes for other people based on everything they see in the table (gee, there's a good idea: define filters for what is allowed in the table based on what we see people trying to put into the table, brilliant!) and you quickly see how IRR data becomes stale and eventually worthless.

I'll save the rest of my rant for the presentation on the subject in Dallas. :) -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
As we roll out a new network, on one of our links it is remarkably cheaper to run a T1 ptp vs. MPLS (running 66% data, 33% voice). Based on comments received from this list (much thanks, you know who you are) MPLS satisfaction seems to be determined by backbone noc competence, not the technology itself. So back to price... if I consider layer 1 issues to be equal in either scenario, and aggregation/meshing/hardware is not a real concern, it seems to me that a correctly configured, directly connected pipe would work as well as mpls, with the benefit of local control of my routers and owning any incompetence. If anyone has enlightening experiences of ptp vs. mpls, I'd appreciate hearing about them. Thanks, Andrew
On Mon, 30 Jan 2006, Andrew Staples wrote:
As we roll out a new network, on one of our links it is remarkably cheaper to run a T1 ptp vs. MPLS (running 66% data, 33% voice). Based on comments received from this list (much thanks, you know who you are) MPLS satisfaction seems to be determined by backbone noc competence, not the technology itself. So back to price....if I consider layer1 issues to be equal in either scenario, and aggregation/meshing/hardware is not a real concern, it seems to me that a correctly configured, directly connected pipe would work as well as mpls, with the benefit of local control of my routers and owning any incompetence.
Also, the PTP T1 has fewer hops (probably lower latency), fewer points of failure, fewer ways to break, less complexity, etc. I can't think of any reason you'd want to go with an MPLS(VPN) solution over PTP solution if startup and MRC were equal. ---------------------------------------------------------------------- Jon Lewis | I route Senior Network Engineer | therefore you are Atlantic Net | _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
it seems to me that a correctly configured, directly connected pipe would work as well as mpls, with the benefit of local control of my routers and owning any incompetence.
I feel like I'm living in the twilight zone... Is it possible that IP on a plain old PDH circuit could really work as well as IP over MPLS over SDH over DWDM over fibre?
I feel like I'm living in the twilight zone...
Heheh :D
Is it possible that IP on a plain old PDH circuit could really work as well as IP over MPLS over SDH over DWDM over fibre?
Chances are that the PDH circuit would be an emulated service running over SDH, over DWDM, over fibre! Neil.
--On January 31, 2006 9:56:46 AM +0000 Michael.Dillon@btradianz.com wrote:
it seems to me that a correctly configured, directly connected pipe would work as well as mpls, with the benefit of local control of my routers and owning any incompetence.
I feel like I'm living in the twilight zone...
No no, that's just the vendor kool-aid machine running momentarily dry. Hold on a moment, I'm sure someone will refill it shortly with the buzzwordblend ;)
At 10:32 AM -0800 1/30/06, Andrew Staples wrote:
As we roll out a new network, on one of our links it is remarkably cheaper to run a T1 ptp vs. MPLS (running 66% data, 33% voice). Based on comments received from this list (much thanks, you know who you are) MPLS satisfaction seems to be determined by backbone noc competence, not the technology itself. So back to price....if I consider layer1 issues to be equal in either scenario, and aggregation/meshing/hardware is not a real concern, it seems to me that a correctly configured, directly connected pipe would work as well as mpls, with the benefit of local control of my routers and owning any incompetence.
As long as you're referring to PTP with the voice packetized in some manner (so as to effectively achieve dynamic bandwidth allocation which you can get with multiple LSP's), then your tradeoff summary is on target. If you are looking at PTP TDM solution with fixed allocation, the PTP alternative wastes any idle voice bandwidth which would otherwise be available for data. /John
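The arithmetic behind John's tradeoff, with invented example numbers (codec and framing overhead ignored): under fixed TDM allocation, reserved-but-idle voice channels are lost to data, while packetized voice returns them to the pool.

    T1_PAYLOAD_KBPS = 24 * 64          # 24 DS0 channels at 64 kb/s = 1536 kb/s
    voice_channels_reserved = 8        # roughly the 33% voice share above
    calls_active = 2                   # hypothetical mid-day load

    idle_voice_kbps = (voice_channels_reserved - calls_active) * 64
    data_fixed_kbps = T1_PAYLOAD_KBPS - voice_channels_reserved * 64
    data_packetized_kbps = T1_PAYLOAD_KBPS - calls_active * 64

    print(f"fixed TDM: data capped at {data_fixed_kbps} kb/s, {idle_voice_kbps} kb/s idle")
    print(f"packetized: data can burst to {data_packetized_kbps} kb/s")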
On Jan 30, 2006, at 5:02 AM, Richard A Steenbergen wrote:
On Mon, Jan 30, 2006 at 09:48:13AM +0000, Michael.Dillon@btradianz.com wrote:
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements?
We have such a database (used by Verio and others), but the Panix incident happened anyway due to bit rot. We've got to find a way to fix the layer 8 problems before we can make improvements at layer 3.
If an IRR suffers from bit-rot, then I don't consider it to be "well-operated" and therefore it cannot be considered to be part of a well-operated network of IRRs.
The point is that the tools exist. The failing is in how those tools are managed. In other words this is an operational problem on both the scale of a single IRR and on the scale of the IRR system. Is this what you mean by a "layer 8" problem?
Take it up with the people putting data into the system, not the IRR operators. Anyone who is behind an IRR-based provider (like Verio) has motivation to put data into the system ("hey look I do this and now routing works"), but there is no motivation to take stale data OUT of the system.
It gets even more fun if you're delegating route-origination to 3rd parties. Add a mnt-routes: so they can create a route object, but then you can't remove that inetnum block whilst their route object exists (nor remove the mnt-routes). *sigh*
On Mon, 30 Jan 2006 Michael.Dillon@btradianz.com wrote:
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements?
We have such a database (used by Verio and others), but the Panix incident happened anyway due to bit rot. We've got to find a way to fix the layer 8 problems before we can make improvements at layer 3.
If an IRR suffers from bit-rot, then I don't consider it to be "well-operated" and therefore it cannot be considered to be part of a well-operated network of IRRs.
honestly I'm not a fan of IRR's, so don't pay attention to them, but... is the IRR 'not well operated' or is the data stale because the 'users' of the IRR are 'not well operated' ? (the IRR as near as I can tell is nothing but a web/whois server that you sign-up-for and push/pull data through, right?)
On 4-Feb-2006, at 15:21, Christopher L. Morrow wrote:
honestly I'm not a fan of IRR's, so don't pay attention to them, but... is the IRR 'not well operated' or is the data stale because the 'users' of the IRR are 'not well operated' ?
The data ought to be maintained by the people to whom it relates. Customers (and peers) of some ISPs have great incentives to add appropriate records, since if they don't do so their ISPs' filters will not be widened to accept their routes. Other networks have no such incentive, since their transit providers and peers either build their filters in other ways, or don't filter at all.

Generally, there is no incentive to remove data from the IRR, except in the case where resources are returned and reallocated to someone else who wants to make their own records. Wherever there is a lack of incentive to keep records accurate, we can probably safely assume that they are either missing or stale.

"Customer" in this context means "anybody whose routes might be filtered by someone else". Since large, default-free carriers tend not to have their routes filtered by peers, those that don't use RPSL expressions to build customer filters don't have much reason to care about the IRR.

It's probably fair to say that if all the large, default-free carriers insisted that their customers submitted their routes to the IRR, then every route would be registered. This would not completely address the problem of stale data, though.
(the IRR, as near as I can tell, is nothing but a web/whois server that you sign up for and push/pull data through, right?)
The IRR is a loosely-connected collection of route registries, all run by different people. Data originating in one database is frequently found to be mirrored in other databases, but not in any great systematic fashion. Together these databases form a distributed repository of RPSL objects.

Objects are generally submitted by e-mail and retrieved using whois, but some registry operators also make web interfaces available. Anybody who doesn't know what RPSL is can find out at <http://www.irr.net/docs/rpsl.html>.

Joe
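For anyone who wants to poke at the data Joe describes, route objects can be pulled with nothing more than a raw whois query. A minimal sketch in Python; whois.radb.net is one commonly used server, "-i origin" is the RIPE-style inverse lookup that RADB has historically supported, and AS2033 (Panix) is just a topical example:

    # Minimal whois client: fetch route objects for an origin AS from
    # an IRR mirror. Server, flag, and ASN here are examples, not the
    # only choices.

    import socket

    def irr_query(query, server="whois.radb.net", port=43):
        with socket.create_connection((server, port), timeout=10) as s:
            s.sendall(query.encode() + b"\r\n")
            chunks = []
            while True:
                data = s.recv(4096)
                if not data:          # server closes the connection when done
                    break
                chunks.append(data)
        return b"".join(chunks).decode(errors="replace")

    # "-i origin" asks for every route object with the given origin AS.
    print(irr_query("-i origin AS2033"))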
Other networks have no such incentive, since their transit providers and peers either build their filters in other ways, or don't filter at all.
There is nothing wrong with building your filter in some other way, however, that does not mean that you cannot validate your filters against the IRR and take some action on mismatches. For instance you could email the prefix owners about the mismatch and ask them to update the IRR.
Wherever there is a lack of incentive to keep records accurate, we can probably safely assume that they are either missing or stale.
Yes. Without regular validation or auditing of data, it does not stay up to date.
It's probably fair to say that if all the large, default-free carriers insisted that their customers submitted their routes to the IRR, then every route would be registered. This would not completely address the problem of stale data, though.
It's a good start. Perhaps if we decouple the idea of an IRR from "building filters" more people will see the usefulness of a distributed repository of information against which they can validate (cryptographically or otherwise) their routing data. Right now the secure BGP protocols require a network to climb the hurdles of cryptographic certification in order to participate. A revised and renewed IRR can lower that barrier so that people can participate even before they implement cryptographic signing and certification.
The IRR is a loosely-connected collection of route registries, all run by different people. Data originating in one database is frequently found to be mirrored in other databases, but not in any great systematic fashion.
If the networking community can't solve the problem of managing the distributed route registries in a systematic fashion, then how can it implement one of the secure BGP proposals? --Michael Dillon
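The validation step Michael suggests -- check the filters you actually use against what the IRR says, then chase the mismatches -- is simple enough to sketch. Both input sets below are stand-ins; a real run would pull one side from router config and the other from IRR queries like the whois example earlier:

    # Filter-vs-IRR validation sketch: flag prefixes that are in our
    # filters but not registered (candidates for stale data), and
    # prefixes registered but not filtered in. Sets are stand-ins.

    filters_in_use = {        # what our router config currently permits
        "192.0.2.0/24",
        "198.51.100.0/24",    # stale: block was returned last year
    }
    irr_registered = {        # what the IRR says this customer may announce
        "192.0.2.0/24",
        "203.0.113.0/24",     # registered, never added to our filters
    }

    for prefix in sorted(filters_in_use - irr_registered):
        print(f"{prefix}: in our filters, not in the IRR -- email the owner")
    for prefix in sorted(irr_registered - filters_in_use):
        print(f"{prefix}: in the IRR, not in our filters -- update the config")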
If an IRR suffers from bit-rot, then I don't consider it to be "well-operated" and therefore it cannot be considered to be part of a well-operated network of IRRs.
honestly I'm not a fan of IRRs, so I don't pay attention to them, but... is the IRR 'not well operated', or is the data stale because the 'users' of the IRR are 'not well operated'? (the IRR, as near as I can tell, is nothing but a web/whois server that you sign up for and push/pull data through, right?)
Indeed it is not much more than a server with a database, which is why I do not consider it to be well-operated. In order to be "well-operated", somebody (or some organization) needs to take responsibility for the data in the database and make sure that this data is as accurate as can be.

I'm really saying that if people want to solve this problem jointly, then the tools are already there for a membership organization to use. And such an organization could also work on a revised BGP protocol as a longer-term solution. But in the absence of such an organization we have nothing more than disorganized chaos in which nothing much changes.

--Michael Dillon
On Jan 27, 2006, at 8:29 AM, Michael.Dillon@btradianz.com wrote:
seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol.
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements?
Maybe I missed something, but didn't Verio say the prefix was in their internal registry, and that's why it was accepted? IOW: It didn't solve this problem. So I guess we're discussing the other 5%? -- TTFN, patrick
On 27-Jan-2006, at 11:54, Patrick W. Gilmore wrote:
On Jan 27, 2006, at 8:29 AM, Michael.Dillon@btradianz.com wrote:
seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol.
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements?
Maybe I missed something, but didn't Verio say the prefix was in their internal registry, and that's why it was accepted?
Perhaps by "well-operated", Michael was referring to something like the hierarchical authentication scheme used by the RIPE database, which ultimately provides access control for route objects using RIR allocation/assignment data? Joe
On Jan 27, 2006, at 12:57 PM, Joe Abley wrote:
On 27-Jan-2006, at 11:54, Patrick W. Gilmore wrote:
On Jan 27, 2006, at 8:29 AM, Michael.Dillon@btradianz.com wrote:
seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol.
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements?
Maybe I missed something, but didn't Verio say the prefix was in their internal registry, and that's why it was accepted?
Perhaps by "well-operated", Michael was referring to something like the hierarchical authentication scheme used by the RIPE database, which ultimately provides access control for route objects using RIR allocation/assignment data?
Yet it can still have stale data.

That said, if there were a centralized store for such information and "you" were in charge of "your" objects, then the only person to blame when "your" prefix was incorrectly accepted would be "you". (We're talking things like accidental origination here, not malicious attempts to go around safeguards.) Put more concretely, Panix would have no one to blame but themselves if Verio accepted a prefix because it was properly registered in the DB.

This, IMHO, would be a Good Thing. Not a panacea, but a Good Thing. And would avoid some very long threads on NANOG (which is also a Good Thing :). -- TTFN, patrick
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements?
Maybe I missed something, but didn't Verio say the prefix was in their internal registry, and that's why it was accepted?
IOW: It didn't solve this problem. So I guess we're discussing the other 5%?
You missed the words "well-operated". Today there is no well-operated network of IRRs so there is bad data in the databases. In addition, there is the question of how to use the IRR data. Should you build filters from it? Should you use it to validate your own internal database with human beings chasing up the differences and fixing whichever database is wrong? --Michael Dillon
randy, all,

On Fri, Jan 27, 2006 at 04:36:28AM -0800, Randy Bush wrote:
what I saw by going through the diffs, etc.. that I have available to me is that the prefix was registered to be announced by our customer and hence made it into our automatic IRR filters.
i.e., the 'error' was intended, and followed all process.
yep. that's the depressing part.
so, what i don't see is how any hacks on routing, such as delay, history, ... will prevent this while not, at the same time, have very undesired effects on those legitimately changing isps.
you're probably right (as usual). but it seems that if you delay acceptance of announcements with novel origination patterns, you don't harm very many legitimate uses. in particular, ASes changing upstreams won't be harmed at all. people moving their prefix to a new ISP will have a fixed delay in getting their announcement propagated, sure. but they already have this delay now. they tell the new ISP: 'announce my prefix' and the new ISP says 'prove it's yours'. they do that for a couple of emails. then the new ISP asks its upstreams to accept that announcement. that takes a little while (ranging from 4 to 72 hours in my recent experience).
seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol.
certified validation of prefix ownership (and path, as has been pointed out) would be great. it's clearly a laudable goal and seemed like the right way to go. but right now, no one is doing it. the rfcs that i've found have all expired. and the conversation about it has reached the point where people seem to have stopped even disagreeing about how to do it. in short, it's as dead as dns-sec. so what are we to do in the meantime?

t.
-- _____________________________________________________________________
todd underwood
chief of operations & security
renesys - internet intelligence
todd@renesys.com www.renesys.com
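The hold-down todd describes above -- delay acceptance of a novel (prefix, origin) pattern, let familiar ones through -- might look something like this sketch. The 24-hour figure is arbitrary, the prefix and ASN are just the topical examples, and a real implementation would sit in the route-acceptance path rather than an in-memory dict:

    # Sketch of "delay novel originations": a (prefix, origin AS) pair
    # never seen before is held down for a fixed period; known pairs
    # are accepted immediately. The 24h hold-down is arbitrary.

    import time

    HOLD_DOWN_SECS = 24 * 3600
    first_seen = {}   # (prefix, origin_as) -> timestamp of first sighting

    def accept(prefix, origin_as, now=None):
        now = time.time() if now is None else now
        key = (prefix, origin_as)
        if key not in first_seen:
            first_seen[key] = now
            return False                       # novel pattern: start the clock
        return now - first_seen[key] >= HOLD_DOWN_SECS

    print(accept("166.84.0.0/16", 2033, now=0))       # False: first sighting
    print(accept("166.84.0.0/16", 2033, now=90000))   # True: aged past hold-down

Legitimate moves eat one fixed delay, which, as noted above, they already eat today in the 'prove it's yours' exchange with the new ISP.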
certified validation of prefix ownership (and path, as has been pointed out) would be great. it's clearly a laudable goal and seemed like the right way to go. but right now, no one is doing it. the rfcs that i've found have all expired. and the conversation about it has reached the point where people seem to have stopped even disagreeing about how to do it. in short, it's as dead as dns-sec. so what are we to do in the meantime?
Perhaps people should stop trying to have these operational discussions in the IETF and take the discussions to NANOG where network operators gather. Writing RFCs is a fine way to document operational best practices, but it is not a good way to work out joint operational practices. Of course, NANOG is no magic bullet, but it seems like a more reasonable place to talk about how to make things better. A good start would be to try and get an agreed statement of what the problem is. Once you have broad agreement on the problem, then move on to solutions. --Michael Dillon
In message <OFA6D31A52.8D06553F-ON80257103.005A5CB4-80257103.005AB975@btradianz.com>, Michael.Dillon@btradianz.com writes:
certified validation of prefix ownership (and path, as has been pointed out) would be great. it's clearly a laudable goal and seemed like the right way to go. but right now, no one is doing it. the rfcs that i've found have all expired. and the conversation about it has reached the point where people seem to have stopped even disagreeing about how to do it. in short, it's as dead as dns-sec. so what are we to do in the meantime?
Perhaps people should stop trying to have these operational discussions in the IETF and take the discussions to NANOG where network operators gather.
We have tried, of course; see, for example, NANOG 28 (Salt Lake City). There was no more consensus at NANOG than in the IETF... --Steven M. Bellovin, http://www.cs.columbia.edu/~smb
Perhaps people should stop trying to have these operational discussions in the IETF and take the discussions to NANOG where network operators gather.
We have tried, of course; see, for example, NANOG 28 (Salt Lake City). There was no more consensus at NANOG than in the IETF...
One attempt almost 3 years ago doesn't sound very serious to me. And if the discussion is only concerned with seeking consensus on implementing a new flavor of the BGP protocol, then it isn't much of a discussion. In fact, there was a consensus at Salt Lake City that the issues of routing security could be adequately dealt with by existing tools and protocols. Not all problems require new protocols to solve them. --Michael Dillon
participants (27)

- Andrew Staples
- bmanning@vacation.karoshi.com
- Christopher L. Morrow
- Daniel Golding
- Jared Mauch
- Jeffrey Haas
- Joe Abley
- Joe Provo
- John Curran
- John Payne
- Jon Lewis
- Josh Karlin
- Martin Hannigan
- Matt Buford
- Michael Loftis
- Michael.Dillon@btradianz.com
- Neil J. McRae
- Nick Feamster
- Patrick W. Gilmore
- Pekka Savola
- Randy Bush
- Richard A Steenbergen
- Stephen Sprunk
- Steven M. Bellovin
- Todd Underwood
- Valdis.Kletnieks@vt.edu
- william(at)elan.net