In an attempt to return to an argument, rather than simple contradiction (ok, ok, it's far more polite and reasonable so far than that would imply, but I couldn't miss the cheap shot; apologies hereby tendered), perhaps we should consider *what* the RFC should say, if it should be changed? Going to the WG with a proposal in hand and a rationale to support it would seem to be the best path.
So, a summary of my view on it at the moment:
Assumption #1) Resetting a BGP session is 'costly', in terms of the time it takes, the stability it removes, and the fact that it flaps all of your *outgoing* announcements as well as incoming ones.
Assumption #2) A router that sends a malformed route is clearly doing something which it Should Not Be Doing (tm) (ok, this might be axiomatic, but should still be laid out).
Assumption #3) The current practice has demonstrably increased the brittleness of the Internet, by causing severe flapping when someone only partially follows the RFC (in particular, propagating bad route data, whether or not the original source session is reset).
Assumption #4) Routing errors which are bad data, but *not* malformed routes, will not generally be caught by normal means in normal operation, until a human intervenes to cross-check the data.
Assumption #5) Any router which breaks so badly as to start spewing large amounts of validly formed but erroneous data, and is *also* spewing badly formed data, will spew noticeable amounts of said badly formed data. (This one is key, and is only a conjecture; field evidence would be of great use in validating it).
Hello;
Can "badly formed data" be reasonably clearly defined? What tests are there for "validly formed but erroneous data"?
There are several monitoring efforts (including the one done here) which compare sets of (m)bgp routing tables over time. It seems to me that such (m)bgp pollution should be detectable with a monitoring project.
BTW, what seems to be the clearest sign here of the recent flap was the dropping of 43 Autonomous Systems by UU.net for the Sat Jun 23 16:37:41 2001 status run. This is not a good enough metric to reliably detect such problems. There do seem to be a lot of weird changes in the routing table in that dump, but a simple test for this is not apparent to me at present.
Regards
Marshall Eubanks
Multicast Technologies, Inc.
10301 Democracy Lane, Suite 410
Fairfax, Virginia 22030
Phone : 703-293-9624 Fax : 703-293-9609
e-mail : tme@on-the-i.com http://www.on-the-i.com
Conclusion: the RFC could be changed from saying you MUST do a NOTIFY and ditch the session to stating that you MUST handle the error in one of two ways: do a NOTIFY and ditch the session (traditional), or send an ALERT and discard the badly formed route. Additionally, this alternative handling MUST NOT be enabled by default, and SHOULD have a threshold parameter at which the session will undergo a NOTIFY/reset, under the assumption that a host sending an appreciable amount of badly formed routes is, in fact, in danger of sending correctly formed but erroneous data as well.
Suitable threshold values are left as an exercise to local admins and BCP documents; I would think this could be negotiated as a capability extension to BGP4, with the fallback, of course, being to follow the traditional RFC practice.
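
The decision rule in that proposal can be sketched roughly as below. This is only an illustration of the logic, not router code: the class, field, and action names are hypothetical, ALERT is the message proposed in this thread rather than an existing BGP message, and the counter is assumed to run for the life of the session because the proposal does not specify a time window.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DISCARD_AND_ALERT = "discard the malformed route and send an ALERT"
    NOTIFY_AND_RESET = "send a NOTIFY and drop the session"


@dataclass
class MalformedRoutePolicy:
    # Hypothetical knobs. The defaults reproduce the traditional behaviour,
    # since the alternative handling must not be enabled by default.
    tolerant: bool = False   # an admin has to opt in to the alternative handling
    threshold: int = 1       # malformed-route count at which the session is reset
    malformed_seen: int = 0  # malformed routes received from this peer so far

    def on_malformed_route(self) -> Action:
        """Record one malformed route and decide how to react to it."""
        self.malformed_seen += 1
        if not self.tolerant or self.malformed_seen >= self.threshold:
            return Action.NOTIFY_AND_RESET
        return Action.DISCARD_AND_ALERT


if __name__ == "__main__":
    peer = MalformedRoutePolicy(tolerant=True, threshold=3)
    for _ in range(3):
        print(peer.on_malformed_route().value)
```

With `tolerant=True` and `threshold=3`, the first two malformed routes are discarded and the third resets the session; a real implementation would presumably also want the count to age out over time.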
Thoughts?
--
***************************************************************************
Joel Baker System Administrator - lightbearer.com
lucifer@lightbearer.com http://www.lightbearer.com/~lucifer
Marshall Eubanks tme@21rst-century.com
Marshall Eubanks wrote:
Hello;
Can "badly formed data" be reasonably clearly defined? What tests are there for "validly formed but erroneous data"?
To clarify, I will use a non-BGP example. "Sky blue is the" is a badly formed English sentence. It violates the standard rules of syntax, and is patently obvious to anyone familiar with said syntax to be invalid on its surface, even if one could infer what it was probably trying to say and derive the information. "The sky is purple with green polka dots" is a correctly *formed* sentence, but the data in it is erroneous/corrupt (well, assuming we haven't altered the laws of physics, etc etc).
In BGP, badly formed data does not meet the requirements laid out in the RFC for properly formatted messages, while erroneous data is simply any data which does not accurately represent what it should (a typoed routing announcement would be a clear example). The former can be caught by very simple automated checking on input (validity testing), while the latter can only be detected by some level of human intervention (noticing the bad route, or filtering based on human-defined policy, even if the filtering is done automatically).
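
As a rough illustration of the "very simple automated checking" end of that distinction, here is a minimal sketch that checks only the fixed BGP message header (16-byte marker, 2-byte length, 1-byte type, with a total length between 19 and 4096). The function name is made up for this example, it assumes the all-ones marker used when no authentication is in effect, and it does not look inside UPDATE messages at all.

```python
import struct

BGP_MARKER = b"\xff" * 16   # all-ones marker (assumes no authentication in use)
BGP_HEADER_LEN = 19         # 16-byte marker + 2-byte length + 1-byte type
BGP_MAX_MESSAGE_LEN = 4096
BGP_MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def header_problem(message: bytes):
    """Return a description of the first header syntax problem, or None."""
    if len(message) < BGP_HEADER_LEN:
        return "shorter than the fixed header"
    marker, length, msg_type = struct.unpack("!16sHB", message[:BGP_HEADER_LEN])
    if marker != BGP_MARKER:
        return "bad marker"
    if not BGP_HEADER_LEN <= length <= BGP_MAX_MESSAGE_LEN:
        return "length field out of range"
    if length != len(message):
        return "length field does not match the bytes received"
    if msg_type not in BGP_MESSAGE_TYPES:
        return "unknown message type"
    return None


if __name__ == "__main__":
    keepalive = BGP_MARKER + struct.pack("!HB", 19, 4)
    print(header_problem(keepalive))
    print(header_problem(b"\x00" * 16 + struct.pack("!HB", 19, 4)))
```

A perfectly well-formed UPDATE carrying a typoed prefix passes a check of this kind untouched, which is exactly the "sky is purple" case.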
There are several monitoring efforts (including the one done here) which compare sets of (m)bgp routing tables over time. It seems to me that such (m)bgp pollution should be detectable with a monitoring project.
This is one way for humans to cross-check for erroneous but well-formed data.
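
For what such a cross-check might look like in its simplest form, here is a sketch that compares two routing table snapshots and reports which origin ASes vanished, which appeared, and which prefixes changed origin. The snapshot format (a plain mapping of prefix to origin AS) and the function name are assumptions made for this example, not a description of any of the monitoring projects mentioned; it only produces the raw differences that a human would then have to interpret.

```python
def origin_as_changes(before: dict, after: dict):
    """Compare two snapshots that map prefix -> origin AS number."""
    ases_before = set(before.values())
    ases_after = set(after.values())
    dropped_ases = ases_before - ases_after   # ASes no longer originating anything
    new_ases = ases_after - ases_before       # ASes seen only in the later snapshot
    # Prefixes present in both snapshots whose origin AS changed.
    moved_prefixes = {
        prefix: (before[prefix], after[prefix])
        for prefix in before.keys() & after.keys()
        if before[prefix] != after[prefix]
    }
    return dropped_ases, new_ases, moved_prefixes


if __name__ == "__main__":
    # Made-up data using documentation prefixes and AS numbers.
    earlier = {"192.0.2.0/24": 64496, "198.51.100.0/24": 64497, "203.0.113.0/24": 64498}
    later = {"192.0.2.0/24": 64496, "198.51.100.0/24": 64499}
    print(origin_as_changes(earlier, later))
```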
BTW, what seems to be the clearest sign here of the recent flap was the dropping of 43 Autonomous Systems by UU.net for the Sat Jun 23 16:37:41 2001 status run. This is not a good enough metric to reliably detect such problems. There do seem to be a lot of weird changes in the routing table in that dump, but a simple test for this is not apparent to me at present.
The only simple test that I was implying could be added is "if we have received X badly formed routes from a peer, assume it is insane and can't be trusted", where we permit X to be >1 and do something short of a full session drop/admin-down/etc for N bad routes where X > N > 0.
--
***************************************************************************
Joel Baker System Administrator - lightbearer.com
lucifer@lightbearer.com http://www.lightbearer.com/~lucifer
I think that we should petition the Federal Government to set up a master BGP route server, with which everyone must peer. That would fix things. The government always makes it better. I'm sure that the project could be done for less than $3 billion. Anyone want to sign up as an independent contractor?
Larry Diffey
CCNA and all around nice guy
ldiffey@technologyforward.com
----- Original Message -----
From: "Marshall Eubanks" <tme@21rst-century.com>
To: <lucifer@lightbearer.com>; <nanog@merit.edu>
Sent: Tuesday, June 26, 2001 9:27 PM
Subject: Re: RFC 1771, further thoughts