Using crypto auth for detecting corrupted IGP packets?
Hi,

I believe, based on what I have heard, that some operators turn on cryptographic authentication because the internet checksum that OSPF, etc. use for packet sanity is quite weak and offers very little protection against a lot of known errors, like:

- re-ordering of 2-byte aligned words
- various bit flips that keep the 1s complement sum the same (e.g. 0x0000 to 0xffff and vice versa)

So a corrupted packet could still pass the Ethernet CRC checks and the IP and OSPF checksums. Or it could be valid until the Ethernet CRC check is done and get corrupted after that (PCI transmission errors, DMA errors, memory issues, line card corruption). Last but not least, CRCs and internet checksums could miss wire-corrupted packets.

Currently an operator can do the following:

- Use the poor internet checksum, OR
- Turn on cryptographic authentication in the routing protocols to catch all such bit errors, which could be caused by line card corruption, etc.

One can go through http://portal.acm.org/citation.cfm?id=294357.294364 to understand the issues with the internet checksums.

I would be interested in knowing if operators use cryptographic authentication for detecting the errors that I just described above. You could send me a mail offline and I will consolidate the responses and send a summary on the list in a few days' time.

Cheers, Manav
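The two error classes listed in the message above can be reproduced in a few lines. This is only a minimal sketch: the payload bytes and the key are made up for illustration, and the HMAC-SHA-256 digest is a stand-in for "some keyed cryptographic digest", not the actual OSPF authentication wire format.

```python
import hashlib
import hmac


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical 16-byte payload (eight 16-bit words), not a real OSPF packet.
original = bytes.fromhex("0201002c c0a80101 00000000 12345678")

# Error 1: two 2-byte aligned words swap places (word 0 and word 3).
reordered = original[6:8] + original[2:6] + original[0:2] + original[8:]

# Error 2: a 0x0000 word becomes 0xffff. The sum is unchanged as long as
# the remaining words do not all add up to zero.
flipped = original[:8] + b"\xff\xff" + original[10:]

key = b"example-key"  # hypothetical shared secret


def keyed_digest(data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


for name, pkt in (("original", original), ("reordered", reordered), ("flipped", flipped)):
    print(name, hex(internet_checksum(pkt)), keyed_digest(pkt).hex()[:16])
```

All three payloads print the same checksum, while the keyed digests differ, which is the property operators would be relying on when they enable authentication purely for integrity.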
On Sep 30, 2010, at 11:34 PM, Manav Bhatia wrote:
I would be interested in knowing if operators use the cryptographic authentication for detecting the errors that I just described above.
Additionally, one might venture to understand the effects of such mechanisms and why knobs such as IS-IS's "ignore-lsp-errors" were added ~15 years ago. LSP corruption storms driven by receivers that purge corrupted LSPs and originators that re-originate and flood on receipt of said purged LSPs are very problematic and otherwise difficult to identify in practice.

Coincidentally, it's also why logging LSPs that trigger such errors is important, whether you ignore them or propagate them.

-danny
On Oct 1, 2010, at 12:16 AM, Danny McPherson <danny@tcb.net> wrote:
On Sep 30, 2010, at 11:34 PM, Manav Bhatia wrote:
I would be interested in knowing if operators use the cryptographic authentication for detecting the errors that I just described above.
Additionally, one might venture to understand the effects of such mechanisms and why knobs such as IS-IS's "ignore-lsp-errors" were added ~15 years ago. LSP corruption storms driven by receivers that purge corrupted LSPs and originators that re-originate and flood on receipt of said purged LSPs are very problematic and otherwise difficult to identify in practice.
Coincidentally, it's also why logging LSPs that trigger such errors is important, whether you ignore them or propagate them.
I really wish there was a good way to (generically) keep a 4-6 hour buffer of all control-plane traffic on devices. While you can do that with some, the forensic value is immense when you have a problem.

- Jared
I really wish there was a good way to (generically) keep a 4-6 hour buffer of all control-plane traffic on devices. While you can do that with some, the forensic value is immense when you have a problem.
Buffering for 4-6 hours worth of control traffic is HUGE! What about mirroring your control traffic arriving on your network ports to some other dedicated port?

Manav
On Oct 1, 2010, at 11:07 AM, Manav Bhatia wrote:
Buffering for 4-6 hours worth of control traffic is HUGE!
If 4-6 hours of *control-plane* traffic on a given device is 'HUGE!', for some reasonable modern value of 'HUGE!', then there's definitely a problem on the network in question. ;>

-----------------------------------------------------------------------
Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>

Sell your computer and buy a guitar.
Buffering for 4-6 hours worth of control traffic is HUGE!
If 4-6 hours of *control-plane* traffic on a given device is 'HUGE!', for some reasonable modern value of 'HUGE!', then there's definitely a problem on the network in question.
With BFD alone (assuming 20 sessions, 50ms timer) you will have 400pps. In 6 hours you will have around 8000K BFD packets. Add OSPF, RSVP, BGP, LACP (for lags), dot1AG, EFM and you would really get a significant number of packets to buffer.

Cheers, Manav
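The BFD figures above can be checked with a quick back-of-the-envelope calculation. This is a rough sketch, not a measurement: the 66-byte frame size is an assumption (a 24-byte BFD control packet without authentication, plus UDP, IPv4 and Ethernet headers, with no FCS or capture overhead), and only one direction of each session is counted, as in the message.

```python
def bfd_buffer_estimate(sessions: int, interval_ms: int, hours: int,
                        frame_bytes: int = 66):
    """Return (packets per second, total packets, total bytes) for a capture window.

    frame_bytes is an assumed per-packet size: 24 (BFD) + 8 (UDP)
    + 20 (IPv4) + 14 (Ethernet header).
    """
    pps = sessions * 1000 // interval_ms      # packets per second, one direction
    packets = pps * hours * 3600              # packets over the whole window
    return pps, packets, packets * frame_bytes


pps, packets, total_bytes = bfd_buffer_estimate(sessions=20, interval_ms=50, hours=6)
print(f"{pps} pps, {packets:,} packets, about {total_bytes / 1e6:.0f} MB")
```

With these assumptions the result is 400 pps and 8,640,000 packets in 6 hours, in the same ballpark as the 8000K quoted, which comes to roughly 570 MB of raw frames before any other protocol is added.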
On Oct 1, 2010, at 1:01 PM, Manav Bhatia wrote:
In 6 hours you will have around 8000K BFD packets. Add OSPF, RSVP, BGP, LACP (for lags), dot1AG, EFM and you would really get a significant number of packets to buffer.
Which isn't a 'HUGE!' amount of packets. ;>
On Oct 1, 2010, at 3:49 AM, Dobbins, Roland wrote:
On Oct 1, 2010, at 1:01 PM, Manav Bhatia wrote:
In 6 hours you will have around 8000K BFD packets. Add OSPF, RSVP, BGP, LACP (for lags), dot1AG, EFM and you would really get a significant number of packets to buffer.
Which isn't a 'HUGE!' amount of packets.
;>
Yup, but when trying to figure out the root cause of some problem, having a few gigs of data would be helpful. In the event people have not noticed, hard drives are semi-popular in routers now, so assuming you have some variable amount of disk space greater than 8MB for an image is feasible.

- Jared
On Fri, Oct 1, 2010 at 7:26 AM, Jared Mauch <jared@puck.nether.net> wrote:
On Oct 1, 2010, at 3:49 AM, Dobbins, Roland wrote:
On Oct 1, 2010, at 1:01 PM, Manav Bhatia wrote:
In 6 hours you will have around 8000K BFD packets. Add OSPF, RSVP, BGP, LACP (for lags), dot1AG, EFM and you would really get a significant number of packets to buffer.
Which isn't a 'HUGE!' amount of packets.
;>
Yup, but when trying to figure out the root cause of some problem, having a few gigs of data would be helpful.
In the event people have not noticed, hard drives are semi-popular in routers now, so assuming you have some variable amount of disk space greater than 8MB for an image is feasible.
on at least one platform you can get some details with traceoptions, no?
On Fri, 1 Oct 2010 00:25:34 -0400 Jared Mauch <jared@puck.nether.net> wrote:
I really wish there was a good way to (generically) keep a 4-6 hour buffer of all control-plane traffic on devices. While you can do that with some, the forensic value is immense when you have a problem.
Not precisely what you're looking for, but you can monitor the OSPF database in other ways. See some of the early OSPF work described here for instance:

<http://www2.research.att.com/~ashaikh/presentations.php>

I had written a simple utility to grab the LSA counts and checksum values from a set of routers when I converted a RIP network to OSPF. The network consisted of about 25 routers and 300 routes. It was invaluable as a sanity check to see if all routers were in agreement.

Packet Design's Route Explorer may be a commercial implementation of this sort of thing. I've only seen an early version of that at an earlier NANOG and have never used it. It seemed like cool technology at the time, but don't take that as an endorsement.

Ob op note: I do recall one older IOS router where it would never have exactly the same checksum values as the others. After manually inspecting the routes I had concluded that it was an artifact of the IOS code being run, which was an old 11.x train and the only one in the net at the time.

John
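The kind of agreement check described above might look something like the following. It is a hypothetical sketch, not the utility from the message: the router names and values are invented, and how the LSA count and checksum sum are collected from each router (SNMP, CLI scraping, etc.) is left out entirely.

```python
from collections import Counter


def find_disagreeing_routers(summaries):
    """Compare per-router (lsa_count, checksum_sum) pairs.

    Returns the most common pair and a dict of routers whose pair differs
    from it. Routers in the same area should normally converge on the same
    values, so an outlier is worth a closer look.
    """
    majority, _ = Counter(summaries.values()).most_common(1)[0]
    outliers = {name: pair for name, pair in summaries.items() if pair != majority}
    return majority, outliers


# Invented sample data: router name -> (LSA count, sum of LSA checksums).
summaries = {
    "r1": (300, 0x9A3F21),
    "r2": (300, 0x9A3F21),
    "r3": (299, 0x99C0D4),  # missing an LSA, or holding a stale copy
    "r4": (300, 0x9A3F21),
}

majority, outliers = find_disagreeing_routers(summaries)
print("majority view:", majority)
print("routers to inspect:", outliers)
```

A mismatch only says that a router's database differs, not why; as the note about the old IOS 11.x router suggests, a persistent outlier can also be an implementation artifact rather than corruption.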
Hi,

I received 7 replies, of which 3 stated that they were using crypto only to detect the issues that I described in my original email. Another 3 said that they were using it for authentication, and 1 person replied saying that they were using crypto for both authentication and integrity.

Folks who are using cryptographic authentication mechanisms only for integrity may want to look at http://www.ietf.org/id/draft-jakma-ospf-integrity-00.txt

Cheers, Manav
participants (6)
- Christopher Morrow
- Danny McPherson
- Dobbins, Roland
- Jared Mauch
- John Kristoff
- Manav Bhatia