On BB, so top posting. Apologies.

It seems that creating a worst-case BGP test suite for all kinds of nastiness (in light of the recent RIPE thing) might not be a bad idea, so that we can all test implementations ourselves before we deploy new code. Like all funky attributes, all funky AS SETs... With knobs ranging from 1 up to memory exhaustion (for long data sets, etc.). Unless BGP is massively more complicated than I remember, it's not a very advanced CS grad project. I'm thinking a Quagga or Perl BGP talker would be a good place to start.

Deepak

----- Original Message -----
From: Christopher Morrow <morrowc.lists@gmail.com>
To: Florian Weimer <fw@deneb.enyo.de>
Cc: nanog@nanog.org <nanog@nanog.org>
Sent: Sun Aug 29 01:12:00 2010
Subject: Re: Did your BGP crash today?

On Sat, Aug 28, 2010 at 6:14 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * Christopher Morrow:
>> (you are asking your vendors to run full bit sweeps of each protocol in a regimented manner checking for all possible edge cases and properly handling them, right?)
> The real issue is that both spec and current practice say you need to drop the session as soon as you encounter any unexpected data. That's

Sorry, I conflated two things... or didn't mean to, but did anyway.

1) Users of gear that does BGP really need to ask, loudly and at length (and then go test for themselves), that their BGP speakers do the 'right thing' when faced with oddball scenarios. If someone sends you a previously unknown attribute... don't corrupt it and pass it on: pass it if it's transitive, drop it if not.

2) Some thought, writing and code changes need to go into how the BGP speakers of the world deal with badly behaved BGP speakers. Is 'send NOTIFICATION and reset' the right answer? Is there one 'right answer'? Should some classes of fugly exchange end with 'dropped that update, moved along' and others with 'pull eject handle!'?

It's doubtful that 2 can get solved here (on nanog), though some operational thought on the right thing would certainly be great as guidance. I would hope that 1 can get some traction here, via folks going back to their vendors and asking: "Did you run the Mu-security/Oulu-univ/etc. fuzzing test suites against this code? Can I see the results? I hope they match the results I'm going to be getting from my folks in ~2 wks... or we'll be having a much more structured/loud conversation..."

Another poster had a great point: 'all the world can screw with you; you have no protections other than trust that the next guy won't screw you over (inadvertently)'. There are no protections available to you if someone sets (for example) bit 77 in an IPv4 UPDATE message to 1 when it should by all accounts be 0. Or (apparently) if they send a previously unknown attribute on a route :( You can put in max-prefix limits, AS-path limits (length and content) and prefix filters, but for internal message content you are stuck hoping the vendors all followed the same playbook.

With everyone saying together, "Please appropriately test your implementation for all boundary cases," maybe we can get to where these happen less often (or nearly never). Every 3 months is a little tedious.

-chris
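Chris's point 1 ("pass if transitive, drop if not") matches what RFC 4271 asks of a speaker that receives an attribute it does not recognize. The rule has three branches:

- An unknown optional transitive attribute is passed along with the Partial bit set.
- An unknown optional non-transitive attribute is quietly ignored and not passed on.
- An unrecognized well-known attribute is an UPDATE Message Error.

Below is a minimal Python sketch of that per-attribute decision. It assumes a speaker that only recognizes the base RFC 4271 attribute types. `handle_attr`, `PathAttr`, `Action` and `KNOWN_TYPES` are hypothetical names. Wire parsing, policy for known attributes and the actual NOTIFICATION/reset are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

# Path attribute flag bits (RFC 4271, section 4.3).
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20

# Assumption: this hypothetical speaker implements only the RFC 4271 base attributes
# (ORIGIN, AS_PATH, NEXT_HOP, MULTI_EXIT_DISC, LOCAL_PREF, ATOMIC_AGGREGATE, AGGREGATOR).
KNOWN_TYPES = {1, 2, 3, 4, 5, 6, 7}


class Action(Enum):
    KEEP = "process normally"
    PASS_PARTIAL = "pass along unchanged, Partial bit set"
    DROP = "quietly ignore, do not pass along"
    NOTIFY = "NOTIFICATION: UPDATE Message Error / Unrecognized Well-known Attribute"


@dataclass(frozen=True)
class PathAttr:
    flags: int
    type_code: int
    value: bytes


def handle_attr(attr: PathAttr) -> Tuple[Action, Optional[PathAttr]]:
    """Decide what to do with one received attribute and what, if anything, to forward."""
    if attr.type_code in KNOWN_TYPES:
        return Action.KEEP, attr
    if not attr.flags & OPTIONAL:
        # An unrecognized *well-known* attribute is a genuine protocol error.
        return Action.NOTIFY, None
    if attr.flags & TRANSITIVE:
        # Unknown optional transitive: value bytes go out exactly as received;
        # only the Partial bit in the flags octet changes.
        return Action.PASS_PARTIAL, PathAttr(attr.flags | PARTIAL, attr.type_code, attr.value)
    # Unknown optional non-transitive: drop it, don't pass it on.
    return Action.DROP, None
```

The "don't corrupt it" part is the PASS_PARTIAL branch: the forwarded attribute differs from the received one only in its flags octet.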
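Deepak's worst-case test talker could start from something as small as an UPDATE builder. The sketch below uses Python rather than Quagga or Perl, purely for illustration. It encodes one IPv4 UPDATE following the RFC 4271 wire format:

- a 16-byte all-ones marker, a 2-byte length and type 2;
- an AS_PATH whose length acts as the "knob";
- an extra optional transitive attribute of arbitrary type, standing in for a "previously unknown" attribute.

Assumptions: 2-octet ASNs only (no 4-octet AS support) and no session handling (TCP, OPEN and KEEPALIVE are not shown). `build_update`, `path_len`, `asn` and the default type code 255 are hypothetical choices, not taken from any existing tool.

```python
import socket
import struct

# Path attribute flag bits (RFC 4271, section 4.3).
OPTIONAL = 0x80
TRANSITIVE = 0x40
EXTENDED_LENGTH = 0x10

BGP_UPDATE = 2          # message type code for UPDATE
MAX_MESSAGE_LEN = 4096  # RFC 4271 upper bound on a BGP message
AS_SEQUENCE = 2         # AS_PATH segment type


def path_attr(flags: int, type_code: int, value: bytes) -> bytes:
    """Encode one path attribute, using a 2-byte length field when value > 255 bytes."""
    if len(value) > 0xFFFF:
        raise ValueError("attribute value too long even for extended length")
    if len(value) > 255:
        return struct.pack("!BBH", flags | EXTENDED_LENGTH, type_code, len(value)) + value
    return struct.pack("!BBB", flags, type_code, len(value)) + value


def as_path(asns: list[int]) -> bytes:
    """Split 2-octet ASNs into AS_SEQUENCE segments of at most 255 ASNs each."""
    out = b""
    for i in range(0, len(asns), 255):
        chunk = asns[i:i + 255]
        out += struct.pack("!BB", AS_SEQUENCE, len(chunk))
        out += b"".join(struct.pack("!H", a) for a in chunk)
    return out


def build_update(prefix: str, plen: int, next_hop: str, path_len: int,
                 asn: int = 64512, unknown_type: int = 255,
                 unknown_value: bytes = b"\x00\x01") -> bytes:
    """Build one IPv4 UPDATE announcing prefix/plen (hypothetical test-talker helper).

    path_len is the knob: the AS_PATH is `asn` repeated path_len times.
    unknown_type is an arbitrary type code sent as an optional transitive attribute.
    """
    attrs = (
        path_attr(TRANSITIVE, 1, b"\x00")                       # ORIGIN = IGP
        + path_attr(TRANSITIVE, 2, as_path([asn] * path_len))   # AS_PATH
        + path_attr(TRANSITIVE, 3, socket.inet_aton(next_hop))  # NEXT_HOP
        + path_attr(OPTIONAL | TRANSITIVE, unknown_type, unknown_value)
    )
    # NLRI: prefix length in bits, then only the octets that length needs.
    nlri = bytes([plen]) + socket.inet_aton(prefix)[:(plen + 7) // 8]
    # 19-byte header + withdrawn-routes length + total-path-attribute length.
    length = 19 + 4 + len(attrs) + len(nlri)
    if length > MAX_MESSAGE_LEN:
        raise ValueError(f"UPDATE would be {length} bytes (limit {MAX_MESSAGE_LEN})")
    body = struct.pack("!HH", 0, len(attrs)) + attrs + nlri  # no withdrawn routes
    return b"\xff" * 16 + struct.pack("!HB", length, BGP_UPDATE) + body
```

As written, the builder refuses to go past the 4096-byte message limit. A real fuzzer would presumably want a switch to cross that limit on purpose, and to vary the flags and segment types as well as the lengths.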