-----Original Message-----
From: Gadi Evron [mailto:ge@linuxbox.org]
Sent: Thursday, December 02, 2004 3:21 PM
To: Chad Skidmore
Cc: Aaron Glenn; nanog@merit.edu
Subject: What good is a noc team? How do you mitigate this? [was: How many backbones ...]
Okay, making this an operational issue. Say you are attacked. Say it isn't even a botnet. Say a new worm is out and you are getting traffic from 19 different class A's.
Who do you call? What do you block?
How can a NOC team help here?
"Please block any outgoing connections from your network to ours on port 25? Please?" I tried this once.. it doesn't help. I ended up blackholing an entire country just to mitigate it a bit, for a few hours.
Any practical suggestions?
Gadi.
Well, the easy answer is that it depends. Let's use SQL Slammer as one example that might be comparable to the scenario you mention. During Slammer some networks did stay up. We'd have to ask each one of them what they did to know why they stayed up, but I think I can guess at some of it. Shortly after Slammer there was a NANOG presentation on it, and some discussion at the NSP-Sec BOF at that NANOG about why some people survived and others didn't. What came out of that was enlightening, if not obvious in hindsight:

1. Providers that made use of contacts at other providers and worked together, shared information, etc. were less affected than those that did not.

2. Providers that had mechanisms in place for just such an issue did better than those that did not. This included, but was not limited to: darknet monitoring and quick reaction to darknet data anomalies, automated and semi-automated sifting of Netflow data, pre-staged classification ACLs on at least the key backbone/peering/transit routers, and BGP (or other) triggered blackhole mechanisms.

3. Providers with dedicated incident response teams did better than those without.

4. Those with grossly oversubscribed networks did worse than those with sufficient bandwidth to handle the ebb and flow of traffic that rides the Internet today. Good traffic engineering practices don't mean you have to purchase lots of excess bandwidth, and oversubscription is not just a matter of circuit utilization: make sure you have enough CPU on your routers, line cards, whatever, so that you can turn on the features that help you track and mitigate an attack without making your routers fall over.

So, armed with that data, you can assume the following. With good darknet monitoring practices you would likely see a rapid uptick in scanning, backscatter, etc. and could start investigating the cause before the issue becomes service affecting. Maybe it is so crazy and randomized that you don't see it in your darknet monitoring, but you will see it in your PPS data collection; more often than not, we see indications of miscreant activity in PPS monitoring first.

The classification ACLs are a good way to turn the router into a poor man's sniffer (assuming it isn't so heavily loaded already that it falls over) so you can see what types of traffic you are dealing with. Using MCI/UUNET's method you can track any spoofed traffic back to where it enters your network pretty easily. I know that Chris and company do it with amazing speed across 701; if it works for them, it likely works for the rest of you.

Netflow data would likely lead you to the sources of the most pain so you can go after those first. Fighting an attack isn't always about making the attack go away. Often the key to not getting killed is to find the "big guns" and get them silenced first. Sure, you're still getting shot, but it isn't going to kill you, and you can take some additional time to find the smaller guns. If the bulk of the attack comes from a few sources, let their security teams deal with it and take the pain away from you.

Armed with the data you glean from this approach, you will usually be able to get a positive response from your upstream or peers. If not, make a quick note to yourself that you need to replace them once the attack is over and done with.
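To make the Netflow point concrete, here is the flavor of quick-and-dirty sifting I mean. This is a sketch, not anyone's production tooling: it assumes you have already dumped flow records to a CSV file with per-flow source, destination, and packet counts, and the column names are made up, so adapt it to whatever your collector actually exports.

#!/usr/bin/env python3
# Sift exported flow records for the "big guns": the handful of
# sources sending the most packets at you.  Assumes flows.csv with
# hypothetical columns src_ip,dst_ip,packets -- adjust to taste.

import csv
from collections import Counter

packets_by_src = Counter()

with open("flows.csv") as f:
    for row in csv.DictReader(f):
        packets_by_src[row["src_ip"]] += int(row["packets"])

# The top few sources usually account for most of the pain; hand
# these to the owning networks' security teams first.
for src, pkts in packets_by_src.most_common(10):
    print(f"{src}\t{pkts}")

Once you know who the big guns are, a whois on each source block gets you the abuse and NOC contacts to start calling.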
If all else fails, blackhole the host under attack at your borders, or even better on your upstream's network via a BGP-triggered blackhole (if they don't support that, make a note to replace them with someone who does when the attack is over). You might sacrifice that host, but you'll save the rest of your network and likely buy yourself some more time to track back to the source and kill it. A rough sketch of what the trigger can look like is in the P.S. below.

I'm certainly not suggesting I have all the answers or that I have it all figured out. I also realize that the world is not a rosy place where inter-provider communication is perfect and I always get the answers I need when I call. I'm just tired of seeing people play the victim, complaining about how the "Big Providers" won't protect them, etc., without looking in the mirror and deciding, "today is the day I take my network back and take care of myself." Gadi and Aaron, this isn't directed specifically at you, so please don't take it as a personal flame.

I've personally had great luck getting a quick reaction from a number of providers while an attack is ongoing. That certainly isn't always the case, but more often than not it is. Some that have done a great job in the past and come to mind for me are 701, 2914, 3549, 1239, and 3356. That isn't all of them, but they are the ones that come to mind the quickest.

Chad

----------------------------
Chad E Skidmore
One Eighty Networks, Inc.
http://www.go180.net
509-688-8180
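P.S. For anyone who hasn't built the trigger piece of a BGP-triggered blackhole yet, here is a rough sketch of the moving parts. The syntax it emits is IOS-style, and the discard next-hop, tag, and community are made-up placeholders; substitute whatever your gear and your upstream actually honor.

#!/usr/bin/env python3
# Rough sketch of an RTBH trigger: emit the IOS-style line that
# blackholes one host.  It assumes the usual pre-staged setup, i.e.
# 192.0.2.1 is routed to Null0 on every router, and the trigger box
# already has "redistribute static route-map rtbh" under router bgp
# with a route-map along these lines:
#
#   route-map rtbh permit 10
#    match tag 666
#    set community 65000:666
#    set origin igp
#
# Addresses, tag, and community below are all placeholders.

import sys

DISCARD_NEXT_HOP = "192.0.2.1"   # pre-pointed at Null0 everywhere
TRIGGER_TAG = 666                # matched by the pre-staged route-map

def trigger_route(victim: str) -> str:
    """The one line you paste on the trigger router."""
    return f"ip route {victim} 255.255.255.255 {DISCARD_NEXT_HOP} tag {TRIGGER_TAG}"

if __name__ == "__main__":
    victim = sys.argv[1] if len(sys.argv) > 1 else "198.51.100.25"
    print(trigger_route(victim))

Pull the static route back out when the attack subsides and the route (and the blackhole) goes away with it.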