-larry directly since I'm sure he's either tired of this, or already reading it via the nanog subscription. On Mon, Feb 3, 2014 at 7:54 PM, Peter Phaal <peter.phaal@gmail.com> wrote:
On Mon, Feb 3, 2014 at 2:58 PM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
wait, so the whole of the thread is about stopping participants in the attack, and you're suggesting that removing/changing end-system switch/routing gear and doing something more complex than:

  deny udp any 123 any
  deny udp any 123 any 123
  permit ip any any

is a good plan?
I'd direct you at: <https://www.nanog.org/resources/tutorials>
and particularly at: "Tutorial: ISP Security - Real World Techniques II" <https://www.nanog.org/meetings/nanog23/presentations/greene.pdf>
Thanks for the links. Many SDN solutions can be replicated using
you're sort of a broken record on this bit... I don't think folk (me in particular) are knocking sdn things in general. In the specific, though: 1) you missed the point originally, so please stop marketing your blog. 2) you missed the point(s) about availability and realistic deployment of solutions in the near term
manual processes (or are ways of automating currently manual processes). Programmatic APIs allows the speed and accuracy of the response to be increased and the solution to be delivered at scale and at lower cost.
and all of these require very strict and very careful deployment of OSS measures to watch over current state and intended state. They also require very careful training and troubleshooting steps for the ops folk running the systems. None of this is deployable 'tomorrow' (in under 24hrs) safely, and most likely it'll be a bit more time until there is ubiquitous deployment of sdn-like functionality in larger scale networks. not that I'm not a fan, and not that I don't like me some automation, but... having seen automation go very wrong (l3's acl spider crushing l3, the flowspec 'whoopsie' at cloudflare and TWTC... there are lots of other examples).
it's probably not a good plan to forklift your edge, for dos targets where all you really need is a 3 line acl.
For many networks it doesn't need to be forklift upgrade - vendors are adding programmatic APIs to their existing products (OpenFlow, Arista eAPI, NETCONF, ALU Web Services ...) - so a firmware upgrade may be
arista is deployed in which large scale networks with api/sdn functionality? they're a great bunch of folks and they make some nice gear, but it's still getting baked, and it's not displacing (today) existing gear that's still being depreciated. for anything to be workable in the near term, the above examples just aren't going to work. note my many references to "5-7 yrs until depreciation cycles complete and the next replacement happens"
all that is required.
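As a concrete sketch of what "programmatic API" means here: Arista's eAPI accepts JSON-RPC over HTTP(S), so a mitigation script can push a filter with a single POST. The switch URL, credentials, ACL name, and exact EOS ACL syntax below are illustrative assumptions, not a tested config:

```python
import json

def eapi_acl_request(target_ip, request_id="1"):
    """Build a JSON-RPC payload that would install a temporary NTP
    filter via Arista eAPI. ACL name and command syntax are
    placeholders for illustration."""
    cmds = [
        "enable",
        "configure",
        "ip access-list ddos-mitigation",
        # drop reflected NTP (UDP source port 123) aimed at the victim
        "deny udp any eq ntp host %s" % target_ip,
        "permit ip any any",
    ]
    return {
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": cmds, "format": "json"},
        "id": request_id,
    }

payload = eapi_acl_request("192.0.2.10")
print(json.dumps(payload, indent=2))
```

In practice the payload would be POSTed to the switch's `/command-api` endpoint over HTTPS with operator credentials; the point is that the whole change is one scripted round trip rather than a manual login.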
I do think that there are operational advantages to using protocols like OpenFlow, I2RS, and BGP FlowSpec for these soft controls, since they allow the configuration to remain relatively static and they avoid problems of split control (for example, an operator makes a config change and saves it, locking in a temporary control from the SDN system).
automation, with protections, safety checks, and assurances that the process won't break things in odd failure modes... not to mention bug^H^H^Hfeature issues with gear. we're still a bit away from large scale deployment.
I would argue that the more specific the ACL can be the less collateral damage. Built-in measurement allows for a more targeted response.
sure, I think roland and I at least have been saying the same thing.
Good point - the proposed solution is most effective for protecting customers that are targeted by DDoS attacks. While trying to prevent
Oh, so the 3-line acl is not an option? or (for a lot of customers a fine answer) a null route? Some things have changed in the world of dos mitigation, but a bunch of the basics still apply. I do know that in the unfortunate event that your network is the transit or terminus of a high-volume dos attack, you want to make the least configuration change that'll satisfy the 2 parties involved (you and your customer)... doing a bunch of hardware replacement and/or sdn things when you can get the job done with some acls or routing changes is really going to be risky.
I think an automatic system using a programmatic API to install as narrowly scoped a filter as possible is the most conservative and least risky option. Manual processes are error prone, slow, and blunt instruments like a null route can cause collateral damage to services.
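For illustration, a narrowly scoped rule might match only the victim prefix plus the reflection vector (UDP source port 123) instead of null-routing the host. This is a minimal sketch; the dict field names are chosen for readability and are not a real FlowSpec wire format:

```python
def flowspec_rule(dst_prefix, proto=17, src_port=123):
    """Compose a FlowSpec-style match/action pair scoped to one
    victim prefix and the reflected protocol, rather than a blanket
    null route. Field names are illustrative only."""
    return {
        "match": {
            "destination": dst_prefix,
            "protocol": proto,       # 17 = UDP
            "source-port": src_port, # 123 = NTP, the amplification vector
        },
        "then": {"action": "discard"},
    }

rule = flowspec_rule("203.0.113.5/32")
```

The narrower the match, the less legitimate traffic the victim loses while the rule is in place.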
folk say this, but the customer very often explicitly asks for null routes. The thing being targeted is very often not a 'revenue generating ecommerce site', and for providers where the default answer is 'everything is a null route', their customers ought to find a provider that thinks differently.
Typical networks probably only see a few DDoS attacks an hour at the most, so pushing a few rules an hour to mitigate them should have little impact on the switch control plane.
based on what math did you get 'few per hour'? As an endpoint (focal point) or as a contributor? The problem that started this discussion was being a contributor... which I bet happens a lot more often than /few an hour/.
I am sorry, I should have been clearer, the SDN solution I was describing is aimed at protecting the target's links, rather than mitigating the botnet and amplification layers.
and i'd say that today sdn is out of reach for most deployments, and that the simplest answer is already available.
The number of attacks was from the perspective of DDoS targets and their service providers. If you are considering each participant in the attack the number goes up considerably.
I bet roland has some good round-numbers on number of dos attacks per day... I bet it's higher than a few per hour globally, for the ones that get noticed.
The "few per hour" number isn't a global statistic. This is the number that a large hosting data center might experience. The global number
I wonder how many attacks rackspace, softlayer, amazon-aws, xs4all, hetzner, etc. experience per hour. in any case, 'often' is probably close enough.
is much larger, but not very relevant to a specific provider looking to size a mitigation solution.
note that the focus of the original thread was on the contributors. I think the target part of the problem has been solved since before the slides in the pdf link at the top...
Do most service providers allow their customers to control ACLs in the upstream routers? Do they automatically monitor traffic and insert the
nope, and I don't necessarily think that changes with SDN... letting your customer traffic-engineer is... dangerous. it tosses capacity planning concerns out the window :( There are several providers, however, that let their customers initiate smart/intelligent mitigation solutions. I know of 3 that let the customer trigger mitigation based on a BGP community: the customer can choose how they want to 'detect' and then simply send a bgp update for mitigation... I bet there are folk that don't own networks that provide this service as well... I'm sure roland has some work stories he's presented on about this very thing.
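A customer-triggered blackhole of this kind is usually just a tagged /32 announcement. A hedged sketch, formatting an exabgp-style announce line and using the RFC 7999 BLACKHOLE community (65535:666) as an example -- real trigger communities and discard next-hops are provider-specific:

```python
def rtbh_announce(victim_ip, next_hop="192.0.2.1", community="65535:666"):
    """Format an exabgp-style announce for a customer-triggered
    remote blackhole. The next-hop and community defaults are
    placeholder examples, not any particular provider's values."""
    return ("announce route %s/32 next-hop %s community [%s]"
            % (victim_ip, next_hop, community))

print(rtbh_announce("203.0.113.5"))
```

The provider's edge matches the community, sets next-hop to a discard route, and the victim's /32 is dropped network-wide without the provider touching a single ACL.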
filters themselves when there is an attack? I don't believe so - while
some providers do, based upon customer demand for the service. it's not really that hard, though it is a cost for the provider so that's shared with the customers using the solution(s).
the slides describe a solution, automation is needed to make available at large scale.
automation isn't precluded from solution space in the slides, note that they were presented and created in ~2002... so the state of the art has changed a bit since then, but the methodology and practices from 2002 can be applied fairly directly today.
you're getting pretty complicated for the target side:

  access-list 150 permit ip any any log

(note this is basically taken verbatim from the slides)
view logs, see the overwhelming majority are to hostX port Y proto Z... filter, done. you can do that in about 5 mins time, quicker if you care to rush a bit.
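That 5-minute triage can be sketched in a few lines: tally the (proto, dst, port) tuples in the ACL log and the filter candidate falls out. The log format assumed below is loosely modeled on IOS access-list log lines and may not match any given platform exactly:

```python
import re
from collections import Counter

# assumed format, loosely modeled on IOS access-list log output
LOG_RE = re.compile(r"permitted (\w+) [\d.]+\((\d+)\) -> ([\d.]+)\((\d+)\)")

def top_target(lines):
    """Tally (proto, dst, dst_port) across ACL log lines and return
    the most common tuple with its count -- the candidate for a
    specific filter."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            proto, _sport, dst, dport = m.groups()
            counts[(proto, dst, dport)] += 1
    return counts.most_common(1)[0] if counts else None

logs = [
    "list 150 permitted udp 198.51.100.7(123) -> 203.0.113.5(80), 1 packet",
    "list 150 permitted udp 198.51.100.9(123) -> 203.0.113.5(80), 1 packet",
    "list 150 permitted tcp 192.0.2.44(5555) -> 203.0.113.9(22), 1 packet",
]
print(top_target(logs))  # the udp flood toward 203.0.113.5:80 dominates
```

Whether this runs as a human eyeballing `show log` or as a script is exactly the manual-vs-automated distinction being argued here.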
An automated system can perform the analysis and apply the filter in a second with no human intervention. What if you have to manage thousands of customer links?
been there, done that... got several tshirts. it's honestly not that bad.
This brings up an interesting use case for an OpenFlow capable switch - replicating sFlow, NetFlow, IPFIX, Syslog, SNMP traps etc. Many top of rack switches can forward the traffic through a GRE/VxLAN tunnel as well.
yes, more complexity seems like a great plan... in the words of someone else: "I encourage my competitors to do this"
Using the existing switches to replicate and tap production traffic is less complex and more scalable than alternatives. You may find the following use case interesting:
http://blog.sflow.com/2013/04/sdn-packet-broker.html
I think roland's other point that not very many people actually even use sflow is not to be taken lightly here either.
It doesn't have to be sFlow - the sFlow solution was provided as a concrete example since that is the technology I am most familiar with.
and which, according to a credible source, is by and large not deployed by service providers. certainly in some IDC situations sflow is interesting, but according to someone who I believe is in a position to know, it's not there for isp situations. leaving that aside, some signal of 'what the traffic looks like' is available if deployed. not everyone does... some don't because 'meh!', some because 'not in the featureset bought', some because '<other silly reason>'. folk that don't have it generally can't just crank it up 'now' though.
However, sFlow, IPFIX, NetFlow, jFlow etc. combined with analytics and a programmatic control API allow DDoS mitigation to be automated.
right, arbor sells this, as one example (there are others of course). there are several large US ISPs that use that solution (or an offspring of it) today. it's not quite sdn, but it is automated and relatively fire-and-forget. -chris