Re: TWC (AS11351) blocking all NTP?
On 2/2/2014 9:17 PM, ryangard@gmail.com wrote:
I'd hate to think that NetOps would be so heavy-handed in blocking all of UDP, as this would essentially halt quite a bit of audio/video traffic. That being said, there's still quite the need for protocol improvement when making use of UDP, but blocking UDP as a whole is definitely not a resolution; it simply creates a wall that not only keeps the abusive traffic out, but also keeps legitimate traffic from flowing freely as it should.
"We had to burn down the village to save it." -- Requiescas in pace o email Two identifying characteristics of System Administrators: Ex turpi causa non oritur actio Infallibility, and the ability to learn from their mistakes. (Adapted from Stephen Pinker)
On Feb 2, 2014 7:41 PM, "Larry Sheldon" <LarrySheldon@cox.net> wrote:
On 2/2/2014 9:17 PM, ryangard@gmail.com wrote:
I'd hate to think that NetOps would be so heavy-handed in blocking all of UDP, as this would essentially halt quite a bit of audio/video traffic. That being said, there's still quite the need for protocol improvement when making use of UDP, but blocking UDP as a whole is definitely not a resolution; it simply creates a wall that not only keeps the abusive traffic out, but also keeps legitimate traffic from flowing freely as it should.
"We had to burn down the village to save it."
Close. More like a hurricane is landing in NYC so we are forcing an evacuation.

But. Your network, your call.

CB
On 3/02/14 4:45 pm, "Cb B" <cb.list6@gmail.com> wrote:
On Feb 2, 2014 7:41 PM, "Larry Sheldon" <LarrySheldon@cox.net> wrote:
On 2/2/2014 9:17 PM, ryangard@gmail.com wrote:
I'd hate to think that NetOps would be so heavy-handed in blocking all of UDP, as this would essentially halt quite a bit of audio/video traffic. That being said, there's still quite the need for protocol improvement when making use of UDP, but blocking UDP as a whole is definitely not a resolution; it simply creates a wall that not only keeps the abusive traffic out, but also keeps legitimate traffic from flowing freely as it should.
"We had to burn down the village to save it."
Close. More like a hurricane is landing in NYC so we are forcing an evacuation.
But. Your network, your call.
CB
We block all outbound UDP for our ~200,000 Users for this very reason (with the exception of some whitelisted NTP and DNS servers). So far we have had 0 complaints, and 0 UDP floods sourced from us.

--
Geraint Jones
Director of Systems & Infrastructure
Koding AS62805 (We are hiring)
https://koding.com
geraint@koding.com
Phone (415) 653-0083
On Feb 3, 2014, at 10:49 AM, Geraint Jones <geraint@koding.com> wrote:
We block all outbound UDP for our ~200,000 Users for this very reason
Actually, you could've (and should've) been far more selective in what you filtered via ACLs, IMHO. What about your users who play online games like BF4?

I'm a big believer in using ACLs to intelligently preclude reflection/amplification abuse, but wholesale filtering of all UDP takes matters too far, IMHO.

My suggestion would be to implement antispoofing on the southward interfaces of the customer aggregation edge (if you can't implement it via mechanisms such as cable ip source verify even further southward), and then implement a default ingress ACL on the coreward interfaces of the customer aggregation gateways to block inbound UDP destined to ntp, chargen, DNS, and SNMP ports only.

-----------------------------------------------------------------------
Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>

Luck is the residue of opportunity and design.

                       -- John Milton
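For readers who want to see roughly what that kind of selective filter looks like, here is a minimal, illustrative sketch (in Python only to keep it self-contained) that renders an IOS-style version of the ingress ACL described above; the ACL number, the port list, and the exact syntax are assumptions for the example, not anything posted in the thread.

    # Illustrative sketch only: render an IOS-style ingress ACL of the kind
    # described above (drop inbound UDP destined to the commonly abused
    # chargen/DNS/NTP/SNMP service ports, permit everything else).
    # The ACL number (150) and exact syntax are assumptions for the example.
    ABUSED_UDP_PORTS = {"chargen": 19, "dns": 53, "ntp": 123, "snmp": 161}

    def build_ingress_acl(acl_number=150):
        lines = [f"access-list {acl_number} deny udp any any eq {port}"
                 for _name, port in sorted(ABUSED_UDP_PORTS.items(),
                                           key=lambda kv: kv[1])]
        lines.append(f"access-list {acl_number} permit ip any any")
        return "\n".join(lines)

    if __name__ == "__main__":
        print(build_ingress_acl())

The point of the sketch is simply that the filter stays narrow: four service ports are denied and everything else, including the gaming and audio/video traffic mentioned earlier, passes untouched.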
On Feb 3, 2014, at 10:58 AM, Dobbins, Roland <rdobbins@arbor.net> wrote:
I'm a big believer in using ACLs to intelligently preclude reflection/amplification abuse, but wholesale filtering of all UDP takes matters too far, IMHO.
I also think that restricting your users by default to your own recursive DNS servers, plus a couple of well-known, well-run public recursive services, is a good idea - as long as you allow your users to opt out. This has nothing to do with DDoS, but with other types of issues.

-----------------------------------------------------------------------
Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>

Luck is the residue of opportunity and design.

                       -- John Milton
On Mon, Feb 03, 2014 at 04:09:39AM +0000, Dobbins, Roland <rdobbins@arbor.net> wrote a message of 20 lines which said:
I also think that restricting your users by default to your own recursive DNS servers, plus a couple of well-known, well-run public recursive services, is a good idea - as long as you allow your users to opt out.
That's a big "as long". I agree with you but I'm fairly certain that most ISP who deny their users the ability to do DNS requests directly (or to run their own DNS resolver) have no such opt-out (or they make it expensive and/or complicated). After all, when outside DNS is blocked, it is more often for business reasons (forcing the users to use a local lying resolver, with ads when NXDOMAIN is returned) than for security reasons.
On Feb 3, 2014, at 4:55 PM, Stephane Bortzmeyer <bortzmeyer@nic.fr> wrote:
I agree with you but I'm fairly certain that most ISPs who deny their users the ability to do DNS requests directly (or to run their own DNS resolver) have no such opt-out (or they make it expensive and/or complicated).
There are some who do it, though, with a user-friendly portal - I've seen it in action.

-----------------------------------------------------------------------
Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>

Luck is the residue of opportunity and design.

                       -- John Milton
On Mon, 03 Feb 2014 16:49:37 +1300 Geraint Jones <geraint@koding.com> wrote:
We block all outbound UDP for our ~200,000 Users for this very reason (with the exception of some whitelisted NTP and DNS servers). So far we have had 0 complaints
I've heard this sort of absence-of-complaints statement used to justify claims about how to operate a network a number of times before. There is a certain appeal to it, particularly in cases such as this and for certain types of networks and operators, but for those that do it, I would also like to see some additional analysis of what is being filtered. Otherwise it leaves many unconvinced and left to conjecture about what the right approach is. If you have done that analysis, or if you could make some of that data available, it would be very helpful for everyone to see what the measurable effect is. It would also make for a useful research project.

John
Why burn the village when only one house is the problem?

I thought there might be some interest in hearing about work being done to use SDN to automatically configure filtering in existing switches and routers to mitigate flood attacks.

Real-time analytics based on measurements from switches/routers (sFlow/PSAMP/IPFIX) can identify large UDP flows and integrated hybrid OpenFlow, I2RS, REST, NETCONF APIs, etc. can be used to program the switches/routers to selectively filter traffic based on UDP port and IP source / destination. By deploying a DDoS mitigation SDN application, providers can use their existing infrastructure to protect their own and their customers networks from flood attacks, and generate additional revenue by delivering flood protection as a value added service.

https://datatracker.ietf.org/doc/draft-krishnan-i2rs-large-flow-use-case/
http://events.linuxfoundation.org/sites/events/files/slides/flow-aware-real-...

Specifically looking at sFlow, large flood attacks can be detected within a second. The following article describes a simple example using integrated hybrid OpenFlow in a 10/40G ToR switch:

http://blog.sflow.com/2014/01/physical-switch-hybrid-openflow-example.html

The example can be modified to target NTP mon_getlist requests and responses using the following sFlow-RT flow definition:

{'ipdestination,udpsourceport',value:'ntppvtbytes',filter:'ntppvtreq=20,42'}

or to target DNS ANY requests:

{keys:'ipdestination,udpsourceport',value:'frames',filter:'dnsqr=true&dnsqtype=255'}

The OpenFlow block control can be modified to selectively filter UDP traffic based on the identified UDP source port and destination IP address.

Vendors are adding new SDN capabilities to their platforms (often as software upgrades), so it's worth taking a look and seeing what is possible.

Peter

On Sun, Feb 2, 2014 at 7:38 PM, Larry Sheldon <LarrySheldon@cox.net> wrote:
On 2/2/2014 9:17 PM, ryangard@gmail.com wrote:
I'd hate to think that NetOps would be so heavy-handed in blocking all of UDP, as this would essentially halt quite a bit of audio/video traffic. That being said, there's still quite the need for protocol improvement when making use of UDP, but blocking UDP as a whole is definitely not a resolution; it simply creates a wall that not only keeps the abusive traffic out, but also keeps legitimate traffic from flowing freely as it should.
"We had to burn down the village to save it."
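As a concrete illustration of the detection half of what Peter describes, here is a rough Python sketch that pushes the DNS ANY flow definition quoted above into sFlow-RT over its REST API and then polls for threshold events. The endpoint paths, the port (8008), the flow/threshold names, and the threshold value are assumptions based on the linked sflow.com posts, not anything specified in the thread.

    # Rough sketch only: feed the DNS ANY flow definition quoted above into
    # sFlow-RT and watch for threshold events. The REST endpoints (port 8008,
    # /flow, /threshold, /events) are assumed from the linked sflow.com blog
    # posts; the names and the threshold value are made up for the example.
    import requests

    RT = "http://localhost:8008"

    requests.put(f"{RT}/flow/dns_any/json", json={
        "keys": "ipdestination,udpsourceport",
        "value": "frames",
        "filter": "dnsqr=true&dnsqtype=255",
    })

    # Fire an event when any single destination exceeds ~10,000 responses/s
    # (an arbitrary figure chosen for the sketch).
    requests.put(f"{RT}/threshold/dns_any/json",
                 json={"metric": "dns_any", "value": 10000})

    while True:
        # Long-poll for new events; each event identifies the switch (agent)
        # and the flow key (target IP, UDP source port) that crossed the
        # threshold, which is what a block control would then act on.
        events = requests.get(f"{RT}/events/json",
                              params={"maxEvents": 10, "timeout": 60}).json()
        for event in events:
            print("large flow event:", event)

The control side (installing the OpenFlow or ACL block for the identified destination IP and UDP source port) would hang off the event loop; the thread returns to that question below.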
On Mon, Feb 3, 2014 at 12:42 PM, Peter Phaal <peter.phaal@gmail.com> wrote:
Why burn the village when only one house is the problem? I thought there might be some interest in hearing about work being done to use SDN to automatically configure filtering in existing switches and routers to mitigate flood attacks.
that's great... who's got sdn capable gear in deployments today? with code and OSS stuff to deal with random SDN pokery? and who has spare tcam/etc to deal with said pokery of 'block the attack-du-jour' ?

There's certainly the case that you could drop acls/something on equipment to selectively block the traffic that matters... I suspect in some cases the choice was: "50% of the edge box customers on this location are a problem, block it across the board here instead of X00 times" (see concern about tcam/etc problems)
Real-time analytics based on measurements from switches/routers (sFlow/PSAMP/IPFIX) can identify large UDP flows and integrated hybrid OpenFlow, I2RS, REST, NETCONF APIs, etc. can be used to program the switches/routers to selectively filter traffic based on UDP port and IP source / destination. By deploying a DDoS mitigation SDN application, providers can use their existing infrastructure to protect their own and their customers networks from flood attacks, and generate additional revenue by delivering flood protection as a value added service.
yup, that sounds wondrous... and I'm sure that in the future utopian world (like 7-10 years from now, based on age-out of gear and OSS IT change requirements) we'll see more of this. I don't think you'll see much (in terms of edge ports on the network today) of this happening 'right now' though.
https://datatracker.ietf.org/doc/draft-krishnan-i2rs-large-flow-use-case/ http://events.linuxfoundation.org/sites/events/files/slides/flow-aware-real-...
Specifically looking at sFlow, large flood attacks can be detected within a second. The following article describes a simple example using integrated hybrid OpenFlow in a 10/40G ToR switch:
hopefully there's some clamp on how much change per device/port you plan too? :) I'd hate to see the RP/RE/etc get so busy programming tcam that bgp/isis/ospf/etc flaps :(
http://blog.sflow.com/2014/01/physical-switch-hybrid-openflow-example.html
The example can be modified to target NTP mon_getlist requests and responses using the following sFlow-RT flow definition:
{'ipdestination,udpsourceport',value:'ntppvtbytes',filter:'ntppvtreq=20,42'}
or to target DNS ANY requests:
{keys:'ipdestination,udpsourceport',value:'frames',filter:'dnsqr=true&dnsqtype=255'}
this also assumes almost 1:1 sampling... which might not be feasible either...otherwise you'll be seeing fairly lossy results, right?
The OpenFlow block control can be modified to selectively filter UDP traffic based on the identified UDP source port and destination IP address.
hopefully your OSS and netflow/sflow collection isn't also being used for traffic engineering/capacity planning purposes? else... you might get odd results from that infrastructure with such changes to the sflow/netflow sender platform.
Vendors are adding new SDN capabilities to their platforms (often as software upgrades), so it's worth taking a look and seeing what is possible.
the device side is PROBABLY the simple side of the equation for most people...
On Sun, Feb 2, 2014 at 7:38 PM, Larry Sheldon <LarrySheldon@cox.net> wrote:
On 2/2/2014 9:17 PM, ryangard@gmail.com wrote:
I'd hate to think that NetOps would be so heavy-handed in blocking all of UDP, as this would essentially halt quite a bit of audio/video traffic. That being said, there's still quite the need for protocol improvement when making use of UDP, but blocking UDP as a whole is definitely not a resolution; it simply creates a wall that not only keeps the abusive traffic out, but also keeps legitimate traffic from flowing freely as it should.
"We had to burn down the village to save it."
On Mon, Feb 3, 2014 at 10:16 AM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Mon, Feb 3, 2014 at 12:42 PM, Peter Phaal <peter.phaal@gmail.com> wrote:
Why burn the village when only one house is the problem? I thought there might be some interest in hearing about work being done to use SDN to automatically configure filtering in existing switches and routers to mitigate flood attacks.
that's great... who's got sdn capable gear in deployments today? with code and OSS stuff to deal with random SDN pokery? and who has spare tcam/etc to deal with said pokery of 'block the attack-du-jour' ?
There's certainly the case that you could drop acls/something on equipment to selectively block the traffic that matters... I suspect in some cases the choice was: "50% of the edge box customers on this location are a problem, block it across the board here instead of X00 times" (see concern about tcam/etc problems)
I agree that managing limited TCAM space is critical to the scalability of any mitigation solution. However, tying up TCAM space on every edge device with filters to prevent each new threat is likely to be less scalable than a measurement-driven control that only takes a TCAM slot on a device when an active attack is detected transiting that device.
Real-time analytics based on measurements from switches/routers (sFlow/PSAMP/IPFIX) can identify large UDP flows and integrated hybrid OpenFlow, I2RS, REST, NETCONF APIs, etc. can be used to program the switches/routers to selectively filter traffic based on UDP port and IP source / destination. By deploying a DDoS mitigation SDN application, providers can use their existing infrastructure to protect their own and their customers networks from flood attacks, and generate additional revenue by delivering flood protection as a value added service.
yup, that sounds wondrous... and I'm sure that in the future utopian world (like 7-10 years from now, based on age-out of gear and OSS IT change requirements) we'll see more of this. I don't think you'll see much (in terms of edge ports on the network today) of this happening 'right now' though.
The current 10G upgrade cycle provides an opportunity to deploy equipment that is SDN capable. The functionality required for this use case is supported by current generation merchant silicon and is widely available right now in inexpensive switches.
Specifically looking at sFlow, large flood attacks can be detected within a second. The following article describes a simple example using integrated hybrid OpenFlow in a 10/40G ToR switch:
hopefully there's some clamp on how much change per device/port you plan too? :) I'd hate to see the RP/RE/etc get so busy programming tcam that bgp/isis/ospf/etc flaps :(
With integrated hybrid OpenFlow, there is very little activity on the OpenFlow control plane. The normal BGP, ECMP, LAG, etc. control planes handle forwarding of packets. OpenFlow is only used to selectively override specific FIB entries. I2RS provides a similar capability to selectively override RIB entries and implement controls. However, I don't know if any vendors are shipping I2RS capable routers today.

Typical networks probably only see a few DDoS attacks an hour at the most, so pushing a few rules an hour to mitigate them should have little impact on the switch control plane.

A good working definition of a large flow is 10% of a link's bandwidth. If you only trigger actions for large flows then in the worst case you would only require 10 rules per port to change how these flows are treated.
http://blog.sflow.com/2014/01/physical-switch-hybrid-openflow-example.html
The example can be modified to target NTP mon_getlist requests and responses using the following sFlow-RT flow definition:
{'ipdestination,udpsourceport',value:'ntppvtbytes',filter:'ntppvtreq=20,42'}
or to target DNS ANY requests:
{keys:'ipdestination,udpsourceport',value:'frames',filter:'dnsqr=true&dnsqtype=255'}
this also assumes almost 1:1 sampling... which might not be feasible either...otherwise you'll be seeing fairly lossy results, right?
Actually, to detect large flows (defined as 10% of link bandwidth) within a second, you would only require the following sampling rates:

1G link, sampling rate = 1-in-1,000 (large flow >= 100M bit/s)
10G link, sampling rate = 1-in-10,000 (large flow >= 1G bit/s)
40G link, sampling rate = 1-in-40,000 (large flow >= 4G bit/s)
100G link, sampling rate = 1-in-100,000 (large flow >= 10G bit/s)

These sampling rates are realistically achievable in production networks (enabling monitoring on all ports) and would allow you to detect the specific IP destination and UDP source port associated with a flood attack, and the switches in the attack path, within a second.
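The arithmetic behind those numbers can be checked in a few lines; the ~1,000-byte average packet size below is an assumption for the sketch, everything else comes from the figures above. It works out to roughly a dozen samples per second from a flow at 10% of line rate, which is plenty to flag it within a second.

    # Back-of-envelope check of the sampling rates listed above. The average
    # packet size is an assumption; the link speeds, sampling rates and the
    # 10%-of-bandwidth definition of a large flow come from the message.
    AVG_PKT_BYTES = 1000

    for link_gbps, sampling_rate in [(1, 1_000), (10, 10_000),
                                     (40, 40_000), (100, 100_000)]:
        large_flow_bps = 0.10 * link_gbps * 1e9           # 10% of the link
        flow_pps = large_flow_bps / (AVG_PKT_BYTES * 8)   # packets/second
        expected_samples = flow_pps / sampling_rate       # samples/second
        print(f"{link_gbps}G link, 1-in-{sampling_rate:,}: "
              f"~{expected_samples:.0f} samples/s from a large flow")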
The OpenFlow block control can be modified to selectively filter UDP traffic based on the identified UDP source port and destination IP address.
hopefully your OSS and netflow/sflow collection isn't also being used for traffic engineering/capacity planning purposes? else... you might get odd results from that infrastructure with such changes to the sflow/netflow sender platform.
This use case might be more problematic for NetFlow since obtaining the measurements may affect the router configuration (flow cache definitions) and other applications that depend on them (like capacity planning). In the case of sFlow monitoring, the flow cache is built externally and you can feed the sFlow to multiple independent analysis tools without risk of interference. http://blog.sflow.com/2013/05/software-defined-analytics.html
On Mon, Feb 3, 2014 at 2:42 PM, Peter Phaal <peter.phaal@gmail.com> wrote:
On Mon, Feb 3, 2014 at 10:16 AM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Mon, Feb 3, 2014 at 12:42 PM, Peter Phaal <peter.phaal@gmail.com> wrote:
There's certainly the case that you could drop acls/something on equipment to selectively block the traffic that matters... I suspect in some cases the choice was: "50% of the edge box customers on this location are a problem, block it across the board here instead of X00 times" (see concern about tcam/etc problems)
I agree that managing limited TCAM space is critical to the scalability of any mitigation solution. However, tying up TCAM space on every edge device with filters to prevent each new threat is likely
yup, there's a tradeoff, today it's being made one way, tomorrow perhaps a different way. My point was that today the percentage of sdn capable devices is small enough that you still need a decimal point to measure it. (I bet, based on total devices deployed) The percentage of oss backend work done to do what you want is likely smaller...

the folk in NZ-land (Citylink, reannz ... others - find josh baily / cardigan) are making some strides, but only in the exchange areas so far. fun stuff... but not the deployed gear as an L2/L3 device in TWC/Comcast/Verizon.
Real-time analytics based on measurements from switches/routers (sFlow/PSAMP/IPFIX) can identify large UDP flows and integrated hybrid OpenFlow, I2RS, REST, NETCONF APIs, etc. can be used to program the switches/routers to selectively filter traffic based on UDP port and IP source / destination. By deploying a DDoS mitigation SDN application, providers can use their existing infrastructure to protect their own and their customers networks from flood attacks, and generate additional revenue by delivering flood protection as a value added service.
yup, that sounds wondrous... and I'm sure that in the future utopian world (like 7-10 years from now, based on age-out of gear and OSS IT change requirements) we'll see more of this. I don't think you'll see much (in terms of edge ports on the network today) of this happening 'right now' though.
The current 10G upgrade cycle provides an opportunity to deploy
by 'current 10g upgrade cycle' you mean the one that happened 2-5 yrs ago? or somethign newer? did you mean 100G?
equipment that is SDN capable. The functionality required for this use case is supported by current generation merchant silicon and is widely available right now in inexpensive switches.
right... and everyone is removing their vendor supported gear and replacing it with pica8 boxes? The reality is that as speeds/feeds have increased over the last while basic operations techniques really haven't. Should they? maybe? will they? probably? is that going to happen on a dime? nope. Again, I suspect you'll see smaller deployments of sdn-like stuff 'soon' and larger deployments when people are more comfortable with the operations/failure modes that change.
Specifically looking at sFlow, large flood attacks can be detected within a second. The following article describes a simple example using integrated hybrid OpenFlow in a 10/40G ToR switch:
hopefully there's some clamp on how much change per device/port you plan too? :) I'd hate to see the RP/RE/etc get so busy programming tcam that bgp/isis/ospf/etc flaps :(
With integrated hybrid OpenFlow, there is very little activity on the OpenFlow control plane. The normal BGP, ECMP, LAG, etc. control planes handle forwarding of packets. OpenFlow is only used to selectively override specific FIB entries.
that didn't really answer the question :) if I have 10k customers behind the edge box and some of them NOW start being abused, then more later and that mix changes... if it changes a bunch because the attacker is really attackers. how fast do I change before I can't do normal ops anymore?
Typical networks probably only see a few DDoS attacks an hour at the most, so pushing a few rules an hour to mitigate them should have little impact on the switch control plane.
based on what math did you get 'few per hour?' As an endpoint (focal point) or as a contributor? The problem that started this discussion was being a contributor...which I bet happens a lot more often than /few an hour/.
A good working definition of a large flow is 10% of a link's bandwidth. If you only trigger actions for large flows then in the worst case you would only require 10 rules per port to change how these flows are treated.
10% of a 1g link is 100mbps. For contributors to ntp attacks, many of the contributors are sending ONLY 300x the input, so less than 100mbps. On a 10g link it's 1G... even more hidden.

This math and detection aren't HARD, but tuning it can be a bit challenging.
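To put rough numbers on that (the query rate below is an invented figure; the ~300x multiplier is the one quoted above): a single reflector can contribute tens of megabits per second while staying well under a 10%-of-1G "large flow" threshold.

    # Illustrative arithmetic only: a modest spoofed query stream times the
    # ~300x amplification mentioned above stays below 100 Mbit/s, i.e. below
    # a 10%-of-1G large-flow threshold on the contributing host's link.
    query_bps = 200_000           # 200 kbit/s of spoofed requests (assumed)
    amplification = 300           # figure taken from the message above
    reflected_bps = query_bps * amplification
    print(f"reflected traffic: ~{reflected_bps / 1e6:.0f} Mbit/s")  # ~60 Mbit/s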
http://blog.sflow.com/2014/01/physical-switch-hybrid-openflow-example.html
The example can be modified to target NTP mon_getlist requests and responses using the following sFlow-RT flow definition:
{'ipdestination,udpsourceport',value:'ntppvtbytes',filter:'ntppvtreq=20,42'}
or to target DNS ANY requests:
{keys:'ipdestination,udpsourceport',value:'frames',filter:'dnsqr=true&dnsqtype=255'}
this also assumes almost 1:1 sampling... which might not be feasible either...otherwise you'll be seeing fairly lossy results, right?
Actually, to detect large flows (defined as 10% of link bandwidth) within a second, you would only require the following sampling rates:
your example requires seeing the 1st packet in a cycle, and seeing into the first packet. that's going to require either acceptance of loss (and gathering the loss in another rule/fashion) or 1:1 sampling to be assured of getting ALL of the DNS packets and seeing what was queried. I wonder also about privacy concerns with this.
The OpenFlow block control can be modified to selectively filter UDP traffic based on the identified UDP source port and destination IP address.
hopefully your OSS and netflow/sflow collection isn't also being used for traffic engineering/capacity planning purposes? else... you might get odd results from that infrastructure with such changes to the sflow/netflow sender platform.
This use case might be more problematic for NetFlow since obtaining the measurements may affect the router configuration (flow cache definitions) and other applications that depend on them (like capacity planning). In the case of sFlow monitoring, the flow cache is built externally and you can feed the sFlow to multiple independent analysis tools without risk of interference.
http://blog.sflow.com/2013/05/software-defined-analytics.html
provided your device does sflow and can export to more than one destination, sure.
On Mon, Feb 3, 2014 at 12:38 PM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Mon, Feb 3, 2014 at 2:42 PM, Peter Phaal <peter.phaal@gmail.com> wrote:
On Mon, Feb 3, 2014 at 10:16 AM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Mon, Feb 3, 2014 at 12:42 PM, Peter Phaal <peter.phaal@gmail.com> wrote:
There's certainly the case that you could drop acls/something on equipment to selectively block the traffic that matters... I suspect in some cases the choice was: "50% of the edge box customers on this location are a problem, block it across the board here instead of X00 times" (see concern about tcam/etc problems)
I agree that managing limited TCAM space is critical to the scalability of any mitigation solution. However, tying up TCAM space on every edge device with filters to prevent each new threat is likely
yup, there's a tradeoff, today it's being made one way, tomorrow perhaps a different way. My point was that today the percentage of sdn capable devices is small enough that you still need a decimal point to measure it. (I bet, based on total devices deployed) The percentage of oss backend work done to do what you want is likely smaller...
the folk in NZ-land (Citylink, reannz ... others - find josh baily / cardigan) are making some strides, but only in the exchange areas so far. fun stuff... but not the deployed gear as an L2/L3 device in TWC/Comcast/Verizon.
I agree that today most networks aren't SDN ready, but there are inexpensive switches on the market that can perform these functions, and for providers that have them in their network, this is an option today. In some environments, it could also make sense to drop in a layer of switches to monitor and control traffic entering / exiting the network.
The current 10G upgrade cycle provides an opportunity to deploy
by 'current 10g upgrade cycle' you mean the one that happened 2-5 yrs ago? or somethign newer? did you mean 100G?
I was referring to the current upgrade cycle in data centers, with servers connected with 10G rather than 1G adapters. The high volumes are driving down the cost of 10/40/100G switches.
equipment that is SDN capable. The functionality required for this use case is supported by current generation merchant silicon and is widely available right now in inexpensive switches.
right... and everyone is removing their vendor supported gear and replacing it with pica8 boxes? The reality is that as speeds/feeds have increased over the last while basic operations techniques really haven't. Should they? maybe? will they? probably? is that going to happen on a dime? nope. Again, I suspect you'll see smaller deployments of sdn-like stuff 'soon' and larger deployments when people are more comfortable with the operations/failure modes that change.
Not just Pica8 - most vendors (branded or white box) are using the same Broadcom merchant silicon, including Cisco, Juniper, Arista, Dell/Force10, Extreme, etc.: http://blog.sflow.com/2014/01/drivers-for-growth.html
Specifically looking at sFlow, large flood attacks can be detected within a second. The following article describes a simple example using integrated hybrid OpenFlow in a 10/40G ToR switch:
hopefully there's some clamp on how much change per device/port you plan too? :) I'd hate to see the RP/RE/etc get so busy programming tcam that bgp/isis/ospf/etc flaps :(
With integrated hybrid OpenFlow, there is very little activity on the OpenFlow control plane. The normal BGP, ECMP, LAG, etc. control planes handle forwarding of packets. OpenFlow is only used to selectively override specific FIB entries.
that didn't really answer the question :) if I have 10k customers behind the edge box and some of them NOW start being abused, then more later and that mix changes... if it changes a bunch because the attacker is really attackers. how fast do I change before I can't do normal ops anymore?
Good point - the proposed solution is most effective for protecting customers that are targeted by DDoS attacks. While trying to prevent attackers entering the network is good citizenship, the value and effectiveness of the mitigation service increases as you get closer to the target of the attack. In this case there typically aren't very many targets, and so a single rule filtering on destination IP address and protocol would typically be effective (and less disruptive to the victim than null routing).
Typical networks probably only see a few DDoS attacks an hour at the most, so pushing a few rules an hour to mitigate them should have little impact on the switch control plane.
based on what math did you get 'few per hour?' As an endpoint (focal point) or as a contributor? The problem that started this discussion was being a contributor...which I bet happens a lot more often than /few an hour/.
I am sorry, I should have been clearer, the SDN solution I was describing is aimed at protecting the target's links, rather than mitigating the botnet and amplification layers. The number of attacks was from the perspective of DDoS targets and their service providers. If you are considering each participant in the attack the number goes up considerably.
A good working definition of a large flow is 10% of a link's bandwidth. If you only trigger actions for large flows then in the worst case you would only require 10 rules per port to change how these flows are treated.
10% of a 1g link is 100mbps. For contributors to ntp attacks, many of the contributors are sending ONLY 300x the input, so less than 100mbps. On a 10g link it's 1G... even more hidden.
This math and detection aren't HARD, but tuning it can be a bit challenging.
Agreed - the technique is less effective for addressing the contributors to the attack. RPF and other edge controls should be applied, but until everyone participates and eliminates attacks at source, there is still a value in filtering close to the target of the attack.
http://blog.sflow.com/2014/01/physical-switch-hybrid-openflow-example.html
The example can be modified to target NTP mon_getlist requests and responses using the following sFlow-RT flow definition:
{'ipdestination,udpsourceport',value:'ntppvtbytes',filter:'ntppvtreq=20,42'}
or to target DNS ANY requests:
{keys:'ipdestination,udpsourceport',value:'frames',filter:'dnsqr=true&dnsqtype=255'}
this also assumes almost 1:1 sampling... which might not be feasible either...otherwise you'll be seeing fairly lossy results, right?
Actually, to detect large flows (defined as 10% of link bandwidth) within a second, you would only require the following sampling rates:
your example requires seeing the 1st packet in a cycle, and seeing into the first packet. that's going to require either acceptance of loss (and gathering the loss in another rule/fashion) or 1:1 sampling to be assured of getting ALL of the DNS packets and seeing what was queried.
The flow analysis is stateless - based on a random sample of 1 in N packets, you can decode the packet headers and determine the amount of traffic associated with specific DNS queries. If you are looking at the traffic close to the target, there may be hundreds of thousands of DNS responses per second and so you very quickly determine the target IP address and can apply a filter to remove DNS traffic to that target.
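The scaling step being described is simple enough to show directly; the sample count, interval, and sampling rate below are made-up numbers for a 1-in-10,000 sampled 10G link.

    # Sketch of the stateless scaling described above: multiply the sampled
    # count by the sampling rate to estimate the underlying rate. All of the
    # numbers are made up for the example.
    sampling_rate = 10_000        # 1-in-10,000, e.g. on a 10G link
    interval_s = 1.0              # measurement interval in seconds
    sampled_dns_responses = 35    # samples decoded to one destination IP

    estimated_rate = sampled_dns_responses * sampling_rate / interval_s
    print(f"~{estimated_rate:,.0f} DNS responses/s to that target")  # ~350,000/s

Even with heavy loss from sampling, the estimate is more than enough to pick out the target IP address and the reflected source port within a second.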
provided your device does sflow and can export to more than one destination, sure.
This brings up an interesting use case for an OpenFlow capable switch - replicating sFlow, NetFlow, IPFIX, Syslog, SNMP traps, etc. Many top of rack switches can also forward the traffic through a GRE/VxLAN tunnel as well.

http://blog.sflow.com/2013/11/udp-packet-replication-using-open.html
wait, so the whole of the thread is about stopping participants in the attack, and you're suggesting that removing/changing end-system switch/routing gear and doing something more complex than:

  deny udp any 123 any
  deny udp any 123 any 123
  permit ip any any

is a good plan?

I'd direct you at: <https://www.nanog.org/resources/tutorials>

and particularly at: "Tutorial: ISP Security - Real World Techniques II"
<https://www.nanog.org/meetings/nanog23/presentations/greene.pdf>

On Mon, Feb 3, 2014 at 5:16 PM, Peter Phaal <peter.phaal@gmail.com> wrote:
On Mon, Feb 3, 2014 at 12:38 PM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Mon, Feb 3, 2014 at 2:42 PM, Peter Phaal <peter.phaal@gmail.com> wrote:
On Mon, Feb 3, 2014 at 10:16 AM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Mon, Feb 3, 2014 at 12:42 PM, Peter Phaal <peter.phaal@gmail.com> wrote:
There's certainly the case that you could drop acls/something on equipment to selectively block the traffic that matters... I suspect in some cases the choice was: "50% of the edge box customers on this location are a problem, block it across the board here instead of X00 times" (see concern about tcam/etc problems)
I agree that managing limited TCAM space is critical to the scalability of any mitigation solution. However, tying up TCAM space on every edge device with filters to prevent each new threat is likely
yup, there's a tradeoff, today it's being made one way, tomorrow perhaps a different way. My point was that today the percentage of sdn capable devices is small enough that you still need a decimal point to measure it. (I bet, based on total devices deployed) The percentage of oss backend work done to do what you want is likely smaller...
the folk in NZ-land (Citylink, reannz ... others - find josh baily / cardigan) are making some strides, but only in the exchange areas so far. fun stuff... but not the deployed gear as an L2/L3 device in TWC/Comcast/Verizon.
I agree that today most networks aren't SDN ready, but there are inexpensive switches on the market that can perform these functions, and for providers that have them in their network, this is an option today. In some environments, it could also make sense to drop in a layer of switches to monitor and control traffic entering / exiting the network.
it's probably not a good plan to forklift your edge, for dos targets where all you really need is a 3 line acl.
The current 10G upgrade cycle provides an opportunity to deploy
by 'current 10g upgrade cycle' you mean the one that happened 2-5 yrs ago? or somethign newer? did you mean 100G?
I was referring to the current upgrade cycle in data centers, with servers connected with 10G rather than 1G adapters. The high volumes are driving down the cost of 10/40/100G switches.
again, lots of cost and churn for 3 lines of acl... I'm not sold.
With integrated hybrid OpenFlow, there is very little activity on the OpenFlow control plane. The normal BGP, ECMP, LAG, etc. control planes handle forwarding of packets. OpenFlow is only used to selectively override specific FIB entries.
that didn't really answer the question :) if I have 10k customers behind the edge box and some of them NOW start being abused, then more later and that mix changes... if it changes a bunch because the attacker is really attackers. how fast do I change before I can't do normal ops anymore?
Good point - the proposed solution is most effective for protecting customers that are targeted by DDoS attacks. While trying to prevent
Oh, so the 3 line acl is not an option? or (for a lot of customers a fine answer) null route? Some things have changed in the world of dos mitigation, but a bunch of the basics still apply. I do know that in the unfortunate event that your network is the transit or terminus of a dos attack at high volume you want to do the least configuration that'll satisfy the 2 parties involved (you and your customer)... doing a bunch of hardware replacement and/or sdn things when you can get the job done with some acls or routing changes is really going to be risky.
attackers entering the network is good citizenship, the value and effectiveness of the mitigation service increases as you get closer to the target of the attack. In this case there typically aren't very many targets, and so a single rule filtering on destination IP address and protocol would typically be effective (and less disruptive to the victim than null routing).
Typical networks probably only see a few DDoS attacks an hour at the most, so pushing a few rules an hour to mitigate them should have little impact on the switch control plane.
based on what math did you get 'few per hour?' As an endpoint (focal point) or as a contributor? The problem that started this discussion was being a contributor...which I bet happens a lot more often than /few an hour/.
I am sorry, I should have been clearer, the SDN solution I was describing is aimed at protecting the target's links, rather than mitigating the botnet and amplification layers.
and i'd say that today sdn is out of reach for most deployments, and that the simplest answer is already available.
The number of attacks was from the perspective of DDoS targets and their service providers. If you are considering each participant in the attack the number goes up considerably.
I bet roland has some good round-numbers on number of dos attacks per day... I bet it's higher than a few per hour globally, for the ones that get noticed.
A good working definition of a large flow is 10% of a link's bandwidth. If you only trigger actions for large flows then in the worst case you would only require 10 rules per port to change how these flows are treated.
10% of a 1g link is 100mbps. For contributors to ntp attacks, many of the contributors are sending ONLY 300x the input, so less than 100mbps. On a 10g link it's 1G... even more hidden.
This math and detection aren't HARD, but tuning it can be a bit challenging.
Agreed - the technique is less effective for addressing the contributors to the attack. RPF and other edge controls should be
note that the focus of the original thread was on the contributors. I think the target part of the problem has been solved since before the slides in the pdf link at the top...
applied, but until everyone participates and eliminates attacks at source, there is still a value in filtering close to the target of the attack.
http://blog.sflow.com/2014/01/physical-switch-hybrid-openflow-example.html
The example can be modified to target NTP mon_getlist requests and responses using the following sFlow-RT flow definition:
{'ipdestination,udpsourceport',value:'ntppvtbytes',filter:'ntppvtreq=20,42'}
or to target DNS ANY requests:
{keys:'ipdestination,udpsourceport',value:'frames',filter:'dnsqr=true&dnsqtype=255'}
this also assumes almost 1:1 sampling... which might not be feasible either...otherwise you'll be seeing fairly lossy results, right?
Actually, to detect large flows (defined as 10% of link bandwidth) within a second, you would only require the following sampling rates:
your example requires seeing the 1st packet in a cycle, and seeing into the first packet. that's going to require either acceptance of loss (and gathering the loss in another rule/fashion) or 1:1 sampling to be assured of getting ALL of the DNS packets and seeing what was queried.
The flow analysis is stateless - based on a random sample of 1 in N packets, you can decode the packet headers and determine the amount of traffic associated with specific DNS queries. If you are looking at
you're getting pretty complicated for the target side:

  ip access-list 150 permit ip any any log

(note this is basically taken verbatim from the slides)

view logs, see the overwhelming majority are to hostX port Y proto Z... filter, done. you can do that in about 5 mins time, quicker if you care to rush a bit.
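For what it's worth, the "view logs" step really is only a few lines; here is an illustrative sketch, where the log line format is an approximation of IOS access-list logging and is an assumption for the example, not something taken from the thread.

    # Sketch of the "view logs, find the overwhelming majority" step above.
    # The log line format (e.g. "... -> 198.51.100.5(123), 1 packet") is an
    # approximation of IOS ACL logging and is an assumption for the example.
    import re
    from collections import Counter

    LOG_RE = re.compile(r"->\s+(?P<dst>\d+\.\d+\.\d+\.\d+)\((?P<port>\d+)\)")

    def top_targets(log_lines, n=5):
        counts = Counter()
        for line in log_lines:
            m = LOG_RE.search(line)
            if m:
                counts[(m.group("dst"), m.group("port"))] += 1
        return counts.most_common(n)

    # e.g. top_targets(open("acl150.log")) might return something like
    # [(("198.51.100.5", "123"), 91234), ...] -- the host and port to filter.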
the traffic close to the target, there may be hundreds of thousands of DNS responses per second and so you very quickly determine the target IP address and can apply a filter to remove DNS traffic to that target.
provided your device does sflow and can export to more than one destination, sure.
This brings up an interesting use case for an OpenFlow capable switch - replicating sFlow, NetFlow, IPFIX, Syslog, SNMP traps, etc. Many top of rack switches can also forward the traffic through a GRE/VxLAN tunnel as well.
yes, more complexity seems like a great plan... in the words of someone else: "I encourage my competitors to do this"

I think roland's other point that not very many people actually even use sflow is not to be taken lightly here either.

-chris
http://blog.sflow.com/2013/11/udp-packet-replication-using-open.html
Domain Name: SFLOW.COM
<snip>
Registry Registrant ID:
Registrant Name: PHAAL, PETER
Registrant Organization: InMon Corp.
<snip>
On 4 Feb 2014, at 9:28 am, Christopher Morrow <morrowc.lists@gmail.com> wrote:
wait, so the whole of the thread is about stopping participants in the attack, and you're suggesting that removing/changing end-system switch/routing gear and doing something more complex than:

  deny udp any 123 any
  deny udp any 123 any 123
  permit ip any any
Which just pushes NTP to some other port, making control harder. We've already pushed all 'interesting' traffic to port 80 on TCP, which has made traffic control very expensive. Let's not repeat that history.

--
Glen Turner <http://www.gdt.id.au/~gdt/>
On Mon, Feb 3, 2014 at 7:40 PM, Glen Turner <gdt@gdt.id.au> wrote:
On 4 Feb 2014, at 9:28 am, Christopher Morrow <morrowc.lists@gmail.com> wrote:
wait, so the whole of the thread is about stopping participants in the attack, and you're suggesting that removing/changing end-system switch/routing gear and doing something more complex than:

  deny udp any 123 any
  deny udp any 123 any 123
  permit ip any any
Which just pushes NTP to some other port, making control harder. We've already pushed all 'interesting' traffic to port 80 on TCP, which has made traffic control very expensive. Let's not repeat that history.
I think in the case of 'oh crap, customer is getting 100gbps of ntp...' the above (a third party notes that the 2nd line is redundant) is a fine answer, till the flood abates. I wouldn't recommend wholesale blocking of anything across an ISP edge, but for the specific case paul was getting at: "ntp reflection attack target is your customer" ... it's going to solve the problem.
----- Original Message -----
From: "Glen Turner" <gdt@gdt.id.au>
On 4 Feb 2014, at 9:28 am, Christopher Morrow <morrowc.lists@gmail.com> wrote:
wait, so the whole of the thread is about stopping participants in the attack, and you're suggesting that removing/changing end-system switch/routing gear and doing something more complex than:

  deny udp any 123 any
  deny udp any 123 any 123
  permit ip any any
Which just pushes NTP to some other port, making control harder. We've already pushed all 'interesting' traffic to port 80 on TCP, which has made traffic control very expensive. Let's not repeat that history.
"Those who do not understand the Internet are condemned to reinvent it. Poorly." -- after henry@utzoo, though he was talking about Unix, and I am generally looking at Tapatalk and talking about Usenet. Cheers, -- jra -- Jay R. Ashworth Baylink jra@baylink.com Designer The Things I Think RFC 2100 Ashworth & Associates http://www.bcp38.info 2000 Land Rover DII St Petersburg FL USA BCP38: Ask For It By Name! +1 727 647 1274
On Mon, Feb 3, 2014 at 2:58 PM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
wait, so the whole of the thread is about stopping participants in the attack, and you're suggesting that removing/changing end-system switch/routing gear and doing something more complex than:

  deny udp any 123 any
  deny udp any 123 any 123
  permit ip any any
is a good plan?
I'd direct you at: <https://www.nanog.org/resources/tutorials>
and particularly at: "Tutorial: ISP Security - Real World Techniques II" <https://www.nanog.org/meetings/nanog23/presentations/greene.pdf>
Thanks for the links. Many SDN solutions can be replicated using manual processes (or are ways of automating currently manual processes). Programmatic APIs allow the speed and accuracy of the response to be increased and the solution to be delivered at scale and at lower cost.
it's probably not a good plan to forklift your edge, for dos targets where all you really need is a 3 line acl.
For many networks it doesn't need to be a forklift upgrade - vendors are adding programmatic APIs to their existing products (OpenFlow, Arista eAPI, NETCONF, ALU Web Services ...) - so a firmware upgrade may be all that is required.

I do think that there are operational advantages to using protocols like OpenFlow, I2RS, BGP FlowSpec for these soft controls since they allow the configuration to remain relatively static and they avoid problems of split control (for example, an operator makes a config change and saves, locking in a temporary control from the SDN system).

I would argue that the more specific the ACL can be the less collateral damage. Built-in measurement allows for a more targeted response.
Good point - the proposed solution is most effective for protecting customers that are targeted by DDoS attacks. While trying to prevent
Oh, so the 3 line acl is not an option? or (for a lot of customers a fine answer) null route? Some things have changed in the world of dos mitigation, but a bunch of the basics still apply. I do know that in the unfortunate event that your network is the transit or terminus of a dos attack at high volume you want to do the least configuration that'll satisfy the 2 parties involved (you and your customer)... doing a bunch of hardware replacement and/or sdn things when you can get the job done with some acls or routing changes is really going to be risky.
I think an automatic system using a programmatic API to install as narrowly scoped a filter as possible is the most conservative and least risky option. Manual processes are error prone, slow, and blunt instruments like a null route can cause collateral damage to services.
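A skeleton of the kind of automated, narrowly scoped response being argued for here is sketched below. Nothing in it is a real product API: get_next_event() and push_filter() are hypothetical placeholders for whatever measurement feed and programmatic control (OpenFlow, eAPI, NETCONF, BGP FlowSpec, ...) happens to be available.

    # Skeleton only: turn a detection event into the narrowest filter that
    # covers the attack (one destination /32, UDP, one reflected source port),
    # rather than a null route. get_next_event() and push_filter() are
    # hypothetical placeholders, not any particular vendor's API.
    from dataclasses import dataclass

    @dataclass
    class DropRule:
        dst_ip: str               # the attack target (a single /32)
        ip_protocol: int = 17     # UDP
        udp_src_port: int = 123   # e.g. NTP reflection traffic

    def mitigation_loop(get_next_event, push_filter):
        while True:
            event = get_next_event()      # e.g. from sFlow/IPFIX analytics
            rule = DropRule(dst_ip=event["target_ip"],
                            udp_src_port=event["udp_src_port"])
            # Install with a TTL so the control is removed automatically once
            # the flood abates and nobody has to remember to clean it up.
            push_filter(rule, ttl_seconds=600)

The design point is the narrowness of the match and the automatic expiry: the filter touches only the victim's inbound reflection traffic and removes itself, which is the contrast being drawn with a manually installed null route.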
Typical networks probably only see a few DDoS attacks an hour at the most, so pushing a few rules an hour to mitigate them should have little impact on the switch control plane.
based on what math did you get 'few per hour?' As an endpoint (focal point) or as a contributor? The problem that started this discussion was being a contributor...which I bet happens a lot more often than /few an hour/.
I am sorry, I should have been clearer, the SDN solution I was describing is aimed at protecting the target's links, rather than mitigating the botnet and amplification layers.
and i'd say that today sdn is out of reach for most deployments, and that the simplest answer is already available.
The number of attacks was from the perspective of DDoS targets and their service providers. If you are considering each participant in the attack the number goes up considerably.
I bet roland has some good round-numbers on number of dos attacks per day... I bet it's higher than a few per hour globally, for the ones that get noticed.
The "few per hour" number isn't a global statistic. This is the number that a large hosting data center might experience. The global number is much larger, but not very relevant to a specific provider looking to size a mitigation solution.
note that the focus of the original thread was on the contributors. I think the target part of the problem has been solved since before the slides in the pdf link at the top...
Do most service providers allow their customers to control ACLs in the upstream routers? Do they automatically monitor traffic and insert the filters themselves when there is an attack? I don't believe so - while the slides describe a solution, automation is needed to make it available at large scale.
you're getting pretty complicated for the target side: ip access-list 150 permit ip any any log
(note this is basically taken verbatim from the slides)
view logs, see the overwhelming majority are to hostX port Y proto Z... filter, done. you can do that in about 5 mins time, quicker if you care to rush a bit.
An automated system can perform the analysis and apply the filter in a second with no human intervention. What if you have to manage thousands of customer links?
This brings up an interesting use case for an OpenFlow capable switch - replicating sFlow, NetFlow, IPFIX, Syslog, SNMP traps, etc. Many top of rack switches can also forward the traffic through a GRE/VxLAN tunnel as well.
yes, more complexity seems like a great plan... in the words of someone else: "I encourage my competitors to do this"
Using the existing switches to replicate and tap production traffic is less complex and more scalable than alternatives. You may find the following use case interesting: http://blog.sflow.com/2013/04/sdn-packet-broker.html
I think roland's other point that not very many people actually even use sflow is not to be taken lightly here either.
It doesn't have to be sFlow - the sFlow solution was provided as a concrete example since that is the technology I am most familiar with. However, sFlow, IPFIX, NetFlow, jFlow, etc. combined with analytics and a programmatic control API allow DDoS mitigation to be automated. I think Roland would agree that an automated response is more effective than a manual process.
-larry directly since I'm sure he's either tired of this, or already reading it via the nanog subscription.

On Mon, Feb 3, 2014 at 7:54 PM, Peter Phaal <peter.phaal@gmail.com> wrote:
On Mon, Feb 3, 2014 at 2:58 PM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
wait, so the whole of the thread is about stopping participants in the attack, and you're suggesting that removing/changing end-system switch/routing gear and doing something more complex than:

  deny udp any 123 any
  deny udp any 123 any 123
  permit ip any any
is a good plan?
I'd direct you at: <https://www.nanog.org/resources/tutorials>
and particularly at: "Tutorial: ISP Security - Real World Techniques II" <https://www.nanog.org/meetings/nanog23/presentations/greene.pdf>
Thanks for the links. Many SDN solutions can be replicated using
you're sort of a broken record on this bit ... I don't think folk are (me in particular) knocking sdn things, in general. In the specific though:

1) you missed the point originally, stop marketing your blog pls.
2) you missed the point(s) about availability and realistic deployment of solutions in the near term
manual processes (or are ways of automating currently manual processes). Programmatic APIs allow the speed and accuracy of the response to be increased and the solution to be delivered at scale and at lower cost.
and all of these require very strict and very careful deployment of oss measures to watch over current state and intended state. They also require very careful training and troubleshooting steps for the ops folk running the systems. None of this is deployable 'tomorrow' (in under 24hrs) safely, and most likely it'll be a bit more time until there is ubiquitous deployment of sdn-like functionality in larger scale networks. not that I'm not a fan, and not that I don't like me some automation, but... having seen automation go very wrong (l3's acl spider... crushes l3..., flowspec 'whoopsie' at cloudflare and TWTC... there are lots of other examples).
it's probably not a good plan to forklift your edge, for dos targets where all you really need is a 3 line acl.
For many networks it doesn't need to be a forklift upgrade - vendors are adding programmatic APIs to their existing products (OpenFlow, Arista eAPI, NETCONF, ALU Web Services ...) - so a firmware upgrade may be
arista is deployed in which large scale networks with api/sdn functionality? they're a great bunch of folks, they make some nice gear, it's still getting baked though, and it's not displacing (today) existing gear that's still being depreciated. for anything to be workable in the near-term, the above examples just aren't going to work. note my many references to "5-7 yrs when depreciation cycles and next-replacement happens"
all that is required.
I do think that there are operational advantages to using protocols like OpenFlow, I2RS, BGP FlowSpec for these soft controls since they allow the configuration to remain relatively static and they avoid problems of split control (for example, an operator makes a config change and saves, locking in a temporary control from the SDN system).
automation, with protections, safety checks, assurances that the process won't break things in odd failure modes.. not to mention bug^H^H^Hfeature issues with gear, we're still a bit from large scale deployment.
I would argue that the more specific the ACL can be the less collateral damage. Built-in measurement allows for a more targeted response.
sure, I think roland and I at least have been saying the same thing.
Good point - the proposed solution is most effective for protecting customers that are targeted by DDoS attacks. While trying to prevent
Oh, so the 3 line acl is not an option? or (for a lot of customers a fine answer) null route? Some things have changed in the world of dos mitigation, but a bunch of the basics still apply. I do know that in the unfortunate event that your network is the transit or terminus of a dos attack at high volume you want to do the least configuration that'll satisfy the 2 parties involved (you and your customer)... doing a bunch of hardware replacement and/or sdn things when you can get the job done with some acls or routing changes is really going to be risky.
I think an automatic system using a programmatic API to install as narrowly scoped a filter as possible is the most conservative and least risky option. Manual processes are error prone, slow, and blunt instruments like a null route can cause collateral damage to services.
folk say this, but the customer very often explicitly asks for null routes. The thing being targeted is very often not a 'revenue generating ecommerce site', and for providers where the default answer is 'everything is a null route', their customers ought to find a provider that thinks differently.
Typical networks probably only see a few DDoS attacks an hour at the most, so pushing a few rules an hour to mitigate them should have little impact on the switch control plane.
based on what math did you get 'few per hour?' As an endpoint (focal point) or as a contributor? The problem that started this discussion was being a contributor...which I bet happens a lot more often than /few an hour/.
I am sorry, I should have been clearer, the SDN solution I was describing is aimed at protecting the target's links, rather than mitigating the botnet and amplification layers.
and i'd say that today sdn is out of reach for most deployments, and that the simplest answer is already available.
The number of attacks was from the perspective of DDoS targets and their service providers. If you are considering each participant in the attack the number goes up considerably.
I bet roland has some good round-numbers on number of dos attacks per day... I bet it's higher than a few per hour globally, for the ones that get noticed.
The "few per hour" number isn't a global statistic. This is the number that a large hosting data center might experience. The global number
I wonder how many rackspace, softlayer, amazon-aws, xs4all, hetzner, etc. experience per hour. in any case, 'often' is probably close enough.
is much larger, but not very relevant to a specific provider looking to size a mitigation solution.
note that the focus of the original thread was on the contributors. I think the target part of the problem has been solved since before the slides in the pdf link at the top...
Do most service providers allow their customers to control ACLs in the upstream routers? Do they automatically monitor traffic and insert the
nope, and I don't necessarily think that changes with SDN... letting your customer traffic-engineer is ... dangerous. it tosses capacity planning concerns out the window :( There are, however, several providers that let their customers initiate smart/intelligent mitigation solutions. I know of 3 that let the customer trigger based on a BGP community. A customer can choose how they want to 'detect' and then simply send a bgp update for mitigation (sketched a few messages below)... I bet there are folk that don't own networks that provide this service as well... I'm sure roland has some work stories he's presented on about this very thing.
filters themselves when there is an attack? I don't believe so - while
some providers do, based upon customer demand for the service. it's not really that hard, though it is a cost for the provider, so that cost is shared with the customers using the solution(s).
the slides describe a solution; automation is needed to make it available at large scale.
automation isn't precluded from the solution space in the slides; note that they were presented and created in ~2002... so the state of the art has changed a bit since then, but the methodology and practices from 2002 can be applied fairly directly today.
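Picking up the community-triggered mitigation mentioned a few messages up, here is a minimal sketch of how a customer-side script might ask its upstream to act on a single attacked /32, again via ExaBGP's process API. The next-hop, the RFC 7999 blackhole community (65535:666), and the provider actually honoring it are assumptions; real providers publish their own community schemes and many offer scrubbing rather than blackholing.

#!/usr/bin/env python
# Sketch: customer-triggered mitigation by announcing the attacked /32
# with a blackhole (or provider mitigation) community over an eBGP session.
# Community, next-hop, and upstream behavior are assumptions here.
import sys
import time

BLACKHOLE_COMMUNITY = "65535:666"  # RFC 7999 BLACKHOLE, if honored upstream
NEXT_HOP = "192.0.2.1"             # placeholder next-hop for the announcement

def request_mitigation(victim_ip):
    cmd = ("announce route %s/32 next-hop %s community [%s]"
           % (victim_ip, NEXT_HOP, BLACKHOLE_COMMUNITY))
    sys.stdout.write(cmd + "\n")
    sys.stdout.flush()

if __name__ == "__main__":
    request_mitigation("203.0.113.10")  # hypothetical victim address
    while True:
        time.sleep(60)  # keep the process (and the announcement) up

The customer keeps full control of detection; the provider only has to match the community on the session and map it to a discard or scrubbing next-hop.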
you're getting pretty complicated for the target side: access-list 150 permit ip any any log
(note this is basically taken verbatim from the slides)
view logs, see the overwhelming majority are to hostX port Y proto Z... filter, done. you can do that in about 5 mins time, quicker if you care to rush a bit.
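If someone did want to script that counting step, it is just a tally of destination/port/protocol tuples from the ACL log. A rough sketch, assuming IOS-style "%SEC-6-IPACCESSLOGP: list 150 permitted udp a.b.c.d(123) -> w.x.y.z(80), N packets" syslog lines; the field layout varies by platform, so the regex here is illustrative.

#!/usr/bin/env python
# Sketch: tally destination/port/protocol from ACL log lines to find the
# flow that dominates an attack.  Usage: python top_flows.py < acl.log
import re
import sys
from collections import Counter

LOG_RE = re.compile(
    r"permitted (?P<proto>\w+) "
    r"(?P<src>[\d.]+)\((?P<sport>\d+)\) -> "
    r"(?P<dst>[\d.]+)\((?P<dport>\d+)\)"
)

def top_flows(lines, n=5):
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            counts[(m.group("dst"), m.group("dport"), m.group("proto"))] += 1
    return counts.most_common(n)

if __name__ == "__main__":
    for (dst, dport, proto), hits in top_flows(sys.stdin):
        print("%8d  %s  %s/%s" % (hits, proto, dst, dport))

The output is the "hostX port Y proto Z" answer described above, ready to be turned into a specific deny entry.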
An automated system can perform the analysis and apply the filter in a second with no human intervention. What if you have to manage thousands of customer links?
been there, done that... got several tshirts. it's honestly not that bad.
This brings up an interesting use case for an OpenFlow-capable switch - replicating sFlow, NetFlow, IPFIX, syslog, SNMP traps, etc. Many top-of-rack switches can also forward the traffic through a GRE/VxLAN tunnel.
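As a toy illustration of the replication idea (in software rather than in the switch): a UDP fan-out that receives telemetry datagrams on one port and copies them to several collectors. The addresses and ports are placeholders; in practice the switch or a packet broker does this at line rate and can preserve the original source address, which this sketch does not.

#!/usr/bin/env python
# Toy sketch: receive sFlow/NetFlow/syslog datagrams on one UDP port and
# copy each one to several collectors.  Addresses/ports are placeholders.
import socket

LISTEN = ("0.0.0.0", 6343)               # e.g. sFlow's default port
COLLECTORS = [("192.0.2.10", 6343),      # hypothetical collector A
              ("192.0.2.20", 6343)]      # hypothetical collector B

def replicate():
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(LISTEN)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        datagram, _src = rx.recvfrom(65535)
        for collector in COLLECTORS:
            tx.sendto(datagram, collector)

if __name__ == "__main__":
    replicate()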
yes, more complexity seems like a great plan... in the words of someone else: "I encourage my competitors to do this"
Using the existing switches to replicate and tap production traffic is less complex and more scalable than alternatives. You may find the following use case interesting:
http://blog.sflow.com/2013/04/sdn-packet-broker.html
I think roland's other point that not very many people actually even use sflow is not to be taken lightly here either.
It doesn't have to be sFlow - the sFlow solution was provided as a concrete example since that is the technology I am most familiar with.
and which, according to a credible source, is by and large not deployed by service providers. certainly in some IDC situations sflow is interesting, but according to someone who I believe is in a position to know, it's not there for isp situations. leaving that aside though, some signal of 'what the traffic looks like' is available if deployed. not everyone does... some don't because 'meh!', some don't because 'not in the featureset bought', some don't because '<other silly reason>'. folk that don't have it generally can't just crank it up 'now' though.
However, sFlow, IPFIX, NetFlow, jFlow etc. combined with analytics and a programmatic control API allows DDoS mitigation to be automated. I
right, arbor sells this, as one example. (there are others of course) there are several large US isp's that use that solution (or an offspring of that) today. it's not quite sdn, but it is automated and relatively fire/forget. -chris
Why not just provide a public API that lets users specify which of your customers they want to null route? It would save operators the trouble of having to detect the flows... and you can sell premium access that allows the API user to null route all your other customers at once. Once everyone implements these awesome flow detectors, it will just take short bursts of flooding to DoS their customers. If you can detect them in less than a second, it might not even show up on any interface graphs. I think this is already the case at a lot of VPS and hosting providers, since they're such popular sources as well as targets. I don't know what, if anything, is the answer to these problems, but building complex auto-filtering contraptions is not it. Filtering NTP or UDP or any other specific application will just break things more, which is the goal of a 'denial of service' attack. Eventually everything will just be stuffed into TCP port 80 packets and the arms race will continue. The recent abuse of NTP is unfortunate, but it will get fixed. I just wonder if UDP will have to be tunneled inside HTTP by then. Laszlo
On Tue, Feb 4, 2014 at 1:45 PM, Laszlo Hanyecz <laszlo@heliacal.net> wrote:
Why not just provide a public API that lets users specify which of your customers they want to null route?
They're spoofed packets. There's no way for anyone outside your AS to know which of your customers the packets came from. It's not particularly easy to trace inside your AS either. Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
On Tue, Feb 4, 2014 at 1:52 PM, William Herrin <bill@herrin.us> wrote:
On Tue, Feb 4, 2014 at 1:45 PM, Laszlo Hanyecz <laszlo@heliacal.net> wrote:
Why not just provide a public API that lets users specify which of your customers they want to null route?
They're spoofed packets. There's no way for anyone outside your AS to know which of your customers the packets came from. It's not particularly easy to trace inside your AS either.
wasn't laszlo joking and sort of making a point about sdn/api/etc usage by customers willy-nilly in your network? (which was sort of my point about customers influencing TE in your network as well)
I was joking. I meant that the operator provides an API for attackers, so they can accomplish their goal of taking the customer offline without having to spoof or flood or whatever else. Automatically installing ACLs in response to observed flows accomplishes almost the same thing. As a concrete example, say a customer is running a game server that uses UDP port 12345. An attacker sends a large flow to customer:12345 and your switches and routers all start filtering anything with destination customer:12345, for say 2 hours. Then the attacker can just repeat in 2 hours and send only a few seconds' worth of flooding each time. On Feb 4, 2014, at 6:52 PM, William Herrin <bill@herrin.us> wrote:
On Tue, Feb 4, 2014 at 1:45 PM, Laszlo Hanyecz <laszlo@heliacal.net> wrote:
Why not just provide a public API that lets users specify which of your customers they want to null route?
They're spoofed packets. There's no way for anyone outside your AS to know which of your customers the packets came from. It's not particularly easy to trace inside your AS either.
On Tue, Feb 4, 2014 at 2:01 PM, Laszlo Hanyecz <laszlo@heliacal.net> wrote:
I was joking,
And I was being a tad obtuse. My apologies. Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
On Feb 4, 2014, at 12:42 AM, Peter Phaal <peter.phaal@gmail.com> wrote:
Real-time analytics based on measurements from switches/routers (sFlow/PSAMP/IPFIX) can identify large UDP flows, and integrated hybrid OpenFlow, I2RS, REST, NETCONF APIs, etc. can be used to program the switches/routers to selectively filter traffic based on UDP port and IP source/destination. By deploying a DDoS mitigation SDN application, providers can use their existing infrastructure to protect their own and their customers' networks from flood attacks, and generate additional revenue by delivering flood protection as a value-added service.
This is certainly a general capability set towards which many operators are evolving (and it's always amusing how you leave out NetFlow, which many operators use, but include sFlow, which very few operators use, heh), but it's going to be quite some time before this sort of thing is practical and widely deployable. Believe me, I've been working towards this vision for many years. It isn't going to happen overnight.
Specifically looking at sFlow, large flood attacks can be detected within a second.
And with NetFlow, and with IPFIX - the first of which is widely deployed today, and the second of which will be widely deployed in the future. ----------------------------------------------------------------------- Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com> Luck is the residue of opportunity and design. -- John Milton
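To make the automation discussion above concrete, here is a minimal sketch of the detect-then-filter loop being described, with both halves stubbed out: get_large_udp_flows() stands in for whatever telemetry feed is available (sFlow, NetFlow, or IPFIX based analytics), and install_drop() stands in for the control API (FlowSpec, OpenFlow, NETCONF, REST). The threshold, poll interval, and function names are all assumptions, not taken from any product in the thread.

#!/usr/bin/env python
# Sketch of the detect-then-filter loop: poll a flow analytics feed for
# large UDP flows and install a narrowly scoped drop rule for each one.
# Both functions below are stand-ins for real telemetry and control APIs.
import time

THRESHOLD_BPS = 1000000000  # flag flows above ~1 Gbit/s (arbitrary)
POLL_SECONDS = 1            # aiming at the "within a second" detection target

def get_large_udp_flows(threshold_bps):
    """Stub: return [(dst_ip, dst_port, src_port, bps), ...] from analytics."""
    return []

def install_drop(dst_ip, dst_port, src_port):
    """Stub: push a drop rule via FlowSpec/OpenFlow/NETCONF/REST."""
    print("filter udp src-port %s -> %s:%s" % (src_port, dst_ip, dst_port))

def main():
    installed = set()
    while True:
        for dst_ip, dst_port, src_port, bps in get_large_udp_flows(THRESHOLD_BPS):
            key = (dst_ip, dst_port, src_port)
            if bps > THRESHOLD_BPS and key not in installed:
                install_drop(dst_ip, dst_port, src_port)
                installed.add(key)  # a real system would also age rules out
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()

Note that rule aging and sanity limits matter here: as pointed out earlier in the thread, a system that blindly filters whatever it sees can itself be used by an attacker to take a customer's service offline with very short bursts of traffic.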
participants (12)
- Cb B
- Christopher Morrow
- Dobbins, Roland
- Geraint Jones
- Glen Turner
- Jay Ashworth
- John Kristoff
- Larry Sheldon
- Laszlo Hanyecz
- Peter Phaal
- Stephane Bortzmeyer
- William Herrin