-larry directly since I'm sure he's either tired of this, or already reading it via the nanog subscription. On Mon, Feb 3, 2014 at 7:54 PM, Peter Phaal <peter.phaal@gmail.com> wrote:
On Mon, Feb 3, 2014 at 2:58 PM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
wait, so the whole of the thread is about stopping participants in the attack, and you're suggesting that removing/changing end-system switch/routing gear and doing something more complex than:

  deny udp any 123 any
  deny udp any 123 any 123
  permit ip any any

is a good plan?
I'd direct you at: <https://www.nanog.org/resources/tutorials>
and particularly at: "Tutorial: ISP Security - Real World Techniques II" <https://www.nanog.org/meetings/nanog23/presentations/greene.pdf>
Thanks for the links. Many SDN solutions can be replicated using
you're sort of a broken record on this bit... I don't think folk (me in particular) are knocking sdn things in general. In the specific, though: 1) you missed the point originally, so please stop marketing your blog. 2) you missed the point(s) about availability and realistic deployment of solutions in the near term
manual processes (or are ways of automating currently manual processes). Programmatic APIs allows the speed and accuracy of the response to be increased and the solution to be delivered at scale and at lower cost.
and all of these require very strict and very careful deployment of OSS measures to watch over current state and intended state. They also require very careful training and troubleshooting steps for the ops folk running the systems. None of this is deployable 'tomorrow' (in under 24hrs) safely, and most likely it'll be a bit more time until there is ubiquitous deployment of sdn-like functionality in larger scale networks. not that I'm not a fan, and not that I don't like me some automation, but... having seen automation go very wrong (l3's acl spider crushing l3, the flowspec 'whoopsie' at cloudflare and TWTC... there are lots of other examples).
it's probably not a good plan to forklift your edge, for dos targets where all you really need is a 3 line acl.
For many networks it doesn't need to be forklift upgrade - vendors are adding programmatic APIs to their existing products (OpenFlow, Arista eAPI, NETCONF, ALU Web Services ...) - so a firmware upgrade may be
arista is deployed in which large scale networks with api/sdn functionality? they're a great bunch of folks and they make some nice gear, but it's still getting baked, and it's not displacing (today) existing gear that's still being depreciated. for anything to be workable in the near term, the above examples just aren't going to work. note my many references to "5-7 yrs until depreciation cycles complete and the next replacement happens"
all that is required.
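As a concrete sketch of what "programmatic API" means here: Arista's eAPI accepts JSON-RPC over HTTP(S), so a mitigation script can push a filter with a single POST. The switch URL, credentials, ACL name, and exact EOS ACL syntax below are illustrative assumptions, not a tested config:

```python
import json

def eapi_acl_request(target_ip, request_id="1"):
    """Build a JSON-RPC payload that would install a temporary NTP
    filter via Arista eAPI. ACL name and command syntax are
    placeholders for illustration."""
    cmds = [
        "enable",
        "configure",
        "ip access-list ddos-mitigation",
        # drop reflected NTP (UDP source port 123) aimed at the victim
        "deny udp any eq ntp host %s" % target_ip,
        "permit ip any any",
    ]
    return {
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": cmds, "format": "json"},
        "id": request_id,
    }

payload = eapi_acl_request("192.0.2.10")
print(json.dumps(payload, indent=2))
```

In practice the payload would be POSTed to the switch's `/command-api` endpoint over HTTPS with operator credentials; the point is that the whole change is one scripted round trip rather than a manual login.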
I do think that there are operational advantages to using protocols like OpenFlow, I2RS, and BGP FlowSpec for these soft controls, since they allow the configuration to remain relatively static and they avoid problems of split control (for example, an operator makes a config change and saves it, locking in a temporary control from the SDN system).
automation, with protections, safety checks, and assurances that the process won't break things in odd failure modes... not to mention bug^H^H^Hfeature issues with gear. we're still a bit away from large scale deployment.
I would argue that the more specific the ACL can be the less collateral damage. Built-in measurement allows for a more targeted response.
sure, I think roland and I at least have been saying the same thing.
Good point - the proposed solution is most effective for protecting customers that are targeted by DDoS attacks. While trying to prevent
Oh, so the 3-line acl is not an option? or (for a lot of customers a fine answer) a null route? Some things have changed in the world of dos mitigation, but a bunch of the basics still apply. I do know that in the unfortunate event that your network is the transit or terminus of a high-volume dos attack, you want to make the least configuration change that'll satisfy the 2 parties involved (you and your customer)... doing a bunch of hardware replacement and/or sdn things when you can get the job done with some acls or routing changes is really going to be risky.
I think an automatic system using a programmatic API to install as narrowly scoped a filter as possible is the most conservative and least risky option. Manual processes are error prone, slow, and blunt instruments like a null route can cause collateral damage to services.
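For illustration, a narrowly scoped rule might match only the victim prefix plus the reflection vector (UDP source port 123) instead of null-routing the host. This is a minimal sketch; the dict field names are chosen for readability and are not a real FlowSpec wire format:

```python
def flowspec_rule(dst_prefix, proto=17, src_port=123):
    """Compose a FlowSpec-style match/action pair scoped to one
    victim prefix and the reflected protocol, rather than a blanket
    null route. Field names are illustrative only."""
    return {
        "match": {
            "destination": dst_prefix,
            "protocol": proto,       # 17 = UDP
            "source-port": src_port, # 123 = NTP, the amplification vector
        },
        "then": {"action": "discard"},
    }

rule = flowspec_rule("203.0.113.5/32")
```

The narrower the match, the less legitimate traffic the victim loses while the rule is in place.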
folk say this, but the customer very often explicitly asks for null routes. The thing being targeted is very often not a 'revenue generating ecommerce site', and for providers where the default answer is 'everything is a null route', their customers ought to find a provider that thinks differently.
Typical networks probably only see a few DDoS attacks an hour at the most, so pushing a few rules an hour to mitigate them should have little impact on the switch control plane.
based on what math did you get 'few per hour'? As an endpoint (focal point) or as a contributor? The problem that started this discussion was being a contributor... which I bet happens a lot more often than /few an hour/.
I am sorry, I should have been clearer, the SDN solution I was describing is aimed at protecting the target's links, rather than mitigating the botnet and amplification layers.
and i'd say that today sdn is out of reach for most deployments, and that the simplest answer is already available.
The number of attacks was from the perspective of DDoS targets and their service providers. If you are considering each participant in the attack the number goes up considerably.
I bet roland has some good round-numbers on number of dos attacks per day... I bet it's higher than a few per hour globally, for the ones that get noticed.
The "few per hour" number isn't a global statistic. This is the number that a large hosting data center might experience. The global number
I wonder how many attacks rackspace, softlayer, amazon-aws, xs4all, hetzner, etc. experience per hour. in any case, 'often' is probably close enough.
is much larger, but not very relevant to a specific provider looking to size a mitigation solution.
note that the focus of the original thread was on the contributors. I think the target part of the problem has been solved since before the slides in the pdf link at the top...
Do most service providers allow their customers to control ACLs in the upstream routers? Do they automatically monitor traffic and insert the
nope, and I don't necessarily think that changes with SDN... letting your customer traffic-engineer is... dangerous. it tosses capacity planning concerns out the window :( There are several providers, however, that let their customers initiate smart/intelligent mitigation solutions. I know of 3 that let the customer trigger mitigation based on a BGP community: the customer can choose how they want to 'detect' and then simply send a bgp update for mitigation... I bet there are folk that don't own networks that provide this service as well... I'm sure roland has some work stories he's presented on about this very thing.
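A customer-triggered blackhole of this kind is usually just a tagged /32 announcement. A hedged sketch, formatting an exabgp-style announce line and using the RFC 7999 BLACKHOLE community (65535:666) as an example -- real trigger communities and discard next-hops are provider-specific:

```python
def rtbh_announce(victim_ip, next_hop="192.0.2.1", community="65535:666"):
    """Format an exabgp-style announce for a customer-triggered
    remote blackhole. The next-hop and community defaults are
    placeholder examples, not any particular provider's values."""
    return ("announce route %s/32 next-hop %s community [%s]"
            % (victim_ip, next_hop, community))

print(rtbh_announce("203.0.113.5"))
```

The provider's edge matches the community, sets next-hop to a discard route, and the victim's /32 is dropped network-wide without the provider touching a single ACL.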
filters themselves when there is an attack? I don't believe so - while
some providers do, based upon customer demand for the service. it's not really that hard, though it is a cost for the provider so that's shared with the customers using the solution(s).
the slides describe a solution, automation is needed to make available at large scale.
automation isn't precluded from solution space in the slides, note that they were presented and created in ~2002... so the state of the art has changed a bit since then, but the methodology and practices from 2002 can be applied fairly directly today.
you're getting pretty complicated for the target side:

  access-list 150 permit ip any any log

(note this is basically taken verbatim from the slides)
view logs, see the overwhelming majority are to hostX port Y proto Z... filter, done. you can do that in about 5 mins time, quicker if you care to rush a bit.
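That 5-minute triage can be sketched in a few lines: tally the (proto, dst, port) tuples in the ACL log and the filter candidate falls out. The log format assumed below is loosely modeled on IOS access-list log lines and may not match any given platform exactly:

```python
import re
from collections import Counter

# assumed format, loosely modeled on IOS access-list log output
LOG_RE = re.compile(r"permitted (\w+) [\d.]+\((\d+)\) -> ([\d.]+)\((\d+)\)")

def top_target(lines):
    """Tally (proto, dst, dst_port) across ACL log lines and return
    the most common tuple with its count -- the candidate for a
    specific filter."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            proto, _sport, dst, dport = m.groups()
            counts[(proto, dst, dport)] += 1
    return counts.most_common(1)[0] if counts else None

logs = [
    "list 150 permitted udp 198.51.100.7(123) -> 203.0.113.5(80), 1 packet",
    "list 150 permitted udp 198.51.100.9(123) -> 203.0.113.5(80), 1 packet",
    "list 150 permitted tcp 192.0.2.44(5555) -> 203.0.113.9(22), 1 packet",
]
print(top_target(logs))  # the udp flood toward 203.0.113.5:80 dominates
```

Whether this runs as a human eyeballing `show log` or as a script is exactly the manual-vs-automated distinction being argued here.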
An automated system can perform the analysis and apply the filter in a second with no human intervention. What if you have to manage thousands of customer links?
been there, done that... got several tshirts. it's honestly not that bad.
This brings up an interesting use case for an OpenFlow capable switch - replicating sFlow, NetFlow, IPFIX, Syslog, SNMP traps etc. Many top of rack switches can forward the traffic through a GRE/VxLAN tunnel as well.
yes, more complexity seems like a great plan... in the words of someone else: "I encourage my competitors to do this"
Using the existing switches to replicate and tap production traffic is less complex and more scalable than alternatives. You may find the following use case interesting:
http://blog.sflow.com/2013/04/sdn-packet-broker.html
I think roland's other point that not very many people actually even use sflow is not to be taken lightly here either.
It doesn't have to be sFlow - the sFlow solution was provided as a concrete example since that is the technology I am most familiar with.
and which, according to a credible source, is by and large not deployed by service providers. certainly in some IDC situations sflow is interesting, but according to someone who I believe is in a position to know, it's not there for isp situations. leaving that aside, some signal of 'what the traffic looks like' is available if deployed. not everyone does... some don't because 'meh!', some because 'not in the featureset bought', some because '<other silly reason>'. folk that don't have it generally can't just crank it up 'now' though.
However, sFlow, IPFIX, NetFlow, jFlow etc. combined with analytics and a programmatic control API allow DDoS mitigation to be automated.
right, arbor sells this, as one example (there are others of course). there are several large US ISPs that use that solution (or an offspring of it) today. it's not quite sdn, but it is automated and relatively fire-and-forget. -chris