it seems that anycasting was quite insufficient to protect netsol's service from being severely damaged (udp dead, tcp worked) for a considerable length of time by a ddos [0] last week [1]. it would be very helpful to other folk concerned with service deployment to understand how the service in question was/is anycast, and what might be done differently to mitigate exposure of similar services.

anyone have clues or is this ostrich city? maybe a preso at nanog would be educational.

randy

---

[0] - as it seems that the ddos sources were ip address spoofed (which is why the service still worked for tcp), i owe paul an apology for downplaying the immediacy of the need for source address filtering.

[1] - netsol is not admitting anything happened, of course <sigh>. but we all saw the big splash as it hit the water, the bubbles as it sank, and the symptoms made the cause pretty clear.
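Footnote [0] points at source address filtering, i.e. BCP38-style ingress filtering at the customer edge. As a minimal sketch of that idea only (the prefixes and sample packets below are made up, and nothing here is known about what netsol or its upstreams actually deploy), a forwarding decision along these lines drops traffic whose claimed source falls outside the prefixes assigned to the interface it arrived on:

    # Minimal sketch of BCP38-style ingress (source address) filtering.
    # The prefix list and sample sources are hypothetical; a real deployment
    # would do this in the forwarding plane (ACLs / uRPF), not in Python.
    import ipaddress

    # prefixes legitimately assigned to the customer behind this interface
    CUSTOMER_PREFIXES = [
        ipaddress.ip_network("192.0.2.0/24"),     # documentation prefix, stand-in
        ipaddress.ip_network("198.51.100.0/24"),  # documentation prefix, stand-in
    ]

    def source_is_valid(src: str) -> bool:
        """Accept the packet only if its source lies in an assigned prefix."""
        addr = ipaddress.ip_address(src)
        return any(addr in net for net in CUSTOMER_PREFIXES)

    if __name__ == "__main__":
        for src in ("192.0.2.17", "203.0.113.5"):
            action = "forward" if source_is_valid(src) else "drop (spoofed?)"
            print(f"{src}: {action}")

The point of the filter is simply that a spoofed-source flood of the kind described above cannot leave a BCP38-filtered edge in the first place.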
On May 6, 2005, at 12:40 PM, Randy Bush wrote:
it seems that anycasting was quite insufficient to protect netsol's service from being severely damaged (udp dead, tcp worked) for a considerable length of time by a ddos [0] last week [1]. it would be very helpful to other folk concerned with service deployment to understand how the service in question was/is anycast, and what might be done differently to mitigate exposure of similar services.
anyone have clues or is this ostrich city? maybe a preso at nanog would be educational.
Seconded.
[0] - as it seems that the ddos sources were ip address spoofed (which is why the service still worked for tcp), i owe paul an apology for downplaying the immediacy of the need for source address filtering.
I was under the - possibly mistaken - impression that they activated their Riverhead boxes and that's why only TCP worked, not because of spoofed source. Or are you saying that since the sources were spoofed, they could not filter the attack and had to resort to Riverhead's 'truncate' mechanism?
[1] - netsol is not admitting anything happened, of course <sigh>. but we all saw the big splash as it hit the water, the bubbles as it sank, and the symptoms made the cause pretty clear.
How much does it suck that a major piece of Internet infrastructure was severely affected and the details are shrouded? -- TTFN, patrick
On Fri, 6 May 2005, Randy Bush wrote:
it seems that anycasting was quite insufficient to protect netsol's service from being severely damaged (udp dead, tcp worked) for a considerable length of time by a ddos [0] last week [1]. it would be very helpful to other folk concerned with service deployment to understand how the service in question was/is anycast, and what might be done differently to mitigate exposure of similar services.
was the service in question anycast'ed? I got the impression that the worldnic servers were all NON-anycast... I only see the /21 covering these servers through 10515 (which is verisign as I recall?) Judging by latency I even think they are in the northern virginia area... I also noted:

worldnic.com.     86400  IN  NS  ns1.netsol.com.
worldnic.com.     86400  IN  NS  ns2.netsol.com.
worldnic.com.     86400  IN  NS  ns3.netsol.com.

;; ADDITIONAL SECTION:
ns1.netsol.com.   86400  IN  A   216.168.229.228
ns2.netsol.com.   86400  IN  A   216.168.229.229
ns3.netsol.com.   86400  IN  A   216.168.229.229

why have 3 records and 2 ips? odd. You'd think they would have more ips in that /21 or other /24's to allocate from, just in case they had to jettison 1 address which was getting pounded :( (not that these were getting attacked per se, but still)
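As an aside on the duplicate glue noted above: one quick way to spot an NS set that collapses onto fewer addresses than names is to resolve each delegated name and compare. A minimal sketch, using the NS names from the dig output above and only the standard library (what it prints today reflects current DNS data, not 2005's):

    # Minimal sketch: resolve each delegated NS name and flag duplicate
    # addresses, as in the ns2/ns3 case noted above. Standard library only;
    # answers reflect whatever the local resolver returns today.
    import socket

    NS_NAMES = ["ns1.netsol.com", "ns2.netsol.com", "ns3.netsol.com"]

    def check_ns_addresses(names):
        seen = {}  # address -> first NS name that mapped to it
        for name in names:
            try:
                addr = socket.gethostbyname(name)
            except socket.gaierror as exc:
                print(f"{name}: lookup failed ({exc})")
                continue
            if addr in seen:
                print(f"{name}: {addr}  <-- duplicates {seen[addr]}")
            else:
                seen[addr] = name
                print(f"{name}: {addr}")

    if __name__ == "__main__":
        check_ns_addresses(NS_NAMES)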
[0] - as it seems that the ddos sources were ip address spoofed (which is why the service still worked for tcp), i owe paul an apology for downplaying the immediacy of the need for source address filtering.
It's also not clear that the sources were spoofed. If, as Patrick says, they put in a Riverhead (or several, which isn't too far fetched), the normal mode for 'protection' of DNS is to:

1) truncate
2) rate-limit and cache (I think it caches at least; I know it will go into proxy mode and rate-limit)

Truncate forces TCP, which allows the RHG to verify the source address is really asking to chat; the rate-limit function keeps 'bad actors' from beating the hell out of the protected resource. So, without more info from NetSol (seems not to be forthcoming?) about the mix of attack traffic (which the RHG will provide), it's hard to state definitively that the attack was 'mostly spoofed' :(
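The 'truncate' step above is just the DNS TC bit: answer UDP queries with TC=1 so that any client with a real, reachable source address retries over TCP, which a spoofed source cannot complete. A minimal sketch of that mechanism, assuming the dnspython library and a hypothetical listen address; it illustrates the TC-bit idea only, not what a Riverhead box actually does internally:

    # Minimal sketch of the 'truncate' mechanism: answer every UDP query
    # with the TC bit set, so legitimate resolvers retry over TCP (which a
    # spoofed source cannot do, since it cannot complete the handshake).
    # Assumes the dnspython library; listen address/port are hypothetical.
    import socket

    import dns.flags
    import dns.message

    LISTEN_ADDR = ("127.0.0.1", 5353)  # hypothetical; real DNS uses port 53

    def serve_truncate_only():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(LISTEN_ADDR)
        print(f"answering UDP queries on {LISTEN_ADDR} with TC=1 ...")
        while True:
            wire, client = sock.recvfrom(512)
            try:
                query = dns.message.from_wire(wire)
            except Exception:
                continue  # ignore anything that is not a parsable DNS query
            reply = dns.message.make_response(query)
            reply.flags |= dns.flags.TC  # "truncated" -> client retries over TCP
            sock.sendto(reply.to_wire(), client)

    if __name__ == "__main__":
        serve_truncate_only()

Forcing TCP trades an extra round trip for source validation; whether that is why TCP kept working while UDP died in this incident is exactly the open question in the thread.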
On Fri, 6 May 2005, Randy Bush wrote:
was the service in question anycast'ed? I got the impression that the worldnic servers were all NON-anycast.
i now believe you are correct, they are not anycasted. i was silly enough to believe one (seemingly false) report. apologies.
no apologies to me required, but it'd still be interesting to hear what happened, eh? :)
[ i figure if i keep asking poking and naive questions i'll keep learning more about this, which may help me and others learn from the mistakes of others. ]
no apologies to me required, but it'd still be interesting to hear what happened, eh? :)
i suspect that our not hearing from the horse's mouth is a symptom of one of the causes, "we know well enough to go it alone, and we can pretend that we're perfect." well, a day+ long wipeout should make it pretty clear that the bunker mentality is as much a fallacy as the technology of the deployment. it failed, and badly.

but you are correct, those of us more responsible for network engineering are as much concerned by the technological aspect(s). and folk seem to think that it is a bunker mentality centralized deployment, i.e., a small number of server clusters ripe for the picking, that fell to a simple, though likely intense, ddos attack. and one that we do not know was spoofed (i unapologize, paul :-) and did not really need to be, given the weaknesses of the service deployment.

and the above, combined with problems of riverhead configuration and limitations and lack of cooperation with upstreams to mitigate the attack, turned a fairly normal ddos into a day+ serious mess?

randy
participants (3)

- Christopher L. Morrow
- Patrick W. Gilmore
- Randy Bush