Well folks, since the middle of August I've been tracking the spread and subsequent efforts by our community to stop the Nachi/Welchia infection that took down so many networks. Sadly, by my estimations, only about 20-30% of infected hosts were cleaned. After Jan 1, 2004 it appears that the thousands (millions?) of remaining infected hosts were rebooted and the worm removed itself. Network traffic has finally returned to normal. What kind of effects did everyone see from this devastating worm and what lessons did we learn for preventing network downtime in the future?
Flow based "attacks" can kill flow based routers.

James Edwards
Routing and Security Administrator
jamesh@cybermesa.com
At the Santa Fe Office: Internet at Cyber Mesa
Store hours: 9-6 Monday through Friday
505-988-9200 SIP:1(747)669-1965
: What kind of effects did everyone see from this devastating worm and what
: lessons did we learn for preventing network downtime in the future?

Proper network design is a good thing... ;-)

scott
: : What kind of effects did everyone see from this devastating worm and what
: : lessons did we learn for preventing network downtime in the future?
:
: Proper network design is a good thing... ;-)

Before I get flamed, I should say that is for end-user networks, not the normal BANs (Big A$$ Networks) that're the norm on this list. I'm a .edu-eyeball network nowadays and I don't have to get everything to the end-user, nor do I have to send out everything from the end-user, so I can block whatever I need, so long as my customers are happy. 99.9% of the time, they don't know I'm blocking anything...

scott
lesson learned: stop using /makeshift/ layer3 switches (without naming vendor) to run L3 core -J On Tue, Jan 20, 2004 at 02:22:52PM -0800, Brent Van Dussen wrote:
Well folks, since the middle of August I've been tracking the spread and subsequent efforts by our community to stop the Nachi/Welchia infection that took down so many networks.
Sadly, by my estimations, only about 20-30% of infected hosts were cleaned. After Jan 1, 2004 it appears that the thousands (millions?) of remaining infected hosts were rebooted and the worm removed itself. Network traffic has finally returned to normal.
What kind of effects did everyone see from this devastating worm and what lessons did we learn for preventing network downtime in the future?
-- James Jun (formerly Haesu) TowardEX Technologies, Inc. 1740 Massachusetts Ave. Boxborough, MA 01719 Consulting, IPv4 & IPv6 colocation, web hosting, network design & implementation http://www.towardex.com | james@towardex.com Cell: (978)394-2867 | Office: (978)263-3399 Ext. 170 Fax: (978)263-0033 | AIM: GigabitEthernet0 NOC: http://www.twdx.net | POC: HAESU-ARIN, HDJ1-6BONE
Not all L3-switches are flow-based; prefix-based ones should do just fine. Can people add/correct this initial list?

Flow-based: Foundry with IronCore modules, Cisco Catalyst 6500 with Sup1(A)
Prefix-based: Foundry with JetCore modules, Cisco Catalyst 6500/7600 with Sup2(A), Sup3(A/BXL)

Rubens

----- Original Message -----
From: <haesu@towardex.com>
To: "Brent Van Dussen" <vandusb@attens.com>
Cc: "NANOG" <nanog@merit.edu>
Sent: Tuesday, January 20, 2004 9:46 PM
Subject: Re: Nachi/Welchia Aftermath

lesson learned: stop using /makeshift/ layer3 switches (without naming vendor) to run L3 core
yes i concur.. prefix based ones (like FIB) are fine. unfortunately some models from some vendors (tsk tsk) which use the slow process path to reprogram the CAM per flow can be quite painful during situations like random-destination DoS attacks and worms..

add the E vendor to your list too.. we had a summit48i that loved the worm traffic

-J

On Tue, Jan 20, 2004 at 10:16:03PM -0200, Rubens Kuhl Jr. wrote:
Not all L3-switches are flow-based; prefix-based ones should do just fine. Can people add/correct this initial list ?
Flow-based: Foundry with IronCore modules, Cisco Catalyst 6500 with Sup1(A) Prefix-based: Foundry with JetCore modules, Cisco Catalyst 6500/7600 with Sup2(A), Sup3(A/BXL)
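To make the flow-based versus prefix-based distinction concrete, here is a minimal Python sketch. It is a toy model, not any vendor's forwarding path: the cache size, the FIFO eviction, the packet counts and the function name slow_path_lookups are all assumptions chosen for illustration. It counts how many packets would need a slow-path lookup when a forwarding cache is programmed per destination on demand. A prefix-based table is populated from the routing table ahead of time, so it has no equivalent per-packet miss.

```python
import ipaddress
import random


def slow_path_lookups(destinations, cache_size):
    """Count packets that miss a per-destination cache and need the slow path.

    The cache is a plain dict used as a FIFO: when it is full, the oldest
    entry is evicted. Real hardware differs; this only models the miss count.
    """
    cache = {}
    misses = 0
    for dst in destinations:
        if dst in cache:
            continue  # fast path: entry already programmed
        misses += 1  # slow path: CPU has to program a new entry
        if len(cache) >= cache_size:
            cache.pop(next(iter(cache)))  # evict the oldest entry
        cache[dst] = True
    return misses


rng = random.Random(2004)  # fixed seed so runs are repeatable

# "Normal" traffic: 10,000 packets towards a small set of 50 servers.
first_server = int(ipaddress.ip_address("192.0.2.1"))
servers = [first_server + i for i in range(50)]
normal = [rng.choice(servers) for _ in range(10_000)]

# Worm-style scanning: 10,000 packets towards random 32-bit destinations.
scan = [rng.getrandbits(32) for _ in range(10_000)]

normal_misses = slow_path_lookups(normal, cache_size=1024)
scan_misses = slow_path_lookups(scan, cache_size=1024)

print("slow-path lookups, normal traffic:", normal_misses)
print("slow-path lookups, random-destination scan:", scan_misses)
```

With friendly traffic the miss count is bounded by the number of distinct destinations; with random destinations nearly every packet is a miss, which is the load pattern described above.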
lesson learned: stop using /makeshift/ layer3 switches (without naming vendor) to run L3 core
more generally... "if you want routing, buy a router." i have a hybrid switer that i'm very happy with. at my house, that is. (the idea of using one in commerce or production gives me cold shivers.) -- Paul Vixie
more generally... "if you want routing, buy a router."
amen. imho there can't be a better routing equipment than a real router :) -J
On Wed, Jan 21, 2004 at 12:11:43PM -0500, haesu@towardex.com wrote:
more generally... "if you want routing, buy a router."
amen. imho there can't be a better routing equipment than a real router :)
But unfortunately, not true. A router is anything which makes decisions by performing a longest prefix match lookup against a layer 3 header, period. That "I route with a router and switch with a switch" nonsense is tired, usually covers for a lack of understanding of the issues involved, and prevents you from reaching the correct conclusion, which is "I route with the device which is most appropriate for the task".

There are some good routers, there are some bad routers, there are some TERRIBLE routers, there are even some routers which are good at some things and bad at others, but a router does not have to be a switch-turned-router to suck (at a specific task) any more than a switch-turned-router has to suck.

For example, would you rather have the reassuring consistency of a 7206VXR which tops out at 300Mbps come rain or shine, or might you prefer to use a Foundry BigIron which routes a couple gigabits under normal friendly non-stressful conditions and sits at 1% CPU? Of course, depending on the type of traffic and if you are from an older school of thinking your answer might very well be "I'd take the VXR", but the reality is that there is a lot more bandwidth out there than there used to be, and 300Mbps might just be an insignificant amount of traffic that is coming off 1 server for some people.

Understanding the design limitations of ANY device, be it a software router, an ASIC based router with a prepopulated FIB, an ASIC based router with a CPU first lookup, a "hack on an ethernet CAM" router, or two people with tin cans and a string yelling at each other in binary, is the first step to using it effectively.

Understanding that the limitations of a "layer 3 switch" may make it ENTIRELY inappropriate for core routing work is a good beginning, understanding that a Juniper T640 may be entirely inappropriate for edge work or datacenter ethernet aggregation is a good middle ground, and understanding where and with what steps a "layer 3 switch" CAN be used effectively is even better still. Anyone who doesn't understand this is probably working for a bankrupt or soon to be bankrupt company.

--
Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
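For readers who want the "longest prefix match lookup" definition spelled out, a minimal Python sketch follows. The prefixes, next-hop labels and the function name longest_prefix_match are made up for illustration, and the linear scan is only there to show the decision rule; it says nothing about how any real device stores or searches its table.

```python
import ipaddress


def longest_prefix_match(fib, destination):
    """Return the next hop of the most specific prefix covering destination.

    fib maps prefix strings to next-hop labels (both hypothetical here).
    Returns None when no prefix matches.
    """
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        # Keep the match with the longest prefix length seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


# Hypothetical forwarding table, from least to most specific.
fib = {
    "0.0.0.0/0": "upstream",
    "10.0.0.0/8": "core",
    "10.1.0.0/16": "distribution",
    "10.1.2.0/24": "edge",
}

for dst in ("10.1.2.3", "10.1.9.9", "10.200.0.1", "198.51.100.7"):
    print(dst, "->", longest_prefix_match(fib, dst))
```

The point of the sketch is only that the decision depends on the prefix table, not on whether the box doing it is sold as a router or as a layer 3 switch.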
more generally... "if you want routing, buy a router."
amen. imho there can't be a better routing equipment than a real router :)
i guess i need to explain in more detail. keep in mind that i'm technophobic and that when VLANs first appeared i was convinced that the end of the world was upon us...

that having been said, "ip switching" isn't a bad thing. if you've got more than one vlan'd subnet in a switch or switch-cluster, then it's not good to scoot packets up and down a trunk to a router just to let folks on one vlan talk to folks on another. that's the way i use my switer at home and i'm an ideal target audience for it since my kids can't invoke an SLA when they aren't able to play netgames.

at work, though (for all values of "at" and "work"), there's a router trunk and the packets between vlans go through real routers. in addition to what might be a router-centric superstition, it's MUCH easier to find problems when you can point to each powered box and say "this one's a switch" and "this one's a router".

and when it comes to wide area links, it turns out that the reputation of switches was wrecked in its earliest years, both with poor diagnostics and unreasonably low buffer sizes and a serious lag in implementation of things like RED. the DEC GigaSwitch, and various Vitalink products, were the poster children for "why wide area bridging is bad". i won't list the poster children for "why switches that try to do point-to-point routing are bad" since unlike DEC and Vitalink, the companies in question are still in business.

watching nanog discussions over the years of how to make switches be routers without bloodshed or lost weekends is a lot like, to paraphrase tom lehrer, watching a christian scientist cope with appendicitis.

so with the possible exception of inter-vlan "ip switching" in a lan context, if you want routing, buy a router.
ok so.. please note that, that was rather a foolish statement of mine :) for more constructive thought, i agree with ras' comments. -J On Wed, Jan 21, 2004 at 12:11:43PM -0500, haesu@towardex.com wrote:
more generally... "if you want routing, buy a router."
amen. imho there can't be a better routing equipment than a real router :)
On Tuesday 20 January 2004 04:16 pm, Rubens Kuhl Jr. wrote:
Not all L3-switches are flow-based; prefix-based ones should do just fine. Can people add/correct this initial list ?
Flow-based: Foundry with IronCore modules, Cisco Catalyst 6500 with Sup1(A) Prefix-based: Foundry with JetCore modules, Cisco Catalyst 6500/7600 with Sup2(A), Sup3(A/BXL)
Rubens
Where do the Extreme and Juniper fit into this?
-- Donovan Hill Electronics Engineering Technologist, CCNA www.lazyeyez.net, www.gwsn.com
Flow-based: Foundry with IronCore modules, Cisco Catalyst 6500 with Sup1(A) Prefix-based: Foundry with JetCore modules, Cisco Catalyst 6500/7600 with Sup2(A), Sup3(A/BXL) Where do the Extreme and Juniper fit into this?
Private and public answers to my question indicate that both the Summit 48i and the Black Diamond from Extreme are flow-based. Juniper doesn't make layer 3 switches, but their routers also do prefix-based forwarding; Cisco routers also do prefix-based forwarding in the usual configurations.

Also of note: flow-based forwarding is not the only thing that makes an L3 device suffer under worm attacks. If a directly connected interface is an Ethernet (or any other medium that is not point to point), ARPing for a lot of new addresses per second can also do harm.

Rubens
On Tue, 20 Jan 2004, Rubens Kuhl Jr. wrote:
Flow-based: Foundry with IronCore modules, Cisco Catalyst 6500 with Sup1(A) Prefix-based: Foundry with JetCore modules, Cisco Catalyst 6500/7600 with Sup2(A), Sup3(A/BXL) Where do the Extreme and Juniper fit into this?
Private and public answers to my question indicate that both Summit 48i and Black Diamond from Extreme are flow-based; Juniper doesn't make layer 3 switches, but their routers also do prefix-based forwarding; Cisco routers also do prefix-based forwarding at usual configurations.
Also of notice, flow-based forwarding is not the only thing that makes a L3 device suffer at worm attacks. If a directly connected interface is an Ethernet (or any other medium that is not point to point), ARPing for a lot of new addresses per second can also do harm.
Nearly. Any frames needing to go to the CPU will harm your box. This tends to be L2 occurrences (ARP storms are one), which therefore means connected ethernets. DoSing a router at L3 (IP, e.g. smurf) will usually hurt too, as will, if you can manage it, attacks on higher level applications (announcing/withdrawing 1000s of routes in BGP, filling up NAT tables). Of course your architectures differ so ymmv.

Steve
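One generic way to protect a CPU from punted frames is to rate-limit what reaches it. The Python sketch below uses a token bucket for that purpose. It is an illustration only: the class, the function punted_to_cpu, and the rate and burst numbers are assumptions, and it does not describe the throttling mechanism of any particular platform.

```python
class TokenBucket:
    """Token bucket: refills at rate_pps tokens per second, capped at burst."""

    def __init__(self, rate_pps, burst):
        self.rate = rate_pps
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # timestamp of the previous packet

    def allow(self, now):
        """Return True if a packet arriving at time `now` may go to the CPU."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: drop instead of punting


def punted_to_cpu(arrival_times, rate_pps, burst):
    """Count how many of the arriving packets are admitted to the CPU."""
    bucket = TokenBucket(rate_pps, burst)
    return sum(1 for t in arrival_times if bucket.allow(t))


# Hypothetical loads over one second: a 20,000 pps storm and a 500 pps trickle.
storm = [i / 20_000 for i in range(20_000)]
quiet = [i / 500 for i in range(500)]

storm_admitted = punted_to_cpu(storm, rate_pps=1_000, burst=100)
quiet_admitted = punted_to_cpu(quiet, rate_pps=1_000, burst=100)

print("storm packets reaching the CPU:", storm_admitted)
print("quiet packets reaching the CPU:", quiet_admitted)
```

The trade-off is that a limiter like this cannot tell a legitimate ARP from a scan-induced one, so it keeps the box alive at the cost of dropping some good traffic during a storm.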
On Tue, 20 Jan 2004, Donovan Hill wrote:
Where do the Extreme and Juniper fit into this?
Juniper do not make L3-switches so they don't really compare. Extreme i-platform is currently destination IP based with initial cache lookup. (guess this is flow based)

--
Mikael Abrahamsson email: swmike@swm.pp.se
On Wednesday 21 January 2004 12:07 am, Mikael Abrahamsson wrote:
On Tue, 20 Jan 2004, Donovan Hill wrote:
Where do the Extreme and Juniper fit into this?
Juniper do not make L3-switches so they don't really compare.
Others have said that too, but given where Junipers are used, I think they sneak into the same category as the Cisco 6500/7600s and other high-end L3 switches.
Extreme i-platform is currently destination IP based with initial cache lookup. (guess this is flow based)
I guess I just don't understand the architecture. What I really don't understand is _why_ you'd bother with flow-based architecture over prefix-based architecture..... am I looking green yet? (If this isn't appropriate on-list, then feel free to reply off-list.) -- Donovan Hill Electronics Engineering Technologist, CCNA www.lazyeyez.net, www.gwsn.com
On Wed, 21 Jan 2004, Donovan Hill wrote:
Extreme i-platform is currently destination IP based with initial cache lookup. (guess this is flow based)
I guess I just don't understand the architecture. What I really don't understand is _why_ you'd bother with flow-based architecture over prefix-based architecture..... am I looking green yet?
Cheap + Legacy. Some gear doesn't want to die :)
On Wed, 21 Jan 2004, Donovan Hill wrote:
I guess I just don't understand the architecture. What I really don't understand is _why_ you'd bother with flow-based architecture over prefix-based architecture..... am I looking green yet?
Since these boxes are priced around $3000-$4000 or so and have multiple gig ports and loads of 10/100 ports, they make a nice edge/distribution box. Extreme I-chipset stuff talks ISIS, OSPF and BGP just fine, has 128 megs of memory, and does L2/L3 at wirespeed (once the flow is set up). The L2 side is interesting since you can then use the box for basically everything: not L2 or L3, but both in the same box and on the same links.

--
Mikael Abrahamsson email: swmike@swm.pp.se
On Tue, Jan 20, 2004 at 10:16:03PM -0200, Rubens Kuhl Jr. wrote:
Not all L3-switches are flow-based; prefix-based ones should do just fine. Can people add/correct this initial list ?
Flow-based: Foundry with IronCore modules, Cisco Catalyst 6500 with Sup1(A) Prefix-based: Foundry with JetCore modules, Cisco Catalyst 6500/7600 with Sup2(A), Sup3(A/BXL)
Don't confuse "flow based" with "slow-path initial lookup", they aren't the same. -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
On Tue, 20 Jan 2004, Rubens Kuhl Jr. wrote:
Not all L3-switches are flow-based; prefix-based ones should do just fine. Can people add/correct this initial list ?
Flow-based: Foundry with IronCore modules, Cisco Catalyst 6500 with Sup1(A) Prefix-based: Foundry with JetCore modules, Cisco Catalyst 6500/7600 with Sup2(A), Sup3(A/BXL)
The 2948G-L3 and the 4908G-L3 I believe are Prefix/ASIC based. I believe the 3550-EMI is as well, but I'm not familiar with that equipment.
On Tue, Jan 20, 2004 at 08:02:23PM -0800, Tom (UnitedLayer) wrote:
Not all L3-switches are flow-based; prefix-based ones should do just fine. Can people add/correct this initial list ?
Flow-based: Foundry with IronCore modules, Cisco Catalyst 6500 with Sup1(A) Prefix-based: Foundry with JetCore modules, Cisco Catalyst 6500/7600 with Sup2(A), Sup3(A/BXL)
The 2948G-L3 and the 4908G-L3 I believe are Prefix/ASIC based. I believe the 3550-EMI is as well, but I'm not familiar with that equipment.
Cisco Catalyst 4500 with Sup3/4 is also prefix based.

--
John Lyons             Phone: +353-1-660-9040
Network Engineer       Fax:   +353-1-660-3666
HEAnet Ltd.            Email: john.lyons@heanet.ie
## On 2004-01-20 20:02 -0800 Tom (UnitedLayer) typed:

T(> On Tue, 20 Jan 2004, Rubens Kuhl Jr. wrote:
T(> > Not all L3-switches are flow-based; prefix-based ones should do just fine.
T(> > Can people add/correct this initial list ?
T(> >
T(> > Flow-based: Foundry with IronCore modules, Cisco Catalyst 6500 with Sup1(A)
T(> > Prefix-based: Foundry with JetCore modules, Cisco Catalyst 6500/7600 with
T(> > Sup2(A), Sup3(A/BXL)
T(>
T(> The 2948G-L3 and the 4908G-L3 I believe are Prefix/ASIC based.
T(> I believe the 3550-EMI is as well, but I'm not familiar with that
T(> equipment.

Anyone know about the Cisco Catalyst 3750? Nortel Passport 8600/1600?

As for the 3550-EMI: in "real life" experience as a 10/100 BT aggregation switch it wasn't affected at all (CPU <5%) by rather aggressive scanning, but it did generate around 11 Mb/sec of ARP requests on all the 100 Mb/sec ports in the same VLAN and totally killed connectivity to legacy equipment connected at 10 Mb/s ...

--
Thanks!
Rafi
Anyone know about the: Cisco Catalyst 3750 ? Nortel Passport 8600/1600 ?

Nortel Passport 8600 is flow-based according to a description I saw once; it might have changed.

As for the 3550-EMI "real life" experience as a 10/100 BT aggregation switch wasn't affected(CPU <5%) at all by rather aggressive scanning but did generate around 11 Mb/sec of ARP requests on all the 100Mb/sec ports in the same VLAN and totally killed connectivity to legacy equipment connected at 10 Mb/s ...

Cisco Cat6k/Sup2+ has some throttling mechanisms that are worth testing to see if it also happens on that architecture.

Rubens
On Wed, 21 Jan 2004, Rafi Sadowsky wrote:
As for the 3550-EMI "real life" experience as a 10/100 BT aggregation switch wasn't affected(CPU <5%) at all by rather aggressive scanning but did generate around 11 Mb/sec of ARP requests on all the 100Mb/sec ports in the same VLAN and totally killed connectivity to legacy equipment connected at 10 Mb/s ...
In my experience, a few megs of broadcast that a 7206VXR handled fairly well totally killed off a 3550. -- Mikael Abrahamsson email: swmike@swm.pp.se
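The 11 Mb/sec ARP figure reported earlier in the thread can be put into frames per second with some back-of-the-envelope arithmetic, sketched here in Python. The assumptions are mine: each ARP request is a minimum-size 64-byte Ethernet frame, the 11 Mb/sec counts frame bytes only, and every frame costs a further 20 bytes on the wire for preamble and inter-frame gap.

```python
ARP_FRAME_BYTES = 64      # minimum Ethernet frame incl. FCS (assumed ARP size)
WIRE_OVERHEAD_BYTES = 20  # preamble + start delimiter (8) + inter-frame gap (12)


def frames_per_second(rate_bps, frame_bytes=ARP_FRAME_BYTES):
    """Frames per second for a given bit rate of frame data."""
    return rate_bps / (frame_bytes * 8)


def port_utilization(rate_bps, port_bps, frame_bytes=ARP_FRAME_BYTES):
    """Fraction of a port's line rate needed to carry the broadcast load."""
    on_wire_bits = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return frames_per_second(rate_bps, frame_bytes) * on_wire_bits / port_bps


arp_bps = 11_000_000  # the 11 Mb/sec of ARP requests reported in the thread

fps = frames_per_second(arp_bps)
util_100m = port_utilization(arp_bps, 100_000_000)
util_10m = port_utilization(arp_bps, 10_000_000)

print(f"ARP requests per second: {fps:,.0f}")
print(f"share of a 100 Mb/s port: {util_100m:.0%}")
print(f"share of a 10 Mb/s port: {util_10m:.0%}")
```

Under these assumptions the broadcast load is a modest share of a 100 Mb/s port but exceeds what a 10 Mb/s port can carry at all, which is consistent with the legacy equipment losing connectivity.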
participants (14)
- Brent Van Dussen
- Donovan Hill
- haesu@towardex.com
- james
- John Lyons
- Mikael Abrahamsson
- Paul Vixie
- Paul Vixie
- Rafi Sadowsky
- Richard A Steenbergen
- Rubens Kuhl Jr.
- Scott Weeks
- Stephen J. Wilcox
- Tom (UnitedLayer)