What Worked - What Didn't

Sean Donelan

17 Sep 2001

4:52 p.m.

As the New York stock market re-opens, and some things are returning to normal, I'd like to look at how well the Internet performed last week.

At the Oakland NANOG I'd like to give a presentation about what worked, and what didn't work, during the last week with regard to the Internet. I would like to gather what details I can from both small and large providers in New York, the rest of the USA, and even overseas about what they saw, what problems they experienced, and what things worked.

You can send me private mail if you wish, with or without attribution. This is a personal effort, not associated with my employer.

Oakland NANOG is several weeks away, so I don't expect an immediate response. I expect many ISPs will be conducting their own internal reviews. But if you could, please consider responding. I'm looking for input from small, medium and large providers. Thank you.

A few questions, all related to the time between Sept 11 and 17:

1. Briefly tell me who you are, and generally where your operations were located.

2. What worked?

3. What didn't work?

4. Did you activate your emergency response plan?

5. Were you required to do anything different operationally? Did you make preventive operational changes?

6. Were any infrastructure administration functions impaired, such as DNS registration, routing registry, address delegation?

7. Were you able to communicate NOC-to-NOC when needed?

8. Were any means of communications nonfunctional or impaired (direct dial telephone, toll-free telephone, pager, e-mail, fax) when you attempted to communicate with other NOCs?

9. Did you ask for or receive a request for mutual aid from any other providers? Was it provided?

10. Within the limits of safety and rescue efforts, were you able to gain access to your physical facilities?

11. Did hoaxes or rumors impact your operations?

12. Do you have any recommendations on how Internet providers could have responded differently?

Marshall Eubanks

17 Sep 2001

5:17 p.m.

Sean Donelan wrote:

...

Sean;

Multicasting worked. It handled a big traffic spike without a hiccup.

Regards
Marshall Eubanks

T.M. Eubanks
Multicast Technologies, Inc
10301 Democracy Lane, Suite 410
Fairfax, Virginia 22030
Phone : 703-293-9624 Fax : 703-293-9609
e-mail : tme@multicasttech.com
http://www.on-the-i.com

Test your network for multicast : http://www.multicasttech.com/mt/
Check the status of multicast in real time : http://www.multicasttech.com/status/index.html

Daniel Golding

5:39 p.m.

The big lessons seem to be these...

1) The Internet, as currently constituted, makes a lousy news propagation method for large audiences. The one-to-many model in unicast IP puts too large a load on the source. Good multicast (which we don't have yet) may fix this. Until that happens, the TV is still a better broadcast news medium. Mechanisms like Akamai's EdgeSuite are a pretty good solution until that occurs, as they distribute the load pattern, from a "one to many" to a "many to many" model.

2) The Internet is superior to circuit-switched services for one-to-one communications during this sort of condition. Fast busies were the order of the day in NYC and DC for the PSTN and cell phone networks. Instant Messenger services, IRC and email were more reliable than the telephone network by several orders of magnitude.

3) Since the transient from normal conditions was server-limited, there were not any significant network congestion issues. The next time a major event like this happens (and, of course, there will be a next time), news sites may be better prepared, which could cause the next transient from normal conditions to be network-limited.

The big winners were cable TV, email, packet networks and IM applications. The big losers were cell phones, circuit switching, PSTN, and non-Akamaized news sites.

(My apologies if this post is perceived to be on-topic, operational, or has anything to do with internetworking. We will now return to our regularly scheduled, off-topic posts)

- Daniel Golding
Sockeye Networks

-----Original Message-----
From: owner-nanog@merit.edu [mailto:owner-nanog@merit.edu] On Behalf Of Marshall Eubanks
Sent: Monday, September 17, 2001 1:17 PM
To: Sean Donelan
Cc: nanog@merit.edu
Subject: Re: What Worked - What Didn't

Sean Donelan wrote:

...

Miles Fidelman

5:46 p.m.

On Mon, 17 Sep 2001, Daniel Golding wrote:

...

1) The Internet, as currently constituted makes a lousy news propagation method, for large audiences. The one to many model in unicast IP puts too large of a load on the source. Good multicast (which we don't have yet) may

One comment on this: email-based news seemed to work VERY well - both very focused news (such as operational material on nanog) and more general news (I found CNN's "breaking news" email list to be very informative - in fact, I first heard about the initial airliner crash via that list).

Miles

**************************************************************************
The Center for Civic Networking          PO Box 600618
Miles R. Fidelman, President &           Newtonville, MA 02460-0006
Director, Municipal Telecommunications Strategies Program
617-558-3698   fax: 617-630-8946
mfidelman@civicnet.org   http://civic.net/ccn.html

Information Infrastructure: Public Spaces for the 21st Century
Let's Start With: Internet Wall-Plugs Everywhere
Say It Often, Say It Loud: "I Want My Internet!"
**************************************************************************

Strata Rose Chalup

7:48 p.m.

Yes, very. The #coverage channel on slashnet had folks watching/listening to various conventional media, as well as monitoring international news sites, and posting updates and links via moderators. A tremendous amount of info came in that way, and usually scooped any individual media station.

I'd guess that setting up an IRC net for nanog-type operational traffic would be very helpful. Equally helpful would be gatewaying that net via packet radio on amateur frequencies. "Commercial" traffic is prohibited, but in a disaster this kind of thing would be equivalent to health-and-welfare traffic.

In fact, now that I recall, SANS was asking for amateur radio operators to send in contact info in June or July. They were talking about putting together a non-internet communications network to be used in case of serious virus/DoS/etc slams on the net. It doesn't take a rocket scientist to see that they're thinking InfoWar type scenarios. I don't know if the project was abandoned or if it got complexified into something more formal and thus slowed down. We never heard back from them.

...

Ham Radio Operators? The threat to critical Internet resources from distributed denial of service attack tools continues to increase. An effective emergency communications network may be of great value if damage is done to both the Internet and to phone systems. SANS is looking for ham and packet radio operators who are willing to take a leadership role to help establish and maintain an emergency communication channel. If you are qualified and interested please send an email telling us about your ham radio and computer security activities. Send it to info@sans.org with Emergency Communications Network in the subject line.

It would be worth bringing back FidoNet or similar in parallel with packet radio networks. A lot of packet radio is BBS-based, and doesn't necessarily network between BBSes. I'm pretty new to packet, so go check out some of the packet links on http://www.tapr.org/ (Tucson Amateur Packet Radio), one of the best sites on the net for packet stuff. These folks have been real pioneers in it.

If folks are interested in discussing this (packet nanog for emergencies, and/or irc comm net ditto) more, I'd be happy to host or set up a mailing list for it.

SRC

PS- And whether it was officially sanctioned or not, hats off to whoever put CNN's closed-caption feed onto IRC as well. Low-bandwidth news w/o the talking heads.

Miles Fidelman wrote:

...

--
========================================================================
Strata Rose Chalup [KF6NBZ]                      strata "@" virtual.net
VirtualNet Consulting                            http://www.virtual.net/
** Project Management & Architecture for ISP/ASP Systems Integration **
========================================================================

Kevin Loch

8:49 p.m.

Strata Rose Chalup wrote:

...

Yes, very. The #coverage channel on slashnet had folks watching/listening to various conventional media, as well as monitoring international news sites, and posting updates and links via moderators. A tremendous amount of info came in that way, and usually scooped any individual media station.

I'd guess that setting up an IRC net for nanog-type operational traffic would be very helpful. Equally helpful would be gatewaying that net via packet radio on amateur frequencies. "Commercial" traffic is prohibited, but in a disaster this kind of thing would be equivalent to health-and-welfare traffic.

This is a gray area. Certainly any traffic related to the immediate safety of life or property is permitted when "normal" communications services are unavailable. Here's the section of FCC rules Part 97 that is relevant:

http://www.arrl.org/FandES/field/regulations/news/part97/e.html

The main focus seems to be using the amateur service in place of disabled/overloaded communications systems for carrying traffic directly related to the rescue/relief efforts.

It would probably be a good idea to amend the rules to explicitly allow traffic related to restoring other communication services (including the Internet) damaged in a disaster. This could apply to helping wireline networks, broadcast stations and ISPs get back online, thereby using the "backup system" to help get the primary systems back online.

KL (N3KL)

bcc: w5jbp@arrl.org

Marshall Eubanks

5:49 p.m.

Daniel Golding wrote:

...

The big lessons seem to be these...

1) The Internet, as currently constituted makes a lousy news propagation method, for large audiences. The one to many model in unicast IP puts too large of a load on the source. Good multicast (which we don't have yet) may fix this. Until that happens, the TV is still a better broadcast news medium. Mechanisms like Akamai's Edgesuite are a pretty good solution until that occurs, as they distribute the load pattern, from a "one to many" to a "many to many" model.

Akamai did not work well Tuesday morning, at least for me. I do not know whether their servers were overloaded or couldn't get content from the source, but they did NOT work well as seen from here. Washington Post.com, for example, loaded ONCE for me before about 3:00 PM EDT, and I know that site is Akamaized.

Contrarily Yours
Marshall Eubanks

...

Vivien M.

5:57 p.m.

...

-----Original Message----- From: owner-nanog@merit.edu [mailto:owner-nanog@merit.edu]On Behalf Of Marshall Eubanks Sent: September 17, 2001 1:49 PM To: Daniel Golding Cc: Sean Donelan; nanog@merit.edu Subject: Re: What Worked - What Didn't

Akamai did not work well Tuesday morning, at least for me. I do not know whether their servers were overloaded, or couldn't get content from the source, but they did NOT work well as seen from here.

Washington Post.com, for example, loaded ONCE for me before about 3:00 PM EDT, and I know that site is Akamized.

Washingtonpost.com kept alternating between Akamaized and not Akamaized in my experience; I'm guessing that it takes some time for content to replicate across Akamai servers, so in the meantime they put the new content up locally, and once it was on all the Akamai servers, they changed their links to the Akamaized URL. For some reason though, it seemed that _all_ the links changed from Akamaized to not Akamaized and back and so on, and not just the new ones. It made for a rather ... odd situation.

Vivien

--
Vivien M.
vivienm@dyndns.org
Assistant System Administrator
Dynamic DNS Network Services
http://www.dyndns.org/

Patrick W. Gilmore

6:16 p.m.

At 01:57 PM 9/17/2001 -0400, Vivien M. wrote:

...

Washingtonpost.com kept alternating between Akamaized and not Akamaized in my experience; I'm guessing that it takes some time for content to replicate across Akamai servers, so in the meantime they put the new content up locally, and once it was on all the Akamai servers changed their links to the Akamaized URL. For some reason though, it seemed that _all_ the links changed from Akamaized or not Akamaized and back and so on, and not just the new ones. It made for a rather ... odd situation.

The customer controls whether an image, site, stream, or anything else is "Akamaized". And content is not replicated to any Akamai server until an end user "mapped" to that server requests it.

So, when a customer changes from a standard URL to an Akamaized URL, there is no wait time for the data to be pushed to all servers. The very first user asking for that content will be mapped to the nearest Akamai server, which will then pull the data down and give it to the user, saving a copy on its HD. Subsequent users will get the data directly from the hard drive.

This is a strictly technical post on how Akamai works. Akamai has absolutely no control over whether a content provider uses Akamai's system to distribute all, some, or none of their content.
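
The fetch-on-first-request behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not Akamai's implementation: the `EdgeCache` class, the `origin` function, and the URL are hypothetical names invented for the example, and a real cache would also handle expiry, user mapping, and origin failures.

```python
from typing import Callable, Dict


class EdgeCache:
    """Hypothetical pull-through cache: nothing is stored until a user asks for it."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_fetches = 0  # how many times the origin was contacted

    def get(self, url: str) -> bytes:
        # First request for this URL: pull from the origin and keep a local copy.
        if url not in self._store:
            self._store[url] = self._fetch(url)
            self.origin_fetches += 1
        # Every later request is answered from the local copy.
        return self._store[url]


def origin(url: str) -> bytes:
    """Stand-in for the content provider's own web server (an assumption)."""
    return f"content of {url}".encode()


cache = EdgeCache(origin)
for _ in range(1000):
    cache.get("/front-page.gif")
print(cache.origin_fetches)  # 1
```

In this model a thousand user requests cost the origin a single fetch, which matches the point that there is no push or replication delay when a URL is switched over.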

...

Vivien

-- TTFN, patrick

Daniel Golding

6:09 p.m.

Hmm. I don't work for Akamai, so I can't presume to speak for them, but... I specified EdgeSuite, rather than simply Akamaizing the links. I think that moving ALL content, rather than just some linked content, to distributed servers makes a big difference.

- Dan

-----Original Message-----
From: Marshall Eubanks [mailto:tme@21rst-century.com]
Sent: Monday, September 17, 2001 1:49 PM
To: Daniel Golding
Cc: Sean Donelan; nanog@merit.edu
Subject: Re: What Worked - What Didn't

Daniel Golding wrote:

...

Patrick W. Gilmore

6:26 p.m.

At 02:09 PM 9/17/2001 -0400, Daniel Golding wrote:

...

I specified Edgesuite, rather than simply akamizing the links. I think that moving ALL content, rather than just some linked content to distributed servers makes a big difference.

Again, a strictly technical post:

EdgeSuite does serve the entire page, and while it is possible that "moving ALL content" might take longer than just moving images, I (personally) believe that would perform better than Akamaizing only images during times of peak congestion.

EdgeSuite, much like FreeFlow, does not pre-populate servers. It requests content that has been requested of it. So when a user goes to an EdgeSuited site, they are sent to the nearest Akamai server. That Akamai server requests the HTML as well as individual objects, saves them to the hard drive, and serves them to the user. If no user requests a page, it will not be fetched.

So the first user may not experience a large performance increase, but they might; we have other behind-the-scenes tricks which sometimes help. Either way, they should not see a performance decrease. And all subsequent users should see a substantial performance increase.

From the standpoint of an origin server, it only sees one request per region of Akamai servers (upper bound, usually lower). With FreeFlow, the origin server has to serve HTML to *every* user, and only the large files (images, PDFs, other static content - whatever they tell us to deliver) are served by Akamai.
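
To make the origin-load difference concrete, here is a back-of-the-envelope Python sketch. It assumes the simplest reading of the description above (each edge region pulls each object at most once, and HTML is served either by the edge or by the origin to every user) and ignores cache expiry. The function name and all the numbers are hypothetical, chosen only for illustration.

```python
def origin_requests(users: int, regions: int, objects_per_page: int,
                    html_from_edge: bool) -> int:
    """Rough upper bound on requests reaching the origin for one page."""
    # Embedded objects (images etc.) are pulled at most once per edge region.
    object_fetches = regions * objects_per_page
    # HTML is pulled once per region if the edge serves it, else once per user.
    html_fetches = regions if html_from_edge else users
    return object_fetches + html_fetches


# Hypothetical numbers, not measurements from the event.
users, regions, objects = 1_000_000, 50, 20
print(origin_requests(users, regions, objects, html_from_edge=False))  # 1001000
print(origin_requests(users, regions, objects, html_from_edge=True))   # 1050
```

Under these assumptions the HTML term dominates when only the embedded objects are offloaded, which is why an overloaded origin can still fail users even though its images are served elsewhere.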

...

- Dan

-- TTFN, patrick

Ian Cooper

6:09 p.m.

At 13:49 9/17/2001 -0400, Marshall Eubanks wrote:

...

Akamai did not work well Tuesday morning, at least for me. I do not know whether their servers were overloaded, or couldn't get content from the source, but they did NOT work well as seen from here.

Washington Post.com, for example, loaded ONCE for me before about 3:00 PM EDT, and I know that site is Akamized.

But it would depend on how far the Akamaization had been taken. Typical use (Freeflow) would be for all the graphics to sit on the Akamai surrogates - that still means that you have to pull the initial HTML "glue" from the (overloaded) origin server. I guess the future will show whether the fast-moving news environment will choose to use the full Edgesuite environment (in the case of Akamai, let's not forget there are other CDNs out there), which would also deliver the initial HTML.

Randy Bush

6:18 p.m.

...

The big winners were cable TV, email, packet networks and IM applications. The big losers were cell phones, circuit switching, PSTN, non-akamized news sites.

no one went after the comms infrastructure. when they do, i suspect that we will find the internet is extremely vulnerable. how many folk even have md5 auth turned on their bgp peering sessions? what naivete!

randy

Patrick W. Gilmore

6:32 p.m.

At 11:18 AM 9/17/2001 -0700, Randy Bush wrote:

...

no one went after the comms infrastructure. when they do, i suspect that we will find the internet is extremely vulnerable. how many folk even have md5 auth turned on their bgp peering sessions? what naivete!

If someone can splice into my point-to-point OC system, fake being the router on the other end, and keep my peer from calling me and asking what happened, well, then I have MUCH bigger things to worry about than whether my BGP session is valid. (And he probably has the capability to do whatever he wants, no matter how hard I try to stop him.) As for public peering points, the ARP resolution would cause problems, and either I or my peer would notice pretty darned quickly. But only a small percentage of the traffic on the 'Net goes over public peering points these days anyway. Not sure where else anyone could use MD5 on their BGP. Maybe I missed something?

...

randy

-- TTFN, patrick

Valdis.Kletnieks@vt.edu

6:46 p.m.

On Mon, 17 Sep 2001 14:32:35 EDT, "Patrick W. Gilmore" <patrick@ianai.net> said:

...

If someone can splice into my point-to-point OC system, fake being the router on the other end, and keep my peer from calling me and asking what

You *do* do ingress and egress filtering of your own addresses, and have checked that your router does in fact use cryptographically challenging sequence numbers, right?

And even if you don't, using MD5 is not *that* expensive (or shouldn't be), and provides security in depth.

Unfortunately, I'll bet there's a LOT of routers that don't have filtering in place, don't have good sequence numbers, and don't use MD5. Enough said...

--
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech

Patrick W. Gilmore

7 p.m.

At 02:46 PM 9/17/2001 -0400, Valdis.Kletnieks@vt.edu wrote:

...

On Mon, 17 Sep 2001 14:32:35 EDT, "Patrick W. Gilmore" <patrick@ianai.net> said:

...
If someone can splice into my point-to-point OC system, fake being the router on the other end, and keep my peer from calling me and asking what

You *do* do ingress and egress filtering of your own addresses, and have checked that your router does in fact use cryptographically challenging sequence numbers, right?

I do not do anything. I Am Not An Isp. :)

But when I did run a network, I did *NOT* ingress filter on my own address space. I ran networks with multi-homed clients. If I did not allow my own address space to be announced to me, I would not have been able to talk to my multi-homed downstreams if their link to me was down. When a link to your upstream is down and you cannot send mail to noc@ through your second upstream, you tend to get a new upstream pretty quick.

I *ABSOLUTELY* believe in filtering customer announcements into my backbone. Been a big proponent of it for many years. Search the archives.

As for "cryptographically challenging sequence numbers", well, no, I have not inspected the code on any Cisco or Juniper routers lately. Whatever sequence numbers they use are the sequence numbers they use, and I ain't gonna hack the code to change it.

...

And even if you don't, using MD5 is not *that* expensive (or shouldn't be), and provides security in depth.

I do not *think* it would tax the CPU too much, but it has been at least 3 years since I have done it. IIRC, the CPU overhead was near nil. And it only provides security for the BGP session, not "in depth". I am not saying that is a bad thing, just mentioning the limitation.

...

Unfortunately, I'll bet there's a LOT of routers that don't have filtering in place, don't have good sequence numbers, and don't use MD5. Enough said...

Actually, I am still not certain why it was said at all. There are far, far more difficult hurdles to overcome when spoofing a BGP session between major carriers than the sequence numbers. And most people notice when a major peer goes down, very, very quickly. MD5 or not.

In fact, I would wager that the misdirected traffic due to the added configuration complexity (yes, one line, but trust me, it can be a bitch if you forget the line, or forget the password) would far outweigh any savings you got from stopping attacks. But there is no way to tell for certain, since this type of attack is practically unheard of. (Or perhaps that is a way to tell? :)

...

Valdis Kletnieks

-- TTFN, patrick

Alex Bligh

8:18 p.m.

--On Monday, 17 September, 2001 2:32 PM -0400 "Patrick W. Gilmore" <patrick@ianai.net> wrote:

...

Maybe I missed something?

Only all the well documented attacks (including DoS). Think about sending RST to BGP port (and other random ports) on your routers.

-- Alex Bligh Personal Capacity

Patrick W. Gilmore

8:21 p.m.

At 09:18 PM 9/17/2001 +0100, Alex Bligh wrote:

...

--On Monday, 17 September, 2001 2:32 PM -0400 "Patrick W. Gilmore" <patrick@ianai.net> wrote:

...
Maybe I missed something?

Only all the well documented attacks (including DoS). Think about sending RST to BGP port (and other random ports) on your routers.

I was under the impression that MD5 would not stop an RST attack. Is that incorrect? And if you filtered on source IP for all your downstreams, this would solve that problem. (Unless the attacker was a major carrier, in which case he may very well be in possession of your MD5 passphrase.)

...

Alex Bligh

-- TTFN, patrick

John Payne

8:47 p.m.

On Mon, Sep 17, 2001 at 09:18:57PM +0100, Alex Bligh wrote:

...

--On Monday, 17 September, 2001 2:32 PM -0400 "Patrick W. Gilmore" <patrick@ianai.net> wrote:

...
Maybe I missed something?

Only all the well documented attacks (including DoS). Think about sending RST to BGP port (and other random ports) on your routers.

Would MD5 on the BGP session prevent this? Surely, it's at the wrong "layer"

-- John Payne http://sackheads.org/jpayne/ john@sackheads.org http://sackheads.org/uce/ Fax: +44 870 0547954 To send me mail, use the address in the From: header

John Kristoff

18 Sep

10:12 p.m.

John Payne wrote:

...

Would MD5 on the BGP session prevent this? Surely, it's at the wrong "layer"

It's actually a TCP protocol option that happens to only be used by BGP on 'top of it'.

John

alex＠yuriev.com

10:34 p.m.

...

John Payne wrote:

...
Would MD5 on the BGP session prevent this? Surely, it's at the wrong "layer"

It's actually a TCP protocol option that happens to only be used by BGP on 'top of it'.

And what would be the offset of this special tcp option that is used only by bgp in the tcp header?

Alex
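For reference, RFC 2385 (cited later in the thread) defines the signature as a TCP option of kind 19 and length 18. Like any TCP option it sits in the options area after the 20-byte base header and is located by its kind and length bytes, not at a fixed offset. The digest covers the TCP pseudo-header, the base TCP header with the checksum taken as zero, the segment data, and a shared key; a receiver drops any segment whose digest does not verify, which includes forged RSTs. The Python below is a rough sketch of that computation, not router code: the function names, addresses, ports, and key are made up for illustration.

```python
import hashlib
import socket
import struct

TCP_MD5_KIND = 19   # option kind assigned in RFC 2385
TCP_MD5_LEN = 18    # kind byte + length byte + 16-byte digest
IPPROTO_TCP = 6


def tcp_md5_digest(src_ip, dst_ip, base_header, payload, key):
    """MD5 over pseudo-header, base TCP header (checksum zeroed), data, key."""
    if len(base_header) != 20:
        raise ValueError("expected the 20-byte TCP header without options")
    # Data offset (upper nibble of byte 12) counts 32-bit words, options included.
    segment_len = (base_header[12] >> 4) * 4 + len(payload)
    pseudo = struct.pack("!4s4sBBH",
                         socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
                         0, IPPROTO_TCP, segment_len)
    # The checksum field (bytes 16-17) is treated as zero for the digest.
    zeroed = base_header[:16] + b"\x00\x00" + base_header[18:]
    return hashlib.md5(pseudo + zeroed + payload + key).digest()


def md5_option(digest):
    """Encode the option as it appears in the TCP options area."""
    return bytes([TCP_MD5_KIND, TCP_MD5_LEN]) + digest


# Hypothetical RST from port 179; data offset 10 words = 20-byte header
# plus the 18-byte option and 2 bytes of padding.
rst_header = struct.pack("!HHIIBBHHH", 179, 40000, 1000, 0, 0xA0, 0x04, 0, 0, 0)
digest = tcp_md5_digest("192.0.2.1", "192.0.2.2", rst_header, b"", b"example-key")
option = md5_option(digest)
```

An attacker who does not know the key cannot produce a matching digest for a spoofed segment, so the forged RST is discarded before it can tear down the session.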

Vadim Antonov

17 Sep

7:56 p.m.

On Mon, 17 Sep 2001, Randy Bush wrote:

...

...
The big winners were cable TV, email, packet networks and IM applications. The big losers were cell phones, circuit switching, PSTN, non-akamized news sites.

no one went after the comms infrastructure. when they do, i suspect that we will find the internet is extremely vulnerable. how many folk even have md5 auth turned on their bgp peering sessions? what naivete!

randy

All US long-distance telephony infrastructure can be effectively disabled by a couple dozen or so backhoes digging in the right places. Even competing carriers often share cables.

--vadim

Daniel Golding

8:11 p.m.

Gee, the only major ISP that uses MD5 for peering links is Verio. That what you were looking for, Randy? :)

Seriously, BGP session hijacking is the least of our worries. If you want to hit internet infrastructure, the points of weakness are obvious and physical. Car bombs at a dozen sites that we all know so well would be enough to seriously degrade internet communications, particularly if they were detonated near the fiber entrance facilities.

This underscores the previous concerns mentioned by some about the common colocation of private peering by major internet carriers. Looks a little riskier now, yes?

- Daniel Golding

-----Original Message-----
From: Randy Bush [mailto:randy@psg.com]
Sent: Monday, September 17, 2001 2:19 PM
To: Daniel Golding
Cc: nanog@merit.edu
Subject: RE: What Worked - What Didn't

...

The big winners were cable TV, email, packet networks and IM applications. The big losers were cell phones, circuit switching, PSTN, non-akamized news sites.

Randy Bush

8:14 p.m.

...

Gee, the only major ISP that uses MD5 for peering links is Verio.

i believe that statement to be false

randy

Daniel Golding

8:26 p.m.

Feel free to enlighten the masses, but in my previous experience at ISPs which had reasonably extensive peering, Verio was the only one to require MD5.

The salient point here is that this is not a widely adopted practice. If you feel it should be, by all means, make your case, as the internet community is probably more open to proposals designed to strengthen security now than at most other times.

- Daniel Golding

-----Original Message-----
From: Randy Bush [mailto:randy@psg.com]
Sent: Monday, September 17, 2001 4:14 PM
To: Daniel Golding
Cc: nanog@merit.edu
Subject: RE: What Worked - What Didn't

...

Gee, the only major ISP that uses MD5 for peering links is Verio.

i believe that statement to be false

randy

Randy Bush

8:44 p.m.

...

The salient point here is that this is not a widely adopted practice. If you feel it should be, by all means, make your case, as the internet community is probably more open to proposals designed to strengthen security now than at most other times.

it is not a great defense, but it's some defense. like all security efforts, it is not a cure but raises the barrier. i see no reason for inter-isp peering and intra-isp ibgp not to be covered fairly quickly. i would suggest having one's provisioning folk work with bgp customers to close that avenue as well, starting with the more critical customers. also, think about your igp.

randy

Chris Woodfield

8:51 p.m.

I can think of one particular ISP's POP where the fiber comes into the building from a conduit that comes out of the ground, into a small metal box, and then into the front of the building. In front of this exposed conduit, a small bush was planted. At the time, I joked about how one well-placed shotgun blast from a car in the parking lot would be all it took to destroy most, if not all, of that building's connectivity.

As an employee of one of the many companies who have service points at 25 Broadway, I think I'll stop joking about things like that.

-C

On Mon, Sep 17, 2001 at 04:11:26PM -0400, Daniel Golding wrote:

...

Gee, the only major ISP that uses MD5 for peering links is Verio. That what you were looking for, Randy? :)

Seriously, BGP session hijacking is the least of our worries. If you want to hit internet infrastructure, the points of weakness are obvious and physical. Car bombs at a dozen sites that we all know so well would be enough to seriously degrade internet communications, particularly if they were detonated near the fiber entrance facilities.

This underscores the previous concerns mentioned by some about the common colocation of private peering by major internet carriers. Looks a little riskier now, yes?

- Daniel Golding

-----Original Message-----
From: Randy Bush [mailto:randy@psg.com]
Sent: Monday, September 17, 2001 2:19 PM
To: Daniel Golding
Cc: nanog@merit.edu
Subject: RE: What Worked - What Didn't

...
The big winners were cable TV, email, packet networks and IM applications. The big losers were cell phones, circuit switching, PSTN, non-akamized news sites.

no one went after the comms infrastructure. when they do, i suspect that we will find the internet is extremely vulnerable. how many folk even have md5 auth turned on their bgp peering sessions? what naivete!

randy

Randy Bush

9:32 p.m.

...

I can think of one particular ISP's POP where the fiber comes into the building from a conduit that comes out of the ground ...

while i support and encourage physical hardening, folk may want to do some minor things about logical attacks in-band that can be conducted from a comfortable chair 10,000km away.

randy

Leo Bicknell

18 Sep

12:06 a.m.

On Mon, Sep 17, 2001 at 04:51:17PM -0400, Chris Woodfield wrote:

...

I can think of one particular ISP's POP where the fiber comes into the building from a conduit that comes out of the ground, into a small metal

I can think of an ISP where you can call a 1-800 number and have their fiber carefully painted in orange all over the ground in 48 hours. Oh wait, that was all of them. To quote the commercial, "it's not just a good idea, it's the law."

-- Leo Bicknell - bicknell@ufp.org Systems Engineer - Internetworking Engineer - CCIE 3440 Read TMBG List - tmbg-list-request@tmbg.org, www.tmbg.org

Iljitsch van Beijnum

17 Sep

8:41 p.m.

On Mon, 17 Sep 2001, Randy Bush wrote:

...

...
The big winners were cable TV, email, packet networks and IM applications. The big losers were cell phones, circuit switching, PSTN, non-akamized news sites.

...

no one went after the comms infrastructure. when they do, i suspect that we will find the internet is extremely vulnerable.

"Extremely" may be too strong, but certainly "much more than we want". We multihome in The Netherlands and both our transit ISPs connect to the US in the Washington/New York area, with no real backup. I've heard some telcos talk about networks that span the globe, but as far as I can tell, nearly all traffic from Europe to Africa, Asia/Pacific and South America goes through the US. So apparently the cables are there but they aren't used. And even for the US West Coast satellite is a reasonable alternative with just 50% longer RTTs than sea/land based connections.

...

how many folk even have md5 auth turned on their bgp peering sessions?

How much kerosine can MD5 withstand exactly?

But speaking of BGP: what concerns me is the very long timeouts. When a BGP router loses power, it takes minutes for the peer on the other side of the connection to notice something is wrong and reroute the traffic. In the meantime, a lot of traffic has been lost, even though there could have been an alternative path available all along. Fortunately, the power down at 25 Broadway was a controlled one so we didn't have this problem last week.

Randy Bush

8:53 p.m.

...

...
how many folk even have md5 auth turned on their bgp peering sessions?

How much kerosine can MD5 withstand exactly?

folk may want to read rfc 2385. better measures would likely be appreciated.

...

When a BGP router loses power, it takes minutes for the peer on the other side of the connection to notice something is wrong and reroute the traffic.

as i do not see this in rfc 1771 or draft-ietf-idr-bgp4-1[23].txt, i suspect that this is implementation specific.

randy

Iljitsch van Beijnum

9:31 p.m.

On Mon, 17 Sep 2001, Randy Bush wrote:

...

...
When a BGP router loses power, it takes minutes for the peer on the other side of the connection to notice something is wrong and reroute the traffic.

...

as i do not see this in rfc 1771 or draft-ietf-idr-bgp4-1[23].txt, i suspect that this is implementation specific.

You are right. From the RFC:

"The suggested value for the Hold Time is 90 seconds. The suggested value for the KeepAlive timer is 30 seconds."

and Cisco's defaults seem to be twice that. Is there any reason these values should be this high? I mean, other than to mimic RIP behavior? Fortunately, the lower of the values configured on both peers is used, so this can easily be changed to 3/1 seconds. But people still have to do it.

Don't think this is a trivial issue. When the Amsterdam Internet Exchange lost power a couple of months ago, we couldn't reach most of Europe for about ten minutes when iBGP sessions of one of our transit ISPs started to time out as they ran out of battery power. I'm still not sure why all of this took this long, but three minutes are pretty much guaranteed on any Cisco running BGP with "out of the box" timers over a switched layer 2 network or when "no fast-external-fallover" is in effect.
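The timer arithmetic above can be sketched in a few lines of Python. This is only an illustration under the rules in RFC 1771 (each side proposes a Hold Time in its OPEN message, the smaller value is used, and the value must be zero or at least three seconds); the function names are invented for the example, and the one-third keepalive ratio is the RFC's suggested maximum interval, which individual implementations may not follow exactly.

```python
def negotiate_hold_time(local_hold, peer_hold):
    """Return the hold time both peers will use: the smaller proposal.

    A result of 0 means keepalives and the hold timer are disabled.
    """
    for value in (local_hold, peer_hold):
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")
    return min(local_hold, peer_hold)


def keepalive_interval(hold_time):
    """Suggested keepalive interval: one third of the hold time."""
    return hold_time // 3


# Both sides on 180-second defaults: a silently dead peer can go
# unnoticed for up to three minutes.
default_hold = negotiate_hold_time(180, 180)

# One side lowered to 3 seconds: the lower value wins for the session.
tuned_hold = negotiate_hold_time(180, 3)
tuned_keepalive = keepalive_interval(tuned_hold)
```

When the link layer gives no indication of failure, the negotiated hold time is roughly the worst-case delay before the session is declared down, so configuring a lower value on one side is enough to shorten it.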

Randy Bush

9:39 p.m.

...

...
...
When a BGP router loses power, it takes minutes for the peer on the other side of the connection to notice something is wrong and reroute the traffic.

as i do not see this in rfc 1771 or draft-ietf-idr-bgp4-1[23].txt, i suspect that this is implementation specific.

"The suggested value for the Hold Time is 90 seconds. The suggested value for the KeepAlive timer is 30 seconds."

the hello timer != noticing an interface has gone hard down, which i thought was your original point.

randy

Iljitsch van Beijnum

10:10 p.m.

On Mon, 17 Sep 2001, Randy Bush wrote:

...

...
"The suggested value for the Hold Time is 90 seconds. The suggested value for the KeepAlive timer is 30 seconds."

...

the hello timer != noticing an interface has gone hard down, which i thought was your original point.

If there are no other means to detect the datalink session between two BGP speakers has gone south, the hold timer will do this. For POS and most other point-to-point links the "fast-external-fallover" will usually kick in sooner because datalink layer keepalives aren't received anymore (some advocate "no fast-external-fallover" to avoid flapping, though), but in switched environments such as Ethernet or ATM the interface may stay up while there is no connectivity. In those circumstances, the hold timer is the only thing that will break the BGP session eventually.

Mike Lewinski

9:38 p.m.

...

But speaking of BGP: what concerns me is the very long timeouts. When a BGP router loses power, it takes minutes for the peer on the other side of the connection to notice something is wrong and reroute the traffic.

This IOS command should remedy that under most circumstances:

bgp fast-external-fallover

Mike
