External Events (was Re: www.etrade.com has no DNS A record !)
On Tue, 25 January 2000, John Hawkinson wrote:
Is your goal to get the word out to network providers of people who use E*TRADE? Do you really expect that many of them will forward this announcement or make good use of it? Should a message be sent to NANOG every time CNN, Netscape, or Yahoo go down?
Am I missing something here? [Like a sense of humor?]
External events have an effect on network service and network operators. Why do most NOCs have one or more monitors tuned to CNN and the Weather channel all day and all night? Ok, I know the real reason, but what is the reason the sales people tell prospective clients?
The question is really one of editorial policy and how significant is any individual event. I don't think there is really one answer which can cover everything.
The Internet (RTM) worm affected only VAX and Sun computers, an estimated 10% of the Internet of the day. If you didn't use Sun or VAXen, it would have been an irrelevant event for you. When AOL forgot to put a GUARDIAN password on its domains, and they were changed to a tiny ISP, if you didn't use AOL it may have been irrelevant to you. When Cisco, Bay and GATED BGP implementations had a disagreement on whether ASNs could be repeated in an as-path, it may have been irrelevant to you if you used a different BGP implementation or router.
Whether a particular NSI problem, an E*Trade problem, or an Ebay problem, or a Cisco CCO problem is really significant enough to talk about semi-publicly is tough. It would be nice if each company was willing to make timely disclosures about problems. But as we've seen time and time again, companies would prefer never to acknowledge they had any problem until it becomes impossible to ignore (e.g. Worldcom's 10 days of hell last summer).
On Tue, Jan 25, 2000 at 08:40:46PM -0800, Sean Donelan wrote:
On Tue, 25 January 2000, John Hawkinson wrote:
Is your goal to get the word out to network providers of people who use E*TRADE? Do you really expect that many of them will forward this announcement or make good use of it? Should a message be sent to NANOG every time CNN, Netscape, or Yahoo go down?
Am I missing something here? [Like a sense of humor?]
External events have an effect on network service and network operators. Why do most NOCs have one or more monitors tuned to CNN and the Weather channel all day and all night? Ok, I know the real reason, but what is the reason the sales people tell prospective clients?
generally, we had CNN/TWC on because we weren't allowed to have wrestling and tractor pulls on during business hours m-f. we were allowed football on the weekends and mnf. occasionally there was a useful bit of news, but more often we were too busy to notice.
The question is really one of editorial policy and how significant is any individual event. I don't think there is really one answer which can cover everything.
I think the question is more one of propagation speed. sometimes, there is actually information on this mailing list of a useful nature delivered before it can be obtained through normal channels. frequently it's information about a fiber cut, and we have you to thank for it. thank you, btw.
The Internet (RTM) worm affected only VAX and Sun computers, an estimated 10% of the Internet of the day. If you didn't use Sun or VAXen, it would have been an irrelevant event for you. When AOL forgot to put a GUARDIAN password on its domains, and they were changed to a tiny ISP, if you didn't use AOL it may have been irrelevant to you. When Cisco, Bay and GATED BGP implementations had a disagreement on whether ASNs could be repeated in an as-path, it may have been irrelevant to you if you used a different BGP implementation or router.
in today's internet, whether it affects anyone on this list specifically, it will most likely affect the customers of the majority on this list. whether or not any of us wish to admit it, customers are the reason we operate our networks. when things like this happen, and affect our customers, they will call our customer support people, who will in turn look to the NOC for answers. working in a NOC, I know it is very nice to be able to look my brethren in the eyes and tell them what the problem is, and what they should tell our customers about it. believe it or not, there are people on this list who actually operate networks and we would like to know why there are thousands of people calling support to ask why they can't trade stocks; and we want to know yesterday.
Whether a particular NSI problem, an E*Trade problem, or an Ebay problem, or a Cisco CCO problem is really significant enough to talk about semi-publicly is tough. It would be nice if each company was willing to make timely disclosures about problems. But as we've seen time and time again, companies would prefer never to acknowledge they had any problem until it becomes impossible to ignore (e.g. Worldcom's 10 days of hell last summer).
this is frequently because these folk have made unrealistic promises to their customers that they can maintain the illusion of keeping so long as news of the failure is kept internal.
--
Sam Thomas
Geek Mercenary
On Tue, 25 January 2000, John Hawkinson wrote:
Is your goal to get the word out to network providers of people who use E*TRADE? Do you really expect that many of them will forward this announcement or make good use of it? Should a message be sent to NANOG every time CNN, Netscape, or Yahoo go down?
Am I missing something here? [Like a sense of humor?]
On Tue, Jan 25, 2000 at 08:40:46PM -0800, Sean Donelan wrote:
External events have an effect on network service and network operators. Why do most NOCs have one or more monitors tuned to CNN and the Weather channel all day and all night? Ok, I know the real reason, but what is the reason the sales people tell prospective clients?
The question is really one of editorial policy and how significant is any individual event. I don't think there is really one answer which can cover everything.
This is true. That is part of why I asked the question the way I did:
Is your goal to get the word out to network providers of people who use E*TRADE? Do you really expect that many of them will forward this announcement or make good use of it? Should a message be sent to NANOG every time CNN, Netscape, or Yahoo go down?
While most people interpreted it rhetorically, it was actually asked with a significant literal component. When asking the list a question like this, though, it's hard to know how to contend with the potential silent majority versus the exuberant minority (I've heard from some people who agreed with the position I espoused).
It appears that there is a significant population among the NANOG readership who benefit from this sort of notification. Personally, I believe that the notification is useful and valuable; however, my opinion is mostly that NANOG is not the right place for it.
This is an opinion I have held for a long time, and it was solidified back when a mailing list called nsr@merit.edu existed. I believe it stood for "Network Status Reporting". It's awful hard to find archives of it any more (hey, merit!), but google.com has one message cached which demonstrates the flavor:
| To: nsr@merit.edu
| Subject: 07/22/94 NSFNET Backbone Unreachable 10:00 - 11:00 UTC
| From: ANS Network Operations Center <noc@noc.ans.net>
| Date: Fri, 22 Jul 1994 11:19:20 GMT
|
| 07/22/94 NSFNET Backbone Unreachable 10:00 - 11:00 GMT.
|
| At 10:00 UTC gated exited on all core routers and ENSS's.
| All networks announced by NSFNET sites were unreachable or
| experienced varying degrees of instability during this window
| while gated was restarted across the NSFNET backbone. The
| cause of this outage is currently being pursued by our engineers.
|
| Stephen Powell
| ANS Network Operations Center
Well, though in many cases notifications were sent to nsr about circuit outages and individual ENSS outages. I believe the charter of the list said that it was appropriate for all sorts of outage reporting, not simply NSFnet backbone reports; however, I seem to rarely remember that ever happening, even then.
Similarly, the Internet Monthly Report from Anne Cooper at ISI would summarize notable events, and regionals (and anybody else, it seemed) would submit monthly reports of significant events.
You didn't see discussion of high-level issues on the NSR list, and that was the right thing; issue-discussion was separate from operational notification. I find that separation to be incredibly useful. Perhaps it is because at this point I deal less with day-to-day operational issues (company scaling), but I think even in the heyday I would have felt the same.
Bill Simpson points out:
/ In the case of a small rural ISP with less than 4,000 customers, an
/ amazing number of folks called about our "problem", and the NANOG list
/ is just about the first place I look for a heads up or explanation.
And of course, NANOG doesn't carry information about most of these outages, and while I think it should not, that doesn't mean I think that those outages should go unreported.
I would propose that we consider creating a mechanism for that sort of outage reporting. It seems to me that there are two broad categories:
a) Official outage reporting from the organization experiencing the outage
b) Unofficial outage reporting from someone affected by the outage.
Both are valuable and occur in different ways, and unfortunately it is the case that in today's business climate, the latter is likely to be more accurate and detailed.
The obvious implementations that occur to me are
i) A mailing list like NSR; just bring it back, potentially moderate it to ensure that the usage is consistent with the charter, and redirect postings from NANOG to such a list.
ii) A web-based format where people can note outages, and comment on them usefully (perhaps a la slashdot?).
I think both of those ideas could work, though both have been tried and not worked very well for various reasons [what ever happened to outage@dal.net?].
I would ask, however, that someone *not* take this message as the impetus to go out and set up such a thing, but instead try to listen to reasoned discussion and coordinate it with the community.
Back to Sean:
The Internet (RTM) worm affected only VAX and Sun computers, an estimated 10% of the Internet of the day. If you didn't use Sun or VAXen, it would have been an irrelevant event for you.
Not only that, it affected *hosts* (unless of course, you were using Suns or VAXen as gateways, as I'm sure many people were). Surely hosts are outside the scope of nanog? ;-) Seriously, though, I think it is terribly unfair to compare something like an Internet-wide worm to a simple DNS misconfiguration. The latter is one person's problem and can be fixed with a quick phone call to the right person (Assuming you can find that person, 20 phone calls later), whereas the former is a huge management problem that cannot be easily dealt with.
When AOL forgot to put a GUARDIAN password on its domains, and they were changed to a tiny ISP, if you didn't use AOL it may have been irrelevant to you.
For the most part, yes, though I believe that this caused real operational effects for large volumes of mail queued on mail servers of network providers in North America, and so was operationally relevant. Failed DNS queries to E*TRADE just don't have the same level of visibility. They may affect customers equally, but they affect providers not-at-all.
When Cisco, Bay and GATED BGP implementations had a disagreement on whether ASNs could be repeated in an as-path, it may have been irrelevant to you if you used a different BGP implementation or router.
You're being really off-the-wall here. It's quite clear that a statistically significant fraction of North American network operators use those implementations, so discussion is merited. Especially because there is *something* to discuss, not merely "Oh, look, it's broken. We can now wait until they fix it."
Whether a particular NSI problem, an E*Trade problem, or an Ebay problem, or a Cisco CCO problem is really significant enough to talk about semi-publicly is tough. It would be nice if each company was willing to make timely disclosures about problems.
E*TRADE's annual report for 1999 makes some disclosures about infrastructure failures, by the way.
But as we've seen time and time again, companies would prefer never to acknowledge they had any problem until it becomes impossible to ignore (e.g. Worldcom's 10 days of hell last summer).
Indeed. Just because they should be reported doesn't mean they should be reported to NANOG.
I think outage notification and operational issue discussion are different things and should go to different places.
That worked well for the NSFnet with nsr@merit.edu split from regional-techs@merit.edu, and the Internet has only grown since then, and the scaling benefits would be much more sizable.
Opinions?
--jhawk
I support John in his opinion that NANOG is not an appropriate forum for real-time outage reporting.
-alan
Thus spake John Hawkinson (jhawk@bbnplanet.com) on or about Wed, Jan 26, 2000 at 10:48:56AM -0500:
On Tue, 25 January 2000, John Hawkinson wrote:
Is your goal to get the word out to network providers of people who use E*TRADE? Do you really expect that many of them will forward this announcement or make good use of it? Should a message be sent to NANOG every time CNN, Netscape, or Yahoo go down?
Am I missing something here? [Like a sense of humor?]
On Tue, Jan 25, 2000 at 08:40:46PM -0800, Sean Donelan wrote:
External events have an effect on network service and network operators. Why do most NOCs have one or more monitors tuned to CNN and the Weather channel all day and all night? Ok, I know the real reason, but what is the reason the sales people tell prospective clients?
The question is really one of editorial policy and how significant is any individual event. I don't think there is really one answer which can cover everything.
This is true. That is part of why I asked the question the way I did:
Is your goal to get the word out to network providers of people who use E*TRADE? Do you really expect that many of them will forward this announcement or make good use of it? Should a message be sent to NANOG every time CNN, Netscape, or Yahoo go down?
While most people interpreted it rhetorically, it was actually asked with a significant literal component. When asking the list a question like this, though, it's hard to know how to contend with the potential silent majority versus the exuberant minority (I've heard from some people who agreed with the position I espoused).
It appears that there is a significant population among the NANOG readership who benefit from this sort of notification. Personally, I believe that the notification is useful and valuable; however, my opinion is mostly that NANOG is not the right place for it.
This is an opinion I have held for a long time, and it was solidified back when a mailing list called nsr@merit.edu existed. I believe it stood for "Network Status Reporting". It's awful hard to find archives of it any more (hey, merit!), but google.com has one message cached which demonstrates the flavor:
| To: nsr@merit.edu
| Subject: 07/22/94 NSFNET Backbone Unreachable 10:00 - 11:00 UTC
| From: ANS Network Operations Center <noc@noc.ans.net>
| Date: Fri, 22 Jul 1994 11:19:20 GMT
|
| 07/22/94 NSFNET Backbone Unreachable 10:00 - 11:00 GMT.
|
| At 10:00 UTC gated exited on all core routers and ENSS's.
| All networks announced by NSFNET sites were unreachable or
| experienced varying degrees of instability during this window
| while gated was restarted across the NSFNET backbone. The
| cause of this outage is currently being pursued by our engineers.
|
| Stephen Powell
| ANS Network Operations Center
Well, though in many cases notifications were sent to nsr about circuit outages and individual ENSS outages. I believe the charter of the list said that it was appropriate for all sorts of outage reporting, not simply NSFnet backbone reports; however, I seem to rarely remember that ever happening, even then.
Similarly, the Internet Monthly Report from Anne Cooper at ISI would summarize notable events, and regionals (and anybody else, it seemed) would submit monthly reports of significant events.
You didn't see discussion of high-level issues on the NSR list, and that was the right thing; issue-discussion was separate from operational notification. I find that separation to be incredibly useful. Perhaps it is because at this point I deal less with day-to-day operational issues (company scaling), but I think even in the heyday I would have felt the same.
Bill Simpson points out:
/ In the case of a small rural ISP with less than 4,000 customers, an
/ amazing number of folks called about our "problem", and the NANOG list
/ is just about the first place I look for a heads up or explanation.
And of course, NANOG doesn't carry information about most of these outages, and while I think it should not, that doesn't mean I think that those outages should go unreported.
I would propose that we consider creating a mechanism for that sort of outage reporting. It seems to me that there are two broad categories:
a) Official outage reporting from the organization experiencing the outage
b) Unofficial outage reporting from someone affected by the outage.
Both are valuable and occur in different ways, and unfortunately it is the case that in today's business climate, the latter is likely to be more accurate and detailed.
The obvious implementations that occur to me are
i) A mailing list like NSR; just bring it back, potentially moderate it to ensure that the usage is consistent with the charter, and redirect postings from NANOG to such a list.
ii) A web-based format where people can note outages, and comment on them usefully (perhaps a la slashdot?).
I think both of those ideas could work, though both have been tried and not worked very well for various reasons [what ever happened to outage@dal.net?].
I would ask, however, that someone *not* take this message as the impetus to go out and set up such a thing, but instead try to listen to reasoned discussion and coordinate it with the community.
Back to Sean:
The Internet (RTM) worm affected only VAX and Sun computers, an estimated 10% of the Internet of the day. If you didn't use Sun or VAXen, it would have been an irrelevant event for you.
Not only that, it affected *hosts* (unless of course, you were using Suns or VAXen as gateways, as I'm sure many people were). Surely hosts are outside the scope of nanog? ;-) Seriously, though, I think it is terribly unfair to compare something like an Internet-wide worm to a simple DNS misconfiguration. The latter is one person's problem and can be fixed with a quick phone call to the right person (Assuming you can find that person, 20 phone calls later), whereas the former is a huge management problem that cannot be easily dealt with.
When AOL forgot to put a GUARDIAN password on its domains, and they were changed to a tiny ISP, if you didn't use AOL it may have been irrelevant to you.
For the most part, yes, though I believe that this caused real operational effects for large volumes of mail queued on mail servers of network providers in North America, and so was operationally relevant. Failed DNS queries to E*TRADE just don't have the same level of visibility. They may affect customers equally, but they affect providers not-at-all.
When Cisco, Bay and GATED BGP implementations had a disagreement on whether ASNs could be repeated in an as-path, it may have been irrelevant to you if you used a different BGP implementation or router.
You're being really off-the-wall here. It's quite clear that a statistically significant fraction of North American network operators use those implementations, so discussion is merited. Especially because there is *something* to discuss, not merely "Oh, look, it's broken. We can now wait until they fix it."
Whether a particular NSI problem, an E*Trade problem, or an Ebay problem, or a Cisco CCO problem is really significant enough to talk about semi-publicly is tough. It would be nice if each company was willing to make timely disclosures about problems.
E*TRADE's annual report for 1999 makes some disclosures about infrastructure failures, by the way.
But as we've seen time and time again, companies would prefer never to acknowledge they had any problem until it becomes impossible to ignore (e.g. Worldcom's 10 days of hell last summer).
Indeed. Just because they should be reported doesn't mean they should be reported to NANOG.
I think outage notification and operational issue discussion are different things and should go to different places.
That worked well for the NSFnet with nsr@merit.edu split from regional-techs@merit.edu, and the Internet has only grown since then, and the scaling benefits would be much more sizable.
Opinions?
--jhawk
It has a widely-held reputation in the industry as being one of, if not the, best mailing lists for such. I personally know of several people who subscribed for exactly that reason, myself among them.
At 12:19 PM 1/29/2000 -0800, you wrote:
I support John in his opinion that NANOG is not an appropriate forum for real-time outage reporting.
participants (5)
- Alan Hannan
- John Hawkinson
- Sam Thomas
- Sean Donelan
- Shawn McMahon