On Tue, 25 January 2000, John Hawkinson wrote:
Is your goal to get the word out to network providers of people who use E*TRADE? Do you really expect that many of them will forward this announcement or make good use of it? Should a message be sent to NANOG every time CNN, Netscape, or Yahoo go down?
Am I missing something here? [Like a sense of humor?]
On Tue, Jan 25, 2000 at 08:40:46PM -0800, Sean Donelan wrote:
External events have an effect on network service and network operators. Why do most NOCs have one or more monitors tuned to CNN and the Weather Channel all day and all night? Ok, I know the real reason, but what is the reason the sales people tell prospective clients?
The question is really one of editorial policy and how significant is any individual event. I don't think there is really one answer which can cover everything.
This is true. That is part of why I asked the question the way I did:
Is your goal to get the word out to network providers of people who use E*TRADE? Do you really expect that many of them will forward this announcement or make good use of it? Should a message be sent to NANOG every time CNN, Netscape, or Yahoo go down?
While most people interpreted it rhetorically, it was actually asked with a significant literal component. When asking the list a question like this, though, it's hard to know how to contend with the potential silent majority versus the exuberant minority (I've heard from some people who agreed with the position I espoused). It appears that there is a significant population among the NANOG readership who benefit from this sort of notification.
Personally, I believe that the notification is useful and valuable; however, my opinion is mostly that NANOG is not the right place for it. This is an opinion I have held for a long time, and it was solidified back when a mailing list called nsr@merit.edu existed. I believe it stood for "Network Status Reporting". It's awful hard to find archives of it any more (hey, merit!), but google.com has one message cached which demonstrates the flavor:
| To: nsr@merit.edu
| Subject: 07/22/94 NSFNET Backbone Unreachable 10:00 - 11:00 UTC
| From: ANS Network Operations Center <noc@noc.ans.net>
| Date: Fri, 22 Jul 1994 11:19:20 GMT
|
| 07/22/94 NSFNET Backbone Unreachable 10:00 - 11:00 GMT.
|
| At 10:00 UTC gated exited on all core routers and ENSS's.
| All networks announced by NSFNET sites were unreachable or
| experienced varying degrees of instability during this window
| while gated was restarted across the NSFNET backbone. The
| cause of this outage is currently being pursued by our engineers.
|
| Stephen Powell
| ANS Network Operations Center
Well, in many cases notifications were sent to nsr about circuit outages and individual ENSS outages, though. I believe the charter of the list said that it was appropriate for all sorts of outage reporting, not simply NSFnet backbone reports; however, I rarely remember that happening, even then.
Similarly, the Internet Monthly Report from Anne Cooper at ISI would summarize notable events, and regionals (and anybody else, it seemed) would submit monthly reports of significant events. You didn't see discussion of high-level issues on the NSR list, and that was the right thing; issue-discussion was separate from operational notification. I find that separation to be incredibly useful. Perhaps it is because at this point I deal less with day-to-day operational issues (company scaling), but I think even in the heyday I would have felt the same.
Bill Simpson points out:
/ In the case of a small rural ISP with less than 4,000 customers, an
/ amazing number of folks called about our "problem", and the NANOG list
/ is just about the first place I look for a heads up or explanation.
And of course, NANOG doesn't get information about most of these outages, and while I think it should not, that doesn't mean I think that those outages should go unreported. I would propose that we consider creating a mechanism for that sort of outage reporting. It seems to me that there are two broad categories:
a) Official outage reporting from the organization experiencing the outage
b) Unofficial outage reporting from someone affected by the outage.
Both are valuable and occur in different ways, and unfortunately it is the case that in today's business climate, the latter is likely to be more accurate and detailed. The obvious implementations that occur to me are
i) A mailing list like NSR; just bring it back, potentially moderate it to ensure that the usage is consistent with the charter, and redirect postings from NANOG to such a list.
ii) A web-based format where people can note outages, and comment on them usefully (perhaps a la Slashdot?).
I think both of those ideas could work, though both have been tried and not worked very well for various reasons [what ever happened to outage@dal.net?].
I would ask, however, that someone *not* take this message as the impetus to go out and set up such a thing, but instead try to listen to reasoned discussion and coordinate it with the community.
Back to Sean:
The Internet (RTM) worm affected only VAX and Sun computers, an estimated 10% of the Internet of the day. If you didn't use Sun or VAXen, it would have been an irrelevant event for you.
Not only that, it affected *hosts* (unless of course, you were using Suns or VAXen as gateways, as I'm sure many people were). Surely hosts are outside the scope of NANOG? ;-) Seriously, though, I think it is terribly unfair to compare something like an Internet-wide worm to a simple DNS misconfiguration. The latter is one person's problem and can be fixed with a quick phone call to the right person (assuming you can find that person, 20 phone calls later), whereas the former is a huge management problem that cannot be easily dealt with.
When AOL forgot to put a GUARDIAN password on its domains, and they were changed to a tiny ISP, if you didn't use AOL it may have been irrelevant to you.
For the most part, yes, though I believe that this caused real operational effects for large volumes of mail queued on mail servers of network providers in North America, and so was operationally relevant. Failed DNS queries to E*TRADE just don't have the same level of visibility. They may affect customers equally, but they affect providers not-at-all.
When Cisco, Bay and GATED BGP implementations had a disagreement on whether ASNs could be repeated in an as-path, it may have been irrelevant to you if you used a different BGP implementation or router.
You're being really off-the-wall here. It's quite clear that a statistically significant fraction of North American network operators use those implementations, so discussion is merited. Especially because there is *something* to discuss, not merely "Oh, look, it's broken. We can now wait until they fix it."
Whether a particular NSI problem, an E*Trade problem, an Ebay problem, or a Cisco CCO problem is really significant enough to talk about semi-publicly is tough. It would be nice if each company were willing to make timely disclosures about problems.
E*TRADE's annual report for 1999 makes some disclosures about infrastructure failures, by the way.
But as we've seen time and time again, companies would prefer never to acknowledge they had any problem until it becomes impossible to ignore (e.g. Worldcom's 10 days of hell last summer).
Indeed. Just because they should be reported doesn't mean they should be reported to NANOG. I think outage notification and operational issue discussion are different things and should go to different places. That worked well for the NSFnet with nsr@merit.edu split from regional-techs@merit.edu, and the Internet has only grown since then, and the scaling benefits would be much more sizable.
Opinions?
--jhawk