
On Sat, 25 Nov 1995, Matt Zimmerman wrote:
connectivity issues are MORE likely to be caused by interaction with other NSPs. Dissemination of problem information between providers helps everyone diagnose difficulties and keep their customers better informed with respect to current status and predictions for the near future (solutions).
Agreed, but it has to be done in an "easy" manner. I'm sure that several of the NSPs have concerns as to what this information will be used for. Everyone likes to portray the image of having 99.98% uptime whenever possible, even though most folks realize that it just plain isn't possible, at least today.

This sort of leads into the question of how the various NOCs would integrate with whatever central repository of information we are shooting to provide. When provider X opens a ticket, will it automatically be reflected in the 'central' database? I doubt folks will go for that, based on security alone. Or how about provider X's NOC staff firing off an Email to incident-report@outages.com? How will they be trained, or reimbursed for their time spent on this service?

[..facts about how useless mailing lists are removed..]
A more interactive shared system (ticket-based?) makes more sense, but may prove far more difficult to design. Problem classification, impact, severity, and location are all issues here, as well as the problem of associating such a record of a problem with its effects. That is, when a provider "discovers" a problem, how are they to know if it has already been "registered", and if so, how to reference the information associated with it?
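To make the "has it already been registered?" question concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the OutageRecord fields, the OutageRegistry class, and especially the matching rule (same classification and same location, ignoring case and surrounding spaces), which is far cruder than anything a real shared system would need.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class OutageRecord:
    """Hypothetical shared record; the field names are assumptions."""
    ticket_id: int
    provider: str
    classification: str  # e.g. "circuit" or "routing"
    severity: int        # assumed scale: 1 is the most severe
    location: str
    impact: str


class OutageRegistry:
    """Toy in-memory registry that folds duplicate reports together."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], OutageRecord] = {}
        self._next_id = 1

    @staticmethod
    def _key(classification: str, location: str) -> Tuple[str, str]:
        # Assumed matching rule: classification plus location,
        # compared without regard to case or surrounding whitespace.
        return (classification.strip().lower(), location.strip().lower())

    def register_or_find(self, provider: str, classification: str,
                         severity: int, location: str,
                         impact: str) -> Tuple[OutageRecord, bool]:
        """Return (record, is_new); is_new is False for a known problem."""
        key = self._key(classification, location)
        existing = self._by_key.get(key)
        if existing is not None:
            return existing, False
        record = OutageRecord(self._next_id, provider, classification,
                              severity, location, impact)
        self._next_id += 1
        self._by_key[key] = record
        return record, True
```

With this, a second provider reporting a "Circuit" problem at " chicago nap " would get back the ticket that the first provider opened for "circuit" at "Chicago NAP", rather than a new one. Deciding what really counts as "the same problem" is the hard part, and this sketch does not attempt it.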
Such an idea is already being discussed in several smoke-filled rooms. :)

Remedy/ARS has the ability to accept input for incident reports and queries to its database via an Email form. One could write a Web page containing the necessary parameters in a form, and then transpose that to an Email sent to the AR system. Implementing such a system is really a question of cost, as the coding is relatively trivial (CGIs come to mind). I used the above example because it's something we've done in the past and I know it works; there are probably others.

On the issue of connectivity -- agreed; some lonely site should not be allowed to be the only host. However -- if connectivity between certain NSPs also falls apart, you're equally screwed. Some sort of distribution of the "centralized" source of information would be needed.

I foresee the most difficult part of the process being (a) convincing all of the associated Operations groups to share their outage information. (b) Providing a simple mechanism for either the customer service or operations staff to disseminate outage information to the "server" would be equally challenging. If step (a) were to be overcome, I would assume that writing a procedure to fit (b) would follow.

-jh-
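P.S. A minimal sketch of the Web-form-to-Email idea, in Python rather than as an actual CGI. The field names, the "name: value" body layout, and the form_to_email function are my assumptions for illustration; this is not the template format Remedy/ARS expects, which would have to come from its own documentation. The recipient is just the example address used earlier in this thread.

```python
from email.message import EmailMessage

# Assumed set of form fields; a real AR system form would define its own.
REQUIRED_FIELDS = ("provider", "severity", "location", "summary")


def form_to_email(form: dict, sender: str,
                  recipient: str = "incident-report@outages.com") -> EmailMessage:
    """Turn submitted form fields (all strings) into an Email message."""
    # Reject the submission if any required field is absent or blank.
    missing = [name for name in REQUIRED_FIELDS
               if not form.get(name, "").strip()]
    if missing:
        raise ValueError("missing form fields: " + ", ".join(missing))

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Incident report: " + form["summary"].strip()
    # One "name: value" line per field (assumed layout, see above).
    body = "\n".join(f"{name}: {form[name].strip()}"
                     for name in REQUIRED_FIELDS)
    msg.set_content(body + "\n")
    return msg
```

The resulting message could then be handed to whatever mail transport is available (smtplib's SMTP.send_message, for instance). Validating the fields before anything is mailed keeps half-filled reports out of the database, which matters more than the mailing step itself.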