I've been reminded I was going to update my draft of Responsible Network Management Guidelines. The latest version is available at http://dranet.dra.com/draft-donelan-rnmg-latest.txt. In addition to any substantive comments, now is the time to correct the grammar and spelling nits. I plan on throwing this into the Informational RFC process before the next IETF meeting. -- Sean Donelan, Data Research Associates, Inc, St. Louis, MO Affiliation given for identification not representation
On Wed, Sep 24, 1997 at 07:29:38PM -0500, Sean Donelan wrote:
In addition to any substantive comments, now is the time to correct the grammar and spelling nits. I plan on throwing this into the Informational RFC process before the next IETF meeting.
Here goes. Didn't realize it was that small... (Warning: I got about halfway through, and realized I was editing, rather than just copyediting -- feel free to ignore those parts if you see fit.)
Operational Requirements Area                                S. Donelan
INTERNET DRAFT                                                      DRA
<draft-donelan-rnmg-01.txt>                              September 1997
Responsible Network Management Guidelines
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet- Drafts as reference
^
material or to cite them other than as ``work in progress.''
To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet- Drafts
^
Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).
Rational and Scope
Rationale
This document provides Responsible Network Management personnel of
All three of those words likely should not be capitalized; you're using the term generically, not as a job title.
Internet Service Providers (ISPs) and Internet Service Customers
I know you had to make _something_ up to call them there... but I always have a vague, unallocated unease about new initialisms. Might you just say "their customers"?
(ISCs) with guidelines for network management when the following conditions arise:
- Routine Maintenance Activity
- Problem Reporting and Referral
- Escalation
- End-to-End Testing
- Customer Notification
- Emergency Communications
- Network Service Interruption Measurement
Specific procedures will require negotiations between the organizations involved. These guidelines do not replace or supersede
^^^^^^ "are not intended to"?
agreements or any other legally binding documents.
Responsible Internet Service Provider
A more familiar term in Internet Standards is an Autonomous System. Since this document has additional requirements than an entity represented by an Autonomous System or Systems, this document creates a new entity.
"has...than" is a clumsy construct at best. Are you trying to say

  Since this document defines requirements additional to those customarily expected of the operators of an Autonomous System, it must define a new entity, encompassing AS's and also other organizations.

?
The Responsible Internet Service Provider (RISP) has overall responsibility for Internet service between its Internet Service Customers and other Internet Service Providers making up the Internet.
Ok, so, basically, a RISP is a repository for a contact?
An Internet Network, Autonomous System or group of Autonomous Systems may designate another entity to act on its behalf as its Responsible Internet Service Provider. In this document, Internet Service Customer (ISC) shall refer to the collective network, Autonomous System or Systems which designated the Responsible Internet Service Provider as their agent.
Roughly. An agent, in legal terms.
The Responsible Internet Service Provider is responsible for:
-- Providing a contact that is readily accessible 24 hours a day, 7 days a week.
-- Providing trained personnel.
-- Acting as the Internet Service Customer's (ISC) primary contact in all matters involving Internet Service between Internet Providers.
-- Accept problem reports from Internet Service Customers and casual
Accepting
end users or other parties receiving Internet Service problem reports. The RISP may prioritize problem reports from its own ISCs, or refer casual end users to their primary RISP, if known.
This graf sounds like it's making an assumption that _I_, at least, apparently am not equipped to make, as I fell off a couple turns back. The first sentence could use to be recast.
-- Advising the ISC when there is an ISP failure affecting the ISC
-- Isolating problems to determine if the reported trouble is in the ISP's facilities or in other providers' service.
-- Testing cooperatively, when necessary, with other providers to further identify a problem when it has been isolated to another provider's service.
Suggest moving the parenthetical after "providers".
-- Keeping its ISC advised of the status of the trouble repair.
-- Maintaining complete and accurate records of its own customers and
So, basically, a RISP is an administrative and technical Point of Contact designee?
Routine Maintenance Activity
Responsible Internet Service Providers should perform routine maintenance work during hours of minimum traffic to impact the fewest customers. In most areas, the period of lowest Internet traffic is between 1am and 6am local time. Trans-continental and inter-continental connections should consider the local time on each end of the connection.
It's worthy of note (it was in one of the last 4 RISKS Digests) that, for some things -- backbone gear, NAP's, webfarms, etc -- there _is_ _no_ good time to do maintenance. The audience is world wide and, statistically, you simply can't find a good hour to do it. It might be suggested that each category of operators ought to keep their own traffic logs, to roughly hourly granularity, maybe, to facilitate the determination of "the best time to down the router".
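The hourly traffic log suggested above could feed a very small script. What follows is only a sketch, not part of the draft: it assumes the log has already been reduced to (hour of day, traffic volume) pairs, and the function name `quietest_window` and the default five-hour window are illustrative choices.

```python
from collections import defaultdict


def quietest_window(samples, window_hours=5):
    """Return the start hour (0-23) of the quietest contiguous window.

    samples: iterable of (hour_of_day, traffic_volume) pairs, e.g. one
    pair per hour per day taken from the operator's own traffic log.
    The window may wrap past midnight.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hour, volume in samples:
        totals[hour % 24] += volume
        counts[hour % 24] += 1

    # Mean volume per hour of day; hours with no data are never preferred.
    hourly = [
        totals[h] / counts[h] if counts[h] else float("inf")
        for h in range(24)
    ]

    def window_load(start):
        return sum(hourly[(start + i) % 24] for i in range(window_hours))

    # On ties, the earliest start hour wins.
    return min(range(24), key=window_load)
```

For a link between two time zones, the same function could be run on traffic summed from both ends, so that the chosen window reflects the connection as a whole rather than either end's local night.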
Activities which may affect other Internet Service Providers should be coordinated with the affected providers.
Channels should be designed in advance for this sort of communication (email, voice, pager, etc.), and tested regularly?
Problem Reporting and Referral
The Responsible Internet Service Provider is responsible for performing all the necessary tests to determine the nature of the problem detected, or reported by its customers or by referral from other ISPs. If the trouble is isolated to an ISC or another ISP, the RISP will report the trouble to the appropriate ISC or ISP point of contact.
An example of the information exchanged in the problem referral report:
-- Description of the problem, including source address/name, destination address/name, application or protocol involved, when it last worked, when it stopped working, and any diagnostic messages or test data (e.g. ping, traceroute).
-- Customer reported problem severity
-- RISP determination of problem severity
-- The name and contact information of the person referring the problem
-- The referrer's trouble ticket number, and origination date/time
-- The name of the person accepting the report
-- The acceptor's trouble ticket number, and acceptance date/time
Oh, _ghod_ if we could design a standardized trouble ticket interchange format. Excuse me, I feel an RFC coming on. :-)
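As a rough illustration of what such an interchange record might carry, here is a sketch of the referral fields listed above as a Python dataclass serialized to JSON. No such format is defined by the draft; every field name, the use of JSON, and the timestamp-as-text convention are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ProblemReferral:
    """One problem referral; field names are hypothetical."""
    description: str
    source: str                 # source address/name
    destination: str            # destination address/name
    protocol: str               # application or protocol involved
    last_worked: str            # timestamps kept as text for simplicity
    stopped_working: str
    diagnostics: List[str]      # e.g. ping or traceroute output
    customer_severity: str
    risp_severity: str
    referrer_name: str
    referrer_contact: str
    referrer_ticket: str
    originated: str
    # Filled in by the organization accepting the report.
    acceptor_name: str = ""
    acceptor_ticket: str = ""
    accepted: str = ""

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    def missing_fields(self) -> List[str]:
        """Names of fields still empty, in declaration order."""
        return [k for k, v in asdict(self).items() if v in ("", [])]
```

A referring provider would fill in everything up to `originated`; `missing_fields()` then shows what the accepting side still owes before the referral is complete.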
Periodic status reports shall occur when the problem has been isolated, when there is a significant change in the status of the problem, and when negotiated time intervals expire. Escalation will be according to negotiated procedures.
And prior negotiation should probably take place to decide on equivalencies of severity levels and escalation justifications, etc. Sorry; I'm a systems designer by trade; the stuff just runs out of my fingertips. :-)
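One way to picture negotiated severity equivalencies is a pair of lookup tables that translate each provider's scale through a shared one. This is a sketch only: the provider names, the level labels, and the three-step common scale are all invented for illustration and would in practice come out of the negotiation itself.

```python
# Each provider's local severity labels mapped onto an agreed common scale.
# All names and levels here are hypothetical.
TO_COMMON = {
    "provider_a": {"P1": "critical", "P2": "major", "P3": "minor"},
    "provider_b": {"red": "critical", "amber": "major", "green": "minor"},
}

# Reverse tables, built once: common scale back to each local label.
FROM_COMMON = {
    provider: {common: local for local, common in table.items()}
    for provider, table in TO_COMMON.items()
}


def translate(level, from_provider, to_provider):
    """Translate a severity label from one provider's scale to another's."""
    common = TO_COMMON[from_provider][level]
    return FROM_COMMON[to_provider][common]
```

An unknown provider or level raises `KeyError`, which is arguably the right behaviour: a severity nobody agreed on in advance should not be silently guessed.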
Problem isolation may require cooperative testing between the ISC and ISP(s), which shall be provided when requested. The provider making the test is responsible for coordination.
When the problem has been cleared, the ISP/ISP or ISP/ISC shall advise the other the problem has been cleared. When closing a problem report between ISP/ISP or ISP/ISC, the disposition should be furnished by the organization closing the ticket.
Are those slashed abbreviations _correct_? I guess I missed something; I don't have an expansion ready to hand that fits.
An example of the information exchanged in the problem disposition:
-- Trouble ticket number
-- Referral datetime
-- Returned datetime
-- Trouble identified as
-- Resolution details
-- Service charges, if the ticket resulted in a service charge
If there is a disagreement about the disposition of a problem ticket, the parties involved should document their respective positions and the names of the individuals involved. Escalation will be made according to each organization's escalation procedures.
Glad this is in here... :-)
Escalation
Each ISP and ISC shall establish procedures for timely escalation of problems to successive levels of management. The procedures should include the provision of status reports to the other provider or customer regarding the ticket status. Both technical and management contacts should be included in the escalation procedures.
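A timed escalation ladder of the kind described here might be sketched as follows. The severities, time thresholds, and contact titles are placeholders, not values from the draft; each pair of organizations would negotiate its own.

```python
from datetime import datetime, timedelta

# Hypothetical ladder: for each severity, (time since ticket opened, contact).
# Entries must be listed in increasing order of elapsed time.
LADDER = {
    "critical": [
        (timedelta(0), "NOC technician"),
        (timedelta(hours=1), "NOC supervisor"),
        (timedelta(hours=4), "operations manager"),
    ],
    "minor": [
        (timedelta(0), "NOC technician"),
        (timedelta(hours=24), "NOC supervisor"),
    ],
}


def escalation_contact(severity, opened, now):
    """Return the highest contact the ticket should have reached by `now`."""
    elapsed = now - opened
    reached = [who for after, who in LADDER[severity] if elapsed >= after]
    return reached[-1]
```

For example, a critical ticket opened at 12:00 and still open at 14:30 would sit with the NOC supervisor under this made-up ladder, while a minor ticket of the same age would still be with the technician.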
End-to-End Testing
Networks may experience problems which cannot be isolated by each provider individually testing and maintaining its own services. Each provider's service may appear to perform correctly, but trouble appears on an end-to-end service. The ISC's RISP should coordinate end-to-end testing with each sectional provider by problem referral through their Responsible Internet Service Provider. Each Internet
^^^^^ Pronoun without a referent. Whose? The ISC? The RISP? The sectional provider? (There's another new piece of terminology.)
I suspect that's not enough... but we'll see...
Service Provider should accept the referral request for end-to-end testing coordination, and provide the contact information for the next sectional provider to the original requestor.
This assumes to some extent that the customers -- even though they're paying for the lines -- can actually _get_ the information from the vendors... something which isn't always true. Perhaps a statement encouraging that?
Customer Notification
During a major outage, potential concerns are customer goodwill and network congestion caused by repeated customer attempts to access the down network. Keeping customers informed can reduce both customer frustration and network congestion.
Pre-planning for quick notification can be most beneficial in alerting customers.
Some example methods to notify customers include:
-- If operational, network access equipment can display an alert when customers connect. The alert should be displayed before the customer logs into the network. If the network fails during or after attempting to validate the access information, the alert should not compromise any authentication information.
Particularly consumer software _really_ ought to have provision for a messaging system, like the motd and/or wall. The lack of this on, say, Win95 drives me up a tree...
-- Customer service calls increase dramatically during network failures. An informed customer representative can advise the customer on the best course of action. A method to quickly instruct customer service representatives on the available options should be implemented.
Putting known outages on the automated attendant, like the cable companies do, would be nice. I know good engineering will _never_ win out over paranoid management, but if I'm paying for a service, I don't wanna _guess_ when it's broken. I don't _care_ if the announcements make life harder for the sales team. Maybe they won't have so many outages...
-- The media, radio or television, can be used to inform the public. Prearrangement and planning are needed to ensure only designated contacts are made with the media.
Is there _any_ part of the net that's this globally critical?
-- Other automated announcements, such as World Wide Web pages or e-mail distribution lists with backup through other providers, recorded telephone status lines, or broadcast FAX/Pager notifications.
Public notifications, when utilized, should not make reference by name to the organization believed causing the problem unless the
^ to be
organization causing the problem has been confirmed. Internet network problems can be difficult to isolate, and can give misleading indications as to their true origin.
Confirmed is a sticky concept. I wouldn't _ever_ announce it, myself. Unless that party did, and "who's allowed to say you can announce it" is something you need to track.
Emergency Communications
Recognizing that all Responsible Internet Service Providers have a responsibility to provide an adequate level of support for their service and/or products, it is recommended they participate in a backup emergency communications system.
Like having valid whois(1) info? :-)
The backup emergency communications system should not depend on the operation of the primary network for obtaining contact, authentication, or other communications information during a network problem. Each RISP is responsible for providing an Emergency Point of Contact. It is recommended each Emergency POC have at least one out-of-band contact method, such as an internationally dialable (non 1-800) voice and/or fax telephone number. Each RISP should pre-arrange a method for verifying the identity of the Emergency Point of Contacts using alternative communications methods, such as a
Contact
challange/response code-word or call-back to a known telephone
challenge
number.
Note that this isn't always good enough, if the problem is an attack. Call-forwarding and butt-sets, doncha know.
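For the challenge/response option, one possible shape is a keyed hash over a random challenge, using a secret exchanged in advance by some other channel. The sketch below is an assumption-laden illustration, not the draft's method: the draft only says "code-word", and the choice of HMAC-SHA-256, the function names, and the truncation to eight hex digits (so the answer can be read aloud over a voice line, at the cost of weaker security) are all choices made here.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> str:
    """Random challenge the verifying party reads to the caller."""
    return secrets.token_hex(4)


def response_for(shared_secret: bytes, challenge: str) -> str:
    """Response the caller computes offline from the pre-arranged secret."""
    digest = hmac.new(shared_secret, challenge.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:8]   # shortened for reading over the phone


def verify(shared_secret: bytes, challenge: str, claimed: str) -> bool:
    """Check a spoken response without leaking timing information."""
    expected = response_for(shared_secret, challenge)
    return hmac.compare_digest(expected, claimed.strip().lower())
```

This addresses who knows the secret, not where the call terminates, so it complements a call-back to a known number rather than replacing it.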
Each RISP should maintain a current off-line copy of the emergency contact procedures for each gateway inter-connection. Each RISP should establish procedures for keeping the off-line emergency contact procedures updated. Each RISP shall test and verify its own emergency POC procedures are accurate and functioning on a regular basis, no less than once a year.
On the net? Monthly...
Network Service Interruption Measurement
Each ISP/ISC should maintain accurate records about service interruptions to measure and develop trend analysis of their network availability.
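The simplest figure such records support is availability over a reporting period: the fraction of the period not covered by any recorded interruption. The sketch below assumes outages are kept as (start, end) timestamp pairs; the function name and record layout are illustrative, and the draft does not prescribe any particular metric.

```python
from datetime import datetime, timedelta


def availability(outages, period_start, period_end):
    """Fraction of the period during which service was up.

    outages: iterable of (start, end) datetime pairs. Overlapping records
    are counted once, and outages are clipped to the reporting period.
    """
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in outages
    )
    down = timedelta(0)
    cursor = period_start            # end of the downtime counted so far
    for start, end in clipped:
        start = max(start, cursor)   # skip time already counted
        if end > start:
            down += end - start
            cursor = end
    return 1 - down / (period_end - period_start)
```

Computing the same figure month by month gives the raw series for the trend analysis the draft mentions.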
Security Considerations
You may wish to choose a different section title. "Security Considerations" is customarily used to mean "...of implementation of the procedures in this RFC", which is, I think, not what you mean here...
-- Maintain a complete and accurate record of a RISP's own customers and inter-provider gateways.
-- Public notifications, when utilized, should not make reference by name to the organization believed causing the problem.
-- If the network fails during or after attempting to validate the access information, the alert should not compromise any authentication information.
-- Each RISP should pre-arrange a method for verifying the identity of Point of Contacts using alternative communications methods, such as a challange/response code-word or call-back to a known telephone
challenge
number.
Author's Address
Sean Donelan Data Research Associates, Inc. 1276 North Warson Road Saint Louis, MO 63132
Phone: +1-314-432-1100 EMail: sean@DRA.COM
Not bad. But, from down here in the trenches, I think it could use another round of flogging. How much commentary have you gotten on it?

Cheers,
-- jr 'will stick fingers in others' RFCs for food' a
--
Jay R. Ashworth                  jra@baylink.com
Member of the Technical Staff    Unsolicited Commercial Emailers Sued
The Suncoast Freenet             "People propose, science studies, technology
Tampa Bay, Florida               conforms." -- Dr. Don Norman   +1 813 790 7592