outage/maintenance window opinion
Trying to get clarification on an issue.

The maintenance/outage window is 2:00AM to 5:00AM. During the window, the router we are working on fails and does not come back online until 8:00AM.

From an outage reporting/documentation standpoint, is the outage start time 2:00AM or 5:01AM, since 5:01AM is when the maintenance window and planned outage were over?

My take is that the outage starts when the planned maintenance/outage window is over, at 5:01AM.

Luke

Luke Parrish Centurytel Internet Operations 318-330-6661
[In reply to Luke Parrish]

It depends. If your device(s) were part of the change management notification, then your take is correct: the outage starts at 5:01AM.

regards,
//virendra//
My opinion:

For the customer, the outage starts when their service stops working* and ends when their service starts working again. Your goal should be to make all of that happen during the maintenance window. If it doesn't, then the part that was during the window is "planned outage" and the part that wasn't is "unplanned outage".

Good ISPs have good explanations for, and sometimes even monetary credit for, "unplanned outages". "Planned outages" can simply be explained by pointing at the announced maintenance-interval policy.

Matthew Kaufman
matthew@eeph.com

*Note that this can happen at different times for different customers, and "stops working" means different things to different people. Some customers are unhappy if their traffic is taking a slightly longer alternate path; others are happy as long as they can reach CNN, even if the rest of the net disappears.
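Matthew's rule amounts to clipping each customer's outage at the window boundaries: whatever overlaps the window is planned, the rest is unplanned. A minimal Python sketch, assuming you already know when a given customer's service actually stopped and came back (per the footnote, those times can differ from customer to customer). `split_outage` and `overlap` are hypothetical helper names, and the calendar date is arbitrary.

```python
from datetime import datetime, timedelta

MINUTE = timedelta(minutes=1)


def overlap(start_a, end_a, start_b, end_b):
    """Length of the overlap between intervals [start_a, end_a) and [start_b, end_b)."""
    return max(min(end_a, end_b) - max(start_a, start_b), timedelta(0))


def split_outage(outage_start, outage_end, window_start, window_end):
    """Split one customer's outage into (planned, unplanned) whole minutes.

    Planned   = the part of the outage inside the announced window.
    Unplanned = everything else (before or after the window).
    """
    total = outage_end - outage_start
    planned = overlap(outage_start, outage_end, window_start, window_end)
    return planned // MINUTE, (total - planned) // MINUTE


# Hypothetical scenario from the question; the calendar date is arbitrary.
day = datetime(2005, 3, 28)
window_start, window_end = day.replace(hour=2), day.replace(hour=5)
outage_start, outage_end = day.replace(hour=2), day.replace(hour=8)

planned, unplanned = split_outage(outage_start, outage_end, window_start, window_end)
print(f"planned: {planned} min, unplanned: {unplanned} min")
```

For the scenario in the original question this prints `planned: 180 min, unplanned: 180 min`: the three hours inside the window are planned and the three hours after 5:00AM are not, which matches counting from the end of the window as Luke proposed.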
[In reply to Matthew Kaufman, Mon, 28 Mar 2005]

Also, the possibility of equipment failure should *always* be factored into back-out/recovery plans. You can have all the faith in your hardware that you want, but Murphy has enable/root. Whether the fallback is something as simple as having redundant capacity to shift the load to, or as drastic as having a spare chassis sitting on hand, failure is always a possibility, however remote.

- billn
[In reply to Luke Parrish, Mon, Mar 28, 2005 at 11:16:47AM -0600]
I suspect that this depends rather entirely on the person who is *looking* at your outage reports.

That is: if you're compiling them only for internal purposes, use whatever policy you like. If someone else, like, say, NERC, is the intended audience, then they probably already have an answer to that question.

My *personal* approach would be to use the end of the window, yes, but I am not the person you're reporting to.

Cheers,
-- jra
-- 
Jay R. Ashworth    jra@baylink.com
Designer    Baylink    RFC 2100
Ashworth & Associates    The Things I Think    '87 e24
St Petersburg FL USA    http://baylink.pitas.com    +1 727 647 1274

If you can read this... thank a system administrator. Or two. --me
Heya,

I disagree, as this entire event wasn't a planned outage. The "planned" part was what you intended to do and, if it's anything like the maintenance reports that I send and receive, you typically state how long you expect the impact to last and that it will take place within your maintenance window. I'd argue that you should start the clock ticking when the outage first happened and then subtract whatever you announced as the impact duration.

For example, if you said that the impact would be a ten-minute outage sometime during your window from 2am to 5am and your outage started at 2am, I'd count this as an unplanned outage starting from 2:10am. That's just my $0.02...

On another note, you had a 3-hour window and a 6-hour outage. It sounds like someone didn't seriously consider the "back out" part of your change management planning. You really should have that as part of your process, with a hard deadline within the window after which you revert the network to its previous state.

Eric :)
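Eric's accounting, and his back-out deadline, both reduce to simple time arithmetic. A minimal Python sketch under that reading: `unplanned_start`, `unplanned_minutes` and `backout_deadline` are hypothetical names, and the 45-minute rollback time is an invented figure, since the post does not say how long a revert would take.

```python
from datetime import datetime, timedelta

MINUTE = timedelta(minutes=1)


def unplanned_start(outage_start, announced_impact):
    """Eric's reading: the unplanned clock starts once the outage has
    outlasted the impact announced in the maintenance notice."""
    return outage_start + announced_impact


def unplanned_minutes(outage_start, outage_end, announced_impact):
    """Whole minutes of outage beyond the announced impact (never negative)."""
    start = unplanned_start(outage_start, announced_impact)
    return max(outage_end - start, timedelta(0)) // MINUTE


def backout_deadline(window_end, rollback_time):
    """Latest time to begin reverting so the rollback still finishes
    inside the window. rollback_time is your own estimate (an assumption)."""
    return window_end - rollback_time


# Eric's example: 10-minute announced impact, outage from 2:00 AM to 8:00 AM.
day = datetime(2005, 3, 28)  # arbitrary date
outage_start, outage_end = day.replace(hour=2), day.replace(hour=8)
announced = timedelta(minutes=10)

print(unplanned_start(outage_start, announced).strftime("%H:%M"))  # 02:10
print(unplanned_minutes(outage_start, outage_end, announced))      # 350

# Hypothetical 45-minute rollback against a window ending at 5:00 AM.
print(backout_deadline(day.replace(hour=5), timedelta(minutes=45)).strftime("%H:%M"))  # 04:15
```

Under this reading the scenario from the question accrues 350 unplanned minutes rather than the 180 you get by counting only from the end of the window, which is the practical difference between the two positions.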
[In reply to Eric Gauthier, 11:59 AM 3/28/2005]

The event I described in my first email was an example, not an actual incident. Of the 30+ emails I have received, two said I should start my SLA credits and outage minutes from the beginning of the window; the rest feel that the outage minutes start ticking when the planned outage is over...

Regarding change management procedures, we do have deadlines for backing out, verification, etc. But you are right...

luke
Luke Parrish Centurytel Internet Operations 318-330-6661
[In reply to Luke Parrish]
To a small degree, it depends on how long you anticipated the outage to be. Were you expecting a three-hour tour^h^h^h^houtage, or something shorter, with a big window opened to give you flexibility on when to do it?

I would say that a fifteen-minute expected impact means the outage started at 2:15AM (or fifteen minutes after your work interrupted services).

My $0.005,
pt
[In reply to Pete Templin, 02:05 PM 3/28/2005]

In this situation we were expecting to be down for the majority of the maintenance window, but yes, I see your point. However, I block out a 3-hour window for maintenance because, as we all know, the activities I am performing on the network could easily cause a longer service outage than planned. So if I plan for a 4-hour window but only expect 20 minutes of downtime that actually turns into 3 hours, then as long as it is inside the specified maintenance window, it should not go against outage minutes. It was done in the window for a reason...??

Luke
Luke Parrish Centurytel Internet Operations 318-330-6661
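The disagreement between Luke and Pete can be made concrete by scoring Luke's hypothetical (a 4-hour window, 20 minutes of expected downtime that turns into 3 hours, all inside the window) under both policies. A rough Python sketch: the clock times are invented, since the post gives only durations, and the function names are labels chosen for illustration, not anyone's SLA terminology.

```python
from datetime import datetime, timedelta

MINUTE = timedelta(minutes=1)


def window_policy(outage_start, outage_end, window_start, window_end):
    """Luke's position: only outage time outside the window counts."""
    inside = max(min(outage_end, window_end) - max(outage_start, window_start), timedelta(0))
    return (outage_end - outage_start - inside) // MINUTE


def expected_impact_policy(outage_start, outage_end, expected_impact):
    """Pete's/Eric's position: outage time beyond the expected impact counts."""
    return max(outage_end - outage_start - expected_impact, timedelta(0)) // MINUTE


# Luke's hypothetical: 4-hour window, 20 minutes expected, 3 hours actual.
# The clock times below are made up; the post gives only durations.
day = datetime(2005, 3, 28)
window_start, window_end = day.replace(hour=1), day.replace(hour=5)
outage_start, outage_end = day.replace(hour=1, minute=30), day.replace(hour=4, minute=30)
expected = timedelta(minutes=20)

print("window policy:", window_policy(outage_start, outage_end, window_start, window_end))  # 0
print("expected-impact policy:", expected_impact_policy(outage_start, outage_end, expected))  # 160
```

The same three-hour outage scores 0 unplanned minutes under one policy and 160 under the other, which is why, as Jay pointed out, the answer depends on who reads the reports.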
participants (7)
- Bill Nash
- Eric Gauthier
- Jay R. Ashworth
- Luke Parrish
- Matthew Kaufman
- Pete Templin
- Vicky Rode