On 11/8/12, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
On Thu, 8 Nov 2012, Phil wrote:

NSR isn't ISSU. The equipment vendors call an upgrade done with NSR failover "ISSU" if their marketing people feel that a 0.5 or 6 second hit is "good enough". If you care about the 0.5 seconds, it's important that you speak their language and require that vague expressions such as "In-Service Software Update" be clarified. Personally, I don't trust any of it; routers should have regular maintenance windows, period, with a minimum duration of 30 minutes. And software updates to fix known bugs should be done regularly, during those windows.

Using NSR failover for upgrades, and calling the small hit ISSU, is likely inexpensive for the network equipment vendors, because they have already invested hundreds of thousands of developer hours in implementing and validating NSR functionality to provide redundancy against device failure.

The process of replacing code on a hot device, and restructuring any stored data to match the expectations of the new code, without suspending or delaying execution of any code during that process, is possible, but it is a non-trivial problem whose solutions add complexity (and therefore a higher risk of bugs and unexpected results) to the upgrade process. By implementing true in-place upgrade, you might reduce the hit from 0.5 seconds to 0.01 seconds 90% of the time; but 10% of the time, either the online upgrade fails because of an issue with the online patch, or unexpected interactions between partially patched functional units result in a period of incorrect device operation until the patching finishes, and continued use of bad data even after patching has finished.
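To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. The 0.5 s, 0.01 s and 90% figures are the ones above; the duration of a failed in-place upgrade is not something I have data for, so `failure_hit_s` and the sample values fed to it are purely assumptions for illustration.

```python
# Figures taken from the post: 0.5 s hit for an NSR-failover upgrade,
# 0.01 s hit for a true in-place upgrade that works 90% of the time.
NSR_HIT_S = 0.5
INPLACE_HIT_S = 0.01
INPLACE_SUCCESS_RATE = 0.9


def expected_inplace_hit(failure_hit_s: float) -> float:
    """Expected disruption per upgrade, in seconds, for the in-place approach.

    failure_hit_s is a hypothetical input: the disruption suffered in the
    10% of upgrades that go wrong. The post gives no figure for it.
    """
    return (INPLACE_SUCCESS_RATE * INPLACE_HIT_S
            + (1 - INPLACE_SUCCESS_RATE) * failure_hit_s)


def break_even_failure_hit() -> float:
    """Failure duration at which in-place upgrade averages the same hit as NSR."""
    return ((NSR_HIT_S - INPLACE_SUCCESS_RATE * INPLACE_HIT_S)
            / (1 - INPLACE_SUCCESS_RATE))


if __name__ == "__main__":
    print(f"break-even failure hit: {break_even_failure_hit():.2f} s")
    for failure_hit_s in (1.0, 30.0, 600.0):  # assumed values, not measurements
        print(f"failure hit {failure_hit_s:6.1f} s -> expected "
              f"{expected_inplace_hit(failure_hit_s):.3f} s per upgrade")
```

Under these assumptions, the in-place approach only comes out ahead on average if a failed upgrade costs less than about 5 seconds of disruption, and that ignores the harder-to-quantify cost of running on bad data afterwards.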
ISSU contains the wording "in service". 6 seconds of outage isn't "in service". 0.5 seconds of outage isn't "in service". I could accept a few microseconds of outage as being "ISSU", but tenths of a second isn't in service.
What is the maximum percentage premium your organization could justify paying the network equipment vendor for routers/switches, to reduce the ISSU hit from 0.5 seconds to a few microseconds? :)
The main remaining hurdle is updating microcode on linecards; they still need to be rebooted after an upgrade.
--
Mikael Abrahamsson    email: swmike@swm.pp.se

--
-JH