On Wed, Oct 22, 2014 at 1:31 PM, Barry Shein <bzs@world.std.com> wrote: [snip]
The unix community has exerted great amounts of effort over the decades to speed up reboot, particularly after crashes but also planned. Perhaps you don't remember the days when an fsck was basically mandatory and could take 15-20 minutes on a large disk.
Then we added the clean bit (disk unmounted cleanly, no need for [snip] And you whisk all that away with "it's not really clear to me that 'reboots in seconds' is a thing to be optimized"????
False dilemma. Cutting reboot time from 20 minutes to 1 minute is a meaningful improvement; it is a 95% reduction in the time spent on each boot. Cutting it from 20 minutes to 10 seconds is not significantly better than cutting it to 1 minute.

Different tradeoffs suit different kinds of systems, depending on their use case (desktop vs. server), especially when the method of reduction is subject to diminishing returns and adds fragility or complexity -- a greater risk that something breaks, or more potential for unreliability in the startup process.

Also, you may very well end up spending more time rebooting your system just to troubleshoot the fact that some applications start in an unexpected order and cause some issue.
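To put rough numbers on the diminishing returns, here is a quick sketch of the arithmetic. The helper name percent_reduction is only for illustration, and the 20-minute, 1-minute, and 10-second figures are the ones used in this thread, not measurements.

```python
def percent_reduction(before_s, after_s):
    """Percentage of the original time that was eliminated."""
    return 100.0 * (before_s - after_s) / before_s

# Figures from the discussion, converted to seconds (illustrative, not measured).
baseline = 20 * 60  # a 20-minute boot with a mandatory fsck

to_one_minute = percent_reduction(baseline, 60)
to_ten_seconds = percent_reduction(baseline, 10)

print(f"20 min -> 1 min: {to_one_minute:.1f}% reduction")
print(f"20 min -> 10 s:  {to_ten_seconds:.1f}% reduction")
print(f"gain from the last step: {to_ten_seconds - to_one_minute:.1f} points")
```

The first step removes 95% of the original boot time; the further step down to 10 seconds adds only about four more percentage points, and that is the part that tends to cost the most in complexity.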
To me that's like saying it's not important to try to design so one can recover from a network outage in seconds.
If you need to ensure that a service is not disrupted for more than a few seconds, then reboot is not the answer; some form of clustering is. Reboot as a troubleshooting procedure is for desktops. Getting from power-on to user interface in 10 seconds will meaningfully improve the user experience on desktops, but not on servers. For servers, you ideally want to take the misbehaving node out of service and let its failover partner take over.
--
-JH