I suppose it is more fun to criticize policy and NSPs, but it may well be a hole in the BGP protocol, or more likely in the implementations in vendors' code [or in users' configuration of twiddleable holddown timers].
My (possibly misinformed) understanding was that certain NSPs running Cisco backbones had holddown timers configured to delay withdrawals. Even after 7007 was disconnected, 7007 routes were still being advertised well over an hour later. I do not believe these NSPs are going to have timers configured for >1hr.

We've seen a problem before where a transit provider (Cisco-based) was causing us problems, and we decided to turn them off. They were still advertising our routes an hour later. (That provider is unconnected with any involved in this case.) Pulling the session back up and clearing it did not help things.

I'd therefore suggest that your analysis is correct. More than 80% of the downtime is due either to a protocol bug or to a s/w bug somewhere, not to NOC failure.

Alex Bligh
Xara Networks