At 1:05 AM -0700 1/31/05, Pete Kruckenberg wrote:
After another long week of dealing with "upgrade now or die" vulnerabilities, I'm wondering...
Is there data or analysis that would help me quantify the risks of waiting (while I plan and evaluate and test) vs. doing immediate software upgrades?
With many router vulnerabilities, exploits are in the wild within 24 hours. But how often are they used, and how often do they cause actual network outages? There have been several major router vulnerabilities during the last two years, which should provide a reasonable data sample to analyze. Can that data be used to create a more accurate risk-analysis model?
The risk of outage is high (or even certain) if I jump straight into upgrading routers, and the quicker I do an upgrade, the more likely I am to have a serious, extended outage. But absent any information beyond "every second gives the miscreants more time to bring the network down," that is the only choice I have.
If I delay the upgrade, and use that delay to research and test candidate versions, deploy carefully, and so on, I reduce the risk of outage due to a bad upgrade, at the expense of increasing the risk of exploitation.
I'd love to find the "sweet spot" (even if only generally, vaguely, or by rule of thumb): the theoretical maximum upgrade delay that most reduces the risk of upgrade outages while not dramatically increasing the risk of exploitation outages.
Ideas? Pointers?
Pete.
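One way to make the "sweet spot" Pete describes concrete is to treat it as an expected-cost minimization over the upgrade delay: the exploitation risk grows with every day of waiting, while the upgrade-failure risk shrinks as testing time accumulates. The Python sketch below is built entirely on made-up probabilities and relative outage costs (exploit_rate_per_day, rushed_upgrade_fail_prob, and the rest are illustrative placeholders, not measured values), so it shows the shape of the tradeoff rather than an answer.

# Toy expected-cost model for "upgrade now" vs. "wait and test".
# Every probability and cost below is an assumed placeholder, not measured data.

def expected_cost(delay_days,
                  exploit_rate_per_day=0.02,      # assumed daily chance the exploit is used against this network
                  exploit_outage_cost=100.0,      # assumed relative cost of an exploitation outage
                  rushed_upgrade_fail_prob=0.50,  # assumed failure chance of a same-day, untested upgrade
                  tested_upgrade_fail_prob=0.05,  # assumed floor once testing is thorough
                  test_halflife_days=3.0,         # assumed testing time that halves the excess upgrade risk
                  upgrade_outage_cost=60.0):      # assumed relative cost of a botched upgrade
    """Expected outage cost if the upgrade is deployed after delay_days of testing."""
    # Constant-hazard model: probability of being exploited at least once during the delay.
    p_exploit = 1.0 - (1.0 - exploit_rate_per_day) ** delay_days
    # Upgrade-failure risk decays from the "rushed" level toward the "well-tested" floor.
    p_bad_upgrade = tested_upgrade_fail_prob + (
        rushed_upgrade_fail_prob - tested_upgrade_fail_prob
    ) * 0.5 ** (delay_days / test_halflife_days)
    return p_exploit * exploit_outage_cost + p_bad_upgrade * upgrade_outage_cost

# The "sweet spot" is simply the delay with the lowest expected cost.
best_delay = min(range(0, 31), key=expected_cost)
print("lowest expected cost at a delay of", best_delay, "days")

With the kind of incident data Pete asks about, the exploitation hazard and the upgrade-failure curve could be fit from history instead of assumed, and the same minimization would then yield a defensible delay.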
Pete,

You touch on a broad area where I think there is data relevant to network operators, but they aren't aware of it: clinical medicine, more narrowly public health, and specifically epidemiology. What you describe is very much like the situation where there is a disease outbreak and perhaps only an experimental drug with which to treat it. How does one look at the risk-versus-reward tradeoff?

There are many medical approaches to assessing the value of a drug or treatment; this also falls into the discipline of "evidence-based medicine." There are assorted metrics for such things as "cost per year of life extension" and, more recently, "cost per year of quality life extension." These models include the cost of the treatment along with both the probability of protection or improvement and the probability of adverse effects. Adverse effects cover a wide range: a drug may provide no benefit and do no direct harm, yet its use may preclude another drug known to have some, if probably lesser, efficacy, or perhaps much more toxicity. The "clinician" has to assess the probability that the software or medical "bug fix" will kill both the bug and the patient.

It may be worthwhile to study the rather fascinating and time-sensitive problem faced every year in coming up with the appropriate mixture of influenza substrains for that year's vaccine. Influenza strains are classified first by which of three H and two N factors are present in a given virus, with substrains below that level (below H3N2, say). In general, the first of the new year's strains start in animals in Western China and may mutate on their way into human form. There is a practical limit on how many strains can be put into the same batch of vaccine, and there is a lead time for vaccine production. Vaccine specialists, even ignoring things like this season's production disaster, have to make an informed guess about what to tell the manufacturers to prepare, which may or may not match the viral strains clinically presenting in flu season.

There really are a number of applications of epidemiology to network operational security. In this community, we note the first appearances of malware and have informal alerting among NOCs and incident response teams, but I am unaware of anyone using the formal epidemiological/biostatistical methods of contact and first-occurrence tracing. Applying some fairly simple methods to occurrence vs. time vs. location, for example, can reveal whether there is one source of infection that infects one victim at a time, whether there is contagion (as distinct from infection) from victim to victim, and so on. Indeed, some of the current work on early warning of biological warfare attack may have useful parallels to distinguishing random infection from an intelligently controlled botnet DDoS.

Howard
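As a rough illustration of the occurrence-vs.-time idea: a single scanning source tends to add victims at a roughly constant rate, so the cumulative victim count grows linearly, while victim-to-victim contagion compounds, so the count grows closer to exponentially. The Python sketch below simply compares how well a straight-line fit and a log-linear fit explain a series of report times. The timestamps are invented, and real contact or first-occurrence tracing would also use location and topology, which this toy ignores.

# Toy growth-signature check on incident reports.  A single scanning source
# tends to add victims at a roughly constant rate (cumulative count roughly
# linear in time), while victim-to-victim contagion compounds (cumulative
# count roughly exponential in time).  All timestamps are invented.

import math

def _fit_error(xs, ys):
    """Normalized least-squares residual of a straight-line fit of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys) or 1.0
    return residual / total     # normalized so fits on different scales are comparable

def growth_signature(report_hours):
    """Label an outbreak by whether linear or exponential growth fits the reports better."""
    counts = list(range(1, len(report_hours) + 1))                       # cumulative victims at each report
    linear_err = _fit_error(report_hours, counts)                        # fit: count ~ a*t + b
    expo_err = _fit_error(report_hours, [math.log(c) for c in counts])   # fit: log(count) ~ a*t + b
    return "victim-to-victim contagion" if expo_err < linear_err else "single scanning source"

# Invented report times (hours since the first incident) for two hypothetical outbreaks.
steady_scan = [1, 3, 5, 7, 9, 11, 13, 15]               # one new victim every two hours
worm_like = [math.log(k) for k in range(1, 9)]          # k-th victim appears at t = ln(k): exponential growth

print(growth_signature(steady_scan))   # -> single scanning source
print(growth_signature(worm_like))     # -> victim-to-victim contagion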