New subject: PI vs PA Address Space

18 May 1995

| Stronger hierarchy leads to:
| 	- strong regulation of ISPs
| 	- hinders competition
| 	- no incentive to solve difficult routing problems
| 	- leads to governmental regulation and control

Let's revisit the economics of the global Internet.
You pay for three things, two of which are real products
and one of which is an elasticity factor:

	1/ delivery of packets into the global Internet
	2/ receipt of packets from the global Internet (reachability)
	3/ warm fuzzies ("they know what they're doing; they 
		are responsive to my needs")

Item (1) is what you get when your immediate service provider
turns up your circuit and you say

	ip route 0.0.0.0 0.0.0.0 Serial0

on your router.  The rate at which you can deliver packets
into the Internet is the minimum of the sum of egress
bandwidths from your local small-i internet, any choke points
in the path to egress points, or the width of your circuit.

For example, in the simple case, if you have an E1 and your
service provider has a 512kbps circuit to AlterNet, your
maximum delivery rate of traffic into the global Internet is
512kbps plus any local connectivity.
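
As a rough sketch of that arithmetic (Python, purely illustrative;
the function name and its parameters are my own, and the 2048kbps
figure is just the nominal E1 line rate), the delivery ceiling is
the smallest of the three quantities named above:

```python
def max_delivery_kbps(circuit_kbps, egress_kbps, choke_points_kbps=()):
    """Upper bound on the rate at which a site can push packets outward.

    circuit_kbps      -- width of the customer's own circuit
    egress_kbps       -- bandwidth of each egress link from the local internet
    choke_points_kbps -- any narrower links on the way to those egress points
    """
    candidates = [circuit_kbps, sum(egress_kbps), *choke_points_kbps]
    return min(candidates)


E1_KBPS = 2048  # nominal E1 line rate

# The simple case: an E1 to a provider with one 512kbps circuit upstream.
print(max_delivery_kbps(E1_KBPS, [512]))
```

Traffic that stays within the local internet never crosses the
egress links, which is why local connectivity comes on top of
this figure.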

The pricing for item (1) is typically the cost of the
physical connection to you plus some value which reflects
the effect your bandwidth utilization is likely to have
on choke points plus a percentage.

Item (2) is what you get when your immediate service provider
has arrangements in place to have their customers' prefixes
carried and made reachable nearly ubiquitously.

("Nearly" covers firewalls and networks with policy constraints
which are enforced via routing mechanisms).

Until fairly recently, the guarantee of even nearly ubiquitous
reachability was impossible to make thanks to the way the
AUP was enforced.

However, once you had the NSFNET backbone service carrying
your routing information, you generally had nearly ubiquitous
routing, thanks to the fact that practically everyone
defaulted to AS 690.

Then along comes Change.

The first two huge changes were the CIX and MAE-EAST, two
enormous steps away from the model of AS 690 as the network
to which you simply defaulted.

Suddenly rather than having PSI aggregated behind AS 690,
AlterNet started hearing all their routes directly, and
preferring those.

Generally speaking, the MAE-EAST participants started on
a path wherein they preferred any announcement over anything
heard from AS 690, which often enough was left as a default.

Over time, some of the MAE-EAST participants stopped
defaulting to ANS, partly because the amount of routing
information reachable only from ANS grew smaller, and partly
because in several ways it's easier to manage full routing
for recovery and optimization than it is to manage partial
routing plus a default.

Eventually routers stopped being able to handle full routing
in 16Mb of memory, and suddenly the very real cost of
carrying routing information around became clear to a number
of providers: how much did replacing a bunch of mostly-AGS+
routers with 64Mb Cisco 7000-series routers cost?

This was one of the big pushes behind serious deployment of CIDR.

CIDR's principal goal was to keep routing tables small by
hiding detail, that is, by aggregating into bigger blocks.
(Its secondary goal, full classlessness, is being played with
as folks start experimenting with interdomain routing of
subnets of classful networks).

Originally the need to keep routing tables small was to
prevent routers which had not been converted to 64Mb boxes,
and which could not get by without knowing large amounts of
routing information, from running out of memory and crashing.

Recently we have started noticing that, while memory
consumption is still a real issue for a number of people in
the world, those people with 64Mb boxes are starting to
notice that the amount of CPU used by carrying full routing
is increasing, especially as interdomain convergence time
is decreasing to the point where an update is seen by most
Ciscos in the U.S. in a matter of a few seconds.

In normal operation, with the normal background noise of a
few flaps per second (largely attributable to flakey network
connections and people doing dynamic routing updates for
dialup users, and some level of longer-term transitions),
most routers talking BGP hardly notice any CPU hit at all.
Even those routers doing significant amounts of as-path
and prefix-based filtering for various reasons (mostly
involving backup arrangements and making sure bad things
don't happen (giving or receiving accidental transit, not
accepting or propagating certain bad prefixes (like not
accepting an announcement for one's own backbone network
from external peers), and so forth)) are borderline.

A couple of such boxes spend a constant 30-45% of their CPU
handling BGP; others run at a constant 20% handling BGP.

When a big transition happens, such as when someone
at MCI or Sprint types clear ip bgp * at MAE-EAST+,
several routers all over the world jump from less than 10%
to 100% CPU utilization for on the order of ten minutes.

As the number of prefixes increases -- and with it the amount
of routing flap -- both the amount of CPU spent on normal
everyday processing and the amount of real time necessary to
handle a major transition increase.

One observation that has been made is that smaller prefixes
are likelier to flap than larger prefixes.  An analysis of
what prefixes were flapping that I did for the last NANOG
seemed to indicate (after much discussion with the folks
originating the prefixes) that the majority of flaps were
caused by /24s used by dialup customers that got introduced
into the global routing system upon connection, and removed
when the dialup customer hung up.

Multiply this by lots of simultaneous dialup customers
and you have a problem.

The problem is fixable by aggregation.  If you aggregate
all these /24s (or /28s or whatever) into something bigger,
that something bigger is much less likely to flap, and
moreover can easily be set up so that it never flaps at all.
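
To make the aggregation concrete, here is a minimal sketch using
Python's standard ipaddress module.  The sixteen /24s in 10.1.0.0
are made-up illustrative addresses, not anyone's real dialup pool:

```python
import ipaddress

# Sixteen contiguous /24s, one per hypothetical dialup customer.
dialup_routes = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(16)]

# collapse_addresses merges adjacent networks into the fewest
# covering prefixes.
aggregated = list(ipaddress.collapse_addresses(dialup_routes))

print(len(dialup_routes), "routes before aggregation")
print([str(net) for net in aggregated], "after aggregation")
```

Sixteen routes that each come and go with a phone call become one
/20.  If that /20 is announced whether or not any particular
customer is connected, nobody outside ever sees the individual
/24s appear and disappear.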

Nailing down these problems helps considerably, but the
amount of CPU used by BGP in increasing numbers of routers
is getting scary.

Following the line of reasoning -- which seems to hold up in
practice -- that on average, smaller prefixes are likelier
to flap over time than larger prefixes, one really wants
to see a large reduction in the number of smaller prefixes
carried globally.

That's not to say that local delegations should be big;
a dialup user should get as small a chunk of address space
as necessary, a dedicated line customer likewise, in an
effort to avoid wasting address space, and also in an effort
to assist in aggregating lots of individual connections
behind a largeish (/18 or shorter) prefix.

So, on the theory that pretty much every prefix that's /18
or shorter aggregates enough links and flap-prone things
within it, and with the observation that very few prefixes
shorter than 18 bits flap in normal circumstances (pace one
international connection that was so completely saturated
that BGP kept falling over due to keepalive timeouts, which
caused traffic to fall off, which allowed BGP to re-establish
itself, causing the cycle to repeat -- this got fixed),
several NSPs started talking about how to go about reducing
the number of prefixes longer than /24 with global scope to
essentially zero.

That is, while you can have a /24, /28 or /32 now or in the
future, and while it can have local scope within a small-i
internet (even one that's a big chunk of the big-I global
Internet), right now nothing longer than /24 will have
global scope at all, and ***in future blocks***, by default,
nothing longer than /18 or /19 (it's /18 now, but it's
not entirely inflexible, and dialogues continue) will 
have global scope.

(I note  *** in future blocks ***  because people get
really terrified that their current /24 will become
useless Real Soon Now.  That is not the plan, and likely
won't be necessary any time soon, _especially_ if 
future allocations can be done right.  Things are trending
in the right direction.)
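
Stated as a rule, and only as a sketch (Python; the constant and
function names are mine, and the /18 figure is the one quoted
above as still under discussion), the length test looks like this:

```python
import ipaddress

MAX_LEN_EXISTING_BLOCKS = 24  # nothing longer than /24 has global scope now
MAX_LEN_FUTURE_BLOCKS = 18    # assumed default for future blocks (/18 or /19)


def has_global_scope(prefix, in_future_block):
    """Return True if a prefix is short enough to be carried globally."""
    net = ipaddress.ip_network(prefix)
    if in_future_block:
        limit = MAX_LEN_FUTURE_BLOCKS
    else:
        limit = MAX_LEN_EXISTING_BLOCKS
    return net.prefixlen <= limit


print(has_global_scope("192.0.2.0/24", in_future_block=False))
print(has_global_scope("192.0.2.0/28", in_future_block=False))
print(has_global_scope("10.0.0.0/19", in_future_block=True))
```

The addresses are placeholders.  The point is only that the same
/24 passes or fails depending on which block it was allocated from.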

"Local scope" could be as small as your immediate provider,
or that provider's provider, or even a largeish NSP.
However, if it's not aggregatable into a larger block,
it won't work for interdomain routing among several
size-large NSPs.
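
One way to picture "aggregatable into a larger block" is to ask
whether a prefix falls inside a block that some provider already
announces.  A minimal sketch, assuming Python 3.7 or later for
subnet_of; the function name and all addresses are illustrative:

```python
import ipaddress


def covering_aggregate(prefix, provider_aggregates):
    """Return the first provider aggregate containing prefix, or None."""
    net = ipaddress.ip_network(prefix)
    for aggregate in provider_aggregates:
        aggregate_net = ipaddress.ip_network(aggregate)
        if net.subnet_of(aggregate_net):
            return aggregate_net
    return None


provider_blocks = ["10.8.0.0/16"]  # hypothetical provider aggregate

print(covering_aggregate("10.8.4.0/24", provider_blocks))    # inside the block
print(covering_aggregate("172.16.5.0/24", provider_blocks))  # on its own
```

A prefix with a covering aggregate costs the rest of the world
nothing extra to reach.  One without it has to be carried as its
own entry by everyone who wants to reach it.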

Again, the general idea is to keep interdomain routing
working in such a way that it doesn't make moving packets
impossible.

Which returns us to point #2.   Arranging global reachability
for a prefix is nontrivial; lots of things happen in the
background at all levels in order to make global routing
work.

You pay your provider to pay their provider to pay their
provider etc. to work out the hard problems so that a single
piece of email, or an RADB object update or an addition to a
configuration in a router or a phone call is all that's
necessary for you to announce a new network out to the world.

There's a problem though, and that is that the cost of making
some prefixes reachable is much greater than for others.

In fact, the cost of making everyone's nonaggregatable /28,
/29, ... /32 reachable globally is so great that it is
easier to say it simply cannot happen, in large part because
the cost includes designing, building and deploying new
router technology in several NSPs and ISPs, so that the
routers of the world can actually handle enormous numbers of
prefixes, especially when someone types clear ip bgp *
at a large exchange point.

Finally, (3).  It's clear that people have different needs
and wants and requirements from their service providers.
Generally speaking, the bulk of Sprint's customers want the
global Internet to work, because their users want
sex-on-demand with people in Finland and to go poking around
Brandy's Babes' home pages or www.plaything.com, or whatever
it is that users do.  The bulk of Sprint's customers are
pretty clever and realize that while there are a lot of
things that look really really ugly, even or especially from
their perspective, they really are necessary in order to
keep the global Internet working.

Among the things we do realize is that yes, there are side
effects to proxy aggregating a size-large service-provider's
non-aggregated CIDR blocks, and yes there are side-effects
involved in pushing for renumbering into large aggregatable
blocks, and yes there are side-effects to putting up filters
that block prefixes longer than 24 bits, and yes there are
side effects to rewriting our old policy of, "we talk BGP
with you if you're a reseller period" to "we prefer not
to talk BGP at all, unless there is a strong technical
reason to do so".

However, in all these cases the position we take is that these
ugly things (and yes, a whole bunch of much less ugly things)
are necessary in order for the global Internet to work, and
in order for us to offer you a level of service such that
your customers or corporation or whatever doesn't scream
bloody murder at you because things Just Don't Work because
some router somewhere just keeled over because it was asked
to do too much.

Moreover, it's not just Sprint taking this line with their
customers -- others do too, and give their customers the
warm fuzziness that their customers are willing to pay for.

So, in the final analysis, what we're pushing for does
not reduce competition in an economic sense, although
it does have side-effects.  There is plenty of room in 
the current marketplace for all sorts of competition, and
even more room for specialization and cooperative deals,
which is normal for a growth market of this magnitude.

Lastly, the people most affected by the side-effects
of keeping Sprint's part of the Internet up and running
and connecting more than sixty countries and four
hundred IP resellers are Sprint's customers and their
customers.  Given how little we directly compete with
our customers anyway, while they are right to wish there
were some other way (so does Sprint!), I think they
also realize that the last thing we are trying to do is put
them out of business or make it difficult for them to compete.

Healthy customers make for healthy revenues.

And a healthy Internet makes for healthy customers.

That's all.

	Sean.

Re: PI vs PA Address Space

Sean Doran