I had hoped to be able to stay out of this particular
argument, but several messages, notably Alan Hannan's,
seem to need some commentary.
I hope people will forgive editing lapses; the amount of
revision time I had for this is nearly zero.
There are two somewhat-related issues at play here.
The first is the issue of peering vs transit, which
goes to the heart of how BGP is used. The second is the
issue of peer vs. customer and boils down to "cui bono?".
I shall address them separately. People interested more
in "peering economics" may want to search forward for "II."
I have some issues with how BGP is used traditionally,
most notably with respect to how people tend to consider
ASes and how policies are denoted. Rather than pick on
how things are done now, I'd rather describe how I think
about BGP and the directions it ought, in my opinion, to
evolve.
Firstly, the important thing to note is that
historically there has never been a perfect
definition for "peering" vs "transit", although
many people (me, in particular) tried to make things
simple by treating most peers in one particular way
and everyone else in a completely different way.
However, given the cost of international connectivity
and the various things ICM has done for other fednets,
particularly during the NSFNET transition, this was
never universally applied.
Moreover, with multihoming being all the rage, it would be
a stupid engineering philosophy that built things with the
belief that a clean and simple peering/transit distinction
would be permanent.
All this has driven and continues to drive the evolution
of BGP and of the tools for manipulating what is
transmitted to and accepted from BGP neighbours of all
sorts in interesting ways.
Therefore, there is perhaps only a very blurry,
connotation-laden definition for "peering" and "transit".
Secondly, I like to think of a RIB and the NLRI that is
exchanged via BGP in terms of a set of reachable items
which can be described in terms of geometry, and which, in
fact, describe a partial topology of the Internet. This
thinking lends itself more towards calculus rather than
algebra, yet most of the tools for manipulating what
enters and exits a RIB, or what NLRI is exchanged, tend to
be described much more algebraically.
Moreover, it is not too much of a stretch to consider a
RIB as something along the lines of a relational database,
particularly if one is used to doing all sorts of kinky
filtering and attribute modification.
The bright thing about thinking along these lines is that
typically anything that can be expressed in terms of
relational calculus can be expressed in terms of the
relational algebra. I would therefore not argue that the
tools available for dealing with BGP in a very general
sense are broken, only that they are unwieldy.
Given the difficulty, in two well-known implementations,
of introducing a policy of, "from this neighbour accept
a set of prefixes in which all prefixes are at least 24
bits long, and in which all prefixes in the range of
206.0.0.0 to 223.255.255.255 are at least 19 bits long",
and then debugging and modifying it afterwards, the
algebraic approach tends to be painful.
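For illustration, that policy is compact when written as a
single declarative predicate. This is a hedged sketch, not
anyone's production filter; all names are mine, I read "at
least N bits" as a bound on mask length ("no longer than
/N"), and the comparison is easily flipped if the opposite
reading is intended:

```python
import ipaddress

# Boundaries of the 206.0.0.0 - 223.255.255.255 range from the policy;
# this range is not a single CIDR block, so compare addresses directly.
RANGE_LO = ipaddress.IPv4Address("206.0.0.0")
RANGE_HI = ipaddress.IPv4Address("223.255.255.255")

def acceptable(prefix, general_bound=24, range_bound=19):
    """Accept `prefix` when its mask length is within the general bound,
    or within the stricter bound when it falls in the 206-223 range."""
    net = ipaddress.ip_network(prefix)
    in_range = RANGE_LO <= net.network_address <= RANGE_HI
    return net.prefixlen <= (range_bound if in_range else general_bound)
```

The point is not the five lines themselves but that the
same statement, in the algebraic style of the two
implementations mentioned, expands into something much
harder to debug and modify.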
This kind of thing is fairly common, too; there are two
common implementations of, "to this peer, send only my
customers' routes", hinging on the definition of "only
my customers' routes". The first, used at provider A,
is to keep a list of all the ASes with which A peers,
and to prevent any prefixes originating in or behind
each of these ASes from being reannounced. The
second, which appears to be more common, is when provider
B maintains an explicit list of all the ASes which are
downstream from B, and announces only prefixes originating
from those ASes.
There are some variations on approach A and B, some of
which deal with prefixes as well as or instead of ASes.
There is also a third approach, which is newer, and
similar to approach A. Provider C tags all NLRI received
from its peers with a community attribute, and ensures
that nothing with that community attribute is passed on to
its peers.
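The three approaches line up neatly when written as
selections over a toy RIB. A hedged sketch; the row shape,
the community string, and the function names are all mine,
not any router's actual data model:

```python
# Approach C's tag, applied on ingress from peers; the value is made up.
PEER_TAG = "learned-from-peer"

def exportable_a(rib, peer_ases):
    """A: suppress anything originated by, or passing through, a peer AS."""
    return [r for r in rib if not set(r["as_path"]) & set(peer_ases)]

def exportable_b(rib, customer_ases):
    """B: announce only routes originating in an explicitly listed AS."""
    return [r for r in rib if r["as_path"][-1] in customer_ases]

def exportable_c(rib):
    """C: announce only routes not tagged with the peer community."""
    return [r for r in rib if PEER_TAG not in r["communities"]]
```

A and C maintain knowledge about peers; B maintains
knowledge about customers. Which list is cheaper to keep
accurate is what distinguishes them in practice.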
With respect to accepting inbound announcements there are
also multiple approaches. One, favoured by A, is to tweak
the list of all peers' ASes to prevent any peer from
announcing prefixes that another peer would otherwise
announce. A second approach, which appears to have been
abandoned everywhere but at the "edges", was to accept
from any particular peer only specific, exact prefixes.
This approach has mutated into, "allow only this
particular mapping of prefix to originating AS", with some
modifications to perform some AS-based operations along
the lines of the next approach. This next approach is to
specify a list of AS paths or fractional AS paths which
are "acceptable" from any particular peer, and is
in some use by large international transit providers.
Finally, there is the "I accept everything" approach,
which essentially does no inbound filtering from peers,
because peers are expected to be trustworthy.
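Of these, the "allow only this particular mapping of
prefix to originating AS" check is the easiest to sketch.
The table and names below are illustrative, not drawn from
any registry tool:

```python
# Registered prefix -> AS expected to originate it (illustrative entries).
ALLOWED_ORIGIN = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64501,
}

def accept_inbound(prefix, as_path):
    """Install an announcement only if its origin AS (last element of
    the AS path) matches the registered origin for that prefix."""
    return ALLOWED_ORIGIN.get(prefix) == as_path[-1]
```

Unregistered prefixes and mismatched origins both fall
through to a refusal, which is the conservative default.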
In each of these cases, a simple policy of send
everything, or send only my stuff, is fairly tractable.
However, implementing policies such as:

    to peer P send all my stuff, and also X's stuff,
    since I do mutual back-up with X

    from peer P, accept all their stuff, and also X's
    stuff

especially when X is a large number of providers, becomes
awkward.
Moreover, policies such as:

    to peer Q send everything that they're paying for
    transit to

become awkward when considering how to propagate Q's
prefixes towards all the places they're paying for transit
to, but not towards all the other prefixes.
One of the things I have thought about from time to time
is doing a merge of gated and quel from University Ingres
or the like. I would far prefer to match against several
things in the RIB simultaneously to build up a set of
prefixes to announce, or to match against several things
in a received announcement in order to determine whether
to install a particular prefix into the RIB to begin with,
possibly tweaking particular attributes.
In fact, this is done rather regularly now, using
route-maps in IOS; however, it's ugly, particularly as
complexity increases and one wants to do many operations
on large numbers of sets of incoming NLRI, such as
modifying local preferences so that longer AS paths can
serve as backup or primary paths, rewriting MEDs, making
exceptions in prefix-length filters, and the like.
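A route-map does this imperatively, clause by clause; the
relational style would let one select on several
attributes at once and then adjust an attribute on the
result. A minimal sketch (the row shape, function names,
and the toy policy are all mine):

```python
def select(rib, **conditions):
    """Rows whose attributes satisfy every predicate in `conditions`,
    like a WHERE clause over the RIB-as-table."""
    return [r for r in rib
            if all(pred(r[attr]) for attr, pred in conditions.items())]

def set_attr(rows, attr, value):
    """Like an UPDATE on the selected rows."""
    for r in rows:
        r[attr] = value
    return rows
```

So "demote long AS paths heard from this neighbour" is one
query: select on path length and neighbour simultaneously,
then set local preference on whatever matched.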
In a sense, this approaches the core of a series of
discussions elsewhere, revolving around "traditional"
implementations of route-selection.
I am a member of the Church of Explicit Selection, where
the people controlling a particular set of routers should
be able to make their own choices of what routes to use
and what routes to propagate. The current crusade
involves ripping apart MED, prefix length and AS path
length as selectors, and making it possible to use these
to modify a local metric (i.e., local preference).
The traditionalists sometimes get hung up on what should
be done in the case of ties; however, there are a couple
of things I think there is broad agreement upon:
-- prefix length should not be the ultimate determinant
(The folks who have been having problems with
another provider announcing subnets of their
aggregates might like this...)
-- AS paths should be little more than a trail of
breadcrumbs used to prevent announcement loops,
and AS path _length_ should not necessarily
affect a routing decision
-- AS paths could benefit from a per-AS metric
which could help in the use of AS path _content_
to modify the local metric.
(The idea is that one may want to attract
traffic towards a particular AS1.AS2 pair
or deflect it away from a particular AS3.AS4 pair)
-- MEDs seem to be used in three ways:
i. to attract traffic to one or the other
box where there are multiple boxes in
use by the same AS at an exchange point
(defeating the use of lowest-router-ID
as tiebreaker)
ii. to attract traffic towards one
exchange point or the other or
to repel traffic away from one
exchange point or the other
iii. transmitting information about one's
internal topology to a peer, so that
the peer can adjust routing decision-
making appropriately
(There has been some debate about whether
overloading attributes with multiple semantic
behaviours is stupid or useful; I am of the
former opinion, and think that each of these
could usefully be its own attribute)
Moreover, MED use has changed and likely will
change again, and the current standard
interpretation of MED is not quite right.
Each of these changes is principally driven by the
evolution of the way routes acquired by BGP are chosen and
propagated, and reflect a move towards treating BGP
routing information as a sort of relational database.
If the tools for choosing and propagating BGP routing
information likewise evolve towards a relational set
theory, the distinction between "peer" and "transit" as
they tend to be thought of these days could easily be
thrown out the window.
I would like this; I like a model wherein what is routed
where becomes a combination of engineering sensibility and
business decision-making, and that requires an easy way to
express a program that does integration and
differentiation of sets of NLRI, based on large numbers of
attributes (individual ASes, fractional and complete AS
paths, prefixes, mask length, announcing router, recent
stability, internal and external topology, time of day and
phase of moon), and uses these sets when generating a RIB,
when propagating NLRI onwards, and in other operations,
such as forming aggregates.
Current tools do not make this easy, even for people who
know what they're doing.
Finally, all of this leads me to conclude that on a purely
technical level, the denotative difference between "peer"
and "transit" is so obscured as to be meaningless, except
that perhaps with current tools, "transit" is easy to
configure and "peering" is not. I would prefer,
therefore, not to think of these as technical terms at
all, but rather economic ones.
This of course leads to the next thing:
II.
Economically, the definition of "peering" vs "transit" is
easy. The former is currently free, while the latter is
not. Alternatively, the latter is what one sells to
customers, while the former is what people would like to
sell, but can't figure out how to price properly.
I like considering the following simple equation:
Value == Bandwidth + Reachability + Service
Until recently, the bandwidth available at an exchange
point was impossible to limit; an aggressive sender could
swamp a particular provider's connection, as could an
aggressive receiver. Moreover, as people have been
discussing, without the assistance of an EP operator,
there is no way to prevent people from sending traffic
into your network without your permission, and EP
operators don't seem to want to wade into the issues that
have surfaced on the NANOG list today and yesterday.
With 11.2 on its way, including some development work that
stemmed from a conversation with Fred Baker, this problem
is softened somewhat; with a crunchy IOS box, one can
specify rate limits on subinterfaces.
across a particular subinterface exceeds a threshold,
the router will implement a dropping strategy.
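The dropping strategy can be pictured as a token bucket:
forward while tokens last, drop on excess. This is a
generic sketch of the idea, not the actual IOS algorithm,
and all names are mine:

```python
class RateLimiter:
    """Per-subinterface limit: a bucket refilled at the configured rate,
    drained by forwarded packets; packets that find it empty are dropped."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0      # refill rate in bytes per second
        self.capacity = burst_bytes     # maximum burst
        self.tokens = burst_bytes
        self.last = 0.0                 # time of the previous decision

    def offer(self, size_bytes, now):
        """Return True to forward the packet, False to drop it."""
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False
```

The same counters that drive the drop decision are what
make per-peer bandwidth _monitoring_ fall out for free.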
This has enormous utility not only to people selling
fractional DS3s and the like, but also to people at
LAN-based EPs. This should give any LAN-based EP
customer the one interesting feature of ATM-based EPs: the
ability to select one's peers and to control how much
bandwidth they can use. It also should allow one to
_monitor_ how much bandwidth each peer is consuming.
One can therefore control the Bandwidth variable in my
simple equation, and do interesting things. One might
consider that for purposes of traffic planning, 256kbps or
1.5Mbps might be made available for free to someone at an
exchange point who is willing to accept a reduced level of
Service compared to a more traditional customer.
If a "peer" at an EP is using more than that amount of
bandwidth, one might want to negotiate an exchange of
money. For example, if A and B meet at an EP and are running
into the drop threshold because of the amount of traffic
being exchanged, the one with the screaming customers may
be asked to pay a certain amount of money per increment of
additional bandwidth. This may always be B's customers,
or it may flip-flop between A's customers and B's
customers over time, and thus should be considered a
business negotiation on both parties' part.
Finally, at some point it may be observed that X is a
large enough chunk of Y's total exchange-point traffic
that migrating that traffic elsewhere, perhaps to a
private peering, would appear to make engineering and
possibly business sense.
Now comes the fun bit: reachability.
I would consider that if A and B are peering and B is
paying some fee for some level of maximum bandwidth, that
B is a customer of A whether one wants to use that term in
marketing literature or not. In this case it seems much
easier for A to treat B as any other customer, providing
"transit" (that is, don't do much by way of filtering).
Alternatively, if there is no money changing hands, it
seems prudent for each of A or B to do some filtering to
avoid providing "transit".
The scope of that filtering is the heart of the long
discussion of charging for routing which has been going on
in various places for some time now, and would take too
long for me to get into here. However, it's clear that
reachability has value, and that it should be possible to
determine a price point for the exchange of that
reachability, in theory.
However, I like the simple case, and figure it should
solve some current heartburns in relatively short order.
If big providers were to come up with a "product" wherein
people wanting to talk at exchange points could easily be
configured up as customers, with prices being determined
by the maximum amount of bandwidth available before
dropping begins and the amount of response-time one wants
in the escalation path at the larger provider's NOC, this
may solve the majority case of small provider frustrations
with the Effectively No Peering At All policies of some of
the larger providers. This appears to be a win for all
parties (including Cisco, who gets to sell 75xx-es that
can handle the bandwidth-limitation feature, and the
various proposals and counter proposals wrt how to deal
with the cost and price of reachability).
The large parties get to make some income to at least
offset the costs of maintaining network infrastructure at
and around EPs, and can bring up extra connectivity
without losing engineering control and oversight of the
infrastructure and traffic flows. The EP operators get
more interest in EPs, since there will be less hesitation
on the part of large parties to appear at them. The small
providers get to buy cheaper, supported service while
calling it "peering", can control how their traffic
moves around at EPs, and can complain if they are paying
a particular amount of money for a service they are not
getting because of capacity problems within large
providers.
I expect to see this sort of thing rolled out fairly soon,
although I no longer know who will be the first to
announce it formally. Informally, this type of service
has been provided to some degree (modulo bandwidth
controls) by a fair number of parties at EPs, notably CAIS.
Sean.