Rearranged slightly.
What are the technical issues with extremely long-distance (transoceanic) peering?
In particular, what are the issues interconnecting layer 2 switches across the ocean for the purposes of providing a global peering cloud using:
In the generic sense, the issues are largely the same as interconnecting to the L2 switch in the customer's cage (operated by the customer as an aggregation device) or the L2 switch implementing another exchange fabric in the same building or metro. Complex L2 topologies are hard to manage, especially when the devices that implement that topology are not all managed by the same entity. L2 has poor policy knobs for implementing administrative boundaries; if such a boundary is involved, the events that you need to anticipate are largely the same whether the switches are separated by a cage wall or an ocean.

The auto-configuring properties of L2 fabrics (like MAC learning, or spanning tree) that make L2 an attractive edge technology can be very detrimental to smooth operation when more than one set of hands operates the controls. An exchange point is, quite literally, an implementation of an administrative boundary; the desire of customers to use L2 devices themselves (for aggregation of cheap copper interfaces toward expensive optical ones, use of combined L2/L3 gear, or whatever other reason) means that the L2 fabric of any Ethernet-based exchange has potentially as many hands on the controls as there are customers. So, good operational sense for the exchange point operator means exercising as much control over those auto-configuring properties as possible: turning them off, or turning automatic functions into manual ones.

Did I mention that L2 has lousy knobs for policy control? (They're getting a little better, so perhaps whatever is a notch better than "lousy" is appropriate.) One of the things you have to turn off is spanning tree (read the thread from a few months back on the hospital network that melted down for background material). That means you have to build a loop-free topology out of reliable links; you can get reliable links with all three of the technologies you mention, but the loop-free topology is on you (see the sketch after the quoted options below). In order to use inter-metro connectivity efficiently, you are limited to building L2 patches that each implement pairwise connectivity between two metros. That makes this:
0) vanilla circuit transport to interconnect the switches
hard, because your interior connectivity is dedicated to one of those pairwise combinations (hard, but not impossible, assuming you have some capex to throw at the problem). The pairwise limitation also, indirectly, puts the kibosh on using this fabric as a means to pretend that a network has a backbone in order to qualify for peering that it wouldn't get otherwise. That leaves these two:
1) MPLS framed ethernet service to interconnect the switches
2) tunnelled transport over transit to interconnect the switches
which will carry the exchange point traffic over an L3 (okay, so MPLS is "L2.5") network; in addition, you get the benefit of having all the L3 knobs available in the interior to do traffic engineering. Both options perform better when the interior MTU can accommodate the encapsulation protocol plus payload without fragmenting, so someone is operating routers with SONET interfaces in this equation (rough MTU arithmetic below).

Who benefits?

- The operator of the L3 network that carries the inter-EP fabric gets revenue.

- The people who peer using this L2 fabric get to avoid some transit, but I would argue that it is only to reach providers that are similarly desirous of avoiding transit, since this won't help the upwardly mobile with the geographic diversity component of getting to the "next" tier.

Who loses?

- Transit providers who came to the exchange point for the purpose of picking up transit sales.

- If the exchange point operator is the one carrying the traffic, they lose for competing with their customers in the previous bullet; they will have taken the first steps on the path from being an exchange point operator to being a network-plus-colo provider (where they'll compete with the network-plus-colo providers just coming out of bankruptcy with all their debt scraped off).

So far, there has been an assumption that the provider of inter-EP connectivity is willing to portion it out in a manner that is usage-insensitive for the participants. I don't believe that the glut of capacity or the other expenses that come with operating an L3 network have driven costs so low that the resulting product is "too cheap to meter." If that is the case, then delivering robust, auditable service is better implemented by connecting the customers up to the L3 gear and implementing their L2 connections as pairwise virtual circuits between customers (so you can be sure you're not paying remote rates to talk to a local customer, or billing local rates to traffic that a customer exchanges over the wide area). We have effectively just taken the exchange point switch fabric out of the equation and layered the peering fabric as VCs (MPLS or tunneled Ethernet) on a transit network where each customer is connected on a separately auditable interface. At that point:

- the people doing the peering might just as well use their transit, unless they work together (in pairwise combinations, so this doesn't scale well) to get on-net pricing from a common transit provider.

- the people doing the peering might as well just buy a circuit, if circuits are going that cheap, or build out their own network in hopes of making the grade with respect to the geographic diversity component of other people's peering requirements.
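Back to the loop-free obligation mentioned above: with spanning tree switched off, nothing in the fabric will break a loop for you, so any inter-metro patch plan has to be checked for loops before it is lit. A minimal sketch of that check in Python; the metro codes and links are hypothetical, not anyone's actual topology:

    # Check that a proposed set of inter-metro L2 patches is loop-free.
    # With spanning tree turned off, any cycle becomes a broadcast storm.

    def is_loop_free(links):
        """links: list of (metro_a, metro_b) pairs; True if acyclic."""
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for a, b in links:
            ra, rb = find(a), find(b)
            if ra == rb:
                return False  # a second path between two metros: a loop
            parent[ra] = rb  # union the two components
        return True

    # Three patches merged into one broadcast domain form a loop:
    print(is_loop_free([("TYO", "LAX"), ("TYO", "SJC"), ("LAX", "SJC")]))
    # -> False; keep the patches pairwise (one L2 domain per metro pair)
    #    and each domain is trivially loop-free.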
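On the MTU point: the interior has to carry the customer's full 1500-byte Ethernet payload plus the encapsulation without fragmenting. Rough arithmetic, where the per-encapsulation overheads are my assumptions for typical EoMPLS and GRE framing, not figures from this thread:

    # Interior MTU needed to carry a 1500-byte customer payload intact.
    # Overhead figures are assumptions for common encapsulations.

    CUSTOMER_PAYLOAD = 1500  # standard Ethernet MTU at the customer port
    INNER_ETHERNET = 14      # customer's Ethernet header carried inside

    encaps = {
        "MPLS-framed ethernet (2 labels)": 2 * 4,   # 4 bytes per label
        "tunnelled over transit (GRE/IP)": 20 + 4,  # IP + GRE headers
    }

    for name, overhead in encaps.items():
        print(f"{name}: interior MTU >= "
              f"{CUSTOMER_PAYLOAD + INNER_ETHERNET + overhead}")

    # MPLS-framed ethernet (2 labels): interior MTU >= 1522
    # tunnelled over transit (GRE/IP): interior MTU >= 1538
    # Both exceed 1500, hence the routers with SONET interfaces
    # (POS MTU is typically 4470) in the equation.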
I understand that there is a real glut of AP transoceanic capacity, particularly on the Japan-US cable where twice as much capacity is idle as is in use. This has sent the price point down to historic levels, O($28K/mo for STM-1) or less than $200/Mbps for transport! This is approaching an attractive price point for long distance peering so, just for grins,...
Consumed at what quantities? Parceling this out at single-digit-Mb/s quantities increases the cost at the edge of the network that delivers this.
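To put numbers on that, here is the arithmetic on the quoted STM-1 price, plus a hypothetical fixed edge cost per parcel (the $500/mo port figure is made up for illustration):

    # Bulk transport from the post: ~$28K/mo for an STM-1 (~155.52 Mb/s).
    STM1_MBPS = 155.52
    TRANSPORT_PER_MONTH = 28_000.0

    bulk = TRANSPORT_PER_MONTH / STM1_MBPS
    print(f"bulk transport: ${bulk:.0f}/Mbps/mo")  # ~$180

    # Parcel it out in smaller chunks, each needing its own edge port
    # (the $500/mo per-port cost is a hypothetical number):
    EDGE_PORT_PER_MONTH = 500.0
    for parcel in (45, 10, 2):
        print(f"{parcel:3d} Mb/s parcel: "
              f"${bulk + EDGE_PORT_PER_MONTH / parcel:.0f}/Mbps/mo")

    #  45 Mb/s parcel: $191/Mbps/mo
    #  10 Mb/s parcel: $230/Mbps/mo
    #   2 Mb/s parcel: $430/Mbps/mo  <- the edge cost dominates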
Are there transport providers that can provide a price point around $100/Mbps for transport capacity from Tokyo to the U.S. (LAX/SJO)?
SJO is in Costa Rica, bud. :-)

Stephen
VP Eng., PAIX