I had hoped to be able to stay out of this particular argument, but several messages, notably Alan Hannan's, seem to need some commentary. I hope people will forgive editing lapses; the amount of revision time I had for this is nearly zero.

There are two somewhat-related issues at play here. The first is the issue of peering vs transit, which goes to the heart of how BGP is used. The second is the issue of peer vs. customer and boils down to "cui bono?". I shall address them separately. People interested more in "peering economics" may want to search forward for "II."

I have some issues with how BGP is used traditionally, most notably with respect to how people tend to consider ASes and how policies are denoted. Rather than pick on how things are done now, I'd rather describe how I think about BGP and the directions it ought, in my opinion, to evolve.

Firstly, the important thing to note is that historically there has never been a perfect definition for "peering" vs "transit", although many people (me, in particular) tried to make things simple by treating most peers in one particular way and everyone else in a completely different way. However, given the cost of international connectivity and the various things ICM has done for other fednets, particularly during the NSFNET transition, this was never universally applied. Moreover, with multihoming being all the rage, it would be a stupid engineering philosophy that built things with the belief that a clean and simple peering/transit distinction would be permanent. All this has driven, and continues to drive, the evolution of BGP and of the tools to manipulate what's transmitted to, and accepted from, BGP neighbours of all sorts in interesting ways. Therefore, there is perhaps only a very blurry, connotation-laden definition for "peering" and "transit".
Secondly, I like to think of a RIB and the NLRI that is exchanged via BGP in terms of a set of reachable items which can be described in terms of geometry, and which, in fact, describe a partial topology of the Internet. This thinking lends itself more towards calculus than algebra, yet most of the tools for manipulating what enters and exits a RIB, or what NLRI is exchanged, tend to be described much more algebraically. Moreover, it is not too much of a stretch to consider a RIB as something along the lines of a relational database, particularly if one is used to doing all sorts of kinky filtering and attribute modification. The bright thing about thinking along these lines is that typically anything that can be expressed in terms of the relational calculus can be expressed in terms of the relational algebra. I would therefore not argue that the tools available for dealing with BGP in a very general sense are broken, only that they are unwieldy.

Given the difficulty, in two well-known implementations, of introducing a policy of "from this neighbour accept a set of prefixes in which all prefixes are no longer than 24 bits, and in which all prefixes in the range of 206.0.0.0 to 223.255.255.255 are no longer than 19 bits", and then debugging and modifying it afterwards, the algebraic approach tends to be painful.

This kind of thing is fairly common, too; there are two common implementations of "to this peer, send only my customers' routes", hinging on the definition of "only my customers' routes". The first, used at provider A, is to keep a list of all the ASes with which A peers, and to prevent any prefixes originating in or behind each of these ASes from being reannounced. The second, which appears to be more common, is for provider B to maintain an explicit list of all the ASes which are downstream from B, and to announce only prefixes originating from those ASes.
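As an illustration of how compact that length policy becomes when written as a single predicate over a prefix, here is a minimal sketch using Python's standard `ipaddress` module. It assumes the two limits are maximum mask lengths (nothing more specific than /24 anywhere, and nothing more specific than /19 within 206.0.0.0 to 223.255.255.255), and that a prefix counts as "in the range" when its network address falls inside it; the function and constant names are hypothetical.

```python
import ipaddress

# Hypothetical policy constants.
RANGE_START = ipaddress.IPv4Address("206.0.0.0")
RANGE_END = ipaddress.IPv4Address("223.255.255.255")
GLOBAL_MAX_LEN = 24  # nothing more specific than /24 anywhere
RANGE_MAX_LEN = 19   # nothing more specific than /19 inside the range


def accept_prefix(prefix: str) -> bool:
    """Return True if an announced prefix passes the length policy."""
    net = ipaddress.IPv4Network(prefix)
    if net.prefixlen > GLOBAL_MAX_LEN:
        return False
    # Assumption: "in the range" is decided by the prefix's network address.
    if RANGE_START <= net.network_address <= RANGE_END:
        return net.prefixlen <= RANGE_MAX_LEN
    return True


announcements = ["192.0.2.0/24", "198.51.100.0/25", "207.46.0.0/16", "208.80.152.0/22"]
accepted = [p for p in announcements if accept_prefix(p)]
print(accepted)  # ['192.0.2.0/24', '207.46.0.0/16']
```

Adding an exception (say, one customer allowed a /20 inside the range) is one more `if` in the predicate, rather than another interleaved access-list entry.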
There are some variations on approaches A and B, some of which deal with prefixes as well as, or instead of, ASes. There is also a third approach, which is newer, and similar to approach A: provider C tags all NLRI received from its peers with a community attribute, and ensures that nothing with that community attribute is passed on to its peers.

With respect to accepting inbound announcements, there are also multiple approaches. One, favoured by A, is to tweak the list of all peers' ASes to prevent any peer from announcing prefixes that another peer would otherwise announce. A second approach, which appears to have been abandoned everywhere but at the "edges", was to accept from any particular peer only specific, exact prefixes. This approach has mutated into "allow only this particular mapping of prefix to originating AS", with some modifications to perform some AS-based operations along the lines of the next approach. This next approach is to specify a list of AS paths or fractional AS paths which are "acceptable" from any particular peer, and is in some use by large international transit providers. Finally, there is the "I accept everything" approach, which essentially does no inbound filtering from peers, because peers are expected to be trustworthy.

In each of these cases, a simple policy of "send everything" or "send only my stuff" is fairly tractable. However, implementing policies such as:

    to peer P, send all my stuff, and also X's stuff, since I do mutual back-up with X
    from peer P, accept all their stuff, and also X's stuff

becomes awkward, especially when X is a large number of providers. Moreover, a policy such as:

    to peer Q, send everything that they're paying for transit to

becomes awkward when considering how to propagate Q's prefixes towards all the places they're paying for transit to, but not towards all the other prefixes.

One of the things I have thought about from time to time is doing a merge of gated and QUEL from University Ingres or the like.
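To make the relational view concrete, here is a minimal sketch that stores a toy RIB in SQL tables and writes export approaches A, B and C, plus the "to peer P, send all my stuff and also X's stuff" policy, as one query each. SQL through Python's built-in `sqlite3` module stands in for QUEL here; the table and column names, the `'peer'` community tag, the AS numbers and the prefixes are all hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE route (id INTEGER PRIMARY KEY, prefix TEXT, origin_as INTEGER, community TEXT);
CREATE TABLE path (route_id INTEGER, pos INTEGER, asn INTEGER);
CREATE TABLE peer_as (asn INTEGER);
CREATE TABLE customer_as (asn INTEGER);
CREATE TABLE backup_as (asn INTEGER);
"""

# (id, prefix, originating AS, community, AS path with the nearest AS first)
ROUTES = [
    (1, "198.51.100.0/24", None, None, []),               # our own: empty AS path
    (2, "192.0.2.0/24", 64501, None, [64501]),            # listed customer
    (3, "172.16.0.0/16", 64503, None, [64503]),           # new customer, not yet listed
    (4, "198.18.0.0/15", 64510, "peer", [64500, 64510]),  # learned via peer 64500
    (5, "203.0.113.0/24", 64502, "peer", [64502]),        # backup partner X, also a peer
]
PEERS, CUSTOMERS, BACKUP = [64500, 64502], [64501], [64502]

QUERIES = {
    # A: drop anything originated in or behind a peer AS.
    "A": """SELECT prefix FROM route r WHERE NOT EXISTS (
                SELECT 1 FROM path p JOIN peer_as a ON a.asn = p.asn
                WHERE p.route_id = r.id)""",
    # B: our own routes plus those originated by listed downstream ASes.
    "B": """SELECT prefix FROM route
            WHERE origin_as IS NULL OR origin_as IN (SELECT asn FROM customer_as)""",
    # C: anything not tagged on import as learned from a peer.
    "C": "SELECT prefix FROM route WHERE community IS NOT 'peer'",
    # P: approach C, plus everything originated by backup partner X.
    "P": """SELECT prefix FROM route WHERE community IS NOT 'peer'
            UNION
            SELECT prefix FROM route WHERE origin_as IN (SELECT asn FROM backup_as)""",
}


def build_rib() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for table, asns in (("peer_as", PEERS), ("customer_as", CUSTOMERS), ("backup_as", BACKUP)):
        conn.executemany(f"INSERT INTO {table} VALUES (?)", [(a,) for a in asns])
    for rid, prefix, origin, community, as_path in ROUTES:
        conn.execute("INSERT INTO route VALUES (?, ?, ?, ?)", (rid, prefix, origin, community))
        conn.executemany("INSERT INTO path VALUES (?, ?, ?)",
                         [(rid, pos, asn) for pos, asn in enumerate(as_path)])
    return conn


def export(conn: sqlite3.Connection, policy: str) -> list:
    """Return the sorted prefixes a policy would announce."""
    return sorted(row[0] for row in conn.execute(QUERIES[policy]))


conn = build_rib()
for policy in QUERIES:
    print(policy, export(conn, policy))
```

With these lists, B omits the newly connected customer 64503 until someone adds it to `customer_as`, while A and C announce it at once; conversely, a new peer missing from `peer_as`, or left untagged, would leak under A or C. The P query differs from C only by its `UNION` arm.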
I would far prefer to match against several things in the RIB simultaneously to build up a set of prefixes to announce, or to match against several things in a received announcement in order to determine whether to install a particular prefix into the RIB to begin with, possibly tweaking particular attributes. In fact, this is done rather regularly now, using route-maps in IOS; however, it's ugly, particularly as complexity increases and one wants to do many operations on large numbers of sets of incoming NLRI, such as modifying local preferences to prefer longer AS paths which are to be preferred as backup or primary paths, rewriting MEDs, making exceptions in prefix-length filters, and the like.

In a sense, this approaches the core of a series of discussions elsewhere, revolving around "traditional" implementations of route selection. I am a member of the Church of Explicit Selection, where the people controlling a particular set of routers should be able to make their own choices of what routes to use and what routes to propagate. The current crusade involves ripping apart MED, prefix length and AS path length as selectors, and making it possible to use these to modify a local metric (i.e., local preference). The traditionalists sometimes get hung up on what should be done in the case of ties; however, there are a couple of things I think there is broad agreement upon:

 -- prefix length should not be the ultimate determinant
    (The folks who have been having problems with another provider announcing subnets of their aggregates might like this...)

 -- AS paths should be little more than a trail of breadcrumbs used to prevent announcement loops, and AS path _length_ should not necessarily affect a routing decision

 -- AS paths could benefit from a per-AS metric which could help in the use of AS path _content_ to modify the local metric.
    (The idea is that one may want to attract traffic towards a particular AS1.AS2 pair or deflect it away from a particular AS3.AS4 pair)

 -- MEDs seem to be used in three ways:
      i. to attract traffic to one or the other box where there are multiple boxes in use by the same AS at an exchange point (defeating the use of lowest-router-ID as tiebreaker)
     ii. to attract traffic towards one exchange point or the other, or to repel traffic away from one exchange point or the other
    iii. to transmit information about one's internal topology to a peer, so that the peer can adjust routing decision-making appropriately
    (There has been some debate about whether overloading attributes with multiple semantic behaviours is stupid or useful; I am of the former opinion, and think that each of these could usefully be its own attribute)

Moreover, MED use has changed and likely will change again, and the current standard interpretation of MED is not quite right. Each of these changes is principally driven by the evolution of the way routes acquired by BGP are chosen and propagated, and reflects a move towards treating BGP routing information as a sort of relational database. If the tools for choosing and propagating BGP routing information likewise evolve towards a relational set theory, the distinction between "peer" and "transit" as they tend to be thought of these days could easily be thrown out the window.
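A minimal sketch of explicit selection, covering only AS path content and MED: each is folded into a local metric (local preference), AS path length plays no part, and per-AS-pair adjustments attract or deflect traffic as described above. The pair table, the MED weighting (a penalty of MED/10), the default preference of 100 and the first-wins tie rule are all hypothetical choices, not a standard decision process.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    """One BGP path for a destination, reduced to the attributes used here."""
    prefix: str
    as_path: List[int]    # nearest AS first
    med: int
    base_pref: int = 100  # hypothetical default local preference


# Hypothetical operator policy: local-preference adjustments for adjacent AS pairs.
PAIR_ADJUST: Dict[Tuple[int, int], int] = {
    (64500, 64510): +30,  # attract traffic through the 64500.64510 pair
    (64501, 64520): -50,  # deflect traffic away from the 64501.64520 pair
}


def local_metric(c: Candidate) -> int:
    """Fold AS path content and MED into a single local metric."""
    pref = c.base_pref
    # AS path *content* matters via per-pair adjustments; path *length* does not.
    for pair in zip(c.as_path, c.as_path[1:]):
        pref += PAIR_ADJUST.get(pair, 0)
    # MED becomes a small, operator-weighted penalty rather than a hard tiebreak.
    pref -= c.med // 10
    return pref


def select(candidates: List[Candidate]) -> Candidate:
    """Pick the highest local metric; ties go to the first candidate (placeholder rule)."""
    return max(candidates, key=local_metric)


short = Candidate("192.0.2.0/24", [64501, 64520], med=0)
long_ = Candidate("192.0.2.0/24", [64500, 64510, 64515], med=20)
print(local_metric(short), local_metric(long_))  # 50 128
print(select([short, long_]).as_path)            # [64500, 64510, 64515]
```

Here the three-AS path wins over the two-AS path because of what is in it, not how long it is.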
I would like this; I like a model wherein what is routed where becomes a combination of engineering sensibility and business decision-making, and that requires an easy way to express a program that does integration and differentiation of sets of NLRI based on large numbers of attributes (individual ASes, fractional and complete AS paths, prefixes, mask length, announcing router, recent stability, internal and external topology, time of day and phase of moon), and that uses these sets when generating a RIB, when propagating NLRI onwards, and in other operations, such as forming aggregates. Current tools do not make this easy, even for people who know what they're doing.

Finally, all of this leads me to conclude that on a purely technical level, the denotative difference between "peer" and "transit" is so obscured as to be meaningless, except that perhaps with current tools, "transit" is easy to configure and "peering" is not. I would prefer, therefore, not to think of these as technical terms at all, but rather as economic ones. This of course leads to the next thing:

II.

Economically, the definition of "peering" vs "transit" is easy. The former is currently free, while the latter is not. Alternatively, the latter is what one sells to customers; the former is what people would like to sell, but can't figure out how to price properly.

I like considering the following simple equation:

    Value == Bandwidth + Reachability + Service

Until recently, the bandwidth available at an exchange point was impossible to limit; an aggressive sender could swamp a particular provider's connection, as could an aggressive receiver. Moreover, as people have been discussing, without the assistance of an EP operator, there is no way to prevent people from sending traffic into your network without your permission, and EP operators don't seem to want to wade into the issues that have surfaced on the NANOG list today and yesterday.
With some development work, which stemmed from a conversation with Fred Baker, on its way in 11.2, this problem is softened somewhat: with a crunchy IOS box, one can specify rate limits on subinterfaces. If traffic across a particular subinterface exceeds a threshold, the router will implement a dropping strategy. This has enormous utility not only to people selling fractional DS3s and the like, but also to people at LAN-based EPs. It should give any LAN-based EP customer the one interesting feature of ATM-based EPs: the ability to select one's peers and to control how much bandwidth they can use. It also should allow one to _monitor_ how much bandwidth each peer is consuming. One can therefore control the Bandwidth variable in my simple equation, and do interesting things.

One might consider that, for purposes of traffic planning, 256kbps or 1.5Mbps might be made available for free to someone at an exchange point who is willing to accept a reduced level of Service compared to a more traditional customer. If a "peer" at an EP is using more than that amount of bandwidth, one might want to negotiate an exchange of money. For example, if A and B meet at an EP and are running into the drop threshold because of the amount of traffic being exchanged, the one with the screaming customers may be asked to pay a certain amount of money per increment of additional bandwidth. This may always be B's customers, or it may flip-flop between A's customers and B's customers over time, and thus should be considered a business negotiation on both parties' part. Finally, at some point it may be observed that X is a large enough chunk of Y's total exchange-point traffic that migrating that traffic elsewhere, perhaps to a private peering, would appear to make engineering and possibly business sense.

Now comes the fun bit: reachability.
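The message does not describe how the IOS rate-limiting feature works internally. A token bucket is one common way to implement "drop once traffic exceeds a threshold", so this minimal sketch uses that technique; it is not Cisco's implementation, and the class, parameter names and numbers are hypothetical. The byte counters correspond to monitoring how much bandwidth each peer is consuming.

```python
class TokenBucket:
    """Police traffic to rate_bps, allowing bursts of up to burst_bytes."""

    def __init__(self, rate_bps: int, burst_bytes: int) -> None:
        self.bytes_per_sec = rate_bps / 8.0
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0                   # arrival time of the previous packet, seconds
        self.passed_bytes = 0             # counters double as per-peer usage monitoring
        self.dropped_bytes = 0

    def offer(self, now: float, size: int) -> bool:
        """Offer a packet of `size` bytes arriving at time `now`; return True if forwarded."""
        # Refill for the time elapsed since the last packet, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.bytes_per_sec)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            self.passed_bytes += size
            return True
        self.dropped_bytes += size  # over the threshold: drop
        return False


# A hypothetical free tier of 256 kbps with a 16 kB burst, offered about 1.2 Mbps
# (one 1500-byte packet every 10 ms for one second).
peer = TokenBucket(rate_bps=256_000, burst_bytes=16_000)
for i in range(100):
    peer.offer(now=i * 0.01, size=1500)
print(peer.passed_bytes, peer.dropped_bytes)
```

The dropped-byte counter is the number that would show who is "running into the drop threshold" and might be asked to buy a larger increment.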
I would consider that if A and B are peering and B is paying some fee for some level of maximum bandwidth, B is a customer of A whether one wants to use that term in marketing literature or not. In this case, it seems much easier for A to treat B as any other customer, providing "transit" (that is, not doing much by way of filtering). Alternatively, if there is no money changing hands, it seems prudent for each of A and B to do some filtering to avoid providing "transit". The scope of that filtering is the heart of the long discussion of charging for routing which has been going on in various places for some time now, and would take too long for me to get into here. However, it's clear that reachability has value, and that it should be possible, in theory, to determine a price point for the exchange of that reachability.

However, I like the simple case, and figure it should solve some current heartburns in relatively short order. If big providers were to come up with a "product" wherein people wanting to talk at exchange points could easily be configured up as customers, with prices being determined by the maximum amount of bandwidth available before dropping begins and the amount of response time one wants in the escalation path at the larger provider's NOC, this may solve the majority case of small provider frustrations with the Effectively No Peering At All policies of some of the larger providers. This appears to be a win for all parties (including Cisco, who gets to sell 75xx-es that can handle the bandwidth-limitation feature, and the various proposals and counter-proposals wrt how to deal with the cost and price of reachability).
The large parties get to make some income to at least offset the costs of maintaining network infrastructure at and around EPs, and can bring up extra connectivity without losing engineering control and oversight of the infrastructure and traffic flows, the EP operators get more interest in EPs since there will be less hesitation on the part of large parties to appear at them, the small providers get to buy cheaper, supported service while calling it "peering", and can control how their traffic moves around at EPs, and can complain if they are paying a particular amount of money for a service they are not getting because of capacity problems within large providers. I expect to see this sort of thing rolled out fairly soon, although I no longer know who will be the first to announce it formally. Informally, this type of service has been provided to some degree (modulo bandwidth controls) by a fair number of parties at EPs, notably CAIS. Sean.
If a "peer" at an EP is using more than that amount of bandwidth, one might want to negotiate an exchange of money. For example if A and B meet at an EP and are running into the drop threshold because of the amount of traffic being exchanged, the one with the screaming customers may be asked to pay a certain amount of money per increment of additional bandwidth. This may always be B's customers, or it may flip-flop between A's customers and B's customers over time, and thus should be considered a business negotiation on both parties' part.
I think Sean has summarised the arguments very well, and I agree with him in most respects. The issue as I see it is to be found in this paragraph above.

Where a phone company exchanges traffic with another phone company and there is some form of settlement involved, this cannot apply to the IXs until some serious technology breakthroughs happen in terms of measurement of IP. The reason the telcos can figure out who owes what is that (a) the routeing for both halves of the conversation is symmetric, and (b) you know who placed the call. In the Internet both parties are paying for their connection (I hope), and the distinction between who made the call and who is offering the service can only be made after quite some work. The source and destination ports are not enough.

Also, does this mean that asymmetric routeing (the reason for multiple DS3s and hot-potato routeing) is a no-no? Are web sites like 800 numbers, in which case the called party should be charged... how do you hold this database... the list goes on and on.

Just to add to the confusion...

Regards,
-- 
Peter Galbavy                peter@wonderland.org
@ Home                       phone://44/973/499465
in Wonderland                http://www.wonderland.org/~peter/
snail://UK/NW1_6LE/London/21_Harewood_Avenue/