"Kent W. England" <kwe@geo.net> writes:
This is true, but the definition of the top of the hierarchy is arbitrary and is the nexus of the debate about "topological" versus geographical addressing, which I interpret as "ISP at top" versus "exchange point at top" hierarchies. Both are valid topological hierarchies.
As tli pointed out, the top of the hierarchy is not arbitrary: it must be default-free. In a hierarchical routing system there are three forwarding directions to consider: intra-area ("lateral"), default ("upwards") and sub-area ("aggregate" or "downwards"). At the top of a hierarchy there can be no upwards forwarding direction, so the entire address space must be either intra-area or presented as an aggregate.
Consider an addressing structure that looks like this: level-3-area-id:level-2-area-id:level-1-area-id:final-flat-id, in an internetwork with three levels of hierarchy. A level-2 router may have some things directly attached to its level-2 area, including its peer routers, and it would carry routes towards them, probably in a flat routing table. It needs these routes partly because it has to know where to send traffic towards each of the level-1 areas attached to its own area, and partly because it has to know where to send "default" traffic: towards one or more of its in-area peers that have level-3 connections. Each such level-3 router would have to know how to forward to any given level-2 area, and therefore would need to carry routes for each level-3/level-2 gateway. Each level-1 router, by contrast, only needs to know how to route towards everything in its own area, and how to reach at least one level-2 router. (One can be a little tricky and connect a single level-1 area to multiple level-2 routers in different level-2 areas, in which case better routing optimality may be obtained by having the level-1 router carry some level-2 routing information. This would be analogous to Yakov Rekhter's "route pull".) However, the minimum set of routes to carry is the set that causes traffic to be forwarded along a strict, single-path, tree-like hierarchy. This requires that each area be fully contiguous at all times.
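A minimal sketch of the three forwarding directions, assuming a hypothetical encoding of level-3-area-id:level-2-area-id:level-1-area-id:final-flat-id as a Python tuple, and a router whose flat table lists only the entities and sub-areas attached to its own area. The function name, area ids and table layout are illustrative only, not the data structures of IS-IS or any other real protocol.

```python
"""Hypothetical sketch: the three forwarding directions in a strict hierarchy.

Addresses are tuples encoding
level-3-area-id:level-2-area-id:level-1-area-id:final-flat-id,
e.g. ("L3a", "L2b", "L1c", "host7"). All names are illustrative.
"""
from typing import FrozenSet, Tuple

Address = Tuple[str, ...]


def forwarding_direction(area: Address,
                         local_ids: FrozenSet[str],
                         sub_areas: FrozenSet[str],
                         dest: Address) -> str:
    """Classify how a router in `area` forwards a packet toward `dest`.

    area      -- prefix naming the router's own area; () is the top level
    local_ids -- flat ids directly attached to this area (hosts, peer routers)
    sub_areas -- ids of the next-level-down areas attached to this area
    """
    depth = len(area)
    if dest[:depth] != area:
        # Outside our area: send it upwards along the default route.
        # The top area is the empty prefix, which matches every address,
        # so it never takes this branch: it is default-free by construction.
        return "upwards"
    if len(dest) == depth:
        return "unreachable"  # the address names this area, not an endpoint
    next_id = dest[depth]
    if next_id in sub_areas:
        # One aggregate route covers everything inside the sub-area.
        return "downwards"
    if len(dest) == depth + 1 and next_id in local_ids:
        return "lateral"  # flat, intra-area route
    # Inside our prefix but absent from our flat table; there is no
    # default to fall back on, since the destination belongs here.
    return "unreachable"


if __name__ == "__main__":
    level1 = ("L3a", "L2b", "L1c")
    level2 = ("L3a", "L2b")
    print(forwarding_direction(level1, frozenset({"host7"}), frozenset(),
                               ("L3a", "L2b", "L1c", "host7")))  # lateral
    print(forwarding_direction(level1, frozenset({"host7"}), frozenset(),
                               ("L3a", "L2z", "L1q", "host1")))  # upwards
    print(forwarding_direction(level2, frozenset({"peer1"}),
                               frozenset({"L1c"}),
                               ("L3a", "L2b", "L1c", "host7")))  # downwards
    print(forwarding_direction((), frozenset(), frozenset({"L3a"}),
                               ("L3x", "L2b", "L1c", "host7")))  # unreachable
```

Because the top area in this model is the empty prefix, every destination falls inside it; it has no upwards direction, and anything missing from its table is simply unreachable, which is the default-free property described above.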
Other routes may be introduced in various places to alter this behaviour if that is desirable, or to effect IS-IS-style partition repair. In order for the hierarchical routing system to scale, the number of entities known in any given area must be small enough to route on in what is conceptually a flat manner. That means there are bounds on the number of level-n-minus-one areas attached to any level-n area, and this in turn requires that the addressing scheme allow for a deep enough hierarchy. Consequently, the number of things in the top, default-free level of the hierarchy is always going to be limited, no matter what "type" of hierarchical allocation scheme is proposed.
The further requirement that any given area be fully contiguous means that the "top" of the hierarchy must be self-repairing. In other words, as tli pointed out, if the top is a switch in some convenient geographical location, like the Greenwich Observatory or the UN building or MAE-EAST, your entire routing system fails if that switch or that location fails. Consequently, to avoid the single point of failure, there would be a desire to have several diverse geographical locations act as the top of the hierarchy. The problem, again, is that any given area must be fully contiguous, and this implies that any level-n router connecting to one of these diverse locations would need connectivity to every other level-n router, so that this top level-n area would be contiguous.
One could propose to implement this as a big bridged network. The original DGIX proposal was along these lines. Operational experience with much smaller, but still big, bridged exchange points has demonstrated pretty much conclusively that this is a Really Really Bad Idea. One could propose to implement this using the native protocol, effectively connecting all of these exchange points into a level-n-plus-one area of its own.
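The single-point-of-failure argument can be checked mechanically. A minimal sketch, assuming the top-level area is modelled as a hypothetical undirected graph of level-n routers and exchange sites (the node names, including "xp-west", are made up for illustration): a node is critical if removing it alone leaves the remaining nodes non-contiguous. This is a brute-force articulation-point test, not anything a real routing protocol runs.

```python
"""Hypothetical sketch: find single points of failure in a top-level area.

The area is an undirected graph of routers and exchange sites. A node is a
single point of failure if removing it leaves the rest non-contiguous.
"""
from collections import deque
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]


def add_link(g: Graph, a: str, b: str) -> None:
    """Add an undirected link between nodes a and b."""
    g.setdefault(a, set()).add(b)
    g.setdefault(b, set()).add(a)


def is_contiguous(g: Graph, removed: Optional[str] = None) -> bool:
    """Breadth-first search: can every surviving node reach every other?"""
    nodes = [n for n in g if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        for nbr in g[queue.popleft()]:
            if nbr != removed and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(nodes)


def single_points_of_failure(g: Graph) -> List[str]:
    """Brute force: try failing each node in turn (O(V * (V + E)))."""
    return sorted(n for n in g if not is_contiguous(g, removed=n))


if __name__ == "__main__":
    routers = ["isp-a", "isp-b", "isp-c"]

    # Every top-level router meets at one exchange point.
    single_site: Graph = {}
    for r in routers:
        add_link(single_site, r, "MAE-EAST")
    print(single_points_of_failure(single_site))  # ['MAE-EAST']

    # Every router also connects to a second, geographically distant site.
    diverse: Graph = {}
    for r in routers:
        add_link(diverse, r, "MAE-EAST")
        add_link(diverse, r, "xp-west")
    print(single_points_of_failure(diverse))  # []
```

The sketch only models the loss of one node at a time; link failures and simultaneous failures at several sites would need a stronger test, which is part of why geographic diversity alone does not settle the design.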
As long as the level-n-plus-one area could route to all the level-n areas at all times, this would work just nicely, on a technical basis. Therefore, in a hierarchical addressing system, which is the only way we currently know how to scale a global internetwork, the difference between your "ISP at top" option and your "exchange point at top" option is merely in the choice of words. Whatever is at the top has to connect all the things that are one step down from the top, reliably and continuously, and simple belt-and-suspenders implies geographical diversity. Thus, the top of the hierarchy may be expressed as a big, geographically diverse bridged network connecting all the "next-level-down" routers; a single big, geographically diverse routed network comprising a single area; or a meshed concatenation of the "next-level-down" routers arranged so that robust interconnectivity among them is maintained at all times. The choice is probably best made on the basis of reliability and cost, but experience shows that it is likelier made on the basis of politics, autonomy/mistrust of other operators, marketing goals, and possibly cost.
If it were possible for two routing areas to cleanly synthesize a next-level-up area, which probably implies the use of variable-length addresses, then it strikes me intuitively that one could enjoy a better routing hierarchy than is likely to be cobbled together through the deployment of physical infrastructure, keeping the number of routing entries needed by any given router anywhere in such an Internet to a minimum. With a variable-length addressing scheme, in other words, one can consider a set of operations that can be summarized as "make-hierarchical" or "make-lateral".
The obvious and increasingly important first baby step in an evolutionary path towards a scalable Internet is, to quote Noel Chiappa, "to make the world safe for NAT, by making all end-to-end functions use the DNS name; e.g. for authentication, pseudo-headers for checksums, etc, etc." As he continued, this can be justified solely on the basis of working better with NAT, which solves some real-world problems now, and which is being used now.
There is lots more to discuss. Is big-internet still in post-Bass trauma? If not, let's discuss it there, or privately.
However, to tie in some tiny degree of NANOG relevance, and to emphasise through repetition: the idea of using a large bridged network has been shown to be broken by the history of exchange points, particularly since the lovely days when people didn't learn from Milo's FIX upgrade path and began doing multimedia bridging. Single exchange points fail, so avoiding the large bridged network by having a single exchange point be the "top" of a hierarchy won't work. Therefore, the current hierarchy implemented in provider-based addressing, with some coordination to preserve some degree of geographic alignment of addresses (through ARIN, RIPE and APNIC, and large-ISP allocation strategies), is almost certainly the most appropriate one. That is to say, we got CIDR pretty much right.
Sean.