Re: Regarding global BGP community values
To move this forward, I'd like to note here that a new draft concerning the global communities has now been prepared.
While it's clear that considerable disagreement exists regarding transitive communities dynamically doing things, it's extremely simple for providers to just not pay attention to them.

Another potential application for global transitive communities, which is likely even more debatable than path selection issues, is using them in conjunction with MEDs and "more specifics" of provider aggregates (to fix some of the brokenness of aggregates and MEDs) in order to provide a safety net against potential route leaking. This could be advantageous for several reasons.

I think most of us agree that a mechanism to re-introduce intelligence to "best-exit" type routing configurations is a good idea. It's a good idea not only because some providers want to perform "best-exit" as a value-add to their services, but also because it provides the ability to compensate a peer (who's fussing about settlements and traffic asymmetries) by carrying the traffic longer on your network. It could also help regionalize traffic exchange between networks more optimally, especially with the ever-growing, geographically distributed interconnectivity provided by direct interconnections.

The offshoots of providing more specifics to peers are obvious, I believe. One problem is potentially significant growth in routing and forwarding table sizes, which was one of the primary drivers for aggregation techniques in the first place. If this is a problem, the provider can always opt not to accept the more specifics from the peer.

The other problem I can think of at the moment, which is likely more of a concern for most folks, is wrt providers leaking more specifics, either via BGP customers, or directly. This could be a concern because of the perceived "clue" of a peer, as well as simple errors in configurations, etc. This can be addressed to some extent by providing drafts that discuss these issues.
Then, there are the problems those accepting MEDs have regarding a network's ability to associate intelligent values with MEDs, or to provide *only* a reasonable number of prefixes (versus "more specifics" expanding to thousands of /24s and longer). Of course, a large piece of this would rely on a decent IP allocation plan that at worst provides router-based aggregates for more specifics, and preferably PoP-based ones. This is difficult, of course, with all the older networks, acquisitions, etc.

Anyway, if a set of transitive communities were defined to provide a safety net that could catch the more specifics, or some other mechanism were created to provide the same capabilities, I'd be interested.

I believe AboveNet and a few others actually have experience with accepting more specifics, and since I missed the BOF in Montreal (and no information is available on the web server as of yet?), I'd be interested in hearing what folks' opinions are regarding this.

-danny
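A minimal sketch of the kind of safety net described above, under stated assumptions: a peer tags the more specifics it sends with a marker community, and the receiver refuses to pass on anything that is tagged or that falls inside a known aggregate. The community value 65000:100 and the function names are hypothetical and do not come from any draft; the 144.206/16 block is only borrowed from this thread as sample data.

```python
from ipaddress import ip_network

# Hypothetical marker community: "more specific of an aggregate, do not re-advertise".
NO_LEAK = "65000:100"


def is_more_specific(prefix, aggregates):
    """True if prefix lies strictly inside one of the aggregate networks."""
    net = ip_network(prefix)
    return any(net != agg and net.subnet_of(agg) for agg in aggregates)


def export_to_third_parties(routes, aggregates):
    """Return the prefixes that may be re-advertised beyond the peer.

    routes     -- list of (prefix, set of community strings)
    aggregates -- list of aggregate prefixes known for the peer
    """
    aggs = [ip_network(a) for a in aggregates]
    exported = []
    for prefix, communities in routes:
        if NO_LEAK in communities:
            continue  # explicitly tagged: never leaves this network
        if is_more_specific(prefix, aggs):
            continue  # untagged more specific: caught by the safety net
        exported.append(prefix)
    return exported


routes = [
    ("144.206.0.0/16", set()),        # the aggregate itself
    ("144.206.16.0/20", {NO_LEAK}),   # tagged more specific
    ("144.206.32.0/24", set()),       # untagged more specific (a leak)
    ("192.0.2.0/24", set()),          # unrelated prefix
]
print(export_to_third_parties(routes, ["144.206.0.0/16"]))
```

The second check is deliberately blunt: it also blocks more specifics that someone wants to announce on purpose, so a real policy would need an explicit exception list.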
One problem is potentially significant growth in routing and forwarding table sizes, which was one of the primary drivers for aggregation techniques in the first place. If this is a problem, the provider can always opt not to accept the more specifics from the peer.
The growth itself does not cause the problems; the problems come from growth in conjunction with poor router implementations (which cause 60,000 routes to use 30 MB of RAM - that means 500 bytes for every prefix -:) and numerous memory leaks in those implementations. If we look around, we'll see that existing computers (including embedded ones) have no CPU or memory problems, and the problems we see with routers are mainly caused by badly implemented software. On the other hand, you are right if you speak about stability or loop-free routing - extra specifics cause a lot of instability. But that is slightly outside this issue.

Talking about global transitive communities, we should mention one existing problem. Communities are now used for both internal and global control (I mark my peering announcements as 2118:11, and I mark those announcements which should not be advertised to other peers as 2118:12, for example; TELIA uses communities widely, and so on). On the other hand, we have no mechanism to filter communities out when we advertise prefixes - in the case of CISCO, I can:
- not send communities at all
- set new communities instead of the existing ones
- add new communities to the existing ones

So there is a chance of seeing a growing number of useless communities if we introduce a set of transitive ones. And this makes some mechanism of _community filtering_ very desirable.
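To make the missing _community filtering_ step concrete, here is a small sketch, not a description of any router's actual feature. It assumes communities are written as "ASN:value" strings and that everything in the local AS's own namespace is internal unless explicitly listed as exportable. The function name, the `keep` parameter, and the 64500:50 value (a documentation ASN) are assumptions for illustration; 2118:11 and 2118:12 are the values mentioned above.

```python
def scrub_communities(communities, local_asn, keep=frozenset()):
    """Drop communities from the local AS namespace before advertising.

    communities -- list of "ASN:value" strings attached to a prefix
    local_asn   -- our own AS number; its communities are treated as private
    keep        -- local communities that may still be sent to peers
    """
    result = []
    for community in communities:
        asn, _, _value = community.partition(":")
        if int(asn) == local_asn and community not in keep:
            continue  # internal label: filter it out on export
        result.append(community)
    return result


attached = ["2118:11", "2118:12", "64500:50"]
print(scrub_communities(attached, 2118))                    # local labels removed
print(scrub_communities(attached, 2118, keep={"2118:11"}))  # one label kept
```

Communities set by other networks pass through untouched here; whether those should also be scrubbed is exactly the policy question raised above.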
Note that these days we see a turn from AS-based routing to community-based and MED-based routing, because: (1) ASes themselves do not provide any protection against mistakes - they are not used in the routing; this makes prefix-based filtering very desirable (at least on the downstream links); (2) the AS list grows quickly, and (even if we build access lists from the RIPE or RA-DB or <ANY>...-DB database) we can't maintain such big pieces of configuration; (3) AS-based control violates the main principle of effective routing control - _analyze everything carefully, but ONCE; then add your labels and use these labels_. Communities are one type of such _labels_. And, if we are facing some future BGP-5 protocol, and keeping compatibility in mind, the new terms _local community, global community, transitive communities_ (replace _community_ with any other word, if you want) become very desirable. Note - we already have _PRIVATE-AS_ and some ways to filter them out; now it's time to have PRIVATE_COMMUNITIES as well.
The other problem I can think of at the moment, which is likely more of a concern for most folks, is wrt providers leaking more specifics, either via BGP customers, or directly. This could be a concern

Note - we often leak such specifics _on purpose_. For example, see 144.206/16 - we should leak some specifics from this block to make routing _correct_ (some branches have commercial-quality access, some branches do not).
On the other hand, the more you restrict the prefixes allowed in the Internet, the less effectively you use address space. This is a stick with two ends. I believe we will see more and more /20, /21 and even /24 prefixes in the network in the next few years - because CPU and memory can be increased easily, but the address space cannot.

Alex (Roudnev).
because of the perceived "clue" of a peer, as well as simple errors in configurations, etc. This can be addressed to some extent by providing drafts that discuss these issues.
Then, there are the problems those accepting MEDs have regarding a network's ability to associate intelligent values with MEDs, or to provide *only* a reasonable number of prefixes (versus "more specifics" expanding to thousands of /24s and longer). Of course, a large piece of this would rely on a decent IP allocation plan that at worst provides router-based aggregates for more specifics, and preferably PoP-based ones. This is difficult, of course, with all the older networks, acquisitions, etc.
Anyway, if a set of transitive communities were defined to provide a safety net that could catch the more specifics, or some other mechanism were created to provide the same capabilities, I'd be interested.
I believe AboveNet and a few others actually have experience with accepting more specifics, and since I missed the BOF in Montreal (and no information is available on the web server as of yet?), I'd be interested in hearing what folks' opinions are regarding this.
-danny
"Alex P. Rudnev" wrote:
The growth itself does not cause the problems; the problems come from growth in conjunction with poor router implementations (which cause 60,000 routes to use 30 MB of RAM - that means 500 bytes for every prefix -:) and numerous memory leaks in those implementations. If we look around, we'll see that existing computers (including embedded ones) have no CPU or memory problems, and the problems we see with routers are mainly caused by badly implemented software.
I, and the rest of the Internet community, would like to invite you to start a router company and show us how it can be done with far less memory. ;-)

More seriously, you might take a look around and note that there is not a great deal of difference in the amount of memory needed to support a prefix across the various well-known implementations. Which is not to say that we're blameless, just that a lot of good people have worked hard and are all equally incompetent at conserving memory while simultaneously producing a scalable, stable, feature-rich implementation.

Regards,
Tony
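For reference, the per-prefix figure being argued over is plain arithmetic on the numbers quoted above (60,000 routes, 30 MB, with MB taken as 10^6 bytes, which is the assumption that yields exactly 500). The helper names below are illustrative only.

```python
def bytes_per_prefix(total_bytes, prefix_count):
    """Average memory consumed per prefix."""
    return total_bytes / prefix_count


def table_memory_mb(prefix_count, per_prefix_bytes):
    """Total table memory in MB (10**6 bytes) for a given per-prefix cost."""
    return prefix_count * per_prefix_bytes / 1_000_000


per_prefix = bytes_per_prefix(30 * 1_000_000, 60_000)
print(per_prefix)                           # average bytes per prefix
print(table_memory_mb(60_000, per_prefix))  # back to the quoted table size
```

Plugging a different per-prefix cost into `table_memory_mb` gives a quick way to compare implementations for the same table size.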
I, and the rest of the Internet community, would like to invite you to start a router company and show us how it can be done with far less memory.

Sorry, I forgot -:); on the other hand, if you wanted to build a router wasting 8 bytes for every BGP prefix, you no doubt could do it (don't answer _buy more memory instead, it's cheaper_ - no one objects to this).
Speaking about CISCOs, no one thought about memory when BGP was implemented there; the worst failures in CISCO history were caused by some _temporary_ prefix leaks which caused routers to eat memory _permanently_ (the last case was in our network 1 week ago, when we leaked an extra 20,000 prefixes to our access routers; it was fixed in 5 minutes, but more than half of them got a stomachache and refused to work even after the leak disappeared... I don't blame the software designers, they must find a compromise between stability, time_to_implement, cost and memory, but I'd like to highlight that they really were not concerned about such a _cheap_ thing as memory at all). (let me put -:) here).

But you cut out my point that the fewer prefixes we allow in the global Internet, the less effectively we use address space; memory can be upgraded (not easily, due to bad router design, though - compare with the PC, and you should agree), while the address space cannot be upgraded at all. This means we are facing growing routing tables whether we like it or not.

Alex.
Aleksei Roudnev, the head of Network Operations Center, Relcom, Moscow (+7 095) 194-19-95 (Network Operations Center Hot Line),(+7 095) 230-41-41, N 13729 (pager) (+7 095) 196-72-12 (Support), (+7 095) 194-33-28 (Fax)
Speaking about CISCOs, no one thought about memory when BGP was implemented there; the worst failures in CISCO history were caused by some _temporary_ prefix leaks which caused routers to eat memory _permanently_ (the last case was in our network 1 week ago, when we leaked an extra 20,000 prefixes to our access routers; it was fixed in 5 minutes, but more than half of them got a stomachache and refused to work even after the leak disappeared... I don't blame the software designers, they must find a compromise between stability, time_to_implement, cost and memory, but I'd like to highlight that they really were not concerned about such a _cheap_ thing as memory at all). (let me put -:) here).
On behalf of {myself, Paul, Ravi, Enke}, I assure you that Cisco's BGP has _always_ been worried about conserving memory.

Tony
On behalf of {myself, Paul, Ravi, Enke}, I assure you that Cisco's BGP has _always_ been worried about conserving memory.

BGP - yes, total architecture - not at all. Even very simple safeguards such as _don't allow a process already eating 90% of the memory to eat the last 10%_ and _defragment the garbage_ were not implemented, and if some process (BGP for example) goes crazy and over-eats something, no one can even log in and say _reload_ -:).
Aleksei Roudnev, Network Operations Center, Relcom, Moscow (+7 095) 194-19-95 (Network Operations Center Hot Line),(+7 095) 230-41-41, N 13729 (pager) (+7 095) 196-72-12 (Support), (+7 095) 194-33-28 (Fax)
participants (3)
- Alex P. Rudnev
- Danny McPherson
- Tony Li