Re: Do ATM-based Exchange Points make sense anymore?
It appears that for analysis purposes one has to separate access from switching. How much payload one brings to the exchange depends on port speed and protocol overhead. In that light, Frame Relay can bring a similar amount of payload as Ethernet (comparable overhead) and preserve the good properties of ATM (traffic flow separation).

Regards, nenad

p.s. both the Juniper M160 and the Cisco GSR can handle OC-48 Frame Relay, and they don't seem to be Frame Relay switches
Date: Fri, 09 Aug 2002 13:42 -0700 (PDT) From: "William B. Norton" <wbn@equinix.com> To: Nenad Trifunovic <nenad.trifunovic@wcom.com> CC: nanog@merit.edu Subject: Re: Do ATM-based Exchange Points make sense anymore?
Can you, please, explain why you didn't consider Frame Relay based exchange in your analysis?
I don't have much insight into Frame Relay-based Internet Exchange Points ;-) The majority of IXes around the world are ethernet-based, with some legacy FDDI and a few ATM IXes. It is in these areas that I have done the most data collection. The same analysis could be applied to peering across WANs and MANs as compared with buying transit though. It might be interesting provided I can get some market prices for transport and ports.
Why look at ATM? Right now almost everyone I am speaking with is seeing massive drops in transit and transport prices, even below the points I quoted, but with no comparable price drop in ATM ports or transport into an ATM cloud. These forces lead to a point where a connection to an ATM IX makes no sense (from a strictly financial standpoint). I have another 10 folks to walk through the paper to make sure I'm not missing anything in the analysis, and I'll post to the list when the paper is available. If you are interested I'd love to walk you through it to get your take.
One point a couple other folks brought up during the review (paraphrasing) "You can't talk about a 20% ATM cell tax on the ATM-based IX side without counting the HDLC Framing Overhead (4%) for the OC-x circuit into an ethernet-based IX." Since the "Effective Peering Bandwidth" is the max peering that can be done across the peering infrastructure, this is a good point and has now been factored into the model and analysis.
Bill
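Bill's "Effective Peering Bandwidth" comparison above can be put into rough numbers. A minimal sketch, using the overhead figures quoted in the thread (20% ATM cell tax, 4% HDLC framing) and a standard SONET OC-3 line rate; the function name is mine:

```python
# Compare the effective peering bandwidth of an ATM IX port against a
# POS circuit into an Ethernet-based IX. Overhead percentages are the
# figures quoted in the thread; the line rate is standard SONET OC-3.

OC3_LINE_RATE_BPS = 155.52e6

def effective_bandwidth(line_rate_bps: float, overhead: float) -> float:
    """Usable payload rate once protocol overhead is subtracted."""
    return line_rate_bps * (1.0 - overhead)

atm_ix  = effective_bandwidth(OC3_LINE_RATE_BPS, 0.20)  # ATM cell tax
enet_ix = effective_bandwidth(OC3_LINE_RATE_BPS, 0.04)  # HDLC framing on POS

print(f"ATM IX port:      {atm_ix / 1e6:.2f} Mb/s effective")
print(f"Ethernet IX path: {enet_ix / 1e6:.2f} Mb/s effective")
```

At OC-3 rates the gap is roughly 124 vs. 149 Mb/s of usable payload, which is why the cell tax looms so large once port prices converge.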
On Fri, 9 Aug 2002, Nenad Trifunovic wrote:
It appears that for analysis purposes one has to separate access from switching. How much payload one brings to the exchange depends on port speed and protocol overhead. In that light, Frame Relay can bring a similar amount of payload as Ethernet (comparable overhead) and preserve the good properties of ATM (traffic flow separation).
What functionality does a PVC give you that an Ethernet VLAN does not? What is the current max speed of Frame Relay in any common vendor implementation (I'm talking routers here)? -- Mikael Abrahamsson email: swmike@swm.pp.se
What functionality does a PVC give you that an Ethernet VLAN does not?
That's quite easy: endpoint liveness. An IPv4 host on a VLAN has no idea if the guy on the "other end" died until the BGP timer expires. FR has LMI; ATM has OAM (and ILMI). Pete
What functionality does a PVC give you that an Ethernet VLAN does not?
That's quite easy: endpoint liveness. An IPv4 host on a VLAN has no idea if the guy on the "other end" died until the BGP timer expires.
FR has LMI; ATM has OAM (and ILMI).
Adding complexity to a system increases its cost but not necessarily its value. Consider the question: how often do you expect endpoint liveness to matter? If the connection fabric between your routers has an MTBF best measured in hours or days, then you've got bigger problems than you'll solve with LMI. If on the other hand the MTBF is best measured in months or years, then when it does fail the failure is likely to be *in* the extra complexity you added. -- Paul Vixie
Paul Vixie wrote:
Adding complexity to a system increases its cost but not necessarily its value. Consider the question: how often do you expect endpoint liveness to matter?
The issue I'm trying to address is how to extend the robustness that can be achieved with tuned IGPs with subsecond convergence across an exchange point, without suffering a one-to-five-minute delay blackholing packets. Liveness is an issue when a box either loses coherency between software and hardware state on an interface, or decides to reload all or part of the system without bothering to reset the BGP TCP sessions before going away. I'd be happy to hear solutions that are in use and commonplace for this problem. Mostly I've seen "it's the other guy's problem" as an answer, and the solution being to migrate all connectivity to one ISP.
If the connection fabric between your routers has an MTBF best measured in hours or days, then you've got bigger problems than you'll solve with LMI.
If on the other hand the MTBF is best measured in months or years, then when it does fail the failure is likely to be *in* the extra complexity you added.
As far as I understand, this "complexity" just got added with Neighbor Discovery on IPv6, which would solve this problem if liveness were properly propagated up the stack from ND to TCP and the ND timers tweaked down. No need to touch the BGP timers. Pete
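The one-to-five-minute blackholing window Petri describes is just the failure-detection time of whichever liveness mechanism is in play. A back-of-the-envelope sketch comparing BGP hold timers with a faster poll-based link-layer mechanism; the poll interval and miss count below are illustrative, not any vendor's defaults:

```python
# Worst-case time to notice a silently dead peer.
# BGP: the hold timer expires hold_time seconds after the last KEEPALIVE
# was received, so a peer that dies right after sending one goes
# unnoticed for up to the full hold time.

def bgp_worst_case_s(hold_time_s: float) -> float:
    return hold_time_s

# A poll-based link-layer mechanism (LMI-style): the far end is declared
# dead after N consecutive missed polls, plus up to one interval of
# phase offset before the first poll goes unanswered.

def poll_worst_case_s(poll_interval_s: float, missed_polls: int) -> float:
    return poll_interval_s * (missed_polls + 1)

print(f"BGP, 180s hold time:      up to {bgp_worst_case_s(180):.0f}s blackholed")
print(f"BGP, 10s tuned hold time: up to {bgp_worst_case_s(10):.0f}s blackholed")
print(f"10s polls, 3 misses:      up to {poll_worst_case_s(10, 3):.0f}s blackholed")
```

The same arithmetic shows why tuned eBGP timers (mentioned later in the thread) recover most of the benefit without any new protocol machinery.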
warning: i've had one "high gravity steel reserve" over my quota. hit D now.
The issue I'm trying to address is how to extend the robustness that can be achieved with tuned IGPs with subsecond convergence across an exchange point, without suffering a one-to-five-minute delay blackholing packets.
why on god's earth would subsecond anything matter in a nonmilitary situation? are you willing to pay a cell tax AND a protocol complexity tax AND a device complexity tax to make this happen? do you know what that will do to your TCO and therefore your ROI? you want to pay this tax 100% of the time even though your error states will account for less than 0.001% of the time? you want to have the complexity as your most likely source of (false positive) error?
As far as I understand, this "complexity" just got added with Neighbor Discovery on IPv6.
if so, then, you misunderstand. -- Paul Vixie
On 10 Aug 2002, Paul Vixie wrote:
why on god's earth would subsecond anything matter in a nonmilitary situation?
It does when you start doing streaming anything, say TV or telephony. I agree that this won't be solved using any current L3-or-above protocol, since BGP takes quite a while to recalculate anyway. Any redundancy has to be pre-calculated or handled at a lower level; this is where, for instance, SRP/DPT claims excellence, and I guess the same claims come from the MPLS crowd. I guess you have to pay a 50% tax on capacity to handle this whatever you do.

Personally I agree with you, the KISS principle is golden here. Peering should be cheap, that is the only reason to do it, and therefore one does not want a lot of complexity that brings up the cost. Tweaking eBGP dead timers to 5-10 seconds works well in most cases.

I have some idea about bringing together some of the signalling from DPT/SRP into a switched Ethernet environment (for instance, have some kind of signalling between switches, propagated to hosts, that a certain port has gone down, and notify that certain MAC addresses are no longer reachable). I have not looked into it more carefully, and it would take several years to get any standard implemented (even though I feel that it wouldn't be that hard to do). Just state which MAC addresses were removed from your forwarding table due to link down, signalling this to everybody connected to you. Probably won't scale to very large L2 domains, but would perhaps be OK for 50-100 nodes connected to an IX.

-- Mikael Abrahamsson email: swmike@swm.pp.se
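Mikael's MAC-withdrawal idea can be sketched in a few lines. Everything here (class name, message shape) is hypothetical; it only illustrates the flush-on-link-down behaviour he describes:

```python
from dataclasses import dataclass, field

@dataclass
class IxSwitch:
    """Toy L2 switch that floods MAC withdrawals on link-down instead of
    letting neighbours age entries out or wait for BGP timers."""
    fib: dict = field(default_factory=dict)       # MAC -> egress port
    neighbours: list = field(default_factory=list)

    def learn(self, mac: str, port: int) -> None:
        self.fib[mac] = port

    def port_down(self, port: int) -> list:
        # Remove every MAC learned on the dead port, then tell everybody.
        withdrawn = [m for m, p in self.fib.items() if p == port]
        for m in withdrawn:
            del self.fib[m]
        for n in self.neighbours:
            n.receive_withdrawal(withdrawn)
        return withdrawn

    def receive_withdrawal(self, macs: list) -> None:
        for m in macs:
            self.fib.pop(m, None)

a, b = IxSwitch(), IxSwitch()
a.neighbours.append(b)
a.learn("00:00:5e:00:53:01", 1)
b.learn("00:00:5e:00:53:01", 7)  # b reaches that MAC via its uplink to a
a.port_down(1)                   # b's stale entry is flushed immediately
print(b.fib)                     # {}
```

The scaling worry in the post shows up directly: every link-down event fans out to every neighbour, which is fine for 50-100 IX nodes but not for a very large L2 domain.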
On Sat, 10 Aug 2002, Mikael Abrahamsson wrote:
It does when you start doing streaming anything, say TV or telephony. I agree that this won't be solved using any current L3-or-above protocol since BGP takes quite a while to recalculate anyway. Any redundancy has to be pre-calculated or on a lower level; this is where for instance SRP/DPT claims excellence, and I guess the same claims come from the MPLS crowd.
For it to be of any use, this rapid failover would have to be end-to-end too. It's no good picking on one network element, such as the exchange, and getting them to spend significant amounts of time and energy on rapid failover, if it's just going to fall apart on either side.

We've been down this road with multicast. Getting good (non-IGMP) multicast containment on a switched ethernet isn't easy, nor is the current situation ideal - there are several different approaches to containment out there (and then we come back to getting stuff through the standards process too). But the pressure isn't there either, because the access networks aren't enabled/capable right now - certainly from talking to UK broadband providers.

There's also a non-technical driver - the bizdev people who are in favour of per-megabit billing will oppose multicast on the grounds that the meter won't tick over as quickly (in their eyes).
Personally I agree with you, the KISS principle is golden here. Peering should be cheap, that is the only reason to do it, and therefore one does not want a lot of complexity that brings up the cost. Tweaking eBGP dead timers to 5-10 seconds works well in most cases.
Agreed. However, one thing to consider is the effect that short timers have on the routing table, in terms of announcements and withdrawals. It takes about 20-30 seconds to warm boot a Foundry BI8000/15000 and get it forwarding. So, in the event of a software upgrade (or some other need to reboot, fairly rare), as long as you don't have fast-external-fallover enabled or your timers shortened, you will blackhole some traffic, but in the large majority of cases BGP sessions will stay up. With shorter timers or fast-external-fallover, a very short maintenance slot at a large exchange can cause ripples in the routing table. It would be interesting to do some analysis of this - how far the ripples spread from each exchange! I'm not saying that one or the other is right; it's just another tax!
I have some idea about bringing together some of the signalling from DPT/SRP into a switched Ethernet environment (for instance, have some kind of signalling between switches (propagated to hosts) that a certain port has gone down and notify that certain MAC addresses are no longer reachable).
Keith Mitchell had some ideas about harnessing OSPF at the MAC layer, which I became involved in. People thought it may have had some potential (others thought we were on interesting drugs!), but we're back to the tax thing again. It's yet another protocol, and some people believed that its usefulness would be overtaken by MPLS (despite the potential for more complexity), which we already have.
Probably won't scale to very large L2 domains, but would perhaps be ok for 50-100 nodes connected to an IX.
Which, some argue, reduces the number of potential applications, and therefore the justification for building it. Mike
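The timer tradeoff Mike describes (a 20-30 second warm boot versus the BGP hold timer) is easy to tabulate. A simplified sketch; it assumes a session drops only when the outage outlasts the hold timer, or immediately when fast-external-fallover reacts to link loss:

```python
def session_drops(outage_s: float, hold_time_s: float, fast_fallover: bool) -> bool:
    """True if the BGP session goes down and withdrawals ripple outward."""
    if fast_fallover:
        return True  # link-down tears the session immediately
    return outage_s >= hold_time_s

REBOOT_S = 25  # IX switch warm boot, per the thread's 20-30s figure

for hold, ff in ((180, False), (10, False), (180, True)):
    outcome = "ripples" if session_drops(REBOOT_S, hold, ff) else "blackhole only"
    print(f"hold={hold:3d}s fast-fallover={ff}: {outcome}")
```

With default 180s timers the reboot blackholes traffic but sessions survive; with 10s timers or fast-external-fallover, every peer withdraws and re-announces, which is exactly the "ripple" worth measuring.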
Mike Hughes wrote:
With the shorter timers or fast-external-fallover, a very short maintenance slot at a large exchange can cause ripples in the routing table. It would be interesting to do some analysis of this - how far the ripples spread from each exchange!
We do BGP instability research, and this is something I'd like to examine further. Compared to other sources of BGP noise, I don't think it's a primary driver for the instability we monitor each day, but I'd like to quantify it. If people were willing to give us a heads-up after the fact when there were .. um .. maintenance events at the major exchanges, we could then go back and look at global propagation of the ripples on fine timescales. --jim p.s. The more eyes we have, the more we see, and we are always looking for more silent peers, especially small and midsize providers or their multihomed customers. See http://renesys.com/cgi-bin/bgpfeed to sign up to send us a one-way multihop EBGP feed. It's quick, painless, and you will be helping unravel the mysteries of why global routing works so well in spite of us all.
Mikael Abrahamsson <swmike@swm.pp.se> writes:
On 10 Aug 2002, Paul Vixie wrote:
why on god's earth would subsecond anything matter in a nonmilitary situation?
It does when you start doing streaming anything, say TV or telephony.
I submit that it doesn't matter for voice or video, if the MTBF is reasonably high. Consider the reliability that people put up with from their cable companies, and the voice quality that we accept from our (North American) cell phones, not to mention the dropped calls. Streaming video and VoIP are an order of magnitude better in my experience without doing anything special. Much as I hate to come across (particularly in this forum) as an advocate of purely market-driven engineering, you have to ask yourself what you're buying if you're spending money to fix a problem that your customers don't (and won't) perceive as such. Remember the words of Admiral Gorshkov, who is variously quoted as having said: "(Better, Perfect) is the enemy of good enough." ---Rob
Paul Vixie wrote:
warning: i've had one "high gravity steel reserve" over my quota. hit D now.
The issue I'm trying to address is how to extend the robustness that can be achieved with tuned IGPs with subsecond convergence across an exchange point, without suffering a one-to-five-minute delay blackholing packets.
why on god's earth would subsecond anything matter in a nonmilitary situation?
If software MTBF were better, convergence would not be an issue. As long as it's an operational hazard to run core boxes (with some vendors, anyway) on code older than six months, you end up engineering convergence into the networks.
are you willing to pay a cell tax AND a protocol complexity tax AND a device complexity tax to make this happen? do you know what that will do to your TCO and therefore your ROI? you want to pay this tax 100% of the time even though your error states will account for less than 0.001% of the time? you want to have the complexity as your most likely source of (false positive) error?
Who said anything about cell tax? If I ask for liveness you give me ATM?
As far as I understand, this "complexity" just got added with Neighbor Discovery on IPv6.
if so, then, you misunderstand.
As far as I understand, ND does contain the functionality I'd like to accomplish; unfortunately it does not do that for IPv4. I'm just making points about why, in the existing operational environment, going from ATM to GE reduces robustness. Instead of going on the defensive, it would probably help to discuss how to make Ethernet-based solutions more robust, since that's where everybody is moving anyway. Pete
On Sat, Aug 10, 2002 at 06:09:05PM +0300, Petri Helenius wrote:
If software MTBF were better, convergence would not be an issue. As long as it's an operational hazard to run core boxes (with some vendors, anyway) on code older than six months, you end up engineering convergence into the networks.
Odd, I think most people would say it's an operational hazard to run code newer than 6 months old, or at least with less than 6 months of testing on any particular image. How they're able to completely break so many critically important things within 2 weeks between a bugfix code rev is still beyond me. :) -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
On Sat, Aug 10, 2002 at 11:20:44AM -0400, Richard A Steenbergen wrote:
On Sat, Aug 10, 2002 at 06:09:05PM +0300, Petri Helenius wrote:
If software MTBF were better, convergence would not be an issue. As long as it's an operational hazard to run core boxes (with some vendors, anyway) on code older than six months, you end up engineering convergence into the networks.
Odd, I think most people would say it's an operational hazard to run code newer than 6 months old, or at least with less than 6 months of testing on any particular image.
With all the recent software security advisories that affect many vendors (ssh, snmp, etc.), running anything older than that is a blatant security risk for anyone's network. Not keeping up-to-date on these items and thinking you're fine is just asking to be brought down.
How they're able to completely break so many critically important things within 2 weeks between a bugfix code rev is still beyond me. :)
I'm not sure what vendor you are referring to, but I've not seen any problems like this anytime in the past 6+ months. - jared -- Jared Mauch | pgp key available via finger from jared@puck.nether.net clue++; | http://puck.nether.net/~jared/ My statements are only mine.
On 10 Aug 2002, Paul Vixie wrote:
why on god's earth would subsecond anything matter in a nonmilitary situation?
Telemedicine, tele-robotics, etc, etc. Actually, there are a lot of cases where you want subsecond recovery. The current Internet routing technology is not up to the task; so people who need it have to build private networks, and pay an arm and a leg for them, too. --vadim
Paul just hit on it. At how many layers do you want protection, and will they interfere with each other? Granted, not all protection schemes overlap. If there is not a layer 1 failure, and a router maintains link but the card or router has somehow failed and is no longer passing packets, I suppose that would have to be caught at layer 3.

At a (MAN) exchange point based in South Florida, the technology is a multi-node area exchange point (layer 1 technology) based on DWDM and optical switches. The detection of nodes and failures is done with enhanced OSPF. In testing, failure between the two farthest nodes and recovery took 16ms (approx. 95 miles between nodes). Each individual circuit has a choice of protection level. This allows for no protection, for any of a number of reasons; one may be to not interfere with a protection scheme at a higher level. While the switches do use OSPF for detection and recovery, they also use MPLS for reservation of bandwidth. None of this information is passed on to the customer routers, however.

It seems there should be a clear delineation between the layers and what protection schemes should run at each. I also believe in separation of church and state, if you will: router companies should play in their space while optical companies should stay in theirs. While it makes sense for some information to pass between differing types of equipment (such as the ODSI protocol or UNI 1.0), integration of the protection schemes runs a high risk of a cascade failure, or susceptibility to an exploit attack.

As an added thought, the same MAN exchange point can do intranode connections (hairpinning), so that the same node that is used in internodal transport and peering can also be used within a colo as an intelligent cross-connect box. This would allow for visibility and monitoring within the colo, and even customer network management of their cross-connects. I suppose the discussion is what do you want from your exchange point operator, and what do you NOT want.
Many people would not feel comfortable that circuit operators have visibility and maintain stats on even the NUMBER of packets passed... dd At 9:21 +0000 8/10/02, Paul Vixie wrote:
warning: i've had one "high gravity steel reserve" over my quota. hit D now.
The issue I'm trying to address is how to extend the robustness that can be achieved with tuned IGPs with subsecond convergence across an exchange point, without suffering a one-to-five-minute delay blackholing packets.
why on god's earth would subsecond anything matter in a nonmilitary situation?
are you willing to pay a cell tax AND a protocol complexity tax AND a device complexity tax to make this happen? do you know what that will do to your TCO and therefore your ROI? you want to pay this tax 100% of the time even though your error states will account for less than 0.001% of the time? you want to have the complexity as your most likely source of (false positive) error?
As far as I understand, this "complexity" just got added with Neighbor Discovery on IPv6.
if so, then, you misunderstand. -- Paul Vixie
-- David Diaz dave@smoton.net [Email] pagedave@smoton.net [Pager] Smotons (Smart Photons) trump dumb photons
If the connection fabric between your routers has an MTBF best measured in hours or days, then you've got bigger problems than you'll solve with LMI.
Agreed. However, I think the debate may be over the (un)reliability of routers connected to the exchange, not the exchange itself. -- Alex Rubenstein, AR97, K2AHR, alex@nac.net, latency, Al Reuben -- -- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --
Thus spake "Petri Helenius" <pete@he.iki.fi>
What functionality does a PVC give you that an Ethernet VLAN does not?
That's quite easy: endpoint liveness. An IPv4 host on a VLAN has no idea if the guy on the "other end" died until the BGP timer expires.
FR has LMI; ATM has OAM (and ILMI).
FR LMI and ATM ILMI are so notoriously unreliable at endpoint liveness that FR EEK and ATM OAM became necessary. Be glad Ethernet is not stuck with such a useless "feature". It would be trivial for someone to write up an "Ethernet EEK" or "IPv4 ND" draft and submit it to their favorite router vendors for implementation. If nobody has done so, it's obviously not that important. S
What functionality does a PVC give you that an Ethernet VLAN does not?
Shaping, for one.
What is the current max speed of Frame Relay in any common vendor implementation (I'm talking routers here)?
Doesn't OC-48 POS on the GSR and Juniper do FR?
-- Mikael Abrahamsson email: swmike@swm.pp.se
-- Alex Rubenstein, AR97, K2AHR, alex@nac.net, latency, Al Reuben -- -- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --
On Sat, Aug 10, 2002 at 05:42:32PM -0400, Alex Rubenstein wrote:
What is the current max speed of Frame Relay in any common vendor implementation (I'm talking routers here)?
Doesn't OC-48 POS on the GSR and Juniper do FR?
Welcome to MAE Chicago/New York, http://www.mae.net/FE/. But M160's and OC48 ports are expensive; I suspect it's overkill for the amount of traffic that will actually be exchanged there. I do wonder why most GigE exchange points are still doing single-LAN-segment peering instead of having a peermaker-type service for dynamic VLAN configurations. Manual configuration is slow and a pain, and with some of them charging you per-VLAN what it would cost for a copper crossconnect, it's no wonder most people don't use them. -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
Thus spake "Alex Rubenstein" <alex@nac.net>
What functionality does a PVC give you that an Ethernet VLAN does not?
Shaping, for one.
There is nothing inherent in Ethernet which precludes shaping. Low- and mid-range routers can do it just fine. If your core router doesn't, speak with your vendor. Then again, do your core routers really support shaping on OC192 FR either?
What is the current max speed of Frame Relay in any common vendor implementation (I'm talking routers here)?
Doesn't OC48 POS on GSR and Jewniper do FR?
If those boxes approached the reliability of carrier FR/ATM gear, that might be relevant. S
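Stephen's point that nothing in Ethernet precludes shaping holds because shaping is just a token bucket applied at egress, independent of the framing underneath. A minimal single-rate shaper sketch (class name and parameters are mine):

```python
class TokenBucketShaper:
    """Single-rate token-bucket shaper: a packet conforms while tokens
    are available; a non-conforming packet would normally be queued
    (here it is simply flagged)."""

    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0   # token refill rate, bytes/second
        self.burst = burst_bytes     # bucket depth
        self.tokens = burst_bytes    # start with a full bucket
        self.last = 0.0

    def conforms(self, now_s: float, pkt_bytes: int) -> bool:
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now_s - self.last) * self.rate)
        self.last = now_s
        if pkt_bytes <= self.tokens:
            self.tokens -= pkt_bytes
            return True
        return False

# Shape to 8 Mb/s (1 MB/s) with a single-MTU burst of 1500 bytes.
shaper = TokenBucketShaper(rate_bps=8e6, burst_bytes=1500)
print(shaper.conforms(0.0000, 1500))  # True: bucket starts full
print(shaper.conforms(0.0001, 1500))  # False: only ~100 bytes refilled
print(shaper.conforms(0.0016, 1500))  # True: bucket has refilled
```

Whether the interface runs FR, POS, or Ethernet never enters the algorithm, which is the point: shaping is a queueing feature of the router, not of the link layer.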
participants (13)

- Alex Rubenstein
- cowie@renesys.com
- David Diaz
- Jared Mauch
- Mikael Abrahamsson
- Mike Hughes
- Nenad Trifunovic
- Paul Vixie
- Petri Helenius
- Richard A Steenbergen
- rs@seastrom.com
- Stephen Sprunk
- Vadim Antonov