Hello NANOG, I would like to know the best practices for an internet exchange. I have some concerns about the following: Can IXP members use RFC 1918 IP addresses for their peering? Can IXP members use private autonomous system numbers for their peering? Maybe the answer is obvious, but I would like to hear from any IXP admins what their setups/experiences have been. -- --sharlon
On Fri, Apr 17, 2009 at 10:11:30AM -0400, Sharlon R. Carty wrote:
Hello NANOG,
I would like to know the best practices for an internet exchange. I have some concerns about the following: Can IXP members use RFC 1918 IP addresses for their peering? Can IXP members use private autonomous system numbers for their peering?
Maybe the answer is obvious, but I would like to hear from any IXP admins what their setups/experiences have been.
-- --sharlon
15 years into the exchange trade has given me the following insights: RFC1918 space can be used - but virtually everyone who starts there migrates to globally unique space. Private ASNs - same deal. Private ASNs tend to have special treatment inside ISPs - so path matching gets you in the end. --bill
me@sharloncarty.net (Sharlon R. Carty) wrote:
I would like to know the best practices for an internet exchange. I have some concerns about the following: Can IXP members use RFC 1918 IP addresses for their peering?
No. Those IP addresses will at least appear in traceroutes; also, it might not be such a good idea to (accidentally) use the same RFC1918 space on different IXPs. This will make your skin crawl (think IGP, or at least config databases)... IXPs can usually get a v4 and a v6 block for their peering grid easily.
Can IXP members use private autonomous system numbers for their peering?
They could, but what would you then do with them inside your IGP? And apart from that - ISPs that want to peer tend to have their ASNs ready... I am not an IXP operator, but I know of no exchange (public or private, big or closet-style) that uses private ASNs or RFC1918 space. Elmar.
I would like to know the best practices for an internet exchange. I have some concerns about the following: Can IXP members use RFC 1918 IP addresses for their peering?
No. Those IP addresses will at least appear in traceroutes; also, it might not be such a good idea to (accidentally) use the same RFC1918 space on different IXPs. This will make your skin crawl (think IGP, or at least config databases)... IXPs can usually get a v4 and a v6 block for their peering grid easily.
Anyone with a decently configured firewall would block IP packets with RFC1918 source addresses coming from the Internet. Your IXP would appear as a black hole in traceroute printouts because the ICMP replies sent from the IXP IP addresses would be blocked. A while ago I described a few more caveats in an article (see http://blog.ioshints.info/2008/08/private-ip-addresses-in-public-networks.html). Ivan http://www.ioshints.info/about http://blog.ioshints.info/
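To make that failure mode concrete, here is a minimal sketch (plain Python with the standard ipaddress module; the addresses are placeholders, not taken from any IXP or from Ivan's article) of the check a typical edge bogon filter effectively applies. ICMP time-exceeded replies sourced from an RFC1918-numbered IXP LAN fail it, which is why the hop shows up as "* * *":

import ipaddress

# RFC1918 blocks that a typical edge filter drops when seen as a source
# address arriving from the Internet (uRPF or an explicit bogon ACL).
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def dropped_at_edge(src: str) -> bool:
    """Return True if a packet with this source address would be filtered."""
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in RFC1918)

# An ICMP TTL-exceeded reply sourced from an RFC1918-numbered IXP port
# never makes it back to the person running the traceroute:
print(dropped_at_edge("172.16.5.1"))    # True  -> hop shows as "* * *"
print(dropped_at_edge("203.0.113.1"))   # False -> stand-in for globally unique IXP space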
with the advent of vlan tags, the whole idea of CSMA for IXP networks is passe. just put each pair of peers into their own private tagged vlan and let one of them allocate a V4 /30 and a V6 /64 for it. as a bonus, this prevents third party BGP (which nobody really liked and which sometimes got turned on by mistake) and prevents transit dumping and/or "pointing default at" someone. the IXP no longer needs any address space, they're just a VPN provider. shared-switch connections are just virtual crossconnects. -- Paul Vixie
On Fri, 17 Apr 2009, Paul Vixie wrote: > with the advent of vlan tags, the whole idea of CSMA for IXP networks is passe. > just put each pair of peers into their own private tagged vlan. Uh, I'm not sure whether you're being sarcastic or not. -Bill
On 17.04.2009 20:52 Paul Vixie wrote
with the advent of vlan tags, the whole idea of CSMA for IXP networks is passe. just put each pair of peers into their own private tagged vlan and let one of them allocate a V4 /30 and a V6 /64 for it. as a bonus, this prevents third party BGP (which nobody really liked which sometimes got turned on by mistake) and prevents transit dumping and/or "pointing default at" someone. the IXP no longer needs any address space, they're just a VPN provider. shared-switch connections are just virtual crossconnects.
Large IXPs have >300 customers. You would need up to 45k VLAN tags, wouldn't you? Arnold -- Arnold Nipper / nIPper consulting, Sandhausen, Germany email: arnold@nipper.de phone: +49 6224 9259 299 mobile: +49 172 2650958 fax: +49 6224 9259 333
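The arithmetic behind that 45k figure is just the full-mesh pair count, which blows far past the 12-bit 802.1Q tag space; a quick illustrative sketch (plain Python, nothing IXP-specific assumed):

# Pairwise VLANs needed for a full mesh of n IXP members, versus the
# usable 802.1Q tag space (4096 minus reserved IDs 0 and 4095).
def vlans_needed(n: int) -> int:
    return n * (n - 1) // 2

USABLE_TAGS = 4094

for members in (60, 90, 100, 300):
    need = vlans_needed(members)
    verdict = "fits" if need <= USABLE_TAGS else "exceeds"
    print(f"{members:4d} members -> {need:6d} per-pair VLANs ({verdict} {USABLE_TAGS} tags)")

# 300 members -> 44850 per-pair VLANs, i.e. the ~45k Arnold mentions;
# a full mesh stops fitting in a single global tag space at 91 members.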
On Apr 17, 2009, at 12:00 PM, Arnold Nipper wrote:
On 17.04.2009 20:52 Paul Vixie wrote
with the advent of vlan tags, the whole idea of CSMA for IXP networks is passe. just put each pair of peers into their own private tagged vlan and let one of them allocate a V4 /30 and a V6 /64 for it. as a bonus, this prevents third party BGP (which nobody really liked which sometimes got turned on by mistake) and prevents transit dumping and/or "pointing default at" someone. the IXP no longer needs any address space, they're just a VPN provider. shared-switch connections are just virtual crossconnects.
Large IXP have >300 customers. You would need up to 45k vlan tags, wouldn't you?
QinQ could solve this Kris
On 17.04.2009 21:04 kris foster wrote
On Apr 17, 2009, at 12:00 PM, Arnold Nipper wrote:
On 17.04.2009 20:52 Paul Vixie wrote
with the advent of vlan tags, the whole idea of CSMA for IXP networks is passe. just put each pair of peers into their own private tagged vlan and let one of them allocate a V4 /30 and a V6 /64 for it. as a bonus, this prevents third party BGP (which nobody really liked which sometimes got turned on by mistake) and prevents transit dumping and/or "pointing default at" someone. the IXP no longer needs any address space, they're just a VPN provider. shared-switch connections are just virtual crossconnects.
Large IXP have >300 customers. You would need up to 45k vlan tags, wouldn't you?
QinQ could solve this
not really -- Arnold Nipper / nIPper consulting, Sandhausen, Germany email: arnold@nipper.de phone: +49 6224 9259 299 mobile: +49 172 2650958 fax: +49 6224 9259 333
On Apr 17, 2009, at 12:05 PM, Arnold Nipper wrote:
On 17.04.2009 21:04 kris foster wrote
On Apr 17, 2009, at 12:00 PM, Arnold Nipper wrote:
On 17.04.2009 20:52 Paul Vixie wrote
with the advent of vlan tags, the whole idea of CSMA for IXP networks is passe. just put each pair of peers into their own private tagged vlan and let one of them allocate a V4 /30 and a V6 /64 for it. as a bonus, this prevents third party BGP (which nobody really liked which sometimes got turned on by mistake) and prevents transit dumping and/or "pointing default at" someone. the IXP no longer needs any address space, they're just a VPN provider. shared-switch connections are just virtual crossconnects.
Large IXP have >300 customers. You would need up to 45k vlan tags, wouldn't you?
QinQ could solve this
not really
painfully, with multiple circuits into the IX :) I'm not advocating Paul's suggestion at all here Kris
the vlan tagging idea is a virtualization of the PNI construct. why use an IX when running 10's/100's/1000's of private network interconnects will do? granted, if out of the 120 ASN's at an IX, 100 are exchanging on average - 80KBs - then it's likely safe to dump them all into a single physical port and vlan tag the heck out of it. it's those other 20 that demand some special care. (welcome to "how to grow your presence at an IX and when to leave"-101 :) --bill
On Fri, 17 Apr 2009, bmanning@vacation.karoshi.com wrote:
the vlan tagging idea is a virtualization of the PNI construct. why use an IX when running 10's/100's/1000's of private network interconnects will do?
granted, if out of the 120 ASN's at an IX, 100 are exchanging on average - 80KBs - then its likley safe to dump them all into a single physical port and vlan tag the heck out of it.
its those other 20 that demand some special care.
The construct also doesn't scale well for multicast traffic exchange if there's a significant number of multicast peers even though the traffic might be low for individual source ASNs. On the other hand, if the IXP doesn't use IGMP/MLD snooping capable switches, then I suppose it doesn't matter. Antonio Querubin whois: AQ7-ARIN
On Fri, 17 Apr 2009, bmanning@vacation.karoshi.com wrote:
the vlan tagging idea is a virtualization of the PNI construct. why use an IX when running 10's/100's/1000's of private network interconnects will do?
granted, if out of the 120 ASN's at an IX, 100 are exchanging on average - 80KBs - then its likley safe to dump them all into a single physical port and vlan tag the heck out of it.
its those other 20 that demand some special care.
The construct also doesn't scale well for multicast traffic exchange if there's a significant number of multicast peers even though the traffic might be low for individual source ASNs. On the other hand, if the IXP doesn't use IGMP/MLD snooping capable switches, then I suppose it doesn't matter.
Didn't we go through all this with ATM VC's at the AADS NAP, etc? ... JG -- Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net "We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN) With 24 million small businesses in the US alone, that's way too many apples.
On Fri, Apr 17, 2009 at 04:52:53PM -0500, Joe Greco wrote:
On Fri, 17 Apr 2009, bmanning@vacation.karoshi.com wrote:
the vlan tagging idea is a virtualization of the PNI construct. why use an IX when running 10's/100's/1000's of private network interconnects will do?
granted, if out of the 120 ASN's at an IX, 100 are exchanging on average - 80KBs - then its likley safe to dump them all into a single physical port and vlan tag the heck out of it.
its those other 20 that demand some special care.
The construct also doesn't scale well for multicast traffic exchange if there's a significant number of multicast peers even though the traffic might be low for individual source ASNs. On the other hand, if the IXP doesn't use IGMP/MLD snooping capable switches, then I suppose it doesn't matter.
Didn't we go through all this with ATM VC's at the AADS NAP, etc?
... JG
yes indeed. --bill
The construct also doesn't scale well for multicast traffic exchange if there's a significant number of multicast peers even though the traffic might be low for individual source ASNs. On the other hand, if the IXP doesn't use IGMP/MLD snooping capable switches, then I suppose it doesn't matter.
the people who do massive volumes of multicast in my experience have also been the ones whose network policies, or unicast traffic volumes, or both, prevented them from joining CSMA peering fabrics. CSMA assumes a large number of small flows, which is not what i see in the multicast market, but i admit that i'm not as involved as i used to be.
----- "kris foster" <kris.foster@gmail.com> wrote:
painfully, with multiple circuits into the IX :) I'm not advocating Paul's suggestion at all here
Kris
Totally agree with you, Kris. For the IX scenario (or at least the public side of it) this seems like Another Terrible Mistake to me. IMHO, when you are at a public IX you usually want to be able to reach everyone's network without hassling around; whether we actually peer or not is then my problem and my peer's problem. When you overload a certain port at a public IX, you either upgrade that port or move particular bit pushers onto a private peering port (if it really makes technical and economic sense). I don't see how this idea could benefit daily operations (for the IX or for IX customers); it would also require work from the (usually) neutral IX every time users need to connect to each other, which means more money to pay. (Hey IX ops, we are companies X and Z and we signed a nice peering agreement - can you please virtually patch us?) Where is the neutrality here? What about turnaround time? What if my equipment breaks at 3 AM and the IX ops need to change configs? OK, one could say it is automated... but what is the real security behind that automation? The portal is on the wild web, right? This happens today in datacenters with real cross connects, usually through MMRs (meet-me rooms). I don't want a virtual meet-me room at the internet exchanges where I peer. This is my view; I might be wrong, but I don't care, as I am square as a rock. :-) I don't understand how this concept (new, or really old if you consider ancient ATM peering and the like) can be better, more secure, and cheaper for everyone. cheers, --nvieira
On 17.04.2009 23:06 Paul Vixie wrote
Large IXP have >300 customers. You would need up to 45k vlan tags, wouldn't you?
the 300-peer IXP's i've been associated with weren't quite full mesh in terms of who actually wanted to peer with whom, so, no.
Much depends on your definition of "quite". Would 30% qualify? Arnold -- Arnold Nipper / nIPper consulting, Sandhausen, Germany email: arnold@nipper.de phone: +49 6224 9259 299 mobile: +49 172 2650958 fax: +49 6224 9259 333
the 300-peer IXP's i've been associated with weren't quite full mesh in terms of who actually wanted to peer with whom, so, no.
Much depends on your definition of "quite". Would 30% qualify?
30% would be an over-the-top success. has anybody ever run out of 1Q tags in an IXP context?
On 18.04.2009 00:04 Paul Vixie wrote
the 300-peer IXP's i've been associated with weren't quite full mesh in terms of who actually wanted to peer with whom, so, no.
Much depends on your definition of "quite". Would 30% qualify?
30% would be an over-the-top success. has anybody ever run out of 1Q tags in an IXP context?
Why? You only need 1 ;-) Arnold -- Arnold Nipper / nIPper consulting, Sandhausen, Germany email: arnold@nipper.de phone: +49 6224 9259 299 mobile: +49 172 2650958 fax: +49 6224 9259 333
Arnold Nipper <arnold@nipper.de> writes:
On 18.04.2009 00:04 Paul Vixie wrote
... has anybody ever run out of 1Q tags in an IXP context?
Why? You only need 1 ;-)
really? 1? at PAIX we started with three, two unicast (wrongheadedness) and one multicast, then added another unicast for V6. then came the VNI's, so i'm betting there are hundreds or thousands at most PAIX nodes today. are others just using one big shared network for everything?

i should expand on something i said earlier on this thread. the progression i saw at PAIX and later saw from inside MFN was that most new peerings would happen on a shared port and then as that port filled up some peerings would move to PNI. given that success in these terms looks like a PNI, i'm loathe to build in any dependencies on the long term residency of a given peering on a shared multiaccess subnet.

i should answer something said earlier: yes there's only 14 bits of tag and yes 2**14 is 4096. in the sparsest and most wasteful allocation scheme, tags would be assigned 7:7 so there'd be a max of 64 peers. it's more likely that tags would be assigned by increment, but it's still nowhere near enough for 300+ peers. however, well before 300 peers, there'd be enough staff and enough money to use something other than a switch in the middle, so that the "tagspace" would be per-port rather than global to the IXP. Q in Q is not how i'd build this... cisco and juniper both have hardware tunnelling capabilities that support this stuff... it just means as the IXP fabric grows it has to become router-based.

i've spent more than several late nights and long weekends dealing with the problems of shared multiaccess IXP networks. broadcast storms, poisoned ARP, pointing default, unintended third party BGP, unintended spanning tree, semitranslucent loops, unauthorized IXP LAN extension... all to watch the largest flows move off to PNI as soon as somebody's port was getting full.

conventional wisdom says a shared fabric is fine. conventional wisdom also said that UNIX came only from bell labs, that computers and operating systems were bought from the same vendor on a single PO, that protocols built for T1 customers who paid $1000 MRC would scale to DSL customers who paid $30 MRC, that Well and Portal shell users should be allowed to use outbound SMTP, that the internet would only be used cooperatively, and that business applications were written in COBOL whereas scientific applications were written in FORTRAN, and that the cool people all used BSD whereas Linux was just a toy. so i think conventional wisdom isn't perfectly ageless. -- Paul Vixie
On 18/04/2009, at 12:08 PM, Paul Vixie wrote:
i should answer something said earlier: yes there's only 14 bits of tag and yes 2**14 is 4096. in the sparsest and most wasteful allocation scheme, tags would be assigned 7:7 so there'd be a max of 64 peers. it's more likely that tags would be assigned by increment, but it's still nowhere near enough for 300+ peers. however, well before 300 peers, there'd be enough staff and enough money to use something other than a switch in the middle, so that the "tagspace" would be per-port rather than global to the IXP. Q in Q is not how i'd build this... cisco and juniper both have hardware tunnelling capabilities that support this stuff... it just means as the IXP fabric grows it has to become router-based.
On Alcatel-Lucent 7x50 gear, VLAN IDs are only relevant to that local port. If you want to build a "VLAN" that operates like it does on a Cisco switch or something, you set up a tag on each port, and join the tags together with a L2 switching service. The tag IDs can be different on each port, or the same... it has no impact. -- Nathan Ward
Nathan Ward <nanog@daork.net> writes:
On 18/04/2009, at 12:08 PM, Paul Vixie wrote:
... Q in Q is not how i'd build this... cisco and juniper both have hardware tunnelling capabilities that support this stuff... ...
On Alcatel-Lucent 7x50 gear, VLAN IDs are only relevant to that local port. If you want to build a "VLAN" that operates like it does on a Cisco switch or something, you set up a tag on each port, and join the tags together with a L2 switching service. The tag IDs can be different on each port, or the same... it has no impact.
apologies for leaving out alcatel-lucent and any other vendor who can also do this. i mentioned only juniper and cisco in the above-quoted article because that's the limit of my own knowledge on this topic. -- Paul Vixie
From: Paul Vixie <vixie@isc.org> Date: Sat, 18 Apr 2009 00:08:04 +0000 ... i should answer something said earlier: yes there's only 14 bits of tag and yes 2**14 is 4096. in the sparsest and most wasteful allocation scheme, tags would be assigned 7:7 so there'd be a max of 64 peers.
i meant of course 12 bits, that 2**12 is 4096, and 6:6. apologies for slop.
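To put numbers on the two allocation schemes described above (a toy sketch only; neither scheme is taken from any real exchange's provisioning): a 6:6 split of the corrected 12-bit tag caps the fabric at 64 ports, while incremental assignment caps concurrently provisioned peerings at 4094, which a 300-member exchange at roughly 30% meshiness would still overrun.

# Two ways of carving a global 12-bit 802.1Q tag space into per-pair VLANs.

TAG_BITS = 12
USABLE_TAGS = 2**TAG_BITS - 2           # IDs 0 and 4095 are reserved

# Scheme 1: split the tag 6:6 into (low port id, high port id).
def tag_6_6(port_a: int, port_b: int) -> int:
    lo, hi = sorted((port_a, port_b))
    assert 0 <= lo < 64 and 0 <= hi < 64, "6:6 split caps the fabric at 64 ports"
    return (lo << 6) | hi

MAX_PORTS_6_6 = 2**6                    # 64 ports total

# Scheme 2: hand out tags by increment, one per provisioned peering.
MAX_PEERINGS_INCREMENTAL = USABLE_TAGS  # 4094 concurrent peerings

members, meshiness = 300, 0.30
wanted = int(members * (members - 1) / 2 * meshiness)
print(MAX_PORTS_6_6, MAX_PEERINGS_INCREMENTAL, wanted)
# -> 64 4094 13455: either way a 300-member IXP needs per-port tag
#    significance (or pseudowires) rather than one global tag space.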
On 18/04/2009 01:08, Paul Vixie wrote:
i've spent more than several late nights and long weekends dealing with the problems of shared multiaccess IXP networks. broadcast storms, poisoned ARP, pointing default, unintended third party BGP, unintended spanning tree, semitranslucent loops, unauthorized IXP LAN extension... all to watch the largest flows move off to PNI as soon as somebody's port was getting full.
Paul - to be fair, things might have moved on a little since the earlier years of internet exchanges. These days, we have switches which do multicast and broadcast storm control, unicast flood control, mac address counting, l2 and l3 acls, dynamic arp inspection, and they can all be configured to ignore bpdus in a variety of imaginative ways. We have arp sponges and broadcast monitors (a toy arp sponge sketch appears below). We have edge routers which can do multiple flavours of urpf, and for those hardcore types who don't like md5 or gtsm, there's always ipsec for bgp sessions.

I have to be honest: i just don't care if people use L2 connectivity to get to an exchange from a router somewhere else on their LAN. They have one mac address to play around with, and if they start leaking mac addresses towards the exchange fabric, all they're going to do is hose their own connectivity. If they are silly enough to enable stp at their edge, then that will trash their connectivity, as a carrier-up event will trigger STP packets from their switch before their router notices, and mac learning will prevent their router from gaining access to the exchange. If they decide to loop their L2 traffic, do I care? They'll just be chopped off automatically, and I'll get an email. And if people behave really cretinously, I'll just bang in more L2 or L3 filters to stop them from tickling my monitoring systems, but most likely at that stage they will have been extensively depeered due to technical ineptitude. Stupid behaviour is self-limiting and is really just an annoyance these days rather than a problem.

As you've noted, there is a natural progression for service providers here from shared access to pni, which advances according to the business and financial requirements of the parties involved. If exchange users decide to move from shared access peering to PNI, good for them - it means their business is doing well. But this doesn't mean that IXPs don't offer an important level of service to their constituents. Because of them, the isp industry has convenient access to dense interconnection at a pretty decent price.
Q in Q is not how i'd build this... cisco and juniper both have hardware tunnelling capabilities that support this stuff... it just means as the IXP fabric grows it has to become router-based.
Hey, I have an idea: you could take this plan and build a tunnel-based or even a native IP access IXP platform like this, extend it to multiple locations and then buy transit from a bunch of companies which would give you a native L3 based IXP with either client prefixes only or else an option for full DFZ connectivity over the exchange fabric. You could even build a global IXP on this basis! It's a brilliant idea, and I just can't imagine why no-one thought of it before. Nick
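One of the tools mentioned above, the ARP sponge, can be sketched in a few lines (Python with scapy assumed installed; the interface name, sponge MAC, and threshold are invented parameters, and a production sponge does considerably more): count who-has requests per target, and once an address looks dead, start answering on its behalf so the fabric isn't flooded with unanswered broadcast ARP.

from collections import defaultdict
from scapy.all import ARP, Ether, sendp, sniff   # scapy assumed installed

IFACE      = "eth0"                 # port facing the peering LAN (assumption)
SPONGE_MAC = "02:00:de:ad:be:ef"    # locally administered placeholder MAC
THRESHOLD  = 50                     # unanswered who-has before sponging

pending = defaultdict(int)          # target IP -> unanswered request count

def handle(pkt):
    if ARP not in pkt:
        return
    arp = pkt[ARP]
    if arp.op == 2:                 # is-at: the sender is alive, stop counting it
        pending.pop(arp.psrc, None)
        return
    # who-has: count it, and sponge the target address once it looks dead
    pending[arp.pdst] += 1
    if pending[arp.pdst] >= THRESHOLD:
        reply = Ether(dst=arp.hwsrc, src=SPONGE_MAC) / ARP(
            op=2, hwsrc=SPONGE_MAC, psrc=arp.pdst,
            hwdst=arp.hwsrc, pdst=arp.psrc)
        sendp(reply, iface=IFACE, verbose=False)

sniff(filter="arp", prn=handle, store=0, iface=IFACE)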
Date: Sat, 18 Apr 2009 16:35:51 +0100 From: Nick Hilliard <nick@foobar.org>
... i just don't care if people use L2 connectivity to get to an exchange from a router somewhere else on their LAN. They have one mac address to play around with, and if they start leaking mac addresses towards the exchange fabric, all they're going to do is hose their own connectivity.
yeah we did that at PAIX. if today's extremenetworks device has an option to learn one MAC address per port and no more, it's because we had a terrible time getting people to register their new MAC address when they'd change out interface cards or routers. hilarious levels of fingerpointing and downtime later, our switch vendor added a knob for us. but we still saw typos in IP address configurations whereby someone could answer ARPs for somebody else's IP. when i left PAIX (the day MFN entered bankruptcy) we were negotiating for more switch knobs to prevent accidental and/or malicious ARP poisoning. (and note, this was on top of a no-L2-devices rule which included draconian auditing rights for L2/L3 capable hardware.)
As you've noted, there is a natural progression for services providers here from shared access to pni, which advances according to the business and financial requirements of the parties involved. If exchange users decide to move from shared access peering to PNI, good for them - it means their business is doing well. But this doesn't mean that IXPs don't offer an important level of service to their constituents. Because of them, the isp industry has convenient access to dense interconnection at a pretty decent price.
yes, that's the progression of success. and my way of designing for success is to start people off with VNI's (two-port VLANs containing one peering) so that when they move from shared-access to dedicated they're just moving from a virtual wire to a physical wire without losing any of the side-benefits they may have got from a shared-access peering fabric.
Q in Q is not how i'd build this... cisco and juniper both have hardware tunnelling capabilities that support this stuff... it just means as the IXP fabric grows it has to become router-based.
Hey, I have an idea: you could take this plan and build a tunnel-based or even a native IP access IXP platform like this, extend it to multiple locations and then buy transit from a bunch of companies which would give you a native L3 based IXP with either client prefixes only or else an option for full DFZ connectivity over the exchange fabric. You could even build a global IXP on this basis! It's a brilliant idea, and I just can't imagine why no-one thought of it before.
:-). i've been known to extend IXP fabrics to cover a metro, but never beyond.
Best solution I ever saw to an 'unintended' third-party peering was devised by a pretty brilliant guy (who can pipe up if he's listening). When he discovered traffic loads coming from non-peers he'd drop in an ACL that blocked everything except ICMP - then tell the NOC to route the call to his desk when the third party finally gave up troubleshooting and called in... fun memories of the NAPs... jy On Apr 18, 2009, at 11:35 AM, Nick Hilliard wrote:
On 18/04/2009 01:08, Paul Vixie wrote:
i've spent more than several late nights and long weekends dealing with the problems of shared multiaccess IXP networks. broadcast storms, poisoned ARP, pointing default, unintended third party BGP, unintended spanning tree, semitranslucent loops, unauthorized IXP LAN extension... all to watch the largest flows move off to PNI as soon as somebody's port was getting full.
If you are unfortunate enough to have to peer at a public exchange point, put your public ports into a vrf that has your routes. Default will be suboptimal to debug. I must say stephen and vixie and (how hard this is to type) even richard steenbergen's methodology makes the most sense going forward. Mostly to prevent self-inflicted harm on the part of the exchange participants. Will it work? Doubtful given today's internet clue level /vijay On 4/18/09, Jeff Young <young@jsyoung.net> wrote:
Best solution I ever saw to an 'unintended' third-party peering was devised by a pretty brilliant guy (who can pipe up if he's listening). When he discovered traffic loads coming from non-peers he'd drop in an ACL that blocked everything except ICMP - then tell the NOC to route the call to his desk with the third party finally gave up troubleshooting and called in...
fun memories of the NAPs...
jy
On Apr 18, 2009, at 11:35 AM, Nick Hilliard wrote:
On 18/04/2009 01:08, Paul Vixie wrote:
i've spent more than several late nights and long weekends dealing with the problems of shared multiaccess IXP networks. broadcast storms, poisoned ARP, pointing default, unintended third party BGP, unintended spanning tree, semitranslucent loops, unauthorized IXP LAN extension... all to watch the largest flows move off to PNI as soon as somebody's port was getting full.
-- Sent from my mobile device
A solution I put in place at UUnet circa 1997 was to take a set of /32 routes representing major destinations - e.g. ISP web sites, content sites, universities, about 20 of them - and temporarily place a /32 static route via each participant at the public exchange and traceroute to the destination. In this manner one can build a matrix to see how each participant gets to each destination. (A rough sketch of the procedure appears after the quoted text below.) When we found someone sending traffic to us with whom we were not peering, it was only a small bit of work to contact the ISP and ask them to fix the "error". One guy whose initials were GBO fixed it several times, if I remember correctly. I wonder how prevalent third-party next hops (here, share my peering!) are nowadays? From time to time it was interesting to watch peers and see when they figured out others were using them for transit. BTW, I wonder how many folks did do the ICMP ACL stuff. We never did it at UUNET that I remember. In 1997 I know the routers could handle the ACL, at least as well as routers in those days could be said to handle traffic. The guy that taught it to me (initials NS) did so over a margarita at Rio Grande. Completely preventing the potential for the problem is superior to detecting it, but at the time, without a clear method for preventing it, detection was good. I remember MFS tried to implement MAC filters, but bugs in the code rendered it a moot exercise. -alan

vijay gill wrote:
If you are unfortunate enough to have to peer at a public exchange point, put your public ports into a vrf that has your routes.
On 4/18/09, Jeff Young <young@jsyoung.net> wrote:
Best solution I ever saw to an 'unintended' third-party peering was devised by a pretty brilliant guy (who can pipe up if he's listening).
On Apr 18, 2009, at 11:35 AM, Nick Hilliard wrote
On 18/04/2009 01:08, Paul Vixie wrote:
....pointing default, ....
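A rough sketch of Alan's route-and-traceroute matrix (assuming a Linux host on the exchange LAN with root, iproute2 and traceroute; the participant next-hops and destination addresses below are documentation placeholders, not anything from UUNET's setup):

import subprocess

# For each exchange participant (candidate next-hop) and each well-known
# destination, temporarily point a /32 at the participant, traceroute it,
# and record whether and how the destination was reached.
PARTICIPANT_NEXTHOPS = ["198.51.100.10", "198.51.100.20"]   # placeholder peer IPs
DESTINATIONS         = ["192.0.2.80", "203.0.113.25"]       # placeholder /32 targets

def sh(*cmd: str) -> str:
    return subprocess.run(cmd, capture_output=True, text=True, timeout=90).stdout

def reaches(nexthop: str, dest: str) -> bool:
    """Point dest/32 at nexthop, traceroute it, then remove the route."""
    sh("ip", "route", "replace", f"{dest}/32", "via", nexthop)
    try:
        out = sh("traceroute", "-n", "-q", "1", "-w", "2", "-m", "15", dest)
        lines = out.splitlines()
        return bool(lines) and dest in lines[-1]    # crude: did the last hop answer?
    finally:
        sh("ip", "route", "del", f"{dest}/32")

matrix = {nh: {d: reaches(nh, d) for d in DESTINATIONS} for nh in PARTICIPANT_NEXTHOPS}
for nh, row in matrix.items():
    print(nh, row)
# The matrix shows which participants will carry your packets to which
# destinations; unexpected successes, or paths that come back through your
# own network, flag the third-party-transit situations Alan describes.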
So here is an idea that I hope someone shoots down. We've been talking about pseudo-wires, and the high level of expertise a shared-fabric IXP needs to diagnose weird switch oddities, etc.

As far as I can tell, the principal reason to use a shared fabric is to allow multiple connections to networks that may not justify their own dedicated ($$$$) router port. Once they do, they can move over to a PNI. However, an IXP is (at the hardware level at least) trying to achieve any-to-any connectivity without concern for capacity, up to the port size of each port, on every flow. Scaling this to multiple pieces of hardware has posed interesting challenges when the connection speed to participants is of the same order as the interconnection between IXP switches.

So here is a hybrid idea; I'm not sure if it has been tried or seriously considered before. Since the primary justification for a shared fabric is cost savings... what if everyone who participated at an IXP brought their own switch? For argument's sake, a Nexus 5xxx. It has 20+ ports of L2, wire-speed 10G. You connect 1-2 ports to your router, and you order 18 cross-connects to your favorite peers. The IXP becomes a cross-connect provider (there is a business-model bump that can be addressed here; TelX and others could address it). As you need more ports, you add them. A Nexus 5K runs about $500 per port.

Here are some advantages. If you have 300 participants, yes, you have a lot of ports/switches. However, as "interconnectivity" increases, so does the total fabric capacity. Each additional switch does NOT add significant complexity for the participants, but it does bring significant backplane and buffering capability. Each participant could then configure their own pVLANs, VLANs or whatever on *their* switch. If they screw something up, it doesn't take everyone down; a non-offending participant that interconnects with an offender can shut down one port (automatically or manually) without affecting the rest of their services. This also avoids the need for very complicated security features in the L2/L3 gray area: if you don't want your peer to have multiple MACs, don't accept them; if you're cool with it, you can let it slide. If you want to move someone over to a PNI, the IXP needs to do zilch - you just move your cross-connect over to a new port in your service window, and your peer can do it at the same or a different time, no big deal. If you *keep* it on a switch, however, you can use LACP uplinks from the switches you have to provide, say, 40G uplinks to your router so large peers don't affect your ability to process traffic. I doubt, however, that if this model is applied there is much purpose for PNIs -- the cost-savings argument mostly vanishes. As you want to move to higher speeds (40G and 100G) the IXP has to do zilch: you can switch your ports or peers at any time you choose, or set up a separate fabric for your 100G peers. An upgrade in port density or capacity for a peer, or set of peers, does not require a forklift of the whole IXP or some strange speed shifting (other than for the affected parties).

Disadvantages. A shared fabric is probably cheaper on a per-participant basis once the exchange gets to be a certain size. It's a different model (many-to-many vs one-to-many) than many are used to. It requires interconnects to other participants (en masse) to cost about the same as the per-port cost of a shared fabric (this is probably achievable given what many places charge for 10G ports). Each participant is managing an additional type of gear. Theoretically, if someone uses an Extreme and another uses a Cisco, there might be issues, but at a pure vanilla L2/VLAN level I'm pretty sure even 2nd- and 3rd-tier vendors can interconnect just fine.

I think this addresses "keep it as simple as possible without over-simplifying". There is nothing new to this model except (perhaps) as it's applied to an IXP. People have been aggregating traffic by ports into trunks by capacity for a long time. I haven't figured out why it hasn't really been done at scale at the IXP level.

Thoughts?

Deepak Jain
AiNET
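Since the whole argument turns on cost, here is a toy per-participant comparison (every number is a placeholder, not a quote from any IXP or colo price list); it mainly shows that the outcome hinges on how cross-connects are priced relative to a shared port, which is exactly the business-model bump mentioned above:

# Rough per-participant cost comparison: bring-your-own-switch versus a
# shared-fabric port. All figures are illustrative placeholders.
SWITCH_PORT_COST = 500     # one-time, per switch port (the ~$500/port figure above)
XCONNECT_MRC     = 300     # assumed monthly cost of one cross-connect
SHARED_PORT_MRC  = 2000    # assumed monthly cost of a shared 10G IXP port
MONTHS           = 36      # amortization window

def byo_cost(peers: int) -> int:
    """Total cost over MONTHS: switch ports (peers + 2 router uplinks) + cross-connects."""
    ports = peers + 2
    return ports * SWITCH_PORT_COST + peers * XCONNECT_MRC * MONTHS

def shared_cost() -> int:
    return SHARED_PORT_MRC * MONTHS

for peers in (5, 18, 50):
    print(peers, byo_cost(peers), shared_cost())
# With these made-up numbers the BYO model only wins if cross-connects are
# priced well below a shared port; the comparison is driven almost entirely
# by that ratio rather than by the switch hardware itself.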
Hello Deepak: -----Original Message----- So here is an idea that I hope someone shoots down. We've been talking about pseudo-wires, and the high level of expertise a shared-fabric IXP needs to diagnose weird switch oddities, etc. As far as I can tell, the principal reason to use a shared fabric is to allow multiple connections to networks that may not justify their own dedicated ($$$$) router port. Once they do, they can move over to a PNI. However, an IXP is (at the hardware level at least) trying to achieve any-to-any connectivity without concern for capacity up to the port size of each port on every flow. Scaling this to multiple pieces of hardware has posed interesting challenges when the connection speed to participants is of the same order as the interconnection between IXP switches. So here is a hybrid idea, I'm not sure if It has been tried or seriously considered before. Since the primary justification for a shared fabric is cost savings.... What if everyone who participated at an IXP brought their own switch. For argument's sake, a Nexus 5xxx. It has 20+ ports of L2, wire speed 10G. [Michael K. Smith - Adhost] This sounds like fertile ground for unintended consequences. Unmanaged spanning tree topological changes as three people, previously connected to their own switch and to others, now decide to connect to each other as well, using those inexpensive L2 ports. Regards, Mike
Hello Deepak:
-----Original Message-----
So here is an idea that I hope someone shoots down.
We've been talking about pseudo-wires, and the high level of expertise a shared-fabric IXP needs to diagnose weird switch oddities, etc.
As far as I can tell, the principal reason to use a shared fabric is to allow multiple connections to networks that may not justify their own dedicated ($$$$) router port. Once they do, they can move over to a PNI. However, an IXP is (at the hardware level at least) trying to achieve any-to-any connectivity without concern for capacity up to the port size of each port on every flow. Scaling this to multiple pieces of hardware has posed interesting challenges when the connection speed to participants is of the same order as the interconnection between IXP switches.
So here is a hybrid idea, I'm not sure if It has been tried or seriously considered before.
Since the primary justification for a shared fabric is cost savings....
What if everyone who participated at an IXP brought their own switch. For argument's sake, a Nexus 5xxx. It has 20+ ports of L2, wire speed 10G.
[Michael K. Smith - Adhost]
This sounds like fertile ground for unintended consequences. Unmanaged spanning tree topological changes as three people, previously connected to their own switch and to others, now decide to connect to each other as well, using those inexpensive L2 ports.
If each port is in its own pVLAN or similar, and they are only allowed to talk to their uplinks and not other L2 ports on the same switch, loops are avoided. I should have hashed that point out with another line. Yes, strictly throwing up an unconfigured switch becomes a problem after the 2nd one goes in -- but only for those brave enough to peer with you and dumb enough to allow their switch to behave that way. The double-edged clue sword. Deepak
-----Original Message-----
So here is an idea that I hope someone shoots down.
We've been talking about pseudo-wires, and the high level of expertise a shared-fabric IXP needs to diagnose weird switch oddities, etc.
As far as I can tell, the principal reason to use a shared fabric is to allow multiple connections to networks that may not justify their own dedicated ($$$$) router port. Once they do, they can move over to a PNI. However, an IXP is (at the hardware level at least) trying to achieve any-to-any connectivity without concern for capacity up to the port size of each port on every flow. Scaling this to multiple pieces of hardware has posed interesting challenges when the connection speed to participants is of the same order as the interconnection between IXP switches.
So here is a hybrid idea, I'm not sure if It has been tried or seriously considered before.
Since the primary justification for a shared fabric is cost savings....
What if everyone who participated at an IXP brought their own switch. For argument's sake, a Nexus 5xxx. It has 20+ ports of L2, wire speed 10G.
[Michael K. Smith - Adhost]
This sounds like fertile ground for unintended consequences. Unmanaged spanning tree topological changes as three people, previously connected to their own switch and to others, now decide to connect to each other as well, using those inexpensive L2 ports.
If each port is in its own pVLAN or similar, and they are only allowed to talk to their uplinks and not other L2 ports on the same switch, loops are avoided. I should have hashed that point out with another line. Yes, strictly throwing up an unconfigured switch becomes a problem after the 2nd one goes in -- but only for those brave enough to peer with you and dumb enough to allow their switch to behave that way. The double-edged clue sword. Deepak

[Michael K. Smith - Adhost] The problem is that the model as you've presented it, or as I've read it anyway, allows that type of insertion as part of its root design. If all of the switches have to speak spanning tree because they may be connected to each other on some connection outside of your administrative control, then you have no control over what happens "over there" that causes issues with the STP domain. I'm a big fan of the operational simplicity of an L2 shared fabric with spanning tree disallowed by policy and configuration, with all of its resources dedicated to forwarding frames. I'm not convinced that a more complex L3 shared architecture over a shared L2 fabric gets you any more security or resiliency, but it certainly keeps the exchange people busy! I should note that I do technical work for the Seattle Internet Exchange so I'm showing a bias. But, with that said, we have benefited greatly from this model, through consistent, tragedy-free growth and very high uptime. Regards, Mike
* deepak@ai.net (Deepak Jain) [Mon 20 Apr 2009, 23:25 CEST]:
So here is an idea that I hope someone shoots down.
We've been talking about pseudo-wires, and the high level of expertise a shared-fabric IXP needs to diagnose weird switch oddities, etc. [..] What if everyone who participated at an IXP brought their own switch. For argument's sake, a Nexus 5xxx. It has 20+ ports of L2, wire speed 10G.
You didn't Cc: randy bush and I assume he's been delete-threading this so I'll say it instead: I encourage all my competitors to try this. You do realise, I hope, that the ability to diagnose weird switch oddities decreases pretty radically when the switch is outside one's administrative control, right? Ethernet has no administrative boundaries that can be delineated. Spanning one broadcast domain across multiple operators is therefore a recipe for disaster. Attempts to limit this will fail as there is no enforcement possible in such a cooperative environment except yelling after the fact and frantic mailing during meltdowns. I don't think I need to spell out how quick hacks will severely restrict scalability. Cheap, fast, secure. It is obvious which two Ethernet chose. -- Niels. -- "We humans get marks for consistency. We always opt for civilization after exhausting the alternatives." -- Carl Guderian
On Monday 20 April 2009 18:57:01 Niels Bakker wrote:
Ethernet has no administrative boundaries that can be delineated. Spanning one broadcast domain across multiple operators is therefore a recipe for disaster.
Isn't this the problem that NBMA networks like ATM were built for?
Cheap, fast, secure. It is obvious which two Ethernet chose.
And which two ATM chose. Although secondhand ATM gear is coming down in price.... ATM has its own issues, but the broadcast layer 2 problem isn't one of them. Seems to me Ethernet layer 2 stuff is just trying today to do what ATM gear did ten years ago. Faster, of course, but still much the same. But, again, too bad ATM was just too expensive, and too different, and Gigabit Ethernet just too easy (at the time).
But I recollect that FORE ATM equipment using LAN Emulation (LANE) used a broadcast and unknown server (BUS) to establish a point-to-point ATM PVC for each broadcast and multicast receiver on a LAN segment. As well as being inherently unscalable (I think the BUS ran on an ASX1000 cpu), this scheme turned the single stream concept of multicast on its head, creating essentially a unicast stream for each multicast PVC client. -----Original Message----- From: Lamar Owen [mailto:lowen@pari.edu] Sent: Tuesday, April 21, 2009 1:21 PM To: nanog@nanog.org Subject: Re: IXP On Monday 20 April 2009 18:57:01 Niels Bakker wrote:
Ethernet has no administrative boundaries that can be delineated. Spanning one broadcast domain across multiple operators is therefore a recipe for disaster.
Isn't this the problem that NBMA networks like ATM were built for?
Cheap, fast, secure. It is obvious which two Ethernet chose.
And which two ATM chose. Although secondhand ATM gear is coming down in price.... ATM has its own issues, but the broadcast layer 2 problem isn't one of them. Seems to me Ethernet layer 2 stuff is just trying today to do what ATM gear did ten years ago. Faster, of course, but still much the same. But, again, too bad ATM was just too expensive, and too different, and Gigabit Ethernet just too easy (at the time).
On Wed, Apr 22, 2009, Holmes,David A wrote:
But I recollect that FORE ATM equipment using LAN Emulation (LANE) used a broadcast and unknown server (BUS) to establish a point-to-point ATM PVC for each broadcast and multicast receiver on a LAN segment. As well as being inherently unscalable (I think the BUS ran on an ASX1000 cpu), this scheme turned the single stream concept of multicast on its head, creating essentially a unicast stream for each multicast PVC client.
IIRC, plenty of popular ethernet switches do this across their backplane for multicast .. Adrian
On Fri, Apr 17, 2009 at 09:00:53PM +0200, Arnold Nipper wrote:
Large IXP have >300 customers. You would need up to 45k vlan tags, wouldn't you?
Not only that, but when faced with the requirement of making the vlan IDs match on both sides of the exchange, most members running layer 3 switches with global vlan significance are going to hit major layer 8 hurdles negotiating the available IDs very quickly. A far better way to implement this is with a web portal brokered virtual crossconnect system, which provisions MPLS martini pwe or vpls circuits between members. This eliminates the vlan scaling and clash issues, as it shifts you from a 12-bit identifier to a 32-bit identifier, with vlan tag handoffs to the clients being arbitrarily mapped as the client wishes. Such a system has significant advantages over traditional flat layer 2 switches, in things like security, reliability, flexibility, scalability (in members, traffic, and number of locations within the network), and multiservice use (since you can accurately bill with snmp counters per vlan-ID instead of just guesstimating w/sflow). Of course trying to deploy such a system in the current IX market space (especially in the US) has its own unique challenges. :) -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
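A sketch of the bookkeeping such a portal-brokered system would do (plain Python; the class and naming are invented for illustration, not taken from any real provisioning system): each member hands off whatever VLAN tag is free on their own port, and the broker binds the two per-port handoffs to a single 32-bit pseudowire VC ID, so tag significance stays local to each port.

import itertools

class XconnectBroker:
    """Toy model of a portal-brokered virtual cross-connect system."""

    def __init__(self):
        self._vcids = itertools.count(1)   # 32-bit VC ID space (toy counter)
        self._port_tags = {}               # port -> set of VLAN tags in use
        self.circuits = {}                 # vcid -> ((port_a, tag_a), (port_b, tag_b))

    def _reserve(self, port: str, tag: int) -> None:
        used = self._port_tags.setdefault(port, set())
        if not 1 <= tag <= 4094 or tag in used:
            raise ValueError(f"tag {tag} not available on {port}")
        used.add(tag)

    def provision(self, port_a: str, tag_a: int, port_b: str, tag_b: int) -> int:
        """Bind two per-port VLAN handoffs to one pseudowire; returns the VC ID."""
        self._reserve(port_a, tag_a)
        self._reserve(port_b, tag_b)
        vcid = next(self._vcids)
        self.circuits[vcid] = ((port_a, tag_a), (port_b, tag_b))
        return vcid

broker = XconnectBroker()
# The two members need not agree on a tag: 100 on one port, 2001 on the other.
vc = broker.provision("memberA:xe-0/0/0", 100, "memberB:Te1/1", 2001)
print(vc, broker.circuits[vc])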
On Fri, Apr 17, 2009 at 04:10:32PM -0500, Richard A Steenbergen wrote:
A far better way to implement this is with a web portal brokered virtual crossconnect system, which provisions MPLS martini pwe or vpls circuits between members.
A couple of years ago I thought of the same, and discovered that some new MAEs were (supposed to be?) built on exactly that scheme. Hard to have really new ideas these days. :) http://meetings.apnic.net/meetings/19/docs/sigs/ix/ix-pres-bechly-mae-ext-se... Best regards, Daniel -- CLUE-RIPE -- Jabber: dr@cluenet.de -- dr@IRCnet -- PGP: 0xA85C8AA0
with the advent of vlan tags, the whole idea of CSMA for IXP networks is passe. just put each pair of peers into their own private tagged vlan and let one of them allocate a V4 /30 and a V6 /64 for it. as a bonus, this prevents third party BGP (which nobody really liked which sometimes got turned on by mistake) and prevents transit dumping and/or "pointing default at" someone. the IXP no longer needs any address space, they're just a VPN provider. shared-switch connections are just virtual crossconnects. Large IXP have >300 customers. You would need up to 45k vlan tags, wouldn't you?
now arnold, you're spoiling a great idea. researchers could measure the exchange to see if it ever fully converged (to steal a routing term). nice paper there, and who cares about working connectivity. </sarcasm> randy
Arnold Nipper wrote:
On 17.04.2009 20:52 Paul Vixie wrote
Large IXP have >300 customers. You would need up to 45k vlan tags, wouldn't you?
Not agreeing or disagreeing with this as a concept, but I'd imagine that since a number of vendors support arbitrary vlan rewrite on ports, in a simple environment you could do some evil things with that (i.e. you could use QinQ "like" ATM Virtual Paths between core switches and then reuse the VLAN tag as a VC). Then, as long as no peer has more than 4096 peers, you're sweet. It'd hurt your head and probably never work, but heck, there's a concept to argue about. (Please note: I don't endorse this as an idea.)

I guess the other option is to use MPLS xconnect style or, heck, most vendors have gear that'll allow you to do Layer 3 at the same speed as Layer 2, so you could go for routing everyone to a common routing core with either EBGP multihop or MLPA with communities to control route entry and exit. Then broadcast isn't an issue and multicast would kind of be okay. (Please note: I don't endorse this as an idea.)

None of these options, to be honest, is nice, and all are more complex than just a Layer 2 network with some protocol filtering and rate limiting at the edge. So it's not clear what more complex arrangements would fix. My feeling is that IXes are just a substitute for PNIs anyway, so peering does naturally migrate that way as the flows get larger. If you're an IX as a business then this may affront you, but most IXes-as-a-business are colo people (e.g. S&D, Equinix) who make good money on xconnects anyway. Or you have a business model that means you accept this happens. Clearly, given the number of 10Gbps ports on some exchanges, it's not that much of an issue. MMC
Not agreeing or disagreeing with this as a concept, but I'd imagine that since a number of vendors support arbitrary vlan rewrite on ports that in simple environment you could do some evil things with that. (ie. you could use QinQ "like" ATM Virtual Paths between core switches and then reuse the VLAN tag as a VC). Then, as long as no peer has more than 4096 peers you're sweet. It'd hurt your head and probably never work, but heck, there's a concept to argue about. (Please note: I don't endorse this as an idea).
This would be best managed by a very smart, but very simple, piece of software. Just like Facebook or LinkedIn, or what-have-you, a network accepts a "peer/friend" request from another network. Once both sides agree (and only as long as both sides agree) the configuration is pinned up; either side can pull it down. The configs, up to the hardware limits, would be pretty trivial, especially the QinQ management for VLAN ID uniqueness. Not sure how switches handle HOL blocking with QinQ traffic across trunks, but hey... what's the fun of running an IXP without testing some limits? Deepak Jain AiNET
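A minimal sketch of that consent logic (plain Python; the class, states, and documentation ASNs are invented for illustration, not taken from any existing portal): a virtual cross-connect is only pinned up while both sides' approvals stand, and either side withdrawing tears it down.

class PeeringRequest:
    """Both-sides-must-agree workflow for a virtual cross-connect."""

    def __init__(self, net_a: str, net_b: str):
        self.members = frozenset((net_a, net_b))
        self.approvals = set()

    def approve(self, who: str) -> None:
        if who not in self.members:
            raise ValueError(f"{who} is not a party to this request")
        self.approvals.add(who)

    def withdraw(self, who: str) -> None:
        # either side can pull the circuit down at any time
        self.approvals.discard(who)

    @property
    def provisioned(self) -> bool:
        # the VLAN/pseudowire stays pinned up only while both approvals stand
        return self.approvals == self.members

req = PeeringRequest("AS64500", "AS64511")
req.approve("AS64500");  print(req.provisioned)   # False - waiting on the peer
req.approve("AS64511");  print(req.provisioned)   # True  - config gets pinned up
req.withdraw("AS64500"); print(req.provisioned)   # False - circuit torn down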
Not sure how switches handle HOL blocking with QinQ traffic across trunks, but hey... what's the fun of running an IXP without testing some limits?
Indeed. Those with longer memories will remember that I used to regularly apologize at NANOG meetings for the DEC Gigaswitch/FDDI head-of-line blocking that all Gigaswitch-based IXPs experienced when some critical mass of OC3 backbone circuits was reached and the 100 Mb/s fabric rolled over and died, offered here (again) as a cautionary tale for those who want to test those particular limits (again).

At PAIX, when we "upgraded" to the Gigaswitch/FDDI (from a DELNI; we loved the DELNI), I actually used a feature of the switch that let you "black out" certain sections of the crossbar to prevent packets arriving on one port from exiting certain others, at the request of some networks, to align L2 connectivity with their peering agreements. It was fortunate that the scaling meltdown occurred when it did, otherwise I would have spent more software development resources trying to turn that capability into something that was operationally sustainable for networks to configure the visibility of their port to only those networks with which they had peering agreements. That software would probably have been thrown away with the Gigaswitches had it actually been developed, and rewritten to use something horrendous like MAC-based filtering, and if I recall correctly the options didn't look feasible at the time - and who wants to have to talk to a portal when doing a 2am emergency replacement of a linecard to change registered MAC addresses, anyway? The port-based stuff had a chance of being operationally feasible.

The notion of a partial pseudo-wire mesh, with a self-service portal to request/accept connections like the MAEs had for their ATM-based fabrics, follows pretty well from that and everything that's been learned by anyone about advancing the state of the art, and extends well to allow an IXP to have a distributed fabric that benefits from scalable L2.5/L3 traffic management features while looking as much like wires as possible to the networks using the IXP.

If the gear currently deployed in IXP interconnection fabrics actually supports the necessary features, maybe someone will be brave enough to commit the software development resources necessary to make it an operational reality. If it requires capital investment, though, I suspect it'll be a while.

The real lesson from the last fifteen or so years, though, is that bear skins and stone knives clearly have a long operational lifetime. Stephen
On Sat, Apr 18, 2009 at 05:30:41AM +0000, Stephen Stuart wrote:
Not sure how switches handle HOL blocking with QinQ traffic across trunks, but hey... what's the fun of running an IXP without testing some limits?
Indeed. Those with longer memories will remember that I used to regularly apologize at NANOG meetings for the DEC Gigaswitch/FDDI head-of-line blocking that all Gigaswitch-based IXPs experienced when some critical mass of OC3 backbone circuits was reached and the 100 MB/s fabric rolled over and died, offered here (again) as a cautionary tale for those who want to test those particular limits (again).
Ohhh... Scary Stories! :)
The real lesson from the last fifteen or so years, though, is that bear skins and stone knives clearly have a long operational lifetime.
well... while there is a certain childlike obsession with the byzantine, rube-goldberg, lots of bells, knobs, whistles type machines... for solid, predictable performance, simple clean machines work best.
Stephen
--bill
Date: Sat, 18 Apr 2009 10:09:00 +0000 From: bmanning@vacation.karoshi.com
... well... while there is a certain childlike obsession with the byzantine, rube-goldberg, lots of bells, knobs, whistles type machines... for solid, predictable performance, simple clean machines work best.
like you i long for the days when a DELNI could do this job. nobody makes hubs anymore though. but the above text juxtaposes poorly against the below text:
Date: Sat, 18 Apr 2009 16:35:51 +0100 From: Nick Hilliard <nick@foobar.org>
... These days, we have switches which do multicast and broadcast storm control, unicast flood control, mac address counting, l2 and l3 acls, dynamic arp inspection, and they can all be configured to ignore bpdus in a variety of imaginative ways. We have arp sponges and broadcast monitors. ...
in terms of solid and predictable i would take per-peering VLANs with IP addresses assigned by the peers themselves, over switches that do unicast flood control or which are configured to ignore bpdu's in imaginative ways. but either way it's not a DELNI any more. what i see is inevitable complexity and various different ways of layering that complexity in. the choice of per-peering VLANs represents a minimal response to the problems of shared IXP fabrics, with maximal impedance matching to the PNI's that inevitably follow successful shared-port peerings.
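For readers who want to see what the per-peering proposal amounts to in practice, here is a minimal Python sketch (not any IXP's real provisioning tool; the address pools and VLAN numbering are made-up documentation values) that carves a tagged VLAN, a v4 /30 and a v6 /64 per peering:

# Minimal sketch: carve a tagged VLAN, a v4 /30 and a v6 /64 out of
# operator-controlled pools for each peering. Pool prefixes (documentation
# space) and the VLAN numbering scheme are assumptions.
import ipaddress
from itertools import count

V4_POOL = ipaddress.ip_network("192.0.2.0/24")    # stand-in v4 pool
V6_POOL = ipaddress.ip_network("2001:db8::/48")   # stand-in v6 pool

v4_subnets = V4_POOL.subnets(new_prefix=30)       # generator of /30s
v6_subnets = V6_POOL.subnets(new_prefix=64)       # generator of /64s
vlan_ids = count(100)                             # arbitrary starting tag

def provision_peering(peer_a, peer_b):
    """Allocate one tagged VLAN plus v4/v6 interface addresses for a peering."""
    v4 = next(v4_subnets)
    v6 = next(v6_subnets)
    a4, b4 = list(v4.hosts())                     # the two usable v4 addresses
    v6_hosts = v6.hosts()
    a6, b6 = next(v6_hosts), next(v6_hosts)
    return {
        "vlan": next(vlan_ids),
        "v4": {peer_a: f"{a4}/30", peer_b: f"{b4}/30"},
        "v6": {peer_a: f"{a6}/64", peer_b: f"{b6}/64"},
    }

print(provision_peering("AS64496", "AS64497"))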
On Sat, Apr 18, 2009 at 04:01:41PM +0000, Paul Vixie wrote:
Date: Sat, 18 Apr 2009 10:09:00 +0000 From: bmanning@vacation.karoshi.com
... well... while there is a certain childlike obsession with the byzantine, rube-goldberg, lots of bells, knobs, whistles type machines... for solid, predictable performance, simple clean machines work best.
like you i long for the days when a DELNI could do this job. nobody makes hubs anymore though. but the above text juxtaposes poorly against the below text:
i never said i longed for DELNI's (although there is a naive beauty in such things). i make the claim that simple, clean design and execution is best. even the security goofs will agree.
but either way it's not a DELNI any more. what i see is inevitable complexity and various different ways of layering that complexity in. the choice of per-peering VLANs represents a minimal response to the problems of shared IXP fabrics, with maximal impedance matching to the PNI's that inevitably follow successful shared-port peerings.
complexity invites failure - failure in unusual and unexpected ways. small & simple systems are more nimble, faster and more resilient. complex is usually big, slow, fraught w/ little-used code paths, a veritable nesting ground for viruses, worms, half-baked truths, and poorly tested assumptions. one very good reason folks move to PNI's is that they are simpler to do. More cost-effective -AT THAT performance point-. I worry (to the extent that I worry about such things at all these days) that the code that drives the Internet these days is bloated, slow, and generally trying to become the "swiss-army-knife" application of critical infrastructure joy. witness BGP. more knobs/whistles than you can shake a stick at. the distinct lack of restraint by code developers in their desire to add every possible feature is arguably the primary reason the Internet is so riddled with security vulnerabilities. I'll get off my soap-box now and let you resume your observations that complexity as a goal in and of itself is the only path forward. What a dismal world-view. --bill
On Sat, 18 Apr 2009 16:58:24 +0000 bmanning@vacation.karoshi.com wrote:
i make the claim that simple, clean design and execution is best. even the security goofs will agree.
"Even"? *Especially* -- or they're not competent at doing security. But I hadn't even thought about DELNIs in years. --Steve Bellovin, http://www.cs.columbia.edu/~smb
Date: Sat, 18 Apr 2009 13:17:11 -0400 From: "Steven M. Bellovin" <smb@cs.columbia.edu>
On Sat, 18 Apr 2009 16:58:24 +0000 bmanning@vacation.karoshi.com wrote:
i make the claim that simple, clean design and execution is best. even the security goofs will agree.
"Even"? *Especially* -- or they're not competent at doing security.
wouldn't a security person also know about http://en.wikipedia.org/wiki/ARP_spoofing and know that many colo facilities now use one customer per vlan due to this concern? (i remember florian weimer being surprised that we didn't have such a policy on the ISC guest network.) if we maximize for simplicity we get a DELNI. oops that's not fast enough we need a switch not a hub and it has to go 10Gbit/sec/port. looks like we traded away some simplicity in order to reach our goals.
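As an aside, the kind of IP-to-MAC binding monitoring that motivates the one-customer-per-vlan policy can be sketched in a few lines of Python; this is an illustration of the idea only, with hand-made observations rather than real ARP capture, and documentation addresses standing in for a real peering LAN:

# Illustration only: remember which MAC last claimed each IP on the LAN and
# flag changes. A real monitor would feed this from captured ARP replies.
def watch_arp(observations):
    bindings = {}     # ip -> last MAC seen claiming it
    alerts = []
    for ip, mac in observations:
        previous = bindings.get(ip)
        if previous is not None and previous != mac.lower():
            alerts.append((ip, previous, mac.lower()))   # possible spoof or re-homing
        bindings[ip] = mac.lower()
    return alerts

sample = [
    ("192.0.2.10", "00:11:22:33:44:55"),
    ("192.0.2.11", "00:11:22:33:44:66"),
    ("192.0.2.10", "de:ad:be:ef:00:01"),   # same IP, different MAC -> alert
]
print(watch_arp(sample))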
On Sat, Apr 18, 2009 at 09:12:24PM +0000, Paul Vixie wrote:
Date: Sat, 18 Apr 2009 13:17:11 -0400 From: "Steven M. Bellovin" <smb@cs.columbia.edu>
On Sat, 18 Apr 2009 16:58:24 +0000 bmanning@vacation.karoshi.com wrote:
i make the claim that simple, clean design and execution is best. even the security goofs will agree.
"Even"? *Especially* -- or they're not competent at doing security.
wouldn't a security person also know about
http://en.wikipedia.org/wiki/ARP_spoofing
and know that many colo facilities now use one customer per vlan due to this concern? (i remember florian weimer being surprised that we didn't have such a policy on the ISC guest network.)
if we maximize for simplicity we get a DELNI. oops that's not fast enough we need a switch not a hub and it has to go 10Gbit/sec/port. looks like we traded away some simplicity in order to reach our goals.
er... 10G is old hat... try 100G. i'm not arguing for a return to smoke signals. i'm arguing that simplicity is oftentimes gratuitously abandoned in favor of the near-term, quick buck. if i may paraphrase Albert, "Things should be as simple as possible, but no simpler". and ARP... well, there's a dirt-simple hack that the ethernet-based folks have never been able to shake. :) --bill
Paul Vixie wrote:
if we maximize for simplicity we get a DELNI. oops that's not fast enough we need a switch not a hub and it has to go 10Gbit/sec/port. looks like we traded away some simplicity in order to reach our goals.
Agreed. Security + Efficiency = base complexity. 1Q has great benefits in security while maintaining a reasonable base complexity compared to "1 mac per port/MAC acl + broadcast storm control + <insert common L2/3 security/performance tweaks commonly used in a flat multi-point topology>". Things grow more complex as you reach up into MPLS. I'll show my ignorance and ask if it's possible to handle multicast on a separate shared tag and maintain security and simplicity while handling unicast on p2p tags? Standard methods of multicast on the Internet are foreign to me, and tend to act differently than the multicast feeds commonly used for video over IP in local segments (from what little I have read). Primarily, I believe there was a reliance on unicast routing by multicast, which separate L2 paths might break. Jack
Thanks for talking about your PNIs. Let's see:
- Permit Next Increase
- Private Network Interface
- Private Network Interconnection
- Primary Network Interface
and it goes on and on . . .
On Sat, 18 Apr 2009 21:12:24 +0000 Paul Vixie <vixie@isc.org> wrote:
Date: Sat, 18 Apr 2009 13:17:11 -0400 From: "Steven M. Bellovin" <smb@cs.columbia.edu>
On Sat, 18 Apr 2009 16:58:24 +0000 bmanning@vacation.karoshi.com wrote:
i make the claim that simple, clean design and execution is best. even the security goofs will agree.
"Even"? *Especially* -- or they're not competent at doing security.
wouldn't a security person also know about
I'm taking no position on the underlying argument; I'm simply stating that simplicity is an essential element for security. I like a philosophy I've seen attributed to Einstein: "everything should be as simple as possible, and no simpler". And yes, I know about ARP spoofing... --Steve Bellovin, http://www.cs.columbia.edu/~smb
On Sat, Apr 18, 2009 at 5:11 PM, Steven M. Bellovin <smb@cs.columbia.edu> wrote:
I'm taking no position on the underlying argument; I'm simply stating that simplicity is an essential element for security. I like a philosophy I've seen attributed to Einstein: "everything should be as simple as possible, and no simpler".
Agreed -- and that reminds me of the Dr. Who maxim: "The more sophisticated the technology, the more vulnerable it is to primitive attack. People often overlook the obvious." Also, Voltaire: "Common sense is not so common." - - ferg -- "Fergie", a.k.a. Paul Ferguson Engineering Architecture for the Internet fergdawgster(at)gmail.com ferg's tech blog: http://fergdawg.blogspot.com/
On Apr 19, 2009, at 5:12 AM, Paul Vixie wrote:
many colo facilities now use one customer per vlan due to this concern?
Haven't most major vendors for years offered features in their switches which mitigate ARP-spoofing, provide per-port layer-2 isolation on a sub-VLAN basis, as well as implementing layer-3 anti-spoofing on a per-switchport basis (i.e., BCP38 on a per-switchport basis)? ----------------------------------------------------------------------- Roland Dobbins <rdobbins@cisco.com> // +852.9133.2844 mobile Our dreams are still big; it's just the future that got small. -- Jason Scott
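A rough sketch of the per-switchport source-address check described above, written as plain lookup logic rather than any vendor's ACL syntax; the port-to-prefix table is invented:

# Plain-Python sketch of a per-port source-prefix check (conceptually,
# BCP38 at the switchport). The port-to-prefix table is invented.
import ipaddress

ALLOWED = {
    "port1": [ipaddress.ip_network("192.0.2.0/28")],
    "port2": [ipaddress.ip_network("198.51.100.0/27"),
              ipaddress.ip_network("2001:db8:1::/64")],
}

def permit(port, src):
    """True only if the source falls inside the port's registered prefixes."""
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in ALLOWED.get(port, [])
               if addr.version == net.version)

print(permit("port1", "192.0.2.5"))   # True  - inside the registered prefix
print(permit("port1", "10.0.0.1"))    # False - unregistered (RFC1918) source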
On Sat, 18 Apr 2009, Paul Vixie wrote:
"Even"? *Especially* -- or they're not competent at doing security.
wouldn't a security person also know about
http://en.wikipedia.org/wiki/ARP_spoofing
and know that many colo facilities now use one customer per vlan due to this concern? (i remember florian weimer being surprised that we didn't have such a policy on the ISC guest network.)
I tend to believe there is almost always more than one way to solve any problem, and if you can't think of more than one way you probably don't understand the problem fully. IXPs are a subset of the colo problem, so there may be some issues for the colo case that IXPs can handle differently than general-purpose colos. Why use "complex" DELNIs when you could just have passive coax and a real RF broadcast medium for your IXP? If all the IXP participants always did the right thing, you wouldn't need the IXP operator to do anything. The problem is that sometimes an IXP participant does the wrong thing, and the other IXP participants want the IXP operator to do something about it, which is probably why most IXP operators use stuff more complex than a passive coax. Other than Nick's list, are there any other things someone interested in checking IXP critical infrastructure might add to the checklist?
I'll get off my soap-box now and let you resume your observations that complexity as a goal in and of itself is the only path forward. What a dismal world-view.
No-one is arguing that complexity is a goal. Opportunities to introduce gratuitous complexity abound, and defending against them while recognizing the opportunity that represents genuine progress (trading outhouses for indoor plumbing, for example) is quite a challenge.

I'm all for using the cleanest, simplest, and most reliable means to meet requirements. Not all IXPs have the same requirements driving their business, though - an IXP that operates a distributed metro-area fabric has concerns for reliability and cost-efficient use of resources beyond those of an IXP that operates a single switch. If requirements were such that I needed to buy and *use* a partial mesh topology for a distributed IXP fabric in the most reliable fashion possible, I'd much rather go the route described earlier than try to cobble something together with PVST/MST L2 technologies, but that's just me.

You can assert that the status quo gives you solid predictable performance, but the reality is that you occasionally get sucked into a vortex of operational issues arising from L2's failure modes. To continue with my bad plumbing analogy, open sewers were a reliable means of moving waste material, easy to see when they were failing, but occasionally produced outbreaks of disease. Are open sewers still in use in the world today? You bet.

The underlying hardware layer that IXPs use is capable of more than IXPs use. Whether to turn on those features is driven by requirements, from customers and from the economics of the business. I would argue, though, that at today's level of robustness and penetration of the technologies that we've been discussing, the customer "requirement" to peer on a shared VLAN is much more about complacency than avoiding risk (as you seem to be arguing).

When we were turning PAIX from a private interconnect location into a public IXP, we questioned every assumption about what role IXPs played in order to ensure that we weren't making decisions simply to preserve the status quo. One of the things we questioned was whether to offer a peering fabric at all, or simply rely on PNIs. Obviously we opted to have a peering fabric, and I don't regret the decision despite the many long nights dealing with migration from FDDI to Ethernet (and the fun of translational bridge MTU-related issues during the migration), and the failure modes of Ethernet L2 - so your assertion that Ethernet L2 provides solid predictable performance needs to be modified with "mostly".

I'll counter with an assertion that some L2.5/L3 networks are built and operated to more 9s than some IXP L2 networks that span multiple chassis. Whether that additional reliability makes business sense to offer, though, is a different question. If lack of complexity was a *requirement* that trumped all others, there would still be a DELNI at PAIX.
Stephen, that's a straw-man argument. Nobody's arguing against VLANs. Paul's argument was that VLANs rendered shared subnets obsolete, and everybody else has been rebutting that. Not saying that VLANs shouldn't be used. Sent via BlackBerry by AT&T
"Bill Woodcock" <woody@pch.net> writes:
... Nobody's arguing against VLANs. Paul's argument was that VLANs rendered shared subnets obsolete, and everybody else has been rebutting that. Not saying that VLANs shouldn't be used.
i think i saw several folks, not just stephen, say virtual wire was how they'd do an IXP today if they had to start from scratch. i know that for many here, starting from scratch isn't a reachable worldview, and so i've tagged most of the defenses of shared subnets with that caveat. the question i was answering was from someone starting from scratch, and when starting an IXP from scratch, a shared subnet would be just crazy talk. -- Paul Vixie
In a message written on Fri, Apr 24, 2009 at 01:48:28AM +0000, Paul Vixie wrote:
i think i saw several folks, not just stephen, say virtual wire was how they'd do an IXP today if they had to start from scratch. i know that for many here, starting from scratch isn't a reachable worldview, and so i've tagged most of the defenses of shared subnets with that caveat. the question i was answering was from someone starting from scratch, and when starting an IXP from scratch, a shared subnet would be just crazy talk.
I disagree. Having no shared subnet renders an exchange switching platform useless to me. If I have to go to all the work of configuring both ends in an exchange point operator provisioning system (and undoubtedly being billed for it), assigning a /30, and configuring an interface on my router then I will follow that procedure and order a hunk of fiber. Fewer points of failure, don't have to deal with how the exchange operator runs their switch, and I get the bonus of no shared port issues. The value of an exchange switch is the shared vlan. I could see an argument that switching is no longer necessary; but I can see no rational argument to both go through all the hassles of per-peer setup and get all the drawbacks of a shared switch. Even exchanges that took the small step of IPv4 and IPv6 on separate VLAN's have diminished value to me, it makes no sense. It's the technological equivalent of bringing everyone into a conference room and then having them use their cell phones to call each other and talk across the table. Why are you all in the same room if you don't want a shared medium? -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
On Thu, Apr 23, 2009, Leo Bicknell wrote:
It's the technological equivalent of bringing everyone into a conference room and then having them use their cell phones to call each other and talk across the table. Why are you all in the same room if you don't want a shared medium?
Because you don't want to listen to what others have to say to you. Adrian (The above statement has network operational relevance at an IP level.)
Leo Bicknell wrote:
The value of an exchange switch is the shared vlan. I could see an argument that switching is no longer necessary; but I can see no rational argument to both go through all the hassles of per-peer setup and get all the drawbacks of a shared switch. Even exchanges that took the small step of IPv4 and IPv6 on separate VLAN's have diminished value to me, it makes no sense.
Cost. Shared port/ports versus port per peer, no physical cross connects to be made for each new peer. For a medium sized network, an IXP can provide cheap connectivity to many peers, saving on transit costs. I'll admit, my knowledge is limited given I exist in the non-existent Oklahoma infrastructure, but I count the days (years?) until I can afford to light a 10Gb ring down to Dallas and hopefully minimize the number of ports and size of hardware I need down there to interconnect my ring (and thus me) to everyone else. Hopefully with as few physical interconnects as possible, as my Juniper ports are expensive for my size. I'll never be transit free, but perhaps I can get peering through an IXP and save some transit costs. Jack
Leo Bicknell wrote:
In a message written on Fri, Apr 24, 2009 at 01:48:28AM +0000, Paul Vixie wrote:
i think i saw several folks, not just stephen, say virtual wire was how they'd do an IXP today if they had to start from scratch. i know that for many here, starting from scratch isn't a reachable worldview, and so i've tagged most of the defenses of shared subnets with that caveat. the question i was answering was from someone starting from scratch, and when starting an IXP from scratch, a shared subnet would be just crazy talk.
I disagree.
Having no shared subnet renders an exchange switching platform useless to me. If I have to go to all the work of configuring both ends in an exchange point operator provisioning system (and undoubtedly being billed for it), assigning a /30, and configuring an interface on my router then I will follow that procedure and order a hunk of fiber. Fewer points of failure, don't have to deal with how the exchange operator runs their switch, and I get the bonus of no shared port issues.
The value of an exchange switch is the shared vlan. I could see an argument that switching is no longer necessary; but I can see no rational argument to both go through all the hassles of per-peer setup and get all the drawbacks of a shared switch. Even exchanges that took the small step of IPv4 and IPv6 on separate VLAN's have diminished value to me, it makes no sense.
It's the technological equivalent of bringing everyone into a conference room and then having them use their cell phones to call each other and talk across the table. Why are you all in the same room if you don't want a shared medium?
I second that. We got to go through all the badness that was the ATM NAPs (AADS, PacBell NAP, MAE-WEST ATM). I think exactly for the reason Leo mentions they failed. That is, it didn't even require people to figure out all the technical reasons they were bad (many), they were fundamentally doomed due to increasing the difficulty of peering, which translated to an economic scaling problem. i.e. if you make it hard for people to peer then you end up with fewer peers, and shared vlan exchanges based on things like ethernet outcompete you. Been there done that. We've already experienced the result of secure ID cards and the PeerMaker tool. It was like pulling teeth to get sessions set up, and most peers plus the exchange operator didn't believe in oversubscription (can you say CBR? I knew you could), so you end up with 2 year old bandwidth allocations cast in stone because it was such a pain to get the peer to set it up in the first place, and to increase bandwidth to you means your peer has to reduce the bandwidth they allocated to somebody else. Mike. -- +---------------- H U R R I C A N E - E L E C T R I C ----------------+ | Mike Leber Wholesale IPv4 and IPv6 Transit 510 580 4100 | | Hurricane Electric AS6939 | | mleber@he.net Internet Backbone & Colocation http://he.net | +---------------------------------------------------------------------+
We got to go through all the badness that was the ATM NAPs (AADS, PacBell NAP, MAE-WEST ATM).
I think exactly for the reason Leo mentions they failed. That is, it didn't even require people to figure out all the technical reasons they were bad (many), they were fundamentally doomed due to increasing the difficulty of peering which translated to an economic scaling problem.
i.e. if you make it hard for people to peer then you end up with fewer peers, and shared vlan exchanges based on things like ethernet outcompete you.
Been there done that.
We've already experienced the result of secure ID cards and the PeerMaker tool. It was like pulling teeth to get sessions set up, and most peers plus the exchange operator didn't believe in oversubscription (can you say CBR? I knew you could), so you end up with 2 year old bandwidth allocations cast in stone because it was such a pain to get the peer to set it up in the first place, and to increase bandwidth to you means your peer has to reduce the bandwidth they allocated to somebody else.
I, too, had a SecureID card, whose PIN I promptly forgot. I actually feel sorry for the poor software developers of that system; who knows what "requirements" were imposed on them by management fiat versus researched from the customer (and potential customer) base? Ethernet != shared VLAN, as I'm sure you know, so equating the two is a non sequitur. Ethernet has grown enough features that it can be used effectively in a variety of ways - and knowing which features to avoid is just as important as knowing which features to expose. "Not every knob that can be turned, should be turned." The challenge to a developer of the software infrastructure of a modern IXP is to take what we learned about the ease of use of shared VLAN peering and translate it into the world of pseudo-wire interconnect. Does it have to be as hard as PeerMaker? Clearly not. If someone is going to jump into that space, though, there's a lot of homework to do to research what a provisioning system would need to do to present as little a barrier to peering as possible. Your argument, and Leo's, is fundamentally the complacency argument that I pointed out earlier. You're content with how things are, despite the failure modes, and despite inefficiencies that the IXP operator is forced to have in *their* business model because of your complacency.
In a message written on Fri, Apr 24, 2009 at 05:06:15PM +0000, Stephen Stuart wrote:
Your argument, and Leo's, is fundamentally the complacency argument that I pointed out earlier. You're content with how things are, despite the failure modes, and despite inefficiencies that the IXP operator is forced to have in *their* business model because of your complacency.
I do not think that is my argument. I have looked at the failure modes and the cost of fixing them and decided that it is cheaper and easier to deal with the failure modes than it is to deal with the fix.

Quite frankly, I think the failure modes have been grossly overblown. The number of incidents of shared network badness that have caused problems are actually few and far between. I can't attribute any down-time to shared-network badness at exchanges (note, colos are a different story) in a good 5-7 years. On the contrary, I can attribute downtime already to paranoia about it. When I had an ethernet interface fail at a colo provider who shall remain nameless I was forced to call the NOC, have them put the port in a "quarantine" vlan, watch it with tcpdump for an hour, and then return it to service. Total additional downtime after the bad interface was replaced: 2 hours. I have no idea how watching an interface in a vlan with tcpdump supposedly protects a shared network.

Remember the 7513's, where adding or removing a dot1q subinterface might bounce the entire trunk? I know of several providers to this day that won't add/remove subinterfaces during the day, but turning up BGP sessions on shared lans can be done all day long.

The scheme proposed with private vlans to every provider adds a significant amount of engineering time, documentation, and general effort to public peering. Public peering barely makes economic sense when its cost is as close to free as we can get it; virtually any increase makes it useless. We've already seen many major networks drop public peering altogether because the internal time and effort to deal with small peers is not worth the benefit.

Important volumes of traffic will be carried outside of a shared switch. The colo provider cannot provision a switching platform at a cost-effective rate to handle all cross connects. So in the world of PNIs, the public switch and shared segment already select for small players. You may want to peer with them because you think it's fair and good, you may do it to qualify up-and-comers for PNIs, but you're not /public peering/ for profit in 99% of the cases.

All this is not to say private VLANs aren't a service that could be offered. There may be a niche for particular size networks with particular sized flows to use them for good purposes. Colo providers should look at providing the service. A replacement for a shared, multi-access peering LAN? No. No. No. -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
On 24/04/2009 18:46, Leo Bicknell wrote:
I have looked at the failure modes and the cost of fixing them and decided that it is cheaper and easier to deal with the failure modes than it is to deal with the fix.
Leo, your position is: "worse is better". I happen to agree with this sentiment for a variety of reasons. Stephen Stuart disagrees - for a number of other carefully considered and well-thought-out reasons. Richard Gabriel's essay on "worse is better" as it applied to Lisp is worth reading in this context. The ideas he presents are relevant well beyond the article's intended scope and are applicable to the shared l2 domain vs PI interconnection argument (within reasonable bounds). Nick
On Fri, Apr 24, 2009 at 12:46 PM, Leo Bicknell <bicknell@ufp.org> wrote:
Quite frankly, I think the failure modes have been grossly overblown. The number of incidents of shared network badness that have caused problems are actually few and far between. I can't attribute any down-time to shared-network badness at exchanges (note, colos are a different story) in a good 5-7 years.
Wait, aren't you on NYIIX and Any2? Those two alone are good for 5-7 times a year like clockwork. Please allow me to send you a complimentary copy of "The Twelve Days of NYIIX" for your caroling collection this December:

On the first day of Christmas, NYIIX gave to me, A BPDU from someone's spanning tree.

On the second day of Christmas, NYIIX gave to me, Two forwarding loops, And a BPDU from someone's spanning tree.

On the third day of Christmas, NYIIX gave to me, Three routing leaks, Two forwarding loops, And a BPDU from someone's spanning tree.

On the fourth day of Christmas, NYIIX gave to me, Four Foundry crashes, Three routing leaks, Two forwarding loops, And a BPDU from someone's spanning tree.

On the fifth day of Christmas, NYIIX gave to me, Five flapping sessions, Four Foundry crashes, Three routing leaks, Two forwarding loops, And a BPDU from someone's spanning tree.

On the sixth day of Christmas, NYIIX gave to me, Six maintenance notices, Five flapping sessions, Four Foundry crashes, Three routing leaks, Two forwarding loops, And a BPDU from someone's spanning tree.

On the seventh day of Christmas, NYIIX gave to me, Seven broadcast floods, Six maintenance notices, Five flapping sessions, Four Foundry crashes, Three routing leaks, Two forwarding loops, And a BPDU from someone's spanning tree.

On the eighth day of Christmas, NYIIX gave to me, Eight defaulting peers, Seven broadcast floods, Six maintenance notices, Five flapping sessions, Four Foundry crashes, Three routing leaks, Two forwarding loops, And a BPDU from someone's spanning tree.

On the ninth day of Christmas, NYIIX gave to me, Nine CDP neighbors, Eight defaulting peers, Seven broadcast floods, Six maintenance notices, Five flapping sessions, Four Foundry crashes, Three routing leaks, Two forwarding loops, And a BPDU from someone's spanning tree.

On the tenth day of Christmas, NYIIX gave to me, Ten proxy ARPs, Nine CDP neighbors, Eight defaulting peers, Seven broadcast floods, Six maintenance notices, Five flapping sessions, Four Foundry crashes, Three routing leaks, Two forwarding loops, And a BPDU from someone's spanning tree.

On the eleventh day of Christmas, NYIIX gave to me, Eleven OSPF hellos, Ten proxy ARPs, Nine CDP neighbors, Eight defaulting peers, Seven broadcast floods, Six maintenance notices, Five flapping sessions, Four Foundry crashes, Three routing leaks, Two forwarding loops, And a BPDU from someone's spanning tree.

On the twelfth day of Christmas, NYIIX gave to me, Twelve peers in half-duplex, Eleven OSPF hellos, Ten proxy ARPs, Nine CDP neighbors, Eight defaulting peers, Seven broadcast floods, Six maintenance notices, Five flapping sessions, Four Foundry crashes, Three routing leaks, Two forwarding loops, And a BPDU from someone's spanning tree.
In a message written on Fri, Apr 24, 2009 at 04:22:49PM -0500, Paul Wall wrote:
On the twelfth day of Christmas, NYIIX gave to me, Twelve peers in half-duplex, Eleven OSPF hellos, Ten proxy ARPs, Nine CDP neighbors, Eight defaulting peers, Seven broadcast floods, Six maintenance notices, Five flapping sessions, Four Foundry crashes, Three routing leaks, Two forwarding loops, And a BPDU from someone's spanning tree.
Let's group:

Problems that can/will occur with per-vlan peering: Twelve peers in half-duplex, Six maintenance notices, Five flapping sessions, Four Foundry crashes, Three routing leaks, Two forwarding loops.

Problems that, if they affect your equipment, mean you're configuring it wrong, and can/will occur with per-vlan peering: Eleven OSPF hellos, Nine CDP neighbors.

Problems that, if they affect the exchange, mean the exchange is configuring their equipment wrong, and can/will occur with per-vlan peering: Two forwarding loops, And a BPDU from someone's spanning tree.

Problems unique to a shared layer 2 network: Eight defaulting peers, Seven broadcast floods.

Leaving aside the particular exchanges, I'm going to guess from the tone of your message that you are not impressed by the technical talent operating the exchange switches. Do you believe making the configuration for the exchange operation 100 times more complex will: A) lead to more mistakes and downtime, B) lead to fewer mistakes and downtime, or C) have no effect? I'm going with A. I also think the downtime from A will be an order of magnitude more than the result of defaulting peers (which generally results in no downtime, just theft of service) or broadcast floods. -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
On 24.04.2009 03:48 Paul Vixie wrote
"Bill Woodcock" <woody@pch.net> writes:
... Nobody's arguing against VLANs. Paul's argument was that VLANs rendered shared subnets obsolete, and everybody else has been rebutting that. Not saying that VLANs shouldn't be used.
i think i saw several folks, not just stephen, say virtual wire was how they'd do an IXP today if they had to start from scratch. i know that for many here, starting from scratch isn't a reachable worldview, and so i've tagged most of the defenses of shared subnets with that caveat. the question i was answering was from someone starting from scratch, and when starting an IXP from scratch, a shared subnet would be just crazy talk.
I'd like to disagree here, Paul. Best regards, Arnold -- Arnold Nipper / nIPper consulting, Sandhausen, Germany email: arnold@nipper.de phone: +49 6224 9259 299 mobile: +49 172 2650958 fax: +49 6224 9259 333
Paul Vixie wrote:
in terms of solid and predictable i would take per-peering VLANs with IP addresses assigned by the peers themselves, over switches that do unicast flood control or which are configured to ignore bpdu's in imaginative ways.
Simplicity only applies when it doesn't hinder security (the baseline complexity). PE/BRAS systems suffer from a subset of IXP issues plus a few of their own. It amazes me how much "security" has been pushed from the PE out into switches and dslams. Enough so that I've found many vendors that break IPv6 because of their "security" features. 1Q tagging is about the simplest model I have seen for providing the necessary isolation, mimicking PNI. For PE, it has allowed complete L3 ignorance in the L2 devices while enforcing security policies at the aggregation points. For an IXP it provides the necessary isolation and security without having an expectation of the type of L3 traffic crossing through the IXP. It's true that 1Q tagging requires a configuration component, but I'd hesitate to call it complex. 10,000-line router configs may be long, but often because of repetition forced by configuration limitations rather than because of complexity. HE's IPv6 tunnel servers are moderately more complex and have handled provisioning well in my experience. Multicast was brought up as an issue, but it's not less efficient than if PNIs had been used, and a structure could be designed to meet the needs of multicast when needed. Jack
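To illustrate the point that such configs are long through repetition rather than complexity, here is a small Python sketch that renders per-peering 802.1Q subinterface stanzas from a table; the CLI syntax shown is generic pseudo-config for illustration, not any particular vendor's, and the values are invented:

# Render repetitive per-peering 802.1Q subinterface stanzas from a table.
PEERINGS = [
    {"vlan": 101, "peer": "AS64496", "v4": "192.0.2.1/30", "v6": "2001:db8:0:101::1/64"},
    {"vlan": 102, "peer": "AS64497", "v4": "192.0.2.5/30", "v6": "2001:db8:0:102::1/64"},
]

TEMPLATE = """\
interface TenGigE0/0/0.{vlan}
 description peering with {peer}
 encapsulation dot1q {vlan}
 ipv4 address {v4}
 ipv6 address {v6}
"""

def render(peerings):
    return "\n".join(TEMPLATE.format(**p) for p in peerings)

print(render(PEERINGS))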
Sorry, hit "send" a little early, by accident. On Apr 17, 2009, at 11:52 AM, Paul Vixie wrote:
with the advent of vlan tags, the whole idea of CSMA for IXP networks is passe. just put each pair of peers into their own private tagged vlan.
I'm not sure whether you're being sarcastic, and if I'm not sure, I bet people who don't know you really aren't sure. So: the only nominal IXP I know of where that's really been experimented with seriously is MYIX, in Kuala Lumpur, where it's been a notable failure. The other 300-and-some IXPs do things normally, with an IX subnet that people can peer across. So, the advent of standardized .1Q tags in 1998, preceded by ISL for many years before that, has not yet rendered the 99.6% majority best-practice passe. Just a clarification. -Bill
Elmar K. Bins wrote:
I am not an IXP operator, but I know of no exchange (public or private, big or closet-style) that uses private ASNs or RFC1918 space.
I know of at least two IXPs where RFC 1918 space is used on the IXP subnet. I know a fair number of IXPs where providers use private ASNs even for longish durations. I also know of a lot of IXPs where IPv4 prefixes longer than /24 are visible. But, as others have said, in most cases these measures are temporary in nature and eventually everyone will migrate. thanks -gaurab
Hello NANOG,
I like would to know what are best practices for an internet exchange. I have some concerns about the following; Can the IXP members use RFC 1918 ip addresses for their peering? Can the IXP members use private autonomous numbers for their peering?
Maybe the answer is obviuos, but I like to know from any IXP admins what their setup/experiences have been.
If you read RFC1918, the intention is to use that space within an enterprise. There is some wisdom there. It is unclear why you would want to do this, as the ARIN/etc allocation rules for an IXP are generally trivial. ... JG -- Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net "We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN) With 24 million small businesses in the US alone, that's way too many apples.
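For what it's worth, checking candidate peering addresses against RFC1918 space is trivial with Python's ipaddress module (is_private also covers a few other special-use ranges); the addresses below are examples only:

# Flag RFC1918/special-use addresses; example inputs, not real peering LANs.
import ipaddress

for candidate in ["10.1.2.3", "172.16.0.1", "192.168.0.1", "8.8.8.8"]:
    addr = ipaddress.ip_address(candidate)
    print(candidate, "private" if addr.is_private else "global")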
Theoretically it's doable, but mostly no to your questions. IXP means Internet eXchange Point, so it is the public Internet. Why would you want to use private IP addresses? Most RIRs allocate a /24 for an IXP. For troubleshooting purposes, it is better to use public IP addresses as designed, unless you want to have MPLS/VPN-only connections and use private IP addresses/ASNs between them. Sharlon R. Carty wrote:
Hello NANOG,
I like would to know what are best practices for an internet exchange. I have some concerns about the following; Can the IXP members use RFC 1918 ip addresses for their peering? Can the IXP members use private autonomous numbers for their peering?
Maybe the answer is obviuos, but I like to know from any IXP admins what their setup/experiences have been.
On 17/04/2009 15:11, Sharlon R. Carty wrote:
I like would to know what are best practices for an internet exchange. I have some concerns about the following; Can the IXP members use RFC 1918 ip addresses for their peering? Can the IXP members use private autonomous numbers for their peering?
Maybe the answer is obviuos, but I like to know from any IXP admins what their setup/experiences have been.
If it's your exchange, you can do anything you want. I once saw a network which used 127.0.0.0/8 for connectivity. But I'd strongly suggest insisting from day 1:
- public IP addresses for ipv4 and ipv6
- requirement for all members to use BGP, their own ASN and their own address space
- no customer IGPs
- dropping customer bpdus on sight
- ruthless and utterly fascist enforcement of one mac address per port, using either L2 ACLs or else mac address counting, with no exceptions for any reason, ever. This is probably the single most important stability / security enforcement mechanism for any IXP.
You should also take a look at the technical requirements on some of the larger european IXP web sites (linx / ams-ix / decix / etc), to see what they allow and don't allow. It goes without saying that you're not going to be able to do this on your average low-end switch. Nick
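A minimal sketch of the "mac address counting" side of that last rule, as plain audit logic in Python; how the (port, MAC) observations are gathered (CAM table polls, sFlow, a mirror port) is left out and would vary by platform, and the port names and MACs below are invented:

# Track source MACs seen behind each member port; report ports over the limit.
from collections import defaultdict

def audit(observations, limit=1):
    seen = defaultdict(set)
    for port, mac in observations:
        seen[port].add(mac.lower())
    return {port: macs for port, macs in seen.items() if len(macs) > limit}

sample = [
    ("member-port-1", "00:11:22:33:44:55"),
    ("member-port-2", "00:11:22:33:44:66"),
    ("member-port-2", "DE:AD:BE:EF:00:02"),   # second MAC behind one port
]
print(audit(sample))   # -> {'member-port-2': {...both MACs...}}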
I have been looking at ams-ix and linx, even some african internet exchanges as examples. But seeing how large they are (ams-ix & linx) and we are in the startup phase, I would rather have some tips/examples from anyone who has been doing IXP for quite a while. So far all the responses have been very helpful. On Apr 18, 2009, at 1:28 PM, Nick Hilliard wrote:
On 17/04/2009 15:11, Sharlon R. Carty wrote:
I like would to know what are best practices for an internet exchange. I have some concerns about the following; Can the IXP members use RFC 1918 ip addresses for their peering? Can the IXP members use private autonomous numbers for their peering?
Maybe the answer is obviuos, but I like to know from any IXP admins what their setup/experiences have been.
If it's your exchange, you can do anything you want. I once saw a network which used 127.0.0.0/8 for connectivity. But I'd strongly suggest insisting from day 1:
- public IP addresses for ipv4 and ipv6
- requirement for all members to use BGP, their own ASN and their own address space
- no customer IGPs
- dropping customer bpdus on sight
- ruthless and utterly fascist enforcement of one mac address per port, using either L2 ACLs or else mac address counting, with no exceptions for any reason, ever. This is probably the single most important stability / security enforcement mechanism for any IXP.
You should also take a look at the technical requirements on some of the larger european IXP web sites (linx / ams-ix / decix / etc), to see what they allow and don't allow.
It goes without saying that you're not going to be able to do this on your average low-end switch.
Nick
On 18.04.2009 21:51 Sharlon R. Carty wrote
I have been looking at ams-ix and linx, even some african internet exchanges as examples. But seeing how large they are(ams-x & linx) and we are in the startup phase, I would rather have some tips/examples from anyone who has been doing IXP for quite awhile. So far all the responses have been very helpful.
Do what Nick suggested and you will run a real safe IXP. Nick knows how to do that. Arnold -- Arnold Nipper / nIPper consulting, Sandhausen, Germany email: arnold@nipper.de phone: +49 6224 9259 299 mobile: +49 172 2650958 fax: +49 6224 9259 333
- public IP addresses for ipv4 and ipv6
- requirement for all members to use BGP, their own ASN and their own address space
just to not confuse, that is behind the peering port. the peering port uses the exchange's ipv4/6 space
- no customer IGPs
- dropping customer bpdus on sight
- ruthless and utterly fascist enforcement of one mac address per port, using either L2 ACLs or else mac address counting, with no exceptions for any reason, ever. This is probably the single most important stability / security enforcement mechanism for any IXP.
You should also take a look at the technical requirements on some of the larger european IXP web sites (linx / ams-ix / decix / etc), to see what they allow and don't allow.
sharlon, reread nick's advice a few times, maybe pin it to your wall.
It goes without saying that you're not going to be able to do this on your average low-end switch.
just curious. has anyone tried arista for smallish exchanges, before jumping off the cliff into debugging extreme, foundry, ... randy
On 19.04.2009 01:08 Randy Bush wrote
just curious. has anyone tried arista for smallish exchanges, before jumping off the cliff into debugging extreme, foundry, ...
last time I looked at them their products lacked port security or anything similar. Iirc it's on the roadmap for their next generation of switches. Arnold -- Arnold Nipper / nIPper consulting, Sandhausen, Germany email: arnold@nipper.de phone: +49 6224 9259 299 mobile: +49 172 2650958 fax: +49 6224 9259 333
just curious. has anyone tried arista for smallish exchanges, before jumping off the cliff into debugging extreme, foundry, ... last time I looked at them their products lacked port security or anything similar.
whoops!
Iirc it's on the roadmap for their next generation of switches.
bummer, as performance and per-port cost are certainly tasty. randy
On 19.04.2009 01:38 Randy Bush wrote
just curious. has anyone tried arista for smallish exchanges, before jumping off the cliff into debugging extreme, foundry, ... last time I looked at them their products lacked port security or anything similar.
whoops!
Iirc it's on the roadmap for their next generation of switches.
bummer, as performance and per-port cost are certainly tasty.
Indeed ... Afaik the low latency is due to the fact that Arista boxes do cut-through switching. Pricewise they are very attractive. And Arista EOS actually is more or less a full-blown Linux, which allows you to do _really_ tricky things. Arnold -- Arnold Nipper / nIPper consulting, Sandhausen, Germany email: arnold@nipper.de phone: +49 6224 9259 299 mobile: +49 172 2650958 fax: +49 6224 9259 333
Iirc it's on the roadmap for thier next generation of switches. bummer, as performance and per-port cost are certainly tasty. Afaik low latency is due to the fact that Arista boxes are doing cut through.
no shock there
Pricewise they are very attractive. And Arista EOS actually is more or less a full blown Linux which allows you to do _really_ tricky things.
and they expose, not hide it. so you can do management in python, which i think is damned cool. shame it wasn't unix. i have street gossip that basic mac security may be testable quite soon, but more serious port security is some months off. at that point, it should be an interesting device to try in a small exchange. billo dropped one into the ietf network, it looked to be a bitchin' device for datacenter deployment, which i gather is the intended market. randy
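As an illustration of what on-box scripting on a Linux-based switch OS can look like, here is a stock-Linux Python example that reads interface byte counters from /proc/net/dev; whether a given switch OS exposes that file is an assumption here, the point being only that a Linux userland makes this kind of scripting easy:

# Read per-interface byte counters straight from /proc/net/dev (plain Linux).
def interface_counters(path="/proc/net/dev"):
    counters = {}
    with open(path) as fh:
        for line in fh.readlines()[2:]:          # first two lines are headers
            name, data = line.split(":", 1)
            fields = data.split()
            counters[name.strip()] = {"rx_bytes": int(fields[0]),
                                      "tx_bytes": int(fields[8])}
    return counters

for ifname, c in interface_counters().items():
    print(f"{ifname}: rx={c['rx_bytes']} tx={c['tx_bytes']}")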
On Sat, 18 Apr 2009, Nick Hilliard wrote:
- ruthless and utterly fascist enforcement of one mac address per port, using either L2 ACLs or else mac address counting, with no exceptions for any reason, ever. This is probably the single most important stability / security enforcement mechanism for any IXP.
Well, as long as it simply drops packets and doesn't shut the port or some other "fascist" enforcement. We've had AMSIX complain that our Cisco 12k with E5 linecard was spitting out a few tens of packets per day during two months with random source mac addresses. Started suddenly, stopped suddenly. It's ok for them to drop the packets, but not shut the port in a case like that. -- Mikael Abrahamsson email: swmike@swm.pp.se
On Sun, 19 Apr 2009, Mikael Abrahamsson wrote:
On Sat, 18 Apr 2009, Nick Hilliard wrote:
- ruthless and utterly fascist enforcement of one mac address per port, using either L2 ACLs or else mac address counting, with no exceptions for any reason, ever. This is probably the single most important stability / security enforcement mechanism for any IXP.
Well, as long as it simply drops packets and doesn't shut the port or some other "fascist" enforcement. We've had AMSIX complain that our Cisco 12k with E5 linecard was spitting out a few tens of packets per day during two months with random source mac addresses. Started suddenly, stopped suddenly. It's ok for them to drop the packets, but not shut the port in a case like that.
From the IX operator perspective it is important to immediately shut down a port showing a packet from an extra MAC address, rather than just silently dropping it. The "fascist" reason is that it is a quick and effective way of informing the participant that their recent maintenance has gone awry. At the SIX we have err-disable recovery set to 5 minutes so that the port will come back up automatically. (Sometimes only to be shut down again two packets later, and usually before any BGP sessions have returned.)
If the port is left up with the rogue packets simply being dropped, and the exchange sends the participant a followup email informing them of the problem, the participant's maintenance window may already have passed and so problem resolution tends to get extended. In cases that are temporarily unfixable, such as a router bug, we have been known to change the port config so that the rogue packets are just dropped/logged rather than answered with a shutdown, but that is rare. Chris SIX Janitor
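The shut-then-auto-recover behaviour described here can be sketched as a small state machine in Python; the 5-minute timer matches the SIX figure above, while the port names, the allowed-MAC table, and the return strings standing in for real shutdown/enable actions are all invented:

# Shutdown-plus-timed-recovery sketch; "now" is a timestamp in seconds.
RECOVERY_SECONDS = 5 * 60
ALLOWED_MAC = {"member-port-1": "00:11:22:33:44:55"}
disabled_until = {}                     # port -> time when it may re-enable

def handle_frame(port, src_mac, now):
    if port in disabled_until:
        if now < disabled_until[port]:
            return "still err-disabled"
        del disabled_until[port]        # recovery timer expired, port re-enabled
    if src_mac.lower() != ALLOWED_MAC.get(port, "").lower():
        disabled_until[port] = now + RECOVERY_SECONDS
        return "rogue MAC seen -> port err-disabled"
    return "forwarded"

print(handle_frame("member-port-1", "de:ad:be:ef:00:01", now=0))     # shuts the port
print(handle_frame("member-port-1", "00:11:22:33:44:55", now=60))    # still disabled
print(handle_frame("member-port-1", "00:11:22:33:44:55", now=400))   # recovered, forwarded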
On 19.04.2009 19:43 Chris Caputo wrote
On Sun, 19 Apr 2009, Mikael Abrahamsson wrote:
On Sat, 18 Apr 2009, Nick Hilliard wrote:
- ruthless and utterly fascist enforcement of one mac address per port, using either L2 ACLs or else mac address counting, with no exceptions for any reason, ever. This is probably the single most important stability / security enforcement mechanism for any IXP.
Well, as long as it simply drops packets and doesn't shut the port or some other "fascist" enforcement. We've had AMSIX complain that our Cisco 12k with E5 linecard was spitting out a few tens of packets per day during two months with random source mac addresses. Started suddenly, stopped suddenly. It's ok for them to drop the packets, but not shut the port in a case like that.
From the IX operator perspective it is important to immediately shut down a port showing a packet from an extra MAC address, rather than just silently dropping it.
We (DE-CIX) simply nail each MAC statically to the customer port and allow only traffic from these statically configured MAC addresses to enter the switch fabric. Initially this was done as a workaround because the F10 boxes didn't support port security. By now we think this is the best way to handle MAC management. As a benefit, there is no need to shut down customer ports when frames from additional MACs arrive; these are simply ignored. Works really great for us. YMMV. Arnold -- Arnold Nipper / nIPper consulting, Sandhausen, Germany email: arnold@nipper.de phone: +49 6224 9259 299 mobile: +49 172 2650958 fax: +49 6224 9259 333
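The static-pinning alternative described here can be sketched the same way: frames from unregistered MACs are silently ignored and the port is never shut. The table contents below are invented:

# Only frames whose source MAC is registered for the port enter the fabric.
REGISTERED = {
    "member-port-1": {"00:11:22:33:44:55"},
    "member-port-2": {"00:11:22:33:44:66", "00:11:22:33:44:67"},  # e.g. during a router swap
}

def admit(port, src_mac):
    """True if the frame may enter the switch fabric."""
    return src_mac.lower() in {m.lower() for m in REGISTERED.get(port, set())}

print(admit("member-port-1", "00:11:22:33:44:55"))   # True  - registered MAC
print(admit("member-port-1", "de:ad:be:ef:00:01"))   # False - ignored, port stays up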
On 19/04/2009 08:31, Mikael Abrahamsson wrote:
Well, as long as it simply drops packets and doesn't shut the port or some other "fascist" enforcement. We've had AMSIX complain that our Cisco 12k with E5 linecard was spitting out a few tens of packets per day during two months with random source mac addresses. Started suddenly, stopped suddenly. It's ok for them to drop the packets, but not shut the port in a case like that.
Yes, and <sigh> it's not that simple. There are known situations on certain switch platforms where if you use "violation restrict" on a port, and that port sees incoming mac addresses which belong to someone else on the exchange lan, the restrict command will wipe those mac addresses from the cam and the other person's equipment can lose connectivity. So violation restrict can cause collateral damage, which is really rather nasty. Also, Cisco GSR E5 cards aren't the only cards which inject junk from time to time. Not irregularly, I see routers from another Well Known Router Vendor injecting ipv6 frames with no mac headers. This bug appears to be tickled when the router's bgp engine gets a sudden spanking. There are other situations where bogus macs appear, mostly related to either old or nasty hardware, but enough to make blanket use of shutdown-on-violation a problem too. So I'll eat my words and admit that I actually do care when I see this sort of thing - because it causes problems, and is the sign of broken hardware, broken software or more often, bad network configuration, all of which are matters of concern, and which indicate a problem which needs attention. But however bogus packets are dealt with - whether restrict, shutdown or ignore, the most important thing is that they are never forwarded. Nick
participants (40):
- Adrian Chadd
- Alan Hannan
- Alex H. Ryu
- Antonio Querubin
- Arnold Nipper
- Bill Woodcock
- bmanning@vacation.karoshi.com
- Chris Caputo
- Dale Carstensen
- Daniel Roesen
- Deepak Jain
- Elmar K. Bins
- Gaurab Raj Upadhaya
- Holmes,David A
- Ivan Pepelnjak
- Jack Bates
- Jeff Young
- Joe Greco
- kris foster
- Lamar Owen
- Leo Bicknell
- Matthew Moyle-Croft
- Michael K. Smith - Adhost
- Mikael Abrahamsson
- Mike Leber
- Nathan Ward
- Nick Hilliard
- Niels Bakker
- Nuno Vieira - nfsi telecom
- Paul Ferguson
- Paul Vixie
- Paul Wall
- Randy Bush
- Richard A Steenbergen
- Roland Dobbins
- Sean Donelan
- Sharlon R. Carty
- Stephen Stuart
- Steven M. Bellovin
- vijay gill