Notes from the IX operators' BOF--last set before I get beer and gear. :)
Apologies for the gap, I nodded off partway through, but hopefully didn't
miss too much of the important content. ^_^;;

Matt
2008.02.18

IX operators' panel

Louie Lee from Equinix starts the ball rolling.
Welcome to the IXP BOF; anyone is welcome--it's mainly for exchanges, but
all customers are welcome.

AGENDA:
  Greg Dendy, Equinix: forklift upgrade learning process
  Niels Bakker, AMS-IX: cabling challenges for large chassis
  Greg Dendy, Equinix: L2 hygiene
  Jeff d'Ambly, Equinix: BFD experiments; ring protocols out there, hear
    what we'd like to see
  Mike Hughes, LINX: sFlow portal work
Hopefully that will fill the 90 minutes they have. It will be discussion
style--they don't want to just talk at you, they'll make it interactive.

Greg Dendy is first up. Congrats to Louie for 8 years at Equinix.

Forklift upgrades at all the public exchange points for Equinix.
Why:
  increasing demand for 10-gig port capacity not supported on the current
    platform
  current feature set less stable and resilient than preferred
  current platform didn't support newer technologies like DWDM optics
  service continuity for all ports during the upgrade is paramount

It's a long process.

First steps, 6-24 months ahead of time:
  Test available platforms that do or can support the required features.
  Work with vendors to develop platforms into IX production candidates;
    the deployment will be visible, but not high-volume sales.
  Make the vendor decision--rather like jumping off a cliff.

3-6 months before the upgrade:
  Test and verify the platform and software for launch; pick it and nail
    it down.
  Arrange for the appropriate cage/cabinet/power within the datacenter;
    150 amps of DC power--need to negotiate with facilities to get
    space/power.
  Order all needed parts: patch panels, patch cabling, new cross-connects.

1-3 months prior:
  Receive production hardware; install and test it in the production
    space (hardware components, ports, optics, etc.).
  Finalize the maintenance date (avoid conference dates!); pick a good
    time--10pm to 4am is not optimal for Europe, but that's when the
    fabric's low point is.

0-1 month prior:
  Finalize new pre-cabled patch panels, cross-connects, and platform
    configuration.
  Develop and document the maintenance procedure; that may take 6-7
    iterations.
  Notify customers and field questions.
  Train the migration team: 10-12 people, from product managers down to
    the rack-and-stack folks.
  Double-recheck all the elements one more time.

Migration time!
  Take a baseline snapshot: MAC table, ARP table, ping sweep, BGP state
    summary (a rough baseline-sweep sketch appears a little further down).
  Interconnect the new and old platforms with redundant trunks for
    continuity.
  Move connections port by port to the new platform; confirm each is
    up/up and works before moving on:
      CAM entry
      ICMP reachability
      Equinix peering session
      traffic resumes at expected levels

Patrick notes they have route collectors; the IP is .250 or .125,
AS 65517. The route collector is the only way they can troubleshoot
routing issues that come up.

Old topology slide. They migrated using a double ring to the new boxes.

Finishing the migration--final checks once the migration is complete:
  overall platform health: jitter node at each site
  traffic levels back to normal
  threshold monitoring for frame errors
  route collector peers are all up
  disable the trunks to the old platform
  close the maintenance window--do a round robin with everyone involved,
    and notify customers

They don't stock old optics in case the new platform doesn't work with
customers' gear. Optics issues usually crop up between the customer and
the switch during turn-ups.
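As a rough illustration of the baseline and per-port verification steps
above (the ping sweep in particular), here's a minimal Python sketch; the
host-list file, output file, and ping options are hypothetical, not
Equinix's actual tooling.

    #!/usr/bin/env python3
    """Ping-sweep a list of peering-LAN addresses and save the results,
    so the same sweep can be re-run after each port move and compared
    against the pre-migration baseline."""
    import json
    import subprocess
    import sys

    def ping(host, count=2, timeout=2):
        # Returns True if the host answers ICMP echo (shells out to ping).
        result = subprocess.run(
            ["ping", "-c", str(count), "-W", str(timeout), host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode == 0

    def sweep(hosts):
        return {host: ping(host) for host in hosts}

    if __name__ == "__main__":
        # usage: baseline_sweep.py hosts.txt baseline.json
        hosts = [line.strip() for line in open(sys.argv[1]) if line.strip()]
        results = sweep(hosts)
        with open(sys.argv[2], "w") as out:
            json.dump(results, out, indent=2)
        down = sorted(h for h, ok in results.items() if not ok)
        print("%d of %d hosts unreachable: %s" % (len(down), len(hosts), down))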
XFP and X2/XFP-plus tend to be problematic (LR vs. ZR, for example); an
optic may happen to work on the old platform, yet the new platform may
not interact with it the same way.

Problem solved:
  plenty of 10G port capacity to meet near- and long-term demand
  the IX platform has increased resilience and stability, with improved
    failover
  access to new technology: DWDM XFP optics

Lessons learned:
  Lots of testing; you can't really retest too much.
  Vendor participation/cooperation is paramount. If a vendor won't work
    with you, pick someone else.
  Teamwork is crucial, and communication among the team is vital as well.
  Prepare, prepare, then prepare yet again.

Q: Is it better to move customers as a batch to shorten the maintenance
   window, even though that increases the chance of simultaneous issues?
A: They prefer to be cautious.

Q: For testing, do they run full traffic through the platform first?
A: They test cross-connects, yes. For the whole platform, they pushed as
   much traffic as they could through the box for a few days. Generally
   not as much throughput testing--mostly Digital Lightwave testing on
   ports and fibers. The focus is not so much throughput as software
   stability; they leverage the vendor's own tests on the box as much as
   possible.

Q: What about the security side?
A: They do lots of MAC filtering tests: make sure the switch learns
   correctly, and make sure MAC learning and MAC security aren't done in
   parallel in a way that lets traffic leak through, for example.

Q: Have there been issues that cropped up? Do you test for anomalies,
   like undervoltage situations, etc., to make sure the gear is solid?
A: As much as possible.

Q: Do they negotiate finder's fees for bugs they send back to the vendor?
A: No, but they wish they could.

Niels Bakker, AMS-IX -- cabling challenges for large switches
"Stupid fibers" for 10GbE connections.

Before we start: AMS-IX uses photonic switches with redundant Ethernet
switches, effectively tripling the amount of fiber needed:
  patch panel to PXC
  PXC to first Ethernet switch
  PXC to second Ethernet switch
Great for redundancy (an Ethernet switch is a lot more complicated than a
photonic switch).

The bad old days: in early 10GigE they didn't figure everyone would want it.
  standard LC fiber patches everywhere
  rather vulnerable
  have to be put in place pretty much individually to line up correctly
  difficult to get rid of slack
Before and after photos; lots of fiber extenders. Photonic cross-connect
switch with LC connectors.

Solutions: breakout cables
  bundle eight strands into one cable
  easy to install
  specify up front precisely what you want
  interlock snap; a "wartelplaat" (gland plate) holds the bundles in place
Picture showing breakout bundles--saves a huge amount of time; sturdy,
can run a whole bunch in one go, pre-tested by the supplier. Interlock
snap on the end of the cable. 32 ports with breakout. 144-port photonic
cross-connect switch. MLX-32 in the top half--they actually did use a
forklift to put it in.

Solutions (2): MPO cables
  12 strands in one high-density connector, 8 used
  work with the latest generation of Glimmerglass photonic switches
  find a cable that is not ribbon (so it can bend in all directions)
  still LC on the other side (Tx/Rx)
  MPO on the switch side would be nice too! (different LR vs. ER blades)
MPO cables out of the Glimmerglass go in groups of 8; one set goes to the
Rx side, the other goes to all the Tx. MPO cables on the left. Even
bigger breakout cables, and mRJ-21 on the RX-8: 24 strands at the other
side; the fiber after the tap is very small. The yellow ones go to the
photonic switch, with Rx and Tx kept separate.

Q: Where do they get the cool cables? What are the failure rates?
A: A few arrived broken, possibly during transport or installation; those
   just get sent back under warranty. No cases of bundle cables going bad
   in service thus far. With the dual-switch setup, though, the backup
   path could be bad and might not show up until a switch toggle happens.

Q: Why did you run Rx and Tx in different multicores?
A: Out of necessity; the optical switch puts 8 Rx on one port and 8 Tx on
   a different port.

Thanks to Louie for starting the BOF off while Mike was still at the break.

Greg Dendy back up to the mike:
Switch Fabric Health and Stability (or "L2 Hygiene")

Why? Loops are bad, mmkay.
  broadcast storms
  proxy ARP (argh!)
  next-hop foo
  IGP leaks
  other multicast/broadcast traffic (I'm looking at you, CDP!)

Customer connections:
  single-port
  remote-Ethernet
  LAG ports
  redundant ports
The remote Ethernet handoffs are sometimes the scary ones.

How do you prevent loops? A firm policy against them.
Switch policies:
  MAC ACLs on each interface
  MAC learning limit of 1
  static MACs
  LACP for aggregated links--please use LACP; the heartbeat is the key
    part: if it goes away, the link drops.
Quarantine VLAN, for connections showing things like:
  OSPF hellos
  IGMP/PIM
  MOP (Cisco 6500/7600)
  old DECnet--turn it off!
  CDP
Manual scans as well.
sFlow reporting:
  IDs illegitimate traffic
  tools for peers to use to understand where peering traffic originates

Other measures:
  jitter statistics--measure metro IX latency, available to IX
    participants
  full disclosure to all IX participants
  communication during planned and unplanned events
  full RFO investigation and reporting for all outages

Jeff d'Ambly now talks about BFD experiments.
Bidirectional Forwarding Detection: is anyone running it on private
sessions? Anyone doing it over an exchange switch?
The IETF describes BFD as a protocol intended to detect faults in the
bidirectional path between two forwarding engines.

Why?
  ensures connectivity backplane to backplane--makes sure you actually
    have connectivity
  independent of media, routing protocol, and data-link protocol
  detection of one-way link failures
  no changes to existing protocols
  faster BGP fault detection, while BGP timers remain at default values
  an IXP can easily support BFD--there's nothing the IXP needs to do

Failover times: when configuring BFD, be aware of the different failure
times on your network:
  MSTP: 0.8 to 4 seconds
  ring protocol: 200 to 500 ms
  VPLS: less than 50 ms
  static LAG: less than 1 ms (rehash)
  LACP: 1 to 30 seconds

Example topology in the lab: a 4-node ring.

Sample Cisco config:
  neighbor 192.168.100.1 fall-over bfd
  bfd interval 999 min_rx 999 multiplier 10
JunOS config:
  bfd-liveness-detection detection-time FOO

What holds operators back from BFD deployment? Is there anything else
IXPs can do to raise awareness and best practices around BFD for eBGP
sessions?
Really, it's just a matter of raising awareness; it would be good for
IXPs to publish their best current practices for what timers work best on
their exchange.
What is a good time for it? 10 seconds seems to be a fair compromise; it
depends largely on the application. (A small worked comparison of
detection time vs. the failover times above appears after these notes.)
Prior to this, people used PNIs to get fast failover; this could help
reduce the need for PNIs.
One point made: BFD controls how fast the session is torn down; it does
not change how fast it tries to come back up.
There are two camps with BFD: one says it should run on the route
processor, so you know the brain at the other end is alive; the other
says it should run on the linecard, so as not to load down the main CPU.
The RFC talks about maintaining state, but that hasn't been implemented
yet; and the lack of dampening in BFD can end up causing a cascading
failure throughout the rest of the network.
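As a rough worked example of the timer discussion--nothing here is from
the talk beyond the numbers themselves--this small Python sketch computes
the detection time implied by the sample Cisco config (999 ms x 10, i.e.
roughly the 10-second compromise mentioned above) and compares it against
the failover windows listed for the various transport mechanisms; the
comparison is my reading of "be aware of the different failure times on
your network".

    # Detection time implied by a BFD config is roughly the negotiated
    # interval times the detect multiplier; compare that against the
    # fabric failover windows from the slides to see which events a BGP
    # session protected by BFD would ride through and which would drop it.
    FAILOVER_WINDOWS_MS = {
        "MSTP":                (800, 4000),
        "ring protocol":       (200, 500),
        "VPLS":                (0, 50),
        "static LAG (rehash)": (0, 1),
        "LACP":                (1000, 30000),
    }

    def detection_time_ms(min_interval_ms, multiplier):
        # BFD declares the neighbor down after `multiplier` consecutive
        # missed hellos at the agreed interval.
        return min_interval_ms * multiplier

    if __name__ == "__main__":
        det = detection_time_ms(999, 10)   # the sample Cisco config: ~10 s
        print("BFD detection time: %.2f s" % (det / 1000.0))
        for mechanism, (low, high) in FAILOVER_WINDOWS_MS.items():
            verdict = ("rides through" if det > high
                       else "could drop the session during")
            print("  %-20s %6d-%6d ms: BFD %s this failover"
                  % (mechanism, low, high, verdict))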
You can't afford to have a flap in one area churn the CPU to the point
that it affects other parts of the network. Don't turn it on without
thinking about the implications!
The IX operator can't replicate all the pieces of our networks either, so
it's good to have an open exchange of information.

Mike Hughes, sFlow portal work at LINX
Mike has a huge carbon footprint? (Recycled slides from the LINX meeting
last week.)

Were you flying blind? A big L2 network is like a bucket with traffic
sloshing around inside.
  the IXP gets traffic-engineering data
  IX customers get flow data

Datagram flow:
  the exchange -> sfacctd -> MySQL <-> PHP/XML <-> AJAX/PHP
The switches export 1 sample per 2000 packets per port, ingress on each
port; there's a VLAN between all the switches, so the samples aren't
handled by the management port but actually carried through the fabric.
sfacctd slices and dices the sFlow data. (A tiny worked example of the
1-in-2000 scaling and the rollups below appears as a postscript at the
end of these notes.)

Database layout:
  1-minute samples
  5-minute samples
  15-minute averages
Hardware: 2x dual-core CPUs in HP servers, 16GB RAM, an 820GB RAID6 array
(8x 146GB disks); about 50GB of data per month.

The portal:
  select ports on the web page; it lists your traffic, sorts by column
    headers, etc.
  can display graphs of your traffic, etc.
  a proper web application
  add-in processing of Extreme LAN data
  authenticated direct XML interface for members
  sFlow-observation-based peering matrix (with opt-out ability)
  improved engineering tools:
    top-talkers reports
    switch-to-switch "traffic matrix"
    per-ethertype and MoU-violating traffic, etc.
  proactive notification agent: configure various thresholds, receive
    alerts
  weekly "overview report"
They would love to hear what kind of features you want to see on an sFlow
portal--let them know!

Beer and Gear time!
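Postscript (not part of the talk): a minimal Python sketch of the sFlow
arithmetic described above--scaling 1-in-2000 sampled byte counts back up
to an estimated traffic rate and rolling 1-minute buckets into 5-minute
averages. The sample numbers and function names are made up; this is just
to illustrate the idea, not LINX's actual sfacctd/MySQL pipeline.

    SAMPLING_RATE = 2000   # roughly 1 of every 2000 packets is exported per port

    def estimated_bps(sampled_bytes, interval_sec):
        # Scale sampled bytes up by the sampling rate, convert to bits/sec.
        return sampled_bytes * SAMPLING_RATE * 8 / interval_sec

    def rollup(rates, bucket=5):
        # Average consecutive 1-minute rates into `bucket`-minute rates.
        return [sum(rates[i:i + bucket]) / len(rates[i:i + bucket])
                for i in range(0, len(rates), bucket)]

    if __name__ == "__main__":
        # e.g. sampled byte counts seen on one port in ten 60-second windows
        sampled = [120000, 98000, 143000, 110000, 99000,
                   130000, 125000, 101000, 97000, 140000]
        per_minute = [estimated_bps(b, 60) for b in sampled]
        print("1-minute estimates (Mbit/s):",
              [round(r / 1e6, 1) for r in per_minute])
        print("5-minute averages (Mbit/s):",
              [round(r / 1e6, 1) for r in rollup(per_minute)])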