Here are my notes from the second half of NANOG today. Now off to Beer and Gear. :)

Matt

2009.10.19 NANOG 47 Monday notes, part 2

Mike Hughes starts things off after lunch at 1436 hours Pacific time.

A few bits of administrivia still.
If you want to submit a lightning talk, you can do it up until 7pm today.
Please vote for the committee members!
PC nominations close this evening as well; if you'd like to be on it, put your name in; as much help as possible is needed.

3 lightning talks next up.

First up is Ernest McCracken
http://netlabs.cs.memphis.edu
NetViews: real time visualization of Internet Path Dynamics for Network Management

Started doing this as part of his undergrad work. Goal was to help researchers visualize network paths.
Topology mapping projects typically try to represent internet architectures: Scatter, skitter, Rocket Fuel, CAIDA.

Why graph in realtime?
  monitor realtime reachability
  spot anomalous depeering
  identify route hijacking and misconfigurations
  developing next-gen routing monitor system

BGPMon -- realtime lightweight BGP monitor with over 70 peers -- allows for fast updates
NetViews -- visualizes both control plane paths (via BGP updates) and forwarding paths (via active probing)

BGPMon is running; you connect to it and get the routing updates; data broker sends BGP updates.
Prober probes target network from BGP peers to get path updates.
GeoCoder and IP crawler get geographic info, and traceable IPs for probing.

Slide showing data pathway

They probe during routing events; a timeline shows the BGP updates arriving during an event. They keep probing until they see no additional updates.

Visualization filters show networks based on the number of ASes an AS connects to. You can see the updates scroll in realtime on the live map as the updates come into the system. Blue is path additions/changes, Red for changes.

They can also visualize forwarding paths, but there are challenges in inferring forwarding paths based on traceroutes.
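The "keep probing until they see no additional updates" behaviour above can be sketched roughly as follows. This is only an illustration, not NetViews code: the function name and the quiet_period parameter are assumptions, since the talk did not say how long the prober waits before declaring an event over.

```python
from typing import List, Tuple


def probing_windows(update_times: List[float],
                    quiet_period: float) -> List[Tuple[float, float]]:
    """Group BGP update timestamps into routing events.

    Probing for an event starts at its first update and stops once
    quiet_period seconds pass with no further update (assumed rule).
    Returns a list of (start, stop) probing windows.
    """
    windows: List[Tuple[float, float]] = []
    for t in sorted(update_times):
        if windows and t <= windows[-1][1]:
            # Update arrived while we were still probing: extend the window.
            windows[-1] = (windows[-1][0], t + quiet_period)
        else:
            # No probing in progress: this update starts a new event.
            windows.append((t, t + quiet_period))
    return windows


if __name__ == "__main__":
    # Three updates close together, then a separate event much later.
    print(probing_windows([0, 10, 25, 100], quiet_period=30))
```

Each update that lands inside an open window pushes the stop time out, so a burst of updates is probed as one event.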
Future work:
  correlate forwarding and routing dynamics to create a classification model for internet paths
  add scalability by having clients run traceroute jobs in a P2P fashion
  give client users the ability to communicate with each other

Funded by NSF, and collaborating with UCLA, ColoState and UofO on BGPMon system.

Any questions?
Q: Dave Meyer--can it be run internally? What infrastructure do you need?
A: Server portal runs in lab, clients can run on any java client.

Synch up with him afterwards if you have any summer internships available.

Next talk
Jim Cowie from Renesys
The recession and the routing table
Reading the tea leaves

They dig into the routing tables to see what's happening.

Tough times, tough questions
  We know that internet transit purchases are sensitive to business conditions (2000 crash)
  Is the 2008-2009 recession affecting growth in the global/regional routing tables?
  Should be some sign of pullback in the routing tables like in 2000.

3 years of North American routing--it's still going up, there's no depression visible.

Why did the table keep growing?
  Enterprises don't cut costs by leaving the internet, they cut costs by reducing diversity
  cheap transit getting cheaper acts like "easy money"
  prospect of v4 runout may result in "use it or lose it" addition of routes into table.

Half the table is just hanging out with 1 provider.
Number of prefixes with 4 or more providers is going up.
The 1 provider networks either go to no-longer-advertised or shift to 2 or 3 providers. More go to the "no-longer-seen" pool; fewer upgrade to the next category up. People postpone getting to multihoming.
Triple-homing seems to be the sweet spot.
4 or more provider pool is getting larger and more stable over time; you don't tend to decrease over time.
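The provider-count pools above depend on counting distinct upstreams per prefix. The talk did not describe how Renesys actually does this, so the following is only a sketch of one common approach: take the AS immediately before the origin in each observed AS path as an upstream. The function names, sample prefixes and private-use ASNs are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def upstream_counts(routes: Iterable[Tuple[str, List[int]]]) -> Dict[str, int]:
    """Count distinct upstream ASes seen for each prefix.

    routes holds (prefix, as_path) pairs, where as_path runs from the
    collector's peer to the origin AS (last element).
    """
    upstreams = defaultdict(set)
    for prefix, path in routes:
        # Collapse AS-path prepending (runs of the same ASN).
        dedup = [a for i, a in enumerate(path) if i == 0 or a != path[i - 1]]
        seen = upstreams[prefix]  # make sure the prefix is always listed
        if len(dedup) >= 2:
            seen.add(dedup[-2])   # AS adjacent to the origin
    return {prefix: len(ases) for prefix, ases in upstreams.items()}


def pool(count: int) -> str:
    """Map an upstream count onto the pools used in the talk."""
    return "4+" if count >= 4 else str(count)


if __name__ == "__main__":
    sample = [
        ("192.0.2.0/24", [64500, 64501, 64510]),
        ("192.0.2.0/24", [64502, 64503, 64510, 64510]),  # prepended origin
        ("198.51.100.0/24", [64500, 64501, 64511]),
    ]
    for prefix, n in upstream_counts(sample).items():
        print(prefix, pool(n))
```

As the Q&A that follows points out, this only counts upstreams that are visible through some other AS, so it can undercount.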
Global recession might give more of a break before v4 exhaustion? Cheap transit killed that theory.
Some evidence of single- and dual-homed customers putting off the move to higher order multihoming in 2007 and 2008
  "obviously practicing for IPv6 transition, after which apparently multihoming becomes unnecessary"
Otherwise, growth continues apace.
Bring on the post-IPv4 marketplace!

Q: Randy Bush--BGP is a great data hiding system; it doesn't tell you much about the real topology of the internet. How do you determine that a prefix has a single upstream?
A: ask him afterwards.
Q: is this transit AS?
A: Yes
Q: You have to have seen the AS through another AS, that's how you can count the upstreams.

Joe Abley up to the front from ICANN
DNSSec for the Root Zone
Matt Larson, from VeriSign.

Info update for those who care about DNSSec
collaboration between ICANN and VeriSign with DoC

ICANN is IANA functions operator
  Manages the Key Signing Key
  Accepts DS records from TLD operators
  Verifies and processes request
  Sends update requests to DoC for authorization and to VeriSign for implementation
DoC NTIA
  Authorizes changes to the root zone
    DS records
    Root key sets
    DNSSec updates
VeriSign manages the zone signing key

Proposed Approach to protect the KSK
  CPS--certificate practice statement
  DPS, DNSSEC policy and practice statement
  basically, to assure people the practices are adequate to protect it.

community trust
  proposal that community representatives have an active role in management of the KSK
    as crypto officers needed to activate the KSK
    as backup key share holders protecting shares of the symmetric keys in case of disaster recovery

Auditing and Transparency
  Third-party auditors check that ICANN...
  webcast of sessions

KSK is 2048 bit RSA key
  rolled every 2-5 years
  RFC5011 for automatic key rollovers
  propose using signatures based on SHA-256, but there's no shipping code based on this

Zone signing Key (held by VeriSign)
  ZSK is 1024-bit RSA
  rolled once a quarter
  SHA-256 signature

Signature validity
  RRSIG validity 15 days, resign every 10 days
  Other RRSIG validity 7 days, resign twice a day

Key Ceremonies
  Key Generation
    Generation of new KSK
    Every 2-5 years
  Processing of ZSK signing request (KSR)
    signing ZSK for the next upcoming quarter

Root Trust Anchor
  published on a web site by ICANN as
    XML-wrapped and plain DS record to facilitate automatic processing
    PKCS#10 certificate signing request

Roll Out
  incremental roll out of the signed root
    groups of root server "letters" at a time
    watch the query profile to all root servers as roll out progresses
    Listen to community feedback for any issues
  No validation
    Real keys will be replaced by dummy keys while rolling out the signed root
    signatures not valid during roll out
    actual keys will be published at end of rollout

Timeline
  December 1, 2009  root zone signed
    initially signed zone stays internal to ICANN and VeriSign
  Jan-July 2010  incremental roll out of signed root
  July 1, 2010  KSK rolled out root trust anchor

ISP Security BOF later today will talk about it.
Full architectural documents around the process will be published in the next few weeks.

Next speaker is Paul Francis, talking about Virtual Aggregation.
Reducing FIB Size with Virtual Aggregation (VA)

ISPs often want to extend the life of old routers
  Routers that have inadequate FIB but otherwise are still useful
A common approach--use old routers as customer PE, default to core

Other FIB/RIB shrinking tips
  Filter out more specific routes
  For lower-tier ISPs, default to transit ISPs
    i.e. use 0/0 and load balance among transit ISPs
  BUT
    leads to non-optimal routes
    lots of configuration (peer routes, "important" routes like Google)
    Can't be used by transit ISPs themselves

Mitigating non-optimal default routes
  Use more-specific "semi-defaults"
  AS3303 Swisscom IP-Plus
    point 62/8, 80/7, 21/7, etc. to EU transit ISP
    ARIN space to US transit
    class B 128/3, 160/5, 168/6 to US transit

IETF working on a more general solution: virtual agg
  GROW working group
    draft-ietf-grow-va-00
    -va-gre-00
    -va-mpls-00
    -va-perf-00

VA is a way to control FIB size in routers
  DFZ FIB, not VPN tables
  does not shrink RIB size
  Tight control of FIB size for any or all routers
  no coordination between ISPs
  works with legacy routers

Important today--possibly critical tomorrow?
  looking forward, BGP RIB growth rate could increase substantially
    exhaustion of v4 erodes aggregation because of pressure to shrink default prefix size
    uptake of v6
  VA can help ease these pressures

VA not perfect
  Requires configuration of its own
  Entails a traffic load/FIB size tradeoff
    which can be quite good
    academic study on large transit ISP: 10x FIB reduction with negligible latency/load penalty
  But in general we don't know how easy it is to achieve this--configuration...

Why this talk?
  You can help us define VA
    certain protocols or configuration details
    alternative ways to deploy
    or tell us that VA is useless
  encourage your vendor to implement VA
    current implementations from Huawei and ??
VA Basic Idea
  Define "Virtual Prefixes" (VP)
    These are shorter (bigger) than real prefixes
    think of /6s, /7s, /8s
  Assign different routers to be "responsible" for different virtual prefixes
    i.e., they need to know how to route everything in the VP

FIB-suppression
  BGP runs as normal
    all routers have full RIB
    important to not muck with BGP operation per se
  suppress updates to FIB for more specifics of virtual prefixes

APR (aggregation point router) for 22/8
  originates route to 22/8 with nexthop being itself
  it FIB-installs all sub-prefixes within 22/8
  other routers FIB-suppress all prefixes within 22/8

This just tunnel-maps from one router to another out to the egress point. The only router with the need to know how to route that packet was APR1 (well, that, and the ingress router).
The packet takes a bit of a longer path to do this with simple aggregates.
You can add "popular prefixes" to routers to point them along "better" paths.

Types of tunnels defined
  MPLS (using LDP)
  GRE
  ...

A deployment example
Robert Raszuk at Cisco shows a POP site with 4 PE customer agg routers, 2 Rs, 2 RRs; core can use tunnels between them already.
  Use RRs as APRs -- can optionally FIB-install routes for which PE is egress
  If you do FIB suppression at the RR layer, then you need to install popular prefixes at the PE layer--GROW looking to automate that part.

VA from our point of view
  Figure out where you need FIB reduction
  Based on this, design your deployment
    select VPs
    assign routers as APRs, configure
    configure "VIP-list"

New IETF GROW WG work item for FIB suppression

Q: Patrick, Akamai--this seems very complex; couldn't we just take prefixes out of the FIB that are covered by a shorter prefix with same next-hop; wouldn't that be much easier to do, and save FIB space? Could we maybe ask vendors to look into doing that?
A: Lixia may have done some looking into that; she says that two people on her team found that you can compress your FIB by between 10 and 50% simply by suppressing more specifics with the same next hops. She was going to give a talk at GROW at the next meeting about this.

Q: Doni from PeakWeb was asking vendors for this back around 200,000 routes in the FIB; the vendors wanted to simply sell more hardware. Which routers need the full FIB in the drawing?
A: None of them need full routes. Generally got about 10X saving in all of them.

Q: Owen deLong--if you already have all the routers everywhere, it might make sense; if you have just 2 routers in a POP, this looks like a distributed CAM load, to have multiple routers pretend to look like one router.
A: yes, it's like that.

Q: RAS--remember the 8k Foundry boxes? They had an 8k CAM table, and their solution was to either have just default, or break it up into /12s; this is similar, it just limits based on the number of next-hops they have. Could we get benefits from doing more simple aggregation like that?

Q: Igor notes you can probably just upgrade for cheaper than transferring all sorts of routes back and forth and paying for additional interconnect ports.

Q: Anton Kapella--have they considered looking at Auto-TE QoS stuff internally? If packets are being redirected around internally, it does mean something for link-loading; how will this interact with QoS, since this will transport packets along links not originally planned for it?
A: In what they saw, very few packets used the non-optimal paths.

Next up.
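Going back to the simpler idea raised in the VA Q&A above (drop any FIB entry whose nearest covering shorter prefix has the same next hop), here is a minimal sketch. It is not the GROW draft's mechanism and not any vendor's implementation; the function name and the sample table are made up, with 22/8 borrowed from the talk's example.

```python
import ipaddress
from typing import Dict


def compress_fib(fib: Dict[str, str]) -> Dict[str, str]:
    """Drop more-specifics covered by a shorter prefix with the same next hop.

    fib maps prefix strings to next-hop labels. Forwarding behaviour is
    unchanged: a dropped entry's traffic falls through to the covering
    entry, which points at the same next hop.
    """
    entries = sorted(
        ((ipaddress.ip_network(p), nh) for p, nh in fib.items()),
        key=lambda e: e[0].prefixlen,  # shortest prefixes first
    )
    kept = {}
    for net, nh in entries:
        cover = None
        # Walk up one bit at a time to find the nearest kept covering prefix.
        for plen in range(net.prefixlen - 1, -1, -1):
            candidate = net.supernet(new_prefix=plen)
            if candidate in kept:
                cover = candidate
                break
        if cover is not None and kept[cover] == nh:
            continue  # redundant: the covering entry forwards identically
        kept[net] = nh
    return {str(net): nh for net, nh in kept.items()}


if __name__ == "__main__":
    table = {
        "22.0.0.0/8": "A",
        "22.1.0.0/16": "A",     # same next hop as the /8: dropped
        "22.1.2.0/24": "B",     # differs from the /8: kept
        "22.1.2.128/25": "B",   # same next hop as the /24: dropped
        "22.2.0.0/16": "B",
    }
    print(compress_fib(table))
```

Unlike VA, this needs no tunnels or APR assignment, but the saving depends entirely on how many more-specifics happen to share a next hop with their covering route.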
We'll do coffee break at 1615, BOFs at 1645

BGP# -- a system for dynamic route control in data centers

tenants and landlords
  one landlord
    owner and manager of the datacenter
  many tenants
    internal users: search, email, gaming
    external users: utility computing customers
  empower tenants to control routing decisions

Routing tensions
  tenants have different goals
  tenant goal--spread traffic or migrate traffic from one server to another
  current system: tenant submits tickets to get routing changed.
    whole ticket flow is shown
  Tenants have limited control over routing

A better system
  allow for automated route control
  allow tenants independent and safe route control
  ensure scalability
  allow for maintenance changes

BGP#
  simple speakers (multispeaker)
    peer with BGP routers
    send route announcements/withdrawals (ECMP capable)
  Stateful controller (controller)
    controls and coordinates speakers
    custom API ("applications")

Application runs on tenant box; speaks to controller via API; controller speaks to multispeaker, which peers with router to send the update.
To spread traffic, similar thing: application uses API to ask controller, which asks multispeaker; it has 2 sessions to router, with 2 next hops for prefix.

Automated route control
  controller API allows for custom applications
  Application can automatically manage routes

Independent and safe route control
  only allow a tenant to change their own prefixes.
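The "only allow a tenant to change their own prefixes" rule above could look something like the following on the controller side. This is a hypothetical sketch, not the BGP# API: the Controller class, its method names, and the tenant allocations are all invented for illustration, and the hand-off to a multispeaker is reduced to recording next hops.

```python
import ipaddress
from typing import Dict, List, Set


class Controller:
    """Toy stateful controller that validates tenant route requests."""

    def __init__(self, allocations: Dict[str, List[str]]):
        # tenant name -> address blocks the landlord has assigned to it
        self.allocations = {
            tenant: [ipaddress.ip_network(p) for p in prefixes]
            for tenant, prefixes in allocations.items()
        }
        # prefix -> next hops currently announced (more than one = ECMP)
        self.announced: Dict[str, Set[str]] = {}

    def _owns(self, tenant: str, net) -> bool:
        return any(
            net.version == block.version and net.subnet_of(block)
            for block in self.allocations.get(tenant, [])
        )

    def announce(self, tenant: str, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        if not self._owns(tenant, net):
            raise PermissionError(f"{tenant} may not announce {prefix}")
        # A real controller would now instruct a multispeaker.
        self.announced.setdefault(str(net), set()).add(next_hop)

    def withdraw(self, tenant: str, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        if not self._owns(tenant, net):
            raise PermissionError(f"{tenant} may not withdraw {prefix}")
        self.announced.get(str(net), set()).discard(next_hop)

    def next_hops(self, prefix: str) -> Set[str]:
        return set(self.announced.get(str(ipaddress.ip_network(prefix)), set()))


if __name__ == "__main__":
    c = Controller({"search": ["10.1.0.0/16"], "email": ["10.2.0.0/16"]})
    # Spread traffic: same prefix, two next hops.
    c.announce("search", "10.1.5.0/24", "10.1.0.1")
    c.announce("search", "10.1.5.0/24", "10.1.0.2")
    print(c.next_hops("10.1.5.0/24"))
```

Keeping the check in the controller rather than in each tenant application matches the stated split: tenants get control, the landlord keeps responsibility for validation.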
Scalability
  Multispeaker and controller not placed in machines handling user traffic
    eliminates need for one policy controller per machine
    reduces peering sessions to router
  eliminate per-ticket manual intervention

Resiliency
  ensure system continues operating
    instantiate multiple multispeakers
    single multispeaker failure doesn't affect other MS
    ability to separate multispeaker and controller
  prefix resiliency -- ensure prefix stays available
    announce same prefix from multiple multispeakers
    router retains prefix even if one MS fails

Automation service could deploy a new multispeaker with same config if one dies.

No inconsistency with multiple multispeakers
  suppose some multispeakers become unresponsive
    BGP# listening tool detects the lack of router readvertisement
  suppose multispeaker reboots and is in different state?
    get config and state from persistent store

Alternate approach
  each tenant sets up its own BGP instances
    needs one session per machine
    landlord may need to deal with many BGP peers

Conclusions:
  Tenants have more power
  Landlord retains responsibility for validation of routes.
  system achieves stability and resiliency

Q: Francis asks if BGP is an awfully coarse-grained tool to use for something like this--what about using MPLS for setting up flows?
A: BGP finite state machine is much simpler to follow and update.

We'll go into coffee break now; BOFs start at 1645 hours Eastern time.
SC elections, JUST DO IT!!
PC nominations open for 3 more hours!
1800 hours in Regency for Beer and Gear.
BOFs, Mobile Data Track, ISP Security BOF, and DNS BOF will be upstairs in DeSoto room.
Tuesday we start at 0830 again with breakfast.

For now, I head over to the DNS thingie BoF

IPv6 and resolvers; how do we make it less painful?

For most people, rolling out IPv6 can't break IPv4, and separate hostnames aren't scalable long term.
Per Google, 0.078% of users are impacted by enabling quad A on machines.
Assuming a user base of 600M, that's roughly 470K users that get broken, which isn't acceptable. Right now, in browsers, IPv4 fallback takes on the order of 21 seconds to 181 seconds, which by most SLA numbers is considered "broken".

Options?
  Don't roll out IPv6
  prefer A over AAAA
  accept the breakage
  what about checking for working IPv6 connectivity before sending back an AAAA record?

Only way to know if user has working v6 transport is if the AAAA request came via IPv6 instead of IPv4. Recursive servers need to be set to return AAAA *only* when the request came in via IPv6; otherwise, return A record only. Now, auth DNS server only has to worry about IPv6 reachability to the recursive server.

We've asked if ISC can write this; ISC has done this, it'll be in BIND 9.7; it'll be in a second beta coming out in early November. If you want to check it out and you're on the user list or beta list, you'll get notification; otherwise, check the ISC site in early November and it should be there. Feature will be a knob you have to turn on.

There's an additional check put in; if DNSSEC is set, it won't forge DNS answers unless there's a knob set, "BREAKDNSSEC", that you can turn on; the knob is going to be very well documented. But if you've gone through the work of setting up DNSSEC, you should know how to troubleshoot it yourself.

This should be set up for resolvers facing customers, not for internal services that have odd configurations.

What about having an ACL for controlling behaviour for different subnets? If they fit in a view statement, you already have that capability.
Will this be available within a view? Yes, you can do it there. But the ACL idea is interesting, and could be better than pushing people towards views.

This really goes on the recursive side. We need to try to convince ISPs to use these options.

If a request comes over v4 but the source is a 6to4 address, do you respond with AAAA or not?
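The policy discussed above (hand back AAAA only when the query itself arrived over IPv6, and leave answers alone when DNSSEC was requested unless an override is set) can be written down as a small decision function. This is only a sketch of the idea as described in the BoF, not BIND code; the function and parameter names are assumptions, and break_dnssec merely stands in for the "BREAKDNSSEC"-style knob mentioned in the notes.

```python
import ipaddress
from typing import List, Tuple

Record = Tuple[str, str]  # (rrtype, rdata)


def filter_answer(client_addr: str,
                  records: List[Record],
                  dnssec_requested: bool = False,
                  break_dnssec: bool = False) -> List[Record]:
    """Decide which records a recursive server returns to a client.

    client_addr is the source address the query arrived from, so its
    address family tells us which transport the client demonstrably has.
    """
    if ipaddress.ip_address(client_addr).version == 6:
        # Query came in over IPv6: v6 transport works, send everything.
        return list(records)
    if dnssec_requested and not break_dnssec:
        # Stripping records would invalidate signed answers; leave intact.
        return list(records)
    # Query came in over IPv4: hold back AAAA so the client uses A.
    return [r for r in records if r[0] != "AAAA"]


if __name__ == "__main__":
    answer = [("A", "192.0.2.1"), ("AAAA", "2001:db8::1")]
    print(filter_answer("198.51.100.7", answer))  # A only
    print(filter_answer("2001:db8::53", answer))  # A and AAAA
```

A per-subnet ACL, as suggested in the discussion, would amount to one more check on client_addr before the address-family test.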
How about ACLs with flexibility to see if it's over v4 but from a 6to4 address, to send different responses? A simple default policy is good, but the flexibility is good for more experienced sites.

Simple on-or-off knob is in 9.7b2; more granular control will be needed for later versions...

What about 6rd? They would get no AAAA results in that scenario. We might need a DNSv6 option for DHCPv4 which would be able to give back the v6 DNS servers.

Should we put together an informational draft for IETF? We can draft one up, so the three of them should talk:
  Igor Gashinsky, Yahoo
  Larissa Shapiro, ISC
  Alain Durand, Comcast

OARC meeting, Beijing: Jason Fesler will be there to talk about it.
ISOC meeting in Paris next week: we'll be there to talk about it as well.
Internet2 joint techs talked about it as well.

General consensus is that this is a necessary evil hack. If we can get it working with 6rd, it'll be an interesting working solution to a common problem.

What about shoving JavaScript code into web pages to report DNS lookups back via REST infrastructure, to get an idea of the types of breakage? OS, browser, IP, and which test cases break or not.
We could do a series of test queries, and see which ones break or not.
  Give A
  Give AAAA
Result of the query comes into the beacon server, so we can see if they saw the reply or not.
It would be better for the javascript to reply back with what *it* saw, as well as see what the server log sees.
There are some javascript coding challenges with collecting this data.

If we can at least break it down to 3 buckets
  6to4
  teredo
  native
it would help really pin down where the breakage is.

Do note that the percentage shown wasn't Yahoo data, that was Google's data, so we don't have that breakdown ourselves.

Would going to AS level be too specific for people? We'd need to consider carefully privacy issues and anonymizing the data as per our privacy rules.
What about running an experiment in partnerships for specific ASNs?
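For the three-bucket breakdown proposed above, the client's IPv6 address already says a fair amount. The sketch below uses Python's standard ipaddress module to recognise 6to4 (2002::/16) and Teredo (2001::/32) addresses; anything else is labelled "native", which is an approximation since other tunnel types such as 6rd use ordinary provider space. The report format and function names are made up for illustration and are not from any beacon actually discussed.

```python
import ipaddress
from collections import Counter
from typing import Dict, Iterable, Tuple


def transport_bucket(addr: str) -> str:
    """Classify a client address as ipv4, 6to4, teredo or native."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return "ipv4"
    if ip.sixtofour is not None:   # address falls in 2002::/16
        return "6to4"
    if ip.teredo is not None:      # address falls in 2001::/32
        return "teredo"
    return "native"                # everything else (approximation)


def breakage_by_bucket(
    reports: Iterable[Tuple[str, bool]]
) -> Dict[str, Tuple[int, int]]:
    """Summarise beacon reports as bucket -> (broken, total).

    Each report is (client_address, fetch_succeeded).
    """
    total: Counter = Counter()
    broken: Counter = Counter()
    for addr, ok in reports:
        bucket = transport_bucket(addr)
        total[bucket] += 1
        if not ok:
            broken[bucket] += 1
    return {b: (broken[b], total[b]) for b in total}


if __name__ == "__main__":
    sample = [
        ("2002:c000:204::1", False),
        ("2002:c000:205::1", True),
        ("2001:0:4136:e378:8000:63bf:3fff:fdd2", False),
        ("2001:db8::1", True),
    ]
    print(breakage_by_bucket(sample))
```

Note that a client whose v6 is badly broken may never report in over IPv6 at all, which is why the discussion also wants the JavaScript side to report what it saw.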
Point is, this is coming out; do share the data; this will be going into the mainline code release. It's opt-in, defaulted off.

The *actual* names for the config options are...

This will apply to just the RR set, not the glue set; if glue returns AAAA, it'll still come back intact.

The tests are really testing from the recursive lookup server to the last proxy device in front of the client. But what if the breakage happens between the recursive server and the auth server? On the recursive server lookup side, can we turn the knob on in the other direction?
This is an interesting challenge; we'll have to see how much additional work this will need, and how much additional funding will be needed to cover for it. ISC will...look into the feasibility of doing that.

The IETF draft cutoff is tonight for Paris, so maybe it'll be done for the Anaheim one, at which point we'll have working code out there, and a bit more time for writing the draft.

We wrap up the BOF at 1724 hours Pacific time.

(what about a switch for auth servers that allows for turning off "don't send AAAA records to ZZZZZ"?)