NANOG, October 24 & 25, 1994 (version 2)
Notes by Stan Barber <sob@academ.com>

Thanks to Guy Almes and Stan Borinski for their corrections and additions to these notes. [Please note that any errors are mine, and I'd appreciate corrections being forwarded to me. The first version of this document is now available at the following URL: http://rrdb.merit.edu/nanaog.octminutes.html]

Elise Gerich opened the meeting with Merit's current understanding of the state of the transition. THENET, CERFNET and MICHNET have expressed specific dates for transition. The current NSFNET contract with Merit will terminate on April 30, 1995.

John Scudder then discussed some modeling he and Sue Hares have done on the projected load at the NAPs. The basic conclusions are that the FDDI technology (at Sprint) will be saturated sometime next year and that load-balancing strategies among NSPs across the NAPs are imperative for the long-term viability of the new architecture. John also expressed concern over the lack of an expressed policy for the collection of statistical data by the NAP operators. All of the NAP operators were present and stated that they will collect data, but that there are serious and open questions concerning the privacy of that data and how to publish it appropriately. John said that collecting the data was the most important step: without the data, there is no source information from which publication becomes possible. He said that Merit/NSFNET had already tackled these issues; perhaps the NAP operators can use this previous work as a model to develop their own policies for publication.

After the break, Paul Vixie discussed the current status of the DNS and BIND, specifically DNS security. There are two reasons why the DNS is not secure. There are two papers on this topic, and both are included in the current BIND kit, so the information is freely available. Consider the case of telneting across the Internet and getting what appears to be your machine's login banner. Doing a double check (host->address, then address->host) will help eliminate this problem (see the sketch at the end of this section). hosts.equiv and .rhosts are also sources of problems. Polluting the cache is a real problem. UDP flooding is another problem. CERT says that doing rlogin is bad, but that does not solve the cache pollution problem.

How to defend?
1. Validate the packets returned in response to a query. Routers should drop UDP packets whose source address doesn't match what it should be (e.g., a UDP packet that arrives on a WAN link when it should have arrived via an Ethernet interface). TCP is harder to spoof because of the three-way handshake; however, running all DNS queries over TCP would add too much overhead.
2. There are a number of static validations of packet format that can be done. Adding some kind of cryptographic information to the DNS would help. Unfortunately, this moves very slowly because there are a number of strong conflicting opinions.

What is being done? The current beta of BIND has almost everything fixed that can be fixed without a new protocol. Versions prior to 4.9 are no longer supported. Paul may rewrite the server in the future, but it will still be called named, because vendors have a hard time putting it into their releases if it is called something else. Paul is funded half-time by the Internet Software Consortium. Rick Adams funds it via UUNET's non-profit side. Rick did not want to put it under GNU.
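[To illustrate the forward/reverse double check Paul described, here is a minimal sketch in Python; the address is a placeholder and the function name is mine, not anything from BIND:]

    import socket

    def double_check(addr):
        # address -> host: what does the PTR record claim?
        # host -> address: does the forward lookup agree?
        # Disagreement may indicate a spoofed or polluted DNS answer.
        try:
            host, _, _ = socket.gethostbyaddr(addr)
            _, _, addrs = socket.gethostbyname_ex(host)
        except OSError:
            return None
        return host if addr in addrs else None

    # Hypothetical usage: validate the peer of an incoming connection.
    print(double_check("192.0.2.1"))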
ISC is also now running a root server, and in doing so some specific issues related to running root servers are being addressed in fixes to BIND. DNS version 2 is being discussed; this is due to the limit on the size of the UDP packet. Paul M. and Paul V. are working to say something about this at the next IETF. HP, Sun, DEC and SGI are working with Paul to adopt the 4.9.3 BIND once it is production-ready. After this comes out, Paul will start working on other problems. One problem is the size of BIND in core; addressing it will include using the Berkeley db routines to feed BIND from a disk-based database. There will also be some effort toward doing load balancing better and perhaps implementing policy features. What about service issues? Providing name service is a start. DEC and SGI will ship BIND 4.9.3 with their next releases. Paul has talked to Novell, but no one else; Novell has not been helpful on the non-Unix side.

RA Project: Merit and ISI, with a subcontract to IBM. ISI does the Route Server development and the RA futures work; Merit does the Routing Registry databases and network management.

The Global Routing Registry consists of the RADB, various private routing registries, RIPE and APNIC. The RADB will be used to generate route server configurations and potentially router configurations.

  1993           -- RIPE 81
  1994           -- PRIDE tools
  April 1994     -- Merit Routing Registry
  September 1994 -- RIPE-181
  October 1994   -- RIPE-181 software implementation
  November 1994  -- NSP policy registrations/route server configurations

Why use the RADB? Troubleshooting, connectivity, stability.

The Route Servers, by ISI with IBM: they facilitate routing information exchange; they don't forward packets. There are two at each NAP with one AS number. They provide route selection and distribution on behalf of clients (NSPs). [Implemented by replicating gated's single routing table: each replicated table is a view.] Multiple views support clients with dissimilar route selection and/or distribution policies. BGP4 and the BGP4 MIB are supported. The RS's AS is inserted in the AS path; MED is passed unmodified (this appears controversial). Yakov said that Cisco has a hidden feature to ignore AS_PATH and trust MED. The Route Servers are up and running on a testbed and have been tested with up to 8 peers and 5 views. The target ship date to 3 NAPs is October 21; the fourth will soon follow.

The network management aspect of the RA project uses a hierarchically distributed network management model. At the NAP there is only local NM traffic; the model externalizes NAP problems; SNMPv1 and SNMPv2 are supported. OOB access provides seamless PPP backup and console port access. The remote debugging environment is identical to the local debugging environment. The centralized network management system at Merit polls distributed rovers for problems and consolidates them into the ROC (Routing Operations Center) alert screen. The ROC has been operational since August 1st and is operated by University of Michigan Network Systems at the same location as the previous NSFNET NOC. This group currently provides support for MichNet and UMnet and is expected to provide service to CICnet. It provides 24/7 human operator coverage. Everything should be operational by the end of November.

Routing futures: the Route Server decouples packet forwarding from routing information exchange, for scalability and modularity. For example, explicit routing will be supported (with the development of ERP). IPv6 support will be provided.
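[For reference, a hypothetical route object in roughly the RIPE-181 style that the RADB will hold; the prefix, AS number and maintainer here are invented, and the exact attribute set should be checked against the RIPE-181 document itself:]

    route:   198.51.100.0/24
    descr:   Example Net (hypothetical)
    origin:  AS64512
    mnt-by:  MAINT-EXAMPLE
    changed: noc@example.net 941025
    source:  RADB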
Analysis of the RRDB is being done to define a general policy language (backward compatible with RIPE 181). Routing policy consistency checking and aggregation will be developed.

Securing the route servers: all of the usual standard mechanisms are being applied -- single-use passwords, MAC-layer bridges, etc. How do we keep the routes from being corrupted intentionally? Denial-of-service attacks are possible. A design document on the route server will be available via the RRDB.MERIT.EDU WWW server.

There is serious concern about synchronization of the route servers and the routing registries. No solution has been implemented yet; Merit believes it will do updates at least once a day. There was mention of using rwhois from the InterNIC as a possible way to configure the routing DB by pooling local information.

Conversion from PRDB to RRDB: the PRDB is AS 690 specific, NACR-based, updated twice weekly, and AUP constrained. The RADB has none of these features. Migration will occur before April of 1995. The PRDB will temporarily be part of the Global Routing Registry during the transition. Real soon now: continue to send NACRs, and they will be entered into both the PRDB and the RRDB. Consistency checking will be more automated. Output for AS 690 will be generated from both and compared to check consistency. While this is happening, users will do what they always have. [Check ftp.ra.net for more information.]

There is a lot of concern among the NANOG participants about the correctness of the information in the PRDB. Specifically, some of the information appears to be inaccurate (homeas). ESnet has a special concern about this. [Operators should send mail to dsj@merit.edu to fix the missing homeas problem.]

Transition plan:
1. Continue submitting NACRs.
2. Start learning RIPE 181 (see the example object above).
3. Set/confirm your AS's Maintainer object for future security.
4. Switch to using route templates (in December).

When it all works, the RADB will be the source for the AS 690 configuration, NACRs will go away, and local registries will be used. The RADB will generate the AS 690 configuration in the second week of December; NACRs die at the end of that week.

European Operators' Forum overview -- Peter Lothberg [I missed this, so this information is from Stan Borinski.] Peter provided some humorous, yet interesting, observations on the status of the Internet in Europe. To show the tremendous growth occurring in Europe as well, he gave an example: after being out of capacity on their Stockholm E1 link for some time, they finally installed another. It took one day for it to reach capacity! Unfortunately, the E1 costs $700,000/year. [Back to my notes... -- Stan Barber]

Proxy aggregation -- CIDR, by Yakov Rekhter. Assumptions: the volume of routing information must be matched to the available resources, while still providing connectivity -- on a per-provider basis; and the amount of resources must be matched to the utility of the routing information -- on a per-provider basis. But what about "MORE THRUST?" It's not a good answer: it drives costs up, doesn't help with the complexity of operations, and eliminates small providers.

Proxy aggregation is a mechanism to allow aggregation of routing information originated by sites that are BGP-4 incapable. The problem with proxy aggregation is that full consensus must exist for it to work. Local aggregation aims to reconnect the entity that benefits from the aggregation with the party that creates the aggregation; bilateral agreements would control the disposition of local aggregation. Doing the aggregation at exit is better, but harder, than doing it at entry.
Potential candidates for local aggregation: a longer prefix in the presence of a shorter prefix, adjacent CIDR blocks, and aggregation over known holes. Routing in the presence of local aggregation: the AS and router that did the aggregation are identified via BGP (the AGGREGATOR attribute) and should be registered in the RRDB.

Summary: adding more memory to routers is not an answer. Regionals should aggregate their own CIDR blocks. An NSP may do local aggregation and register it in the RRDB. Optimal routing and large-scale routing are mutually exclusive. CIDR is the only known technique to provide scalable routing in the Internet. A large Internet and the ability of every site to control its own routing are mutually exclusive. Yakov also noted that 64MB routers won't last as long as IPv4.

[More notes from Stan Borinski, while I was out again.]

Ameritech NAP Labs, by Andy Schmidt. Ameritech performed tests with RFC 1323 kernel modifications on Sun SPARC machines. A window of 32k was enabled at line speed. The AT&T switch used by Ameritech has buffers that are orders of magnitude larger than other vendors'. All studies discussed showed bigger buffers were the key to realizing ATM's performance capabilities.

[Back to my notes -- Stan Barber]

Sprint network reengineering -- Sean Doran. A T-3 network with sites in DC, Atlanta, Ft. Worth and Stockton currently; it will be expanding to Seattle, Chicago and the Sprint NAP in the next several months. ICM uses this network for transit from one coast to the other; they expect to create a separate ICM transit network early next year.

Next NANOG will be at NCAR in February.

PacBell NAP status -- Frank Liu. The switch is a Newbridge 36-150. NSFNET/ANS is connected via Hayward today, as is MCINET via Hayward, and PB Labs via Concord. Sprintlink will connect via San Jose (not yet connected). NETCOM will connect via Santa Clara in the next month. APEX Global Information Services (based in Chicago) will connect via Santa Clara, but has not yet. The Packet Clearing House (a consortium for small providers) is connected via Frame Relay to the PB NAP; they will connect via one router to the NAP. It is being led by Electric City's Chris Allen. CIX connections are also in the cloud, but not in the same community yet.

Testing was done by Bellcore and PB. [TTCP was used for testing. The data was put up and removed quickly, so I did lose some in taking notes.] The configurations tested: one source (TAXI/SONET) -> one sink; two sources (TAXI/SONET) -> one sink; five sources (Ethernet-connected) -> one sink (Ethernet-connected).

Equipment issues: a DSU HSSI clock mismatch with the data rate (37 Mbps HSSI clock versus 44 Mbps data rate versus a theoretical 52 Mbps). Sink devices do not have enough processing power to deal with large numbers of 512-byte packets. Also, there were MSS mismatch issues between the SunOS machines (512 bytes) and the Solaris machines (536 bytes) used.

  One source -> one sink (throughput out of 40 Mb/sec):
    MSS   Window  Throughput
    4470  51000   33.6
    4470  25000   22.33

  Two sources -> one sink:
    MSS   Window  Throughput
    4470  18000   33.17  (.05% cell loss, .04% packet retrans)
    1500  51000   15.41  (.69% cell loss, 2.76% packet retrans)

Conclusions: maximum throughput is 33.6 Mbps for the 1:1 connection. Maximum throughput will be higher when the DSU HSSI clock and data-rate mismatch is corrected. The cell loss rate is low (.02% -- .69%). Throughput degraded when the TCP window size was greater than 13000 bytes. Large switch buffers and router traffic shaping are needed. [The results appear to show TCP's backing-off strategy engaging.]
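[For context on the window sizes above: the steady-state throughput of a single TCP flow is bounded by the window size divided by the round-trip time. A small sketch in Python; the RTT values are illustrative, since the NAP tests were local and thus had almost no delay:]

    # A TCP flow can have at most one window of data in flight per RTT.
    def max_tcp_throughput_mbps(window_bytes, rtt_seconds):
        return window_bytes * 8 / rtt_seconds / 1e6

    for rtt in (0.002, 0.070):
        print("%g s RTT: %.1f Mb/s" % (rtt, max_tcp_throughput_mbps(13000, rtt)))
    # A 13000-byte window allows ~52 Mb/s at a 2 ms RTT,
    # but only ~1.5 Mb/s at a 70 ms cross-country RTT.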
Future service plan of the SF NAP -- Chin Yuan. Currently, the NAP does best-effort service with RFC 1490 encapsulation.

March 1995 -- Variable Bit Rate, sub-rate tariff (4, 10, 16, 25, 34 and 40 Mbps on DS3; 51, 100 and 140 Mbps on OC3c). At the CPE: static traffic shaping and RFC 1483 and 1577 support. [Traffic shaping is to be supported by Cisco later this year in the AIP card for both OC3c and T3.]

June 1995 -- Support for DS1 ATM (DXI and UNI at 128 kbps, 384 kbps and 1.4 Mbps).

1996 or later -- Available Bit Rate and SVCs. At the CPE: dynamic traffic shaping.

Notes on Variable Bit Rate. Sustainable Cell Rate (SCR) and Maximum Burst Size (MBS):
* Traffic policing
* Aggregated SCR no greater than the line rate
* MBS = 32, 100 or 200 cells (negotiable if > 200 cells)
Peak Cell Rate (possible):
* PCR <= line rate
Traffic shaping will be required for the more advanced services. Available Bit Rate will require feedback to the router.

ANS on performance -- Curtis Villamizar. There are two problems: aggregation of lower-speed TCP flows, and support for high-speed elastic supercomputer applications. RFC 1191 (Path MTU discovery) and RFC 1323 (high-performance extensions for TCP) are both very important for addressing these problems. Previous work showed that the top speed for TCP was 30 Mb/s. The new work covers single TCP flows and multiple TCP flows, using TCP RED modifications (more Van Jacobson magic!) to handle mixed window sizes. Environment: two different DS3 paths (NY->MICH: 20 msec; NY->TEXAS->MICH: 68 msec), four different versions of the RS6000 router software, and Indy/SCs. Conditions: two background conditions (no background traffic; a reverse TCP flow intended to achieve 70-80% utilization) and differing numbers of TCP flows. Results are available on-line via HTTP; temporarily at http://tweedledee.ans.net:8001:/ and more permanently on rrdb.merit.edu. It is important that vendors support RED and the two RFCs previously mentioned to handle this problem. Also, Curtis believes that the results presented by the NAP operators have little validity because their tests include no delay component.

ATM -- what Tim Salo wants from ATM... [I ran out of alertness, so I apologize to Tim for having extremely sketchy notes on this talk.] MAGIC -- gigabit testbed. Currently local-area ATM switches over SONET, mostly FORE switches. LAN encapsulation (ATM Forum) versus RFC 1577.

Stan   | Academ Consulting Services        | internet: sob@academ.com
Olan   | For more info on academ, see this | uucp: bcm!academ!sob
Barber | URL- http://www.academ.com/academ | Opinions expressed are only mine.
Stan: Regarding the NAP traffic projections and implications for FDDI NAPs... Sprint is in the process of acquiring a FDDI switch and plans to have it installed and operational in the next couple of months. High-traffic NSPs can then be moved over to the switch, giving 100 (or slightly more) Mbps capacity pairwise. Switch monitoring and management will be integrated with existing NAP management operations. Thanks for the notes...

-- Bilal (San Diego Supercomputer Center)
Thanks for the notes! My comments are belated responses to the participants, since I was unable to be at the meeting.
> Throughput degraded when the TCP window size was greater than 13000 bytes.
We never use a maximum window size this small. Our system default is 32k, which is 1.5 times the actual pipe size for a T1-connected site at 120 ms. This is near optimal for typical users on the West Coast, who are one or two T1 hops away from the current NSFnet, and slightly too aggressive for typical users on the East Coast. But it is an order of magnitude too small for many of our users in Boston, San Francisco, Champaign-Urbana, etc. Furthermore, I believe that a number of vendors are shipping workstations with large default window sizes, including SGI IRIX 5.2 on all platforms and OSF/1 for the DEC Alpha. A 13000-byte maximum window size is insufficient.

I would like to "second" Curtis' remarks about the impact of round-trip delay on traffic burstiness. The essence of the problem is that TCP controls the total amount of data out in the network, but has no control over the distribution of data within one round-trip time. Slow start and the "turbulence" effects discussed in Lixia Zhang's paper on two-way traffic (SIGCOMM '92) tend to maximize this burstiness.

I have recently become aware of a weaker criterion for success that should also be considered. If you imagine an infinite-bandwidth network with finite delay and loss rates, TCP will run at some finite rate determined by the delay, MTU and loss rate. A quick back-of-the-envelope calculation (neglecting many possibly important terms) yields:

    BW = MTU/RTT * sqrt(1.5/Loss)

or, for the BW to be congestion controlled,

    Loss < 1.5 * (MTU/RTT/BW)**2

(please excuse the fortran ;-)

So for Curtis to reach 40 Mb/s with a 4k MTU and 70 ms RTT, the TOTAL END-to-END loss must have been less than 0.02% of the packets. Since each packet would be about 1000 cells..... To reach 10 Mb/s with a 1500-byte MTU, the same path needs to have better than a 0.05% end-to-end loss rate. PSC did a demo with LBL at NET'91, on a two-month-old T3 NSFnet (actually running at half T3), where we achieved near these rates (the RTT was only 20 ms, so the loss might have been as high as 0.12%).

Practical experience suggests that this calculation is not pessimistic enough, and that actual loss rates must be significantly better. For one thing, it assumes an absolutely state-of-the-art TCP (Tahoe is not good enough!); otherwise performance drops by at least an order of magnitude.

--MM--
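[A quick sketch checking Matt's arithmetic in Python; it assumes a 4096-byte MTU for the "4k" case:]

    # Loss bound for TCP to remain congestion controlled:
    #   Loss < 1.5 * (MTU / (RTT * BW))**2, with MTU converted to bits.
    def loss_bound(mtu_bytes, rtt_s, bw_bps):
        return 1.5 * (mtu_bytes * 8.0 / (rtt_s * bw_bps)) ** 2

    # 40 Mb/s, 4k MTU, 70 ms RTT: allowed loss ~0.02%.
    print("%.4f%%" % (100 * loss_bound(4096, 0.070, 40e6)))
    # 10 Mb/s, 1500-byte MTU, same path: allowed loss ~0.04% (< 0.05%).
    print("%.4f%%" % (100 * loss_bound(1500, 0.070, 10e6)))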