Re: BGP announcements and small providers
At 10:50 AM 2/25/97 -0500, Paul Ferguson wrote:
Well, without naming names, the prefix-length based filtering is done on non-customer routes. A byproduct of this is it grudgingly encourages aggregation.
Well, yes, but now that multiple providers are doing this the fact that they are non-customer filters affects anyone who is not a customer of BOTH providers, thus further encouraging people to aggregate. I would not mind seeing these filters become more prevalent, making it unreasonable for people to become customers of everyone who filters to get around the filters. Renumbering is NOT that hard folks, and it DOES help. Justin Newton Network Architect Erol's Internet Services ISP/C Director at Large
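As an illustration of the prefix-length filtering discussed above, here is a minimal sketch, not any provider's actual configuration. The cutoff of /19 and the names `MAX_NON_CUSTOMER_PREFIXLEN` and `accept_route` are assumptions made for the example; the thread does not say which length any given provider filters at.

```python
import ipaddress

# Assumed cutoff for illustration only; the thread names no specific value.
MAX_NON_CUSTOMER_PREFIXLEN = 19


def accept_route(prefix: str, from_customer: bool,
                 max_len: int = MAX_NON_CUSTOMER_PREFIXLEN) -> bool:
    """Return True if an announcement passes a prefix-length filter.

    Customer routes are always accepted. Non-customer routes are accepted
    only if they are no more specific than max_len.
    """
    network = ipaddress.ip_network(prefix, strict=True)
    if from_customer:
        return True
    return network.prefixlen <= max_len


if __name__ == "__main__":
    for prefix, customer in [("10.1.0.0/16", False),
                             ("192.0.2.0/24", False),
                             ("192.0.2.0/24", True)]:
        print(prefix, "customer" if customer else "non-customer",
              accept_route(prefix, customer))
```

Because the filter is applied only to non-customer routes, a /24 announced by a customer is still accepted, which is why being a customer of every filtering provider would get around it.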
At 10:50 AM 2/25/97 -0500, Paul Ferguson wrote:
Well, without naming names, the prefix-length based filtering is done on non-customer routes. A byproduct of this is it grudgingly encourages aggregation.
Well, yes, but now that multiple providers are doing this the fact that they are non-customer filters affects anyone who is not a customer of BOTH providers, thus further encouraging people to aggregate. I would not mind seeing these filters become more prevalent, making it unreasonable for people to become customers of everyone who filters to get around the filters. Renumbering is NOT that hard folks, and it DOES help.
Justin Newton Network Architect Erol's Internet Services ISP/C Director at Large
Renumbering is not that hard IF you are an end-user and it only affects one or two links (ie: a /24 or two). It is atrociously difficult if you're an ISP and have sold sizeable connections to end-users, some of them with significant installed base (ie: a school system with a dozen buildings across a metro area and a thousand or more systems, along with the infrastructure to interconnect them). It also will disrupt service if you're serving web pages or doing other things that require stable IP numbers.
In general I agree that renumbering in the general case for end customers isn't that big a deal. However, for ISPs there are significant legal and operational issues raised by being forced to renumber due to a change in provider relationships. Actions which operational groups such as NANOG take that cause those hardships are, in my opinion, dangerous on a business level.
-- -- Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity http://www.mcs.net/~karl | T1's from $600 monthly to FULL DS-3 Service | 99 Analog numbers, 77 ISDN, Web servers $75/mo Voice: [+1 312 803-MCS1 x219]| Email to "info@mcs.net" WWW: http://www.mcs.net/ Fax: [+1 312 803-4929] | 2 FULL DS-3 Internet links; 400Mbps B/W Internal
Knowing that NSPs are filtering /24s, how does an Internet Content Provider (ICP) with just a /24 (all that is needed) that is wishing to be dual-homed see all of the net? Tim -- Internet: pozar @ kumr.lns.com Snail: Tim Pozar / LNS / 1978 45th Ave / San Francisco CA 94116 / USA POTS: +1 415 665 3790 Radio: KC6GNJ / KAE6247
They don't; that's the whole point.
DS
On Tue, 25 Feb 1997, Tim Pozar wrote:
Knowing that NSPs are filtering /24s, how does an Internet Content Provider (ICP) with just a /24 (all that is needed) that is wishing to be dual-homed see all of the net?
Tim -- Internet: pozar @ kumr.lns.com Snail: Tim Pozar / LNS / 1978 45th Ave / San Francisco CA 94116 / USA POTS: +1 415 665 3790 Radio: KC6GNJ / KAE6247
Knowing that NSPs are filtering /24s, how does an Internet Content Provider (ICP) with just a /24 (all that is needed) that is wishing to be dual-homed see all of the net?
Why even use a /24? Here is a "netstat -nr" from an interface default client, which has an RFC1597 private network for its content server and a BSD/OS 2.1 squid accelerator front-ending it.

    Destination        Gateway            Flags  Refs  Use       Interface
    default:de1        137.39.63.225      UGS    1     0         de1
    default:de2        204.74.120.1       UGS    1     0         de2
    default            137.39.63.225      UGS    1523  15365222  de1
    127                127.0.0.1          UGRS   0     0         lo0
    127.0.0.1          127.0.0.1          UH     11    6482      lo0
    137.39.63.224/27   link#2             UC     0     0         de1
    137.39.63.225      0:0:c:35:29:a0     UHL    1     307       de1
    137.39.63.227      0:0:f8:1:a5:8e     UHL    0     16        de1
    137.39.63.228      0:a0:24:94:5b:e9   UHL    0     3         de1
    137.39.63.255      link#2             UHL    0     1         de1
    192.168.1          link#1             UC     0     0         de0
    192.168.1.1        0:0:f8:2:b3:66     UHL    1     20        lo0
    192.168.1.2        8:0:69:2:65:e7     UHL    2     793220    de0
    192.168.1.255      link#1             UHL    1     206       de0
    204.74.120/27      link#3             UC     0     0         de2
    204.74.120.31      link#3             UHL    0     1         de2
    224/8              link#1             UC     0     0         de0

The diffs are all PD and should apply OK against other BSDish systems. I gave a more detailed talk about this at SF NANOG. The diffs are also quite short.

    % ftp ftp.vix.com
    ftp> cd pub/vixie/ifdefault
    ftp> ls
    -rw-rw-r--  1 716  ten  1731 Jan 31 06:15 ifconfig-diffs
    -rw-rw-r--  1 716  ten  5386 Jan 31 05:59 kernel-diffs
    -rw-rw-r--  1 716  ten  3696 Jan 31 06:23 netstat-diffs

You also need to set up a "socket" forwarder for things you want to be handled by the private-net device:

    telnet     stream  tcp  nowait  nobody  /usr/libexec/tcpd    socket 192.168.1.2 23
    other-ssl  stream  tcp  nowait  nobody  /usr/libexec/socket  socket 192.168.1.2 145

There's a small amount of sendmail.cf work needed to masquerade as the private host and relay mail between the different address spaces.
Knowing that NSPs are filtering /24s, how does an Internet Content Provider (ICP) with just a /24 (all that is needed) that is wishing to be dual-homed see all of the net?
Swamp /24, or use most of a /18|/19 underutilized, or better use more intelligence than "just" BGP - for instance Paul Vixie's stuff at the last NANOG. Alex Bligh Xara Networks
On Tue, 25 Feb 1997, Alex.Bligh wrote: [...]
Swamp /24, or use most of a /18|/19 underutilized, or better use more
Given current address allocation policies, how are you supposed to go about getting a /19 to waste in the first place?
intelligence than "just" BGP - for instance Paul Vixie's stuff at the last NANOG.
Paul's solution certainly adds more reliability as long as the clients doing the connecting do the right thing WRT rotating through the A records for your servers. As far as I can tell, it still doesn't do anything to solve the problem of choosing the "best" interface for the connection to happen on. Obviously the definition of "best" is up for debate, but if the squid machine was doing BGP, there would at least be some path optimization done.
As it is, if the interface-defaulted squid machine was dual-homed to providers X and Y that don't peer, a customer of X could get the A record for the interface in Y's space. The client would then have to take the transit path between X and Y, which for many X's and Y's, sucks. If the dual-homed machine was doing BGP, the customer of X would always use the interface on X's side, and vice versa.
Of course, we all know that we need to aggregate, shrink the routing table, shrink peer lists, etc., and Paul's solution certainly wins in that respect.
-- Matt Ranney - mjr@ranney.com
This is how I sign all my messages.
Navigator versions up to and including 3.01 (excluding a special release version made for @home) do not rotate through the A records. If the first one fails they don't bother with the rest. MSIE 3.0 does time out and try later addresses (although it seems to have a silly bug where it displays the wrong address in the status line at the bottom).
Navigator also exhibits another problem (I'm not sure if MSIE has this problem or not, it probably does). It caches DNS results forever. I've renumbered web servers, done the TTL game, and seen traffic at the old IPs three weeks after the change "should" have propagated everywhere and the old records timed out everywhere. I'm guessing people running Unix boxes who don't restart their browser...
Assuming you don't have an address 10.10.10.10 on your network you can try <http://rotate.arctic.org/~dgaudet/blank>.
Dean
On Tue, 25 Feb 1997, Matt Ranney wrote:
On Tue, 25 Feb 1997, Alex.Bligh wrote:
[...]
Swamp /24, or use most of a /18|/19 underutilized, or better use more
Given current address allocation policies, how are you supposed to go about getting a /19 to waste in the first place?
intelligence than "just" BGP - for instance Paul Vixie's stuff at the last NANOG.
Paul's solution certainly adds more reliability as long as the clients doing the connecting do the right thing WRT rotating through the A records for your servers. As far as I can tell, it still doesn't do anything to solve the problem of choosing the "best" interface for the connection to happen on. Obviously the definition of "best" is up for debate, but if the squid machine was doing BGP, there would at least be some path optimization done.
As it is, if the interface-defaulted squid machine was dual-homed to providers X and Y that don't peer, a customer of X could get the A record for the interface in Y's space. The client would then have to take the transit path between X and Y, which for many X's and Y's, sucks. If the dual-homed machine was doing BGP, the customer of X would always use the interface on X's side, and vice versa.
Of course, we all know that we need to aggregate, shrink the routing table, shrink peer lists, etc., and Paul's solution certainly wins in that respect. -- Matt Ranney - mjr@ranney.com
This is how I sign all my messages.
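The difference Dean describes between clients that give up after the first A record and clients that move on to later ones can be sketched roughly as follows. This is an illustration of the client-side fallback idea only, not how any browser is implemented; `first_reachable` and `tcp_connect` are hypothetical names, and the connection attempt is passed in as a function so the logic can be exercised without a network.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_connect(address: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Try a TCP connection; return True on success, False on any socket error."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(addresses: Iterable[str],
                    try_connect: Callable[[str], bool] = tcp_connect
                    ) -> Optional[str]:
    """Walk the A records in the order received and return the first that answers.

    A client that only ever uses the first record is the special case
    where this loop stops after one iteration.
    """
    for address in addresses:
        if try_connect(address):
            return address
    return None


if __name__ == "__main__":
    # Simulated connector: pretend the first address is unreachable.
    reachable = {"192.0.2.20"}
    print(first_reachable(["10.10.10.10", "192.0.2.20"],
                          lambda a: a in reachable))
```

With a dual-homed server that publishes one address per provider, this fallback is what lets a client still reach the server when one provider's path is down.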
On Wed 26 Feb, Matt Ranney wrote:
As it is, if the interface-defaulted squid machine was dual-homed to providers X and Y that don't peer, a customer of X could get the A record for the interface in Y's space. The client would then have to take the transit path between X and Y, which for many X's and Y's, sucks.
You could take in all the BGP data from your providers (read-only as it were) then link that into your DNS server so that it returns an IP address according to the 'best' (however you define that...) route that you have back to them...? aid -- Adrian J Bool | mailto:aid@u-net.net Network Operations | http://www.u-net.net/ U-NET Ltd | tel://44.1925.484461/
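Adrian's suggestion could be sketched as a longest-prefix lookup that maps the querying address to the server address on the "best" provider. Everything here is hypothetical: the route table, the provider labels X and Y, and the documentation-range addresses are made up for illustration, and a real version would have to be fed from live BGP data. Note that replies later in this thread raise objections to the approach, including that the DNS server sees the address of the querying cache rather than the end client.

```python
import ipaddress

# Hypothetical routes "learned" from each provider: (prefix, provider).
ROUTES = [
    (ipaddress.ip_network("203.0.113.0/25"), "X"),
    (ipaddress.ip_network("203.0.113.128/25"), "Y"),
    (ipaddress.ip_network("0.0.0.0/0"), "X"),  # fallback when nothing more specific matches
]

# Hypothetical server address on each provider's link.
LOCAL_ADDRESS = {"X": "192.0.2.10", "Y": "198.51.100.10"}


def answer_for(querier_ip: str) -> str:
    """Return the server address on the provider with the most specific route
    back to the querier (longest-prefix match)."""
    querier = ipaddress.ip_address(querier_ip)
    matches = [(net, provider) for net, provider in ROUTES if querier in net]
    _, provider = max(matches, key=lambda m: m[0].prefixlen)
    return LOCAL_ADDRESS[provider]


if __name__ == "__main__":
    for ip in ("203.0.113.5", "203.0.113.200", "192.0.2.77"):
        print(ip, "->", answer_for(ip))
```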
On Wed, 26 Feb 1997, Adrian Bool wrote:
On Wed 26 Feb, Matt Ranney wrote:
As it is, if the interface-defaulted squid machine was dual-homed to providers X and Y that don't peer, a customer of X could get the A record for the interface in Y's space. The client would then have to take the transit path between X and Y, which for many X's and Y's, sucks.
You could take in all the BGP data from your providers (read-only as it were) then link that into your DNS server so that it returns an IP address according to the 'best' (however you define that...) route that you have back to them...?
aid
I think the whole purpose of Paul V's madness in creating this solution was to avoid doing just that. Also I seem to remember something in Paul's take that took care of this situation. Maybe he can elaborate.
Geoff White Virtual Sites netmaster@v-site.net
As it is, if the interface-defaulted squid machine was dual-homed to providers X and Y that don't peer, a customer of X could get the A record for the interface in Y's space. The client would then have to take the transit path between X and Y, which for many X's and Y's, sucks.
Yes. But there are a lot of other reasons for specific paths to suck, and I don't consider this one to be detectable, or even representative.
You could take in all the BGP data from your providers (read-only as it were) then link that into your DNS server so that it returns an IP address according to the 'best' (however you define that...) route that you have back to them...?
No. DNS does not guarantee meaningful ordering of RRs within RRsets, and except in the case of MX or SRV which have explicit priorities, clients are free to sort or randomize or even subset the A RRset they get back. Also, consider caches that are used directly or indirectly by clients whose addresses may not be ideal for the original ordering, even if ordering were preserved. Server ordering of A RRsets is just not a useful approach.
I think the whole purpose of Paul V's madness in creating this solution was to avoid doing just that.
My approach avoids the use of BGP, but not for the above stated reasons. As I said at the SF NANOG, it is hard to get transit providers to send a full BGP table, it is hard to accept it, and it would take a modified GateD that randomized destinations in order to keep BGP's path selection from leading 90% of your routes down 1/Nth of your transit providers. BGP was the wrong answer.
Also I seem to remember something in Paul's take that took care of this situation. Maybe he can elaborate.
Just as DNS Round Robin is suboptimal but better than nothing, so it is that first hop path symmetry is not a complete solution but far better than a single static default when buying transit from N providers. At some point I will revisit the interface default logic and add round robin selection for outbound TCP sessions -- obviously the first hop path symmetry trick only works for inbound sessions. None of this will ever lead to optimality but it's more robust than a single connection from any provider I know of, and it's better than randomizing BGP or depending on stable ordering in DNS. The next step after outbound round robin is teaching Squid to keep track of connection quality from clients, and to send back HTTP redirects if a client comes in on the "wrong" interface. This is the only way to fix the MSIE and NSN brain damage whereby only the first A RR of a response is used, and when I get this part done it will be a product rather than a giveaway like the interface default stuff. (Yes, I have enough investors and customers for this, thanks for your interest.)
On Wed, 26 Feb 1997, Paul A Vixie wrote: [...]
My approach avoids the use of BGP, but not for the above stated reasons. As I said at the SF NANOG, it is hard to get transit providers to send a full BGP table, it is hard to accept it, and it would take a modified GateD that randomized destinations in order to keep BGP's path selection from leading 90% of your routes down 1/Nth of your transit providers. BGP was the wrong answer.
Were the majority of your paths going down one single provider because of a silly tie-breaker like the numeric value of the IP address of the peer, or was it because that provider had a shorter AS path? If it's the latter, where's the problem? If one provider has a better path and you aren't out of bandwidth on the connection to that provider, why would you want to take a different path?
-- Matt Ranney - mjr@ranney.com
This is how I sign all my messages.
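The two cases Matt distinguishes can be illustrated with a heavily simplified path comparison. This is a sketch of only the two criteria named in the question (AS path length, then the numeric value of the peer address); a real BGP implementation compares several other attributes first. The `Path` class, the `best_path` function, and the private AS numbers are assumptions for the example; the peer addresses are the two gateways from the netstat output earlier in the thread.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Path:
    peer: str                  # address of the neighbor that sent the route
    as_path: Tuple[int, ...]   # AS numbers the route has traversed


def best_path(paths: List[Path]) -> Path:
    """Prefer the shortest AS path; break ties on the lowest peer address."""
    return min(paths, key=lambda p: (len(p.as_path),
                                     int(ipaddress.ip_address(p.peer))))


if __name__ == "__main__":
    tie = [Path("204.74.120.1", (64512, 64513)),
           Path("137.39.63.225", (64514, 64515))]
    shorter = [Path("204.74.120.1", (64512,)),
               Path("137.39.63.225", (64514, 64515))]
    print("equal AS path length:", best_path(tie).peer)
    print("one shorter AS path: ", best_path(shorter).peer)
```

When AS path lengths are equal for most destinations, the tie-breaker always favors the same peer, which is one way a single provider can end up carrying most of the routes.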
On Tue, 25 Feb 1997, Alex.Bligh wrote:
[...]
Swamp /24, or use most of a /18|/19 underutilized, or better use more
Given current address allocation policies, how are you supposed to go about getting a /19 to waste in the first place?
I get my address space from RIPE not InterNIC so I'm no InterNIC expert, but I believe having a distinct routing policy was justification for a /19|/18 sized block reserved to assign from (I think someone on this list suggested you have to be at an IXP too). So you just have the /24 registered, but can announce the larger block. It's (roughly) the same at RIPE.
table, shrink peer lists, etc., and Paul's solution certainly wins in that respect.
Re the interface thing, I think all you need to do is make sure that on the squid machine the same output interface is used as the input interface for any given connection. As the interfaces have different IPs, this looks to me like some trivial source routing (as this is the IP address that will appear as the source on the return packet). (Obviously this doesn't do quite the same as running BGP.) I didn't actually see all of Paul's presentation so he may have a better solution.
So I think his solution or variants thereof win in most respects :-)
Alex Bligh Xara Networks
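The idea Alex describes, sending replies out the interface whose address is the packet's source, can be sketched in user space as a lookup from source address to first hop. This is only an illustration of the selection rule, not the kernel diffs mentioned earlier, whose contents are not shown in this thread. The interface names and gateways are taken from the netstat output above; `first_hop_for_source` and the fallback choice are assumptions.

```python
import ipaddress
from typing import List, Tuple

# (local subnet, interface, per-interface default gateway), per the
# "default:de1" and "default:de2" entries in the netstat output above.
INTERFACE_DEFAULTS: List[Tuple[ipaddress.IPv4Network, str, str]] = [
    (ipaddress.ip_network("137.39.63.224/27"), "de1", "137.39.63.225"),
    (ipaddress.ip_network("204.74.120.0/27"), "de2", "204.74.120.1"),
]

# Assumed fallback: the plain "default" route, which points at de1.
FALLBACK_DEFAULT = ("de1", "137.39.63.225")


def first_hop_for_source(source_ip: str) -> Tuple[str, str]:
    """Pick the (interface, gateway) matching the packet's source address,
    so a reply leaves through the provider it arrived from."""
    source = ipaddress.ip_address(source_ip)
    for network, interface, gateway in INTERFACE_DEFAULTS:
        if source in network:
            return interface, gateway
    return FALLBACK_DEFAULT


if __name__ == "__main__":
    for src in ("137.39.63.230", "204.74.120.5", "192.168.1.1"):
        print(src, "->", first_hop_for_source(src))
```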
On Tue, 25 Feb 1997, Tim Pozar wrote:
Knowing that NSPs are filtering /24s, how does an Internet Content Provider (ICP) with just a /24 (all that is needed) that is wishing to be dual-homed see all of the net?
Paul Vixie had one solution to this problem that he presented at NANOG. However, I wonder if he has a write-up of this scheme that we could refer people to? Michael Dillon - Internet & ISP Consulting Memra Software Inc. - Fax: +1-250-546-3049 http://www.memra.com - E-mail: michael@memra.com
Michael Dillon wrote:
On Tue, 25 Feb 1997, Tim Pozar wrote:
Knowing that NSPs are filtering /24s, how does an Internet Content Provider (ICP) with just a /24 (all that is needed) that is wishing to be dual-homed see all of the net?
Paul Vixie had one solution to this problem that he presented at NANOG. However, I wonder if he has a write-up of this scheme that we could refer people to?
I saw it there. May be a solution... Thanks. Tim -- Internet: pozar @ kumr.lns.com Snail: Tim Pozar / LNS / 1978 45th Ave / San Francisco CA 94116 / USA POTS: +1 415 665 3790 Radio: KC6GNJ / KAE6247
participants (11)
- Adrian Bool
- Alex.Bligh
- David Schwartz
- Dean Gaudet
- Geoff White
- Justin W. Newton
- Karl Denninger
- Matt Ranney
- Michael Dillon
- Paul A Vixie
- Tim Pozar