I got some messages from people who weren't exactly clear on how anycast works and fails. So let me try to explain...

In IPv6, there are three ways to address a packet: one-to-one (unicast), one-to-many (multicast), or one-to-any (anycast). Like multicast addresses, anycast addresses are shared by a group of systems, but a packet addressed to the group address is only delivered to a single member of the group. IPv6 has "round robin ARP" functionality that allows anycast to work on local subnets.

Anycast DNS is a very different beast. Unlike IPv6, IPv4 has no specific support for anycast, and the point here is to distribute the group address very widely, rather than over a single subnet anyway. So what happens is that a BGP announcement that covers the service address is sourced in different locations, and each location is basically configured to think it's the "owner" of the address.

The idea is that BGP will see the different paths towards the different anycast instances, and select the best one. Now note that the only real benefit of doing this is reducing the network distance between the users and the service. (Some people cite DoS benefits but DoSsers play the distribution game too, and they're much better at it.)

Anycast is now deployed for a significant number of root and gTLD servers. Before anycast, most of those servers were located in the US, and most of the rest of the world suffered significant latency in querying them. Due to limitations in the DNS protocol, it's not possible to increase the number of authoritative DNS servers for a zone beyond around 13. With anycast, a much larger part of the world now has regional access to the root and com and net zones, and probably many more that I don't know about.

However, there are some issues. The first one is that different packets can end up at different anycast instances. This can happen when BGP reconverges after some network event (or after an anycast instance goes offline and stops announcing the anycast prefix), but under some very specific circumstances it can also happen with per packet load balancing. Most DNS traffic consists of single packets, but the DNS also uses TCP for queries sometimes, and when intermediate MTUs are small there may be fragmentation.

Another issue is the increased risk of fate sharing. In the old root setup, it was very unlikely for a non-single-homed network to see all the root DNS servers behind the same next hop address. With anycast, this is much more likely to happen. The pathological case is one where a small network connects to one or more transit networks and has local/regional peering, and then sees an anycast instance for all root servers over peering. If something bad then happens to the peering connection (the peering router melts down, a peer pulls an AS7007, the peering fabric goes down, or worse, starts flapping), all the anycasted addresses become unreachable at the same time.

Obviously this won't happen to the degree of full unreachability in practice (well, unless there are only two addresses that are both anycast for a certain TLD, then your mileage may vary), but even if 5 or 8 or 12 addresses become unreachable, the timeouts get bad enough for users to notice.

The 64000 ms timeout query is: at what point do the downsides listed above (along with troubleshooting hell) start to overtake the benefit of better latency? I think the answer lies in the answers to these three questions:

- How good is BGP at selecting the lowest latency path?
- How fast is BGP convergence?
- What percentage of queries go to the first or fastest server in the list?
Iljitsch van Beijnum wrote:
Due to limitations in the DNS protocol, it's not possible to increase the number of authoritative DNS servers for a zone beyond around 13.
I believe you misspelled, "Due to people who do not understand the DNS protocol being allowed to configure firewalls..."

--
Crist J. Clark    crist.clark@globalstar.com
Globalstar Communications    (408) 933-4387
In message <41C222C3.9020906@globalstar.com>, Crist Clark writes:
Iljitsch van Beijnum wrote:
Due to limitations in the DNS protocol, it's not possible to increase the number of authoritative DNS servers for a zone beyond around 13.
I believe you misspelled, "Due to people who do not understand the DNS protocol being allowed to configure firewalls..."
No, firewalls have nothing to do with it. Section 4.2.1 of RFC 1035 says:

    Messages carried by UDP are restricted to 512 bytes (not counting the IP
    or UDP headers).

There's a large installed base of machines that conform to that limit and don't understand EDNS0. I'll leave the packet layout and arithmetic as an exercise for the reader (cheaters may want to run tcpdump on 'dig ns .' and examine the result), but the net result is what Iljitsch said: you can only fit about 13 servers into a response.

--Steve Bellovin, http://www.research.att.com/~smb
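For the non-cheaters, here is a rough version of that arithmetic as a Python sketch. It assumes the classic a.root-servers.net through m.root-servers.net naming, IPv4 glue only, and standard RFC 1035 name compression; the byte counts are estimates, not output from any particular server:

    # Back-of-the-envelope size of a root priming response ('dig ns .')
    # under RFC 1035 rules with name compression. Byte counts are estimates.

    HEADER = 12                  # fixed DNS header
    QUESTION = 1 + 2 + 2         # root name "." (1 byte) + QTYPE + QCLASS

    # Answer section: one NS RR per server; the owner name "." is 1 byte.
    # The first RDATA spells out "a.root-servers.net." in full:
    #   2 ("a") + 13 ("root-servers") + 4 ("net") + 1 (root) = 20 bytes.
    # Later RDATAs are one label plus a 2-byte compression pointer: 4 bytes.
    RR_FIXED = 1 + 2 + 2 + 4 + 2 # owner + TYPE + CLASS + TTL + RDLENGTH
    FIRST_NS = RR_FIXED + 20
    LATER_NS = RR_FIXED + 4

    # Additional section: one A RR per server; the owner name is a 2-byte
    # pointer and the RDATA is a 4-byte IPv4 address.
    A_RR = 2 + 2 + 2 + 4 + 2 + 4

    def priming_size(n):
        return (HEADER + QUESTION
                + FIRST_NS + (n - 1) * LATER_NS
                + n * A_RR)

    print(priming_size(13))      # 436 -- fits in 512 with headroom
    print(priming_size(16))      # 529 -- over the RFC 1035 UDP limit

With 13 servers the response comes to about 436 bytes; a couple more would still squeeze under 512, but 13 leaves headroom for other data in the response, hence "about 13".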
Steven M. Bellovin wrote:
In message <41C222C3.9020906@globalstar.com>, Crist Clark writes:
Iljitsch van Beijnum wrote:
Due to limitations in the DNS protocol, it's not possible to increase the number of authoritative DNS servers for a zone beyond around 13.
I believe you misspelled, "Due to people who do not understand the DNS protocol being allowed to configure firewalls..."
No, firewalls have nothing to do with it. Section 4.2.1 of RFC 1035 says:
Messages carried by UDP are restricted to 512 bytes (not counting the IP or UDP headers).
There's a large installed base of machines that conform to that limit and don't understand EDNS0. I'll leave the packet layout and arithmetic as an exercise for the reader (cheaters may want to run tcpdump on 'dig ns .' and examine the result), but the net result is what Iljitsch said: you can only fit about 13 servers into a response.
Into a UDP response. A resolver will receive the first 512 bytes of the truncated response and may then use TCP to get the complete response... unless there is a firewall blocking 53/tcp in the way. But how often does that happen?

The root servers sustaining the ensuing SYN flood is another issue.

--
Crist J. Clark    crist.clark@globalstar.com
Globalstar Communications    (408) 933-4387
On Thu, 16 Dec 2004 17:18:12 PST, Crist Clark said:
Into a UDP response. A resolver will receive the first 512 bytes of the truncated response and may then use TCP to get the complete response... unless there is a firewall blocking 53/tcp in the way. But how often does that happen?
You're new here, aren't you? ;)

It happens *all* *the* *time* (probably just as often as sites that block all ICMP including 'frag needed' and wonder why PMTU Discovery breaks and connections hang).

The *real* operational problem is that almost 100% of the time that there's a firewall blocking 53/tcp, the person running the firewall is (a) unaware that it's blocking it and (b) doesn't even realize that DNS *can* use TCP....

Quite often, there's even a "(c) they don't even know they have a firewall" just to make things really interesting.
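A quick way to find out whether a given path has this problem is to ask the same question over UDP and then over TCP and compare. A minimal sketch, assuming the third-party dnspython library is installed; the target address (a.root-servers.net) is only an example:

    # Probe whether DNS over TCP works toward a given server.
    # Requires the third-party dnspython package (pip install dnspython).
    import dns.message
    import dns.query

    SERVER = "198.41.0.4"        # a.root-servers.net, as an example target

    query = dns.message.make_query(".", "NS")

    udp_reply = dns.query.udp(query, SERVER, timeout=5)
    print("UDP answered,", len(udp_reply.to_wire()), "bytes")

    try:
        tcp_reply = dns.query.tcp(query, SERVER, timeout=5)
        print("TCP answered,", len(tcp_reply.to_wire()), "bytes")
    except Exception as exc:
        # A timeout or connection refusal here, with UDP working,
        # usually means a middlebox is eating 53/tcp.
        print("TCP failed:", exc)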
Valdis.Kletnieks@vt.edu wrote:
On Thu, 16 Dec 2004 17:18:12 PST, Crist Clark said:
Into a UDP response. A resolver will receive the first 512 bytes of the truncated response and may then use TCP to get the complete response... unless there is a firewall blocking 53/tcp in the way. But how often does that happen?
It happens *all* *the* *time* (probably just as often as sites that block all ICMP including 'frag needed' and wonder why PMTU Discovery breaks and connections hang).
The *real* operational problem is that almost 100% of the time that there's a firewall blocking 53/tcp, the person running the firewall is (a) unaware that it's blocking it and (b) doesn't even realize that DNS *can* use TCP....
Quite often, there's even a "(c) they don't even know they have a firewall" just to make things really interesting.
One of the most common misconceptions I've encountered, and had heated debates about with some would-be admins, is the belief that the only "proper" use of 53/tcp for DNS is for zone transfers. For that reason they explicitly block 53/tcp in their firewalls. Same thing with that good old misconception that all forms of ICMP are evil and should be blocked.

Doug

--*-- Life would be so much easier if we only had the source code... -Anonymous --*--
On Thu, Dec 16, 2004 at 07:59:58PM -0500, Steven M. Bellovin wrote:
In message <41C222C3.9020906@globalstar.com>, Crist Clark writes:
Iljitsch van Beijnum wrote:
Due to limitations in the DNS protocol, it's not possible to increase the number of authoritative DNS servers for a zone beyond around 13.
I believe you misspelled, "Due to people who do not understand the DNS protocol being allowed to configure firewalls..."
No, firewalls have nothing to do with it. Section 4.2.1 of RFC 1035 says:
Messages carried by UDP are restricted to 512 bytes (not counting the IP or UDP headers).
There's a large installed base of machines that conform to that limit and don't understand EDNS0. I'll leave the packet layout and arithmetic as an exercise for the reader (cheaters may want to run tcpdump on 'dig ns .' and examine the result), but the net result is what Iljitsch said: you can only fit about 13 servers into a response.
Just because I feel like splitting hairs....

You're both right. As far as we (ISC) can tell, there are lots of resolvers that authoritative servers can't send big packets to because they don't grok EDNS0. There are also lots of resolvers that grok EDNS0 behind firewalls that don't. Big fun can occur when the resolver indicates EDNS0-compliance from behind such a firewall and keeps asking because it thinks it's not getting answers... For extra credit, try to deploy DNSSEC in this reality.

It's not for nothing that we speak of extending the DNS protocol as "rebuilding the airplane in flight" around here....
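That failure mode can be probed from the outside, too. A minimal sketch, again assuming dnspython; it advertises a 4096-byte EDNS0 buffer the way a modern resolver would, and the server address is only an example:

    # Check whether EDNS0 responses make it back through the path.
    # Requires the third-party dnspython package; example target only.
    import dns.exception
    import dns.message
    import dns.query

    SERVER = "198.41.0.4"        # example: a.root-servers.net

    # Advertise a 4096-byte receive buffer via an EDNS0 OPT record.
    query = dns.message.make_query(".", "NS", use_edns=0, payload=4096)

    try:
        reply = dns.query.udp(query, SERVER, timeout=5)
        print("EDNS0 reply:", len(reply.to_wire()), "bytes")
    except dns.exception.Timeout:
        # If a plain query works but this one times out, something between
        # here and the server is dropping the OPT record or any response
        # larger than 512 bytes.
        print("EDNS0 query timed out -- suspect a middlebox")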
To add, there are also a lot of edge appliances (Company C appliances that start with P) that by default drop DNS messages larger than 512 bytes without admins realizing it, hence EDNS gets actively blocked while normal DNS traffic works (this is a major issue for enterprise Windows admins.)

On Fri, 17 Dec 2004 01:54:43 +0000, Suzanne Woolf <Suzanne_Woolf@isc.org> wrote:
On Thu, Dec 16, 2004 at 07:59:58PM -0500, Steven M. Bellovin wrote:
In message <41C222C3.9020906@globalstar.com>, Crist Clark writes:
Iljitsch van Beijnum wrote:
Due to limitations in the DNS protocol, it's not possible to increase the number of authoritative DNS servers for a zone beyond around 13.
I believe you misspelled, "Due to people who do not understand the DNS protocol being allowed to configure firewalls..."
No, firewalls have nothing to do with it. Section 4.2.1 of RFC 1035 says:
Messages carried by UDP are restricted to 512 bytes (not counting the IP or UDP headers).
There's a large installed base of machines that conform to that limit and don't understand EDNS0. I'll leave the packet layout and arithmetic as an exercise for the reader (cheaters may want to run tcpdump on 'dig ns .' and examine the result), but the net result is what Iljitsch said: you can only fit about 13 servers into a response.
Just because I feel like splitting hairs....
You're both right. As far as we (ISC) can tell, there are lots of resolvers that authoritative servers can't send big packets to because they don't grok EDNS0. There are also lots of resolvers that grok EDNS0 behind firewalls that don't. Big fun can occur when the resolver indicates EDNS0-compliance from behind such a firewall and keeps asking because it thinks it's not getting answers....For extra credit, try to deploy DNSSEC in this reality.
It's not for nothing that we speak of extending the DNS protocol as "rebuilding the airplane in flight" around here....
On Thu, Dec 16, 2004 at 07:59:58PM -0500, Steven M. Bellovin <smb@research.att.com> wrote a message of 26 lines which said:
I'll leave the packet layout and arithmetic as an exercise for the reader
This has already been done :-)

http://w6.nic.fr/dnsv6/resp-size.html
On Fri, 17 Dec 2004, Iljitsch van Beijnum wrote:
I got some messages from people who weren't exactly clear on how anycast works and fails. So let me try to explain...
Nice try.
Anycast is now deployed for a significant number of root and gTLD servers. Before anycast, most of those servers were located in the US, and most of the rest of the world suffered significant latency in querying them. Due to limitations in the DNS protocol, it's not possible to increase the number of authoritative DNS servers for a zone beyond around 13. With anycast, a much larger part of the world now has regional access to the root and com and net zones, and probably many more that I don't know about.
Think of this also as a reliability measure. If a region of the world has poor connectivity to the so-called "Internet core" (Remember the Sri Lanka international fiber outage a few months ago?), a loss of international connectivity can mean a loss of DNS, which breaks even local connectivity.
However, there are some issues. The first one is that different packets can end up at different anycast instances. This can happen when BGP reconverges after some network event (or after an anycast instance goes offline and stops announcing the anycast prefix), but under some very specific circumstances it can also happen with per packet load balancing. Most DNS traffic consists of single packets, but the DNS also uses TCP for queries sometimes, and when intermediate MTUs are small there may be fragmentation.
You're misunderstanding how per-packet load balancing is generally used. Per-packet load balancing works very well when you've got two identical circuits between the same two routers, and you want to make sure neither circuit fills up while the other has spare capacity. Using per-packet load balancing on non-identical paths (in your example, out different peering or transit connections) doesn't work. Even when connecting to a unicast host, the packets would arrive out of order, leading to some really nasty performance problems. If anybody is using per-packet load balancing in that sort of situation, anycast DNS is the least of their problems.
Another issue is the increased risk of fate sharing. In the old root setup, it was very unlikely for a non-single-homed network to see all the root DNS servers behind the same next hop address. With anycast, this is much more likely to happen. The pathological case is one where a small network connects to one or more transit networks and has local/regional peering, and then sees an anycast instance for all root servers over peering. If something bad then happens to the peering connection (the peering router melts down, a peer pulls an AS7007, the peering fabric goes down, or worse, starts flapping), all the anycasted addresses become unreachable at the same time.
You appear to be assuming that every anycast server in the world announces routes for every anycasted address. The general anycast rule is that for however many anycasted IP addresses you have serving a zone, you have that many separate sets of anycast nodes. So, if you have a zone served by anyns1, anyns2, and anyns3, there will be a set of nodes that is anyns1, a set of nodes that is anyns2, and a set of nodes that is anyns3. Different servers, different routers, and probably different physical locations.

Are there scenarios where an outage would lead to a loss of all of the anycast clouds? Of course, but those scenarios would apply to Unicast servers as well.

The potentially valid point you've made is about switching servers during BGP convergence. As such, anycast might well be inappropriate for long term stateful connections. However, BGP reconvergence should be relatively rare, DNS queries finish quickly, and DNS is good about failing over to another DNS server IP address when a query fails. If your example is a network whose entire routing table is reconverging, and they're changing their routes to all the name servers for a particular zone, their network performance is going to be pretty bad until convergence finishes anyway.
Obviously this won't happen to the degree of full unreachability in practice (well, unless there are only two addresses that are both anycast for a certain TLD, then your mileage may vary), but even if 5 or 8 or 12 addresses become unreachable, the timeouts get bad enough for users to notice.
Right, but if you're losing 5 or 8 or 12 diverse routes at the same time, your problem probably has very little to do with anycast.

-Steve
On 17-dec-04, at 3:06, Steve Gibbard wrote:
under some very specific circumstances it can also happen with per packet load balancing.
You're misunderstanding how per-packet load balancing is generally used.
I wasn't saying anything about how per packet load balancing is generally used; the point is that it's possible for subsequent packets to end up at different anycast instances when a number of specific prerequisites exist. In short: a customer must pplb across two routers at the same ISP, and each of those routers must have different preferred paths to different anycast instances. This isn't going to happen often, but it's not impossible, and it's not bad engineering on the customer's or ISP's part if it does, IMO.
Using per-packet load balancing on non-identical paths (in your example, out different peering or transit connections) doesn't work.
That's right, because BGP only installs two or more routes when the path attributes are identical or nearly identical. However, the attributes may be different (different next hop, IGP metric, MED) inside the ISP network but the differences can then go away at the next hop.
Even when connecting to a unicast host, the packets would arrive out of order, leading to some really nasty performance problems. If anybody is using per-packet load balancing in that sort of situation, anycast DNS is the least of their problems.
Yes, this is why people are so terrified of per packet load balancing. Most of this fear is unfounded, though: the only way to get consistently out-of-order packets (a few here and there don't matter) is when the links in the middle have the same or lower effective bandwidth than the links at the source edge. And even then it will mostly happen for packets of different sizes.
You appear to be assuming that every anycast server in the world announces routes for every anycasted address.
No. I'm not concerned about what happens at the anycasted ends, it's the way it looks from any given vantage point throughout the network that matters.
Are there scenarios where an outage would lead to a loss of all of the anycast clouds? Of course, but those scenarios would apply to Unicast servers as well.
The assumption is that it's universally beneficial to see DNS addresses "close". While it is good to be able to see several addresses "close", it's better for redundancy when there are also some that are seen "far away", since when big failures happen, it's less likely that everything "close" _and_ everything "far away" is impacted at the same time.
Obviously this won't happen to the degree of full unreachability in practice (well, unless there are only two addresses that are both anycast for a certain TLD, then your mileage may vary), but even if 5 or 8 or 12 addresses become unreachable, the timeouts get bad enough for users to notice.
Right, but if you're losing 5 or 8 or 12 diverse routes at the same time, your problem probably has very little to do with anycast.
That's not the point. If without anycast this is better than with anycast, then this should go on the "con" list for anycast.
That's not the point. If without anycast this is better than with anycast, then this should go on the "con" list for anycast.
People often confuse two separate technical things here. One is the BGP anycast technique which allows anycasting to be used in an IPv4 network, and the other is the application of BGP anycasting to DNS in an IPv4 network. It would be clearer if people would prefix "anycast" with either BGP or DNS to make it clear which they are talking about. Conceivably there could be other applications that could be distributed using BGP anycast. And if those applications are designed knowing the quirks of BGP anycasting, then presumably they would have ways to overcome some of the issues that affect DNS.

I would reword your statement as follows.

... then this should go on the "con" list for DNS anycasting.

--Michael Dillon
On Fri, 17 Dec 2004 14:16:58 +0000 Michael.Dillon@radianz.com wrote:
That's not the point. If without anycast this is better than with anycast, then this should go on the "con" list for anycast.
People often confuse two separate technical things here. One is the BGP anycast technique which allows anycasting to be used in an IPv4 network, and the other is the application of BGP anycasting to DNS in an IPv4 network. It would be clearer if people would prefix "anycast" with either BGP or DNS to make
There is also MSDP anycasting, which is both pretty cool and close to best common practice for anyone running MSDP.

Regards
Marshall
it clear which they are talking about. Conceivably there could be other applications that could be distributed using BGP anycast. And if those applications are designed knowing the quirks of BGP anycasting then presumably they would have ways to overcome some of the issues that affect DNS.
I would reword your statement as follows.
... then this should go on the "con" list for DNS anycasting.
--Michael Dillon
Iljitsch van Beijnum wrote:
On 17-dec-04, at 3:06, Steve Gibbard wrote:
under some very specific circumstances it can also happen with per packet load balancing.
You're misunderstanding how per-packet load balancing is generally used.
I wasn't saying anything about how per packet load balancing is generally used; the point is that it's possible for subsequent packets to end up at different anycast instances when a number of specific prerequisites exist. In short: a customer must pplb across two routers at the same ISP, and each of those routers must have different preferred paths to different anycast instances. This isn't going to happen often, but it's not impossible, and it's not bad engineering on the customer's or ISP's part if it does, IMO.
You're wrong. That's VERY bad engineering!

PPLB requires 2 routers, one at each end of the link bundle. More than 1 router at any end will lead to a lot more problems than anycast, including multicast and any stateful protocol (like TCP). For one thing, the load balancing will be only in 1 direction, and will lead to congestion in the reverse path.... Self defeating.

--
William Allen Simpson
Key fingerprint = 17 40 5E 67 15 6F 31 26 DD 0D B9 9B 6A 15 2C 32
On 17-dec-04, at 15:27, William Allen Simpson wrote:
In short: a customer must pplb across two routers at the same ISP, and each of those routers must have different preferred paths to different anycast instances. This isn't going to happen often, but it's not impossible, and it's not bad engineering on the customer's or ISP's part if it does, IMO.
You're wrong. That's VERY bad engineering!
PPLB requires 2 routers, one at each end of the link bundle.
It doesn't really require that. Redundancy requires that the routers at the ends of two links both be different. Having one router at one end and two at the other is a good compromise in many situations.
More than 1 router at any end will lead to a lot more problems than anycast, including multicast and any stateful protocol (like TCP).
How many people run multicast exactly? And that's precisely the reason why multicast is a different SAFI, so you get to have different multicast and unicast routing.

As for TCP, it would be very useful if someone were to run the following experiment:

                    +-------+
                    |router2|
                    +-------+
                   /         \
+------+   +-------+         +-------+   +------+
|host a+---+router1|         |router4+---+host b|
+------+   +-------+         +-------+   +------+
                   \         /
                    +-------+
                    |router3|
                    +-------+

(Assuming the links to the hosts are (for instance) gigabit and the ones between the routers fast ethernet. If they're all the same speed you're only going to see out of order packets when the later packet is smaller than the earlier packet, which is inconsistent with a TCP session running at full blast.)

Setup #1: per destination load balancing from router1 to routers 2 and 3
Setup #2: per packet load balancing from router1 to routers 2 and 3

So setup #1 will get no reordering, but is limited to 100 Mbps. Setup #2 will see reordering, but has a total bandwidth of 200 Mbps. Which is going to perform better?
For one thing, the load balancing will be only in 1 direction, and will lead to congestion in the reverse path.... Self defeating.
Traffic patterns aren't the same in both directions in many/most cases, so one direction may be enough.
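A toy Python model of the experiment above. It only accounts for serialization delay on the middle links (assumed 100 Mbit/s each, with the gigabit ingress treated as instantaneous) and ignores queueing and propagation, so the numbers are purely illustrative:

    # Toy model of setup #1 vs setup #2: serialization delay on the middle
    # links only, equal propagation delay, no queueing at router4.
    FAST = 100e6                 # bits/sec per middle link

    def run(sizes, links):
        """Round-robin packets over `links` parallel links; return the time
        the last bit arrives and the order in which packets arrive."""
        free = [0.0] * links     # when each link next becomes idle
        arrivals = []
        for i, size in enumerate(sizes):
            link = i % links
            free[link] += size * 8 / FAST
            arrivals.append((free[link], i))
        arrivals.sort()
        return max(free), [i for _, i in arrivals]

    sizes = [1500] * 9 + [40]    # a full-MTU burst plus one small packet

    t1, order1 = run(sizes, 1)   # setup #1: a single flow stays on one path
    t2, order2 = run(sizes, 2)   # setup #2: per packet over two paths

    print("one link : %5.2f ms, in order: %s" % (t1 * 1e3, order1 == sorted(order1)))
    print("two links: %5.2f ms, in order: %s" % (t2 * 1e3, order2 == sorted(order2)))

In this model the two-link setup finishes in roughly half the time, and the only reordering is the small trailing packet overtaking the last full-size one, which matches the parenthetical in the experiment description.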
Iljitsch van Beijnum wrote:
It doesn't really require that. Redundancy requires that the routers at the ends of two links both be different. Having one router at one end and two at the other is a good compromise in many situations.
OK, now I'm sure you don't actually do any engineering.

In 25+ years, I've not found that router failure was a major or even interesting problem. Link failures are probably 80%. Upstream failures are probably another 5%, about the same as staff fumblefingers, power failures, and customer misconfiguration that somehow affects routing -- like the idiots with the 5 character password last week that got rooted and swamped their link so badly that BGP dropped.

I've lived through "inverse multiplexing", and BONDING, etc, etc.... Sure, I've had routers that had to be rebooted every week to overcome a slow memory leak. But you're not fixing that....

A redundant router should be where it would be doing some good -- on a diverse link to another upstream.

--
William Allen Simpson
Key fingerprint = 17 40 5E 67 15 6F 31 26 DD 0D B9 9B 6A 15 2C 32
On Fri, 17 Dec 2004, Iljitsch van Beijnum wrote:
As for TCP, it would be very useful if someone were to run the following experiment:

                    +-------+
                    |router2|
                    +-------+
                   /         \
+------+   +-------+         +-------+   +------+
|host a+---+router1|         |router4+---+host b|
+------+   +-------+         +-------+   +------+
                   \         /
                    +-------+
                    |router3|
                    +-------+
(Assuming the links to the hosts are (for instance) gigabit and the ones between the routers fast ethernet. If they're all the same speed you're only going to see out of order packets when the later packet is smaller than the earlier packet, which is inconsistent with a TCP session running at full blast.)
Now, let's say that your path through router 2 is several hundred, or maybe a few thousand, miles longer than your path through router 3. You are, after all, arguing that the paths are different enough that the packets are going to end up at different anycast hosts, which is generally equivalent to going into another network via a different exchange point.

Have you just come up with a way to overcome the speed of light, or are you arguing that doing per packet load balancing over paths with differences in latency of tens or hundreds of milliseconds wouldn't result in out of order packets?

-Steve
My question: I noticed that people always talk about BGP when they talk about an anycast DNS server farm. But is there any problem, or anything that must be taken care of, when anycast is employed within a DNS server farm within a MAN?

What I mean is: if we want to employ anycast in a cache server farm which is located within a big OSPF network, is there anything problematic? Or should we consider anycast only when a root server is to be installed?

Some people said it's not needed to set up anycast in a MAN because the DNS system in such a situation is very small (less than 10 Sun servers).

regards
Joe

--- Iljitsch van Beijnum <iljitsch@muada.com> wrote:
I got some messages from people who weren't exactly clear on how anycast works and fails. So let me try to explain...
In IPv6, there are three ways to address a packet: one-to-one (unicast), one-to-many (multicast), or one-to-any (anycast). Like multicast addresses, anycast addresses are shared by a group of systems, but a packet addressed to the group address is only delivered to a single member of the group. IPv6 has "round robin ARP" functionality that allows anycast to work on local subnets.
Anycast DNS is a very different beast. Unlike IPv6, IPv4 has no specific support for anycast, and the point here is to distribute the group address very widely, rather than over a single subnet anyway. So what happens is that a BGP announcement that covers the service address is sourced in different locations, and each location is basically configured to think it's the "owner" of the address.
The idea is that BGP will see the different paths towards the different anycast instances, and select the best one. Now note that the only real benefit of doing this is reducing the network distance between the users and the service. (Some people cite DoS benefits but DoSsers play the distribution game too, and they're much better at it.)
Anycast is now deployed for a significant number of root and gTLD servers. Before anycast, most of those servers were located in the US, and most of the rest of the world suffered significant latency in querying them. Due to limitations in the DNS protocol, it's not possible to increase the number of authoritative DNS servers for a zone beyond around 13. With anycast, a much larger part of the world now has regional access to the root and com and net zones, and probably many more that I don't know about.
However, there are some issues. The first one is that different packets can end up at different anycast instances. This can happen when BGP reconverges after some network event (or after an anycast instance goes offline and stops announcing the anycast prefix), but under some very specific circumstances it can also happen with per packet load balancing. Most DNS traffic consists of single packets, but the DNS also uses TCP for queries sometimes, and when intermediate MTUs are small there may be fragmentation.
Another issue is the increased risk of fate sharing. In the old root setup, it was very unlikely for a non-single-homed network to see all the root DNS servers behind the same next hop address. With anycast, this is much more likely to happen. The pathological case is one where a small network connects to one or more transit networks and has local/regional peering, and then sees an anycast instance for all root servers over peering. If something bad then happens to the peering connection (the peering router melts down, a peer pulls an AS7007, the peering fabric goes down, or worse, starts flapping), all the anycasted addresses become unreachable at the same time.
Obviously this won't happen to the degree of full unreachability in practice (well, unless there are only two addresses that are both anycast for a certain TLD, then your mileage may vary), but even if 5 or 8 or 12 addresses become unreachable, the timeouts get bad enough for users to notice.
The 64000 ms timeout query is: at what point do the downsides listed above (along with troubleshooting hell) start to overtake the benefit of better latency? I think the answer lies in the answers to these three questions:
- How good is BGP at selecting the lowest latency path?
- How fast is BGP convergence?
- What percentage of queries go to the first or fastest server in the list?
On 17-dec-04, at 11:23, Joe Shen wrote:
is there any problem, or anything that must be taken care of, when anycast is employed within a DNS server farm within a MAN?
What I mean is: if we want to employ anycast in a cache server farm which is located within a big OSPF network, is there anything problematic? Or should we consider anycast only when a root server is to be installed?
Since OSPF generally converges orders of magnitude faster and it's easier to get OSPF to select the highest bandwidth/lowest latency path than BGP, this should be easier. The problem of how to revoke the anycast route when the service goes away is pretty much the same. The benefits (especially latency) are also likely to be less, though.
Some people said it's not needed to set up anycast in a MAN because the DNS system in such a situation is very small (less than 10 Sun servers).
The only problem that is unsolvable without anycast is getting response times below 25 ms or so in every corner of the world. In all other cases it all depends on the pros and cons of different ways to get the job done.
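For the route-revocation part, the usual pattern is a small watchdog next to the name server: the anycast address stays configured, and thus redistributed into OSPF, only while the daemon actually answers. A rough Python sketch, assuming Linux with iproute2 and an IGP daemon that picks up loopback addresses; the address, interface, and timings are made-up examples:

    # Watchdog sketch: keep the anycast service address configured only
    # while the local name server actually answers queries. Assumes Linux
    # with iproute2 and an IGP daemon that redistributes loopback
    # addresses into OSPF. Address and interface are made-up examples.
    import socket
    import struct
    import subprocess
    import time

    ANYCAST = "192.0.2.53/32"    # hypothetical anycast service address
    IFACE = "lo"

    def dns_alive(addr="127.0.0.1", timeout=2):
        """Send a minimal 'NS .' query to the local server over UDP."""
        query = struct.pack(">6H", 0x1234, 0x0100, 1, 0, 0, 0) \
                + b"\x00" + b"\x00\x02" + b"\x00\x01"
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (addr, 53))
            sock.recvfrom(512)
            return True
        except OSError:
            return False
        finally:
            sock.close()

    def set_address(up):
        subprocess.call(["ip", "addr", "add" if up else "del",
                         ANYCAST, "dev", IFACE])

    state = None
    while True:
        alive = dns_alive()
        if alive != state:       # only touch the interface on transitions
            set_address(alive)
            state = alive
        time.sleep(5)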
On 17 Dec 2004, at 06:33, Iljitsch van Beijnum wrote:
On 17-dec-04, at 11:23, Joe Shen wrote:
is there any problem, or anything that must be taken care of, when anycast is employed within a DNS server farm within a MAN?
What I mean is: if we want to employ anycast in a cache server farm which is located within a big OSPF network, is there anything problematic? Or should we consider anycast only when a root server is to be installed?
Since OSPF generally converges orders of magnitude faster and it's easier to get OSPF to select the highest bandwidth/lowest latency path than BGP, this should be easier.
Also, be mindful of ECMP. http://www.isc.org/pubs/tn/isc-tn-2004-1.html http://www.isc.org/pubs/tn/isc-tn-2004-1.txt Joe
Hi,

That's what I want to discuss. The paper gives a very detailed explanation of anycast with OSPF ECMP, and what I want to know is: is there anything not included in it that must be considered carefully when an anycast cache server farm is to be established in a MAN? Will there be any problem with OSPF ECMP convergence? Are there any requirements on the choice of DNS software (BIND, CNS, PowerDNS, etc.)?

Consider such a situation: a big ISP wants to set up a hierarchical caching DNS service. It has several MANs interconnected by a backbone. Each MAN uses a reserved ASN, and the backbone has a public ASN and connects to each MAN with eBGP. Should BGP multipath be considered? Or should each MAN announce the same DNS server address block in each eBGP session? Will there be any possible problem in such a situation?

What I do care about is: convergence speed, reliability, load balancing within a cache server farm, load sharing between different cache server farms when one of them fails, and cost of administration.

Joe
Also, be mindful of ECMP.
http://www.isc.org/pubs/tn/isc-tn-2004-1.html http://www.isc.org/pubs/tn/isc-tn-2004-1.txt
Joe
On Fri, Dec 17, 2004 at 12:31:37AM +0100, Iljitsch van Beijnum <iljitsch@muada.com> wrote a message of 68 lines which said:
and then sees an anycast instance for all root servers over peering. If something bad then happens to the peering connection ... but even if 5 or 8 or 12 addresses become unreachable, the timeouts get bad enough for users to notice.
We can turn this into a Good Practice: do not put an instance of every root name server on any given exchange point.

Actually, this is only a theoretical issue; the current maximum seems to be only three (at the LINX in London).
On 17-dec-04, at 12:25, Stephane Bortzmeyer wrote:
but even if 5 or 8 or 12 addresses become unreachable the timeouts get bad enough for users to notice.
We can turn this into a Good Practice: do not put an instance of every root name server on any given exchange point.
Actually, this is only a theoretical issue; the current maximum seems to be only three (at the LINX in London).
Well, there may be only three "at" the LINX, but from where I'm sitting, 7 are reachable over the AMS-IX, 4 over ISP #1 and 1 over ISPs #2 and #3, respectively.

Interestingly enough, b, c, d and f all share this hop:

 6  portch1.core01.ams03.atlas.cogentco.com (195.69.144.124)

(195.69.144.0/23 is the AMS-IX exchange subnet.)
Iljitsch van Beijnum wrote:
Well, there may be only three "at" the LINX, but from where I'm sitting, 7 are reachable over the AMS-IX, 4 over ISP #1 and 1 over ISPs #2 and #3, respectively.
Interestingly enough, b, c, d and f all share this hop:
6 portch1.core01.ams03.atlas.cogentco.com (195.69.144.124)
(195.69.144.0/23 is the AMS-IX exchange subnet.)
I'm beginning to wonder whether you're just an agent provocateur.

Assuming that your link to AMS-IX fails, your redundant attachments to the world will provide the reachability to those same 7 via your other links. All that shows is those 7 are topologically closer via that path. You don't seem to have 13 paths. So?

FWIW, I see all DNS roots via the same BellSouth path. IFF BellSouth fails, I'm sure that other paths will pick up the slack. I'm not worried, because I've experienced BellSouth failures in the past, and I've tested dropping each of my links from time to time to ensure that routing works and I'm getting what I'm paying for....

Do you actually do any engineering, or just kibitzing?

--
William Allen Simpson
Key fingerprint = 17 40 5E 67 15 6F 31 26 DD 0D B9 9B 6A 15 2C 32
i don't think iljitsch is in a position to teach an "anycast 101" class. here's my evidence:

--------
From: Paul Vixie <paul@vix.com>
To: dnsop@lists.uoregon.edu
Subject: Re: [dnsop] Re: Root Anycast (fwd)
X-Mailer: MH-E 7.4; nmh 1.0.4; GNU Emacs 21.3.1
Date: Mon, 04 Oct 2004 22:26:18 +0000
Sender: vixie@sa.vix.com
X-Evolution: 0000020e-0000

note-- harald asked us to move this thread off of ietf@, so i've done that. iljitsch added ietf@ back to the headers in his reply to me. i'm taking it back off again. iljitsch, please leave it off, respecting harald's wishes.
... It's possible for bad things to happen if:
1. some DNS server is anycast (TLD servers are worse than roots because the root zone is so small)
2. fragmented UDP packets or TCP are used as a transport
3. a network is built such that packets entering it through router X may prefer a different external link towards a certain destination than packets entering it through router Y
4. a customer of this network is connected to two different routers
5. the customer enables per packet load balancing
#1 and #2 are normal, even though fragmented udp isn't very common nowadays. #3 is extremely common. #4 is normal for high-end customers. and #5 will only affect customers whose ISP shares an IGP with the anycast -- in other words, "other customers of the same ISP". if this problem erupts, the ISP will take care of it. it's not an internet-level (BGP-level) problem at all.
Now the question is: how do we deal with this? I don't think removing anycast wholesale makes sense and/or is feasible. Same thing for declaring per packet load balancing an evil practice.
as i said the other day, "all power tools can kill." if you turn on PPLB and it hurts, then turn it off until you can read the manual or take a class or talk to an expert. PPLB is a link bundling technology. if you turn it on in non-parallel-path situation, it will hurt you, so, "don't do that."
A better solution would be to give network operators something that enables them to make sure load balancing doesn't happen for anycasted destinations. A good way to do this would be having an "anycast" or "don't load balance" community in BGP, or publication of a list of ASes and/or prefixes that shouldn't be load balanced because the destinations are anycast.
since PPLB won't affect BGP (since BGP is not multipath by default), this is not an issue.
and they would know that PPLB is basically a link bundling technology used when all members of the PPLB group start and end in the same router-pair;
It doesn't make much sense to have multiple links terminate on the same router on both ends as then both these routers become single points of failure.
i don't even know what conversation we're in any more. why does it matter whether they are single points of failure, if this is the configuration for which PPLB was intended? if you have two 155Mbit/sec links and you want to be able to treat them as a 310Mbit/sec link rather than upgrading them to a single 622Mbit/sec link, then PPLB is a godsend. otherwise, don't use it.
Often, the end sending out most traffic will have the links terminate on one router (so load balancing is possible) while the other ends of the links terminate on two or more routers.
there are other safe configurations for PPLB, to be sure. turning it on toward your transits and doing PPLB among two default routes, and turning it on toward your transits and turning on BGP multipath, are not two of them. the fact that an unsafe configuration can be built using PPLB is not news, since as we all know, "all power tools can kill."

--
Paul Vixie
On 17-dec-04, at 19:43, Paul Vixie wrote:
i don't think iljitsch is in a position to teach an "anycast 101" class.
If anyone feels they can do better, please step up...
here's my evidence:
note-- harald asked us to move this thread off of ietf@, so i've done that. iljitsch added ietf@ back to the headers in his reply to me. i'm taking it back off again. iljitsch, please leave it off, respecting harald's wishes.
Hey! I missed this one. I'm on dnsop but it's pretty low on my to-read list. Unfortunately, your evidence contains its share of errors so I'm not sure if you should be teaching the class either.
... It's possible for bad things to happen if:
1. some DNS server is anycast (TLD servers are worse than roots because the root zone is so small)
2. fragmented UDP packets or TCP are used as a transport
3. a network is built such that packets entering it through router X may prefer a different external link towards a certain destination than packets entering it through router Y
4. a customer of this network is connected to two different routers
5. the customer enables per packet load balancing
#1 and #2 are normal, even though fragmented udp isn't very common nowadays. #3 is extremely common. #4 is normal for high-end customers. and #5 will only affect customers whose ISP shares an IGP with the anycast -- in other words, "other customers of the same ISP".
Nope. Consider:

+------+   +-------+   +-------+
|      +---+ISPrtr1+---+ACinstA|
|source|   +-------+   +-------+
|      |   +-------+   +-------+
|      +---+ISPrtr2+---+ACinstB|
+------+   +-------+   +-------+

Where the anycast instances exchange routing information using BGP. If there is no special BGP configuration in effect, ISPrtr1 will prefer the path to anycast instance A and ISPrtr2 the path to instance B, because an external path takes precedence over a path of the same length that's learned over iBGP.

The current Cisco multipath BGP rules require the whole AS path to be the same (which would be the case in this diagram if both anycast instances use the same AS number), but older IOSes only require the next hop AS and the path length to be the same.
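The tie-break doing the work in that diagram can be shown in a few lines of Python. This is a simplified sketch of the relevant slice of the BGP decision process (real implementations have more steps, and details vary by vendor); the path attributes below are made-up examples:

    # Sketch of the BGP tie-breaks that matter in the diagram above.
    # Real implementations have more steps; only the relevant ones kept.
    def best_path(paths):
        return min(
            paths,
            key=lambda p: (
                -p["local_pref"],       # higher local-pref wins
                p["as_path_len"],       # shorter AS path wins
                0 if p["ebgp"] else 1,  # eBGP beats iBGP -- the step that bites here
                p["igp_metric"],        # closer exit wins
                p["router_id"],
            ),
        )

    # As seen from ISPrtr1: a direct eBGP path to instance A, and an iBGP
    # path to instance B learned from ISPrtr2. Same local-pref, same AS path.
    paths = [
        {"name": "A", "local_pref": 100, "as_path_len": 1, "ebgp": True,
         "igp_metric": 0, "router_id": 1},
        {"name": "B", "local_pref": 100, "as_path_len": 1, "ebgp": False,
         "igp_metric": 10, "router_id": 2},
    ]
    print(best_path(paths)["name"])  # "A": ISPrtr1 exits locally; ISPrtr2 mirrors this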
Now the question is: how do we deal with this? I don't think removing anycast wholesale makes sense and/or is feasible. Same thing for declaring per packet load balancing an evil practice.
as i said the other day, "all power tools can kill." if you turn on PPLB and it hurts, then turn it off until you can read the manual or take a class or talk to an expert. PPLB is a link bundling technology. if you turn it on in non-parallel-path situation, it will hurt you, so, "don't do that."
Yes, per packet load balancing will cause reordering, and if that's an issue you shouldn't use it. But if with pplb packets end up at two different hosts, that's not the fault of the people who invented per packet load balancing or the people who turned it on, but the fault of the people giving the same address to two different hosts.
A better solution would be to give network operators something that enables them to make sure load balancing doesn't happen for anycasted destinations. A good way to do this would be having an "anycast" or "don't load balance" community in BGP, or publication of a list of ASes and/or prefixes that shouldn't be load balanced because the destinations are anycast.
since PPLB won't affect BGP (since BGP is not multipath by default), this is not an issue.
If the uncommon network setup exists, and pplb is turned on, the problem can manifest itself. The fact that someone had to turn on a feature that's turned off by default is immaterial. (There is no BGP by default to begin with.)
and they would know that PPLB is basically a link bundling technology used when all members of the PPLB group start and end in the same router-pair;
It doesn't make much sense to have multiple links terminate on the same router on both ends as then both these routers become single points of failure.
i don't even know what conversation we're in any more. why does it matter whether they are single points of failure, if this is the configuration for which PPLB was intended?
There is no requirement that all packets between two hosts follow the same path. So people who pplb have the IP architecture on their side, unlike those who implement anycast. So a little less blaming the victim would be in order. (Well, if there are any victims, because all of this happening is pretty unlikely.)
as i said the other day, "all power tools can kill." if you turn on PPLB and it hurts, then turn it off until you can read the manual or take a class or talk to an expert. PPLB is a link bundling technology. if you turn it on in non-parallel-path situation, it will hurt you, so, "don't do that."
Yes, per packet load balancing will cause reordering, and if that's an issue you shouldn't use it. But if with pplb packets end up at two different hosts, that's not the fault of the people who invented per packet load balancing or the people who turned it on, but the fault of the people giving the same address to two different hosts.
since i already know that Iljitsch isn't listening, i'm not interested in debating him further. i would be interested in hearing from anybody else who thinks that turning on pplb in an eyeball-centric isp that has multiple upstream paths is a reasonable thing to do, even if there were no anycast services deployed anywhere in the world. at the moment i am completely certain that turning on pplb would be an irrational act, and would have a significant performance-dooming effect on a client population behind it, and that the times when pplb would actually be useful and helpful are very limited, and that anycast doesn't even enter into the reasons why doing as Iljitsch paints would be a bad idea.

but my mind is open, if anyone can speak from experience on the matter.

--
Paul Vixie
Paul Vixie wrote:
since i already know that Iljitsch isn't listening, i'm not interested in debating him further. i would be interested in hearing from anybody else who thinks that turning on pplb in an eyeball-centric isp that has multiple upstream paths is a reasonable thing to do, even if there were no anycast services deployed anywhere in the world. at the moment i am completely certain that turning on pplb would be an irrational act, and would have a significant performance-dooming effect on a client population behind it, and that the times when pplb would actually be useful and helpful are very limited, and that anycast doesn't even enter into the reasons why doing as Iljitsch paints would be a bad idea.
but my mind is open, if anyone can speak from experience on the matter.
I concur -- it's not reasonable.

We debated these issues to death about PPP multi-link, which could be thought of as some variant of a single node talking to 2 (or more) disparate routers (NAS's are routers, after all). Can't depend on all the links attaching to the same dial-in NAS. Various companies developed protocols to make the NAS's look like a single router -- otherwise, it wouldn't work.

Plenty of experience. Has nothing to do with anycast.

--
William Allen Simpson
Key fingerprint = 17 40 5E 67 15 6F 31 26 DD 0D B9 9B 6A 15 2C 32
vixie@vix.com (Paul Vixie) (hey, that's me!) wrote:
as i said the other day, "all power tools can kill." if you turn on PPLB and it hurts, then turn it off until you can read the manual or take a class or talk to an expert. PPLB is a link bundling technology. if you turn it on in non-parallel-path situation, it will hurt you, so, "don't do that."
Iljitsch replied as follows:
Yes, per packet load balancing will cause reordering, and if that's an issue you shouldn't use it. But if with pplb packets end up at two different hosts, that's not the fault of the people who invented per packet load balancing or the people who turned it on, but the fault of the people giving the same address to two different hosts.
i then bypassed Iljitsch and went to the gallery:
i would be interested in hearing from anybody else who thinks that turning on pplb in an eyeball-centric isp that has multiple upstream paths is a reasonable thing to do, even if there were no anycast services deployed anywhere in the world.
so far, no takers. i've heard from rfc-writers who say pplb was never meant to be used the way Iljitsch is describing it, and i've heard from equipment vendors who say their customers don't do that and that if some customer did that and asked for support the response would be "don't do that!", and i've heard from network operators who say they would never do that, and i've heard from customers of network operators who did that with notable bad effects. but so far nobody has said "yes, what Iljitsch is describing should work."

let me summarize. Iljitsch says that pplb is incompatible with anycast, since a pplb-using access router at the inner edge of an ISP could hear two different IGP routes to some destination, which ended up taking different exits from the ISP and thus different BGP paths. whereas pplb would normally only operate on equal-cost paths, the BGP->IGP path would hide the variance in BGP paths and make these "paths" eligible for pplb.

i've said that pplb is only useful for turning two OC3's into an "OC6" (or similar circuit bundling where a pair of routers has multiple connections to each other) and that even in this case, packet reordering is likely to occur, which will make tcp-flow performance suffer across this "link".

i have also said that turning pplb on across non-parallel links, such as to multiple providers or through multiple tunnels or whatever, would pretty much guarantee that a word rhyming with "massive suckage" would occur. and i've made these claims independent of anycast -- that is, life will be bad if you use pplb outside its intended purpose, even if nobody anywhere was using anycast.

loath though i am to treat a "preponderance of assertion" as equivalent to "proof", i see no alternative on this issue. no one is defending the use case Iljitsch is proposing. no one is even saying "i tried that and it was OK". lots of people are saying various things like "don't do that!" and "are you crazy?"

it's important to point out a third time that it's indeed possible that Iljitsch's proposed use case for pplb would interact badly with anycast, and that i'm not arguing against that assertion. i'm saying that the pplb configuration proposed by Iljitsch would have really bad consequences even if no one, anywhere on the internet, was using anycast. and so we return to yesterday's statement:
at the moment i am completely certain that turning on pplb would be an irrational act, and would have a significant performance-dooming effect on a client population behind it, and that the times when pplb would actually be useful and helpful are very limited, and that anycast doesn't even enter into the reasons why doing as Iljitsch paints would be a bad idea.
and i'll repeat, again:
but my mind is open, if anyone can speak from experience on the matter.
and, "good luck storming the castle, boys." -- Paul Vixie
On 18-dec-04, at 22:31, Paul Vixie wrote:
i would be interested in hearing from anybody else who thinks that turning on pplb in an eyeball-centric isp that has multiple upstream paths is a reasonable thing to do, even if there were no anycast services deployed anywhere in the world.
so far, no takers. i've heard from rfc-writers who say pplb was never meant to be used the way Iljitsch is describing it, and i've heard from equipment vendors who say their customers don't do that and that if some customer did that and asked for support the response would be "don't do that!", and i've heard from network operators who say they would never do that, and i've heard from customers of network operators who did that with notable bad effects.
but so far nobody has said "yes, what Iljitsch is describing should work."
Apparently you also didn't get any pointers to RFCs or other authoritative sources that say "each and every packet injected into the internet must be delivered in sequence".

You feel you get to decide what other people should and shouldn't do. I find that dangerous. As long as there is no standard or law that says something can't be done, people are free to do it.

Apart from that, I'm not convinced per packet load balancing is as bad as people keep saying. In the absence of any research that I know of, my position is that per packet load balancing does have potential adverse effects, so per destination load balancing is preferred, but if there is a reason why pdlb doesn't fit the bill, pplb is a reasonable choice.
let me summarize. Iljitsch says that pplb is incompatible with anycast,
No. What I'm saying in general is that anycast isn't 100% problem free, so:

1. There should always be non-anycast alternatives
2. It would be good if we had a way (= BGP community) to make sure that anycasted routes aren't load balanced across

I don't think either of these is unreasonable.
since a pplb-using access router at the inner edge of an ISP could hear two different IGP routes to some destination, which ended up taking different exits from the ISP and thus different BGP paths.
I'm not even sure if I understand this sentence, but it sure doesn't look like something I said. What I said was that if you inject packets towards an anycasted address into two different routers within a certain AS, there is a very real possibility these two packets will end up at different anycast instances. I'm on very firm ground here as this follows directly from the BGP path selection rules. (Although in real life this wouldn't happen too often because customers tend to connect to two routers in the same or neighboring pops.)
whereas pplb would normally only operate on equal-cost paths, the BGP->IGP path would hide the variance in BGP paths and make these "paths" eligible for pplb.
Again: huh?
i've said that pplb is only useful for turning two OC3's into an "OC6" (or similar circuit bundling where a pair of routers has multiple connections to each other) and that even in this case, packet reordering is likely to occur, which will make tcp-flow performance suffer across this "link".
But would the TCP performance over this "OC6 link" be better than that over a single OC3 link? That's the real question.
i have also said that turning pplb on across non-parallel links, such as to multiple providers or through multiple tunnels or whatever, would pretty much guarantee that a word rhyming with "massive suckage" would occur. and i've made these claims independent of anycast -- that is, life will be bad if you use pplb outside its intended purpose, even if nobody anywhere was using anycast.
Your argument is that since it's a bad idea to do this, nobody will, so making it even worse is ok. My argument is that even though it's a bad idea, some people will do it, so we shouldn't unnecessarily make things worse and/or should make a reasonable effort to repair the damage.
loath though i am to treat a "preponderance of assertion" as equivalent to "proof", i see no alternative on this issue. no one is defending the use case Iljitsch is proposing. no one is even saying "i tried that and it was OK". lots of people are saying various things like "don't do that!" and "are you crazy?"
And we all know that when you tell people not to do something they don't, and there are no crazy people connected to the net.
[Warning: I've never actually deployed an anycast DNS setup so you are free to ignore my message.] On Mon, Dec 20, 2004 at 01:28:43PM +0100, Iljitsch van Beijnum <iljitsch@muada.com> wrote a message of 109 lines which said:
1. There should always be non-anycast alternatives
I believe there is a strong consensus about that. And therefore a strong agreement that ".org" is seriously wrong. This is after all a good engineering practice: when you deploy something new, do it carefully and not everywhere at the same time.
I don't think PPLB is compatible with anycast, especially in situations where we consider end-to-end communication with multiple packets. As PPLB may lead to out-of-sequence TCP packets and to packets of the same UDP stream going to different DNS server destinations, it will break anycast DNS service in some situations. So, if TCP-based DNS requests are considered, flow-based load balancing should be used, which is totally different from PPLB.

Joe

--- Iljitsch van Beijnum <iljitsch@muada.com> wrote:
On 18-dec-04, at 22:31, Paul Vixie wrote:
i would be interested in hearing from anybody else who thinks that turning on pplb in an eyeball-centric isp that has multiple upstream paths is a reasonable thing to do, even if there were no anycast services deployed anywhere in the world.
so far, no takers. i've heard from rfc-writers who say pplb was never meant to be used the way Iljitsch is describing it, and i've heard from equipment vendors who say their customers don't do that and that if some customer did that and asked for support the response would be "don't do that!", and i've heard from network operators who say they would never do that, and i've heard from customers of network operators who did that with notable bad effects.
but so far nobody has said "yes, what Iljitsch is describing should work."
Apparently you also didn't get any pointers to RFCs or other authoritative sources that say "each and every packet injected into the internet must be delivered in sequence".
You feel you get to decide what other people should and shouldn't do. I find that dangerous. As long as there is no standard or law that says something can't be done, people are free to do it.
Apart from that, I'm not convinced per packet load balancing is as bad as people keep saying. In the absence of any research that I know of, my position is that per packet load balancing does have potential adverse effects, so per destination load balancing is preferred, but if there is a reason why pdlb doesn't fit the bill, pplb is a reasonable choice.
let me summarize. Iljitsch says that pplb is incompatible with anycast,
No. What I'm saying in general is that anycast isn't 100% problem free, so:
1. There should always be non-anycast alternatives
2. It would be good if we had a way (= BGP community) to make sure that anycasted routes aren't load balanced across
I don't think either of these is unreasonable.
since a pplb-using access router at the inner edge of an ISP could hear two different IGP routes to some destination, which ended up taking different exits from the ISP and thus different BGP paths.
I'm not even sure if I understand this sentence, but it sure doesn't look like something I said. What I said was that if you inject packets towards an anycasted address into two different routers within a certain AS, there is a very real possibility these two packets will end up at different anycast instances. I'm on very firm ground here as this follows directly from the BGP path selection rules. (Although in real life this wouldn't happen too often because customers tend to connect to two routers in the same or neighboring pops.)
whereas pplb would normally only operate on equal-cost paths, the BGP->IGP path would hide the variance in BGP paths and make these "paths" eligible for pplb.
Again: huh?
i've said that pplb is only useful for turning two OC3's into an "OC6" (or similar circuit bundling where a pair of routers has multiple connections to each other) and that even in this case, packet reordering is likely to occur, which will make tcp-flow performance suffer across this "link".
But would the TCP performance over this "OC6 link" be better than that over a single OC3 link? That's the real question.
i have also said that turning pplb on across non-parallel links, such as to multiple providers or through multiple tunnels or whatever, would pretty much guarantee that a word rhyming with "massive suckage" would occur. and i've made these claims independent of anycast -- that is, life will be bad if you use pplb outside its intended purpose, even if nobody anywhere was using anycast.
Your argument is that since it's a bad idea to do this, nobody will, so making it even worse is ok. My argument is that even though it's a bad idea, some people will do it, so we shouldn't unnecessarily make things worse and/or should make a reasonable effort to repair the damage.
loath though i am to treat a "preponderance of assertion" as equivalent to "proof", i see no alternative on this issue. no one is defending the use case Iljitsch is proposing. no one is even saying "i tried that and it was OK". lots of people are saying various things like "don't do that!" and "are you crazy?"
And we all know that when you tell people not to do something they don't, and there are no crazy people connected to the net.
but so far nobody has said "yes, what Iljitsch is describing should work."
Apparently you also didn't get any pointers to RFCs or other authoritative sources that say "each and every packet injected into the internet must be delivered in sequence".
er... please quote chapter/verse here. these are "packets" and have sequence numbers -BECAUSE- they may not be received in order. the end-system must be designed to place the packets back in order before presenting the data to the application.

e.g. this is not a circuit switched network.

--bill
Apparently you also didn't get any pointers to RFCs or other authoritative sources that say "each and every packet injected into the internet must be delivered in sequence".
er... please quote chapter/verse here. these are "packets" and have sequence numbers -BECAUSE- they may not be received in order. the end-system must be designed to place the packets back in order before presenting the data to the application.
e.g. this is not a circuit switched network.
of course it will work. it just won't be particularly fast. specifically, it won't allow tcp to discover the actual end-to-end bandwidth*delay product, and therefore tcp won't set its window size advantageously, and some or all of the links along the path won't run at capacity. packet reordering is not fatal to the technology, but it is fatal to the business. "not everything that can be done, should be done."
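To put rough numbers on the bandwidth*delay point, a quick Python sketch. The 50 ms RTT and the fast-retransmit reaction are illustrative assumptions; real TCP stacks vary:

    # Rough bandwidth*delay arithmetic for the "two OC3s as an OC6" case.
    # The 50 ms RTT is an arbitrary example.
    OC3 = 155e6                  # bits/sec
    RTT = 0.050                  # seconds

    bdp = 2 * OC3 * RTT / 8      # bytes in flight to fill both links
    print("window needed: %.0f KB" % (bdp / 1024))        # ~1892 KB

    # Persistent reordering looks like loss to a fast-retransmit sender:
    # each burst of three duplicate ACKs halves the congestion window.
    cwnd = bdp
    for _ in range(3):           # three spurious fast retransmits
        cwnd /= 2
    print("after 3 spurious retransmits: %.0f KB" % (cwnd / 1024))  # ~237 KB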
Paul Vixie wrote:
of course it will work. it just won't be particularly fast. specifically, it won't allow tcp to discover the actual end-to-end bandwidth*delay product, and therefore tcp won't set its window size advantageously, and some or all of the links along the path won't run at capacity. packet reordering is not fatal to the technology, but it is fatal to the business. "not everything that can be done, should be done."
Since when is bad engineering bad for big business? The world is full of examples to the contrary.

Pete
participants (18)

- Alon Tirosh
- bmanning@vacation.karoshi.com
- Crist Clark
- Douglas K. Fischer
- Iljitsch van Beijnum
- Joe Abley
- Joe Shen
- Marshall Eubanks
- Michael.Dillon@radianz.com
- Paul Vixie
- Paul Vixie
- Petri Helenius
- Stephane Bortzmeyer
- Steve Gibbard
- Steven M. Bellovin
- Suzanne Woolf
- Valdis.Kletnieks@vt.edu
- William Allen Simpson