Anycast provider for SMTP?

Joe Hamelin

15 Jun 2015

5:50 p.m.

I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change. Have any of you seen something like this work in the wild? -- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

Jürgen Jaritsch

15 Jun

5:54 p.m.

I guess there is no real chance without conntrack ... I'll try to use something like LVS+mysql conntrack (no idea if this even exists ...) ....

Jürgen Jaritsch
Head of Network & Infrastructure
ANEXIA Internetdienstleistungs GmbH
Phone: +43-5-0556-300
Fax: +43-5-0556-500
E-Mail: jj@anexia.at
Web: http://www.anexia.at
Headquarters address, Klagenfurt: Feldkirchnerstraße 140, 9020 Klagenfurt
Managing Director: Alexander Windbichler
Company register: FN 289918a | Place of jurisdiction: Klagenfurt | VAT ID: AT U63216601

-----Original Message-----
From: Joe Hamelin [joe@nethead.com]
Received: Monday, 15 June 2015, 19:51
To: NANOG list [nanog@nanog.org]
Subject: Anycast provider for SMTP?

I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change. Have any of you seen something like this work in the wild?

-- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

Christopher Morrow

6:02 p.m.

On Mon, Jun 15, 2015 at 1:54 PM, Jürgen Jaritsch <jj@anexia.at> wrote:

...

I guess there is no real chance without conntrack ... I'll try to use something like LVS+mysql conntrack (no idea if this even exists ...) ....

not clear how helpful that is?

...

-----Original Message-----
From: Joe Hamelin [joe@nethead.com]
Received: Monday, 15 June 2015, 19:51
To: NANOG list [nanog@nanog.org]
Subject: Anycast provider for SMTP?

I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site

'when one site goes down' ... then the other works fine, right? smtp is not latency sensitive in the sense that a 30-second timeout for a server will mean delivery to the secondary... right?

...

(virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

Joe Hamelin

6:13 p.m.

On Mon, Jun 15, 2015 at 11:02 AM, Christopher Morrow < morrowc.lists@gmail.com> wrote:

...

'when one site goes down' ... then the other works fine, right? smtp is not latency sensitive in the sense that a 30-second timeout for a server will mean delivery to the secondary... right?

The two MX sites are connected via third party MPLS. The problem is when one MX site loses Internet connectivity the sending MTA may take up to 4 hours to resend and hopefully the DNS coin toss gives it the address of the site that is still connected. (Read as: French ISPs don't seem as robust as I'm used to in the US.)

Since our mail traffic is international something like anycast would be nice. Now the other problem is we don't have an ASN or do external BGP ourselves.

And not that it matters in a network sense, but this is a Domino mail system. I'm just trying to bring it up to year 2000 standards.

-- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

...

William Herrin

7:15 p.m.

On Mon, Jun 15, 2015 at 2:13 PM, Joe Hamelin <joe@nethead.com> wrote:

...

The two MX sites are connected via third party MPLS. The problem is when one MX site loses Internet connectivity the sending MTA may take up to 4 hours to resend and hopefully the DNS coin toss gives it the address of the site that is still connected.

Hi Joe,

Have you been able to document which originating MTA software misbehaves this way? Correct SMTP behavior is to attempt TCP connections to all IP addresses at each MX level in turn, and repeat for each MX level. Only upon failure of all of them should it defer the message for later delivery.

Interrupted connections (as opposed to timeouts) may go straight to deferred, figuring that bulk traffic like email should pause if congestion exhibits itself in the form of a stalled TCP connection. So it would make sense for a handful of messages to be delayed.

And of course all bets are off if Internet connectivity is "flapping" instead of hard down.

Regards, Bill Herrin

-- William Herrin ................ herrin@dirtside.com bill@herrin.us Owner, Dirtside Systems ......... Web: <http://www.dirtside.com/>
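The delivery order described above can be sketched in a few lines of Python. This is only an illustration: `deliver` and `try_connect` are hypothetical names, the addresses come from documentation ranges, and a real MTA would also shuffle the addresses within one preference level, which is left out here to keep the sketch deterministic.

```python
from collections import defaultdict
from typing import Callable, Iterable, Optional, Tuple

MxRecord = Tuple[int, str]  # (MX preference, IP address of the MX host)


def deliver(mx_records: Iterable[MxRecord],
            try_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first address that accepts a connection, or None to defer.

    try_connect is a stand-in for "open a TCP connection to port 25";
    it is injected so the ordering logic can be exercised without a network.
    """
    by_preference = defaultdict(list)
    for preference, address in mx_records:
        by_preference[preference].append(address)

    for preference in sorted(by_preference):      # lowest value is tried first
        for address in by_preference[preference]:  # every address at this level
            if try_connect(address):
                return address
    return None  # every address failed: queue the message and retry later


# Two MX hosts at the same preference, as in the original question.
records = [(10, "192.0.2.1"), (10, "198.51.100.1")]
# Simulate the first site being unreachable.
chosen = deliver(records, lambda address: address == "198.51.100.1")
```

With the first site down, the sketch moves on to the second address in the same pass instead of deferring, which is the behavior the 4-hour delay report seems to contradict.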

John Orthoefer

5:55 p.m.

Well we, Genuity, used to use Cisco Distributed Director to do this. Basically it was a DNS server that ran on a Cisco Router, and could use a lot of different metrics to give an answer, which included routing based metrics.

Johno

...

On Jun 15, 2015, at 1:50 PM, Joe Hamelin <joe@nethead.com> wrote:

I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

-- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

William Herrin

6:09 p.m.

On Mon, Jun 15, 2015 at 1:50 PM, Joe Hamelin <joe@nethead.com> wrote:

...

My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

Anycast + TCP = much pain, for reasons which should be obvious. It's on the near side of impossible, but the far side of impractical. You'd spend a lot of money with some high-price software developers getting it to work.

...

I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down.

Not sure why you'd have problems with this since it's a primary operating mode that SMTP was explicitly designed for. Can you elaborate on the kinds of trouble you've experienced?

Regards, Bill Herrin

-- William Herrin ................ herrin@dirtside.com bill@herrin.us Owner, Dirtside Systems ......... Web: <http://www.dirtside.com/>

Nick Hilliard

6:28 p.m.

On 15/06/2015 19:09, William Herrin wrote:

...

Anycast + TCP = much pain, for reasons which should be obvious.

This was presented at some conference or other a couple of years ago:

...

https://www.nanog.org/meetings/nanog37/presentations/matt.levine.pdf

Nick

Dave Taht

7:05 p.m.

On Mon, Jun 15, 2015 at 11:28 AM, Nick Hilliard <nick@foobar.org> wrote:

...

On 15/06/2015 19:09, William Herrin wrote:

...
Anycast + TCP = much pain, for reasons which should be obvious.

This was presented at some conference or other a couple of years ago:

...
https://www.nanog.org/meetings/nanog37/presentations/matt.levine.pdf

...

From that otherwise encouraging preso:

"What about IPv6? We have a plan! We plan to be dead before customers demand IPv6". I am pretty sure the authors are still alive(?). I have been using anycast at a small scale on mesh networks, for dns, primarily. Works.

...

Nick

-- Dave Täht What will it take to vastly improve wifi for everyone? https://plus.google.com/u/0/explore/makewififast

Joe Abley

7:34 p.m.

On 15 Jun 2015, at 15:05, Dave Taht wrote:

...

I have been using anycast at a small scale on mesh networks, for dns, primarily. Works.

Many of us have been using anycast at Internet scale for DNS for a couple of decades. I would go further than "works" and perhaps say "necessary".

There were some wise words written in RFC 4786 about use of anycast with other protocols (well, I think they are wise, but then I wrote some of them):

When a service is anycast between two or more nodes, the routing system makes the node selection decision on behalf of a client. Since it is usually a requirement that a single client-server interaction is carried out between a client and the same server node for the duration of the transaction, it follows that the routing system's node selection decision ought to be stable for substantially longer than the expected transaction time, if the service is to be provided reliably.

Some services have very short transaction times, and may even be carried out using a single packet request and a single packet reply (e.g., DNS transactions over UDP transport). Other services involve far longer-lived transactions (e.g., bulk file downloads and audio-visual media streaming).

Services may be anycast within very predictable routing systems, which can remain stable for long periods of time (e.g., anycast within a well-managed and topologically-simple IGP, where node selection changes only occur as a response to node failures). Other deployments have far less predictable characteristics (see Section 4.4.7).

The stability of the routing system, together with the transaction time of the service, should be carefully compared when deciding whether a service is suitable for distribution using anycast. In some cases, for new protocols, it may be practical to split large transactions into an initialisation phase that is handled by anycast servers, and a sustained phase that is provided by non-anycast servers, perhaps chosen during the initialisation phase.

This document deliberately avoids prescribing rules as to which protocols or services are suitable for distribution by anycast; to attempt to do so would be presumptuous.

Operators should be aware that, especially for long running flows, there are potential failure modes using anycast that are more complex than a simple 'destination unreachable' failure using unicast.

Joe
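One way to make the "compare routing stability with transaction time" advice concrete is a back-of-the-envelope model. The sketch below is not from RFC 4786; it assumes, purely for illustration, that node-selection changes for a given client arrive as a Poisson process, and the durations and the one-change-per-day figure are made-up inputs.

```python
import math


def interruption_probability(transaction_seconds: float,
                             mean_seconds_between_route_changes: float) -> float:
    """Chance of at least one node-selection change during a transaction.

    Uses the ASSUMED Poisson model: P = 1 - exp(-t / T), where t is the
    transaction time and T the mean time between route changes.
    """
    rate = 1.0 / mean_seconds_between_route_changes
    return 1.0 - math.exp(-rate * transaction_seconds)


ONE_DAY = 86_400.0  # hypothetical: one route change per day on average

# A single-packet DNS exchange versus a longer SMTP session (illustrative durations).
dns_udp = interruption_probability(0.05, ONE_DAY)
smtp_session = interruption_probability(30.0, ONE_DAY)
```

Under these assumptions the longer session is several hundred times more likely to straddle a route change than the DNS exchange, although both probabilities stay small in absolute terms. That is roughly the shape of the disagreement later in this thread.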

Dave Taht

7:54 p.m.

On Mon, Jun 15, 2015 at 12:34 PM, Joe Abley <jabley@hopcount.ca> wrote:

...

On 15 Jun 2015, at 15:05, Dave Taht wrote:

...
I have been using anycast at a small scale on mesh networks, for dns, primarily. Works.

Many of us have been using anycast at Internet scale for DNS for a couple of decades. I would go further than "works" and perhaps say "necessary".

Oh, I agree. My point was that anycast is also potentially of use in smaller (corporate/mesh) networks, not just for DNS, but also for smtp as being discussed here. Web and other forms of proxy, also. Other cases, like gittorrent? I'm pretty sure it's a bad idea for ntp, and for non-fully mirrored file distribution services.

...

There were some wise words written in RFC 4786 about use of anycast with other protocols (well, I think they are wise, but then I wrote some of them):

a good read.

...

When a service is anycast between two or more nodes, the routing system makes the node selection decision on behalf of a client. Since it is usually a requirement that a single client-server interaction is carried out between a client and the same server node for the duration of the transaction, it follows that the routing system's node selection decision ought to be stable for substantially longer than the expected transaction time, if the service is to be provided reliably.

Some services have very short transaction times, and may even be carried out using a single packet request and a single packet reply (e.g., DNS transactions over UDP transport). Other services involve far longer-lived transactions (e.g., bulk file downloads and audio-visual media streaming).

Services may be anycast within very predictable routing systems, which can remain stable for long periods of time (e.g., anycast within a well-managed and topologically-simple IGP, where node selection changes only occur as a response to node failures). Other deployments have far less predictable characteristics (see Section 4.4.7).

The stability of the routing system, together with the transaction time of the service, should be carefully compared when deciding whether a service is suitable for distribution using anycast. In some cases, for new protocols, it may be practical to split large transactions into an initialisation phase that is handled by anycast servers, and a sustained phase that is provided by non-anycast servers, perhaps chosen during the initialisation phase.

This document deliberately avoids prescribing rules as to which protocols or services are suitable for distribution by anycast; to attempt to do so would be presumptuous.

Operators should be aware that, especially for long running flows, there are potential failure modes using anycast that are more complex than a simple 'destination unreachable' failure using unicast.

Joe

-- Dave Täht What will it take to vastly improve wifi for everyone? https://plus.google.com/u/0/explore/makewififast

Randy Bush

16 Jun

midnight

...

"What about IPv6? We have a plan! We plan to be dead before customers demand IPv6". I am pretty sure the authors are still alive(?).

and customer demand for ipv6 still holds strong, right?

...

I have been using anycast at a small scale on mesh networks, for dns, primarily. Works.

dns is udp

rand

Dave Taht

12:07 a.m.

On Mon, Jun 15, 2015 at 5:00 PM, Randy Bush <randy@psg.com> wrote:

...

...
"What about IPv6? We have a plan! We plan to be dead before customers demand IPv6". I am pretty sure the authors are still alive(?).

and customer demand for ipv6 still holds strong, right?

Does seem to be on the uptick!

...

...
I have been using anycast at a small scale on mesh networks, for dns, primarily. Works.

dns is udp

No. In my case, at least, I have been exhaustively testing dnsmasq + dnssec, which falls back to tcp a lot more often than it used to given all the headaches edns0 was causing, and cloudflare gleefully coming up with ever more innovative ways to dump weird stuff on the wire, like signing a domain with a control-c (\003.domain.com). Although 2.73 just (finally) shipped, I am still concerned about the tcp fallback in the anycast scenario. So I do kind of expect that there will be more tcp dns, and I think tcp dns is something android falls back to a lot, still.

...

rand

-- Dave Täht What will it take to vastly improve wifi for everyone? https://plus.google.com/u/0/explore/makewififast

Matt Palmer

1:26 a.m.

On Mon, Jun 15, 2015 at 05:07:22PM -0700, Dave Taht wrote:

...

On Mon, Jun 15, 2015 at 5:00 PM, Randy Bush <randy@psg.com> wrote:

...
...
"What about IPv6? We have a plan! We plan to be dead before customers demand IPv6". I am pretty sure the authors are still alive(?).

and customer demand for ipv6 still holds strong, right?

Does seem to be on the uptick!

It's certainly stronger than it has *ever* been before. - Matt -- I am cow, hear me moo, I weigh twice as much as you. I'm a cow, eating grass, methane gas comes out my ass. I'm a cow, you are too; join us all! Type apt-get moo.

Rafael Possamai

17 Jun

1:23 p.m.

https://www.google.com/intl/en/ipv6/statistics.html

On Mon, Jun 15, 2015 at 8:26 PM, Matt Palmer <mpalmer@hezmatt.org> wrote:

...

On Mon, Jun 15, 2015 at 05:07:22PM -0700, Dave Taht wrote:

...
On Mon, Jun 15, 2015 at 5:00 PM, Randy Bush <randy@psg.com> wrote:

...
...
"What about IPv6? We have a plan! We plan to be dead before customers demand IPv6". I am pretty sure the authors are still alive(?).

and customer demand for ipv6 still holds strong, right?

Does seem to be on the uptick!

It's certainly stronger than it has *ever* been before.

- Matt

-- I am cow, hear me moo, I weigh twice as much as you. I'm a cow, eating grass, methane gas comes out my ass. I'm a cow, you are too; join us all! Type apt-get moo.

John Orthoefer

16 Jun

12:56 a.m.

...

On Jun 15, 2015, at 8:00 PM, Randy Bush <randy@psg.com> wrote:

dns is udp

15 years ago when we set up 4.2.2.1, there was a fair amount of TCP based DNS. We tried for a bit to support it via the anycast address, but ultimately we decided the support issues weren’t worth it. The few customers that asked/required it were given non-anycast addresses to use for TCP based DNS.

I really think the OP's best answer is some DNS based load balancer that can take metrics based on routing.

johno

Bill Woodcock

15 Jun

6:13 p.m.

...

On Jun 15, 2015, at 10:50 AM, Joe Hamelin <joe@nethead.com> wrote:

I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB?

It seems like you may be over-thinking this. You could, in fact, use anycast, in one of two ways:

You could anycast the DNS, with servers in the US and Europe, and different MX metrics between the two, so anyone who’s near the European DNS server will see the European MX host as the first-choice, and anyone nearer the US DNS server will see the US MX host as first-choice.

Or you could skip the MX records, and just put both US and European SMTP servers on the same IP address, which would save a lot of steps and simplify the system, but leave you with the _very_ occasional corner-case of someone equal-path-length load-balancing traffic to you such that half of one TCP session goes to Europe, and half to the US. That’s a bogeyman that scares a lot of people into not using anycast for TCP services, particularly long-lived ones, but it’s a theoretical problem rather than an actually-observed-in-the-wild problem. But since it scares people, it’s probably safer just doing the DNS anycast, rather than SMTP anycast, to avoid startling the easily-upset out there. :-)

Either of these is vastly simpler and more reliable than trying to throw a load balancer into the mix. As you note, load balancers aren’t particularly HA. Always replace load balancers with crossconnects. Much more HA.

-Bill

William Herrin

6:54 p.m.

On Mon, Jun 15, 2015 at 2:13 PM, Bill Woodcock <woody@pch.net> wrote:

...

Or you could skip the MX records, and just put both US and European SMTP servers on the same IP address, which would save a lot of steps and simplify the system, but leave you with the _very_ occasional corner-case of someone equal-path-length load-balancing traffic to you such that half of one TCP session goes to Europe, and half to the US. That’s a bogeyman that scares a lot of people into not using anycast for TCP services, particularly long-lived ones, but it’s a theoretical problem rather than an actually-observed-in-the-wild problem. But since it scares people, it’s probably safer just doing the DNS anycast, rather than SMTP anycast, to avoid startling the easily-upset out there. :-)

If I had a dollar for every system that's collapsed from a known but previously "theoretical" problem... It's only theoretical until a VIP can't connect. Deploy a system without covering the corner cases and your comeuppance is assured.

Okay, granted you can probably cover your corner case here with a priority 20 MX that leads to a unicast address on one of the two servers. SMTP can let the rare fellow with the bisected packet flow gracefully fall back. Nevertheless, I think you've offered some really bad advice here Bill.

Hijackers killing the passengers was a bogeyman too. If you just kept calm and cooperated, you lived through it. Until you didn't, and allowed yourself to be an instrument in killing thousands on the ground as a bonus. Sometimes the math offers really bad advice.

On Mon, Jun 15, 2015 at 2:28 PM, Nick Hilliard <nick@foobar.org> wrote:

...

On 15/06/2015 19:09, William Herrin wrote:

...
Anycast + TCP = much pain, for reasons which should be obvious.

This was presented at some conference or other a couple of years ago: https://www.nanog.org/meetings/nanog37/presentations/matt.levine.pdf

Thought the comment on page 22 was apropos: their plan is to be dead before future change catches up with them. Regards, Bill Herrin -- William Herrin ................ herrin@dirtside.com bill@herrin.us Owner, Dirtside Systems ......... Web: <http://www.dirtside.com/>

Christopher Morrow

6:57 p.m.

On Mon, Jun 15, 2015 at 2:54 PM, William Herrin <bill@herrin.us> wrote:

...

Okay, granted you can probably cover your corner case here with a priority 20 MX that leads to a unicast address on one of the two servers. SMTP can let the rare fellow with the bisected packet flow gracefully fall back.

but 'well behaved smtp clients' should already be falling back right?

John Levine

7:17 p.m.

...

but 'well behaved smtp clients' should already be falling back right?

If you have multiple SMTP servers at the same priority, it's a pretty broken client that doesn't try them all until one works. That said, there is a depressing number of pretty broken SMTP clients. R's, John

Bill Woodcock

16 Jun

4:43 p.m.

...

On Jun 15, 2015, at 11:54 AM, William Herrin <bill@herrin.us> wrote:

I think you've offered some really bad advice here Bill.

As I said, there are lots of people who _think_ it doesn’t work. And then there are people who’ve actually done it, and know better.

Besides, you seem to not have read what I actually posted. In which the advice I gave was _not_ to do anycast TCP, so as to avoid having to deal with people who _think_ they know something, and are excessively verbal about it. Which is tedious.

Perhaps better advice would have been to go ahead and do it, solving his problem, but to just not post to NANOG about it, so he doesn’t have to listen to people who think they know better telling him that what he’s doing isn’t possible. Bumblebees, flight, etc.

-Bill

William Herrin

5:12 p.m.

On Tue, Jun 16, 2015 at 12:43 PM, Bill Woodcock <woody@pch.net> wrote:

...

...
On Jun 15, 2015, at 11:54 AM, William Herrin <bill@herrin.us> wrote:

I think you've offered some really bad advice here Bill.

As I said, there are lots of people who _think_ it doesn’t work. And then there are people who’ve actually done it, and know better.

Uh huh. The numbers are clear: 99.99% of the time it works. The other 0.01% of the time you're screwed and had better pray the user is one of the ones you can afford to lose. Unicast TCP breaks too, but it has the virtue of being fixable 100% of the time.

...

Besides, you seem to not have read what I actually posted. In which the advice I gave was _not_ to do anycast TCP, so as to avoid having to deal with people who _think_ they know something

Just because I rolled my eyes so hard my vision blurred doesn't mean I failed to read your comment.

...

Perhaps better advice would have been to go ahead and do it, solving his problem, but to just not post to NANOG about it, so he doesn’t have to listen to people who think they know better telling him that what he’s doing isn’t possible.

If you read what Joe wrote, he doesn't currently have an AS number or employ BGP with his Internet providers. Extrapolate for his IPv4 assignment situation and the /24 announcement barrier. In an IPv4-depleted world, he won't be doing anycast any time soon, even if it was a sound plan. Regards, Bill Herrin -- William Herrin ................ herrin@dirtside.com bill@herrin.us Owner, Dirtside Systems ......... Web: <http://www.dirtside.com/>

Bill Woodcock

5:55 p.m.

...

If you read what Joe wrote, he doesn't currently have an AS number or employ BGP with his Internet providers. Extrapolate for his IPv4 assignment situation and the /24 announcement barrier. In an IPv4-depleted world, he won't be doing anycast any time soon…

…which is one of the reasons why I suggested that he do anycast DNS (presumably using a DNS service provider) rather than anycast SMTP (presumably using himself) anyway. So, regardless of how much you’re rolling your eyes, we’re saying the same thing. We’re just being testy about the details. -Bill

Mark Andrews

17 Jun

2:50 a.m.

In message <82D10008-CB76-42C7-A78C-EE876924DF1E@pch.net>, Bill Woodcock writes:

...

...
If you read what Joe wrote, he doesn't currently have an AS number or employ BGP with his Internet providers. Extrapolate for his IPv4 assignment situation and the /24 announcement barrier. In an IPv4-depleted world, he won't be doing anycast any time soon…

…which is one of the reasons why I suggested that he do anycast DNS (presumably using a DNS service provider) rather than anycast SMTP (presumably using himself) anyway.

So, regardless of how much you’re rolling your eyes, we’re saying the same thing. We’re just being testy about the details.

-Bill

If you are that worried about an anycast SMTP/TCP session breaking, you will be just as worried about an anycast DNS/TCP session breaking.

That said the problem is that a client SMTP server doesn't retry fast enough when a TCP session breaks mid-transaction. Anycast TCP will not fix this. I'm not aware of any SMTP client that takes 4 hours to try the next MX when connect fails and it was a 4 hour retry that was the complaint.

Anycast will only help if the SMTP client doesn't try all the lowest cost MX's and there are very few broken SMTP clients that do this. The best fix for these is to identify the clients and get them upgraded to something that is RFC compliant. Trying multiple MXs is a 20+ year old requirement.

Basically you are wasting your money on anycast SMTP.

Mark

-- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org

John Levine

16 Jun

6:28 p.m.

...

Uh huh. The numbers are clear: 99.99% of the time it works. The other 0.01% of the time you're screwed and had better pray the user is one of the ones you can afford to lose.

Unicast TCP breaks too, but it has the virtue of being fixable 100% of the time.

I love the wry humor on the nanog list.

R's, John

PS:

...

If you read what Joe wrote, he doesn't currently have an AS number or employ BGP with his Internet providers. Extrapolate for his IPv4 assignment situation and the /24 announcement barrier.

Assuming he has his own address space, why couldn't he just tell them what the IPs are and ask them to announce it, like any other customer does?

Masataka Ohta

7:49 p.m.

William Herrin wrote:

...

If you read what Joe wrote, he doesn't currently have an AS number or employ BGP with his Internet providers. Extrapolate for his IPv4 assignment situation and the /24 announcement barrier. In an IPv4-depleted world, he won't be doing anycast any time soon, even if it was a sound plan.

Anyone having /24 can start hosting business with 255*N anycast servers. Masataka Ohta

Owen DeLong

8:45 p.m.

...

On Jun 16, 2015, at 12:49 , Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:

William Herrin wrote:

...
If you read what Joe wrote, he doesn't currently have an AS number or employ BGP with his Internet providers. Extrapolate for his IPv4 assignment situation and the /24 announcement barrier. In an IPv4-depleted world, he won't be doing anycast any time soon, even if it was a sound plan.

Anyone having /24 can start hosting business with 255*N anycast servers.

Masataka Ohta

I don’t think that’s quite true… I think you will find that 254*N is probably the best theoretical Max with just a /24 and that more likely, you’ll need some hosts on that subnet that don’t necessarily provide anycast services bringing the practical limit somewhat lower. Of course, if you have what you need to do 255, you can probably actually do 256. Owen

Jon Lewis

9:06 p.m.

On Tue, 16 Jun 2015, Owen DeLong wrote:

...

...
On Jun 16, 2015, at 12:49 , Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp> wrote:

William Herrin wrote:

...
If you read what Joe wrote, he doesn't currently have an AS number or employ BGP with his Internet providers. Extrapolate for his IPv4 assignment situation and the /24 announcement barrier. In an IPv4-depleted world, he won't be doing anycast any time soon, even if it was a sound plan.

Anyone having /24 can start hosting business with 255*N anycast servers.

Masataka Ohta

I don’t think that’s quite true… I think you will find that 254*N is probably the best theoretical Max with just a /24 and that more likely, you’ll need some hosts on that subnet that don’t necessarily provide anycast services bringing the practical limit somewhat lower. Of course, if you have what you need to do 255, you can probably actually do 256.

Advertise the /24, internally route 256 /32s to the devices that service those IPs on one or more networks numbered out of other IP ranges. The machines all need unique unicast IPs anyway.

----------------------------------------------------------------------
Jon Lewis, MCP :) | I route | therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
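The 254 versus 256 arithmetic in this exchange can be checked with Python's standard `ipaddress` module. The prefix `192.0.2.0/24` below is a documentation range standing in for whatever /24 is actually announced; it is an assumption, not an address from the thread.

```python
import ipaddress

# Stand-in /24 from the documentation range (assumption for illustration).
prefix = ipaddress.ip_network("192.0.2.0/24")

# Every address covered by the announced prefix.
total_addresses = prefix.num_addresses                  # 256

# Usable hosts if the /24 is configured as one conventional subnet:
# the network and broadcast addresses are excluded.
subnet_hosts = len(list(prefix.hosts()))                # 254

# Host routes if each address is instead routed internally as its own /32,
# with the servers numbered out of some other range.
host_routes = list(prefix.subnets(new_prefix=32))       # 256 entries
```

Treating the /24 as a single subnet gives the 254 figure, while routing each address as a /32 toward servers that keep their own unicast addresses elsewhere makes all 256 usable.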

Guillaume Tournat

15 Jun

6:40 p.m.

Take a look at the hosted GSLB service FortiDirector, which I have set up for a customer (for SMTP, Exchange, and ActiveSync worldwide services).

...

On 15 June 2015 at 19:50, Joe Hamelin <joe@nethead.com> wrote:

I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

-- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

Joe Abley

6:58 p.m.

Hi Joe, On 15 Jun 2015, at 13:50, Joe Hamelin wrote:

...

I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

If you can give responses to QTYPE=MX queries that match the location of the client, you can approximate this without deploying your SMTP servers using anycast. This feels like a simpler solution to operate; anycast sometimes pits BGP-fearing, syseng people against neteng people when things break at 3am, and if that rings true for you then a solution that avoids it might be of interest.

So, suppose clients in region A could query NETHEAD.COM/IN/MX and get a response that looks like

    NETHEAD.COM. IN MX 10 REGION-A-MX.NETHEAD.COM.
                 IN MX 20 REGION-B-MX.NETHEAD.COM.
                 IN MX 20 REGION-C-MX.NETHEAD.COM.

whereas clients in region B might see a response that looks more sensible to them:

    NETHEAD.COM. IN MX 10 REGION-B-MX.NETHEAD.COM.
                 IN MX 20 REGION-A-MX.NETHEAD.COM.
                 IN MX 20 REGION-C-MX.NETHEAD.COM.

etc, etc. That way you still get a reasonable fallback in the event that one MX target is unreachable for a particular client, but you steer the bulk of your traffic in a way that makes sense (and which your syseng people don't have to understand the details of).

You can achieve the above DNS trickery using various load balancers that other people in this thread have already mentioned. You can also install your own geomaps in your own nameservers and handle it yourself, or you can buy managed DNS service from various people that can do this kind of thing.

Disclaimer: Dyn, for whom I work, sells such a service.

Joe
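The per-region answers described above boil down to a small lookup. The Python sketch below is a hypothetical illustration of that selection logic only: `MX_TARGETS` and `mx_answer` are made-up names, and how a real nameserver maps a querying address to a region (geomap, routing metric, managed service) is deliberately left out.

```python
from typing import Dict, List, Tuple

# Hypothetical region -> local MX target map, using the names from the post.
MX_TARGETS: Dict[str, str] = {
    "A": "REGION-A-MX.NETHEAD.COM.",
    "B": "REGION-B-MX.NETHEAD.COM.",
    "C": "REGION-C-MX.NETHEAD.COM.",
}


def mx_answer(client_region: str) -> List[Tuple[int, str]]:
    """Build the MX set for one region: local target at 10, the others at 20.

    An unknown region gets every target at preference 20, so the answer
    still contains all exchangers and ordinary SMTP fallback applies.
    """
    answer = []
    for region, target in MX_TARGETS.items():
        preference = 10 if region == client_region else 20
        answer.append((preference, target))
    return sorted(answer)  # lowest preference first, then by name


region_b_view = mx_answer("B")
```

Every answer still lists all three exchangers, so a sender whose preferred site is unreachable falls back through the normal MX rules rather than depending on the DNS answer being correct.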

James Hartig

19 Jun 19 Jun

5:19 a.m.

...

You can achieve the above DNS trickery using various load balancers that other people in this thread have already mentioned. You can also install your own geomaps in your own nameservers and handle it yourself, or you can buy managed DNS service from various people that can do this kind of thing.

Just curious, how does DNS load balancing work if people are using 8.8.8.8/208.67.222.222 or basically any public resolvers that cache and have a significant (relatively speaking) user-base? Is the actual percent of requests so small that it doesn't matter? -- James

Christopher Morrow

12:12 p.m.

On Fri, Jun 19, 2015 at 7:19 AM, James Hartig <fastest963@gmail.com> wrote:

...

Just curious, how does DNS load balancing work if people are using 8.8.8.8/208.67.222.222 or basically any public resolvers that cache and

don't know exactly, but you might get some interesting clues from the f-root or as112 designs, eh?

Joe Abley

1:42 p.m.

On 19 Jun 2015, at 8:12, Christopher Morrow wrote:

...

On Fri, Jun 19, 2015 at 7:19 AM, James Hartig <fastest963@gmail.com> wrote:

...
Just curious, how does DNS load balancing work if people are using 8.8.8.8/208.67.222.222 or basically any public resolvers that cache and

If the client that performs the upstream query within the 8.8.8.8/whatever infrastructure is close to you for some meaningful interpretation of "close" then you still get an answer that is (effectively) localised for you. If the resolver infrastructure is sufficiently far that what is good for it is not good for you, then the deployed (if not quite standardised) answer is edns-client-subnet: the resolver infrastructure you're using embeds your client address in its upstream query. The authority servers can then localise a response (and scope it) as being suitable for you, not the resolver in general. http://tools.ietf.org/html/draft-vandergaast-edns-client-subnet-02 There are privacy concerns, here. But we might posit that you're already in the business of trading privacy for convenience if you're using a public resolver.

...

don't know exactly, but you might get some interesting clues from the f-root or as112 designs, eh?

Root servers and AS112 servers don't steer clients towards content according to where they are. They give consistent answers for all queries, regardless of where they came from. Joe
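To make the edns-client-subnet mechanism mentioned in this message more concrete, here is a minimal Python sketch of how a resolver might encode the option it attaches to an upstream query. It assumes the option code (8) and field layout from RFC 7871, the later published form of the draft linked above; whether draft-02 matches in every detail has not been checked here. The function name is hypothetical.

```python
import ipaddress
import struct

# Assumption: option code as registered for client-subnet in RFC 7871.
ECS_OPTION_CODE = 8


def encode_client_subnet(address, source_prefix_len, scope_prefix_len=0):
    """Build one EDNS0 client-subnet option (code, length, data) as bytes."""
    ip = ipaddress.ip_address(address)
    family = 1 if ip.version == 4 else 2  # address family numbers: 1 = IPv4, 2 = IPv6
    # Mask the address down to the source prefix so that only the subnet,
    # not the full client address, is sent to the authority server.
    network = ipaddress.ip_network(f"{address}/{source_prefix_len}", strict=False)
    n_bytes = (source_prefix_len + 7) // 8  # only the bytes the prefix covers
    addr_bytes = network.network_address.packed[:n_bytes]
    data = struct.pack("!HBB", family, source_prefix_len, scope_prefix_len) + addr_bytes
    return struct.pack("!HH", ECS_OPTION_CODE, len(data)) + data


if __name__ == "__main__":
    # 192.0.2.55 is a documentation address; a /24 hides the last octet.
    print(encode_client_subnet("192.0.2.55", 24).hex())
```

In a query the scope prefix length is sent as zero; the authority server fills it in on the response to say how widely the answer may be reused, which is the "scope it" step described above.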

Christopher Morrow

1:47 p.m.

<embarassed> On Fri, Jun 19, 2015 at 3:42 PM, Joe Abley <jabley@hopcount.ca> wrote:

...

On 19 Jun 2015, at 8:12, Christopher Morrow wrote:

...
On Fri, Jun 19, 2015 at 7:19 AM, James Hartig <fastest963@gmail.com> wrote:

...
Just curious, how does DNS load balancing work if people are using 8.8.8.8/208.67.222.222 or basically any public resolvers that cache and

...
don't know exactly, but you might get some interesting clues from the f-root or as112 designs, eh?

Root servers and AS112 servers don't steer clients towards content according to where they are. They give consistent answers for all queries, regardless of where they came from.

dang you jabley! I didn't see the 'if using' part :( my answer(s) are irrelevant!

Rob Seastrom

20 Jun 20 Jun

1:22 p.m.

"Joe Abley" <jabley@hopcount.ca> writes:

...

http://tools.ietf.org/html/draft-vandergaast-edns-client-subnet-02

There are privacy concerns, here. But we might posit that you're already in the business of trading privacy for convenience if you're using a public resolver.

Personally, I've always thought the privacy concerns of draft-vandergaast (not of using public recursive servers) are overwrought. The entity running the recursive nameserver has knowledge of the exact address (not just the subnet) that you're sending the query from, by inspection of the packet. The entity running the authoritative nameserver does not... but unless you're using DNS for some kind of off-label purpose ( http://code.kryo.se/iodine/ comes immediately to mind), the next thing you'll be doing once you have the reply is opening some kind of connection to the address returned... at which point the target entity will be able to tell the exact address that you're coming from. This assessment makes the assumption that the folks running the authoritative DNS servers are either the target entity or its agent. If that's an invalid assumption, one might say you have bigger problems. If someone could explain a privacy concern here that doesn't involve dipping into my meager tinfoil supply (I'm low and not going to the grocery until tomorrow), that would be swell. -r

Tony Finch

19 Jun 19 Jun

12:47 p.m.

James Hartig <fastest963@gmail.com> wrote:

...

Just curious, how does DNS load balancing work if people are using 8.8.8.8/208.67.222.222 or basically any public resolvers that cache and have a significant (relatively speaking) user-base?

http://www.afasterinternet.com/ietfdraft.htm Tony. -- f.anthony.n.finch <dot@dotat.at> http://dotat.at/ Fisher, German Bight: Northwest 4 or 5, increasing 6 at times. Slight or moderate. Showers. Good, occasionally moderate.

Christopher Morrow

1:46 p.m.

On Fri, Jun 19, 2015 at 2:47 PM, Tony Finch <dot@dotat.at> wrote:

...

James Hartig <fastest963@gmail.com> wrote:

...
Just curious, how does DNS load balancing work if people are using 8.8.8.8/208.67.222.222 or basically any public resolvers that cache and have a significant (relatively speaking) user-base?

http://www.afasterinternet.com/ietfdraft.htm

that doesn't address how packets get to the address or back though, right? that's about the content in the packet.

Bill Woodcock

5:06 p.m.

...

On Jun 18, 2015, at 10:19 PM, James Hartig <fastest963@gmail.com> wrote: Just curious, how does DNS load balancing work if people are using 8.8.8.8/208.67.222.222 or basically any public resolvers that cache and have a significant (relatively speaking) user-base? Is the actual percent of requests so small that it doesn't matter?

The percent of requests is significant, but OpenDNS and Google and the other significant open resolvers are, themselves, anycast, so the geographic correlation is preserved. Also, there’s an RFC for passing an origin IP tag along to the authoritative server, but I don’t know if anyone’s actually doing anything with that on any global inter-provider scale. -Bill

Max Tulyev

15 Jun 15 Jun

7:16 p.m.

I see no major problem with using anycast for that. A problem arises only in the rare case where the routing path from a client to one of your servers changes while a TCP stream is active. SMTP connections are short. Even if it happens, it will look like just a broken connection to the client, which will retry shortly. Am I missing something? On 15.06.15 20:50, Joe Hamelin wrote:

...

I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

-- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

Rafael Possamai

7:45 p.m.

I could be mistaken, but you might get all of this done with AWS's Route53. I would read this: http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html... The other step would be to set up HA in each SMTP node (US and France) such as LB or Failover. Just an idea. On Mon, Jun 15, 2015 at 12:50 PM, Joe Hamelin <joe@nethead.com> wrote:

...

I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

-- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

Joe Hamelin

7:52 p.m.

On Mon, Jun 15, 2015 at 12:45 PM, Rafael Possamai <rafael@gav.ufsc.br> wrote:

...

The other step would be to setup HA in each SMTP node (US and France) such as LB or Failover. Just an idea.

I'll look at the AWS doc, thanks.

The mailserver is seldom the problem (it's an AS/400) but the ISP pipe experiences prolonged outages. -- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

Rafael Possamai

8:58 p.m.

You're welcome. I hope that helps. On another note, if your internet pipe in Europe isn't as stable as your pipe in the US, then you could also try and have your infrastructure provider blend your uplink with two or more carrier-grade paths. You wouldn't have to worry about signing up for and maintaining an AS, but you could improve your uptime significantly. On Mon, Jun 15, 2015 at 2:52 PM, Joe Hamelin <joe@nethead.com> wrote:

...

On Mon, Jun 15, 2015 at 12:45 PM, Rafael Possamai <rafael@gav.ufsc.br> wrote:

...
The other step would be to setup HA in each SMTP node (US and France) such as LB or Failover. Just an idea.

I'll look at the AWS doc, thanks.

The mailserver is seldom the problem (it's an AS/400) but the ISP pipe experiences prolonged outages.

-- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

Joe Hamelin

9:08 p.m.

On Mon, Jun 15, 2015 at 1:58 PM, Rafael Possamai <rafael@gav.ufsc.br> wrote:

...

You're welcome. I hope that helps.

On another note, if your internet pipe in Europe isn't as stable as your pipe in the US, then you could also try and have your infrastructure provider blend your uplink with two or more carrier-grade paths. You wouldn't have to worry about signing up for and maintaining an AS, but you could improve your uptime significantly.

It seems to be more of a last-mile backhoe fade issue right now. I'm trying to convince them that a manufacturing facility isn't a good place for a data center. -- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

William Herrin

17 Jun 17 Jun

2:12 a.m.

On Mon, Jun 15, 2015 at 5:08 PM, Joe Hamelin <joe@nethead.com> wrote:

...

It seems to be more of a last-mile backhoe fade issue right now. I'm trying to convince them that a manufacturing facility isn't a good place for a data center.

Backhoes seem to have gotten you for a day or so now. My mail to you is deferred on my server and:

nslookup -q=mx nethead.com
Server: 192.168.99.1
Address: 192.168.99.1#53

Non-authoritative answer:
nethead.com mail exchanger = 10 tulalip.us.
nethead.com mail exchanger = 0 hamelin.us.

telnet hamelin.us. 25
Trying 208.71.161.175...
Connection failed: No route to host

traceroute -T -p 25 hamelin.us.
traceroute to hamelin.us. (208.71.161.175), 30 hops max, 60 byte packets
 1 lo0-100.WASHDC-VFTTP-312.verizon-gni.net (71.246.241.1) 1.091 ms 1.127 ms 1.442 ms
 2 T1-3-0-4.WASHDC-LCR-22.verizon-gni.net (130.81.221.218) 3.869 ms T2-9-0-13.WASHDC-LCR-22.verizon-gni.net (100.41.137.158) 5.005 ms T2-9-0-13.WASHDC-LCR-21.verizon-gni.net (100.41.137.88) 5.651 ms
 3 * * *
 4 0.ae3.BR2.IAD8.ALTER.NET (140.222.227.195) 6.399 ms 6.578 ms 6.668 ms
 5 204.255.168.226 (204.255.168.226) 5.324 ms 5.744 ms 6.168 ms
 6 207.88.14.162.ptr.us.xo.net (207.88.14.162) 79.304 ms 74.726 ms 75.877 ms
 7 vb6.rar3.chicago-il.us.xo.net (207.88.12.33) 72.258 ms 75.141 ms 72.125 ms
 8 te-4-1-0.rar3.denver-co.us.xo.net (207.88.12.22) 74.619 ms 74.544 ms 74.475 ms
 9 te-3-0-0.rar3.seattle-wa.us.xo.net (207.88.12.81) 78.125 ms 78.264 ms 77.969 ms
10 ae0d0.cir1.seattle7-wa.us.xo.net (207.88.13.141) 74.881 ms 76.052 ms 76.469 ms
11 216.156.100.146.ptr.us.xo.net (216.156.100.146) 89.162 ms 88.563 ms 89.005 ms
12 cr2-sea-b-te-0-0-0-9.bb.spectrumnet.us (174.127.140.158) 85.827 ms cr2-sea-b-te-0-0-0-8.bb.spectrumnet.us (174.127.140.154) 86.021 ms 85.414 ms
13 cr1-bds-te-0-0-0-1.bb.spectrumnet.us (174.127.138.123) 88.308 ms cr1-bds-te-0-0-0-3.bb.spectrumnet.us (174.127.138.127) 86.834 ms cr1-bds-te-0-0-0-1.bb.spectrumnet.us (174.127.138.123) 87.826 ms
14 TulalipTribes-1000M-BDS.demarc.spectrumnet.us (216.243.26.98) 88.101 ms 87.321 ms 88.475 ms
15 * 208.83.58.225 (208.83.58.225) 88.298 ms 88.084 ms
16 74.112.52.200 (74.112.52.200) 87.485 ms 86.812 ms 86.365 ms
17 host-208-71-161-250.tulalipbroadband.com (208.71.161.250) 86.317 ms 86.103 ms 86.366 ms
18 host-208-71-161-175.tulalip.us (208.71.161.175) 108.818 ms 108.118 ms 107.581 ms
19 host-208-71-161-175.tulalip.us (208.71.161.175) 2605.488 ms !H 2617.132 ms !H *

-Bill

-- William Herrin ................ herrin@dirtside.com bill@herrin.us Owner, Dirtside Systems ......... Web: <http://www.dirtside.com/>

Robert Blayzor

16 Jun 16 Jun

11:57 p.m.

On Jun 15, 2015, at 1:50 PM, Joe Hamelin <joe@nethead.com> wrote:

...

I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

F5 GTM? Depending on what your DNS volume is you could probably get away with a couple of virtual appliances… -- Robert inoc.net!rblayzor Jabber: rblayzor.AT.inoc.net PGP Key: 78BEDCE1 @ pgp.mit.edu

Rafael Possamai

17 Jun 17 Jun

4:02 a.m.

Any luck on a DNS based solution? On Mon, Jun 15, 2015 at 12:50 PM, Joe Hamelin <joe@nethead.com> wrote:

...

I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

-- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

Joe Hamelin

4:36 a.m.

On Tue, Jun 16, 2015 at 9:02 PM, Rafael Possamai <rafael@gav.ufsc.br> wrote:

...

Any luck on a DNS based solution?

I'm looking into an F5 GTM solution based out of a colo we have in Europe to direct SMTP between France and the US hubs. Now I just have to work layers 8 & 9. Remember when users didn't expect sub-minute delivery times?

Thanks for everyone's help, you've given me a lot of good ideas to consider and I've learned more than I ever thought I would about anycast. Although I'm not on the BGP end of things anymore I value the minds, personalities and pure history that NANOG brings.

Total side note: I remember back at a NANOG in Atlanta, 2000 maybe, at a BOF on ARIN allocations where I was arguing for netblocks less than a /21 because Amazon couldn't justify that much at that time, I mean we only had one public site but still wanted to multi-home. I remember Randy Bush even backed me up on that one. In the end I did get a block for Amazon and brought up BGP. Oh how times have changed (and how I wish I still had those stock options!)

Best regards, Joe (ex JH484)

-- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

...

Ray Soucy

7:13 p.m.

Anycast is generally not well-suited for stateful connectivity (e.g. most things TCP). The use case for anycast is restricted to simple challenge-response protocol design.

As such, you typically only see it leveraged for simple services (e.g. DNS, NTP).

The reason for this, as you suspect, is you can never guarantee that the path and thus the server will remain consistent across client connections.

Ideally you can leverage DNS to provide a response to a unicast resource rather than trying to make the service itself anycast. DNS can be anycast, and DNS can provide different responses based on geographical location, but these can happen independently or together.

As you still want failover, you might opt to announce the MX record with the priorities reversed but still pointing to each server. For example MX 10 server1, MX 20 server2 on one side, and MX 10 server2, MX 20 server1 on the other.

Typically you would use a DNS load balancer rather than simple anycast DNS to achieve this though.

On Mon, Jun 15, 2015 at 1:50 PM, Joe Hamelin <joe@nethead.com> wrote:

...

I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

-- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

-- Ray Patrick Soucy Network Engineer University of Maine System T: 207-561-3526 F: 207-561-3531 MaineREN, Maine's Research and Education Network www.maineren.net

Chuck Church

9:12 p.m.

----Original Message----- From: NANOG [mailto:nanog-bounces@nanog.org] On Behalf Of Ray Soucy Sent: Wednesday, June 17, 2015 3:14 PM To: Joe Hamelin Cc: NANOG list Subject: Re: Anycast provider for SMTP?

...

As such, you typically only see it leveraged for simple services (e.g. DNS, NTP).

I've been thinking about this for NTP. Wouldn't you end up with constant corrections with NTP and Anycast? Or is the assumption your anycasted NTP hosts are all peers of each other and extremely close in time to one another? That still wouldn't address the latency differences between the different hosts. Chuck

Joe Abley

9:27 p.m.

On Jun 17, 2015, at 17:15, Chuck Church <chuckchurch@gmail.com> wrote:

...

...
As such, you typically only see it leveraged for simple services (e.g. DNS, NTP).

I've been thinking about this for NTP. Wouldn't you end up with constant corrections with NTP and Anycast?

I am not a time geek, but the general and consistent advice I have heard from actual such geeks is, as you suspected, not to use anycast to distribute NTP service. I imagine that advice could be modified somewhat if you differentiate between NTP as used within a mesh of well-synchronised clocks and NTP as an occasional service for mobile clients that require only a loose sense of now. The latter seems like availability might be more important than stability over an extended period, so anycast might make sense there. Joe

Ray Soucy

9:38 p.m.

NTP might have been a bad example for the timing reasons. One thing to keep in mind with anycast is that unless there are problems the routes are fairly stable and depending on how many servers you deploy and what route visibility you have even different providers will often see the same location as the closest path in terms of BGP. I believe pool.ntp.org employs anycast to some extent, but I'm not sure about that. SNTP seems to have a discovery component designed to work well with anycast. RFC 7094 has a good summary of all this. In general, the consensus seems to be that anycast is better used for discovery services rather than services themselves. On Wed, Jun 17, 2015 at 5:12 PM, Chuck Church <chuckchurch@gmail.com> wrote:

...

----Original Message----- From: NANOG [mailto:nanog-bounces@nanog.org] On Behalf Of Ray Soucy Sent: Wednesday, June 17, 2015 3:14 PM To: Joe Hamelin Cc: NANOG list Subject: Re: Anycast provider for SMTP?

...
As such, you typically only see it leveraged for simple services (e.g. DNS, NTP).

I've been thinking about this for NTP. Wouldn't you end up with constant corrections with NTP and Anycast? Or is the assumption your anycasted NTP hosts are all peers of each other and extremely close in time to one another? That still wouldn't address the latency differences between the different hosts.

Chuck

-- Ray Patrick Soucy Network Engineer University of Maine System T: 207-561-3526 F: 207-561-3531 MaineREN, Maine's Research and Education Network www.maineren.net

Kurt Kraut

18 Jun 18 Jun

8:13 a.m.

Ray, "Anycast is generally not well-suited for stateful connectivity (e.g. most things TCP)." I don't know of anything that would support that claim. I have been using BGP anycast for audio and video streaming for years, always over TCP (RTMP, HLS, WMS, and even good old ShoutCast), and it works like a charm. And this is the 'secret sauce' of the company I work for, the thing we do better than our competitors that makes our users happy and never want to leave us: anycast. We have customers that are TV stations streaming their content 24x7x365, and they have viewers receiving those streams 24x7x365 as well (in waiting rooms and airports, for example) with no complaints or instability. Best regards, Kurt Kraut 2015-06-17 16:13 GMT-03:00 Ray Soucy <rps@maine.edu>:

...

Anycast is generally not well-suited for stateful connectivity (e.g. most things TCP). The use case for anycast is restricted to simple challenge-response protocol design.

As such, you typically only see it leveraged for simple services (e.g. DNS, NTP).

The reason for this, as you suspect, is you can never guarantee that the path and thus the server will remain consistent across client connections.

Ideally you can leverage DNS to provide a response to a unicast resource rather than trying to make the service itself anycast. DNS can be anycast, and DNS can provide different responses based on geographical location, but these can happen independently or together.

As you still want failover, you might opt to announce the MX record with the priorities reversed but still pointing to each server. For example MX 10 server1, MX 20 server2 on one side, and MX 10 server2, MX 20 server1 on the other.

Typically you would use a DNS load balancer rather than simple anycast DNS to achieve this though.

On Mon, Jun 15, 2015 at 1:50 PM, Joe Hamelin <joe@nethead.com> wrote:

...
I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

-- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

-- Ray Patrick Soucy Network Engineer University of Maine System

T: 207-561-3526 F: 207-561-3531

MaineREN, Maine's Research and Education Network www.maineren.net

Christopher Morrow

8:22 a.m.

On Thu, Jun 18, 2015 at 4:13 AM, Kurt Kraut via NANOG <nanog@nanog.org> wrote:

...

Ray,

"Anycast is generally not well-suited for stateful connectivity (e.g. most things TCP)."

I don't know anything that would support that claim. I have been using for years BGP anycast for audio and video streaming, always in TCP (RTMP, HLS, WMS, and even the good and old ShoutCast) and works like a charm. And this is the 'secret sauce' of the company I work for, the thing we do better than our competitors that make our users happy and never wanting to leave us: anycast.

We have customers that are TV stations and stream 24x7x365 their content and they have watchers getting their streaming also 24x7x365 (like waiting rooms, airports) with no complaints or instability.

most of this conversation is a distraction from the OP's question though... since his core problem is: "Broken mta behaviour" and won't really be solved with anycast/etc. TCP anycast seems to work just fine, agreed. UDP anycast seems to work just fine, agreed. any anycast deployment has its warts, understanding that and them will make operations and services smoother.

Ray Soucy

11:51 a.m.

I gave a pretty broad answer because the question was about hosting mail servers using anycast. I don't think what I was getting at in regards to stateful vs. stateless was incorrect, but I was talking about the application level not the nature of the protocol and throwing TCP in there confused the issue (I wasn't talking about TCP itself as a stateful protocol; notice I said "most things").

You can certainly do anycast with TCP, and for small stateless services it can be effective. You can't do anycast for a stateful application without taking the split-brain problem into account.

The entire CDN model was developed with anycast in mind, so yes, I'm sure it does work quite well. It generally fits the description of a stateless service, and if it does implement a stateful service it's designed such that nodes have a method of sharing information (perhaps using an eventually consistent model).

Taking a normal application, like mail or a dynamic website, and just using anycast for load balancing without designing the service with the anycast model in mind is probably not a good idea. You need to expect that the same user could access different systems, and design for that.

The real point here is the problem OP is describing should be easily handled by having proper MX records, and getting into anycast for mail is likely not the right choice (unless maybe your goal is to be really efficient at SPAM). I'd like to know more on what problems he's seeing.

On Thu, Jun 18, 2015 at 4:13 AM, Kurt Kraut <listas@kurtkraut.net> wrote:

...

Ray,

"Anycast is generally not well-suited for stateful connectivity (e.g. most things TCP)."

I don't know anything that would support that claim. I have been using for years BGP anycast for audio and video streaming, always in TCP (RTMP, HLS, WMS, and even the good and old ShoutCast) and works like a charm. And this is the 'secret sauce' of the company I work for, the thing we do better than our competitors that make our users happy and never wanting to leave us: anycast.

We have customers that are TV stations and stream 24x7x365 their content and they have watchers getting their streaming also 24x7x365 (like waiting rooms, airports) with no complaints or instability.

Best regards,

Kurt Kraut

2015-06-17 16:13 GMT-03:00 Ray Soucy <rps@maine.edu>:

...
Anycast is generally not well-suited for stateful connectivity (e.g. most things TCP). The use case for anycast is restricted to simple challenge-response protocol design.

As such, you typically only see it leveraged for simple services (e.g. DNS, NTP).

The reason for this, as you suspect, is you can never guarantee that the path and thus the server will remain consistent across client connections.

Ideally you can leverage DNS to provide a response to a unicast resource rather than trying to make the service itself anycast. DNS can be anycast, and DNS can provide different responses based on geographical location, but these can happen independently or together.

As you still want failover, you might opt to announce the MX record with the priorities reversed but still pointing to each server. For example MX 10 server1, MX 20 server2 on one side, and MX 10 server2, MX 20 server1 on the other.

Typically you would use a DNS load balancer rather than simple anycast DNS to achieve this though.

On Mon, Jun 15, 2015 at 1:50 PM, Joe Hamelin <joe@nethead.com> wrote:

...
I have a mail system where there are two MX hosts, one in the US and one in Europe. Both have a DNS MX record metric of 10 so a bastardized round-robin takes place. This does not work so well when one site goes down. My solution will be to place a load balancer in a hosting site (virtual, of course) and have it provide HA. But what about HA for the LB? At first glance anycasting would seem to be a great idea but there is a problem of broken sessions when routes change.

Have any of you seen something like this work in the wild?

-- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

-- Ray Patrick Soucy Network Engineer University of Maine System

T: 207-561-3526 F: 207-561-3531

MaineREN, Maine's Research and Education Network www.maineren.net

-- Ray Patrick Soucy Network Engineer University of Maine System T: 207-561-3526 F: 207-561-3531 MaineREN, Maine's Research and Education Network www.maineren.net

Joe Abley

1:08 p.m.

On 18 Jun 2015, at 7:51, Ray Soucy wrote:

...

You can certainly do anycast with TCP, and for small stateless services it can be effective. You can't do anycast for a stateful application without taking the split-brain problem into account.

It's really difficult to apply broad "can" or "can't", "works" or "doesn't work" advice here since there really are no absolutes. What works and what doesn't depends on the intersection between theory and practice (including other peoples' networks), and is broader than the architectural decision to use or not use anycast.

The text I pasted much earlier from RFC 4786 was a result of a lot of discussion (and more than a handful of objections to our attempts to answer this question, and to the document as a whole existing at all).

In the general, mathematical sense, it's never safe to use anycast with TCP; "safe" here means "entirely safe in all circumstances". Since we live on the Internet, we know nowhere is safe, so this answer is unsatisfying and doesn't help us make real-world decisions.

In the pragmatic, throw it at the wall and see what sticks sense, it's usually fine to use anycast with TCP; "usually" means things like "pretty sure I remember this working just fine at my last job" and "in our very particular situation the helpdesk phone didn't seem to ring". There's usually very little science attached to this answer, either in terms of comprehensive data about failures or in terms of characterising the precise environment and considering the ways in which it is similar or dissimilar to others.

If anycast is being considered as part of a solution to a particular problem, we might consider an answer of the form "anycast, when it works, is expected to solve that problem; anycast might introduce new problems, though, so we also need to think about a fall-back to a situation where the old problems are reintroduced but the new ones are gone". This kind of fudges around the difficulty in confidently enumerating all the new problems with an anticipation that anycast will work enough of the time to make it worth using at all.

So, in the example at hand, using an MX RRSet that tries first to deliver to an SMTP service that is distributed using anycast but will fall back to SMTP service that is not might be a reasonable approach, e.g.

  $ORIGIN QUIRKAFLEEG.ORG.
  @ MX 10 ANY.MX   ; service provided at DEFRA, NLAMS, USIAD, HKHKG
    MX 20 DEFRA.MX ; service provided just at DEFRA
    MX 20 NLAMS.MX ; service provided just at NLAMS
    MX 20 USIAD.MX ; service provided just at USIAD
    MX 20 HKHKG.MX ; service provided just at HKHKG

so a client will first attempt to deliver to ANY.MX.QUIRKAFLEEG.ORG, and if that fails we'll try one of the others.

For this particular question I still think that geoip/dns is a more straightforward approach, since it avoids the possible timeout and retry behaviour of the client that might delay delivery of mail in the event that the anycast MX is unavailable.

Joe
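The fallback behaviour described in this message depends on how a sending mail server walks the MX RRSet. The Python sketch below models that under stated assumptions: targets are tried in ascending preference order, and targets with equal preference are shuffled, which is the behaviour RFC 5321 asks of senders. The function names and the `reachable` set are hypothetical stand-ins for real connection attempts, and the sketch does not model the timeouts mentioned at the end of the message.

```python
import random
from itertools import groupby

# The example RRSet from the message, written out as (preference, target).
MX_RRSET = [
    (10, "ANY.MX.QUIRKAFLEEG.ORG."),    # anycast, served from all four sites
    (20, "DEFRA.MX.QUIRKAFLEEG.ORG."),  # unicast fallbacks, one per site
    (20, "NLAMS.MX.QUIRKAFLEEG.ORG."),
    (20, "USIAD.MX.QUIRKAFLEEG.ORG."),
    (20, "HKHKG.MX.QUIRKAFLEEG.ORG."),
]


def delivery_order(rrset, rng=random):
    """Return targets in the order a sender would try them."""
    ordered = []
    # Group by preference, lowest (most preferred) first.
    for _, group in groupby(sorted(rrset), key=lambda record: record[0]):
        targets = [target for _, target in group]
        rng.shuffle(targets)  # spread load across equal-preference targets
        ordered.extend(targets)
    return ordered


def pick_target(rrset, reachable, rng=random):
    """Return the first target in delivery order that is in `reachable`, or None."""
    for target in delivery_order(rrset, rng):
        if target in reachable:
            return target
    return None


if __name__ == "__main__":
    print(delivery_order(MX_RRSET))
    # Simulate the anycast address being unreachable from this sender.
    up = {target for pref, target in MX_RRSET if pref == 20}
    print(pick_target(MX_RRSET, up))
```

With the anycast target reachable it is always chosen first; with it unreachable, the sender lands on one of the four unicast targets, chosen at random rather than by location.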

Ben

1:59 p.m.

On Thu, Jun 18, 2015 at 09:08:13AM -0400, Joe Abley wrote:

...

On 18 Jun 2015, at 7:51, Ray Soucy wrote:

...
You can certainly do anycast with TCP, and for small stateless services it can be effective. You can't do anycast for a stateful application without taking the split-brain problem into account.

It's really difficult to apply broad "can" or "can't", "works" or "doesn't work" advice here since there really are no absolutes. What works and what doesn't depends on the intersection between theory and practice (including other peoples' networks), and is broader than the architectural decision to use or not use anycast.

The text I pasted much earlier from RFC 4786 was a result of a lot of discussion (and more than a handful of objections to our attempts to answer this question, and to the document as a whole existing at all).

In the general, mathematical sense, it's never safe to use anycast with TCP; "safe" here means "entirely safe in all circumstances". Since we live on the Internet, we know nowhere is safe, so this answer is unsatisfying and doesn't help us make real-world decisions.

In the pragmatic, throw it at the wall and see what sticks sense, it's usually fine to use anycast with TCP; "usually" means things like "pretty sure I remember this working just fine at my last job" and "in our very particular situation the helpdesk phone didn't seem to ring". There's usually very little science attached to this answer, either in terms of comprehensive data about failures or in terms of characterising the precise environment and considering the ways in which it is similar or dissimilar to others.

I think the single greatest issue with anycast is people relying on it too much: traffic falls over in a certain location, say with blackholing, and there's no easy or quick fallback. An example is two DNS servers for a domain that are both served from the same location on anycast. But that can happen without anycast too.

...

If anycast is being considered as part of a solution to a particular problem, we might consider an answer of the form "anycast, when it works, is expected to solve that problem; anycast might introduce new problems, though, so we also need to think about a fall-back to a situation where the old problems are reintroduced but the new ones are gone". This kind of fudges around the difficulty in confidently enumerating all the new problems with an anticipation that anycast will work enough of the time to make it worth using at all.

So, in the example at hand, using an MX RRSet that tries first to deliver to an SMTP service that is distributed using anycast but will fall back to SMTP service that is not might be a reasonable approach, e.g.

$ORIGIN QUIRKAFLEEG.ORG.

@    MX 10 ANY.MX   ; service provided at DEFRA, NLAMS, USIAD, HKHKG
     MX 20 DEFRA.MX ; service provided just at DEFRA
     MX 20 NLAMS.MX ; service provided just at NLAMS
     MX 20 USIAD.MX ; service provided just at USIAD
     MX 20 HKHKG.MX ; service provided just at HKHKG

so a client will first attempt to deliver to ANY.MX.QUIRKAFLEEG.ORG, and if that fails we'll try one of the others.
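A minimal Python sketch of the client-side behaviour described above, assuming the sending MTA sorts the MX RRSet by preference, shuffles targets of equal preference, and moves on when a target fails. The host list mirrors the zone fragment; `try_host` is a hypothetical callback standing in for a real SMTP delivery attempt, and the timeout a real client would spend waiting on a dead anycast MX is not modelled.

```python
import random
from itertools import groupby

# Hypothetical MX RRSet mirroring the zone fragment: (preference, host).
MX_RRSET = [
    (10, "any.mx.quirkafleeg.org"),    # anycast service
    (20, "defra.mx.quirkafleeg.org"),  # unicast fallbacks
    (20, "nlams.mx.quirkafleeg.org"),
    (20, "usiad.mx.quirkafleeg.org"),
    (20, "hkhkg.mx.quirkafleeg.org"),
]

def delivery_order(rrset, rng=random):
    """Return hosts lowest preference first, shuffling equal-preference hosts."""
    ordered = []
    for _, group in groupby(sorted(rrset), key=lambda rr: rr[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered

def deliver(rrset, try_host, rng=random):
    """Try each MX in order; return the first host that accepts, else None."""
    for host in delivery_order(rrset, rng):
        if try_host(host):
            return host
    return None
```

For example, `deliver(MX_RRSET, lambda h: not h.startswith("any."))` simulates the anycast MX being unreachable and returns one of the four preference-20 hosts.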

I think that is the most prudent advice: if using anycast, have a fallback. But following this thread, there's something that's been left unsaid. If there are two MX hosts that can most likely receive mail for users in either location, and one of them is unreliable, then what happens when the unreliable one receives an email and can't pass it on to the relevant place?

One solution is to segregate email into location-dependent domains, so that the right email goes to the right location. But if you want to pick and choose what to send on, it might make sense to proxy all the emails to the destination, so that if email comes in at the dodgy location, is being forwarded to the less dodgy location, and the connection breaks mid-session, the message can be resent and will hopefully hit the less dodgy location.

In some ways, what might make more sense is to get some alternate-path connectivity in the dodgy location, if it's just backhaul that's failing.

...

For this particular question I still think that geoip/dns is a more straightforward approach, since it avoids the possible timeout and retry behaviour of the client that might delay delivery of mail in the event that the anycast MX is unavailable.

For availability without a need for high performance, I think that geoip/dns is probably a better solution than anycast.

But to sidetrack a little, I think that anycasting mail servers, or even moving them closer to the user, isn't happening much yet. Terminating close to the network's ingress and proxying to a relevant location seems to me a way to incorporate some smarts without having to hold e-mail close to the edge, and to slightly improve mail delivery performance for larger emails. The proxy would hold mappings of user to location, then open a connection masquerading as the user's original source for any ACLs, rate limiting or such. And if the connection from the edge to the mail server breaks, then another connection directly to the relevant location may work.

Ben.

Rob Seastrom

5:34 p.m.

Ray Soucy <rps@maine.edu> writes:

...

You can certainly do anycast with TCP, and for small stateless services it can be effective. You can't do anycast for a stateful application without taking the split-brain problem into account.

In my experience, the thing that makes anycast work *well* is having the concept of a Plan B baked into some-layer-above-4. That creates the ability to recover gracefully in the corner case when a routing change causes your session to blow up. Choice of layer 4 protocol doesn't really enter into it, nor does the length of time that the layer 4 session exists (in the case of UDP, generally 2 packets; in the case of TCP, somewhat longer). Shorter sessions have a lower likelihood of losing, due to shorter exposure time, but even for a single-packet-each-way UDP transaction the time (and the risk) is not 0.

People of course use anycast for DNS. Personal experience shows that it also seems to work great for HLS video streaming. I'd imagine it would work fine for email too, since the whole concept of multi-level MX is a "plan-B-at-higher-level" thing.
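The exposure-time argument can be put into rough numbers. The sketch below assumes, purely for illustration, that disruptive route changes arrive as a Poisson process with a constant rate; neither that model nor any particular rate comes from this thread, so treat the output as a shape (risk grows with session length, and is never exactly zero for a non-zero duration) rather than as a measurement.

```python
import math

def p_session_hit(duration_s, route_changes_per_hour):
    """Probability that at least one route change lands inside a session.

    Assumes route changes are a Poisson process (an illustrative assumption),
    so P(hit) = 1 - exp(-rate * duration).
    """
    rate_per_s = route_changes_per_hour / 3600.0
    return 1.0 - math.exp(-rate_per_s * duration_s)

# Illustrative comparison: a ~50 ms DNS-style exchange versus a 30 s SMTP
# session, both under a made-up rate of one disruptive change per hour.
short = p_session_hit(0.05, 1.0)
long = p_session_hit(30.0, 1.0)
```

Under any fixed rate the longer session is more exposed, which is why a retry path above layer 4 matters more for longer-lived sessions.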

...

The entire CDN model was developed with anycast in mind,

Not really; practical application of anycast was nascent when US 6,108,703 (the "Akamai patent", which centered around DNS) was filed. A brief history of anycast is at https://tools.ietf.org/html/draft-mcpherson-anycast-arch-implications-00 section 3.

...

Taking a normal application, like mail or a dynamic website, and just using anycast for load balancing without designing the service with the anycast model in mind is probably not a good idea. You need to expect that the same user could access different systems, and design for that.

For anything at scale, wherein one has multiple back end devices, one must already design for that. Designing consistency-synchronized systems that work over continental or global scale latency is left as an exercise to the implementer.

...

The real point here is the problem OP is describing should be easily handled by having proper MX records, and getting into anycast for mail is likely not the right choice (unless maybe your goal is to be really efficient at SPAM).

Probably originating outbound connections to arbitrary locations from an anycast locator is a step away from goodness. -r

Jonas Björk

7:43 p.m.

While risking being slightly off topic: Does anyone use anycast dhcp servers? Have you run into any problems considering synching the leases?

Joe Abley

7:51 p.m.

On 18 Jun 2015, at 15:43, Jonas Björk wrote:

...

While risking being slightly off topic: Does anyone use anycast dhcp servers? Have you run into any problems considering synching the leases?

Since DHCP uses broadcast and multicast addresses when a client is discovering a server, it's not obvious why you'd have to. You can run redundant sets of isc-dhcpd servers together serving the same broadcast domain and have them assign leases from the same address pools (at least, I've never tried it, but I was within internal mailing list range of the person maintaining that code and heard him shouting fairly often about it, not always in tones of rage and frustration). Was that what you were after? Joe

Nick Hilliard

7:54 p.m.

On 18/06/2015 20:51, Joe Abley wrote:

...

Since DHCP uses broadcast and multicast addresses when a client is discovering a server, it's not obvious why you'd have to.

Most non-trivial (i.e. routed) networks would use DHCP relay, in which case anycast DHCP could be argued to make some sense. TBH, the OP would be better off with multiple unicast installations with backup configured. Most decent-quality DHCP implementations can operate in active/failover mode. Nick

Baldur Norddahl

8:01 p.m.

Den 18/06/2015 21.52 skrev "Joe Abley" <jabley@hopcount.ca>:

...

On 18 Jun 2015, at 15:43, Jonas Björk wrote:

...
While risking being slightly off topic: Does anyone use anycast dhcp servers? Have you run into any problems considering synching the leases?

Since DHCP uses broadcast and multicast addresses when a client is discovering a server, it's not obvious why you'd have to.

Because clients will switch to unicast for renewal. Also clients will stay with the current server forever, so you might have a bad distribution of load between the servers. If one server was down everyone will switch to the other and never go back until forced. Regards Baldur

Jonas Björk

9:25 p.m.

...

Because clients will switch to unicast for renewal. Also clients will stay with the current server forever, so you might have a bad distribution of load between the servers. If one server was down everyone will switch to the other and never go back until forced.

Why wouldn't they go back to the nearest server when it comes back online?

...

Regards Baldur

Mike Meredith

19 Jun

8:39 a.m.

On Thu, 18 Jun 2015 15:51:31 -0400, "Joe Abley" <jabley@hopcount.ca> may have written:

...

Since DHCP uses broadcast and multicast addresses when a client is discovering a server, it's not obvious why you'd have to.

And broadcast/multicast when renewing a lease (DHCPREQUEST). You will of course see unicast addresses on the server side if the server is seeing requests forwarded by a udp helper.

...

You can run redundant sets of isc-dhcpd servers together serving the same broadcast domain and have them assign leases from the same address pools (at least, I've never tried it, but I was within

Indeed. Rock solid in my experience (on a "little" network). -- Mike Meredith, University of Portsmouth Principal Systems Engineer, Hostmaster, Security, and Timelord!

Baldur Norddahl

1:43 p.m.

On 19 June 2015 at 10:39, Mike Meredith <mike.meredith@port.ac.uk> wrote:

...

On Thu, 18 Jun 2015 15:51:31 -0400, "Joe Abley" <jabley@hopcount.ca> may have written:

...
Since DHCP uses broadcast and multicast addresses when a client is discovering a server, it's not obvious why you'd have to.

And broadcast/multicast when renewing a lease (DHCPREQUEST). You will of course see unicast addresses on the server side if the server is seeing requests forwarded by a udp helper.

RFC 2131 section 4.4.5:

"At time T1 the client moves to RENEWING state and sends (*via unicast*) a DHCPREQUEST message to the server to extend its lease. The client sets the 'ciaddr' field in the DHCPREQUEST to its current network address. The client records the local time at which the DHCPREQUEST message is sent for computation of the lease expiration time. The client MUST NOT include a 'server identifier' in the DHCPREQUEST message."

Also from section 4.3.2:

"DHCPREQUEST generated during RENEWING state: 'server identifier' MUST NOT be filled in, 'requested IP address' option MUST NOT be filled in, 'ciaddr' MUST be filled in with client's IP address. In this situation, the client is completely configured, and is trying to extend its lease. This message will be *unicast*, so *no relay agents will be involved in its transmission*. Because 'giaddr' is therefore not filled in, the DHCP server will trust the value in 'ciaddr', and use it when replying to the client."

If there is no reply to the unicast, the client should eventually fall back to broadcast, but a great number of DHCP clients fail to implement that. They will instead keep unicasting until the lease expires, then start over, including deconfiguring the IP stack, and then send DISCOVER.

Regards, Baldur
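As a rough illustration of the renewal timeline in the RFC 2131 text quoted above, the Python sketch below maps elapsed lease time to client state and to how the next message is sent. It assumes the RFC's default timer values (T1 at 0.5 and T2 at 0.875 of the lease) and a well-behaved client; the function name and return format are made up for this example, and the clients described above that never fall back to broadcast would skip the REBINDING step.

```python
def dhcp_client_state(elapsed, lease, t1_frac=0.5, t2_frac=0.875):
    """Return (state, how the next message is sent) for a lease of `lease` seconds.

    t1_frac and t2_frac default to the RFC 2131 defaults for T1 and T2.
    """
    t1 = lease * t1_frac
    t2 = lease * t2_frac
    if elapsed < t1:
        return ("BOUND", None)  # nothing to send yet
    if elapsed < t2:
        # DHCPREQUEST goes straight to the leasing server; no relay agent involved.
        return ("RENEWING", "unicast DHCPREQUEST to the leasing server")
    if elapsed < lease:
        # No answer to the unicast: any server on the link (or via relay) may reply.
        return ("REBINDING", "broadcast DHCPREQUEST")
    # Lease expired: the client starts over from scratch.
    return ("INIT", "broadcast DHCPDISCOVER")
```

With a one-hour lease, the client would unicast from the 30-minute mark and only reach other servers by broadcast from 52.5 minutes onward, which is the window in which a dead or moved server matters.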

Masataka Ohta

18 Jun

11:51 p.m.

On 2015/06/19 4:43, Jonas Björk wrote:

...

While risking being slightly off topic: Does anyone use anycast dhcp servers? Have you run into any problems considering synching the leases?

In general, multiple anycast servers on a link, which is the anycast model of IPv6, is a bad idea, because broadcast just works. Of course, IPv6 inhibition of broadcast is another bad idea. Masataka Ohta

64 comments

30 participants

participants (30)

Baldur Norddahl
Ben
Bill Woodcock
Christopher Morrow
Chuck Church
Dave Taht
Guillaume Tournat
James Hartig
Joe Abley
Joe Hamelin
John Levine
John Orthoefer
Jon Lewis
Jonas Björk
Jürgen Jaritsch
Kurt Kraut
Mark Andrews
Masataka Ohta
Matt Palmer
Max Tulyev
Mike Meredith
Nick Hilliard
Owen DeLong
Rafael Possamai
Randy Bush
Ray Soucy
Rob Seastrom
Robert Blayzor
Tony Finch
William Herrin