Re: Anycast provider for SMTP?

18 Jun 2015

      On Thu, Jun 18, 2015 at 09:08:13AM -0400, Joe Abley wrote:
...
On 18 Jun 2015, at 7:51, Ray Soucy wrote:
...
You can certainly do anycast with TCP, and for small stateless services it
can be effective.  You can't do anycast for a stateful application without
taking the split-brain problem into account.
It's really difficult to apply broad "can" or "can't", "works" or "doesn't
work" advice here since there really are no absolutes. What works and what
doesn't depends on the intersection between theory and practice (including
other peoples' networks), and is broader than the architectural decision to
use or not use anycast.
The text I pasted much earlier from RFC 4786 was a result of a lot of
discussion (and more than a handful of objections to our attempts to answer
this question, and to the document as a whole existing at all).
In the general, mathematical sense, it's never safe to use anycast with TCP;
"safe" here means "entirely safe in all circumstances". Since we live on the
Internet, we know nowhere is safe, so this answer is unsatisfying and
doesn't help us make real-world decisions.
In the pragmatic, throw it at the wall and see what sticks sense, it's
usually fine to use anycast with TCP; "usually" means things like "pretty
sure I remember this working just fine at my last job" and "in our very
particular situation the helpdesk phone didn't seem to ring". There's
usually very little science attached to this answer, either in terms of
comprehensive data about failures or in terms of characterising the precise
environment and considering the ways in which it is similar or dissimilar to
others.
I think the single greatest issue with anycast is people relying on it too heavily,
so that when traffic fails at a particular location, say through blackholing, there's
no easy or quick fallback.  For example, both DNS servers for a domain being served
from the same anycast location.  But that can happen without anycast too.
...
If anycast is being considered as part of a solution to a particular
problem, we might consider an answer of the form "anycast, when it works, is
expected to solve that problem; anycast might introduce new problems,
though, so we also need to think about a fall-back to a situation where the
old problems are reintroduced but the new ones are gone". This kind of
fudges around the difficulty in confidently enumerating all the new problems
with an anticipation that anycast will work enough of the time to make it
worth using at all.
So, in the example at hand, using an MX RRSet that tries first to deliver to
an SMTP service that is distributed using anycast but will fall back to SMTP
service that is not might be a reasonable approach, e.g.
$ORIGIN QUIRKAFLEEG.ORG.
@  MX 10 ANY.MX   ; service provided at DEFRA, NLAMS, USIAD, HKHKG
   MX 20 DEFRA.MX ; service provided just at DEFRA
   MX 20 NLAMS.MX ; service provided just at NLAMS
   MX 20 USIAD.MX ; service provided just at USIAD
   MX 20 HKHKG.MX ; service provided just at HKHKG.
so a client will first attempt to deliver to ANY.MX.QUIRKAFLEEG.ORG, and if
that fails we'll try one of the others.
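
A rough sketch of the sender-side behaviour this relies on, in Python: sort the MX
RRSet by preference, shuffle hosts that share a preference, and try each in turn.
The function names and the try_host callback are made up for illustration; a real
MTA would resolve the exchanges and speak SMTP where the callback is.

```python
import random
from itertools import groupby

# MX records from the example zone above: (preference, exchange).
MX_RECORDS = [
    (20, "NLAMS.MX.QUIRKAFLEEG.ORG."),
    (10, "ANY.MX.QUIRKAFLEEG.ORG."),
    (20, "DEFRA.MX.QUIRKAFLEEG.ORG."),
    (20, "USIAD.MX.QUIRKAFLEEG.ORG."),
    (20, "HKHKG.MX.QUIRKAFLEEG.ORG."),
]


def delivery_order(records, rng=random):
    """Return exchanges in the order a sender would try them.

    Lowest preference value first; hosts sharing a preference are shuffled.
    """
    ordered = []
    by_pref = sorted(records, key=lambda r: r[0])
    for _, group in groupby(by_pref, key=lambda r: r[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered


def deliver(records, try_host, rng=random):
    """Try each exchange in turn; return the first that accepts, or None.

    try_host is a hypothetical callback standing in for a real SMTP attempt.
    """
    for host in delivery_order(records, rng):
        if try_host(host):
            return host
    return None
```

With the zone above, ANY.MX is always tried first, and the four unicast hosts are
only tried, in random order, if the anycast attempt fails.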
I think that is the most prudent advice: if using anycast, have a fallback.  But
following this thread, there's something that no-one seems to have mentioned.

If there are two MX hosts that can most likely receive mail for users in either
location, and one of them is unreliable, then what happens when the unreliable one
receives an email and can't pass it on to the relevant place?

One solution is to segregate email into location-dependent domains, so that each
message goes straight to the right location.  But if you want to pick and choose what
to send on, it might make sense to proxy all the emails to the destination.  Then if
email comes in at the dodgy location, is being forwarded to the less dodgy location,
and the connection breaks partway through, the message can be resent and will
hopefully hit the less dodgy location.

And I think in some ways what might make more sense is to get some alternate path
connectivity in the dodgy location if it's just backhaul that's failing.
...
For this particular question I still think that geoip/dns is a more
straightforward approach, since it avoids the possible timeout and retry
behaviour of the client that might delay delivery of mail in the event that
the anycast MX is unavailable.
For availability where high performance isn't necessary, I think that geoip/dns
is probably a better solution than anycast.

But to sidetrack a little, I think that anycasting mail servers, or even moving them
closer to the user, isn't happening much yet.  Terminating connections close to where
they enter the network and proxying to the relevant location seems to me a way to
incorporate some smarts without having to hold e-mail close to the edge, and to
slightly improve mail delivery performance for larger emails.  The proxy would hold
mappings of user to location, then open a connection masquerading as the user's
original source for the benefit of any ACLs, rate limiting or such.  And if the
connection from the edge to the mail server breaks, then another connection directly
to the relevant location may work.
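
To make the idea a bit more concrete, here's a minimal sketch in Python.  Everything
in it is an assumption for illustration: the user table, the backend hostnames, the
default location and the send callback are all made up.  The one design point it
shows is that the edge never queues mail itself; if the proxied connection breaks it
hands back a temporary failure so the sender retries.  It doesn't attempt the
source-address masquerading.

```python
# Hypothetical user-to-location table held by the edge proxy.
USER_LOCATION = {
    "alice@quirkafleeg.org": "DEFRA",
    "bob@quirkafleeg.org": "HKHKG",
}

# Hypothetical backend mail server for each location.
BACKENDS = {
    "DEFRA": "defra.mx.quirkafleeg.org",
    "HKHKG": "hkhkg.mx.quirkafleeg.org",
}


def backend_for(recipient, default_location="DEFRA"):
    """Look up the backend host that holds this recipient's mailbox."""
    location = USER_LOCATION.get(recipient.lower(), default_location)
    return BACKENDS[location]


def relay(recipient, message, send):
    """Proxy one message to the recipient's home location.

    send(host, recipient, message) is a hypothetical callback that returns
    True once the backend has accepted the message.  The edge only reports
    success after the backend has accepted, so nothing is held at the edge.
    """
    host = backend_for(recipient)
    try:
        accepted = send(host, recipient, message)
    except ConnectionError:
        accepted = False
    # A 4xx reply is a temporary failure, so the sending MTA will retry later.
    return "250 OK" if accepted else "451 Temporary failure, try again later"
```

The retry then goes through the usual MX selection again, so it may well land on a
different host than the first attempt.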

Ben.