Re: Is multihoming hard? [was: DNS amplification]

25 Mar 2013


I assume those people will not bother with any attempt to multihome in any form.
They are not, therefore, part of what is being discussed here.

Owen

On Mar 23, 2013, at 19:47 , Kyle Creyts <kyle.creyts@gmail.com> wrote:
...
You do realize that there are quite a few people (home broadband subscribers?) who just "go do something else" when their internet goes down, right?
There are people who don't understand the difference between "a site being slow" and packet-loss. For many of these people, losing internet service carries zero business impact, and relatively little life impact; they might even realize they have better things to do than watch cat videos or scroll through endless social media feeds.
Will they really demand ubiquitous, unabridged connectivity?
When?
On Mar 23, 2013 12:58 PM, "Owen DeLong" <owen@delong.com> wrote:
...
On Mar 23, 2013, at 12:12 , Jimmy Hess <mysidia@gmail.com> wrote:
...
On 3/23/13, Owen DeLong <owen@delong.com> wrote:
...
A reliable cost-effective means for FTL signaling is a hard problem without
a known solution.
Faster than light signalling is not merely a hard problem.
Special relativity doesn't provide that information may travel faster
than the maximum speed c. If you want to signal faster than light, then
slow down the light.
...
An idiot-proof simple BGP configuration is a well known solution. Automating
it would be relatively simple if there were the will to do so.
Logistical problems...  if it's a multihomed connection, which of the
two or three providers manages it,  and gets to blame the other
provider(s) when anything goes wrong: or are you gonna rely on the
customer to manage it?
The box could (pretty easily) be built with a "Primary" and "Secondary" port.
The cable plugged into the primary port would go to the ISP that sets the
configuration. The cable plugged into the other port would go to an ISP
expected to accept the announcements of the prefix provided by the ISP
on the primary port.
BFD could be used to illuminate a tri-color LED on the box for each port,
which would be green if BFD state is good and red if BFD state is bad.
At that point, whichever one is red gets the blame. If they're both green,
then traffic is going via the primary and the primary gets the blame.
If you absolutely have to troubleshoot which provider is broken, then
start by unplugging the secondary. If it doesn't start working in 5 minutes,
then clearly there's a problem with the primary regardless of what else
is happening.
Lather, rinse, repeat for the secondary.
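As a rough sketch of the LED and blame rule described above (not any shipping product; the function names and the boolean "BFD is up" inputs are assumptions made for illustration), the logic could look something like this in Python. The both-red case is not spelled out above; the sketch simply blames both, following "whichever one is red gets the blame".

```python
from enum import Enum
from typing import List


class Led(Enum):
    GREEN = "green"  # BFD state is good
    RED = "red"      # BFD state is bad


def led_color(bfd_up: bool) -> Led:
    """Map a port's BFD state to the LED color described in the thread."""
    return Led.GREEN if bfd_up else Led.RED


def providers_to_blame(primary_bfd_up: bool, secondary_bfd_up: bool) -> List[str]:
    """Return which provider(s) to blame when the customer reports a problem.

    Hypothetical helper: any port showing red is blamed. If both ports are
    green, traffic is going via the primary, so the primary is blamed.
    """
    blamed = []
    if not primary_bfd_up:
        blamed.append("primary")
    if not secondary_bfd_up:
        blamed.append("secondary")
    if not blamed:
        # Both green: traffic prefers the primary, so start there.
        blamed.append("primary")
    return blamed
```

Calling providers_to_blame(True, False) points at the secondary only, while providers_to_blame(True, True) points at the primary.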
...
Someone might be able to make a protocol that lets this happen, which
would need to detect on a per-route basis any performance/connectivity
issues, but I would say it's not any known implementation of BGP.
A few additional options to DHCP could actually cover it from the primary
perspective.
For the secondary provider, it's a little more complicated, but could be
mostly automated so long as the customer identifies the primary provider
and/or provides an LOA for the authorized prefix from the primary to
the secondary.
The only complexity in the secondary case is properly filtering the announcement
of the prefix assigned by the primary.
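To illustrate the filtering step on the secondary side, here is a minimal Python sketch using the standard ipaddress module. It assumes the secondary accepts only an exact match of the prefix named in the LOA; the function name and the documentation prefix 2001:db8:1::/48 are placeholders, not anything a real provider uses.

```python
import ipaddress


def accept_announcement(announced: str, authorized: str) -> bool:
    """Accept a customer announcement only if it exactly matches the
    prefix the primary assigned (as identified by the LOA).

    Assumption: exact-match filtering. More-specifics and unrelated
    prefixes are rejected.
    """
    announced_net = ipaddress.ip_network(announced)
    authorized_net = ipaddress.ip_network(authorized)
    return announced_net == authorized_net


# Hypothetical prefix assigned by the primary provider.
AUTHORIZED = "2001:db8:1::/48"

print(accept_announcement("2001:db8:1::/48", AUTHORIZED))     # exact match
print(accept_announcement("2001:db8:1:1::/64", AUTHORIZED))   # more-specific
print(accept_announcement("2001:db8:2::/48", AUTHORIZED))     # unrelated
```

Whether a provider would also want to permit a bounded range of more-specifics is a policy choice this sketch does not try to settle.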
...
...
1.   ISPs are actually motivated to prevent customer mobility, not enable it.
...
2.   ISPs are motivated to reduce, not increase the number of multi-homed
     sites occupying slots in routing tables.
This is not some insignificant thing. The ISPs have to maintain routing tables
as well; ultimately the ISP's customers are in bad shape if too many slots
are consumed.
I never said it was insignificant. I said that solving the multihoming problem
in this manner was trivial if there was will to do so. I also said that the above
were contributing factors in the lack of will to do so.
...
How about
  3.  Increased troubleshooting complexity when there are potential
issues or complaints.
I do not buy that it is harder to troubleshoot a basic BGP configuration
than a multi-carrier NAT-based solution that goes woefully awry.
I'm sorry, I've done the troubleshooting on both scenarios and I have
to say that if you think NAT makes this easier, you live in a different
world than I do.
...
The concept of a "fool proof"  BGP configuration is clearly a new sort of myth.
Not really.
Customer router accepts default from primary and secondary providers.
So long as default remains, primary is preferred. If primary default goes
away, secondary is preferred.
Customer box gets prefix (via DHCP-PD or static config or whatever
either from primary or from RIR). Advertises prefix to both primary
and secondary.
All configuration of the BGP sessions is automated within the box
other than static configuration of customer prefix (if static is desired).
Primary/Secondary choice is made by plugging providers into the
Primary or Secondary port on the box.
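The behavior of the box as described above can be modeled in a few lines of Python. This is only a sketch of the decision logic, not a BGP implementation; the port names, function names, and the dictionary inputs are assumptions made for illustration.

```python
from typing import Dict, Optional

# Ports in preference order: the primary wins whenever its default is present.
PORTS = ("primary", "secondary")


def select_egress(default_present: Dict[str, bool]) -> Optional[str]:
    """Pick the outbound provider from the default routes currently received.

    So long as the primary's default remains, the primary is preferred.
    If it goes away, the secondary is used. None means no default at all.
    """
    for port in PORTS:
        if default_present.get(port, False):
            return port
    return None


def advertisements(customer_prefix: str) -> Dict[str, str]:
    """The customer prefix is advertised to both providers."""
    return {port: customer_prefix for port in PORTS}
```

For example, select_egress({"primary": False, "secondary": True}) returns "secondary", and advertisements() always lists the same prefix for both ports, however that prefix was obtained (DHCP-PD, static config, or RIR assignment).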
...
The idea that the protocol on its own, with a very basic config, does
not ever require any additional attention to achieve expected results, where
expected results include isolation from any faults with the path from
one of the user's two, three, or four providers, and balancing
for optimal throughput and best latency/loss to every destination.
I have installed these configurations at customer sites for several of
my consulting clients that wanted to multihome their SMBs.
Some of them have been running for more than 8 years without a
single issue.
For all of the above requirements, no. You can't do that with the most
advanced manual BGP configurations today.
However, if we reduce it to:
1.      The internet connection stays up so long as one of the two
        providers is up.
2.      Traffic prefers the primary provider so long as the primary provider
        is up.
3.      My addressing remains stable so long as I remain connected to
        the primary provider (or if I use RIR based addressing, longer).
Then what I have proposed actually is achievable, does work, and
does actually meet the needs of 99+% of organizations that wish to
multihome.
...
BGP multihoming doesn't  prevent users from having issues because:
o Connectivity issues that are the responsibility of one of their providers,
        that they might have expected multihoming to protect them against
        (latency, packet loss).
Correct. However, this is true of ANY multihoming solution. The dual-
provider NAT solution certainly does NOT improve this.
...
o Very poor performance of one of their links, or poor performance of one
        of their links to their favorite destination
See above.
...
o Asymmetric paths;  which means that when latency or loss is poor,
        the customer doesn't necessarily know which provider to blame,
        or if both are at fault,  and  the providers can spend a lot of time
        blaming each other.
See above.
...
These are all solvable problems, but at a cost, and therefore not for
mass-market lowest-cost ISP service.
My point is that the automated simple BGP solution I propose can provide
a better customer experience than the currently popular NAT-based
multihoming with simpler troubleshooting and lower costs.
...
It's not as if they can have
   "Hello, DSL technical support... did you try shutting off your
other peers and retesting?"
ROFL.
...
The average end user won't have a clue -- they will need one of the
providers, or someone else to be managing that for them,  and
understand  how each provider is connected.
Again, you're setting a much higher goal than I was.
My goal was to do something better than what is currently being done.
(Connect a router to two providers and use NAT to choose between them).
...
I don't see large ISPs  training up their support reps for  DSL
$60/month services, to handle BGP troubleshooting, and multihoming
management/repair.
But they already get stuck with this in the current NAT-based solution which
is even harder to troubleshoot and creates even more problems.
Owen