I hate to jump in late. but... :)

After reading this a few times it seems like what's going on is:

  o a set of assumptions were built into the software stack; this seems fine, hard to build without some assumptions :)

  o the assumptions seem to include: "if rrdp fails <how?> feel free to jump back/to rsync"
    I think SOME of the problem is the 'how' there. Admittedly someone (randy) injected a pretty pathological failure mode into the system and didn't react when his 'monitoring' said: "things are broke yo!"

  o absent a 'failure' the software kept on getting along as it had before. After all, maybe the operator here intentionally put their repository into this whacky state? How is an RP software stack supposed to know what the PP's management is meaning to do?

  o lots of debate about how we got to where we are, I don't know that much of it is really helpful.

I think a way forward here is to offer a suggestion for the software folk to cogitate on and improve:

  "What if (for either rrdp or rsync) there is no successful update[0] in X of Y attempts, attempt the other protocol to sync down to bring the remote PP back to life in your local view." (a rough sketch of this idea follows below)

This both allows the RP software to pick their primary path (and stick to that path as long as things work) AND helps the PP folk recover a bit quicker if their deployment runs into troubles.

0: I think 'failure' here is clear (to me):
   1) the protocol is broken (rsync no connect, no http connect)
   2) the connection succeeds but there is no sync-file (rrdp) nor valid MFT/CRL

The 6486-bis rework effort seems to be getting to: "No MFT? no CRL? you r busted!" So I think if you don't get MFT/CRL in X of Y attempts it's safe to say the PP over that protocol is busted, and attempting the other proto is acceptable.

thanks!
-chris

On Mon, Nov 2, 2020 at 4:37 AM Job Snijders <job@ntt.net> wrote:
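A minimal sketch of the "X of Y attempts" fallback idea above, assuming the RP keeps a short per-repository history of fetch outcomes; the `Proto` and `SyncState` names and the window sizes are illustrative only, not taken from any existing RP implementation:

```rust
// Sketch of the "no successful update in X of Y attempts -> try the other
// protocol" heuristic. A fetch counts as failed if the transport did not
// connect, or if it connected but produced no usable snapshot/delta (rrdp)
// and no valid MFT/CRL.
#[derive(Clone, Copy, PartialEq)]
enum Proto {
    Rrdp,
    Rsync,
}

struct SyncState {
    preferred: Proto,
    recent: Vec<bool>,   // outcomes of the last Y attempts, true = success
    window: usize,       // Y
    max_failures: usize, // X
}

impl SyncState {
    fn new(preferred: Proto, window: usize, max_failures: usize) -> Self {
        SyncState { preferred, recent: Vec::new(), window, max_failures }
    }

    // Record one attempt over the preferred protocol and decide which
    // protocol to use for the next sync of this publication point.
    fn record(&mut self, success: bool) -> Proto {
        self.recent.push(success);
        if self.recent.len() > self.window {
            self.recent.remove(0);
        }
        let failures = self.recent.iter().filter(|ok| !**ok).count();
        if failures >= self.max_failures {
            // The PP looks busted over the preferred protocol:
            // fall back to the other one to bring it back to life locally.
            match self.preferred {
                Proto::Rrdp => Proto::Rsync,
                Proto::Rsync => Proto::Rrdp,
            }
        } else {
            self.preferred
        }
    }
}

fn main() {
    let mut state = SyncState::new(Proto::Rrdp, 4, 3); // Y = 4, X = 3
    for outcome in [true, false, false, false] {
        let next = state.record(outcome);
        println!(
            "next fetch over {}",
            if next == Proto::Rrdp { "rrdp" } else { "rsync" }
        );
    }
}
```

The point of the sketch is only that the RP sticks with its primary transport while things work, and only swaps transports after a run of failures, which matches the recovery behaviour suggested above.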
On Mon, Nov 02, 2020 at 09:13:16AM +0100, Tim Bruijnzeels wrote:
> On the other hand, the fallback exposes a Malicious-in-the-Middle replay attack surface for 100% of the prefixes published using RRDP, 100% of the time. This allows attackers to prevent changes in ROAs from being seen.
This is a mischaracterization of what is going on. The implication of what you say here is that RPKI cannot work reliably over RSYNC, which is factually incorrect and an injustice to all existing RSYNC-based deployments. Your view on the security model seems to ignore the existence of RPKI manifests and the use of CRLs, which exist exactly to mitigate replays.
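As a rough illustration of how the manifest/CRL machinery bounds replay: a replayed manifest is either older than one the RP has already validated or past its nextUpdate, so the RP can tell the publication point is stale regardless of which transport delivered the data. A minimal sketch; the `Manifest` struct and its fields are hypothetical, not any particular validator's API:

```rust
use std::time::{Duration, SystemTime};

// Hypothetical, already signature-verified manifest metadata.
struct Manifest {
    manifest_number: u64,    // incremented by the CA on every reissuance
    this_update: SystemTime,
    next_update: SystemTime,
}

// A replayed (old) manifest either carries a number lower than one the RP
// has already validated, or is past its nextUpdate. Either way the RP can
// refuse to treat the publication point as freshly updated.
fn is_current(mft: &Manifest, highest_seen_number: u64, now: SystemTime) -> bool {
    if mft.manifest_number < highest_seen_number {
        return false; // older than data we already have: replay
    }
    if now < mft.this_update || now > mft.next_update {
        return false; // not yet valid, or stale (the CA should have reissued)
    }
    true
}

fn main() {
    let now = SystemTime::now();
    // A month-old manifest being replayed today.
    let replayed = Manifest {
        manifest_number: 41,
        this_update: now - Duration::from_secs(30 * 24 * 3600),
        next_update: now - Duration::from_secs(29 * 24 * 3600),
    };
    println!("accept replayed manifest? {}", is_current(&replayed, 42, now));
}
```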
Up until 2 weeks ago Routinator indeed was not correctly validating RPKI data; fortunately this has now been fixed: https://mailman.nanog.org/pipermail/nanog/2020-October/210318.html
Also via the RRDP protocol old data can be replayed, because just like RSYNC, the RRDP protocol does not have authentication. When RPKI data is transported from Publication Point (PP) to Relying Party (RP), the RP cannot assume there was an unbroken 'chain of custody' and therefore has to validate all the RPKI signatures.
For example, if a CDN is used to distribute RRDP data, the CDN is the MITM (that is literally what CDNs are: reverse proxies, in the middle). The CDN could accidentally serve up old (cached) content or mis-serve current content (swap 2 filenames with each other).
> This is a tradeoff. I think that protecting against replay should be considered more important here, given the numbers and the time it takes to fix an HTTPS issue.
The 'replay' issue you perceive is also present in RRDP. The RPKI is a *deployed* system on the Internet and it is important for Routinator to remain interoperable with other non-NLnet Labs implementations.
Routinator not falling back to rsync does *not* offer a security advantage, but does negatively impact our industry's ability to migrate to RRDP. We are in 'phase 0' as described in Section 3 of https://tools.ietf.org/html/draft-sidrops-bruijnzeels-deprecate-rsync
Regards,
Job