I hate to jump in late. but... :)

After reading this a few times it seems like what's going on is:

  o a set of assumptions were built into the software stack; this seems fine, hard to build without some assumptions :)

  o the assumptions seem to include: "if rrdp fails <how?> feel free to jump back/to rsync"
    I think SOME of the problem is the 'how' there. Admittedly someone (randy) injected a pretty pathological failure mode into the system and didn't react when his 'monitoring' said: "things are broke yo!"

  o absent a 'failure' the software kept on getting along as it had before. After all, maybe the operator here intentionally put their repository into this whacky state? How is an RP software stack supposed to know what the PP's management is meaning to do?

  o lots of debate about how we got to where we are, I don't know that much of it is really helpful.

I think a way forward here is to offer a suggestion for the software folk to cogitate on and improve:

  "What if (for either rrdp or rsync) there is no successful update[0] in X of Y attempts, attempt the other protocol to sync down to bring the remote PP back to life in your local view." (a rough sketch of this idea follows below)

This both allows the RP software to pick their primary path (and stick to that path as long as things work) AND helps the PP folk recover a bit quicker if their deployment runs into troubles.

0: I think 'failure' here is clear (to me):
   1) the protocol is broken (rsync no connect, no http connect)
   2) the connection succeeds but there is no sync-file (rrdp) nor valid MFT/CRL

The 6486-bis rework effort seems to be getting to: "No MFT? no CRL? you r busted!" So I think if you don't get MFT/CRL in X of Y attempts it's safe to say the PP over that protocol is busted, and attempting the other proto is acceptable.

thanks!
-chris

On Mon, Nov 2, 2020 at 4:37 AM Job Snijders <job@ntt.net> wrote:
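A minimal sketch of the "X of Y attempts" fallback idea above, assuming the RP keeps a short per-repository history of fetch outcomes; the `Proto` and `SyncState` names and the window sizes are illustrative only, not taken from any existing RP implementation:

```rust
// Sketch of the "no successful update in X of Y attempts -> try the other
// protocol" heuristic. A fetch counts as failed if the transport did not
// connect, or if it connected but produced no usable snapshot/delta (rrdp)
// and no valid MFT/CRL.
#[derive(Clone, Copy, PartialEq)]
enum Proto {
    Rrdp,
    Rsync,
}

struct SyncState {
    preferred: Proto,
    recent: Vec<bool>,   // outcomes of the last Y attempts, true = success
    window: usize,       // Y
    max_failures: usize, // X
}

impl SyncState {
    fn new(preferred: Proto, window: usize, max_failures: usize) -> Self {
        SyncState { preferred, recent: Vec::new(), window, max_failures }
    }

    // Record one attempt over the preferred protocol and decide which
    // protocol to use for the next sync of this publication point.
    fn record(&mut self, success: bool) -> Proto {
        self.recent.push(success);
        if self.recent.len() > self.window {
            self.recent.remove(0);
        }
        let failures = self.recent.iter().filter(|ok| !**ok).count();
        if failures >= self.max_failures {
            // The PP looks busted over the preferred protocol:
            // fall back to the other one to bring it back to life locally.
            match self.preferred {
                Proto::Rrdp => Proto::Rsync,
                Proto::Rsync => Proto::Rrdp,
            }
        } else {
            self.preferred
        }
    }
}

fn main() {
    let mut state = SyncState::new(Proto::Rrdp, 4, 3); // Y = 4, X = 3
    for outcome in [true, false, false, false] {
        let next = state.record(outcome);
        println!(
            "next fetch over {}",
            if next == Proto::Rrdp { "rrdp" } else { "rsync" }
        );
    }
}
```

The point of the sketch is only that the RP sticks with its primary transport while things work, and only swaps transports after a run of failures, which matches the recovery behaviour suggested above.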
On Mon, Nov 02, 2020 at 09:13:16AM +0100, Tim Bruijnzeels wrote:
> On the other hand, the fallback exposes a Malicious-in-the-Middle replay attack surface for 100% of the prefixes published using RRDP, 100% of the time. This allows attackers to prevent changes in ROAs from being seen.
This is a mischaracterization of what is going on. The implication of what you say here is that RPKI cannot work reliably over RSYNC, which is factually incorrect and an injustice to all existing RSYNC-based deployments. Your view on the security model seems to ignore the existence of RPKI manifests and the use of CRLs, which exist exactly to mitigate replays.
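As a rough illustration of how the manifest/CRL machinery bounds replay: a replayed manifest is either older than one the RP has already validated or past its nextUpdate, so the RP can tell the publication point is stale regardless of which transport delivered the data. A minimal sketch; the `Manifest` struct and its fields are hypothetical, not any particular validator's API:

```rust
use std::time::{Duration, SystemTime};

// Hypothetical, already signature-verified manifest metadata.
struct Manifest {
    manifest_number: u64,    // incremented by the CA on every reissuance
    this_update: SystemTime,
    next_update: SystemTime,
}

// A replayed (old) manifest either carries a number lower than one the RP
// has already validated, or is past its nextUpdate. Either way the RP can
// refuse to treat the publication point as freshly updated.
fn is_current(mft: &Manifest, highest_seen_number: u64, now: SystemTime) -> bool {
    if mft.manifest_number < highest_seen_number {
        return false; // older than data we already have: replay
    }
    if now < mft.this_update || now > mft.next_update {
        return false; // not yet valid, or stale (the CA should have reissued)
    }
    true
}

fn main() {
    let now = SystemTime::now();
    // A month-old manifest being replayed today.
    let replayed = Manifest {
        manifest_number: 41,
        this_update: now - Duration::from_secs(30 * 24 * 3600),
        next_update: now - Duration::from_secs(29 * 24 * 3600),
    };
    println!("accept replayed manifest? {}", is_current(&replayed, 42, now));
}
```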
Up until 2 weeks ago Routinator indeed was not correctly validating RPKI data; fortunately this has now been fixed: https://mailman.nanog.org/pipermail/nanog/2020-October/210318.html
Also via the RRDP protocol old data can be replayed, because just like RSYNC, the RRDP protocol does not have authentication. When RPKI data is transported from Publication Point (PP) to Relying Party (RP), the RP cannot assume there was an unbroken 'chain of custody' and therefore has to validate all the RPKI signatures.
For example, if a CDN is used to distribute RRDP data, the CDN is the MITM (that is literally what CDNs are: reverse proxies, in the middle). The CDN could accidentally serve up old (cached) content or mis-serve current content (swap 2 filenames with each other).
> This is a tradeoff. I think that protecting against replay should be considered more important here, given the numbers and the time it takes to fix an HTTPS issue.
The 'replay' issue you perceive is also present in RRDP. The RPKI is a *deployed* system on the Internet and it is important for Routinator to remain interoperable with other non-NLnet Labs implementations.
Routinator not falling back to rsync does *not* offer a security advantage, but does negatively impact our industry's ability to migrate to RRDP. We are in 'phase 0' as described in Section 3 of https://tools.ietf.org/html/draft-sidrops-bruijnzeels-deprecate-rsync
Regards,
Job