
Dear all,

I'm very happy to see the direction this conversation has taken: it seems we've moved on to focussing on solutions and outcomes, which is encouraging.

On Mon, Oct 01, 2018 at 05:44:17PM +0100, Nick Hilliard wrote:
John Curran wrote on 01/10/2018 00:21:
There are likely some on the NANOG mailing list who have a view on this matter, so I pose the question of "who should be responsible" for the consequences of RPKI RIR CA failure to this list for further discussion.
Other replies in this thread have assumed that RPKI CA failure modes are restricted to loss of availability, but there are other failure modes, for example:
- fraud: rogue CA employee / external threat actor signs ROAs illegitimately
- negligence: CA accidentally signs illegitimate ROAs due to e.g. software bug
- force majeure: e.g. court orders CA to sign prefix with AS0, complicated by NIR RPKI delegation in jurisdictions which may have difficult relations with other parts of the world.
These types of situations are well-trodden territory for other types of PKI CA, where users
Otherwise, as other people have pointed out, catastrophic systems failure at the CA is designed to be fail-safe. I.e. if the CA goes away, ROAs will be evaluated as "unknown" and life will continue on. If people misconfigure their networks and do silly things with this specific failure mode, that's their problem. You can't stop people from aiming guns at their feet and pulling the trigger.
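To illustrate why CA loss is fail-safe while a rogue or court-ordered ROA is not, here is a minimal sketch of route origin validation in the style of RFC 6811, where each route is classified as Valid, Invalid, or NotFound (the "unknown" state mentioned above). It assumes a simplified in-memory list of validated ROA payloads; the names `VRP`, `State`, and `validate` are hypothetical and not taken from any real validator. With no VRPs at all (the CA has gone away), every route comes out NotFound; an AS0 VRP covering a prefix (the force majeure case) turns routes for it Invalid.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    VALID = "Valid"
    INVALID = "Invalid"
    NOT_FOUND = "NotFound"


@dataclass(frozen=True)
class VRP:
    """Hypothetical validated ROA payload: prefix, max length, authorised origin AS."""
    prefix: ipaddress.IPv4Network | ipaddress.IPv6Network
    max_length: int
    asn: int


def validate(route_prefix: str, origin_asn: int, vrps: list[VRP]) -> State:
    """Classify a route roughly as RFC 6811 describes (simplified sketch)."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        # Only compare prefixes of the same address family.
        if vrp.prefix.version != route.version:
            continue
        # A VRP covers the route if the route is equal to or more specific than it.
        if not route.subnet_of(vrp.prefix):
            continue
        covered = True
        # An AS0 VRP never matches, so it can only make covered routes Invalid.
        if vrp.asn != 0 and vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return State.VALID
    # Covered but unmatched routes are Invalid; uncovered routes are NotFound.
    return State.INVALID if covered else State.NOT_FOUND


if __name__ == "__main__":
    vrps = [VRP(ipaddress.ip_network("192.0.2.0/24"), 24, 64496)]
    print(validate("192.0.2.0/24", 64496, vrps).value)
    print(validate("192.0.2.0/24", 64496, []).value)
```

The key design point is that an empty VRP set can never produce Invalid, so a vanished CA degrades to "no RPKI" rather than to dropped routes; only a positive (possibly illegitimate) signature can do that.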
There are a number of failure modes, and I believe the operational community has yet to fully explore how to mitigate most of the risks. Over time I expect we'll develop BCPs on how to improve the robustness of the system; these BCPs can only come into existence driven by actual operational experience.

A positive development that addresses some aspects of the concerns raised is Certificate Transparency. Cloudflare set up a CT log (https://groups.google.com/forum/#!topic/certificate-transparency/_deL5iGB5sY) and I hope others like Google will also consider doing this. CT is a great tool to help keep the roots performing in line with community expectations.

I consider it the operator community's responsibility to figure out how to deal with outages. I don't intend to hold the RIRs liable; we'll need to learn to protect ourselves.

Kind regards,

Job