On Sun, 9 Jul 2006 14:47:13 -0400, Shumon Huque <shuque@isc.upenn.edu> wrote:
On Thu, Jul 06, 2006 at 04:52:52PM -0400, Steven M. Bellovin wrote:
SSH is a distributed single point of failure, just like the old thick yellow Ethernet. Remember how reliable and easy to debug that was?
More seriously, the original virtue of SSH was that it could be deployed without centralized infrastructure. That's great for many purposes; it's exactly what you don't want if you're an ISP managing a lot of servers and network elements. You really do want a PKI, complete with CRLs. I know that (most) SSH implementations don't do that -- complain to your vendor. (Note: the CAs are also single points of failure. However, they can be kept offline or nearly so, booted from a FooLive CD that logs to a multi-session CD or via a write-only network port through a tight firewall, etc. Yes, you have to worry about procedures, physical access, and people, but you *always* have to worry about those.)
--Steven M. Bellovin, http://www.cs.columbia.edu/~smb
The problem is how you ensure that you've distributed the most current CRLs to all your SSH clients. You might need to deploy a redundant, highly available set of OCSP responders, which means that at least a part of your centralized infrastructure is now online and inline :-) Admittedly, that is not the part that needs access to the CA's private key, so it's not terrible from a security paranoia point of view.
CRLs contain serial numbers and the date of the next-to-be-issued CRL. You'll always know if you haven't gotten one. What you do then is a matter of policy -- it's perfectly reasonable to accept keys even if you've missed an update or two.
I'll further assert that the need for really prompt certificate revocation is often greatly overstated. Someone who shouldn't have it obtains a private key at time T0. You discover this at time T1, T1 > T0. You go through assorted internal processes, including the time to generate and push the next CRL; that happens at T2, T2 > T1. Most of the time, T1-T0 > T2-T1. That is, the key will be compromised for longer (and probably much longer) than it takes to send out a CRL. But the window of avoidable trouble is T2-T1.
Furthermore, this being NANOG, the real issue is whether or not the bad guy can *cause* network trouble during that interval -- ordinary network failures are presumably rare enough that the odds on trouble happening during that interval *and* the bad guy trying something are low. Most of their trouble probably happened during [T0,T1], a much longer time. Trying to optimize the rest of the infrastructure to avoid [T1,T2] trouble isn't worth it.
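To make the policy point concrete, here is a rough Python sketch of "accept keys even if you've missed an update or two", plus the T0/T1/T2 arithmetic. It assumes CRLs are issued at a regular interval, inferred from the CRL's this-update and next-update dates. The function names and the default tolerance of two missed updates are made up for illustration; they are not from any SSH or PKI implementation.

```python
from datetime import datetime

def missed_crl_updates(now, this_update, next_update):
    """Count scheduled CRL issuances we appear to have missed.

    Assumes (hypothetically) a fixed issuance interval equal to
    next_update - this_update of the CRL we currently hold.
    """
    interval = next_update - this_update
    if interval.total_seconds() <= 0:
        raise ValueError("next_update must be later than this_update")
    if now < next_update:
        return 0  # the CRL we hold is still current
    # One update was due at next_update, plus one per full interval since.
    return (now - next_update) // interval + 1

def accept_with_stale_crl(now, this_update, next_update, max_missed=2):
    """Local policy knob: tolerate up to max_missed missed CRL updates."""
    return missed_crl_updates(now, this_update, next_update) <= max_missed

def avoidable_fraction(t0, t1, t2):
    """Share of the total exposure [T0,T2] that faster revocation could
    remove, i.e. (T2-T1)/(T2-T0). Times are plain numbers, e.g. days."""
    if not (t0 < t1 < t2):
        raise ValueError("expected T0 < T1 < T2")
    return (t2 - t1) / (t2 - t0)
```

With made-up numbers, a key stolen 30 days before discovery and a CRL pushed one day after discovery gives avoidable_fraction(0, 30, 31), about 3% of the exposure. That is the part a faster revocation path could buy back.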
We already have a deployed key management infrastructure at our site (Kerberos). If it were (practically) possible to authenticate login sessions to routers with it, we'd definitely use it. I can't see us deploying a PKI just to authenticate SSH host keys.
Why not? PKIs don't have to be big and scary, especially if it's a "pki" instead of a "PKI". Assertion: with a few scripts to invoke OpenSSL, anyone capable of running a Kerberos server is capable of running their own special-purpose pki for this purpose.
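As a rough idea of what "a few scripts to invoke OpenSSL" might look like, here is a small Python sketch that builds the openssl command lines for a one-level CA and a per-host certificate. The file names, key size, lifetimes, and the CA name are all assumptions picked for illustration. It does not cover CRL generation (that would go through `openssl ca -gencrl` with a config file), and it says nothing about getting an SSH implementation to consume the resulting certificates.

```python
import subprocess
from typing import List

def ca_commands(ca_name: str = "Example router pki") -> List[List[str]]:
    """Commands to create a self-signed CA key and certificate.

    File names (ca.key, ca.crt) and the 10-year lifetime are assumptions.
    """
    return [
        ["openssl", "genrsa", "-out", "ca.key", "2048"],
        ["openssl", "req", "-new", "-x509", "-key", "ca.key",
         "-out", "ca.crt", "-days", "3650", "-subj", f"/CN={ca_name}"],
    ]

def host_commands(hostname: str, days: int = 365) -> List[List[str]]:
    """Commands to create a host key, a CSR, and a CA-signed certificate."""
    key, csr, crt = f"{hostname}.key", f"{hostname}.csr", f"{hostname}.crt"
    return [
        ["openssl", "genrsa", "-out", key, "2048"],
        ["openssl", "req", "-new", "-key", key, "-out", csr,
         "-subj", f"/CN={hostname}"],
        ["openssl", "x509", "-req", "-in", csr, "-CA", "ca.crt",
         "-CAkey", "ca.key", "-CAcreateserial", "-out", crt,
         "-days", str(days)],
    ]

def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Print rather than execute, so the sketch is safe to try anywhere.
    for cmd in ca_commands() + host_commands("rtr1.example.net"):
        print(" ".join(cmd))
```

Calling run_all() on the CA machine (the offline one, per the earlier point) would actually execute the commands; the host name here is a placeholder.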
There is the general chicken-and-egg concern about using network based authentication services to access critical network hardware. But I think many (most?) of us have other means to access routers during catastrophic failures or unavailability of the former. We have an out of band ethernet connected to the router consoles, which can be dialed into (needs authentication with a hardware token).
But the inband schemes are better, or you wouldn't bother with them.
--Steven M. Bellovin, http://www.cs.columbia.edu/~smb