We all know that the weakest link of SSH is key management: if you do not confirm through a secure out-of-band channel that the public host key of the device you are connecting to is correct, SSH's crypto will not help you. SSH implements neither a CA hierarchy (like X.509 certificates) nor a web of trust (like PGP), so you are left checking the validity of host keys yourself.
Still, it's not so bad if you only connect to a small handful of well-known servers. Either you will have verified them all soon enough and not be bothered with it anymore, or system administrators will maintain a global known_hosts file that lists all the correct keys.
But it's quite different when you manage a network of hundreds or thousands of devices. I regularly find myself connecting to devices I've never connected to before and being prompted to verify the public host keys they offer. This happens in the course of something else I am doing, and I don't necessarily have the time to check a host key. Even if I did have time, the key is hard to check: the device is just one of a huge number of network elements of no special significance to me. I didn't install it or generate its key, and I don't know who did.
From time to time I also get hit with warning messages from my SSH client about a changed host key. It's probably just that someone swapped out the router's hardware since the last time I connected and a new key got generated, but I can't be sure. Worst of all, the problem is repeated for every user, because each user works with their own private ssh_known_hosts database into which they accept host keys.
A possible solution is:
- Maintain a global known_hosts file. Make everyone who installs a new router or turns up SSH on an existing one contribute to it. Install it as the global (in /etc/) known_hosts file on all the SSH clients you can.
  Pro: The work to accept a new host key is done once, and it's done by the person who installed the router, who is in the best position to confirm that no man-in-the-middle attack is taking place.
  Con: You need to make sure updates to this file are authentic (its benefit is lost if untrusted people are allowed to contribute), and you need to be sure it gets installed on the SSH clients people use to connect to the network elements.
  Con: If a host key changes and the change turns out to be benign (such as the hardware swap I describe above), users can't do much about it until the global file is corrected and redeployed, short of complicated OpenSSH options that most users won't know how to use to bypass the problem.
I'm looking for information on best practices that are in use to tackle this problem. What solutions are working for you?
Thanks
-Phil
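P.S. To make the global-file idea a bit more concrete, here is a rough sketch of the merge step in Python. It is only an illustration under some assumptions: entries are plain `host[,host] keytype key [comment]` lines, hashed hostnames are treated as opaque strings, marker lines such as `@cert-authority` are skipped, and the function names `parse_known_hosts` and `merge_known_hosts` are made up for this example. It does nothing to authenticate who contributed an entry, which is the hard part noted above.

```python
def parse_known_hosts(text):
    """Yield (host, key_type, key_blob) for each plain entry in known_hosts text."""
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and marker lines (@cert-authority, @revoked).
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed line; a real tool should report it
        hosts, key_type, key_blob = fields[0], fields[1], fields[2]
        # One entry may name several hosts, separated by commas.
        for host in hosts.split(","):
            yield host, key_type, key_blob


def merge_known_hosts(global_text, contributed_text):
    """Merge contributed entries into the global set (hypothetical helper).

    Returns (merged_text, conflicts). A conflict is a contributed key that
    differs from the key already on file for the same host and key type.
    Conflicts are reported for a human to review, never applied.
    """
    known = {}
    for host, key_type, key_blob in parse_known_hosts(global_text):
        known[(host, key_type)] = key_blob

    conflicts = []
    for host, key_type, key_blob in parse_known_hosts(contributed_text):
        existing = known.get((host, key_type))
        if existing is None:
            known[(host, key_type)] = key_blob      # new host or new key type
        elif existing != key_blob:
            conflicts.append((host, key_type))      # changed key: needs review

    merged = "".join(
        f"{host} {key_type} {key_blob}\n"
        for (host, key_type), key_blob in sorted(known.items())
    )
    return merged, conflicts
```

The point of returning conflicts separately is that a changed key (the swapped-router case) never overwrites the trusted entry silently; someone who can confirm the change out of band has to approve it before the file is redeployed.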