Testing Out Of Band Management Tooling
All,

Related to the serial console topic, I wanted to do a quick test. I hit a hiccup in my documentation and addressed it.

Questions:
A: How often should one do dry run testing on OOBM?
B: How often should one call and confirm emergency numbers for remote vendor sites?
C: How could Out Of Band Management equipment be monitored better in 2025?

If you have thoughts about other questions, pop them in starting at D.

--
- Andrew "lathama" Latham -
On Tue, 23 Dec 2025 at 21:00, Andrew Latham via NANOG <nanog@lists.nanog.org> wrote:
Related to the Serial console topic I wanted to do a quick test. I hit a hiccup in my documentation and addressed it.
Questions: A: How often should one do dry run testing on OOBM?
We log in to each control plane (primary and backup) once a day via OOB. But according to my Monte Carlo simulation[0], OOB availability can be pretty bad and OOB should still be up when it's needed. So you can probably test less often than once a day and still have a very high probability of having useful OOB when needed.

[0] https://gist.github.com/ytti/91213b76b6d7390bfb9b8c216dfa1d58

--
  ++ytti
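The reasoning above can be illustrated with a small Monte Carlo sketch. This is not the linked gist; it is a minimal illustration that assumes in-band and OOB failures are independent, and the availability figures (99% in-band, 90% OOB) are made up for the example. OOB only matters in the moments when in-band access is down, so the interesting number is how often both are down at once.

```python
import random


def simulate(inband_avail, oob_avail, trials=1_000_000, seed=1):
    """Count how often OOB is needed and how often it is also down.

    Assumption (not from the thread): in-band and OOB fail independently,
    and each trial is one independent point in time.
    """
    rng = random.Random(seed)
    needed = 0    # trials where in-band is down, so OOB is needed
    stranded = 0  # trials where in-band AND OOB are both down
    for _ in range(trials):
        if rng.random() < inband_avail:
            continue  # in-band is up, OOB state does not matter
        needed += 1
        if rng.random() >= oob_avail:
            stranded += 1
    return needed, stranded


if __name__ == "__main__":
    trials = 1_000_000
    # Hypothetical availabilities, chosen only for illustration.
    needed, stranded = simulate(0.99, 0.90, trials=trials)
    print(f"OOB needed in {needed / trials:.4%} of trials")
    print(f"no access at all in {stranded / trials:.4%} of trials")
```

Under the independence assumption the closed-form answer is simply (1 - 0.99) * (1 - 0.90) = 0.001, and the simulation should land close to it. Real deployments often share failure modes between in-band and OOB paths (same power feed, same carrier), which this sketch ignores and which would make the real number worse.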
I wrote a script for our monitoring system that does a quick login and checks for a prompt; if everything is okay, we let sleeping dogs lie. If anything goes wrong anywhere along the line, I get a yellow condition until it is resolved. If it stays yellow for 24 hours, it goes red.

I always practice and preach: automate everything. A computer will never sleep or forget.
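A check like the one described (probe for a prompt, go yellow on failure, go red after 24 hours) might look roughly like the sketch below. This is not the poster's script. It assumes the console is reachable as a raw TCP port on a terminal server, that the expected prompt is `login:`, and it skips the actual login step; `probe` and `next_state` are hypothetical names.

```python
import socket
import time

ESCALATE_AFTER = 24 * 3600  # seconds a check may stay yellow before going red


def probe(host, port, prompt=b"login:", timeout=10.0):
    """Return True if the console port answers and shows the expected prompt.

    Assumption: a raw TCP console port; a real check would also log in.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.settimeout(timeout)
            s.sendall(b"\r\n")  # nudge the console into printing a prompt
            data = b""
            deadline = time.monotonic() + timeout
            while prompt not in data and time.monotonic() < deadline:
                chunk = s.recv(4096)
                if not chunk:
                    break  # peer closed the connection
                data += chunk
            return prompt in data
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def next_state(ok, failing_since, now):
    """Return (colour, failing_since) for the monitoring system."""
    if ok:
        return "green", None
    if failing_since is None:
        failing_since = now  # first failure: remember when it started
    if now - failing_since >= ESCALATE_AFTER:
        return "red", failing_since
    return "yellow", failing_since
```

A scheduler would call `probe` periodically and feed the result into `next_state`, persisting `failing_since` between runs so the 24-hour escalation survives restarts.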
Once upon a time, Josh Luthman <josh@imaginenetworksllc.com> said:
I wrote a script for our monitoring system that does a quick login and checks for a prompt; if everything is okay, we let sleeping dogs lie.
We've got something similar for IPMI serial-over-LAN, running for all of our servers.

--
Chris Adams <cma@cmadams.net>
By government edict, my clients have monthly DR testing, because that’s SOP for public safety agencies. These tests exercise everything, including power failure and opsec communications (OOBM for humans) in the event of a data breach. The trick is to not get complacent about noting small discrepancies in these tests. I don’t see many private sector companies doing this.

-mel
NANOG mailing list
https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/WELPASPH...
participants (5)
- Andrew Latham
- Chris Adams
- Josh Luthman
- Mel Beckman
- Saku Ytti