How much do you automate your automation?
I've been told that at [some of] the largest networks, network engineers "never directly log into network devices". This implies that all configuration changes made to and insights gleaned from the network gear are done via some form of automation.

I assume it's commonplace to have/use Unix CLI tools for executing configuration changes. I've written such things for the past couple of places I've worked so that we can literally copy&paste from a MOP to a shell session and have a change implemented. Such tools become extremely handy when you want to make the same change on a few or a few hundred devices.

What I'm wondering is, how common is it to take the next logical step and if you have a planned maintenance window to implement some simple change, do you have an engineer manually make that change, manually execute a script that implements the change, or use old-school automation (at) to schedule a date & time at which the script that implements the change will be run, and optionally have an engineer monitor that the change happened and had the intended results?

----------------------------------------------------------------------
 Jon Lewis, MCP :)              |  I route
 Blue Stream Fiber, Sr. Neteng  |  therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
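As a rough illustration of the "old-school automation (at)" option in the question, here is a minimal Python sketch that builds an `at` invocation for a maintenance window. The helper name `build_at_command` and the script path `./push_change.sh` are hypothetical; the sketch only assumes the POSIX `at -f file -t time` form, where the time is given as `[[CC]YY]MMDDhhmm`.

```python
from datetime import datetime


def build_at_command(script_path: str, window_start: datetime) -> list[str]:
    """Build the argv that schedules script_path with at(1) at window_start.

    window_start is interpreted in the local time zone of the host running at.
    """
    # POSIX "at -t" takes a touch-style timestamp: [[CC]YY]MMDDhhmm
    stamp = window_start.strftime("%Y%m%d%H%M")
    return ["at", "-f", script_path, "-t", stamp]


if __name__ == "__main__":
    # Hypothetical change script and maintenance window.
    cmd = build_at_command("./push_change.sh", datetime(2026, 4, 21, 2, 0))
    print(" ".join(cmd))
```

The helper only builds the command; actually running it (for example with `subprocess.run`) and watching the result is left to whoever owns the maintenance window.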
Certain companies, e.g. hyperscalers, automate changes completely.

An engineer / architect decides what to do, e.g. upgrade all the XXX routers to OS YYY. Then they hand that off to an operations team, which uses a pre-written script (really much much much more than a “script”, and frequently written by a third team) to tell the system to “upgrade all device type $FOO to OS $BAR”. At which point the system figures out which devices get upgraded, separates the fleet into stages, decides when each device is touched, pre-drains, upgrades, verifies the upgrade, undrains, verifies traffic moved back, proceeds to the next device, etc., with possible human ACKs required to move to the next stage or whatever other segmentation you like.

Obviously you can make things more specific, such as all device type $FOO in role $BAR, or in geography $BAT, or pretty much any other method you can dream up. It is almost like computers are good at following a complex decision tree with lots of variables. Who knew?

Without this, networks deploying 10s of 1000s of devices could not survive. With it, you can scale the number of devices far more quickly than you scale the staff.

Oh, and you can also take down your whole network very very quickly. :-)

-- 
TTFN,
patrick
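The staged workflow described above could be sketched roughly as follows. This is a toy model, not how any particular hyperscaler does it: the `Device` class, the stage sizes, and the `ack` callback are all hypothetical, and the drain/upgrade/verify/undrain steps are placeholders that only record what would happen.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Device:
    name: str
    model: str
    os_version: str
    log: list[str] = field(default_factory=list)


def split_into_stages(devices: list[Device], stage_sizes: list[int]) -> list[list[Device]]:
    """Split the fleet into stages of the given sizes; leftovers form a final stage."""
    stages, start = [], 0
    for size in stage_sizes:
        if start >= len(devices):
            break
        stages.append(devices[start:start + size])
        start += size
    if start < len(devices):
        stages.append(devices[start:])
    return stages


def upgrade_device(dev: Device, target_os: str) -> bool:
    """Placeholder for pre-drain, upgrade, verify, undrain on one device."""
    dev.log.append("drain")
    dev.os_version = target_os  # stands in for the real upgrade
    dev.log.append("upgrade")
    if dev.os_version != target_os:  # stands in for real verification
        return False
    dev.log.append("verify")
    dev.log.append("undrain")
    return True


def rollout(fleet: list[Device], model: str, target_os: str,
            stage_sizes: list[int], ack: Callable[[int], bool]) -> list[str]:
    """Upgrade every device of `model`, stage by stage; return upgraded names."""
    targets = [d for d in fleet if d.model == model and d.os_version != target_os]
    upgraded: list[str] = []
    for index, stage in enumerate(split_into_stages(targets, stage_sizes)):
        # A human ACK is required before every stage after the first.
        if index > 0 and not ack(index):
            break
        for dev in stage:
            if not upgrade_device(dev, target_os):
                return upgraded  # stop the whole rollout on the first failure
            upgraded.append(dev.name)
    return upgraded
```

Narrowing the selection by role or geography would just mean a richer filter when building `targets`.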
On Apr 14, 2026, at 15:36, Jon Lewis via NANOG <nanog@lists.nanog.org> wrote: [..]
TL;DR: Some organizations keep full copies/versions of their stack in offline mode for testing, e.g. https://docs.gns3.com/docs/

There are many slices of the pie.....mmmm pie....

* Known good systems/devices
* Known legacy systems/devices
* Unknown systems/devices
* 3rd-party systems/devices
* (other slices here)

1. To automate the automation of your systems you need to understand that some systems from all slices will have a verbal no-change rule, based on an existing long-term ticket with no resolution.
2. MOP/SOP/Playbooks should list the systems/devices that CAN be operated on.
3. A breakglass user or access method MUST exist. Some/many systems/devices/teams DO NOT support breakglass.
4. (insert other deep thoughts)

On Tue, Apr 14, 2026 at 1:36 PM Jon Lewis via NANOG <nanog@lists.nanog.org> wrote: [..]
-- - Andrew "lathama" Latham -
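Points 1 and 2 above (an explicit list of devices that CAN be operated on, plus devices under a no-change rule) might be enforced by the tooling itself, along these lines. A minimal sketch; the function name, the device names, and the skip reasons are made up for illustration.

```python
def select_targets(requested: list[str], allowed: set[str],
                   no_change: set[str]) -> tuple[list[str], dict[str, str]]:
    """Split requested devices into those safe to touch and those skipped, with a reason."""
    targets: list[str] = []
    skipped: dict[str, str] = {}
    for name in requested:
        if name in no_change:
            # A no-change rule wins even if the device is also on the allow list.
            skipped[name] = "no-change rule"
        elif name not in allowed:
            skipped[name] = "not listed in MOP"
        else:
            targets.append(name)
    return targets, skipped


if __name__ == "__main__":
    targets, skipped = select_targets(
        requested=["edge1", "edge2", "legacy9", "mystery3"],
        allowed={"edge1", "edge2", "legacy9"},
        no_change={"legacy9"},
    )
    print(targets)
    print(skipped)
```

Writing the no-change rules down as data also turns the "verbal" rule into something the automation can see.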
Hi,

I think if we want to keep environments consistent across all devices, we should avoid manually tweaking any single server. It’s better to run everything in batches. That way, the batch script won’t fail unexpectedly because one device has been manually adjusted.

Also, now that we have AI agents, for these simple issues we can just give the agent a high-level task instead of being overly specific. Once the script is tested, we can batch-deploy it easily.

That’s my POV.

On Wed, Apr 15, 2026 at 12:30 AM Andrew Latham via NANOG <nanog@lists.nanog.org> wrote: [..]
On Tue, 14 Apr 2026 at 22:37, Jon Lewis via NANOG <nanog@lists.nanog.org> wrote:
What I'm wondering is, how common is it to take the next logical step and if you have a planned maintenance window to implement some simple change, do you have an engineer manually make that change, manually execute a script that implements the change, or use old-school automation (at) to schedule a date & time at which the script that implements the change will be run, and optionally have an engineer monitor that the change happened and had the intended results?
If you do deltas, this is a very difficult problem: you have to reconcile how to move from A to B. This often leads to a network where some things are managed by automation, like interface/BGP turn-up, and some things are managed by people. The true state is then the configuration backup; there is no way to recreate the entire config from data. Even the mentioned hyperscalers rarely actually manage 100% of config via a system; they manage the DC from a system, but edge nodes may use the above process.

If you ignore deltas, the problem becomes very simple. That is, for any change, even changing a dot in the description of one interface, you ship an entire new configuration and let the router worry about the reconciliation between the A and B configurations.

Anyone can get to the latter option with trivial resources and skill; the former I wouldn't recommend to anyone, no matter how well resourced.

The process to get to the latter is

1. put your configuration backups in your network configuration directory
2. edit the configuration file when needed
3. push the configuration file

Now 100% comes from the system, and anyone can do this literally in minutes.

Of course you're not exactly reducing much work here at all. But the point is, it doesn't need to be a risky project which may or may not deliver something. You can start today, and manage 100% of config in the system. Then one by one pick low-hanging fruits, remove them from the flat file, generate them from SQL, and create the final configuration using the flat file + generated config. Now you always know what the network state is. There is no need for the flat file to ever be zero; that's not important.

This deltaless configuration used to be quite poorly supported by vendors, but today it is nearly universally supported (Junos, SROS, IOS-XR, EOS all work); for IOS-XE I'm not entirely sure if it works or not.

-- 
  ++ytti
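The "flat file + generated config" step could look roughly like this. It is only a sketch under stated assumptions: the `interfaces` table, its columns, and the generic `interface` / `description` syntax are hypothetical, and the push itself (whatever full-config replace mechanism the platform offers) is deliberately left out.

```python
import sqlite3


def generate_interface_config(conn: sqlite3.Connection, device: str) -> str:
    """Render interface descriptions for one device from the database."""
    rows = conn.execute(
        "SELECT name, description FROM interfaces WHERE device = ? ORDER BY name",
        (device,),
    ).fetchall()
    lines = []
    for name, description in rows:
        lines.append(f"interface {name}")
        lines.append(f" description {description}")
    return "\n".join(lines)


def build_full_config(flat_file_text: str, generated_text: str) -> str:
    """Final config = hand-maintained flat file + generated sections."""
    parts = [flat_file_text.strip(), generated_text.strip()]
    return "\n".join(p for p in parts if p) + "\n"


if __name__ == "__main__":
    # In-memory database standing in for the real source of data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE interfaces (device TEXT, name TEXT, description TEXT)")
    conn.executemany(
        "INSERT INTO interfaces VALUES (?, ?, ?)",
        [("r1", "et-0/0/1", "to r2"), ("r1", "et-0/0/0", "to r3")],
    )
    flat = "hostname r1\nntp server 192.0.2.1\n"  # what is still kept by hand
    print(build_full_config(flat, generate_interface_config(conn, "r1")), end="")
```

As more sections move into the database, the flat file shrinks, but the output is always one complete configuration for the device.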
Small note to add here: ordering configuration changes is possible in automation and configuration management systems. Often the manual MOP/Playbook takes for granted a procedural ordering of changes, while a configuration management system can apply them all at once, across the board, instantly. I think this very issue has caused some chaos for organizations in the past.

On Wed, Apr 15, 2026 at 12:59 AM Saku Ytti via NANOG <nanog@lists.nanog.org> wrote: [..]
-- - Andrew "lathama" Latham -
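One way to keep the ordering a manual MOP takes for granted is to write the dependencies down and let the tooling derive the order. A small sketch using Python's standard `graphlib` module (Python 3.9+); the step names are invented for illustration.

```python
from graphlib import TopologicalSorter


def ordered_steps(dependencies: dict[str, set[str]]) -> list[str]:
    """Return the steps in an order where every prerequisite comes first.

    `dependencies` maps each step to the set of steps that must run before it.
    Raises graphlib.CycleError if the dependencies contradict each other.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    steps = {
        "enable bgp session": {"apply prefix filter"},
        "apply prefix filter": {"create prefix list"},
        "create prefix list": set(),
    }
    for step in ordered_steps(steps):
        print(step)
```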
Hi, On 15/04/2026 08:59, Saku Ytti via NANOG wrote: [..]
If you ignore deltas, the problem becomes very simple. That is, if for any change, changing a dot in the description of one interface you ship an entire new configuration, and let the router worry about the reconciliation between the A and B configuration.
Anyone can get to the latter option with trivial resources and skill, the former I wouldn't recommend to anyone, no matter how well resourced.
The process to get to the latter is
1. put your configuration backups in your network configuration directory 2. edit the configuration file when needed 3. push the configuration file
Now 100% comes from the system, and anyone can do this literally in minutes.
This is an excellent way to start automation for an existing deployment. Another approach is to make automation part of your hardware refresh cycle. When bringing in new equipment, create a complete config with automation from day 1. This forces you to standardize some things. If you introduce automation at a later point in time, you will need to deal with all kinds of customizations and snowflake bits of config that got added along the way.
Of course you're not exactly reducing much work here at all. But the point is, it doesn't need to be a risky project which may or may not deliver something. You can start today, and manage 100% of config in the system. Then one by one pick low hanging fruits, remove them from the flat file, generate them from SQL, and create the final configuration using the flat file + generated config. Now you always know what the network state is, there is no need for the flat file to ever be zero, that's not important.
In an environment where you have multiple hardware types deployed for the same function or want to migrate to different hardware, it can actually be very useful to have 100% of config come from a source of truth + templating. That way, replacing equipment with a different type is only a matter of changing port assignments and generating a new config. Kind regards, Martin
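To illustrate the source of truth + templating point, here is a minimal sketch in which the same link data is rendered for two different hardware types, and only the port assignment changes. The platform names, templates, and port names are hypothetical and do not reproduce any vendor's real syntax.

```python
from string import Template

# One template per hardware type (made-up syntaxes).
TEMPLATES = {
    "vendor_a": Template("interface $port\n description $description\n"),
    "vendor_b": Template('set interfaces $port description "$description"\n'),
}


def render(platform: str, port_map: dict[str, str], links: list[dict[str, str]]) -> str:
    """Render config for every link in the source of truth.

    `links` is hardware-independent (role + description);
    `port_map` assigns each role to a physical port on this hardware type.
    """
    template = TEMPLATES[platform]
    return "".join(
        template.substitute(port=port_map[link["role"]], description=link["description"])
        for link in links
    )


if __name__ == "__main__":
    links = [{"role": "uplink1", "description": "to core1"}]
    print(render("vendor_a", {"uplink1": "Ethernet1"}, links), end="")
    # Replacing the box with another type: new port map, same source of truth.
    print(render("vendor_b", {"uplink1": "et-0/0/0"}, links), end="")
```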
participants (6): Andrew Latham, Brandon Z., Jon Lewis, Martin Pels, Patrick W. Gilmore, Saku Ytti