
On Mon, Feb 01, 2010 at 09:46:07PM -0500, Stefan Fouant wrote:
> Vijay Gill had some real interesting insights into this in a presentation he gave back at NANOG 44:
> http://www.nanog.org/meetings/nanog44/presentations/Monday/Gill_programatic_...
> His Blog article on "Infrastructure is Software" further expounds upon the benefits of such an approach - http://vijaygill.wordpress.com/2009/07/22/infrastructure-is-software/
> That stuff is light years ahead of anything anybody is doing today (well, apart from maybe Vijay himself ;) ... but IMO it's where we need to start heading.

Vijay's stuff is fascinating. The vision is great. But in my experience, the vendors and implementations basically ruin the dream for anyone who doesn't have his pull. I'm sure my software is nowhere close to being as sophisticated as his, but my plans are pretty much in line with his suggestions.

Some problems I've run into that I don't see any kind of solution for:

1) Forwarding-impacting bugs: IOS bugs that are triggered by SNMP are easily the #1 cause of our accidental service impact. Most seem to be race conditions that require real-world config and forwarding load - not something a small shop can afford to build a lab to reproduce. If we stuck to manual deployment, we might have made a few mistakes but would it have been worse? Maybe - but honestly, it could be a wash.

2) Vendor support is highly suspicious of automation: anytime I open a ticket, even unrelated to an automated software process, the first thing the vendor support demands is to disable all automation. Juniper is by far the best about this, and they *still* don't actually believe their own automation tools work. Cisco TAC's answer has always been "don't ever use SNMP if it causes crashes!" Procurve doesn't even bother to respond to tickets related to automation bugs, even if they are remotely triggerable crashes in the default config.

3) Automation interfaces are largely unsupported: I imagine vendor software development having one or two guys that are the masterminds for SNMP/NETCONF/whatever - and that's it. When I have a question on how to find a particular tool, or find a bug in an automation function, I can often go months on a ticket with people that have no idea what I'm talking about. What documentation exists is typically incomplete or inconsistent across versions and product lines.

4) Related tools prevent reliable error reporting: as far as I can tell, Net-SNMP returns random values if a request fails; if there's a pattern, I've failed to discern it. expect is similar. ScreenOS's SSH implementation always returns that a file copy failed. Procurve only this year implemented ssh key-based auth in combination with remote authentication. The best-of-breed seems to be an oft-pathetic collection of tools.

5) Management support: developing automation software is hard - network devices aren't nearly as easy to deal with as they should be. When I spend weeks developing features that later cause IOS to spontaneously reload, people that don't understand the relation to operational impact start to advocate dismantling the automation just like the vendors above.

I'm sure we'll continue to build automated policy and configuration tools. I'm just not convinced it's the panacea that everyone thinks. Unless you're one of the biggest, it puts your network at someone else's mercy - and that someone else doesn't care about your operational expenses.

Ross

-- 
Ross Vandegrift
ross@kallisti.us

"If the fight gets hot, the songs get hotter. If the going gets tough, the songs get tougher."
	--Woody Guthrie