We are pretty new to those new-age network orchestrators and automation, and I am curious to ask what everyone in the community is doing. Sorry for such a long and broad question.

What is your workflow? What tools are your teams using? What is working and what is not? What do you really like, and what do you need to improve? How mature do you think your process is?

Wanted to ask and see what approaches the many different teams here are taking!

We are going to start working from a GitLab-based workflow. Projects are created, and issues are entered and developed with a gitflow branching strategy. GitLab CI pipelines run package loads and run tests inside a lab. Tests are usually Python unit tests that are run to do both functional tests and service creation, modification, and removal tests. For unit testing we typically use Python libraries to open transactions to do the service modifications (along with functional tests) against physical lab devices. For our prod deployment we leverage 'push on green' and gating to push package changes to prod devices.

Thanks
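For illustration, a minimal sketch of the kind of transactional unit test described above, assuming NETCONF-capable lab devices and the ncclient library; the hostname, credentials, and config payload are placeholders:

# Minimal sketch of a transactional service test against a lab device,
# assuming a NETCONF-capable box and the ncclient library; the hostname,
# credentials, and config payload below are placeholders.
from ncclient import manager

SERVICE_CONFIG = """
<config>
  <interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
    <interface>
      <name>ge-0/0/1</name>
      <description>customer-svc-42</description>
    </interface>
  </interfaces>
</config>
"""

def test_service_create():
    with manager.connect(host="lab-router-1", port=830,
                         username="ci", password="ci-secret",
                         hostkey_verify=False) as conn:
        # Stage the change in the candidate datastore, then commit.
        conn.edit_config(target="candidate", config=SERVICE_CONFIG)
        conn.commit()
        # Functional check: the description should now be present.
        running = conn.get_config(source="running").data_xml
        assert "customer-svc-42" in running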
We've been using this tool since we're a LEAN company, but it actually is a good way to assign and delegate tasks/projects so everyone can see what is going on. Managers can move cards to your active lane or ask why a task/project has stalled. I'm not sure exactly what you are looking for, but as a team management tool this has mostly worked for us for the last 3-4 years. YMMV. https://kanbanize.com/

-- Joe Hamelin, W7COM, Tulalip, WA, +1 (360) 474-7474
To be honest, most companies I've worked at have moved to Amazon, where the networking stack has APIs. I've also seen folks who use CI/CD pipelines to generate configuration files for devices that don't directly support automation.
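As a sketch of what "the networking stack has APIs" looks like in practice, here is a minimal boto3 example; the region, CIDR, and tag values are made up:

# Minimal sketch of managing network state through the AWS API with boto3,
# rather than device CLIs; the region, CIDR, and tag values are hypothetical.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Create a VPC and tag it, all via API calls that a CI job can run.
vpc = ec2.create_vpc(CidrBlock="10.42.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]
ec2.create_tags(Resources=[vpc_id],
                Tags=[{"Key": "Name", "Value": "ci-demo-vpc"}])

# Inspect state programmatically instead of scraping 'show' output.
for v in ec2.describe_vpcs()["Vpcs"]:
    print(v["VpcId"], v["CidrBlock"])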
Possibly a minor nit, but if the devices "don't directly support automation", how is the "D" part of "CI/CD" accomplished there? `integration -ne deployment`. Do you mean something like "there is no API or e.g. netconf interface, but they can generate config off-box, scp it, and `copy start run` to load"?

-- Hugo Slabbert | email, xmpp/jabber: hugo@slabnet.com pgp key: B178313E | also on Signal
In a message written on Fri, Aug 11, 2017 at 08:51:25AM -0700, Hugo Slabbert wrote:
Possibly a minor nit, but if the devices "don't directly support automation", how is the "D" part of "CI/CD" accomplished there? `integration -ne deployment`. Do you mean something like "there is no API or e.g. netconf interface, but they can generate config off-box, scp it, and `copy start run` to load"?
More or less. I've worked at places that do this sort of thing:

1) Download the config from the box.
2) Run a script to determine the changes necessary to the config.
3) Load the changes.
4) Download the config again.
5) Re-run the script to determine the changes necessary, and verify there are none.

For a lot of the devices with a Cisco-IOS-like interface it's not even hard. Generate a code snippet:

config terminal
interface e0
description bar
end
write mem

Then tftp the config to a server, and have the script see that e0 has description bar.

-- Leo Bicknell - bicknell@ufp.org PGP keys at http://www.ufp.org/~bicknell/
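A toy sketch of steps 2 and 5 in Python; fetch_running_config() and push_snippet() are hypothetical helpers (tftp/scp/expect underneath), and desired state is simplified to a list of config lines:

# Toy sketch of steps 2 and 5 above: compute the delta between desired and
# running config lines, and after pushing, verify the delta is empty.
# fetch_running_config() and push_snippet() are hypothetical helpers.

def config_delta(desired_lines, running_lines):
    """Return the desired lines not yet present in the running config."""
    running = set(running_lines)
    return [line for line in desired_lines if line not in running]

def converge(device, desired_lines, fetch_running_config, push_snippet):
    delta = config_delta(desired_lines, fetch_running_config(device))
    if delta:
        push_snippet(device, delta)          # step 3: load the changes
    remaining = config_delta(desired_lines, fetch_running_config(device))
    assert not remaining, f"{device} did not converge: {remaining}"  # step 5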
On 11 August 2017 at 19:34, Leo Bicknell <bicknell@ufp.org> wrote:
For a lot of the devices with a Cisco-IOS like interface it's not even hard. Generate a code snippet:
config terminal
interface e0
description bar
end
write mem
Then tftp the config to a server, have the script see e0 has description bar.
Hey,

To me there are two fundamentally different ways to do this:

1) consider the world dynamic, and incrementally change it
2) consider the world static, and generate it from scratch

The first one is like managing servers with puppet/chef/ansible: you ask it to run some set of commands when you decide you want to turn up a new service. The second one is like using docker: if you want to change it, you build a new full container and swap it into the network.

The benefit of the second one is that there is an absolute guarantee of the state of the device immediately after the change has been made. The first one assumes there is a known state in the system when the incremental change is pushed.

I am a great proponent of the second way of doing things. Mainly because:

a) I find it trivial to generate a full config from a database, whereas figuring out how to go from A to B I find complicated (i.e. error prone)
b) The 2nd mandates that only the system is managing the device, because if someone does log in and do something out-of-system, it will go away on the next change - I think this is a large advantage
c) I do not need to try to prove the system state is currently correct by implementing more and more tests towards figuring out state; instead I prove the system state by setting all of it

The downside of the 2nd method is that it requires a device which supports replacing the whole config; classic IOS(-XE) and SR-OS today do not. JunOS, IOS-XR, EOS (both Compass and Arista) and VRP do. SR-OS is making strides towards solving this. IOS-XE I'm hoping for, but not holding my breath.

-- ++ytti
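For illustration, a minimal sketch of this "generate and replace" model using NAPALM (which comes up later in the thread); the platform, credentials, and file path are placeholders:

# Minimal sketch of method 2 with NAPALM: render the full intended config
# elsewhere, then replace the device's entire config with it. The platform,
# credentials, and file path are placeholders.
from napalm import get_network_driver

driver = get_network_driver("junos")   # a platform that supports full replace
device = driver(hostname="lab-router-1", username="ci", password="ci-secret")

device.open()
device.load_replace_candidate(filename="generated/lab-router-1.conf")
print(device.compare_config())          # diff of candidate vs. running
device.commit_config()                  # atomically swap in the new world
device.close()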
The same way we've done it for years; really hacky expect scripts. :)
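For flavor, a toy pexpect version of such a script; the hostname, credentials, and prompt strings are made up:

# Toy sketch of the "hacky expect script" approach with pexpect; the
# hostname, credentials, and prompt strings are made up for illustration.
import pexpect

child = pexpect.spawn("ssh admin@lab-router-1")
child.expect("assword:")                 # match 'Password:' or 'password:'
child.sendline("ci-secret")
child.expect("#")                        # wait for an enable prompt
child.sendline("configure terminal")
child.expect(r"\(config\)#")
child.sendline("interface e0")
child.sendline("description bar")
child.sendline("end")
child.expect("#")
child.sendline("write mem")
child.expect("#")
child.close()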
Kasper

I know that many are embarrassed to share their overly manual processes, and/or others are keeping their solutions private. It sounds like you have a solution for your needs. I would add some transparency to the process in the form of a dashboard or summary of status, so that support staff, security, QA, etc. can understand that a particular release was tested, approved, and deployed "days" prior to the customer having an issue vs "minutes". Inter-organization communication and socialization for the win!

Maybe a poll on workflow would be fun:

1. Break/fix workflow - aka ASAP
2. Whale customer requests only
3. Budget-constrained projects only
4. Everything is awesome; we get to test all the things AND have single-button rollback
5. All of the above, depending on the team and department. :)
-- Andrew "lathama" Latham lathama@gmail.com http://lathama.com
Some observations below:
On 9 Aug 2017, at 21:52, Kasper Adel <karim.adel@gmail.com> wrote:
We are pretty new to those new-age network orchestrators and automation,
There are definitely many options out there, with a considerable amount of sophistication. But fortunately, it is possible to start simple and add layers of abstraction as knowledge and experience are gained.
I am curious to ask what everyone in the community is doing? sorry for such a long and broad question.
The brief version: we are working towards integrating a SaltStack-with-Napalm management and orchestration solution.
What is your workflow? What tools are your teams using? What is working and what is not? What do you really like and what do you need to improve? How mature do you think your process is? etc etc
Things are getting started. I am able to automate the build of servers simply by knowing the MAC address and then PXE-booting the device. The operating system is installed and the device auto-reboots. It then gets its total configuration applied, again automatically, from a Salt server.

Our operating environment uses Debian. By incorporating the auto-installation of Quagga/FRR, Open vSwitch, KVM/Qemu, and LXC into the appropriate devices, it is possible to build a homogeneous server/router/switch/virtualization solution, with certain devices picking up varying weights of those roles. The people on this list who are running high-bandwidth networks may not see this as much of a benefit, but for smaller operators, I think there is value. Then again, when something like Napalm is incorporated into the mix, automation of the 'big iron' becomes part of the overall solution.

I came across a CloudFlare slide deck which shows their perspective on management, implementation, and orchestration: https://ripe72.ripe.net/presentations/58-RIPE72-Network-Automation-with-Salt...

And SaltStack has a proxy minion, which enables it to talk to CLI-based devices.
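A minimal sketch of driving that last provisioning step from the Salt master's Python API; it assumes the code runs on the master, and the minion ID is hypothetical:

# Minimal sketch: once the freshly PXE-booted minion has registered with
# the master, check it is alive and apply its complete configured state.
# Assumes this runs on the Salt master; the minion ID is hypothetical.
import salt.client

local = salt.client.LocalClient()

alive = local.cmd("new-server-01", "test.ping")
if alive.get("new-server-01"):
    result = local.cmd("new-server-01", "state.apply")
    print(result)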
Wanted to ask and see what approaches the many different teams here are taking!
We are going to start working from a GitLab based workflow.
Salt uses generic 'state' files which are completed with device-specific settings from 'pillar' files, both of which can be version-controlled in git.
Projects are created, issues entered and developed with a gitflow branching strategy.
GitLab CI pipelines run package loadings and run tests inside a lab.
I am not affiliated with SaltStack, just a happy user. Having said that, various dev/test/prod scenarios can be implemented, with orchestrated workflows and provisioning processes based upon the level of sophistication required.
Tests are usually python unit tests that are run to do both functional and service creation, modification and removal tests.
Rather than re-inventing the wheel, take a look at SaltStack or Ansible, and/or Napalm. All are Python-based and could probably get you to your target faster than using Python natively. When it is necessary to go native Python on a hairy integration problem, it is no problem to incorporate it as needed.
For unit testing we typically use python libraries to open transactions to do the service modifications (along with functional tests) against physical lab devices.
Napalm may get you that next level of sophistication where configs can be diff’d before roll-out.
For our prod deployment we leverage 'push on green' and gating to push package changes to prod devices.
Which can be orchestrated.
Thanks
Raymond Burkholder https://blog.raymond.burkholder.net
Awesome! I gave a presentation on CI/CD for networking last year at the Interop conference; my demo was based on GitLab: https://gitlab.com/plumbis/cumulus-ci-cd/

I use Behave for testing, but it is just a front end for Python code under the hood to actually validate that everything is doing what it's supposed to be doing. I did a little bit of work to try to get Ansible to do checking and validation in a playbook, but since Ansible isn't really a programming language it felt like putting a square peg in a round hole. I would recommend an actual programming language or testing framework.

Likely the biggest challenge you'll encounter is a lack of features in vendor VMs and the fact that you can't change interface names. Generally, in production, we don't have "eth1, eth2, eth3" as the cabled-up interfaces, so you end up needing to maintain two sets of configs (prod and test) or something to modify production configs on the fly, both of which are crummy options.
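Since Behave steps are ultimately plain Python, here is a rough sketch of a step definition backing a network test; the feature wording and the get_bgp_sessions() helper are invented for illustration:

# Toy sketch of a Behave step definition backing a feature line such as
# "Then all BGP sessions on leaf01 are established". get_bgp_sessions()
# is a hypothetical helper (it would wrap NAPALM/NETCONF/CLI scraping).
from behave import then

def get_bgp_sessions(device):
    """Hypothetical helper returning {peer_ip: state} for a device."""
    raise NotImplementedError

@then('all BGP sessions on {device} are established')
def step_bgp_established(context, device):
    sessions = get_bgp_sessions(device)
    down = {peer: s for peer, s in sessions.items() if s != "Established"}
    assert not down, f"BGP sessions not established on {device}: {down}"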
From a workflow perspective, you can treat configuration like code and run full test suites when pull requests are issued and then use the test results as the basis for a change review meeting. Don't let humans talk about changes that we already know won't work.
Glad to hear about other people seriously considering CI/CD in the network space, good luck!

-Pete
Related: I am working on https://github.com/lathama/Adynaton and hope to get parts into the Python Standard Library with help from some peers. Anyone who wants to help out, ping me off-list.
-- Andrew "lathama" Latham lathama@gmail.com http://lathama.com
On 10 August 2017 at 01:52, Kasper Adel <karim.adel@gmail.com> wrote:
We are pretty new to those new-age network orchestrators and automation,
I am curious to ask what everyone in the community is doing? sorry for such a long and broad question.
What is your workflow? What tools are your teams using? What is working and what is not? What do you really like and what do you need to improve? How mature do you think your process is? etc etc
The wheels here move extremely slowly, so it's slowly, slowly catchy monkey for us. So far we have been using Ansible and GitLab CI, and the current plan is to slowly engulf the existing network, device by device, into the process/toolset.
Wanted to ask and see what approaches the many different teams here are taking!
We are going to start working from a GitLab based workflow.
Projects are created, issues entered and developed with a gitflow branching strategy.
GitLab CI pipelines run package loadings and run tests inside a lab.
Yes, that is the "joy" of GitLab; see below for a more detailed breakdown. We use Docker images to run CI processes, and we can branch and make merge requests which trigger the CI and CD processes. It's not very complicated and it just works. I didn't compare it with stuff like BitBucket, I must admit; I just looked at GitLab, saw that it worked, tried it, stuck with it, and have had no problems so far.
Tests are usually python unit tests that are run to do both functional and service creation, modification and removal tests.
For unit testing we typically use python libraries to open transactions to do the service modifications (along with functional tests) against physical lab devices.
Again, see below: physical and virtual devices, and also some custom Python scripts for unit tests, like checking that IPv4/6 addresses are valid (not 999.1.2.3 or AA:BB:HH::1), that AS numbers are valid integers of the right size, etc.
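A minimal sketch of that kind of check using Python's standard ipaddress module; the sample values mirror the examples above:

# Minimal sketch of the unit-test checks described above, using Python's
# standard ipaddress module; the sample values mirror the post's examples.
import ipaddress

def is_valid_ip(value):
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False

def is_valid_asn(value):
    """True if value is an integer in the 32-bit AS number range."""
    return isinstance(value, int) and 0 < value < 2**32

assert not is_valid_ip("999.1.2.3")      # invalid IPv4 octet
assert not is_valid_ip("AA:BB:HH::1")    # 'HH' is not a hex group
assert is_valid_ip("2001:db8::1")
assert is_valid_asn(64512) and not is_valid_asn(2**32)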
For our prod deployment we leverage 'push on green' and gating to push package changes to prod devices.
Thanks
Yeah, that is pretty much my approach too.

Device configs are in YAML files (actually multiple files). So one git repo stores the constituent YAML files; when you update a file and push to the repo, the CI process starts, which runs syntax checks and semantic checks against the YAML files (some custom Python scripts, basically).

As Saku mentioned, we also follow the "replace entire device config" approach to guarantee the configuration state (or at least "try", when it comes to crazy old IOS). So this means we have Jinja2 templates that render YAML files into device-specific CLI config files. They live in a separate repo, and again many constituent Jinja2 files make one entire device template. So any push to this Jinja2 repo triggers a separate CI workflow which performs syntax checking and semantic checking of the Jinja2 templates (again, custom Python scripts).

When one pushes to the YAML repo to update a device config, the syntax and semantic checks are made against the YAML files; they are then "glued" together to make the entire device config in a single file, the Jinja2 repo is checked out, the entire YAML file is used to feed the Jinja2 templates, and the configs are built; now the vendor-specific config needs to be syntax checked. This CD part of the process (to a testing area) is still a WIP; for Junos we can push to a device and use "commit check", for IOS and others we can't. So right now I'm working on a mixture of pushing the config to virtual IOS devices and to physical kit in the lab, but this also causes problems in that interface / line card slot numbers/names will change, so we need to run a few regex statements against the config to jimmy it into a lab device (so pretty ugly and temporary, I hope).

When the CD to "testing" passes, then CD to "production" can be manually triggered. Another repo stores the running config of all devices (from the previous push). So we can push the candidate config to a live device (using Ansible with NAPALM [1]) and get a diff against the running config, make the "config replace" action, then download the running config and put that back into the repo. So we have a locally stored copy of device configs, and we can see offline the diffs between pushes. It also provides a record that the process of going from YAML > Jinja2 > device produces the config we expected (although prior to this one will have had to make a branch and then a merge request, which is peer reviewed, to get the CD part to run and push to the device, so there shouldn't be any surprises this late in the process!).

Is it foolproof? No. It is a young system still being designed and developed. Is it better than before? Hell yes.

Cheers,
James.

[1] Ansible and NAPALM here might seem like overkill, but we use Ansible for other stuff like x86 box management, so this means configuring a server or a router is abstracted through one single tool for the operator (i.e. playbooks are used irrespective of device type, rather than, say, playbooks for servers but Python scripts for firewalls). Also, we use YAML files as config files for x86 boxes, also living in GitLab with a CI/CD process, so again: one set of tools for all.
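A minimal sketch of the YAML-to-Jinja2 rendering step described above; the data layout and template text are invented for illustration:

# Minimal sketch of the YAML -> Jinja2 -> device-config step described
# above; the data layout and template text are invented for illustration.
import yaml
from jinja2 import Template

DEVICE_YAML = """
hostname: lab-router-1
interfaces:
  - name: ge-0/0/1
    description: customer-svc-42
"""

TEMPLATE = Template(
    "set system host-name {{ hostname }}\n"
    "{% for intf in interfaces %}"
    "set interfaces {{ intf.name }} description \"{{ intf.description }}\"\n"
    "{% endfor %}"
)

data = yaml.safe_load(DEVICE_YAML)
config = TEMPLATE.render(**data)
print(config)   # full intended config, ready for a "config replace" push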
participants (11):
- Andrew Latham
- Hugo Slabbert
- James Bensley
- Jippen
- Joe Hamelin
- Kasper Adel
- Leo Bicknell
- Pete Lumbis
- Raymond Burkholder
- Saku Ytti
- Tom Beecher