From: Saku Ytti <saku@ytti.fi> Sent: Friday, January 25, 2019 7:59 AM
On Thu, 24 Jan 2019 at 18:43, <adamv0025@netconsultings.com> wrote:
We fight with that all the time. I'd say that of the whole Design->Certify->Deploy->Verify->Monitor service lifecycle time budget, the service certification testing takes up almost half. That's why I'm so interested in a model-driven design and testing approach.
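To give a rough idea of what I mean by model-driven testing, here is a minimal sketch; the service model fields, role/protocol/QoS values and check names are all hypothetical, just to show the idea of expanding one declarative model into certification test cases instead of writing them per variant by hand:

# Rough sketch only: model fields and check names below are made up.

L3VPN_MODEL = {
    "service": "l3vpn",
    "pe_roles": ["hub", "spoke"],
    "pe_ce_routing": ["bgp", "static"],
    "qos_profiles": ["gold", "silver"],
}

def derive_test_cases(model):
    """Expand the service model into concrete certification test cases."""
    cases = []
    for role in model["pe_roles"]:
        for proto in model["pe_ce_routing"]:
            for qos in model["qos_profiles"]:
                cases.append({
                    "name": f"{model['service']}-{role}-{proto}-{qos}",
                    "checks": ["provision", "pass_traffic", "verify_qos", "deprovision"],
                })
    return cases

if __name__ == "__main__":
    for case in derive_test_cases(L3VPN_MODEL):
        print(case["name"], case["checks"])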
This shop has 100% automated blackbox testing, and still they have to cherry-pick what to test.
Sure, one tests only for the few specific current and near-future use cases.
Do you have statistics on how often you find show-stopper issues and how far into the test they were found?
I don't keep those statistics, but running bug scrubs in order to determine the code for regression testing is usually a good starting point for avoiding show-stoppers. What is then found later on during the testing is usually patched, so yes, you end up with brand new code and several patches related to your use cases (PEs, Ps, etc.).
I expect this to be an exponential curve: upgrading the box, getting your signalling protocols up and pushing one packet through each service you sell is easy and fast. I wonder whether a massive amount of work will increase confidence significantly beyond that.
Yes it will.
The issues I tend to find in production are issues which are not trivial to recreate in the lab even once we know what they are, which implies that finding them a priori is a bit of a naive expectation. So, assumptions:
This is because you did your due diligence during the testing. Do you have statistics on the probability of these "complex" bugs occurring?
Hopefully we'll enter a NOS future where we download the NOS from github and compile it for our devices, allowing the whole community to contribute to unit tests and use cases, and letting you run code with a minimal bug surface in your environment.
Not there yet, but you can compile your own routing protocols and run those on a vendor OS.
I see very little future in blackbox testing a vendor NOS at the operator site, beyond a quick poke in the lab. Seems like poor value. I'd rather have a pessimistic deployment plan: lab => staging => 2-3 low-risk sites => 2-3 high-risk sites => slow roll-out.
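To make that pessimistic deployment plan concrete, here is a minimal sketch of how such a staged rollout could be expressed and gated; the stage names, device names, soak times and the healthy() check are assumptions for illustration, not anyone's actual process:

# Sketch only: placeholders for a lab -> staging -> low-risk ->
# high-risk -> fleet rollout gated on observed health.

ROLLOUT_PLAN = [
    {"stage": "lab",             "devices": ["lab-pe1"],            "soak_days": 7},
    {"stage": "staging",         "devices": ["stg-pe1", "stg-pe2"], "soak_days": 14},
    {"stage": "low-risk-sites",  "devices": ["site-a", "site-b"],   "soak_days": 30},
    {"stage": "high-risk-sites", "devices": ["site-x", "site-y"],   "soak_days": 30},
    {"stage": "fleet",           "devices": ["remaining-pes"],      "soak_days": 0},
]

def healthy(device):
    # Placeholder: a real check would query monitoring/telemetry for
    # crashes, protocol flaps, packet loss, etc. on the upgraded device.
    return True

def run_rollout(plan):
    for step in plan:
        print(f"Upgrading {step['stage']}: {', '.join(step['devices'])}")
        if not all(healthy(d) for d in step["devices"]):
            print(f"Health check failed in {step['stage']}, halting rollout")
            return False
        print(f"Soaking for {step['soak_days']} days before the next stage")
    return True

if __name__ == "__main__":
    run_rollout(ROLLOUT_PLAN)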
Yes, that's also a possibility; it's one of the strong arguments for massive disaggregation at the edge, to reduce the fallout of a potential critical failure. Depends on the shop, really.
I really need to have this ever-growing library of test cases that the automation will churn through with very little human intervention, in order to reduce the testing from months down to weeks or even days.
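Purely as an illustration of that kind of automation, a minimal sketch: each certified use case lives as a small definition in a library and the runner just churns through all of them with no human in the loop. The directory layout and the run_case() executor below are assumptions, not actual tooling.

# Minimal sketch: one JSON definition per certified use case; the
# runner loops over the whole library. Paths and executor are made up.

import json
from pathlib import Path

TEST_LIBRARY = Path("test_cases")   # one JSON file per certified use case

def run_case(case):
    # Placeholder executor: a real harness would push the config to the
    # lab devices, drive the traffic generator and verify state here.
    print(f"Running {case['name']}: {case.get('description', '')}")
    return True

def run_library():
    results = {}
    for path in sorted(TEST_LIBRARY.glob("*.json")):
        case = json.loads(path.read_text())
        results[case["name"]] = run_case(case)
    failed = [name for name, ok in results.items() if not ok]
    print(f"{len(results)} cases run, {len(failed)} failed")
    return failed

if __name__ == "__main__":
    run_library()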
A lot of vendors, maybe all, accept your configurations and test them for releases. I think this is the only viable solution vendors have for blackbox testing: gather configs from customers and test those, instead of trying to guess what to test. I've done that with Cisco in two companies; unfortunately I can't really tell if it impacted quality, but I like to think it did.
Did that with Juniper partners and now directly with Juniper. The thing is, though, they are using our test plan...
adam