Your opinion on network analysis in the presence of uncertain events
Hi NANOG,

Networks evolve in uncertain environments. Links and devices randomly fail; external BGP announcements unpredictably appear/disappear, leading to unforeseen traffic shifts; traffic demands vary; etc. Reasoning about network behaviors under such uncertainties is hard and yet essential to ensure Service Level Agreements.

We're reaching out to the NANOG community as we (researchers) are trying to better understand the practical requirements behind "probabilistic" network reasoning. Some of our questions include: Are uncertain behaviors problematic? Do you care about such things at all? Are you already using tools to ensure the compliance of your network design under uncertainty? Are there any good?

We designed a short anonymous survey to collect operators' answers. It is composed of 14 optional questions, most of which (13/14) are closed-ended. It should take less than 10 minutes to complete. We expect the findings to help the research community in designing more powerful network analysis tools. Among other things, we intend to present the aggregate results in a scientific article later this year.

It would be *terrific* if you could help us out!

Survey URL: https://goo.gl/forms/HdYNp3DkKkeEcexs2

Thanks much!

Laurent Vanbever, ETH Zürich

PS: It goes without saying that we would also be extremely grateful if you could forward this email to any operator you know who may not read NANOG.
I took the survey. It's short and sweet — well done! I do have a question. You ask "Are there any good?" Any good what?

-mel
I took the survey. It’s short and sweet — well done!
Thanks a lot, Mel! Highly appreciated!
I do have a question. You ask "Are there any good?" Any good what?
I just meant whether existing network analysis tools were any good (or good enough) at reasoning about probabilistic behaviors that people care about (if any). All the best, Laurent
I know of none that take probabilities as inputs. Traditional network simulators, such as GNS3, let you model various failure modes, but probability seems squishy enough that I don't see how it can be accurate, and thus helpful. It's like that Dilbert cartoon where the pointy-haired boss asks for a schedule of all future unplanned outages :) https://dilbert.com/strip/1997-01-29

-mel
My understanding was that the tool would combine historic data with the MTBF datapoints from all components involved in a given link in order to try and estimate the likelihood of a link failure.

Heck, I imagine if one would stream a heap load of data at an ML algorithm it might draw some very interesting conclusions indeed, i.e. find unforeseen patterns across huge datasets while trying to understand the overall system (network) behaviour. Such a tool might teach us something new about our networks. The next level would be recommendations on how to best address some of the potential pitfalls it found.

Maybe in closed systems like IP networks, with use of streaming telemetry from SFPs/NPUs/LC-CPUs/protocols/etc., we'll be able to feed the analytics tool with enough data to allow it to make fairly accurate predictions (unlike in weather or market prediction tools, where the dataset, or search space, since not all attributes are equally relevant, is virtually endless).

adam
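To make that concrete, here is a minimal sketch of the kind of estimate described above, assuming a constant failure rate derived from each component's MTBF and independent component failures. The MTBF figures and the 30-day window are made up purely for illustration:

    import math

    def failure_probability(mtbf_hours: float, window_hours: float) -> float:
        """Probability that a component fails within the window, assuming a
        constant failure rate (exponential model) derived from its MTBF."""
        return 1.0 - math.exp(-window_hours / mtbf_hours)

    def link_failure_probability(component_mtbfs, window_hours: float) -> float:
        """A link fails if any of its components (optic, line card, chassis, ...)
        fails, assuming the components fail independently."""
        p_all_survive = 1.0
        for mtbf in component_mtbfs:
            p_all_survive *= 1.0 - failure_probability(mtbf, window_hours)
        return 1.0 - p_all_survive

    # Illustrative numbers: optic (200k h MTBF), line card (500k h), chassis (1M h)
    print(link_failure_probability([200_000, 500_000, 1_000_000], 30 * 24))

Both assumptions are doing a lot of work here; the constant-rate one in particular is what the next reply pushes back on.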
MTBF can't be used alone to predict failure probability, because product mortality follows the infamous "bathtub curve". Products are as likely to fail early in their lives as later in their lives. MTBF as a scalar value is just an average.

-mel via cell
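The bathtub point, in toy code: the hazard rate changes with device age, whereas a bare MTBF only captures the flat middle part. One common way to sketch the shape is a mixture of Weibull hazards plus a constant rate; every parameter below is made up purely to show the shape:

    def weibull_hazard(t: float, shape: float, scale: float) -> float:
        """Instantaneous failure rate of a Weibull distribution at age t (hours)."""
        return (shape / scale) * (t / scale) ** (shape - 1)

    def bathtub_hazard(t: float) -> float:
        """Infant mortality (shape < 1) + random failures (constant rate,
        roughly what MTBF alone captures) + wear-out (shape > 1)."""
        infant = weibull_hazard(t, shape=0.5, scale=5_000)
        random_failures = 1.0 / 500_000
        wear_out = weibull_hazard(t, shape=4.0, scale=80_000)
        return infant + random_failures + wear_out

    for hours in (100, 10_000, 50_000, 90_000):
        print(f"{hours:>6} h: {bathtub_hazard(hours):.2e} failures/hour")

The rate is high for a young device, drops for a few years, then climbs again as the device wears out, which is exactly the age dependence a single scalar MTBF throws away.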
Hi Adam/Mel,

Thanks for chiming in!

> My understanding was that the tool would combine historic data with the MTBF datapoints from all components involved in a given link in order to try and estimate the likelihood of a link failure.

Yep. This could be one way indeed. This likelihood could also take the form of intervals in which you expect the true value to lie (again, based on historical data). This could be done both for link/device failures and for external inputs such as BGP announcements (to consider the likelihood that you receive a route for X in, say, NEWY). The tool would then run the deterministic routing protocols (not accounting for 'features' such as prefer-oldest-route for a sec.) on these probabilistic inputs so as to infer the different possible forwarding outcomes and their relative probabilities. For now we had something like this in mind. One can of course make the model more and more complex by e.g. also taking into account data plane status (to model gray failures). Intuitively though, the more complex the model, the more complex the inference process.

> Heck, I imagine if one would stream a heap load of data at an ML algorithm it might draw some very interesting conclusions indeed, i.e. find unforeseen patterns across huge datasets while trying to understand the overall system (network) behaviour. Such a tool might teach us something new about our networks. The next level would be recommendations on how to best address some of the potential pitfalls it found.

Yes. I believe some variants of this exist already. I'm not sure how much they are used in practice though. AFAICT, false positives/negatives are still a big problem. A non-trivial recommendation system will require a model of the network behavior that can somehow be inverted easily, which is probably something academics should spend some time on :-)

> Maybe in closed systems like IP networks, with use of streaming telemetry from SFPs/NPUs/LC-CPUs/protocols/etc., we'll be able to feed the analytics tool with enough data to allow it to make fairly accurate predictions (unlike in weather or market prediction tools, where the dataset, or search space, since not all attributes are equally relevant, is virtually endless).

I'm with you. I also believe that better (even programmable) telemetry will unlock powerful analysis tools.

Best,
Laurent

PS: Thanks a lot to those who have already answered our survey! For those who haven't yet: https://goo.gl/forms/HdYNp3DkKkeEcexs2 (it only takes a couple of minutes).
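One way to picture the inference step described above is plain Monte Carlo: sample a "world" from the per-link failure probabilities, run a deterministic routing computation on it (a shortest-path calculation via networkx stands in for the real IGP here), and tally how often each forwarding outcome shows up. Everything below (topology, link costs, failure probabilities) is hypothetical:

    import random
    from collections import Counter

    import networkx as nx

    # (a, b, igp_cost, failure probability over the analysis window)
    links = [
        ("newy", "chic", 1, 0.01),
        ("newy", "wash", 1, 0.02),
        ("wash", "chic", 1, 0.01),
        ("chic", "seat", 2, 0.03),
        ("wash", "seat", 3, 0.02),
    ]

    def sample_topology(rng: random.Random) -> nx.Graph:
        """One possible world: each link is independently up or down."""
        g = nx.Graph()
        g.add_nodes_from({n for a, b, _, _ in links for n in (a, b)})
        for a, b, cost, p_fail in links:
            if rng.random() >= p_fail:
                g.add_edge(a, b, weight=cost)
        return g

    def forwarding_outcome(g: nx.Graph, src: str, dst: str):
        """Deterministic routing on one sampled world."""
        if not nx.has_path(g, src, dst):
            return ("unreachable",)
        return tuple(nx.shortest_path(g, src, dst, weight="weight"))

    rng = random.Random(42)
    samples = 100_000
    outcomes = Counter(
        forwarding_outcome(sample_topology(rng), "newy", "seat")
        for _ in range(samples)
    )
    for path, count in outcomes.most_common():
        print(f"{count / samples:6.2%}  {' -> '.join(path)}")

The same skeleton extends to the other uncertain inputs mentioned above (e.g. sample whether a BGP route for X shows up in NEWY), at the cost of a richer model of the deterministic part.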
Hi Laurent,

I have filled out the survey. However, I would just like to request that in the future you don't use a URL shortener like goo.gl; many people don't like those because we can't see where you're sending us until we click the link. Some people also block them because they are a security issue (our corporate proxy does; I have to drop off the VPN or use a URL expander to retrieve the original URL).

Also, have you seen Batfish? It looks like you guys want to write a tool that has some overlap with Batfish. Batfish can ingest the configs from my network and answer questions such as "can host A reach host B?", "will prefix advertisement P from host A be filtered/accepted by host B?", "if I ping from this source IP, who has a return route and can respond?", etc.

Kind regards,
James.
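For reference, a rough sketch of the kind of question described above, using pybatfish (the Python client for Batfish). The snapshot path, network/snapshot names, interface, and destination IP are placeholders, and module paths and question parameters may differ between pybatfish versions, so treat this as a sketch to check against the Batfish documentation rather than an exact recipe:

    from pybatfish.client.session import Session
    from pybatfish.datamodel.flow import HeaderConstraints

    # Assumes a Batfish service is running locally and that snapshots/lab/
    # contains the device configurations in the expected layout.
    bf = Session(host="localhost")
    bf.set_network("lab")
    bf.init_snapshot("snapshots/lab", name="baseline", overwrite=True)

    # "Can host A reach host B?" -- trace a flow entering at an edge interface
    # towards B's address and inspect the computed forwarding paths.
    answer = bf.q.traceroute(
        startLocation="@enter(edge1[GigabitEthernet0/0])",
        headers=HeaderConstraints(dstIps="192.0.2.10"),
    ).answer()
    print(answer.frame())

Other questions (prefix filtering, BGP session status, etc.) follow the same ask-a-question, get-a-dataframe pattern.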
participants (4)

- adamv0025@netconsultings.com
- James Bensley
- Mel Beckman
- Vanbever Laurent