On Fri, Aug 16, 2019 at 5:02 AM Robert Kisteleki <robert@ripe.net> wrote:
Hi,
On 2019-08-15 17:38, Christopher Morrow wrote:
This looks like fun! (a few questions for the RIPE folk below, I think)
What is the expected load of streaming clients on the RIPE service? (I wonder because I was/am messing about with something similar, though less node and js... not that that's relevant here).
One of the (IMO) most useful features is that you can filter what you want to receive. In fact this makes the service useful :-) So unless you want to tune in to a significant portion of BGP chatter, the load should not be substantial.
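A minimal sketch of what such a filter could look like on the client side, in Python. It only builds the JSON subscription messages and does not open a connection. The message type `ris_subscribe` and the fields `prefix`, `moreSpecific` and `type` are written from memory of the RIS Live manual and should be treated as assumptions to verify there; the helper name `build_subscriptions` and the example prefixes (documentation ranges) are made up for illustration.

```python
import json


def build_subscriptions(prefixes, more_specific=True):
    """Return one JSON subscribe message per prefix.

    Assumption: the service accepts messages shaped like
    {"type": "ris_subscribe", "data": {...}}; check the RIS Live manual.
    """
    messages = []
    for prefix in prefixes:
        message = {
            "type": "ris_subscribe",
            "data": {
                "prefix": prefix,               # prefix to watch
                "moreSpecific": more_specific,  # also match more-specific routes
                "type": "UPDATE",               # only BGP UPDATE messages
            },
        }
        messages.append(json.dumps(message, sort_keys=True))
    return messages


if __name__ == "__main__":
    # Hypothetical prefix set, taken from the documentation ranges.
    for line in build_subscriptions(["192.0.2.0/24", "2001:db8::/32"]):
        print(line)
```

Each string would be sent over the streaming connection once it is open; sending one message per prefix is one way to express "this is my prefix set" as several filters.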
yup, I can see a use case clearly for: "This is my prefix set, and my transit-as-set, tell me when there are deviations" (which is probably 2 different connections with 2 different filters to the not-firehose feed - oh, the docs say you can provide more than one filter, ok... cool)

The firehose is perhaps more friendly for folk like an ISP that could offer some form of monitoring for their customers' prefixes? It's also useful (to me anyway) to tell me: "I see prefix-A picked up a new Origin? odd?" or "Wow, someone 7007'd themselves!" which isn't clearly (to me anyway) simple to do in the 'not firehose' version of the stream/service...

The firehose also looks like a great feed to add to my other internal route monitoring things:
1) get bgp data from my firewall's upstream devices
2) get bgp from my internal network
3) eat bmp from my PE/CE device set
4) add rislive-firehose
5) add routeviews/ris update data when available (poll every 15 min, process mrt && ingest data)

and then determine what patterns/filters/things I want to monitor: "did prefixX just change upstream ASN and I should bias traffic differently toward that prefix?" etc...
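The "new Origin" and "changed upstream ASN" checks could be sketched roughly as below, in Python. This is an illustration under assumptions, not anything from the RIS Live tooling: it assumes each update has already been reduced to a prefix plus an AS path given as a list of integers (origin last), and the names `origin_of`, `upstream_of` and `OriginWatcher` are hypothetical. Paths ending in an AS_SET are simply skipped.

```python
def origin_of(as_path):
    """Return the origin ASN (last path element), or None if unusable."""
    if not as_path or not isinstance(as_path[-1], int):
        return None  # empty path, or the path ends in an AS_SET
    return as_path[-1]


def upstream_of(as_path):
    """Return the first ASN before the origin, skipping origin prepends."""
    origin = origin_of(as_path)
    if origin is None:
        return None
    for asn in reversed(as_path):
        if asn != origin:
            return asn
    return None  # path contains only the origin


class OriginWatcher:
    """Remember which origins were seen per prefix and flag new ones."""

    def __init__(self):
        self.seen = {}  # prefix -> set of origin ASNs seen so far

    def observe(self, prefix, as_path):
        """Return the origin if it is new for an already-known prefix, else None."""
        origin = origin_of(as_path)
        if origin is None:
            return None
        known = self.seen.setdefault(prefix, set())
        if origin in known:
            return None
        first_sighting = not known
        known.add(origin)
        # The first origin ever seen for a prefix is the baseline, not an alert.
        return None if first_sighting else origin
```

A transit-as-set check would then be a membership test such as `upstream_of(path) in my_transit_asns`, where `my_transit_asns` is whatever set of ASNs the operator expects. Note that the watcher never forgets an origin, so a legitimate multi-origin prefix alerts only once per origin.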
I hadn't seen the RIPE folk pipe up anywhere with what their SLO/etc is for the ris-live service? (except their quip about: "used to run in a tmux session I had to occasionally ssh into <foo> and restart when <foo> rebooted" I believe the end of that quip in Iceland was: "and now it's running as a real service")
It's in between those. We now have a conscious setup which should also be able to scale up, but bits and pieces (like full monitoring of the service) are still being developed.
ok cool! as with my question to John Curran about ARIN service SLOs I'm really asking: "Hey, if I'm inputting this data into my business process I want to know what to expect from a performance/scalability/outage/reliability perspective" if that's not written down and published then some folks MAY choose to believe: "Well, it's available now, and now and now.. so 'always, 100%!!' seems sane!" or others may choose to believe: "Well, nice toy you have there... let me know when it's ready for me to ingest into my production monitoring/etc systems" <toddle off to the corner to play ball with cartman...>
Also, one of the strengths of the 'monitoring as a service' folks is their number of collection points and the breadth of ASNs to which they interconnect those points. RISLive, I think, reports out from ~37 or so RIPE probes, how do we (the internet) get more deployed (or better interconnection to the current sets)? and maybe even more importantly... what's the right spread/location/interconnectivity map for these probes?
RIS Live provides data from RIS, which has a bunch of collectors around the world (see https://www.ripe.net/analyse/internet-measurements/routing-information-servi...) with many hundreds of peering sessions. But it is by no means complete in terms of coverage.
If and how the community (NANOG or RIPE or else) should work on optimal data collection is indeed a useful discussion to have.
ok, cool! :)
Cheers, Robert
thanks! for showing what's possible with tooling being developed by like-minded individuals :)
-chris