Hi all,

I'm part of a small team that has been working on network visibility and security for some time now. We ended up developing a software network probe from scratch that we are considering open-sourcing. Our ask is not really "should we?" but rather "how much of it?" Would any of you use it, and how?

Now, for a bit of context. For years we used the usual stack: Zeek plus homegrown glue for databases, dashboards, and so on. That works well enough in many environments (it's what Meta uses for its own data centers, so it clearly does work), but in our own environment we repeatedly hit limits in a few places: small edge boxes, noisy OT/telco environments, MSP-style multi-tenant deployments, and links where bandwidth drops are painful. At some point, six years ago, we stopped trying to patch more on top of Zeek and started building our own internal network probe instead (time flies when you code fun stuff). We are now trying to decide how much of it (if any) should be open-sourced, and I'd like to sanity-check that with people who run similar tooling in production.

What the probe does (high level)

This is NANOG, so a DPI-based network probe shouldn't be strange to most of you ;-) For those who are not familiar: it captures packets and turns them into enriched metadata / DB-ready records (flows, protocols, selected network metadata, assets, etc.).

Some operational characteristics:

* At around 1 Gbit/s sustained line rate, we are currently in the ballpark of the figures below (still being tuned, but the order of magnitude holds):
  * 2 CPU cores and 8 GB of RAM
  * ~10 Mbit/s of metadata sent to the database
  * ~1 Mbit/s stored on disk after compression

That's the lean end; now for the carrier-grade end:

* Internal testing, plus commercial test gear (we had it certified by Spirent Communications), shows no false detections at full line rate at 100 Gbit/s (145 million pps, roughly 10-12 million new sessions per second) across 135+ network protocols (L2-L7) and 4,000+ applications, on commodity hardware (no FPGA/ASIC; just good old CPU and RAM). We are not using DPDK; for the higher-speed use cases we ended up writing our own NIC drivers in Rust.
* For small links at full line rate with full protocol analysis, resource usage is roughly an order of magnitude lower than what we observed with Zeek or Suricata in equivalent scenarios (happy to share benchmark details if useful).

Design goals, in brief:

* Low footprint: able to run where Zeek/Suricata hurt: on-prem systems, VMs, small Kubernetes worker nodes, cloud workloads, and small x86 or ARM edge boxes (thanks to Rust).
* Simple deployment: a single static Rust binary with no dynamic dependencies. Drop it on a recent Linux host, point it at an interface, and it starts capturing. There is an installer with a CLI mode for use with Ansible or other automation, and optionally a dockerized DB pipeline for ClickHouse/Postgres.
* Fleet-oriented: usable at the scale of hundreds or thousands of probes in an MSP / distributed environment.
* Outputs: JSON over HTTP / REST API, plus structured schemas for ClickHouse/Postgres, so operators can plug in their own analytics, detections, or reporting.
* Implementation: full Rust codebase, with a focus on predictability and safety rather than ad-hoc packet tricks that reduce visibility or telemetry quality.

Why we didn't just stick with Zeek

This is not "Zeek bad, our code good". We simply had a different set of constraints.
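As a side note on the "JSON over HTTP, plug in your own analytics" output model mentioned above: here is a minimal consumer-side sketch of what working with such DB-ready records could look like. The field names (`proto`, `app`, `bytes`, etc.) are purely hypothetical illustrations, not the probe's actual schema.

```python
import json
from collections import Counter

# Hypothetical newline-delimited JSON flow records, as a probe of this
# kind might emit them over HTTP. Field names are illustrative only.
records = [
    '{"ts": 1700000000, "src": "10.0.0.5", "dst": "10.0.0.9", "proto": "tls", "app": "https", "bytes": 4821}',
    '{"ts": 1700000001, "src": "10.0.0.5", "dst": "10.0.0.7", "proto": "dns", "app": "dns",   "bytes": 312}',
    '{"ts": 1700000002, "src": "10.0.0.6", "dst": "10.0.0.9", "proto": "tls", "app": "https", "bytes": 1024}',
]

# A trivial operator-side "analytics" step: total bytes per L7 protocol.
bytes_per_proto = Counter()
for line in records:
    rec = json.loads(line)
    bytes_per_proto[rec["proto"]] += rec["bytes"]

print(dict(bytes_per_proto))  # e.g. {'tls': 5845, 'dns': 312}
```

In a real deployment the same records would presumably land in ClickHouse/Postgres via the structured schemas rather than be processed by hand like this.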
The main drivers were:

* Resource footprint when deploying probes directly on Kubernetes worker nodes, small cloud instances, or ARM edge devices. We also wanted to reuse the same probe design when monitoring much higher-speed links on commodity hardware.
* Fleet and multi-tenant operation: the need to deploy, manage, and upgrade a large number of probes in an MSP/MSSP context, with clear separation between tenants.
* Tighter control over metadata shape and volume, so that DB/storage does not explode in noisy environments. Our past Zeek deployments filled an Elastic cluster in a couple of days and often forced us to rebuild that instance; we wanted more predictable volume control.

The result is a probe that overlaps with Zeek/Suricata functionally, but with different trade-offs.

Open source, open core, or something else?

Internally we are debating what would actually be useful to open-source for the operator community, versus what (if anything) should remain "product". The rough options we see:

* Open-source the probe engine and protocol parsers, so operators can run and extend it themselves and build their own services / UX on top.
* Open-source primarily the DB schemas, ingestion pipeline, and operational tooling, while keeping the probe itself closed.
* Keep the entire stack closed and offer it only as a self-hosted / appliance / cloud solution.

Before we spend months going down any of these routes, I would really value operator feedback.

Specific questions, for the readers courageous enough to have reached this point in the post ;-)

1. Does this actually fill a gap for you, or is your current setup "good enough"? If you have deployed Zeek / Suricata / nProbe / ntop / similar in anger, would you even look at something like this?
2. If some part of it were open-sourced, what would be most useful to you in practice?
  * Core probe and parsers?
  * Schemas / ingestion pipeline / deployment tooling?
  * SDKs / libraries to embed in your own systems?
  * Something else entirely?
3. Licensing / model concerns: are there licenses that are an immediate "no" (e.g. AGPL)? Would "open-source core with additional commercial features" be acceptable, or is that a non-starter in your environment?
4. How would you realistically consume it? In your networks, would you be more likely to:
  * run it as a self-hosted binary on your own infrastructure,
  * deploy it as some kind of appliance,
  * or consume it as a managed service that delivers metadata or alerts?
5. What would make you discard it immediately? Examples: excessive resource usage, awkward integration model, unclear security story, problematic license, unclear long-term maintenance, etc.

This is not a product announcement, beta signup, or marketing exercise; there are no links in this message. I am trying to avoid spending time open-sourcing the wrong components, or doing it in a way that doesn't match how operators would actually use such a tool. If you have fought with network telemetry in production, I would appreciate hearing "this would be useful if X/Y/Z" or "we wouldn't bother, because...". I am happy to answer technical questions and take blunt feedback, on- or off-list, if this is of interest.

Best regards,
Fanch

Fanch FRANCIS, PhD
CEO
+33 6 14 60 05 47
https://calendly.com/fanch-nanocorp/visio
https://www.nanocorp.ai/