On Wed, Jan 24, 2007 at 08:05:24PM +0000, Paul Vixie wrote:
> glibly said, sir. but i disastrously underestimated the amount of time and money it would take to build BIND9.
I can't question your credentials at building serious network infrastructure, but I do question the comparison between BIND9 and the network monitoring framework that _I_ envision. I can think of a couple of requirements that handicapped BIND9 and that a new tool wouldn't face. Specifically, you had to ensure compatibility with the RFCs, which locks you into a fairly complicated parser for the least-writable data format (the zone file) that I have ever had the displeasure of editing. While it gets easier over time, it seems remarkably difficult to get right the first time; mostly, people forget to update the serial number, but other problems are common too (a minimal example appears below). I imagine you also wanted to maintain the overall structure of the config file, but I don't see this as particularly problematic; it seems straightforward enough to me.

Furthermore, there is the monolithic design; while I find it very convenient to have two name servers instead of four on my home network, it seems that BIND9 is serving too many masters (pun not intended). If recursive queries and queries to authoritative name servers used different ports, there would be little reason to have both in the same package. I can get the same separation right now with IP aliases, which I consider a kluge (see the config sketch below), but the package I would use for it doesn't support some things that would be nice, like dynamic updates. I suppose those, too, could be split off fairly easily.

Everybody I know who would have a use for a scalable monitoring system is capable of scripting, and most are capable of programming well enough to extend a framework. I suspect an attempt to anticipate every possible need and solve them all at once with one tool would grow to unmanageable complexity far too quickly.

A framework is the easy part. At the URL in my signature you can find the dynamic firewall daemon, a framework for dynamically adjusting firewall rules. It has an async I/O core: one thread, one program, one firewall, many clients. There is a Python version for netfilter/Linux (which is very alpha and needs a new maintainer) and one for BSD (pf, of course). It supports fixed-size rule queues; rules that time out at a particular moment (absolute, or relative to the present); rule sets that can be enabled or disabled by command; variable substitution (where "variable" means modifiable by external programs); and so on, without requiring chains, tables, or lists in the firewall syntax. A skeleton of the event loop appears below.

Although I spent a lot of time on the design in my head, writing the code was the easy part; it came to about a thousand lines. I could probably have done it in under 40 hours, though not all at once. The real problem is turning a thing over and over and letting your subconscious work on it until you're fairly sure the answer you converged on consciously is the right one.

The hard part, I have found, is getting people to contribute to it (or generating awareness, which may be a precondition). I'm thinking about writing it up for USENIX ;login: magazine; keep an eye open for it if you are interested. If you are interested in Python and netfilter/iptables and have some free time, definitely send me an email; and if you know anyone who would like to be an author of a cutting-edge network security system, let them know about it.
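To make the zone-file complaint concrete, here is a minimal SOA record of the sort I mean. The names, addresses, and timers are placeholders, not anyone's real zone; the serial is the field people forget to bump after an edit:

    $TTL 86400
    example.com.  IN SOA ns1.example.com. hostmaster.example.com. (
                      2007012401 ; serial -- must increase on every edit
                      3600       ; refresh
                      900        ; retry
                      604800     ; expire
                      86400 )    ; negative-caching TTL
                  IN NS  ns1.example.com.
    ns1           IN A   192.0.2.1

Forget to increase the serial and your slaves silently keep serving the old data, which is exactly the kind of failure that is hard to spot the first time.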
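And here is roughly what I mean by the IP-alias kluge: run two named instances, one authoritative-only and one recursive-only, each bound to its own alias address. The addresses are placeholders, and each options block would live in its own named.conf:

    // Instance 1: authoritative only, on the first alias address
    options {
        listen-on { 192.0.2.1; };
        recursion no;
    };

    // Instance 2: recursive resolver only, on the second
    options {
        listen-on { 192.0.2.2; };
        recursion yes;
        allow-recursion { 192.0.2.0/24; };
    };

It works, but needing two addresses on one box to separate two roles of one protocol is a workaround, not a design.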
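For the curious, here is a skeleton of the single-threaded model I described: one select() loop, many clients, and rules that expire on a deadline. This is an illustration written for this message, not dfd's actual code; the one-line "add <seconds> <rule>" protocol and the print statements standing in for iptables/pfctl calls are invented for the example:

    #!/usr/bin/env python
    # One thread, one select() loop, many clients, rules with deadlines.
    import heapq
    import select
    import socket
    import time

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 2600))
    listener.listen(5)

    clients = []      # connected client sockets
    expiry_heap = []  # (deadline, rule text), soonest first

    def apply_rule(rule):
        # A real daemon would invoke iptables or pfctl here.
        print("apply:", rule)

    def remove_rule(rule):
        print("expire:", rule)

    while True:
        # Sleep until a socket is ready or the next rule expires.
        timeout = None
        if expiry_heap:
            timeout = max(0, expiry_heap[0][0] - time.time())
        readable, _, _ = select.select([listener] + clients, [], [], timeout)

        for sock in readable:
            if sock is listener:
                conn, _ = listener.accept()
                clients.append(conn)
                continue
            line = sock.recv(1024)
            if not line:
                clients.remove(sock)
                sock.close()
                continue
            # Invented protocol, no error handling -- it is a sketch:
            # "add <seconds> <rule text>"
            parts = line.decode().strip().split(None, 2)
            if len(parts) == 3 and parts[0] == "add":
                apply_rule(parts[2])
                heapq.heappush(expiry_heap,
                               (time.time() + float(parts[1]), parts[2]))

        while expiry_heap and expiry_heap[0][0] <= time.time():
            _, rule = heapq.heappop(expiry_heap)
            remove_rule(rule)

The point of the single-threaded design is that rule state never needs locking: every client command and every expiry is serialized through the one loop.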
> and talk to devices that will never go to an snmp connectathon,
Here is a scoping problem. If I started with that goal, I'd be stuck in analysis paralysis forever. I'd rather start with SNMP and get a usable product that could then be extended (see the postscript for how small a start I mean). The complexity of a task goes up with the square of the number of things to consider, so I think it's absolutely essential to start with limited objectives and generalize where appropriate in subsequent generations.

It seems to me the scalability problem (most of the data is never read, yet one box has to do everything) is really a problem of clients not being able to contribute resources without also exposing a complicated remote interface. Computers are very fast and only getting faster (although disk I/O bandwidth is not keeping pace with CPU or network bandwidth). I'm not convinced the job would take anything beyond what Python or another very expressive language could provide if properly distributed, and that alone would cut the time spent writing code by a factor of 10-100, excluding the time spent devising a simple, secure way of distributing the load.

I would be most interested in hearing what NANOG people would like to see in a monitoring tool. I think this is an excellent forum for hashing out what it should really do, and how.

-- 
``Unthinking respect for authority is the greatest enemy of truth.'' -- Albert Einstein -><-
<URL:http://www.subspacefield.org/~travis/>
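P.S. By "start with SNMP" I mean something as small as this: a poller that shells out to net-snmp's snmpget(1) and could later grow into the framework. The hostnames and community string are placeholders; the two OIDs are the standard MIB-II sysUpTime and sysDescr:

    #!/usr/bin/env python
    # Minimal SNMP poll via net-snmp's snmpget(1).
    # Hosts and community string are placeholders.
    import subprocess

    HOSTS = ["router1.example.net", "switch1.example.net"]
    OIDS = {
        "sysUpTime": ".1.3.6.1.2.1.1.3.0",
        "sysDescr":  ".1.3.6.1.2.1.1.1.0",
    }

    def poll(host, oid):
        """Return the raw snmpget output, or None if the host didn't answer."""
        try:
            out = subprocess.check_output(
                ["snmpget", "-v2c", "-c", "public", host, oid],
                stderr=subprocess.STDOUT, timeout=5)
            return out.decode().strip()
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
            return None

    for host in HOSTS:
        for name, oid in sorted(OIDS.items()):
            value = poll(host, oid)
            print("%-25s %-10s %s" % (host, name, value or "no response"))

Extending it means adding OIDs, hosts, and storage; the generalizing can come in the second generation, once something this small is actually in use.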