On Wed, Jan 24, 2007 at 08:34:19AM -0500, Jason LeBlanc wrote:
I would say somewhere around 4000 network interfaces (6-8 stats per int) and around 1000 servers (8-10 stats per server) we started seeing problems, both with navigation in the UI and with stats not reliably updating. I did not try that poller; perhaps it's worth trying again with it. I will also say this was about 2 years ago. I think the box it was running on was a dual P3-1000 with a raid 10 using 6 drives (10k rpm I think).
After looking for 'the ideal' tool for many years, it still amazes me that no one has built it. Bulk gets, scalable schema and good portal/UI. RTG is better than MRTG, but the config/db/portal are still lacking.
So, i've been the caretaker of a few different snmp pollers over a few years, as well as done some database foo (250m+ rows/day of data) and these things interrelate in a number of ways.

First, start with the polling: you need to do bulkget/bulkwalk of the various mibs to collect the data in a reasonable way, timestamp it all (internally, before you "cook" the data), poll frequently enough to detect spikes (including inaccurate spikes and backwards/missing counter bugs), etc.

Take a simple set of data you might want to collect:

    router interfaces (mib)
        up/down
        in/out octets, in/out packets, in errors/out drops
        speed (ifMIB too?)
    ifMIB (64-bit counters, but only sometimes)
        description
        speed (interface mib too?)
    mpls? ldp? te? paths?
    mac accounting?

then you get into: do you store the raw data you collect with markers for snmp timeouts, or just a 5 min calculation/sample? (this relates to the above 250m rows/day) how do you define your schema? how long does it take to insert/index/whatnot the data? how to handle ifindex moves (not just one vendor too, don't forget that)? how do you match that link to a customer for billing? who gets what reports? engineering reports too? provisioning link-in? tie to ip address db (interface ip<->customer mapping)?

the list goes on and on, this is just part of it, let alone any possible tracking of assets/hardware, let alone proactive network monitoring (tie those traps/walks) to the internal ping(er) to passive network monitoring, etc.

this is a huge burden to figure it all out, implement and then monitor/operate 24x7. miss enough samples or data and you end up billing too little. this is why most folks have either cooked their own, or use some expensive suite of tools, leaving just a little bit of other stuff out there.

in a lot of ways, just buying a ge/10ge and paying some alternate price for it may be cheaper than a burstable rate as it could reduce a lot of this extra cost.
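
to make the counter handling above a bit more concrete (snmp timeout markers, wrapped counters, backwards/bogus counters), here is a rough sketch of turning two raw, timestamped octet samples into a rate. it is not taken from any particular poller: the names Sample and octet_rate, the single-wrap assumption, and the max_bps sanity check are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    """One raw poll result (hypothetical structure, for illustration only)."""
    ts: float              # poll timestamp, seconds
    value: Optional[int]   # octet counter; None marks an snmp timeout


def octet_rate(prev: Sample, curr: Sample, bits: int = 32,
               max_bps: Optional[float] = None) -> Optional[float]:
    """Return bits/sec between two samples, or None if no usable rate exists."""
    # a timeout on either side means there is nothing to compute from
    if prev.value is None or curr.value is None:
        return None
    dt = curr.ts - prev.ts
    if dt <= 0:
        return None
    delta = curr.value - prev.value
    if delta < 0:
        # counter went backwards: assume exactly one wrap of a 'bits'-wide counter
        delta += 1 << bits
    bps = delta * 8 / dt
    if max_bps is not None and bps > max_bps:
        # faster than the link can go: more likely a reset or counter bug
        # than real traffic, so report it as missing instead of a spike
        return None
    return bps
```

for example, octet_rate(Sample(0, 2**32 - 100), Sample(300, 200)) treats the drop as a single 32-bit wrap and yields 8.0 bits/sec. returning None rather than 0 keeps "no data" distinguishable from "no traffic", which matters if the samples feed billing.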
i remember hearing that it cost telcos more to count/track the calls to give you a detailed bill than for the call itself. this is why flat-rate is nearly king these days (in the us at least).

- jared

-- 
Jared Mauch  | pgp key available via finger from jared@puck.nether.net
clue++;      | http://puck.nether.net/~jared/  My statements are only mine.