OT: If you thought Y2K was bad, wait until cyber-security hits
(shooting self in foot...)

Just eliminate tech support and proprietary software! "A list of our settings is available at www.domain.com/settings. And don't call us with tech problems. We don't do tech support."

I know of at least one ISP out there already doing this. Not that they're highly successful, but imagine never again having to tell someone, "Yes, your username and password are case sensitive and must be spelled exactly as supplied. And it's .net, not .com." Or, alternately, just require registration through a BBS system as a clue test. :)

(Waiting for visit from the sales/marketing/shareholder folk...)

Best regards,
Alan Rowland

-----Original Message-----
From: Valdis.Kletnieks@vt.edu [mailto:Valdis.Kletnieks@vt.edu]
Sent: Saturday, July 20, 2002 10:03 PM
To: Scott Francis
Cc: nanog@merit.edu
Subject: Re: If you thought Y2K was bad, wait until cyber-security hits

[snip]

I'll personally nominate for sainthood anybody who figures out how to make it work for an ISP's terms of service. ;)

-- Valdis Kletnieks, Computer Systems Senior Engineer, Virginia Tech
On Mon, Jul 22, 2002 at 10:00:44AM -0700, alan_r1@corp.earthlink.net said:
http://www.flex.com/

Unfortunately, it looks like they took down the hate mail page, which was hysterical. *sigh* They target clueful users only, and seem to be getting by just fine. http://www.flex.com/adsl/ has a bit more of the "intelligent users only" pitch.

-- -= Scott Francis || darkuncle (at) darkuncle (dot) net =-
GPG key CB33CCA7 has been revoked; I am now 5537F527
illum oportet crescere me autem minui
On Mon, 22 Jul 2002, Scott Francis wrote:

: Unfortunately, it looks like they took down the hate mail page, which was
: hysterical. *sigh* They target clueful users only, and seem to be getting by
: just fine.

One of Hawaii's fun things... ;-)

http://www.flex.com/net_status/fan_con.html

scott

"sorry sir but i find AOL easy to use, i didnt know that since AOL is a helluva lot easier to use than freakin IE im considered computer illteritate, just quit bashing AOL, not all of us are sado-masochists."

*heh* no need to comment, but it surely is begging for it... :-)
I met del at a mini "Computer Expo" at Wailea, Maui in '96. He was dealing blackjack in his booth for prizes (I won an external 14.4 modem) and giving away "beta test" dialup accounts. I thought that 'shaka.com' was cool, so after 6 months of free beta I signed up and have been with them since.

--Michael

----- Original Message -----
From: "Scott Weeks" <surfer@mauislanwanman.com>
To: "Scott Francis" <darkuncle@darkuncle.net>
Cc: "Rowland, Alan D" <alan_r1@corp.earthlink.net>; <nanog@merit.edu>
Sent: Monday, July 22, 2002 11:04 AM
Subject: Re: OT: If you thought Y2K was bad, wait until cyber-security hits
There was some mail being tossed around earlier about Cogent having latency. I'm actually seeing this on PSINet (now owned by Cogent). Is anyone else still seeing the latency they were experiencing earlier?

Derek
Yes, it's horrid. I've been peering with PSI for going on three years, and it's never been as bad as it is now.

Oddly enough, we see 30+ msec across a DS3 to them, which isn't that loaded (35 to 40 Mb/s). Then, behind whatever we peer with, we see over 400 msec, with 50% loss, during business hours.
-- Alex Rubenstein, AR97, K2AHR, alex@nac.net, latency, Al Reuben -- -- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --
40 Mb/s isn't "loaded" for a DS3?

--Phil

-----Original Message-----
From: owner-nanog@merit.edu [mailto:owner-nanog@merit.edu] On Behalf Of Alex Rubenstein
Sent: Monday, July 22, 2002 8:27 PM
To: Derek Samford
Cc: nanog@merit.edu
Subject: Re: PSINet/Cogent Latency

> oddly enough, we see 30+ msec across a DS3 to them, which isn't that
> loaded (35 to 40 mb/s).
Nah, that's not loaded. It's not loaded until you make it go into alarm by passing traffic :):)

----- Original Message -----
From: "Phil Rosenthal" <pr@isprime.com>
To: "'Alex Rubenstein'" <alex@nac.net>
Cc: <nanog@merit.edu>
Sent: Monday, July 22, 2002 6:05 PM
Subject: RE: PSINet/Cogent Latency

> 40mb/s isn't "loaded" for a DS3?
bwahaha, 2 funnee. I gotta think most people would be thinking of adding another DS3 at that point.

Bri
You certainly would, except for the fact that the provider is in bankruptcy and won't/can't answer the phone. We wanted to do an OC3 or OC12 or gig-e, but that was replied to with, "wha?"

On Mon, 22 Jul 2002, Brian wrote:

> bwahaha, 2 funnee. I gotta think most people would be thinking of adding
> another ds3 at that point.
-- Alex Rubenstein, AR97, K2AHR, alex@nac.net, latency, Al Reuben -- -- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --
I call any upstream link 'over capacity' if either:

1) There is less than 50 Mb/s unused
2) The circuit is more than 50% in use

I guess by my definition a DS3 is always 'over capacity'.

--Phil
On Mon, 22 Jul 2002, Phil Rosenthal wrote:
> I call any upstream link 'over capacity' if either:
> 1) There is less than 50mb/s unused
That must work well for T1's and DS3's.
> 2) The circuit is more than 50% in use
I call it 'over capacity' too, but that doesn't mean all the ducks are in a row to get both sides to realise an upgrade is needed, and even if they do realise it, to actually get it done. I am sure 2238092 people on this list can complain of the same problem.

So, what do you do? You monitor its usage, making adjustments to make sure it doesn't get clobbered. You can easily run DS3s at 35 to 40 Mbit/sec with little to no increase in latency from the norm. Many people do this as well, even up to OC12 or higher levels, all the time.
> I guess by my definition a DS3 is always 'over capacity'
Which must work very well for those DS3s doing 10 to 20 Mb/s. Do you upgrade those to OC3 or beyond?

-- Alex Rubenstein, AR97, K2AHR, alex@nac.net, latency, Al Reuben --
-- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --
Actually, I wouldn't think about getting a T1, DS3, or OC3 in the first place ;) OC12 is the minimum link I would even look at, and my preference is gig-e... even if there is only 90 megs on the interface...

--Phil

-----Original Message-----
From: Alex Rubenstein [mailto:alex@nac.net]
Sent: Monday, July 22, 2002 10:02 PM
To: Phil Rosenthal
Cc: nanog@merit.edu
Subject: RE: PSINet/Cogent Latency

> Which must work very well for those DS3's doing 10 to 20 mb/s. Do you
> upgrade those to OC3 or beyond?
Good for you, Phil. Chime in again when you've got something useful to offer.

In the meantime, you may want to review Economics 101 along with certain queueing schemes, especially RED (no, I'm not endorsing the idea of oversubscribing to the extreme, but then again, neither was Alex).

Also, re-read the previous post. There's a big difference between choice and facility.

Did you grow up spending summers in the Hamptons with no conception of the value of a dollar, or are you simply trolling?

-brian

On Mon, 22 Jul 2002, Phil Rosenthal wrote:

: Actually, I wouldn't think about getting T1, DS3 or OC3 in the first
: place ;) Oc-12 is the minimum link I would even look at -- and my
: preference is gig-e... Even if there is only 90 megs on the interface...
With the price of transit where it is today:

#1 Transit is often cheaper than peering (if you factor in port costs on public exchanges, or link costs for private exchanges)
#2 The difference in price is likely not large enough for me to risk saturation, latency, etc.

My customers pay me to provide them a premium service, and I see value in providing that service. Some people have no problem selling Cogent -- what can I say... you get what you pay for...

And no, I'm not trolling. Is having a different opinion not allowed now? And 40 Mbit over a 45 Mbit circuit, if it is to an uplink/peer -- well, if he has customers who are connected at 100 Mbit switched uncapped (likely), then many customers (possibly even some DSL customers...) can flood off his peer links with only a 5 Mbit stream.

--Phil
On Mon, 22 Jul 2002, Phil Rosenthal wrote:

: And no, I'm not trolling. Is having a different opinion not allowed
: now?

Much better. Your prior posts lacked context and continuity.

I've always advocated overprovisioning myself, vs. creative buffering, queuing, and/or "distracting" the end user. The statement "I wouldn't think of getting T1, DS3 or OC3 in the first place", without context, easily lends itself to misinterpretation.

cheers,
brian
On Mon, Jul 22, 2002 at 10:01:36PM -0400, Alex Rubenstein wrote:
So, what do you do? You monitor it's usage, making adjustments to make sure it doesn't get clobbered. You can easily run DS-3s at 35 to 40 mbit/sec, with little to none increase in latency from the norm. Many people do this as well, even up to OC12 or higher levels all the time.
Just remember that while a 5 minute average may not be at 100%, the microbursts are probably quite a bit over that. For an ISP who actually cares about making money it's not *easy* to say "I'm terminating my peer to PSI because of their degraded performance and unwillingness to upgrade", but a de-localpref'ing is probably a good idea. :) -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
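The de-localpref'ing Richard suggests can be sketched in IOS-style BGP configuration. This is only an illustration, not anyone's actual config: the AS number, neighbor address, and route-map name are all invented, and the local-preference value (below the usual default of 100) is just an example.

```
route-map DEPREF-CONGESTED-PEER permit 10
 set local-preference 80
!
router bgp 64512
 neighbor 192.0.2.1 route-map DEPREF-CONGESTED-PEER in
```

With routes learned from the congested peer carrying a lower local preference, traffic drifts to other paths without tearing the session down.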
My point exactly -- I guess some people disagree... Probably with any sort of queuing there will only be minimal packet loss at 40 Mbit, but at any point one more stream can push it up to 43 Mbit, and then queuing might no longer be enough... (and even if it is, can we say lag?)

--Phil

-----Original Message-----
From: Randy Bush [mailto:randy@psg.com]
Sent: Monday, July 22, 2002 11:31 PM
To: Phil Rosenthal
Cc: nanog@merit.edu
Subject: RE: PSINet/Cogent Latency
40mb/s isn't "loaded" for a DS3?
if you are measuring 40 Mb/s at five-minute intervals, micro peaks are pegged out, causing serious packet loss.

randy
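Randy's point is easy to see with a toy calculation (the per-second numbers here are invented): a DS3 can average under 40 Mb/s over five minutes while spending a full minute pegged at the 45 Mb/s line rate.

```python
# Invented per-second utilization samples (Mb/s) over one 5-minute window.
samples = [38] * 240 + [45] * 60       # 4 minutes at 38 Mb/s, 1 minute saturated

five_min_avg = sum(samples) / len(samples)
peak = max(samples)
pegged_seconds = sum(1 for s in samples if s >= 45)

print(f"5-minute average: {five_min_avg:.1f} Mb/s")                 # 39.4 Mb/s
print(f"peak second: {peak} Mb/s ({pegged_seconds} s at line rate)")
```

The 5-minute counter graph reports 39.4 Mb/s, while for a full minute every second was at line rate and dropping whatever didn't fit in the queue.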
On Mon, Jul 22, 2002 at 11:34:44PM -0400, Phil Rosenthal wrote:
My point exactly -- I guess some people disagree... Probably with any sort of queuing there will only be minimal packet loss at 40mbit, but at any point one more stream can push it up to 43mbit, and then queuing might no longer be enough... (and even if it is, can we say lag?)
Efficient packet loss is still packet loss. Just because you manage to make the link "look good" by slowing down TCP before your queueing latency starts going up doesn't make your network any less ghetto. IMHO the biggest problem in peering is getting the other side to actively upgrade links to prevent congestion. If you're not in a position where you can dictate terms to your peer, move traffic off it and let economics take care of the rest. Leaving a congested peer up for your own benefit at the expense of your customers is one of the surest ways to lose customers to someone who doesn't. I'd rather have a noncongested gige public peer than a ds3 private peer any day. -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
> I'd rather have a noncongested gige public peer than a ds3 private
> peer any day.

Except apparently that's called trolling ;)

--Phil
Is there a patch or special config example available that would allow me to use MRTG (or rather rrdtool) to measure more often, and then graph it in a way that would show the standard 5-min graph but also a separate line showing those micro bursts and actual peak usage?
On Mon, Jul 22, 2002 at 08:38:58PM -0700, william@elan.net wrote:
> Is there patch or special config example available that would allow me to
> use mrtg (or rather rrdtool) to measure more often and then graph it in a
> way that would show standard 5-min graph but also separate line showing
> those micro burst and actual peak usage?
Cricket (cricket.sourceforge.net). -- - mdz
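Aside from tooling, the arithmetic behind what william asks for is simple: poll the octet counters frequently, turn deltas into rates, and keep both the average and the max per 5-minute window. A minimal sketch (the counter readings and 10-second interval are invented; a real poller would read ifInOctets/ifOutOctets over SNMP):

```python
def rates_bps(counters, interval_s):
    """Per-interval bit rates from successive octet-counter readings."""
    return [(b - a) * 8 / interval_s for a, b in zip(counters, counters[1:])]

def per_window(rate_samples, samples_per_window):
    """(average, peak) per window: the usual 5-min line plus the burst line."""
    out = []
    for i in range(0, len(rate_samples), samples_per_window):
        w = rate_samples[i:i + samples_per_window]
        out.append((sum(w) / len(w), max(w)))
    return out

# Invented readings: 10-second polls, 30 samples per 5-minute window.
octets = [i * 50_000_000 for i in range(31)]   # steady 40 Mb/s
octets[16] += 6_250_000                        # one 10-second burst to 45 Mb/s
windows = per_window(rates_bps(octets, 10), 30)
print(windows)   # [(40000000.0, 45000000.0)] -- the average hides the burst
```

The average line stays at 40 Mb/s; only the per-window max reveals that the link touched line rate.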
An effective way would be to graph queue drops:

Serial4/1/1 is up, line protocol is up
  Description: to PSI via 3x-xxx-xxx-xxxx
  Internet address is 154.13.64.22/30
  Last clearing of "show interface" counters 5w4d
  Queueing strategy: fifo
  Output queue 0/40, 2275 drops; input queue 0/75, 0 drops
  30 second input rate 5000 bits/sec, 6 packets/sec
  30 second output rate 39911000 bits/sec, 4697 packets/sec
     144472370 packets input, 2769590243 bytes, 0 no buffer
     Received 0 broadcasts, 0 runts, 1 giants, 0 throttles
     0 parity
     5 input errors, 5 CRC, 0 frame, 0 overrun, 1 ignored, 0 abort
     1969955129 packets output, 430008350 bytes, 0 underruns

FYI, for those of you commenting on my full PSI pipe: with a very small queue depth of only 40 packets, we've seen a 0.00011548% drop rate -- 1 in every 865914 packets sent. Agreed, not 0%, but still, arguably that would never, ever be noticed by anyone.

Once again, I don't condone it; however, 1/10000th of a percent of packet loss is easily worth the decreased cost of traffic sent to this endpoint. Anyone disagree?

(An important, separate note: CAR/CEF drops due to ICMP reaching over 10 Mb/s would trigger the same counter.)
-- Alex Rubenstein, AR97, K2AHR, alex@nac.net, latency, Al Reuben -- -- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --
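Alex's loss figure checks out against the interface counters he posted; as a quick sanity check:

```python
# Reproducing the drop-rate arithmetic from the "show interface" output above.
output_packets = 1_969_955_129   # packets output since counters were cleared
queue_drops = 2_275              # output queue drops over the same period

drop_fraction = queue_drops / output_packets
print(f"loss: {drop_fraction * 100:.8f}%")                   # ~0.00011548%
print(f"i.e. 1 in every {round(1 / drop_fraction)} packets") # 1 in 865914
```

(With the caveat Alex notes: the same counter also ticks for CAR/CEF drops, so it isn't purely congestion loss.)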
As you probably guessed, I do... TCP is designed to not saturate links, so if you take what should be 60 megs of traffic and limit it to 45 (else queue for a while, or drop if the queue is full), the sessions will slow-start back up to a speed slow enough that they won't drop. No (or very little) packet loss, but lower quality of service anyway.

--Phil
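Phil's TCP argument can be caricatured with a toy additive-increase/multiplicative-decrease loop. All numbers are invented and real TCP dynamics are far messier, but it shows the shape: flows that collectively want 60 Mb/s sawtooth below a 45 Mb/s cap instead of delivering their full demand.

```python
# Rates are kept in tenths of a Mb/s so the arithmetic stays exact integers.
CAP = 450                      # 45 Mb/s of link capacity
flows = [60] * 10              # ten flows that would each offer 6 Mb/s uncapped

history = []
for _ in range(200):
    total = sum(flows)
    history.append(total)
    if total > CAP:            # queue overflows: every flow sees a drop and halves
        flows = [f // 2 for f in flows]
    else:                      # additive increase of 0.3 Mb/s, up to each flow's demand
        flows = [min(f + 3, 60) for f in flows]

# Offered load sawtooths below the 60 Mb/s aggregate demand instead of sitting on it.
print(f"oscillates between {min(history[8:]) / 10:.0f}"
      f" and {max(history[8:]) / 10:.0f} Mb/s")   # between 24 and 48 Mb/s
```

No sustained packet loss, exactly as Phil says, but every user is getting well under the throughput they asked for.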
On Tue, Jul 23, 2002 at 12:04:34AM -0400, Alex Rubenstein wrote:
An effective way would be to graph queue drops:
Serial4/1/1 is up, line protocol is up
ifInDiscards = 1.3.6.1.2.1.2.2.1.13 ifOutDiscards = 1.3.6.1.2.1.2.2.1.19 A far more interesting thing to graph than temperature IMHO. :) -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
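Polling those two discard columns for a given ifIndex is easy to sketch. This assumes net-snmp's snmpget binary is installed; the hostname and community string are placeholders:

```python
import subprocess

IF_IN_DISCARDS = "1.3.6.1.2.1.2.2.1.13"    # ifInDiscards column
IF_OUT_DISCARDS = "1.3.6.1.2.1.2.2.1.19"   # ifOutDiscards column

def discard_oids(if_index):
    """Instance OIDs for one interface's discard counters."""
    return (f"{IF_IN_DISCARDS}.{if_index}", f"{IF_OUT_DISCARDS}.{if_index}")

def poll_discards(host, community, if_index):
    """Shell out to net-snmp's snmpget; -Ovq prints bare values only."""
    in_oid, out_oid = discard_oids(if_index)
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Ovq", host, in_oid, out_oid],
        capture_output=True, text=True, check=True)
    in_disc, out_disc = out.stdout.split()
    return int(in_disc), int(out_disc)

# e.g. poll_discards("router.example.net", "public", 4)
```

Feed the two counters into whatever graphs your byte counters and you get the drop graph Richard is suggesting.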
On Mon, Jul 22, 2002 at 08:38:58PM -0700, william@elan.net wrote:
Is there a patch or special config example available that would allow me to use mrtg (or rather rrdtool) to measure more often, and then graph it in a way that shows the standard 5-min graph but also a separate line showing those microbursts and the actual peak usage?
It's usually not practical to sample data that often, at least over SNMP. 30 seconds is reasonable if your poller doesn't suck (aka not mrtg), but that's still a fair amount of averaging. As an example, looking at an interface doing 135Mbps average on a pretty steady curve through Juniper's "monitor interface", which gives 2 second samples, I see fluctuations between 120Mbps and 150Mbps almost constantly. Personally I would like to see the data collection done on the router itself, where it is simple to collect data very frequently, then pushed out. This is particularly important when you are doing things like billing 95th percentile, where a loss of connectivity between the polling machine and the device is a loss of billing information. Why Juniper won't spend 5 minutes to make a simple lib so a program could sample interface counters, so someone could write this kind of system to run on the RE, is beyond me. I blame generations of dumbed-down network engineers wielding perl as their only tool. :) -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
----- Original Message ----- From: "Richard A Steenbergen" <ras@e-gerbil.net> Subject: Re: PSINet/Cogent Latency
Personally I would like to see the data collection done on the router itself where it is simple to collect data very frequently, then pushed out. This is particularly important when you are doing things like billing 95th percentile, where a loss of connectivity between the polling machine and the device is a loss of billing information.
Redbacks can actually do this with what they call Bulkstats. It collects data on specified interfaces and ftp-uploads the data file at whatever interval you specify. Pretty slick. 'Course, this isn't very helpful with Redback's extensive core router lineup, but still. --Doug
Call me crazy -- but what's wrong with setting up RRDtool with a heartbeat time of 30 seconds, and putting in cron: * * * * * rrdscript.sh ; sleep 30s ; rrdscript.sh Wouldn't that work just as well? I haven't tried it -- so perhaps this is too taxing (probably you would only run this on a few interfaces anyway)... The last time I tested such a thing was on an uplink doing ~200 megs, and the deviation was about +/- 5 Mb/s from second to second. --Phil -----Original Message----- From: owner-nanog@merit.edu [mailto:owner-nanog@merit.edu] On Behalf Of Doug Clements Sent: Tuesday, July 23, 2002 12:59 AM To: Richard A Steenbergen Cc: nanog@merit.edu Subject: Re: PSINet/Cogent Latency ----- Original Message ----- From: "Richard A Steenbergen" <ras@e-gerbil.net> Subject: Re: PSINet/Cogent Latency
Personally I would like to see the data collection done on the router itself where it is simple to collect data very frequently, then pushed
out. This is particularly important when you are doing things like billing 95th percentile, where a loss of connectivity between the polling machine and the device is a loss of billing information.
Redbacks can actually do this with what they call Bulkstats. It collects data on specified interfaces and ftp-uploads the data file at whatever interval you specify. Pretty slick. 'Course, this isn't very helpful with Redback's extensive core router lineup, but still. --Doug
----- Original Message ----- From: "Phil Rosenthal" <pr@isprime.com> Subject: RE: PSINet/Cogent Latency
Call me crazy -- but what's wrong with setting up RRDtool with a heartbeat time of 30 seconds, and putting in cron: * * * * * rrdscript.sh ; sleep 30s ; rrdscript.sh
Wouldn't work just as well?
I haven't tried it -- so perhaps this is too taxing (probably you would only run this on a few interfaces anyway)...
Redback's implementation overcame the limitation of monitoring, say, 20,000 user circuits. You don't want to poll 20,000 interfaces for maybe 4 counters each, every 5 minutes. I think the problem with using rrdtool for billing purposes as described is that data can (and does) get lost. If your poller is a few cycles late, the burstable bandwidth measured goes up when the poller catches up to the interface counters. More bursting is bad for %ile (or good if you're selling it), and the customer won't like the fact that they're getting charged for artificially high measurements. Bulkstats lets the measurement happen independent of the reporting. --Doug
I don't think RRD is that bad if you are gonna check only every 5 minutes... Again, perhaps I'm just missing something, but let's say you measure 30 seconds late, and it thinks it's on time -- so that one sample will be higher; then the next one will be on time, so 30 seconds early relative to that sample -- it will be lower. On the whole -- it will be accurate enough -- no? Besides, I think RRD has a bunch of things built in to deal with precisely this problem. I'm not saying a hardware solution can't be better -- but it is likely overkill compared to a few cheap Intels running RRD -- assuming your snmpd can deal with the load... --Phil -----Original Message----- From: Doug Clements [mailto:dsclements@linkline.com] Sent: Tuesday, July 23, 2002 1:50 AM To: pr@isprime.com Cc: nanog@merit.edu Subject: Re: PSINet/Cogent Latency ----- Original Message ----- From: "Phil Rosenthal" <pr@isprime.com> Subject: RE: PSINet/Cogent Latency
Call me crazy -- but what's wrong with setting up RRDtool with a heartbeat time of 30 seconds, and putting in cron: * * * * * rrdscript.sh ; sleep 30s ; rrdscript.sh
Wouldn't work just as well?
I haven't tried it -- so perhaps this is too taxing (probably you would only run this on a few interfaces anyway)...
Redback's implementation overcame the limitation of monitoring, say, 20,000 user circuits. You don't want to poll 20,000 interfaces for maybe 4 counters each, every 5 minutes. I think the problem with using rrdtool for billing purposes as described is that data can (and does) get lost. If your poller is a few cycles late, the burstable bandwidth measured goes up when the poller catches up to the interface counters. More bursting is bad for %ile (or good if you're selling it), and the customer won't like the fact that they're getting charged for artificially high measurements. Bulkstats lets the measurement happen independent of the reporting. --Doug
On Tue, Jul 23, 2002 at 01:56:45AM -0400, Phil Rosenthal wrote:
I don't think RRD is that bad if you are gonna check only every 5 minutes...
RRD doesn't measure anything; it stores and graphs data. The perl pollers everyone is using can barely keep up with 5 minute samples on a couple dozen routers and a few hundred interfaces, requiring "poller farms" to be distributed across a network, lest a box or part of the network break and you lose data.
Again, perhaps I'm just missing something, but let's say you measure 30 seconds late, and it thinks it's on time -- so that one sample will be higher; then the next one will be on time, so 30 seconds early relative to that sample -- it will be lower. On the whole -- it will be accurate enough -- no?
"enough" is a relative term, but sure. :)
I'm not saying a hardware solution can't be better -- but it is likely overkill compared to a few cheap intels running RRD -- assuming your snmpd can deal with the load...
What hardware... storing a few byte counters is trivial, but polling them through SNMP is what is hard (never trust a protocol named "simple" or "trivial"). Creating a buffer of samples which can be periodically fetched should be easy and painless. I don't know if I'd call periodic ftp "painless", but it's certainly a start. -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
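The "buffer of samples" idea sketches easily; `read_counter` below is a stand-in for however the box would read its own interface counter, and a timer on the router would drive `take_sample()` every couple of seconds:

```python
import collections
import time

class CounterSampler:
    """On-box sampler sketch: each take_sample() records a
    (timestamp, counter) pair into a bounded ring buffer, so a
    collector can fetch a batch later instead of polling in real
    time. drain() is what a periodic export (ftp or otherwise)
    would ship out."""

    def __init__(self, read_counter, depth=300):
        self.read_counter = read_counter   # stand-in for the real counter read
        self.samples = collections.deque(maxlen=depth)  # oldest samples fall off

    def take_sample(self):
        self.samples.append((time.time(), self.read_counter()))

    def drain(self):
        """Hand everything buffered so far to the exporter."""
        batch = list(self.samples)
        self.samples.clear()
        return batch
```

The bounded deque is the point: if the collector goes away for a while, the box keeps sampling and only the oldest data is lost.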
I have a small RRD project box that polls 200 interfaces; it takes 1 minute, 5 seconds to run with 60% CPU usage (so obviously it can be streamlined if I wanted to work on it). I guess the limit in this implementation is 1000 interfaces per box -- but I see most of the CPU usage is in the forking of snmpget over and over. I'm sure I could write a small program in C that could do this at least 10X more efficiently. That's 10,000 interfaces with RRD on one Intel -- if you are determined to do it. I think if you are billing 10k interfaces, you can afford a 2nd Intel box to check the 2nd 10,000, no? My point is that if you have sufficient clue, time, and motivation -- today's generic PCs are capable of many "large" tasks... --Phil -----Original Message----- From: Richard A Steenbergen [mailto:ras@e-gerbil.net] Sent: Tuesday, July 23, 2002 2:10 AM To: Phil Rosenthal Cc: 'Doug Clements'; nanog@merit.edu Subject: Re: PSINet/Cogent Latency On Tue, Jul 23, 2002 at 01:56:45AM -0400, Phil Rosenthal wrote:
I don't think RRD is that bad if you are gonna check only every 5 minutes...
RRD doesn't measure anything; it stores and graphs data. The perl pollers everyone is using can barely keep up with 5 minute samples on a couple dozen routers and a few hundred interfaces, requiring "poller farms" to be distributed across a network, lest a box or part of the network break and you lose data.
Again, perhaps I'm just missing something, but let's say you measure 30 seconds late, and it thinks it's on time -- so that one sample will be higher; then the next one will be on time, so 30 seconds early relative to that sample -- it will be lower. On the whole -- it will be accurate enough -- no?
"enough" is a relative term, but sure. :)
I'm not saying a hardware solution can't be better -- but it is likely
overkill compared to a few cheap intels running RRD -- assuming your snmpd can deal with the load...
What hardware... storing a few byte counters is trivial, but polling them through snmp is what is hard (never trust a protocol named "simple" or "trivial"). Creating a buffer of samples which can be periodically sampled should be easy and painless. I don't know if I call periodic ftp "painless" but its certainly a start. -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
On Tue, 23 July 2002 02:25:36 -0400, Phil Rosenthal wrote:
I have a small RRD project box that polls 200 interfaces; it takes 1 minute, 5 seconds to run with 60% CPU usage (so obviously it can be streamlined if I wanted to work on it). I guess the limit in this implementation is 1000 interfaces per box -- but I see most of the CPU usage is in the forking of snmpget over and over. I'm sure I could write a small program in C that could do this at least 10X more efficiently. That's 10,000 interfaces with RRD on one Intel -- if you are determined to do it.
I think if you are billing 10k interfaces, you can afford a 2nd intel box to check the 2nd 10,000, no?
Phil, imagine some four routers dying or not answering queries: you will see the poll script give you timeout after timeout after timeout, and with some 50 to 100 routers and the respective interfaces you will see mrtg choke badly, losing data. You see, the poll script mostly polls one target after the other, so you wait too long and then the next run starts, and so on. mrtg/rrd is not the tool of choice for accounting / billing, but it is nice enough for showing 'backup' graphs to visitors. Alexander
Yo Alexander! On Tue, 23 Jul 2002, Alexander Koch wrote:
imagine some four routers dying or not answering queries, you will see the poll script give you timeout after timeout after timeout and with some 50 to 100 routers and the respective interfaces you see mrtg choke badly, losing data.
Yep. Anything gets behind and it all gets behind. That is why we run multiple copies of MRTG. That way polling for one set of hosts does not have to wait for another set. If one set is timing out the other just keeps on as usual. RGDS GARY --------------------------------------------------------------------------- Gary E. Miller Rellim 20340 Empire Blvd, Suite E-3, Bend, OR 97701 gem@rellim.com Tel:+1(541)382-8588 Fax: +1(541)382-8676
On Mon, Jul 22, 2002 at 11:42:57PM -0700, Gary E. Miller wrote:
Yo Alexander!
On Tue, 23 Jul 2002, Alexander Koch wrote:
imagine some four routers dying or not answering queries, you will see the poll script give you timeout after timeout after timeout and with some 50 to 100 routers and the respective interfaces you see mrtg choke badly, losing data.
Yep. Anything gets behind and it all gets behind.
That is why we run multiple copies of MRTG. That way polling for one set of hosts does not have to wait for another set. If one set is timing out the other just keeps on as usual.
Parallelism is polling science 101. If your poller can't do this, it will never scale, just give up and go home. And I mean controlled parallelism, not forking out all your queries at once and letting the system sort it out (as I've seen done by people waving their redhat cds and perl tshirts). -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
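"Controlled parallelism" amounts to a bounded worker pool; here is a sketch, where `poll_device` stands in for whatever per-box SNMP query you run:

```python
from concurrent.futures import ThreadPoolExecutor

def poll_all(devices, poll_device, max_workers=8, timeout=30):
    """Bounded-parallel polling: enough concurrency that one dead or
    slow router doesn't stall the whole run, without firing every
    query at once and letting the system sort it out."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(poll_device, dev): dev for dev in devices}
        for fut, dev in futures.items():
            try:
                results[dev] = fut.result(timeout=timeout)
            except Exception:
                results[dev] = None   # unreachable or timed out; keep going
    return results
```

A dead router costs at most one worker slot for one timeout, instead of serializing the whole run the way a one-after-the-other poll script does.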
Some long long long time ago I wrote a small tool called snmpstatd. Back then, Sprint management was gracious enough to allow me to release it as public-domain code. It basically collects usage statistics (in 30-sec "peaks" and 5-min averages) and memory and CPU utilization from routers by performing _asynchronous_ SNMP polling. I believe it can scale to about 5000-10000 routers. It also performs accurate time-base interpolation for 30-sec sampling (i.e. it always requests the router's local time and uses it for computing accurate 30-sec peak usage). The data is stored in text files which are extremely easy to parse. The configuration is text-based; it also includes compact status alarm output (i.e. which routers/links are down), a PostScript chart generator, and a troff/nroff-based text report generator, with summary downtime and usage figures + significant events. The tool was used routinely to produce reporting on ICM-NET performance for NSF. This thing may need some hacking to accommodate latter-day IOS bogosities, though. If anyone wants it, I have it at www.kotovnik.com/~avg/snmpstatd.tar.gz --vadim On Mon, 22 Jul 2002, Gary E. Miller wrote:
Yo Alexander!
On Tue, 23 Jul 2002, Alexander Koch wrote:
imagine some four routers dying or not answering queries, you will see the poll script give you timeout after timeout after timeout and with some 50 to 100 routers and the respective interfaces you see mrtg choke badly, losing data.
Yep. Anything gets behind and it all gets behind.
That is why we run multiple copies of MRTG. That way polling for one set of hosts does not have to wait for another set. If one set is timing out the other just keeps on as usual.
RGDS GARY --------------------------------------------------------------------------- Gary E. Miller Rellim 20340 Empire Blvd, Suite E-3, Bend, OR 97701 gem@rellim.com Tel:+1(541)382-8588 Fax: +1(541)382-8676
On Tue, Jul 23, 2002 at 08:34:40AM +0200, Alexander Koch wrote:
Phil,
imagine some four routers dying or not answering queries, you will see the poll script give you timeout after timeout after timeout and with some 50 to 100 routers and the respective interfaces you see mrtg choke badly, losing data.
You see, the poll script is doing one after the other, mainly, so you wait too long and then the next run starts and then something.
mrtg/rrd is not the tool of choice for accounting / billing but nice enough for showing you 'backup' graphs for visitors probably.
Hi.
From http://people.ee.ethz.ch/~oetiker/webtools/mrtg/reference.html:
Forks (UNIX only) On a system that can fork (UNIX for example), mrtg can fork itself into multiple instances while it is acquiring data via snmp. For situations with high latency or a great number of devices this will speed things up considerably. It will not make things faster, though, if you query a single switch sitting next door. As far as I know, NT can not fork, so this option is not available on NT. Example: Forks: 4 Of course, people would have to read the documentation first.
Alexander
-- Matthew S. Hallacy FUBAR, LART, BOFH Certified http://www.poptix.net GPG public key 0x01938203
On Tue, Jul 23, 2002 at 02:25:36AM -0400, Phil Rosenthal wrote:
I have a small RRD project box that polls 200 interfaces and has it takes 1 minute, 5 seconds to run with 60% cpu usage (so obviously it can be streamlined if I wanted to work on it). I guess the limit in this implementation is 1000 interfaces per box in this setup -- but I see most of the CPU usage is in the forking of snmpget over and over. Im sure I could write a small program in C that could do this at least 10X more efficiently. That's 10,000 interfaces with RRD on one intel -- if you are determined to do it.
10x? Wanna try a higher order of magnitude? While you're at it, eliminate the forking to the rrdtool bin when you're adding data. A little thought and profiling goes a long way; this is simple number crunching we're talking about, not supercomputer work. The problem comes from the perl mentality (why is there no C lib for efficiently adding to an rrd db? because they're expecting everyone to call it from perl :P), "it's good enough for my couple boxes and you can throw more machines at it". But again, I have no doubt that if you designed it properly you could throw lots of snmp queries and scale decently to a nice-sized core network; I've seen it done. The problem is potential communication loss between the poller and the device, and the amount of work that the device (which usually isn't running god's gift to any code, let alone snmp code) has to do for higher sampling rates with many interfaces. -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
On Tue, Jul 23, 2002 at 02:40:10AM -0400, Richard A Steenbergen wrote:
While you're at it, eliminate the forking to the rrdtool bin when you're adding data. A little thought and profiling goes a long way, this is simple number crunching we're talking about, not supercomputer work. The problem comes from the perl mentality (why is there no C lib for efficiently adding to an rrd db? because they're expecting everyone to call it from perl :P), "it's good enough for my couple boxes and you can throw more machines at it".
There is a C library, librrd. That is how the other language APIs are built. As to efficiency, there is a lot of stringification, which is inconvenient and unnatural in C, but this should not be the bottleneck in the collection operation.
But again, I have no doubt that if you designed it properly you could throw lots of snmp queries and scale decently to a nice sized core network, I've seen it done. The problem is potential communication loss between the poller and the device, and the amount of work that the device (which usually isn't running gods gift to any code let alone snmp code) has to do for higher sampling rates with many interfaces.
That said, bulk statistical exports from the device itself can easily be implemented more efficiently than SNMP. But unless the export process is universally standardized, SNMP (for all its warts, and it has many) will still have an edge in that it works nearly everywhere (for varying values of "works"). -- - mdz
On Tue, Jul 23, 2002 at 09:53:41AM -0400, Matt Zimmerman wrote:
There is a C library, librrd. That is how the other language APIs are built. As to efficiency, there is a lot of stringification, which is inconvenient and unnatural in C, but this should not be the bottleneck in the collection operation.
Yeah, that thing, but I don't consider "rrdtool(argc, argv);" to be an actual lib interface. :) Where the bottleneck happens is dependent on a lot of things. I've seen libsnmp-using code (which is vile at best) put out more than enough queries, so it can be done. After that, your bottleneck probably would be all the string parsing and file opening for every transaction in RRD. For a laugh, consider the fact that mrtg used to PREPEND data to its .log files on every poll. :) -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
On Tue, 23 Jul 2002, Phil Rosenthal wrote:
I have a small RRD project box that polls 200 interfaces and has it takes 1 minute, 5 seconds to run with 60% cpu usage (so obviously it can be streamlined if I wanted to work on it). I guess the limit in this implementation is 1000 interfaces per box in this setup -- but I see most of the CPU usage is in the forking of snmpget over and over. Im sure I could write a small program in C that could do this at least 10X more efficiently. That's 10,000 interfaces with RRD on one intel -- if you are determined to do it.
Interesting. We have a dual p3-700, doing LOTS of other things, which does 1600 interfaces under MRTG using small amounts of CPU. You are using 'Forks', if you're using MRTG, no? This whole process takes less than 2 minutes.
I think if you are billing 10k interfaces, you can afford a 2nd intel box to check the 2nd 10,000, no?
First and foremost, you said RRD, not billing. Who uses RRD for billing purposes?
My point is that if you have sufficient clue, time, and motivation -- Today's generic PCs are capable to do many "large" tasks...
Quite. In regards to billing, we have some home-grown software that (don't laugh too hard) runs as an NT service; it collects 1,700 ports of information every five minutes (Bytes[In|Out], BitsSec[In|Out], AdminStatus, OperStatus, Time) in only 60 seconds. We've found the best way to do this is to blast SNMP requests and wait for the replies, which are event driven; wait 10 seconds, retry all the ones we didn't get replies for, then try again. We've found that this works the best, having tried about 4 different ways of doing it over the last 5 years. It's all then nicely stored in a SQL DB. -- Alex Rubenstein, AR97, K2AHR, alex@nac.net, latency, Al Reuben -- -- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --
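The blast-and-retry scheme described above sketches naturally as async code; `query` below is a stand-in for an asynchronous SNMP get, and the retry counts and wait are the made-up knobs:

```python
import asyncio

async def blast_poll(targets, query, retries=2, retry_wait=10):
    """Fire all queries at once, collect whatever answers arrive,
    then re-ask only the stragglers -- roughly the event-driven
    blast-and-retry scheme described in the message above."""
    results = {}
    pending = list(targets)
    for attempt in range(retries + 1):
        if not pending:
            break
        if attempt:
            await asyncio.sleep(retry_wait)   # give slow agents a moment
        answers = await asyncio.gather(*(query(t) for t in pending),
                                       return_exceptions=True)
        still_pending = []
        for target, answer in zip(pending, answers):
            if isinstance(answer, Exception):
                still_pending.append(target)  # no reply yet; retry only this one
            else:
                results[target] = answer
        pending = still_pending
    return results
```

Only the targets that failed to answer get re-queried, which is what keeps 1,700 ports inside a 60-second collection window.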
----- Original Message ----- From: "Phil Rosenthal" <pr@isprime.com> Subject: RE: PSINet/Cogent Latency
I don't think RRD is that bad if you are gonna check only every 5 minutes...
Again, perhaps I'm just missing something, but let's say you measure 30 seconds late, and it thinks it's on time -- so that one sample will be higher; then the next one will be on time, so 30 seconds early relative to that sample -- it will be lower. On the whole -- it will be accurate enough -- no?
If you're polling every 5 minutes, with 2 retries per poll, and you miss 2 retries, then your next poll will be 5 minutes late. It's not disastrous, but it's also not perfect. Again, peaks and valleys on your graph cost more than smooth lines, even with the same total bandwidth. Do you want to be the one to tell your customers your billing setup is "accurate enough", and especially that it has a tendency to be "accurate enough" in your favor?
Besides I think RRD has a bunch of things built in to deal with precisely this problem.
Wouldn't that be just spiffy!
I'm not saying a hardware solution can't be better -- but it is likely overkill compared to a few cheap intels running RRD -- assuming your snmpd can deal with the load...
No extra hardware needed. I think the desired solution was integration into the router. The data is already there; you just need software to compile it and ship it out via a reliable reporting mechanism. For something relatively simple, it's a nice feature that could take the "almost" out of an "almost accurate" billing process. --Doug
I see your point, but I still think RRD is "good enough". If cisco/foundry/juniper added this to their respective OS's -- I'd be a happy camper... If they don't -- I won't lose sleep over it. --Phil -----Original Message----- From: Doug Clements [mailto:dsclements@linkline.com] Sent: Tuesday, July 23, 2002 2:12 AM To: pr@isprime.com Cc: nanog@merit.edu Subject: Re: PSINet/Cogent Latency ----- Original Message ----- From: "Phil Rosenthal" <pr@isprime.com> Subject: RE: PSINet/Cogent Latency
I don't think RRD is that bad if you are gonna check only every 5 minutes...
Again, perhaps I'm just missing something, but let's say you measure 30 seconds late, and it thinks it's on time -- so that one sample will be higher; then the next one will be on time, so 30 seconds early relative to that sample -- it will be lower. On the whole -- it will be accurate enough -- no?
If you're polling every 5 minutes, with 2 retries per poll, and you miss 2 retries, then your next poll will be 5 minutes late. It's not disastrous, but it's also not perfect. Again, peaks and valleys on your graph cost more than smooth lines, even with the same total bandwidth. Do you want to be the one to tell your customers your billing setup is "accurate enough", and especially that it has a tendency to be "accurate enough" in your favor?
Besides I think RRD has a bunch of things built in to deal with precisely this problem.
Wouldn't that be just spiffy!
I'm not saying a hardware solution can't be better -- but it is likely
overkill compared to a few cheap intels running RRD -- assuming your snmpd can deal with the load...
No extra hardware needed. I think the desired solution was integration into the router. The data is already there; you just need software to compile it and ship it out via a reliable reporting mechanism. For something relatively simple, it's a nice feature that could take the "almost" out of an "almost accurate" billing process. --Doug
On Mon, Jul 22, 2002 at 10:50:03PM -0700, Doug Clements wrote:
I think the problem with using rrdtool for billing purposes as described is that data can (and does) get lost. If your poller is a few cycles late, the burstable bandwidth measured goes up when the poller catches up to the interface counters. More bursting is bad for %ile (or good if you're selling it), and the customer won't like the fact that they're getting charged for artificially high measurements.
RRDtool takes into account the time at which the sample was collected, and if it does not exactly match the expected sampling period, it is resampled on the fly. See: http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/tutorial/rrdtutorial.html under "Data Resampling" for more information. RRDtool has some quirks when used for billing purposes, but it is not guilty of the error that you describe. -- - mdz
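The resampling point follows from how a COUNTER data source computes rates: the counter delta is divided by the actual elapsed time between samples, not the nominal step. A toy version (not RRDtool's actual code) shows why a late sample doesn't inflate the measured rate:

```python
def counter_rate(t0, c0, t1, c1, wrap=2**32):
    """Average rate between two timestamped counter samples, the way
    an RRD COUNTER source does it: delta over *actual* elapsed
    seconds, with handling for a single 32-bit counter wrap."""
    delta = (c1 - c0) % wrap   # modulo absorbs one wrap of the counter
    return delta / (t1 - t0)

# steady 10 kB/s: an on-time 300 s sample and one arriving 30 s late
# yield the same rate, because the extra counted traffic and the
# extra elapsed time cancel out.
on_time = counter_rate(0, 0, 300, 3_000_000)
late = counter_rate(0, 0, 330, 3_300_000)
```

The late sample carries 10% more counter delta but also 10% more elapsed time, so the rate comes out identical; the error Doug describes would only appear if the poller ignored the real timestamps.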
Packet loss is not guaranteed, especially considering the queuing mechanism used is not disclosed. I.e., a simple hold queue north of 2048 will cause no loss, but the occasional jitter/latency, most likely not even measurable by common endpoints on the net. I'm not endorsing, just correcting. On Mon, 22 Jul 2002, Randy Bush wrote:
40mb/s isn't "loaded" for a DS3?
if you are measuring 40mb at five min intervals, micro peaks are pegged out causing serious packet loss.
randy
-- Alex Rubenstein, AR97, K2AHR, alex@nac.net, latency, Al Reuben -- -- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --
On Mon, 22 Jul 2002, Alex Rubenstein wrote:
Yes, it's horrid. I've been peering with PSI for going on three years, and it's never been as bad as it is now.
I took advantage of their "free peering" offer back in the day, and ended up peering with them for about 18 months (06/1999 - 01/2001). It took about 9 months for them to get the circuit installed. For the first few months, everything was great, but then we started getting massive spikes in latency (300-700ms) just getting across the pipe between my router and PSI's router. I liken it to owning an old Audi - they were great when they ran, but spent more time in the shop than on the road. The process of opening tickets and getting clued people in their NOC to talk to me was an adventure. PSI, much like some other providers, went to great pains to try keeping $CUSTOMER from having a direct path to $CLUEDPEOPLE. They could never adequately explain the latency, other than it would mysteriously go away and re-appear, more or less independent of the amount of traffic on the circuit. Eventually an upper-level engineer told me that the saturation was due to congestion on their end of the pipe, and getting some fatter pipe in there would take 60 days. Fine. 90 days later, the bigger pipe is installed on their end and the latency goes away for a few weeks, then comes back. Wash. Rinse. Repeat. A few more months of that, and I cancelled the peering.
oddly enough, we see 30+ msec across a DS3 to them, which isn't that loaded (35 to 40 mb/s).
Then, behind whatever we peer with, we see over 400 msec, with 50% loss, during business hours.
It has a lot of similarities to old Audis. Remember, they used to work fine and then for no reason would fall into drive, rev high, and run over Grandma and the kids! Sounds a bit like their peering. :) On Tue, 23 Jul 2002, Streiner, Justin wrote:
On Mon, 22 Jul 2002, Alex Rubenstein wrote:
Yes, it's horrid. I've been peering with PSI for going on three years, and it's never been as bad as it is now.
I took advantage of their "free peering" offer back in the day, and ended up peering with them for about 18 months (06/1999 - 01/2001). It took about 9 months for them to get the circuit installed.
For the first few months, everything was great, but then we started getting massive spikes in latency (300-700ms) just getting across the pipe between my router and PSI's router. I liken it to owning an old Audi - they were great when they ran, but spent more time in the shop than on the road.
The process of opening tickets and getting clued people in their NOC to talk to me was an adventure. PSI, much like some other providers, went to great pains to try keeping $CUSTOMER from having a direct path to $CLUEDPEOPLE.
They could never adequately explain the latency, other than it would mysteriously go away and re-appear, more or less independent of the amount of traffic on the circuit. Eventually an upper-level engineer told me that the saturation was due to congestion on their end of the pipe, and getting some fatter pipe in there would take 60 days.
Fine.
90 days later, the bigger pipe is installed on their end and the latency goes away for a few weeks, then comes back.
Wash. Rinse. Repeat.
A few more months of that, and I cancelled the peering.
oddly enough, we see 30+ msec across a DS3 to them, which isn't that loaded (35 to 40 mb/s).
Then, behind whatever we peer with, we see over 400 msec, with 50% loss, during business hours.
participants (21)
- Alex Rubenstein
- Alexander Koch
- Brian
- Brian Wallingford
- Derek Samford
- Doug Clements
- G. Scott Granados
- Gary E. Miller
- Matt Zimmerman
- Matthew S. Hallacy
- Michael Painter
- Phil Rosenthal
- Randy Bush
- Richard A Steenbergen
- Rowland, Alan D
- Scott Francis
- Scott Granados
- Scott Weeks
- Streiner, Justin
- Vadim Antonov
- william@elan.net