Hi All, I just wanted to know what is Link capacity upgrade threshold in terms of % of link utilization? Just to get an idea... thanks, Devang Patel
I consider a circuit nearing capacity at 80-85%. Depending on the circuit we start the process of increasing capacity around 70%. There are almost always telco issues, in-building issues, not enough physical ports on the provider end, and other such things that slow you down. Justin
On Sat, Aug 29, 2009 at 11:50 PM, devang patel<devangnp@gmail.com> wrote:
I just wanted to know what is Link capacity upgrade threshold in terms of % of link utilization? Just to get an idea...
If your 95th percentile utilization is at 80% capacity, it's time to start planning the upgrade. If your 95th percentile utilization is at 95% it's time to finish the upgrade. If your average or median utilization is at 80% capacity then as often as not it's time for your boss to fire you and replace you with someone who can do the job. Slight variations depending on the resource. Use absolute peak instead of 95th percentile for modem bank utilization -- under normal circumstances a modem bank should never ring busy. And a gig-e can run a little closer to the edge (percentage-wise) before folks notice slowness than a T1 can. Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
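A minimal sketch of that rule of thumb, in Python; the sample values, the 1 Gbps link speed and the percentile_95/upgrade_advice helpers are illustrative only:

```python
# Sketch: apply the 80% / 95% rule of thumb above to a set of 5-minute
# utilization samples.  Thresholds, samples and link speed are illustrative.

def percentile_95(samples):
    """Billing-style 95th percentile: sort and discard the top 5% of samples."""
    ordered = sorted(samples)
    index = max(int(len(ordered) * 0.95) - 1, 0)
    return ordered[index]

def upgrade_advice(samples_bps, link_bps):
    ratio = percentile_95(samples_bps) / link_bps
    if ratio >= 0.95:
        return "finish the upgrade"
    if ratio >= 0.80:
        return "start planning the upgrade"
    return "no action yet"

# Hypothetical 5-minute samples (bits/s) on a 1 Gbps link.
samples = [650e6, 700e6, 820e6, 790e6, 880e6, 910e6, 760e6, 830e6]
print(upgrade_advice(samples, 1e9))   # -> start planning the upgrade
```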
On Sun, 30 Aug 2009, William Herrin wrote:
If your 95th percentile utilization is at 80% capacity, it's time to start planning the upgrade. If your 95th percentile utilization is at 95% it's time to finish the upgrade.
I now see why people at the IETF spoke as if "core network congestion" were something natural. If your MRTG graph is showing 95% load as a 5 minute average, you're most likely congesting/buffering at some point during that 5 minute interval. Whether that is acceptable in your network (it's not in mine) is up to you. Also, a gig link on a Cisco will show approximately 93-94% of a gig with imix traffic in the values presented via SNMP (around 930-940 megabit/s as seen in "show int") before it's full, because of IFG, ethernet header overhead etc. So personally, I consider a gig link "in desperate need of upgrade" when it's showing around 850-880 megs of traffic in mrtg. -- Mikael Abrahamsson email: swmike@swm.pp.se
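Where a figure in the 93-94% range can come from: byte counters generally do not see the preamble and inter-frame gap, so a saturated GigE reports less than 1000 Mbit/s. A hedged back-of-the-envelope, assuming an imix-like 400-byte average frame and counters that do include the L2 header and CRC:

```python
# Rough estimate of what a saturated GigE reports, assuming the byte counters
# include the L2 header and CRC but not the preamble (8 B) or inter-frame gap
# (12 B).  The 400-byte average frame size is an assumed imix-like figure.

LINE_RATE_BPS = 1_000_000_000
UNCOUNTED_PER_FRAME = 8 + 12          # preamble + IFG, bytes per frame
avg_frame_bytes = 400                 # assumption; smaller frames mean more overhead

counted_fraction = avg_frame_bytes / (avg_frame_bytes + UNCOUNTED_PER_FRAME)
print(f"Counters at saturation: ~{counted_fraction * LINE_RATE_BPS / 1e6:.0f} Mbit/s "
      f"({counted_fraction:.1%} of line rate)")
# ~950 Mbit/s at 400 B frames; a smaller average frame size pushes it toward 930.
```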
On Aug 30, 2009, at 1:23 AM, Mikael Abrahamsson wrote:
On Sun, 30 Aug 2009, William Herrin wrote:
If your 95th percentile utilization is at 80% capacity, it's time to start planning the upgrade. If your 95th percentile utilization is at 95% it's time to finish the upgrade.
I now see why people at the IETF spoke in a way that "core network congestion" was something natural.
If your MRTG graph is showing 95% load in 5 minute average, you're most likely congesting/buffering at some time during that 5 minute interval. If this is acceptable or not in your network (it's not in mine) that's up to you.
Also, a gig link on a Cisco will do approx 93-94% of imix of a gig in the values presented via SNMP (around 930-940 megabit/s as seen in "show int") before it's full, because of IFG, ethernet header overhead etc.
I've heard this said many times. I've also seen 'sho int' say 950,000,000 bits/sec and not see packets get dropped. I was under the impression "show int" showed -every- byte leaving the interface. I could make an argument that IFG would not be included, but things like ethernet headers better be. Does this change between IOS revisions, or hardware, or is it old info, or ... what? -- TTFN, patrick P.S. I agree that without perfect conditions (e.g. using an Ixia to test link speeds), you should upgrade WAAAAAY before 90-something percent. microbursts are real, and buffer space is small these days. I'm just asking what the counters -actually- show.
So personally, I consider a gig link "in desperate need of upgrade" when it's showing around 850-880 megs of traffic in mrtg.
-- Mikael Abrahamsson email: swmike@swm.pp.se
On Sun, Aug 30, 2009 at 01:03:35PM -0400, Patrick W. Gilmore wrote:
Also, a gig link on a Cisco will do approx 93-94% of imix of a gig in the values presented via SNMP (around 930-940 megabit/s as seen in "show int") before it's full, because of IFG, ethernet header overhead etc.
I've heard this said many times. I've also seen 'sho int' say 950,000,000 bits/sec and not see packets get dropped. I was under the impression "show int" showed -every- byte leaving the interface. I could make an argument that IFG would not be included, but things like ethernet headers better be.
Does this change between IOS revisions, or hardware, or is it old info, or ... what?
Actually Cisco does count layer 2 header overhead in its snmp and show int results, it is Juniper who does not (for most platforms at any rate) due to their hw architecture. I did some tests regarding this a while back on j-nsp, you'll see different results for different platforms and depending on whether you're looking at the tx or rx. Also you'll see different results for vlan overhead and the like, which can further complicate things. That said, "show int" is an epic disaster for a significantly large percentage of the time. I've seen more bugs and false readings on that thing than I can possibly count, so you really shouldn't rely on it for rate readings. The problem is extra special bad on SVIs, where you might see a reading that is 20% high or low from reality at any given second, even on modern code. I'm not aware of any major issues detecting drops though, so you should at least be able to detect them when they happen (which isn't always at line rate). If you're on a 6500/7600 platform running anything SXF+ try "show platform hardware capacity interface" to look for interfaces with lots of drops globally. -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
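Given how unreliable the "show int" load average can be, one alternative is to compute the rate yourself from successive octet-counter readings (e.g. ifHCOutOctets). A minimal sketch, with invented counter values and poll interval:

```python
# Sketch: derive a rate from two readings of a 64-bit octet counter (e.g.
# ifHCOutOctets) instead of trusting the "show int" load average.  Counter
# values and the 30-second interval below are invented for illustration.

COUNTER_MAX = 2**64        # use 2**32 if only 32-bit counters are available

def rate_bps(octets_then, octets_now, seconds):
    """Bits per second between two polls, tolerating a single counter wrap."""
    delta = (octets_now - octets_then) % COUNTER_MAX
    return delta * 8 / seconds

print(f"{rate_bps(1_234_567_890, 1_609_567_890, 30) / 1e6:.1f} Mbit/s")  # 100.0
```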
On 30/08/2009 13:04, Randy Bush wrote:
the normal snmp and other averaging methods *really* miss the bursts.
Definitely. For fun and giggles, I recently turned on 30 second polling on some kit and it turned up all sorts of interesting peculiarities that were completely blotted out in a 5 minute average. In order to get a really good idea of what's going on at a microburst level, you would need to poll as often as it takes to fill the buffer of the port in question. This is not feasible in the general case, which is why we resort to hacks like QoS to make sure that when there is congestion, it is handled semi-sensibly. There's a lot to the saying that QoS really means "Quantity of Service", because quality of service only ever becomes a problem if there is a shortfall in quantity. Nick
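For a sense of those timescales, a rough calculation of how quickly an oversubscribed port fills its buffer; the buffer size used here is hypothetical, since real sizes vary enormously by platform:

```python
# How long an oversubscribed port takes to fill its buffer, i.e. the timescale
# at which you would have to sample to "see" a microburst.  The 1 MB buffer
# figure is hypothetical; real sizes vary enormously by platform.

def buffer_fill_ms(buffer_bytes, ingress_bps, egress_bps):
    """Milliseconds to fill the buffer when ingress exceeds egress."""
    return buffer_bytes * 8 / (ingress_bps - egress_bps) * 1000

# 1 MB of port buffer, 2 Gbps offered into a 1 Gbps port:
print(f"{buffer_fill_ms(1_000_000, 2e9, 1e9):.1f} ms")   # ~8 ms
```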
Nick Hilliard wrote:
Definitely. For fun and giggles, I recently turned on 30 second polling on some kit and it turned up all sorts of interesting peculiarities that were completely blotted out in a 5 minute average.
Would RMON History and Alarms help? I've always considered rolling them out to some of my kit to catch microbursts. Poggs
What system were you using to monitor link usage? Shane On Aug 30, 2009, at 8:26 AM, Nick Hilliard wrote:
On 30/08/2009 13:04, Randy Bush wrote:
the normal snmp and other averaging methods *really* miss the bursts.
Definitely. For fun and giggles, I recently turned on 30 second polling on some kit and it turned up all sorts of interesting peculiarities that were completely blotted out in a 5 minute average.
In order to get a really good idea of what's going on at a microburst level, you would need to poll as often as it takes to fill the buffer of the port in question. This is not feasible in the general case, which is why we resort to hacks like QoS to make sure that when there is congestion, it is handled semi-sensibly.
There's a lot to the saying that QoS really means "Quantity of Service", because quality of service only ever becomes a problem if there is a shortfall in quantity.
Nick
On Sun, 30 Aug 2009, Nick Hilliard wrote:
In order to get a really good idea of what's going on at a microburst level, you would need to poll as often as it takes to fill the buffer of the port in question. This is not feasible in the general case, which is why we resort to hacks like QoS to make sure that when there is congestion, it is handled semi-sensibly.
Or some enterprising vendor could start recording utilisation stats? regards, -- Paul Jakma paul@jakma.org Key ID: 64A2FF6A Fortune: Try to value useful qualities in one who loves you.
On Tue, Sep 01, 2009 at 11:55:45AM +0100, Paul Jakma wrote:
On Sun, 30 Aug 2009, Nick Hilliard wrote:
In order to get a really good idea of what's going on at a microburst level, you would need to poll as often as it takes to fill the buffer of the port in question. This is not feasible in the general case, which is why we resort to hacks like QoS to make sure that when there is congestion, it is handled semi-sensibly.
Or some enterprising vendor could start recording utilisation stats?
do any router vendors provide something akin to hardware latches to keep track of highest buffer fill levels? poll as frequently/infrequently as you like... -- Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com
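A software sketch of that latch idea, with made-up queue depths: the fast path keeps a high-water mark and a slow poller reads and clears it, so the poll rate no longer has to keep up with the bursts:

```python
# Software sketch of the latch being asked for: the fast path records the peak
# queue depth, and a slow poller reads and clears it, so the poll rate no
# longer has to keep up with the bursts.  Depth values below are made up.

class HighWaterLatch:
    def __init__(self):
        self._peak = 0

    def observe(self, queue_depth):
        """Called per enqueue; in hardware this would be the latch update."""
        if queue_depth > self._peak:
            self._peak = queue_depth

    def read_and_clear(self):
        """Called by the (much slower) poller; returns the peak since last read."""
        peak, self._peak = self._peak, 0
        return peak

latch = HighWaterLatch()
for depth in (10, 250, 40, 900, 30):   # hypothetical per-packet queue depths
    latch.observe(depth)
print(latch.read_and_clear())           # -> 900
```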
Another approach to collecting buffer utilization is to infer such utilization from other variables. Active measurement of round trip times (RTT), packet loss, and jitter on a link-by-link basis is a reliable way of inferring interface queuing which leads to packet loss. A link that runs with good values on all 3 measures (low RTT, little or no packet loss, low jitter with small inter-packet arrival variation) can be deemed not a candidate for bandwidth upgrades. The key to active measurement is random measurement of the links so as to catch the bursts. The BRIX active measurement product (now owned by EXFO) is a good active measurement tool which randomizes probe data so as to, over time, collect a randomized sample of link behavior.
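A minimal sketch of that inference, with made-up probe results: derive loss and jitter (inter-arrival variation) from a series of active RTT probes, the quantities named above:

```python
# Minimal sketch of the inference described above: summarize loss and jitter
# (inter-arrival variation) from a series of active probes.  The probe results
# are made up; a real deployment would randomize probe timing as noted.

from statistics import mean

def summarize(rtts_ms):
    """rtts_ms: per-probe RTTs in milliseconds, None for a lost probe."""
    answered = [r for r in rtts_ms if r is not None]
    loss_pct = 100 * (len(rtts_ms) - len(answered)) / len(rtts_ms)
    deltas = [abs(a - b) for a, b in zip(answered, answered[1:])]
    jitter = mean(deltas) if deltas else 0.0
    return mean(answered), jitter, loss_pct

rtt, jitter, loss = summarize([12.1, 12.3, 40.7, 12.2, None, 12.4])
print(f"avg RTT {rtt:.1f} ms, jitter {jitter:.1f} ms, loss {loss:.1f}%")
# Rising RTT and jitter together with loss is the signature of a filling queue.
```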
do any router vendors provide something akin to hardware latches to keep track of highest buffer fill levels? poll as frequently/infrequently as you like...
Without getting into each permutation of a device's architecture, aren't buffer fills really just buffer drops? There are means to determine this. Lots of vendors have configurable buffer pools for inter-device traffic levels that record high water levels as well. Deepak Jain AiNET
Holmes,David A wrote:
runs with good values on all 3 measures (low RTT, little or no packet loss, low jitter with small inter-packet arrival variation) can be deemed not a candidate for bandwidth upgrades. The key to active
Sounds great, unless you don't own the router on the other side of the link, which is subject to ICMP filtering, has a loaded RE, etc. If you pass the traffic through the routers to a reliable server, you'll be monitoring multiple links/routers and not just a single one. Jack
Date: Sun, 30 Aug 2009 21:04:15 +0900 From: Randy Bush <randy@psg.com>
If your 95th percentile utilization is at 80% capacity, it's time to start planning the upgrade.
s/80/60/
the normal snmp and other averaging methods *really* miss the bursts.
s/60/40/ If you need to carry large TCP flows, say 2Gbps on a 10GE, dropping even a single packet due to congestion is unacceptable. Even with fast recovery, the average transmission rate will take a noticeable dip on every drop and even a drop rate under 1% will slow the flow dramatically. The point is, what is acceptable for one traffic profile may be unacceptable for another. Mail and web browsing are generally unaffected by light congestion. Other applications are not so forgiving. -- R. Kevin Oberman, Network Engineer Energy Sciences Network (ESnet) Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab) E-mail: oberman@es.net Phone: +1 510 486-8634 Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
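Rough numbers behind that point, using the widely cited Mathis et al. approximation BW ~= MSS * C / (RTT * sqrt(p)); the 50 ms RTT and the constant C are assumptions, and the takeaway is that a single 2 Gbps flow can only tolerate loss on the order of 1e-8:

```python
# Rough numbers behind the point above, using the widely cited Mathis et al.
# approximation BW ~= MSS * C / (RTT * sqrt(p)).  RTT and C are assumptions.

MSS_BITS = 1460 * 8       # bytes -> bits
C = 1.22                  # constant from the approximation (assumed value)
RTT_S = 0.050             # an assumed 50 ms path

def loss_budget(target_bps):
    """Maximum loss probability a single TCP flow can tolerate at target_bps."""
    return (MSS_BITS * C / (RTT_S * target_bps)) ** 2

print(f"A 2 Gbps flow tolerates p < {loss_budget(2e9):.1e}")   # ~2e-08
```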
If your 95th percentile utilization is at 80% capacity, it's time to start planning the upgrade.
s/80/60/
the normal snmp and other averaging methods *really* miss the bursts.
s/60/40/
What is this "upgrade" thing you all speak of? When your links become saturated, shouldn't you solve the problem by deploying DPI-based application-discriminatory throttling and start double-dipping your customers? After all, it's their fault for using up more bandwidth than your flawed business model told you they will use. (If you're not familiar with Bell Canada, it's OK if you don't get the joke).
On Sun, 30 Aug 2009, Randy Bush wrote:
If your 95th percentile utilization is at 80% capacity, it's time to start planning the upgrade.
s/80/60/
the normal snmp and other averaging methods *really* miss the bursts.
Agreed. Internet traffic is very bursty. If you care about your customer experience, upgrade at the 60-65% level. Especially if an interface towards a customer is similar in bandwidth to your backbone links... Best Regards, Janos Mohacsi
If we're talking about just max capacity, I would agree with most of the statements that 80+% is in the right range, likely with a very fine line before you actually start seeing a performance impact. Operationally, at least in our network, I'd never run anything at that level. Providers that are redundant for each other don't normally operate above 40-45%, in order to accommodate a failure. Other links that have a backup, but don't actively load share, normally run up to about 60-70% before being upgraded. By the time the upgrade is complete, it could be close to 80%. -------------------------------------------------------------------------------- Tom Sands Rackspace Hosting William Herrin wrote:
On Sat, Aug 29, 2009 at 11:50 PM, devang patel<devangnp@gmail.com> wrote:
I just wanted to know what is Link capacity upgrade threshold in terms of % of link utilization? Just to get an idea...
If your 95th percentile utilization is at 80% capacity, it's time to start planning the upgrade. If your 95th percentile utilization is at 95% it's time to finish the upgrade.
If you average or median utilizations are at 80% capacity then as often as not it's time for your boss to fire you and replace you with someone who can do the job.
Slight variations depending on the resource. Use absolute peak instead of 95th percentile for modem bank utilization -- under normal circumstances a modem bank should never ring busy. And a gig-e can run a little closer to the edge (percentage-wise) before folks notice slowness than a T1 can.
Regards, Bill Herrin
participants (19)
- Aaron J. Grier
- Deepak Jain
- devang patel
- Erik L
- Holmes,David A
- Jack Bates
- Justin Wilson - MTIN
- Kevin Oberman
- Mikael Abrahamsson
- Mohacsi Janos
- Nick Hilliard
- Patrick W. Gilmore
- Paul Jakma
- Peter Hicks
- Randy Bush
- Richard A Steenbergen
- Shane Ronan
- Tom Sands
- William Herrin