Tail Drops and TCP Slow Start
If I have a DS3 or OC3 handling mounds and mounds of FTP download traffic, what is the easiest way to detect if the bandwidth in use is falling into a classic Tail Drop pattern? According to a Cisco book I am reading, the bandwidth utilization should graph in a "sawtooth" pattern of gradual increases in accordance with multiple machines gradually increasing via TCP slow start and then sharp drops. Will this only happen when the utilization approaches 100%? (maybe dumb question)

Should I be able to do a show buffers and see misses or is there some better way to detect other than via graphing?

Also, suppose in examining my ftp traffic patterns that I noticed that it spikes at 15 minutes after the top of the hour, consistently, etc. Could I create a timed access list to only kick in at that time? Anyone have experience with WRED to handle ftp congestion?

I usually take these types of questions to Cisco but I thought I'd post it to this list to get any generic real world advice.

sh buff
Buffer elements:
     499 in free list (500 max allowed)
     5713661 hits, 0 misses, 0 created

Public buffer pools:
Small buffers, 104 bytes (total 600, permanent 600):
     580 in free list (20 min, 1250 max allowed)
     2225528470 hits, 6 misses, 18 trims, 18 created
     0 failures (0 no memory)
Middle buffers, 600 bytes (total 450, permanent 450):
     448 in free list (10 min, 1000 max allowed)
     68259213 hits, 7 misses, 21 trims, 21 created
     0 failures (0 no memory)
Big buffers, 1524 bytes (total 450, permanent 450):
     449 in free list (5 min, 1500 max allowed)
     6807747 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
VeryBig buffers, 4520 bytes (total 50, permanent 50):
     50 in free list (0 min, 1500 max allowed)
     46167681 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Large buffers, 5024 bytes (total 50, permanent 50):
     50 in free list (0 min, 150 max allowed)
     0 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Huge buffers, 18024 bytes (total 5, permanent 5):
     5 in free list (0 min, 65 max allowed)
     34 hits, 6 misses, 12 trims, 12 created
     0 failures (0 no memory)

Interface buffer pools:
IPC buffers, 4096 bytes (total 768, permanent 768):
     768 in free list (256 min, 2560 max allowed)
     769236774 hits, 0 fallbacks, 0 trims, 0 created
     0 failures (0 no memory)

Header pools:
"Murphy, Brennan" wrote:
what is the easiest way to detect if the bandwidth in use is falling into a classic Tail Drop pattern? According to a Cisco book I am reading, the bandwidth utilization should graph in a "sawtooth" pattern of gradual increases in accordance with multiple machines gradually increasing
It may be difficult to tell at any one router. Depending on where the endpoints are (I'm assuming they are scattered around the net), different connections may lose packets at different times and places. If that one OC3 or DS3 link were the only thing that mattered, perhaps it would be easier to tell.
via TCP slow start and then sharp drops. Will this only happen when the utilization approaches 100%? (maybe dumb question)
Again I think it depends, but I would venture to say: probably.
Should I be able to do a show buffers and see misses or is there some better way to detect other than via graphing?
I haven't studied those statistics on a Cisco; they may tell you something, but I suspect it would be difficult to discern what you're looking for based on them alone. Another parameter to monitor is packet drops.
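For example (the interface name below is just a placeholder for your DS3/OC3 link), the per-interface drop counters are visible with a plain 'show interfaces':

  ! Serial3/0 is a placeholder; substitute the congested link.
  ! Watch the "Total output drops" counter and the output queue
  ! depth climb between successive runs.
  show interfaces Serial3/0

If that counter is incrementing while the rate counters sit near line rate, tail drop is a reasonable suspect.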
Also, suppose in examining my ftp traffic patterns that I noticed that it spikes at 15 minutes after the top of the hour, consistently, etc. Could I create a timed access list to only kick in at that time?
I guess you could, but that seems to be a very short-sighted and narrow-minded approach to managing your capacity.
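For what it's worth, a time-based ACL would look roughly like the sketch below (names and times are hypothetical, and it needs an IOS recent enough to support the time-range feature); you would still have to decide what to actually do with the match:

  ! Hypothetical names and times -- adjust to the observed spike window.
  time-range FTP-SPIKE
   periodic daily 14:15 to 14:30
  !
  ip access-list extended MATCH-FTP
   permit tcp any any eq ftp time-range FTP-SPIKE
   permit tcp any any eq ftp-data time-range FTP-SPIKE
   permit ip any any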
I usually take these types of questions to Cisco but I thought I'd post it to this list to get any generic real world advice.
Based on your 'show buffers' output, Cisco may recommend some tuned buffer settings for you.

John
On Fri, Dec 07, 2001 at 11:12:39AM -0600, Murphy, Brennan wrote:
If I have a DS3 or OC3 handling mounds and mounds of FTP download traffic, what is the easiest way to detect if the bandwidth in use is falling into a classic Tail Drop pattern? According to a Cisco book I am reading, the bandwidth utilization should graph in a "sawtooth" pattern of gradual increases in accordance with multiple machines gradually increasing via TCP slow start and then sharp drops. Will this only happen when the utilization approaches 100%? (maybe dumb question)
It could be either/or. If the link is oversubscribed you may see what you are describing via the 'bits/sec' counter in 'sh int'. Turn the timers down via 'load-interval' to get a more granular timeframe. This link could be at 50% utilization, but if the upstream link feeding it is running at maximum capacity, the 50% you see locally would show the same behavior.
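For instance (interface name is a placeholder):

  ! Drop the rate-averaging window from the default 5 minutes to the
  ! 30-second minimum so short-lived ramps and drops show up in the
  ! input/output rate counters.
  interface Serial3/0
   load-interval 30

Then re-run 'sh int' every 30 seconds or so and watch the bits/sec lines for the ramp-and-drop pattern.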
Should I be able to do a show buffers and see misses or is there some better way to detect other than via graphing?
'sh buffers' really isn't what you want to look at. The 'bits/sec' counter is more in line with the throughput on the interface. Turn the load interval down for better granularity. If you are seeing buffer misses, there are usually other issues going on, like very bursty traffic or other resource contention. Typically buffer misses are seen more on LAN segments, and I don't usually recommend changing the defaults because most of the time there is some other underlying issue that buffer tuning is just hacking around.
Also, suppose in examining my ftp traffic patterns that I noticed that it spikes at 15 minutes after the top of the hour, consistently, etc. Could I create a timed access list to only kick in at that time? Anyone have experience with WRED to handle ftp congestion?
It's more of a dynamic thing than that. WRED will smooth out the curve for you if the link you are working on is the source of the problem. What were you suggesting to do with the ACL anyway if it did kick in? Say, for example, you see the rate vary on a DS3 from 30M to 45M in a sawtooth manner. After applying WRED, the swing between the high and low points of the peaks should be smaller, and monitoring the throughput on the interface should show it staying consistently closer to line rate for that circuit.
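A minimal sketch, assuming a placeholder interface and default WRED thresholds (which are usually a sane starting point):

  ! Enable WRED on the congested output interface. With no further
  ! configuration the per-precedence thresholds take their defaults.
  interface Serial3/0
   random-detect

Per-precedence thresholds and the averaging weight can be tuned later with 'random-detect precedence' and 'random-detect exponential-weighting-constant' if the defaults don't behave.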
I usually take these types of questions to Cisco but I thought I'd post it to this list to get any generic real world advice.
This comes from lab testing and real-world experience.

hth,
rodney
"Murphy, Brennan" <Brennan_Murphy@NAI.com> writes:
If I have a DS3 or OC3 handling mounds and mounds of FTP download traffic, what is the easiest way to detect if the bandwidth in use is falling into a classic Tail Drop pattern? According to a Cisco book I am reading, the bandwidth utilization should graph in a "sawtooth" pattern of gradual increases in accordance with multiple machines gradually increasing via TCP slow start and then sharp drops. Will this only happen when the utilization approaches 100%? (maybe dumb question)
My rule #1 of troubleshooting performance problems is that most of the effect that you are seeing is due to a single problem somewhere along the path. Fixing that one item will yield a huge boost in performance. Fixing the rest of the problems will yield smaller, incremental boosts, although possibly still substantial.

If the sawtooth in the graph represents a single session going into error recovery (i.e. slow-start), then you *might* see this without necessarily seeing overwhelming evidence of it on the Cisco. If the sawtooth represents multiple sessions entering slow start nearly simultaneously (aka global synchronization), then it would be much easier to capture evidence of this via "show interface" stats.

The bursty nature of data traffic is such that you can experience temporary congestion events that are "smoothed over" by queueing on the outbound interface. If these events are brief in duration and separated sufficiently in time, you will not necessarily see any indication of them via the "show interfaces" output. The shorter you have set your load-interval, the more likely you are to see the bandwidth increase represented by the burst (although 30 seconds is long enough to hide bursts on lightly loaded links).

Note: it is difficult to get instantaneous measures of queueing reliably from a Cisco. Best is to check the output of "show controller cbus" (on 75xx series), which will show you the hardware details: look for the txacc and txlimit values. txlimit is the maximum # of items allowed in the transmit ring for that interface, while txacc == (txlimit - # of items in the queue). This command needs to be issued repeatedly (and quickly) to get some sense of the queue size. This is *slightly* better than looking at the output of show interface, but still leaves a lot to be desired. :-)

Once your tx queue fills up completely (txacc == 0), all packets scheduled for transmission on that interface are dropped. As your bandwidth utilization approaches 100%, you will see more and more queueing take place, meaning that the chances of the next incoming packet being dropped increase significantly as the load goes up. I doubt that you would see global synchronization until the load on your link was very near to 100%, but I haven't done the traffic studies to prove it. :-)

RED attempts to prevent this situation by pro-actively dropping packets from the tx queue before it fills up. Using an exponentially-weighted average of queue size (to smooth out burstiness) to determine how likely it is that the current packet will be dropped, RED will tend to hit the "high bandwidth" users first, leaving the smaller users relatively unharmed. If RED does its job correctly, then you will not see global synchronization, although a graph of the throughput of a single FTP session that happened to be policed by RED would demonstrate the sawtooth.
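In practice that means something like the following from an exec prompt, repeated quickly (75xx-class routers only; the exact field layout varies by IOS release):

  ! Look for the txacc and txlimit values for the interface in
  ! question. txacc == 0 means the transmit ring is full and packets
  ! queued for that interface are being dropped.
  show controllers cbus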
Should I be able to do a show buffers and see misses or is there some better way to detect other than via graphing?
You are very unlikely to see this via "show buffers", as this is not likely to be caused by your device running out of memory if your cards are sized correctly. The only way to tell is by looking at the instantaneous measures of queue size if you are looking for a single session performance drop (in the face of near constant high background utilization). If you are seeing global synchronization, then you should see a *big* dip in your usage via "show interface" when set to 30 second load-interval.
Anyone have experience with WRED to handle ftp congestion?
RED is specifically designed to deal with this problem. WRED and dWRED do a decent job, but nothing can help you if you simply have more aggregate demand for bandwidth than your interface can support. And neither WRED nor dWRED helps with UDP applications or DoS attacks.

-jon

--
------------------
Jon Allen Boone
tex@delamancha.org
CCIE #8338
participants (4)
- John Kristoff
- Jon 'tex' Boone
- Murphy, Brennan
- Rodney Dunn