"Murphy, Brennan" <Brennan_Murphy@NAI.com> writes:
If I have a DS3 or OC3 handling mounds and mounds of FTP download traffic, what is the easiest way to detect if the bandwidth in use is falling into a classic tail-drop pattern? According to a Cisco book I am reading, the bandwidth utilization should graph in a "sawtooth" pattern: gradual increases as multiple machines ramp up via TCP slow start, then sharp drops. Will this only happen when the utilization approaches 100%? (Maybe a dumb question.)
My rule #1 of troubleshooting performance problems is that most of the effect you are seeing is due to a single problem somewhere along the path. Fixing that one item will yield a huge boost in performance. Fixing the rest of the problems will yield smaller, incremental boosts - although possibly still substantial.

If the sawtooth in the graph represents a single session going into error recovery (i.e. slow start), then you *might* see this without necessarily seeing overwhelming evidence of it on the Cisco. If the sawtooth represents multiple sessions entering slow start nearly simultaneously (aka global synchronization), then it would be much easier to capture evidence of this via "show interface" stats.

The bursty nature of data traffic is such that you can experience temporary congestion events that are "smoothed over" by queueing on the outbound interface. If these events are brief in duration and separated sufficiently in time, you will not necessarily see any indication of them via the "show interfaces" output. The shorter you set your load-interval, the more likely you are to see the bandwidth increase represented by the burst (although 30 seconds is long enough to hide bursts on lightly loaded links).

Note: it is difficult to reliably get instantaneous measures of queueing from a Cisco. Best is to check the output of "show controllers cbus" (on the 75xx series), which will show you the hardware details: look for the txacc and txlimit values. txlimit is the maximum # of items allowed in the transmit ring for that interface, while txacc == (txlimit - # of items in the queue). This command needs to be issued repeatedly (and quickly) to get some sense of the queue size. This is *slightly* better than looking at the output of "show interface", but still leaves a lot to be desired. :-)

Once your tx queue fills up completely (txacc == 0), all packets scheduled for transmission on that interface are dropped. As your bandwidth utilization approaches 100%, you will see more and more queueing take place, meaning that the chances of the next incoming packet being dropped increase significantly as the load goes up. I doubt that you would see global synchronization until the load on your link was very near 100%, but I haven't done the traffic studies to prove it. :-)

RED attempts to prevent this situation by proactively dropping arriving packets before the tx queue fills up. Using an exponentially-weighted average of the queue size (to smooth out burstiness) to determine how likely it is that the current packet will be dropped, RED tends to hit the "high bandwidth" users first, leaving the smaller users relatively unharmed. If RED does its job correctly, then you will not see global synchronization, although a graph of the throughput of a single FTP session that happened to be policed by RED would still show the sawtooth.
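To make the drop decision concrete, here is a minimal Python sketch of the classic RED logic described above. The thresholds, weight, and names are illustrative assumptions of mine, not Cisco's actual WRED parameters, and it omits refinements from the original RED paper (such as the inter-drop count correction):

    import random

    # Illustrative values, not Cisco defaults.
    MIN_TH = 5        # avg queue depth below which nothing is dropped
    MAX_TH = 15       # avg queue depth above which everything is dropped
    MAX_P  = 0.10     # drop probability as avg approaches MAX_TH
    WEIGHT = 0.002    # EWMA weight -- small, to smooth out bursts

    avg = 0.0         # exponentially-weighted average queue depth

    def red_drop(instantaneous_queue_depth):
        """Return True if RED says to drop the arriving packet."""
        global avg
        # Smooth the instantaneous depth so brief bursts don't trigger drops.
        avg = (1 - WEIGHT) * avg + WEIGHT * instantaneous_queue_depth
        if avg < MIN_TH:
            return False                  # queue is short: always enqueue
        if avg >= MAX_TH:
            return True                   # queue is long: always drop
        # In between, drop with probability rising linearly toward MAX_P.
        p = MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() < p

Calling red_drop() on every packet arrival reproduces the behavior above: short bursts pass untouched because the EWMA barely moves, while sustained queue growth drives the drop probability up, and heavy senders get hit first simply because they have more packets arriving.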
Should I be able to do a "show buffers" and see misses, or is there some better way to detect this other than via graphing?
You are very unlikely to see this via "show buffers", as this is not likely to be caused by your device running out of memory if your cards are sized correctly. If you are looking for a single-session performance drop (in the face of near-constant high background utilization), the only way to tell is by watching the instantaneous measures of queue size. If you are seeing global synchronization, then you should see a *big* dip in your usage via "show interface" when set to a 30 second load-interval.
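If you want to automate the repeated sampling of the transmit ring I mentioned earlier, something along these lines could work. To be clear about the assumptions: run_ios_command is a hypothetical stand-in for however you reach the router (expect/telnet/SSH scraping), and the regex is a guess at the txacc/txlimit output format, so adjust it to what your IOS version actually prints:

    import re
    import time

    def run_ios_command(cmd):
        # Hypothetical helper: wire this up to your own router access.
        raise NotImplementedError("connect this to your router")

    # Guess at the output format; verify against real "show controllers
    # cbus" output before trusting the numbers.
    TXACC_RE = re.compile(r"txacc\s*=?\s*(\d+).*?txlimit\s*=?\s*(\d+)",
                          re.IGNORECASE | re.DOTALL)

    def sample_tx_ring(interval=0.5, samples=20):
        """Poll 'show controllers cbus' and print approximate queue depth."""
        for _ in range(samples):
            out = run_ios_command("show controllers cbus")
            m = TXACC_RE.search(out)
            if m:
                txacc, txlimit = int(m.group(1)), int(m.group(2))
                # txacc == txlimit - (items queued), so the depth is
                # the difference; depth == txlimit means a full ring.
                print("queue depth ~ %d / %d" % (txlimit - txacc, txlimit))
            time.sleep(interval)

Even automated, this only gives you point samples, so it inherits all the caveats above about brief bursts slipping between polls.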
Anyone have experience with WRED to handle FTP congestion?
RED is specifically designed to deal with this problem. WRED and dWRED do a decent job, but nothing can help you if you simply have more aggregate demand for bandwidth than your interface can support. And neither WRED nor dWRED works for UDP applications/DoS attacks.

-jon

--
------------------
Jon Allen Boone
tex@delamancha.org
CCIE #8338