----- Original Message -----
From: <smd@clock.org>
To: <dts@senie.com>; <paul@adelphia.net>; <pingpan@cs.columbia.edu>
Cc: <nanog@merit.edu>
Sent: Wednesday, November 29, 2000 1:17 PM
Subject: Re: Traceroute versus other performance measurement
Daniel Senie writes:
| Programs such as pathchar can AT MOST tell you about latency, not about
| bandwidth.
Well, this is simply wrong.
The theory of operation for pathchar is very simple: it attempts to build a queue at an interface, and measure the amount of time it takes two back-to-back packets to pass through. The law of large numbers says that for any interface, given enough traffic emitted from pathchar, there will be a time when pathchar will successfully observe the minimum packet inter-arrival time, and/or the minimum delay for a set of varying-length packets, either of which will indicate the bottleneck bandwidth.
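The two estimators described above can be sketched roughly as follows. This is a minimal illustration under idealized assumptions: store-and-forward links, a fixed-size reply, and minimum-RTT samples that are free of queueing. The function names and the synthetic numbers are hypothetical. It is not pathchar's source, and real pathchar does considerably more filtering. The packet-pair estimate divides the packet size by the smallest observed gap. The variable-size estimate fits minimum RTT against probe size for two consecutive hops and attributes the increase in slope to the link between them.

```python
from collections import defaultdict


def packet_pair_bandwidth_bps(size_bytes, gaps_s):
    """Packet-pair estimate: bottleneck rate ~= packet size / smallest gap.

    gaps_s holds observed inter-arrival gaps (seconds) of back-to-back
    pairs of equal-size packets; the smallest gap is taken as the one
    least disturbed by cross traffic (a simplification).
    """
    return 8.0 * size_bytes / min(gaps_s)


def min_rtt_by_size(samples):
    """Keep the smallest RTT seen for each probe size (bytes -> seconds)."""
    best = defaultdict(lambda: float("inf"))
    for size, rtt in samples:
        best[size] = min(best[size], rtt)
    return dict(best)


def ls_slope(points):
    """Least-squares slope of RTT versus size, in seconds per byte."""
    xs = sorted(points)
    ys = [points[x] for x in xs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def link_bandwidth_bps(near_samples, far_samples):
    """Rate of the link between two consecutive hops, in bits per second.

    Each store-and-forward link adds size/bandwidth to the RTT, so the
    per-byte slope grows by 1/bandwidth from one hop to the next.
    """
    near = ls_slope(min_rtt_by_size(near_samples))
    far = ls_slope(min_rtt_by_size(far_samples))
    return 8.0 / (far - near)


if __name__ == "__main__":
    sizes = (100, 500, 1000, 1500)
    # Synthetic data: a 10 Mbit/s first link, a 1 Mbit/s second link,
    # plus queueing delay added to all but one sample per size.
    near = [(s, 0.001 + s * 8e-7 + q) for s in sizes for q in (0.0, 0.002, 0.005)]
    far = [(s, 0.003 + s * 8.8e-6 + q) for s in sizes for q in (0.004, 0.0, 0.001)]
    print(round(link_bandwidth_bps(near, far)))  # about 1000000 (1 Mbit/s)
    print(round(packet_pair_bandwidth_bps(1500, [0.013, 0.012, 0.015])))
```

Keeping only the minimum RTT per size is what lets a large number of probes beat the noise. The more samples you send, the more likely it is that at least one probe per size crossed every queue empty.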
Pathchar is robust against nearly everything except a bottleneck mismatch[*]: trying to measure a bottleneck (interface) that is faster than a slower one closer to the pathchar-running host is subject to huge errors, and the very clever maths used to improve the SNR in the presence of the nearer, slower interface is sometimes just insufficient.
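The following is a rough way to see why the mismatch case hurts. It assumes the idealized min-RTT-versus-size model for store-and-forward links, and it is not a description of pathchar's actual filtering. Let s_k be the fitted slope, in seconds per byte, of minimum RTT against probe size out to hop k, and let b_i be the bandwidth of link i in bytes per second.

```latex
s_k = \sum_{i=1}^{k} \frac{1}{b_i}, \qquad b_k = \frac{1}{s_k - s_{k-1}}

% If each fitted slope is off by at most \varepsilon seconds per byte,
% then to first order:
\frac{|\Delta b_k|}{b_k} \lesssim \frac{2\varepsilon}{s_k - s_{k-1}} = 2\varepsilon\, b_k
```

As a worked example, take a 1 Mbit/s link at hop 1 (s_1 = 8 x 10^-6 s/byte) and a 100 Mbit/s link at hop 2 (b_2 = 1.25 x 10^7 bytes/s, so 1/b_2 = 8 x 10^-8 s/byte). A mere 1% error in the slopes gives epsilon of about 8 x 10^-8. The first-order bound is then 2 x 8 x 10^-8 x 1.25 x 10^7 = 2, i.e. up to 200%. The noise from the slow near link is as large as the signal from the fast far one.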
| Any cases where links are in parallel (e.g. multilink PPP of
| multiple ISDN or T1 lines, or trunked Ethernet links) will typically NOT
| show up in the calculations,
Simple logic tells us that this doesn't matter: you end up either measuring the bottleneck bandwidth of the aggregate of the multiple paths, or the bottleneck bandwidth of a single component, depending on how the load-balancing works. Pathchar tries to avoid measuring only the component bandwidth.
More interesting are non-parallel equal-cost paths. Pathchar does tricks to measure the various components that can be seen; the problem is that some non-parallel equal-cost paths are invisible (tunnels of whatever sort, of which MPLS is a bad variety).
Your complaint about this would be reasonable if pathchar weren't trying to measure the path characteristics that would be seen by a flow ORIGINATING AT THE PATHCHAR TEST BOX. If in the multiple-path case such a flow is constrained to a single component, then pathchar is correct to report that.
IOW, yes, pathchar is poor at identifying some types of network infrastructure, but that is not its job. It is very good at its job, which is indicating the bottlenecks from source to destination, and giving a very good guess at the bottleneck bandwidths.
All the theory sounds great. Now, you've got a customer who uses the utility to test a circuit between two boxes and calls to complain that he's only seeing half of the expected bandwidth, because pathchar tells him he's getting X and we said we provisioned 2X. Perhaps it's just a customer education issue.

I think you're making assumptions about how load is shared on parallel links. Often this is done by hashing the IP or MAC addresses of the packets, to ensure there will be no packet reordering on the parallel links. You can send traffic until you clog one of the two pipes, but it will never spill over onto the other link.
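The hashing point can be made concrete with a minimal sketch. It assumes a hypothetical balancer that picks one of N parallel links by hashing the source/destination address pair. Real routers and switches use vendor-specific hash inputs and functions; SHA-256 here is only a stand-in. Every packet of a single pathchar run between two fixed boxes carries the same addresses, so all of them land on the same member link.

```python
import hashlib
import ipaddress
from collections import Counter


def pick_link(src, dst, n_links):
    """Hypothetical per-flow balancer: map (src, dst) onto one member link.

    Hashing only the address pair keeps every packet of a flow on the same
    link (no reordering), but also means one flow never uses more than one
    link's worth of bandwidth.
    """
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links


def link_usage(flows, n_links, packets_per_flow=1000):
    """Count packets placed on each member link for (src, dst) flows."""
    counts = Counter()
    for src, dst in flows:
        counts[pick_link(src, dst, n_links)] += packets_per_flow
    return counts


if __name__ == "__main__":
    # One test box talking to one far-end box: a single flow, one link used.
    print(link_usage([("192.0.2.10", "198.51.100.20")], n_links=2))
    # Many hosts behind the same pair of boxes: flows spread over both links.
    many = [(f"192.0.2.{i}", "198.51.100.20") for i in range(1, 201)]
    print(link_usage(many, n_links=2))
```

Under this kind of balancing, a single test flow sees one member link's bandwidth. The aggregate only shows up when many distinct address pairs share the bundle.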
| This compounds other issues with trying to determine path
| characteristics with such tools, most especially (and as others
| mentioned) asymmetric paths.
On the contrary: if real live traffic (which pathchar generates) observes path flutter over finite time, then other real live traffic (as generated by users) will likely also flutter over finite time.
You're making the assumption that you'll see change in the path FROM THE ONE STARTING POINT where you're running pathchar. This is simply not going to happen in many cases. Equal-cost multipath, trunking, and even unequal-cost pathing will leave you seeing only part of the picture.
This is backed up by other observations, such as Vern Paxson's, that attempt to characterize the routing, delay and loss aspects of the Internet over long periods of time (taking advantage of the law of large numbers). Pathchar just works faster and tries to answer the question of bottleneck bandwidth, and make educated guesses about the bandwidths of subsequent bottlenecks.
Yet people run the tools and believe the results, even when the results aren't telling them the truth. This is backed up by observations in the real world, customer complaints and all.
The paper at http://www.caida.org/tools/utilities/others/pathchar/ is quite good at describing a lot of the theory of operation, and deals explicitly with some of Daniel Senie's objections.
Sean.

[*] It is also not robust against "slow path" bottlenecks, which occur when the test traffic is treated substantially differently than "real" traffic. However, since test traffic passing "through" a router en route to a subsequent hop is _unlikely_ to be treated differently (as compared with test traffic addressed _to_ the same router), one can filter out undesirable artefacts to some degree by using data collected by measuring "end-to-end".