Re: Traceroute versus other performance measurement

29 Nov 2000

      Daniel Senie writes:

| Programs such as pathchar can AT MOST tell you about latency, not about
| bandwidth. 

Well, this is simply wrong.

The theory of operation for pathchar is very simple: it attempts 
to build a queue at an interface, and measure the amount of time
it takes two back-to-back packets to pass through.   The law of
large numbers says that for any interface, given enough traffic
emitted from pathchar, there will be a time when pathchar will 
successfully observe the minimum packet inter-arrival time, and/or
the minimum delay for a set of varying-length packets, either of
which will indicate the bottleneck bandwidth.
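
To make the arithmetic concrete, here is a rough sketch of the two
estimates described above, for the simple case of a single link.  It
is not pathchar's code: the function names and the (size, delay)
sample format are made up for illustration.  The first estimate keeps
the minimum delay seen for each packet size and fits a line through
those minima (slope = seconds per byte); the second divides the packet
size by the smallest observed gap between two back-to-back packets.

```python
def min_delay_by_size(samples):
    """samples: iterable of (size_bytes, delay_seconds).
    Keep only the smallest delay seen for each packet size, on the
    assumption that the minimum is the sample least affected by queueing."""
    best = {}
    for size, delay in samples:
        if size not in best or delay < best[size]:
            best[size] = delay
    return best


def fitted_slope(points):
    """Ordinary least-squares slope of y against x for (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def bandwidth_from_min_delays(samples):
    """Estimate bandwidth in bits/s from the slope of minimum delay
    versus packet size (hypothetical helper, single-link case)."""
    minima = min_delay_by_size(samples)
    seconds_per_byte = fitted_slope(sorted(minima.items()))
    return 8.0 / seconds_per_byte


def bandwidth_from_packet_pair(size_bytes, gaps_seconds):
    """Estimate bandwidth in bits/s from the minimum inter-arrival
    time of back-to-back packets of the given size."""
    return size_bytes * 8.0 / min(gaps_seconds)
```

Both helpers lean on the same idea: queueing only ever adds delay, so
with enough probes the minimum converges on the uncongested value.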

Pathchar is robust against nearly everything except a bottleneck
mismatch[*]: trying to measure a faster bottleneck (interface) than
one closer to the pathchar-running host is subject to huge errors,
and the very clever maths used to pull the signal out of the noise
added by the nearer, slower interface is sometimes just insufficient.
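
A small sketch of why the mismatch hurts, assuming (as a
simplification, not a description of pathchar's internals) that the
delay-versus-size slope measured to each hop is the sum of the
per-link slopes up to that hop, so that a link's bandwidth comes from
the difference between consecutive slopes.  The function name and the
figures in the note below are illustrative only.

```python
def per_hop_bandwidths(cumulative_slopes):
    """cumulative_slopes[i] is the fitted slope (seconds per byte) of
    minimum delay versus packet size for probes expiring at hop i.
    Returns one bandwidth estimate (bits/s) per link, or None where
    the slope difference is not positive and no estimate is possible."""
    estimates = []
    previous = 0.0
    for slope in cumulative_slopes:
        difference = slope - previous
        estimates.append(8.0 / difference if difference > 0 else None)
        previous = slope
    return estimates
```

With a 1 Mbit/s first link (8e-6 s/byte) followed by a 100 Mbit/s
link (a further 8e-8 s/byte), the second link contributes only about
1% of the cumulative slope.  A 1% overestimate of that second slope
roughly doubles the difference, and so roughly halves the bandwidth
reported for the fast link.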

| Any cases where links are in parallel (e.g. multilink PPP of
| multiple ISDN or T1 lines, or trunked Ethernet links) will typically NOT
| show up in the calculations, 

Simple logic tells us that this doesn't matter: you end up either
measuring the bottleneck bandwidth of the aggregate of the multiple
paths, or the bottleneck bandwidth of a single component, depending
on how the load-balancing works.  Pathchar tries to avoid measuring
only the component bandwidth.

More interesting are non-parallel equal-cost paths.  Pathchar does
tricks to measure the various components to the extent that they can
be seen; the problem is that some non-parallel equal-cost paths are
invisible (tunnels of whatever sort, of which MPLS is a bad variety).

Your complaint about this would be reasonable if pathchar weren't
trying to measure the path characteristics that would be seen by
a flow ORIGINATING AT THE PATHCHAR TEST BOX.  If in the multiple-path
case such a flow is constrained to a single component, then pathchar
is correct to report that.

IOW, yes, pathchar is poor at identifying some types of network
infrastructure, but that is not its job.  It is very good at its
job, which is indicating the bottlenecks from source to destination,
and giving a very good guess at the bottleneck bandwidths.

| This compounds other issues with trying to determine path characteristics
| with such tools, most especially (and as others mentioned) asymmetric paths.

On the contrary; if real live traffic (which pathchar generates)
observes path flutter over finite time, then other real live traffic
(as generated by users) also likely will flutter over finite time.

This is backed up by other observations, such as Vern Paxson's,
that attempt to characterize the routing, delay and loss aspects
of the Internet over long periods of time (taking advantage of
the law of large numbers).    Pathchar just works faster and tries
to answer the question of bottleneck bandwidth, and make educated
guesses about the bandwidths of subsequent bottlenecks.

The paper is quite good at describing a lot of the theory of operation,
http://www.caida.org/tools/utilities/others/pathchar/
and deals explicitly with some of Daniel Senie's objections.

	Sean.
--
[*] it is also not robust against "slow path" bottlenecks, which occur
    when the test traffic is treated substantially differently than "real"
    traffic, although since test traffic "through" a router en route to
    a subsequent hop is _unlikely_ to be treated differently (as compared
    with test traffic _to_ the same router), one can filter out undesirable
    artefacts to some degree by using data collected by measuring "end-to-end".

smd＠clock.org