On Fri, 20 Sep 2002, Stephen Stuart wrote:
Regarding CPU cycles for route lookups:
Back in the mid-1990s, route lookups were expensive. There was a lot of hand-wringing, in fact, about how doing route lookups at every hop in larger and larger FIBs had a negative impact on end-to-end performance. One of the first reasons for the existence of MPLS was to solve this problem.
This was a totally bogus reason from the very beginning. Given that real backbones carry no prefixes longer than 24 bits, the "long" lookup in a 16-4-4-4-4-bit radix tree takes 5 memory reads. Given that a lot of routes are aggregated, and given the ubiquity of fast data caches (which can safely hold the top 3 levels of a full FIB), the average number of memory reads a general-purpose mid-90s CPU needed for a route lookup was (surprise) about 1.2.
In fact, full-featured IP forwarding (including Fair Queueing and packet classification) at 120 kpps was demonstrated on a 133 MHz Pentium MMX. It beat the crap out of the 7000's SSE. It wasn't that hard, either; after all, that is more than 1000 CPU cycles per packet. The value proposition of ASIC-based packet routing (and the use of CAMs) was always quite dicey.
For the arithmetically inclined, I'd suggest doing the same calculation for a P4 running at 2 GHz, assuming 1000 cycles per packet and an average packet size of 200 bytes. Then estimate the cost of that hardware and spend some time wondering about the prices of OC-192 cards and about the virtues of duopoly :) Of course, doing "line rate" forwarding of no-payload Christmas-tree packets is neat, but does anyone really need it?
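For concreteness, a minimal sketch of the kind of 16-4-4-4-4 multibit-trie lookup described above; the node layout and names (trie_node, fib_lookup) are illustrative choices, not taken from any real forwarding code:

```c
#include <stdint.h>
#include <stddef.h>

/* One entry in a stride table.  If child is NULL the entry is a leaf and
 * nexthop holds the result; otherwise child points to the next stride table. */
struct trie_node {
    struct trie_node *child;
    uint32_t          nexthop;
};

/* root has 2^16 entries indexed by the top 16 bits of the destination;
 * each deeper table has 2^4 entries for the next 4-bit stride. */
uint32_t fib_lookup(const struct trie_node *root, uint32_t dst)
{
    const struct trie_node *n = &root[dst >> 16];      /* memory read 1 */
    int shift = 12;

    /* Worst case walks four more 4-bit strides (16+4+4+4+4 = 32 bits),
     * i.e. 5 memory reads total; with no prefix longer than /24 the
     * walk stops after the third level. */
    while (n->child != NULL && shift >= 0) {
        n = &n->child[(dst >> shift) & 0xF];           /* reads 2..5 */
        shift -= 4;
    }
    return n->nexthop;
}
```

Working the suggested numbers through: 2 GHz divided by 1000 cycles per packet is 2 Mpps, and at 200 bytes per packet that is roughly 3.2 Gb/s of forwarded traffic from one commodity CPU (ignoring memory and bus limits, which is of course where the argument gets interesting).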
In the case of modern hardware, forwarding delay due to route lookup times is probably not a contributing factor to latency.
In real life (as opposed to benchmark tests), nobody cares about forwarding delay. Even with dumb implementations it is two orders of magnitude smaller than the speed-of-light delays in long-haul circuits (and on short hops, end-host performance is limited by bus speed anyway).
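A back-of-envelope check of that claim; the path length and per-packet cost below are assumed numbers for illustration, not figures from the post:

```c
#include <stdio.h>

int main(void)
{
    /* Assumed numbers for illustration only. */
    double path_km         = 5000.0;    /* long-haul fiber path          */
    double fiber_km_per_ms = 200.0;     /* ~2/3 c in fiber               */
    double pps             = 120000.0;  /* software forwarder at 120 kpps */

    double propagation_ms = path_km / fiber_km_per_ms;   /* ~25 ms    */
    double per_hop_ms     = 1000.0 / pps;                 /* ~0.008 ms */

    printf("propagation: %.1f ms, per-hop forwarding: %.3f ms (ratio ~%.0f)\n",
           propagation_ms, per_hop_ms, propagation_ms / per_hop_ms);
    return 0;
}
```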
"More hops" can mean many other things in terms of delay, though - in both the "more delay" and "less delay" directions, and countering the "more hops means more delay" perception will (I think) always be a challenge.
"More hops" means more queues, i.e. potential congestion points. With max queue size selected to be RTT*BW, the maximal packet latency is, therefore, Nhops*RTT. In a steady mode, with all network uniformly loaded the mean latency is, therefore, (Nhops*MeanQueueLength + 1)*RTT. (MQL is from 0 (empty queue) to 1 (full queue)). In the real networks, queueing delays tend to contribute mostly to delay variance, not to the mean delay (in an underloaded network the average queue size is < 1 packet). In other words, a large number of hops makes life a lot less pleasant for interactive applications; file transfers generally don't care. --vadim