Comparing an old flow snapshot with some packet size data
NANOG Folks;

[Cross posted to big-internet list in a separate message.]

I'm hoping to get some comment and perhaps some more cisco flow stats or sniffer stats from participants on this list on the state of flows on the WAN Internet.

I did a little traffic comparison to see what I could glean from comparing Sean Doran's flow stats posted last January to big-internet with an unpublished analysis of a snippet of FIX West data, collected by Kim Claffy at NLANR and analysed by Jerry Scharf of the CIX.

Back in January, Sean Doran and Dorian Kim posted some cisco IP flow stats to the big-internet list. I haven't seen any since, but my big-internet mail delivery seems spotty so I may have missed some messages. I'd be interested in seeing some more flow stats, if anyone has been collecting more data.

Kim Claffy collected 15 minutes of traffic data from FIX West on 12 Feb 96 and Jerry Scharf analyzed the packet size distribution of that sample. I used this data in a paper I recently finished on WAN protocol overhead.

Here's a portion of the packet size histogram from this data. Only packet sizes that exceed 1% of the total traffic over this fifteen minute period are listed, although Jerry's data contains counts of all the traffic that Kim collected.

    IP Payload    Per cent of Packets
          40            30.55%
          41             1.51%
          44             3.04%
          72             4.10%
         185             2.72%
         296             1.48%
         552            22.29%
         576             3.59%
        1500             1.51%

All other packet sizes are less than 1% of the total, but as you can see that adds up to about 29% of the traffic. There were almost no packets larger than 1500 bytes. And the 29% of other traffic was scattered over the interval up to 1500 bytes.

Jerry has a perl script that does a "what if" calculation on what the WAN protocol overhead would be if all this traffic were HDLC or FR or ATM, but so far he hasn't published anything.
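As a rough illustration of the kind of "what if" calculation mentioned above (this is not Jerry's perl script, which is unpublished), the sketch below takes the listed packet sizes and estimates the bytes on the wire for one ATM case. It assumes AAL5 framing: an 8-byte trailer, padding up to a multiple of 48 bytes, and 53-byte cells. The names `atm_wire_bytes`, `residual_share` and the optional `encap_bytes` parameter (for example 8 bytes of LLC/SNAP header) are made up for illustration; HDLC and frame relay are not modelled.

```python
# Packet-size histogram from the FIX West sample quoted above:
# packet size in bytes -> per cent of packets (only sizes above 1%).
HISTOGRAM = {
    40: 30.55, 41: 1.51, 44: 3.04, 72: 4.10, 185: 2.72,
    296: 1.48, 552: 22.29, 576: 3.59, 1500: 1.51,
}

AAL5_TRAILER = 8    # bytes appended to each packet by AAL5 (assumed framing)
CELL_PAYLOAD = 48   # payload bytes carried per ATM cell
CELL_SIZE = 53      # bytes per ATM cell on the wire (5-byte header + 48)


def atm_wire_bytes(ip_len, encap_bytes=0):
    """Bytes on the wire for one IP packet carried over AAL5.

    encap_bytes is any extra per-packet encapsulation header
    (hypothetical parameter, e.g. 8 for LLC/SNAP).
    """
    framed = ip_len + encap_bytes + AAL5_TRAILER
    cells = -(-framed // CELL_PAYLOAD)  # ceiling division
    return cells * CELL_SIZE


def residual_share(histogram):
    """Per cent of packets not covered by the listed sizes."""
    return 100.0 - sum(histogram.values())


for size in sorted(HISTOGRAM):
    wire = atm_wire_bytes(size)
    overhead = 100.0 * (wire - size) / wire
    print(f"{size:5d} bytes -> {wire:5d} on the wire ({overhead:.1f}% overhead)")

print(f"unlisted sizes: {residual_share(HISTOGRAM):.2f}% of packets")
```

Under these assumptions a 40-byte packet fits exactly in one cell, but adding an 8-byte encapsulation header pushes it into a second cell, which is why the large share of 40-byte packets matters so much for this kind of estimate.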
The most interesting thing to me is that the most common traffic is probably file transfer (whether HTTP or FTP), since the 552 bytes corresponds to a TCP payload of 512 bytes, the largest power of two smaller than the IP default MTU of 576. 30% of the traffic is a zero byte TCP payload corresponding to all the connection setup and flow control traffic for all those file transfers going on.

To recall what Sean originally posted in January:

-------------------begin-------
This is from a fairly small-traffic router (sl-kc-2.sprintlink.net),...

        Sean.
- --
IP Flow Switching Cache, 29999 active, 2769 inactive, 58411388 added
  1418487 lru, 22352334 timeout, 20923593 tcp fin, 2633568 invalidates
  5253815 dns, 5799592 resent syn, 0 counter wrap
  statistics cleared 141949 seconds ago

Protocol       Total    Flows  Packets  Bytes  Packets  Active(Sec)  Idle(Sec)
--------       Flows     /Sec    /Flow   /Pkt     /Sec        /Flow      /Flow
TCP-Telnet    267034      1.8      233     75    439.3        182.6       36.5
TCP-FTP      1030837      7.2       10     78     76.6         22.6       43.7
TCP-FTPD      554967      3.9      164    345    641.3         52.7       15.7
TCP-WWW     32107858    226.2       15    247   3610.6         13.5       28.1
TCP-SMTP     3526231     24.8       13    159    323.1         10.2       23.6
TCP-X           9600      0.0      121    129      8.2        148.2       55.1
TCP-BGP       111096      0.7       14     77     11.5        229.2       61.1
TCP-other    5729172     40.3       70    220   2858.1         71.0       41.3
UDP-TFTP        2398      0.0        3     62      0.0         13.4       69.5
UDP-DNS     12875077     90.7        2    110    195.4          5.4       43.6
UDP-other    1489072     10.4       30    293    321.8         28.5       68.7
ICMP          665771      4.6       13    259     62.8         75.5       66.8
IGMP            5144      0.0       18    278      0.6         82.4       64.3
IPINIP          4450      0.0      933    377     29.2        166.7       61.0
IP-other        2693      0.0       11    136      0.2         80.8       65.7
Total:      58381400    411.3       20    227   8579.4          0.0        0.0
------------------------end--------

I would say that these two different sets of statistics are roughly consistent. (Note that neither one represents a lot of data. The FIX West data is only over 15 minutes and Sean's was over the major part of a day.) Note the small number of packets per flow for WWW and FTP in Sean's data, from 10 to 15 for each flow.
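To make the flow table easier to reason about, here is a small sketch that copies three of its columns and derives each protocol's share of flows and a rough bytes-per-flow figure. The helper names are invented for this example, and bytes per flow is only approximate because it multiplies two rounded averages from the table.

```python
# (protocol, total flows, packets per flow, bytes per packet),
# copied from Sean's flow cache output above.
FLOWS = [
    ("TCP-Telnet", 267034, 233, 75),
    ("TCP-FTP", 1030837, 10, 78),
    ("TCP-FTPD", 554967, 164, 345),
    ("TCP-WWW", 32107858, 15, 247),
    ("TCP-SMTP", 3526231, 13, 159),
    ("TCP-X", 9600, 121, 129),
    ("TCP-BGP", 111096, 14, 77),
    ("TCP-other", 5729172, 70, 220),
    ("UDP-TFTP", 2398, 3, 62),
    ("UDP-DNS", 12875077, 2, 110),
    ("UDP-other", 1489072, 30, 293),
    ("ICMP", 665771, 13, 259),
    ("IGMP", 5144, 18, 278),
    ("IPINIP", 4450, 933, 377),
    ("IP-other", 2693, 11, 136),
]
SECONDS_SINCE_CLEAR = 141949  # "statistics cleared 141949 seconds ago"

TOTAL_FLOWS = sum(total for _, total, _, _ in FLOWS)


def flow_share(protocol):
    """Fraction of all flows that belong to the named protocol."""
    for name, total, _, _ in FLOWS:
        if name == protocol:
            return total / TOTAL_FLOWS
    raise KeyError(protocol)


def approx_bytes_per_flow(protocol):
    """Packets/flow times bytes/packet; rough, since both are rounded."""
    for name, _, pkts_per_flow, bytes_per_pkt in FLOWS:
        if name == protocol:
            return pkts_per_flow * bytes_per_pkt
    raise KeyError(protocol)


print(f"total flows: {TOTAL_FLOWS}")
print(f"flows/sec:   {TOTAL_FLOWS / SECONDS_SINCE_CLEAR:.1f}")
for name, _, _, _ in FLOWS:
    print(f"{name:<11}{flow_share(name):7.1%}{approx_bytes_per_flow(name):9d} bytes/flow")
```

The per-protocol totals add up to the table's Total row, and dividing by the seconds since the counters were cleared reproduces the 411.3 flows/sec figure, which is a useful check that the columns were read correctly.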
I don't understand the 78 bytes/pkt for FTP, [Robert Elz points out I'm looking at the FTP control channel. duh.] but the WWW bytes/pkt of 247 [and the FTPD bytes/pkt of 345] are roughly consistent with the packet distribution of 30% at 40 bytes and 22% at 552. If I average 40 and 552 I get 296, near to 247. It's rough, but sensible.

With all appropriate caveats about the limited sample size, the majority of the TCP flows are WWW or FTP file transfers with a data payload of about 512 bytes (from the Claffy/Scharf data) and a total of about 15 packets (from Sean's data). If I assume it takes 2 empty packets to open the connection, 6 packets of data, 5 ACKs back, and 2 more empty packets to close, then we have a file size of about 6*512 = 3072, or roughly 3100 bytes. [I could be off on those counts, but not by much.] Therefore, the average or most common Web/FTP file size transferred is about 3000 bytes.

Simon Spero's trace analysis of an HTTP page load (available at the W3C web site) is remarkably similar.

All in all, these three data sources (Claffy/Scharf, Doran, Spero) seem relatively consistent. The overwhelming majority of flows in the Internet seem to be small file transfers, and the TCP payload for this traffic is mostly <=512 bytes when it could easily be up to 1460 bytes. And slow start adds at least one extra RTT to each transfer that might be avoided if the payloads were 1460 instead of 512.

Would there be any improvement if hosts used path MTU discovery, or would it add up to about the same thing? I'm not sure whether you can do path MTU discovery at the same time you are starting a TCP session or whether, as is more likely, it is a separate process and uses an RTT or more before starting the TCP session.

Now, is there more data to bolster or refute these conclusions? I've done what I can with what I've found, but there just isn't much data to go on anymore.
But I think it is pretty consistent with the view that a lot of the traffic is WWW TCP sessions of a few kilobytes. Would you agree?

Would path MTU discovery help, or could we all just informally set the Internet default MTU up to 1500 bytes [as John Hawkinson suggested on big-internet] and suffer a few fragmented slow speed links? Are most PPP MTUs set at the default 1500 or no?

--Kent

(Please note that as far as I know neither Kim nor Jerry has published anything from this data, so don't bug them for information or hold them responsible in any way for what I did with it.)
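The packet-count arithmetic and the slow-start remark in the message above can be checked with a short sketch. It uses Kent's assumed breakdown of a 15-packet flow and an idealized slow start: the window starts at one segment and doubles every round trip, with no loss and no delayed ACKs. Those simplifications, and the function names, are assumptions made for this example rather than anything stated in the thread.

```python
# Kent's assumed packet breakdown for a typical 15-packet flow.
BREAKDOWN = {"open": 2, "data": 6, "acks": 5, "close": 2}
TCP_PAYLOAD = 512  # bytes of TCP payload in a 552-byte IP packet


def estimated_file_bytes(breakdown, payload=TCP_PAYLOAD):
    """File size implied by the number of full data packets."""
    return breakdown["data"] * payload


def slow_start_rtts(file_bytes, mss, initial_cwnd=1):
    """Round trips needed to send file_bytes under idealized slow start.

    Assumes the congestion window starts at initial_cwnd segments and
    doubles every round trip, with no loss.
    """
    segments = -(-file_bytes // mss)  # ceiling division
    cwnd, sent, rtts = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts


size = estimated_file_bytes(BREAKDOWN)
print(f"packets in flow:      {sum(BREAKDOWN.values())}")
print(f"estimated file size:  {size} bytes")
print(f"average of 40 and 552: {(40 + 552) / 2:.0f} bytes")
for mss in (512, 1460):
    print(f"MSS {mss:4d}: {slow_start_rtts(size, mss)} round trips of data")
```

Under these assumptions a file of about 3 kilobytes needs one more round trip of data at a 512-byte payload than at 1460 bytes, which is in line with the "at least one extra RTT" estimate.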
In message <2.2.32.19960806173812.00711b8c@mail.cts.com>, "Kent W. England" writes:
NANOG Folks;
[ ...
Now, is there more data to bolster or refute these conclusions? I've done what I can with what I've found, but there just isn't much data to go on anymore. But I think it is pretty consistent with the view that a lot of the traffic is WWW TCP sessions of a few kilobytes. Would you agree? Would path MTU discovery help or could we all just informally set the Internet default MTU up to 1500 bytes [as John Hawkinson suggested on big-internet] and suffer a few fragmented slow speed links. Are most PPP MTUs set at the default 1500 or no?
--Kent
Hi there Kent,

Persistent connections are a prominent feature of HTTP 1.1, now in draft. Maybe someone who follows that WG can comment on its progress. If on average there are 2-3 inline images per page (a reasonable estimate IMO, though I have no data to back this up), then the average transfer size will increase.

I've heard (verbally at NANOG) that Netscape has promised to support persistent connections, with the only caveat that they will open one connection for the page itself and another for all the inlines so they can start rendering the first inline while a long page is being read. They can probably avoid this for short pages.

This could lead to a significant improvement in the ability of Internet traffic to respond to low levels of packet drop and make good use of TCP congestion control, plus it will significantly improve the speed of transfer on uncongested paths where currently TCP never gets out of the initial slow start.

Curtis

ps- If this isn't too off topic, does anyone know what servers and clients (if any) currently support persistent connections?
ps- If this isn't too off topic, does anyone know what servers and clients (if any) currently support persistent connections?
I am pretty sure that persistent connections are one of the big features IIS [Microsoft] has been touting since v1.0. I think they call the technology [when used with Explorer at least] a function that transfers the entire page in a single "hit".

I know we have modified our servers to support persistent connections [a la draft HTTP 1.1] but then again, this is a web company.

Deepak Jain
American Information Network
participants (3)
- Curtis Villamizar
- Deepak Jain
- Kent W. England