----- Original Message -----
From: "Bill Woodcock" <woody@pch.net>
On Jul 28, 2014, at 9:28 AM, William Herrin <bill@herrin.us> wrote:
The data set suffers three flaws:
Depending on your point of view, a lot more than three, undoubtedly.
1. It is not representative of the actual traffic flows on the Internet.
There are an infinite number of things it’s not representative of, but it also doesn’t claim to be representative of them. A survey of traffic flows on the Internet is a different survey of a different thing, but if someone can figure out how to do it well, I would be very supportive of their effort. It's a _much_ more difficult survey to do, since it requires getting people to pony up their unanonymized netflow data, which they’re a lot less likely to do, en masse, than their peering data. We’ve been trying for twenty years to figure out a way to do it on a large and representative enough scale to matter, without much headway. The larger the Internet gets, the more difficult it is to survey well, so the problem gets harder with time, rather than easier.
I think you're over-specifizing Bill's assertion, Woody.

He didn't mean "TCP Flows", I don't think; he was simply -- as I understood him -- talking about the 40,000ft view of connections between pieces of the Internet.

I don't expect your dataset to have flow-level data, and I don't think he did either; it isn't really germane to the conversation we're having.

Cheers,
-- jra
--
Jay R. Ashworth                  Baylink                       jra@baylink.com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates       http://www.bcp38.info          2000 Land Rover DII
St Petersburg FL USA      BCP38: Ask For It By Name!           +1 727 647 1274