As a decent-sized North American ISP, I have to agree completely with this post. There is simply no economically justifiable reason to collect customer data: doing so is expensive and, unless you are trying to traffic-shape like a cell carrier, it has zero economic benefit. In our case we take 1:4000 netflow samples and that is literally it; we use that data for peering analytics and failure modeling. This has been true at both large ISPs I've been involved with, and in both cases I would have overseen the policy. What I see in this thread is a bunch of folks guessing who clearly have not been involved in large eyeball ISP operations.

-----Original Message-----
From: NANOG <nanog-bounces+john=vanoppen.com@nanog.org> On Behalf Of Saku Ytti
Sent: Tuesday, May 16, 2023 7:56 AM
To: Tom Beecher <beecher@beecher.cc>
Cc: nanog@nanog.org
Subject: Re: Do ISP's collect and analyze traffic of users?

I can't tell what "large" is. But I've worked for enterprise and consumer ISPs, and none of the shops I worked for had the capability to monetise the information they had. And the information they had was of increasingly low resolution.

Infrastructure providers are notoriously bad at monetising even their infra. I'm sure some do monetise. But service providers are generally not interesting to, or do not have, active shareholders, so there is very little pressure to make more money; hence firesales happen all the time, with infrastructure increasingly seen as a liability rather than an asset. They are generally boring companies, and internally no one has an incentive to monetise data, as it wouldn't improve their personal compensation. And regulations like GDPR create problems people would rather not solve unless pressured.

Technically, most shops started 20 years ago with some netflow sampling ratio and still use the same ratio today, despite many orders of magnitude more packets. That means the share of flows captured used to be magnitudes higher than it is now; in very typical applications only a small fraction of flows is seen at all, and netflow serves largely for volumetric DDoS detection and high-level ingressAS=>egressAS metrics.

Hardware increasingly offers IPFIX as if it were sflow, that is, with zero cache and each sample exported immediately, because you would need something like 1:100 or better resolution to have any significant chance of hitting the same flow twice. PTX has stopped supporting the flow cache entirely because of this: at a sampling rate where the cache would do something, the cache would overflow.

Of course there are other monetisation opportunities via mechanisms other than data on the wire, like DNS.
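To put rough numbers on the sampling point, here is a back-of-the-envelope sketch in Python. The flow sizes are hypothetical, not measurements from any particular network:

    # Packet-sampled NetFlow/IPFIX at rate 1:n samples each packet
    # independently with probability 1/n. For a flow of k packets:
    #   P(flow seen at all)   = 1 - (1 - 1/n)^k
    #   P(flow seen >= twice) = 1 - P(0 samples) - P(exactly 1 sample)
    # Seeing a flow at least twice is roughly what a flow cache needs
    # before it can aggregate anything for that flow.

    def p_seen(k, n):
        return 1 - (1 - 1 / n) ** k

    def p_seen_twice(k, n):
        p, q = 1 / n, 1 - 1 / n
        return 1 - q ** k - k * p * q ** (k - 1)

    for n in (100, 1000, 4000):
        for k in (10, 100, 1000):   # hypothetical short-to-medium flows
            print(f"1:{n:<4} {k:>4}-packet flow: "
                  f"seen {p_seen(k, n):6.2%}, twice {p_seen_twice(k, n):7.3%}")

At 1:4000, even a 1000-packet flow is sampled at all only about 22% of the time and sampled twice only about 2.6% of the time, while at 1:100 it is almost always sampled more than once; that is the point about needing roughly 1:100 before a flow cache buys you anything.

On Tue, 16 May 2023 at 15:57, Tom Beecher <beecher@beecher.cc> wrote: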
Two simple rules for most large ISPs.
1. If they can see it, as long as they are not legally prohibited, they'll collect it.
2. If they can legally profit from that information, in any way, they will.
Now, their privacy policies will always include lots of nice-sounding clauses, such as 'We don't sell your personally identifiable information'. This of course allows them to sell 'anonymized' sets of that data, which sounds great, except that, as researchers have proven, it's pretty trivial to scoop up multiple discrete anonymized data sets and cross-reference them to identify individuals. Netflow data may not be as directly 'valuable' as other types of data, but it can be used in the blender too.
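As a toy illustration of that cross-referencing, here is a minimal Python sketch. Every record, field name, and value below is made up for illustration; it is not any real data set:

    # Two "anonymized" data sets: each strips direct identifiers, but both
    # keep quasi-identifiers (a source prefix and a coarse time bucket).
    # Joining on those quasi-identifiers re-links the records.

    flow_records = [   # netflow-ish export: no account ID anywhere
        {"src_prefix": "203.0.113.0/28", "hour": "2023-05-16T02", "dst": "example-video-cdn"},
        {"src_prefix": "198.51.100.0/28", "hour": "2023-05-16T02", "dst": "example-news-site"},
    ]
    broker_records = [  # a second "anonymized" set from a data broker
        {"home_prefix": "203.0.113.0/28", "hour": "2023-05-16T02", "pseudonym": "user-7f3a"},
    ]

    # Cross-reference on the shared quasi-identifiers.
    for f in flow_records:
        for b in broker_records:
            if (f["src_prefix"], f["hour"]) == (b["home_prefix"], b["hour"]):
                print(f'{b["pseudonym"]} linked to traffic toward {f["dst"]}')

Neither set contains a name on its own; it is the join across sets that does the de-anonymizing.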
Information is the currency of the realm.
On Mon, May 15, 2023 at 7:00 PM Michael Thomas <mike@mtcc.com> wrote:
And maybe try to monetize it? I'm pretty sure that they can be compelled to do so, but do they do it for their own reasons too? Or is this way too much overhead to be doing en masse? (I vaguely recall that netflow, for example, can make routers unhappy if there is too much "flow".)
Obviously this is likely to depend on local laws, but since this is NANOG we can limit it to here.
Mike
-- ++ytti