Do ISP's collect and analyze traffic of users?
And maybe try to monetize it? I'm pretty sure that they can be compelled to do that, but do they do it for their own reasons too? Or is this way too much overhead to be doing en masse? (I vaguely recall that netflow, for example, can make routers unhappy if there is too much "flow"). Obviously this is likely to depend on local laws but since this is NANOG we can limit it to here. Mike
I’ve got Akvorado and netflow to identify where traffic comes in/goes to so we can improve our peering and make less traffic go via transit. I did see an article about Team Cymru selling netflow data from ISPs to governments though. https://www.vice.com/en/article/dy3z9a/fbi-bought-netflow-data-team-cymru-contract
Rishi Panthee Ryamer LLC Https://ryamer.com rishipanthee@ryamer.com
I did see an article about Team Cymru selling netflow data from ISPs to governments though.
Team Cymru sold the same thing to the FBI Cyber Crimes division that any of us could purchase if we wanted to pay for it.
Our ISP does not collect (nor obviously sell) customer information/traffic. People volunteer all of their information on Facebook/Twitter/etc already, I'm not sure I see a concern.
+1 to what Josh wrote. I would also differentiate between mobile networks (service provisioned to individual devices & often carrier s/w on the device) and wireline networks (home devices behind a router/gateway/NAT).
I just don't think sale of data is a business for wireline ISPs. If it were - given most companies are public - you'd see it in SEC 10-K filings and on earnings calls. Indeed, they'd be required to talk about it with investors if it was a material revenue stream. I see none of that. Rather, the focus is on subscription revenue. If you want to know about data monetization - focus on services you don't pay for...
Jason
Why would there be a difference between wireless and wired? Mike
Service provisioning in a mobile network is at the device level and tied to an individual vs. at a home shared across many devices & people. So just starting off there is more visibility to say X traffic is related to Y person. Then there’s location data to know roughly where that person/device is traveling. Also most carriers have software installed on the device as part of the provisioning/authentication function and I think there are historical cases where that provided some visibility into other apps on the device. In any case, it seems the most value (to advertisers & data brokers) is in the location data and I think that’s where all the scrutiny on MNOs has been recently. JL
If you purchase MVNO service from someone, those providers may do something else. I think it’s also a bit interesting because some providers previously attempted to monetize this data, for example through DNS wildcarding instead of returning NXDOMAIN. What matters is whether the data is just a generic “someone asked for this name” or the more specific “this CIDR or IP requested this name”. https://tech.slashdot.org/story/14/03/11/1813226/crowdsourcing-confirms-webs... A reminder that what’s old is new again on the internet, so I’m sure we’ll see things come back around; bad ideas keep coming back as revenue ideas. - Jared
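As a rough illustration of the DNS wildcarding point above, here is a minimal Python sketch of how one might check whether a resolver answers for names that should not exist. The function names, the probe count and the choice of random labels under `.com` are assumptions made for this example, not something described in the thread; a resolver that returns addresses for every random name is probably rewriting NXDOMAIN responses.

```python
import secrets
import socket
from typing import Callable, List


def resolve_with_system(name: str) -> List[str]:
    """Resolve a name via the system resolver; return [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry's sockaddr starts with the address string.
    return sorted({info[4][0] for info in infos})


def nxdomain_is_rewritten(resolve: Callable[[str], List[str]], probes: int = 3) -> bool:
    """Return True if every random, almost certainly unregistered name resolves."""
    for _ in range(probes):
        # Hypothetical probe name: a 32-hex-character label is very unlikely to be registered.
        name = f"nxprobe-{secrets.token_hex(16)}.com"
        if not resolve(name):
            return False  # at least one honest "no such name" answer
    return True
```

Calling `nxdomain_is_rewritten(resolve_with_system)` runs the check against whatever resolver the host is configured to use. Passing the resolver in as a parameter keeps the logic testable without network access.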
First NANOG post, the topic compels me to chime in. For me, the question also implies that user-side we are attempting to scrub any of the data we volunteer on social media (or other) platforms. I am careful about what I volunteer up to the Internetz, and have been since my first AOL floppy experience.... So, the question of do the ISPs collect data is particularly important because regardless of how careful I am to anonymize my own contribution to my "online profile," Tom's assessment is the bleakest possible picture for anyone attempting to limit the data set which represents us.
michael brooks Sr. Network Engineer Adams 12 Five Star Schools
I think it's safe to assume they are selling such data.
https://www.techdirt.com/2021/08/25/isps-give-netflow-data-to-third-parties-...
https://www.vice.com/en/article/dy3z9a/fbi-bought-netflow-data-team-cymru-contract
From the second article:
"Team Cymru’s products can also include data such as URLs visited, cookies, and PCAP data" Really? From Netflow? I admit, I'm perhaps a little behind on the latest netflow whiz-bangs, but I've never seen a netflow record type that included HTTP cookies or PCAP data before. Certainly, the products listed on the Team Cymru website don't make any mention of including cookies or PCAP data, at least not from what I've been able to ascertain from digging through their product listing. Is there some secret "off the menu" product that allows one to purchase a data feed that includes cookies and PCAP data? Matt
On 16 May 2023, at 06:46, Matthew Petach <mpetach@netflight.com> wrote: [..] I admit, I'm perhaps a little behind on the latest netflow whiz-bangs, but I've never seen a netflow record type that included HTTP cookies or PCAP data before.
Take your pick from the "latest" ~2009 IPFIX Information Elements: https://www.iana.org/assignments/ipfix/ipfix.xhtml One can stuff almost anything in there. Now if one should, and if one is allowed to..... There is a reason why the marketing companies that control the general Internet moved the browser to HTTPS and are trying to move to using their VPNs/CDNs: the ISP can no longer modify the data in flight to alter or remove the ad, and can no longer easily see what people are even contacting. That gives visibility to the ad network and not the ISP (which is mostly a good thing, but not so much operationally ;) ) Greets, Jeroen
Wow.
Thank you, Jeroen, I was indeed a bit out of date. Thank you for the pointer!
(For those in the same boat as I, here's the relevant portion that clearly points out that yes, you can export the entire packet if you so desire):
313 ipHeaderPacketSection octetArray default current
This Information Element carries a series of n octets from the IP header of a sampled packet, starting sectionOffset octets into the IP header.
However, if no sectionOffset field corresponding to this Information Element is present, then a sectionOffset of zero applies, and the octets MUST be from the start of the IP header.
With sufficient length, this element also reports octets from the IP payload. However, full packet capture of arbitrary packet streams is explicitly out of scope per the Security Considerations sections of [RFC5477 <https://www.iana.org/go/rfc5477>] and [RFC2804 <https://www.iana.org/go/rfc2804>].
Thanks!
Matt (still learning after all these years. ^_^ )
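To make the "with sufficient length" remark concrete, here is a small Python sketch of the slicing that the quoted definition of ipHeaderPacketSection describes. It is not an IPFIX exporter; the function names and the sample packet are assumptions for illustration, and only IPv4 is handled.

```python
def ipv4_header_length(packet: bytes) -> int:
    """Return the IPv4 header length in octets, read from the IHL field."""
    if packet[0] >> 4 != 4:
        raise ValueError("this sketch only handles IPv4")
    return (packet[0] & 0x0F) * 4


def ip_header_packet_section(packet: bytes, n: int, section_offset: int = 0) -> bytes:
    """Return n octets of the packet, starting section_offset octets into the IP header."""
    return packet[section_offset:section_offset + n]


def payload_octets_exposed(packet: bytes, n: int, section_offset: int = 0) -> int:
    """Count how many of the exported octets lie beyond the IP header."""
    header_len = ipv4_header_length(packet)
    end = min(section_offset + n, len(packet))
    start = max(section_offset, header_len)
    return max(0, end - start)


# Hypothetical sampled packet: a 20-octet IPv4 header (IHL = 5) followed by 16 payload octets.
sample_packet = bytes([0x45]) + bytes(19) + b"GET / HTTP/1.1\r\n"
```

With this sample, exporting 20 octets stays within the header, while exporting 28 octets also carries the first 8 octets of payload. That is the sense in which a long enough section starts to look like packet capture.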
There are already so many different ways that organizations can find out all sorts of information about individual users, as others have noted (social media interactions, mobile location/GPS data, call/text history, interactions with specific sites, etc), that there probably isn't much incentive for many providers to harvest data beyond what is needed for troubleshooting and capacity planning. Plus, gathering more data - potentially down to the level of packet payload - is not an easy problem to solve (read: expensive) and doesn't scale well at all. 100G links are very common today, and 400G is becoming so. I doubt that many infrastructure providers would be able to justify the major investments in extra infrastructure to support this, for a revenue stream that likely wouldn't match that investment, which would make such an investment a loss-leader.
Content providers - particularly social media platforms - have a somewhat different business model, but those providers already have many different ways to harvest and sell large troves of user data.
Thank you jms
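To put a rough number on the scaling argument about payload-level capture, here is a back-of-the-envelope Python sketch. The function name and the utilization parameter are assumptions for illustration; it only converts a link speed into bytes per day and ignores indexing, retention and processing costs.

```python
SECONDS_PER_DAY = 86_400


def capture_bytes_per_day(link_gbps: float, utilization: float = 1.0) -> float:
    """Bytes that would need storing per day to keep every packet on one link."""
    bits_per_second = link_gbps * 1e9 * utilization
    return bits_per_second / 8 * SECONDS_PER_DAY


for gbps in (100, 400):
    petabytes = capture_bytes_per_day(gbps, utilization=0.5) / 1e15
    print(f"{gbps}G link at 50% utilization: about {petabytes:.2f} PB per day")
```

A single fully loaded 100G link works out to 1.08 PB per day (12.5 GB/s times 86,400 seconds), before any multiplication by the number of links in a network.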
It amazes me how people can focus on Netflow metadata and ignore things like Microsoft telemetry data from every Windows box, or ignore the massive amount of HTTP cookies that are traded by companies, or how almost every corporate firewall or anti-spam box "reports" back to the mother ship and sends tons of information via secret channels like hashed DNS lookups just to be avoided.
Regards, Hank
Hank: No doubt there is a massive amount of information that can be gathered from in-box telemetry. This thread appears to be more focused on providers gathering data from traffic in flight across their infrastructure. Thank you jms
Yeah, my curiosity was whether ISPs were trying to get into the business of monetizing traffic analysis. It seems they do to a small degree, but they can't really compete with the much finer-grained information that other means can provide, and they have no particular expertise in it or institutional desire for it. For things like Google and Facebook, that kind of analysis was part of their initial business plan. Mike
On 5/15/23 9:46 PM, Matthew Petach wrote:
Is there some secret "off the menu" product that allows one to purchase a data feed that includes cookies and PCAP data?
Given the pervasiveness of TLS these days, even if they could get it off the remaining unencrypted data I'm not sure it would have a lot of value. Mike
ISPs capture traffic samples in both directions: upstream at aggregation points and downstream at ingress, plus your DNS queries, but everyone knows that last part. Some of the most expensive gear is used to sample and aggregate that data.
Two simple rules for most large ISPs.
1. If they can see it, as long as they are not legally prohibited, they'll collect it.
2. If they can legally profit from that information, in any way, they will.
Now, their privacy policies will always include lots of nice-sounding clauses, such as 'We don't see your personally identifiable information'. This of course allows them to sell 'anonymized' sets of that data, which sounds great, except that, as researchers have proven, it's pretty trivial to scoop up multiple, discrete anonymized data sets and cross-reference them to identify individuals. Netflow data may not be as directly 'valuable' as other types of data, but it can be used in the blender too.
Information is the currency of the realm.
I can't tell what large is. But I've worked for enterprise ISPs and consumer ISPs, and none of the shops I worked for had the capability to monetise the information they had. And the information they had was increasingly low resolution. Infrastructure providers are notoriously bad at even monetising their infra.
I'm sure some do monetise. But generally service providers are not interesting and do not have active shareholders, so there is very little pressure to make more money; hence firesales happen all the time, due to infrastructure increasingly being seen as a liability, not an asset. They are generally boring companies, and internally no one has an incentive to monetise data, as it wouldn't improve their personal compensation. And regulations like GDPR create problems people would rather not solve unless pressured.
Technically, most people started 20 years ago with some netflow sampling ratio, and they still use the same sampling ratio despite many orders of magnitude more packets. That means the share of flows captured used to be orders of magnitude higher than today; today only very few flows are seen in very typical applications, and netflow is largely for volumetric DDoS and high-level ingressAS=>egressAS metrics.
Hardware increasingly does IPFIX as if it were sflow, that is, with 0 cache, exporting immediately after sampling, because you'd need something like 1:100 or higher resolution to have any significant luck in hitting the same flow twice. PTX has stopped supporting flow-cache entirely because of this: at the sampling rate where the cache would do something, the cache would overflow.
Of course there are other monetisation opportunities via mechanisms other than data on the wire, like DNS.
-- ++ytti
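The remark about needing roughly 1:100 sampling to hit the same flow twice can be sketched with a simple binomial model. This assumes each packet is sampled independently with probability 1/N, which is a simplification (real routers may sample deterministically or per interface); the function name and the flow sizes are assumptions for illustration.

```python
from math import comb


def p_sampled_at_least(k_packets: int, one_in_n: int, min_hits: int = 2) -> float:
    """Probability that at least min_hits packets of a k-packet flow are sampled at 1:N."""
    p = 1.0 / one_in_n
    # Sum the binomial probabilities of seeing fewer than min_hits sampled packets.
    p_fewer = sum(
        comb(k_packets, i) * p**i * (1 - p) ** (k_packets - i)
        for i in range(min_hits)
    )
    return 1.0 - p_fewer


for rate in (100, 1000, 4000):
    print(f"100-packet flow at 1:{rate}: P(two or more samples) = {p_sampled_at_least(100, rate):.5f}")
```

Under this model a 100-packet flow is seen twice about a quarter of the time at 1:100, and only a fraction of a percent of the time at 1:1000, which is consistent with the point that a flow cache rarely gets a second hit at typical sampling rates.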
I tend to agree. ISPs are, generally, terrible at evolving beyond selling bandwidth. While there might be some ISPs that are able to monetize the data they collect - to whatever degree that monetization is useful - I'd hazard that the majority don't do this because it requires a different mindset that most ISPs simply don't have. Mark.
For those who may have a broader interest in the topic of user/subscriber information collection by ISPs....
Eight Years Holding ISPs to Account in Latin America: A Comparative Outlook of Victories and Challenges for User Privacy
By Veridiana Alimonti
May 12, 2023
https://www.eff.org/deeplinks/2023/05/eight-years-holding-isps-account-latin...
All the best
--
Dr. Robert Mathews, D.Phil.
Principal Technologist & Distinguished Research Scholar - National Security Affairs & Industrial Preparedness
Office of Scientific Inquiry & Applications
University of Hawai'i
As a decent-sized North American ISP, I think I need to totally agree with this post. There simply is not any economically justifiable reason to collect customer data; doing so is expensive, and unless you are trying to traffic shape like a cell carrier, it has zero economic benefit. In our case we do 1:4000 netflow samples and that is literally it; we use that data for peering analytics and failure modeling.
This is true for both large ISPs I've been involved with, and in both cases I would have overseen the policy.
What I see in this thread is a bunch of folks guessing who clearly have not been involved in large eyeball ISP operations.
On Sat, Jun 10, 2023 at 9:46 AM John van Oppen <john@vanoppen.com> wrote:
As a decent-sized North American ISP, I think I need to totally agree with this post. There simply is not any economically justifiable reason to collect customer data; doing so is expensive, and unless you are trying to traffic shape like a cell carrier
They shape? News to me...
has zero economic benefit. In our case we do 1:4000 netflow samples and that is literally it, we use that data for peering analytics and failure modeling.
This is true for both large ISPs I've been involved with and in both cases I would have overseen the policy.
What I see in this thread is a bunch of folks guessing that clearly have not been involved in large eyeball ISP operations.
The smaller (mostly rural) WISPs I work with do not have the time or desire to monetize traffic either! Pretty much all of them have their hands full just solving tech support problems. They do collect extensive metrics on bandwidth, packet loss, latency, SNMP stats of all sorts, airtime, interference, CPU stats, and routing info (common tools are things like UISP, Splynx, and OpenNMS), and they keep amazingly good (lidar, even) maps of the local terrain. If the bigger ISPs are only doing netflow once in a while, no wonder the little WISPs survive.
The ones shaping via libreqos.io now are totally in love[1] with our in-band RTT metrics, as those give them insight into their backhaul behaviors in rain and snow and sleet, instead of relying on out-of-band SNMP, as well as insight into when it is the customer wifi that is the real problem. It is the combination of all these metrics that helps narrow down problems.
But the only monetization that happens is the monthly bill. Most of these cats are actually rather ornery and *very* insistent about protecting their customers' privacy from all comers, and resistant to cloud-based applications in general. There are some bad apples in the WISP world that do want to rate limit Netflix (via DPI) above all else when running low on backhaul, but they are not in my customer base.
[1] We (and they) *are* passionately interested in identifying the characteristics of multiple traffic types and mitigating attacks, and a couple are publishing some anonymized movies of what traffic looks like: https://www.youtube.com/@trendaltoews7143/videos
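For readers wondering what an in-band RTT metric can look like in practice, here is a rough sketch. The message above does not say how the metric is computed, so this assumes the TCP-timestamp matching idea used by passive-ping style tools: remember when a TSval was first seen in one direction and measure how long it takes to be echoed back as a TSecr. The function name, the packet tuple layout, and the toy packets are all hypothetical.

```python
def passive_rtt_samples(packets):
    """Estimate RTTs at an observation point from TCP timestamp options.

    `packets` is an iterable of (time_s, src, dst, tsval, tsecr) tuples.
    This layout is an assumption for illustration, not a real capture format.
    Returns a list of (far_end, rtt_seconds): the time between first seeing
    a TSval heading toward `far_end` and seeing `far_end` echo it back.
    """
    first_seen = {}  # (src, dst, tsval) -> time the value was first observed
    samples = []
    for t, src, dst, tsval, tsecr in packets:
        # Record only the first packet carrying this TSval in this direction.
        first_seen.setdefault((src, dst, tsval), t)
        # Does this packet echo a TSval we saw going the other way?
        sent_at = first_seen.pop((dst, src, tsecr), None)
        if sent_at is not None:
            samples.append((src, t - sent_at))
    return samples


# Toy trace seen at a shaper sitting between a customer and a server.
EXAMPLE_PACKETS = [
    (0.000, "cust", "srv", 100, 0),
    (0.030, "srv", "cust", 500, 100),   # server echoes 100
    (0.034, "cust", "srv", 101, 500),   # customer echoes 500
]

if __name__ == "__main__":
    for far_end, rtt in passive_rtt_samples(EXAMPLE_PACKETS):
        print(f"RTT toward {far_end}: {rtt * 1000:.1f} ms")
```

Because the two directions are measured separately, a box in the middle can in principle tell a slow upstream path from a slow customer-side path, which is the kind of distinction described above between backhaul trouble and customer wifi trouble.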
-----Original Message-----
From: NANOG <nanog-bounces+john=vanoppen.com@nanog.org> On Behalf Of Saku Ytti
Sent: Tuesday, May 16, 2023 7:56 AM
To: Tom Beecher <beecher@beecher.cc>
Cc: nanog@nanog.org
Subject: Re: Do ISP's collect and analyze traffic of users?
I can't tell what large is. But I've worked for enterprise ISPs and consumer ISPs, and none of the shops I worked for had the capability to monetise the information they had. And the information they had was increasingly low resolution. Infrastructure providers are notoriously bad even at monetising their infra.
I'm sure some do monetise. But generally service providers are not interesting and do not have active shareholders, so there is very little pressure to make more money; hence fire sales happen all the time, due to infrastructure increasingly being seen as a liability, not an asset. They are generally boring companies, and internally no one has an incentive to monetise data, as it wouldn't improve their personal compensation. And regulations like GDPR create problems people would rather not solve unless pressured.
Technically, most people started 20 years ago with some netflow sampling ratio, and they still use the same sampling ratio despite many orders of magnitude more packets. This means the share of flows captured used to be orders of magnitude higher than it is today; today only very few flows are seen in very typical applications, and netflow is largely used for volumetric DDoS detection and high-level ingressAS=>egressAS metrics.
Hardware on offer increasingly does IPFIX as if it were sFlow: zero cache, with each sample exported immediately, because you'd need something like 1:100 or finer sampling to have any significant luck in hitting the same flow twice. PTX has stopped supporting flow-cache entirely because of this: at the sampling rate where a cache would do something, the cache would overflow.
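To put rough numbers on the "hitting the same flow twice" point, here is a small sketch. It assumes each packet is sampled independently with probability 1/N, which is a simplification (real hardware may sample deterministically or per interface), and the function name and flow sizes are made up for illustration.

```python
from math import comb


def prob_sampled_at_least(k_min: int, packets: int, one_in_n: int) -> float:
    """Probability that at least `k_min` packets of a flow are sampled.

    Assumes independent random sampling of each packet with probability
    1/one_in_n (a binomial model; an assumption, not a vendor behaviour).
    """
    p = 1.0 / one_in_n
    # Sum the binomial probabilities of seeing fewer than k_min samples.
    below = sum(
        comb(packets, k) * p**k * (1.0 - p) ** (packets - k)
        for k in range(k_min)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Illustrative flow sizes (packets) against the rates mentioned in the thread.
    for rate in (100, 4000):
        for flow_packets in (10, 100, 1000):
            once = prob_sampled_at_least(1, flow_packets, rate)
            twice = prob_sampled_at_least(2, flow_packets, rate)
            print(f"1:{rate:<5} flow={flow_packets:>5} pkts  "
                  f"seen>=1: {once:.4f}  seen>=2: {twice:.6f}")
```

Under this model a flow cache only pays off when the second column is non-trivial, which at 1:4000 requires very long flows; that matches the argument above for exporting every sample immediately.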
Of course there are other monetisation opportunities via mechanisms other than data on the wire, such as DNS.
On Tue, 16 May 2023 at 15:57, Tom Beecher <beecher@beecher.cc> wrote:
Two simple rules for most large ISPs.
1. If they can see it, as long as they are not legally prohibited, they'll collect it.
2. If they can legally profit from that information, in any way, they will.
Now, their privacy policies will always include lots of nice-sounding clauses, such as 'We don't see your personally identifiable information'. This of course allows them to sell 'anonymized' sets of that data, which sounds great, except that, as researchers have proven, it's pretty trivial to scoop up multiple discrete anonymized data sets and cross-reference them to identify individuals. Netflow data may not be as directly 'valuable' as other types of data, but it can be used in the blender too.
Information is the currency of the realm.
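The cross-referencing point above can be illustrated with a toy sketch. Everything here is invented for the example: the field names, the records, and the helper function. It only shows the general idea that two data sets sharing a few coarse attributes can be joined, and that any combination matching exactly one named person is effectively re-identified.

```python
from collections import defaultdict


def link_records(anon_rows, named_rows, keys):
    """Map pseudonyms to names where the shared attributes match uniquely.

    `keys` lists the attributes both data sets share (hypothetical fields).
    """
    index = defaultdict(list)
    for row in named_rows:
        index[tuple(row[k] for k in keys)].append(row["name"])

    matches = {}
    for row in anon_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # unique combination: re-identified
            matches[row["pseudonym"]] = candidates[0]
    return matches


# Entirely made-up toy data.
KEYS = ("zip3", "device", "peak_hour")
ANON = [
    {"pseudonym": "u1", "zip3": "981", "device": "roku", "peak_hour": 21},
    {"pseudonym": "u2", "zip3": "981", "device": "xbox", "peak_hour": 23},
    {"pseudonym": "u3", "zip3": "967", "device": "roku", "peak_hour": 21},
]
NAMED = [
    {"name": "Alice", "zip3": "981", "device": "roku", "peak_hour": 21},
    {"name": "Bob", "zip3": "981", "device": "xbox", "peak_hour": 23},
    {"name": "Carol", "zip3": "967", "device": "roku", "peak_hour": 21},
    {"name": "Dan", "zip3": "967", "device": "roku", "peak_hour": 21},
]

if __name__ == "__main__":
    print(link_records(ANON, NAMED, KEYS))
```

In this toy case two of the three pseudonyms resolve to a single name, while the third stays ambiguous because two people share the same attributes; adding one more shared attribute would usually be enough to split them.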
On Mon, May 15, 2023 at 7:00 PM Michael Thomas <mike@mtcc.com> wrote:
And maybe try to monetize it? I'm pretty sure that they can be compelled to do that, but do they do it for their own reasons too? Or is this way too much overhead to be doing en mass? (I vaguely recall that netflow, for example, can make routers unhappy if there is too much "flow").
Obviously this is likely to depend on local laws but since this is NANOG we can limit it to here.
Mike
-- ++ytti
-- Podcast: https://www.linkedin.com/feed/update/urn:li:activity:7058793910227111937/ Dave Täht CSO, LibreQos
As a decent-sized North American ISP, I think I need to totally agree with this post. There simply is not any economically justifiable reason to collect customer data; doing so is expensive, and unless you are trying to traffic shape like a cell carrier
They shape? News to me...
You can find this in their respective network management disclosures. Most typically it is bitrate shaping of OTT video traffic.
JL
participants (18)
- Dave Phelps
- Dave Taht
- Hank Nussbacher
- Jared Mauch
- Jeroen Massar
- John van Oppen
- Josh Luthman
- Justin Streiner
- Livingood, Jason
- Lou Devictoria
- Mark Tinka
- Matthew Petach
- michael brooks - ESC
- Michael Thomas
- Rishi Panthee
- Robert Mathews (OSIA)
- Saku Ytti
- Tom Beecher