An interesting exercise would be to quantify and statistically analyze the following:
Take a set of 1,000 or more residential last-mile broadband customers whose connections are effectively faster than they can use (symmetric 1 Gbps Active Ethernet or similar).
At a 60-second interval, poll SNMP traffic counters from the interfaces facing the customers' demarcs, or directly from the on-premises CPEs.
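Here is a rough sketch of the polling step. SNMP exposes cumulative octet counters, so each 60 s sample is the difference between two polls. At gigabit rates the 32-bit ifInOctets/ifOutOctets counters wrap roughly every 34 s (2^32 bytes × 8 / 10^9 bit/s ≈ 34.4 s). That can happen more than once per poll, so this sketch assumes the 64-bit ifHCInOctets/ifHCOutOctets counters from IF-MIB. The function name and signature are illustrative only.

```python
# Minimal sketch: average bits/s between two SNMP octet-counter samples.
# Assumes 64-bit IF-MIB counters (ifHCInOctets / ifHCOutOctets).

def octets_to_bps(prev_octets: int, curr_octets: int,
                  interval_s: float, counter_bits: int = 64) -> float:
    """Return average bps over the poll interval, tolerating one counter wrap.

    A device reboot also makes the counter go backwards. This function cannot
    tell that apart from a wrap, so resets (e.g. sysUpTime decreasing) should
    be detected and the sample discarded before calling it.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 2 ** counter_bits
    # Modular subtraction gives the correct delta across a single wrap.
    delta_octets = (curr_octets - prev_octets) % modulus
    return delta_octets * 8 / interval_s


if __name__ == "__main__":
    # 750 MB transferred in 60 s is a 100 Mbps average.
    print(octets_to_bps(0, 750_000_000, 60))  # prints 100000000.0
```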
Store that data in InfluxDB or another time series database, keeping the full-resolution samples (no lossy downsampling) for a multi-month period.
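If InfluxDB is the store, each sample could be written as one line-protocol record. The measurement, tag, and field names below (iface_traffic, subscriber, rx_bps, tx_bps) are hypothetical. The sketch also assumes the subscriber tag is already a pseudonymous hex token, so no tag escaping is needed.

```python
# Hypothetical schema: measurement "iface_traffic", tag "subscriber",
# integer fields "rx_bps"/"tx_bps". None of these names come from the post.

def to_line_protocol(subscriber: str, rx_bps: float, tx_bps: float,
                     ts_s: int) -> str:
    """Format one 60 s sample as an InfluxDB line-protocol record."""
    # Tag values containing commas, spaces or '=' would need backslash
    # escaping. We reject them instead, since pseudonymous IDs are hex.
    if any(ch in subscriber for ch in ", ="):
        raise ValueError("subscriber tag must not contain ',', ' ' or '='")
    ts_ns = ts_s * 1_000_000_000  # line protocol defaults to ns precision
    # The trailing 'i' marks integer fields; round() returns an int here.
    return (f"iface_traffic,subscriber={subscriber} "
            f"rx_bps={round(rx_bps)}i,tx_bps={round(tx_bps)}i {ts_ns}")
```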
Anonymize the data so that nothing about the customer's identity, circuit ID, or location can be determined, other than perhaps "GigE customer somewhere in North America". The result should represent a semi-random sample of US/Canadian domestic residential broadband users.
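One way to approach the anonymization step is sketched below. It assumes each polled sample arrives as a dict keyed by a real circuit ID (the field names are hypothetical). The function replaces the ID with a random token and keeps only an allowlist of fields. Pseudonymizing IDs alone may not meet a strict "no possible information" goal, because someone with independent knowledge of one customer's traffic might still recognize that customer's time series.

```python
import secrets


def pseudonymize(records, id_field="circuit_id",
                 keep=("ts", "rx_bps", "tx_bps")):
    """Replace real circuit IDs with random tokens and drop all other fields.

    `keep` is an allowlist: anything not listed (address, MAC, account
    number, ...) is discarded by default rather than removed one by one.
    The ID-to-token mapping exists only inside this call and is never
    written out, so the export cannot be joined back to real circuits.
    """
    mapping = {}
    out = []
    for rec in records:
        real_id = rec[id_field]
        if real_id not in mapping:
            mapping[real_id] = secrets.token_hex(8)  # 64 random bits
        anon = {k: rec[k] for k in keep if k in rec}
        anon["subscriber"] = mapping[real_id]
        out.append(anon)
    return out
```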
Provide that data set to anyone who wishes to analyze it. Useful questions include how much traffic there really is and how bursty it is, day/night traffic patterns, and remote-work traffic patterns during office hours in particular time zones. Additionally, quantify what percentage of users move how much upstream data, and how many come anywhere near maxing out the upstream in brief bursts (people doing full-disk offsite backups of 8 TB HDDs to Backblaze, uploading 4K videos to YouTube, etc.).
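Below is a minimal sketch of per-subscriber burstiness metrics for the questions above. For one subscriber (e.g. an upstream series) it computes the mean, nearest-rank 95th percentile, peak, peak-to-mean ratio, and the share of samples near line rate. It also computes the fraction of users who ever come near maxing out the link. The 90% "near max" threshold and the 1 Gbps capacity are assumptions. Because each sample is a 60 s average, sub-minute bursts are understated, so "brief bursts" here means bursts sustained for most of a polling interval.

```python
import math


def percentile(values, p: int):
    """Nearest-rank percentile (integer p in 1..100) of a non-empty sequence."""
    s = sorted(values)
    rank = max(1, (p * len(s) + 99) // 100)  # integer ceil(p * n / 100)
    return s[rank - 1]


def summarize(series_bps, capacity_bps=1_000_000_000, near_max=0.9):
    """Burstiness summary for one subscriber's series of 60 s averages."""
    mean = sum(series_bps) / len(series_bps)
    peak = max(series_bps)
    threshold = near_max * capacity_bps
    return {
        "mean_bps": mean,
        "p95_bps": percentile(series_bps, 95),
        "peak_bps": peak,
        # An idle subscriber (mean 0) has no meaningful ratio.
        "peak_to_mean": peak / mean if mean else math.nan,
        "frac_samples_near_max":
            sum(v >= threshold for v in series_bps) / len(series_bps),
    }


def share_of_users_near_max(per_user, capacity_bps=1_000_000_000,
                            near_max=0.9):
    """Fraction of subscribers with at least one sample near line rate."""
    threshold = near_max * capacity_bps
    hits = sum(any(v >= threshold for v in s) for s in per_user.values())
    return hits / len(per_user)
```

Consider a series of nineteen minutes at 10 Mbps and one minute at 950 Mbps. The 95th percentile stays at 10 Mbps, while the peak and frac_samples_near_max expose the burst. That is why the peak-based metrics matter for the backup/upload question.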
I first considered doing something similar with NetFlow data on a per-CPE basis, but that has far more worrisome privacy and PII implications than simple raw bps interface data. Presumably, NetFlow (or Kentik, etc.) data for CDN traffic and other per-AS downstream traffic would not be difficult to anonymize sufficiently, provided it is collected at an aggregation router that exclusively serves a block of a few thousand residential symmetric gigabit customers.
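For the aggregate NetFlow idea, here is a sketch of one possible approach. It sums bytes per source AS for each time bucket and discards customer addresses. Any AS that reached fewer than a minimum number of distinct customers is folded into an "other" bucket, so that a rarely used network cannot point at an individual subscriber. The (src_as, dst_ip, bytes) record shape and the min_customers threshold of 10 are assumptions, not features of any particular collector or of Kentik.

```python
from collections import defaultdict


def aggregate_by_src_as(flows, min_customers=10):
    """Aggregate one time bucket of flow records into bytes per source AS.

    flows: iterable of (src_as, dst_ip, bytes) tuples (hypothetical shape).
    Customer addresses are only held in memory to count distinct customers
    per AS; they never appear in the returned result.
    """
    bytes_by_as = defaultdict(int)
    customers_by_as = defaultdict(set)
    for src_as, dst_ip, nbytes in flows:
        bytes_by_as[src_as] += nbytes
        customers_by_as[src_as].add(dst_ip)

    result = {}
    other = 0
    for asn, total in bytes_by_as.items():
        if len(customers_by_as[asn]) >= min_customers:
            result[asn] = total
        else:
            # Too few distinct customers: reporting this AS could identify one.
            other += total
    if other:
        result["other"] = other
    return result
```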