It would seem that the more copies, the better. Chunking this data up and distributing it as .torrent files may be a way to both (a) ensure the integrity of the data and (b) add another way to ensure that enough copies are being replicated (the initial seeders would hopefully retain the data for as long as possible)...
On Fri, Dec 16, 2016 at 12:24 PM, Ken Chase <math@sizone.org> wrote:
The University of Toronto's Robarts Library is hosting an all-day party tomorrow for people to surf and help identify datasets, survey them and get their sizes and details, authenticate copies, etc.
fb event: https://www.facebook.com/events/1828129627464671/
/kc
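As a rough sketch of the integrity half of the .torrent suggestion at the top of the thread: BitTorrent v1 splits content into fixed-size pieces and records the SHA-1 hash of every piece in the .torrent metainfo, so any peer can detect a corrupted or altered piece. The Python below only computes and checks such per-piece hashes; it does not build a .torrent file, and the 256 KiB piece size and the function names are assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

# Assumed piece size for illustration; real torrent creators pick their own
# (commonly a power of two).
PIECE_SIZE = 256 * 1024


def piece_hashes(path: Path, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Split a file into fixed-size pieces and return the SHA-1 digest of each.

    BitTorrent v1 stores the concatenation of these 20-byte digests in the
    metainfo; this sketch only computes them.
    """
    hashes = []
    with path.open("rb") as f:
        while True:
            piece = f.read(piece_size)
            if not piece:
                break
            hashes.append(hashlib.sha1(piece).digest())
    return hashes


def bad_pieces(path: Path, expected: list[bytes], piece_size: int = PIECE_SIZE) -> list[int]:
    """Return indices of pieces that differ from the expected hashes."""
    actual = piece_hashes(path, piece_size)
    bad = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    # Pieces present in only one of the two lists (truncated or padded file) are bad too.
    bad.extend(range(min(len(actual), len(expected)), max(len(actual), len(expected))))
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = Path(d) / "dataset.bin"
        p.write_bytes(b"x" * 1000)
        reference = piece_hashes(p, piece_size=256)  # 4 pieces: 256+256+256+232 bytes
        data = bytearray(p.read_bytes())
        data[300] = ord("y")  # flip one byte inside piece 1
        p.write_bytes(bytes(data))
        print(bad_pieces(p, reference, piece_size=256))  # [1]
```

Piece hashes only prove that a copy matches the .torrent it was checked against; whether that .torrent is itself authentic is the separate signing problem raised in Antonios's message.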
On Fri, Dec 16, 2016 at 06:42:46PM +0200, DaKnOb said:
We are currently working on a scheme to authenticate and verify the integrity of the data. Datasets in https://climate.daknob.net/ are compressed into .tar.bz2 archives and then hashed using SHA-256. The final file containing all the checksums is then signed using a set of PGP keys.
We are still working on a viable way to verify the authenticity of files before there are tons of copies lying around, and there's a working group in the Slack team I linked previously where your input is much needed!
Thanks, Antonios
On 16 Dec 2016, at 18:30, Ken Chase <math@sizone.org> wrote:
Surfing through the links - any hints on how big these datasets are? Everyone's got a few TB to throw at things, but fewer of us have spare PB to throw around.
There are some scattered size numbers on the Google Doc sheet (hundreds of TB for the Landsat archive seems credible), and there's one number that destroys the sheet's credibility: 100000000000 GB (100 EB) for the EPA archive.
The other page has many 'TBA' entries for size.
Not sure what level of player one needs to be to serve a useful segment of these archives. I realize some of the datasets are tiny (<1 GB), but which ones are most important relative to their size (i.e. the win-per-byte ratio) isn't indicated. (I know it's early days.)
Also, I hope they've SHA512'd the datasets for authenticity before all the myriad copies being flung about are 'accused' of being manipulated 'to promote the climate change agenda' yadda.
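A minimal Python sketch of the pipeline Antonios describes (pack each dataset as .tar.bz2, hash the archive, collect the digests in one checksum file) might look like the following. The function names and the `SHA256SUMS` file name are hypothetical, the `<digest>  <filename>` line layout is borrowed from the `sha256sum` convention rather than taken from climate.daknob.net, and the PGP signing of the checksum file is not shown. Passing `algo="sha512"` covers Ken's SHA-512 suggestion.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def make_archive(src_dir: Path, out_path: Path) -> Path:
    """Pack a dataset directory into a .tar.bz2 archive."""
    with tarfile.open(str(out_path), "w:bz2") as tar:
        tar.add(str(src_dir), arcname=src_dir.name)
    return out_path


def file_digest(path: Path, algo: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB archives never need to fit in memory."""
    h = hashlib.new(algo)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(archives: list[Path], manifest: Path, algo: str = "sha256") -> None:
    """Write one '<hexdigest>  <filename>' line per archive."""
    manifest.write_text("".join(f"{file_digest(a, algo)}  {a.name}\n" for a in archives))


def verify_manifest(manifest: Path, algo: str = "sha256") -> dict[str, bool]:
    """Re-hash every file listed in the manifest (looked up next to it); name -> match."""
    results = {}
    for line in manifest.read_text().splitlines():
        expected, name = line.split("  ", 1)
        path = manifest.parent / name
        results[name] = path.exists() and file_digest(path, algo) == expected
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        src = root / "dataset"
        src.mkdir()
        (src / "readings.csv").write_text("year,value\n2016,1.0\n")
        archive = make_archive(src, root / "dataset.tar.bz2")
        manifest = root / "SHA256SUMS"
        write_manifest([archive], manifest)
        print(verify_manifest(manifest))  # {'dataset.tar.bz2': True}
```

Hashing the finished archive (rather than re-creating it later) matters here: two tar runs over the same files can produce different bytes, for example because of stored timestamps, so mirrors should copy and check the exact archive that was hashed.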
Canada: time to step up! (I can't imagine the National Research Council would do so on their mirror site; it would be too much of a gloves-off slap in the face to Trump.)
/kc
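On the sizing questions: the sheet's EPA figure of 100000000000 GB is 10^20 bytes, i.e. 100 EB (0.1 ZB), and a mirror operator with a fixed disk budget could rank datasets by the win-per-byte ratio Ken mentions. The sketch below assumes that some per-dataset importance score exists (the thread notes that none is published yet); the `Dataset` class, its fields and the sample values in the usage are hypothetical. It uses a greedy value-per-byte heuristic, which is simple but not guaranteed to find the best possible set.

```python
from dataclasses import dataclass

# Decimal (SI) prefixes, matching how "TB" and "PB" are used in the thread.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def human_size(n_bytes: float) -> str:
    """Format a byte count using the largest SI prefix that keeps the value >= 1."""
    i = 0
    while n_bytes >= 1000 and i < len(UNITS) - 1:
        n_bytes /= 1000
        i += 1
    return f"{n_bytes:g} {UNITS[i]}"


@dataclass
class Dataset:
    name: str
    size_bytes: int
    value: float  # hypothetical importance score; no such score exists in the thread


def pick_datasets(datasets: list[Dataset], budget_bytes: int) -> list[Dataset]:
    """Greedy knapsack heuristic: take datasets in descending value-per-byte order
    while they still fit in the disk budget. Not guaranteed optimal."""
    chosen, used = [], 0
    ranked = sorted(datasets, key=lambda d: d.value / max(d.size_bytes, 1), reverse=True)
    for ds in ranked:
        if used + ds.size_bytes <= budget_bytes:
            chosen.append(ds)
            used += ds.size_bytes
    return chosen


if __name__ == "__main__":
    print(human_size(100_000_000_000 * 10**9))  # 100 EB
    TB = 10**12
    candidates = [
        Dataset("hypothetical-a", 300 * TB, 9.0),
        Dataset("hypothetical-b", 2 * TB, 3.0),
        Dataset("hypothetical-c", 5 * TB, 1.0),
    ]
    print([d.name for d in pick_datasets(candidates, 10 * TB)])
```

With a 10 TB budget the large archive is skipped even though it has the highest raw score, which is the trade-off the win-per-byte ratio is meant to expose.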
On Fri, Dec 16, 2016 at 06:02:46PM +0200, DaKnOb said:
If you're interested, there's also a Slack team: climatemirror.slack.com
You can find more info about that here:
- https://climate.daknob.net/
- http://climatemirror.org/
- http://www.ppehlab.org/datarefuge
Thank you for your help!
On 16 Dec 2016, at 17:58, Rich Kulawiec <rsk@gsp.org> wrote:
This is a short-term (about one month) project being thrown together in a hurry... and it could use some help. I know that some of you have lots of resources to throw at this, so if you have an interest in preserving a lot of scientific research data, I've set up a mailing list to coordinate IT efforts to help out. Sign up via climatedata-request@firemountain.net or, if you prefer Mailman's web interface, http://www.firemountain.net/mailman/listinfo/climatedata should work.
Thanks, ---rsk
-- Ken Chase - math@sizone.org Guelph Canada
-- Miano, Steven M. http://stevenmiano.com