On 7/30/2012 1:42 PM, Patrick W. Gilmore wrote:
> I'm sorry Panashe is upset by this rule. Interestingly, "Your search - Panashe Flack nanog - did not match any documents." So my guess is that a post from that account has not happened before, meaning the post was moderated yet still made it through.
>
> Has anyone done a data mining experiment to see how many posts a month are from "new" members? My guess is it is a trivial percentage.
Ignoring many harder-to-determine things, like "who has changed their email address", and reducing it to simple shell commands, I got this:

    for i in `cat ../nanog_archive_index.html | grep txt | cut -f2 -d\"`; do
        wget http://mailman.nanog.org/pipermail/nanog/$i
    done

du -sh: 41M (uncompressed: 100M). That seems small for all the mail since sometime in 2007, but I'd rather use an official archive so people can duplicate the results and refine things.

    grep -h "^From: " * | sort | uniq -c | sort -nr

First of all, I will say Owen is winning by a fair margin:

    1562 From: owen at delong.com (Owen DeLong)
     929 From: randy at psg.com (Randy Bush)
     775 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu)
     688 From: morrowc.lists at gmail.com (Christopher Morrow)
     621 From: jbates at brightok.net (Jack Bates)
     558 From: jra at baylink.com (Jay Ashworth)
     480 From: gbonser at seven.com (George Bonser)
     450 From: patrick at ianai.net (Patrick W. Gilmore)
     446 From: cidr-report at potaroo.net (cidr-report at potaroo.net)

Total count:

    grep -h "^From: " * | wc -l
    54166

    # Totals for contributors with fewer than 10 posts
    for i in 1 2 3 4 5 6 7 8 9; do
        grep -h "^From: " * | sort | uniq -c | sort -nr | grep " $i" | wc -l
    done

    3129
    1111
    552
    319
    208
    157
    131
    103
    94

Total contributors with fewer than 10 posts: 5804
Percentage: 5804/54166 ≈ 10.7% (low-volume contributors relative to total posts).

    # Shows the number of people who've contributed that number of times.
    grep -h "^From: " * | sort | uniq -c | sort -nr | awk '{print $1}' | uniq -c | sort -nr

    # Another interesting thing to look at is posts by month per user (dropping the -h from grep):
    grep "^From: " * | sort | uniq -c | sort -nr

    # Not the most efficient, but tells you who posted the most in a month:
    for i in *; do
        grep "^From: " * | sort | uniq -c | sort -nr | grep $i | head -n 1
    done

    # Per month: single-post contributors / total posts. These numbers can run
    # higher, since someone who posted in a different month is still counted
    # as a new contributor here.
    for i in *; do
        echo -n "$i "
        grep "^From: " $i | sort | uniq -c | sort -nr | grep " 1 " | wc -l | tr '\n' '/'
        grep "^From: " $i | wc -l
    done
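One caveat on the "fewer than 10 posts" loop: grep " $i" matches any uniq -c line that merely contains that substring, so i=1 also picks up senders with 10-19, 100-199 or 1000+ posts (the 1562 line included), and likewise for the other digits. The per-month loop's grep " 1 " is the tighter form. Below is a minimal sketch that compares the count column numerically instead, and also sums the posts those senders actually wrote so the percentage is posts over posts. It assumes the monthly archives are already uncompressed in the current directory. posts_per_sender, senders_with_posts and low_volume_share are hypothetical helper names, not part of Mailman or any tool.

```sh
#!/bin/sh
# Sketch only: assumes the monthly archives were fetched and uncompressed
# into the current directory. Function names are hypothetical helpers.

# posts_per_sender FILE...: one "<count> From: <sender>" line per sender.
posts_per_sender() {
    grep -h "^From: " "$@" | sort | uniq -c
}

# senders_with_posts N FILE...: how many senders have exactly N posts.
# The count column is compared as a number, so N=1 does not also match
# senders with 10, 12 or 1562 posts the way grep " 1" does.
senders_with_posts() {
    swp_n=$1
    shift
    posts_per_sender "$@" | awk -v n="$swp_n" '$1 == n { c++ } END { print c + 0 }'
}

# low_volume_share MAX FILE...: senders with fewer than MAX posts, the posts
# they wrote, total posts, and those posts as a percentage of the total.
low_volume_share() {
    lvs_max=$1
    shift
    posts_per_sender "$@" | awk -v max="$lvs_max" '
        { total += $1 }
        $1 < max { senders++; posts += $1 }
        END {
            pct = (total > 0) ? 100 * posts / total : 0
            printf "%d senders, %d/%d posts (%.1f%%)\n", senders, posts, total, pct
        }'
}
```

For example, senders_with_posts 1 *.txt gives the exact single-post sender count, and low_volume_share 10 *.txt gives the share of all posts written by people with fewer than 10 posts.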
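Patrick's question was how many posts a month come from new members, and the last loop over-counts because it only looks within a single month. A sketch that walks the monthly files in date order and counts, per month, the posts from senders whose first From: line anywhere in the archive falls in that month. It assumes pipermail-style file names such as 2012-July.txt with no directory prefix; chrono_order and new_sender_posts are hypothetical helpers. Like the original commands, it treats the whole From: line as the sender's identity, so someone who changed address still looks new.

```sh
#!/bin/sh
# Sketch only: assumes pipermail-style monthly files named like
# 2012-July.txt, passed without a directory prefix. Hypothetical helpers.

# chrono_order FILE...: print the file names in calendar order
# (a plain * glob would put 2012-April before 2012-January).
chrono_order() {
    for f in "$@"; do printf '%s\n' "$f"; done | awk -F'[-.]' '
        BEGIN {
            split("January February March April May June July August September October November December", m, " ")
            for (i = 1; i <= 12; i++) num[m[i]] = i
        }
        { printf "%s %02d %s\n", $1, num[$2], $0 }' | sort | cut -d' ' -f3-
}

# new_sender_posts FILE...: for each monthly file, given in date order, print
# "<file> <posts from first-time senders>/<total posts>". A sender counts as
# new only in the month of their first From: line anywhere in the input.
new_sender_posts() {
    awk '
        function report() { printf "%s %d/%d\n", file, newp, tot }
        FNR == 1 { if (NR > 1) report(); file = FILENAME; newp = 0; tot = 0 }
        /^From: / {
            sender = substr($0, 7)          # text after "From: "
            tot++
            if (!(sender in first)) first[sender] = FILENAME
            if (first[sender] == FILENAME) newp++
        }
        END { if (NR > 0) report() }
    ' "$@"
}
```

Usage, from the archive directory: new_sender_posts $(chrono_order *.txt). The date sort matters because "new" is only meaningful relative to earlier months.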