On Thu, Jul 07, 2016 at 08:32:19PM +0000, Mel Beckman said:
Yes. It indicates that there was never a time when you did not know everything :)
-mel beckman
The issue isn't knowing everything, it's making accusations of issues while you still don't know how much you don't know. (~D. Rumsfeld) -- My customers in a nutshell (they pay to be able to yell about random stuff I guess, and I provide that service!).
The OP didn't make any accusations however, and just asked what was going on (sorry if I sounded harsh in reply). Once, Google having an 8.8.8.8 failure locally on its (anycast?) DNS servers resulted in dozens of calls to us: "your server hosting our site must be down!! Our website isn't working! People are calling us!".
Most of my work in these situations is spent proving it's not our fault. Mtr makes it very hard because it's a very subtle tool, and only gives partial information. (I still think mtr is a killer app though!)
Consider this (fake, example) trace:
 6. 100ge13-1.core1.chi1.he.net        0.0%    10
 7. 100ge14-1.core2.chi1.he.net        0.0%    10
 8. 100ge3-1.core1.sjc2.he.net        30.0%    10
 9. ???
10. UNKNOWN-216-115-101-X.yahoo.com   10.0%    10
11. routerer-ext.ysv.freebsd.org      20.0%    10
12. wfe0.ysv.freebsd.org              30.0%    10
First off, the OP may have asked "whose fault is hop 9, yahoo or HE?" and seen it as an issue. Ignoring that for now, the rest of the packet loss is an issue -- where is the problem though?
This is very tricky - it looks like hop 8 is at fault of course - or is it just dropping ICMP as it's allowed to? How did hop 10 get only 10% loss then if 8 has 30? Is 8 then dropping ~20% (not statistically correct..) of ICMP just cuz it can, and then having a 'real' 10% loss on top of that?
Or is it hop 11? But hop 12 has more PL, so perhaps hop 12 is the issue all along and 8, 10 and 11 are just dropping ICMP? Or it's 8, 11 and 12 doing ~10% each? (not statistically correct.)
Can't say for sure - it's a probabilities game - and being completely correct about it, hop 6 isn't blameless either (just very unlikely to be at fault statistically, though not impossible with only 10 pings per hop - a statistician can calculate it for us).
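To put rough numbers on the "a statistician can calculate it for us" remark, here is a minimal sketch. It assumes each probe is an independent trial with a fixed loss probability, which real paths only approximate, and it uses the Wilson score interval, which is my choice of method and not something from the thread. The name wilson_interval and the 95% level (z = 1.96) are illustrative assumptions.

```python
import math

def wilson_interval(lost: int, sent: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for the true loss rate (z=1.96 is ~95%)."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    p = lost / sent                       # observed loss fraction
    denom = 1 + z * z / sent
    center = (p + z * z / (2 * sent)) / denom
    half = z * math.sqrt(p * (1 - p) / sent + z * z / (4 * sent * sent)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, center - half), min(1.0, center + half)

# Loss counts read off the fake trace above: hop -> probes lost out of 10 sent.
trace = {6: 0, 7: 0, 8: 3, 10: 1, 11: 2, 12: 3}
for hop, lost in trace.items():
    low, high = wilson_interval(lost, 10)
    print(f"hop {hop:2d}: observed {lost}/10, plausible loss {low:.1%} to {high:.1%}")
```

Under these assumptions, the interval for hop 6 (0 of 10 lost) still reaches up to roughly 28%, and the intervals for hops 8, 10, 11 and 12 all overlap, so ten probes cannot separate them. Even this only bounds the loss seen in replies from each hop; it says nothing about whether that loss is ICMP rate limiting or sits on the return path.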
This is why more pings are required to be sure of the situation - I like to run mtr with -i 0.1 -c 100 so it's completed quickly before conditions change. Then you can make a statistically valid pronouncement of where the problem MIGHT BE within a useful confidence interval - however, without the return route we're still largely in the dark as to the actual location of the issue. You can't be '100% sure' with this stuff - technically speaking, it's all 'luck of the draw'.
(Beware: this one time, at band camp, some etherchannel or equiv at HE was showing PL only for specific IPs in any target subnet -- because they were xor'ing the source & target IP to load balance and one channel was wonky. Fun times debugging that one: "WFM from here, what's your issue?")
/kc
--
Ken Chase - ken@heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.
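The load-balancing anecdote above can be sketched as follows. This is an illustration under assumptions, not HE's actual hashing: the member link is picked as (source XOR target) modulo the number of links, and pick_link, affected_targets, the two-link bundle, the choice of which link is bad, and the documentation-range addresses are all made up for the example.

```python
import ipaddress

BAD_LINK = 1  # assumption: member link 1 of the bundle is the wonky one

def pick_link(src: str, dst: str, n_links: int) -> int:
    """Hypothetical hash: XOR the two IPv4 addresses, then take modulo n_links."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return (s ^ d) % n_links

def affected_targets(src: str, subnet: str, n_links: int = 2) -> list[str]:
    """Targets in `subnet` whose traffic from `src` would land on the bad link."""
    net = ipaddress.ip_network(subnet)
    return [str(ip) for ip in net.hosts()
            if pick_link(src, str(ip), n_links) == BAD_LINK]

# Two different vantage points probing the same small target subnet.
print(affected_targets("198.51.100.7", "192.0.2.0/29"))
print(affected_targets("198.51.100.8", "192.0.2.0/29"))
```

With this toy hash the two sources see disjoint sets of broken targets inside the same subnet, which is how one side can report loss to a specific IP while the other honestly answers "WFM from here".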
Ken, I should have made clear I wasn't replying to you. I was replying to Brielle's comment:
Is it bad that the first thing that came to mind is "Oh FFS, another troll"?
-mel beckman
On 7/7/16 3:50 PM, Mel Beckman wrote:
Ken,
I should have made clear I wasn't replying to you. I was replying to Brielle's comment:
Is it bad that the first thing that came to mind is "Oh FFS, another troll"?
I'd never say I was always knowledgeable, but after the thread the other day, and just general stress lately, some reactions are hard not to give in to :)
-- Brielle Bruns The Summit Open Source Development Group http://www.sosdg.org / http://www.ahbl.org