On Thu, Oct 19, 2006 at 07:18:01PM -0400, Deepak Jain wrote:
1 NOC (that will remain nameless even though they should really be shamed) said the following in response to the question -- when we were trying to diagnose +50ms jumps in their latency within a single POP.
Q: "As part of this, can you tell me why your router is prohibiting packets being sent to our interface?"
A: "The reason you cannot hit your interface is it is blocked for security reasons."
I've heard this response before, albeit not from the company you're referring to. The most common response I hear -- which is at this point a template -- is "Well, you can't rely on traceroute because of ICMP prioritisation".

When you start to explain how traceroute actually works (both ICMP-based and UDP-based, which still relies on ICMP responses, of course!), and that ICMP prioritisation should only affect replies from the IP on which the router itself listens (and not hops beyond it or the destination), most NOCs fire back with another template of their choice ("We're not aware of any issues", "No that's incorrect", "I'll check with engineers", or the ever-so-amusing "traceroute and ping aren't reliable, you need to use a different method of testing") -- but the most common is: "Can you send us traceroutes of what you're seeing?"

"But! You just said..... argh!!"

I happen to work in a NOC, and I have never -- nor will I ever -- spouted off that template response. When a client or customer calls about something, I give them the benefit of the doubt. If it turns out later that they were wrong, they at least (hopefully) learned something. I just happen to believe in getting things done, rather than arguing against doing investigative work. When an issue occurs, look at it quickly, not 24-48 hours later.

I am absolutely fine with ICMP being prioritised last, but that scenario raises more questions: "So ICMP is prio'd last, which would mean the router is busy processing other packets, which could mean your router is over-utilised either CPU-wise or iface-wise, since we're seeing 250ms at your hop and beyond".

48 hours later, a network technician looks at the router and either finds absolutely nothing ("It must've gone away on its own") or finds something conclusive (but only when the issue re-occurs, is still occurring, or if they keep historic data).
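To put the argument above in concrete terms: if a router only deprioritises the ICMP replies it generates itself, loss seen at that hop should not show up at later hops. A rough Python sketch of that check follows. The names `Hop` and `classify_hops`, the 5% threshold and the sample addresses (taken from documentation ranges) are all assumptions made up for illustration, not output from any real tool, and the heuristic assumes probes to later hops travel through the same routers.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    index: int       # hop number in the trace
    host: str        # address that answered, or "???" if nothing did
    loss_pct: float  # percentage of probes that got no reply from this hop


def classify_hops(hops, threshold=5.0):
    """Label each hop as 'ok', 'silent', 'reply-only' or 'suspect'.

    'reply-only' means a later hop answers cleanly, so packets are being
    forwarded through this hop and only its own ICMP replies go missing.
    'suspect' means no later hop answers cleanly, so the loss may be real.
    The threshold is an arbitrary illustrative value.
    """
    labels = {}
    for i, hop in enumerate(hops):
        if hop.loss_pct >= 100.0:
            labels[hop.index] = "silent"
        elif hop.loss_pct < threshold:
            labels[hop.index] = "ok"
        else:
            # Loss figures of later hops that respond at all.
            later = [h.loss_pct for h in hops[i + 1:] if h.loss_pct < 100.0]
            if later and min(later) < threshold:
                labels[hop.index] = "reply-only"
            else:
                labels[hop.index] = "suspect"
    return labels


# Hypothetical trace, not real measurements.
sample = [
    Hop(1, "10.0.0.1", 0.0),
    Hop(2, "???", 100.0),
    Hop(3, "192.0.2.1", 30.0),
    Hop(4, "192.0.2.2", 0.0),
    Hop(5, "198.51.100.1", 50.0),
    Hop(6, "198.51.100.2", 50.0),
]

if __name__ == "__main__":
    for index, label in classify_hops(sample).items():
        print(index, label)
```

With this made-up trace, hop 3 comes out as "reply-only" because hop 4 answers cleanly, while hops 5 and 6 come out as "suspect" because the loss carries through to the end of the path. That second pattern is the one worth escalating.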
Did I miss the conspiracy?? I know my membership dues are all paid up. If this has been going on a while, I apologize; I guess I've just noticed the trend in our shift reports.
Yes, this has been going on for a while. Well, not ICMP_UNREACH_NET (from your example) but general ICMP prioritisation or the explicit dropping of either ICMP_TIMXCEED (traceroute) or ICMP_ECHOREPLY (ping).

A real-life example, from my own (residential) ISP. Try to imagine reporting an issue at hop 6 to a technician (who will always insist the problem is somewhere prior).

Here's an example of a working network (no sarcasm; I'm serious!):

 1. 192.168.1.1        0.0%   30   30    0.5    0.5    0.5    0.6
 2. ???              100.0    30    0    0.0    0.0    0.0    0.0
 3. 68.87.198.129      0.0%   30   30    8.5    9.9    7.5   21.1
 4. 68.87.192.34      30.0%   30   21    9.2   11.9    9.2   20.5
 5. 68.87.226.134     66.7%   30   10   10.9   12.9   10.2   25.9
 6. 12.116.188.13      0.0%   30   30   10.6   12.6   10.1   25.0
 7. 12.123.12.126      0.0%   30   30   12.9   12.3   10.2   15.3

And an example of when things are broken:

 1. 192.168.1.1        0.0%   30   30    0.5    0.5    0.4    0.6
 2. ???              100.0    30    0    0.0    0.0    0.0    0.0
 3. 68.87.198.129      0.0%   30   30   12.7   12.7    8.1   28.4
 4. 68.87.192.34      20.0%   30   24   13.4   11.5    9.5   14.5
 5. 68.87.226.134     96.7%   30    1   12.6   12.6   12.6   12.6
 6. 12.116.188.13     50.0%   30   15   15.1   11.8   10.3   15.1
 7. 12.123.12.122     50.0%   30   15   11.6   17.5   11.1   60.5

Since I'm not a network administrator, I'll ask point blank: why exactly do your netadmins filter and rate-limit ICMP like this, and what are you gaining from it? Most kiddies stick with pure TCP or UDP these days -- the goal is to saturate the pipe, not cause a literal service DoS (e.g. crashing Apache, etc.).

Additionally, I'll ask another question: exactly what tool are NOCs (or even network administrators) supposed to use to diagnose network path problems at layers 3 and 4?

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |