On Fri, Jul 06, 2012 at 09:19:48AM -0500, Matt Chung wrote:
A former manager of mine once told me you can gauge a person's understanding by the questions they ask, and I personally agree with this statement. Most of us will be able to make a reasonable assessment of the person by listening to the content of their questions. I'm not looking for an immediate resolution, but trying to understand the thought process of the individual. I feel realistic scenarios provide some insight into the individual's analytical skills.
"A client cannot access the website "http://xyz.com". What do you do to troubleshoot this issue?"
it's blocking icmp echo.. dns works.. with multiple regional dns servers.. the page loads for me.. has a modern tcp/ip stack, probably linux judging by an initial window size of 14600 .. hosted on amazon web services... I'd imagine that they're unlikely to be blocking icmp totally.. and just the echo.. but there's still that possibility... (yeah I know it's just an example)
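For anyone wondering where the 14600 guess comes from: it is exactly ten full-size segments on a 1500-byte MTU, which is consistent with a recent Linux stack but is only a hint, not proof. A rough sketch of the arithmetic in Python, assuming plain 20-byte IPv4 and 20-byte TCP headers with no options (the function names are made up for illustration):

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def segments_in_window(window, mtu=1500):
    """Return how many full segments the window holds, or None if it is
    not an exact multiple of the MSS for this MTU."""
    segments, remainder = divmod(window, mss_for_mtu(mtu))
    return segments if remainder == 0 else None


print(mss_for_mtu(1500))          # 1460
print(segments_in_window(14600))  # 10
```

A window that is not a clean multiple of the MSS returns None, which just means this particular heuristic has nothing to say about it.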
Depending on the candidate, I've seen a variety of answers: 1) "Can you ping the device?" 2) "Can you access the gateway?" 3) "What does the running config look like on the router?" 4) "Is there a firewall in between?"
heh,.. think i've been on the internet too long. i start from the destination site not working and what could be wrong with it.. then work my way back to the client. of course i completely skipped in my thinking that maybe other sites don't work too, and that there could be malware... and i didn't actually try going to the site with anything other than curl... i suppose a big part of that particular problem is figuring out if it's at their end - a greater problem - or an actual problem getting to the site.
I believe these questions may be asked in the right context, provided there is enough information to isolate the issue to the network. However, the statement is devoid of anything useful that would make the network suspect. I would like to hear some questions such as:
"Are other websites accessible? Or is this the only website the client is experiencing issues with?" "Was the website working previously? When did it start happening?" "What does the client see on their screen? Are they getting an error?"
yeah that's a good idea :) my order is probably assuming there may be a more complicated issue, when it could be a simple problem, which actually seems to be quite common from what i've experienced with technical people. oh! the network cable was unplugged!
These questions reflect the person's ability to accurately understand the problem before deep diving into the technical details. From there, you can get more technical. "Client is receiving an HTTP 404 error." Great, rule out network since this is an application layer response...
Some of those types of problems have got a lot more complicated. Like - that could be a transparent proxy caching an HTTP 404... or the web site could be hosted in multiple locations and not syncing between them properly, which could still require some level of debugging.. or someone somehow managed to advertise the host's subnet with a more preferred route, and then doesn't have the content.

Or say someone's decided to do something fancy like give different IPs back from DNS, giving internal IP addresses back to the local farm.. but they've decided to use Amazon DNS servers.. and set them to give IP .. but the customer happens to be using Amazon DNS servers because they're hosting a web site on Amazon, and for some reason thought it'd be a good idea.. and then the internal IP address of course doesn't have the content.

I suppose that's still application level from some points of view. Knowing that doesn't make the site magically work though, or figure out what's causing it.

Also from my experience, I don't tend to find out one website's not working unless it is working on/off or working for other people, and the most common situation seems to be some kind of load balancing with one mirror not working, so I find it helpful to check from a few locations. Sometimes doing DNS lookups on multiple DNS servers, seeing a different IP, and using curl -x <ip>:80 seems to be the easiest way to check this.

But that's assuming a transparent proxied network, which tends to mean MTU issues show up instead as "banking web sites aren't working". Which can show up sometimes when people change routers to one not doing MSS-clamping, and operate at 1492 MTU... The issue is significant enough, and the problem hard enough for helpdesk type people to diagnose, that it's common for MSS clamping to be set at a network level for networks with a significant number of people with < 1500 MTU.

Ben.
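To put numbers on the MSS clamping point, here is a minimal sketch in Python. It assumes IPv4 and TCP headers of 20 bytes each with no options, and the function names are hypothetical, not from any router's configuration language:

```python
def clamped_mss(path_mtu, ip_header=20, tcp_header=20):
    """MSS a clamping router would rewrite into the SYN so that full-size
    segments still fit within path_mtu (header sizes are assumptions)."""
    return path_mtu - ip_header - tcp_header


def needs_clamping(advertised_mss, path_mtu):
    """True if segments of advertised_mss bytes would exceed path_mtu
    once the IP and TCP headers are added."""
    return advertised_mss > clamped_mss(path_mtu)


print(clamped_mss(1492))           # 1452
print(needs_clamping(1460, 1492))  # True
print(needs_clamping(1460, 1500))  # False
```

So a host on a 1500-byte LAN advertises 1460, which is 8 bytes too many for a 1492-byte link. Without clamping, full-size packets only get through if fragmentation or path MTU discovery works end to end.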