Re: outages, quality monitoring, trouble tickets, etc
On Mon, 27 Nov 1995, Jon Zeeff wrote:
Being in the web hosting business, we measure our own "availability" and that of others. The top providers do 99.9%. The median in our sample group is 98.5%. That's about 15 times as much downtime, or roughly 11 hours/month.
It's amazing how many of the companies in the low 98s claim 99.9%.
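The arithmetic behind those figures can be checked with a short sketch. It assumes a 30-day month (720 hours), which the post does not state; the function name and constant are illustrative only.

```python
# Assumption: a "month" is 30 days; the original post does not say.
HOURS_PER_MONTH = 30 * 24


def downtime_hours(availability_pct: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Hours of downtime implied by an availability percentage over a period."""
    return (100.0 - availability_pct) / 100.0 * period_hours


top = downtime_hours(99.9)     # top providers
median = downtime_hours(98.5)  # median of the sample group

print(f"top providers: {top:.2f} hours/month down")
print(f"median:        {median:.1f} hours/month down")
print(f"ratio:         {median / top:.0f}x")
```

Under that assumption the median works out to about 10.8 hours of downtime per month, 15 times the figure for a 99.9% provider, which matches the "15 times" and "11 hours/month" quoted above.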
I'd be terribly interested to know how you obtained these figures...we do web hosting services as well. I had one of our clients complain angrily for weeks that his web site was frequently "down" because he couldn't get to it from AOL. I had to sit him down and show that his site was operational and accessible from a dozen other sites to convince him that AOL was the exception, and that our connectivity and server reliability were not to blame.
I've seen several providers claim "it's AOL's fault" because the provider themselves didn't properly set up an RADB entry so the ANS network would carry their packets. Given that AOL is on the other side of the ANS network, that could be a big problem. So, just sitting down and showing the site is accessible from a dozen other sites proves nothing about it not being the provider's problem -- it helps to show that it works in some cases though.
Ed Morin
Northwest Nexus, Inc.
_________________________________________________________________________
Ed Morin edm@halcyon.com
Northwest Nexus - Professional Internet Services
Bellevue, WA USA Voice: 206 455-3505
Web: http://www.halcyon.com/ Info: info@halcyon.com
On Tue, 28 Nov 1995, Ed Morin wrote: [AOL unreachability]
I've seen several providers claim "it's AOL's fault" because the provider themselves didn't properly set up an RADB entry so the ANS network would carry their packets. Given that AOL is on the other side of the ANS network, that could be a big problem. So, just sitting down and showing the site is accessible from a dozen other sites proves nothing about it not being the provider's problem -- it helps to show that it works in some cases though.
Hmm. I was under the impression that our maintainer object (NETRAIL-NOC), AS object (AS4006), and route object (205.215.0.0/18, containing the server in question) were sufficient. Do I need to go through some additional motions to satisfy ANS? Also, this seems to only be an intermittent problem (I would think that ANS not carrying our packets would result in complete unreachability).
// Matt Zimmerman Chief of System Management NetRail, Inc. // mdz@netrail.net sales@netrail.net // (703) 524-4800 [voice] (703) 524-4802 [data] (703) 534-5033 [fax]
In message <Pine.LNX.3.91.951129134921.25369D-100000@netrail.net>, Matt Zimmerman writes:
On Tue, 28 Nov 1995, Ed Morin wrote:
[AOL unreachability]
I've seen several providers claim "it's AOL's fault" because the provider themselves didn't properly set up an RADB entry so the ANS network would carry their packets. Given that AOL is on the other side of the ANS network, that could be a big problem. So, just sitting down and showing the site is accessible from a dozen other sites proves nothing about it not being the provider's problem -- it helps to show that it works in some cases though.
Hmm. I was under the impression that our maintainer object (NETRAIL-NOC), AS object (AS4006), and route object (205.215.0.0/18, containing the server in question) were sufficient. Do I need to go through some additional motions to satisfy ANS? Also, this seems to only be an intermittent problem (I would think that ANS not carrying our packets would result in complete unreachability).
// Matt Zimmerman Chief of System Management NetRail, Inc. // mdz@netrail.net sales@netrail.net // (703) 524-4800 [voice] (703) 524-4802 [data] (703) 534-5033 [fax]
The policy toward AS4006 was set to 1:3561 2:1239 based on the advisories for the 3 AS4006 nets that existed when we froze the aut-num. If you add prefixes to AS4006, you don't have to do anything except to make sure to register route objects with the correct origin AS.
For AS that have never registered an AS690 advisory (there were 20 AS covering 59 prefixes in the IRR) we didn't have any policy. For AS that have never registered anything in the IRR, we don't have any import policy and we won't be importing their routes.
We plan to run a perl program to detect new aut-nums and keep md5 sums of prior aut-nums so we can detect changes (assuming the changed field won't get changed). We will be basing any new import policy on the paths seen in the IRR, just sending a notification message to the AS affected when we change things. This will give us as reliable routing as we have now, but less burden on others to tell us how they want us to route towards them.
I'm hoping to be able to catch things we need to change by noting changes to the aut-nums. Right now the tools to do this are not available, so we need to trace paths manually. It might be that updating prpaths is about all that is needed.
Curtis
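The aut-num change detection described above (keep md5 sums of prior aut-nums, flag new and changed ones) might look roughly like the following. This is only a sketch in Python, not the perl program mentioned in the message. It assumes the registry dump is plain text with objects separated by blank lines and each aut-num object starting with an `aut-num:` attribute; the function names are hypothetical.

```python
import hashlib
import re


def split_aut_nums(dump: str) -> dict[str, str]:
    """Map each AS key (e.g. 'AS4006') to the text of its aut-num object.

    Assumption: objects are separated by blank lines and an aut-num
    object begins with an 'aut-num:' line.
    """
    objects = {}
    for block in re.split(r"\n\s*\n", dump.strip()):
        lines = [ln.rstrip() for ln in block.splitlines() if ln.strip()]
        if lines and lines[0].lower().startswith("aut-num:"):
            key = lines[0].split(":", 1)[1].strip().upper()
            objects[key] = "\n".join(lines)
    return objects


def md5_sum(text: str) -> str:
    """Hex md5 digest of an object's text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def diff_aut_nums(previous_sums: dict[str, str], dump: str):
    """Compare a new dump against stored sums.

    Returns (new_keys, changed_keys, current_sums); current_sums is
    what would be stored for the next run.
    """
    current = {k: md5_sum(v) for k, v in split_aut_nums(dump).items()}
    new = sorted(k for k in current if k not in previous_sums)
    changed = sorted(
        k for k in current
        if k in previous_sums and previous_sums[k] != current[k]
    )
    return new, changed, current
```

On the first run `previous_sums` would be empty, so every aut-num is reported as new. On later runs only newly registered or edited objects show up, and those are the ones that would need a policy review and a notification to the affected AS.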
On Thu, 30 Nov 1995, Curtis Villamizar wrote:
The policy toward AS4006 was set to 1:3561 2:1239 based on the advisories for the 3 AS4006 nets that existed when we froze the aut-num. If you add prefixes to AS4006, you don't have to do anything except to make sure to register route objects with the correct origin AS.
The prefix in question was one of these three, and our networks seem to be talking just fine. The fact that our system also sends and receives hundreds of messages to/from AOL customers every day would seem to suggest further that the original problem was/is with AOL. Of course, the fact that a good percentage of their servers don't respond to pings from here makes it difficult to isolate when this is happening. Are they just broken in this aspect, or is this another symptom of a connectivity problem between us? (I just tried this from another location outside our network, and trying to ping a.mx.aol.com produced a _segfault_ (Solaris box)...what's going on here?)
// Matt Zimmerman Chief of System Management NetRail, Inc. // mdz@netrail.net sales@netrail.net // (703) 524-4800 [voice] (703) 524-4802 [data] (703) 534-5033 [fax]
In message <Pine.LNX.3.91.951130161916.15338O-100000@netrail.net>, Matt Zimmerman writes:
On Thu, 30 Nov 1995, Curtis Villamizar wrote:
The policy toward AS4006 was set to 1:3561 2:1239 based on the advisories for the 3 AS4006 nets that existed when we froze the aut-num. If you add prefixes to AS4006, you don't have to do anything except to make sure to register route objects with the correct origin AS.
The prefix in question was one of these three, and our networks seem to be talking just fine. The fact that our system also sends and receives hundreds of messages to/from AOL customers every day would seem to suggest further that the original problem was/is with AOL. Of course, the fact that a good percentage of their servers don't respond to pings from here makes it difficult to isolate when this is happening. Are they just broken in this aspect, or is this another symptom of a connectivity problem between us? (I just tried this from another location outside our network, and trying to ping a.mx.aol.com produced a _segfault_ (Solaris box)...what's going on here?)
// Matt Zimmerman Chief of System Management NetRail, Inc. // mdz@netrail.net sales@netrail.net // (703) 524-4800 [voice] (703) 524-4802 [data] (703) 534-5033 [fax]
Try running traceroute instead of, or in addition to, ping. How about we take this offline? This may not be a pressing NANOG issue.
Curtis