From: Avi Freedman <freedman@netaxs.com>
To: Jack Rickard <jack.rickard@boardwatch.com>
Cc: mohney@access.digex.net; nanog@merit.edu; GeneShklar@keynote.com
Subject: Re: Keynote/Boardwatch Results
Date: Tuesday, July 08, 1997 9:40 PM
It would appear that everyone is pretty smugly satisfied by consensus that the performance series we ran actually measures server performance, and that since all ISPs run weeny home servers this was not "really" a test, flawed methodology, etc.
Avi: "Cheating" is of course encouraged. This isn't an academic test at your local university. We're all out of school now. If you can figure out a way to beat the game, you have ipso facto figured out a way to make the web look faster to end users. As it appears to be, so it is. As Martha Stewart says, that would be a "good thing." It would be a good thing for you and your product line. It would be a good thing for your web site customers. It would be a good thing for end-users viewing such web sites. If the net effect of all this is that all the smart people on the net get mad at me and go figure out a way to make the web look faster, it will be well and good enough for me as well. I had previously agreed NOT to publish IP numbers of the Keynote host machines. Keynote does make this information available on their web site, so I myself was a little bemused by the request, but I did agree to honor it. In any event, someone else has already posted the locations and networks (which we DID publish), along with the IP numbers, here on the list. So you should have them. If mirroring/caching moves the numbers definitively, it then establishes a "real" value to such a technique, and it can be offered to customers at a higher price with some actual data comparing how they will "look" to the world using both the less expensive simple hosting as compared to the more expensive geographic mirroring technique. I personally think this would move the numbers more than anything else that could be done, but that's what looks LOGICAL to me, not anything I know or have tested. I am rather convinced that moving a single web site to another location, putting it on larger iron, or using different OS will have very minor impact on the numbers. My own personal theory is a little vaguely formed. But I think the heart of the performance problems lie in a visceral connundrum of the network. It is and should be one network. The perceptual disjuncture I already detect, even among NANOG members, between straight "routes" as viewed by traceroute and ping, and the way data actually moves on the network (at least vis a vis the mental model or "theory" I have of it) is a somewhat shocking part of the problem. I was actually unaware that many of the network engineers actually viewed the world in this way until this exercise. I was even a bit flip about not dealing with ping/traceroute at any level of comparison. Perhaps an article on this is in order. But I think most of it has to do with interconnects between networks, and architectural decisions accummulated over the years based on concepts of what should be "fair" with regards to the insoluble, but ever moronic "settlements" concept and who gets value from whom based on what. If decisions had been more based on optimizing data flows, and less on whose packets transit who's backbones and why, performance would have been improved. I don't know how much, but certainly some. When the main thing on the minds of CEOs of networks is preventing anyone from having a "free ride" (ala SIdgemore's theory of optimizing Internet performance by it being owned totally by UUNET), or the relatively mindless routing algorithm of moving a packet to the destination network at the earliest opportunity to make sure "I" am not paying for it's transit, if it goes to "your" location, I suspect performance suffers. My sense is larger numbers of smaller networks, interconnected at ever more granular locations, would be a good thing. 
This will get me in big caca with the "hop counting" mind set, and of course at about 254 hops a minor little problem arises, but I think so nonetheless. Very small ISPs know this viscerally. They all want to multihome to multiple backbones, and have done some work to interconnect among themselves locally.

Savvis actually has a very interesting concept, though it upsets everybody. It kind of upsets me because it makes my head hurt, literally. They've carried it almost to another level of abstraction. If you ponder it long beyond the obvious, it has some interesting economic consequences. Checkbook NAPs lead to an inverted power structure where the further away you move from centralized networks such as internetMCI and Sprint, by blending layers after the fashion of a winemaker, the better your product becomes and the better apparent connectivity your customers have. The head-hurting part is that if you extend this infinitely, we would all wind up dancing to the tune of a single dialup AOL customer in Casper, Wyoming somewhere in the end. But there is a huge clue in here somewhere.

In all cases, the Savvis numbers were better than the UUNET, Sprint, or internetMCI numbers individually. Would it then be true that if there were three Savvises, each aggregating three networks with a private NAP matrix, and I developed a private NAP matrix using the three Savvis-level meshes, my performance would be yet better again? And what if Savvis opened the gates and allowed UUNET end users to connect to Sprint IP Services web sites transiting the Savvis network?

More vaguely: if you have four barrels of wine, one a bit acidic, one a bit sweet, one a bit oaky, and one a bit tannic, and you blend them all together, it would appear that you would have a least-common-denominator wine that is acidic, sweet, oaky, and tannic. You don't. You get a fifth wine that is infinitely superior to the sum of the parts or any one component barrel. It is an entirely "new thing." This is sufficiently true that it is, in almost all cases, how wines are made now. Are networks like wine?

Jack Rickard

==============================================================
Jack Rickard                          Boardwatch Magazine
Editor                                8500 West Bowles Ave
jack.rickard@boardwatch.com           Littleton, CO 80123
(303)973-6038 voice                   (303)973-3731 fax
http://www.boardwatch.com
==============================================================
I corresponded with Doug the Hump at Digex about this. I've liked this guy since I first met him, largely because he's funny and doesn't take himself too seriously. He's got a yen for black helicopters that still has me in stitches.
Humph. Doug the Hump indeed. Well, Alex @ our shop wants an Apache Helicopter for our NOC as well. I'm laughing because I'm not sure anyone's called him that before...
Anyway, I'm thinking of putting www.netaxs.net on one of our core routers :)
Think that'd help?
Actually, what we'd do is make it a loopback interface on all of our core routers, and thus you'd hit whichever core router is closest to the querying machine, bypassing much of the network.
Hmm. Have to try that one out.
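One rough way to check whether that buys anything (a sketch, not anything that was actually run): time a bare TCP connect to the site from a few outside vantage points before and after the change. If the loopback trick works, the connect time seen from a given probe should drop toward the round trip to its nearest core router. Port 80, the timeout, and the sample count below are assumptions.

# Rough sketch: time the bare TCP connect (the SYN/SYN-ACK exchange) to a
# host, with DNS resolved up front so lookup time isn't counted. Run from
# an outside vantage point before and after the loopback change and compare.
import socket
import time

HOST = "www.netaxs.net"   # the site discussed above
PORT = 80                 # assumed: plain HTTP
SAMPLES = 5

def connect_time(host=HOST, port=PORT, timeout=5.0):
    ip = socket.gethostbyname(host)          # resolve once; exclude DNS from the timing
    start = time.monotonic()
    s = socket.create_connection((ip, port), timeout=timeout)
    elapsed = time.monotonic() - start
    s.close()
    return elapsed

if __name__ == "__main__":
    times = sorted(connect_time() for _ in range(SAMPLES))
    print("connect times (ms):", ", ".join("%.1f" % (t * 1000) for t in times))
    print("best: %.1f ms" % (times[0] * 1000))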
Anyway, I have to take a look at some of the test sites (saw some of them listed in the new Directory) and see if I can figure out some of the topology of the testing.
Jack - could you put up IPs or whatnot of the sources (the test sites) for people to "tune" and test to?
Avi
I guess along these lines the following question came up. If this was supposed to be an end-to-end user performance idea, why were backbone provider sites being hit instead of sites more typical end users would be using? Say, a major search engine? It strikes me that the article's results were slanted to make a comment on backbones, not user-viewed performance, when the test has been argued to be measuring the latter.

DISCLOSURE: Then again, we do a god-awful amount of web traffic and like people looking at "real world" performance over any particular path through any particular cloud.

-Deepak.

On Wed, 9 Jul 1997, Craig A. Huegen wrote:
On Wed, 9 Jul 1997, Jack Rickard wrote:
Before you start with your claims that I have something to lose, Jack, you should realize that I am an independent consultant and work for none of the people in the study.

==>what looks LOGICAL to me, not anything I know or have tested. I am rather
==>convinced that moving a single web site to another location, putting it on
==>larger iron, or using different OS will have very minor impact on the
==>numbers.

You may be convinced because of the theory you've developed to match the flawed methodology with which the tests were performed. However, I ran some tests of my own to measure the connect delays on sites.
Here's the average for 200 web sites that were given to me when I polled some people for their favorite sites (duplicates excluded):
(because in a lot of cases we're talking milliseconds, percentage is not really fine enough, but this was to satisfy personal curiosity)
SYN -> SYN/ACK time (actual connection)                         22%
    Web browser says "Contacting www.website.com..."

SYN/ACK -> first data (web server work--                        78%
    getting material, processing material)
    Web browser says "www.website.com contacted, waiting for response"
Note that this didn't include different types of content. But it *did* truly measure one thing--that the delay caused by web servers is considerably higher than that of "network performance" (or actual connect time).
And the biggest beef is that you claimed Boardwatch's test was BACKBONE NETWORK performance, not end-to-end user-perception performance. You threw in about 20 extra variables that cloud exactly what you were measuring, not to mention completely misrepresenting what you actually measured.
/cah
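For what it's worth, the two-phase split quoted above is easy to reproduce with a plain socket: time the connect separately from the wait between sending the request and the first byte of the reply. The sketch below is a reconstruction of the idea, not the harness Craig actually used; the placeholder site list, the bare HTTP/1.0 request, and the timeout are assumptions.

# Sketch of the two-phase timing described above: TCP connect time
# (roughly SYN -> SYN/ACK) versus time from sending the request to the
# first byte of the response (server work plus one more round trip).
# A reconstruction of the idea, not the original test harness.
import socket
import time

def time_site(host, port=80, timeout=10.0):
    ip = socket.gethostbyname(host)               # DNS kept out of both phases

    t0 = time.monotonic()
    s = socket.create_connection((ip, port), timeout=timeout)
    connect = time.monotonic() - t0               # "Contacting www.website.com..."

    s.sendall(b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
    t1 = time.monotonic()
    s.recv(1)                                     # block until the first response byte
    first_data = time.monotonic() - t1            # "contacted, waiting for response"
    s.close()
    return connect, first_data

if __name__ == "__main__":
    sites = ["www.example.com"]                   # substitute your own list of favorite sites
    for host in sites:
        c, d = time_site(host)
        total = c + d
        print("%-24s connect %6.1f ms (%2.0f%%)  first data %7.1f ms (%2.0f%%)"
              % (host, c * 1000, 100 * c / total, d * 1000, 100 * d / total))

Averaging the two shares over a couple hundred sites would give a table along the lines of the 22%/78% one above.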
SYN -> SYN/ACK time (actual connection)                         22%
    Web browser says "Contacting www.website.com..."

SYN/ACK -> first data (web server work--                        78%
    getting material, processing material)
    Web browser says "www.website.com contacted, waiting for response"
Note that this didn't include different types of content. But it *did* truly measure one thing--that the delay caused by web servers is considerably higher than that of "network performance" (or actual connect time).
Urm, maybe I'm missing something here, but take an incredibly simplistic model where you have a probability p of losing any packet: the 3-way handshake gets through without a drop with probability (1-p)^3, while the latter phase gets through with probability (1-p)^(2 * number of packets required for first data). With slow start etc. there are bound to be more than two packets back before it starts processing the response, so the latter is always going to have a higher chance of failing.

Now add the fact that with technology such as ATM it is more likely large packets will be dropped than small ones (with a given cell loss probability), and be careful to remember all that good stuff at the last-but-one NANOG about broken client stacks, and I think you might find the above is a "non measurement".

I *think* (and am not sure) that if you have a proxy set up, you always get the latter once you have connected to the proxy.

Oh, and to skew the figures in the other direction, doesn't the first prompt come up while the DNS lookup is being done?

Alex Bligh
Xara Networks
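To put rough numbers on that argument (purely illustrative; the 2% loss rate and the packet counts below are invented):

# Back-of-the-envelope version of the argument above: with per-packet loss
# probability p, a phase that needs n packets to get through cleanly stalls
# on a retransmission with probability 1 - (1-p)**n, so the phase with more
# packets (the one charged to "server time" above) stalls more often.
p = 0.02                                   # assumed loss rate, for illustration only

def stall_probability(n_packets):
    return 1 - (1 - p) ** n_packets

handshake = stall_probability(3)           # SYN, SYN/ACK, final ACK
first_data = stall_probability(8)          # request plus several response packets (count assumed)

print("P(stall during 3-way handshake): %4.1f%%" % (100 * handshake))
print("P(stall before first data):      %4.1f%%" % (100 * first_data))

Whichever phase a drop lands in eats the retransmission timeout, and it lands in the longer phase more often, which is the bias being described.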
On Wed, 9 Jul 1997, Alex.Bligh wrote:

==>ones (with a given cell loss probability), and being careful to remember
==>all that good stuff at the last but one NANOG about broken client stacks,
==>and I think you might find the above is a "non measurement".

It's a rough measurement, and if you'd go so far as to assign a 20% error margin, you'd still see that a web server owns a *significant* piece of the click-to-data time, over 50%. I think that a 20% error margin would be fair for this, provided neither I nor my provider was having network problems at the time. At the time, this was intended as a rough measurement to determine how much time was wasted in waiting for inefficient web servers.

==>I *think* (and am not sure) that if you have a proxy set up, you
==>always get the latter once you have connected to the proxy.
==>
==>Oh, and to skew the figures in the other direction, doesn't the first
==>prompt come up while the DNS lookup is being done?

Nope. You'll see "Looking up host www.website.com..." in most browsers. (I didn't use a browser to measure this; those "web browser says" lines were there for reference--a lot of people ask me why it sits there a while after saying "contacted, waiting for response".)

/cah
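The margin arithmetic above, made explicit; since the post doesn't say whether the 20% is meant as relative error or percentage points, both readings are shown, and either leaves the server share above half:

# Even a generous error margin leaves the server with most of the
# click-to-data time. Two readings of "a 20% error margin":
server_share = 0.78
print("20 percentage points off: %.0f%%" % (100 * (server_share - 0.20)))   # about 58%
print("20%% relative error:      %.0f%%" % (100 * server_share * 0.80))     # about 62%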
Grumble. This is really starting to grate. Wasn't the point of this ~study to find the best value for your dollar when buying leased lines? What exactly does putting web servers in your POP have to do with backbone performance? Furthermore, exactly how would you economically scale and support such a spaghetti operation?

It seems to me the unintended goal of the study was to find access providers on whose networks web sites have a snappier "user experience". Am I the only one here who sees this as measuring apples to judge oranges? Will someone please point me to the scientific method being used here, 'cause I sure as hell can't see it.

-- JMC

On Wed, 9 Jul 1997, Jack Rickard wrote:
"Cheating" is of course encouraged. This isn't an academic test at your local university. We're all out of school now. If you can figure out a way
participants (5)

- Alex.Bligh
- Craig A. Huegen
- Deepak Jain
- Jack Rickard
- Jesse Caulfield