question concerning traceroute?
I am trying to troubleshoot a latency issue for some of our networks, and was wondering about this. Knowing that routing isn't always symmetrical, is it possible for a traceroute to traverse a different reverse path than the path that it took to get there? ...or will it provide a trace of the path the packet took to reach the destination? According to the definition, it should take the same path, but are there any other cases that I should be aware of? Darrell
Darrell Carley wrote:
I am trying to troubleshoot a latency issue for some of our networks, and was wondering about this. Knowing that routing isn't always symmetrical, is it possible for a traceroute to traverse a different reverse path than the path that it took to get there? ...or will it provide a trace of the path the packet took to reach the destination? According to the definition, it should take the same path, but are there any other cases that I should be aware of?
A traceroute shows the outbound route. It's possible for the probe packets to follow one path and the returning ICMP packets to take another path. A looking glass in the AS you're tracing to is a good way to see what the return path is...

marty

--
Close your brown eyes, And lay down next to me. Close your eyes, lay down. 'Cos there goes the fear, Let it go.
"There Goes the Fear" - Doves
I am trying to troubleshoot a latency issue for some of our networks, and was wondering about this. Knowing that routing isn't always symmetrical, is it possible for a traceroute to traverse a different reverse path than the path that it took to get there?
Traceroute sends UDP datagrams and receives ICMP datagrams in order to show you what it shows you. It is possible for the ICMP datagrams to return via a different path than the UDP datagrams took outbound (it is also possible that they will not return).
...or will it provide a trace of the path the packet took to reach the destination?
This is not the "or" case of the question you asked previously. Traceroute will display the path that the UDP datagrams took to get to the destination you specified. No information will be presented about the return path that the ICMP datagrams took.
According to the definition, it should take the same path
This is not a correct assertion.
but are there any other cases that I should be aware of?
The traceroute man page lists a few. Stephen
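To make the mechanism Stephen describes concrete, here is a minimal traceroute sketch in Python: send UDP datagrams with increasing TTLs and read back the ICMP replies. This is an illustration only, not van Jacobson's implementation; the raw socket needs root, port 33434 is just traceroute's customary default, and a real traceroute sends several probes per hop.

    import socket
    import time

    def traceroute(dest, max_hops=30, port=33434, timeout=2.0):
        dest_addr = socket.gethostbyname(dest)
        for ttl in range(1, max_hops + 1):
            # Raw socket to catch the ICMP time-exceeded (or, from the
            # destination itself, port-unreachable) reply.
            recv = socket.socket(socket.AF_INET, socket.SOCK_RAW,
                                 socket.getprotobyname("icmp"))
            recv.settimeout(timeout)
            # Ordinary UDP socket for the outbound probe, with a small TTL.
            send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
            start = time.time()
            send.sendto(b"", (dest_addr, port))
            try:
                _, (hop, _) = recv.recvfrom(1500)
                print(f"{ttl:2d}  {hop}  {(time.time() - start) * 1000:.1f} ms")
            except socket.timeout:
                hop = None
                print(f"{ttl:2d}  *")
            finally:
                send.close()
                recv.close()
            if hop == dest_addr:  # reply came from the target: we're done
                break

    traceroute("example.com")

Note that only the forward path is revealed: the routers that answer are the ones the UDP probes reached, while the return path of the ICMP replies is invisible (it shows up only indirectly, folded into the round-trip times).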
On Thu, Oct 17, 2002 at 07:45:39AM -0700, Stephen Stuart wrote:

Traceroute sends UDP datagrams and receives ICMP datagrams in order to show you what it shows you. It is possible for the ICMP datagrams to return via a different path than the UDP datagrams took outbound (it is also possible that they will not return).

remark: it is also possible for the (forward or reverse) path to change in the middle of the measurement, such that traceroute output would lead you to believe a path that never existed anywhere on the Internet (i.e., one that is not manifested in the current physical Internet), and you would not be able to confirm for sure without asking the contacts for the IP links in question how they're connected.

traceroute is a disconcertingly blunt hammer; that we continue to use it to essentially nail moving jello to a wall says more about us than about anything on the Internet (and is quite the testimony to van, who thought it up and implemented it in a few hours 20 years ago; no one has come up with anything better since.)

(caida has a few hundred gigabytes of traceroute-like output on disk, so it's at least auspicious for the mass storage industry, if not the jello-nailing mission)

k
On Thu, Oct 17, 2002 at 08:43:01AM -0700, k claffy wrote:
remark: it is also possible for the (forward or reverse) path to change in the middle of the measurement, such that traceroute output would lead you to believe a path that never existed anywhere on the Internet (i.e., one that is not manifested in the current physical Internet), and you would not be able to confirm for sure without asking the contacts for the IP links in question how they're connected.
That's true but only if you have compiled traceroute without "--enable-schroedinger" ;-)) -- Arnold
On Thu, 17 Oct 2002, k claffy wrote:
remark: it is also possible for the (forward or reverse) path to change in the middle of the measurement, such that traceroute output would lead you to believe a path that never existed anywhere on the Internet (i.e., one that is not manifested in the current physical Internet), and you would not be able to confirm for sure without asking the contacts for the IP links in question how they're connected.
Although I've only seen it done as part of an April Fools' prank, it is possible to do amazingly evil things to traceroutes, such as making Whitehouse.Gov appear to go through Kremvax.Su. Truth is such an elusive thing. Not only do you need to worry about the network changing while you are measuring it, you also need to worry about whether the network is telling you the truth about what happened to the packet.
traceroute is a disconcertingly blunt hammer; that we continue to use it to essentially nail moving jello to a wall says more about us than about anything on the Internet (and is quite the testimony to van, who thought it up and implemented it in a few hours 20 years ago; no one has come up with anything better since.)
People have come up with other ways of tracing routers through a packet network: snmptrace, IP record route, beacon packets. But they all have limitations compared to Van Jacobson's traceroute. A testament to the power of traceroute is that it's now considered necessary functionality for any data network, not just TCP/UDP/IP networks: OSI traceroute, MPLS traceroute, ATM traceroute, DECNET traceroute, etc.
alex@yuriev.com wrote:
According to the definition, it should take the same path, but are there any other cases that I should be aware of?
According to the definition, it is going to show you the path the packets took from you to the destination, not from the destination back.
Unless you did "-g". -- Arnold
alex@yuriev.com wrote:
According to the definition, it should take the same path, but are there any other cases that I should be aware of?
According to the definition, it is going to show you the path the packets took from you to the destination, not from the destination back.
Unless you did "-g",
Not correct. -g specifies loose source routing on the way *there*, not back. Alex
On Thu, Oct 17, 2002 at 10:58:03AM -0400, alex@yuriev.com wrote:
alex@yuriev.com wrote:
According to the definition, it should take the same path, but are there any other cases that I should be aware of?
According to the definition, it is going to show you the path the packets took from you to the destination, not from the destination back.
Unless you did "-g",
Not correct. -g specifies loose source routing on the way *there*, not back.
Alex, I think the intention was to indicate that you can

    traceroute -g <remote-router-before-host> <your-local-ip>

to get the path to and back. -g requires an argument, obviously.

- Jared

--
Jared Mauch | pgp key available via finger from jared@puck.nether.net
clue++; | http://puck.nether.net/~jared/
My statements are only mine.
According to the definition, it should take the same path, but are there any other cases that I should be aware of?
According to the definition, it is going to show you the path the packets took from you to the destination, not from the destination back.
Unless you did "-g",
Not correct. -g specifies loose source routing on the way *there*, not back.
I think the intention was to indicate that you can traceroute -g <remote-router-before-host> <your-local-ip>
to get the path to and back. -g requires an argument, obviously.
That, obviously, is correct. However, the remote IP in this case is your local IP, so you are still getting a path to a destination. Even more importantly, LSR relies on every router on the forward path between <your-local-ip> and <remote-router-before-host> allowing LSR, which is an invalid assumption. Thanks, Alex
at Thursday, October 17, 2002 3:58 PM, alex@yuriev.com <alex@yuriev.com> was seen to say:
Unless you did "-g",

Not correct. -g specifies loose source routing on the way *there*, not back.

No, you can get both if you ping *yourself* with the actual destination as -g. This gives you both legs of the trip.
Anyone have any idea what really happened:

http://www.boston.com/dailyglobe2/330/science/Got_paper_+.shtml

<snip> It was too late. Somewhere in the web of copper wires and glass fibers that connects the hospital's two campuses and satellite offices, the data was stuck in an endless loop. Halamka's technicians shut down part of the network to contain it, but that created a cascade of new problems. The entire system crashed, freezing the massive stream of information - prescriptions, lab tests, patient histories, Medicare bills - that shoots through the hospital's electronic arteries every day, touching every aspect of care for hundreds of patients. ... The crisis had nothing to do with the particular software the researcher was using. The problem had to do with a system called ''spanning tree protocol,'' which finds the most efficient way to move information through the network and blocks alternate routes to prevent data from getting stuck in a loop. The large volume of data the researcher was uploading happened to be the last drop that made the network overflow.

Regards
Marshall Eubanks
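The article's one-line gloss of spanning tree is roughly right: STP's job is to take a switched topology that contains physical loops and compute a single loop-free tree, blocking the redundant links so broadcasts cannot circulate forever. A toy sketch of that end result in Python (an illustration only; real 802.1D elects a root bridge from bridge IDs and converges by exchanging BPDUs, none of which is modeled here):

    from collections import deque

    # A small switched LAN with two physical loops (A-B-C and B-C-D).
    links = {("A", "B"), ("B", "C"), ("C", "A"),
             ("C", "D"), ("D", "B")}

    def blocked_links(links, root="A"):
        neighbors = {}
        for a, b in links:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
        # Walk outward from the root; edges on this tree keep forwarding.
        tree, seen, queue = set(), {root}, deque([root])
        while queue:
            node = queue.popleft()
            for nxt in sorted(neighbors[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    tree.add(frozenset((node, nxt)))
                    queue.append(nxt)
        # Every link not on the tree is blocked: no loops, no broadcast storm.
        return [l for l in links if frozenset(l) not in tree]

    print(blocked_links(links))  # blocks B-C and C-D here; set order may vary

The flip side, and the apparent relevance to this incident, is that when the tree has to be recomputed (a link flaps, the topology grows past the diameter STP's default timers assume, or BPDUs get lost under load), the whole layer 2 domain can melt down at once, which is much harder to contain than a failure inside one routed subnet.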
Hmm, well until the comment about STP it sounded like the guy did something stupid with a program/database on a mainframe.

I can't see how STP could do this or require that level of DR. Perhaps it's just the scapegoat for the doc's mistake, which he didn't want to admit!

Steve

On Wed, 27 Nov 2002, Marshall Eubanks wrote:
Anyone have any idea what really happened:
http://www.boston.com/dailyglobe2/330/science/Got_paper_+.shtml
<snip> It was too late. Somewhere in the web of copper wires and glass fibers that connects the hospital's two campuses and satellite offices, the data was stuck in an endless loop. Halamka's technicians shut down part of the network to contain it, but that created a cascade of new problems.
The entire system crashed, freezing the massive stream of information - prescriptions, lab tests, patient histories, Medicare bills - that shoots through the hospital's electronic arteries every day, touching every aspect of care for hundreds of patients. ... The crisis had nothing to do with the particular software the researcher was using. The problem had to do with a system called ''spanning tree protocol,'' which finds the most efficient way to move information through the network and blocks alternate routes to prevent data from getting stuck in a loop. The large volume of data the researcher was uploading happened to be the last drop that made the network overflow.
Regards Marshall Eubanks
On Wednesday, Nov 27, 2002, at 10:25 Canada/Eastern, Stephen J. Wilcox wrote:
Hmm, well until the comment about STP it sounded like the guy did something stupid with a program/database on a mainframe.
I can't see how STP could do this or require that level of DR. Perhaps it's just the scapegoat for the doc's mistake, which he didn't want to admit!
If it's anything like any other layer-2 IT network meltdown I've seen, it'll be some combination of:

+ no documentation on what the network looks like, apart from a large yellow autocad diagram which was stapled to the wall in the basement wiring closet in 1988
+ a scarcity of diagnostic tools, and no knowledge of how to use the ones that do exist
+ complete ignorance of what traffic flows when the network is not broken
+ a cable management standard that was first broken in 1988 and has only been used since to pad out RFPs
+ consideration to network design which does not extend beyond the reassuring knowledge that the sales guy who sold you the hardware is a good guy, and will look after you
+ random unauthorised insertion of hubs and switches into the fabric by users who got fed up of waiting eight months to get another ethernet port installed in their lab
+ customers who have been trained by their vendors to believe that certification is more important than experience
+ customers who believe in the cost benefit of a large distributed layer-2 network over a large distributed (largely self-documenting) layer-3 network.

Just another day at the office.

Joe
Sure, which is why "Within a few hours, Cisco Systems, the hospital's network provider, was loading thousands of pounds of network equipment onto an airplane in California, bound" seems somewhat excessive! :) And "The crisis began on a Wednesday afternoon, Nov. 13, and lasted nearly four days" sounds like an opportunity for any consultants on nanog who have half a clue about how to set up a LAN!

Steve

On Wed, 27 Nov 2002, Joe Abley wrote:
On Wednesday, Nov 27, 2002, at 10:25 Canada/Eastern, Stephen J. Wilcox wrote:
Hmm, well until the comment about STP it sounded like the guy did something stupid with a program/database on a mainframe.
I can't see how STP could do this or require that level of DR. Perhaps it's just the scapegoat for the doc's mistake, which he didn't want to admit!
If it's anything like any other layer-2 IT network meltdown I've seen, it'll be some combination of:
+ no documentation on what the network looks like, apart from a large yellow autocad diagram which was stapled to the wall in the basement wiring closet in 1988
+ a scarcity of diagnostic tools, and no knowledge of how to use the ones that do exist
+ complete ignorance of what traffic flows when the network is not broken
+ a cable management standard that was first broken in 1988 and has only been used since to pad out RFPs
+ consideration to network design which does not extend beyond the reassuring knowledge that the sales guy who sold you the hardware is a good guy, and will look after you
+ random unauthorised insertion of hubs and switches into the fabric by users who got fed up of waiting eight months to get another ethernet port installed in their lab
+ customers who have been trained by their vendors to believe that certification is more important than experience
+ customers who believe in the cost benefit of a large distributed layer-2 network over a large distributed (largely self-documenting) layer-3 network.
Just another day at the office.
Joe
Anyone have any idea what really happened: http://www.boston.com/dailyglobe2/330/science/Got_paper_+.shtml
I know someone who worked on it, but I've avoided asking what really happened so I don't freak out the day the ambulance drives me up to their emergency room :) The other day, I did forward the article over to our medical school in the hopes that they might "check" their network for similar "issues" before something happens :)

I don't know which scares me more: that the hospital messed up spanning tree so badly (which means they likely had it turned off) that it imploded their entire network, or that it took them four days to figure it out.

Eric :)
At 11:10 AM -0500 11/27/02, Eric Gauthier wrote:
I don't know which scares me more: that the hospital messed up spanning tree so badly (which means they likely had it turned off) that it imploded their entire network, or that it took them four days to figure it out.
If it's anything like a former employer I used to work for, it's possible the physical wiring plant is owned/managed by the telco group, which jealously guards its infrastructure from the networking group. A subnet I used to work on was dropped dead for a day when a telco type punched a digital phone down into the computer network, causing a broadcast storm. It took half a day just to get the wiring map, then another half day to track down the offending port, because the tech in the network group dispatched to solve the problem did not have a current network map. The subnet in question contained a unix cluster with cross-mounted file systems that processed CAT scans for brain trauma research. The sysadmin of that system told me that they lost a week's worth of research because of that cock-up.

Hospitals are very soft targets network-wise, with hundreds, if not thousands, of nodes of edge equipment unmanned for hours-long stretches. On a regular basis, I saw wiring closets propped open and used as storage space for other equipment. Track down a pair of scrubs, and you can walk just about anywhere in a hospital without being challenged, as long as you look like you know where you are going and what you are doing. Ten years later, there are still routers there that I can log into, as the passwords have never been changed because their administrators were reorganized out or laid off and the equipment was orphaned.

Minimal social engineering plus a weak network security infrastructure is a disaster waiting to happen for any major medical facility.

--
Regards, Chris Kilbourn
Founder
_________________________________________________________________
digital.forest  Int'l: +1-425-483-0483
where Internet solutions grow  http://www.forest.net
Thus spake "Eric Gauthier" <eric@roxanne.org>
Anyone have any idea what really happened: http://www.boston.com/dailyglobe2/330/science/Got_paper_+.shtml
I can't speak to exactly what happened because of NDA, but I think I can help NANOGers understand the environment and why this happens in general.
I know someone who worked on it, but I've avoided asking what really happened so I don't freak out the day the ambulance drives me up to their emergency room :) The other day, I did forward the article over to our medical school in the hopes that they might "check" their network for similar "issues" before something happens :)
I see a lot of Fortune 500 networks in my job, and I'd say at least 75% of them are in the same state: a house of cards standing only because new cards are added so slowly. Any major event, whether a new bandwidth-hungry application or a parity error in a router, can bring the whole thing down, and there's no way to bring it back up again in its existing state. No matter how many PowerPoint slides you send to the CIO, it's always a complete shock when the company ends up in the proverbial handbasket and you're looking at several days of downtime to do 4+ years of maintenance and design changes. And, what's worse, nobody learns the lesson and this repeats every 2-5 years, with varying degrees of public visibility.

This is a bit of culture shock for most ISPs, because an ISP exists to serve the network, and proper design is at least understood, if not always adhered to. In the corporate world, however, the network and support staff are an expense to be minimized, and capital or headcount is almost never available to fix things that are "working" today.
I don't know which scares me more: that the hospital messed up spanning tree so badly (which means they likely had it turned off) that it imploded their entire network, or that it took them four days to figure it out.
It didn't take 4 days to figure out what was wrong -- that's usually apparent within an hour or so. What takes 4 days is having to reconfigure or replace every part of the network without any documentation or advance planning.

My nightmares aren't about having a customer crater like this -- that's an expectation. My nightmare is when it happens to the entire Fortune 100 on the same weekend, because it's only pure luck that it doesn't.

S
On Fri, 29 Nov 2002, Stephen Sprunk wrote:
This is a bit of culture shock for most ISPs, because an ISP exists to serve the network, and proper design is at least understood, if not always adhered to. In the corporate world, however, the network and support staff are an expense to be minimized, and capital or headcount is almost never available to fix things that are "working" today.
I think you are mistaken. In most "ISPs", engineers are considered an unfortunate expense, to be reduced to a bare-bones minimum (defined as the point where the network starts to fall apart, and irate customers reach the CEO through the layers of managerial defenses). Proper design of corporate networks is understood much better than that of backbones (witness the unending stream of new magic backbone routing paradigms, which never seem to deliver anything remotely as useful as claimed), so the only explanation for having 10+ hops in a spanning tree is plain old incompetence.
It didn't take 4 days to figure out what was wrong -- that's usually apparent within an hour or so. What takes 4 days is having to reconfigure or replace every part of the network without any documentation or advance planning.
Ditto.
My nightmares aren't about having a customer crater like this -- that's an expectation. My nightmare is when it happens to the entire Fortune 100 on the same weekend, because it's only pure luck that it doesn't.
Hopefully, not all of their staff is sold on the newest magical tricks from OFRV, and most just did old-fashioned layer-3 routing design.

--vadim
Marshall,

"It was Dr. John Halamka, the former emergency-room physician who runs Beth Israel Deaconess Medical Center's gigantic computer network"

It appears what really happened is that they put an emergency-room doctor in charge of a critical system in which he, in all likelihood, had limited training. In the medical system, he was trusted because he was a doctor. The sad thing about this is that there seems to be no realization that having experienced networking folks in this job might have averted a situation that could have been (almost certainly was?) deleterious to patient care.

We all know folks who are unemployed thanks to the telecom meltdown, so it's not like this institution couldn't have hired a competent network engineer on the cheap.

Sorry for the rant - I just hate to see the newspaper missing the point here. They didn't have one quote from an actual networking expert. It does look like Cisco took the opportunity to sell them some stuff - looks like someone got something out of this - too bad it wasn't the patients :)

- Dan

On Wed, 27 Nov 2002, Marshall Eubanks wrote:
Anyone have any idea what really happened:
http://www.boston.com/dailyglobe2/330/science/Got_paper_+.shtml
<snip> It was too late. Somewhere in the web of copper wires and glass fibers that connects the hospital's two campuses and satellite offices, the data was stuck in an endless loop. Halamka's technicians shut down part of the network to contain it, but that created a cascade of new problems.
The entire system crashed, freezing the massive stream of information - prescriptions, lab tests, patient histories, Medicare bills - that shoots through the hospital's electronic arteries every day, touching every aspect of care for hundreds of patients. ... The crisis had nothing to do with the particular software the researcher was using. The problem had to do with a system called ''spanning tree protocol,'' which finds the most efficient way to move information through the network and blocks alternate routes to prevent data from getting stuck in a loop. The large volume of data the researcher was uploading happened to be the last drop that made the network overflow.
Regards Marshall Eubanks
Unnamed Administration sources reported that Daniel Golding said:
"It was Dr. John Halamka, the former emergency-room physician who runs Beth Israel Deaconess Medical Center's gigantic computer network"
It appears what really happened is that they put an emergency-room doctor in charge of a critical system in which he, in all likelihood, had limited training. In the medical system, he was trusted because he was a doctor. The sad thing about this is that there seems to be no realization that having experienced networking folks in this job might have averted a situation that could have been (almost certainly was?) deleterious to patient care.
Did you, in fact, read Halamka's resume? He sounds to me like he has more smarts in the networking area than many of the RedmondWorshipers I encounter regularly. Was he Sean Donelan or Randy Bush? No. -- A host is a host from coast to coast.................wb8foz@nrk.com & no one will talk to a host that's close........[v].(301) 56-LINUX Unless the host (that isn't close).........................pob 1433 is busy, hung or dead....................................20915-1433
Yes, I read his bio. I'm sure he's quite the techie amongst his fellow physicians, and I think that's a great thing. However, it's more than just a bad idea to put someone who isn't completely proficient in a job like this - it's bad for the patients. If you want to run a shoe company and put a shoe salesman with a couple of Linux boxes in charge of your network, more power to you. However, if you run a huge hospital with numerous patient-affecting IT systems, you really have an obligation to hire a professional rather than a talented amateur, with all due respect to the good doctor.

As far as "Redmondworshippers" - whatever does the job. If you are running a hospital and Microsoft products work for you, then buy them. The key is knowing what to buy, how to keep it from breaking, and how to fix it quickly and efficiently when it does, be it Cisco, Microsoft, Linux, a couple of tin cans with string, or whatever.

Some background for those not from/in Boston: this is a very large medical center, not a community or midsized hospital.

- Dan

On Fri, 29 Nov 2002, David Lesher wrote:
Unnamed Administration sources reported that Daniel Golding said:
"It was Dr. John Halamka, the former emergency-room physician who runs Beth Israel Deaconess Medical Center's gigantic computer network"
It appears what really happened is that they put an emergency-room doctor in charge of a critical system in which he, in all likelihood, had limited training. In the medical system, he was trusted because he was a doctor. The sad thing about this is that there seems to be no realization that having experienced networking folks in this job might have averted a situation that could have been (almost certainly was?) deleterious to patient care.
Did you, in fact, read Halamka's resume? He sounds to me like he has more smarts in the networking area than many of the RedmondWorshipers I encounter regularly.
Was he Sean Donelan or Randy Bush? No.
-- A host is a host from coast to coast.................wb8foz@nrk.com & no one will talk to a host that's close........[v].(301) 56-LINUX Unless the host (that isn't close).........................pob 1433 is busy, hung or dead....................................20915-1433
## On 2002-11-29 15:05 -0600 Daniel Golding typed:

DG> Yes, I read his bio. I'm sure he's quite the techie amongst his fellow
DG> physicians, and I think that's a great thing. However, it's more than
DG> just a bad idea to put someone who isn't completely proficient in a job
DG> like this - it's bad for the patients. If you want to run a shoe company
DG> and put a shoe salesman with a couple of Linux boxes in charge of your
DG> network, more power to you. However, if you run a huge hospital with
DG> numerous patient-affecting IT systems, you really have an obligation to
DG> hire a professional rather than a talented amateur, with all due respect
DG> to the good doctor.

Hi Daniel,

Are you suggesting that a CIO at a "huge hospital" (or any other enterprise) needs to be an expert at LAN/WAN networking, systems, DBA & security, rather than a management expert who has a good grasp of the basic IT issues and understands the core business needs of the enterprise?

-- Rafi
I would be more likely to say that than that he need be a physician with management skills. I think Dan made this point already in his post.

I have tremendous respect for physicians, having grown up in that field. But they tend to be so smart that they get themselves in trouble, or have trouble knowing their limitations. I say that with all respect. I think this case showed that the IT staff was lacking some checks and balances, or just the proper procedures that most network engineers might have brought to the table. There are a lot of physicians that get themselves in trouble flying planes too, as evidenced by some of the nicknames given to some of the high-performance planes.

At some point networks become complicated enough that experts need to be brought in. A manager knows when to delegate, and I think it's important to have good people to delegate to. There are plenty of specialists in the medical field, so this is not a new concept.

At 23:43 +0200 11/29/02, Rafi Sadowsky wrote:
## On 2002-11-29 15:05 -0600 Daniel Golding typed:
DG> Yes, I read his bio. I'm sure he's quite the techie amongst his fellow
DG> physicians, and I think that's a great thing. However, it's more than
DG> just a bad idea to put someone who isn't completely proficient in a job
DG> like this - it's bad for the patients. If you want to run a shoe company
DG> and put a shoe salesman with a couple of Linux boxes in charge of your
DG> network, more power to you. However, if you run a huge hospital with
DG> numerous patient-affecting IT systems, you really have an obligation to
DG> hire a professional rather than a talented amateur, with all due respect
DG> to the good doctor.
Hi Daniel,
Are you suggesting that a CIO at a "huge hospital" (or any other enterprise) needs to be an expert at LAN/WAN networking, systems, DBA & security, rather than a management expert who has a good grasp of the basic IT issues and understands the core business needs of the enterprise?
-- Rafi
A good question, Rafi. IMHO, a CIO at a hospital or other large, technology-intensive institution should have a very solid IT background. By preference, it is someone who has come up through the ranks from development, systems administration, or network engineering, perhaps gotten an MBA, and gone into the management/financial side of the house.

You can't be an expert at everything. However, you should be an expert at some aspect, preferably the one that has the greatest importance to the enterprise (i.e., you want your CIO at a biotech company to be very database/storage heavy; you probably want your CIO at a bank to be very network or database knowledgeable). I suppose the most important thing is: hire someone who can tell when they are being deceived by vendors, contractors, or employees. That requires a good general knowledge of information technology concepts. This kind of person would also know that some aspects of IT, like documentation, planning, and scalability, are constants, regardless of what type of project is being worked on.

One of our greatest weaknesses in this field is the belief, by those who do not work in it, that anyone can pick up a book and quickly get up to speed on technology. Sadly, that is not the case. So, yes, I'm saying that a physician probably should not be the CIO of a very large hospital.

- Dan

On Fri, 29 Nov 2002, Rafi Sadowsky wrote:
## On 2002-11-29 15:05 -0600 Daniel Golding typed:
DG> Yes, I read his bio. I'm sure he's quite the techie amongst his fellow
DG> physicians, and I think that's a great thing. However, it's more than
DG> just a bad idea to put someone who isn't completely proficient in a job
DG> like this - it's bad for the patients. If you want to run a shoe company
DG> and put a shoe salesman with a couple of Linux boxes in charge of your
DG> network, more power to you. However, if you run a huge hospital with
DG> numerous patient-affecting IT systems, you really have an obligation to
DG> hire a professional rather than a talented amateur, with all due respect
DG> to the good doctor.
Hi Daniel,
Are you suggesting that a CIO at a "huge hospital" (or any other enterprise) needs to be an expert at LAN/WAN networking, systems, DBA & security, rather than a management expert who has a good grasp of the basic IT issues and understands the core business needs of the enterprise?
-- Rafi
I find the reactions on this mailing list disturbing, to say the least. The rush to judgement about what happened appears to be based on speculation and assumptions about how this large facility was run, managed and staffed.

As far as I can see, the known facts are:

There was an oversized layer 2 network and it broke.
It was hard to repair.
The CTO is a physician on the hospital board who, on first sight, appears to have considerable qualifications in the IT area.

The unknowns are:

How many, if any, employees are there who are directly tasked with maintaining the network?
How well trained are they?
How well documented was the network?
How well did whatever staff were responsible for maintaining it actually understand the network?
What sort of reporting was done - did anyone raise the issue of the size of the network?
Was any risk assessment done?
How much planning had been done, if any, to address the problems?
How did the network get to be the way it was? Did it just grow, or were changes forced upon it by external constraints?
Why was it hard to repair?

But people are speculating with no knowledge of the actual organisation, history, planning, what risk assessment had or had not been done, or any other information except guesses and prejudices about what they think might have happened, and an apparent assumption that this is all the result of turning over a large enterprise network to a jumped-up physician whose only qualification was running a couple of Linux boxes on a home network. None of the above unknown issues have been addressed anywhere.

I hope the posters never pull jury service, as there seems to be a complete disregard for the idea of gathering facts before passing judgement.

-- Jim Segrave jes@nl.demon.net
## On 2002-11-30 15:41 +0100 Jim Segrave typed:

JS> I find the reactions on this mailing list disturbing, to say the
JS> least. The rush to judgement about what happened appears to be based
JS> on speculation and assumptions about how this large facility was run,
JS> managed and staffed.
JS>
JS> As far as I can see, the known facts are:
JS>
JS> There was an oversized layer 2 network and it broke.
JS> It was hard to repair.
JS> The CTO is a physician on the hospital board who, on first sight,
JS> appears to have considerable qualifications in the IT area.

I agree, except that it's not the CTO but rather the CIO.

JS> The unknowns are:
JS> [snipped for brevity]

Many unknowns - no argument here.

JS> But people are speculating with no knowledge of the actual
JS> organisation, history, planning, what risk assessment had or had not
JS> been done, or any other information except guesses and prejudices
JS> about what they think might have happened, and an apparent assumption
JS> that this is all the result of turning over a large enterprise network
JS> to a jumped-up physician whose only qualification was running a couple
JS> of Linux boxes on a home network. None of the above unknown issues
JS> have been addressed anywhere.

## On 2002-11-29 23:43 +0200 I typed:

RS> Are you suggesting that a CIO at a "huge hospital" (or any other
RS> enterprise) needs to be an expert at LAN/WAN networking, systems,
RS> DBA & security, rather than a management expert who has a good grasp
RS> of the basic IT issues and understands the core business needs of
RS> the enterprise?

Can you please indicate the assumptions/speculations in the above question?

JS> I hope the posters never pull jury service, as there seems to be a
JS> complete disregard for the idea of gathering facts before passing
JS> judgement.

1) You seem to implicate *all* previous posters in this thread (which is why I'm responding to you in public).
2) IMHO you should try having a good long look in a mirror ;-)

-- Rafi
Thus spake "Jim Segrave" <jes@nl.demon.net>
I find the reactions on this mailing list disturbing, to say the least. The rush to judgement about what happened appears to be based on speculation and assumptions about how this large facility was run, managed and staffed.
Everyone with the facts is covered by NDA. I have tried to provide a characterization of the events based on similar incidents at other Fortune 100 shops. I believe this to be educational even if I can't confirm it is completely relevant to the incident at hand. If you care about getting more details, then call the good doctor and ask him yourself. In the meantime, you're not going to get any more information from the press or NANOG than you have about, say, the week-long WorldCom FR outage a year or so ago.
I hope the posters never pull jury service, as there seems to be a complete disregard for the idea of gathering facts before passing judgement.
I'd take a jury of NANOGers over the usual pool of people too poor or stupid to find a way out of serving. At least most of _us_ live in the same delusional reality. S
On Sat, 30 Nov 2002, Stephen Sprunk wrote:
Everyone with the facts is covered by NDA.
The owner(s) of the facts can always decide what facts to disclose.
If you care about getting more details, then call the good doctor and ask him yourself. In the meantime, you're not going to get any more information from the press or NANOG than you have about, say, the week-long WorldCom FR outage a year or so ago.
Worldcom could disclose what happened to their network (or what happened to their accounting, but that's a different story).

I suspect we will learn more about what happened to Beth Israel Deaconess Hospital's network than we've ever heard publicly about any of Worldcom's network problems. Dr. John Halamka has already publicly stated he intends to tell other hospitals what happened and how they can avoid the same problem.

Excerpt from the Boston Globe article:

"No other Massachusetts hospital has ever reported such a long-lasting or disruptive network crash, said Elliot Stone, executive director of the Massachusetts Health Data Consortium, a group that brings together chief information officers from hospitals and health plans around the state. He praised Beth Israel Deaconess for being open about the problem and sharing lessons learned, both about technology itself and about policy - such as the need to enforce rules against unauthorized additions of new software onto the network."
I suspect we will learn more about what happened to Beth Israel Deaconess Hospital's network than we've ever heard publically about any of Worldcom's network problems. Dr. John Halamka has already publically stated he intends to tell other hospitals what happened and how they can avoid the same problem.
Hopefully it will be something along the lines of "complex layer 2 networks are fickle, and have vastly fewer mechanisms to implement policy than are available at layer 3; networks that serve different departments within the same organization are just as worthy of layer-3 policy boundaries as separate enterprises that have a need to keep their networks distinct." Sometimes the router or firewall that protects you from another department is just as valuable as the one that protects you from "the outside."

Those of us who have been in the Ethernet-based exchange point business are well aware of the dangers of building complex layer 2 topologies, especially when a portion of the customer base adds to the L2 fabric by fronting their router with an aggregation switch that is just as likely as not to be connected to another customer's aggregation switch without the first customer's knowledge ("we thought they provisioned a router port on their side, really"). Everyone claims to - and to be honest, many do - operate their L2 equipment correctly, but (funny thing) problems still occur.

Administrative boundaries, and well-thought-out means of implementing those boundaries at places where networks touch, are important.

Stephen
Radia Perlman lives only a few miles away - they could have asked her for a quote :)

However, I would not be too harsh towards Dr. John - it is common practice in specialty organizations to put a member of the club in charge of every department, even if most of the decisions are actually made by the staff, as he or she is supposed to better understand the needs (and lingo) of the organization. In the military, for example, an officer is always in charge of an installation - literally the CO - even if he and his aide are the _only_ military personnel stationed there, which happens sometimes with highly technical activities. So I would not assume that the good Doctor is actually the one configuring the network.

I wonder if Cisco will be moving them from an enormous flat layer 2 network to a more sensible layer 3 IP network.

Marshall

On Friday, November 29, 2002, at 12:22 PM, Daniel Golding wrote:
Marshall,
"It was Dr. John Halamka, the former emergency-room physician who runs Beth Israel Deaconess Medical Center's gigantic computer network"
It appears what really happened is that they put an emergency-room doctor in charge of a critical system in which he, in all likelihood, had limited training. In the medical system, he was trusted because he was a doctor. The sad thing about this is that there seems to be no realization that having experienced networking folks in this job might have averted a situation that could have been (almost certainly was?) deleterious to patient care.

We all know folks who are unemployed thanks to the telecom meltdown, so it's not like this institution couldn't have hired a competent network engineer on the cheap.

Sorry for the rant - I just hate to see the newspaper missing the point here. They didn't have one quote from an actual networking expert. It does look like Cisco took the opportunity to sell them some stuff - looks like someone got something out of this - too bad it wasn't the patients :)
- Dan
On Wed, 27 Nov 2002, Marshall Eubanks wrote:
Anyone have any idea what really happened:
http://www.boston.com/dailyglobe2/330/science/Got_paper_+.shtml
<snip> It was too late. Somewhere in the web of copper wires and glass fibers that connects the hospital's two campuses and satellite offices, the data was stuck in an endless loop. Halamka's technicians shut down part of the network to contain it, but that created a cascade of new problems.
The entire system crashed, freezing the massive stream of information - prescriptions, lab tests, patient histories, Medicare bills - that shoots through the hospital's electronic arteries every day, touching every aspect of care for hundreds of patients. ... The crisis had nothing to do with the particular software the researcher was using. The problem had to do with a system called ''spanning tree protocol,'' which finds the most efficient way to move information through the network and blocks alternate routes to prevent data from getting stuck in a loop. The large volume of data the researcher was uploading happened to be the last drop that made the network overflow.
Regards Marshall Eubanks
Thus spake "Daniel Golding" <dgold@FDFNet.Net>
It appears what really happened is that they put an emergency-room doctor in charge of a critical system in which he, in all likelihood, had limited training. In the medical system, he was trusted because he was a doctor. The sad thing about this is that there seems to be no realization that having experienced networking folks in this job might have averted a situation that could have been (almost certainly was?) deleterious to patient care.
I think it's safe to say there was competent staff involved before the incident, and everyone knew exactly how bad the network was and how likely a failure was. It's very rare for people not to know exactly how bad off they are. The question is whether management considers this worth spending resources, either money or manpower, to fix.

S
There used to be an old flag you could set on an ICMP_ECHO request to record the path the echo reply takes back (ping -R or -r?), but apparently it's not used much anymore. Probably just as well... the IP record-route option only has room for nine addresses, shared between the outbound and return legs, so it could only capture a handful of hops each way.

Andy

----- Original Message -----
From: Darrell Carley
To: nanog@merit.edu
Sent: Thursday, October 17, 2002 10:31 AM
Subject: question concerning traceroute?

I am trying to troubleshoot a latency issue for some of our networks, and was wondering about this. Knowing that routing isn't always symmetrical, is it possible for a traceroute to traverse a different reverse path than the path that it took to get there? ...or will it provide a trace of the path the packet took to reach the destination? According to the definition, it should take the same path, but are there any other cases that I should be aware of?

Darrell
On Thu, Oct 17, 2002 at 10:31:12AM -0400, Darrell Carley <darrell@national-net.com> wrote:
I am trying to troubleshoot a latency issue for some of our networks, and was wondering about this. Knowing that routing isn't always symmetrical, is it possible for a traceroute to traverse a different reverse path than the path that it took to get there? ...or will it provide a trace of the path the packet took to reach the destination? According to the definition, it should take the same path, but are there any other cases that I should be aware of?

Darrell
Something else to be aware of is the effect of ECMP on traceroutes -- where the source/dest IP (among other hash inputs) can determine which of several parallel equal-cost paths you take through a backbone. ECMP is fairly common, so I would suspect a fairly large percentage of paths are subject to it.

-Lane
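To illustrate Lane's point, here is a toy sketch of per-flow ECMP next-hop selection in Python (purely illustrative: the addresses are made up, and real routers use vendor-specific hardware hash functions rather than a CRC):

    import zlib

    next_hops = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical parallel equal-cost paths

    def pick_next_hop(src_ip, dst_ip, proto, src_port, dst_port):
        # Hash the flow identifiers and use the result to choose a path;
        # any change in the inputs can move the flow to a different path.
        key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
        return next_hops[zlib.crc32(key) % len(next_hops)]

    # Same destination, two different source addresses -> possibly different paths:
    print(pick_next_hop("192.0.2.10", "198.51.100.1", 17, 40000, 33434))
    print(pick_next_hop("192.0.2.11", "198.51.100.1", 17, 40000, 33434))

Note also that classic traceroute increments the UDP destination port with each probe, so where the hash includes ports, successive probes within a single traceroute can themselves land on different parallel paths.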
participants (24)

- alex@yuriev.com
- Andy Johnson
- Arnold Nipper
- Chris Kilbourn
- Daniel Golding
- Darrell Carley
- David Diaz
- David Howe
- David Lesher
- Eric Gauthier
- Jared Mauch
- Jim Segrave
- Joe Abley
- k claffy
- Lane Patterson
- Marshall Eubanks
- Martin
- Nipper, Arnold
- Rafi Sadowsky
- Sean Donelan
- Stephen J. Wilcox
- Stephen Sprunk
- Stephen Stuart
- Vadim Antonov