Another Update from our guys...

Power problems in the MFS facility caused the initial outage. Power has been restored, but the Gigaswitch has lost its configuration, causing all peers to remain down.

Based on some of the mail to this list, it is GIGA 1 at MAE West that is down; 2 and 3 are up. (Most of those complaining are on GIGA 1; Mathew at ACSI is on GIGA 2 and thinks he's up. I can't reach him directly from NASA/Ames, so my guess is that he is up, but can only see peers at GIGA 2 and 3 - see next para...)

By the way, to expand the thread, I have the feeling that the three OC-3s from the NASA Ames side to MFS all go to GIGA 1, which then has FDDI loops to GIGA 2 and then to GIGA 3. Does anyone know if this is the case?

If it is, it seems that a better design would have been to route some of the OC-3s to the other GIGAs first. If 1 is down, then it can't pass traffic through to 2 and 3, so there is a single point of failure for all the switches at MFS.

Rodney Joffe
Chief Technology Officer
Genuity Inc., a Bechtel company
http://www.genuity.net
Yes, all 4 OC-3c circuits do terminate at the MFS Gigaswitch-01. It is a single point of failure, but the ckts must all be on the same switch in order to use the load-sharing feature.

-Lance-

On Fri, 11 Jul 1997, Rodney Joffe wrote:
Another Update from our guys...
Power problems in the MFS facility caused the initial outage. Power has been restored, but the Gigaswitch has lost its configuration, causing all peers to remain down.
Based on some of the mail to this list, it is GIGA 1 at MAE West that is down; 2 and 3 are up. (Most of those complaining are on GIGA 1; Mathew at ACSI is on GIGA 2 and thinks he's up. I can't reach him directly from NASA/Ames, so my guess is that he is up, but can only see peers at GIGA 2 and 3 - see next para...)
By the way, to expand the thread, I have the feeling that the three OC-3s from the NASA Ames side to MFS all go to GIGA 1, which then has FDDI loops to GIGA 2 and then to GIGA 3.
Does anyone know if this is the case?
If it is, it seems that a better design would have been to route some of the OC-3s to the other GIGAs first. If 1 is down, then it can't pass traffic through to 2 and 3, so there is a single point of failure for all the switches at MFS.
Rodney Joffe
Chief Technology Officer
Genuity Inc., a Bechtel company
http://www.genuity.net
On Fri, 11 Jul 1997, Rodney Joffe wrote:
By the way, to expand the thread, I have the feeling that the three OC-3s from the NASA Ames side to MFS all go to GIGA 1, which then has FDDI loops to GIGA 2 and then to GIGA 3.
Does anyone know if this is the case?
If it is, it seems that a better design would have been to route some of the OC-3s to the other GIGAs first. If 1 is down, then it can't pass traffic through to 2 and 3, so there is a single point of failure for all the switches at MFS.
The Gigaswitch constrains us to a loop-free topology, so there's no way to avoid a single point of failure in a case like this.

The Gigaswitch systems in general have been quite reliable over the last few years, modulo individual line card failures. The problems we tend to see are either load-related or caused by human error. This is the first major outage caused by a Gigaswitch itself that we've seen in a very long time.

By the way, as of yesterday it's four OC3's between Ames and MFS.

Steve
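To make the loop-free point concrete: a bridged L2 domain runs a spanning tree, so any extra link between switches ends up blocked rather than providing a second active path. The Python sketch below uses an invented topology and invented costs (it is not MAE West's actual layout, and it uses a Kruskal-style selection rather than 802.1D's root-bridge election) purely to show that a redundant link is pruned instead of carrying traffic.

def spanning_tree(links):
    """Kruskal-style: keep the lowest-cost links that don't form a loop."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    forwarding, blocked = [], []
    for cost, a, b in sorted(links):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            blocked.append((a, b))         # this link would create a loop
        else:
            parent[root_a] = root_b
            forwarding.append((a, b))
    return forwarding, blocked

# Hypothetical topology and costs, not MFS's actual configuration.
links = [
    (10, "GIGA-1", "GIGA-2"),  # intra-building FDDI
    (10, "GIGA-2", "GIGA-3"),  # intra-building FDDI
    (10, "GIGA-1", "GIGA-3"),  # redundant intra-building link
    (50, "Ames",   "GIGA-1"),  # wide-area OC-3 aggregate
]

forwarding, blocked = spanning_tree(links)
print("forwarding:", forwarding)
print("blocked:   ", blocked)

With these made-up costs, one of the three intra-building links comes back in the blocked list, and the single Ames-to-GIGA-1 aggregate remains the only path off-site, which is exactly the single point of failure being discussed.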
Steve wrote:
The Gigaswitch systems in general have been quite reliable over the last few years, modulo individual line card failures. The problems we tend to see are either load-related or caused by human error. This is the first major outage caused by a Gigaswitch itself that we've seen in a very long time.
Put all your eggs in one basket... just make sure it's a _very_ good basket.

Cheers,
-- jra
--
Jay R. Ashworth                                              jra@baylink.com
Member of the Technical Staff            Unsolicited Commercial Emailers Sued
The Suncoast Freenet          "People propose, science studies, technology
Tampa Bay, Florida             conforms."  -- Dr. Don Norman    +1 813 790 7592
On Fri, 11 Jul 1997, Rodney Joffe wrote:
By the way, to expand the thread, I have the feeling that the three OC-3s from the NASA Ames side to MFS all go to GIGA 1, which then has FDDI loops to GIGA 2 and then to GIGA 3.
Does anyone know if this is the case?
If it is, it seems that a better design would have been to route some of the OC-3s to the other GIGAs first. If 1 is down, then it can't pass traffic through to 2 and 3, so there is a single point of failure for all the switches at MFS.
The Gigaswitch constrains us to a loop-free topology, so there's no way to avoid a single point of failure in a case like this.
The Gigaswitch systems in general have been quite reliable over the last few years, modulo individual line card failures. The problems we tend to see are either load-related or caused by human error. This is the first major outage caused by a Gigaswitch itself that we've seen in a very long time.
By the way, as of yesterday it's four OC3's between Ames and MFS.

Steve
Steve,

Are you telling me that the GigaSwitch, unlike every other bridge since well before I became involved in networking, is incapable of spanning tree? I find that hard to believe. Could anyone on the list from DEC please confirm or deny this absurdity?

Owen
Are you telling me that the GigaSwitch, unlike every other bridge since well before I became involved in networking, is incapable of spanning tree? I find that hard to believe. Could anyone on the list from DEC please confirm or deny this absurdity?
The GIGAswitch/FDDI does spanning tree.

As Lance Tatman pointed out, load-balancing only works between circuits joining the same two switches. This would certainly be a factor in planning what kind of wide-area connectivity to use to join two groups of switches. A single 400Mb/s aggregate might perform much better than a pair of 200Mb/s aggregates.

Spanning tree plus wide-area connectivity implies that either you send bits farther than they have to go (like from one switch at 55 S. Market to another via Ames), or you have wide area connectivity sitting idle until a spanning tree recalculation decides to use it.

I'll check my firmware release notes to see if there are any issues or restrictions regarding load-balancing and spanning tree in the last couple revisions.

Stephen
-----
Stephen Stuart                                       stuart@pa.dec.com
Network Systems Laboratory
Digital Equipment Corporation
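As a rough illustration of the constraint Lance and Stephen describe (a sketch with invented names, not the GIGAswitch's real hunt-group code): parallel circuits can only be pooled when both ends land on the same pair of switches, because the bundle is keyed by that switch pair; a circuit terminating on a different switch forms a separate bundle rather than adding capacity to the existing one.

from collections import defaultdict

# Invented link list: four OC-3s between Ames and GIGA-1, one FDDI hop onward.
links = [
    ("Ames", "GIGA-1", "oc3-a"),
    ("Ames", "GIGA-1", "oc3-b"),
    ("Ames", "GIGA-1", "oc3-c"),
    ("Ames", "GIGA-1", "oc3-d"),
    ("GIGA-1", "GIGA-2", "fddi-1"),
]

# Group parallel links into bundles keyed by the unordered switch pair.
bundles = defaultdict(list)
for a, b, name in links:
    bundles[frozenset((a, b))].append(name)

def pick_member(switch_a, switch_b, flow_id):
    """Hash a flow onto one member of the bundle joining the two switches."""
    members = bundles[frozenset((switch_a, switch_b))]
    return members[hash(flow_id) % len(members)]

for flow in ("flow-1", "flow-2", "flow-3"):
    print(flow, "->", pick_member("Ames", "GIGA-1", flow))

The four OC-3s form a single four-member bundle only because they share endpoints; a fifth circuit landing on GIGA-2 instead would key to a different switch pair and could not share the Ames-to-GIGA-1 load.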
At 11:38 PM 7/13/97 -0700, Stephen Stuart wrote:
As Lance Tatman pointed out, load-balancing only works between circuits joining the same two switches. This would certainly be a factor in planning what kind of wide-area connectivity to use to join two groups of switches. A single 400Mb/s aggregate might perform much better than a pair of 200Mb/s aggregates.
Better performance, but worse reliability?

The single 400Mb/s is, after all, a single point of failure, no?

d/
At 11:38 PM 7/13/97 -0700, Stephen Stuart wrote:
As Lance Tatman pointed out, load-balancing only works between circuits joining the same two switches. This would certainly be a factor in planning what kind of wide-area connectivity to use to join two groups of switches. A single 400Mb/s aggregate might perform much better than a pair of 200Mb/s aggregates.
Better performance, but worse reliability?
The single 400Mb/s is, after all, a single point of failure, no?
d/
Too true. Would you care to cover the cost of replicating the 400Mb/s to build in better reliability? It appears that the owners/operators of MAE-West are selecting an optimization path based on the assumption that outages are infrequent and can be quickly corrected.

--bill
At 11:38 PM 7/13/97 -0700, Stephen Stuart wrote:
As Lance Tatman pointed out, load-balancing only works between circuits joining the same two switches. This would certainly be a factor in planning what kind of wide-area connectivity to use to join two groups of switches. A single 400Mb/s aggregate might perform much better than a pair of 200Mb/s aggregates.
Better performance, but worse reliability?
The single 400Mb/s is, after all, a single point of failure, no?
d/
Too true. Would you care to cover the cost of replicating the 400Mb/s to build in better reliability? It appears that the owners/operators of MAE-West are selecting an optimization path based on the assumption that outages are infrequent and can be quickly corrected.
--bill
Both assumptions have been repeatedly proven false BY MFS.

Owen
On Mon, Jul 14, 1997 at 07:27:15AM -0700, Owen DeLong wrote:
[ quoting Bill ]
Too true. Would you care to cover the cost of replicating the 400Mb/s to build in better reliability? It appears that the owners/operators of MAE-West are selecting an optimization path based on the assumption that outages are infrequent and can be quickly corrected.
Both assumptions have been repeatedly proven false BY MFS.
Indeed. More importantly, Bill, he wasn't suggesting duplicating the 400Mbps aggregate, but _splitting_ it; it is, after all, _already_ 4 separate links.

Cheers,
-- jra
--
Jay R. Ashworth                                              jra@baylink.com
Member of the Technical Staff            Unsolicited Commercial Emailers Sued
The Suncoast Freenet          "People propose, science studies, technology
Tampa Bay, Florida             conforms."  -- Dr. Don Norman    +1 813 790 7592
On Mon, Jul 14, 1997 at 07:27:15AM -0700, Owen DeLong wrote:
[ quoting Bill ]
Too true. Would you care to cover the cost of replicating the 400Mb/s to build in better reliability? It appears that the owners/operators of MAE-West are selecting an optimization path based on the assumption that outages are infrequent and can be quickly corrected.
Both assumptions have been repeatedly proven false BY MFS.
Indeed. More importantly, Bill, he wasn't suggesting duplicating the 400Mbps aggregate, but _splitting_ it; it is, after all, _already_ 4 separate links.
The underlying physical media may be four separate links, but at L2 it's a single 400Mb/s aggregate. If it were split up into, say, two 200Mb/s aggregates:

1) assuming that costs favored intra-building connections, one of the aggregates would be selected for pruning by the spanning tree calculation.

2) assuming that costs favored having both aggregates in service, if utilization on the two aggregates was 50% on (call it) A and 100% on B, the 50% available on A would be wasted. Note that latency would go up, because spanning tree would have pruned some intra-building link in order to keep the inter-building link active.

At the risk of belaboring the obvious, of course there are issues of reliability and cost-effectiveness, only some of which are technical.

With respect to economics, for instance: The cost to link two ports in the same building (better yet, the same room) is essentially the FDDI cable. The cost to link two ports over a wide area is that of a DS3/OC3. L2 is not all that different from L3 in that once you've spent real money on a circuit, you want to get as much use out of it as you can.

Stephen
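A back-of-the-envelope sketch of Stephen's second case, with made-up numbers: when traffic is pinned to one of two 200Mb/s aggregates, headroom on the lightly loaded one cannot absorb overflow from the full one, whereas a single 400Mb/s aggregate pools the same capacity.

# Made-up offered load, pinned per aggregate in the split case.
demand_mbps = {"A": 100, "B": 250}

# Two independent 200Mb/s aggregates: B's excess is lost despite A's headroom.
carried_split = sum(min(load, 200) for load in demand_mbps.values())

# One 400Mb/s aggregate pooling the same offered load.
carried_pooled = min(sum(demand_mbps.values()), 400)

print(carried_split)   # 300 -- 50Mb/s of B's demand has nowhere to go
print(carried_pooled)  # 350 -- everything fits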
On Mon, Jul 14, 1997 at 07:27:15AM -0700, Owen DeLong wrote:
[ quoting Bill ]
Too true. Would you care to cover the cost of replicating the 400Mb/s to build in better reliability? It appears that the owners/operators of MAE-West are selecting an optimization path based on the assumption that outages are infrequent and can be quickly corrected.
Both assumptions have been repeatedly proven false BY MFS.
Indeed. More importantly, Bill, he wasn't suggesting duplicating the 400Mbps aggregate, but _splitting_ it; it is, after all, _already_ 4 separate links.
The underlying physical media may be four separate links, but at L2 it's a single 400Mb/s aggregate. If it were split up into, say, two 200Mb/s aggregates:
1) assuming that costs favored intra-building connections, one of the aggregates would be selected for pruning by the spanning tree calculation.
2) assuming that costs favored having both aggregates in service, if utilization on the two aggregates was 50% on (call it) A and 100% on B, the 50% available on A would be wasted. Note that latency would go up, because spanning tree would have pruned some intra-building link in order to keep the inter-building link active.
At the risk of belaboring the obvious, of course there are issues of reliability and cost-effectiveness, only some of which are technical.
With respect to economics, for instance: The cost to link two ports in the same building (better yet, the same room) is essentially the FDDI cable. The cost to link two ports over a wide area is that of a DS3/OC3. L2 is not all that different from L3 in that once you've spent real money on a circuit, you want to get as much use out of it as you can.
Stephen
I think that the argument here is that there is a real need to have an aggregate 400Mb/s "pipe" in service as opposed to two 200Mb/s "pipes". To gain the redundancy that Mr. Dave suggests would actually encourage the deployment of -two- additional 400Mb/s channels.

Then the economics arguments kick in.

--
--bill
I think that the argument here is that there is a real need to have an aggregate 400Mb/s "pipe" in service as opposed to two 200Mb/s "pipes". To gain the redundancy that Mr. Dave suggests would actually encourage the deployment of -two- additional 400Mb/s channels.
Then the economics arguments kick in.
Haven't been following this thread all that closely, so pardon me if this has already been dealt with... but note the earlier comment from MFS that the design constraints say loop-free layer 2 topology. Which means redundant links don't exist.

----------------------------------------------------------------------
Wayne Bouchard                            GlobalCenter
web@primenet.com                          Primenet Network Engineering
                                          Internet Solutions for
(602) 416-6422  800-373-2499 x6422        Growing Businesses
FAX: (602) 416-9422
http://www.primenet.com                   http://www.globalcenter.net
----------------------------------------------------------------------
On Mon, Jul 14, 1997 at 10:14:16AM -0700, Stephen Stuart wrote:
Indeed. More importantly, Bill, he wasn't suggesting duplicating the 400Mbps aggregate, but _splitting_ it; it is, after all, _already_ 4 separate links.
The underlying physical media may be four separate links, but at L2 it's a single 400Mb/s aggregate. If it were split up into, say, two 200Mb/s aggregates:
1) assuming that costs favored intra-building connections, one of the aggregates would be selected for pruning by the spanning tree calculation.
2) assuming that costs favored having both aggregates in service, if utilization on the two aggregates was 50% on (call it) A and 100% on B, the 50% available on A would be wasted. Note that latency would go up, because spanning tree would have pruned some intra-building link in order to keep the inter-building link active.
If this is true, then the Layer 2 bandwidth aggregation design is pretty weak, no?

For example, (and yes, I know there's a world of difference) a MLPPP link is at (effectively) layer 2 (if not 1.5), and if one side of the link drops, the other side will carry what it can.

Cheers,
-- jra
--
Jay R. Ashworth                                              jra@baylink.com
Member of the Technical Staff            Unsolicited Commercial Emailers Sued
The Suncoast Freenet          "People propose, science studies, technology
Tampa Bay, Florida             conforms."  -- Dr. Don Norman    +1 813 790 7592
2) assuming that costs favored having both aggregates in service, if utilization on the two aggregates was 50% on (call it) A and 100% on B, the 50% available on A would be wasted. Note that latency would go up, because spanning tree would have pruned some intra-building link in order to keep the inter-building link active.
If this is true, then the Layer 2 bandwidth aggregation design is pretty weak, no?
You're mixing apples and oranges.
For example, (and yes, I know there's a world of difference) a MLPPP link is at (effectively) layer 2 (if not 1.5), and if one side of the link drops, the other side will carry what it can.
That is what happens within an aggregate. The multi-link PPP channel corresponds to an "aggregate" in the terminology that I am using.

The topic being discussed is not what happens within an aggregate, but what happens when two aggregates are in use. This would be akin to having two multi-link PPP connections (each constructed out of some number of physical links).

Stephen
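A small sketch of the distinction Stephen is drawing, with invented capacities (it models neither MLPPP nor the GIGAswitch's actual code): inside one bundle, losing a member just shrinks that bundle's capacity; between two bundles there is no pooling, and which one carries traffic at all is a topology decision.

class Bundle:
    """A multi-link bundle whose capacity is the sum of its live members."""
    def __init__(self, *member_mbps):
        self.members = list(member_mbps)

    def drop_member(self, index):
        self.members.pop(index)

    @property
    def capacity(self):
        return sum(self.members)

# Within one aggregate: a member failure degrades capacity gracefully.
bundle = Bundle(100, 100, 100, 100)
bundle.drop_member(0)
print(bundle.capacity)   # 300 -- the bundle carries what it can

# Between two aggregates: a loop-free topology forwards over one of them;
# the other sits idle until a topology recalculation brings it into service.
primary, standby = Bundle(100, 100), Bundle(100, 100)
active = primary if primary.capacity > 0 else standby
print(active.capacity)   # 200 -- only the active bundle's capacity is usable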
On Mon, Jul 14, 1997 at 11:48:31AM -0700, Stephen Stuart wrote:
For example, (and yes, I know there's a world of difference) a MLPPP link is at (effectively) layer 2 (if not 1.5), and if one side of the link drops, the other side will carry what it can.
That is what happens within an aggregate. The multi-link PPP channel corresponds to an "aggregate" in the terminology that I am using.
Ok. Got that.
The topic being discussed is not what happens within an aggregate, but what happens when two aggregates are in use. This would be akin to having two multi-link PPP connections (each constructed out of some number of physical links).
Ok, then a fair description of the problem is that "circuits cannot be aggregated across multiple switch chassis". And that's not, in and of itself, bad component design.

However, it pretty apparently limits the ability to make the best use of your circuits in your system design... or Mae West wouldn't have taken last Friday morning off.

Cheers,
-- jra
--
Jay R. Ashworth                                              jra@baylink.com
Member of the Technical Staff            Unsolicited Commercial Emailers Sued
The Suncoast Freenet          "People propose, science studies, technology
Tampa Bay, Florida             conforms."  -- Dr. Don Norman    +1 813 790 7592
The topic being discussed is not what happens within an aggregate, but what happens when two aggregates are in use. This would be akin to having two multi-link PPP connections (each constructed out of some number of physical links).
Ok, then a fair description of the problem is that "circuits cannot be aggregated across multiple switch chassis". And that's not, in and of itself, bad component design.
However, it pretty apparently limits the ability to make the best use of your circuits in your system design...
The issue is spanning tree, not component design. Yes, technologies that configure their topology using a spanning tree limit your ability to make best use of potentially expensive circuits in your system design.

Stephen
[ quoting Bill ]
to build in better reliability? It appears that the owners/operators of MAE-West are selecting an optimization path based on the assumption that outages are infrequent and can be quickly corrected.
Both assumptions have been repeatedly proven false BY MFS.
Indeed. More importantly, Bill, he wasn't suggesting duplicating the 400Mbps aggregate, but _splitting_ it; it is, after all, _already_ 4 separate links.
I believe that MFS should well be considering redundant 400Mbps pipes in a non-loop-free environment. This would require spanning tree, so MFS would have to set it up so only one of the two sides of the loop would be active at a given time. Yes, this would require an additional OC3. I think that given the amount of revenue MFS pulls in from MAE West, and the demonstrated problems that have happened because they have not done this, it would be well worth them doing this.

Additionally, enabling spanning tree might well eliminate the situation where every time someone loops a DS3 or OC3 circuit it takes down the entire MAE.

Come on guys, let's fix this.

Owen
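To sketch what Owen is proposing (invented names and costs; not a real GIGAswitch configuration): with a redundant aggregate in place and spanning tree enabled, one side of the loop stays blocked until the active side fails, at which point a recalculation promotes the standby.

def choose_active(aggregates):
    """Pick the lowest-cost live aggregate; everything else stays blocked."""
    live = [agg for agg in aggregates if agg["up"]]
    return min(live, key=lambda agg: agg["cost"]) if live else None

aggregates = [
    {"name": "oc3-bundle-primary", "cost": 10, "up": True},
    {"name": "oc3-bundle-standby", "cost": 20, "up": True},  # blocked by spanning tree
]

print(choose_active(aggregates)["name"])   # oc3-bundle-primary carries traffic

aggregates[0]["up"] = False                # e.g. the switch loses its configuration
print(choose_active(aggregates)["name"])   # oc3-bundle-standby takes over

The cost of the idea is the extra OC3 sitting idle in steady state, which is the economics trade-off Bill and Stephen raised earlier in the thread.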
participants (9)

- bmanning@ISI.EDU
- Dave Crocker
- Jay R. Ashworth
- Lance Tatman
- owen@DeLong.SJ.CA.US
- Rodney Joffe
- Stephen Stuart
- Steve Feldman
- Wayne Bouchard