Re: Sonet protection usage
Bill Simpson writes on the "expertise list":

| APS is specified to switch a failing circuit over to a backup within 50
| milliseconds.

No. APS/MSP is a simple protocol that uses the K1/K2 MSOH bytes *between* SDH section terminating devices to determine which of several parallel paths is active and should be used as the working TxInterface->RxInterface pair.

In bidirectional line-switched restoration, either end may force a switch between working and protect; in unidirectional LSR, the receiver tracks the sender. In either mode, the switch can be as quick as the propagation delay of the section overhead with the new K1/K2 byte values.

Alternatively, if there is a cut on the working path, there will be a time threshold at which the receiver will start listening to a non-working path. This is typically 50ms, but one has to add the one-way delay to that.

If the sender knows there is an incipient failure on the working path, it can force a switch to a protect path using the K1/K2 protocol. Otherwise, we have a timeout, and how quickly the sender realizes that the working path is malfunctioning depends on whether the signalling is bidirectional or not, although that's often not very important.

The Cisco implementation splits the working and protect interfaces across multiple boxes; there is an out-of-band (IP-based) router-to-router protocol that synchronizes the K1/K2 protocol state. The routers can force a switch at the next-hop multiplex section terminator (usually an ADM) if they desire, and a clean shutdown is supposed to force such a switch rather than deal with timeouts.

Meanwhile, an SDH trail (an end-to-end circuit) can be composed of concatenated BLSR structures, where only a pair of adjacent multiplex terminators will be aware of a protection switch. It is not an end-to-end protocol as specified (although there is Cisco's "dumb-mode" which deals with broken implementations).
There is no requirement that every section of a trail use BLSR or any kind of restoration whatsoever. In many cases, the only BLSR-protected portions are between a pair of Ciscos and an adjacent ADM, at either end of the trail. Other people have noted reasons why one might choose to do this.

| It assumes that failure is in a multiplexor. It assumes
| that most of the circuits will be statistically idle. It assumes that
| the individual T3 (OC-1) paths are burstable, and can be recombined at
| the path or section layer. It assumes that 50 milliseconds is short
| relative to switching time. In short, it assumes voice. Data doesn't
| look like that at all!

Well data doesn't look like voice, it's true, but in all other respects this paragraph is wrong. All that APS/MSP does is make it easier to manage which of several supposedly-redundant signals an SDH section terminator hears is the working one. 50ms seems like a reasonable timeout value for section terminators to try to make sense of different signals heard from upstream. If it is sensible to use another value, there is no obvious reason why not to.

| If the failure is actually due to the usual circumstances, a lot of
| data "circuits" fail all at once. There is no chance that they will all
| be backed up.

What is protected in BLSR is the highest-order multiplex container. This is usually an STM-16 or OC48 (or STM-64 or OC192) header.

| APS (as sold) is a fraud on the uninformed.

Those who live in glass houses should not confuse APS with restoration topology layout.

Sean.
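For readers unfamiliar with the K1/K2 bytes discussed above, the following is a minimal sketch of how the two bytes might be unpacked. It is not taken from either poster. The field layout (bit 1 as the most significant bit, request code and channel number in K1, bridged channel, architecture bit and mode bits in K2) is assumed from the usual linear APS convention and should be checked against the relevant specification. The function names and the partial code tables are hypothetical.

```python
# Sketch only: field layout assumed from the linear APS convention
# (bit 1 = most significant bit). Verify against the spec before relying on it.

# Partial table of K1 request codes (upper four bits); other codes exist.
K1_REQUESTS = {
    0b1111: "Lockout of Protection",
    0b1110: "Forced Switch",
    0b1000: "Manual Switch",
    0b0110: "Wait-to-Restore",
    0b0010: "Reverse Request",
    0b0000: "No Request",
}

# K2 low three bits: operating mode or line-level alarm indication.
K2_MODES = {
    0b100: "unidirectional",
    0b101: "bidirectional",
    0b110: "RDI-L / MS-RDI",
    0b111: "AIS-L / MS-AIS",
}


def decode_k1(k1: int) -> dict:
    """Split K1 into its request code and the channel the request refers to."""
    request = (k1 >> 4) & 0x0F
    channel = k1 & 0x0F
    name = K1_REQUESTS.get(request, f"other (0b{request:04b})")
    return {"request": name, "channel": channel}


def decode_k2(k2: int) -> dict:
    """Split K2 into bridged channel, architecture bit and mode bits."""
    mode_bits = k2 & 0x07
    return {
        "bridged_channel": (k2 >> 4) & 0x0F,
        "architecture": "1:n" if (k2 >> 3) & 1 else "1+1",
        "mode": K2_MODES.get(mode_bits, f"other (0b{mode_bits:03b})"),
    }


if __name__ == "__main__":
    print(decode_k1(0xE1))  # forced switch requested for channel 1
    print(decode_k2(0x15))  # channel 1 bridged, 1+1, bidirectional
```

The point of the sketch is only that the signalling is a pair of overhead bytes exchanged between adjacent section terminators, which is why nothing further along the trail needs to know a switch happened.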
smd@clock.org wrote:
Bill Simpson writes on the "expertise list":
| APS is specified to switch a failing circuit over to a backup within 50 milliseconds.
No. APS/MSP is a simple protocol that uses the K1/K2 MSOH bytes *between* SDH section terminating devices to determine which of several parallel paths is active and should be used as the working TxInterface->RxInterface pair.
Yes, Sean, very good, I see that you've read the specification, too. (Some of us try not to confuse issues with overly verbose terminology, especially as the terms used by SONET in the US differ from SDH used everywhere else that you just cited.)
... This is typically 50ms, but one has to add the one-way delay to that.
I haven't seen anything about adding one-way delay to make it longer, could you cite, please? My reading was that the 50 milliseconds included delay, as 50 milliseconds would include a one-way trip around the world, and section regenerators are much closer together than that....
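As a rough check on the distances involved, the sketch below estimates one-way propagation delay over fibre. It is an illustration, not a figure from either poster or from the specification. The refractive index of 1.47, the 100 km metro span and the round figure for the Earth's circumference are all assumptions, and the function names are hypothetical.

```python
# Rough one-way propagation delay in optical fibre.
# Assumptions (not from the thread): refractive index 1.47, straight-line
# fibre runs, no regenerator or multiplexer processing delay.

C_VACUUM_KM_PER_S = 299_792.458   # speed of light in vacuum
FIBER_INDEX = 1.47                # assumed typical value for silica fibre


def one_way_delay_ms(distance_km: float, index: float = FIBER_INDEX) -> float:
    """Propagation delay in milliseconds over distance_km of fibre."""
    return distance_km * index / C_VACUUM_KM_PER_S * 1000.0


def reach_km(budget_ms: float, index: float = FIBER_INDEX) -> float:
    """Fibre distance light covers within budget_ms milliseconds."""
    return budget_ms / 1000.0 * C_VACUUM_KM_PER_S / index


if __name__ == "__main__":
    spans = [
        ("metro span (assumed 100 km)", 100),
        ("Earth circumference (about 40,075 km)", 40_075),
    ]
    for label, km in spans:
        print(f"{label}: {one_way_delay_ms(km):.2f} ms one way")
    print(f"fibre covered in 50 ms: {reach_km(50):.0f} km")
```

Under these assumptions a 50 ms budget corresponds to roughly 10,000 km of fibre, which is well short of a full circuit of the globe, while a metro-scale span adds well under a millisecond. Whether the one-way delay is counted inside or on top of the 50 ms therefore makes little practical difference for the metro-area circuits in question.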
If the sender knows there is an incipient failure on the working path, it can force a switch to a protect path using the K1/K2 protocol. Otherwise, we have a timeout, and how quickly the sender realizes that the working path is malfunctioning depends on whether the signalling is bidirectional or not, although that's often not very important.
As you may note in your nit-pickery, I wrote "within" 50 milliseconds, which is how the specification is written. We all agree it can be quicker. If it's not very important, why did you bring it up?
| APS (as sold) is a fraud on the uninformed.
Those who live in glass houses should not confuse APS with restoration topology layout.
Let us look at the original question, to wit:

Steve Feldman wrote:
#
# A quick poll:
#
# Are many ISPs taking advantage of SONET APS protection
# to provide port or router redundancy on short (metro-area)
# circuits? Or is it more typical to get two circuits and
# load-share? Or just not bother?
#

Now, it appears to me that the question was about metro-area circuits (recalling that these are usually called "intermediate" in the specification) -- not short circuits inside a single facility, and not transoceanic -- for "router redundancy". OK?

In giving a cogent answer, I've noted the carrier problems, and given examples and rationale. In my experience, multiple circuits with diverse topology is the best answer, because you also get additional bandwidth during the times that the circuit is not failing. We live in hope that the circuit doesn't fail very often, don't we?

Neil says (in a later message) that his metro never overbooks. And that may be a good thing. But I'm not sure that it applies to "router redundancy".

WSimpson@UMich.edu
Key fingerprint = 17 40 5E 67 15 6F 31 26 DD 0D B9 9B 6A 15 2C 32