Is it that sharing fate in the switching fabric (as opposed to, say, the transport fabric, or even the conduit) reduces the resiliency of a given service (in this case FR/ATM/TDM), and as such poses the "danger" you describe?
Sharing fate in the physical layer (multiple fibers in the same conduit) or in the transport layer (multiple services on the same SONET infrastructure) comes with clear, well-defined resource limits. A GigE running down a piece of fiber will NEVER jump over to the ATM network's fiber and wipe it out. The same goes for SONET: an STS-1 is an STS-1 and will never eat up an OC-48, no matter how much traffic it carries. The resource requirements are clear and well defined, with well-defined protection between resources.

Shared fate in the switching fabric won't be as stable until routers (the switching fabric) can allocate and manage resources in a clear and defined way. If resources are being overcommitted, the fabric must be able to absorb the full burden of resource requests while still enforcing appropriate resource limits on each service. QoS plays a part in managing the resources of a given link, but what manages the resources a service can consume in the fabric itself (CPU, memory, bandwidth)?

With proper traffic engineering you can build (or overbuild) the network to handle 'normal' traffic with a great deal of reliability. The switch fabric and/or the network itself must be able to protect itself from the abnormal: for example, by limiting the memory/CPU consumption of a flapping BGP peer so that you still have enough resources to handle the AToM traffic, which is given a higher priority. Let the BGP peers fail and let the Internet traffic drop in order to save the high-priority traffic and the MPLS glue traffic that keeps the core operational.

Wouldn't it be great if routers had the equivalent of 'User-mode Linux', with each process handling one service, isolated and protected from the others? The physical router would be nothing more than a generic kernel handling resource allocation. Each virtual router would have access to x amount of resources and would either halt, sleep, or crash when it exhausts those resources for a given time slice. I don't know of any method in current router offerings to limit a VRF to x% of CPU and y% of memory.

-Matt
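As a rough illustration of the per-service isolation described above, here is a minimal Python sketch. It assumes abstract CPU and memory "units" and uses hypothetical names (`VirtualRouter`, `Supervisor`, `cpu_per_slice`, `mem_limit`, `cpu_capacity`) that do not correspond to any real router OS. Quotas are hard-partitioned in the spirit of TDM channels: the supervisor refuses a configuration whose CPU quotas exceed capacity, a service that exhausts its CPU share in a time slice simply waits for the next slice, and a service that exceeds its memory cap is halted without touching the others.

```python
from dataclasses import dataclass


@dataclass
class VirtualRouter:
    """Hypothetical per-service partition (e.g. AToM, BGP) with fixed quotas."""
    name: str
    cpu_per_slice: int  # abstract CPU units allowed per time slice
    mem_limit: int      # abstract memory units allowed in total
    cpu_used: int = 0
    mem_used: int = 0
    halted: bool = False


class Supervisor:
    """Hypothetical 'generic kernel' that only enforces quotas; it never
    lends one service's unused share to another (no overcommit)."""

    def __init__(self, routers, cpu_capacity):
        if sum(r.cpu_per_slice for r in routers) > cpu_capacity:
            raise ValueError("CPU quotas overcommit the physical router")
        self.routers = {r.name: r for r in routers}

    def new_slice(self):
        # Start a new time slice: CPU budgets refill, memory usage persists.
        for r in self.routers.values():
            r.cpu_used = 0

    def request_cpu(self, name, units):
        # Grant at most what remains of this service's budget; the rest
        # must wait (sleep) until the next slice. Halted services get nothing.
        r = self.routers[name]
        if r.halted:
            return 0
        granted = max(0, min(units, r.cpu_per_slice - r.cpu_used))
        r.cpu_used += granted
        return granted

    def request_mem(self, name, units):
        # Exceeding the memory cap halts only this partition.
        r = self.routers[name]
        if r.halted:
            return False
        if r.mem_used + units > r.mem_limit:
            r.halted = True
            return False
        r.mem_used += units
        return True


if __name__ == "__main__":
    sup = Supervisor(
        [VirtualRouter("atom", cpu_per_slice=60, mem_limit=100),
         VirtualRouter("bgp", cpu_per_slice=30, mem_limit=50)],
        cpu_capacity=100,
    )
    # A flapping BGP peer asks for far more than its share...
    bgp_cpu = sup.request_cpu("bgp", 500)
    bgp_mem_ok = sup.request_mem("bgp", 80)
    # ...but AToM still gets everything it is entitled to.
    atom_cpu = sup.request_cpu("atom", 40)
    print(bgp_cpu, bgp_mem_ok, sup.routers["bgp"].halted, atom_cpu)
    # prints: 30 False True 40
```

Hard quotas trade efficiency for predictability: in this sketch idle AToM capacity is never lent to BGP, which mirrors the SONET-style property of fixed, protected allocations rather than statistical sharing. Picking "sleep" for CPU exhaustion and "halt" for memory exhaustion is just one of the options (halt, sleep, crash) mentioned in the email.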
Is this an accurate characterization of your point? If so, why should sharing fate in the switching fabric necessarily reduce the resiliency of those services that share that fabric (i.e., why should this be so)? I have some ideas, but I'm interested in what ideas other folks have.
Thanks,
Dave