On Wed, 22 Aug 2007, Mike Tancsa wrote:
Multihoming is great for when there is a total outage. In the case of Cogent on Monday, it wasn't "down"... In this case, there is only so much you can do to influence how packets come back at you, as BGP doesn't know anything about lossy or slow connections.
---Mike
Take the carrier that is causing you issues out of your eBGP setup and all's well....
Hi, in my case, I have 6453 and 174 for transit. I want to get to 577, which is directly connected to both 6453 and 174. 577 has a higher local pref on paths via 174. Short of shutting my 174 session (or some deaggregation), I don't have a way to influence how 577 gets back to me. I can easily exit out 6453, but that does nothing for the return packets. I have enough capacity on 6453 to handle all my traffic, but it's a Draconian step to take, and some traffic via 174 is fine and would be worse if I fully shut the session (i.e., peers of 174 in Toronto).
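A toy model of why the return path from 577 is hard to move from the customer side, and why deaggregation is the exception. This is a simplified sketch, not any router's real implementation: best-path selection is cut down to "highest local pref, then shortest AS path" (origin, MED, and the later tie-breakers are omitted), and the prefixes (198.51.100.0/24 documentation range), the customer ASN 64500, and the local-pref values 200/100 are all hypothetical. Prepending toward 174 is included to show that it wouldn't help, because local pref is compared before AS-path length; only a more-specific prefix, which wins on longest-prefix match, changes the outcome.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    next_hop_as: int          # neighbor AS the route was learned from
    local_pref: int           # set by the receiving AS's own policy (hypothetical values)
    as_path: tuple            # AS path as received


def best_path(candidates):
    """Simplified BGP selection among routes for the same prefix:
    highest LOCAL_PREF first, then shortest AS_PATH."""
    return max(candidates, key=lambda r: (r.local_pref, -len(r.as_path)))


def forward(routes, dst):
    """Longest-prefix match first, then BGP selection within that prefix length."""
    addr = ipaddress.ip_address(dst)
    matching = [r for r in routes if addr in r.prefix]
    if not matching:
        return None
    longest = max(r.prefix.prefixlen for r in matching)
    return best_path([r for r in matching if r.prefix.prefixlen == longest])


AGG = ipaddress.ip_network("198.51.100.0/24")  # hypothetical customer aggregate
ME = 64500                                     # hypothetical customer ASN

# AS577's view: the aggregate is learned via both transits, and 577 prefers 174.
# Even with the customer prepending three times toward 174, LOCAL_PREF wins.
routes_577 = [
    Route(AGG, 174, 200, (174, ME, ME, ME)),
    Route(AGG, 6453, 100, (6453, ME)),
]
print(forward(routes_577, "198.51.100.10").next_hop_as)  # 174

# Deaggregation: announce the two /25s only via 6453; the more-specifics win.
routes_577_deagg = routes_577 + [
    Route(ipaddress.ip_network("198.51.100.0/25"), 6453, 100, (6453, ME)),
    Route(ipaddress.ip_network("198.51.100.128/25"), 6453, 100, (6453, ME)),
]
print(forward(routes_577_deagg, "198.51.100.10").next_hop_as)  # 6453
```

The catch, as noted above, is that the more-specifics pull the return traffic from everyone else onto 6453 as well, not just from 577, and they add entries to everyone's routing table.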
I'm posting too much this week and should stop, but...

Again, this is a matter of thinking about design goals. What were you trying to accomplish when you bought redundant connections? It probably wasn't "redundancy" itself, but rather something that redundancy would give you.

What redundancy gives you is a better statistical probability that not all of the redundant components will be broken at once. It should be noted that multihoming is just one of many possible areas of redundancy: anything else that can break (routers, switches, cables, etc.) can also be set up redundantly. No amount of redundancy in any of those components guarantees reliability. What it does mean is that your network can keep functioning when some components break, as long as you still have enough of that component left to keep running.

So, in a redundant setup, what happens when a component breaks? In an ideal situation, it breaks cleanly, failover happens automatically, and nobody notices. Then you just have to hope your monitoring system is good enough that you know there's something to fix. But in an ideal situation, things wouldn't break at all, so designing your procedures around "ideal" failure scenarios doesn't make much sense.

What redundancy really gives you is the ability to keep outages from turning into major disruptions: when you see that a component is malfunctioning, you can turn it off and go back to sleep. You can then do the real fix later, when it's more convenient or less disruptive.

Thought about that way, there's nothing "Draconian" about turning off a connection (or a switch, or a router, or any other redundant component) that's not doing what you want it to. Instead, you're taking advantage of a main feature of your design. If your other providers are doing 95th percentile billing, you even have a day and a half per month during which you can leave a connection down at no financial cost: the top 5% of samples are discarded, and 5% of a 30-day month is 36 hours, so the extra traffic your other links carry during that time never reaches the billed rate.
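The "day and a half" figure can be checked with a quick simulation. This is a sketch under stated assumptions, not any provider's actual billing code: it assumes 5-minute samples over a 30-day month (8,640 samples), bills at the highest sample left after discarding the top 5% (providers differ in the exact rounding), and uses hypothetical rates of 100 Mbps normally and 300 Mbps on the surviving link while the other one is down.

```python
import math

SAMPLES_PER_HOUR = 12                  # assumed 5-minute sampling interval
SAMPLES = 30 * 24 * SAMPLES_PER_HOUR   # 30-day month = 8640 samples


def billable_rate(samples_mbps, percentile=95):
    """Sort the samples, discard the top (100 - percentile)%, and bill at
    the highest remaining sample (one common variant of the method)."""
    ordered = sorted(samples_mbps)
    index = math.ceil(len(ordered) * percentile / 100) - 1
    return ordered[index]


def month_with_outage(outage_hours, normal=100, during_outage=300):
    """Hypothetical month on the surviving link: it carries extra traffic
    only while the other connection is shut down."""
    bumped = outage_hours * SAMPLES_PER_HOUR
    return [during_outage] * bumped + [normal] * (SAMPLES - bumped)


free_hours = SAMPLES * 5 // 100 / SAMPLES_PER_HOUR
print(free_hours)                              # 36.0
print(billable_rate(month_with_outage(36)))    # 100 (outage fits in the discarded 5%)
print(billable_rate(month_with_outage(37)))    # 300 (one hour too many)
```

The free 36 hours is a shared budget: any other traffic peaks on that link during the month use up the same discarded 5%.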
The alternative, as you seem to have noticed, is to spend your day stressing out about your network not working properly, and complaining about being helpless. You don't need redundancy for that.

-Steve