BGP is not a bug-free protocol.
BGP is the easiest protocol to *debug* when the problem shows up.
BGP does not help to accidentally affect *unaffected* paths when a problem shows up.
While there is a recursion issue in the BGP<->IGP scenario, BGP would be just as "broke" if the only path between two nodes (and whatever nodes are behind them) had their BGP session removed. Misconfigurations do not imply bad network designs. Bugs are bugs (whether they be OSPF or ISIS or BGP). We have seen RSVP/SNMP/NTP/.*P bugs break routers.

I also would think that it is a bit of a stretch to criticize the design of the largest networks in the world, which, were it not for bugs here and there, are running just fine.

Further, and I think this is what is troubling people here: how, without IBGP mesh reduction mechanisms, could you build a non-fully meshed network without an IGP and static routes? The only way this is possible is via a combination of meshing, confeds, and route-reflectors, the latter two of which are busted by design. If you are building fully meshed networks, then they are small.

And one last comment - ISIS is the easiest protocol to troubleshoot IMO. Even RIP is harder because of all the silly holddowns and poison-reverses, etc. BGP is pretty straightforward as well, but there is a lot in the sauce that you need to know from vendor to vendor, etc. With ISIS, you have one DB (or 2), 1 LSP per router, 1 decision point.

Finally, you seem to have a problem with dependencies and recursion, philosophically. This surprises me from someone who I know writes code. Do you not use functions? Pointers? What you have said is that a program that breaks because one function relied on another (that failed) is a broken design.

My .02

chris
It looks like everyone forgot the reason for this discussion to begin with. It is the outage caused by a mistake on a single router that affected parts of the network that were *NOT* affected by the original mess.
Please note that this discussion tends to get restarted whenever we have a real OSPF (or ISIS) caused mess.
Alex
While there is a recursion issue in the BGP<->IGP scenario, BGP would be just as "broke" if the only path between two nodes (and whatever nodes are behind them) had their BGP session removed. Misconfigurations do not imply bad network designs. Bugs are bugs (whether they be OSPF or ISIS or BGP).
In the case at hand, the network had multiple paths to reach the outside world. Only one path was affected by the misconfiguration. Nonetheless, none of the other paths were used. Since the network statement was missing, the route was gone from the IGP. Where is the failover? How could transit customers in Philadelphia and New York be affected by an IGP mess in Chicago? I maintain it is caused by one thing and one thing only - bad design. Had this been a BGP route, the other paths would have kicked in just fine, provided that the other paths to the outside world existed.
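For readers following the "recursion" point in this exchange, here is a minimal sketch of recursive next-hop resolution: a BGP path is only usable while its next hop resolves through the IGP. The tables, addresses, and interface names are made up for illustration. This is not a reconstruction of the outage being discussed, and the helper names (`igp_lookup`, `usable_bgp_paths`) are hypothetical.

```python
from ipaddress import ip_address, ip_network

def igp_lookup(igp_routes, addr):
    """Longest-prefix match of addr in a toy IGP table; returns an interface name or None."""
    matches = [(ip_network(net), ifc) for net, ifc in igp_routes.items()
               if ip_address(addr) in ip_network(net)]
    if not matches:
        return None
    # The most specific (longest) matching prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def usable_bgp_paths(bgp_paths, igp_routes):
    """Keep only the BGP paths whose next hop still resolves through the IGP."""
    return [p for p in bgp_paths if igp_lookup(igp_routes, p["next_hop"]) is not None]

# Hypothetical toy data: two exits toward the outside world, each reached via the IGP.
igp = {"10.0.1.0/24": "to-chicago", "10.0.2.0/24": "to-newyork"}
paths = [
    {"prefix": "0.0.0.0/0", "next_hop": "10.0.1.1"},
    {"prefix": "0.0.0.0/0", "next_hop": "10.0.2.1"},
]

before = usable_bgp_paths(paths, igp)
del igp["10.0.1.0/24"]  # the IGP stops carrying the network behind the first next hop
after = usable_bgp_paths(paths, igp)
print(len(before), [p["next_hop"] for p in after])
```

In this toy model the second path survives only because its next hop sits under a prefix the IGP still carries. If both next hops depended on the withdrawn prefix, no path would remain, which is the dependency the two posters are arguing about.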
I also would think that it is a bit of a stretch to criticize the design of the largest networks in the world, which, were it not for bugs here and there, are running just fine.
Until they break. Again, personally, I would be very pleased to see AT&T lose a few million dollars on this, due to the violated SLAs, the same way as UUNET did (and we now know a lot about it, thanks to Worldcom's bankruptcy). If an accidental removal of a network statement in the router config causes such an outage, imagine what kind of problems someone can create by deliberately removing some of those statements?
Further, and I think this is what is troubling people here: how, without IBGP mesh reduction mechanisms, could you build a non-fully meshed network without an IGP and static routes? The only way this is possible is via a combination of meshing, confeds, and route-reflectors, the latter two of which are busted by design. If you are building fully meshed networks, then they are small.
Confederations, peering between real interfaces, and MEDs. Route-injectors to drop in fixer routes help as well.
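The "fully meshed networks are small" point in this exchange comes down to arithmetic: a full IBGP mesh of n routers needs n(n-1)/2 sessions. A rough sketch of that count follows, compared against the simplest reflector layout (one reflector, everyone else a client, no redundancy). The function names are hypothetical, and real designs with redundant reflectors or confederations fall somewhere in between.

```python
def full_mesh_sessions(n):
    """IBGP sessions needed when every one of n routers peers with every other."""
    return n * (n - 1) // 2

def single_reflector_sessions(n):
    """Sessions when one router reflects for the other n - 1 (assumed layout, no redundancy)."""
    return max(n - 1, 0)

for n in (5, 50, 500):
    print(n, full_mesh_sessions(n), single_reflector_sessions(n))
```

The quadratic growth is why mesh reduction comes up at all; whether the reduction mechanisms are sound is the point the posters disagree on.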
Finally, you seem to have a problem with dependencies and recursion, philosophically. This surprises me from someone who I know writes code. Do you not use functions? Pointers? What you have said is that a program that breaks because one function relied on another (that failed) is a broken design.
I do not have a problem with dependencies and recursion. What I have a problem with is the black box implementation of those. The thought of putting a box on a network, putting "router ospf 22" into it, and seeing it come up everywhere *scares the living shit* out of me. I am a firm believer in the KISS principle. I am also a firm believer in forcing people to go one extra step to make sure that they *really* want to do certain things. It is more than likely that I would not have had such a strong opinion of existing IGPs (OSPF and ISIS specifically) if those IGPs were following a "don't tell anyone anything" policy until instructed otherwise.

Thanks,
Alex
participants (2)
- alex@yuriev.com
- Martin, Christian