I think faster routers and bigger pipes will not solve this problem. As a friend likes to say, that is necessary but not sufficient. HWB is right when he says that the network design is not coping. He left out tools, operations and customer support. I think there is some percentage of the service providers out there who don't care about the service they provide, but I believe (hope) that the majority really do care. I don't know of a provider whose technical staff is big enough. They all have open jobs equal to about 45% of their current staff. Until there is a sufficient engineering pool to handle the growth, or the growth stops long enough for us to catch up, we are stuck being more than a day late. (Hell, several of them have stooped low enough to offer me a job.) Both my direct experience and the discussions I have had say that dollars, pounds, D-marks and yen are not the problem.
Jerry
] I think faster routers and bigger pipes will not solve this problem. As a friend likes to say, that is necessary but not sufficient. HWB is right when he says that the network design is not coping. He left out tools, operations and customer support. I think there is some percentage of the service providers out there who don't care about the service they provide, but I believe (hope) that the majority really do care.
This whole mess reminds me of some housing projects I drove by in DC. Perhaps that is what our current Internet is: low-rent housing with mass density and all the problems that come with it. A wise man has a vision of an Internet laid out much more hierarchically. I have reservations about a monopoly of certain companies controlling the top layers of the Internet. Still, I think it might be the only way to keep these cesspools of intelligence from corrupting the NAPs and corrupting my connection to anyone single-homed through MAE-East.
The update I just got on the MAE-E/"Sprint" problem seems rather timely to the discussion.
" We have hit the point where BGP processing on SprintLink routers can no longer survive moderate fall-overs. There is no fix for this, except:
a/ people MUST withdraw as many prefixes as possible
b/ the background route-flap MUST be reduced
This is a global problem, and is not, and will not be, confined to SprintLink.
In short, as has been said on countless mailing-lists for more than a year: CIDRize or die. " says Sean.
-alan
The update I just got on the MAE-E/"Sprint" problem seems rather timely to the discussion.
" We have hit the point where BGP processing on SprintLink routers can no longer survive moderate fall-overs. There is no fix for this, except:
a/ people MUST withdraw as many prefixes as possible
b/ the background route-flap MUST be reduced
This is a global problem, and is not, and will not be, confined to SprintLink.
In short, as has been said on countless mailing-lists for more than a year: CIDRize or die. " says Sean.
Note that the problems that hit SprintLink today also affected AlterNet. We are not 100% sure yet (we are still comparing timestamps in logs). It looks like routing problems hit SprintLink, SprintLink route flaps then got to AlterNet, and parts of AlterNet also tipped over (we had several routers near MAE-East hit 100% CPU trying to deal with route flaps). CIDRize or die. I intend to spend quite some time this evening/night trying to see what additional routes AlterNet can squeeze out of the routing tables; I strongly suggest that other service providers do the same.
--asp@uunet.uu.net (Andrew Partan)
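Squeezing routes out of the table mostly means CIDR aggregation. A provider announces one covering prefix instead of its contiguous, aligned more-specifics, and drops more-specifics that another announcement already covers. The following is a minimal sketch using Python's standard-library `ipaddress.collapse_addresses`. The prefix list is hypothetical. In practice a provider can only aggregate blocks it actually routes as a unit; for example, a multihomed customer's more-specific may have to stay.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse IPv4 prefixes into the smallest set covering the same addresses.

    Aligned, contiguous more-specifics merge into their covering supernet;
    a prefix already covered by a shorter one is dropped.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


# Hypothetical announcements: four /24s that together form one /22,
# plus a /24 that falls inside a separately announced /16.
announced = [
    "192.168.0.0/24",
    "192.168.1.0/24",
    "192.168.2.0/24",
    "192.168.3.0/24",
    "10.1.5.0/24",
    "10.1.0.0/16",
]

aggregated = aggregate(announced)
print(f"{len(announced)} prefixes -> {len(aggregated)}: {aggregated}")
# 6 prefixes -> 2: ['10.1.0.0/16', '192.168.0.0/22']
```

Only aligned blocks merge. For example, `192.168.1.0/24` and `192.168.2.0/24` are adjacent but do not form a valid /23, so `aggregate` returns both unchanged.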
In message <199511082323.RAA20836@gaijin.mid.net>, Alan Hannan writes:
The update I just got on the MAE-E/"Sprint" problem seems rather timely to the discussion.
" We have hit the point where BGP processing on SprintLink routers can no longer survive moderate fall-overs. There is no fix for this, except:
a/ people MUST withdraw as many prefixes as possible
b/ the background route-flap MUST be reduced
You could also take routing from the RA route servers. This would solve your problem. I think this was mentioned at the last NANOG meeting, so it is not a new solution either.
Curtis
"Curtis Villamizar" == Curtis Villamizar <curtis@ans.net> writes:
Curtis Villamizar> In message <199511082323.RAA20836@gaijin.mid.net>, Alan Hannan writes:
>> The update I just got on the MAE-E/"Sprint" problem seems rather timely to the discussion.
>>
>> " We have hit the point where BGP processing on SprintLink routers can no longer survive moderate fall-overs. There is no fix for this, except:
>> a/ people MUST withdraw as many prefixes as possible
>> b/ the background route-flap MUST be reduced
Curtis Villamizar> You could also take routing from the RA route servers. This would solve your problem. I think this was mentioned at the last NANOG meeting, so it is not a new solution either.
Curtis Villamizar> Curtis
First you will have to explain to me how taking routing from the RA route server would help avoid a collapse of my iBGP mesh...
Sean.
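One way to see the distinction Sean is drawing is to count sessions. A route server at an exchange can replace a bilateral mesh of external peerings with one session per peer (per server). A provider's internal full iBGP mesh sits outside the route server's reach and still needs n(n-1)/2 sessions. The rough sketch below uses hypothetical router and peer counts, and the helper names are illustrative only.

```python
def full_mesh_sessions(n: int) -> int:
    """Sessions needed for each of n BGP speakers to peer with every other."""
    return n * (n - 1) // 2


def route_server_sessions(n: int, servers: int = 1) -> int:
    """Sessions if each of n peers talks only to `servers` route servers."""
    return n * servers


# Hypothetical numbers: 20 providers at one exchange point,
# 50 backbone routers inside a single provider.
exchange_peers = 20
backbone_routers = 50

print("exchange, bilateral mesh:", full_mesh_sessions(exchange_peers))  # 190
print("exchange, via two route servers:",
      route_server_sessions(exchange_peers, servers=2))                 # 40
# The route server does not participate in the provider's internal mesh,
# so this count is the same whether or not routes come from a route server.
print("provider iBGP full mesh:", full_mesh_sessions(backbone_routers))  # 1225
```

The external count drops from quadratic to linear in the number of peers, while the internal count is untouched.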
In message <95Nov9.012409-0000_est.20701+10@chops.icp.net>, Sean Doran writes:
First you will have to explain to me how taking routing from the RA route server would help avoid a collapse of my iBGP mesh...
Sean.
You got me on that. I can't help you. It doesn't look like we've lost an IBGP connection since Nov 5 at 00:06:37 (E147). :-) We only lost a number of EBGP sessions between AS690 and our Cisco concentrators. We did lose quite a few EBGP sessions to other providers. Since about 1992, when rcp_routed managed 3 backbone-wide core dumps in a week, it has usually been a year or more between events of anything like the magnitude of an IBGP collapse on our net. So my experience with this sort of thing is quite limited. We still do find an occasional gated bug though. I defer to the experts. :-)
Curtis
ps - faster routers in the world. yeah right. so much for the Bradner tests. :-) :-)
a/ people MUST withdraw as many prefixes as possible
b/ the background route-flap MUST be reduced
You could also take routing from the RA route servers. This would solve your problem. I think this was mentioned at the last NANOG meeting, so it is not a new solution either.
The RA does route aggregation? I didn't know that. So I can send it all of my more specifics and it will aggregate them for me? Neat. --asp
a/ people MUST withdraw as many prefixes as possible
b/ the background route-flap MUST be reduced
You could also take routing from the RA route servers. This would solve your problem. I think this was mentioned at the last NANOG meeting, so it is not a new solution either.
The RA does route aggregation? I didn't know that. So I can send it all of my more specifics and it will aggregate them for me? Neat. --asp
Well, the RA does not, but the route server code running in the route servers does. --bill
In message <QQzpbq15078.199511090730@rodan.UU.NET>, Andrew Partan writes:
a/ people MUST withdraw as many prefixes as possible
b/ the background route-flap MUST be reduced
You could also take routing from the RA route servers. This would solve your problem. I think this was mentioned at the last NANOG meeting, so it is not a new solution either.
The RA does route aggregation? I didn't know that. So I can send it all of my more specifics and it will aggregate them for me? Neat. --asp
It just reduces route flap by doing BGP dampening, plus it reduces the number of peering sessions you need to maintain. That helps with b/ in the list above. As Sean pointed out, if your network is imploding, possibly because the flap is within your own network, then there is nothing the RS can do to help you. (You pointed out at NANOG that this may be the problem for SprintLink and AlterNet.)
Curtis
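The route servers' contribution to b/ is flap dampening. Below is a minimal sketch of the general penalty-and-decay idea behind route flap damping, as later specified in RFC 2439. It is not the route server code itself. The thresholds, half-life and the `DampedRoute` class are illustrative assumptions. The maximum-suppress-time cap that real implementations apply is left out for brevity.

```python
# Illustrative parameters only (assumed, not from this thread or from the
# RA route server configuration); they resemble commonly used vendor defaults.
PENALTY_PER_FLAP = 1000.0
SUPPRESS_LIMIT = 2000.0
REUSE_LIMIT = 750.0
HALF_LIFE = 15 * 60.0  # seconds


class DampedRoute:
    """Hypothetical per-prefix flap-damping state."""

    def __init__(self) -> None:
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now: float) -> None:
        # The penalty decays exponentially, halving every HALF_LIFE seconds.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE)
        self.last_update = now
        # A suppressed route is reused once its penalty falls below REUSE_LIMIT.
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False

    def flap(self, now: float) -> None:
        """Record one withdraw/re-announce cycle at time `now` (seconds)."""
        self._decay(now)
        self.penalty += PENALTY_PER_FLAP
        # Crossing the suppress limit stops the route from being used or passed on.
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True

    def usable(self, now: float) -> bool:
        """True if the route may be used and propagated to peers at `now`."""
        self._decay(now)
        return not self.suppressed


route = DampedRoute()
for t in (0, 60, 120):           # three flaps one minute apart
    route.flap(t)
print(route.usable(120))         # False: suppressed after the third flap
print(route.usable(120 + 1800))  # True: two half-lives later, below the reuse limit
```

With these numbers, two quick flaps are tolerated and the third suppresses the prefix. The prefix comes back after about half an hour of stability. Meanwhile, downstream routers stop seeing every withdrawal and re-announcement.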
participants (6)
- Alan Hannan
- asp@uunet.uu.net
- bmanning@ISI.EDU
- Curtis Villamizar
- scharf@vix.com
- Sean Doran