Re: [Nanog] Cogent Router dropping packets
I know I have experienced the engineering department there as well. The best one was when they wanted paper documentation for every route I asked to have in our filters (and they were incapable of using RADB). It was especially odd since we have > 80 of our own peers and three other transit providers to whom we were announcing over 100 routes, while they still wanted paper docs.
But filters seem to be an annoyance for most big providers. I have been trying to get Level3 to fix our RADB-based filtering for a while now (it just stopped pulling new updates for some reason). :)
John
-----Original Message-----
From: manolo [mailto:mhernand1@comcast.net]
Sent: Tuesday, April 22, 2008 7:23 AM
To: Joe Greco
Cc: nanog@merit.edu
Subject: Re: [Nanog] Cogent Router dropping packets
Well, it also was the total arrogance on the part of Cogent engineering and management, taking zero responsibility and pushing it back every time, valid issue or not. You had to be there. But everyone has a different opinion; my opinion is set regardless of what Cogent tries to sell me now.
Manolo
Joe Greco wrote:
Well, it had sounded like I was in the minority and should keep my mouth shut. But here goes. On several occasions the peer that would advertise our routes would drop, and with that the peer with the full BGP tables would drop as well. This happened for months on end. They tried blaming our 6500, our fiber provider, our IOS version; no conclusive findings were ever found that it was our problem. After some testing at the local Cogent office by both Cogent and myself, Cogent decided that they could "make a product" that would allow us, one, to have only one peer and, two, to connect directly to the GSR and not through a small Catalyst. Lo and behold, things worked well for some time after that.
This all happened while we had 3 other providers on the same router with no issues at all. We moved GBICs, ports, etc. around to make sure it was not some odd ASIC or throughput issue with the 6500.
Perhaps you haven't considered this, but did it ever occur to you that Cogent probably had the same situation? They had a router with a bunch of other customers on it, no reported problems, and you were the oddball reporting significant issues?
Quite frankly, your own description does not support this as being a problem inherent to the peerA/peerB setup.
You indicate that the peer advertising your routes would drop. The peer with the full BGP tables would then drop as well. Well, quite frankly, that makes complete sense. The peer advertising your routes also advertises to you the route to get to the multihop peer, which you need in order to be able to talk to that. Therefore, if the directly connected BGP goes away for any reason, the multihop is likely to go away too.
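To make that dependency concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the `Router` class, the session names `peer_a` and `peer_b`, and the address 192.0.2.1/32 (a documentation prefix) are invented for illustration, and nothing here models real BGP timers, route selection, or Cogent's actual configuration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Router:
    """Toy customer router: tracks sessions and the prefixes learned from each."""

    # session name -> prefixes learned over that session
    learned: dict = field(default_factory=dict)
    # session name -> prefix the session needs to reach its peer
    # (None means the peer is directly connected)
    needs_route_to: dict = field(default_factory=dict)

    def add_session(self, name: str, prefixes: set,
                    needs_route_to: Optional[str] = None) -> None:
        self.learned[name] = set(prefixes)
        self.needs_route_to[name] = needs_route_to

    def reachable(self, target: str, exclude: str) -> bool:
        # A session cannot rely on routes it learned from itself.
        return any(target in prefixes
                   for name, prefixes in self.learned.items()
                   if name != exclude)

    def drop(self, name: str) -> None:
        """Tear down a session, then any session left without a route to its peer."""
        self.learned.pop(name, None)
        self.needs_route_to.pop(name, None)
        for other, target in list(self.needs_route_to.items()):
            if target is not None and not self.reachable(target, exclude=other):
                self.drop(other)

    def up(self) -> list:
        return sorted(self.learned)


# Hypothetical setup: peer_a is the directly connected session and carries
# the route to the multihop peer; peer_b is the multihop full-table session.
router = Router()
router.add_session("peer_a", {"192.0.2.1/32"})
router.add_session("peer_b", {"full-table"}, needs_route_to="192.0.2.1/32")

router.drop("peer_a")   # the direct session flaps...
print(router.up())      # ...and the multihop session goes with it
```

In this model, dropping `peer_b` alone leaves `peer_a` up, while dropping `peer_a` always takes `peer_b` down, which matches the symptom described.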
However, given the exact same hardware minus the multihop, your direct BGP was still dropping. So had they been able to send you a full table from the aggregation router, the same thing probably would have happened.
This sounds more like flaky hardware, dirty optics, or a bad cable (or several of the above).
Given that, it actually seems quite reasonable to me to guess that it could have been your 6500, your fiber provider, or your IOS version that was introducing some problem. Anyone who has done any reasonable amount of work in this business will have seen all three, and many of the people here will say that the 6500 is a bit flaky and touchy when pushed into service as a real router (while simultaneously using them in their networks as such, heh, since nothing else really touches the price per port). So Cogent's suggestion that it was a problem on your side may have been based on bad experiences with other customers' 6500s.
However, it is also likely that it was some other mundane problem, or a problem with the same items on Cogent's side. I would consider it a shame that Cogent didn't work more closely with you to track down the specific issue, because most of the time, these things can be isolated and eliminated, rather than being potentially left around to mess up someone in the future (think: bad port).
... JG
_______________________________________________
NANOG mailing list
NANOG@nanog.org
http://mailman.nanog.org/mailman/listinfo/nanog
John van Oppen (list account) wrote:
I know I have experienced the engineering department there as well. The best one was when they wanted paper documentation for every route I asked to have in our filters (and they were incapable of using RADB). It was especially odd since we have > 80 of our own peers and three other transit providers to whom we were announcing over 100 routes, while they still wanted paper docs.
I've fixed this by throwing their own policies back at them. Point out to them that the route is already appearing globally through your AS, and remind them that their policy, section 3b, already allows that. :)
On the previous topic, I'd have to say that their two-peer system is perhaps one of the better, if not the best, multihop implementations I've seen. Among other things, it tends to provide a rapid assessment of "life in the POP". I just wish they'd use their network status messages to reflect when they were having problems, instead of just problems that are too large for the call center to handle. :(
pt
participants (2)
- John van Oppen (list account)
- Pete Templin