Re: [Nanog] Cogent Router dropping packets
(Crossed Fingers)

Cogent's network seems "OK", for now.

I've received several responses asking for details on how I would avoid Cogent. It looks like getting a connection to the AT&T network will allow us to serve our customers on their DSLs and use their direct peering to the Time Warner network for our customers with cable Internet.

If anyone has any ideas on how this will work, please let me know. For instance, do most networks prefer to keep packets on their network until closest to the end point, or might a network just send the traffic through Cogent in another part of their network a few hops away?

-----Original Message-----
From: Mike Fedyk
Sent: Thursday, April 17, 2008 2:59 PM
To: nanog@merit.edu
Subject: RE: Cogent Router dropping packets

I spoke too soon:

    Host                                            Loss%  Snt   Last    Avg   Best   Wrst  StDev
 1. adsl-63-194-XXX-XXX.dsl.lsan03.pacbell.net       0.0%  109    9.2   19.2    8.4   57.9   11.0
 2. dist3-vlan60.irvnca.sbcglobal.net                0.9%  109    8.4   16.7    8.3   45.6    9.6
 3. bb1-p6-7.emhril.ameritech.net                    0.0%  109    8.6   36.3    8.5  256.6   44.2
 4. ex2-p14-0.eqlaca.sbcglobal.net                   0.0%  109   10.3   39.4    9.3  209.3   46.2
 5. te8-1.mpd01.lax05.atlas.cogentco.com             0.0%  108   32.4   34.3    9.3  238.6   45.1
 6. vl3491.ccr02.lax01.atlas.cogentco.com            3.7%  108   17.0   23.4   12.9   98.9   13.4
 7. te3-4.ccr01.lax04.atlas.cogentco.com            17.6%  108   39.1   28.8   16.4  198.9   22.1
 8. vl3805.na21.b002695-2.lax04.atlas.cogentco.com  12.0%  108   34.1   27.6   17.0   68.7   11.2
 9. PAETEC_Communications_Inc.demarc.cogentco.com   10.2%  108   22.4   35.3   17.0  168.7   27.8
10. gi-4-0-1-3.core01.lsajca01.paetec.net           18.5%  108   21.2   34.2   21.0  188.6   20.6
11. po-5-0-0.core01.anhmca01.paetec.net             10.3%  108   35.7   33.9   20.5  232.7   23.9
12. gi-3-0-0.edge03.anhmca01.paetec.net             13.0%  108   21.0   31.6   20.2  157.9   16.6
13. 74.10.xxx.xxx                                   11.1%  108   25.7   33.9   25.2   55.2    8.9
14. 74.10.xxx.xxx                                   15.7%  108   26.7   35.7   25.0   70.8   11.7

-----Original Message-----
From: Mike Fedyk
Sent: Thursday, April 17, 2008 2:15 PM
To: Ryan Harden
Cc: nanog@merit.edu
Subject: RE: Cogent Router dropping packets

Thank you, the issue seems to be fixed now at Cogent.

_______________________________________________
NANOG mailing list
NANOG@nanog.org
http://mailman.nanog.org/mailman/listinfo/nanog
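One way to read the mtr report above: loss that shows up at a single hop and then disappears is usually just that router rate-limiting its ICMP replies, while loss that persists through every later hop to the destination points at a real problem at or just before the hop where it starts. A minimal Python sketch of that rule of thumb follows. The function names and the 5% threshold are assumptions made for illustration, not anything from the thread.

```python
import re

# Hop lines copied from the mtr report above. Only the hop number,
# host and Loss% columns are used by this sketch.
REPORT = """\
1. adsl-63-194-XXX-XXX.dsl.lsan03.pacbell.net 0.0% 109 9.2 19.2 8.4 57.9 11.0
2. dist3-vlan60.irvnca.sbcglobal.net 0.9% 109 8.4 16.7 8.3 45.6 9.6
3. bb1-p6-7.emhril.ameritech.net 0.0% 109 8.6 36.3 8.5 256.6 44.2
4. ex2-p14-0.eqlaca.sbcglobal.net 0.0% 109 10.3 39.4 9.3 209.3 46.2
5. te8-1.mpd01.lax05.atlas.cogentco.com 0.0% 108 32.4 34.3 9.3 238.6 45.1
6. vl3491.ccr02.lax01.atlas.cogentco.com 3.7% 108 17.0 23.4 12.9 98.9 13.4
7. te3-4.ccr01.lax04.atlas.cogentco.com 17.6% 108 39.1 28.8 16.4 198.9 22.1
8. vl3805.na21.b002695-2.lax04.atlas.cogentco.com 12.0% 108 34.1 27.6 17.0 68.7 11.2
9. PAETEC_Communications_Inc.demarc.cogentco.com 10.2% 108 22.4 35.3 17.0 168.7 27.8
10. gi-4-0-1-3.core01.lsajca01.paetec.net 18.5% 108 21.2 34.2 21.0 188.6 20.6
11. po-5-0-0.core01.anhmca01.paetec.net 10.3% 108 35.7 33.9 20.5 232.7 23.9
12. gi-3-0-0.edge03.anhmca01.paetec.net 13.0% 108 21.0 31.6 20.2 157.9 16.6
13. 74.10.xxx.xxx 11.1% 108 25.7 33.9 25.2 55.2 8.9
14. 74.10.xxx.xxx 15.7% 108 26.7 35.7 25.0 70.8 11.7
"""

# "<hop>. <host> <loss>%" at the start of each line.
HOP_RE = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+([\d.]+)%")


def parse_hops(report):
    """Return a list of (hop_number, host, loss_percent) tuples."""
    hops = []
    for line in report.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return hops


def first_sustained_loss(hops, threshold=5.0):
    """Return the first hop from which every later hop, through the
    destination, also shows loss >= threshold. Returns None if the
    final hop is below the threshold."""
    start = None
    for hop in hops:
        if hop[2] >= threshold:
            if start is None:
                start = hop
        else:
            # Loss did not persist, so an earlier candidate is discarded.
            start = None
    return start


if __name__ == "__main__":
    print(first_sustained_loss(parse_hops(REPORT)))
```

With the default threshold this points at hop 7 (te3-4.ccr01.lax04.atlas.cogentco.com), inside Cogent's network. The 0.9% at hop 2 and the 3.7% at hop 6 fall below the threshold and are ignored.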
It is Saturday after all. We generally are all aware of Cogent's 'status'. You're not having a unique experience.

Martin

On 4/18/08, Mike Fedyk <mike@reachme.com> wrote:
(Crossed Fingers)
Cogent's network seems "OK", for now.
Some things just never change at cogent.. fought them for months way back when to get me off their infamous 2 bgp peer setup after many an outage due to this setup, they finally put us on a single bgp session but it took forever. Let's just say cogent didn't last long at the company I worked for. You get what you pay for....

Manolo

Martin Hannigan wrote:
It is Saturday after all. We generally are all aware of Cogent's 'status'. You're not having a unique experience.
Martin
On Sat, Apr 19, 2008 at 7:26 PM, manolo <mhernand1@comcast.net> wrote:
Some things just never change at cogent.. fought them for months way back when to get me off their infamous 2 bgp peer setup after many an outage due to this setup, they finally put us on a single bgp session but it took forever. Lets just say cogent didn't last long at the company I worked for.
Could you provide additional details on the failure mode experienced resultant from this "two tiered" configuration? How did moving to a "conventional" configuration with a single directly-connected neighbor solve things?

What steps were taken during your postmortem and subsequent lab simulations to verify that the outages were not with the customer-side implementation, or perhaps a simple typographical error?

Here in H-town, we are deploying a metro/BLEC network comprised of 1000s of small L3 boxes not carrying full tables (Cisco 3560 and similar), and would like very much to learn from these major architectural mistakes, so that we can avoid similar outage scenarios. Any information you could provide would be excellent.
You get what you pay for....
Not passing any judgment on quality, Cogent is more towards the middle of the road for price, these days, on larger commits.

Paul Wall
On Sat, Apr 19, 2008 at 7:26 PM, manolo <mhernand1@comcast.net> wrote:
Some things just never change at cogent.. fought them for months way back when to get me off their infamous 2 bgp peer setup after many an outage due to this setup, they finally put us on a single bgp session but it took forever. Lets just say cogent didn't last long at the company I worked for.
Could you provide additional details on the failure mode experienced resultant from this "two tiered" configuration? How did moving to a "conventional" configuration with a single directly-connected neighbor solve things?
For those unfamiliar, Cogent has a system where you set up an EBGP peering with the Cogent router you're connected to, for the purposes of announcing your routes into Cogent. However, these are typically smaller, aggregation class routers, and do not handle full tables - so you don't get your routes from that router. To get a full table FROM Cogent, you need to set up an EBGP multihop session with them, to their nearest full-table router. I believe they actually do all their BGP connections in that manner.

This probably makes a lot of sense from an engineering point of view, and could be construed as a BGP competence test. On the other hand, it does have the potential to make things more complex in the event of a failure.

I'm not aware of any flaws with such a design that would cause "many an outage," and connections that we've managed for customers with Cogent suggest that it works well. However, if there are problems within the local Cogent node, I could easily see situations where hard-to-identify problems could result. That would seem to me to be an equipment, capacity, or possibly a configuration issue, but not something which discredits the overall strategy.

Given that they're providing inexpensive bandwidth, it isn't likely that they'll be sticking large routers everywhere for the customers who want a full table and a simpler BGP configuration.

There are many things that you can realistically criticize Cogent for, but I'm not sure the peerA/peerB thing should be one of them. It is certainly more complex, but seems to serve a purpose.
What steps were taken during your postmortem and subsequent lab simulations to verify that the outages were not with the customer-side implementation, or perhaps a simple typographical error?
Here in H-town, we are deploying a metro/BLEC network comprised of 1000s of small L3 boxes not carrying full tables (Cisco 3560 and similar), and would like very much to learn from these major architectural mistakes, so that we can avoid similar outage scenarios. Any information you could provide would be excellent.
Interesting :-)
You get what you pay for....
Not passing any judgment on quality, Cogent is more towards the middle of the road for price, these days, on larger commits.
Or in places like Ashburn. I've been wondering what their future strategy will be.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
Joe Greco wrote:
For those unfamiliar, Cogent has a system where you set up an EBGP peering with the Cogent router you're connected to, for the purposes of announcing your routes into Cogent. However, these are typically smaller, aggregation class routers, and do not handle full tables - so you don't get your routes from that router. To get a full table FROM Cogent, you need to set up an EBGP multihop session with them, to their nearest full-table router. I believe they actually do all their BGP connections in that manner.

Depends on the service you purchase. Fast Ethernet seems to be delivered as eBGP-multihop (the first hop is just a L3 switch), however DS-3 is handled as a single BGP session. I'm not sure if GigE or SONET services are handled as multihop or not.

Probably all depends what hardware they have at each POP....
Joe Greco wrote:
For those unfamiliar, Cogent has a system where you set up an EBGP peering with the Cogent router you're connected to, for the purposes of announcing your routes into Cogent. However, these are typically smaller, aggregation class routers, and do not handle full tables - so you don't get your routes from that router. To get a full table FROM Cogent, you need to set up an EBGP multihop session with them, to their nearest full-table router. I believe they actually do all their BGP connections in that manner.
Depends on the service you purchase. Fast Ethernet seems to be delivered as eBGP-multihop (the first hop is just a L3 switch), however DS-3 is handled as a single BGP session. I'm not sure if GigE or SONET services are handled as multihop or not.
GigE is, though perhaps not in all cases (we had a client buying x00Mbps delivered over gigE, which was definitely multihop).
Probably all depends what hardware they have at each POP....
In part, I'm sure. There is also a certain benefit to having consistency throughout your network, and it sometimes struck me that many of the folks working for Cogent had a bit more than average difficulty dealing with the unusual situation. This is not meant harshly, btw. Generally I like the Cogent folks, but they (and their products) have their faults, just as any of the competition does.

It may also help to remember that there's "legacy" Cogent and then there's PSI/etc. Perhaps there are some differences as a result.

The more things you can do using the same template, the less difficult it is to support. On the flip side, the less flexible you are ...

... JG
I do have to say that the PSI net side of cogent is very good. We use them in Europe without many issues. I stay far away from the legacy cogent network in US.

Manolo

Joe Greco wrote:
It may also help to remember that there's "legacy" Cogent and then there's PSI/etc. Perhaps there are some differences as a result.
Not sure what you are talking about, cogent is all AS174... Other than a few odd routers doing DS3 aggregation I don't think there is any old PSInet network online (other than the AS number and IP addresses). Cogent integrated acquisitions quite quickly (I was an aleron customer and it only took two months from the purchase close for us to move from AS4200 to 174).

As for the two BGP peer question, they do it anywhere where they have Ethernet distribution, at least as far as I can tell. That being said, we don't use them anymore since we could not get them to play ball on pricing at larger commits either (I won't buy cogent if they don't at least match the terms of our cheapest large-network transit provider). :)

John van Oppen
Spectrum Networks LLC
206.973.8302 (Direct)
206.973.8300 (main office)

-----Original Message-----
From: manolo [mailto:mhernand1@comcast.net]
Sent: Monday, April 21, 2008 1:03 PM
To: Joe Greco
Cc: nanog@merit.edu
Subject: Re: [Nanog] Cogent Router dropping packets

I do have to say that the PSI net side of cogent is very good. We use them in Europe without many issues. I stay far away from the legacy cogent network in US.

Manolo
On Mon, Apr 21, 2008 at 4:02 PM, manolo <mhernand1@comcast.net> wrote:
I do have to say that the PSI net side of cogent is very good. We use them in Europe without many issues. I stay far away from the legacy cogent network in US.
You still haven't explained the failure modes you've experienced as a result of cogent's A/B peer configuration, only fronted.

Inquiring minds would like to know!
Well it had sounded like I was in the minority and should keep my mouth shut. But here goes. On several occasions the peer that would advertise our routes would drop and with that the peer with the full bgp tables would drop as well. This happened for months on end. They tried blaming our 6500, our fiber provider, our IOS version; no conclusive findings were ever found that it was our problem. After some testing at the local Cogent office by both Cogent and myself, Cogent decided that they could "make a product" that would allow us to, one, have only one peer and, two, connect directly to the GSR and not through a small catalyst. Lo and behold, things worked well for some time after that.

This all happened while we had 3 other providers on the same router with no issues at all. We moved gbics, ports etc around to make sure it was not some odd ASIC or throughput issue with the 6500.

Hope this answers the question.

Manolo

Paul Wall wrote:
You still haven't explained the failure modes you've experienced as a result of cogent's A/B peer configuration, only fronted.
Inquiring minds would like to know!
manolo wrote:
Well it had sounded like I was in the minority and should keep my mouth shut. But here goes. On several occasions the peer that would advertise our routes would drop and with that the peer with the full bgp tables would drop as well.

That doesn't sound like the problem has anything to do with their multihop-eBGP configuration - it just appears that whatever you were directly connected to was flaking out. If they had moved you to a directly connected BGP session and it all worked, that would be one argument, but you also moved from a junky 3550 or something to the GSR in the process. I'd argue that if the switch could handle full tables and you just had a single session, you would probably have experienced the same issue.

I've run with both direct and multihop with Cogent, and I honestly never noticed any difference in stability. I hear what you're saying, and I think you have a valid argument in some respects, but I just think the BGP problem is a symptom, not a cause.
Well it had sounded like I was in the minority and should keep my mouth shut. But here goes. On several occasions the peer that would advertise our routes would drop and with that the peer with the full bgp tables would drop as well. This happened for months on end. They tried blaming our 6500, our fiber provider, our IOS version; no conclusive findings were ever found that it was our problem. After some testing at the local Cogent office by both Cogent and myself, Cogent decided that they could "make a product" that would allow us to, one, have only one peer and, two, connect directly to the GSR and not through a small catalyst. Lo and behold, things worked well for some time after that.
This all happened while we had 3 other providers on the same router with no issues at all. We moved gbics, ports etc around to make sure it was not some odd ASIC or throughput issue with the 6500.
Perhaps you haven't considered this, but did it ever occur to you that Cogent probably had the same situation? They had a router with a bunch of other customers on it, no reported problems, and you were the oddball reporting significant issues?

Quite frankly, your own description does not support this as being a problem inherent to the peerA/peerB setup.

You indicate that the peer advertising your routes would drop. The peer with the full BGP tables would then drop as well. Well, quite frankly, that makes complete sense. The peer advertising your routes also advertises to you the route to get to the multihop peer, which you need in order to be able to talk to that. Therefore, if the directly connected BGP goes away for any reason, the multihop is likely to go away too.

However, given the exact same hardware minus the multihop, your direct BGP was still dropping. So had they been able to send you a full table from the aggregation router, the same thing probably would have happened.

This sounds more like flaky hardware, dirty optics, or a bad cable (or several of the above).

Given that, it actually seems quite reasonable to me to guess that it could have been your 6500, your fiber provider, or your IOS version that was introducing some problem. Anyone who has done any reasonable amount of work in this business will have seen all three, and many of the people here will say that the 6500 is a bit flaky and touchy when pushed into service as a real router (while simultaneously using them in their networks as such, heh, since nothing else really touches the price per port), so Cogent's suggestion that it was a problem on your side may have been based on bad experiences with other customer 6500's.

However, it is also likely that it was some other mundane problem, or a problem with the same items on Cogent's side. I would consider it a shame that Cogent didn't work more closely with you to track down the specific issue, because most of the time, these things can be isolated and eliminated, rather than being potentially left around to mess up someone in the future (think: bad port).

... JG
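The dependency described in the message above (the multihop peer is only reachable through a route learned over the directly connected session) can be sketched as a toy model. The session names below are hypothetical and nothing here reflects Cogent's actual configuration. It only illustrates why both sessions drop together when the direct one fails, and why a single direct session on the same flaky link would drop just the same.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    name: str
    # Name of the session whose learned routes are needed to reach
    # this peer, or None for a directly connected neighbor.
    depends_on: Optional[str] = None


def sessions_down(sessions, failed):
    """Return the set of session names that go down when `failed` drops,
    following the depends_on links until nothing more changes."""
    down = {failed}
    changed = True
    while changed:
        changed = False
        for session in sessions:
            if session.name not in down and session.depends_on in down:
                down.add(session.name)
                changed = True
    return down


# Hypothetical two-session setup: peerA is the aggregation router on the
# customer's link, peerB is the full-table router reached via multihop.
PEER_A = Session("peerA-direct")
PEER_B = Session("peerB-multihop", depends_on="peerA-direct")
TWO_PEER = [PEER_A, PEER_B]

# Hypothetical single-session setup on the same physical link.
SINGLE = [Session("single-direct")]

if __name__ == "__main__":
    print(sorted(sessions_down(TWO_PEER, "peerA-direct")))
    print(sorted(sessions_down(TWO_PEER, "peerB-multihop")))
    print(sorted(sessions_down(SINGLE, "single-direct")))
```

Losing the direct session takes the multihop session with it, while losing only the multihop session leaves the direct one up. In the single-session case the same link fault still drops the only session, which is the point made above about the fault lying in the link or hardware rather than in the two-peer design.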
Well it also was the total arrogance on the part of Cogent engineering and management, taking zero responsibility and pushing it back every time, valid issue or not. You had to be there. But everyone has a different opinion; my opinion is set regardless of what cogent tries to sell me now.

Manolo

Joe Greco wrote:
However, it is also likely that it was some other mundane problem, or a problem with the same items on Cogent's side. I would consider it a shame that Cogent didn't work more closely with you to track down the specific issue, because most of the time, these things can be isolated and eliminated, rather than being potentially left around to mess up someone in the future (think: bad port).
participants (7)
- David Coulson
- Joe Greco
- John van Oppen (list account)
- manolo
- Martin Hannigan
- Mike Fedyk
- Paul Wall