On Sat, Apr 19, 2008 at 7:26 PM, manolo <mhernand1@comcast.net> wrote:
Some things just never change at Cogent. We fought them for months way back when to get us off their infamous two-BGP-peer setup. After many an outage due to this setup, they finally put us on a single BGP session, but it took forever. Let's just say Cogent didn't last long at the company I worked for.
Could you provide additional details on the failure mode that resulted from this "two tiered" configuration? How did moving to a "conventional" configuration with a single directly-connected neighbor solve things?
For those unfamiliar, Cogent has a system where you set up an EBGP peering with the Cogent router you're connected to, for the purposes of announcing your routes into Cogent. However, these are typically smaller, aggregation class routers, and do not handle full tables - so you don't get your routes from that router. To get a full table FROM Cogent, you need to set up an EBGP multihop session with them, to their nearest full-table router. I believe they actually do all their BGP connections in that manner.

This probably makes a lot of sense from an engineering point of view, and could be construed as a BGP competence test. On the other hand, it does have the potential to make things more complex in the event of a failure.

I'm not aware of any flaws with such a design that would cause "many an outage," and connections that we've managed for customers with Cogent suggest that it works well. However, if there are problems within the local Cogent node, I could easily see situations where hard-to-identify problems could result. That would seem to me to be an equipment, capacity, or possibly a configuration issue, but not something which discredits the overall strategy.

Given that they're providing inexpensive bandwidth, it isn't likely that they'll be sticking large routers everywhere for the customers who want a full table and a simpler BGP configuration. There are many things that you can realistically criticize Cogent for, but I'm not sure the peerA/peerB thing should be one of them. It is certainly more complex, but seems to serve a purpose.
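To make the "more complex in the event of a failure" point concrete, here is a rough Python sketch of the two-session arrangement as seen from the customer side. It is only a toy model, not anything Cogent publishes: the class, the function, and the state labels are all made up for illustration, and it assumes the directly-connected session is used purely to announce routes while the multihop session is used purely to receive the full table, as described above.

```python
from dataclasses import dataclass


@dataclass
class BgpSession:
    """One EBGP session from the customer's point of view (hypothetical model)."""
    name: str
    multihop: bool      # True for the session to the remote full-table router
    established: bool   # whether the session is currently up


def diagnose(announce: BgpSession, receive: BgpSession) -> str:
    """Classify the combined state of the two sessions.

    announce: directly-connected session used to send our routes to the provider.
    receive:  multihop session used to learn the full table from the provider.
    The returned labels are illustrative only.
    """
    if announce.established and receive.established:
        return "ok"
    if announce.established and not receive.established:
        # Our prefixes are still advertised, so inbound traffic keeps arriving,
        # but we have learned no routes from the provider for outbound traffic.
        return "announcing-without-routes"
    if receive.established and not announce.established:
        # We still hold a full table, but the provider is no longer
        # advertising our prefixes on our behalf.
        return "routes-without-announcement"
    return "down"


if __name__ == "__main__":
    peer_a = BgpSession("peerA (direct)", multihop=False, established=True)
    peer_b = BgpSession("peerB (multihop)", multihop=True, established=False)
    print(diagnose(peer_a, peer_b))
```

With a single conventional session there are only two states, up or down. With the split arrangement there are two additional half-working states, which is presumably where the hard-to-identify problems would show up.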
What steps were taken during your postmortem and subsequent lab simulations to verify that the outages were not with the customer-side implementation, or perhaps a simple typographical error?
Here in H-town, we are deploying a metro/BLEC network made up of thousands of small L3 boxes not carrying full tables (Cisco 3560 and similar), and would like very much to learn from these major architectural mistakes, so that we can avoid similar outage scenarios. Any information you could provide would be excellent.
Interesting :-)
You get what you pay for....
Not passing any judgment on quality, Cogent is more towards the middle of the road for price, these days, on larger commits.
Or in places like Ashburn. I've been wondering what their future strategy will be.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.