I love that we can't even get a full week into the new year without beating the "let's overhaul BGP" drum. Some things never change.
<3 Chris
-----Original Message-----
From: NANOG <nanog-bounces+chris.wright=commnetbroadband.com@nanog.org> On Behalf Of Joe Maimon
Sent: Thursday, January 5, 2023 5:51 PM
To: Mel Beckman <mel@beckman.org>; Mike Hammett <nanog@ics-il.net>
Cc: NANOG <nanog@nanog.org>
Subject: Re: SDN Internet Router (sir)
And here is another interesting approach I've had open in my browser for who knows how long: https://inog.net/files/iNOG14v_oliver_sourcerouting.pdf
The problem with BGP is that local actors can trivially impose global costs by consuming as many routing slots as they can get away with. Add BGP path selection to that, and most-specific routes become the go-to traffic-engineering knob. Sometimes you just want to say: this is the route, do not accept any more specifics unless this route is no longer the route. But you want that done automatically, correctly, and reliably.
This is also why none of the multi-homing approaches that avoid global routing have really taken off or done much to blunt table growth, and they likely won't. See the aggregation factor in the routing report for how bad this is.
There have been lots of BGP protocol and feature updates, but unless you're going to uniformly run new systems, and enterprise systems, that support all of them, it's hard to justify building your entire routing strategy around them.
Unlike EIGRP, BGP never tried to tie performance indicators into its routing metrics. Feature or misdesign, you could debate that, but it was always intentional. And opex has pretty much come down against IGP->BGP redistribution of prefixes, let alone of performance metrics.
And because an eBGP prefix has no good, reliable way of signaling that an advertised route is so bad you should never use it except as a last resort, we have AS paths wrapping across screen lines.
"finish IPv6 migration"?
Letting IPv6 migration state factor into decisions about anything not directly related to IPv6 migration was never logical, just naively optimistic, and it should be stamped out wherever encountered. If it's good, use it now and IPv6 will adopt it as well. If it isn't, why wait to find out?
Joe
Mel Beckman wrote:
Mike,
Thanks for that useful example. On a side note, Netflix is a thorn in all our sides :) You could apply a local-preference route filter to override the default for Netflix prefixes, but that hurts resilience. Since you peer with Netflix, I suspect we agree that Netflix’s ideas on traffic engineering are pretty one-sided.
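The local-pref override works because BGP evaluates local preference before AS path length when choosing a best path. A minimal Python model of just those two steps, using hypothetical documentation ASNs (64500, 64501) and made-up next-hop names; real implementations apply further tie-breakers that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    next_hop: str
    as_path: tuple = ()
    local_pref: int = 100  # assumed default local-pref


def best_path(routes):
    """Pick a best path: highest local-pref first, then shortest AS path.

    Real BGP continues with origin, MED, eBGP over iBGP, IGP cost,
    router ID and so on; those steps are omitted in this sketch.
    """
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


# Hypothetical content prefix learned in two markets.
via_dallas = Route("198.51.100.0/24", "dallas", as_path=(64500,))
via_atlanta = Route("198.51.100.0/24", "atlanta", as_path=(64501, 64500))
print(best_path([via_dallas, via_atlanta]).next_hop)  # dallas: shorter AS path

via_atlanta.local_pref = 200  # operator policy pins this prefix to Atlanta
print(best_path([via_dallas, via_atlanta]).next_hop)  # atlanta
```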
I think it’s safe to say that BGP, which has scaled amazingly well, didn’t anticipate some of the big gorilla content systems. I don’t really see, though, how injecting FIB entries helps more than other methods. And as others have pointed out, the risk of creating routing loops is significant.
Perhaps it is time to migrate to a new version of BGP. Projects like MBGP and FP-7’s 4WARD are working on new follow-on routing models, but nothing is on the immediate horizon. I think we all thought we should finish IPv6 migration first :)
-mel via cell
On Jan 5, 2023, at 1:11 PM, Mike Hammett <nanog@ics-il.net> wrote:
I hesitated to get too specific in examples because someone is going to drag the conversation into the weeds.
Let's take the Dallas - New Orleans - Atlanta example, where I have a connection from New Orleans to Dallas and a connection from New Orleans to Atlanta.
Let's say I peer with Netflix in both markets. Netflix chooses to serve me out of Atlanta, for whatever reason. Say my default route sends my traffic to Dallas. That's not where Netflix wanted it, so now the traffic has to go from Dallas to Atlanta, whether over my circuit or across the public Internet. Potentially, it's on MPLS and rides back through the New Orleans router to get to Atlanta. That's a long trip when I already had a better path; the less-than-full-FIB router just didn't know about it. Given that Netflix is a sizable share of traffic on an eyeball ISP, that's a lot of traffic going the wrong way. If the website for Viktor's Arctic Plunge in Siberia were hosted in Atlanta, I wouldn't give two craps that its traffic went the wrong way, because A) I'll probably never go there, and B) when someone does, it won't be enough traffic to matter.
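The New Orleans problem comes down to longest-prefix match on the partial-FIB router: with only a default, everything follows the default toward Dallas, while installing just the prefix the content is served from sends that traffic straight to Atlanta. A minimal Python sketch, assuming 198.51.100.0/24 stands in for a hypothetical cache prefix; real FIBs use tries or TCAM rather than a linear scan.

```python
import ipaddress


def lookup(fib, dst):
    """Longest-prefix match: next hop of the most specific entry covering dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


# Hypothetical New Orleans edge router.
default_only = {"0.0.0.0/0": "dallas"}
with_cache_prefix = {"0.0.0.0/0": "dallas", "198.51.100.0/24": "atlanta"}

print(lookup(default_only, "198.51.100.10"))       # dallas (the long way round)
print(lookup(with_cache_prefix, "198.51.100.10"))  # atlanta
print(lookup(with_cache_prefix, "203.0.113.5"))    # dallas (default carries the rest)
```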
Someone's going to tell me to put a full-table router in New Orleans. Maybe I should. Okay, so say I also have a POP in Ashford, Alabama, with transport to New Orleans and Atlanta. There aren't enough grains of sugar in Ashford, Alabama to justify a current-generation full-table router. Now I'm even closer to Atlanta, but default may point to New Orleans.
-----
Mike Hammett
Intelligent Computing Solutions <http://www.ics-il.com/>
<https://www.facebook.com/ICSIL><https://plus.google.com/+IntelligentComputingSolutionsDeKalb><https://www.linkedin.com/company/intelligent-computing-solutions><https://twitter.com/ICSIL>
Midwest Internet Exchange <http://www.midwest-ix.com/>
<https://www.facebook.com/mdwestix><https://www.linkedin.com/company/midwest-internet-exchange><https://twitter.com/mdwestix>
The Brothers WISP <http://www.thebrotherswisp.com/>
<https://www.facebook.com/thebrotherswisp><https://www.youtube.com/channel/UCXSdfxQv7SpoRQYNyLwntZg>
------------------------------------------------------------------------
From: "Mel Beckman" <mel@beckman.org>
To: "Mike Hammett" <nanog@ics-il.net>
Cc: "Joe Maimon" <jmaimon@jmaimon.com>, "NANOG" <nanog@nanog.org>
Sent: Thursday, January 5, 2023 2:54:27 PM
Subject: Re: SDN Internet Router (sir)
Mike,
I’m not sure I understand what you mean by “suboptimal” routing. Even though the Internet uses AS path length for routing, many of those path lengths are bogus and don’t represent any kind of path performance value. For example, a single AS might hide many hops in an MPLS network as a single hop, obscuring asymmetric routing and other uglies. Prepending also occurs when destinations are trying to enforce their own traffic-engineering policies, which often conflict with yours or mine.
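A toy comparison of why AS path length says little about the actual path. All paths, ASNs, and hop counts below are invented for illustration; router hop count is not a performance measure either, it just exposes what the AS path hides (an MPLS core counted as one AS, a prepended path counted as four).

```python
# Hypothetical candidate paths to one destination; all numbers are invented.
candidates = {
    "transit_a": {"as_path": [64510], "router_hops": 14},  # one AS hiding an MPLS core
    "transit_b": {"as_path": [64511, 64512], "router_hops": 6},
    "transit_c": {"as_path": [64513, 64513, 64513, 64514], "router_hops": 5},  # prepended
}


def shortest_as_path(cands):
    """What the AS-path-length step would prefer, all else being equal."""
    return min(cands, key=lambda name: len(cands[name]["as_path"]))


def fewest_router_hops(cands):
    """Fewest routers actually traversed (still not a performance metric)."""
    return min(cands, key=lambda name: cands[name]["router_hops"])


print(shortest_as_path(candidates))    # transit_a
print(fewest_router_hops(candidates))  # transit_c
```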
So what do you mean by “suboptimal”? Were you thinking that the “best” path in the BGP table meant you were getting a performance benefit? Because that’s definitely not the case in today’s Internet. Or were you thinking that you would be going along less congested paths? That’s really at the mercy of the backbone providers’ traffic engineering, over which we have no control.
I generally populate local router FIBs merely to choose an exit point for load balancing, and nothing more.
-mel
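Mel doesn't say how he picks the exit for each prefix. Purely as a hypothetical illustration of using FIB entries only to spread load across exits, here is a greedy assignment in Python (heaviest prefix first, each onto the currently least-loaded exit); the per-prefix rates and exit names are made up, and a real deployment would work from measured traffic and link capacities.

```python
import heapq


def assign_exits(prefix_bps, exits):
    """Greedy balance: heaviest prefix first, each onto the least-loaded exit."""
    heap = [(0, name) for name in exits]  # (current load, exit name)
    heapq.heapify(heap)
    assignment = {}
    for prefix, bps in sorted(prefix_bps.items(), key=lambda kv: kv[1], reverse=True):
        load, name = heapq.heappop(heap)
        assignment[prefix] = name
        heapq.heappush(heap, (load + bps, name))
    return assignment


traffic_bps = {"192.0.2.0/24": 50, "198.51.100.0/24": 30, "203.0.113.0/24": 20}
print(assign_exits(traffic_bps, ["exit_a", "exit_b"]))  # each exit ends up carrying 50
```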
On Jan 5, 2023, at 12:38 PM, Mike Hammett <nanog@ics-il.net> wrote:
I guess I wasn't around for those days.
As far as running out goes, again assuming the tooling works correctly, I'd target fewer routes than the box can hold. Maybe 1k routes is all one would need to capture a significant percentage of the traffic. That leaves a lot of room for error if the box can hold 100k or 500k routes.
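To make the sizing argument concrete: the tools Mike describes in his earlier message rank prefixes by sFlow-reported traffic and push the top X into the FIB, with a prefix limit on the box as a backstop. The sketch below is a hypothetical Python model of that loop, not the code of any of those tools; the sample tuple shape, the 1-in-N scaling, and the all-or-nothing limit behavior are assumptions.

```python
import ipaddress
from collections import Counter


def top_prefixes(samples, rib_prefixes, top_x):
    """Rank RIB prefixes by estimated bytes from flow samples; return the top X.

    samples: (dst_ip, sampled_frame_bytes, sampling_rate) tuples (assumed shape).
    Also returns the fraction of estimated traffic (to RIB prefixes) they cover.
    """
    nets = [ipaddress.ip_network(p) for p in rib_prefixes]
    est = Counter()
    for dst, frame_bytes, rate in samples:
        addr = ipaddress.ip_address(dst)
        covering = [n for n in nets if addr in n]
        if covering:
            best = max(covering, key=lambda n: n.prefixlen)  # longest match
            est[str(best)] += frame_bytes * rate  # scale up by 1-in-N sampling
    chosen = est.most_common(top_x)
    total = sum(est.values())
    coverage = sum(b for _, b in chosen) / total if total else 0.0
    return [p for p, _ in chosen], coverage


def accept_push(routes, max_prefixes):
    """Box-side prefix limit: reject an oversized push outright, so the
    separately configured default route keeps carrying everything."""
    return list(routes) if len(routes) <= max_prefixes else []


rib = ["198.51.100.0/24", "203.0.113.0/24", "192.0.2.0/24"]
samples = [
    ("198.51.100.10", 1500, 1000),
    ("198.51.100.20", 1500, 1000),
    ("203.0.113.7", 1500, 1000),
    ("192.0.2.1", 64, 1000),
]
push, coverage = top_prefixes(samples, rib, top_x=2)
print(push, round(coverage, 3))
print(accept_push(push, max_prefixes=1000))  # within limit: installed
print(accept_push(push, max_prefixes=1))     # over limit: default only
```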
------------------------------------------------------------------------
From: "Joe Maimon" <jmaimon@jmaimon.com>
To: "Mike Hammett" <nanog@ics-il.net>, "Christopher Morrow" <morrowc.lists@gmail.com>
Cc: "NANOG" <nanog@nanog.org>
Sent: Thursday, January 5, 2023 2:30:40 PM
Subject: Re: SDN Internet Router (sir)
Mike Hammett wrote:
> I'm not concerned with which technology or buzzword gets the job done, only that the job is done.
>
> Looking briefly at the couple of things out there, they're evaluating the top X prefixes in terms of traffic reported by sFlow, where X is the number I define, and those get pushed into the FIB. One recalculates every hour, one does so more quickly. How much is appropriate? I'm not sure. I can't imagine it would *NEED* to be done all that often, given the traffic/prefix density an eyeball network will have. Default routes carry the rest. Default routes could be handled outside of this process, such that if this process fails, you just get some sub-optimal routing until repaired. Maybe it doesn't filter properly and sends a bunch of routes. Then just have a prefix limit set on the box. Maybe it sends the wrong prefixes. No harm, no foul. If you're routing sub-optimally internally, when it does hit a real router with a full FIB, it gets handled appropriately.
Unless it loops.
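"Unless it loops" is the classic failure when a partial-FIB box defaults toward a core router whose best path for a prefix points back at it. A toy Python trace, assuming a hypothetical New Orleans edge with a local peer session whose prefix the controller didn't install, and a Dallas core that learned that prefix via iBGP with New Orleans as the next hop.

```python
import ipaddress


def lookup(fib, dst):
    """Longest-prefix match over a small dict FIB (linear scan for clarity)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


def trace(fibs, start, dst):
    """Follow next hops hop by hop; report delivered, dropped, or loop."""
    path, node, seen = [start], start, {start}
    while True:
        next_hop = lookup(fibs[node], dst)
        if next_hop is None:
            return path, "dropped"
        path.append(next_hop)
        if next_hop not in fibs:  # handed off outside the modeled routers
            return path, "delivered"
        if next_hop in seen:  # destination-based forwarding: a revisit is a loop
            return path, "loop"
        seen.add(next_hop)
        node = next_hop


dfw = {"0.0.0.0/0": "transit", "198.51.100.0/24": "nola"}  # iBGP next hop is NOLA
nola_missing = {"0.0.0.0/0": "dfw"}  # controller did not install the peer prefix
nola_installed = {"0.0.0.0/0": "dfw", "198.51.100.0/24": "peer"}

print(trace({"nola": nola_missing, "dfw": dfw}, "nola", "198.51.100.10"))
print(trace({"nola": nola_installed, "dfw": dfw}, "nola", "198.51.100.10"))
```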
The rest sounds nice. Flow caching got a bad rap back in the early worm days, but that's because the situation was a little worse back then: cache the wrong routes or run out of cache, and the router dies. So long as that's not the case, automating optimization is an extremely valuable goal.
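The condition that the box must not die when the cache holds the wrong routes or fills up amounts to making a miss fall back to default. A minimal Python sketch of a bounded LRU route cache with that behavior; it is keyed by exact prefix for brevity (a real FIB does longest-prefix match), and the capacity and next-hop names are hypothetical.

```python
from collections import OrderedDict


class BoundedRouteCache:
    """LRU route cache where a miss (or an evicted entry) falls back to default."""

    def __init__(self, capacity, default_next_hop):
        self.capacity = capacity
        self.default_next_hop = default_next_hop
        self.entries = OrderedDict()  # prefix -> next hop, least recent first

    def install(self, prefix, next_hop):
        if prefix in self.entries:
            self.entries.move_to_end(prefix)
        self.entries[prefix] = next_hop
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used, never fail

    def next_hop(self, prefix):
        if prefix in self.entries:
            self.entries.move_to_end(prefix)
            return self.entries[prefix]
        return self.default_next_hop  # degrade to default routing instead of dying


cache = BoundedRouteCache(capacity=2, default_next_hop="dfw")
for prefix in ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]:
    cache.install(prefix, "atl")
print(cache.next_hop("192.0.2.0/24"))    # dfw: evicted, default carries it
print(cache.next_hop("203.0.113.0/24"))  # atl
```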
> I would just be looking for solutions that influence what's in the FIB and let the rest of the router work as the rest of the router would.
The problem comes when the router won't work at all without the FIB routes, like in the olden days.
>
> ------------------------------------------------------------------------
> From: "Christopher Morrow" <morrowc.lists@gmail.com>
> To: "Mike Hammett" <nanog@ics-il.net>
> Cc: "Tom Beecher" <beecher@beecher.cc>, "NANOG" <nanog@nanog.org>
> Sent: Thursday, January 5, 2023 12:27:08 PM
> Subject: Re: SDN Internet Router (sir)
>
> On Thu, Jan 5, 2023 at 11:18 AM Mike Hammett <nanog@ics-il.net> wrote:
>
> > Initially, my thought was to use community filtering to push just IXes, customers, and defaults throughout the network, but that's obviously still sub-optimal.
> >
> > I'd be surprised if a last mile network had a ton of traffic going to any more than a few hundred prefixes.
>
> I think in a low-fib box at the edge of your network your choices are:
> "the easy choice, get default, follow that"
>
> "send some limited set of prefixes to the device, and default, so you MAY choose better for the initial hop away"
>
> you certainly can do the second with communities, or route-filters (prefix-list) on the senders, or....
> you can choose what prefixes make the cut (get the community(ies)) based on traffic volumes or expected destination locality:
> "do not go east to go west!"
>
> these things will introduce toil and SOME suboptimal routing in some instances... perhaps it's better than per flow choosing left/right though and the support calls related to that choice.
>
> In your NOLA / DFW / ATL example it's totally possible that the networks in question do something like:
> "low fib box in tier-2 city (NOLA), dfz capable/core devices in tier-1 city (DFW/ATL), and send default from left/right to NOLA"
>
> Could they send more prefixes than default? sure... do they want to deal with the toil that induces? (probably not says your example).
>
> SDN isn't really an answer to this, though.. I don't think. Unless you envision that to lower the toil ?
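The second option, default plus a limited set of community-tagged prefixes sent to the low-FIB box, reduces to a simple selection on the sending side. A hypothetical Python model: the community value 64500:100 is invented, the choice of which prefixes get tagged (by traffic volume or locality) happens elsewhere, and on real routers this would be an export policy or prefix-list in vendor syntax rather than code.

```python
# Hypothetical community meaning "install on low-FIB edge boxes" (documentation ASN).
LOW_FIB_COMMUNITY = "64500:100"
DEFAULTS = {"0.0.0.0/0", "::/0"}


def routes_for_low_fib_box(rib):
    """Default route(s) plus only the prefixes tagged for the edge."""
    return [
        r["prefix"]
        for r in rib
        if r["prefix"] in DEFAULTS or LOW_FIB_COMMUNITY in r["communities"]
    ]


rib = [
    {"prefix": "0.0.0.0/0", "communities": []},
    {"prefix": "198.51.100.0/24", "communities": ["64500:100"]},  # heavy, nearby content
    {"prefix": "203.0.113.0/24", "communities": []},  # light, far away
]
print(routes_for_low_fib_box(rib))  # ['0.0.0.0/0', '198.51.100.0/24']
```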