On Mon, Oct 2, 2023 at 6:21 AM tim@pelican.org <tim@pelican.org> wrote:
On Monday, 2 October, 2023 09:39, "William Herrin" <bill@herrin.us> said:
That depends. When the routing table outgrows the FIB, routers don't immediately die. Instead, their performance degrades, just like what happens with oversubscription elsewhere in the system.
With a TCAM-based router, the least specific routes get pushed off the TCAM (out of the fast path) up to the main CPU. As a result, the PPS (packets per second) degrades really fast.
With a DRAM+SRAM cache system, the least used routes fall out of the cache. They haven't actually been pushed out of the fast path, but the fast path gets a little bit slower. The PPS degrades, but not as sharply as with a TCAM-based router.
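As a toy illustration of the cache case (my own sketch in Python, keyed by exact prefix for brevity rather than doing a real longest-prefix match): a small LRU cache sits in front of the full table. A route evicted from the cache is still reachable via the slower full-table lookup, so throughput sags gradually instead of falling off the cliff you get when packets are punted to the CPU:

    # Toy sketch only: an LRU cache (the "SRAM") in front of a complete
    # table (the "DRAM"). Lookups are keyed by exact prefix to keep the
    # example short; a real FIB does longest-prefix matching.
    from collections import OrderedDict

    class CachedFib:
        def __init__(self, cache_size):
            self.full_table = {}        # complete FIB: slow lookups
            self.cache = OrderedDict()  # fast path: LRU eviction
            self.cache_size = cache_size

        def install(self, prefix, next_hop):
            self.full_table[prefix] = next_hop

        def lookup(self, prefix):
            if prefix in self.cache:                # fast-path hit
                self.cache.move_to_end(prefix)
                return self.cache[prefix], "fast"
            next_hop = self.full_table[prefix]      # slow full-table walk
            self.cache[prefix] = next_hop           # promote into cache...
            if len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)      # ...evicting the LRU route
            return next_hop, "slow"

    fib = CachedFib(cache_size=2)
    for p in ("10.0.0.0/8", "192.0.2.0/24", "198.51.100.0/24"):
        fib.install(p, "ge-0/0/0")
    print(fib.lookup("10.0.0.0/8"))       # slow: cold lookup, now cached
    print(fib.lookup("10.0.0.0/8"))       # fast: cache hit
    print(fib.lookup("192.0.2.0/24"))     # slow, cached
    print(fib.lookup("198.51.100.0/24"))  # slow, evicts 10.0.0.0/8
    print(fib.lookup("10.0.0.0/8"))       # slow again, but still reachable

Every lookup still succeeds; the least-used routes just take the slow path, which is why the degradation is gradual.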
Spit-balling here, is there a possible design for not-Tier-1 providers where routing optimality (which is probably not a word) degrades rather than packet-shifting performance?
If the FIB is full, can we start making controlled and/or smart decisions about what to install, rather than falling back on either of the simple overflow behaviours above?
For starters, as long as you have *somewhere* you can point a default at in the worst case, even if it's far from the *best* route, you make damn sure you always install a default.
Then you could have knobs for what other routes you discard when you run out of space. Receiving a covering /16? Maybe you can drop the /24s, even if they have a different next hop - routing will be sub-optimal, but it will work. (I know, previous discussions around traffic engineering and whether the originating network must / does do that in practice...)
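As a rough sketch of what such knobs might do (hypothetical policy in Python; compress_fib and its rules are illustrative, not a shipping feature): install the default first, then work from least- to most-specific, shedding any route an installed prefix already covers, even when the next hop differs:

    # Hypothetical "controlled overflow" knobs, sketched in Python.
    # compress_fib and its rules are illustrative, not a vendor feature.
    import ipaddress

    def compress_fib(rib, fib_size):
        """rib: {prefix string: next hop}. Returns the routes installed."""
        routes = {ipaddress.ip_network(p): nh for p, nh in rib.items()}
        default = ipaddress.ip_network("0.0.0.0/0")
        fib = {}
        if default in routes:           # rule 1: always install a default
            fib[default] = routes.pop(default)
        # rule 2: walk least-specific first; shed any route an installed
        # prefix already covers, even when the next hop differs.
        for prefix in sorted(routes, key=lambda n: n.prefixlen):
            if any(inst != default and prefix.subnet_of(inst) for inst in fib):
                continue                # a covering route already handles it
            if len(fib) < fib_size:
                fib[prefix] = routes[prefix]
        return {str(p): nh for p, nh in fib.items()}

    rib = {"0.0.0.0/0": "upstream",      "198.51.0.0/16": "peer-3",
           "198.51.100.0/24": "peer-2",  "203.0.113.0/24": "peer-1"}
    print(compress_fib(rib, fib_size=3))
    # 198.51.100.0/24 is shed: 198.51.0.0/16 covers it, so forwarding
    # still works, just via a sub-optimal next hop.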
The problem with this approach is that you now have non-deterministic routing. Depending on the state of FIB compression, packets *may* flow out interfaces that are not what the RIB thinks they will be. This is a good recipe for routing micro-loops that come and go as your FIB compression size ebbs and flows.

Taking your example:

RTR-A----------RTR-B---------RTR-C

RTR-A is announcing a /16 to RTR-B. RTR-C is announcing a /24 from within the /16 to RTR-B, which is passing it along to RTR-A.

If RTR-B's FIB fills up, and compression falls back to "drop the /24, since I see a /16", packets destined to the /24 arriving from RTR-A will reach RTR-B, which will check its FIB and send them back towards RTR-A... which will send them back to RTR-B, until the TTL is exceeded.

BTW, this scenario holds true even when it's a default route coming from RTR-A, so saying "well, OK, but we can do FIB compression easily as long as we have a default route to fall back on" still leads to packets ping-ponging on your upstream interface towards your default if you ever drop from your FIB a more specific that is destined downstream of you.

You're better off doing the filtering at the RIB end of things, so that RTR-B no longer passes the /24 to RTR-A; sure, routing breaks at that point, but at least you haven't filled up the RTR-A-to-RTR-B link with packets ping-ponging back and forth.

Your routing protocols *depend* on packets being forwarded along the interfaces the RIB thinks they'll be going out in order for loop-free routing to occur. If the FIB decisions are made independently of the RIB state, your routing protocols might as well just give up and go home, because no matter how many times they run Dijkstra, the path to the destination isn't going to match where the packets ultimately end up going.

You could of course fix this issue by propagating the decisions made by the FIB compression algorithm back up into the RIB; at least then, the network engineer being paged at 3am to figure out why a link is full will instead be paged to figure out why routes aren't showing up in the routing table that policy *says* should be showing up.
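To make the failure mode concrete, here's a small Python simulation of that ping-pong. The routers and next hops follow the example above; the prefixes (198.51.0.0/16 and 198.51.100.0/24) and the lpm() helper are illustrative assumptions, not anything from a real implementation:

    # Simulating the ping-pong above. Prefixes and the lpm() helper are
    # illustrative; the topology and next hops follow the example.
    import ipaddress

    def lpm(fib, dst):
        """Longest-prefix match: most specific route covering dst wins."""
        matches = [p for p in fib if dst in ipaddress.ip_network(p)]
        return fib[max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)]

    fibs = {
        # RTR-A originates the /16 and learned the /24 via RTR-B (from RTR-C).
        "RTR-A": {"198.51.0.0/16": "local", "198.51.100.0/24": "RTR-B"},
        # RTR-B's compression dropped the /24 ("I see a covering /16");
        # its remaining /16 route points back at RTR-A.
        "RTR-B": {"198.51.0.0/16": "RTR-A"},
    }

    dst, hop, ttl = ipaddress.ip_address("198.51.100.1"), "RTR-A", 8
    while ttl:
        nxt = lpm(fibs[hop], dst)
        print(f"{hop} -> {nxt} (ttl={ttl})")
        hop, ttl = nxt, ttl - 1
    print("TTL exceeded: the packet never left the RTR-A / RTR-B link")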
Understand which routes your customers care about / where most of your traffic goes? Set the "FIB-preference" on those routes as you receive them, to give them the greatest chance of getting installed.
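A sketch of what that preference knob might look like (a hypothetical attribute, not a real routing-policy keyword): rank candidate routes by preference and install from the top of the list when space runs out:

    # Hypothetical "FIB-preference" knob, sketched in Python: when space
    # runs out, routes are installed in preference order, so the prefixes
    # carrying most of your traffic win. Names and values are illustrative.
    def install_by_preference(candidates, fib_size):
        """candidates: (prefix, next_hop, preference) tuples; higher wins."""
        ranked = sorted(candidates, key=lambda r: r[2], reverse=True)
        return {prefix: nh for prefix, nh, _ in ranked[:fib_size]}

    candidates = [
        ("0.0.0.0/0",       "upstream", 100),  # default: always preferred
        ("203.0.113.0/24",  "cdn-peer",  90),  # where customer traffic goes
        ("198.51.100.0/24", "peer-2",    10),  # barely used: first to go
    ]
    print(install_by_preference(candidates, fib_size=2))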
Not being a hardware designer, I have little idea how feasible this is - I suspect it depends on the rate of churn, the complexity of FIB updates, etc. But it feels like there could be a way to build something other than "shortest -> punt to CPU" or "LRU -> punt to CPU".
Or is everyone who could make use of this already doing the same filtering at the RIB level, and not trying to fit a quart RIB into a pint FIB in the first place?
The sane ones who care about the sanity of their network engineers certainly do. ^_^;
Thanks, Tim.
Thanks! Matt