On Wednesday, 18 August, 2021 14:21, "Tom Beecher" <beecher@beecher.cc> said:
We created 5 or 6 different buckets of limit values (for v4 and v6 of course.) Depending on what you have published in PeeringDB (or told us directly what to expect), you're placed in a bucket that gives you a decent amount of headroom to that bucket's max. If your ASN reaches 90% of your limit, our ops folks just move you up to the next bucket. If you start to get up there in the last bucket, then we'll take a manual look and decide what is appropriate. This covers well over 95% of our non-transit sessions, and has dramatically reduced the volume of tickets and changes our ops team has had to sort through.
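
For anyone who wants to play with the idea, here is a minimal sketch of the bucket scheme described above. It is not the poster's actual tooling. The bucket ceilings, the 150% headroom factor, and the names `initial_bucket` and `review` are all hypothetical, since the message gives only "5 or 6" buckets and the 90% trigger. The expected prefix count is assumed to come from PeeringDB or from the peer directly.

```python
from bisect import bisect_left

# Hypothetical max-prefix ceilings, smallest to largest. The original post
# does not give real values; a separate list would exist for IPv6.
BUCKETS_V4 = [100, 1_000, 10_000, 50_000, 200_000]

PROMOTE_PCT = 90    # from the post: act when a peer reaches 90% of its limit
HEADROOM_PCT = 150  # assumption: bucket ceiling should be >= 1.5x expected count


def initial_bucket(expected_prefixes, buckets, headroom_pct=HEADROOM_PCT):
    """Return the index of the smallest bucket that leaves the desired headroom.

    Falls back to the last (largest) bucket when nothing is big enough.
    """
    target = expected_prefixes * headroom_pct / 100
    idx = bisect_left(buckets, target)
    return min(idx, len(buckets) - 1)


def review(bucket_idx, received_prefixes, buckets, promote_pct=PROMOTE_PCT):
    """Decide what to do with a session given its current received prefix count.

    Returns (new_bucket_idx, action), where action is one of
    "ok", "promote", or "manual-review".
    """
    limit = buckets[bucket_idx]
    # Integer arithmetic avoids floating-point surprises at exactly 90%.
    if received_prefixes * 100 < limit * promote_pct:
        return bucket_idx, "ok"
    if bucket_idx + 1 < len(buckets):
        return bucket_idx + 1, "promote"
    # Already in the top bucket: a human decides what is appropriate.
    return bucket_idx, "manual-review"


if __name__ == "__main__":
    idx = initial_bucket(40, BUCKETS_V4)
    print(BUCKETS_V4[idx], review(idx, 95, BUCKETS_V4))
```

The point of the design is that only the last branch needs a human; everything else is a mechanical move to the next ceiling.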

Depending on what failure cases you actually see from your peers in the wild, I can see (at least as a thought experiment) a two-bucket solution: "transit" and "everyone else". (Excluding downstream customers, whose hygiene you obviously hold some responsibility for.)

How often do folks see a failure case that's "deaggregated something and announced you 1000 /24s, rather than the expected/configured 100 max", vs "fat-fingered being a transit provider, and announced you the global table"? My gut says it's the latter case that breaks things, and that's the one you need to make damn sure doesn't happen.

Curious to hear others' experience.

Thanks,
Tim.