Well, I was thinking that you can survey your customers to learn their approximate inbound prefix count, then implement an inbound max-prefix for them based on that (ideally you're already doing that).
You can figure out the outbound count from you in a similar fashion.
In either case you're not implementing a limit that's 1% larger than the actual number; you're padding the number by 20-40%, if only for operational overhead reasons. Even a large ISP is sending (today) less than 100k prefixes when the peer isn't asking for 'full routes'.
So, I'd imagine you bucket your customers as:

  default only - limit 10
  customer prefixes only - limit +30% of your customer routes set
  full transit - +20% of current full table

(yes, you may have more buckets than me, meh)
and those are good starting points; if you keep these bucketed you can just ratchet up the limits as time requires. The prefix-limits (in or out) aren't there to stop jim-isp from sending 2 of jane-isp's routes, they're there to keep jim-isp from making a bad situation very bad. You (ideally!) have prefix-lists to limit jim from sending jane's routes.
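
A rough sketch of how the bucketed limits above might be computed. The bucket names, the function name, and the example counts are all made up for illustration; only the percentages and the default-only limit of 10 come from the buckets described above.

```python
# Sketch only: bucket names and example counts are assumptions,
# not anyone's production values.

def prefix_limit(bucket: str, customer_routes: int = 0, full_table: int = 0) -> int:
    """Return a max-prefix limit for a customer in the given bucket."""

    def padded(count: int, percent: int) -> int:
        # Integer ceiling of count * (1 + percent/100), avoiding float rounding.
        return (count * (100 + percent) + 99) // 100

    if bucket == "default-only":
        return 10                           # fixed small limit
    if bucket == "customer-routes":
        return padded(customer_routes, 30)  # +30% of your customer routes set
    if bucket == "full-transit":
        return padded(full_table, 20)       # +20% of current full table
    raise ValueError(f"unknown bucket: {bucket}")


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the arithmetic.
    print(prefix_limit("default-only"))
    print(prefix_limit("customer-routes", customer_routes=1000))
    print(prefix_limit("full-transit", full_table=900000))
```

Ratcheting the limits up later then just means re-running this with the current counts, per bucket rather than per customer.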
first, i have no magic bullet. sure wish i did. and i do not mean ill using ntt as an example; after all, job assures us they are very very important and very smart :)

even pulling from peering.db, which is about as well-maintained as the irr (a race to the bottom), as job suggests, this relies on manual maintenance. it assumes the same count at all peerings, etc. etc.

and the registered counts are horrifyingly approximate; ntt could leak 10k prefixes and not hit the limit as published. that they are gross approximations shows that they are not at all rigorous, calculated, ...

this is not to say that any reasonable prefix count would have allowed the full-table goog leak to vz. and vz could have used an as-path filter not allowing _goog_(lotso-tier-ones)_ (which ntt uses, for example).

but without a rigorous source of ground truth, prefix count limits will be approximate upper bounds and hence allow large mis-announcements. it is one tool in a sadly sparse toolbox, and not a strong one.

randy
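
To put a number on the "approximate upper bound" point, here is a minimal sketch of the headroom a padded limit leaves. The function name and the counts are hypothetical, not taken from any real peer or registry entry.

```python
# Sketch only: illustrative numbers, not real registered or observed counts.

def leak_headroom(configured_limit: int, normal_count: int) -> int:
    """Extra prefixes a peer could announce before the limit trips."""
    return max(configured_limit - normal_count, 0)


if __name__ == "__main__":
    # A peer normally sending 1000 prefixes against a limit of 1300
    # can leak this many more without hitting the limit:
    print(leak_headroom(1300, 1000))
    # A limit set below the current count leaves no headroom at all:
    print(leak_headroom(900, 1000))
```

Whatever the limit is, anything smaller than that headroom goes through unnoticed, which is why the limit only catches the large cases.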