Hi Robert,
Without naming any names, I will note that at some point
in the not-too-distant past, I was part of a
New Year's Eve holiday escalation to
$BACKBONE_ROUTER_PROVIDER when the global network I was
involved with started seeing excessive convergence times
(greater than one hour from BGP update message received to
FIB being updated).
After tracking down a development engineer from
$RTR_PROVIDER on the New Year's Eve holiday, it was
determined that the problem lay in assumptions made about
how communities were stored in memory. Think hashed
buckets, with linked lists within each bucket. If the
communities all happened to hash to the same bucket, the
linked list in that bucket became extremely long; and if
every prefix coming in, say from multiple sessions with a
major transit provider, happened to be adding one more
community to the very long linked list in that one hash
bucket, well, it ended up slowing down the processing to the
point where updates to the FIB were still trickling in an
hour after the BGP neighbor had finished sending updates
across.
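
To make the failure mode concrete, here is a toy sketch in Python. It is
not the vendor's code: the table layout, the bucket count, and both hash
functions (weak_hash, fold_hash) are assumptions made up for
illustration, as is the idea that the bad hash looked only at the ASN
half of a 32-bit community. It inserts 4096 communities that share one
ASN into a chained hash table and counts how many chain entries have to
be walked in total.

```python
class ChainedTable:
    """Toy hash table: fixed bucket count, one Python list per bucket as the chain."""

    def __init__(self, num_buckets, hash_fn):
        self.num_buckets = num_buckets
        self.hash_fn = hash_fn
        self.buckets = [[] for _ in range(num_buckets)]
        self.comparisons = 0  # chain entries examined across all inserts

    def insert(self, community):
        chain = self.buckets[self.hash_fn(community) % self.num_buckets]
        # Walk the chain to check for a duplicate before appending.
        for existing in chain:
            self.comparisons += 1
            if existing == community:
                return
        chain.append(community)

    def longest_chain(self):
        return max(len(chain) for chain in self.buckets)


def weak_hash(community):
    # Hypothetical bad choice: only the high 16 bits (the ASN half) are used,
    # so every community from the same ASN lands in the same bucket.
    return community >> 16


def fold_hash(community):
    # Hypothetical better choice: XOR the two 16-bit halves together so the
    # value half also influences the bucket. Still simplistic, but enough
    # to spread this particular input.
    return community ^ (community >> 16)


def measure(hash_fn, asn=65000, count=4096, num_buckets=1024):
    """Insert `count` communities asn:0 .. asn:(count-1); return (longest chain, comparisons)."""
    table = ChainedTable(num_buckets, hash_fn)
    for value in range(count):
        table.insert((asn << 16) | value)
    return table.longest_chain(), table.comparisons


if __name__ == "__main__":
    for name, fn in (("weak_hash", weak_hash), ("fold_hash", fold_hash)):
        longest, comparisons = measure(fn)
        print(f"{name}: longest chain = {longest}, comparisons = {comparisons}")
```

With the weak hash, every insert walks the whole chain built so far, so
the work grows with the square of the number of communities; with the
folding hash the same input spreads evenly across the buckets and each
insert walks at most a few entries. Real router code will differ in
every detail, but the quadratic blow-up has the same shape.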
A new hash function was developed on New Year's Day, and
a new version of code was built for us to deploy under
relatively painful circumstances.
It's easy to say "Considering that we are talking about
control plane memory I think the cost/space associated with
storing communities is less then negligible these days."
The reality is very different, because it's not just
about efficiently *storing* communities, it's really about
efficiently *parsing and updating* communities--and the
choices made there absolutely *DO* "contribute to longer
protocol convergences in any measurable way."
Matt
(the names have been obscured to increase my chances of
being hireable in the industry again at some future date.
;)