We at ARIN have verified that all our systems are functioning optimally and there were no indications of any issues at the time of the PacketVis alerts. 

 

Brad Gorman

Director, Customer Technical Services

American Registry for Internet Numbers

 

From: NANOG <nanog-bounces+bgorman=arin.net@nanog.org> on behalf of Job Snijders <job@sobornost.net>
Date: Thursday, January 30, 2025 at 07:57
To: Christopher Hawker <chris@thesysadmin.au>
Cc: NANOG <nanog@nanog.org>
Subject: Re: ARIN RPKI Trust Anchor Issue

Dear all,

I analysed the alert, here is my assessment.

If I recall correctly, PacketVis uses multiple data sources (different
versions of validator implementations) and alerts on anomalies spotted
by more than a single data source.

Most RPKI Validator implementations limit the maximum allowable file
size of RPKI Signed Objects. It appears one particular Manifest in
ARIN's "Hosted CA" system exceeded a threshold known to exist in older
implementations, rendering all subordinate ROAs on that manifest invalid
for those instances.

Timeline (based on rpkiviews.org):

* On Tue 28 Jan 2025 14:02:07 +0000, one Large CA's keypair signed a
  Manifest with 50,157 FileAndHash entries. This manifest was nominally
  valid until Thu 30 Jan 2025 10:00:00 +0000 and was 3,964,618 bytes in
  size.  Note that this is very large compared to other Manifest
  objects: it is 174% larger than the second largest Manifest, and 456%
  larger than the third largest Manifest.

* On Tue 28 Jan 2025 17:54:07 +0000 this Large CA's keypair signed a
  Manifest with 51,014 FileAndHash entries. That particular issuance was
  4,032,321 bytes in size and exceeded a threshold (4M bytes). Following
  the "Failed Fetch" mechanism described in RFC 9286 Section 6.6,
  affected instances continued to use the older "Tue 28 Jan 2025
  14:02:07" manifest, until it expired (which happened today at Thu 30
  Jan 2025 10:00:00).
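
The sequence above can be sketched as follows. This is a minimal
illustration and not the code of any particular validator: the names
Manifest, MAX_MANIFEST_BYTES and select_manifest are hypothetical, the
4,000,000-byte limit is an assumed reading of "4M bytes" that is
consistent with the two sizes in the timeline, and only the size check
and the fallback to the cached manifest are modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Assumption: "4M bytes" is read as 4,000,000 bytes (not 4 MiB).
MAX_MANIFEST_BYTES = 4_000_000


@dataclass(frozen=True)
class Manifest:
    issued: datetime
    size_bytes: int
    # End of nominal validity; None where the message does not state it.
    next_update: Optional[datetime] = None


def select_manifest(fetched: Manifest,
                    cached: Optional[Manifest],
                    now: datetime) -> Optional[Manifest]:
    """Return the manifest a size-limited validator would use, or None."""
    if fetched.size_bytes <= MAX_MANIFEST_BYTES:
        return fetched
    # Oversized object: treated as a failed fetch, so fall back to the
    # previously cached manifest for as long as it has not expired.
    if (cached is not None and cached.next_update is not None
            and now < cached.next_update):
        return cached
    return None


UTC = timezone.utc
# Values taken from the timeline in this message.
OLD = Manifest(datetime(2025, 1, 28, 14, 2, 7, tzinfo=UTC), 3_964_618,
               datetime(2025, 1, 30, 10, 0, 0, tzinfo=UTC))
NEW = Manifest(datetime(2025, 1, 28, 17, 54, 7, tzinfo=UTC), 4_032_321)

before_expiry = select_manifest(NEW, OLD, datetime(2025, 1, 29, 12, 0, tzinfo=UTC))
after_expiry = select_manifest(NEW, OLD, datetime(2025, 1, 30, 10, 40, tzinfo=UTC))
print(before_expiry is OLD, after_expiry is None)
```

In this model the affected instance keeps using the 14:02:07 manifest
for almost two days and is left with nothing usable once that manifest
expires, which matches the delay between the trigger and the alert.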

It is interesting that the 'trigger event' happened two days ago, but it
is only just now that it became quite tangible! It seems this anomaly
could've been alerted on earlier.

I noted in my "RPKI's 2024 Year In Review" report:

        """
        "Efficiency" in this context arises from validators spending the
        computational cost of validating a single EE certificate and
        yielding more than 1 ROAIPAddress. Under the RIPE NCC TAL one
        yields 6.5 prefixes per ROA, while in the ARIN region this
        number is 1.1 prefixes per ROA. As stewards of this technology,
        we need to keep an eye on the overall efficiency of the RPKI to
        ensure things don't get out of hand.
        """
        source: https://mailman.nanog.org/pipermail/nanog/2025-January/227166.html

When a resource holder creates many ROAs (tens of thousands), it'll
result in many Manifest FileAndHash entries (again, tens of thousands),
which increases the file size of the Manifest (to the point that some
validators may consider such a Manifest object invalid). When this
happens, the validator marks the ROAs as invalid, and in turn BGP routers
will consider the covered routes 'not-found'.
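
To make the growth concrete, the two issuances in the timeline above can
be used to estimate the cost per FileAndHash entry. This is a rough
sketch: fitting a straight line through only two data points is an
assumption, the helper names are hypothetical, and the 4,000,000-byte
limit is an assumed reading of "4M bytes".

```python
import math

# (FileAndHash entries, manifest size in bytes), both from the timeline.
POINTS = [(50_157, 3_964_618), (51_014, 4_032_321)]

# Assumption: "4M bytes" is read as 4,000,000 bytes.
LIMIT_BYTES = 4_000_000


def fit_line(p1, p2):
    """Return (bytes per entry, fixed bytes) for the line through p1 and p2."""
    (n1, s1), (n2, s2) = p1, p2
    slope = (s2 - s1) / (n2 - n1)
    intercept = s1 - slope * n1
    return slope, intercept


def first_count_over(limit, slope, intercept):
    """Smallest entry count whose modelled size is strictly above the limit."""
    return math.floor((limit - intercept) / slope) + 1


slope, intercept = fit_line(*POINTS)
crossing = first_count_over(LIMIT_BYTES, slope, intercept)
print(f"{slope:.0f} bytes per entry, {intercept:.0f} bytes fixed, "
      f"limit exceeded from {crossing} entries")
```

Under this model each entry adds 79 bytes on top of 2,215 fixed bytes,
and the assumed limit is first exceeded at 50,605 entries, which falls
between the two issuances. A publication point could watch for a number
like this well before a manifest is actually rejected.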

Another downside of systems signing over only a single prefix per ROA is
that each individual ROA object comes with 1500~2100 bytes 'overhead'
regardless of how many prefixes are encoded inside of it (due to the
embedded X.509 End-Entity certificate and signature). Expressed as a
percentage, this overhead is ~98% in the case of single prefixes. While
Manifests grow linearly, the per-ROA overhead makes for a somewhat steep
curve, which in turn directly impacts the size of RRDP snapshots (in which
all Manifests + ROAs are bundled together). Publication point operators
are advised to keep in mind that RRDP snapshot size is another limit
that can be tripped.
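
The effect of packing more prefixes into one ROA can be illustrated with
a small calculation. The 1,500-byte overhead is the low end of the range
given above. The 30 bytes of payload per prefix is purely a hypothetical
figure, picked so that the single-prefix case lands near the ~98%
mentioned above. Real encodings differ.

```python
# Low end of the per-ROA overhead range quoted in this message.
OVERHEAD_BYTES = 1500

# Hypothetical payload per prefix, for illustration only.
BYTES_PER_PREFIX = 30


def overhead_fraction(overhead_bytes: int, prefixes: int,
                      bytes_per_prefix: int) -> float:
    """Share of a ROA's size that is fixed overhead rather than prefix data."""
    payload = prefixes * bytes_per_prefix
    return overhead_bytes / (overhead_bytes + payload)


for prefixes in (1, 7, 50):
    share = overhead_fraction(OVERHEAD_BYTES, prefixes, BYTES_PER_PREFIX)
    print(f"{prefixes:>3} prefixes per ROA: {share:.1%} overhead")
```

With these assumed numbers the overhead share only drops noticeably once
a ROA carries many prefixes, which is why a region averaging close to one
prefix per ROA pays nearly the full fixed cost for every prefix.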

Stakeholders operating certification services should keep in mind that
validator implementations might restrict the file size of individual
objects, and the number of objects, but also impose limits on the size
of the RRDP snapshot, the duration of the synchronization task, etc.

Kind regards,

Job


On Thu, Jan 30, 2025 at 10:43:52AM +0000, Christopher Hawker wrote:
> Hello folks,
>
> Has anyone received any similar event notifications (from PacketVis or other)? Trying to work out if it's a false-positive.
>
> Regards,
> Christopher Hawker
> ________________________________
> From: PacketVis <notifications@packetvis.com>
> Sent: Thursday, January 30, 2025 9:40 PM
> To: Christopher Hawker <chris@thesysadmin.au>
> Subject: bgp ta-malfunction - low severity - PacketVis
>
> Possible TA malfunction: 29.17% of the ROAs disappeared from ARIN.
>
> Type: ta-malfunction
> Severity: low
> Monitored: ASarin
> When: 2025-01-30 10:40 UTC
>
> See more details about the event:
> https://packetvis.com/bgp/event/554c1692012ae202467afe3a38ae5075-dca21ae6-ff6f-4133-9095-e421f8b27131/3bc5a1021e196af96fde22d1608cf073c09d9496/