
Hello folks,
Has anyone received any similar event notifications (from PacketVis or other)? Trying to work out if it's a false-positive.
Regards, Christopher Hawker
________________________________
From: PacketVis <notifications@packetvis.com>
Sent: Thursday, January 30, 2025 9:40 PM
To: Christopher Hawker <chris@thesysadmin.au>
Subject: bgp ta-malfunction - low severity - PacketVis
Possible TA malfunction: 29.17% of the ROAs disappeared from ARIN.
Type: ta-malfunction Severity: low Monitored: ASarin When: 2025-01-30 10:40 UTC
See more details about the event: https://packetvis.com/bgp/event/554c1692012ae202467afe3a38ae5075-dca21ae6-ff...

Dear all,

I analysed the alert; here is my assessment.

If I recall correctly, PacketVis uses multiple data sources (different versions of validator implementations) and alerts on anomalies spotted by more than a single data source.

Most RPKI validator implementations limit the maximum allowable file size of RPKI Signed Objects. It appears one particular Manifest in ARIN's "Hosted CA" system exceeded a threshold known to exist in older implementations, rendering all subordinate ROAs on that Manifest invalid for those instances.

Timeline (based on rpkiviews.org):

* On Tue 28 Jan 2025 14:02:07 +0000, one Large CA's keypair signed a Manifest with 50,157 FileAndHash entries. This Manifest was 3,964,618 bytes in size and nominally valid until Thu 30 Jan 2025 10:00:00 +0000. Note that this is very large compared to other Manifest objects: it is 174% larger than the second largest Manifest, and 456% larger than the third largest Manifest.

* On Tue 28 Jan 2025 17:54:07 +0000, this Large CA's keypair signed a Manifest with 51,014 FileAndHash entries. That particular issuance was 4,032,321 bytes in size and exceeded a threshold (4M bytes).

Following the "Failed Fetch" mechanism described in RFC 9286 Section 6.6, affected instances continued to use the older "Tue 28 Jan 2025 14:02:07" Manifest until it expired (which happened today, Thu 30 Jan 2025 10:00:00).

It is interesting that the 'trigger event' happened two days ago, but it is only just now that it became quite tangible! It seems this anomaly could've been alerted for earlier on.

I noted in my "RPKI's 2024 Year In Review" report:

"""
"Efficiency" in this context arises from validators spending the computational cost of validating a single EE certificate and yielding more than 1 ROAIPAddress. Under the RIPE NCC TAL one yields 6.5 prefixes per ROA, while in the ARIN region this number is 1.1 prefixes per ROA.

As stewards of this technology, we need to keep an eye on the overall efficiency of the RPKI to ensure things don't get out of hand.
"""
source: https://mailman.nanog.org/pipermail/nanog/2025-January/227166.html

When a resource holder creates many ROAs (tens of thousands), this results in many Manifest FileAndHash entries (again, tens of thousands), which increases the file size of the Manifest, potentially to the point that some validators consider such a Manifest object invalid. When this happens, the validator treats the ROAs listed on that Manifest as invalid, and in turn BGP routers will consider the covered routes 'not-found'.

Another downside of systems signing over only a single prefix per ROA is that each individual ROA object carries 1,500 to 2,100 bytes of 'overhead' regardless of how many prefixes are encoded inside of it (due to the embedded X.509 End-Entity certificate and signature). Expressed as a percentage, this overhead is ~98% in the case of single prefixes. While Manifests grow linearly with the number of ROAs, the per-ROA overhead makes for a somewhat steep curve, which in turn directly impacts the size of RRDP snapshots (in which all Manifests and ROAs are bundled together). Publication point operators should keep in mind that RRDP snapshot size is another limit that can be tripped.

Stakeholders operating certification services should keep in mind that validator implementations might restrict not only the file size of individual objects and the number of objects, but also the size of the RRDP snapshot, the duration of the synchronization task, etc.

Kind regards,
Job

On Thu, Jan 30, 2025 at 10:43:52AM +0000, Christopher Hawker wrote:
Hello folks,
Has anyone received any similar event notifications (from PacketVis or other)? Trying to work out if it's a false-positive.
Regards, Christopher Hawker ________________________________ From: PacketVis <notifications@packetvis.com> Sent: Thursday, January 30, 2025 9:40 PM To: Christopher Hawker <chris@thesysadmin.au> Subject: bgp ta-malfunction - low severity - PacketVis
Possible TA malfunction: 29.17% of the ROAs disappeared from ARIN.
Type: ta-malfunction Severity: low Monitored: ASarin When: 2025-01-30 10:40 UTC
See more details about the event: https://packetvis.com/bgp/event/554c1692012ae202467afe3a38ae5075-dca21ae6-ff...
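A minimal Python sketch of the fallback behaviour in Job's timeline, under stated assumptions: a validator with a fixed per-object size limit rejects the oversized 17:54:07 Manifest and keeps using the previously accepted one until that Manifest's nextUpdate has passed. The limit is assumed to be 4,000,000 bytes; the thread only says "4M bytes", but a decimal limit is the reading consistent with 4,032,321 bytes exceeding it while 3,964,618 bytes did not. The `Manifest` record and `manifest_in_use` helper are hypothetical and model no particular validator: real validators act on fetch time rather than signing time and apply many more RFC 9286 checks. The nextUpdate of the 17:54:07 issuance is not given in the thread, so it is left unset.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Assumption: "4M bytes" means 4,000,000 bytes (decimal), the value consistent
# with 4,032,321 exceeding the limit while 3,964,618 did not.
MAX_FILE_SIZE = 4_000_000


@dataclass
class Manifest:
    signed: datetime                        # when the CA signed this issuance
    size: int                               # file size in bytes
    next_update: Optional[datetime] = None  # end of validity; None = not known


def utc(year, month, day, hour, minute, second=0):
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)


# The two issuances from the timeline (times and sizes as reported in the thread).
ISSUANCES: List[Manifest] = [
    Manifest(signed=utc(2025, 1, 28, 14, 2, 7), size=3_964_618,
             next_update=utc(2025, 1, 30, 10, 0, 0)),
    # nextUpdate of this issuance is not given in the thread, so it is left unset.
    Manifest(signed=utc(2025, 1, 28, 17, 54, 7), size=4_032_321),
]


def manifest_in_use(issuances, now, max_size=MAX_FILE_SIZE):
    """Hypothetical, simplified failed-fetch model.

    Issuances are considered in signing order. One within the size limit
    replaces the cached Manifest; an oversized one is rejected and the
    previously accepted Manifest stays in use. The cached Manifest is only
    usable until its nextUpdate; after that nothing is returned, i.e. the
    ROAs it lists drop out of the validated output.
    """
    current = None
    for m in sorted(issuances, key=lambda item: item.signed):
        if m.signed > now:
            break  # not yet published at time `now`
        if m.size <= max_size:
            current = m  # accepted: becomes the cached Manifest
        # else: rejected; keep falling back to `current` (the failed-fetch case)
    if current is None:
        return None
    # An unknown nextUpdate (None) is treated as not stale in this sketch.
    if current.next_update is not None and now > current.next_update:
        return None  # stale: the fallback Manifest has expired
    return current


if __name__ == "__main__":
    for when in (utc(2025, 1, 29, 12, 0), utc(2025, 1, 30, 10, 40)):
        m = manifest_in_use(ISSUANCES, when)
        print(when.isoformat(), "->",
              m.signed.isoformat() if m else "no usable Manifest")
```

Evaluated on 29 January, the sketch still returns the 14:02:07 Manifest; at 10:40 UTC on 30 January (the time of the PacketVis alert) it returns nothing, which is why a trigger two days earlier only became visible once the fallback Manifest went stale. Passing a larger, hypothetical `max_size` models an implementation without the older limit, which then uses the 17:54:07 issuance instead.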

We at ARIN have verified that all our systems are functioning optimally and there were no indications of any issues at the time of the PacketVis alerts.

Brad Gorman
Director, Customer Technical Services
American Registry for Internet Numbers

From: NANOG <nanog-bounces+bgorman=arin.net@nanog.org> on behalf of Job Snijders <job@sobornost.net>
Date: Thursday, January 30, 2025 at 07:57
To: Christopher Hawker <chris@thesysadmin.au>
Cc: NANOG <nanog@nanog.org>
Subject: Re: ARIN RPKI Trust Anchor Issue

Dear Job,
> I analysed the alert, here is my assessment.
Thanks a lot for the analysis. I had also received the alert (Randy Bush and others as well, see "Subject: TA Malfunction??" thread :-) and was wondering... your analysis makes sense as far as I can judge (which is not very far). [...]
> It is interesting that the 'trigger event' happened two days ago, but it is only just now that it became quite tangible! It seems this anomaly could've been alerted for earlier on.
Can you elaborate how? (Looking for overly-large or otherwise suspicious manifests signed by CAs?)
> I noted in my "RPKI's 2024 Year In Review" report:
Thanks for that one as well. It has interesting information and reflections that should be discussed in the operator/sidrops community, preferably by people more knowledgeable than me... Cheers, -- Simon.

On Thu, Jan 30, 2025 at 04:03:58PM +0100, Simon Leinen wrote:
>> It is interesting that the 'trigger event' happened two days ago, but it is only just now that it became quite tangible! It seems this anomaly could've been alerted for earlier on.
> Can you elaborate how? (Looking for overly-large or otherwise suspicious manifests signed by CAs?)

One could develop a simple monitoring utility which checks for 'overly' large file sizes of signed objects in the Relying Party's cache. I don't recommend the below for production monitoring; it is merely an illustration.

For example, using rpki-client on Debian Linux, the following displays the top 10 largest objects:

    $ cd /var/lib/rpki-client/cache
    $ find * -type f | xargs du -ka | sort -nr | head

As another example, one could monitor the RRDP snapshot size simply by fetching it:

    $ curl -s https://rrdp.arin.net/notification.xml | grep snapshot
    <snapshot uri="https://rrdp.arin.net/4a394319-7460-4141-a416-1addb69284ff/99127/snapshot.xm..." hash="3f2acde605e9aa4b2370e41299d445b5c01a47f78d5ac8df4c8cdc69cf837a98"/>
    $ wget --no-verbose --compression=gzip https://rrdp.arin.net/4a394319-7460-4141-a416-1addb69284ff/99127/snapshot.xm...
    2025-01-30 15:22:52 URL:https://rrdp.arin.net/4a394319-7460-4141-a416-1addb69284ff/99127/snapshot.xm... [532342274] -> "snapshot.xml" [1]

In a similar way, the notification.xml can be used to find RRDP deltas and monitor those for size and trends in size.

There are also all kinds of metrics available in OpenMetrics format in /var/lib/rpki-client/metrics.

All in all, there are hundreds of metrics to look at! :-)

Kind regards,
Job
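A minimal Python sketch of the two checks Job describes, using only the standard library and meant, like the shell examples, as an illustration rather than production monitoring. `largest_files` walks a Relying Party cache directory and lists the biggest objects (apparent size in bytes, whereas `du -k` reports disk usage in KiB), `oversized` flags entries above a limit, and `rrdp_uris` extracts the snapshot URI and the delta serials/URIs from an RRDP notification file so they can be fetched and measured. The function names, the `ALERT_LIMIT` value, and the sample notification document are hypothetical; the XML namespace is the one RFC 8182 defines for RRDP.

```python
import os
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# XML namespace used by RRDP documents (RFC 8182).
RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"

# Hypothetical alert threshold in bytes; choose one matching the limits
# of the validator implementations you care about.
ALERT_LIMIT = 4_000_000

# Hypothetical notification document for illustration only
# (real files carry a real session_id and SHA-256 hashes).
SAMPLE_NOTIFICATION = """\
<notification xmlns="http://www.ripe.net/rpki/rrdp" version="1"
    session_id="00000000-0000-0000-0000-000000000000" serial="3">
  <snapshot uri="https://rrdp.example.net/3/snapshot.xml" hash="00"/>
  <delta serial="3" uri="https://rrdp.example.net/3/delta.xml" hash="00"/>
  <delta serial="2" uri="https://rrdp.example.net/2/delta.xml" hash="00"/>
</notification>
"""


def largest_files(root: str, n: int = 10) -> List[Tuple[int, str]]:
    """Return up to n (size_in_bytes, path_relative_to_root) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes.append((os.path.getsize(path), os.path.relpath(path, root)))
    sizes.sort(reverse=True)
    return sizes[:n]


def oversized(files: List[Tuple[int, str]], limit: int) -> List[Tuple[int, str]]:
    """Keep only the entries whose size exceeds limit."""
    return [(size, path) for size, path in files if size > limit]


def rrdp_uris(notification_xml) -> Tuple[Optional[str], List[Tuple[int, str]]]:
    """Return (snapshot URI, [(delta serial, delta URI), ...]) from a notification file."""
    root = ET.fromstring(notification_xml)
    snapshot = root.find(RRDP_NS + "snapshot")
    deltas = [(int(d.get("serial")), d.get("uri"))
              for d in root.findall(RRDP_NS + "delta")]
    return (snapshot.get("uri") if snapshot is not None else None), deltas


if __name__ == "__main__":
    cache = "/var/lib/rpki-client/cache"  # rpki-client cache path from the thread
    if os.path.isdir(cache):
        for size, path in largest_files(cache):
            flag = "  <-- above ALERT_LIMIT" if size > ALERT_LIMIT else ""
            print(f"{size:>10} {path}{flag}")
    print(rrdp_uris(SAMPLE_NOTIFICATION))
```

A live notification document could be fetched with, for example, `urllib.request.urlopen("https://rrdp.arin.net/notification.xml").read()`; `ET.fromstring` accepts the resulting bytes directly. Recording the snapshot and delta sizes over time would then give the size trends mentioned above.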
participants (4)
- Brad Gorman
- Christopher Hawker
- Job Snijders
- Simon Leinen