Theoretical question about cyclic dependency in IRR filtering
Hello everyone,

While discussing IRR on some groups recently, I started wondering whether there can be (and whether there is) a cyclic dependency in filtering, where an IRR (run by whoever: APNIC, RIPE, RADB, etc.) uses some upstream that accepts only routes with an existing, valid route object.

So, a hypothetical case (it can apply to any IRR):

1. The APNIC registry source is whois.apnic.net, which points to 202.12.28.136 / 2001:dc0:1:0:4777::136. The aggregates covering both of these have valid route objects in the APNIC registry itself.
2. Their upstreams, say AS X, Y and Z, have tooling in place to generate and push filters by checking all popular IRRs. All is well up to this point.
3. Say APNIC has a server/service issue for a few minutes while X, Y and Z are updating their filters at the same time. They cannot contact whois.apnic.net and hence miss generating filters for all prefixes registered in the APNIC IRR.
4. X, Y and Z drop APNIC prefixes, including those of the IRR itself, and the loop goes on from this point onwards.

So my question is: can that actually happen? If not, do X, Y and Z (and possibly all upstreams up to the default-free zone) treat these prefixes in a special manner to avoid such a loop in resolution?

Thanks!

--
Anurag Bhatia
anuragbhatia.com
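[A minimal sketch of the failure mode described above, for readers who want it spelled out. Everything here is illustrative: the helper names are made up, and the "-i origin" inverse-query syntax is assumed rather than verified against any particular whois server.]

    # Hypothetical fail-closed filter generator (illustrative only).
    import socket

    IRR_HOST = "whois.apnic.net"   # the authoritative source in the scenario above
    IRR_PORT = 43                  # plain whois

    def fetch_route_objects(query: str, timeout: float = 10.0) -> str:
        # Send a single whois query and return the raw response text.
        with socket.create_connection((IRR_HOST, IRR_PORT), timeout=timeout) as s:
            s.sendall((query + "\r\n").encode())
            chunks = []
            while data := s.recv(4096):
                chunks.append(data)
        return b"".join(chunks).decode(errors="replace")

    def build_prefix_filter(asn: str) -> list[str]:
        try:
            # "-i origin ASxxxx" is the RIPE-style inverse query (assumption).
            raw = fetch_route_objects("-i origin " + asn)
        except OSError:
            # Fail closed: an unreachable source is treated the same as "no
            # route objects found", so the generated prefix-list comes out
            # empty; this is the step that lets the outage feed on itself.
            return []
        return [line.split(None, 1)[1].strip()
                for line in raw.splitlines() if line.startswith("route:")]

[If the except branch returns an empty list and the tooling then pushes that list to routers unconditionally, step 4 above follows naturally; the guards discussed later in the thread exist precisely to break that chain.]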
Hi Anurag,

Circular dependencies definitely are a thing to keep in mind when designing IRR and RPKI pipelines!

In the case of IRR: it is quite rare to query the RIR IRR services directly. Instead, the common practice is that utilities such as bgpq3, peval, and bgpq4 query “IRRd” (https://IRRd.net) instances at, for example, whois.radb.net and rr.ntt.net. You can verify this with tcpdump. These IRRd instances serve as intermediate caches and will continue to serve old cached data in case the origin is down. This aspect of the global IRR deployment avoids a lot of potential for circular dependencies.

Also, some organisations use threshold checks before deploying new IRR-based filters to reduce the risk of “misfiring”.

The RPKI case is slightly different: the timers are far more aggressive compared to IRR, and until “Publish in Parent” (RFC 8181) becomes commonplace, there are more publication points, and thus more potential for operators to paint themselves into a corner. Certainly, in the case of RPKI, all Publication Point (PP) operators need to take special care not to host CAs which have the PP’s INRs listed as subordinate resources inside the PP.

See RFC 7115 Section 5 for more information: “Operators should be aware that there is a trade-off in placement of an RPKI repository in address space for which the repository’s content is authoritative. On one hand, an operator will wish to maximize control over the repository. On the other hand, if there are reachability problems to the address space, changes in the repository to correct them may not be easily accessed by others.”

Ryan Sleevi once told me: "yes, it strikes me that you should prevent self-compromise from being able to perpetually own yourself, by limiting an attacker’s ability to persist beyond remediation."

A possible duct-tape approach is outlined at https://bgpfilterguide.nlnog.net/guides/slurm_ta/ - however, I can’t really recommend the SLURM file approach. Instead, RPKI repository operators are probably best off hosting their repository *outside* their own address space. Just like with authoritative DNS servers, make sure you can also serve your records via a competitor! :-)

For example, if ARIN moved one of their three publication point clusters into address space managed by any of the other four RIRs, some risk would be reduced.

Kind regards,

Job
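[For readers unfamiliar with the duct-tape approach linked above: it boils down to a SLURM file (RFC 8416) with locally added assertions for a publication point's prefixes, so they remain covered locally even when the repository is unreachable. Below is a minimal sketch only; the field names follow RFC 8416, but the ASN, prefixes and max-lengths are documentation examples and the recipe in the linked guide may differ in detail.]

    # Sketch of an RFC 8416 SLURM file with local assertions for a
    # hypothetical publication point (example ASN/prefixes only).
    import json

    slurm = {
        "slurmVersion": 1,
        "validationOutputFilters": {
            "prefixFilters": [],
            "bgpsecFilters": [],
        },
        "locallyAddedAssertions": {
            "prefixAssertions": [
                {
                    "asn": 64496,
                    "prefix": "192.0.2.0/24",
                    "maxPrefixLength": 24,
                    "comment": "example publication point v4 prefix",
                },
                {
                    "asn": 64496,
                    "prefix": "2001:db8::/32",
                    "maxPrefixLength": 48,
                    "comment": "example publication point v6 prefix",
                },
            ],
            "bgpsecAssertions": [],
        },
    }

    with open("slurm.json", "w") as f:
        json.dump(slurm, f, indent=2)

[Relying-party software that implements RFC 8416 can load a file like this as local exceptions; as Job notes, it is a workaround rather than a fix.]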
On Mon, Nov 29, 2021 at 8:14 AM Job Snijders via NANOG <nanog@nanog.org> wrote:
Also, some organisations use threshold checks before deploying new IRR-based filters to reduce risk of “misfiring”.
Beyond just 'did the deployed filter change by +/- X%?', you probably don't want to deploy content if you can't actually talk to the source... which was Anurag's proposed problem. I suppose there are a myriad of actual failure modes though ;) and we'll always find more as deployments progress... hurray?
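[As a concrete, deliberately simplified illustration of the kind of sanity check being described here. The function name and the threshold are made up for the example; real deployments will differ.]

    # Illustrative pre-deployment sanity check (hypothetical names/thresholds).
    MAX_SHRINK_PCT = 10.0  # refuse to deploy if the prefix-list shrinks by more than this

    def should_deploy(old_prefixes: set[str], new_prefixes: set[str],
                      source_reachable: bool) -> bool:
        # Rule 1: never deploy a filter built while the source was unreachable.
        if not source_reachable:
            return False
        # Rule 2: never replace a non-empty filter with an empty one.
        if old_prefixes and not new_prefixes:
            return False
        # Rule 3: treat a large shrink as suspicious and keep the old filter.
        if old_prefixes:
            shrink = 100.0 * len(old_prefixes - new_prefixes) / len(old_prefixes)
            if shrink > MAX_SHRINK_PCT:
                return False
        return True

[Rule 1 is exactly the point being made here: a +/- X% check alone would happily deploy an empty list if the generator silently treated "source unreachable" as "no objects found".]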
Hi Chris,

On 11/29, Christopher Morrow wrote:
Beyond just 'did the deployed filter change by +/- X%?', you probably don't want to deploy content if you can't actually talk to the source... which was Anurag's proposed problem.
The point that Job was (I think?) trying to make was that by querying a mirror for IRR data at filter generation time, as opposed to the source DB directly, the issue that Anurag envisioned can be avoided.

I would recommend that anyone (especially transit operators) using IRR data for filter generation run a local mirror whose reachability is not subject to IRR-based filters.

Of course, disruption of the NRTM connection between the mirror and the source DB can still result in local data becoming stale or incomplete. You can imagine a situation where an NRTM update to an object covering the source DB's address space is missed during a connectivity outage, and that missed change causes the outage to become persistent. However, I think that is fairly contrived; I have certainly never seen it in practice.

Cheers,

Ben
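[To make the "local data becoming stale" concern a bit more concrete, a small hypothetical sketch follows. How you learn the mirror's last successful NRTM import depends entirely on the mirror software you run; the 24-hour threshold is an arbitrary example.]

    # Illustrative staleness guard for locally mirrored IRR sources.
    from datetime import datetime, timedelta, timezone

    MAX_MIRROR_AGE = timedelta(hours=24)  # arbitrary example threshold

    def mirror_is_fresh(last_import: datetime,
                        now: datetime | None = None) -> bool:
        # last_import: when the mirror last completed an NRTM import for the
        # source; obtaining this value is left to the mirror's own bookkeeping.
        now = now or datetime.now(timezone.utc)
        return (now - last_import) <= MAX_MIRROR_AGE

    def safe_to_generate(last_imports: dict[str, datetime]) -> bool:
        # Only build new filters when every source we depend on is fresh;
        # otherwise keep the previously deployed filters in place.
        return all(mirror_is_fresh(ts) for ts in last_imports.values())

[The idea mirrors the advice above: when a source has gone quiet, keep serving the last known-good filters rather than regenerating from incomplete data.]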
On Tue, Nov 30, 2021 at 3:20 AM Ben Maddison <benm@workonline.africa> wrote:
The point that Job was (I think?) trying to make was that by querying a mirror for IRR data at filter generation time, as opposed to the source DB directly, the issue that Anurag envisioned can be avoided.
I would recommend that anyone (esp. transit operators) using IRR data for filter generation run a local mirror whose reachability is not subject to IRR-based filters.
yup, sure; "remove external dependencies, move them internal" :) you can STILL end up with zero prefixes even in this case, of course.
Of course, disruption of the NRTM connection between the mirror and the source DB can still result in local data becoming stale/incomplete.
yup!
You can imagine a situation where an NRTM update to an object covering the source DB address space is missed during a connectivity outage, and that missed change causes the outage to become persistent. However, I think that is fairly contrived. I have certainly never seen it in practise.
Sure, there's a black-swan comment in here somewhere :) The overall takeaway here is really: "plan for errors, and for graceful resumption of service when they occur" (and planning is hard).
Coin phrase ... IRR (dedup)

--
J. Hellenthal
The fact that there's a highway to Hell but only a stairway to Heaven says a lot about anticipated traffic volume.
participants (5)

- Anurag Bhatia
- Ben Maddison
- Christopher Morrow
- J. Hellenthal
- Job Snijders