ROVER routing security - it's not enumeration
Hi, Just wanted to clarify a few things about the ROVER approach. One key misunderstanding seems to be that ROVER is an approach for enumerating all potentially valid routes. This is not the case. Slides from the ROVER talk at NANOG 55 are posted, and there was an additional lightning talk Monday at NANOG. A good summary of the misunderstandings is listed and addressed below:
Summarizing a few other things other people have mentioned:
- The normal operating mode with RPKI is to fetch everything rather than do a point query. We've spent the last decade or so making that harder to do with DNS (blocking AXFR/IXFR, using NSEC3 instead of NSEC, etc). This makes it fairly difficult to know in advance what queries one should be asking ROVER (as Paul Vixie puts it, ROVER isn't a catalogue). When I pressed the ROVER folks about this at the Paris IETF meeting, they mumbled something about maybe walking the IRR or other external databases as a way of knowing what DNS queries to issue.
ROVER's operational model is to ask a question and get an answer. ROVER is not an enumeration method. RPKI does provide enumeration, but ROVER is not trying to duplicate RPKI. I think the first step is to step back and ask whether every operational model needs enumeration. For example, the talk yesterday by Level3 used the DNS and IRR and did not need such an enumeration. Enumeration is not a goal in itself. There are a number of operational models that provide the needed routing protection without enumeration.
- Circular dependencies are a problem. Helical dependencies can be made to work, but this says that one probably should not be depending on routing to make a point query to make decisions about routing. If you look at the architecture of the existing RPKI validators (well, mine and BBN's, anyway, not sure about RIPE's but suspect they took the same approach), we've gone to some trouble to make sure that the validator will continue to work across network outages as long as the collected data haven't expired or been revoked. In theory one could do the same thing with bulk transfers of DNS (whether AXFR/IXFR or NSEC walking, if they worked) but it would not work well with point queries.
Or a simpler approach that does not require bulk zone transfers or zone walking is simply DNS caching, which already exists and is well understood. More broadly, whether one calls it a cache or an RPKI validator or whatever, you can build it with redundancy. One can certainly make either system work across network outages.
- ROVER gives us no traction on path validation (BGPSEC); it's limited to origin validation. RPKI can certify both prefixes and ASNs, which gives it the basics needed to support path validation as well as origin validation. ASNs have no hierarchical structure and thus would be a very poor match for encoding as DNS names.

The focus is on origin and sub-prefix hijacks. There are certainly discussions and early experiments with future additions, but the work is focused on origin/sub-prefix events.

- Some of the DNS aspects of ROVER are a little strange. In particular, as currently specified ROVER requires the relying party to pay attention to DNS zone cuts, which is not normal in DNS (the basic DNS model since RFC 883 has been that zones are something for the zone administrator to worry about; resolvers mostly just see a tree of RRsets). ROVER requires the relying party to check for the same data in multiple zones and pay close attention to zone cuts. While it is certainly possible to do all this, it is not a matter of issuing a simple DNS query and you're done. DNS caching effects can also complicate matters here if the zone structure is changing: think about what happens if you have cached responses to some (but not all) of the queries you need to make to figure out whether to allow a more specific route punched out of a larger prefix block.
This is a misunderstanding of the ROVER approach. Multiple copies of the data do not exist in multiple zones. There is a one-to-one mapping between a prefix and a DNS name. The resolver simply finds the data and has no need to understand where zone cuts occur. On the other hand, DNS administrators do care about how they make zone cuts and delegate to their customers. They can take a /16 and delegate two /17's, or they can manage the whole thing in a single zone. Their choice. A resolver simply issues a query for the unique DNS name associated with a prefix. This could be done with anything from a complex tool set to a simple command-line tool like dig.

The confusion here may arise from what happens if you get an *authenticated* response saying there is no routing data at this name. This could mean either 1) the prefix should not be announced, or 2) the reverse DNS happens to be signed with DNSSEC but the site is not participating in routing security via DNS. To determine which, you issue a second query: is an RLOCK present along with the DNSKEY used to sign the data? The existence of an RLOCK proves participation.
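To make the two-query logic concrete, here is a rough sketch in Python. The record names ("SRO" for route data, "RLOCK" for the participation marker) and the `lookup` interface are stand-ins for what the drafts actually define, so read this as an illustration of the decision procedure, not as the definitive ROVER client behavior:

```python
# Hypothetical sketch of the two-step check described above. The
# lookup(name, rrtype) callable stands in for a DNSSEC-validating
# resolver: it returns record data, or None for an *authenticated*
# denial of existence. Record type names are illustrative.

def validate_prefix(name, lookup):
    """Classify a prefix name as 'authorized', 'unauthorized',
    or 'not-participating'."""
    route_data = lookup(name, "SRO")
    if route_data is not None:
        return "authorized"        # signed route data exists for this name
    # An authenticated "no data" answer is ambiguous, so issue the
    # second query. (In practice it targets the apex of the zone
    # whose DNSKEY signed the answer.)
    if lookup(name, "RLOCK") is not None:
        return "unauthorized"      # zone opted in, prefix not announced
    return "not-participating"     # signed zone, but no routing data via DNS
```

The point of the sketch is that the resolver never reasons about zone cuts; it asks at most two questions and is done.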
- The reuse of existing infrastructure argument for ROVER is somewhat disingenuous -- it's only partial reuse of existing infrastructure. ROVER's new encoding of prefixes as DNS names means that a lot of new stuff would need to be deployed, and attempting to be backwards compatible with the existing DNS reverse tree adds some complexity to ROVER's architecture.

I strongly disagree with this. ROVER does use a naming convention.
This is simply a convention, not a protocol change. The best analogy here is that one may have an internal naming convention for naming routers or particular servers and so forth. You should follow this convention and build it into your provisioning scripts where appropriate. Clearly it is enormously better if there is a consistent way to name prefixes so we have a common convention for naming the data. Everyone putting data in is using the convention, and we are working to get the convention standardized. The convention is also useful for storing data at prefixes; geolocation is one example.
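For the octet-aligned case, the naming convention can be sketched in a few lines. The encoding for prefixes that do not fall on octet boundaries is defined in the drafts and omitted here, so treat this as illustrative only:

```python
import ipaddress

def prefix_to_name(prefix):
    """Map an octet-aligned IPv4 prefix to its reverse-DNS name.

    This mirrors the naming convention for the easy case only; the
    ROVER drafts additionally define an encoding for prefixes that do
    not fall on octet boundaries, which is omitted here."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen % 8 != 0:
        raise ValueError("non-octet prefixes need the draft's extended encoding")
    octets = str(net.network_address).split(".")[: net.prefixlen // 8]
    return ".".join(reversed(octets)) + ".in-addr.arpa."

# e.g. prefix_to_name("10.45.0.0/16") -> "45.10.in-addr.arpa."
```

Because the mapping is deterministic, provisioning scripts on the publishing side and query tools on the consuming side can both derive the name independently.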
(conflicting data for same prefix can appear in multiple zones, relying party has to sort this out, yum).
Again, this is simply a naming convention. There is a unique name for each prefix. To DNS, this is a name like any other name. A DNS name belongs to exactly one zone; it cannot appear in multiple zones. ROVER is not trying to do exactly what RPKI is doing. Much of this seems to be an attempt to build a form of enumeration into ROVER. See the Level3 NANOG talk from Monday (6/4/12) for a concrete example of a different model. There are many different operational models. We seek a common convention for data publishing, but believe strongly there can and should be different operational models for how you do validation in your network. Thanks, Dan and Joe
One correction below. On Jun 5, 2012, at 12:42 PM, Daniel Massey wrote: [--snip--]
I think the first step is to step back and ask whether every operational model needs enumeration. For example, the talk yesterday by Level3 used the DNS and IRR did not need such an enumeration.
To clarify the above, the IRR _does_ provide an enumerated list of "Candidate" (IP prefix + Origin_AS) pairs. The second step is to walk through those "Candidate" pairs and ask DNSSEC, in a question/answer process, to validate whether the "Candidate" IRR (IP prefix, Origin_AS) pairs are authentic. So, considering each step independently: the former (IRR data) is enumeration, the second is not. However, in the context of this specific operational model, the end result is an enumerated list of validated (IP Prefix, Origin_AS) pairs. -shane
On Tue, Jun 5, 2012 at 2:42 PM, Daniel Massey <massey@cs.colostate.edu> wrote:
did not need such an enumeration. Enumeration is not a goal in itself. There are number of operational models that provide the needed routing protection without enumeration.
which are? I can see a use-case for something like: "Build me a prefix list from the RIR data" which is essentially:
1) pull IRR data for customer-X
2) validate all entries with 'resource certification' data
3) deploy new filter to edge-link-to-customer-X (only if changes occur)
(shane seems to point at this as the method in question...) I think this means that the customer here has to keep both their DNS data and their IRR data updated, and in the case (today) of 'ROVER' getting no answer, the customer skates... (no validation is possible). I'm not sure you can extend usage of 'ROVER' to things which are not 'offline processed' though, and it's not clear to me that the fail-open answer is good for us, absent some signal that 'customer-x will not be playing today'.
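The three-step pipeline sketched above, with the IRR fetch and the per-entry DNS validation stubbed out (all function names here are illustrative, not any real tool's API), including the fail-open behavior for entries with no published data:

```python
# Sketch of the filter-build use case: enumerate candidates from the
# IRR, validate each with a point query, and emit a new filter only if
# it differs from the one already deployed. All callables are stubs.

def build_filter(fetch_irr_entries, validate_entry, current_filter):
    """Return the new prefix filter, or None if unchanged.

    validate_entry returns True (authentic), False (bogus), or None
    (no data published -- the fail-open case, where the entry is
    accepted as-is, matching today's IRR-only behavior)."""
    candidates = fetch_irr_entries()                       # 1) pull IRR data
    validated = sorted(e for e in candidates
                       if validate_entry(e) is not False)  # 2) drop only proven-bad
    if validated == sorted(current_filter):                # 3) deploy on change only
        return None
    return validated
```

Note that the `is not False` test is exactly the fail-open concern: an entry with no answer at all sails through.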
- Circular dependencies are a problem. Helical dependencies can be made to work, but this says that one probably should not be depending on routing to make a point query to make decisions about routing. If you look at the architecture of the existing RPKI validators (well, mine and BBN's, anyway, not sure about RIPE's but suspect they took the same approach), we've gone to some trouble to make sure that the validator will continue to work across network outages as long as the collected data haven't expired or been revoked. In theory one could do the same thing with bulk transfers of DNS (whether AXFR/IXFR or NSEC walking, if they worked) but it would not work well with point queries.
Or a simpler approach that does not require bulk zone transfers or zone walking is simply DNS caching, which already exists and is well understood.
caching implies that:
1) the cache is filled
2) the timeout on records is longer than the outage(s)
3) the timeout is still short-enough to meet user change requirements
- ROVER gives us no traction on path validation (BGPSEC), it's limited to origin validation. RPKI can certify both prefixes and ASNs, which gives it the basics needed to support path validation as well as origin validation. ASNs have no hierarchical structure, thus would be a very poor match for encoding as DNS names.
The focus is on origin and sub prefix hijacks. There are certainly discussions and
in somewhat real-time on the router (get update, lookup dns records, decide)? or via offline compute and peer filter-updates?
- Some of the DNS aspects of ROVER are a little strange. In particular, as currently specified ROVER requires the relying party to pay attention to DNS zone cuts, which is not normal in DNS (the basic DNS model since RFC 883 has been that zones are something for the zone administrator to worry about, resolvers mostly just see a tree of RRsets). ROVER requires the relying party to check for the same data in multiple zones and pay close attention to zone cuts. While it is certainly possible to do all this, it is not a matter of issuing a simple DNS query and you're done. DNS caching effects can also complicate matters here if the zone structure is changing: think about what happens if you have cached responses to some (but not all) of the queries you need to make to figure out whether to allow a more specific route punched out of a larger prefix block.
This is a misunderstanding of the ROVER approach. Multiple copies of the data do not exist in multiple zones. There is a one-to-one mapping
1.23.45.10.in-addr.arpa. <rover prefix entry-10.45/16>
that's 2 copies... what about:
1.23.45.10.in-addr.arpa. <rover-covering-route entry> <rover-customer-allocation-10.45.16/19> <rover-customer-of-customer-allocation-10.45.23/24>
that's 4 copies.
between a prefix and a DNS name. The resolver simply finds the data and has no need to understand where zone cuts occur.
don't I have to walk up the tree a few times in the above example though? "Is this the covering route? the customer route? the customer-of-customer-route? the-hijack? Wait, no RLOCK, so this was a giant waste of time..."
A resolver simply issues a query for the unique DNS name associated with a prefix. This could be done with anything from a complex tool set to a simply command line tool like dig.
'resolver' here is what? router? unix-y-box-thing doing filter-generation? near-line-query/response-box for router-real-time-lookup?
The convention is also useful for storing data at prefixes; geolocations is one example.
not to nit-pick, but near as I can tell no one uses the geoloc entries in dns... also they aren't very well kept up to date by those few who actually do put them into dns :(
(conflicting data for same prefix can appear in multiple zones, relying party has to sort this out, yum).
Again, this is simply a naming convention. There is a unique name for a prefix. To DNS, this is a name like any other name. A DNS name belongs to a zone. It cannot appear in multiple zones. The prefix has a unique name. The name cannot appear in multiple zones.
10.45.23.0/24
10.45.16.0/19
10.45.0.0/16
10.0.0.0/8
ROVER is not trying to do exactly what RPKI is doing. Much of this seems to be an attempt to build a form of enumeration into ROVER. See the Level3 NANOG talk from Monday (6/4/12) for a concrete example of a different model. There are many different
you referenced this a few times: <http://www.nanog.org/meetings/nanog55/agenda.php> doesn't mention a talk from L3 on 6/4 ... got link? -chris
There are number of operational models that provide the needed routing protection without enumeration. I can see a use-case for something like: "Build me a prefix list from the RIR data"
this requires a full data fetch, not doable in dns. and, at the other end of the spectrum, for any dynamic lookup on receiving a bgp announcement, the data had best be already in the router. a full data set on an in-rack cache will go nuts on any significant bgp load. beyond that, you are in non-op space. randy
On Tue, Jun 5, 2012 at 3:40 PM, Randy Bush <randy@psg.com> wrote:
There are number of operational models that provide the needed routing protection without enumeration. I can see a use-case for something like: "Build me a prefix list from the RIR data"
this requires a full data fetch, not doable in dns.
does it? shane implied (and it doesn't seem UNREASONABLE, modulo some 'doing lots of spare queries') to query for each filter entry at filter creation time, no? get-as-GOOGLE = 216.239.32.0/19 lookup-in-dns = <rover-query-for-/19> + <rover-query-for-/20> + <rover-query-for-/21>..... that could be optimized I bet, but it SEEMS doable, cumbersome, but doable. the 'fail open' answer also seems a bit rough in this case (but no worse than 'download irr, upload to router, win!' which is today's model). -chris
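Just to size Chris's "query at every length" idea: for a single /19 entry checked down to /24, the subnet count is small but nontrivial. The ROVER name encoding and the actual DNS queries are omitted; this only counts the work involved (numbers assume IPv4 and a /24 floor, both assumptions for illustration):

```python
import ipaddress

def query_targets(prefix, max_len=24):
    """Enumerate every subnet, at every length from the entry's own
    length down to max_len, that a per-filter-entry lookup would have
    to ask about. Issuing the DNS queries themselves is left out."""
    net = ipaddress.ip_network(prefix)
    targets = []
    for plen in range(net.prefixlen, max_len + 1):
        targets.extend(net.subnets(new_prefix=plen))
    return targets

# For 216.239.32.0/19: 1 + 2 + 4 + 8 + 16 + 32 = 63 subnets to consider.
```

So "cumbersome, but doable" is about right at filter-creation time; the count doubles with each extra bit of depth, which is where the optimization would have to happen.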
routing protection without enumeration. I can see a use-case for something like: "Build me a prefix list from the RIR data" this requires a full data fetch, not doable in dns. does it? shane implied (and it doesn't seem UNREASONABLE, modulo some 'doing lots of spare queries') to query for each filter entry at filter creation time, no?
what is the query set, every prefix /7-/24 for the whole fracking ABC space?
that could be optimized I bet, but it SEEMS doable, cumbersome, but doable. the 'fail open' answer also seems a bit rough in this case (but no worse than 'download irr, upload to router, win!' which is today's model).
irr, i do have the 'full' set. but you said RIR (the in-addr roots), not IRR. was it a mis-type? and i am not gonna put my origin data in the irr and the dns. randy
On Tue, Jun 5, 2012 at 5:00 PM, Randy Bush <randy@psg.com> wrote:
routing protection without enumeration. I can see a use-case for something like: "Build me a prefix list from the RIR data" this requires a full data fetch, not doable in dns. does it? shane implied (and it doesn't seem UNREASONABLE, modulo some 'doing lots of spare queries') to query for each filter entry at filter creation time, no?
what is the query set, every prefix /7-/24 for the whole fracking ABC space?
that could be optimized I bet, but it SEEMS doable, cumbersome, but doable. the 'fail open' answer also seems a bit rough in this case (but no worse than 'download irr, upload to router, win!' which is today's model).
irr, i do have the 'full' set. but you said RIR (the in-addr roots), not IRR. was it a mis-type?
oh hell :( yes, I meant IRR.
and i am not gonna put my origin data in the irr and the dns.
yea... so today people already fill in:
RIR (swip/rwhois)
IRR (routing filter updates)
DNS (make sure your mailserver has PTRs!)
putting origin-validation data into IRRs happens today, it's not 'secured' in any fashion, and lots of proof has shown that 'people fill it with junk' :( So being able to bounce the IRR data off some verifiable source of truth seems like a plus. How verifiable is the rdns-rover tree though? how do I get my start in that prefix hierarchy anyway? by talking to IANA? to my local RIR? to 'jimbo the dns guy down the street?' (I realize that referencing the draft would probably get me this answer but it's too hard to look that up in webcrawler right now...) -Chris
putting origin-validation data into IRR's happens today, it's not 'secured' in any fashion, and lots of proof has shown that 'people fill it with junk' :( So being able to bounce the IRR data off some verifiable source of truth seems like a plus.
so i should use the sow's ear as the authoritative definition of the full set? randy
On 6/5/12 3:40 PM, Randy Bush wrote:
There are number of operational models that provide the needed routing protection without enumeration. I can see a use-case for something like: "Build me a prefix list from the RIR data" this requires a full data fetch, not doable in dns.
and, at the other end of the spectrum, for any dynamic lookup on receiving a bgp announcement, the data had best be already in the router. a full data set on an in-rack cache will go nuts on any significant bgp load. beyond that, you are in non-op space.
randy
I think we debate the superficial here, and without sufficient imagination. The enumeration vs. query issue is a NOOP as far as I am concerned.

With a little imagination, one could envision building a box that takes a feed of prefixes observed, builds an aged cache of prefixes of interest, queries for their SRO records, re-queries for those records before their TTLs expire, and maintains a white list of "SRO valid" prefix/origin pairs that it downloads to the router. Let's call that box an SRO validating cache.

Where do you get the feed of prefixes of interest? From your own RIBs if you are only interested in white lists proportional to the routes you actually see, e.g., feed the box iBGP. From other sources (monitors, etc.) if you would like a white list of every known prefix that anyone has seen. What about a completely new prefix being turned up? ... we could talk through those scenarios in each approach.

How does the cache download the white list to the router? ... we already have one approach for that. Add a bit to the protocol to distinguish SRO semantics from ROA semantics if necessary.

Point being, with a little imagination I think one could build components with either approach with similar black box behavior. If there are real differences in these approaches, it will be in their inherent trust models, the processes that maintain those trust models, the systems-level behavior of the info creation and distribution systems, and the expressiveness of their validation frameworks. dougm
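A rough sketch of such an SRO validating cache, with the DNS side stubbed out. The `query_sro` callable, the record semantics, and the refresh-at-90%-of-TTL margin are all assumptions for illustration, not anything the ROVER drafts specify:

```python
import heapq
import time

# Minimal sketch of an "SRO validating cache": prefixes of interest
# arrive on a feed (e.g. iBGP), each is validated by an SRO query,
# and entries are re-queried shortly before their DNS TTL expires.
# query_sro(prefix) stands in for the DNS lookup and returns
# (valid_origins, ttl), with valid_origins None/empty if no data.

class SROCache:
    def __init__(self, query_sro, now=time.time):
        self.query_sro = query_sro
        self.now = now
        self.whitelist = {}   # prefix -> set of valid origin ASNs
        self._expiry = []     # (refresh_deadline, prefix) min-heap

    def observe(self, prefix):
        """Handle a prefix seen on the feed: query and record it."""
        origins, ttl = self.query_sro(prefix)
        if origins:
            self.whitelist[prefix] = set(origins)
        else:
            self.whitelist.pop(prefix, None)
        # re-query a bit before the TTL runs out
        heapq.heappush(self._expiry, (self.now() + 0.9 * ttl, prefix))

    def refresh_due(self):
        """Re-query every entry whose refresh deadline has passed."""
        while self._expiry and self._expiry[0][0] <= self.now():
            _, prefix = heapq.heappop(self._expiry)
            self.observe(prefix)   # re-query and re-arm the timer
```

The whitelist would then be pushed to the router over the existing rtr-to-cache machinery, per the paragraph above.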
Doug Montgomery <dougm.tlist@gmail.com> writes:
...
I think we debate the superficial here, and without sufficient imagination. The enumerations vs query issue is a NOOP as far as I am concerned. With a little imagination, one could envision building a box that takes a feed of prefixes observed, builds an aged cache of prefixes of interest, queries for their SRO records, re queries for those records before their TTLs expire, and maintains a white list of "SRO valid" prefix/origin pairs that it downloads to the router.
this sounds like a steady state system. how would you initially populate it, given for example a newly installed core router having no routing table yet? if the answer is, rsync from somewhere, then i propose, rsync from RPKI. if the answer is, turn off security during bootup, then i claim, bad idea.
...
Point being, with a little imagination I think one could build components with either approach with similar black box behavior.
i don't think so. and i'm still waiting for a network operator to say what they think the merits of ROVER might be in comparison to the RPKI approach. (noting, arguments from non-operators should and do carry less weight.) -- Paul Vixie KI6YSY
On 6/10/12 5:53 PM, "Paul Vixie" <vixie@isc.org> wrote:
Doug Montgomery <dougm.tlist@gmail.com> writes:
...
I think we debate the superficial here, and without sufficient imagination. The enumerations vs query issue is a NOOP as far as I am concerned. With a little imagination, one could envision building a box that takes a feed of prefixes observed, builds an aged cache of prefixes of interest, queries for their SRO records, re queries for those records before their TTLs expire, and maintains a white list of "SRO valid" prefix/origin pairs that it downloads to the router.
this sounds like a steady state system. how would you initially populate it, given for example a newly installed core router having no routing table yet?
if the answer is, rsync from somewhere, then i propose, rsync from RPKI.
if the answer is, turn off security during bootup, then i claim, bad idea.
Well, I should probably let the ROVER guys say what they have in mind. The above started from my imagination that if you did not want routers actually doing route-by-route queries, it would be easy to build a validating cache that behaves similarly to an RPKI validating cache, but pulls the info from rDNS as opposed to RPKI. Maybe the ROVER guys have something else in mind (e.g., routers doing the queries themselves, or some other model of how the info ... or its impacts ... is effected on the router).

IFF you do imagine that there is an SRO validating cache box, you can decompose the question of how one solves state skew between (1) rtr and cache, (2) cache and the authoritative information source, and (3) how new authoritative information gets globally distributed/effected in the system.

Looking at just (1) (your question, I think), we have a couple of different questions to look at.

a. How does a router with no origin info (new router, router reboot) synchronize with the cache (assuming the cache has state)? The current rtr-to-cache machinery would work fine here. Might need to add a bit or two, but the basic problem is the same.

b. How does a cache with no state build a list of prefix-origin pairs? Clearly if one builds an SRO validating cache box, the usual techniques of checkpointing state, having redundant caches, etc. could be used ... but at some level the question of having to get initial state, and what the router does during that period (assuming that the stateless cache is his only source), must be answered.

One way of thinking about these questions is to ask how it would work in RPKI. If for origin validation we have a strict "don't fail open" during resets requirement, then there are a lot of initialization questions we must address in any system. I.e., what does the router do if its only RPKI cache has to rebuild state from zero? What does such a router do if it loses contact with its cache?
At this point, I could propose more ideas, but probably going further with my imagination is not important. The ROVER guys should tell us what they have in mind, or someone interested in building a ROVER validating cache should design one and tell us.

But maybe stepping back one level of abstraction, you can think of things this way. We have a top-down enumeration vs. query model. One could put a cache in the query model to make it approximate an enumeration model, but only to the point that one has, or can build, a reasonably complete list of prefixes of interest. If one admits that sometimes there will be cache misses (in the query/cache model) and one might have to query in those cases, then the trade-off seems to be how often that occurs vs. the responsiveness one would get out of such a system when the authoritative information itself changes (case 3 above). I.e., how fast could you turn up a new prefix in each system?

Maybe the ROVER guys don't believe in caches at all. In which case I return you to the original "OMG! Enumeration vs Query" thread. I just don't think that is the most significant difference between the two approaches. dougm
Shane A. gave a lightning talk; the slides will be posted soon. They came in at the last minute, which is why they're not up already. Tony

On Tue, Jun 5, 2012 at 3:28 PM, Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Tue, Jun 5, 2012 at 2:42 PM, Daniel Massey <massey@cs.colostate.edu> wrote:
ROVER is not trying to do exactly what RPKI is doing. Much of this seems to be an attempt to build a form of enumeration into ROVER. See the Level3 NANOG talk from Monday (6/4/12) for a concrete example of a different model. There are many different
you referenced this a few times: <http://www.nanog.org/meetings/nanog55/agenda.php>
doesn't mention a talk from L3 on 6/4 ... got link?
-chris
On Tue, Jun 5, 2012 at 5:39 PM, Tony Tauber <ttauber@1-4-5.net> wrote:
Shane A. gave a Lightning Talk the slides for which will be posted at some time soon.
I figured the talk was shane's.
They came in at the last minute which is why they're not up already.
ok, cool. thanks -chris
participants (7)
- Christopher Morrow
- Daniel Massey
- Doug Montgomery
- Paul Vixie
- Randy Bush
- Shane Amante
- Tony Tauber