looking for hostname router identifier validation
Hi NANOG,

To support Internet topology analysis efforts, I have been working on an algorithm to automatically detect router names inside hostnames (PTR records) for router interfaces, and build regular expressions (regexes) to extract them. By "router name" inside the hostname, I mean a substring, or set of non-contiguous substrings, that is common among interfaces on a router. For example, suppose we had the following three routers in the savvis.net domain suffix, each with two interfaces:

das1-v3005.nj2.savvis.net das1-v3006.nj2.savvis.net
das1-v3005.oc2.savvis.net das1-v3007.oc2.savvis.net
das2-v3009.nj2.savvis.net das2-v3012.nj2.savvis.net

We might infer that the router names are das1|nj2, das1|oc2, and das2|nj2, respectively, and that they are captured by the regex:

^([a-z]+\d+)-[^\.]+\.([a-z]+\d+)\.savvis\.net$

After much refinement based on smaller sets of ground truth, I'm asking for broader feedback from operators. I've placed a webpage at https://www.caida.org/~mjl/rnc/ that shows the inferences my algorithm made for 2523 domains. If you operate one of the domains in that list, I would appreciate it if you could comment (private is probably better but public is fine with me) on whether the regex my algorithm inferred represents your naming intent. In the first instance, I am most interested in feedback for the suffix / date combinations for suffixes that are colored green, i.e. appear to be reasonable.

Each suffix / date combination links to a page that contains the naming convention and corresponding inferences. The colored part of each hostname is the inferred router name. The green hostnames appear to be correct, at least as far as the algorithm determined. Some suffixes have errors due to either stale hostnames or incorrect training data, and those hostnames are colored red or orange.

If anyone is interested in the sets of hostnames the algorithm may have inferred as 'stale' for their network, I can provide that information. For some operators the stale records were an oversight, and they were grateful to learn about them.

Thanks,
Matthew
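The savvis.net example above can be tried directly. What follows is a minimal sketch, assuming Python's standard `re` module; the helper names `router_name` and `group_by_router` are made up for illustration and are not part of the algorithm described in the message.

```python
import re
from collections import defaultdict

# The regex quoted in the message, written as a Python raw string.
ROUTER_NAME_RE = re.compile(r"^([a-z]+\d+)-[^\.]+\.([a-z]+\d+)\.savvis\.net$")

# The six example hostnames from the message.
HOSTNAMES = [
    "das1-v3005.nj2.savvis.net",
    "das1-v3006.nj2.savvis.net",
    "das1-v3005.oc2.savvis.net",
    "das1-v3007.oc2.savvis.net",
    "das2-v3009.nj2.savvis.net",
    "das2-v3012.nj2.savvis.net",
]


def router_name(hostname):
    """Join the captured substrings with '|'; return None if the regex does not match."""
    match = ROUTER_NAME_RE.match(hostname)
    return "|".join(match.groups()) if match else None


def group_by_router(hostnames):
    """Map each extracted router name to the hostnames that produced it."""
    routers = defaultdict(list)
    for hostname in hostnames:
        name = router_name(hostname)
        if name is not None:
            routers[name].append(hostname)
    return dict(routers)


if __name__ == "__main__":
    for name, interfaces in group_by_router(HOSTNAMES).items():
        print(name, interfaces)
```

The two capture groups are the non-contiguous substrings the message refers to: the router label before the hyphen and the site label after the first dot. Hostnames that do not fit the convention simply fail to match and are left out of the grouping.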
I would caution against putting much faith in the validity of geolocation or site ID by reverse DNS PTR records. There are a vast number of unmaintained, ancient, stale, erroneous or wildly wrong PTR records out there. I can name at least a half dozen ISPs that have absorbed other ASes, some of those which also acquired other ASes earlier in their history, forming a turducken of obsolete PTR records that has things with ISP domain names last in use in the year 2002.

On Mon, Apr 29, 2019 at 6:15 AM Matthew Luckie <mjl@luckie.org.nz> wrote:
> Hi NANOG,
>
> To support Internet topology analysis efforts, I have been working on an algorithm to automatically detect router names inside hostnames (PTR records) for router interfaces, and build regular expressions (regexes) to extract them. [...]
On 4/29/19 3:13 PM, Eric Kuhnke wrote:
> I would caution against putting much faith in the validity of geolocation or site ID by reverse DNS PTR records. There are a vast number of unmaintained, ancient, stale, erroneous or wildly wrong PTR records out there. I can name at least a half dozen ISPs that have absorbed other ASes, some of those which also acquired other ASes earlier in their history, forming a turducken of obsolete PTR records that has things with ISP domain names last in use in the year 2002.
I still see references to UUNet in some reverse PTRs. So, uh, yeah.
ekuhnke> I would caution against putting much faith in the validity of
ekuhnke> geolocation or site ID by reverse DNS PTR records. There are a
ekuhnke> vast number of unmaintained, ancient, stale, erroneous or
ekuhnke> wildly wrong PTR records out there. I can name at least a half
ekuhnke> dozen ISPs that have absorbed other ASes, some of those which
ekuhnke> also acquired other ASes earlier in their history, forming a
ekuhnke> turducken of obsolete PTR records that has things with ISP
ekuhnke> domain names last in use in the year 2002.

That's because the version of perl required to run the perl script that creates the ascii text PTR zone file is 4.x. perhaps? :)

bryan> I still see references to UUNet in some reverse PTRs.
bryan> So, uh, yeah.

The uu.net PTRs should mostly have been service machines, like ns.uu.net, auth00.ns.uu.net (which horrifyingly do still resolve). Routers should have been in alter.net, which I do still see in traceroutes.
On Mon, 29 Apr 2019 16:16:06 -0500, Bryan Holloway said:
> I still see references to UUNet in some reverse PTRs.
> So, uh, yeah.
I wonder what year we'll get to a point where less than half of NANOG's membership was around when UUNet was. We're probably there already. And likely coming up on when less than half the people know what it was, other than myth and legend....
Once upon a time, Valdis Klētnieks <valdis.kletnieks@vt.edu> said:
> I wonder what year we'll get to a point where less than half of NANOG's membership was around when UUNet was. We're probably there already. And likely coming up on when less than half the people know what it was, other than myth and legend....

I still refer to ASes by companies that haven't existed in ages... 701 is UUNet, 3561 is MCI, 1 is BBN, etc. :) I don't handle name changes well (I also refer to one of the main roads where I live by a name it hasn't had in close to 20 years).

--
Chris Adams <cma@cmadams.net>
And 666 is Nero Caesar :-)

On 19-04-29 17 h 38, Chris Adams wrote:
> I still refer to ASes by companies that haven't existed in ages... 701 is UUNet, 3561 is MCI, 1 is BBN, etc. :) I don't handle name changes well (I also refer to one of the main roads where I live by a name it hasn't had in close to 20 years).
On 30/4/19 10:38 am, Chris Adams wrote:
> I still refer to ASes by companies that haven't existed in ages... 701 is UUNet, 3561 is MCI, 1 is BBN, etc. :) I don't handle name changes well (I also refer to one of the main roads where I live by a name it hasn't had in close to 20 years).
This is especially true with acquisitions: AS3549 will be GBLX for me until it finally goes offline, and AS3356 likewise L3.
On 4/29/19 7:21 PM, Valdis Klētnieks wrote:
> I wonder what year we'll get to a point where less than half of NANOG's membership was around when UUNet was. We're probably there already. And likely coming up on when less than half the people know what it was, other than myth and legend....
Bought my first T-1 from those guys ... don't even ask how much it cost.
How much did it cost? :-)

On 19-04-30 08 h 38, Bryan Holloway wrote:
> Bought my first T-1 from those guys ... don't even ask how much it cost.
lhc> How much did it cost? :-)

valdis> I'm willing to guess US$6digits/mo. 5 digits if you qualified for
valdis> the quantity discount. :)

We used to charge $2500 install and $2500/month for a T1 with agreement to not share or resell. It was something like double that if you wanted to resell? We sold a lot more 56k circuits.

I'm going by memory here, which isn't as reliable as it once was. :)
Hi,

I am aware that some PTR records are wrong. Can you please name the half dozen ISPs / suffixes so I can take a look at those in the data? In theory the code should score suffixes which have out-of-date records poorly. For suffixes that don't score poorly but have errors, there are other techniques that could reject incorrect clustering of router interfaces.

Regarding uu.net (Bryan's email), it looks like those are colored red on the website after 201207, i.e. I would not use them for anything. But the transition to alter.net (Paul's email) looks good to me:

https://www.caida.org/~mjl/rnc/201901/alter.net.html

and I would claim the regex for alter.net is very good. If someone from alter.net is watching, can you comment on the gw1.iad8 inferences, where six interfaces are colored red as if they are named wrong (back in Jan 2019)? My hunch is that the training data is wrong, and those interfaces belong on the same router. I can see similar behavior for gw4.lax15.

Matthew

On Mon, Apr 29, 2019 at 01:13:38PM -0700, Eric Kuhnke wrote:
> I would caution against putting much faith in the validity of geolocation or site ID by reverse DNS PTR records. There are a vast number of unmaintained, ancient, stale, erroneous or wildly wrong PTR records out there. I can name at least a half dozen ISPs that have absorbed other ASes, some of those which also acquired other ASes earlier in their history, forming a turducken of obsolete PTR records that has things with ISP domain names last in use in the year 2002.
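The remark that suffixes with out-of-date records should score poorly can be illustrated with a toy scoring rule. This is only a sketch and not the algorithm behind the web pages: the function name `score_regex`, the shape of the training data (hostname mapped to a router ID), and the rule itself (the fraction of matched hostnames whose extracted name maps to exactly one training router) are all assumptions made for the example.

```python
import re
from collections import defaultdict


def score_regex(pattern, training):
    """Score a candidate regex against training data (hypothetical rule).

    training: dict mapping hostname -> router ID from assumed ground truth.
    Returns the fraction of matched hostnames whose extracted name is
    associated with a single router ID in the training data.
    """
    regex = re.compile(pattern)
    routers_per_name = defaultdict(set)  # extracted name -> router IDs seen
    name_of = {}                         # hostname -> extracted name
    for hostname, router_id in training.items():
        match = regex.match(hostname)
        if match:
            name = "|".join(match.groups())
            name_of[hostname] = name
            routers_per_name[name].add(router_id)
    if not name_of:
        return 0.0
    consistent = sum(
        1 for name in name_of.values() if len(routers_per_name[name]) == 1
    )
    return consistent / len(name_of)


PATTERN = r"^([a-z]+\d+)-[^\.]+\.([a-z]+\d+)\.savvis\.net$"

# Made-up training data: router IDs "A", "B", "C" are placeholders.
CLEAN = {
    "das1-v3005.nj2.savvis.net": "A",
    "das1-v3006.nj2.savvis.net": "A",
    "das1-v3005.oc2.savvis.net": "B",
    "das1-v3007.oc2.savvis.net": "B",
    "das2-v3009.nj2.savvis.net": "C",
    "das2-v3012.nj2.savvis.net": "C",
}

# Same data plus one hypothetical stale hostname that now sits on another router.
WITH_STALE = dict(CLEAN, **{"das1-v3099.nj2.savvis.net": "Z"})

if __name__ == "__main__":
    print(score_regex(PATTERN, CLEAN))
    print(score_regex(PATTERN, WITH_STALE))
```

Under this rule a single stale hostname taints every interface that shares its extracted name, so the score drops for the whole cluster rather than for one record. Whether the cause is a stale PTR or wrong training data cannot be told apart from the score alone, which matches the hunch about gw1.iad8 above.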
I legit guffawed.

On 19-04-29 13 h 13, Eric Kuhnke wrote:
> I would caution against putting much faith in the validity of geolocation or site ID by reverse DNS PTR records. There are a vast number of unmaintained, ancient, stale, erroneous or wildly wrong PTR records out there. I can name at least a half dozen ISPs that have absorbed other ASes, some of those which also acquired other ASes earlier in their history, forming a turducken of obsolete PTR records that has things with ISP domain names last in use in the year 2002.
While at NTT and at Akamai we have managed to publish sane PTR records and make the forward work as well. You need to automate it by pulling from your router configuration database and publish to your DNS database. If you are still doing either by hand then it’s time to make the switch ASAP.

Sent from my iCar

On Apr 29, 2019, at 4:13 PM, Eric Kuhnke <eric.kuhnke@gmail.com> wrote:
> I would caution against putting much faith in the validity of geolocation or site ID by reverse DNS PTR records. There are a vast number of unmaintained, ancient, stale, erroneous or wildly wrong PTR records out there. I can name at least a half dozen ISPs that have absorbed other ASes, some of those which also acquired other ASes earlier in their history, forming a turducken of obsolete PTR records that has things with ISP domain names last in use in the year 2002.
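The suggestion of pulling from a router configuration database and publishing to DNS can be sketched roughly as below. This is a minimal illustration under stated assumptions, not a description of how NTT or Akamai do it: the inventory rows, the `example.net` suffix, the `interface.router.domain` naming scheme, and the `records` helper are all hypothetical. The only library feature relied on is the standard `ipaddress` module and its `reverse_pointer` attribute.

```python
import ipaddress

# Hypothetical inventory rows: (router, interface, address). In practice these
# would be read from the router configuration database.
INVENTORY = [
    ("r1.nyc1", "xe-0-0-1", "192.0.2.1"),
    ("r1.nyc1", "xe-0-0-2", "2001:db8::1"),
]

DOMAIN = "example.net"  # assumed domain suffix


def records(inventory, domain):
    """Yield (owner, rrtype, rdata) tuples, one forward and one PTR per address.

    Both records are derived from the same inventory row, so the forward and
    reverse data cannot drift apart.
    """
    for router, ifname, addr in inventory:
        ip = ipaddress.ip_address(addr)
        fqdn = f"{ifname}.{router}.{domain}."
        rrtype = "A" if ip.version == 4 else "AAAA"
        yield (fqdn, rrtype, str(ip))
        yield (ip.reverse_pointer + ".", "PTR", fqdn)


if __name__ == "__main__":
    for owner, rrtype, rdata in records(INVENTORY, DOMAIN):
        print(f"{owner} IN {rrtype} {rdata}")
```

Regenerating the full record set on every run, instead of editing zone files by hand, is what removes stale entries: a record disappears from DNS as soon as the interface disappears from the inventory.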
On 4/30/19 7:12 AM, Jared Mauch wrote:
> While at NTT and at Akamai we have managed to publish sane PTR records and make the forward work as well. You need to automate it by pulling from your router configuration database and publish to your DNS database. If you are still doing either by hand then it’s time to make the switch ASAP.
>
> Sent from my iCar
What's the reverse of your iCar? ;)
Automation isn’t even that hard - just outsource (e.g. 6Connect).

I get why some things stagnate & collect cruft. But it is actually EASIER, and probably cheaper (including people time), to have a 3rd party “just do it” when it comes to things like DNS & IPAM.

Then again, if everyone ran everything perfectly … oh, then I could retire. :-)

--
TTFN,
patrick

On Apr 30, 2019, at 8:12 AM, Jared Mauch <jared@puck.nether.net> wrote:
> While at NTT and at Akamai we have managed to publish sane PTR records and make the forward work as well. You need to automate it by pulling from your router configuration database and publish to your DNS database. If you are still doing either by hand then it’s time to make the switch ASAP.
>
> Sent from my iCar
participants (10)
- Bryan Holloway
- Chris Adams
- Eric Kuhnke
- Jared Mauch
- Julien Goodwin
- Large Hadron Collider
- Matthew Luckie
- Patrick W. Gilmore
- Paul Ebersman
- Valdis Klētnieks