looking for operator validation for regexes that extract ASNs
Hi NANOG,

To support Internet topology analysis efforts, we have been working on an algorithm to detect AS numbers inside hostnames (PTR records) for router interfaces, and automatically build regular expressions (regexes) to extract them. Specifically, we are looking at operators who embed the ASN of their neighbor in the hostname when they provide the IP address to the neighbor for interconnection.

For example, suppose we had the following three hostnames in the gtt.net domain suffix, which we believed to be assigned to routers operated by ASes 1215, 1273, and 10835:

  as1215.xe-7-0-6.ar2.sjc1.us.as4436.gtt.net
  as1273.hkg11.ip4.gtt.net
  as10835.cr3-sea2.ip4.gtt.net

We might infer the regex

  ^as(\d+)\..+\.gtt\.net$

extracts these ASNs and reflects GTT practice to name these IP addresses they assigned to neighbors with the ASN that operates the router.

We're at the stage where we are asking for broader feedback from operators. The webpage at https://www.caida.org/~mjl/rnc/asn/ shows the inferences our algorithm made for 219 domains. If you operate one of the domains in that list, we would appreciate it if you could comment (private is probably better but public is fine with me) on whether the regex our algorithm inferred represents your naming intent.

In the first instance, we are most interested in feedback for the suffix / date combinations for suffixes that are colored green or orange, i.e. appear to be reasonable. Each suffix / date combination links to a page that contains the naming convention and corresponding inferences. The colored part of each hostname is the ASN extracted by the regex. The green hostnames appear to be correct, at least as far as the algorithm determined. Some suffixes have errors due to either stale hostnames or incorrect training data, and those hostnames are colored red. We'd appreciate particular feedback for the red hostnames -- was the hostname stale, or the training data incorrect?

Thanks,

Matthew
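For readers who want to try the example regex from the message above, here is a minimal Python sketch. It only applies the quoted regex to the three quoted gtt.net hostnames and compares the result with the ASNs the message says the routers were believed to belong to. The names `GTT_ASN_RE`, `extract_asn`, and `believed` are made up for illustration; this is not the inference algorithm itself, and it does not reproduce how the web page colors hostnames.

```python
import re

# The regex quoted in the message. re.ASCII restricts \d to 0-9,
# which is all that can appear in a hostname.
GTT_ASN_RE = re.compile(r"^as(\d+)\..+\.gtt\.net$", re.ASCII)


def extract_asn(hostname):
    """Return the ASN embedded in a hostname, or None if the regex does not match."""
    m = GTT_ASN_RE.match(hostname)
    return int(m.group(1)) if m else None


# Hostnames from the message, paired with the ASN each router was
# believed to be operated by (the "training data" in this toy example).
believed = {
    "as1215.xe-7-0-6.ar2.sjc1.us.as4436.gtt.net": 1215,
    "as1273.hkg11.ip4.gtt.net": 1273,
    "as10835.cr3-sea2.ip4.gtt.net": 10835,
}

for hostname, asn in believed.items():
    extracted = extract_asn(hostname)
    verdict = "agrees" if extracted == asn else "disagrees"
    print(f"{hostname}: extracted {extracted}, believed {asn} ({verdict})")
```

Because the pattern is anchored at the start with `^as`, the `as4436` label in the middle of the first hostname is not captured; only the leading `as1215` is.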
❦ 11 May 2020 20:03 +12, Matthew Luckie:
To support Internet topology analysis efforts, we have been working on an algorithm to detect AS numbers inside hostnames (PTR records) for router interfaces, and automatically build regular expressions (regexes) to extract them.
Hello Matthew,

This work is quite interesting. I see you also have a page to build regexes from router names for each operator. Have you already worked on extracting city names/US states? This would be quite helpful as well.

-- 
Take care to branch the right way on equality.
    - The Elements of Programming Style (Kernighan & Plauger)
Hi Vincent,

On Mon, May 11, 2020 at 10:36:03AM +0200, Vincent Bernat wrote:
This work is quite interesting. I see you have also a page to build regex from router names for each operator. Did you already work on extracting city names/US states? This would be quite helpful as well.
I haven't myself, but others have:

  http://ddec.caida.org/
  https://www.caida.org/publications/papers/2014/drop/
  https://www.cs.umd.edu/~nspring/ (undns software, the rubygem link works)

Matthew
participants (2)
- Matthew Luckie
- Vincent Bernat