Re: looking for hostname geographic hint validation
On Wed, Aug 28, 2013 at 04:07:05PM +0100, Ben wrote:
Dear Bradley,
So basically you're asking others to do your homework for you ? ;-)
Actually no, I'm asking people to do something which I cannot. While it is true that I could test against a manual inference, I would simply be checking one inference against another. Agreement would only prove that the algorithm does what I expect. Only the operators, who actually know what their hostnames mean, can give me the ground truth I need to test my inferences against reality.
For example, picking one example from your list ....
<iata>([^a-z]+[a-z]+\d*){3}.ic.ac.uk
Far from being IATA codes, the intermediate subdomains actually refer to departments (DepartmentOfComputing and CHemistry in the two I quoted).
Sorry to rain on your parade, but someone had to say it. ;-)
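To make the objection concrete, here is a minimal sketch of how the quoted rule could pick up a department abbreviation as if it were an airport code. It rests on assumptions that are not in the thread: `<iata>` is read as a named group of three lowercase letters, the literal dots are escaped, and the hostname is invented for illustration. The helper name `extract_hint` is hypothetical.

```python
import re

# Assumption: <iata> stands for a three-letter token at the start of the
# hostname; the dots in ".ic.ac.uk" are escaped so they match literally.
PATTERN = re.compile(r"^(?P<iata>[a-z]{3})([^a-z]+[a-z]+\d*){3}\.ic\.ac\.uk$")


def extract_hint(hostname):
    """Return the token the rule would treat as an IATA code, or None."""
    match = PATTERN.match(hostname.lower())
    return match.group("iata") if match else None


# Hypothetical hostname, not taken from the thread.
print(extract_hint("doc-gw1.net2.core3.ic.ac.uk"))
# A hostname with too few labels does not match the rule at all.
print(extract_hint("www.ic.ac.uk"))
```

The rule has no way to tell that the captured token names a department rather than a city, which is why only an operator can confirm or reject the inference.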
You are most likely right, but I am not looking for perfection. I am hoping for an inference that will get me within 10 km of the actual city most of the time. Given the validation I have so far, of the 19,611 hostnames for which a location is inferred and for which I have validation data, we infer the city correctly 93% of the time. While there is work left to do, it is far from the lost cause you present.
-- 
the value of a world model is not how accurately it captures reality but how often it leads us to take appropriate action
Bradley Huffaker