On Nov 30, 2012, at 11:09 AM, William Herrin <bill@herrin.us> wrote:
On Fri, Nov 30, 2012 at 9:45 AM, Ray Soucy <rps@maine.edu> wrote:
I'll see your disagree and raise you another ;-)
I would say you almost never want to store addresses as character data unless the only thing you're using them for is logging (even then it's questionable). I run into people who do this all the time and it's a nightmare.
It's easy to store a v6 address as a string, but when you want to select a range of IPv6 addresses from a database, not having them represented as integers means you can't do efficient numerical comparisons in your SQL statements, it also makes indexing your table slower; to put it simply, it doesn't scale well.
Hi Ray,
If you've stored them in the string format I suggested, the string comparison *is* an efficient numerical comparison. On a CISC processor it may even be implemented with a single instruction byte string comparison. Go test. You may be surprised at the results.
The one useful function you can't do directly from a string format is apply an AND mask (netmask). More often than not this is irrelevant: you don't want to load the data and then apply the mask, you want the mask to constrain the data which you load from the database. You'd need the database software to understand the address type and index it with a radix tree, something it can do with neither a string format nor your split 64-bit format.
Since non-contiguous masking is rare, this can, actually be pretty efficient for contiguous masking because you have a ¼ chance that the mask aligns with a character (the more I think about this, the more I think storing the address as a 32-character string without colons makes the most sense). If it's not aligned on a nibble boundary, then you can either do ranged comparisons as suggested below, or, you can do a two-step process like this: Let's say we want to look for addresses within 2001:db8::/29. This would mean we need to match all strings starting with 2001:0db8 through 2001:0dbf. We could easily grab everything that begins with '20010db%' and then select the masked values matching from the 8th column where (atoi(concat("0x",substr(addr,8,1))) & 0x8). Forgive me if I don't get the SQL syntax exactly right or have a wrong function name… I do more C than SQL. Both of these comparisons could be performed in a single select like: SELECT * FROM <table> WHERE ip6addr is like '20010db%' and \ (atoi(concat('0x', substr(ip6addr,8,1))) & 0x8) This should be relatively efficient because the more expensive second test will only be performed on records that first pass the relatively cheap match of the first 7 characters. Owen