On 2/15/12 8:32 AM, Mark Andrews wrote:
... Before deciding to go the IDNA route, treating DNS labels as UTF-8 was discussed, evaluated and rejected.
well, sort of. we started with "idn" as a wg label. the smtp weenies opined that they'd never have a flag day and anything other than a boot encoding in LDH would harm LDH limited mailers, so ... the code point problem (or problems) was moved out of "infrastructure" and into "applications", and the work product was labeled "idna". the successor wg had no alternative except to follow the "in a" set of dependencies and assumptions.

as you observed, labels are length tagged binary blobs, and where the blobs consist of 7 bit ascii values in the 'a'-'z' range, case folding is performed in lookup. what happens outside of that range is a path not taken, though i tried in rfc 2929 to leave that open for future work. the sentence which read "text labels can, in fact, include any octet value including zero octets but most current uses involve only [US-ASCII]." was one that, if memory serves, a co-author proposed be more restrictive.

i agree with the "rejected" statement, but the "evaluated" and even the "discussed" overstate the room available after the smtp weenies weighed in on what was permissible in headers.

-e
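
A minimal sketch of the point about labels being length tagged binary blobs with ascii-only case folding. The function names (`fold_ascii`, `labels_match`, `encode_label`) are hypothetical and not taken from any resolver; the sketch assumes non-root labels of 1 to 63 octets and only illustrates the comparison rule described above.

```python
def fold_ascii(label: bytes) -> bytes:
    """Lowercase only octets 0x41-0x5A ('A'-'Z'); every other octet is left as-is."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in label)


def labels_match(a: bytes, b: bytes) -> bool:
    """Compare two labels the way lookup does: ascii letters fold, nothing else does."""
    return fold_ascii(a) == fold_ascii(b)


def encode_label(label: bytes) -> bytes:
    """Wire form of a non-root label: one length octet, then the raw octets."""
    if not 1 <= len(label) <= 63:
        raise ValueError("label must be 1..63 octets")
    return bytes([len(label)]) + label


# ascii letters compare equal regardless of case
print(labels_match(b"Example", b"eXAMPLE"))

# utf-8 for U+00C9 vs U+00E9: no folding is applied outside 'a'-'z'
print(labels_match(b"\xc3\x89", b"\xc3\xa9"))

# any octet value, including zero, fits in the length tagged form
print(encode_label(b"a\x00b"))
```

The second comparison is the path not taken: octets outside the ascii letter range are matched exactly, so any folding of non-ascii text has to happen before the label reaches the dns.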