Stephane, can I ask you what your detailed objections are to the Moz/Opera mechanism, and could you let me know your proposal for an alternative mechanism for preventing IDN spoofing?
I would suggest that an alternative mechanism should include a set of code points to be used for the on-the-wire DNS protocol and the registry databases. This set of code points will greatly restrict the possibility of ambiguity. Right now it is utterly impossible to represent the ambiguity of IBM, ibm or IbM in the DNS because the set of code points only allows for one code to be shared by I and i. This principle could be extended to other scripts so that, for instance, codes for the 2nd and 4th letters of the Cyrillic alphabet could be added while not adding codes for the 1st and 3rd letters, because A and B are already there.
Two additional items are needed, both of them translation tables. The first is the PREFERRED mapping from the DNS code points to Unicode. I say "preferred" because while some people will be happy to see the "b" as in "ibm", others may prefer to see it as "B", especially Cyrillic users who use "B" for a completely different letter most of the time. Also, Arabic users may prefer to map the first and last letters of a domain to the initial and final forms of the letter and use medial forms for the rest, because it looks better most of the time. This does not create exploitable ambiguity.
The second item is a comprehensive mapping for all of Unicode that maps each code point into one of the DNS code points. This should be defined as an algorithm, because that allows for a combination of mapping tables and more efficient ways of defining and executing the mapping.
It may be painful to upgrade the DNS, but if we are going to do so, we need to try to make it a solution that will work for a long time, not just quick-fix patches. I have nothing against the Mozilla solution as a quick fix, but I hope that it is used to demonstrate the need for upgrading the DNS and fixing the problem at its root.
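To make the two tables concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `TO_DNS`, `PREFERRED`, `to_dns` and `display` are hypothetical, and the tables cover only a handful of characters rather than any real code-point set.

```python
# Illustrative stand-in tables only; not a proposed DNS code-point set.

# Comprehensive mapping: many Unicode characters -> one DNS code point.
TO_DNS = {
    "I": "i", "i": "i",
    "M": "m", "m": "m",
    "A": "a", "a": "a", "\u0410": "a",       # Latin A/a and Cyrillic A share a code
    "B": "b", "b": "b", "\u0412": "b",       # Latin B/b and Cyrillic Ve share a code
    "\u0411": "\u0431", "\u0431": "\u0431",  # Cyrillic Be (2nd letter) gets its own code
    "\u0413": "\u0433", "\u0433": "\u0433",  # Cyrillic Ghe (4th letter) gets its own code
}

# PREFERRED mapping: per-locale choice of how a DNS code point is shown.
PREFERRED = {
    "ru": {"a": "\u0410", "b": "\u0412"},    # hypothetical Cyrillic preference
}

def to_dns(label):
    """Map every character of a typed label to its single DNS code point."""
    try:
        return "".join(TO_DNS[ch] for ch in label)
    except KeyError as exc:
        raise ValueError(f"no DNS code point for {exc.args[0]!r}") from None

def display(dns_label, locale):
    """Render a registered label using the locale's preferred forms."""
    table = PREFERRED.get(locale, {})
    return "".join(table.get(ch, ch) for ch in dns_label)
```

With these tables, `to_dns("IBM")` and `to_dns("IbM")` both yield the same registered label, while `display` only changes what a given user sees, never which name is looked up.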
For example, simple script restrictıons alone, as per ICANN, do not solve the problem -- there are plenty of subtle homographs in the Latin alphabet, such as the one embedded in this sentence.
Personally, I consider that to be the Turkish alphabet, not the Latin one. Turkic speakers who use Cyrillic also have a habit of adopting modified characters in their alphabets. I think this is solved by defining the PREFERRED mapping as described above. Turkey would implement it keeping the distinction between the i with and without the dot. Many other countries would opt for sticking in some code like "?" to indicate that there is a weird character there. If I localize my computer to allow Turkish text entry and Turkish fonts, no doubt I would also get the Turkish domain name mapping preferences. And no doubt, Central Asian countries speaking Turkic languages but using the Cyrillic alphabet would map all the codes into their familiar Cyrillic forms. This is possible because the reverse mapping allows one to type in many different possible Unicode character forms of a domain name in order to get the same single unambiguous registered domain name.
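A rough sketch of that locale-dependent rendering, assuming the dotless i has its own DNS code point distinct from the dotted i. The names `DISPLAY`, `FAMILIAR` and `render` are hypothetical, and the "?" fallback is just the placeholder mentioned above.

```python
# Assumption: dotless i is a separate DNS code point from dotted i.
DOTLESS_I = "\u0131"

# Hypothetical per-locale display tables: DNS code point -> glyph shown.
DISPLAY = {
    "tr": {DOTLESS_I: DOTLESS_I},  # Turkish systems show the real letter
}

# Code points that every locale is assumed to render as-is.
FAMILIAR = set("abcdefghijklmnopqrstuvwxyz0123456789-")

def render(dns_label, locale):
    """Show a registered label, flagging code points the locale does not map."""
    table = DISPLAY.get(locale, {})
    out = []
    for ch in dns_label:
        if ch in table:
            out.append(table[ch])
        elif ch in FAMILIAR:
            out.append(ch)
        else:
            out.append("?")  # unfamiliar character: make it visible, not spoofable
    return "".join(out)
```

A Turkish-localized system would see the name as registered, while a system without that preference would see "restrict?ons" rather than something that passes for "restrictions".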
* it is scalable on a per-registry basis, so there's no need for a "flag
day", and requires no action on behalf of the registry beyond that which
might be expected as a service to their customers, who have a reasonable
expectation that their domains not be easily spoofed.
I think if we are going to upgrade the DNS, then registries will have to adapt in the same way as everybody else. And if that includes a flag day, then so be it. I suspect, however, that we will find some less disruptive way to transition, perhaps with two flag days to indicate the beginning and the end of a transition period.
For example, for .fr, it could be as simple as saying something like "labels in .fr must consist only of characters from the set -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, à, â, æ, ç, è, é, ê, ë, î, ï, ô, ù, û, ü, ÿ, œ", putting that statement on their website, and letting the software makers know about it.
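A per-registry rule of the kind quoted here is easy to check mechanically. The following is a minimal sketch, with `ALLOWED_FR` and `label_allowed` as hypothetical names and NFC normalization added as an assumption so that precomposed and decomposed accents compare equal.

```python
import unicodedata

# The character set quoted above for .fr, normalized to NFC.
ALLOWED_FR = set(unicodedata.normalize(
    "NFC", "-0123456789abcdefghijklmnopqrstuvwxyzàâæçèéêëîïôùûüÿœ"))

def label_allowed(label):
    """Return True if every character of the label is in the published set."""
    normalized = unicodedata.normalize("NFC", label)
    return all(ch in ALLOWED_FR for ch in normalized)
```

A label containing the dotless i would be rejected under this rule, which is exactly the case the reply below raises.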
And if a Turkish cultural centre in Paris wants to register a domain name with the undotted i, then what? National boundaries have no relationship to cultural boundaries. Admittedly, in my solution suggested above, if such a Turkish domain name did exist, anyone who did not have a localized system supporting entry of the undotted i would not be able to type the name of the domain. They could still reach the website through another site that offers a clickable link, in the same way that http://www.translit.ru provides a Cyrillic keyboard for computers without Cyrillic localization installed.
--Michael Dillon