Re: Non-English Domain Names Likely Delayed

18 Jul 2005

...
Stephane, can I ask you what your detailed objections are to the 
Moz/Opera mechanism, and could you let me know your proposal for an 
alternative mechanism for preventing IDN spoofing?
I would suggest that an alternative mechanism should include
a set of code points to be used for the on-the-wire DNS 
protocol and the registry databases. This set of codepoints
will greatly restrict the possibility of ambiguity. Right
now it is utterly impossible to represent the ambiguity
of IBM, ibm, IBM or IbM in the DNS because the set of
codepoints only allows for one code to be shared by I and i.
This principle could be extended to other scripts so that,
for instance, codes for the 2nd and 4th letters of the
Cyrillic alphabet could be added while not adding codes
for the 1st and 3rd letters because A and B are already there.

Two additional items needed are translation tables. One
translation table would be the PREFERRED mapping from the
DNS codepoints to Unicode. I say "preferred" because while
some people will be happy to see the "b" as in "ibm", others
may prefer to see it as "B" especially Cyrillic users who
use "B" for a completely different letter most of the time.
Also, Arabs may prefer to map first and last letters of a
domain to the initial and final forms of the letter and
use medials for the rest because it looks better most of
the time. This does not create exploitable ambiguity.
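
To make the "preferred mapping" idea concrete, here is a minimal
sketch in Python. Everything in it is an assumption made for
illustration: the table names, the three preference profiles and the
display() function are hypothetical, and the DNS code points are
simply written as lower-case ASCII letters. It does not attempt the
Arabic positional forms.

```python
# Hypothetical display preferences, keyed by a made-up profile name.
# Each table maps a DNS code point to the Unicode character that the
# user would rather see; anything not listed is shown unchanged.
PREFERRED = {
    "plain": {},
    "upper": {c: c.upper() for c in "abcdefghijklmnopqrstuvwxyz"},
    # Cyrillic users see the shared codes as their own letters
    # (U+0410 and U+0412, which look like Latin A and B).
    "cyrillic": {"a": "\u0410", "b": "\u0412"},
}


def display(dns_name, profile="plain"):
    """Render a registered name using one preference table."""
    table = PREFERRED.get(profile, {})
    return "".join(table.get(ch, ch) for ch in dns_name)


print(display("ibm"))           # ibm
print(display("ibm", "upper"))  # IBM
```

Because the registered name is the same in every case, the choice of
profile only changes what is drawn on the screen.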

The second item is a comprehensive mapping for all of 
UNICODE that maps each code point into one of the DNS
code points. This should be defined as an algorithm because
that allows for a combination of mapping tables and more
efficient ways of defining and executing the mapping.
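
A rough sketch of such an algorithm, again in Python and again under
stated assumptions: the DNS code point set, the look-alike table and
the function name to_dns_codes() are invented for the example, and
Unicode NFKC normalization plus case folding merely stand in for the
"more efficient" algorithmic part. It handles a single label, not a
full dotted name.

```python
import unicodedata

# Assumed DNS code point set: the existing letters, digits and hyphen,
# plus the 2nd and 4th Cyrillic letters (U+0431, U+0433) as new codes.
DNS_SET = set("abcdefghijklmnopqrstuvwxyz0123456789-") | {"\u0431", "\u0433"}

# Assumed table part of the mapping: the 1st and 3rd Cyrillic letters
# share the codes already used for A and B.
LOOKALIKES = {"\u0430": "a", "\u0432": "b"}


def to_dns_codes(label):
    """Map a Unicode label onto the restricted DNS code point set."""
    # Algorithmic part: compatibility normalization, then case folding.
    folded = unicodedata.normalize("NFKC", label).casefold()
    out = []
    for ch in folded:
        ch = LOOKALIKES.get(ch, ch)  # table part
        if ch not in DNS_SET:
            raise ValueError("no DNS code point for %r" % ch)
        out.append(ch)
    return "".join(out)


print(to_dns_codes("IBM"), to_dns_codes("IbM"))  # ibm ibm
```

Several different typed forms end up as one registered name, which is
the property that removes the exploitable ambiguity.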

It may be painful to upgrade the DNS, but if we are going
to do so, we need to try to make it a solution that will
work for a long time, not just quick fix patches.

I have nothing against the Mozilla solution as a quick
fix but I hope that it is used to demonstrate the need
for upgrading DNS and fixing the problem at its root.
...
For example, simple script 
restrictıons alone, as per ICANN, do not solve the problem -- there are 
plenty of subtle homographs in the Latin alphabet, such as the one 
embedded in this sentence.
Personally, I consider that to be the Turkish alphabet, not the 
Latin one. Turkic speakers who use Cyrillic also have a habit
of adopting munged up characters in their alphabets. I think this
is solved by defining the PREFERRED mapping as described above.
Turkey would implement it keeping the distinction between the
i with and without the dot. Many other countries would opt for
sticking in some code like "?" to indicate that there is a weird
character there. If I localize my computer to allow Turkish text 
entry and Turkish fonts, no doubt I would also get the Turkish
domain name mapping preferences. And no doubt, Central Asian countries
speaking Turkic languages but using the Cyrillic alphabet would map
all the codes into their familiar Cyrillic forms.

This is possible because the reverse mapping allows one to type
in many different possible UNICODE character forms of a domain name
in order to get the same single unambiguous registered domain name.
...
* it is scalable on a per-registry basis, so there's no need for a "flag
day", and requires no action on behalf of the registry beyond that which
might be expected as a service to their customers, who have a reasonable
expectation that their domains not be easily spoofed.
I think if we are going to upgrade the DNS, then registries will have
to adapt in the same way as everybody else. And if that includes a
flag day, then so be it. I suspect, however, that we will find some
less disruptive way to transition, perhaps with two flag days to
indicate the beginning and the end of a transition period.
...
For example, for .fr, it could be as simple as saying something like 
"labels in .fr must consist only of characters from the set -, 0, 1, 2, 
3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, 
r, s, t, u, v, w, x, y, z, à, â, æ, ç, è, é, ê, ë, î, ï, ô, ù, û, ü, ÿ, 
œ", putting that statement on their website, and letting the software 
makers know about it.
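
The per-registry statement quoted above amounts to a simple membership
test. A minimal sketch in Python, where the character set is copied
from the quoted list and the names FR_ALLOWED and label_allowed_in_fr()
are hypothetical; it assumes labels arrive already lower-cased.

```python
import unicodedata

# Character set taken from the quoted .fr example.
FR_ALLOWED = set("-0123456789abcdefghijklmnopqrstuvwxyzàâæçèéêëîïôùûüÿœ")


def label_allowed_in_fr(label):
    """True if every character of a non-empty label is in the set."""
    # Compose accents first so that e + combining acute counts as é.
    label = unicodedata.normalize("NFC", label)
    return len(label) > 0 and all(ch in FR_ALLOWED for ch in label)


print(label_allowed_in_fr("caf\u00e9"))  # True
print(label_allowed_in_fr("d\u0131x"))   # False: undotted i is not listed
```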
And if a Turkish cultural centre in Paris wants to register a domain
name with the undotted i, then what? National boundaries have no
relationship to cultural boundaries. Admittedly, in my solution
suggested above, if such a Turkish domain name did exist, anyone who
did not have a localized system supporting entry of the undotted i
would not be able to enter the name of the domain. They could still
reach the site through another website that let them get there by
clicking a link, in the same way that http://www.translit.ru provides
a Cyrillic keyboard for computers without Cyrillic localization
installed.

--Michael Dillon

Michael.Dillon＠btradianz.com