Re: Underscores in host names

19 May 2005

...
> There is a solution for this problem.  Use 32-bit character sets
> which are defined to include the entire collection of known character
> sets in all other languages on the planet.

This doesn't solve the problem of case-sensitivity and
its relatives. You probably don't want NANOG.org, nanog.org
and NaNoG.org to be three different domain names. There
are related issues with other scripts. For instance, in
Arabic most letters take different forms
depending on whether they are written isolated, at
the beginning, in the middle or at the end of a word.
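
To make the point concrete, here is a minimal sketch in Python, using only
the standard unicodedata module. The helper name fold() is made up for this
illustration; it is not part of any hostname standard. It shows that a wide
character set does nothing by itself: the folding has to be applied on top.

```python
import unicodedata

def fold(name: str) -> str:
    """Fold compatibility forms (NFKC), then fold case."""
    return unicodedata.normalize("NFKC", name).casefold()

# The three spellings above collapse to one name only after folding.
names = ["NANOG.org", "nanog.org", "NaNoG.org"]
folded = {fold(n) for n in names}

# Arabic BEH: the isolated (U+FE8F) and initial (U+FE91) presentation
# forms are distinct code points, but both fold to the base letter U+0628.
beh_isolated = fold("\ufe8f")
beh_initial = fold("\ufe91")
```

Without fold(), the three names and the two BEH forms all compare as
different strings.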

Then there are the ambiguities that go across scripts.
For instance, the decimal digits are encoded twice, once in
the Arabic form and once in the common Western form. In Russian
the letters HAC are spelled en-ah-ess but they look
like the English letters aitch-ey-see even though they
are encoded differently. Also, Cyrillic Unicode includes
historical letters that are not currently used which
means that many words have more than one spelling.
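
A short Python sketch of the two cases above, again using only the standard
library. The helper describe() and the variable names are invented for this
example.

```python
import unicodedata

latin = "HAC"                    # U+0048 U+0041 U+0043
cyrillic = "\u041d\u0410\u0421"  # Cyrillic EN, A, ES; looks like HAC

def describe(text):
    """Return the Unicode character name of each code point."""
    return [unicodedata.name(ch) for ch in text]

# The strings look alike on screen but do not compare equal.
same_string = (latin == cyrillic)

# Digits: same numeric value, different code points.
western = "123"
arabic_indic = "\u0661\u0662\u0663"
same_digits = (western == arabic_indic)
same_value = (int(western) == int(arabic_indic))
```

describe(cyrillic) gives CYRILLIC CAPITAL LETTER EN, A and ES, while
describe(latin) gives LATIN CAPITAL LETTER H, A and C. A resolver comparing
code points sees two unrelated names.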

Unicode is not a workable solution for hostnames or
domain names or any sort of identifier where you want
to unambiguously distinguish the identifiers. For that
we need a mapping that takes all Unicode characters
into one single unambiguous subset of Unicode that can
be used for hostnames, etc.

The good thing is that when we deploy that mapping, you
will be able to use underscores in hostnames. But don't be
surprised if they get automatically mapped to dashes in
order to avoid ambiguity.
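
As a rough sketch of what such a mapping might look like, assuming a
hand-written lookalike table: the LOOKALIKES table and the canonical()
function below are hypothetical and cover only the examples from this
message. A real mapping would need a complete, agreed-upon table.

```python
import unicodedata

# Hypothetical lookalike table (illustration only): three Cyrillic
# letters mapped to their Latin twins, and underscore mapped to dash.
LOOKALIKES = {
    "\u043d": "h",  # Cyrillic small EN
    "\u0430": "a",  # Cyrillic small A
    "\u0441": "c",  # Cyrillic small ES
    "_": "-",
}

def canonical(name: str) -> str:
    """Map a name into one lowercase, ASCII-digit, dash-only form."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    out = []
    for ch in folded:
        if unicodedata.category(ch) == "Nd":
            # Any script's decimal digit becomes the Western digit.
            out.append(str(unicodedata.decimal(ch)))
        else:
            out.append(LOOKALIKES.get(ch, ch))
    return "".join(out)

a = canonical("\u041d\u0410\u0421_\u0661.example")  # Cyrillic, underscore
b = canonical("HAC-1.EXAMPLE")                      # Latin, dash
```

Both a and b come out as the same string, so the two spellings can no
longer be registered as different hosts. The cost is that the mapping is
lossy: the Cyrillic name and the underscore do not survive it.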

--Michael Dillon

Michael.Dillon＠radianz.com