Laurent Frigault wrote:
gethostbyaddr (and maybe other functions) will return NULL under at least FreeBSD/NetBSD for ANY PTR having the "_" character.
As it should. I wish it would also return a null for hostnames containing sequential non-alphanumerics (--, ---, __, ___, ...). -- Roger Marquis Roble Systems Consulting http://www.roble.com/
On Thu, 19 May 2005 08:04:31 PDT, Roger Marquis said:
Laurent Frigault wrote:
gethostbyaddr (and maybe other functions) will return NULL under at least FreeBSD/NetBSD for ANY PTR having the "_" character.
As it should. I wish it would also return a null for hostnames containing sequential non-alphanumerics (--, ---, __, ___, ...).
Don't like RFC3490 and its xn-- hostnames? ;)
On Thu, 19 May 2005 Valdis.Kletnieks@vt.edu wrote:
On Thu, 19 May 2005 08:04:31 PDT, Roger Marquis said:
Laurent Frigault wrote:
gethostbyaddr (and maybe other functions) will return NULL under at least FreeBSD/NetBSD for ANY PTR having the "_" character.
As it should. I wish it would also return a null for hostnames containing sequential non-alphanumerics (--, ---, __, ___, ...).
Don't like RFC3490 and its xn-- hostnames? ;)
Most definitely not, and if this were 1985 I'd be {rf}commenting on the inadvisability of such hostnames, and those beginning or ending with "-", TLD names shorter than 2 or longer than 4 characters, spaces in hostnames, [^a-z0-9\\-\\.], and other such marginally useful but infinitely problematic features. There is real value in KIS, and not just from the perspective of a security-minded coder... -- Roger Marquis Roble Systems Consulting http://www.roble.com/
On Thu, May 19, 2005 at 08:52:56AM -0700, Roger Marquis wrote:
Most definitely not, and if this were 1985 I'd be {rf}commenting on the inadvisability of such hostnames, and those beginning or ending with "-",
TLD names shorter than 2 or longer than 4 characters,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
spaces in hostnames, [^a-z0-9\\-\\.], and other such marginally useful but infinitely problematic features.
I understand your other concerns, there, but TLD length? To preserve assumptions in old apps? Cheers, -- jra -- Jay R. Ashworth jra@baylink.com Designer Baylink RFC 2100 Ashworth & Associates The Things I Think '87 e24 St Petersburg FL USA http://baylink.pitas.com +1 727 647 1274 If you can read this... thank a system administrator. Or two. --me
At 8:52 -0700 5/19/05, Roger Marquis wrote:
On Thu, 19 May 2005 Valdis.Kletnieks@vt.edu wrote: ...
Don't like RFC3490 and its xn-- hostnames? ;)
xn--... aren't host names, they are domain names. The host name corresponding to that would be something my simple minded mail application can't accept as input.
Most definitely not, and if this were 1985 I'd be {rf}commenting on the inadvisability of such hostnames, and those beginning or ending with "-", TLD names shorter than 2 or longer than 4 characters, spaces in hostnames, [^a-z0-9\\-\\.], and other such marginally useful but infinitely problematic features.
There is real value in KIS, and not just from the perspective of a security-minded coder...
KIS is great, if it gets the job done. Systems that are too simple are useless too. Supporting "IDN" is a necessary job. That's been made clear to the Internet community. If it "complicates" things, well, then that's what has to be done. If the Internet is to be global, it can't restrict the world to just a few convenient languages. It's true that the xn-- convention isn't the best way to encode IDN's, but it has proven to be the optimal one in design (at least). It would have been nice to use a new domain name label type, but we've about run out of them. It would have been nice to use domain classes and use this to create a new domain name syntax, but that can't be done either. Encoding IDNs this way ("xn--") is optimal according to the considered opinion of the IETF, at least those working on RFC 3490 (published in 2003), when you consider impacts on other protocols and applications. -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Edward Lewis +1-571-434-5468 NeuStar If you knew what I was thinking, you'd understand what I was saying.
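For readers who have not seen the encoding in practice, the following is a minimal sketch of what the RFC 3490 ToASCII step produces. It assumes Python's built-in "idna" codec, which implements the 2003 IDNA rules; the name "bücher.example" and the helper names to_ascii and to_unicode are illustrative choices, not anything from this thread.

```python
# Sketch: how an internationalized label becomes an "xn--" label (RFC 3490).
# Uses only Python's standard "idna" codec; the example name is hypothetical.

def to_ascii(name: str) -> str:
    """Encode a Unicode domain name into its ASCII-compatible form."""
    return name.encode("idna").decode("ascii")


def to_unicode(name: str) -> str:
    """Decode an ASCII-compatible domain name back to Unicode."""
    return name.encode("ascii").decode("idna")


encoded = to_ascii("bücher.example")
print(encoded)              # xn--bcher-kva.example
print(to_unicode(encoded))  # bücher.example
```

The encoded form stays inside letters, digits and hyphens, which is why existing resolvers and applications can carry it without change.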
Supporting "IDN" is a necessary job. That's been made clear to the Internet community. If it "complicates" things, well, then that's what has to be done. If the Internet is to be global, it can't restrict the world to just a few convenient languages.
Not to quibble unnecessarily, but the folks I came to the dance with at IETF-50 eventually went home fairly disappointed after -51 and -52, with none of their proposed mechanism drafts having obtained even working group draft status. You know what the constraints are -- no zone local semantics (e.g., case folding rules, courtesy H.A.) for a glyph repertoire that in some ranges is also a character set, no intermediate tables, no flag day(s) for apps, and so on. To describe that as "IDN", rather than "a way to represent, poorly for some, not so poorly for others, character sets other than ASCII in apps", leaves the later reader ignorant of the baroque design choices available and discarded on the road to RACE II. In Abenaki, "w", "ou" and "8" all collate to the same code point, and the representation of the code point is application specific (modern, early, and 17th-century styles). Eric P.S. 17th century French lacked a "w" character, "8" is a "u" atop an "o".
You know what the constraints are -- no zone local semantics (e.g., case folding rules, courtesy H.A.) for a glyph repertoire that in some ranges is also a character set, no intermediate tables, no flag day(s) for apps, and so on.
It's sad that one of the constraints isn't for this to be explained in plain English. Sometimes I think people take jargon too far. Yes, we do need some special vocabulary to talk about detailed technical things, but every time we invent new vocabulary, we compartmentalize knowledge into stovepipes and we prevent cross-fertilization with other fields of knowledge.
P.S. 17th century French lacked a "w" character, "8" is a "u" atop an "o".
And people who write Russian in mobile phone SMS will often write things like 4to ti xo4esh videt? Where the "4" represents "ch" and the two occurrences of "i" represent two separate Cyrillic letters. Russia is an interesting country with respect to domain names. Sometimes you will see a domain name written in Cyrillic characters that are intended to be transliterated one-by-one into Latin characters. This is signified by using Cyrillic for the .ru ending. And sometimes you see a Cyrillic domain name with a Russian word which is intended to be translated into the English word to form the domain name. --Michael Dillon
And people who write Russian in mobile phone SMS will often write things like
4to ti xo4esh videt?
It would be written "chto ti hochesh videti" or "chto ti xochesh videti". Russian transliterations are rather easy to follow since they are phonetic. We are not counting 3l33t speakers.
Russia is an interesting country with respect to domain names. Sometimes you will see a domain name written in Cyrillic characters that are intended to be transliterated one-by-one into Latin characters. This is signified by using Cyrillic for the .ru ending. And sometimes you see a Cyrillic domain name with a Russian word which is intended to be translated into the English word to form the domain name.
When Russian is written using English letters, it is phonetic. The native speakers understand it. The non-native speakers look at it the same way as they view domain names that do not contain recognizable words. Alex
On Fri, 20 May 2005 alex@yuriev.com wrote:
It would be written "chto ti hochesh videti" or "chto ti xochesh videti". Russian transliterations are rather easy to follow since they are phonetic. We are not counting 3l33t speakers.
When Russian is written using English letters, it is phonetic. The native speakers understand it. The non-native speakers look at it the same way as they view domain names that do not contain recognizable words.
Even in your own example you used "x" in place of "h" - this is not a phonetic but a literal representation of the Russian letter "x". So while it is for the most part phonetic, it really depends on who is writing, and I've yet to see two people use exactly the same transliteration of Russian in Latin letters; as an example, I would write the above as "chto ty hochesh videt'". Oh, and did I mention that written Cyrillic Russian differs from the spoken language, as it regularly has ambiguous soft/hard sounds transliterated only as hard? When transliterating to Latin many do it from spoken language sounds, so don't be surprised to see "shto ty hochesh videt'" (which might turn into "wto ty hochew videt" for those few who represent "sh" as "w" because the letters are visually similar even though the sounds are not), and then others do it the other way around, making everything hard and even getting rid of yat'-derived letters - "chto ti hochesh videt". -- William Leibzon Elan Networks william@elan.net
Edward Lewis wrote:
It's true that the xn-- convention isn't the best way to encode IDN's, but it has proven to be the optimal one in design (at least).
It's the necessary minimum for compatibility purposes, but not anywhere near the optimal design. Moreover, those have nothing to do with each other. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
At 17:53 -0400 5/19/05, Eric A. Hall wrote:
Edward Lewis wrote:
It's true that the xn-- convention isn't the best way to encode IDN's, but it has proven to be the optimal one in design (at least).
It's the necessary minimum for compatibility purposes, but not anywhere near the optimal design.
Moreover, those have nothing to do with each other.
"Optimal" is so subjective. ;) -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Edward Lewis +1-571-434-5468 NeuStar If you knew what I was thinking, you'd understand what I was saying.
On Thu, 19 May 2005, Roger Marquis wrote:
As it should. I wish it would also return a null for hostnames containing sequential non-alphanumerics (--, ---, __, ___, ...).
It is possible to reject multiple dots, both in theory and in practice (in fact it's useful for spotting certain kinds of spamware that generate bogus HELO domains). You can't reject double hyphens because they are permitted by the syntax and used by IDN, for example. Tony. -- f.a.n.finch <dot@dotat.at> http://dotat.at/ BISCAY: WEST 5 OR 6 BECOMING VARIABLE 3 OR 4. SHOWERS AT FIRST. MODERATE OR GOOD.
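The distinction drawn above can be sketched as a small syntax check. This is only an illustration, assuming the letter-digit-hyphen host name rule of RFC 952 as relaxed by RFC 1123; the function name is_valid_hostname is hypothetical, and the check is not what any particular resolver library does.

```python
import re

# One label: letters, digits and hyphens, 1 to 63 characters, with no
# leading or trailing hyphen. Interior runs of hyphens (as in "xn--...")
# are allowed, and "_" is not in the character class at all.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(name: str) -> bool:
    """Return True if every dot-separated label passes the LDH check."""
    if name.endswith("."):       # tolerate a single trailing root dot
        name = name[:-1]
    if not name or len(name) > 253:
        return False
    # Consecutive dots produce an empty label, which fails the match.
    return all(LABEL_RE.fullmatch(label) for label in name.split("."))


print(is_valid_hostname("mail..example.com"))      # False: empty label
print(is_valid_hostname("foo_bar.example.com"))    # False: underscore
print(is_valid_hostname("xn--bcher-kva.example"))  # True: "--" is legal
```

Multiple dots fail because they produce an empty label, while double hyphens pass because the label syntax only restricts the first and last character.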
At 8:04 -0700 5/19/05, Roger Marquis wrote:
Laurent Frigault wrote:
gethostbyaddr (and maybe other functions) will return NULL under at least FreeBSD/NetBSD for ANY PTR having the "_" character.
As it should. I wish it would also return a null for hostnames containing sequential non-alphanumerics (--, ---, __, ___, ...).
If null were returned for all "host names" containing "--" then IDN names wouldn't "work." (See RFC 3490, section 5.) -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Edward Lewis +1-571-434-5468 NeuStar If you knew what I was thinking, you'd understand what I was saying.
participants (10)
- alex@yuriev.com
- Edward Lewis
- Eric A. Hall
- Eric Brunner-Williams in Portland Maine
- Jay R. Ashworth
- Michael.Dillon@radianz.com
- Roger Marquis
- Tony Finch
- Valdis.Kletnieks@vt.edu
- william(at)elan.net