RE: Mozilla Implements TLD Whitelist for Firefox in Response to IDN Homographs Spoofing
Phil said:
Does anyone else think that it's not the job of a web browser to do this?
Yes, it's recognized by Mozilla and others as the job of the Internet Architecture Board (in particular, the IAB-IDN group) to make a final decision on how to deal with homographs. However, early adopters like Mozilla, with a released software package that supports IDNs, are taking an intermediate action until the committee comes up with a guideline.
I think you're both right -- most applications can't feasibly manage their own Unicode Philosophy, but Mozilla needs to do something for the short term.
-Jason
--
Jason Sloderbeck
Positive Networks
jason @ positivenetworks . net
* Jason Sloderbeck:
Yes, it's recognized by Mozilla and others as the job of the Internet Architecture Board (in particular, the IAB-IDN group) to make a final decision on how to deal with homographs.
Homographs are a classical example of a PR attack. It's a complete non-issue. In practice, people don't use domain names to assess the credibility of web sites. 1/l/I and 0/O are homographs as well, and the Internet hasn't collapsed as a result.
The really stunning thing about the whole mess is that nobody seems to grasp that technically, TLDs are not in a position to restrict name server operators to any character sets in the domain names they use. After all, I can add any domain name I want to my zone files.
Florian Weimer wrote:
* Jason Sloderbeck:
Yes, it's recognized by Mozilla and others as the job of the Internet Architecture Board (in particular, the IAB-IDN group) to make a final decision on how to deal with homographs.
Homographs are a classical example of a PR attack. It's a complete non-issue. In practice, people don't use domain names to assess the credibility of web sites. 1/l/I and 0/O are homographs as well, and the Internet hasn't collapsed as a result.
The really stunning thing about the whole mess is that nobody seems to grasp that technically, TLDs are not in a position to restrict name server operators to any character sets in the domain names they use. After all, I can add any domain name I want to my zone files.
Indeed you can.
But since the TLD registry operators can, and do, control the delegation of their TLDs, they have de facto control over the sets of labels that can be used for second-level domain labels that are publicly visible within their TLD domains, unless you can persuade people to point at your nameserver other than through the normal delegation from the root. This means that they can, if they so wish, apply character set restrictions to those labels.
Your TLD registry, for example, can and does enforce such a policy. (http://www.denic.de/en/richtlinien.html)
On the other hand, there's nothing anyone can do to stop you resolving whatever labels you like on your own public nameservers, within your third-level, fourth-level and so on domains. However, this is unlikely to cause security problems for anyone apart from yourself and/or your customers.
-- Neil
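To illustrate the kind of character set restriction a registry could apply to second-level labels, here is a minimal Python sketch. The allowed set and the helper name `label_allowed` are made up for illustration; this is not DENIC's actual policy or anyone's real implementation.

```python
import string

# Hypothetical policy: ASCII letters, digits, hyphen, plus a few extra
# characters the registry has chosen to permit. Not any real registry's list.
ALLOWED = set(string.ascii_lowercase + string.digits + "-") | set("\u00e4\u00f6\u00fc\u00df")

def label_allowed(label: str) -> bool:
    """Return True if every character of the label is in the registry's set."""
    label = label.lower()
    # Reject empty labels and labels that start or end with a hyphen.
    if not label or label.startswith("-") or label.endswith("-"):
        return False
    return all(ch in ALLOWED for ch in label)

print(label_allowed("m\u00fcller"))       # Latin letters plus u-umlaut
print(label_allowed("yah\u03bf\u03bf"))   # contains GREEK SMALL LETTER OMICRON
```

A check like this only governs what the registry will delegate. It says nothing about labels served further down the tree by the delegated nameservers.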
* Neil Harris:
But since the TLD registry operators can, and do, control the delegation of their TLDs, they have de facto control over the sets of labels that can be used for second-level domain labels that are publicly visible within their TLD domains,
I just don't see why this label is particularly important. If the domain name is sufficiently long, it's not even displayed by current browsers. Even if this is fixed, how many users are aware that you have to read domain names from right to left?
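As a small illustration of reading a domain name from right to left, the following Python sketch lists the labels of a hostname with the most significant one first. The hostname and the helper name `labels_right_to_left` are hypothetical.

```python
def labels_right_to_left(hostname: str) -> list[str]:
    """Split a hostname into labels, most significant (the TLD) first."""
    return hostname.rstrip(".").lower().split(".")[::-1]

# Hypothetical hostname: the familiar-looking part on the left is only a
# subdomain; the labels on the right decide who controls the name.
host = "www.example.de.attacker.example"
print(labels_right_to_left(host))
```

Read this way, the first two entries identify the TLD and the delegated second-level label. Everything after them is chosen freely by whoever operates that zone.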
Homographs are a classical example of a PR attack. It's a complete non-issue.
I am inclined to agree.
But since the TLD registry operators can, and do, control the delegation of their TLDs, they have de facto control over the sets of labels that can be used for second-level domain labels that are publicly visible within their TLD domains
Indeed. The actual problem is that ICANN has been captured by the trademark community (WIPO, basically) and has internalized two bad ideas: that domains are like trademarks, and that it is ICANN's job to protect them. Once the registrars and registries realized that this meant a thousand first-day registrations in a new domain (you may be sure that disney.xxx has been presold), there hasn't been any serious opposition, so there are continuing inane arguments about how to prevent 2LD homographs, even as everyone agrees that it's impossible.
Mozilla's approach strikes me as the least bad way to appease the trademark crazies without interfering too badly with useful work. I will be interested to see what they do when a ccTLD declares that their policy is that they permit any name.
R's, John
John Levine wrote:
Homographs are a classical example of a PR attack. It's a complete non-issue.
I am inclined to agree.
But since the TLD registry operators can, and do, control the delegation of their TLDs, they have de facto control over the sets of labels that can be used for second-level domain labels that are publicly visible within their TLD domains
Indeed. The actual problem is that ICANN has been captured by the trademark community (WIPO, basically) and has internalized two bad ideas: that domains are like trademarks, and that it is ICANN's job to protect them. Once the registrars and registries realized that this meant a thousand first-day registrations in a new domain (you may be sure that disney.xxx has been presold), there hasn't been any serious opposition, so there are continuing inane arguments about how to prevent 2LD homographs, even as everyone agrees that it's impossible.
Mozilla's approach strikes me as the least bad way to appease the trademark crazies without interfering too badly with useful work. I will be interested to see what they do when a ccTLD declares that their policy is that they permit any name.
R's, John
On the first point, yes, I agree, it's probably the least-worst solution. On the second point: Mozilla, I imagine, would do nothing at all. -- Neil
On Thu, 28 Jul 2005, Florian Weimer wrote:
Yes, it's recognized by Mozilla and others as the job of the Internet Architecture Board (in particular, the IAB-IDN group) to make a final decision on how to deal with homographs.
Homographs are a classical example of a PR attack. It's a complete non-issue. In practice, people don't use domain names to assess the credibility of web sites. 1/l/I and 0/O are homographs as well, and the Internet hasn't collapsed as a result.
English-speaking folks actually do often notice the difference between 1/l/I and 0/O, partly because they're usually (in browsers) lower case -- hence 1/l/i and 0/o (while 1/l is still close, the users are trained by years to know the difference). It's an implicit Turing-test factor based on linguistic experience.
Homographs where the glyphs are almost or completely identical, but the code points are completely different, are where this *really* breaks down. There are several sets of glyphs that can mimic nearly all of the Latin alphabet -- and in most fonts, they look *identical* to the Latin glyphs (some fonts simply remap to use the Latin glyph's data).
Unfortunately, Pine isn't really a UTF-8 mailer, or I'd demonstrate on list for you. However, if you have a UTF-capable browser (chances are, you do), the following should demonstrate identical-glyph homographs nicely.
http://www.duh.org/homographs.cgi
(Hint: In each group of three lines, the strings of characters are NOT identical, regardless of what your eyes may tell you.)
--
-- Todd Vierling <tv@duh.org> <tv@pobox.com> <todd@vierling.name>
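For readers without the demonstration page at hand, the same point can be sketched in a few lines of Python. The choice of Greek omicron as the look-alike and the helper name `describe` are assumptions made for this illustration; the strings are written with escapes so the difference survives any mailer.

```python
import unicodedata

latin = "yahoo"
mixed = "yah\u03bf\u03bf"   # last two letters are GREEK SMALL LETTER OMICRON

def describe(s: str) -> list[str]:
    """Return 'U+XXXX NAME' for each character in the string."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]

# The two strings usually render alike, but they are not equal.
print(latin == mixed)
for line in describe(mixed):
    print(line)
```

The comparison is false because the code points differ, even where a font draws both characters with the same glyph.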
* Todd Vierling:
Homographs are a classical example of a PR attack. It's a complete non-issue. In practice, people don't use domain names to assess the credibility of web sites. 1/l/I and 0/O are homographs as well, and the Internet hasn't collapsed as a result.
English-speaking folks actually do often notice the difference between 1/l/I and 0/O, partly because they're usually (in browsers) lower case -- hence 1/l/i and 0/o (while 1/l is still close, the users are trained by years to know the difference). It's an implicit Turing-test factor based on linguistic experience.
But case is controlled by the attacker. Maybe users would be alerted if they saw a capitalized domain name, which rules out the O/0 replacement. But the l/1/I issue still remains.
Homographs where the glyphs are almost or completely identical, but the code points are completely different, are where this *really* breaks down. There are several sets of glyphs that can mimic nearly all of the Latin alphabet -- and in most fonts, they look *identical* to the Latin glyphs (some fonts simply remap to use the Latin glyph's data).
So what? For most .DE domains, I can still get the corresponding .DE.VU domain. Apart from the trailing .VU, the strings are even bitwise identical.
Let me repeat my other argument: Users don't use domain names in trust assessments. The smarter ones seem to recall how they got to a particular page. This is quite consistent with real-world behavior. Most people tend not to forget that they are in some questionable part of the city just because they meet an attractive member of the appropriate sex (or something like that, you get the idea).
(Hint: In each group of three lines, the strings of characters are NOT identical, regardless of what your eyes may tell you.)
They appear different because even though they are from a single font, the characters have slightly different widths. This wouldn't matter in the location field, of course.
On Thu, 28 Jul 2005, Florian Weimer wrote:
Let me repeat my other argument: Users don't use domain names in trust assessments. The smarter ones seem to recall how they got to a particular page. This is quite consistent with real-world behavior.
Uh, I beg to differ -- most of my family would see h t t p : / / w w w . y a h <omicron> <omicron> . g r / and think "the Yahoo site in Greece". After all, it renders as precisely http://www.yahoo.gr/ on-screen, same character glyph, width, and all. This isn't a PR attack; it's a real inverse-Turing-test type of attack. People do look at URLs visually, and many can recognize the difference with simple homographs, but most, I assure you, cannot.
(Hint: In each group of three lines, the strings of characters are NOT identical, regardless of what your eyes may tell you.)
They appear different because even though they are from a single font, the characters have slightly different widths.
Actually, out of all the fonts and OSs I tried, including one I prefer not to use or name but which many people do use, only the Cyrillic lowercase on one font on one OS had different widths, for exactly one character -- all others had identical widths. So you probably have a lucky font -- and you're fortunately already technically knowledgeable enough to know what a Unicode character is and how it's different from plain ASCII. Most users are *NOT* so lucky, as much as you'd hope for that.
This wouldn't matter in the location field, of course.
How so? The movement is in the direction of rendering IDNs natively as Unicode in the Location field, so this is exactly the same problem.
(Hm. I'm beginning to smell the T-word, but I'll wait and see how thick the skull material is first.)
--
-- Todd Vierling <tv@duh.org> <tv@pobox.com> <todd@vierling.name>
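To make the Location-field question concrete, here is a rough Python sketch of a TLD whitelist in the spirit of the approach named in the subject line, as I understand it: show the Unicode form only for TLDs on a trusted list, and fall back to the ASCII (xn--) form otherwise. The whitelist contents and the helper name `display_form` are hypothetical, and this is not Mozilla's code.

```python
# Hypothetical whitelist of TLDs whose registries are assumed to have an
# anti-homograph registration policy. Illustrative only.
IDN_WHITELIST = {"de", "jp"}

def display_form(hostname: str) -> str:
    """Return the Unicode form for whitelisted TLDs, else the ASCII (xn--) form."""
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in IDN_WHITELIST:
        return hostname
    # Python's built-in "idna" codec (IDNA 2003) yields the ASCII form.
    return hostname.encode("idna").decode("ascii")

print(display_form("www.yah\u03bf\u03bf.gr"))   # TLD not on the list: ASCII form
print(display_form("www.m\u00fcller.de"))       # TLD on the list: Unicode form
```

With a policy like this, the look-alike name from the example above would appear in the Location field with an obviously unfamiliar xn-- label rather than as a string identical to the Latin one.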
participants (5)
- Florian Weimer
- Jason Sloderbeck
- John Levine
- Neil Harris
- Todd Vierling