ShamFinder: An Automated Framework for Detecting IDN Homographs∗

Hiroaki Suzuki, Waseda University, Tokyo, Japan
Daiki Chiba, NTT Secure Platform Laboratories, Tokyo, Japan
Yoshiro Yoneya, Japan Registry Services, Tokyo, Japan
Tatsuya Mori, Waseda University/NICT/RIKEN AIP, Tokyo, Japan
Shigeki Goto, Waseda University, Tokyo, Japan
ABSTRACT
The internationalized domain name (IDN) is a mechanism that enables us to use Unicode characters in domain names. The set of Unicode characters contains several pairs of characters that are visually identical to each other; e.g., the Latin character 'a' (U+0061) and the Cyrillic character 'а' (U+0430). Visually identical characters such as these are generally known as homoglyphs. IDN homograph attacks, which are widely known, abuse Unicode homoglyphs to create lookalike URLs. Although the threat posed by IDN homograph attacks is not new, the recent rise of IDN adoption in both

Chinese, Cyrillic, Hangul, Hebrew, Hiragana, or Tamil. IDN was first proposed by Dürst in 1996 as an Internet Draft (I-D) [19]. Subsequently, a system known as Internationalizing Domain Names in Applications (IDNA) was adopted as an Internet standard [11]. Currently, the IDNA system is widely deployed in various domains, including hundreds of top-level domains (TLDs). In addition, the majority of modern web browsers are capable of accommodating IDNs.

Character sets permitted to be used as IDNs contain several pairs of characters that are visually similar to each other. These characters are known as homoglyphs.
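To make the 'a' (U+0061) versus 'а' (U+0430) example concrete, here is a minimal Python sketch; it is an illustration, not the ShamFinder implementation. It uses only the standard library: unicodedata to name each code point, and the built-in "idna" codec (which implements IDNA 2003, RFC 3490, not the newer IDNA 2008) to show the ASCII-compatible "xn--" form that is actually stored in DNS. The HOMOGLYPHS table and the helper names describe, to_ascii, and is_homograph_of are hypothetical and chosen only for this example; a real detector needs a far larger homoglyph database.

```python
import unicodedata

# Hypothetical, hand-picked homoglyph table for illustration only:
# maps a few Cyrillic code points to the Latin letter they resemble.
# This is NOT ShamFinder's homoglyph database.
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def describe(domain: str) -> list:
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "?")) for ch in domain]


def to_ascii(domain: str) -> str:
    """Encode a domain with Python's built-in 'idna' codec (IDNA 2003)."""
    return domain.encode("idna").decode("ascii")


def is_homograph_of(candidate: str, reference: str) -> bool:
    """True if candidate differs from reference but becomes identical to it
    once every character listed in HOMOGLYPHS is replaced by its Latin look-alike."""
    skeleton = "".join(HOMOGLYPHS.get(ch, ch) for ch in candidate)
    return candidate != reference and skeleton == reference


if __name__ == "__main__":
    spoof = "\u0430pple.com"  # Cyrillic 'а' followed by Latin "pple.com"
    print(spoof == "apple.com")                 # False: different code points
    print(describe(spoof)[0])                   # ('а', 'U+0430', 'CYRILLIC SMALL LETTER A')
    print(to_ascii(spoof))                      # the xn-- (Punycode) form used in DNS
    print(is_homograph_of(spoof, "apple.com"))  # True
```

Mapping each character to a look-alike "skeleton" before comparing is the same idea as the confusable-detection skeleton defined in Unicode Technical Standard #39; here the mapping is only a toy table of five entries.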