Unicode Transformations: Finding Elusive Vulnerabilities OWASP Appsecdc November 2009

Unicode Transformations: Finding Elusive Vulnerabilities OWASP AppSecDC November 2009 Chris Weber [email protected] Casaba Security What’s this about? • Visual spoofing and counterfeiting • Text transformation attacks OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber What will you learn? • Why you should care about Visual Integrity… – Branding – Identity – Cloud Computing – URI’s! OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber What will you learn? • Good techniques for finding bugs – Web-apps and clever XSS – Test cases for fuzzing OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber What about tools? • Watcher – Microsoft SDL recommended tool – Passive Web-app testing for free – http://websecuritytool.codeplex.com/ • Unibomber – Deterministic auto-pwn XSS testing OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Can you tell the difference? OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber How about now? OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber The Transformers When good input turns bad <scrİpt> becomes <script> OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Agenda OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Transformations Agenda • Unicode crash course • Root Causes • Attack Vectors • Tools – Watcher – Unibomber OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Transformations Agenda • Unicode crash course • Root Causes • Attack Vectors • Tools OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Crash Course What is Unicode • Globalization • One framework for all languages • Storage and transmission of text • A large database OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Crash Course The Unicode Attack Surface • All software • All users OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Crash Course Unthink it OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Crash Course • A large and complex standard code points canonical mappings encodings decomposition types categorization case folding normalization best-fit mapping binary properties 17 planes case mapping private use ranges conversion tables script blocks bi-directional properties escapings OWASP AppSecDC - November 2009 © 2009 Chris Weber Unicode Crash Course Code pages and charsets Shift_jis Gb2312 ISCII Windows-1252 ISO-8859-1 EBCDIC 037 OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Crash Course Ad Infinitum • Unicode can represent them all • ASCII range is preserved – U+0000 to U+007F are mapped to ASCII OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Crash Course Code points • Unicode 5.1 uses a 21-bit scalar value with space for over 1,100,000 code points: U+0000 to U+10FFFF OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Crash Course Code Points A = U+0041 Every character has a unique number OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Crash Course A U+0041 OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Crash Course ſ U+017F OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Crash Course Encodings UTF-8 – variable width 1 to 4 bytes (used to be 6) UTF-16 – Endianess – Variable width 2 or 4 bytes – Surrogate pairs! UTF-32 – Endianess – Fixed width 4 bytes – Fixed mapping, no algorithms needed OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Crash Course Encodings and Escape sequences U+FF21 FULLWIDTH LATIN CAPITAL LETTER A %EF%BC%A1 ＡＡ \xEF\xBC\xA1 \uFF21 OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Transformations Agenda • Unicode crash course • Root Causes • Attack Vectors • Tools OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Transformations Agenda • Unicode crash course • Root Causes • Attack Vectors • Tools OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Unicode Transformations Overview • Unicode crash course • Root Causes – Visual Spoofing and IDN’s – Best-fit mappings – Normalization – Overlong UTF-8 – Over-consumption – Character substitution – Character deletion – Casing – Buffer overflows – Controlling Syntax – Charset transformations – Charset mismatches • Tools OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Visual Spoofing and IDN’s OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Root Causes Visual Spoofing • Over 100,000 assigned characters • Many lookalikes within and across scripts AΑАᐱᗅᗋᗩᴀᴬ⍲Ａ OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Open your Web browser And follow along http://nottrusted.com/idn.php OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors IDN homograph attacks Some browsers allow .COM IDN’s based on script family – (Latin has a big family) OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors IDN homograph attacks Safari OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors IDN homograph attacks Opera OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors IDN homograph attacks www.google.com is not www.gooɡle.com Latin Latin U+0069 U+0261 gɡ OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Root Causes The state of International Domain Names ICANN guidelines v2.0 Deny-all default seems to – Inclusion-based be the right concept. – Script limitations A script can cross many – Character limitations blocks. Even with limited script choices, there’s plenty to choose from. Great for domain labels, but sub domain labels still open to punctuation and syntax spoofing. OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Root Causes IDN – so what’s the problem? • Divergent user-agent implementations • Lookalikes everywhere • IDNA and Nameprep based on Unicode 3.2 – We’re up to Unicode 5.1 (larger repertoire) OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Root Causes The state of International Domain Names • Registrars still allow – Confusables – Combining marks – Single, Whole and Mixed-script • Registrars can’t control – Syntax spoofing in sub domain labels OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors Visual spoofing Vectors • Non-Unicode attacks • Confusables • Invisibles • Problematic font-rendering • Manipulating Combining Marks • Bidi and syntax spoofing OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors Non-Unicode homograph attacks rn can look like m in certain fonts www.mullets.com is not www.rnullets.com Latin Latin U+006D U+0073 U+006E OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors Non-Unicode homograph attacks Are you using mono-width fonts? 0 and O 1 and l 5 and S OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors Non-Unicode homograph attacks Classic long URL’s http://login.facebook.intvitation.videomessageid- h048892r39.sessionnfbid.com/home.htm?/disbursements/ OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors Unicode and The Confusables www.ɑpple.com // All Latin using Latin small letter Alpha ‘ɑ’ www.faϲebook.com // Mixed Latin/Greek with lunate sigma symbol ‘c’ www.аЬс.com // All Cyrillic ‘abc’ OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors IDN homograph attacks Browsers whitelist .ORG OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors IDN homograph attacks Others don’t necessarily but… OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors IDN homograph attacks www.mozilla.org is not www.mozílla.org Latin Latin U+0069 U+00ED OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors IDN Syntax Spoofing with / lookalikes http://www.google.com／path／file?.nottrusted.org FULLWIDTH SOLIDUS U+FF0F (This case doesn’t work anymore) OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors IDN Syntax Spoofing with / lookalikes http://www.google.com/path/file.nottrusted.org SOLIDUS U+002F (Normalized to a / U+002F) OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors IDN Syntax Spoofing with / lookalikes http://www.google.comﾉpathﾉfile.nottrusted.org (However punctuation not required…) OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors The Invisibles OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Attack Vectors Visual Spoofing with Bidi Explicit Directional Overrides OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Best-fit Mappings OWASP AppSecDC - November 2009 www.casabasecurity.com © 2009 Chris Weber Root Causes Best-fit mappings Commonly occur in charset transformations and even innocuous API’s Impact: Filter evasion, Enable code execution When σ becomes s U+03C3 GREEK SMALL LETTER SIGMA When ′ becomes ' U+2032 PRIME OWASP AppSecDC - November 2009 www.casabasecurity.com

Unicode Transformations: Finding Elusive Vulnerabilities OWASP Appsecdc November 2009

Unicode Ate My Brain

ISO Basic Latin Alphabet

Unicode and Code Page Support

Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress

Unicode Encoding the TITUS Project

Name Synopsis Description

Fun with Unicode - an Overview About Unicode Dangers

Exploiting Unicode-Enabled Software

Character Properties 4

Proposal to Create a New Block for Missing Block Element Characters

Unicode Character Properties

Information, Characters, Unicode