Camomile: A Unicode Library for OCaml

Yoriyuki Yamagata
National Institute of Advanced Science and Technology (AIST)
ML Workshop, September 18, 2011

Outline

  • Overview
  • ASCII to Unicode : A challenge of multilingualization
  • A brief tour of Camomile modules
  • ulib
  • Conclusion

Overview - functionality

Camomile - A Unicode library for OCaml

  • Unicode character type
  • UTF-8, UTF-16, UTF-32 strings
  • Conversion to/from approx 200 encodings
  • Case mapping
  • Collation (sort and search)

Overview - feature

  • Only support “logical” operations
  • No support for rendering or formatting
  • Purely written in OCaml

ASCII to Unicode : challenge of multilingualization

  • Large number of characters
      code range 0x0 - 0x10ffff
  • Multiple representation of strings
      UTF-8, UTF-16 and UTF-32
      legacy encodings
  • Combining characters
      ä = a + ¨
      Nguyễn = Nguyê + ˜ + n = Nguye + ˆ + ˜ + n
      ậ = a + . + ˆ = a + ˆ + .
  • Diverse cultural conventions
      Case mapping ΟΣΟΣ → οσος (Greek)
      Sorting ... < H < CH < I < ... (Slovak)

Camomile modules - Initialization

    module Camomile = CamomileLibrary.Make(Parameters)

    Parameter: sig
      val datadir : string
      val charmapdir : string
      val unimapdir : string
      val localedir : string
    end

  • datadir : Location of compiled Unicode database
  • charmapdir : Location of compiled mapping tables for character encodings
  • unimapdir : Location of compiled mapping tables for East Asian encodings
  • localedir : Location of compiled locale data

Camomile modules - UChar

    type t
    exception Out_of_range
    val char_of : t -> char
    val of_char : char -> t
    val code : t -> int
    val chr : int -> t
    val eq : t -> t -> bool
    val compare : t -> t -> int

  • type t, exception Out_of_range : Unicode type and exception
  • char_of, of_char : Conversion to/from char
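
To make the UChar signature concrete, here is a minimal self-contained sketch of a module with the same interface. It is not Camomile's implementation. The module name UCharSketch, the representation of t as an int, the choice of Invalid_argument for out-of-range integers in chr, and the Latin-1 reading of char are all assumptions made for illustration. The two lists at the end show the "combining characters" point from the earlier slide: ä can be one code point or two.

```ocaml
(* Hypothetical stand-in for the UChar signature shown on the slide.
   Assumption: a code point is stored as a plain int. *)
module UCharSketch : sig
  type t
  exception Out_of_range
  val char_of : t -> char
  val of_char : char -> t
  val code : t -> int
  val chr : int -> t
  val eq : t -> t -> bool
  val compare : t -> t -> int
end = struct
  type t = int
  exception Out_of_range

  (* Valid values: 0x0 - 0x10ffff, excluding the surrogate range. *)
  let is_scalar n =
    (0 <= n && n < 0xD800) || (0xE000 <= n && n <= 0x10FFFF)

  (* Assumption: reject invalid integers with Invalid_argument. *)
  let chr n = if is_scalar n then n else invalid_arg "UCharSketch.chr"

  let code u = u

  (* Assumption: an OCaml char is read as a Latin-1 code point. *)
  let of_char c = Char.code c

  (* Only code points below 0x100 fit into an OCaml char. *)
  let char_of u = if u < 0x100 then Char.chr u else raise Out_of_range

  let eq (a : t) (b : t) = a = b

  let compare (a : t) (b : t) =
    if a < b then -1 else if a > b then 1 else 0
end

(* Two representations of "ä": precomposed, and a + combining diaeresis. *)
let precomposed = [ UCharSketch.chr 0xE4 ]
let decomposed = [ UCharSketch.chr 0x61; UCharSketch.chr 0x308 ]

let () =
  Printf.printf "precomposed: %d code point(s), decomposed: %d code point(s)\n"
    (List.length precomposed) (List.length decomposed)
```

Keeping t abstract means a caller can only build a value through chr or of_char, so every value of type t in this sketch is a valid code point. Comparing the two lists element by element reports them as different, which is why collation and search need more than code point equality.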
