CONNECT TO COMMUNITY. At SunGard Summit, come together as a community dedicated to education.

UTF-8 data in Banner 8.0, now what?

Presented by: Arnold J. Smith III
Virginia Tech
March 23, 2009
Course ID 0996

Session Rules of Etiquette

• Please turn off your cell phone/pager
• If you must leave the session early, please do so as discreetly as possible
• Please avoid side conversations during the session

Thank you for your cooperation!

Introduction

• Share information about UTF-8
  —The concept of “speaking” UTF-8
  —Testing and test data
  —Considerations beyond the database
  —Configuration information
• Questions and Answers

Agenda

• Does your software “speak” UTF-8?
  —“Speak” UTF-8? What do you mean?
  —Single versus multiple bytes
  —Test data
• UTF-8 and the outside world
  —Interfaces and UTF-8
  —Some interfaces will not support Unicode
• Additional UTF-8 information
  —Tips and tricks
  —Various configuration options explained
• API user exits at Virginia Tech

Does your software “speak” UTF-8?

Speak “UTF-8”? What do you mean?

• What do the letters “Summit” mean?
  —A meeting/retreat typically involving the highest-level decision makers within a given domain
  —summit -> gipfel -> cumbre
• What is “a”?
  —A number we have all agreed represents the letter “a”
  —Unicode notation U+0061

Encoding       Decimal   Hex
ASCII          97        61
Windows-1252   97        61
UTF-8          97        61
EBCDIC         129       81
UTF-16         00 97     00 61
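The values in the table can be reproduced with a short Python sketch. The codec names are Python's own, and `cp037` is only an assumption standing in for EBCDIC, since the slide does not say which EBCDIC code page it used; `encoded_forms` is a hypothetical helper name.

```python
# Encode the single character "a" (U+0061) under several encodings.
# cp037 is assumed here as a representative EBCDIC code page.
ENCODINGS = ["ascii", "cp1252", "utf-8", "cp037", "utf-16-be"]

def encoded_forms(ch):
    """Return {encoding: (decimal byte values, hex string)} for one character."""
    forms = {}
    for name in ENCODINGS:
        raw = ch.encode(name)
        forms[name] = (list(raw), raw.hex(" "))
    return forms

for name, (decimal, hexa) in encoded_forms("a").items():
    print(f"{name:10} decimal={decimal} hex={hexa}")
```

Only the EBCDIC and UTF-16 rows differ from ASCII, which is why plain English text usually survives a move to UTF-8 unchanged.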

UTF-8 encoding

• Standard characters, symbols, and control characters
  —Use 1 byte
  —Standard keyboard
  —Decimal value < 128
  —Hex value < 80
  —Identical to ASCII, Windows-1252, ISO-8859-1
• Extended characters, symbols, etc.
  —Use 2 to 4 bytes
  —Supports multiple languages and character sets

Character   Windows-1252   UTF-8      Unicode
ä           E4             C3 A4      U+00E4
™           99             E2 84 A2   U+2122
⌁           (none)         E2 8C 81   U+2301
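As a rough check of the byte counts above, the following Python sketch encodes each sample character both ways. `byte_forms` is a hypothetical helper, and Python's `cp1252` codec is assumed to be a close enough stand-in for Windows-1252.

```python
# Compare the bytes a character needs in Windows-1252 and in UTF-8.
def byte_forms(ch):
    """Return (windows_1252_hex or None, utf8_hex, code_point) for one character."""
    try:
        cp1252 = ch.encode("cp1252").hex(" ").upper()
    except UnicodeEncodeError:
        cp1252 = None  # the character does not exist in Windows-1252
    utf8 = ch.encode("utf-8").hex(" ").upper()
    return cp1252, utf8, f"U+{ord(ch):04X}"

for ch in ["a", "ä", "™", "\u2301"]:
    print(ch, byte_forms(ch))
```

A character that is one byte in Windows-1252 can be two or three bytes in UTF-8, and some characters have no Windows-1252 form at all.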

Single versus multiple bytes

• “Tämpico™” might appear as “TÃ¤mpicoâ„¢”

UTF-8   Hex        Windows-1252
T       54         T
ä       C3 A4      Ã¤
m       6D         m
p       70         p
i       69         i
c       63         c
o       6F         o
™       E2 84 A2   â„¢

• Possible reasons
  —Software not configured for UTF-8
  —Software doesn’t support UTF-8
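The garbling in the “Tämpico™” example can be simulated in Python by decoding UTF-8 bytes as Windows-1252. This is only a sketch of the effect, not of any particular product's behavior; `misread_as_cp1252` and `repair` are hypothetical names, and `repair` only works when no bytes were dropped or substituted along the way.

```python
# Simulate software that receives UTF-8 bytes but interprets them as Windows-1252.
def misread_as_cp1252(text):
    return text.encode("utf-8").decode("cp1252")

def repair(garbled):
    """Reverse the misreading by re-encoding as Windows-1252 and decoding as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")

garbled = misread_as_cp1252("Tämpico™")
print(garbled)          # each multi-byte character shows up as two or three characters
print(repair(garbled))  # the original text, provided nothing was lost
```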

Practical Example

• sqlplus on database server
  —Sun Solaris 5.10
  —Database converted to AL32UTF8
  —NLS_LANG=AMERICAN_AMERICA.AL32UTF8
  —LANG=en_US.UTF-8
• The name “Moët” appears as “MoÃ«t”
  —SSH client did not support UTF-8
  —Use PuTTY with UTF-8 enabled

Test Data

• Quality of test data is critical
  —Configuration issues
  —Support for UTF-8
• What constitutes good test data
  —Multi-byte characters
  —Maximum field length
  —Asian characters
  —Esoteric characters
• The web plus copy and paste is your friend
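One way to build such test data is to fill a field to its maximum character length with characters of each UTF-8 width. The sketch below assumes a hypothetical 60-character field; the sample characters and the helper name `make_test_values` are illustrative choices, not taken from the presentation.

```python
# Build test values that fill a field to its maximum character length,
# using characters of different UTF-8 widths.
SAMPLE_CHARS = {
    "1-byte (ASCII)": "a",
    "2-byte (Latin)": "ä",
    "3-byte (Asian)": "漢",
    "4-byte (esoteric)": "\U0001D11E",  # MUSICAL SYMBOL G CLEF
}

def make_test_values(max_chars):
    """Return {label: (value, character_count, utf8_byte_count)}."""
    values = {}
    for label, ch in SAMPLE_CHARS.items():
        value = ch * max_chars
        values[label] = (value, len(value), len(value.encode("utf-8")))
    return values

# 60 is a hypothetical field length chosen for illustration.
for label, (value, chars, nbytes) in make_test_values(60).items():
    print(f"{label:18} chars={chars} bytes={nbytes}")
```

Every value has the same character count, but the byte count grows up to fourfold, which is exactly the case that exposes length assumptions in downstream software.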

UTF-8 and the outside world

Interfaces and UTF-8

• Interface defined
  —An interface defines the communication boundary between two entities, such as a piece of software, a hardware device, or a user.
• Interfaces and character encoding
  —Both entities must agree on character encoding
  —Some interfaces specify encoding
    • XML files
    • HTTP protocol
  —Most interfaces assume a default encoding
• Some interfaces exist internal to a single machine
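XML files and HTTP are the two interfaces named here as carrying their own encoding. The Python sketch below shows where each one states it, using only standard-library calls; the sample document, the header values, and the helper name `charset_from_content_type` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from email.message import Message

# An XML file names its encoding in the declaration, so the parser can
# decode the bytes without any outside configuration.
xml_text = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Moët</name>'
xml_bytes = xml_text.encode("iso-8859-1")
name = ET.fromstring(xml_bytes).text

# An HTTP message names its encoding in the Content-Type header.
def charset_from_content_type(header_value):
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()  # None when the header names no charset

print(name)
print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

When the header carries no charset, the receiver has to fall back on a default, which is the situation the "assume a default encoding" bullet warns about.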

Multiple interface example

[Diagram: Oracle (AL32UTF8); sqlplus (LANG, NLS_LANG); SSH client (SSH client configuration)]

Interface thoughts

• Interfaces that “just worked”
  —Native Banner
  —Banner Self-Service
• Interfaces impacted by UTF-8
  —Desktop query tools
  —Reports
  —Non-Banner applications
  —Other entities on campus
  —Outside agencies (IRS, bank, etc.)
  —Outsourced systems
• Lessons learned
  —Character encoding issues existed before UTF-8
  —Need for better data flow documentation

Some interfaces will not support Unicode

• Are you really going to encounter UTF-8 data?
• Possible solutions
  —Use an NLS_LANG setting such as
    • AMERICAN_AMERICA.WE8MSWIN1252
    • AMERICAN_AMERICA.US7ASCII
  —Use a conversion tool such as iconv
• Possible data loss
  —Is the use of substitution characters acceptable?
  —Will the data be fed back into Banner?
• Possible data corruption
  —Does the data contain “keys” that will be used to access or update information in Banner?
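The data loss and key corruption risks can be seen in a small Python sketch that converts text to a narrower character set with substitution. Python substitutes “?”, which may differ from the character a given Oracle client or iconv build uses; the sample keys and the helper names `downgrade` and `is_lossy` are hypothetical.

```python
# Convert text for an interface that only accepts a narrower character set,
# substituting "?" for anything that cannot be represented.
def downgrade(text, charset):
    return text.encode(charset, errors="replace").decode(charset)

def is_lossy(text, charset):
    """True when the converted text no longer matches the original."""
    return downgrade(text, charset) != text

# Two distinct keys (hypothetical sample values) collide after substitution.
keys = ["Zoë", "Zoé"]
converted = [downgrade(k, "ascii") for k in keys]
print(converted)
print(is_lossy("Moët", "cp1252"), is_lossy("Moët", "ascii"))
```

Once two different keys convert to the same value, feeding the converted data back into the source system can no longer identify the right record.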

Additional UTF-8 information

Query Tools

• Fully support AL32UTF8
  — 2007
  —Oracle SQL Developer 1.5
• Requires NLS_LANG set to WE8MSWIN1252
  —Quest Software SQL Navigator 5.5
• Old tools that will not connect to an AL32UTF8 database
  —Oracle Data Browser (Oracle Developer/2000)
  —Oracle Query Builder (Oracle Developer 6i)

UTF-8 files on Windows XP

• Multiple tools
  —Notepad
  —WordPad
  —Microsoft Office Word 2007
  —OpenOffice.org Writer
• Encoding
  —Unicode (UTF-8)

BYTE versus CHAR semantics

• BYTE semantics
  —Specify the number of bytes
  —VARCHAR2(3 BYTE)
• CHAR semantics
  —Specify the number of characters
  —VARCHAR2(3 CHAR)
• Character semantics default
  —NLS_LENGTH_SEMANTICS
  —Session or instance level
  —Controls how VARCHAR2(3) is interpreted
• Oracle article
  —Globalize with Character Semantics
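The difference between the two declarations can be sketched in Python as a length check. This assumes the database character set is AL32UTF8, so bytes are counted in UTF-8; `fits` is a hypothetical helper, not an Oracle function.

```python
# Check whether a value fits a column declared with BYTE or CHAR length semantics.
# Assumes an AL32UTF8 database, so byte length is measured in UTF-8.
def fits(value, limit, semantics):
    if semantics == "CHAR":
        return len(value) <= limit
    if semantics == "BYTE":
        return len(value.encode("utf-8")) <= limit
    raise ValueError("semantics must be 'BYTE' or 'CHAR'")

for value in ["Moe", "Moë"]:
    print(value, fits(value, 3, "BYTE"), fits(value, 3, "CHAR"))
```

“Moë” is three characters but four UTF-8 bytes, so it fits VARCHAR2(3 CHAR) and is rejected by VARCHAR2(3 BYTE).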

NLS_LANG

• language_territory.charset
  —AMERICAN_AMERICA.AL32UTF8
  —AMERICAN_AMERICA.WE8MSWIN1252
  —AMERICAN_AMERICA.WE8ISO8859P1
  —AMERICAN_AMERICA.US7ASCII
• Oracle NLS_LANG FAQ

Database NLS_CHARACTERSET   Client NLS_LANG charset   Substitution character
AL32UTF8                    WE8ISO8859P1              ¿
AL32UTF8                    WE8MSWIN1252              ¿
AL32UTF8                    US7ASCII                  ?
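The language_territory.charset layout is easy to take apart in a script, for example to check which client character set a job will run with. The sketch below handles only fully specified values like the ones listed; `NlsLang` and `parse_nls_lang` are hypothetical names.

```python
from typing import NamedTuple

class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str

def parse_nls_lang(value):
    """Split a fully specified NLS_LANG value of the form language_territory.charset."""
    language_territory, charset = value.rsplit(".", 1)
    language, territory = language_territory.split("_", 1)
    return NlsLang(language, territory, charset)

setting = parse_nls_lang("AMERICAN_AMERICA.AL32UTF8")
print(setting.charset)
```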

LANG and locale (Solaris)

• LANG
  —Controls the character set used by the Unicode libraries employed by Banner 8.0 Pro*C
  —Adapts Solaris to a specific geographic market and corresponding character set
• Determine current locale using the locale command

    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_ALL=

• Solaris locale FAQ
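A job can verify its environment before it runs. The Python sketch below applies the usual POSIX precedence (LC_ALL over LC_CTYPE over LANG) to a dictionary of environment variables; `effective_ctype` and `is_utf8_locale` are hypothetical helpers, and the precedence rule is general POSIX behavior rather than something stated in the slides.

```python
# Resolve which setting governs character handling, using the usual
# POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
def effective_ctype(env):
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset or empty variables are skipped
            return value
    return "C"  # POSIX default when nothing is set

def is_utf8_locale(env):
    # Take the codeset between "." and any "@modifier" suffix.
    codeset = effective_ctype(env).partition(".")[2].partition("@")[0]
    return codeset.upper().replace("-", "") == "UTF8"

sample = {"LANG": "en_US.UTF-8", "LC_ALL": ""}
print(effective_ctype(sample), is_utf8_locale(sample))
```

Passing `os.environ` instead of the sample dictionary checks the live environment.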

Identify data in Oracle

• SPRIDEN names with characters beyond ASCII

    SELECT *
      FROM SPRIDEN
     WHERE SPRIDEN_CHANGE_IND IS NULL
       AND (   CONVERT(CONVERT(SPRIDEN_LAST_NAME, 'US7ASCII'),
                       'AL32UTF8', 'US7ASCII') != SPRIDEN_LAST_NAME
            OR CONVERT(CONVERT(SPRIDEN_FIRST_NAME, 'US7ASCII'),
                       'AL32UTF8', 'US7ASCII') != SPRIDEN_FIRST_NAME
            OR CONVERT(CONVERT(SPRIDEN_MI, 'US7ASCII'),
                       'AL32UTF8', 'US7ASCII') != SPRIDEN_MI
           )

Identify data in Oracle

• SPRIDEN names with characters beyond Windows-1252

    SELECT *
      FROM SPRIDEN
     WHERE SPRIDEN_CHANGE_IND IS NULL
       AND (   CONVERT(CONVERT(SPRIDEN_LAST_NAME, 'WE8MSWIN1252'),
                       'AL32UTF8', 'WE8MSWIN1252') != SPRIDEN_LAST_NAME
            OR CONVERT(CONVERT(SPRIDEN_FIRST_NAME, 'WE8MSWIN1252'),
                       'AL32UTF8', 'WE8MSWIN1252') != SPRIDEN_FIRST_NAME
            OR CONVERT(CONVERT(SPRIDEN_MI, 'WE8MSWIN1252'),
                       'AL32UTF8', 'WE8MSWIN1252') != SPRIDEN_MI
           )

Hyperion SQR 8.5 – SQR.INI

    [Default-Settings]
    Default-Numeric = V30
    NewGraphics = True
    AutoDetectUnicodeFiles = FALSE
    UseUnicodeInternal = FALSE
    Output-File-Mode = Short
    OutputTwoDigitYearWarningMsg = FALSE

Hyperion SQR 8.5 – SQR.INI

    [Environment:Common]
    Encoding = UTF-8
    Encoding-SQR-Source = ASCII
    Encoding-File-Input = UTF-8
    Encoding-File-Output = UTF-8
    Encoding-Console = UTF-8
    Encoding-Database = UTF-8
    Encoding-Report-Output = ASCII

Hyperion SQR 8.5 – write statement

• length uses byte semantics
  —Example

      write 1 from &LNAME:30 '^':1 &FNAME:30

  —Workaround

      let $LNAME = substr(&LNAME,1,30)
      let $FNAME = substr(&FNAME,1,30)
      write 1 from $LNAME '^' $FNAME

Hyperion SQR 8.5 – read statement

• length uses byte semantics
  —Example

      read 1 into $LNAME:30 $FNAME:30

  —Workaround

      read 1 into $RECORD
      let $LNAME = substr($RECORD,1,30)
      let $FNAME = substr($RECORD,31,30)
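Why the workaround reads the whole record first can be illustrated outside SQR. The Python sketch below mimics the effect with a fixed-width record of two 5-character fields (hypothetical widths and sample names); it does not reproduce SQR's actual implementation.

```python
# A fixed-width record with two 5-character fields (hypothetical widths).
def pad(value, width):
    return value[:width].ljust(width)

record = pad("Moët", 5) + pad("Rémy", 5)
raw = record.encode("utf-8")  # the bytes that land in a UTF-8 file

# Byte semantics: slicing the encoded bytes at offset 5 cuts in the wrong place.
by_bytes = (
    raw[:5].decode("utf-8", errors="replace"),
    raw[5:10].decode("utf-8", errors="replace"),
)

# Character semantics: decode the whole record first, then slice by characters.
text = raw.decode("utf-8")
by_chars = (text[:5], text[5:10])

print(by_bytes)
print(by_chars)
```

With byte offsets the second field starts one byte early and loses its last character; slicing the decoded record keeps both fields aligned.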

API user exits at Virginia Tech

• We do not officially support the full spectrum of UTF-8
• Wanted to limit critical fields to Windows-1252
• Added user exits to the following APIs
  —FB_INVOICE_HEADER
  —GB_ADDRESS
  —GB_IDENTIFICATION
  —GB_BIO
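The kind of check such a user exit performs can be sketched in Python. This is an illustration of the rule "limit critical fields to Windows-1252", not the Virginia Tech code, which is not shown in the slides; `non_cp1252_chars`, `check_field`, and the field name are hypothetical, and Python's `cp1252` codec is assumed to approximate Oracle's WE8MSWIN1252.

```python
# Reject values containing characters outside Windows-1252 and report the offenders.
def non_cp1252_chars(value):
    """Return the characters in value that Windows-1252 cannot represent."""
    offenders = []
    for ch in value:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            offenders.append(ch)
    return offenders

def check_field(field_name, value):
    offenders = non_cp1252_chars(value)
    if offenders:
        code_points = ", ".join(f"U+{ord(c):04X}" for c in offenders)
        raise ValueError(
            f"{field_name} contains characters outside Windows-1252: {code_points}"
        )
    return value

print(check_field("last_name", "Moët"))
```

Reporting the offending code points, rather than silently substituting, lets the person entering the data correct it at the source.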

Summary

• Multi-byte nature of UTF-8
• Importance of quality test data
• Interfaces and character set encoding
  —Do you know how data flows in and out?
• Level of support for UTF-8

Questions & Answers

Thank You!
Arnold J. Smith III

Please complete the online class evaluation form.
Course ID 0996

SunGard, the SunGard logo, Banner, Campus Pipeline, Luminis, PowerCAMPUS, Matrix, and Plus are trademarks or registered trademarks of SunGard Data Systems Inc. or its subsidiaries in the U.S. and other countries. Third-party names and marks referenced herein are trademarks or registered trademarks of their respective owners.

© 2009 SunGard. All rights reserved.
