
Unicode in SAP NetWeaver

Sebastian Buhlinger SAP Consultant, HP-SAP EMEA CC

1. Introduction to Unicode

2. Unicode & SAP in General

3. Technology in Depth

4. Sizing Information for Unicode-based SAP Systems

3/31/2004 2 Introduction to Unicode

1. Introduction to Unicode

• What is text?
• History of character encoding
• Problem of character encoding
• From ASCII to Unicode
• What is Unicode exactly?
• The Unicode Standard
• Where is Unicode used?
• The Unicode Consortium
• Unicode encodings

What is text?

• Code pages and encodings describe how text is handled and stored in:
  • computers
  • files
  • data structures
• Inside a computer program or data file, text is stored as a sequence of numbers, just like everything else
• A character can be a letter, digit, period, hyphen, punctuation mark or mathematical symbol
• In addition, there are control characters, which are typically not visible
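To make "text is a sequence of numbers" concrete, here is a minimal Python sketch (Python is used only for illustration). It lists the number assigned to each character of a short string and the bytes stored when the string is encoded as ASCII.

```python
text = "SAP R/3"

# Each character is identified by a number (its code point).
codes = [ord(ch) for ch in text]

# In a file or in memory, the same text is a sequence of bytes.
stored = text.encode("ascii")

print(codes)         # [83, 65, 80, 32, 82, 47, 51]
print(list(stored))  # the same values: ASCII uses one byte per character
```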

History of Character Encoding

• Historically, computers were slow, had little memory and were very expensive
• Up to the 1960s, I/O meant punching holes into paper tape
• Most character sets date back to the punch-card age and were designed with these cards in mind
• In the early days of computing, every hardware manufacturer used proprietary technology (and encodings)
• International data interchange was not an issue, so there was no need for encodings to be compatible

Problem of character encoding

• Which number is assigned to which character?
• When you type an ‘A’ on the keyboard, the computer uses the character code to look up the glyph stored under the same binary number in a font file, and displays or prints it
• The character ‘A’ may also have different integer values in different programs or data files (the number used for ‘A’ might map to an Arabic letter in an Arabic font file)
• In some instances, no number is available for certain characters (e.g. “ä” → Ä)
• All data is encoded in the form of binary numerical codes

Character repertoire

• English alphabet plus digits and a few symbols: ~60 characters
• Western European standard: ~300 characters for several languages
• Korean: ~12.000 syllables
• Chinese dictionaries: ~50.000 characters
• Hundreds of other characters in common use, such as mathematical and currency symbols

From ASCII to Unicode

• Most character sets and encodings of the 1970s and 1980s were modifications or extensions of ASCII
• Many of them were 8-bit encodings that kept the 94 printable ASCII characters (or a subset of them)
• The most common encodings today use a single byte per character (single-byte character sets, SBCS)
• They are all limited to 256 characters
• Because of this, none of them can cover even all the letters needed for the Western European languages
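The 256-character limit can be checked with a short Python sketch. Python's `latin-1` (ISO 8859-1) and `iso8859_2` codecs stand in for typical single-byte code pages here; the sample words are illustrative choices, not taken from the slides.

```python
samples = {
    "German": "Straße",
    "French": "cœur",      # œ (U+0153) is not part of ISO 8859-1
    "Hungarian": "Erdős",  # ő (U+0151) is not part of ISO 8859-1
}

def fits(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

for language, word in samples.items():
    print(language,
          "ISO 8859-1:", fits(word, "latin-1"),
          "ISO 8859-2:", fits(word, "iso8859_2"),
          "UTF-8:", fits(word, "utf-8"))
```

Each single-byte code page covers some languages and misses others, while UTF-8 accepts all three words.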

From ASCII to Unicode

• Consequence: many different 8-bit encodings were created to fulfill the needs of different user communities
• Solution for data interchange in a global, networked information society and a collaborative business world: a single character set for all languages in use
• The Unicode code space provides 1.114.112 code points (U+0000 to U+10FFFF) for characters, symbols and control characters
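The size of the code space follows from its structure: 17 planes of 65.536 code points each. A minimal Python check (Python's `chr` accepts exactly this range):

```python
PLANES = 17                     # Basic Multilingual Plane + 16 supplementary planes
CODE_POINTS_PER_PLANE = 0x10000

total = PLANES * CODE_POINTS_PER_PLANE
print(total)                    # 1114112 code points, U+0000 .. U+10FFFF

print(hex(ord(chr(0x10FFFF))))  # the highest code point is accepted
try:
    chr(0x110000)               # one past the end of the code space
except ValueError as err:
    print("rejected:", err)
```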

What is Unicode exactly?

• Unicode is a universal character encoding that can store text in any language
• Unicode:
  • defines properties for each character
  • standardizes script behavior
  • provides a standard algorithm for bidirectional text
  • defines cross-mappings to other standards
• Unicode defines a unique code value for every character, regardless of the platform, program or programming language used
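The per-character properties are available programmatically. A small sketch using Python's standard `unicodedata` module, which exposes the Unicode Character Database (the chosen characters are examples, not from the slides):

```python
import unicodedata

for ch in ["A", "ä", "\u05D0", "\u0664"]:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),           # official character name
        unicodedata.category(ch),       # general category, e.g. Lu = uppercase letter
        unicodedata.bidirectional(ch),  # class used by the bidirectional algorithm
    )
```

The Hebrew letter alef reports bidi class `R` (right-to-left), and the Arabic-Indic digit four reports `AN` (Arabic number). The bidirectional algorithm uses exactly these classes to order mixed-direction text.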

What is Unicode exactly?

• The Unicode Standard primarily encodes scripts rather than languages
• A script is a set of symbols that several languages have historically shared
• In many cases, one script serves to write dozens of languages (e.g. the Latin script)
• In other cases, one script corresponds to a single language (e.g. Hangul)

What is Unicode exactly?

• Additionally, it includes punctuation marks, diacritics, mathematical symbols, technical symbols, musical symbols, arrows, dingbats, etc.

• In all, version 4.0 of the Unicode Standard encodes more than 95.000 characters, including ideographs and symbols

The Unicode Standard

• The Unicode Standard is a character coding system designed to support the worldwide interchange, processing and display of the written texts of the diverse languages and technical disciplines of the modern world

• In addition, it supports classical and historical texts of many written languages

Where is Unicode used?

• The Unicode Standard has been adopted by many software and hardware vendors
• Most operating systems support Unicode
• Unicode is required for international document and data interchange, the Internet and the WWW, and is therefore used by modern technologies such as:
  • Java, C#, Perl, Python
  • markup languages such as XML, HTML, XHTML, MathML, WML, etc.
  • JavaScript
  • LDAP
  • CORBA, etc.

The Unicode Consortium

• The Unicode Consortium is a non-profit organization originally founded to develop, extend and promote the use of the Unicode Standard

• Members of the Consortium include major computer corporations, software producers, database vendors, research institutions, international agencies, various user groups and interested individuals

The Unicode Consortium

• The Consortium cooperates with the W3C and ISO, and has liaison status "C" with ISO/IEC JTC 1/SC 2/WG 2, which is responsible for refining the specification and expanding the character set of ISO/IEC 10646

Unicode Encodings

• UTF = Unicode Transformation Format
• UCS = Universal Character Set
• CESU = Compatibility Encoding Scheme for UTF-16

• Conversion between the different Unicode encodings is a simple bitwise operation defined in the standard
• No performance-intensive conversion tables are necessary!
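The claim that conversion needs only bit operations can be illustrated with a hand-written UTF-8 encoder. This is a minimal sketch for illustration (it does not reject the surrogate range U+D800..U+DFFF), checked against Python's built-in UTF-8 codec:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 using only shifts and bit masks."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if cp < 0x80:                                # 0xxxxxxx (7-bit ASCII unchanged)
        return bytes([cp])
    if cp < 0x800:                               # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                             # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),             # 11110xxx + three trailing bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for ch in "A\u00C6\u9875\U0001D11E":
    print(f"U+{ord(ch):04X}", utf8_encode(ord(ch)).hex(" ").upper())
```

No lookup table is involved. The byte pattern is derived from the code point value alone.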

Unicode Encodings

• UTF-8: Unicode Transformation Format based on an 8-bit representation

• CESU-8: Compatibility Encoding Scheme for UTF-16 on an 8-bit base

• UTF-16: Unicode Transformation Format based on a 16-bit representation
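The difference between UTF-8 and CESU-8 only shows up for supplementary characters. CESU-8 takes the UTF-16 code units (including surrogates) and writes each one with the UTF-8 byte pattern. The following is a minimal Python sketch of that idea. Python has no built-in CESU-8 codec, so the `surrogatepass` error handler is used to write the individual surrogate code units.

```python
def cesu8_encode(text: str) -> bytes:
    """CESU-8: encode UTF-16 code units (not code points) with UTF-8 byte patterns."""
    units = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(units), 2):
        unit = int.from_bytes(units[i:i + 2], "big")
        # A lone surrogate code unit can only be written with 'surrogatepass'.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)

clef = "\U0001D11E"  # musical symbol G clef, a supplementary character
print(clef.encode("utf-8").hex(" ").upper())   # F0 9D 84 9E (4 bytes)
print(cesu8_encode(clef).hex(" ").upper())     # ED A0 B4 ED B4 9E (6 bytes)
```

For characters of the Basic Multilingual Plane, both encodings produce identical bytes.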

Unicode Encodings

• UCS-2: Universal Character Set, 2-byte variant (16-bit)

• UTF-32: Unicode Transformation Format based on a 32-bit representation

• UCS-4: Universal Character Set, 4-byte variant (32-bit)

Unicode Encodings

• Not all Unicode characters are 2 bytes long, so hardware requirements do not simply double
• The Unicode encoding in use determines the length of a character in bytes
• In a Unicode encoding, a character can be longer than 1 byte; therefore, Unicode characters can be longer than the characters of a single-byte standard code page
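The byte length of a character depends on the encoding. This short Python sketch measures it for the characters used in Example #1. The `-le` codec variants are used so that no byte order mark is counted.

```python
samples = ["A", "\u00C4", "\u0664", "\u9875", "\U0001D11E"]
encodings = ["utf-8", "utf-16-le", "utf-32-le"]

for ch in samples:
    sizes = {enc: len(ch.encode(enc)) for enc in encodings}
    print(f"U+{ord(ch):04X}", sizes)
```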

UTF-8

• UTF-8 is the 8-bit encoding of Unicode
• It is a variable-width encoding and a strict superset of 7-bit ASCII
• “Strict superset” means that every character in 7-bit ASCII is available in UTF-8 with the same code point value
• One character occupies 1 to 4 bytes in the encoding
• Characters from European scripts: 1 or 2 bytes
• Asian scripts: 3 or 4 bytes

UTF-8

• UTF-8 is used on UNIX platforms, in HTML and by most Internet browsers
• Main benefits of UTF-8:
  • compact storage for European scripts: in general, European scripts occupy less disk space and memory
  • ease of migration: since 7-bit ASCII data remains the same in UTF-8, the effort of converting data between ASCII-based character sets and UTF-8 is reduced significantly
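The migration benefit follows from the byte level. The Python sketch below shows that 7-bit ASCII text is byte-for-byte identical in ASCII and UTF-8, while a Latin-1 character above 0x7F changes from 1 byte to 2. The sample strings are arbitrary.

```python
ascii_text = "ORDER 4711 / PLANT 1000"
latin1_text = "Müller"

# 7-bit ASCII: identical bytes, nothing to convert.
print(ascii_text.encode("ascii") == ascii_text.encode("utf-8"))  # True

# Above 0x7F the representation changes: ü is 1 byte in Latin-1, 2 bytes in UTF-8.
print(latin1_text.encode("latin-1").hex(" "))  # 4d fc 6c 6c 65 72
print(latin1_text.encode("utf-8").hex(" "))    # 4d c3 bc 6c 6c 65 72
```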

UTF-8 / CESU-8 (8-bit encodings)

• 8-bit encodings are well-suited for data transfer, since all 7-bit ASCII characters keep their byte values and the 8-bit ISO 8859-1 characters keep their code point numbers (although they are encoded in 2 bytes)

• Easier communication with legacy and non-Unicode systems

• Downside: variable character length

UCS-2

• UCS-2 has a fixed width of 16 bits (2 bytes)
• UCS-2 is the Unicode encoding used by Java and Windows NT 4.0
• Main benefits of UCS-2:
  • more compact storage for Asian scripts (each character is represented with only 2 bytes)
  • faster string processing, because all characters have the same width
  • good compatibility with Java and Microsoft clients

• Downside:
  • UCS-2 can only represent the 65.536 code points of the Basic Multilingual Plane, i.e. the characters defined up to Unicode 3.0

UTF-16

• UTF-16 is the 16-bit encoding of Unicode
• It is basically an extension of UCS-2
• One Unicode character occupies 2 or 4 bytes in the encoding
• Characters from European and most Asian scripts are represented in 2 bytes
• Supplementary characters are represented in 4 bytes (a surrogate pair)
• UTF-16 is the main Unicode encoding of Windows as of Windows 2000
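The 4-byte form for supplementary characters is a surrogate pair, again computed with bit operations. A minimal Python sketch that reproduces the `D834 DD1E` value for U+1D11E from Example #1:

```python
def utf16_code_units(cp: int) -> list[int]:
    """Split a code point into UTF-16 code units (one unit, or a surrogate pair)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if cp < 0x10000:
        return [cp]                  # BMP: one 16-bit unit, same value as UCS-2
    v = cp - 0x10000                 # remaining 20 bits
    high = 0xD800 | (v >> 10)        # top 10 bits -> high (lead) surrogate
    low = 0xDC00 | (v & 0x3FF)       # bottom 10 bits -> low (trail) surrogate
    return [high, low]

print([f"{u:04X}" for u in utf16_code_units(0x1D11E)])  # ['D834', 'DD1E']
```

UCS-2 stops at the first branch. It cannot express values above U+FFFF, which is why it lacks the supplementary characters.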

UTF-16

• Main benefits of UTF-16:
  • more compact storage for Asian scripts (2 bytes for commonly used characters)
  • ideal if European and Asian scripts are used together: UTF-16 occupies less disk space and memory than UTF-8, which needs 3 bytes for the Asian part
  • balance between efficient character access and economical use of storage

• These points are the reason why SAP Web Application Server uses UTF-16

UCS-2 / UTF-16 (16-bit encodings)

• 16-bit encodings offer a compromise between the pros and cons of the 8-bit and the 32-bit encodings
• They do not need as much memory as 32-bit encodings, but offer an almost fixed character length
• UCS-2 has a fixed character length, but it cannot define more than 2^16 (65.536) characters

UTF-32

• 32-bit encoding

• Popular when memory space is not a concern

• Fixed width (4 bytes)

UCS-4 / UTF-32 (32-bit encodings)

• All 32-bit encodings have a fixed length

• This advantage is outweighed by their high memory and storage requirements

Example #1

Character               UTF-8         UCS-2   UTF-16
A                       41            0041    0041
c                       63            0063    0063
Æ                       C3 86         00C6    00C6
ö                       C3 B6         00F6    00F6
٤ (U+0664)              D9 A4         0664    0664
CJK ideograph (U+9875)  E9 A1 B5      9875    9875
𝄞 (U+1D11E)             F0 9D 84 9E   N/A     D834 DD1E

Example #2: character 가 (U+AC00)

UTF-8    HEX  E    A    B    0    8    0
         BIN  1110 1010 1011 0000 1000 0000

Remove the lead-byte indicator (1110) and the trailing-byte indicators (10):
         1010  110000  000000

Regroup the bits:  1010 1100 0000 0000

UTF-16   BIN  1010 1100 0000 0000
         HEX  A    C    0    0
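The manual steps of Example #2 can be written as a small decoder. This is a minimal Python sketch for a single UTF-8 sequence: it strips the marker bits of the lead byte and of each trailing byte, and then concatenates the remaining payload bits.

```python
def utf8_decode(data: bytes) -> int:
    """Decode one UTF-8 sequence into its code point by stripping the marker bits."""
    lead = data[0]
    if lead < 0x80:                 # 0xxxxxxx
        length, cp = 1, lead
    elif lead >> 5 == 0b110:        # 110xxxxx
        length, cp = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:       # 1110xxxx
        length, cp = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:      # 11110xxx
        length, cp = 4, lead & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(data) != length:
        raise ValueError("wrong sequence length")
    for byte in data[1:]:
        if byte >> 6 != 0b10:       # trailing bytes must look like 10xxxxxx
            raise ValueError("invalid trailing byte")
        cp = (cp << 6) | (byte & 0x3F)
    return cp

cp = utf8_decode(bytes([0xEA, 0xB0, 0x80]))
print(hex(cp))                      # 0xac00
print(cp.to_bytes(2, "big").hex())  # ac00, i.e. the UTF-16 (big endian) code unit
```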

Unicode & SAP in General

2. Unicode & SAP in General

• Languages and characters
• Characters on Disk/Memory
• Code Pages
• SAP & Code Pages
• Language Combinations before Unicode
• Recommendations from SAP (w/o Unicode)
• Unicode-compliant SAP products
• When/why do customers need Unicode?

Languages and characters

• Languages are written in fonts
• Only a few languages use the same fonts
• A font is a group of characters

Characters on Disk/Memory

• A character is stored as a byte sequence on disk
• A code page defines the mapping between the byte sequence and a character

Code Pages

• The code page determines which characters you can see and enter

Code Pages

• Different code pages map different characters to the same byte sequence

[Figure: single-byte and double-byte characters on disk/memory]
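The following Python sketch demonstrates the same effect. One stored byte, 0xC4, is interpreted under three single-byte code pages, and a two-byte sequence is interpreted under a double-byte code page. Python's codecs stand in for the code pages; they are not the SAP code page tables.

```python
raw = bytes([0xC4])

for codec in ["latin-1", "cp1251", "iso8859_7"]:
    print(codec, raw.decode(codec))   # Ä (Latin-1), Д (Cyrillic), Δ (Greek)

# Double-byte example: in Shift-JIS the two bytes 0x82 0xA0 form one character.
print(bytes([0x82, 0xA0]).decode("shift_jis"))  # あ (U+3042)
```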

SAP & Code Pages

Language Combinations before Unicode

• Single standard code pages
  • support specific sets of languages
  • the number and combination of supported languages cannot be altered

• Standard code pages and R/3 languages (w/o EBCDIC)

[Table: standard code pages and R/3 languages, including double-byte code pages]

Language Combinations before Unicode

• It is also possible to specify a customer-specific language; this language must use one of the code pages that SAP supports (see Note 0112065)

Language Combinations before Unicode

• Blended code pages (as of Release 3.1D)
  • SAP proprietary code pages that contain characters from one or more standard code pages
  • increase the number of language combinations that can be used
  • functionally, a blended code page system is a single code page system
  • users can see and enter all characters contained in the code page, regardless of their log-on language

Language Combinations before Unicode

[Table: SAP code pages and the languages they support]

Language Combinations before Unicode

• The availability of SAP blended code pages is platform-dependent, because SAP blended locales need to be created for each platform
• Blended locale status (x = available, -- = not available)

[Table: blended locale status per platform]

Language Combinations before Unicode

• MDMP (as of Release 3.1I): Multi-Display / Multi-Processing
  • allows dynamic code page switching on the application server
  • therefore permits any combination of standard code pages on one system
  • the log-on language determines the code page that is active for each user
  • an MDMP system is recommended if:
    1. one or more additional code pages are required to add languages to your existing installation
    2. a blended code page cannot support the combination of languages you need for a new installation

For example, an MDMP system with the code pages 1100 and 8000 allows German and Japanese users to log on to the same R/3 system in their respective languages.

Language Combinations before Unicode: Front-End Example

[Figure: a Japanese front end (code page 8000, SJIS) and a German front end (code page 1100, ISO-1) connected to the same application and database server]

• Each user can only access one code page at a time: a user who logs on as a Japanese user cannot enter German characters, and German characters in the database will not be displayed correctly

Language Combinations before Unicode: Example

[Figure: Japanese user and German user]

Language Combinations before Unicode

Please note:
• It is possible for a user to log on in German and then manipulate the character set and font settings so that he or she can enter what appear to be Japanese characters; these characters will not be stored correctly in the database, and the data will be corrupted

• A user who wants to enter Japanese, for example, must log on in Japanese

Language Combinations before Unicode

Please note:
• To ensure that no data corruption occurs, the following restrictions must be observed:
  • global data must contain only 7-bit ASCII characters, which are present in all code pages
  • users may only use the characters of their log-on language or 7-bit ASCII
  • batch processes must be assigned the correct user ID and language
  • EBCDIC code pages are not supported
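The following Python sketch shows why these restrictions exist. It assumes the log-on language selects the code page used to interpret stored bytes, and it approximates SAP code pages 1100 and 8000 with Python's `latin-1` and `shift_jis` codecs. Both the mapping and the helper `read_as` are hypothetical and are not SAP APIs.

```python
# Hypothetical mapping: log-on language -> code page of the session.
SESSION_CODEC = {"DE": "latin-1", "JA": "shift_jis"}

def read_as(language: str, stored: bytes) -> str:
    """Interpret bytes from the database with the code page of a log-on language."""
    return stored.decode(SESSION_CODEC[language], errors="replace")

german_row = "Bär GmbH".encode(SESSION_CODEC["DE"])  # written by a German user
global_row = "ORDER-4711".encode("ascii")            # 7-bit ASCII only

print(read_as("DE", german_row))  # Bär GmbH
print(read_as("JA", german_row))  # garbled: 0xE4 is read as a Shift-JIS lead byte
print(read_as("JA", global_row))  # ORDER-4711: ASCII is identical in both code pages
```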

Recommendations from SAP (w/o Unicode)

• In general, using a single standard code page for new installations and upgrades is the optimal decision
• If additional languages or language combinations are needed, SAP recommends unambiguous blended code pages for new installations and MDMP for existing installations
• Unambiguous blended code pages only support certain language combinations; therefore, an MDMP setup may be the only option for new installations as well

Unicode-compliant SAP products

• Currently, all Unicode installations require written permission from SAP and are carried out as customer projects together with SAP; the only exception is new installations of R/3 Enterprise Extension Set 2.0

Unicode-compliant SAP products (SAP Note 79991)

✓ SAP Web Application Server (≥ 6.20)

✓ mySAP Customer Relationship Management (CRM)
  • the Unicode version of mySAP CRM 4.0 is available via Ramp-Up

✓ mySAP Supply Chain Management (SCM)
  • the Unicode version of mySAP SCM 4.0 is available via Ramp-Up

✓ mySAP Supplier Relationship Management (SRM)
  • the Unicode version of mySAP SRM 4.0 is available via Ramp-Up
  • conversions (with or without MDMP) of existing SRM installations

Unicode-compliant SAP products (SAP Note 79991)

✓ mySAP Business Intelligence (BW)
  • the Unicode version of mySAP BW 3.5 is available via Ramp-Up
  • conversion of existing BW installations as a customer project
  • SAP Note 643813 collects all relevant SAP Notes on Unicode-based SAP BW installations

✓ mySAP Product Lifecycle Management (PLM)
  • the Unicode version of mySAP PLM 4.0 is available via Ramp-Up

✓ SAP R/3 Enterprise (Extension Set 1.10 and higher)

✓ SAP Exchange Infrastructure

When/why do customers need Unicode?

• Global businesses that require IT systems to support multilingual data without any restrictions, e.g. customers with one worldwide central SAP system

• Web interfaces open the door to a global customer base, so IT systems must be able to support multiple local languages simultaneously

When/why do customers need Unicode?

• With J2EE integration, mySAP components fully support web standards, and with Unicode they can now take full advantage of XML and Java

• Only Unicode makes it possible to seamlessly integrate heterogeneous SAP and non-SAP system landscapes → NetWeaver

Technology in Depth

3. Technology in Depth

• Unicode & Operating Systems
• Unicode & Databases
• SAP Unicode-based Code Pages
• How to Unicode-enable a program
• Unicode-enabled ABAP
• Migrating to Unicode-enabled ABAP
• Unicode Conversion, IMIG Lab Test
• SAP System-to-System communication
• Printing & Output Management

Unicode & Operating Systems – HP-UX

• HP-UX has been Unicode-enabled since version 10.x
• All Unicode locales in the HP-UX operating environment are based on UTF-8
• Each locale includes a base language in the UTF-8 code set and the regional data related to this base language
• This includes local formatting rules, text messages, help messages and other related files
• Each locale also supports several other scripts for input, display, code conversion and printing

Unicode & Operating Systems – Windows

• Some Unicode support has been included in Microsoft Windows since Windows 95 and Windows NT 4
• Windows 2000 and Windows XP/2003 are based on Unicode instead of the ANSI or WGL4 character sets
• Before Windows 2000, your version of Windows may have used a different character set if you lived in a country that uses a non-Latin alphabet, such as Egypt, Greece, Israel, Russia or Thailand

Unicode & Operating Systems – Windows

• The first 128 characters were the same as in ANSI, but many of the positions in the second set of 128 were taken by characters from the Arabic, Greek, Hebrew, Cyrillic or Thai alphabets

• This caused, and still causes, problems when documents are moved between operating systems such as DOS, Windows, Mac OS and UNIX, or exchanged electronically between computers that use different character sets

Unicode & Operating Systems – Linux

• Before UTF-8 emerged, Linux users all over the world had to use various language-specific extensions of ASCII

• The most popular were ISO 8859-1 and ISO 8859-2 in Europe, ISO 8859-7 in Greece, KOI-8 / ISO 8859-5 / CP1251 in Russia, EUC and Shift-JIS in Japan, BIG5 in Taiwan, etc.

• This made the exchange of files difficult, and application software had to deal with various small differences between these encodings

Unicode & Operating Systems – Linux

• Because of these difficulties, major Linux distributors and application developers have now started to phase out these legacy encodings in favor of UTF-8

• UTF-8 support has improved dramatically over the last few years, and more and more people now use UTF-8 on a daily basis in:
  • text files (source code, HTML files, email messages, etc.)
  • file names
  • standard input and standard output, pipes
  • …

Unicode & Operating Systems – Linux

• In UTF-8 mode, terminal emulators (such as xterm) transform every keystroke into the corresponding UTF-8 sequence and send it to the stdin of the foreground process

• Similarly, any output of a process on stdout is sent to the terminal emulator, where it is processed by a UTF-8 decoder and then displayed using a 16-bit font

Unicode & Operating Systems – Linux

• Before you start experimenting with UTF-8 under Linux, update your installation to a recent distribution with up-to-date UTF-8 support

• This applies in particular if you use an installation older than SuSE 8.1 or Red Hat 8.0

• Before these releases, UTF-8 support was far too limited and experimental to be recommended for daily use

Little vs. Big Endian

• UCS and Unicode are, first of all, just code tables that assign integer numbers to characters

• There are several alternatives for representing a sequence of such characters (or their integer values) as a sequence of bytes

• The two most obvious encodings store Unicode text as sequences of either 2-byte or 4-byte units

Little vs. Big Endian

• The official terms for these encodings are UCS-2 and UCS-4, respectively
• Unless otherwise specified, the most significant byte comes first (Big Endian convention)
• An ASCII or Latin-1 file can be transformed into a UCS-2 file by simply inserting a 0x00 byte in front of every byte
• To obtain a UCS-4 file, three 0x00 bytes have to be inserted in front of every byte instead
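The zero-byte trick works because the 256 byte values of Latin-1 coincide with the code points U+0000..U+00FF. This is a minimal Python sketch that widens Latin-1 bytes and compares the result with Python's big-endian UTF-16 and UTF-32 codecs. For Latin-1 text, these codecs produce the same bytes as UCS-2 and UCS-4.

```python
def widen(data: bytes, zeros: int) -> bytes:
    """Prefix every Latin-1 byte with `zeros` 0x00 bytes (big-endian code units)."""
    return b"".join(b"\x00" * zeros + bytes([b]) for b in data)

text = "Grüße"
latin1 = text.encode("latin-1")

ucs2 = widen(latin1, 1)   # 2 bytes per character
ucs4 = widen(latin1, 3)   # 4 bytes per character

print(ucs2 == text.encode("utf-16-be"))  # True
print(ucs4 == text.encode("utf-32-be"))  # True
```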

Little vs. Big Endian

Character               Unicode scalar value   UTF-8 / CESU-8   UTF-16 (Little Endian)   UTF-16 (Big Endian)
A                       U+0041                 41               41 00                    00 41
Ä                       U+00C4                 C3 84            C4 00                    00 C4
α                       U+03B1                 CE B1            B1 03                    03 B1
א                       U+05D0                 D7 90            D0 05                    05 D0
CJK ideograph           U+6653                 E6 99 93         53 66                    66 53
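The table can be verified with a short Python sketch. The byte order only affects the 16-bit encodings; UTF-8 is a byte sequence and has no endianness.

```python
rows = [("A",      "41",       "41 00", "00 41"),
        ("\u00C4", "c3 84",    "c4 00", "00 c4"),
        ("\u03B1", "ce b1",    "b1 03", "03 b1"),
        ("\u05D0", "d7 90",    "d0 05", "05 d0"),
        ("\u6653", "e6 99 93", "53 66", "66 53")]

for ch, utf8, le, be in rows:
    assert ch.encode("utf-8").hex(" ") == utf8
    assert ch.encode("utf-16-le").hex(" ") == le
    assert ch.encode("utf-16-be").hex(" ") == be
    print(f"U+{ord(ch):04X} OK")

# The plain "utf-16" codec prepends a byte order mark (BOM) so readers can tell LE from BE.
print("A".encode("utf-16").hex(" "))  # "ff fe 41 00" on little-endian machines
```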

Unicode & Databases

Databases supported by SAP (WAS 6.20)

✓ = available, ? = currently not available, -- = not supported in general

              Win2K   HP-UX   Solaris   AIX   OS/400   OS/390   Linux
SQL Server    ✓       --      --        --    --       --       --
Oracle        ✓       ✓       ✓         ✓     --       --       ✓
DB2           ✓       ✓       ✓         ✓     ✓        ?        ✓
SAP DB        ✓       ✓       ✓         ✓     --       --       ✓

Unicode & Databases

Manufacturer   Version / platform   Encoding
SQL Server     2000                 UTF-16
Oracle         7.2                  UTF-8
Oracle         8                    UTF-8
Oracle         9i                   UTF-8 / UTF-16
Oracle         10g                  UTF-8 / UTF-16
DB2            AIX                  CESU-8
DB2            AS/400               UTF-16
SAP DB         7.0                  UTF-16
SAP DB         8.0                  UTF-8

SAP Unicode-based Code Pages

• With the Unicode enablement of the mySAP.com components (see chapter #1), the old code page management had to be changed
• Instead of SAP character numbers, all code pages are now based on Unicode character IDs
• → 5-digit SAP character numbers are no longer adequate

This change is valid for both Unicode and non-Unicode systems!


SAP Unicode-based Code Pages

• The connection between SAP character numbers and Unicode character IDs is stored in table TCP01
• You can see the connection in the character section of SPAD
• NOTE: not every character has a corresponding Unicode character ID! For example:

SAP Unicode-based Code Pages

• All SAP code pages were migrated from the old to the new format using report RSCP0126
• Code pages are still defined in table TCP00

Customers must migrate their own code pages (9xxx) themselves, using RSCP0126!

How to Unicode-enable a program

• Separate Unicode and non-Unicode versions of R/3, one ABAP source

Non-Unicode R/3:
• 1 character = 1 byte (types C, N, D, T, STRING)
• Non-Unicode kernel
• Non-Unicode database

Unicode R/3:
• 1 character = 2 bytes → UTF-16 (types C, N, D, T, STRING)
• Unicode kernel
• Unicode database

• No explicit Unicode data type in ABAP
• Single ABAP source for Unicode and non-Unicode systems

How to Unicode-enable a program

• Most ABAP code is ready for Unicode without any changes
• A small part of ABAP code has to be adapted to comply with Unicode restrictions (e.g. syntactical restrictions)

How to Unicode-enable a program

• Program attribute "Unicode checks active"

Unicode-Enabled ABAP

Design goals:
• Platform independence
  • identical behavior on Unicode and non-Unicode systems
• Highest level of compatibility with the pre-Unicode world
  • minimize the costs of Unicode-enabling ABAP programs

Main features:
• Clear distinction between character and byte processing: 1 character <> 1 byte
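The "1 character <> 1 byte" rule is easiest to see with offset access on a flat record. The following Python sketch is an illustration only, not ABAP. It stores a hypothetical record of two 5-character fields, using `latin-1` to stand in for a non-Unicode system and `utf-16-le` to stand in for a Unicode system. The record layout and helper names are assumptions made for the example.

```python
FIELD_LEN = 5  # hypothetical record: NAME and CITY, each 5 characters

def encode_record(name: str, city: str, encoding: str) -> bytes:
    """Pad both fields to 5 characters and store them back to back."""
    return (name.ljust(FIELD_LEN) + city.ljust(FIELD_LEN)).encode(encoding)

def city_by_byte_offset(raw: bytes, encoding: str) -> str:
    """Pre-Unicode habit: assume 1 character = 1 byte, so CITY starts at byte 5."""
    return raw[FIELD_LEN:2 * FIELD_LEN].decode(encoding, errors="replace")

def city_by_char_offset(raw: bytes, encoding: str) -> str:
    """Unicode-safe: decode first, then apply character offsets."""
    return raw.decode(encoding)[FIELD_LEN:2 * FIELD_LEN]

for enc in ("latin-1", "utf-16-le"):
    raw = encode_record("Meier", "Köln", enc)
    print(enc, repr(city_by_byte_offset(raw, enc)), repr(city_by_char_offset(raw, enc)))
```

Byte offsets happen to work on the 1-byte system but return garbage on the 2-byte system. Character offsets work on both, which is the behavior that Unicode-enabled ABAP enforces.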

Unicode-Enabled ABAP

• ABAP lists: difference between memory length and display length
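In a double-byte code page such as Shift-JIS, the byte length of Japanese text happened to equal its display width. In Unicode this no longer holds: "日本" is 2 characters but needs 4 columns. The following Python sketch uses the East Asian Width property from `unicodedata` as an approximation of display width; it is not how ABAP computes list widths internally.

```python
import unicodedata

def display_width(text: str) -> int:
    """Columns needed on screen: wide (W) and fullwidth (F) characters take two."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)

for text in ("Meier", "日本"):
    print(
        text,
        "chars:", len(text),                           # length in a Unicode system
        "SJIS bytes:", len(text.encode("shift_jis")),  # length in a double-byte code page
        "columns:", display_width(text),               # width in the list output
    )
```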

Migrating to Unicode enabled ABAP

Step 1: in the non-Unicode system
• Adapt all ABAP programs to the Unicode syntax and runtime restrictions
• Set the attribute "Unicode enabled" for all programs

Migrating to Unicode enabled ABAP

Step 2
• Set up a Unicode system
  • Unicode kernel + Unicode database
  • only ABAP programs with the Unicode attribute are executable
• Do runtime tests in the Unicode system
  • check for runtime errors
  • look for semantic errors
  • check the ABAP list layout with former double-byte characters

Migrating to Unicode enabled ABAP

Use UCCHECK to analyze your applications:
• Remove errors
• Inspect places that cannot be analyzed statically (optional):
  • untyped field symbols
  • offsets with variable length
  • generic access to database tables
• Set the Unicode program attribute using UCCHECK or SE38 / SE24 / ...
• Do additional checks with SLIN (e.g. matching of actual and formal parameters in function modules)


Upgrade to Unicode

• With Unicode, there are no limitations on users, and all languages of the ISO 639 standard can be used

• Unicode is technically supported as of Basis Release 6.20; see Note 0379940 for more information

• A single code page system (standard or unambiguous blended code page) can be upgraded to Unicode using the normal upgrade method

Unicode Conversion Roadmap: Preparation

• During preparation, the following topics have to be taken into consideration:
  • additional hardware requirements
  • downtime issues
  • Unicode-enabling of customer developments
  • the special treatment of MDMP systems

Unicode Conversion Roadmap: Conversion

• The Unicode conversion process is based on a system copy; during this process, the database conversion and the system shutdown/restart are automated as far as possible
• For small to mid-size databases (< 1 TB), the conversion is based on an SAP unload/reload of the complete database; for larger databases, minimum-downtime tools are used

Unicode Conversion Roadmap: Post-Conversion

• Once the Unicode system is up and running, you need to:
  • verify data consistency on a scenario basis
  • carry out general integration testing

• For systems that support multiple languages, special emphasis needs to be placed on cross-language handling during the test phase

• SAP provides correction tools that can be used if the conversion did not run properly

Unicode Conversion Roadmap: Post-Conversion

• Additional tool: SAP Data Management, for reducing database size and growth

• To keep your database costs in check, the SAP Data Management service frees up database resources by showing you how to reduce the size and growth of your database, typically by 25%

Unicode Conversion at a Glance

Preparation
• Set up the Unicode conversion project
• Check prerequisites
• Analysis for downtime minimization, special MDMP treatment
• Enabling of customer developments

Conversion
• Highly automated
• System will be down during database conversion
• Unload/reload process for small databases
• Minimum-downtime tool for large databases

Post-Conversion
• Unicode system is up and running
• Verification of data consistency
• Integration testing focused on language handling

Upgrade Paths to Unicode (R/3 Enterprise)

Source systems: R/3 3.1I, R/3 4.0B, R/3 4.5B, R/3 4.6B, R/3 4.6C
Target: direct upgrade to R/3 Enterprise (non-Unicode), followed by conversion to R/3 Enterprise (Unicode)

• First upgrade, then conversion to Unicode
• R/3 Enterprise Ramp-Up started 2002-07
• Unicode availability follows a phase of restricted shipment with pilot customers

Upgrade Paths to Unicode (BW 3.1)

Source systems: BW 2.0B, BW 2.1C, BW 3.0
Target: upgrade to BW 3.1 (non-Unicode), followed by conversion to BW 3.1 (Unicode)

• Interfacing with R/3 MDMP systems on a project basis only
• Unicode BEx GUI restrictions apply
• First upgrade, then conversion to Unicode
• BW 3.1 Ramp-Up starting 2002-12
• Unicode availability follows a phase of restricted shipment with pilot customers

Upgrade Paths to Unicode (CRM 3.1)

Source systems: CRM 2.0B, CRM 2.0C, CRM 3.0
Target: upgrade to CRM 3.1 (non-Unicode), followed by conversion to CRM 3.1 (Unicode)

• Selected scenarios only; cooperation with SAP GBU CRM required
• First upgrade, then conversion to Unicode
• CRM 3.1 Ramp-Up starting 2002-12
• Unicode availability follows a phase of restricted shipment with pilot customers

Unicode Conversion at a Glance

Preparation

Conversion

Post-Conversion

Prerequisites, special MDMP treatment

• OSS Note 548016: Conversion from Unicode to non-Unicode is not possible. The Unicode conversion of MDMP systems and of systems with ambiguous code pages (code page numbers 6100, 6200 and 6500) is only supported on a project basis with SAP involvement

• OSS Note 543715: The Unicode conversion of a BW 3.1 system requires additional steps regarding the system copy

• OSS Note 573044: If you use HR functionality within R/3 Enterprise, additional steps are also mandatory

6.30 Unicode & MCOD

• With SAP WebAS 6.30, a database abstraction layer for the Java stack was introduced: OpenSQL for Java
• The tables of the Java stack are stored in the same database instance as the tables of the ABAP stack, in a different schema (except on Informix)
• The concept of MCOD installations is fully supported by the combined ABAP and Java stack

System QA1: ABAP stack (non-Unicode/Unicode) in schema SAPQA1, Java stack (Unicode) in schema SAPQA1DB

System TC2: ABAP stack (non-Unicode/Unicode) in schema SAPTC2, Java stack (Unicode) in schema SAPTC2DB

Unicode Conversion at a Glance

Preparation

Conversion

Post-Conversion

Unicode Conversion - IMIG

Whitepaper:

"SAP R/3 incremental migration test"

http://saphpcc.bbn.hp.com/Global/Compet/migration/migration.HTM

SAP System-to-System Communication

SAP System-to-System communication

• SAP Web Application Server (≥ 6.20)

• Only one source code exists for Unicode-based and non-Unicode-based systems, so new developments can be exchanged smoothly

• The interfaces (e.g. RFC) have been extended, so that communication with other Unicode-based or non-Unicode-based systems is possible. Furthermore, SAP provides standard tools for the installation of (and conversion to) Unicode-based systems that can also be used for checking and Unicode-enabling customer developments

SAP System-to-System communication

[Figure: a Unicode R/3 system, an MDMP R/3 system (Latin-1 and SJIS), a non-Unicode SJIS R/3 system and the WWW, connected via HTTP/RFC]

• Solid lines: the receiver can receive all characters
• Dotted lines: the receiver cannot receive characters that are not in its own code page; but as long as you restrict the character set, data can be sent from everywhere to everywhere

SAP System-to-System communication: RFC

• Unicode <-> Unicode
  • no problem

• non-Unicode <-> non-Unicode
  • as before: the receiver converts the code page if possible

• Unicode <-> non-Unicode
  • the Unicode side converts from/to the code page of the non-Unicode side
  • MDMP data is converted using a language key
  • system settings allow the configuration of error handling
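The Unicode-to-non-Unicode case can be sketched as follows. This is a Python illustration, not the SAP RFC implementation. The language-key mapping, the helper names and the `#` substitute character are all hypothetical; they show one possible form of configurable error handling.

```python
# Hypothetical language-key -> code page mapping for an MDMP partner system.
LANGUAGE_CODEC = {"DE": "latin-1", "JA": "shift_jis"}

def to_code_page(text: str, codec: str, substitute: str = "#") -> bytes:
    """Convert Unicode text for a non-Unicode receiver, replacing unmappable characters."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            out += substitute.encode(codec)  # configurable error handling
    return bytes(out)

def to_mdmp(text: str, language_key: str) -> bytes:
    """For an MDMP receiver, the language key selects the target code page."""
    return to_code_page(text, LANGUAGE_CODEC[language_key])

print(to_mdmp("Köln 東京", "DE"))                # b'K\xf6ln ##'
print(to_mdmp("東京", "JA").decode("shift_jis"))  # 東京
```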

SAP System-to-System communication: RFC (SM59), Unicode <-> non-Unicode

Printing & Output Management

What is an SAP device type?
• A configuration file for the SAP printer driver that ensures proper functionality between the SAP data stream and the printer or output device the data is sent to

Printer drivers & device types
• In R/3, a distinction is made between "printer driver" and "device type"
• A device type consists of a variety of attributes defined for an output device
• One of these attributes is the printer driver that SAPscript (the R/3 forms processor) uses for this particular printer

Printing & Output Management

• Device types cover aspects such as control commands for font selection, page size, character set selection, the character set used, and so on
• A device type must be specified for every new printer defined in the SAP environment to enable direct printing from SAP applications
• SAP has created device types for the entire HP LaserJet printer family on the basis of PCL5, PCL6 and PostScript
• SAP develops, tests and supports device types for HP products, which can be found here: http://h40045.www4.hp.com/printing_solutions/Device_Types.html

Printing & Output Management

• At present, there are five SAPscript printer drivers. They include:
  • HP-PCL5 (for example, HP LaserJet 3, 4, 5, 6 series)
  • PostScript printers (PS level 2)
  • PRESCRIBE (for example, Kyocera FS-1500)
  • device types SWIN/SAPWIN/xxSWIN/xxSAPWIN

Printing & Output Management: Unicode Device Types

• LEXMARK is going into HP accounts, claiming that only LEXMARK can support SAP Unicode printing

Background:
• To support Unicode character sets on an HP printer, customers need a Unicode-compliant printer and an SAP Unicode device type
• Unicode-compliant printers are defined by firmware support for UTF-8 and/or UTF-16 and by Unicode fonts loaded on the printer
• Today, LEXMARK is the preferred vendor for SAP Unicode printing

Printing & Output Management: Solution for HP

• All OZ-based printers (LJ2300 and higher) support Unicode UTF-16 fonts in PCL6 by default
• The LJ2300, CLJ9500 and future products will support UTF-8 fonts in PCL5
• A firmware roll is planned to add UTF-8 support in PCL5 to all current OZ-based printers (LJ4200/4300, LJ9000, CLJ4600, CLJ5500)
• Furthermore, the Unicode fonts need to be loaded on the printer (e.g. stored on the internal hard disk)
• Today, a Unicode prototype solution is available for printing from an SAP environment
• For more information, contact Alan Cooke (U.S.) or Stephen Westberg (EMEA)

Sizing Information for Unicode-based SAP Systems

Sizing Info - General

The space required to encode text in Unicode differs from that of the encodings currently in use (8 bits per character for European languages, more for Chinese/Japanese/Korean).

This affects disk storage space and network download speed (when no form of compression is used).

Sizing Info - General

• UTF-8: no change for US ASCII, just a few percent more for ISO-8859-1, 50% more for Chinese/Japanese/Korean, 100% more for Greek and Cyrillic
• UCS-2 and UTF-16: no change for Chinese/Japanese/Korean; 100% more for US ASCII, ISO-8859-1, Greek and Cyrillic
• UCS-4: 100% more for Chinese/Japanese/Korean; 300% more for US ASCII, ISO-8859-1, Greek and Cyrillic
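The percentages above can be reproduced by encoding sample text both ways. This Python sketch uses arbitrary sample words and treats Python's legacy codecs as the "current" encodings. Actual database growth depends on the real data mix.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")  # -le: no byte order mark counted

def growth(text: str, legacy: str, unicode_enc: str) -> int:
    """Percent change in bytes when text moves from a legacy code page to Unicode."""
    old = len(text.encode(legacy))
    new = len(text.encode(unicode_enc))
    return round(100 * (new - old) / old)

cases = [
    ("SAP AG", "latin-1"),     # US ASCII
    ("ΑΘΗΝΑ", "iso8859_7"),    # Greek
    ("Система", "cp1251"),     # Cyrillic
    ("日本語", "shift_jis"),    # Japanese (double-byte legacy code page)
]

for text, legacy in cases:
    print(text, legacy, [growth(text, legacy, enc) for enc in ENCODINGS])
```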

Expected Hardware Requirements

• Increase of CPU requirements, depending on the existing solution:
  • ISO Latin-1 (ASCII) → Unicode: +30%
  • double-byte/MDMP → Unicode: < +5%

• Increase of memory requirements:
  • depends on the underlying DB (approx. +50%)
  • the application server is internally based on UTF-16; the DB uses UTF-8, CESU-8 or UTF-16

Unicode Conversion Demo

Java applet demo

Expected Hardware Requirements

• Database growth depends on:
  • the DB Unicode encoding scheme (e.g. CESU-8, UTF-16)
  • the languages in use

[Figure: storage size of sample characters (A, Ä) in code pages 1100 and 8000 compared with CESU-8 and UTF-16]

Encoding   Manufacturers                            Additional storage requirements
UTF-8      Oracle, SAP DB (8.0)                     35%
CESU-8     DB2 (AIX)
UTF-16     SQL Server, DB2 (AS/400), SAP DB (7.0)   60-70%

• Network load (draft results): < 7% for Latin-1, about 15% for Japanese, 25% for other Asian languages

Expected Hardware Requirements

R/3 release (non-Unicode)   4.0   4.5    4.6C                  4.7 (6.20)
CPU                         1     +20%   +15%                  +5%
Memory                      1     +20%   DB: +20%, App: +10%   +5%
Disk                        1     +10%   +10%                  +10%

Expected Hardware Requirements

R/3 release   4.7 (6.20) non-Unicode   4.7 with Unicode
CPU           1                        +30% to 35%
Memory        1                        +50%
Disk          1                        approx. +35% (UTF-8), +60-70% (UTF-16)
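To turn these relative figures into a first estimate, the factors can be applied to current non-Unicode values. The following Python sketch is a hypothetical helper, not an SAP sizing tool. It only uses the factors from the 4.7 table above and returns a (low, high) range in whatever units are passed in.

```python
# Factors for R/3 4.7 (6.20), Unicode relative to non-Unicode, as (low, high).
CPU_FACTOR = (1.30, 1.35)
MEMORY_FACTOR = (1.50, 1.50)
DISK_FACTOR = {"UTF-8": (1.35, 1.35), "UTF-16": (1.60, 1.70)}

def scale(value: float, factor: tuple[float, float]) -> tuple[float, float]:
    low, high = factor
    return round(value * low, 1), round(value * high, 1)

def unicode_estimate(cpu: float, memory_gb: float, disk_gb: float, db_encoding: str) -> dict:
    """Rough Unicode sizing range from current non-Unicode figures (hypothetical helper)."""
    return {
        "cpu": scale(cpu, CPU_FACTOR),
        "memory_gb": scale(memory_gb, MEMORY_FACTOR),
        "disk_gb": scale(disk_gb, DISK_FACTOR[db_encoding]),
    }

print(unicode_estimate(cpu=100, memory_gb=16, disk_gb=500, db_encoding="UTF-16"))
```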
