SAP Data Quality Management SDK
Developer Guide
Document Version: 4.2 Support Package 3 (14.2.3.0) - 2014-09-05
© 2014 SAP SE or an SAP affiliate company. All rights reserved.

Table of Contents

1 Data Quality Management SDK overview
1.1 Installing Data Quality Management SDK
1.1.1 Upgrading
1.1.2 Installing the SDK on Windows
1.1.3 Installing the SDK on Unix

2 Directory data
2.1 Directory listing and update schedule
2.2 U.S. directory expiration
2.2.1 U.S. National and Auxiliary files
2.3 Where to copy directories
2.4 To install and set up SAP Download Manager
2.5 To download directory files
2.6 Extracting directory files

3 Samples
3.1 Sample program files
3.2 Building samples
3.3 Running samples

4 API reference for C++
4.1 Message types
4.2 ToLatin1
4.3 CertifiedReportGenerator
4.4 DataRecordSchema
4.5 Date
4.6 DateTime
4.7 EmdqException
4.8 InputDataRecord
4.9 MessageHandler
4.10 MultiRecordTransform
4.11 MultiRecordTransformHelper
4.12 OutputDataRecord
4.13 ProgressHandler
4.14 RecordTransform
4.15 RecordTransformHelper
4.16 StatisticsHandler
4.17 StatisticsSchema
4.18 Time
4.19 TransformFactory

5 API reference for Java
5.1 Message types
5.2 CertifiedReportGenerator
5.3 DataRecordSchema
5.4 EmdqException
5.5 InputDataRecord
5.6 MessageHandler
5.7 MultiRecordTransform
5.8 MultiRecordTransformHelper
5.9 OutputDataRecord
5.10 ProgressHandler
5.11 RecordTransform
5.12 RecordTransformHelper
5.13 StatisticsHandler
5.14 StatisticsSchema
5.15 TransformFactory

6 API reference for .Net
6.1 Message types
6.2 EmDQException
6.3 LogHandler
6.4 MultiRecordProgressHandler
6.5 MultiRecordTransform
6.6 MultiRecordTransformHelper
6.7 RecordTransform
6.8 RecordTransformHelper
6.9 TransformFactory

7 Address cleanse concepts
7.1 Setting up the reference files

8 USA Regulatory Address Cleanse
8.1 USPS DPV®
8.1.1 Benefits of DPV
8.1.2 DPV security
8.1.3 DPV monthly directories
8.1.4 Required information in the job setup
8.1.5 DPV output fields
8.1.6 Non certified mode
8.1.7 DPV performance
8.1.8 DPV No Stats indicators
8.1.9 DPV Vacant indicators
8.2 USPS eLOT®
8.3 Early Warning System (EWS)
8.3.1 Overview of EWS
8.3.2 EWS directory
8.4 SuiteLink™
8.4.1 Benefits of SuiteLink
8.4.2 How SuiteLink works
8.4.3 SuiteLink directory
8.4.4 Improve processing speed
8.5 LACSLink®
8.5.1 Benefits of LACSLink
8.5.2 LACSLink® security
8.5.3 How LACSLink works
8.5.4 Conditions for address processing
8.5.5 LACSLink directory files
8.5.6 Required information in the job setup
8.5.7 Reasons for errors
8.5.8 LACSLink output fields
8.5.9 Memory usage and caching for LACSLink processing
8.5.10 USPS Form 3553
8.6 USPS RDI®
8.6.1 How RDI works
8.6.2 RDI directory files
8.6.3 RDI output field
8.6.4 CASS Statement, USPS Form 3553
8.7 Z4Change (USA Regulatory Address Cleanse)
8.7.1 Enable Z4Change for faster processing
8.7.2 Z4Change and USPS rules
8.7.3 Z4Change directory
8.8 Suggestion lists
8.8.1 Breaking ties
8.8.2 More information is needed
8.8.3 CASS rule
8.9 USPS certifications
8.9.1 Completing USPS certifications
8.9.2 Static directories
8.9.3 CASS self-certification
8.9.4 NCOALink certification
8.10 NCOALink (USA Regulatory Address Cleanse)
8.10.1 The importance of move updating
8.10.2 Benefits of NCOALink
8.10.3 How NCOALink works
8.10.4 Software performance
8.10.5 Address not known (ANKLink)
8.10.6 Getting started with NCOALink
8.10.7 What to expect from the USPS and SAP
8.10.8 About NCOALink directories
8.10.9 About the NCOALink daily delete file
8.10.10 Improving NCOALink processing performance
8.10.11 NCOALink log files
8.11 Multiple data source statistics reporting
8.11.1 Data_Source_ID field
8.11.2 USPS Form 3553 and group reporting

9 USA Regulatory Address Cleanse transform reference
9.1 System group
9.2 Report and analysis
9.3 Transform performance
9.4 Reference files
9.5 Assignment options
9.6 Standardization options
9.7 Z4 Change options
9.8 CASS Report options
9.9 Suggestion List options
9.9.1 Suggestion List output options
9.9.2 Suggestion list components
9.10 Non Certified options
9.11 USPS license information options
9.11.1 Required options for USPS License Information
9.12 NCOALink options
9.12.1 Processing options
9.12.2 Report Options
9.12.3 Output options
9.12.4 Processing Acknowledgment Form (PAF) Details
9.12.5 Service provider options
9.12.6 Contact Details

10 Global Address Cleanse
10.1 Supported countries (Global Address Cleanse)
10.2 Processing Japanese addresses
10.2.1 Standard Japanese address format
10.2.2 Special Japanese address formats
10.3 Process Chinese addresses
10.3.1 Chinese address format
10.3.2 Sample Chinese address

11 Global Address Cleanse transform reference
11.1 System group
11.2 Report and analysis
11.3 Reference files
11.4 Country ID options (Global Address Cleanse)
11.5 Engines
11.6 Standardization options
11.7 Canada engine
11.7.1 Canada engine options
11.7.2 Canada engine report options
11.7.3 Canada engine suggestion list options
11.8 Global Address Country options
11.9 Global Address engine report options
11.9.1 Report options for Australia
11.9.2 Report options for New Zealand
11.10 USA engine
11.10.1 USA engine options
11.10.2 USA engine suggestion lists options

12 Data Cleanse
12.1 Ranking and prioritizing parsing engines
12.2 About parsing data
12.2.1 About parsing phone numbers
12.2.2 About parsing dates
12.2.3 About parsing Social Security numbers
12.2.4 About parsing Email addresses
12.2.5 About parsing street addresses
12.3 About standardizing data
12.4 About assigning gender descriptions and prenames
12.5 Prepare records for matching
12.6 Cleansing package
12.7 About Japanese data
12.7.1 Text width in output fields
12.7.2 Processing Japanese data

13 Data Cleanse transform reference
13.1 System group
13.2 Report and analysis
13.3 Cleansing Package
13.4 Input word breaker
13.5 Person standardization options
13.5.1 Gender options
13.6 Firm standardization options
13.7 Other standardization options
13.8 Date options
13.9 Phone options
13.10 Parser configuration

14 Geocoder
14.1 POI and address geocoding
14.2 POI and address reverse geocoding

15 Geocoder transform reference
15.1 Directories
15.2 System group
15.3 Geocoder options
15.3.1 Report and analysis
15.3.2 Reference files

16 Match
16.1 Match components
16.2 Physical and logical sources
16.3 Using sources
16.3.1 Source types
16.3.2 Source groups
16.4 Prepare data for matching
16.4.1 Fields to include for matching
16.5 Compare tables
16.6 Data Salvage
16.6.1 Data salvaging and initials
16.7 Overview of match criteria
16.8 Matching methods
16.8.1 Similarity score
16.8.2 Rule-based method
16.8.3 Weighted-scoring method
16.8.4 Combination method
16.9 Matching business rules
16.9.1 Matching on strings, abbreviations, and initials
16.9.2 Extended abbreviation matching
16.9.3 Name matching
16.9.4 Numeric data matching
16.9.5 Blank field matching
16.9.6 Multiple field (cross-field) comparison
16.10 Group statistics
16.11 Input source select records

17 Match transform reference
17.1 System group
17.1.1 MatchSettings
17.2 Report and analysis
17.3 Match control
17.3.1 Match levels group
17.3.2 Input fields group
17.4 Match level group
17.5 Match criteria standard keys
17.6 Match criteria key layout
17.7 Compare table group
17.8 Compare match criteria group
17.8.1 Standard key match options
17.8.2 Criteria definition group
17.9 Post match processing group
17.10 Group statistics group
17.11 Input sources
17.11.1 Source groups
17.11.2 Input sources / Input fields
17.12 Field algorithm numeric difference group
17.13 Field algorithm numeric percent difference group
17.14 Field algorithm geo proximity group
17.15 Input source group statistics group
17.16 Input source select record group

18 Data Quality fields
18.1 Input fields
18.2 Output fields
18.3 Data type support
18.4 Data Cleanse fields
18.4.1 Input fields
18.4.2 Output fields
18.5 Geocoder fields
18.5.1 Input fields
18.5.2 Output fields
18.6 Global Address Cleanse fields
18.6.1 Input fields
18.6.2 Output fields
18.6.3 Global Address Cleanse Suggestion List fields
18.7 USA Regulatory Address Cleanse fields
18.7.1 Input fields
18.7.2 Output fields
18.8 Match output fields

19 Data Quality codes
19.1 Information codes (Data Cleanse)
19.2 Country ISO codes and assignment engines
19.3 Information codes (Global Address Cleanse)
19.4 Status codes (USA Regulatory Address Cleanse)
19.5 Quality codes (Global Address Cleanse)
19.6 Status codes (Global Address Cleanse)

20 ShowA and ShowL (USA and Canada)
20.1 USA ShowA command line options
20.2 Canada ShowA command line options
20.3 Canada ShowL command line options

21 Glossary

1 Data Quality Management SDK overview

The Data Quality Management SDK provides a framework and APIs that allow you to write applications that use SAP Data Quality technology, such as parsing, standardization, correction, and matching of data. You can use it to create applications that target the specific Data Quality functionality you want to employ with an in-process integration.

Relationship to Data Services

This product provides functionality similar to SAP Data Services, but deploys that technology as an API.

The Data Quality Management SDK provides a lighter footprint than Data Services. This product requires no server components (either from SAP or a third party) or user interface to access the Data Quality functionality.

Many customers choose to use this product in conjunction with Data Services, however. You can use the same release number version of Data Services to configure transform options in the Data Services Designer and create a configuration XML file for use with this SDK. To create the file, right-click on a transform in the Data Services Designer and select Save settings for DQM SDK. For more information on using the Data Services Designer, see the Data Services documentation.

When you use Data Services as a configuration tool for the Data Quality Management SDK, Data Services does not support the creation of a change log for changes to the configuration. That is, you can employ the Data Services central repository concept to manage changes to the Data Quality transforms, but no change log is created. Instead, the developer must implement a change log within a custom application created using the SDK.

EmDQ

Throughout this product, the letters “emdq” (or “EmDQ”) appear in naming conventions: namespaces, folder names, and file names. Because the Data Quality Management SDK is an embedded, in-line data quality processing solution, you can think of emdq as standing for Embedded Data Quality.

1.1 Installing Data Quality Management SDK

Installing Data Quality Management SDK is as simple as running a self-extracting executable.

Each platform (Windows, AIX, Solaris, and so on) has its own installer. Download the installer for your platform from the SAP Service Marketplace.

If you are installing to a network location, you cannot install more than one platform version to a single directory. For example, if you are installing both the AIX and Solaris versions to a network location, create a distinct directory for each version.

1.1.1 Upgrading

If you are upgrading from the previous release, you can install this version alongside the existing version on the same machine. However, you must install the new version to a different directory rather than overwrite the files of the previous version.

The TransformFactory class provides a method, UpgradeTransformSettings(), that makes transform settings built with the previous version of this product compatible with this version.

For information about using UpgradeTransformSettings(), see the TransformFactory class documentation for C++, Java, or .Net.

1.1.2 Installing the SDK on Windows

Before installing this product, download the appropriate package file (named *.exe) from the SAP Service Marketplace.

1. Run the executable file. The Welcome screen appears.

Tip

If the installer does not start by running the executable, you can begin the installation routine by running setup.exe, which is contained in the archive.

2. Click Next. The License Agreement screen appears.
3. After reading the license agreement and indicating that you accept it, click Next. The Specify the destination folder screen appears.
4. After choosing a folder in which to install the files for this product, click Next. The Start Installation screen appears.
5. Click Next. The installation routine extracts the files for this product and places them in the folder you specified, until the final screen appears.
6. Click Finish to dismiss the installer.

The files for the SDK are now installed.

You must also install the addressing directories and cleansing package before using the address correction and data cleanse functionality of this product, or the sample applications.

1.1.3 Installing the SDK on Unix

Before installing this product, download the appropriate package file (named *.tgz) from the SAP Service Marketplace.

1. Unpack the *.tgz file. The files required for installation are copied to your system.
2. Run setup.sh. The Destination Path screen appears.
3. Type a destination path for the installation. The Welcome screen appears.

Note

You must choose a path other than the default (which is the current working directory).

4. Press Enter to dismiss the Welcome screen. The License Agreement screen appears.
5. Press Enter to accept the license agreement. The installation routine places the files for this product in the path you specified until the installation completes.

The files for the SDK are now installed.

You must also install the addressing directories and cleansing package before using the address correction and data cleanse functionality of this product, or the sample applications.

Note

If you are running on Solaris 11, you must add the following to the environment: LD_PRELOAD_64=/bin/libnanosleep.so

2 Directory data

To correct addresses and assign codes, the SAP Data Quality Management SDK transforms rely on directories (reference databases). The software uses these directories much as you use a telephone directory: a telephone directory is a large table in which you look up something you know (someone's name) to find something you don't know (their phone number).

Depending on which option you own, some disks or online packages that you receive may contain extra files in addition to your directories. You may not need to use all of these reference files depending on which transforms or options you use. For example, you may see an Extract folder. If you do not need these extra files, do not copy them to your computer. For information about extra folders, see the ReadMe.txt file included with the reference files.

2.1 Directory listing and update schedule

Directory type | Directory files (approximate size) | Updated
ZIP4 and Auxiliary Directories | zip4us.dir (699 MB) | M, B
Auxiliary Directories | cityxx.dir (2 MB), zcfxx.dir (2 MB), revzip4.dir (1 MB), zip4us.rev (97 MB), zip4us.shs (4 MB) | M, B
Early Warning System Directory | ew.dir (1 MB) | Weekly
DPV Data | dpv_path (653 MB) | M
Enhanced Line of Travel Directory | elot.dir (486 MB) | M
Canada engine - Address Data | canada.dir, cancity.dir, canfsa.dir, canpci.dir (42 MB) | M
Australia engine - Address Data | apc.dir, aucity.dir, aus.dir (200 MB) | M, Q
Global Address engine and EMEA Engine - Data | all files (up to 12.2 GB for all countries) | Q
Centroid Level Geo Data | cgeox.dir (720 MB) | Q
Address Level Geo Data | ageox.dir (4.67 GB) | Q
Geocoder | Canada: geo_addr_ca_.dir, geo_cent_ca_.dir (1 MB); France: geo_addr_fr_.dir, geo_cent_fr_.dir (6 MB); USA: geo_addr_us_.dir, geo_cent_us_.dir (< 2 GB) | -
Z4Change Data | z4change.dir (199 MB) | M
LACSLink | all files (461 MB) | -

Update frequency: M = monthly, B = bimonthly, Q = quarterly.

Note

For the Global Address engine and EMEA Engine, you will receive files only for those countries your company has purchased.

2.2 U.S. directory expiration

We publish and distribute the ZIP4 and supporting directory files under a non-exclusive license from the USPS. The USPS requires that our software disable itself when a user attempts to use expired directories.

If you do not install new directories as you receive them, the software issues a warning in the log files when the directories are due to expire within 30 days. To ensure that your projects are based on up-to-date directory data, we recommend that you heed the warning and install the latest directories.

Note

Incompatible or out-of-date directories can render the software unusable. The directories are lookup files used by SAP solution portfolio software. The system administrator must install monthly or bimonthly directory updates to ensure that they are compatible with the current software.

Expiration schedule

You can choose to receive updated U.S. national directories on a monthly or bimonthly basis. Bimonthly updates are distributed in even-numbered months. Directory expiration guidelines are:

● ZIP4 and Auxiliary directories expire on the first day of the fourth month after directory creation. When running in Non-Certified mode, ZIP4 and Auxiliary directories expire on the first day of the fourteenth month after directory creation.
● LACSLink directories expire 105 days after directory creation.
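The expiration rules can be expressed as simple date arithmetic. The following C++ sketch is illustrative only: the SDK enforces expiration itself, and the names CivilDate, Zip4Expiration, and LacsLinkExpiration are hypothetical. It assumes that "the fourth month after directory creation" means the creation month plus four, so directories created in September 2014 expire on January 1, 2015.

```cpp
#include <cassert>

// Hypothetical calendar date type used only for this sketch.
struct CivilDate {
    int year;
    int month;  // 1..12
    int day;    // 1..31
};

bool operator==(const CivilDate& a, const CivilDate& b) {
    return a.year == b.year && a.month == b.month && a.day == b.day;
}

// Days since 1970-01-01 in the proleptic Gregorian calendar
// (Howard Hinnant's days_from_civil algorithm).
long long DaysFromCivil(const CivilDate& d) {
    const long long y = d.year - (d.month <= 2 ? 1 : 0);
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const long long yoe = y - era * 400;                           // [0, 399]
    const long long mp = d.month > 2 ? d.month - 3 : d.month + 9;  // March = 0
    const long long doy = (153 * mp + 2) / 5 + d.day - 1;          // [0, 365]
    const long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;   // [0, 146096]
    return era * 146097 + doe - 719468;
}

// Inverse of DaysFromCivil (Howard Hinnant's civil_from_days algorithm).
CivilDate CivilFromDays(long long z) {
    z += 719468;
    const long long era = (z >= 0 ? z : z - 146096) / 146097;
    const long long doe = z - era * 146097;
    const long long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const long long doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    const long long mp = (5 * doy + 2) / 153;
    const int day = static_cast<int>(doy - (153 * mp + 2) / 5 + 1);
    const int month = static_cast<int>(mp < 10 ? mp + 3 : mp - 9);
    const int year = static_cast<int>(yoe + era * 400 + (month <= 2 ? 1 : 0));
    return CivilDate{year, month, day};
}

// First day of the month that lies `months` months after the creation month.
CivilDate FirstDayMonthsAfter(const CivilDate& created, int months) {
    assert(months >= 0 && created.month >= 1 && created.month <= 12);
    const int zeroBased = (created.month - 1) + months;
    return CivilDate{created.year + zeroBased / 12, zeroBased % 12 + 1, 1};
}

// ZIP4 and Auxiliary directories: 4 months (certified) or 14 months (non-certified).
CivilDate Zip4Expiration(const CivilDate& created, bool nonCertified) {
    return FirstDayMonthsAfter(created, nonCertified ? 14 : 4);
}

// LACSLink directories: 105 days after creation.
CivilDate LacsLinkExpiration(const CivilDate& created) {
    return CivilFromDays(DaysFromCivil(created) + 105);
}
```

For example, for directories created on 2014-09-05, Zip4Expiration returns 2015-01-01 in certified mode and 2015-11-01 in Non-Certified mode, and LacsLinkExpiration returns 2014-12-19.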

2.2.1 U.S. National and Auxiliary files

The self-extracting files for the U.S. National and Auxiliary directories are named as follows.

Directory name | Self-extracting file name
2004-2008 U.S. National directory | us_dirs_2004.exe
U.S. Address-level GeoCensus | us_ageo1_2.exe, us_ageo3_4.exe, us_ageo5_6.exe, us_ageo7_8.exe, us_ageo9_10.exe
U.S. Centroid-level GeoCensus | us_cgeo.exe, us_cgeo1.exe, us_cgeo2.exe

2.3 Where to copy directories

For each transform, we recommend that you install the directory files in the reference_data folder created during the Data Quality Management SDK installation. By default, the software looks for directories in \DataQuality\reference_data (Windows) or /DataQuality/reference_data (Unix). If you place your directories in a different location, you must change the individual reference file option values in the XML files.

2.4 To install and set up SAP Download Manager

Before you can download directory files, you need to install and set up SAP Download Manager.

To install and set up SAP Download Manager:

1. Access the SAP Service Marketplace (SMP): http://service.sap.com/bosap-support
2. Select Downloads.
3. Select Download Basket.
4. Click the Get Download Manager button.
5. Follow the steps to install and set up the Download Manager.

2.5 To download directory files

The directories are available for download from the SAP Service Marketplace (SMP).

To download directories:

1. Access the SAP Service Marketplace (SMP) site: http://service.sap.com/bosap-support
2. Select Software Downloads.
3. From the left pane, select Downloads > SAP Software Distribution Center > Installations and Upgrades > My Company's Application Components. A list of your company's applications and any license-free products or components appears.
4. Select the files you want to download and add them to the Download Basket. The files you select are placed in the Download Basket.
5. To open the Download Basket, click Download Basket.
6. To access the Download Manager documentation, click Get Download Manager.
7. Follow the steps in the Download Manager documentation to download the directory files.

2.6 Extracting directory files

The steps listed here describe how to install the zipped directories using Info-Zip. If you use a different unzip tool, see the unzip procedure included with that tool.

1. Manually copy the self-extracting directory files from the download package to the \temporary\ folder.
2. Locate and double-click the file. The files are extracted and placed in the \temporary\ folder.
3. Copy the directory files from the \temporary\ folder to the location where you keep your directories.
4. Manually copy the zipped directory files from the location of the extracted files to the location where you keep your directories.
5. Type unzip .zip -d . For ZIP4US, type unzip us_dirs_2004.zip -d "/SAP BusinessObjects/SAP BusinessObjects Data Quality Management SDK/linux_x86_32/DataQuality/gac".
6. Repeat these steps for each required file.

3 Samples

The best way to get started with this SDK is to examine, build, and run the provided sample programs.

The installation routine places the sample program files in a folder for each supported operating system and language (for example, \windows_32\Java\samples). The \samples folder contains the files needed to integrate the public API and run Data Quality transforms built using this SDK.

3.1 Sample program files

The following folder structure holds the source code, build scripts, and run scripts that the samples use to demonstrate each Data Quality transform.

● cpp\ - contains files needed to integrate this SDK into a C++ environment.
  ○ inc\ - contains the SDK public headers you include in your code.
  ○ lib\ - contains the SDK public libraries you link against for C++ code. You need not link the certifiedreportgenerator.lib library unless you intend to use the certified report generator.
  ○ samples\ - contains C++ sample drivers for several Data Quality transforms.
● dotNet\ - contains files needed to integrate the SDK into a .Net environment.
  ○ bin\ - contains the SDK public library you link against for .Net code.
  ○ samples\ - contains .Net sample drivers for several Data Quality transforms.
● Java\ - contains files needed to integrate the SDK into a Java environment.
  ○ bin\ - contains the SDK public library you link against for Java code.
  ○ samples\ - contains Java sample drivers for several Data Quality transforms.
● bin\ - contains many of the shared libraries and binaries needed to run the Data Quality transforms included in this package. This directory must be in your PATH and in your shared library load path environment variable (such as LD_LIBRARY_PATH on Linux) so that the shared objects and other required files load properly. The run scripts for the included samples set these variables for you.
● DataQuality\ - contains many files required for the Data Quality transforms to run.
● redist\ - contains the MSVC VS 2005 redistributable package, which you must install to run the Windows executables.
● xsd\ - contains the XSD files for all Data Quality transform configuration files. The xsi:schemaLocation attribute in your configuration XML files must be able to locate these XSD files.

3.2 Building samples

All the build scripts included in a samples folder assume you are running from a command prompt with your compiler paths set up correctly. On Windows, for C++ and .Net builds, the devenv and dumpbin executables from Visual Studio 2005 SP1 or later must be available in your PATH environment variable. Likewise, on Unix platforms, the appropriate compiler for the platform must be available.

For Java projects, JAVA_HOME must be set to a compatible JDK location so that the javac executable can be found.

For .Net projects, ensure the \\bin folder is added to your PATH environment variable before launching Visual Studio. The samples must be able to find these libraries to create an instance of an SDK transform object.

To build all samples for a particular programming language, use the build.bat (Windows) or build.sh (Unix) script.

1. Navigate to your desired language folder. 2. Navigate to the samples folder. 3. Run the build.bat (Windows) or build.sh (Unix) script to build all samples.

The script builds the samples.

Example

To build all C++ samples on Windows 32 bit, navigate to \windows_32\cpp\samples, and type build.bat.

To build a specific sample, navigate one level further into the transform's subdirectory and run the build.bat (Windows) or build.sh (Unix) script within that subdirectory.

3.3 Running samples

Before running the samples, you must have installed the address directory reference files and cleansing package, and built the sample.

All of the run scripts included in a samples folder assume you are running from a command prompt.

For Java projects, JAVA_HOME must be set to a compatible JDK or JRE location so that the Java executable can be found.

Each run script takes at least the configuration XML file as the first command line argument. Multi-record transforms such as Match also require you to list the input .txt file as the second argument to the run script.

1. Navigate to the sample transform directory.
2. Run the run.bat (Windows) or run.sh (Unix) script with the necessary command-line arguments to set up your environment and run the sample.

Example

To run the Global Address Cleanse C++ sample on Windows 32 bit, navigate to the folder \windows_32\cpp\samples\gac and type run.bat EmDQ_GlobalAddressCleanse.xml.

To run the Match C++ sample on Windows 32 bit, navigate to the folder \windows_32\cpp\samples\match and type run.bat EmDQ_NameAddressMatch.xml MatchNameAddrUSSingleSource.txt.

4 API reference for C++

This section details the API for the C++ implementation.

The C++ namespace is defined as EmDQ.

By default, logging is not enabled. You must set a logger in order to use the logging capability of the methods in this API. We recommend that you set a log handler; otherwise you will not see specific warnings or error messages. See the documentation of the individual methods for details of the information that can be logged.

4.1 Message types

You can use the following message types in this implementation.

Message type Description

MESSAGETYPE_ERROR Used for error messages

MESSAGETYPE_WARNING Used for warning messages

MESSAGETYPE_INFO Used for information messages

MESSAGETYPE_TRACE Used for tracing messages
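A log handler typically labels and filters messages by type. The following C++ sketch assumes the four message types are ordered from most to least severe (error, warning, info, trace). MessageTypeSketch, MessageTypeLabel, ShouldLog, and FormatLogLine are hypothetical names; the real MESSAGETYPE_* constants and the handler interface are declared in the SDK headers.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the SDK's MESSAGETYPE_* constants, assumed to be
// ordered from most severe (error) to least severe (trace).
enum class MessageTypeSketch { Error = 0, Warning = 1, Info = 2, Trace = 3 };

// Short label a log handler might prefix to each message.
const char* MessageTypeLabel(MessageTypeSketch type) {
    switch (type) {
        case MessageTypeSketch::Error:   return "ERROR";
        case MessageTypeSketch::Warning: return "WARNING";
        case MessageTypeSketch::Info:    return "INFO";
        case MessageTypeSketch::Trace:   return "TRACE";
    }
    assert(false && "unhandled message type");
    return "UNKNOWN";
}

// True when `type` is at least as severe as `threshold`
// (for example, a Warning threshold keeps errors and warnings).
bool ShouldLog(MessageTypeSketch type, MessageTypeSketch threshold) {
    return static_cast<int>(type) <= static_cast<int>(threshold);
}

// Formats one line such as "WARNING: directories expire in 30 days".
std::string FormatLogLine(MessageTypeSketch type, const std::string& text) {
    return std::string(MessageTypeLabel(type)) + ": " + text;
}
```

Filtering by threshold lets an application keep trace output off in production while still surfacing the warnings and errors that, as noted above, are otherwise invisible when no log handler is set.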

4.2 ToLatin1

Syntax

The ToLatin1 method converts Unicode data to standard Latin1 data. It is defined in the file utility.h.

char* ToLatin1(const uint16_t* str, char* dstBuf, int32_t dstBufLength, char invalidCharReplacement = '?');

Parameter Description

str [IN] The string to convert

dstBuf [OUT] The buffer to receive Latin1 chars and a NULL terminator

dstBufLength [IN] The size of dstBuf in bytes.

invalidCharReplacement [IN] The character to be used to replace invalid Latin1 characters (a 0 means to drop invalid characters)

This method converts the UCS2 characters of str to Latin1 characters and copies them into the passed-in buffer. A NULL terminator is always added, so the buffer must have room for the entire string plus the NULL terminator.

A UCS2 value greater than 255 is considered to be an invalid Latin1 character. An invalid Latin1 character is replaced with the invalidCharReplacement value. If this value is set to 0, then the invalid Latin1 character is deleted.

No locale-specific conversions are made.

Returns the dstBuf parameter.
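To make the conversion rules concrete, here is a hypothetical C++ re-implementation, ToLatin1Sketch, that follows the documented behavior: values up to 255 are copied, larger values are replaced with invalidCharReplacement (or dropped when it is 0), a NULL terminator is always written, and dstBuf is returned. It is not the SDK function. How the real ToLatin1 handles a buffer that is too small is not documented here; this sketch assumes it truncates so that the terminator still fits.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for ToLatin1 (declared by the SDK in utility.h).
// Assumption: output is truncated when dstBuf is too small, keeping one
// byte for the NULL terminator.
char* ToLatin1Sketch(const std::uint16_t* str, char* dstBuf,
                     std::int32_t dstBufLength,
                     char invalidCharReplacement = '?') {
    assert(str != nullptr && dstBuf != nullptr && dstBufLength > 0);
    std::int32_t out = 0;
    for (const std::uint16_t* p = str; *p != 0; ++p) {
        if (out + 1 >= dstBufLength) {
            break;  // no room left except for the terminator
        }
        if (*p <= 255) {
            dstBuf[out++] = static_cast<char>(*p);  // valid Latin1 code point
        } else if (invalidCharReplacement != 0) {
            dstBuf[out++] = invalidCharReplacement;  // replace invalid character
        }
        // Otherwise the replacement is 0, so the invalid character is dropped.
    }
    dstBuf[out] = '\0';  // a NULL terminator is always added
    return dstBuf;       // like ToLatin1, return the destination buffer
}
```

Called with the UCS2 string "Café 中" and the default replacement, the sketch writes "Caf\xE9 ?": é (U+00E9) is within Latin1, while 中 (U+4E2D) is not and becomes '?'. With a replacement of 0, the result is "Caf\xE9 ".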

4.3 CertifiedReportGenerator

Syntax

Class CertifiedReportGenerator implements the public StatisticsHandler interface and can generate the certified mailing Statement of Address Accuracy (SERP), Address Matching Processing Summary (AMAS), and Coding Accuracy Support System (CASS) 3553 reports.

void SetReportFile(const char* fileName, REPORT_TYPE report);

Parameter Description

fileName [IN] A valid filename and path where the report is to be generated

report [IN] Which report to write to fileName

This method tells CertifiedReportGenerator the path and file name at which to create the report of the given type. Valid report types are REPORT_3553, REPORT_AMAS, and REPORT_SERP.

If the specified file exists, it is overwritten. If the specified path does not exist, the file is not created and an error occurs.

You must call this method once for each report you want generated, before using any transform.

Related Information

StatisticsHandler [page 39]

4.4 DataRecordSchema

Syntax

Class DataRecordSchema defines the layout of a Data Record.

int GetFieldCount();

This method returns the number of fields defined in the data record.

int GetFieldIndex(const uint16_t* fieldName);

Parameter Description

fieldName [IN] The name of the field to get

This method returns the index of the fieldName field. Field names are case-insensitive (that is, NAME is equivalent to name). The returned 0-based index can be used in the other methods that take a field index as a parameter; if fieldName is invalid, -1 is returned.

int GetFieldIndex(const char* fieldName);

Parameter Description

fieldName [IN] The name of the field to get

This method returns the index of the fieldName field. Field names are case-insensitive (that is, NAME is equivalent to name). The returned 0-based index can be used in the other methods that take a field index as a parameter; if fieldName is invalid, -1 is returned.
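The lookup semantics (case-insensitive match, 0-based index, -1 when the name is not found) can be sketched over a plain list of Latin1 field names. The function findFieldIndex and the use of std::vector<std::string> as the schema are hypothetical; in an application you would call GetFieldIndex on the DataRecordSchema returned by GetInputSchema() or GetOutputSchema().

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of GetFieldIndex's documented behavior; not SDK code.
// Assumption: only ASCII letters are folded; the SDK's case rules may be broader.
static bool equalsIgnoreCase(const std::string& a, const char* b)
{
    std::size_t i = 0;
    for (; i < a.size() && b[i] != '\0'; ++i)
    {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return i == a.size() && b[i] == '\0';  // both strings must end together
}

// Returns the 0-based index of fieldName, or -1 if no field has that name.
int findFieldIndex(const std::vector<std::string>& fieldNames, const char* fieldName)
{
    if (fieldName == nullptr)
        return -1;
    for (std::size_t i = 0; i < fieldNames.size(); ++i)
    {
        if (equalsIgnoreCase(fieldNames[i], fieldName))
            return static_cast<int>(i);
    }
    return -1;
}
```

Checking the -1 result before using the index avoids passing an invalid index to the other DataRecordSchema methods.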

int GetFieldLength(int fieldIndex);

Parameter Description

fieldIndex [IN] The field for which to get information (0-based)

This method returns the length of the fieldIndex field; otherwise, on a non-fatal error, returns 0.

const uint16_t* GetFieldName(int fieldIndex);

Parameter Description

fieldIndex [IN] The field for which to get information (0-based)

This method returns the name of the fieldIndex field; otherwise, on a non-fatal error, returns 0.

bool GetFieldName(int fieldIndex, char* buffer, int bufferSize, char unicodeReplacement = '?');

Parameter Description

fieldIndex [IN] The field for which to get information (0-based)

buffer [OUT] The buffer where the field name is to be placed

bufferSize [IN] The size of the buffer

unicodeReplacement [IN] The character that is substituted when a character is encountered that cannot be represented as Latin1; set to 0 to remove the character instead

This method gets the name of the fieldIndex field. Returns TRUE if successful; otherwise, on a non-fatal error, returns FALSE.

DATATYPE GetDataType(int fieldIndex, bool& status);

Parameter Description

fieldIndex [IN] The field for which to get information (0-based)

status [OUT] TRUE upon success; FALSE if a non-fatal error has occurred

This method returns the datatype of the fieldIndex field and sets status to indicate whether the call succeeded.

4.5 Date

Syntax

Class Date represents a datatype. It is used to hold a date value for a record field.

Default Constructor Date();

Copy Constructor Date(const Date& rhs);

bool SetDate(const char* dateStr);

Parameter Description

dateStr [IN] The date in the format YYYYMMDD

This method sets the date of this object. The string must be in the format YYYYMMDD, where YYYY is the year, MM is the month and DD is the day. The year can range from 0 to 9999. The month can range from 1 to 12. The day can range from 1 to 31. If the string is not formatted correctly, the date value is not changed. Returns TRUE if the date is valid; otherwise, it returns FALSE.

bool SetDate(int year, int month, int day);

Parameter Description

year [IN] The year value from 0 to 9999

month [IN] The month value from 1 to 12

day [IN] The day value from 1 to 31

This method sets the date of this object. Returns TRUE if the date is valid; otherwise, it returns FALSE.

bool SetDay(int day);

Parameter Description

day [IN] The day value from 1 to 31

This method sets the day of this object. The day can range from 1 to 28, 29, 30, or 31, depending on the month. Returns TRUE if the day is valid for the month and year; otherwise, it returns FALSE.

bool SetMonth(int month);

Parameter Description

month [IN] The month value from 1 to 12

This method sets the month of this object. The month can range from 1 to 12. Returns TRUE if the month is valid for the day and year; otherwise, it returns FALSE.

bool SetYear(int year);

Parameter Description

year [IN] The year value from 0 to 9999

This method sets the year of this object. The year can range from 0 to 9999. Returns TRUE if the year is valid for the day and month; otherwise, it returns FALSE.
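The validity rules described for Date can be sketched as follows: the year must be in 0 to 9999, the month in 1 to 12, and the day must exist in that month. The helper names are hypothetical, and applying the Gregorian leap-year rule (including to year 0) is an assumption; the guide does not state which calendar rules Date uses.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical sketch of Date validity checks; not the SDK's implementation.
// Assumption: leap years follow the proleptic Gregorian rule.
bool isLeapYear(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

int daysInMonth(int year, int month)
{
    static const int days[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    if (month == 2 && isLeapYear(year))
        return 29;
    return days[month - 1];
}

bool isValidDate(int year, int month, int day)
{
    if (year < 0 || year > 9999 || month < 1 || month > 12)
        return false;
    return day >= 1 && day <= daysInMonth(year, month);
}

// Parses a YYYYMMDD string. Badly formatted input leaves the outputs unchanged,
// as documented for SetDate(const char*); assumption: so does an invalid calendar date.
bool parseYYYYMMDD(const char* s, int& year, int& month, int& day)
{
    if (s == nullptr || std::strlen(s) != 8)
        return false;
    for (int i = 0; i < 8; ++i)
        if (!std::isdigit(static_cast<unsigned char>(s[i])))
            return false;
    int y = (s[0] - '0') * 1000 + (s[1] - '0') * 100 + (s[2] - '0') * 10 + (s[3] - '0');
    int m = (s[4] - '0') * 10 + (s[5] - '0');
    int d = (s[6] - '0') * 10 + (s[7] - '0');
    if (!isValidDate(y, m, d))
        return false;
    year = y;
    month = m;
    day = d;
    return true;
}
```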

void GetDate(char* dateStr, int bufferSize) const;

Parameter Description

dateStr [OUT] The date value in YYYYMMDD format

bufferSize [IN] The size of the dateStr destination buffer

This method gets the date value of this object. The date is returned as a string with the format YYYYMMDD, where YYYY is the year, MM is the month and DD is the day.

int GetDay() const;

This method gets the day value of this object. It returns the day value from 1 to 31.

int GetMonth() const;

This method gets the month value of this object. It returns the month value from 1 to 12.

int GetYear() const;

This method gets the year value of this object. It returns the year value from 0 to 9999.

4.6 DateTime

Syntax

Class DateTime represents a datatype. It is used to hold a date and time value for a record field. This class inherits from both the Date and the Time class, so the methods of those classes are also available.

Default Constructor DateTime();

Copy Constructor DateTime(const DateTime& rhs);

bool SetDateTime(const char* dateTimeStr);

Parameter Description

dateTimeStr [IN] The date time in the format YYYYMMDDHHMMSSF

This method sets the date time of this object. The string must be in the format YYYYMMDDHHMMSSF, where YYYY is the year, MM is the month, DD is the day, HH is the hour, MM is the minute, SS is the second, and F is an optional fractional-second digit that can be repeated. The year can range from 0 to 9999. The month can range from 1 to 12. The day can range from 1 to 31. The hour can range from 0 to 23. The minute can range from 0 to 59. The second can range from 0 to 59. Each optional fraction digit can range from 0 to 9. If the string is not formatted correctly, the date time value is not changed.

Returns TRUE if the date time is valid; otherwise, it returns FALSE.

void GetDateTime(char* dateTimeStr, int bufferSize) const;

Parameter Description

dateTimeStr [OUT] The date time in the format YYYYMMDDHHMMSSF

bufferSize [IN] The size of the dateTimeStr buffer, in characters

This method gets the date time value of this object. The date time is returned as a string in the format YYYYMMDDHHMMSSF, where YYYY is the year, MM is the month, DD is the day, HH is the hour, MM is the minute, SS is the second, and F is an optional fractional-second digit that can be repeated.
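The YYYYMMDDHHMMSSF layout can be sketched as a parser that reads the 14 fixed digits and accepts any number of trailing fraction digits. The DateTimeParts struct and the parseDateTime and readDigits functions are hypothetical; the sketch checks only the documented per-component ranges (for example, day 1 to 31), not whether the day exists in the given month.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical container for the components of a YYYYMMDDHHMMSSF string.
struct DateTimeParts
{
    int year, month, day, hour, minute, second;
    std::string fraction;  // zero or more fraction digits, kept as text
};

// Reads `count` digits starting at pos; returns -1 if any is not a digit.
static int readDigits(const std::string& s, std::size_t pos, std::size_t count)
{
    int value = 0;
    for (std::size_t i = pos; i < pos + count; ++i)
    {
        if (!std::isdigit(static_cast<unsigned char>(s[i])))
            return -1;
        value = value * 10 + (s[i] - '0');
    }
    return value;
}

// Hypothetical sketch of the documented format and ranges; not SDK code.
// On failure, out is left unchanged (mirroring SetDateTime).
bool parseDateTime(const std::string& s, DateTimeParts& out)
{
    if (s.size() < 14)
        return false;
    DateTimeParts p;
    p.year = readDigits(s, 0, 4);
    p.month = readDigits(s, 4, 2);
    p.day = readDigits(s, 6, 2);
    p.hour = readDigits(s, 8, 2);
    p.minute = readDigits(s, 10, 2);
    p.second = readDigits(s, 12, 2);
    p.fraction = s.substr(14);
    for (char c : p.fraction)
        if (!std::isdigit(static_cast<unsigned char>(c)))
            return false;
    // A -1 from readDigits (non-digit input) fails these range checks as well.
    if (p.year < 0 || p.month < 1 || p.month > 12 || p.day < 1 || p.day > 31 ||
        p.hour < 0 || p.hour > 23 || p.minute < 0 || p.minute > 59 ||
        p.second < 0 || p.second > 59)
        return false;
    out = p;
    return true;
}
```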

4.7 EmdqException

Syntax

Class EmdqException is the exception class thrown by all public interfaces of this product. This class is required for processing. It is defined in the file exception.h.

virtual const uint16_t* GetMessage() const = 0;

This method returns this object's message in UCS2 characters.

virtual const char* GetMessageId() const = 0;

This method returns the message ID of this exception object. The message ID is in the format CCCNNN, where each C is an alphabetic character and each N is a digit. CCC represents the source of the error, and NNN is the message number. For example, REC001 is the first message for the DataRecord class.
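Because message IDs follow the CCCNNN pattern, an application can split them to route messages by source, for example in a handler that catches EmdqException and inspects the value returned by GetMessageId(). The splitMessageId function below is a hypothetical helper, not part of the SDK.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical helper: splits a CCCNNN message ID (for example "REC001")
// into its three-letter source and its numeric message number.
// Returns false, leaving the outputs unchanged, if the ID does not match the pattern.
bool splitMessageId(const char* messageId, std::string& source, int& number)
{
    if (messageId == nullptr || std::strlen(messageId) != 6)
        return false;
    for (int i = 0; i < 3; ++i)
        if (!std::isalpha(static_cast<unsigned char>(messageId[i])))
            return false;
    int value = 0;
    for (int i = 3; i < 6; ++i)
    {
        if (!std::isdigit(static_cast<unsigned char>(messageId[i])))
            return false;
        value = value * 10 + (messageId[i] - '0');
    }
    source.assign(messageId, 3);  // the CCC part
    number = value;               // the NNN part
    return true;
}
```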

4.8 InputDataRecord

Syntax

Class InputDataRecord is the main interface to the input data record functionality. It inherits from the superclass DataRecord. This class is required for processing.

void Clear();

This method clears all of the fields of the data record. Afterward, each character field has a data length of 0.

void SetFieldData(int fieldIndex, const uint16_t* fieldData, int fieldDataLength = -1);

Parameter Description

fieldIndex [IN] The field to set (0-based)

fieldData [IN] The field data to set

fieldDataLength [IN] The number of UCS2 characters in fieldData

This method sets the data of the fieldIndex field of the data record. fieldDataLength UCS2 characters are copied from the fieldData buffer to the specified data record field.

If the field is currently NULL, its NULL status is cleared.

If fieldDataLength is -1 (the default), fieldData is assumed to be NULL terminated and its length is calculated.

void SetFieldData(int fieldIndex, const char* fieldData, int fieldDataLength = -1);

Parameter Description

fieldIndex [IN] The field to set (0-based)

fieldData [IN] The field data to set

fieldDataLength [IN] The number of Latin1 characters in fieldData

This method sets the data of the fieldIndex field of the data record. fieldDataLength Latin1 characters are copied from the fieldData buffer to the specified data record field.

If the data is longer than the specified field's length, the data is truncated. If the data is shorter than the field's length, it is copied left-justified into the field.

If the field is currently NULL, its NULL status is cleared.

If fieldDataLength is -1 (the default), strlen() is used to determine the length of fieldData.
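The length and truncation rules for the Latin1 overload can be sketched as a function that computes what ends up in a field of a given length. fitToField is hypothetical and models only the documented rules: a length of -1 means strlen() is used, longer data is truncated, and shorter data starts at the first position of the field. Rejecting lengths below -1 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical model of how SetFieldData(int, const char*, int) fits data
// into a field of fieldLength characters; not SDK code.
std::string fitToField(const char* fieldData, int fieldDataLength, int fieldLength)
{
    // Assumption: lengths below -1 and non-positive field lengths are rejected.
    if (fieldData == nullptr || fieldDataLength < -1 || fieldLength <= 0)
        return std::string();
    // -1 (the default) means fieldData is NULL terminated, so measure it.
    std::size_t length = (fieldDataLength == -1)
        ? std::strlen(fieldData)
        : static_cast<std::size_t>(fieldDataLength);
    // Data longer than the field is truncated; shorter data is kept as is,
    // starting at the first character of the field (left-justified).
    if (length > static_cast<std::size_t>(fieldLength))
        length = static_cast<std::size_t>(fieldLength);
    return std::string(fieldData, length);
}
```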

void SetFieldData(int fieldIndex, const Date& fieldData);

Parameter Description

fieldIndex [IN] The field to set (0-based)

fieldData [IN] The field data to set

This method sets the data of the fieldIndex field of the data record.

If the field is currently NULL, its NULL status is cleared.

If the field datatype is not compatible with Date, an exception is thrown.

void SetFieldData(int fieldIndex, const DateTime& fieldData);

Parameter Description

fieldIndex [IN] The field to set (0-based)

fieldData [IN] The field data to set

This method sets the data of the fieldIndex field of the data record.

If the field is currently NULL, its NULL status is cleared.

If the field datatype is not compatible with DateTime, an exception is thrown.

void SetFieldData(int fieldIndex, const Time& fieldData);

Parameter Description

fieldIndex [IN] The field to set (0-based)

fieldData [IN] The field data to set

This method sets the data of the fieldIndex field of the data record.

If the field is currently NULL, its NULL status is cleared.

If the field datatype is not compatible with Time, an exception is thrown.

void SetFieldData(int fieldIndex, double fieldData);

Parameter Description

fieldIndex [IN] The field to set (0-based)

fieldData [IN] The field data to set

This method sets the data of the fieldIndex field of the data record.

If the field is currently NULL, its NULL status is cleared.

If the field datatype is not compatible with double, an exception is thrown.

void SetFieldData(int fieldIndex, int fieldData);

Parameter Description

fieldIndex [IN] The field to set (0-based)

fieldData [IN] The field data to set

This method sets the data of the fieldIndex field of the data record.

If the field is currently NULL, its NULL status is cleared.

If the field datatype is not compatible with int, an exception is thrown.

void SetFieldNull(int fieldIndex);

Parameter Description

fieldIndex [IN] The field to set to NULL (0-based)

This method sets the field to NULL.

4.9 MessageHandler

Syntax

Class MessageHandler is a callback class to handle messages from a transform. This class is required for processing and its interface must be implemented by the integrating application.

virtual bool HandleMessage(MESSAGETYPE messageType, const char* messageId, const uint16_t* message) = 0;

Parameter Description

messageType [IN] The type of message to handle

messageId [IN] The ID of the message to handle

message [IN] The message to handle

Implement this method to handle a message passed in by the transform. Return true on success; return false on error, which stops processing.
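A minimal implementation might keep warnings and errors as Latin1 text and ignore informational and trace messages. So that the sketch compiles on its own, it declares stand-in versions of MESSAGETYPE and MessageHandler that mirror the signature documented here; the enum's underlying values and the virtual destructor are assumptions. In a real application, remove the stand-ins and include the SDK header that declares MessageHandler. CollectingMessageHandler is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins mirroring the documented declarations; replace with the SDK header.
enum MESSAGETYPE { MESSAGETYPE_ERROR, MESSAGETYPE_WARNING, MESSAGETYPE_INFO, MESSAGETYPE_TRACE };

class MessageHandler
{
public:
    virtual ~MessageHandler() {}
    virtual bool HandleMessage(MESSAGETYPE messageType, const char* messageId,
                               const std::uint16_t* message) = 0;
};

// Hypothetical handler that records warnings and errors as Latin1 text.
class CollectingMessageHandler : public MessageHandler
{
public:
    std::vector<std::string> lines;

    bool HandleMessage(MESSAGETYPE messageType, const char* messageId,
                       const std::uint16_t* message) override
    {
        if (messageType == MESSAGETYPE_INFO || messageType == MESSAGETYPE_TRACE)
            return true;  // ignored, but handled successfully
        std::string line = (messageType == MESSAGETYPE_ERROR) ? "ERROR " : "WARNING ";
        line += (messageId != nullptr) ? messageId : "";
        line += ": ";
        // Same narrowing rule as ToLatin1: values above 255 become '?'.
        for (const std::uint16_t* p = message; p != nullptr && *p != 0; ++p)
            line += (*p <= 255) ? static_cast<char>(*p) : '?';
        lines.push_back(line);
        return true;  // returning false would stop processing
    }
};
```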

4.10 MultiRecordTransform

Syntax

Class MultiRecordTransform is the record processing Transform class for processing multiple records. This class is required for processing. Some of the methods listed here are inherited from the Transform class.

Instances of MultiRecordTransform cannot be instantiated directly. You must use the CreateMultiRecordTransform methods of TransformFactory to create valid MultiRecordTransform instances.

MultiRecordTransform objects are used to represent the Match transform.

MultiRecordTransformHelper* CreateHelper();

This method creates a helper for this transform. A helper processes records just as the transform itself does; typically a helper runs in a different thread than the transform. The helper has all the same settings as this transform, shares some of the transform's resources, and shares the transform's handlers (log, statistics, and so on). The advantage of using one or more helper objects instead of creating multiple identical transforms is that it saves resources and produces only one set of statistics. Returns a pointer to a newly created helper object.

This method is not thread safe.

void DestroyHelper(MultiRecordTransformHelper* helper);

Parameter Description

helper [IN] The helper to be destroyed

This method is used to destroy a helper that is no longer needed. All helpers of a transform must be destroyed before that transform is destroyed.

void LoadInputDataRecord(InputDataRecord* record);

Parameter Description

record [IN] The input data record to load

This method loads an input data record into this transform. The input data record is copied, so the passed-in input data record is available for use upon return from this method. Do not attempt to use an input data record that belongs to a different transform.

void Process();

This method processes the input data records that were loaded into this transform. When this method returns, the output data records with the posted results are ready to be unloaded.

void SetProgressHandler(ProgressHandler* handler);

Parameter Description

handler [IN] The progress event handler

This method sets the progress event handler. A progress event handler is called by the transform as it processes the loaded input records. A progress event handler is optional. This method saves a shallow copy of the passed-in progress event handler. It is the application's responsibility to not delete the event handler until this transform has been destroyed.

const OutputDataRecord* UnloadOutputDataRecord();

This method unloads the next available output data record from this transform. There should be one output data record for each input data record that was loaded. An output data record becomes available after the transform has finished processing the input data record and posting results to the output data record.

Normally output records will be available for unloading after Process() has been called. But it is possible for a transform to make an output record available for unloading immediately after the call to LoadInputDataRecord(). The application is free to call this method at any time to see if there are any output records available for unloading.

Once Process() is called, all current output records must be unloaded before any new input records are loaded for processing.

Returns the next available output record; returns 0 if no records are available.

void ClearRecords();

This method clears all input and output records, and readies the transform to process again. Call this method only after you have loaded your input records, processed, and extracted your output.
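The documented call sequence for one batch is: get the transform's input record once, fill and load it for each input, call Process(), unload output records until UnloadOutputDataRecord() returns 0, and then call ClearRecords(). The sketch below expresses that sequence as a function template so it can run against any type with the same method names. runBatch, FakeRecord, and FakeMultiRecordTransform are hypothetical; the fake only echoes its inputs so the sequence can be exercised without the SDK, and it does not model Match behavior.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical driver for one batch. fill(record, i) sets the fields of input i;
// consume(output) reads one output record. Returns the number of outputs unloaded.
template <typename Transform, typename Fill, typename Consume>
int runBatch(Transform& transform, int inputCount, Fill fill, Consume consume)
{
    auto* input = transform.GetInputDataRecord();  // fetched once, reused
    for (int i = 0; i < inputCount; ++i)
    {
        input->Clear();
        fill(*input, i);
        transform.LoadInputDataRecord(input);      // the record is copied
    }
    transform.Process();
    int unloaded = 0;
    // All outputs must be unloaded before new inputs are loaded.
    while (const auto* output = transform.UnloadOutputDataRecord())
    {
        consume(*output);
        ++unloaded;
    }
    transform.ClearRecords();                      // ready for the next batch
    return unloaded;
}

// Hypothetical stand-ins used only to exercise runBatch without the SDK.
struct FakeRecord
{
    std::string value;
    void Clear() { value.clear(); }
};

struct FakeMultiRecordTransform
{
    FakeRecord input;
    std::vector<FakeRecord> loaded;
    std::deque<FakeRecord> outputs;

    FakeRecord* GetInputDataRecord() { return &input; }
    void LoadInputDataRecord(FakeRecord* record) { loaded.push_back(*record); }
    void Process()
    {
        for (const FakeRecord& r : loaded)
            outputs.push_back(r);  // echo each input as an output
        loaded.clear();
    }
    const FakeRecord* UnloadOutputDataRecord()
    {
        if (outputs.empty())
            return nullptr;        // 0 means no records are available
        current = outputs.front();
        outputs.pop_front();
        return &current;
    }
    void ClearRecords()
    {
        loaded.clear();
        outputs.clear();
    }

private:
    FakeRecord current;
};
```

Clearing the input record before filling it is a design choice in this sketch, so that fields from a previous input cannot leak into the next one.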

ProgressHandler* GetProgressHandler();

This method returns a pointer to the current progress event handler.

void SetStatisticsHandler(StatisticsHandler* handler);

Parameter Description

handler [IN] The statistics event handler

This method sets the statistics event handler. A statistics event handler is called whenever a transform wishes to output statistics. Normally, statistics are output when the transform is terminating (see the method DestroyTransform()). A statistics event handler is optional. If omitted, the statistics are not output.

This method saves a shallow copy of the passed-in statistics event handler to be used for handling statistics events. It is the application's responsibility not to delete the event handler until this transform has been destroyed.

int GetExtraInfoLength(uint16_t* request);

Parameter Description

request [IN] A buffer with the request string

This method determines the length of the buffer to pass into the method GetExtraInfo().

void GetExtraInfo(uint16_t* request, uint16_t* response, int responseBufferLength);

Parameter Description

request [IN] A buffer with the request string

response [OUT] A buffer with a response string

responseBufferLength [IN] The length of the response buffer

This method is designed to get extra information from a transform. However, in its default implementation, the method returns only an empty string. If a transform has been customized to determine valid requests from this method and define appropriate responses, the information could be returned using this method. For example, the Global Address Cleanse transform could be customized to work with this method and return a list of available address cleanse engines.

InputDataRecord* GetInputDataRecord();

This method returns the input data record that holds this transform's input fields. An application can use this record to pass data to this transform. The application only has to get the input record once. The same input record may be used repeatedly.

StatisticsHandler* GetStatisticsHandler();

This method returns a pointer to the current statistics event handler.

const DataRecordSchema* GetInputSchema() const;

This method returns the schema of the input data record.

const DataRecordSchema* GetOutputSchema() const;

This method returns the schema of the output data record.

const StatisticsSchema* GetStatisticsSchema(int schemaIndex) const;

Parameter Description

schemaIndex [IN] The index of the statistics schema for which to get information

This method returns the statistics schema at index schemaIndex.

int GetStatisticsSchemaCount() const;

This method returns the number of statistics schemas defined for this transform. If statistics are not enabled, or if the transform does not provide statistics, the count returned will be 0.

4.11 MultiRecordTransformHelper

Syntax

Class MultiRecordTransformHelper is the helper class for the multi-record processing Transform class.

void LoadInputDataRecord(InputDataRecord* record);

Parameter Description

record [IN] The input data record to load

This method loads an input data record into this transform. The input data record is copied, so the passed-in input data record is available for use upon return from this method.

This method cannot process an input data record owned by a different transform.

void Process();

This method processes the input data records that were loaded into this transform. When this method returns, the output data records with the posted results are ready to be unloaded.

const OutputDataRecord* UnloadOutputDataRecord();

This method unloads the next available output data record from this transform. There should be one output data record for each input data record that was loaded. An output data record becomes available after the transform has finished processing the input data record and posting results to the output data record.

Normally output records are available for unloading after Process() has been called. However, a transform can make an output record available for unloading immediately after the call to LoadInputDataRecord(). The application is free to call this method at any time to check if there are any output records available for unloading.

Once Process() is called, all current output records must be unloaded before any new input records are loaded for processing.

Returns the next available output record, or 0 if no records are available.

void ClearRecords();

Clears all input and output records and makes the transform ready to process again.

Call this method only after you have loaded your input records, processed, and extracted your output.

InputDataRecord* GetInputDataRecord();

This method returns the input data record that holds this transform's input fields. An application can use this record to pass data to this transform. The application only has to get the input record once. The same input record may be used repeatedly.

4.12 OutputDataRecord

Syntax

Class OutputDataRecord is the main interface to the output data record functionality. It inherits its methods from the superclass DataRecord. This class is required for processing.

void GetFieldData(int fieldIndex, uint16_t* buffer, int bufferSize);

Parameter Description

fieldIndex [IN] The field to get (0-based)

buffer [OUT] Holds UCS2 data and a NULL terminator

bufferSize [IN] The size of buffer

Before calling GetFieldData you should call IsFieldNull using the same fieldIndex parameter. If IsFieldNull returns TRUE, the results of the function call are undefined and should not be used. If IsFieldNull returns FALSE, then you can call GetFieldData and reliably get the data of the fieldIndex field of the data record.

This method gets the data of the fieldIndex field of the data record. The data is copied to the buffer, including a NULL terminator. bufferSize indicates the number of UCS2 characters that will fit into the buffer.

If the data and NULL terminator are longer than the specified buffer length, the data is truncated. The NULL terminator is always copied.

If the field fieldIndex is NULL, no processing happens.

void GetFieldData(int fieldIndex, char* buffer, int bufferSize, char unicodeReplacement = '?');

Parameter Description

fieldIndex [IN] The field to get (0-based)

buffer [OUT] Holds Latin1 data and a NULL terminator

bufferSize [IN] The size of buffer

unicodeReplacement [IN] The character that is substituted if a character is encountered that cannot be recognized as a Latin1 character; set this parameter to 0 if you want characters that cannot be recognized as Latin1 to be removed

Before calling GetFieldData you should call IsFieldNull using the same fieldIndex parameter. If IsFieldNull returns TRUE, the results of the function call are undefined and should not be used. If IsFieldNull returns FALSE, then you can call GetFieldData and reliably get the data of the fieldIndex field of the data record.

This method gets the data of the fieldIndex field of the data record. The data is copied to the buffer, including a NULL terminator. bufferSize indicates the number of Latin1 characters that will fit into the buffer.

All UCS2 values above 255 are replaced with unicodeReplacement, or dropped if unicodeReplacement is 0. All UCS2 values less than or equal to 255 are copied as is. Locale and code page are not considered.

If the data and NULL terminator are longer than the specified buffer length, the data is truncated. The NULL terminator is always copied.

If the field fieldIndex is NULL, no processing happens.

void GetFieldData(int fieldIndex, uint16_t* buffer, int bufferSize, int& numCopied);

Parameter Description

fieldIndex [IN] The field to get (0-based)

buffer [OUT] Holds UCS2 data (not NULL-terminated)

bufferSize [IN] The size of buffer

numCopied [OUT] The number of UCS2 characters copied to buffer

Before calling GetFieldData you should call IsFieldNull using the same fieldIndex parameter. If IsFieldNull returns TRUE, the results of the function call are undefined and should not be used. If IsFieldNull returns FALSE, then you can call GetFieldData and reliably get the data of the fieldIndex field of the data record.

This method gets the data of the fieldIndex field of the data record. The data is copied to the buffer. bufferSize indicates the number of UCS2 characters that will fit into buffer. numCopied is set to the number of UCS2 characters copied into buffer. The buffer is not NULL-terminated.

If the data is longer than the specified buffer length, the data is truncated.

If the field fieldIndex is NULL, no processing happens.

void GetFieldData(int fieldIndex, Date& output);

Parameter Description

fieldIndex [IN] The field to get (0-based)

output [OUT] The resulting Date

Before calling GetFieldData you should call IsFieldNull using the same fieldIndex parameter. If IsFieldNull returns TRUE, the results of the function call are undefined and should not be used. If IsFieldNull returns FALSE, then you can call GetFieldData and reliably get the data of the fieldIndex field of the data record.

This method gets the data of the fieldIndex field of the data record. The data is copied to output.

If the field datatype is not compatible with Date, an exception is thrown.

If the field fieldIndex is NULL, no processing happens.

void GetFieldData(int fieldIndex, Time& output);

Parameter Description

fieldIndex [IN] The field to get (0-based)

output [OUT] The resulting Time

Before calling GetFieldData you should call IsFieldNull using the same fieldIndex parameter. If IsFieldNull returns TRUE, the results of the function call are undefined and should not be used. If IsFieldNull returns FALSE, then you can call GetFieldData and reliably get the data of the fieldIndex field of the data record.

This method gets the data of the fieldIndex field of the data record. The data is copied to output.

If the field datatype is not compatible with Time, an exception is thrown.

If the field fieldIndex is NULL, no processing happens.

void GetFieldData(int fieldIndex, DateTime& output);

Parameter Description

fieldIndex [IN] The field to get (0-based)

output [OUT] The resulting DateTime

Before calling GetFieldData you should call IsFieldNull using the same fieldIndex parameter. If IsFieldNull returns TRUE, the results of the function call are undefined and should not be used. If IsFieldNull returns FALSE, then you can call GetFieldData and reliably get the data of the fieldIndex field of the data record.

This method gets the data of the fieldIndex field of the data record. The data is copied to output.

If the field datatype is not compatible with DateTime, an exception is thrown.

If the field fieldIndex is NULL, no processing happens.

void GetFieldData(int fieldIndex, double& output);

Parameter Description

fieldIndex [IN] The field to get (0-based)

output [OUT] The resulting double

Before calling GetFieldData you should call IsFieldNull using the same fieldIndex parameter. If IsFieldNull returns TRUE, the results of the function call are undefined and should not be used. If IsFieldNull returns FALSE, then you can call GetFieldData and reliably get the data of the fieldIndex field of the data record.

This method gets the data of the fieldIndex field of the data record. The data is copied to output.

If the field datatype is not compatible with double, an exception is thrown.

If the field fieldIndex is NULL, no processing happens.

void GetFieldData(int fieldIndex, int& output);

Parameter Description

fieldIndex [IN] The field to get (0-based)

output [OUT] The resulting int

Before calling GetFieldData you should call IsFieldNull using the same fieldIndex parameter. If IsFieldNull returns TRUE, the results of the function call are undefined and should not be used. If IsFieldNull returns FALSE, then you can call GetFieldData and reliably get the data of the fieldIndex field of the data record.

This method gets the data of the fieldIndex field of the data record. The data is copied to output.

If the field datatype is not compatible with int, an exception is thrown.

If the field fieldIndex is NULL, no processing happens.

int GetFieldDataLength(int fieldIndex);

Parameter Description

fieldIndex [IN] The field to get (0-based)

Before calling GetFieldDataLength you should call IsFieldNull using the same fieldIndex parameter. If IsFieldNull returns TRUE, the results of the function call are undefined and should not be used. If IsFieldNull returns FALSE, then you can call GetFieldDataLength and reliably get the length of the fieldIndex field of the data record.

This method returns the number of characters in the field fieldIndex of the data record, or it returns 0 if the field is NULL.

bool IsFieldNull(int fieldIndex);

Parameter Description

fieldIndex [IN] The field to check (0-based)

This method determines whether the field fieldIndex is NULL. Returns TRUE if the field is NULL; otherwise, it returns FALSE.
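Because GetFieldData results are undefined for NULL fields, a small wrapper can enforce the IsFieldNull check before every read. readLatin1Field is hypothetical and is written as a template over any record type that has the documented IsFieldNull and Latin1 GetFieldData methods; FakeOutputRecord is a stand-in so the sketch runs without the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper: reads field fieldIndex as Latin1 text, checking IsFieldNull first.
// Returns false for a NULL field; otherwise stores up to maxChars characters in out.
// Assumption: maxChars is not negative.
template <typename Record>
bool readLatin1Field(Record& record, int fieldIndex, int maxChars, std::string& out)
{
    if (record.IsFieldNull(fieldIndex))
        return false;  // GetFieldData would be undefined here
    // One extra slot for the NULL terminator that GetFieldData always writes.
    std::vector<char> buffer(static_cast<std::size_t>(maxChars) + 1);
    record.GetFieldData(fieldIndex, buffer.data(), static_cast<int>(buffer.size()));
    out.assign(buffer.data());
    return true;
}

// Hypothetical stand-in with the same method shapes as OutputDataRecord.
struct FakeOutputRecord
{
    std::vector<std::string> values;
    std::vector<bool> nulls;

    bool IsFieldNull(int fieldIndex) const { return nulls[fieldIndex]; }
    void GetFieldData(int fieldIndex, char* buffer, int bufferSize) const
    {
        // Truncates and always writes the NULL terminator, as documented.
        std::size_t n = values[fieldIndex].copy(buffer, static_cast<std::size_t>(bufferSize) - 1);
        buffer[n] = '\0';
    }
};
```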

4.13 ProgressHandler

Syntax

Class ProgressHandler is a callback class to show the progress of a MultiRecordTransform and allow the handler to end processing.

virtual bool HandleProgress(double percentDone) = 0;

Parameter Description

percentDone [IN] The percent done (0.0 - 100.0)

This method shows the percentage of completion for the current set of records being processed.

Returns TRUE on success; otherwise, returns FALSE and stops processing.

bool SetProgressInterval(int interval);

Parameter Description

interval [IN] The number of seconds to wait between calls to HandleProgress()

This method specifies how long, in seconds, the transform should wait between calls to the HandleProgress() method. The interval must be greater than 0; values less than or equal to 0 are invalid and are ignored.

Returns TRUE on success; otherwise, returns FALSE to indicate an invalid interval.

int GetProgressInterval();

This method returns the current progress interval in seconds.
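A handler that records the latest percentage and stops processing on request might look like the following. The ProgressHandler declared here is a reduced stand-in that mirrors only the documented HandleProgress signature (the real class also provides SetProgressInterval and GetProgressInterval), and its virtual destructor is an assumption. CancellableProgressHandler and its cancelRequested flag are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Reduced stand-in for the SDK class; use the SDK header in real code.
class ProgressHandler
{
public:
    virtual ~ProgressHandler() {}
    virtual bool HandleProgress(double percentDone) = 0;
};

// Hypothetical handler: remembers the last reported percentage and returns
// false (which stops processing) once cancellation has been requested.
class CancellableProgressHandler : public ProgressHandler
{
public:
    std::atomic<bool> cancelRequested{false};
    std::atomic<double> lastPercent{0.0};

    bool HandleProgress(double percentDone) override
    {
        lastPercent = percentDone;
        return !cancelRequested.load();
    }
};
```

The members are atomic because the guide does not say which thread calls HandleProgress; this lets another thread, such as a user interface thread, set cancelRequested or read lastPercent safely.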

4.14 RecordTransform

Syntax

Class RecordTransform is the record processing Transform class for processing single records. This class is required for processing. Some of the methods listed here are inherited from the Transform class.

Instances of RecordTransform cannot be instantiated directly. You must use the CreateRecordTransform methods of TransformFactory to create valid RecordTransform instances.

RecordTransform objects are used to represent Data Cleanse, USA Regulatory Address Cleanse, Global Address Cleanse and Geocoder.

RecordTransformHelper* CreateHelper();

This method is used to create a helper for this transform. A helper is used to process records. Typically a helper is run in a different thread than the transform. The helper has all the same settings as this transform. The helper shares some of the transform's resources. It also shares the transform's handlers (for example, for logs and statistics). So the advantage of using one or more helper objects instead of creating multiple, identical transforms is the savings on resources and the production of only one set of statistics.

void DestroyHelper(RecordTransformHelper* helper);

Parameter Description

helper [IN] The helper to be destroyed

This method is used to destroy a helper that is no longer needed. All helpers of a transform must be destroyed before that transform is destroyed.
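The helper lifecycle described above (create helpers from one transform, use each from its own thread, destroy them all before the transform) can be sketched as follows. This is an illustration under assumptions, not SDK code: RunWithHelpers is a hypothetical function, TransformT stands for any type with the documented CreateHelper()/DestroyHelper() pair, and the work callback is expected to load and process records through the helper it receives. Helpers are created on the calling thread because the Java reference states that createHelper() is not thread safe, and this reference does not say the C++ method is.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical sketch: give each worker thread its own helper created from one
// shared transform. Helpers are created on the calling thread and all of them
// are destroyed before returning, so the transform can be destroyed afterwards.
template <typename TransformT, typename WorkFn>
void RunWithHelpers(TransformT& transform, int workerCount, WorkFn work) {
    using HelperPtr = decltype(transform.CreateHelper());
    std::vector<HelperPtr> helpers;
    for (int i = 0; i < workerCount; ++i)
        helpers.push_back(transform.CreateHelper());

    // Each thread uses only its own helper; work(helper, workerIndex) is
    // expected to load and process records through that helper.
    std::vector<std::thread> threads;
    for (int i = 0; i < workerCount; ++i)
        threads.emplace_back([&work, &helpers, i] { work(helpers[i], i); });
    for (std::thread& t : threads)
        t.join();

    // Destroy every helper before the caller destroys the transform.
    for (HelperPtr helper : helpers)
        transform.DestroyHelper(helper);
}
```

With this structure, a single DestroyTransform() call after RunWithHelpers() returns satisfies the rule that all helpers are destroyed first.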

const OutputDataRecord* Process(InputDataRecord* record);

Parameter Description

record [IN] The input data record to process

This method processes the input data record owned by this transform. Obtain the input data record by calling GetInputDataRecord() and load it with data before passing it to this method. This method reads the fields of the input data record, posts the results to an output data record, and returns that output data record, which can then be queried for the results. Do not attempt to process an input data record owned by a different transform. Returns the output data record on success; returns 0 on a non-fatal error.

void SetStatisticsHandler(StatisticsHandler* handler);

Parameter Description

handler [IN] The statistics event handler

This method sets the statistics event handler. A statistics event handler is called whenever a transform wishes to output statistics. Normally, statistics are output when the transform is terminating (see the method DestroyTransform()). A statistics event handler is optional. If omitted, the statistics are not output.

This method saves a shallow copy of the passed-in statistics event handler to be used for handling statistics events. It is the application's responsibility not to delete the event handler until this transform has been destroyed.

int GetExtraInfoLength(uint16_t* request);

Parameter Description

request [IN] A buffer with the request string

This method determines the length of the buffer to pass into the method GetExtraInfo().

void GetExtraInfo(uint16_t* request, uint16_t* response, int responseBufferLength);

Parameter Description

request [IN] A buffer with the request string

response [OUT] A buffer with a response string

responseBufferLength [IN] The length of the response buffer

This method is designed to get extra information from a transform. However, in its default implementation, the method returns only an empty string. If a transform has been customized to determine valid requests from this method and define appropriate responses, the information could be returned using this method. For example, the Global Address Cleanse transform could be customized to work with this method and return a list of available address cleanse engines.
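GetExtraInfoLength() and GetExtraInfo() are meant to be used together: query the required length, allocate a buffer, then fetch the response. A minimal sketch of that pattern follows. QueryExtraInfo is a hypothetical helper, TransformT is any type with the two documented methods, and two details are assumptions because this reference does not state them: that the request is a null-terminated UCS-2 string, and that the returned length counts uint16_t units. With the default implementation, the result is simply empty.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for the two-step GetExtraInfoLength()/GetExtraInfo()
// pattern. One extra uint16_t unit is reserved in case the reported length
// does not include the terminator (an assumption; the guide does not say).
template <typename TransformT>
std::vector<uint16_t> QueryExtraInfo(TransformT& transform, std::vector<uint16_t> request) {
    if (request.empty() || request.back() != 0)
        request.push_back(0);  // make sure the request is terminated

    int length = transform.GetExtraInfoLength(request.data());
    if (length < 0)
        length = 0;

    std::vector<uint16_t> response(static_cast<std::size_t>(length) + 1, 0);
    transform.GetExtraInfo(request.data(), response.data(), static_cast<int>(response.size()));
    response.back() = 0;  // guarantee termination even if the buffer was filled

    // Keep only the characters before the first terminator.
    for (std::size_t i = 0; i < response.size(); ++i) {
        if (response[i] == 0) {
            response.resize(i);
            break;
        }
    }
    return response;
}
```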

InputDataRecord* GetInputDataRecord();

This method returns the input data record that holds this transform's input fields. An application can use this record to pass data to this transform. The application only has to get the input record once. The same input record may be used repeatedly.

StatisticsHandler* GetStatisticsHandler();

This method returns a pointer to the current statistics event handler.

const DataRecordSchema* GetInputSchema() const;

This method returns the schema of the input data record.

const DataRecordSchema* GetOutputSchema() const;

This method returns the schema of the output data record.

const StatisticsSchema* GetStatisticsSchema(int schemaIndex) const;

Parameter Description

schemaIndex [IN] The index of the statistics schema for which to get information (0-based)

This method returns the statistics schema at index schemaIndex.

int GetStatisticsSchemaCount() const;

This method returns the number of statistics schemas defined for this transform. If statistics are not enabled, or if the transform does not provide statistics, the count returned will be 0.

4.15 RecordTransformHelper

Syntax

Class RecordTransformHelper is the helper class for the record processing Transform class.

const OutputDataRecord* Process(InputDataRecord* record);

Parameter Description

record [IN] The input data record to process

This method processes the input data record owned by this transform. The input data record can be obtained by calling GetInputDataRecord(). The input data record should be loaded with data before being passed to this method. This method will read the fields of the input data record and post results to an output data record.

Returns the output data record. The results can be queried from the output data record.

This method cannot process an input data record owned by a different transform.

InputDataRecord* GetInputDataRecord();

This method returns the input data record that holds this transform's input fields. An application can use this record to pass data to this transform. The application must get the input record only once. The same input record may be used repeatedly, but make sure you call Clear() before re-loading the record.

4.16 StatisticsHandler

Syntax

Class StatisticsHandler is a callback class to handle statistics records. This interface must be implemented by the integrating application.

virtual bool HandleStatistics(const OutputDataRecord* record) = 0;

Parameter Description

record [IN] The statistics record to output

This method is passed an output record that holds statistics information. The application may query the record to determine which statistics table the record belongs to.

The record pointer passed to this method should not be saved. The pointer becomes invalid after this method returns. The application must query and save any field data from the record that it intends to keep.

Returns TRUE upon success; otherwise, returns FALSE, which produces an error and stops processing.

int GetRecordsRemainingCount() const;

This method gets the number of output records that remain to be passed to the Output() method. If the transform has a block of records to send, the transform calls SetRecordsRemainingCount() before each call to Output() to indicate how many records are left to send. The application may use this information to buffer the records instead of processing them individually.

Returns the number of additional records ready to be output.

const StatisticsSchema* GetStatisticsSchema() const;

This method returns the statistics schema.

4.17 StatisticsSchema

Syntax

Class StatisticsSchema defines the layout of a statistics table.

DATATYPE GetFieldDataType(int fieldIndex, bool& status) const = 0;

Parameter Description

fieldIndex [IN] The field on which to get information (0-based)

status [OUT] TRUE on success; FALSE on a nonfatal error

This method gets the data type of the fieldIndex field.

const uint16_t* GetTableName();

This method returns the name of the table that this schema describes.

bool GetTableName(char* buffer, int bufferSize, char unicodeReplacement = '?');

Parameter Description

buffer [OUT] The buffer to hold the table name

bufferSize [IN] The size of the buffer

unicodeReplacement [IN] The character substituted if a character is encountered that cannot be represented as Latin1; set to 0 if you want characters that cannot be converted to be removed.

This method copies the name of the table that this schema describes into buffer as Latin1 text.

bool AllowNull(int fieldIndex);

Parameter Description

fieldIndex [IN] The field on which to get information (0-based)

This method indicates whether the fieldIndex field allows a NULL value. Returns TRUE if fieldIndex allows a NULL value or if fieldIndex is invalid; otherwise, returns FALSE.

bool IsPrimaryKey(int fieldIndex);

Parameter Description

fieldIndex [IN] The field on which to get information (0-based)

This method indicates whether the fieldIndex field is a primary key. Returns TRUE if fieldIndex is a primary key; otherwise, returns FALSE.

DataRecordSchema::DATATYPE GetDataType(int fieldIndex, bool& status);

Parameter Description

fieldIndex [IN] The field on which to get information (0-based)

status [OUT] FALSE if a non-fatal error has occurred

This method gets the datatype for the field.

4.18 Time

Syntax

Class Time represents a datatype. It is used to hold a time value for a record field.

bool SetTime(const char* timeStr);

Parameter Description

timeStr [IN] The time in the format HHMMSSF

This method sets the time of this object. The string must be in the format HHMMSSF, where HH is the hour, MM is the minutes, SS is the seconds, and F is an optional digit of the fraction, which can be repeated. The hours can range from 0 to 23. The minutes can range from 0 to 59. The seconds can range from 0 to 59. Each optional fraction digit can range from 0 to 9. If the string is not formatted correctly, the time value is not changed. Returns TRUE if the time is valid; otherwise, it returns FALSE.
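Because SetTime() leaves the value unchanged on malformed input, an application may want to reject bad strings before building records. The hypothetical IsValidHHMMSSF() below restates the documented HHMMSSF rules as code; it is a sketch of the format, not the SDK's own parser, and it assumes each of HH, MM, and SS is always exactly two digits.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical pre-check mirroring the documented HHMMSSF rules: two digits
// each for hours (00-23), minutes (00-59) and seconds (00-59), followed by
// zero or more fraction digits (0-9).
bool IsValidHHMMSSF(const char* timeStr) {
    if (timeStr == nullptr)
        return false;
    std::size_t len = std::strlen(timeStr);
    if (len < 6)
        return false;  // HHMMSS is mandatory
    for (std::size_t i = 0; i < len; ++i)
        if (!std::isdigit(static_cast<unsigned char>(timeStr[i])))
            return false;  // every position, including the fraction, is a digit
    auto twoDigits = [timeStr](std::size_t pos) {
        return (timeStr[pos] - '0') * 10 + (timeStr[pos + 1] - '0');
    };
    return twoDigits(0) <= 23 && twoDigits(2) <= 59 && twoDigits(4) <= 59;
}
```

For example, "0930455" passes (09:30:45 plus one fraction digit), while "240000" and "126000" fail on the hour and minute ranges.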

bool SetHours(int hours);

Parameter Description

hours [IN] The hours value from 0 to 23

This method sets the hours of this object. The hours can range from 0 to 23. Returns TRUE if the hours value is valid; otherwise, it returns FALSE.

bool SetMinutes(int minutes);

Parameter Description

minutes [IN] The minutes value from 0 to 59

This method sets the minutes of this object. The minutes can range from 0 to 59. Returns TRUE if the minutes value is valid; otherwise, it returns FALSE.

bool SetSeconds(int seconds);

Parameter Description

seconds [IN] The seconds value from 0 to 59

This method sets the seconds of this object. The seconds can range from 0 to 59. Returns TRUE if the seconds value is valid; otherwise, it returns FALSE.

bool SetFractionOfSeconds(double fraction);

Parameter Description

fraction [IN] The fractional seconds value.

This method sets the fractional seconds of this object. The value must be in the range 0.0 <= fraction < 1.0. Returns TRUE if the fraction is valid; otherwise, it returns FALSE.

void GetTime(char* timeStr, int bufferLength) const;

Parameter Description

bufferLength [IN] The size of the timeStr buffer

timeStr [OUT] The time value in HHMMSSF format

This method gets the time value of this object. The time is returned as a string with the format HHMMSSF, where HH is the hours, MM is the minutes, SS is the seconds, and F is one digit of the fraction, which can be repeated.

int GetHours() const;

This method gets the hours value of this object. It returns the hours value from 0 to 23.

int GetMinutes() const;

This method gets the minutes value of this object. It returns the minutes value from 0 to 59.

int GetSeconds() const;

This method gets the seconds value of this object. It returns the seconds value from 0 to 59.

double GetFractionOfSeconds() const;

This method gets the fractional seconds of this object. The value is in the range 0.0 <= fraction < 1.0.

4.19 TransformFactory

Syntax

Class TransformFactory is used to create a Transform. This class is required for processing.

MultiRecordTransform* CreateMultiRecordTransform(const char* transformSettingsBuffer, int transformSettingsBufferSize);

Parameter Description

transformSettingsBuffer [IN] The transform settings buffer

transformSettingsBufferSize [IN] The number of bytes in transformSettingsBuffer

This method creates a multi-record transform, using the XML found in transformSettingsBuffer.

transformSettingsBuffer is a buffer of bytes. The encoding is determined automatically or by the encoding XML attribute.

If you are passing UCS2 data (2 byte characters) to this method, then the encoding attribute in the XML must either not exist, or be UCS2/UTF16. If you are passing Latin1 (1 byte characters) to this method, the encoding attribute in the XML must either not exist, or be UTF-8/Latin1.

Returns a pointer to the created multi-record transform.
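The encoding rule above can be checked before calling the factory; the same rule applies to CreateRecordTransform, UpgradeTransformSettings, and ValidateTransformSettings. The sketch below is hypothetical: EncodingAttributeAllowed is not an SDK function, it takes the already-extracted value of the XML encoding attribute (empty if the attribute is absent), and comparing names case-insensitively and ignoring hyphens are assumptions, since the guide lists only UCS2/UTF16 and UTF-8/Latin1.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical pre-flight check of the documented rule: 2-byte (UCS2) data
// needs no encoding attribute or UCS2/UTF16; 1-byte (Latin1) data needs no
// attribute or UTF-8/Latin1. Case and hyphens are ignored (assumption).
bool EncodingAttributeAllowed(std::string declaredEncoding, bool twoByteChars) {
    if (declaredEncoding.empty())
        return true;  // no attribute: the encoding is determined automatically
    std::transform(declaredEncoding.begin(), declaredEncoding.end(), declaredEncoding.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    declaredEncoding.erase(std::remove(declaredEncoding.begin(), declaredEncoding.end(), '-'),
                           declaredEncoding.end());
    if (twoByteChars)
        return declaredEncoding == "UCS2" || declaredEncoding == "UTF16";
    return declaredEncoding == "UTF8" || declaredEncoding == "LATIN1";
}
```

Note also that transformSettingsBufferSize is a byte count, so for UCS2 data it is twice the number of characters.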

RecordTransform* CreateRecordTransform(const char* transformSettingsBuffer, int transformSettingsBufferSize);

Parameter Description

transformSettingsBuffer [IN] The transform settings buffer

transformSettingsBufferSize [IN] The number of bytes in transformSettingsBuffer

This method creates a record transform, using the XML found in transformSettingsBuffer.

transformSettingsBuffer is a buffer of bytes. The encoding is determined automatically or by the encoding XML attribute.

If you are passing UCS2 data (2 byte characters) to this method, then the encoding attribute in the XML must either not exist, or be UCS2/UTF16. If you are passing Latin1 (1 byte characters) to this method, the encoding attribute in the XML must either not exist, or be UTF-8/Latin1.

Returns a pointer to the created record transform.

void DestroyTransform(Transform* transform);

Parameter Description

transform [IN] The transform to destroy

This method destroys a record transform or a multi-record transform. Destroying a transform may cause final statistics to be passed to the statistics event handler.

const char* UpgradeTransformSettings(const char* transformSettings, int transformSettingsLength, int& upgradedSettingsLength);

Parameter Description

transformSettings [IN] The transform settings buffer

transformSettingsLength [IN] The number of bytes in transformSettings

upgradedSettingsLength [OUT] The actual length in bytes of the upgraded XML

This method upgrades a transform’s XML settings found in transformSettings. The transformSettings parameter is a buffer of bytes. The encoding is determined automatically or by the encoding XML attribute. If you are passing UCS2 data (2 byte characters) to this method, then the encoding attribute in the XML must either not exist, or must be set to UCS2 or UTF16. If you are passing Latin1 data (1 byte characters) to this method, then the encoding attribute in the XML must either not exist, or must be set to UTF-8 or Latin1.

Once the XML is successfully parsed, the version is checked. If the XML is current, the pointer to the passed-in buffer is returned. If the XML is not current, the XML is upgraded with the latest version string and other transform-specific changes. The upgraded XML is then stored as a string in an internal buffer, and that internal buffer is returned. The number of bytes in the returned buffer is stored in upgradedSettingsLength.

Returns the pointer to the passed-in buffer if the XML is current; otherwise, returns the internal buffer that holds the upgraded XML. If the internal buffer is returned, copy the data out of it before making any other calls to this object.
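Because the returned pointer can refer to an internal buffer, a caller should copy the result before doing anything else with the factory. A minimal sketch, assuming the settings are held in a std::string of Latin1 or UTF-8 bytes: UpgradeAndCopy is a hypothetical wrapper, FactoryT is any type with the documented UpgradeTransformSettings() signature, and returning an empty string when the call yields a null pointer is an assumption, since no failure value is documented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wrapper: calls UpgradeTransformSettings() and copies the result
// immediately, because an internal buffer may be overwritten by later calls.
template <typename FactoryT>
std::string UpgradeAndCopy(FactoryT& factory, const std::string& settings) {
    int upgradedLength = 0;
    const char* upgraded = factory.UpgradeTransformSettings(
        settings.data(), static_cast<int>(settings.size()), upgradedLength);
    if (upgraded == nullptr)
        return std::string();  // assumption: no failure value is documented
    if (upgraded == settings.data())
        return settings;       // XML was already current; the input buffer came back
    return std::string(upgraded, static_cast<std::size_t>(upgradedLength));
}
```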

bool ValidateTransformSettings(const char* transformSettingsBuffer, int transformSettingsBufferSize);

Parameter Description

transformSettingsBuffer [IN] The transform settings buffer

transformSettingsBufferSize [IN] The number of bytes in transformSettingsBuffer

This method validates the XML for the transform found in transformSettingsBuffer.

transformSettingsBuffer is a buffer of bytes. The encoding is determined automatically or by the encoding XML attribute.

If you are passing UCS2 data (2 byte characters) to this method, then the encoding attribute in the XML must either not exist, or be UCS2/UTF16. If you are passing Latin1 (1 byte characters) to this method, the encoding attribute in the XML must either not exist, or be UTF-8/Latin1.

void SetMessageHandler(MessageHandler* handler);

Parameter Description

handler [IN] The log event handler

This method sets the log event handler. A log event handler is called whenever a transform needs to output log information. A log event handler is required.

This method saves a shallow copy of the passed-in log event handler to be used for logging messages. The object is used by each subsequently created Transform. If each transform needs its own log event handler, the application must call this method with a new log event handler before each new transform is created. It is the application's responsibility not to delete the event handler until all transforms that use it have been destroyed.

void SetLocale(const char* locale);

Parameter Description

locale [IN] The locale to use

This method sets the locale to use for messages produced by Transforms. If the locale is not supported, a warning will be logged to the MessageHandler set using SetMessageHandler and all messages will default to en_US.

MessageHandler* GetMessageHandler();

This method returns a pointer to the current log event handler.

const char* GetLocale() const;

This method gets the locale that is currently being used. If the locale set by a call to SetLocale is supported, this method will return that value. If the locale set using SetLocale was not supported, the default locale of en_US will be returned.

static const char* GetVersion();

This method gets the version of the Data Quality Management SDK being used.

5 API reference for Java

This section details the API for the Java implementation.

The package for the Java API is com.sap.emdq.

By default, logging is not enabled. You must set a logger in order to use the logging capability of the methods in this API. We recommend that you set a log handler; otherwise you will not see specific warnings or error messages. See the documentation of the individual methods for details of the information that can be logged.

5.1 Message types

You can use the following message types in this implementation.

Message type Description

MESSAGETYPE_ERROR Used for error messages

MESSAGETYPE_WARNING Used for warning messages

MESSAGETYPE_INFO Used for information messages

MESSAGETYPE_TRACE Used for tracing messages

5.2 CertifiedReportGenerator

Syntax

Class CertifiedReportGenerator is an implementation of the public StatisticsHandler interface that can generate the certified mailing Statement of Address Accuracy (SERP), Address Matching Processing Summary (AMAS), and U.S. Coding Accuracy Support System (CASS) 3553 reports.

CertifiedReportGenerator()

This method is the constructor and must be run before use of the Certified Report Generator.

void Destroy()

The reports are created when this method is called, so call it only after all processing is done. The object is no longer valid after this call.

boolean handleStatistics(OutputDataRecord record)

Parameter Description

record The statistics record to output

This method implements the StatisticsHandler interface.

void setReportFile(ReportType reportType, String reportFile)

Parameter Description

reportType Which report to write to reportFile

reportFile A valid filename and path where the report is to be generated

This method tells CertifiedReportGenerator the path and file name where the reportType report is to be created. Valid reportType options are Cass3553Report, AmasReport, and SerpReport.

If the specified file exists, the previous version of the file is overwritten. If the path to the specified file does not exist, the file is not created and an error occurs.

You must call this method for each report you want generated, before using any transform.

Throws EmdqException.

Related Information

StatisticsHandler [page 62]

5.3 DataRecordSchema

Syntax

Class DataRecordSchema defines the layout of a Data Record.

int getFieldCount()

This method gets the field count.

Returns the number of fields defined in the data record.

Throws EmdqException.

int getFieldIndex(String fieldName)

Parameter Description

fieldName [IN] The name of the field to get

This method returns the field index of the fieldName field. Field names are case-insensitive (that is, NAME is equivalent to name).

Returns the field index (0-based) that can be used in the other methods that have a field index as a parameter; otherwise, if fieldName is invalid, a value of -1 is returned.

Throws EmdqException.

int getFieldLength(int fieldIndex)

Parameter Description

fieldIndex [IN] The field for which to get information (0-based)

This method gets the length of the field fieldIndex.

Returns the length of the fieldIndex field; otherwise, on a non-fatal error, returns 0.

Throws EmdqException.

String getFieldName(int fieldIndex)

Parameter Description

fieldIndex [IN] The field for which to get information (0-based)

This method gets the name of the field fieldIndex.

Returns the name of the fieldIndex field; otherwise, on a non-fatal error, returns null.

Throws EmdqException.

DataType getDataType(int fieldIndex)

Parameter Description

fieldIndex [IN] The field for which to get information (0-based)

This method gets the datatype of the field fieldIndex.

Returns the datatype of the fieldIndex field; otherwise, on a non-fatal error, returns null.

Throws EmdqException.

5.4 EmdqException

Syntax

Class EmdqException is the exception class thrown by all public interfaces of this product. This class is required for processing.

public String getMessageId()

This method returns the message ID of this exception object. The message ID is in the format CCCNNN, where C is an alpha character and N is a numeric character. The CCC represents the source of the error. The NNN is the message number. For example, REC001 is the first message for the DataRecord class.
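The CCCNNN layout does not depend on the language binding. As a sketch in C++ (the Java equivalent is a direct translation), the hypothetical SplitMessageId() below separates an ID such as REC001 into its three-letter source and its message number, and rejects strings that do not match the documented format.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: splits a CCCNNN message ID into its source (three
// letters) and message number (three digits). Returns false on a mismatch.
bool SplitMessageId(const std::string& id, std::string& source, int& number) {
    if (id.size() != 6)
        return false;
    for (std::size_t i = 0; i < 3; ++i)
        if (!std::isalpha(static_cast<unsigned char>(id[i])))
            return false;
    for (std::size_t i = 3; i < 6; ++i)
        if (!std::isdigit(static_cast<unsigned char>(id[i])))
            return false;
    source = id.substr(0, 3);          // for example, "REC" for DataRecord
    number = std::stoi(id.substr(3));  // "001" becomes 1
    return true;
}
```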

5.5 InputDataRecord

Syntax

Class InputDataRecord is the main interface to the Input Data Record functionality. It inherits from the superclass DataRecord. This class is required for processing.

void clear()

This method clears all of the fields of the data record. Each character field has a data length of 0.

Throws EmdqException.

void setStringData(int fieldIndex, String fieldData)

Parameter Description

fieldIndex [IN] The field to set (0-based)

fieldData [IN] The field data to set

This method sets the data of the fieldIndex field of the data record. The data is copied from the fieldData buffer.

Throws EmdqException.

void setDateData(int fieldIndex, Calendar fieldData)

Parameter Description

fieldIndex [IN] The field to set (0-based)

fieldData [IN] The field data to set

This method sets the data of the fieldIndex field of the data record. If the field is currently set to NULL, the NULL is cleared.

Throws EmdqException if the field datatype is not compatible with Date.

void setDateTimeData(int fieldIndex, Calendar fieldData)

Parameter Description

fieldIndex [IN] The field to set (0-based)

fieldData [IN] The field data to set

This method sets the data of the fieldIndex field of the data record. If the field is currently set to NULL, the NULL is cleared.

Throws EmdqException if the field datatype is not compatible with DateTime.

void setTimeData(int fieldIndex, Calendar fieldData)

Parameter Description

fieldIndex [IN] The field to set (0-based)

fieldData [IN] The field data to set

This method sets the data of the fieldIndex field of the data record. If the field is currently set to NULL, the NULL is cleared.

Throws EmdqException if the field datatype is not compatible with Time.

void setDoubleData(int fieldIndex, double fieldData)

Parameter Description

fieldIndex [IN] The field to set (0-based)

fieldData [IN] The field data to set

This method sets the data of the fieldIndex field of the data record. If the field is currently set to NULL, the NULL is cleared.

Throws EmdqException if the field datatype is not compatible with double.

void setIntegerData(int fieldIndex, int fieldData)

Parameter Description

fieldIndex [IN] The field to set (0-based)

fieldData [IN] The field data to set

This method sets the data of the fieldIndex field of the data record. If the field is currently set to NULL, the NULL is cleared.

Throws EmdqException if the field datatype is not compatible with int.

void setFieldNull(int fieldIndex)

Parameter Description

fieldIndex [IN] The field to set to NULL (0-based)

This method sets the field to NULL.

Throws EmdqException.

5.6 MessageHandler

Syntax

Class MessageHandler is a callback class to handle messages from a transform. This class is required for processing and its interface must be implemented by the integrating application.

MessageHandler()

This method is the protected scope constructor. It must be called before the extended class is used.

abstract boolean handleMessage(MessageType type, String messageId, String message)

Parameter Description

type The type of message to handle

messageId The ID of message to handle

message The message to handle

This method handles log messages produced by this product. Set delegate object on TransformFactory using the LogHandler property.

Returns true upon success; returns false upon error and stops processing.

5.7 MultiRecordTransform

Syntax

Class MultiRecordTransform is the record processing Transform class for processing multiple records. This class is required for processing. Some of the methods listed here are inherited from the Transform class.

MultiRecordTransform cannot be instantiated directly. Use the CreateMultiRecordTransform method of TransformFactory to create valid MultiRecordTransform instances.

MultiRecordTransform objects are used to represent Match.

MultiRecordTransformHelper createHelper()

This method is used to create a helper for this transform. A helper is used to process records, just like the transform itself does. Typically a helper is run in a different thread than the transform. The helper will have all the same settings as this transform. The helper will share some of the transform's resources. It will also share the transform's handlers (log, statistics, etc). So the advantage of using one or more helper objects instead of creating multiple, identical transforms is the savings on resources and the production of only one set of statistics.

Returns a newly created helper object.

This method is not thread safe.

Throws EmdqException.

void destroyHelper(MultiRecordTransformHelper helper)

Parameter Description

helper [IN] The helper to be destroyed

This method is used to destroy a helper that is no longer needed. All helpers of a transform must be destroyed before that transform is destroyed.

Throws EmdqException.

void loadInputDataRecord(InputDataRecord record)

Parameter Description

record [IN] The input data record to load

This method loads an input data record into this transform. The input data record is copied, so the passed-in input data record is available for use upon return from this method. Do not attempt to use an input data record that belongs to a different transform.

Throws EmdqException.

void process()

This method processes the input data records that were loaded into this transform. When this method returns, the output data records with the posted results are ready to be unloaded.

Throws EmdqException.

void setProgressHandler(ProgressHandler handler)

Parameter Description

handler [IN] The progress event handler

This method sets the progress event handler. A progress event handler is called by the transform as it processes the loaded input records. A progress event handler is optional. This method saves a shallow copy of the passed-in progress event handler. It is the application's responsibility to not delete the event handler until this transform has been destroyed.

Throws EmdqException.

OutputDataRecord unloadOutputDataRecord()

This method unloads the next available output data record from this transform. There should be one output data record for each input data record that was loaded. An output data record becomes available after the transform has finished processing the input data record and posting results to the output data record.

Normally, output records are available for unloading after process() has been called, but a transform may make an output record available for unloading immediately after the call to loadInputDataRecord(). The application is free to call this method at any time to see whether any output records are available for unloading.

Once process() is called, all current output records must be unloaded before any new input records are loaded for processing.

Returns the next available output record; returns null if no records are available.

Throws EmdqException.

void clearRecords()

This method clears all input and output records and readies the transform to process again. Call this method only after you have loaded your input records, processed them, and unloaded your output records.

Throws EmdqException.

ProgressHandler getProgressHandler()

This method returns the current progress event handler.

Throws EmdqException.

void setStatisticsHandler(StatisticsHandler handler)

Parameter Description

handler [IN] The statistics event handler

This method sets the statistics event handler. A statistics event handler is called whenever a transform wishes to output statistics. Normally, statistics are output when the transform is terminating (see the method DestroyTransform()). A statistics event handler is optional. If omitted, the statistics are not output.

This method saves a shallow copy of the passed-in statistics event handler to be used for handling statistics events. It is the application's responsibility not to delete the event handler until this transform has been destroyed.

Throws EmdqException.

String getExtraInfo(String request)

Parameter Description

request [IN] The request for the transform

This method is designed to get extra information from a transform. However, in its default implementation, the method returns only an empty string. If a transform has been customized to determine valid requests from this method and define appropriate responses, the information could be returned using this method. For example, the Global Address Cleanse transform could be customized to work with this method and return a list of available address cleanse engines.

Throws EmdqException.

InputDataRecord getInputDataRecord()

This method returns the input data record that holds this transform's input fields. An application can use this record to pass data to this transform. The application has to get the input record only once. The same input record may be used repeatedly.

Throws EmdqException.

StatisticsHandler getStatisticsHandler()

This method returns the current statistics event handler.

Throws EmdqException.

DataRecordSchema getInputSchema()

This method returns the schema of the input data record.

Throws EmdqException.

DataRecordSchema getOutputSchema()

This method returns the schema of the output data record.

Throws EmdqException.

StatisticsSchema getStatisticsSchema(int schemaIndex)

Parameter Description

schemaIndex [IN] The statistics schema for which to get information (0-based)

This method returns the statistics schema at index schemaIndex.

Throws EmdqException.

int getStatisticsSchemaCount()

This method returns the number of statistics schemas defined for this transform. If statistics are not enabled, or if the transform does not provide statistics, the count returned will be 0.
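Because schema indexes are 0-based and the count is 0 when statistics are disabled or unsupported, a plain counted loop visits every statistics table safely. The sketch below is illustrative only: StatsSchema and StatsSource are hypothetical stand-ins that mirror the documented getStatisticsSchemaCount(), getStatisticsSchema(int), and getTableName() calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for StatisticsSchema (only the table name is used).
interface StatsSchema {
    String getTableName();
}

// Hypothetical stand-in for the statistics accessors of a transform.
interface StatsSource {
    int getStatisticsSchemaCount();
    StatsSchema getStatisticsSchema(int schemaIndex); // 0-based
}

final class StatsTables {
    // Collects the name of every statistics table a transform defines.
    // Returns an empty list when the count is 0.
    static List<String> tableNames(StatsSource transform) {
        List<String> names = new ArrayList<>();
        int count = transform.getStatisticsSchemaCount();
        for (int i = 0; i < count; i++) {
            names.add(transform.getStatisticsSchema(i).getTableName());
        }
        return names;
    }
}
```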

Throws EmdqException.

5.8 MultiRecordTransformHelper

Syntax

Class MultiRecordTransformHelper is the helper class for the multi-record processing Transform class.

Default Constructor MultiRecordTransformHelper();

Default Destructor virtual ~MultiRecordTransformHelper();

void loadInputDataRecord(InputDataRecord record)

Parameter Description

record [IN] The input data record to load

This method loads an input data record into this transform. The input data record is copied, so the passed-in input data record is available for use upon return from this method.

This method cannot process an input data record owned by a different transform.

Throws EmdqException.

void process()

This method processes the input data records that were loaded into this transform. When this method returns, the output data records with the posted results are ready to be unloaded.

Throws EmdqException.

OutputDataRecord unloadOutputDataRecord()

This method unloads the next available output data record from this transform. There should be one output data record for each input data record that was loaded. An output data record becomes available after the transform has finished processing the input data record and posting results to the output data record.

Normally, output records are available for unloading after process() has been called. However, a transform can make an output record available for unloading immediately after the call to loadInputDataRecord(). The application is free to call this method at any time to check whether any output records are available for unloading.

Once process() is called, all current output records must be unloaded before any new input records are loaded for processing.

Returns the next available output record; returns null if no records are available.

Throws EmdqException.

void clearRecords()

This method clears all input records and readies the transform to process again.

Call this method only after you have loaded your input records, processed, and extracted your output.

Throws EmdqException.

InputDataRecord getInputDataRecord()

This method returns the input data record that holds this transform's input fields. An application can use this record to pass data to this transform. The application must get the input record only once. The same input record may be used repeatedly.

5.9 OutputDataRecord

Syntax

Class OutputDataRecord is the main interface to the Output Data Record functionality. It inherits its methods from the superclass DataRecord. This class is required for processing.

OutputDataRecord(long internalPtr)

Parameter Description

internalPtr The long value representing the C++ pointer to the OutputDataRecord object in native code

This method is the package scope constructor of this class.

String getStringData(int fieldIndex)

Parameter Description

fieldIndex [IN] The field to get (0-based)

This method gets the data of the fieldIndex field of the data record and returns it as a newly created string.

Returns the field data on success; otherwise, returns null if there is no data.

Throws EmdqException.

Calendar getDateData(int fieldIndex)

Parameter Description

fieldIndex [IN] The field to get (0-based)

This method gets the Date data from a field and returns it as a copy in a Calendar object. If the field datatype is not compatible with Date, an exception is thrown.

Any time information stored within the Calendar object will be invalid.

Returns the field data. If the field is null, no processing occurs.

Throws EmdqException.

Calendar getTimeData(int fieldIndex)

Parameter Description

fieldIndex [IN] The field to get (0-based)

This method gets the Time data from a field and returns it as a copy in a Calendar object. If the field datatype is not compatible with Time, an exception is thrown.

Any date information stored within the Calendar object will be invalid.

Returns the field data. If the field is null, no processing occurs.

Throws EmdqException.

Calendar getDateTimeData(int fieldIndex)

Parameter Description

fieldIndex [IN] The field to get (0-based)

This method gets the DateTime data from a field and returns it as a copy in a Calendar object. If the field datatype is not compatible with DateTime, an exception is thrown.

Returns the field data. If the field is null, no processing occurs.

Throws EmdqException.

double getDoubleData(int fieldIndex)

Parameter Description

fieldIndex [IN] The field to get (0-based)

This method gets the double data from a field and returns it as a double. If the field datatype is not compatible with double, an exception is thrown.

Returns the field data. If the field is null, no processing occurs.

Throws EmdqException.

int getIntData(int fieldIndex)

Parameter Description

fieldIndex [IN] The field to get (0-based)

This method gets the int data from a field and returns it as an int. If the field datatype is not compatible with int, an exception is thrown.

Returns the field data. If the field is null, no processing occurs.

Throws EmdqException.

boolean isFieldNull(int fieldIndex)

Parameter Description

fieldIndex [IN] The field to get (0-based)

This method determines if the field fieldIndex is null.

Returns TRUE if the field is null; otherwise, it returns FALSE.

Throws EmdqException.

int getFieldDataLength(int fieldIndex)

Parameter Description

fieldIndex [IN] The field to get (0-based)


Before calling getFieldDataLength, call isFieldNull with the same fieldIndex parameter. If isFieldNull returns TRUE, the result of getFieldDataLength is undefined and should not be used. If isFieldNull returns FALSE, you can reliably get the data of the fieldIndex field of the data record.

This method gets the field data length.

Returns the number of characters in the field fieldIndex of the data record; returns -1 if fieldIndex is invalid.
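The null check that must precede getFieldDataLength can be wrapped once so callers never see the undefined result. In this minimal sketch, FieldReader is a hypothetical stand-in exposing only isFieldNull and getFieldDataLength; returning an OptionalInt (rather than -1) is a design choice that keeps "field is null" distinct from the documented -1 for an invalid index.

```java
import java.util.OptionalInt;

// Hypothetical stand-in exposing only the two documented calls.
interface FieldReader {
    boolean isFieldNull(int fieldIndex);
    int getFieldDataLength(int fieldIndex);
}

final class FieldLengths {
    // Empty when the field is null, so the undefined length is never read.
    static OptionalInt safeLength(FieldReader record, int fieldIndex) {
        if (record.isFieldNull(fieldIndex)) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(record.getFieldDataLength(fieldIndex));
    }
}
```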

Throws EmdqException.

5.10 ProgressHandler

Syntax

Class ProgressHandler is a callback class to show the progress of a MultiRecordTransform and allow the handler to end processing.

protected abstract boolean handleProgress(double percentDone)

Parameter Description

percentDone [IN] The percent done (0.0 - 100.0)

This method shows the percentage of completion for the current set of records being processed. Returns TRUE to continue processing; otherwise, returns FALSE to stop processing.

boolean setProgressInterval(int interval)

Parameter Description

interval [IN] The number of seconds to wait between calls to handleProgress()

This method specifies the interval, in seconds, that the transform waits between calls to handleProgress(). The interval must be greater than 0.

Throws EmdqException.

int getProgressInterval()

This method returns the current progress interval in seconds.

Throws EmdqException.
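Since handleProgress returns true to continue and false to stop, a handler can double as a cancellation switch. The sketch below assumes nothing beyond that documented contract: SketchProgressHandler is a hypothetical stand-in mirroring the abstract method, and ProgressDemo.simulate is a hypothetical driver that plays the transform's role by reporting every 25%.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in mirroring the documented callback signature.
abstract class SketchProgressHandler {
    protected abstract boolean handleProgress(double percentDone);
}

// Records each report and lets another thread request cancellation.
final class CancellableProgress extends SketchProgressHandler {
    private final AtomicBoolean cancelRequested = new AtomicBoolean(false);
    final List<Double> reports = new ArrayList<>(); // kept for illustration only

    void requestCancel() {
        cancelRequested.set(true);
    }

    @Override
    protected boolean handleProgress(double percentDone) {
        reports.add(percentDone);
        return !cancelRequested.get(); // true = keep processing, false = stop
    }
}

final class ProgressDemo {
    // Hypothetical driver: reports progress in 25% steps, requests cancellation
    // once cancelAt is reached, and stops as soon as the handler returns false.
    static double simulate(CancellableProgress handler, double cancelAt) {
        double done = 0.0;
        for (int step = 1; step <= 4; step++) {
            done = step * 25.0;
            if (done >= cancelAt) {
                handler.requestCancel();
            }
            if (!handler.handleProgress(done)) {
                break;
            }
        }
        return done;
    }
}
```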

5.11 RecordTransform

Syntax

Class RecordTransform is the record processing Transform class for processing single records. This class is required for processing. Some of the methods listed here are inherited from the Transform class.

Instances of RecordTransform cannot be instantiated directly. You must use the createRecordTransform method in TransformFactory to create valid RecordTransform instances.

RecordTransform objects are used to represent Data Cleanse, USA Regulatory Address Cleanse, Global Address Cleanse, and Geocoder.

RecordTransform(long internalPtr)

Parameter Description

internalPtr The long value representing the C++ pointer to the C++ RecordTransform object in native code

This method is the package scoped constructor for RecordTransform. It should be called only from TransformFactory.

RecordTransformHelper createHelper()

This method is used to create a helper for this transform. A helper is used to process records. Typically a helper is run in a different thread than the transform. The helper has all the same settings as this transform, shares some of the transform's resources, and shares the transform's handlers (for example, for logs and statistics). The advantage of using one or more helper objects instead of creating multiple identical transforms is the savings in resources and the production of only one set of statistics.

Returns a newly created helper object.

Throws EmdqException.

void destroyHelper(RecordTransformHelper helper)

Parameter Description

helper [IN] The helper to be destroyed

This method is used to destroy a helper that is no longer needed. All helpers of a transform must be destroyed before that transform is destroyed.

Throws EmdqException.
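The helper lifecycle (one helper per worker thread, every helper destroyed before its transform) can be sketched with a standard ExecutorService. SketchTransform and SketchHelper are hypothetical stand-ins, not SDK classes; their "processing" just trims strings. Creating the helpers on the calling thread is a cautious assumption, since the .Net reference notes that CreateHelper() is not thread safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a helper: "processing" just trims the record.
final class SketchHelper {
    String process(String input) {
        return input.trim();
    }
}

// Hypothetical stand-in for a transform that tracks its live helpers.
final class SketchTransform {
    private final List<SketchHelper> live = new ArrayList<>();

    synchronized SketchHelper createHelper() {
        SketchHelper helper = new SketchHelper();
        live.add(helper);
        return helper;
    }

    synchronized void destroyHelper(SketchHelper helper) {
        live.remove(helper);
    }

    // Mirrors the documented rule: no helper may outlive its transform.
    synchronized boolean canBeDestroyed() {
        return live.isEmpty();
    }
}

final class HelperPool {
    // Gives each partition its own helper on a worker thread, then destroys
    // every helper before returning.
    static List<String> processInParallel(SketchTransform transform,
                                          List<List<String>> partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        List<SketchHelper> helpers = new ArrayList<>();
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> part : partitions) {
                SketchHelper helper = transform.createHelper(); // created on this thread
                helpers.add(helper);
                futures.add(pool.submit(() -> {
                    List<String> out = new ArrayList<>();
                    for (String record : part) {
                        out.add(helper.process(record));
                    }
                    return out;
                }));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                all.addAll(f.get()); // results kept in partition order
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
            try {
                pool.awaitTermination(1, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            for (SketchHelper helper : helpers) {
                transform.destroyHelper(helper);
            }
        }
    }
}
```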

OutputDataRecord process(InputDataRecord record)

Parameter Description

record [IN] The input data record to process

This method processes the input data record owned by this transform. The input data record can be obtained by calling getInputDataRecord() and should be loaded with data before being passed to this method. This method reads the fields of the input data record, posts the results to an output data record, and returns that output data record, from which the results can be queried. Do not attempt to process an input data record owned by a different transform.

Values in the returned OutputDataRecord are valid only until process() is called again.

Returns output data record on success; returns null on a nonfatal error.

Throws EmdqException.
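Because output values stay valid only until the next call to process(), a loop over many records should copy what it needs before processing the next one. In this sketch, SketchRecordTransform and ReusedOutput are hypothetical stand-ins; modelling the invalidation as reuse of a single output object is an assumption made only to make the hazard visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: one output object that each process() call overwrites.
final class ReusedOutput {
    private String value;

    void set(String v) {
        value = v;
    }

    String getStringData(int fieldIndex) {
        return value; // single-field toy record; fieldIndex is ignored
    }
}

// Hypothetical stand-in whose "processing" lower-cases the input.
final class SketchRecordTransform {
    private final ReusedOutput output = new ReusedOutput();

    ReusedOutput process(String input) {
        output.set(input.toLowerCase());
        return output;
    }
}

final class SingleRecordLoop {
    static List<String> run(SketchRecordTransform transform, List<String> inputs) {
        List<String> saved = new ArrayList<>();
        for (String in : inputs) {
            ReusedOutput out = transform.process(in);
            // Copy the field now; the next process() call invalidates it.
            saved.add(out.getStringData(0));
        }
        return saved;
    }
}
```

Keeping the returned record object instead of its copied values would leave every saved entry showing the last result.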

void setStatisticsHandler(StatisticsHandler handler)

Parameter Description

handler [IN] The statistics event handler

This method sets the statistics event handler. A statistics event handler is called whenever a transform outputs statistics. Normally, statistics are output when the transform is terminating (see the TransformFactory method destroyTransform()). A statistics event handler is optional; if none is set, statistics are not output.

This method saves a shallow copy of the passed-in statistics event handler to be used for handling statistics events. It is the application's responsibility not to delete the event handler until this transform has been destroyed.

Throws EmdqException.

String getExtraInfo(String request)

Parameter Description

request [IN] The request for the transform

This method is designed to get extra information from a transform. However, in its default implementation, the method returns only an empty string. If a transform has been customized to determine valid requests from this method and define appropriate responses, the information could be returned using this method. For example, the Global Address Cleanse transform could be customized to work with this method and return a list of available address cleanse engines.

Throws EmdqException.

InputDataRecord getInputDataRecord()

This method returns the input data record that holds this transform's input fields. An application can use this record to pass data to this transform. The application has to get the input record only once. The same input record may be used repeatedly.

Throws EmdqException.

StatisticsHandler getStatisticsHandler()

This method returns the current statistics event handler.

Throws EmdqException.

DataRecordSchema getInputSchema()

This method returns the schema of the input data record.

Throws EmdqException.

DataRecordSchema getOutputSchema()

This method returns the schema of the output data record.

Throws EmdqException.

StatisticsSchema getStatisticsSchema(int schemaIndex)

Parameter Description

schemaIndex [IN] The statistics schema for which to get information (0-based)

This method returns the statistics schema at index schemaIndex.

Throws EmdqException.

int getStatisticsSchemaCount()

This method returns the number of statistics schemas defined for this transform. If statistics are not enabled, or if the transform does not provide statistics, the count returned will be 0.

Throws EmdqException.

5.12 RecordTransformHelper

Syntax

Class RecordTransformHelper is the helper class for the record processing Transform class.

OutputDataRecord process(InputDataRecord record)

Parameter Description

record [IN] The input data record to process

This method processes the input data record owned by this transform. The input data record can be obtained by calling getInputDataRecord() and should be loaded with data before being passed to this method. This method reads the fields of the input data record and posts the results to an output data record.

Returns the output data record. The results can be queried from the output data record.

This method cannot process an input data record owned by a different transform.

Throws EmdqException.

InputDataRecord getInputDataRecord()

This method returns the input data record that holds this transform's input fields. An application can use this record to pass data to this transform. The application must get the input record only once. The same input record may be used repeatedly.

Throws EmdqException.

5.13 StatisticsHandler

Syntax

Class StatisticsHandler is a callback class to handle statistics records. This interface must be implemented by the integrating application.

abstract boolean handleStatistics(OutputDataRecord record);

Parameter Description

record [IN] The output statistics record to output

This method is passed an output record that holds statistics information. The application may query the record to determine which statistics table the record belongs to.

Do not save the record passed to this method; it becomes invalid after this method returns. The application must query and save any field data from the record that it intends to keep.

Returns TRUE on success; otherwise, returns FALSE, which produces an error and stops processing.

int getRecordsRemainingCount()

This method gets the number of output records that remain to be passed to the handleStatistics() method. If the transform has a block of records to send, the transform calls SetRecordsRemainingCount() before each call to handleStatistics() to indicate how many records are left to send. The application may use this information to buffer the records instead of processing them individually.

Returns the number of additional records ready to be output.

StatisticsSchema getStatisticsSchema()

This method returns the statistics schema.
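The two rules above (copy data out because the record becomes invalid, and optionally buffer using the remaining count) combine naturally in one handler. This is a hedged sketch: StatsRecord and BufferingStatsHandler are hypothetical stand-ins, and for brevity the remaining count is passed in as a parameter instead of being read from getRecordsRemainingCount().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a statistics record: table name plus one value.
final class StatsRecord {
    final String table;
    final int value;

    StatsRecord(String table, int value) {
        this.table = table;
        this.value = value;
    }
}

// Copies each record's data immediately and flushes a batch when the
// remaining count reaches 0.
final class BufferingStatsHandler {
    private final List<String> pending = new ArrayList<>();
    final List<List<String>> flushedBatches = new ArrayList<>();

    boolean handleStatistics(StatsRecord record, int recordsRemaining) {
        pending.add(record.table + "=" + record.value); // copy; never keep the record
        if (recordsRemaining == 0) {
            flushedBatches.add(Collections.unmodifiableList(new ArrayList<>(pending)));
            pending.clear();
        }
        return true; // FALSE would signal an error and stop processing
    }
}
```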

5.14 StatisticsSchema

Syntax

Class StatisticsSchema defines the layout of a statistics table.

DataType getFieldDataType(int fieldIndex)

Parameter Description

fieldIndex [IN] The field on which to get information (0-based)

This method gets the data type of the fieldIndex field.

Throws EmdqException.

String getTableName()

This method returns the name of the table that this schema describes.

Throws EmdqException.

boolean allowNull(int fieldIndex)

Parameter Description

fieldIndex [IN] The field on which to get information (0-based)

This method indicates whether the fieldIndex field allows a NULL value. Returns TRUE if fieldIndex allows a NULL value or if fieldIndex is invalid; otherwise, returns FALSE.

Throws EmdqException.

boolean isPrimaryKey(int fieldIndex)

Parameter Description

fieldIndex [IN] The field on which to get information (0-based)

This method indicates whether the fieldIndex field is a primary key. Returns TRUE if fieldIndex is a primary key; otherwise, returns FALSE.

Throws EmdqException.

5.15 TransformFactory

Syntax

Class TransformFactory is used to create a Transform. This class is required for processing.

synchronized MultiRecordTransform createMultiRecordTransform(String transformSettings)

Parameter Description

transformSettings [IN] The transform settings buffer

This method creates a multi-record transform, using the XML found in transformSettings.

Returns a handle to the created multi-record transform.

Throws EmdqException.

synchronized RecordTransform createRecordTransform(String transformSettings)

Parameter Description

transformSettings [IN] The transform settings buffer

This method creates a record transform, using the XML found in transformSettings.

Returns a handle to the created record transform.

Throws EmdqException.

void destroyTransform(Transform transform)

Parameter Description

transform [IN] The transform to destroy

This method destroys a record transform or a multi-record transform. Destroying a transform may cause final statistics to be passed to the statistics event handler. You must call this method when you are finished using a transform instance.

Throws EmdqException.

synchronized String upgradeTransformSettings(String transformSettings)

Parameter Description

transformSettings [IN] The transform settings buffer

This method upgrades the transform settings. It upgrades a transform’s XML settings found in transformSettings and returns the upgraded settings as a String.

Throws EmdqException.

synchronized boolean validateTransformSettings(String transformSettings)

Parameter Description

transformSettings [IN] The transform settings buffer

This method validates the XML for the transform found in transformSettings. Returns TRUE if the transform settings had no error; otherwise, returns FALSE.

Throws EmdqException.
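A typical factory lifecycle validates the settings, creates the transform, and guarantees that destroyTransform runs even if processing fails, so that any final statistics still reach the statistics handler. SketchFactory and SketchTransformHandle are hypothetical stand-ins (the validation check is a toy, not real XML validation); only the method names follow the documented API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a created transform.
final class SketchTransformHandle {
    final String settings;

    SketchTransformHandle(String settings) {
        this.settings = settings;
    }
}

// Hypothetical stand-in for TransformFactory that records lifecycle events.
final class SketchFactory {
    final List<String> events = new ArrayList<>();

    boolean validateTransformSettings(String xml) {
        return xml != null && xml.trim().startsWith("<"); // toy check only
    }

    SketchTransformHandle createRecordTransform(String xml) {
        events.add("create");
        return new SketchTransformHandle(xml);
    }

    void destroyTransform(SketchTransformHandle transform) {
        events.add("destroy");
    }
}

final class Lifecycle {
    // Validate first; once created, always destroy, even when work throws.
    static boolean runOnce(SketchFactory factory, String xml, Runnable work) {
        if (!factory.validateTransformSettings(xml)) {
            return false;
        }
        SketchTransformHandle transform = factory.createRecordTransform(xml);
        try {
            work.run();
            return true;
        } finally {
            factory.destroyTransform(transform);
        }
    }
}
```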

void setMessageHandler(MessageHandler logEventHandler)

Parameter Description

logEventHandler [IN] The log event handler

This method sets the log event handler. A log event handler is called whenever a transform wishes to output log information. A log event handler is required.

public void setLocale(String locale)

Parameter Description

locale [IN] The locale to use

This method sets the locale to use for messages produced by Transforms. If the locale is not supported, a warning is logged to the MessageHandler set using setMessageHandler, and all messages default to en_US.

public MessageHandler getMessageHandler()

This method returns the current log event handler.

String getLocale()

This method gets the locale that is currently being used. If the locale set by a call to setLocale is supported, this method returns that value. If it is not supported, the default locale, en_US, is returned.
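The fallback rule shared by setLocale and getLocale reduces to a single lookup. In this sketch, LocaleFallback is a hypothetical helper and the supported-locale set is a placeholder supplied by the caller; the SDK's actual list of supported locales is not documented here.

```java
import java.util.Set;

final class LocaleFallback {
    static final String DEFAULT_LOCALE = "en_US";

    // Returns the requested locale if supported, otherwise en_US.
    // The supported set is a hypothetical placeholder, not the SDK's list.
    static String effectiveLocale(String requested, Set<String> supported) {
        return supported.contains(requested) ? requested : DEFAULT_LOCALE;
    }
}
```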

static const char* getVersion();

This method gets the version of the Data Quality Management SDK being used.

6 API reference for .Net

This section details the API for the .Net implementation.

The namespace for the .Net API is Sap.Emdq.

Ensure the \\bin folder is added to your PATH environment variable prior to launching Visual Studio.

In Visual Studio, set the application type within the application integrating this product to x86 for 32 bit applications and x64 for 64 bit applications. The default value of “Either” is not sufficient.

By default, logging is not enabled. You must set a logger in order to use the logging capability of the methods in this API. We recommend that you set a log handler; otherwise you will not see specific warnings or error messages. See the documentation of the individual methods for details of the information that can be logged.

6.1 Message types

You can use the following message types in this implementation.

Message type Description

ErrorMessage Used for error messages

WarningMessage Used for warning messages

InfoMessage Used for information messages

TraceMessage Used for tracing messages

6.2 EmDQException

Syntax

Class EmDQException is the exception class thrown by all public interfaces of this product. This class is derived from the Exception class, and the text of the message can be found in the Message member. This class is required for processing.

property System::String^ MessageId

This property returns the message ID of this exception object. The message ID is in the format CCCNNN, where C is an alpha character and N is a numeric character. The CCC represents the source of the error. The NNN is the message number. For example, REC001 is the first message for the DataRecord class.

property System::String^ Message

This property is a member of the base class, Exception. It contains the content of the message.

6.3 LogHandler

Syntax

Class LogHandler is a delegate object used to pass messages from the SDK to the integrating application.

delegate bool LogHandler(LogMessageType type, string messageId, string message)

Parameter Description

type The type of message to handle

messageId The ID of the message to handle

message The message to handle

This delegate passes messages from the SDK to the integrating application.

6.4 MultiRecordProgressHandler

Syntax

Class MultiRecordProgressHandler is a delegate object used to pass progress information from the SDK to the integrating application.

delegate bool MultiRecordProgressHandler(double percentDone)

Parameter Description

percentDone The percent of progress

This delegate passes progress information from the SDK to the integrating application.

6.5 MultiRecordTransform

Syntax

Class MultiRecordTransform is the record processing Transform class for processing multiple records. This class is required for processing. Some of the methods listed here are inherited from the Transform class.

Instances of MultiRecordTransform cannot be instantiated directly. You must use the CreateMultiRecordTransform method in TransformFactory to create valid MultiRecordTransform instances.

MultiRecordTransform objects are used to represent Match.

System::Data::DataTable^ Process(System::Data::DataTable^ input);

Parameter Description

input The collection of input data records to process

This method processes the input data records in input and returns a DataTable containing the output data records with the posted results.

To monitor the progress of processing, register a delegate with the RowChanged event of the input DataTable.

MultiRecordTransformHelper^ CreateHelper();

This method is used to create a helper for this transform. A helper processes records just as the transform itself does. Typically a helper is run in a different thread than the transform. The helper has all the same settings as this transform, shares some of the transform's resources, and shares the transform's handlers (for example, log and statistics handlers). The advantage of using one or more helper objects instead of creating multiple identical transforms is the savings in resources and the production of only one set of statistics.

Returns a newly created helper object.

This method is not thread safe.

void DestroyHelper(MultiRecordTransformHelper^ helper);

Parameter Description

helper [IN] The helper to be destroyed

This method is used to destroy a helper that is no longer needed. All helpers of a transform must be destroyed before that transform is destroyed.

property System::Data::DataTable^ InputSchema

This property returns the schema of the input data record.

property System::Data::DataTable^ OutputSchema

This property returns the schema of the output data record.

property System::Data::DataSet^ StatisticsSchemas

This property gets the set of statistics tables that will be populated.

Statistics are received from the SDK by adding a DataRowChangeEventHandler delegate to each of the statistics tables contained in the StatisticsSchemas data set. The method associated with the delegate is called each time statistics are generated by a transform. This is generally done when the transform terminates.

property MultiRecordProgressHandler^ ProgressHandler

This property holds the progress handler delegate that receives progress status from the SDK.

property int ProgressInterval

This property contains the interval, in seconds, at which progress is reported from the SDK.

System::String^ GetExtraInfo(System::String^ request);

Parameter Description

request [IN] The request for the transform

This method is designed to get extra information from a transform. However, in its default implementation, the method returns only an empty string. If a transform has been customized to determine valid requests from this method and define appropriate responses, the information could be returned using this method. For example, the Global Address Cleanse transform could be customized to work with this method and return a list of available address cleanse engines.

6.6 MultiRecordTransformHelper

Syntax

Class MultiRecordTransformHelper is a shared resource based processing object that is a clone of a MultiRecordTransform.

System::Data::DataTable^ Process(System::Data::DataTable^ input);

Parameter Description

input The collection of input data records to process

This method processes the input data records in input and returns a DataTable containing the output data records with the posted results.

6.7 RecordTransform

Syntax

Class RecordTransform is the record processing Transform class for processing single records. This class is required for processing. Some of the methods listed here are inherited from the Transform class.

Instances of RecordTransform cannot be instantiated directly. You must use the CreateRecordTransform method in TransformFactory to create valid RecordTransform instances.

RecordTransform objects are used to represent Data Cleanse, USA Regulatory Address Cleanse, Global Address Cleanse, and Geocoder.

void Process(System::Data::DataRow^ input, System::Data::DataRow^ output);

Parameter Description

input [IN] The input data row to process

output [OUT] The data row that receives the results

This method processes the input data row. The input row should be loaded with data before it is passed to this method. This method reads the fields of input and posts the results to output, from which they can be queried. Do not attempt to process a data row owned by a different transform.

RecordTransformHelper^ CreateHelper();

This method is used to create a helper for this transform. A helper processes records just as the transform itself does. Typically a helper is run in a different thread than the transform. The helper has all the same settings as this transform, shares some of the transform's resources, and shares the transform's handlers (for example, for logs and statistics). The advantage of using one or more helper objects instead of creating multiple identical transforms is the savings in resources and the production of only one set of statistics.

void DestroyHelper(RecordTransformHelper^ helper);

Parameter Description

helper [IN] The helper to be destroyed

This method is used to destroy a helper that is no longer needed. All helpers of a transform must be destroyed before that transform is destroyed.

property System::Data::DataTable^ InputSchema;

This property returns the schema of the input data record.

property System::Data::DataTable^ OutputSchema;

This property returns the schema of the output data record.

property System::Data::DataSet^ StatisticsSchemas;

This property gets the set of statistics tables that will be populated.

System::String^ GetExtraInfo(System::String^ request);

Parameter Description

request [IN] The request for the transform

This method is designed to get extra information from a transform. However, in its default implementation, the method returns only an empty string. If a transform has been customized to determine valid requests from this method and define appropriate responses, the information could be returned using this method. For example, the Global Address Cleanse transform could be customized to work with this method and return a list of available address cleanse engines.

6.8 RecordTransformHelper

Syntax

Class RecordTransformHelper is the helper class for the record processing Transform class.

void Process(System::Data::DataRow^ input, System::Data::DataRow^ output);

This method processes the input data row. The input row should be loaded with data before it is passed to this method. This method reads the fields of input and posts the results to output, from which they can be queried.

This method cannot process a data row owned by a different transform.

6.9 TransformFactory

Syntax

Class TransformFactory is used to create a Transform. This class is required for processing.

In the .Net implementation, this class also contains the methods used by the Certified Report Generator.

property LogHandler^ LoggerHandler;

Parameter Description

handler [IN] The log event handler

Sets the log event handler. A log event handler is called whenever a transform wishes to output log information. A log event handler is required.

property System::String^ Locale;

Parameter Description

locale [IN] The locale to use

Sets the locale to use for messages produced by transforms. If the locale is not supported, a warning is logged through the configured LogHandler, and all messages default to en_US.
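The fallback rule for Locale can be sketched in standard C++ as follows. This is an illustration of the documented behavior only, not the SDK's implementation: ResolveMessageLocale is a hypothetical helper, and the set of supported locales and the logWarning callback (standing in for the LogHandler) are assumptions supplied by the caller.

```cpp
#include <functional>
#include <set>
#include <string>

// Hypothetical helper that mirrors the documented Locale behavior:
// a supported locale is used as-is; otherwise a warning is reported through
// the supplied callback (standing in for the LogHandler) and en_US is used.
std::string ResolveMessageLocale(const std::string& requested,
                                 const std::set<std::string>& supported,
                                 const std::function<void(const std::string&)>& logWarning) {
    if (supported.count(requested) != 0) {
        return requested;
    }
    logWarning("Locale '" + requested + "' is not supported; messages default to en_US.");
    return "en_US";
}
```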

MultiRecordTransform^ CreateMultiRecordTransform(System::String^ transformSettings);

Parameter Description

transformSettings [IN] The transform settings buffer

This method creates a multi-record transform, using the XML found in transformSettings.

Returns a handle to the created multi-record transform.

RecordTransform^ CreateRecordTransform(System::String^ transformSettings);

Parameter Description

transformSettings [IN] The transform settings buffer

This method creates a record transform, using the XML found in transformSettings.

Returns a handle to the created record transform.

bool ValidateTransformSettings(System::String^ transformSettings);

Parameter Description

transformSettings [IN] The transform settings buffer

This method validates the XML transform settings found in transformSettings.

Returns TRUE if the transform settings had no errors; otherwise, returns FALSE.

System::String^ UpgradeTransformSettings(System::String^ transformSettings);

Parameter Description

transformSettings [IN] The transform settings

This method upgrades a transform's XML settings found in transformSettings and returns the upgraded settings as a String.

void DestroyTransform(Transform^ transform);

Parameter Description

transform [IN] The transform to destroy

This method disposes of a transform instance. It ensures that operations that can be performed only after the transform has finished processing, such as producing statistics, are carried out.

You must call this method when you are finished using a transform instance.
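Because every transform instance must be destroyed, C++ callers can tie the DestroyTransform call to scope so that it also happens on early returns and exceptions. The following is a minimal RAII sketch, assuming a factory type that exposes a DestroyTransform member taking the transform handle. TransformGuard and its template parameters are hypothetical names, not part of the SDK.

```cpp
#include <utility>

// Hypothetical RAII guard: calls factory.DestroyTransform(transform) exactly once,
// when the guard goes out of scope. Factory is any type with a DestroyTransform
// member; TransformPtr is the handle type returned by the factory (assumed to be
// a pointer-like type that converts to bool).
template <typename Factory, typename TransformPtr>
class TransformGuard {
public:
    TransformGuard(Factory& factory, TransformPtr transform)
        : factory_(&factory), transform_(transform) {}

    ~TransformGuard() {
        if (transform_) {
            factory_->DestroyTransform(transform_);
        }
    }

    // Copying would destroy the transform twice, so only moves are allowed.
    TransformGuard(const TransformGuard&) = delete;
    TransformGuard& operator=(const TransformGuard&) = delete;
    TransformGuard(TransformGuard&& other) noexcept
        : factory_(other.factory_),
          transform_(std::exchange(other.transform_, TransformPtr{})) {}
    TransformGuard& operator=(TransformGuard&&) = delete;

    TransformPtr get() const { return transform_; }

private:
    Factory* factory_;
    TransformPtr transform_;
};
```

In .Net code, a try/finally block around the use of the transform achieves the same guarantee.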

void SetSerpReport(System::String^ fileName);

Parameter Description

fileName [IN] The filename and path to where the report is to be generated

This method enables generation of the SERP certified mailing report at the specified location. If you want the SERP report generated, you must call this method before creating a transform.

void SetAmasReport(System::String^ fileName);

Parameter Description

fileName [IN] The filename and path to where the report is to be generated

This method enables generation of the AMAS certified mailing report at the specified location. If you want the AMAS report generated, you must call this method before creating a transform.

void SetCass3553Report(System::String^ fileName);

Parameter Description

fileName [IN] The filename and path to where the report is to be generated

This method enables generation of the CASS 3553 certified mailing report at the specified location. If you want the CASS 3553 report generated, you must call this method before creating a transform.

7 Address cleanse concepts

This product allows you to create applications that use many address cleanse features, from basic parsing and standardizing to more advanced features available only in some transforms.

Address cleanse provides a corrected, complete, and standardized form of your original address data. With the USA Regulatory Address Cleanse transform and for some countries with the Global Address Cleanse transform, address cleanse can also correct or add postal codes.

What happens during address cleanse?

The USA Regulatory Address Cleanse transform and the Global Address Cleanse transform cleanse your data in the following ways:

● Verify that the locality, region, and postal codes agree with one another. If your data has just a locality and region, the transform usually can add the postal code, and vice versa (depending on the country).
● Standardize the way the address line looks. For example, the transforms can add or remove punctuation and abbreviate or spell out the primary type (depending on what you want).
● Identify undeliverable addresses, such as vacant lots and condemned buildings (USA records only).
● Assign diagnostic codes to indicate why addresses were not assigned or how they were corrected.

Reports

The USA Regulatory Address Cleanse transform provides data for the creation of the USPS Form 3553 (required for CASS) and the NCOALink Summary Report. The Global Address Cleanse transform provides data for the creation of the Canadian SERP—Statement of Address Accuracy Report, the Australia Post’s AMAS report, and the New Zealand SOA Report.

7.1 Setting up the reference files

The USA Regulatory Address Cleanse transform, the Global Address Cleanse transform, and their engines rely on directories (reference files) to cleanse your data.

Directories

To correct addresses and assign codes, the address cleanse transforms rely on databases called postal directories.

Besides the basic address directories, there are many specialized directories that the USA Regulatory Address Cleanse transform uses:

● DPV®
● Early Warning System (EWS)
● eLOT®
● GeoCensus
● LACSLink®
● NCOALink®
● RDI™
● SuiteLink™
● Z4Change

These features help extend address cleansing beyond the basic parsing and standardizing.

Defining directory file locations

You must tell the transform or engine where your directory (reference) files are located.

Caution

Incompatible or out-of-date directories can render the software unusable. The system administrator must install weekly, monthly or bimonthly directory updates for the USA Regulatory Address Cleanse Transform; monthly directory updates for the Australia and Canada engines; and quarterly directory updates for the Global Address engine to ensure that they are compatible with the current software.

Related Information

Directory data [page 13]

8 USA Regulatory Address Cleanse

The USA Regulatory Address Cleanse transform identifies, parses, validates, and corrects USA address data according to the U.S. Coding Accuracy Support System (CASS). This transform supports the generation of data that can be used to generate the USPS Form 3553 and can output many useful codes to your records. You can also run the transform in non-certified mode and produce suggestion lists.

Note

If an input record has characters not included in the Latin1 code page, the USA Regulatory Address Cleanse transform will not process that data. Instead, the software sends the mapped input record to the corresponding standardized output field (if applicable). No other output fields will be populated for that record. If your Unicode database has valid U.S. addresses from the Latin1 character set, the transform processes as normal.

If you perform both data cleansing and matching, the USA Regulatory Address Cleanse transform typically should process the data before the Data Cleanse transform, as well as any of the Match transforms.

The following sections describe the configurations for the USA Regulatory Address Cleanse XML. You can find examples of the XML configurations with the samples installed with the product.

8.1 USPS DPV®

DPV is a USPS product developed to assist users in validating the accuracy of their address information. DPV compares Postcode2 information against the DPV directories to identify known addresses and potential problems that may cause an address to become undeliverable.

DPV is available for U.S. data in the USA Regulatory Address Cleanse transform only.

You can enable DPV in the Assignment options section of the USA Regulatory Address Cleanse configuration file.

Note

DPV processing is required for CASS certification. If you are not processing for CASS certification, you can choose to run your jobs in non-certified mode and still enable DPV.

Caution

If you choose to disable DPV processing, the software will not generate the CASS-required documentation and your mailing will not be eligible for postal discounts.

Related Information

Assignment options [page 130]

8.1.1 Benefits of DPV

DPV can be beneficial in the following areas:

● Mailing: DPV helps screen out undeliverable-as-addressed (UAA) mail and reduce mailing costs.
● Information quality: DPV increases data accuracy by verifying an address down to the individual house, suite, or apartment instead of only the block face.
● Increased assignment rate: DPV may increase the assignment rate by using DPV tiebreaking to resolve a tie when other tie-breaking methods are inconclusive.
● Preventing mail-order fraud: DPV can help prevent shipping merchandise to individuals who place fraudulent orders by verifying valid delivery addresses and Commercial Mail Receiving Agencies (CMRA).

8.1.2 DPV security

The USPS has instituted processes that monitor the use of DPV. Each company that purchases the DPV functionality is required to sign a legal agreement stating that it will not attempt to misuse the DPV product. If a user abuses the DPV product, the USPS has the right to prohibit the user from using DPV in the future.

8.1.2.1 False positive addresses

The USPS has included false positive addresses in the DPV directories as an added security to prevent DPV abuse. Depending on what type of user you are and your license key codes, the software's behavior varies when it encounters a false positive address. The following table explains the behaviors for each user type:

User type | Software behavior | Read about
End users | DPV processing is terminated. | Obtaining DPV unlock code from SAP Support
End users with a Stop Processing Alternative agreement | DPV processing continues. | Sending false positive logs to the USPS
Service providers | DPV processing continues. | Sending false positive logs to the USPS

8.1.2.2 Stop processing alternative

End users may establish a Stop Processing Alternative agreement with the USPS and SAP.

Establishing a stop processing agreement allows you to bypass any future directory locks. The Stop Processing Alternative is not an option in the software; it is a key code that you obtain from SAP Support.

First you must obtain the proper permissions from the USPS and then provide proof of permission to SAP Support. Support will then provide a key code that disables the directory locking function in the software.

When you obtain the Stop Processing Alternative key code from SAP Support, enter it into the SAP License Manager. With the Stop Processing Alternative key code in place, the software takes the following actions when a false positive is encountered:

● Marks the record as a false positive.
● Generates a log file containing the false positive address.
● Notes the path to the log files in the error log.
● Generates a US Regulatory Locking Report containing the path to the log file.
● Continues processing your job.

Even though your job continues processing, you are required to send the false positive log file to the USPS to notify them that a false positive address was detected. The USPS must release the list before you can use it for processing.

8.1.2.3 DPV false positive logs

The software generates a false-positive log file any time it encounters a false positive record, regardless of how the job is set up. The software creates a separate log file for each mailing list that contains a false positive. If multiple false positives exist within one mailing list, the software writes them all to the same log file.

Related Information

Sending DPV false positive logs to the USPS [page 80] Retrieving the DPV unlock code from SAP [page 79]

8.1.2.3.1 DPV log file name and location

The software stores DPV false positive log files in the directory specified for the USPS Log Path in the Reference Files group.

Note

The USPS log path that you enter must be writable. An error is issued if you have entered a path that is not writable.

Log file naming convention

The software automatically names DPV false positive logs like this: dpvl####.log, where #### is a number between 0001 and 9999. For example, the first log file generated is dpvl0001.log, the next one is dpvl0002.log, and so on.
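The naming convention can be expressed as a small formatting function, which is handy when a script needs to locate the log files to attach or delete. This is a hypothetical standard C++ helper that only reproduces the documented dpvl####.log pattern; it does not query the software for the names it actually wrote.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: builds the DPV false positive log file name for a
// sequence number, following the documented dpvl####.log pattern (0001-9999).
std::string DpvFalsePositiveLogName(int sequence) {
    if (sequence < 1 || sequence > 9999) {
        throw std::out_of_range("DPV log sequence must be between 1 and 9999");
    }
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "dpvl%04d.log", sequence);  // zero-pad to 4 digits
    return buffer;
}
```

For example, sequence 1 yields dpvl0001.log and sequence 2 yields dpvl0002.log.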

Note

When you have set the degree of parallelism to greater than 1, the software generates one log per thread. During a job run, if the software encounters only one false positive record, one log will be generated. However, if it encounters more than one false positive record and the records are processed on different threads, then the software will generate one log for each thread that processes a false positive record.

Related Information

Reference files [page 128]

8.1.2.4 DPV locking for end users

This locking behavior is applicable for end users.

When the software finds a false positive address, DPV processing is discontinued for the remainder of the data flow. The software also takes the following actions:

● Marks the record as a false positive address.
● Issues a message in the error log stating that a DPV false positive address was encountered.
● Includes the false positive address and lock code in the error log.
● Continues processing your data flow without DPV processing.
● Generates a lock code.
● Generates a false positive log.

To restore DPV functionality, users must obtain a DPV unlock code from SAP Support.

8.1.2.5 Retrieving the DPV unlock code from SAP

These steps are applicable for end users who do not have a Stop Processing Alternative agreement with the USPS. When you receive a processing message that DPV false positive addresses are present in your address list, use the SAP USPS Unlock Utility to obtain an unlock code.

1. Navigate to http://service.sap.com/bosap-unlock to open the SAP Service Marketplace (SMP) unlock utility page.
2. Click Retrieve USPS Unlock Code.
3. Click Search and select an applicable Data Services system from the list.
4. Enter the lock code found in the dpvx.txt file (location is specified in the DPV Path option in the Reference Files group).
5. Select DPV as the lock type.
6. Select BOJ-EIM-DS as the component.
7. Enter the locking address that is listed in the dpvx.txt file.
8. Attach the dpvl####.log file (location is specified in the USPS Log Path option in the Reference Files group).
9. Click Submit. The unlock code displays.
10. Copy the unlock code and paste it into the dpvw.txt file, replacing all contents of the file with the unlock code (location is specified in the DPV Path option of the Reference Files group).
11. Remove the record that caused the lock from the database, and delete the dpvl####.log file before processing the list again.
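Step 10 requires that dpvw.txt end up containing only the unlock code. If you automate that step, a minimal standard C++ sketch could look like the following. WriteDpvUnlockCode is a hypothetical helper; dpvPath stands in for the value of your DPV Path option, and writing the code without a trailing newline is an assumption based on "replacing all contents of the file with the unlock code".

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper for step 10: overwrite dpvw.txt in the DPV directory with
// the unlock code. std::ios::trunc discards any previous contents, so the file
// holds only the unlock code afterward. Returns false if the write fails.
bool WriteDpvUnlockCode(const std::filesystem::path& dpvPath, const std::string& unlockCode) {
    std::ofstream out(dpvPath / "dpvw.txt", std::ios::out | std::ios::trunc);
    if (!out) {
        return false;
    }
    out << unlockCode;
    return static_cast<bool>(out);
}
```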

Tip

You can use an unlock code only once. If the software detects another false positive (even if it is the same record), you need to retrieve a new DPV unlock code.

If an unlock code could not be generated, a message is still created and is processed by a Technical Customer Assurance engineer (during regular business hours).

Note

If you are an end user who has a Stop Processing Alternative agreement, follow the steps to send the false positive log to the USPS.

8.1.2.6 Sending DPV false positive logs to the USPS

Service providers, and end users who have a Stop Processing Alternative agreement, should follow these steps after receiving a processing message that DPV false positive addresses are present in their address list.

1. Send an email to the USPS NCSC at [email protected], and include the following information:

   ○ Type “DPV False Positive” as the subject line.
   ○ Attach the dpvl####.log file or files that were generated by the software (location is specified in the USPS Log Path directory option in the Reference Files group).

   The USPS NCSC uses the information to determine whether the list can be returned to the mailer.

2. After the USPS NCSC has released the list that contained the locked or false positive record:

   ○ Delete the corresponding log file or files.
   ○ Remove the record that caused the lock from the list and reprocess the file.

If you are an end user who does not have a Stop Processing Alternative agreement, follow the steps to retrieve the DPV unlock code from SAP Support.

8.1.3 DPV monthly directories

DPV directories are shipped monthly with the USPS directories in accordance with USPS guidelines.

The directories expire in four months. The date on the DPV directories must be the same date as the Address directory.

Do not rename any of the files. DPV will not run if the file names are changed. The following is a list of the DPV directories:

● dpva.dir
● dpvb.dir
● dpvc.dir
● dpvd.dir
● dpv_vacant.dir
● dpv_no_stats.dir

8.1.4 Required information in the job setup

When you set up for DPV processing, the following options in the USPS License Information group are required:

● Customer Company Name
● Customer Company Address
● Customer Company Locality
● Customer Company Region
● Customer Company Postcode1
● Customer Company Postcode2

8.1.5 DPV output fields

Several output fields are available for reporting DPV processing results:

Field Description

DPV_CMRA The DPV Commercial Mail Receiving Agency (CMRA) component that is generated for this record.

L = The address triggered DPV locking.

N = The address is not a CMRA.

Y = The address is a valid CMRA.

(blank) = A blank output value indicates that Enable DPV is set to No, DPV processing is currently locked, or the transform cannot assign the input address.


DPV_Footnote DPV footnotes are required for CASS. The footnotes contain the following information:

AA = Input address matches to the postcode2 file.

A1 = Input address does not match to the postcode2 file.

BB = All input address field values match to DPV.

CC = Input address primary number matches to DPV, but the secondary number does not match (the secondary is present but invalid).

F1 = Input address matches a military address.

G1 = Input address matches a general delivery address.

M1 = Input address primary number is missing.

M3 = Input address primary number is invalid.

N1 = Input address primary number matches to DPV but the address is missing the secondary number.

P1 = Input address is missing the RR or HC Box number.

P3 = Input address has an invalid PO, RR, or HC number.

RR = Input address matches to CMRA.

R1 = Input address matches to CMRA, but the secondary number is not present.

U1 = Input address matches a unique address.

DPV_NoStats No Stats indicator. No Stats means that the address is a vacant property, it receives mail as a part of a drop, or it does not have an established delivery yet.

Y = Address is flagged as No Stats in DPV data.

N = Address is not flagged as No Stats.

(blank) = Address was not looked up.

Note

The US Addressing report contains DPV No Stats counts in the DPV Summary section.


DPV_Status The DPV status component that is generated for this record.

D = The primary range is a confirmed delivery point, but the secondary range was not available on input.

L = The address triggered DPV locking.

N = The address is not a valid delivery point.

S = The primary range is a valid delivery point, but the parsed secondary range is not valid in the DPV directory.

Y = The address is a confirmed delivery point. The primary range and secondary range (if present) are valid.

(blank) = A blank output value indicates that Enable DPV is set to No, DPV processing is currently locked, or the transform cannot assign the input address.

DPV_Vacant Vacant address indicator.

Y = Address is vacant.

N = Address is not vacant.

(blank) = Address was not looked up.

Note

The US Addressing report contains DPV Vacant counts in the DPV Summary section.
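When post-processing output records, it can help to translate DPV_Footnote values into readable text using the table above. The following is a hypothetical standard C++ sketch. It assumes that the field holds one or more two-character codes concatenated together (for example "AABB"); that layout is an assumption, so check it against your own output before relying on it.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: splits a DPV_Footnote value into two-character codes
// (assumed concatenated, e.g. "AABB") and looks each one up in the table above.
std::vector<std::string> DescribeDpvFootnotes(const std::string& footnote) {
    static const std::map<std::string, std::string> kMeanings = {
        {"AA", "Input address matches to the postcode2 file"},
        {"A1", "Input address does not match to the postcode2 file"},
        {"BB", "All input address field values match to DPV"},
        {"CC", "Primary number matches to DPV, secondary number is present but invalid"},
        {"F1", "Input address matches a military address"},
        {"G1", "Input address matches a general delivery address"},
        {"M1", "Input address primary number is missing"},
        {"M3", "Input address primary number is invalid"},
        {"N1", "Primary number matches to DPV but the secondary number is missing"},
        {"P1", "Input address is missing the RR or HC Box number"},
        {"P3", "Input address has an invalid PO, RR, or HC number"},
        {"RR", "Input address matches to CMRA"},
        {"R1", "Input address matches to CMRA, but the secondary number is not present"},
        {"U1", "Input address matches a unique address"},
    };
    std::vector<std::string> descriptions;
    // Walk the value two characters at a time; a trailing odd character is ignored.
    for (std::size_t i = 0; i + 1 < footnote.size(); i += 2) {
        const std::string code = footnote.substr(i, 2);
        const auto it = kMeanings.find(code);
        descriptions.push_back(it != kMeanings.end() ? it->second : "Unknown code " + code);
    }
    return descriptions;
}
```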

8.1.6 Non certified mode

End users can set up jobs with DPV disabled if the end user is not a CASS customer but still wants a Postcode2 added to addresses. The non-CASS option, Assign Postcode2 to Non DPV, enables the software to assign a Postcode2 when an address does not DPV-confirm.

Caution

When DPV processing is disabled, the software does not generate the CASS-required documentation and the mailing is not eligible for postal discounts.

Related Information

Non Certified options [page 144]

8.1.7 DPV performance

DPV processing adds time to each job. The amount of additional time varies with the operating system, system configuration, and other variables that may be unique to your operating environment.

You can decrease the time required for DPV processing by loading DPV directories into system memory before processing.

8.1.7.1 Memory usage

You may need to install additional memory in your system for DPV processing. We recommend a minimum of 768 MB to process with DPV enabled.

To determine the amount of memory required to run with DPV enabled, check the size of the DPV directories (recently about 600 MB) and add that to the amount of memory required to run the software.
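The sizing rule above is simple addition, combined here with the recommended 768 MB floor. The following standard C++ sketch is a hypothetical helper. Taking the larger of the floor and the sum is our reading of the two recommendations together, and baseMb (the memory the software needs without DPV) is a value you must measure for your own environment. All values are in megabytes.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sizing helper: memory for DPV = DPV directory size + memory the
// software needs anyway, but never less than the recommended 768 MB minimum.
std::uint64_t EstimateDpvMemoryMb(std::uint64_t dpvDirectoriesMb, std::uint64_t baseMb) {
    constexpr std::uint64_t kRecommendedMinimumMb = 768;
    return std::max(kRecommendedMinimumMb, dpvDirectoriesMb + baseMb);
}

// True if the available memory covers the estimate above.
bool HasEnoughMemoryForDpv(std::uint64_t availableMb, std::uint64_t dpvDirectoriesMb,
                           std::uint64_t baseMb) {
    return availableMb >= EstimateDpvMemoryMb(dpvDirectoriesMb, baseMb);
}
```

For example, with DPV directories of about 600 MB and a measured 400 MB for the software itself, the estimate is 1000 MB.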

The size of the DPV directories will vary depending on the amount of new data in each directory release.

Make sure that your computer has enough memory available before performing DPV processing.

To find the amount of disk space required to cache the directories, see the Supported Platforms document in the SAP Support portal.

8.1.7.2 Cache DPV directories

To better manage memory usage when you have enabled DPV processing, choose to cache the DPV directories.

Related Information

Transform performance [page 127]

8.1.7.3 Running multiple jobs with DPV

When running multiple DPV jobs and loading directories into memory, you should add a 10-second pause between jobs to allow time for the memory to be released. For more information about setting this properly, see your operating system manual.

If you don't add a 10-second pause between jobs, there may not be enough time for your system to release the memory used for caching the directories from the first job. The next job waiting to process may produce an error or access the directories from disk if there is not enough memory to cache directories, resulting in performance degradation.
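If you drive several DPV jobs from your own program, the pause can be built into the loop that starts them. The following standard C++ sketch is hypothetical: each job is a caller-supplied callable (for example, one that runs a job to completion and destroys its transform), and the 10-second default follows the guidance above. No pause is inserted after the last job.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical runner: executes jobs one after another and sleeps between them
// so the operating system has time to release memory used for cached directories.
void RunDpvJobsSequentially(const std::vector<std::function<void()>>& jobs,
                            std::chrono::milliseconds pause = std::chrono::seconds(10)) {
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        jobs[i]();
        if (i + 1 < jobs.size()) {
            std::this_thread::sleep_for(pause);  // pause only between jobs
        }
    }
}
```

If the jobs run as separate processes, the same pause belongs in whatever script launches them.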

8.1.8 DPV No Stats indicators

The USPS uses No Stats indicators to mark addresses that fall under the No Stats category. The software uses the No Stats table when you have DPV enabled in a job. The USPS puts No Stats addresses in three categories:

● Addresses that do not have delivery established yet.
● Addresses that receive mail as part of a drop.
● Addresses that have been vacant for a certain period of time.

8.1.8.1 No Stats table

You must install the No Stats table (dpv_no_stats.dir) before the software performs DPV processing. The No Stats table is supplied by SAP with the DPV directory install.

The software automatically checks for the No Stats table in the directory folder that you indicate in your job setup. The software performs DPV processing based on the install status of the directory.

dpv_no_stats.dir | Type of processing | Results
Installed | DPV | The software automatically outputs No Stats indicators when you include the DPV_NoStats output field in your job.
Not installed | DPV | The software automatically skips No Stats processing and does not issue an error message. The software performs DPV processing but does not populate the DPV_NoStats output field.

8.1.8.2 No Stats output field

Use the DPV_NoStats output field to post No Stats indicator information to an output file.

No Stats means that the address is a vacant property, it receives mail as a part of a drop, or it does not have an established delivery yet.

Related Information

DPV output fields [page 81]

8.1.9 DPV Vacant indicators

The software provides vacant information in output fields and reports using DPV vacant counts. The USPS DPV vacant lookup table is supplied by SAP with the DPV directory install.

The USPS uses DPV vacant indicators to mark addresses that fall under the vacant category. The software uses DPV vacant indicators when you have DPV enabled in your job.

Tip

The USPS defines vacant as any delivery point that was active in the past, but is currently not occupied (usually over 90 days) and is not currently receiving mail delivery. The address could receive delivery again in the future. Vacant does not apply to seasonal addresses.

8.1.9.1 DPV address-attribute output fields

Vacant indicators for the assigned address are available in the DPV_Vacant output field.

Note

The US Addressing report contains DPV Vacant counts in the DPV Summary section.

Related Information

DPV output fields [page 81]

8.2 USPS eLOT®

eLOT is available for U.S. records in the USA Regulatory Address Cleanse transform only. eLOT takes line of travel one step further. The original LOT narrowed the mail carrier's delivery route to the block face level (Postcode2 level) by discerning whether an address resided on the odd or even side of a street or thoroughfare. eLOT narrows the mail carrier's delivery route walk sequence to the house (delivery point) level. This allows you to sort your mailings to a more precise level.

You can enable eLOT in the Assignment options section of the USA Regulatory Address Cleanse configuration file.

Related Information

Assignment options [page 130] Setting up the reference files [page 74]

8.3 Early Warning System (EWS)

EWS helps reduce the amount of misdirected mail caused when valid delivery points are created between national directory updates. EWS is available for U.S. records in the USA Regulatory Address Cleanse transform only.

You can enable EWS in the Assignment options section of the USA Regulatory Address Cleanse configuration file.

Related Information

Assignment options [page 130]

8.3.1 Overview of EWS

The EWS feature is the solution to the problem of misdirected mail caused by valid delivery points that appear between national directory updates. For example, suppose that 300 Main Street is a valid address and that 300 Main Avenue does not exist. A mail piece addressed to 300 Main Avenue is assigned to 300 Main Street on the assumption that the sender is mistaken about the correct suffix.

Now consider that construction is completed on a house at 300 Main Avenue. The new owner signs up for utilities and mail, but it may take a couple of months before the delivery point is listed in the national directory. All the mail intended for the new house at 300 Main Avenue will be misdirected to 300 Main Street until the delivery point is added to the national directory.

The EWS feature solves this problem by using an additional directory which informs CASS users of the existence of 300 Main Avenue long before it appears in the national directory. When using EWS processing, the previously misdirected address now defaults to a 5-digit assignment.

8.3.2 EWS directory

The EWS directory contains four months of rolling data. Each week, the USPS adds new data and drops a week's worth of old data. The USPS then publishes the latest EWS data. Each Friday, SAP converts the data to our format (EWyymmdd.zip) and posts it on the SAP Business User Support site at https://service.sap.com/bosap-downloads-usps.
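If you script the weekly EWS download, the expected archive name follows directly from the EWyymmdd.zip pattern. The following is a hypothetical standard C++ helper that only formats the documented pattern for a given posting date; it does not check that the date is a Friday or that the file exists on the site.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds the EWS archive name in the documented
// EWyymmdd.zip format. year is the full four-digit year; month and day are 1-based.
std::string EwsArchiveName(int year, int month, int day) {
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "EW%02d%02d%02d.zip", year % 100, month, day);
    return buffer;
}
```

For example, a posting dated September 5, 2014 corresponds to EW140905.zip.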

8.4 SuiteLink™

SuiteLink is an option in the USA Regulatory Address Cleanse transform.

SuiteLink uses a USPS directory that contains multiple files of specially indexed address information, like secondary numbers and unit designators, for locations identified as high-rise default buildings.

With SuiteLink you can build accurate and complete addresses by adding suite numbers to high-rise business addresses. With the secondary address information added to your addresses, more of your pieces are sorted by delivery sequence and delivered with accuracy and speed.

SuiteLink is required for CASS

Beginning with CASS Cycle N, SuiteLink is required when you process in CASS compliant mode (in which the Disable certification transform option is set to No). If you have disabled SuiteLink in your job setup, but the “Disable certification” option is set to No, an error message is issued and processing does not continue.

Related Information

Assignment options [page 130]

8.4.1 Benefits of SuiteLink

Businesses that depend on website, mail, or in-store orders from customers will find that SuiteLink is a powerful money-saving tool. Businesses whose customers reside in buildings that house several businesses will also appreciate having their marketing materials, bank statements, and orders delivered right to the customer's door.

The addition of secondary number information to your addresses allows for the most efficient and cost-effective delivery sequencing and postage discounts.

Note

Beginning with CASS Cycle N, SuiteLink is required for those preparing CASS-compliant mailing lists.

8.4.2 How SuiteLink works

The software uses the data in the SuiteLink directories to add suite numbers to an address. The software matches a company name, a known high-rise address, and the CASS-certified postcode2 in your database to data in SuiteLink. When there is a match, the software creates a complete business address that includes the suite number.

Example

SuiteLink This example shows a record that is processed through SuiteLink, and the output record with the assigned suite number.

The input record contains:

● Firm name (in FIRM input field)
● Known high-rise address
● CASS-certified postcode2

The SuiteLink directory contains:

● Secondary numbers
● Unit designators

The output record contains:

● the correct suite number

Input record | Output record
Telera | TELERA
910 E Hamilton Ave Fl2 | 910 E HAMILTON AVE STE 200
Campbell CA 95008 0610 | CAMPBELL CA 95008 0625
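The matching idea in the example (firm name plus known high-rise address plus postcode leading to a suite number) can be illustrated with a toy lookup table. This is a hypothetical standard C++17 sketch only: the key and value layout, the upper-case normalization, and the names SuiteKey, SuiteMatch, and LookupSuite are our assumptions, and the real SuiteLink directory format is not described in this guide.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <optional>
#include <string>
#include <tuple>

// Toy model of a SuiteLink-style lookup (illustrative layout, not the real directory).
struct SuiteKey {
    std::string firm;      // normalized (upper-case) firm name
    std::string address;   // normalized primary address line, without the secondary
    std::string postcode;  // CASS-certified postcode1-postcode2, e.g. "95008-0610"
    bool operator<(const SuiteKey& o) const {
        return std::tie(firm, address, postcode) < std::tie(o.firm, o.address, o.postcode);
    }
};

struct SuiteMatch {
    std::string secondary;  // e.g. "STE 200"
    std::string postcode2;  // postcode2 assigned for the suite
};

std::string ToUpper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Returns the suite information when firm, address, and postcode all match.
std::optional<SuiteMatch> LookupSuite(const std::map<SuiteKey, SuiteMatch>& table,
                                      const std::string& firm, const std::string& address,
                                      const std::string& postcode) {
    const auto it = table.find(SuiteKey{ToUpper(firm), ToUpper(address), postcode});
    if (it == table.end()) {
        return std::nullopt;
    }
    return it->second;
}
```

With a table entry for TELERA at 910 E HAMILTON AVE, 95008-0610, the lookup returns STE 200 and postcode2 0625, matching the output record above.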

8.4.3 SuiteLink directory

The SuiteLink directory is distributed monthly. You must use the SuiteLink directory with a ZIP+4 directory labeled for the same month. For example, the December 2010 SuiteLink directory can be used with only the December 2010 ZIP+4 directory.

You cannot use a SuiteLink directory that is older than three months based on its release date. The software warns you 15 days before the directory expires. As with all directories, the software won't process your records with an expired SuiteLink directory.
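The three directory rules above (same month as the ZIP+4 directory, no older than three months, a warning 15 days before expiration) can be checked before a job starts. The following standard C++ sketch is hypothetical. It treats a directory as expired only on days after its release date plus three months, which is our reading of "older than three months"; the software's exact boundary may differ. Dates are converted to day counts with Howard Hinnant's days_from_civil algorithm.

```cpp
#include <algorithm>

struct Date {
    int year;
    unsigned month;  // 1..12
    unsigned day;    // 1..31
};

enum class SuiteLinkStatus { MonthMismatch, Valid, ExpiringSoon, Expired };

// Days since 1970-01-01 in the proleptic Gregorian calendar (days_from_civil).
long DaysFromCivil(const Date& date) {
    const int y = date.year - (date.month <= 2 ? 1 : 0);
    const unsigned m = date.month;
    const long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + date.day - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<long>(doe) - 719468;
}

unsigned LastDayOfMonth(int year, unsigned month) {
    static const unsigned kDays[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return (month == 2 && leap) ? 29u : kDays[month - 1];
}

// Adds whole months, clamping the day to the last day of the target month.
Date AddMonths(const Date& date, unsigned months) {
    const unsigned total = date.month - 1 + months;
    Date result{date.year + static_cast<int>(total / 12), total % 12 + 1, 1};
    result.day = std::min(date.day, LastDayOfMonth(result.year, result.month));
    return result;
}

SuiteLinkStatus CheckSuiteLinkDirectory(const Date& suiteLinkRelease, const Date& zip4Release,
                                        const Date& today) {
    if (suiteLinkRelease.year != zip4Release.year || suiteLinkRelease.month != zip4Release.month) {
        return SuiteLinkStatus::MonthMismatch;  // directories must be labeled for the same month
    }
    const long expires = DaysFromCivil(AddMonths(suiteLinkRelease, 3));
    const long now = DaysFromCivil(today);
    if (now > expires) return SuiteLinkStatus::Expired;
    if (expires - now <= 15) return SuiteLinkStatus::ExpiringSoon;  // 15-day warning window
    return SuiteLinkStatus::Valid;
}
```

For a December 1, 2010 SuiteLink directory paired with the December 2010 ZIP+4 directory, this sketch reports Valid on January 15, 2011, ExpiringSoon on February 20, 2011, and Expired from March 2, 2011.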

8.4.4 Improve processing speed

You may increase SuiteLink processing speed by loading the SuiteLink directories into memory. To activate this option, go to the Transform Performance group and set the Cache SuiteLink Directories option to Yes.

8.5 LACSLink®

LACSLink is a USPS product that is available for U.S. records with the USA Regulatory Address Cleanse transform only. LACSLink processing is required for CASS certification.

LACSLink updates addresses when the physical address does not move but the address has changed. This happens, for example, when a municipality changes rural route addresses to street-name addresses. Rural route conversions make it easier for police, fire, ambulance, and postal personnel to locate a rural address. LACSLink also converts addresses when streets are renamed or post office boxes are renumbered.

LACSLink technology ensures that the data remains private and secure, and at the same time gives you easy access to the data. LACSLink is an integrated part of address processing; it is not an extra step. To obtain the new addresses, you must already have the old address data.

You can enable LACSLink in the Assignment options section of the USA Regulatory Address Cleanse configuration file.

Related Information

Assignment options [page 130]
Memory usage and caching for LACSLink processing [page 97]
LACSLink® security [page 90]

8.5.1 Benefits of LACSLink

LACSLink processing is required for all CASS customers (beginning with CASS Cycle L).

If you process your data without LACSLink enabled, you won't get the CASS-required reports or postal discounts.

8.5.2 LACSLink® security

The USPS has instituted processes that monitor the use of LACSLink. Each company that purchases the LACSLink functionality is required to sign a legal agreement stating that it will not attempt to misuse the LACSLink product. If a user abuses the LACSLink product, the USPS has the right to prohibit the user from using LACSLink in the future.

8.5.2.1 LACSLink false positive addresses

The USPS has included false positive addresses in the LACSLink directories as an added security measure to prevent LACSLink abuse. Depending on what type of user you are and your license key codes, the software's behavior varies when it encounters a false positive address. The following table explains the behaviors for each user type:

User type | Software behavior | Read about
End users | LACSLink processing is terminated. | Obtaining the LACSLink unlock code from SAP Support
End users with a Stop Processing Alternative agreement | LACSLink processing continues. | Sending false positive logs to the USPS
Service providers | LACSLink processing continues. | Sending false positive logs to the USPS

Related Information

Stop processing alternative [page 77]

8.5.2.2 LACSLink false positive logs

For service providers and end users with Stop Processing Alternative enabled, the software takes the following actions when it detects a false positive address during LACSLink processing:

● marks the record as a false positive
● generates a LACSLink log file containing the false positive address
● notes the path to the LACSLink log files in the error log
● generates a US Regulatory Locking Report containing the path to the LACSLink log file
● continues LACSLink processing without interruption (however, you are required to notify the USPS that a false positive address was detected)

Before releasing the mailing list that contains the false positive address, you are required to send the LACSLink log files containing the false positive addresses to the USPS.

8.5.2.2.1 LACSLink log file location

This software stores LACSLink false positive log files in the directory specified for the USPS Log Path option in the Reference Files group.

Note

The USPS log path that you enter must be writable. An error is issued if you have entered a path that is not writable.

Log file naming convention

The software names LACSLink false positive logs like this: lacsl###.log, where ### is a number between 001 and 999. For example, the first log file generated is lacsl001.log, the next one is lacsl002.log, and so on.
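The naming convention can be expressed as a few hypothetical helpers, for example to locate the files to send to the USPS or to clean up after a list has been released. The function names are illustrative and not part of the SDK. Numbering the next file one past the highest existing number is an assumption; the text only says that numbering starts at lacsl001.log and increases.

```cpp
#include <cstdio>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helpers for the lacsl###.log naming convention; not SDK functions.

// Returns the log name for a sequence number in 1..999, e.g. 1 -> "lacsl001.log".
std::optional<std::string> lacsLinkLogName(int sequence) {
    if (sequence < 1 || sequence > 999) return std::nullopt;
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "lacsl%03d.log", sequence);
    return std::string(buffer);
}

// Extracts the sequence number from a conforming file name, or std::nullopt.
std::optional<int> lacsLinkLogSequence(const std::string& fileName) {
    if (fileName.size() != 12 || fileName.compare(0, 5, "lacsl") != 0 ||
        fileName.compare(8, 4, ".log") != 0) {
        return std::nullopt;
    }
    int value = 0;
    for (std::size_t i = 5; i < 8; ++i) {
        const char c = fileName[i];
        if (c < '0' || c > '9') return std::nullopt;
        value = value * 10 + (c - '0');
    }
    if (value < 1) return std::nullopt;
    return value;
}

// Name the next log would get, given the files already in the USPS Log Path
// directory (assumption: one past the highest existing sequence number).
// Returns std::nullopt once lacsl999.log is in use.
std::optional<std::string> nextLacsLinkLogName(const std::vector<std::string>& existingFiles) {
    int highest = 0;
    for (const std::string& name : existingFiles) {
        if (const auto seq = lacsLinkLogSequence(name); seq && *seq > highest) highest = *seq;
    }
    return lacsLinkLogName(highest + 1);
}
```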

Note

When the degree of parallelism is greater than 1, the software generates one log per thread that encounters a false positive record. If the software encounters only one false positive record during a job run, it generates one log. If it encounters more than one false positive record and the records are processed on different threads, it generates one log for each thread that processes a false positive record.

Related Information

Reference files [page 128]

8.5.2.3 LACSLink locking for end users

This locking behavior is applicable for end users.

When the software finds a false positive address, LACSLink processing is discontinued for the remainder of the job processing. The software takes the following actions:

● Marks the record as a false positive address.
● Issues a message in the error log that a LACSLink false positive address was encountered.
● Includes the false positive address and lock code in the error log.
● Continues processing your data flow without LACSLink processing.
● Generates a lock code.
● Generates a false positive log.

To restore LACSLink functionality, users must obtain a LACSLink unlock code from SAP Business User Support.

8.5.2.4 Submit log file to USPS

NCOALink service providers and end users with a Stop Processing Alternative agreement must submit the false positive log to the USPS NCSC (National Customer Service Center) via email ([email protected]), together with the mailer's name, the total number of addresses processed, and the number of addresses matched. Use the subject line “LACSLink False Positive”.

The NCSC uses this information to determine whether the list can be returned to the mailer.

Tip

When the USPS has released your list that contained the false positive record, remember to delete the corresponding log file.

8.5.2.5 Retrieving the LACSLink unlock code from SAP

These steps are applicable for end users who do not have a Stop Processing Alternative agreement with the USPS. When you receive a processing message that LACSLink false positive addresses are present in your address list, use the SAP USPS Unlock Utility to obtain an unlock code.

1. Navigate to http://service.sap.com/bosap-unlock to open the SAP Service Marketplace (SMP) unlock utility page.
2. Click Retrieve USPS Unlock Code.
3. Click Search and select an applicable Data Services system from the list.
4. Enter the lock code found in the lacsx.txt file (location is specified in the LACSLink Path option in the Reference Files group).
5. Select LACSLink as the lock type.
6. Select BOJ-EIM-DS as the component.
7. Enter the locking address that is listed in the lacsx.txt file.
8. Attach the lacsl###.log file (location specified in the USPS Log Path option in the Reference Files group).
9. Click Submit. The unlock code displays.
10. Copy the unlock code and paste it into the lacsw.txt file, replacing all contents of the file with the unlock code (location is specified in the LACSLink Path option in the Reference Files group).
11. Remove the record that caused the lock from the database, and delete the lacsl###.log file before processing the list again.

Tip

Keep in mind that you can only use the unlock code one time. If the software detects another false-positive (even if it is the same record), you will need to retrieve a new LACSLink unlock code.

If an unlock code could not be generated, a message is still created and is processed by a Technical Customer Assurance engineer (during regular business hours).

Note

If you are an end user who has a Stop Processing Alternative agreement, follow the steps to send the false positive log to the USPS.

8.5.2.6 Sending LACSLink false positive logs to the USPS

Service providers, and end users with a Stop Processing Alternative agreement, should follow these steps after receiving a processing message that LACSLink false positive addresses are present in their address list.

1. Send an email to the USPS at [email protected]. Include the following:

○ Type “LACSLink False Positive” as the subject line.
○ Attach the lacsl###.log file or files that were generated by the software (location specified in the USPS Log Path option in the Reference Files group).

The USPS NCSC uses the information to determine whether or not the list can be returned to the mailer.
2. After the USPS NCSC has released the list that contained the locked or false positive record:

○ Delete the corresponding log file or files.
○ Remove the record that caused the lock from the list and reprocess the file.

If you are an end user who does not have a Stop Processing Alternative agreement, follow the steps to retrieve the LACSLink unlock code from SAP Support.

8.5.3 How LACSLink works

LACSLink provides a new address when one is available. LACSLink follows these steps when processing an address:

1. The USA Regulatory Address Cleanse transform standardizes the input address.
2. The transform looks for a matching address in the LACSLink data.
3. If a match is found, the transform outputs the LACSLink-converted address and other LACSLink information.

8.5.4 Conditions for address processing

The transform does not process all of your addresses with LACSLink when it is enabled. Here are the conditions under which your data is passed into LACSLink processing:

● The address is found in the address directory, and it is flagged as a LACS-convertible record within the address directory.
● The address is found in the address directory, and, even though a rural route or highway contract default assignment was made, the record wasn't flagged as LACS convertible.
● The address is not found in the address directory, but the record contains enough information to be sent into LACSLink.
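The three conditions amount to a small decision rule. The sketch below restates them in code, assuming a hypothetical DirectoryLookup summary of the directory lookup; the SDK does not expose such a structure, and what counts as "enough information" for a lookup is left to the caller because the text does not define it.

```cpp
// Hypothetical summary of a record's address-directory lookup; not an SDK type.
struct DirectoryLookup {
    bool foundInDirectory;        // address found in the address directory
    bool flaggedLacsConvertible;  // directory flags the record as LACS-convertible
    bool ruralRouteOrHcDefault;   // rural route or highway contract default assignment made
    bool enoughInfoForLookup;     // not found, but enough information to send into LACSLink
};

// Restates the three conditions: returns true when the record would be passed
// into LACSLink processing (assuming LACSLink is enabled for the job).
bool passesToLacsLink(const DirectoryLookup& r) {
    if (r.foundInDirectory) {
        // Conditions 1 and 2: flagged as convertible, or an RR/HC default was assigned.
        return r.flaggedLacsConvertible || r.ruralRouteOrHcDefault;
    }
    // Condition 3: not in the directory, but the record still carries enough data.
    return r.enoughInfoForLookup;
}
```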

For example, the following table shows an address that was found in the address directory as a LACS-convertible address.

Original address | After LACSLink conversion
RR2 BOX 204 | 463 SHOWERS RD
DU BOIS PA 15801 | DU BOIS PA 15801-66675

8.5.5 LACSLink directory files

SAP ships the LACSLink directory files with the U.S. National Directory update. The LACSLink directory files require about 600 MB of additional hard drive space. The LACSLink directories include the following:

● lacsw.txt
● lacsx.txt
● lacsy.ll
● lacsz.ll

Caution

The LACSLink directories must reside on the hard drive in the same directory as the LACSLink supporting files. Do not rename any of the files. LACSLink will not run if the file names are changed.

8.5.5.1 Directory expiration and updates

LACSLink directories expire in 105 days. LACSLink directories must have the same date as the other directories that you are using from the U.S. National Directories.

8.5.6 Required information in the job setup

All users running LACSLink must include required information in the USPS License Information group. The required options include the following:

● Customer Company Name
● Customer Company Address
● Customer Company Locality
● Customer Company Region
● Customer Company Postcode1
● Customer Company Postcode2
● Customer Company Phone

8.5.7 Reasons for errors

If your job setup is missing information in the USPS License Information group, and you have DPV and/or LACSLink enabled in your job, you will get error messages based on these specific situations:

Reason for error Description

Missing required options When your job setup does not include the required parameters in the USPS License Information group, and you have DPV and/or LACSLink enabled in your job, the software issues a verification error.

Unwritable Log File directory If the path that you specified for the USPS Log Path option in the Reference Files section is not writable, the software issues an error.
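An application that writes its own configuration files could catch the Missing required options error before the job runs. This hypothetical helper checks the options listed under Required information in the job setup, but only when DPV and/or LACSLink is enabled. The option names mirror the list in this guide; how they are spelled in an actual configuration file is an assumption, and the writability of the USPS Log Path is not checked here.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical pre-check; not an SDK function. Returns the required USPS
// License Information options that are missing or blank. Nothing is required
// when neither DPV nor LACSLink is enabled.
std::vector<std::string> missingUspsLicenseOptions(const std::map<std::string, std::string>& options,
                                                   bool dpvEnabled, bool lacsLinkEnabled) {
    static const char* const kRequired[] = {
        "Customer Company Name",     "Customer Company Address",   "Customer Company Locality",
        "Customer Company Region",   "Customer Company Postcode1", "Customer Company Postcode2",
        "Customer Company Phone"};
    std::vector<std::string> missing;
    if (!dpvEnabled && !lacsLinkEnabled) return missing;
    for (const char* name : kRequired) {
        const auto it = options.find(name);
        // Treat an option that holds only spaces or tabs as blank.
        if (it == options.end() || it->second.find_first_not_of(" \t") == std::string::npos) {
            missing.emplace_back(name);
        }
    }
    return missing;
}
```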

Related Information

Reference files [page 128]

8.5.8 LACSLink output fields

Several output fields are available for reporting LACSLink processing results.

You must enable LACSLink, and include these output fields in your job setup, before the software posts information to these fields.

Field name Len Description

LACSLINK_QUERY 50 Returns the pre-conversion address, populated only when LACSLink is enabled and a LACSLink lookup was attempted.

This address will be in the standard USPS format (as shown in USPS Publication 28). However, when an address has both a unit designator and secondary unit, the unit designator is replaced by the character “#”.

blank: No LACSLink lookup attempted.

LACSLINK_RETURN_CODE 2 Returns the match status for LACSLink processing:

A = LACSLink record match. A converted address is provided in the address data fields.

00 = No match and no converted address.

09 = LACSLink matched an input address to an old address, which is a "high-rise default" address; no new address is provided.

14 = Found a LACSLink record, but couldn't convert the data to a deliverable address.

92 = LACSLink record matched after dropping the secondary number from the input address.

blank = No LACSLink lookup attempted.

LACSLINK_INDICATOR 1 Returns the conversion status of addresses processed by LACSLink.

Y = Address converted by LACSLink (the LACSLink_Return_Code value is A).

N = Address looked up with LACSLink but not converted.

F = The address was a false-positive.

S = LACSLink conversion was made, but it was necessary to drop the secondary information.

blank: No LACSLink lookup attempted.
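When reading these fields back from output records, an application typically maps the codes to messages or states. The decoders below are a hypothetical sketch that paraphrases the table; they are not SDK functions, and they assume that a blank field arrives as an empty or space-padded string (or a space or NUL character for the one-character indicator), which depends on how your application reads the output.

```cpp
#include <string>

// Hypothetical decoder for LACSLINK_RETURN_CODE (2 characters, may be padded).
std::string describeLacsLinkReturnCode(const std::string& field) {
    const auto last = field.find_last_not_of(' ');
    if (last == std::string::npos) return "No LACSLink lookup attempted";
    const std::string code = field.substr(0, last + 1);  // drop trailing padding
    if (code == "A") return "Match: converted address provided";
    if (code == "00") return "No match and no converted address";
    if (code == "09") return "Matched a high-rise default old address; no new address";
    if (code == "14") return "Record found, but not convertible to a deliverable address";
    if (code == "92") return "Matched after dropping the input secondary number";
    return "Unrecognized return code";
}

enum class LacsLinkIndicator {
    Converted,                  // Y
    NotConverted,               // N
    FalsePositive,              // F
    ConvertedSecondaryDropped,  // S
    NoLookup,                   // blank
    Unrecognized
};

// Hypothetical decoder for the one-character LACSLINK_INDICATOR field.
LacsLinkIndicator parseLacsLinkIndicator(char value) {
    switch (value) {
        case 'Y': return LacsLinkIndicator::Converted;
        case 'N': return LacsLinkIndicator::NotConverted;
        case 'F': return LacsLinkIndicator::FalsePositive;
        case 'S': return LacsLinkIndicator::ConvertedSecondaryDropped;
        case ' ':
        case '\0': return LacsLinkIndicator::NoLookup;
        default: return LacsLinkIndicator::Unrecognized;
    }
}
```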

8.5.9 Memory usage and caching for LACSLink processing

The transform performance improves if you cache the LACSLink directories. This setting is controlled in the Transform performance section of the configuration file.

For the amount of memory required to cache the directories, see the Supported Platforms document available in the SAP Support Documentation Supported Platforms/PARs section of the SAP Service Marketplace: http://service.sap.com/bosap-support .

If you do not have adequate system memory to load the LACSLink directories and the Insufficient Cache Memory Action is set to Error, a verification error message is displayed at run-time and the transform terminates. If the Continue option is chosen, the transform attempts to continue LACSLink processing without caching.

Related Information

Transform performance [page 127]

8.5.10 USPS Form 3553

The USPS Form 3553 data that this product provides includes LACSLink counts. If LACSLink processing is enabled, the LACS/LACSLink field shows the number of records that have a LACSLink Indicator of Y or S. If LACSLink processing is not enabled, this field shows the LACS code count.
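The Form 3553 rule above reduces to a conditional count. This hypothetical helper shows the logic only; the software computes the Form 3553 figures itself, and the lacsCodeCount parameter stands in for the LACS code count, which the text mentions but does not define further.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tally for the LACS/LACSLink entry of USPS Form 3553; not an SDK
// function. With LACSLink enabled, counts records whose LACSLINK_INDICATOR is
// Y or S; otherwise returns the LACS code count supplied by the caller.
std::size_t lacsLinkForm3553Count(const std::vector<char>& indicators, bool lacsLinkEnabled,
                                  std::size_t lacsCodeCount) {
    if (!lacsLinkEnabled) return lacsCodeCount;
    return static_cast<std::size_t>(std::count_if(indicators.begin(), indicators.end(),
                                                  [](char c) { return c == 'Y' || c == 'S'; }));
}
```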

8.6 USPS RDI®

The RDI option is available in the USA Regulatory Address Cleanse transform. RDI determines whether a given address is for a residence or a nonresidence.

Parcel shippers can find RDI information to be very valuable because some delivery services charge higher rates to deliver to residential addresses. The USPS, on the other hand, does not add surcharges for residential deliveries. When you can recognize an address as a residence, you have increased incentive to ship the parcel with the USPS instead of with a competitor that applies a residential surcharge.

According to the USPS, 91 percent of U.S. addresses are residential. The USPS is motivated to encourage the use of RDI by parcel mailers.

You can use RDI if you are processing your data for CASS certification or if you are processing in a non-certified mode. In addition, RDI does not require that you use DPV processing.

You can enable RDI in the Assignment options section of the USA Regulatory Address Cleanse configuration file.

Related Information

Assignment options [page 130]

8.6.1 How RDI works

After you install the USPS-supplied RDI directories (and then enable RDI processing), the software can determine whether the address represented by an 11-digit postcode (Postcode1, Postcode2, and the DPBC) is a residential address. (The software can sometimes do the same with a 9-digit postcode, that is, Postcode1 and Postcode2.)

The software indicates that an address is for a residence or not in the output component, RDI_INDICATOR.

Using the RDI feature involves only a few steps:

1. Install USPS-supplied directories.
2. Specify in the Reference files section of the configuration file where these directories are located.
3. Enable RDI processing in the Assignment options section of the configuration file.
4. Run your application.

Related Information

Setting up the reference files [page 74]
RDI directory files [page 98]
Reference files [page 128]
Assignment options [page 130]

8.6.1.1 Compatibility

RDI has the following compatibility with other options in the software:

● RDI is allowed in both CASS and non-CASS processing modes.
● RDI is allowed with or without DPV processing.

8.6.2 RDI directory files

RDI directories are available through the USPS. You purchase these directories directly from the USPS and install them according to USPS instructions to make them accessible to the software.

RDI requires the following directories.

File Description

rts.hs11 For 11-digit postcode lookups (Postcode1, Postcode2, and DPBC). This file is used when an address contains an 11-digit postcode. Determination is based on the delivery point.

rts.hs9 For 9-digit postcode lookups (Postcode1 and Postcode2). This file is based on a ZIP+4. This is possible only when the addresses for that ZIP+4 are for all residences or for no residences.
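The choice between the two RDI files depends on how much of the postcode is available. The sketch below is a hypothetical illustration, not the software's lookup logic: it assumes Postcode1 has 5 digits, Postcode2 has 4, and the DPBC has 2 (together the 11 digits), and it does not model whether the software falls back to rts.hs9 when an rts.hs11 lookup finds nothing.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper; not an SDK function. Returns the RDI directory that
// fits the postcode data available for a record, or an empty string when
// there is not even a complete 9-digit postcode.
std::string rdiDirectoryFor(const std::string& postcode1, const std::string& postcode2,
                            const std::string& dpbc) {
    const auto isDigits = [](const std::string& s, std::string::size_type length) {
        if (s.size() != length) return false;
        for (const unsigned char c : s) {
            if (!std::isdigit(c)) return false;
        }
        return true;
    };
    if (!isDigits(postcode1, 5) || !isDigits(postcode2, 4)) return "";
    // 11 digits (delivery point level) use rts.hs11; 9 digits (ZIP+4 level) use rts.hs9.
    return isDigits(dpbc, 2) ? "rts.hs11" : "rts.hs9";
}
```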

8.6.3 RDI output field

For RDI, the software uses a single output component that is always one character in length. The RDI component is populated only when the Enable RDI option in the Assignment Options group is set to Yes.

Job/Views field Length Description

RDI_INDICATOR 1 This field contains the RDI value that consists of one of the following values:

Y = The address is for a residence.

N = The address is not for a residence.

Related Information

Assignment options [page 130]

8.6.4 CASS Statement, USPS Form 3553

The USPS Form 3553 contains an entry for the number of residences. (The CASS header record also contains this information.)

8.7 Z4Change (USA Regulatory Address Cleanse)

The Z4Change option is based on a USPS directory of the same name. The Z4Change option is available in the USA Regulatory Address Cleanse transform only.

8.7.1 Enable Z4Change for faster processing

Enabling the Z4Change option can save processing time, compared to running all records through the normal ZIP+4 assignment process.

Z4Change is most cost-effective for databases that are large and fairly stable—for example, databases of regular customers, subscribers, and so on. In our tests, based on files in which five percent of records were affected by a ZIP+4 change, total batch processing time was one third the normal processing time.

When you are using the transform interactively—that is, processing one address at a time—there is less benefit from enabling Z4Change.

You can enable the Z4Change option in the Z4 change option section of the USA Regulatory Address Cleanse configuration file.

Related Information

Z4 Change options [page 139]

8.7.2 Z4Change and USPS rules

Z4Change is to be used only for updating a database that has previously been put through a full validation process. The USPS requires that the mailing list be put through a complete assignment process every three years.
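The three-year rule can be checked before an application chooses Z4Change over a full assignment pass. This is a hypothetical sketch: the Ymd type and z4ChangeAllowed function are illustrative and not part of the SDK, and treating the third anniversary of the last full assignment as already too late is an assumption.

```cpp
#include <tuple>

// Hypothetical date type for this sketch; not an SDK type.
struct Ymd {
    int year;
    unsigned month;
    unsigned day;
};

// Returns true when a list that has been through full validation is still
// inside the three-year window, so a Z4Change-only update is appropriate.
bool z4ChangeAllowed(bool hadFullValidation, const Ymd& lastFullAssignment, const Ymd& today) {
    if (!hadFullValidation) return false;
    const Ymd deadline{lastFullAssignment.year + 3, lastFullAssignment.month, lastFullAssignment.day};
    // Lexicographic (year, month, day) comparison; no calendar arithmetic needed,
    // and a February 29 start date still orders correctly in a non-leap year.
    return std::tie(today.year, today.month, today.day) <
           std::tie(deadline.year, deadline.month, deadline.day);
}
```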

8.7.3 Z4Change directory

The Z4Change directory, z4change.dir, is updated monthly and is available only if you have purchased the Z4Change option for the USA Regulatory Address Cleanse transform.

The Z4Change directory contains a list of all the ZIP Codes and ZIP+4 codes in the country.

You define the location of the Z4Change directory in the Reference Files section of the USA Regulatory Address Cleanse configuration file.

Related Information

Reference files [page 128]
Setting up the reference files [page 74]

8.8 Suggestion lists

Suggestion list processing is used in transactional projects with the USA Regulatory Address Cleanse and Global Address Cleanse transforms. Use suggestion lists to complete and populate addresses that have minimal data. Suggestion lists can offer suggestions for possible matches if an exact match is not found. This section is only about suggestion lists in the USA Regulatory Address Cleanse transform.

Note

Suggestion list processing is not available for batch processing. In addition, if you have suggestion lists enabled, you are not eligible for CASS discounts and the software will not produce the required CASS documentation.

Ideally, when the USA Regulatory Address Cleanse transform looks up an address in the USPS postal directories (City/ZCF), it finds exactly one matching record with a matching combination of locality, region, and postcode. Then, during the look-up in the USPS national ZIP+4 directory, the software should find exactly one record that matches the address.

Generally, the software can do this even when the input data is not complete. For many addresses, all the software needs in order to make suggestions is the right postcode, house number, and some of the primary name.

Example

Incomplete address information

Input record | Output record
Line1 = 1000 vin | Address_Line = 1000 Vine Street
Line2 = 54603 | Locality1 = La Crosse
| Region1 = WI
| Postcode_Full = 54601-3474

Integrating functionality

Suggestion lists functionality is designed to be integrated into your own custom applications via the Web Service. This easy address-entry system is ideal in call center environments or any transactional environment where cleansing is necessary at the point of entry. It's also a beneficial research tool when you need to manage bad addresses from a previous batch process.

Related Information

CASS rule [page 103]

8.8.1 Breaking ties

Sometimes it's impossible to pinpoint an input address to one matching record in the directory. At other times, the software may find several directory records that are near matches to the input data.

When the software is close to a match, but not quite close enough, it assembles a list of the near matches and presents them as suggestions. When you choose a suggestion, the software tries again to assign the address.

Example

Incomplete last line

Given the incomplete last line below, the software could not reliably choose one of the four localities. But if you choose one, the software can proceed with the rest of the assignment process.

Input record | Possible matches in the City/ZCF directories
Line1 = 1000 vine | La Crosse, WI 54603
Line2 = lacr wi | Lancaster, WI 53813
| La Crosse, WI 54601
| Larson, WI 54947

Example

Missing directional

The same can happen with address lines. A common problem is a missing directional. In the example below, there is an equal chance that the directional could be North or South. The software has no basis for choosing one way or the other.

Input record | Possible matches in the ZIP+4 directory
Line1 = 615 losey blvd | 600-699 Losey Blvd North
Line2 = 54603 | 600-699 Losey Blvd South

Example

Missing suffix

A missing suffix would cause behavior similar to the example above.

Input record | Possible matches in the ZIP+4 directory
Line1 = 121 dorn | 100-199 Dorn Place
Line2 = 54601 | 100-199 Dorn Street

Example

Misspelled street names

A misspelled or incomplete street name can also result in a list of address suggestions.

Input record | Possible matches in the ZIP+4 directory
Line1 = 4101 marl | 3900-4199 Marshall 55421
Line2 = minneapolis mn | 4000-4199 Maryland 55427

Example

Incomplete last line

Given the incomplete last line below, the software could not reliably choose one of the four localities. But if you choose one, the software can proceed with the rest of the assignment process.

Input record | Possible matches in the City/ZCF directories
Line1 = 1000 vine | La Crosse, WI 54601
Line2 = lac wi | Lac du Flambeau, WI 54538
| Lac Courte Oreilles Indian Reservation, WI 54806
| Lac du Flambeau Reservation, WI 54806

Example

Misspelled street names

A badly misspelled street name could also cause a tie.

Input record | Possible matches in the ZIP+4 directory
Line1 = 4101 mar | 3900-4199 Marschall 55379
Line2 = minneapolis mn | 4000-4199 Maryland 55427

8.8.2 More information is needed

When the software produces a suggestion list, you need some basis for selecting one of the possible matches. Perhaps you can come up with some additional or better data. For example, perhaps you are capturing address data while the customer is still on the phone. Or you might be taking data from a consumer coupon that is smudged, but if the software gives you a clue about what information is needed, perhaps you can resolve the address.
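In a call center application, that extra information is typically used to filter the suggestion list before a choice is sent back for assignment. The sketch below shows one way to do this on the client side; the Suggestion type and the narrowSuggestions and startsWithIgnoreCase functions are hypothetical and not part of the SDK or its Web Service.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical client-side type: one entry of a suggestion list as an
// application might hold it after a lookup. Not an SDK type.
struct Suggestion {
    std::string locality;
    std::string region;
    std::string postcode;
};

bool startsWithIgnoreCase(const std::string& text, const std::string& prefix) {
    if (prefix.size() > text.size()) return false;
    for (std::string::size_type i = 0; i < prefix.size(); ++i) {
        const int a = std::tolower(static_cast<unsigned char>(text[i]));
        const int b = std::tolower(static_cast<unsigned char>(prefix[i]));
        if (a != b) return false;
    }
    return true;
}

// Keeps only the suggestions consistent with extra details gathered from the
// caller (for example, read back over the phone). Empty filters match everything.
std::vector<Suggestion> narrowSuggestions(const std::vector<Suggestion>& suggestions,
                                          const std::string& localityPrefix,
                                          const std::string& postcodePrefix) {
    std::vector<Suggestion> kept;
    for (const Suggestion& s : suggestions) {
        if (startsWithIgnoreCase(s.locality, localityPrefix) &&
            s.postcode.compare(0, postcodePrefix.size(), postcodePrefix) == 0) {
            kept.push_back(s);
        }
    }
    return kept;
}
```

Using the four localities from the Incomplete last line example (La Crosse 54603, Lancaster 53813, La Crosse 54601, Larson 54947), a locality prefix of "la c" keeps the two La Crosse entries, and a postcode prefix of "54601" leaves a single candidate.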

8.8.3 CASS rule

The USPS does not permit the generation of a USPS Form 3553 when suggestion lists are used in address assignment. The USPS suspects that users may be tempted to guess, which results in misrouted mail that is expensive for the USPS to handle.

Therefore, when you use the suggestion list feature, you cannot get a USPS Form 3553 covering the addresses that you assign. The form is available only when you process in batch mode with the Disable Certification option set to No.

You must run addresses from real-time processes through a batch process in order to be CASS compliant. Then the software generates a USPS Form 3553 that covers your entire mailing database, and your list may be eligible for postal discounts.

8.9 USPS certifications

The following USPS certifications are available through the USA Regulatory Address Cleanse transform:

● CASS self-certification
● NCOALink license

8.9.1 Completing USPS certifications

The instructions in this section apply to USPS CASS self-certification and NCOALink license certification.

During certification you must process files from the USPS to prove that your software is compliant with the requirements of your license agreement.

The CASS and NCOALink certifications have two stages. Stage I is an optional test that includes answers that allow you to troubleshoot and prepare for the Stage II test. The Stage II test does not contain answers and is sent to the USPS for evaluation of the accuracy of your software configuration.

1. Complete the applicable USPS application (CASS, NCOALink) and other required forms and return the information to the USPS. After you satisfy the initial application and other requirements, the USPS gives you an authorization code to purchase the CASS or NCOALink option.
2. Purchase the option from the USPS. Then submit the following information to SAP:

○ your USPS authorization code (see step 1)
○ your NCOALink provider level (full service provider, limited service provider, or end user) (only applicable for NCOALink)
○ your decision whether or not you want to purchase the ANKLink option (for NCOALink limited service provider or end user only)

After you install the software, you are ready to request the applicable certification test from the USPS.

Tip

The samples provided with this product show how to generate the statistics for reports needed for certifications.

3. Submit the Software Product Information form to the USPS and request certification tests.
4. After you successfully complete the certification tests, the USPS sends you the applicable license agreement. At this point, you also purchase the applicable product from SAP.

Related Information

About ANKLink [page 113]

8.9.2 Static directories

Users who are self-certifying for CASS must use “static” directories. Static directories do not change every three months with the regular directory updates. Instead, they can be used for certification for the duration of the CASS cycle. Using static directories ensures consistent results between Stage I and Stage II tests, and allows you to use the same directory information if you are required to re-test.

Note

If you do not use static directories when required, the software issues an error.

8.9.2.1 List of static directories

The following directories are available in static format:

● zip4us.dir
● zip4us.shs
● zip4us.rev
● revzip4.dir
● city10.dir
● zcf10.dir
● dpv*.dir
● elot.dir
● ew*.dir
● SuiteLink directories
● LACSLink directories
● RDI directories

8.9.2.2 Obtaining static directories

To request static directories, contact SAP Business User Support at http://service.sap.com/message .

8.9.2.3 Static directories location

It is important that you store your static directories separately from the production directories. If you store them in the same folder, the static directories will overwrite your production directories.

8.9.2.4 Static directories safeguards

To prevent running a production job using static directories, the software issues a verification warning or error under the following circumstances:

● When the job has both static and non-static directories indicated.
● When the release version of the zip4us.dir does not match the current CASS cycle in the software.
● When the data versions in the static directories aren't all the same. For example, for CASS Cycle M the data versions in the static directories are M01.
● When the job is set for self-certification but is not set up to use the static directories.
● When the job is not set for self-certification but is set up to use the static directories.
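The safeguards above can be sketched as a configuration check that an application runs before submitting a job. The DirectoryInfo type and staticDirectoryProblems function are hypothetical and not part of the SDK. The sketch assumes that data versions follow the "M01" pattern given in the text (cycle letter first) and applies the CASS-cycle check only to a static zip4us.dir; the software issues its own warnings and errors regardless.

```cpp
#include <string>
#include <vector>

// Hypothetical description of a configured directory; not an SDK type.
struct DirectoryInfo {
    std::string fileName;
    bool isStatic;
    std::string dataVersion;  // for example "M01" for CASS Cycle M
};

// Returns one message per safeguard condition that applies.
std::vector<std::string> staticDirectoryProblems(const std::vector<DirectoryInfo>& directories,
                                                 bool selfCertification, char cassCycle) {
    std::vector<std::string> problems;
    bool anyStatic = false;
    bool anyNonStatic = false;
    bool versionsDiffer = false;
    std::string firstStaticVersion;
    for (const DirectoryInfo& d : directories) {
        if (!d.isStatic) {
            anyNonStatic = true;
            continue;
        }
        if (!anyStatic) {
            firstStaticVersion = d.dataVersion;
            anyStatic = true;
        } else if (d.dataVersion != firstStaticVersion) {
            versionsDiffer = true;
        }
        if (d.fileName == "zip4us.dir" && (d.dataVersion.empty() || d.dataVersion[0] != cassCycle)) {
            problems.push_back("zip4us.dir release does not match the current CASS cycle");
        }
    }
    if (anyStatic && anyNonStatic) problems.push_back("job mixes static and non-static directories");
    if (versionsDiffer) problems.push_back("static directory data versions are not all the same");
    if (selfCertification && !anyStatic) {
        problems.push_back("self-certification job does not use static directories");
    }
    if (!selfCertification && anyStatic) {
        problems.push_back("job is not set for self-certification but uses static directories");
    }
    return problems;
}
```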

8.9.3 CASS self-certification

If you want to CASS-certify an application you create using this product, you must obtain CASS certification on your own (self-certification). You need to show the USPS that your software meets the CASS standards for accuracy of postal coding and address correction. You also need to show that your software can produce a facsimile of the USPS Form 3553, which is required to qualify mailings for postage discounts.

Visit the USPS RIBBS website at http://www.ribbs.usps.gov/cassmass/documents/tech_guides for more information about CASS certification.

8.9.3.1 CASS self-certification process overview

1. Familiarize yourself with the CASS certification documentation and procedures located at http://ribbs.usps.gov/cassmass/documents/tech_guides .
2. (Optional.) Download the CASS Stage I test from the RIBBS website. You do not submit the Stage I test results to the USPS. Taking the Stage I test helps you analyze and correct any inconsistencies with the USPS-expected results before taking the Stage II test.
3. When you are satisfied that your Stage I test results compare favorably with the USPS-expected results, request the Stage II test from the USPS using the Stage II order form located on the RIBBS website. The USPS will place the Stage II test in your user area on the RIBBS website for you to download.
4. Download and unzip the Stage II test file to an output area.
5. After you run the Stage II file with the CASS self-certification sample provided with this product (EmDQ_USARegulatoryAddressCleanseCASSSelfCert.xml and urac_self_certification_transform_driver.cpp), check that the totals on the USPS Form 3553 and the actual totals from the processed file match.
6. Zip the processed Stage II answer file and upload it to your user area on the RIBBS website. Name the Zip file the same as the input Zip file.

The USPS usually takes about two weeks to grade your test.

8.9.3.2 USPS Form 3553 required options for self certification

The following options in the CASS Report Options group are required for CASS self certification. This information is included in the USPS Form 3553.

Option Description

Company Name Certified Specify the name of the company that owns the CASS-certified software.

List Name Specify the name of the mailing list.

List Owner Specify the name of the list owner.

Note

Keep the CASS self-certification blueprints setting of “USPS”.

Software Version Specify the software name and version number that you are using to receive CASS self certification.

Related Information

CASS Report options [page 139]

8.9.3.3 Points to remember about CASS

Remember these important points about CASS certification:

● The end user of the data quality application is not required to obtain CASS self-certification because this product is already CASS certified.
● CASS certification is given to software programs. You obtain CASS self-certification if you have incorporated this product into your software program.
● The CASS reports pertain to address lists.
● CASS certification proves that the software can assign and standardize addresses correctly.

8.9.3.4 Using the SDK for CASS processing and self-certification

You can use this product to perform CASS processing. This section provides an overview of how to use the APIs and configuration files.

Requirement | Description

Send all records for processing from the input stage file in the form of a single collection | You must instantiate a MultiRecordTransform object with a U.S. Regulatory Address Cleanse XML configuration.

Configure the U.S. Regulatory Address Cleanse transform's USPS Certification Testing Mode for CASS | In the Assignment Options section of the U.S. Regulatory Address Cleanse configuration file, you must set the USPS Certification Testing Mode option to CASS (instead of NONE).

Ensure that Stage_Test_Record is the only output field present in the output schema | Stage_Test_Record is the field in which the data is placed.

Generate the PS Form 3553 report | Use the CertifiedReportGenerator class to generate the report. (In the .Net implementation, the certified report generator is in the TransformFactory class.)

Submit the output file with the PS Form 3553 report to the USPS for processing | This product produces a file that contains the data you must submit to the USPS.

Related Information

Assignment options [page 130]

8.9.4 NCOALink certification

You must complete the USPS certification procedure for NCOALink in order to purchase the NCOALink product from the USPS. For full information and the required forms for each provider type, visit the RIBBS website at http://ribbs.usps.gov/ncoalink/documents/tech_guides .

Note

If you have questions about the NCOALink documents or about NCOALink certification procedures, contact the USPS National Customer Support Center at 800-589-5766 or go to the USPS RIBBS website (link provided above).

Related Information

Getting started with NCOALink [page 114]

8.9.4.1 Completing NCOALink certification

During certification you must process files from the USPS to prove that you adhere to the requirements of your license agreement. NCOALink certification has two stages. Stage I is an optional test which includes answers that allow you to troubleshoot and prepare for the Stage II test. The Stage II test does not contain answers and is sent to the USPS for evaluation of the accuracy of your software configuration.

8.9.4.2 Software product information for NCOALink application

Use the information in the following table as you complete Step 3 of the Software Product Information form for the NCOALink certification process.

Software product information (Step 3) Description

Company Name & License Number Your specific information. The license number is the authorization code provided in your USPS approval letter.

Company's NCOALink Product Name Mover ID for NCOALink

Platform or Operating System Your specific information

NCOALink Software Vendor SAP Americas, Inc.

NCOALink Software Product Name Mover ID

NCOALink Software Product Version Contact SAP Business User Support.

Is Software Hardware Dependent? No

Address Matching ZIP+4 Product Name ACE

Address Matching ZIP+4 Product Version Contact SAP Business User Support.

Open or Closed System Closed

DPV® Product Name ACE

DPV Product Version Contact SAP Business User Support.

LACSLink® Product Name ACE

LACSLink Product Version Contact SAP Business User Support.

Integrated or Standalone Integrated


ANKLink Enhancement Check the box if you purchased the ANKLink option from SAP.

HASH/FLAT/BOTH Indicate your preference. The software provides access to both file formats.

Service Level Option Check the appropriate box.

Related Information

Data format [page 120]

8.9.4.3 Using the SDK for NCOALink processing

You can use this product to perform NCOALink processing. This section provides an overview of how to use the APIs and configuration files.

Requirement Description

Process at least 100 records You must instantiate a MultiRecordTransform object with a U.S. Regulatory Address Cleanse XML configuration and process at least 100 records.

Enable NCOALink option in configuration file In the Assignment Options section of the U.S. Regulatory Address Cleanse configuration file, you must set the Enable NCOALink option to YES.

Complete the Processing Options group in configuration file You must provide information for the options in the Processing Options section of the U.S. Regulatory Address Cleanse configuration file.

Obtain keycode from SAP You must obtain a keycode from SAP and place the keycode in the bobjprods.key file. You must match the keycode to the setting configured in the Provider Level option, in the USPS Licensee and Customer Information Options of the U.S. Regulatory Address Cleanse configuration file.

Submit monthly statistics to the USPS Use the StatisticsHandler to collect statistics from each run and consolidate them into a monthly CSL report that is named and formatted per USPS requirements.

Related Information

NCOALink (USA Regulatory Address Cleanse) [page 111]
Assignment options [page 130]
Processing options [page 148]
USPS license information options [page 145]

8.10 NCOALink (USA Regulatory Address Cleanse)

Move updating is the process of ensuring that your mailing list contains up-to-date address information. It involves checking a mailing list against the National Change of Address (NCOA) database to make sure your data is updated with current addresses.

When you perform move updating, you update your records for individuals or businesses that have moved and have filed a Change of Address (COA) form with the USPS.

The USPS requires that your mailing list undergo move update processing before it can qualify for the discounted rates available for First-Class presorted mailings. You can meet this requirement through the NCOALink process.

Mover ID is the name under which this product is certified for NCOALink.

You can enable NCOALink in the Assignment options section of the USA Regulatory Address Cleanse configuration file.

Related Information

Assignment options [page 130]

8.10.1 The importance of move updating

More than 40 million people and businesses move every year. To keep accurate address information for your contacts, you must use a USPS method for receiving your contacts' new addresses. Not only is move updating good business, it is required for all First-Class mailers who claim presorted or automation rates. As the USPS expands move-updating requirements and more strictly enforces the existing regulations, move updating will become increasingly important.

8.10.2 Benefits of NCOALink

By using NCOALink through the software, you're updating the addresses in your database with the latest move data. With NCOALink, you can:

● Improve mail deliverability.
● Reduce the cost and time needed to forward mail.
● Meet the USPS move-updating requirement for presorted First-Class mail.
● Prepare for the possible expansion of move-update requirements.

8.10.3 How NCOALink works

This is how NCOALink processing works:

1. The USA Regulatory Address Cleanse transform standardizes the input addresses. NCOALink requires parsed, standardized address data as input.
2. The software searches the NCOALink database for records that match your parsed, standardized records.
3. If a match is found, the software receives the move information, including the new address, if one is available.
4. The software looks up move records that come back from the NCOALink database to assign postal and other codes.
5. Depending on your field class selection, the output file contains:
○ only the original address (CORRECT)
○ only the move-updated address, if one exists (MOVE-UPDATED)
○ the move-updated data if it exists and matches in the U.S. National directories, or the original data if a move does not exist (BEST)
Based on the Apply Move to Standardized Fields option in the NCOALink group, standardized components can contain either original or move-updated addresses.
6. You must produce the reports and log files required for USPS compliance. Use the StatisticsHandler to produce these files. Be sure to follow USPS file-naming conventions for the required files. See the Log file names topic, in this guide, for more information.

Related Information

Log file names [page 122]

8.10.4 Software performance

Your processing speed depends on the computer running the software and the percentage of input records affected by a move. More moves lead to slower performance.

Related Information

Improving NCOALink processing performance [page 119]

8.10.5 Address not known (ANKLink)

Undeliverable-as-addressed (UAA) mail costs the mailing industry and the USPS a significant amount of money each year. The software provides NCOALink as an additional solution to UAA mail. With NCOALink, you can also access the USPS's ANKLink data.

8.10.5.1 About ANKLink

NCOALink limited service providers and end users receive change-of-address data for the preceding 18 months. The ANKLink option enhances that information by providing additional data about moves that occurred 19 through 48 months earlier.

Note

The additional 30 months of data that comes with ANKLink indicates only that a move occurred and the date of the move; the new address is not provided.

The ANKLink data helps you make informed choices regarding a contact. If the data indicates that the contact has moved, you can choose to suppress that contact from the list or try to acquire the new address from an NCOALink full service provider.

If you choose to purchase ANKLink to extend NCOALink information, the data you receive from the USPS will contain both the NCOALink 18-month full change-of-address information and the additional 30-month ANKLink information, which indicates that a move has occurred.

If an ANKLink match exists, it is noted in the ANKLINK_RETURN_CODE output field and in the NCOALink Processing Summary report.

Tip

If you are an NCOALink full service provider you already have access to the full 48 months of move data (including the new addresses).

8.10.5.2 ANKLink data

ANKLink is a subset of NCOALink. You can request ANKLink data from the USPS National Customer Support Center (NCSC) by calling 1-800-589-5766 or by e-mail at [email protected]. ANKLink data is not available from SAP.

The software detects if you're using ANKLink data. Therefore, you do not have to specify whether you're using ANKLink in your job setup.

8.10.5.3 ANKLink support for NCOALink provider levels

The software supports three NCOALink provider levels defined by the USPS. Software options vary by provider level and are activated based on the software package that you purchased. The following table shows the provider levels and support:

Provider level | Provide service to third parties | COA data (months) | Data received from USPS | Support for ANKLink

Full Service Provider (FSP) | Yes. Third-party services must be at least 51% of all processing. | 48 | Weekly | No (no benefit)

Limited Service Provider (LSP) | Yes. LSPs can both provide services to third parties and use the product internally. | 18 | Weekly | Yes

End User | No | 18 | Monthly | Yes
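The provider-level table can be captured as a small lookup structure. The following C++ sketch is a hypothetical encoding: ProviderLevel, ProviderLevelInfo, describe, and monthsOfMoveHistory are illustrative names, not part of the Data Quality Management SDK API. It also shows how the 30 additional ANKLink months (move indicator and date only) extend the 18 months that limited service providers and end users receive.

```cpp
#include <cassert>
#include <string>

// Hypothetical encoding of the provider-level table; these names are
// illustrative and are not part of the Data Quality Management SDK API.
enum class ProviderLevel { FullService, LimitedService, EndUser };

struct ProviderLevelInfo {
    bool servesThirdParties;    // may provide NCOALink service to third parties
    int coaMonths;              // months of full change-of-address data
    std::string dataFrequency;  // how often new data arrives from the USPS
    bool ankLinkAddsData;       // whether ANKLink extends the move history
};

ProviderLevelInfo describe(ProviderLevel level) {
    switch (level) {
        case ProviderLevel::FullService:
            // FSPs already receive 48 months, so ANKLink adds no benefit.
            return {true, 48, "Weekly", false};
        case ProviderLevel::LimitedService:
            return {true, 18, "Weekly", true};
        case ProviderLevel::EndUser:
            return {false, 18, "Monthly", true};
    }
    return {false, 0, "", false};  // not reached for valid enum values
}

// Months of move history available. ANKLink adds 30 months (months 19-48)
// that indicate only that a move occurred and when, not the new address.
int monthsOfMoveHistory(ProviderLevel level, bool hasAnkLink) {
    const ProviderLevelInfo info = describe(level);
    return (hasAnkLink && info.ankLinkAddsData) ? info.coaMonths + 30
                                                : info.coaMonths;
}
```

For example, an end user with ANKLink sees 48 months of move history, the same span a full service provider already has without it.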

Tip

If you are an NCOALink end user, you may complete a Stop Processing Alternative application and enter into an agreement with the USPS. After you are approved by the USPS you may purchase the software's stop processing alternative functionality which allows DPV and LACSLink processing to continue after a false positive address record is detected.

8.10.6 Getting started with NCOALink

Before you begin NCOALink processing you need to perform the following tasks:

● Complete the USPS certification process to become an NCOALink service provider or end user. For information about certification, see the USPS Certifications section by following the link below.
● Understand the available output strategies and performance optimization options.
● Configure your job.

Related Information

NCOALink certification [page 108]

8.10.7 What to expect from the USPS and SAP

NCOALink, and the license requirements that go with it, have created a new dimension in the relationship among mailers (you), the USPS, and vendors. It's important to be clear about what to expect from everyone.

8.10.7.1 Move updating is a business decision for you to make

NCOALink offers an option to replace a person's old address with their new address. You as a service provider must decide whether you accept move updates related to family moves, or only individual moves. The USPS recommends that you make these choices only after careful thought about your customer relationships. Consider the following examples:

● If you are mailing checks, account statements, or other correspondence for which you have a fiduciary responsibility, then move updating is a serious undertaking. The USPS recommends that you verify each move by sending a double postcard, or other easy-reply piece, before changing a financial record to the new address.
● If your business relationship is with one spouse and not the other, then move updating must be handled carefully with respect to divorce or separation. Again, it may make sense for you to take the extra time and expense of confirming each move before permanently updating the record.

8.10.7.2 NCOALink security requirements

Because of the sensitivity and confidentiality of change-of-address data, the USPS imposes strict security procedures on software vendors who use and provide NCOALink processing.

One of the software vendor's responsibilities is to check that each list input to the USA Regulatory Address Cleanse transform contains at least 100 unique records. Therefore the USA Regulatory Address Cleanse transform checks your input files for at least 100 unique records. These checks make verification take longer, but they are required by the USPS and they must be performed.

If the software finds that your data does not have 100 unique records, it issues an error and discontinues processing.

The check for 100 unique records is a pre-processing step. If the software does not find 100 unique records, no processing is performed on the input file and no statistics are output.

Related Information

Multiple data source statistics reporting [page 123]

8.10.7.2.1 How the software checks for 100 unique records

When you have NCOALink enabled in your job, the software checks the collection for 100 unique records before any processing is performed on the data. If it finds 100 unique records, the job is processed as usual. However, if the software does not find 100 unique records, it issues an error stating that your input data does not have 100 unique records, or that there are not enough records to determine uniqueness.

For the 100 unique record search, a record consists of all input fields concatenated in the same order as they are mapped in the transform. A record is considered alike (not unique) only if it is identical to another record.

Example

Comparing records

The example below illustrates how the software concatenates the fields in each record and determines non-unique records. Records 1 and 4 are non-unique; therefore, an error is issued and processing stops.

1 332 FRONT STREET NORTH LACROSSE WI 54601

2 332 FRONT STREET SOUTH LACROSSE WI 54601

3 331 FRONT STREET SOUTH LACROSSE WI 54601

4 332 FRONT STREET NORTH LACROSSE WI 54601
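The uniqueness rule can be illustrated with a short C++ sketch. It treats a record as its list of input field values in mapping order, concatenates them into one key, and counts distinct keys, stopping as soon as the required number is reached. The function names are hypothetical, and the field separator is an assumption of the sketch, not documented SDK behavior. For the four sample records, only three distinct keys exist because records 1 and 4 are identical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// A record is the list of its input field values, in mapping order.
using Record = std::vector<std::string>;

// Concatenate all fields in mapping order. The '\x1F' (unit separator)
// between fields keeps {"AB", "C"} and {"A", "BC"} distinct; whether the
// SDK uses any separator is an assumption of this sketch.
std::string concatenateFields(const Record& record) {
    std::string key;
    for (std::size_t i = 0; i < record.size(); ++i) {
        if (i > 0) key += '\x1F';
        key += record[i];
    }
    return key;
}

// Count distinct records: two records are alike only if every field matches.
std::size_t countUniqueRecords(const std::vector<Record>& records) {
    std::unordered_set<std::string> seen;
    for (const Record& r : records) seen.insert(concatenateFields(r));
    return seen.size();
}

// Pre-processing gate: true if enough unique records exist to continue.
bool hasEnoughUniqueRecords(const std::vector<Record>& records,
                            std::size_t required = 100) {
    std::unordered_set<std::string> seen;
    for (const Record& r : records) {
        seen.insert(concatenateFields(r));
        if (seen.size() >= required) return true;  // stop scanning early
    }
    return seen.size() >= required;
}
```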

8.10.7.2.2 Finding unique records in multiple threads

When your NCOALink job is set up for multiple threads, the software checks each thread for 100 unique records. Threads are not necessarily checked in order; for example, thread #3 could be checked before thread #1. Therefore, each time you run the same multi-threaded job, the unique record results may differ.

You may receive an error even when your input contains at least 100 unique records overall. This can happen if one of the threads contains fewer than 100 records and is checked before other threads that contain at least 100 records. Consider reducing the number of threads before you try to process the job again.

8.10.7.3 USPS responsibility for support

When you acquire an NCOALink license from the USPS, you are licensing a USPS product. The NCOALink database is developed and maintained by the USPS. Therefore, contact the USPS NCSC in Memphis at 800-589-5766 as your first line of support for NCOALink issues or problems with the NCOALink system.

8.10.8 About NCOALink directories

After you have completed the certification requirements and purchased the NCOALink product from the USPS, the USPS makes the latest NCOALink directories available monthly (if you're an end user licensee) or weekly (if you're a limited or full service provider licensee). Before you can use the NCOALink directories, you must download them (using the USPS downloading tool, EPF Downloader Manager, or another download interface) and then extract and uncompress them (using the SAP NCOALink Utility).

The SAP NCOALink Utility is a 64-bit application that is run from the command line.

● If you use a version of the software that automatically installs the NCOALink Utility, see the following sections for information about how to locate and run the application.
● If you use a version of the software that does not install the NCOALink Utility, you can download the application from the SAP Support Portal at https://support.sap.com/software/address-directories.html .

NCOALink licensing requirements

USPS licensing requires that you use the most current NCOALink directories available (either weekly or monthly depending on the license type).

USPS licensing requires limited or full service provider licensees to download the daily delete file and copy it to the location where the NCOALink directories are located every day that NCOALink jobs are run.

NCOALink directories expire after 45 days.
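A job script can check the 45-day limit before processing starts. The following C++ sketch assumes that age is counted from the directory release date and that directories are expired once more than 45 whole days have passed. Both are assumptions, because this guide does not state which date the software uses or exactly how it treats day 45. The function names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

using Clock = std::chrono::system_clock;
// A day as a chrono duration (C++20 has std::chrono::days; this works in C++11).
using Days = std::chrono::duration<long long, std::ratio<86400>>;

// Whole days elapsed between two points in time, truncated toward zero.
long long wholeDaysBetween(Clock::time_point from, Clock::time_point to) {
    return std::chrono::duration_cast<Days>(to - from).count();
}

// Assumption: age is counted from the directory release date, and the
// directories are expired once more than 45 whole days have passed.
constexpr long long kNcoaLinkMaxAgeDays = 45;

bool ncoaLinkDirectoriesExpired(Clock::time_point releaseDate,
                                Clock::time_point now) {
    return wholeDaysBetween(releaseDate, now) > kNcoaLinkMaxAgeDays;
}
```

Checking ahead of time lets a script fail fast with a clear message instead of discovering expired data partway through a run.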

Related Information

Installing the NCOALink daily delete file [page 119]

8.10.8.1 Installing NCOALink directories

Ensure that your system meets the following minimum requirements:

● At least 60 GB of available disk space
● Sufficient RAM

1. Run the NCOALink Utility from the location where it is installed. If your NCOALink Utility was installed with Data Quality Management SDK, the utility is located in the following folder, where $LINK_DIR is the path to your installation directory:
○ $LINK_DIR\bin\ncoa\ncoautil.exe (Windows)
○ $LINK_DIR/bin/ncoa/ncoautil (UNIX)
2. Use the ncoautil command with the following command-line options:

Option (Windows) Option (UNIX) Description

/p:t -p:t Perform transfer to copy files from the source to the destination. When using this option you must also specify the following:

○ Location of compressed NCOALink data files with /d or -d
○ Transfer destination location with /t or -t

/p:u -p:u Perform unpack to uncompress the files. When using this option, you must also specify the following:

○ Location of compressed NCOALink data files with /d or -d
○ Transfer destination location with /t or -t

/p:v -p:v Perform verification on the files. When using this option, you must also specify the transfer destination location with /t or -t.

/d -d Specify location of compressed NCOALink data files.

/t -t Specify transfer destination location.

/nos -nos Do not stop on error (return failure code as exit status).

/a -a Answer all warning messages with Yes.

You can combine p options. For example, if you want to transfer, unpack, and verify all in the same process, enter /p:tuv or -p:tuv.

After it performs the operations specified with the p option, the program closes.

Example

Your command line may look something like this:

Windows

ncoautil /p:tuv /d D:\downloads\ncoa /t "C:\Program Files (x86)\SAP BusinessObjects\Data Services\DataQuality\reference_data"

UNIX

ncoautil -a -nos -p:tuv -d /local/downloads/ncoa -t /local/dataservices/DataQuality/reference_data
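If you script directory installation, you can build the ncoautil argument list from the option rules in the table above. The following C++ sketch is hypothetical: NcoaUtilRequest and buildNcoaUtilArgs are illustrative names, and the sketch only assembles strings; it does not launch the utility. It requires /d (-d) for transfer and unpack and /t (-t) for every operation, and it combines the requested operations into a single p switch such as /p:tuv.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of one ncoautil run.
struct NcoaUtilRequest {
    bool transfer = false;       // p:t
    bool unpack = false;         // p:u
    bool verify = false;         // p:v
    std::string sourceDir;       // d: location of compressed NCOALink files
    std::string destDir;         // t: transfer destination location
    bool answerYes = false;      // a: answer all warning messages with Yes
    bool noStopOnError = false;  // nos: do not stop on error
    bool windows = true;         // '/' prefix on Windows, '-' on UNIX
};

std::vector<std::string> buildNcoaUtilArgs(const NcoaUtilRequest& r) {
    if (!r.transfer && !r.unpack && !r.verify)
        throw std::invalid_argument("specify at least one p operation");
    // Transfer and unpack need the source; every operation needs /t.
    if ((r.transfer || r.unpack) && r.sourceDir.empty())
        throw std::invalid_argument("transfer and unpack require the d option");
    if (r.destDir.empty())
        throw std::invalid_argument("every p operation requires the t option");

    const std::string pre = r.windows ? "/" : "-";
    std::vector<std::string> args{"ncoautil"};
    if (r.answerYes) args.push_back(pre + "a");
    if (r.noStopOnError) args.push_back(pre + "nos");

    // Combine the operations into one switch, for example /p:tuv or -p:tuv.
    std::string ops;
    if (r.transfer) ops += 't';
    if (r.unpack) ops += 'u';
    if (r.verify) ops += 'v';
    args.push_back(pre + "p:" + ops);

    if (!r.sourceDir.empty()) {
        args.push_back(pre + "d");
        args.push_back(r.sourceDir);
    }
    args.push_back(pre + "t");
    args.push_back(r.destDir);
    return args;
}
```

Passing the result as a separate argument vector to a process-launch API avoids the quoting problems that arise when a path contains spaces, such as C:\Program Files (x86).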

8.10.9 About the NCOALink daily delete file

If you are a service provider, then every day before you perform NCOALink processing, you must download the daily delete file and install it in the same directory where your NCOALink directories are located.

The daily delete file contains records that are pending deletion from the NCOALink data. For example, if Jane Doe filed a change of address with the USPS and then didn’t move, Jane’s record would be in the daily delete file. Because the change of address is stored in the NCOALink directories, and they are updated only weekly or monthly, the daily delete file is needed in the interim, until the NCOALink directories are updated again.

Note

If you are an end user, you only need the daily delete file for processing Stage I or II files. It is not required for normal NCOALink processing.

Important points to know about the daily delete file:

● The software will fail verification if an NCOALink certification stage test is being performed and the daily delete file is not installed.
● The USA Regulatory Address Cleanse transform supports only the ASCII version of the daily delete file.
● Do not rename the daily delete file. It must be named dailydel.dat.
● The software will issue a verification warning if the daily delete file is more than three days old.

8.10.9.1 Installing the NCOALink daily delete file

To download and install the NCOALink daily delete file, follow these steps:

1. Go to the USPS Electronic Product Fulfillment site at https://epf.usps.gov/ .
2. Before you download the daily delete file for the first time, you must complete and fax the PS Form 5116 (Electronic Product Fulfillment Web Access Request Form) to the USPS Licensing Department. When completing the form, make sure that you select the NCOALink or NCOALink with ANKLink option, as appropriate. This allows you to access the daily delete file.
3. Log into the USPS Electronic Product Fulfillment site and download the NCOALink Daily Delete [TEXT] file to a location where the .tar file can be extracted. If your browser has pop-up blockers enabled, you may need to override them.
4. Extract the dailyDeletes_txt.tar file.
5. Copy the dailydel.dat file to the same location where your NCOALink directories are stored.
6. Repeat steps 3–5 every day before you perform NCOALink processing.
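Step 5 and the three-day freshness rule for the daily delete file could be automated along these lines. This C++17 sketch uses std::filesystem, and installDailyDelete and dailyDeleteIsFresh are hypothetical names. It approximates the file's age by its last-modified time, which is an assumption; the software may measure age differently. Extracting the .tar file (step 4) is left to your archive tool.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <stdexcept>

namespace fs = std::filesystem;

// The file must keep its USPS name; the software does not accept renamed copies.
const char* const kDailyDeleteName = "dailydel.dat";

// Step 5: copy dailydel.dat from the folder where dailyDeletes_txt.tar was
// extracted into the NCOALink directory folder, replacing any older copy.
fs::path installDailyDelete(const fs::path& extractDir, const fs::path& ncoaDir) {
    const fs::path source = extractDir / kDailyDeleteName;
    if (!fs::exists(source)) {
        throw std::runtime_error("dailydel.dat not found in " + extractDir.string());
    }
    const fs::path target = ncoaDir / kDailyDeleteName;
    fs::copy_file(source, target, fs::copy_options::overwrite_existing);
    return target;
}

// True if dailydel.dat exists in ncoaDir and is no more than maxAgeDays old.
// Assumption: age is approximated by the file's last-modified time.
bool dailyDeleteIsFresh(const fs::path& ncoaDir, int maxAgeDays = 3) {
    const fs::path file = ncoaDir / kDailyDeleteName;
    if (!fs::exists(file)) return false;
    const auto age = fs::file_time_type::clock::now() - fs::last_write_time(file);
    return age <= std::chrono::hours(24 * maxAgeDays);
}
```

Running a check like dailyDeleteIsFresh at the start of each processing day catches a missed download before the software issues its verification warning.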

8.10.10 Improving NCOALink processing performance

Many factors affect performance when processing NCOALink data. Generally the most critical factor is the volume of disk access that occurs. Often the most effective way to reduce disk access is to have sufficient memory available to cache data. Other critical factors that affect performance include hard drive speed, seek time, and the sustained transfer rate. When the time spent on disk access is minimized, the performance of the CPU becomes significant.

8.10.10.1 Operating systems and processors

The computation involved in most of the software and NCOALink processing is very well-suited to the microprocessors found in most computers, such as those made by Intel and AMD. RISC-style processors like those found in most UNIX systems are generally substantially slower for this type of computation. In fact, a common PC can often run a single job through the software and NCOALink about twice as fast as a common UNIX system. If you’re looking for a cost-effective way of processing single jobs, a Windows server or a fast workstation can produce excellent results. Most UNIX systems have multiple processors and are at their best processing several jobs at once.

8.10.10.2 Memory

NCOALink processing uses many gigabytes of data. The exact amount depends on your service provider level, the data format, and the specific release of the data from the USPS.

In general, if performance is critical, and especially if you are an NCOALink full service provider and you frequently run very large jobs with millions of records, you should obtain as much memory as possible. You may want to go as far as caching the entire NCOALink data set. You should be able to cache the entire NCOALink data set using 20 GB of RAM, with enough memory left for the operating system.

8.10.10.3 Data storage

If at all possible, the hard drive you use for NCOALink data should be fully dedicated to that process, at least while your job is running. Other processes competing for the use of the same physical disk drive can greatly reduce your NCOALink performance.

To achieve even higher transfer rates, you may want to explore the possibility of using a RAID system (redundant array of independent disks).

When the software accesses NCOALink data directly rather than from a cache, the most significant hard drive feature is the average seek time.

8.10.10.4 Data format

The software supports both hash and flat file versions of NCOALink data. If you have ample memory to cache the entire hash file data set, that format may provide the best performance. The flat file data is significantly smaller, which means a larger share can be cached in a given amount of RAM. However, accessing the flat file data involves binary searches, which are slightly more time consuming than the direct access used with the hash file format.
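The trade-off between the two formats follows the familiar pattern of a sorted array searched by binary search versus a hash table. The C++ sketch below is a generic illustration of those two lookup styles only. It does not reflect the actual NCOALink file layout, and findFlat and findHash are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Generic illustration only; this is not the NCOALink file layout.
using Entry = std::pair<std::string, std::string>;  // key -> value

// "Flat" style: compact sorted array, located by binary search
// (about log2(n) comparisons per lookup).
const std::string* findFlat(const std::vector<Entry>& sorted,
                            const std::string& key) {
    auto it = std::lower_bound(
        sorted.begin(), sorted.end(), key,
        [](const Entry& e, const std::string& k) { return e.first < k; });
    return (it != sorted.end() && it->first == key) ? &it->second : nullptr;
}

// "Hash" style: the key's hash leads directly to its bucket, at the cost
// of extra space for the table itself.
const std::string* findHash(
    const std::unordered_map<std::string, std::string>& table,
    const std::string& key) {
    auto it = table.find(key);
    return it != table.end() ? &it->second : nullptr;
}
```

The flat array stores nothing but the entries, so more of it fits in a given amount of RAM; the hash table spends memory to avoid the extra comparisons.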

8.10.10.5 Memory usage

The optimal amount of memory depends on a great many factors. The “Auto” option usually does a good job of deciding how much memory to use, but in some cases manually adjusting the amount can be worthwhile.

8.10.10.6 Performance tips

Many factors can increase or decrease NCOALink processing speed. Some are within your control and others may be inherent to your business. Consider the following factors:

● Cache size: Using too little memory for NCOALink caching can cause unnecessary random file access and time-consuming hard drive seeks. Using far too much memory can cause large files to be read from the disk into the cache even when only a tiny fraction of the data will ever be used. Finding the cache size that works best for your configuration and typical job size may require some testing.
● Directory location: It’s best to have NCOALink directories on a local solid state drive or a virtual RAM drive, which eliminates all I/O for NCOALink while your job is processing. If you have the directories on a hard drive, it’s best to use a defragmented local hard drive. The hard drive should not be accessed for anything other than the NCOALink data while you are running your job.
● Match rate: The more records you process that have forwardable moves, the slower your processing will be. Retrieving and decoding the new addresses takes time, so updating a mailing list regularly will improve the processing speed on that list.
● Input format: Ideally, you should provide the USA Regulatory Address Cleanse transform with discrete fields for the addressee’s first, middle, and last name, as well as for the pre-name and post-name. If your input has only a name line, the transform must take time to parse it before checking NCOALink data.
● File size: Larger files process relatively faster than smaller files. There is overhead when processing any job, but if a job includes millions of records, a few seconds of overhead becomes insignificant.

8.10.11 NCOALink log files

Use information from the StatisticsHandler to generate the USPS-required NCOALink log files.

The USPS requires that you save these log files for five years.

You can generate the following move-related log files:

● Customer Service log (CSL)
● PAF Customer Information log
● Broker/Agent/List Administrator log

The following table shows the log files required for each provider level:

Log file: Customer Service log
Required for: End users, Limited Service Providers, Full Service Providers
Description: This log file contains one record per list that you process. Each record details the results of change-of-address processing.

Log file: PAF Customer Information log
Required for: Limited Service Providers, Full Service Providers
Description: This log file contains the information that you provided for the PAF. The log file lists each unique PAF entry. If a list is processed with the same PAF information, the information appears just once in the log file. When contact information for the list administrator has changed, information for both the list administrator and the corresponding broker is written to the PAF log file.

Log file: Broker/Agent/List Administrator log
Required for: Limited Service Providers, Full Service Providers
Description: This log file contains all of the contact information that you entered for the broker or list administrator. The log file lists information for each broker or list administrator just once. The USPS requires the Broker/Agent/List Administrator log file from service providers, even in jobs that do not involve a broker or list administrator. The software produces this log file for every job if you're a certified service provider.
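The provider-level requirements in the table above can be expressed as a small lookup. This is a minimal C++ sketch for illustration only; it is not part of the Data Quality Management SDK API, and the `ProviderLevel` enum and `requiredNcoaLogs` function are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical provider levels matching the rows of the table above.
enum class ProviderLevel { EndUser, LimitedServiceProvider, FullServiceProvider };

// Returns the NCOALink log files that the table lists as required
// for the given provider level.
std::vector<std::string> requiredNcoaLogs(ProviderLevel level) {
    // Every provider level needs the Customer Service log.
    std::vector<std::string> logs{"Customer Service log"};
    if (level != ProviderLevel::EndUser) {
        // Service providers also need the PAF and broker logs. The broker
        // log is required even when no broker or list administrator is involved.
        logs.push_back("PAF Customer Information log");
        logs.push_back("Broker/Agent/List Administrator log");
    }
    return logs;
}
```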

8.10.11.1 Log file names

Follow the USPS file-naming schemes shown below for the following log files that you generate:

● Customer Service log
● PAF Customer Information log
● Broker/Agent/List Administrator log

For example, P1234C13.DAT is a PAF log file generated in December 2013 for a licensee with the ID 1234.

Character 1: Log type
B: Broker log
C: Customer service log
P: PAF log

Characters 2-5: Platform ID
The ID, exactly 4 characters long

Character 6: Month
1: January
2: February
3: March
4: April
5: May
6: June
7: July
8: August
9: September
A: October
B: November
C: December

Characters 7-8: Year
The two-digit year, for example 13 for 2013

File extension
.DAT
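The naming scheme above is simple enough to generate programmatically when you write the log files from StatisticsHandler data. The following is a minimal C++ sketch, assuming you already know the log type, your 4-character platform ID, and the processing month and year; `ncoaLogFileName` is a hypothetical helper, not an SDK function.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: builds a USPS NCOALink log file name such as
// "P1234C13.DAT" following the naming scheme above.
// logType: 'B' (broker), 'C' (customer service), or 'P' (PAF).
std::string ncoaLogFileName(char logType, const std::string& platformId,
                            int month, int year) {
    if (logType != 'B' && logType != 'C' && logType != 'P')
        throw std::invalid_argument("log type must be B, C, or P");
    if (platformId.size() != 4)
        throw std::invalid_argument("platform ID must be exactly 4 characters");
    if (month < 1 || month > 12)
        throw std::invalid_argument("month must be 1-12");
    // Months 1-9 map to '1'-'9'; October through December map to 'A'-'C'.
    const char monthCode = month <= 9 ? static_cast<char>('0' + month)
                                      : static_cast<char>('A' + (month - 10));
    // Two-digit, zero-padded year (2013 -> "13", 2005 -> "05").
    const int yy = year % 100;
    std::string name;
    name += logType;
    name += platformId;
    name += monthCode;
    name += static_cast<char>('0' + yy / 10);
    name += static_cast<char>('0' + yy % 10);
    name += ".DAT";
    return name;
}
```

For example, `ncoaLogFileName('P', "1234", 12, 2013)` yields `P1234C13.DAT`, matching the example given earlier in this section.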

8.11 Multiple data source statistics reporting

Statistics based on logical groups

For the USA Regulatory Address Cleanse transform, an input database can be a compilation of lists, with each list containing a field that includes a unique identifier. The unique identifier can be a name or a number, but it must reside in the same field across all lists.

The software collects statistics for each list using the Data_Source_ID input field. You map the field that contains the unique identifier in your list to the software's Data_Source_ID input field. When the software generates reports, some of the reports contain a cumulative summary for the entire input (all lists combined) and a separate summary for each list, based on the value mapped to the Data_Source_ID field.

Restriction

For compliance with NCOALink reporting restrictions, the USA Regulatory Address Cleanse transform does not support processing multiple mailing lists associated with different PAFs. Therefore, for NCOALink processing, all records in the input file are considered to be a single mailing list and are reported as such in the Customer Service Log (CSL) file.

Restriction

The Gather Statistics Per Data Source functionality is not supported when Enable Parse Only in the Non Certified Options group is set to Yes.

8.11.1 Data_Source_ID field

The software tracks statistics for each list based on the Data_Source_ID input field.

Example

In this example, five mailing lists are combined into one list for input into the USA Regulatory Address Cleanse transform. Each list has a common field named List_ID, with a unique identifier in the List_ID field: N, S, E, W, or C. The input mapping looks like this:

Transform input field name: DATA_SOURCE_ID
Input schema column name: LIST_ID
Type: varchar(80)

To obtain DPV statistics for each List_ID, process the job and then open the US Addressing report.

The first DPV Summary section in the US Addressing report lists the Cumulative Summary, which reports the totals for the entire database. Subsequent DPV Summary sections list summaries per Data_Source_ID. The example in the table below shows the counts and percentages for the entire database (cumulative summary) and for Data_Source_ID “N”.

Statistic; DPV Cumulative Summary count; %; DPV Summary for Data_Source_ID “N” count; %

DPV Validated Addresses 1,968 3.94 214 4.28

Addresses Not DPV Valid 3,032 6.06 286 5.72

CMRA Validated Addresses 3 0.01 0 0.00

DPV Vacant Addresses 109 0.22 10 0.20

DPV NoStats Addresses 162 0.32 17 0.34
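The grouping behind these reports (one cumulative summary plus one summary per Data_Source_ID value) can be illustrated with a short C++ sketch. This is not the SDK's StatisticsHandler API; the `ProcessedRecord`, `DpvCounts`, and `DpvSummary` types are hypothetical, and computing the percentage against the group's total record count is an assumption made for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical input: only the two values this sketch needs per record.
struct ProcessedRecord {
    std::string dataSourceId;  // value mapped to the Data_Source_ID input field
    bool dpvValidated;         // true if the address was DPV validated
};

struct DpvCounts {
    long total = 0;
    long dpvValidated = 0;
    // Percentage of this group's records that were DPV validated.
    double validatedPercent() const {
        return total == 0 ? 0.0 : 100.0 * dpvValidated / total;
    }
};

struct DpvSummary {
    DpvCounts cumulative;                        // entire input (all lists)
    std::map<std::string, DpvCounts> perSource;  // one entry per Data_Source_ID
};

// Counts every record once in the cumulative summary and once in the
// summary for its own Data_Source_ID value.
DpvSummary summarizeBySource(const std::vector<ProcessedRecord>& records) {
    DpvSummary summary;
    for (const auto& r : records) {
        for (DpvCounts* c : {&summary.cumulative, &summary.perSource[r.dataSourceId]}) {
            ++c->total;
            if (r.dpvValidated) ++c->dpvValidated;
        }
    }
    return summary;
}
```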

8.11.2 USPS Form 3553 and group reporting

The USPS Form 3553 includes a summary of the entire list and a report per list based on the Data_Source_ID field.

Example

Cumulative Summary: The USPS Form 3553 designates the summary for the entire list with the words “Cumulative Summary”. It appears in the footer as highlighted in the Cumulative Summary report sample. In addition, the Cumulative Summary of the USPS Form 3553 contains the total number of lists in the job in Section B, field number 5, Number of Lists.

Example

Physical Source Field: The USPS Form 3553 designates the summary for each individual list with the words “Physical Source Field” followed by the Data_Source_ID value. It appears in the footer. The data in the report is for that list only.

9 USA Regulatory Address Cleanse transform reference

The following sections describe the configuration options in the USA Regulatory Address Cleanse XML. You can find examples of the XML configurations in the samples installed with the product.

9.1 System group

Syntax

The System group controls high-level transform functions.

Note

If an option exists in the XML, but it is not documented here, do not alter the contents of the option. Editing the contents could cause errors.

Option Description

Link dir Enter the path to the United States Regulatory Address Cleanse support files, up to but not including the DataQuality\urac folder.

For example, the default Windows 32-bit location of these files is C:\Program Files\SAP BusinessObjects\Data quality Mgmt SDK\windows_32\DataQuality\urac.

You would then enter C:\Program Files\SAP BusinessObjects\Data quality Mgmt SDK\windows_32\

Keyfile location Enter the absolute path to the Data Quality Management SDK key file (bobjprods.key).

Job type Enter BATCH if you are running an NCOALink job. Otherwise, you can leave this option blank, provided that you instantiate the transform as a multi-record transform.
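If your application builds the XML configuration programmatically, you might check the Job type rule before instantiating the transform. This is a minimal C++ sketch under the assumption that BATCH is the only non-blank value you use; `isJobTypeValid` is a hypothetical helper, not an SDK call.

```cpp
#include <string>

// Hypothetical pre-flight check for the Job type rule above:
// - NCOALink jobs need "BATCH".
// - A blank value is acceptable only for a multi-record transform.
// Assumption: "BATCH" is the only non-blank value considered here.
bool isJobTypeValid(const std::string& jobType, bool ncoaLinkEnabled,
                    bool multiRecordTransform) {
    if (ncoaLinkEnabled) return jobType == "BATCH";
    if (jobType.empty()) return multiRecordTransform;
    return jobType == "BATCH";
}
```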

9.2 Report and analysis

Use this option to generate USA Regulatory Address Cleanse report data.

Option Description

Generate Report Data Specifies whether to generate report data for this transform.

YES: Generates report data for this transform.

NO: Turns off report data generation. If you do not need to generate reports (during testing, for example), set this option to NO to improve performance.

9.3 Transform performance

The Transform Performance option group for the USA Regulatory Address Cleanse transform contains options that could improve the performance of DPV, RDI, LACSLink, NCOALink, and SuiteLink processing.

Option Description

Cache DPV Directories Specifies whether the DPV directories are cached into memory. If the directories are cached, the caching takes place only once and is shared among all running DPV threads.

YES: Enables caching.

NO: Disables caching.

Cache RDI Directories Specifies whether the RDI directories are cached into memory.

YES: Enables caching.

NO: Disables caching.

Cache LACSLink Directories Specifies whether the LACSLink directories are cached into memory.

YES: Enables caching.

NO: Disables caching.

Cache SuiteLink Directories Specifies whether the SuiteLink directories are cached into memory.

YES: Enables caching.

NO: Disables caching.

Insufficient Cache Memory Action Specifies the action to take if there is insufficient memory to cache the directories you have set up for caching.

CONTINUE: Attempts to continue initialization without caching.

ERROR: Issues an error and terminates the transform.


NCOALink Caching Mode Specifies the method for caching NCOALink directories.

AUTO: Select to have the application use available memory for caching.

MANUAL: Select if you have a limited amount of memory available and you want to allocate a set amount of memory for caching. Enter a value in the NCOALink Memory Usage option.

NONE: Disables caching. This is the default setting. Consider this option for smaller jobs because the overhead of caching directories could take longer than the actual job execution duration.

NCOALink Memory Usage If the NCOALink Caching Mode is set to MANUAL, enter a value here to allocate a set amount of memory for NCOALink directory caching.

Windows use extended memory Specifies whether to use Windows extended memory to allow the transform to access more memory than the 2 GB process size limit. This setting applies only to 32-bit Windows platforms.

YES: Enables use of extended memory.

NO: Disables use of extended memory.
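The interaction between the caching options and Insufficient Cache Memory Action can be sketched as follows. This C++ sketch only models the documented outcomes (cache, continue without caching, or stop with an error) and the requirement that MANUAL mode has a memory value; the function names, enums, and byte-based sizes are hypothetical, and the transform's real memory accounting is not shown here.

```cpp
#include <cstdint>
#include <stdexcept>

enum class InsufficientCacheMemoryAction { Continue, Error };
enum class NcoaCachingMode { Auto, Manual, None };

// Hypothetical model of Insufficient Cache Memory Action: returns true if a
// directory set marked for caching is cached, false if processing continues
// without caching, and throws when the action is ERROR.
bool cacheDirectories(bool cacheRequested, std::uint64_t directoryBytes,
                      std::uint64_t freeBytes,
                      InsufficientCacheMemoryAction action) {
    if (!cacheRequested) return false;
    if (directoryBytes <= freeBytes) return true;
    if (action == InsufficientCacheMemoryAction::Error)
        throw std::runtime_error("insufficient memory to cache directories");
    return false;  // CONTINUE: initialize without caching
}

// MANUAL mode needs a value in NCOALink Memory Usage. The unit of that
// value is not modeled in this sketch.
void checkNcoaCachingOptions(NcoaCachingMode mode, std::uint64_t memoryUsage) {
    if (mode == NcoaCachingMode::Manual && memoryUsage == 0)
        throw std::invalid_argument(
            "NCOALink Memory Usage is required when NCOALink Caching Mode is MANUAL");
}
```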

9.4 Reference files

Reference files are directories used by the USA Regulatory Address Cleanse transform to correct and standardize U.S. address data.

Option Description

Address Directory 1 zip4us.dir

This directory, also called the National Directory, is organized by ZIP Code. It lists street names, ranges of house numbers, and postal and other codes.

Address Directory 2 *.dir

This second address directory is optional and can be used for a customized address directory. No directory is automatically provided for this option. Most users should leave the Address Directory 2 option blank.

Address SHS Directory zip4us.shs

This directory enhances normal primary name lookups and is required for processing.


Reverse Soundex Address Directory zip4us.rev

This directory enhances primary name lookups.

City Directory city##.dir

The City directory is a table of city names, states, and ZIP Codes. It is organized by state and city.

Postcode Directory zcf10.dir

This directory contains the same data as the City directory, but is organized by ZIP Code.

Postcode Reverse Directory revzip4.dir

The Reverse ZIP+4 directory enables the software to assign more postal codes when the input data includes a unique ZIP Code and valid ZIP+4.

DPV Path Specify the path to the DPV (Delivery Point Validation) directory files. These directory files are required for CASS certification.

LACSLink Path Specify the path to the LACSLink directory files. These directory files are required for CASS certification.

eLOT Directory elot.dir

The eLOT directory contains eLOT codes for the delivery point that represents the mail carrier's delivery route walk sequence.

Include this directory only if the Enable eLOT option in the Assignment Options group is set to YES.

NCOALink Path Specify the path to the NCOALink directory files. These directory files are required for NCOALink processing and certification.

SuiteLink Path The SuiteLink directories contain specially indexed address information, such as secondary numbers and unit designators, for locations identified as high-rise business default buildings. These directory files are required for CASS certification.

Specify the path to the SuiteLink directory files.

USPS Log Path Specify the path to the directory for NCOALink, DPV, and LACSLink log files. The software determines the file names during processing as the USPS requires. This directory must already exist and be writable.

It is important to use the same path for all jobs. If you have multiple clients, use the same log file directory for all clients so that the log files are combined.


RDI Path The RDI directory indicates whether an address is residential.

EWS Directory ewsyymmdd.dir

The software lists ew*.dir in the Option Value column, so that the transform finds the most current directory.
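The idea of choosing "the most current directory" from names that follow the ewsyymmdd.dir pattern can be sketched in C++. This is only an illustration of the selection idea, not how the transform resolves `ew*.dir` internally; `mostCurrentEwsDirectory` is a hypothetical helper, and it assumes all dates fall in the same century so that comparing the yymmdd digits as text orders them chronologically.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical sketch: pick the most current EWS directory from file names
// following the ewsyymmdd.dir pattern. Assumes lowercase names and dates in
// the same century, so the yymmdd digits sort chronologically as text.
std::string mostCurrentEwsDirectory(const std::vector<std::string>& names) {
    std::string best;
    for (const auto& n : names) {
        // Expect exactly "ews" + 6 digits + ".dir" (13 characters).
        if (n.size() != 13 || n.compare(0, 3, "ews") != 0 ||
            n.compare(9, 4, ".dir") != 0)
            continue;
        if (!std::all_of(n.begin() + 3, n.begin() + 9,
                         [](unsigned char c) { return std::isdigit(c) != 0; }))
            continue;
        // Keep the name whose yymmdd portion is the latest.
        if (best.empty() || n.compare(3, 6, best, 3, 6) > 0) best = n;
    }
    return best;  // empty if no name matched the pattern
}
```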

Z4 Change Directory z4change.dir

The Z4Change directory lists all the ZIP Codes and ZIP+4 Codes in the country.

9.5 Assignment options

With this option group, you can choose the add-on features that you want to use during processing.

Option Description

Enable DPV Specify whether to perform DPV processing.

YES: Enables DPV processing.

NO: Disables DPV processing.

Enable eLOT Specify whether to perform eLOT processing.

YES: Enables eLOT processing.

NO: Disables eLOT processing.

Enable EWS Specify whether to perform EWS processing.

If this transform cannot make an exact match within the zip4us.dir (Address Directory 1), it searches the EWS directory to see if the address is a new delivery point. If the address is located in the EWS directory, the transform marks the record as an EWS match and does not attempt further assignment.

YES: Enables EWS processing.

NO: Disables EWS processing.

Enable LACSLink Specify whether to perform LACSLink processing.

YES: Enables LACSLink processing.

NO: Disables LACSLink processing.


Enable NCOALink Specify whether to perform NCOALink processing.

YES: Enables NCOALink processing.

NO: Disables NCOALink processing.

Enable RDI Specify whether to perform RDI processing.

YES: Enables RDI processing.

NO: Disables RDI processing.

Enable Reverse Soundex Search Specify whether to use the zip4us.rev (Reverse Soundex) directory in an attempt to make address assignments.

YES: Enables Reverse Soundex.

NO: Disables Reverse Soundex.

Enable SuiteLink Specify whether to perform SuiteLink processing.

YES: Enables SuiteLink processing.

NO: Disables SuiteLink processing.

Dual Address Specify the action to take when the transform encounters a dual address.

POSITION: Selects an address based on the arrangement of the input data. The transform attempts to validate the address that is closest to the lower left corner of the address block. That might be the postal address (rural route or PO Box) or the street address; it depends on how the data was entered.

POSTAL: The transform attempts to validate based on the postal address. If that fails, the transform attempts again based on the street address.

STREET: The transform attempts to validate based on the street address. If that fails, the transform attempts again based on the postal address.
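The Dual Address settings differ mainly in the order in which the transform tries the two addresses. This hypothetical C++ sketch returns that attempt order; for POSITION, the caller supplies which address is closest to the lower left corner of the address block, and, as in the description above, no fallback attempt is modeled for POSITION. The names are illustrative and not part of the SDK.

```cpp
#include <string>
#include <vector>

enum class DualAddressMode { Position, Postal, Street };

// Hypothetical sketch of the validation attempt order for each setting.
// postalIsLowest: for POSITION, whether the postal address (rural route or
// PO Box) is closest to the lower left corner of the address block.
std::vector<std::string> dualAddressAttemptOrder(DualAddressMode mode,
                                                 bool postalIsLowest) {
    switch (mode) {
        case DualAddressMode::Postal:
            return {"postal", "street"};  // postal first, then street
        case DualAddressMode::Street:
            return {"street", "postal"};  // street first, then postal
        case DualAddressMode::Position:
        default:
            return {postalIsLowest ? "postal" : "street"};
    }
}
```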

USPS Certification Testing Mode Specify whether to use a certification testing mode.

CASS: Activates Certification Testing Mode for CASS.

NONE: Certification Testing Mode is not activated.

9.6 Standardization options

This option group contains the standardization settings to define for processing USA address data.

Option Description

Standardize Assigned Address Specifies whether to correct and standardize the assigned address line and lastline data.

YES: Corrects and standardizes your address line and lastline data. Use this value for CASS certification.

NO: Does not standardize your address line or lastline data.

Standardize Unassigned Address Specifies whether to standardize unassigned data.

YES: Attempts to parse and standardize any unassigned addresses.

NO: Leaves unassigned addresses as entered on input.


Use USPS Primary Name Abbreviation Abbreviates the address line to 30 characters or less when the output address line exceeds 30 characters and an abbreviated form of the address is available in the directory data supplied by the USPS. Abbreviated forms of an address are provided only for output addresses longer than 30 characters. If the output address line is already 30 characters or less, it is not abbreviated.

YES: Enables this option.

NO: Disables this option.

Note

An address line may be output with more than 30 characters when no abbreviated form of the address is available in the directory data. If your data must fit into 30 characters, we recommend setting the appropriate address output field lengths to 30.

This option affects multiline and standardized last line fields when set to YES.

If the Use USPS Street Abbreviation parameter is set to Yes, it affects the following address components on output:

● Suffix Style: The style will be short.
● Directional Style: The style will be short.
● Address Line Alias: The setting of PRESERVE may be overridden.

Note

When the number of characters in the output is greater than the length specified for the output field, the software attempts to truncate the output data to fit in the output field without eliminating vital address data.

Intelligent truncation abbreviates the output data first, and if it still doesn’t fit the output buffer, it truncates the data.

There are no options to set up intelligent truncation in the USA Regulatory Address Cleanse transform. The transform does this automatically.

If the Use USPS Primary Name Abbreviation and/or the Use USPS Locality Abbreviation are enabled, the software uses those abbreviations first. If the values don't fit within the length of the output fields, then the intelligent truncation will occur.
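The "abbreviate first, then truncate" order described in this note can be sketched in C++. The sketch substitutes plain truncation for the software's intelligent truncation, whose rules (which avoid eliminating vital address data) are not documented here, and `fitToField` is a hypothetical helper rather than an SDK function. The threshold parameter reflects the 30-character limit for address lines and the 13-character limit for city names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of the order described above. Plain truncation stands
// in for the transform's intelligent truncation.
// abbreviationThreshold: 30 for address lines, 13 for city names.
std::string fitToField(const std::string& full,
                       const std::string& uspsAbbreviation,
                       std::size_t abbreviationThreshold,
                       std::size_t fieldLength) {
    // Step 1: use the USPS abbreviation when the value exceeds the
    // threshold and an abbreviation is available.
    std::string value =
        (full.size() > abbreviationThreshold && !uspsAbbreviation.empty())
            ? uspsAbbreviation
            : full;
    // Step 2: if the value still doesn't fit the output field, truncate it.
    if (value.size() > fieldLength) value.resize(fieldLength);
    return value;
}
```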

Use USPS Locality Abbreviation Provides a USPS 13-character city name when one is available. If the city name is not valid (for example, it is a non-mailing city name), the software relies on other settings in the job to determine what to output for city.

If the city name is longer than 13 characters, the software returns an abbreviation that is 13 characters or less. If the city name is already 13 characters or less, the software does not abbreviate it.

YES: Enables this option (affects only the multiline and standardized last line fields).

NO: Disables this option.

Note

When the number of characters in the output is greater than the length specified for the output field, the software attempts to truncate the output data to fit in the output field without eliminating vital address data.

Intelligent truncation abbreviates the output data first, and if it still doesn’t fit the output buffer, it truncates the data.

There are no options to set up intelligent truncation in the USA Regulatory Address Cleanse transform. The transform does this automatically.

If the Use USPS Primary Name Abbreviation and/or the Use USPS Locality Abbreviation are enabled, the software uses those abbreviations first. If the values don't fit within the length of the output fields, then the intelligent truncation will occur.

Capitalization Specifies the casing of your address data.

LOWER: Converts data to all lowercase letters. For example, "Main Street South" becomes "main street south."

MIXED: Converts data to initial capital letters. For example, "MAIN STREET SOUTH" becomes "Main Street South."

UPPER: Converts data to all capital letters. For example, "Main Street South" becomes "MAIN STREET SOUTH."

Note

If you want consistent casing for your data, make sure that this option and the Capitalization setting in the Data Cleanse transform are the same.
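To see what each Capitalization value does to a string, here is a simplified C++ sketch. Its MIXED handling just capitalizes the first letter of each space-separated word; the transform's actual mixed casing rules may treat some components (for example, directionals or unit designators) differently. `applyCapitalization` is a hypothetical name.

```cpp
#include <cctype>
#include <string>

enum class Capitalization { Lower, Mixed, Upper };

// Simplified sketch of the three casing styles. MIXED capitalizes the first
// letter of each space-separated word and lowercases the rest.
std::string applyCapitalization(std::string s, Capitalization style) {
    bool startOfWord = true;
    for (char& ch : s) {
        const unsigned char c = static_cast<unsigned char>(ch);
        switch (style) {
            case Capitalization::Lower:
                ch = static_cast<char>(std::tolower(c));
                break;
            case Capitalization::Upper:
                ch = static_cast<char>(std::toupper(c));
                break;
            case Capitalization::Mixed:
                ch = static_cast<char>(startOfWord ? std::toupper(c) : std::tolower(c));
                break;
        }
        startOfWord = (c == ' ');  // next character starts a new word
    }
    return s;
}
```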


Directional Style Specifies whether to abbreviate directional data.

LONG: Uses fully-spelled directionals such as "North," "South," "East," "West."

PRESERVE: Preserves the style used in the input record.

SHORT: Uses abbreviated directionals such as "N," "S," "E," "W."

Note

When Use USPS Street Abbreviation is set to YES, the software overrides a setting of LONG or PRESERVE for the Directional Style parameter and outputs the short directional style.

Primary Type Style Specifies whether to abbreviate the street (primary) type.

LONG: Uses fully spelled primary types such as Street, Avenue, Road.

PRESERVE: Preserves the style used in the input record.

SHORT: Uses abbreviated primary types such as St, Ave, Rd.

Note

When Use USPS Street Abbreviation is set to YES, the software overrides a setting of LONG and PRESERVE for the Primary Type Style parameter and outputs the short primary type style.
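The style options and the override described in the notes above can be sketched for directionals in C++. The lookup table covers only the four cardinal directions, and `styleDirectional` is a hypothetical helper; the transform's own directional tables are more complete. The same pattern applies to Primary Type Style.

```cpp
#include <map>
#include <string>

enum class Style { Long, Preserve, Short };

// Hypothetical sketch of Directional Style, including the override that
// forces SHORT when Use USPS Street Abbreviation is YES. Only the four
// cardinal directions are covered.
std::string styleDirectional(const std::string& input, Style style,
                             bool uspsStreetAbbreviation) {
    static const std::map<std::string, std::string> toShort{
        {"North", "N"}, {"South", "S"}, {"East", "E"}, {"West", "W"}};
    static const std::map<std::string, std::string> toLong{
        {"N", "North"}, {"S", "South"}, {"E", "East"}, {"W", "West"}};
    if (uspsStreetAbbreviation) style = Style::Short;  // overrides LONG and PRESERVE
    if (style == Style::Preserve) return input;
    const auto& table = (style == Style::Short) ? toShort : toLong;
    const auto it = table.find(input);
    // Values already in the target style (or not in the table) pass through.
    return it != table.end() ? it->second : input;
}
```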

Unit Description Specifies how to standardize the unit description.

CONVERT: Uses the unit description found in the postal directory (such as an apartment, suite, room, or floor).

PRESERVE: Preserves the unit description from the input record, correcting any spelling errors.

Retain Pound Sign in Unit Description Specifies whether to output “#” to the UNIT_DESCRIPTION output field or to the extraneous fields.

YES: Outputs the # unit designator to the UNIT_DESCRIPTION output field.

NO: Outputs the # unit designator to the EXTRANEOUS_SECONDARY_UNIT_NUMBER and/or the EXTRANEOUS_SECONDARY_ADDRESS_DATA output fields.

The parameter will not affect the following address situations:

● Addresses for Puerto Rico
● Military addresses
● Rural Route addresses
● Addresses without “#” in the address line
● Addresses with remainder words


Append Private Mailbox Private mailboxes (PMB) are like post office boxes, except that they are hosted by private companies.

YES: Places the address and PMB in the same field.

NO: Places the PMB into a separate field. PMB information is output to the NON_POSTAL_SECONDARY_ADDRESS, NON_POSTAL_UNIT, and NON_POSTAL_UNIT_NUMBER fields.

Preserve Dual Address Order Specifies whether to preserve or change the dual address order.

YES: Keeps the address order as it was input when it contains both a street and mailing address.

NO: Moves the assigned address immediately above the locality and region when the input contains both a street and mailing address.

Address Line Alias Specifies how to standardize the address line if the input primary address is an alias.

CONVERT: Converts address lines to the preferred form found in the postal directory.

PRESERVE: Retains address lines as they were input.

Note

To be compliant with CASS, set up your jobs to return the USPS preferred address. When the Address Line Alias option is set to CONVERT, the USPS preferred address is returned, even when the input record has a base address or an alias address. You can choose to set up your job to preserve the input address (Address Line Alias set to PRESERVE), but the software will not produce a USPS Form 3553.

Preserve Place Names Specifies whether to preserve or change non-mailing city names (place names).

YES: Preserves the non-mailing city name. Given Hollywood as input, the transform would produce Hollywood as output.

NO: Changes non-mailing city names to city names preferred by the USPS. Given Hollywood as input, the transform would produce Los Angeles as output.

Note

When the Use USPS Locality Abbreviation is set to YES, Preserve Place Names is set to YES, and Assign with Input Locality is set to NO, the software may not preserve some place names over 13 characters and will abbreviate them.


Assign With Input Locality Specifies whether to use the last line index when assigning the locality (city) name.

YES: Assigns Locality1 based on the input locality name if it is valid for Postcode1. Does not change Locality1 based on the last line index.

NO: Assigns the Locality1 based on the locality that is input if it is valid for the Postcode1 and not a place name; otherwise, assigns Locality1 based on the last-line index of the address line. Produces a more geographically true Locality1. If you choose NO, the value you choose for the Preserve Place Names option does not matter; place names are converted.

Note

When the Use USPS Locality Abbreviation is set to YES, Preserve Place Names is set to YES, and Assign with Input Locality is set to NO, the software may not preserve some place names over 13 characters and will abbreviate them.

Include Unused Address Line Data Specifies what to do with unused address line data.

YES: If the transform finds any extraneous information on the address line, it is left in place.

NO: If the transform finds any extraneous information on the address line, it is stripped off and output through the Address_Line_Remainder1 field. For example, if the address line reads "123 Main St. Bldg X Apt 567", the "Bldg X Apt 567" portion is placed in the Address_Line_Remainder1 field.

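Add Firm Match Secondary Specifies whether to add secondary address information obtained from SuiteLink directories to the address line.

YES: Adds secondary address information obtained from SuiteLink directories to the address line.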

NO: Does not add secondary address information obtained from SuiteLink directories to the address line, but includes SuiteLink-found information reflected in the lastline ZIP+4 code and in other output fields.

CASS users who do not want to update address lines in their data with SuiteLink secondary information should set this option to NO. The software updates the last line to reflect the SuiteLink secondary information in the ZIP+4, and does not update the original address. The software also updates the address line based on your standardization settings in the job setup.


Move Multiline Data If you turn on this feature, the USA Regulatory Address Cleanse transform rearranges your multiline data to conform to USPS guidelines.

The transform moves the primary address into position above the locality-region-postal code line (or lastline).

BOTTOM: Rearranges the lines according to USPS guidelines. If there are any blank lines, the transform moves them to the top and shifts the data to the bottom of the block.

NO: Does not rearrange any lines, blank or otherwise.

TOP: Rearranges the lines according to USPS guidelines. If there are any blank lines, the transform moves them to the bottom and shifts the data to the top of the block.

This feature does not require that you standardize your data.

Note: If you choose TOP or BOTTOM, the input field lengths of all fields mapped to Multiline must be the same. For example, if Multiline1 is set to a length of 60, all Multiline fields that you use must be set to 60.
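The blank-line handling of TOP and BOTTOM can be pictured with a simplified C++ sketch. It only shifts empty lines and keeps the order of the data lines; it does not reproduce the transform's move of the primary address above the lastline. `shiftBlankLines` and `MoveMode` are hypothetical names, and treating only empty strings as blank is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class MoveMode { No, Top, Bottom };

// Simplified sketch: moves blank (empty) lines to the top (BOTTOM) or to the
// bottom (TOP) of the block, keeping the data lines in their original order.
std::vector<std::string> shiftBlankLines(const std::vector<std::string>& lines,
                                         MoveMode mode) {
    if (mode == MoveMode::No) return lines;  // NO: leave every line in place
    std::vector<std::string> data;
    std::size_t blanks = 0;
    for (const auto& line : lines) {
        if (line.empty()) ++blanks; else data.push_back(line);
    }
    std::vector<std::string> result;
    if (mode == MoveMode::Bottom)
        result.assign(blanks, std::string());  // blanks first, data at bottom
    result.insert(result.end(), data.begin(), data.end());
    if (mode == MoveMode::Top)
        result.insert(result.end(), blanks, std::string());  // data first, blanks at bottom
    return result;
}
```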

Combine Multilines Specifies what to do with related fields input on separate lines.

YES: Looks for related fields that were input on separate lines, and tries to put them together on the same line.

NO: Does not try to combine fields.

Multiline Update Postcode 1 This option affects multiline data only (data passed in and retrieved through multiline fields).

DONT_UPDATE: Assigns Postcode1 but does not write it to the Multiline output fields. The original Postcode1 is left intact in those fields, and the assigned Postcode1 is available in other output fields.

ERASE_THEN_UPDATE: Replaces the original Postcode1 with the assigned Postcode1 in the Multiline output fields. If no Postcode1 is assigned, the original Postcode1 is not included in the Multiline output fields.

UPDATE: Replaces the input Postcode1 with the assigned Postcode1 in the Multiline output fields. If no Postcode1 is assigned, the original is retained.

Multiline Update Postcode 2 This option affects multiline data only (data passed in and retrieved through multiline fields).

DONT_UPDATE: Assigns Postcode2 but does not write it to the Multiline output fields. The transform leaves the original Postcode2 intact, and the assigned Postcode2 is available in other output fields.

ERASE_THEN_UPDATE: In the Multiline output fields, replaces the original Postcode2 with the assigned Postcode2. If no Postcode2 is assigned, the original Postcode2 will not be available in the Multiline output fields.

UPDATE: In the Multiline output fields, replaces the input Postcode2 with the assigned Postcode2. If no Postcode2 is assigned, it retains the original.
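The three Multiline Update Postcode values differ only in what ends up in the Multiline output fields. This C++ sketch models that decision for one postcode; an empty `assigned` string stands for "no postcode assigned". The function and enum names are hypothetical, not SDK identifiers.

```cpp
#include <string>

enum class MultilineUpdate { DontUpdate, EraseThenUpdate, Update };

// Hypothetical sketch: the postcode value written to the Multiline output
// fields. An empty `assigned` means no postcode was assigned.
std::string multilinePostcode(const std::string& original,
                              const std::string& assigned,
                              MultilineUpdate mode) {
    switch (mode) {
        case MultilineUpdate::DontUpdate:
            return original;  // assigned value only goes to other output fields
        case MultilineUpdate::EraseThenUpdate:
            return assigned;  // empty if nothing was assigned
        case MultilineUpdate::Update:
            return assigned.empty() ? original : assigned;
    }
    return original;
}
```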

9.7 Z4 Change options

With the Z4Change options you can turn on Z4Change processing and specify the last time the Postcode2 was updated.

Option Description

Enable Z4 Change Specifies whether to enable Z4Change processing.

Yes: Turns on Z4Change processing.

No: Turns off Z4Change processing.

Last ZIP4 Assign Date Specifies the month and year that the input records were most recently ZIP+4 coded, either through a full address correction process or a previous Z4Change pass.

This date must be entered in MM/YYYY format. For example, enter January 2011 as 01/2011.

The USA Regulatory Address Cleanse transform verifies that your date is within the 12-month period covered by the Z4Change file. If there is a date problem, you receive an error message when you run your project.
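To catch date problems before the job runs, an application could pre-validate the Last ZIP4 Assign Date value. This C++ sketch parses the MM/YYYY format and checks a 12-month window; the assumption that the window is the 12 months ending with the Z4Change file's month is for illustration only, since the transform performs its own check. All names are hypothetical.

```cpp
#include <cstddef>
#include <string>

struct YearMonth {
    int year;
    int month;
};

// Parses an MM/YYYY value such as "01/2011". Returns false if the value is
// not in that exact format or the month is out of range.
bool parseLastZip4AssignDate(const std::string& s, YearMonth& out) {
    if (s.size() != 7 || s[2] != '/') return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i == 2) continue;  // skip the slash
        if (s[i] < '0' || s[i] > '9') return false;
    }
    const int month = std::stoi(s.substr(0, 2));
    const int year = std::stoi(s.substr(3, 4));
    if (month < 1 || month > 12) return false;
    out = YearMonth{year, month};
    return true;
}

// Assumption for illustration: the Z4Change file covers the 12 months
// ending with `fileMonth` (inclusive).
bool withinZ4ChangeWindow(YearMonth date, YearMonth fileMonth) {
    const int monthsBack =
        (fileMonth.year - date.year) * 12 + (fileMonth.month - date.month);
    return monthsBack >= 0 && monthsBack < 12;
}
```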

9.8 CASS Report options

With this option group, you add the necessary USPS Form 3553 information as required by the USPS when certifying a mailing.

Option Description

List Name List Name

List Owner Specify the name of your company (up to 19 characters).

Mailer Address 1 (2, 3, and 4) Specify the name and address of the person or organization for whom you are preparing the mailing (up to 29 characters per line).

Company Name Certified If you rely on SAP for vendor CASS certification, leave this parameter blank; the transform inserts "SAP" as the default value. If you have your own end-user CASS certification from the USPS, type your company name (up to 40 characters).

Software Version If you rely on SAP for vendor CASS certification, you may leave this parameter blank. The transform inserts the appropriate software name and version as the default value.

If you have received end-user CASS certification in your own company's name, type the software name and version number that you use to receive CASS certification.


LOT Certification Specify whether you have LOT certification.

Yes: You have LOT certification but you do not have CASS certification in your own name.

No: You have CASS certification in your own name but you did not seek or obtain LOT certification.

In this case, setting LOT Certification to No ensures that the LOT Certification lines on your USPS Form 3553 are blank, which is appropriate.
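Because the USPS Form 3553 fields have fixed length limits, it can help to validate the CASS Report values before a job runs. This C++ sketch checks the limits listed above (19 characters for List Owner, 29 per Mailer Address line with at most four lines, 40 for Company Name Certified). The function name and the error messages are hypothetical; the parameters are descriptive labels, not the XML option names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical validation of the documented CASS Report length limits.
// Returns one message per problem found; an empty result means all values fit.
std::vector<std::string> cassReportLengthErrors(
        const std::string& listOwner,
        const std::vector<std::string>& mailerAddressLines,
        const std::string& companyNameCertified) {
    std::vector<std::string> errors;
    if (listOwner.size() > 19)
        errors.push_back("List Owner exceeds 19 characters");
    if (mailerAddressLines.size() > 4)
        errors.push_back("Only Mailer Address 1 through 4 are available");
    for (std::size_t i = 0; i < mailerAddressLines.size() && i < 4; ++i)
        if (mailerAddressLines[i].size() > 29)
            errors.push_back("Mailer Address " + std::to_string(i + 1) +
                             " exceeds 29 characters");
    if (companyNameCertified.size() > 40)
        errors.push_back("Company Name Certified exceeds 40 characters");
    return errors;
}
```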

9.9 Suggestion List options

Set the options in this group to configure how suggestion lists are output.

Option Description

Combine Overlapping Ranges Specify how individual suggestions with overlapping ranges are consolidated.

COMBINE_IGNORING_GAPS: Ignores gaps and overlaps in primary ranges, so consolidation is more aggressive.

COMBINE_PRESERVING_GAPS: Preserves gaps in primary ranges, but overlapping ranges are consolidated.

NONE: Suggestions are not consolidated.

Address Range Window Specify a number that represents a span. The software uses the number to define a range of addresses around the input primary address range for which to return suggestions.

By using this option, you can limit the suggestions returned to those within a few blocks of your input. For example, assume you entered 500 for this value. Then, you submit the following street address:

1000 Pine St.

Suggestions would only be returned in a range from 750 to 1250 Pine St.

Type “0” if you don't want to limit the ranges returned in suggestions.
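The arithmetic in the Pine St. example (a window of 500 around 1000 gives 750 to 1250) corresponds to splitting the window evenly on either side of the input number. This C++ sketch shows that calculation; `suggestionRange` is a hypothetical helper, clamping the lower bound at 0 is an assumption, and a return value of `false` represents the "0 means no limit" case.

```cpp
#include <utility>

// Hypothetical sketch of the window arithmetic in the example above: half
// of the window on each side of the input primary number. Returns false when
// the window is 0, meaning the ranges returned are not limited.
bool suggestionRange(long primaryNumber, long window, std::pair<long, long>& range) {
    if (window <= 0) return false;  // "0": do not limit the ranges returned
    const long half = window / 2;
    const long low = primaryNumber - half < 0 ? 0 : primaryNumber - half;
    range = std::make_pair(low, primaryNumber + half);
    return true;
}
```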


Max Number Addresslines Specify the maximum number of address-line suggestions that can be generated. The maximum number that you can enter is 100.

For example, you could set this option to limit the size of the SOAP documents being sent by the web service, or to limit the maximum number of suggestions that your users would have to choose from.

Note

If you set a low maximum, a viable suggestion could be left out of the suggestion list.

Max Number Lastlines Specify the maximum number of lastline suggestions that can be generated. The maximum number that you can enter is 15.

For example, you could set this option to limit the size of the SOAP documents being sent by the web service, or to limit the maximum number of suggestions that your users would have to choose from.

Note

If you set a low maximum, a viable suggestion could be left out of the suggestion list.

Address Lines Match Minimum Type a value from 0 to 80 to specify the similarity score required for address-line suggestions.

The similarity score determines what suggestions are returned in the list. A higher number indicates that the suggestion must be more similar to the input to be returned as a possible suggestion.

Lastlines Match Minimum Specify the similarity score required for lastline suggestions. Type a value from 0 to 80.

The similarity score determines what suggestions are returned in the list. A higher number indicates that the suggestion must be more similar to the input to be returned as a possible suggestion.

Match Range Specify whether to disregard an address-line suggestion when it does not match the primary range of the input address.

YES: Returns address-line suggestions only when they match the primary range of the input address.

NO: Returns a possible address-line suggestion when it doesn't have the same primary range as the input.
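The following C++ sketch models how Combine Overlapping Ranges, Address Range Window, and Match Range act on the primary ranges of address-line suggestions. It is a simplified, hypothetical illustration, not the transform's internal logic: the `PrimaryRange` type, the function names, and the merge rule (a range is treated as overlapping when its low value is at or below the high value of the span built so far) are all assumptions. The window is taken as the input primary number plus or minus half the Address Range Window value, which reproduces the 1000 Pine St. example (500 gives 750 to 1250).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one address-line suggestion's primary range,
// for example 900-1098 Pine St. This is not an SDK type.
struct PrimaryRange {
    int low;
    int high;
};

// Mirrors the three Combine Overlapping Ranges values.
enum class CombineMode { None, CombinePreservingGaps, CombineIgnoringGaps };

// Consolidates the ranges returned for one street according to `mode`.
std::vector<PrimaryRange> consolidate(std::vector<PrimaryRange> ranges, CombineMode mode) {
    if (mode == CombineMode::None || ranges.empty()) {
        return ranges;  // NONE: suggestions are not consolidated
    }
    std::sort(ranges.begin(), ranges.end(),
              [](const PrimaryRange& a, const PrimaryRange& b) { return a.low < b.low; });
    std::vector<PrimaryRange> merged{ranges.front()};
    for (std::size_t i = 1; i < ranges.size(); ++i) {
        const PrimaryRange& next = ranges[i];
        PrimaryRange& last = merged.back();
        // IGNORING_GAPS folds every range into one span; PRESERVING_GAPS
        // merges only ranges that overlap the span built so far.
        if (mode == CombineMode::CombineIgnoringGaps || next.low <= last.high) {
            last.high = std::max(last.high, next.high);
        } else {
            merged.push_back(next);  // a gap separates this range from the previous one
        }
    }
    return merged;
}

// Address Range Window: true if the range intersects inputNumber +/- window/2.
// A window of 0 means "do not limit".
bool withinWindow(const PrimaryRange& r, int inputNumber, int window) {
    assert(window >= 0);
    if (window == 0) return true;
    const int low = inputNumber - window / 2;
    const int high = inputNumber + window / 2;
    return r.high >= low && r.low <= high;
}

// Match Range = YES: keep a suggestion only if its range contains the input number.
bool matchesPrimaryRange(const PrimaryRange& r, int inputNumber) {
    return inputNumber >= r.low && inputNumber <= r.high;
}
```

For example, the ranges 100-198, 150-298, and 400-498 stay as three suggestions with NONE, become 100-298 and 400-498 with COMBINE_PRESERVING_GAPS, and become the single span 100-498 with COMBINE_IGNORING_GAPS.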

9.9.1 Suggestion List output options

These options let you choose what information you want to output to the Suggestion_List output field.

Note

Suggestion list fields that do not have a value are not output to the Suggestion_List output field if the style chosen is XML.

Option Description

Style Specify the style of the Suggestion_List output field.

Enter DELIMITED or XML.

Delimiter If you chose Delimited for Style, specify the delimiter to use between each suggestion.

Choose any character or string to separate each suggestion. This value should differ from the Field Delimiter value.

Field Delimiter If you chose DELIMITED for Style, specify the delimiter to use between the fields within each suggestion.

This value should differ from the Delimiter value.
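To consume a DELIMITED Suggestion_List value in your own code, you might split it twice. This C++ sketch assumes, as the option names suggest, that Delimiter separates suggestions and Field Delimiter separates the fields within one suggestion. The function names, the sample value, and its field order are hypothetical; the actual fields depend on the components you enable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits `text` on every occurrence of `delim` (which may be several characters).
std::vector<std::string> split(const std::string& text, const std::string& delim) {
    assert(!delim.empty());  // an empty delimiter would never advance
    std::vector<std::string> parts;
    std::size_t start = 0;
    std::size_t pos = text.find(delim, start);
    while (pos != std::string::npos) {
        parts.push_back(text.substr(start, pos - start));
        start = pos + delim.size();
        pos = text.find(delim, start);
    }
    parts.push_back(text.substr(start));  // text after the last delimiter
    return parts;
}

// Parses a DELIMITED Suggestion_List value: `delimiter` separates suggestions and
// `fieldDelimiter` separates the fields within one suggestion (an assumption).
std::vector<std::vector<std::string>> parseSuggestionList(const std::string& value,
                                                           const std::string& delimiter,
                                                           const std::string& fieldDelimiter) {
    assert(delimiter != fieldDelimiter);  // the two delimiters should differ
    std::vector<std::vector<std::string>> suggestions;
    if (value.empty()) {
        return suggestions;  // no suggestions were generated
    }
    for (const std::string& suggestion : split(value, delimiter)) {
        suggestions.push_back(split(suggestion, fieldDelimiter));
    }
    return suggestions;
}
```

For example, with Delimiter `|` and Field Delimiter `;`, the hypothetical value `1;Pine St;750;798|2;Pine St;800;898` yields two suggestions of four fields each. The assertion that the delimiters differ mirrors the requirement in the table above.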

9.9.2 Suggestion list components


Specify the address field components that you want to include in the Suggestion_List output field.

Note

Suggestion list field components that do not have a value are not output to the Suggestion_List output field if the style chosen is XML.

Table 1: Lastline Components

Option Description

Selection Enter YES to output the selection number for multiple suggestions.

Locality1 Official Enter YES to output the locality preferred by the USPS. Applicable for primary, secondary, and lastline address levels.

Region1 Enter YES to output the state, province, territory, or region. Applicable for primary, secondary, and lastline address levels.

Postcode Full Enter YES to output the five-digit ZIP Code (not including the four-digit ZIP4). Applicable for primary, secondary, and lastline address levels.

Table 2: Primary Address Components

Option Description

Selection Enter YES to output the selection number for multiple suggestions.

Primary Name Full1 Enter YES to output the full primary name, which includes the primary name, primary type, primary prefix, and primary postfix.

Primary Number Low Enter YES to output the low portion of the premise number range. Applicable for primary address level.

Primary Number High Enter YES to output the high portion of the premise number range. Applicable for primary address level.

Primary Prefix1 Enter YES to output the abbreviated directional (N, S, NW, SE) that precedes a street name. Applicable for primary and secondary address levels.

Primary Name1 Enter YES to output the street name description. Applicable for primary and secondary address levels.

Primary Type1 Enter YES to output the abbreviated street type such as “St”, “Ave”, or “Pl”. Applicable for primary and secondary address levels.

Primary Postfix1 Enter YES to output the abbreviated directional (N, S, NW, SE) that follows a street name. Applicable for primary and secondary address levels.

Locality1 Enter YES to output the locality preferred by the USPS. Applicable for primary, secondary, and lastline address levels.

Postcode1 Enter YES to output the five-digit ZIP Code (not including the four-digit ZIP4). Applicable for primary, secondary, and lastline address levels.

Postcode2 Odd Enter YES to output the four-digit ZIP4 code, odd numbers only. Applicable for primary and secondary address levels.

Postcode2 Even Enter YES to output the four-digit ZIP4 code, even numbers only. Applicable for primary and secondary address levels.

Primary Side Indicator Enter YES to output Odd or Even for the primary side indicator. Applicable for primary address level.

Table 3: Secondary Address Components

Option Description

Selection Enter YES to output the selection number for multiple suggestions.

Firm Enter YES to output the firm name for the secondary address.

Unit Description Enter YES to output the unit description, such as “#”, “Apartment”, or “Flat”. Applicable for secondary address level.

Unit Number Low Enter YES to output the low portion of the unit number range. Applicable for secondary address level.


Unit Number High Enter YES to output the high portion of the unit number range. Applicable for secondary address level.

Postcode2 Odd Enter YES to output the four-digit ZIP4 code, odd numbers only. Applicable for primary and secondary address levels.

Postcode2 Even Enter YES to output the four-digit ZIP4 code, even numbers only. Applicable for primary and secondary address levels.

Secondary Side Indicator Enter YES to output Odd or Even for the secondary side indicator. Applicable for secondary address level.

9.10 Non Certified options

This option group includes options to process your data without following the CASS certification rules.

Option Description

Disable Certification Specifies whether to run address cleansing without the restrictions of the CASS certification rules.

YES: Runs address cleansing without the restrictions of the CASS certification rules. Choose this value if you want to use any of the other options in this option group. This value also lets non-mailers process addresses for up to 14 months after the directory creation date, rather than the 3 to 4 months allowed when seeking postal discounts through the USPS. With this value, you do not receive any postal discounts and cannot produce a USPS Form 3553.

NO: Runs address cleansing under the CASS certification rules. The other options in this option group will be ignored.

Assign With Input Postcode Specifies whether the transform should use the last four digits of the 9-digit postcode (if present on input), which is usually the Postcode2 field (ZIP+4), during address assignment.

YES: Enables the transform to use the record's last four digits of the 9-digit postcode to try to make a finer level of assignment than it could make under CASS rules. Under CASS rules, the transform doesn't consider the last four digits of the 9-digit postcode. In order for this option to work, the last four digits of the 9-digit postcode must be unique to a valid firm or secondary address.

NO: Disables this option.


Accept Inexact Postcode Move When an input record has an obsolete postcode or a postcode move, specifies whether the transform should ignore some of the non-matching elements between the input record and the record in the national directory, and use built-in matching thresholds to determine if the records match.

YES: Ignores the non-matching elements and uses built-in matching thresholds.

NO: Disables this option (and does not ignore non-matching elements).

Assign Postcode2 Not DPV Validated Specifies whether to output the Postcode2 when an assignment is made, even when Enable DPV is set to No and Disable Certification is set to Yes. The output address is not validated by DPV.

YES: Assigns the Postcode2 when Enable DPV is set to No.

NO: Leaves the Postcode2 blank when an assignment is made and Enable DPV is set to No.

Enable Parse Only Specifies whether the transform should parse and validate your data or parse only.

YES: Parses records into their discrete components, but does not perform a lookup in the postal directories. This mode is fast, but parsing results are unverified.

NO: Parses records into their discrete components and performs a lookup in the postal directories. This mode might be slower, but parsing results are verified.

Enable Suggestion Lists Specifies whether suggestion lists are generated for records that cannot be assigned. This option is for transactional projects.

YES: Generates suggestion lists.

NO: Does not generate suggestion lists.
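Assign With Input Postcode works with the last four digits of a 9-digit postcode. To illustrate how those digits relate to Postcode1 and Postcode2, this C++17 sketch splits a U.S. postcode written as 12345, 12345-6789, or 123456789. The `UsPostcode` type and `splitPostcode` function are hypothetical helpers for checking your input data; the transform performs its own parsing.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical holder for the two parts of a U.S. postcode.
struct UsPostcode {
    std::string postcode1;  // five-digit ZIP Code
    std::string postcode2;  // four-digit ZIP+4 add-on, empty when absent
};

// True if `s` is non-empty and contains only ASCII digits.
bool isAsciiDigits(const std::string& s) {
    if (s.empty()) return false;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
    }
    return true;
}

// Accepts "12345", "12345-6789", or "123456789"; anything else yields no value.
std::optional<UsPostcode> splitPostcode(const std::string& raw) {
    std::string digits = raw;
    if (raw.size() == 10 && raw[5] == '-') {
        digits = raw.substr(0, 5) + raw.substr(6);  // drop the ZIP+4 hyphen
    }
    if (!isAsciiDigits(digits) || (digits.size() != 5 && digits.size() != 9)) {
        return std::nullopt;
    }
    const std::string postcode2 = digits.size() == 9 ? digits.substr(5) : std::string();
    return UsPostcode{digits.substr(0, 5), postcode2};
}
```

Only records whose Postcode2 is present can benefit from Assign With Input Postcode, so a check like this can help you estimate how much of a list the option could affect.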

9.11 USPS license information options

This group of options is required for all users performing NCOALink, SuiteLink, LACSLink, and DPV processing. You must provide information about the company performing the processing (the licensee) and the company for whom the licensee is performing the processing (the customer). If you are performing the processing for yourself, you are both the licensee and the customer.

The following table describes the USPS Licensee and Customer Information Options for the USA Regulatory Address Cleanse transform.


Licensee Name This field is required for NCOALink.

The name of the company as mentioned in the license agreement with the USPS, up to 30 characters. The licensee performs the NCOALink processing.

This information appears in the PAF log and NCOALink Processing Summary report.

List Owner NAICS Code This is a required field. Enter the North American Industry Classification System (NAICS) code, which identifies the business in which the list owner engages. The provided substitution parameter for this option is $$CompanyNAICSCode. For more information, visit the NAICS website at http://www.census.gov/epcd/www/naics.html.

List ID The Customer or List ID is required for NCOALink limited and full service providers. End users may leave it blank.

A unique ID assigned by the licensee to identify the list owner (customer). If the licensee does not have a naming scheme in place for customers or lists, the six digits of the ID could be made up of the following:

● First 3 digits: Customer name/identifier
● Last 3 digits: List name identifier

Customer Company Name, Customer Company Address, Customer Company Locality, Customer Company Region, Customer Company Postcode1, Customer Company Postcode2 The customer is the person or company for whom you are performing NCOALink processing. End users may leave these fields blank unless Stop Processing Alternative functionality is enabled; these fields are required when Stop Processing Alternative is enabled. The customer information appears in the NCOALink Processing Summary report and log files.

The provided substitution parameters for these fields are:

$$CompanyName
$$CompanyAddress
$$CompanyLocality
$$CompanyRegion
$$CompanyPostcode1
$$CompanyPostcode2

Customer Company Phone This is an optional field. The provided substitution parameter for this field is $$CompanyPhone.

List Processing Frequency This 2-digit number (from 1 to 52) indicates how many times per year the list is processed with NCOALink.

If the list owner has other lists processed by the NCOALink licensee at different frequencies, enter 99.

List Received Date Enter the date when the NCOALink licensee received the list. Use the yyyy/mm/dd format. If you are an end user, you may leave this blank.


List Return Date Enter the date when the list will be returned to the customer. Use the yyyy/mm/dd format. If you are an NCOALink end user, you may leave this blank.

IMB Mailer ID This is an optional field.

Enter your unique Intelligent Mail barcode (IMB) mailer ID that you received from the USPS if applicable. The provided substitution parameter for this field is $$IMBMailerID.

The IMB Mailer ID is a unique 6-digit or 9-digit numeric code assigned to mailers by the USPS based on their annual mail volumes. This information is included in the NCOALink Processing Acknowledgement Form (PAF).

Provider Level This option lists the provider levels for which you have a registered license keycode. Defaults to the substitution parameter $$USPSProviderLevel.

Only provider levels supported in your registered keycodes display in the option list.
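Several license options above have fixed formats, so it can help to check values before a job runs. This C++ sketch builds a six-digit List ID from a 3-digit customer code and a 3-digit list code (the optional scheme described for List ID, assuming numeric codes), accepts List Processing Frequency values from 1 to 52 or 99, validates yyyy/mm/dd dates including leap years, and checks that an IMB Mailer ID is 6 or 9 digits. All function names are hypothetical; they are not part of the SDK.

```cpp
#include <cassert>
#include <string>

// True if `s` is non-empty and contains only ASCII digits.
bool isDigits(const std::string& s) {
    if (s.empty()) return false;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
    }
    return true;
}

// Left-pads a number in [0, 999] to three digits, e.g. 7 -> "007".
std::string pad3(int n) {
    assert(n >= 0 && n <= 999);
    const std::string s = std::to_string(n);
    return std::string(3 - s.size(), '0') + s;
}

// List ID: first 3 digits identify the customer, last 3 digits the list.
std::string makeListId(int customerCode, int listCode) {
    return pad3(customerCode) + pad3(listCode);
}

// List Processing Frequency: 1-52 runs per year, or 99 for mixed frequencies.
bool isValidProcessingFrequency(int timesPerYear) {
    return (timesPerYear >= 1 && timesPerYear <= 52) || timesPerYear == 99;
}

// List Received Date / List Return Date: yyyy/mm/dd naming a real calendar day.
bool isValidYmdDate(const std::string& s) {
    if (s.size() != 10 || s[4] != '/' || s[7] != '/') return false;
    const std::string y = s.substr(0, 4), m = s.substr(5, 2), d = s.substr(8, 2);
    if (!isDigits(y) || !isDigits(m) || !isDigits(d)) return false;
    const int year = std::stoi(y), month = std::stoi(m), day = std::stoi(d);
    if (month < 1 || month > 12 || day < 1) return false;
    static const int kDaysInMonth[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    const int maxDay = (month == 2 && leap) ? 29 : kDaysInMonth[month - 1];
    return day <= maxDay;
}

// IMB Mailer ID: a 6-digit or 9-digit numeric code.
bool isValidImbMailerId(const std::string& id) {
    return isDigits(id) && (id.size() == 6 || id.size() == 9);
}
```

For example, `makeListId(7, 42)` returns `"007042"`, and `isValidYmdDate("2014/02/29")` returns false because 2014 is not a leap year.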

9.11.1 Required options for USPS License Information

If you are processing NCOALink, DPV, or LACSLink, you must provide information in the USPS License Information group.

Option NCOALink Full Service Provider, NCOALink Limited Service Provider, NCOALink End User, DPV, LACSLink

Licensee Name X X X

List Owner NAICS Code X X X

List ID X X X

Customer Company Name X X X X X

Customer Company Address X X X X X

Customer Company Locality X X X X X

Customer Company Region X X X X X

Customer Company Postcode1 X X X X X

Customer Company Postcode2 X X X X X

Customer Company Phone

List Processing Frequency X X X

List Received Date X X

List Return Date X X

Provider Level X X X

IMB Mailer ID

9.12 NCOALink options

The following options are available in the NCOALink group.

Option Description

Mailing List Name Enter the name of this list, up to 30 characters. If this list is a master house list or your only mailing list, consider entering your company name here. This name appears in the log files.

Platform ID The platform ID is the NCOALink licensee's identification number that is assigned by the USPS. It's exactly four characters long.

9.12.1 Processing options

The following table describes the NCOALink Processing Options.

Option Description

List Processing Mode CHANGE_OF_ADDRESS: You’re processing this job to update it with the latest address data. Default.

STATISTICS_ONLY: You’re processing this job to analyze statistics such as the number of records in your list that have updated addresses and the number of moves of each type. When you choose this option, you do not receive move-updated addresses.

RETURN_CODES_ONLY: You’re processing this job for informational purposes. When you choose this option and post to the NCOALink_Return_Code or ANKLink_Return_Code output component, you can see the return codes, which further explain whether matching records were found in the NCOALink directories and why or why not. With this option, you do not receive move-updated addresses.

Retrieve Move Types Choose the types of moves to process:

BUSINESS: Business moves only.

INDIVIDUAL: Individual moves only.

INDIVIDUAL_AND_BUSINESS

INDIVIDUAL_AND_FAMILY

INDIVIDUAL_AND_FAMILY_AND_BUSINESS: Default.


List Processing Objective Specify your reason for using NCOALink:

EMPLOYEE_TRAINING: You’re processing this file as part of employee training.

INTERNAL_DATABASE_TESTING: You’re testing with a licensee-owned database.

MARKETING: You’re testing with external customer lists.

NORMAL: You’re processing the mailing list to update it before a mailing. Default.

STAGE_I and STAGE_II: You’re testing the matching performance against a USPS test file. The USPS scores the Stage II test file. Choose Stage I or Stage II only if you are processing a USPS test file.

SYSTEM_TESTING: You’re processing this file as part of system testing such as loading USPS file updates.

Note

When certifying for CASS, indicate the reason in the USPS Certification Testing Mode option of the Assignment Options group.

High Match Rate Expectancy The USPS wants to distinguish between files that have a legitimate reason for a high percentage of NCOALink matches and files that are fraudulently used to create mover lists. Use this option to state a legitimate reason for an expected high match rate. Select NONE or leave the option blank if you don’t expect a high match rate.

NONE: Default.

ANKLINK_PROCESSED_LIST: An ANKLink-processed file contains records for people who have moved but you don’t yet have their new address. This option is available only to full service providers.

STAGE_FILE: If you’re performing Stage I or Stage II testing, ensure that the List Processing Objective is set to a Stage option.

RETURN_MAIL_LIST: A returned mail list file contains records for mail that was returned to sender.

Consider Moves Within Months Use this setting to ignore change-of-address data older than the specified number of months. For example, enter 12 to use change-of-address data that has a move-effective date within the last 12 months.

If you are an end user or limited service provider, enter a value from 6 to 18. If you’re a full service provider or using ANKLink, enter a value from 6 to 48. If the option is blank, the transform uses all available data based on your license. Default is blank.


Processing First Class Mail, Processing Periodicals, Processing Standard Mail, Processing Package Service Mail Select the types of mail to process by selecting YES or NO for each option. Processing First Class Mail defaults to YES; the others default to NO.

External Processes Updating List Indicate whether the list undergoes additional processing before or after the USA Regulatory Address Cleanse transform.
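The following C++17 sketch shows the arithmetic behind Consider Moves Within Months: a range check that depends on your provider level, and a month-granularity filter on move-effective dates. It assumes dates are compared by year and month and that the processing month counts as the first of the N months; the transform's exact boundary rule is not stated here, so treat that rule and the function names as assumptions. A blank option is modeled as `std::nullopt`.

```cpp
#include <cassert>
#include <optional>

// Range check for Consider Moves Within Months. std::nullopt models a blank
// option, which tells the transform to use all data your license allows.
bool isValidConsiderMoves(std::optional<int> months, bool fullServiceOrAnkLink) {
    if (!months) return true;
    const int maxMonths = fullServiceOrAnkLink ? 48 : 18;  // end users and limited providers: 18
    return *months >= 6 && *months <= maxMonths;
}

// Running month count, so that two year/month pairs can be subtracted.
int monthIndex(int year, int month) {
    assert(month >= 1 && month <= 12);
    return year * 12 + (month - 1);
}

// Hypothetical filter: true if a move effective in moveYear/moveMonth falls within
// the last `months` months of a run in runYear/runMonth, counting the run month as
// the first month. A blank option applies no age limit in this sketch.
bool moveIsConsidered(int moveYear, int moveMonth, int runYear, int runMonth,
                      std::optional<int> months) {
    const int monthsAgo = monthIndex(runYear, runMonth) - monthIndex(moveYear, moveMonth);
    if (monthsAgo < 0) return false;  // effective date is after the run
    return !months || monthsAgo < *months;
}
```

With 12 months and a run in September 2014, a move effective in October 2013 is considered, while one effective in September 2013 is not.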

9.12.2 Report Options

The following table describes the NCOALink Report Options.

Option Description

Generate Return Code Descriptions The NCOALink Processing Summary report always includes a brief summary of return codes, and you can include more detailed return code descriptions using this option. Return codes indicate whether a record was affected by a move, how the NCOALink match was made, or why a match could not be made.

YES: Includes the return code descriptions in the NCOALink Processing Summary report.

NO: Default. Excludes the return code descriptions from the NCOALink Processing Summary report.

9.12.3 Output options

The following table describes the NCOALink Output Options.

Option Description

Apply Move to Standardized Fields Component output fields are not affected by this option.

YES: Default. The transform updates standardized fields to contain details about the address available through NCOALink.

NO: Standardized output fields will have the standardized version of input rather than the moved address.

9.12.4 Processing Acknowledgment Form (PAF) Details

The following table describes the NCOALink PAF Details. PAF Details are not required for end users.


Type INITIAL: This is the first PAF you’re completing to become authorized to process addresses for this particular customer.

MODIFIED: You’re completing a new PAF because some information on your old one changed.

RENEWAL: You’re completing a new PAF because your old one is expiring.

Name Of Person Signing Enter the name of the person signing this PAF, up to 50 characters.

Title Of Person Signing Enter the job title of the person signing this PAF, up to 50 characters.

Email of Person Signing Enter the email address for the person who is signing the PAF. You can leave this parameter blank.

Company Website Enter the company website address for the person signing the PAF. You can leave this parameter blank.

Date Signed By Customer Enter the date the customer signed the PAF in yyyy/mm/dd format.

Date Signed By Licensee Enter the date that the licensee (the NCOALink service provider) signed the PAF in yyyy/mm/dd format.

Customer Parent Company Name If the list owner’s company is owned by another company, enter the parent company’s name here.

Customer Alternate Company Name If the list owner’s company is also known by another name, enter that alternate name here.

Using Alternative PAF YES: Select if you are using a PAF that is not the USPS form (you must have permission from the USPS).

NO: The default setting for this field.

This field requires either a YES or NO.

Using Cooperative Database Indicates whether the list is from a cooperative database. Applicable for Full and Limited Service Providers only.

YES

NO: The default setting for this field.

Note

A PAF must be on file for each participant in the cooperative database. When set to YES, a “C” is included in the PAF log to indicate that the list processed was a cooperative database.

9.12.5 Service provider options

The following table describes the NCOALink Service Provider Options. These options are not required for end users.

Option Description

Buyer Company Name If the list was processed for rent, sale, or lease, enter the name of the company or individual who bought the list.

Postcode For Mail Entry Enter the ZIP Code of the Business Mail Entry Unit (BMEU) or post office where the mail will be submitted for mailing.

Pre Processes Performed Indicate whether you processed or will process this data before performing NCOALink processing.

YES

NO

Pre Processed Data Modified NO: If you will have processed this data before performing NCOALink processing, indicates the preprocessing does not include changes to the data. Default.

FROM_POSTAL_DATA: If you will have processed this data before performing NCOALink processing, indicates whether that preprocessing includes changes with postal data.

FROM_NON_POSTAL_DATA: If you will have processed this data before performing NCOALink processing, indicates whether that preprocessing includes changes with non-postal data.

FROM_BOTH: If you will have processed this data before performing NCOALink processing, indicates that the preprocessing includes changes with both postal and non-postal data.

Post Processes Performed Indicate whether you will process this data after performing NCOALink processing.

YES

NO


Post Processed Data Modified NO: If you will process this data after performing NCOALink processing, indicates the postprocessing will not include changes to the data. Default.

FROM_POSTAL_DATA: If you will process this data after performing NCOALink processing, indicates whether the postprocessing includes changes with postal data.

FROM_NON_POSTAL_DATA: If you will process this data after performing NCOALink processing, indicates whether that postprocessing includes changes with non-postal data.

FROM_BOTH: If you will process this data after performing NCOALink processing, indicates the postprocessing includes changes with both postal and non-postal data.

Concurrent Processes Performed Indicates whether you processed or will process this data in some other way while performing NCOALink processing.

YES

NO

Concurrent Processed Data Modified NO: If you will process this data in some other way while performing NCOALink processing, indicates the concurrent processing does not include changes to the data. Default.

FROM_POSTAL_DATA: If you will process this data in some other way while performing NCOALink processing, indicates whether that processing includes changes with postal data.

FROM_NON_POSTAL_DATA: If you will process this data in some other way while performing NCOALink processing, indicates whether that processing includes changes with non-postal data.

FROM_BOTH: If you will process this data in some other way while performing NCOALink processing, indicates whether that processing includes changes with both postal and non-postal data.

In House List Processing Indicates whether the list is an in house (internal) list. Applicable for Full Service Providers only.

YES

NO

Note

When set to Yes, an “I” is included in the CSL to indicate that the list was an in house list.


Output Returned Identifies the type of output returned to the client.

STANDARD: All required NCOALink output was returned to the client. Default.

MODIFIED: One or more post processes modified the return information (updates were applied to the list).

BOTH: One or more post processes modified the return information (updates were applied to the list); however, a separate file containing all the required output was also returned.

Additional Notes NONE: Default.

CUSTOMER_REQUESTED_EXTENTION: Select if the customer submitted a written request for an extension.

9.12.6 Contact Details

The following table describes the NCOALink Contact Details options. These options are not required for end users.

Option Description

Type BROKER: A broker directs business to the service provider.

LIST_ADMINISTRATOR: A list administrator stores and maintains address lists.

License Assigned ID Enter a unique six-character ID number for the broker or list administrator. You assign the ID number.

Contact Level Enter the degree of separation between you and this contact, from 1 to 99. For example, enter 1 if you received the list from this contact. If your contact received the list from a different broker, enter 2 for this contact.

Note that the transform doesn't use this value in any logs.

NAICS Code Enter the broker’s or list administrator’s numeric North American Industry Classification System code, which identifies the business in which they engage. For more information, see http://www.census.gov/epcd/www/naics.html.

Name Enter the broker's or list administrator's name.

Address Enter the broker's or list administrator's address.

Locality Enter the broker's or list administrator's locality (city).

Region Enter the broker's or list administrator's region (state).


Postcode1 Enter the broker's or list administrator's Postcode1 (ZIP code).

Postcode2 Enter the broker's or list administrator's Postcode2 (ZIP+4 code).

Phone Enter the broker's or list administrator's phone number.

Contact Company Website Enter the website of the broker or list administrator. You can leave this parameter blank.

PAF Signing Date Enter the date when this contact signed the PAF in the format yyyy/mm/dd.

10 Global Address Cleanse

The Global Address Cleanse transform identifies, parses, validates, and corrects global address data, such as primary number, primary name, primary type, directional, secondary identifier, secondary number, locality, region and postcode.

10.1 Supported countries (Global Address Cleanse)

The Global Address Cleanse transform supports several countries. The level of correction varies by country and by the engine that you use. Complete coverage of all addresses in a country is not guaranteed.

For the Global Address engine, country support depends on which sets of postal directories you have purchased.

For Japan, the assignment level is based on data provided by the Ministry of Public Management, Home Affairs, Posts and Telecommunications (MPT).

During Country ID processing, the transform can identify many countries. However, the Global Address Cleanse transform's engines may not provide address correction for all of those countries.

10.2 Processing Japanese addresses

The Global Address Cleanse transform's Global Address engine parses Japanese addresses. The primary purpose of this transform and engine is to parse and normalize Japanese addresses for data matching and cleansing applications.

A significant portion of the address parsing capability relies on the Japanese address database. The software has data from the Ministry of Public Management, Home Affairs, Posts and Telecommunications (MPT) and additional data sources. The enhanced address database is regularly updated and includes regional postal codes mapped to localities.

10.2.1 Standard Japanese address format

A typical Japanese address includes the following components.

Address component Japanese English Output field(s)

Postal code 〒 654-0153 654-0153 Postcode_Full

Prefecture 兵庫県 Hyogo-ken Region1_Full

City 神戸市 Kobe-shi Locality1_Full

Ward 須磨区 Suma-ku Locality2_Full


District 南落合 Minami Ochiai Locality3_Full

Block number 1 丁目 1 chome Primary_Name_Full1

Sub-block number 25 番地 25 banchi Primary_Name_Full2

House number 2 号 2 go Primary_Number_Full

An address may also include building name, floor number, and room number.

Postal code

Japanese postal codes are in the nnn-nnnn format. The first three digits represent the area. The last four digits represent a location in the area. The possible locations are district, sub-district, block, sub-block, building, floor, and company. Postal codes must be written with Arabic numbers. The post office symbol 〒 is optional.

Before 1998, the postal code consisted of 3 or 5 digits. Some older databases may still reflect the old system.
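As an illustration of the postal code rules above, this C++ sketch classifies a string as a current nnn-nnnn code, an older 3-digit or 5-digit code, or neither. It assumes UTF-8 input and a UTF-8 source file (for example, `/utf-8` on MSVC), strips an optional leading 〒 and one optional space, and assumes older 5-digit codes were written as nnn-nn. The enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class JpPostcodeKind { Current, Legacy, Invalid };

// True if `s` is non-empty and contains only ASCII (Arabic) digits.
bool arabicDigitsOnly(const std::string& s) {
    if (s.empty()) return false;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
    }
    return true;
}

// True if `s` is `head` digits, a hyphen, then `tail` digits.
bool hyphenated(const std::string& s, std::size_t head, std::size_t tail) {
    return s.size() == head + 1 + tail && s[head] == '-' &&
           arabicDigitsOnly(s.substr(0, head)) && arabicDigitsOnly(s.substr(head + 1));
}

// Classifies a UTF-8 Japanese postal code as nnn-nnnn (Current), an older
// 3-digit or nnn-nn code (Legacy), or neither (Invalid).
JpPostcodeKind classifyJpPostcode(std::string s) {
    const std::string postMark = "〒";  // U+3012, three bytes in UTF-8
    if (s.compare(0, postMark.size(), postMark) == 0) {
        s.erase(0, postMark.size());  // the post office symbol is optional
    }
    if (!s.empty() && s[0] == ' ') {
        s.erase(0, 1);  // allow one space, as in "〒 654-0153"
    }
    if (hyphenated(s, 3, 4)) return JpPostcodeKind::Current;
    if ((s.size() == 3 && arabicDigitsOnly(s)) || hyphenated(s, 3, 2)) {
        return JpPostcodeKind::Legacy;
    }
    return JpPostcodeKind::Invalid;
}
```

Flagging Legacy codes this way can help you find records from older databases that still reflect the pre-1998 system before you process them.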

Prefecture

Prefectures are regions. Japan has forty-seven prefectures. You may omit the prefecture for some well known cities.

City

Japanese city names have the suffix 市 (-shi). In some parts of the Tokyo and Osaka regions, people omit the city name. In some island villages, they use the island name with a suffix 島 (-shima) in place of the city name. In some rural areas, they use the county name with suffix 郡 (-gun) in place of the city name.

Ward

A city is divided into wards. The ward name has the suffix 区(-ku). The ward component is omitted for small cities, island villages, and rural areas that don't have wards.

District

A ward is divided into districts. When there is no ward, the small city, island village, or rural area is divided into districts. The district name may have the suffix 町 (-cho/-machi), but it is sometimes omitted. 町 has two possible pronunciations, but only one is correct for a particular district.

In very small villages, people use the village name with suffix 村 (-mura) in place of the district.

When a village or district is on an island with the same name, the island name is often omitted.

Sub-district

Primarily in rural areas, a district may be divided into sub-districts, marked by the prefix 字 (aza-). A sub-district may be further divided into sub-districts that are marked by the prefix 小字 (koaza-), meaning small aza. Koaza may be abbreviated to aza. A sub-district may also be marked by the prefix 大字 (oaza-), which means large aza. Oaza may also be abbreviated to aza.

Here are the possible combinations:

● oaza
● aza
● oaza and aza
● aza and koaza
● oaza and koaza

Note

The characters 大字(oaza-), 字(aza-), and 小字 (koaza-) are frequently omitted.

Sub-district parcel

A sub-district aza may be divided into numbered sub-district parcels, which are marked by the suffix 部 (-bu), meaning piece. The character 部 is frequently omitted.

Parcels can be numbered in several ways:

● Arabic numbers (1, 2, 3, 4, and so on)
石川県七尾市松百町 8 部 3 番地 1 号
● Katakana letters in iroha order (イ, ロ, ハ, ニ, and so on)
石川県小松市里川町ナ部 23 番地
● Kanji numbers, which are very rare (甲, 乙, 丙, 丁, and so on)
愛媛県北条市上難波甲部 311 番地

Sub-division

A rural district or sub-district (oaza/aza/koaza) is sometimes divided into sub-divisions, marked by the suffix 地割 (-chiwari), which means division of land. The optional prefix is 第 (dai-).

The following address examples show sub-divisions:

岩手県久慈市旭町 10 地割 1 番地

岩手県久慈市旭町第 10 地割 1 番地

Block number

A district is divided into blocks. The block number includes the suffix 丁目 (-chome). Districts usually have between 1 and 5 blocks, but they can have more. The block number may be written with a Kanji number. Japanese addresses do not include a street name.

東京都渋谷区道玄坂2丁目25番地12号

東京都渋谷区道玄坂二丁目25番地12号

Sub-block number

A block is divided into sub-blocks. The sub-block name includes the suffix 番地 (-banchi), which means numbered land. The suffix 番地 (-banchi) may be abbreviated to just 番 (-ban).

House number

Each house has a unique house number. The house number includes the suffix 号 (-go), which means number.

Block, sub-block, and house number variations

Block, sub-block, and house number data may vary. Possible variations include the following:

Dashes

The suffix markers 丁目(chome), 番地 (banchi), and 号(go) may be replaced with dashes.

東京都文京区湯島 2 丁目 18 番地 12 号

東京都文京区湯島 2-18-12

Sometimes the suffix markers are abbreviated or omitted, or the sub-block and house number are combined with a dash.

東京都文京区湯島 2 丁目 18 番 12 号

東京都文京区湯島 2 丁目 18 番地 12

東京都文京区湯島 2 丁目 18-12
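The variations above all carry the same three numbers, so a normalizer can reduce them to the dashed form. The following Java sketch is a hypothetical illustration (JpBlockNumbers is not an SDK class, and this is not the transform's internal logic); it handles Arabic digits only, not Kanji numbers.

```java
import java.util.regex.Pattern;

// Hypothetical normalizer for illustration; not the transform's internal logic.
final class JpBlockNumbers {
    // block number (丁目 or dash), sub-block number (番地, 番, or dash),
    // house number (optional 号). Only Arabic digits are handled here.
    private static final Pattern BLOCK_SUBBLOCK_HOUSE = Pattern.compile(
        "(\\d+)\\s*(?:丁目|-)\\s*(\\d+)\\s*(?:番地|番|-)\\s*(\\d+)(?:\\s*号)?");

    /** Rewrites the first block/sub-block/house group in the address to the dashed n-n-n form. */
    static String toDashed(String address) {
        // replaceFirst returns the input unchanged when nothing matches.
        return BLOCK_SUBBLOCK_HOUSE.matcher(address).replaceFirst("$1-$2-$3");
    }
}
```

All five example addresses in this section normalize to 東京都文京区湯島 2-18-12. The alternation lists 番地 before 番 so that the longer marker is consumed whole.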

No block number

Sometimes the block number is omitted. For example, this ward of Tokyo has numbered districts, and no block numbers are included. 二番町 means district number 2.

東京都 千代田区 二番町 9 番地 6 号

Building names

Names of apartments or buildings are often included after the house number. When a building name includes the name of the district, the district name is often omitted. When a building is well known, the block, sub-block, and house number are often omitted. When a building name is long, it may be abbreviated or written using its acronym with English letters.

The following are the common suffixes:

Suffix Romanized Translation

ビルディング birudingu building

ビルヂング birudingu building

ビル biru building

センター senta- center

プラザ puraza plaza

パーク pa-ku park

タワー tawa- tower

会館 kaikan hall

棟 tou building (unit)

庁舎 chousha government office building

マンション manshon condominium

団地 danchi apartment complex

アパート apa-to apartment

荘 sou villa

住宅 juutaku housing

社宅 shataku company housing


官舎 kansha official residence
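Because several suffixes in the table share a prefix (ビル and ビルディング), a suffix check must try longer entries first. This Java sketch shows that idea; JpBuildingSuffix is a hypothetical helper, and the list contains only the suffixes from the table above.

```java
import java.util.List;

// Hypothetical helper for illustration; the suffix list is copied from the table above.
final class JpBuildingSuffix {
    // Ordered longest first so that ビルディング is not reported as ビル.
    private static final List<String> SUFFIXES = List.of(
        "ビルディング", "ビルヂング", "マンション", "センター", "アパート",
        "プラザ", "パーク", "タワー", "ビル",
        "会館", "庁舎", "団地", "住宅", "社宅", "官舎", "棟", "荘");

    /** Returns the building-type suffix that ends the name, or null if there is none. */
    static String suffixOf(String buildingName) {
        String name = buildingName.trim();
        for (String suffix : SUFFIXES) {
            if (name.endsWith(suffix)) return suffix;
        }
        return null;
    }
}
```

A name such as 須磨パークヒルズ returns null: パーク appears inside the name, but the name does not end with a listed suffix.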

Building numbers

Room numbers, apartment numbers, and so on, follow the building name. Room numbers may include the suffix 号室 (-goshitsu). Floor numbers above ground level may include the suffix 階 (-kai) or the letter F after the floor number. Floor numbers below ground level may be written as 地下 (chika) followed by the floor number and 階 (-kai), or as the floor number followed by the letters BF. An apartment complex may include multiple buildings called Building A, Building B, and so on, marked by the suffix 棟 (-tou).

The following address examples include building numbers.

● Third floor above ground
東京都千代田区二番町9番地6号 バウエプタ3 F
● Second floor below ground
東京都渋谷区道玄坂 2-25-12 シティバンク地下 2 階
● Building A Room 301
兵庫県神戸市須磨区南落合 1-25-10 須磨パークヒルズ A 棟 301 号室
● Building A Room 301
兵庫県神戸市須磨区南落合 1-25-10 須磨パークヒルズ A-301
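The floor and room markers above can be extracted with a few patterns. This Java sketch is a hypothetical illustration (JpBuildingUnit is not an SDK class); it covers only the 階, F, 地下, BF, 棟, dash, and 号室 forms described in this section, and assumes building letters are ASCII A to Z.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for illustration; it covers only the patterns listed above.
final class JpBuildingUnit {
    // Below ground: 地下 n 階, or n BF.
    private static final Pattern BELOW = Pattern.compile("地下\\s*(\\d+)\\s*階|(\\d+)\\s*BF");
    // Above ground: n 階 or n F (checked after BELOW so 地下 2 階 is not read as floor 2).
    private static final Pattern ABOVE = Pattern.compile("(\\d+)\\s*(?:階|F)");
    // Building letter followed by 棟 or a dash, then the room number, optionally with 号室.
    private static final Pattern ROOM = Pattern.compile("([A-Z])\\s*(?:棟|-)\\s*(\\d+)(?:\\s*号室)?");

    /** Floor number: positive above ground, negative below ground, or null if none is found. */
    static Integer floor(String text) {
        Matcher below = BELOW.matcher(text);
        if (below.find()) {
            String n = below.group(1) != null ? below.group(1) : below.group(2);
            return -Integer.parseInt(n);
        }
        Matcher above = ABOVE.matcher(text);
        return above.find() ? Integer.valueOf(Integer.parseInt(above.group(1))) : null;
    }

    /** Building letter and room number as "A/301", or null if none is found. */
    static String buildingAndRoom(String text) {
        Matcher m = ROOM.matcher(text);
        return m.find() ? m.group(1) + "/" + m.group(2) : null;
    }
}
```

Both Building A Room 301 examples yield A/301, which shows how the 棟 and dashed forms describe the same unit.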

10.2.2 Special Japanese address formats

Hokkaido regional format

The Hokkaido region has two special address formats:

● super-block
● numbered sub-districts

Super-block

A special super-block format exists only in the Hokkaido prefecture. A super-block, marked by the suffix 条 (-joh), is one level larger than the block. The super-block number or the block number may contain a directional 北 (north), 南 (south), 東 (east), or 西 (west). The following address example shows a super-block 4 Joh.

北海道札幌市西区二十四軒 4 条4丁目13番地7号

Numbered sub-districts

Another Hokkaido regional format is the numbered sub-district. A sub-district name may be marked with the suffix 線 (-sen, literally "line") instead of the prefix 字 (aza-). When a sub-district has a 線 suffix, the block may have the suffix 号 (-go), and the house number has no suffix.

The following is an address that contains first the sub-district 4 sen and then a numbered block 5 go.

北海道旭川市西神楽4線5号3番地11

Accepted spelling

Names of cities, districts and so on can have multiple accepted spellings because there are multiple accepted ways to write certain sounds in Japanese.

Accepted numbering

When the block, sub-block, house number, or district contains a number, the number may be written in Arabic or Kanji. For example, 二番町 means district number 2; in the following example it is read Niban-cho.

東京都千代田区二番町九番地六号
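Because the same number may appear in Arabic or Kanji form, a comparison step usually converts one form to the other. The following Java sketch is a hypothetical illustration (KanjiNumerals is not an SDK class). It handles Kanji numbers below 1000 written with 十 and 百, converts only numbers directly before 丁目, 番地, 番, or 号, and leaves district names such as 二番町 unchanged by assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical converter for illustration; handles Kanji numbers below 1000 only.
final class KanjiNumerals {
    private static final String DIGITS = "〇一二三四五六七八九";

    // A run of Kanji numerals directly before a block, sub-block, or house marker.
    // 番(?!町) leaves district names such as 二番町 unchanged.
    private static final Pattern NUMBER_BEFORE_MARKER =
        Pattern.compile("[〇一二三四五六七八九十百]+(?=丁目|番地|番(?!町)|号)");

    /** Parses forms such as 九, 十二, or 二十四; returns -1 for anything else. */
    static int parse(String kanji) {
        if (kanji.isEmpty()) return -1;
        int total = 0;
        int pending = -1; // digit waiting for 十 or 百, or -1 if none
        for (int i = 0; i < kanji.length(); i++) {
            char c = kanji.charAt(i);
            int digit = DIGITS.indexOf(c);
            if (digit >= 0) {
                if (pending >= 0) return -1; // positional forms like 二四 are not handled
                pending = digit;
                continue;
            }
            int unit = c == '十' ? 10 : c == '百' ? 100 : 0;
            if (unit == 0) return -1;
            total += (pending < 0 ? 1 : pending) * unit;
            pending = -1;
        }
        return total + Math.max(pending, 0);
    }

    /** Replaces Kanji block, sub-block, and house numbers with Arabic numbers. */
    static String toArabic(String address) {
        Matcher m = NUMBER_BEFORE_MARKER.matcher(address);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            int value = parse(m.group());
            String replacement = value >= 0 ? Integer.toString(value) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For the example above, toArabic returns 東京都千代田区二番町9番地6号. Whether district names should also be converted is a design choice; this sketch keeps them because they are proper names.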

P.O. Box addresses

P.O. Box addresses contain the postal code, Locality1, prefecture, the name of the post office, the box marker, and the box number.

Note

The Global Address Cleanse transform recognizes P.O. Box addresses that are located in the Large Organization Postal Code (LOPC) database only.

The address may be in one of the following formats:

● Prefecture, Locality1, post office name, box marker (私書箱), and P.O. Box number.
● Postal code, prefecture, Locality1, post office name, box marker (私書箱), and P.O. Box number.

The following address example shows a P.O. Box address:

The Osaka Post Office Box marker #1

大阪府大阪市大阪支店私書箱 1 号
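A P.O. Box address can be recognized by the 私書箱 marker followed by the box number. This Java sketch is a hypothetical illustration (JpPoBox is not an SDK class). It only splits the text around the marker; it does not perform the LOPC database lookup that the transform relies on.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the SDK.
final class JpPoBox {
    // Box marker 私書箱, then the box number with an optional 号 suffix.
    private static final Pattern BOX = Pattern.compile("私書箱\\s*(\\d+)\\s*号?");

    /** Returns the P.O. Box number, or -1 if the address has no 私書箱 marker. */
    static int boxNumber(String address) {
        Matcher m = BOX.matcher(address);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Returns the text before the marker (prefecture, Locality1, post office name), or null. */
    static String beforeMarker(String address) {
        int i = address.indexOf("私書箱");
        return i < 0 ? null : address.substring(0, i).trim();
    }
}
```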

Large Organization Postal Code (LOPC) format

The Postal Service may assign a unique postal code to a large organization, such as the customer service department of a major corporation. An organization may have up to two unique postal codes depending on the volume of mail it receives. The address may be in one of the following formats:

● Address, company name
● Postal code, address, company name

The following is an example of an address in a LOPC address format.

100-8798 東京都千代田区霞が関1丁目 3 - 2 日本郵政 株式会社

10.3 Process Chinese addresses

The Global Address Cleanse transform's Global Address engine parses Chinese addresses. The primary purpose of this transform and engine is to parse and normalize addresses for data matching and cleansing applications.

10.3.1 Chinese address format

Chinese addresses are written starting with the postal code, followed by the largest administrative region (for example, province), and continue down to the smallest unit (for example, room number and mail receiver). When people send mail between different prefectures, they often include the largest administrative region in the address. The addresses contain detailed information about where the mail will be delivered. Buildings along the street are numbered sequentially, sometimes with odd numbers on one side and even numbers on the other side. In some instances both odd and even numbers are on the same side of the street.

Postal Code

In China, the postal code is a 6-digit number that identifies the target delivery point of the address. It often has the prefix 邮编.
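A minimal Java sketch of the postal code rule: an optional 邮编 prefix followed by exactly six digits. CnPostcode is a hypothetical helper, not an SDK class, and it does not check whether the code actually exists.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the SDK.
final class CnPostcode {
    // Optional 邮编 prefix, optional whitespace, then exactly six ASCII digits.
    private static final Pattern POSTCODE = Pattern.compile("^(?:邮编)?\\s*(\\d{6})$");

    /** Returns the six-digit code without the prefix, or null if the input does not match. */
    static String normalize(String input) {
        Matcher m = POSTCODE.matcher(input.trim());
        return m.matches() ? m.group(1) : null;
    }
}
```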

Country

中华人民共和国 (People's Republic of China) is the full name of China. The short form 中国 (China) is commonly used as an abbreviation of the country name. For mail delivered within China, domestic addresses often omit the country name.

Prefecture

In China, "prefectures" are similar to states in the US. China has 34 prefectures in total. Each prefecture name is followed by one of four suffixes: 省, 市, 自治区, or 特别行政区. The names of some administrative units, such as autonomous regions (自治区), include the name of an ethnic group.

City

The city located within the prefecture. The city name is usually followed by the suffix 市 or 地区.

District

The administrative district that belongs to the specified city. In China, the district name is often followed by one of the suffixes 区, 新区, 市, or 县 (county).

Street information

Specifies the delivery point where the mail receiver can be found. In China, the street information often has the form [Town name] -> [Village name] -> Street (Road) name -> [Block number] -> House number. The components within brackets [ ] are optional and may often be omitted.

● Town name: The town name is usually followed by the suffix 镇, 乡, or 堡.
● Village name: The village name is usually followed by the suffix 村 or 新村.
● Street (Road) name: The street (road) name is usually followed by one of these suffixes: 路, 大道, 街, 大街, 里.
● Block number: The block number is usually followed by one of these suffixes: 弄, 巷, 厅, 胡同.
● House number: The house number is followed by the suffix 号, and the house number is unique within the block or street/road.
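The suffix lists above are enough for a rough split of an address into its levels. The following Java sketch is a hypothetical illustration (CnAddressSplitter is not an SDK class, and the Global Address engine validates against reference data rather than suffixes alone). Each optional level ends at the first suffix of that level. Block numbers are not handled. An address that starts at the city level, without a province, would have its 市 name reported as the region.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical splitter for illustration; the Global Address engine uses reference data, not this.
final class CnAddressSplitter {
    private static final String[] PARTS =
        {"region", "city", "district", "town", "village", "street", "number"};

    // Each optional group ends at the first suffix of its level (reluctant .+?).
    private static final Pattern ADDRESS = Pattern.compile(
        "(?<region>.+?(?:特别行政区|自治区|省|市))?"
        + "(?<city>.+?(?:地区|市))?"
        + "(?<district>.+?(?:新区|区|县|市))?"
        + "(?<town>.+?(?:镇|乡|堡))?"
        + "(?<village>.+?(?:新村|村))?"
        + "(?<street>.+?(?:大道|大街|路|街|里))"
        + "(?<number>\\d+)号");

    /** Splits the start of an address into named parts; returns an empty map if it does not fit. */
    static Map<String, String> split(String address) {
        // Drop spaces and a leading six-digit postcode before matching.
        String s = address.replace(" ", "").replaceFirst("^\\d{6}", "");
        Map<String, String> parts = new LinkedHashMap<>();
        Matcher m = ADDRESS.matcher(s);
        if (!m.lookingAt()) return parts;
        for (String name : PARTS) {
            if (m.group(name) != null) parts.put(name, m.group(name));
        }
        return parts;
    }
}
```

Using lookingAt() rather than matches() lets trailing building, floor, and room text remain unparsed, as in the sample address later in this section.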

Common metro address

This address includes the District name, which is common for metropolitan areas in major cities.

Address component Chinese English Output field

Postcode 510030 510030 Postcode_Full

Country 中国 China Country

Province 广东省 Guangdong Province Region1_Full


City name 广州市 Guangzhou City Locality1_Full

District name 越秀区 Yuexiu District Locality2_Full

Street name 西湖路 Xihu Road Primary_Name_Full1

House number 99 号 No. 99 Primary_Number_Full

Rural address

This address includes the Village name, which is common for rural addresses.

Address component Chinese English Output field

Postcode 5111316 5111316 Postcode_Full

Country 中国 China Country

Province 广东省 Guangdong Province Region1_Full

City name 广州市 Guangzhou City Locality1_Full

County-level City name 增城市 Zengcheng City Locality2_Full

Town name 荔城镇 Licheng Town Locality3_Full

Village name 联益村 Lianyi Village Locality4_Full

Street name 光大路 Guangda Road Primary_Name_Full1

House number 99 号 No. 99 Primary_Number_Full

10.3.2 Sample Chinese address

This address has been processed by the Global Address Cleanse transform and the Global Address engine.

Input

510830 广东省广州市花都区赤坭镇广源路 1 号星辰大厦 8 层 809 室

Address-Line fields

Primary_Name1 广源

Primary_Type1 路


Primary_Number 1

Primary_Number_Description 号

Building_Name1 星辰大厦

Floor_Number 8

Floor_Description 层

Unit_Number 809

Unit_Description 室

Primary_Address 广源路 1 号

Secondary_Address 星辰大厦 8 层 809 室

Primary_Secondary_Address 广源路 1 号星辰大厦 8 层 809 室

Lastline fields

Country 中国

Postcode_Full 510168

Region1 广东

Region1_Description 省

Locality1_Name 广州

Locality1_Description 市

Locality2_Name 花都

Locality2_Description 区

Locality3_Name 赤坭

Locality3_Description 镇

Lastline 510830 广东省广州市花都区赤坭镇

Non-parsed fields

Status_Code S0000

Assignment_Type S

Address_Type S

11 Global Address Cleanse transform reference

The Global Address Cleanse transform identifies, parses, validates, and corrects global address data, such as primary number, primary name, primary type, directional, secondary identifier, and secondary number.

Note

The Global Address Cleanse transform does not support CASS certification or produce data for a USPS Form 3553. If you want to certify your U.S. address data, you must use the USA Regulatory Address Cleanse transform, which supports CASS.

If you perform both address cleansing and data cleansing, you will typically perform the Global Address Cleanse processing before the Data Cleanse processing.

The following sections describe the configurations for the Global Address Cleanse XML. You can find examples of the XML configurations with the samples installed with the product.

11.1 System group

The System group controls high-level transform functions.

Note

If an option exists in the XML, but it is not documented here, do not alter the contents of the option. Editing the contents could cause errors.

Option Description

Link dir Enter the relative path to the match support files.

For example, the default Windows 32-bit location of these files is C:\Program Files\SAP BusinessObjects\Data quality Mgmt SDK\windows_32\DataQuality\gac.

You would then enter C:\Program Files\SAP BusinessObjects\Data quality Mgmt SDK\windows_32\

11.2 Report and analysis

Choose to generate report data for the Global Address Cleanse transform.

Option Description

Generate Report Data This option controls the statistics output for the StatisticsHandler.

YES: Generates statistics for this transform.

NO: Turns off statistics generation.

11.3 Reference files

Reference files are directories required by the Global Address Cleanse transform to process your data. The configuration path option sets the location of the only Global Address Cleanse reference file.

11.4 Country ID options (Global Address Cleanse)

The Country ID option group specifies whether or not to use Country ID processing. This option group is required.

Option Description

Script Code Specifies the ISO four-character script code for your data.

CJKK: Chinese, Japanese, and Korean

CYRL: Cyrillic

GREK: Greek

LATN: Latin

Country ID Mode Specifies whether to always use the specified Country Name or to run Country ID processing.

CONSTANT: Assumes all of your input data is for the specified Country Name and does not run Country ID processing. Choose this option only if all of your data is from one country, such as Australia. This option may save processing time.

ASSIGNED: Runs Country ID processing. Choose this option if the input data contains addresses from more than one country.


Country Name Specifies the country of destination.

NONE: Select when the Country ID Mode is set to Assigned and you don't want a default country to be set when the country cannot be identified.

Special considerations:

● If the Country ID Mode is set to Constant, choose the country of destination from the Country Name list. The transform assumes that all of your data is for this country.
Note: You cannot choose None if the Country ID Mode is set to Constant.
● If the Country ID Mode is set to Assigned, choose a country name to be used when the Country ID could not identify a country.
● If Country Name is set to None, then the address will be sent to the default engine, Global Address.

Country ID Only Mode YES: Only Country ID processing will be performed.

NO: Country ID and other Global Address Cleanse processing will be performed. This is the default setting.

Use Extended Locality List TRUE: Searches ga_locality.dir in addition to ga_major_localities.dct. To improve performance and reduce bad locality matches, MATCH_ALL_WORDS is set to TRUE, and one-word common words, primary types, and directionals are not looked up.

FALSE: This is the default setting.
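The interaction between Country ID Mode and Country Name can be summarized as a small decision function. This Java sketch is a hypothetical model of the rules in the table above (CountryIdMode and CountryIdOptions are illustrative names); it is not how the transform reads its XML configuration.

```java
// Hypothetical model of the documented rules; option names mirror the table above,
// but this is not how the transform reads its XML configuration.
enum CountryIdMode { CONSTANT, ASSIGNED }

final class CountryIdOptions {
    static final String NONE = "NONE";

    /**
     * Returns the country to use for a record, or null when no country applies
     * (the record then goes to the default Global Address engine).
     */
    static String resolveCountry(CountryIdMode mode, String countryName, String identifiedCountry) {
        boolean none = NONE.equalsIgnoreCase(countryName);
        if (mode == CountryIdMode.CONSTANT) {
            if (none) {
                throw new IllegalArgumentException("Country Name cannot be NONE when Country ID Mode is CONSTANT");
            }
            return countryName; // Country ID processing is skipped
        }
        if (identifiedCountry != null) return identifiedCountry; // Country ID found a country
        return none ? null : countryName;                         // fallback country, if any
    }
}
```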

11.5 Engines

This section assigns the engines that you want to use with the Global Address Cleanse transform.

The Global Address Cleanse transform must always have one or more of the Global Address Cleanse engines enabled in order to process your data.

This option group is required.

Yes: Activates the engine for this transform.

No: De-activates the engine for this transform.

Option Description

Engines Specifies which engines to use with this transform.

CANADA

GLOBAL_ADDRESS

USA

Dynamic Engine Init: Specify whether engines that are used for the Global Address Cleanse transform are limited to the engines that can be initialized:

● Yes: Uses only the engines that can successfully be initialized. If an engine fails to initialize, a warning is issued and the job continues.
● No: Uses all enabled engines. If an engine fails to initialize, the job fails. This is the default value.
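The two Dynamic Engine Init settings differ only in what happens when an engine fails to start. The following Java sketch models that difference. EngineInit and the initializes predicate are hypothetical stand-ins for engine start-up, not SDK API; the engine names come from the Engines option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the Dynamic Engine Init behavior; engine names from the Engines option.
final class EngineInit {
    /**
     * Returns the engines that will process data. initializes.test(engine) stands in
     * for engine start-up and returns false when initialization fails.
     */
    static List<String> select(List<String> enabled, boolean dynamicEngineInit, Predicate<String> initializes) {
        List<String> ready = new ArrayList<>();
        for (String engine : enabled) {
            if (initializes.test(engine)) {
                ready.add(engine);
            } else if (dynamicEngineInit) {
                // Yes: warn and continue without this engine.
                System.err.println("Warning: engine " + engine + " failed to initialize; continuing without it");
            } else {
                // No (default): the job fails.
                throw new IllegalStateException("Engine " + engine + " failed to initialize; job fails");
            }
        }
        return ready;
    }
}
```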

11.6 Standardization options

The Standardization Options group includes all the options that you need to standardize your address data. These settings apply to the country that you specify for the Country Name option. This option group is required.

Note

You can set these options for all countries or by individual country.

Option Description

Country Name Specifies the country to which the other options apply. The default is GLOBAL.

Address Line Alias Specifies how to standardize the address line. (Engines supported: Canada, Global Address, and USA)

CONVERT: Converts address lines based on Official address line components instead of Delivery address line components.

PRESERVE: Retains non-preferred data in address lines unless the data is incorrect.

Assign Locality Specifies how to standardize the locality name.

CONVERT: Converts the locality name to the locality name preferred by the country's postal authority.

PRESERVE: Preserves the input locality name unless it is incorrect.

VALID: Retains the input locality name unless it is not valid for mailing. If it is not valid for mailing, replaces it with the preferred locality name.


Capitalization Specifies the casing of your address data.

MIXED: Converts data to initial capitals. For example, "MAIN STREET SOUTH" becomes "Main Street South."

UPPER: Converts data to full capitals. For example, "Main Street South" becomes "MAIN STREET SOUTH."

Note

If you want consistent casing for your data, make sure that this option and the Capitalization setting in the Data Cleanse transform are the same.

Character Width Style Specifies whether to standardize half-width and full-width characters.

NORMAL_WIDTH: Makes no changes to character width.

HALF_WIDTH: Converts all characters to half width.

FULL_WIDTH: Converts all characters to full width.

Correct Assigned Data Specifies whether to use the parsed or corrected data for the assigned output fields of type Best.

YES: Populates the Best components with corrected data.

NO: Populates the Best components with parsed data.

Note

If you choose No for this option, the Capitalization option is the only available Standardization option for your assigned data.

Correct Unassigned Data Specifies whether the Global Address Cleanse transform should standardize your unassigned data.

YES: Populates the Best components with corrected data.

NO: Populates the Best components with parsed data.

Note

If you choose No for this option, the Capitalization option is the only available Standardization option for your unassigned data.


Country Style Specifies how to standardize the country data.

ISO_2CHAR: Standardizes country data to the two-character ISO code, such as AU, CA, or US.

ISO_3CHAR: Standardizes country data to the three-character ISO code, such as AUS, CAN, or USA.

ISO_3DIGIT: Standardizes country data to the three-digit ISO code, such as 036, 124, or 840.

NAME: Standardizes country data to the full country name, such as Australia, Canada, or United States.

PRESERVE: Attempts to retain the country data in the input record, otherwise uses the corrected country value.

Directional Punctuation Specifies whether to use punctuation in the abbreviated directional data.

PRESERVE: If punctuation was provided on input, retains it on output with corrections applied (for example, NW. on input with one period will be N.W. on output with two periods).

YES: Outputs directionals with punctuation (for example, N. or S.W.).

NO: Outputs directionals without punctuation (for example, N, SW).

Directional Style Specifies whether to abbreviate directional data.

LONG: Uses fully-spelled directionals such as "North," "South," "East," "West."

PRESERVE: Preserves the style used in the input record.

SHORT: Uses abbreviated directionals such as "N," "S," "E," "W."

European Postcode Prefix Adds the one- to three-character European Postcode prefix, followed by a dash, for mail generated and distributed inside Europe.

YES: Adds the European Postcode prefix.

NO: Does not add the European Postcode prefix.

PRESERVE: Retains the European Postcode prefix, if one is found on input.

In the following address, for example, D- is the European Postcode prefix.

Hallesches Ufer 32-38

D-10963 Berlin

Germany

Note

The European Postcode prefix is for mail distributed from one European country to another European country.


Extra Lines Specifies what to do with extra lines of non-address data.

PRESERVE: Attempts to retain the extra line of non-address data in the general location in which it was input.

REMOVE: Does not include any extra line of non-address data in the standardized lines or multiline fields.

PREFERRED: All populated EXTRA fields are placed above or below the multiline fields and standardized input lines based on the country data being processed. For example, EXTRA fields for Japan will be located below the standardized lines.

Format Assigned Data Specifies whether to format your assigned data based on the country's preferred address format. For example, the format for Germany is:

{Primary_Name1} {Primary_Number}

{Postcode1} {Locality}

{Country}

YES: Formats the assigned data.

NO: Does not format the assigned data and leaves it in the location in which it was input. If data is added to the record, this data will be placed based on the format string.

Format Unassigned Data Specifies whether to format your unassigned data based on the country's preferred address format. For example, the format for Germany is:

{Primary_Name1} {Primary_Number}

{Postcode1} {Locality}

{Country}

YES: Formats the unassigned data.

NO: Does not format the unassigned data and leaves it in the line in which it was input.

Convert Latin Output to US ASCII For Latin script records, converts any extended ASCII characters in the Best component to US ASCII characters, if a character conversion is available. For example, for the input street name “Østerbrogade”, you can preserve the local character or convert it to the international data format “Osterbrogade” in the cleansed output. Any extended ASCII characters for which there is no conversion (such as the degree symbol or inverted exclamation and question marks) are left as is. By default, the option is set to No.

Yes: Converts extended ASCII characters.

No: Does not convert extended ASCII characters.

Include Country Specifies whether to include country names in standardized lines or multiline fields.

YES: Includes country name.

NO: Does not include country name.

PRESERVE: Retains the country name if found on input.


Include Locality Addition Specifies whether the Locality1_Full output field contains both the Locality1_Name and the Locality1_Addition information.

Yes: Includes both the locality and the locality addition information.

No: Does not include the locality addition.

Preserve: Includes locality addition if found on input. This is the default setting.

Note

The Locality Name Style option in the Global Address Cleanse transform overrides this option. If the Locality Name Style option is set to Short, the Locality1_Full field will not contain the locality addition information.

Note

If the Translate Major Locality option is set to translate the Locality1 output field, no locality addition will be output regardless of the Include Locality Addition option setting.

Include Unused Address Line Data Specifies whether to output the unused address line data (for standardized lines and multiline fields).

YES: Outputs the unused address line data in the remainder fields ADDRESS_LINE_REMAINDER1 through ADDRESS_LINE_REMAINDER4 (for example, 100 Main St Red House).

NO: Does not output the unused address line data (for example, 100 Main St).

Include Unused Lastline Data Specifies whether to output the unused last line data (for standardized lines and multiline fields):

YES: Outputs the unused last line data.

NO: Does not output the unused last line data.


Locality Name Style Specifies the format for locality data in the Locality1_Name output field for addresses. This option applies to German addresses.

Preserve: Preserves the locality data format as it was input. This is the default setting.

Short: Outputs locality data in the abbreviated version, if available in the reference data.

Note

To use the short locality name style, the Address Line Alias option must be set to Convert.

Note

This option overrides the Include Locality Addition option.

Note

If the Translate Major Locality option is set to translate the Locality1 output field, no short locality will be output regardless of the Locality Name Style option setting.

Move Multiline Data Determines the position of blank lines in output addresses.

BOTTOM: If there are any blank lines, the transform moves them to the top and shifts the data to the bottom of the address block.

NO: Does not rearrange any lines, blank or otherwise.

TOP: If there are any blank lines, the transform moves them to the bottom of the address block and shifts the data to the top of the block.


Output Country Language Specifies which language and script to use on output for the country name (not the entire record).

PRESERVE: Preserves country name as it was on input.

CATALAN - LATIN

CHINESE - HANI

DANISH - LATIN

DUTCH - LATIN

ENGLISH - LATIN

FINNISH - LATIN

FRENCH - LATIN

GREEK - GREEK

GERMAN - LATIN

HUNGARIAN - LATIN

ITALIAN - LATIN

JAPANESE - HANI

JAPANESE - KANA

KOREAN - HANG

NORWEGIAN - LATIN

POLISH - LATIN

PORTUGUESE - LATIN

RUSSIAN - CYRILLIC

SPANISH - LATIN

SWEDISH - LATIN

Postal Phrase Punctuation If you choose Short for the Postal Phrase Style option, this option specifies whether to use punctuation in the postal abbreviation.

YES: Includes punctuation in postal abbreviations (for example, P.O. Box).

NO: Does not insert any punctuation for postal abbreviations (for example, PO Box).

PRESERVE: Retains the punctuation of postal abbreviations if found in input record.

Postal Phrase Style Specifies whether to abbreviate postal phrases.

LONG: Outputs the fully spelled postal phrase (for example, Post Office Box).

PRESERVE: Retains the postal phrase if found in the input record.

SHORT: Outputs the abbreviated postal phrase (for example, PO Box). The punctuation for this option is determined by the Postal Phrase Punctuation option.


Primary Type Punctuation If you choose Short for the Primary Type Style option, this option specifies whether to use punctuation in primary type abbreviations.

YES: Includes a period at the end of primary type abbreviations (for example, St.).

NO: Does not insert any punctuation at the end of primary type abbreviations (for example, St).

PRESERVE: Retains the punctuation of primary type abbreviations from your input.

Primary Type Style Specifies the style for primary type address elements.

LONG: Uses fully spelled primary types such as Street, Avenue, Road, or Strasse.

PRESERVE: Retains the style used in the input record.

SHORT: Uses abbreviated primary types such as St, Ave, Rd, or STR. The punctuation for this option is determined by the Primary Type Punctuation option.

Region Style Specifies whether to abbreviate the region name (state or province, for example).

LONG: Uses the fully spelled region name (for example, California or Ontario).

PRESERVE: Retains the style used in the input record.

SHORT: Abbreviates the region name (for example, CA or ON).

Remove Address Apostrophes Specifies whether to include punctuation in certain street names that include a DE L' or D'.

YES: Retains punctuation in street names if it was present on input, for example, Rue D'Abbeville.

NO: Removes punctuation in street names, for example, Rue D Abbeville.

Secondary Description Punctuation If you choose SHORT for the Secondary Description Style, this option specifies whether to use punctuation in the abbreviation.

YES: Uses punctuation in the abbreviation (for example, Apt.).

NO: Does not use punctuation (for example, Apt).

PRESERVE: Retains the style used in the input record.

Secondary Description Style Specifies whether to abbreviate the secondary description (for example, a unit or an apartment).

LONG: Uses the fully spelled secondary description (for example, Apartment).

PRESERVE: Retains the style used in the input record.

SHORT: Abbreviates the secondary description (for example, Apt). The punctuation for this option is determined by the Secondary Description Punctuation option.


Secondary Number Style Specifies the format of the secondary number (for example, a suite or apartment number).

DASHED: Converts all secondary ranges to the dashed format. For example, for Canada addresses, places a dash between the secondary and primary range: 5-100 Main St.

PRESERVE: Preserves the style of the address as it was input.

TRAILING: Converts all secondary ranges to the trailing format. For example, for Canada, places the unit designator at the end of the primary address: 100 Main St Suite 5.

Street Name Style This option applies to the Netherlands only. It specifies the format for street data for addresses in the Netherlands.

POST_OFFICE: Outputs street data in the format preferred by the Netherlands post office.

● Street address with maximum 17 letters in upper case.

Note

The Capitalization option in the Global Address Cleanse transform overrides this option.

● “IJ” written as “Y”. For example:

DE C V OPYNENSTR 1
4001 VL TIEL

NEN5825: Outputs street data in the format preferred by the Ministry of Internal Affairs.

● Street address with a maximum of 24 characters, in mixed case.

Note

The Capitalization option in the Global Address Cleanse transform overrides this option.

● “IJ” written as “IJ”. For example:

Burg d Cock v Opijnenstr 1
4001 VL TIEL

Use Local Primary Type Style

Specifies whether to use the type style for primary address components that is present in the address data. Setting this option to Yes ignores the Primary Type Style option. This option applies to Austria, Germany, and Switzerland.

Yes: Uses the Primary Type Style that is present in the address data.

No: Uses the Primary Type Style specified in the Primary Type Style option.


Use Postal Country Name

Specifies which country data is output for countries that receive their postal service from another country. For example, if you are using the USA engine and have addresses from U.S. territories, setting this option to YES populates the Country field with the postal country (United States) rather than the territory name (such as American Samoa or Puerto Rico). The style of the Country field is still based on the Country Style option.

If the country does not have a postal country, this option does not change the output.

YES: Uses the postal country name.

NO: Uses the territory country name.
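The following C++ sketch shows how the Secondary Description Style and Secondary Description Punctuation options combine, and how the DASHED and TRAILING Secondary Number Style values differ, using the Apt/Apartment and 100 Main St examples above. The enums and functions are hypothetical illustrations, not part of the Data Quality Management SDK API, and the handling of PRESERVE punctuation (keeping a trailing period only if the input abbreviation had one) is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical types mirroring the option values described above (not SDK types).
enum class DescriptionStyle { Long, Short, Preserve };
enum class DescriptionPunctuation { Yes, No, Preserve };
enum class SecondaryNumberStyle { Dashed, Trailing };

// Renders the secondary description "Apartment" under the given options.
// 'input' is the designator exactly as it appeared on the input record.
std::string formatApartment(DescriptionStyle style, DescriptionPunctuation punct,
                            const std::string& input) {
    if (style == DescriptionStyle::Long) return "Apartment";
    if (style == DescriptionStyle::Preserve) return input;
    // SHORT: punctuation is controlled by Secondary Description Punctuation.
    switch (punct) {
    case DescriptionPunctuation::Yes: return "Apt.";
    case DescriptionPunctuation::No: return "Apt";
    case DescriptionPunctuation::Preserve:
        // Assumption: keep the period only if the input abbreviation had one.
        return (!input.empty() && input.back() == '.') ? "Apt." : "Apt";
    }
    return "Apt";
}

// Renders a Canada secondary range in DASHED or TRAILING format
// (PRESERVE would simply keep the input layout).
std::string formatSecondary(SecondaryNumberStyle style, const std::string& secondary,
                            const std::string& primary, const std::string& street,
                            const std::string& designator) {
    if (style == SecondaryNumberStyle::Dashed)
        return secondary + "-" + primary + " " + street;                // 5-100 Main St
    return primary + " " + street + " " + designator + " " + secondary;  // 100 Main St Suite 5
}
```

For example, SHORT with punctuation YES yields "Apt.", while TRAILING turns unit 5 at 100 Main St into "100 Main St Suite 5".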

11.7 Canada engine

Use the Canada engine to process your Canada address data with the Global Address Cleanse transform. The engine includes specific options that you can set for processing Canada address data and suggestion lists.

11.7.1 Canada engine options

The Options group contains all of the specific settings that you must define when processing Canada address data.

Option Description

Parse Only

YES: Parses records into discrete components, but does not perform a lookup in the postal directories. Parse Only is fast, but parsing results are unverified.

NO: Parses records into discrete components and performs a lookup in the postal directories. Setting this option to NO may slow down processing, but parsing results are verified.

Output Address Language

CONVERT: Uses French for records in Quebec, and English for records in other regions (provinces).

ENGLISH: Converts records to English.

FRENCH: Converts records to French.

PRESERVE: Detects the input language and preserves that language upon output, no matter the region (province).

Unit Description

Specifies the unit description in English:

APARTMENT: Uses Apartment as the default unit designator.

DEFAULT: Uses the default unit designator.

UNIT: Uses Unit as the default unit designator.


Dual Address

Specifies the action to take when the Canada engine encounters a dual address.

POSITION: Selects an address based on the arrangement of the input data. The Canada engine tries to validate the address that is closest to the lower left corner of the address block. That might be the postal or the street address, depending on how the data was entered. (This value is required for SERP certification.)

POSTAL: Tries to validate based on the postal address. If that fails, tries again based on the street address.

STREET: Tries to validate based on the street address. If that fails, tries again based on the postal address (rural route or PO Box).

Enable LVR Rule

Canada Post requires that any address with a valid Large Volume Receiver (LVR) postal code be considered valid. The postal code cannot be changed to match other address components. Canada Post recommends that you don't correct LVR addresses; however, correction is permitted when a unique address can be determined without changing the postal code.

YES: Regards any LVR address as assigned, even when the address line is so flawed that a match to the postal directory is impossible. (This value is required for SERP certification.)

NO: Disables this rule. The transform reports an LVR address as unassigned when the address line is flawed.

Enable Rural Rule

Canada Post requires that any address with a valid rural postal code be considered valid. (Rural postal codes have a zero in the second position.) This rule applies even if the address line is empty or contains bad data.

Canada Post recommends that you don't correct rural addresses; however, the Canada engine will always attempt to correct the rest of the address. The valid rural postal code will always be left intact, according to CPC rules. This also applies if an address is entered without a postal code or with an incorrect postal code, and the locality (city) entered has just one postal code associated with it that is a rural postal code.

YES: Regards any rural address as valid, even if the address line is so flawed that a match to the postal directory is impossible. (This value is necessary for SERP certification.)

NO: Reports a rural address as invalid if the address line is bad.

Postcode Only Search

This option affects assignment when the input address line is badly incomplete (for example, when the address includes a range but no street name). In this case, SERP rules specify that the transform must search based on postal code only and attempt to find a street record containing that range. If the Canada engine finds only one street record that contains the range, then, according to SERP rules, the address line is assigned from the postal code.

YES: Turns on the option. (This value is necessary for SERP certification.)

NO: Turns off the option. In some cases, the result is a better address line. In other cases, the Canada engine more reliably detects that it cannot assign an address line.


Postcode No Match Search

This option is important when the Canada engine has determined that the address line can be assigned, but the address line doesn't match the incoming postal code. SERP rules specify that if this occurs, the transform must search the postal directories to ensure the following:

● If the incoming address line is a PO Box address, the postal code must not be a valid postal code for an LVR (Large Volume Receiver), firm, or civic (street) address, such as 100 Main St.
● If the incoming address line is a civic (street) address, the postal code must not be a valid postal code for an LVR PO Box address.

If either of these conditions exists, the transform cannot assign the address, according to SERP rules. Because this postal directory search is very time consuming, disabling it should reduce processing time.

YES: Turns on this option. (This value is necessary for SERP certification.)

NO: Turns off this option.

Postcode Priority Over Street

This option is important when the Canada engine is trying to break a tie between two possible assignments:

● A near match on address line
● An exact match on postal code

YES: When breaking a tie between a near match on address line and an exact match on postal code, validates based on the postal code. (This value is necessary for SERP certification.)

NO: When breaking a tie between a near match for the address line and an exact match for the postal code, places more weight on the address line than on the postal code, because data-entry errors are common in postal codes. Where possible, the transform changes the postal code to agree with the address line.

Use Firm to Assign

Specifies whether the firm is used to make an assignment and is displayed in a suggestion list.

Yes: Uses and displays the firm. This is the default.

No: Does not use or display the firm.

Related Information

Standardization options [page 170]
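As a rough illustration of two of the Canada rules above, the following C++ sketch tests whether a postal code is rural (a zero in the second position) and applies the two Postcode No Match Search conditions. The function and struct names are hypothetical. In the real transform, whether a postal code is valid for an LVR, firm, or civic address comes from the postal directories; the sketch assumes that result is already available as plain flags.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if 'code' has the Canadian postal code shape A1A 1A1 (space optional)
// and its second character is '0', which marks a rural postal code.
bool isRuralPostalCode(const std::string& code) {
    std::string c;
    for (char ch : code)
        if (ch != ' ') c += static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    if (c.size() != 6) return false;
    for (std::size_t i = 0; i < c.size(); ++i) {
        unsigned char u = static_cast<unsigned char>(c[i]);
        bool ok = (i % 2 == 0) ? std::isalpha(u) != 0 : std::isdigit(u) != 0;
        if (!ok) return false;
    }
    return c[1] == '0';
}

enum class AddressLineType { PoBox, Civic };

// Hypothetical summary of a directory search: the kinds of address for which
// the incoming postal code is valid.
struct PostcodeUse {
    bool lvr = false;       // any LVR (Large Volume Receiver) address
    bool lvrPoBox = false;  // an LVR PO Box address
    bool firm = false;
    bool civic = false;     // a civic (street) address
};

// True if the Postcode No Match Search conditions forbid assigning the address.
bool blockedByNoMatchSearch(AddressLineType line, const PostcodeUse& use) {
    if (line == AddressLineType::PoBox)
        return use.lvr || use.firm || use.civic;
    return use.lvrPoBox;  // civic (street) address line
}
```

For example, K0A 1A0 is treated as rural, and a PO Box address line whose postal code is also valid for a civic address would be left unassigned.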

11.7.2 Canada engine report options

Set these options to add the necessary Statement of Address Accuracy report information.

This group is optional; however, you must complete it to produce a SERP (Software Evaluation and Recognition Program) report.

Option Description

Customer Company Name

Specifies the company name of the organization for whom you are preparing the mailing (up to 40 characters).

Mailer Address1, Mailer Address2, Mailer Address3, Mailer Address4

Specifies the name and address of the person or organization for whom you are preparing the mailing (up to 40 characters per line).

Customer CPC Number

Specifies the customer's CPC number that is located on the Canada Post contract (up to 15 characters).

11.7.3 Canada engine suggestion list options

Set these options when you want to generate suggestion lists for your Canada address data.

Option Description

Enable Suggestion Lists

Specifies whether suggestion lists are generated.

NO: Does not generate suggestion lists.

YES: Generates suggestion lists.

Max Number Lastlines

Specifies the maximum number of lastline suggestions that can be generated. You might set this option to limit the size of the SOAP documents being sent by the web service, or to limit the maximum number of suggestions that your users have to choose from. However, by setting a maximum, you may occasionally eliminate a suggestion from the list that could be the correct one.

The maximum number you can enter is 15.

Max Number Address Lines

Specifies the maximum number of address line suggestions that can be generated. You might set this option to limit the size of the SOAP documents being sent by the web service, or to limit the maximum number of suggestions that your users have to choose from. However, by setting a maximum, you may occasionally eliminate a suggestion from the list that could be the correct one.

The maximum number you can enter is 50.

Lastlines Match Minimum

Specifies the similarity score required for lastline suggestions. This score determines which suggestions will be returned in the list. A higher number indicates that the suggestion must be more similar to the input in order to be returned as a possible suggestion.

Type a value from 0 to 80.


Address Lines Match Minimum

Specifies the similarity score required for address-line suggestions. This score determines which suggestions will be returned in the list. A higher number indicates that the suggestion must be more similar to the input in order to be returned as a possible suggestion.

Type a value from 0 to 80.

Combine Overlapping Ranges

Specifies whether individual suggestions with overlapping ranges are combined.

YES: Ignores gaps and overlaps in ranges.

Set this option to YES if you want to limit the total number of suggestions presented to your user. However, you might not see gaps of invalid ranges that would be apparent if this option were set to NO.

For example, the following suggestions would be presented if this option is set to NO:

1000-1099 Maple Ave

1100-1199 Maple Ave

If this option is set to YES, only this suggestion would be presented:

1000-1199 Maple Ave

NO: Does not combine overlapping ranges.

Address Range

Specifies a span around the input primary address range for which to return suggestions. By using this option, you can limit the suggestions returned to be within a few blocks of your input. For example, assume you entered 500 for this value. Then, you submit the following street address:

1000 Pine St.

Suggestions would be returned in a range from 750 to 1250 Pine St.

Type 0 if you don't want to limit the ranges returned in suggestions.
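The following C++ sketch illustrates Combine Overlapping Ranges and Address Range using the Maple Ave and Pine St examples above. It assumes that combining merges all suggestions for the same street into one span from the lowest to the highest number, and that the Address Range value is a total span centered on the input number (so 500 gives plus or minus 250). Both are readings of the examples, not documented algorithms, and the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct RangeSuggestion {
    int low;
    int high;
    std::string street;  // for example, "Maple Ave"
};

// Combine Overlapping Ranges = YES: merges all suggestions for the same street
// into one span, ignoring gaps and overlaps between them.
std::vector<RangeSuggestion> combineRanges(const std::vector<RangeSuggestion>& in) {
    std::map<std::string, RangeSuggestion> byStreet;
    for (const auto& s : in) {
        auto it = byStreet.find(s.street);
        if (it == byStreet.end()) {
            byStreet.emplace(s.street, s);
        } else {
            it->second.low = std::min(it->second.low, s.low);
            it->second.high = std::max(it->second.high, s.high);
        }
    }
    std::vector<RangeSuggestion> out;
    for (const auto& kv : byStreet) out.push_back(kv.second);
    return out;
}

// Address Range: returns the window of primary numbers for which suggestions
// are returned. A span of 0 means the ranges are not limited.
std::pair<int, int> suggestionWindow(int inputPrimary, int span) {
    if (span == 0) return {INT_MIN, INT_MAX};
    int half = span / 2;  // the span is centered on the input number
    return {inputPrimary - half, inputPrimary + half};
}
```

With these assumptions, 1000-1099 and 1100-1199 Maple Ave combine into 1000-1199 Maple Ave, and an input of 1000 Pine St with a value of 500 yields the window 750 to 1250.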

11.8 Global Address Country options

The Global Address engine includes settings in the Country Options section that you can set for processing global address data and suggestion lists.

Option Description

Country Name

Choose a specific country for the Assignment option settings, or choose Global (Apply to All Countries) to make global settings.


Retain Postcode if Valid Format

Set this option for processing blank, valid, or invalid input postcodes.

No input postcode available:

YES: Treats the missing postcode as an invalid format, so a postcode is output if there is a single answer or intelligent matching is possible; otherwise, a blank postcode is output.

NO: Outputs a postcode if there is a single answer or intelligent matching is possible.

Input postcode available:

YES: Retains the input postcode unless it is an invalid format for the country and there is a single answer or intelligent matching is possible.

NO: Updates the output postcode if there is a single answer or intelligent matching is possible. Otherwise, retains the input postcode.

Dual Address

Specifies the action to take when the Global Address engine encounters a dual address.

POSITION: Selects an address based on the arrangement of the input data. First tries to validate a postcode for the address closest to the lower left corner of the address block.

POSTAL: Attempts to validate based on the postal address. If that fails, attempts again based on the street address.

STREET: Attempts to validate based on the street address. If that fails, attempts again based on the postal address.

Disable Certification

Australia:

YES: Enables non-certified features and extends the directory expiration for non-mailing purposes. You may extend the directory expiration period up to 14 months from the date the directories were created.

Processing with expired directory data is allowed when you are not planning to use the records for AMAS mailing purposes. This is ideal for data warehousing industries, for example. However, when you select Yes, you cannot print the AMAS report. Any lists created with expired directories cannot be used for postage discounts. Data directories expire after 15 months when certification is disabled.

NO: Uses the most current directory information, disables non-certified features, and enables printing of the AMAS report.

New Zealand:

YES: Enables non-certified processing of New Zealand addresses and allows processing with expired directories for non-mailing purposes.

When you select YES, you cannot print the SOA (Statement of Accuracy) Report. Any list created with certification disabled cannot be used for mailing.

NO: Enables certified processing for New Zealand and enables printing of the SOA report.


Use Firm to Assign

Specifies whether the firm is used to make an assignment and is displayed in a suggestion list.

Yes: Uses and displays the firm. This is the default.

No: Does not use or display the firm.
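The Retain Postcode if Valid Format rules above can be summarized as a small decision function. In this hypothetical C++ sketch, `unambiguous` stands for "there is a single answer or intelligent matching is possible" and `lookedUp` for the postcode the engine would assign. Both are assumptions about how that state could be represented, not SDK fields, and the sketch only restates the documented YES/NO outcomes.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of one record's postcode situation.
struct PostcodeContext {
    std::string input;      // input postcode; empty if none was supplied
    bool inputValidFormat;  // input has a valid format for the country
    bool unambiguous;       // single answer, or intelligent matching possible
    std::string lookedUp;   // postcode the engine would assign
};

// Returns the output postcode for the given Retain Postcode if Valid Format setting.
std::string outputPostcode(bool retainIfValidFormat, const PostcodeContext& c) {
    if (c.input.empty()) {
        // No input postcode: both settings output a postcode only when unambiguous.
        return c.unambiguous ? c.lookedUp : std::string();
    }
    if (retainIfValidFormat) {
        // YES: keep the input unless its format is invalid and the answer is unambiguous.
        return (!c.inputValidFormat && c.unambiguous) ? c.lookedUp : c.input;
    }
    // NO: update when the answer is unambiguous; otherwise keep the input.
    return c.unambiguous ? c.lookedUp : c.input;
}
```

The difference between the two settings shows up only when a validly formatted input postcode disagrees with an unambiguous lookup: YES keeps the input, NO replaces it.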

11.9 Global Address engine report options

With the Report Options group, you can add the required information for the following reports:

● New Zealand Statement Of Accuracy
● AMAS Address Matching Processing Summary

11.9.1 Report options for Australia

With this option group, you can add the required Australia AMAS - Address Matching Processing Summary information.

This is an optional group.

Option Description

Customer Company Name

Specifies the name of the customer company for whom you are preparing this list (up to 40 characters).

Mailer Address1, Mailer Address2, Mailer Address3, Mailer Address4

Specifies the name and address of the person or organization for whom you are preparing the mailing (up to 29 characters per line).

List Name

Specifies the name of your database or mailing list (up to 40 characters). This might be the file name or your title or formal name for the list.

File Name

Specifies the actual input file name, such as australia.dbf (up to 40 characters).

11.9.2 Report options for New Zealand

With this option group, you can add the required Statement of Accuracy (SOA) report information.

This is an optional group.

Option Description

Customer Number

New Zealand Post customer number.

If you want to submit your file for mailing and qualify for postage discounts, you must include your customer number on the report.

Customer Company Name

Specifies the name of the customer company for whom you are preparing this list (up to 40 characters).

Mailer Address 1, Mailer Address 2, Mailer Address 3, Mailer Address 4, Mailer Address 5, Mailer Address 6

Specifies the name and address of the person or organization for whom you are preparing the mailing (up to 29 characters per line).

SOA Issuer Name

Specifies the name of the company that prepared this list (up to 40 characters).

File Name

Name of the input data associated with the report.

11.10 USA engine

Use the USA engine with the Global Address Cleanse transform to cleanse address data for the United States of America and its territories. The engine includes specific options that you can set for processing USA data.

11.10.1 USA engine options

The Options group contains all of the specific settings that you must define when processing USA address data.

Option Description

Parse Only

YES: Parses records into discrete components, but does not perform a lookup in the postal directories. Parse Only is fast, but parsing results are unverified.

NO: Parses records into discrete components and performs a lookup in the postal directories. Setting this option to NO may slow down processing, but parsing results are verified.


Unit Description

Specifies how to standardize the unit description.

CONVERT: Uses the unit description found in the postal directory (such as an apartment, suite, room, or floor).

PRESERVE: Preserves the unit description from the input record, correcting any spelling errors.

Dual Address

Specifies the action to take when the transform encounters a dual address.

POSITION: Selects an address based on the arrangement of the input data.

The transform attempts to validate the address that is closest to the lower left corner of the address block. That might be the postal or the street address; it depends on how the data was entered.

POSTAL: Attempts to validate based on the postal address. If that fails, attempts again based on the street address.

STREET: Attempts to validate based on the street address. If that fails, attempts again based on the postal address (rural route or PO Box).

Use Firm to Assign

Specifies whether the firm is used to make an assignment and is displayed in a suggestion list.

Yes: Uses and displays the firm. This is the default.

No: Does not use or display the firm.

11.10.2 USA engine suggestion lists options

Set these options to generate suggestion lists for the USA and its territories.

Option Description

Enable Suggestion Lists

Specifies whether suggestion lists are generated.

YES: Generates suggestion lists.

NO: Does not generate suggestion lists.

Max Number Lastlines

Specifies the maximum number of lastline suggestions that can be generated. You might set this option to limit the size of the SOAP documents being sent by the web service, or to limit the maximum number of suggestions that your users have to choose from. However, by setting a maximum, you may occasionally eliminate a suggestion from the list that could be the correct one.

The maximum number you can enter is 15.


Max Number Address Lines

Specifies the maximum number of address line suggestions that can be generated. You might set this option to limit the size of the SOAP documents being sent by the web service, or to limit the maximum number of suggestions that your users have to choose from. However, by setting a maximum, you may occasionally eliminate a suggestion from the list that could be the correct one.

The maximum number you can enter is 100.

Lastlines Match Minimum

Specifies the similarity score required for lastline suggestions. This score determines which suggestions will be returned in the list. A higher number indicates that the suggestion must be more similar to the input in order to be returned as a possible suggestion.

Type a value from 0 to 80.

Address Lines Match Minimum

Specifies the similarity score required for address-line suggestions. This value determines which suggestions will be returned in the list. A higher number indicates that the suggestion must be more similar to the input in order to be returned as a possible suggestion.

Type a value from 0 to 80.

Combine Overlapping Ranges

Specifies whether individual suggestions with overlapping ranges are combined.

YES: Ignores gaps and overlaps in ranges.

You might set this option to YES if you want to limit the total number of suggestions presented to your user. However, you might not see gaps of invalid ranges that would be apparent if this option were set to NO.

For example, a suggestion list might show the following suggestions if this option is set to NO:

1000-1099 Maple Ave

1100-1199 Maple Ave

If this option is set to YES, the list would show only this suggestion:

1000-1199 Maple Ave

NO: Does not combine overlapping ranges.

Address Range

Specifies a span around the input primary address range for which to return suggestions. By using this option, you can limit the suggestions returned to be within a few blocks of your input. For example, assume you entered 500 for this value. Then, you submit the following street address:

1000 Pine St.

Suggestions would only be returned in a range from 750 to 1250 Pine St.

Type 0 if you don't want to limit the ranges returned in suggestions.

12 Data Cleanse

Data cleansing is the process of parsing and standardizing data.

The parsing rules and other information that define how to parse and standardize are stored in a cleansing package.

The Data Cleanse transform identifies and isolates specific parts of mixed data, and then parses and formats the data based on the referenced cleansing package as well as options set directly in the transform. You can use Data Cleanse to assign gender and prenames to name data and to generate Match standards for all types of data.

12.1 Ranking and prioritizing parsing engines

When dealing with multiline input, you can configure the Data Cleanse transform to use only specific parsers and to specify the order in which the parsers run. Carefully selecting which parsers to use, and in what order, can be beneficial. Turning off parsers that you do not need significantly improves parsing speed and reduces the chance that your data will be parsed incorrectly.

You can change the parser order for a specific multiline input by modifying the corresponding parser sequence option in the Parser_Configuration options group of the Data Cleanse transform. For example, to change the order of parsers for the Multiline1 input field, modify the Parser_Sequence_Multiline1 option. Use a pipe (|) delimiter to separate the values of the parsers.
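For illustration, this C++ sketch builds a pipe-delimited value of the kind described for Parser_Sequence_Multiline1. It is a hypothetical helper, not an SDK call, and the parser names in the usage note are placeholders; use the parser names your Data Cleanse configuration actually lists.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins parser names, in the order they should run, with the pipe (|) delimiter.
std::string joinParserSequence(const std::vector<std::string>& parsers) {
    std::string out;
    for (std::size_t i = 0; i < parsers.size(); ++i) {
        if (i > 0) out += '|';
        out += parsers[i];
    }
    return out;
}
```

For example, joining the placeholder names PERSON_OR_FIRM, PHONE, and EMAIL yields PERSON_OR_FIRM|PHONE|EMAIL; parsers left out of the list are the ones turned off for that input field.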

12.2 About parsing data

The Data Cleanse transform can identify and isolate a wide variety of data. Within the Data Cleanse transform, you map the input fields in your data to the appropriate input fields in the transform. Person and firm data, phone, date, email, and data can be mapped to either discrete input fields or multiline input fields.

The example below shows how Data Cleanse parses product data from a multiline input field and displays it in discrete output fields. The data also can be displayed in composite fields, such as Standard Description, which can be customized in Cleansing Package Builder to meet your needs.

Input data: Glove ultra grip profit 2.3 large black synthetic leather elastic with Velcro Mechanix Wear

Parsed data:

Product Category: Glove
Size: Large
Material: Synthetic Leather
Trademark: Pro-Fit 2.3 Series
Cuff Style: Elastic Velcro
Palm Type: Ultra-Grip
Color: Black
Vendor: Mechanix Wear
Standard Description: Glove - Synthetic Leather, Black, size: Large, Cuff Style: Elastic Velcro, Ultra-Grip, Mechanix Wear

The examples below show how Data Cleanse parses name and firm data and displays it in discrete output fields. The data also can be displayed in composite fields which can be customized in Cleansing Package Builder to meet your needs.

Input data: Mr. Dan R. Smith, Jr., CPA Account Mgr. Jones Inc.

Parsed data:

Prename: Mr.
Given Name 1: Dan
Given Name 2: R.
Family Name 1: Smith
Maturity Postname: Jr.
Honorary Postname: CPA
Title: Account Mgr.
Firm: Jones, Inc.

Input data: James Witt 421-55-2424 [email protected] 507-555-3423 Aug 20, 2003

Parsed data:

Given Name 1: James
Family Name 1: Witt
Social Security: 421-55-2424
E-mail address: [email protected]
Phone: 507.555.3423
Date: August 20, 2003

The Data Cleanse transform parses up to six names per record, two per input field. For all six names found, it parses components such as prename, given names, family name, and postname. Then it sends the data to individual fields. The Data Cleanse transform also parses up to six job titles per record.

The Data Cleanse transform parses up to six firm names per record, one per input field.

12.2.1 About parsing phone numbers

Data Cleanse parses both North American Numbering Plan (NANP) and international phone numbers.

Phone numbering systems differ around the world. When Data Cleanse parses a phone number, it outputs the individual components of the number into the appropriate output fields.

The most efficient parsing happens when the phone number has a valid country code that appears before the phone number. If the country code is not present, or it is not first in the string, Data Cleanse uses other resources to attempt to parse the phone number.

Data Cleanse parses phone numbers by first searching internationally. It uses ISO2 country codes, when available, along with the patterns defined in the cleansing package for each country to identify the country code. If it encounters a country that participates in the NANP, it automatically stops trying to parse the number as international and attempts to parse the phone number as North American, comparing the phone number to commonly used patterns such as (234) 567-8901, 234-567-8901, and 2345678901.
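The three North American patterns cited above can be expressed as a regular expression. This C++ sketch uses `std::regex` to recognize only those three forms. The cleansing package defines its own, broader set of patterns, so treat this as an illustration of the idea rather than the transform's matching logic; the function name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Matches (234) 567-8901, 234-567-8901, or 2345678901, and nothing else.
bool matchesNanpPattern(const std::string& phone) {
    static const std::regex nanp(
        R"(\(\d{3}\) \d{3}-\d{4}|\d{3}-\d{3}-\d{4}|\d{10})");
    return std::regex_match(phone, nanp);
}
```

Because `std::regex_match` requires the whole string to match, a number such as +44 20 7946 0958 is rejected by this sketch.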

Currently, the participating countries in the NANP include:

● AS - American Samoa
● AI - Anguilla
● AG - Antigua and Barbuda
● BS - The Bahamas
● BB - Barbados
● BM - Bermuda
● VG - British Virgin Islands
● CA - Canada
● KY - Cayman Islands
● DM - Dominica
● DO - Dominican Republic
● GD - Grenada
● GU - Guam
● JM - Jamaica
● MS - Montserrat
● MP - Northern Mariana Islands
● PR - Puerto Rico
● KN - Saint Kitts and Nevis
● LC - Saint Lucia
● VC - Saint Vincent and the Grenadines
● SX - Sint Maarten
● TT - Trinidad and Tobago
● TC - Turks and Caicos Islands
● US - United States
● VI - United States Virgin Islands

Data Cleanse gives you the option for formatting North American numbers on output (such as your choice of delimiters). However, Data Cleanse outputs international numbers as they were input, without any formatting. Also, Data Cleanse does not cross-compare to the address to see whether the country and city codes in the phone number match the address.

12.2.1.1 How Data Cleanse parses phone numbers

Set up international phone parsing in the Data Cleanse transform.

You can set up international phone parsing in the Data Cleanse transform at three levels: record, job file, and global. Data Cleanse attempts to parse phone data at these levels in the order shown in the table below. If the transform cannot parse phone data using one of the three levels, it attempts to parse the phone data as a North American phone number. If Data Cleanse encounters a country code that is in the North American Numbering Plan (NANP) at any level, it skips any remaining levels and automatically attempts to parse the number as a North American phone number.

Note

The country has to be specified in the cleansing package before Data Cleanse attempts to parse using the three levels listed in the table below.

Level Process

Record

Optional: Use the dynamic input field Option_Country.

Set up your data flow to include Global Address Cleanse prior to the Data Cleanse transform. Make sure the Global Address Cleanse transform outputs the field ISO_Country_Code_2Char. Then map ISO_Country_Code_2Char to the dynamic input field, Option_Country, in your input mapping.

Advantage: Because Global Address Cleanse determines the country code that is output in the ISO_Country_Code_2Char field based on the record's address data, there is a good chance that the record's address and phone are from the same country.

Data Cleanse attempts to parse the phone data using the country code. If it finds a match, Data Cleanse parses the phone data based on the applicable country and moves on to parse the next record. If the transform does not find a match, the parsing goes to the next level (job file).

If the country code is from the NANP, Data Cleanse stops the international parsing, skips the job file and global levels, and attempts to parse the phone data as a North American phone number.


Job file

Optional: Use ISO2 Country Code Sequence found in Data Cleanse's Phone Options group.

Set a sequence of countries in the ISO2 Country Code Sequence option to specify the order in which Data Cleanse searches for phone data. Data Cleanse attempts to match phone data to the first country in the sequence. If there is no match, Data Cleanse moves on to the next country in the sequence. Global is the default setting for the sequence. If you keep Global in the sequence, it should be the last entry; you can also remove it.

Advantage: If you know that your data is predominantly from specific countries, you can set your sequence using those countries and remove Global so that Data Cleanse attempts to match phone data to those countries only, in the order you have listed them.

If the country code is from the NANP, Data Cleanse stops the international parsing, skips the global level, and attempts to parse the phone data as a North American phone number.

If you delete Global from the sequence, and if Data Cleanse did not find matches to the countries in the sequence, the transform attempts to parse the record's phone data as North American.

If you want to skip this level of phone parsing, you can leave the ISO2 Country Code Sequence option blank. Data Cleanse then attempts to parse the phone data as North American.

Global

Optional.

Global is the default setting for the job file option, ISO2 Country Code Sequence. You can choose not to include Global in the sequence, include it after a sequence of country codes, or include only Global in the sequence.

Advantage: If you are unsure of the countries represented in your data, and/or you do not have the country code output from the Global Address Cleanse transform, the transform searches all countries that are listed in the cleansing package (except NANP countries) as a part of the global search.

Note

The Global level does not include country codes as search criteria.

If the country code is from the NANP, Data Cleanse stops the global parsing and attempts to parse the phone data as a North American phone number.

If the transform cannot parse phone data based on the Global setting, or you have removed Global from the ISO2 Country Code Sequence option, the transform attempts to parse the phone data as North American.

The transform outputs any phone data that does not parse to an Extra output field, if you have included that field in your output field setup.
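The level-by-level fallback described above can be sketched in C++ as follows. This is an illustration only, not SDK code: `phoneParseLevel`, `splitSequence`, and the `PhoneMatcher` callback are hypothetical names, the callback stands in for the country-specific rules held in the cleansing package, and an NANP country code is simplified to an immediate switch to North American parsing. GLOBAL is deferred to the end of the job file sequence, matching the behavior described for the ISO2 Country Code Sequence option.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the cleansing package's phone rules: returns true
// if `phone` parses under `level` (an ISO2 code, "GLOBAL", or "NANP").
using PhoneMatcher =
    std::function<bool(const std::string& level, const std::string& phone)>;

// Splits an ISO2 Country Code Sequence value such as "DE|AU|GLOBAL".
std::vector<std::string> splitSequence(const std::string& seq) {
    std::vector<std::string> out;
    std::stringstream ss(seq);
    std::string item;
    while (std::getline(ss, item, '|'))
        if (!item.empty()) out.push_back(item);
    return out;
}

// Returns the level that parsed the phone data, or "EXTRA" if none did.
std::string phoneParseLevel(const std::string& phone,
                            const std::string& countryCode,  // may be empty
                            bool countryIsNanp,
                            const std::string& iso2Sequence,
                            const PhoneMatcher& matches) {
    // NANP country code: skip the job file and global levels entirely.
    if (countryIsNanp) return matches("NANP", phone) ? "NANP" : "EXTRA";

    // Country code level: the record's own country.
    if (!countryCode.empty() && matches(countryCode, phone)) return countryCode;

    // Job file level: countries in the order listed; GLOBAL is deferred.
    bool useGlobal = false;
    for (const std::string& code : splitSequence(iso2Sequence)) {
        if (code == "GLOBAL") { useGlobal = true; continue; }
        if (matches(code, phone)) return code;
    }

    // Global level: only if GLOBAL was kept in the sequence.
    if (useGlobal && matches("GLOBAL", phone)) return "GLOBAL";

    // Final fallback: North American parsing; anything else goes to Extra.
    return matches("NANP", phone) ? "NANP" : "EXTRA";
}
```

An empty `iso2Sequence` skips straight to the North American attempt, which mirrors leaving the option blank.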

12.2.2 About parsing dates

Data Cleanse recognizes dates in a variety of formats and breaks those dates into components.

Data Cleanse can parse up to six dates from your defined record. That is, Data Cleanse identifies up to six dates in the input, breaks those dates into components, and makes dates available as output in either the original format or a user-selected standard format.

12.2.3 About parsing Social Security numbers

Data Cleanse parses U.S. Social Security numbers (SSNs) that are either by themselves or on an input line surrounded by other text.

Fields used

Data Cleanse outputs the individual components of a parsed Social Security number—that is, the entire SSN, the area, the group, and the serial.

How Data Cleanse parses Social Security numbers

Data Cleanse parses Social Security numbers in two steps:

1. Identifies a potential SSN by looking for the following patterns:

Pattern        Digits per grouping                              Delimited by
nnnnnnnnn      9 consecutive digits                             n.a.
nnn nn nnnn    3, 2, and 4 (for area, group, and serial)        spaces
nnn-nn-nnnn    3, 2, and 4 (for area, group, and serial)        all supported delimiters

2. Performs a validity check on the first five digits only. The possible outcomes of this validity check are:

Outcome Description

Pass Data Cleanse successfully parses the data, and the Social Security number is output to an SSN output field.

Fail Data Cleanse does not parse the data because it is not a valid Social Security number as defined by the U.S. government. The data is output as Extra, unparsed data.

Check validity

When performing a validity check, Data Cleanse does not verify that a particular 9-digit Social Security number has been issued, or that it is the correct number for any named person. Instead, it validates only the first 5 digits (area and group). Data Cleanse does not validate the last 4 digits (serial)—except to confirm they are digits.

Outputs valid SSNs

Data Cleanse outputs only Social Security numbers that pass its validation. If an apparent SSN fails validation, Data Cleanse does not pass on the number as a parsed, but invalid, Social Security number.
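The two SSN steps above can be illustrated with the following C++ sketch. It is not the Data Cleanse implementation: `parseSsn` and `SsnParts` are hypothetical names, the delimiter set (space, dash, slash, backslash, period) is assumed from the SSN Delimiter option values, both delimiters are assumed to be the same character, and the validity test uses the commonly published structural rules (area is not 000, 666, or 900 through 999; group is not 00) in place of whatever area and group data Data Cleanse actually consults.

```cpp
#include <cctype>
#include <optional>
#include <string>

struct SsnParts {
    std::string area;    // first 3 digits
    std::string group;   // next 2 digits
    std::string serial;  // last 4 digits
};

// True if s is non-empty and contains only ASCII digits.
bool allDigits(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s)
        if (!std::isdigit(c)) return false;
    return true;
}

// Step 1: recognize nnnnnnnnn or nnn<d>nn<d>nnnn.
// Step 2: check area and group only; the serial is checked just for digits.
std::optional<SsnParts> parseSsn(const std::string& token) {
    SsnParts p;
    if (token.size() == 9 && allDigits(token)) {
        p = {token.substr(0, 3), token.substr(3, 2), token.substr(5, 4)};
    } else if (token.size() == 11) {
        const char d1 = token[3], d2 = token[6];
        const std::string delimiters = " -/\\.";  // assumed delimiter set
        if (d1 != d2 || delimiters.find(d1) == std::string::npos)
            return std::nullopt;
        p = {token.substr(0, 3), token.substr(4, 2), token.substr(7, 4)};
    } else {
        return std::nullopt;
    }
    if (!allDigits(p.area) || !allDigits(p.group) || !allDigits(p.serial))
        return std::nullopt;
    const int area = std::stoi(p.area);
    const int group = std::stoi(p.group);
    // Assumed structural rules, not Data Cleanse's actual validity data.
    if (area == 0 || area == 666 || area >= 900 || group == 0)
        return std::nullopt;
    return p;
}
```

A token that fails either step yields no value, which corresponds to the data being left as Extra rather than output as an invalid SSN.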

12.2.4 About parsing Email addresses

When Data Cleanse parses input data that it determines is an email address, it places the components of that data into specific fields for output.

By identifying the various data components (user name, host, and so on) by their relationships to each other, Data Cleanse can assign the data to specific fields.

Fields Data Cleanse uses

Data Cleanse outputs the individual components of a parsed email address—that is, the email user name, complete domain name, top domain, second domain, third domain, fourth domain, fifth domain, and host name.

What Data Cleanse does

Data Cleanse can take the following actions:

● Parse an email address located either in a discrete field or combined with other data in a multiline field.
● Break down the domain name into sub-elements.
● Verify that an email address is properly formatted.
● Flag the address as belonging to an internet service provider (ISP).

What Data Cleanse does not verify

Several aspects of an email address are not verified by Data Cleanse:

● whether the domain name (the portion to the right of the @ sign) is registered.
● whether an email server is active at that address.
● whether the user name (the portion to the left of the @ sign) is registered on that email server (if any).
● whether the personal name in the record can be reached at this email address.

Email components

The output field where Data Cleanse places the data depends on the position of the data in the record. Data Cleanse follows the Domain Name System (DNS) in determining the correct output field.

For example, if expat@london.home.office.city.co.uk were input data, Data Cleanse would output the elements in the following fields:

Output field Output value

Email expat@london.home.office.city.co.uk

Email_User expat

Email_Domain_All london.home.office.city.co.uk

Email_Domain_Top uk

Email_Domain_Second co

Email_Domain_Third city

Email_Domain_Fourth office

Email_Domain_Fifth home

Email_Domain_Host london
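To show how the DNS labels map to these output fields, here is a hypothetical C++ sketch (`splitEmail` and `EmailParts` are illustrative names, not SDK API). It assumes the host is always the leftmost label and that at most five domain levels are reported, which reproduces the example above; how Data Cleanse assigns Email_Domain_Host for shorter or longer domains is not described here. The format check is deliberately minimal.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct EmailParts {
    std::string user;                 // Email_User
    std::string domainAll;            // Email_Domain_All
    std::string host;                 // Email_Domain_Host (leftmost label, assumed)
    std::vector<std::string> levels;  // [0] = Top, [1] = Second, ... up to Fifth
};

std::optional<EmailParts> splitEmail(const std::string& email) {
    const auto at = email.find('@');
    // Require exactly one '@' with a non-empty user name before it.
    if (at == std::string::npos || at == 0 ||
        email.find('@', at + 1) != std::string::npos)
        return std::nullopt;

    EmailParts p;
    p.user = email.substr(0, at);
    p.domainAll = email.substr(at + 1);
    if (p.domainAll.empty() || p.domainAll.back() == '.') return std::nullopt;

    std::vector<std::string> labels;
    std::stringstream ss(p.domainAll);
    std::string label;
    while (std::getline(ss, label, '.')) {
        if (label.empty()) return std::nullopt;  // e.g. "a..b" or ".com"
        labels.push_back(label);
    }
    if (labels.size() < 2) return std::nullopt;

    // DNS order: read labels right to left; the leftmost label is the host.
    p.host = labels.front();
    for (auto it = labels.rbegin();
         it != labels.rend() - 1 && p.levels.size() < 5; ++it)
        p.levels.push_back(*it);
    return p;
}
```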

12.2.5 About parsing street addresses

Data Cleanse does not identify and parse individual address components. To parse data that contains address information, process it with a Global Address Cleanse or U.S. Regulatory Address Cleanse transform before Data Cleanse. If address data is processed by the Data Cleanse transform, it is usually output to the Extra fields.

12.3 About standardizing data

The Data Cleanse transform can standardize data to make its format more consistent. Data characteristics that the transform can standardize include case, punctuation, and abbreviations.

12.4 About assigning gender descriptions and prenames

The Data Cleanse transform can assign a gender description to each name and populate a generated output field called Gender. Gender descriptions are: strong male, strong female, weak male, weak female, and ambiguous. For dual names, Data Cleanse offers four additional gender descriptions: female multi-name, male multi-name, mixed multi-name, and ambiguous multi-name. The intelligence behind gender assignment lies partly in the application and partly in the cleansing package.

The Data Cleanse transform can populate an additional generated output field called Prename when the Data Cleanse transform assigns a strong gender. Prenames include Mr., Ms., or Mrs.

12.5 Prepare records for matching

If you are planning a data flow that includes matching, it is recommended that you first use Data Cleanse to standardize the data to enhance the accuracy of your matches. The Data Cleanse transform should be upstream from the Match transform.

The Data Cleanse transform can generate match standards or alternates for many name and firm fields as well as all custom output fields. For example, Data Cleanse can tell you that Patrick and Patricia are potential matches for the name Pat. Match standards can help you overcome two types of matching problems: alternate spellings (Catherine and Katherine) and nicknames (Pat and Patrick).

This example shows how Data Cleanse can prepare records for matching.

Table 3: Data source 1

Input record                   Cleansed record
Intl Marketing, Inc.           Given Name 1      Pat
Pat Smith, Accounting Mgr.     Match Standards   Patrick, Patricia
                               Given Name 2
                               Family Name 1     Smith
                               Title             Accounting Mgr.
                               Firm              Intl. Mktg, Inc.

Table 4: Data source 2

Input record                        Cleansed record
Smith, Patricia R.                  Given Name 1      Patricia
International Marketing, Incorp.    Match Standards
                                    Given Name 2      R
                                    Family Name 1     Smith
                                    Title
                                    Firm              Intl. Mktg, Inc.

When a cleansing package does not include an alternate, the match standard output field for that term will be empty. In the case of a multi-word output such as a firm name, when none of the variations in the firm name have an alternate, then the match standard output will be empty. However, if at least one variation has an alternate associated with it, the match standard is generated using the variation alternate where available and the variations for words that do not have an alternate.

12.6 Cleansing package

With Data Quality Management SDK, Data Cleanse offers a person and firm cleansing package that handles a variety of regions. The cleansing package is designed to enhance the ability of Data Cleanse to appropriately cleanse the data according to the cultural standards of multiple regions. The table below illustrates how name parsing may vary by culture:

                                         Parsed output
Culture      Name                        Given_Name1   Given_Name2   Family_Name1
Spanish      Juan C. Sánchez             Juan          C.            Sánchez
Portuguese   João A. Lopes               João          A.            Lopes
French       Jean Christophe Rousseau    Jean          Christophe    Rousseau
German       Hans Joachim Müller         Hans          Joachim       Müller
American     James Andrew Smith          James         Andrew        Smith

12.7 About Japanese data

Data Cleanse can identify and parse Japanese data or mixed data that contains both Japanese and Latin characters.

In general, Data Cleanse uses a word breaker to break an input string into individual parsed values and then attempts to recombine contiguous parsed values into variations. Each variation is assigned one or more classifications based on how the variation is defined in the cleansing package. The input is then parsed according to the parser and parsing rules defined in the cleansing package.

With Japanese data, Data Cleanse first identifies the script in each input field as kanji, kana, or Latin and assigns it to the appropriate script classification. Input fields containing data classified as kana or kanji script are then processed using a special Japanese lexer and parser. Input fields containing data classified as Latin script are processed using the regular Data Cleanse methodology.

Note

Only data in Latin script is parsed based on the value set for the Break on Whitespace Only transform option. All kana and kanji input is broken by the Japanese word breaker.

12.7.1 Text width in output fields

Many Japanese characters are represented in both fullwidth and halfwidth forms. Latin characters can be encoded in either a proportional or fullwidth form. In either case, the fullwidth form requires more space than the halfwidth or proportional form.

To standardize your data, you can use the Character Width Style option to set the character width for all output fields to either fullwidth or halfwidth. The NORMAL_WIDTH value reflects the normalized character width based on script type. Thus some output fields contain halfwidth characters and other fields contain fullwidth characters. For example, all fullwidth Latin characters are standardized to their halfwidth forms and all halfwidth katakana characters are standardized to their fullwidth forms. NORMAL_WIDTH does not require special processing and thus is the most efficient setting.
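For the Latin part of width normalization, the mapping is a fixed offset in Unicode: the fullwidth forms U+FF01 to U+FF5E correspond to ASCII U+0021 to U+007E, and the ideographic space U+3000 corresponds to U+0020. The C++ sketch below applies only that Latin conversion (the direction NORMAL_WIDTH uses for Latin text). The function name is hypothetical, and converting halfwidth katakana to fullwidth, which needs a lookup table, is not shown.

```cpp
#include <string>

// Converts fullwidth Latin forms (U+FF01..U+FF5E) and the ideographic space
// (U+3000) to their halfwidth/ASCII equivalents; all other characters,
// including kana and kanji, pass through unchanged.
std::u32string fullwidthLatinToHalfwidth(const std::u32string& in) {
    std::u32string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        if (c >= 0xFF01 && c <= 0xFF5E)
            out.push_back(static_cast<char32_t>(c - 0xFEE0));
        else if (c == 0x3000)
            out.push_back(U' ');
        else
            out.push_back(c);
    }
    return out;
}
```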

12.7.2 Processing Japanese data

In order to process Japanese data, you must set the Content Domain Sequence to JA or Global.

Set values for other transform options, including the output text width conversion, as appropriate for your needs.

13 Data Cleanse transform reference

Use the Data Cleanse transform to parse and format custom or person and firm data as well as phone numbers, dates, e-mail addresses, and Social Security numbers. Custom data includes operational or product data specific to your business. The cleansing package you specify contains the information necessary to define how your data should be parsed and standardized.

The Data Cleanse transform is typically used after the address cleansing process and before the matching process.

The following sections describe the configurations for the Data Cleanse XML. You can find examples of the XML configurations with the samples installed with the product.

13.1 System group

Syntax

The System group controls high-level transform functions.

Note

If an option exists in the XML, but it is not documented here, do not alter the contents of the option. Editing the contents could cause errors.

Option Description

DS_COMMON_DIR Enter the relative path to the Data Cleanse support files.

For example, the default Windows 32-bit location of these files is C:\Program Files\SAP BusinessObjects\Data quality Mgmt SDK\windows_32\DataQuality\datacleanse.

You would then enter C:\Program Files\SAP BusinessObjects\Data quality Mgmt SDK\windows_32\

13.2 Report and analysis

Use this option to generate report data for the Data Cleanse transform.

Option Description

Generate Report Data Specifies whether to generate report data for this transform.

● Yes: Generates report data for this transform.
● No: Turns off report data generation. If you do not need to generate reports (during testing, for example), set this option to No to improve performance.

13.3 Cleansing Package

Controls which cleansing package the Data Cleanse transform uses.

Option Description

Cleansing package name Enter the name of the cleansing package you want to use. The location of the cleansing package is determined by the option in the group of the Data Cleanse XML.

Option Description

Content domain sequence A content domain specifies which domain's properties should be assigned to a variation. You can specify more than one content domain.

The Global domain is a special content domain which contains all variations and their associated properties. If a variation is not associated with domain-specific information, the Global domain serves as the default domain. The Global domain is required for every content domain sequence. Be sure to add GLOBAL as the last domain in the sequence.

Note

You can set this option as a dynamic input field.

Select the content domains you want to include. The arrows allow you to change the order of the content domains.

GLOBAL - Global
AR - Arabic
ZH - Chinese
CS - Czech
DA - Danish
NL - Dutch
EN_US - English (United States & Canada)
EN_GB - English (United Kingdom & Ireland)
EN_AU - English (Australia & New Zealand)
EN_IN - English (India)
FR - French
DE - German
HU - Hungarian
ID - Indonesian
IT - Italian
JA - Japanese
MS - Malay
NO - Norwegian
PL - Polish
PT_BR - Portuguese (Brazil)
PT_PT - Portuguese (Portugal)
RO - Romanian
RU - Russian
SK - Slovak
ES_MX - Spanish (Latin America)
ES_ES - Spanish (Spain)
SV - Swedish

Option Description

Output format Selects the format for output. Based on the specified domain in the output format, Data Cleanse uses certain output fields and formats the data in those fields according to the regional standards.

Note

You can set this option as a dynamic input field.

Valid values for this option are:

AR - Arabic
ZH - Chinese
CS - Czech
DA - Danish
NL - Dutch
EN_US - English (United States & Canada)
EN_GB - English (United Kingdom & Ireland)
EN_AU - English (Australia & New Zealand)
EN_IN - English (India)
FR - French
DE - German
HU - Hungarian
ID - Indonesian
IT - Italian
JA - Japanese
MS - Malay
NO - Norwegian
PL - Polish
PT_BR - Portuguese (Brazil)
PT_PT - Portuguese (Portugal)
RO - Romanian
RU - Russian
SK - Slovak
ES_MX - Spanish (Latin America)
ES_ES - Spanish (Spain)
SV - Swedish

13.4 Input word breaker

Controls how the parser breaks input data.

Option Description

Break on Whitespace Only Specifies whether the Data Cleanse transform breaks input data only on white space, or on white space, punctuation, and alphanumeric transitions.

YES: Input data breaks only on white space.

NO: Input data breaks on white space, punctuation, and alphanumeric transitions.

This option allows the Data Cleanse transform to recognize alphanumeric product codes as entries in a custom cleansing package. For example, if Break on Whitespace Only is set to NO, the parser breaks a product code such as AF302 into two tokens, AF and 302. If Break on Whitespace Only is set to YES, the parser recognizes AF302 as a single entry.
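A hypothetical C++ sketch of the two breaking modes follows. It assumes ASCII input and that punctuation characters become tokens of their own when Break on Whitespace Only is NO; `breakInput` is an illustrative name, not the Data Cleanse word breaker, which also handles scripts such as Japanese differently.

```cpp
#include <cctype>
#include <string>
#include <vector>

enum class CharClass { Space, Alpha, Digit, Punct };

// Classifies one byte; anything that is not space, letter, or digit
// (including non-ASCII bytes) is treated as punctuation here.
CharClass classify(char ch) {
    const unsigned char c = static_cast<unsigned char>(ch);
    if (std::isspace(c)) return CharClass::Space;
    if (std::isalpha(c)) return CharClass::Alpha;
    if (std::isdigit(c)) return CharClass::Digit;
    return CharClass::Punct;
}

std::vector<std::string> breakInput(const std::string& text,
                                    bool breakOnWhitespaceOnly) {
    std::vector<std::string> tokens;
    std::string current;
    auto flush = [&] {
        if (!current.empty()) tokens.push_back(current);
        current.clear();
    };
    for (char ch : text) {
        const CharClass cls = classify(ch);
        if (cls == CharClass::Space) { flush(); continue; }  // always break on white space
        if (!breakOnWhitespaceOnly && !current.empty()) {
            const CharClass prev = classify(current.back());
            // NO: also break on punctuation and on letter/digit transitions.
            if (cls != prev || cls == CharClass::Punct) flush();
        }
        current.push_back(ch);
    }
    flush();
    return tokens;
}
```

With the option set to NO, `breakInput("AF302", false)` yields the two tokens AF and 302; with YES, AF302 stays a single token that can match a custom cleansing package entry.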

13.5 Person standardization options

Controls how the Data Cleanse transform standardizes person-related output.

Option Description

Assign Prenames Specifies whether the transform should include assigned prenames (for example, Mr. or Mrs.) in the Prename output field.

The Prename output field always includes prenames that are part of the name input data. Additionally, the Data Cleanse transform can assign prenames based on the gender of the name (strong_male or strong_female) in the Given_Name1 field. When the gender of Given_Name1 is not strong, prenames are assigned based on the gender options Use Given Name2 To Assign Gender and Use Family Name to Assign Gender.

● YES: Turns on prename assignment.
● NO: Turns off prename assignment. The Prename output field contains only prenames included in the input data.

Combine Compound Names Specifies how compound family names are standardized.
● YES: Combines compound family names. For example, the family name Van Helsing would combine to VanHelsing.
● NO: Preserves the format of compound family names.

Option Description

Name Order Defines how Data Cleanse applies parsing rules to determine the content of the Given_Name and Family_Name output fields.

● GIVEN_FAMILY_NAME_STRICT and FAMILY_GIVEN_NAME_STRICT: These values specify the respective order of given and family names in the input file. Parsing rules that do not follow the strictly defined name order are not considered when Data Cleanse determines which rule to apply to the input string. These settings are useful when the order of the family and given names in the input data is consistent.
● GIVEN_FAMILY_NAME_SUGGEST and FAMILY_GIVEN_NAME_SUGGEST: These values specify which rule to choose in order to break a tie when two rules have the same confidence score. Data Cleanse chooses the rule that follows the suggested name order.
● UNKNOWN: Data Cleanse chooses the rule with the highest confidence score based on information in the dictionary and rule file.

Associate Name Title Defines how name and occupational title data found in separate input fields are associated.

● YES: Data Cleanse assumes that the name and title data describe the same person and are associated.
● NO: Data Cleanse assumes that the name and title data are not associated.

Enable Presumptive Name Parsing Specifies whether you want to use presumptive name parsing on Name_Line input fields.
● YES: Turns on presumptive name parsing. Data in the Name_Line input field is treated as a name.
● NO: Turns off presumptive name parsing. Data in the Name_Line input field that does not parse as a name remains unparsed and is output to the Extra field.

For example, if the data contains an automobile brand and model in a Name_Line input field, the Data Cleanse transform tries to parse the information as a name based on rules in the cleansing package. If the option is set to NO and Data Cleanse is not able to assign the data, the unparsed data is output to the Extra field. If the option is set to YES, Data Cleanse assigns the data as a name.

Option Description

Parse Discrete Input Defines how to parse person data.

● No: Discrete input fields are mapped directly to the corresponding output fields without being parsed.
● Yes: Discrete input fields are combined into one input field so the data can be parsed and output to discrete fields.

Example:

Table 5: Input data

Column                  Field
Person1_Given_Name1     Mr John T
Person1_Family_Name1    Smith Iii

Table 6: Output data

Column                       Option=No     Option=Yes
Person1.Prename                            Mr
Person1.Given_Name1          Mr John T     John
Person1.Given_Name2                        T
Person1.Family_Name1         Smith Iii     Smith
Person1.Maturity_Postname

13.5.1 Gender options

The gender standardization options control which input fields Data Cleanse uses to assign gender. These options are found in the Gender Options group.

Option Description

Use Given Name2 To Assign Gender When the gender of the prename and Given_Name1 is unassigned or ambiguous, assigns gender based on the gender of the parsed Given_Name2.

● YES: Turns on the option.
● NO: Turns off the option.

For example, if the option is set to No, the gender of the name Pat Robert Smith is ambiguous because the Given_Name1, Pat, is ambiguous. However, if the option is set to Yes, the gender is Strong_Male because the Given_Name2, Robert, is Strong_Male. The same logic applies if the name were P. Robert Smith; the Given_Name1, P, is ambiguous.

Option Description

Use Family Name to Assign Gender When the gender of the prename, Given_Name1, and Given_Name2 is unassigned or ambiguous, assigns gender based on the gender of the family name. Uses Family_Name1 if its gender is assigned and is not ambiguous. Uses Family_Name2 if unable to use Family_Name1.

● YES: Turns on the option.
● NO: Turns off the option.
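The gender fallback order can be sketched in C++ as below. The enum values mirror the gender descriptions listed in section 12.4 (multi-name descriptions omitted), `Unassigned` is added for names without a gender, and all names are hypothetical rather than SDK API. The prename strings are placeholders: actual prenames come from the cleansing package, and a prename present in the input is always kept regardless of this logic.

```cpp
#include <string>

enum class Gender {
    Unassigned, Ambiguous, StrongMale, StrongFemale, WeakMale, WeakFemale
};

// A gender is usable if it is assigned and not ambiguous.
bool decisive(Gender g) {
    return g != Gender::Unassigned && g != Gender::Ambiguous;
}

// Walks the fallback chain: prename, Given_Name1, then (optionally)
// Given_Name2, then (optionally) Family_Name1 and Family_Name2.
Gender assignGender(Gender prename, Gender given1, Gender given2,
                    Gender family1, Gender family2,
                    bool useGivenName2, bool useFamilyName) {
    bool sawAmbiguous = false;
    auto consider = [&](Gender g) {
        if (g == Gender::Ambiguous) sawAmbiguous = true;
        return decisive(g);
    };
    if (consider(prename)) return prename;
    if (consider(given1)) return given1;
    if (useGivenName2 && consider(given2)) return given2;
    if (useFamilyName) {
        if (consider(family1)) return family1;
        if (consider(family2)) return family2;
    }
    return sawAmbiguous ? Gender::Ambiguous : Gender::Unassigned;
}

// Assign Prenames: only strong genders receive an assigned prename.
std::string assignedPrename(Gender g) {
    if (g == Gender::StrongMale) return "Mr.";    // placeholder text
    if (g == Gender::StrongFemale) return "Ms.";  // placeholder text
    return "";
}
```

For Pat Robert Smith, with Pat ambiguous and Robert strong male, the sketch returns `Ambiguous` when `useGivenName2` is false and `StrongMale` when it is true, as in the example above.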

13.6 Firm standardization options

The firm standardization options control how the Data Cleanse transform standardizes firm-related output.

Option Description

Enable Presumptive Firm Parsing Specifies whether you want to use presumptive firm parsing on Firm_Line input fields.

YES: Turns on presumptive firm parsing. Data in the Firm_Line input field is treated as a firm name.

NO: Turns off presumptive firm parsing. Data in the Firm_Line input field that does not parse as a firm remains unparsed and is output to the Extra field.

For example, if the data has a given name and family name in a Firm_Line input field, the Data Cleanse transform tries to parse the information as a firm based on rules in the cleansing package. If the option is set to NO and Data Cleanse is not able to assign the data, the unparsed data is output to the Extra field. If the option is set to YES, Data Cleanse assigns the data as a firm.

13.7 Other standardization options

Standardization options control how the Data Cleanse transform standardizes many types of output.

Option Description

Capitalization Specifies the casing of your output.

● LOWER: Converts the output to lowercase. For example, john mckay.
● MIXED: Preserves the casing for the standard form as defined within the cleansing package. If a standard form is not defined, the output is converted to mixed case. For example, John McKay.
● PRESERVE: Preserves the input casing.
● UPPER: Converts the output to uppercase. For example, JOHN MCKAY.

Option Description

Remove Punctuation Removes all punctuation from standardized data (with the exception of hyphens between names).

● YES: Removes punctuation.
● NO: Leaves the punctuation as it is on input.

For example, if the standard form for extra large is X.L. and the option is set to YES, the standardized output becomes XL.

Remove Diacritical Characters Removes diacritical characters and replaces them with their ASCII equivalents.
● Yes: Replaces diacritical characters such as accent marks, umlauts, and so on with their ASCII equivalents.
● No: Retains the standardized diacritical characters.

For example, when the option is set to No, the data is output with accent marks, such as María Hernández or Geschäftsführer. When the option is set to Yes, the data is output without accent marks, such as Maria Hernandez or Geschaeftsfuehrer.

Character Width Style Specifies the character width used in output fields. Useful when processing Japanese or mixed-language data.

● NORMAL_WIDTH: Output field width reflects the normalized character width based on the script type. Thus some output columns contain halfwidth characters and other columns contain fullwidth characters. For example, all fullwidth Latin characters are standardized to their halfwidth forms and all halfwidth katakana characters are standardized to their fullwidth forms. NORMAL_WIDTH does not require special processing and therefore is the most efficient setting.
● FULL_WIDTH: Characters are converted from their halfwidth forms to fullwidth forms for all output fields. For characters that do not have fullwidth forms, the halfwidth forms are used.
● HALF_WIDTH: Characters are converted from their fullwidth forms to halfwidth forms for all output fields. For characters that do not have halfwidth forms, the fullwidth forms are used.

Note

Since the output width is based on the normalized width for the character type, the output data may be larger than the input data. You may need to increase the column width in the target table.

For template tables, selecting Use NVARCHAR for VARCHAR columns in supported databases changes the VARCHAR column type to NVARCHAR and allows for increased data size.

Option Description

One To One Mapping Specifies whether to place the input data into the corresponding output field for the following parsers: Phone, Email, Date.

● Yes: Places the parsed data into the corresponding output field. For example, if on input Date1 and Date2 are blank and Date3 contains data, then on output, Date1 and Date2 are blank and the data is placed in Date3.
● No: Places the parsed data into the first available output field in the category. For example, if on input Date1 and Date2 are blank and Date3 contains data, then on output, Date1 contains the parsed data that was input in the Date3 field.

SSN Delimiter Specifies which character to use for standard U.S. Social Security number (SSN) output delimiters.

● BACKSLASH: Uses backward slashes as the delimiter in the SSN. For example, 799\45\6789.
● DASH: Uses dashes as the delimiter in the SSN. For example, 799-45-6789.
● SLASH: Uses forward slashes as the delimiter in the SSN. For example, 799/45/6789.
● NONE: Does not add a delimiter to the SSN. For example, 799456789.
● PERIOD: Uses periods as the delimiter in the SSN. For example, 799.45.6789.
● SPACE: Uses spaces as the delimiter in the SSN. For example, 799 45 6789.
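Two of these options, Remove Punctuation and SSN Delimiter, are simple string transformations and can be sketched in C++ as follows. Both functions are hypothetical illustrations, not SDK calls: `removePunctuation` assumes ASCII input and keeps a hyphen only when it sits between two letters, and `formatSsn` takes the delimiter as a string (empty for NONE).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Remove Punctuation = YES: drops ASCII punctuation, but keeps a hyphen
// that sits between two letters (as in a hyphenated name).
std::string removePunctuation(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (!std::ispunct(c)) { out.push_back(s[i]); continue; }
        const bool hyphenBetweenLetters =
            c == '-' && i > 0 && i + 1 < s.size() &&
            std::isalpha(static_cast<unsigned char>(s[i - 1])) &&
            std::isalpha(static_cast<unsigned char>(s[i + 1]));
        if (hyphenBetweenLetters) out.push_back('-');
    }
    return out;
}

// SSN Delimiter: joins area, group, and serial with the chosen delimiter
// ("-", "/", "\\", ".", " ", or "" for NONE).
std::string formatSsn(const std::string& area, const std::string& group,
                      const std::string& serial, const std::string& delimiter) {
    return area + delimiter + group + delimiter + serial;
}
```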

13.8 Date options

Configures standards for date-related data.

Option group Description

Century Threshold Indicates whether a two-digit year is considered part of the 20th or 21st century.

Specify a two-digit integer that represents the highest two-digit year that is considered part of the 21st century (20xx). All two-digit years less than or equal to this integer are considered part of the 21st century; all two-digit years greater than it are considered part of the 20th century (19xx).

For example, if you enter 11, all two-digit years 11 or lower are considered part of the 21st century. 08 is considered 2008. 11 is considered 2011. All two-digit years higher than 11 are considered part of the 20th century. 12 is considered 1912.

Input Month Before Day Specifies whether the date follows the pattern of having the month first or the day first in the input.

● YES: The month is first. For example, 11/12/2004 would be November 12, 2004.
● NO: The day is first. For example, 11/12/2004 would be December 11, 2004.

Option group Description

Input Year First Specifies whether the date follows the pattern of having the year first in the input.

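● YES: The year is first. For example, if your input is 03/02/04, the transform converts it to 2003 February 4.
● NO: The year is not first; the order of the month and day is determined by the Input Month Before Day option. For example, with Input Month Before Day set to YES, 03/02/04 would be March 2, 2004.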

Date Format Specifies how to standardize date output.

● YEAR_MONTH_DAY: For example, 2012-08-16
● YEAR_DAY_MONTH: For example, 2012-16-08
● MONTH_DAY_YEAR: For example, 08-16-2012
● DAY_MONTH_YEAR: For example, 16-08-2012

Date Delimiter Specifies what character to use for standard date output delimiters.

● BACKSLASH: Uses backward slashes as the delimiter for the date. For example, 04\01\2010.
● DASH: Uses dashes as the delimiter for the date. For example, 04-01-2010.
● SLASH: Uses forward slashes as the delimiter for the date. For example, 04/01/2010.
● NONE: Does not add a delimiter to the date. For example, 04012010.
● PERIOD: Uses periods as the delimiter for the date. For example, 04.01.2010.
● SPACE: Uses spaces as the delimiter for the date. For example, 04 01 2010.

Numeric Format Specifies the format of numeric date values.

● Arabic_Numbers: Returns numeric date values as Arabic numerals.
● Chinese_Japanese_Numbers: Returns numeric date values as Chinese or Japanese numerals.

Enable Zero Pad Specifies placement of a zero on the front of one-digit days and months. For example, July 4 could be 04-07 (or 07-04) with a zero pad, and 4-7 (or 7-4) without a zero pad.

● YES: Turns on the option.
● NO: Turns off the option.

Month Format Specifies how to standardize date and month components.

● FULL_TEXT: Standardizes output with spelled-out months in English (for example, March).
● NUMERIC: Standardizes output with numeric months (for example, 03).
● SHORT_TEXT: Standardizes output with abbreviated months in English (for example, Mar).

Year Format Specifies how to standardize date and year components.

● FULL_YEAR: Standardizes output with four-digit years (for example, 2004).
● SHORT_YEAR: Standardizes output with two-digit years (for example, 04).
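The date options interact, so a combined C++ sketch may help. It is an illustration under stated assumptions, not the transform's code: all names are hypothetical, a year-first date is assumed to be read as year, month, day (as in the Input Year First example), only numeric month output is shown, and Numeric Format is not modeled.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

struct YMD { int year, month, day; };

// Century Threshold: two-digit years <= threshold map to 20xx, others to 19xx.
int expandTwoDigitYear(int yy, int centuryThreshold) {
    return yy <= centuryThreshold ? 2000 + yy : 1900 + yy;
}

// Input Year First / Input Month Before Day: assigns three numeric parts,
// given in input order (03/02/04 -> 3, 2, 4), to year, month, and day.
YMD interpretDate(int p1, int p2, int p3, bool inputYearFirst,
                  bool inputMonthBeforeDay, int centuryThreshold) {
    YMD d{};
    if (inputYearFirst)           d = {p1, p2, p3};  // year, month, day (assumed)
    else if (inputMonthBeforeDay) d = {p3, p1, p2};  // month, day, year
    else                          d = {p3, p2, p1};  // day, month, year
    if (d.year < 100) d.year = expandTwoDigitYear(d.year, centuryThreshold);
    return d;
}

enum class DateFormat { YearMonthDay, YearDayMonth, MonthDayYear, DayMonthYear };

// Date Format, Date Delimiter, Enable Zero Pad, and Year Format
// (Month Format = NUMERIC only).
std::string formatDate(const YMD& d, DateFormat fmt, const std::string& delimiter,
                       bool zeroPad, bool fullYear) {
    auto pad = [](int v, bool on) {
        std::ostringstream os;
        if (on) os << std::setw(2) << std::setfill('0');
        os << v;
        return os.str();
    };
    // SHORT_YEAR always uses two digits, for example 04.
    const std::string y = fullYear ? std::to_string(d.year) : pad(d.year % 100, true);
    const std::string m = pad(d.month, zeroPad);
    const std::string dd = pad(d.day, zeroPad);
    switch (fmt) {
        case DateFormat::YearMonthDay: return y + delimiter + m + delimiter + dd;
        case DateFormat::YearDayMonth: return y + delimiter + dd + delimiter + m;
        case DateFormat::MonthDayYear: return m + delimiter + dd + delimiter + y;
        case DateFormat::DayMonthYear: return dd + delimiter + m + delimiter + y;
    }
    return {};
}
```

With a Century Threshold of 11, `interpretDate(3, 2, 4, true, true, 11)` gives 2003-02-04, and `expandTwoDigitYear(12, 11)` gives 1912, matching the examples in the table.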

13.9 Phone options

The PHONE_OPTIONS group controls how the Data Cleanse transform standardizes phone output.

Option Description

ISO2 Country Code Sequence Specify the applicable two-character codes to search for phone information. The order in which you place the codes in the sequence determines the order in which Data Cleanse searches for phone information.

For example, if most of your data contains records from Germany and Australia, make sure the ISO2 code for Germany comes first, followed by Australia: DE|AU

The default setting is Global, which is optional. If you want to keep Global in your list, you should place it last in the sequence. For example: DE|AU|GLOBAL.

For this example sequence, the transform first determines if the phone data matches phone data from Germany. If the data doesn't parse as German phone data, the transform then checks if the data matches Australian phone data. If the phone data doesn't parse as Australian phone data, the transform performs a global search and runs the data through the international regular expressions that are set in the cleansing package. Data Cleanse always performs a global search last, regardless of the order in which you place Global in the sequence.

North American Phone Parens Area Controls placement of parentheses () around the area code of phone number output following the North American Numbering Plan (NANP).

● YES: Includes the parentheses. For example, (123) 656-5000.
● NO: Omits the parentheses. For example, 123 656-5000.

North American Phone Delimiter After Area Specifies placement of a delimiter between the area code and the prefix in phone output following the North American Numbering Plan (NANP). To use this option, you must also specify a delimiter in the North_American_Phone_Delimiter option.

● YES: Adds a delimiter. For example, 123-656-5000.
● NO: Does not add a delimiter. For example, 123 656-5000.

North American Phone Delimiter Specifies a character to use as a delimiter for phone output following the North American Numbering Plan (NANP).

● BACKSLASH: Uses backward slashes as the delimiter in the phone number. For example, 123\656\5000.
● DASH: Uses dashes as the delimiter in the phone number. For example, 123-656-5000.
● SLASH: Uses forward slashes as the delimiter in the phone number. For example, 123/656/5000.
● NONE: Does not add a delimiter to the phone number. For example, 1236565000.
● PERIOD: Uses periods as the delimiter in the phone number. For example, 123.656.5000.
● SPACE: Uses spaces as the delimiter in the phone number. For example, 123 656 5000.
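To see how the three NANP options combine, here is a minimal sketch of a formatter. The enum and function names are hypothetical, and the spacing rule is an assumption inferred from the examples above: a single space follows the area code when parentheses are used or when no delimiter is placed after the area code.

```cpp
#include <cassert>
#include <string>

// Hypothetical values mirroring the North American Phone Delimiter option.
enum class Delimiter { Backslash, Dash, Slash, None, Period, Space };

std::string delimiterText(Delimiter d) {
    switch (d) {
        case Delimiter::Backslash: return "\\";
        case Delimiter::Dash:      return "-";
        case Delimiter::Slash:     return "/";
        case Delimiter::None:      return "";
        case Delimiter::Period:    return ".";
        case Delimiter::Space:     return " ";
    }
    return "";
}

// Formats a 10-digit NANP number from its area code, prefix, and line number.
// Assumption: a space separates the area code from the prefix when the area
// code is parenthesized or when no delimiter is placed after the area code.
std::string formatNanp(const std::string& area, const std::string& prefix,
                       const std::string& line, bool parensArea,
                       bool delimiterAfterArea, Delimiter delimiter) {
    const std::string d = delimiterText(delimiter);
    std::string out = parensArea ? "(" + area + ")" : area;
    out += (delimiterAfterArea && !parensArea) ? d : " ";
    out += prefix + d + line;  // delimiter between prefix and line number
    return out;
}
```

With DASH and parentheses, `formatNanp("123", "656", "5000", true, false, Delimiter::Dash)` produces (123) 656-5000, matching the example for the parentheses option.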

Phone Extension Text Specifies the standard text for a phone extension. For example, Ext.

13.10 Parser configuration

Controls which parsing engines Data Cleanse uses for parsing multiline fields and the order in which they are applied. If a particular parser is not included, Data Cleanse does not look for that type of data in the input field.

Option Description

Parser Sequence Multiline1-12 There are up to twelve multiline input fields available. Use this option to assign one or more parsers to each field.

Note

Order is important. To order your parsers, enter the names of the parsers you want to use (listed below), separated by a pipe (|). For example, for the Multiline1 input field, you could order your parsers like this: EMAIL|SSN|DATE

EMAIL: Parses data as an e-mail address.

SSN: Parses data as a U.S. Social Security number.

DATE: Parses data as a date.

PHONE: Parses data as a telephone number.

PERSON: Parses data as a personal name.

FIRM: Parses data as a firm name.

PERSON_OR_FIRM: Parses data as a personal or firm name.
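A minimal sketch of checking a Parser Sequence value before you put it in your configuration. The function `parseParserSequence` is hypothetical, not part of the SDK. It only checks that each pipe-separated name is one of the parsers listed above, and it keeps them in the order given, because order is significant.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical validator for a Parser Sequence Multiline<n> value such as
// "EMAIL|SSN|DATE". Returns the parsers in the order given and throws
// std::invalid_argument for a name that is not a listed parser.
std::vector<std::string> parseParserSequence(const std::string& value) {
    static const std::set<std::string> known = {
        "EMAIL", "SSN", "DATE", "PHONE", "PERSON", "FIRM", "PERSON_OR_FIRM"};
    std::vector<std::string> parsers;
    std::stringstream in(value);
    std::string name;
    while (std::getline(in, name, '|')) {
        if (known.count(name) == 0) {
            throw std::invalid_argument("unknown parser: " + name);
        }
        parsers.push_back(name);  // order is preserved
    }
    return parsers;
}
```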

14 Geocoder

The Geocoder transform uses geographic coordinates expressed as latitude and longitude, addresses, and point-of-interest (POI) data to append data to your records. Using the transform, you can append address, latitude and longitude, census data, and other information. For census data, you can compare data from two census periods, when available.

Based on mapped input fields, the Geocoder transform has two modes of geocode processing:

● point-of-interest and address geocoding
● point-of-interest and address reverse geocoding

In general, the transform uses geocoding directories to calculate latitude and longitude values for a house by interpolating between the beginning and ending points of a line segment, where the line segment represents a range of houses. As a result, the latitude and longitude values may be slightly offset from the house's actual location.
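The interpolation idea can be sketched as a linear interpolation along one straight segment. This is a simplified illustration with hypothetical names (`interpolateHouse`, `LatLon`). The Geocoder's actual algorithm and directory format are not described in this guide, and the sketch ignores details such as separate house ranges for each side of the street.

```cpp
#include <cassert>
#include <cmath>

struct LatLon {
    double lat;
    double lon;
};

// A segment covers house numbers [lowHouse, highHouse] between two endpoints.
// The house is placed at its proportional position along the segment.
LatLon interpolateHouse(int lowHouse, int highHouse, int house,
                        LatLon start, LatLon end) {
    double t = 0.0;
    if (highHouse != lowHouse) {
        t = static_cast<double>(house - lowHouse) / (highHouse - lowHouse);
    }
    if (t < 0.0) t = 0.0;  // clamp numbers outside the range to the endpoints
    if (t > 1.0) t = 1.0;
    return {start.lat + t * (end.lat - start.lat),
            start.lon + t * (end.lon - start.lon)};
}
```

House 150 on a 100-200 segment lands halfway between the endpoints. This shows why interpolated coordinates approximate, rather than pinpoint, the house.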

The Geocoder transform also supports geocoding parcel directories, which contain the most precise and accurate latitude and longitude values available for addresses, depending on the available country data. Geocoding parcel data is stored as points, so rather than getting you near the house, it takes you to the exact door.

Typically, the Geocoder transform is used in conjunction with the Global Address Cleanse or USA Regulatory Address Cleanse transform.

14.1 POI and address geocoding

In address geocoding mode, the Geocoder transform assigns geographic data. Based on the completeness of the input address data, the Geocoder transform can return multiple levels of latitude and longitude data. Including latitude and longitude information in your data may help your organization to target certain population sizes and other regional geographical data.

If you have a complete address as input data, including the primary number, the Geocoder transform returns the latitude and longitude coordinates of the exact location.

If you have an address that has only a locality or postcode, you receive coordinates for the locality or postcode area, respectively.

Point-of-interest geocoding lets you provide an address or geographical coordinates to return a list of locations that match your criteria within a geographical area. A point of interest, or POI, is a name of a location that is useful or interesting, such as a gas station or historical monument.

14.2 POI and address reverse geocoding

Reverse geocoding lets you identify the closest address or point of interest based on an input reference location, which can be one of the following:

● latitude and longitude
● address
● point of interest

Mapping the optional radius input field lets you define the distance from the specified reference point and identify an area in which matching records are located.

With reverse geocoding, you can find one or more locations that can be points of interest, addresses, or both by setting the Search_Filter_Name or Search_Filter_Type input field. This limits the output matches to your search criteria. To return an address only, enter ADDR in the Search_Filter_Type input field. To return a point of interest only, enter the point-of-interest name or type. If you don't set a search filter, the transform returns both addresses and points of interest.
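To illustrate how the radius, search filter, and record limit narrow the results, the following sketch filters a hypothetical in-memory list of candidates by great-circle (haversine) distance. The `Candidate` type, the haversine formula, and treating POI types as plain strings are assumptions for illustration; the transform's internal search is not documented here. Only the ADDR filter value comes from this guide.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

struct Candidate {
    std::string name;
    std::string type;  // "ADDR" for an address; otherwise a POI type (assumed)
    double lat;        // degrees
    double lon;        // degrees
};

double degToRad(double deg) { return deg * 3.14159265358979323846 / 180.0; }

// Great-circle distance in kilometers (haversine formula, mean Earth radius).
double distanceKm(double lat1, double lon1, double lat2, double lon2) {
    const double dLat = degToRad(lat2 - lat1);
    const double dLon = degToRad(lon2 - lon1);
    const double a = std::sin(dLat / 2) * std::sin(dLat / 2) +
                     std::cos(degToRad(lat1)) * std::cos(degToRad(lat2)) *
                         std::sin(dLon / 2) * std::sin(dLon / 2);
    return 2.0 * 6371.0 * std::asin(std::sqrt(a));
}

// Keeps candidates within radiusKm of the reference point whose type equals
// filterType (an empty filter keeps everything), nearest first, capped at
// maxRecords.
std::vector<Candidate> reverseSearch(double refLat, double refLon,
                                     const std::vector<Candidate>& candidates,
                                     double radiusKm,
                                     const std::string& filterType,
                                     std::size_t maxRecords) {
    auto dist = [&](const Candidate& c) {
        return distanceKm(refLat, refLon, c.lat, c.lon);
    };
    std::vector<Candidate> hits;
    for (const Candidate& c : candidates) {
        if (dist(c) <= radiusKm && (filterType.empty() || c.type == filterType)) {
            hits.push_back(c);
        }
    }
    std::sort(hits.begin(), hits.end(),
              [&](const Candidate& x, const Candidate& y) { return dist(x) < dist(y); });
    if (hits.size() > maxRecords) hits.resize(maxRecords);
    return hits;
}
```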

15 Geocoder transform reference


The Geocoder transform is flexible enough to accept new country directory data immediately after the directory data is released. There is no need to wait for the next Data Quality Management SDK release to begin using new country directory data. At the time of this publication, the Geocoder transform uses US, Canadian, French, German, and UK directory data. Check with your sales representative for a list of the most current country directories available.

The following sections describe the configuration options in the Geocoder XML. You can find example XML configurations in the samples installed with the product.

15.1 Directories

The Geocoder directories are designed specifically for use with the Geocoder transform. You must install the directories and point to them in the Reference Path. Your system administrator should have already installed these files to the appropriate locations.

For the Geocoder transform, all of the U.S. geocoding directories, including the ageo*.dir and cgeo2.dir files, must be in the same location. If they are currently installed in different locations for the USA Regulatory Address Cleanse GeoCensus functionality, you must move them. Other non-U.S. geocoding directories may be installed in different locations.

15.2 System group

Syntax

Option Description

Link dir Enter the relative path to the match support files.

For example, the default Windows 32-bit location of these files is C:\Program Files\SAP BusinessObjects\Data quality Mgmt SDK\windows_32\DataQuality\geocoder.

You would then enter C:\Program Files\SAP BusinessObjects\Data quality Mgmt SDK\windows_32\

Shared object Enter the name of the transform.

15.3 Geocoder options

The Geocoder transform includes options that control how geocoding data is appended to your data.

15.3.1 Report and analysis

Use this option to generate report data for the Geocoder transform.

Option Description

Generate Report Data This option controls the statistics output for the StatisticsHandler.

YES: Generates statistics for this transform.

NO: Turns off statistics generation.

15.3.2 Reference files

The Reference files option specifies the file path of the directories needed for the Geocoder transform to process your data.

Related Information

Directories [page 215]

15.3.2.1 Geocoder options

Specifies the assignment level options. This option group is required.

Option Description

Best Assignment Level Specifies the depth of assignment for the latitude and longitude output fields. This option is used for address and point-of-interest geocoding, and also for reverse geocoding when an address is used as the input reference point.

PREFERRED: Assigns to the finest depth. For example, the latitude and longitude of the house number. By default, this will assign to the Primary Number level.

PRIMARY_NUMBER: Assigns to the finest depth. For example, the latitude and longitude of the house number. The software first attempts to assign to the primary number, then postcode, and finally locality if primary number and postcode are not found. If data is found for primary number, then postcode and locality are also assigned.

POSTCODE: Assigns to the postcode level. You will not receive a primary number assignment.

LOCALITY: Assigns to the locality, city or suburb level. You will not receive a primary number or postcode assignment.

SMALLEST_AREA: Assigns to the finest depth based on the size of the locality and postcode. The software first attempts to assign to the primary number. If the primary number is not returned, then it assigns based on postcode or locality, depending on which level is the smaller area. If data is found for primary number, then postcode and locality are also assigned. Geocoder compares the locality data to the postcode data, and then assigns to the level based on the smallest area. For example, the French postcode 75014 is a smaller area than the locality of .
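The fallback order for PRIMARY_NUMBER and SMALLEST_AREA can be sketched as follows. `AvailableData` and `assignLevel` are hypothetical. Representing "smaller area" as an area in square kilometers is an assumption made only to express the comparison; the guide does not say how the Geocoder measures area.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical summary of what the directories returned for one record.
struct AvailableData {
    bool primaryNumber = false;             // house-level coordinates found
    std::optional<double> postcodeAreaKm2;  // set if a postcode centroid was found
    std::optional<double> localityAreaKm2;  // set if a locality centroid was found
};

// smallestArea == false models PRIMARY_NUMBER: primary number, then
// postcode, then locality. smallestArea == true models SMALLEST_AREA:
// primary number, then whichever of postcode or locality covers less area.
std::string assignLevel(bool smallestArea, const AvailableData& d) {
    if (d.primaryNumber) return "PRIMARY_NUMBER";
    if (smallestArea && d.postcodeAreaKm2 && d.localityAreaKm2) {
        return *d.postcodeAreaKm2 <= *d.localityAreaKm2 ? "POSTCODE" : "LOCALITY";
    }
    if (d.postcodeAreaKm2) return "POSTCODE";
    if (d.localityAreaKm2) return "LOCALITY";
    return "NONE";  // nothing to assign
}
```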

Option Description

Locality Assignment Threshold Limits the level of locality centroid assignment. For example, setting the option to Locality1 excludes Locality2-4 during assignment, even though the software may return values at those levels. However, if you set the option to Locality4 and there is no Locality4 data, the finest available data is shown, even if that is at the Locality2 level.

This option is used for address and point-of-interest geocoding, and also for reverse geocoding when an address is used as the input reference point.

LOCALITY1-4: Returns the locality level that you choose. Locality1 is the most general and Locality4 is the most specific.

Address          Locality level
Church Cottage   Locality3
Pemborough       Locality2
Bristol          Locality1

In this example, there is no Locality4. If you choose Locality4, the finest available depth, Locality3, is returned.
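The threshold behavior in the Church Cottage example can be expressed as a small helper. This sketch uses a hypothetical function name: the threshold caps the depth, and when the data stops short of the threshold, the finest available level is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper. localities[0] is Locality1 (most general),
// localities[1] is Locality2, and so on. Returns the locality used for
// centroid assignment under a Locality Assignment Threshold of 1-4.
std::string localityForThreshold(const std::vector<std::string>& localities,
                                 int threshold) {
    if (localities.empty() || threshold < 1) return "";
    // Cap the depth at the threshold, but never beyond the data available.
    const std::size_t depth = std::min<std::size_t>(threshold, localities.size());
    return localities[depth - 1];
}
```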

NONE: Skips the specific assignment level. Use this setting if you do not want to apply an assignment threshold to locality.

PREFERRED: Assigns the finest depth. For example, the latitude and longitude of the city.

Postcode Assignment Threshold Limits the level of postcode centroid assignment. For example, setting the option to Postcode1 excludes the other levels during assignment, even though the application may return values at those levels.

This option is used for address and point-of-interest geocoding, and also for reverse geocoding when an address is used as the input reference point.

POSTCODE_FULL: Assigns to the entire extended postcode. For example, in the USA, it assigns to the 5-digit postcode and all four digits of the ZIP+4.

POSTCODE1: Assigns to the city or postcode area. For example, in the USA, it assigns to the 5-digit ZIP Code.

POSTCODE2_PARTIAL: Assigns to the first few characters of the extended postcode. For example, in the USA, it assigns to the 5-digit postcode and the first two digits of the ZIP+4.

PREFERRED: Assigns the finest depth. For example, the latitude and longitude of the city or postcode area.

NONE: Skips the specific assignment level. Use this setting if you do not want to apply an assignment threshold to postcode.

Option Description

Offset Coordinates Specifies whether the offset values of latitude and longitude are returned if the side of the street is known. This option is used for address and point-of-interest geocoding, and also for reverse geocoding when an address is used as the input reference point.

YES: Returns the offset values.

NO: Returns the center value regardless of whether the side of the street is known.

Distance Unit Specifies the unit of distance used for the radius.

KILOMETERS

MILES

Default Max Records Specifies the maximum number of records that can be returned. The maximum number that you can enter is 100.

The value of the Max_Records input field takes precedence over the value of the Default Max Records option. The value of the Default Max Records option is only used if the Max_Records input field is not mapped or is blank.

Default Radius The distance from a specified reference point used to identify an area in which matching records are located.

The value of the Radius input field takes precedence over the value of the Default Radius option. The value of the Default Radius option is only used if the Radius input field is not mapped or is blank. If a radius is not specified, a default radius of one kilometer is used.
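A minimal sketch of the precedence rules for Radius and Max_Records, assuming an unmapped or blank input field is represented as `std::nullopt`. The function names are hypothetical. Two behaviors are assumptions: the Distance Unit conversion (1 mile = 1.609344 km) applies to both the field and the option, and the one-kilometer fallback applies regardless of Distance Unit. The 100-record limit on the option is not checked here.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

enum class DistanceUnit { Kilometers, Miles };

// Field first, then the Default Radius option, then one kilometer.
double effectiveRadiusKm(std::optional<double> radiusField,
                         std::optional<double> defaultRadiusOption,
                         DistanceUnit unit) {
    constexpr double kKmPerMile = 1.609344;
    const double toKm = unit == DistanceUnit::Miles ? kKmPerMile : 1.0;
    if (radiusField) return *radiusField * toKm;                  // field wins
    if (defaultRadiusOption) return *defaultRadiusOption * toKm;  // then option
    return 1.0;  // documented fallback: one kilometer
}

// Max_Records field first, then the Default Max Records option.
int effectiveMaxRecords(std::optional<int> maxRecordsField, int defaultMaxRecords) {
    return maxRecordsField ? *maxRecordsField : defaultMaxRecords;
}
```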

16 Match

Choosing a match strategy

Think about the answers to these questions before deciding on a match strategy:

● What does my data consist of? (Customer data, international data, and so on.)
● What fields do I want to compare? (Last name, firm, and so on.)
● What are the relative strengths and weaknesses of the data in those fields?

Tip

You will get better results if you cleanse your data before matching. Also, data profiling can help you answer this question.

● What end result do I want when the match job is complete? (One record per family, per firm, and so on.)

Here are a few example strategies to help you decide how to approach the setup of your match processing.

● Simple match. Use this strategy when your matching business rules consist of a single match criterion for identifying relationships in consumer, business, or product data.
● Consumer householding. Use this strategy when your matching business rules consist of multiple levels of consumer relationships, such as residential matches, family matches, and individual matches.
● Corporate householding. Use this strategy when your matching business rules consist of multiple levels of corporate relationships, such as corporate matches, subsidiary matches, and contact matches.
● Multinational consumer match. Use this strategy when your data comes from multiple countries and your matching business rules differ by country.

16.1 Match components

The basic components of matching are:

● Match sets
● Match levels
● Match criteria

Match sets

A match set is represented by a Match transform; each match set can have its own match criteria and business rules.

Match sets let you control how the Match transform matches certain records, segregate records, and match on records independently. For example, you could choose to match U.S. records differently than records containing international data.

A match set has these purposes:

● To allow only select data into a given set of match criteria for possible comparison (for example, to exclude blank SSNs, international addresses, and so on).
● To allow related match scenarios to be stacked to create a multi-level match set.

Match levels

A match level indicates the type of matching that occurs, such as individual, family, resident, or firm. A match level refers not to a specific criterion, but to a broad category of matching.

You can have as many match levels as you want, and you can define each match level in a match set to be increasingly strict. In multi-level matching, only the records that match at one level are passed to the next level for comparison (for example, from resident to family to individual).
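Multi-level matching can be illustrated with exact-key grouping. In this sketch, which uses hypothetical types and exact string equality in place of real match criteria, each level splits the groups produced by the previous level. Records left alone in a group did not match anything and are not passed on.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

struct Person {
    std::string first;
    std::string last;
    std::string address;
};
using Group = std::vector<Person>;
using KeyFn = std::function<std::string(const Person&)>;

// Applies each level in turn; only groups of two or more records (that is,
// records that matched at this level) are fed to the next level.
std::vector<Group> matchLevels(const Group& records, const std::vector<KeyFn>& levels) {
    std::vector<Group> groups = {records};
    for (const KeyFn& key : levels) {
        std::vector<Group> next;
        for (const Group& g : groups) {
            std::map<std::string, Group> byKey;
            for (const Person& p : g) byKey[key(p)].push_back(p);
            for (const auto& kv : byKey) {
                if (kv.second.size() > 1) next.push_back(kv.second);
            }
        }
        groups = next;
    }
    return groups;
}
```

With address, last name, and first name as the three levels (resident, family, individual), only records that share an address are ever compared on last name.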

Match component Description

Family The purpose of the family match type is to determine whether two people should be considered members of the same family, as reflected by their record data. The Match transform compares the last name and the address data. A match means that the two records represent members of the same family. The result of the match is one record per family.

Individual The purpose of the individual match type is to determine whether two records are for the same person, as reflected by their record data. The Match transform compares the first name, last name, and address data. A match means that the two records represent the same person. The result of the match is one record per individual.

Resident The purpose of the resident match type is to determine whether two records should be considered members of the same residence, as reflected by their record data. The Match transform compares the address data. A match means that the two records represent members of the same household. Contrast this match type with the family match type, which also compares last-name data. The result of the match is one record per residence.

Firm The purpose of the firm match type is to determine whether two records reflect the same firm. This match type involves comparisons of firm and address data. A match means that the two records represent the same firm. The result of the match is one record per firm.

Firm-Individual The purpose of the firm-individual match type is to determine whether two records are for the same person at the same firm, as reflected by their record data. With this match type, the Match transform compares the first name, last name, firm name, and address data. A match means that the two records reflect the same person at the same firm. The result of the match is one record per individual per firm.

Match criteria

Match criteria refers to the field you want to match on. You can use criteria options to specify business rules for matching on each of these fields. They allow you to control how close to exact the data needs to be for that data to be considered a match.

For example, you may require first names to be at least 85% similar, but also allow a first name initial to match a spelled-out first name, and allow a first name to match a middle name.
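Here is one way to express the first-name rules from this example. It is only a sketch: it uses normalized Levenshtein similarity as a stand-in for the Match transform's own comparison, and the function names are hypothetical. The middle-name rule is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein edit distance, computed one row at a time.
std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Similarity = 1 - distance / longer length must reach minSimilarity, or a
// one-letter initial may match a spelled-out name starting with that letter.
bool firstNamesMatch(const std::string& a, const std::string& b, double minSimilarity) {
    if (a.empty() || b.empty()) return false;
    if ((a.size() == 1 || b.size() == 1) && a[0] == b[0]) return true;
    const double longer = static_cast<double>(std::max(a.size(), b.size()));
    return 1.0 - editDistance(a, b) / longer >= minSimilarity;
}
```

With an 85% threshold, Katherine and Catherine (one edit in nine letters) match, while Jon and John (one edit in four letters) do not.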

● Family level match criteria may include family (last) name and address, or family (last) name and telephone number.
● Individual level match criteria may include full name and address, full name and SSN, or full name and e-mail address.
● Firm level match criteria may include firm name and address, firm name and Standard Industrial Classification (SIC) Code, or firm name and Data Universal Numbering System (DUNS) number.

16.2 Physical and logical sources

Tracking your input data sources and other sources, whether based on an input source or based on some data element in the rows being read, is essential for producing informative match reports. Depending on what you are tracking, you must create the appropriate fields to ensure that the software generates the statistics you want, if you don't already have them in your database.

A physical source is the filename or value attributed to the source of the input data.

A logical source is a group of records spanning multiple input sources or a subset of records from a single input source.

Physical input sources

You track your input data source by assigning that physical source a value in a field. Then you will use this field in the transforms where report statistics are generated.

Add a column or field to your source to track this source ID.

Logical input sources

If you want to count source statistics in the Match transform (for the Match Source Statistics Summary report, for example), you must create a field using a Query transform or a User-Defined transform, if you don't already have one in your input data sources.

This field tracks the various sources within a Reader for reporting purposes, and is used in the Group Statistics operation of the Match transform to generate the source statistics. It is also used in compare tables, so that you can specify which sources to compare.

16.3 Using sources

A source is the grouping of records on the basis of some data characteristic that you can identify. A source might be all records from one input file (physical source), or all records that contain a particular value in a particular field (logical source).

Sources are abstract and arbitrary: there is no physical boundary line between sources. Source membership can cut across input files or database records as well as distinguish among records within a file or database, based on how you define the source.

If you are willing to treat all your input records as normal, eligible records with equal priority, then you do not need to include sources in your job.

Typically, a match user expects some characteristic or combination of characteristics to be significant, either for selecting the best matching record, or for deciding which records to include or exclude from a mailing list, for example. Sources enable you to attach those characteristics to a record, by virtue of that record’s membership in its particular source.

Before getting to the details about how to set up and use sources, here are some of the many reasons you might want to include sources in your job:

● To give one set of records priority over others. For example, you might want to give the records of your house database or a suppression source priority over the records from an update file.
● To identify a set of records that match suppression sources, such as the DMA.
● To set up a set of records that should not be counted toward multi-source status. For example, some mailers use a seed source of potential buyers who report back to the mailer when they receive a mail piece so that the mailer can measure delivery. These are special-type records.
● To save processing time by canceling comparisons within a set of records that you know contains no matching records. In this case, you must know that there are no matches within the source, although there may be matches among sources. You can then set up sources and cancel comparisons within each source.
● To get separate report statistics for a set of records within a source, or to get report statistics for groups of sources.

16.3.1 Source types

You can identify each source as one of three different types: Normal, Suppression, or Special. The software can process your records differently depending on their source type.

Source Description

Normal A Normal source is a group of records considered to be good, eligible records.

Suppress A Suppress source contains records that would often disqualify a record from use. For example, if you’re using Match to refine a mailing source, a suppress source can help remove records from the mailing. Examples:


● DMA Mail Preference File
● American Correctional Association prisons/jails sources
● No pandering or non-responder sources
● Credit card or bad-check suppression sources

Special A Special source is treated like a Normal source, with one exception. A Special source is not counted when determining whether a match group is single-source or multi-source. A Special source can contribute records, but it’s not counted toward multi-source status.

For example, some companies use a source of seed names. These are names of people who report when they receive advertising mail, so that the mailer can measure mail delivery. Appearance on the seed source is not counted toward multi-source status.

The reason for identifying the source type is to set that identity for each of the records that are members of the source. Source type plays an important role in how the software processes matching records (the members of match groups) and how the software produces output (that is, whether it includes or excludes a record from its output).
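The multi-source rule for Special sources can be sketched as follows. The types are hypothetical. Counting Suppress sources toward multi-source status is an assumption; this section only states that Special sources are not counted.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class SourceType { Normal, Suppress, Special };

struct MatchRecord {
    std::string source;  // logical source value
    SourceType type;
};

// A match group is multi-source when its records come from more than one
// source, not counting Special sources (Suppress is assumed to count).
bool isMultiSource(const std::vector<MatchRecord>& group) {
    std::set<std::string> counted;
    for (const MatchRecord& r : group) {
        if (r.type != SourceType::Special) counted.insert(r.source);
    }
    return counted.size() > 1;
}
```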

16.3.2 Source groups

The source group capability adds a higher level of source management. For example, suppose you rented several files from two brokers. You define five sources to be used in ranking the records. In addition, you would like to see your job’s statistics broken down by broker as well as by file. To do this, you can define groups of sources for each broker.

Source groups primarily affect reports. However, you can also use source groups to select multi-source records based on the number of source groups in which a name occurs.

16.4 Prepare data for matching

Data correction and standardization

Accurate matches depend on good data coming into the Match transform. For batch matching, we recommend that you perform address cleansing and Data Cleanse before you attempt matching.

Filter out empty records

You should filter out empty records before matching. This will help performance.

Noise words

You can perform a search and replace on words that are meaningless to the matching process. For matching on firm data, words such as Inc., Corp., and Ltd. can be removed.
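A minimal sketch of noise-word removal as a preprocessing step. `removeNoiseWords` and its word list are hypothetical; the list you use would depend on your data.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Drops whole-word noise terms from a firm name and rejoins the remaining
// words with single spaces.
std::string removeNoiseWords(const std::string& firm) {
    static const std::set<std::string> noise = {"Inc.", "Corp.", "Ltd.",
                                                "Inc", "Corp", "Ltd"};
    std::istringstream in(firm);
    std::string word;
    std::string out;
    while (in >> word) {
        if (noise.count(word)) continue;  // skip noise words
        if (!out.empty()) out += ' ';
        out += word;
    }
    return out;
}
```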

Break groups

Sorting and combining records into smaller groups (what we refer to as break groups) that are more likely to match is important for improving performance. Comparing one million records against each other takes a lot of time.

Break groups organize records into collections that are potential matches, thus reducing the number of comparisons that the Match transform must perform. For example, a break group may contain only records from a particular ZIP code or the first three digits of a ZIP code.

Note

You must create break groups yourself before the Match transform processes the data.
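Because you create break groups yourself, a sketch of that step may help. The types and functions are hypothetical. The key is the first three digits of the ZIP Code, one of the examples above. The comparison counter uses n(n-1)/2 pairs per group, which shows why break groups matter: comparing one million records pairwise requires 499,999,500,000 comparisons.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Rec {
    std::string name;
    std::string zip;
};

// Groups records by the first three digits of the ZIP Code.
std::map<std::string, std::vector<Rec>> breakGroups(const std::vector<Rec>& recs) {
    std::map<std::string, std::vector<Rec>> groups;
    for (const Rec& r : recs) groups[r.zip.substr(0, 3)].push_back(r);
    return groups;
}

// Number of pairwise comparisons among n records: n(n-1)/2.
std::size_t comparisons(std::size_t n) { return n * (n - 1) / 2; }

// Total comparisons when records are compared only within their break group.
std::size_t comparisonsWithBreakGroups(const std::map<std::string, std::vector<Rec>>& g) {
    std::size_t total = 0;
    for (const auto& kv : g) total += comparisons(kv.second.size());
    return total;
}
```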

Match standards

You may want to include variations of name or firm data in the matching process to help ensure a match. For example, a variation of Bill might be William. When making comparisons, you may want to use the original data and one or more variations. You can add anywhere from one to five variations or match standards, depending on the type of data.

For example, if the first names are compared but don't match, the variations are then compared. If the variations match, the two records can still match, rather than failing because the original first names did not match.
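The fallback to variations can be sketched as follows. `NameField` and `namesMatch` are hypothetical, and exact equality stands in for the real comparison rules.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct NameField {
    std::string original;
    std::vector<std::string> standards;  // one to five variations (match standards)
};

// Compares the originals first; if they differ, tries every combination of
// original and variation before declaring a non-match.
bool namesMatch(const NameField& a, const NameField& b) {
    std::vector<std::string> va = {a.original};
    std::vector<std::string> vb = {b.original};
    va.insert(va.end(), a.standards.begin(), a.standards.end());
    vb.insert(vb.end(), b.standards.begin(), b.standards.end());
    for (const auto& x : va) {
        for (const auto& y : vb) {
            if (x == y) return true;
        }
    }
    return false;
}
```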

Custom Match Standards

You can match on custom Data Cleanse output fields and associated aliases. Map the custom output fields from Data Cleanse.

16.4.1 Fields to include for matching

To take advantage of the wide range of features in the Match transform, you will need to map a number of input fields in addition to the ones that you want to use as match criteria.

Example

Here are some of the other fields that you might want to include. The names of the fields are not important, as long as you remember which field contains the appropriate data.

Field contents Contains...

Logical source A value that specifies from which logical source a record originated. This field is used in the Group Statistics operation and in compare tables.

Physical source A value that specifies from which physical source a record originated.

Criteria fields The fields that contain the data you want to match on.

This is not a complete list. Depending on the features you want to use, you may want to include other fields that will be used in the Match transform.

16.5 Compare tables

Compare tables are sets of rules that define which records to compare; they are effectively an additional way to create break groups. You use your logical source values to determine which records are or are not compared.

By using compare tables, you can compare records within sources, or you can compare records across sources, or a combination of both.
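You can think of a compare table as a set of allowed source pairs. This sketch uses a hypothetical `CompareTable` class and assumes comparisons are symmetric (comparing HOUSE with RENTAL is the same as comparing RENTAL with HOUSE).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// A pair (X, X) allows comparisons within source X; (X, Y) allows
// comparisons across sources X and Y.
class CompareTable {
public:
    void allow(const std::string& a, const std::string& b) {
        pairs_.insert(ordered(a, b));
    }
    bool shouldCompare(const std::string& a, const std::string& b) const {
        return pairs_.count(ordered(a, b)) > 0;
    }

private:
    // Stores each pair in a canonical order so lookups are symmetric.
    static std::pair<std::string, std::string> ordered(const std::string& a,
                                                       const std::string& b) {
        return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
    }
    std::set<std::pair<std::string, std::string>> pairs_;
};
```

Leaving out a pair such as (HOUSE, HOUSE) is how you would cancel comparisons within a source that you know contains no matches.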

16.6 Data Salvage

Data salvaging temporarily copies data from a passenger record to the driver record after comparing the two records. The data that’s copied is data that is found in the passenger record but is missing or incomplete in the driver record. Data salvaging prevents blank matching or initials matching from matching records that you may not want to match.

For example, we have the following match group. If you did not enable data salvaging, the records in the first table would all belong to the same match group because the driver record, which contains a blank Name field, matches both of the other records.

Record       Name         Address        Postcode
1 (driver)   (blank)      123 Main St.   54601
2            John Smith   123 Main St.   54601
3            Jack Hill    123 Main St.   54601

If you enabled data salvaging, the software would temporarily copy John Smith from the second record into the driver record. The result: Record #1 matches Record #2, but Record #1 does not match Record #3 (because John Smith doesn’t match Jack Hill).

Record       Name                                    Address        Postcode
1 (driver)   John Smith (copied from record below)   123 Main St.   54601
2            John Smith                              123 Main St.   54601
3            Jack Hill                               123 Main St.   54601
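The two tables can be reproduced with a small sketch of driver-versus-passenger comparison. The types are hypothetical, and the comparison is exact string equality, with a blank value matching anything. The salvaged value is copied only into a local copy of the driver, which reflects that salvaging is temporary.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Record {
    int id;
    std::string name;  // empty means blank
    std::string address;
};

// Blank matching: a blank value on either side matches anything.
static bool fieldMatches(const std::string& a, const std::string& b) {
    return a.empty() || b.empty() || a == b;
}

// Returns the ids of passengers (in ranked order) that match the driver.
// With salvage on, blank driver fields are filled from the first match.
std::vector<int> matchingIds(Record driver, const std::vector<Record>& passengers,
                             bool salvage) {
    std::vector<int> ids;
    for (const Record& p : passengers) {
        if (!fieldMatches(driver.name, p.name) ||
            !fieldMatches(driver.address, p.address)) {
            continue;
        }
        ids.push_back(p.id);
        if (salvage) {
            if (driver.name.empty()) driver.name = p.name;
            if (driver.address.empty()) driver.address = p.address;
        }
    }
    return ids;
}
```

Without salvaging, records 2 and 3 both match the driver; with salvaging, only record 2 does.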

The following example shows how this is used for a suppression source. Assume that the suppression source is a list of no-pandering addresses. In that case, you would set the suppression source to have the highest priority, and you would not enable data salvaging. That way, the software suppresses all records that match the suppression source records.

For example, a suppress record of 123 Main St would match 123 Main St #2 and 123 Main St Apt C; both of these would be suppressed.

16.6.1 Data salvaging and initials

When a driver record’s name field contains an initial, instead of a full name, the software may temporarily borrow the full name if it finds one in the corresponding field of a matching record. This is one form of data salvaging.

For illustration, assume that the following three records represent potentially matching records (for example, the software has grouped these as members of a break group, based on address and ZIP Code data).

Note

Initials salvaging only occurs with the given name and family name fields.

Record First name Last name Address Notes

357 J L 123 Main Driver

391 Juanita Lopez 123 Main

839 Joanne London 123 Main Lowest ranking record

The first match comparison will be between the driver record (357) and the next highest ranking record (391). These two records will be called a match. Juanita and Lopez are temporarily copied to the name fields of record 357.

The next comparison will be between record 357 and the next lower ranking record (839). With data salvaging, the driver record’s name data is now Juanita Lopez (as “borrowed” from the first comparison). Therefore, record 839 will probably be considered not to match record 357.

By retaining more information for the driver record, data salvaging helps improve the quality of your matching results.

Initials and suppress-type records

However, if the driver record is a suppress-type record, you may prefer to turn off data salvaging, to retain your best chance of identifying all the records that match the initialized suppression data. For example, if you want to suppress names with the initials JL (as in the case above), you would want to find all matches to JL regardless of the order in which the records are encountered in the break group.

If you have turned off data salvaging for the records of this suppression source, here is what happens during those same two match comparisons:

Record First name Last name Address Notes

357 J L 123 Main Driver

391 Juanita Lopez 123 Main

839 Joanne London 123 Main Lowest ranking record

The first match comparison will be between the driver record (357) and the next-highest ranking record (391). These two records will be called a match, since the driver record’s initials J and L match Juanita Lopez.

The next comparison will be between the driver record (357) and the next lower ranking record (839). This time these two records will also be called a match, since the driver record’s JL will match Joanne London.

Since both records 391 and 839 matched the suppress-type driver record, they are both designated as suppress matches, and, therefore, neither will be included in your output.
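Here is a minimal C++ sketch of the two walkthroughs above. It assumes a toy rule in which a one-letter value matches any value that starts with the same letter, and full values must be identical. `Person`, `fieldMatches`, and `matchDriver` are hypothetical names; the real Match transform scores names with its own similarity logic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record with only the name fields used in the example.
struct Person {
    int id;
    std::string first;
    std::string last;
};

bool isInitial(const std::string& s) { return s.size() == 1; }

// Toy rule (assumption): an initial matches any value that starts with the
// same letter; otherwise the values must be identical.
bool fieldMatches(const std::string& a, const std::string& b) {
    if ((isInitial(a) || isInitial(b)) && !a.empty() && !b.empty())
        return a[0] == b[0];
    return a == b;
}

// Returns the ids of the records that match the driver, compared in ranking
// order. With salvage on, the driver's initials are replaced by the full names
// of the first matching record; the copy lasts only for this pass.
std::vector<int> matchDriver(Person driver, const std::vector<Person>& others,
                             bool salvage) {
    std::vector<int> matched;
    for (const Person& p : others) {
        if (!fieldMatches(driver.first, p.first) ||
            !fieldMatches(driver.last, p.last))
            continue;
        matched.push_back(p.id);
        if (salvage) {
            if (isInitial(driver.first) && p.first.size() > 1) driver.first = p.first;
            if (isInitial(driver.last) && p.last.size() > 1) driver.last = p.last;
        }
    }
    return matched;
}
```

The driver is record 357 (J L), and records 391 and 839 are compared in ranking order. With salvaging on, the result is only 391. With salvaging off, as you would configure for a suppress-type source, the result is both 391 and 839.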

16.7 Overview of match criteria

Use match criteria in each match level to determine the threshold scores for matching and to define how to treat various types of data, such as numeric, blank, name data, and so on (your business rules).

Match criteria

To the Match transform, match criteria represent the fields you want to compare. For example, if you want to match on the first ten characters of a given name and the first fifteen characters of the family name, you must create two criteria that specify these requirements.

Criteria provide a way to let the Match transform know what kind of data is in the input field and, therefore, what types of operations to perform on that data.

Pre-defined vs. custom criteria

There are two types of criteria:

● Pre-defined criteria are available for fields that are typically used for matching, such as name, address, and other data. By assigning a criteria to a field, the Match transform is able to identify what type of data is in the field, which allows it to perform internal operations to optimize the data for matching, without altering the actual input data.
● Data Cleanse custom (user-defined, non party-data) output fields are available as pre-defined criteria. Map the custom output fields from Data Cleanse and the custom fields appear in the Match Editor's Criteria Fields tab.
● Any other types of data (such as part numbers or other proprietary data), for which a pre-defined criteria does not exist, should be designated as a custom criteria. Certain functions can be performed on custom keys, such as abbreviation, substring, and numeric matching, but the Match transform cannot perform some cross-field comparisons such as some name matching functions.

Match criteria pre-comparison options

The majority of your data standardization should take place in the address cleansing and Data Cleanse transforms. However, the Match transform can perform some preprocessing per criteria (and for matching purposes only; your actual data is not affected) to provide more accurate matches. The options to control this standardization are located in the group in the XML.

● Convert diacritical characters
● Convert text to numbers
● Convert to uppercase
● Remove punctuation
● Locale
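To illustrate what "for matching purposes only" means, here is a C++ sketch of two of these options, Convert to uppercase and Remove punctuation, applied to a copy of a field value. The `PreComparisonOptions` struct and `normalizeForMatch` function are hypothetical, and the sketch handles only ASCII text. Diacritic conversion, text-to-number conversion, and locale handling are not modeled.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical switches mirroring two of the pre-comparison options.
struct PreComparisonOptions {
    bool convertToUppercase = true;
    bool removePunctuation = true;
};

// Returns a normalized copy of value for comparison only; value itself is
// never modified. ASCII only: diacritics and locale rules are not handled.
std::string normalizeForMatch(const std::string& value,
                              const PreComparisonOptions& options) {
    std::string out;
    out.reserve(value.size());
    for (char ch : value) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (options.removePunctuation && std::ispunct(c)) continue;
        out += options.convertToUppercase ? static_cast<char>(std::toupper(c)) : ch;
    }
    return out;
}
```

For example, `normalizeForMatch("St. Mary's", PreComparisonOptions{})` returns `"ST MARYS"`, while the original string is left unchanged.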

16.8 Matching methods

There are a number of ways to set up and order your criteria to get the matching results you want. Each of these ways has advantages and disadvantages, so consider them carefully.

Match method Description

Rule-based Allows you to control which criteria determines a match. This method is easy to set up.

Weighted-scoring Allows you to assign importance, or weight, to any criteria. However, weighted scoring evaluates every rule before determining a match, which might increase processing time.

Combination method Same relative advantages and disadvantages as the other two methods.

16.8.1 Similarity score

The similarity score is the percentage that your data is alike. This score is calculated internally by the application when records are compared. Whether the application considers the records a match depends on the Match and No match scores you define in the XML (as well as other factors, but for now let's focus on these scores).

Example

This is an example of how similarity scores are determined. Here are some things to note:

● The comparison table below is intended to serve as an example. It does not show how the matching process works in, for example, the weighted scoring method.
● Only the first comparison is considered a match, because the similarity score met or exceeded the match score. The last two comparisons are considered no-matches because their similarity scores were less than or equal to the no-match score.
● When a single criteria cannot determine a match, as in the case of the second comparison in the table below, the process moves to the next criteria, if possible.

Comparison No match Match Similarity score Matching?

Smith > Smith 72 95 100% Yes

Smith > Smitt 72 95 80% Depends on other criteria

Smith > Smythe 72 95 72% No

Smith > Jones 72 95 20% No
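The decision for a single criteria can be sketched in C++ as follows. The inclusive comparisons at both thresholds are inferred from two places. First, in the table, a 72% score against a no-match score of 72 is a no-match. Second, in the rule-based example, adjacent no-match and match scores of 79 and 80 leave no gap. `Decision` and `decide` are hypothetical names.

```cpp
#include <cassert>

enum class Decision { Match, NoMatch, Undecided };

// Applies one criteria's thresholds to a similarity score (0-100).
Decision decide(int similarity, int noMatchScore, int matchScore) {
    if (similarity >= matchScore) return Decision::Match;      // met or exceeded
    if (similarity <= noMatchScore) return Decision::NoMatch;  // at or below
    return Decision::Undecided;  // move on to the next criteria, if any
}
```

With the table's scores (no match 72, match 95), `decide` returns Match for Smith/Smith, Undecided for Smith/Smitt, and NoMatch for Smith/Smythe and Smith/Jones.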

16.8.2 Rule-based method

With rule-based matching, you rely only on your match and no-match scores to determine matches within a criteria.

Example

This example shows how to set up this method in the Match transform.

Criteria Record A Record B No match Match Similarity score

Given Name1 Mary Mary 82 101 100

Family Name Smith Smitt 74 101 80

E-mail ms@sap.com msmith@sap.com 79 80 91

By entering a value of 101 in the match score for every criteria except the last, the Given Name1 and Family Name criteria never determine a match, although they can determine a no match.

By setting the Match score and No match score options for the E-mail criteria with no gap, any comparison that reaches the last criteria must either be a match or a no match.

A match score of 101 ensures that the criteria does not cause the records to be a match, because two fields cannot be more than 100 percent alike.

Remember

Order is important! For performance reasons, you should have the criteria that is most likely to make the match or no-match decisions first in your order of criteria. This can help reduce the number of criteria comparisons.
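The rule-based cascade can be sketched in C++ as a loop over the criteria in order. The first criteria to reach its match or no-match score decides the comparison. The types and function names are hypothetical, and the similarity scores are taken as given rather than computed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Decision { Match, NoMatch, Undecided };

// One criteria with its thresholds and a precomputed similarity score.
struct Criterion {
    std::string name;
    int noMatchScore;
    int matchScore;
    int similarity;  // 0-100 for the record pair being compared
};

struct RuleResult {
    Decision decision;
    std::size_t decidedBy;  // index of the deciding criteria
};

// Rule-based matching: walk the criteria in order; the first one whose score
// reaches its match or no-match threshold decides the comparison.
RuleResult ruleBased(const std::vector<Criterion>& criteria) {
    for (std::size_t i = 0; i < criteria.size(); ++i) {
        const Criterion& c = criteria[i];
        if (c.similarity >= c.matchScore) return {Decision::Match, i};
        if (c.similarity <= c.noMatchScore) return {Decision::NoMatch, i};
        // Between the thresholds: fall through to the next criteria.
    }
    return {Decision::Undecided, criteria.size()};
}
```

Consider the three criteria from the example table. Given Name1 (100) and Family Name (80) fall between their thresholds. E-mail (91) reaches its match score of 80, so `ruleBased` reports a match decided by the third criteria. Because a match score of 101 can never be reached, the earlier criteria can only end the cascade with a no-match.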

16.8.3 Weighted-scoring method

In a rule-based matching method, the application gives all of the criteria the same amount of importance (or weight). That is, if any criteria fails to meet the specified match score, the application determines that the records do not match.

When you use the weighted scoring method, you are relying on the total contribution score for determining matches, as opposed to using match and no-match scores on their own.

Contribution values

Contribution values are your way of assigning weight to individual criteria. The higher the value, the more weight that criteria carries in determining matches. In general, criteria that might carry more weight than others include account numbers, Social Security numbers, customer numbers, Postcode1, and addresses.

Note

All contribution values for all criteria that have them must total 100. You do not need to have a contribution value for all of your criteria.

You can define a criteria's contribution value in the option in the in the XML.

Contribution and total contribution score

The Match transform generates the contribution score for each criteria by multiplying the contribution value you assign with the similarity score (the percentage alike). These individual contribution scores are then added to get the total contribution score.

Weighted match score

In the weighted scoring method, matches are determined only by comparing the total contribution score with the weighted match score. If the total contribution score is equal to or greater than the weighted match score, the records are considered a match. If the total contribution score is less than the weighted match score, the records are considered a no-match.

You can set the weighted match score in the Weighted match score option of the in the XML.

Example

The following table is an example of how to set up weighted scoring. Notice the various types of scores that we have discussed. Also notice the following:

● When setting up weighted scoring, the No match score option must be set to -1, and the Match score option must be set to 101. These values ensure that neither a match nor a no-match can be found by using these scores.
● We have assigned a contribution value to the E-mail criteria that gives it the most importance.

Criteria Record A Record B No match Match Similarity score Contribution value Contribution score (similarity × contribution value)

First Name Mary Mary -1 101 100 25 25

Last Name Smith Smitt -1 101 80 25 20

E-mail ms@sap.com msmith@sap.com -1 101 84 50 42

Total contribution score: 87

If the weighted match score is 87, then any comparison whose total contribution score is 87 or greater is considered a match. In this example, the comparison is a match because the total contribution score is 87.
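The arithmetic from the table can be sketched in C++ as follows. Each contribution score is the contribution value multiplied by the similarity percentage, and the sum is compared with the weighted match score. The names are hypothetical. The integer truncation of each contribution score is an assumption, and it does not affect this example, whose products are whole numbers.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct WeightedCriterion {
    std::string name;
    int contributionValue;  // the weights of all weighted criteria total 100
    int similarity;         // similarity score, 0-100
};

// Sums contribution value x similarity% over all criteria. Integer arithmetic
// truncates each contribution score (an assumption about rounding).
int totalContributionScore(const std::vector<WeightedCriterion>& criteria) {
    int valueSum = 0;
    int total = 0;
    for (const WeightedCriterion& c : criteria) {
        valueSum += c.contributionValue;
        total += c.contributionValue * c.similarity / 100;
    }
    assert(valueSum == 100 && "contribution values must total 100");
    return total;
}

// Weighted-scoring decision: a match when the total reaches the weighted match score.
bool isWeightedMatch(const std::vector<WeightedCriterion>& criteria, int weightedMatchScore) {
    return totalContributionScore(criteria) >= weightedMatchScore;
}
```

With the table's values, the total is 25 + 20 + 42 = 87. So `isWeightedMatch(criteria, 87)` is true, and a weighted match score of 88 would make the same comparison a no-match.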

16.8.4 Combination method

This method combines the rule-based and weighted scoring methods of matching.

Criteria Record A Record B No match Match Similarity score Contribution value Contribution score (actual similarity × contribution value)

First Name Mary Mary 59 101 100 25 25

Last Name Smith Hope 59 101 22 N/A (No Match) N/A

E-mail ms@sap.com msmith@sap.com 49 101 N/A N/A N/A

Total contribution score: N/A

In this example, the First Name similarity score of 100 falls between its no-match and match scores, so First Name contributes 25 to the total. The Last Name similarity score of 22 is at or below its no-match score of 59, so the records are a no-match at that point. E-mail is not compared, and no total contribution score is calculated.
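Here is a minimal C++ sketch of the combination method. It makes three assumptions. First, criteria are evaluated in order. Second, a criteria that reaches its no-match or match score ends the comparison, as Last Name does in the table. Third, otherwise its contribution score is added to a running total that is finally compared with the weighted match score. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Decision { Match, NoMatch };

struct ComboCriterion {
    std::string name;
    int noMatchScore;
    int matchScore;
    int contributionValue;
    int similarity;  // 0-100 for this record pair
};

struct ComboResult {
    Decision decision;
    bool decidedByRule;     // true: a criteria's own threshold decided
    std::size_t compared;   // number of criteria actually evaluated
    int totalContribution;  // meaningful only when decidedByRule is false
};

ComboResult combination(const std::vector<ComboCriterion>& criteria, int weightedMatchScore) {
    int total = 0;
    for (std::size_t i = 0; i < criteria.size(); ++i) {
        const ComboCriterion& c = criteria[i];
        // Rule-based part: a threshold hit ends the comparison immediately.
        if (c.similarity <= c.noMatchScore) return {Decision::NoMatch, true, i + 1, 0};
        if (c.similarity >= c.matchScore) return {Decision::Match, true, i + 1, 0};
        // Weighted part: otherwise accumulate this criteria's contribution score.
        total += c.contributionValue * c.similarity / 100;
    }
    Decision d = total >= weightedMatchScore ? Decision::Match : Decision::NoMatch;
    return {d, false, criteria.size(), total};
}
```

With the table's First Name and Last Name scores, the comparison stops at Last Name (22 is at or below 59). E-mail is never evaluated and no total contribution score is produced, which matches the N/A entries in the table. The Last Name and E-mail contribution values and the weighted match score needed to run the function are not given in the table, so any values you supply for them are hypothetical.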

16.9 Matching business rules

An important part of the matching process is determining how you want to handle various forms of and differences in your data. For example, if every field in a record matched another record's fields, except that one field was blank and the other record's field was not, would you want these records to be considered matches? Figuring out what you want to do in these situations is part of defining your business rules. Match criteria are where you define most of your business rules.

16.9.1 Matching on strings, abbreviations, and initials

Initials and acronyms

Use the Initials adjustment score option to allow matching initials to whole words. For example, "International Health Providers" can be matched to "IHP".

Abbreviations

Use the Abbreviation adjustment score option to allow matching whole words to abbreviations. For example, "International Health Providers" can be matched to "Intl Health Providers".

String data

Use the Substring adjustment score option to allow matching longer strings to shorter strings. For example, the string "Mayfield Painting and Sand Blasting" can match "Mayfield painting".

16.9.2 Extended abbreviation matching

Extended abbreviation matching offers functionality that handles situations not covered by the Initials adjustment score, Substring adjustment score, or Abbreviation adjustment score options. For example, you might encounter the following situations:

● Suppose you have localities in your data such as La Crosse and New York. However, you also have these same localities listed as LaCrosse and NewYork (without spaces). Under normal matching, you cannot designate these (La Crosse/LaCrosse and New York/NewYork) as matching 100%; the spaces prevent this. (These would normally be 94 and 93 percent matching.)
● Suppose you have Metropolitan Life and MetLife (an abbreviation and combination of Metropolitan Life) in your data. The Abbreviation adjustment score option cannot detect the combination of the two words.

If you are concerned about either of these cases in your data, you should use the Ext abbreviation adjustment score option.

How the adjustment score works

The score you set in the Ext abbreviation adjustment score option tunes your similarity score to consider these types of abbreviations and combinations in your data.

The adjustment score adds a penalty for the non-matched part of the words. The higher the number, the smaller the penalty: a score of 100 means no penalty, and a score of 0 means maximum penalty.

Example

String 1 String 2 Sim score when Adj score is 0 Sim score when Adj score is 50 Sim score when Adj score is 100 Notes

MetLife Metropolitan Life 58 79 100

MetLife Met Life 93 96 100

MetLife MetropolitanLife 60 60 60 This score is due to string comparison. Extended Abbreviation scoring was not needed or used because both strings being compared are each one word.

16.9.3 Name matching

Part of creating your business rules is to define how you want names handled in the matching process. The Match transform gives you many ways to ensure that variations on names or multiple names, for example, are taken into consideration.

Note

Unlike other business rules, these options are set up in the match level option group, because they affect all appropriate name-based match criteria.

Two names; two persons

With the Number of names that must match option, you can control how matching is performed on match keys with more than one name (for example, comparing "John and Mary Smith" to "Dave and Mary Smith"). Choose whether only one name needs to match for the records to be identified as a match, or whether the Match transform should disregard any persons other than the first name it parses.

With this method you can require either one or both persons to match for the record to match.

Two names; one person

With the Compare Given_Name1 to Given_Name2 option, you can also compare a record's Given_Name1 data (first name) with the second record's Given_Name2 data (middle name). With this option, the Match transform can correctly identify matching records such as the two partially shown below. Typically, these record pairs represent sons or daughters named for their parents, but known by their middle name.

Record # First name Middle name Last name Address

170 Leo Thomas Smith 225 Pushbutton Dr

198 Tom (blank) Smith 225 Pushbutton Dr

Hyphenated family names

With the Match on hyphenated family name option, you can control how matching is performed if a Family_Name (last name) field contains a hyphenated family name (for example, comparing "Smith-Jones" to "Jones"). Choose whether both fields must contain both names to match, or whether only one name needs to match for the records to be called a match.
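The two choices for this option can be sketched in C++ as follows. The sketch assumes that the family name is split on hyphens, that the order of the parts does not matter, and that parts are compared for exact equality. The real transform applies its own similarity scoring to names. `HyphenRule` and `familyNamesMatch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

enum class HyphenRule { OneNameMustMatch, BothNamesMustMatch };

// Splits a family name on hyphens and sorts the parts ("Smith-Jones" ->
// {"Jones", "Smith"}), so the order of the parts does not matter.
std::vector<std::string> familyNameParts(const std::string& name) {
    std::vector<std::string> parts;
    std::istringstream in(name);
    std::string part;
    while (std::getline(in, part, '-'))
        if (!part.empty()) parts.push_back(part);
    std::sort(parts.begin(), parts.end());
    return parts;
}

bool familyNamesMatch(const std::string& a, const std::string& b, HyphenRule rule) {
    std::vector<std::string> pa = familyNameParts(a);
    std::vector<std::string> pb = familyNameParts(b);
    if (rule == HyphenRule::BothNamesMustMatch) return pa == pb;
    // OneNameMustMatch: a single shared name part is enough.
    for (const std::string& p : pa)
        if (std::find(pb.begin(), pb.end(), p) != pb.end()) return true;
    return false;
}
```

Under this sketch, "Smith-Jones" and "Jones" match with `OneNameMustMatch` but not with `BothNamesMustMatch`.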

16.9.4 Numeric data matching

Use the Numeric words match exactly option to choose whether data with a mixture of numbers and letters should match exactly. You can also specify how this data must match. This option applies most often to address data and custom data, such as a part number.

The numeric matching process is as follows:

1. The string is first broken into words. The word breaking is performed on all punctuation and spacing, and then the words are assigned a numeric attribute. A numeric word is any word that contains at least one digit from 0 to 9. For example, 4L is considered a numeric word, whereas FourL is not.
2. Numeric matching is performed according to the option setting that you choose (as described below).

Option values and how they work

Option value Description

NONE Specifies that numeric words don't need to match exactly to be considered a match.

ANY_POSITION With this value, numeric words must match exactly; however, the position of the word is not important. For example:

● Street address comparison: "4932 Main St # 101" and "# 101 4932 Main St" are considered a match.
● Street address comparison: "4932 Main St # 101" and "# 102 4932 Main St" are not considered a match.
● Part description: "ACCU 1.4L 29BAR" and "ACCU 29BAR 1.4L" are considered a match.

SAME_POSITION This value specifies that numeric words must match exactly; however, this option differs from the ANY_POSITION option in that the position of the word is important. For example, 608-782-5000 will match 608-782-5000, but it will not match 782-608-5000.

ANY_POSITION_CONSIDER_PUNCTUATION This value performs word breaking on all punctuation and spaces except on the decimal separator (period or comma), so that decimal numbers are not broken. For example, the string 123.456 is considered a single numeric word as opposed to two numeric words.

The position of the numeric word is not important; however, decimal separators do impact the matching process. For example:

● Part description: "ACCU 29BAR 1.4L" and "ACCU 1.4L 29BAR" are considered a match.
● Part description: "ACCU 1,4L 29BAR" and "ACCU 29BAR 1.4L" are not considered a match, because the decimal separators differ (a comma in one string and a period in the other).
● Financial data: "25,435" and "25.435" are not considered a match.

ANY_POSITION_IGNORE_PUNCTUATION This value is similar to the ANY_POSITION_CONSIDER_PUNCTUATION value, except that decimal separators do not impact the matching process. For example:

● Part description: "ACCU 29BAR 1.4L" and "ACCU 1.4L 29BAR" are considered a match.
● Part description: "ACCU 1,4L 29BAR" and "ACCU 29BAR 1.4L" are also considered a match even though there is a decimal indicator between the 1 and the 4.
● Part description: "ACCU 29BAR 1.4L" and "ACCU 1.5L 29BAR" are not considered a match.
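The two numeric matching steps and the option values above can be sketched in C++ as follows. This is a simplified model, not the transform's code, and it rests on three assumptions. A decimal separator is taken to be a period or comma with a digit on each side. "Position is not important" is modeled as comparing the sorted lists of numeric words. ANY_POSITION_IGNORE_PUNCTUATION is modeled by deleting the separator. The enum and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical enum mirroring the documented option values.
enum class NumericWords {
    None,
    AnyPosition,
    SamePosition,
    AnyPositionConsiderPunctuation,
    AnyPositionIgnorePunctuation
};

// True when s[i] is a period or comma with a digit on each side (assumption).
bool isDecimalSeparator(const std::string& s, std::size_t i) {
    return (s[i] == '.' || s[i] == ',') && i > 0 && i + 1 < s.size() &&
           std::isdigit(static_cast<unsigned char>(s[i - 1])) &&
           std::isdigit(static_cast<unsigned char>(s[i + 1]));
}

// Step 1: break on spaces and punctuation (optionally keeping decimal
// separators inside a word) and keep only words containing a digit 0-9.
std::vector<std::string> numericWords(const std::string& s, bool keepDecimals) {
    std::vector<std::string> words;
    std::string current;
    auto flush = [&] {
        if (std::any_of(current.begin(), current.end(),
                        [](unsigned char c) { return std::isdigit(c) != 0; }))
            words.push_back(current);
        current.clear();
    };
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (std::isalnum(static_cast<unsigned char>(s[i])) ||
            (keepDecimals && isDecimalSeparator(s, i)))
            current += s[i];
        else
            flush();
    }
    flush();
    return words;
}

// Step 2: compare the numeric words of both strings for the chosen option.
bool numericWordsMatch(const std::string& a, const std::string& b, NumericWords mode) {
    if (mode == NumericWords::None) return true;  // left to ordinary scoring
    const bool keepDecimals = mode == NumericWords::AnyPositionConsiderPunctuation ||
                              mode == NumericWords::AnyPositionIgnorePunctuation;
    std::vector<std::string> wa = numericWords(a, keepDecimals);
    std::vector<std::string> wb = numericWords(b, keepDecimals);
    if (mode == NumericWords::AnyPositionIgnorePunctuation) {
        // Assumption: drop the separator, so "1,4L" and "1.4L" both become "14L".
        auto dropSeparators = [](std::vector<std::string>& words) {
            for (std::string& w : words)
                w.erase(std::remove_if(w.begin(), w.end(),
                                       [](char c) { return c == '.' || c == ','; }),
                        w.end());
        };
        dropSeparators(wa);
        dropSeparators(wb);
    }
    if (mode != NumericWords::SamePosition) {  // position is not important
        std::sort(wa.begin(), wa.end());
        std::sort(wb.begin(), wb.end());
    }
    return wa == wb;
}
```

Under these assumptions, each example in the table evaluates as documented. For instance, `numericWordsMatch("608-782-5000", "782-608-5000", NumericWords::SamePosition)` is false.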

16.9.5 Blank field matching

In your business rules, you can control how the Match transform treats field comparisons when one or both of the fields compared are blank.

For example, the first name field is blank in the second record shown below. Would you want the Match transform to consider these records matches or no matches? What if the first name field were blank in both records?

Record #1 Record #2

John Doe _____ Doe

204 Main St 204 Main St

La Crosse WI La Crosse WI

54601 54601

There are some options in the Match transform that allow you to control the way these are compared. They are:

● Both fields blank operation
● Both fields blank score
● One field blank operation
● One field blank score

Blank field operations

The "operation" options have the following value choices:

Option Description

EVAL If you choose Eval, the Match transform scores the comparison using the score you enter at the One field blank score or Both fields blank score option.

IGNORE If you choose Ignore, the score for this field rule does not contribute to the overall weighted score for the record comparison. In other words, the two records shown above could still be considered duplicates, despite the blank field.

Blank field scores

The "Score" options control how the Match transform scores field comparisons when the field is blank in one or both records. You can enter any value from 0 to 100.

To help you decide what score to enter, determine if you want the Match transform to consider a blank field 0 percent similar to a populated field or another blank field, 100 percent similar, or somewhere in between.

Your answer probably depends on what field you're comparing. Giving a blank field a high score might be appropriate if you're matching on a first or middle name or a company name, for example.

Example

Here are some examples that may help you understand how your settings of these blank matching options can affect the overall scoring of records.

One field blank operation for Given_Name1 field set to Ignore

Note that when you set the blank options to IGNORE, the Match transform redistributes the contribution allotted for this field to the other criteria and recalculates the contributions for the other fields.

Fields compared Record A Record B % alike Contribution Score (per field)

Postcode 54601 54601 100 20 (or 22) 22

Address 100 Water St 100 Water St 100 40 (or 44) 44

Family_Name Hamilton Hammilton 94 30 (or 33) 31

Given_Name1 Mary — 10 (or 0) —

Weighted score: 97

One field blank operation for Given_Name1 field set to EVAL; One field blank score set to 0

Fields compared Record A Record B % alike Contribution Score (per field)

Postcode 54601 54601 100 20 20

Address 100 Water St 100 Water St 100 40 40

Family_Name Hamilton Hammilton 94 30 28

Given_Name1 Mary (blank) 0 10 0

Weighted score: 88

One field blank operation for Given_Name1 field set to EVAL; One field blank score set to 100

Fields compared Record A Record B % alike Contribution Score (per field)

Postcode 54601 54601 100 20 20

Address 100 Water St 100 Water St 100 40 40

Family_Name Hamilton Hammilton 94 30 28

Given_Name1 Mary (blank) 100 10 10

Weighted score: 98
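The three example tables can be reproduced with the following C++ sketch of the One field blank options. It assumes that IGNORE rescales the remaining contribution values proportionally. It also assumes that both the rescaled contributions and the per-field scores are truncated to whole numbers. Those rounding rules are inferred from the tables, not documented. The Both fields blank options are not modeled, and the names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class OneFieldBlankOperation { Eval, Ignore };

struct FieldComparison {
    std::string field;
    int contribution;    // contribution value; all fields total 100
    int similarity;      // % alike; ignored when oneFieldBlank is true
    bool oneFieldBlank;  // true when the field is blank in exactly one record
};

int weightedScore(const std::vector<FieldComparison>& fields,
                  OneFieldBlankOperation op, int oneFieldBlankScore) {
    // Contribution still in play after IGNORE removes the blank fields.
    int remaining = 0;
    for (const FieldComparison& f : fields)
        if (!(f.oneFieldBlank && op == OneFieldBlankOperation::Ignore))
            remaining += f.contribution;
    if (remaining == 0) return 0;  // nothing left to score

    int score = 0;
    for (const FieldComparison& f : fields) {
        if (f.oneFieldBlank && op == OneFieldBlankOperation::Ignore) continue;
        // Redistribute: scale each remaining contribution so they total roughly 100.
        int contribution = f.contribution * 100 / remaining;
        int similarity = f.oneFieldBlank ? oneFieldBlankScore : f.similarity;
        score += contribution * similarity / 100;
    }
    return score;
}
```

With the Postcode, Address, Family_Name, and Given_Name1 rows above, `weightedScore` returns 97 for IGNORE, 88 for EVAL with a blank score of 0, and 98 for EVAL with a blank score of 100.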

16.9.6 Multiple field (cross-field) comparison

In most cases, you use a single field for comparison. For example, Field1 in the first record is compared with Field1 in the second record.

However, there are situations where comparing multiple fields can be useful. For example, suppose you want to match telephone numbers in the Phone field against numbers found in fields used for Fax, Mobile, and Home. Multiple field comparison makes this possible.

Note

By default, Match performs multiple field comparison on fields where match standards are used. For example, Person1_Given_Name1 is automatically compared to Person1_Given_Name_Match_Std1-6. Multiple field comparison does not need to be explicitly enabled, and no additional configuration is required to perform multiple field comparison against match standard fields.

16.10 Group statistics

The Group Statistics post-match operation should be added after any match level and any post-match operation for which you need statistics about your match groups or your input sources.

This operation can also count statistics from logical input sources that you have already identified with values in a field (pre-defined) or from logical sources that you specify in the Input Sources operation.

This operation also allows you to exclude certain logical sources based on your criteria.

Note

If you choose to count input source statistics in the Group Statistics operation, Match will also count basic statistics about your match groups.

Group statistics fields

When you include a Group Statistics operation in your Match transform, the following fields are generated by default:

● GROUP_COUNT
● GROUP_ORDER
● GROUP_RANK
● GROUP_TYPE

In addition, if you choose to generate source statistics, the following fields are also generated and available for output:

● SOURCE_COUNT
● SOURCE_ID
● SOURCE_ID_COUNT
● SOURCE_TYPE_ID

16.11 Input source select records

By adding an Input source select record operation to each match level (Post Match Processing) you want, you can flag specific record types for evaluation.

Adding this operation generates the Select_Record output field for you to include in your output schema. This output field is populated with a Y or N depending on the type of record you select in the operation.

Your results will appear in the Match Input Source Output Select report. In that report, you can determine which records came from which source or source group and how many of each type of record were output per source or source group.

Record type Description

Unique Records that are not members of any match group. No matching records were found. These can come from normal or special sources.

Single source masters Highest ranking member of a match group whose members all came from the same source. Can be from normal or special sources.

Single source subordinates A record that came from a normal or special source and is a subordinate member of a match group.

Multiple source masters Highest ranking member of a match group whose members came from two or more sources. Can be from normal or special sources.

Multiple source subordinates A subordinate record that came from a normal or special source and belongs to a match group whose members came from two or more sources.

Suppression matches Subordinate member of a match group that includes a higher-priority record that came from a suppress-type source. Can be from normal or special source.

Suppression uniques Records that came from a suppress source for which no matching records were found.

Suppression masters A record that came from a suppress source and is the highest ranking member of a match group.

Suppression subordinates A record that came from a suppress-type source and is a subordinate member of a match group.
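The record types in the table can be summarized as a C++ classification sketch. It makes three assumptions. A match group is given as a list of members in ranking order, with index 0 as the master. A single-member group is a unique. A normal or special record ranked below a suppress-type record is classified as a suppression match before any single-source or multiple-source category is considered. `SourceType`, `Member`, and `recordType` are hypothetical names. The returned labels are the record types from the table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class SourceType { Normal, Special, Suppress };

// One member of a match group. Members are listed in ranking order, so
// index 0 is the master (highest ranking) record.
struct Member {
    int sourceId;
    SourceType type;
};

// Returns the record type, using the labels from the table above.
std::string recordType(const std::vector<Member>& group, std::size_t index) {
    const Member& m = group[index];
    const bool fromSuppress = m.type == SourceType::Suppress;
    if (group.size() == 1) return fromSuppress ? "Suppression uniques" : "Unique";
    if (fromSuppress)
        return index == 0 ? "Suppression masters" : "Suppression subordinates";
    // A normal or special record ranked below a suppress-type record.
    for (std::size_t i = 0; i < index; ++i)
        if (group[i].type == SourceType::Suppress) return "Suppression matches";
    bool singleSource = true;
    for (const Member& other : group)
        if (other.sourceId != group[0].sourceId) singleSource = false;
    if (index == 0)
        return singleSource ? "Single source masters" : "Multiple source masters";
    return singleSource ? "Single source subordinates" : "Multiple source subordinates";
}
```

For example, in a two-record group whose master came from a suppress-type source, the master is a suppression master and the normal-source subordinate is a suppression match.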

17 Match transform reference

The Match transform is responsible for performing matching based on the business rules you define. The transform then sends matching and unique records on to the next transform.

For best results, the data in which you are attempting to find matches should be cleansed. Therefore, you may need to include other Data Quality transforms before the Match transform.

The following sections describe the configurations for the Match XML. You can find examples of the XML configurations with the samples installed with the product.

17.1 System group

Syntax

The System group controls high-level transform functions.

Note

If an option exists in the XML, but it is not documented here, do not alter the contents of the option. Editing the contents could cause errors.

Option Description

Link dir Enter the path to the directory that contains the DataQuality\match folder, where the match support files are located.

For example, the default Windows 32-bit location of these files is C:\Program Files\SAP BusinessObjects\Data quality Mgmt SDK\windows_32\DataQuality\match.

You would then enter C:\Program Files\SAP BusinessObjects\Data quality Mgmt SDK\windows_32\

17.1.1 MatchSettings

Syntax

Option Description

MemoryInKBForComparisons The maximum number of kilobytes in the memory buffer used for comparison. This value, in conjunction with the largest break group size, can be used to fine-tune performance of the match process. If the largest break group is smaller than the number of records that can fit in the compare buffer, then the records are stored and accessed from memory, which makes the process faster. However, if the largest break group is bigger, then some caching is involved, which may slow down processing. To avoid this, you can either change the breaking strategy to make smaller break groups or increase the buffer size.

If this compare buffer is too large, this could also degrade performance.

The default value is 10240 KB.

PreserveRecordOrder Controls whether each record of a break group is completed in the same order as it sits in the break group (for example, record 1 is completed first, record 2 is completed second, and so on). There are some performance benefits to not completing records in order. For example, there may be some efficiency gains by completing all the unique records first. Valid values are YES and NO.

NO is the default value.

WorkDirectory1-8 Enter a directory path to house the temporary work files used in the match process.

17.2 Report and analysis

The Report and analysis group allows you to generate statistics about the Match transform. This group is required and cannot be repeated.

Option Description

Generate report data This option controls the statistics output for the StatisticsHandler.

YES: Generates statistics for this transform.

NO: Turns off statistics generation.

17.3 Match control

The Match Control group controls what processing is to be performed by the Match transform. This is a required object that cannot be repeated.

Option Description

Name Specifies a logical name for the object. This name can be used in reports. Minimum length is 1 and maximum length is 15.

This option is required and cannot be repeated.

17.3.1 Match levels group

Syntax

Option Description

Match levels Defines a match level by specifying the name of a Match Level object.

This is a required group and may be repeated, with a maximum of 255 match levels defined.

Match level name Specifies the match level by naming a Match Level object.

This option is required and cannot be repeated.

17.3.2 Input fields group

Syntax

The Input fields group defines the input fields for the Match Control group.

Option Description

Field The field group defines one input field.

This group is optional and may be repeated up to 2 times.

Mapped name Specifies the mapped name of the input field.

Field type Specifies the type of the input field. Valid values for this required attribute are SOURCE_ID or DATA_SOURCE_ID. Both of these fields are used for generated statistics. If a SOURCE_ID input field is omitted, then the source names found in the Input Source group are used instead.

There can be only one input field with a field type of SOURCE_ID and only one input field with a field type of DATA_SOURCE_ID.

17.4 Match level group

Syntax

The Match Level group defines the match operations for a single match level. The match operations include the Compare object to compare two records, the Output Fields to post match results, and the Post Match Processing object to perform post processing. This group cannot be repeated.

Option Description

Match level Defines one match level.

This group is required and can be repeated.

Name Specifies a logical name for the object. This name is referenced in the Match Control object. Minimum length is 1 and maximum length is 15.

Compare Specifies the name of the Compare object to be used to compare two records. The name specified should match the NAME attribute of a Compare Table object or a Compare Match Criteria object.

Post match processing name Specifies the Post Match Processing group used to perform post-match processing.

The Match Level group can post the following output fields: GROUP_NUMBER, MATCH_STATUS, MATCH_TYPE, MATCH_LEVEL, MATCH_CRITERION, or MATCH_SCORE.

17.5 Match criteria standard keys

Syntax

Key Description

ADDRESS_DATA1-5 Use for address data that is not accounted for in other address-based criteria in the Geographic category.

You can also use this key for fields that you know contain address data, but you're not sure which type it contains, or you can use it for international data that has not been parsed.

ADDRESS_POST_OFFICE_BOX_NUMBER Post Office Box number.

ADDRESS_PRIMARY_NAME Street name data.

ADDRESS_PRIMARY_NUMBER Street number data.

ADDRESS_PRIMARY_POSTFIX Address data that comes at the end of a street name, such as a directional.

ADDRESS_PRIMARY_PREFIX Address data that comes at the beginning of a street name, such as a directional.

ADDRESS_PRIMARY_TYPE Data that tells what type of street it is (street, boulevard, lane, and so on).

ADDRESS_PRIVATE_MAIL_BOX A private mail box (PMB) number. These are mail boxes that are not run by a postal authority.

ADDRESS_RURAL_ROUTE_BOX Rural-route box number (number only, without “Box” prefix).


ADDRESS_RURAL_ROUTE_NUMBER Rural route number.

ADDRESS_SECONDARY_NUMBER The number of a unit, building, floor, or room.

UNPARSED_ADDRESS_LINE

COUNTRY Country name.

LOCALITY City, town, locality, or suburb.

POSTCODE1 Primary postal code.

POSTCODE2 Secondary postal code.

REGION Region data, such as state or province.

UNPARSED_LAST_LINE

FIRM Firm name.

FIRM_DATA1-3 Use for firm data that is not accounted for in other firm-based criteria. You can also use this key for fields that you know contain firm data, but you're not sure which type it contains. You can also use it for international data.

FIRM_MATCH_STD1-6 Firm match standards. The data in these fields is generated by the Data Cleanse transform or other pre-Match transforms.

FIRM_LOCATION A location within a company or organization.

FIRM_LOCATION_MATCH_STD1-6 Match standards for a location within a company or organization.

NAME_DATA1-3 Use for name data that is not accounted for in other name-based criteria. You can also use this key for fields that you know contain name data, but you're not sure which type it contains.

PERSON1-3_GIVEN_NAME1 Given name1 (first name) for up to three persons.

PERSON1_GIVEN_NAME1_MATCH_STD1-6 Given_Name1 (first name) match standards for the first person.

PERSON2_GIVEN_NAME1_MATCH_STD1-6 Given_Name1 (first name) match standards for the second person.

PERSON3_GIVEN_NAME1_MATCH_STD1-6 Given_Name1 (first name) match standards for the third person.

PERSON1-3_GENDER Gender.

PERSON1-3_FAMILY_NAME1 Family (last) name.

PERSON1_FAMILY_NAME1_MATCH_STD1-6 Family_Name1 (last name) match standards for the first person.


PERSON2_FAMILY_NAME1_MATCH_STD1-6 Family_Name1 (last name) match standards for the second person.

PERSON3_FAMILY_NAME1_MATCH_STD1-6 Family_Name1 (last name) match standards for the third person.

PERSON1-3_FAMILY_NAME2 Family name

PERSON1_FAMILY_NAME2_MATCH_STD1-6

PERSON2_FAMILY_NAME2_MATCH_STD1-6

PERSON3_FAMILY_NAME2_MATCH_STD1-6

PERSON1-3_MATURITY_POSTNAME Maturity postname (one per person), for example, Sr. or Jr.

PERSON1-3_MATURITY_POSTNAME_MATCH_STD1-6 Maturity postname match standards for the persons in your data record (one standard per person).

PERSON1-3_GIVEN_NAME2 Given name2 (middle name).

PERSON1_GIVEN_NAME2_MATCH_STD1-3 Given_Name2 (middle name) match standards for the first person.

PERSON2_GIVEN_NAME2_MATCH_STD1-3 Given_Name2 (middle name) match standards for the second person.

PERSON3_GIVEN_NAME2_MATCH_STD1-3 Given_Name2 (middle name) match standards for the third person.

PERSON1-3_HONORARY_POSTNAME Honorary postname for up to three persons indicating certification, academic degree, or affiliation. For example, CPA.

PERSON1-3_HONORARY_POSTNAME_MATCH_STD1-6 Honorary postname match standards (one standard per person).

PERSON1-3_PRENAME Prename (for example, Mr. or Mrs.) for up to three persons.

PERSON1-3_PRENAME_MATCH_STD1-6 Prename match standards (one standard per person).

PERSON1-3_TITLE Job or occupational title of each person. For example, Manager.

PERSON1-3_TITLE_MATCH_STD1-6 Title match standards for each person.

SOCIAL_SECURITY_NUMBER1-3 Social Security numbers for up to three people in a record.

DATE1-3 Date data. For example, birthdate data.

PHONE Phone number.

CUSTOM Use custom fields to match data that does not qualify for any of the specifically named criteria.

17.6 Match criteria key layout

Syntax

The Match Criteria Key Layout group is used by the Compare Match Criteria groups. All records of each data collection will use the same Match Criteria Key Layout. This group is optional and cannot be repeated.

Option Description

Match engine Specifies the match engine to use, based on the type of data you will be processing. If you use the Multinational strategy in the Match wizard, this option is set to Latin1 for all match sets.

CHINESE: Specifies that the Match transform will be processing Chinese data in Chinese script.

JAPANESE: Specifies that the Match transform will be processing Japanese data in Japanese script.

KOREAN: Specifies that the Match transform will be processing Korean data in Korean script.

LATIN1: Specifies that the Match transform will be processing Latin1 data. In general, this is the data used throughout the Americas, Western Europe, Oceania, and much of Africa. If you attempt to process non-latin1 data with the Latin1 engine, the results are unpredictable.

OTHER_NON_LATIN1: Specifies that the Match transform will be processing non-Latin1 data, other than Chinese, Japanese, Korean, and Taiwanese, such as Russian, Greek, Hebrew, Arabic, and others.

TAIWANESE: Specifies that the Match transform will be processing Taiwanese data in Taiwanese script.

For optimum accuracy and performance, filter your multinational data into separate Match transforms, each with the appropriate match engine selected.

Default perform data salvage Specifies the default value that indicates whether to perform data salvage.

Data salvage is performed if the driver record has no data for this field and the passenger record does. Data is "salvaged" from the passenger record to the driver record.

YES: Performs the data salvage on the driver record after it matches a passenger record.

NO: Does not perform the data salvage on the driver record after it matches a passenger record.

This setting is optional, and it defaults to NO.

Input source attributes Assigns the Perform data salvage attribute to one or more input sources. Input sources that are not specified here use the setting of the Default perform data salvage option (DEFAULT_PERFORM_DATA_SALVAGE).

Do not fill out this section if any INPUT_FIELDS / FIELD / FIELD_TYPE option is set to PERFORM_DATA_SALVAGE.

This group is optional and cannot be repeated.

Source Sets the Perform data salvage attribute for one input source.

This group is required and may be repeated up to a maximum of 10,000 sources.


Name Specifies a source name.

This attribute must match a NAME attribute specified in an INPUT_SOURCES / SOURCE section (a predefined input source). An input source must not be specified more than once. The NAME attribute can also match the NAME attribute of a Source Group, which is equivalent to specifying all the Input Sources belonging to that Source Group. If an Input Source is specified both individually and by a Source Group, the individual setting takes precedence.

Perform data salvage Specifies whether you want data salvaging performed on this source. Valid values are YES or NO.
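The following C++ sketch shows one way to read the data salvage settings above: an individual Source entry takes precedence over a Source Group entry, which takes precedence over the default, and salvage copies passenger data only into driver fields that are empty. The types and functions (`SalvageConfig`, `resolveSalvage`, `salvage`) are hypothetical illustrations, not part of the SDK API, and records are modeled as simple field-name-to-value maps.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of the Input source attributes settings (not the SDK API).
struct SalvageConfig {
    bool defaultPerformSalvage = false;                // Default perform data salvage (NO by default)
    std::map<std::string, bool> perSource;             // Source entries that name an input source
    std::map<std::string, bool> perSourceGroup;        // Source entries that name a Source Group
    std::map<std::string, std::string> groupOfSource;  // input source name -> Source Group name
};

// Effective setting for one input source: individual entry, then Source Group, then default.
bool resolveSalvage(const SalvageConfig& cfg, const std::string& source) {
    assert(!source.empty());
    if (auto it = cfg.perSource.find(source); it != cfg.perSource.end()) {
        return it->second;
    }
    if (auto g = cfg.groupOfSource.find(source); g != cfg.groupOfSource.end()) {
        if (auto it = cfg.perSourceGroup.find(g->second); it != cfg.perSourceGroup.end()) {
            return it->second;
        }
    }
    return cfg.defaultPerformSalvage;
}

using Record = std::map<std::string, std::string>;

// Copies passenger values into driver fields that are empty or absent.
// Returns the number of fields salvaged.
int salvage(Record& driver, const Record& passenger) {
    int copied = 0;
    for (const auto& [field, value] : passenger) {
        if (value.empty()) continue;
        std::string& target = driver[field];
        if (target.empty()) {
            target = value;
            ++copied;
        }
    }
    return copied;
}
```

In this sketch a driver field that already holds data is never overwritten, which mirrors the description that salvage applies only when the driver record has no data for the field.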

Input fields Specifies what input fields are used for match processing.

This is required and cannot be repeated.

Field Specifies one input field to be used for match processing. The input field can either define a key field that holds data to be compared, or be used to make decisions during processing (for example, deciding on a per-record basis whether to perform data salvage).

This is required and can be repeated.

Mapped name Specifies the name of the input field to use. The name specified should match a MAPPED_NAME attribute found in the INPUT_FIELDS group.

This is required and cannot be repeated.

Extra data Specifies additional data that should be stored in the key. If omitted, then there is no extra data to be stored in the key.

This option cannot be repeated.

Include separator Specifies whether a separator should be placed between each data field specified in this section. Valid values for this attribute are YES or NO.

If set to YES, and the data type of the input fields is character, then a space is placed between each field that is concatenated. If set to NO, then the fields are concatenated without a separator. This required attribute may not be repeated.

Field group Defines the fields to concatenate. This section is optional and may not be repeated.

Field Defines one input field to concatenate. This required section may be repeated for a maximum of 10 extra fields.

Mapped name Specifies the mapped name of the field. This is a required attribute that may not be repeated. It should match a MAPPED_NAME found in the INPUT_FIELDS section.
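As a rough illustration of Include separator, this C++ sketch concatenates the values of the fields listed in the Field group, inserting a single space between them when the option is YES. The function name `buildExtraData` is hypothetical, and the sketch assumes every value is character data, the only case in which the separator applies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: builds the Extra data portion of a key from up to 10 field values.
std::string buildExtraData(const std::vector<std::string>& values, bool includeSeparator) {
    assert(values.size() <= 10);  // the Field section may be repeated for at most 10 extra fields
    std::string result;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i > 0 && includeSeparator) result += ' ';  // YES: a space between fields
        result += values[i];                            // NO: fields run together
    }
    return result;
}
```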


Field type Specifies what type of data is found in the specified input field. Valid values are KEY and PERFORM_DATA_SALVAGE.

When this option is set to KEY, this FIELD section defines one key field to be compared, along with the input field mapped to it and its formatting options.

When this option is set to PERFORM_DATA_SALVAGE, the specified input field should hold a YES ("Y") or NO ("N") value that indicates whether data salvage should be performed on that record.

This option is required and cannot be repeated.

Standard key name Specifies a logical name for the standard key being defined.

This option must be filled out if FIELD_TYPE is set to KEY.

Custom key name Specifies a logical name for the custom key being defined.

Custom key name is ignored unless Standard key name is set to CUSTOM.

Key length Specifies the number of characters in the field to compare.

Valid values are 1 to 255.

Locale Specifies the locale setting for this criteria field.

Setting this option is recommended if you plan to use the Convert text to numbers option.

Remove punctuation Specifies whether to remove punctuation from your data to help provide more accurate matches.

Be aware of the following:

● Match will not remove a dash from a Family_Name* field.
● The default setting is NO.

YES: Removes punctuation.

NO: Keeps punctuation in your data.

Caution

Setting this option and the Convert text to numbers option to YES may produce undesirable results. For example:

Suppose you have 1.23 as data in a criteria field. Setting Remove punctuation to YES would convert this number to 123. This number would then match another value of 123, or, in the case of converting text to numbers, match a value of "one hundred twenty-three".
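The collision described in this Caution can be reproduced with a minimal C++ sketch of punctuation removal. `removePunctuation` is a hypothetical helper that treats any ASCII punctuation character as removable and, when `keepDash` is set, keeps dashes to model the Family_Name* exception; it is not the transform's implementation, and text-to-number conversion is not modeled.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical sketch of Remove punctuation = YES for ASCII punctuation.
// keepDash models the rule that a dash in a Family_Name* field is not removed.
std::string removePunctuation(const std::string& in, bool keepDash = false) {
    std::string out;
    for (unsigned char c : in) {
        if (std::ispunct(c) && !(keepDash && c == '-')) continue;
        out += static_cast<char>(c);
    }
    return out;
}

// The Caution: once the period is gone, 1.23 is indistinguishable from 123.
void cautionExample() {
    assert(removePunctuation("1.23") == "123");
    assert(removePunctuation("1.23") == removePunctuation("123"));
}
```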


Convert to uppercase Specifies whether to convert all data to uppercase for matching purposes only. Be aware of the following:

● This option is ignored for all other Match engine option values.
● The default setting is NO.

YES: Converts the output data to uppercase where appropriate.

NO: Leaves the output data intact.

Convert diacritical characters Specifies whether to include diacritical characters in the matching process. Be aware of the following:

● This option applies to Latin1 data only (characters between 0x80 and 0xFF).
● The default setting is NO.

YES: Converts diacritical characters to the closest English ASCII equivalent for matching purposes. For example, ä converts to a.

NO: Preserves diacritical characters in the matching process, treating ä and a as different characters.
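To make the Latin1 range concrete, the following C++ sketch folds accented letters in the ISO 8859-1 range 0xC0 to 0xFF to an unaccented ASCII letter and leaves every other byte unchanged. The mapping table, the function name `foldDiacritics`, and the choice to leave characters such as Æ, ß, the multiplication sign and the division sign untouched (they have no single-letter equivalent) are assumptions for illustration; the transform's actual conversion table is not documented here.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of Convert diacritical characters = YES for Latin-1 (ISO 8859-1) bytes.
// Only code points 0xC0-0xFF are mapped; '*' marks entries that are left unchanged.
std::string foldDiacritics(const std::string& latin1) {
    static const char kMap[] =
        "AAAAAA*CEEEEIIIIDNOOOOO*OUUUUY**"   // 0xC0-0xDF
        "aaaaaa*ceeeeiiiidnooooo*ouuuuy*y";  // 0xE0-0xFF
    static_assert(sizeof(kMap) == 64 + 1, "one entry per code point 0xC0-0xFF");
    std::string out;
    out.reserve(latin1.size());
    for (unsigned char c : latin1) {
        if (c >= 0xC0 && kMap[c - 0xC0] != '*')
            out += kMap[c - 0xC0];
        else
            out += static_cast<char>(c);
    }
    assert(out.size() == latin1.size());  // each Latin-1 byte maps to exactly one output byte
    return out;
}
```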

Convert text to numbers Specifies whether numbers represented as text (one, two, three) should be converted to numbers. If you choose Yes, they will be in cardinal (one = 1) or ordinal (first = 1st) format.

YES: Converts numbers represented as text to numbers.

NO: Leaves any numerical text intact (default setting).

Caution

Setting this option and the Remove punctuation option to YES may produce undesirable results. For example:

Suppose you have 1.23 as data in a criteria field. Setting Remove punctuation to YES would convert this number to 123. This number would then match another value of 123, or, in the case of converting text to numbers, match a value of "one hundred twenty-three".

17.7 Compare table group


The compare table group helps you create a table that is used to determine which record pairs qualify to be compared or which sources should be compared. If you do not include a Compare table in the Match transform, a driver record is compared with all remaining passenger records in the data collection.

Tip

If you are using many physical or logical sources in your project, it may be easier to specify what not to compare, as opposed to what to compare. For example, say you have 10 sources: A through J. You want to compare all but A and B. Set the Default action option to Compare. Then set up a table row for both source A and source B, and set the Action options for those sources to No_Match.

Option Description

Name Specifies a logical name for the object. Any Compare object that has a COMPARE option may reference it.

ID Source Specifies where the ID values are obtained from.

ID_FIELD: If ID_FIELD is selected, then the ID values are taken from the input field specified in the ID_FIELD section.

INPUT_SOURCES: If INPUT_SOURCES is specified, then the ID values are taken from the source names defined in the NAME option specified in an INPUT_SOURCES / SOURCE section (a predefined input source).

The driver and passenger ID values (but not the default ID) can also be taken from the Source Group names defined in the NAME option specified in an INPUT_SOURCES / SOURCE_GROUPS / SOURCE_GROUP section (a predefined Source Group). Specifying a Source Group is equivalent to specifying all the Input Sources belonging to that Source Group. If an Input Source is specified both individually and by a Source Group, the individual setting takes precedence. The default setting is ID_FIELD.

Default ID Specifies what ID to use if the input ID is blank.

Default action Specifies what action to take if an ID is not found in the table.

COMPARE: Comparison is performed.

NO_MATCH: The two records are considered unique.


Compare Specifies which Compare to perform when an Action option is set to COMPARE.

The name specified should match the NAME option of a Compare Match Criteria object. This option is required if DEFAULT_ACTION is set to COMPARE or if an entry in the table is set to COMPARE. Otherwise this option is optional and cannot be repeated.

Table entry group Contains the TABLE_ENTRY configuration items.

Table entry Defines an entry into the compare table. If a driver and passenger pair match an entry in the table, then the entry's action is taken. This section is required and may be repeated.

Action Specifies what action to take if a driver/passenger pair matches this entry. Valid values are COMPARE or NO_MATCH.

If this option is set to COMPARE, then the Compare is performed. If this option is set to NO_MATCH, then the two records are considered unique.

Driver ID Specifies what ID value the driver record must have to match this entry. If this attribute is omitted, then it is assumed that any driver ID value will match this entry.

Passenger ID Specifies what ID value the passenger record must have to match this entry. If this optional attribute is omitted, then it is assumed that any passenger ID value will match this entry.

ID Field Defines the ID field. The ID_FIELD section is ignored if ID_SOURCE is set to INPUT_SOURCES. This section is required if ID_SOURCE is set to ID_FIELD. Otherwise this section is optional and cannot be repeated.

Mapped name Specifies the mapped name of one input field. This attribute is required and cannot be repeated.

Example

The following example disables comparisons between two records that both come from a house list (ListID value H), which presumably has no matches within it.

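Name: House List Check
ID source: ID_FIELD
Default ID: Unknown
Default action: COMPARE
Compare: NormalMatch
Table entry: Action = NO_MATCH, Driver ID = H, Passenger ID = H
ID field: Mapped name = ListID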
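The table lookup described in this section can be sketched in C++ as follows, with the house list configuration as the usage case. The types and the `decide` function are hypothetical, and the sketch assumes that table entries are checked in order with the first matching entry winning and that the Default ID replaces a blank driver or passenger ID; the guide does not state how overlapping entries are resolved.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

enum class Action { Compare, NoMatch };

// One Table entry; an omitted (nullopt) Driver ID or Passenger ID matches any ID.
struct TableEntry {
    Action action;
    std::optional<std::string> driverId;
    std::optional<std::string> passengerId;
};

struct CompareTable {
    std::string defaultId;  // Default ID: used when an input ID is blank
    Action defaultAction;   // Default action: used when no entry matches
    std::vector<TableEntry> entries;
};

// Hypothetical sketch: returns the action for one driver/passenger pair.
Action decide(const CompareTable& table, std::string driverId, std::string passengerId) {
    if (driverId.empty()) driverId = table.defaultId;
    if (passengerId.empty()) passengerId = table.defaultId;
    for (const TableEntry& e : table.entries) {
        const bool driverOk = !e.driverId || *e.driverId == driverId;
        const bool passengerOk = !e.passengerId || *e.passengerId == passengerId;
        if (driverOk && passengerOk) return e.action;  // assumption: first match wins
    }
    return table.defaultAction;
}

// Usage: the house list configuration. Two house list records (ID "H") are never compared.
void houseListExample() {
    const CompareTable table{"Unknown", Action::Compare, {TableEntry{Action::NoMatch, "H", "H"}}};
    assert(decide(table, "H", "H") == Action::NoMatch);
    assert(decide(table, "H", "R") == Action::Compare);  // compared using NormalMatch
}
```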

17.8 Compare match criteria group

Syntax

The Compare Match Criteria group is used to compare two records to determine whether they match. The Compare Match Criteria determines whether two records match through its match option settings and the match criteria defined. The match criteria examines the data found in the key fields defined in the Match Criteria Key Layout group. This group is optional and may not be repeated. The COMPARE_MATCH_CRITERIA group may be repeated.


Option Description

Name Specifies a logical name for the COMPARE_MATCH_CRITERIA group. Any Compare object (such as a Match level or Compare table) that has a COMPARE option may reference it. The minimum length is 1 and the maximum length is 256 characters.

This option is required and cannot be repeated.

Weighted match score Specifies the weighted match score for this match level.

When your matching method includes weighted scoring, records are considered matches when the total contribution score is greater than or equal to this value.

The default value is 101.

This is optional.

17.8.1 Standard key match options

Syntax

The Standard key match options group holds some specialty match options that pertain to some of the standard keys. This section is optional and cannot be repeated.

Option Description

Number of names that must match Specifies the number of names that must match. This option requires that you have criteria such as Person1_Given_Name1, Person1_Family_Name1, and so on.

ONE: Specifies that records are a match when at least one of the names meets the criteria (default value).

ALL: Specifies that records are a match only when all of the names meet the criteria.


Compare given name1 to given name2 Specifies whether the given name1 (first name) of one record is compared to the given name2 (middle name) of another record.

For example, the two records shown below could be considered duplicate records if this option is set to YES.

Record 1: Given_Name1 = John, Given_Name2 = (blank), Family_Name1 = Smith

Record 2: Given_Name1 = R, Given_Name2 = John, Family_Name1 = Smith

To use this option, you must have criteria named Person1_Given_Name1 and Person1_Given_Name2, Person2_Given_Name1 and Person2_Given_Name2, and/or Person3_Given_Name1 and Person3_Given_Name2.

YES: The Given_Name1 field of one record is compared to the Given_Name2 field of another record.

NO: The Given_Name1 field of one record is not compared to the Given_Name2 field of another record (default value).
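A minimal C++ sketch of this option, using the John Smith records above as the usage case. Exact string equality stands in for the transform's similarity scoring, and `PersonName` and `givenNamesAgree` are hypothetical names.

```cpp
#include <cassert>
#include <string>

struct PersonName {
    std::string given1;   // Given_Name1 (first name)
    std::string given2;   // Given_Name2 (middle name)
    std::string family1;  // Family_Name1
};

// Hypothetical sketch: exact equality replaces the transform's similarity scoring.
bool givenNamesAgree(const PersonName& a, const PersonName& b, bool compareGiven1ToGiven2) {
    auto same = [](const std::string& x, const std::string& y) { return !x.empty() && x == y; };
    if (same(a.given1, b.given1)) return true;
    if (!compareGiven1ToGiven2) return false;  // NO (default): only Given_Name1 vs Given_Name1
    return same(a.given1, b.given2) || same(a.given2, b.given1);  // YES: cross comparison
}

// Usage: the two John Smith records shown above.
void johnSmithExample() {
    const PersonName r1{"John", "", "Smith"};
    const PersonName r2{"R", "John", "Smith"};
    assert(!givenNamesAgree(r1, r2, false));
    assert(givenNamesAgree(r1, r2, true));
}
```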

Match on hyphenated family name Specifies whether a single family (last) name in one record matches a hyphenated family name in another record. For example, this option considers whether the two records shown below are matches.

This comparison is performed only if one field has a hyphen and the other does not.

Record 1: Given = Laura, Family = Smith

Record 2: Given = Laura, Family = Albers-Smith

This option works on criteria named Family_Name1, Family_Name2, or Family_Name3.

YES: The family names match as long as the single family name in one record matches one of the hyphenated family names in another record.

NO: The hyphenated family name is considered a single family name and the comparison results in a low similarity, usually not meeting your family name criteria, resulting in a no-match (default value).
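The hyphenated family name rule can be sketched in C++ as below, with the Laura Smith records above as the usage case. `familyNamesMatch` is a hypothetical helper that uses exact equality of name parts instead of the transform's similarity scoring.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sketch of Match on hyphenated family name. Exact equality of name
// parts stands in for the transform's similarity scoring.
bool familyNamesMatch(const std::string& a, const std::string& b, bool matchOnHyphenated) {
    if (a == b) return true;
    const bool aHyphen = a.find('-') != std::string::npos;
    const bool bHyphen = b.find('-') != std::string::npos;
    // The special comparison applies only when exactly one of the two fields has a hyphen.
    if (!matchOnHyphenated || aHyphen == bHyphen) return false;
    const std::string& single = aHyphen ? b : a;
    std::istringstream parts(aHyphen ? a : b);
    for (std::string part; std::getline(parts, part, '-');)
        if (part == single) return true;
    return false;
}

// Usage: the Laura Smith records shown above.
void hyphenatedExample() {
    assert(familyNamesMatch("Smith", "Albers-Smith", true));    // YES: Smith matches one part
    assert(!familyNamesMatch("Smith", "Albers-Smith", false));  // NO: treated as one name
}
```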

Ignore family name when female Specifies whether an adjustment occurs for family names when the given name is female. To use this option, you must have at least these three criteria: Given_Name1, Family_Name1, and Gender1.

YES: The Family_Name1 criteria is ignored when the given name gender is female (Gender1=5). For example, Laura Smith may match Laura Albers.

NO: The gender is not used and the matching process is performed as usual (default value).


Ignore firm if names match This option lets you work with odd abbreviations or spellings of firm names. It assumes that you are matching on two real names, firm, and address line, and breaking or matching on city or ZIP Code.

YES: If you enter Yes, you’re indicating “if you find matching names at the same address, call them dupes, even if the firms don’t match.” This lets you catch the following dupes, which might otherwise have been missed:

Rita Terranova ETI 100 Bren Rd 55343

Rita Terranova Eco Technologies 100 Bren Rd 55343

NO: If you insist on a good match on firm name, enter NO (default value).

Ignore street RR if POBOX match This option affects business and household records matching on address.

YES: Records are considered a match if the boxes match. If the boxes do not match, then the address and rural route address must pass the match criteria settings.

NO: All forms of the address (street, rural route, and PO Box) must match (default value).

Addr blank OP ignore if firms match Use this option to control blank matching when comparing firm records within the same break group.

YES: If firm data matches and neither firm field is blank, Match allows blank matching for all address components, regardless of the blank match setting of each address component.

NO: If firm data matches, but address data in one of the records is blank, the records will not be considered dupes unless blank matching is turned on for those address components in the Match Criteria block. This is the default value.

Unique on resident RR no BOX This option applies only when an input record contains a resident-type name (Current Resident, Occupant, blank, or name not defined) and a rural route address with no box number. Selecting Yes places all records with this type of name data and the same rural route address into the same dupe group.

YES: Treats each record with this type of name data as a unique record (it will not match any other record).

NO: Records with this type of name data are treated the same as other records (default value).

17.8.2 Criteria definition group

Syntax

The Criteria definition group defines one match criteria. A match criteria specifies one or more key fields to compare. It also specifies what match options to use when comparing the specified key field(s) and how well the fields must match in order to be considered either a match or a "no match".

The CRITERIA_DEFINITION_GROUP group cannot be repeated. The CRITERIA_DEFINITION group can be repeated up to a maximum of 255.

Option Description

Criteria name Specifies a logical name to the defined criteria.

Standard key name Specifies a logical name to the standard criteria being defined.

Custom key name Specifies a logical name to the custom criteria being defined. Custom Key name is ignored unless the Standard key name is set to CUSTOM. This option is required if STANDARD_KEY_NAME is set to CUSTOM, otherwise this option is optional and cannot be repeated.

Compare field group


Compare field Defines one Compare Field. This section can be used in addition to the STANDARD_KEY_NAME and CUSTOM_KEY_NAME parameters, or instead of them.

Defining more than one Compare Field is a way of performing cross-field matching. All the fields defined are compared, and the highest score is used for the result.

This group is optional and may be repeated up to a maximum of 255.

Standard key name Specifies a "standard" key name. If a custom field is desired, then this option should be set to CUSTOM.

Custom key name Specifies a "custom" key name. This option is required if STANDARD_KEY_NAME is set to CUSTOM, otherwise this option is optional and cannot be repeated.

Compare field comparison override group Defines one comparison to be performed between two Compare Fields. If this group is not defined, all of the Compare Fields defined are compared against each other. But if one or more COMPARE_FIELD_COMPARISON_OVERRIDE groups are defined, then only the comparisons defined in these groups are performed.

The verifier should issue a warning if there is a Compare Field defined that is not referenced in a COMPARE_FIELD_COMPARISON_OVERRIDE group.

This section is optional and may be repeated up to a maximum of 255.

Driver key name Specifies the name of a key field in the driver record to compare.

Passenger key name Specifies the name of a key field in the passenger record to compare.

Field algorithm Specifies a Field Algorithm Object to use to compare the defined Compare Fields. If omitted, the current SIMIL algorithm is used. If filled out, the name specified should match the NAME option of a Field Algorithm Object (for example, FIELD_ALGORITHM_NUMERIC_DIFFERENCE). Also, if this option is filled out, the following options of the CRITERIA_DEFINITION section are ignored: COMPARE_ALGORITHM, CHECK_FOR_TRANSPOSED_LETTERS, INITIALS_ADJUSTMENT_SCORE, SUBSTRING_ADJUSTMENT_SCORE, ABBREVIATION_ADJUSTMENT_SCORE, EXT_ABBREVIATION_ADJUSTMENT_SCORE, NUMERIC_WORDS_MATCH_EXACTLY, and APPROX_SUBSTRING_ADJUSTMENT_SCORE.
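Cross-field matching with optional comparison overrides can be sketched in C++ as follows. The caller supplies the field comparison as a function returning a 0-100 score, standing in for SIMIL or a Field Algorithm Object; `crossFieldScore`, the type aliases, and the treatment of a missing key (skipped, with a floor score of 0) are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using KeyData = std::map<std::string, std::string>;  // key name -> key data for one record
using FieldScorer = std::function<int(const std::string&, const std::string&)>;  // returns 0-100
using KeyPair = std::pair<std::string, std::string>;  // (driver key name, passenger key name)

// Hypothetical sketch of cross-field matching. Without overrides, every Compare Field is
// compared against every Compare Field; with overrides, only the listed pairs are compared.
// The highest score is the result. Missing keys are skipped (an assumption).
int crossFieldScore(const KeyData& driver, const KeyData& passenger,
                    const std::vector<std::string>& compareFields,
                    const std::vector<KeyPair>& overrides, const FieldScorer& score) {
    std::vector<KeyPair> pairs = overrides;
    if (pairs.empty()) {
        for (const std::string& d : compareFields)
            for (const std::string& p : compareFields) pairs.emplace_back(d, p);
    }
    int best = 0;
    for (const auto& [driverKey, passengerKey] : pairs) {
        const auto d = driver.find(driverKey);
        const auto p = passenger.find(passengerKey);
        if (d == driver.end() || p == passenger.end()) continue;
        const int s = score(d->second, p->second);
        assert(s >= 0 && s <= 100);
        best = std::max(best, s);
    }
    return best;
}
```

For example, with two phone fields, a driver whose first phone equals the passenger's second phone scores 100 under the default all-pairs comparison, but 0 if overrides restrict the comparison to same-field pairs.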

Enable interscript matching Specifies whether an attempt should be made to use transliterated data in addition to normal Unicode matching. Valid values are YES or NO. The default value is NO.

If the value is YES, then the data will be transliterated to Latin script before being compared. This will be an extra comparison after the normal Unicode comparison occurs, and the highest score of the two comparisons will be kept. Note that the LOCALE option is not needed for script conversion. This option is optional and cannot be repeated.

Enable this option if you have the same data in different scripts (writing systems). For example, if one record has Latin1 and the other has Katakana, or one has Latin and the other has Cyrillic.


One field blank operation Specifies whether to use this criteria if one record's field is populated and the other record's field is blank.

EVAL: The value entered in the One field blank score is used as the similarity score for this criteria.

IGNORE: This criteria is ignored in the comparison process, and its contribution to the weighted score is proportionally distributed among the remaining criteria, therefore negating any impact the contribution score may have had.

One field blank score Specifies the similarity score to use if one of the fields is blank and the One field blank operation option is set to EVAL.

Type a value from 0 to 100.

Both fields blank operation Specifies whether to use this criteria when both of the records' fields for this criteria are blank.

EVAL: The value entered in the Both fields blank score option is used as the similarity score for this criteria.

IGNORE: This criteria is ignored in the comparison process, and its contribution to the weighted score is proportionally distributed among the remaining criteria, therefore negating any impact the contribution score may have had.

Both fields blank score Specifies the similarity score if both of the fields are blank and Both fields blank operation is set to EVAL.

Type a value from 0 to 100.
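The blank-field rules can be summarized in a C++ sketch that returns either the score to use or no score at all when the criterion is ignored. `BlankRules` and `blankAwareScore` are hypothetical names, and `compare` stands in for the normal field comparison used when both fields are populated.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class BlankOp { Eval, Ignore };

// Hypothetical bundle of the four blank-handling options for one criterion.
struct BlankRules {
    BlankOp oneFieldBlankOp;
    int oneFieldBlankScore;    // 0-100, used only with EVAL
    BlankOp bothFieldsBlankOp;
    int bothFieldsBlankScore;  // 0-100, used only with EVAL
};

// Returns the similarity score for the criterion, or std::nullopt when the criterion is
// ignored (its contribution is then redistributed among the remaining criteria).
template <typename Compare>
std::optional<int> blankAwareScore(const std::string& a, const std::string& b,
                                   const BlankRules& rules, Compare compare) {
    if (a.empty() && b.empty()) {
        if (rules.bothFieldsBlankOp == BlankOp::Ignore) return std::nullopt;
        assert(rules.bothFieldsBlankScore >= 0 && rules.bothFieldsBlankScore <= 100);
        return rules.bothFieldsBlankScore;
    }
    if (a.empty() || b.empty()) {
        if (rules.oneFieldBlankOp == BlankOp::Ignore) return std::nullopt;
        assert(rules.oneFieldBlankScore >= 0 && rules.oneFieldBlankScore <= 100);
        return rules.oneFieldBlankScore;
    }
    return compare(a, b);  // both fields populated: normal comparison
}
```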

Match score Specifies the minimum similarity score needed for the records to be considered a match. Valid values range from 0 to 101. (101 disables this option)

No match score Specifies the maximum similarity score needed for the records to be considered a no-match. Valid values range from -1 to 100. (-1 disables this option.)
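The two thresholds above can be read as a three-way decision for one criterion, sketched here in C++. `decideCriterion` is a hypothetical name, and checking the match threshold before the no-match threshold is an assumption for configurations where the two ranges overlap.

```cpp
#include <cassert>

enum class Decision { Match, NoMatch, Undecided };

// Hypothetical sketch of one criterion's decision from its similarity score.
// A Match score of 101 and a No match score of -1 disable the respective test.
Decision decideCriterion(int similarity, int matchScore, int noMatchScore) {
    assert(similarity >= 0 && similarity <= 100);
    assert(matchScore >= 0 && matchScore <= 101);
    assert(noMatchScore >= -1 && noMatchScore <= 100);
    if (similarity >= matchScore) return Decision::Match;      // minimum score for a match
    if (similarity <= noMatchScore) return Decision::NoMatch;  // maximum score for a no-match
    return Decision::Undecided;  // left to other criteria or to the weighted score
}
```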

Contribution to weighted score Specifies the contribution value when you use the weighted or the combination scoring method.

If no single criteria decides a match or no-match, the contribution score is calculated by summing the products of each criteria's score by each criteria's weight.

Type a value between 0 and 100. 0 is the default value.

Use in weighted score if greater than Specifies the minimum similarity score needed to qualify this criteria to contribute to the Weighted Match Score.

For example, if the value entered here is 59 for a given name criteria, and the given names between two records are less than 60% similar, then the given name criteria is ignored and the contribution value is proportionally distributed among the remaining criteria.

Default is -1 (implies the criteria is always used in weighted score).


Zero weighted score if less or equal Specifies the minimum similarity score needed for this criteria to qualify for contributing a value other than zero to the weighted match score.

For example, if the value is 59 for the given name criteria, and the given names between two records are less than 60% similar, then the given name criteria contributes zero toward the weighted match score.

Default is 0 (implies the criteria is not given a 0 weighted score unless the score is 0).
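The weighted scoring options can be combined in a C++ sketch like the one below. It follows the description of summing each criteria's score multiplied by its weight, and assumes contributions are percentages; the proportional redistribution of an excluded criterion's contribution is modeled, as an assumption, by dividing by the total contribution of the criteria that remain. `Criterion`, `weightedScore`, and `isWeightedMatch` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical per-criterion inputs to the weighted score (not the SDK API).
struct Criterion {
    std::optional<int> similarity;  // 0-100, or std::nullopt if the criterion is ignored
    int contribution;               // Contribution to weighted score, 0-100
    int useIfGreaterThan = -1;      // Use in weighted score if greater than (default -1)
    int zeroIfLessOrEqual = 0;      // Zero weighted score if less or equal (default 0)
};

// Sum of similarity * contribution over the criteria that take part, divided by their
// total contribution. Dividing by the remaining weight models the proportional
// redistribution of excluded criteria (an assumption about how it is computed).
double weightedScore(const std::vector<Criterion>& criteria) {
    double weighted = 0.0;
    double totalWeight = 0.0;
    for (const Criterion& c : criteria) {
        assert(c.contribution >= 0 && c.contribution <= 100);
        if (!c.similarity || *c.similarity <= c.useIfGreaterThan) continue;  // excluded
        totalWeight += c.contribution;
        if (*c.similarity > c.zeroIfLessOrEqual)
            weighted += static_cast<double>(*c.similarity) * c.contribution;
        // otherwise the criterion keeps its weight but contributes zero
    }
    return totalWeight > 0.0 ? weighted / totalWeight : 0.0;
}

// Records match when the total reaches the Weighted match score (101, the default, never does).
bool isWeightedMatch(const std::vector<Criterion>& criteria, int weightedMatchScore) {
    return weightedScore(criteria) >= weightedMatchScore;
}
```

For instance, a given name criteria scoring 80 with contribution 30 and a family name criteria scoring 100 with contribution 70 give (80 * 30 + 100 * 70) / 100 = 94 in this sketch. Setting the given name's Use in weighted score if greater than to 59 and lowering its score to 50 excludes it, and the family name alone then yields 100.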

Compare algorithm Specifies how to handle fields where more than one word commonly exists. Only those options appropriate for the chosen value appear in the Comparison Rules table.

FIELD: The transform compares the entire field's data as a single string. This algorithm is more efficient and should be used for fields that typically contain just one word.

WORD: WORD first uses FIELD and then compares the words, keeping the best score.

Many criteria options, such as Substring adjustment score and Initials adjustment score, require this option to be set to WORD. See the individual option descriptions for requirements.

Check for transposed letters Specifies whether the match score should be adjusted for any transposed characters encountered.

YES: The transform deducts half as many points for transposed characters as it deducts for other non-matching characters.

For example:

Comparison: Smith and Simth

Finding: Words differ by one transposition (penalty of 1 correction)

Percentage alike: 90%

NO: The transform handles transposed characters the same way it handles any non-matching characters (default value).

For example:

Comparison: Smith—Simth

Finding: Words differ by two corrections (one insert and one delete).

Percentage alike: 80%


Initials adjustment score: Specifies whether initials or acronyms can match whole words. For example, the firm name International Health Providers could match IHP.

Enter a value from 0 to 100. Enter a value of 0 (default) to disable initials checking.

Remember the following when using this option:

● The initial must match the first letter of the word.
● The letters that match are given a score of 100. The remaining letters are given the score that you specify (from 1-100).
● The two scores are proportionally combined to render the overall score.

If there are other words in the field that are not shortened, they are scored the usual way. For example, New York Police Department may be shortened to New York PD and still match.

Note

For this option to work for multiple-word abbreviations (such as International Health Providers = IHP), you must set the Compare algorithm option to WORD. For single-word abbreviations (such as Maria = M), you may set the Compare algorithm option to either WORD or FIELD.


Substring adjustment score: Allows longer strings of words to match shorter strings. For example, long firm names are often shortened to the first few words of the name. A fictitious company such as Mayfield Painting and Sand Blasting might be shortened to Mayfield Painting.

Remember the following rules about values to enter:

● Enter a value from 0 to 100. Enter a value of 0 (default) to disable substring checking.
● Enter a value of 100 if you want substrings and longer strings to be considered a perfect match.

The score is then calculated as follows:

● Letters that match are given a score of 100. The remaining letters are given the score you specify (from 1-100).
● The two scores are proportionally combined to render the overall score.

To qualify as a substring match, the shorter string must exactly match the first part of the longer string.

Consider the following examples, each compared against Mayfield Painting and Sand Blasting:

Matching substrings

● Mayfield
● Mayfield Painting
● Mayfield Painting and Sand

Substrings that do not match

● Mayfield Sand Blasting
● Painting and Sand Blasting

Alternate spellings in any of the words also disqualify the substrings as a match. For example, “Murphy Painting and Sand Blasting” does not match. (This comparison would have a similarity score of 85% without this option set.)

Note

For this option to work, you must set the Compare algorithm option to WORD.

Approx substring adjustment score: Specifies the score to give the words that were not matched to the other string. Valid values for this option are 0 (default) to 100. A value of 0 disables the option.

The following two strings do not match exactly: the first words differ, and the words do not match consecutively. This option nevertheless accepts them:

CRUZ RDZ and SMITH CRUZ DE RODRIQUEZ

For RDZ to match RODRIQUEZ, the Abbreviation adjustment score option must be enabled. In this example, the leftover words (SMITH and DE) and spaces are given the Approx substring adjustment score.


Abbreviation adjustment score: Controls matching whole words to abbreviations. For example, long firm names are often abbreviated by removing letters. International Health Providers might be abbreviated to Intl Health Providers.

For example:

Full word Possible abbreviations

Business Bus, Bsnss, Bss

Database Dat, Db, Dse

As shown in the examples, abbreviation means that the first letter of the shorter word matches the first letter of the longer word, and all remaining letters of the shorter word appear in the longer word in the same order as in the shorter word. The value you enter is the score given to the letters that are in the longer word but not the shorter word.

● Enter a value of 0 (zero) to disable abbreviation checking. (Default value)
● Enter a value greater than 0 to enable this option.
● Enter a value of 100 if you want abbreviations and longer words to be considered a perfect match.

Note

For this option to work, you must set the Compare algorithm option to WORD.

Ext abbreviation adjustment score: Handles a variation of the Abbreviation adjustment score option. Enter a number that adjusts the similarity score for these types of abbreviations:

● Enter a value of 0 (default) to disable extended abbreviation checking.
● Enter a value greater than 0 to enable extended abbreviation checking.

Remember the following when using this option:

● The first letter of the short word must match the first letter of the first word in the multiple-word string, and the remaining letters of the short word must be found in order in the multiple-word string.
● Letters that match are given a score of 100. The remaining letters are given the score that you specify.
● The two scores are proportionally combined to render the overall score.

Note

For this option to work, you must set the Compare algorithm option to WORD.


Numeric words match exactly: Specifies how to match data that contains both numbers and letters.

NONE: Numeric words don't need to match exactly to be considered a match. (Default value)

ANY_POSITION: Numeric words don't need to be in the same position in two different strings to be considered a match.

ANY_POSITION_CONSIDER_PUNCTUATION: Same as ANY_POSITION, except that the Match transform takes the position of a decimal separator (comma or period) within the numeric words into consideration.

ANY_POSITION_IGNORE_PUNCTUATION: Same as ANY_POSITION, except that decimal separators (comma or period) are completely ignored.

SAME_POSITION: Numeric words must match exactly and be in the same position in the string to be considered a match.

Note

For this option to work, you must set the Compare algorithm option to WORD.
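The following C++ sketch shows one way the Check for transposed letters, Use in weighted score if greater than, and Zero weighted score if less or equal options could fit together. It is a hedged illustration, not the transform's implementation: the percentage-alike formula (total characters minus corrections, divided by total characters) is an assumption chosen because it reproduces the 90% and 80% Smith/Simth figures, and percentageAlike, CriterionScore, and weightedMatchScore are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Percentage alike for one criterion (assumed formula). Corrections are
// single-character inserts and deletes, so a swapped pair normally costs 2;
// with checkTransposed, an adjacent swap costs 1 (half the usual penalty).
double percentageAlike(const std::string& a, const std::string& b, bool checkTransposed) {
    const size_t n = a.size(), m = b.size();
    if (n + m == 0) return 100.0;
    std::vector<std::vector<size_t>> d(n + 1, std::vector<size_t>(m + 1, 0));
    for (size_t i = 0; i <= n; ++i) d[i][0] = i;
    for (size_t j = 0; j <= m; ++j) d[0][j] = j;
    for (size_t i = 1; i <= n; ++i) {
        for (size_t j = 1; j <= m; ++j) {
            size_t best = std::min(d[i - 1][j], d[i][j - 1]) + 1;  // delete or insert
            if (a[i - 1] == b[j - 1]) best = std::min(best, d[i - 1][j - 1]);
            if (checkTransposed && i > 1 && j > 1 &&
                a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                best = std::min(best, d[i - 2][j - 2] + 1);  // adjacent swap
            }
            d[i][j] = best;
        }
    }
    return 100.0 * static_cast<double>(n + m - d[n][m]) / static_cast<double>(n + m);
}

// Hypothetical per-criterion settings; these names are not SDK API.
struct CriterionScore {
    double similarity;      // 0-100, e.g. from percentageAlike()
    double contribution;    // contribution value of the criterion
    int useIfGreaterThan;   // -1 (default): always used in the weighted score
    int zeroIfLessOrEqual;  // 0 (default): zero only when the similarity is 0
};

// Criteria at or below useIfGreaterThan are dropped, so their contribution is
// shared out proportionally among the others; criteria at or below
// zeroIfLessOrEqual keep their contribution but add 0 to the score.
double weightedMatchScore(const std::vector<CriterionScore>& criteria) {
    double weightedSum = 0.0, totalContribution = 0.0;
    for (const CriterionScore& c : criteria) {
        assert(c.similarity >= 0.0 && c.similarity <= 100.0);
        if (c.similarity <= c.useIfGreaterThan) continue;
        const double effective = (c.similarity <= c.zeroIfLessOrEqual) ? 0.0 : c.similarity;
        weightedSum += effective * c.contribution;
        totalContribution += c.contribution;
    }
    return totalContribution > 0.0 ? weightedSum / totalContribution : 0.0;
}
```

Under these assumptions, a given name criterion (contribution 30, similarity 50) and a family name criterion (contribution 70, similarity 90) give a weighted score of 90 when Use in weighted score if greater than is 59 on the given name, because the given name is dropped and its share goes to the family name, but 63 when Zero weighted score if less or equal is 59 instead.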

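The Initials, Substring, and Abbreviation adjustment score options above share one rule: letters that match score 100, the remaining letters score the value you enter, and the two scores are combined proportionally. This C++ sketch assumes that "proportionally" means weighting each score by its letter count (counting single spaces between words in the substring case), that the Compare algorithm option is WORD, and that -1 signals a string that does not qualify. All function names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Shared rule (assumed): matched letters score 100, the remaining letters
// score `adjustment`, weighted by their letter counts.
double combine(size_t matched, size_t remaining, int adjustment) {
    assert(adjustment >= 0 && adjustment <= 100 && matched + remaining > 0);
    return (100.0 * matched + static_cast<double>(adjustment) * remaining) /
           static_cast<double>(matched + remaining);
}

std::vector<std::string> splitWords(const std::string& s) {
    std::istringstream in(s);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}

// Initials: each acronym letter must equal the first letter of its word.
double initialsScore(const std::string& acronym, const std::string& phrase, int adjustment) {
    const std::vector<std::string> words = splitWords(phrase);
    if (adjustment == 0 || acronym.empty() || acronym.size() != words.size()) return -1.0;
    size_t letters = 0;
    for (size_t i = 0; i < words.size(); ++i) {
        if (words[i][0] != acronym[i]) return -1.0;
        letters += words[i].size();
    }
    return combine(acronym.size(), letters - acronym.size(), adjustment);
}

// Substring: the shorter string's words must equal the longer string's leading words.
double substringScore(const std::string& shorter, const std::string& longer, int adjustment) {
    const std::vector<std::string> s = splitWords(shorter), l = splitWords(longer);
    if (adjustment == 0 || s.empty() || s.size() > l.size()) return -1.0;
    size_t matched = s.size() - 1, total = l.size() - 1;  // single spaces between words
    for (size_t i = 0; i < l.size(); ++i) {
        if (i < s.size()) {
            if (s[i] != l[i]) return -1.0;
            matched += s[i].size();
        }
        total += l[i].size();
    }
    return combine(matched, total - matched, adjustment);
}

// Abbreviation: same first letter, remaining letters found in order in the long word.
double abbreviationScore(const std::string& shortWord, const std::string& longWord,
                         int adjustment) {
    if (adjustment == 0 || shortWord.empty() || shortWord.size() > longWord.size() ||
        shortWord[0] != longWord[0]) {
        return -1.0;
    }
    size_t pos = 1;  // next position to search in longWord
    for (size_t i = 1; i < shortWord.size(); ++i) {
        pos = longWord.find(shortWord[i], pos);
        if (pos == std::string::npos) return -1.0;
        ++pos;
    }
    return combine(shortWord.size(), longWord.size() - shortWord.size(), adjustment);
}
```

With these assumptions, IHP compared with International Health Providers at an Initials adjustment score of 80 gives about 82.1 (3 matched letters, 25 remaining), Mayfield Painting compared with Mayfield Painting and Sand Blasting at a Substring adjustment score of 50 gives about 74.3, and Dse compared with Database at an Abbreviation adjustment score of 60 gives 75.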
17.9 Post match processing group

The Post Match Processing object defines the processing that should occur after the match groups have been formed. This object directs the group statistics and Input source select record operations.

The POST_MATCH_PROCESSING_GROUP is optional and cannot be repeated. The POST_MATCH_PROCESSING group within it is optional and can be repeated.

Option Description

Name: Specifies a logical name for the object. This name is referenced in the Match Level object. This option is required and cannot be repeated.

Operations: Defines one or more operations to perform. This group is required and cannot be repeated.


Op: Defines one operation to perform. This required group may be repeated.

Name: Specifies the name of the object that will perform the operation. The Name attribute can reference a Group Statistics, Input Source Group Statistics, or Input source select record group operation.

This option is required and cannot be repeated.

There are two possible post-match operations.

Group statistics

Use group statistics to generate statistical information about your group of matching records. Find out:

● the number of records within the match group
● the sequential group order number
● the group rank, which flags one record within each group of matching records as the Master record and all other records in the group as Subordinate records
● whether the records in a match group belong to more than one source

Group statistics are essential for generating data for match reports.

Input source select record

Use the Input source select record options to flag certain types of records for potential processing downstream.

17.10 Group statistics group

This group is available for configurations that do not set up input sources with the Input Sources group.

The GROUP_STATISTICS_GROUP is optional and cannot be repeated. The GROUP_STATISTICS group within it is optional and may be repeated.


The Group statistics group includes the following options:

Option Description

Name: Choose a name for this group statistics operation. If you include more than one in this Match transform, make sure that each name is unique.

This name is referenced in the Post Match Processing object. Minimum length is 1 and maximum length is 15.

This option is required and cannot be repeated.

Source type definitions: Defines the source types for this object. This section is required if any of the following output fields are used: GROUP_TYPE, SOURCE_COUNT, SOURCE_ID_COUNT, SOURCE_ID, and SOURCE_TYPE_ID. Otherwise, this section is optional. It cannot be repeated.

Definition: Defines one source type. This section is required and may be repeated.

Source type ID: Specifies the string that represents the source type ID. Valid values are "N" (normal), "S" (special), and "P" (purge or suppress). This option is required and cannot be repeated.

An "N" (Normal) source contains records considered to be good, eligible records.

A "P" (Suppress or purge) source contains records that would often disqualify a record from use.

An "S" (Special) source is treated like a Normal source, with one exception: a Special source is not counted when determining whether a match group is single-source or multi-source. A Special source can contribute records, but is not counted toward multi-buyer status.

Include in source count: Specifies whether this source type should be included in the source count. For example, the special source type and suppress source type are not included in the source (buyer) count. Valid values are YES or NO.

This option is required and cannot be repeated.

Source ID field: Defines the source ID field. This section is required if any of the following output fields are used: GROUP_TYPE, SOURCE_COUNT, SOURCE_ID_COUNT, SOURCE_ID, and SOURCE_TYPE_ID. Otherwise, this section is optional. It cannot be repeated.


Mapped name: Specifies the mapped name of the source ID input field.

This option is required and cannot be repeated.

Auto generate sources: Specifies the information needed to automatically generate the sources.

If this section is present, sources are automatically generated. If this section is omitted, sources are not automatically generated. This section is optional and cannot be repeated.

Source type ID field: Specifies the input field that holds the source type ID.

This section is optional and cannot be repeated.

Mapped name: Specifies the mapped name of the source type ID input field.

This option is required and cannot be repeated.

Predefined sources: Defines one or more predefined sources. Fill out this section if the only source information in the record is the source ID itself. This section allows assigning a source type to each predefined source.

This section is optional and cannot be repeated.

Source: Defines one source.

This group is required and may be repeated.

Source ID: Specifies the source ID. This value is compared against the value found in the source ID input field.

This option is required and cannot be repeated.

Source type ID: Specifies the source type that should be given to this source. This value should match a value of the SOURCE_TYPE_DEFINITIONS / DEFINITION / SOURCE_TYPE_ID option.

This option is required and cannot be repeated.

Default source ID: Specifies a default source ID. This value is used if the source ID input field is undefined or blank. It is also used if the AUTO_GENERATE_SOURCES section is not filled out and the source ID input field's value does not match any of the predefined sources. If the AUTO_GENERATE_SOURCES section is not filled out, this option must also match a predefined source.

This option is required if any of the following output fields are used: GROUP_TYPE, SOURCE_COUNT, SOURCE_ID_COUNT, SOURCE_ID, and SOURCE_TYPE_ID. Otherwise, this option is optional. It cannot be repeated.


Default source type ID: Specifies a default source type ID. The default source type ID is used if the DEFAULT_SOURCE_ID is used and the DEFAULT_SOURCE_ID does not match any ID defined so far (either predefined or auto-generated). It is also used if the source type ID field is blank or holds an invalid value (one that does not match any defined type). This option must match a predefined type.

This option is required if the DEFAULT_SOURCE_ID is filled out and if the DEFAULT_SOURCE_ID does not match a predefined ID. This option is also required if auto-generate IDs is enabled. Otherwise this option is optional and cannot be repeated.

The Group statistics group can post the following output fields:

● GROUP_COUNT
● GROUP_ORDER
● GROUP_RANK
● GROUP_TYPE
● SOURCE_COUNT
● SOURCE_ID_COUNT
● SOURCE_ID
● SOURCE_TYPE_ID
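As an illustration of how the Include in source count option affects SOURCE_COUNT, the following C++ sketch counts the distinct source IDs in one match group and skips source types whose option is set to NO. It assumes that SOURCE_COUNT is the number of distinct counted sources in the group and that blank source IDs have already been replaced by the Default source ID; GroupMember and sourceCount are hypothetical names, and the map stands in for the Source type definitions section.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical view of one record in a match group.
struct GroupMember {
    std::string sourceId;      // value of the source ID input field
    std::string sourceTypeId;  // "N", "S", or "P"
};

// Counts the distinct sources in one match group, ignoring source types whose
// "Include in source count" option is NO (assumed meaning of SOURCE_COUNT).
size_t sourceCount(const std::vector<GroupMember>& group,
                   const std::map<std::string, bool>& includeInSourceCount) {
    std::set<std::string> counted;
    for (const GroupMember& member : group) {
        assert(!member.sourceId.empty());  // blanks assumed replaced by the default ID
        const auto it = includeInSourceCount.find(member.sourceTypeId);
        if (it == includeInSourceCount.end() || !it->second) continue;  // not counted
        counted.insert(member.sourceId);
    }
    return counted.size();
}
```

Here, a group holding records from sources A and B plus a seed record from a Special source yields a count of 2, because the Special source type is not included in the source count.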

17.11 Input sources


This group defines your input sources. This group is optional and cannot be repeated.


Before you define your input sources, you will need to map a field that contains the value that identifies the input source.

Option Description

Default source name: This name is used when a record does not belong to any of the predefined sources and either the sources are not automatically generated, the VALUE input field is blank, or the maximum number of sources (10,000) has been reached.

This option can be up to 255 characters long, and the name specified must match a predefined source name. This option is required and cannot be repeated.

Sources: Defines sources.

Source: Defines one predefined source. An input record sequentially tries each source group until it finds the source group it belongs to. At least one group must be defined because the DEFAULT_SOURCE_NAME option must match a predefined source.

This group is required and may be repeated for up to 500 sources.

Name: Specifies a source name. A source name is a symbolic name for the source. This name is used in other locations of the XML and appears in the source statistics.

Minimum length is 1 and maximum length is 255. This option is required and cannot be repeated.

Value: Specifies a source value. The contents of an input record's VALUE input field are compared against this option. If they match, the input record gets this source definition. The comparison is case sensitive. Leading and trailing spaces are trimmed.

Minimum length is 1 and maximum length is 255. This option is required if this source is not the default source. Otherwise, this option is optional. It cannot be repeated.


Type: Specifies a source type. This option is required and cannot be repeated.

NORMAL: A Normal source contains good or eligible records.

SUPPRESS: A Suppress source contains records that would often disqualify a record from use. For example, if you’re using the software to refine a mailing list, a Suppress source removes records from the mailing. Examples:

● DMA Mail Preference File
● American Correctional Association prisons/jails lists
● No pandering or non-responder lists
● Credit card or bad-check suppression lists

SPECIAL: A Special source is treated like a Normal source, with one exception: a Special source is not counted when determining whether a match group is single-source or multi-source. A Special source can contribute records, but it is not counted toward multi-source status.

For example, some companies use a source of seed names. These are names of people who report when they receive advertising mail, so that the mailer can measure mail delivery. Appearance on the seed source should not be counted toward multi-source status.

Auto sources: The AUTO_SOURCES group defines the settings to use if sources are to be defined automatically. If this group is defined, Match pulls source definitions from the input fields; if it is omitted, sources are not defined from the input fields. The Match transform starts out with the source(s) defined in the SOURCE group. Then, as each record is processed, Match first checks whether the record belongs to a predefined source. If it does, Match assigns the record to that source. If it does not, Match checks whether the record belongs to an auto-defined source and, if so, uses that source. If no matching source has been defined yet, Match adds a new definition to the list of defined sources. If the maximum number of source definitions has been reached, Match uses the default source instead of adding a new definition. The VALUE input field must be defined if this group is defined. This group is optional and cannot be repeated.

Default type: Specifies the default source type. Valid values are NORMAL, SUPPRESS, or SPECIAL. This setting is used if the TYPE input field is not defined or does not have a value of "N" (normal), "P" (purge or suppress), or "S" (special).

This option is required and cannot be repeated.
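The lookup order described for Auto sources can be sketched in C++ as follows. This is a simplified model under stated assumptions, not the transform's code: an auto-defined source is named after its VALUE, the 10,000 limit counts predefined and auto-defined sources together, and the TYPE input field and source types are left out. SourceResolver and its members are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Removes leading and trailing spaces, as described for the Value option.
std::string trimSpaces(const std::string& s) {
    const size_t first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const size_t last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

class SourceResolver {
public:
    // predefined maps a Value option to its source Name.
    SourceResolver(std::map<std::string, std::string> predefined, std::string defaultName,
                   size_t maxSources = 10000)
        : predefined_(std::move(predefined)), defaultName_(std::move(defaultName)),
          maxSources_(maxSources) {
        assert(!defaultName_.empty());
    }

    // Returns the source name for one record's VALUE input field.
    std::string resolve(const std::string& rawValue) {
        const std::string value = trimSpaces(rawValue);  // comparison stays case sensitive
        if (value.empty()) return defaultName_;
        const auto pre = predefined_.find(value);
        if (pre != predefined_.end()) return pre->second;  // predefined source
        if (autoDefined_.count(value) != 0) return value;  // existing auto source
        if (predefined_.size() + autoDefined_.size() >= maxSources_) {
            return defaultName_;  // limit reached: fall back to the default source
        }
        autoDefined_.insert(value);  // otherwise define a new auto source
        return value;
    }

private:
    std::map<std::string, std::string> predefined_;
    std::string defaultName_;
    size_t maxSources_;
    std::set<std::string> autoDefined_;
};
```

Because the comparison is case sensitive, a VALUE of r1 does not match a predefined source whose value is R1; in this model it becomes a new auto-defined source instead.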

17.11.1 Source groups

The SOURCE_GROUPS group defines the Source Groups. A Source Group consists of one or more Input Sources and allows the user to logically combine Input Sources. For example, the user may wish to place all the rented Input Sources into one Source Group and all the house Input Sources into another. The user does not have to place every Input Source into a Source Group, so the record counts for Source Groups may be lower than the record count for Input Sources. If this group is omitted, no Source Group statistics are generated and the Source Group output fields will be empty or 0. This group is optional and cannot be repeated.

Adding source groups can provide you with additional statistics in certain Match reports.

Option Description

Undefined action: Specifies the action to take if an input source does not appear in a source group.

IGNORE: The input source does not belong to any source group.

DEFAULT: The input source belongs to the default source group specified in the Default source group option.

AUTO:

● If the Source group field option is not defined, then the input source belongs to a source group of the same name as the input source. The source group is created if necessary.
● If the Source group field option is defined, then the input source belongs to the source group named in the Source group field option. The source group is created if necessary.
● If the source group field's content is blank, then that input source does not belong to a source group (equivalent of IGNORE).

Default name: Specifies the default Source Group name. This option is required if UNDEFINED_ACTION is set to DEFAULT, and must match the NAME option in a SOURCE_GROUP group. See the UNDEFINED_ACTION option for details on when this option is used. Minimum length is 1 and maximum length is 255.

Source group group: Contains the configuration options for a Source Group.

Source group: Defines one Source Group.

This group is required if the UNDEFINED_SOURCE_GROUP_ACTION is set to DEFAULT; otherwise, it is optional. It may be repeated.

Name: Specifies a Source Group name. A Source Group name is a symbolic name for the Source Group. This name appears in the source statistics. Minimum length is 1 and maximum length is 255.

This option is required and cannot be repeated.


Sources: Specifies one source that belongs to this Source Group. This group is required if the SOURCE_GROUPS / SOURCE_GROUP / NAME option does not match the _SOURCES / DEFAULT_NAME option.

This group is required and may be repeated.

Name: Specifies an Input Source name. This name must match the name of a predefined source. Minimum length is 1 and maximum length is 255.
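The Undefined action rules can be read as a small decision function. The C++ sketch below is a hypothetical model, not SDK code: explicitGroups stands in for the Source group definitions (input source name to Source Group name), groupFieldDefined says whether the Source group field option is set, groupFieldValue is that field's content for the record, and an empty result means the input source belongs to no source group. Creating a source group "if necessary" is not modeled.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class UndefinedAction { Ignore, Default, Auto };

// Returns the Source Group for an input source, or "" for no group.
std::string sourceGroupFor(const std::string& inputSource,
                           const std::map<std::string, std::string>& explicitGroups,
                           UndefinedAction action, const std::string& defaultName,
                           bool groupFieldDefined, const std::string& groupFieldValue) {
    const auto it = explicitGroups.find(inputSource);
    if (it != explicitGroups.end()) return it->second;  // source appears in a source group
    switch (action) {
    case UndefinedAction::Ignore:
        return "";
    case UndefinedAction::Default:
        assert(!defaultName.empty() && "Default name is required for DEFAULT");
        return defaultName;
    case UndefinedAction::Auto:
        // No Source group field: use a group named after the input source.
        // Otherwise use the field's content; blank content means no group.
        return groupFieldDefined ? groupFieldValue : inputSource;
    }
    return "";
}
```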

17.11.2 Input sources / Input fields


This group is required if the AUTO_SOURCES group is defined, otherwise this group is optional and cannot be repeated.

Option Description

Field: Defines one Input Field. This optional group may be repeated for up to 3 fields.

Mapped name: Specifies the mapped name of the Input Field. It should match the name of a field in the INPUT_FIELDS group. This option is required and cannot be repeated.

Field type: Specifies the type of data the input field holds. This option is optional and cannot be repeated.

VALUE: This field type must be defined if any of the Input source definitions are using the SOURCE / VALUE option or if Auto sources are used. All the other input field types are optional.

TYPE: This field provides a source type when auto-generating input sources.

SOURCE_GROUP_NAME: This field provides a Source Group name for a record that does not belong to a predefined Source Group.

17.12 Field algorithm numeric difference group


Options Description

Name: Specifies a logical name for the Field Algorithm. Any FIELD_ALGORITHM attribute of a Compare Match Criteria object may reference it. This is a required attribute.

Max difference: Specifies the maximum difference allowed. Any difference larger than MAX_DIFFERENCE receives a score of 0. A difference equal to MAX_DIFFERENCE receives a score of MAX_DIFFERENCE_SCORE. Any difference less than MAX_DIFFERENCE receives a proportional score between MAX_DIFFERENCE_SCORE and 100.

Max difference score: Specifies the score to generate when the difference equals MAX_DIFFERENCE. Valid values for this required attribute range from 0 to 100.
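A minimal C++ sketch of the numeric difference scoring rule above. The guide says only that smaller differences receive a "proportional" score; this sketch assumes a straight-line interpolation from 100 (no difference) down to MAX_DIFFERENCE_SCORE (a difference equal to MAX_DIFFERENCE). The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Score for one pair of values under the numeric difference rule
// (linear interpolation assumed below maxDifference).
double numericDifferenceScore(double a, double b, double maxDifference,
                              double maxDifferenceScore) {
    assert(maxDifference >= 0.0);
    assert(maxDifferenceScore >= 0.0 && maxDifferenceScore <= 100.0);
    const double diff = std::fabs(a - b);
    if (diff > maxDifference) return 0.0;                  // too far apart
    if (diff == maxDifference) return maxDifferenceScore;  // exactly at the limit
    return 100.0 - (100.0 - maxDifferenceScore) * (diff / maxDifference);
}
```

For example, with MAX_DIFFERENCE 10 and MAX_DIFFERENCE_SCORE 50, the values 100 and 105 score 75, 100 and 110 score 50, and 100 and 111 score 0.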

17.13 Field algorithm numeric percent difference group


Options Description

Name: Specifies a logical name for the Field Algorithm. Any FIELD_ALGORITHM attribute of a Compare Match Criteria object may reference it. This is a required attribute.

Max percent difference: Specifies the maximum difference allowed, as a percentage of the absolute driver value. Valid values for this required attribute range from 0 to 100.


Max percent difference score: Specifies the score to generate when the difference equals MAX_PERCENT_DIFFERENCE. Valid values for this required attribute range from 0 to 100. Any difference larger than MAX_PERCENT_DIFFERENCE receives a score of 0. A difference equal to MAX_PERCENT_DIFFERENCE receives a score of MAX_PERCENT_DIFFERENCE_SCORE. Any difference less than MAX_PERCENT_DIFFERENCE receives a proportional score between MAX_PERCENT_DIFFERENCE_SCORE and 100.
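The percent variant can be sketched the same way in C++. Assumptions, none of which the guide states: the driver value is the driver record's value, the difference is taken as a percentage of its absolute value, the proportional score is linear, and a zero driver value with a nonzero difference scores 0 because no percentage can be formed. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Score under the numeric percent difference rule (assumed linear scoring).
double percentDifferenceScore(double driverValue, double otherValue,
                              double maxPercentDifference, double maxPercentDifferenceScore) {
    assert(maxPercentDifference >= 0.0 && maxPercentDifference <= 100.0);
    assert(maxPercentDifferenceScore >= 0.0 && maxPercentDifferenceScore <= 100.0);
    const double absDiff = std::fabs(driverValue - otherValue);
    if (driverValue == 0.0 && absDiff != 0.0) return 0.0;  // no percentage base (assumption)
    const double pct = (absDiff == 0.0) ? 0.0 : 100.0 * absDiff / std::fabs(driverValue);
    if (pct > maxPercentDifference) return 0.0;
    if (pct == maxPercentDifference) return maxPercentDifferenceScore;
    return 100.0 - (100.0 - maxPercentDifferenceScore) * (pct / maxPercentDifference);
}
```

With MAX_PERCENT_DIFFERENCE 10 and MAX_PERCENT_DIFFERENCE_SCORE 60, a driver value of 200 compared with 190 (a 5% difference) scores 80. Because the percentage is based on the driver value, swapping the two values can change the score.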

17.14 Field algorithm geo proximity group


Options Description

Name: Specifies a logical name for the Field Algorithm. Any FIELD_ALGORITHM attribute of a Compare Match Criteria object may reference it. This is a required option.

Distance unit: Specifies the unit in which MAX_DISTANCE is expressed and in which the calculated distance is reported. Valid values for this required option are FEET, MILES, METERS, and KILOMETERS.

Max distance: Specifies the maximum distance allowed. Any distance larger than MAX_DISTANCE receives a score of 0. A distance equal to MAX_DISTANCE receives a score of MAX_DISTANCE_SCORE. Any distance less than MAX_DISTANCE receives a proportional score between MAX_DISTANCE_SCORE and 100. Valid values for this required option range from 0 to the maximum double value.

Max distance score: Specifies the score to generate when the distance equals MAX_DISTANCE. Valid values for this required option range from 0 to 100.
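The guide does not state how the distance between two records is computed. As one possible approach, the C++ sketch below uses the haversine great-circle formula on latitude and longitude in degrees with a mean Earth radius of 6,371 km (a swapped-in technique, not necessarily what the transform uses), converts the result to the configured distance unit, and scores it with an assumed linear interpolation between 100 and MAX_DISTANCE_SCORE. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

enum class DistanceUnit { Feet, Miles, Meters, Kilometers };

// Great-circle distance by the haversine formula, returned in `unit`.
double haversineDistance(double lat1, double lon1, double lat2, double lon2,
                         DistanceUnit unit) {
    const double kPi = 3.14159265358979323846;
    const double kEarthRadiusKm = 6371.0;  // mean Earth radius
    const double toRad = kPi / 180.0;
    const double dLat = (lat2 - lat1) * toRad;
    const double dLon = (lon2 - lon1) * toRad;
    const double h = std::sin(dLat / 2) * std::sin(dLat / 2) +
                     std::cos(lat1 * toRad) * std::cos(lat2 * toRad) *
                     std::sin(dLon / 2) * std::sin(dLon / 2);
    const double km = 2.0 * kEarthRadiusKm * std::asin(std::sqrt(std::fmin(1.0, h)));
    switch (unit) {
    case DistanceUnit::Kilometers: return km;
    case DistanceUnit::Meters:     return km * 1000.0;
    case DistanceUnit::Miles:      return km / 1.609344;
    case DistanceUnit::Feet:       return km / 1.609344 * 5280.0;
    }
    return km;
}

// Scores a distance with the MAX_DISTANCE / MAX_DISTANCE_SCORE rule.
double geoProximityScore(double distance, double maxDistance, double maxDistanceScore) {
    assert(distance >= 0.0 && maxDistance >= 0.0);
    assert(maxDistanceScore >= 0.0 && maxDistanceScore <= 100.0);
    if (distance > maxDistance) return 0.0;
    if (distance == maxDistance) return maxDistanceScore;
    return 100.0 - (100.0 - maxDistanceScore) * (distance / maxDistance);
}
```

Computing the distance in the same unit as MAX_DISTANCE keeps the comparison meaningful, which matches the Distance unit option's role for both the tolerance and the calculated result.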

17.15 Input source group statistics group


The Input Source Group Statistics object generates statistics that can be posted with output fields. Its functionality is similar to that of the Group Statistics object. The main difference is that the Input Source Group Statistics object generates statistics on the Input Sources that are defined, whereas the Group Statistics object is standalone and independent. If this object is used, the Input Source object must be defined. This object is optional and may be repeated.

Option Description

Name: Specifies a logical name for the object. This name is referenced in the Post Match Processing object. Minimum length is 1 and maximum length is 15.

This option is required and cannot be repeated.

The Input Source Group Statistics group can post the following output fields:

● GROUP_COUNT
● GROUP_SOURCE_TYPE
● GROUP_ORDER
● GROUP_RANK
● GROUP_SOURCE_APPEARANCE
● GROUP_SOURCE_ORDER
● GROUP_SOURCE_GROUP_APPEARANCE
● GROUP_SOURCE_GROUP_ORDER
● MULTI_SOURCE_COUNT
● SOURCE_COUNT
● SOURCE_GROUP_COUNT

17.16 Input source select record group

Select types of records you want to flag on output based on each of the input sources. You may want to flag these records so that they will be available for writing to output.

This group generates an output field called SELECT_RECORD.

This is a repeatable operation.

Record type Description

Name: Enter a unique name for this operation that will allow you to identify it in a report and in your Select_Record output field. This name is referenced in the Post match processing group.

For example, suppose you have two Input source select record operations in this match level: DMA_Matches and Mail_List. Your output fields are then called:

● _DMA_Matches_Select_Record
● _Mail_List_Select_Record

Select unique records: Records that are not members of any match group; no matching records were found. These can be from normal- or special-type sources.

Valid values are YES or NO.

Select single source masters: Highest ranking member of a match group whose members all came from the same source. Can be from normal- or special-type sources.

Valid values are YES or NO.

Select single source subordinates: A record that came from a normal- or suppress-type source and is a subordinate member of a match group.

Valid values are YES or NO.

Select multiple source masters: Highest ranking member of a match group whose members came from two or more sources. Can be from normal- or special-type sources.

Valid values are YES or NO.

Select multiple source subordinates: A subordinate record that came from a normal- or suppress-type source, in a match group whose members came from two or more sources.

Valid values are YES or NO.

Select suppress source matches: Subordinate member of a match group that includes a higher-priority record that came from a suppress-type source. Can be from normal- or special-type sources.

Valid values are YES or NO.

Select suppress source uniques: Records that came from a suppress-type source and for which no matching records were found.

Valid values are YES or NO.

Select suppress source masters: A record that came from a suppress-type source and is the highest ranking member of a match group.

Valid values are YES or NO.

Select suppress source subordinates: A record that came from a suppress-type source and is a subordinate member of a match group.

Valid values are YES or NO.

18 Data Quality fields

This section lists the mapped input fields and generated output fields for each transform (when available).

Note

The field lengths given are recommendations only. The maximum field length for each field is 32,000 characters.

18.1 Input fields


The Input fields group declares all the fields and their properties that the SDK transform will import from other transforms. This is a required group that cannot be repeated.

Option Description

Field: Defines one input field.

This group is required and may be repeated.

Data record field name: Specifies the name of the input field.

Mapped name: Specifies the mapped name of the input field. The mapped name is used in the transform to reference the input field.

Data type: Describes what type of data will be presented to the transform for the specified input field. Valid values for this optional setting are CHARACTER_UCS2 (default), INTEGER32, DOUBLE, DECIMAL, DATE, TIME, and DATE_TIME.


Data length: Specifies the length of the input field.

The transform may use the length setting for insight into how much data it can expect from a field.

● For the character field types (CHARACTER_UCS2 and CHARACTER_UTF16), this setting would normally be required, but for backward compatibility it defaults to unlimited if omitted.
● For the fixed-length field types (INTEGER32, DOUBLE, and DATE), this setting is ignored.
● For the DECIMAL field type, this setting is required and represents the precision.
● For the TIME and DATE_TIME fields, this setting can be omitted, because the length can be calculated from the DATA_SCALE setting. The TIME length is DATA_SCALE plus 6 (HHMMSS). The DATE_TIME length is DATA_SCALE plus 14 (YYYYMMDDHHMMSS).

Data scale: Specifies the scale of the input field. The scale is the number of digits to the right of the decimal place. This setting is required for the DECIMAL, TIME, and DATE_TIME field types.
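The TIME and DATE_TIME length rule above is simple arithmetic; the C++ sketch below spells it out. Passing the data type as a string and returning 0 when the rule does not apply are illustrative choices, not SDK conventions, and impliedDataLength is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Length implied by DATA_SCALE for TIME (HHMMSS) and DATE_TIME
// (YYYYMMDDHHMMSS) fields; returns 0 for types where the rule does not apply.
int impliedDataLength(const std::string& dataType, int dataScale) {
    assert(dataScale >= 0);
    if (dataType == "TIME") return 6 + dataScale;
    if (dataType == "DATE_TIME") return 14 + dataScale;
    return 0;
}
```

For example, a TIME field with DATA_SCALE 3 has a length of 9, and a DATE_TIME field with DATA_SCALE 6 has a length of 20.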

18.2 Output fields


The Output Fields group declares all the fields and their properties that the SDK transform will make available for export. This is an optional object that cannot be repeated.

Option Description

Field: Defines one output field.

This group is optional and may be repeated.

Data record field name: Specifies the name of the output field.

Match field name: Specifies which match result should be posted to the output field. Valid values for this attribute depend on which Match objects are defined. This attribute is required and cannot be repeated.


Data type Describes what type of data the transform will present for the specified output field. Valid values for this optional setting are CHARACTER_UCS2 (default), INTEGER32, DOUBLE, DECIMAL, DATE, TIME, and DATE_TIME.

Data length Specifies the length of the output field.

The transform may use the length setting for insight into how much data it can expect from a field.

● For the character field types (CHARACTER_UCS2 and CHARACTER_UTF16), this setting is normally required. For backward compatibility, however, it defaults to unlimited if omitted. Note: DS specifies an unlimited length as -1, so the length must be declared as a signed integer in the XSD file.
● For the fixed-length field types (INTEGER32, DOUBLE, and DATE), this setting is ignored.
● For the DECIMAL field type, this setting is required and represents the precision.
● For the TIME and DATE_TIME field types, this setting can be omitted because the length can be calculated from the DATA_SCALE setting. The TIME length is DATA_SCALE plus 6 (HHMMSS). The DATE_TIME length is DATA_SCALE plus 14 (YYYYMMDDHHMMSS).

Data scale Specifies the scale of the output field, that is, the number of digits to the right of the decimal point. This setting is required for the DECIMAL, TIME, and DATE_TIME field types.
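Because -1 is a sentinel meaning "unlimited" rather than an ordinary length, code that checks a value against a declared character length should handle it explicitly. A minimal sketch follows; `fitsDeclaredLength` and `kUnlimitedLength` are hypothetical names, not SDK API, and treating any other negative length as invalid is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Length value that DS uses to mean "unlimited" (see the note above).
constexpr std::int32_t kUnlimitedLength = -1;

// Hypothetical check (not an SDK call): returns true if a value of
// valueLength characters fits a field declared with declaredLength.
bool fitsDeclaredLength(std::size_t valueLength, std::int32_t declaredLength) {
    if (declaredLength == kUnlimitedLength) {
        return true;   // unlimited length: any value fits
    }
    if (declaredLength < 0) {
        return false;  // other negative lengths are treated as invalid here
    }
    return valueLength <= static_cast<std::size_t>(declaredLength);
}
```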

18.3 Data type support

Syntax

The following tables describe the data type support for each of the supported programming languages.

Table 7: Character data

Data Quality Management SDK supports UCS-2 character data.

Language Data type

C++ uint_16, char

.NET .NET String

Java Java String

Table 8: Integer data type (32-bit)

Language Data type

C++ int32_t

.NET .NET System.Int32 (32-bit signed integer)

Java Java int (signed 32-bit two's complement)

Table 9: Double data type

Language Data type

C++ double

.NET .NET System.Double (double-precision 64-bit floating-point number)

Java Java double (64-bit IEEE 754 floating point)

Table 10: Decimal data type

Language Data type

C++ Native C++ double

.NET .NET System.Decimal

Java Java java.math.BigDecimal

Table 11: Date data type

Language Data type

C++ Date class

.NET .NET System.DateTime

Java Java java.util.Date

Table 12: Time data type

Language Data type

C++ Time class

.NET .NET System.DateTime

Java Java java.util.Date

Table 13: Date-Time data type

Language Data type

C++ DateTime class

.NET .NET System.DateTime

Java Java java.util.Calendar
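Table 10 maps DECIMAL to a native double in C++, while Java and .NET use decimal types (java.math.BigDecimal and System.Decimal). Since the Data length of a DECIMAL field is its precision, a C++ caller may want to check whether a declared precision can be held in a double without losing digits. A minimal sketch, assuming IEEE 754 double precision; `decimalPrecisionFitsDouble` is a hypothetical name, not an SDK function.

```cpp
#include <limits>

// Hypothetical check (not part of the SDK).
// std::numeric_limits<double>::digits10 is the number of significant decimal
// digits that survive a decimal -> double -> decimal round trip
// (15 for an IEEE 754 double).
bool decimalPrecisionFitsDouble(int precision) {
    return precision > 0 && precision <= std::numeric_limits<double>::digits10;
}
```

Under that assumption, DECIMAL values declared with a precision above 15 may be rounded when they pass through the C++ API.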

18.4 Data Cleanse fields

The Data Cleanse transform requires that you map fields on input and output. The input mappings tell Data Cleanse what type of data each field may contain.

18.4.1 Input fields

The following are recognized input fields that you can use in the input mapping for the Data Cleanse transform. The fields are listed alphabetically.

Name Description

Date1-6 Date. For example, 08/16/2004.

Email1-6 E-mail address.

Firm_Line1-2 Firm name, firm location, or both.

Firm_Location1-2 Location within a company or organization, such as a department, mail stop, room, or building.

Firm_Name1-2 Name of a company or organization.

Multiline1-12 Multiline data. Item types from this input are parsed in the order set in the Parser Sequence Multiline option.

Name_Line1-6 Whole name or names. May include job title.

Name_Or_Firm_Line1-6 Name of a person or organization.

Person1_Family_Name1 Discrete family name (for example, Smith).

Person2_Family_Name1

Person1_Family_Name2 Second discrete family name.

Person2_Family_Name2 May be useful for cultures where people are known by both paternal and maternal family names. If your input data contains two family name fields, map the first to Person1_Family_Name1 and the second to Person1_Family_Name2.

Person1_Given_Name1-2 Discrete given names (for example, John or B.).

Person2_Given_Name1-2

Person1_Honorary_Postname Honorary postname indicating certification, academic degree, or affiliation, such as CPA.

Person2_Honorary_Postname

Person1_Maturity_Postname Maturity postname indicating heritage, such as Jr., Sr., III.

Person2_Maturity_Postname


Person1_Prename Discrete prename, such as Mr., Mrs., Dr., or Lt. Col.

Person2_Prename

Person1_Title Discrete job title, such as Software Engineer.

Person2_Title

Phone1-6 Phone number.

SSN1-6 U.S. Social Security number.

Title_Line1-6 Job title (for example, Accountant).

UDPM1-4 Input field associated with patterns and rules defined in the User-defined type of Reference Data in Cleansing Package Builder. For example, CN244-56.

18.4.2 Output fields

The following are recognized output fields that you can use in the output mapping for the Data Cleanse transform. The Extra fields are always available by default; additional output fields are displayed based on the mapped input fields and the selected Parser Sequence Multiline options. You can use the Filter Output Fields option to display a complete list of output fields. The fields are listed alphabetically.

Generated field name Description

Date A date that is parsed.

Date_Day The day that is parsed from the date.

Date_Month The month that is parsed from the date.

Date_Year The year that is parsed from the date.

Dual_Name Set of components resulting from one input field that contains two names separated by a connecting word such as "and" or "or."

Example 1:

Input: Terry and Kris Johnson

Output: Terry Johnson and Kris Johnson

Example 2:

Input: Terry Johnson or Kris Adams

Output: Terry Johnson or Kris Adams

Email An entire e-mail address.

Email_Domain_All The domain of the e-mail address. For example, sap.com.

Email_Domain_Fifth In an e-mail address with more than one domain listed, this field parses the fifth-to-last domain.


Email_Domain_Fourth In an e-mail address with more than one domain listed, this field parses the fourth-to-last domain.

Email_Domain_Host The host of the e-mail address (the first item listed after the @ symbol). For example, in "joex@sap.com", "sap" is returned.

Email_Domain_Second In an e-mail address with more than one domain listed, this field parses the second-to-last domain.

Email_Domain_Third In an e-mail address with more than one domain listed, this field parses the third-to-last domain.

Email_Domain_Top The last listed domain of the e-mail address. For example, .com.

Email_Is_ISP Indicates whether the e-mail address belongs to a known ISP (Internet service provider).

Email_User The user name of the e-mail address. For example, in "joex@sap.com", "joex" is returned.
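The e-mail output fields above split an address into a user part and domain labels counted from the end. The following sketch illustrates that decomposition under stated assumptions: it splits at the last @ and at each period, returns labels without the leading dot, and uses hypothetical names (`EmailParts`, `splitEmail`, `domainHost`, `domainFromEnd`). It is not how Data Cleanse is implemented and performs no ISP lookup.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper type; not part of the Data Quality Management SDK.
struct EmailParts {
    std::string user;                 // corresponds to Email_User
    std::string domainAll;            // corresponds to Email_Domain_All
    std::vector<std::string> labels;  // domain labels, left to right
};

// Splits an address at its last '@' and the domain at each '.'.
std::optional<EmailParts> splitEmail(const std::string& email) {
    const std::size_t at = email.rfind('@');
    if (at == std::string::npos || at == 0 || at + 1 >= email.size()) {
        return std::nullopt;  // no user part or no domain part
    }
    EmailParts parts{email.substr(0, at), email.substr(at + 1), {}};
    std::size_t start = 0;
    while (true) {
        const std::size_t dot = parts.domainAll.find('.', start);
        // When dot is npos, substr copies to the end of the string.
        parts.labels.push_back(parts.domainAll.substr(start, dot - start));
        if (dot == std::string::npos) break;
        start = dot + 1;
    }
    return parts;
}

// First label after '@' (compare Email_Domain_Host).
std::string domainHost(const EmailParts& p) { return p.labels.front(); }

// n = 1 is the last label (compare Email_Domain_Top), n = 2 the
// second-to-last (Email_Domain_Second), and so on; "" if out of range.
std::string domainFromEnd(const EmailParts& p, std::size_t n) {
    if (n == 0 || n > p.labels.size()) return "";
    return p.labels[p.labels.size() - n];
}
```

With the hypothetical address joex@mail.sap.com, the sketch yields the user joex, the host mail, the top label com, and the second-to-last label sap.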

Extra Any data that is not parsed by any of the active parsers, and that Data Cleanse therefore does not recognize as fitting one of the other output fields.

Family_Name1 Family name (for example, Smith).

Family_Name1_Match_Std1-6 The match standard for family names.

This field is only used with a cleansing package that includes name data in more than one script. The match standards include the name as it is written in alternate script types. For example, for a family name included in the Japanese dictionary in kanji script, the match standards include kana renditions of the name.

If the dictionary does not have an alias entry, the output field is empty.

Family_Name2 Second family name. May be used to output the paternal and maternal family names to separate fields.

Note

With Japanese data, if the input contains both the official name (usually in kanji) and the pronunciation name (known as the furigana and usually in hiragana or katakana), then the official name is output to Family_Name1/Given_Name1 and the pronunciation name is output to Family_Name2/Given_Name2.

Family_Name2_Match_Std1-6 The match standard for second family names.

This field is only used with a cleansing package that includes name data in more than one script. The match standards include the name as it is written in alternate script types. For example, for a family name included in the Japanese dictionary in kanji script, the match standards include kana renditions of the name.

If the dictionary does not have an alias entry, the output field is empty.


Firm The name of a company or organization.

Firm_Location A location within a company or organization, such as a department. For example, Mailstop.

Firm_Match_Std1-6 The match standard for firms. For example, HP is the match standard or alias for Hewlett Packard.

If the dictionary does not have an alias entry, the output field is empty.

Firm_Location_Match_Std1-6 The match standard for firm locations. For example, MS is the match standard or alias for mailstop.

If the dictionary does not have an alias entry, the output field is empty.

Gender The gender description. The following output is available:

Ambiguous: The name does not reliably indicate a gender. The name could be either male or female. For example, Pat.

Male_Strong: High confidence that the person is male. That is, the name belongs to someone who is almost certainly a male. For example, John.

Male_Weak: Some confidence that the person is male. That is, the name belongs to someone who is probably male. For example, Terry.

Female_Strong: High confidence that the person is female. That is, the name belongs to someone who is almost certainly a female. For example, Mary.

Female_Weak: Some confidence that the person is female. That is, the name belongs to someone who is probably a female. For example, Lynn.

For dual names, the following output is also available:

Multi_Names_Ambiguous: At least one of the names does not reliably indicate a gender. For example, Pat and John.

Multi_Names_Female: Some or high confidence that both of the names belong to people who are female. For example, Mary and Lynn.

Multi_Names_Male: Some or high confidence that both of the names belong to people who are male. For example, John and Terry.

Multi_Names_Mixed: Some or high confidence that one of the names belongs to a person who is female, and the other name belongs to a person who is male. For example, Lynn and John.


Gender_ID A numeric value that corresponds to the gender description:

0: Unassigned

1: Male_Strong

2: Male_Weak

3: Ambiguous

4: Female_Weak

5: Female_Strong

6: Multi_Names_Mixed

7: Multi_Names_Male

8: Multi_Names_Female

9: Multi_Names_Ambiguous
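When an application reads Gender_ID rather than Gender, it needs the reverse mapping from the number to the description. A minimal lookup sketch built directly from the list above; `genderDescription` and `kGenderDescriptions` are hypothetical names, and returning an empty string for an ID outside 0-9 is an assumption of this sketch.

```cpp
#include <array>
#include <string>

// Gender_ID values as listed above, indexed by the numeric ID (0-9).
const std::array<const char*, 10> kGenderDescriptions = {
    "Unassigned",        "Male_Strong",      "Male_Weak",
    "Ambiguous",         "Female_Weak",      "Female_Strong",
    "Multi_Names_Mixed", "Multi_Names_Male", "Multi_Names_Female",
    "Multi_Names_Ambiguous"};

// Returns the Gender description for a Gender_ID, or an empty string if the
// ID is outside the documented range.
std::string genderDescription(int genderId) {
    if (genderId < 0 || genderId >= static_cast<int>(kGenderDescriptions.size())) {
        return "";
    }
    return kGenderDescriptions[genderId];
}
```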

Given_Name1 Given name (for example, Robert).

Given_Name1_Match_Std1-6 The match standard for given names. For example, the application can tell you that Patrick and Patricia are potential matches for the given name Pat.

Match standards can help you overcome two types of matching problems: alternate spellings (Catherine and Katherine) and nicknames (Pat and Patrick).

Given_Name2 Second given name.

Given_Name2_Match_Std1-6 The match standard for second given names. For example, the application can tell you that Patrick and Patricia are potential matches for the given name Pat.

Honorary_Postname Honorary postname indicating certification, academic degree, or affiliation. For example, CPA.

Honorary_Postname_Match_Std1-6 The match standard for an honorary postname. For example, M.B.A. is the match standard or alias for MBA.

If the dictionary does not have an alias entry, the output field is empty.

International_Phone The entire international phone number, including extra items such as the country code.

International_Phone_Country_Code The country code of an international phone number.

International_Phone_Country_Name The name of the country of origin of an international phone number.

International_Phone_Line The portion of the international phone number that is not the country code or the city code.

International_Phone_Locality_Code The city code of an international phone number.


Match_Family_Name The combined standardized form of Family_Name1 and Family_Name2, separated by a space, used in the Match transform during the comparison process. Data is output in uppercase, apostrophes are removed, and other punctuation is replaced with a single space. PreFamilyName data is removed.
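The standardization rules shared by the Match_* fields (uppercase, remove apostrophes, replace other punctuation with a single space) can be illustrated with a short sketch. It is only an approximation under stated assumptions: `matchStandardize` and `matchFamilyName` are hypothetical names, runs of punctuation and whitespace are collapsed to one space and trimmed, only ASCII letters are uppercased, and the transform's removal of PreFamilyName data and noise words is not reproduced.

```cpp
#include <cctype>
#include <string>

// Hypothetical sketch of the Match_* normalization rules described above.
std::string matchStandardize(const std::string& in) {
    std::string out;
    bool lastWasSpace = false;
    for (unsigned char c : in) {
        if (c == '\'') continue;  // apostrophes are removed
        if (std::ispunct(c) || std::isspace(c)) {
            // other punctuation (and whitespace) becomes a single space
            if (!out.empty() && !lastWasSpace) {
                out += ' ';
                lastWasSpace = true;
            }
            continue;
        }
        out += static_cast<char>(std::toupper(c));
        lastWasSpace = false;
    }
    if (!out.empty() && out.back() == ' ') out.pop_back();  // trim trailing space
    return out;
}

// Combines two standardized family names with a space, as for Match_Family_Name.
std::string matchFamilyName(const std::string& family1, const std::string& family2) {
    const std::string a = matchStandardize(family1);
    const std::string b = matchStandardize(family2);
    if (a.empty()) return b;
    if (b.empty()) return a;
    return a + " " + b;
}
```

With these rules, O'Brien becomes OBRIEN and Williams-Doyle becomes WILLIAMS DOYLE.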

Match_Firm A form of Firm that may be used in the Match transform during the comparison process.

Data is output in uppercase, apostrophes are removed, other punctuation is replaced with a single space, and data that is extraneous for matching purposes is removed. This extraneous data includes business types such as Ltd. and GmbH, and noise words such as The, And, and Of.

Note

Some words are classified to be removed from all domains, while others are language-specific and are classified to be removed in specific cultural domains.

Match_Given_Name1 The standardized form of Given_Name1 used in the Match transform during the comparison process. Data is output in uppercase, apostrophes are removed, and other punctuation is replaced with a single space. PreGivenName data is removed.

Match_Given_Name2 The standardized form of Given_Name2 used in the Match transform during the comparison process. Data is output in uppercase, apostrophes are removed, and other punctuation is replaced with a single space. PreGivenName data is removed.

Match_Maturity_Postname The standardized form of Maturity_Postname used in the Match transform during the comparison process. Data is output in uppercase, apostrophes are removed, and other punctuation is replaced with a single space.

Match_Person A form of Person that may be used in the Match transform during the comparison process. Data is output in uppercase, apostrophes are removed, other punctuation is replaced with a single space, and data that is extraneous for matching purposes is removed. Extraneous data includes pre-given name, pre-family name, and prename as well as honorary and maturity postnames and name designators.

Match_Phone The form of Phone used in the Match transform during the comparison process. Data is output as a string of digits. Spaces, punctuation, alphabetical characters, and leading zeros are removed.
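The Match_Phone rule reduces a phone number to a bare digit string. A minimal sketch of that rule follows; `matchPhone` is a hypothetical name and the code is an illustration of the documented behavior, not the transform's implementation. Returning an empty string when the input contains only zeros is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical sketch of the Match_Phone rules: keep digits only, then strip
// leading zeros.
std::string matchPhone(const std::string& phone) {
    std::string digits;
    for (unsigned char c : phone) {
        if (std::isdigit(c)) {
            digits += static_cast<char>(c);  // spaces, punctuation, letters dropped
        }
    }
    const std::size_t first = digits.find_first_not_of('0');
    return first == std::string::npos ? std::string() : digits.substr(first);
}
```

For example, (608) 555-5555 becomes 6085555555.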

Match_Prename The standardized form of Prename used in the Match transform during the comparison process. Data is output in uppercase, apostrophes are removed, and other punctuation is replaced with a single space.

Maturity_Postname Maturity postname indicating heritage, such as Jr., Sr., III.


Maturity_Postname_Match_Std1-6 The match standard for a maturity postname. For example, SR. is a match standard or alias for Senior.

If the dictionary does not have an alias entry, the output field is empty.

Name_Connector The connector component of a dual name. For example, and.

Name_Designator Name designator such as Attn: or c/o.

Name_Special Term that generically describes a person. For example, occupant or current resident.

North_American_Phone An entire North American Numbering Plan (NANP) phone number.

North_American_Phone_Area_Code The area code parsed from the phone number.

North_American_Phone_Extension An extension parsed from the phone number.

North_American_Phone_Line The last four digits (excluding an extension) parsed from a phone number. In (608) 555-5555, 5555 is returned.

North_American_Phone_Prefix The middle three digits parsed from a phone number. In (608) 555-5555, 555 is returned.

North_American_Phone_Type The type of phone number that was parsed, if it is included with the input. For example, Home or Work.
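The area code, prefix, and line fields follow the fixed 3-3-4 layout of a ten-digit NANP number. The following sketch shows that split for the (608) 555-5555 example. `NanpParts` and `splitNanp` are hypothetical names; the sketch assumes separators can be ignored, drops a leading country code 1 from 11-digit input, and does not handle extensions or phone types, all of which the transform parses itself.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical result type mirroring three of the output fields above.
struct NanpParts {
    std::string areaCode;  // North_American_Phone_Area_Code
    std::string prefix;    // North_American_Phone_Prefix
    std::string line;      // North_American_Phone_Line
};

// Splits a ten-digit NANP number; returns nullopt for any other digit count.
std::optional<NanpParts> splitNanp(const std::string& phone) {
    std::string digits;
    for (unsigned char c : phone) {
        if (std::isdigit(c)) digits += static_cast<char>(c);
    }
    if (digits.size() == 11 && digits[0] == '1') {
        digits.erase(0, 1);  // assumption: drop a leading country code 1
    }
    if (digits.size() != 10) return std::nullopt;
    return NanpParts{digits.substr(0, 3), digits.substr(3, 3), digits.substr(6, 4)};
}
```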

Person Set of components that define a single person.

For example, Thomas Williams-Doyle Sr., M.D.

Prename Prename (for example, Mr.).

Prename_Match_Std1-6 The match standard for a prename. For example, MR. is the match standard or alias for Mister.

If the dictionary does not have an alias entry, the output field is empty.

Rule_Label Retrieves the rule that parsed the indicated item.

Score Retrieves the confidence score for a parsed item.

SSN The entire Social Security number.

SSN_Area The first three numbers of the Social Security number.

SSN_Group The fourth and fifth numbers within a Social Security number.

SSN_Serial The last four numbers in a Social Security number.
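The SSN_Area, SSN_Group, and SSN_Serial fields are fixed-position slices of a nine-digit number. A minimal sketch of the split, using the commonly used sample value 123-45-6789; `SsnParts` and `splitSsn` are hypothetical names, separators are assumed to be ignorable, and no validity checks (for example, on reserved area numbers) are attempted.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical result type mirroring SSN_Area, SSN_Group, and SSN_Serial.
struct SsnParts {
    std::string area;    // first three digits
    std::string group;   // fourth and fifth digits
    std::string serial;  // last four digits
};

// Splits a nine-digit U.S. Social Security number; separators are ignored.
std::optional<SsnParts> splitSsn(const std::string& ssn) {
    std::string digits;
    for (unsigned char c : ssn) {
        if (std::isdigit(c)) digits += static_cast<char>(c);
    }
    if (digits.size() != 9) return std::nullopt;
    return SsnParts{digits.substr(0, 3), digits.substr(3, 2), digits.substr(5, 4)};
}
```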

Title Job or occupational title of a person. For example, Manager.

Title_Match_Std1-6 The match standard for title. For example, CFO is the match standard or alias for Chief Financial Officer.

If the dictionary does not have an alias entry, the output field is empty.

UDPM Attribute field defined in User-defined pattern rules in Cleansing Package Builder Reference Data.


UDPM_Subcomponent1-5 Subcomponents of the UDPM attribute field defined in a User-defined pattern rule.

18.5 Geocoder fields

The Geocoder transform requires that you map fields on input and output. These mappings tell the transform how to process the data in the field.

18.5.1 Input fields

The following are recognized input fields that you can use in the input mapping for the Geocoder transform. The fields are listed alphabetically.

Name Description

Country The two-character ISO country code.

Latitude A relative distance north or south of the equator, measured in 0-90 degrees.

Locality1–4 The city, town, or suburb and any additional related information.

Longitude A relative distance east or west of the Greenwich meridian, measured in 0-180 degrees.

Max_Records The maximum number of records that can be returned. You can enter a number up to 100.

A value greater than 0 outputs multiple results as XML to the Result_List output field rather than to individual output fields.

The value of the Max_Records input field takes precedence over the value of the Default Max Records option. The value of the Default Max Records option is only used if the Max_Records input field is not mapped or is blank.
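The precedence rule for Max_Records can be expressed as a small resolution function. In this sketch, an unmapped input field is modeled as `std::nullopt` and a blank field as an empty or whitespace-only string; `effectiveMaxRecords` and `returnsResultList` are hypothetical names, not SDK API, and the 100-record limit is not enforced.

```cpp
#include <optional>
#include <string>

// Hypothetical resolution of the Max_Records input field against the
// Default Max Records option: the field wins unless unmapped or blank.
int effectiveMaxRecords(const std::optional<std::string>& maxRecordsField,
                        int defaultMaxRecords) {
    const bool blank = !maxRecordsField ||
        maxRecordsField->find_first_not_of(" \t") == std::string::npos;
    return blank ? defaultMaxRecords : std::stoi(*maxRecordsField);
}

// Per the description above, a value greater than 0 sends results to Result_List.
bool returnsResultList(int maxRecords) { return maxRecords > 0; }
```

The Radius input field and the Default Radius option follow the same precedence pattern.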

POI_Name The name of a point of interest, such as the Washington Monument.


POI_Type The point-of-interest type expressed as a number; for example, for one vendor 5999 is historical monument. If you want to return an address only, enter ADDR.

To return multiple point-of-interest types, concatenate POI type codes using a colon as a delimiter. For example, to return all schools (type 8211) and libraries (type 8231) within a defined area, you would enter:

8211:8231

To return a point-of-interest type and its address, enter:

5999:ADDR

The POI types and their corresponding codes differ depending on the data vendor that you use. For a detailed list of available POI types, see the vendor-specific directory update.
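Building the POI_Type value is a matter of joining type codes with a colon. A minimal sketch; `joinPoiTypes` is a hypothetical helper, and the codes shown are the example values from this section, which differ by data vendor.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: joins point-of-interest type codes (or ADDR) with ':'
// to form a POI_Type or Search_Filter_Type input value.
std::string joinPoiTypes(const std::vector<std::string>& codes) {
    std::string out;
    for (const auto& code : codes) {
        if (!out.empty()) out += ':';
        out += code;
    }
    return out;
}
```

For example, `joinPoiTypes({"8211", "8231"})` produces 8211:8231, and `joinPoiTypes({"5999", "ADDR"})` produces 5999:ADDR.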

Postcode1–2 The postal code and a secondary postal code, if available.

Primary_Name1–4 The street name.

Primary_Number The premise number.

Primary_Postfix1 Abbreviated directional (N, S, NW, SE) that follows a street name.

Primary_Prefix1 Abbreviated directional (N, S, NW, SE) that precedes a street name.

Primary_Type1–4 Abbreviated type of primary name (St., Ave., or Pl.).

Radius The distance from a specified reference point used to identify an area in which matching records are located.

The value of the Radius input field takes precedence over the value of the Default Radius option. The value of the Default Radius option is used only when the Radius input field is not mapped or is blank.

Region1–2 The region symbol of the state, province, or territory.

Search_Filter_Name Search criteria for a point-of-interest name.

Search_Filter_Type Search criteria for a point-of-interest type, expressed as a four-digit number; for example, for one vendor 5999 is historical monument. If you want to return an address only, enter ADDR.

To return multiple point-of-interest types, concatenate POI type codes using a colon as a delimiter. For example, to return all schools (type 8211) and libraries (type 8231) within a defined area, you would enter:

8211:8231

To return a point-of-interest type and its address, enter:

5999:ADDR

The POI types and their corresponding codes differ depending on the data vendor that you use. For a detailed list of available POI types, see the vendor-specific directory update.

18.5.2 Output fields

The following are recognized output fields that you can use in the output mapping for the Geocoder transform. The fields are listed alphabetically.

Name Description

Assignment_Level The level to which this transform matched the address to the data in the reference fields (directories).

PRE: Primary Number Exact assigns to the exact location of the address; for example, 123 Main St. This is the most precise level of assignment. To obtain the PRE, you must map either the POI_Type input field or the Primary_Name and Primary_Number input fields.

PRI: Primary Number Interpolated assigns to the level of the address range; for example, 100-500 Main St.

L1-4: Locality1-4 assigns to the level of city, town, or suburb.

P1: Postcode1 assigns to the level of Postcode1.

P2P: Postcode2 Partial assigns the full Postcode1 and the first few characters of Postcode2.

PF: Postcode Full assigns to the level of Postcode1 and Postcode2, when available.

Assignment_Level_Locality The level to which this transform assigned the locality.

L1-4: Returns up to four locality levels. L1 is the most general and L4 is the most specific.

Assignment_Level_Postcode The level to which this transform assigned the Postcode.

P1: Postcode1 assigns to the level of Postcode1.

P2P: Postcode2 Partial assigns the full Postcode1 and the first few characters of Postcode2.

PF: Postcode Full assigns to the level of Postcode1 and Postcode2, when available.

Census_Tract_Block The census tract and block numbering area code as defined by the government for reporting census information. Census tracts are small, relatively permanent statistical subdivisions of a county. Block numbering areas are small statistical subdivisions of a county for grouping and numbering blocks.

Census_Tract_Block_Prev The census tract and block numbering area in the previous version of census data.

Census_Tract_Block_Group The census tract and block group code as defined by the government for reporting census information. These codes are used for matching with demographic-coding databases. In the USA, the first six digits contain the tract number (for example, 002689); the first of the last four digits contains the BG (block group) number within the tract. The BG is a cluster of census blocks having the same first digit within a census tract. For example, BG 6 includes all blocks numbered from 6000 to 6999.
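The USA layout described above can be read positionally. The following sketch assumes a ten-digit code with no separators (six tract digits followed by four digits whose first digit is the BG); the sample value 0026896012 is hypothetical, and `TractBlockGroup`, `splitTractBlockGroup`, and `blockGroupOfBlock` are illustrative names, not SDK API.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical decomposition of a USA Census_Tract_Block_Group value.
struct TractBlockGroup {
    std::string tract;  // first six digits, e.g. 002689
    int blockGroup;     // first of the last four digits
};

std::optional<TractBlockGroup> splitTractBlockGroup(const std::string& code) {
    if (code.size() != 10) return std::nullopt;  // assumed fixed width
    for (unsigned char c : code) {
        if (!std::isdigit(c)) return std::nullopt;
    }
    return TractBlockGroup{code.substr(0, 6), code[6] - '0'};
}

// A four-digit census block belongs to the BG given by its first digit,
// so block 6123 is in BG 6 (blocks 6000 to 6999).
int blockGroupOfBlock(int blockNumber) { return blockNumber / 1000; }
```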


Census_Tract_Block_Group_Prev The census tract and block group code in the previous version of census data.

Country_Code The two-character ISO country code.

Distance The distance from the input address, geographical coordinates, or point of interest to the closest address or point of interest.

Gov_County_Code A unique county code as defined by the government for reporting census information. For example, in the USA, this is a Federal Information Processing Standard (FIPS) three-digit county code.

Gov_Locality1_Code A unique code for an incorporated municipality such as a city, town, or locality as defined by the government for reporting census information.

Gov_Region1_Code A unique region code as defined by the government for reporting census information. For example, in the USA, this is a Federal Information Processing Standard (FIPS) two-digit state code.


Info_Code A three-character code that provides information about the geocoding results. The status for address and point-of-interest geocoding assignment is indicated in the third character. The status for reverse geocoding assignment is indicated in the second and third characters. If the record is assigned to the best level, the Info_Code field is blank. The first character is not used at this time.

1: Reference data not available. The data for the input country is unavailable. Verify that the directory is installed and the reference path to the directory is valid.

2: Address data not available. When Best Assignment Level is set to Primary Number and the address directory is unavailable or doesn't exist, you see this code.

3: Centroid data not available. When Best Assignment Level is set to Locality or Postcode, and the address directory is unavailable or doesn't exist, you see this code.

4: Assignment is limited. When Best Assignment Level is set to Locality or Postcode, and the input record is insufficient or incorrect, you see this code. The assignment may be made to a lower assignment level than the one specified. For example, if you choose Primary Number, and the Primary Number field is blank, you see this code; and the assignment may be at the Postcode or Locality level, if the data is available.

5: No data match. When the input record does not match the directory data for the Best Assignment Level, you see this code.

6: Ambiguous assignment. There is a tie for the Best Assignment Level.

7: Invalid input. Either the data is blank or invalid.

8: Insufficient input data. When the input data for the selected Best Assignment Level is blank, you see this code. For example, you see this code when you set the Best Assignment Level to Primary Number and the input data is blank for Primary Number.

9: Invalid POI_Type input.

A: No POI input data used.

B: Reference point not found.

C: Not all results returned, because the number of results exceeds the value specified in Max_Records.

D: Not all results returned, because the results exceed the field length available in the Result_List XML output field.
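For address and point-of-interest geocoding, the status is read from the third character of Info_Code, and a blank code means assignment at the best level. A minimal decoding sketch follows. It assumes the unused leading characters are padded with spaces (the exact padding is not stated here), does not model the two-character reverse-geocoding status, and uses hypothetical names (`kInfoCodeStatus`, `forwardGeocodeStatus`).

```cpp
#include <map>
#include <string>

// Short descriptions of the Info_Code status characters listed above.
const std::map<char, std::string> kInfoCodeStatus = {
    {'1', "Reference data not available"}, {'2', "Address data not available"},
    {'3', "Centroid data not available"},  {'4', "Assignment is limited"},
    {'5', "No data match"},                {'6', "Ambiguous assignment"},
    {'7', "Invalid input"},                {'8', "Insufficient input data"},
    {'9', "Invalid POI_Type input"},       {'A', "No POI input data used"},
    {'B', "Reference point not found"},
    {'C', "Not all results returned (Max_Records exceeded)"},
    {'D', "Not all results returned (Result_List length exceeded)"}};

// Decodes the third character for address and point-of-interest geocoding.
std::string forwardGeocodeStatus(const std::string& infoCode) {
    if (infoCode.find_first_not_of(' ') == std::string::npos) {
        return "Assigned at best level";  // blank Info_Code
    }
    if (infoCode.size() < 3) return "Unrecognized Info_Code";
    const auto it = kInfoCodeStatus.find(infoCode[2]);
    return it == kInfoCodeStatus.end() ? "Unrecognized Info_Code" : it->second;
}
```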

Latitude The latitude at the best assigned level (0-90 degrees north or south of the equator) in the format 45.32861.

Latitude_Locality The latitude at the locality level centroid of the city, town, locality, or suburb in the format 45.32861.

Latitude_Postcode The latitude at the Postcode level centroid of the Postcode in the format 45.32861.

Latitude_Primary_Number The latitude at the primary number level centroid of the primary number in the format 45.32861.

Locality1–4 The city, town, or suburb and any additional related information.


Longitude The longitude at the best assigned level (0-180 degrees east or west of the Greenwich meridian) in the format 123.45833.

Longitude_Locality The longitude at the locality level centroid of the city, town, locality, or suburb in the format 123.45833.

Longitude_Postcode The longitude at the Postcode level centroid of the Postcode in the format 123.45833.

Longitude_Primary_Number The longitude at the primary number level centroid of the primary number in the format 123.45833.

Metro_Stat_Area_Code The metropolitan statistical area. For example, in the USA, the 0000 code indicates that the address does not lie in a metropolitan statistical area (usually a rural area). A metropolitan statistical area has a large population that has a high degree of social and economic integration with the core of the area. The area is defined by the government for reporting census information.

Metro_Stat_Area_Code_Prev The metropolitan statistical area in the previous version of census data.

Minor_Div_Code The minor civil division code, or the census county division code when the minor civil division is not available. The minor civil division designates the primary governmental and/or administrative divisions of a county, such as a civil township or precinct. Census county divisions are defined in a state or province that does not have a well-defined minor civil division. The area is defined by the government for reporting census information.

Minor_Div_Code_Prev The minor civil division or census county division code in the previous version of census data.

POI_Name The point of interest name, such as the Washington Monument.

POI_Type The point of interest type expressed as a four-digit number; for example, 5999 (historical monument).

Population_Class_Locality1 Indicates that the population falls within a certain size range.

0: Undefined. The population may be too large or small to provide accurate data.

1: Over 1 million.

2: 500,000 to 999,999.

3: 100,000 to 499,999.

4: 50,000 to 99,999.

5: 10,000 to 49,999.

6: Less than 10,000.
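The class boundaries above can be expressed as a lookup from a population count to a class code, which can help when comparing Population_Class_Locality1 against population figures from another source. A minimal sketch; `populationClassFor` is a hypothetical name, a population of exactly 1,000,000 is assumed to fall in class 1 (class 2 ends at 999,999), and class 0 is returned here only for negative input, whereas the transform may also report 0 when its data is unreliable.

```cpp
// Hypothetical helper: maps a population count to the
// Population_Class_Locality1 codes listed above.
int populationClassFor(long long population) {
    if (population < 0) return 0;         // Undefined
    if (population >= 1000000) return 1;  // over 1 million
    if (population >= 500000) return 2;   // 500,000 to 999,999
    if (population >= 100000) return 3;   // 100,000 to 499,999
    if (population >= 50000) return 4;    // 50,000 to 99,999
    if (population >= 10000) return 5;    // 10,000 to 49,999
    return 6;                             // less than 10,000
}
```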

Postcode1–2 The postal code and a secondary postal code, if available.

Primary_Name1–4 The street name.

Primary_Number The premise number.

Primary_Postfix1 Abbreviated directional (N, S, NW, SE) that follows a street name.

Primary_Prefix1 Abbreviated directional (N, S, NW, SE) that precedes a street name.


Primary_Range_High The high value of a primary number range.

Primary_Range_Low The low value of a primary number range.

Primary_Type1–4 Abbreviated type of primary name (St., Ave., or Pl.).

Ranking A numeric value that indicates how well the returned records match the input field based on the match score. A record with a ranking of 1 has the highest match score.

Region1–2 The region symbol of the state, province, or territory.

Result_List The XML output when multiple records are returned for a search.

Result_List_Count The number of results in the Result_List output field.

Side_Of_Primary_Address Indicates that the location is on the L (left) or R (right) side of the street when moving north, northeast, northwest, or east.

Stat_Area_Code A core-based statistical area code, where the area has a high degree of social and economic integration with the core that it surrounds. The area is defined by the government for reporting census information.

Stat_Area_Code_Prev The statistical area code in the previous version of census data.

18.6 Global Address Cleanse fields

The Global Address Cleanse transform requires that you map fields on input and output.

Related Information

Input fields [page 296]

Output fields [page 302]

18.6.1 Input fields

The following are input fields that you can use in the Global Address Cleanse transform. The table also shows which engine(s) must be enabled for each input field to be available:

● Canada (C)
● Global Address (G)
● USA (U)

Field Description Engine(s)

Address_Line The delivery address line, for example, "123 Main Street, Unit All engines 4."

Japan: Address_Line may represent the following address components:

● Block (chome, kumi, Hokkaido go), sub-block (banchi, gaiku, tochi kukaku), and house number (go) parts of the Japanese address. ● The Building Name, Building Floor, Building Room parts of the Japanese address. ● The P.O. Box portion of the address, if applicable.

China: Address_Line may represent the following address components:

● Street and street number. ● Building, Floor, Unit ● Residential community

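For example:

晨晖路 123 号中华大厦 12 楼 1201 室 (No. 123 Chenhui Road, Zhonghua Building, 12th floor, Room 1201)

宝山新村 100 号 201 室 (No. 100 Baoshan Xincun, Room 201)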

Country The identified country name of the address. All engines

Data_Source_ID Specifies the input source. This field is used in reports to identify the record. All engines

Firm The name of a company or organization. In some countries, large firms have their own postal code. If you include a Firm field in your input, this transform may assign more specific postal codes. All engines

Japan: All Firm data for addresses in Japan should be placed in this field.

China: China does not support Firm assignment, and there is no Firm data for China. If a firm name is available on input, place it in this field.

Lastline The locality, region, and postal code on one line. All engines


Locality1 The city, town, or suburb. All engines

Japan: The city (shi), island (shima), ward (ku), county (gun), district (machi), or village (mura).

China: Prefecture-level localities: Prefectures (地区 diqu), Autonomous prefectures (自治州 zizhizhou), Prefecture-level cities (地级市 dijishi), Leagues (盟 meng), or Provincially administered counties (省直辖县 shengzhixiaxian).

Locality2 Any additional city, town, or suburb information. G, U

USA: The Puerto Rican urbanization.

Japan: Any additional ward, district, village, or sub-district (aza, bu, chiwari, sen).

China: County-level localities: Counties (县 xian), Autonomous counties (自治县 zizhixian), County-level cities (县级市 xianjishi), Districts (市辖区 shixiaqu), Banners (旗 qi), Autonomous banners (自治旗 zizhiqi), Forestry areas (林区 linqu), or Special districts (特区 tequ).

Locality3 Any additional city, town, or suburb information. G

Japan: Any additional district, village, sub-district (aza, bu, chiwari, sen, donchi, and tori), or super block (joh).

China: Township-level localities: Townships (乡 xiang), Ethnic townships (民族乡 minzuxiang), Towns (镇 zhen), Subdistricts (街道办事处 jiedaobanshichu), District public offices (区公所 qugongsuo), Sumu (苏木 sumu), or Ethnic sumu (民族苏木 minzusumu).

Locality4 Any additional city, town, or suburb information. G

Japan: Any additional district, village, sub-district (aza, bu, chiwari, sen, donchi, and tori), or super block (joh).

China: Village-level localities: Administrative villages (行政村 xingzhengcun), Neighborhood committees (社区居民委员会 shequjuminweiyuanhui), Neighborhoods or communities (社区 shequ), Village committees (村民委员会 cunminweiyuanhui), or Village groups (村民小组 cunminxiaozu).


Multiline1-12 A line that may contain any data. The type of data in this line may vary from record to record. All engines

Japan: Lines that may contain any data, with the following restrictions: the address as a whole must be in the traditional order of a Japanese address, and the block (chome, kumi, Hokkaido go), sub-block (banchi, gaiku, tochi kukaku), and house number (go) should be on a single line on input.

Postcode The postal code. All engines

USA: The five-digit ZIP Code and ZIP+4.

Region1 The state, province, or region. All engines

Japan: Represents the prefecture (to, do, fu, ken). A prefecture is similar to a state in the U.S.

China: Province-level regions: Provinces (省 sheng), Autonomous regions (自治区 zizhiqu), Municipalities (直辖市 zhixiashi), or Special administrative regions (特别行政区 tebie xingzhengqu).

Suggestion_Reply1-6 Used to input the index number that corresponds to a specific last line suggestion, address line suggestion, or secondary list suggestion. These fields can also be used to input a street primary range or a street secondary range. All engines

Suggestion_Reply1: If you do not want to use a suggestion list, set this field to 0; the suggestion list is then ignored.

If you want to use one field to hold all of the replies (rather than using all six reply fields), you can use the Suggestion_Reply1 field and separate the replies with a pipe (|).

When you use the Suggestion_Reply1-6 fields in SAP software for street and PO Box addresses, you can insert the following symbols to indicate whether the user has accepted the changes made to the street address and has finished with the street address:

● asterisk plus (*+): The user accepts the changes made to the street address up to the specified point and is done with the street address. ● asterisk minus (*-): The user does not accept the changes made to the street address up to the specified point and is done with the street address.

Suggestion_Start_Selection Specifies the starting suggestion list number. If left blank, the default value is 1. All engines
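As described for Suggestion_Reply1, you can carry several replies in that single field by separating them with a pipe (|). The following C++ sketch shows how to pack and unpack such a value. The helper names are hypothetical. The sketch only splits on the pipe; it does not interpret the replies themselves (index numbers, 0, ranges, or the *+ and *- markers).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: packs several suggestion replies into one
// Suggestion_Reply1 value, separated by a pipe (|).
std::string joinReplies(const std::vector<std::string>& replies) {
    std::string packed;
    for (std::size_t i = 0; i < replies.size(); ++i) {
        if (i > 0) packed += '|';
        packed += replies[i];
    }
    return packed;
}

// Hypothetical helper: splits a packed Suggestion_Reply1 value back into
// the individual replies, in order.
std::vector<std::string> splitReplies(const std::string& packed) {
    std::vector<std::string> replies;
    std::string current;
    for (char c : packed) {
        if (c == '|') {
            replies.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    replies.push_back(current);  // the text after the last pipe
    return replies;
}
```

For example, joinReplies({"3", "1"}) yields "3|1", and splitReplies("3|1|12") yields the three replies "3", "1", and "12".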

18.6.1.1 NW input fields

The NW input fields are designed to be used in conjunction with other NW input fields in SAP software. Use the fields properly to avoid unexpected results in your data.

The following table also shows which engine(s) must be enabled for each input field to be available:

● Canada (C) ● Global Address (G) ● USA (U)

See the fields listed in the transform's Input tab to view each field's properties.

Caution

Use the NW_ fields properly to avoid unexpected results in your data. For the mapping restrictions, see Mapping NW input fields.

NW input field name Description Engine (Global Address Cleanse)

NW_Building Contains the building information. All engines

If you map this input field, you must also map NW_Street.

NW_City1 Contains the locality. When you map NW input fields, this is a required field. The NW_City1 and NW_City2 input fields must be mapped in sequence. All engines

NW_City2 Contains additional locality or district information. All engines

NW_Country Contains the country. When you map NW input fields, this is a required field. All engines

NW_Floor_Num Contains the floor number. All engines

If you map this input field, you must also map NW_Street.

NW_Home_City Contains additional locality information. All engines

NW_House_Num1 Contains the house number. The NW_House_Num1 and NW_House_Num2 input fields must be mapped in sequence. All engines

If you map this input field, you must also map NW_Street.

NW_House_Num2 Contains additional house number information. The NW_House_Num1 and NW_House_Num2 input fields must be mapped in sequence. All engines

If you map this input field, you must also map NW_Street.

NW_Location Contains additional street information. All engines

If you map this input field, you must also map NW_Street.

NW_PO_Box_City Contains the locality. If any of the NW_PO_Box input fields are mapped, then all of them must be mapped. All engines


NW_PO_Box_Country Contains the country. If any of the NW_PO_Box input fields are mapped, then all of them must be mapped. All engines

NW_PO_Box_Postcode Contains the postcode. If any of the NW_PO_Box input fields are mapped, then all of them must be mapped. All engines

NW_PO_Box_Region Contains the state, province, or region. If any of the NW_PO_Box input fields are mapped, then all of them must be mapped. All engines

NW_PO_Box Contains the PO Box number. If any of the NW_PO_Box input fields are mapped, then all of them must be mapped. All engines

NW_Postcode Contains the postcode. When you map NW input fields, this is a required field. All engines

NW_Region Contains the state, province, or region. When you map NW input fields, this is a required field. All engines

NW_Room_Num Contains the room number. All engines

If you map this input field, you must also map NW_Street.

NW_Str_Suppl1 Contains additional street information. The NW_Str_Suppl1-3 input fields must be mapped in sequence. All engines

If you map this input field, you must also map NW_Street.

NW_Str_Suppl2 Contains additional street information. The NW_Str_Suppl1-3 input fields must be mapped in sequence. All engines

If you map this input field, you must also map NW_Street.

NW_Str_Suppl3 Contains additional street information. The NW_Str_Suppl1-3 input fields must be mapped in sequence. All engines

If you map this input field, you must also map NW_Street.

NW_Street Contains the primary street name. When you map NW input fields, this is a required field. All engines

18.6.1.2 Mapping NW input fields

The NW input fields are designed to be used in conjunction with other NW input fields when cleansing address data in SAP software. Use the NW fields only as a group, never individually, to avoid unexpected results in your data.

You cannot map Multiline or Address_Line input fields when you use the NW input fields. Although the NW input fields appear discrete, they behave and are processed as multiline fields: they are mapped internally to Multiline1-12 before normal Global Address Cleanse processing is performed. If an NW input field is not mapped, the multiline that would have been assigned to it goes to the next available NW input field instead.
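The following C++ sketch illustrates the gap-free assignment of NW input fields to Multiline1-12. The field order passed in is an assumption made for illustration only. This section does not document the internal order the transform uses, and toMultilines is a hypothetical helper rather than an SDK call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One NW input field and whether it is mapped in the transform.
struct NwField {
    std::string name;
    bool mapped;
};

// Assigns mapped NW fields to Multiline1, Multiline2, ... in the order given.
// An unmapped field does not consume a multiline, so the next mapped field
// takes the position it would have had. Returns (multiline, NW field) pairs.
// The input order is hypothetical; the transform's internal order is not
// documented in this section.
std::vector<std::pair<std::string, std::string>>
toMultilines(const std::vector<NwField>& fields) {
    constexpr std::size_t kMaxMultilines = 12;  // Multiline1-12
    std::vector<std::pair<std::string, std::string>> lines;
    for (const NwField& f : fields) {
        if (!f.mapped) continue;  // skipped without leaving a gap
        if (lines.size() == kMaxMultilines) break;  // there is no Multiline13
        lines.emplace_back("Multiline" + std::to_string(lines.size() + 1),
                           f.name);
    }
    return lines;
}
```

Suppose the input is NW_Street (mapped), NW_Str_Suppl1 (not mapped), and NW_City1 (mapped), in that order. NW_City1 then lands on Multiline2 rather than Multiline3.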

The following are restrictions for using NW input fields:

● You cannot use NW input fields with non-NW input fields. ● The NW_City1, NW_Country, NW_Postcode, NW_Region, and NW_Street input fields are always required. ● The NW_City1-2 input fields must be mapped in sequence. ● The NW_House_Num1-2 input fields must be mapped in sequence. ● The NW_Str_Suppl1-3 input fields must be mapped in sequence. ● If any of the NW_PO_Box input fields are mapped, then all of them must be mapped. ● If the NW_PO_Box input fields are mapped, then a minimum of the following NW street-level fields must be mapped:

○ NW_City1 ○ NW_Country ○ NW_Postcode ○ NW_Region ○ NW_Street
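The restrictions above lend themselves to a pre-flight check before a mapping is handed to the transform. The following C++ sketch is a hypothetical helper, not part of the SDK. It checks only the rules listed in this section and treats any field name that starts with NW_ as an NW input field. It also reads "mapped in sequence" as "a later field requires the one before it", which is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pre-flight check of a set of mapped input field names against
// the NW input field restrictions. Returns human-readable problems; an empty
// result means no listed restriction is violated.
std::vector<std::string> checkNwMapping(const std::set<std::string>& mapped) {
    std::vector<std::string> problems;
    auto has = [&mapped](const std::string& f) { return mapped.count(f) > 0; };

    // NW input fields cannot be combined with non-NW input fields.
    bool anyNw = false;
    bool anyOther = false;
    for (const std::string& f : mapped) {
        if (f.compare(0, 3, "NW_") == 0) {
            anyNw = true;
        } else {
            anyOther = true;
        }
    }
    if (!anyNw) return problems;  // no NW fields mapped: rules do not apply
    if (anyOther) problems.push_back("NW fields mixed with non-NW fields");

    // Always required. This also covers the street-level minimum for PO boxes
    // and the NW_Street dependency of the building, floor, room, and house
    // number fields.
    for (const char* f : {"NW_City1", "NW_Country", "NW_Postcode",
                          "NW_Region", "NW_Street"}) {
        if (!has(f)) problems.push_back(std::string("missing ") + f);
    }

    // Fields mapped "in sequence": each later field needs its predecessor.
    const std::vector<std::pair<std::string, std::string>> sequence = {
        {"NW_City2", "NW_City1"},
        {"NW_House_Num2", "NW_House_Num1"},
        {"NW_Str_Suppl2", "NW_Str_Suppl1"},
        {"NW_Str_Suppl3", "NW_Str_Suppl2"}};
    for (const auto& step : sequence) {
        if (has(step.first) && !has(step.second)) {
            problems.push_back(step.first + " mapped without " + step.second);
        }
    }

    // The NW_PO_Box fields are all-or-nothing.
    const std::vector<std::string> poBox = {
        "NW_PO_Box", "NW_PO_Box_City", "NW_PO_Box_Country",
        "NW_PO_Box_Postcode", "NW_PO_Box_Region"};
    std::size_t poMapped = 0;
    for (const std::string& f : poBox) {
        if (has(f)) ++poMapped;
    }
    if (poMapped != 0 && poMapped != poBox.size()) {
        problems.push_back("NW_PO_Box fields must be mapped all or none");
    }
    return problems;
}
```

Mapping only Multiline1 and NW_Street, for instance, reports five problems: the mix of NW and non-NW fields, plus four missing required NW fields.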

CJK script

For CJK script input, several NW input fields are concatenated into the last multiline field.

If a descriptor is found on NW_House_Num2, the multiline is processed as follows (+ indicates that the fields are concatenated with no space between them):

NW_Street+NW_House_Num1+NW_House_Num2+NW_Room_Num+NW_Floor_Num+NW_Building

If no descriptor is found on NW_House_Num2, the multiline is processed as follows:

NW_Street+NW_House_Num1+NW_Room_Num+NW_Floor_Num+NW_Building NW_House_Num2
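The two concatenation patterns can be sketched in C++ as follows. The caller supplies whether NW_House_Num2 carries a descriptor, because this section does not describe how the transform detects one. The sketch also assumes that the trailing NW_House_Num2 in the second pattern follows a single space and is omitted when empty. The struct and function names are hypothetical, and the field names follow the NW input field table (NW_Room_Num, NW_Floor_Num).

```cpp
#include <cassert>
#include <string>

// Values of the NW fields that take part in the CJK concatenation.
struct NwCjkFields {
    std::string street;     // NW_Street
    std::string houseNum1;  // NW_House_Num1
    std::string houseNum2;  // NW_House_Num2
    std::string roomNum;    // NW_Room_Num
    std::string floorNum;   // NW_Floor_Num
    std::string building;   // NW_Building
};

// Hypothetical helper that builds the last multiline for CJK script input.
// '+' in the documented patterns means "concatenate with no space".
std::string cjkLastMultiline(const NwCjkFields& f, bool houseNum2HasDescriptor) {
    if (houseNum2HasDescriptor) {
        // NW_Street+NW_House_Num1+NW_House_Num2+NW_Room_Num+NW_Floor_Num+NW_Building
        return f.street + f.houseNum1 + f.houseNum2 + f.roomNum + f.floorNum +
               f.building;
    }
    // NW_Street+NW_House_Num1+NW_Room_Num+NW_Floor_Num+NW_Building, then
    // NW_House_Num2 after one space (assumed; skipped when empty).
    std::string line = f.street + f.houseNum1 + f.roomNum + f.floorNum + f.building;
    if (!f.houseNum2.empty()) {
        line += ' ';
        line += f.houseNum2;
    }
    return line;
}
```

With placeholder values S, 1, A, R, F, and B, the result is "S1ARFB" when a descriptor is found on NW_House_Num2, and "S1RFB A" when it is not.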

18.6.2 Output fields

The following are output fields that can be used for the Global Address Cleanse transform. The table also shows which engine(s) must be enabled for each output field to be available:

● Canada (C) ● Global Address (G) ● USA (U)
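Many of the Match_ output fields in the following table share one normalization rule. Data is uppercased, apostrophes and diacritical characters are removed, and other punctuation and multiple spaces become a single space. The following C++ sketch approximates that rule for ASCII text only. It assumes that leading and trailing separators are trimmed. It passes non-ASCII bytes through unchanged, because diacritic removal and transliteration are not modeled. matchNormalize is a hypothetical name, not an SDK function.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Approximates the Match_ field normalization for ASCII input:
// uppercase letters, drop apostrophes, and collapse every run of other
// punctuation or whitespace into one space (trimmed at both ends).
// Bytes >= 0x80 (for example UTF-8 sequences) are copied unchanged.
std::string matchNormalize(const std::string& in) {
    std::string out;
    bool pendingSpace = false;
    for (char ch : in) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c == '\'') continue;  // apostrophes are removed outright
        if (c >= 0x80 || std::isalnum(c)) {
            if (pendingSpace && !out.empty()) out += ' ';
            pendingSpace = false;
            out += (c < 0x80) ? static_cast<char>(std::toupper(c)) : ch;
        } else {
            pendingSpace = true;  // punctuation or whitespace
        }
    }
    return out;
}
```

For example, matchNormalize("St. John's  Rd.") returns "ST JOHNS RD". A production version would also need Unicode-aware case mapping and diacritic folding.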

Output field name (Global Address Cleanse) Description Engine

Additional_Info1 Austria: Includes the PAC code of the currently valid address when you choose to preserve the alias address on output. C, G

Belgium: Includes the NIS code.

Canada: The official 13-character abbreviation of the city name, or the full spelling if the city name is less than 13 characters (including spaces).

France: Includes the INSEE code.

Germany: Includes a portion of the German freightcode (Frachtleitcode).

Liechtenstein: Includes the postal service district (Botenbezirke) when it is available in the data.

Poland: Includes the district name (powiat).

South Korea: Includes the 25-digit administration number.

Spain: Includes the INE 91 section code.

Switzerland: Includes the postal service district (Botenbezirke) when it is available in the data.

Additional_Info2 Austria: Includes the City ID (OKZ). C, G

Canada: The official 18-character abbreviation of the city name, or the full spelling if the city name is less than 18 characters (including spaces).

Germany: Includes the District Code.

Liechtenstein: Additional postcode.

Poland: Includes the community name (gmina).

Spain: Includes the INE Street code.

Switzerland: Additional postcode.

Additional_Info3 Austria: Includes the Pusher-Leitcode (parcel). G

Germany: Includes the German City ID (ALORT).

Spain: Includes the INE Town code.

Additional_Info4 Austria: Includes the Pusher-Leitcode (letter). G

Germany: Includes the German street name ID (StrSchl).


Additional_Info5 Austria: Includes the SKZ Street Code (7-digit). G

Germany: Includes the discount code for the freightcode.

Additional_Info6 Austria: Includes the corner-house identification (1-digit). The value for a corner house is 1. G

Additional_Info7-8 Reserved for future use. All engines

Address_Line_Remainder1-4 Extraneous data found in the address line, which either cannot be identified or does not belong in a standardized address. All engines

USA 1-2: Complete secondary non-postal address (for example, Apt. 10, Ste 500, Box 34, Rm 7, 5th Flr).

Address_Type A one-character code that represents the type of address identified. All engines

P: Postal

S: Street

X: Unknown

Area_Name1 An industrial area such as RIICO INDUSTRIAL AREA. G

Assignment_Info Indicates whether a record is valid, invalid, or corrected, based on the status and information codes. All engines

C: Corrected

I: Invalid

V: Valid

Assignment_Level The level to which this transform matched the address to the data in the reference files (directories). All engines

C: Country

L1: Locality1

L2: Locality2

L3: Locality3

L4: Locality4

PN: Primary name

PR: Primary range

R: Region

S: Secondary

X: Unknown, or the address was unassigned

Assignment_Type A one- or two-character code that represents the type of address. All engines

Engine support varies; see each code listing for supported engines.

BN: Building name (Canada, Global Address)

F: Firm (Canada, Global Address, USA)

G: General delivery (Canada, Global Address, USA)

H: High-rise building (Canada, USA)

HB: House Boat (Global Address)

L: LOT (Global Address)

M: Military (Canada, USA)

R: Rural (Canada, USA)

P: Postal (Canada, Global Address, USA)

PI: Point of reference (Global Address)

PR: Poste Restante (Global Address)

PS: Packstation or Paketbox (Global Address)

RP: Postal Served by Route (Global Address)

S: Street (Canada, Global Address, USA)

SR: Street served by route (Canada, Global Address)

U: Uninhabited (Global Address)

W: Caravan (Global Address)

X: Unknown or the address was unassigned (Canada, Global Address, USA)

Block_Description Block description such as "Block." G

Block_Full A compound output field consisting of the Block_Description and Block_Number output fields. G

Block_Number Block number. G


Building_Name1 The building name for the address, which in some countries is used in place of the primary number. For example, in the U.K. an address may be “White House, High Street,” where “White House” is the building name instead of a primary number in an address such as “100 High Street.” G

Building_Name1_2 A compound output field consisting of the Building_Name1 and Building_Name2 output fields. G

Building_Name2 The building name for the address, which in some countries is used in place of the primary number. G

Building_Primary_Addr_Delivery_Dual A compound output field consisting of the Building_Name1, Building_Name2, Primary_Address (delivery), and Primary_Address (dual) output fields. All engines

Building_Primary_Secondary_Addr_Delivery_Dual A compound output field consisting of the Building_Name1, Building_Name2, Primary_Secondary_Address (delivery), and Primary_Secondary_Address (dual) output fields. All engines

Cert_Valid Indicates a valid certification. All engines

Country The ISO country code or the country name of the input record. The parsed value of this component is the country data found in the input record. All engines

Country_Name Fully spelled country name in the languages specified in the Output_Country_Language option. All engines

County_Name Fully spelled county name. U

USA: County information is not included on mail pieces.

Delivery_Installation_Full A compound output field consisting of the Delivery_Installation_Name, Delivery_Installation_Qualifier, and Delivery_Installation_Type output fields. C, G


Delivery_Installation_Name The delivery installation city name, which is usually the same as the city name and (if it is the same) omitted from the address line. C, G

Canada: If the delivery installation name is different from the locality name, the delivery installation name is output to the secondary address fields.

Japan: Returns the post office name.

Delivery_Installation_Qualifier Delivery installation qualifier (for example, “Main” in “RR 2 Vancouver Stn Main”). C

Delivery_Installation_Type The delivery installation type. C

English:

PO: Post Office

RPO: Retail Post Outlet

STN: Station

LCD: Letter Carrier Depot

CMC: Community Mail Center

CDO: Commercial Dealership Outlet

French:

BDP: Bureau de Poste

CSP: Comptoir Service Postal

SUCC: Succursale

PDF: Poste de Facteurs

CPC: Centre Postal Communautaire

CC: Concession Commerciale

Delivery_Point Australia: Eight-digit delivery point identifier. This is the primary component needed to generate a barcode. G

This component is not printed on mail pieces.

Austria: Includes the PAC code, which is a unique identifier assigned by the Austrian postal authority.

New Zealand: A seven-character code that represents the delivery-point identifier.

United Kingdom: A two-character code that represents the delivery-point suffix.


Engine_Name The name of the engine that was selected to process the record. All engines

Error Specifies the error status generated as the result of looking up the current record and performing suggestion processing. Possible output values are 0-6. All engines

0: No suggestion selection error.

1: Blank suggestion selection/entry.

2: Invalid suggestion selection.

3: Invalid primary range.

4: Invalid floor range.

5: Invalid unit range.

6: Too many possible results to generate a suggestion list. Provide more information, such as a postal code, region, or locality.

Extra1-12 Any non-address data found in the address block. Available only if the input data was presented through multiline fields. All engines

Firm The firm name for the address. All engines

Identification of firm name data in a multiline format may be inconsistent, depending on the level of firm data available in the postal directories for each engine. To avoid inconsistent identification of firm data, use the discrete Firm field when you process multiline data.

Canada and USA: The firm name is taken from the postal directory if found; otherwise, it is taken from the input record. Be aware that the postal directory might contain some unusual or shortened spellings that you may or may not find suitable for printing on mail pieces. If you prefer to retain your own firm data, retrieve the parsed component.

Global Address: If the firm name is available on input or from reference data, the Global Address engine returns the firm name.

Floor_Description The level description, such as “Floor.” All engines

Japan: The level description, such as kai.


Floor_Full A compound output field consisting of the Floor_Description, Floor_Number, and Floor_Qualifier output fields. All engines

Floor_Number The level number or information. All engines

Floor_Qualifier Additional word that precedes or follows the floor information. All engines

Full_Address The complete address line, including the secondary address and the dual address (street and postal). All engines

Info_Code If the address is not fully assigned, displays a four-character code that describes why the address could not be assigned. If the address is fully assigned, the field is blank. All engines

For more information, see #unique_330.

ISO_Country_Code_2Char The two-character ISO code that identifies a country, for example, DE is Germany. All engines

ISO_Country_Code_3Char The ISO 3166 three-character code that identifies a country, for example, DEU is Germany. All engines

ISO_Country_Code_3Digit The three-digit ISO code that identifies a country, for example, 276 is Germany. All engines

ISO_Script_Code The four-character script code to use for an identified country, such as LATN or KANA. All engines

Language The two-character ISO language code that represents the language of the address. All engines

Lastline The locality (Locality1–Locality4 if available), region, and postal code together in one component. The region is included only for select countries that require it. All engines

Lastline_Remainder1-4 Unused lastline remainder data. G

Locality_Code Used in some countries to distinguish sections of a large locality. For example, in France they are called arrondissements. G

Locality1_2_Full A compound output field consisting of the Locality1_Full and Locality2_Full output fields. All engines

Locality1_2_Name A compound output field consisting of the Locality1_Name and Locality2_Name output fields. All engines

Locality1_4_Full A compound output field consisting of the Locality1_Full, Locality2_Full, Locality3_Full, and Locality4_Full output fields. All engines


Locality1_4_Name A compound output field consisting of the Locality1_Name, Locality2_Name, Locality3_Name, and Locality4_Name output fields. All engines

Locality1_Addition Additional locality information. G

Locality1_Alternate Preserves the input locality if it is recognized by the postal authority as a locality name for this address. Misspellings are corrected. C, U

Locality1_Description Locality1 descriptor. G

Japan: Locality1 descriptor. For example, shi, shima, and so on.

China: Locality1 descriptor. For example, 市 (Shi).

Locality1_Full Includes Locality1_Name, Locality_Code, Locality1_Description, and Locality1_Qualifier. It may include Locality1_Addition, depending on the standardization option settings of Locality Name Style and Include Locality Addition. All engines

Locality1_Name The city, town, locality, or suburb that is either the Locality1_Alternate or Locality1_Official, depending on the standardization option setting for Assign Locality. All engines

Japan: The city (shi), island (shima), ward (ku), county (gun), district (machi), or village (mura).

Locality1_Official The locality name preferred by the postal authority. All engines

Locality1_Qualifier Used by France for Cedex. G

Locality2_4_Full A compound output field consisting of the Locality2_Full, Locality3_Full, and Locality4_Full output fields. All engines

Locality2_4_Name A compound output field consisting of the Locality2_Name, Locality3_Name, and Locality4_Name output fields. All engines

Locality2_Description Description of a subdivision of Locality1. G

Locality2_Full Includes Locality2_Name and Locality2_Description. G, U

Locality2_Name Additional locality information. G, U

USA: Urbanization (Puerto Rican addresses only).


Locality2_Official The locality name preferred by the postal authority. G

Locality2_Qualifier Locality qualifier. G

Locality3_4_Full A compound output field consisting of the Locality3_Full and Locality4_Full output fields. All engines

Locality3_4_Name A compound output field consisting of the Locality3_Name and Locality4_Name output fields. All engines

Locality3_Description Description of a subdivision of Locality2. G

Locality3_Full Includes Locality3_Name and Locality3_Description. G

Locality3_Name Additional locality information. G

Locality3_Official The locality name preferred by the postal authority. G

Locality3_Qualifier Locality qualifier. G

Locality4_Description Description of a subdivision of Locality3. G

Locality4_Full Includes Locality4_Name and Locality4_Description. G

Locality4_Name Additional locality information. G

Locality4_Official The locality name preferred by the postal authority. G

Locality4_Qualifier Locality qualifier. G

Match_Block_Number A form of Block_Number that may be used in the Match transform during the comparison process. Block descriptions are not output. Data is output in uppercase, with diacritical characters and apostrophes removed and other punctuation and multiple spaces replaced with a single space. The field is available only as a Best component. All engines

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.


Match_Building_Name A form of Building_Name1 that may be used in the Match transform during the comparison process. Building descriptions are output. Data is output in uppercase, with diacritical characters and apostrophes removed and other punctuation and multiple spaces replaced with a single space. The field is available only as a Best component. All engines

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.

Note

For China, building indicators are removed.

Match_Country A form of ISO_Country_Code_2Char that may be used in the Match transform during the comparison process. Data is output in uppercase, with diacritical characters and apostrophes removed and other punctuation and multiple spaces replaced with a single space. The field is available only as a Best component. All engines

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.

Match_Floor_Number A form of Floor_Number that may be used in the Match transform during the comparison process. Floor descriptions and qualifiers are not output. Data is output in uppercase, with diacritical characters and apostrophes removed and other punctuation and multiple spaces replaced with a single space. The field is available only as a Best component. All engines

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.


Match_Locality A form of locality that may be used in the Match transform during the comparison process. The output is not affected by standardization settings. Locality1_Official is output when a locality or better level assignment is made; otherwise, Locality1_Name is output. Locality codes, qualifiers, and descriptions are not output. Data is output in uppercase, with diacritical characters and apostrophes removed and other punctuation and multiple spaces replaced with a single space. The field is available only as a Best component. All engines

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.

Match_Locality2 A form of locality that may be used in the Match transform during the comparison process. The output is not affected by standardization settings. Locality2_Official is output when a locality or better level assignment is made; otherwise, Locality2_Name is output. Locality codes, qualifiers, and descriptions are not output. Data is output in uppercase, with diacritical characters and apostrophes removed and other punctuation and multiple spaces replaced with a single space. The field is available only as a Best component. All engines

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.

Note

For China, Japan, South Korea, and Taiwan, Locality2–4_Official or Locality2–4_Name is output, if present. For all other countries, only Locality2_Official or Locality2_Name is output.


Match_Postcode1 A form of Postcode1 that may be used in the Match transform during the comparison process. Data is output in uppercase, with diacritical characters and apostrophes removed and other punctuation and multiple spaces replaced with a single space. The field is available only as a Best component. All engines

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.

Match_Primary_Directional A form of Primary_Prefix1 and Primary_Postfix1 that may be used in the Match transform during the comparison process. The output is not affected by standardization settings. The abbreviated form is output, if available. If a prefix and postfix are both present, they are separated by a space. Data is output in uppercase, with diacritical characters and apostrophes removed and other punctuation and multiple spaces replaced with a single space. The field is available only as a Best component. All engines

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.

Match_Primary_Name A form of Primary_Name1 that may be used in the Match transform during the comparison process. The output is not affected by standardization settings. Data is output in uppercase, with diacritical characters and apostrophes removed and other punctuation and multiple spaces replaced with a single space. Prefix, postfix, suffix, and type data is removed. The field is available only as a Best component. All engines

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.


Match_Primary_Name2 (Engine: All engines) A form of Primary_Name2 that may be used in the Match transform during the comparison process. The output is not affected by standardization settings. Data is output in uppercase, diacritical characters and apostrophes are removed, and other punctuation and multiple spaces are replaced with a single space. Prefix, postfix, suffix, and type data is removed. The field is available only as a Best component.

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.

Note

For Brazil, China, and Japan, Primary_Name2–4 are output, if present. For all other countries, only Primary_Name2 is output.

Match_Primary_Number (Engine: All engines) A form of Primary_Number that may be used in the Match transform during the comparison process. Only the Primary_Number and Primary_Number_Extra are output, not the Primary_Number_Description. Data is output in uppercase, diacritical characters and apostrophes are removed, and other punctuation and multiple spaces are replaced with a single space. The field is available only as a Best component.

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.


Match_Primary_Type (Engine: All engines) A form of Primary_Type1 that may be used in the Match transform during the comparison process. The output is not affected by standardization settings. The abbreviated primary type is output, if available. Data is output in uppercase, diacritical characters and apostrophes are removed, and other punctuation and multiple spaces are replaced with a single space. The field is available only as a Best component.

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.

Note

For Brazil, China, and Japan, Primary_Type1–4 are output, if present. For all other countries, only Primary_Type1 is output.

Match_Region (Engine: All engines) A form of Region1_Name that may be used in the Match transform during the comparison process. Data is output in uppercase, diacritical characters and apostrophes are removed, and other punctuation and multiple spaces are replaced with a single space. The field is available only as a Best component.

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.

Match_Stairwell_Name (Engine: All engines) A form of Stairwell_Name that may be used in the Match transform during the comparison process. Stairwell descriptions are not output. Data is output in uppercase, diacritical characters and apostrophes are removed, and other punctuation and multiple spaces are replaced with a single space. The field is available only as a Best component.

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.


Match_Unit_Number (Engine: All engines) A form of Unit_Number that may be used in the Match transform during the comparison process. Unit descriptions and qualifiers are not output. Data is output in uppercase, diacritical characters and apostrophes are removed, and other punctuation and multiple spaces are replaced with a single space. The field is available only as a Best component.

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.

Match_Wing_Name (Engine: All engines) A form of Wing_Name that may be used in the Match transform during the comparison process. Wing descriptions are not output. Data is output in uppercase, diacritical characters and apostrophes are removed, and other punctuation and multiple spaces are replaced with a single space. The field is available only as a Best component.

Non-Latin scripts are transliterated for supported scripts if the Output Address Script option is selected. For CJK scripts, the field uses normal width standardization for consistency.

Multiline1-12 (Engine: All engines) A line that may contain any data. The type of data in this line may vary from record to record.

NW_Formatted_Postcode (Engine: All engines) The postcode in a format that SAP software requires.

NW_ output fields For a list of the NW_ output fields and their descriptions, see the next section.

NW_Postcode_In_Supported_Format (Engine: All engines) Indicates whether the NW_Formatted_Postcode output field is populated.

PMB_Full (Engine: All engines) Contains private mailbox information.

PName_Secondary_Addr (Engine: All engines) Contains the full primary name (with no associated primary number) and the full secondary address.

Point_Of_Reference1_2 (Engine: All engines) A compound output field consisting of the Point_of_Reference1 and Point_of_Reference2 output fields.

Point_Of_Reference1-2 (Engine: G) A well-known place or easily visible location that helps locate an address. For example, Opposite to Citibank ATM.


Post_Office_Name (Engine: G) The name or numeric representation for a post office, such as "01" BP 1012.

Postcode_Description (Engine: G) A word that indicates a postal code, when available on input. For example:

Brazil: CEP, which stands for Código de Endereçamento Postal, and is output as CEP 52041-970.

China: 邮编

Japan: 〒

Postcode_Full (Engine: All engines) Australia: Complete four-digit postal code.

Canada: Complete six-character postal code (FSA + LDU).

Global Address: Complete postal code.

USA: The full ZIP Code with a hyphen (10 characters).

Japan: The seven-digit postal code.

Postcode_In_Valid_Format (Engine: All engines) Indicates whether the postcode is in the correct format as defined by the postal authority for that country.

Postcode_Prefix (Engine: G) The postcode prefix that is used by some European countries. Many of these countries use the same postal code format of four or five digits, so you can prefix the numeric postal code with a country code to avoid confusion when sending mail to or from that country. The codes used are generally based on license plate codes (D for Germany or F for France) rather than ISO codes.

Postcode1 (Engine: All engines) Australia: Four-digit postcode.

Canada: First three characters (FSA) of the postal code.

Global Address: Postal code.

Japan: The first three digits of the postal code.

USA: Five-digit primary postal code (ZIP Code). Does not include the four-digit secondary postal code (ZIP4).


Postcode2 (Engine: All engines) The secondary postal code.

Canada: The last three characters (LDU) of the postal code.

Japan: The last four digits of the postal code.

USA: The four-digit ZIP4 code. On a mail piece, this code follows the primary postal code, separated by a hyphen (for example, 54601-1234).
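For USA addresses, Postcode_Full is Postcode1 and Postcode2 joined by a hyphen (10 characters). The hypothetical helpers below, which are not SDK calls, show that relationship, and also accept the unhyphenated 9-digit form that the USA Regulatory Address Cleanse input field Postcode_Full allows.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: returns {Postcode1, Postcode2} for "54601-1234" (ZIP10)
// or "546011234" (ZIP9); returns nullopt for anything else.
std::optional<std::pair<std::string, std::string>> splitUsPostcodeFull(const std::string& full) {
    std::string digits;
    if (full.size() == 10 && full[5] == '-') {
        digits = full.substr(0, 5) + full.substr(6);
    } else if (full.size() == 9) {
        digits = full;
    } else {
        return std::nullopt;
    }
    for (unsigned char c : digits) {
        if (!std::isdigit(c)) return std::nullopt;
    }
    return std::make_pair(digits.substr(0, 5), digits.substr(5));
}

// Hypothetical helper: builds the hyphenated form; falls back to the ZIP alone if ZIP4 is empty.
std::string joinUsPostcodeFull(const std::string& postcode1, const std::string& postcode2) {
    return postcode2.empty() ? postcode1 : postcode1 + "-" + postcode2;
}
```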

Primary_Address (Engine: All engines) Primary address line, such as the street address or post office box. Does not include secondary address information such as an apartment number.

Japan: The full block data.

Primary_Address_Delivery_Dual (Engine: All engines) A compound output field consisting of the Primary_Address1-4 (delivery) and Primary_Address1-4 (dual) output fields.

Primary_Delivery_Mode (Engine: C) The delivery mode for a street served by a route-type address (Rural Route).

Primary_Delivery_Number (Engine: C) The delivery number for a street served by a route-type address (Rural Route).

Primary_Name_Full1 (Engine: All engines) The primary name, primary type, primary prefix, and primary postfix.

Primary_Name_Full1_2 (Engine: All engines) A compound output field consisting of the Primary_Name_Full1 and Primary_Name_Full2 output fields.

Primary_Name_Full1_4 (Engine: All engines) A compound output field consisting of the Primary_Name_Full1, Primary_Name_Full2, Primary_Name_Full3, and Primary_Name_Full4 output fields.

Primary_Name_Full2 (Engine: G) The primary name2, primary type2, primary prefix2, and primary postfix2.

Primary_Name_Full3_4 (Engine: All engines) A compound output field consisting of the Primary_Name_Full3 and Primary_Name_Full4 output fields.

Primary_Name_Full3-4 (Engine: G) The primary name and primary type.


Primary_Name1 (Engine: All engines) The street name description (typically a street name or box description).

Japan: Block (chome, kumi, Hokkaido go), sub-block (banchi, gaiku, tochi kukaku).

The post office name description (yuubinkyoku or siten).

Primary_Name2 (Engine: G) Second street name description, typically a street name or box description.

Japan: Additional block and sub-block information.

Primary_Name3 (Engine: G) The street name, delivery mode, and so on.

Japan: Additional block and sub-block information.

Primary_Name4 (Engine: G) The street name, delivery mode, and so on.

Japan: Additional block and sub-block information.

Primary_Number (Engine: All engines) The premise number, rural route number, or PO Box number. In some cases it may include a range.

Primary_Number_Description (Engine: G) A description preceding the primary number. For example, KM (Kilometer) or Blk.

Japan: The postal number identifier 号 (go) or house number description 号 (go).

China: The description that follows the street number. For example, 号 (hao).

Primary_Number_Extra (Engine: G) Data found near the parsed primary number, which in most cases cannot be identified or does not belong in a standardized address.

Japan: The postal box identifier.

Primary_Number_Full (Engine: All engines) The primary number, primary number description, and primary number extra.

Primary_Postfix1 (Engine: All engines) Abbreviated or non-abbreviated directional (for example, N, South, NW, SE) that follows a street name. Whether the abbreviated or non-abbreviated form is used depends on the standardization setting for Directional Style.

Japan: Directional that follows block or sub-block.


Primary_Postfix2 (Engine: G) Abbreviated or non-abbreviated directional (for example, N, South, NW, SE) that follows a street name. Whether the abbreviated or non-abbreviated form is used depends on the standardization setting for Directional Style.

Japan: Directional that follows block or sub-block.

Primary_Prefix1 (Engine: G, U) Abbreviated or non-abbreviated directional (N, South, NW, SE) that precedes a street name. Whether the abbreviated or non-abbreviated form is used depends on the standardization setting for Directional Style.

Japan: Directional that precedes block or sub-block.

Primary_Prefix2 (Engine: G) Abbreviated or non-abbreviated directional (N, South, NW, SE) that precedes a street name. Whether the abbreviated or non-abbreviated form is used depends on the standardization setting for Directional Style.

Japan: Directional that precedes block or sub-block.

Primary_Secondary_Addr_Delivery_Dual (Engine: All engines) A compound output field consisting of the Primary_Secondary_Address (delivery) and Primary_Secondary_Address (dual) output fields.

Primary_Secondary_Address (Engine: All engines) The primary address and secondary address in one component.

Primary_Type1 (Engine: All engines) The type of primary name (some examples are rue, strasse, street, Ave, or Pl).

Primary_Type2-4 (Engine: G) The type of primary name (some examples are rue, strasse, street, Ave, or Pl).

Quality_Code (Engine: All engines) Displays a two-character code that provides additional information about the quality of the address. The quality of the address depends on the input data, the processing engine, country, information code, and status code (if an information code is not generated).

For more information, see #unique_331.

Region1 (Engine: All engines) Either the Region1_Name or Region1_Symbol, based on the standardization option Region Style.

Region1_2_Full (Engine: All engines) A compound output field consisting of the Region1_Full and Region2_Full output fields.

USA: Does not include Region2_Full.


Region1_2_Name (Engine: All engines) A compound output field consisting of the Region1_Name and Region2_Name output fields.

USA: Does not include Region2_Name.

Region1_Code (Engine: All engines) The region code, which may be the ISO region code.

Region1_Description (Engine: G) The Region1 description.

Region1_Full (Engine: All engines) Includes Region1 and Region1_Description.

Region1_Name (Engine: All engines) The fully spelled-out Region1 name.

Region1_Symbol (Engine: All engines) An abbreviation of the Region1 name.

Region2 (Engine: G, U) Either the Region2_Name or Region2_Symbol, based on the standardization option Region Style.

USA: Contains the county name.

Region2_Code (Engine: All engines) The region code, which may be the ISO region code.

Region2_Description (Engine: G) The Region2 description.

Region2_Full (Engine: G) Includes Region2 and Region2_Description.

Region2_Name (Engine: G) The fully spelled-out Region2 name.

Region2_Symbol (Engine: G) An abbreviation of the Region2 name.

Remainder_Extra_PMB_Full (Engine: All engines) A compound output field consisting of the Remainder_Full, Extra1, Extra2, and PMB_Full output fields.

Remainder_Full (Engine: All engines) Contains all remainder information, including Address_Line_Remainder1-4 and Lastline_Remainder1-4.

Room_Full (Engine: All engines) A compound output field consisting of the Unit_Description (if it contains “room” or a variant) and Unit_Number output fields.

Room_Number (Engine: All engines) The unit number for units that are variations of “room” (for example, RM, RMS, ROOM, ROOMS, RM., RMS, 号室, 室, 호).

Secondary_Address (Engine: All engines) The block, floor, unit, stairwell, or wing data on one line.

Secondary_Address_No_Floor_No_Room (Engine: All engines) A compound output field consisting of all Secondary_Full output fields except Floor_Full and Room_Full.

Secondary_Address_No_Floor (Engine: All engines) A compound output field consisting of all Secondary_Full output fields except Floor_Full.


Secondary_Address_No_Room (Engine: All engines) A compound output field consisting of all Secondary_Full output fields except Room_Full.

Single_Address (Engine: All engines) The full address and last line in one component.

Stairwell_Description (Engine: G) Entrance or stairwell identifier for a building, such as "Entrada" in Entrada 1.

Stairwell_Full (Engine: All engines) A compound output field consisting of the Stairwell_Description and Stairwell_Name output fields.

Stairwell_Name (Engine: G) The name or number of an entrance or stairwell for a building, such as "1" in Entrada 1.

Status (Engine: All engines) Specifies the suggestion status generated as the result of looking up the current record and performing suggestion processing.

A: Primary address-line suggestions available.

AM: Follow-up primary address-line suggestions available.

F: Floor range is invalid.

L: Lastline suggestions available.

N: No suggestions available.

R: Primary range is invalid.

S: Unit range is invalid.

U: Secondary address-line suggestions available.

UM: Follow-up secondary address-line suggestions available.
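An application that drives suggestion processing typically needs to interpret the Status value. The lookup below is a hypothetical sketch built only from the codes listed for Status; `hasSuggestionList` is an illustrative name, not an SDK function, and it treats only the "suggestions available" codes as offering a list.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical lookup table built from the documented Status values.
std::string describeSuggestionStatus(const std::string& status) {
    static const std::map<std::string, std::string> table = {
        {"A", "Primary address-line suggestions available"},
        {"AM", "Follow-up primary address-line suggestions available"},
        {"F", "Floor range is invalid"},
        {"L", "Lastline suggestions available"},
        {"N", "No suggestions available"},
        {"R", "Primary range is invalid"},
        {"S", "Unit range is invalid"},
        {"U", "Secondary address-line suggestions available"},
        {"UM", "Follow-up secondary address-line suggestions available"},
    };
    auto it = table.find(status);
    return it != table.end() ? it->second : "Unknown status";
}

// True when the status reports that a list of suggestions is available.
bool hasSuggestionList(const std::string& status) {
    return status == "A" || status == "AM" || status == "L" || status == "U" || status == "UM";
}
```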

Status_Code (Engine: All engines) Displays a six-character code that always starts with an S. This code explains what parts of the address changed during processing.

For more information, see #unique_332.

Unit_Description (Engine: All engines) The unit description, such as “Apartment” or “Flat.”

Japan: The unit description, such as gousitsu.

Unit_Full (Engine: All engines) A compound output field consisting of the Unit_Description, Unit_Number, and Unit_Qualifier output fields.

Unit_Number (Engine: All engines) The unit number, such as 100 in “Apartment 100.”


Unit_Qualifier (Engine: G) Additional word that precedes or follows the unit information.

Wing_Description (Engine: G) Identifies a wing within a building, such as "Wing" in West Wing.

Wing_Full (Engine: All engines) A compound output field consisting of the Wing_Description and Wing_Name output fields.

Wing_Name (Engine: G) The name or number of a wing within a building, such as "West" in West Wing.

The NW_PO_Box output fields are populated only when fields are mapped to NW input fields, and they are used only for the PO Box address portion of SAP Business Suite software.

The following is a list of the available NW_PO_Box output fields. The content of each NW_PO_Box field is identical to its corresponding output field without the NW_PO_Box_ prefix.

● NW_PO_Box_Assignment_Info
● NW_PO_Box_Assignment_Level
● NW_PO_Box_Assignment_Type
● NW_PO_Box_Delivery_Installation_Full
● NW_PO_Box_Delivery_Point
● NW_PO_Box_Info_Code
● NW_PO_Box_ISO_Country_Code_2Char
● NW_PO_Box_ISO_Script_Code
● NW_PO_Box_Locality1_Full
● NW_PO_Box_Match_Block_Number
● NW_PO_Box_Match_Building_Name
● NW_PO_Box_Match_Country
● NW_PO_Box_Match_Floor_Number
● NW_PO_Box_Match_Locality
● NW_PO_Box_Match_Locality2
● NW_PO_Box_Match_Postcode1
● NW_PO_Box_Match_Primary_Directional
● NW_PO_Box_Match_Primary_Name
● NW_PO_Box_Match_Primary_Name2
● NW_PO_Box_Match_Primary_Number
● NW_PO_Box_Match_Primary_Type
● NW_PO_Box_Match_Region
● NW_PO_Box_Match_Stairwell_Name
● NW_PO_Box_Match_Unit_Number
● NW_PO_Box_Match_Wing_Name
● NW_PO_Box_NW_Formatted_Postcode
● NW_PO_Box_NW_Postcode_In_Supported_Format
● NW_PO_Box_Postcode_Full
● NW_PO_Box_Postcode_In_Valid_Format
● NW_PO_Box_Primary_Address
● NW_PO_Box_Primary_Number
● NW_PO_Box_Primary_Secondary_Address
● NW_PO_Box_Region1
● NW_PO_Box_Region1_Full
● NW_PO_Box_Region2
● NW_PO_Box_Region2_Full
● NW_PO_Box_Status_Code

18.6.3 Global Address Cleanse Suggestion List fields

The Global Address Cleanse transform's Suggestion Lists option requires that you map fields on input and output.

Note

The Suggestion Lists option does not support Chinese and Japanese addresses.

Related Information

Suggestion List Input Fields [page 326]
Suggestion List Output Fields [page 326]

18.6.3.1 Suggestion List Input Fields

The Global Address Cleanse transform's Suggestion List option supports all Global Address Cleanse input fields in addition to the suggestion reply fields.

Field Description

Suggestion_Reply1-6 Contains the reply when more information is needed to complete the query. Each of these fields also contains the reply when a selection must be made from a list. Possible types of generated suggestion lists are lastline, primary name, and address.

18.6.3.2 Suggestion List Output Fields

The following are fields that you can use in output mapping for the Global Address Cleanse transform's Suggestion List option. The fields are listed alphabetically.

Field Description

Building_Name The building name for the address, which in some countries is used in place of the primary number. For example, in the U.K. an address may be “White House, High Street,” where “White House” is the building name instead of a primary number in an address such as “100 High Street.”

Delivery_Installation_Name The delivery installation city name, which is usually the same as the city name and (if it is the same) omitted from the address line.

Delivery_Installation_Qualifier Delivery installation qualifier (for example, “Main” in “RR 2 Vancouver Stn Main”).

Delivery_Installation_Type The delivery installation type.

English

PO: Post Office.

RPO: Retail Post Outlet.

STN: Station.

LCD: Letter Carrier Depot.

CMC: Community Mail Center.

CDO: Commercial Dealership Outlet.

French

BDP: Bureau de Poste.

CSP: Comptoir Service Postal.

SUCC: Succursale.

PDF: Poste de Facteurs.

CPC: Centre Postal Communautaire.

CC: Concession Commerciale.
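The English and French Delivery_Installation_Type codes appear to be parallel lists (PO and BDP are both "post office", and so on). Assuming that pairing holds, which this guide does not state explicitly, a hypothetical normalization to the English codes could look like this:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical mapping from French Delivery_Installation_Type codes to the
// English codes, assuming the two lists above correspond item by item.
std::string toEnglishDeliveryInstallationType(const std::string& code) {
    static const std::map<std::string, std::string> frToEn = {
        {"BDP", "PO"},    // Bureau de Poste -> Post Office
        {"CSP", "RPO"},   // Comptoir Service Postal -> Retail Post Outlet
        {"SUCC", "STN"},  // Succursale -> Station
        {"PDF", "LCD"},   // Poste de Facteurs -> Letter Carrier Depot
        {"CPC", "CMC"},   // Centre Postal Communautaire -> Community Mail Center
        {"CC", "CDO"},    // Concession Commerciale -> Commercial Dealership Outlet
    };
    auto it = frToEn.find(code);
    return it != frToEn.end() ? it->second : code;  // English codes pass through unchanged
}
```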

Firm The firm name for the address.

Floor_Description The level description, such as “Floor.”

Floor_Number_High, Floor_Number_Low If the floor number is a range such as 20-22, LOW contains “20” and HIGH contains “22.” If the floor number is not a range, both fields contain the floor number (for example, “20” and “20”).

Locality1, Locality2, Locality3, Locality4 The city, town or suburb and any additional related information.

Locality1_Official, Locality2_Official, Locality3_Official, Locality4_Official The locality name preferred by the postal authority.

Postcode The postal code.

USA: The five-digit ZIP Code and ZIP+4.

Postcode1 Australia: Four-digit postcode.

Canada: First three characters (FSA) of the postal code.

Global: Postal code.

USA: Five-digit primary postal code (ZIP Code). Does not include the four-digit secondary postal code (ZIP4).

Postcode2 The secondary postal code.

Canada: The last three characters (LDU) of the postal code.

USA: The four-digit ZIP4 code. On a mail piece, this code follows the primary postal code, separated by a hyphen (for example, 54601-1234).

Primary_Name1 The street name description (typically a street name or box description).

Primary_Name2 Second street name and description, typically a street name or box description.

Primary_Name3, Primary_Name4 The street name, delivery mode, and so on.

Primary_Name_Full1, Primary_Name_Full2 The primary name, primary type, primary prefix, and primary postfix.

Primary_Name_Full3, Primary_Name_Full4 The primary name and primary type.

Primary_Number_Description A description preceding the primary number. For example, Building or Blk.


Primary_Number_Extra Extraneous data found near the parsed primary number, which either cannot be identified or does not belong in a standardized address.

Primary_Number_Full The primary number, primary number description, and primary number extra.

Primary_Number_High, Primary_Number_Low If the house number is a range such as 100-102, LOW contains “100” and HIGH contains “102.” If the house number is not a range, both fields contain the house number (for example, “100” and “100”).

Primary_Postfix1, Primary_Postfix2 Abbreviated or non-abbreviated directional (for example, N, South, NW, SE) that follows a street name. Whether the abbreviated or non-abbreviated form is used depends on the standardization setting for Directional Style.

Primary_Prefix1, Primary_Prefix2 Abbreviated or non-abbreviated directional (N, South, NW, SE) that precedes a street name. Whether the abbreviated or non-abbreviated form is used depends on the standardization setting for Directional Style.

Primary_Side_Indicator Indicates whether even, odd, or both values are valid. This applies to streets and PO Boxes.

E: The record covers the even-numbered values.

O: The record covers the odd-numbered values.

B: The record covers both the even- and odd-numbered values.
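Primary_Number_Low, Primary_Number_High, and Primary_Side_Indicator together describe which house numbers a suggestion covers. The hypothetical check below (not an SDK call) shows how a caller might test a candidate number against a suggestion, assuming purely numeric ranges; alphanumeric primary numbers are out of scope. The same logic applies to floor and unit ranges with Secondary_Side_Indicator.

```cpp
#include <cassert>

// Hypothetical check: does `number` fall within [low, high] on the side(s)
// given by the side indicator ('E' even, 'O' odd, 'B' both)?
bool numberInSuggestionRange(long number, long low, long high, char sideIndicator) {
    if (number < low || number > high) return false;
    switch (sideIndicator) {
        case 'E': return number % 2 == 0;  // even-numbered values only
        case 'O': return number % 2 != 0;  // odd-numbered values only
        case 'B': return true;             // both sides of the street
        default:  return false;            // unknown indicator: treat as no match
    }
}
```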

Primary_Type1, Primary_Type2, Primary_Type3, Primary_Type4 The type of primary name (rue, strasse, street, Ave, or Pl).

Region1 Returns the state, province, or region.

Secondary_Side_Indicator Indicates whether even, odd, or both values are valid. This applies to floors and units.

E: The secondary record covers the even-numbered values.

O: The secondary record covers the odd-numbered values.

B: The secondary record covers both the even- and odd-numbered values.


Selection A unique index number that identifies this suggestion from the others in the returned list. The suggestion “selection” number ranges from 1 to the number of suggestion selections in the suggestion list.

Stairwell_Description Entrance or stairwell identifier for a building, such as "Entrada" in Entrada 1.

Stairwell_Name The name or number of an entrance or stairwell for a building, such as "1" in Entrada 1.

Unit_Description The unit description, such as “Apartment” or “Flat.”

Unit_Number_High, Unit_Number_Low If the unit number is a range such as 20-22, LOW contains “20” and HIGH contains “22.” If the unit number is not a range, both fields contain the unit number (for example, “20” and “20”).

18.7 USA Regulatory Address Cleanse fields

The USA Regulatory Address Cleanse transform requires that you map fields on input and output.

Related Information

Input fields [page 330]
Output fields [page 332]

18.7.1 Input fields

This table describes the input fields that you can use to map the input data file fields for the USA Regulatory Address Cleanse transform.

Field Description

Address_Line The delivery address line (for example, "123 Main Street, Unit 4")


Check_Digit The check digit for the 11-digit delivery point barcode. Applicable only if the transform can make a full assignment.

The transform provides the check digit for a 5-digit barcode when a 5-digit assignment is possible or the address is undeliverable. When the address is unassigned, the check digit is based on the unverified input Postcode1 (ZIP Code).
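The guide does not spell out how the check digit is computed. Assuming the standard USPS POSTNET rule, where the check digit brings the sum of all barcode digits up to a multiple of 10, a hypothetical helper for the 5-digit and 11-digit (ZIP + ZIP4 + two-digit delivery point) cases looks like this:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: check digit for a 5-digit or 11-digit barcode value,
// assuming the POSTNET rule (digit sum + check digit is a multiple of 10).
// Returns -1 when the input is not 5 or 11 decimal digits.
int barcodeCheckDigit(const std::string& digits) {
    if (digits.size() != 5 && digits.size() != 11) return -1;
    int sum = 0;
    for (unsigned char c : digits) {
        if (!std::isdigit(c)) return -1;
        sum += c - '0';
    }
    return (10 - sum % 10) % 10;  // 0 when the sum is already a multiple of 10
}
```

For example, ZIP 54601 has a digit sum of 16, so its check digit is 4.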

Country The country name. This transform does not attempt to make an assignment for addresses outside of the U.S. and its possessions, territories, and protectorates.

County_Code The three-digit county code. Numbers start at 001 within each state.

Data_Source_ID The input source or list identifier.

Use this field to identify the source of an input set or to identify the list that an input record belongs to in the case that multiple lists are present in the input.

Statistics are generated for each unique value in this field when you map the field in conjunction with enabling the Gather Statistics Per Data Source option in the Reports and Statistics group.

Delivery_Point The two-digit delivery point code used in the delivery point barcode (DPBC).

Family_Name1 The family name (for example, Smith).

Firm The company name.

Given_Name1 The given name (for example, Robert).

Given_Name2 The second given name (for example, B.).

Lastline The last line delivery information that can include all or some of the following fields: Locality1, Region1, Postcode1, or Postcode2.

Locality1 The city, town, or suburb.

Locality2 The Puerto Rican urbanization information.

LOT The Line-of-Travel number.

LOT_Order The Line-of-Travel sortation:

A: Ascending

D: Descending

LOT codes are required for non-automated, CART presorting in Standard Mail, Enhanced Carrier Route Subclass.
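To show how LOT and LOT_Order might be used together, the sketch below orders records by LOT number and, within one LOT, by primary number in the direction that LOT_Order gives. The record type and the interpretation of A/D as the house-number direction within a LOT are assumptions for illustration, not SDK definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical record holding the LOT fields plus a numeric primary number.
struct LotRecord {
    int lot;            // LOT (Line-of-Travel) number
    char lotOrder;      // LOT_Order: 'A' ascending, 'D' descending
    int primaryNumber;  // house number, assumed numeric for this sketch
};

// Sorts records into line-of-travel order (assumed interpretation of A/D).
void sortByLineOfTravel(std::vector<LotRecord>& records) {
    // Sort key: LOT number, then LOT_Order (tie-breaker only), then the primary
    // number negated for descending LOTs so one ascending comparison works.
    auto key = [](const LotRecord& r) {
        const int directed = (r.lotOrder == 'D') ? -r.primaryNumber : r.primaryNumber;
        return std::make_tuple(r.lot, r.lotOrder, directed);
    };
    std::stable_sort(records.begin(), records.end(),
                     [&key](const LotRecord& a, const LotRecord& b) { return key(a) < key(b); });
}
```

Using a single tuple key keeps the comparator a strict weak ordering even if records in the same LOT carry different order codes.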

Multiline1-12 A line from the input file which may contain data. The type of data in this line may vary from record to record.


Name The name of the person associated with the address.

Postcode_Full The complete postal code (ZIP10 with a hyphen; ZIP9 without a hyphen).

Postcode1 The five-digit primary ZIP Code. It does not include the four-digit ZIP4 Code.

Postcode2 The four-digit ZIP4 code. On a mail piece, this code follows the primary postal code with a hyphen placed between, for example, 54601-1234.

Postname The honorary postname (indicating certification, academic degree, or affiliation such as CPA) or maturity postname (indicating heritage such as Jr.).

Prename The prename (for example, Mr.).

Region1 The name of the state or province for this address.

SortCode_Route The four-digit carrier route number.

Stage_Address_Flag, Stage_Lastline_Flag, Stage_Name_Flag, Stage_Record_Key The USA Regulatory Address Cleanse information required from the stage file. For NCOALink stage testing only.

Suggestion_Reply1-5 The index number that corresponds to a specific lastline suggestion or an address line suggestion. These fields can also be used to input a street primary range or a street secondary range.

If you do not want to use a suggestion list, set the value of this field to 0, and the suggestion list is ignored.

Suggestion_Start_Selection The starting suggestion list number. If the field is left blank, the default value is 1.
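The reply and start-selection rules above can be wrapped in two small helpers. Both are hypothetical (not SDK calls); treating an out-of-range choice the same as 0 (ignore the list) is a design assumption, and `startSelection` assumes the field holds either nothing or a decimal number.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts a user's choice into a Suggestion_Reply value.
// A choice outside 1..listSize yields "0", which means "ignore the suggestion list".
std::string suggestionReply(int choice, int listSize) {
    if (choice < 1 || choice > listSize) return "0";
    return std::to_string(choice);
}

// Hypothetical helper: a blank Suggestion_Start_Selection means the list starts at 1.
int startSelection(const std::string& field) {
    return field.empty() ? 1 : std::stoi(field);  // std::stoi throws on non-numeric text
}
```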

Unit_Number The secondary address information (for example, the unit description and/or secondary number).

18.7.2 Output fields

This table describes the output fields that can be used for the USA Regulatory Address Cleanse transform.

Field Description

Address_Line Complete, standardized primary and secondary address line. The style of suffixes, directionals, and unit designators depends on how you define your options.

Note

If the output values don't fit within the length of the output field, intelligent truncation occurs.

Address_Line_Remainder1 Extraneous data found on the address line that cannot be identified or does not belong in a standardized address.


Address_Type (DELIVERY, DUAL) The record-type indicator for the assigned address. Applicable for the DELIVERY and DUAL Generated Field Address Class.

The first character indicates the type of record in the address directory to which the address matched:

F: Firm

G: General delivery

H: High-rise apartment or office building

M: Military

P: Post office box

R: Rural route or highway contract

S: Street (usually, one side of one city block)

(blank): Unassigned

The second character may be a D or a blank. The D stands for default; it means that the transform detected, from the address directory, that a finer level of address assignment would be possible if further input information were available.

FD: Firm default. The transform did not assign a firm-level Postcode2, but could do so if given more or better firm information.

GD: General delivery default. Assigned when General Delivery is the only primary name listed for the Postcode1.

HD: High-rise default. The transform assigned the Postcode2 for the entire building. Assignment at the unit, floor, or wing level is possible. Often caused by a suite or apartment number out of range.

RD: Rural route or highway contract default. The transform assigned the Postcode2 for the entire route but could make a better assignment with the box number.

SD: Street default. Usually means that there is no Postcode2 for the building, so the transform had to assign the Postcode2 for the block.

UD: Unique default. Either the owner of the unique Postcode1 has not provided Postcode2 assignments, or the address could not be matched.

When the transform cannot assign an address, it provides an address-type indication based on the way that the input data was parsed. This process is not foolproof. The transform may indicate that a street, rural route, highway contract, general delivery, or PO box was parsed.
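A caller that post-processes Address_Type usually splits the two characters into a record type and a default flag. The decoder below is a hypothetical sketch based on the codes listed above. Blank or unrecognized first characters are reported as unassigned, and 'U' is included only because the UD combination is documented.

```cpp
#include <cassert>
#include <string>

// Hypothetical decoded form of the two-character Address_Type value.
struct AddressTypeInfo {
    std::string recordType;  // meaning of the first character
    bool isDefault;          // true when the second character is 'D'
};

AddressTypeInfo decodeAddressType(const std::string& code) {
    const char first = code.empty() ? ' ' : code[0];
    const char second = code.size() > 1 ? code[1] : ' ';
    std::string type;
    switch (first) {
        case 'F': type = "Firm"; break;
        case 'G': type = "General delivery"; break;
        case 'H': type = "High-rise"; break;
        case 'M': type = "Military"; break;
        case 'P': type = "Post office box"; break;
        case 'R': type = "Rural route or highway contract"; break;
        case 'S': type = "Street"; break;
        case 'U': type = "Unique"; break;       // documented only as part of UD
        default:  type = "Unassigned"; break;   // blank or unrecognized
    }
    return {type, second == 'D'};
}
```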

Alias_Type (OFFICIAL) The alias-type indicator for the assigned address. Applicable for the OFFICIAL Generated Field Address Class.

Describes the input address, not the output address.

A: Input address matched an abbreviated street name.

B: The input address matched the high-rise alternate default base record.

C: Input street name is out of date; to get new street name, convert your record to the preferred alias.

H: Input address was an undesirable alternate, subject to conversion to a USPS preferred street address (high-rise alternate).

O: Input address was a street nickname or other alias.

P: Input address was a preferred alias.

(blank): Input address was not an alias or was unassigned.

ANKLink_Return_Code ANKLink return code (Attempted Not Known). Valid values are:

77: An ANKLink match was found. If NCOALink_Return_Code contains an A, 91, or 92, you may be able to obtain a new address from an NCOALink full service provider.

(blank): No NCOALink lookup, or no ANKLink match. This field is always blank for full service providers.

Audit_Dropped_Secondary This field is used for audit testing. This field is also populated when an ANKLink match is made.

Audit_Prename, Audit_Given_Name1, Audit_Given_Name2, Audit_Family_Name1, Audit_Postname These fields contain the name data used to make an NCOALink match. In some cases, the name in these fields is not the same as the input name (for example, if a nickname, alternate spelling, or initial is used instead). In the case of a firm match, these name fields will contain a split version of the firm data.

These fields are also populated when an ANKLink match is made.

Audit_Gender This field is used for audit testing. This field is also populated when an ANKLink match is made.

Audit_General This field contains information for Stage I and Stage II tests, specifically query data, result data, and hint bytes, as the USPS requires. Use this field for audit purposes only. This field is required for audits.

For more information about the content of this field, see the NCOALink User Technical Reference at .

This field can also contain ANKLink return codes.

Audit_Primary_Name This is the primary name that was sent to NCOALink for matching. This field is required for audits.

Audit_Range This is the range that was sent to NCOALink for matching. This field is required for audits.

Audit_Secondary_Range This is the secondary range that was sent to NCOALink for matching. This field is required for audits.

Audit_Truncated_Given_Name1 This field is used for audit testing. This field is also populated when an ANKLink match is made.

Audit_Truncated_Given_Name2 This field contains the truncated middle name, as stored in the NCOALink data. Use this field for audit purposes only. This field is required for audits.

Audit_Unit This is the unit data that was sent to NCOALink for matching. This field is required for audits.

Carrier_Route_Sort_Zone The carrier-route sort zone indicates eligibility for Standard Mail Automation Enhanced Carrier Route:

A: Carrier route rates are available and merging is allowed.

B: Carrier route rates are available and merging is not allowed.

C: Carrier route rates are not available and merging is allowed.

D: Carrier route rates are not available and merging is not allowed.
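
The four Carrier_Route_Sort_Zone codes encode two independent yes/no flags. The following minimal sketch (C++17) assumes the value is read as a single character. `SortZone` and `decodeSortZone` are hypothetical names, and any value other than A through D, including blank, is treated as unknown.

```cpp
#include <optional>

// Hypothetical decoding of Carrier_Route_Sort_Zone into its two flags.
struct SortZone {
    bool carrierRouteRatesAvailable;
    bool mergingAllowed;
};

std::optional<SortZone> decodeSortZone(char code) {
    switch (code) {
        case 'A': return SortZone{true, true};
        case 'B': return SortZone{true, false};
        case 'C': return SortZone{false, true};
        case 'D': return SortZone{false, false};
        default:  return std::nullopt;  // blank or unexpected value
    }
}
```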

CASS_Assignment_Type Indicates which option was used in making the assignment:

0: The non-CASS and DPV tie-break options were disabled or were not used to make an assignment.

1: Inexact Postcode1 move assignment.

2: Input Postcode2 assignment.

3: DPV tie-breaking was used to make this assignment.

(blank): The transform could not assign an input address.

CASS_Record_Type The record type necessary for posting on the CASS test. This field is populated for assigned records only. The valid record types include:

F: Firm

G: General Delivery

H: High-Rise

P: PO Box

R: Rural Route or Hwy contract

S: Street

Check_Digit Check digit for the delivery-point bar code, or for a five-digit bar code if a full postal code (ZIP+4) could not be assigned.

Count Specifies the suggestion count generated as the result of looking up the current record. A nonnegative value is output. If the current record does not end processing with a suggestion list needing resolution, then the value in this field is 0.

Country The country name.

County_Code Federal Information Processing Standard (FIPS) three-digit county code. Numbers are unique within states. You might use county information if you are preparing a presorted periodicals mailing.

County_Name The fully-spelled county name.

Delivery_Point The two-digit DPBC code.

Delivery_Type Type of postal facility:

A: Airport Mail Facility (AMF)

B: Branch Office

C: Community Post Office (CPO)

D: Area Distribution Center (ADC)

E: Sectional Center Facility (SCF)

F: Delivery Distribution

G: General Mail Facility (GMF)

K: Network Distribution Centers (NDC)

M: Money Order Unit

N: City place name

P: Post Office (main)

S: Station

U: Urbanization (Puerto Rico only)

District District number for the U.S. House of Representatives.

DPV_CMRA The DPV Commercial Mail Receiving Agencies (CMRA) component that is generated for this record.

L: The address triggered DPV locking.

N: The address is not a CMRA.

Y: The address is a valid CMRA.

(blank): A blank output value indicates that Enable_DPV_Validation is set to No, DPV processing is currently locked, or the transform cannot assign the input address.

DPV_Footnote DPV footnotes are required for end-user CASS certification. The footnotes contain the following information:

AA: Input address matches to the ZIP+4 file.

A1: Input address does not match to the ZIP+4 file.

BB: All input address field values match to DPV.

CC: Input address primary number matches to DPV, but the secondary number does not match (the secondary is present but invalid).

F1: Input address matches to military address.

G1: Input address matches a general delivery address.

M1: Input address primary number is missing.

M3: Input address primary number is invalid.

N1: Input address primary number matches to DPV but the address is missing the secondary number.

P1: Input address is missing the RR or HC Box number.

P3: Input address has an invalid PO, RR, or HC number.

RR: Input address matches to CMRA.

R1: Input address matches to CMRA, but the secondary number is not present.

U1: Input address matches a unique address.

Note

The transform always posts the DPV footnotes in the same order, but this field is not always 12 characters long.
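
Each footnote is two characters long and the field can hold up to 12 characters, so a value such as AABBRR can be read as a sequence of codes. The following sketch assumes the footnotes are concatenated with no separator and that any trailing blanks are fixed-width padding. `splitDpvFootnotes` and `hasFootnote` are hypothetical helpers, not SDK functions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a DPV_Footnote value such as "AABBRR" into
// two-character footnotes. Assumes no separators and treats trailing blanks
// as fixed-width padding.
std::vector<std::string> splitDpvFootnotes(std::string field) {
    while (!field.empty() && field.back() == ' ') {
        field.pop_back();
    }
    std::vector<std::string> codes;
    for (std::size_t i = 0; i + 2 <= field.size(); i += 2) {
        codes.push_back(field.substr(i, 2));
    }
    return codes;
}

// Checks for a footnote on two-character boundaries only, so that "AC" is not
// found inside "AACC".
bool hasFootnote(const std::string& field, const std::string& footnote) {
    for (const std::string& code : splitDpvFootnotes(field)) {
        if (code == footnote) {
            return true;
        }
    }
    return false;
}
```

Matching on two-character boundaries is the reason for the split. A plain substring search for AC would wrongly succeed on the value AACC.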

DPV_NoStats No Stat indicator. No Stat means that the address is a vacant property, it receives mail as a part of a drop, or it does not have an established delivery yet.

Y: Address is flagged as No Stat in DPV data.

N: Address is not No Stat.

(blank): Address was not looked up.

Note

The US Addressing report contains DPV NoStats counts in the DPV Summary section.

DPV_Status The DPV status component that is generated for this record.

D: The primary range is a confirmed delivery point, but the secondary range is not available on input.

L: The address triggered DPV locking.

N: The address is not a valid delivery point.

S: The primary range is a valid delivery point, but the parsed secondary range is not valid in the DPV directory.

Y: The address is a confirmed delivery point. The primary range and secondary range (if present) are valid.

(blank): A blank output value indicates that Enable_DPV_Validation is set to No, DPV processing is currently locked, or the transform cannot assign the input address.

DPV_Vacant Vacant address indicator.

Y: Address is vacant.

N: Address is not vacant.

(blank): Address was not looked up.

Note

The US Addressing report contains DPV Vacant counts in the DPV Summary section.

Error Specifies the error status generated as the result of looking up the current record and performing suggestion processing. Possible output values are 0-5.

0: There were no suggestion selection errors.

1: The necessary selection information was blank. For example, a lastline suggestion list was generated, but there was no lastline selection input field data to make a choice.

2: The suggestion selection was invalid. For example, 8 was selected but there are only 5 suggestions.

3: The suggestion entry in the input field was invalid.

4: The suggestion range in the input field was invalid.

5: The suggestion secondary range in the input field was invalid.

EWS_Match Returns the results of the EWS match:

T: True, the address is located in the EWS directory and is an EWS match.

F: False, the address is not located in the EWS directory.

(blank): EWS is not enabled.

Extra1-10 Any non-address data found above the address data in the address block. Available only if the input data was presented through multiline fields.

Extraneous_Secondary_Address_Data Consists of the data from Extraneous_Secondary_Unit_Number and Extraneous_Secondary_Non_Postal, respectively. Any additional # data is placed in the remainder or extra components.

Extraneous_Secondary_Non_Postal Extraneous data retained in this field is the best guess at Private Mail Box data, based on the position in the address line and other information contained in the address.

Extraneous_Secondary_Unit_Number Extraneous data retained in this field is the best guess at secondary range data, based on the position in the address line and other information contained in the address.

Fault_Code The fault code. Blank if the address was assigned.

Fault_Or_Status_Code The fault code if the address was unassigned; the status code if the address was assigned.

Finance_Area_Postcode The Finance Area Postcode is the lowest Postcode1 within a Finance Number. (Finance Numbers are currently used to link data to a single post office or postmaster.)

Firm Firm name. Do not use this field if the input was multiline, because if there is no firm name in the postal directory, the transform cannot reliably identify firm names from multilines.

If you retrieve the corrected component, the firm name is taken from the postal directory if found; otherwise, it's taken from the input record. Be aware that the postal directory might contain some unusual or shortened spellings that you may or may not find suitable for printing on mail pieces. If you prefer to retain your own firm data, retrieve the original component.

Foreign_Code Specifies whether the address is foreign or domestic:

F: Foreign addresses

(blank): Domestic U.S. addresses

Full_Address The complete address line, including secondary address and dual address (street and postal).

Note

If the output values don't fit within the length of the output field, then intelligent truncation will occur.

Intermediate_Codes Intermediate codes provide information that the USPS requires when you perform NCOALink certification or audit testing.

LACSCode LACS (Locatable Address Conversion System) indicator:

T: Address needs 9-1-1 conversion (from box to street address) and should be submitted to a LACS vendor.

F: Address does not need conversion.

(blank): Address was not assigned.

LACSLink_Indicator Returns the conversion status of addresses processed by LACSLink.

Y: Address converted by LACSLink (the LACSLink_Return_Code value is A).

N: Address looked up with LACSLink but not converted.

F: The address was a false-positive.

S: A LACSLink conversion was made, but it was necessary to drop the secondary information.

(blank): No LACSLink lookup attempted.

LACSLink_Query Returns the pre-conversion address, populated only when LACSLink is turned on and a LACSLink lookup was attempted. This address will be in the standard Pub. 28 format. However, when an address has both a unit designator and secondary unit, the unit designator is replaced by the character "#".

(blank): No LACSLink lookup attempted.

LACSLink_Return_Code Returns the match status for LACSLink processing:

A: LACSLink record match. A converted address is provided in the address data fields.

00: No match and no converted address.

09: LACSLink matched an input address to an old address, which is a "high-rise default" address; no new address is provided.

14: Found a LACSLink record, but couldn't convert the data to a deliverable address.

92: LACSLink record matched after dropping the secondary number from input address.

(blank): No LACSLink lookup attempted.

Lastline Locality, region, and postal code together on one line.

Note

If the output values don't fit within the length of the output field, then intelligent truncation will occur.

Locality1 Canada and USA engines: Locality preferred by the postal authority.

Other engines: City, town, locality, or suburb.

Note

If the output values don't fit within the length of the output field, then intelligent truncation will occur.

Locality1_Alternate Preserves the input locality if it is recognized as a valid mailing locality for this address. Misspellings are corrected.

When the input city is not the correct city name for the address line, CASS rules require that the city name be converted based on the last line index. The converted city name will be output through Locality1_Official. The input city name (capitalized and with spelling corrected as necessary) will be output through Locality1_Alternate, so you can retain it if you wish.

Locality1_LLIDX Yields a city name (locality1 name) that is more geographically precise than Locality1_Official.

LLIDX (last-line index) is a USPS number that ties a ZIP+4 record to a particular city, state, and ZIP.

Note

If the output values don't fit within the length of the output field, then intelligent truncation will occur.

Locality1_Name The city, town, or suburb.

Locality names that are marked as invalid for mailing by the USPS are always preserved, never converted, regardless of the values set for the Preserve Place Name and Assign With Input Locality options.

Locality1_Official The standardized locality name. When the input city name is tagged by the USPS as invalid for mailing, this field will always yield a converted city name, no matter how the Preserve Place Name option is set.

Locality1_Official_ABBR The official USPS abbreviation of the city name, if one is available. This field will be blank if the full city name is less than 13 characters or if the full name is longer, but the USPS has not provided an official abbreviation.

Locality2 USA engine: Urbanization (Puerto Rican addresses only).

Other engines: Additional city, town, locality, or suburb information.

Locality2_Official Urbanization name; produced only when the address is in Puerto Rico.

LOT Line-of-travel number.

LOT_Order Line-of-travel sortation.

A: Ascending.

D: Descending.

Matched_Addressline_Indicator Match level indicator.

T: Address line was matched to a ZIP+4 record.

F: Address line was not matched to a ZIP+4 record.

Matched_Lastline_Indicator Match level indicator.

T: Last line was matched to a City/ZCF record.

F: Last line was not matched to a City/ZCF record.

Move_Effective_Date The effective date of the move, as indicated on the change-of-address card sent to the USPS, in the format yyyymm. The yyyymm format is returned from the NCOALink directories and required by the USPS for audit purposes.

To use it in a function or post it to an output file, you’ll probably have to convert the format to mm/dd/yyyy first.

This field is also populated when an ANKLink match is made.
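
The following minimal conversion sketch (C++17) handles Move_Effective_Date. The field carries only a year and a month, so you must choose the day in the mm/dd/yyyy result yourself. This hypothetical `moveDateToMdy` helper assumes the first of the month. A blank or malformed value, such as the value present when no move was found, yields no result.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper: converts a Move_Effective_Date value in yyyymm form
// (for example "201407") to mm/dd/yyyy. The field has no day component, so
// this sketch assumes the first of the month.
std::optional<std::string> moveDateToMdy(const std::string& yyyymm) {
    if (yyyymm.size() != 6) {
        return std::nullopt;  // blank or malformed value
    }
    for (char c : yyyymm) {
        if (!std::isdigit(static_cast<unsigned char>(c))) {
            return std::nullopt;
        }
    }
    const std::string year = yyyymm.substr(0, 4);
    const std::string month = yyyymm.substr(4, 2);
    const int m = std::stoi(month);
    if (m < 1 || m > 12) {
        return std::nullopt;
    }
    return month + "/01/" + year;
}
```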

Move_Type Type of move record:

B: Business (matched by company name).

F: Family (matched by last name).

I: Individual (matched by first and last name).

This field is also populated when an ANKLink match is made.

Multiline1-12 A line which may contain any data. The type of data in this line may vary from record to record.

Note

For address data, if the output values don't fit within the length of the output field, then intelligent truncation will occur.

Name The name of a person associated with the address.

NCOALink_Hint_Byte This field is used for audit testing.

NCOALink_Return_Code This field shows NCOALink return codes. To populate this field, set the List Processing Mode to one of the three available options: Change of Address, Statistics Only, or Return Codes Only.

A brief description of the return codes appears on the NCOALink Processing Summary report. To print more detailed return code descriptions on the report, enable the Generate Return Code Descriptions option under Report Options in the NCOALink block.

This field is also populated when an ANKLink match is made.

Non_CASS_Firm The firm match that was made by using the input ZIP+4 for missing or invalid firm information.

Non_CASS_Secondary_Address The secondary address match that was made using the input ZIP+4 for missing or invalid secondary address information.

Note

If the output values don't fit within the length of the output field, then intelligent truncation will occur.

Non_CASS_Unit The unit designator match that was made using the input ZIP+4 for missing or invalid unit designator information.

Non_CASS_Unit_Number The unit number match that was made using the input ZIP+4 for missing or invalid unit number information.

Non_Postal_Secondary_Address The complete non-postal secondary address (PMB 10). Non-postal means that the mail is delivered through a private mailbox company rather than the USPS.

Non_Postal_Unit Non-postal unit designator (PMB). Non-postal means that the mail is delivered through a private mailbox company rather than the USPS.

Non_Postal_Unit_Number Non-postal secondary range (PMB number only, does not include designator). Non-postal means that the mail is delivered through a private mailbox company rather than the USPS.

Parsed_Firm If the change of address was made based on a firm (company) name, that firm name will be posted in this field.

This field is also populated when an ANKLink match is made.

Postal_Box_Number Post office box number.

Postcode_Full The complete ZIP10 with a hyphen.

Postcode_Full_No_Hyphen The complete ZIP9 without a hyphen.

Postcode_Type The type of ZIP Code that is assigned:

M: Military

U: Unique (specific to a university, large firm, or other institution).

(blank): Ordinary ZIP Code, or the ZIP Code was not assigned.

Postcode1 The five-digit ZIP Code. Does not include the four-digit ZIP4.

Postcode1_Change_Ind Indicates whether the address is affected by postal code realignment.

T: True, the transform corrected the postal code (and, if applicable, the locality).

F: False.

(blank): The address was not corrected.

Postcode2 Four-digit ZIP4 code. On a mail piece, this code follows the primary postal code, with a hyphen placed between, for example, 54601-1234.
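
The relationship among Postcode1, Postcode2, Postcode_Full, and Postcode_Full_No_Hyphen can be expressed as a small formatting and parsing sketch (C++17). The `ZipPlus4` struct and the helper functions are hypothetical, not SDK types, and the parser accepts only the ten-character hyphenated form shown in the example above.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical representation of a ZIP+4 assignment.
struct ZipPlus4 {
    std::string postcode1;  // five digits, as in Postcode1
    std::string postcode2;  // four digits, as in Postcode2
};

bool allDigits(const std::string& s) {
    for (char c : s) {
        if (!std::isdigit(static_cast<unsigned char>(c))) {
            return false;
        }
    }
    return !s.empty();
}

// Same layout as Postcode_Full, for example "54601-1234".
std::string toPostcodeFull(const ZipPlus4& z) {
    return z.postcode1 + "-" + z.postcode2;
}

// Same layout as Postcode_Full_No_Hyphen, for example "546011234".
std::string toPostcodeNoHyphen(const ZipPlus4& z) {
    return z.postcode1 + z.postcode2;
}

// Parses a hyphenated ZIP+4 string back into its two parts.
std::optional<ZipPlus4> parsePostcodeFull(const std::string& s) {
    if (s.size() != 10 || s[5] != '-') {
        return std::nullopt;
    }
    ZipPlus4 z{s.substr(0, 5), s.substr(6, 4)};
    if (!allDigits(z.postcode1) || !allDigits(z.postcode2)) {
        return std::nullopt;
    }
    return z;
}
```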

Pre_Suitelink_Delivery_Point The numeric 2-digit code for the delivery point bar code that was generated before SuiteLink processing.

Pre_Suitelink_Postcode1 The ZIP Code that was assigned by the transform before SuiteLink processing.

5-digit ZIP Code: SuiteLink Retcode value is A.

(blank): No ZIP Code assigned.

Pre_Suitelink_Postcode2 The ZIP+4 that was assigned by the transform before SuiteLink processing. The ZIP+4 is either for a high-rise default or street default record.

Pre_Suitelink_Unit_Description The unit designator that existed before SuiteLink processing. If this is blank, the transform did not assign any secondary information.

Pre_Suitelink_Unit_Number The secondary range information that existed before SuiteLink processing. If this is blank, the transform did not assign any secondary information.

Primary_Address Primary address line, such as the street address or post office box. Does not include secondary address information such as apartment.

If the Use USPS Primary Name Abbreviation option is enabled, the software uses the USPS primary name abbreviation first. If the values don't fit within the length of the output field, then intelligent truncation will occur.

Primary_Name1 Street name description.

Note

If the output values don't fit within the length of the output field, then intelligent truncation will occur.

Primary_Number The premise number.

Primary_Postfix1 Abbreviated directional (N, S, NW, SE) that follows a street name.

Primary_Postfix1_Long Fully-spelled directional, such as "North" or "South," that follows the street name.

Primary_Prefix1 Abbreviated directional (N, S, NW, SE) that precedes a street name.

Primary_Prefix1_Long Fully-spelled directional, such as "North" or "South," that precedes the street name.

Primary_Secondary_Address The primary address and secondary address on one line.

Note

If the output values don't fit within the length of the output field, then intelligent truncation will occur.

Primary_Type1 Abbreviated street type, such as "St," "Ave," or "Pl."

Primary_Type1_Long Fully-spelled street type, such as "Street" or "Avenue."

QSS_Default Specifies whether the record qualified as a default match instead of qualifying as a match at a higher level of assignment. Output values are:

T: True

F: False

RDI_Indicator The residential delivery indicator (RDI) shows whether the address is residential or nonresidential.

Y: Residential address

N: Nonresidential address

Region1 State, province, territory, or region.

Rural_Route_Box_Number The rural route box number.

Rural_Route_Number The rural route number.

Secondary_Address The building name, floor, and room number in one field.

Sortcode_Postcode The five-digit ZIP Code or two-digit zone.

Sortcode_Route The four-digit carrier route.

Stage_Test_Record This field is for stage testing only. It applies to NCOALink and CASS.

The USA Regulatory Address Cleanse transform will populate the values of this field automatically to match the format required for stage testing.

Status Specifies the suggestion status generated as the result of looking up the current record and performing suggestion processing.

A: Suggestion processing ended with an address suggestion list needing resolution.

L: Suggestion processing ended with a lastline suggestion list needing resolution.

N: No suggestion lists were generated and no suggestion processing was performed.

R: The primary range is invalid for the selected address suggestion.

S: The secondary range is invalid for the selected address suggestion.

U: The secondary address is invalid for the selected address suggestion.

Status_Code The status code. This field is blank if the address is unassigned.

Suggestion_List Contains all of the Suggestion List Component field values that you chose in the Suggestion List group of the USA Regulatory Address Cleanse transform.

SuiteLink_Retcode A: SuiteLink match. Secondary information exists and was assigned to this record as a result of SuiteLink processing.

00: No SuiteLink match—Lookup was attempted but no matching record could be found.

(blank): A SuiteLink lookup was not attempted because one of the following was true:

● The address was not a high-rise default according to CASS.
● The address did not contain a firm.

Undeliverable_Indicator Indicates whether the record is a deliverable address:

T: The address is tagged by the USPS as unsuitable for mail delivery (for example, a cemetery).

F: The address either was not matched to a ZIP+4 record or was matched to a record that indicates the address is suitable for mail delivery.

Unit_Description Unit description, such as "#", "Apartment", or "Flat."

Unit_Description_Directory Unit designator from ZIP+4 directory, or blank if none was found.

Unit_Number Unit number, such as 100 in "Apartment #100."

18.8 Match output fields

The following Match fields are generated by the Match transform per match level. Use these fields when you map your output schema.

Field name Description

Group_Number Specifies the records that belong to the same match group, which share the same group number. The group numbers start with the number 1. Unique records have a blank group number.

Match_Criterion Specifies the name of the criteria that made the decision (if the Match_Type is R). Otherwise, the field is blank.

Match_Level Specifies the name of the match level used.

Match_Score The Match_Score field outputs the following values:

● The criteria similarity score when the Match_Type is R.
● The total weighted score when the Match_Type is W.
● Blank if the record is a driver record (Match_Type of D) or if the records are unique.

Match_Status The values for the Match_Status field that appear in your output are:

D: This record is a driver in a match group.

P: This record is a passenger in a match group.

U: This is a unique record.

Match_Type Describes how each record is identified as a match. Possible values are:

(blank): The record did not match any other record. It is a unique record.

D: The record was the driver record in the comparison process.

R: The record was identified as matching the driver record because one of the criteria met the Match_Score.

W: The record was identified as matching the driver because the total weighted score met the Weighted match score.

Input Source output fields

These fields are available only when you use an Input Source operation in your Match transform.

Field name Description

Source_Group_Name Specifies the name of the Source Group that the current record belongs to. If a record does not belong to any Source Group, then an empty string is output.

Source_Name Specifies the name of the input source that the current record belongs to.

Source_Type Specifies the type of source that the current record belongs to.

N: The record comes from a Normal source.

P: The record comes from a Suppress source.

S: The record comes from a Special source.

If you also add a Group Statistics post-match group, and select the Generate source statistics from input sources option, the following output fields are available (these are in addition to the fields generated by the Group statistics operation).

Field name Description

Group_Source_Appearance Specifies the order the input source appears in this match group. The first input source appearing in the match group receives a value of 1, the second Input Source appearing will get 2, and so forth. Records that come from the same input source will receive the same Group_Source_Appearance value. Unique records have a value of 0.

Group_Source_Order Specifies the order of the records within the match group that have the same Group_Source_Appearance value. The first occurrence receives a value of 1, the second occurrence receives a value of 2, and so on. Unique records have a value of 0.
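
The numbering rules for Group_Source_Appearance and Group_Source_Order can be reproduced as follows, which may help when you verify output. This sketch assumes that the records of one match group are supplied in group order as their input source names and that a group containing a single record is a unique record. `numberBySource` and `SourcePosition` are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

struct SourcePosition {
    int appearance;  // Group_Source_Appearance
    int order;       // Group_Source_Order
};

// Hypothetical sketch: number the records of one match group by input source.
std::vector<SourcePosition> numberBySource(const std::vector<std::string>& sources) {
    std::vector<SourcePosition> result;
    if (sources.size() < 2) {
        // A lone record is unique: both values are 0.
        result.assign(sources.size(), SourcePosition{0, 0});
        return result;
    }
    std::map<std::string, int> appearanceOf;  // source -> 1-based first appearance
    std::map<std::string, int> seenCount;     // source -> occurrences so far
    for (const std::string& src : sources) {
        auto it = appearanceOf.find(src);
        if (it == appearanceOf.end()) {
            const int next = static_cast<int>(appearanceOf.size()) + 1;
            it = appearanceOf.emplace(src, next).first;
        }
        result.push_back({it->second, ++seenCount[src]});
    }
    return result;
}
```

For sources A, B, A, C, this yields appearance/order pairs (1,1), (2,1), (1,2), (3,1). The Source group output fields follow the same pattern. That variant would also need to assign 0 to records that have no source group.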

Group_Source_Type Specifies the type of source in the match group. This field will contain one of the following values:

M: The records come from more than one input source (excluding records from Special sources).

P: The records come from a Suppress source.

S: The records come from a Special source.

(blank): The record is unique.

Multi_Source_Count Specifies the number of sources represented in the match group (excluding the Special sources and Suppress sources and Normal sources that follow a Suppress source in the match group). The values of this field could range from 0 to the number of records in the match group. Unique records receive a value of 1, if from a Normal list, and 0, if from a Special source or a Suppress source.

Source_Count Specifies the number of sources represented in the match group (regardless of the source types). The values of this output field could range from 1 to the number of records in the match group. Unique records will have a value of 1.

Source group output fields

These fields are available only if you use a Source Group operation in your Match transform.

Group_Source_Group_Appearance Specifies the order in which the source group appears in this match group. The first source group appearing in the match group receives a value of 1, the second source group appearing receives a value of 2, and so on. Records that come from the same source group will receive the same Group_Source_Group_Appearance value. Unique records receive a value of 0. Records in a match group not assigned to a source group will also get a value of 0.

Group_Source_Group_Order Specifies the order of the records within the match group that have the same Group_Source_Group_Appearance value. The first occurrence receives a value of 1, the second occurrence receives a value of 2, and so on. Unique records receive a value of 0. Records in a match group not assigned to a source group will also get a value of 0.

Source_Group_Count Specifies the number of source groups represented in the match group. Records in the match group that do not belong to a source group are not counted. The values of this output field could range from 0 to the number of records in the match group. Unique records receive a value of 0 or 1.

Source_Group_Name Specifies the name of the source group that the current record belongs to. If a record does not belong to any source group, then an empty string is output.

Group statistics output fields

These fields are available only if you use a Group Statistics operation in your Match transform.

Field name Description

Group_Count Provides the total number of records in the match group.

Unique records have a value of 1.

Group_Order The master record receives a value of 1. Subordinate records receive a value of 2 through the number of records in the match group.

You may control the order by including a Group Prioritization in the Post Match Operations. Unique records have a value of 0.

Group_Rank Specifies whether the record is a master (M) or a subordinate (S). Unique records have an empty value.

Group_Type Specifies whether a record contributed to the source count, and if so, whether there were other sources represented in the match group.

M: Multiple sources. Records from multiple sources are represented in the match group (records from Special sources are not counted toward a multiple-source match group).

S: Single source. All records in the match group come from a single source (records from Special sources are not counted toward a multiple-source match group).

P: At least one record from a Suppression source is included in the match group. (If the master record comes from a suppression source, all records in the match group have a P. If the master record comes from a normal or special source, the suppression record and all records after it have a P, but the records before the suppression record have an M or S.)

Source_Count Shows the number of logical sources in this match group.

Unique records have a blank value.

Source_ID Specifies the logical source value. In most cases, this is the input source value; in other cases, it is the default logical source value.

Source_Type_ID Specifies the type of logical source.

N: Normal source

P: Suppress source

S: Special source
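As an illustration of the Group_Count, Group_Order, and Group_Rank values described above, the following minimal sketch computes them for a match group of a given size. It assumes the records are already in their final (prioritized) order with the master record first, and treats a group of one record as a unique record. `GroupStats` and `groupStatistics` are hypothetical names, not SDK types or functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical container for three Group Statistics output fields.
struct GroupStats {
    int groupCount;         // Group_Count
    int groupOrder;         // Group_Order
    std::string groupRank;  // Group_Rank: "M", "S", or empty for unique records
};

// Sketch: returns one entry per record of a match group with `size` records,
// assuming the master record comes first.
std::vector<GroupStats> groupStatistics(int size) {
    std::vector<GroupStats> stats;
    if (size == 1) {
        stats.push_back({1, 0, ""});  // unique record: count 1, order 0, no rank
        return stats;
    }
    for (int i = 0; i < size; ++i) {
        // Master receives order 1; subordinates receive 2 through size.
        stats.push_back({size, i + 1, i == 0 ? "M" : "S"});
    }
    return stats;
}
```

The sketch does not model Group_Type or the source fields, because those also depend on the source type of each record.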

Output flag selection output fields

The following output fields are available when you add an Output Flag Selection operation to a Match transform.

Field Description

Select_Record Specifies whether the current record should be selected, based on the settings in the object. Valid values of this output field are "Y" if the record should be selected and "N" if the record should not be selected.

19 Data Quality codes

19.1 Information codes (Data Cleanse)

Information codes (assigned to the Info_Code output field) provide information about data that may be suspect and require a manual review. The output field contains one or more codes separated by a comma.

Date parse information

Information code format Description

D1## Date 1 parse level information

D2## Date 2 parse level information

D3## Date 3 parse level information

D4## Date 4 parse level information

D5## Date 5 parse level information

D6## Date 6 parse level information

Firm parse information

Information code format Description

F1## Firm 1 parse level information

F2## Firm 2 parse level information

F3## Firm 3 parse level information

F4## Firm 4 parse level information

F5## Firm 5 parse level information

F6## Firm 6 parse level information

Input field information

Information code format Description

I### Input field level information

Person parse information

Information code format Description

P1## Person 1 parse level information

P2## Person 2 parse level information

P3## Person 3 parse level information

P4## Person 4 parse level information

P5## Person 5 parse level information

P6## Person 6 parse level information

Phone parse information

Information code format Description

T1## Phone 1 parse level information

T2## Phone 2 parse level information

T3## Phone 3 parse level information

T4## Phone 4 parse level information

T5## Phone 5 parse level information

T6## Phone 6 parse level information

Record level information

Information code format Description

R### Record level information
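Because Info_Code can hold several comma-separated codes, a consumer usually needs to split the value and classify each code by the formats above. The following minimal sketch does that; it assumes codes are exactly four characters with three trailing digits, and the names `DecodedInfoCode` and `decodeInfoCodes` are hypothetical, not part of the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical decoded form of one Data Cleanse information code.
struct DecodedInfoCode {
    std::string code;      // trimmed code, e.g. "P101"
    std::string category;  // "Date", "Firm", "Person", "Phone",
                           // "Input field", "Record", or "Unknown"
    int parseNumber = 0;   // 1-6 for D/F/P/T codes, 0 otherwise
};

// Splits a comma-separated Info_Code value and classifies each code by the
// formats D1##-D6##, F1##-F6##, P1##-P6##, T1##-T6##, I###, and R###.
std::vector<DecodedInfoCode> decodeInfoCodes(const std::string& infoCode) {
    std::vector<DecodedInfoCode> result;
    std::stringstream in(infoCode);
    std::string token;
    while (std::getline(in, token, ',')) {
        const auto first = token.find_first_not_of(' ');
        if (first == std::string::npos) continue;  // skip empty entries
        const auto last = token.find_last_not_of(' ');
        DecodedInfoCode d;
        d.code = token.substr(first, last - first + 1);
        d.category = "Unknown";

        // Expect one letter followed by three digits.
        bool digitsOk = d.code.size() == 4;
        for (std::size_t i = 1; digitsOk && i < 4; ++i)
            digitsOk = d.code[i] >= '0' && d.code[i] <= '9';

        if (digitsOk) {
            const char kind = d.code[0];
            const char parse = d.code[1];
            if (kind == 'I') {
                d.category = "Input field";
            } else if (kind == 'R') {
                d.category = "Record";
            } else if (parse >= '1' && parse <= '6') {
                switch (kind) {
                    case 'D': d.category = "Date"; break;
                    case 'F': d.category = "Firm"; break;
                    case 'P': d.category = "Person"; break;
                    case 'T': d.category = "Phone"; break;
                    default: break;
                }
                if (d.category != "Unknown") d.parseNumber = parse - '0';
            }
        }
        result.push_back(d);
    }
    return result;
}
```

For example, decoding "P101, R002,T203" yields a Person 1 code, a Record code, and a Phone 2 code. The detailed meaning of each code is given in the table that follows.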

Detailed descriptions of Data Cleanse information codes

Information code Description

R001 All input field data went to one or more Extra output fields. Nothing was parsed for the record.

R002 Parsed some of the input fields. One or more input fields went to the Extra output field.

R003 Parsed part of the input fields. Some of the record data went to the Extra output field.


R004 All input fields were parsed into appropriate output fields. No input data went to the Extra output field.

R400 Data in Option_Content_Domain_Sequence overrode the content domain sequence transform option.

R405 Data in Option_Content_Domain_Sequence was not recognized as a content domain sequence. Data in Option_Country or the content domain sequence transform option was used to determine the content domain sequence.

R410 Data in Option_Output_Format overrode the output format Data Cleanse transform option.

R415 Data in Option_Output_Format was not recognized as a valid output format. Data in Option_Country or the output format transform option was used to determine the output format.

R420 Data in Option_Country was recognized as an ISO2 country code and overrode the content domain sequence transform option.

R421 Data in Option_Country was recognized as an ISO2 country code and overrode the output format transform option.

R425 Data in Option_Country was not recognized as an ISO2 country code. Option_Country data was used in an attempt to override the content domain sequence or output format transform options, but the override failed:

● Attempted to use Option_Country to override the content domain sequence transform option. This occurs when the Option_Content_Domain_Sequence data is invalid (R405) or is not supplied (no status code is generated when this occurs). Data was parsed using the content domain sequence transform option.
● Attempted to use Option_Country to override the output format transform option. This occurs when the Option_Output_Format data is invalid (R415) or is not supplied (no status code is generated when this occurs). Data was parsed using the output format transform option.

R428 The cleansing package does not recognize the country code provided in the Option_Country input field.

P#01 The person# parse contained some data that was not found in the cleansing package. Unlike information code P#51, this code does not report on title information.

P#02 The person# parse had a close firm parse. This applies only to a Person_Firm multiline parse (when using the Person_Firm multiline parser) or to Name_Firm_Line (when the data came from the Name_Firm_Line input field).

P#03 The person# parse was a presumptive name parse (based on reasonable evidence).

P#04 The person# parse has no given name, or has a questionable given name.

P#05 The person# parse has no family name, or has a questionable family name.

P#51 The person# parse contained a title token that was not found in the cleansing package. This is different from information code P#01.

F#01 The firm# parse contained some data that was not found in the cleansing package.

F#02 The firm# parse had a close person parse. This applies only to a Person_Firm multiline parse (when using the Person_Firm multiline parser) or to Name_Firm_Line (when the data came from the Name_Firm_Line input field).

F#03 The firm# parse was a presumptive firm parse (based on reasonable evidence).

D#01 Date# was not in the expected format.

D#02 Date# was converted from a 2-digit year to a 4-digit year. The century threshold transform option was applied.

D#03 Date# was in the Day_Month_Year format.

D#04 Date# was in the Month_Day_Year format.

D#05 Date# was in the Year_Month_Day format.

D#06 Date# was in the Year_Day_Month format.

D#07 Date# was in an ambiguous format. There is more than one possible format for the date. For example, 12/09/10 is valid for all of the formats, whereas 03/16/94 is valid only for the Month_Day_Year format.

T#01 The phone# parse did not have a North American area code.

T#02 The transform parsed phone data using a different country than the country listed in the Option_Country field.

T#03 The transform parsed phone data by prepending a country code to the incoming phone data.

I111-I116 All input data in Name_Line# went to one or more Extra output fields. Nothing was parsed for this input field.

I131-I136 Parsed some input data in Name_Line#. Remaining data is in the Extra output field.

I151-I156 All input data in Title_Line# went to one or more Extra output fields. Nothing was parsed for this input field.

I171-I176 Parsed some input data in Title_Line#. Remaining data is in the Extra output field.

I311-I316 All input data in Name_Or_Firm_Line# went to one or more Extra output fields. Nothing was parsed for this input field.

I331-I336 Parsed some input data in Name_Or_Firm_Line#. Remaining data is in the Extra output field.

I351-I352 All input data in Firm_Line# went to one or more Extra output fields. Nothing was parsed for this input field.

I371-I372 Parsed some input data in Firm_Line#. Remaining data is in the Extra output field.

I511-I516 All input data in Date# went to one or more Extra output fields. Nothing was parsed for this input field.

I531-I536 Parsed some input data in Date#. Remaining data is in the Extra output field.

I711-I716 All input data in Email# went to one or more Extra output fields. Nothing was parsed for this input field.

I731-I736 Parsed some input data in Email#. Remaining data is in the Extra output field.


I751-I756 All input data in Phone# went to one or more Extra output fields. Nothing was parsed for this input field.

I771-I776 Parsed some input data in Phone#. Remaining data is in the Extra output field.

I811-I816 All input data in SSN# went to one or more Extra output fields. Nothing was parsed for this input field.

I831-I836 Parsed some input data in SSN#. Remaining data is in the Extra output field.

I851-I856 All input data in UDPM# went to one or more Extra output fields. Nothing was parsed for this input field.

I871-I876 Parsed some input data in UDPM#. Remaining data is in the Extra output field.

I901-I912 All input data in Multiline# went to one or more Extra output fields. Nothing was parsed for this input field.

I951-I962 Parsed some input data in Multiline#. Remaining data is in the Extra output field.
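The D#03 through D#07 codes in the table above depend on which date orders a value can satisfy. The following minimal sketch reproduces the table's own example (12/09/10 fits all four orders, 03/16/94 fits only Month_Day_Year). It is not the transform's parsing algorithm: it assumes three numeric fields with a 2-digit year, and the century threshold value and all function names are hypothetical. A result with more than one entry corresponds to the ambiguous case (D#07).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical century-threshold rule: years below the threshold map to
// 20yy, all others to 19yy.
int expandYear(int yy, int threshold) {
    return yy < threshold ? 2000 + yy : 1900 + yy;
}

bool isLeap(int y) { return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0; }

bool isValidDate(int year, int month, int day) {
    static const int daysIn[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    if (month < 1 || month > 12 || day < 1) return false;
    const int maxDay = daysIn[month - 1] + ((month == 2 && isLeap(year)) ? 1 : 0);
    return day <= maxDay;
}

// Returns the date orders under which the fields a/b/c (as written, e.g.
// 12/09/10 -> 12, 9, 10) form a valid calendar date.
std::vector<std::string> possibleFormats(int a, int b, int c, int threshold) {
    std::vector<std::string> formats;
    if (isValidDate(expandYear(c, threshold), b, a)) formats.push_back("Day_Month_Year");
    if (isValidDate(expandYear(c, threshold), a, b)) formats.push_back("Month_Day_Year");
    if (isValidDate(expandYear(a, threshold), b, c)) formats.push_back("Year_Month_Day");
    if (isValidDate(expandYear(a, threshold), c, b)) formats.push_back("Year_Day_Month");
    return formats;
}
```

With a threshold of 50, `possibleFormats(3, 16, 94, 50)` returns only "Month_Day_Year", because 16 and 94 are not valid months.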

19.2 Country ISO codes and assignment engines

The table shows which engine (if any) provides address correction for each country. It also lists the 2-character and 3-character ISO codes, the 3-digit ISO code, the European Postcode prefix, and the level of assignment. The assignment level is based on the reference data you own.

Table 14: Table Key

Engine: Canada = C, Global Address = G, USA = U

Assignment level: Country = C, Locality = L, Primary Name = Pn, Premise = Pr, Secondary = S
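The Assignment level column uses the abbreviations from the key above. Note that "C" means Canada in the Engine column but Country in the Assignment level column. The following minimal sketch expands an Assignment level cell into level names; `expandAssignmentLevels` is a hypothetical helper, and unknown abbreviations are passed through unchanged.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Expands an Assignment level cell such as "C, L, Pn, Pr, S" into the level
// names from the table key. Tolerates missing spaces, e.g. "C,L".
std::vector<std::string> expandAssignmentLevels(const std::string& cell) {
    static const std::map<std::string, std::string> key = {
        {"C", "Country"}, {"L", "Locality"}, {"Pn", "Primary Name"},
        {"Pr", "Premise"}, {"S", "Secondary"}};
    std::vector<std::string> levels;
    std::stringstream in(cell);
    std::string token;
    while (std::getline(in, token, ',')) {
        const auto first = token.find_first_not_of(' ');
        if (first == std::string::npos) continue;  // skip empty entries
        const auto last = token.find_last_not_of(' ');
        token = token.substr(first, last - first + 1);
        const auto it = key.find(token);
        levels.push_back(it != key.end() ? it->second : token);
    }
    return levels;
}
```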

Country name 2-char ISO code 3-char ISO code 3-digit ISO code European Postcode prefix Engine Assignment level

Afghanistan AF AFG 004 G C, L

Åland Islands AX ALA 248 AX G C, L

Albania AL ALB 008 G C, L

Algeria DZ DZA 012 G C, L


American Samoa AS ASM 016 U C, L, Pn, Pr, S

G C, L

Andorra AD AND 020 AND G C, L

Angola AO AGO 024 G C, L

Anguilla AI AIA 660 G C, L

Antarctica AQ ATA 010 G C

Antigua and Barbuda AG ATG 028 G C, L

Argentina AR ARG 032 G C, L

Armenia AM ARM 051 G C, L

Aruba AW ABW 533 G C, L

Australia AU AUS 036 G C, L, Pn, Pr, S

Austria AT AUT 040 A G C, L, Pn, Pr, S

Azerbaijan AZ AZE 031 G C, L

Bahamas BS BHS 044 G C, L

Bahrain BH BHR 048 G C, L

Bangladesh BD BGD 050 G C, L

Barbados BB BRB 052 G C, L

Belarus BY BLR 112 G C, L

Belgium BE BEL 056 B G C, L, Pn, Pr

Belize BZ BLZ 084 G C, L

Benin BJ BEN 204 G C, L

Bermuda BM BMU 060 G C, L

Bhutan BT BTN 064 G C, L

Bolivia BO BOL 068 G C, L


Bonaire, Sint Eustatius and Saba BQ BES 535 G C, L

Bosnia and Herzegovina BA BIH 070 G C, L

Botswana BW BWA 072 G C, L

Bouvet Island BV BVT 074 G C

Brazil BR BRA 076 G C, L, Pn, Pr

British Indian Ocean Territory IO IOT 086 G C

British Virgin Islands VG VGB 092 G C, L

Brunei Darussalam BN BRN 096 G C, L

Bulgaria BG BGR 100 BG G C, L

Burkina Faso BF BFA 854 G C, L

Burundi BI BDI 108 G C, L

Cambodia KH KHM 116 G C, L

Cameroon CM CMR 120 G C, L

Canada CA CAN 124 C C, L, Pn, Pr, S

G C, L

Cape Verde CV CPV 132 G C

Cayman Islands KY CYM 136 G C

Central African Republic CF CAF 140 G C, L

Chad TD TCD 148 G C, L

Chile CL CHL 152 G C, L


China CN CHN 156 G C, L, Pn, Pr

Christmas Island (Included in the Australia data package) CX CXR 162 G C, L

Cocos (Keeling) Islands (Included in the Australia data package) CC CCK 166 G C, L

Colombia CO COL 170 G C, L

Comoros KM COM 174 G C, L

Congo, Republic of CG COG 178 G C, L

Congo, Democratic Republic of CD COD 180 G C, L

Cook Islands CK COK 184 G C, L

Costa Rica CR CRI 188 G C, L

Cote d'Ivoire CI CIV 384 G C, L

Croatia (Hrvatska) HR HRV 191 HR G C, L

Cuba CU CUB 192 G C, L

Curacao CW CUW 531 G C, L

Cyprus CY CYP 196 CY G C, L

Czech Republic (Czechoslovakia) CZ CZE 203 CZ G C, L, Pn, Pr

Democratic People's Republic of Korea KP PRK 408 G C, L


Denmark DK DNK 208 DK G C, L, Pn, Pr

Djibouti DJ DJI 262 G C, L

Dominica DM DMA 212 G C, L

Dominican Republic DO DOM 214 G C, L

Timor-Leste TL TLS 626 G C

Ecuador EC ECU 218 G C, L

Egypt EG EGY 818 G C, L

El Salvador SV SLV 222 G C, L

Equatorial Guinea GQ GNQ 226 G C, L

Eritrea ER ERI 232 G C, L

Estonia EE EST 233 EE G C, L, Pn, Pr

Ethiopia ET ETH 231 G C, L

Falkland Islands FK FLK 238 G C, L

Faroe Islands (Included in the Denmark data package) FO FRO 234 FO G C, L, Pn, Pr

Federated States of Micronesia FM FSM 583 U C, L, Pn, Pr, S

G C, L

Fiji FJ FJI 242 G C, L

Finland FI FIN 246 FI G C, L, Pn, Pr

France FR FRA 250 F G C, L, Pn, Pr, S

French Guiana (Included in the France data package) GF GUF 254 G C, L, Pn, Pr


French Polynesia (Included in the France data package) PF PYF 258 G C, L, Pn, Pr

French Southern Territories TF ATF 260 G C, L, Pn, Pr

Gabon GA GAB 266 G C, L

Gambia GM GMB 270 G C, L

Georgia GE GEO 268 G C, L

Germany DE DEU 276 D G C, L, Pn, Pr

Ghana GH GHA 288 G C, L

Gibraltar GI GIB 292 G C, L

Greece GR GRC 300 GR G C, L, Pn, Pr

Greenland (Included in the Denmark data package) GL GRL 304 GL G C, L, Pn, Pr

Grenada GD GRD 308 G C, L

Guadeloupe (Included in the France data package) GP GLP 312 G C, L, Pn, Pr

Guam GU GUM 316 U C, L, Pn, Pr, S

G C, L

Guernsey (Included in the United Kingdom data package) GG GGY 831 G G C, L, Pn, Pr, S

Guatemala GT GTM 320 G C, L

Guinea GN GIN 324 G C, L


Guinea-Bissau GW GNB 624 G C, L

Guyana GY GUY 328 G C, L

Haiti HT HTI 332 G C, L

Heard Island and McDonald Islands HM HMD 334 G C, L

Holy See (Vatican City State) (Included in the data package) VA VAT 336 G C, L, Pn, Pr

Honduras HN HND 340 G C, L

Hong Kong HK HKG 344 G C, L

Hungary HU HUN 348 H G C, L, Pn, Pr

Iceland IS ISL 352 IS G C, L

India IN IND 356 G C, L

Indonesia ID IDN 360 G C, L

Iraq IQ IRQ 368 G C, L

Ireland, Republic of IE IRL 372 IRL G C, L

Islamic Republic of Iran IR IRN 364 G C, L

Israel IL ISR 376 G C, L

Isle of Man (Included in the United Kingdom data package) IM IMN 833 G C, L, Pn, Pr, S

Italy IT ITA 380 I G C, L, Pn, Pr


Jamaica JM JAM 388 G C, L

Japan JP JPN 392 G C, L, Pn, Pr, S

Jersey (Included in the United Kingdom data package) JE JEY 832 G C, L, Pn, Pr, S

Jordan JO JOR 400 G C, L

Kazakhstan KZ KAZ 398 G C, L

Kenya KE KEN 404 G C, L

Kiribati KI KIR 296 G C, L

Kuwait KW KWT 414 G C, L

Kyrgyzstan KG KGZ 417 G C, L

Lao People's Democratic Republic LA LAO 418 G C, L

Latvia LV LVA 428 LV G C, L, Pn, Pr

Lebanon LB LBN 422 G C, L

Lesotho LS LSO 426 G C, L

Liberia LR LBR 430 G C, L

Libyan Arab Jamahiriya LY LBY 434 G C, L

Liechtenstein (Included in the Switzerland data package) LI LIE 438 FL G C, L, Pn, Pr

Lithuania LT LTU 440 LT G C, L, Pn, Pr

Luxembourg LU LUX 442 L G C, L, Pn, Pr

Macao MO MAC 446 G C, L


Macedonia MK MKD 807 MK G C, L

Madagascar MG MDG 450 G C, L

Malaysia MY MYS 458 M G C, L

Malawi MW MWI 454 G C, L

Maldives MV MDV 462 G C, L

Mali ML MLI 466 G C, L

Malta MT MLT 470 G C, L

Marshall Islands MH MHL 584 U C, L, Pn, Pr, S

G C, L

Martinique (Included in the France data package) MQ MTQ 474 G C, L, Pn, Pr

Mauritania MR MRT 478 G C, L

Mauritius MU MUS 480 G C, L

Mayotte (Included in the France data package) YT MYT 175 G C, L, Pn, Pr

Mexico MX MEX 484 G C, L

Moldova MD MDA 498 MD G C, L

Monaco (Included in the France data package) MC MCO 492 F G C, L, Pn, Pr

Mongolia MN MNG 496 G C, L

Montserrat MS MSR 500 G C, L

Montenegro ME MNE 499 G C, L

Morocco MA MAR 504 G C, L


Mozambique MZ MOZ 508 G C, L

Myanmar MM MMR 104 G C, L

Namibia NA NAM 516 G C, L

Nauru NR NRU 520 G C, L

Nepal NP NPL 524 G C, L

Netherlands NL NLD 528 NL G C, L, Pn, Pr

New Caledonia (Included in the France data package) NC NCL 540 G C, L, Pn, Pr

New Zealand NZ NZL 554 G C, L, Pn, Pr, S

Nicaragua NI NIC 558 G C, L

Niger NE NER 562 G C, L

Nigeria NG NGA 566 G C, L

Niue NU NIU 570 G C, L

Norfolk Island (Included in the Australia data package) NF NFK 574 G C, L

Northern Mariana Islands MP MNP 580 U C, L, Pn, Pr, S

G C, L

Norway NO NOR 578 N G C, L, Pn, Pr

Occupied Palestinian Territory PS PSE 275 G C

Oman OM OMN 512 G C, L

Pakistan PK PAK 586 G C, L

Palau PW PLW 585 U C, L, Pn, Pr, S

G C, L


Panama PA PAN 591 G C, L

Papua New Guinea PG PNG 598 G C, L

Paraguay PY PRY 600 G C, L

Peru PE PER 604 G C, L

Philippines PH PHL 608 G C, L

Pitcairn PN PCN 612 G C, L

Poland PL POL 616 PL G C, L, Pn, Pr

Portugal PT PRT 620 P G C, L, Pn, Pr, S

Province of China Taiwan TW TWN 158 G C, L

Puerto Rico PR PRI 630 U C, L, Pn, Pr, S

G C, L

Qatar QA QAT 634 G C, L

Republic of Korea KR KOR 410 G C, L

Réunion (Included in the France data package) RE REU 638 G C, L, Pn, Pr

Romania RO ROU 642 RO G C, L

Russian Federation RU RUS 643 RUS G C, L

Rwanda RW RWA 646 G C, L

Saint Barthelemy (Included in the France data package) BL BLM 652 G C, L

Saint Helena SH SHN 654 G C, L


Saint Kitts and Nevis KN KNA 659 G C, L

Saint Lucia LC LCA 662 G C, L

Saint Martin (Included in the France data package) MF MAF 663 G C, L

Saint Pierre and Miquelon (Included in the France data package) PM SPM 666 G C, L, Pn, Pr

Saint Vincent & Grenadines VC VCT 670 G C, L

Samoa WS WSM 882 G C, L

San Marino (Included in the Italy data package) SM SMR 674 SMR G C, L, Pn, Pr

Sao Tome and Principe ST STP 678 G C, L

Saudi Arabia SA SAU 682 G C, L

Senegal SN SEN 686 G C, L

Serbia RS SRB 688 G C, L

Seychelles SC SYC 690 G C, L

Sierra Leone SL SLE 694 G C, L

Singapore SG SGP 702 G C, L

Sint Maarten SX SXM 534 G C, L

Slovakia SK SVK 703 G C, L, Pn, Pr

Slovenia SI SVN 705 G C, L


Solomon Islands SB SLB 090 G C, L

Somalia SO SOM 706 G C, L

South Africa ZA ZAF 710 G C, L

South Georgia and the South Sandwich Islands GS SGS 239 G C, L

South Sudan SS SSD 728 G C, L

Spain ES ESP 724 E G C, L, Pn, Pr

Sri Lanka LK LKA 144 G C, L

Sudan SD SDN 736 G C, L

Suriname SR SUR 740 G C, L

Svalbard and Jan Mayen (Included in the Norway data package) SJ SJM 744 G C

Swaziland SZ SWZ 748 G C, L

Sweden SE SWE 752 S G C, L, Pn, Pr

Switzerland CH CHE 756 CH G C, L, Pn, Pr

Syrian Arab Republic SY SYR 760 G C, L

Tajikistan TJ TJK 762 G C, L

Thailand TH THA 764 G C, L

Togo TG TGO 768 G C, L

Tokelau TK TKL 772 G C, L

Tonga TO TON 776 G C, L

Trinidad and Tobago TT TTO 780 G C, L


Tunisia TN TUN 788 TN G C, L

Turkey TR TUR 792 TR G C, L, Pn, Pr

Turkmenistan TM TKM 795 G C, L

Turks and Caicos Islands TC TCA 796 G C, L

Tuvalu TV TUV 798 G C, L

Uganda UG UGA 800 G C, L

Ukraine UA UKR 804 UK G C, L

United Arab Emirates AE ARE 784 G C, L

United Kingdom GB GBR 826 GB G C, L, Pn, Pr, S

United Republic of Tanzania TZ TZA 834 G C, L

United States US USA 840 U C, L, Pn, Pr, S

G C, L

United States Minor Outlying Islands UM UMI 581 U C, L, Pn, Pr, S

G

U.S. Virgin Islands VI VIR 850 U C, L, Pn, Pr, S

G

Uruguay UY URY 858 G C, L

Uzbekistan UZ UZB 860 G C, L

Vanuatu VU VUT 548 G C, L

Venezuela VE VEN 862 G C, L

Viet Nam VN VNM 704 G C, L

Wallis and Futuna WF WLF 876 G C, L, Pn, Pr


Western Sahara EH ESH 732 G C, L

Yemen YE YEM 887 G C, L

Zambia ZM ZMB 894 G C, L

Zimbabwe ZW ZWE 716 G C, L

19.3 Information codes (Global Address Cleanse)

Information codes are four-character codes that explain why an address is unassigned. Information codes have six levels of classification:

● The 1000 level represents input record discrepancies.
● The 2000 level represents inconsistent last line information.
● The 3000 level represents inconsistent address information.
● The 4000 level represents inconsistent secondary address information.
● The 5000 level represents all other types of information.
● The 6000 level represents an unclassified error.

The table also shows which engine(s) can generate each information code:

● Canada (C)
● Global Address (G)
● USA (U)
● All engines: consists of C, G, and U.
● Transform Level (T): the information code does not come from a specific engine.

Use the following table to determine the code assigned to the Info_Code output field.

Info code Description Engine(s)

1020 Address validated in multiple countries. T

1030 No country found by Country ID or no country set for the record. T

1040 Address contains at least one character that is not part of the character set supported by the engine. T

1060 The country identified is not supported by any of the active engines. T

1080 The script identified is not supported by any of the active engines. T

2000 Unable to identify locality, region, and/or postcode information on input. All engines


2010 Unable to identify locality and invalid postcode found. All engines

2020 Unable to identify postcode. Invalid locality is preventing a possible address correction. All engines

2030 Invalid locality and postcode are preventing a possible address correction. All engines

2040 Invalid postcode is preventing a locality selection. G, U

2050 Lastline matches are too close to choose one. G

3000 Locality, region, and postcode are valid. Unable to identify the primary address line. All engines

3010 Locality, region, and postcode are valid. Unable to match primary name to directory. All engines

3020 Possible primary name matches are too close to choose one. All engines

3030 Primary range is missing on input or not in the directory. All engines

3050 An invalid or missing primary type is preventing a possible address match. All engines

3060 A missing primary type and prefix/postfix (directional) is preventing a possible address match. G, U

3070 An invalid or missing prefix/postfix (directional) is preventing a possible address match. All engines

3080 An invalid or missing postcode is preventing a possible address match. All engines

3090 An invalid or missing locality is preventing a possible address match. G, U

3100 Possible address-line matches are too close to choose one. All engines

3200 The building name is missing on input or not in the directory. G

3220 Possible building names are too close to choose one. G

3250 The range or building name is missing on input or both are not in the directory. G

3110 Address conflicts with postcode and the same primary name has a different postcode. C

4000 The secondary information is missing on input or not in the directory. All engines

4010 Possible secondary address line matches are too close to choose one. All engines

4500 The organization is missing on input or not in the directory. G

4510 The organization's address is not in the directory. G

4520 Possible organization names are too close to choose one. G

5000 The address was valid, but the postal authority classified this address as undeliverable. G, U

5010 The address does not reside in the specified country. C, U

5020 The entire input record was blank. T

5030 The country's postal authority will not permit assignment due to violation of an assignment rule. G

6000 Unclassified error. All engines
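Since the first digit of an information code identifies its classification level, a consumer can group Info_Code values by level before looking up the specific code. The following minimal sketch maps a code to the level descriptions listed at the start of this section; `infoCodeClassification` is a hypothetical helper, and it returns an empty string for values that are not four-digit codes in the 1000 to 6999 range.

```cpp
#include <cassert>
#include <string>

// Maps a Global Address Cleanse Info_Code value to its classification level.
std::string infoCodeClassification(const std::string& code) {
    if (code.size() != 4) return "";
    for (char ch : code)
        if (ch < '0' || ch > '9') return "";  // digits only
    switch (code[0]) {
        case '1': return "Input record discrepancy";
        case '2': return "Inconsistent last line information";
        case '3': return "Inconsistent address information";
        case '4': return "Inconsistent secondary address information";
        case '5': return "Other information";
        case '6': return "Unclassified error";
        default:  return "";
    }
}
```

For example, 3020 ("Possible primary name matches are too close to choose one") falls in the 3000 level, inconsistent address information.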

19.4 Status codes (USA Regulatory Address Cleanse)

When the transform assigns an address, it creates a status code (Status_Code output field). This code can tell you how the input address differs from the assigned address.

Digit Description

1st A: The transform truncated the address line to make it fit your field.

B: The transform truncated both the address line and the Locality1_Name.

C: The transform truncated the Locality1_Name to make it fit your field.

S: No truncation occurred.

2nd 0: Regarding the Locality1_Name, Region1, Postcode1, and Postcode2, there is no significant difference between the input data and the data that the transform assigned.

1: The transform assigned a different Postcode1.

2: The transform assigned a different Locality1_Name.

3: The transform assigned a different Locality1_Name and Postcode1.

4: The transform assigned a different Region1.

5: The transform assigned a different Region1 and Postcode1.

6: The transform assigned a different Locality1_Name and Region1.

7: The transform assigned a different Locality1_Name, Region1, and Postcode1.

8: The transform assigned a different Postcode2.

9: The transform assigned a different Postcode1 and Postcode2.

A: The transform assigned a different Locality1_Name and Postcode2.

B: The transform assigned a different Locality1_Name, Postcode1, and Postcode2.

C: The transform assigned a different Region1 and Postcode2.

D: The transform assigned a different Region1, Postcode1, and Postcode2.

E: The transform assigned a different Locality1_Name, Region1, and Postcode2.

F: The transform assigned a different Locality1_Name, Region1, Postcode1, and Postcode2.

3rd 0: Regarding the primary name, primary prefix/postfix, and primary type, there is no significant difference between the input and what the transform assigned.

1: The transform assigned a different primary type.

2: The transform assigned a different primary prefix.

3: The transform assigned a different primary prefix and primary type.

4: The transform assigned a different primary postfix.

5: The transform assigned a different primary type and primary postfix.

6: The transform assigned a different primary prefix and primary postfix.

7: The transform assigned a different primary prefix, primary type, and primary postfix.

8: The transform assigned a different primary name.

9: The transform assigned a different primary name and primary type.

A: The transform assigned a different primary prefix and primary name.

B: The transform assigned a different primary prefix, primary name, and primary type.

C: The transform assigned a different primary name and primary postfix.

D: The transform assigned a different primary name, primary type, and primary postfix.

E: The transform assigned a different primary prefix, primary name, and primary postfix.

F: The transform assigned a different primary prefix, primary name, primary postfix, and primary type.

4th 0: Regarding the county number, sort code route, delivery point, and unit description, there is no significant difference between the input data and the data that the transform assigned.

1: The transform assigned a different unit description.

2: The transform assigned a different delivery point.

3: The transform assigned a different delivery point and unit description.

4: The transform assigned a different sort code route.

5: The transform assigned a different sort code route and unit description.

6: The transform assigned a different sort code route and delivery point.

7: The transform assigned a different sort code route, delivery point, and unit description.

8: The transform assigned a different county number.

9: The transform assigned a different county number and unit description.

A: The transform assigned a different county number and delivery point.

B: The transform assigned a different county number, delivery point, and unit description.

C: The transform assigned a different county number and sort code route.

D: The transform assigned a different county number, sort code route, and unit description.

E: The transform assigned a different county number, sort code route, and delivery point.

F: The transform assigned a different county number, sort code route, delivery point, and unit description.

5th 0: Regarding the LOT, LOT_Order, and Locality2_Official, there is no significant difference between the input data and the data that the transform assigned.

1: The transform assigned a different LOT.

2: The transform assigned a different LOT_Order.

3: The transform assigned a different LOT and LOT_Order.

4: The transform assigned a different Locality2_Official.

5: The transform assigned a different Locality2_Official and LOT.

6: The transform assigned a different Locality2_Official and LOT_Order.

7: The transform assigned a different Locality2_Official, LOT, and LOT_Order.

6th Always outputs a zero (0).
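In the digit table above, every combined value is the sum of the single-component values, so each digit can be read as a 4-bit mask written in hexadecimal. For the 4th digit, 1 = unit description, 2 = delivery point, 4 = sort code route, and 8 = county number. The following C++ sketch decodes a 4th digit under that assumption. The bit layout is inferred from the table, and the function names are hypothetical, not part of the SDK API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Converts one code digit ('0'-'9' or 'A'-'F') to its value 0-15.
// Returns -1 for any other character.
int codeDigitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Lists the components that the 4th digit reports as different.
// Bit values are inferred from the table: 1 = unit description,
// 2 = delivery point, 4 = sort code route, 8 = county number.
std::vector<std::string> decodeFourthDigit(char digit) {
    static const char* const kComponents[4] = {
        "unit description", "delivery point", "sort code route", "county number"};
    std::vector<std::string> changed;
    const int value = codeDigitValue(digit);
    if (value < 0) return changed;  // not a valid digit: report nothing
    assert(value <= 15);
    for (int bit = 0; bit < 4; ++bit) {
        if (value & (1 << bit)) changed.push_back(kComponents[bit]);
    }
    return changed;
}
```

For example, decodeFourthDigit('B') lists unit description, delivery point, and county number, which matches row B of the table. The 5th digit follows the same pattern with 1 = LOT, 2 = LOT_Order, and 4 = Locality2_Official.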

19.5 Quality codes (Global Address Cleanse)

Quality codes relay additional information about the quality of the address. There are six levels of quality codes based on these factors:

● The country of the input data
● The engine used for processing
● The information code
● The status code if there is not an information code

Use the following table to determine the code assigned to the Quality_Code output field.

Quality code Description

Q1 Perfect address on input. All address components were validated without corrections.

Q2 Corrected address. All address components were validated after corrections were made.

Q3 Not all components of the address could be fully validated. There was insufficient information to make a final correction. However, the assessment of the record leads to the assumption that there is a "high" likelihood that this address is deliverable.

Q4 Not all components of the address could be fully validated. There was insufficient information to make a final correction. However, the assessment of the record leads to the assumption that there is a "fair" likelihood that this address is deliverable.

Q5 Not all components of the address could be fully validated. There was insufficient information to make a final correction. However, the assessment of the record leads to the assumption that there is a "small" likelihood that this address is deliverable.

Q6 Not all components of the address could be fully validated. There was insufficient information to make a final correction. However, the assessment of the record leads to the assumption that it is "highly unlikely" that this address is deliverable.
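Because Q1 through Q6 run from best to worst, an application can treat the Quality_Code output field as an ordinal level. The C++ sketch below parses the code and applies a threshold; the function names are hypothetical, and the threshold you choose is an assumption that depends on your own mailing rules.

```cpp
#include <cassert>
#include <string>

// Returns the level (1-6) encoded in a Quality_Code value such as "Q3",
// or 0 if the value is not one of Q1 through Q6.
int qualityLevel(const std::string& code) {
    if (code.size() != 2 || code[0] != 'Q') return 0;
    if (code[1] < '1' || code[1] > '6') return 0;
    return code[1] - '0';
}

// Hypothetical filter: accepts a record whose Quality_Code is at least as
// good as maxLevel. Lower levels are better (Q1 = perfect address on input,
// Q6 = delivery highly unlikely).
bool meetsQualityThreshold(const std::string& code, int maxLevel) {
    assert(maxLevel >= 1 && maxLevel <= 6);  // threshold must itself be Q1-Q6
    const int level = qualityLevel(code);
    return level != 0 && level <= maxLevel;
}
```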

19.6 Status codes (Global Address Cleanse)

Status codes (assigned to the Status_Code output field) are five or six characters that represent the corrections made to the address during processing. The number of characters depends on the engine used for processing.

● The first character is always an S (for Status).
● The second character is associated with any last line corrections.
● The third character is associated with any address line corrections.
● The fourth character is associated with any secondary address line corrections.
● The fifth character is associated with changes to components that are not considered basic address components (Other Primary Address and Other Secondary Address).
● The sixth character indicates additional information about a record that is not related to a change in the address.
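The character positions listed above can be split into named fields before each one is interpreted. This C++ sketch assumes the Status_Code value arrives as a plain string; the struct and function names are hypothetical, and the sample values in the checks are made up for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>

// One field per Status_Code character position after the leading 'S'.
struct StatusCode {
    char lastLine;     // 2nd character: last line corrections
    char addressLine;  // 3rd character: address line corrections
    char secondary;    // 4th character: secondary address line corrections
    char other;        // 5th character: Other Primary/Secondary Address changes
    char info;         // 6th character, or '\0' when the code has five characters
};

// Splits a five- or six-character Status_Code value into its positions.
// Returns std::nullopt if the value has another length or does not start with 'S'.
std::optional<StatusCode> parseStatusCode(const std::string& s) {
    if ((s.size() != 5 && s.size() != 6) || s[0] != 'S') return std::nullopt;
    return StatusCode{s[1], s[2], s[3], s[4], s.size() == 6 ? s[5] : '\0'};
}
```

Keeping the sixth position as '\0' for five-character codes lets the same struct serve both engines mentioned above.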

Second character

The value of the second character depends on corrections to the country, postcode, region, or locality.

Value Description

0 No significant difference between the input data and the corrected data.

1 Corrected country.

2 Corrected postal code.

3 Corrected country and postal code.

4 Corrected region.

5 Corrected country and region.

6 Corrected postal code and region.

7 Corrected country, postal code, and region.

8 Corrected locality.

9 Corrected country and locality.

A Corrected postal code and locality.

B Corrected country, postal code, and locality.

C Corrected region and locality.

D Corrected country, region, and locality.

E Corrected postal code, region, and locality.

F Corrected country, postal code, region, and locality.
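Each value in the table above is the sum of the single-correction values 1 = country, 2 = postal code, 4 = region, and 8 = locality, so a program can test for one specific correction with a bitwise AND. A minimal C++ sketch under that inferred layout; the enum and function names are hypothetical.

```cpp
#include <cassert>

// Single-correction values of the second Status_Code character, inferred from
// the table above (each combined value is the sum of these values).
enum LastLineCorrection : unsigned {
    kCountry = 1,
    kPostalCode = 2,
    kRegion = 4,
    kLocality = 8,
};

// Converts '0'-'9' or 'A'-'F' to 0-15; any other character is treated as 0.
unsigned statusCharValue(char c) {
    if (c >= '0' && c <= '9') return static_cast<unsigned>(c - '0');
    if (c >= 'A' && c <= 'F') return static_cast<unsigned>(c - 'A' + 10);
    return 0;
}

// True if the second Status_Code character reports the given correction.
bool lastLineCorrected(char secondChar, LastLineCorrection correction) {
    assert((correction & (correction - 1)) == 0);  // a single flag is expected
    return (statusCharValue(secondChar) & correction) != 0;
}
```

For example, a second character of A (10 = 2 + 8) reports corrected postal code and locality, but not country or region.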

Third character

The value of the third character depends on corrections to the pre/post directionals, primary type, primary name, and primary range.

Value Description

0 No significant difference between the input data and the corrected data.

1 Corrected pre/post directional.

2 Corrected primary type.

3 Corrected pre/post directional and primary type.

4 Corrected primary name.

5 Corrected pre/post directional and primary name.

6 Corrected primary type and primary name.

7 Corrected pre/post directional, primary type, and primary name.

8 Corrected primary range.

9 Corrected pre/post directional and primary range.

A Corrected primary type and primary range.

B Corrected pre/post directional, primary type, and primary range.

C Corrected primary name and primary range.

D Corrected pre/post directional, primary name, and primary range.

E Corrected primary type, primary name, and primary range.

F Corrected pre/post directional, primary type, primary name, and primary range.
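The third-character table follows a regular pattern: the bit values 1 = pre/post directional, 2 = primary type, 4 = primary name, and 8 = primary range combine additively, and the description lists the corrected components in that order. The C++ sketch below regenerates the table text from the character, which can be handy for log messages. The bit layout is inferred from the table and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Rebuilds the table text for a third Status_Code character. Bit values are
// inferred from the table: 1 = pre/post directional, 2 = primary type,
// 4 = primary name, 8 = primary range. Returns "" for an invalid character.
std::string describeThirdCharacter(char c) {
    int value;
    if (c >= '0' && c <= '9') value = c - '0';
    else if (c >= 'A' && c <= 'F') value = c - 'A' + 10;
    else return "";
    assert(value >= 0 && value <= 15);
    if (value == 0)
        return "No significant difference between the input data and the corrected data.";

    static const char* const kParts[4] = {
        "pre/post directional", "primary type", "primary name", "primary range"};
    std::vector<std::string> parts;
    for (int bit = 0; bit < 4; ++bit)
        if (value & (1 << bit)) parts.push_back(kParts[bit]);

    // Join as "A", "A and B", or "A, B, and C" (serial comma, as in the table).
    std::string text = "Corrected ";
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) text += parts.size() > 2 ? ", " : " ";
        if (i > 0 && i + 1 == parts.size()) text += "and ";
        text += parts[i];
    }
    return text + ".";
}
```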

Fourth character

The value of the fourth character depends on corrections to the unit description, unit number, building name, stairwell name, and firm name.

Value Description

0 No significant difference between the input data and the corrected data.

1 Corrected unit description.

2 Corrected unit number.

3 Corrected unit description and unit number.

4 Corrected building name.

5 Corrected unit description and building name.

6 Corrected unit number and building name.

7 Corrected unit description, unit number, and building name.

8 Corrected firm.

9 Corrected unit description and firm.

A Corrected unit number and firm.

B Corrected unit description, unit number, and firm.

C Corrected building name and firm.

D Corrected unit description, building name, and firm.

E Corrected unit number, building name, and firm.

F Corrected unit description, unit number, building name, and firm.
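Working in the other direction, a test harness may need the fourth character it expects for a known set of corrections. This C++ sketch encodes the flags using the values inferred from the table (1 = unit description, 2 = unit number, 4 = building name, 8 = firm); the function name is hypothetical.

```cpp
#include <cassert>

// Builds the fourth Status_Code character expected for a set of corrections.
// Bit values are inferred from the table: 1 = unit description,
// 2 = unit number, 4 = building name, 8 = firm.
char encodeFourthCharacter(bool unitDescription, bool unitNumber,
                           bool buildingName, bool firm) {
    const unsigned value = (unitDescription ? 1u : 0u) | (unitNumber ? 2u : 0u) |
                           (buildingName ? 4u : 0u) | (firm ? 8u : 0u);
    assert(value <= 15);
    return static_cast<char>(value < 10 ? '0' + value : 'A' + (value - 10));
}
```

For example, corrections to the unit description, building name, and firm give 1 + 4 + 8 = 13, which is written as D, matching row D of the table.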

Fifth character

The value of the fifth character depends on changes to components that are not considered basic address components (Other Primary Address and Other Secondary Address).

Other Primary Address components:

● Primary_Delivery_Mode
● Primary_Delivery_Number

Other Secondary Address components:

● Delivery_Installation_Name
● Delivery_Installation_Qualifier
● Delivery_Installation_Type

Value Description

0 No significant change between the input data and the corrected data.

1 Changed the Other Primary Address components.

2 Changed the Other Secondary Address components.

3 Changed the Other Primary Address and Other Secondary Address components.

Sixth character

The value of the sixth character indicates additional information about a record that is not related to a change in the address.

Value Description

A Archived record used for assignment. Global Address engine.

B Base record assignment. Global Address engine (New Zealand).

C An Alias and a Bordering locality. Global Address engine (Australia).

D Deleted record. Global Address engine (Austria and Germany).

I Record ignored. Global Address engine (New Zealand).

L Large Volume Receiver (LVR). Global Address engine (Brazil).

U Unique address. Global Address engine (New Zealand).
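A reporting tool can translate the optional sixth character with a simple lookup. The C++ sketch below copies the descriptions from the table above; the function name is hypothetical, and an empty result covers both unlisted characters and five-character codes that have no sixth position.

```cpp
#include <cassert>
#include <string>

// Maps the sixth Status_Code character to its description from the table.
// Returns "" for characters the table does not list, including '\0'.
std::string sixthCharacterMeaning(char c) {
    switch (c) {
        case 'A': return "Archived record used for assignment";
        case 'B': return "Base record assignment";
        case 'C': return "An alias and a bordering locality";
        case 'D': return "Deleted record";
        case 'I': return "Record ignored";
        case 'L': return "Large Volume Receiver (LVR)";
        case 'U': return "Unique address";
        default:  return "";
    }
}
```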

20 ShowA and ShowL (USA and Canada)

The Show programs are used for looking inside the postal directories to find answers to questions like these:

● Why did the transform standardize the address in an unexpected way?
● Why didn't the transform assign the address?
● Why did the transform’s error code indicate a flaw in the directory?

You can use ShowA to display or output information from the Address_1_Directory, and you can use ShowL to query the City_Directory and the Post_Code_Directory.

Note

Run the ShowA/ShowL utilities from a command line (a DOS command prompt on Windows), using specific command-line options. These options are listed when you enter the following command:

Windows: cashowa /op

UNIX: cashowa -op

The Show configuration files

Each Show utility has its own configuration file. These files contain parameters for controlling how the program behaves.

Table 15: For USA addresses

Utility Executable Configuration file Location

ShowA showa.exe showa.cfg LINK_DIR\dataquality\urac

ShowL showl.exe showl.cfg LINK_DIR\dataquality\urac

Table 16: For Canadian addresses

Utility Executable Configuration file Location

ShowA cashowa.exe cashowa.cfg LINK_DIR\dataquality\gac

ShowL cashowl.exe cashowl.cfg LINK_DIR\dataquality\gac

Before you run the Show utilities, set both configuration files for the appropriate country directory. The configuration files contain instructions and detailed information about how to run the programs.

Note

Run the ShowA/ShowL utilities in the same directory as the ShowA/ShowL configuration files. You can change the location of the ShowA/ShowL executable files; however, the utilities will not run if you did not accept the default location for the configuration files.
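If you launch a Show utility from your own program, you can check the working-directory requirement from the note above before starting it. This C++17 sketch only tests whether the named configuration file exists in the current working directory; it does not validate the file's contents, and the function name is hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Returns true if the given Show configuration file (for example "showa.cfg"
// or "cashowl.cfg") is present in the current working directory.
bool showConfigInWorkingDir(const std::string& cfgName) {
    assert(!cfgName.empty());
    std::error_code ec;  // use the non-throwing overload of exists()
    return std::filesystem::exists(std::filesystem::current_path() / cfgName, ec);
}
```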

20.1 USA ShowA command line options

To view a summary of command line options, use this command:

Windows: showa /op

UNIX: showa -op

The following table lists the command line options and the command descriptions.

UNIX Windows Description

-a /a Appends information to the output file (if it already exists).

-alias /alias Includes preferred alias address lines.

-d /d Displays your query data on screen.

-fin /fin Expands the query to cover USPS finance area.

-op /op Displays the list of options (in this table).

-p /p Pauses screen display every 22 lines.

-2:dpbc /2:dpbc Enter the DPBC code for dpbc.

-4:zip4 /4:zip4 Enter the postcode2 for zip4.

-ad:file /ad:file Enter the Address-line dictionary path and name (addrln.dct) for file.

-c:cart /c:cart Enter the carrier route number for cart.

-f:file /f:file Enter the file path and name of the output file (to hold the information from the query instead of just displaying it on screen) for file.

-nd:file /nd:file Enter the National ZIP+4 directory path and name (zip4us.dir) for file.

-pre:dir /pre:dir Enter the primary prefix (N, NE, E, SE, S, SW, W, NW) for dir.

-pos:dir /pos:dir Enter the primary postfix (N, NE, E, SE, S, SW, W, NW) for dir.

-s:street /s:street Enter the street primary name (in quotes if multiple words) for street.

-sfx:suffix /sfx:suffix Enter the primary type (Ave, Blvd, St, Rd, and so on) for suffix.

-sh:range /sh:range Enter the street (primary) range high for range.

-sl:range /sl:range Enter the street (primary) range low or exact for range.

-t:type /t:type Enter the file type (dBASE3, ASCII, or DELIMITED) for type.

-u:urb idx /u:urb idx Enter the urbanization Index for urb idx.

-z:lo-hi /z:lo-hi Enter the low and high range for postcode1 for lo-hi.

-z:zip /z:zip Enter the postcode1 for zip.
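The table shows that every option is written the same way on both platforms except for the prefix: a slash on Windows and a hyphen on UNIX, followed by ":value" for options that take a value. The C++ sketch below formats ShowA arguments from that rule. The helper names and the example query (postcode1 54601, street Main) are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Formats one ShowA option for the target platform: '/' prefix on Windows,
// '-' on UNIX, then ":value" for options that take a value.
std::string showOption(const std::string& name, const std::string& value, bool windows) {
    assert(!name.empty());
    std::string option = (windows ? "/" : "-") + name;
    if (!value.empty()) option += ":" + value;
    return option;
}

// Hypothetical query: display on screen (d) the directory entries for
// postcode1 54601 (z) and street primary name Main (s).
std::vector<std::string> exampleShowaArgs(bool windows) {
    return {"showa", showOption("d", "", windows), showOption("z", "54601", windows),
            showOption("s", "Main", windows)};
}
```

Each vector element is one argument. If you join the arguments into a single command string for a shell instead, quote multi-word values as the table describes.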

20.2 Canada ShowA command line options

To view a summary of command line options, use this command:

Windows: cashowa /op

UNIX: cashowa -op

The following table lists the command line options and the command descriptions.

UNIX Windows Description

-a /a Appends query information to output file (if it already exists).

-d /d Displays query information on screen.

-op /op Displays this list of options.

-p /p Pauses screen display every 22 lines.

-accent /accent Displays French accented data.

-alias /alias Includes preferred alias address lines.

-adct:file /adct:file Enter the Address line dictionary path and name (addrlnca.dct) for file.

-cdct:file /cdct:file Enter the Casing dictionary path and name (pwcasca.dct) for file.

-cdir:file /cdir:file Enter the Canada city directory path and name (cancity.dir) for file.

-ci:index /ci:index Enter the City index for index.

-dir:dir /dir:dir Enter the primary post directional (N, NE, E, SE, S, SW, W, NW) for dir.

-di:index /di:index Enter the directory area index for index.

-f:file /f:file Enter the output file path and name for file.

-fdir:file /fdir:file Enter the Canada FSA directory path and name (canfsa.dir) for file.

-ldct:file /ldct:file Enter the Last Line Dictionary path and name (lastlnca.dct) for file.

-ndir:file /ndir:file Enter the Canada National directory path and name (canada.dir) for file.

-pc:pc /pc:pc Enter the postal code for pc.

-pdir:file /pdir:file Enter the Canada PCI directory path and name (canpci.dir) for file.

-s:street /s:street Enter the street primary name (in quotes if multiple words) for street.

-sfx:suffix /sfx:suffix Enter the primary type (Ave, Blvd, St, Rd, and so on) for suffix.

-sh:range /sh:range Enter the street (primary) range high for range.

-sl:range /sl:range Enter the street (primary) range low or exact for range.

-t:type /t:type Enter the file type (dBASE3, ASCII, or DELIMITED) for type.

20.3 Canada ShowL command line options

To view a summary of command line options, use this command:

Windows: cashowl /op

UNIX: cashowl -op

The following table lists the command line options and command descriptions.

UNIX Windows Description

-a /a Appends query information to an output file (if it already exists).

-accent /accent Displays French accented data.

-d /d Displays query data on screen.

-op /op Displays this list of options.

-p /p Pauses screen display every 22 lines.

-aci:index /aci:index Enter the Alternate city index for index.

-adct:file /adct:file Enter the Address-line dictionary path and name (addrlnca.dct) for file.

-cdct:file /cdct:file Enter the Casing dictionary path and name (pwcasca.dct) for file.

-cdir:file /cdir:file Enter the Canada city directory path and name (cancity.dir) for file.

-ci:index /ci:index Enter the City index for index.

-cn:city /cn:city Enter the city name (in quotes if multiple words) for city.

-di:index /di:index Enter the Directory area index for index.

-dr:dir /dr:dir Enter the directory to search (City or ZCF) for dir.

-f:file /f:file Enter the output file path and name for file.

-fdir:file /fdir:file Enter the Canada FSA directory path and name (canfsa.dir) for file.

-fsa:fsa /fsa:fsa Enter the first part of the postal code for fsa.

-ldct:file /ldct:file Enter the Last line dictionary path and name (lastlnca.dct) for file.

-ndr:file /ndr:file Enter the Canada National directory path and name (canada.dir) for file.

-pc:pc /pc:pc Enter the postcode1 for pc.

-pdr:file /pdr:file Enter the Canada PCI directory path and name (canpci.dir) for file.

-pr:prov /pr:prov Enter the province (use two-letter abbreviations only) for prov.

-t:type /t:type Enter the output file type (dBASE3, ASCII, or DELIMITED) for type.

21 Glossary

Address Cleanse: Transforms that produce a correct and complete standardized form of an input address. The transform can also assign codes for postal automation and append other useful address information.

address line: A line of data in an address that contains the primary and, possibly, secondary address. The primary address contains components such as the primary range, primary name, directionals (post- and pre-), and the suffix. The secondary address normally contains components such as the unit designator and the secondary range.

aggregated data: Data that results when a process combines elements. This data can be presented collectively or in summary form.

alias: Alternate form or name.

● Aliases are alternate forms that could potentially be matched to the word. For example, Robert is a personal name alias for Bob. Alias data is output in the Match_Std fields.
● In the Address Cleanse transforms, an alias is an alternative form of a primary address line. Aliases apply only to primary addresses (usually streets), not secondary addresses or last lines.

AMAS: Australia Post’s Address Matching Approval System (AMAS). To receive postal discounts in Australia, you are required to file an AMAS report.

application: Another term for a software program.

attribute: A property created for a type of object.

Boolean expression: An expression that defines a logical relationship between two or more items. The expression is either TRUE or FALSE.

business rules: Settings within your Data Quality transforms that explain how you want to process your data. These include things like telling the Global Address Cleanse transform how to case output data, or setting up match criteria for a matching process.

case-sensitive: Pertaining to the differentiation between upper-case and lower-case letters. A case-sensitive program differentiates between upper-case and lower-case letters when evaluating a text string.

CASS: A United States Postal Service (USPS) certification that requires software vendors to go through a series of tests to prove that their software correctly codes addresses according to USPS requirements, and produces the required USPS reports. Long form: Coding Accuracy Support System.

classifications: Indicators to Data Cleanse of the types of situations that apply to this word. For example, Hewlett is assigned the Firm_Name and Name_Weak_Family_Name classifications, because it can be used in both firm and personal names.

command: A directive given to a program to initiate an action.

constant: A data string that does not change from one record to the next.

contribution value: A value you assign to a match criterion that represents the importance (or weight) you place on that criterion’s data. For example, your organization may place a high degree of importance on the customer number. For these types of criteria you would assign a higher contribution value to reflect a higher importance.

The contribution value is part of weighted scoring.

Data Cleanse: A transform that identifies and isolates specific parts of mixed data, and then standardizes the data based on information stored in the parsing dictionary, business rules defined in the rule file, and expressions defined in the pattern file.

data record: A row of data that is constructed at runtime. The data remains in the form of the data record throughout the job.

data salvage: The process of temporarily copying data from a passenger record to the driver record after the two records are compared. The data that’s copied is data that is found in the passenger record, but is missing or incomplete (initials, for example) in the driver record. Data salvaging prevents blank matching or initials matching from matching records that you may not want to match.

Data Services: A software system that allows users to build and execute applications with which they can create and maintain data warehouses. You can use Data Services to configure your Data Quality transforms and export the XML.

delivery point code: A two-digit number derived from the primary range (house number). This number is used in the generation of a DPBC barcode.

Delivery Point Validation (DPV): A technology that assists you in validating the accuracy of your address information with the USA Regulatory Address Cleanse transform. With DPV, you can identify addresses that are undeliverable as addressed and determine whether or not an address is a Commercial Mail Receiving Agency (CMRA).

destination record: A location where you place your updated or “best” data when creating a best record. A destination record can be either a master record, a subordinate record, or both in a match group.

diacritical marks: A character that contains an accent, dieresis (umlaut), tilde, cedilla, or other distinguishing character (for example, ä or Ç). You can choose to have standardized data with these types of characters. The application uses the Latin-1 code page for assigning these accents.

dictionary: Relational database that contains a lexicon of words and phrases that the data cleansing package and the Data Cleanse transform use to identify, parse, and standardize data.

directional: A component of the address line that indicates direction. For example, North in “211 N. 115th St.”

discrete field: Input or output data that has separate fields for each piece of information, such as addresses and names.

discrete format: Input source format in which pieces of data are parsed down to nearly the most distinct level. For example, a “first name” field would be discrete, whereas a “name” field that could contain first, middle, or last name information would not be discrete.

DPBC (Delivery Point Barcode): A form of Postnet barcode, consisting of 62 bars and based on the combination of ZIP Code, ZIP+4, delivery point code, and a check digit.

driver record: A record that drives the comparison process. Driver records are part of a break group and are compared with passenger records to determine matches.

The driver record is the first record in the break group.

dual address: A dual address occurs when a record contains two address lines. Two combinations are typical:

● PO box and street address:

1000 Main Street, Suite 51
PO Box 2342

● Rural route or Highway Contract and street address:

RR 1 Box 345
12784 Old Columbus Road

dual names: Two names included on an address line, for example, John and Jane Doe.

Early Warning System (EWS): A solution for matching valid delivery points that have been created between updates to the national ZIP+4 directory. EWS uses four months of rolling data found in an intermediate directory that is updated weekly with data from the USPS.

eLOT: Enhanced Line of Travel (eLOT) takes Line of Travel one step further in the presorting process. The original line of travel (LOT) narrowed down the mail carrier’s delivery route to the block face level (ZIP+4 level) by discerning whether an address resided on the odd or even side of a street or thoroughfare.

eLOT narrows the mail carrier’s delivery route walk sequence to the house (delivery point) level. This allows you to sort your mailings to a more precise level.

fault code: A numeric value that is assigned to a record after the USA Regulatory Address Cleanse transform validation process that signifies that the particular record was not successfully validated. Each numeric value represents a different type of fault.

FSA (Forward Sortation Area): The first three characters of a Canadian alphanumeric postal code. For example, K1A in the postal code for Canada Post’s Ottawa headquarters, K1A 0B1.

gathering: Recombines terms that belong together, such as alphanumeric terms that you would look up together in the dictionary. For example, if Data Cleanse breaks 1st into "1" and "st", then gathering recombines them to 1st.

gender: A code that indicates the likelihood of a record being a certain gender. This code is derived from the name and has six possible values: strong male, strong female, weak male, weak female, ambiguous, and unassigned. For example, a record marked as “strong male” indicates a high likelihood that the person is male.

generated field: A field that is generated on output by a transform. For example, a postcode field generated by the Global Address Cleanse transform.

GeoCensus: A directory that contains latitude, longitude, census tract, and block information. That information sets the stage for mapping, demographic marketing, and other applications of your address data.

hybrid format: A format for records in which some fields are discrete, whereas others are in a multiline format.

intersource match: Match between records of different sources.

intrasource match: Match between records within a source.

key: A value used to identify a record in a database.

lastline: The lastline of an address contains components such as the locality, region, and postcode (and it may contain the country name).

line of travel (LOT): A sorting sequence in which ZIP+4 codes are arranged in the order that they are served by the mail carrier. LOT sequencing is required for some bulk mailing discounts.

locality: A part of the address line of a record. Locality most often refers to the city or town. In some countries, such as the United Kingdom, locality can extend to include district.

Local Delivery Unit (LDU): The last three characters of a Canadian alphanumeric postal code. For example, 0B1 in the postal code for Canada Post’s Ottawa headquarters, K1A 0B1.
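The FSA and LDU entries describe the two halves of a Canadian postal code. As a minimal C++ sketch, assuming the input is a six-character code in the letter-digit-letter digit-letter-digit pattern with an optional space, the split looks like this. The function name is hypothetical, and it checks only the character pattern, not whether the FSA exists in the Canada directories.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Splits a Canadian postal code such as "K1A 0B1" into its FSA ("K1A") and
// LDU ("0B1"). Spaces are ignored and letters are upper-cased; returns
// std::nullopt unless six characters remain in letter-digit-letter
// digit-letter-digit order.
std::optional<std::pair<std::string, std::string>> splitCanadianPostalCode(
    const std::string& input) {
    std::string compact;
    for (char c : input) {
        if (c != ' ')
            compact += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    if (compact.size() != 6) return std::nullopt;
    for (std::size_t i = 0; i < compact.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(compact[i]);
        const bool ok = (i % 2 == 0) ? std::isalpha(c) != 0 : std::isdigit(c) != 0;
        if (!ok) return std::nullopt;
    }
    return std::make_pair(compact.substr(0, 3), compact.substr(3));
}
```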

Locatable Address Conversion System (LACS): A database of addresses that have been permanently converted, usually due to 911 emergency system implementation. The changes often consist of conversion from rural-style addressing to standardized, city-style addressing, or renumbering of existing city-style addresses.

mail piece unit: Typically referred to as a version identifier for printers, it represents the unique characteristics of a portion of a mailing. Every segment within a Mail.dat must have at least one mail piece unit.

mapped field: A field in a specific transform for which you have defined which field it should read from upstream transforms.

master record: The first record in a match group. You can control which record is the master record by using the Group Prioritization operation in the Match transform.

match criteria: A group of options that determine the rules for matching on particular data.

match group: A group of records found to be matching with each other. A match group consists of a master record and subordinate records.

matching record: A group of records found to be matches based on the criteria and business rules you choose. The records do not necessarily have the same data.

match level: A match level designates the level in hierarchical matching. One match set can have one or more match levels. Duplicates that are found at one level are passed to the next level, where they are compared based on that level’s keys, and so on. For example, you could use multiple match levels if you wanted to detect duplicates at the household (residence), family, and individual level.

The order of the match levels is important because duplicates are found at each level, and only the results are made available for the next level. Usually, you will define your “broadest” match levels first, followed by more specific match levels.

match set: A group of criteria used to perform matching on your data.

A typical setup might have only select data reaching each match set for comparison. For example, you might want to exclude blank SSNs (Social Security Numbers), certain foreign addresses, and so on from reaching a particular match set.

metadata: In the software, information acquired and maintained to describe tables in source and target databases. This information includes the names of tables and their columns, and the data types of the columns.

In general, metadata typically includes a description of data models, a description of the layouts used in database design, the definition of the system of record, the mapping of data from the system of record to other places in the environment, and specific database design definitions.

multiline: The multiline format is a database record format in which address data is not consistently located in the same arrangement in all records. That is, data items “float” among fields. For example, an input source may have fields named Line1, Line2, Line3, and Line4 that contain various categories of name and address data, as well as non-address data.

multi-source: Records that appear on two or more sources. For example, let’s say you’re bringing together customer sources from several direct marketers or publishers. Your best prospects may be the people whose names appear on two or more sources, indicating they may be most receptive to your offer.

normal source: A source of records that the application should consider to be good, eligible records in a matching process.

North American Numbering Plan (NANP): Telephone numbering plan shared by 19 North American countries. These countries include the United States and territories, Canada, Bermuda, Anguilla, Antigua & Barbuda, the Bahamas, Barbados, the British Virgin Islands, the Cayman Islands, Dominica, the Dominican Republic, Grenada, Jamaica, Montserrat, St. Kitts and Nevis, St. Lucia, St. Vincent and the Grenadines, Trinidad and Tobago, and Turks & Caicos.

null: The absence of a value within a database field for a given record. It does not mean zero because zero is a value.

option: Business rules that can be set for a Data Quality transform that specify how you want to process your data. Each Data Quality transform has a different set of available options.

option group: Contains a set of options that allow you to set different business rules for a transform.

other source: In a Match transform, a source of records that should be treated as transparent, such as seed sources. They are not counted in determining how to characterize a match group (for example, multi-source or single-source). For example, some mailers use a seed source of potential buyers who report back to the mailer when they receive a mail piece so that the mailer can measure delivery.

passenger record: The records that are compared against driver records in a break group. After a driver record has been compared with every passenger record in a break group, a passenger record can become the new driver record in the break group, or it can be found to be a match with a driver record. At this point it is taken out of the comparison process.

pattern file: User-defined patterns are stored in a pattern file. The pattern file is a plain text file and can be edited in any text editing program. The pattern file is used by the Data Cleanse transform.

PMB (Private mail box): Private mail boxes are like post-office boxes but they are hosted by private companies. The USA Regulatory Address Cleanse and the Global Address Cleanse transforms can recognize certain forms of PMB data when it appears in an address line.

postal address: A delivery address that is a rural route or box number.

postal code: A system of letters and/or digits used for sorting mail. Examples include the ZIP Code used in the United States and the alphanumeric FSA LDU system used in Canada.

Postcode2: The secondary part of a postal code. For example, in the United States, a postcode is composed of two parts (54601-4051). The first five digits are followed by a hyphen and a four-digit code. The four-digit code is the Postcode2 for a US postcode.

postcode move: A valid postcode that has been split or moved, so only a portion of the area that had been covered by the one postcode now has two or more postcodes, including the original one, for the same area.

primary entry: A word or phrase in the dictionary that the data cleansing package and Data Cleanse transform use to identify, parse, and standardize data.

primary key: A column that is guaranteed to contain unique values, and whose values identify all of the rows in a table.

reference file: A file of address data used by Data Quality Management SDK to match, assign, standardize, and verify addresses. Reference files are also referred to as postal directories. These files have a .dir extension.

rule matching: Matches the token classifications against defined rules.

secondary information: Assists Data Cleanse in determining how to process the word when it is used in different ways. Secondary information can include how Data Cleanse will standardize the output data for the word or alternate forms that could potentially be matched to the word.

SERP Canada Post Corporation’s Software Evaluation and Recognition Program. Data Quality is certified under this program, allowing you to receive postage discounts for mailings to and within Canada.

similarity score A percentage that indicates how alike two fields or values are. The application calculates this percentage after the comparison process. For example, Ron and Rob are considered 67% alike because two of their three characters match.
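To make the Ron/Rob arithmetic concrete, here is a minimal sketch that compares two values position by position and divides the number of matching characters by the length of the longer value, rounding to a whole percentage. The function name positionalSimilarityPercent is hypothetical, and this simple comparison is only an illustration of the example above, not the scoring algorithm the SDK actually uses; as this entry notes, real scores can be altered by options such as those in the Match transform.

```cpp
#include <algorithm>
#include <cmath>
#include <string>

// Hypothetical helper (not part of the SDK): percentage of positions, out of
// the longer value's length, at which both values hold the same character.
int positionalSimilarityPercent(const std::string& a, const std::string& b) {
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) {
        return 100;  // two empty values are treated as identical
    }
    const std::size_t shortest = std::min(a.size(), b.size());
    std::size_t same = 0;
    for (std::size_t i = 0; i < shortest; ++i) {
        if (a[i] == b[i]) {
            ++same;  // characters agree at this position
        }
    }
    // 2 of 3 characters -> 66.67, rounded to 67
    return static_cast<int>(std::lround(100.0 * same / longest));
}
```

For example, positionalSimilarityPercent("Ron", "Rob") returns 67, matching the figure in the definition. A purely positional comparison is chosen here only because it reproduces that example; it does not handle insertions or transpositions the way an edit-distance measure would.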

Similarity scores are used in a number of situations, not just in the Match transform. For example, they can be used to determine which suggestions to return for suggestion lists.

The similarity score is not always a direct result of a one-to-one comparison; some options, such as those defined in the Match transform, can alter it.

snowbird A casual term for someone who has multiple residences. The term comes from people who live in a cooler-climate region during the summer and relocate to a home in a warmer-climate region during the winter.

source For the Match transform, a grouping of records based on some data characteristic that you can identify. A source might be all records from one input file, or all records that contain a particular value in a particular field. Sources are abstract and arbitrary: there is no physical boundary line between sources. Depending on how you define the source, source membership can cut across data sources as well as distinguish among records within a single data source.

source group A group of sources that you can use to prepare a second set of match statistics, combining the statistics for two or more regular sources. For example, suppose you define five sources: two house sources and three rented sources. You would get match statistics for each individual source. But suppose that you also wanted a summary for the house sources and a summary for the rented sources. You could create two source groups, one for the house sources and one for the rented sources.

Source groups affect only the way that match statistics are reported. They do not affect matching.

source record The record that contains the data you want to use to update or create your best record. A source record can be the master record or a subordinate record of a match group.

standards Define how Data Cleanse standardizes capitalization or other output formatting of data.

street address A delivery address that consists of the house number and street name.

subordinate record A record that is part of a match group and is found to match (and be subordinate to) a master record. Subordinate records can contain data that may be used to update a master record and thus create a best record.

suggestion lists Normally, when an address cleansing transform looks up an address in the postal directories, it finds one matching record. Sometimes, due to incomplete information, two or more records (or suggestions) in the postal directories could be the correct record. Suggestion lists provide you with a list of “matching” addresses so that you can choose the best address.

suppression source A source that contains records that should be excluded from other output destinations. The records in the suppression source are used for matching against records in other sources; records that match the suppression source can then be removed from further processing.

For example, suppression sources may be your own bad-account file or no-mail sources provided by the government or the Direct Marketing Association (DMA), used to prevent wasted mailings and avoid offending consumers.

table A database table that the software reads data from or loads data into. The path and mechanisms for reading and loading data and apportioning the data among rows and columns are defined in the datastore that the table is associated with. Writing a data set to a database table means sending a combination of rows with the appropriate operation codes to the database table.

territory The locale value for a geographical location (usually the country) where a locale language is used. The pairing of a language with a territory determines factors such as date format, time format, decimal separator, and currency format.

tokenization Assigns specific meanings to each of the pieces that result from word breaking. Data Cleanse looks up each individual input word in the dictionary and creates a list of tokens using the classifications associated with each word in the dictionary.

transform A step in a Data Quality job that acts on a data set.

Unicode A standard that was designed to create a universal character set. It accomplishes this by providing a unique number for every character in every language.

The Unicode Standard describes more than 50,000 characters, including all the characters of the common character sets in use when Unicode was established around 1990, as well as many that have been added since then. Unicode is an open character set, meaning it can continue to incorporate characters as needed.

Unicode can handle letters, punctuation, and technical symbols, regardless of platform, program, writing system, or language.

unique identifier In a Data Quality transform, an ID that is unique to a record or group of matching records. It is sequential and static, and it does not change when records are updated or re-processed through the application.

unique record A record that does not have any matching or subordinate records and, therefore, does not belong to any match group after the matching process is complete.

weighted scoring A method of comparison that gives you greater control over the matching process. It lets you use contribution values to place more or less importance on various match criteria.

word breaking Breaks the input line down into smaller, more usable pieces. By default, Data Cleanse breaks an input line on white space, punctuation, and alphanumeric transitions. Terms such as 20GB, 4G, 1st, and U2 each break into two tokens at the alphanumeric transition; for example, "20GB" breaks into the tokens "20" and "GB".
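The three default break rules in the word breaking entry can be sketched as a single pass over the input. This is a minimal illustration, assuming ASCII input and assuming that white space and punctuation are simply discarded rather than emitted as tokens; the function name breakWords is hypothetical and is not part of the SDK or the Data Cleanse transform.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical sketch of default word breaking (not an SDK API).
// Assumptions: ASCII input; white space and punctuation end a token and are
// dropped; a switch between letters and digits also ends a token.
std::vector<std::string> breakWords(const std::string& line) {
    std::vector<std::string> tokens;
    std::string current;
    int currentKind = 0;  // 0 = no token in progress, 1 = letters, 2 = digits

    auto flush = [&]() {
        if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
        currentKind = 0;
    };

    for (char ch : line) {
        const unsigned char c = static_cast<unsigned char>(ch);
        int kind = 0;
        if (std::isalpha(c)) {
            kind = 1;
        } else if (std::isdigit(c)) {
            kind = 2;
        }
        if (kind == 0) {
            flush();  // white space or punctuation ends the current token
            continue;
        }
        if (currentKind != 0 && kind != currentKind) {
            flush();  // alphanumeric transition, as between "20" and "GB"
        }
        current.push_back(ch);
        currentKind = kind;
    }
    flush();  // emit the final token, if any
    return tokens;
}
```

With these assumptions, breakWords("20GB 4G, 1st U2") yields the tokens "20", "GB", "4", "G", "1", "st", "U", and "2", so each of the four example terms breaks into two tokens as the definition describes.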

XML Extensible Markup Language. Like HTML (Hypertext Markup Language), XML is a markup language, but it also specifies a standard with which you can define your own markup languages with their own sets of tags. XML allows you to define tags with various rules, such as tags that represent business rules, data descriptions, or data relationships.

XML Schema The XML format used by Data Quality Management SDK to support message processing that includes Web Services. XML Schemas describe the data structure of an XML file or message. You can use the same XML Schema to describe multiple XML sources or targets. XML Schema properties include: Name, Description, Imported from, Root element name, and Namespace.

Z4Change The Z4Change directory lists all the ZIP and ZIP+4 Codes in the country. A record in this file is tagged if it has changed within the last 12 months. The change might be a postal-code change (ZIP, ZIP+4, or CART), or even a change in the standardized form of the address-line or city name.

ZCF The ZIP-City File directory that is used by the USA Regulatory Address Cleanse transform when processing data from the U.S.

ZIP+4 A nine-digit number, consisting of the ordinary ZIP Code and a four-digit, add-on code.
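The Postcode2 and ZIP+4 entries describe the same two-part structure. As an illustration, here is a minimal sketch that splits a ZIP+4 value into its five-digit ZIP Code and its four-digit add-on (the Postcode2). The function names splitZipPlus4 and allDigits are hypothetical, and accepting both the hyphenated form and an unhyphenated nine-digit form is an assumption of this sketch; it is not how the SDK's address cleansing transforms parse postcodes.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if s is non-empty and contains only ASCII digits.
static bool allDigits(const std::string& s) {
    for (char ch : s) {
        if (!std::isdigit(static_cast<unsigned char>(ch))) {
            return false;
        }
    }
    return !s.empty();
}

// Hypothetical sketch: split "54601-4051" (or "546014051") into the ZIP Code
// "54601" and the Postcode2 "4051". Returns false, leaving the outputs
// unchanged, if the value is not in one of those two forms.
bool splitZipPlus4(const std::string& value, std::string& zip, std::string& postcode2) {
    std::string first;
    std::string second;
    if (value.size() == 10 && value[5] == '-') {
        first = value.substr(0, 5);   // five-digit ZIP Code
        second = value.substr(6);     // four digits after the hyphen
    } else if (value.size() == 9) {
        first = value.substr(0, 5);
        second = value.substr(5);
    } else {
        return false;
    }
    if (!allDigits(first) || !allDigits(second)) {
        return false;
    }
    zip = first;
    postcode2 = second;
    return true;
}
```

For the example in the Postcode2 entry, splitZipPlus4("54601-4051", zip, postcode2) returns true and sets zip to "54601" and postcode2 to "4051".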

ZIP Code ZIP is an acronym that stands for "Zone Improvement Plan." A ZIP Code is a 3-, 5-, or 9-digit number that represents a geographic region of the United States. The ZIP Code is important in determining entry eligibility and presort containerization. Note that this code is different from a facility code.

zone The ZIP-City File directory that is used by the USA Regulatory Address Cleanse transform when processing data from the U.S.

Important Disclaimers on Legal Aspects

This document is for informational purposes only. Its content is subject to change without notice, and SAP does not warrant that it is error-free. SAP MAKES NO WARRANTIES, EXPRESS OR IMPLIED, OR OF MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.

Coding Samples

Any software coding and/or code lines / strings ("Code") included in this documentation are only examples and are not intended to be used in a productive system environment. The Code is only intended to better explain and visualize the syntax and phrasing rules of certain coding. SAP does not warrant the correctness and completeness of the Code given herein, and SAP shall not be liable for errors or damages caused by the usage of the Code, unless damages were caused by SAP intentionally or by SAP's gross negligence.

Accessibility

The information contained in the SAP documentation represents SAP's current view of accessibility criteria as of the date of publication; it is in no way intended to be a binding guideline on how to ensure accessibility of software products. SAP specifically disclaims any liability with respect to this document and no contractual obligations or commitments are formed either directly or indirectly by this document.

Gender-Neutral Language

As far as possible, SAP documentation is gender neutral. Depending on the context, the reader is addressed directly with "you", or a gender-neutral noun (such as "sales person" or "working days") is used. If when referring to members of both sexes, however, the third-person singular cannot be avoided or a gender-neutral noun does not exist, SAP reserves the right to use the masculine form of the noun and pronoun. This is to ensure that the documentation remains comprehensible.

Internet Hyperlinks

The SAP documentation may contain hyperlinks to the Internet. These hyperlinks are intended to serve as a hint about where to find related information. SAP does not warrant the availability and correctness of this related information or the ability of this information to serve a particular purpose. SAP shall not be liable for any damages caused by the use of related information unless damages have been caused by SAP's gross negligence or willful misconduct. Regarding link classification, see: http://help.sap.com/disclaimer.

www.sap.com/contactsap

© 2014 SAP SE or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies. Please see http://www.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.