
Trillium Software System™ Batch User’s Guide

Investigator
Converter
Global Data Router
Customer Data Parser
Business Data Parser
Window Key Generator
Matcher
Data Reconstructor
Create Common Module

Version 7.16
May 2017

This manual, as well as the software described in it, is furnished under license and may be used only in accordance with the terms of such license. The content of this manual is furnished for informational purposes only, is subject to change without notice, and should not be construed as a commitment by Trillium Software. Trillium Software assumes no responsibility or liability for any errors or inaccuracies that may appear in this manual.

The customer shall not disclose, copy, reproduce, distribute, or display any portion of the Trillium Software System or this manual in any form to any third person without the prior written consent of Trillium Software, nor allow third parties to do the same. The customer shall keep the Trillium Software System and all confidential information in the strictest confidence.

Trillium Software System Batch User’s Guide 52617

Trillium Software, Inc. owns all rights in and to the marks "TRILLIUM SOFTWARE" and "TRILLIUM SOFTWARE SYSTEM," which marks are registered in various countries throughout the world (including, without limitation, the Patent and Trademark Office).

All other trademarks are the property of their respective owners.

© 2008-2017 Trillium Software, Inc.

Table of Contents

CHAPTER 1
Investigator Data Process Flow ...... 1-2
Investigator Order of Operations ...... 1-3
Investigator Parameters ...... 1-4
Sample Investigator Parameter File ...... 1-6
Investigator Parameter Descriptions ...... 1-7
Sample Investigator Output Statistics ...... 1-29
Error Messages ...... 1-30
Running the Investigator ...... 1-33

CHAPTER 2
Input/Output Resources ...... 2-3
Converter Parameters ...... 2-4
Using Record Select and Bypass Functionality with the Converter ...... 2-8
Rules File Subset Parameters and Syntax Rules ...... 2-9
Comparison Operators ...... 2-10
Rule File Entries Examples ...... 2-11
Converter Parameter Details ...... 2-13
Running the Converter on UNIX and 32-bit PCs ...... 2-50
IBM Mainframe Converter Sample JCL ...... 2-51
Converter Error Messages ...... 2-52

CHAPTER 3
Global Data Router Design Flow ...... 3-2
Input/Output Resources ...... 3-2
Parameter Syntax ...... 3-3
Global Data Router Parameters ...... 3-4
Global Data Router Rules File ...... 3-8
Rules File Parameters ...... 3-9
Running the Global Data Router on UNIX and 32-bit PCs ...... 3-24
IBM Mainframe Execution ...... 3-25
Sample Rules File ...... 3-27
Sample Log File Output ...... 3-29
Testing the Router ...... 3-29
Global Data Router Error Messages ...... 3-30

CHAPTER 4
Customer Data Parsing Logic Flow ...... 4-2
Parser Functional Capabilities ...... 4-4
Identifying Business versus Personal Names ...... 4-5
Comma-Reversed Names ...... 4-6
Customer Data Parser Process Flow ...... 4-7
Customer Data Parser Input ...... 4-8
Customer Data Parser Output ...... 4-10
About Data Dictionary Language Files ...... 4-11
Special DDL Fields ...... 4-12
Customer Data Parser Parameters ...... 4-14
Sample Parser Parameter File ...... 4-38
Sample Parser Parameter File ...... 4-40
pfprsdrv.par Parameters ...... 4-41
*CHANGE_DDNAME (‘change’ functionality) ...... 4-46
**JOIN_LINES (‘join’ functionality) ...... 4-47
Name and Record Generation ...... 4-49
Running the Customer Data Parser on UNIX and 32-bit PC Platforms ...... 4-52
IBM Mainframe Parser Sample JCL ...... 4-53
Customer Data Parser Error Messages ...... 4-55
About Window Keys ...... 4-58
Multiple Customer Data Parsers ...... 4-58
How Multiple Parsers Work ...... 4-58
Line Pattern Identification Codes ...... 4-59
Pattern Leveling ...... 4-60
Name Pattern Depth Levels ...... 4-61
Street Pattern Depth Levels ...... 4-63
Customer Data Parser Output ...... 4-64
Customer Data Parser Repository (PREPOS) Record ...... 4-64
Parser Repository (PREPOS) Layout ...... 4-65
Complete PREPOS Layout ...... 4-66
Customer Data Parser Log File ...... 4-92
Sample Sorted Log File ...... 4-92
Bad Name, Street Patterns and City Problem Section ...... 4-97
Parser Scrub Report ...... 4-99
Customer Data Parser Display (Scrub) Report File ...... 4-101
Section 1: Record Information ...... 4-101
Section 2: Street Line Information ...... 4-102
Section 3: Additional Street Line Information ...... 4-102
Section 4: Geography Line Information ...... 4-102
Section 5: Name Line Information ...... 4-102
Detail File ...... 4-104
Using the Palog Analyzer ...... 4-105
Corrective Action Examples ...... 4-106
Name Pattern Addition ...... 4-106
Street Pattern Addition ...... 4-106
U.S. City Problem Addition ...... 4-106
Post Town Problem Addition ...... 4-107
Word/Phrase Addition ...... 4-107
Using the Parser Display Program ...... 4-107
Rerunning Table Maintenance After Tuning ...... 4-107
Review Codes and Review Groups ...... 4-108
Review Group Hierarchy ...... 4-113
Statistics Report ...... 4-116
Output Display Program (CFPRSDSP) ...... 4-117
Parser Display Program Parameters ...... 4-118
Customer Data Parser Display Report Description ...... 4-119
Line Pattern Identification on the Display Report ...... 4-123
CFPRSDSP Program Error Messages ...... 4-124
Running the Parser Display Program on UNIX and 32-bit PC Platforms ...... 4-125
IBM Mainframe Sample Parser JCL ...... 4-126

CHAPTER 5
Business Data Parsing Logic Flow ...... 5-2
Business Data Parser Functions ...... 5-3
Primary Functions ...... 5-3
Business Data Parser Process Flow ...... 5-4
Input and Output Resources ...... 5-4
DDL Requirements ...... 5-5
ORG_RECORD Field ...... 5-6
Other Special DDL Fields ...... 5-6
DDL Specifics ...... 5-7
Driver Parameter File ...... 5-7
Business Data Parser Parameter File ...... 5-10
Parameter File Descriptions ...... 5-10
Running the Business Data Parser on UNIX and 32-bit PC Platforms ...... 5-16
IBM Mainframe Sample JCL ...... 5-17
Business Data Parser Output ...... 5-18
Business Data Parser Repository Record (BPREPOS) ...... 5-18
BPREPOS Fields ...... 5-19
BDP Repository Output Record Format ...... 5-19
Business Data Parser Log File ...... 5-21
Sample Business Data Parser Log File (Sorted) ...... 5-21
Bad Patterns ...... 5-22
Using the Palog Analyzer ...... 5-22
Corrective Action ...... 5-23
Miscellaneous Pattern Addition ...... 5-23
Review Codes and Review Groups ...... 5-25
Business Data Parser Review Codes ...... 5-25
Error Messages ...... 5-27


CHAPTER 6
Window Key Generator Process Flow ...... 6-2
Input and Output Resources ...... 6-2
Window Key Generator Parameters ...... 6-3
Sample Parameter File 1 ...... 6-6
Sample Parameter File 2 ...... 6-6
Rules File ...... 6-8
Window Key Generator Codes ...... 6-9
Sample Window Key Rules File ...... 6-11
Running the Window Key Generator on UNIX and 32-bit PC Platforms ...... 6-11
Sample IBM Mainframe JCL ...... 6-13
Error Messages ...... 6-14

CHAPTER 7
Matcher Driver Programs ...... 7-2
Window Matching and Reference Matching ...... 7-3
Window Matching ...... 7-4
Window Matching Input and Output ...... 7-5
Window Keys ...... 7-5
Reference Matching ...... 7-6
About Reference Matching ...... 7-7
Reference Matching Input and Output ...... 7-9
Candidate Matching Information (CMI) Parameter File ...... 7-10
Sample CMI Parameter File ...... 7-11
Sample Transaction File ...... 7-11
Matcher Driver Program ...... 7-12
Matcher Parameters ...... 7-13
Handling Large Window Keys ...... 7-18
Creating Unique Window Keys ...... 7-18
Using a Window Key Table ...... 7-19
Using Record Select and Bypass Functionality with the Matcher ...... 7-21
Rules File Parameters and Syntax Rules ...... 7-21
Comparison Operators ...... 7-23
Rule File Entries Examples ...... 7-23
Sample Matcher Parameter File ...... 7-25
About Matching Levels ...... 7-26
Field Comparison Routine Lists ...... 7-26
Defining Field/Comparison Routine List Entries ...... 7-27
Matcher Input 2 ...... 7-28
Grade Pattern Lists ...... 7-28
Grade Pattern List Syntax ...... 7-29
Business Grade Pattern List Example ...... 7-30
Matching Prevention ...... 7-31
Using the PREVENT Match Routine ...... 7-31
Matching Propagation ...... 7-32
Transitivity ...... 7-32
Propagation ...... 7-32
Determining the Minimal Occurrence Influence ...... 7-33
Match-Testing Early Exit ...... 7-34
Matcher Output 1 ...... 7-35
Matcher Return Fields ...... 7-35
Matcher Output 2 ...... 7-36
Matcher Summary Statistics Report ...... 7-36
Details for the Summary Statistics Report ...... 7-42
Statistics from Matcher for Matcher Windows ...... 7-42
Retail and Commercial ...... 7-42
Commonizer Function ...... 7-45
Survivorship ...... 7-47
Selecting the Surviving Name ...... 7-47
Survivorship Example ...... 7-48
Selecting the Commercial Survivor ...... 7-49
Identifying a Survivor Record Using CIS-_RANK_KEY ...... 7-49
Standard Common Data ...... 7-51
User Common Data ...... 7-52
User Common Data Parameter File Entries ...... 7-53
User Common Data Routines ...... 7-56
Error Messages ...... 7-58
Running the Matcher on UNIX and 32-bit PC Platforms ...... 7-66
IBM Mainframe Matcher Sample JCL ...... 7-67
Matcher Display Programs ...... 7-68
Display Program DDL Requirements ...... 7-68
CFMATDSP Display Program ...... 7-69
Display Program Parameters ...... 7-69
Sample Parameter File for CFMATDSP ...... 7-71
Running the Matcher Display Program on UNIX and 32-Bit PC Platforms ...... 7-71
IBM Mainframe Matcher Display Sample JCL ...... 7-72
Display Program Errors ...... 7-73
Running CFFXMDSP on UNIX and 32-Bit PC Platforms ...... 7-76
IBM Mainframe Sample JCL for cffxmdsp ...... 7-77
Matcher Driver 2 ...... 7-78
Using CFMATCH with CKM ...... 7-78
CFMATCH Driver Parameters ...... 7-79
Sample CFMATCH Matcher Driver Parameter File ...... 7-82
IBM Mainframe SAMPLE JCL for CFMATCH ...... 7-83
Matcher Driver #3 ...... 7-84
Using CFWINMAT with CKM ...... 7-84
CFWINMAT Parameters ...... 7-84
Running CFWINMAT on UNIX and 32-bit PC Platforms ...... 7-88
IBM Mainframe Sample JCL for CFWINMAT ...... 7-89
Tuning the Match Results ...... 7-91
Getting Started ...... 7-92
Analyzing the Data ...... 7-93
Using “Tie-Breaking” Fields ...... 7-93
Using Parmvals with the Matcher Comparison Routines ...... 7-94
Comparison Routine and Parmval Details ...... 7-95
ABSOLUTE Routine ...... 7-95
APTNO Routine ...... 7-96
APTNO When Parmval (01) ...... 7-96
ARRAY1 Routine ...... 7-98
ARRAY2 Routine ...... 7-99
BUSNAME Routine ...... 7-100
BUSNAME When Parmval (COMPACT) ...... 7-101
BUSNAME When Parmval (SORT) ...... 7-102
BUSNAME When Parmval (DI) ...... 7-103
DATE Routine ...... 7-104
DIFFER Routine ...... 7-105
DIFFER Routine Parmvals ...... 7-105
Scoring Values ...... 7-106
Example ...... 7-106
FLAG10 Routine ...... 7-106
FLAGFM Routine ...... 7-107
FLAGGN Routine ...... 7-108
FLAGMF Routine ...... 7-109
Scoring Values ...... 7-110
FLAGYN Routine ...... 7-110
FRSTNAME Routine ...... 7-111
FRSTNAME When Parmval = SYMETRIC ...... 7-113
FRSTNAME When Parmval = DI ...... 7-113
FRSTNAME When Parmval = TSB ...... 7-113
FRSTNAME When Parmval = INITIAL ...... 7-114
GENER Routine ...... 7-115
GENER When Parmval (95) ...... 7-116
HOUSENO Routine ...... 7-116
HOUSENO When Parmval (NORANGE) ...... 7-118
HOUSENO When Parmval (PARITY) ...... 7-118
HOUSENO When Parmval (01) ...... 7-119
MXDNAME Routine ...... 7-120
Scoring Values ...... 7-121
NYSIIS Routine ...... 7-122
ONECOM Routine ...... 7-124
PARTIAL1 Routine ...... 7-125
PARTIAL1 When Parmval (10) ...... 7-125
PARTIAL1 When Parmval (FM) ...... 7-126
PARTIAL1 When Parmval (GN) ...... 7-127
PARTIAL1 When Parmval (MF) ...... 7-128
PARTIAL1 When Parmval (MU) ...... 7-129
PARTIAL1 When Parmval (YN) ...... 7-130
Scoring Values ...... 7-131
PARTIAL1 When Parmval (ARRAY1,n) ...... 7-131
PARTIAL1 When Parmval (ARRAY2,n) ...... 7-132
PARTIAL2 Routine ...... 7-133
PARTIAL2 When Parmval (DATE) ...... 7-134
PARTIAL2 When Parmval (SOUNDEX1) ...... 7-135
PARTIAL2 When Parmval (RSOUNDEX1) ...... 7-137
Scoring Values ...... 7-139
PARTIAL2 When Parmval (SOUNDEX2) ...... 7-140
PARTIAL2 When Parmval (RSOUNDEX2) ...... 7-142
PARTIAL2 When Parmval (STATUS) ...... 7-144
PARTIAL2 When Parmval (NYSIIS) ...... 7-145
“Improved” NYSIIS Algorithm ...... 7-145
PARTIAL2 When Parmval (RNYSIIS) ...... 7-148
Scoring Values ...... 7-148
POSTCODE Routine ...... 7-148
Scoring Values ...... 7-149
POSTCODE When Parmval = TSB (used in UK only) ...... 7-149
PREFIX Routine ...... 7-150
PREVENT Routine ...... 7-151
RNYSIIS Routine ...... 7-152
SOCSEC Routine ...... 7-152
SOUNDEX1 Routine ...... 7-153
SOUNDEX Algorithm ...... 7-153
SOUNDEX2 Routine ...... 7-155
SPELLING Routine ...... 7-157
SPELLING When Parmval = DI ...... 7-159
SPELLING When Parmval = SQUISH ...... 7-159
STATUS Routine ...... 7-159
STATUS When Parmval (STATUS) ...... 7-159
Scoring Values ...... 7-160
STREETS Routine ...... 7-160
STREETS When Parmval (DI) ...... 7-162
STREETS When Parmval () ...... 7-162
SUBSTRNG Routine ...... 7-163
SUBSTRNG When Parmval (AND) ...... 7-164
TWORET Routine ...... 7-165
TWORET When Parmval (LO) ...... 7-165


CHAPTER 8
Input/Output Resources ...... 8-2
Data Reconstructor Parameters ...... 8-3
Parameter File Syntax ...... 8-5
Rules File ...... 8-6
Rules File Requirements ...... 8-6
Rule Script Language ...... 8-7
Precedence and Associativity ...... 8-8
Comments ...... 8-9
Fields ...... 8-9
Input or Output Dictionary ...... 8-10
Selecting a Portion of a Field, field[n:n] & field(n:n) ...... 8-11
Literal Values ...... 8-11
Binary Data Strings ...... 8-12
Concatenating Literal Values ...... 8-12
BLANKS, ZEROS and NULLS ...... 8-13
‘IF’ Statements ...... 8-14
Conditions ...... 8-15
Logical Operators, AND and OR ...... 8-18
Nested ‘IF’ statements ...... 8-19
Action Statements ...... 8-19
String Variables in the Data Reconstructor ...... 8-25
Running the Data Reconstructor on UNIX and 32-Bit PC Platforms ...... 8-26
IBM Mainframe Sample Data Reconstructor JCL ...... 8-27
Error Messages ...... 8-28
Parameter Echo File Error Messages ...... 8-30
Rules File Error Messages ...... 8-31

CHAPTER 9
User Common Data (Commonization) ...... 9-2
Selecting the Surviving Record ...... 9-6
Create Common Parameters ...... 9-8
Decision Routines ...... 9-10
IBM Mainframe Sample JCL for cfcrcdrv ...... 9-16
Running Create Common ...... 9-17
On UNIX and 32-bit PC Platforms ...... 9-17
Error Messages ...... 9-18

CHAPTER 1 Investigator

The Investigator module lets you investigate and analyze your data before (and sometimes after) you convert it. Unlike the Converter, the Investigator does not modify your data; however, because it works in conjunction with the Converter, it uses many of the same analysis routines.

The Investigator provides parameters that let you identify the data to search for, and it flags the data that it finds. The Investigator produces two output files and a statistics file.


Investigator Data Process Flow

The Investigator uses a driver named CFINVDRV.

The Investigator uses the following input and output files:

File                     Description
Driver parameter file    pfindrv.par
Input DDL file           input.ddl
Input file               User data file
Output file              Output data file, which is typically used as input to the Converter

In the OUTREC_FNAME output file, each field contains a 1 if the conditions on the corresponding parameter line were met and the data was found. If the conditions were not met and the specified data was not found, the field contains a 0. See page 1-23 for more information about the OUTREC_FNAME output file.


Investigator Order of Operations

The Investigator driver operations occur in the following order:

1. Read input record/buffer
2. Process SINGLE_FIELD_LOOKUP parameter
3. Process MULTI_FIELD_LOOKUP parameter
4. Process FIELD_SCAN parameter
5. Process NUMERIC_RANGE_COMPARE parameter
6. Process FIELD_COMPARE parameter
7. Process ARITHMETIC_COMPARE parameter
8. Process FREQUENCY parameter
9. Write output to record file
10. Write output to field file
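The fixed order above can be pictured as a per-record loop. The following Python sketch is illustrative only, not the Investigator's implementation: the names PROCESSING_ORDER and process_record, the use of callables to stand for parameter lines, and the one-character-per-check flag string are all assumptions rather than the documented OUTREC_FNAME layout.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, str]
Check = Callable[[Record], bool]

# Parameter types in the order listed above (steps 2 to 7).
PROCESSING_ORDER = [
    "SINGLE_FIELD_LOOKUP",
    "MULTI_FIELD_LOOKUP",
    "FIELD_SCAN",
    "NUMERIC_RANGE_COMPARE",
    "FIELD_COMPARE",
    "ARITHMETIC_COMPARE",
]


def process_record(record: Record, checks: Dict[str, List[Check]],
                   freq_fields: List[str], counters: Dict[str, Counter]) -> str:
    """Run one input record (step 1) through steps 2 to 8.

    Hypothetical sketch: returns one '1'/'0' flag per check, in processing
    order, for the output writes in steps 9 and 10. The flag layout is an
    assumption, not the documented OUTREC_FNAME record format.
    """
    flags = []
    for param_type in PROCESSING_ORDER:
        for check in checks.get(param_type, []):
            flags.append("1" if check(record) else "0")
    for name in freq_fields:  # step 8: accumulate FREQUENCY counts
        counters.setdefault(name, Counter())[record.get(name, "")] += 1
    return "".join(flags)


# Checks are listed out of order on purpose; PROCESSING_ORDER still applies.
checks = {
    "NUMERIC_RANGE_COMPARE": [lambda r: 71250 <= int(r["product_code"]) <= 72750],
    "FIELD_SCAN": [lambda r: "INC." in r["addr_line1"]],
}
counters: Dict[str, Counter] = {}
record = {"addr_line1": "ACME INC.", "product_code": "80000", "city": "Boston"}
print(process_record(record, checks, ["city"], counters))
```

Because the loop walks PROCESSING_ORDER rather than the order in which checks were supplied, the FIELD_SCAN result always precedes the NUMERIC_RANGE_COMPARE result, mirroring steps 4 and 5.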

If you are running on an IBM mainframe, you must define the record length of the input file BEFORE running the file through the Investigator.


Investigator Parameters

Required parameters are marked REQUIRED in Table 1.1. A detailed description of each parameter begins on page 1-7, in the section “Investigator Parameter Descriptions.”

Table 1.1 Investigator Parameters

Parameter                Description
ARITHMETIC_COMPARE       Compares the numeric value contained in a field to the numeric value in another field, or to a numeric literal. See page 1-7.
FIELD_COMPARE            Compares a field to another field on the same record, or a field to a literal. See page 1-12.
FIELD_SCAN               Finds literal or field values in the specified field. See page 1-13.
FIELD_SCAN_TABLE         Names a file containing FIELD_SCAN entries. To keep the parameter file tidy, FIELD_SCAN entries can be moved into a user-defined file; FIELD_SCAN_TABLE references that file so that its entries are used as FIELD_SCAN entries. See page 1-15.
FREQ_ALL_FIELDS          Specifies default sorting options to be applied to all DDL fields. This results in frequency counts being reported for all fields in the input DDL file. See page 1-9.
FREQ_PER                 Displays frequencies as percentages as well as counts, based on the results of the FREQUENCY parameter. See page 1-10.
FREQUENCY                Counts the occurrences of data values in the specified fields, with optional sorting. See page 1-16.
INP_DDL01                File that contains the input DDL. REQUIRED
INP_FNAME01              Path and name of the input file. REQUIRED
INP_RNAME01              Name of the input record in the input DDL. REQUIRED
MAXIN01                  Numeric value that specifies the maximum number of records to process. This value should always be greater than the value in the START parameter.
MULTI_FIELD_LOOKUP       Finds up to two fields in an external, user-defined table. See page 1-17.
NTH_SAMP                 Numeric value that specifies a sample of records (every Nth record) from the input file to process. See page 1-20.
NUMERIC_RANGE_COMPARE    Determines whether a numeric field value falls between two other numeric values. See page 1-20.
OUTFLD_FNAME             Writes the data of a single field to an output file, along with the record number (record ID) and a five-byte field that contains the parameter number. See page 1-22.
OUTREC_FNAME             Creates an output file containing one output record for each input record. See page 1-23.
PRINT_NTH_COUNT          Prints to the screen in user-defined increments to show the execution progress of the Investigator program. See page 1-24.
SINGLE_FIELD_LOOKUP      Finds a single field in an external, user-defined table. See page 1-25.
START                    Specifies the record number at which to start processing. See page 1-27.
STAT_FNAME               Produces a file that contains the statistics from the run. See page 1-27.
TO_UPPER_FNAME           Translates characters to uppercase. The parameter names a table containing uppercase and lowercase recode information. See page 1-28.


Sample Investigator Parameter File

****************************************************************
* PFINDRV.PAR - Investigator driver parameter file             *
****************************************************************

ARITHMETIC_COMPARE "Field1","L","Field2","L","","EQ"
FIELD_COMPARE "Field1","R","Field2","R","","EQ"
FIELD_SCAN
FIELD_SCAN_TABLE "scan.txt"
FREQUENCY "field_name","MAC7"
FREQ_PER Y
INP_DDL01 "..\dict\input.ddl"
INP_FNAME01 "..\data\input.orig"
INP_RNAME01 "INPUT"
*MAXIN01
*MULTI_FIELD_LOOKUP
*NTH_SAMP
NUMERIC_RANGE_COMPARE
OUTFLD_FNAME
OUTREC_FNAME
PRINT_NTH_COUNT 100
SINGLE_FIELD_LOOKUP
START 1050
STAT_FNAME
TO_UPPER_FNAME "m819u819"


Investigator Parameter Descriptions

The following sections provide complete descriptions, syntax, and examples for each Investigator parameter.

ARITHMETIC_COMPARE Parameter

Compares a numeric field value to another numeric field value on the same record, or to a numeric literal. All fields must contain only positive numeric characters. All results are absolute values.

Syntax

ARITHMETIC_COMPARE [numeric field1], [Field operator1], [numeric field2], [Field operator2], [literal numeric], [comparison indicator] where:

numeric field1          First numeric field to compare
Field operator1         Operator to apply to numeric field1
numeric field2          Second numeric field, compared to numeric field1 (optional if argument5 is used)
Field operator2         Operator to apply to numeric field2
literal numeric         Literal numeric value to compare against (optional if argument3 is used)
comparison indicator    Comparison to make between the two values

Field Operators

L    Left-justify field (removes excess spaces between words)
R    Right-justify field (removes excess spaces between words)
Y    Left-pack field (removes all spaces)
Z    Right-pack field (removes all spaces)
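To make the four field operators concrete, here is a minimal Python sketch. It is not the Investigator's code: the function name apply_field_operator is hypothetical, and the sketch assumes that L and R collapse runs of internal spaces to a single space, that Y and Z remove every space, and that the result is padded back to the field's DDL width.

```python
def apply_field_operator(value: str, op: str, width: int) -> str:
    """Hypothetical sketch of the L, R, Y and Z field operators.

    Assumes the result is padded back to the field's DDL width.
    """
    if op in ("L", "R"):
        # Justify: collapse runs of spaces between words to a single space.
        text = " ".join(value.split())
    elif op in ("Y", "Z"):
        # Pack: remove every space.
        text = value.replace(" ", "")
    else:
        raise ValueError(f"unsupported field operator: {op!r}")
    # L and Y place the text on the left; R and Z place it on the right.
    return text.ljust(width) if op in ("L", "Y") else text.rjust(width)


for op in "LRYZ":
    print(op, repr(apply_field_operator("  12   MAIN  ST ", op, 16)))
```

Justifying and packing differ only in how internal spaces are treated; the left/right choice is applied afterward as padding, which is why each pair shares one code path.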


Comparison Indicators

GT    Greater than
GE    Greater than or equal to
LT    Less than
LE    Less than or equal to
EQ    Equal to
NE    Not equal to

Example

The following example compares ship_date and order_date and checks whether they are equal:

ARITHMETIC_COMPARE "ship_date","L","order_date","L","","EQ"

where:

ship_date     Numeric field to compare. REQUIRED
L             Left-justify the field specified in argument1 before comparing
order_date    Numeric field to compare against (optional if argument5 is used)
L             Left-justify the field specified in argument3 before comparing
""            Literal numeric value; not used in this example (optional if argument3 is used)
EQ            Comparison indicator that checks whether field1 and field2 are equal
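A minimal Python sketch of how an ARITHMETIC_COMPARE line could be evaluated once the field operators have been applied. The names COMPARATORS and arithmetic_compare are hypothetical; the sketch assumes the operands contain only digits (plus padding) and compares them as integers, and it does not attempt to model the note that results are absolute values.

```python
import operator

# Comparison indicators mapped to Python's relational operators.
COMPARATORS = {
    "GT": operator.gt,
    "GE": operator.ge,
    "LT": operator.lt,
    "LE": operator.le,
    "EQ": operator.eq,
    "NE": operator.ne,
}


def arithmetic_compare(value1, indicator, value2=None, literal=None):
    """Compare value1 with value2, or with literal when value2 is not given.

    Hypothetical sketch: values are digit strings, possibly padded with
    spaces by a field operator, and are compared as integers.
    """
    other = value2 if value2 is not None else literal
    if other is None:
        raise ValueError("either a second field value or a literal is required")
    return COMPARATORS[indicator](int(value1.strip()), int(other.strip()))


# ship_date and order_date values from one record, compared with EQ.
print(arithmetic_compare("20170105", "EQ", value2="20170105"))
```

Converting to integers before comparing is what distinguishes this from a plain string comparison: "150" is greater than "99" numerically but not lexically.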


FREQ_ALL_FIELDS Parameter

Specifies the default sorting options (any of the standard frequency options) to be applied to all DDL fields. These sorting options are identical to those of the FREQUENCY parameter.

If the FREQUENCY parameter is used for an individual field, it overrides the sorting options specified by FREQ_ALL_FIELDS for that field.

Syntax

FREQ_ALL_FIELDS [Field operator], [Sorting options]

Field Operators

Y    Enable this function
N    Do not enable this function

Operators

A    Count all values
M    Mask; for example, A=ALPHA, N=NUMERIC
N    Count blanks or zeros vs. non-blanks or non-zeros

The following sort options can be combined after any operator within the quotes to better organize the resulting frequency analysis values.

Sort Option 1

A    Sort ascending
D    Sort descending
U    Unsorted

Sort Option 2

C    Sort on the frequency count. Not available with sort option 1 ‘U’.
V    Sort on the field value. Not available with sort option 1 ‘U’.

Sort Option 3

Numeric value    Number of frequencies to return

Example

FREQ_ALL_FIELDS "Y","ADC5"

where:

Y    Enable this parameter
A    Count all values
D    Sort descending
C    Sort on the frequency count
5    Show only the first five frequencies

FREQ_PER Parameter

Controls whether frequency percentages are displayed based on the results from the FREQUENCY parameter.

Syntax

FREQ_PER [Field operator]

Field Operator

Y    Display frequency percentages
N    Do not display frequency percentages


Example 1

In this example, FREQ_PER displays frequency percentages based on the results of the FREQUENCY parameter.

FREQ_PER Y
FREQUENCY "city","ADC3"

where:

For FREQ_PER:
Y       Display frequency percentages

For FREQUENCY:
city    Count occurrences of the city field
A       Count all values
D       Sort output in descending order
C       Sort on the frequency count
3       Show data on the first three entries

The output from this example would look like this:

5 9.90%
Boston 7 8.30%
Chicago 1 7.70%

Example 2

In this example, FREQ_PER does not display frequency percentages for the FREQUENCY parameter:

FREQ_PER N
FREQUENCY "city","ADC5"

where:

For FREQ_PER:
N       Do not display frequency percentages

For FREQUENCY:
city    Count occurrences of the city field
A       Count all values
D       Sort output in descending order
C       Sort on the frequency count
5       Show data on the first five entries

Because the FREQ_PER parameter is set to N, the output shows counts only:

New York 25
Boston 17
Chicago 12


FIELD_COMPARE Parameter

Compares one field to another field on the same record or one field to a literal on the same record.

Syntax

FIELD_COMPARE [field1],[field operator1],[field2],[field operator2],[literal],[comparison indicator]

where:

field1                  First field to compare
field operator1         Operator to apply to field1
field2                  Second field, compared to field1 (optional if argument5 is used)
field operator2         Operator to apply to field2
literal                 Literal to compare field1 against (optional if argument3 is used)
comparison indicator    Comparison to make between the two values

Field Operators

L    Left-justify field (removes excess spaces between words)
R    Right-justify field (removes excess spaces between words)
Y    Left-pack field (removes all spaces)
Z    Right-pack field (removes all spaces)

Comparison Indicators

GT    Greater than
GE    Greater than or equal to
LT    Less than
LE    Less than or equal to
EQ    Equal to
NE    Not equal to


Example

The following example compares ship_date to order_date to determine whether ship_date is greater than order_date:

FIELD_COMPARE "ship_date","R","order_date","R","","GT"

where:

ship_date     Name of the field to compare. REQUIRED
R             Right-justify the field specified in argument1 before comparing
order_date    Name of the field to compare against (optional if argument5 is used)
R             Right-justify the field specified in argument3 before comparing
""            Literal; not used in this example (optional if argument3 is used)
GT            Comparison indicator that checks whether field1 is greater than field2

FIELD_SCAN Parameter

Scans a field for occurrences of a given data string (a literal or, with the F operator, the value of another field).

Syntax

FIELD_SCAN [sub-string],[field operator],[scanned data string],[location indicator]

where:

sub-string             Field (or substring) in which to scan for the data string
field operator         Operator to apply in the function
scanned data string    Data string to scan for
location indicator     Where in the field to look for the data string


Field Operators

Character 1

L    Left-justify field (removes excess spaces between words)
R    Right-justify field (removes excess spaces between words)
Y    Left-pack field (removes all spaces)
Z    Right-pack field (removes all spaces)
M    Convert field to mask values (all alphabetic characters are converted to A and all numeric characters to N) before scanning
U    Convert field to uppercase before scanning

Character 2

F    The value in argument3 is a field name, not a literal.

Location Indicators

B    Look only at the beginning of the field
E    Look only at the end of the field
D    Look anywhere in the field
T    Look for two separate occurrences of the value in the field

Example

The following example looks for occurrences of INC. in the addr_line1 field:

FIELD_SCAN "addr_line1","R","INC.","D"

where:

addr_line1    Name of the field to scan for the value in argument3
R             Right-justify the field
INC.          Literal to scan for in the field named in argument1 (a field name if the F operator is used)
D             Search anywhere in the field for INC.
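The four location indicators can be sketched in Python as below. This is an illustration under stated assumptions, not the Investigator's implementation: field_scan is a hypothetical name, the field operator is assumed to have been applied already, and padding around the value is ignored.

```python
def field_scan(field_value: str, target: str, location: str) -> bool:
    """Hypothetical sketch of the FIELD_SCAN location indicators.

    Assumes the field operator (for example R) has already been applied
    and that leading or trailing padding is not significant.
    """
    text = field_value.strip()
    if location == "B":
        return text.startswith(target)  # beginning of the field only
    if location == "E":
        return text.endswith(target)    # end of the field only
    if location == "D":
        return target in text           # anywhere in the field
    if location == "T":
        # Two separate (non-overlapping) occurrences of the value.
        return text.count(target) >= 2
    raise ValueError(f"unknown location indicator: {location!r}")


# The example above: INC. anywhere in a right-justified addr_line1.
print(field_scan("      ACME WIDGETS INC.", "INC.", "D"))
```

str.count counts non-overlapping matches, which is one reasonable reading of "two separate occurrences" for T.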


FIELD_SCAN_TABLE Parameter

To keep the parameter file tidy, you can move FIELD_SCAN entries out of the parameter file and into a user-defined file. The FIELD_SCAN_TABLE parameter references that file, and its entries are used as FIELD_SCAN entries.

Syntax

FIELD_SCAN_TABLE [table name]

Example

In the following example, FIELD_SCAN entries are stored in the user-defined file named scan.txt:

FIELD_SCAN_TABLE "scan.txt"


FREQUENCY Parameter

Counts the occurrences of data for specified fields, using sort options (if required).

Syntax

FREQUENCY [DDL field name to analyze], [Operator; Sort option 1; Sort option 2; Sort option 3]

Operators

A    Count all values
M    Mask (A=ALPHA, N=NUMERIC)
N    Count blanks or zeros vs. non-blanks or non-zeros

Sort Options

The following sort options can be combined after any operator within the quotes to better organize the resulting frequency analysis values.

Sort Option 1

A    Sort ascending
D    Sort descending
U    Unsorted

Sort Option 2

C    Sort on the frequency count
V    Sort on the field value
Not available with Sort Option 1 ‘U’.

Sort Option 3

Numeric    Number of frequencies to return


Example

FREQUENCY "field_name","MAC7"

where:

field_name    Input or output DDL field to analyze
M             Use the Mask function, which shows the pattern of alphabetic, numeric, and special characters in the field
A             Sort ascending
C             Sort on the frequency count
7             Show only the first seven frequencies

Whether or not frequency PERCENTAGES are displayed depends on whether FREQ_PER is set to Y or N.
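The option strings used by FREQUENCY and FREQ_ALL_FIELDS (for example "ADC5" or "MAC7") can be read as an operator, sort option 1, sort option 2, and a count. The Python sketch below is a hypothetical model of that behavior, not the Investigator's code. The function names, the treatment of characters other than letters and digits in a mask (left unchanged), the BLANK/ZERO and NON-BLANK labels for the N operator, and computing percentages over all values read are all assumptions. The show_percent argument stands in for FREQ_PER Y.

```python
import re
from collections import Counter


def mask(value: str) -> str:
    """Letters become A, digits become N; other characters are kept (assumption)."""
    return "".join("A" if c.isalpha() else "N" if c.isdigit() else c for c in value)


def frequency(values, options: str, show_percent: bool = False):
    """Hypothetical model of a FREQUENCY option string such as "ADC5".

    Character 1 is the operator (A, M or N); character 2 is sort option 1
    (A, D or U); character 3 is sort option 2 (C or V, omitted after U);
    trailing digits give the number of frequencies to return.
    """
    match = re.fullmatch(r"([AMN])(U|[AD][CV])(\d*)", options)
    if match is None:
        raise ValueError(f"unrecognized option string: {options!r}")
    op, sort_spec, limit = match.groups()
    if op == "M":
        keys = [mask(v) for v in values]
    elif op == "N":
        # Hypothetical labels for the blank/zero versus non-blank split.
        keys = ["BLANK/ZERO" if v.strip(" 0") == "" else "NON-BLANK" for v in values]
    else:
        keys = list(values)
    # Counter keeps first-seen order, which stands in for "unsorted" (U).
    rows = list(Counter(keys).items())
    if sort_spec != "U":
        by = 1 if sort_spec[1] == "C" else 0  # C = count, V = value
        rows.sort(key=lambda row: row[by], reverse=sort_spec[0] == "D")
    if limit:
        rows = rows[: int(limit)]
    if show_percent:  # stands in for FREQ_PER Y
        total = len(keys)
        return [(key, n, round(100.0 * n / total, 2)) for key, n in rows]
    return rows


cities = ["Boston", "New York", "Boston", "New York", "Chicago", "New York"]
print(frequency(cities, "ADC3", show_percent=True))
```

Sorting happens before the count limit is applied, so "ADC3" returns the three most frequent values rather than the first three values seen.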

MULTI_FIELD_LOOKUP Parameter

Uses a user-definable multi-field table to perform a character lookup on one or two fields. The fields being looked up do not need to be adjacent to each other.

MULTI_FIELD_LOOKUP is useful for examining inconsistencies in your data, and changes that might need to be made during further processing with the Converter. For example, if (a) occurs and (b) occurs, then change (c).

Syntax

MULTI_FIELD_LOOKUP [field1], [Field operator1], [field2], [Field operator2], [table name]

where:

field1             First field to find in the table
Field operator1    Operator to apply to field1
field2             Second (optional) field to add to argument1 for the table lookup
Field operator2    Operator to apply to field2
table name         Name of the table that contains the lookup values


Field Operators

L    Left-justify field (removes excess spaces between words)
R    Right-justify field (removes excess spaces between words)
Y    Left-pack field (removes all spaces)
Z    Right-pack field (removes all spaces)
M    Convert field to mask values (all alphabetic characters are converted to A and all numeric characters to N) before the lookup
U    Convert field to uppercase before the lookup

Example

In the following example, MULTI_FIELD_LOOKUP left-justifies addr_line3 and postcode, converts both fields to masks, and looks up the combined masks in the table. If both masks match a table entry, the mask shape of addr_line3 is reset according to the table specifications:

MULTI_FIELD_LOOKUP "addr_line3","LM","postcode","LM","../tables/addr_line3.recode"

where:

addr_line3                     Name of the first field to find in the table
LM                             Left-justify the field specified in argument1 and convert it to a mask before lookup
postcode                       Name of a second (optional) field to add to argument1 for lookup
LM                             Left-justify the field specified in argument3 and convert it to a mask before lookup
../tables/addr_line3.recode    Name of the table that contains lookup values

Table Specifications

The lookup table must have data for at least the length of the field(s) specified in argument1 and argument3.


Sample Multi-Field Table # 1

Assuming the following input:

Field1 (addr_line3) = 12345 SMITH AVENUE
Field2 (postcode) = 12345

This entry would test whether Field1 (addr_line3) has a mask of NNNNN AAAAA AAAAAA and Field2 (postcode) has a mask of NNNNN.

***********************************************
* position 1-30 addr_line3 (mask)
* position 31-35 postcode (mask)
***********************************************
*2345678901234567890123456789012345
NNNNN AAAAA AAAAAA            NNNNN
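The sample entry above can be read as a fixed-position key. The Python sketch below is a hypothetical illustration, not the Investigator's implementation. It assumes that the key is the first field's mask padded to positions 1-30 followed by the second field's mask in positions 31-35, that lines beginning with * are comments, and that only the mask (M) conversion is applied; load_table, to_mask, and multi_field_hit are illustrative names.

```python
def to_mask(value: str) -> str:
    # M operator: alphabetic -> 'A', numeric -> 'N', other characters unchanged.
    return "".join("A" if c.isalpha() else "N" if c.isdigit() else c for c in value)


def load_table(lines: list[str]) -> set[str]:
    # Skip blank lines and comment lines that begin with '*'.
    return {ln.rstrip() for ln in lines if ln.strip() and not ln.startswith("*")}


def multi_field_hit(field1: str, field2: str, table: set[str],
                    width1: int = 30, width2: int = 5) -> bool:
    # Assumed key layout: field 1 in positions 1-30, field 2 in positions 31-35.
    key = (to_mask(field1.strip()).ljust(width1)[:width1]
           + to_mask(field2.strip()).ljust(width2)[:width2])
    return key.rstrip() in table


# The sample table from the manual, with its 30-position first column.
SAMPLE_TABLE = [
    "***********************************************",
    "* position 1-30 addr_line3 (mask)",
    "* position 31-35 postcode (mask)",
    "***********************************************",
    "*2345678901234567890123456789012345",
    "NNNNN AAAAA AAAAAA" + " " * 12 + "NNNNN",
]
```

With these definitions, multi_field_hit("12345 SMITH AVENUE", "12345", load_table(SAMPLE_TABLE)) returns True.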


NTH_SAMP Parameter

Processes a selected sample of records from a file.

Syntax

NTH_SAMP [numerical value of records to process]

Example NTH_SAMP 25 where:

25 Sampling interval; in this case, every 25th record read is processed
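If record numbers start at 1 and every Nth record read is processed (records 25, 50, 75, and so on for NTH_SAMP 25), sampling reduces to a modulo test. This is a minimal Python sketch under that assumption; nth_sample is a hypothetical name.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def nth_sample(records: Iterable[T], n: int) -> Iterator[tuple[int, T]]:
    """Yield (record_number, record) for every nth record read, numbering from 1."""
    if n < 1:
        raise ValueError("NTH_SAMP value must be a positive integer")
    for number, record in enumerate(records, start=1):
        if number % n == 0:
            yield number, record
```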

NUMERIC_RANGE_COMPARE Parameter

Analyzes a particular field of numeric data and determines whether its numeric values fall between two other values.

Syntax

NUMERIC_RANGE_COMPARE [numeric field1], [Numeric value1],[Numeric value2] where:

numeric field1 Name of the numeric field to test against the range
numeric value1 Numeric value that specifies the low end of the range
numeric value2 Numeric value that specifies the high end of the range


Example The following example checks whether the value of the product_code field is within the range 71250 to 72750:

NUMERIC_RANGE_COMPARE "product_code","71250","72750" where:

product_code Name of the numeric field to test against the range
71250 Numeric value that specifies the low end of the range
72750 Numeric value that specifies the high end of the range
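The range test can be sketched as follows. The manual does not say whether the end points are included or how non-numeric values are treated, so this Python sketch assumes an inclusive range and treats a blank or non-numeric field as not in range. The function name is hypothetical.

```python
def numeric_range_hit(field_value: str, low: str, high: str) -> bool:
    """Return True if field_value lies in [low, high] (inclusive bounds are an assumption)."""
    try:
        value = float(field_value.strip())
    except ValueError:
        return False  # assumption: blank or non-numeric data is never a hit
    return float(low) <= value <= float(high)
```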


OUTFLD_FNAME Parameter

Specifies the name of the output file that receives one record for each criterion hit. For each hit, the program writes the data of a single field to the output file, along with the record number (record ID) and a five-byte field containing the parameter number. The parameter number identifies the parameter in the parameter file that is associated with the data. Only true conditions are reported to this file, so you can use it to locate specific data conditions and collect them into a single file.

Syntax

OUTFLD_FNAME [file name]

Example This output file contains one record for each condition met as specified in the following parameters:

SINGLE_FIELD_LOOKUP MULTI_FIELD_LOOKUP FIELD_SCAN NUMERIC_RANGE_COMPARE FIELD_COMPARE ARITHMETIC_COMPARE

The specified file layout is as follows:

Field 1: Number of the parameter line, as shown in the stat file (5 bytes).

Field 2: Number of the record on which the field in question falls (12 bytes).

Field 3: The field from argument 1 as it appears on the input file, truncated or expanded to 100 bytes (100 bytes).

Each record is always 117 bytes (5 + 12 + 100). The field from argument 1 is truncated or expanded as needed so that the record is exactly 117 bytes.
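The 117-byte layout can be illustrated by building one record. The manual does not specify padding, so this Python sketch assumes that the parameter number and record number are right-justified and zero-filled and that the field data is left-justified and space-padded. format_outfld_record is a hypothetical name.

```python
PARM_WIDTH, RECNO_WIDTH, DATA_WIDTH = 5, 12, 100


def format_outfld_record(parm_number: int, record_number: int, field_data: str) -> str:
    """Build one OUTFLD_FNAME record; the padding style is an assumption."""
    parm = str(parm_number).zfill(PARM_WIDTH)       # 5 bytes
    recno = str(record_number).zfill(RECNO_WIDTH)   # 12 bytes
    data = field_data[:DATA_WIDTH].ljust(DATA_WIDTH)  # truncate or expand to 100 bytes
    return parm + recno + data                      # always 117 bytes
```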


OUTREC_FNAME Parameter

Specifies the full record output file. In that file, the system creates one record of output for each record of input. The entire record is written along with the record number (record ID) and a field containing the status of every Investigator parameter that is enabled.

The beginning of the record contains a single byte for each Investigator parameter, indicating the true or false status of the parameter. These single-byte fields can then be interrogated as a string to determine whether specific combinations of data exist.

Syntax

OUTREC_FNAME [file name]

Example The output file layout is as follows:

Field 1: One one-byte field for each parameter line entry from the following parameters:

SINGLE_FIELD_LOOKUP MULTI_FIELD_LOOKUP FIELD_SCAN NUMERIC_RANGE_COMPARE FIELD_COMPARE ARITHMETIC_COMPARE

Each one-byte field contains a '1' if the parameter line conditions were met, or a '0' if they were not.

Field 2: The number of the record read (12 bytes).

Field 3: The entire record as it appears on input.

The record length of this file is the sum of the lengths of all three fields.


Example The following example determines record length:

If the following parameters contained a total of 10 lines, the first field would be 10 bytes (one byte per parameter line, whether or not its conditions are met).

SINGLE_FIELD_LOOKUP MULTI_FIELD_LOOKUP FIELD_SCAN NUMERIC_RANGE_COMPARE FIELD_COMPARE ARITHMETIC_COMPARE

The second field is always 12 bytes.

The third field is the entire input record (1000 bytes in this example).

10 bytes + 12 bytes + 1000 bytes = 1022 bytes, the record length of the file.
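The record-length arithmetic and the flag string at the start of each record can be sketched in Python. This sketch assumes that the flags appear in parameter-line order (as numbered in the stat file legend) and that the 12-byte record number is zero-filled. Both function names are hypothetical.

```python
def outrec_length(parameter_lines: int, input_record_length: int) -> int:
    # One flag byte per parameter line + 12-byte record number + full input record.
    return parameter_lines + 12 + input_record_length


def build_outrec_record(hits: list[bool], record_number: int, record: str) -> str:
    # '1' where the parameter line's conditions were met, '0' otherwise.
    flags = "".join("1" if hit else "0" for hit in hits)
    return flags + str(record_number).zfill(12) + record
```

Because the flags form a string prefix, a combination such as "parameter 1 true and parameter 2 false" can be tested with record[:2] == "10".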

PRINT_NTH_COUNT Parameter

Prints to the screen in user-defined increments to show the execution progress of the Investigator program.

Syntax

PRINT_NTH_COUNT [numerical value of record increments]

Example PRINT_NTH_COUNT 100 where:

100 Record increment to print to the screen


SINGLE_FIELD_LOOKUP Parameter

Uses a user-definable table and a field operator to search for the value of a single data field in the table.

Syntax

SINGLE_FIELD_LOOKUP [field name], [Field operator], [table name] where:

field name Name of the field to look up in the table
Field operator Operator to use in the function
table name Name of the table to use for the lookup

Field Operators

L Left-justify field (removes excess spaces between words)
R Right-justify field (removes excess spaces between words)
Y Left-pack field (removes all spaces)
Z Right-pack field (removes all spaces)
M Convert field to mask values (all alpha characters are converted to an A and all numeric characters are converted to an N) before the lookup
U Convert field to uppercase before the lookup


Example

The following example looks for the field product_code in prod.table:

SINGLE_FIELD_LOOKUP "product_code","L","../tables/prod.table" where:

product_code Name of the field to look up in the table
L Left-justify the field
../tables/prod.table Path and name of the table used to look up the field value (product_code)

Sample User-Defined Table

In this example, the table below contains product_code field values 025, 125, 126, and 127:

************************************************
* Sample User-Defined Table
* This table contains fields defined by the user.
* The following fields are product_code field values.
************************************************
025
125
126
127

Table Specifications

The lookup table must have data for at least the length of the field specified in argument1.
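A single-field lookup against a table like the sample above might work as follows. This Python sketch assumes that comment lines begin with *, that each remaining line holds one table value, and that the field is left-justified (the L operator) before the comparison. The function names are hypothetical.

```python
import re

SAMPLE_TABLE = """\
************************************************
* Sample User-Defined Table
* This table contains fields defined by the user.
* The following fields are product_code field values.
************************************************
025
125
126
127
"""


def load_lookup_table(text: str) -> set[str]:
    """Return the table values, skipping blank lines and '*' comment lines."""
    return {
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("*")
    }


def single_field_hit(field_value: str, table: set[str]) -> bool:
    # L operator: drop leading/trailing spaces and collapse runs of spaces to one.
    left_justified = re.sub(r" +", " ", field_value.strip())
    return left_justified in table
```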


START Parameter

Adds the ability to begin processing on a particular record number. The Investigator starts on that record number and goes to the end or, if the MAXIN parameter was used, to that many records only.

Syntax

START [begins processing at this record]

Example START 12450 where:

12450 Record from which processing begins

STAT_FNAME Parameter

Produces a report file that contains a parameter list legend, frequencies, and input/output totals for the run of the program.

A legend in the stat file shows the numeric order of the single-byte fields written to the file defined by the OUTREC_FNAME parameter; the same numbers appear as the parameter numbers in the file defined by OUTFLD_FNAME. Parameters are listed in the stat file in the order in which they are executed, which tells you the number assigned to each parameter.

Syntax

STAT_FNAME [report file name]

Example STAT_FNAME "../data/stat2"


TO_UPPER_FNAME Parameter

Translates characters to uppercase. This parameter names the table used by the recoding function; the table contains the uppercase/lowercase translation information.

Syntax

TO_UPPER_FNAME [path and name of the table that contains special uppercase translation information]

Example The following example translates Brian McVay (mixed-case) to BRIAN MCVAY (all uppercase):

TO_UPPER_FNAME "..\tables\m819u819" where:

..\tables\m819u819 Table that contains the character set translations, located in the Trillium Software System \tables directory.
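The file format of m819u819 is not described here. Assuming the name refers to code page 819 (ISO 8859-1, also called Latin-1), the effect of an uppercase translation table can be approximated in Python with a 256-entry byte table. build_upper_table is hypothetical, and this is not how the Trillium table is stored.

```python
def build_upper_table() -> bytes:
    """256-entry byte map that uppercases Latin-1 characters where possible."""
    table = bytearray(range(256))  # start with the identity mapping
    for code in range(256):
        upper = bytes([code]).decode("latin-1").upper()
        # Keep the identity mapping when the uppercase form is more than one
        # character (e.g. 'ß' -> 'SS') or falls outside Latin-1 (e.g. 'ÿ').
        if len(upper) == 1:
            try:
                table[code] = upper.encode("latin-1")[0]
            except UnicodeEncodeError:
                pass
    return bytes(table)


UPPER = build_upper_table()
```

For example, b"Brian McVay".translate(UPPER) gives b"BRIAN MCVAY".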


Sample Investigator Output Statistics

INPUT TOTALS FOR FILE = ../data/invdata    148 RECORDS READ

PARAMETER LEGEND AND TOTALS

MULTI_FIELD_LOOKUP

NUMBER  PARAMETER                                                        COUNT

1       "proc_amt_range","L","med_proc","L","../tables/Field1.recode"    28
------SUBTOTAL OF HITS FOR MULTI_FIELD_LOOKUP                            28

INV_RANGE_RECODE

NUMBER  PARAMETER                                                        COUNT

2       "date_usa_mmdd","0500","1200"                                    86

------SUBTOTAL OF HITS FOR NUMERIC_RANGE_COMPARE                         86

------TOTAL NUMBER OF HITS                                               114

OUTPUT TOTALS FOR FILE = ../data/outrec2    148 RECORDS ELIGIBLE FOR OUTPUT

OUTPUT TOTALS FOR FILE = ../data/outfld2    114 RECORDS ELIGIBLE FOR OUTPUT


Error Messages

Table 1.2, "CFINVDRV Error Codes," describes the error messages returned by the CFINVDRV driver program.

Table 1.2 CFINVDRV Error Codes

Error Message Description

Must have a parm file.
The parameter file is missing from the command line.

Parm Processing Error, status = 2.
The parameter file is present but cannot be accessed or is corrupt. Check the path and/or file name.

Parm Processing Error, status = 4
The program encountered an error with a parameter entry. Use the parameter echo debugging process to determine which entry is incorrect.

Unable to open statistics file .
The file in STAT_FNAME is invalid, or permissions on the file prevent the existing file from being overwritten. Check the path and/or file name.

Unable to open the file .
The file in INP_DDNAME cannot be opened.

Can't open field output file.
The output file in OUTFLD_FNAME is invalid or the file contents are corrupt. Check the path and/or file name.

Missing parameter INP_DDL01.
A DDL file has not been specified in INP_DDL01. This is a required parameter for the Investigator. Check the path and/or file name.

Data Dictionary open problems.
The DDL specified in INP_DDL01 cannot be opened. Run the check syntax utility and check the path and/or file name.

Can't open record output file.
The output file specified in OUTREC_FNAME is invalid or the file contents are corrupt. Check the path and/or file name.

Can't open input file .
The input file specified in INP_FNAME01 is invalid or the file contents are corrupt. Check the path and/or file name.


Can't open the file .
The file specified in SINGLE_FIELD_LOOKUP, FIELD_SCAN_TABLE, or MULTI_FIELD_LOOKUP is invalid or the file contents are corrupt.

Maximum number of input frequencies supported is, .
The maximum number of frequencies has been exceeded.

The maximum frequency length for input is .
The length of a frequency value exceeded the maximum of 199 characters.

A TYPE is required for frequency.
You must specify the type of value that the frequency is to count. Valid values are "A" = Any, "M" = Mask, and "N" = Count blanks or zeros.

Only A, N or M is supported for frequency.
The only values that the frequency parameter supports are "A" = Any, "M" = Mask, and "N" = Count blanks or zeros.

The maximum number of input translates supported is exceeded.
The number of fields in the FREQUENCY parameter exceeds the maximum of 1000.

I/O error during read on
The temp directory that holds overflow for this process is out of space.

Unable to write to the output file.
The output file specified in OUTFLD_FNAME is invalid or the file contents are corrupt. Check the path and/or file name.

Unable to allocate memory.
The memory allocation defined for the input buffer is not sufficient.

The parm, TO_UPPER_FNAME, is required for the U operator.
A table file is not specified in TO_UPPER_FNAME. This parameter is required for the U operator. Check the path and/or file name.

The maximum number of single field lookups supported is exceeded
The maximum number of fields to be looked up in the specified lookup table has been exceeded.

The maximum number of field scans supported is exceeded
The maximum number of literals or field values to be scanned for has been exceeded.


Maximum number of arithmetic compares is exceeded.
The maximum number of numeric fields or numeric literals to be compared has been exceeded.

The maximum number of field compares supported is exceeded
The maximum number of fields or literals to be compared has been exceeded.

Maximum number of numeric range compares supported is exceeded.
The maximum number of numeric field values to be checked for falling between two other numeric values has been exceeded.

The maximum number of output frequencies supported is exceeded
The maximum number of output frequencies has been exceeded.

Maximum frequency length for output is exceeded.
The maximum size for a frequency has been exceeded.


Running the Investigator

This section describes how to run the Investigator on UNIX and 32-bit PC platforms. To execute the cfinvdrv program, use the following command-line syntax:

cfinvdrv -pf [parameter file] -pe [parameter echo file] where:

cfinvdrv Investigator driver program
-pf Keyword that indicates the parameter file follows
[parameter file] Investigator driver parameter file
-pe Keyword that indicates the parameter echo file follows
[parameter echo file] File that displays any parameter processing errors in the program listing file
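If you drive the Investigator from a script, you can assemble the command line programmatically. This Python sketch assumes that cfinvdrv is on the PATH. The file names shown in the usage note are hypothetical placeholders, not names taken from this guide.

```python
import subprocess
from pathlib import Path


def build_investigator_command(parm_file: Path, echo_file: Path) -> list[str]:
    # -pf introduces the driver parameter file; -pe introduces the parameter echo file.
    return ["cfinvdrv", "-pf", str(parm_file), "-pe", str(echo_file)]


def run_investigator(parm_file: Path, echo_file: Path) -> int:
    """Run cfinvdrv and return its exit status (assumes cfinvdrv is on the PATH)."""
    result = subprocess.run(build_investigator_command(parm_file, echo_file))
    return result.returncode
```

For example, run_investigator(Path("invest.par"), Path("invest.echo")) runs the driver. If parameter processing fails, the echo file is the first place to look.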

CHAPTER 2 Converter

The Converter is a powerful tool that moves data from input to output based on field names in the Data Dictionary Language (DDL) files. If a field name exists on both the input and output DDL, the Converter moves data from input to output based on the position specified in the output DDL.
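The name-matching rule can be sketched in Python. In this sketch each DDL is modelled as a dictionary that maps field names to one-based (start, length) positions, and values are truncated or space-padded to the output length. This representation and the name convert_record are hypothetical; they do not describe the DDL file format.

```python
Layout = dict[str, tuple[int, int]]  # field name -> (one-based start, length)


def convert_record(record: str, input_ddl: Layout, output_ddl: Layout) -> str:
    """Move every field whose name appears in both DDLs to its output position."""
    out_len = max(start + length - 1 for start, length in output_ddl.values())
    out = [" "] * out_len
    for name, (out_start, out_length) in output_ddl.items():
        if name not in input_ddl:
            continue  # output fields with no matching input field stay blank
        in_start, in_length = input_ddl[name]
        value = record[in_start - 1 : in_start - 1 + in_length]
        value = value[:out_length].ljust(out_length)  # truncate or pad
        out[out_start - 1 : out_start - 1 + out_length] = value
    return "".join(out)
```

Usage: with input_ddl = {"name": (1, 10), "city": (11, 8)} and output_ddl = {"city": (1, 8), "name": (9, 5), "zip": (14, 5)}, the record "JOHN SMITHBOSTON  " becomes "BOSTON  JOHN" followed by six spaces. The name is truncated to five bytes, and zip stays blank because it has no input counterpart.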


The Converter lets you convert and merge records from up to ten input files into a single, standard format. The Converter simplifies data conversion by enabling you to:

Convert character formats

Change integers to characters

Modify and adjust field lengths

Handle literal constants and increasing values

The Converter can also:

Perform data audits on input and output fields by providing frequency statistics.

Recode character fields with a user-defined external table

Identify and separate records that fail the conversion process so that they can be examined more closely.

See the "Data Dictionary Language" section of the Trillium Software System® Control Center manual (in the section on the DDL Editor tool) for more information about creating and using DDLs.


Input/Output Resources

The driver that invokes the Converter is named cfcondrv. This diagram illustrates the job flow of data through the Converter.

The Converter uses the following input and output files:

Input

Driver parameter file pfcondrv.par (See “Converter Parameters” on page 2-4 for more information about the parameters used in this file.)

Input DDL input1.ddl (one DDL per input file)

Input files Up to 10 input files can be used as input to the Converter.

Output

Output DDL convout.ddl

Output File convout; this file is commonly used as the input file to the Parser.

Output statistics file File containing statistics on the functions performed.


Converter Parameters

This section describes the Converter parameters. REQUIRED parameters are identified as REQUIRED in the table.

INP parameters apply to specified fields on all input DDLs. These functions are performed on the input record prior to movement of data to output.

All field names used in INP parameters MUST be in the input DDL.

OUT parameters apply to specified fields on the output DDL. These functions are performed on the output record after data has been moved from input to output.

All field names used in OUT parameters MUST be in the output DDL.

Table 2.1 Converter Parameters

Parameter Description

FREQ_PER Controls whether frequency percentages are displayed, based on the results of INP_FREQ01-10 and/or OUT_FREQ. See page 2-13 for more information.

INP_DDL01-INP_DDL10 Names of input record DDL files 1-10. At least one DDL is REQUIRED.

INP_FIELD_TRAN01-INP_FIELD_TRAN10 Translates an entire field from one character set to another with case translation. The table file name must be specified in the INP_FIELD_TRAN_FNAME parameter. Trillium Software currently provides 36 different tables for character set translation. See page 2-15 for more information.

INP_FIELD_TRAN_FNAME Specifies the name of the look-up table used in character set translations. See page 2-16 for more information.

INP_FNAME01E-INP_FNAME10E Name of the file containing records that did not pass conversion (same shape as the input record).

INP_FNAME01-INP_FNAME10 Names of input data files 1-10. At least one file is REQUIRED.


INP_FREQ01-INP_FREQ10 Invokes frequency counts on the specified input fields. Records occurrences of literal data strings, mask shapes, or populated/blank fields in the data, as well as counts for each choice. See page 2-31 for more information.

INP_RNAME01-INP_RNAME10 Record names of input record DDLs 1-10. At least one is REQUIRED.

INP_TRAN01-INP_TRAN10 Translates a specific string or hex value to another string or hex value on an "every occurrence basis." See page 2-16 for more information. If a field name is omitted during a translation of multiple fields, the Converter applies the previous field name in its place.

LENGTH_OVERRIDE Allows a field to be moved to a field of shorter length. See page 2-17.

MAXIN01-MAXIN10 Maximum number of records to read from input files 1-10. This value must always exceed the value in the START parameter.

NTH_SAMP Numeric value that indicates an incremental sample of records to take from a file for processing or testing. See page 2-18 for more information.

ORDER_OF_OPERATIONS Specifies an alternative to the Converter's default order of operations. See page 2-18 for more information.

OUT_ARITHMETIC_COMPARE Compares one field to another field or to a literal value on the same record. See page 2-19 for more information. All fields must contain positive numeric characters, and all results are absolute values.

OUT_ARITHMETIC_RECODE Performs a character-based comparison of output fields and values using a special value comparison operator (such as GT = greater than, GE = greater than or equal to, and so on). See page 2-21 for more information.


OUT_BUILD_OR_LIST Populates an output field (target field) based on criteria and contents of an input field (source field). Also performs uppercase and lowercase functions. See page 2-23 for more information.

OUT_CHANGE_RECODE Analyzes a particular field of data and changes a value without using user-defined tables. See page 2-27 for more information.

OUT_CHANGE_TABLE Allows OUT_CHANGE_RECODE entries to be moved from the parameter file into a user-defined file. The Converter references that external file when it processes OUT_CHANGE_RECODE entries. See page 2-29 for more information.

OUT_DDL Name of the output record DDL file. REQUIRED.

OUT_FIELD_COMPARE Compares one field to another field on the same record, or one field to a literal on the same record. See page 2-29.

OUT_FNAME Name of the output data file. REQUIRED.

OUT_FREQ Records occurrences of literal data strings, mask shapes, or populated/blank fields in the data, as well as counts for each choice. See page 2-31 for more information. DDL field names must be enclosed in quotes.

OUT_MULTI_RECODE Uses a user-definable field table to perform one- or two-field character lookups on the data in order to recode one or two other fields. See page 2-34 for more information.

OUT_RANGE_RECODE Analyzes a particular field of numeric data and determines whether its numeric values fall between two values. If a value falls between the low value and the high value, the field is recoded to another defined value. See page 2-37 for more information.

OUT_RECODE Uses a recode table to change data values based on actual values or mask shapes. See page 2-39 for more information.

OUT_RNAME Record name of the output record DDL. REQUIRED.


OUT_SCAN_RECODE Scans a field for occurrences of a particular string and either deletes the string or transfers it to another field. See page 2-41 for more information.

OUT_SCAN_TABLE Allows OUT_SCAN_RECODE parameter entries to be moved into a separate table to keep the parameter file uncluttered. The table defined here contains OUT_SCAN_RECODE entries as they would appear in the parameter file. See page 2-46 for more information.

PRINT_NTH_COUNT Prints to screen a numeric value that indicates the conversion progress in user-defined increments.

SELBYP_PARMNAME Name of the parameter file that contains the rules for applying bypass and select logic. See “Using Record Select and Bypass Functionality with the Converter” on page 2-8.

SELBYP_LOGFNAME Name of the log file that contains the statistics of the applied bypass and select rules for the processed data. See “Using Record Select and Bypass Functionality with the Converter” on page 2-8.

SOURCE_FLD Indicates the name of the field in the DDL that receives the file identifier. Used when running the Converter with multiple input files. Works in conjunction with the SRC_ID01-SRC_ID10 parameters.

SRC_ID01-SRC_ID10 Specifies the literal values used as identifiers for input files 1-10.

START Begins conversion at a particular record number. If you also use the MAXIN parameter, make sure its value exceeds the value specified here.

STAT_FNAME Produces a report with frequencies and input/output totals.

TO_LOWER_FNAME File name of the table that contains the lowercase translate information. See page 2-47 for details.

TO_UPPER_FNAME File name of the table that contains the uppercase translate information. See page 2-48 for details.


UPLOW_FNAME File name of the table that contains exceptions to standard casing rules. See page 2-49 for details.
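Several Table 2.1 entries are REQUIRED, and MAXIN must exceed START, so a pre-flight check of a parameter file can catch mistakes before a run. This Python sketch is hypothetical and is not part of the Trillium Software System. It assumes one parameter per line, that lines beginning with * are comments, and that the parameter name is the first whitespace-separated token.

```python
REQUIRED = ("OUT_DDL", "OUT_FNAME", "OUT_RNAME")
REQUIRED_ONE_OF = ("INP_DDL", "INP_FNAME", "INP_RNAME")  # at least one of 01-10


def check_converter_parms(text: str) -> list[str]:
    """Return a list of problems found in a Converter parameter file (sketch)."""
    parms: dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("*"):
            continue  # skip blank lines and comments
        name, *rest = stripped.split(None, 1)
        parms[name] = rest[0] if rest else ""
    problems = [f"missing REQUIRED parameter {p}" for p in REQUIRED if p not in parms]
    for prefix in REQUIRED_ONE_OF:
        if not any(f"{prefix}{n:02d}" in parms for n in range(1, 11)):
            problems.append(f"need at least one of {prefix}01-{prefix}10")
    start = int(parms.get("START") or 0)
    for n in range(1, 11):
        maxin = parms.get(f"MAXIN{n:02d}")
        if maxin and start and int(maxin) <= start:
            problems.append(f"MAXIN{n:02d} ({maxin}) must exceed START ({start})")
    return problems
```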

Using Record Select and Bypass Functionality with the Converter

The SELBYP_PARMNAME and SELBYP_LOGFNAME parameters define a subset of parameters that filter data as part of the data conversion process. By including these parameters in the batch driver parameter file, you can define criteria for selecting and/or bypassing specified records to filter data efficiently with the Converter (CFCONDRV).

This functionality can be enabled only by editing the parameter file directly. The following parameters must be entered into the parameter file manually.

SELBYP_PARMNAME Parameter file that contains the rules for the bypass and select logic. These rules are kept in a special rules file.

SELBYP_LOGFNAME Log file that will contain the statistics of the applied bypass and select rules for the data processed.


Rules File Subset Parameters and Syntax Rules

The external rule file defined by the SELBYP_PARMNAME parameter includes instructions that enable the select and/or bypass functionality. This rule file can contain any of the parameter entries defined below.

Parameter Description

INPUT_SELECT Used to select a record for subsequent processing. Must contain the keyword LIST. Must always be followed by one of the associated FNAME parameters. For example: INPUT_SELECT LIST INP_FNAME01 NODE (line_02 NESC 'BOSTON')

INP_FNAME01 Contains both input select and input bypass logic. Must contain the keyword NODE, followed by the comparison rule(s). Multiple rules are allowed. Up to 10 input parameters can be defined. Specifies the position and length, or the DDL field name, of the field from the associated input file to scan for a literal string. The string must be enclosed in single quotation marks (' '). For example: INP_FNAME01 NODE (line_02 NESC 'BOSTON')

INPUT_BYPASS Used to bypass a record so that no processing occurs. Must contain the keyword LIST. Must always be followed by one of the associated FNAME parameters. For example: INPUT_BYPASS LIST INP_FNAME01 NODE (line_02 NESC 'BOSTON')


OUTFILES Must contain the keyword LIST. Must always be followed by the OUT_FNAME parameter, containing select logic commands. For example: OUTFILES LIST OUT_FNAME NODE SELECT=(1,3 EQ 'JOY'), FILE='..\data\out.data'

OUT_FNAME Contains select logic, as well as the name of the file in which to store processed records. Must always contain the keyword NODE, followed by the selection keyword, followed by the logic statement. Specifies the position and length, or the DDL field name, of the field to scan for a literal string. The string must be enclosed in single quotation marks (' '). For example: OUT_FNAME NODE SELECT=(1,3 eq 'joy'), FILE='...\data\out1.data'

Positions and lengths are both one-based.

Comparison Operators

The following table lists the supported operators that can be used in rule file parameter entries.

Operator Definition
EQ EQual to
OR OR condition
AND AND condition
GE Greater than or Equal to
GT Greater Than
LE Less than or Equal to
LT Less Than
EQSC EQual SCan; scans the entire length of the field for the literal (true if the literal is found)
NESC Not Equal SCan; scans the entire length of the field for the literal (true if the literal is not found)


Rule File Entries Examples

Example 1 The following INPUT_SELECT example selects only records in which the name 'JOHN' appears in the DDL field line_01. The EQSC operator scans the entire field for the string 'JOHN' and, if it is found, selects the record for further processing:

INPUT_SELECT LIST INP_FNAME01 NODE (line_01 EQSC 'JOHN')

Example 2 The following INPUT_BYPASS example bypasses all records from the first input file (defined in the Converter driver parameter file) that meet any of the following criteria:

1. The DDL field line_01a contains the text MR JOHN C.
2. Position 1 for two bytes contains NR, or position 3 for two bytes contains HN.
3. Position 1 for two bytes contains MR.

INPUT_BYPASS LIST INP_FNAME01 NODE (line_01a EQ 'MR JOHN C') NODE (1,2 EQ 'NR' or 3,2 EQ 'HN') NODE (1,2 EQ 'MR')

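Example 3 The following INPUT_BYPASS example bypasses all records from the first input file that meet any of the following criteria:

1. Position 47 for 12 bytes contains SEE ATTACHED.
2. Position 47 for 12 bytes contains See followed by spaces.
3. Position 136 for 12 bytes contains DO NOT, and position 171 for 15 bytes contains ADDRESS CORRECT.

INPUT_BYPASS LIST INP_FNAME01 NODE (47,12 EQ 'SEE ATTACHED') NODE (47,12 EQ 'See ') NODE (136,12 EQ 'DO NOT ' and 171,15 EQ 'ADDRESS CORRECT')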
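The NODE logic in Example 2 can be modelled in Python to make the evaluation order explicit. This sketch rests on several assumptions: positions are one-based, EQ and the ordering operators compare the extracted text with trailing spaces ignored, EQSC and NESC test whether the literal does or does not occur anywhere in the field, and a record is bypassed when any NODE is true. The rules are written as Python lambdas rather than parsed from rule-file syntax. The field position given for line_01a and every function name are hypothetical.

```python
import operator
from typing import Union

FieldRef = Union[str, tuple[int, int]]  # DDL field name or one-based (start, length)
FIELDS = {"line_01a": (1, 20)}  # hypothetical DDL position for the example
OPS = {"EQ": operator.eq, "GE": operator.ge, "GT": operator.gt,
       "LE": operator.le, "LT": operator.lt}


def extract(record: str, ref: FieldRef) -> str:
    start, length = FIELDS[ref] if isinstance(ref, str) else ref
    return record[start - 1 : start - 1 + length]


def compare(record: str, ref: FieldRef, op: str, literal: str) -> bool:
    text = extract(record, ref)
    if op == "EQSC":
        return literal in text       # literal found anywhere in the field
    if op == "NESC":
        return literal not in text   # literal not found in the field
    return OPS[op](text.rstrip(), literal.rstrip())


# Example 2: NODE (line_01a EQ 'MR JOHN C') NODE (1,2 EQ 'NR' or 3,2 EQ 'HN') NODE (1,2 EQ 'MR')
BYPASS_NODES = [
    lambda r: compare(r, "line_01a", "EQ", "MR JOHN C"),
    lambda r: compare(r, (1, 2), "EQ", "NR") or compare(r, (3, 2), "EQ", "HN"),
    lambda r: compare(r, (1, 2), "EQ", "MR"),
]


def bypassed(record: str) -> bool:
    # A record is bypassed if any NODE evaluates to true.
    return any(node(record) for node in BYPASS_NODES)
```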


Sample Converter Parameter File

****************************************************************************
* PFCONDRV.PAR - Converter parameter file
****************************************************************************
OUT_FNAME "..\data\convout"
OUT_DDL "..\dict\convout.ddl"
OUT_RNAME "INPUT"
*MAXIN01
START 1050
PRINT_NTH_COUNT 100
INP_FNAME01 "..\data\input.orig"
INP_DDL01 "..\dict\input.ddl"
INP_RNAME01 "INPUT"
INP_FREQ01 "oraddrl1","M"
INP_TRAN01 "oraddrl1","41","C4" "","43","FF"
INP_FIELD_TRAN_FNAME "..\data\m850u850"
INP_FIELD_TRAN01 "..\data\oraddrl2"
TO_LOWER_FNAME "..\data\m819l819"
TO_UPPER_FNAME "..\data\m819u819"
OUT_RECODE "product_code","L","..\tables\prodrecode.table"
OUT_MULTI_RECODE "addrline3","LM","postcode","M","addrline3", "M","","","..\tables\addr_line3.recode"
OUT_SCAN_RECODE "addrline1","R","INC.","D","Valid_Flag","INVALID"
OUT_RANGE_RECODE "product_code","275","325","999"
OUT_BUILD_OR_LIST "branch_flag","branch","A","PN1"
UPLOW_FNAME "..\data\uplow.tbl"
OUT_ARITHMETIC_RECODE "branch_no","L","102500","GT","Valid_Flag","VERIFY"
ORDER_OF_OPERATIONS OUT_RECODE, OUT_ARITHMETIC_RECODE, OUT_SCAN_RECODE, OUT_BUILD_OR_LIST, OUT_MULTI_RECODE, OUT_RANGE_RECODE, OUT_CHANGE_RECODE OUT_ARITHMETIC_COMPARE OUT_FIELD_COMPARE


Converter Parameter Details

Each Converter parameter uses a different number of arguments to perform its function. Any unused arguments must still be included in the parameter syntax; specify an unused argument by entering a pair of double quotes ("").

FREQ_PER Parameter

Controls whether frequency percentages are displayed, based on the results of INP_FREQ01-10 and/or OUT_FREQ.

Syntax

FREQ_PER [Field Operator]

Y Display frequency percentages

N Don’t display frequency percentages

Example 1 FREQ_PER displays frequency percentages based on OUT_FREQ:

FREQ_PER Y
OUT_FREQ "city","ADC3"

where:

For FREQ_PER:
Y Display frequency percentages

For OUT_FREQ:
city Count occurrences of the city field
A Count all values
D Sort output in descending order
C Sort on the frequency count
3 Show data for the first three entries

The output from this example would look similar to the following:

New York 25 9.90%
Boston 17 8.30%
Chicago 12 7.70%

See page 2-31 for information on the OUT_FREQ parameter functionality.

Example 2 FREQ_PER does not display the frequency percentages of OUT_FREQ:

FREQ_PER N
OUT_FREQ "city","ADC3"

where:

For FREQ_PER:
N Do not display frequency percentages

For OUT_FREQ:
city Count occurrences of the city field
A Count all values
D Sort output in descending order
C Sort on the frequency count
3 Show data for the first three entries

The output from this example would look similar to the following:

New York 25
Boston 17
Chicago 12
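The report produced by OUT_FREQ "city","ADC3" with FREQ_PER Y can be approximated with a counter. This Python sketch assumes that each percentage is the value's count divided by the total number of values counted, formatted to two decimal places. The sample figures in this manual are illustrative, and the function name is hypothetical.

```python
from collections import Counter


def top_frequencies(values: list[str], top: int, show_percent: bool) -> list[str]:
    """Count all values (A), sort by count descending (D, C), keep the first `top`."""
    counts = Counter(values)
    total = sum(counts.values())
    lines = []
    for value, count in counts.most_common(top):
        line = f"{value} {count}"
        if show_percent:  # FREQ_PER Y
            line += f" {count / total:.2%}"
        lines.append(line)
    return lines
```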


INP_FIELD_TRAN01-10 Parameter

Translates an entire field from one character set to another with case translation. Translations are based only on look-up tables; the table file name must be specified in the INP_FIELD_TRAN_FNAME parameter.

Only one code page table translation is allowed per Converter execution; it applies to all fields defined by the INP_FIELD_TRAN parameters.

Fields can be defined individually by input file number. Trillium Software currently provides a selection of code page tables for character set translation.

Syntax

INP_FIELD_TRAN01 [target field] where:

target field The field to be translated from input file 1.

Example INP_FIELD_TRAN01 "oraddrl1" "oraddrl3" where:

oraddrl1 First field to be translated from input file 1.

oraddrl3 Second field to be translated from input file 1.


INP_FIELD_TRAN_FNAME Parameter

Specifies the name of the look-up table used in character set translations. The table defined by this parameter is applied to the fields defined by the INP_FIELD_TRAN parameters.

Syntax

INP_FIELD_TRAN_FNAME [path location of look-up table]

Example INP_FIELD_TRAN_FNAME "..\tables\m850m819" where:

..\tables\m850m819 Code page file to be used for this translation.
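Assuming the table name m850m819 denotes a translation from code page 850 (DOS Latin-1) to code page 819 (ISO 8859-1), Python's built-in codecs can show the effect of such a translation on a field's bytes. This is an illustration only: the Converter uses its own table file, and the function name is hypothetical.

```python
def translate_field(raw: bytes, source: str = "cp850", target: str = "latin-1") -> bytes:
    """Re-encode a field from one code page to another.

    Characters with no equivalent in the target code page become '?'.
    """
    return raw.decode(source).encode(target, errors="replace")
```

For example, the CP850 byte 0x82 (é) becomes the ISO 8859-1 byte 0xE9, and a CP850 box-drawing character such as 0xC4 has no ISO 8859-1 equivalent, so it becomes "?".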

INP_TRAN01—10 Parameter

Translates, within an input field, a particular hex value to another identified hex value. This translation occurs on an “every occurrence basis” for the specified field name.

The maximum number of executions is 200.

Syntax

INP_TRAN01 [target field],[source value],[translate value] where:

target field Field name to translate

source value Current value to translate (input hex value).

translate value Recode value to translate to (output hex value).


Example In this example, hex value 41 is translated to hex value C4. In the second translation, hex value 43 is translated to hex value FF. Note that the second translation has no field name: "" is used as a placeholder, and the Converter applies the previous field name.

INP_TRAN01 "oraddrl1","41","C4" "","43","FF" where:

oraddrl1 Field name to translate

41 Current value to translate (input hex value)

C4 Recode value to translate to (output hex value)
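The “every occurrence” rule and the “” placeholder convention can be modeled with a short Python sketch. This is an illustration only, not the Converter's implementation: it assumes that fields are held as byte strings in a dictionary and that translations are applied in the order listed. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str, str]  # (field name or "", source hex, translate hex)


def resolve_entries(entries: List[Entry]) -> List[Tuple[str, bytes, bytes]]:
    """Replace "" field names with the most recent field name."""
    resolved = []
    current = ""
    for field, src_hex, dst_hex in entries:
        current = field or current
        if not current:
            raise ValueError("the first entry must name a field")
        resolved.append((current, bytes.fromhex(src_hex), bytes.fromhex(dst_hex)))
    return resolved


def apply_inp_tran(record: Dict[str, bytes], entries: List[Entry]) -> Dict[str, bytes]:
    """Translate every occurrence of each source byte in the named field."""
    out = dict(record)
    for field, src, dst in resolve_entries(entries):
        out[field] = out[field].replace(src, dst)
    return out


record = {"oraddrl1": b"ABCA"}
print(apply_inp_tran(record, [("oraddrl1", "41", "C4"), ("", "43", "FF")]))
# {'oraddrl1': b'\xc4B\xff\xc4'}
```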

LENGTH_OVERRIDE Parameter

Used with the OUT_SCAN_RECODE parameter to allow a field to be moved to a field of shorter length.

Syntax

LENGTH_OVERRIDE [Field operator]

Y Activates this parameter, allowing the field in OUT_SCAN_RECODE to be moved into a shorter field. If the field is too long and this parameter is not set to Y, an error message is displayed at program startup.

N Does not allow any overrides of length.


NTH_SAMP Parameter

Specifies the sampling interval for records read from a file.

Syntax

NTH_SAMP [increment of records to sample from the file]

Example NTH_SAMP 25

25 Writes one record to output for every 25 records read from the input file. After the 25th record is taken, every 25th record read thereafter is processed. For example, records 25, 50, 75, and so on are written to output.
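The sampling rule above (records n, 2n, 3n, and so on) can be expressed as a small Python generator. This is a minimal sketch; the function name nth_sample is hypothetical and not part of the Converter.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def nth_sample(records: Iterable[T], n: int) -> Iterator[T]:
    """Yield every nth record (records n, 2n, 3n, ... counting from 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    for count, record in enumerate(records, start=1):
        if count % n == 0:
            yield record


print(list(nth_sample(range(1, 101), 25)))  # [25, 50, 75, 100]
```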

ORDER_OF_OPERATIONS Parameter

Specifies the order in which the Converter applies the OUT parameters, allowing the default order to be altered.

Syntax

ORDER_OF_OPERATIONS [Beginning Converter parameter], [Next Converter parameter],[Third Converter parameter], [Fourth Converter parameter]

If the ORDER_OF_OPERATIONS parameter is not used, the Converter follows this default order of operations:

1. OUT_RECODE
2. OUT_ARITHMETIC_RECODE
3. OUT_MULTI_RECODE
4. OUT_SCAN_RECODE
5. OUT_RANGE_RECODE
6. OUT_FIELD_COMPARE
7. OUT_BUILD_OR_LIST
8. OUT_CHANGE_RECODE
9. OUT_ARITHMETIC_COMPARE
10. OUT_TRANSFORM (TrilliumJ only)

All of these functions must be included, even if you only use two or three of them. In that case, list those two or three functions first and add the rest after.


Example If you only want to run OUT_MULTI_RECODE and OUT_SCAN_RECODE, position them first:

ORDER_OF_OPERATIONS: OUT_MULTI_RECODE, OUT_SCAN_RECODE, OUT_RECODE, OUT_ARITHMETIC_RECODE, OUT_RANGE_RECODE, OUT_FIELD_COMPARE, OUT_BUILD_OR_LIST, OUT_CHANGE_RECODE, OUT_ARITHMETIC_COMPARE, OUT_TRANSFORM (TrilliumJ only)
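Because every function must appear, a complete order can be derived from the functions you want to run first plus the remaining defaults. The Python sketch below does this with a hypothetical helper, build_order; it is a convenience illustration, not a Converter feature.

```python
from typing import List

# Default order from the list above; OUT_TRANSFORM applies to TrilliumJ only.
DEFAULT_ORDER = [
    "OUT_RECODE", "OUT_ARITHMETIC_RECODE", "OUT_MULTI_RECODE",
    "OUT_SCAN_RECODE", "OUT_RANGE_RECODE", "OUT_FIELD_COMPARE",
    "OUT_BUILD_OR_LIST", "OUT_CHANGE_RECODE", "OUT_ARITHMETIC_COMPARE",
    "OUT_TRANSFORM",
]


def build_order(first: List[str]) -> List[str]:
    """Put the chosen functions first, then every remaining function in
    its default position, so that all functions are included."""
    unknown = [op for op in first if op not in DEFAULT_ORDER]
    if unknown or len(set(first)) != len(first):
        raise ValueError(f"unknown or repeated operations: {first}")
    return list(first) + [op for op in DEFAULT_ORDER if op not in first]


print(", ".join(build_order(["OUT_MULTI_RECODE", "OUT_SCAN_RECODE"])))
```

For the two functions in the example, the result is the same sequence shown in the ORDER_OF_OPERATIONS example line.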

OUT_ARITHMETIC_COMPARE Parameter

Performs an arithmetic comparison on an output field by comparing one field value to another, or to a literal on the same record. All fields must be positive, numeric characters. All answers are absolute values.

The maximum number of executions is 1000.

Syntax

OUT_ARITHMETIC_COMPARE [field1], [Field operator1], [field2], [Field operator2], [mathematical operator], [Comparison operator],[target field], [literal string]


where:

field1 First field, to be compared to second field

Field operator1 Field operator value

field2 Second of two fields to be compared

Field operator2 Field operator value

mathematical operator Math operator to use in the comparison

Comparison operator Operator for field comparison

target field Field name where the character answer to the operation is stored

literal string Literal string set in the outfield where the result is TRUE (used only in conjunction with a Comparison Operator)

Field Operators

L Left-justified field
R Right-justified field
Y Left-pack (remove all spaces between subelements)
Z Right-pack (remove all spaces between subelements)
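The four field operators can be pictured as transformations of a fixed-width value. The following Python sketch is one interpretation, assuming that “pack” means removing every space before justifying and that the field keeps its original width; the function name is hypothetical.

```python
def apply_field_operator(value: str, op: str) -> str:
    """Apply a justification or packing operator to a fixed-width value,
    keeping the original field width."""
    width = len(value)
    if op == "L":  # left-justify
        return value.strip().ljust(width)
    if op == "R":  # right-justify
        return value.strip().rjust(width)
    if op == "Y":  # left-pack: remove all spaces, then left-justify
        return value.replace(" ", "").ljust(width)
    if op == "Z":  # right-pack: remove all spaces, then right-justify
        return value.replace(" ", "").rjust(width)
    raise ValueError(f"unsupported field operator: {op!r}")


for op in "LRYZ":
    print(op, repr(apply_field_operator("  12 34 ", op)))
```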

Mathematical Operators

SUB Subtract
DIV Divide
ADD Add
MUL Multiply

Comparison Operators

GT Greater than
GE Greater than or equal to
LT Less than
LE Less than or equal to
EQ Equal to
NE Not equal to


Example 1 In this example, the OUT_ARITHMETIC_COMPARE parameter:

Left-justifies the ship_date and order_date fields

Subtracts the order_date from ship_date and stores the difference in the days_to_ship field

OUT_ARITHMETIC_COMPARE “ship_date”,“L”,“order_date”,“L”,””,“SUB”,”days_to_ship”,””

where:

ship_date First of two fields to be compared.

L Field operator that describes a left-justified output field

order_date Second of two fields to be compared.

SUB Operator to use in the comparison.

days_to_ship Field to store the difference between ship_date and order_date
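The arithmetic step (apply a math operator to two positive numeric fields, take the absolute value, and store the answer as characters) can be sketched in Python as follows. This is an illustration under stated assumptions: DIV is treated as integer division, and the behavior for non-numeric input or division by zero is not described in this guide, so the sketch simply raises an error. The names are hypothetical.

```python
import operator

MATH_OPS = {
    "SUB": operator.sub,
    "ADD": operator.add,
    "MUL": operator.mul,
    "DIV": operator.floordiv,  # assumption: integer division
}


def arithmetic_compare(field1: str, field2: str, math_op: str) -> str:
    """Apply a math operator to two positive numeric fields and return the
    absolute value of the result as characters for the target field."""
    a, b = field1.strip(), field2.strip()
    if not (a.isascii() and a.isdigit() and b.isascii() and b.isdigit()):
        raise ValueError("fields must contain positive numeric characters")
    return str(abs(MATH_OPS[math_op](int(a), int(b))))


print(arithmetic_compare("15", "42", "SUB"))  # 27
```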

OUT_ARITHMETIC_RECODE Parameter

Performs a character-based comparison, either field-to-literal or field-to-field, using a comparison operator such as GT (greater than) or GE (greater than or equal to). If the result of the comparison is true, you can set another field to a literal string.

Maximum number of executions is 400.

Syntax

OUT_ARITHMETIC_RECODE [field1], [Field operator], [comparison base value], [Comparison operator], [target field], [literal string]


where:

field1 First field to be compared to second field, or literal value

Field operator Field operator value

comparison base value Base value to use in the comparison

Comparison operator Operator for field comparison

target field Field name to set with the literal string

literal string The literal value to store in the target field

Field Operators

L Left-justified field
R Right-justified field
Y Left-pack (remove all spaces between subelements)
Z Right-pack (remove all spaces between subelements)
M Mask; search the recode table via the mask of the field as described by the following conditions: all alpha characters (A-Z yields A), all numeric characters (0-9 yields N), all others remain the same.
U Uppercase

Comparison Operators

GT Greater than
GE Greater than or equal to
LT Less than
LE Less than or equal to
EQ Equal to


Example 1 In this example, if the value in the left-justified field branch_no is greater than “102500”, the literal flag “VERIFY” is placed in the field Valid_Flag.

OUT_ARITHMETIC_RECODE “branch_no”,“L”,“102500”,“GT”, “Valid_Flag”, “VERIFY”

where:

branch_no Field whose value is compared.

L Field operator describing the left-justified output field.

102500 Base value to be used in the comparison.

GT Operator used in the comparison (GT=Greater than).

Valid_Flag Field name to be populated with the literal string.

VERIFY Literal string to set in the field “Valid_Flag”.

This parameter performs a character-based comparison, so alphanumeric fields are also acceptable. Remember that “1A” is less than “11” in EBCDIC but greater than “11” in ASCII.
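The EBCDIC and ASCII difference follows from the byte values of each code page: in ASCII, letters (0x41 and up) sort after digits (0x30-0x39), while in EBCDIC, letters (0xC1 and up) sort before digits (0xF0-0xF9). The Python sketch below compares encoded bytes to show this. It uses Python's cp037 codec as a stand-in for EBCDIC, which is an assumption about the code page in use, and char_compare is a hypothetical name.

```python
import operator

COMPARE_OPS = {"GT": operator.gt, "GE": operator.ge, "LT": operator.lt,
               "LE": operator.le, "EQ": operator.eq}


def char_compare(value: str, base: str, op: str, encoding: str = "ascii") -> bool:
    """Compare two strings character by character using the byte values of
    the given code page ("ascii", or "cp037" for EBCDIC)."""
    return COMPARE_OPS[op](value.encode(encoding), base.encode(encoding))


print(char_compare("1A", "11", "GT", "ascii"))  # True
print(char_compare("1A", "11", "GT", "cp037"))  # False
```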

OUT_BUILD_OR_LIST Parameter

Performs a field populate to build an output field, based on criteria and contents of other output fields. Additional output operations may be invoked.

Maximum number of executions is 800.

Syntax

OUT_BUILD_OR_LIST [target field], [source field], [Occurrence operator], [Field operator]

where:

target field Field name to populate

source field Field from which to derive the contents to populate

Occurrence operator Operator to use during the process

Field operator Operator to use during the process

Occurrence Operators

B When the output field is blank. This operator blanks out the source field. B can also follow any of these operators (such as ZB).
Z When the output field is zero. Z can also follow the B operator (such as BZ); in that position, Z places zeroes in the source field, whether or not the condition from the first character is met.
X Always performs the operation.
A Appends to existing data in the output field, with at least one blank between the new data and any existing data.

Field Operators

L Left-justified field
R Right-justified field
Y Left-pack (remove all spaces between subelements)
Z Right-pack (remove all spaces between subelements)
U Uppercase. For the U option, you must add the UPLOW_FNAME parameter to the parameter file. See page 2-49.
P* Performs functions on the case of the data: converts strings from uppercase to lowercase or from lowercase to uppercase, or performs mixed-case functions. When using the P option, you must add the UPLOW_FNAME, TO_UPPER_FNAME, or TO_LOWER_FNAME parameter(s) to the parameter file.

* For the P option, the following are accepted operators:

N Name line
S Street line
G Geography line
A Any
1 Use default rule: uppercase for first character; lowercase for remaining
2 Use default rule: “AS IS”
3 Use default rule: all uppercase
4 Use default rule: all lowercase

All operators must be used with this option. The value is three characters long:

First position is P

Second position is N, S, G or A

Third position is 1, 2, 3 or 4

The letter designations above are listed in the UPLOW table to define when to perform a case translation. The following list defines only those qualified entries from the UPLOW table that are affected:

PN N and A
PS S and A
PG G and A
PA A

The UPLOW table (customizable to meet business needs) is included in the base Trillium Software System install in the tables directory. It is used for exceptions to the standard “first uppercase, rest lowercase” rule. For example, the name MCCARTHY should become McCarthy, not Mccarthy.
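The four default case rules, with an exception table overriding rule 1, can be sketched in Python as follows. The sketch assumes that exceptions are matched against whole, uppercased words and apply only under rule 1; the function name and the in-memory exception dictionary are hypothetical stand-ins for the UPLOW table.

```python
from typing import Dict, Optional


def apply_case_rule(text: str, rule: str,
                    exceptions: Optional[Dict[str, str]] = None) -> str:
    """Apply default case rule 1-4; rule 1 consults the exception table."""
    if rule == "2":
        return text          # "AS IS"
    if rule == "3":
        return text.upper()  # all uppercase
    if rule == "4":
        return text.lower()  # all lowercase
    if rule != "1":
        raise ValueError(f"unknown case rule: {rule!r}")
    exceptions = exceptions or {}
    words = []
    for word in text.split(" "):
        default = word[:1].upper() + word[1:].lower()
        words.append(exceptions.get(word.upper(), default))
    return " ".join(words)


print(apply_case_rule("KEVIN MCCARTHY", "1"))                            # Kevin Mccarthy
print(apply_case_rule("KEVIN MCCARTHY", "1", {"MCCARTHY": "McCarthy"}))  # Kevin McCarthy
```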

Example 1 This example uses the parsed data that the Converter has written to output, and puts it into one name field with spaces between each name.

OUT_BUILD_OR_LIST “name_01”,”pr_first_01”,”A”,””
“name_01”,”pr_middle1_01”,”A”,””
“name_01”,”pr_last_01”,”A”,””
“name_01”,”pr_gener_01”,”A”,””
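The effect of the A (append) occurrence operator in this example can be modeled in Python. This sketch assumes that an empty source field adds nothing and that exactly one blank separates the pieces; the function name and sample values are hypothetical.

```python
from functools import reduce


def append_field(target: str, source: str) -> str:
    """'A' occurrence operator: append the source to the target, with one
    blank between them when the target already holds data."""
    src = source.strip()
    tgt = target.rstrip()
    if not src:
        return tgt
    return f"{tgt} {src}" if tgt else src


# Hypothetical values for pr_first_01, pr_middle1_01, pr_last_01, pr_gener_01.
parts = ["JOHN", "", "SMITH", "JR"]
name_01 = reduce(append_field, parts, "")
print(name_01)  # JOHN SMITH JR
```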


After the data is in one field, you can use OUT_BUILD_OR_LIST and reference the UPLOW table to make the first letter uppercase and the rest lowercase; for example:

OUT_BUILD_OR_LIST “std_name_01”,”name_01”,”A”,”PN1”

Example 2 This example always (‘X’) copies the contents of the source field into the target field, regardless of the target field’s contents, and left-justifies (‘L’) the target field.

OUT_BUILD_OR_LIST “target field”,”source field”,”X”,”L”

Beginning contents:

Record  Source field  Target field
1       DOG
2       CAT

Ending contents:

Record  Source field  Target field
1       DOG           DOG
2       CAT           CAT


OUT_CHANGE_RECODE Parameter

Performs string and character recoding. Users can search within a particular field of data and change a value without using user-defined tables. One or many occurrences of a value can be recoded. (From and To lengths are limited to the size of the field.)

The maximum number of executions is 1000.

Syntax

OUT_CHANGE_RECODE [source field], [search value], [replace value], [Field operator]

where:

source field Field name to be examined.

search value Value currently being searched for, so it can be changed.

replace value Value that the searched-for value will be changed to.

Field operator Operator to use during the process; specifies the location within the field.

Field Location Operators

B Beginning (exact beginning of field)
D Default; searches the entire field and replaces every occurrence (except when a mask is used)
E End (exact end of field)
M Activates the Mask option; may be combined with any of the above options (such as BM, DM, or EM)
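The B, D, and E location operators can be sketched in Python as string operations. This is an illustrative model only: it assumes the value has already been trimmed of padding blanks, and it does not model the M (mask) option. The function name is hypothetical.

```python
def change_recode(value: str, search: str, replace: str, op: str) -> str:
    """Model of the B, D, and E location operators (M is not modeled)."""
    if not search:
        raise ValueError("search value must not be empty")
    if op == "B":  # exact beginning of field
        return replace + value[len(search):] if value.startswith(search) else value
    if op == "E":  # exact end of field
        return value[: len(value) - len(search)] + replace if value.endswith(search) else value
    if op == "D":  # every occurrence anywhere in the field
        return value.replace(search, replace)
    raise ValueError(f"unsupported location operator: {op!r}")


print(change_recode("BILL4471", "BILL", "AR", "B"))  # AR4471
```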


Example 1 In this example, account numbers prefixed with the department in which they originated are changed to have the correct prefix.

OUT_CHANGE_RECODE “account_number”,”BILL”,”AR”,”B”

where:

account_number Field name to examine.

BILL Original prefix being searched for.

AR Replacement prefix.

B Location within the field. (B=beginning)

Example 2 This example uses masks: values in the field product_code that match the mask “AAANNNN” are changed to the shape “AAA-NNNN”.

OUT_CHANGE_RECODE “product_code”,”AAANNNN”,”AAA-NNNN”,”DM” where:

product_code Field name to examine.

AAANNNN Mask of the From (search) value.

AAA-NNNN Mask of the To (replace) value.

DM Location within the field. D causes a replacement on every occurrence. M activates the mask recoding.


OUT_CHANGE_TABLE Parameter

References an external file when calling for OUT_CHANGE_RECODE entries. For neatness, OUT_CHANGE_RECODE entries can be moved into a user-defined file instead of the parameter file.

Syntax

OUT_CHANGE_TABLE [user-defined file]

Example In the following example, OUT_CHANGE_RECODE entries are stored in the user-defined file “scan.txt”:

OUT_CHANGE_TABLE “..\data\scan.txt”

The OUT_CHANGE_TABLE parameter references the “scan.txt” file when it calls for specific OUT_CHANGE_RECODE entries.

OUT_FIELD_COMPARE Parameter

Performs a field-to-field comparison between fields on the same record, or between one field and a literal on the same record.

The maximum number of executions is 1000.


Syntax

OUT_FIELD_COMPARE [field1], [Field operator1], [field2],[Field operator2], [comparison literal], [Comparison operator], [target field], [Literal String] where:

field1 First field to be compared to second field, or literal value.

Field operator 1 Field operator value

field2 Second field, to be compared to field1.

Field operator 2 Field operator value

comparison literal Literal value to compare to the contents of field1.

Comparison operator Operator used for field-to-field comparison, or field-to-literal comparison.

target field Field name to store the value of the literal string.

Literal String The literal value to store in the target field.

Field Operators

L Left-justified field
R Right-justified field
Y Left-pack; remove all spaces between subelements
Z Right-pack; remove all spaces between subelements

Comparison Operators

GT Greater than
GE Greater than or equal to
LT Less than
LE Less than or equal to
EQ Equal to
NE Not equal to


Example 1 In this example, the parameter left-justifies the fields ship_date and order_date. The value of order_date is then compared to ship_date and, if they are equal, the value is stored in compare_date.

OUT_FIELD_COMPARE "ship_date","L","order_date","L","","EQ","compare_date", ""

where:

ship_date First field, compared to “order_date.”

L Field operator that left-justifies “ship_date”

order_date Second field, compared to “ship_date.”

L Field operator that left-justifies “order_date”

“” Unused in this example.

EQ Operator to be used in the comparison; asks whether “ship_date” is equal to “order_date”

compare_date Field to store the results of the field compare. If “ship_date” is equal to “order_date”, then that value is stored.

OUT_FREQ/INP_FREQ Parameter

This parameter counts occurrences of data for specified fields. This parameter can be used as an input data frequency counter (INP_FREQ01—10) or output data frequency counter (OUT_FREQ). Sorting options are available.

The maximum number of executions for each parameter is 1000.

Syntax

OUT_FREQ [DDL field to analyze], [Operator, with Sort options 1, 2 and 3]


Operators

A Count all values
M Mask (for example, A = ALPHA, N = NUMERIC)
N Count blanks or zeros vs. non-blanks or non-zeros

Sort Options 1 (for Frequency Results) The following sort options can be combined after any operator within the quotes to better organize the resulting frequency analysis values.

A Sort ascending D Sort descending U Unsorted

Sort Options 2 (for Frequency Results) Note that neither C nor V option can be used with sort option U.

C Sort on the frequency count. V Sort on the field value.

Sort Option 3 (for Frequency Results)

Numerical value Number of frequencies to return

Example 1

OUT_FREQ “field_name”,“MAC7” where:

field_name DDL (input or output) field to analyze

Operator = M Mask; shows alpha or numeric characters and any other special characters

Sort Option 1 = A Sort ascending

Sort Option 2 = C Sort on the frequency count

Sort Option 3 = 7 Show the top seven frequencies


Example 2 The following are examples of how this routine may be used to analyze different fields. The examples also contain samples of the Converter’s resulting statistics.

To see the different shapes of information in the zip code field, use:

OUT_FREQ “zip”,”MDC5”

This produces the following output:

FIELD zip ANALYSIS

NNNNN-NNNN   151100
NNNNN          6500
NNNNNNNNN      2550
NNNN           1928
NNNNN NNNN     1557


To find the five most common states in the database, use:

OUT_FREQ “state”,”ADC5”

This produces the following output:

FIELD state ANALYSIS

NY  14450
CA  12560
MA   9558
NJ   6599
OR   3611
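A frequency analysis like the ones above can be approximated in Python with collections.Counter. This sketch models the A and M operators and the A/D/U, C/V, and top-N sort options; it does not model the N (blanks vs. non-blanks) operator, it assumes trailing padding blanks are trimmed before counting, and it keeps ties in first-seen order. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def mask(value: str) -> str:
    """M operator: A-Z -> 'A', 0-9 -> 'N', other characters unchanged."""
    return "".join("A" if "A" <= c <= "Z" else "N" if "0" <= c <= "9" else c
                   for c in value)


def out_freq(values: Iterable[str], operator: str = "A", order: str = "D",
             key: str = "C", top: Optional[int] = None) -> List[Tuple[str, int]]:
    """Count values (A) or masks (M); sort A/D/U on count (C) or value (V)."""
    counts = Counter(mask(v.rstrip()) if operator == "M" else v.rstrip()
                     for v in values)
    pairs = list(counts.items())
    if order != "U":
        index = 1 if key == "C" else 0
        pairs.sort(key=lambda pair: pair[index], reverse=(order == "D"))
    return pairs[:top] if top else pairs


# Roughly "MDC2": mask, descending, by count, top two.
print(out_freq(["02134", "02134-1234", "10001", "9021"], "M", "D", "C", 2))
# [('NNNNN', 2), ('NNNNN-NNNN', 1)]
```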

OUT_MULTI_RECODE Parameter

Employs a user-definable multi-field table to perform a one- or two-field character lookup on data in order to recode one or two other fields. This parameter serves as a powerful investigational tool to examine inconsistencies in your data or changes that might need to be made, such as “if (a) and (b) occur, change (c).” For re-engineering data, especially legacy data, it is imperative to have a multi-field table lookup combined with a recode function in one operation.

Syntax

OUT_MULTI_RECODE [field1], [Field operator], [field2], [Field operator2], [target field1], [Field operator3], [target field2], [Field operator4 (optional)], [file name]


where:

field1 First field to be compared (to second field or literal value)

Field operator 1 Field operator value

field2 Second field, to be compared to first field (optional)

Field operator 2 Field operator value

target field1 Field to be affected by the table

Field operator 3 Field operator value

target field2 Second field to be affected by the table (optional)

Field operator 4 Field operator value

File name File name of the table to use for arguments 1-8

Field Operators The field operators act on the original data before comparison to table values occurs.

L Left-justified field
R Right-justified field
Y Left-pack; remove all spaces between subelements
Z Right-pack; remove all spaces between subelements
M Mask; search the recode table via the mask of the field as described by the following conditions: all alpha characters (A-Z yields A), all numeric characters (0-9 yields N), and all others remain the same.
U Uppercase


Example 1 In this example, OUT_MULTI_RECODE left-justifies the field “addr_line3” and looks for masks in “addr_line3” and “postcode” as specified in the sample multi-field table below. If both masks match, the mask shape of “addr_line3” is reset according to the table specifications; in this case, “addr_line3” is blanked out. When using masks, the scan field(s) must be the same field as the recode field.

OUT_MULTI_RECODE “addr_line3”,“LM”,“postcode”, “M”, “addr_line3”, “M”, “”,“”, “../tables/samp_table.recode”

where:

addr_line3 Field name to look up in table

LM Field operators (L=left-justified and M=Mask)

postcode Field name to look up in table

M Field operator (M=mask)

addr_line3 Field affected by table

M Field operator (M=mask)

“” Not used in this example

“” Not used in this example

“../tables/samp_table.recode” Name of the table to use for arguments 1-8.


Sample Multi-Field Table #1

Each entry in this table should be as wide as the sum of the widths of the fields named in arguments 1, 3, 5, and 7. For example:

***************************************************************
* Blanks out addr_line3 when it appears to be a duplication of postcode and town information
* position 1-30 addr_line3 (mask)
* position 31-35 postcode (mask)
* position 36-65 addr_line3 recode (mask)
* Note that each row should be padded out with blanks to position 65
*******************************************************************
*2345678901234567890123456789012345678901234567890123456789012345
NNNNN AAAAA AAAAAA            NNNNN

Assuming, for the input:

addr_line3 = 12345 KUALA LUMPUR
postcode = 12345

This entry tests if addr_line3 has the mask NNNNN AAAAA AAAAAA and postcode has the mask of NNNNN. If so, addr_line3 is blanked out.
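The table test above can be modeled in Python by splitting each row at positions 1-30, 31-35, and 36-65 and comparing the masked input fields. This is a sketch of the blank-out case in the sample only; how the Converter applies a non-blank recode mask is not modeled, and all names are hypothetical.

```python
from typing import List, Tuple


def mask(value: str) -> str:
    """M operator: A-Z -> 'A', 0-9 -> 'N', other characters unchanged."""
    return "".join("A" if "A" <= c <= "Z" else "N" if "0" <= c <= "9" else c
                   for c in value)


def parse_row(row: str) -> Tuple[str, str, str]:
    """Split a row into positions 1-30, 31-35, and 36-65 (1-based)."""
    row = row.ljust(65)
    return row[0:30].rstrip(), row[30:35].rstrip(), row[35:65].rstrip()


def multi_recode(addr_line3: str, postcode: str, rows: List[str]) -> str:
    """Return the recode column of the first row whose masks match both
    fields; otherwise return addr_line3 unchanged."""
    key1 = mask(addr_line3.strip())  # LM: left-justify, then mask
    key2 = mask(postcode.strip())    # M: mask
    for row in rows:
        if row.startswith("*"):      # comment and ruler lines
            continue
        mask1, mask2, recode = parse_row(row)
        if key1 == mask1 and key2 == mask2:
            return recode            # blank in the sample row
    return addr_line3


TABLE = [
    "*2345678901234567890123456789012345678901234567890123456789012345",
    "NNNNN AAAAA AAAAAA".ljust(30) + "NNNNN",
]
print(repr(multi_recode("12345 KUALA LUMPUR", "12345", TABLE)))  # ''
```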

OUT_RANGE_RECODE Parameter

This recode parameter is used to analyze a particular field of numeric data and determine whether its values fall between two values. If the Converter finds that a field value falls between a low value and a high value, it recodes the field to another defined value.

The maximum number of executions is 200.

Syntax

OUT_RANGE_RECODE [field1], [low value], [high value], [recode numeric value]

where:

field1 Field name to be examined

low value Lowest value to examine (inclusive)

high value Highest value to include in the recode

recode numeric value Value to recode to

Example If, in the field, ‘product_code’, a numeric value between “71250” and “72750” is found, then recode that numeric value to “97250”.

OUT_RANGE_RECODE “product_code”,”71250”,”72750”,”97250”

where:

product_code Field name to examine

71250 Lowest value to examine (inclusive)

72750 Highest value to include in the recode

97250 Value to recode to
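The range test can be sketched in Python as an inclusive comparison on both ends. The sketch assumes that non-numeric values are left unchanged, which this guide does not state; the function name is hypothetical.

```python
def range_recode(value: str, low: str, high: str, recode: str) -> str:
    """Return recode when low <= value <= high (both ends inclusive);
    otherwise return the value unchanged."""
    text = value.strip()
    if not (text.isascii() and text.isdigit()):
        return value  # assumption: non-numeric values are left alone
    return recode if int(low) <= int(text) <= int(high) else value


print(range_recode("71999", "71250", "72750", "97250"))  # 97250
```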


OUT_RECODE Parameter

Employs a user-definable table and field operator to recode a particular field. For example, when old legacy terms are no longer desired in final data, a simple one-to-one recode can be performed using older data terms on the left, with target data listed to the right. The Converter:

1. Looks up fields and field operators.
2. Performs an operation.
3. Makes an adjustment prior to lookup.
4. Compares the data terms on the left side against the target data file on the right.
5. Finds its related recode and performs the recode (either in the same data location or in a separately defined data field).

The maximum number of executions for OUT_RECODE is 200.

Syntax

OUT_RECODE [field1], [Field operator], [table value] where:

field1 First field name to be looked up in the table

Field operator The operator to use

table value Recode table in which to look up the field value


Field Operators

U Convert the data from the input field to uppercase before looking in the table.

Example 1 OUT_RECODE “product_code”,”U”,”../tables/prodrecode.table” where:

product_code Field name to look up in the table

U Uppercase field operator

../tables/prodrecode.table Recode table used to look up the field value and recode it to a new value as entered in the table.

Sample Recode Table In this example, the table recodes the product_code field values from 025 to 999, 125 to 999, 126 to 901, and so on.

******************************************************************
* Sample Recode Table - prodrecode.table
* This table recodes the product_code field
* Note: the size of the recode table must be exactly the width of the two values (no spaces allowed between columns)
******************************************************************
025999
125999
126901
127904
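A fixed-width recode table like this one can be read by splitting each entry at the width of the looked-up field (three characters for product_code here). The Python sketch below assumes that lines beginning with “*” are comments and that values with no table entry are left unchanged; the function names are hypothetical.

```python
from typing import Dict, Iterable


def load_recode_table(lines: Iterable[str], key_width: int) -> Dict[str, str]:
    """Parse fixed-width entries: the first key_width characters are the
    old value and the rest of the line is the new value."""
    table: Dict[str, str] = {}
    for line in lines:
        entry = line.rstrip("\r\n")
        if not entry or entry.startswith("*"):
            continue
        table[entry[:key_width]] = entry[key_width:]
    return table


def out_recode(value: str, table: Dict[str, str], field_op: str = "") -> str:
    """Look the value up (uppercased first for the U operator) and return
    its recode, or the original value when there is no entry."""
    key = value.upper() if field_op == "U" else value
    return table.get(key, value)


TABLE = load_recode_table(["* comment", "025999", "125999", "126901", "127904"], 3)
print(out_recode("126", TABLE, "U"))  # 901
```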


OUT_SCAN_RECODE Parameter

Scans substrings of a field for occurrences of a given string. Can be used to remove unwanted punctuation and unwanted data from a field so that you can effectively clean up your data for re-engineering purposes. Flags can also be set to indicate particular data conditions.

The maximum number of executions is 3000.

Syntax

OUT_SCAN_RECODE [field1], [Field operator], [data string1], [scan operator], [recode field], [literal value]

where:

field1 Field name to be scanned.

Field operator Field operator value

data string1 Data string for which to scan the field

scan operator Scan operator value

recode field Field to be recoded as a result of a positive occurrence of the scan

literal value Literal value to be put into the recode field (for example, Valid_Flag) on a TRUE occurrence

If the subsequent field name is “”, the Converter assumes the previous field name.


Field Operators

The following operators are all applied BEFORE scanning begins. For example, if L is entered, the Converter left-justifies the fields before searching.

When processing the L/R/Y/Z field operators, the Converter changes double-byte spaces to single-byte spaces.

L Left-justified field
R Right-justified field
Y Left-pack; remove all spaces between subelements
Z Right-pack; remove all spaces between subelements
M Mask; search the recode table through the mask of the field as described by the following conditions: all alpha characters (A-Z) yield an “A”, all numeric characters (0-9) yield an “N”, and all others remain the same.
U Uppercase

OUT_SCAN_RECODE uses three levels of scan operators to effectively reorganize and clean up data.

Code Description

Scan Operator Level 1

B Beginning of field
D Default, or anywhere within the field
E End of field or after last blank


T Between. Finds two occurrences of the value specified and operates on those values and all data in between.

Special functionality involving scanning of substrings: The scan string can be divided into two substrings, which allows separate delimiters to be used for scanning. Adding a numeric value after any of the ‘T’ options (such as T2) activates this special function. The numeric value indicates the length of the substring within the scan string that will be used as the first delimiter. Remaining characters of the scan string form the second delimiter. For example:

OUT_SCAN_RECODE “addr_line1”,“R”,“<-->”,“TC2”,“”,””

In this example, the length value of 2 indicates the substring “<-” is the initial delimiter and the “->” is the second delimiter.

Scan Operator Level 2

C Cuts scan value from field and stores in new target field
R Copy or reproduce scan value from field
S Deletes scan value from field. There is no additional storage of value to a new field

Code Description

Scan Operator Level 3

F Field move. Moves the entire field contents to new target field.
L Selects from the end of the scanned string; then moves left to the beginning of the field.
P Performs the operation from scan operator 2, preceding the scanned value on the field.
T Terminate. Performs operation from this point forward; operation occurs after scan string.
E When used with scan operator C, cuts the scanned-for string and everything after it to the end of the field. When used with operator S, space/blanks go to the end of the field.
W Word move. Moves an entire word to a new tagged field.

Scan Operator

DS Uses a scan to find a string anywhere in the field, then blanks the scanned-for string.
DSE Uses a scan to find a string anywhere in the field, then blanks from the beginning of the scanned-for string to the end of the field.
BCF Searches for a string at the beginning of the field, then cuts the entire field contents and places them in the new target field.
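The substring-delimiter rule for the T operators, and the DS and DSE operators, can be modeled with plain string operations. The Python sketch below is an interpretation, not the Converter's implementation: it assumes DS blanks every occurrence of the scan string, that blanking preserves field width, and that the closing delimiter is searched for after the opening one. The function names are hypothetical.

```python
from typing import Optional, Tuple


def split_delimiters(scan: str, n: int) -> Tuple[str, str]:
    """T<n>: the first n characters of the scan string form the opening
    delimiter; the remaining characters form the closing delimiter."""
    return scan[:n], scan[n:]


def find_between(value: str, scan: str, n: int) -> Optional[Tuple[int, int]]:
    """Return the (start, end) slice covering both delimiters and all data
    between them, or None when either delimiter is missing."""
    opening, closing = split_delimiters(scan, n)
    start = value.find(opening)
    if start < 0:
        return None
    end = value.find(closing, start + len(opening))
    if end < 0:
        return None
    return start, end + len(closing)


def scan_ds(value: str, scan: str) -> str:
    """DS: blank out the scanned-for string (every occurrence assumed)."""
    return value.replace(scan, " " * len(scan))


def scan_dse(value: str, scan: str) -> str:
    """DSE: blank from the first occurrence to the end of the field."""
    i = value.find(scan)
    return value if i < 0 else value[:i] + " " * (len(value) - i)


line = "12 MAIN ST <-OLD-> APT 4"
span = find_between(line, "<-->", 2)
print(line[span[0]:span[1]])  # <-OLD->
```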


Scan Operator Examples

The following are examples of valid scan operator strings: BCF, DRW, ESL.

Certain Level 3 scan operator codes won’t work with some Level 1 and Level 2 combinations; for example, the following are invalid scan operator strings: BCL, ECE, ECT.

Table 2.2 Scan Operator Combinations

Level 1 operators: B, D, E, T

Level 2 C combines with Level 3 E, F, L, T, W, P
Level 2 R combines with Level 3 E, F, L, T, W, P
Level 2 S combines with Level 3 E, F, L, T

Not every Level 1 operator supports each of these combinations (for example, BCL is invalid).

Key

B Beginning of field
C Cut
D Default
E End of field
F Field move
L Left scan
P Precede value
R Reproduce
S Delete
T Between (Level 1); Terminate (Level 3)
W Word move


Example 1 In this example, OUT_SCAN_RECODE right-justifies and then scans the field “addr_line1” for the literal “INC.” If the literal is found anywhere in the field, the literal “INVALID” is put into the field “Valid_Flag.”

OUT_SCAN_RECODE “addr_line1”,“R”,“INC.”,“D”,”Valid_Flag”,“INVALID”

where:

addr_line1 Field name to scan

R Right-justified field operator

INC. Particular string to scan the field (addr_line1) for

D Scan operator (D=Default; anywhere within the field).

Valid_Flag Field to recode as a result of a positive occurrence of the scan

INVALID Recode value to be put into (Valid_Flag) on an occurrence
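The flagging behavior in this example (justify, scan anywhere with D, and set a literal on a match) can be sketched in Python. The sketch justifies only a working copy of the field, because this example does not say whether the justified value is written back; the function name and the sample record are hypothetical.

```python
from typing import Dict


def scan_recode_d(record: Dict[str, str], field: str, field_op: str,
                  scan: str, recode_field: str, literal: str) -> Dict[str, str]:
    """OUT_SCAN_RECODE with scan operator D: justify a copy of the field,
    then set recode_field to literal if scan occurs anywhere in it."""
    value = record[field]
    width = len(value)
    if field_op == "R":
        value = value.strip().rjust(width)
    elif field_op == "L":
        value = value.strip().ljust(width)
    out = dict(record)
    if scan in value:
        out[recode_field] = literal
    return out


rec = {"addr_line1": "ACME INC.   ", "Valid_Flag": ""}
print(scan_recode_d(rec, "addr_line1", "R", "INC.", "Valid_Flag", "INVALID")["Valid_Flag"])  # INVALID
```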


OUT_SCAN_TABLE Parameter

References an external file when calling for OUT_SCAN_RECODE entries. For neatness, entries from OUT_SCAN_RECODE can be moved to a user-defined file instead of the parameter file.

Syntax

OUT_SCAN_TABLE [user-defined file]


Example In the following example, OUT_SCAN_RECODE entries are stored in the user-defined file, scan.txt:

OUT_SCAN_TABLE “..\data\scan.txt”

Sample scan.txt File

OUT_SCAN_RECODE “oraddrl6”, “M”, “NNNNNNNNNNNNN”, “DCW”, “workphone”, “”
“oraddrl6”, “M”, “NNNNNNNNNN”, “DCW”, “workphone”, “”
“oraddrl6”, “M”, “+NNNNNNNNNNN”, “DCW”, “workphone”, “”
“oraddrl6”, “M”, “NNNN NNNNNNNNNN”, “DCW”, “workphone”, “”
“oraddrl6”, “M”, “NN NN NN NN NN”, “DCW”, “workphone”, “”
“oraddrl6”, “M”, “NNNN N NN NN NN NN”, “DCW”, “workphone”, “”

TO_LOWER_FNAME Parameter

Translates characters to all lowercase characters. This parameter is required with the “P” and/or “U” option in the OUT_BUILD_OR_LIST parameter and names the translate table that contains the uppercase/lowercase translation information.

Syntax

TO_LOWER_FNAME [path that contains location of translate table]


Example In the following example, the name Kevin McCarthy (mixed-case) becomes kevin mccarthy (all lowercase):

TO_LOWER_FNAME “..\tables\m819l819” where:

m819l819 Table that contains character set translations; ships with most Trillium Software System installations. Check the base install \tables directory.

TO_UPPER_FNAME Parameter

Translates characters to all uppercase characters. This parameter is required with the “P” and/or “U” option in the OUT_BUILD_OR_LIST parameter and names the translate table that contains the uppercase/lowercase translation information.

Syntax – (uses one argument)

TO_UPPER_FNAME [path that contains location of translate table]

Example In the following example, the name Brian McVay (mixed-case) becomes BRIAN MCVAY (all uppercase):

TO_UPPER_FNAME “..\tables\m819u819” where:

m819u819 Table that contains character set translations; ships with most Trillium Software System installations.


UPLOW_FNAME Parameter

Specifies the table that contains the uppercase/lowercase recode information. This parameter is used in conjunction with the “P” option in OUT_BUILD_OR_LIST.

Syntax

UPLOW_FNAME [path that contains recode table location]

Example

UPLOW_FNAME “..\tables\uplow.tbl” where:

uplow.tbl Table of exceptions to the uppercase/lowercase rule employed by OUT_BUILD_OR_LIST.

Sample UPLOW Table

Each row of the table must be 101 bytes long, not including CRLF.

String qualifier (1 byte)   Input (50 chars)   Desired output (50 chars)
N (= Name)                  MR                 Mr.
S (= Street)                ST                 St.
G (= Geography)             MA                 Mass
A (= Any)                   MCMAHON            McMahon

The specific qualifier N,S,G is attempted first and if not found in the table, a lookup is then attempted for A. The table should be padded out to full width with blanks (no tabs).
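The row layout (1-byte qualifier, 50-character input, 50-character output) and the lookup order (specific qualifier first, then A) can be sketched in Python. The parser rejects tabs and over-long rows on the assumption that such rows are malformed; the function names and the make_row helper are hypothetical, for this illustration only.

```python
from typing import Dict, Iterable, Optional, Tuple

ROW_WIDTH = 101  # 1-byte qualifier + 50-char input + 50-char output


def parse_uplow(lines: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Parse blank-padded UPLOW rows into {(qualifier, input): output}."""
    table: Dict[Tuple[str, str], str] = {}
    for line in lines:
        row = line.rstrip("\r\n")
        if not row.strip():
            continue
        if "\t" in row or len(row) > ROW_WIDTH:
            raise ValueError(f"bad UPLOW row: {row!r}")
        row = row.ljust(ROW_WIDTH)
        table[(row[0], row[1:51].rstrip())] = row[51:101].rstrip()
    return table


def uplow_lookup(table: Dict[Tuple[str, str], str],
                 qualifier: str, word: str) -> Optional[str]:
    """Try the specific qualifier (N, S, or G) first, then fall back to A."""
    for q in (qualifier, "A"):
        if (q, word) in table:
            return table[(q, word)]
    return None


def make_row(qualifier: str, source: str, target: str) -> str:
    """Build one padded 101-byte row (helper for this example only)."""
    return qualifier + source.ljust(50) + target.ljust(50)


TABLE = parse_uplow([make_row("N", "MR", "Mr."), make_row("S", "ST", "St."),
                     make_row("G", "MA", "Mass"), make_row("A", "MCMAHON", "McMahon")])
print(uplow_lookup(TABLE, "N", "MCMAHON"))  # McMahon (found under A)
```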


Running the Converter on UNIX and 32-bit PCs

To execute the cfcondrv program, use the following command:

cfcondrv -parmfile parm_file_name -parmecho echo_file_name

where:

cfcondrv Name of the Converter driver

-parmfile Keyword to indicate that the parameter file follows

parm_file_name Driver parameter file

-parmecho Keyword to indicate that the parameter echo file follows

echo_file_name Name of the echo file (program listing file) that lists any parameter processing errors. Optional.

Example

cfcondrv -parmfile ..\parms\pfcondrv -parmecho ..\data\echo
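If you drive the Converter from a script, a wrapper such as the following Python sketch can assemble the command line shown above. It assumes cfcondrv is on the PATH; `build_cfcondrv_command` and `run_cfcondrv` are hypothetical helpers, and -parmecho is added only when an echo file is supplied because that argument is optional.

```python
import subprocess
from typing import Optional


def build_cfcondrv_command(parm_file: str, echo_file: Optional[str] = None) -> list[str]:
    """Assemble the cfcondrv argument list; -parmecho is optional."""
    command = ["cfcondrv", "-parmfile", parm_file]
    if echo_file is not None:
        command += ["-parmecho", echo_file]
    return command


def run_cfcondrv(parm_file: str, echo_file: Optional[str] = None) -> int:
    """Run the Converter and return its exit code (assumes cfcondrv is on PATH)."""
    completed = subprocess.run(build_cfcondrv_command(parm_file, echo_file), check=False)
    return completed.returncode


print(build_cfcondrv_command(r"..\parms\pfcondrv", r"..\data\echo"))
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with paths that contain spaces.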


IBM Mainframe Converter Sample JCL

The following figure illustrates running cfcondrv on an IBM Mainframe platform:

//**********************************************************
//*
//* SAMPLE JCL TO RUN CONVERTER PROGRAM (CFCONDRV)
//*
//**********************************************************
//CFCONDRV EXEC PGM=CFCONDRV,REGION=5500K,PARM='/-PARMFILE PF',
//         REGION=0m
//STEPLIB  DD DSN=&BASEPREF.&TRILVER.LOADLIB,DISP=SHR
//         DD DSN=CEE.SCEERUN,DISP=SHR
//         DD DSN=CEE.SCEERUN2,DISP=SHR
//PF       DD DISP=SHR,
//         DSN=&PROJPREF.&TRILVER.US.PARMLIB(PFCONDRV)
//CNVINPUT DD DISP=SHR,
//         DSN=&PROJPREF.&TRILVER.US.DATA.INPUT
//OUTPUT   DD UNIT=&UNIT,DISP=(NEW,CATLG,DELETE),
//         DCB=(RECFM=FB,LRECL=639,BLKSIZE=22365),
//         SPACE=(TRK,(5000,100),RLSE),
//         DSN=&PROJPREF.&TRILVER.US.DATA.CONVOUT
//INPUT    DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(INPUT)
//STDINPUT DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(CONVOUT)
//STATFILE DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//TRILMSGS DD DUMMY

Figure 2.1 Sample cfcondrv JCL


Converter Error Messages

This table lists error messages returned by the Converter program.

Table 2.3 CFCONDRV Error Messages

Message Description

Must have a parm file. The run command does not contain a parameter file name after the keyword “-parmfile”.

Parm Processing Error1, ca->status = 2. The parameter file name given is missing or invalid. Check the path and file name syntax in the command line string.

Parm Processing Error1, ca->status = 4. The parameter file contains an invalid parameter name in the left column of the file. Check that all parameter names are used only once and that the first column of the file contains only parameter names or spaces.

Parm Processing Error1, ca->status = 6. The parameter file contains a duplicate parameter name. A parameter can appear only once in a file.

Parm Processing Error1, ca->status = 11. A parameter that requires quotation marks around its values is missing one or more quotation marks.

Parm Processing Error1, ca->status = 16. There is a parenthesis in an illegal location. Check the syntax of parameters that use parentheses to ensure they are correct.

Parm Processing Error1, ca->status = 17. A parameter that must have a numeric value as its first value contains a value that is not numeric.

Parm Processing Error1, ca->status = 18. A parameter that must have a numeric value as its last value contains a value that is not numeric.

Unable to open statistics file. The file name defined by STAT_FNAME is missing or invalid. This may also be caused by permission problems with the file.

Missing output DDL filename or record name. The file name defined by OUT_DDL is missing or invalid, or the record name defined by OUT_RNAME does not match the record name in the DDL.


Can't open output file. The file name defined by OUT_FNAME is missing or invalid. This may also be caused by permission problems with the file.

Parm Processing Error2 / Parm Processing Error3. There is a problem with the entry formats in the file defined by OUT_SCAN_TABLE. Entries must be made according to the rules for OUT_SCAN_RECODE.

Only A, N or M is supported for frequency. When using the parameters OUT_FREQ or INP_FREQ, the first character of the second argument must be A, N, or M.

Only 0-9 and A-F are valid. When using INP_TRAN, only the hex values 0-9 and A-F may be used in the second and third arguments.

The INP_FIELD_TRAN_FNAME parm is required for translate. The user has defined INP_FIELD_TRAN but has omitted the required corresponding INP_FIELD_TRAN_FNAME file name.

TO_UPPER_FNAME required for U operator. / TO_UPPER_FNAME required for P operator. / TO_LOWER_FNAME required for P operator. When OUT_BUILD_OR_LIST is used with the U or P operator in the fourth argument, the user must also specify the translate table file names using TO_UPPER_FNAME (U and P operators) and TO_LOWER_FNAME (P operator). The tables defined there are the uppercase and lowercase translation tables.

P option must be followed by 1, 2, 3, or 4. / P option must be followed by N, S, G, or A. When OUT_BUILD_OR_LIST is used with the P operator in the fourth argument, the second position of the argument must be N, S, G, or A, and the third position must be one of the values 1, 2, 3, or 4.

cv_DuplicateEdb Failed. / cv_PutStringValue Failed. There is a problem with the seqno field as defined in the output DDL. Ensure the field is defined as ASCII NUMERIC (or EBCDIC NUMERIC) and that the field does not extend beyond the overall length defined for the DDL.


Data Dictionary open problems. The program could not open the DDL defined to the program. Possible causes: the file defined is not a DDL file; the file defined is a DDL file but is not in the correct format (ASCII or EBCDIC); or the DDL file is missing its header information.

START parm record number > number of records in file. The user has defined a value for the START parameter that is greater than the total number of records in the input file.

Invalid order_of_operations. One of the parms in ORDER_OF_OPERATIONS is misspelled.

Incorrect order_of_operations. A required parameter has been omitted from ORDER_OF_OPERATIONS.

“P” option only supports lengths <= 100. When used with the “P” option, OUT_BUILD_OR_LIST supports character translations only in fields of 100 bytes or less.

Illegal string length in OUT_SCAN_RECODE option xxx. If the length value given is larger than the length of the scan string, the program exits with an exit code of 99.
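The P-option rules in the table above can be checked before a run with a small Python sketch like the one below. It assumes the fourth argument of OUT_BUILD_OR_LIST is a short string whose first position holds the operator (P), second position the qualifier, and third position the digit. `check_p_option` is hypothetical: it only mirrors the messages listed in Table 2.3 and does not reproduce CFCONDRV's own validation.

```python
VALID_QUALIFIERS = {"N", "S", "G", "A"}
VALID_DIGITS = {"1", "2", "3", "4"}


def check_p_option(argument: str, field_length: int) -> list[str]:
    """Return the Table 2.3 messages a P-option argument would be expected to trigger."""
    if not argument.startswith("P"):
        return []  # only the P operator is checked in this sketch
    problems = []
    if len(argument) < 2 or argument[1] not in VALID_QUALIFIERS:
        problems.append("P option must be followed by N, S, G, or A.")
    if len(argument) < 3 or argument[2] not in VALID_DIGITS:
        problems.append("P option must be followed by 1, 2, 3, or 4.")
    if field_length > 100:
        problems.append('"P" option only supports lengths <= 100.')
    return problems


print(check_p_option("PN1", 40))   # []
print(check_p_option("PX7", 120))  # all three messages
```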

CHAPTER 3 Global Data Router

The Global Data Router scans incoming data for information that indicates each record's country of origin. It takes an input file containing records from multiple countries and routes each record accordingly, so that further processing can be done at a country-specific level.

After a record has been identified as belonging to a specific country, the record is written to a file containing records from that country. Using this tool before parsing allows records to be sorted by country of origin. The data can then be parsed on a country-by-country basis, improving the parsing results over what could be obtained if records from all countries were contained in the same file.

The Router uses a parameter file (pfrouter.par) to define all input and output files and the expected countries of origin for the data to be processed. In addition, a file called the Rules file contains country-specific word definitions used to improve the decision-making process. The Rules file is also where tables containing country-specific data are defined to the program.


Global Data Router Design Flow

Input/Output Resources

Input

Driver parameter file   pfrouter.par   See Table 3.1, “Global Data Router Parameters,” on page 3-4 for more information.
Rules file              router.rules   See “Rules File Parameters” on page 3-9 for more information.
Input DDL               input.ddl
Input file                             Contains data from multiple countries.

Output

Output file                            Contains all the input data, split by country.


Parameter Syntax

Parameters are defined to the program in an external file. All parameters are optional unless specified as required.

Output file names combine two parts, where:

filename User-defined by specifying the OUT_FNAME parameter.

countryname Assigned by the two-letter country identifier specified by the COUNTRY_LIST parameter.

To have the country identifier inserted into the output file name, place an asterisk (*) in the OUT_FNAME parameter value; the asterisk is replaced with the country identifier.

For example, if the output file name defined was router_* and the country identifier was US, then the output file name for records assigned to this country will be router_US.
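The substitution can be sketched in a few lines of Python. This is an illustration, not the Router's code: `output_file_name` is hypothetical, and it assumes the identifier is uppercased by default (the example above yields router_US even though the sample COUNTRY_LIST is written in lowercase) and lowercased when LOWERCASE_COUNTRY_ID is set to Y.

```python
def output_file_name(out_fname: str, country_id: str, lowercase_country_id: bool = False) -> str:
    """Substitute the two-letter country identifier for '*' in OUT_FNAME.

    Names without '*' are returned unchanged.
    """
    code = country_id.lower() if lowercase_country_id else country_id.upper()
    return out_fname.replace("*", code)


for country in ["us", "ca"]:
    print(output_file_name("router_*", country))  # router_US, then router_CA
print(output_file_name("router_*", "US", lowercase_country_id=True))  # router_us
```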

See “IBM Mainframe Execution” on page 3-25 for more information about allocating output datasets for the IBM Mainframe.


Global Data Router Parameters

Table 3.1, “Global Data Router Parameters,” describes the parameters contained in the pfrouter.par parameter file. REQUIRED parameters are shaded.

Table 3.1 Global Data Router Parameters

Parameter Values Description

ALTERNATE_ENCODINGS encoding value Contains a list of possible encodings for the input text. This is useful if the database was created from multiple suppliers and contains a mix of encodings. The Router tries each of the possible encodings when it reads the text and takes the maximum weight across them. Used only in , , and .

CONNECTOR_RETURN Y, N Indicates that the user wants a return value for the connector. Rather than listing all countries that received weight, the Router returns only the country that received the most weight. Input and output code pages are passed back so that the connector can properly transfer data back to the source. For example: CONNECTOR_RETURN Y The Router returns a string such as: cz utf8 cp1250 where: cz = the country; utf8 = the code page of the input data; cp1250 = the code page used to convert the output data.

COUNTRY_CODE_FIELD field name Field name, defined by the input DDL, that indicates where on the record to look for a country code.


COUNTRY_LIST two-letter country codes List of countries to search; each code is also the country identifier used as part of the output file name. For example: United States = US. To improve performance, list countries in order of most probable occurrence, followed by the next most likely, and so on. For example, if the file contains 90% US records and 10% Canadian records, the order should be US,CA. See the note following this table for the list of supported countries.

DELIM delimiter Delimiter used if the input is a delimited file. The following four delimiter characters can be spelled out alphabetically: space = spa, tab = tab, comma = csv, pipe = pipe. Output files are always created as fixed-length record files.

FIELDS fields Field names, defined by input DDL, to scan from input data. Field order affects the processing order of the data. For example: FIELDS line_01, line_02

GLOBAL_GEOG_FNAME table name Name of the Global Geography Table, which contains state, city, locality, dependent locality, and postal code information, as well as word and pattern structures. For example: GLOBAL_GEOG_FNAME GLOBGEOG

INP_DDL file name Name of the DDL file that describes the input data layout.

INP_FNAME file name Name of the input file; for example ..\data\input.txt

LOG_FNAME log file name Name of the log file that tracks every weight added.


LOWERCASE_COUNTRY_ID Y, N Makes the country ID lowercase when it is substituted into the output file name. For example: LOWERCASE_COUNTRY_ID Y

LOG_NTH_COUNT numeric value Limits the number of records written to the log file when running large record volumes. Every NTH record will be written to the log file.

MAXIN numeric Maximum total number of records to be read from the start of the file.

NEWLINE Y, N Indicates that the file has newline-terminated records rather than fixed-length records. Delimited files MUST be newline-terminated, so this is assumed if DELIM is used.

NOMAP Y, N Y - Turns off the memory mapping feature.

N (default) - Allows the system to maintain control of the file space, instead of allocating memory for it.

NOMATCH_FNAME file name File name for records that are not routed to any of the countries in the COUNTRY_LIST parameter.

OUT_DDL file name File name of the output data dictionary. If this is not specified, INP_DDL is used for output.

OUT_FNAME file name File name of the output file. If the name contains an asterisk (*), it is replaced with the two-letter country identifier from COUNTRY_LIST.

POSTCODE_FIELD field name Specifies the field from which to take the postcode. This ensures that the Router is not confused by postal codes that might appear in an incorrect position in the fields listed in the FIELDS parameter. For example: POSTCODE_FIELD zipcode

POSTCODE_MASK file name Overrides the position part of the POSTCODE_MASK values that occur in the rules file. This allows the user to input data in the same order for all countries.


POSTCODE_POSITION user-defined value The value specified in this parameter overrides the positional value of the postcode entered for the country. Useful if all data is in the same format. For example: POSTCODE_POSITION rs

PRINT_NTH_COUNT numeric Prints “Processing record xxx” to the screen every n records.

RULES_DDNAME rules file name File name of the rules file, which contains all the rules used to route each record to the appropriate output. See the section “Global Data Router Rules File” on page 3-8.

RULES_ECHO file name File name to which all the statements in the rules file are echoed.

SAVE_COUNTRY_FIELD field name Name of the field, defined in the DDL, in which to save the output string listing the possible countries. The list contains the two-letter country codes separated by spaces. If the field length is set to 2, only the most likely country is saved.

SAVE_WEIGHT_FIELD field name Name of the field, defined in the DDL, in which to save the weight value.

SKIP_DUPLICATE_CITY_NAME Y, N Indicates that the Router should not look up items in DUPLICATE_CITY_NAME. For example: SKIP_DUPLICATE_CITY_NAME Y

START numeric Record number to begin processing.

* The countries supported in COUNTRY_LIST are listed below:

Argentina AR, Australia AU, Austria AT, Belgium BE, Brazil BR, Brunei Darussalam BN, Canada CA, Chile CE, Colombia CO, Czech Republic CZ, Denmark DK, France FR, Germany DE, Hong Kong HK, IN, Ireland IE, Italy IT, Jamaica JM, Malaysia MY, Mexico MX, Netherlands NL, New Zealand NZ, Peru PE, Philippines PH, Portugal PT, Saudi Arabia SA, Singapore SG, South Africa ZA, Spain ES, Sweden SE, Switzerland CH, United Arab Emirates AE, United Kingdom UK, United States US, Venezuela VE
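A program that consumes the CONNECTOR_RETURN value described in Table 3.1 could split it as in the following Python sketch. The field meanings follow the example in the table; the assumption that the string always holds exactly three space-separated fields is ours, and `parse_connector_return` is a hypothetical helper.

```python
from typing import NamedTuple


class ConnectorReturn(NamedTuple):
    country: str           # country that received the most weight, e.g. "cz"
    input_code_page: str   # code page of the input data, e.g. "utf8"
    output_code_page: str  # code page used to convert the output data, e.g. "cp1250"


def parse_connector_return(value: str) -> ConnectorReturn:
    """Split a CONNECTOR_RETURN string such as 'cz utf8 cp1250' into its three fields."""
    fields = value.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 space-separated fields, got {len(fields)}: {value!r}")
    return ConnectorReturn(*fields)


print(parse_connector_return("cz utf8 cp1250"))
```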

Sample Global Data Router Parameter File

******************************************************
parm file for cfrouter
******************************************************
COUNTRY_LIST us,ca,es,uk,nl,au,fr,po,de,ph,sa,sg,sr,sw,it,be,br,ch,
FIELDS oraddrl7,oraddrl6,oraddrl5
GLOBAL_GEOG_FNAME ..\..\tables\GLOBGEOG
INP_DDL
OUT_DDL
OUT_FNAME
INP_FNAME ..\data\RounterInput.dat
LOG_FNAME ..\data\log
NOMATCH_FNAME ..\data\nomatch
RULES_DDNAME ..\parms\router.rules
RULES_ECHO ..\data\rules_echo
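To sanity-check a parameter file before a run, a reader could use a small parser like the following Python sketch. It is not the Router's parser. It assumes comment lines start with an asterisk, as in the rules file; every other non-blank line is "NAME value" with the name in column 1; and a parameter may appear only once. Indented lines are rejected because their meaning is not described here. `parse_parm_file` is a hypothetical helper.

```python
def parse_parm_file(text: str) -> dict[str, str]:
    """Read 'NAME value' lines into a dict, rejecting duplicate parameter names."""
    params: dict[str, str] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip() or raw.startswith("*"):
            continue  # blank line or comment line
        if raw[0].isspace():
            raise ValueError(f"line {line_no}: indented lines are not handled by this sketch")
        fields = raw.split(maxsplit=1)
        name = fields[0]
        value = fields[1].strip() if len(fields) > 1 else ""  # e.g. a bare 'INP_DDL'
        if name in params:
            raise ValueError(f"line {line_no}: duplicate parameter {name}")
        params[name] = value
    return params


sample = r"""**********************
COUNTRY_LIST us,ca
INP_FNAME ..\data\input.txt
RULES_DDNAME ..\parms\router.rules
"""
print(parse_parm_file(sample))
```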

Global Data Router Rules File

This rules file is specified in the RULES_DDNAME parameter.

RULES_DDNAME ..\parms\router.rules


Rules File Parameters

The Rules file contains entries that define the resource tables used by the program as well as specifics about country data.

A comment line can also be added by placing an asterisk (*) in the first (leftmost) position of the line.

For example: * This is a comment line.

The Router matches data from the incoming record against lookup tables that contain country-specific data. When a match is found in a lookup table, a “weight” is assigned to the word or phrase that was identified. These weights are used to determine the probability that the identified word or phrase belongs to the country whose table produced the match. The weighting system is controlled by the user through specific parameters in the rules file.
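The weighting idea can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Router's algorithm. Each lookup-table match contributes (country, weight); weights are summed per country; a country is eligible only if its total reaches its MIN_WEIGHT; the heaviest eligible country wins. A result of None stands for routing to NOMATCH_FNAME. The second helper applies the WEIGHT_ADDITIONAL_POSTCODES rule from the parameter table below (weight multiplied by the lesser of MAX_ADDITIONAL_POSTCODES and the postcode count minus one). Other rules, such as WEIGHT_ADDITIONAL_CITY_MATCH, are not modeled, and the tie-breaking rule is our assumption.

```python
def route_record(evidence, min_weights):
    """Total the weights per country and pick the heaviest country that meets its MIN_WEIGHT.

    evidence: iterable of (country_code, weight) pairs, one per lookup-table match.
    min_weights: per-country MIN_WEIGHT values; countries not listed default to 0.
    Returns (country or None, totals); None means the record goes to NOMATCH_FNAME.
    """
    totals: dict[str, int] = {}
    for country, weight in evidence:
        totals[country] = totals.get(country, 0) + weight
    eligible = {c: t for c, t in totals.items() if t >= min_weights.get(c, 0)}
    if not eligible:
        return None, totals
    # Ties go to the country credited first (dict insertion order).
    return max(eligible, key=eligible.get), totals


def additional_postcode_weight(weight: int, postcode_count: int, max_additional: int) -> int:
    """WEIGHT_ADDITIONAL_POSTCODES * min(MAX_ADDITIONAL_POSTCODES, postcodes - 1)."""
    if weight == 0 or postcode_count <= 1:
        return 0
    return weight * min(max_additional, postcode_count - 1)


# "Boston" with 15 postcodes, MAX_ADDITIONAL_POSTCODES 10, WEIGHT_ADDITIONAL_POSTCODES 10.
boston = additional_postcode_weight(10, 15, 10)  # 10 * min(10, 14) = 100
evidence = [("US", 50), ("UK", 40), ("US", boston)]
print(route_record(evidence, {"US": 70, "UK": 70}))  # ('US', {'US': 150, 'UK': 40})
```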

The organization of these parameters indicates to which country a parameter value applies. An example of this as well as additional detail about constructing a rule file is provided with the rule file example.
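The block structure can be illustrated with the following Python sketch, which groups rules-file entries by country. It is an illustration under stated assumptions, not the Router's parser: lines are "KEYWORD value"; an asterisk in the first position marks a comment; entries before the first COUNTRY line are global (TRANSLATE_TABLE, for example, belongs before the country sections); and a COUNTRY line starts a new block that lasts until the next COUNTRY line. Values are kept as lists because keywords such as ALIAS can repeat. `parse_rules` is a hypothetical helper.

```python
from collections import defaultdict


def parse_rules(text: str):
    """Split a rules file into global entries and per-country parameter blocks.

    Returns (global_entries, countries); each maps KEYWORD -> list of values.
    """
    global_entries = defaultdict(list)
    countries = {}
    current = global_entries  # entries before the first COUNTRY line
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("*"):
            continue  # blank line, or comment with '*' in the first position
        fields = raw.split(maxsplit=1)
        keyword = fields[0].upper()
        value = fields[1].strip() if len(fields) > 1 else ""
        if keyword == "COUNTRY":
            # Every following entry belongs to this country until the next COUNTRY line.
            current = countries.setdefault(value.upper(), defaultdict(list))
        else:
            current[keyword].append(value)
    return global_entries, countries


rules = """TRANSLATE_TABLE cp1250
* Spain
COUNTRY ES
COUNTRY_NAME SPAIN
ALIAS do,de,da
ALIAS san,santa,saint
COUNTRY CA
MIN_WEIGHT 70
"""
global_entries, countries = parse_rules(rules)
print(dict(global_entries))       # {'TRANSLATE_TABLE': ['cp1250']}
print(countries["ES"]["ALIAS"])   # ['do,de,da', 'san,santa,saint']
```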

Parameter Value Description

ADD_ENDING word endings Allows you to define word endings that indicate the data is from a particular country. These are added to the same lookup table as entries in the GGT that start with a ‘-’. If an ending is found, the WEIGHT_FOUND_ENDINGS value is added. For example: ADD_ENDING shire WEIGHT_FOUND_ENDINGS 50 In this example, if the ending “shire” is found, a weight of 50 is added.


ADDITIONAL_WEIGHT word, numeric Assigns additional weight to a particular word in a country. (Useful if the same city name appears in multiple countries and the current rules keep assigning the record to the wrong country.) For example: ADDITIONAL_WEIGHT Barcelona,20 “Barcelona” is a major city in Spain, but a minor name in the United Kingdom. Using this parameter lets you direct the record to Spain when there is more than one possibility.

ALIAS user-defined words This parameter tries different variations of a particular word. If one of the listed values is matched, the program tries substituting each of the other entries when looking up the string in the GGT. The number of entries in the alias list indicates how many attempts are made to find the maximum weight for a particular record. This is useful in a country such as Portugal, where the name of a city might have one variation but the data was entered with another. For instance, the city name “Aguiar da Beira”, if entered as “Aguiar de Beira”, would still be found. An alias must start and end with either a space or a hyphen. For example: ALIAS do,de,da ALIAS san,santa,saint

CLIENT_WDPAT_FNAME file name File name that contains client exceptions to the GGT. Only city and neighborhood entries that look similar to the following are used by the Router:
'BIGGINHILL' GEOG BEG ATT=NEIGHBORHOOD, RECODE='BIGGIN HILL'
'BEWFLEET' GEOG ATT=CITY,RECODE=BENFLEET
'NCPINKHILL_' GEOG ATT=CIT-CHG,RECODE='NCPINK HILL'


COUNTRY country code Two-letter code defined by the user to identify the matched country.

Each time this parm appears in the rule file, all successive parameters in the “parameter block” will be associated with this country, until the next COUNTRY parameter is found (see rules file example).

Example

COUNTRY ES
COUNTRY_NAME SPAIN
POSTCODE_MASK LC,NNNNN
WEIGHT_NO_POSTCODE_MATCH 0
WEIGHT_NO_STATE_MATCH 0
IGNORE_CITY_NAME VIA RUA SAO PONTE

COUNTRY_CODE user-defined value(s) Client-specific value representing the country (the expected values in the COUNTRY_CODE_FIELD field). This parameter holds the character string that is used internationally to identify this country. The position of the country code on the record is defined by the DDL. If found, the value defined in WEIGHT_COUNTRY_CODE is added to the total weight. This parameter can have multiple values. In this example, if the code usa is found, a weight of 50 is added: COUNTRY_CODE usa WEIGHT_COUNTRY_CODE 50


COUNTRY_NAME country name(s) If a name listed here is found, the weight value of WEIGHT_COUNTRY_NAME is applied. The value in WEIGHT_COUNTRY_NAME is added to the total if the full name of the country is found anywhere in the record under either of the following conditions: at the beginning or end of the data, or preceded and followed by a space or comma. If a country name is likely to occur in the data for other reasons, it is best to set this weight to 0 (for example, if “Canada” is likely to be a street name in Canada itself, set it to 0). You can have multiple names for the country if there are multiple languages within the country. For example: COUNTRY_NAME ENGLAND WALES SCOTLAND

COUNTY_PREFIX county prefixes Contains county prefix name(s). In some places, the Global Geography Table has state names entered as counties, with the word “County” in the table. Very often the data will not contain this word; this parameter allows a match even if the word is not present. For example, in Ireland, the city of Sutton is in County Dublin. However, data may come in as “Sutton Dublin” rather than “Sutton County Dublin”. For example: COUNTY_PREFIX county

DELIM delimiter Specifies the character to use as a delimiter between character fields on the input file. Output files will always be created as fixed-length record files.

DROP_PERIOD Y When set to ‘Y’, performs the following: If a period is followed by a space, the period is removed before processing. If a period is in the middle of a word, the period is replaced with a space. For example: DROP_PERIOD Y


DUPLICATE_CITY_NAME user-defined values Lets you override the order in COUNTRY_LIST for specific cities that appear in multiple countries. The names of all cities that appear in more than one city table have been extracted to this table. Entries use the following format: City name, country1, country2 Countries are listed in order of likelihood for that city. The Router returns a weight only for the country ranked highest in the DUPLICATE_CITY_NAME entry. For example: DUPLICATE_CITY_NAME Toronto,CA,US,AU COUNTRY_LIST us,ca

If the input record contained just “Toronto”, with no state or province identifier, and no postcode mask, then the Router would only give weight to Canada, even though the US is first in the country list.

GEOG_CITY_CHG_RECODE Y When a city name in the input data matches a city-change entry in the Global Geography Table, the recoded city name from the table entry is used. This parameter turns off this service so that the original city name remains on the record. For example: GEOG_CITY_CHG_RECODE Y


GEOG_PREFIX user-defined value Enables the following: extracting entries from the Global Geography Table, removing prefixes, and creating a lookup table entry that changes data into the full name with the prefix. For example, in Canada there are many cities with French names beginning with L’, such as L’Ardoise. Sometimes the prefix is not included. By defining a list of prefixes and creating the prefix table, the program automatically converts Ardoise into L’Ardoise and matches it to the GGT entry. If there are multiple entries, separate them with commas. For example: GEOG_PREFIX de , l’ Here, the prefix “de” includes a space, which means the program also finds any usage of it without the space. So if the GGT contained “DE WINTON”, the program would match “DEWINTON” in the data, but not “WINTON”. If the GGT contained “L’ARDOISE”, the program would match “ARDOISE” in the data. If the prefix includes a space, the program drops the space and attempts a match; if it does not, the program drops the whole prefix and attempts a match. If the prefix “de” is added and then the entire phrase is looked up, the program would find “DE LERY” before “LERY”. Both are city names in Canada, but LERY is in , and DE LERY is in .

GEOG_RECODE Y Allows recodes from the Global Geography Table to be applied to the input string. The program uses recodes and phrases found in the Global Geography Table. For example: GEOG_RECODE Y If the Global Geography Table contains 'ONT' RECODE 'ON', then whenever the word ‘ont’ is found in the input, it is replaced with ‘on’.


IGNORE_CITY_NAME user-defined value Lets you add words that will not be considered city names unless there is a supporting postal code or state name. For example: IGNORE_CITY_NAME de rua A good example is the Brazilian section of the GGT, which has cities named “De” and “Rua”. These are very common words in street addresses in Brazil.

LOAD_FILES Y Enables the Global Geography Table to be loaded into memory at program startup. Typically NOT used in batch.

MATCH_HYPHEN_SPACE Y When set to Y, this parameter allows a match between an entry that was entered in the Global Geography Table with hyphens and data that contains spaces (and the reverse). For example: MATCH_HYPHEN_SPACE Y If the Global Geography Table contains “VILLARS-SUR-GLANE”, the string “Villars Sur Glane” will match it. This eliminates many table recodes.

MAX_ADDITIONAL_POSTCODES numeric Contains the maximum number of postal codes to add to the weight. This parameter limits the amount of weight added when a city name has been identified. It works in conjunction with the parameter WEIGHT_ADDITIONAL_POSTCODES; see the description of WEIGHT_ADDITIONAL_POSTCODES for an explanation of how these two rule parameters interact. For example: MAX_ADDITIONAL_POSTCODES 50

MIN_WEIGHT numeric Sets the minimum weight value that must be accumulated for the record to be assigned to a country. For example: MIN_WEIGHT 70 If the total weight for a country is less than this value, a match will not be made.


RECODE user-defined value This parameter acts like a recode in the Global Geography Table: if the program finds the first value, it changes it to the second. For example: RECODE S.,Santa In this case, if ‘S.’ is found, it is changed to ‘Santa’.

RECODE_ALL user-defined value Performs recodes for every country. For example: RECODE_ALL co antrim This example ensures that a value like “CO ANTRIM” is not sent to the US (because the program might think that “co” means .)

SPACE_AFTER_PERIOD Y When set to ‘Y’, if the program encounters a period in the text and the next character is not a space, this parameter inserts a space before the string is matched to the lookup tables. This allows text such as “S.Diego” to be converted to “S. Diego”. If “S.” is recoded to “San”, the program then gets a match in the Global Geography Table. For example: SPACE_AFTER_PERIOD Y

STATE_LONGNAME user-defined value Contains a table of long names for states, so that it is not necessary to perform a recode to convert the long names into abbreviations for lookups. For example: STATE_LONGNAME WA,,wash Ma,,mass ON,ont, Certain state names, such as Washington, may also occur in other places. (The program may convert Washington into WA, turning “Washington DC” into “WA DC”, which is not found anywhere.) Values contained here are not converted, only looked up, when trying to find a state match.


STATE_MATCH numeric If the program does not get a full match on a state name, this parameter allows a partial match to be considered sufficient. The value entered here is the number of characters at the beginning of a state name that must match for it to be considered a match. For example: STATE_MATCH 4 In this example, if the state was “MASSACHUSETTS”, the STATE_MATCH value of 4 would allow a match on “MASS.”

STATE_ENDING word-ending value Allows a match if all the characters at the end of a word match the ending specified here. For example: STATE_ENDING shire This is useful in places like the UK, where state names might end in “shire”, such as “Aberdeenshire”.

STATE_POSTCODE_RANGE postal code ranges Contains a list of postal code ranges for a particular country. For example, if you had data from Billerica and wanted to ensure that it went to the US, you could add a range of postcodes (01821 - 01899). If a postcode in that range is contained on the record, additional weight is assigned during the routing process. The format is: STATE_POSTCODE_RANGE state, low range value, high range value For example: STATE_POSTCODE_RANGE ma,01821,01899

STREET_TYPE Contains a list of names that are street identifiers. If the word preceding the street type looks like a city name, then the resulting city match is ignored. When working with multiple countries, it is best to put all street types for the various languages in this list, so the program knows which country it belongs to. For example: STREET_TYPE R,road R,st L,rue de The identifiers (L = left and R = right) indicate the position for the identifier, relative to street name. For example: "Rue" always comes to the left of the street name, and "road" comes to the right.


TRANSLATE_CHAR user-defined values Converts characters from one form to another, using the following syntax: TRANSLATE_CHAR ab, cd, ef Here, ‘a’ is translated to ‘b’, ‘c’ to ‘d’, and ‘e’ to ‘f’. For example: TRANSLATE_CHAR ØO,AE This translates ‘Ø’ to ‘O’ and ‘A’ to ‘E’.

TRANSLATE_TABLE user-defined values Contains the name of a translation table to convert accented characters into non-accented ones. Many of the internal city-state tables are entered without accents. This table allows the text to be converted (for each country) so that accented characters will match up. This table should be entered BEFORE the country sections of the rules file. Within a country section, use USE_TRANSLATE_TABLE to reference a particular table. For example: TRANSLATE_TABLE cp1250 Ç,C

USTABAUX file name Indicates the path and file name of the USTABAUX file, which contains city/state/zipcode information that the router uses to get more accurate routing of US data. Optional. For example: USTABAUX \test\router\USTABAUX

USE_TRANSLATE_TABLE file name Invokes the designated translation table. For example: USE_TRANSLATE_TABLE cp1250

WEIGHT_ADDITIONAL_CITY_MATCH numeric For city name matches, this value is added if another city match is made for a particular entry in a different country. The result is added to the total weight, giving more weight for more than one city match. For example: WEIGHT_ADDITIONAL_CITY_MATCH 10 In this case, if the city of ‘Portsmouth’ got a match in the US as well as the UK, a value of 10 would be applied twice.


WEIGHT_ADDITIONAL_POSTCODES numeric This value applies to any city name with more than one postcode. It is multiplied by the lesser of MAX_ADDITIONAL_POSTCODES and the number of postal codes for this entry in the GGT minus one. For example, if “Boston” had 15 postal codes and MAX_ADDITIONAL_POSTCODES is set to 10, subtract 1 from 15, giving 14. Since 14 is greater than 10, the weight is multiplied by 10 (with a weight of 10, 100 is added). If the weight is set to zero, this calculation is not used. For example: WEIGHT_ADDITIONAL_POSTCODES 10

WEIGHT_ADDITIONAL_WORDS_IN_CITY numeric Applies to city name matches with more than one word in the city name, where the individual words are fairly common. The number of words in the city name minus one is multiplied by this weight value, so a city like “Barrow in Furness” in the UK can have a greater impact (three words, so with a weight of 15, 2 multiplied by 15 gives 30). For example: WEIGHT_ADDITIONAL_WORDS_IN_CITY 15

WEIGHT_ATT_CITY numeric Adds this weight for all entries in the Global Geography Table that have attribute assignments of “att=city”. This is useful for foreign countries whose data may not be represented in the native-language spelling. For instance, in Italy the capital is “Roma”, but the name could be entered with the English spelling “Rome”. Because “Rome” is in the Global Geography Table, a record containing it would be categorized as being in Italy. For example: WEIGHT_ATT_CITY 70

WEIGHT_ATT_STATE numeric Adds this weight for all entries in the Global Geography Table that have attribute assignments of “att=state”. This weight is very similar to WEIGHT_ATT_CITY, except that it uses state names instead of city names. For example: WEIGHT_ATT_STATE 50

WEIGHT_COUNTRY_CODE numeric Adds this weight if the country code for this record is in the appropriate column. This is typically a large value that exceeds the threshold, since a country code, if present, is probably correct. For example: WEIGHT_COUNTRY_CODE 100

WEIGHT_COUNTRY_NAME numeric Adds this weight if the country name appears anywhere in the record. This value should not exceed the threshold. For example: WEIGHT_COUNTRY_NAME 25

WEIGHT_COUNTRY_NAME_LAST numeric Adds this weight if the country name is the last item in the input data. If the country name is in a field, place that field LAST in the list of fields to pick up. For example: WEIGHT_COUNTRY_NAME_LAST 25

WEIGHT_FOUND_ENDINGS numeric Adds this weight if an ending from the GGT, or from the ADD_ENDING parameter set for a particular country in your router rules file, is found. For example, if the ending -weg is in the table and the word “Arborweg” is in the record, the weight is added. For example: WEIGHT_FOUND_ENDINGS 60

WEIGHT_LEVEL1 numeric Adds this value if there is a match at the state/province/county level entry in the GGT.

WEIGHT_LEVEL2 numeric Adds this value if there is a match at the city level entry in the GGT.

WEIGHT_LEVEL3 numeric Adds this value if there is a match at the locality/neighborhood level entry in the GGT.

WEIGHT_LEVEL4 numeric Adds this value if there is a match at the dependent locality/secondary neighborhood level entry in the GGT.

WEIGHT_NO_POSTCODE_MATCH numeric Subtracts this value if there is a match on the city, but not on the postal code.

WEIGHT_NO_STATE_MATCH numeric Subtracts this value if there is a match on the city, but not on the state/province.

WEIGHT_POBOX numeric Adds this weight if there is a word in the record that refers to a post office box for that country. These items are in the Global Geography Table with an attribute of “ATT=PBOX”. For example: WEIGHT_POBOX 30

WEIGHT_POSTCODE_MASK numeric Adds this weight when there is a match on the position and pattern described in the POSTCODE_MASK parameter value. For example: WEIGHT_POSTCODE_MASK 50

WEIGHT_POSTCODE_MASK_ANYWHERE numeric Adds this weight if the POSTCODE_MASK is matched anywhere in the record; the entire record is searched for the postal code pattern. This is useful in countries like Canada or the UK, where postal codes can be very distinctive. For example: POSTCODE_MASK lc,nnnnn,nnnnn-nnnn WEIGHT_POSTCODE_MASK_ANYWHERE 75

WEIGHT_SECONDARY_GEOG_MATCH numeric Adds this weight if there is a secondary match at a particular level, for example, if the city is matched and then the state or postal codes associated with this city are also matched. For example, suppose the input data is “Joe Customer 184 Main St Billerica Ma 01821” and WEIGHT_SECONDARY_GEOG_MATCH is 100. The program looks this up in the table and finds “Billerica”. When checked further, it finds a match on “MA” and also on postal code “01821”. This adds 2 x 100 to the weight value for the state and postal code matches.

WEIGHT_STATE_POSTCODE_RANGE numeric Adds this weight if the Router can't find an exact postcode match, but the postcode is in the correct range for the state from the STATE_POSTCODE_RANGE table (created in the rules file). For example, with WEIGHT_STATE_POSTCODE_RANGE 10 and the input “Billorica ma 01821” (normally Billerica MA 01821), there is no match on “Billorica”, but since the range for Massachusetts contains the sectional center 018, this weight is added.

WEIGHT_SYNONYM numeric If a synonym defined in the WDPAT file (which is in the GGT) is found, this weight is added to the overall total.

WEIGHT_THREE_WORDS_IN_CITY numeric Used for any matches on city names having three or more words. The number of words in the city name minus one is multiplied by the value of the weight. If the city name is “Caldas de Sao Jorge”, there are four words; four minus one leaves three. With a weight of 10, this is multiplied by three, giving this city a weight of 30. If the weight is set to zero, this factor is not used in the calculation. For example: WEIGHT_THREE_WORDS_IN_CITY 1000

WEIGHT_THRESHOLD numeric This is a user-defined value that the total computed weight is compared against. When the total weight is greater than or equal to this value, no more data comparison is performed and the country of origin is determined. For example: WEIGHT_THRESHOLD 100 In this case, if the total computed weight equals 100 or greater, processing stops and the country of origin is determined.

WRITE_NOMATCH Y Allows records for identified countries to be written to the common output file defined in the NOMATCH_FNAME parameter instead of to the file for the identified country. The Router still identifies the country of origin but keeps all records in one output file; this is useful if the country can be positively identified but no parser exists for that country. For example: WRITE_NOMATCH Y
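The weight parameters above combine by simple arithmetic. The following Python sketch restates several of the worked examples (Boston, Caldas de Sao Jorge, Billerica/Billorica) as functions so the numbers can be checked. It is a hypothetical illustration, not the cfrouter implementation: the function names are invented, the STATE_POSTCODE_RANGE entry for MA ("010" to "027") is made up for the example, and it assumes that two-word city names use WEIGHT_ADDITIONAL_WORDS_IN_CITY while names of three or more words use WEIGHT_THREE_WORDS_IN_CITY, and that WEIGHT_THRESHOLD is compared against a running total kept per candidate country.

```python
# Hypothetical sketch of Router weight arithmetic; not the cfrouter implementation.


def additional_postcodes_weight(weight, postcode_count, max_additional_postcodes):
    """WEIGHT_ADDITIONAL_POSTCODES x min(MAX_ADDITIONAL_POSTCODES, postcodes - 1)."""
    if weight == 0 or postcode_count <= 1:
        return 0  # zero weight, or a city with a single postcode, adds nothing
    return weight * min(max_additional_postcodes, postcode_count - 1)


def words_in_city_weight(city_name, weight_two_words, weight_three_words):
    """(number of words - 1) x the applicable words-in-city weight."""
    words = len(city_name.split())
    if words < 2:
        return 0
    # Assumption: two-word names use WEIGHT_ADDITIONAL_WORDS_IN_CITY and
    # names of three or more words use WEIGHT_THREE_WORDS_IN_CITY.
    weight = weight_two_words if words == 2 else weight_three_words
    return (words - 1) * weight


def secondary_geog_weight(tokens, states, postcodes, weight):
    """One WEIGHT_SECONDARY_GEOG_MATCH per secondary item found.

    Assumes the city itself has already matched; states and postcodes are the
    upper-case values associated with that city in the table.
    """
    found = {token.upper() for token in tokens}
    matches = int(bool(found & states)) + int(bool(found & postcodes))
    return matches * weight


def state_postcode_range_weight(state, postcode, ranges, weight):
    """Weight if the 3-digit sectional center lies in a (hypothetical) state range."""
    prefix = postcode[:3]
    for low, high in ranges.get(state.upper(), []):
        if low <= prefix <= high:  # zero-padded strings compare like numbers
            return weight
    return 0


def first_country_over_threshold(evidence, threshold):
    """evidence: (country, weight) pairs in evaluation order.

    Returns the first country whose running total reaches WEIGHT_THRESHOLD,
    or None if none does (the Router's fallback is not modeled here).
    """
    totals = {}
    for country, weight in evidence:
        totals[country] = totals.get(country, 0) + weight
        if totals[country] >= threshold:
            return country  # no further comparison once the threshold is reached
    return None


if __name__ == "__main__":
    print(additional_postcodes_weight(10, 15, 10))                # Boston: 10 x 10
    print(words_in_city_weight("Caldas de Sao Jorge", 15, 10))    # (4 - 1) x 10
    record = "Joe Customer 184 Main St Billerica Ma 01821".split()
    print(secondary_geog_weight(record, {"MA"}, {"01821"}, 100))  # 2 x 100
    print(state_postcode_range_weight("ma", "01821", {"MA": [("010", "027")]}, 10))
    print(first_country_over_threshold([("US", 60), ("UK", 30), ("US", 50)], 100))
```

Keeping each parameter in its own function mirrors the rules file, where each weight can be tuned or set to zero independently.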

Running the Global Data Router on UNIX and 32-bit PCs

To execute the cfrouter program, use the following command syntax:

cfrouter -pf parm_file_name -pe echo_file_name

where:

cfrouter Name of the driver program

-pf Keyword indicating that the driver parameter file follows

parm_file_name Name of the driver parameter file

-pe Keyword indicating that the parameter echo file follows

echo_file_name Name of the parameter echo file, which displays any parameter processing errors in the program listing. Optional.

Example

cfrouter -pf ..\parms\pfrouter.par -pe ..\data\echo

IBM Mainframe Execution

Running the Router on the IBM Mainframe requires special attention to the output datasets. A single parameter, OUT_FNAME, defines the root name of the output dataset; the country identifier is then appended to this root name to make a unique dataset name. When creating the JCL, carefully define DD statements for the datasets that hold the output data for each country.

The following sample JCL uses two country output datasets, assigned by the DD statements “ROUTOTUS” and “ROUTOTCA”. These DD names correspond to the value of the OUT_FNAME parameter entry. For the DD names given here:

OUT_FNAME would need to contain ROUTOT*.

The DD statement “ROUTOUT2” would correspond to NOMATCH_FNAME, designed to catch all records that did not meet the US or CA criteria.

Sample JCL for CFROUTER

//ROUTMAIN EXEC PGM=ROUTMAIN,REGION=0M,PARM='/-PF PARMFILE'
//STEPLIB  DD DISP=SHR,
//            DSN=UDP01.TRILNE.MOD.ROUTER.LOADLIB
//         DD DSN=CEE.SCEERUN,DISP=SHR
//         DD DSN=CEE.SCEERUN2,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//*
//PARMFILE DD DISP=SHR,
//            DSN=UDP01.ROUTER.PARMLIB(PFRTRPRM)
//PFRTRRLS DD DISP=SHR,
//            DSN=UDP01.ROUTER.PARMLIB(PFRTRRLS)
//ROUTIN   DD DISP=SHR,
//            DSN=UDP01.ROUTER.INPUT.FILE
//*
//ROUTOTUS DD UNIT=TRILDSK,
//            DISP=(NEW,CATLG,DELETE),
//            DCB=(RECFM=FB,LRECL=1000,BLKSIZE=0),
//            SPACE=(TRK,(15,15),RLSE),
//            DSN=UDP01.ROUTER.USOUTPUT
//ROUTOTCA DD UNIT=TRILDSK,
//            DISP=(NEW,CATLG,DELETE),
//            DCB=(RECFM=FB,LRECL=1000,BLKSIZE=0),
//            SPACE=(TRK,(15,15),RLSE),
//            DSN=UDP01.ROUTER.CAOUTPUT
//ROUTOUT2 DD UNIT=TRILDSK,
//            DISP=(NEW,CATLG,DELETE),
//            DCB=(RECFM=FB,LRECL=1000,BLKSIZE=0),
//            SPACE=(TRK,(15,15),RLSE),
//            DSN=UDP01.ROUTER.OUTPUT2
//*
//ROUTIDDL DD DISP=SHR,
//            DSN=UDP01.ROUTER.DDLLIB(JDIXDDL)
//ROUTLOG  DD UNIT=TRILDSK,
//            DISP=(NEW,CATLG,DELETE),
//            DCB=(RECFM=FB,LRECL=132,BLKSIZE=0),
//            SPACE=(TRK,(15,5),RLSE),
//            DSN=UDP01.ROUTER.ROUTLOG
//TRILMSGS DD DUMMY
//GLBGEOG  DD DISP=SHR,
//            DSN=UDP01.TRIL.TABLES.GLOBGEOG
/*
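Because each country's output DD name is built from the OUT_FNAME root plus the country identifier, a missing DD statement is easy to overlook. A minimal Python sketch of that naming rule, useful for checking a JCL deck before submitting it. Assumptions: the trailing asterisk in OUT_FNAME marks where the country identifier is appended, and output_ddname and missing_dd_statements are hypothetical helper names. The 1 to 8 character limit is the standard JCL limit on DD names.

```python
MAX_DDNAME_LENGTH = 8  # JCL DD names are 1 to 8 characters long


def output_ddname(out_fname, country_code, nomatch_fname):
    """Return the DD name a record would be written to (hypothetical helper).

    out_fname: OUT_FNAME value such as "ROUTOT*"; the asterisk is assumed to
    mark where the country identifier is appended.
    country_code: identified country (for example "US"), or None if no match.
    nomatch_fname: DD name given in NOMATCH_FNAME.
    """
    if country_code is None:
        ddname = nomatch_fname
    else:
        ddname = out_fname.rstrip("*") + country_code.upper()
    if not 1 <= len(ddname) <= MAX_DDNAME_LENGTH:
        raise ValueError(f"{ddname!r} is not a valid DD name length")
    return ddname


def missing_dd_statements(out_fname, nomatch_fname, countries, jcl_ddnames):
    """List the DD names the Router would need that the JCL does not define."""
    needed = {output_ddname(out_fname, code, nomatch_fname) for code in countries}
    needed.add(nomatch_fname)
    return sorted(needed - set(jcl_ddnames))


print(missing_dd_statements("ROUTOT*", "ROUTOUT2", ["US", "CA", "NL"],
                            ["ROUTOTUS", "ROUTOTCA", "ROUTOUT2"]))
```

With the DD statements in the sample JCL, routing Netherlands records as well would require an additional ROUTOTNL DD statement.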

Sample Rules File

Within the rules file, note that parameters defining file name paths appear before country-specific information. Each country has its own block of parameters, and blocks are separated by a blank line. Each block begins with the COUNTRY parameter. Parameter names are not case-sensitive.

A comment line can be added by placing an asterisk "*" in the first position of the line (for example: * This is a comment line.).

STREET_TYPE R,STREET R,PARK R,ROAD R,ST R,RD R,AVE R,AVENUE L,RUE DE
LOAD_FILES Y
GLOBAL_GEOG_FNAME ../tables/GLOBGEOG
MIN_WEIGHT 10
GEOG_RECODE Y
WEIGHT_ADDITIONAL_CITY_MATCH 10
WEIGHT_ADDITIONAL_WORDS_IN_CITY 10
WEIGHT_ADDITIONAL_POSTCODES 10
WEIGHT_THRESHOLD 1000
WEIGHT_SECONDARY_GEOG_MATCH 1000
WEIGHT_FOUND_ENDINGS 50
WEIGHT_POBOX 10
WEIGHT_LEVEL1 50
WEIGHT_LEVEL2 100
WEIGHT_LEVEL3 100
WEIGHT_COUNTRY_NAME 500
WEIGHT_COUNTRY_CODE 2000
WEIGHT_POSTCODE_MASK 250
DROP_PERIOD Y
SPACE_AFTER_PERIOD Y
MAX_ADDITIONAL_POSTCODES 10

COUNTRY NL
COUNTRY_NAME NETHERLANDS
*WEIGHT_NO_POSTCODE_MATCH 0
*WEIGHT_NO_STATE_MATCH 0
IGNORE_CITY_NAME LE DEM
RECODE HARLAM,HAARLAM 'S-GRAVENHAGE,S-GRAVENHAGE 'S-GRAVENDEEL,S-GRAVENDEEL 'S-HERTOGENBOSCH,S-HERTOGENBOSCH

COUNTRY ES
COUNTRY_NAME SPAIN
POSTCODE_MASK LC,NNNNN
*WEIGHT_NO_POSTCODE_MATCH 0
*WEIGHT_NO_STATE_MATCH 0
IGNORE_CITY_NAME VIA DA RUA SAO PONTE

COUNTRY UK
COUNTRY_NAME ENGLAND WALES SCOTLAND UK
ADD_ENDING SHIRE
MATCH_HYPHEN_SPACE Y
IGNORE_CITY_NAME PARK STREET COUNTY IRELAND
WEIGHT_NO_POSTCODE_MATCH 0
STATE_ENDING FR COUNTRY_NAME FRANCE GEOG_PREFIX L' DE POSTCODE_MASK LC,NNNNN IGNORE_CITY_NAME CHEMIN MOULIN
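The layout rules described for this file (file name parameters first, one block per country that begins with COUNTRY, blank lines between blocks, case-insensitive parameter names, and comments marked by an asterisk in the first position) are regular enough to read with a short script, for example to review which parameters each country block sets. This Python sketch is a hypothetical reader, not the Router's own parser; it keeps every value as raw text and does not validate parameter names or values.

```python
def parse_rules(text):
    """Split rules-file text into global parameters and per-country blocks.

    Returns (global_parms, countries); each maps an upper-cased parameter
    name to the list of raw values given for it, in file order.
    """
    global_parms, countries = {}, {}
    current = None  # parameter dict of the country block being read, if any
    for raw in text.splitlines():
        if not raw.strip():
            current = None  # a blank line ends the current country block
            continue
        if raw.startswith("*"):
            continue  # an asterisk in the first position marks a comment line
        parts = raw.split(None, 1)
        name = parts[0].upper()  # parameter names are not case-sensitive
        value = parts[1].strip() if len(parts) > 1 else ""
        if name == "COUNTRY":
            current = countries.setdefault(value.upper(), {})
        elif current is not None:
            current.setdefault(name, []).append(value)
        else:
            global_parms.setdefault(name, []).append(value)
    return global_parms, countries


sample = """LOAD_FILES Y
weight_threshold 1000

COUNTRY NL
COUNTRY_NAME NETHERLANDS
*WEIGHT_NO_STATE_MATCH 0
IGNORE_CITY_NAME LE DEM
"""
global_parms, countries = parse_rules(sample)
print(global_parms["WEIGHT_THRESHOLD"], countries["NL"])
```

Values are kept as lists because some parameters (RECODE, for instance) may reasonably appear more than once in a block.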

Trillium Software System™ Batch User’s Guide Sample Log File Output 3-29

Sample Log File Output

The log file details the decisions that were made for each record processed. This data can be used to analyze the behavior of the Router program and then modify the rules file entries to better fit the particular data being processed. For example:

2 NL - GLOBAL_GEOG_FNAME cat=TABCIT wgt=100 "ROTTERDAM", total 100
2 NL - WEIGHT_NO_STATE_MATCH cat=TABCIT wgt=-50 "ROTTERDAM", total 50
2 NL - WEIGHT_NO_POSTCODE_MATCH cat=TABCIT wgt=-30 "ROTTERDAM", total 20

where:

2 Record number

NL Country abbreviation (NL=Netherlands)

GLOBAL_GEOG_FNAME Rules parameter specified in the Rules file

cat=TABCIT Matched file

wgt=100 Weight value

ROTTERDAM Item matched

total 100 Running total weight for the record after this entry
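When reviewing a large log, it can help to load the weight lines into a script. A minimal Python sketch, assuming every weight line has exactly the shape of the sample lines above (record number, country, dash, parameter, cat=, wgt=, quoted item, total); lines of any other shape are skipped. It also assumes the total is a running total per record and country, which is consistent with the sample (100, then 100 - 50 = 50, then 50 - 30 = 20). The function names are hypothetical.

```python
import re

# Pattern for weight lines shaped like the sample log output.
LOG_LINE = re.compile(
    r'^(?P<record>\d+)\s+(?P<country>\S+)\s+-\s+(?P<parm>\S+)\s+'
    r'cat=(?P<cat>\S+)\s+wgt=(?P<wgt>-?\d+)\s+"(?P<item>[^"]*)",\s+'
    r'total\s+(?P<total>-?\d+)\s*$'
)


def parse_log_line(line):
    """Return a dict for a weight line, or None if the line has another shape."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    for key in ("record", "wgt", "total"):
        entry[key] = int(entry[key])
    return entry


def check_running_totals(entries):
    """True if every total equals the previous total plus wgt (per record and country)."""
    running = {}
    for entry in entries:
        key = (entry["record"], entry["country"])
        if running.get(key, 0) + entry["wgt"] != entry["total"]:
            return False
        running[key] = entry["total"]
    return True


sample = [
    '2 NL - GLOBAL_GEOG_FNAME cat=TABCIT wgt=100 "ROTTERDAM", total 100',
    '2 NL - WEIGHT_NO_STATE_MATCH cat=TABCIT wgt=-50 "ROTTERDAM", total 50',
    '2 NL - WEIGHT_NO_POSTCODE_MATCH cat=TABCIT wgt=-30 "ROTTERDAM", total 20',
]
entries = [parse_log_line(line) for line in sample]
print(check_running_totals(entries))
```

Grouping the parsed entries by parameter name is one way to see which rules file weights dominate the routing decisions.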

Testing the Router

The best way to test the Router is to look at the log file, which tells you why a record was routed to a certain country. Setting the parameter WEIGHT_COUNTRY_CODE to 1 causes the record to go through the rest of the checking to determine the country, because the country code alone then no longer reaches WEIGHT_THRESHOLD.

At the end, if the country found in the country_code field does not match the country it was routed to, a log message with the tag “COUNTRY_CODE_FIELD” is put in the log file. This is very useful to see which records got sent to the wrong place.

Global Data Router Error Messages

Table 3.2, “CFROUTER Error Messages”, describes the messages that might be returned by the program in the event of an error. All messages are displayed to STDERR.

Table 3.2 CFROUTER Error Messages

Message Description

Opening global geog table The GGT defined in GLOBAL_GEOG_FNAME is present but incorrect. Check the path, file name and access rights.

reading global geog table The GGT defined in GLOBAL_GEOG_FNAME is present but incorrect. Check the path, file name and access rights.

STREET_TYPE must have two portions; one is missing A street type or a position indicator has not been specified in STREET_TYPE of the Rules file.

opening tabcit file The GGT parameter is present but incorrect. Check the path, file name and access rights.

opening client wdpat file A table is defined in CLIENT_WDPAT_FNAME, but cannot be opened. User does not have permission, or file is corrupt.

Country in list, but not in rules file The specified code in COUNTRY LIST does not exist in the rules file.

no INP_DDL defined An input DDL file and/or record name have not been specified in the parameter INP_DDL. Check path and/or file name.

FIELDS parm value not found in DDL Field names to scan from the input data (for example, line_01, line_02) do not exist in the input DDL. Check field names and field order.

required parm missing A required parameter in pfrouter.par is missing.

Missing Switch A parameter in the rules file is missing its value (Y/N or 1/0).

Bad Switch A parameter in a country router rules file has been specified with an invalid switch value. The valid switches are Y/N or 1/0.

with opening output DDL The output DDL file is specified in the OUT_DDL parameter but is incorrect. Check path and/or file name and access rights.

Error opening input file Input file specified in INP_FNAME cannot be opened. User does not have permission or the file is corrupt.

Error opening parm file The required PFROUTER parameter file is present but incorrect. Check path and/or file name.

Parm name not recognized A value specified in one of the parameters of the country router rules file is incorrect.

Blank parameter value A required parameter in the pfrouter.par file is missing.

Cannot open log file The file specified in LOG_FNAME is present but cannot be opened; user does not have permission, or the file is corrupt.

Put no TABCIT defined with TRANSLATE_CHAR TRANSLATE_CHAR has been specified but the corresponding TABCIT has not been defined.

Country name too long Value in COUNTRY_NAME cannot exceed 49 characters.

Duplicate parm name entered The parameter has been specified twice.

Input DDL not entered An input DDL file and/or record name have not been specified in the INP_DDL. Check path and/or file name.

Error opening echo parm file A file has been specified in RULES_ECHO, but cannot be opened. Either the user does not have permission to do so, or the file is corrupt. Check the path and/or file name.

CHAPTER 4 Customer Data Parser

Your database of name and address data is often your most powerful link to your customers. Therefore, identification of customer/prospect names and addresses is a critical component of a quality database.

The Customer Data Parser (CDP) is a tool designed to identify and/or verify the components of free-floating or fixed-field data. The CDP identifies and verifies all name and address data.

The CDP is called as an external subroutine from a driver program that supplies it with name and address data. The CDP then returns the identified and verified result. The driver may range from an interactive data entry system to a high volume batch process. The parsing process is highly table driven to allow users to customize name and address identification to their specific requirements.

The tables are designed for easy comprehension and are easily updated. The data returned by the CDP is comprehensive and is applicable to a wide variety of uses. The software is written in the ANSI standard ‘C’ programming language and can be executed in numerous environments.

Customer Data Parsing Logic Flow

Name and address data is passed from a driver program to the CDP in a work area that is defined by a parameter file. This work area is called the Input Name and Address area (INA).

Parsing is accomplished in five major steps:

1. Assign all possible attribute(s) to the word/phrase, such as title, generation, apartment, first name, and so on. The first step is to isolate all words/phrases in the INA. Words/phrases are assigned all possible meanings via the Word/Phrase table, supplied via Table Maintenance. Words/phrases not specified in tables are assigned an intrinsic attribute (such as alpha or numeric).
2. Identify lines according to attribute weights and counts. Line identification is an iterative process that takes into account the following factors:
a. Known line definition
b. Position of the line relative to other lines
c. Number of meanings of each possible line type found
d. Scale weight of meanings
e. Commonality between lines
f. Geographic Directory verification
g. Overall context of all lines found
3. Assign final word/phrase attributes. Word/phrase attributes are assigned to those words/phrases whose meanings correspond with the identified line type. Lines are then processed by routines designed specifically for that line type. Name and street lines rely heavily on the Pattern table for verifying the contents of the respective lines. Information from geographic directories verifies and supplies additional information about a geographic area.
4. Generate output: CDP Repository (PREPOS). A comprehensive data block, called the Customer Data Parser Repository (PREPOS), is passed from the CDP back to the driver program. It consists of fixed-fielded character data including error codes, identification indicators, name information, street information, and geographic information.
5. Comprehension. While steps 1-4 are taking place, the Customer Data Parser keeps track of how much effort was needed to identify and verify components. Several mechanisms are used to evaluate and refine the results returned by the Parser. These include:
• The comprehension and confidence codes
• The PREPOS
• Review codes
• The PREPOS Review Group
• The Log and Display files

Parser Functional Capabilities

The four primary functions of the Customer Data Parser are to:
1. Identify lines of a name/address by line type.
2. Identify words/phrases on each line by type.
3. Prepare a name and address for matching.
4. Prepare a name and address for accurate presentation.

In order to accomplish its primary functions, the Customer Data Parser:

Comprehends single or multiple line input and input that is predefined by line type or word/phrase type.

Standardizes all name, secondary street, primary street, secondary geography, and primary geography data.

Identifies single and/or multiple names on a line, personal and business name forms and relationships between names found.

Supports comma-reversed names and verifies cities.

Offers flexibility through an externally edited set of tables.

Identifies words and phrases by their masks (such as zip code).

Enables identification of data that is embedded in a name/address.

Appends "missing" data when it can be reasonably inferred (such as a missing state name).

Enables the categorizing of any unique words and phrases (such as SIC codes, and business types).

Identifies special addresses by words or phrases (such as "Hold Mail").

Corrects misspellings and allows for recodes by using external tables.

Displays input to output results for the tuning process.

Collects run statistics in order to identify problem areas quickly.

Produces a comprehensive log that contains name, street, and city problems that can be analyzed to refine external tables.

Supports homonym processing using a primary, secondary, and tertiary Parser based on geography (for example, Mr. and señor).

Identifying Business versus Personal Names

The CDP determines the difference between personal and business names in the following manner. A name line is treated as a business name:
1. If a line contains at least one word of attribute BUSINESS. (Note that pattern processing has the final attribute assignment for a line, allowing for compound businesses and for personal and business names on one line.)
2. If the line begins with the same value as the city and is not further qualified.
3. If a line contains a word of attribute BUSINESS and does not contain a word of an attribute of a personal nature (FIRST, LAST, etc.).
4. If a word is in the possessive form (uses an apostrophe followed by the letter s).
5. If an unidentified word contains all consonants and is at least four characters long.
6. If a line does not pass name pattern validation; it will have a reject name form, but will be stored in the PREPOS business name field.
7. If more than one comma exists on a name line.
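Several of these rules can be checked mechanically, which can help when reviewing why names landed in the business name field. The following Python sketch covers rules 1, 4, 5 and 7 only; rules 2, 3 and 6 depend on city lookup and pattern validation, which are not modeled. The word_attributes dictionary is a hypothetical stand-in for the Word/Phrase table, and counting only A, E, I, O and U as vowels is an assumption of the sketch.

```python
VOWELS = set("AEIOU")  # assumption: Y is not counted as a vowel


def business_reasons(line, word_attributes):
    """Return the reasons (rules 1, 4, 5 and 7 only) a name line looks like a business.

    word_attributes: hypothetical stand-in for the Word/Phrase table, mapping an
    upper-case word to its set of attributes. An empty list means no rule fired.
    """
    reasons = []
    if line.count(",") > 1:
        reasons.append("more than one comma")                       # rule 7
    for raw in line.replace(",", " ").split():
        word = raw.upper()
        attrs = word_attributes.get(word, set())
        if "BUSINESS" in attrs:
            reasons.append(f"{word}: BUSINESS attribute")            # rule 1
        if word.endswith("'S"):
            reasons.append(f"{word}: possessive form")               # rule 4
        if (not attrs and len(word) >= 4 and word.isalpha()
                and not (set(word) & VOWELS)):
            reasons.append(f"{word}: unidentified, all consonants")  # rule 5
    return reasons


table = {"INC": {"BUSINESS"}, "MARY": {"FIRST"}, "SMITH": {"LAST"}}
print(business_reasons("JOE'S PIZZA INC", table))
print(business_reasons("MARY SMITH", table))
```

Returning the reasons rather than a yes/no flag makes it easier to compare the sketch's view of a line with the CDP's actual result.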

Comma-Reversed Names

Sometimes, names are entered in data files in LAST, FIRST sequence. The CDP has a built-in routine to handle this type of convention.

Example: SMITH, MARY M

Elements are looked up and attributes assigned based on input sequence of the elements. In the example, the initial pattern would be ALPHA ALPHA 1ALPHA. Mary would not get a FIRST attribute because the word is not at the ‘BEG’ (beginning) of the line.

If a single comma exists, the pattern element(s) to the left of the comma is temporarily moved to the end of the pattern element(s) to the right of the comma. Here, the pattern becomes ALPHA 1ALPHA ALPHA. This reversed pattern is given one chance against the Word/Pattern file for a pattern match and output token assignment.

In this case, the reversed pattern would hit an existing name pattern with a recode value of FIRST MIDDLE LAST. The system then assigns the parsed input elements to the appropriate PREPOS fields remembering the original element sequence.

If the reversed pattern does not match an existing pattern, the name is coded as an unknown name pattern and an entry is written to the log file showing the ORIGINAL input pattern and sequence of elements, prior to the reversal. If more than one comma exists on a name line, the CDP considers this a business name form. For example, the CDP identifies the name of this law firm as a business name:

SMITH, NICOLI, ROGERS, P.C.
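The reordering step can be sketched in Python. Each element of a name line is represented here as a (word, attribute) pair, with the comma as its own element; that representation and the function names are assumptions of the sketch. It shows only the reversal and the more-than-one-comma business rule; in the CDP the reversed pattern is then tried once against the Word/Pattern table, which is not modeled.

```python
def reverse_comma_name(elements):
    """Apply the comma-reversal rule to a name line.

    elements: (word, attribute) pairs in input order; a comma is its own element.
    Returns (status, elements), where status is "BUSINESS", "UNCHANGED" or "REVERSED".
    """
    commas = [i for i, (word, _) in enumerate(elements) if word == ","]
    if len(commas) > 1:
        return "BUSINESS", elements  # more than one comma: business name form
    if not commas:
        return "UNCHANGED", elements
    i = commas[0]
    # Move the elements left of the comma after the elements right of it.
    return "REVERSED", elements[i + 1:] + elements[:i]


def pattern(elements):
    """Attribute pattern of a line, ignoring commas."""
    return " ".join(attr for word, attr in elements if word != ",")


name = [("SMITH", "ALPHA"), (",", "COMMA"), ("MARY", "ALPHA"), ("M", "1ALPHA")]
print(pattern(name))                  # ALPHA ALPHA 1ALPHA
status, reordered = reverse_comma_name(name)
print(status, pattern(reordered))     # REVERSED ALPHA 1ALPHA ALPHA
```

Because the words travel with their attributes, the original element sequence can still be recovered when assigning PREPOS fields, as the text describes.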

Customer Data Parser Process Flow

Figure 4.1 Customer Data Parser (CFPRSDRV) Resources

Customer Data Parser Input

File Description

Input Name and Address Data (convout) Input name and address data to be parsed. This contains character name and address information in the format specified in the parameter file. This file is usually the output file from the Converter.

CDP Parameter File (pfparser.par) Text file containing control logic for Parser processing. Table file names, locations, control data values, and processing controls may be altered by the user within this file.

Driver Parameter File (pfprsdrv.par) Text file that contains the input and output file names, DDL file details, and other statistical output file names.

DDLs The DDLs used in the Parsing process include:
convout.ddl (describes the user input record)
prepos.ddl (describes the Parser return area, PREPOS)
parsout.ddl (describes the output record)
report.ddl (describes the output parser report; consists of the entire PREPOS, the original input record and the candidate codes)

Tables

Word/Phrase Table (CLTABDEF) The first of two encoded tables produced by Table Maintenance. This table includes standard definitions supplied by the Trillium Software System and user-defined definitions for words/phrases.

Pattern Table (CLTABPAT) The second of two encoded tables produced by Table Maintenance. This table contains standard patterns supplied by the Trillium Software System and user-defined patterns associated with each line type.

City Directory (xxTABCIT, where xx = country-specific code) This standardized, encoded city directory is provided by the Trillium Software System. It is used for address (city) verification and correction, and is based on a primary geography, secondary geography lookup (e.g., state, city). (Note: If the US Postal Geocoder is used, this file must be in sync with the postal directories.)

Auxiliary City Directory (USTABAUX, US only) This city name/postal code table is supplied by the Trillium Software System. It is used to confirm or supply a city name by referencing the postal code/city via a spelling algorithm. This file may optionally be loaded into memory for faster processing. (Note: If the US Geocoder is used, this file must be in sync with the postal directories.) To determine the correct city name, each record contains: five-digit ZIP code (5 bytes), flag (1 byte), state code, city name. The flag values are:
1 - Preferred mailing name
2 - Acceptable mailing name
9 - Unacceptable mailing name
When a city comes in and its flag is 9, the city name is changed to the one that has the same ZIP code but has the flag set to 1. If the incoming city has the flag set to 2, the same city name is used on output. For example:
194031PANORRISTOWN
194032PAAUDUBON
194032PAEAGLEVILLE
194039PAJEFFERSONVILLE

In the above example from the USTABAUX, if the incoming city is Jeffersonville, then the outbound city is Norristown.

If the incoming city is Eagleville, then the outbound city is Eagleville.
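The flag-based lookup just described can be sketched in Python. The fixed-width layout (ZIP code in bytes 1-5, flag in byte 6, a two-character state code, then the city name) is inferred from the sample records; the source does not give the width of the state code, and parse_ustabaux and resolve_city are hypothetical names. A city not found in the table is returned unchanged here, whereas the CDP would go on to apply its spelling algorithm.

```python
def parse_ustabaux(record):
    """Split a USTABAUX record laid out as in the sample records (layout assumed)."""
    return {
        "zip": record[0:5],     # five-digit ZIP code
        "flag": record[5],      # 1 = preferred, 2 = acceptable, 9 = unacceptable
        "state": record[6:8],   # assumed two-character state code
        "city": record[8:].strip(),
    }


def resolve_city(records, zip_code, city):
    """Return the output city name for an input ZIP code and city."""
    city = city.upper()
    same_zip = [r for r in records if r["zip"] == zip_code]
    entry = next((r for r in same_zip if r["city"] == city), None)
    if entry is None or entry["flag"] != "9":
        return city  # flags 1 and 2 keep the input name; unknown cities pass through
    preferred = next((r for r in same_zip if r["flag"] == "1"), None)
    return preferred["city"] if preferred else city


sample = ["194031PANORRISTOWN", "194032PAAUDUBON",
          "194032PAEAGLEVILLE", "194039PAJEFFERSONVILLE"]
records = [parse_ustabaux(r) for r in sample]
print(resolve_city(records, "19403", "Jeffersonville"))  # NORRISTOWN
print(resolve_city(records, "19403", "Eagleville"))      # EAGLEVILLE
```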

Customer Data Parser Output

File Description

Log File (palog.txt) Contains a listing of primary CDP statistics, including record counts of confidence codes, comprehension codes, and review codes, and counts of recognized errors in the data. Invalid name and street patterns and city problems encountered in the address data are also recorded. This file itemizes the token attributes for each invalid pattern, so the user may compare it directly with the data. The statistics file is used to fine-tune Customer Data Parser input by adding or correcting entries through Table Maintenance in the USERDEF word/pattern table. The Palog Analyzer function (in the Parser Tuner tool) is a convenient way to adjust tables when Table Maintenance is required. See the Parser Tuner section of Ch. 4 of the Control Center manual for more information about the Parser Tuner.

Parser Display Report File (padsp) This output file (displaying the INA data vs. PREPOS data) is used to evaluate the parsing results.

Detail File (pastat.txt) An output file consisting of the INA in the form of lines, tokens, and rules used to arrive at the PREPOS results.

Output File (PAOUT) Output file whose shape is defined by the output DDL. PREPOS data is copied to this file when field names match between the PREPOS DDL and the output DDL. The ORG_RECORD field is also copied from the input to the output. Note that only the data contained in the field named ORG_RECORD is copied across; fields with other names that are not redefined by ORG_RECORD are not copied.

Report File (REPORT) The report file (specified in the OUTRPRT_DDNAME parameter) is read by the Parser Display program to generate a report of Parser output. The records for the Parser Display are selected based on the review groups and every nth record specified (see the parameter NTH_RECxxn). The program assumes standard names for the name and address portion of the DDL. All DDLs are named in the parameter file.

About Data Dictionary Language Files

The Data Dictionary Language (DDL) file is used to describe input and output data to the program using field names. Modifying input/output shapes does not require programming changes; it can be accomplished by changing the DDLs. DDLs must be used with the program. One of two possible output record types may be generated:

Case 1: Up to 10 names placed on a single output record.
Case 2: Each individual name of a multi-customer input record placed on a separate output record.

See the “Data Dictionary Language” section of the Control Center manual for a list of DDL fields used with the CDP.

The CDP module specifically needs to know the structure of the name and address components of a file. They are defined by specific field names. The two methods of describing name/address data using DDLs are:

The first uses the field names "oraddrl1" through "oraddrl10" to describe the input name and address data.

The second way is to break each address line into parts. Up to 7 parts may be specified per line. The parts can be described by adding the letters "a" to "g" to the end of the field name that describes the line.

For example, if the first line has three parts, then the following field names would be used: oraddrl1a, oraddrl1b and oraddrl1c.

The program builds each line from its parts, using a space as the separator character. This implementation causes a 1000-byte work area to be sent to the CDP in the form of ten 100-byte lines. The parameter file should specify lines in the shape of this work area.

Example

Line1 = 1 99
Line2 = 101 99
Line3 = 201 99
Line4 = 301 99
Line5 = 401 99
Line6 = 501 99
Line7 = 601 99
Line8 = 701 99
Line9 = 801 99
Line10 = 901 99
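The following Python sketch builds the 1000-byte work area described above from DDL line parts: the parts of each line (oraddrl1a through oraddrl1g, and so on for each line) are joined with a single space, and each line is padded to 100 bytes. It is an illustration under those stated sizes, not the CDP driver's code; build_work_area and line_start are hypothetical names, and truncating an overlong line and blank-filling unused lines are assumptions of the sketch. line_start reproduces the starting positions in the example (1, 101, ... 901).

```python
LINE_COUNT = 10    # oraddrl1 through oraddrl10
LINE_WIDTH = 100   # each line occupies 100 bytes of the work area
MAX_PARTS = 7      # parts a through g per line


def build_work_area(lines):
    """Build the 1000-byte work area from up to 10 lines of up to 7 parts each."""
    if len(lines) > LINE_COUNT:
        raise ValueError("at most 10 address lines are supported")
    work_area = []
    for parts in lines:
        if len(parts) > MAX_PARTS:
            raise ValueError("at most 7 parts (a-g) per line are supported")
        # Join the non-blank parts with a single space.
        text = " ".join(p.strip() for p in parts if p.strip())
        # Assumption: a line longer than 100 bytes is truncated.
        work_area.append(text[:LINE_WIDTH].ljust(LINE_WIDTH))
    # Assumption: unused lines are filled with blanks.
    work_area.extend([" " * LINE_WIDTH] * (LINE_COUNT - len(lines)))
    return "".join(work_area)


def line_start(n):
    """1-based starting position of line n in the work area."""
    return (n - 1) * LINE_WIDTH + 1


area = build_work_area([["JOE CUSTOMER"], ["184", "MAIN", "ST"],
                        ["BILLERICA", "MA", "01821"]])
print(len(area), repr(area[100:200].rstrip()))
print([line_start(n) for n in range(1, LINE_COUNT + 1)])
```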

Special DDL Fields

The following special DDL fields are used when running the Customer Data Parser.

ORG_RECORD Field

The ORG_RECORD field must be defined in both the user input record DDL and in the output record DDL. This is the name of the standard DDL record that holds the original address lines. This field defines which contiguous fields are to be copied from the input record to the output record. Typically, this field describes the entire input record.

Only the field ORG_RECORD is copied from input to output.

Other Special DDL Fields

The DDL describing the input record must use standard names as well to describe the fields in the input record to be used by the Customer Data Parser. Additional fields that may be defined in the PREPOS DDL will cause certain conversion action and population to take place.

Field Name Definition

The following fields must be defined in both the PREPOS and output DDLs if geocoding will be performed for their respective countries.

au_match_area, ca_match_area, de_match_area, hk_match_area, uk_match_area, tg_match_area The field must be the last field in the PREPOS DDL and the first field in the parser output DDL. The parser output DDL will typically redefine the entire match area into its individual components. Refer to the parser output DDL delivered with the country template for a complete field layout. After running the parser, this field of the PREPOS will contain geographic data to be used as input to the proper geocoder program.

The following fields can also be defined in the output DDL (optional). Each field is 100 bytes.

label_name_1, label_name_2, label_name_3 Inclusion of these special field names in your PARSOUT.DDL definition will cause the first, second and third (respectively) LABEL name lines to be copied here.

label_street_1, label_street_2, label_street_3 Inclusion of these special field names in your PARSOUT.DDL definition will cause the first, second and third (respectively) LABEL street lines to be copied here.

label_geog_1, label_geog_2, label_geog_3 Inclusion of these special field names in your PARSOUT.DDL definition will cause the first, second and third (respectively) LABEL geography lines to be copied here.

Label lines contain standardized data only. No recode functionality is present.

Customer Data Parser Parameters

Table 4.1, “Customer Data Parser Parameters,” describes the parameters in the pfparser.par parameter file. All parameters are optional unless otherwise specified. Required parameters are shown in shaded rows and are marked REQUIRED.

Entries in this file must take the form of [KEYWORD]=[PARM VALUE] where:

[KEYWORD] Name of the parameter
[PARM VALUE] Name of the modifier

Table 4.1 Customer Data Parser Parameters

Parameter Name Values Description

AUSTRALIAN_CATEGORY_VALUE F00AUS Indicates the table recode value associated with Australia.

BRAZILIAN_CATEGORY_VALUE F00BRA Indicates the table recode value associated with Brazil.

BUSINESS_NAME_EDIT Y Y=Enables business edit processing for the BUS-EDIT attribute entries defined in the CDP Word/Pattern table. See the Utilities and Table Maintenance manual for more information about the BUS-EDIT attribute.

CANADIAN_CATEGORY_VALUE F00CAN Indicates the table recode value associated with Canada.

COMPLEX_CITY_SPELLING Y (default), N Y=Applies a spelling algorithm to a city name when the input city name does not match the city table. N=Turns this function off.

DEFAULT_ORIGIN 1-9 Indicates the default country origin of the INA: 1=USA, 2=CANADA, 3=UK, 4=Other, 5=BRAZIL, 6=AUSTRALIA, 7=GERMANY, 8=ITALY, 9=Optional value available for Portugal only. When 9 is used, the last token on a street line is NOT changed from APT to ALPHA. REQUIRED

DETAIL_DISPLAY Y Indicates that a tokenized detail report is to be written to the file specified in PRIMARY_DETFNAME.

DISP_CONFIDENCE0-10 N1 N2 Syntax: DISP_CONFIDENCEx=N1 N2, where x is the confidence level, N1 = display every Nth record at that confidence level, and N2 = display up to N2 records. For example, DISP_CONFIDENCE1=5 50; displays every 5th record with a confidence level of 1, up to 50 records. Up to ten of these parameters may be used. REQUIRED
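The DISP_CONFIDENCEx sampling rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the Parser's implementation. Records are assumed to arrive as (record_id, confidence_level) pairs, counting is assumed to start at the first record of each level, and the function name sample_by_confidence is hypothetical.

```python
from collections import defaultdict

def sample_by_confidence(records, settings):
    """Pick display records per confidence level (illustrative sketch).

    records:  iterable of (record_id, confidence_level) pairs
    settings: {confidence_level: (n1, n2)}, mirroring DISP_CONFIDENCEx=N1 N2;
              levels without a setting are never displayed (an assumption).
    """
    seen = defaultdict(int)      # records read so far at each level
    chosen = defaultdict(list)   # records selected for display at each level
    for record_id, level in records:
        if level not in settings:
            continue
        n1, n2 = settings[level]
        seen[level] += 1
        # every n1-th record at this level, until n2 records are chosen
        if seen[level] % n1 == 0 and len(chosen[level]) < n2:
            chosen[level].append(record_id)
    return dict(chosen)
```

With DISP_CONFIDENCE1=5 50;, the settings argument would be {1: (5, 50)}. The 5th, 10th, 15th, and later level-1 records are then picked until 50 have been chosen.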

DISPFNAME file name Name of the Parser Display Report file. This report contains details of input vs. output data.

ELIMINATE_DUPLICATE_DWELLINGS Y, N Y=Eliminate duplicate dwelling information. N=Do not eliminate duplicate dwelling information (default).

GERMAN_CATEGORY_VALUE F00DEU Indicates the table recode value associated with Germany.

IGNORE_BUSINESS_MEANINGS X, Y X=Turns off business, possible business, business-descriptive and business redefine attributes. Business names are only generated from patterns. Y=Turns off the setting of token meanings of business attributes and possible business attributes.

ISALFILE file name (code page table) The code page table named here determines which characters are alphabetic in a MASK setting. For example, for Postal Code = 1A2B3C, MASK recognition = NANANA. Used in conjunction with the table maintenance MASK modifier. It is required for special characters found in many foreign languages. Used with the ISNMFILE parameter.

ISNMFILE file name The code page table named here determines which characters are numeric in a MASK setting. For example, for Postal Code = 1A2B3C, MASK recognition = NANANA. Used in conjunction with the table maintenance MASK modifier. It is required for special characters found in many foreign languages. Used with the ISALFILE parameter.
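The following Python sketch illustrates how a MASK pattern such as NANANA can be derived from the postal code 1A2B3C. For illustration only, it assumes that ISALFILE and ISNMFILE can be modeled as extra sets of characters. The real files are code page tables whose format is not described here, and the function name mask_of is hypothetical.

```python
import string

# Default character classes; the extra sets stand in for the characters
# that an ISALFILE or ISNMFILE code page table would add (an assumption).
DEFAULT_ALPHA = set(string.ascii_letters)
DEFAULT_NUMERIC = set(string.digits)

def mask_of(value, extra_alpha=frozenset(), extra_numeric=frozenset()):
    """Return a MASK string: N for numeric, A for alphabetic, else the char itself."""
    alpha = DEFAULT_ALPHA | set(extra_alpha)
    numeric = DEFAULT_NUMERIC | set(extra_numeric)
    out = []
    for ch in value:
        if ch in numeric:
            out.append("N")
        elif ch in alpha:
            out.append("A")
        else:
            out.append(ch)  # how other characters are masked is an assumption
    return "".join(out)
```

Without an extra alphabetic set, a character such as Å is not classified as alphabetic. This mirrors why the code page tables are needed for many foreign languages.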

KEEP_CHARACTER special characters Normally, special characters are removed and other characters are converted to uppercase. This parameter specifies characters that you do not want removed or converted to uppercase. For example, KEEP_CHARACTER=[ ]; retains the left and right brackets.
Special character inclusion and deletion: the Parser automatically removes most special characters unless it is told to keep them by including this parameter in the parameter file.
Automatically KEPT characters:
Numerics 0-9
Alphas A-Z, a-z
Hyphen -
Apostrophe ‘
Forward slash /
Ampersand &
Plus sign + (+ is recoded to &)
Double quotes “ (double quotes are recoded to single quotes ‘)
Percent sign %
Automatically REMOVED characters:
Brackets [ ]
Redirection symbols < >
Dollar sign $
Pound sign #
Equal sign =
Backslash \
Caret ^
Asterisk *
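The default keep/remove rules above can be illustrated with a small Python sketch. It makes three assumptions:

* The input is a single token that has already been separated from its delimiters.
* Straight ASCII quotes stand in for the typographic quotes shown in the table.
* Any character not listed as automatically kept is dropped.

clean_token is a hypothetical name, and the real Parser's handling of unlisted characters may differ.

```python
# Characters kept automatically besides ASCII letters and digits.
ALWAYS_KEPT = set("-'/&%")
# Characters kept but recoded: plus sign to ampersand, double quote to single quote.
RECODES = {"+": "&", '"': "'"}

def clean_token(token, keep_chars=""):
    """Apply the default keep/remove rules to one token (illustrative sketch).

    keep_chars mirrors KEEP_CHARACTER: those characters are neither
    removed nor converted to uppercase.
    """
    out = []
    for ch in token:
        if ch in keep_chars:
            out.append(ch)            # explicitly kept, left as-is
            continue
        ch = RECODES.get(ch, ch)
        if ch.isascii() and ch.isalnum():
            out.append(ch.upper())    # letters uppercased, digits unchanged
        elif ch in ALWAYS_KEPT:
            out.append(ch)
        # anything else ([ ] < > $ # = backslash ^ * ...) is dropped
    return "".join(out)
```

For example, clean_token("smith[jr]", "[]") keeps the brackets, while clean_token("smith[jr]") drops them.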

KEEP_DELIMITER special characters Normally, the special characters period (.), comma (,), and space ( ) are used as word delimiters. This parameter specifies which delimiters are to be kept for the label line. For example, with KEEP_DELIMITER=/ ., ; the delimiter is kept at the end of the token when the token is searched for in the table, and the space and comma would show up in the label line. For example, for Fiona/ MacDonald, Fiona/ is looked up in the table.

LIMIT_GENDER_ASSIGNMENT Y, N (default) Establishes how gender assignment is determined. Y=Gender assignment is based on the first name and title only. N=Gender assignment is based on the first name, middle name, and title.

LINE1-LINE10 N, F, S, G, ?, X Each LINEn entry gives the starting position, length, and line type code of one line of the INA. For example:
LINE1=1 99 ?;
LINE2=101 99 ?;
LINE3=201 99 ?;
LINE4=301 99 ?;
LINE5=401 99 ?;
LINE6=501 99 ?;
LINE7=601 99 ?;
This indicates that the first address line is located at position 1 of the INA for a length of 99 and is of unknown type (?). The second address line is located at position 101 for a length of 99 and is also of unknown type, and so on. If a line type is unknown, the Parser does its best to determine the proper assignment. The maximum length of a line is 99. The maximum number of lines is 10. See “Line Pattern Identification Codes” on page 4-59 for more information about these codes. REQUIRED
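The following Python sketch shows how LINEn values of the form "start length type" could be used to cut an INA into lines. It assumes the entries have already been read from the parameter file without the LINEn= prefix and the trailing semicolon. It also assumes that positions are 1-based, as in the example above. split_ina is a hypothetical helper, and deciding the type of a ? line is left to the Parser.

```python
def split_ina(ina, line_specs):
    """Cut an INA into lines using LINEn=start length type specs (sketch).

    line_specs: list of strings such as "1 99 ?" (1-based start, length, type code).
    Returns a list of (type_code, text) pairs with trailing blanks removed.
    """
    if len(line_specs) > 10:
        raise ValueError("at most 10 lines are allowed")
    lines = []
    for spec in line_specs:
        start, length, type_code = spec.split()
        start, length = int(start), int(length)
        if length > 99:
            raise ValueError("a line may be at most 99 bytes long")
        # convert the 1-based start position to a 0-based slice
        text = ina[start - 1 : start - 1 + length]
        lines.append((type_code, text.rstrip()))
    return lines
```

With 100-byte slots as in the example, LINE2=101 99 ?; picks up the second slot and ignores its final byte.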

NEIGHBORHOOD_FORMAT_OPTION 1 or 0 (default) When this parameter is turned on, the pr_neigh1_name and pr_neigh1_name_display fields (in the PREPOS layout) are treated as one 60-byte field in which to store the neighborhood value. 1=Enables the function. 0=Disables the function.

NAME_PARSING_DEPTH 1-9 (default=9) Indicates how many levels the Parser is allowed to perform to select accurate attribute values. Should be set to a lower number for updating Table Maintenance with the best entry. REQUIRED

NO_CONCATENATED_TOKENS_AS_LAST Y, N When a word is defined as a concatenee, it is usually concatenated to the token that follows it in the line, and the combined token is then defined as a LAST. As a result, a name like La Keysia Smith parses as: FIRST = Smith, LAST = La Keysia. Y=Defines the combined token as ALPHA (ALPHA = La Keysia). N=Defines the combined token as a LAST (LAST = La Keysia).

NO_SPECIAL_BUSINESS_SERVICE Y, NP Y=No special business service processing occurs. NP=Forces one pass through the patterns, disabling automatic business line type identification. It also prints business patterns to the CDP log file (palog). Turns numerics into Business Names. If the beginning token of the name line matches the city name of the geography line, then the name line becomes the business name (provided the beginning token of the name line is not defined as a First Name attribute). This parameter affects BUSINESS and BUSINESS? attributes. By default, a BUSINESS attribute on a name line forces the line to a business.

NO_SPECIAL_CHARACTER_LOOKUP_SERVICE Y, blank The Parser usually separates words from special characters (- or /) before lookup in the table. For example, for a word like “INC-”, the hyphen is separated out and “INC” is looked up in the table. If “INC-” is found in the table, the hyphen (-) is removed. blank=Words are separated from special characters (- or /) before lookup in the table (normal Parser processing). Y=Turns off this process, so that “INC-” is looked up in the table as is.

NO_SPECIAL_COMMA_NAME_REVERSE_SERVICE Y, blank Normally, when a single comma exists on a name line, the element(s) to the left of the comma are temporarily moved to the end of the line before pattern lookup. For example, SMITH, MARY M (ALPHA ALPHA 1ALPHA) is changed to MARY M SMITH (ALPHA 1ALPHA ALPHA), which then finds a pattern that identifies it as FIRST MIDDLE LAST. Y=Turns off this process.
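The comma move described above amounts to a simple text transformation, sketched below in Python. Pattern lookup itself is not modeled, and reverse_comma_name is a hypothetical name. The sketch assumes only that lines containing no comma or several commas are left alone.

```python
def reverse_comma_name(line):
    """Move the text left of a single comma to the end of the name line (sketch).

    Lines with no comma or with more than one comma are returned unchanged,
    mirroring the "single comma" condition described above.
    """
    if line.count(",") != 1:
        return line
    left, right = line.split(",")
    return f"{right.strip()} {left.strip()}".strip()
```

For example, reverse_comma_name("SMITH, MARY M") returns "MARY M SMITH".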

NO_SPECIAL_HOUSE_SERVICE Y, M Normally, the Parser pre-processes house numbers before Street pattern processing. Functions performed include:
1. “1 1/2” remains as is. (Note that "1 1/2" becomes a HSNO token; the fraction portion must be 3 characters in length and include the '/'.)
2. “2420-36” becomes “2420 36”. (This option does NOT work for New York, and .)
Y=Turns off the special house number processing before Street pattern lookup.
M=Performs only minimal house number service (only function 1 above is invoked).
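A rough Python sketch of the two house number functions follows. It assumes tokens have already been split on spaces and does not model the New York exception. It uses the hypothetical helpers preprocess_house_tokens and is_house_fraction, and the mode argument stands in for the NO_SPECIAL_HOUSE_SERVICE value.

```python
import re

# "2420-36" style house numbers: digits, hyphen, digits.
HYPHENATED = re.compile(r"^(\d+)-(\d+)$")
# A fraction token such as "1/2": exactly 3 characters including the '/'.
FRACTION = re.compile(r"^\d/\d$")

def preprocess_house_tokens(tokens, mode=""):
    """Illustrative house-number pre-processing.

    mode "" = normal service, "M" = minimal (function 1 only),
    "Y" = no special house service.
    """
    if mode == "Y":
        return list(tokens)
    out = []
    for tok in tokens:
        m = HYPHENATED.match(tok)
        if m and mode != "M":
            out.extend(m.groups())   # function 2: "2420-36" -> "2420", "36"
        else:
            out.append(tok)          # function 1: "1 1/2" is left as is
    return out

def is_house_fraction(tok):
    """True when tok satisfies the 3-character fraction rule."""
    return bool(FRACTION.match(tok))
```
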

NO_SPECIAL_INTERNET_BUSINESS_SERVICE Y, N Y=The program suppresses the embedded business tokens ‘Business.com’ and ‘Business.net’ and sets the attribute to IGNORE. N=Recognizes and calls these business tokens.

NO_SPECIAL_JOIN_STREET_ADJACENT_ALPHA_SERVICE Y, blank In normal Parser processing, adjacent ‘ALPHA’ elements are combined into a single ‘ALPHA’ element during the leveling process. See “Street Pattern Depth Levels” on page 4-63 for information on street parsing levels. Y=Turns off this process.
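Combining adjacent ALPHA elements can be pictured as grouping runs of equal attributes. The Python sketch below assumes tokens are (attribute, text) pairs. join_adjacent_alpha and the TYPE attribute used in the test are hypothetical, and the real leveling process involves more than this one step.

```python
from itertools import groupby

def join_adjacent_alpha(tokens):
    """Combine runs of adjacent ALPHA elements into one ALPHA element (sketch).

    tokens: list of (attribute, text) pairs, e.g. ("ALPHA", "KINGS").
    """
    out = []
    for attr, run in groupby(tokens, key=lambda t: t[0]):
        run = list(run)
        if attr == "ALPHA":
            out.append(("ALPHA", " ".join(text for _, text in run)))
        else:
            out.extend(run)   # non-ALPHA elements pass through unchanged
    return out
```
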

NO_SPECIAL_NAME_PREPARATION Y, M In normal Parser processing, pre-processing takes place before Name processing, including Name Concatenation and Name Splitting. For example, JOHN&MARY, JOHN+MARY, or JOHN/MARY splits into three tokens, provided both names have a valid FIRST attribute. Y=Turns off this process. M=Performs concatenation and special business processing after Level 1 name pattern processing, so that users can decide whether something is concatenated or whether a business should be configured differently.
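The name splitting rule (split only when both names carry a FIRST attribute) is sketched below in Python. The FIRST_NAMES set is a hypothetical stand-in for the CDP Word/Pattern table, and split_joined_names is a hypothetical name. The connector is kept as typed, even though + may be recoded to & elsewhere.

```python
import re

# Hypothetical stand-in for table entries carrying the FIRST attribute.
FIRST_NAMES = {"JOHN", "MARY"}

def split_joined_names(token, first_names=FIRST_NAMES):
    """Split JOHN&MARY, JOHN+MARY or JOHN/MARY into three tokens (sketch).

    The split only happens when both sides are known first names;
    otherwise the token is returned unchanged as a single-item list.
    """
    m = re.fullmatch(r"([A-Z]+)([&+/])([A-Z]+)", token.upper())
    if m and m.group(1) in first_names and m.group(3) in first_names:
        return [m.group(1), m.group(2), m.group(3)]
    return [token]
```

A business-style token such as AT&T is not split, because AT is not a first name.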

NO_SPECIAL_ORDINAL_STREET_SERVICE Y, X Normally, street-name ordinals are added to street lines after Street pattern processing. However, they are NOT added if the street name is made up of multiple words. For example:
10 22 Street normally becomes 10 22nd Street
10 22 Street Road remains 10 22 Street Road (the street name here is 22 Street, a multi-word street name)
This parameter turns off the ordinal processing feature. Values are:
X=Turns off ordinal processing, but if ordinals already exist on input data, they are kept in the display and non-display fields.
Y=Turns off ordinal processing, and if ordinals exist on input, they are kept in the display fields and dropped from the non-display fields.
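Adding an ordinal to a single numeric street name can be sketched in Python as follows. The sketch assumes English ordinal suffixes and that the street name has already been separated from the street type. ordinal and add_street_ordinal are hypothetical names.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"                 # 11th, 12th, 13th, ... 20th
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def add_street_ordinal(street_name_words):
    """Add an ordinal only when the street name is a single numeric word.

    street_name_words: the words of the street name without the street type,
    e.g. ["22"] for "22 Street". Multi-word names are returned unchanged.
    """
    if len(street_name_words) == 1 and street_name_words[0].isdigit():
        return [ordinal(int(street_name_words[0]))]
    return list(street_name_words)
```

For "10 22 Street" the street name is ["22"] and becomes ["22nd"]. For "10 22 Street Road" the street name is ["22", "Street"] and is left alone.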

NO_SPECIAL_NAME_SPLIT_SERVICE Y, N (default) The program automatically splits two- and three-character names with no gender code into initials. For example, for ‘BPL’, “B” is parsed to the first name field, “P” to the middle name field, and “L” to the last name field. The Parser would normally cause “BPL” to become the initials ‘BPL’.

Y=Prevents the above from happening, leaving ‘BPL’ exactly as is. N=Allows splitting of initials as shown above.

Works on UK data only.
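The initials split for tokens such as BPL can be sketched as below. It is an illustration only:

* The gender code lookup is passed in as a flag.
* The UK-only restriction is not modeled.
* split_initials and assign_three_initials are hypothetical names.

How a two-letter token maps to name fields is not described above, so the sketch maps only the three-letter case.

```python
def split_initials(token, has_gender_code=False, no_split=False):
    """Return the initials a 2- or 3-letter token would be split into (sketch).

    no_split mirrors NO_SPECIAL_NAME_SPLIT_SERVICE=Y. Returns None when
    the token is left exactly as is.
    """
    if no_split or has_gender_code:
        return None
    if not (token.isalpha() and 2 <= len(token) <= 3):
        return None
    return list(token)

def assign_three_initials(initials):
    """Map three initials to first, middle and last name fields, as in the BPL example."""
    first, middle, last = initials
    return {"first": first, "middle": middle, "last": last}
```
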

ORIGINAL_MEANINGS_OPTION 1, 2, 3 Tells the system which data becomes the contents of the display field(s).
1=The display field contains the original data value, not the synonym value.
2=Multiple street line data (line type Z) is stored in pr_misc_field.
3=Implements options 1 and 2 together: the original data value is kept, and line type Z data is stored in the pr_misc_addr field.
When an entry is made a synonym of another entry, the original entry value is lost, and the synonym value becomes the display value. Consider the example:
‘FIONA’ INSERT SYNONYM=MARY
‘MARY’ INSERT NAME BEG ATT=FIRST,GENDER=F, RECODE=SUSAN
oraddrl1: FIONA
pr_first_display_01: MARY
pr_first_01: SUSAN
Notice that ‘FIONA’ has been replaced: ‘MARY’ has taken the place of the original data, and ‘SUSAN’ is the recoded value. Setting this parameter to 1 stops the synonym value from being stored in the display field and stores the original data value instead. Synonym values that point to other entries must be values that make sense. Do not make entries synonyms of misspelled entries, or the original value will be the misspelled value.

PREPOS_FORMAT_OPTION 1 Stores three additional versions of street_name in the last 150 characters of the pr_misc_review_codes field of the PREPOS. The data is stored in three 50-byte areas in positions 546-695:
Area 1 (546-595): Standardized street title
Area 2 (596-645): Display street title
Area 3 (646-695): Original street title
Normal street title areas are filled in as before, except for "street too long" review problems. These are suppressed if the parameter is set and the data fits in the 50-byte areas.

Each of the following 11 parameters also has SECONDARY_ and TERTIARY_ versions, which are used when running multiple Customer Data Parsers at once.

PRIMARY_AUX_CITYFNAME, SECONDARY_AUX_CITYFNAME, TERTIARY_AUX_CITYFNAME file name (Optional) Name of the file that contains the zipcode-to-city cross-reference table. This file is usually USTABAUX and is used only in the US. When running with multiple country parsing tables, a maximum of three sets of parsing tables can be defined by the SECONDARY and TERTIARY parameter names.

PRIMARY_AUX_MAIN, SECONDARY_AUX_MAIN, TERTIARY_AUX_MAIN Y Y=Loads the auxiliary city name table into memory (workstations require the USTABAUX.80 file defined by PRIMARY_AUX_CITYFNAME). When running with multiple country parsing tables, a maximum of three sets of parsing tables can be defined by the SECONDARY and TERTIARY parameter names.

PRIMARY_CITY_NAME_TYPE, SECONDARY_CITY_NAME_TYPE, TERTIARY_CITY_NAME_TYPE COMPLEX Indicates the shape of the city directory (for directories other than US) defined by the PRIMARY_CITYFNAME parameter. When running with multiple country parsing tables, a maximum of three sets of parsing tables can be defined by the SECONDARY and TERTIARY parameter names. REQUIRED for city directories other than the US.

PRIMARY_CITYFNAME, SECONDARY_CITYFNAME, TERTIARY_CITYFNAME file name Name of the file of the primary city name directory. When running with multiple country parsing tables, a maximum of three sets of parsing tables can be defined by the SECONDARY and TERTIARY parameter names. REQUIRED

PRIMARY_DEBUGFNAME, SECONDARY_DEBUGFNAME, TERTIARY_DEBUGFNAME file name When these parameters are set, debugging information is printed to the named file. Some of the debugging information comprises name and street patterns before table lookup. If the patterns are found, the recoded patterns are also printed, along with geographic lookups.

PRIMARY_DETFNAME, SECONDARY_DETFNAME, TERTIARY_DETFNAME file name Defines the file where the detail display report will be written. When running with multiple country parsing tables, a maximum of three sets of parsing tables can be defined by the SECONDARY and TERTIARY parameter names. Used when the DETAIL_DISPLAY parameter is set to Y.

PRIMARY_GEO_CATEGORY, SECONDARY_GEO_CATEGORY, TERTIARY_GEO_CATEGORY user-defined Indicates the geographic category value that should be processed by the primary Parser. Can be used as an alternative to the PRIMARY_GEO_VALUE parameter. When running with multiple country parsing tables, a maximum of three sets of parsing tables can be defined by the SECONDARY and TERTIARY parameter names. For example: F00USA, F00CAN

PRIMARY_GEO_VALUE, SECONDARY_GEO_VALUE, TERTIARY_GEO_VALUE 1-9 Indicates the country origin that should be processed by the CDP: 1=USA, 2=CANADA, 3=UK, 4=Other, 5=BRAZIL, 6=AUSTRALIA, 7=GERMANY, 8=ITALY, 9=Optional value available for Portugal only. When 9 is used, the last token on a street line is NOT changed from APT to ALPHA. When running with multiple country parsing tables, a maximum of three sets of parsing tables can be defined by the SECONDARY and TERTIARY parameter names. REQUIRED

PRIMARY_LOGFNAME, SECONDARY_LOGFNAME, TERTIARY_LOGFNAME file name The file name of the Parser log file. When running with multiple country parsing tables, a maximum of three sets of parsing tables can be defined by the SECONDARY and TERTIARY parameter names. REQUIRED

PRIMARY_PATTERNFNAME, SECONDARY_PATTERNFNAME, TERTIARY_PATTERNFNAME file name The file name of the word/pattern pattern file created during table maintenance. Normally named CLTABPAT. When running with multiple country parsing tables, a maximum of three sets of parsing tables can be defined by the SECONDARY and TERTIARY parameter names. REQUIRED

PRIMARY_WORDFNAME, SECONDARY_WORDFNAME, TERTIARY_WORDFNAME file name The file name of the word/pattern definitions file created during table maintenance. Normally named CLTABDEF. When running with multiple country parsing tables, a maximum of three sets of parsing tables can be defined by the SECONDARY and TERTIARY parameter names. REQUIRED

REVIEW_GROUP_ORDER review group value (numeric) This parameter allows the Review Group order to be changed. Because there is a specific order for Review Groups, changing the order may improve reporting accuracy by allowing records to be better categorized. Any record not in a Review Group is coded as “000.” See “Review Codes and Review Groups” later in this chapter for a listing of the default Review Group hierarchy.

SKIP_DELIMITER special characters Normally, the special characters period (.), comma (,), and space ( ) are used as word delimiters. This parameter specifies which of those characters to use as delimiters. For example, SKIP_DELIMITER=, ; specifies that only the comma (,) and space ( ) are to be used as delimiters. The characters left out are not treated as delimiters when tokens are looked up in the table. For example, O.T.O. is treated as one token when it is looked up in the table.
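The effect of SKIP_DELIMITER on tokenization can be sketched in Python. The sketch assumes space is always a delimiter and that the argument lists the remaining delimiter characters. tokenize is a hypothetical name.

```python
import re

def tokenize(line, delimiters=".,"):
    """Split a line into tokens on space and the given delimiter characters (sketch).

    delimiters mirrors the non-space characters named in SKIP_DELIMITER;
    the default treats both period and comma as delimiters.
    """
    chars = " " + delimiters
    pattern = "[" + re.escape(chars) + "]+"   # one or more delimiters in a row
    return [t for t in re.split(pattern, line) if t]
```

With the default delimiters, O.T.O. is broken into three tokens. With SKIP_DELIMITER=, ; (delimiters=","), it stays a single token.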

SORT_DWELLINGS Y Y=Output data is sorted alphabetically by dwelling name. For example: “Suite 34 Floor 10” is sorted as “Floor 10 Suite 34.”
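Here is a minimal Python sketch of the dwelling sort. It assumes each dwelling has already been parsed into a (dwelling name, value) pair. sort_dwellings and format_dwellings are hypothetical names, and case-insensitive ordering is an assumption.

```python
def sort_dwellings(dwellings):
    """Sort (dwelling_name, value) pairs alphabetically by dwelling name (sketch)."""
    return sorted(dwellings, key=lambda d: d[0].upper())

def format_dwellings(dwellings):
    """Join dwelling pairs back into display text."""
    return " ".join(f"{name} {value}" for name, value in dwellings)
```
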

SPECIAL_CAREOF_SERVICE Y, N Y=Allows a CARE-OF attribute to set logical beginning and ending positions, such as FIRST or TITLE. N=Turns this function off; words that follow the CARE-OF attribute have no special processing performed on them.
If set to Y: In this example, a business is “care-of” a personal name, and that personal name’s title has been properly flagged as TITLE. The logical line position was changed after the CARE-OF attribute, so “Mrs.” is now seen as a TITLE.
DEFAULTED BUSNAME PATTERN=ALPHA, BUSINESS, CARE-OF, TITLE, 1ALPHA, 1ALPHA, ALPHA, | THORN, SECURITIES, C/O, MRS, F, B, SMITH, REC=1
If set to N: In this example, a business is “care-of” a personal name, and the personal name is not seen. The logical line position was not changed after the CARE-OF attribute. Here, the title (“Mrs.”) is not recognized but is seen as an ALPHA.
DEFAULTED BUSNAME PATTERN=ALPHA, BUSINESS, CARE-OF, ALPHA, 1ALPHA, 1ALPHA, ALPHA, | THORN, SECURITIES, C/O, MRS, F, B, SMITH, REC=1

SPECIAL_SINGLE_WORD_SERVICE Y Y=Lines with a single token above an identified street line are given a single attempt to match a pattern in the pattern table. The token must be identified as a single intrinsic attribute (such as ALPHA or ALPHA-1SPECIAL). For example:
JOHN SMITH
BRIDLEWORKS1
10 MAIN ST
BILLERICA MA
In this example, ‘BRIDLEWORKS1’ would be identified as an ALPHA-1NUMERIC token, and it would be identified as a business if the pattern for ALPHA-1NUMERIC was assigned to a business.

STREET_LOG_SERVICE L L=Prints to the Parser log file the street pattern that existed at Level 1 parsing depth. See “Street Pattern Depth Levels” on page 4-63 for more information on street parsing levels.

STREET_PARSING_DEPTH 1-9 (default=9) Indicates how many levels the Parser will be allowed to perform to select the accurate attribute value. This should be set to a lower number for updating Table Maintenance with the best entry. See “Street Pattern Depth Levels” on page 4-63 for more information about street parsing levels. REQUIRED

TOP_LEVEL_DOMAINS_LIST_FNAME character The Parser normally converts all dots (.) to spaces. However, a business may have a dot as part of its name (such as Monster.com), and in this case the dot should not be converted to a space. This parameter allows all top-level domains to remain as is: users can specify the name of a file that contains a list of top-level domains. The maximum number of names allowed is 500; the maximum length of each name is 14. For example:
TOP_LEVEL_DOMAINS_LIST_FNAME=..\data\domainlist.txt;

The file domainlist.txt could contain .edu, .gov, .org, and so forth. If NO_SPECIAL_INTERNET_BUSINESS_SERVICE is set to Y, the dots are replaced by spaces.
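The interaction between the domain list and NO_SPECIAL_INTERNET_BUSINESS_SERVICE can be sketched in Python. It assumes one domain per line in the list file and a simple suffix match on the whole token. load_domains and replace_dots are hypothetical names, and the Parser's actual matching rules may differ.

```python
def load_domains(lines):
    """Read a top-level domain list, enforcing the documented limits (sketch)."""
    domains = [ln.strip().lower() for ln in lines if ln.strip()]
    if len(domains) > 500:
        raise ValueError("at most 500 top-level domains are allowed")
    for d in domains:
        if len(d) > 14:
            raise ValueError(f"domain name too long: {d!r}")
    return domains

def replace_dots(token, domains, no_special_internet=False):
    """Convert dots to spaces unless the token ends with a listed domain.

    no_special_internet mirrors NO_SPECIAL_INTERNET_BUSINESS_SERVICE=Y,
    in which case every dot is replaced.
    """
    if not no_special_internet and any(token.lower().endswith(d) for d in domains):
        return token
    return token.replace(".", " ")
```
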

TRFILE table name File name of an optional translate table. These tables correct the character set for processing. This is used when foreign characters need to be recognized (such as characters with tildes, umlauts, and so on).

UNITED_KINGDOM_CATEGORY_VALUE F00GBR Indicates the table recode value associated with the United Kingdom.

UNITED_STATES_CATEGORY_VALUE F00USA Indicates the table recode value associated with the United States.

USER1, USER2, USER3, USER4, USER5, USER6, USER7, USER8, USER9, USERA user-defined values Indicate the position in the PREPOS layout (prepos.ddl) where tokens assigned a USER1-9 or USERA attribute are to be written. In order to have fields (with data) written to the output file, you must place a field with the same name (case-sensitive) in the prepos.ddl and the parsout.ddl.

See the Utilities and Table Maintenance Manual for information on attribute assignment. Tokens are placed in the position indicated by the corresponding parameter. Positions and lengths are for the PREPOS layout. For example:
USER1=350 20;
USER2=370 20;
USER3=390 10;
USER4=400 20;
If this parameter is not used, or if the area indicated is non-blank, the data is stored in the miscellaneous line area.
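The sketch below shows how a USERn token might be placed in the PREPOS, including the fallback to the miscellaneous area. The PREPOS record is modeled as a bytearray and the miscellaneous line area as a list. place_user_token is a hypothetical name, and truncating tokens longer than the area is an assumption.

```python
def place_user_token(prepos, token, spec, misc):
    """Write a USERn token into a PREPOS buffer (illustrative sketch).

    prepos: bytearray holding the PREPOS record
    spec:   "start length" from USERn=start length; (1-based start), or None
    misc:   list standing in for the miscellaneous line area
    Falls back to misc when no spec is given or the target area is non-blank.
    """
    if not spec:
        misc.append(token)
        return
    start, length = (int(x) for x in spec.split())
    begin, end = start - 1, start - 1 + length
    if prepos[begin:end].strip():            # area already holds data
        misc.append(token)
        return
    data = token.encode("ascii")[:length].ljust(length)  # truncation is an assumption
    prepos[begin:end] = data
```

For USER1=350 20;, the token lands in bytes 350-369. A second token aimed at the same area goes to the miscellaneous list instead.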

WRITE_ALL_NAME_DATA Y, y, N (default), n Writes the relationship token to the pr_relation_nn field even though it is not part of a pattern.
Y, y=Write all identified name data to the PREPOS.
N, n=Only write the name data that corresponds to a found pattern.

Sample Parser Parameter File

This file allows the CDP to be controlled externally by naming each of the files used to control the CDP and each of the output files. This file also defines the shape of the INA; that is, it specifies the number of lines and the starting position and length of each. Entries in the parameter file must use the format: [KEYWORD]=[PARM VALUE];

[KEYWORD] Name of the parameter

[PARM VALUE] Name of the modifier

Note that an asterisk in column 1 of a line indicates a comment.

The last line of all parameter files must end with a carriage return and/or linefeed so that the system processes the last parameter/table entry in the file. Use only spaces, not tabs.
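The formatting rules above can be checked with a small Python reader. It is a sketch, not the product's parser. It assumes one entry per line and keeps each value exactly as written, because trailing spaces can be significant (as in SKIP_DELIMITER=/ ;). The function name read_parm_file is hypothetical.

```python
def read_parm_file(text):
    """Parse a CDP-style parameter file into a {KEYWORD: value} dict (sketch).

    Assumptions: one entry per line, '*' in column 1 marks a comment,
    and each entry ends with ';'. Values are kept verbatim.
    """
    if not text.endswith(("\n", "\r")):
        raise ValueError("last line must end with a carriage return and/or linefeed")
    params = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("*"):
            continue                          # blank line or comment
        if "\t" in line:
            raise ValueError(f"line {lineno}: use spaces, not tabs")
        entry = line.rstrip()
        if not entry.endswith(";"):
            raise ValueError(f"line {lineno}: entry must end with ';'")
        keyword, sep, value = entry[:-1].partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected KEYWORD=VALUE")
        params[keyword.strip()] = value       # not stripped: spaces may matter
    return params
```

Commented-out entries such as *SECONDARY_LOGFNAME=; are skipped. TRFILE=; yields an empty value.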

***********************************************
* Customer Data Parser sample parm file
***********************************************
PRIMARY_LOGFNAME=PARSLOG;
*SECONDARY_LOGFNAME=;
*TERTIARY_LOGFNAME=;
PRIMARY_DETFNAME=PARSDET;
*SECONDARY_DETFNAME=;
*TERTIARY_DETFNAME=;
DETAIL_DISPLAY=Y;
DEFAULT_ORIGIN=1;
DISPFNAME=PARSDISP;
PRIMARY_GEO_VALUE=1;
*SECONDARY_GEO_VALUE=2;
*TERTIARY_GEO_VALUE=3;
CANADIAN_CATEGORY_VALUE=F00CAN;
UNITED_KINGDOM_CATEGORY_VALUE=F00GBR;
UNITED_STATES_CATEGORY_VALUE=F00USA;
BRAZILIAN_CATEGORY_VALUE=F00BRA;
AUSTRALIAN_CATEGORY_VALUE=F00AUS;
GERMAN_CATEGORY_VALUE=F00DEU;
PRIMARY_WORDFNAME=USTABDEF;
*SECONDARY_WORDFNAME=;
*TERTIARY_WORDFNAME=;
PRIMARY_PATTERNFNAME=USTABPAT;
*SECONDARY_PATTERNFNAME=;
*TERTIARY_PATTERNFNAME=;
PRIMARY_CITYFNAME=USTABCIT;
*SECONDARY_CITYFNAME=;
*TERTIARY_CITYFNAME=;
PRIMARY_GEO_CATEGORY=F00USA;
*SECONDARY_GEO_CATEGORY=;
PRIMARY_AUX_CITYFNAME=USTABAUX;
*PRIMARY_AUX_MAIN=Y;
*SECONDARY_AUX_CITYFNAME=;
*TERTIARY_AUX_CITYFNAME=;
*PRIMARY_CITY_NAME_TYPE=COMPLEX;
*SECONDARY_CITY_NAME_TYPE=COMPLEX;
LINE1=1 99 ?;
LINE2=101 99 ?;
LINE3=201 99 ?;
LINE4=301 99 ?;
LINE5=401 99 ?;
NAME_PARSING_DEPTH=9;
STREET_PARSING_DEPTH=9;
NO_SPECIAL_HOUSE_SERVICE=Y;
NO_SPECIAL_COMMA_NAME_REVERSE_SERVICE=Y;
NO_SPECIAL_JOIN_STREET_ADJACENT_ALPHA_SERVICE=Y;
NO_SPECIAL_ORDINAL_STREET_SERVICE=Y;
NO_SPECIAL_BUSINESS_SERVICE=NP;
NO_SPECIAL_CHARACTER_LOOKUP_SERVICE=Y;
DISP_CONFIDENCE0=5 50;
DISP_CONFIDENCE1=5 50;
DISP_CONFIDENCE2=5 50;
DISP_CONFIDENCE3=5 50;
DISP_CONFIDENCE4=5 50;
DISP_CONFIDENCE5=5 50;
USER1=350 20;
USER2=370 20;
USER3=390 10;
USER4=400 20;
USER5=420 20;
KEEP_CHARACTER=[];
SKIP_DELIMITER=/ ;
REVIEW_GROUP_ORDER=001003002008;
TRFILE=;
BUSINESS_NAME_EDIT=Y;
SORT_DWELLINGS=Y;
PREPOS_FORMAT_OPTION=1;

pfprsdrv.par Parameters

All parameters are optional unless otherwise specified. Please note that required parameters appear in bold and shaded.

Name Values Description

BLANK_GEN_F1_LENGTH numeric The output field length of the fields specified in BLANK_GEN_F1_OFFSET.

BLANK_GEN_F1_OFFSET numeric Provides the ability to blank out up to 4 fields in a generated record’s output (e.g. F1, F2, F3, F4).

CC_ROUTINE numeric (0-9), default=3 Tells the program which candidate code to generate. 0=Turns off candidate code generation. Up to three candidate codes can be specified. Candidate codes can also be created with the Window Key Generator; see the Window Key Generator section of this manual for a description of this program.

*CHANGE_DDNAME table name The table defined in this parameter is used to change data in name and address lines before parsing. See page 4-46 for more information on the CHANGE_DDNAME special functionality.

DDL_INP_FNAME file name Name of the file of the input record specified in the DDL.

DDL_INP_RNAME record name Name of the record of the input record specified in the DDL.

DDL_OUT_FNAME file name Name of the file of the output record specified in the DDL.

DDL_OUT_RNAME record name Name of the record of the output record specified in the DDL.

DDL_OUTRPRT_FNAME file name Name of the file of the output report specified in the DDL.

DDL_OUTRPRT_RNAME record name Name of the record of the output report specified in the DDL.

DDL_PREPOS_FNAME file name Name of the file of the Parser return record specified in the DDL.

DDL_PREPOS_RNAME record name Name of the record of the Parser return record specified in the DDL.

GEN_BUSINESS_NAMES Y (default), N Generate business names: Y=If a record has both personal names and business names, a new record is generated for each business name. N=If the business name is not the first name on a record, no records are generated and the pr_busname_01 field is not populated with any business name.

GEN_PERSONAL_NAMES Y (default), N Generate personal names: If set to Y and a record has both personal names and business names, a new record is generated for each personal name. If set to N (and the personal name is not the first name on the record), no records are generated and personal name fields such as pr_title_01 and pr_first_01 are not populated.

INP_DDNAME file name Name of the input file.

**JOIN_LINES Joins the second name line (oraddrl2) to the first name line (oraddrl1) for re-parsing purposes. See page 4-47 for more information about using this parameter.

JOIN_LINES_DDNAME file name File specified here contains all values that can be used in the JOIN_LINES parameter.

LAST_LINE_TO_GEN_NAMES 1-10, blank Specifies how many oraddr lines are considered when generating name sections or additional name records. This parameter can limit the number of lines used to generate additional names. Blank defaults to all lines.

MAX_NUMB_NAMES 1-10 The maximum number of names generated from an input record with multiple names. This value, combined with the presence or absence of the pr_name_sect_02 DDL field, determines how the CDP generates names. You can define up to ten pr_name_sect_0X fields.
9=Add an additional record for every name found in a record.
1=No additional records are added.
See “Name and Record Generation” on page 4-49 for more information about using MAX_NUMB_NAMES for name and record generation. This value controls how many records the parser can write to the output from one input record. It does NOT control how many names the CDP itself identifies, or how many name segments from the PREPOS are written to the output.

MAXERR numeric Maximum number of records written to the reject file.

MAXIN numeric Maximum number of records to read (if blank, all records are read).

MAXOUT numeric Maximum number of records to write (if blank, all records are written).

NTH_MAXnn numeric Specifies the maximum number of example records from each review group to be displayed in the Parser Display Report (nn=00–20). Works in conjunction with the NTH_RECnn parameter. The following example writes every 10th record (up to a maximum of 100 records) with a review group of 000 to the Parser Display Report:
NTH_REC00 10
NTH_MAX00 100


NTH_RECxx numeric Allows a sample of the records in a file to appear in the Parser Display report (xx=00–20). Every record processed through the Parser falls into one of 21 possible review groups. This parameter displays every nth record of review group xx, where xx is the review group number, from 00 to 20. For example, NTH_REC00 10 displays every 10th record of review group 000. If you divide the number of records in a review group (as listed in the Parser statistics) by this value, you get the number of records from that group available to the Parser Display report. If this parameter is not specified for a review group, the records in that review group are not written out; it must be specified if review group records are to be written out.

OUT_DDNAME file name Name of the output file.

OUTRPRT_DDNAME file name Name of the report output file.

PALOG_REC_ID_FLDNAME field name The name of a field from the input record. The value from this field (such as account_number) is written to palog instead of the relative record number (REC=nnn). At most the first 30 bytes of the specified field are displayed in palog.

PA_PARMNAME parameter file name Name of the parameter file for the Parser (default is pfparser.par).

PRINT_NTH_COUNT numeric Prints a count of the records read after every nth record. If 0 or not specified, no counts are reported.

REJ_DDNAME file name Name of the reject file. Any record with a confidence value less than the value specified in THRESHOLD is routed to this reject file.


SPLIT_LINES Y, N Y=Checks if there is miscellaneous data on a Geog line. If so, it checks the last tokens on the line to see if they are valid city/province info. If they are valid, they are split into a new line and the record is reparsed. N=Disabled

START numeric Starts processing at this record (one-based). Typically used to skip past the header information in a customer-supplied data file.

STAT_FNAME file name Name of the statistics file. If not specified, statistics are displayed on the screen.

THRESHOLD 1-10, –1 Any record with a confidence value less than this value is written to the reject file. If this is set to –1, no records are rejected.

ZERO_GEN_F1_LENGTH numeric The length of the output field specified in ZERO_GEN_F1_OFFSET.

ZERO_GEN_F1_OFFSET field name Provides the ability to zero out up to four fields (such as F1, F2, F3, and F4) in the output of generated records.
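The THRESHOLD and NTH_RECxx/NTH_MAXnn descriptions above amount to two simple selection rules. The Python sketch below models them under stated assumptions: confidence values are the 00–10 PR_CONFIDENCE scores, review groups are three-digit codes such as 000, and a per-group counter decides which records are sampled. The function names are hypothetical, and this is not the driver's actual implementation; how cfprsdrv keeps its NTH counts internally is not documented here.

```python
from collections import defaultdict


def is_rejected(confidence, threshold):
    """THRESHOLD rule: a confidence below the threshold is rejected; -1 disables rejection."""
    return threshold != -1 and confidence < threshold


def rejected_records(confidences, threshold):
    """Indexes of records that would be routed to the REJ_DDNAME reject file."""
    return [i for i, c in enumerate(confidences) if is_rejected(c, threshold)]


def display_sample(review_groups, nth_rec, nth_max):
    """Indexes of records sampled for the Parser Display report.

    review_groups: review group code of each parsed record, in file order.
    nth_rec: {review group: n} from NTH_RECxx; groups without an entry are not written.
    nth_max: {review group: limit} from NTH_MAXnn; a missing entry means no limit
             (an assumption of this sketch).
    """
    seen = defaultdict(int)      # records seen so far in each review group
    written = defaultdict(int)   # records sampled so far in each review group
    selected = []
    for index, group in enumerate(review_groups):
        n = nth_rec.get(group)
        if not n:                # no NTH_REC entry: this group is not written out
            continue
        seen[group] += 1
        limit = nth_max.get(group)
        if seen[group] % n == 0 and (limit is None or written[group] < limit):
            written[group] += 1
            selected.append(index)
    return selected
```

With NTH_REC00 10 and NTH_MAX00 100, `display_sample(groups, {"000": 10}, {"000": 100})` picks the 10th, 20th, and so on record of review group 000 and stops after 100 of them.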


*CHANGE_DDNAME (‘change’ functionality)

The change table operates on the address lines before parsing, in the work area that contains the address lines as defined by the DDL. This table does not change the source data, only the data in the temporary work area passed to the parser. Its functionality is very much like a change command in a text editor. Reserve this table for problems that cannot be solved by the parsing tables.

Syntax

CHANGE "FROM_VALUE", "TO_VALUE", "WHEN_VALUE"

FROM_VALUE Value to change
TO_VALUE Value to change to
WHEN_VALUE Condition under which the change is applied

WHEN_VALUE has the layout "line# pos", where:
line# One of the following values: 1–9, A=10, or *=All
pos One of the following values: B=Beginning, E=End, or D=Default (anywhere on the line)

Example
CHANGE "JOHN ", "WILLIAM ", "1B"
       "WILLIAM ", "JOHN ", "1B"

This table changes "JOHN " to "WILLIAM " when it occurs at the beginning of physical line 1. The second entry then changes it back, which illustrates how more than one change command can be applied. Note that a trailing space is included in "JOHN " and "WILLIAM " so that a line beginning with "JOHNSON" is not changed.

Do not exceed 200 entries in a change table. Use the same caution you would use with a change command in a text editor.
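To make the FROM_VALUE, TO_VALUE, and WHEN_VALUE semantics concrete, here is a minimal Python sketch of a change table applied to a copy of the address lines. It assumes entries are applied in order, that B and E match only at the very beginning or end of the line, and that D replaces every occurrence on the line; the last point in particular is an assumption, since the text above says only "anywhere on the line". The function names are hypothetical.

```python
def _target_lines(line_code, count):
    """Map the line# part of WHEN_VALUE (1-9, A=10, *=all) to 0-based line indexes."""
    if line_code == "*":
        return list(range(count))
    number = 10 if line_code == "A" else int(line_code)
    return [number - 1] if 1 <= number <= count else []


def apply_change_table(lines, entries):
    """Apply CHANGE entries (from_value, to_value, when_value) in order.

    Works on a copy, mirroring the fact that the change table alters only the
    work area passed to the parser and never the source data.
    """
    work = list(lines)
    for from_value, to_value, when_value in entries:
        line_code, pos = when_value[0], when_value[1]
        for i in _target_lines(line_code, len(work)):
            text = work[i]
            if pos == "B" and text.startswith(from_value):
                work[i] = to_value + text[len(from_value):]
            elif pos == "E" and text.endswith(from_value):
                work[i] = text[: len(text) - len(from_value)] + to_value
            elif pos == "D":
                # Assumption: D replaces every occurrence on the line.
                work[i] = text.replace(from_value, to_value)
    return work
```

Running the two-entry example through `apply_change_table` leaves line 1 as it started, and a line beginning with JOHNSON is never touched because of the trailing space in "JOHN ".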


**JOIN_LINES (‘join’ functionality)

This parameter is only used to join the second name line (oraddrl2) to the first name line (oraddrl1) for re-parsing purposes. Both lines must have a valid pattern identified for this to work.

Syntax

All values must be enclosed within double quotation marks.

“line1_end_value”, “line1_end_att”, “line2_beg_value”, “line2_beg_att”

where:
“line1_end_value” Value of the last token on the first name line.
“line1_end_att” Attribute of the last token on the first name line.
“line2_beg_value” Value of the first token on the second name line.
“line2_beg_att” Attribute of the first token on the second name line.

Each set of parameter values is a test to determine whether or not to join the second name line to the first name line for re-parsing. There can be multiple sets of parameter values.

Example
“joe”, “051”, “*”, “”
“OF”, “”, “*”, “”
“*”, “”, “OF”, “”

The first entry joins name lines when the first name line ends with the literal “joe” with an attribute of FIRST, and the second name line has anything at the beginning. An asterisk (*) in the value position indicates a wildcard value.

By using JOIN_LINES, you would change a record from:
‘MARY SMITH AND JOE’
‘SMITH’
to: ‘MARY SMITH AND JOE SMITH’

‘SKATING CLUB OF’
‘BOSTON’
to: ‘SKATING CLUB OF BOSTON’


The Parser automatically inserts a space between the last letter of the first line and the first letter of the second line. In the case where a single word is split over two lines, using “N” as the fourth parameter value will concatenate the lines without the space. If the fourth parameter value is already being used with a token value (such as “051”), simply add the “N” after the token value (“051N”) for the same effect.

The Parser statistics display the total number of records that have been joined and re-parsed by the JOIN_LINES parameter. The log file, however, does not reflect the re-parsed data, because the log file is created by the CDP module, not by the driver where the re-parsing takes place. The display file shows the parsing of the record before and after the lines are joined.
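The JOIN_LINES test can be sketched in Python as follows. Tokens are (value, attribute) pairs, and “051” is used as the FIRST attribute only because the example above does. The sketch assumes that value comparisons are case-insensitive (the example rule uses “joe” while name data is usually upper case), that an empty attribute matches any attribute, that a trailing “N” on the fourth value suppresses the inserted space, and that each line is rebuilt by joining its token values with single spaces. These behaviors, and all function names, are assumptions rather than the Parser's documented internals.

```python
def _value_matches(rule_value, token_value):
    """An asterisk is a wildcard; otherwise compare case-insensitively (assumption)."""
    return rule_value == "*" or rule_value.upper() == token_value.upper()


def _attr_matches(rule_att, token_att):
    """An empty attribute is treated as 'any attribute' (assumption)."""
    return rule_att == "" or rule_att == token_att


def try_join(line1_tokens, line2_tokens, rules):
    """Return the joined name line if any JOIN_LINES rule matches, else None.

    rules: 4-tuples (line1_end_value, line1_end_att, line2_beg_value, line2_beg_att).
    """
    if not line1_tokens or not line2_tokens:
        return None
    end_value, end_att = line1_tokens[-1]    # last token of the first name line
    beg_value, beg_att = line2_tokens[0]     # first token of the second name line
    for v1, a1, v2, a2 in rules:
        no_space = a2.endswith("N")          # "N" or e.g. "051N": join without a space
        if no_space:
            a2 = a2[:-1]
        if (_value_matches(v1, end_value) and _attr_matches(a1, end_att)
                and _value_matches(v2, beg_value) and _attr_matches(a2, beg_att)):
            line1 = " ".join(value for value, _ in line1_tokens)
            line2 = " ".join(value for value, _ in line2_tokens)
            return line1 + ("" if no_space else " ") + line2
    return None
```

With the rule (“joe”, “051”, “*”, “”), the lines ‘MARY SMITH AND JOE’ and ‘SMITH’ would yield ‘MARY SMITH AND JOE SMITH’; the attribute names on the other tokens do not affect the result.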


Name and Record Generation

The presence or absence of more than one name section in the output DDL determines whether or not records are generated for each name on the input record. The two options are:

If the pr_name_sect_01 field is the ONLY name section defined in the output DDL, records are generated, and the number of records generated is the lesser of MAX_NUMB_NAMES and the number of names on the input record.

If MAX_NUMB_NAMES=5 and the input line is “John Smith & Mary Smith”:
• Record 1: output segment pr_name_sect_01 is populated with “John Smith.”
• Record 2: output segment pr_name_sect_01 is populated with “Mary Smith.”

If multiple name sections are defined in the output DDL, then these name sections are populated with additional input names, but no additional records will be generated if there are more names than name sections defined.


The number of name sections populated on an output record is determined by the lowest of:

MAX_NUMB_NAMES, or

Number of names on the input record, or

Number of name sections on the output DDL.

MAX_NUMB_NAMES controls how many names are written to any defined name sections (pr_name_sect_0n).

Up to ten pr_name_sect_0X fields may be defined.

The field pr_number_of_names contains the number of names written to name sections on an output record up to the value in MAX_NUMB_NAMES. If there are four names on an input record with three name sections defined and MAX_NUMB_NAMES is set to 2, then a single record is output with two name sections populated.

The presence of the field pr_number_of_input_names in parsout.ddl causes it to be populated with the number of names on the input record. The pr_number_of_input_names field should be of length 2. It can be added to parsout.ddl in two ways:

Placed on its own (for example, as the last pr field), it is populated independently and holds the number of names on the input record as additional information available in paout.

Placed under pr_number_of_names as a redefinition, the output record length remains unchanged and both fields contain the number of names on the input record, so pr_number_of_names can be used in the old way.

The field name must be in lowercase to be recognized by the Parser.


Example

Input record to Parser:
Fiona MacDonald
Susan Smith
John Black
Paula White
86 Concord Rd
Billerica MA 01821

Parsout.ddl has these name sections defined:
pr_name_sect_01
pr_name_sect_02
pr_name_sect_03

In the Parser parameter file, MAX_NUMB_NAMES is set to 2.

Parser Output:
pr_name_sect_01: Fiona MacDonald
pr_name_sect_02: Susan Smith
pr_name_sect_03: blank
86 Concord Rd
Billerica MA 01821
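The record-generation rules in this section reduce to a small amount of logic, sketched below in Python. The hypothetical function takes the names found on one input record, MAX_NUMB_NAMES, and the number of pr_name_sect_0n fields in the output DDL, and returns the names written to each output record. It illustrates the rules as described here rather than reproducing the CDP's code, and it assumes at least one name was found on the input record.

```python
def generate_output_records(names, max_numb_names, name_sections):
    """Distribute the names from one input record over output records.

    names: names found on the input record, in order (assumed non-empty).
    max_numb_names: MAX_NUMB_NAMES (1-10).
    name_sections: number of pr_name_sect_0n fields in the output DDL (1-10).
    Returns one list per output record, holding the names written to
    pr_name_sect_01, pr_name_sect_02, ... on that record.
    """
    usable = names[:max_numb_names]           # never more than MAX_NUMB_NAMES names
    if name_sections == 1:
        # Only pr_name_sect_01 defined: one output record per name.
        return [[name] for name in usable]
    # Several sections: a single record, filled up to the lowest of the three limits.
    return [usable[:name_sections]]
```

Called with the four names in the example above, MAX_NUMB_NAMES=2, and three name sections, it returns a single record holding Fiona MacDonald and Susan Smith, which matches the Parser Output shown; the length of each inner list corresponds to pr_number_of_names.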


Running the Customer Data Parser on UNIX and 32-bit PC Platforms

To execute the cfprsdrv program, use the following command-line syntax:

cfprsdrv -parmfile [parm_file_name] -parmecho [echo_file_name]

where:
cfprsdrv Name of the driver program
-parmfile Keyword that indicates the parameter file follows
parm_file_name Name of the CDP driver parameter file
-parmecho Keyword that indicates the parameter echo file follows
echo_file_name Name of the file that displays any parameter processing errors in the program listing file (Optional)
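When cfprsdrv is launched from a script rather than typed at a shell, the syntax above translates directly into an argument list. The Python sketch below assumes cfprsdrv is on the PATH; the wrapper function names are hypothetical, and the meaning of the process exit status is not described in this guide.

```python
import subprocess
from typing import List, Optional


def build_cfprsdrv_command(parm_file: str, echo_file: Optional[str] = None) -> List[str]:
    """Assemble the cfprsdrv command line; -parmecho is added only when requested."""
    command = ["cfprsdrv", "-parmfile", parm_file]
    if echo_file:
        command += ["-parmecho", echo_file]
    return command


def run_cfprsdrv(parm_file: str, echo_file: Optional[str] = None) -> int:
    """Run the driver and return the process exit status (meaning not documented here)."""
    completed = subprocess.run(build_cfprsdrv_command(parm_file, echo_file))
    return completed.returncode
```

Passing the arguments as a list, rather than one shell string, keeps file names containing spaces intact.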


IBM Mainframe Parser Sample JCL

The following sample Job Control Language is used to run cfprsdrv:

//*****************************************************************
//* SAMPLE JCL TO RUN PARSER (CFPRSDRV)
//*****************************************************************
//CFPRSDRV EXEC PGM=CFPRSDRV,REGION=5500K,
//             PARM='/-PARMFILE PF -PARMECHO PE', REGION=0M
//STEPLIB  DD DSN=&BASEPREF.&TRILVER.LOADLIB,DISP=SHR
//         DD DSN=CEE.SCEERUN,DISP=SHR
//         DD DSN=CEE.SCEERUN2,DISP=SHR
//STDINPUT DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(CONVOUT)
//PREPOS   DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(PREPOS)
//PARSOUT  DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(PARSOUT)
//HDTRPT   DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(REPORT)
//SYSPRINT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//TRILMSGS DD DUMMY
//PASTAT   DD SYSOUT=*
//CEEDUMP  DD DUMMY,DCB=BLKSIZE=133
//PE       DD SYSOUT=*
//* PFPRSDRV IS THE PARSER DRIVER PARM FILE
//PF       DD DISP=SHR,
//             DSN=&PROJPREF.&TRILVER.USMLIB(PFPRSDRV)
//* PFPARSER IS THE PARSER PARM FILE
//PFPARSER DD DISP=SHR,DSN=&PROJPREF.&TRILVER.USMLIB(PFPARSER)
//* INPUT IS THE INPUT FILE
//INPUT    DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DATA.CONVOUT
//* PAOUT IS THE OUTPUT FILE (PARSOUT)
//PAOUT    DD UNIT=&UNIT,DISP=(NEW,CATLG,DELETE),
//             DCB=(RECFM=FB,LRECL=2643,BLKSIZE=21144),
//             SPACE=(TRK,(10,50),RLSE),
//             DSN=&PROJPREF.&TRILVER.US.DATA.PAOUT
//* PAREPORT IS THE SCRUB REPORT FILE
//PAREPORT DD UNIT=&UNIT,DISP=(NEW,CATLG,DELETE),
//             DCB=(RECFM=FB,LRECL=10520,BLKSIZE=21040),
//             SPACE=(TRK,(10,50),RLSE),
//             DSN=&PROJPREF.&TRILVER.US.PAREPORT
//PAREJECT DD DUMMY,DCB=(RECFM=FB,LRECL=5206,BLKSIZE=5206)
//* PALOG IS THE PARSER LOG FILE
//PALOG    DD UNIT=&UNIT,DISP=(NEW,CATLG,DELETE),
//             DCB=(RECFM=FB,LRECL=200,BLKSIZE=23200),
//             SPACE=(TRK,(350,50),RLSE),
//             DSN=&PROJPREF.&TRILVER.US.DATA.PALOG
//* PADSP IS THE PARSER DISPLAY FILE
//PADSP    DD UNIT=&UNIT,DISP=(NEW,CATLG,DELETE),
//             DCB=(RECFM=FB,LRECL=133,BLKSIZE=23142),
//             SPACE=(TRK,(100,50),RLSE),
//             DSN=&PROJPREF.&TRILVER.US.DATA.PADSP
//TABLEDEF DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.TABLES.CLTABDEF
//TABLEPAT DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.TABLES.CLTABPAT
//TABLECIT DD DISP=SHR,DSN=&CDIRPREF.&TRILVER.TABLES.US.USTABCIT
//TABLEAUX DD DISP=SHR,DSN=&CDIRPREF.&TRILVER.TABLES.US.USTABAUX
/*

Figure 4.2 IBM Mainframe Parser Sample JCL


Customer Data Parser Error Messages

Table 4.2, “CFPRSDRV Driver Error Messages” describes the error messages that can be returned by the Customer Data Parser.

Table 4.2 CFPRSDRV Driver Error Messages

Message Description

“Must have a parm file.” The parameter file for the driver program is missing.
“Processing Error, status = 1” The path and parameter file for the CDP driver program is missing.
“Processing Error, status = 2” The parameter file for the driver program is present, but incorrect.
“Processing Error, status = 3” The parameter echo file for the CDP is present, but cannot be opened.
“Processing Error, status = 4” The program has encountered an error with a parameter entry.
“Processing Error, status = 5” Unknown command line parameter.
“Processing Error, status = 6” Duplicate parameter name found in file.
“Processing Error, status = 7” Bad format for override parameter.
“Processing Error, status = 8” Bad format for parameter in source program.
“Processing Error, status = 9” Parameter value was expected to be numeric.
“Processing Error, status = 10” Missing override value.
“Processing Error, status = 11” No ending quote.
“Processing Error, status = 12” Insufficient memory for parm_entry parameter value.
“Processing Error, status = 13” Insufficient memory for parm_entry parameter value.
“Processing Error, status = 14” Insufficient memory for long parameter value.
“Processing Error, status = 15” Insufficient memory for double parameter value.
“Processing Error, status = 16” Extraneous parenthesis found.
“Missing parameter PA_PARMNAME.” The file pfparser.par is not specified in PA_PARMNAME. REQUIRED.


“XX buffer malloc failed.” (XX = ukms, cams, bzms, hkms, aums, tgms, or dems) Memory allocation error while loading the CLTABDEF and CLTABPAT tables. Increase the memory allocation for the specific platform.
“cv_CopyValidation failed .” Copy of the report or output fields from prepos.ddl to the report or output DDLs failed.
“Must specify either INPUT_LRECL or use DDL.” An input record length must be specified, or a DDL must be used in conjunction with this operation.
“Input buffer malloc failed.” The memory allocation defined for the input buffer is insufficient.
“Output buffer malloc failed.” The memory allocation defined for the output buffer is insufficient.
“prepos buffer malloc failed.” The memory allocation defined for the prepos buffer is insufficient.
“No input file specified.” An input data file has not been specified in INP_DDNAME.
“Unable to open input file” The input file specified in INP_DDNAME is invalid or its contents are corrupted.
“Unable to open output file .” The output file specified in OUT_DDNAME is invalid or its contents are corrupted.
“Unable to open report file .” The report file in OUTRPRT_DDNAME is invalid or its contents are corrupted.
“Unable to open the reject file .” The reject file in REJ_DDNAME is invalid or its contents are corrupted.
“Problem trying to initialize the parser.” A parameter value within pfparser.par is incorrect.
“I/O error during read on .” The temp directory holding record overflow for this process is out of space.
“ABEND” Parser problems. One of the parameter values within pfparser.par is incorrect. Check the PALOG file for error messages.
“cv_CopyRecord failed: .” An error has occurred when trying to copy the input record.


“cv_CopyField (ORG_RECORD) failed .” The ORG_RECORD section does not match that of the original input file. ORG_RECORD information is missing and cannot be copied to PAOUT.
“cv_PutStringValue failed” An error occurred while writing a value to a field that does not exist.
“Writing to .” An error has occurred while writing to the specified file.
“Writing to .” An error occurred while writing to the specified field. Check that the field name specified in the program is correct.
“Problem in parser during close.” An error occurred trying to close the Parser. Check that the output file has not been corrupted and that there is sufficient space available.
“Closing file .” An error occurred trying to close the specified file. Check that the file has not been corrupted and that there is sufficient space available.
“Four values required for JOIN_LINES.” Four parameter values must be specified for JOIN_LINES. The use of wildcards is not permitted and cannot be substituted into any value.
“Unable to open statistics file” The file specified in STAT_FNAME is invalid.


About Window Keys

The Window Key Generator program allows users to create Window Keys that are required for all processes in the Trillium Software System. See the Window Key Generator chapter for complete information.

Multiple Customer Data Parsers

Secondary and Tertiary Parser parameters are available for the log, detail display, and word, pattern, city and auxiliary city tables. The program is designed to process records from many geographic areas; it handles regional and international differences in address vocabulary through multiple processing layers. The parameters allow primary, secondary and tertiary tables to be specified, if required.

Up to three CDPs can be enabled through the parameter file.

How Multiple Parsers Work

Multiple CDPs work together to process records from various countries. Currently, the CDP can look up data for a list of countries in addition to ordinary name and address parsing. See PRIMARY_GEO_VALUE in the CDP parameter file for a list of the countries used.

If you have a license for more than one country, you can set up the Customer Data Parser to parse data from up to three countries in one step. However, the data needs to be split into two jobstreams to continue through the Postal Geocoders (if available).

When more than one CDP is specified in the file, a parsed address whose country of origin differs from that of the primary CDP is forwarded to the secondary or tertiary CDP, as applicable. In other words, the primary CDP determines the country of origin of the address (if possible) and then routes the record to the proper Parser for that country.


Line Pattern Identification Codes

These codes indicate the general content and relative position of the original name/address lines. They also invoke processing routines.

Inbound definition: line pattern types that may be predefined in the Customer Data Parser Parameter File (for example: LINE1=1 99 ?; LINE1=1 99 N; LINE1=1 99 S, etc.)

Pattern Description
N Name line
F Firm name
S Street address
G Geography line (city, state, postal code)
X Provides the option that prohibits the line from being identified as a Name line
? Allows the CDP to determine the line pattern type (Default)

Outbound definition: line pattern types that the Customer Data Parser has identified, displayed on the Customer Data Parser Display Reports.

Pattern Description
N Name line
S Street address
A Additional address data (such as apartment information)
B Post office box line
R Rural route line
G Geography line (city, state, postal code)
H Hold line
I Ignore line
Y Miscellaneous line with care-of
Z Miscellaneous street line
M Miscellaneous line
? Unidentified line
E Line containing the email address (must be by itself on the line)
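Because each outbound code is a single character, a string of codes (one per input line, as on the Display Reports) can be decoded with a simple lookup. This Python sketch assumes unused line positions are blank; the table contents come from the outbound list above, and the names are hypothetical.

```python
OUTBOUND_LINE_PATTERNS = {
    "N": "Name line", "S": "Street address", "A": "Additional address data",
    "B": "Post office box line", "R": "Rural route line", "G": "Geography line",
    "H": "Hold line", "I": "Ignore line", "Y": "Miscellaneous line with care-of",
    "Z": "Miscellaneous street line", "M": "Miscellaneous line",
    "?": "Unidentified line", "E": "Line containing the email address",
}


def describe_line_pattern(pattern):
    """Return (line number, code, description) for each non-blank position."""
    return [
        (line_number, code, OUTBOUND_LINE_PATTERNS.get(code, "unknown code"))
        for line_number, code in enumerate(pattern, start=1)
        if code != " "
    ]
```

For a three-line name, street, and geography address padded to ten positions, `describe_line_pattern("NSG       ")` returns three entries, one for each of lines 1 to 3.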


Pattern Leveling

The Parser stores two kinds of attributes for an element:

Intrinsic attributes such as ALPHA, 1ALPHA-NUMERIC, 1NUMERIC

Specific attributes for a word or phrase such as FIRST, HSNO, L-TYPE and RELATIONSHIP

Pattern leveling is the process of generalizing a name or street pattern by:
1. The selected replacement of specific attributes with the intrinsic attribute for that element.
2. The selected removal of descriptive or ‘extra’ data from the line.

The rules governing attribute replacement and removal of elements are based on the type of line. There are multiple levels of generalization, each one making the pattern easier for the Parser to understand. The number of parsing levels is controlled by the parameters NAME_PARSING_DEPTH and STREET_PARSING_DEPTH.

“Name Pattern Depth Levels” on page 4-61 and “Street Pattern Depth Levels” on page 4-63 describe level processing for name and street patterns, respectively. If the line is predefined as “F” (Business) in the parameter file, the Parser first identifies the tokens and then looks up patterns. Otherwise, any business token causes the entire line to be treated as business (unless NO_SPECIAL_BUSINESS_SERVICE is set to NP).


Name Pattern Depth Levels

When Action Performed

Prior to patterns
• Handle concatenees. If the token after a CONCATENEE is not ALPHA, 1ALPHA or ALPHA-1SPECIAL, reset the next token's parsed attribute to ALPHA: CON FIRST > CON ALPHA. Force the concatenated token to have a parsed attribute of LAST. Remove connectors between concatenated tokens: O’NEAL > ONEAL.
• Business word processing. Force all tokens to BUSINESS if any of the following occur: more than one comma on the line; any numeric not defined as a GENERATION on the line; a BUSINESS, B-DESCRIPTIVE or B_REDEFINE token is present; an ALPHA token has more than two characters and all the characters are consonants; an ALPHA-1SPECIAL token ends with an apostrophe s (’s), as in JOHN’S BAR AND GRILL. BUSINESS token assignment does not take place if any of the following occur: the token has the same last name on the next or previous line; there is a BUSINESS? token and either a FIRST, LAST, TITLE, P-TITLE, GENERATION or RELATIONSHIP.
• Splits single/words where at least one element to the left or right is a FIRST.
• Check for token splitting where there is an ALPHA-1SPECIAL token and the next token does not have a parsed attribute of DESC. Split a single token into two tokens where a “+”, “/” or “&” character is part of it: MARY+JOHN > MARY & JOHN. Split names with P-TITLE tokens: DRs John & MARY SMITH > Dr John Smith & Dr Mary Smith.

Level 1
• An ALPHA-1SPECIAL token with a HYPHEN recodes to an ALPHA token.


Level 2: Level 1 +
• Skip leading (B-DESCRIPTIVE, DESCRIPTIVE and everything after, RELATIONSHIP, CONNECTOR, REDEFINE, CARE-OF) and, unless the NO_SPECIAL_BUSINESS parameter is set, make one attempt to find a pattern match.
• Skip everything after a non-leading DESCRIPTIVE.
• Skip trailing B-DESCRIPTIVE, RELATIONSHIP, CONNECTOR, REDEFINE, CARE-OF.
• RELATIONSHIP goes to a CONNECTOR.
• The first IGNORE token sets all other tokens to IGNOREs.

Level 3: Level 2 +
• FIRST token changed to an ALPHA token.
• LAST token changed to an ALPHA token.

Level 4: Level 3 +
• Skip leading (ALPHA-1SPECIAL without a “-” or “/” character), ALPHA-SPECIAL or OTHER-SPECIAL.
• Skip trailing (ALPHA-1SPECIAL without a “-” character), ALPHA-SPECIAL, NUMERIC-1SPECIAL, NUMERIC-SPECIAL or OTHER-SPECIAL.


Street Pattern Depth Levels

When Action

Prior to patterns
• Saves last TYPE.
• Allows only one APT-COMPLEX; others are set to null.
• HIGHWAY or ROUTE without pairs go to TYPE.
• Units without a pair go to APT-COMPLEX.
• Processes double DIRECTIONS (combines them).
• House processing (split HYPHEN except HI, NY or valid range).
• Combines multiple ALPHA tokens.

Level 1: Everything as is.

Level 2
• Skip leading CARE-OF.
• Skip IGNORE and everything after it.
• Pairs of two-part tokens (APARTMENT, APARTMENT#) are removed.

Level 3: Level 2 +
• All types (L-TYPE, SEC-TYPE, TYPE) go to TYPE.
• STREET goes to ALPHA.
• NUMBER goes to ALPHA.
• ALPHA-1SPECIAL with a HYPHEN or quote goes to ALPHA.

Level 4: Level 3 +
• S-DIRECTION goes to DIRECTION.
• ALPHA-NUMERIC containing a single alpha character (e.g. w12345) preceded by a HSNO # goes to APARTMENT#.

Level 5: Level 4 +
1. Non-last TYPEs go to ALPHA.
2. A single ALPHA preceded by HSNO # goes to APARTMENT# and is skipped.

Level 6: Level 5 +
• Combines all adjacent ALPHA tokens into 1 ALPHA token.
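As an illustration of how leveling generalizes a street pattern, the Python sketch below applies only two of the rules above: the Level 3 recodes (L-TYPE and SEC-TYPE to TYPE, STREET and NUMBER to ALPHA) and the Level 6 merge of adjacent ALPHA tokens. The prior-to-patterns steps, the Level 2, 4 and 5 rules, and the Level 3 ALPHA-1SPECIAL rule are deliberately left out, so this is a simplified model with hypothetical names, not the Parser's leveling engine.

```python
LEVEL3_RECODES = {"L-TYPE": "TYPE", "SEC-TYPE": "TYPE", "STREET": "ALPHA", "NUMBER": "ALPHA"}


def level_street_tokens(tokens, depth):
    """Apply a subset of the street leveling rules to (value, attribute) tokens."""
    leveled = list(tokens)
    if depth >= 3:
        # Level 3: specific attributes replaced by more general ones.
        leveled = [(value, LEVEL3_RECODES.get(attr, attr)) for value, attr in leveled]
    if depth >= 6:
        # Level 6: combine all adjacent ALPHA tokens into one ALPHA token.
        merged = []
        for value, attr in leveled:
            if attr == "ALPHA" and merged and merged[-1][1] == "ALPHA":
                merged[-1] = (merged[-1][0] + " " + value, "ALPHA")
            else:
                merged.append((value, attr))
        leveled = merged
    return leveled


def pattern_of(tokens):
    """The pattern is the sequence of attributes."""
    return " ".join(attr for _, attr in tokens)
```

For 5 OLD MILL RD with the hypothetical tagging HSNO ALPHA STREET L-TYPE, depth 3 gives HSNO ALPHA ALPHA TYPE and depth 6 gives HSNO ALPHA TYPE, with OLD MILL carried as one ALPHA token.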


Customer Data Parser Output

The following sections describe the output records and files returned by the Customer Data Parser. This output includes:

CDP repository record (PREPOS)

Parser Log File (palog)

Parser Scrub Report

Parser Detail File

Parser Statistics File

Customer Data Parser Repository (PREPOS) Record

This output record contains the data returned by parsing. Data is in two formats: display and non-display.

Non-display fields are constructed with matching purposes in mind.

Display fields contain the original input data. They are constructed with presentation purposes in mind.

Values are assigned to these fields using the Table Maintenance definition of the elements as described in the following table.

If a RECODE value exists for the word/phrase, then use the recode value (for example, if input record=’Bob’, output=’Robert’); else use the original word/phrase.
If a SYNONYM value exists for the word/phrase, then use the synonym value; else use the original word/phrase (for example, if input record=’Bob’, output=’Bob’).


Parser Repository (PREPOS) Layout

The Customer Data Parser PREPOS layout breaks down into the following sections to provide a 9836-byte layout:

Table 4.3 Parser Repository (PREPOS) Layout

Section Description

Codes (1-781) All of the codes returned from the Customer Data Parser

Street (782-2061) All of the street information returned from the Customer Data Parser

Geographic (2062-2469) All of the geography information returned from the Customer Data Parser

Geocoder (2470-2569) Includes user-defined geographic codes (such as tract, longitude/latitude, and so on)

Input Geographic Match The layout shown below is for the US Geocoder. See country-specific geocoder documentation for other country geographic match formats.

Output Geographic Match The layout shown below is for the US Geocoder. See country-specific geocoder documentation for other country geographic match formats.

Name These fields are repeated nine times for a total of 10 names. Each section is described by one field (pr_name_sect_0x) redefining the entire 571-byte area.

Normalized Section Original order of elements
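Table 4.4 gives each PREPOS field as a 1-based position and a length, so fields can be sliced directly from a record. The Python sketch below reads part of the Codes section and the five review-code fields. It assumes the record is already available as a decoded text string (on the mainframe it may be EBCDIC), that PR_LINE_RULES holds one two-byte rule per input line, and that unused review-code slots are blank; the function names are hypothetical.

```python
LINE_RULES = {
    "00": "No Rule Applied", "01": "User Wins", "02": "First Is Name",
    "03": "By Line Flag", "04": "Has House number", "05": "Has Zip Code",
    "06": "Has Score", "07": "Surrounded By Geography", "08": "Identify Geography",
    "09": "Same Last Name", "10": "First Token Misc.", "11": "Surrounded By Street",
    "12": "Matched Name Pattern", "13": "Has Apostrophe", "14": "Surrounded By Names",
    "15": "Last Line Has Geography", "16": "Connected To Previous",
    "17": "Adjusted By Overall Context", "18": "Misc. Has Name Score",
    "19": "Misc. Has Street Name", "20": "Street Line Already Present",
}

# (1-based start, length) of each review-code field, from Table 4.4
REVIEW_CODE_FIELDS = {
    "pr_name_review_codes": (36, 300),
    "pr_street_review_codes": (336, 30),
    "pr_geog_review_codes": (366, 30),
    "pr_misc_review_codes": (396, 300),
    "pr_global_review_codes": (696, 30),
}


def field(record: str, start: int, length: int) -> str:
    """Return the field that starts at 1-based position `start`."""
    return record[start - 1 : start - 1 + length]


def read_codes_section(record: str) -> dict:
    """Read PR_RETURN through PR_LINE_RULES (positions 1-35)."""
    rules = field(record, 16, 20)
    return {
        "pr_return": field(record, 1, 1),
        "pr_confidence": int(field(record, 2, 2)),
        "pr_comprehension": int(field(record, 4, 2)),
        "pr_orig_linepat": field(record, 6, 10),
        "pr_line_rules": [rules[i : i + 2] for i in range(0, 20, 2)],
    }


def read_review_codes(record: str) -> dict:
    """Split each review-code field into three-byte codes, skipping blank slots."""
    result = {}
    for name, (start, length) in REVIEW_CODE_FIELDS.items():
        raw = field(record, start, length)
        codes = [raw[i : i + 3] for i in range(0, length, 3)]
        result[name] = [code for code in codes if code.strip()]
    return result
```

`LINE_RULES.get(code)` turns each two-byte rule into its name, for example "04" into "Has House number".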


Complete PREPOS Layout

This table describes all of the fields in the PREPOS:

Table 4.4 Customer Data Parser Repository Output Record Format

Field Pos Len Description

Codes Section

PR_RETURN 1 1 The CDP return codes.

PR_CONFIDENCE 2-3 2 Acceptance level of the complete name and address. Confidence is how well the system analyzes the completeness of the record and how hard it worked to identify each line on the record. It works with the THRESHOLD parameter in the driver parameter file (pfprsdrv). The number is between 00 and 10 (10 is highest). As confidence decreases, it shaves off the score passed in to determine its final value.

PR_COMPREHENSION 4-5 2 Acceptance level of the understood name and address. Comprehension is how well the system understands the input it was given. If the system determines that it has understood all parts of the input, it assigns a numerical score. The scores are between 00 and 10, 10 being the highest. After a numerical score is established, it goes on to the next area of confidence.


PR_ORIG_LINEPAT 6-15 10 Original line pattern of 10 input address lines corresponding to the in_ area LINE1-LINE10 definitions. For a complete list of line patterns, see the section “Line Pattern Identification on the Display Report” later in this chapter. The line pattern of the input address lines is determined through a combination of weighting the orientation of the line within the full address and the number and weight of the elements identified on the line.

PR_LINE_RULES 16-35 20 Two-byte line type identification rules.

00 = No Rule Applied
01 = User Wins
02 = First Is Name
03 = By Line Flag
04 = Has House number
05 = Has Zip Code
06 = Has Score
07 = Surrounded By Geography
08 = Identify Geography
09 = Same Last Name
10 = First Token Misc.
11 = Surrounded By Street
12 = Matched Name Pattern
13 = Has Apostrophe
14 = Surrounded By Names
15 = Last Line Has Geography
16 = Connected To Previous
17 = Adjusted By Overall Context
18 = Misc. Has Name Score
19 = Misc. Has Street Name
20 = Street Line Already Present

These rules are applied to determine the line types of oraddrl1-10.


PR_NAME_REVIEW_CODES 36-335 300 The review codes for names. For a complete list, see the section “Review Codes and Review Groups” later in this chapter. These are three-byte codes. Multiple codes are allocated because any single name may have more than one condition to report. Used in determining the pr_rev_group field.

PR_STREET_REVIEW_CODES 336-365 30 The review codes for streets. For a complete list of codes, see the section “Review Codes and Review Groups” later in this chapter. These are three-byte codes. This information is used in the determination of the pr_rev_group field.

PR_GEOG_REVIEW_CODES 366-395 30 The review codes for geography fields. For a complete list of codes, see the section “Review Codes and Review Groups” later in this chapter. These are three-byte codes. This information is used in the determination of the pr_rev_group field.

PR_MISC_REVIEW_CODES 396-695 300 The review codes for miscellaneous problems. For a complete list of codes see the section “Review Codes and Review Groups” later in this chapter. These are three-byte codes. This information is used in the determination of the pr_rev_group field.


PR_GLOBAL_REVIEW_CODES 696-725 30 The review codes for global review problems. For a complete list of codes see the section “Review Codes and Review Groups” later in this chapter. These are three-byte codes. This information is used in the determination of the pr_rev_group field.

PR_NUMBER_OF_NAMES 726-727 2 Values are 00-10. The number of names (personal and business) written to name sections on an output record.

PR_NAME_TYPES 728 1 A categorization value based on the mixture of name types identified on the input record:
1 = Retail/personal (e.g. John Smith)
2 = Business (e.g. Trillium)
3 = Reject (neither a business nor a personal name was identified)
4 = Mixed (includes both a business and an individual name)

PR_CATEGORY 729-778 50 This is an output field used to store standard or user-defined category codes assigned to word definitions in the Word/Pattern tables. Concatenated codes from overall address, terminated with a | character. An example of a category may be an SIC Code assigned to a business word. See Table Maintenance documentation for info on the assignment of categories.

PR_REV_GROUP 779-781 3 The review group codes displayed in the default review group hierarchy. These codes help in determining general data conditions within parsing, and guide the user to areas for more specific parser tuning. These codes and their hierarchy are listed in the section “Review Group Hierarchy” later in this chapter. The Statistical Report summarizes the numbers and percentages of records distributed over each review group.
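The review-code fields above are fixed-width areas of three-byte codes, addressed by the 1-based, inclusive positions in the Pos column. The following Python sketch shows one way to pull such a field out of a PREPOS record and list its codes. It assumes the record has already been read as a text string and that unused code slots are blank-filled; the helper names (get_field, split_review_codes) are hypothetical, and the codes in the usage example are placeholders, not actual Trillium review codes.

```python
from __future__ import annotations

# Positions in the layout table are 1-based and inclusive.
PR_STREET_REVIEW_CODES = (336, 365)   # 30 bytes
PR_MISC_REVIEW_CODES = (396, 695)     # 300 bytes


def get_field(record: str, start: int, end: int) -> str:
    """Return the raw text at 1-based, inclusive positions start..end."""
    return record[start - 1:end]


def split_review_codes(field: str) -> list[str]:
    """Split a review-code field into three-byte codes, skipping blank slots."""
    codes = []
    for i in range(0, len(field), 3):
        code = field[i:i + 3]
        if code.strip():          # assumption: unused slots are spaces
            codes.append(code)
    return codes


if __name__ == "__main__":
    # Hypothetical record: blanks up to position 335, then two placeholder codes.
    record = " " * 335 + "A01B02" + " " * 24
    street = get_field(record, *PR_STREET_REVIEW_CODES)
    print(split_review_codes(street))   # ['A01', 'B02']
```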

Street Section

PR_HSE_NBR 782-796 15 Output field used to store the input house value. 5, 10, 20, etc. (As in 5 Main Street)

PR_HSE_NBR_DISPLAY 797-811 15 Output field used to store the original input house value. 5, 10, 20, etc. (no recodes applied)

PR_HSE_MASK 812-826 15 Output field used to store the shape of the input house values. N = Numeric A = Alpha -, / = Special

PR_HSE_TYPE 827 1 Output field used to store the input house type. N = Numeric - = Hyphenated U = Unusual S = Slash B = Blank A = Alpha

PR_ST_TL 828-852 25 Output field to store input street title. Main, Elm (as in 5 Main Street)

PR_ST_TL_DISPLAY 853-877 25 Output field to store the original input street title. Main, Elm, etc. (no recodes applied)

PR_ST_TYPE1*** 878-892 15 Output field to store the first street type input. Rd., St. (as in 5 Main St.)

PR_ST_TYPE1_DISPLAY 893-907 15 Output field to store the original first street type input. Rd., St. (no recodes applied)

PR_ST_TYPE2*** 908-922 15 Output field to store second street type input. Rd., St., Ave., etc.

PR_ST_TYPE2_DISPLAY 923-937 15 Output field used to store the original second street type input. Rd., St., Ave., etc. (no recodes applied)

PR_PR_ST_DIR 938-949 12 Output field used to store the input prefix street direction. Whether this field is populated depends on the position of the incoming direction: a direction BEFORE the street name populates pr_pr_st_dir; a direction AFTER the street name populates pr_sc_st_dir. The pattern structure determines the final result. N, S, W, E, etc.

PR_PR_ST_DIR_DISPLAY 950-961 12 Output field used to store the original prefix street direction input. N, S, W, E, etc. (no recodes applied)

PR_SC_ST_DIR 962-973 12 Output field used to store the input post street direction. Whether this field is populated depends on the position of the incoming direction: a direction BEFORE the street name populates pr_pr_st_dir; a direction AFTER the street name populates pr_sc_st_dir. The pattern structure determines the final result. N, S, W, E, etc.

PR_SC_ST_DIR_DISPLAY 974-985 12 Output field used to store the original post street direction input. N, S, W, E, etc. (no recodes applied)

PR_RTE_NAME 986-1005 20 Field to store input route name. Route, Rt, etc.

PR_RTE_NAME_DISPLAY 1006-1025 20 Field to store the original input route name. Route, Rt, etc. (no recodes applied)

PR_RTE_NBR 1026-1033 8 Output field, stores input route value. 1, 300, 212, etc.

PR_RTE_MASK 1034-1041 8 Output field used to store the shape of the input route value. N = Numeric A = Alpha -, / = Special

PR_RTE_TYPE 1042 1 Field to store input route type. R = Route; H = Highway

PR_BOX_NAME 1043-1062 20 Field to store input box name. Box, PO Box, etc.

PR_BOX_NAME_DISPLAY 1063-1082 20 Output field to store the original input box name. Box, PO Box, etc., (no recodes applied)

PR_BOX_NBR 1083-1092 10 Field to store the input box value. 6, 12, 811, etc.

PR_BOX_MASK 1093-1102 10 Output field to store the shape of the input box value. N = Numeric A = Alpha -, / = Special

PR_BOX_TYPE 1103 1 Output field to store input box type. B = Box D = Drawer P = PO Box T = Pole

PR_COMPLEX1_NAME 1104-1128 25 Output field used to store the standardized name of an apartment, military, or business complex. Bay Crest, USS John F Kennedy, etc. (Contains the ‘Marvin’ in “Marvin Gardens.”)

PR_COMPLEX1_NAME_DISPLAY 1129-1153 25 Output field used to store the original input name of an apartment, military, or business complex. Bay Crest, USS John F Kennedy, etc.

PR_COMPLEX1_TYPE 1154-1168 15 Output field used to store the standardized type of apartment, military, or business complex. Airport, College; (Contains the ‘Gardens’ in “Marvin Gardens.”)

PR_COMPLEX1_TYPE_DISPLAY 1169-1183 15 Output field used to store the original type of an apartment, military, or business complex. Airport, College

PR_COMPLEX2_NAME 1184-1208 25 Output field to store the standardized name of a secondary apartment, military, or business complex. Bay Crest, USS John F Kennedy, etc.

PR_COMPLEX2_NAME_DISPLAY 1209-1233 25 Output field used to store the original input name of a secondary apartment, military, or business complex. Bay Crest, USS John F Kennedy, etc. (no recodes applied)

PR_COMPLEX2_TYPE 1234-1248 15 Output field to store the standardized type of a secondary apartment, military, or business complex. Airport, College

PR_COMPLEX2_TYPE_DISPLAY 1249-1263 15 Output field used to store the original type of a secondary apartment, military, or business complex. Airport, College (no recodes applied)

PR_COMPLEX3_NAME 1264-1288 25 Output field to store the standardized name of a tertiary apartment, military, or business complex. Bay Crest, USS John F Kennedy, etc.

PR_COMPLEX3_NAME_DISPLAY 1289-1313 25 Output field used to store the original input name of a tertiary apartment, military, or business complex. Bay Crest, USS John F Kennedy, etc. (no recodes applied)

PR_COMPLEX3_TYPE 1314-1328 15 Output field to store the standardized type of a tertiary apartment, military, or business complex. Airport, College

PR_COMPLEX3_TYPE_DISPLAY 1329-1343 15 Output field to store the original type of a tertiary apartment, military, or business complex. Airport, College (no recodes applied)

PR_DWEL1_NAME ** 1344-1363 20 Output field used to store the standardized type of dwelling when pairs of 2-part dwelling tokens exist. 2-part dwelling tokens: Apartment Apartment# Suite Suite#

PR_DWEL1_NAME_DISPLAY 1364-1383 20 Output field used to store the original type of dwelling when pairs of 2-part dwelling tokens exist. 2-part dwelling tokens: Apartment Apartment# Suite Suite#

PR_DWEL1_NBR 1384-1393 10 Output field to store the dwelling value when pairs of 2-part dwelling tokens exist. 12, 200, as in Suite 12.

PR_DWEL1_MASK 1394-1403 10 Output field used to store the shape of the dwelling value when pairs of 2-part dwelling tokens exist. N = Numeric A = Alpha -, / = Special

PR_DWEL1_TYPE 1404 1 Output field used to store the dwelling type value when pairs of 2-part dwelling tokens exist. A = Apartment F = Floor U = Unit L = Lot D = Department S = Site T = Suite ? = Used if a record contains a dwelling #, but no type

PR_DWEL2_NAME ** 1405-1424 20 Output field to store the standardized type of the secondary dwelling when pairs of 2-part dwelling tokens exist. 2-part dwelling tokens: Apartment Apartment# Suite Suite#

PR_DWEL2_NAME_DISPLAY 1425-1444 20 Output field to store the original type of the secondary dwelling when pairs of 2-part dwelling tokens exist. (No recodes applied.) 2-part dwelling tokens: Apartment Apartment# Suite Suite#

PR_DWEL2_NBR 1445-1454 10 Output field used to store the secondary dwelling value when pairs of 2-part dwelling tokens exist. 12, 200, as in Suite 12.

PR_DWEL2_MASK 1455-1464 10 Output field used to store the shape of the secondary dwelling value when pairs of 2-part dwelling tokens exist. N = Numeric A = Alpha -, / = Special

PR_DWEL2_TYPE 1465 1 Output field used to store the secondary dwelling type value when pairs of 2-part dwelling tokens exist. A = Apartment F = Floor U = Unit L = Lot D = Department S = Site T = Suite ? = Used if a record contains a dwelling #, but no type

PR_DWEL3_NAME ** 1466-1485 20 Output field used to store the standardized type of the tertiary dwelling when pairs of 2-part dwelling tokens exist. 2-part dwelling tokens: Apartment Apartment# Suite Suite#

PR_DWEL3_NAME_DISPLAY 1486-1505 20 Output field used to store the original type of the tertiary dwelling when pairs of 2-part dwelling tokens exist.

PR_DWEL3_NBR 1506-1515 10 Output field used to store the tertiary dwelling value when pairs of 2-part dwelling tokens exist. 12, 200, as in Suite 12.

PR_DWEL3_MASK 1516-1525 10 Output field used to store the shape of the tertiary dwelling value when pairs of 2-part dwelling tokens exist. N = Numeric A = Alpha -, / = Special

PR_DWEL3_TYPE 1526 1 Output field used to store the tertiary dwelling type value when pairs of 2-part dwelling tokens exist. A = Apartment F = Floor U = Unit L = Lot D = Department S = Site T = Suite ? = Used if a record contains a dwelling #, but no type

PR_MISC_ADDR 1527-2026 500 Output field used to store miscellaneous address values, 50 bytes per line (10 lines). Data from any address line with an IGNORE attribute is stored here (num, unclaimed, unknown).

PR_BEST_NUMBER 2027-2036 10 Best number composite, constructed as follows: If a house number exists, use the house number. If an APT number exists, use the APT number. If a box number exists, use the box number. If a route number exists, use the route number. Examples: 1, 200, 1200. This number is used primarily in the relationship matching process.

PR_BEST_ST_TL 2037-2061 25 Best street title composite, constructed as follows: If a street name exists, use it. If a complex name/type exists, use it. If a box name exists, use it. Main, Elm, etc. This value is used primarily in the relationship matching process.
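PR_BEST_NUMBER and PR_BEST_ST_TL are composites built from fields earlier in this section. The Python sketch below reads the listed rules as a precedence order in which the first non-blank candidate wins. That reading, the function names, and the mapping of arguments to fields (for example, PR_HSE_NBR, PR_DWEL1_NBR, PR_BOX_NBR, and PR_RTE_NBR for the best number) are assumptions for illustration, not a description of the CDP's internal logic.

```python
from __future__ import annotations

from typing import Optional


def first_present(*candidates: Optional[str]) -> str:
    """Return the first candidate that is not blank (stripped), else ''."""
    for value in candidates:
        if value is not None and value.strip():
            return value.strip()
    return ""


def best_number(house: str, apt: str, box: str, route: str) -> str:
    # Assumed precedence: house number, apartment number, box number, route number.
    return first_present(house, apt, box, route)


def best_street_title(street: str, complex_name: str, box_name: str) -> str:
    # Assumed precedence: street name, complex name/type, box name.
    return first_present(street, complex_name, box_name)


if __name__ == "__main__":
    print(best_number("    ", "200", "", ""))     # 200
    print(best_street_title("", "", "PO BOX"))    # PO BOX
```

Under this reading, a record with both a house number and an apartment number yields the house number, which is the more specific key for relationship matching.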

Geographic Section

PR_COUNTRY_NAME 2062-2091 30 Output field used to store the country name. Canada, Mexico. This field is populated only when a country name is found within the address lines; it is not assigned by the program.

PR_COUNTRY_NAME_DISPLAY 2092-2121 30 Output field that stores the original country name. Canada, Mexico (no recodes applied)

PR_NEIGH1_NAME 2122-2151 30 Output field used to store the neighborhood 1/locality name. If the Parser parameter NEIGHBORHOOD_FORMAT_OPTION is used, this field is combined with pr_neigh1_name_display and treated as one 60-byte field for storing the neighborhood value. Typically used for data from the UK and Latin America.

PR_NEIGH1_NAME_DISPLAY 2152-2181 30 Output field used to store the original neighborhood 1/locality name. If the Parser parameter NEIGHBORHOOD_FORMAT_OPTION is used, this field is combined with the pr_neigh1_name field and treated as one 60-byte field for storing the neighborhood value. Bo Barcelona, URB El Duque (no recodes applied). Typically used for data from the UK and Latin America.

PR_NEIGH2_NAME 2182-2211 30 Output field used to store the neighborhood 2/locality name. Bo Barcelona, URB El Duque. Typically used for data from the UK and Latin America.

PR_NEIGH2_NAME_DISPLAY 2212-2241 30 Output field used to store the original neighborhood 2/locality name. Bo Barcelona, URB El Duque (no recodes applied). Typically used for data from the UK and Latin America.

PR_CITY_NAME 2242-2271 30 Output field to store the city name. Boston, New York, London

PR_CITY_NAME_DISPLAY 2272-2301 30 Output field to store original city name. Boston, Manhattan (no recodes applied)

PR_CITY_NUMBER 2302-2307 6 Output field used to store the city number. V22663 (ANNNNN). This value is assigned by the parser and is generally used as the primary look-up key in any geocoding process.

PR_CITY_STATUS 2308 1 Output field used to store the city status, which helps determine the correct city name using USTABAUX. Flag values are: 1 = Preferred mailing name; 2 = Acceptable mailing name; 9 = Unacceptable mailing name.

PR_CITY_LNAME_DIR 2309-2338 30 Output field used to store the long city name from the directory. Displays city and state abbreviation (Billerica, MA)

PR_ST_PROV_CTY_NAME 2339-2368 30 Output field used to store the state, province or county name. MA, Cheshire

PR_ST_PROV_CTY_NAME_DISPLAY 2369-2398 30 Output field used to store the original state, province or county name. Massachusetts, Cheshire (no recodes applied)

PR_POSTAL_CODE 2399-2413 15 Output field to store the input postal code. 01821, 01879, M8X 2X3

PR_POSTAL_CODE_MASK 2414-2428 15 Output field to store the shape of the input postal code (pr_postal_code). N = Numeric A = Alpha -, / = Special

PR_POSTAL_CODE_TYPE 2429 1 Output field to store the type of input postal code. Not currently used; filled with “U”.

PR_POSTAL_CODE_DIR 2430-2444 15 Output field to store the postal code that the CDP finds on the USTABCIT city table for the city/state it identified. 01821

PR_POSTAL_CODE_MASK_DIR 2445-2459 15 Output field to store the shape of the postal code identified by the CDP (pr_postal_code_dir). N = Numeric A = Alpha -, / = Special

PR_POSTAL_CODE_TYPE_DIR 2460 1 One-character code assigned by the USPS and found by the CDP on the USTABCIT city table. The values are: 4 = Box Code; 5 = Special Military; 8 = Non Unique.

PR_ST_PROV_NUMBER 2461-2462 2 Two-digit numeric state identifier (also known as a FIPS state code). 01-99, US only.

PR_WORLD_ORIGIN 2463 1 Numeric identifier used by the parser to identify the country of origin of the data being parsed. The indicators of origin are: 1 = United States (USA); 2 = Canada (CA); 3 = United Kingdom (UK); 4 = Other, not assigned a code; 5 = Brazil (BZ); 6 = Australia (AU); 7 = Germany (DE); 8 = Italy (IT); 9 = Optional value available for Portugal only. When 9 is used, the last token on a street line is NOT changed from APT to ALPHA.

PR_POST_OFFICE_CODE 2464-2469 6 Numeric identifier assigned by the USPS to identify postal delivery areas (also known as Postal Finance Code). US only. For example: 24071 (NNNNN)
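Several fields in this table (PR_HSE_MASK, PR_RTE_MASK, PR_BOX_MASK, PR_DWEL1_MASK through PR_DWEL3_MASK, and PR_POSTAL_CODE_MASK) store the shape of a value using N for numeric characters, A for alphabetic characters, and the special characters - and /. A minimal Python sketch of that mapping follows. How the CDP treats spaces and other punctuation is not documented in this table, so the sketch simply copies any non-alphanumeric character through unchanged; treat that as an assumption, and the function name as hypothetical.

```python
def shape_mask(value: str) -> str:
    """Map each character to its shape: N (digit), A (letter), else itself."""
    shape = []
    for ch in value:
        if ch.isdigit():
            shape.append("N")
        elif ch.isalpha():
            shape.append("A")
        else:
            # Assumption: '-', '/', spaces, and other characters are copied as-is.
            shape.append(ch)
    return "".join(shape)


if __name__ == "__main__":
    print(shape_mask("01821"))     # NNNNN
    print(shape_mask("M8X 2X3"))   # ANA NAN
    print(shape_mask("12-B"))      # NN-A
```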

Geocode Section

For user-defined geographic codes, such as tract or longitude/latitude.

PR_GEOCD_A 2470-2479 10 Geographic Code A. Reserved for additional Geographic codes.

PR_GEOCD_B 2480-2489 10 Geographic Code B. Reserved for additional Geographic codes.

PR_GEOCD_C 2490-2499 10 Geographic Code C. Reserved for additional Geographic codes.

PR_GEOCD_D 2500-2509 10 Geographic Code D. Reserved for additional Geographic codes.

PR_GEOCD_E 2510-2519 10 Geographic Code E. Reserved for additional Geographic codes.

PR_GEOCD_F 2520-2529 10 Geographic Code F. Reserved for additional Geographic codes.

PR_GEOCD_G 2530-2539 10 Geographic Code G. Reserved for additional Geographic codes.

PR_GEOCD_H 2540-2549 10 Geographic Code H. Reserved for additional Geographic codes.

PR_GEOCD_I 2550-2559 10 Geographic Code I. Reserved for additional Geographic codes.

PR_GEOCD_J 2560-2569 10 Geographic Code J. Reserved for additional Geographic codes.

Input Geographic Match Section (US Geocoder only; see the country-specific geocoder documentation for other country formats)

PR_GIN_STR_NAME 2570-2591 22 Input street name. Original Street Name, ex. Main, Elm, etc.

PR_GIN_POSTAL_CODE 2592-2600 9 Input postal code. Original postal code, ex. 01821

PR_GIN_STR_PRE_DIRECTION 2601-2602 2 Input pre-direction. Street directional that precedes the street name. Common designations such as “N”, “SW”, etc.

PR_GIN_STR_POST_DIRECTION 2603-2604 2 Input post-direction. Street directional that follows the street name. Common designations such as “N”, “SW”, etc.

PR_GIN_STR_SUFFIX 2605-2608 4 Input Street suffix. (like AVE or RD.) Appropriate abbreviations are used.

PR_GIN_HOUSE_NUMBER 2609-2618 10 Input (original) House Number. 5, 10, 20, etc.

PR_GIN_SECONDARY_NUMBER 2619-2624 6 Input secondary number. Original secondary number for Apt., Suite, etc. For example: 5, 10, 20, etc.

PR_GIN_SECONDARY_TYPE 2625-2628 4 Input second type. Original Secondary Type. For example: Apt, Ste, Unit, etc.

PR_GIN_BLDG_FIRM_NAME 2629-2654 26 Input building or firm name. For example, the original name presented on line 1.

PR_GIN_POSTAL_NUMBER 2655-2660 6 Post office number. For example: 240714 (NNNNNN)

PR_GIN_POSTAL_CITY_NUMBER 2661-2666 6 Postal city number. For example: V21489 (ANNNNN)

PR_GIN_POSTAL_CITY_NAME 2667-2696 30 A valid city name for mailing purposes; it appears in the last line of an address on a mailpiece. Examples: postal city name = BOSTON (preferred city name), pr_city_name = BEACON HILL (vanity city name); postal city name = NEW YORK CITY (preferred city name), pr_city_name = TUDOR CITY (vanity city name); postal city name = PORT AUTHORITY (preferred city name), pr_city_name = NEW YORK CITY (vanity city name).

PR_GIN_STPRVCTY_NAME 2697-2698 2 State, province, county; for example: MA

PR_GIN_RECORD_TYPE 2699 1 Input record type. Geocoder code used as part of an address lookup: 1 = Street; 2 = RR/HC; 3 = PO Box; 4 = Error; 5 = Bad Street Pattern; F = Foreign address record.

Output Geographic Match Section (US Geocoder only; see the country-specific geocoder documentation for other country formats)

PR_GOUT_POSTAL_CODE 2700-2708 9 Output postal code. The original postal code or postal code from the postal directory when a match was possible. Values depend on the match success of the street components as indicated by the fail level.

PR_GOUT_DELIVERY_POINT 2709-2710 2 Output delivery point. Two-byte numeric code created per USPS rules derived from street_info and secondary number.

PR_GOUT_DELIVERY_POINT_CD 2711 1 Output delivery point check digit. It is determined by adding the two rightmost digits of the house number to the nine digits of the ZIP+4 code and subtracting the sum from the next multiple of 10. For example, if the house number is 20 and the ZIP+4 code is 123456789, add 2+0+1+2+3+4+5+6+7+8+9 to get 47; subtracting 47 from the next multiple of 10 (50) gives a value of 3.

PR_GOUT_CARRIER_ROUTE 2712-2715 4 A 4-byte code assigned to a mail delivery or collection route within a 5-digit ZIP code. The first character is alphabetic; the last three are numeric: Bnnn = PO Box; Hnnn = Highway Contract; Rnnn = Rural Route; Cnnn = City delivery; Gnnn = General delivery.

PR_GOUT_DELIVER_ADDR 2716-2765 50 Delivery address line rebuilt from either USPS ZIP+4 Standardized data if matched or parsed data if not matched. 5 Main St.

PR_GOUT_HOUSE_NUMBER 2766-2775 10 Output house number. For example: 5, 10, 20, etc.

PR_GOUT_STR_PRE_DIRECTION 2776-2777 2 Street prefix direction. Street directional that precedes the street name. Letter designations like “N”, “SW”, etc.

PR_GOUT_STR_NAME 2778-2799 22 Street name. The original street name, or the street name from the directory when a match was possible.

PR_GOUT_STR_POST_DIRECTION 2800-2801 2 Street post direction. Directional that follows the street name. Letter designations like “N”, “SW”, etc.

PR_GOUT_STR_SUFFIX 2802-2805 4 Street suffix. Street types like AVE or RD. Appropriate abbreviations are used.

PR_GOUT_SECONDARY_TYPE 2806-2809 4 Output of the Postal Geocoder identifying the secondary unit type, the street component that follows the street name (SUITE, APT, etc.).

PR_GOUT_SECONDARY_NUMBER 2810-2815 6 Output of the Postal Geocoder identifying the apartment number.

PR_GOUT_FAIL_LEVEL 2816 1 A numeric code identifying the level at which the address matched, or failed to match, the postal directories: 0 = Exact match; ZIP+4 returned. 1 = City failure; city not found on the directory. 2 = Street name failure; street name not found on the postal directory. 3 = Primary range failure; typically, the house number cannot be found on the directory. 4 = Street components failure; secondary street components (such as apartment information) not found on the postal directory. 5 = Ambiguous failure; no single clear match (usually because more information is required or missing information prevents matching).
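The check-digit arithmetic described for the delivery point code at position 2711 can be written as a short Python function. The sketch sums the digits of the nine-digit ZIP+4 code and the two-digit delivery point, then takes the distance to the next multiple of 10; the final `% 10` makes a sum that is already a multiple of 10 yield 0. The function name, and the choice to pass the two-digit delivery point (PR_GOUT_DELIVERY_POINT) rather than the raw house number, are assumptions for illustration.

```python
def delivery_point_check_digit(zip_plus4: str, delivery_point: str) -> int:
    """Digit that brings the digit sum of ZIP+4 plus delivery point to a multiple of 10."""
    digits = zip_plus4 + delivery_point
    if not digits.isdigit():
        raise ValueError("ZIP+4 and delivery point must be numeric")
    total = sum(int(d) for d in digits)
    return (10 - total % 10) % 10


if __name__ == "__main__":
    # Worked example from the table: ZIP+4 123456789, house number 20.
    print(delivery_point_check_digit("123456789", "20"))   # 3
```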

Name Section

The following 27 name fields are repeated nine times, for a total of 10 names. Each section is described by one field (pr_name_sect_0x) that redefines the entire 571-byte area.

PR_NAME_NUMBER_01 2817-2818 2 Number representing the occurrence of this name within the record. Values are 01-10. If generating records, this contains 01 for first name encountered, 02 for second name encountered, etc.

PR_NMFORM_01 2819 1 A 1-byte field in each of the 10 name sections in the PREPOS indicating whether the name is in personal or business form. Valid values are: 1 = Personal; 2 = Business; 3 = Errors; blank = no name present or identified. If the CDP does not recognize a name line in the address (for example, when the line is set to IGNORE), it does not populate the pr_nmform field, because it cannot tell whether the line is a name line; an ignored line becomes a miscellaneous line type. This should be the only case in which the name form field is not populated.

PR_PREFIX_01 2820-2834 15 Personal name prefix, using recoded word. Titles like Mr., Dr.

PR_PREFIX_DISPLAY_01 2835-2849 15 Original personal name prefix display (no recodes applied). Titles like Mr., Dr.

PR_FIRST_01 2850-2864 15 Personal name, first name, using recoded word. First names like John, Joseph

PR_FIRST_DISPLAY_01 2865-2879 15 Original Personal name, first name display, no recodes applied. First names like John, Joe (no recodes applied)

PR_MIDDLE1_01 2880-2894 15 Personal name, middle name 1, using recoded word. Middle names like Richard, Stephen

PR_MIDDLE1_DISPLAY_01 2895-2909 15 Original Personal name, middle name display, (no recodes applied.) Middle names like Richard, Stephen

PR_MIDDLE2_01 2910-2924 15 Personal name, middle name 2, using recoded word. Middle names like Richard, Stephen

PR_MIDDLE2_DISPLAY_01 2925-2939 15 Original Personal name, middle name display (no recodes applied.) Middle names like Richard, Stephen

PR_MIDDLE3_01 2940-2954 15 Personal name, middle name 3, using recoded word. Middle names like Richard, Stephen

PR_MIDDLE3_DISPLAY_01 2955-2969 15 Original Personal name, middle name display, (no recodes applied.) Middle names like Richard, Stephen

PR_LAST_01 2970-2999 30 Personal name, last name, using recoded word. Last names (surnames) like Smith, Jones

PR_LAST_DISPLAY_01 3000-3029 30 Original personal name, last name display (no recodes applied). Last names (surnames) like Smith, Jones

PR_SUFFIX_01 3030-3044 15 Personal name, suffix, using recoded word. Titles after a personal name (DMD, ORTH, etc.)

PR_SUFFIX_DISPLAY_01 3045-3059 15 Original Personal name, suffix display, (No recodes applied.) Titles after a personal name (DMD, ORTH, etc.)

PR_GENER_01 3060-3069 10 Personal name, generation, using recoded word. Jr., Sr., etc.

PR_GENER_DISPLAY_01 3070-3079 10 Original Personal name, generation display, (no recodes applied.) Jr., Sr., etc.

PR_GENDER_01 3080 1 Personal name, gender, using recoded word. F = Female; M = Male; N = Neutral; or blank.

PR_BUSNAME_01 3081-3180 100 Business name, using recoded word; for example: IBM

PR_BUSNAME_DISPLAY_01 3181-3280 100 Original Business name display, no recodes applied; for example: International Business Machines

PR_CONNECTOR_01 3281-3295 15 Connector, last name, using recoded word; for example: &

PR_CONNECTOR_DISPLAY_01 3296-3310 15 Original Connector display, no recodes applied; for example: AND

PR_RELATION_01 3311-3335 25 The standardized relationship identified by the CDP; for example: Trustee for, Executor for, In Trust for

PR_RELATION_DISPLAY_01 3336-3360 25 The input relationship identified by the CDP; for example: Trustee for, Executor for, In Trust for

PR_ORIG_LINE_NUMBER_01 3361-3362 2 Original line number; for example: 01-10

PR_NAME_CATEGORY_01 3363-3387 25 Concatenated category codes for this name.

* The previous 27 name fields are repeated nine times for a total of 10 names. Each section is described by one field (pr_name_sect_0x) redefining the entire 571 byte area.
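Because each of the 10 name sections is 571 bytes long and the first begins at position 2817, the position of any name field in section n can be derived from its _01 position by adding (n - 1) × 571. As a check, 2817 + 10 × 571 = 8527, which is where PR_LINE begins. The Python sketch below applies that arithmetic; the function and constant names are hypothetical.

```python
from __future__ import annotations

NAME_SECTION_START = 2817   # first byte of PR_NAME_NUMBER_01
NAME_SECTION_LEN = 571      # bytes per name section
NAME_SECTIONS = 10


def name_section_range(n: int) -> tuple[int, int]:
    """1-based, inclusive positions of name section n (1-10)."""
    if not 1 <= n <= NAME_SECTIONS:
        raise ValueError("name section must be between 1 and 10")
    start = NAME_SECTION_START + (n - 1) * NAME_SECTION_LEN
    return start, start + NAME_SECTION_LEN - 1


def field_range(n: int, start_01: int, end_01: int) -> tuple[int, int]:
    """Shift a field's _01 positions into name section n."""
    offset = name_section_range(n)[0] - NAME_SECTION_START
    return start_01 + offset, end_01 + offset


if __name__ == "__main__":
    print(name_section_range(10))        # (7956, 8526)
    print(field_range(2, 2970, 2999))    # last name in section 2: (3541, 3570)
```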

Normalized Section (Original order of elements)

PR_LINE 8527-9526 1000 Address lines 1-10, standardized, in their original sequence; each line is 100 bytes long. Example CDP output: Mr. and Mrs. John Smith

PR_PATTERN 9527-9826 300 A 300-character field in the PREPOS that stores up to 10 three-character token identifiers for each of the 10 name/address lines (30 characters per line), used for debugging and tuning. For example: 054060054051053

PR_LINE_TYPE 9827-9836 10 Types of the original input address lines, one byte per line: A = Apartment; B = Box; E = E-mail; F = Firm; G = Geography; H = Hold; M = Miscellaneous; N = Name; R = Rural Route; S = Street; Y = Miscellaneous w/Care-of; Z = Miscellaneous Street; ? = Unidentified; blank = No input information.

** pr_dwel1_name, pr_dwel2_name, pr_dwel3_name: These are paired elements (APT, APT#). The exception is UNIT, which stands alone. They are filled left to right, top to bottom. FLOOR can appear in either KEYWORD,VALUE or VALUE,KEYWORD order. As with everything else, patterns determine the final order on each line.

*** pr_st_type1, pr_st_type2: The USPS does not recognize two street types in one address. However, two type fields are provided to accommodate this scenario for other countries, and both fields can also be populated depending on the output pattern. An example is “Park Street Circle”: the USPS wants ‘Park Street’ as the street name and only ‘Circle’ as the type, but you could create a pattern for “STR_NM TYPE TYPE”.
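The Normalized Section packs per-line data for the 10 input lines into fixed slices: 100 bytes per line in PR_LINE, 30 bytes (ten three-character tokens) per line in PR_PATTERN, and one byte per line in PR_LINE_TYPE. The Python sketch below regroups those slices by line, assuming the lines appear in order within each field and that unused positions are blank. The function name and the sample values in the usage example are illustrative only.

```python
from __future__ import annotations

LINES = 10
LINE_LEN = 100      # PR_LINE: 1000 bytes / 10 lines
PATTERN_LEN = 30    # PR_PATTERN: 300 bytes / 10 lines
TOKEN_LEN = 3       # three-character token identifiers


def split_normalized(pr_line: str, pr_pattern: str, pr_line_type: str):
    """Return (line number, line type, pattern tokens, text) for each input line."""
    rows = []
    for i in range(LINES):
        text = pr_line[i * LINE_LEN:(i + 1) * LINE_LEN].rstrip()
        pattern = pr_pattern[i * PATTERN_LEN:(i + 1) * PATTERN_LEN]
        tokens = [
            pattern[j:j + TOKEN_LEN]
            for j in range(0, PATTERN_LEN, TOKEN_LEN)
            if pattern[j:j + TOKEN_LEN].strip()   # skip blank token slots
        ]
        rows.append((i + 1, pr_line_type[i:i + 1], tokens, text))
    return rows


if __name__ == "__main__":
    pr_line = "Mr. and Mrs. John Smith".ljust(1000)
    pr_pattern = "054060054051053".ljust(300)
    pr_line_type = "N".ljust(10)
    print(split_normalized(pr_line, pr_pattern, pr_line_type)[0])
```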

Customer Data Parser Log File

When the Parser encounters a bad name pattern, a bad street pattern, or a city problem during processing, it writes an entry to the Customer Data Parser Log File (palog) so that you can review and improve the parsing process. This file also lists statistics collected during parsing that are helpful in assessing overall parsing results.

Sample Sorted Log File

The following example shows a sorted Parser Log File (palog.srt). The first section of this file lists statistics gathered during parsing. Bad name and street patterns and city problems encountered in the address data are displayed at the end of the file.

0 CONFLICTING GEOGRAPHIC TYPES
0 CORRECTED CITY NAME TOO LONG
0 DISPLAY BOX NUMBER TOO LONG
0 DISPLAY BOX TOO LONG
0 DISPLAY BUSINESS TOO LONG
0 DISPLAY CITY TOO LONG
0 DISPLAY COMPLEX NAME TOO LONG
0 DISPLAY COMPLEX TYPE TOO LONG
0 DISPLAY CONNECTOR TOO LONG
0 DISPLAY COUNTRY TOO LONG
0 DISPLAY DIRECTION TOO LONG
0 DISPLAY DWELLING NUMBER TOO LONG
0 DISPLAY DWELLING TOO LONG
0 DISPLAY FIRST NAME TOO LONG
0 DISPLAY GENERATION TOO LONG
0 DISPLAY HOUSE NUMBER TOO LONG
0 DISPLAY LAST NAME TOO LONG
0 DISPLAY MIDDLE NAME TOO LONG
0 DISPLAY NEIGHBORHOOD TOO LONG
0 DISPLAY POST CODE TOO LONG
0 DISPLAY RELATION TOO LONG
0 DISPLAY ROUTE NUMBER TOO LONG
0 DISPLAY ROUTE TOO LONG
0 DISPLAY STATE/PROVINCE/COUNTY TOO LONG
0 DISPLAY STREET TYPE TOO LONG
0 DISPLAY TITLE TOO LONG

0 FOREIGN ADDRESS ELEMENT FOUND
0 HOLD MAIL ELEMENT PRESENT
0 INVALID TOKEN DEFINITION
0 LABEL OR LABEL ELEMENT TOO LONG
0 MISC DATA FOR LINE TOO LONG
0 NO CITY NAME FOUND IN RECORD
0 NO STATE FOUND IN RECORD
0 NUMBER OF PERSONAL NAMES WITH A COMPOUND SURNAME
0 NUMBER OF RECORDS CONTAINING 10 NAMES
0 NUMBER OF RECORDS CONTAINING 2 DWELLINGS
0 NUMBER OF RECORDS CONTAINING 4 NAMES
0 NUMBER OF RECORDS CONTAINING 5 NAMES
0 NUMBER OF RECORDS CONTAINING 6 NAMES
0 NUMBER OF RECORDS CONTAINING 7 NAMES
0 NUMBER OF RECORDS CONTAINING 8 NAMES
0 NUMBER OF RECORDS CONTAINING 9 NAMES
0 NUMBER OF RECORDS CONTAINING A HIGHWAY ADDRESS
0 NUMBER OF RECORDS CONTAINING A MISC LINE
0 NUMBER OF RECORDS CONTAINING A ROUTE ADDRESS
0 NUMBER OF RECORDS CONTAINING A SPECIAL ADDRESS LINE
0 NUMBER OF RECORDS CONTAINING ALL BLANK DATA
0 NUMBER OF RECORDS CONTAINING FOREIGN GEOGRAPHY
0 NUMBER OF RECORDS WITH COMPREHENSION CODE OF 1
0 NUMBER OF RECORDS WITH COMPREHENSION CODE OF 2
0 NUMBER OF RECORDS WITH COMPREHENSION CODE OF 5
0 NUMBER OF RECORDS WITH COMPREHENSION CODE OF 7
0 NUMBER OF RECORDS WITH COMPREHENSION CODE OF UNKNOWN
0 NUMBER OF RECORDS WITH CONFIDENCE CODE OF 1
0 NUMBER OF RECORDS WITH CONFIDENCE CODE OF 2
0 NUMBER OF RECORDS WITH CONFIDENCE CODE OF 3
0 NUMBER OF RECORDS WITH CONFIDENCE CODE OF 4
0 NUMBER OF RECORDS WITH CONFIDENCE CODE OF 5
0 NUMBER OF RECORDS WITH CONFIDENCE CODE OF 6
0 NUMBER OF RECORDS WITH CONFIDENCE CODE OF 7
0 NUMBER OF RECORDS WITH CONFIDENCE CODE OF UNKNOWN
0 NUMBER OF RECORDS WITH DIRECTORY POST CODE TYPE = 4
0 NUMBER OF RECORDS WITH DIRECTORY POST CODE TYPE = 5
0 NUMBER OF RECORDS WITH DIRECTORY POST CODE TYPE = 9
0 STANDARDIZED BOX NUMBER TOO LONG
0 STANDARDIZED BOX TOO LONG

0 STANDARDIZED COUNTRY TOO LONG
0 STANDARDIZED DIRECTION TOO LONG
0 STANDARDIZED DWELLING NUMBER TOO LONG
0 STANDARDIZED DWELLING TOO LONG
0 STANDARDIZED FIRST NAME TOO LONG
0 STANDARDIZED GENERATION TOO LONG
0 STANDARDIZED HOUSE NUMBER TOO LONG
0 STANDARDIZED LAST NAME TOO LONG
0 STANDARDIZED MIDDLE NAME TOO LONG
0 STANDARDIZED NEIGHBORHOOD TOO LONG
0 STANDARDIZED POST CODE TOO LONG
0 STANDARDIZED RELATION TOO LONG
0 STANDARDIZED ROUTE NUMBER TOO LONG
0 STANDARDIZED ROUTE TOO LONG
0 STANDARDIZED STATE/PROVINCE/COUNTY TOO LONG
0 STANDARDIZED STREET TITLE TOO LONG
0 STANDARDIZED STREET TYPE TOO LONG
0 STANDARDIZED TITLE TOO LONG
0 TOO MANY CATEGORIES
0 TOO MANY DIRECTIONS
0 TOO MANY DWELLING VALUES
0 TOO MANY DWELLINGS
0 TOO MANY MIDDLE NAMES
0 TOO MANY NAMES NON EXPORT
0 TOO MANY STREET TYPES
0 TOO MANY TITLES
0 TOTAL NUMBER OF EXPORT NAMES GT MAX
0 UNIDENTIFIED TOKEN
0 UNUSUAL BOX VALUE
0 UNUSUAL DWELLING VALUE
0 UNUSUAL HOUSE NUMBER
0 UNUSUAL POST CODE VALUE
0 UNUSUAL ROUTE VALUE
0 USER TOKEN ERROR
1 DISPLAY STREET TITLE TOO LONG
1 NO STREET IDENTIFIED
1 NUMBER OF RECORDS CONTAINING A COMPLEX
1 NUMBER OF RECORDS WITH COMPREHENSION CODE OF 3
1 NUMBER OF RECORDS WITH COMPREHENSION CODE OF 4
1 NUMBER OF RECORDS WITH COMPREHENSION CODE OF 6

Customer Data Parser 4-96 Sample Sorted Log File

2 MORE THAN ONE MIDDLE NAME 2 NO GEOGRAPHY IDENTIFIED 2 NO NAMES IDENTIFIED 2 NUMBER OF RECORDS CONTAINING 0 NAMES 2 NUMBER OF RECORDS CONTAINING 3 NAMES 2 NUMBER OF RECORDS CONTAINING A BOX IN THE ADDRESS 2 NUMBER OF RECORDS CONTAINING NOT VERIFIED GEOGRAPHY 2 NUMBER OF RECORDS CONTAINING ONLY VERIFIED BUSINESS NAME FORMS 2 NUMBER OF RECORDS WITH DIRECTORY POST CODE TYPE = OTHER 2 NUMBER OF RECORDS WITHOUT GEOGRAPHY INFORMATION 2 NUMBER OF RECORDS WITHOUT NAME INFORMATION 2 UNKNOWN STREET PATTERN 3 DUPLICATE STREET LINE TYPES 3 NUMBER OF RECORDS CONTAINING AN UNKNOWN LINE 3 NUMBER OF RECORDS WITH COMPREHENSION CODE OF 8 3 UNIDENTIFIED DERIVED GENDERS CONFLICT 5 NUMBER OF RECORDS CONTAINING IDENTIFIED NAME REVIEW CODES 5 NUMBER OF RECORDS CONTAINING STREET REVIEW CODES 5 NUMBER OF RECORDS WITH COMPREHENSION CODE OF 9 6 DOMESTIC CITY NAME PRESENT BUT COULD NOT BE VERIFIED 6 NUMBER OF RECORDS CONTAINING ONLY REJECTED NAME FORMS 7 NUMBER OF NAMES WITH A REJECTED NAME FORM 7 UNKNOWN NAME PATTERN 9 NUMBER OF RECORDS CONTAINING MISC REVIEW CODES 11 NUMBER OF RECORDS WITH COMPREHENSION CODE OF 0 21 CITY NAME CHANGE USED FOR CITY 21 MIXED NAME FORMS PRESENT 21 NUMBER OF RECORDS CONTAINING A CORRECTED CITY NAME 21 NUMBER OF RECORDS CONTAINING COMBINED NAME FORMS 21 NUMBER OF RECORDS WITH CONFIDENCE CODE OF 0 23 NUMBER OF NAMES WITH A VERIFIED BUSINESS NAME FORM 23 NUMBER OF RECORDS CONTAINING 1 DWELLING 23 NUMBER OF RECORDS CONTAINING 2 NAMES 26 NUMBER OF RECORDS CONTAINING GLOBAL REVIEW CODES 27 NUMBER OF RECORDS CONTAINING GEOGRAPHY REVIEW CODES 33 NUMBER OF RECORDS WITH DIRECTORY POST CODE TYPE = 0 65 NUMBER OF RECORDS WITH DIRECTORY POST CODE TYPE = 8 69 NUMBER OF RECORDS CONTAINING ONLY PERSONAL FORMS 73 NUMBER OF RECORDS CONTAINING 1 NAME 77 NUMBER OF RECORDS WITH CONFIDENCE CODE OF 10 78 NUMBER OF RECORDS WITH COMPREHENSION CODE OF 10


Bad Name, Street Patterns and City Problem Section

BAD NAME PATTERN=1ALPHA, 1ALPHA, ALPHA, ALPHA, 1ALPHA, | D, D, WALLACE, VICE, P, REC=34
BAD NAME PATTERN=FIRST, ALPHA, CONNECTOR, FIRST, ALPHA, CONNECTOR, FIRST, | JIMMY, PIKE, AND, AMY, PIKE, AND, KAREN, REC=39
BAD NAME PATTERN=FIRST, ALPHA, CONNECTOR, FIRST, ALPHA, CONNECTOR, FIRST, | KAREN, PIKE, AND, AMY, PIKE, AND, JIMMY, REC=40
BAD NAME PATTERN=FIRST, FIRST,, ALPHA, ALPHA, | SUSAN, MURPHY,, VICE, PREZ, REC=16
BAD NAME PATTERN=TITLE, FIRST, ALPHA, CONNECTOR, TITLE, FIRST, ALPHA, CONNECTOR, FIRST, | MR, CHESTER, TATE, &, MR, JOSEPH, TATE, &, JESSICA, REC=54
BAD NAME PATTERN=TITLE, FIRST, FIRST, FIRST, ALPHA, TITLE, | DR, BERNARD, THOMAS, CLINTON, SMITH, DMD, REC=55
BAD NAME PATTERN=TITLE, TITLE, FIRST, 1ALPHA, LAST, | MAJOR, MRS, GEORGE, M, MAJOR, REC=48
BAD STREET PATTERN=HSNO, S-DIRECTION, SEC-TYPE, TYPE, ALPHA-1NUMERIC, APT-COMPLEX, | 1727, N, SHORE, RD, A2, REAR, REC=74
BAD STREET PATTERN=HSNO, S-DIRECTION, SEC-TYPE, TYPE, ALPHA-1NUMERIC, APT-COMPLEX, | 1727, N, SHORE, ROAD, A2, REAR, REC=75

CITY PROBLEM= MACARLYSLE REC=28
CITY PROBLEM= MACARLYSLE REC=30
CITY PROBLEM= MACONCERD REC=19
CITY PROBLEM= MACONCERD REC=26
CITY PROBLEM= MACONCERD REC=47
CITY PROBLEM= MACONCERD REC=49

In the example above, the bad name patterns, bad street patterns, and city problems found in the address data appear at the end of the sorted CDP Log File. This section itemizes the token attributes for each unrecognized pattern.


For name and street patterns, the program separates the tokens with a comma (for example, TITLE, FIRST). Tokens in the pattern prior to the vertical bar ( | ) are the attributes assigned by the Parser and tokens after the vertical bar are the actual values. With this, the user may compare the pattern directly with the actual data, which is included on the same Bad Pattern line.

Also shown is the record number (REC=), which identifies the record in the original input file where the error occurred. A comma on an input line in the original file is indicated in this section by a double comma (for example, FIRST, FIRST,, ALPHA).
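To make the pairing of attributes and values concrete, the following Python sketch splits a bad-pattern line at the vertical bar, pairs each attribute with its actual value, and extracts the REC= record number. It assumes the line layout shown in the sample above and treats the empty slot created by a double comma as a comma token; parse_bad_pattern is a hypothetical helper, not part of the Trillium Software System.

```python
# Sketch: split a bad-pattern line from the sorted CDP log into its kind,
# (attribute, value) pairs, and REC= record number. The layout is assumed from
# the samples above; parse_bad_pattern is a hypothetical helper.
from typing import List, Optional, Tuple

COMMA_TOKEN = "<comma>"  # placeholder for the empty slot a double comma creates


def parse_bad_pattern(line: str) -> Tuple[str, List[Tuple[str, str]], Optional[int]]:
    header, rest = line.split("=", 1)          # "BAD NAME PATTERN", "... | ..."
    kind = header.replace("BAD", "").replace("PATTERN", "").strip()
    attribute_half, value_half = rest.split("|", 1)

    attributes = [t.strip() for t in attribute_half.split(",")]
    if attributes and attributes[-1] == "":
        attributes.pop()                       # slot after the trailing comma

    values = [t.strip() for t in value_half.split(",")]
    record = None
    if values and values[-1].startswith("REC="):
        record = int(values.pop()[len("REC="):])
    if values and values[-1] == "":
        values.pop()                           # trailing comma, if any

    pairs = [(a or COMMA_TOKEN, v or COMMA_TOKEN) for a, v in zip(attributes, values)]
    return kind, pairs, record
```

For example, the Susan Murphy line (REC=16) yields five pairs, the third of which is the comma token, so the comma in the input can be seen next to the attribute slot it occupies.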

City problems occur when a postal code/state/city combination is not found in the CDP geographic directories, usually because of misspellings or mismatches. The user fine-tunes the CDP for these cases by adding or correcting entries in the USERDEF table through Table Maintenance.


Parser Scrub Report

*********************************************************************
TRILLIUM SOFTWARE SYSTEM                        Fri Mar 14 12:54:58 2005
PARSER SCRUB REVIEW
*********************************************************************
T ORIGINAL INPUT              T NORMALIZED LINES            PATTERNS
N MS. LANE DAVEY              N MS LANE DAVEY               054,051,053
I PRESIDENT                   I PRESIDENT                   178
N LANE DAVEY AND ASSOCIATES   N LANE DAVEY AND ASSOCIATES   059,059,059,059
S 200 ELLERY STREET           S 200 ELLERY ST               114,116,100
G NORTHAMPTON, MA 01060       G NORTHAMPTON MA 01060        150,151,157

COMPREHENSION = 09, CONFIDENCE = 08, NUMBER OF NAMES = 02
NAME TYPE = M
GLOBAL REVIEW CODES =090
NAME 1 REVIEW CODES =019
------
STREET INFORMATION
HSE#  T  TTL     TYPE1   TYPE2  DIR1  DIR2
200   N  ELLERY  ST
200      ELLERY  STREET
------
MISC DATA LINE 2 =PRESIDENT
GEOGRAPHY INFORMATION
CITY NME     DIR CITY NME  ST/PRV/CTY  POST CODE  T  DIR POST CODE  T  NM
NORTHAMPTON  MA  01060  U  01060  8  25
NORTHAMPTON  MA
------
NAME INFORMATION
N  PREFIX  FIRST NME  MID NME 1  MID NME 2  MID NME 3  LAST NME  SUFFIX  GENER  G  CNTR  RELATION  F


**********************************************************************
TRILLIUM SOFTWARE SYSTEM                        Fri Mar 14 12:55:36 2004
TRILLIUM PARSER SCRUB REVIEW
**********************************************************************
T ORIGINAL INPUT                        T NORMALIZED LINES                           PATTERNS
N MR. JACK O'CONNOR AND MRS. JOHN O'C   N MR JACK O'CONNOR AND MRS JOHN O'CONNOR O   054,051,053,060,054,051
S FORT TERRACE DRIVE                    S FORT TERRACE DR                            116,100
G NORTHAMPTON, MA 01060                 G NORTHAMPTON MA 01060                       150,151,157

COMPREHENSION = 00, CONFIDENCE = 00, NUMBER OF NAMES = 01
NAME TYPE = R
MISC LINE 1 REVIEW CODES =001
------
STREET INFORMATION
HSE#  T  TTL           TYPE1  TYPE2  DIR1  DIR2
         FORT TERRACE  DR
         FORT TERRACE  DRIVE
------
GEOGRAPHY INFORMATION
CITY NME     DIR CITY NME  ST/PRV/CTY  POST CODE  T  DIR POST CODE  T  NM
NORTHAMPTON  MA  01060  U  01060  8  25
NORTHAMPTON  MA
------
NAME INFORMATION
N  PREFIX  FIRST NME  MID NME 1  MID NME 2  MID NME 3  LAST NME  SUFFIX  GENER  G  CNTR  RELATION  F
01 MR JACK O'CONNOR AND MRS JOHN O'CONNOR OR TERESA 3
   MR JACK O'CONNOR AND MRS JOHN O'CONNOR OR TERESA

Properly parsed name token information appears under the appropriate column in the NAME INFORMATION section. In the first report, Ms. Lane Davey is an example of a name that was parsed successfully. In the second report, however, the name received on input did not parse correctly (bad pattern of name components), so it is displayed left-justified under the NAME INFORMATION section.

Customer Data Parser Display (Scrub) Report File

The display of input against output is known as the Display Report (or Scrub Report). It is produced according to the values of the DISP_CONFIDENCE keywords in the Parameter File and shows the parsing results determined by the system. The report format is as follows:

Section 1: Record Information

Line Pattern

Original Input Lines

Reconstructed Input Lines

Token Pattern Information (indicates how the CDP coded the input data; see the Utilities and Table Maintenance manual for further information on token and attribute coding)

Parsing Comprehension (indicates how well the program understood the address)

Parsing Confidence (indicates the acceptance level of the complete name and address)

Number of Names

Name Type (for each line)

Review Codes

In Sections 2, 3, and 4, each field is displayed on two lines: the standardized value appears on the top line and the display value on the bottom line.


Section 2: Street Line Information

House Number

House Number Type

Street Title

Primary and Secondary Street Type

Primary and Secondary Street Direction

Section 3: Additional Street Line Information

Route Name, Number, Type

Box Name, Number, Type

Dwelling 1 Name, Number, Type

Dwelling 2 Name, Number, Type

Complex Name, Type

Section 4: Geography Line Information

City Name

City Name from City Directory

State/Province/County Name

Postal Code, Type

Postal Code, Type from City Directory

State Number

Section 5: Name Line Information (repeated for as many name lines as exist in the INA)

Name Number

Prefix

First Name

Middle Name 1, 2, and 3


Last Name

Suffix

Generation

Gender

Connector

Relationship

Name Form


Detail File

The Detail File contains the INA data, the recoded data, the comprehension and confidence levels, the line pattern, and details on the line pattern. Each input name/address is given one page of detail. This sample shows the same two records as displayed for the Log File and Display Report. The Detail File is requested with the DETAIL_DISPLAY and PRIMARY_DETFNAME parameters.

Sample Detail File

1) MS. LANE DAVEY
2) PRESIDENT
3) LANE DAVEY AND ASSOCIATES
4) 200 ELLERY STREET
5) NORTHAMPTON, MA 01060
6)
7)
****************************************************************************
* RECODED RECORD
****************************************************************************
1) <<< MS >>> <<< LANE >>> <<< DAVEY >>>
2) <<< PRESIDENT >>>
3) <<< LANE >>> <<< DAVEY >>> <<< AND >>> <<< ASSOCIATES >>>
4) <<< 200 >>> <<< ELLERY >>> <<< ST >>>
5) <<< NORTHAMPTON >>> <<< MA >>> <<< 01060 >>>
COMPREHENSION = 9
CONFIDENCE LEVEL = 8
LINE PATTERN = NINSG
-->LINE (1) IS A NAME LINE DUE TO RULE 6) LINE HAS VALID BEGINNING/ENDING ATTRIBUTES FOR THIS LINE TYPE
-->LINE (2) IS IGNORE LINE DUE TO RULE 3) LINE HAS ATTRIBUTES OF ONLY ONE L TYPE
PATTERN =IGNORE,
-->LINE (3) IS A NAME LINE DUE TO RULE 6) LINE HAS VALID BEGINNING/ENDING ATTRIBUTES FOR THIS LINE TYPE
PATTERN =BUSINESS, BUSINESS, BUSINESS, BUSINESS,
-->LINE (4) IS A STREET LINE DUE TO RULE 6) LINE HAS VALID BEGINNING/ENDI ATTRIBUTES FOR THIS LINE TYPE
PATTERN =HSNO, STREET-NAME, TYPE,
-->LINE (5) IS A GEOG LINE DUE TO RULE 15) LAST LINE HAS GEOGRAPHY
PATTERN =CITY, STATE, ZIPCODE,


Using the Palog Analyzer

The Customer Data Parser is designed to be extremely flexible in order to handle variations in input name and address data. This flexibility is achieved by modifying the tables and directories that give the parsing software its understanding of name and address elements.

After executing the Parser, examine the Log File (palog.txt) and the Display Report. The run statistics, located at the end of the log file, give a good indication of the events that occurred during parsing. The invalid name pattern, street pattern, and city problem entries are recorded before the statistics.

Start by searching for common occurrences that can be repaired with minimal effort. Sorting the Log File groups identical entries together, which makes these common occurrences easy to identify.
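Sorting brings repeated entries together, and counting them shows which repairs pay off first. The Python sketch below counts identical bad-pattern and city-problem entries after stripping the REC= record number, then lists the most frequent first. The entry prefixes are taken from the sample sorted log; common_problems is a hypothetical helper, not a Trillium utility.

```python
# Sketch: count repeated problem entries in a CDP log, ignoring the REC= record
# number, so the most common problems can be fixed first. The entry prefixes
# come from the sample sorted log; common_problems is a hypothetical helper.
import re
from collections import Counter
from typing import Iterable, List, Tuple

PROBLEM_PREFIXES = ("BAD NAME PATTERN=", "BAD STREET PATTERN=", "CITY PROBLEM=")
REC_SUFFIX = re.compile(r",?\s*REC=\d+\s*$")


def common_problems(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (entry, count) pairs, most frequent first."""
    counts: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if line.startswith(PROBLEM_PREFIXES):
            counts[REC_SUFFIX.sub("", line)] += 1
    return counts.most_common()


# Typical use (palog.txt is the Log File named in this guide):
# with open("palog.txt") as log:
#     for entry, n in common_problems(log):
#         print(n, entry)
```

Run against the six CITY PROBLEM entries in the sample sorted log, it reports MACONCERD (4 records) ahead of MACARLYSLE (2 records), so the MACONCERD correction would be the first one to make.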

After reviewing the Log file for common errors, the user may make additions and corrections to the user-defined table that is input to Table Maintenance.

The Palog Analyzer (located inside the Parser Tuner tool) is an interactive tool that greatly assists with the analysis and correction of pattern problems reported in the Parser log files. It makes it easy to take the proper corrective action by helping you create the appropriate client word/pattern table entry.

See the Parser Tuner section of the Control Center manual for more information.


Corrective Action Examples

Name Pattern Addition

BAD NAME PATTERN=TITLE,FIRST,LAST,CONNECTOR,TITLE,FIRST,LAST, CONNECTOR,FIRST, |MR,JACK,O’CONNOR,AND,MRS,JOHN,O’CONNOR,OR,TERESA

Action: Add Name Pattern to User Table:

'TITLE FIRST LAST CONNECTOR TITLE FIRST LAST CONNECTOR FIRST' INSERT PATTERN NAME DEF RECODE='TITLE FIRST LAST CONNECTOR TITLE FIRST LAST CONNECTOR FIRST' EXPORT='TITLE(1)FIRST(1)LAST(1)CONNECTOR(2)TITLE(2)FIRST(2)LAST(2)

Street Pattern Addition

BAD STREET PATTERN=HSNO, DIRECTION, L-TYPE, DIRECTION, DIRECTION, | 14, WEST, AVENUE, EAST, EAST,

Action: Add Street Pattern to User Table:

'HSNO DIRECTION L-TYPE DIRECTION DIRECTION' INSERT PATTERN STREET DEF RECODE='HSNO STREET-NAME TYPE DIRECTION IGNORE'
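In both pattern examples above, the key of the new User Table entry is the attribute half of the BAD ... PATTERN line with the commas removed. The Python sketch below copies that key mechanically and wraps it in an INSERT PATTERN entry. The RECODE value still has to be chosen by the user, since it differs from the key in the street example; pattern_entry is a hypothetical helper, and because it drops double-comma slots it is not suitable for patterns that contain input commas.

```python
# Sketch: build a User Table pattern entry whose key is the attribute list from
# a BAD NAME/STREET PATTERN line (commas removed), as in the examples above.
# pattern_entry is hypothetical; RECODE must still be chosen by the user, and
# empty double-comma slots are simply dropped.


def pattern_entry(bad_line: str, recode: str) -> str:
    """Return "'<key>' INSERT PATTERN <NAME|STREET> DEF RECODE='<recode>'"."""
    header, rest = bad_line.split("=", 1)
    kind = "NAME" if header.strip().startswith("BAD NAME") else "STREET"
    attribute_half = rest.split("|", 1)[0]
    attributes = [t.strip() for t in attribute_half.split(",") if t.strip()]
    key = " ".join(attributes)
    return f"'{key}' INSERT PATTERN {kind} DEF RECODE='{recode}'"
```

The generated entry uses straight apostrophes, which is an assumption about what the table source expects; the typeset examples in this guide show curly quotes.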

U.S. City Problem Addition

CITY PROBLEM=02130MABACK BAY

Action: Add Local Name to User Table as a City-Change:

'MABACK BAY_' INSERT GEOG DEF ATT=CITY-CHANGE,RECODE='MABOSTON'
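A U.S. CITY PROBLEM value packs several items into one string. Judging from the examples in this section (02130MABACK BAY and MACONCERD), it consists of an optional 5-digit ZIP code, a 2-letter state code, and the city text; that reading is an assumption. The Python sketch below splits the value on that basis and builds a city-change entry in the same form as the Back Bay example. Both helpers are hypothetical, and because the meaning of the trailing underscore in the entry key is not explained here, the sketch simply copies it.

```python
# Sketch: split a U.S. CITY PROBLEM value, assumed to be an optional 5-digit
# ZIP code + 2-letter state code + city text, and build a city-change entry
# shaped like the Back Bay example. Both helpers are hypothetical.
import re
from typing import NamedTuple, Optional

CITY_PROBLEM = re.compile(r"^(?P<zip>\d{5})?(?P<state>[A-Z]{2})(?P<city>.+)$")


class CityProblem(NamedTuple):
    zip_code: Optional[str]
    state: str
    city: str


def split_city_problem(value: str) -> CityProblem:
    match = CITY_PROBLEM.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized CITY PROBLEM value: {value!r}")
    return CityProblem(match["zip"], match["state"], match["city"].strip())


def city_change_entry(problem: CityProblem, correct_city: str) -> str:
    # Key: state + local name + "_"; recode: state + correct city name.
    return (f"'{problem.state}{problem.city}_' INSERT GEOG DEF "
            f"ATT=CITY-CHANGE,RECODE='{problem.state}{correct_city}'")
```

With the Back Bay value and BOSTON as the correct city, the sketch reproduces the entry shown above (with straight apostrophes).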


Post Town Problem Addition

POST TOWN PROBLEM=GLOUCESTERSHIRE CHELTENHAN

Action: Add Post Town Name to User Table as a City-Change:

'GLOUCESTERSHIRE CHELTENHAN_' INSERT GEOG DEF ATT=CITY-CHANGE,RECODE='CHELTENHAM'

Word/Phrase Addition

FIRST, 1ALPHA, ALPHA, ALPHA, ALPHA | JOHN, C, NICOLI, SOFTWARE, DEVELOPER

Action: Add 'Software Developer' as a Descriptive Phrase/Title in User Table:

'SOFTWARE DEVELOPER' INSERT NAME DEF ATT=DESCRIPTIVE

Using the Parser Display Program

The Customer Data Parser Display Program can be used for tuning when the overall context of the name and address is required. By examining parsing results at the various confidence levels, one can make decisions about additions to the User Tables that affect line identification, as well as the parsing accuracy of a given line type.

See the section “Output Display Program (CFPRSDSP)” on page 4-117 for more information about this program.

Rerunning Table Maintenance After Tuning

After making appropriate changes to the User Tables, the user must execute Table Maintenance to produce a new set of encoded word/phrase and pattern Tables. The Customer Data Parser should also be re-executed in order to evaluate results. This process can be repeated as many times as necessary.


Review Codes and Review Groups

Review codes are generated by the Parser to identify specific conditions occurring in each record being parsed. When a record receives a review code, a review group is also written to a 3-byte field called pr_rev_group in the PREPOS.

For multiple review codes, the review group is determined by a default hierarchy table (see the table “Review Group Hierarchy” on page 4-113).

To change the review group order, use the REVIEW_GROUP_ORDER parameter to specify the review group hierarchy.

The review codes are written to the CDP Repository Output Record (PREPOS) in three character pairs in the following fields:

pr_name_review_codes

pr_street_review_codes

pr_geog_review_codes

pr_misc_review_codes

pr_global_review_codes

The table below lists the review codes, review groups, and descriptions.

Table 4.5 Review Codes and Review Groups

Review Code Review Group Description

Review Codes Can Belong To Multiple Review Code Fields:

000 000 No review code found

Name Codes

001 008 Unknown name pattern

002 009 Standardized first name too long

003 009 Display first name too long

004 009 Total number of export names gt max

005 009 Standardized middle name too long

006 009 Display middle name too long

007 009 Too many middle names

008 009 Standardized last name too long

009 009 Display last name too long

010 009 Standardized title too long

011 009 Display title too long

012 009 Too many titles

013 009 Standardized connector too long

014 009 Display connector too long

015 009 Standardized relation too long

016 009 Display relation too long

017 009 Standardized business too long

018 009 Display business too long

019 009 Derived genders conflict

020 009 Standardized generation too long

021 009 Display generation too long

022 010 More than one middle name

Street Codes

026 011 Unknown street pattern

027 011 Standardized street type too long

028 011 Display street type too long

029 011 Too many street types

030 012 Standardized direction too long

031 012 Display direction too long

032 012 Too many directions

033 013 Standardized street title too long

034 013 Display street title too long

035 013 Standardized complex name too long

036 013 Display complex name too long

037 013 Standardized house number too long

038 013 Display house number too long

039 013 Unusual house number

040 013 Display dwelling too long

041 013 Standardized dwelling too long

042 013 Too many dwellings

043 013 Unusual dwelling value

044 013 Too many dwelling values

045 013 Display box too long

046 013 Standardized box too long

047 013 Unusual box value

048 013 Display route too long

049 013 Standardized route too long

050 013 Standardized route number too long

051 013 Display route number too long

052 013 Unusual route value

053 013 Standardized complex type too long

054 013 Display complex type too long

055 013 Standardized dwelling number too long

056 013 Standardized box number too long

057 013 Display box number too long

058 013 Display dwelling number too long

059 020 Duplicate street line types

Geography Codes

061 014 No city name found in records

062 014 No state found in records

063 014 Standardized city too long

064 014 Display city too long

066 015 Standardized state/province/county too long

067 015 Display state/province/county too long

070 015 Standardized country too long

071 015 Display country too long

072 015 Standardized neighborhood too long

073 015 Display neighborhood too long

074 015 Standardized post code too long

075 015 Display post code too long

076 015 Unusual post code value

077 016 Corrected city name too long

078 000 City name change used for city

079 017 Conflicting geographic types

080 018 Domestic city name present but could not be verified

Global Review Codes

082 001 Unidentified pattern

083 001 Unidentified token

084 019 Unidentified line

085 001 Invalid token definitions

086 001 Label or label element too long

087 001 Miscellaneous data for line too long

088 001 Too many categories

089 001 Too many names for export

090 002 Mixed name forms present

091 003 Hold mail element present

092 004 Foreign address element found

093 005 No names identified

094 006 No street identified

095 007 No geography identified

096 - 099 Currently unassigned


Review Group Hierarchy

The table below displays the default review group hierarchy. The review group code is placed in the PREPOS field, pr_rev_group.

The parameter, REVIEW_GROUP_ORDER, may be used to modify the group hierarchy.

Table 4.6 Review Group Hierarchy

Review Group | Text of Parser Report | Description

001 | Unidentified token |

005 | No names identified | No name found on the record. For example: 12 main street Boston MA 01123

006 | No street identified | No street information found on the record. For example: John Smith Boston MA 01123

007 | No geography identified | No geography information found on the record. For example: John Smith 12 main street

014 | No city or county identified | The record did not contain a city or county, or the city or county could not be identified.

019 | Unidentified line | The line type could not be determined and is set to ?.

008 | Unknown name pattern | The pattern for the name format does not exist in the table. For example: John Smith B A C D 12 main street Boston MA 01123

011 | Unknown street pattern | The pattern for the street format does not exist in the table.

013 | Unusual or long address | The length of the street name exceeds 25 bytes, as defined in prepos.ddl.

012 | Invalid directional | The direction is inconsistent.

017 | Conflicting geography | The country default is US, and a valid city and state are followed by a foreign-type postal code. For example: Mr John Smith 12 main street Boston MA A1C 3R4

015 | Geography too long | The length of the geography exceeds 30 bytes, as defined in prepos.ddl.

018 | Unable to verify city name | The city name cannot be identified.

016 | Corrected city name too long | The table entry for a city change recode exceeds 25 bytes, as defined in prepos.ddl.

020 | Multiple street line types | More than one street line was found on the record.

010 | More than one middle name | Two or more middle names were found on the name line. For example: John Adam Wilson Smith 12 main street Boston MA 01123

009 | Derived genders conflict | The gender values derived from the title and the first name are different. For example: Miss John Smith 12 main street Boston MA 01123

004 | Foreign address | The Parser found a geography element from outside the country for which it is running. For example: John Smith 12 main street Boston France 01123

003 | Hold mail | One of the lines on the record is of type H (such as Return Mail). For example: John Smith Return Mail 12 main street Boston MA 01123

002 | Mixed name forms | A business name and a personal name were found on the record. For example: John Smith ABC corp 12 main street Boston MA 01123

000 | No review code found | No identifiable error was found on the record. For example: John Smith 12 main street Boston MA 01123
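The Python sketch below shows how a single review group can be chosen for a record that carries several review codes. CODE_TO_GROUP is an excerpt of Table 4.5, and DEFAULT_ORDER lists Table 4.6 from top to bottom; treating earlier rows as higher priority, and assuming that a review-code field holds consecutive 3-character codes, are assumptions made for illustration rather than documented behavior.

```python
# Sketch: pick one review group for a record that carries several review codes.
# CODE_TO_GROUP is an excerpt of Table 4.5; DEFAULT_ORDER lists Table 4.6 from
# top to bottom. Treating earlier rows as higher priority, and the packing of
# codes as consecutive 3-character strings, are assumptions for illustration.
from typing import Iterable, List, Sequence

CODE_TO_GROUP = {
    "001": "008", "012": "009", "016": "009", "019": "009", "022": "010",
    "026": "011", "059": "020", "078": "000", "080": "018", "090": "002",
    "093": "005", "094": "006", "095": "007",
}

DEFAULT_ORDER = (
    "001", "005", "006", "007", "014", "019", "008", "011", "013", "012",
    "017", "015", "018", "016", "020", "010", "009", "004", "003", "002", "000",
)


def split_codes(field: str) -> List[str]:
    """Cut a review-code field into 3-character codes, skipping blanks and 000."""
    codes = [field[i:i + 3] for i in range(0, len(field), 3)]
    return [code for code in codes if code.strip() and code != "000"]


def review_group(codes: Iterable[str], order: Sequence[str] = DEFAULT_ORDER) -> str:
    """Return the highest-priority group among the codes' groups (000 if none)."""
    groups = {CODE_TO_GROUP[code] for code in codes if code in CODE_TO_GROUP}
    for group in order:
        if group in groups:
            return group
    return "000"
```

For the Lane Davey record in the Parser Scrub Report (global code 090, name code 019), the sketch returns group 009, because Derived genders conflict sits above Mixed name forms in Table 4.6. Passing a different `order` plays the role of REVIEW_GROUP_ORDER; the actual syntax of that parameter is not shown here.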


Statistics Report

Review codes are generated by the CDP to identify specific conditions occurring in each record being parsed. When a record receives a review code, a review group is also written to a 3-byte field called PR_REV_GROUP in the PREPOS.

The Parsing Statistics Report (pastat) is generated by the CDP and summarizes the number and percentage of records distributed over each review group. A brief description of each review group also appears.

Review Groups  # of Records  %      Descriptions
______
 0             945           94.5%  No Targeted Conditions Found
 1               0            0.0%  Unidentified Item
 2              22            2.2%  Mixed Name Forms
 3               0            0.0%  Hold Mail
 4               0            0.0%  Foreign Address
 5               0            0.0%  No Names Identified
 6               0            0.0%  No Street Identified
 7               2            0.2%  No Geography Identified
 8               4            0.4%  Unknown Name Pattern
 9               8            0.8%  Derived Genders Conflict
10              11            1.1%  More Than One Middle Name
11               1            0.1%  Unknown Street Pattern
12               0            0.0%  Invalid Directional
13               0            0.0%  Unusual or Long Address
14               3            0.3%  No City or County Identified
15               0            0.0%  Geography Too Long
16               0            0.0%  Corrected City Name Too Long
17               0            0.0%  Conflicting Geography Types
18               1            0.1%  Unable to Verify City Name
19               0            0.0%  Unidentified Line
20               3            0.3%  Multiple Street Line Types

Figure 4.3 Sample Parsing Statistics Report
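Because pr_rev_group holds a single review group per record, the counts in Figure 4.3 add up to the total number of records (1,000 in this sample), and each percentage is simply the group's count divided by that total. The Python sketch below recomputes the % column from the nonzero rows of the figure; it illustrates the arithmetic and is not how pastat is produced.

```python
# Sketch: recompute the % column of the Parsing Statistics Report.
# Only nonzero rows of Figure 4.3 are listed; each record is assumed to carry
# exactly one review group (pr_rev_group), so the counts sum to the total.
from typing import Dict

FIGURE_4_3_COUNTS: Dict[int, int] = {
    0: 945, 2: 22, 7: 2, 8: 4, 9: 8, 10: 11, 11: 1, 14: 3, 18: 1, 20: 3,
}


def percentages(group_counts: Dict[int, int]) -> Dict[int, str]:
    """Format each group's share of all records with one decimal place."""
    total = sum(group_counts.values())
    return {group: f"{100 * n / total:.1f}%" for group, n in group_counts.items()}


report = percentages(FIGURE_4_3_COUNTS)
print(report[0], report[10])  # 94.5% 1.1%
```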


Output Display Program (CFPRSDSP)

CFPRSDSP is the program for reviewing Parser output. It runs as a separate step so that the input file can be sorted first if desired, usually on the parsout field pr_rev_group in the PREPOS layout.

Sorting Options

A sort utility from Optech Sort is included with the UNIX and PC versions of the Trillium Software System. You can use either Optech Sort or your own sort utility to sort your data.


Parser Display Program Parameters

Parameters that are REQUIRED are indicated by the shaded rows.

Name Values Description

CLIENT client name Client name to display on the report.

DDL_INP_FNAME DDL name Name of the input DDL file. One DDL is required to describe the input; this DDL is named in the parameter file. The program assumes standard names for the Parser-generated data in the DDL.

DDL_INP_RNAME DDL name Record name of the input DDL.

INP_DDNAME file name Name of the input file.

MAX_PR_REV_GROUP numeric Maximum number of records displayed per review group. If this parameter is used, the file must first be sorted with review group as the primary sort key.

MAXIN numeric Maximum number of records to read. If blank, all input records are read.

MAXOUT numeric Maximum number of records to write. If blank, all records are displayed.

MAXPAGES numeric Maximum number of pages to display.

NUMB_NAMES numeric Display only records that contain at least this many names.

NUMB_PRINT numeric Maximum number of names to display.

PREPOS_FORMAT_OPTION numeric 1 = PREPOS_FORMAT_OPTION is set in the Parser parameter file. If it is set there, it should also be set in pfprsdsp.par.

PRN_DDNAME file name Name of the report file.

TITLE title Title to display at top of the report.

TITLE2 title Secondary title line at the top of the report.

USER_FIELD1–5 field name(s) Field name(s) in the DDL. These parameters can be used to display the contents of any field in the DDL.
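The requirement attached to MAX_PR_REV_GROUP follows from how a per-group limit works on a stream of records: the limit can only be applied correctly when all records of a review group are adjacent. The Python sketch below emulates that behavior with records modeled as dictionaries keyed by pr_rev_group; it is an illustration under that assumption, not the CFPRSDSP implementation.

```python
# Sketch: emulate MAX_PR_REV_GROUP, showing why the input must be sorted with
# pr_rev_group as the primary key. groupby() only merges adjacent records, so
# without the sort a group could appear several times and exceed the limit.
# Records are modeled as dicts; this is not the CFPRSDSP implementation.
from itertools import groupby, islice
from operator import itemgetter
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]
by_group = itemgetter("pr_rev_group")


def limit_per_group(records: Iterable[Record], max_per_group: int) -> Iterator[Record]:
    """Yield at most max_per_group records from each review group."""
    ordered = sorted(records, key=by_group)  # stable: keeps input order within a group
    for _, group in groupby(ordered, key=by_group):
        yield from islice(group, max_per_group)
```

Here the sort happens inside the function; with CFPRSDSP the equivalent step is sorting the Parser output (for example with Optech Sort) before the display run.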


Customer Data Parser Display Report Description

Section Subsections

Heading Trillium Software System, time and date of printing, and page number; client name; descriptive title

Client Specific Data Additional client specific information can include any field described in the DDL (one field per line).

Record Information Line pattern: indicates how the Parser coded the input line

Original input lines: allows for up to 10 name/address lines

Normalized input lines: after parsing

Token pattern information: indicates how the Parser coded the original input data (see the Utilities and Table Maintenance Manual for further information on token and attribute coding)

Confidence: indicates the acceptance level of the complete name and address (0-10; 10 being the highest)

Comprehension: indicates how well the program understood the name and address (0-10; 10 is highest)

Number of names identified in the address

Name type: (for each name identified) personal=1, business=2, reject=3, mixed=4

Category: accumulated categories for the name/address assigned by the parsing tables

Review Code Data If the review code is non-zero, its explanation is listed here.

Each line (name, street, geography, and so on) has its own review codes, each of which belongs to a review group. Therefore, a name review code of 012 would be included in the totals for review group 009, as would a name review code of 016.

Street Data Note that many fields are presented twice (once showing recode values, once showing display values): House number; Street title; Primary street type; Secondary street type; Primary street direction; Secondary street direction

Additional Street Line Data Route name, number, type; Box name, number, type; Dwelling1 name, number, type; Dwelling2 name, number, type; Complex name, type

Geographic Data Country name; Neighborhood name; City name, city name from city directory; State/province/county name

Miscellaneous Geographic Data Postal code, type; State code; Postal code, other postal information

Name Data Number of names: derived from original input line; Name form: (P) Personal, (B) Business, or (R) Reject; Prefix: title; First name; Middle name 1, 2, 3; Last name; Suffix: secondary title; Generation: Jr., Sr., III, etc.; Gender: (M) male, (F) female, or (N) neutral; Connector: and, or, etc.; Relationship: Executor, Beneficiary, etc.; Candidate codes (for each name generated)


********************************************************************************************************
TRILLIUM SOFTWARE SYSTEM (Thu Mar 18 14:59:06 2004)
"CLIENT COMPANY NAME"
"DISPLAY OF PARSING RESULTS"
********************************************************************************************************
Client Specific Data
pr_rev_group = <000>
------
Original address lines          Normalized address lines          Patterns (reconstructed lines)
(N)                             (N)                               <051 052 053 >
(S) <605 KATHERINE LANE >       (S) <605 KATHERINE LN >           <114 116 100 >
(G)                             (G)                               <150 152 156 >

Confidence = 10, Comprehension = 10, Names = 01, Name type = 1, Category =
------
Review Code Data
No review codes found
------
Street Data
Number  Title      Pr type  Sc type  Pr dir  Sc dir
605     KATHERINE  LANE
605     KATHERINE  LANE
------
Geographic Data
Country name  Neighborhood name  City name  Number  State/province/county  Directory city name
WOODSTOCK  GA  WOODSTOCK  GA
------
Misc Geographic Data
Postal code  SC  O  Fincod
30188-3625   2      000000
------
Name Data
Nb  F  Prefix  First  Middle 1  Middle 2  Middle 3  Last  Suffix  Gener  G  Cnctr  Rel

Figure 4.4 Sample Customer Data Parser Display Report


********************************************************************************************************
TRILLIUM SOFTWARE SYSTEM (Thu Mar 18 14:59:06 2004)
"CLIENT COMPANY NAME"
"DISPLAY OF PARSING RESULTS"
********************************************************************************************************
Client Specific Data
pr_rev_group = <000>
------
Original address lines          Normalized address lines          Patterns (reconstructed lines)
(N)                             (N)                               <051 052 053 >
(S) <605 KATHERINE LANE >       (S) <605 KATHERINE LN >           <114 116 100 >
(G)                             (G)                               <150 152 156 >

Confidence = 10, Comprehension = 10, Names = 01, Name type = 1, Category =
------
Review Code Data
No review codes found
------
Street Data
Number  Title      Pr type  Sc type  Pr dir  Sc dir
605     KATHERINE  LANE
605     KATHERINE  LANE
------
Geographic Data
Country name  Neighborhood name  City name  Number  State/province/county  Directory city name
WOODSTOCK  GA  WOODSTOCK  GA
------
Misc Geographic Data
Postal code  SC  O  Fincod
30188-3625   2      000000
------
Name Data
Nb  F  Prefix  First  Middle 1  Middle 2  Middle 3  Last  Suffix  Gener  G  Cnctr  Rel

Figure 4.5 Sample Customer Data Parser Display Report (Continued)


Line Pattern Identification on the Display Report

This pattern indicates the general content and relative position of the original name and address lines. A blank space indicates a blank line on the original record. The pattern is used to invoke the processing routines specific to each line type.

Table 4.7 Line Pattern Identification

Pattern Definition

A Additional address data, (such as apartment information)

B Post office box line

E Email address

G Geography line (such as city, state, postal code)

H Hold line

I Ignore

M Miscellaneous line

N Name line

R Rural route line

S Street address

Y Miscellaneous line with care-of

Z Miscellaneous street line

? Unidentified line
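As a reading aid, the Python sketch below expands a line pattern such as NINSG (from the Sample Detail File) into the line types of Table 4.7, one character per input line, with a blank position standing for a blank input line. decode_line_pattern is a hypothetical helper, not part of CFPRSDSP.

```python
# Sketch: expand a line pattern such as "NINSG" into the line types of
# Table 4.7. A blank position stands for a blank input line.
# decode_line_pattern is a hypothetical helper.
from typing import List

LINE_TYPES = {
    "A": "Additional address data", "B": "Post office box line",
    "E": "Email address", "G": "Geography line", "H": "Hold line",
    "I": "Ignore", "M": "Miscellaneous line", "N": "Name line",
    "R": "Rural route line", "S": "Street address",
    "Y": "Miscellaneous line with care-of", "Z": "Miscellaneous street line",
    "?": "Unidentified line", " ": "Blank line",
}


def decode_line_pattern(pattern: str) -> List[str]:
    """Return one line-type description per character of the pattern."""
    return [LINE_TYPES.get(code, f"Unknown code {code!r}") for code in pattern]
```

For NINSG, the result matches the Detail File's own explanation: name, ignore, name, street, and geography lines.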


CFPRSDSP Program Error Messages

This table shows all error messages from the display program.

Table 4.8 CFPRSDSP Error Messages

Message Description

Parm Processing Error, status = 2 The parameter file for the Parser display program is present but incorrect.

Parm Processing Error, status = 3 The parameter echo file for the display program is present but incorrect.

Parm Processing Error, status = 4 The display program has processed the parameter file and encountered an error with a parameter entry.

Missing <-parmfile> parameter. The path and parameter file for the Parser display program is missing from the command line. Check the path and file name.

Must have a parm file. The parameter file for the customer data Parser display program is missing from the command line. Check the path and file name.

During dictionary open. The data dictionary defined in DDL_INP_FNAME cannot be opened.

Cannot access from DDL. The field being used is not defined on the DDL being used for this process.

during cmIniFCB for . The field being used is not defined on the DDL being used for this process. Check the DDL and the information for this field before running the process again.

Field not present on the DDL. The field being used in this operation is not defined on the DDL.

No original address lines. The ORG_RECORD field must be defined in both the User input record DDL and in the output record DDL.

No input file specified. An input data file has not been specified in the parameter INP_DDNAME.


Unable to open file . The Parser display program is unable to open the file specified in the parameter INP_DDNAME. Check the path and/or file name.

I/O CFPRSDSP ERROR during read operation. An I/O error occurred while trying to read the file. The file may be corrupt. Check the file and resubmit the operation.

Unable to get the value for . The value specified is not valid for this field. Check your DDL or the field positions and lengths used by the display program.

during cmLodFCB for . The field ORADDRLn being used in this operation is not defined on the DDL that is being used for this process.

Closing file . An error occurred while trying to close the specified file. Check that the file has not been corrupted and that sufficient space is available for the write operation.

Running the Parser Display Program on UNIX and 32-bit PC Platforms

The cfprsdsp program uses the following command syntax:

cfprsdsp -parmfile parm_file_name -parmecho echo_file_name

where:

cfprsdsp The Customer Data Parser display program name.

-parmfile Keyword indicating the parameter file follows.

parm_file_name The Parser display parameter file.

-parmecho Keyword indicating the parameter echo file follows.

echo_file_name Name of the parameter echo file, which lists any parameter processing errors.


IBM Mainframe Sample Parser JCL

Figure 4.6, “IBM Mainframe Sample Parser JCL,” describes the Job Control Language used to run the Parser Display Report:

//********************************************************************
//* SAMPLE JCL TO RUN PARSER DISPLAY REPORT (CFPRSDSP)
//********************************************************************
//CFPRSDSP EXEC PGM=CFPRSDSP,REGION=5500K,
//         PARM='/-PARMFILE PF -PARMECHO PE'
//STEPLIB  DD DSN=&BASEPREF.&TRILVER.LOADLIB,DISP=SHR
//         DD DSN=CEE.SCEERUN,DISP=SHR
//         DD DSN=CEE.SCEERUN2,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//TRILMSGS DD DUMMY
//PF       DD DISP=SHR,DSN=&PROJPREF.&TRILVER.USMLIB(PFPRSDSP)
//PE       DD SYSOUT=*
//HDTRPT   DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(REPORT)

Figure 4.6 IBM Mainframe Sample Parser JCL

CHAPTER 5 Business Data Parser

Your database contains mission-critical business data that must be identified and standardized; identifying this information is a critical component of a quality database. The Business Data Parser is designed to identify, verify, and standardize the components of free-form text that is NOT name and address related, using sophisticated pattern recognition technology.

The Business Data Parser is called as an external subroutine from a driver program that supplies it with the data; the Business Data Parser then returns the identified and standardized result. The driver may range from an interactive data entry system to a high-volume batch process.

Business data parsing is driven by business rules, which allows users to customize business data identification to their specific requirements. The business rule tables are designed to be easy to understand and update. The data returned by the Business Data Parser is comprehensive and applicable to a wide variety of uses.


The software is written in the ANSI standard C programming language and can be executed in numerous environments.

Business Data Parsing Logic Flow

Business data is passed from a driver program to the Business Data Parser in a work area defined by a parameter file.

Parsing is accomplished in four major steps:

1. Assigning all possible attributes to each word/phrase. There are up to 50 user-defined attributes.

The first step in business data parsing is to isolate all words/phrases in the work area. Words/phrases are assigned all possible meanings via a Word/Phrase table of business rules that is supplied by the BDP Table Maintenance process. Words/phrases not specified in the table are assigned an intrinsic attribute (alpha, numeric, etc.).

2. Assigning the line type. The default line type of “M” (Miscellaneous) is assigned to the line unless the line matches a pattern in the word/pattern table that has a defined line attribute.

3. Generating output: Business Data Parser Repository (BPREPOS)

The Business Data Parser passes a comprehensive data block, called the Business Data Parser Repository (BPREPOS), back to the driver program. It consists of fixed-field character data, including error codes, identification indicators, and token information.

4. Statistics

Several mechanisms, such as the BPREPOS Review Group and the Log and Display files, are used to evaluate and refine the results returned by the Business Data Parser.
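The four steps above can be sketched in a few lines. This is a simplified model under stated assumptions, not the parser's implementation:

- WORD_TABLE and PATTERN_TABLE are hypothetical in-memory stand-ins for the encoded BDTABDEF and BDTABPAT tables.
- Longest-match phrase lookup is an assumption about how multi-word entries such as ‘DELTA 88’ are found.
- The returned dictionary stands in for the fixed-field BPREPOS record.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical word/phrase table (step 1): phrase -> attribute.
WORD_TABLE = {
    ("1980",): "YEAR",
    ("88",): "YEAR",
    ("OLDS",): "MAKE",
    ("DELTA", "88"): "MODEL",
}
MAX_PHRASE = max(len(phrase) for phrase in WORD_TABLE)

# Hypothetical pattern table (step 2): attribute pattern -> line type,
# or None when the pattern defines no line attribute.
PATTERN_TABLE = {
    ("YEAR", "MAKE", "MODEL"): None,
}


def intrinsic_attribute(word: str) -> str:
    """Attribute for words not in the table (alpha, numeric, other)."""
    if word.isdigit():
        return "NUMERIC"
    if word.isalpha():
        return "ALPHA"
    return "OTHER"


def assign_attributes(words: list[str]) -> tuple[list[str], list[str]]:
    """Step 1: longest-match phrase lookup, else an intrinsic attribute."""
    attrs: list[str] = []
    values: list[str] = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_PHRASE, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in WORD_TABLE:
                attrs.append(WORD_TABLE[phrase])
                values.append(" ".join(phrase))
                i += n
                break
        else:  # no table entry starts at position i
            attrs.append(intrinsic_attribute(words[i]))
            values.append(words[i])
            i += 1
    return attrs, values


def parse_line(text: str, stats: Counter) -> dict:
    """Steps 1-4 for one line; returns a BPREPOS-like dictionary."""
    words = text.upper().replace(",", " ").split()
    attrs, values = assign_attributes(words)
    pattern = tuple(attrs)
    found = pattern in PATTERN_TABLE
    # Step 2: default line type "M" unless the pattern defines one.
    line_type = PATTERN_TABLE.get(pattern) or "M"
    # Step 4: run statistics.
    stats["records processed"] += 1
    if not found:
        stats["bad patterns"] += 1
    # Step 3: the result handed back to the driver.
    return {"attributes": attrs, "values": values,
            "line_type": line_type, "pattern_found": found}


stats = Counter()
print(parse_line("1980, OLDS, DELTA 88", stats))
```

An unmatched attribute sequence is only counted here. The real parser writes such lines to the log file as bad patterns.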


Business Data Parser Functions

The Business Data Parser has the following primary and secondary functions:

Primary Functions

Identify business data words/phrases in free form text, up to 1000 characters in length.

Interpret and validate patterns of the identified text.

Produce standardized and identified output in useful formats.

Secondary Functions

In order to accomplish its three primary functions, the Business Data Parser:

Uses customized user-defined attributes.

Offers flexibility through an externally edited set of tables for business rules.

Identifies words and phrases by their values or their masks.

Corrects misspellings and allows for word or phrase recodes via the external tables.

Allows for the categorizing of any unique words and phrases using user-defined condition text.

Identifies data for review by numerous review conditions.

Produces a standard output so that applications may easily choose needed data elements.

Displays a log file with results for the business rules tuning process.

Collects run statistics in order to identify development areas quickly.

Produces a log, containing problems that can be easily analyzed to aid in the refinement of the external word/phrase and pattern tables.


Business Data Parser Process Flow

Input and Output Resources

Business Parser Parameter File (pfbparse.par) This text file includes control logic for Parser processing. Table file names, locations, control data values, and processing controls may be altered within this file.

Driver Parameter File (pfbprdrv.par) Text file that contains the input and output file names, DDL file details, and other statistical output file names.

DDLs input.ddl, prepos.ddl, output.ddl

Input File The file with business data used as input to the Business Data Parser (usually convout, from the Converter).

Word/Phrase Table (BDTABDEF) The first of two encoded files produced by the Table Maintenance system. This table includes standard definitions and user-defined definitions for words/phrases used by the Business Data Parser.


Pattern Table (BDTABPAT) This is the second of the two encoded tables produced by the Table Maintenance system. This table contains standard patterns and user-defined patterns used by the Business Data Parser.

Log File (bpalog) A listing of primary Parser statistics, including record counts of review groups and counts of recognized errors in the data. Invalid patterns encountered in the business data are also recorded. This file itemizes the token attributes for each invalid pattern, so the user may compare it directly with the data.

This file is used to fine-tune the Business Data Parser input by adding or correcting entries through Business Data Table Maintenance in the word/pattern tables.

Detail File (bparsdet) An output file that consists of input data in the form of tokens and the rules used to arrive at the BPREPOS results.

Output File (bpaout) Output file whose shape is defined by the output DDL. BPREPOS data is copied to this file when fields match between the BPREPOS DDL and output DDL files.

The ORG_RECORD field is also copied from the input to the output. Only the data contained in the field named ORG_RECORD is copied from the input; input fields with other names that are not redefined by ORG_RECORD are not copied.

DDL Requirements

To run the Business Data Parser Driver program, you need three DDL files, one describing each of the following:

User input record (ORG_RECORD)

Business Data Parser return area (BPREPOS)

Output record

The user input DDL describes the input file. The output record consists of the desired fields from the BPREPOS and the entire original input record. The report DDL consists of the entire BPREPOS and original input record. All DDLs are named in the pfbprdrv.par parameter file.


ORG_RECORD Field

The field ORG_RECORD must be defined in both the user input record DDL and in the output record DDL. This is the name of the standard DDL record that holds the original input lines. This field defines which contiguous fields are to be copied from the input record to the output record. Typically, this field describes the entire input record.

Only ORG_RECORD is copied from input to output.

Other Special DDL Fields

The DDL describing the input record must use standard names as well to describe the fields in the input record. Additional fields that may be defined in the BPREPOS DDL will cause certain conversion action and population to take place.

The following optional fields, when used, should be defined in the output DDL only. Length of each field is 1000 bytes. The following table describes the results of including one or more fields in the BPARSOUT.DDL definition:

Include this field To copy

BA_LINE Label line

BA_MISC_DATA Miscellaneous data


DDL Specifics

The data dictionary is used to describe input and output data to the program using field names. Modifying input/output shapes does not require programming changes and can be accomplished by changing the DDLs (Data Dictionary Language files). DDLs must be used with the program.

See the “Data Dictionary Language” section of the Control Center manual (in the chapter with the DDL Editor tool) for further information on creating and using DDLs. The Business Data Parser module specifically needs to know the structure of the business components of a file. They are defined by specific field names. There are two methods of describing to cfgrsdrv the business data to be processed using DDLs:

The first uses the field name "business_data1" to describe the input business data.

The second way is to identify the business data in parts. Up to 7 parts per line may be specified. The parts are described by adding the letters "a" to "g" to the end of the field name. For example, if the line has three parts, the field names "business_data1a", "business_data1b", and "business_data1c" are used. The program builds the business data line from the named parts, with a space as the separator character.

This implementation causes a work area of up to 1000 bytes to be sent to the Business Data Parser.
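The line-building rule above can be illustrated with a short sketch. It assumes the input record is available as a dictionary of DDL field names to values. It also assumes that blanks are trimmed from each part and that empty parts are skipped; the manual states neither behavior, so both are assumptions. The function name build_business_line is hypothetical.

```python
from __future__ import annotations

WORK_AREA_LENGTH = 1000    # maximum work area sent to the Business Data Parser
PART_SUFFIXES = "abcdefg"  # up to 7 parts per line


def build_business_line(record: dict[str, str], line_no: int = 1) -> str:
    """Assemble business data line `line_no` from DDL-named fields.

    Uses business_data<n> if present; otherwise joins business_data<n>a..g
    with a single space. Trimming blanks and skipping empty parts are
    assumptions of this sketch.
    """
    base = f"business_data{line_no}"
    if base in record:
        line = record[base].strip()
    else:
        parts = (record.get(base + suffix, "").strip() for suffix in PART_SUFFIXES)
        line = " ".join(part for part in parts if part)
    return line[:WORK_AREA_LENGTH]


example = {
    "business_data1a": "1996      ",
    "business_data1b": "SATURN    ",
    "business_data1c": "SL2       ",
}
print(build_business_line(example))
```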

Driver Parameter File

Please note that required parameters appear in bold and shaded.

Parameter Name Values Description

DDL_INP_FNAME file name Name of the input DDL file

DDL_INP_RNAME DDL name Name of the input record specified in the DDL


DDL_OUT_FNAME file name Name of the output DDL file

DDL_OUT_RNAME DDL name Name of the output record specified in the DDL

DDL_PREPOS_FNAME file name Name of the DDL file for the BDP return record (BPREPOS)

DDL_PREPOS_RNAME DDL name Name of the BDP return record specified in the DDL

INP_DDNAME file name Name of the input file

MAXIN numeric Maximum number of records read. If blank, all records are read.

MAXOUT numeric Maximum number of records written. If blank, all records are written.

OUT_DDNAME file name Name of the output file

PA_PARMNAME parameter file name Parameter file for the BDP (usually pfbparse.par)

PRINT_NTH_COUNT numeric Prints a running count every nth record read. If 0 or not specified, no counts are reported.

START numeric Start processing at this record (1-based)

STAT_FNAME file name Name of the statistics file. If not specified, the statistics are displayed on the screen.


Sample Driver Parameter File

A sample parameter file to run the cfgrsdrv program appears below:

************************************************************
* PFBPRDRV.PAR - Business Data Parser driver parameter file
************************************************************
MAXIN 10000
INP_DDNAME ..\data\binput
OUT_DDNAME ..\data\bpaout
PA_PARMNAME ..\parms\pfbparse.par
STAT_FNAME ..\data\bpastat.txt
DDL_INP_FNAME ..\dict\binput.ddl
DDL_PREPOS_FNAME ..\dict\bprepos.ddl
DDL_OUT_FNAME ..\dict\bparsout.ddl
DDL_INP_RNAME BINPUT
DDL_PREPOS_RNAME BPREPOS
DDL_OUT_RNAME BPARSOUT
PRINT_NTH_COUNT 100

The last line of all Parameter Files and Word/Pattern tables MUST contain a carriage return and/or line feed in order for the system to process the last Parameter/table entry. Do not use tabs; only spaces are valid.
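Before a run, you can check a driver parameter file against the rules above with a sketch like the following. It assumes the format shown in the sample: a keyword, one or more spaces, then the value, with an asterisk in column 1 marking a comment. The function names are hypothetical. Raising an error for a missing final line terminator or for tabs is a design choice of this sketch, mirroring the note above; the driver itself may behave differently.

```python
from __future__ import annotations


def parse_driver_parms(text: str) -> dict[str, str]:
    """Parse a driver parameter file such as PFBPRDRV.PAR.

    Each entry is KEYWORD, one or more spaces, then the value; lines that
    start with '*' are comments. Tabs are rejected because only spaces are
    valid, and the text must end with a line terminator.
    """
    if text and not text.endswith(("\n", "\r")):
        raise ValueError("last line must end with a carriage return or line feed")
    parms: dict[str, str] = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        if "\t" in raw:
            raise ValueError(f"line {number}: tabs are not allowed")
        line = raw.rstrip()
        if not line or line.startswith("*"):
            continue
        keyword, _, value = line.partition(" ")
        parms[keyword.upper()] = value.strip()
    return parms


def record_limit(parms: dict[str, str], keyword: str) -> int | None:
    """MAXIN/MAXOUT: a blank or missing value means 'all records' (None)."""
    value = parms.get(keyword, "")
    return int(value) if value else None


SAMPLE = (
    "* PFBPRDRV.PAR - Business Data Parser driver parameter file\n"
    "MAXIN 10000\n"
    "INP_DDNAME ..\\data\\binput\n"
    "PA_PARMNAME ..\\parms\\pfbparse.par\n"
    "DDL_INP_RNAME BINPUT\n"
    "PRINT_NTH_COUNT 100\n"
)
parms = parse_driver_parms(SAMPLE)
print(parms["INP_DDNAME"], record_limit(parms, "MAXIN"), record_limit(parms, "MAXOUT"))
```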


Business Data Parser Parameter File

The parameter file allows the Business Data Parser to be controlled externally by naming each of the files to be used for controlling its behavior. This file also predefines the shape of the input line as a miscellaneous line starting in position 1 for a length of 999.

Syntax

[KEYWORD]=[PARM VALUE]

where:

[KEYWORD] Name of the parameter

[PARM VALUE] Value assigned to the parameter

Note that an asterisk in column 1 of a line indicates a comment.

Parameter File Descriptions

The following table lists parameters in the BDP parameter file. Please note that all required parameters appear in bold and shaded.

Table 5.1 Business Data Parser Parameters

Parameter Name Values Description

DETAIL_DISPLAY Y Indicates that a tokenized detail report is to be written to the PRIMARY_DETFNAME.


ISALFILE file name The file name (code page table) specified here determines if characters are alphabetic in a MASK setting. For example: Vehicle Code = 1A2B3C, MASK recognition = NANANA. Used with the ISNMFILE parameter. This parameter is used in conjunction with the MASK modifier in table maintenance. It is required for special characters found in many foreign languages.

ISNMFILE file name The file name (code page table) specified here determines if characters are numeric in a MASK setting. For example: Vehicle Code = 1A2B3C, MASK recognition = NANANA. Used with the ISALFILE parameter. This parameter is used in conjunction with the MASK modifier in table maintenance. It is required for special characters found in many foreign languages.

KEEP_DELIMITER special characters Normally, the special characters period (.), comma (,), and space ( ) are used as word delimiters. Use this parameter to specify which delimiters are to be kept for the label line. For example: KEEP_DELIMITER=/., ; The delimiter is kept at the end of the token when the token is looked up in the table, and the space and comma would show up in the label line. For example, in Fiona/ MacDonald, Fiona/ is looked up in the table.


KEEP_CHARACTER special characters Normally, special characters are removed and other characters are converted to uppercase. This parameter specifies characters that you do not want removed or converted to uppercase. For example, specifying KEEP_CHARACTER=[]; keeps the left and right brackets.

Special Character Inclusion and Deletion:

Characters automatically kept:
Numerics 0-9
Alphas A-Z, a-z
Hyphen -
Apostrophe ‘
Forward Slash /
Ampersand &
Plus Sign + (NOTE: + is recoded to &)
Double Quotes “ (NOTE: Double quotes are recoded to single quotes ‘)
Percent Sign %

Characters automatically removed:
Brackets [ ]
Redirection Symbols < >
Dollar Sign $
Pound Sign #
Equal Sign =
Back Slashes \
Caret Symbols ^
Asterisks *
Quotes “” (Recodes to single quotes.)

LEAVE_LABEL_LINE_SKIP_DELIMITER When constructing the label area, use the “skipped” delimiter between tokens.

LINE1 M, ? For example, LINE1=1 999 ?; indicates that miscellaneous data line 1 is located at position 1 for a length of 999 and is of unknown type. The maximum length of a line is 999.


NO_SPECIAL_CHARACTER_LOOKUP_SERVICE Y Controls the lookup of words ending with a hyphen (-) or slash (/). Normally, the word is separated from the special character prior to lookup in the table. For example, “INC-” is normally looked up as “INC” if no entry is found in the Word/Pattern Table for “INC-”. If “INC-” is found in the table, the (-) is removed. Y disables this process.

POPULATE_UNKNOWN_PATTERNS Y Populates user fields with known attributes, even in the event of a pattern failure.

PRIMARY_LOGFNAME file name Name of the Parser log file.

PRIMARY_DETFNAME file name Used with the DETAIL_DISPLAY parameter. Defines the file name the detail display report is to be written to.

PRIMARY_WORDFNAME file name Name of the word/pattern definitions file created during table maintenance. Normally named BDTABDEF.

PRIMARY_PATTERNFNAME file name Name of the word/pattern pattern file created during table maintenance, normally named BDTABPAT.

REVIEW_GROUP_ORDER review group codes Indicates the review group hierarchy. Change this parameter to change the review group hierarchy order. Review groups are three-digit numeric codes (all three digits are required). Left-most codes have priority over those to their right. If a review group is omitted, it is shut off. Any record not in a review group is coded as “000.” See page 5-26 for a list of the default review group hierarchy.


SKIP_DELIMITER special characters Normally, the special characters period (.), comma (,), and space ( ) are used as word delimiters. Use this parameter to specify other delimiters to replace these three characters. For example: SKIP_DELIMITER=/,. ; specifies that the slash is to be used as the delimiter instead of the period, comma, or space characters. The delimiter is not included when the token is searched for in the table. For example, in Fiona/ MacDonald, Fiona is looked up in the table.

TEXT_USER1 - TEXT_USER50 defined names User-defined names for user attributes 1 through 50

TRFILE file name Name of the optional translate tables. These tables correct the character set for processing. TRFILE is used when foreign characters need to be recognized (for example, characters with tildes, umlauts, and so on).

POPULATE_UNKNOWNS_TO_LABEL Y or blank When set to Y, the label line is populated with the complete input line, including unknown or undefined words/tokens. Defined words are standardized. If the parameter is left blank, the label line is populated ONLY with words/tokens that have been defined in the word/pattern table. These are standardized and appear in the same left-to-right order as in the input line.

ORIGINAL_MEANINGS_OPTION 1 Forces display fields to always be populated with original input data (no recodes, no synonyms).
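Two token-level behaviors in Table 5.1 are easier to see in code:

- SKIP_DELIMITER versus KEEP_DELIMITER: for Fiona/ MacDonald, the skip form looks up Fiona and the keep form looks up Fiona/.
- MASK recognition: 1A2B3C becomes NANANA.

The sketch below models only these documented examples. The is_alpha and is_numeric arguments are hypothetical stand-ins for the ISALFILE and ISNMFILE code page tables. Never attaching a space to a token, and leaving other characters unchanged in a mask, are assumptions.

```python
from __future__ import annotations

DEFAULT_DELIMITERS = frozenset({".", ",", " "})


def lookup_tokens(text: str,
                  delimiters: set[str] | frozenset[str] = DEFAULT_DELIMITERS,
                  keep: bool = False) -> list[str]:
    """Split text into the tokens that would be looked up in the table.

    keep=False models SKIP_DELIMITER (the delimiter is dropped before
    lookup); keep=True models KEEP_DELIMITER (a non-space delimiter stays
    at the end of the token). Spaces are never attached (assumption).
    """
    tokens: list[str] = []
    current = ""
    for ch in text:
        if ch in delimiters:
            if keep and ch != " " and current:
                current += ch
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def mask(token: str, is_alpha=str.isalpha, is_numeric=str.isdigit) -> str:
    """MASK recognition: N for numeric and A for alphabetic characters.

    is_alpha/is_numeric stand in for the ISALFILE/ISNMFILE code page
    tables; any other character is left unchanged (assumption).
    """
    return "".join(
        "N" if is_numeric(ch) else "A" if is_alpha(ch) else ch
        for ch in token
    )


delims = {"/", ",", ".", " "}
print(lookup_tokens("Fiona/ MacDonald", delims))             # SKIP_DELIMITER
print(lookup_tokens("Fiona/ MacDonald", delims, keep=True))  # KEEP_DELIMITER
print(mask("1A2B3C"))
```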


Sample Parameter File for the Business Data Parser

******************************************************************
* PFBPARSE.PAR - Business Data Parser parameter file *
******************************************************************
PRIMARY_LOGFNAME=BPALOG;
PRIMARY_DETFNAME=BPARSDET;
*DETAIL_DISPLAY=;
PRIMARY_WORDFNAME=BDTABDEF;
PRIMARY_PATTERNFNAME=BDTABPAT;
LINE1=1 999 M;
TEXT_USER1="MAKE";
TEXT_USER2="MODEL";
TEXT_USER3="YEAR";
POPULATE_UNKNOWN_PATTERNS=Y;
POPULATE_UNKNOWNS_TO_LABEL=;
KEEP_CHARACTER=.,$()&#;
SKIP_DELIMITER= ;
*KEEP_DELIMITER=;
*REVIEW_GROUP_ORDER=;
*LEAVE_LABEL_LINE_SKIP_DELIMITER=;
* ORIGINAL_MEANINGS_OPTION=;
* NO_SPECIAL_CHARACTER_LOOKUP_SERVICE=;
*TRFILE=;
*ISALFILE=;
*ISNMFILE=;

The last line of all Parameter Files and Word/Pattern tables MUST contain a carriage return and/or line feed in order for the system to process the last Parameter/table entry in the file. Do not use tabs; only spaces are valid.
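Unlike the driver file, the BDP parameter file uses KEYWORD=VALUE; entries. The sketch below reads such a file, following the syntax and the sample above. The value is kept verbatim between the equal sign and the closing semicolon, so that SKIP_DELIMITER= ; keeps its single-space value. Stripping surrounding double quotes (as in TEXT_USER1="MAKE";) is an assumption. The function names are hypothetical.

```python
from __future__ import annotations


def parse_bdp_parms(text: str) -> dict[str, str]:
    """Parse KEYWORD=VALUE; entries from a file such as PFBPARSE.PAR.

    Lines with '*' in column 1 are comments. The value is everything between
    '=' and the trailing ';', kept verbatim so that a single-space value
    (SKIP_DELIMITER= ;) survives. Surrounding double quotes are removed
    (assumption based on TEXT_USER1="MAKE";).
    """
    parms: dict[str, str] = {}
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("*"):
            continue
        keyword, sep, rest = raw.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {raw!r}")
        value = rest.rstrip()
        if value.endswith(";"):
            value = value[:-1]
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        parms[keyword.strip().upper()] = value
    return parms


def parse_line_spec(spec: str) -> tuple[int, int, str]:
    """LINE1=1 999 M; -> (start position, length, line type)."""
    start, length, line_type = spec.split()
    if int(length) > 999:
        raise ValueError("the maximum length of a line is 999")
    return int(start), int(length), line_type


SAMPLE = (
    "* PFBPARSE.PAR - Business Data Parser parameter file\n"
    "PRIMARY_LOGFNAME=BPALOG;\n"
    "LINE1=1 999 M;\n"
    'TEXT_USER1="MAKE";\n'
    "KEEP_CHARACTER=.,$()&#;\n"
    "SKIP_DELIMITER= ;\n"
    "*KEEP_DELIMITER=;\n"
)
parms = parse_bdp_parms(SAMPLE)
print(parms, parse_line_spec(parms["LINE1"]))
```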


Running the Business Data Parser on UNIX and 32-bit PC Platforms

To execute the cfgrsdrv program, use the following command-line syntax:

Syntax

cfgrsdrv -parmfile parm_file_name -parmecho echo_file_name

where:

cfgrsdrv Name of the Business Data Parser driver program

-parmfile Keyword that indicates the parameter file follows

parm_file_name Name of the driver parameter file

-parmecho Keyword that indicates the parameter echo file follows

echo_file_name Name of the parameter echo (program listing) file, which displays any parameter processing errors (optional)

Example:

cfgrsdrv -parmfile ..\parms\pfbprdrv.par -parmecho ..\data\echo
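To launch the driver from a script, you can build the same command line with Python's standard subprocess module, as sketched below. This assumes cfgrsdrv is on the PATH. The manual does not document the meaning of the driver's exit codes, so the sketch only returns the exit status. The function names are hypothetical.

```python
from __future__ import annotations

import subprocess


def build_cfgrsdrv_command(parm_file: str, echo_file: str | None = None,
                           program: str = "cfgrsdrv") -> list[str]:
    """Build the argument list for the documented command-line syntax."""
    command = [program, "-parmfile", parm_file]
    if echo_file is not None:  # -parmecho is optional
        command += ["-parmecho", echo_file]
    return command


def run_cfgrsdrv(parm_file: str, echo_file: str | None = None) -> int:
    """Run the driver and return its exit status.

    Assumes cfgrsdrv is on the PATH; the meaning of non-zero exit
    codes is not documented here.
    """
    completed = subprocess.run(build_cfgrsdrv_command(parm_file, echo_file))
    return completed.returncode


if __name__ == "__main__":
    print(build_cfgrsdrv_command(r"..\parms\pfbprdrv.par", r"..\data\echo"))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems with Windows-style paths such as ..\parms\pfbprdrv.par.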


IBM Mainframe Sample JCL

The following sample Job Control Language is used to run cfgrsdrv:

//********************************************************************
//* SAMPLE JCL TO RUN THE BUSINESS PARSER PROGRAM (CFGRSDRV)
//********************************************************************
//CFGRSDRV EXEC PGM=CFGRSDRV,REGION=5500K,
//         PARM='/-PARMFILE PF -PARMECHO PE'
//STEPLIB  DD DSN=&BASEPREF.&TRILVER.LOADLIB,DISP=SHR
//         DD DSN=CEE.SCEERUN,DISP=SHR
//         DD DSN=CEE.SCEERUN2,DISP=SHR
//BINPUT   DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(BINPUT)
//BPREPOS  DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(BPREPOS)
//BPARSOUT DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(BPARSOUT)
//SYSPRINT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//TRILMSGS DD DUMMY
//BPASTAT  DD SYSOUT=*
//CEEDUMP  DD DUMMY,DCB=BLKSIZE=133
//PE       DD SYSOUT=*
//* PFBRSDRV IS THE PARSER DRIVER PARM FILE
//PF       DD DISP=SHR,
//            DSN=&PROJPREF.&TRILVER.USMLIB(PFBRSDRV)
//* PFBPARSE IS THE PARSER PARM FILE
//PFBPARSE DD DISP=SHR,DSN=&PROJPREF.&TRILVER.USMLIB(PFBPARSE)
//* BINPUT IS THE INPUT FILE
//BINPUT   DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DATA.BINPUT
//* BPAOUT IS THE OUTPUT FILE (BPARSOUT)
//BPAOUT   DD UNIT=&UNIT,DISP=(NEW,CATLG,DELETE),
//            DCB=(RECFM=FB,LRECL=4339,BLKSIZE=21144),
//            SPACE=(TRK,(10,50),RLSE),
//            DSN=&PROJPREF.&TRILVER.US.DATA.BPAOUT
//* BPALOG IS THE PARSER LOG FILE
//BPALOG   DD UNIT=&UNIT,DISP=(NEW,CATLG,DELETE),
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=23200),
//            SPACE=(TRK,(350,50),RLSE),
//            DSN=&PROJPREF.&TRILVER.US.DATA.BPALOG
//TABLEDEF DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.TABLES.BDTABDEF
//TABLEPAT DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.TABLES.BDTABPAT

Figure 5.1 IBM Mainframe Sample JCL for cfgrsdrv


Business Data Parser Output

The following sections describe the output records and files returned by the Business Data Parser. The first section discusses the Business Data Parser repository record, or BPREPOS, which details the data returned from parsing. Samples and descriptions of the following Business Data Parser output files are also included:

Business Data Parser Log file (bpalog)

Business Data Parser Detail file (bparsdet)

Business Data Parser Statistics file (bpastat.txt)

Business Data Parser Repository Record (BPREPOS)

This output record consists of data in two formats: display and non-display. Non-display fields are constructed with matching purposes in mind. Values are assigned to these fields using the Table Maintenance definition of the element, for example:

‘CHEVROLET’ INSERT MISC DEF ATT=MAKE
‘CHEVY’ INSERT MISC DEF ATT=MAKE,RECODE=‘CHEVROLET’

Non-Display Field Value Definitions

Non-Display fields contain any recode values. For a word or phrase:

If Then

RECODE value exists Use the recode value. For example, if the input record = ‘Chevy’, the output is ‘Chevrolet’.

SYNONYM value exists Use the original word/phrase value.

Display Field Value Definitions

Display fields contain the original input data.

If Then

SYNONYM value exists Use the SYNONYM value.

Otherwise Use the original word/phrase value. For example, if the input record = ‘Chevy’, the output is ‘Chevy’.

The BPREPOS is returned to the calling program in the following layout. Note that although the three user fields are listed only once, they are repeated fifty times.
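The display and non-display rules above can be summarized in a small function. WordDefinition is a hypothetical, simplified stand-in for a Table Maintenance word/phrase entry. The original_meanings_option flag models ORIGINAL_MEANINGS_OPTION=1 from Table 5.1, which forces display fields back to the original input. The sketch only applies the If/Then rules as written; it does not model how the parser stores the results.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WordDefinition:
    """Simplified stand-in for a Table Maintenance word/phrase entry."""
    attribute: str
    recode: str | None = None
    synonym: str | None = None


def field_values(original: str, definition: WordDefinition,
                 original_meanings_option: bool = False) -> tuple[str, str]:
    """Return (non-display value, display value) for one word or phrase."""
    # Non-display: use the RECODE value if one exists, else the original.
    non_display = definition.recode if definition.recode else original
    # Display: use the SYNONYM value if one exists, unless
    # ORIGINAL_MEANINGS_OPTION=1 forces the original input.
    if definition.synonym and not original_meanings_option:
        display = definition.synonym
    else:
        display = original
    return non_display, display


chevy = WordDefinition("MAKE", recode="CHEVROLET")
print(field_values("CHEVY", chevy))
```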

BPREPOS Fields

The following table provides the position, length, and description of the fields in the BPREPOS. Fields to be included in the BPARSOUT DDL are:

ba_return

ba_user1 through ba_user50 (as needed)

ba_user1_display through ba_user50_display (as needed)

ba_misc_data

ba_line

BDP Repository Output Record Format

Table 5.2 BD Parser Repository Output Record Format

Field Pos Len Description

ba_return 1 1 The Business Data Parser return codes.

ba_confidence 2 2 This field is reserved for future use.

ba_comprehension 4 2 This field is reserved for future use.

ba_orig_linepat 6 1 Original line pattern of input line

ba_line_rules 7 20 Line type identification rule

ba_global_review_codes 27 30 See “Business Data Parser Review Codes” on page 5-25.

ba_misc_review_codes 57 30 See “Business Data Parser Review Codes” on page 5-25.


ba_category 87 50 Field used to store category codes assigned to word definitions in the Word/Pattern tables.

ba_rev_group 137 3 0 = No Targeted Conditions Found; 1 = Unidentified Pattern; 2 = Miscellaneous Line Too Long; 3 = Label Line Too Long; 4 = Too Many Categories Found; 5 = Unknown Token; 6 = No Data Found

ba_filler_01 140 100 This field is reserved for future use.

ba_user1 240 100 User-defined business data field, using recoded word(s) (REPEAT 50 TIMES)

ba_user1_display 340 100 User-defined business data field, no recodes apply (REPEAT 50 TIMES)

ba_data_present_user1 10240 1 Y/N flag indicating presence of data in the ba_user1 field (REPEAT 50 TIMES)

ba_misc_data 10290 1000 Field to store any data not identified by pattern processing

ba_filler_02 11290 310 Reserved for future use.

ba_line 11600 1000 Label line standardized in original sequence.

ba_pattern 12600 300 Field used to store 100 three-character token identifiers used for debugging and tuning

ba_line_type 12900 1 Output line type of M or as defined by pattern processing

ba_filler_03 12901 100 Reserved for future use.
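The following sketch pulls fields out of a BPREPOS record using the 1-based positions and lengths in Table 5.2. The repetition scheme is inferred from the positions and should be verified against your own BPREPOS DDL:

- ba_user1_display starts at 340, directly after ba_user1, so each user/display pair is taken to occupy 200 bytes.
- The 50 one-byte ba_data_present_userN flags are taken to start at 10240, ending just before ba_misc_data at 10290.

The helper names are hypothetical, and filler fields are omitted.

```python
from __future__ import annotations

BPREPOS_LENGTH = 13000  # ba_filler_03 starts at 12901 and is 100 bytes long

# (1-based position, length) from Table 5.2; filler fields omitted.
LAYOUT: dict[str, tuple[int, int]] = {
    "ba_return": (1, 1),
    "ba_confidence": (2, 2),
    "ba_comprehension": (4, 2),
    "ba_orig_linepat": (6, 1),
    "ba_line_rules": (7, 20),
    "ba_global_review_codes": (27, 30),
    "ba_misc_review_codes": (57, 30),
    "ba_category": (87, 50),
    "ba_rev_group": (137, 3),
    "ba_misc_data": (10290, 1000),
    "ba_line": (11600, 1000),
    "ba_pattern": (12600, 300),
    "ba_line_type": (12900, 1),
}
# Inferred repetition: 50 ba_userN/ba_userN_display pairs of 200 bytes
# starting at 240, then 50 one-byte ba_data_present_userN flags at 10240.
for n in range(1, 51):
    start = 240 + (n - 1) * 200
    LAYOUT[f"ba_user{n}"] = (start, 100)
    LAYOUT[f"ba_user{n}_display"] = (start + 100, 100)
    LAYOUT[f"ba_data_present_user{n}"] = (10240 + n - 1, 1)


def get_field(record: str, name: str) -> str:
    """Extract a field from a BPREPOS record, trimming trailing blanks."""
    position, length = LAYOUT[name]
    return record[position - 1:position - 1 + length].rstrip()


def put_field(record: str, name: str, value: str) -> str:
    """Return a copy of `record` with `value` written (blank-padded) into a field."""
    position, length = LAYOUT[name]
    padded = value[:length].ljust(length)
    return record[:position - 1] + padded + record[position - 1 + length:]


record = " " * BPREPOS_LENGTH
record = put_field(record, "ba_user1", "SATURN")
record = put_field(record, "ba_data_present_user1", "Y")
print(get_field(record, "ba_user1"), get_field(record, "ba_data_present_user1"))
```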


Business Data Parser Log File

When the BDP encounters a bad pattern during processing, it writes an entry to the Business Data Parser Log File (bpalog) for the purpose of reviewing and improving the parsing process. This file also lists statistics collected during the parsing process that are helpful in assessing overall parsing results. The Business Data Parser Log File is usually sorted to identify common occurrences.

Sample Business Data Parser Log File (Sorted)

The following example shows a sorted Log File (bpalog.srt). The first section of this file lists statistics gathered during parsing. Bad patterns encountered in the business data are displayed at the end of the file.

BUSINESS DATA PARSER PRIMARY PARSING STATISTICS
Trillium Software System(R) BUSINESS DATA PARSER

0 NUMBER OF RECORDS CONTAINING ALL BLANK DATA
0 NUMBER OF RECORDS CONTAINING AN OTHER LINE
0 NUMBER OF RECORDS CONTAINING AN UNKNOWN LINE
4 NUMBER OF RECORDS CONTAINING MISC REVIEW CODES
58 NUMBER OF RECORDS CONTAINING A MISC LINE
58 NUMBER OF RECORDS PROCESSED

BAD MISC PATTERN=OTHER-SPECIAL, YEAR , MAKE , MODEL , ALPHA, ALPHA, ALPHA, ALPHA, HYPHEN, ALPHA, ALPHA, ALPHA, ALPHA, ALPHA, ALPHA, | LOAN#345678912, 1996, SATURN, SL2, DENT, IN, LEFT, FRONT, -, CURRENT, INSURANCE, CLAIM, COMMERCE, INSURANCE, CO, REC=3

BAD MISC PATTERN=OTHER-SPECIAL, YEAR , MAKE , MODEL , MODEL , ALPHA-SPECIAL, NUMERIC-SPECIAL, | LOAN#234567891, 2001, DODGE, RAM 1500, QUAD-CAB, PAID-IN-FULL, 3/31/02, REC=2

BAD MISC PATTERN=YEAR , MAKE , MODEL , YEAR , | 1989, OLDS, REGENCY, 98, REC=46

BAD MISC PATTERN=YEAR , OTHER-SPECIAL, MAKE , ALPHA, NUMERIC-


Bad Patterns

The bad patterns encountered in the business data appear at the end of the sorted Business Data Parser Log File in the above example. This section itemizes the token attributes for each unknown pattern.

The Business Data Parser separates the tokens with a comma (such as YEAR , MAKE). Tokens in the pattern prior to the vertical bar ( | ) are the attributes assigned by the Business Data Parser and tokens after the vertical bar are the actual values. With this, the user may compare the pattern directly with the actual data, which is included on the same Bad Pattern line.

Also shown is the record number (such as REC=1) which identifies the record where the error occurred in the original input file. If a comma is included on an input line in the original file, this is indicated in this section by a double comma (such as YEAR ,, MAKE).

The fine-tuning of the Business Data Parser is achieved by the user adding or correcting entries in the standard word/pattern table (BDWDPAT) or in the client word/pattern table (CDWDPAT) through Table Maintenance.

Using the Palog Analyzer

The Business Data Parser is designed to be extremely flexible in order to handle the variations of input data. This flexibility is achieved through the modification of tables used to give the parsing software its understanding of the data elements.

After executing the Business Data Parser, examine the Log File (bpalog.txt). The run statistics are located at the end of the log file and give a good indication of what happened during parsing; the invalid patterns are recorded before the statistics. Sort the Log File to identify common occurrences, and first attempt to repair those that can be fixed with minimal effort.
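The sorting and counting step can be automated with a sketch like the one below. It follows the Bad Patterns format described earlier in this chapter: token attributes before the vertical bar, actual values after it, and an optional REC= record number at the end. Empty items produced by a double comma are simply dropped, which is an assumption. The function names are hypothetical. The Palog Analyzer described below remains the supported tool for this analysis.

```python
from __future__ import annotations

from collections import Counter

PREFIX = "BAD MISC PATTERN="


def parse_bad_pattern(line: str) -> tuple[list[str], list[str], int | None] | None:
    """Split one log line into (attributes, values, record number).

    Returns None for lines that are not bad-pattern entries.
    """
    line = line.strip()
    head, bar, tail = line.partition("|")
    if not line.startswith(PREFIX) or not bar:
        return None
    attributes = [a.strip() for a in head[len(PREFIX):].split(",") if a.strip()]
    values = [v.strip() for v in tail.split(",") if v.strip()]
    record_number = None
    if values and values[-1].startswith("REC="):
        record_number = int(values.pop()[len("REC="):])
    return attributes, values, record_number


def common_bad_patterns(lines, top: int = 10) -> list[tuple[tuple[str, ...], int]]:
    """Count attribute patterns so the most frequent ones can be fixed first."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_bad_pattern(line)
        if parsed is not None:
            counts[tuple(parsed[0])] += 1
    return counts.most_common(top)


print(parse_bad_pattern("BAD MISC PATTERN=YEAR , MAKE , MODEL , YEAR , | 1989, OLDS, REGENCY, 98, REC=46"))
```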


After reviewing the Log File for common errors, the user can make additions and corrections to the standard table or to the user-defined table; both tables are used as input to Table Maintenance.

At least one definition and one pattern entry must be present in the standard table (BDWDPAT).

The Palog Analyzer (located inside the Parser Tuner tool) is an interactive tool that greatly assists with the analysis and correction of pattern problems reported in the Parser log files. It makes it easy to take the proper corrective action by helping create the appropriate client word/pattern table entry.

See the Parser Tuner section of the Control Center manual for complete information.

Corrective Action

Miscellaneous Pattern Addition

BAD MISC PATTERN=YEAR, MAKE, ALPHA, YEAR | 1980, OLDS, DELTA, 88

Action: Add a Miscellaneous Pattern to the User Table:
'YEAR MAKE ALPHA YEAR' INSERT PATTERN MISC DEF RECODE='YEAR MAKE MODEL MODEL'
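One way to read the RECODE clause is as a positional relabeling. Each token that matched the pattern on the left takes the attribute in the same position on the right, so DELTA and 88 both become MODEL. The following is a minimal sketch of that reading only. The helper name and the assumption that a pattern and its recode have the same number of tokens are illustrative, not the parser's documented internals.

```python
def apply_recode(pattern, recode, tokens):
    """Relabel (attribute, value) tokens when their attributes match the pattern.

    Hypothetical model: the i-th token takes the i-th RECODE attribute.
    Returns None when the pattern does not match the record.
    """
    if [attr for attr, _ in tokens] != pattern.split():
        return None
    new_attrs = recode.split()
    if len(new_attrs) != len(tokens):
        raise ValueError("this sketch assumes one RECODE attribute per pattern token")
    return [(new, value) for new, (_, value) in zip(new_attrs, tokens)]


record = [("YEAR", "1980"), ("MAKE", "OLDS"), ("ALPHA", "DELTA"), ("YEAR", "88")]
print(apply_recode("YEAR MAKE ALPHA YEAR", "YEAR MAKE MODEL MODEL", record))
# [('YEAR', '1980'), ('MAKE', 'OLDS'), ('MODEL', 'DELTA'), ('MODEL', '88')]
```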

This bad pattern could also be corrected with a Word/Phrase Addition.

Action: Add 'Delta 88' as a Model in the User Table:
'DELTA 88' INSERT MISC DEF ATT=MODEL

The record would then parse successfully by matching the following existing pattern in the standard word/pattern table:


'YEAR MAKE MODEL' INSERT PATTERN MISC DEF RECODE='YEAR MAKE MODEL'
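The Word/Phrase Addition works differently from the pattern addition. Once DELTA 88 is a known phrase, the two input words collapse into one MODEL token, so the record presents the shorter pattern YEAR MAKE MODEL. A greedy longest-match lookup is one simple way to model this. In the following sketch, the table contents and the fallback rules are hypothetical stand-ins for the real word/pattern tables: 2- or 4-digit numbers are read as YEAR, which matches the bad-pattern examples above.

```python
# Hypothetical word/phrase table entries standing in for BDWDPAT/CDWDPAT
PHRASES = {"OLDS": "MAKE", "REGENCY": "MODEL"}


def classify(words, phrases, max_len=3):
    """Assign an attribute to each word or phrase using greedy longest match."""
    attrs, i = [], 0
    while i < len(words):
        for size in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + size])
            if phrase in phrases:
                attrs.append(phrases[phrase])
                i += size
                break
        else:
            word = words[i]
            # Fallback classes (assumption): 2- or 4-digit numbers read as YEAR
            if word.isdigit() and len(word) in (2, 4):
                attrs.append("YEAR")
            elif word.isdigit():
                attrs.append("NUMERIC")
            else:
                attrs.append("ALPHA")
            i += 1
    return " ".join(attrs)


words = "1980 OLDS DELTA 88".split()
print(classify(words, PHRASES))                            # YEAR MAKE ALPHA YEAR
print(classify(words, {**PHRASES, "DELTA 88": "MODEL"}))   # YEAR MAKE MODEL
```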

After making the appropriate changes to the User Table, the user must run Table Maintenance to produce a new set of encoded word/phrase and pattern tables, and then rerun the Business Data Parser to evaluate the results.

This process can be repeated as many times as necessary.


Review Codes and Review Groups

If the Business Data Parser is being executed from the Batch System, results are often examined by a review group.

Review codes are generated by the Business Data Parser to identify specific conditions that occur in each record being parsed. When a record receives a review code, a review group is also written to a 3-byte field called BA_REV_GROUP (described in Table 5.2 on page 5-19, “Business Data Parser Repository Output Record Format”) in the BPREPOS. When a record receives multiple review codes, the review group is determined by a default hierarchy table.

The table “Output Review Group Hierarchy” on page 5-26 shows these review groups. To change the review group order, add the REVIEW_GROUP_ORDER parameter to the BDP Parameter File (see “The Business Data Parser Parameter File” on page 5-10 for more information) to specify the review group hierarchy.

Business Data Parser Review Codes

The review codes are written to the BDP Repository Output Record (BPREPOS) as three-character codes in the following fields:

ba_global_review_codes

ba_misc_review_codes

Table 5.3, “Output Review Group Hierarchy,” lists the review codes, review groups, and descriptions.


Output Review Codes and Review Group Hierarchy

Table 5.3 Output Review Group Hierarchy

Review codes can belong to multiple review code fields.

Review Code Review Group Description

082 001 Unidentified pattern

087 002 Miscellaneous line too long

086 003 Label line too long

088 004 Too many categories found

083 005 Unknown token

090 006 No data found

000 000 No targeted conditions found

The parameter, REVIEW_GROUP_ORDER, in the Business Data Parser parameter file, may be used to modify the review group hierarchy.
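The following is a minimal sketch of how a single review group could be chosen for a record that carries several review codes. It makes three assumptions: each code occupies three characters of the review-code field; the hierarchy runs in the order listed in Table 5.3 (group 001 highest); and a REVIEW_GROUP_ORDER override can be represented as a list of groups, highest priority first. The parameter's actual file syntax is described with the BDP Parameter File and is not reproduced here. The function names are hypothetical.

```python
# Default hierarchy from Table 5.3: review code -> review group
DEFAULT_GROUPS = {
    "082": "001",  # Unidentified pattern
    "087": "002",  # Miscellaneous line too long
    "086": "003",  # Label line too long
    "088": "004",  # Too many categories found
    "083": "005",  # Unknown token
    "090": "006",  # No data found
}


def split_codes(field):
    """Split a review-code field into 3-character codes (assumed layout)."""
    field = field.strip()
    return [field[i:i + 3] for i in range(0, len(field), 3)]


def review_group(codes, group_order=None):
    """Return the highest-priority review group among a record's review codes."""
    groups = {DEFAULT_GROUPS[c] for c in codes if c in DEFAULT_GROUPS}
    if not groups:
        return "000"                                  # no targeted conditions found
    order = group_order or sorted(DEFAULT_GROUPS.values())
    # Groups missing from a custom order rank last
    return min(groups, key=lambda g: order.index(g) if g in order else len(order))


print(review_group(split_codes("083082")))  # 001
```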


Error Messages

This table lists the error messages for the cfgrsdrv program:

Table 5.4 CFGRSDRV Error Codes

Message Description

Must have a parm file. The parameter file for the business data parser driver program is missing from the command line.

Missing PA_PARMNAME The path and parm file for the driver program are missing from the PA_PARMNAME parameter.

No input file specified. An input data file has not been specified in INP_DDNAME.

Unable to open input file . The input file specified in INP_DDNAME is invalid or the file contents are corrupt. Check the path and/or filename.

Unable to open output file . The output file specified in the BDP driver parameter OUT_DDNAME is invalid or the file contents are corrupt. Check the path and/or filename.

Unable to open reject file . The reject file specified in REJ_DDNAME is invalid or its contents are corrupt.

Problem trying to initialize the Parser. One of the parameter values in the parameter file PFBPARSER.PAR is incorrect. Check the PALOG file for error messages.

I/O CFGRSDRV ERROR during read on . The temp directory that holds overflow is out of space.

ABEND Parser problems. One of the parameter values in the parameter file PFBPARSER.PAR is incorrect. Check the PALOG file for error messages.

cv_CopyRecord failed: . An error occurred when trying to copy the input record.

cv_CopyField (ORG_RECORD) failed . The ORG_RECORD section of the PAOUT file does not match that of the Original Input File. ORG_RECORD information is missing.

Writing to . An error occurred while writing to the specified file.


Writing to . An error occurred while writing to the specified field. Check that the field name specified in the program is correct.

Problem in parser during close. Check that the Parser output file has not been corrupted and that there is sufficient space available for the write operation to this file.

Closing file . An error occurred while trying to close the specified file. Check that the file is not corrupted and that there is sufficient space available for the write operation.

Unable to open statistics The file specified in STAT_FNAME is invalid, or the file permissions prevent the user from overwriting the existing file.

Parm Processing Error, status = 1 The path and parameter file for the business data parser driver program are missing from the command line. Check the path and/or file name.

Parm Processing Error, status = 2 The parameter file for the business data parser driver program is present but incorrect. Check the path and/or filename.

Parm Processing Error, status = 3 The parameter echo file for the business data parser driver program is present but cannot be opened. Check the permissions on the file.

Parm Processing Error, status = 4 The program has encountered an error with a parameter entry. Use the parameter echo debugging process to determine which entry is incorrect.

Parm Processing Error, status = 5 Unknown command line parameter.

Parm Processing Error, status = 6 Duplicate parameter name found in file.

Parm Processing Error, status = 7 Bad format for override parameter.

Parm Processing Error, status = 8 Bad format for parameter in source program.


Parm Processing Error, status = 9 Parameter value was expected to be numeric.

Parm Processing Error, status = 10 Missing override value.

Parm Processing Error, status = 11 No Ending Quote.

Parm Processing Error, status = 12 Insufficient memory for parameter value parm_entry.

Parm Processing Error, status = 13 Insufficient memory for parameter value parm_entry.

Parm Processing Error, status = 14 Insufficient memory for long parameter value.

Parm Processing Error, status = 15 Insufficient memory for double parameter value.

Parm Processing Error, status = 16 Extraneous parenthesis found.

CHAPTER 6 Window Key Generator

The Window Key Generator program creates window keys for matching. A window key is a composite structure built from complete or partial elements of the input data. The window key is then used as a filter to select records for inclusion in the match window for comparison. (Multiple window keys can be used for multiple comparisons.) This program uses a rules-based parameter file to create up to 30 window keys per input record.

This program generates window keys based on user-defined rules. It can either create one output record with the generated window keys stored in separate fields, or create one output record per window key generated. If the option to create one output record per window key is selected, ‘invalid’ window keys are not written to the output.

See “Rules File” on page 6-8 for information about invalid window keys.

The maximum length of a window key value is 50 bytes.


Window Key Generator Process Flow

Input and Output Resources

The Window Key Generator requires the following input and output files:

Driver parameter file pfwinkey.par

Rules parameter file pfrules.par

Input DDL parsout.ddl

Input file Typically the Geocoder output file (geout)

Output file wkeyout


Window Key Generator Parameters

Please note that all required parameters appear in bold and shaded.

Table 6.1 Window Key Generator Parameters

Parameter Value Description

MAXIN numeric Maximum number of records to read. The default is to read all records.

INP_DDNAME file name Name of the input file.

INP_DDL DDL name Name of the input DDL and the record name of the input DDL.

RULES_DDNAME rules file Name of the parameter file that contains the window key definitions.

FOREIGN_CHAR_SUPPORT numeric Valid values are:

0 = codepage037 (EBCDIC) or codepage819 (ASCII)
1 = codepage875 (EBCDIC) or codepage1253 (ASCII)

If the input file is encoded as codepage 875 or equivalent, set this parameter to 1. This ensures that the program properly interprets foreign character values greater than 0x7F.

Default foreign character values are used by the system if this parameter is not provided.

RULES_ECHO listing file Name of the listing file for the processed RULES_DDNAME file.


GENERATE_RECORDS Yes, No Yes = Creates one output record for each valid window key defined. If three window keys are defined, each input record is written out three times, and each copy has a different window key field populated. No = Creates one output record per input record, with all the window key fields populated with their respective window key values.

OUT_DEFAULT_DDL DDL name The file name of the output DDL and the record name of the output DDL. Valid only if GENERATE_RECORDS is set to ‘No’.

OUT_DEFAULT_DDNAME file name Name of the output file. Valid only if GENERATE_RECORDS is set to ‘No’.

WKEY_NN_NAMES field names Field names in which to store the values of each defined window key. Valid values for ‘NN’ are 01-30. Valid only if GENERATE_RECORDS is set to ‘No’. The ‘NN’ value must correspond to the rules parameter file: for every WKEY_NN_NAMES parameter used, there must be a corresponding WINDOW_KEY_NN parameter in the rules parameter file.

OUT_NN_DDL This parameter has five entries:
1. Name of the output DDL file
2. Name of the record in the output DDL
3. Name of the field used to store the window key number (the field must be 2 bytes)
4. Name of the field used to store the window key value
5. Name of the output file
Valid only if GENERATE_RECORDS is set to ‘Yes’. Valid values for ‘NN’ are 01-30. The ‘NN’ value must correspond to WINDOW_KEY_NN in the rules parameter file: for every OUT_NN_DDL parameter used, there must be a corresponding WINDOW_KEY_NN parameter in the rules parameter file.


WKEY_PREDEF_INPFLD DDL field name Field names to be used as input for predefined window keys. Requires the WKEY_PREDEF_OUTFLD parameter. Valid only if GENERATE_RECORDS is set to ‘Yes’. If both parameters are used, these parameters are ignored: RULES_ECHO, OUT_nn_DDL, WKEY_nn_NAMES, WINDOW_KEY_nn, and MATCH_WKEY_nn.

If both parameters are used and PARM is set, the rules file parm is ignored and these warning messages are generated:
WARNING: USING PARM .
WARNING: USING PARM .
WARNING: PARM IS IGNORED.

WKEY_PREDEF_OUTFLD DDL field name Field names to be used as output for predefined window keys. Requires the WKEY_PREDEF_INPFLD parameter. Valid only if GENERATE_RECORDS is set to ‘Yes’.

See the WKEY_PREDEF_INPFLD parameter description for the list of warning messages and ignored parameters when both parameters are used.

TRANSLATE_DATA Specifies the translation option to apply to the input data record before Window Key processing occurs. Valid values are:
E2A = translate the EBCDIC input data to ASCII
A2E = translate the ASCII input data to EBCDIC
NONE = no translation occurs (default value)
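The correspondence rules in Table 6.1 can be checked before a run. Each WKEY_NN_NAMES (when GENERATE_RECORDS is ‘No’) or OUT_NN_DDL (when it is ‘Yes’) needs a matching WINDOW_KEY_NN in the rules file, NN must be 01-30, and OUT_NN_DDL has five entries. The following is a minimal sketch. It assumes the driver parameters have already been read into a dictionary, with OUT_NN_DDL as a list of its entries, and that the rules file has been reduced to the set of NN values it defines. The function is hypothetical and not part of cfwinkey.

```python
import re


def check_window_key_params(driver, rule_keys):
    """Return a list of problems found in already-parsed driver parameters.

    driver:    dict of parameter name -> value (OUT_NN_DDL as a list of entries)
    rule_keys: set of NN strings that have a WINDOW_KEY_NN entry in the rules file
    """
    problems = []
    generate = driver.get("GENERATE_RECORDS", "").strip().strip('"').lower()
    if generate not in ("yes", "no"):
        problems.append("GENERATE_RECORDS must be Yes or No")
    wanted = r"OUT_(\d\d)_DDL" if generate == "yes" else r"WKEY_(\d\d)_NAMES"
    for name, value in driver.items():
        match = re.fullmatch(wanted, name.upper())
        if not match:
            continue
        nn = match.group(1)
        if not 1 <= int(nn) <= 30:
            problems.append(f"{name}: NN must be 01-30")
        elif nn not in rule_keys:
            problems.append(f"{name}: no matching WINDOW_KEY_{nn} in rules file")
        if generate == "yes" and len(value) != 5:
            problems.append(f"{name}: expected 5 entries, got {len(value)}")
    return problems
```

An empty list means the driver and rules files agree on which window keys exist.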


Sample Parameter File 1

This parameter file shows that GENERATE_RECORDS is set to “No”; this means that one output record is created for each input record.

*******************************************************
* CFWINKEY PARM FILE
*******************************************************
INP_DDNAME "..\data\input.dat"
INP_DDL "..\dict\parsout.ddl", "PARSOUT"
RULES_DDNAME "..\parms\pfrules"
RULES_ECHO
GENERATE_RECORDS "No"
OUT_DEFAULT_DDL "..\dict\parsout.ddl", "PARSOUT"
OUT_DEFAULT_DDNAME "..\data\output.dat"
WKEY_01_NAMES window_key_01
WKEY_02_NAMES window_key_02
WKEY_03_NAMES window_key_03

Sample Parameter File 2

This parameter file shows that GENERATE_RECORDS is set to “Yes”; this means that one output record is created for each valid defined window key.

*****************************************************************************
* CFWINKEY PARM FILE
*****************************************************************************

INP_DDNAME "..\data\input.dat"
INP_DDL "..\dict\parsout.ddl", "PARSOUT"
RULES_DDNAME "..\parms\pfrules"
GENERATE_RECORDS "Yes"
OUT_01_DDL ..\dict\wk01out.ddl, OUTREC, window_code_01, window_key_01, ..\data\wk01.out
OUT_02_DDL ..\dict\wk02out.ddl, OUTREC, window_code_02, window_key_02, ..\data\wk02.out
OUT_03_DDL ..\dict\wk02out.ddl, OUTREC, window_code_02, window_key_02, ..\data\wk02.out


This parameter file generates three window keys; OUT_02_DDL and OUT_03_DDL use the same output DDL and are written to the same output file.

The last line of all parameter files and CLWDPAT tables must contain a carriage return and/or line feed in order to process the last Parameter/table entry in the file. Do not use tabs; only spaces are valid.
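Both formatting rules are easy to miss in an editor, so a quick pre-flight check can help. The following is a minimal sketch that flags tab characters and a missing line ending after the last entry. It covers only the two rules stated above, and the function name is hypothetical.

```python
def check_parm_text(text):
    """Flag tab characters and a missing line ending after the last entry."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append(f"line {number}: tab character (use spaces)")
    if text and not text.endswith(("\n", "\r")):
        problems.append("last line has no carriage return/line feed")
    return problems
```

Read the file with `open(path, newline="")` so that carriage returns are preserved, and pass its contents to `check_parm_text`.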


Rules File

The parameter used with this rules file is WINDOW_KEY_NN.

Syntax

WINDOW_KEY_NN "Pos1", Pos2, "Pos3", "Pos4", "Pos5", Pos6, "Pos7", "Pos8", "Pos9", "Pos10"

where NN = 01-30

If a position is not used, it must be set to NULL ("").

Table 6.2 Window Key Generator Rules Definition

Pos Description

Pos1 The primary field name to use to build the window key.

Pos2 The number of characters to use from Pos1 to build the window key.

Pos3 The rule that tells the system which elements to select from Pos1. (See the following table for a list of rule options.) Special note: You can start building the window key for the stated field after a special character (non-alphabetic and non-numeric). The special character is a hyphen, and it is used only in Pos3 or Pos7. If this functionality is called and only blanks follow the special character, window key building defaults to the beginning of the data field. For example:
"pr_last_01",3,"-C","","",0,"","","",""
With pr_last_01 data field = JONES-SMITH, the resulting window key elements are SMT.

Pos4 Secondary field name to use to build the window key. Used only if Pos1 is not used.

Pos5 Primary field value which forces the use of Pos4 (secondary field) to build the window key.

Pos6 Number of characters to use from Pos4 (must be <= Pos2). If not used, set to 0.

Pos7 Specifies which elements to select from Pos4. (See the following table for a list of rule options.) See the note in Pos3 regarding the special-character selection functionality.


Pos8 The window key is not created if Pos1 (or Pos4) is blank or contains only zeros. The options for this position are:
B = Do not create the window key if Pos1 or Pos4 is blank.
Z = Do not create the window key if Pos1 or Pos4 contains only zeros.
BZ = Do not create the window key if Pos1 or Pos4 is blank or contains only zeros.
Only applies when GENERATE_RECORDS is set to “Yes.”

Pos9 The window key is not created if Pos1 (or Pos4) equals the value specified in this field. For example, set this field to the string Blank to prevent a window key from being created for records that have a value of Blank in Pos1 or Pos4. Be sure the length of Pos9 matches the length of the field in Pos1 (or Pos4). Only applies when GENERATE_RECORDS is set to “Yes.”

Pos10 Name of a user-defined flag field in your DDL. The flag field should have a size of 1; if it is set to Y or y, no window key is created for that record. Only applies when GENERATE_RECORDS is set to “Yes.”
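Because every rule line must supply all ten positions, malformed lines are a common source of rules-file errors. The following is a minimal sketch that reads one rule line into named positions and checks the constraints stated in Table 6.2: ten positions, Pos6 <= Pos2, and Pos8 limited to B, Z, BZ, or empty. It uses Python's `csv` module to handle the quoted, comma-separated layout. `KeyRule` and `parse_rule` are hypothetical names.

```python
import csv
from dataclasses import dataclass


@dataclass
class KeyRule:
    primary: str          # Pos1: primary field name
    primary_len: int      # Pos2: characters to use from Pos1
    primary_code: str     # Pos3: selection code for Pos1
    secondary: str        # Pos4: secondary field name
    force_value: str      # Pos5: Pos1 value that forces use of Pos4
    secondary_len: int    # Pos6: characters to use from Pos4
    secondary_code: str   # Pos7: selection code for Pos4
    skip_blank_zero: str  # Pos8: "", "B", "Z" or "BZ"
    skip_value: str       # Pos9: value that suppresses the key
    skip_flag_field: str  # Pos10: flag field name


def parse_rule(line):
    """Parse one rules-file line such as '"ssn",9,"N","","",0,"","Z","",""'."""
    fields = next(csv.reader([line], skipinitialspace=True))
    if len(fields) != 10:
        raise ValueError(f"expected 10 positions, found {len(fields)}")
    rule = KeyRule(fields[0], int(fields[1]), fields[2], fields[3], fields[4],
                   int(fields[5]), fields[6], fields[7], fields[8], fields[9])
    if rule.secondary_len > rule.primary_len:
        raise ValueError("Pos6 must be <= Pos2")
    if rule.skip_blank_zero not in ("", "B", "Z", "BZ"):
        raise ValueError("Pos8 must be B, Z, BZ or empty")
    return rule
```

The quoted single space in Pos5 (`" "`) is preserved by the `csv` reader. This matters because a blank value is what triggers the fallback to the secondary field.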

Window Key Generator Codes

The following codes complement the positions in the Rules definition layout.

Codes that contain the symbol '&' begin building the window key from the last word of the field.

Table 6.3 Window Key Generator Code Descriptions

Code Description Used With

A, A& Any character Position 3 and 7

V, V& Vowels only Position 3 and 7

VNR, VNR& Non-repeating vowels only Position 3 and 7

R If the input value is equal to 1, store the value as 1; all other input values recode to 2. Used primarily for the pr_nmform_01 field. Position 3 and 7


C, C& Consonants only Position 3 and 7

CNR, CNR& Non-repeating consonants only Position 3 and 7

N Numerics only Position 3 and 7

NNR Non-repeating numerics only Position 3 and 7

NY Algorithm based on the standard NYSIIS code. (See “PARTIAL2” in the Matcher chapter later in this manual.) Position 3 and 7

FV, FV& First + vowels Position 3 and 7

FVNR, FVNR& First + non-repeating vowels Position 3 and 7

FC, FC& First + consonants Position 3 and 7

FCNR, FCNR& First + non-repeating consonants Position 3 and 7

I, I& Initials Position 3 and 7

OS Original soundex (See “PARTIAL2” in the Matcher chapter.) Position 3 and 7

ROS Reversed original soundex

RIS Reversed improved soundex

IS Improved soundex (See “PARTIAL2” in the Matcher chapter.) Position 3 and 7

A* Any sorted Position 3 and 7

V* Vowels only sorted Position 3 and 7

VNR* Non-repeating vowels only sorted Position 3 and 7

C* Consonants only sorted Position 3 and 7

CNR* Non-repeating consonants only sorted Position 3 and 7

FV* First + vowels sorted Position 3 and 7

FVNR* First + non-repeating vowels sorted Position 3 and 7


FC* First + consonants sorted Position 3 and 7

FCNR* First + non-repeating consonants sorted Position 3 and 7

I* Initials sorted Position 3 and 7

B Blanks Position 8

Z Zeros Position 8

BZ Blanks or Zeros Position 8

RNY Reversed NYSIIS code. (The input string is reversed before the code is built.) (See “PARTIAL2” in the Matcher chapter.) Position 3 and 7

FC- First character after hyphen and subsequent consonants; commonly used with hyphenated names. Position 3 and 7
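The selection codes can be approximated in a few lines, which helps when predicting what a rule will produce. The following is a minimal sketch, not the product's implementation. It covers a subset of the codes as this table describes them: A, V, C, N, I, FV, FC, the NR and * variants, the & last-word option, the hyphen prefix, and R. The soundex and NYSIIS codes are omitted. Three details are assumptions: Y is treated as a consonant, NR collapses adjacent repeats, and * sorts the whole selection. With those assumptions, it reproduces the manual's sample window key 018JNLN1 for the record with postal code 01821, last name JONES, street LINNELL, and name form 1.

```python
VOWELS = set("AEIOU")  # assumption: Y is treated as a consonant


def _select(text, kind, nonrepeating):
    """Keep the characters of one class: A(ny), V(owels), C(onsonants), N(umerics)."""
    if kind == "A":
        chars = list(text)
    elif kind == "V":
        chars = [c for c in text if c in VOWELS]
    elif kind == "C":
        chars = [c for c in text if c.isalpha() and c not in VOWELS]
    elif kind == "N":
        chars = [c for c in text if c.isdigit()]
    else:
        raise ValueError(f"code not covered by this sketch: {kind}")
    if nonrepeating:  # assumption: NR collapses adjacent repeats
        chars = [c for i, c in enumerate(chars) if i == 0 or c != chars[i - 1]]
    return "".join(chars)


def apply_code(value, code, length):
    """Apply one Pos3/Pos7 code to a field value and truncate to Pos2/Pos6."""
    value = value.strip().upper()
    if code == "R":  # name-form recode: 1 stays 1, anything else becomes 2
        return ("1" if value == "1" else "2")[:length]
    if code.startswith("-"):  # start after the hyphen if non-blank text follows
        code = code[1:]
        _, hyphen, rest = value.partition("-")
        if hyphen and rest.strip():
            value = rest.strip()
    if code.endswith("&"):  # use only the last word of the field
        code = code[:-1]
        words = value.split()
        value = words[-1] if words else ""
    sort = code.endswith("*")
    code = code.rstrip("*")
    nonrepeating = code.endswith("NR")
    if nonrepeating:
        code = code[:-2]
    if code == "I":  # initials
        out = "".join(word[0] for word in value.split())
    elif len(code) == 2 and code[0] == "F":  # FV/FC: first character + class
        out = value[:1] + _select(value[1:], code[1], nonrepeating)
    else:
        out = _select(value, code, nonrepeating)
    if sort:  # assumption: the whole selection is sorted
        out = "".join(sorted(out))
    return out[:length]


def build_key(record, parts):
    """Concatenate (field, length, code) parts taken from Pos1-Pos3 of each rule line."""
    return "".join(apply_code(record[field], code, n) for field, n, code in parts)


record = {"pr_gout_postal_code": "01821", "pr_last_01": "JONES",
          "pr_gout_str_name": "LINNELL", "pr_nmform_01": "1"}
print(build_key(record, [("pr_gout_postal_code", 3, "A"), ("pr_last_01", 2, "-FC"),
                         ("pr_gout_str_name", 2, "FC"), ("pr_nmform_01", 1, "R")]))
# 018JNLN1
```

The sketch ignores the secondary-field positions (Pos4 through Pos7) and the suppression positions (Pos8 through Pos10), so it only models the primary-field path of each rule line.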

Sample Window Key Rules File

Each window key can be made up of multiple fields. Each field has its own rules definitions.

If GENERATE_RECORDS is set to ‘Yes’ and any field used in generating the key meets the condition set in Pos8, Pos9, or Pos10, no output record is generated for that window key.

SAMPLE INPUT RECORD
pr_gout_postal_code = 01821
pr_last_01 = JONES
pr_gout_str_name = LINNELL
pr_nmform_01 = 1

window_key_01
"pr_gout_postal_code",3,"A","pr_postal_code"," ",3,"A","","",""
"pr_last_01",2,"-FC","pr_busname_01"," ",1,"A","","",""
"pr_gout_str_name",2,"FC","","",0,"","","",""
"pr_nmform_01",1,"R","","",0,"","","",""
window_key_02
"pr_gout_postal_code",5,"A","pr_postal_code"," ",5,"A","","",""
"pr_last_01",3,"-A","pr_busname_01"," ",1,"A","","",""
"pr_nmform_01",1,"R","","",0,"","","",""
window_key_03
"pr_gout_postal_code",5,"A","pr_postal_code"," ",5,"A","","",""
"pr_gout_str_name",5,"A","","",0,"","","",""
"pr_nmform_01",1,"R","","",0,"","","",""

SAMPLE OUTPUT WINDOW KEYS GENERATED
window_key_01 = 018JNLN1
window_key_02 = 01821JON1
window_key_03 = 01821LINNE1

Running the Window Key Generator on UNIX and 32-bit PC Platforms

The cfwinkey program uses the following command-line syntax:

Syntax cfwinkey -parmfile parm_file_name -parmecho echo_file_name

where:

cfwinkey Name of the Window Key Generator driver program

-parmfile Keyword that indicates the parameter file follows

parm_file_name Name of the program parameter file

-parmecho Keyword that indicates the parameter echo file follows

echo_file_name Name of the file that displays any parameter processing errors in the program listing file

Example cfwinkey -parmfile ..\parms\cfwinkey -parmecho ..\data\echo


Sample IBM Mainframe JCL

The following sample Job Control Language (JCL) is used to run cfwinkey:

//***************************************************************************
//* CFWINKEY - CREATING WINDOW KEYS
//***************************************************************************
//CFWINKEY EXEC PGM=CFWINKEY,PARM='/-PARMFILE PF -PARMECHO PE',
// REGION=0M
//STEPLIB DD DSN=&PROJPREF.LOADLIB,DISP=SHR
// DD DSN=&PROJPREF.LINKLIB,DISP=SHR
//INPUTDDL DD UNIT=&UNIT,
// VOL=SER=&VOL,
// DISP=SHR,
// DSN=&PROJPREF.DDLLIB(PARSOUT)
//CEEDUMP DD DUMMY,DCB=BLKSIZE=133
//SYSOUT DD SYSOUT=A
//SYSPRINT DD SYSOUT=A
//PE DD SYSOUT=A
//PF DD *
//***************************************************************************
//* PARM FILE FOR CFWINKEY
//***************************************************************************
INP_DDNAME INPUT
INP_DDL INPUTDDL, PARSOUT
OUT_DEFAULT_DDL INPUTDDL, PARSOUT
OUT_DEFAULT_DDNAME OUTPUT
wkey_01_names window_key_01
wkey_02_names window_key_02
wkey_03_names window_key_03
RULES_DDNAME RULES
RULES_ECHO RULECHO
//*
//PE DD SYSOUT=A
//RULECHO DD SYSOUT=A
//*
//* INPUT FILE
//*
//INPUT DD DISP=SHR,
// DSN=&PROJPREF.INPUT.DAT
//*
//* OUTPUT FILES
//*
//OUTPUT DD UNIT=&UNIT,
// VOL=SER=&VOL,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(TRK,(200,100),RLSE),
// DCB=(RECFM=FB,LRECL=&LRECL,BLKSIZE=&BLKSIZE),
// DSN=&PROJPREF.OUTPUT.DAT
//*
//* RULES PARM FILE
//*
//RULES DD *


******************************************************************************
* Window key rules parameter file
******************************************************************************

window_key_01
"pr_last_01",5,"A","pr_busname_01"," ",5,"A","","",""
"pr_gout_str_name",1,"A","","",0,"","","",""
"pr_gout_postal_code",5,"A","","",0,"","","",""

window_key_02
"ssn",9,"N","","",0,"","Z","",""

window_key_03
"phone",10,"N","","",0,"","BZ","",""
"pr_nmform_01",1,"R","","",0,"","","",""

Error Messages

The following error messages can all be returned from cfwinkey.

Table 6.4 CFWINKEY Error Messages

Message Description

Parm processing error, status = 2 The parameter file for the window key generator program is present but incorrect. Check the file path and name.

Parm processing error, status = 3 The parameter echo file is present but incorrect.

Parm processing error, status = 4 The program has encountered an error with a parameter entry. Use echo debugging to determine which entry is incorrect.

initializing primary parm file. One of the parameters is incomplete. Add the value rulesecho to RULES_ECHO to investigate syntax errors in the rules file.

is required. An input data file has not been specified in INP_DDNAME.

is required. A value has not been specified in GENERATE_RECORDS.

is required An input DDL file and/or record name is not specified in INP_DDL.


invalid value. An incorrect value has been specified in the parameter GENERATE_RECORDS. Correct values are "Yes" or "No".

is required. An output DDL file and/or record name is not specified in OUT_DEFAULT_DDL.

out_default_ddl open problems. The DDL file specified in OUT_DEFAULT_DDL is present but cannot be opened because the user does not have permission.

is required. An output data file has not been specified in the parameter OUT_DEFAULT_DDNAME. Enter the file path and name.

invalid entry for parm . The value specified in TRANSLATE_DATA is incorrect. Values are only E2A or A2E.

is required The rules parameter file is not specified in the parameter RULES_DDNAME. Enter the path and parameter filename.

initializing ruleslist file. Rules defined in pfrules.par have been incorrectly expressed. Ensure that all 10 fields are represented.

parm processing error. One of the fields in pfrules.par has been incorrectly expressed. Check the field spelling and case, the code values, and the padded length value of the secondary field.

error: output ",index

is too long. MAX SIZE IS 50. The window key field defined in the DDL is too long.

is too short The generated window key is larger than the designated field.

wanted but no ddl window key. Cannot find the window key field in the DDL for the build.

wanted but no out key field Cannot find the output window key field name in the DDL that was requested for the build. Check the DDL.


wanted but no window_key parm. Parameter WKEY_NN_NAMES is missing a corresponding window key field name WINDOW_KEY_NN in the rules file.

wanted but no out file. Parameter OUT_NN_DDL is missing a corresponding window key field name in the rules file.

and have same wkey_field. Two WKEY_NN_NAMES parameters have the same window key field name.

inp_ddl open problems. The input DDL or the record name of the input record DDL is invalid, or the dictionary is corrupt. Check the path and/or file name followed by the record name of the input record DDL.

malloc for common out buffer failed. The memory allocated for the out buffer is not sufficient for the number of window keys generated.

copy record failed An error occurred while trying to copy the input record before populating the window key into the record.

generating window_key_ . An error occurred when generating the actual value for the specified window key.

putstringvalue failed for . An error occurred while updating the window key field on the record with the generated window key value.

generating window_key_ . The WINDOW_KEY_NN field name is not the same as in the DDL. Check the case and spelling in the parameter file.

CHAPTER 7 Matcher

Matching is the process of identifying records with a matching relationship (customer, household, commercial) in a database or duplicates in several databases. The Matcher is a flexible tool, designed to compare records to determine the level of similarity between them. The result of the comparisons is categorized as either a passed, suspect, or failed match, based on the similarity of data elements in the records, as well as the assigned score of their exceptions.

The comparison routines are designed to compare specific types of data. This includes routines for comparing:

Business names
Personal name components
Primary street components
Secondary street components
Geography components
Generic character components
Dates and numeric fields


These comparison routines are defined in this chapter. See “Tuning the Match Results” on page 7-92 for more information. You can customize the matching rules to adjust:

Data elements or fields used in comparisons.

Comparison routine used to match each data element.

Alpha grade equivalent of the numeric score returned from a field comparison.

Table used to assign a Pass, Suspect, or Fail state to a pair of records.
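To show how the last two adjustable items fit together, the following is a minimal sketch. Each field comparison's numeric score is converted to an alpha grade, the grades for a record pair form a grade pattern, and a pattern table maps that pattern to Pass, Suspect, or Fail. The cutoffs, the grade letters, and the table entries are all hypothetical placeholders, not the Matcher's defaults. The actual grade and pattern parameters are described later in this chapter.

```python
# Hypothetical cutoffs: minimum score (0-100) for each alpha grade
GRADE_CUTOFFS = [(100, "A"), (90, "B"), (80, "C"), (70, "D")]

# Hypothetical pattern table: one grade per compared field -> match state
PATTERN_TABLE = {"AAA": "PASS", "AAB": "PASS", "ABB": "SUSPECT", "BBB": "SUSPECT"}


def to_grade(score):
    """Convert a numeric field-comparison score into an alpha grade."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"


def match_state(scores):
    """Build the grade pattern for a record pair and look up its state."""
    pattern = "".join(to_grade(score) for score in scores)
    return pattern, PATTERN_TABLE.get(pattern, "FAIL")  # unlisted patterns fail


print(match_state([100, 100, 95]))  # ('AAB', 'PASS')
```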

Matcher Driver Programs

There are three Matcher driver programs. The most commonly used Matcher for a batch implementation is the cfmatdrv program. This Matcher also has an API that can be called from online matching applications. For information about making API calls to the Matcher, see the Programmer's Guide. The other two Matcher drivers (cfmatch and cfwinmat) are primarily intended for Customer Key Manager (CKM) applications and are for customers with a CKM license.

Functionality | CFMATDRV | CFMATCH | CFWINMAT
Applications | CKM/Trillium Software | CKM only | CKM only
Record type distinction | Household and Individual or Commercial | None | None
Number of Window Keys per Match | Single | Multiple | Single
Set record bypass criteria | Yes | No | Yes; up to 3
Optionally create links for pass matches | Yes | Yes | Yes
Create reciprocal links in link output file | Yes | Yes | Yes
Flags survivor records | Yes | No | No
Commonize data across a set of matched records using user-defined criteria | Yes | No | No
Write all input records to output file | Yes | No; links only | No; matched records only
Optionally create a links file | Yes | No | Yes
Perform window or window and reference match per iteration | Yes | Yes | No; window match only
Available as API | Yes | No; batch only | No; batch only

Window Matching and Reference Matching

The Matcher has two main functions: matching a record to an existing reference file (called reference matching) and matching a group of records to each other within the same file (called window matching). The input file is read one window key set at a time (assuming that the input data is sorted in window key sequence as referenced in the parameter file). The output data set is sorted by matched individual number within matched household number within the window key.

When using reference matching, the Matcher attempts to match the master file to the input files’ window (window key group). If a match occurs, the record in the window is updated with the master file record’s household ID number. After all possible reference matching for a window is complete, the window is matched against itself. Any record that matches another record that already has a household ID receives that household ID.

If no match is found, the system generates a key based on two parameters: CYCLE and BASE. CYCLE is any user-defined string that you want to use to identify the record as an unmatched one. BASE is the starting number for new records. To create the key, the system adds the internal count number to the value of the BASE parameter and appends the sum to the value of the CYCLE parameter. If the BASE and CYCLE parameters are not defined, the system uses the Julian system date as the CYCLE and 0 as the BASE.

This program also produces a links file indicating which matched records are linked together with common data. The Matcher can also produce a statistics file that summarizes the results of the matching process, and it produces output records with matched information (common data segment and match data segment) appended to each record.
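The following is a minimal sketch of the key rule for unmatched records: CYCLE followed by the sum of BASE and the internal count, defaulting to the Julian system date and 0. The YYYYDDD layout for the Julian date and the absence of zero padding are assumptions, because the text does not specify either. `new_match_key` is a hypothetical name.

```python
from datetime import date


def new_match_key(count, cycle=None, base=None, today=None):
    """Key for a record with no match: CYCLE followed by (BASE + internal count).

    Defaults follow the text: Julian system date as CYCLE, 0 as BASE.
    The YYYYDDD Julian layout and the absence of zero padding are assumptions.
    """
    if cycle is None:
        cycle = (today or date.today()).strftime("%Y%j")
    if base is None:
        base = 0
    return f"{cycle}{base + count}"


print(new_match_key(7, cycle="NEW", base=1000))  # NEW1007
```

Choosing a distinctive CYCLE string makes keys from different runs easy to tell apart. BASE lets a new run continue numbering where an earlier run stopped.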


Data Dictionary Language (DDL) files must be used with the program. See the “Data Dictionary Language” section of the Control Center manual (in the chapter about the DDL Editor tool) for more information about creating and using DDLs.

Window Matching

Window matching uses the window keys to generate the match window:

1. After the window is created, all records within the window are matched to each other, and grade patterns are generated.
2. Records are then sorted so that matched record sets are grouped together and appear first in the record order, based on the appended updated records.
3. The process ends when the match window is cleared.
4. Window matching is repeated for each window key set in the file.
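The window matching loop can be sketched in Python. The sketch assumes the records are already sorted by window key, uses a hypothetical `is_match` predicate in place of the Matcher's field comparisons and grade patterns, and groups matched records with a union-find structure; that grouping technique is a choice made for this illustration, not a description of the Matcher's internals.

```python
from itertools import combinations, groupby
from typing import Callable, Dict, List


def window_match(records: List[dict],
                 is_match: Callable[[dict, dict], bool],
                 key_field: str = "window_key_01") -> Dict[int, int]:
    """Return {record index: group ID}; the group ID is the lowest index in the matched set."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    indexed = list(enumerate(records))
    for _, window in groupby(indexed, key=lambda pair: pair[1][key_field]):
        window = list(window)  # one match window: records sharing a window key
        for (i, a), (j, b) in combinations(window, 2):  # every pair in the window
            if is_match(a, b):
                ri, rj = find(i), find(j)
                parent[max(ri, rj)] = min(ri, rj)  # keep the lowest index as root
        # the window is discarded here before the next key set is processed
    return {i: find(i) for i in range(len(records))}
```

Records in different windows are never passed to `is_match`, which is what keeps the number of comparisons bounded.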


Window Matching Input and Output

Input
DDLs | Used to describe the input, output, and Matcher record shapes.
Input files (with records) | Input records with fields to be matched (must be fixed-field data).
Field and pattern parameter files | Parameter lists delivered to the Matcher by the driver program include Field/Comparison Routine lists and Grade Pattern lists (for example, pfinflds.par, pfinpats.par and pfhhpats.par).
Driver parameter file (pfmatdrv.par) | Parameters that define all Matcher input and output files, type of matching to perform, records to process, and so on.
Output
Summary statistics file (mastat.txt) | Includes execution statistics, match pattern distribution lists, and individual window statistics.
Updated records | Results of the matching process (for example, a matched Household Number or Individual Number appended to a record) are used to update records.
Links file (optional) | Output file that contains a list of keys representing the relationship of one matched record to another.

Window Keys

Window keys are user-defined codes built from data intrinsic to the name and address record. Because a record is compared only to or within its window, consider carefully how these groups are built.

Window matching uses window keys for comparing records. Window keys, usually added to the input data by the Window Key Generator program (cfwinkey), establish the match window. See Chapter 6, “Window Key Generator,” for more information.


This use of windows limits the number of comparisons made for any individual record. That is, records are compared to or within the window only. Match windows eliminate the need to compare every record to every other record in the file. The following example shows how a window key is constructed for an Individual versus a Business:

Individual | Business
First three characters of postal code: 01821 = 018 | First three characters of postal code: 01821 = 018
First two consonants of last name (or first vowel and first consonant if the name begins with a vowel): SMITH = SM | First letter of business name: TRILLIUM SOFTWARE = T
First three letters of street name: 25 LINNELL CIRCLE = LIN | First three letters of street name: 164 LEXINGTON ROAD = LEX
Name form type (1=individual, 2=business): “1” for individuals | Two least significant digits of house number: 164 = 64
Resulting window key: 018SMLIN1 | Resulting window key: 018TLEX64
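The two example keys can be reproduced with the following Python sketch. It assumes the postal code, last name or business name, street name, and house number have already been parsed into separate fields, and it treats only A, E, I, O, and U as vowels; the function names are hypothetical and do not correspond to CFWINKEY routines.

```python
VOWELS = set("AEIOU")  # assumption: Y is treated as a consonant


def name_code(last_name: str) -> str:
    """First two consonants, or first vowel + first consonant if the name starts with a vowel."""
    letters = [c for c in last_name.upper() if c.isalpha()]
    consonants = [c for c in letters if c not in VOWELS]
    if letters and letters[0] in VOWELS:
        return letters[0] + (consonants[0] if consonants else "")
    return "".join(consonants[:2])


def individual_window_key(postal_code: str, last_name: str, street_name: str) -> str:
    # postal code (3) + name code (2) + street name (3) + name form type "1"
    return postal_code[:3] + name_code(last_name) + street_name.upper()[:3] + "1"


def business_window_key(postal_code: str, business_name: str,
                        street_name: str, house_number: str) -> str:
    # postal code (3) + first letter of business name + street name (3)
    # + two least significant digits of the house number
    return (postal_code[:3] + business_name.strip().upper()[:1]
            + street_name.upper()[:3] + house_number.zfill(2)[-2:])
```

The same individual rule yields the candidate codes in the sample transaction file later in this chapter, such as 021SMBEA1 for SMITH at 38 BEAL STREET, WINTHROP MA 02152.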

Reference Matching

For reference matching, the Matcher reads records one at a time from an input file that contains candidate records. Records from a reference file that have the same window key as the input record are read and stored in the window. The input record is then compared to every record in the window using the fields and comparison routines named in the match parameters.

The successful or unsuccessful comparison of each field generates a numeric grade for each comparison. If a match is determined, a key is copied from the reference record to the input record. If no match is found, a new key is generated and stored on the input record and the record is written to the output file.
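A simplified Python sketch of this flow follows. It indexes the reference records by window key, takes the first matching reference record instead of ranking matches by pattern ID, fills only a blank household_number, and relies on hypothetical `is_match` and `new_key` callables (the latter could wrap the CYCLE/BASE rule); none of these names come from the product.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def reference_match(candidates: List[dict], reference: List[dict],
                    is_match: Callable[[dict, dict], bool],
                    new_key: Callable[[], str],
                    key_field: str = "window_key_01") -> List[dict]:
    """Fill a blank household_number from the first matching reference record,
    or from new_key() when no reference record in the window matches."""
    windows: Dict[str, List[dict]] = defaultdict(list)
    for ref in reference:                      # group reference records by window key
        windows[ref[key_field]].append(ref)

    updated = []
    for cand in candidates:                    # one candidate record at a time
        rec = dict(cand)
        window = windows.get(rec[key_field], [])
        match = next((ref for ref in window if is_match(rec, ref)), None)
        if not rec.get("household_number", "").strip():
            rec["household_number"] = (match["household_number"] if match
                                       else new_key())
        updated.append(rec)
    return updated
```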


About Reference Matching

Each record in the master file that contains the same window key as the candidate record is compared to the candidate record.

If a record in the master file matches the candidate record at the household level and the household_number field of the candidate record is blank, the household_number from the master file record is copied to the household_number field of the candidate record.

If a candidate record does not match any record in the master file at the household level, it will be assigned a household_number from the base and cycle parameters in the Matcher Driver parameter file.

If a record in the master file matches the candidate record at the individual level, the household_number is the same (as established at the household level above), and the individual_number on the candidate record is blank, the individual_number from the master file record is copied to the individual_number field of the candidate record.

After all records have been through the reference matching process outlined above, normal window matching takes over on the candidate records, using the matching criteria set in the Matcher parameter files. This process attempts to match all records within a window to all other records within the same window.

During the window matching process, after all household and individual linkages have been established, the household_number field and the individual_number field are examined.

The first non-blank household_number within a household is propagated to the household_number field of all other records in the same household.

The first non-blank individual_number within a household is propagated to any other record in the household that has a blank individual_number and the same individual linkage ID (established during window matching process).
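The two propagation rules can be illustrated with a short Python sketch operating on one matched household. The function name is hypothetical, and the `indv_link` field is a hypothetical stand-in for the individual linkage ID established during window matching.

```python
from typing import List


def propagate_numbers(household: List[dict]) -> None:
    """Fill linkage numbers within one matched household, in place."""
    # The first non-blank household number goes to every record in the household.
    hh = next((r["household_number"] for r in household
               if r["household_number"].strip()), "")
    if hh:
        for r in household:
            r["household_number"] = hh

    # Within each individual linkage, the first non-blank individual number
    # fills only the records whose individual number is blank.
    first_indv = {}
    for r in household:
        if r["individual_number"].strip():
            first_indv.setdefault(r["indv_link"], r["individual_number"])
    for r in household:
        if not r["individual_number"].strip() and r["indv_link"] in first_indv:
            r["individual_number"] = first_indv[r["indv_link"]]
```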

For each transaction record, as it is compared to the reference records, the Matcher maintains a list of reference matches ordered by pattern ID, with the lowest IDs placed at the start of the list. In the case of duplicate pattern IDs, the duplicate is inserted after the previous match. When writing to the optional reference output file with the ALL_REF_MATCH setting set to 'Y', this entire list of matches is written to the output file, beginning with the first match with the lowest ID. If the ALL_REF_MATCH setting is set to 'N', only the first match with the lowest ID in the list is written to the output file. For information on ALL_REF_MATCH, see “Matcher Parameters” on page 7-13. When transferring data from reference records to transaction records, the data from the reference record of the first match with the lowest ID in the list is copied to the transaction record.
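The ordering rules above can be sketched with Python's `bisect` module: `bisect_right` places a duplicate pattern ID after the existing entries with the same ID. The class and method names are hypothetical, and reference records are represented by plain strings for brevity.

```python
from bisect import bisect_right
from typing import List, Tuple


class ReferenceMatchList:
    """Reference matches for one transaction record, ordered by pattern ID."""

    def __init__(self) -> None:
        self._ids: List[int] = []
        self._refs: List[str] = []

    def add(self, pattern_id: int, ref_record: str) -> None:
        # Lowest IDs first; a duplicate ID goes after the earlier matches with that ID.
        pos = bisect_right(self._ids, pattern_id)
        self._ids.insert(pos, pattern_id)
        self._refs.insert(pos, ref_record)

    def to_output(self, all_ref_match: str = "N") -> List[Tuple[int, str]]:
        """ALL_REF_MATCH 'Y' writes every match; blank or 'N' writes only the first."""
        pairs = list(zip(self._ids, self._refs))
        return pairs if all_ref_match == "Y" else pairs[:1]

    def source_for_transfer(self) -> str:
        """Reference record whose data is copied to the transaction record."""
        return self._refs[0]
```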


Reference Matching Input and Output

Input
Master file records and candidate records | The data must be the same shape and fixed-field, and the record length must be 32,400 bytes or fewer. Data that has been processed by the Parser is correctly formatted and may be used directly.
Field and pattern parameter files | Parameter lists delivered to the Matcher by the driver program include Field/Comparison Routine lists and Grade Pattern lists (such as pfinflds.par, pfinpats.par, and pfhhpats.par).
Driver parameter file (pfmatdrv.par) | Parameters used to define all Matcher input and output files, type of matching to perform, records to process, and so on. See the specific reference matching parameters and parameter files on the following pages.
Output
Summary statistics | This output file includes execution statistics, match pattern distribution lists, and individual window statistics.
Updated records | All candidate records are updated with the results of the reference and window matches.


Candidate Matching Information (CMI) Parameter File

The CMI parameter file is specified in the parameter CMI_FNAME. This file contains the fields required for batch reference matching.

Name | Description
household_number | Field in the master file that contains existing household numbers. Also the location where existing household numbers are stored on the candidate record if a match occurs. If there is no match, the BASE and CYCLE parameters are applied to the matched_hhld field and stored in the field specified by this parameter. Maximum field length is 30 bytes.
individual_number | Field in the master file that contains existing individual numbers. Also the location where existing individual numbers are stored on the candidate record if a match occurs. If there is no match, a number generated from the BASE and CYCLE parameters is stored in the field specified by this parameter. Maximum field length is 30 bytes.
record_id | User-defined numeric field (for example, seqno) within the candidate and master files. Must be unique within both the transaction and master files. Maximum field length is 30 bytes.

The household_number and individual_number fields point to the existing linkage numbers on the master file. Resulting linkage numbers are also placed in these fields on the output file. Add the field names in the CMI parameter file to the corresponding DDL file before attempting reference matching.

For example, use opt# fields for the household and individual number parameters and the existing seqno field for the record identifier. Note that the reference match output fields cannot be the same fields used in the window match (the Matcher Return Area (MATCHRET) of the DDL); two new fields are required for the output of the reference match.


Sample CMI Parameter File

    *******************************************************
    * CANDIDATE MATCHING INFORMATION (CMI) PARAMETER FILE *
    *******************************************************
    HOUSEHOLD_NUMBER FLDNAMES(OUR_HHLD_ID)
    INDIVIDUAL_NUMBER FLDNAMES(OUR_INDV_ID)
    RECORD_ID FLDNAMES(SEQNO)

Sample Transaction File

NAME AND ADDRESS | CANDIDATE CODE | HOUSEHOLD NUMBER | INDIVIDUAL NUMBER | RECORD ID
JOHN C SMITH, 38 BEAL STREET APT 1, WINTHROP MA 02152 | 021SMBEA1 | 9436500001 * | 9436500045 * | 1
DON SMITH, 38 BEAL STREET APT 1, WINTHROP MA 02152 | 021SMBEA1 | 9436500001 * | 9535300002 ** | 2
J C SMITH, 38 BEAL STREET APT 2, WINTHROP MA 02152 | 021SMBEA1 | 9535300003 ** | 9535300003 ** | 3
ANITA JONES, 99 BEACON STREET, BOSTON MA 02108 | 021JNBEA1 | 9436500002 * | 9436500047 * | 4
WILLIAM TAFT, 2 GLORIA AVENUE, TYNGSBORO MA 01879 | 018TFGLO1 | 9535300005 ** | 9535300005 ** | 5
MARY TAFT, 2 GLORIA AVENUE, TYNGSBORO MA 01879 | 018TFGLO1 | 9535300005 ** | 9535300006 ** | 6

* COPIED FROM MASTER FILE
** ASSIGNED USING BASE & CYCLE PARAMETERS
*** WINDOW KEY ROUTINE 1 USED TO CREATE WINDOW KEYS


Matcher Driver Program

The primary Matcher driver program is called CFMATDRV.


Matcher Driver Input and Output

The Matcher driver uses the following input and output files:

Input | Driver parameter file | pfmatdrv.par
Input | DDL | parsout.ddl
Input | Input file | wkeyout (usually the output file from the Window Key Generator)
Input | Matcher Field List and Pattern parameter files | pfinpats.par, pfinflds.par, pfhhpats.par, pfhhflds.par, pfcopats.par, pfcoflds.par, and pfccflds.par
Output | Output file | maout
Output | Output statistics file | mastat.txt
Output | Matcher link file | malink

Matcher Parameters

Required parameters are marked REQUIRED in the table; all remaining parameters are optional.

Table 7.1 Matcher (CFMATDRV) Parameters

Name Values Description

ALL_REF_MATCH Y or blank/N Y=Enables the Matcher to identify all matches when a transaction record matches more than one record in the reference file. Blank/N (default)=Does not attempt to match any additional records in the reference file after matching one record.

BASE Numeric User-defined starting number for new records; used for Reference matching only.

CAND_CODE_FIELD field name Name of the window key field in the DDL (such as window_key_01)

CMI_FNAME file name Identifies the parameter file specifying the additional required fields used for reference matching (Candidate Matching Information (CMI)). REQUIRED for reference matching.


COMM_FLDS_FNAME file name Name of the commercial household field list parameter file. For example, pfcoflds.par.

COMM_PATS_FNAME file name Name of the commercial household pattern list parameter file. For example, pfcopats.par. REQUIRED.

COMN_FLDS_FNAME file name Name of the Common Data segment field list. For example, pfccflds.par. REQUIRED.

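CYCLE date User-defined string used to identify new (unmatched) records when their keys are generated. Defaults to the Julian system date (YYDDD). Used for Reference matching only.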

DDL_INP_FNAME DDL name Name of the input DDL file.

DDL_INP_RNAME DDL name Record name of the input DDL.

DDL_MAT_FNAME DDL name Name of the Matcher return area DDL file.

DDL_MAT_RNAME DDL name Record name of the Matcher return area DDL.

DDL_OUT_FNAME DDL name Name of the output DDL file.

DDL_OUT_RNAME DDL name Record name of the output DDL.

DEBUG Y Y writes match comparison details, showing the resulting pattern for every comparison that occurred, to the Matcher statistics file. Use only to interpret Matcher results or for debugging; this option creates an extremely large statistics file.

HHLD_FLDS_FNAME file name Name of the retail household field list parameter file. For example, pfhhflds.par. REQUIRED.

HHLD_PATS_FNAME file name Name of the retail household pattern list parameter file. For example, pfhhpats.par. REQUIRED.

INDV_FLDS_FNAME file name Name of the retail individual field list parameter file. (pfinflds.par)

INDV_PATS_FNAME file name Name of the retail individual pattern list parameter file. (pfinpats.par) REQUIRED


INP_DDNAME file name Name of the input file. REQUIRED

LARGE_WINDOW_KEY_NUMBER* numeric The window key number (from the create window key process) of the large window key entries, in the file specified in TABLE_DDNAME, that must be processed for this match.

LINK_DDNAME file name Name of the link file produced by the Create Common function.

LINKS_SELECT P, S, B Allows the selection of links that are to be written to the file designated by the LINK_DDNAME parameter. If this is not used, then only passing links will be written to the LINK_DDNAME file. If the LINK_DDNAME parameter is used, the following are valid options: P=Only passing links are written (Default) S=Only suspect links are written B=Both passing and suspect links are written

MAX_WINDOW_SIZE* numeric Controls how many records are added to the Match window. If there are more records with one window key than the value of this parameter, additional windows are created for the remaining records. For example, if 1000 records share a window key and this parameter is set to 500, an additional Match window is created for the remaining 500 records.

Records do not get compared between Match windows. If windows split, details about the split windows are displayed to STDERR.

MAX_WINDOW_SIZE_DETAILS Y, N Turns off printing of split window information.

MAXIN numeric Maximum number of records to read; if blank, all records in the file are read.

NAME_FORM_FIELD field name Populated by the Parser and used by the Matcher to distinguish between a Retail and Commercial record. Default field name is pr_nmform_01. REQUIRED


OUT_DDNAME file name Name of the Matcher output file. REQUIRED

OVERRIDE_NMFORM R, C Overrides the NAME_FORM_FIELD value. R=Records loaded into Retail matching window C=Records loaded into Commercial matching window.

PRINT_NTH_COUNT numeric Prints a running count after every nth record read. If 0 or not specified, no counts are printed.

REF_DDNAME file name Identifies the file name of the reference file. Used with Reference matching only.

REFHHPAT field name DDL field where household pattern IDs are written for output. Used for Reference matching only.

REFINPAT field name DDL field where individual pattern IDs are written for output. Used for Reference matching only.

REFMAT_DDNAME file name A second output file for the reference match; contains all records from the reference file that had a matching record in the transaction file.

SECOND_SIDE_TRN Y Y=Creates a reciprocal link and writes it to the output file specified in the parameter LINK_DDNAME.

SELBYP_PARMNAME file name Name of the parameter file that contains the rules for applying bypass and select logic. See “Using Record Select and Bypass Functionality with the Matcher” on page 7-21. REQUIRED

SELBYP_LOGFNAME file name Name of the log file that contains the statistics of the applied bypass and select rules for the processed data. See “Using Record Select and Bypass Functionality with the Matcher” on page 7-21. REQUIRED

SKIP_WINDOW_MATCH Y Y=Perform reference matching only. Do not perform a subsequent window match.

START numeric Indicates the record number from which to start processing.

STAT_DDNAME file name Name of the statistics file. REQUIRED


SURVIVOR HHLD or INDV Allows the survivor flag that is set at the individual level to be placed into the household survivor flag location. This does not change the survivor selection criteria. HHLD=Place the flag in the household survivor flag location. INDV=Leave the individual level survivor flag in the individual survivor flag location.

TABLE_DDNAME* file name File containing the table that stores large window key values. Used to limit the number of records added to the Match window in order to minimize memory requirements and optimize performance. Table entries use the following format: Two bytes for the window key number, followed by the Window key value.

TABLE_LRECL numeric Record length of the table that contains large window key entries. Length must include any platform-specific control characters.

TIE_BREAKER_FIELD_NAME* field name DDL field name that contains data used to determine when to stop adding records to the Match window. Records are added until this field contains a value that does not match the records previously added. Then the match is executed for the current set of records.

TRAN_INFO_FIELD_NAME field name Contains a field used to copy data from the transaction record to the reference record on the output file listed in the REFMAT_DDNAME parameter. Use when necessary to determine which transaction record the reference record matched. For example: TRAN_INFO_FIELD_NAME LIST_ID


TRANSITIVE_MODE Y Y=Enables matched records to remain in the window for subsequent matching. With window matching, it is possible that A matches B and B matches C, but A does not match C because of differences in data elements; invoking this option allows B to be compared to C. This parameter can increase matching time because more comparisons are performed. If the LINK_DDNAME output is being created, all additional matching links created by this parameter are generated from the base record in the window, not from the record it matched to. For example: A matches B (creates link A --> B); B matches C (creates link A --> C); A does not match C. If set to Y, do not use propagation.

* These Matcher parameters allow you to add information to your window key. This is useful for keys that contain large numbers of records.
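The effect of TRANSITIVE_MODE on links can be sketched in Python. The sketch assumes that matched records are compared with the remaining records in the order in which they were matched, uses a hypothetical `is_match` predicate in place of the Matcher's comparisons, and shows only the links for the base record's matched set.

```python
from typing import Callable, List, Tuple


def transitive_links(window: List[str],
                     is_match: Callable[[str, str], bool]) -> List[Tuple[str, str]]:
    """Links for the base (first) record's matched set with TRANSITIVE_MODE Y."""
    base, remaining = window[0], list(window[1:])
    links: List[Tuple[str, str]] = []
    pending = [base]                          # matched records still to be compared
    while pending:
        current = pending.pop(0)
        for other in list(remaining):         # iterate over a copy while removing
            if is_match(current, other):
                links.append((base, other))   # every link starts at the base record
                remaining.remove(other)
                pending.append(other)         # stays in the window for more comparisons
    return links
```

Without TRANSITIVE_MODE, B would not be compared to C, so in the A, B, C example only the A --> B link would be produced.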

Handling Large Window Keys

When the Matcher runtime seems very slow, or it looks as though no progress is being made, the cause is usually a large number of records with the same window key value. When this occurs, the Matcher has many more comparisons to make, which slows the match process; this condition can even cause the process to stop altogether. There are several options to improve the situation.
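The arithmetic behind the slowdown is simple: comparing every record in a window with every other record takes n(n-1)/2 comparisons, so the work grows with the square of the window size. The Python sketch below computes this count and, for comparison, the total after a window is split at MAX_WINDOW_SIZE, assuming the split produces consecutive chunks; the function names are hypothetical.

```python
from typing import List


def pairwise_comparisons(window_size: int) -> int:
    """Comparisons needed to match every record in a window with every other one."""
    return window_size * (window_size - 1) // 2


def split_sizes(window_size: int, max_window_size: int) -> List[int]:
    """Sizes of the match windows after splitting at max_window_size (assumed consecutive)."""
    if max_window_size <= 0:
        raise ValueError("max_window_size must be positive")
    full, remainder = divmod(window_size, max_window_size)
    return [max_window_size] * full + ([remainder] if remainder else [])


def split_comparisons(window_size: int, max_window_size: int) -> int:
    """Total comparisons once the window is split; records are never compared across windows."""
    return sum(pairwise_comparisons(n) for n in split_sizes(window_size, max_window_size))
```

For the 1,523-record window in the TUWKANAL example later in this section, `pairwise_comparisons(1523)` gives 1,159,003. Splitting a 1000-record window at 500 cuts the count from 499,500 to 249,500, but because records in different windows are never compared, some matches can be missed.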

Creating Unique Window Keys

The first option is to make the window key more unique. CFWINKEY allows you to create your own custom window keys that qualify records as potential matches. Based on your business requirements, you can include any data component in the window key that will diversify the key, resulting in fewer records with the same key.

Using a Window Key Table

Another option is to use the following Matcher parameters:

TABLE_DDNAME
TABLE_LRECL
TIE_BREAKER_FIELD_NAME
LARGE_WINDOW_KEY_NUMBER

These parameters ensure that an "intelligent" match comparison is performed within a window key value by adding information to particular window keys that you know contain a large number of records.

These parameters work in conjunction with the TUWKANAL utility. The following example shows you how to use those parameters:

1. Run TUWKANAL against the data. In this example, you receive the output shown below, indicating that the top two windows exceed 1000 records. Assume that 1000 is the maximum number of records allowed to share the same window key.

    TOP 2 WINDOWS
    028ALLST1 1523
    028BLLST1 1300

2. Add the entire window key value to a new table, to be used by the Matcher at runtime. In this example, this table is named wkeytable and is defined by the parameters TABLE_DDNAME and TABLE_LRECL.
3. Set the TIE_BREAKER_FIELD_NAME parameter to contain the DDL field name of a data element that distinguishes records within the given window key. The Matcher adds the data contained in this field onto the window key at runtime to make the window key more unique. This improves the speed of the Matcher significantly.
4. Set the LARGE_WINDOW_KEY_NUMBER parameter to contain the window key number from the create window key process.
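The effect of steps 2 through 4 can be sketched in Python: records whose window key appears in the table under the configured window key number have the tie-breaker field's value appended to the key, while all other keys are unchanged. Simple concatenation is an assumption about how the Matcher combines the values, and the function name is hypothetical.

```python
from typing import Set, Tuple


def effective_window_key(record: dict,
                         table: Set[Tuple[str, str]],
                         key_number: str = "01",
                         key_field: str = "window_key_01",
                         tie_breaker_field: str = "pr_best_nbr") -> str:
    """Append the tie-breaker value to window keys listed in the table.

    `table` holds (window key number, window key value) pairs, as read from
    the TABLE_DDNAME file; `key_number` mirrors LARGE_WINDOW_KEY_NUMBER.
    """
    key = record[key_field]
    if (key_number, key) in table:
        return key + record[tie_breaker_field]
    return key  # keys not in the table are unchanged
```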


For this example, pfmatdrv.par would contain the following parameters:

    TABLE_DDNAME ..\tables\wkeytable
    TIE_BREAKER_FIELD_NAME pr_best_nbr
    LARGE_WINDOW_KEY_NUMBER 01
    TABLE_LRECL 11

The table wkeytable contains the following entries:

    01028ALLST1
    01028BLLST1

The first 2 characters of an entry in the table (“01”) refer to the number specified in LARGE_WINDOW_KEY_NUMBER. The remainder of the entry is the window key itself as returned from TUWKANAL. The TABLE_LRECL parameter specifies the length of each entry in the table itself.
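Building the table entries from TUWKANAL counts can be sketched as follows. The function is hypothetical; it assumes the counts have already been read into a dictionary and that TABLE_LRECL includes no platform-specific control characters, as in the 11-byte example above.

```python
from typing import Dict, List


def build_table_entries(window_counts: Dict[str, int], max_records: int,
                        key_number: int, table_lrecl: int) -> List[str]:
    """Entries for window keys holding more than max_records records.

    Each entry is the two-digit window key number followed by the key value,
    and its length is checked against TABLE_LRECL.
    """
    entries = []
    for key, count in sorted(window_counts.items()):
        if count > max_records:
            entry = f"{key_number:02d}{key}"
            if len(entry) != table_lrecl:
                raise ValueError(f"entry {entry!r} is not {table_lrecl} bytes")
            entries.append(entry)
    return entries
```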

Depending upon what information you decide to "add" to the window key, you will need to edit and rerun your sort for the Matcher to avoid splitting records between windows.

For example, if you specify pr_best_nbr as the TIE_BREAKER_FIELD_NAME, you would need to add pr_best_nbr as the second field in the sort directly following the window_key field.

Original sort:
    window_key_01(a), seqno(a), pr_name_number_01(a)
New sort including additional field:
    window_key_01(a), pr_best_nbr(a), seqno(a), pr_name_number_01(a)


Using Record Select and Bypass Functionality with the Matcher

The parameters SELBYP_PARMNAME and SELBYP_LOGFNAME are used to define a subset of parameters that enable the filtering of data as part of the matching process. By including these parameters in the parameter file, users can define the criteria for selecting and/or bypassing specified records to efficiently filter data in conjunction with the Matcher (CFMATDRV).

This functionality can be enabled only through direct parameter file editing. No support for this parameter construction is available through the Control Center.

SELBYP_PARMNAME: Name of the parameter file (the special rules file) that contains the rules to apply to the bypass and select logic.
SELBYP_LOGFNAME: Name of the log file that will contain the statistics of the applied bypass and select rules for the data processed.

Rules File Parameters and Syntax Rules

The rule file defined by SELBYP_PARMNAME includes instructions that enable the select and/or bypass functionality. This file uses the parameters below.

Table 7.2 Rule File Parameters

Parameter Description

INPUT_SELECT Used to select a record for subsequent processing. Must always contain the keyword ‘LIST’ and be followed by one of the associated DDNAME parameters. For example: INPUT_SELECT LIST INP_DDNAME01 NODE (line_02 NESC 'BOSTON')

INPUT_BYPASS Used to bypass a record so that no processing occurs. Must always contain the keyword ‘LIST’ and be followed by one of the associated DDNAME parameters. For example: INPUT_BYPASS LIST INP_DDNAME01 NODE (line_02 NESC 'BOSTON')


OUTFILES This must always contain the keyword ‘LIST’ and be followed by one of the associated DDNAME parameters. For example: OUTFILES LIST OUT_DDNAME01 NODE (LINE_02 NESC 'BOSTON')

REF_DDNAME Contains both input select and input bypass logic. Must always contain the keyword ‘NODE’ followed by the selection or bypass logic. Specifies the position and length (or DDL field name) of the field from the associated input file to scan for a literal string. The string must be enclosed in single quotation (' ') marks. For example: REF_DDNAME NODE (LINE_02 NESC 'BOSTON') REQUIRED only for a Reference match.

INP_DDNAME Contains both input select and input bypass logic. Must always contain the keyword ‘NODE’ followed by selection or bypass logic. Specifies the position and length (or DDL field name) of the field from the associated input file to scan for a literal string. The string must be enclosed in single quotation (' ') marks. For example: INP_DDNAME01 NODE (LINE_02 NESC 'BOSTON') REQUIRED

OUT_DDNAME Contains select logic, as well as the file name of where to store processed records. Must always contain the keyword ‘NODE’ followed by the selection or bypass keyword, followed by the logic statement. Specifies the position and length (or DDL field name) of the field to scan for a literal string. The string must be enclosed in single quotation (' ') marks. For example: OUT_DDNAME NODE SELECT=(1,3 EQ 'JOY') FILE='...\DATA\OUT1.DATA' REQUIRED

REFMAT_DDNAME Contains select logic, as well as the file name of where to store processed records. Must always contain the keyword ‘NODE’ followed by the selection or bypass logic. Specifies the position and length (or DDL field name) of the field from the associated input file to scan for a literal string. The string must be enclosed in single quotation (' ') marks. For example: REFMAT_DDNAME NODE (LINE_02 NESC 'BOSTON') REQUIRED only for a Reference match.


Comparison Operators

The following table lists the supported operators that can be used in rule file parameter entries.

Operator Definition

EQ Equal to

OR OR condition

AND AND condition

GE Greater than or equal to

GT Greater than

LE Less than or equal to

LT Less than

EQSC Equal scan (for a literal that is equal to the length of the field)

NESC Not Equal scan (for a literal that doesn’t equal the length of the field)

Rule File Entries Examples

The following INPUT_SELECT example selects only records in which the name 'JOHN' appears in the DDL field line_01. The EQSC operator scans the entire field for the string 'JOHN' and, if it is found, selects the record for further processing:

INPUT_SELECT LIST INP_DDNAME NODE (line_01 EQSC 'JOHN')


The following INPUT_BYPASS example bypasses all records from the input file defined in the Matcher driver parameter file that meet any of the following criteria:

1. The DDL field line_01a contains the text MR JOHN C.
2. Position 1 for two bytes contains NR, or position 3 for two bytes contains HN.
3. Position 1 for two bytes contains MR.

INPUT_BYPASS LIST INP_DDNAME NODE (line_01a EQ 'MR JOHN C') NODE (1,2 EQ 'NR' OR 3,2 EQ 'HN') NODE (1,2 EQ 'MR')
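To make the evaluation order concrete, the following Python sketch evaluates this INPUT_BYPASS example: conditions inside a NODE are joined by OR, and a record is bypassed if any NODE is true. It is an assumption-laden model, not the Matcher's rule engine. Records are fixed-width strings, the DDL field line_01a is assumed to occupy positions 1 through 30, trailing blanks are ignored, comparisons are plain string comparisons, EQSC and NESC are treated as substring tests, and AND is not modeled.

```python
from typing import Callable, Dict, List, Tuple

# (position, length, operator, literal); position is 1-based as in the rule file.
Condition = Tuple[int, int, str, str]

OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "EQ": lambda value, lit: value == lit,
    "GE": lambda value, lit: value >= lit,
    "GT": lambda value, lit: value > lit,
    "LE": lambda value, lit: value <= lit,
    "LT": lambda value, lit: value < lit,
    "EQSC": lambda value, lit: lit in value,      # scan: literal found in field
    "NESC": lambda value, lit: lit not in value,  # scan: literal not found
}


def condition_holds(record: str, cond: Condition) -> bool:
    pos, length, op, literal = cond
    value = record[pos - 1:pos - 1 + length].rstrip()  # assumption: ignore trailing blanks
    return OPERATORS[op](value, literal)


def node_holds(record: str, node: List[Condition]) -> bool:
    """A NODE whose conditions are joined by OR."""
    return any(condition_holds(record, cond) for cond in node)


def should_bypass(record: str, nodes: List[List[Condition]]) -> bool:
    """INPUT_BYPASS: drop the record if any NODE is true."""
    return any(node_holds(record, node) for node in nodes)


# The INPUT_BYPASS example, with line_01a assumed to occupy positions 1-30.
BYPASS_NODES: List[List[Condition]] = [
    [(1, 30, "EQ", "MR JOHN C")],
    [(1, 2, "EQ", "NR"), (3, 2, "EQ", "HN")],
    [(1, 2, "EQ", "MR")],
]
```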

In the following example, the OUTFILES entry writes the records selected by the SELECT operation to the file OUT.DATA:

OUTFILES LIST OUT_DDNAME NODE SELECT=(1,3 EQ 'JOY'), FILE='..\DATA\OUT.DATA'


Sample Matcher Parameter File

The following parameter file runs the Matcher in reference match mode:

    *************************************************
    * PFMATDRV - Matcher Driver parameter file
    *************************************************
    MAXIN
    MAX_WINDOW_SIZE 300
    DDL_INP_FNAME ..\dict\parsout.ddl
    DDL_INP_RNAME PARSOUT
    DDL_MAT_FNAME ..\dict\matchret.ddl
    DDL_MAT_RNAME MATCHRET
    DDL_OUT_FNAME ..\dict\parsout.ddl
    DDL_OUT_RNAME PARSOUT
    INP_DDNAME ..\data\geout.srt
    REF_DDNAME ..\data\geout.ref
    OUT_DDNAME ..\data\maout.ref
    STAT_DDNAME ..\data\mastat.txt
    HHLD_PATS_FNAME ..\parms\pfhhpats
    HHLD_FLDS_FNAME ..\parms\pfhhflds
    INDV_PATS_FNAME ..\parms\pfinpats
    INDV_FLDS_FNAME ..\parms\pfinflds
    COMM_PATS_FNAME ..\parms\pfcopats
    COMM_FLDS_FNAME ..\parms\pfcoflds
    COMN_FLDS_FNAME ..\parms\pfflds
    NAME_FORM_FIELD pr_nmform_01
    CAND_CODE_FIELD window_key_01
    CMI_FNAME ..\parms\cmifile
    BASE 0
    CYCLE 95088
    SURVIVOR INDV

The last line of all parameter files must end with a carriage return and/or line feed so that the system can process the last parameter or table entry in the file. If it is omitted, a Parameter Processing error status=9 occurs. Do not use tabs in parameter files; only spaces are valid. If tabs are used, a Parameter Processing error status=4 occurs.
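These two formatting rules can be checked before a run with a small Python sketch. The function is hypothetical: it reads NAME value lines, skips comment lines beginning with an asterisk, and reports the status codes described above (4 for tabs, 9 for a missing final line terminator). It does not reproduce the Matcher's own parameter processing.

```python
from typing import Dict, List, Tuple


def parse_param_file(text: str) -> Tuple[Dict[str, str], List[int]]:
    """Parse 'NAME value' lines and report the documented formatting errors."""
    errors: List[int] = []
    if "\t" in text:
        errors.append(4)                  # tabs are not allowed
    if text and not text.endswith(("\n", "\r")):
        errors.append(9)                  # last entry lacks a line terminator
    params: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("*"):
            continue                      # blank or comment line
        name, _, value = line.partition(" ")
        params[name] = value.strip()      # a bare name such as MAXIN gets ""
    return params, errors
```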


About Matching Levels

The matching process includes the ability to match at three levels:

Household
Individual
Commercial

Both suspect (possible matches) and matched (definite matches) are available at all levels. The Matcher supports a different set of parameters for personal matching versus commercial matching, and for individual matching versus household matching. There must be a household match for an individual match to be attempted.

Matcher Input 1 Field Comparison Routine Lists

Field Comparison Routine List files contain entries that tell the Matcher what fields to compare between two records, what matching algorithm to apply to those fields, and how to convert the returned score into a letter grade. The grade is then used with other field comparison results to create a pattern that is used to determine the overall match result of the records being compared.

Matcher Field/Comparison parameter files

pfccflds.par Common Data Segment Field List parameter file

pfcoflds.par Commercial Household Field List parameter file

pfhhflds.par Retail Household Field List parameter file

pfinflds.par Retail Individual Field List parameter file

Special operations like Matcher Early Exit and Propagation are also defined in these files. There is one Field Comparison Routine List file for each level of matching. (For example, household, individual and commercial match comparisons each use a separate file.)


Defining Field/Comparison Routine List Entries

There are two ways to define the same Field/Comparison Routine List entry:

Example 1 SCORES(100,95,92,00,00) ROUTINES(APTNO,NONZERO) FLDNAMES(HH_NUM)

Example 2 SCORES(100,95,92,00,00) ROUTINES(APTNO,NONZERO) FLDLOCS(034,010)

The following rules apply to field/comparison routine list entries:

Keyword/value pairs must be separated from each other by a space.

Multiple values within a keyword's parentheses must be separated by commas.

There can be up to fifty entries specified in the list.

All Field Position, Field Length pairs must be enclosed in parentheses.
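The rules above, together with the keyword table that follows, can be checked mechanically. This Python sketch parses a single entry into its keywords and flags a few rule violations. `parse_field_entry` and `validate_field_entry` are hypothetical helpers, not Trillium APIs. The sketch allows zero-valued thresholds because the examples in this section use them.

```python
import re

# KEYWORD(values); a space before "(" is tolerated, as in "FLDNAMES (PR_BUSNAME_01)".
KEYWORD_RE = re.compile(r"([A-Za-z_]+)\s*\(([^)]*)\)")
PROPAGATION_ROUTINES = {"NONBLANK", "NONZERO"}


def parse_field_entry(line: str) -> dict:
    """Parse one Field/Comparison Routine List entry into {KEYWORD: [values]}."""
    text = line.strip()
    entry = {"IDENTIFIER": None}
    if text and KEYWORD_RE.match(text) is None:
        # A leading word that is not KEYWORD(...) is the entry's label, e.g. BUSINESS_NAME.
        entry["IDENTIFIER"], _, text = text.partition(" ")
    for keyword, body in KEYWORD_RE.findall(text):
        entry[keyword.upper()] = [v.strip() for v in body.split(",") if v.strip()]
    return entry


def validate_field_entry(entry: dict) -> list:
    """Return a list of rule violations; an empty list means the entry looks valid."""
    problems = []
    scores = [int(v) for v in entry.get("SCORES", [])]
    if "SCORES" in entry and not 1 <= len(scores) <= 5:
        problems.append("SCORES needs 1 to 5 thresholds")
    if any(s < 0 for s in scores[:4]):
        problems.append("only the fifth (E) threshold may be negative")
    routines = [r.upper() for r in entry.get("ROUTINES", [])]
    if len(routines) > 2:
        problems.append("ROUTINES allows at most two names")
    if len(routines) == 2 and routines[1] not in PROPAGATION_ROUTINES:
        problems.append("second routine must be NONBLANK or NONZERO")
    if "FLDNAMES" in entry and "FLDLOCS" in entry:
        problems.append("use FLDNAMES or FLDLOCS, not both")
    if len(entry.get("FLDNAMES", [])) > 3:
        problems.append("FLDNAMES allows at most three names")
    locs = entry.get("FLDLOCS", [])
    if locs and (len(locs) % 2 or len(locs) > 6 or any(int(v) <= 0 for v in locs)):
        problems.append("FLDLOCS needs 1 to 3 positive position,length pairs")
    return problems


entry = parse_field_entry("SCORES(100,95,92,00,00) ROUTINES(APTNO,NONZERO) FLDLOCS(034,010)")
print(entry["ROUTINES"], entry["FLDLOCS"])  # ['APTNO', 'NONZERO'] ['034', '010']
print(validate_field_entry(entry))          # []
```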

The table below explains each syntax part in the preceding parameter entry and all options that can be applied.

SCORES
Up to five numeric grade thresholds, separated by commas. Example: SCORES(100,95,92,00,00) ROUTINES(APTNO,NONZERO) FLDNAMES(HH_NUM). The first four values must be zero or positive. Only the last value may be negative (see "Match-Testing Early Exit" later in this chapter). At least one score must be given.

ROUTINES
Up to two routine names (character), separated by commas, in the format: ROUTINES(comparison routine,optional propagation routine). Example: SCORES(100,95,92,00,00) ROUTINES(APTNO,NONZERO) FLDNAMES(HH_NUM). The first routine name specifies the comparison routine; the second specifies the propagation routine. Valid propagation routine names are NONBLANK and NONZERO. A complete list of Matcher routines can be found in the section "Using Parmvals with the Matcher Comparison Routines" later in this chapter.


FLDNAMES
Up to three field names (character), separated by commas, in the format: FLDNAMES(field name from DDL,field name from DDL). Example: SCORES(100,95,92,00,00) ROUTINES(APTNO,NONZERO) FLDNAMES(HH_NUM). Field names are used only when a DDL is used, and they must exactly match the DDL listing. FLDNAMES is not used when FLDLOCS is used.

FLDLOCS
Up to three field position/field length pairs (positive numeric values). The values within a pair, and the pairs themselves, are separated by commas, in the format: FLDLOCS(Field Starting Position,Field Length,Field Starting Position,Field Length). Example: SCORES(100,95,92,00,00) ROUTINES(APTNO,NONZERO) FLDLOCS(034,010). These values give the starting positions and lengths of the specified fields. At least one position/length pair must be given. FLDLOCS is not used when FLDNAMES is used.

PARMVAL
Special options that accompany the match routines. Example: BUSINESS_NAME SCORES(100,90) ROUTINES(BUSNAME) FLDNAMES(PR_BUSNAME_01) PARMVAL(COMPACT). See the section "Using Parmvals with the Matcher Comparison Routines" for more information.

Matcher Input 2

Grade Pattern Lists

Numeric Grades, Grade Patterns and Field/Comparison Routine Listing

For each Field Comparison Routine List, there must be a corresponding Grade Pattern List. The numeric scores generated by matching are translated into alpha (letter) grades according to the grade threshold (SCORES) parameters.

The letter grades from the individual field comparisons are combined into a grade pattern. That pattern determines a pass/suspect/fail state for the pair of records by searching the pass/suspect/fail table (the Grade Pattern List). For the scoring performed by each comparison routine, see "Using Parmvals with the Matcher Comparison Routines" later in this chapter.


Matcher Grade Pattern parameter files:

pfcopats.par Commercial Household Pattern List parameter file

pfhhpats.par Retail Household Pattern List parameter file

pfinpats.par Retail Individual Pattern List parameter file

Grade Pattern List Syntax

The format for each entry is:

Position 1: Single-character Pattern Category (P = PASS, S = SUSPECT, F = FAIL), for example the P in P110AABA.
Positions 2-4: Unique three-character numeric Pattern Identifier, for example the 110 in P110AABA.
Position 5 and up: Character Grade Pattern, for example the AABA in P110AABA. Valid grades are A, B, C, D, E, and a dash (-) as a wildcard character.

The number of grades in a pattern must equal the number of comparison fields specified in the Comparison Routine List. Do not specify delimiters between the values.

Up to 1000 entries can be specified in a Grade Pattern list.


Business Grade Pattern List Example

P110AABA  Record passed and received a score of 100 (an A) on the business name, the street name and the box number, and a score of 98 (a B) on house number.
P118AA-A  Record passed and matched perfectly on business name, street name and box number, while '-' (a wildcard) indicates that any answer for house number would be accepted.
S122AB-A  Record is a suspect match, matching perfectly on business name and box number, getting a score of 96 (a B) on street name and a wildcard for house number.
F141BABB  Record failed with a score of B on business name, house number, and box number and an A (100) on street name.

The corresponding Business Comparison Field Routine list looks like this:

BUSINESS_NAME SCORES(100,90,80) ROUTINES(BUSNAME) FLDLOCS(3451,100)
STREET_NAME SCORES(100,96,80) ROUTINES(STREETS) FLDLOCS(2467,25)
HOUSE_NUMBER SCORES(100,98,90) ROUTINES(HOUSENO) FLDLOCS(1475,15)
BOX_NUMBER SCORES(100) ROUTINES(ABSOLUTE) FLDLOCS(1734,10)

User-defined grade patterns set the criteria that determine whether the match is considered a pass, suspect, or fail.
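The grading and pattern lookup described in this section can be sketched in Python using the business example. This is an illustrative model under stated assumptions, not the Matcher's implementation:

- A score earns the first grade (A to E) whose SCORES threshold it meets or exceeds. This agrees with the example, where 98 is a B under SCORES(100,98,90) and 96 is a B under SCORES(100,96,80).
- A score below every threshold is shown as F.
- Patterns are searched in list order, and the first match is returned.

The helper names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class GradePattern(NamedTuple):
    category: str    # 'P' (pass), 'S' (suspect) or 'F' (fail)
    pattern_id: str  # unique three-digit identifier
    grades: str      # one grade per comparison field; '-' is a wildcard


def parse_pattern(entry: str) -> GradePattern:
    """Split an entry such as 'P110AABA' into category, identifier and grades."""
    entry = entry.strip()
    if len(entry) < 5 or entry[0] not in ("P", "S", "F") or not entry[1:4].isdigit():
        raise ValueError(f"malformed grade pattern entry: {entry!r}")
    return GradePattern(entry[0], entry[1:4], entry[4:])


def grade(score: int, thresholds: List[int]) -> str:
    """Assumed rule: the first grade whose threshold the score meets or exceeds."""
    for letter, threshold in zip("ABCDE", thresholds):
        if score >= abs(threshold):
            return letter
    return "F"  # below every threshold


def find_pattern(observed: str, patterns: List[GradePattern]) -> Optional[GradePattern]:
    """Return the first pattern, in list order, that matches the observed grades."""
    for pattern in patterns:
        if len(pattern.grades) == len(observed) and all(
            p == "-" or p == o for p, o in zip(pattern.grades, observed)
        ):
            return pattern
    return None


# Business example from this section: SCORES thresholds in field-list order.
THRESHOLDS = [[100, 90, 80], [100, 96, 80], [100, 98, 90], [100]]
PATTERNS = [parse_pattern(e) for e in ("P110AABA", "P118AA-A", "S122AB-A", "F141BABB")]

scores = [100, 100, 98, 100]  # business name, street name, house number, box number
observed = "".join(grade(s, t) for s, t in zip(scores, THRESHOLDS))
result = find_pattern(observed, PATTERNS)
print(observed, result.category, result.pattern_id)  # AABA P 110
```

This section does not say what happens when no pattern matches, so `find_pattern` simply returns `None` in that case.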


Matching Prevention

There are instances where a user might choose to prevent a match from occurring between records that have equal field values. For example, a user might choose to keep candidate records that have the same account number field value unmatched. However, this objective cannot always be met through pattern list manipulation alone.

Transitivity occurs when two records are matched together indirectly via a third record. To achieve matching prevention, you need to override the matching propagation feature by using the PREVENT match routine.

Using the PREVENT Match Routine

Matching prevention is activated through the comparison routine PREVENT in the field list parameter file. The field to which this name is attached is the field that will prevent a match from occurring when an equal field value is located on any other record in the candidate match group. The Matching Prevention feature can be employed during Household, Individual, and/or Commercial Matching.

An example of a Field list entry to prevent matching is: FLDNAMES(ACCT_NUMBER) ROUTINES(PREVENT)

This feature requires a positive action (pass or fail) through the matching patterns. For this reason, PREVENT can also be used positively to force a match when any member of the match group has the same value. See the section "PREVENT Routine" on page 7-152 for more information.

Only one PREVENT can be used per field list.


Matching Propagation

Transitivity

Transitivity occurs when two records are matched together indirectly through a third record. For this reason, the Matcher includes a matching propagation feature. For example:

A. JOHN SMITH 15 MAIN ST
B. JOHN SMITH 15 MAIN ST APT 1
C. JOHN SMITH 15 MAIN ST APT 2

Propagation

If ‘A’ versus ‘B’ is allowed to match and ‘A’ versus ‘C’ is allowed to match, then ‘B’ and ‘C’ would be matched, which generally is not desirable. Propagation solves this problem.

When ‘A’ is missing the apartment number and matches ‘B’, which contains an apartment number, ‘A’ gets a temporary copy of ‘B’s apartment number for future comparisons. This causes the comparison of ‘A’s apartment number with ‘C’s apartment number to fail. For example:

A. JOHN SMITH 15 MAIN ST APT 1 (TEMPORARY COPY FROM ‘B’)
B. JOHN SMITH 15 MAIN ST APT 1
C. JOHN SMITH 15 MAIN ST APT 2

Valid propagation routines are: NONBLANK, NONZERO.
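Here is a sketch of NONBLANK propagation on the apartment example above, in Python. The comparison rule is a toy assumption: a blank apartment number matches anything, and otherwise the values must be equal. The reading of NONZERO as also treating all-zero values as empty is also an assumption. Temporary copies are kept in a separate `working` dictionary so the input records are not changed. The function names are hypothetical.

```python
def is_empty(value: str, routine: str) -> bool:
    """NONBLANK treats blanks as empty; NONZERO (assumed) also treats all-zero values as empty."""
    value = value.strip()
    if routine == "NONZERO":
        return value == "" or set(value) == {"0"}
    return value == ""


def compare_apartments(working: dict, x: str, y: str, routine: str = "NONBLANK") -> bool:
    """Compare two records' apartment numbers and propagate on a match.

    Toy comparison rule (an assumption): an empty value matches anything,
    otherwise the values must be equal.
    """
    a, b = working[x], working[y]
    matched = is_empty(a, routine) or is_empty(b, routine) or a == b
    if matched:
        # Give the empty side a temporary copy of the other side's value.
        if is_empty(a, routine) and not is_empty(b, routine):
            working[x] = b
        elif is_empty(b, routine) and not is_empty(a, routine):
            working[y] = a
    return matched


original = {"A": "", "B": "APT 1", "C": "APT 2"}
working = dict(original)  # temporary copies; the input records are not changed
print(compare_apartments(working, "A", "B"), working["A"])  # True APT 1
print(compare_apartments(working, "A", "C"))                # False
```

Without the temporary copy, A would also match C, and B and C would then be linked through A.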


Determining the Minimal Occurrence Influence

The concept of minimal occurrence influence (MOI) takes into account how frequently a data element occurs across a group of potential match candidates. Before records are matched, the user can run a frequency count of the data field and append a “0” to every record that they determine has a high probability of matching another record. This “0” flags the record as a “high probability match.”

During the match process, if two records are compared, and both records contain the “probabilistic flag” (0), and an “A” grade is returned as a result, the Matcher will convert the “A” grade to an “M”. If there is no pattern with “M”, then the “M” will be treated as an “A.”

All match routines (except SUBSTRING) can add an “MOI” element to the match decision process. (Match Routines are explained in the section “Using Parmvals with the Matcher Comparison Routines” later in this chapter.)

To include this option, the entry in the Matcher field list parameter file for the match routine is modified to include the DDL field name that contains the “probabilistic flag.”

As an example, the field list parameter file entry for the FRSTNAME match routine using the probabilistic flag would be as follows:

FIRST_NAME SCORES(98,96,90) ROUTINES(FRSTNAME) FLDNAMES (pr_first_display_01,P_CODE_FIELD)

In this example, the DDL field name p_code_field will contain the code (0) on selected records. For example, if the code was assigned as shown below, and the comparison of the first name fields returned an “A”, then the grade would be converted to an “M” and a pattern using this grade would be looked up.

DDL Field    pr_first_display    p_code_field
Record 1     John
Record 2     Jhona               0
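The grade conversion and the fallback rule ("if there is no pattern with M, the M is treated as an A") can be sketched as follows. The function names and the two-field pattern list are hypothetical. The sketch assumes the fallback works by retrying the lookup with each M replaced by A when no pattern matches the M grade.

```python
from typing import List, Optional


def apply_moi(grade: str, flag_a: str, flag_b: str, code: str = "0") -> str:
    """Turn an 'A' into an 'M' when both records carry the probabilistic flag."""
    if grade == "A" and flag_a.strip() == code and flag_b.strip() == code:
        return "M"
    return grade


def lookup(observed: str, patterns: List[str]) -> Optional[str]:
    """First pattern entry (e.g. 'P201MB') whose grades match; '-' is a wildcard."""
    for entry in patterns:
        grades = entry[4:]
        if len(grades) == len(observed) and all(p in ("-", o) for p, o in zip(grades, observed)):
            return entry
    return None


def lookup_with_moi(observed: str, patterns: List[str]) -> Optional[str]:
    """Try the pattern with 'M' grades first, then treat each 'M' as an 'A'."""
    hit = lookup(observed, patterns)
    if hit is None and "M" in observed:
        hit = lookup(observed.replace("M", "A"), patterns)
    return hit


patterns = ["P201MB", "S203A-"]               # hypothetical two-field pattern list
first_name_grade = apply_moi("A", "0", "0")   # both records flagged, so "M"
print(lookup_with_moi(first_name_grade + "B", patterns))  # P201MB
print(lookup_with_moi("MC", patterns))                    # S203A- (M treated as A)
```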


Match-Testing Early Exit

A negative value for the ‘E’ grade threshold in a Field/Comparison Routine List entry activates the Match-Testing Early Exit facility. Using this facility can increase the speed of Matcher execution.

Essentially, whenever the score returned from the comparison routine in the Routine List entry falls below the ABSOLUTE VALUE of the “E” grade threshold, all subsequent Match Testing for the two records being evaluated is suspended.

The match pattern being constructed for the current two records is filled with 'F' (fail) values from this field to the end of the Field/Comparison Routine List.

The ABSOLUTE VALUE of the E grade threshold value must be less than the D grade threshold value.

The following is a Field/Comparison Routine listing showing early exit on the first field, FIRST_NAME, if the matching score falls below 90.

FLDNAMES(FRST_NAME,SOC_SEC_NO) ROUTINE(FRSTNAME,SOCSEC) SCORES(99,98,96,95,-90)
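The early-exit rule can be sketched in Python. The grading rule and the function names are assumptions for illustration. The grading rule takes the first threshold the score meets or exceeds and compares E by its absolute value. The comparison routine is passed in as a callback, so the sketch can show that no further comparisons run after an early exit.

```python
from typing import Callable, List, Sequence


def grade(score: int, thresholds: Sequence[int]) -> str:
    """Assumed rule: first grade whose threshold (absolute value) the score meets."""
    for letter, threshold in zip("ABCDE", thresholds):
        if score >= abs(threshold):
            return letter
    return "F"


def build_pattern(compare: Callable[[int], int],
                  thresholds_by_field: Sequence[Sequence[int]]) -> str:
    """Grade fields in list order, treating a negative E threshold as an early exit."""
    grades: List[str] = []
    for index, thresholds in enumerate(thresholds_by_field):
        score = compare(index)  # run this field's comparison routine
        if len(thresholds) == 5 and thresholds[4] < 0 and score < abs(thresholds[4]):
            # Early exit: 'F' from this field to the end; later fields are never compared.
            grades.extend("F" * (len(thresholds_by_field) - index))
            break
        grades.append(grade(score, thresholds))
    return "".join(grades)


calls: List[int] = []


def fake_compare(index: int) -> int:
    calls.append(index)
    return [85, 100][index]  # hypothetical scores for FRST_NAME and a second field


print(build_pattern(fake_compare, [[99, 98, 96, 95, -90], [100, 90]]))  # FF
print(calls)  # [0]: the second comparison was skipped
```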


Matcher Output 1

Matcher Return Fields

The table below lists the Matcher default fields. These field names appear in the standard parsout DDL under the redefined field, MAT_RECORD.

Commonized means that the named field was copied from the survivor record onto all members of the group (household, individual, or suspect).

Commonization uses fields listed in pfccflds.par.

Matched and Suspect patterns are assigned using the Matcher Pattern Parameter Files (pfhhpats.par, pfinpats.par, and pfcopats.par).

These files are used in conjunction with the Matcher Field Parameter Files (pfhhflds.par, pfinflds.par, and pfcoflds.par).

Field                  Length  Description
Matcher_Common_Data    108     Data is redefined by the following 17 fields:
Hhld_Branch_Number     4       Commonized “branch number” for household.
Hhld_Last_Name         13      Commonized “last name” for household.
Hhld_Street_Name       15      Commonized “street_name” for household.
Hhld_House_Number      4       Commonized “house_number” for household.
Hhld_Postal_Code       9       Commonized “postal_code” for household.
Hhld_Number_Members    8       Number of members in household group.
Hhld_Prime_Flag        1       Household survivor flag: 1=SURVIVOR; 0=NON-SURVIVOR
Indv_Branch_Number     4       Commonized “branch number” for individual.
Indv_First_Name        5       Commonized “first name” for individual.
Indv_Postal_Code       9       Commonized “postal_code” for individual.
Indv_Number_Members    8       Number of members in individual groups.
Indv_Prime_Flag        1       Individual survivor flag: 1=SURVIVOR; 0=NON-SURVIVOR
Susp_Branch_Number     4       Commonized “branch number” for suspect.
Susp_First_Name        5       Commonized “first name” for suspect.
Susp_Postal_Code       9       Commonized “postal_code” for suspect.
Susp_Number_Members    8       Number of members in suspect group.
Susp_Prime_Flag        1       Suspect survivor flag: 1=SURVIVOR; 0=NON-SURVIVOR
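Because the 17 fields have fixed widths, the 108-byte segment can be split by position. This Python sketch makes these assumptions:

- The fields are contiguous and appear in the order listed in the table above, with no filler.
- The caller has already extracted the Matcher_Common_Data bytes from MAT_RECORD.

The names `LAYOUT` and `split_common_data`, and the sample values, are illustrative.

```python
from typing import Dict, List, Tuple

# (field name, length) in the order listed in the table above
LAYOUT: List[Tuple[str, int]] = [
    ("Hhld_Branch_Number", 4), ("Hhld_Last_Name", 13), ("Hhld_Street_Name", 15),
    ("Hhld_House_Number", 4), ("Hhld_Postal_Code", 9), ("Hhld_Number_Members", 8),
    ("Hhld_Prime_Flag", 1),
    ("Indv_Branch_Number", 4), ("Indv_First_Name", 5), ("Indv_Postal_Code", 9),
    ("Indv_Number_Members", 8), ("Indv_Prime_Flag", 1),
    ("Susp_Branch_Number", 4), ("Susp_First_Name", 5), ("Susp_Postal_Code", 9),
    ("Susp_Number_Members", 8), ("Susp_Prime_Flag", 1),
]
SEGMENT_LENGTH = sum(length for _, length in LAYOUT)  # 108


def split_common_data(segment: str) -> Dict[str, str]:
    """Slice a Matcher_Common_Data segment into its 17 fixed-width fields."""
    if len(segment) != SEGMENT_LENGTH:
        raise ValueError(f"expected {SEGMENT_LENGTH} characters, got {len(segment)}")
    fields: Dict[str, str] = {}
    offset = 0
    for name, length in LAYOUT:
        fields[name] = segment[offset:offset + length]
        offset += length
    return fields


# Hypothetical values, space-padded to each field's width.
values = {"Hhld_Last_Name": "SMITH", "Hhld_Prime_Flag": "1", "Indv_Prime_Flag": "0"}
segment = "".join(values.get(name, "").ljust(length) for name, length in LAYOUT)
fields = split_common_data(segment)
print(repr(fields["Hhld_Last_Name"].rstrip()))  # 'SMITH'
print(fields["Hhld_Prime_Flag"])                # 1
```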


Matcher Link Data uses the following 12 fields:

Field                          Length  Description
Window_Number                  8       Unique number appended to records with unique window key.
Matched_Hhld                   8       Unique number appended to records matched (P) at the HOUSEHOLD level.
Matched_Indv_In_Matched_Hhld   8       Unique number appended to records matched (P) at INDIVIDUAL level.
Suspect_Indv_In_Matched_Hhld   8       Unique number appended to records suspected (S) at INDIVIDUAL level.
Suspect_Hhld                   8       Unique number appended to records suspected (S) at the HOUSEHOLD level.
Matched_Indv_In_Suspect_Hhld   8       Unique number appended to records matched (P) at the INDIVIDUAL level (within suspect HOUSEHOLDS).
Suspect_Indv_In_Suspect_Hhld   8       Unique number appended to records suspected (S) at the INDIVIDUAL level (within suspect HOUSEHOLDS).
Record_Number                  8       Record number assigned by Matcher during file read.
Matched_Hhld_Pattern           3       Pattern which determined match (P) at HOUSEHOLD level.
Suspect_Hhld_Pattern           3       Pattern which determined suspect (S) at HOUSEHOLD level.
Matched_Indv_Pattern           3       Pattern which determined match (P) at INDIVIDUAL level.
Suspect_Indv_Pattern           3       Pattern which determined suspect (S) at INDIVIDUAL level.

Matcher Output 2

Matcher Summary Statistics Report

The matching process generates a Summary Statistics file (mastat) on both personal and commercial window statistics and the grade patterns. The user specifies whether or not to produce the summary statistics through input parameters. If summary statistics are desired, the following are displayed for both retail and commercial windows when the Matcher is closed.


TRILLIUM SOFTWARE SYSTEM NAME MATCHER

branch_number fldlocs(2544,4)
business_name fldlocs(1914,100)
first_name fldlocs(1598,15)
gender fldlocs(1813,1)
house_number fldlocs(679,15)
middle_name fldlocs(1628,15)
last_name fldlocs(1703,30)
postal_code fldlocs(1511,9)
street_title fldlocs(720,25)
9 COMMON DATA SEGMENT FIELDS

last_name scores(100,91) routines(spelling) fldlocs(1763,30)
postal_code scores(100,95,90) routines(postcode) fldlocs(237,7)
house_number scores(100,98,85) routines(houseno,nonblank) fldlocs(679,15)
street_name scores(100,89,80) routines(streets) fldlocs(720,25)
box scores(98) routines(houseno,nonblank) fldlocs(942,10)
complex_1 scores(89,80) routines(streets,nonblank) fldlocs(978,25)
complex_2 scores(80) routines(streets,nonblank) fldlocs(1058,25)
dwelling scores(98) routines(aptno,nonblank) fldlocs(1153,10)
city scores(100,0) routines(partial1) fldlocs(1421,30)
hnumb_apt scores(100,99,98) routines(substrng) fldlocs(1153,10,679,15)
10 HOUSE COMPARISON FIELDS

seqno scores(100) routines(prevent) fldlocs(3040,9)
title scores(100,85,0,0,0) routines(prefix) fldlocs(1613,15)
f_name scores(100,98,90,0,0) routines(frstname) fldlocs(1658,15)
first_2on scores(100,0,0,0,0) routines(frstname) fldlocs(1659,14)
mid_nme scores(100,99,98,90,0) routines(frstname) fldlocs(1688,15)
d_o_b_yr scores(100,65,0,0,0) routines(partial1) fldlocs(2950,4)
d_o_b_md scores(100,65,0,0,0) routines(partial1) fldlocs(2954,4)
nino scores(100,65,0,0,0) routines(partial1) fldlocs(2930,20)
gender scores(100,65,0,0,0) routines(partial1) fldlocs(1873,1) parmval(MF)
froot_name scores(100,96,90) routines(frstname) fldlocs(1643,15)
10 INDIVIDUAL COMPARISON FIELDS

business_name scores(100,90,75) routines(busname) fldlocs(1874,100)
postal_code scores(100,95,90) routines(postcode) fldlocs(237,7)
house_number scores(100,98,85) routines(houseno,nonblank) fldlocs(679,15)
street_name scores(89,80) routines(streets) fldlocs(720,25)
box scores(98) routines(houseno,nonblank) fldlocs(942,10)
complex_1 scores(88,80) routines(streets,nonblank) fldlocs(978,25)
complex_2 scores(80) routines(streets,nonblank) fldlocs(1058,25)
dwelling scores(98) routines(houseno,nonblank) fldlocs(1153,10)
city scores(100,0) routines(partial1) fldlocs(1421,30)
9 BUSINESS COMPARISON FIELDS

STATISTICS FOR RETAIL WINDOWS

12,015 TOTAL WINDOWS

19,869 TOTAL RECORDS

13,535 MATCHED HOUSEHOLD GROUPS
16,504 MATCHED INDIVIDUAL GROUPS IN MATCHED HOUSEHOLD GROUPS
16,285 SUSPECT INDIVIDUAL GROUPS IN MATCHED HOUSEHOLD GROUPS
13,378 SUSPECT HOUSEHOLD GROUPS
16,483 MATCHED INDIVIDUAL GROUPS IN SUSPECT HOUSEHOLD GROUPS
16,259 SUSPECT INDIVIDUAL GROUPS IN SUSPECT HOUSEHOLD GROUPS

0 SHORT WINDOWS BECAUSE OF MEMORY
0 TOTAL RECORDS IGNORED

0 TOTAL DELETES
0 TOTAL BAD DELETES
0 TOTAL SINGLETON DELETES

14 MAX WINDOW SIZE
1.65 AVERAGE WINDOW SIZE


STATISTICS FOR COMMERCIAL WINDOWS

23 TOTAL WINDOWS

32 TOTAL RECORDS

29 MATCHED HOUSEHOLD GROUPS
28 SUSPECT HOUSEHOLD GROUPS

0 SHORT WINDOWS BECAUSE OF MEMORY
0 TOTAL RECORDS IGNORED

0 TOTAL DELETES
0 TOTAL BAD DELETES
0 TOTAL SINGLETON DELETES

3 MAX WINDOW SIZE
1.39 AVERAGE WINDOW SIZE

PATTERN GRADE  HOUSEHOLD ID  MATCH HITS  PATTERN VALUE
F              999           13          -D------B-
P              102           305         AAAAAAAA--
P              103           1           ABAAAAAA--
P              104           5           AAABAAAA--
P              105           0           ABABAAAA--
P              106           3           BAAAAAAA--
P              107           0           BBAAAAAA--
P              108           0           BAABAAAA--
P              109           0           BBABAAAA--
P              110           5,410       AAAAABAA--
P              111           9           ABAAABAA--
P              112           67          AAABABAA--
P              113           2           ABABABAA--
P              114           20          BAAAABAA--
P              115           0           BBAAABAA--
P              116           1           BAABABAA--
P              117           0           BBABABAA--
S              118           10          AABCABAA--
S              119           5           ABBCABAA--


PATTERN GRADE  INDIVIDUAL ID  MATCH HITS  PATTERN VALUE
F              999            2,529       A------
P              256            0           -BDBDBBAAA
P              257            2,081       -AAAAAABA-
P              258            469         -AAABAABA-
P              259            48          -AAACAABA-
P              260            17          -AAADAABA-
P              261            63          -AABAAABA-
P              262            19          -AABBAABA-
P              263            1           -AABCAABA-
P              264            1           -AABDAABA-
P              265            0           -ABAAAABA-
P              268            0           -ABADAABA-
P              269            0           -ABBAAABA-
P              270            1           -ABBBAABA-
P              271            0           -ABBCAABA-
P              274            0           -ACABAABA-
P              275            0           -ACACAABA-
P              276            0           -ACADAABA-
P              277            72          -ACBAAABA-
P              278            15          -ACBBAABA-
P              279            9           -ACBCAABA-
P              280            7           -ACBDAABA-
P              281            0           -ADAAAABAA
S              992            0           ------A--
S              993            27          -----AA---
S              994            0           -----BA---
S              995            176         --A------

PATTERN GRADE  BUSINESS ID  MATCH HITS  PATTERN VALUE
F              999          0           -D------B
P              102          3           AAAAAAAA-
P              103          0           ABAAAAAA-
P              104          0           AAABAAAA-
P              105          0           ABABAAAA-
P              106          0           BAAAAAAA-
P              107          0           BBAAAAAA-
P              108          0           BAABAAAA-
P              109          0           BBABAAAA-
P              110          0           AAAAABAA-
P              111          0           ABAAABAA-
P              112          0           AAABABAA-
P              113          0           ABABABAA-
P              114          0           BAAAABAA-
P              118          0           AABAAAAA-
P              119          0           ABBAAAAA-

COUNT OF RETAIL INDIVIDUAL MATCHES BY GROUP SIZE
# OF MEMBERS  # OF GROUPS  TOTAL RECORDS
1             13,590       13,590
2             2,572        5,144
3             255          765
4             71           284
5             11           55
6             4            24
7             1            7
------
28            16,504       19,869

COUNT OF RETAIL INDIVIDUAL SUSPECTS BY GROUP SIZE
# OF MEMBERS  # OF GROUPS  TOTAL RECORDS
1             13,163       13,163
2             2,707        5,414
3             290          870
4             80           320
5             13           65
6             5            30
7             1            7
------
28            16,259       19,869

COUNT OF COMMERCIAL INDIVIDUAL MATCHES BY GROUP SIZE
# OF MEMBERS  # OF GROUPS  TOTAL RECORDS
1             26           26
2             3            6
------
3             29           32

COUNT OF COMMERCIAL INDIVIDUAL SUSPECTS BY GROUP SIZE
# OF MEMBERS  # OF GROUPS  TOTAL RECORDS
1             24           24
2             4            8
------
3             28           32


Details for the Summary Statistics Report

The Matcher Summary Statistics Report includes the following elements:

Heading: Trillium Software, program version, and date and time of printing.

Echoed Input Files: each field echoes the parameter file that determines the following:

Common Data Segment: Common Data Fields
House Comparison: Household Comparison Fields
Individual Comparison: Individual Comparison Fields
Business Comparison: Business Comparison Fields

Statistics from Matcher for Matcher Windows

Total Windows: Indicates the number of Match windows for this run (a Match window is a group of records with the same window key).
Total Records: Total number of retail or commercial records in this run.

Retail and Commercial

A Matched Household or Suspect Household Group is categorized as either:

Retail: Number of matched relationships constructed through analysis of household fields.
Commercial: Number of matched relationships constructed through analysis of household fields.


Group (Type*): Description

Matched Individual Groups in Matched Household Groups (R): Number of matched individual relationships within households, constructed by analyzing individual fields.
Suspect Individual Groups in Matched Household Groups (R): Number of relationships within matched households of matched and suspect individual groups.
Matched Individual Groups in Suspect Household Groups (R): Number of matched individual relationships within suspect households, constructed by analyzing individual fields.
Suspect Individual Groups in Suspect Household Groups (R): Number of suspect individual relationships in suspect household groups, constructed by analyzing individual fields.
Max Window Size (R): Largest number of records encountered with the same window key.
Average Window Size (R): Average number of records encountered with the same window key.

* R=Retail only

Grade Pattern Hits

Grade Pattern List Scores:

‘Scorecard’ for Household, Individual, and Business pattern hits.

Lists the Grade Pattern ID number, the number of hits for each pattern, and the pattern value for this match run.


Retail Individual Matches by Group Size

COUNT OF RETAIL INDIVIDUAL MATCHES BY GROUP SIZE
# OF MEMBERS  # OF GROUPS  TOTAL RECORDS
1             13,590*      13,590
2             2,572**      5,144
3             255***       765
4             71           284
5             11           55
6             4            24
7             1            7
------
28            16,504       19,869

* Indicates that 13,590 records were ‘singletons’, not matching any other record.
** Indicates that 2,572 groups contained two records (each record matched one other record).
*** Indicates that 255 groups contained three records (each record matched two other records), and so on.
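The arithmetic behind these tables can be sketched in Python. Each row multiplies the group size by the number of groups of that size, and the totals line sums every column, so 28 is 1 + 2 + ... + 7. The sketch assumes one group identifier per record, such as a Matched_Indv_In_Matched_Hhld value. `group_size_report` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple

Row = Tuple[int, int, int]  # (# of members, # of groups, total records)


def group_size_report(group_ids: Iterable[Hashable]) -> Tuple[List[Row], Row]:
    """Build report rows and the totals line from one group id per record."""
    members_per_group = Counter(group_ids)                 # group id -> record count
    groups_per_size = Counter(members_per_group.values())  # group size -> group count
    rows = [(size, groups, size * groups) for size, groups in sorted(groups_per_size.items())]
    totals = (sum(r[0] for r in rows), sum(r[1] for r in rows), sum(r[2] for r in rows))
    return rows, totals


rows, totals = group_size_report(["g1", "g2", "g2", "g3", "g3", "g3"])
print(rows)    # [(1, 1, 1), (2, 1, 2), (3, 1, 3)]
print(totals)  # (6, 3, 6)
```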

Count of Retail Individual Suspects by Group Size: Same as above, except that it counts retail suspect records.

Count of Commercial Individual Matches by Group Size: Same as above, except that it counts commercial records.

Count of Commercial Individual Suspects by Group Size: Same as above, except that it counts commercial suspect records.


Commonizer Function

The Commonizer function of the Matcher can provide up to three different services on a household and/or individual basis for a matched set of records. All commonization is performed after records have successfully matched. The user can control the selection and location of the commonized data through parameter entries in the create common parameter file (pfccflds.par).

Available for Window Matching only.

This functionality can be implemented at the matched individual or matched commercial level. No service is provided for the household portion of a personal match. However, if the individual field list and pattern parameter files are not used by the Matcher, only a household match could take place. In this event, the commonization would happen at the household level.

The Commonizer performs three functions:

Selects a Survivor – Automatically selects the record survivor according to survivor selection rules. This function flags a single record at the household and individual levels indicating the “best” record of the matched set.

Creates Standard Common Data – Populates matcher_common_data (the first 108 bytes of the MAT_RECORD field) with data for each record.

Creates User Common Data – Copies data across specified fields of matched record sets using up to 10 specific routines.

The first two functions occur automatically during a window match and are controlled by parameter entries in the create common (pfccflds.par) file. The user can limit the fields used for the decision making by commenting out selected fields in the parameter file. An example parameter file is shown below.


Sample pfccflds.par Parameter File

***************************************************************************
* CREATE COMMON FIELD LISTS FOR MATCHER
*************************************************************************
BUSINESS_NAME FLDNAMES (PR_BUSNAME_01)
FIRST_NAME FLDNAMES (PR_FIRST_01)
GENDER FLDNAMES (PR_GENDER_01)
HOUSE_NUMBER FLDNAMES (PR_BEST_NUMBER)
MIDDLE_NAME FLDNAMES (PR_MIDDLE_DISPLAY_01)
LAST_NAME FLDNAMES (PR_LAST_01)
POSTAL_CODE FLDNAMES (PR_POSTAL_CODE)
STREET_TITLE FLDNAMES (PR_BEST_ST_TL)
SOURCE_ID FLDNAMES (SEQNO)

************************************************************************
* USE THE FOLLOWING FIELDS FOR CALLABLE
************************************************************************
*BUSINESS_NAME FLDLOCS (1469,100)
*FIRST_NAME FLDLOCS (1238,15)
*GENDER FLDLOCS (1468,1)
*HOUSE_NUMBER FLDLOCS (921,10)
*MIDDLE_NAME FLDLOCS (1283,15)
*LAST_NAME FLDLOCS (1358,30)
*POSTAL_CODE FLDLOCS (1166,15)
*STREET_TITLE FLDLOCS (931,25)
*SOURCE_ID FLDLOCS (2635,9)

The first eight entries in the file indicate the fields (or position and length) to source the data for the Standard Common Data segment. The last field, source_id, is used to identify the field containing the data to populate the Matcher links file.

The links file must be created when several Matchers will be run consecutively, using a different window key for each match. The contents of the links file are then used to resolve the links across all matches.
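This guide does not describe how link resolution works internally. One standard technique for combining groups from several passes transitively is a disjoint-set (union-find) structure, sketched below in Python. It is a substitute illustration, not Trillium's algorithm. The pass data and names are hypothetical: each pass maps a source_id to the group number assigned in that run.

```python
from typing import Dict, Iterable


class DisjointSet:
    """Minimal union-find with path compression."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # point every node on the path at the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve_links(passes: Iterable[Dict[str, int]]) -> Dict[str, str]:
    """Map each source_id to a representative source_id of its combined group."""
    sets = DisjointSet()
    for groups in passes:
        first_member: Dict[int, str] = {}
        for source_id, group in groups.items():
            sets.find(source_id)  # register records that never link to anything
            if group in first_member:
                sets.union(first_member[group], source_id)
            else:
                first_member[group] = source_id
    return {source_id: sets.find(source_id) for source_id in list(sets.parent)}


pass_1 = {"1001": 1, "1002": 1, "1003": 2, "1004": 3}  # hypothetical first window key
pass_2 = {"1001": 7, "1002": 8, "1003": 8, "1004": 9}  # hypothetical second window key
links = resolve_links([pass_1, pass_2])
print(links["1001"] == links["1003"])  # True: linked through 1002
print(links["1004"] == links["1001"])  # False
```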


Survivorship

Survivorship is the process of selecting the “best record” from a group of matched individual or commercial records.

Selecting the Surviving Name

The Commonizer selects the surviving name (parent) from a group of matched individual or commercial records. After a single record containing the best name has been identified, a survivor flag, known as the prime flag, is set to a value of “1” for the selected record. All remaining records in the group have the prime flag set to “0”.

The user can then select the record that contains the prime flag value of 1 with confidence that this record contains the best name. The survivor functionality is used in conjunction with the User Common Data function so that a complete and accurate “best record” can be created.

To select the Individual Survivor:

1. The Commonizer polls through the records to identify those that contain a valid gender code [“M” (male) or “F” (female)]. Records that do not contain a valid gender code are rejected.
2. The most common surname within the group of records is identified and selected as the surviving surname. Records with a different surname are rejected.
3. The program then identifies the most common first letter of the first name of each record within the group. Records whose first name initial is less common than the most common first name initial are rejected. Of the records that were not rejected, a first name that has more than one character qualifies as the surviving first name candidate.
4. The program then identifies the most common first letter of the middle name of each record within the group, in combination with the first initial of all records in the group. Records whose first initial/middle initial combination is less common than the most common combination are rejected. Of the records that were not rejected, the middle name that has more than one character is selected as the surviving middle name.


Survivorship Example

In the following survivorship example, the first record is chosen as the survivor:

    Title  First Name  Middle Name(s)  Surname  Gender  DOB
1.         John        Simon James     Smith    M       06181960
2.         John        James           Smyth    M       06181960
3.         John                        Smith    M       06181960
4.         John        S               Smith    M       06181960
5.  Mr.    J           S               Smith    M       06181960
6.         J                           Smith            06181960

The following selection logic was used for the preceding example:

Record 6 has no identified gender code, so the Commonizer rejects it from further consideration as the survivor.

In the remaining group of records, ‘Smith’ is the most common surname, so the program rejects Record 2 (Smyth).

The most common first initial of the first name is ‘J’. Of the records whose first name starts with ‘J’, only Records 1, 2, 3, and 4 have more than a single character. Because Record 2 was already rejected during surname selection, only Records 1, 3, and 4 survive as candidates.

The most common first initial of the middle name is ‘S’, as shown in Records 1, 4, and 5. Record 5 was already rejected, leaving Records 1 and 4 as candidates. Of these two records, only Record 1 has a middle name longer than a single character starting with ‘S’ (Simon James).

If only a single record survives after any of the evaluation steps, the process is complete. In this case, Record 1 is the only survivor. When a record fails a test, it is no longer available as a candidate for the surviving name, but is still available as a candidate for User Common Data.
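The four selection steps can be sketched in Python and checked against the example above. Where the text is silent, the sketch makes these labeled assumptions:

- The most common value is computed among the remaining candidates, ignoring blanks.
- A step that would reject every record is skipped.
- Any tie left at the end goes to the first record.

The field names and functions are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, str]


def keep_most_common(candidates: List[Record], key: Callable[[Record], str]) -> List[Record]:
    """Keep records whose key equals the most common non-blank key (ties: first seen)."""
    values = [key(r) for r in candidates if key(r)]
    if not values:
        return candidates
    top = Counter(values).most_common(1)[0][0]
    return [r for r in candidates if key(r) == top]


def prefer_longer(candidates: List[Record], field: str) -> List[Record]:
    """Prefer records whose name in `field` is longer than a single character."""
    longer = [r for r in candidates if len(r[field]) > 1]
    return longer or candidates


def first_middle_initials(r: Record) -> str:
    return r["first"][:1] + r["middle"][:1] if r["middle"] else ""


STEPS = [
    lambda c: [r for r in c if r["gender"] in ("M", "F")],                            # step 1
    lambda c: keep_most_common(c, lambda r: r["last"]),                                # step 2
    lambda c: prefer_longer(keep_most_common(c, lambda r: r["first"][:1]), "first"),   # step 3
    lambda c: prefer_longer(keep_most_common(c, first_middle_initials), "middle"),     # step 4
]


def select_individual_survivor(records: List[Record]) -> Record:
    candidates = list(records)
    for step in STEPS:
        remaining = step(candidates)
        if remaining:  # assumption: a step that would reject every record is skipped
            candidates = remaining
        if len(candidates) == 1:
            break  # a single survivor ends the process
    return candidates[0]  # assumption: any remaining tie goes to the first record


EXAMPLE = [
    {"id": "1", "first": "John", "middle": "Simon James", "last": "Smith", "gender": "M"},
    {"id": "2", "first": "John", "middle": "James", "last": "Smyth", "gender": "M"},
    {"id": "3", "first": "John", "middle": "", "last": "Smith", "gender": "M"},
    {"id": "4", "first": "John", "middle": "S", "last": "Smith", "gender": "M"},
    {"id": "5", "first": "J", "middle": "S", "last": "Smith", "gender": "M"},
    {"id": "6", "first": "J", "middle": "", "last": "Smith", "gender": ""},
]
print(select_individual_survivor(EXAMPLE)["id"])  # 1
```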


Selecting the Commercial Survivor

If the matched record set contains commercial records, and the CIS_RANK_KEY option is not used, the procedure follows these rules:

1. The program searches through the record set and finds the record with the longest business name.
2. If there is a tie, the program counts the occurrences of identical business name and street name values and flags the record that meets this criterion.
3. If there is still a tie, the first record of the selected set is chosen.
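The three commercial rules can be sketched in Python. Rule 2 is interpreted here as preferring, among the tied records, the one whose business name and street name combination occurs most often in the matched set. That reading, the field names, and the sample records are assumptions.

```python
from collections import Counter
from typing import Dict, List


def select_commercial_survivor(records: List[Dict[str, str]]) -> Dict[str, str]:
    """Apply the three commercial survivor rules (CIS_RANK_KEY not used)."""
    def name_length(r: Dict[str, str]) -> int:
        return len(r["business_name"].strip())

    # Rule 1: longest business name.
    longest = max(name_length(r) for r in records)
    tied = [r for r in records if name_length(r) == longest]

    # Rule 2 (interpreted): most frequent business name + street name combination.
    if len(tied) > 1:
        counts = Counter((r["business_name"], r["street_name"]) for r in records)
        best = max(counts[(r["business_name"], r["street_name"])] for r in tied)
        tied = [r for r in tied if counts[(r["business_name"], r["street_name"])] == best]

    # Rule 3: first record of the remaining set.
    return tied[0]


matched_set = [
    {"id": "1", "business_name": "ACME SUPPLIES", "street_name": "ELM"},
    {"id": "2", "business_name": "ACME SUPPLIER", "street_name": "OAK"},
    {"id": "3", "business_name": "ACME SUPPLIER", "street_name": "OAK"},
    {"id": "4", "business_name": "ACME", "street_name": "OAK"},
]
print(select_commercial_survivor(matched_set)["id"])  # 2
```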

Identifying a Survivor Record Using CIS_RANK_KEY

The Matcher can identify a survivor record based on a unique 2- to 15-character common field called the CIS_RANK_KEY, found in the ORG_RECORD (rather than on the Name/Address components used in the normal match process). This key is copied to all records that match that survivor.

CIS_RANK_KEY Example

The CIS_RANK_KEY field identifies the survivor record within the household (the record with the lowest numerical value in the common field). This 15-character key is commonized via the hhld_street_name field.

Create Common Parameters

The input fields used with Create Common are required to have a 2- to 15-character unique identifier. Trillium Software recommends using a 2-character rank field followed by a 13-character unique identifier. This common field (CIS_RANK_KEY) is identified in the Common Fields parameter file, replacing the 15-character field (STREET_TITLE) that is normally used for the Common Data Segment.
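Here is a minimal sketch of CIS_RANK_KEY survivorship in Python. It assumes fixed-width, zero-padded keys, so that comparing them as strings gives the same order as comparing their numeric values. The copy into hhld_street_name mirrors the commonization described above. The function name and sample keys are hypothetical.

```python
from typing import Dict, List


def select_by_rank_key(records: List[Dict[str, str]]) -> Dict[str, str]:
    """Pick the record with the lowest CIS_RANK_KEY and commonize that key."""
    survivor = min(records, key=lambda r: r["cis_rank_key"])  # ties: first record
    for r in records:
        r["hhld_street_name"] = survivor["cis_rank_key"]  # copied onto every member
    return survivor


household = [
    {"id": "A", "cis_rank_key": "020000000000001"},  # rank 02 + 13-character identifier
    {"id": "B", "cis_rank_key": "010000000000009"},  # rank 01 + 13-character identifier
]
print(select_by_rank_key(household)["id"])  # B
```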


Create Common Field Example

The following sample Create Common field lists show a traditional list and an enhanced list:

Old Create Common Field List
branch_number fldlocs(2203,2)
business_name fldnames(pr_busname_01)
first_name fldnames(pr_first_01)
gender fldnames(pr_gender_01)
house_number fldnames(pr_best_number)
middle_name fldnames(pr_middle_display_01)
last_name fldnames(pr_last_01)
postal_code fldnames(pr_postal_code)
street_title fldlocs(2205,15)
source_id fldnames(seqno)

New Create Common Field List
business_name fldnames(pr_busname_01)
first_name fldnames(pr_first_01)
gender fldnames(pr_gender_01)
house_number fldnames(pr_best_number)
middle_name fldnames(pr_middle_display_01)
last_name fldnames(pr_last_01)
postal_code fldnames(pr_postal_code)
cis_rank_key fldlocs (2500,15)
source_id fldnames(seqno)
branch_number fldlocs(2203,2)

The CIS_RANK_KEY in the new, enhanced field list replaces the STREET_TITLE field that is used in the traditional Create Common field list.


Standard Common Data

Matcher common data is determined for each record and placed in the matcher_common_data section. This data is commonized across a matched set of records after the surviving record has been identified. The commonized data is copied from the surviving record as determined using the rules previously listed in the Survivorship section.

The prime flag is set for each segment as follows:

Segment 1 (Household): Survivor flag (1=survivor, 0=non-survivor) based on the first individual survivor flag in the household or the best record according to the commercial survivorship rules.
Segment 2 (Individual): Record containing the best name according to the survivorship rules.

The matcher_common_data segment has the following format:

Table 7.3 Format of Standard Common Data Segment

Match Segment Field Length 1 Household/commercial) Branch 4 Last_Name 13 Street_Title 15 House_Number 4 Postal_Code 9 Members 8 Prime Flag 1 2 Individual Branch 4 First_Name 5 Postal_Code 9 Members 8 Prime_Flag 1 3 Individual Branch 4 First_Name 5 Postal_Code 9 Members 8 Prime_Flag 1
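To see where each field of Table 7.3 falls, the offsets can be computed from the lengths. This Python sketch assumes the three segments are stored back to back in segment order with no filler between fields; that layout is an assumption, so confirm it against your output DDL before relying on the offsets.

```python
from typing import Dict, List, Tuple

# Field lengths from Table 7.3, in segment order.
SEGMENTS: Dict[int, List[Tuple[str, int]]] = {
    1: [("Branch", 4), ("Last_Name", 13), ("Street_Title", 15), ("House_Number", 4),
        ("Postal_Code", 9), ("Members", 8), ("Prime_Flag", 1)],
    2: [("Branch", 4), ("First_Name", 5), ("Postal_Code", 9), ("Members", 8),
        ("Prime_Flag", 1)],
    3: [("Branch", 4), ("First_Name", 5), ("Postal_Code", 9), ("Members", 8),
        ("Prime_Flag", 1)],
}


def field_layout() -> Tuple[List[Tuple[int, str, int, int]], int]:
    """Return (segment, field, offset, length) rows and the total length.

    Assumes the segments are stored back to back with no filler between fields.
    """
    rows, offset = [], 0
    for segment in sorted(SEGMENTS):
        for name, length in SEGMENTS[segment]:
            rows.append((segment, name, offset, length))
            offset += length
    return rows, offset


def split_common_data(data: str) -> Dict[Tuple[int, str], str]:
    """Slice a matcher_common_data value into {(segment, field): text}."""
    rows, total = field_layout()
    if len(data) < total:
        raise ValueError(f"expected at least {total} characters, got {len(data)}")
    return {(seg, name): data[off:off + length] for seg, name, off, length in rows}
```

Under that assumption, segment 1 occupies 54 characters, segments 2 and 3 occupy 27 characters each, and the whole segment is 108 characters long.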


User Common Data

In addition to the Standard Common Data segment described above, the user can instruct the Commonizer to commonize data across a matched set of records at the individual or commercial level.

Data cannot be commonized for a household level match.

Unlike Standard Common Data, User Common Data commonization is controlled by parameter entries in the pfccflds.par file, each of which selects one of the available algorithms (routines). These algorithms allow data to be evaluated anywhere in the record and then commonized across the matched record set.

The structure of the parameter entry defines where the data will be commonized. The user has the option to commonize data in the existing field or a new field. In addition, one field can be evaluated using the selected algorithm and the data to commonize can be sourced from another field.

User Common Data Syntax

field_identifier routines(routine_name) fldnames(field1,field2[,field3]) [parmval(parmval_value)]

where:

Field identifier | A word that identifies the particular parameter entry.
'routines' keyword | The keyword "routines".
routine name | Name of the routine (see "User Common Data Routines" on page 7-56).
'fldnames' keyword | The keyword "fldnames" (may also be "fldlocs" for a position and length definition).
field1,field2,field3 | The DDL field names (or positions and lengths) that contain the data to which the User Common Data routine logic is applied, the source of the common data, and/or the target field that stores the common data. See "User Common Data Parameter File Entries" below for details.
'parmval' keyword | The keyword "parmval".
parmval value | The parmval value used with the User Common Data routine.
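The entry syntax can be illustrated with a small Python parser. It is a hypothetical sketch, not the product's own parser: the regular expression assumes items are separated by one or more spaces and that there are no spaces inside the parentheses. It also shows how two or three field names map to the test, source, and target roles described above.

```python
import re

# Hypothetical pattern for one pfccflds.par entry. Items are separated by one or
# more spaces; no spaces are allowed inside the parentheses (an assumption).
ENTRY_RE = re.compile(
    r"^(?P<ident>\w+)"
    r"(?: +routines\((?P<routine>\w+)\))?"
    r" +(?P<kind>fldnames|fldlocs)\((?P<fields>[^()\s]*)\)"
    r"(?: +parmval\((?P<parmval>[^()\s]*)\))?$",
    re.IGNORECASE,
)


def parse_entry(line: str) -> dict:
    """Split an entry into identifier, routine, field list, and parmval."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    fields = [name for name in match["fields"].split(",") if name]  # drop empty slots
    entry = {
        "identifier": match["ident"],
        "routine": match["routine"].upper() if match["routine"] else None,
        "kind": match["kind"].lower(),
        "fields": fields,
        "parmval": match["parmval"],
    }
    # Roles apply only to User Common Data entries that name fields.
    if entry["routine"] and entry["kind"] == "fldnames":
        if len(fields) == 2:    # two fields: the test field is also the source
            entry["test"] = entry["source"] = fields[0]
            entry["target"] = fields[1]
        elif len(fields) == 3:  # three fields: test, source, target
            entry["test"], entry["source"], entry["target"] = fields
    return entry
```

An entry with no space between ROUTINES(...) and FLDNAMES(...) does not match the pattern, which mirrors the spacing requirement for this file.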


User Common Data Parameter File Entries

There are two forms of parameter entry for User Common Data commonization:

Type | Description
1 | Two field names are defined after the "fldnames" keyword on the parameter entry line. The first field contains the data to which the routine logic is applied; it is also the field from which the common data is sourced. The second field is the target field that stores the commonized data sourced from the first field. For example: CUSTOMER_NAME ROUTINES(LONGEST) FLDNAMES(CUST_NAME,NEW_COMPANY_NAME)
2 | Three field names are defined after the "fldnames" keyword on the parameter entry line. The first field contains the data to which the routine logic is applied. The second field is the field from which the common data is sourced. The third field is the target field that stores the commonized data sourced from the second field. For example: CUSTOMER_NAME ROUTINES(LITERAL) FLDNAMES(SOURCE_FLD,CUST_NAME,NEW_COMPANY_NAME)

User Common Data Sample Parameter File

****************************************************************************
* PFCCFLDS.PAR
*************************************************************************
BUSINESS_NAME FLDNAMES(PR_BUSNAME_01)
FIRST_NAME FLDNAMES(PR_FIRST_01)
GENDER FLDNAMES(PR_GENDER_01)
HOUSE_NUMBER FLDNAMES(PR_BEST_NUMBER)
MIDDLE_NAME FLDNAMES(PR_MIDDLE_DISPLAY_01)
LAST_NAME FLDNAMES(PR_LAST_01)
POSTAL_CODE FLDNAMES(PR_POSTAL_CODE)
STREET_TITLE FLDNAMES(PR_BEST_ST_TL)
SOURCE_ID FLDNAMES(SEQNO)
CUSTOMER_NAME ROUTINES(LONGEST) FLDNAMES(CUST_NAME,NEW_COMPANY_NAME)
customer_name routines(literal) fldnames(source_fld,cust_name,new_company_name) parmval(GALE)
customer_name routines(most) fldnames(phone_num,new_cust_id,) parmval(NBZ)

The last three parameter entries in the file (customer_name) are examples of User Common Data parameters. They are listed below as Examples 1, 2, and 3, with sample data for each. For a complete list of User Common Data routines, see "User Common Data Routines" on page 7-56.

Example 1

customer_name routines(longest) fldnames(cust_name,new_company_name)

This parameter entry applies the LONGEST routine to the field cust_name and commonizes the longest string of data from this field to the field named new_company_name.

Input
Rec | cust_name | new_company_name
Rec 1 | TRILLIUM |
Rec 2 | TRILLIUM SOFTWARE |
Rec 3 | TRILLIUM |

Output
Rec | cust_name | new_company_name
Rec 1 | TRILLIUM | TRILLIUM SOFTWARE
Rec 2 | TRILLIUM SOFTWARE | TRILLIUM SOFTWARE
Rec 3 | TRILLIUM | TRILLIUM SOFTWARE

Only Record 2 contains the words TRILLIUM SOFTWARE in the cust_name field. Because the routine is LONGEST, after commonization the longest occurrence of data in the cust_name field is populated into the target field, new_company_name, on all three records.
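A minimal Python sketch of the LONGEST behavior in Example 1, using dictionaries as stand-ins for matched records. Ignoring trailing blanks when measuring and keeping the earliest record on a tie are assumptions of this sketch; the guide does not state a tie rule for LONGEST.

```python
from typing import Dict, List

Record = Dict[str, str]


def commonize_longest(records: List[Record], source: str, target: str) -> str:
    """LONGEST: copy the longest `source` value into `target` on every record.

    Trailing blanks are ignored when measuring; on a tie, max() keeps the
    earliest record (both are assumptions of this sketch).
    """
    winner = max((rec[source] for rec in records), key=lambda v: len(v.rstrip()))
    for rec in records:
        rec[target] = winner
    return winner


# Example 1 data: cust_name is both the test field and the source.
matched_set = [
    {"cust_name": "TRILLIUM", "new_company_name": ""},
    {"cust_name": "TRILLIUM SOFTWARE", "new_company_name": ""},
    {"cust_name": "TRILLIUM", "new_company_name": ""},
]
commonize_longest(matched_set, "cust_name", "new_company_name")
```

The source field itself is left untouched; only the target field receives the commonized value.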


Example 2

customer_name routines(literal) fldnames(source_fld,cust_name,new_company_name) parmval(GALE)

In this parameter entry, the routine LITERAL is applied to the field source_fld using a parmval of GALE. The routine looks for the word GALE in the source_fld field of each record in the matched set and commonizes the data from the GALE record. The data to commonize is sourced from the field cust_name and is populated in the field new_company_name. Record 2 contains the word GALE in the source_fld field, so its cust_name value, ACME, is copied to new_company_name on every record.

Input
Rec | source_fld | cust_name | new_company_name
Rec 1 | OAK | TRILLIUM |
Rec 2 | GALE | ACME |
Rec 3 | JONES | TRILLIUM |

Output
Rec | source_fld | cust_name | new_company_name
Rec 1 | OAK | TRILLIUM | ACME
Rec 2 | GALE | ACME | ACME
Rec 3 | JONES | TRILLIUM | ACME

Example 3

customer_name routines(most) fldnames(phone_num,new_cust_id,) parmval(NBZ)

Here the routine MOST finds the most common occurrence of data in the phone_num field. Because a parmval of NBZ (non-blank, non-zero) is applied, blank values are not commonized, even though blanks occur more often than 978-901-0000 in this set. Records 2 and 3 are both missing data in the phone_num field. After commonization, the target field new_cust_id contains the data sourced from the one record whose phone_num field was populated.

Input
Rec | phone_num | new_cust_id
Rec 1 | 978-901-0000 |
Rec 2 | |
Rec 3 | |

Output
Rec | phone_num | new_cust_id
Rec 1 | 978-901-0000 | 978-901-0000
Rec 2 | | 978-901-0000
Rec 3 | | 978-901-0000
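Example 2's LITERAL behavior can be reproduced with the Python sketch below. The function name is hypothetical; comparing with exact equality reflects the rule that the field must be the same size as the literal, while choosing the first matching record when several carry the literal, and leaving the set unchanged when none does, are assumptions.

```python
from typing import Dict, List

Record = Dict[str, str]


def commonize_literal(records: List[Record], test: str, source: str,
                      target: str, literal: str) -> bool:
    """LITERAL: find the first record whose test field equals the literal and
    copy that record's source value into the target field of every record.
    Returns False (and changes nothing) when no record carries the literal."""
    hits = [rec for rec in records if rec[test] == literal]
    if not hits:
        return False
    winner = hits[0][source]  # read before writing, in case target == source
    for rec in records:
        rec[target] = winner
    return True


example2 = [
    {"source_fld": "OAK", "cust_name": "TRILLIUM", "new_company_name": ""},
    {"source_fld": "GALE", "cust_name": "ACME", "new_company_name": ""},
    {"source_fld": "JONES", "cust_name": "TRILLIUM", "new_company_name": ""},
]
found = commonize_literal(example2, "source_fld", "cust_name", "new_company_name", "GALE")
```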


When building User Common Data entries in the pfccflds.par file, the placement of spaces between the items is important: the system will not run if the spaces are missing. The following entry is correctly formatted; the carets mark the required spaces:

DATE_OF_BIRTH ROUTINES(HIGHCHAR) FLDNAMES(DOB,DOB_COMMON) PARMVAL(NZ)
             ^                  ^                        ^
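A hypothetical lint check for the spacing rule, in Python. It only catches a closing parenthesis that runs directly into the next item, as in ROUTINES(LONGEST)FLDNAMES(...); it does not validate the rest of the entry.

```python
import re
from typing import List

# A ')' immediately followed by a non-space character means two items have run
# together, for example ROUTINES(LONGEST)FLDNAMES(...).
RUN_TOGETHER = re.compile(r"\)(?=\S)")


def missing_space_columns(entry: str) -> List[int]:
    """Return the 0-based columns where a space is expected after ')'."""
    return [m.end() for m in RUN_TOGETHER.finditer(entry)]
```

A ')' at the very end of the line is not flagged, because the lookahead requires a following non-space character.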

User Common Data Routines

The following table lists the User Common Data routines, the parmvals each routine accepts, and an example of each. Key: NB = non-blank, NZ = non-zero, NBZ = non-blank and non-zero.

Table 7.4 User Common Data Routines

Routine | Parmvals | Definition | Example
LONGEST | N/A | Compares the length of the data in a field (test field) on one record against the length of the data in the same field on another record. The system commonizes the longer of the two. | Test field = Smith; test field = Smit. Here, the contents of the test field, "Smith", are commonized (the longer of the two).
SHORTEST | N/A | Compares the length of the data in a field (test field) on one record against the length of the data in the same field on another record. The system commonizes the shorter of the two. | Test field = Smith; test field = Smit. In this case, the contents of the test field, "Smit", are commonized (the shorter of the two).


MOST | NB, NZ, NBZ | The most frequent occurrence. If there is a tie, the first record in the window is chosen to commonize. | If "John" appears 3 times and "Jon" appears 2 times, "John" is used to commonize. Or, if both "John" and "Jon" appear 3 times and the record with "John" was the first record into the window, "John" is used to commonize.
LEAST | NB, NZ, NBZ | The least frequent occurrence. If there is a tie, the first record in the window is chosen to commonize. | If "John" appears 3 times and "Jon" appears 2 times, "Jon" is used to commonize. Or, if both "John" and "Jon" appear 3 times and the record with "John" was the first record into the window, "John" is used to commonize.
LITERAL | Any value | Commonizes on a specific value. The field must be the same size as the literal value to commonize. | Search for the value "617" in an area code field and commonize it.
LOWCHAR, HIGHCHAR | NB, NZ, NBZ | Commonizes on the lowest (LOWCHAR) or highest (HIGHCHAR) character value. The value can be either alphabetic or numeric. | If using LOWCHAR with field1 = "F" and field2 = "M", "F" is commonized, because the letter "F" is lower than the letter "M" in the alphabet. Or, if using HIGHCHAR with field1 = "315" and field2 = "191", "315" is commonized.
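The selection rules in Table 7.4 can be summarized in one Python function that picks the value to commonize from a window of values. This is an illustrative sketch under stated assumptions: blank means only spaces, zero means only '0' characters, and values are compared as Python strings (Unicode code-point order, which matches ASCII for these characters; on an EBCDIC platform digits sort after letters, so LOWCHAR and HIGHCHAR results could differ there). LITERAL is not covered.

```python
from collections import Counter
from typing import List, Optional


def qualifies(value: str, parmval: str) -> bool:
    """NB / NZ / NBZ filter; assumes blank = only spaces, zero = only '0' characters."""
    text = value.strip()
    is_blank = text == ""
    is_zero = text != "" and set(text) == {"0"}
    rules = {"NB": not is_blank, "NZ": not is_zero, "NBZ": not (is_blank or is_zero)}
    return rules.get(parmval, True)  # no parmval: every value qualifies


def pick(routine: str, window: List[str], parmval: str = "") -> Optional[str]:
    """Return the value a routine would commonize from values listed in window order."""
    values = [v for v in window if qualifies(v, parmval)]
    if not values:
        return None
    if routine == "LONGEST":
        return max(values, key=lambda v: len(v.rstrip()))
    if routine == "SHORTEST":
        return min(values, key=lambda v: len(v.rstrip()))
    if routine in ("MOST", "LEAST"):
        counts = Counter(values)
        wanted = max(counts.values()) if routine == "MOST" else min(counts.values())
        # Tie: the value of the first record in the window wins.
        return next(v for v in values if counts[v] == wanted)
    if routine == "LOWCHAR":
        return min(values)  # code-point order
    if routine == "HIGHCHAR":
        return max(values)
    raise ValueError(f"routine not covered by this sketch: {routine}")
```

Calling pick("MOST", ["978-901-0000", "", ""], "NBZ") reproduces Example 3 and returns the populated phone number; without the NBZ filter, the blank value, which occurs twice, would win instead.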


Error Messages

This table describes the error messages that can be returned by the Matcher.

Table 7.5 CFMATDRV Error Messages

Message | Description
Must have a parm file | The parameter file is missing from the command line.
Parm Processing Error, status = 2 | The Matcher parameter file is present but incorrect. Check the file path and name.
Parm Processing Error, status = 3 | The parameter file for the Matcher is present but cannot be opened.
Parm Processing Error, status = 4 | The Matcher program has processed the parameter file and has encountered an error with a parameter entry. Use the parameter echo debugging process to determine the incorrect entry.
Must have at least one field list. | The Matcher parameter file must contain at least one of the following parameters: COMM_FLDS_FNAME, HHLD_FLDS_FNAME, INDV_FLDS_FNAME.
Must have at least one pattern list. | The Matcher parameter file must contain at least one of the following parameters: COMM_PATS_FNAME, HHLD_PATS_FNAME, INDV_PATS_FNAME.
Missing common field list. | A required parameter is not included in the Matcher parameter file: COMM_FLDS_FNAME.
Missing input DDL record name. | The required DDL_INP_RNAME parameter is not included in the parameter file.
Missing match DDL record name. | The required DDL_MAT_RNAME parameter is not included in the parameter file.
Missing output DDL record name. | The required DDL_OUT_RNAME parameter is not included in the parameter file.
Missing record length for cross reference file. | The required TABLE_LRECL parameter is not included in the parameter file. Required if window key splitting is used via the table-defined method.
Missing large window key number for cross reference file. | LARGE_WINDOW_KEY_NUMBER is not included in the driver parameter file. Required if window key splitting will be used via the table-defined method.
Missing tie breaker field name for cross reference file. | TIE_BREAKER_FIELD_NAME is not included in the driver parameter file. Required if window key splitting will be used via the table-defined method.


Missing cross reference file. | TABLE_DDNAME is not included in the parameter file. Required if window key splitting will be used via the table-defined method.
Insufficient memory for matcher return buffer. | Not enough system memory is available at startup time of the Matcher.
Insufficient memory for the buffer OUTPUT_AREA. | The machine is out of memory, usually because a large number of records share the same window key value. Reduce the number of records with the same window key value, or control the number of records added to the match window (using either MAX_WINDOW_SIZE or LARGE_WINDOW_KEY_NUMBER).
Commercial field and pattern list required with Commercial match. | The user has attempted to override the name form using OVERRIDE_NMFORM with the value "C" for a commercial match, but has not defined commercial field and pattern parameter files.
Retail field and pattern list required with Retail match. | The user has attempted to override the name form using OVERRIDE_NMFORM with the value "R" for a personal (retail) match, but has not defined household and individual field and pattern parameter files.
Could not modify window record. | Name forms (1 = personal; 2, 3 = business) are not consistent for an entire window key set. This occurs when the window key does not include the name form value as part of the key.
Ref household pattern field must have length three. | When the REFHHPAT parameter is used, the DDL field that this parameter references is not 3 bytes in length. Check the length statement for the field in the output DDL. Reference matching only.
Ref individual pattern field must have length three. | When the REFINPAT parameter is used, the DDL field that this parameter references is not 3 bytes in length. Check the length statement for the field in the output DDL. Reference matching only.
No input file specified. | The required INP_DDNAME parameter is not included in the parameter file.
Unable to open input file. | The input file defined by INP_DDNAME is present but cannot be opened: either the user does not have permission to open it, or the file contents are corrupt.


Unable to open reference file. | The file defined by REF_DDNAME is not present, the user does not have permission to open it, or the file contents are corrupt.
Unable to open output file. | The file defined by the OUT_DDNAME driver parameter is invalid, or the permissions on the file prevent overwriting of the existing file.
No output file specified in parms. | The required OUT_DDNAME parameter is not included in the parameter file.
Missing parameter CMI_FNAME. | The Matcher driver CMI_FNAME parameter is missing from the parameter file. Required for reference matching.
Bad return value '2' from: PROCESS_PARAMETERS | No parameter file name specified.
Bad return value '3' from: PROCESS_PARAMETERS | Could not open the parameter file.
Bad return value '4' from: PROCESS_PARAMETERS | Invalid field positions specified in the Individual Fields parameter file (pfinflds.par).
Bad return value '4' from: PROCESS_PARAMETERS | Invalid field positions specified in the Household Fields parameter file (pfhhflds.par).
Bad return value '4' from: PROCESS_PARAMETERS | Invalid field positions specified in the Commercial Fields parameter file (pfcoflds.par).
Bad return value '5' from: PROCESS_PARAMETERS | Invalid field lengths specified in the Individual Fields parameter file (pfinflds.par).
Bad return value '5' from: PROCESS_PARAMETERS | Invalid field lengths specified in the Household Fields parameter file (pfhhflds.par).
Bad return value '5' from: PROCESS_PARAMETERS | Invalid field lengths specified in the Commercial Fields parameter file (pfcoflds.par).


Bad return value '6' from: PROCESS_PARAMETERS | Need spaces to delimit entries in the Individual Fields parameter file (pfinflds.par).
Bad return value '6' from: PROCESS_PARAMETERS | Need spaces to delimit entries in the Household Fields parameter file (pfhhflds.par).
Bad return value '6' from: PROCESS_PARAMETERS | Need spaces to delimit entries in the Commercial Fields parameter file (pfcoflds.par).
Bad return value '7' from: PROCESS_PARAMETERS | No threshold scores specified in the Individual Fields parameter file (pfinflds.par).
Bad return value '7' from: PROCESS_PARAMETERS | No threshold scores specified in the Household Fields parameter file (pfhhflds.par).
Bad return value '7' from: PROCESS_PARAMETERS | No threshold scores specified in the Commercial Fields parameter file (pfcoflds.par).
Bad return value '8' from: PROCESS_PARAMETERS | No field names, or field positions and lengths, specified in the Individual Fields parameter file (pfinflds.par).
Bad return value '8' from: PROCESS_PARAMETERS | No field names, or field positions and lengths, specified in the Household Fields parameter file (pfhhflds.par).
Bad return value '8' from: PROCESS_PARAMETERS | No field names, or field positions and lengths, specified in the Commercial Fields parameter file (pfcoflds.par).
Bad return value '9' from: PROCESS_PARAMETERS | No comparison routine names specified in the Individual Fields parameter file (pfinflds.par).


Bad return value '9' from: PROCESS_PARAMETERS | No comparison routine names specified in the Household Fields parameter file (pfhhflds.par).
Bad return value '9' from: PROCESS_PARAMETERS | No comparison routine names specified in the Commercial Fields parameter file (pfcoflds.par).
Bad return value 'B' from: PROCESS_PARAMETERS | Unknown propagation routine specified in the Individual Fields parameter file (pfinflds.par).
Bad return value 'B' from: PROCESS_PARAMETERS | Unknown propagation routine specified in the Household Fields parameter file (pfhhflds.par).
Bad return value 'B' from: PROCESS_PARAMETERS | Unknown propagation routine specified in the Commercial Fields parameter file (pfcoflds.par).
Bad return value 'C' from: PROCESS_PARAMETERS | Bad match score threshold value in the Individual Fields parameter file (pfinflds.par).
Bad return value 'C' from: PROCESS_PARAMETERS | Bad match score threshold value in the Household Fields parameter file (pfhhflds.par).
Bad return value 'C' from: PROCESS_PARAMETERS | Bad match score threshold value in the Commercial Fields parameter file (pfcoflds.par).
Bad return value 'D' from: PROCESS_PARAMETERS | Unknown comparison routine specified in the Individual Fields parameter file (pfinflds.par).
Bad return value 'D' from: PROCESS_PARAMETERS | Unknown comparison routine specified in the Household Fields parameter file (pfhhflds.par).
Bad return value 'D' from: PROCESS_PARAMETERS | Unknown comparison routine specified in the Commercial Fields parameter file (pfcoflds.par).


Bad return value 'E' from: PROCESS_PARAMETERS | Field positions could not be retrieved from the dictionary for the Individual Fields parameter file (pfinflds.par).
Bad return value 'E' from: PROCESS_PARAMETERS | Field positions could not be retrieved from the dictionary for the Household Fields parameter file (pfhhflds.par).
Bad return value 'E' from: PROCESS_PARAMETERS | Field positions could not be retrieved from the dictionary for the Commercial Fields parameter file (pfcoflds.par).
Bad return value 'F' from: PROCESS_PARAMETERS | Missing beginning parenthesis in the Individual Fields parameter file (pfinflds.par).
Bad return value 'F' from: PROCESS_PARAMETERS | Missing beginning parenthesis in the Household Fields parameter file (pfhhflds.par).
Bad return value 'F' from: PROCESS_PARAMETERS | Missing beginning parenthesis in the Commercial Fields parameter file (pfcoflds.par).
Bad return value 'F' from: PROCESS_PARAMETERS | Invalid pattern category specified in the grade pattern list of the Individual Patterns parameter file (pfinpats.par).
Bad return value 'F' from: PROCESS_PARAMETERS | Invalid pattern category specified in the grade pattern list of the Household Patterns parameter file (pfhhpats.par).
Bad return value 'F' from: PROCESS_PARAMETERS | Invalid pattern category specified in the grade pattern list of the Commercial Patterns parameter file (pfcopats.par).
Bad return value 'G' from: PROCESS_PARAMETERS | Missing comma or ending parenthesis in the Individual Fields parameter file (pfinflds.par).
Bad return value 'G' from: PROCESS_PARAMETERS | Missing comma or ending parenthesis in the Household Fields parameter file (pfhhflds.par).


Bad return value 'G' from: PROCESS_PARAMETERS | Missing comma or ending parenthesis in the Commercial Fields parameter file (pfcoflds.par).
Bad return value 'G' from: PROCESS_PARAMETERS | Missing comma, ending parenthesis, or CRLF in the CMI List parameter file (the CMI file is specified in pfmatdrv.par).
Bad return value 'G' from: PROCESS_PARAMETERS | No field list specified prior to the grade pattern list in the Individual Patterns parameter file (pfinpats.par).
Bad return value 'G' from: PROCESS_PARAMETERS | No field list specified prior to the grade pattern list in the Household Patterns parameter file (pfhhpats.par).
Bad return value 'G' from: PROCESS_PARAMETERS | No field list specified prior to the grade pattern list in the Commercial Patterns parameter file (pfcopats.par).
Bad return value 'H' from: PROCESS_PARAMETERS | Unknown keyword in a parameter entry of the Individual Fields parameter file (pfinflds.par).
Bad return value 'H' from: PROCESS_PARAMETERS | Unknown keyword in a parameter entry of the Household Fields parameter file (pfhhflds.par).
Bad return value 'H' from: PROCESS_PARAMETERS | Unknown keyword in a parameter entry of the Commercial Fields parameter file (pfcoflds.par).
Bad return value 'H' from: PROCESS_PARAMETERS | Invalid pattern identified in the grade pattern list of the Individual Patterns parameter file (pfinpats.par).
Bad return value 'H' from: PROCESS_PARAMETERS | Invalid pattern identified in the grade pattern list of the Household Patterns parameter file (pfhhpats.par).
Bad return value 'H' from: PROCESS_PARAMETERS | Invalid pattern identified in the grade pattern list of the Commercial Patterns parameter file (pfcopats.par).


Bad return value 'I' from: PROCESS_PARAMETERS | Comparison routine name too long in the Individual Fields parameter file (pfinflds.par).
Bad return value 'I' from: PROCESS_PARAMETERS | Comparison routine name too long in the Household Fields parameter file (pfhhflds.par).
Bad return value 'I' from: PROCESS_PARAMETERS | Comparison routine name too long in the Commercial Fields parameter file (pfcoflds.par).
Bad return value 'I' from: PROCESS_PARAMETERS | Bad character (non ABCDEF) in the pattern grade string of the Individual Patterns parameter file (pfinpats.par).
Bad return value 'I' from: PROCESS_PARAMETERS | Bad character (non ABCDEF) in the pattern grade string of the Household Patterns parameter file (pfhhpats.par).
Bad return value 'I' from: PROCESS_PARAMETERS | Bad character (non ABCDEF) in the pattern grade string of the Commercial Patterns parameter file (pfcopats.par).
Bad return value 'J' from: PROCESS_PARAMETERS | Target and source must be the same length (Commercial Fields parameter file).
Bad return value 'L' from: PROCESS_PARAMETERS | Invalid parmval found.
Bad return value 'M' from: PROCESS_PARAMETERS | Invalid parmval for routine.
Bad return value 'N' from: PROCESS_PARAMETERS | Excess number of fields (greater than 50) in match fields.
Bad return value 'O' from: PROCESS_PARAMETERS | More than one PREVENT comparison routine is used within matching.
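Several return codes in Table 7.5 mean different things depending on whether a fields file or a patterns file was being read (for example, 'F', 'G', 'H', and 'I'). The hypothetical Python troubleshooting helper below makes that distinction explicit. It covers only a subset of the codes, decides the file kind from the file name alone, and falls back to Table 7.5 for everything else (including the CMI List variant of 'G').

```python
import re

# Subset of Table 7.5: the same PROCESS_PARAMETERS code can mean different things
# for field-list files (pf??flds) and pattern-list files (pf??pats).
CAUSES = {
    ("4", "flds"): "Invalid field positions",
    ("5", "flds"): "Invalid field lengths",
    ("6", "flds"): "Need spaces to delimit parameter file entries",
    ("F", "flds"): "Missing beginning parenthesis",
    ("F", "pats"): "Invalid pattern category in grade pattern list",
    ("G", "flds"): "Missing comma/ending parenthesis",
    ("G", "pats"): "No field list specified prior to grade pattern list",
    ("H", "flds"): "Unknown keyword in parameter entry",
    ("H", "pats"): "Invalid pattern in grade pattern list",
    ("I", "flds"): "Comparison routine name too long",
    ("I", "pats"): "Bad character (non ABCDEF) in pattern grade string",
}

CODE_RE = re.compile(r"Bad return value '(\w)' from: PROCESS_PARAMETERS")


def explain(message: str, parm_file: str) -> str:
    """Map a PROCESS_PARAMETERS return code to its likely cause."""
    match = CODE_RE.search(message)
    if match is None:
        return "not a PROCESS_PARAMETERS return code"
    kind = "pats" if "pats" in parm_file.lower() else "flds"  # PFINPATS or pfinpats.par
    return CAUSES.get((match.group(1), kind), "see Table 7.5")
```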


Running the Matcher on UNIX and 32-bit PC Platforms

To run the cfmatdrv driver, use the following command syntax:

cfmatdrv -parmfile parm_file_name -parmecho echo_file_name

where:

cfmatdrv | Name of the driver program
-parmfile | Keyword that indicates that the parameter file follows
parm_file_name | Name of the driver parameter file
-parmecho | Keyword that indicates that the parameter echo file follows
echo_file_name | Optional file used by -parmecho to store processing error information

Example

cfmatdrv -parmfile ..\parms\pfmatdrv.par -parmecho ..\data\echo
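When the Matcher is launched from a script, passing the arguments as a list avoids shell quoting problems with Windows-style paths. This Python sketch assumes the driver is on the PATH. It treats -parmecho as optional because the echo file is described as optional, and it simply returns whatever exit status the driver produces, since exit codes are not documented in this section. The same shape works for cfmatdsp, which uses the same options.

```python
import subprocess
from typing import List, Optional


def driver_command(parm_file: str, echo_file: Optional[str] = None,
                   program: str = "cfmatdrv") -> List[str]:
    """Build the argument list; -parmecho is added only when an echo file is given."""
    args = [program, "-parmfile", parm_file]
    if echo_file:
        args += ["-parmecho", echo_file]
    return args


def run_driver(parm_file: str, echo_file: Optional[str] = None,
               program: str = "cfmatdrv") -> int:
    """Run the driver without a shell and return its exit status."""
    return subprocess.run(driver_command(parm_file, echo_file, program)).returncode
```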


IBM Mainframe Matcher Sample JCL

//*********************************************************
//* SAMPLE JCL TO RUN MATCHER PROGRAM (CFMATDRV)
//*********************************************************
//CFMATDRV EXEC PGM=CFMATDRV,REGION=5500K,COND=(0,NE),
//         PARM='/-PARMFILE PF -PARMECHO PE', REGION=0M
//STEPLIB  DD DSN=&BASEPREF.&TRILVER.LOADLIB,DISP=SHR
//         DD DSN=CEE.SCEERUN,DISP=SHR
//         DD DSN=CEE.SCEERUN2,DISP=SHR
//CEEDUMP  DD DUMMY,DCB=BLKSIZE=133
//SYSPRINT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//TRILMSGS DD DUMMY
//PF       DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.PARMLIB(PFMATDRV)
//PE       DD SYSOUT=*
//PARSOUT  DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(PARSOUT)
//MATCHRET DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(MATCHRET)
//PMOUT    DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DATA.GEOOUT
//MAOUT    DD UNIT=&UNIT,DISP=(NEW,CATLG,DELETE),
//         DCB=(RECFM=FB,LRECL=2643,BLKSIZE=21144),
//         SPACE=(TRK,(10,50),RLSE),
//         DSN=&PROJPREF.&TRILVER.US.DATA.MAOUT
//MALINK   DD DUMMY,DCB=(RECFM=FB,LRECL=44,BLKSIZE=23144)
//MASTAT   DD SYSOUT=*
//PFHHFLDS DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.PARMLIB(PFHHFLDS)
//PFHHPATS DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.PARMLIB(PFHHPATS)
//PFINFLDS DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.PARMLIB(PFINFLDS)
//PFINPATS DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.PARMLIB(PFINPATS)
//PFCOFLDS DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.PARMLIB(PFCOFLDS)
//PFCOPATS DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.PARMLIB(PFCOPATS)
//PFCCFLDS DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.PARMLIB(PFCCFLDS)


Matcher Display Programs

The Matcher display programs (cfmatdsp, cffxmdsp, and cffxmduk) produce reports for reviewing the output of the Matcher. They are run as a separate step to allow sorting of the input file.

Table 7.6 Matcher Display Program Descriptions

Program | Description
cfmatdsp (Matcher Display Program) | Displays a report with the parsed names and the original name and address. This program focuses on viewing original name and address data. Available for all countries.
cffxmdsp (Fixed-Match Display Program, US only) | Displays a report with the name and address fields ordinarily used in the matching process, for example prefix, first, middle 1 and 2, surname, generation, house #, street name, apartment #, box #, city, and postal code. This program focuses on viewing parsed data. Available for US only.
cffxmduk (Fixed-Match Display Program, UK only) | Displays a report with the name and address fields ordinarily used in the matching process, for example prefix, first, middle 1 and 2, surname, generation, house #, street name, apartment #, box #, city, and postal code. This program focuses on viewing parsed data. Available for UK only.

Display Program DDL Requirements

One DDL is required to describe the output record from the Matcher. This DDL is named in the Matcher parameter file.

The program assumes standard names for the input, Parser, and Matcher data in the DDL.

If a field is not defined in the DDL, the field is filled with blanks and the processing continues. See the “Data Dictionary Language” section of the Control Center manual (in the section with the DDL Editor tool) for more information about creating and using DDLs. These programs are flexible and can be used to show household, customer, account and suspect sets.


CFMATDSP Display Program

Display Program Parameters

The following parameters are used in all of the Matcher output display programs. Please note that required parameters appear in bold and shaded.

Table 7.7 Matcher Display Program Parameters

Parameter | Value | Description
CLIENT | Client name | Client name to display on the report.
DDL_INP_FNAME | File name | File name of the input DDL.
DDL_INP_RNAME | Record name | Record name of the input DDL.
INNER_KEY | Field name | Inner key field used for organizing records in the report (usually matched_indv_in_matched_hhld).
INP_DDNAME | File name | Name of the input file.
MAXIN | Numeric | Maximum number of outer key households to read (if blank, all records in the file are read).
MAXLINES_ON_APAGE | Numeric | Page length, in lines.


MAXOUT | Numeric | Maximum number of records to write (if blank, all records in the file are displayed).
MIN_INNER_SETS | Numeric | Minimum number of inner sets required for output to be generated.
MIN_NUMB_NAMES | Numeric | Minimum number of names required for a match group to be displayed.
NTH_OUTER_SETS | Numeric | Prints every nth qualifying outer_key set.
OPT_FIELD1 to OPT_FIELD10 | User-defined fields | Client-defined optional fields: 1 to 10 for cfmatdsp; 1 to 5 for cffxmdsp. The program generates as many lines as needed to print all optional fields.
OUTER_KEY | Field name | Outer key field used for organizing records in the report (usually matched_hhld).
PATTERN_DISPLAY | Pattern field name | Specifies the pattern to display: matched_hhld_pattern, suspect_hhld_pattern, matched_indv_pattern, or suspect_indv_pattern. For CFFXMDSP only.
PRIME_DISPLAY | Prime flag field name | Identifies the prime flag to use in the report: hhld_prime_flag, susp_prime_flag, or indv_prime_flag. For CFFXMDSP only.
PRN_DDNAME | File name | Name of the report output file.
TITLE | Title | Title to display at the top of the report.
TITLE2 | Title2 | Secondary title line to display on the report.
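One possible reading of how the outer-key filters combine, as a Python sketch. The order in which the filters are applied, treating 0 as "no limit", counting each record as one name, and printing the nth, 2nth, ... qualifying set are all assumptions for illustration, not documented behavior. Grouping by outer key only works on input sorted by that key, which is consistent with the display programs running as a separate step that allows the input file to be sorted.

```python
from itertools import groupby
from typing import Dict, List

Record = Dict[str, str]


def select_for_display(records: List[Record], outer_key: str, inner_key: str,
                       maxin: int = 0, min_inner_sets: int = 0,
                       min_numb_names: int = 0, nth_outer_sets: int = 1,
                       maxout: int = 0) -> List[Record]:
    """Hypothetical reading of the display filters; 0 means "no limit".

    `records` must already be sorted by `outer_key`, because groupby() only
    merges adjacent records that share the same key.
    """
    selected: List[Record] = []
    qualifying = 0
    sets = groupby(records, key=lambda rec: rec[outer_key])
    for n_read, (_, group_iter) in enumerate(sets, start=1):
        if maxin and n_read > maxin:
            break                      # MAXIN: number of outer-key sets read
        group = list(group_iter)
        inner_sets = {rec[inner_key] for rec in group}
        if len(group) < min_numb_names or len(inner_sets) < min_inner_sets:
            continue                   # each record is counted as one name
        qualifying += 1
        if qualifying % nth_outer_sets:
            continue                   # keep the nth, 2nth, ... qualifying set
        selected.extend(group)
    return selected[:maxout] if maxout else selected  # MAXOUT: records written
```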


Sample Parameter File for CFMATDSP

***************************************************************
* Parm file for match report, original lines version (pfmatdsp.par)
***************************************************************
*MAXIN 1000
CLIENT "ANY CLIENT"
DDL_INP_FNAME ..\dict\parsout.ddl
DDL_INP_RNAME PARSOUT
INNER_KEY matched_indv_in_matched_hhld
INP_DDNAME ..\data\maout
MAXLINES_ON_APAGE 60
MIN_NUMB_NAMES 2
OUTER_KEY matched_hhld
PRN_DDNAME ..\data\madsp
TITLE "TRILLIUM DEMONSTRATION MATCH RULES"
TITLE2 "REPORT SHOWS ORIGINAL N/A LINES"
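A Python sketch of reading a display parameter file like the one above into a dictionary. It assumes that lines starting with * are comments (as with *MAXIN 1000, which leaves MAXIN unset), that the keyword and value are separated by whitespace, and that surrounding double quotes are not part of the value. It deliberately avoids shlex, which in its default POSIX mode would treat the backslashes in Windows paths as escape characters.

```python
from typing import Dict


def parse_parm_file(text: str) -> Dict[str, str]:
    """Read 'KEYWORD value' lines into a dict with upper-cased keywords.

    Lines starting with '*' are skipped as comments; backslashes are kept
    as-is, so Windows-style paths survive unchanged.
    """
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # quoted values such as TITLE "..."
        params[parts[0].upper()] = value
    return params


SAMPLE = r"""
* Parm file for match report, original lines version (pfmatdsp.par)
*MAXIN 1000
CLIENT "ANY CLIENT"
DDL_INP_FNAME ..\dict\parsout.ddl
MAXLINES_ON_APAGE 60
TITLE2 "REPORT SHOWS ORIGINAL N/A LINES"
"""
params = parse_parm_file(SAMPLE)
```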

Running the Matcher Display Program on UNIX and 32-Bit PC Platforms

Use the following command-line syntax to run cfmatdsp:

cfmatdsp -parmfile parm_file_name -parmecho echo_file_name

where:

cfmatdsp | Name of the driver program.
-parmfile | Keyword that indicates the parameter file follows.
parm_file_name | Name of the driver parameter file.
-parmecho | Keyword that indicates the parameter echo file follows.
echo_file_name | Optional file used by -parmecho to store processing error information.


IBM Mainframe Matcher Display Sample JCL

//*********************************************************
//* SAMPLE JCL TO RUN ORIGINAL MATCH DISPLAY PROGRAM (CFMATDSP)
//*********************************************************
//CFMATDSP EXEC PGM=CFMATDSP,REGION=5500K,COND=(0,NE),
//         PARM='/-PARMFILE PF -PARMECHO PE', REGION=0M
//STEPLIB  DD DSN=&BASEPREF.&TRILVER.LOADLIB,DISP=SHR
//         DD DSN=CEE.SCEERUN,DISP=SHR
//         DD DSN=CEE.SCEERUN2,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//TRILMSGS DD DUMMY
//SYSTMP01 DD UNIT=&UNIT,DISP=(NEW,PASS),
//         DSN=&&CFMDSP1,SPACE=(TRK,(300,10),RLSE)
//SYSTMP02 DD UNIT=&UNIT,DISP=(NEW,PASS),
//         DSN=&&CFMDSP2,SPACE=(TRK,(300,10),RLSE)
//PF       DD DISP=SHR,
//         DSN=&PROJPREF.&TRILVER.US.PARMLIB(PFMATDSP)
//PE       DD SYSOUT=*
//PARSOUT  DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(PARSOUT)
//MAOUT    DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DATA.MAOUT
//MADSP    DD SYSOUT=*
//*MADSP   DD SYSOUT=1,CHARS=(GS20),
//*        DCB=(RECFM=FB,LRECL=250,BLKSIZE=2500)


Display Program Errors

This is a list of error messages returned from the display program.

Table 7.8 CFMATDSP Error Messages

Message | Description
Missing <-parmfile> parameter. | The path and parameter file for the match display program is missing from the command line. Check the file path and name.
Must have a parm file. | The parameter file for the Matcher display program is missing from the command line. Check the file path and name.
Parm Processing Error, status = 2 | The parameter file for the match display program is present but incorrect. Check the file path and name.
Parm Processing Error, status = 3 | The parameter file for the Matcher display program is present but cannot be opened. Check the permissions on the file.
Parm Processing Error, status = 4 | The display program has processed the parameter file and encountered an error with a parameter entry. Use the parameter echo debugging process to determine the entry that is incorrect.
During dictionary open. | The data dictionary defined in DDL_INP_FNAME cannot be opened. Check the file path and name.
outer_key field too long. | The display program has processed the parameter file and has encountered an error with a parameter entry: the field defined in the OUTER_KEY parameter cannot exceed 32 bytes.
Missing parameter OUTER_KEY. | The match display program has processed the parameter file and has encountered an error with a parameter entry: the OUTER_KEY parameter has no value.
Inner key field too long. | The display program has processed the parameter file and has encountered an error with a parameter entry: the field defined in the INNER_KEY parameter cannot exceed 32 bytes.


Insufficient Memory. This error can occur if there is not enough system memory available at startup time of the Matcher.

Cannot access from the data dictionary. The field used in this operation is not defined in the DDL used for this process. Check the DDL and the definition of this field before running the process again. Use the Syntax Checker to check for errors, and use the DDL recalculation tool to ensure that the DDL record length is correct.


No input file specified. An input data file has not been specified in the INP_DDNAME parameter.

Unable to open file . The display program is unable to open the file specified in INP_DDNAME. Check the file path and name.

Unable to initialize for household read. The display program is reading an empty MAOUT file.

I/O error during read on of record for . The temporary directory that holds overflow for this process has run out of space.

Unable to get the value for . The value specified is not valid for this field. Check your DDL or the field positions and lengths used by the Matcher display program.


Closing file . An error occurred while trying to close the specified file. Check that the file has not been corrupted and that there is sufficient space available for the write operation to this file.


Running CFFXMDSP on UNIX and 32-Bit PC Platforms

Use the following command syntax to run cffxmdsp:

cffxmdsp -parmfile parm_file_name -parmecho echo_file_name

where:

cffxmdsp  Name of the driver program.
-parmfile  Keyword that specifies that the parameter file follows.
parm_file_name  Name of the driver parameter file.
-parmecho  Keyword that specifies that the parameter echo file follows.
echo_file_name  Optional file used by -parmecho to store processing error information.


IBM Mainframe Sample JCL for cffxmdsp

The following sample Job Control Language is used to run cffxmdsp:

//*********************************************************
//* SAMPLE JCL TO RUN FIXED MATCH DISPLAY PROGRAM (CFFXMDSP)
//*********************************************************
//CFFXMDSP EXEC PGM=CFFXMDSP,REGION=5500K,COND=(0,NE),
// PARM='/-PARMFILE PF -PARMECHO PE', REGION=0M
//STEPLIB DD DSN=&BASEPREF.&TRILVER.LOADLIB,DISP=SHR
// DD DSN=CEE.SCEERUN,DISP=SHR
// DD DSN=CEE.SCEERUN2,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//TRILMSGS DD DUMMY
//SYSTMP01 DD UNIT=&UNIT,DISP=(NEW,PASS),
// DSN=&&CFFDSP1,SPACE=(TRK,(300,10),RLSE)
//SYSTMP02 DD UNIT=&UNIT,DISP=(NEW,PASS),
// DSN=&&CFFDSP2,SPACE=(TRK,(300,10),RLSE)
//PF DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.PARMLIB(PFFMFXDSP)
//PE DD SYSOUT=*
//PARSOUT DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DDLLIB(PASOUT)
//MAOUT DD DISP=SHR,DSN=&PROJPREF.&TRILVER.US.DATA.MAOUT
//MAFXDSP DD SYSOUT=*
//*MAFXDSP DD SYSOUT=1,CHARS=(GS15),
//* DCB=(RECFM=FB,LRECL=133,BLKSIZE=1330)


Matcher Driver #2

Using CFMATCH with CKM

The cfmatch driver program can be used to match a file against itself (called window matching), or it may be used to match a file against a reference file (reference matching).

The input files are read one window key value at a time within each window key code number. (The input files are assumed to be sorted in window key code number, window key value sequence.) Within each window key value, the program first performs window matching among the transaction file records, then performs the reference match between the transaction file and the reference file.

This program also produces a statistics file for each window key code number that summarizes the results of the matching process, and it produces multiple output matching link files.

This program can be used to match at multiple window key codes in one pass of the input files. Each window key code value can have different matching fields and matching patterns files.
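The processing order described above can be sketched in Python. This is a simplified illustration, not the CFMATCH implementation: it assumes both inputs are already sorted by window key code and window key value, represents records as dictionaries with hypothetical field names (`window_code`, `window_key`, `record_id`), and takes a caller-supplied `same` function in place of the real field and pattern comparison. For brevity it also loads the reference file into memory, whereas the driver reads its inputs sequentially.

```python
from itertools import groupby
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Link = Tuple[str, str]


def window_key(rec: Record) -> Tuple[str, str]:
    # Hypothetical field names for the window key code and window key value.
    return rec["window_code"], rec["window_key"]


def cfmatch_sketch(
    trn: List[Record],
    ref: List[Record],
    same: Callable[[Record, Record], bool],
) -> Tuple[List[Link], List[Link]]:
    """Return (window_links, reference_links); inputs must be sorted by window_key."""
    ref_windows = {k: list(g) for k, g in groupby(ref, key=window_key)}
    window_links: List[Link] = []
    reference_links: List[Link] = []
    for k, group in groupby(trn, key=window_key):
        window = list(group)
        # Phase 1: window match, transaction records against each other.
        for i, a in enumerate(window):
            for b in window[i + 1:]:
                if same(a, b):
                    window_links.append((a["record_id"], b["record_id"]))
        # Phase 2: reference match, the same window against reference records.
        for a in window:
            for r in ref_windows.get(k, []):
                if same(a, r):
                    reference_links.append((a["record_id"], r["record_id"]))
    return window_links, reference_links
```

Within each window key value the window-match phase runs before the reference-match phase, mirroring the order described in this section.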


CFMATCH Driver Parameters

All required parameters appear in bold and shaded. Table 7.9 CFMATCH Matcher Parameters

Parameter Value Description

ADD_CRLF Y or Blank Set to Y to append a carriage return/line feed (2 bytes) to the end of the output matching link files.

BYPASS_WKEY_CODES Numeric List of window key code values to exclude from the matching process.

GENERATE_RECIPROCAL_TRAN Y or Blank Set to Y to create a reciprocal output matching link.

INP_DDL File name Name of the input DDL and the record name within the DDL, separated by a comma. Describes the layout of the input files. REQUIRED

MAX_REF_WINDOW_SIZE Numeric Limits the size of the reference Match window to a specific number of records. (Default is 50000.) See the note following this table.

MAX_TRN_WINDOW_SIZE Numeric Limits the size of the transactional Match window to a specific number of records. (Default is 50000.) Not used when matching on the window key code value defined by the PREV_LEVEL1_WINDOW parameter. See the note following this table.

MATCH_WKEY_NN File names Required comma-separated entries for this parameter: first, the file that contains the matching fields; second, the file that contains the matching patterns; third, the file that contains the matching statistics. For example:
MATCH_WKEY_01 ..\PARMS\HHFLDS, ..\PARMS\HHPATS, ..\DATA\WINKEY01.STA
One parameter entry is required for each unique window key code value on the input files. Valid values for NN are 01 through 30.

PREV_LEVEL1_WINDOW Numeric The window key code value that defines the window key required by Customer Key Manager.


PRINT_NTH_COUNT Numeric Prints the count of every nth record read. If 0 or not specified, no counts are printed.

REF_WKEY_CODE_FIELD_NAME Field name Name of the field that contains the 2-byte window key code value on the input reference file. REQUIRED

REF_WKEY_FIELD_NAME Field name Name of the field that contains the window key value on the input reference file. REQUIRED

REF_RECORD_ID_FIELD_NAME Field name Name of the field used to generate the output link file from the reference file. Maximum field length is 19. REQUIRED

REFMAT_NL1_DDNAME File name Output link file that contains the matching links generated during the reference match phase. Links are created for all window key code values EXCEPT the one defined by the PREV_LEVEL1_WINDOW parameter. REQUIRED

REFMAT_YL1_DDNAME File name Output link file that contains the matching links generated during the reference match phase. Links are created ONLY for the window key code value defined by the PREV_LEVEL1_WINDOW parameter.

REF_DDNAME File name Reference input file name. REQUIRED

SAVE_KEY_PAT_NUMBS Y or Blank If set to Y, appends the window key code value (2 bytes) and the matching pass pattern number (3 bytes) to the output matching link files.

TABLE_DDNAME File name File name of the table that contains large window key values. Each table entry is 2 bytes for the window key number followed by the window key value. Used when large Match windows exist and you want to limit the number of records added to the Match window, so that memory usage is minimized and performance is maximized. See “Handling Large Window Keys” on page 7-18.


TABLE_RECL* Numeric Record length of the table that contains large window key entries. The length must include any platform-specific control characters. See “Handling Large Window Keys” on page 7-18.

TIE_BREAKER_FIELD_NAME Field name DDL field name that contains the data used to determine where to stop adding records to the Match window. When this field does not contain the same value as the records previously added to the Match window, the process stops adding records and the match executes for the set of records in the Match window at that time.

TRN_DDNAME File name Transactional input file name. REQUIRED

TRN_WKEY_CODE_FIELD_NAME Field name Name of the field that contains the 2-byte window key code value on the input transactional file. REQUIRED

TRN_WKEY_FIELD_NAME Field name Name of the field that contains the window key value on the input transactional file. Maximum field length is 50 bytes. REQUIRED

TRN_RECORD_ID_FIELD_NAME Field name Name of the field used to generate the output link file from the transactional file. Maximum field length is 19. REQUIRED

WINMAT_NL1_DDNAME File name Output link file that contains the matching links generated during the window match phase. Links are created for all window key code values EXCEPT the one defined by PREV_LEVEL1_WINDOW. REQUIRED

WINMAT_YL1_DDNAME File name Output link file that contains the matching links generated during the window match phase. Links are created ONLY for the window key code value defined by PREV_LEVEL1_WINDOW.

WKEY_CODES_NO_WINMAT Numeric List of window key code values to exclude from the window matching portion of the program.


Special caution! The MAX_REF_WINDOW_SIZE parameter controls how many reference records are added to the Match window. If a window key value has more reference records than this limit, the additional reference records are not compared during the reference match. Use this parameter with care.

Contact Technical Support (1-978-901-0000) for additional information about controlling Match window sizes.
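The effect described in the caution above can be illustrated with a short sketch. It assumes, for illustration only, that reference records enter the Match window in input order, so any records beyond MAX_REF_WINDOW_SIZE for a single window key value are simply never compared. `load_ref_window` is a hypothetical helper, not part of the product.

```python
from typing import List, Tuple


def load_ref_window(records: List[str], max_size: int = 50000) -> Tuple[List[str], List[str]]:
    """Split one window key value's reference records into (compared, skipped).

    Hypothetical helper: assumes records are admitted in input order and the
    window stops accepting records once max_size is reached.
    """
    compared = records[:max_size]
    skipped = records[max_size:]  # these never take part in the reference match
    return compared, skipped


if __name__ == "__main__":
    compared, skipped = load_ref_window([f"ref{i}" for i in range(5)], max_size=3)
    print(f"compared {len(compared)}, skipped {len(skipped)}: {skipped}")
```

A check like this, run against a count of records per window key value, shows which window keys would lose comparisons at a given limit.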

Sample CFMATCH Matcher Driver Parameter File

***************************************************************
* PFMATCH - SAMPLE PARAMETER FILE FOR CFMATCH DRIVER
***************************************************************
INP_DDL ..\dict\input.ddl,MATREC
TRN_DDNAME ..\data\tran.srt
REF_DDNAME ..\data\mast.srt
WINMAT_NL1_DDNAME ..\data\winmat.nl1
WINMAT_YL1_DDNAME ..\data\winmat.yl1
REFMAT_NL1_DDNAME ..\data\refmat.nl1
REFMAT_YL1_DDNAME ..\data\refmat.yl1
TRN_WKEY_CODE_FIELD_NAME window_code
TRN_WKEY_FIELD_NAME window_key
REF_WKEY_CODE_FIELD_NAME window_code
REF_WKEY_FIELD_NAME window_key
TRN_RECORD_ID_FIELD_NAME record_id
REF_RECORD_ID_FIELD_NAME level1key
PREV_LEVEL1_WINDOW 04

*location of matching fields, patterns, statistics

MATCH_WKEY_01 ..\parms\hhflds,..\parms\hhpats,..\data\winkey01.sta
MATCH_WKEY_02 ..\parms\hhflds,..\parms\hhpats,..\data\winkey02.sta
MATCH_WKEY_03 ..\parms\ssnflds,..\parms\ssnpats,..\data\winkey03.sta
MATCH_WKEY_04 ..\parms\rgflds,..\parms\rgpats,..\data\winkey04.sta
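To check a parameter file like the sample above before a run, its layout can be read with a small Python sketch. It assumes, based only on the sample, that each entry is a keyword followed by whitespace and a value, and that lines beginning with `*` are comments; the helper names are hypothetical. The second function applies the MATCH_WKEY_NN rules from Table 7.9: NN must be 01 through 30, and each entry needs three comma-separated file names.

```python
from typing import Dict, List


def parse_parm_file(text: str) -> Dict[str, str]:
    """Read KEYWORD value lines; blank lines and '*' comment lines are skipped."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        parts = line.split(None, 1)  # keyword, then the rest of the line
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def match_wkey_entries(params: Dict[str, str]) -> Dict[int, List[str]]:
    """Check MATCH_WKEY_NN entries: NN in 01-30, three comma-separated files."""
    entries: Dict[int, List[str]] = {}
    for keyword, value in params.items():
        if not keyword.startswith("MATCH_WKEY_"):
            continue
        nn = int(keyword[len("MATCH_WKEY_"):])
        if not 1 <= nn <= 30:
            raise ValueError(f"{keyword}: NN must be 01 through 30")
        files = [f.strip() for f in value.split(",")]
        if len(files) != 3:
            raise ValueError(f"{keyword}: expected fields, patterns, and statistics files")
        entries[nn] = files
    return entries
```

In this sketch a repeated keyword simply overwrites the earlier value; how the driver itself treats duplicates is not covered here.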


IBM Mainframe SAMPLE JCL for CFMATCH

//**********************************************************
//* SAMPLE JCL TO RUN MATCHER PROGRAM (CFMATCH)
//**********************************************************
//CFMATCH EXEC PGM=CFMATCH,REGION=5500K,COND=(0,NE),
// PARM='/-PARMFILE PF', REGION=0M
//STEPLIB DD DSN=&BASEPREF.&TRILVER.LOADLIB,DISP=SHR
// DD DSN=CEE.SCEERUN,DISP=SHR
// DD DSN=CEE.SCEERUN2,DISP=SHR
//INPUTDDL DD DISP=SHR,
// DSN=&PROJPREF.&TRILVER.US.DDLLIB(MATCH)
//CEEDUMP DD DUMMY,DCB=BLKSIZE=133
//SYSOUT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//PF DD DISP=SHR,
// DSN=&PROJPREF.&TRILVER.US.PARMLIB(PFMATCH)
//PE DD SYSOUT=*
//* INPUT FILES
//MASTINP DD DISP=SHR,
// DSN=&PROJPREF.&TRILVER.US.MASTER.INPUT
//TRANINP DD DISP=SHR,
// DSN=&PROJPREF.&TRILVER.US.TRAN.INPUT
//* OUTPUT FILES
//WINNL1 DD DISP=(NEW,CATLG,DELETE),
// UNIT=&UNIT,
// SPACE=(TRK,(50,10),RLSE),
// DCB=(RECFM=FB,LRECL=38,BLKSIZE=0),
// DSN=&PROJPREF.&TRILVER.US.WINMAT.NL1
//WINYL1 DD DISP=(NEW,CATLG,DELETE),
// UNIT=&UNIT,
// SPACE=(TRK,(50,10),RLSE),
// DCB=(RECFM=FB,LRECL=38,BLKSIZE=0),
// DSN=&PROJPREF.&TRILVER.US.WINMAT.YL1
//REFNL1 DD DISP=(NEW,CATLG,DELETE),
// UNIT=&UNIT,
// SPACE=(TRK,(50,10),RLSE),
// DCB=(RECFM=FB,LRECL=38,BLKSIZE=0),
// DSN=&PROJPREF.&TRILVER.US.REFMAT.NL1
//REFYL1 DD DISP=(NEW,CATLG,DELETE),
// UNIT=&UNIT,
// SPACE=(TRK,(50,10),RLSE),
// DCB=(RECFM=FB,LRECL=38,BLKSIZE=0),
// DSN=&PROJPREF.&TRILVER.US.REFMAT.YL1
//* MATCHING FIELDS AND PATTERNS
//HHFLDS DD DISP=SHR,
// DSN=&PROJPREF.&TRILVER.US.PARMLIB(HHFLDS)
//HHPATS DD DISP=SHR,
// DSN=&PROJPREF.&TRILVER.US.PARMLIB(HHPATS)
//SSNFLDS DD DISP=SHR,
// DSN=&PROJPREF.&TRILVER.US.PARMLIB(SSNFLDS)
//SSNPATS DD DISP=SHR,
// DSN=&PROJPREF.&TRILVER.US.PARMLIB(SSNPATS)
//RGFLDS DD DISP=SHR,
// DSN=&PROJPREF.&TRILVER.US.PARMLIB(RGFLDS)
//RGPATS DD DISP=SHR,


CFMATCH Program Errors

This is a list of error messages returned by the CFMATCH driver program. Table 7.10 CFMATCH Error Messages

Message Description

TRN WINDOW SET FAILED The process failed while reallocating memory to load a set of transaction records. This can happen if the internal system maximum of 50000 records is exceeded. To work around this, set MAX_TRN_WINDOW_SIZE to a higher value. Also check whether the IEFUSI or IEALIMIT exits are active on this system; if they are, raise the limits they impose. Set REGION=0M and specify DEBUG=YES.


Matcher Driver #3

Using CFWINMAT with CKM

The CFWINMAT driver program matches input records, which are read one window key set at a time. (The input data is assumed to be sorted in window key sequence, as referenced in the parameter file.)

The CFWINMAT driver program produces three files:

1. An output file that contains the input records with the matched link appended to each record, in the field defined by the LINK_KEY parameter.
2. A statistics file that summarizes the results of the matching process.
3. A links file that indicates which matched records are linked together. These links are created using the unique record ID field defined by CURR_KEY.

This program can bypass records from the input file prior to matching. Bypass functionality is invoked within the rules parameter file. See the CFWINKEY rules definition for how bypass functionality is coded in the rules parameter file.

CFWINMAT Parameters

Note that all required parameters appear in bold and shaded. Table 7.11 Parameters for the CFWINMAT Program

Parameter Value Description

CURR_KEY Field name Name of the unique record ID key field in the DDL. REQUIRED

FINAL_TRANS_DDNAME File name Name of the output match key links file. REQUIRED

INP_DDL DDL name File name of the input DDL. REQUIRED

INP_RECORD_NAME Record name Record name of the input DDL. REQUIRED


LARGE_WINDOW_KEY_NUMBER* Numeric Number of window key entries (specified in TABLE_DDNAME) that must be processed for this match.

LRECL Numeric Record length of the table that contains large window key entries. Length must include any platform-specific control characters.

LINK_KEY Field name Name of the new match key field in DDL. REQUIRED

MATCH_CODE Y, Blank Y appends the matching pattern number (three characters) and two blanks to the end of the records in the output match key links file.

MAX_WINDOW_SIZE Numeric Maximum number of records to store in window for matching. Default is all records.

OUT_DDNAME File name Name of the output file. REQUIRED

PRINT_NTH_COUNT Numeric Prints the count of every nth record read. If ‘0’ or not specified, no counts are reported.

RULES_DDNAME Parameter file Name of the parameter file that contains the window key rules and the locations of the matching field and pattern parameter files. REQUIRED

RULES_ECHO File name Name of the listing file produced when the file named in the RULES_DDNAME parameter is processed.

SECOND_SIDE_TRN Y or Blank Set to Y to create a reciprocal link and write it to the output file named in the FINAL_TRANS_DDNAME parameter.


TABLE_DDNAME * Text File name of the table to contain large window key values. This is used when large Match windows will exist and the user wants to limit the number of records added so that memory usage is minimized and performance is maximized. Format of the table entries: two bytes for the window key number, followed by the window key value.

TIE_BREAKER_FIELD_NAME* Text DDL field name that contains the data to determine where to stop adding records to the Match window. When this field does not contain the same value as the records previously added to the Match window, the process will stop and the match will be executed for the set of records present in the Match window at that time.


TRANSITIVE_MODE Y or Blank Set to Y to keep matched records in the Match window for subsequent matching. With window matching, it is possible that ‘A’ matches ‘B’ and ‘A’ does not match ‘C’ due to differences in data elements, while ‘B’ would match ‘C’ if the two were compared. Invoking this option allows ‘B’ to be compared to ‘C’. Using this parameter can increase matching time, because more comparisons are performed. If the FINAL_TRANS_DDNAME output is being created, all additional matching links created by this parameter are generated from the base record in the window, not from the record it matched. For example: if A matches B, the link created is A --> B; if A does not match C, but B matches C, the link created is A --> C. Trillium Software recommends that you do not use propagation if this parameter is set to ‘Y’.
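The link-generation rule for TRANSITIVE_MODE can be sketched as follows. The sketch is a simplification under stated assumptions: it considers only the links generated from the first (base) record of a single Match window, and a caller-supplied `same` function stands in for the Matcher's field and pattern comparison. With `transitive=True`, records that have already matched the base record stay available for comparison, and any record they match is still linked from the base record (A --> C), as described above.

```python
from typing import Callable, List, Tuple


def base_record_links(
    window: List[str],
    same: Callable[[str, str], bool],
    transitive: bool,
) -> List[Tuple[str, str]]:
    """Links generated from the base (first) record of one Match window."""
    base, pending = window[0], list(window[1:])
    members: List[str] = []  # records already linked to the base record
    links: List[Tuple[str, str]] = []
    progress = True
    while progress:  # repeat until no further record joins the group
        progress = False
        for rec in list(pending):
            direct = same(base, rec)
            # With TRANSITIVE_MODE, matched records remain in the window, so a
            # record can also join by matching one of them.
            indirect = transitive and any(same(m, rec) for m in members)
            if direct or indirect:
                links.append((base, rec))  # link is always from the base record
                members.append(rec)
                pending.remove(rec)
                progress = True
    return links
```

With the pairs A/B and B/C matching, `transitive=False` yields only A --> B, because C is never compared with B; `transitive=True` adds A --> C.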

TRN_INP_DDNAME File name Name of the sorted input file.

WINDOW_KEY Field name The DDL field that contains the value of the window key.

WRITE_ALL_RECS Y or Blank Set to Y to write out all records to the output file.


Running CFWINMAT on UNIX and 32-bit PC Platforms

Use the following command syntax to run cfwinmat:

cfwinmat -parmfile parm_file_name -parmecho echo_file_name

where:

cfwinmat  Name of the driver program.
-parmfile  Keyword that indicates the parameter file follows.
parm_file_name  Name of the driver parameter file.
-parmecho  Keyword that indicates the parameter echo file follows.
echo_file_name  File used by -parmecho to store processing error information (optional).


IBM Mainframe Sample JCL for CFWINMAT

The following sample JCL is used to run cfwinmat:

//***********************************************************
//* CFWINMAT - WINDOW MATCH
//***********************************************************
//CFWINMAT EXEC PGM=CFWINMAT,
// PARM='/-PARMFILE PF -PARMECHO PE',
// REGION=0M
//STEPLIB DD DSN=&PROJPREF.LOADLIB,DISP=SHR
// DD DSN=&PROJPREF.LINKLIB,DISP=SHR
//PARSOUT DD UNIT=&UNIT,VOL=SER=&VOL,DISP=SHR,
// DSN=&PROJPREF.DDLLIB(PARSOUT)
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//MATSTAT DD SYSOUT=*
//SYSUDUMP DD DUMMY,DCB=BLKSIZE=133
//PF DD *
***********************************************************
* PARMFILE FOR CFWINMAT
***********************************************************
WRITE_ALL_RECS Y
INP_DDL PARSOUT
INP_RECORD_NAME PARSOUT
TRN_INP_DDNAME INPUT
WINDOW_KEY window_key_01
CURR_KEY record_id
LINK_KEY match_link
RULES_DDNAME RULES
RULES_ECHO
OUT_DDNAME OUTPUT
FINAL_TRANS_DDNAME LINKS
//*
//* INPUT FILE
//*
//INPUT DD DISP=SHR,
// DSN=&PROJPREF.MATCH.INP
//RULES DD DISP=SHR,
// DSN=&PROJPREF.PARMLIB(RULES)
//HHLDPATS DD DISP=SHR,
// DSN=&PROJPREF.PARMLIB(HHLDPATS)
//HHLDFLDS DD DISP=SHR,
// DSN=&PROJPREF.PARMLIB(HHLDFLDS)
//*
//* OUTPUT FILES
//*
//OUTPUT DD UNIT=&UNIT,
// VOL=SER=&VOL,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(TRK,(200,100),RLSE),
// DCB=(RECFM=FB,LRECL=&LRECL,BLKSIZE=&BLKSIZE),
// DSN=&PROJPREF.MATCH.OUT
//*
//LINKS DD UNIT=&UNIT,
// VOL=SER=&VOL,
// DISP=(NEW,CATLG,DELETE),
// SPACE=(TRK,(100,10),RLSE),
// DCB=(RECFM=FB,LRECL=39,BLKSIZE=23166),
// DSN=&PROJPREF.MATCH.LNK
//*


Tuning the Match Results

Tuning the Matcher involves running data through one of the Matcher driver programs and reviewing the resulting data. Matcher tuning is best performed using a window match process, because comparison information about the data is readily available in the form of match keys, patterns and statistics. This information indicates which Matcher business rules can be changed to improve the results.

By adjusting Matcher business rules, you can correct ‘over matching’ (records that matched but should not have) or ‘under matching’ (records that should have matched but did not). Tools in the Control Center simplify reviewing the matching results and modifying the business rules.

Statistics written to the Matcher statistics file give a broad indication of how records were matched, suspected, or not matched at all. Reviewing this file is a good place to begin the tuning process, but reviewing the resulting data by match key is the only way to make informed business rule decisions.

This section discusses how to review matched data and describes the method for changing the business rules. Because matching depends on each customer's specific business rules, the appropriate changes differ from customer to customer. Therefore, this section describes the process for changing the business rules but cannot advise on specific changes.


Getting Started

After data has passed through the Matcher, review the Matcher statistics file. This file provides information about the fields used to compare data between records, the match routines applied to these fields, and the number of records that were matched together by each business rule pattern. (Other statistical information is available, but for tuning purposes these three areas are of particular interest.)

Starting with the information displayed in the pattern section, you can gain an understanding of how records have been brought together. Consider the following example section of a Matcher statistics file:

P 100 10 AA
S 102 1005 AB
P 104 11 BA

1. A good starting point is to look for suspect patterns (patterns beginning with ‘S’) that account for a significant percentage of the records. In this example, pattern ID 102 (the three-digit number following the pattern type) has a large number of records associated with it (1005). This indicates that either the pattern or the score for a particular field should be modified, either to match these records or to split them apart.

Next, look at the comparison fields used for the match. The example uses the following field list entries, as shown in the statistics file:

street_name scores(95,87,82) routines(streets) fldnames(pr_best_st_tl)
last_name scores(100,98,94) routines(spelling) fldnames(pr_last_01)

2. From this, we can determine that the first letter grade in the pattern results from the comparison of the street name data and the second results from the last name data. We want to analyze the data in the second comparison field, because the pattern shows that this field is what distinguishes the suspect pattern from the patterns around it.

3. Next, analyze the last name field for the records that were suspected by pattern 102. Perform this analysis with the Control Center Matcher Tuner tool.
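The first two steps can be automated with a short Python sketch that scans pattern lines like those in the example. It assumes that each line has four whitespace-separated columns (status, pattern ID, record count, letter grades), that the letter positions follow the order of the field list entries, and that any grade other than ‘A’ marks a field worth reviewing. The 50% threshold for a "significant" share is an arbitrary illustration, not a Trillium recommendation.

```python
from typing import List, Tuple


def suspect_patterns(
    stats_lines: List[str],
    fields: List[str],
    min_share: float = 0.5,
) -> List[Tuple[int, int, List[str]]]:
    """Return (pattern_id, count, fields_to_review) for large suspect patterns."""
    rows = []
    for line in stats_lines:
        status, pattern_id, count, grades = line.split()
        rows.append((status, int(pattern_id), int(count), grades))
    total = sum(count for _, _, count, _ in rows)
    flagged = []
    for status, pattern_id, count, grades in rows:
        if status == "S" and total and count / total >= min_share:
            # Letter position i corresponds to field list entry i.
            review = [fields[i] for i, g in enumerate(grades) if g != "A"]
            flagged.append((pattern_id, count, review))
    return flagged


stats = ["P 100 10 AA", "S 102 1005 AB", "P 104 11 BA"]
print(suspect_patterns(stats, ["street_name", "last_name"]))
# [(102, 1005, ['last_name'])]
```

The result points at the same place as the manual walkthrough: pattern 102 and the last name field.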


Analyzing the Data

After you choose the pattern ID to evaluate, use the Matcher Tuner to display the output of the Matcher. See Chapter 4 of the Control Center manual for more information.

When using the Matcher Tuner, you need to first select the level of match and pattern ID you want to view, then you can view the data contained in the last name field and decide the best course of action to resolve the suspect issue.

If you decide that records with match pattern ID 102 should match, edit the pattern in the appropriate parameter file and change the ‘S’ to a ‘P’. You can do this from the Matcher Tuner: launch the Matcher Field List Editor tool, select the appropriate pattern file, and change the character. The next time the Matcher is run, the records will match.

If you decide that the records should not match, follow the same steps, but change the ‘S’ to an ‘F’. This prevents the records from matching when the conditions for pattern 102 are met.

Other courses of action may be required if changing the pattern status is not enough to determine a pass or fail condition for the data provided. For example, you can add fields to the match that pull records together or separate them based on the values in those fields.

Using “Tie-Breaking” Fields

You can alter the results of a match by introducing “tie-breaking” fields into the match process. For example, consider the following Individual Suspect matches:

First  Middle  Last    HSNO  Street Name  Postal Code  Tax ID
John   C       Nicoli  12    Main         01879        00x11x22x
John           Nicoli  12    Main         01879        33x222x11


Given a standard set of comparison routines to compare these two records, the middle name would be the only field that does not receive an exact passing grade.

By introducing a field comparison of Tax_ID and by adjusting the appropriate patterns, these records could be categorized as either a Pass, Suspect, or Fail. The user’s selection of the proper comparison routine has a significant impact on the rating of a match through the grading system. Within each routine, points are deducted for various exceptions to matches.
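How adding a Tax ID comparison can change the outcome can be shown with a small sketch. It rests on assumptions not stated in this manual: each field's score is turned into a letter by comparing it against that field's `scores(...)` thresholds in order (A for the first threshold, B for the second, and so on, otherwise F), the letters are joined into a pattern, and the pattern is looked up in a table of P/S/F statuses, with unlisted patterns treated as Fail. The scores and pattern tables below are invented for illustration.

```python
from typing import Dict, List


def grade(score: int, thresholds: List[int]) -> str:
    """Assumption: 'A' if the score meets the first threshold, 'B' the second,
    and so on; 'F' if it meets none. Supports up to five thresholds."""
    for letter, minimum in zip("ABCDE", thresholds):
        if score >= minimum:
            return letter
    return "F"


def classify(scores: List[int], thresholds: List[List[int]], patterns: Dict[str, str]) -> str:
    """Build the letter pattern for one record pair and look up P, S, or F."""
    pattern = "".join(grade(s, t) for s, t in zip(scores, thresholds))
    return patterns.get(pattern, "F")  # assumption: unlisted patterns fail


# Hypothetical scores for first, middle, and last name (middle: C vs blank).
name_only = classify([100, 90, 100], [[100, 90]] * 3, {"ABA": "S"})
# Adding a Tax ID comparison (00x11x22x vs 33x222x11 scores 0 here).
with_tax_id = classify([100, 90, 100, 0], [[100, 90]] * 3 + [[100]], {"ABAA": "P", "ABAF": "F"})
print(name_only, with_tax_id)  # S F
```

Under these assumptions, the same pair moves from Suspect to Fail once the differing Tax IDs are compared; matching Tax IDs (pattern ABAA) would instead make it a Pass.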

The following sections define the routines, show how each comparison routine rates exceptions, and show how each routine returns its scores.

Using Parmvals with the Matcher Comparison Routines

Parmvals can be added to the selected Matcher Routines to provide scoring unique to specific data in the fields compared. The following table lists the Matcher comparison routines and their available parmval options.

Parmvals are case-sensitive.

Syntax

house_number scores(98) routines(houseno) fldnames(pr_best_number) parmval(NORANGE)
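A field list entry such as the syntax example above can be split into its parts with the following Python sketch. It is an illustration, not a Trillium tool: it assumes each keyword group has the form `keyword(value, ...)` without nested parentheses, treats everything before the first group as the entry name, and accepts keywords in either case (the examples in this chapter use both). Values are kept exactly as written, because parmvals are case-sensitive.

```python
import re
from typing import Dict, List, Union

# keyword(value, value, ...) with no nested parentheses
GROUP = re.compile(r"(\w+)\(([^()]*)\)")

Entry = Dict[str, Union[str, List[str], List[int]]]


def parse_field_entry(line: str) -> Entry:
    """Split a field list entry into its name and keyword groups."""
    groups = list(GROUP.finditer(line))
    if not groups:
        raise ValueError(f"no keyword(...) groups in: {line!r}")
    entry: Entry = {"name": line[: groups[0].start()].strip()}
    for m in groups:
        keyword = m.group(1).lower()  # SCORES and scores both appear in examples
        values = [v.strip() for v in m.group(2).split(",")]
        # Values are left untouched because parmvals are case-sensitive.
        entry[keyword] = [int(v) for v in values] if keyword == "scores" else values
    return entry


print(parse_field_entry(
    "house_number scores(98) routines(houseno) fldnames(pr_best_number) parmval(NORANGE)"
))
```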


Comparison Routine and Parmval Details

The following sections describe the scoring criteria for each Matcher comparison routine and the associated parmvals.

ABSOLUTE Routine

The ABSOLUTE routine compares two fields and looks for an EXACT match only. This table describes the scoring values.

Score  Description
100    Exact match (including blank versus blank)
0      Any exception

FIELD 1   FIELD 2   RESULTING SCORE
JOHN      JOHN      100
STEVE     STEVE     100
LISA      LISA      100
LISA      LAURA     0
(BLANK)   (BLANK)   100

Example

FIRST_NAME SCORES(100,0) ROUTINES(ABSOLUTE) FLDNAMES(FIRST_NAME)
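The ABSOLUTE scoring rule reduces to a single comparison. This Python sketch assumes the two field values are compared exactly as stored, with a blank field represented here as an empty string; the function name is illustrative only.

```python
def absolute_score(field1: str, field2: str) -> int:
    """ABSOLUTE routine scoring: 100 for an exact match, including blank
    versus blank; 0 for any difference."""
    return 100 if field1 == field2 else 0


# Rows from the example table above; (BLANK) is shown as an empty string.
EXAMPLES = [
    ("JOHN", "JOHN", 100),
    ("STEVE", "STEVE", 100),
    ("LISA", "LISA", 100),
    ("LISA", "LAURA", 0),
    ("", "", 100),
]

for f1, f2, expected in EXAMPLES:
    assert absolute_score(f1, f2) == expected, (f1, f2)
```

Note that blank versus blank scores 100 here, whereas the APTNO routine described next scores blank versus blank lower.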


APTNO Routine

The APTNO routine compares two apartment numbers. This routine assumes that the fields are right-justified. If the two fields do not match exactly, the routine adjusts the two field lengths and starting positions by excluding leading blanks. This table describes the scoring values.

Score  Description
100    Exact match (excluding blank versus blank)
99     Blank field value versus anything
99     All zeroes field value versus anything
98     Blank field value versus blank field value
98     All zeroes field value versus all zeroes field value

If the fields do not match according to the preceding criteria, the following actions are performed to determine whether there is a match:

100    Remove all leading blanks and characters that are not digits or letters from both strings, and left-justify them. The resulting score is 100.

The routine then deducts from the 100 score as follows, further adjusting the match:

Deduction  Reason
-5         For character errors (such as transpositions, insertions, mismatches, or extra characters at the end of either field after insertions are considered).
-10        If the number of character errors is more than 25% of the length of the longer field.
-25        If the number of character errors is more than 50% of the length of the longer field.

Example

APARTMENT SCORES(100, 99) ROUTINES(APTNO) FLDNAMES(PR_DWEL1_NBR)
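The APTNO scoring steps can be sketched in Python, but several details are assumptions, because this manual does not define them: character errors are counted with an optimal-string-alignment edit distance (so a transposition counts as one error), each error deducts 5 points, only the larger of the 25% and 50% penalties is applied, all non-alphanumeric characters are removed during normalization, and the score is floored at 0. Treat the numbers it produces as illustrative, not as the routine's actual output.

```python
import re


def osa_distance(s: str, t: str) -> int:
    """Edit distance that counts an adjacent transposition as one error."""
    d = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        d[i][0] = i
    for j in range(len(t) + 1):
        d[0][j] = j
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(s)][len(t)]


def aptno_score(a: str, b: str) -> int:
    def blank(s: str) -> bool:
        return s.strip() == ""

    def zeros(s: str) -> bool:
        return not blank(s) and set(s.strip()) == {"0"}

    # Special cases from the scoring table.
    if (blank(a) and blank(b)) or (zeros(a) and zeros(b)):
        return 98
    if blank(a) or blank(b) or zeros(a) or zeros(b):
        return 99
    if a == b:
        return 100
    # Normalize: keep only letters and digits (left-justified by construction).
    na = re.sub(r"[^0-9A-Za-z]", "", a)
    nb = re.sub(r"[^0-9A-Za-z]", "", b)
    errors = osa_distance(na, nb)
    longer = max(len(na), len(nb))
    score = 100 - 5 * errors  # assumption: 5 points per character error
    if errors > 0.50 * longer:  # assumption: only the larger penalty applies
        score -= 25
    elif errors > 0.25 * longer:
        score -= 10
    return max(score, 0)
```

Under these assumptions, "  12A" and "12-A" normalize to the same value and score 100, while the transposition "12AB" versus "12BA" scores 95.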

APTNO When Parmval (01)

The 01 parmval is used with the ‘APTNO’ routine to transpose, and then compare, apartment numbers. This table describes these values.

Score  Description
100    Exact match (excluding blank versus blank). For example, 25B vs. B25 scores 100; 103 vs. 301 scores 65.
99     Blank field value versus blank field value.
99     All zeroes field value versus all zeroes field value.
98     Blank field value versus anything.
98     All zeroes field value versus anything.

Example

APARTMENT SCORES(100, 99, 98) ROUTINES(APTNO,NONBLANK) FLDNAMES(PR_DWEL1_NBR) PARMVAL(01)

FIELD 1   FIELD 2   RESULTING SCORE
T504      504T      100
(BLANK)   (BLANK)   99
T504      (BLANK)   98


ARRAY1 Routine

ARRAY1 When Parmval (ARRAY1,n)

The ARRAY1,n parmval is used with the ARRAY1 routine to compare segments of a field to segments of another field, and to determine the relationship to blank values. The ‘n’ is the number of bytes per segment to compare.

Blanks and zeros are considered to be the same value.

Example

The phone_number field has a length of 50 characters, but phone numbers are typically 10 digits long. By using ARRAY1,10, you can divide the 50-character field into five segments of 10, so that all five segments of one field can be compared to all five segments of another field in a record.

PHONE NUMBERS SCORES(100) ROUTINES(ARRAY1,NONZERO) FLDNAMES(PHONE_NUMBERS) PARMVAL(ARRAY1,10)

Scoring Values

Score Description

100 Exact match; at least one field is non-blank on both records.

90 At least one non-blank cell in one record can be found in any cell in the other record.

75 All cells are blank on one record and non-blank on the other record.

65 All cells are blank on both records.
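The ARRAY1 scoring rules can be illustrated with a small Python sketch. It is not the Matcher's implementation: the function names `split_cells`, `is_blank`, and `array1_score` are hypothetical, a cell containing only blanks and/or zeros is assumed to count as blank, and a score of 0 is assumed when no non-blank cell is shared, because the table does not list that case.

```python
def split_cells(value: str, n: int) -> list:
    """Split a fixed-width field into n-byte segments (the ARRAY1,n cells)."""
    return [value[i:i + n] for i in range(0, len(value), n)]


def is_blank(cell: str) -> bool:
    """Assumption: a cell made up only of blanks and/or zeros counts as blank."""
    return cell.strip(" 0") == ""


def array1_score(field1: str, field2: str, n: int) -> int:
    """Hypothetical sketch of ARRAY1,n scoring: a cell may match ANY cell."""
    filled1 = [c for c in split_cells(field1, n) if not is_blank(c)]
    filled2 = [c for c in split_cells(field2, n) if not is_blank(c)]
    if not filled1 and not filled2:
        return 65  # all cells blank on both records
    if not filled1 or not filled2:
        return 75  # all cells blank on one record only
    if field1 == field2:
        return 100  # exact match with non-blank content
    if any(cell in filled2 for cell in filled1):
        return 90  # a non-blank cell appears in any cell of the other record
    return 0  # assumption: score when no non-blank cell is shared


home = "6175551234" + " " * 40
cell2 = " " * 10 + "6175551234" + " " * 30
print(array1_score(home, cell2, 10))  # 90
```

Here the same telephone number sits in the first cell of one record and the second cell of the other, which ARRAY1 still treats as a cell match.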

ARRAY2 Routine

ARRAY2 When Parmval (ARRAY2,n)

The ARRAY2,n parmval is used with the ARRAY2 routine to compare segments of a field to segments of another field, and determine the relationship to blank values. The ‘n’ is the number of bytes per segment to compare.

Blanks and zeros are considered to be the same value.

Example The phone_number field has a length of 50 characters, but phone numbers are typically 10 digits long. By using ARRAY2,10, you can divide the 50-character field into five segments of 10, so that all five segments of the field in one record can be compared to all five segments of the same field in the other record.

PHONE NUMBERS SCORES(100, 95) ROUTINES(ARRAY2,NONZERO) FLDNAMES(PHONE_NUMBERS) PARMVAL(ARRAY2,10)

Scoring Values

Score Description

100 Exact match; at least one is non-blank on both records.

95 At least one non-blank cell in one record can be found in the same cell in the other record.

75 All cells are blank on one record and non-blank on the other record.

65 All cells are blank on both records.
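For comparison with ARRAY1, the following Python sketch shows that ARRAY2 only credits a shared value when it occupies the same cell position in both records. The names `cells`, `blank`, and `array2_score` are hypothetical; as before, a cell of only blanks and/or zeros is assumed to be blank, and 0 is assumed for the unlisted no-match case.

```python
def cells(value: str, n: int) -> list:
    """Split a fixed-width field into n-byte segments (the ARRAY2,n cells)."""
    return [value[i:i + n] for i in range(0, len(value), n)]


def blank(cell: str) -> bool:
    """Assumption: a cell of only blanks and/or zeros counts as blank."""
    return cell.strip(" 0") == ""


def array2_score(field1: str, field2: str, n: int) -> int:
    """Hypothetical sketch of ARRAY2,n scoring: cells match only by position."""
    c1, c2 = cells(field1, n), cells(field2, n)
    any1 = any(not blank(c) for c in c1)
    any2 = any(not blank(c) for c in c2)
    if not any1 and not any2:
        return 65  # all cells blank on both records
    if not any1 or not any2:
        return 75  # all cells blank on one record only
    if field1 == field2:
        return 100  # exact match with non-blank content
    if any(not blank(a) and a == b for a, b in zip(c1, c2)):
        return 95  # same non-blank value in the same cell position
    return 0  # assumption: score when no cell matches positionally
```

With ARRAY2, a number that moved from the first cell to the second no longer scores as a match, whereas ARRAY1 would return 90 for it.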

BUSNAME Routine

The BUSNAME routine compares two business names, assuming the two fields are left-justified and blank-filled. The routine also assumes that a word is any group of characters followed by a blank. The BUSNAME routine can successfully match:

An acronym against a list of words.

One word against a group of single-character words.

Example BUSINESS_NAME SCORES(100,90,80) ROUTINES(BUSNAME) FLDNAMES(PR_BUSNAME_01)

Scoring Values

Score Description

100 Exact match, excluding blank versus blank

50 One or both fields are blank

The following logic is applied to further narrow the match. Based on this logic, the values will be deducted from a ‘100’ score.

Deduct from 100 Reason for deduction

– 1 For each extra word, if there were no other errors

– 3 For each extra word, if there were any word or letter errors

– 3 For each inserted word (also count one word error)

– 4 For each word transposition (also count one word error)

– 15 For extra words, if the only matches were single-letter exact substrings and there were fewer than three such matches

If two words are not equal and the preceding rules for word errors do not apply, the words are compared letter for letter, and the following deductions are taken.

– 1 For each doubled letter (also count one letter error)

– 2 For each transposed letter (also count one letter error)

– 2 For each inserted letter (also count one letter error)

– 2 For each mismatch (also count one letter error)

– 2 For any extra characters if one word is longer than the other (also count letter errors)

– 10 Number of word errors is more than one third of the greater number of words in both fields

– 10 Number of character errors is more than 25% of the greater number of characters in both fields

– 25 Number of word errors is greater than 50% of the smaller number of words in both fields

– 25 Number of character errors is greater than 50% of the smaller number of characters in both fields

BUSNAME When Parmval (COMPACT)

The COMPACT parmval is used with the BUSNAME routine to cause the routine to perform character compression and score the resulting strings as described in the table below.

Example business_name scores(100,90) routines(busname) fldnames(pr_busname_01) parmval(COMPACT)

Scoring Values

Score Description

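100 The two strings are equal and neither is blank. Otherwise, both strings are compressed as follows and the compressed strings are compared:

Remove all vowels, starting with the second character of each string.

Remove the second character of every doubled character in each string.

Left-compress both strings.

If the compressed strings are equal, the score depends on the length of the compressed string: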

90 Length of the compressed string is greater than 8.

85 Length of the compressed string is between 5 and 8.

75 Length of the compressed string is between 1 and 4.

0 Strings are not equal.
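The COMPACT compression steps can be sketched in Python as follows. This is an interpretation, not the product's code: the vowels are assumed to be A, E, I, O, and U; "left-compress" is assumed to mean removing embedded blanks; and the length-based scores are assumed to apply when the compressed strings are equal. The function names `compact` and `busname_compact_score` are hypothetical.

```python
VOWELS = set("AEIOU")  # assumption: Y is not treated as a vowel


def compact(value: str) -> str:
    """Hypothetical sketch of the COMPACT compression steps."""
    s = value.strip().upper()
    if not s:
        return ""
    # Remove vowels, starting with the second character.
    s = s[0] + "".join(ch for ch in s[1:] if ch not in VOWELS)
    # Keep only the first character of each run of doubled characters.
    out = []
    for ch in s:
        if not out or out[-1] != ch:
            out.append(ch)
    # Left-compress: assumed here to mean dropping embedded blanks.
    return "".join(out).replace(" ", "")


def busname_compact_score(a: str, b: str) -> int:
    if a.strip() and a.strip() == b.strip():
        return 100  # equal and non-blank before compression
    ca, cb = compact(a), compact(b)
    if not ca or ca != cb:
        return 0  # compressed strings differ
    if len(ca) > 8:
        return 90
    if len(ca) >= 5:
        return 85
    return 75  # compressed length 1 to 4


print(compact("TRILLIUM SOFTWARE"))  # TRLMSFTWR
```

Under these assumptions, "TRILLIUM SOFTWARE" and "TRILIUM SOFTWAR" both compress to a nine-character string and score 90.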

BUSNAME When Parmval (SORT)

The SORT parmval is used with the BUSNAME routine to cause the routine to perform word sorting and score the resulting strings as described in the scoring table below.

Example BUSINESS_NAME SCORES(95,50) ROUTINES(BUSNAME) FLDNAMES(PR_BUSNAME_01) PARMVAL(SORT)

RECORD 1 MASSACHUSETTS UNIVERSITY -> MASSACHUSETTS UNIVERSITY

RECORD 2 UNIVERSITY MASSACHUSETTS -> MASSACHUSETTS UNIVERSITY

After applying the SORT parmval, in this example, both records contain ‘Massachusetts University.’

Scoring Values

Score Description

100 Exact match

95 Exact match after sorting

50 Blank versus blank

0 Anything else
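A minimal Python sketch of the SORT scoring follows. The function name `busname_sort_score` is hypothetical, and the sketch assumes that words are separated by blanks and that extra blanks between words are ignored in the exact-match test.

```python
def busname_sort_score(a: str, b: str) -> int:
    """Hypothetical sketch of BUSNAME with PARMVAL(SORT)."""
    words_a, words_b = a.split(), b.split()
    if not words_a and not words_b:
        return 50  # blank versus blank
    if words_a == words_b:
        return 100  # exact match
    if sorted(words_a) == sorted(words_b):
        return 95  # same words in a different order
    return 0  # anything else, including blank versus non-blank


print(busname_sort_score("MASSACHUSETTS UNIVERSITY", "UNIVERSITY MASSACHUSETTS"))  # 95
```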

BUSNAME When Parmval (DI)

Some comparison routines convert non-English characters into digraphs (pairs of English characters) whenever these characters occur within a field. For example, the character ß appears in the German spelling of street, straße. That character is converted to the digraph SS when encountered in a field. This parmval converts these non-English characters into the appropriate digraphs, as listed below:

From character To digraph

Ä AE

Ø OE

Æ AE

Å AA

Ö OE

Ü UE

ß SS

Example business_name scores(100,90,85) routines(busname) fldnames(pr_busname_01) parmval(DI)
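The digraph conversion amounts to a character-to-string translation, sketched here in Python with `str.maketrans` and `str.translate`. The entries follow the table above; the mapping of Ø to OE is an assumption, only the uppercase forms listed are translated, and the name `to_digraphs` is hypothetical.

```python
# Digraph table for the DI parmval (Ø -> OE is an assumed entry).
DIGRAPHS = str.maketrans({
    "Ä": "AE", "Ø": "OE", "Æ": "AE", "Å": "AA",
    "Ö": "OE", "Ü": "UE", "ß": "SS",
})


def to_digraphs(value: str) -> str:
    """Replace each listed non-English character with its digraph."""
    return value.translate(DIGRAPHS)


print(to_digraphs("STRAßE"))  # STRASSE
```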

DATE Routine

The DATE routine compares two date fields. The date format is YYYYMMDD.

Example

DATE scores(100,75) routines(date) fldnames(date)

Field 1 Field 2 Score

20030214 20030214 100

20020217 20020222 75

Scoring Values

Score Description

100 Dates are equal and neither is zero.

75 One of the following conditions has been met:

Both years and months are equal and not zero (20040214 versus 20040218).

Both months and days are equal and not zero (20040214 versus 20030214).

Both years and days are equal and not zero (20040214 versus 20040414).

50 Either date is zero.

5 All other cases.

0 The years differ by 16 or more, and the months and days are not equal.
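The DATE scoring rules can be expressed as an ordered series of checks. The Python sketch below assumes eight-digit YYYYMMDD strings, treats an all-zero or blank value as a zero date, and settles the zero-date and exact-match cases before the partial-match rules; this ordering is an interpretation of the table, and the name `date_score` is hypothetical.

```python
def date_score(d1: str, d2: str) -> int:
    """Hypothetical sketch of DATE routine scoring for YYYYMMDD values."""

    def is_zero(value: str) -> bool:
        return value.strip("0 ") == ""

    def same_nonzero(a: str, b: str) -> bool:
        return a == b and int(a) != 0

    if is_zero(d1) or is_zero(d2):
        return 50  # either date is zero
    if d1 == d2:
        return 100  # equal and neither is zero
    y1, m1, day1 = d1[:4], d1[4:6], d1[6:8]
    y2, m2, day2 = d2[:4], d2[4:6], d2[6:8]
    if ((same_nonzero(y1, y2) and same_nonzero(m1, m2))
            or (same_nonzero(m1, m2) and same_nonzero(day1, day2))
            or (same_nonzero(y1, y2) and same_nonzero(day1, day2))):
        return 75  # two of the three date parts agree
    if abs(int(y1) - int(y2)) >= 16 and m1 + day1 != m2 + day2:
        return 0  # years far apart and MMDD differs
    return 5  # all other cases
```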

DIFFER Routine

The DIFFER routine was designed to compare two 9-byte numeric field values. This routine uses the following five special parmvals to find matching scores.

Parmval [DIFF90] [DIFF80] [DIFF70] [DIFF60] [DIFF50]

DIFFER Routine Parmvals

Each of the parmvals [DIFF90] through [DIFF50] can be any user-defined value, but the parmvals should be listed in ascending order.

For example: parmval ([1][2][3][4][5]) or parmval ([10][20][30][40][50])

The difference between the numeric values in the two records determines which score is returned.

Each parameter value must be enclosed in square brackets, and the entire string of bracketed parmvals must be enclosed in parentheses, as in ([1][2][3][4][5]).

Example

Field 1 = 000000098

Field 2 = 000000100

000000100 - 000000098 = 000000002 parmval ([1][2][3][4][5])

Subtracting Field 1 from Field 2 yields 2 (100 - 98 = 2). Because 2 is greater than the first parmval (1) but less than or equal to the second parmval (2), the score returned is 80.

Scoring Values

Score Description

100 Exact match, including 0 versus 0.

90 Difference between A and B is less than or equal to [DIFF90]. Based on the first parmval value.

80 Difference between A and B is less than or equal to [DIFF80]. Based on the second parmval value.

70 Difference between A and B is less than or equal to [DIFF70]. Based on the third parmval value.

60 Difference between A and B is less than or equal to [DIFF60]. Based on the fourth parmval value.

50 Difference between A and B is less than or equal to [DIFF50]. Based on the fifth parmval value.

40 Either field contains a non-numeric character.

0 Any other difference

Example different scores(100,95,50) routines(differ) fldnames(score)
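The following Python sketch parses a DIFFER parmval string and applies the thresholds in order. It assumes the difference is taken as an absolute value, treats any value that is not all digits (including blanks) as the non-numeric case, and uses the hypothetical function names `parse_differ_parmval` and `differ_score`.

```python
import re


def parse_differ_parmval(parmval: str) -> list:
    """Extract the bracketed thresholds, e.g. "([1][2][3][4][5])" -> [1, 2, 3, 4, 5]."""
    return [int(v) for v in re.findall(r"\[(\d+)\]", parmval)]


def differ_score(a: str, b: str, thresholds: list) -> int:
    """Hypothetical sketch of DIFFER scoring for 9-byte numeric fields."""
    if not (a.isascii() and a.isdigit() and b.isascii() and b.isdigit()):
        return 40  # a non-numeric character is present
    diff = abs(int(a) - int(b))
    if diff == 0:
        return 100  # exact match, including 0 versus 0
    for score, limit in zip((90, 80, 70, 60, 50), thresholds):
        if diff <= limit:
            return score  # first threshold that covers the difference
    return 0  # any other difference


limits = parse_differ_parmval("([1][2][3][4][5])")
print(differ_score("000000098", "000000100", limits))  # 80
```

The loop checks the thresholds from [DIFF90] to [DIFF50], which is why they must be listed in ascending order.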

FLAG10 Routine

The FLAG10 routine determines the relationship of a 1 or a 0 to a blank value.

Blanks and zeros are considered to be the same value.

Scoring Values

Score Description

100 1 versus 1

95 0 versus 0

90 0 versus 1

85 Blank versus 1

80 Blank versus 0

75 Blank versus blank

0 Anything else

Example street_name scores(100, 95) routines(FLAG10) fldnames(flagcode)

FLAGFM Routine

The FLAGFM routine determines the relationship between the female gender and anything else.

Blanks and zeros are considered to be the same value.

Example gender scores(100) routines(FLAGfm,nonblank) fldnames(pr_gender_01)

Scoring Values

M = Male; F = Female; N = Ambiguous; Blank = Unknown

Score Description (Field 1 versus Field 2)

100 F versus F; F versus N; F versus Blank; N versus N

0 Any other comparison

FLAGGN Routine

The FLAGGN routine determines the relationship between genders. Blanks and zeros are considered to be the same value.

Example gender scores(65) routines(FLAGgn,nonblank) fldnames(pr_gender_01)

Scoring Values

M = Male; F = Female; A = Ambiguous; U = Unknown; Z = Not applicable

Score Description (Field 1 versus Field 2)

100 M versus M; F versus F

75 M versus A; M versus U; M versus Z

F versus A; F versus U; F versus Z

0 M versus F; F versus M

100 A versus A

75 A versus U; A versus Z; A versus M; A versus F

65 U versus U

75 U versus A; U versus M; U versus F

65 Z versus Z; Z versus U; U versus Z

75 Z versus M; Z versus F; Z versus A

0 Any other comparison

FLAGMF Routine

The FLAGMF routine is used to determine the relationship between genders.

Example gender scores(100, 65) routines(flagmf,nonblank) fldnames(pr_gender_01)

Scoring Values

M = Male; F = Female; N = Ambiguous

Score Description

100 M versus M

75 M versus N; M versus blank

0 M versus F

100 F versus F

75 F versus N; F versus blank

100 N versus N

75 N versus blank

65 Blank versus blank

FLAGYN Routine

The FLAGYN routine is used to determine the relationship between text flagged as either ‘YES’ or ‘NO’.

Example flagyn scores(75) routines(flagyn,nonblank) fldnames(flagcode)

Scoring Values

Score Description

100 Yes versus yes

95 No versus no

90 Yes versus no

85 Blank versus yes

80 Blank versus no

75 Blank versus blank

0 Anything else

FRSTNAME Routine

The FRSTNAME routine compares two first names, assuming the fields are left-justified and blank-filled. If the two fields do not match exactly, the routine calculates the two field lengths by excluding trailing blanks.

Example first_name scores(99,98,95) routines(frstname) fldnames(pr_first_display_01)

Scoring Values

Table 7.12 Scoring for FRSTNAME Routine

Score Description

100 Exact match (excluding blank versus blank)

99 Blank field value versus blank field value

98 Blank field value versus anything

95 Shorter field value is one character in length and it equals the first character of the longer field. For example: J versus John

90 The shorter field value is more than one character long and is an exact starting substring of the longer field. For example: Jo versus Jo-Anne

80 If there is a mismatch of the first characters of both fields
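The fixed FRSTNAME score tiers in Table 7.12 can be sketched in Python as follows. The character-error deductions from 95 that follow are deliberately not modeled: the sketch returns `None` when those deductions would decide the score. Uppercase input and the name `frstname_tier` are assumptions.

```python
from typing import Optional


def frstname_tier(a: str, b: str) -> Optional[int]:
    """Hypothetical sketch of the fixed FRSTNAME score tiers.

    Returns None when none of the tiers applies and the character-error
    deductions from 95 (not modeled here) would decide the score.
    """
    a, b = a.rstrip(), b.rstrip()  # field lengths exclude trailing blanks
    if not a and not b:
        return 99  # blank versus blank
    if not a or not b:
        return 98  # blank versus anything
    if a == b:
        return 100  # exact match
    shorter, longer = sorted((a, b), key=len)
    if longer.startswith(shorter):
        # 95 for a single initial, 90 for a longer starting substring
        return 95 if len(shorter) == 1 else 90
    if a[0] != b[0]:
        return 80  # first characters differ
    return None
```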

The following logic is applied to further narrow the match. Based on this logic, the values are deducted from a ‘95’ score:

Deduct from 95 Reason for deduction

– 2 For character errors (transposition, insertion, mismatch, extra characters at the end of either field after taking insertions into consideration)

– 10 If the number of character errors is more than 25% of the length of the longer field

– 25 If the number of character errors is more than 50% of the length of the longer field

No additional scoring deductions occur if:

There is only one error and one extra character difference between the two fields

There is only a one character difference in the lengths of the two fields

Only the first field is more than five characters long. (See the SYMETRIC parmval if you want to prevent a scoring deduction when either or both fields are more than five characters long.)

FRSTNAME When Parmval = SYMETRIC

The SYMETRIC parmval is used with the FRSTNAME routine to prevent a scoring deduction when either or both fields are more than five characters long.

FRSTNAME When Parmval = DI

Some comparison routines convert non-English characters into digraphs (pairs of English characters) when these characters are encountered within a field. For example, the character ß appears in the German spelling of street, straße. That character is converted to the digraph SS when encountered in a field.

The DI Parmval converts these non-English characters into the appropriate digraphs listed in the following table.

From character To digraph

Ä AE

Ø OE

Æ AE

Å AA

Ö OE

Ü UE

ß SS

FRSTNAME When Parmval = TSB

Use the TSB parmval with the FRSTNAME routine for backward compatibility, that is, to return the scores produced by previous releases.

Example first_name scores(99,98) routines(frstname) fldnames(pr_first_display_01) parmval(TSB)

Scoring Values

Table 7.13 Scoring for TSB Parmval

Score Description

99 Yields a 75

98 Yields a 75

95 Yields a 90

90 Yields a 95

FRSTNAME When Parmval = INITIAL

The INITIAL parmval is used with the FRSTNAME routine to score an exact match between initials lower than an exact match between whole names.

Example first_name scores(100,94) routines(frstname) fldnames(pr_first_display_01) parmval(INITIAL)

Scoring Values

Score Description

100 Exact match. For example: John versus John

94 Exact match when using initials only. For example: J versus J

GENER Routine

The GENER routine compares two generation values, assuming the fields are two characters long.

Example generation scores(100,95,90) routines(gener,nonblank) fldnames(pr_gener_01)

Scoring Values

Score Description

100 Exact match (including blanks versus blanks)

95 Blank value versus a “SR” value

90 Blank value versus a non-blank, non-“SR” value

0 Any other exception

GENER When Parmval (95)

The 95 parmval is used with GENER to increase the possibility of getting a score of 95.

Example generation scores(100,95,90) routines(gener,nonblank) fldnames(pr_gener_01) parmval(95)

Scoring Values

Score Description

100 Exact match (including blanks versus blanks)

95 Blank value versus a “SR”, “01” or “JR” value.

90 Blank value versus a non-blank, non-“SR” value

0 Any other exception

HOUSENO Routine

The HOUSENO routine compares two house numbers, assuming the fields are right-justified. If the two fields do not match exactly, the routine calculates the two field lengths and starting positions by excluding leading blanks.

Example house_number scores(100,99,98) routines(houseno) fldnames(pr_best_number)

Scoring Values

Score Description

100 Exact match (excluding blank versus blank)

99 Blank field value versus non-blank

99 All zeros field value versus all non-zeros (assumes field of same length)

98 Blank field value versus blank field value

98 All zeros field value versus all zeros field value (assumes field of same length)

If the fields do not match according to the preceding criteria, the following actions are performed to determine whether there is a match:

98 Check whether either number is in the form N-N. If both numbers are in that form, this check finds no match. If only one of the numbers is in the form N-N, check whether the other number falls within that range. If it does, return a score of 98 (for example, 103 is in the range 101-105). If the range spans more than 6, a score of 80 is returned instead (for example, 12-19).

100 Remove all leading blanks and characters that are not digits or letters from both strings and left-justify them. Return 100 if there is an exact match.

Deduct from 100 Reason for deduction

– 5 For character errors (transposition, insertion, mismatch, extra characters at the end of either field after considering insertions).

– 10 If the number of character errors is more than 25% of the length of the longer field.

– 25 If the number of character errors is more than 50% of the length of the longer field.
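The HOUSENO range check can be sketched in Python. The function name and the regular expression used for the N-N form are assumptions, a range written high-to-low is normalized to low-to-high, and `None` stands for "the range rule does not apply", in which case the normalization and deduction steps would be used.

```python
import re
from typing import Optional

RANGE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*$")  # a house number in N-N form


def houseno_range_score(a: str, b: str) -> Optional[int]:
    """Hypothetical sketch of the HOUSENO range check.

    Returns None when the range rule does not apply, so the normalization
    and character-error steps would be used instead.
    """
    range_a, range_b = RANGE.match(a), RANGE.match(b)
    if bool(range_a) == bool(range_b):
        return None  # both values, or neither value, are ranges
    rng, other = (range_a, b) if range_a else (range_b, a)
    other = other.strip()
    if not (other.isascii() and other.isdigit()):
        return None
    low, high = sorted((int(rng.group(1)), int(rng.group(2))))
    if not low <= int(other) <= high:
        return None
    return 80 if high - low > 6 else 98  # wide ranges score lower
```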

HOUSENO When Parmval (NORANGE)

The NORANGE parmval is used with the HOUSENO routine to match a house number within a range. For example, when 86 is matched to the range 83-87, a score of 98 is returned because 86 falls within the range. For this parmval, the parity of the range is not relevant: if the house range is odd (83-87), the matching house number does not need to be odd.

Scoring Values

Score Description

98 A numeric value is found within a range of numbers.

If the range spans more than 6 (for example, 12-19), a low score is still returned.

HOUSENO When Parmval (PARITY)

The PARITY parmval is used with the HOUSENO routine to match a house number within a range. This parmval takes into consideration the parity of the range. The returned score indicates whether parity exists, no parity exists, or whether the initial value is the actual value at either end of the range.

Scoring Values

Score Description

98 The initial value is found within the range and has the same odd/even parity. For example: "27" versus "25-29"

97 The initial value is found within the range, has the same odd/even parity, and is at one end of the range. For example: "25" versus "25-29" or "29" versus "25-29"

96 The initial value is found within the range but does not have the same odd/even parity. For example: "26" versus "25-29"
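A minimal Python sketch of the PARITY scores follows. It assumes the parity of a range is the parity of its lower bound and that a value at either end of the range scores 97; the function name `parity_score` is hypothetical, and values outside the range return `None` because the table does not cover them.

```python
from typing import Optional


def parity_score(number: int, low: int, high: int) -> Optional[int]:
    """Hypothetical sketch of HOUSENO with PARMVAL(PARITY).

    Assumption: the parity of the range is taken from its low end.
    """
    low, high = min(low, high), max(low, high)
    if not low <= number <= high:
        return None  # outside the range: not covered by this table
    if number in (low, high):
        return 97  # the value is one end of the range
    if number % 2 == low % 2:
        return 98  # inside the range with matching odd/even parity
    return 96  # inside the range, opposite parity
```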

HOUSENO When Parmval (01)

The 01 parmval is used with the HOUSENO routine to transpose and match house numbers.

Example house_number scores(98) routines(houseno) fldnames(pr_best_number) parmval(01)

Scoring Values

Score Description

100 Exact match (excluding blank vs. blank) For example: 206W versus W206 would score 100.

99 Blank field value versus Blank field value

99 All zeros field value versus all non-zeros field value

98 Blank field value versus non-blank

98 All zeroes field value versus anything

MXDNAME Routine

The MXDNAME routine applies SUBSTRNG-like logic to two records, when the name form values are different. For this routine to work, the window keys must not contain the name form value.

The MXDNAME routine requires the following three fields:

Field1 (name form field): User-specified field position. This field should contain only the values 1 (personal name), 2 (business name), or 3 (name reject).

Field2 (first field in Record 1): Compared to the data in this field in Record 2 when comparing two personal records (both name forms = 1). Compared to the data in Field3 on Record 2 when comparing a personal record to a business or reject record (name form 1 versus name form 2 or 3). See the following example.

Field3 (second field in Record 1): Compared to the data in this field in Record 2 when comparing two business and/or reject records (both name forms = 2 and/or 3). Compared to the data in Field2 on Record 2 when comparing a business or reject record to a personal record (name form 2 or 3 versus name form 1).

The cross-field test (Field2 against Field3) is performed only when the name form value differs between the two records.

Example

           Field 1   Field 2     Field 3

Record 1   1         Smith       Jones

Record 2   2         Jones Co.   Smith, Inc.

In this example, because the name form fields differ between the records (1 versus 2), the program matches ‘Smith’ to ‘Smith, Inc.’ The score returned is 99, because ‘Smith’ occurs at the beginning of ‘Smith, Inc.’

If the name form fields were the same, the system would instead match ‘Smith’ to ‘Jones Co.’ (both 1) or ‘Jones’ to ‘Smith, Inc.’ (both 2). Both of these cases would score 0.

Scoring Values

Score Description

100 First string is exactly the second string

99 First string resides at the beginning of the second string

98 First string resides at the end of the second string

97 First string resides within the second string

0 Any other value

Example mixed name scores(100,99, 98) routines(mxdname) fldnames(pr_nmform_01,pr_last_01, pr_busname_01)
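The field-selection and substring rules of MXDNAME can be combined in a short Python sketch. Each record is represented as a hypothetical `(name_form, field2, field3)` tuple, the Record 1 value is treated as the first string, blank values are assumed to score 0, and the function names are illustrative only.

```python
def position_score(first: str, second: str) -> int:
    """Score where the first string occurs within the second."""
    first, second = first.strip(), second.strip()
    if not first or not second:
        return 0  # assumption: blanks are not scored by this sketch
    if first == second:
        return 100
    if second.startswith(first):
        return 99
    if second.endswith(first):
        return 98
    if first in second:
        return 97
    return 0


def mxdname_score(rec1: tuple, rec2: tuple) -> int:
    """Each record is (name_form, field2, field3); name form 1 = personal."""
    form1, f2_1, f3_1 = rec1
    form2, f2_2, f3_2 = rec2
    personal1, personal2 = form1 == "1", form2 == "1"
    if personal1 and personal2:
        return position_score(f2_1, f2_2)
    if not personal1 and not personal2:
        return position_score(f3_1, f3_2)
    if personal1:
        return position_score(f2_1, f3_2)  # personal versus business/reject
    return position_score(f3_1, f2_2)  # business/reject versus personal


print(mxdname_score(("1", "Smith", "Jones"), ("2", "Jones Co.", "Smith, Inc.")))  # 99
```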

NYSIIS Routine

The NYSIIS routine matches two strings of data using an algorithm based on the standard NYSIIS algorithm. The maximum field comparison length is 256 characters.

“IMPROVED” NYSIIS ALGORITHM

1. If there are any numerics in the string, return a blank code.

2. Blank out all non-alphabetic characters.

3. If the string is all blanks, return a blank code.

4. Capitalize all letters in the string.

5. Translate the following first character(s) of the string:

From characters To characters

MAC MCC

KN NN

K C

PH and PF FF

WR and RH RR

DG GG

6. Translate the following last character(s) of the string:

From characters To characters

S or Z Blank

EE or IE or YE Y

DT or RT or RD D

NP or ND N

IX IC

EX EC

JR or SR Blank

7. Set first character of code with first character of string.

8. Translate these character(s) that occur after the first character of string:

From characters To characters

EV AF

E, I, O, U A

Y A (when A is not the last character)

Q G

Z S

M or KN N

K C

PH FF

H Replace with the preceding character when preceded or followed by a vowel.

W Replace with the preceding character when preceded by a vowel.

SCH SSA when SCH is at the end of the string; SSS when SCH is not at the end of the string

SH SA when SH is at the end of the string

GHT TTT

DG GG

WR RR

9. Add each letter to the code if it is not the same as the preceding letter of the code; continue until a 10-character code is created or the string has no more characters.

10. Change the last letter(s) of the code:

From characters To characters

S Blank

AY Y

A Blank

Example customer_flag scores(100,80) routines(NYSIIS) fldnames(pr_last_name)

Scoring Values

Score Description

100 Strings are equal (excludes blank versus blank)

80 Blank code versus blank code

75 Blank code versus non-blank code

0 Code is not equal

ONECOM Routine

The ONECOM routine tests for the presence of one commercial record.

Example name_form scores(100) routines(onecom) fldnames(pr_nmform)

Scoring Values

Score Description

100 Either of the fields passed has a value of 2.

0 Any other value

PARTIAL1 Routine

The PARTIAL1 routine is used to determine the relationship to a blank value.

Blanks and zeros are considered to be the same value.

Example postal_code scores(100,0) routines(partial1) fldnames(pr_gout_postal_code)

Scoring Values

Score Description

100 Exact match (excluding blanks versus blanks)

75 Blank field value versus a non-blank field value

65 Blank field value versus a blank field value

0 Any other value

PARTIAL1 When Parmval (10)

The 10 parmval is used with the PARTIAL1 routine to determine the relationship of a ‘0’ or ‘1’ to a blank value.

Example date_of_birth scores(100,95) routines(partial1,nonblank) fldnames(flagcode) parmval(10)

Scoring Values

Score Description

100 1 versus 1

95 0 versus 0

90 0 versus 1

85 Blank versus 1

80 Blank versus 0

75 Blank versus blank

0 Anything else

PARTIAL1 When Parmval (FM)

The FM parmval is used with PARTIAL1 to compare the female gender to anything else.

M = Male; F = Female; N = Ambiguous; Blank = Unknown

Example gender scores(85) routines(partial1,nonblank) fldnames(pr_gender_01) parmval(FM)

Scoring Values

Score Description (Field 1 versus Field 2)

100 F versus F; F versus N; F versus Blank; N versus N

0 Any other comparison

PARTIAL1 When Parmval (GN)

The GN parmval is used with the PARTIAL1 routine to compare genders to anything else.

Example gender scores(65) routines(partial1,nonblank) fldnames(pr_gender_01) parmval(GN)

M = Male; F = Female; A = Ambiguous; U = Unknown; Z = Not applicable

Score Description (Field 1 versus Field 2)

100 M versus M

75 M versus A; M versus U; M versus Z

0 M versus F

100 F versus F

75 F versus A; F versus U; F versus Z

0 F versus M

100 A versus A

75 A versus U; A versus Z; A versus M; A versus F

65 U versus U; U versus Z; Z versus Z

75 U versus A; U versus M; U versus F

Z versus M; Z versus F; Z versus A

65 Z versus U

0 Any other comparison

PARTIAL1 When Parmval (MF)

The MF parmval is used with the PARTIAL1 routine to determine the relationship between genders.

Example gender scores(100) routines(partial1,nonblank) fldnames(pr_gender_01) parmval(MF)

Scoring Values

Score Description

100 M versus M

75 M versus N; M versus Blank

0 M versus F

100 F versus F

75 F versus N; F versus Blank

100 N versus N

75 N versus Blank

65 Blank versus Blank

PARTIAL1 When Parmval (MU)

The MU parmval is used with the PARTIAL1 routine to determine the relationship between genders.

Example gender scores(65) routines(partial1,nonblank) fldnames(pr_gender_01) parmval(MU)

Scoring Values

Score Description

100 M versus M

75 M versus U; M versus blank

0 M versus F

100 F versus F

75 F versus U; F versus blank

100 U versus U

75 U versus blank

65 Blank versus blank

PARTIAL1 When Parmval (YN)

The YN parmval is used with the PARTIAL1 routine to determine the relationship between text flagged as either ‘YES’ or ‘NO’.

Scoring Values

Score Description

100 Yes versus Yes

95 No versus no

90 Yes versus no

85 Blank versus yes

80 Blank versus no

75 Blank versus blank

0 Anything else

PARTIAL1 When Parmval (ARRAY1,n)

The ARRAY1,n parmval is used with the PARTIAL1 routine to compare segments of a field to segments of another field, and to determine the relationship to blank values. The ‘n’ is the number of bytes per segment to compare. Blanks and zeros are considered to be the same value.

Example phone_numbers scores(100, 95) routines(partial1,nonblank) fldnames(phone_number) parmval(ARRAY1,10)

Scoring Values

Score Description

100 Exact match and at least one is non-blank on both records.

90 At least one cell (non-blank) in one record can be found in any cell in the other record

75 All cells are blank on one record and non-blank on the other record.

65 All cells are blank on both records.

The phone_number field has a length of 50 characters, but phone numbers are typically 10 digits long. By using ARRAY1,10, you can divide the 50-character field into five segments of 10, so that all five segments of the field in one record can be compared to all five segments of the same field in the other record.

PARTIAL1 When Parmval (ARRAY2,n)

The ARRAY2,n parmval is used with the PARTIAL1 routine to compare segments of a field to segments of another field, and to determine the relationship to blank values. The ‘n’ is the number of bytes per segment to compare.

Blanks and zeros are considered to be the same value.

Example The phone_number field has a length of 50 characters, but phone numbers are typically 10 digits long. By using ARRAY2,10, you can divide the 50-character field into five segments of 10, so that all five segments of the field in one record can be compared to all five segments of the same field in the other record.

phone_numbers scores(100, 95) routines(partial1,nonzero) fldnames(phone_number) parmval(ARRAY2,10)

Scoring Values

Score Description

100 Exact match and at least one is non-blank on both records.

95 At least one cell (non-blank) in one record can be found in the same cell in the other record

75 All cells are blank on one record and non-blank on the other record.

65 All cells are blank on both records.

PARTIAL2 Routine

The PARTIAL2 routine is used to determine the relationship to a blank value.

Example name scores(100) routines(partial2,nonblank) fldnames(pr_first_01)

Scoring Values

Score Description

100 An exact match (including blanks versus blanks)

75 Blank field value versus a nonblank field value

0 Any other value

PARTIAL2 When Parmval (DATE)

The DATE parmval is used with the PARTIAL2 routine to determine the relationship between dates.

Syntax

DATE; source_date_mask; target_init_mask

where:

DATE Keyword.

source_date_mask Format of the dates to be matched. Use Y for year, M for month, D for day, and X to skip the current year digit. Required.

target_init_mask Indicates to the matching routine which parts of the date are missing and what to fill them with. The format is yyyymmddd. Optional.

Scoring Values

Score Description

100 Dates are equal and neither is zero.

75 One of the following conditions has been met:

Both years and months are equal, and not zero

Both months and days are equal, and not zero.

Both years and days are equal and not zero.

50 Either year, month, and day is zero.

5 All other cases.

0 The years differ by 16 or more, and the MMDD values are not equal.

Example Field 1 Field 2 Parm Value Score

19970203 19970203 DATE;YYYYMMDD 100

19970203 19970204 DATE;YYYYMMDD 75

1223021 12302 DATE;YYMMDD 75

1223021 12302 DATE;MMDDYY 5

1997231 1997231 DATE;YYYYDDD;000001000 100

In this example, the match routine converts the date 1997231 to 199701231. Because there is no month in this format, the target_init_mask parameter is required. The 01 in positions 5 and 6 represents the month.

1997231 1966231 DATE;YYXXDDD;000001000 100

The last two digits of the years are irrelevant in this example; the characters XX are used to skip the comparisons of those two digits.

Example date_of_birth scores(100,75) routines(partial2,nonblank) fldnames(date_of_birth) parmval(DATE; YYMMDD)

PARTIAL2 When Parmval (SOUNDEX1)

The SOUNDEX1 parmval is used with the PARTIAL2 routine. It compares strings using the following special SOUNDEX algorithm.

SOUNDEX ALGORITHM

1. Capitalize all letters in the string.

2. Retain the first letter of the string.

3. After the first position, convert all of the following letters to blanks: A, E, I, O, U, H, W, Y

4. Change letters from the following sets into the corresponding digits given:

From characters To digit

B, F, P, V 1

C, G, J, K, Q, S, X, Z 2

D, T 3

L 4

M, N 5

R 6

5. Remove all consecutive pairs of duplicate digits and blanks from the string that resulted after step 4.

6. Return the first four characters of the string, padded with trailing zeros if needed.
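The six SOUNDEX1 steps can be sketched in Python as follows. The sketch reads step 5 as collapsing adjacent repeated digits and then dropping the blanks, trims leading and trailing blanks, and turns any character without a digit code (including the letters in step 3) into a blank; these readings and the function names `soundex1` and `soundex1_score` are assumptions. Applied to the strings in the Sample Scores table, it reproduces the New String 1 and New String 2 codes.

```python
# Step 4 digit codes for each letter.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex1(value: str) -> str:
    """Hypothetical sketch of the SOUNDEX1 code built by steps 1-6."""
    s = value.strip().upper()  # step 1 (trimming blanks is an assumption)
    if not s:
        return ""
    # Steps 3-4: letters after the first become digits; uncoded ones become blanks.
    coded = [CODES.get(ch, " ") for ch in s[1:]]
    kept = []
    for ch in coded:  # step 5: collapse repeated digits, then drop blanks
        if ch != " " and kept and kept[-1] == ch:
            continue
        kept.append(ch)
    code = s[0] + "".join(ch for ch in kept if ch != " ")  # step 2: keep first letter
    return (code + "000")[:4]  # step 6: first four characters, zero-padded


def soundex1_score(a: str, b: str) -> int:
    code_a, code_b = soundex1(a), soundex1(b)
    if bool(code_a) != bool(code_b):
        return 50  # one string blank, the other non-blank
    return 100 if code_a and code_a == code_b else 0
```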

Sample Scores

Score String 1 String 2 New String 1 New String 2

50 king

100 ng ngai N200 N200

100 patroff pietropaolo P361 P361

0 bill will B400 W400

100 func funk F520 F520

100 nicoli nicolie N240 N240

Scoring Values

Score Description

0 Either both strings are blank, or neither string is blank and they are not equal.

50 One string is blank; the other string is non-blank.

100 Strings are equal.

Example name scores(100,50) routines(partial2) fldnames(pr_first_01) parmval(SOUNDEX1)
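The SOUNDEX1 steps and PARTIAL2 scoring values above can be illustrated with a short Python sketch. It is reconstructed from this description only and is not Trillium code. Two readings in it are assumptions:

- Characters that are not letters are dropped before coding.
- Step 5 collapses each run of identical digits or blanks, and then deletes the blanks.

Under those assumptions, the sketch reproduces every code in the sample-score table. The function names are hypothetical.

```python
# Hypothetical sketch of the SOUNDEX1 code and PARTIAL2 scoring described above.
# Assumptions: non-letters are dropped, and step 5 collapses runs of identical
# characters (digits or blanks) and then removes the blanks.

DIGIT_FOR = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        DIGIT_FOR[letter] = digit

BLANKED = set("AEIOUHWY")  # step 3: converted to blank after the first position


def soundex1(text: str) -> str:
    """Return the four-character SOUNDEX1 code, or "" for a blank string."""
    letters = [ch for ch in text.upper() if ch.isalpha()]  # step 1
    if not letters:
        return ""
    first, rest = letters[0], letters[1:]                  # step 2
    coded = [" " if ch in BLANKED else DIGIT_FOR.get(ch, " ")
             for ch in rest]                               # steps 3 and 4
    collapsed = []
    for ch in coded:                                       # step 5: collapse runs
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    body = "".join(collapsed).replace(" ", "")             # step 5: drop blanks
    return (first + body)[:4].ljust(4, "0")                # step 6


def partial2_soundex1_score(a: str, b: str) -> int:
    """Apply the 0/50/100 scoring values to two input strings."""
    a_blank, b_blank = not a.strip(), not b.strip()
    if a_blank and b_blank:
        return 0    # both strings are blank
    if a_blank or b_blank:
        return 50   # one string is blank, the other is not
    return 100 if soundex1(a) == soundex1(b) else 0
```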

PARTIAL2 When Parmval (RSOUNDEX1)

The RSOUNDEX1 parmval is used with the PARTIAL2 routine to match two strings using the SOUNDEX algorithm. The input strings are reversed before the algorithm is used to build the code. For example, the string ‘JOHN SMITH’ is read as ‘HTIMS NHOJ’ before the code is built.

SOUNDEX ALGORITHM

1. Capitalize all letters in the string.
2. Retain the first letter of the string.
3. After the first position, convert all of the following letters to blank: A, E, I, O, U, H, W, Y.
4. Change letters from the following sets into the corresponding digits given:

From characters To digits

B, F, P, V 1

C, G, J, K, Q, S, X, Z 2

D, T 3

L 4

M, N 5

R 6

5. Remove all consecutive pairs of duplicate digits and blanks from the string that resulted after step 4.
6. Return the first four characters of the string, padded with trailing zeros, if needed.

Sample Scores

Score String 1 String 2 New String 1 New String 2

50 king (reversed to: gnik) G520

100 ng ngai N200 N200

100 patroff pietropaolo P361 P361

0 bill will B400 W400

100 func funk F520 F520

100 nicoli nicolie N240 N240

Scoring Values

Score Description

0 Either both strings are blank, or neither string is blank and they are not equal.

50 One string is blank; the other string is non-blank.

100 Strings are equal.

Example name scores(100,50) routines(partial2) fldnames(pr_first_01) parmval(RSOUNDEX1)

PARTIAL2 When Parmval (SOUNDEX2)

The SOUNDEX2 parmval is used with the PARTIAL2 routine to match two strings, using an improved Soundex algorithm.

Improved SOUNDEX Algorithm

1. Capitalize all letters in the string.
2. Replace all non-leading vowels with A.
3. Transform the following prefixes:

From characters To characters

MAC MCC

KN NN

K C

PF or PH FF

SCH SSS

4. Transform these letter combinations that occur after the first position:

From characters To characters

AV AF

AW A

CAAN TAAN

D T

DG GG

K C

KN NN

M N

NST NSS

PH FF

Q G

SCH SSS

Z S

5. Replace H with A, unless it is preceded and followed by A (for example, AHA).
6. Remove all A characters except for the leading A.
7. Remove all but the first of repeating adjacent character substrings.
8. (For RelLink only) The comparison is limited to the first four characters returned for each field used in the improved SOUNDEX comparison routine.

Scoring Values

Score Description

0 Either both strings are blank, or neither string is blank and they are not equal.

50 One string is blank; the other string is non-blank.

100 Strings are equal.

Example name scores(100,50) routines(partial2) fldnames(pr_first_01) parmval(SOUNDEX2)

Sample Scores

Score String 1 String 2 New String 1 New String 2

100 ng ngai NG NG

100 nicoli nicolie NCL NCL

50 king

0 patroff pietropaolo PTRF PTRF

0 bill will BL WL

PARTIAL2 When Parmval (RSOUNDEX2)

The RSOUNDEX2 parmval is used with the PARTIAL2 routine to match two strings using the improved SOUNDEX algorithm. The input strings are reversed before the algorithm is used to build the code.

For example, the string ‘JOHN SMITH’ is read as ‘HTIMS NHOJ’ before the code is built.

Improved SOUNDEX Algorithm

1. Capitalize all letters in the string.
2. Replace all non-leading vowels with A.
3. Transform the following prefixes:

From characters To characters

MAC MCC

KN NN

K C

PF or PH FF

SCH SSS

4. Transform the following letter combinations that occur after the first position:

From characters To characters

AV AF

AW A

CAAN TAAN

D T

DG GG

K C

KN NN

M N

NST NSS

PH FF

Q G

SCH SSS

Z S

5. Replace H with A, unless it is preceded and followed by A (for example, AHA).
6. Remove all A characters except for the leading A.
7. Remove all but the first of repeating adjacent character substrings.
8. (For RelLink only) The comparison is limited to the first four characters returned for each field used in the improved SOUNDEX comparison routine.

Scoring Values

Score Description

0 Either both strings are blank, or neither string is blank and they are not equal.

50 One string is blank; the other string is non-blank.

100 Strings are equal.

Sample Scores

Score String 1 String 2 New String 1 New String 2

100 ng ngai NG NG

50 king

0 bill will BL WL

PARTIAL2 When Parmval (STATUS)

The STATUS parmval is used with the PARTIAL2 routine to compare fields of two different records for a specified literal value.

Syntax: STATUS=literal_value

where:

STATUS The keyword.

literal_value The value to check.

Scoring Values

Score Description

100 Literal value versus Literal value

90 Literal value versus Non-literal value

80 Non-literal value versus Non-literal value

Example service_type scores(100) routines(partial2) fldnames(service_type_field) parmval(STATUS=RESIDENTIAL)

Record 1 service_type_field RESIDENTIAL

Record 2 service_type_field RESIDENTIAL DWELLING

In this example, the returned score is 100 because the literal value RESIDENTIAL appears in service_type_field in both records.
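The three STATUS scoring values can be expressed as a small Python function. The guide does not define how a field "contains" the literal value. This sketch assumes a case-insensitive substring test, which agrees with the RESIDENTIAL example. The function name status_score is hypothetical.

```python
def status_score(field1: str, field2: str, literal: str) -> int:
    """Score two field values for a STATUS=literal_value parmval (sketch).

    Assumption: a field has the literal when the literal occurs anywhere in
    the field, ignoring case.
    """
    target = literal.upper()
    has1 = target in field1.upper()
    has2 = target in field2.upper()
    if has1 and has2:
        return 100  # literal value versus literal value
    if has1 or has2:
        return 90   # literal value versus non-literal value
    return 80       # non-literal value versus non-literal value
```

Under the substring assumption, a value such as NONRESIDENTIAL would also count as containing RESIDENTIAL. A word-by-word test would avoid that.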

PARTIAL2 When Parmval (NYSIIS)

The NYSIIS parmval is used with the PARTIAL2 routine to match two strings using an algorithm based on the standard NYSIIS algorithm.

“Improved” NYSIIS Algorithm

1. If there are any numerics in the string, return a blank code.
2. Blank out all non-alphabetic characters.
3. If the string is all blanks, return a blank code.
4. Capitalize all letters in the string.
5. Translate the following first character(s) of the string:

From characters To characters

MAC MCC

KN NN

K C

PH and PF FF

WR and RH RR

DG GG

6. Translate the following last character(s) of the string:

From characters To characters

S or Z Blank

EE or IE or YE Y

DT or RT or RD D

NP or ND N

IX IC

EX EC

JR or SR Blank

7. Set the first character of the code to the first character of the string.
8. Translate the following character(s) that occur after the first character of the string:

From characters To characters

EV AF

E, I, O, U A

Y A, when not the last character

Q G

Z S

M or KN N

K C

PH FF

H Replace with the preceding character when preceded or followed by a vowel.

W Replace with the preceding character when preceded by a vowel.

SCH SSA when SCH is at the end of the string, or SSS when SCH is not at the end of the string

SH SA when SH is at the end of the string, or SS when SH is not at the end of the string

GHT TTT

DG GG

WR RR

9. Add each letter to the code if it is not the same as the preceding letter of the code. Continue until a 10-character code is created or until the string has no more characters.
10. Change the last letter(s) of the code:

From characters To characters

S Blank

AY Y

A Blank

Scoring Values

Score Description

100 Codes are equal (excludes blank versus blank)

80 Blank code versus blank code

75 Blank code versus non-blank code

0 Codes are not equal

Example first_name scores(100,80) routines(partial2) fldnames(pr_first_01) parmval(NYSIIS)

PARTIAL2 When Parmval (RNYSIIS)

The RNYSIIS parmval is used with the PARTIAL2 routine to match two strings using an algorithm based on the standard NYSIIS algorithm. The input strings are reversed before the algorithm is used to build the code.

Example first_name scores(100,80) routines(partial2) fldnames(pr_first_01) parmval(RNYSIIS)

Scoring Values

Score Description

100 Codes are equal (excludes blank versus blank)

80 Blank code versus blank code

75 Blank code versus non-blank code

0 Codes are not equal

POSTCODE Routine

The POSTCODE routine compares two postcode values.

Example postcode scores(100,0) routines(postcode) fldnames(uk_out_postal_code)

Scoring Values

Score Description

100 All seven characters of postcode agree

95 First five characters agree

90 Blank versus a blank

85 First 4 characters agree

80 First 2 characters agree

0 Any other value
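A Python sketch of the POSTCODE scoring values follows. It assumes seven-character, blank-padded fields. It checks the blank-versus-blank case first so that two blank postcodes score 90 rather than 100. The function name is hypothetical.

```python
def postcode_score(a: str, b: str) -> int:
    """Score two postcodes with the POSTCODE scoring values (sketch).

    Assumption: values are treated as seven-character, blank-padded fields.
    """
    a, b = a.ljust(7)[:7], b.ljust(7)[:7]
    if not a.strip() and not b.strip():
        return 90   # blank versus blank
    if a == b:
        return 100  # all seven characters agree
    if a[:5] == b[:5]:
        return 95   # first five characters agree
    if a[:4] == b[:4]:
        return 85   # first four characters agree
    if a[:2] == b[:2]:
        return 80   # first two characters agree
    return 0
```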

POSTCODE When Parmval = TSB (used in UK only)

The TSB parmval is used with the POSTCODE routine to...

If length = 8 (four characters outbound, one space, three characters inbound) XXXX XXX.

Otherwise, the length should be 7 (four characters outbound, three characters inbound) XXXXXXX.

Example postcode scores(90) routines(postcode,nonblank) fldnames(uk_out_postal_code) parmval(TSB)

Scoring Values

Score Description

100 Non-blank and all characters of postcode agree.

90 First four characters agree and non-blank

80 Any blank postcode

0 Any other value

PREFIX Routine

The PREFIX routine compares two name prefix values, assuming fields to be four characters in length.

Example name_prefix scores(95,85) routines(prefix) fldnames(pr_prefix_01)

Sample Scores

vs. MR MSTR MS MRS MISS Blank Other

MR 100 95 25 25 25 90 85

MSTR 95 100 25 25 25 90 85

MS 25 25 100 95 95 90 85

MRS 25 25 95 100 80 90 85

MISS 25 25 95 80 100 90 85

Blank 90 90 90 90 90 100 95

Other 85 85 85 85 85 95 0/100*

* If the values are an exact match, they score 100; otherwise, they score a 0.
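The PREFIX sample-score matrix can be encoded as a lookup table. This Python sketch makes three choices for illustration:

- Values are compared after trimming and upper-casing.
- Any prefix outside the five listed ones is treated as "Other".
- The function name prefix_score is hypothetical.

```python
# Scores from the PREFIX sample-score matrix for the five known prefixes.
KNOWN = ("MR", "MSTR", "MS", "MRS", "MISS")
MATRIX = {
    "MR":   (100, 95, 25, 25, 25),
    "MSTR": (95, 100, 25, 25, 25),
    "MS":   (25, 25, 100, 95, 95),
    "MRS":  (25, 25, 95, 100, 80),
    "MISS": (25, 25, 95, 80, 100),
}


def prefix_score(a: str, b: str) -> int:
    """Look up the PREFIX score for two name prefixes (sketch)."""
    a, b = a.strip().upper(), b.strip().upper()

    def kind(p: str) -> str:
        if not p:
            return "BLANK"
        return p if p in MATRIX else "OTHER"

    ka, kb = kind(a), kind(b)
    if ka == kb == "BLANK":
        return 100
    if "BLANK" in (ka, kb):
        # Blank versus a known prefix scores 90; blank versus Other scores 95.
        return 95 if "OTHER" in (ka, kb) else 90
    if ka == kb == "OTHER":
        return 100 if a == b else 0  # exact match only
    if "OTHER" in (ka, kb):
        return 85
    return MATRIX[a][KNOWN.index(b)]
```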

PREVENT Routine

The PREVENT routine prevents matching on fields named in the Field/Comparison Routine Listing. It supersedes any other comparisons and forces a non-matching condition if the field specified agrees exactly with any positive matches in a set. This feature is only supported in file matching.

This routine can be used during Household, Individual, and/or Commercial Matching and requires a positive action (pass or fail) through the matching patterns. Therefore, PREVENT can be used positively to force a match when any member of the match group has the same value. You can have only one PREVENT routine per level of matching.

Score Description

100 Exact match (including blank versus blank)

0 Any exception

Example seqno scores(100) routines(prevent) fldnames(seqno)

RNYSIIS Routine

The RNYSIIS routine matches two strings using an algorithm based on the standard NYSIIS algorithm. This routine reverses the input string before building the NYSIIS code. For example, the string ‘JOHN SMITH’ is read as ‘HTIMS NHOJ’ before the code is built. See page 7-123 for a layout of the algorithm.

Scoring Values

Score Description

100 Codes are equal (excludes blank versus blank)

80 Blank code versus blank code

75 Blank code versus non-blank code

0 Codes are not equal

Example first_name scores(100,80) routines(RNYSIIS) fldnames(pr_first_01)

SOCSEC Routine

The SOCSEC routine compares two Social Security values. Fields must be nine characters in length.

Example social_security scores(90) routines(socsec,nonblank) fldnames(social)

Scoring Values

Score Description

100 An exact match of two values that contain valid digits.

99 Both values are “000000000”.

98 Either value is “000000000”.

90 Two values that contain digits and match except for one transposition.

10 Two values that contain digits and match except for one mismatch.

0 Either or both values contain non-numeric data, consist entirely of the same digit (except zero), or are “123456789”; or the values differ by more than one mismatch or more than one transposition.
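The SOCSEC scoring rules can be sketched in Python as follows. The guide does not define "transposition". This sketch assumes it means two adjacent digits swapped, and it treats values that are not exactly nine characters as invalid. The order in which the checks are applied is also an assumption, and the function names are hypothetical.

```python
ZEROS = "000000000"


def _invalid(value: str) -> bool:
    """Values that always score 0: non-numeric, one repeated digit, 123456789."""
    return (len(value) != 9
            or any(ch not in "0123456789" for ch in value)
            or (len(set(value)) == 1 and value != ZEROS)
            or value == "123456789")


def socsec_score(a: str, b: str) -> int:
    """Score two nine-character Social Security values (sketch)."""
    if _invalid(a) or _invalid(b):
        return 0
    if a == ZEROS and b == ZEROS:
        return 99
    if a == ZEROS or b == ZEROS:
        return 98
    if a == b:
        return 100
    diffs = [i for i in range(9) if a[i] != b[i]]
    if len(diffs) == 1:
        return 10  # exactly one mismatched digit
    if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
            and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]]):
        return 90  # one swap of adjacent digits (assumed meaning of transposition)
    return 0       # more than one mismatch or transposition
```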

SOUNDEX1 Routine

The SOUNDEX1 routine matches two strings by building a code for each string with the following algorithm and comparing the codes.

SOUNDEX Algorithm

1. Retain the first letter of the string.
2. After the first position, convert all occurrences of the following letters to blank: A, E, I, O, U, W, Y.
3. Change letters from the following sets to the corresponding digits:

From characters To digit

B, F, P, V 1

C, G, J, K, Q, S, X, Z 2

D, T 3

L 4

M, N 5

R 6

4. Remove all consecutive pairs of duplicate digits and blanks from the string that resulted after Step 3.
5. Return the first four characters of the string, padded with trailing zeros, if needed.

Scoring Values

Score Description

0 Case 1: Both strings are blank. Case 2: Neither string is blank, and they are not equal.

50 One string is blank; the other string is non-blank

100 The strings are equal

Sample Scores

Table 7.14 Sample Scores for SOUNDEX1

Score String 1 String 2 New String 1 New String 2

50 king

100 ng ngai N200 N200

100 patroff pietropaolo P361 P361

0 bill will B400 W400

Example name scores(90) routines(soundex1,nonblank) fldnames(pr_first_01)

SOUNDEX2 Routine

The SOUNDEX2 routine is used to match two strings using an improved Soundex algorithm.

Improved SOUNDEX Algorithm

1. Capitalize all letters in the string.
2. Replace all non-leading vowels with A.
3. Transform the following prefixes:

From Characters To Characters

MAC MCC

KN NN

K C

PF or PH FF

SCH SS

4. Transform these letter combinations that occur after the first position:

From Characters To Characters

DG GG

CAAN TAAN

D T

NST NSS

AV AF

Q G

Z S

M N

KN NN

K C

5. Replace AW with A.
6. Replace PH with FF.
7. Replace SCH with SSS.
8. Replace H with A unless it is preceded and followed by A (for example, AHA).
9. Replace terminal NT with TT.
10. Remove all A characters except for the leading A.
11. Remove all but the first of repeating adjacent character substrings.
12. Return the first four characters, padded with blanks on the right, as needed.

Scoring Values

Score Description

0 Either both strings are blank, or neither string is blank and they are not equal.

50 One string is blank; the other string is non-blank.

100 Strings are equal.

Example name scores(90) routines(soundex2,nonblank) fldnames(pr_first_01)

SPELLING Routine

The SPELLING routine performs general-purpose text comparisons, assuming input fields are left-justified and blank-filled. If the two fields do not match exactly, the routine calculates two field lengths by excluding trailing blanks.

This routine also compares a hyphenated vs. a non-hyphenated string of data, when the two data strings are of the same length.

Ex: Record 1 – ST-CHARLES vs. Record 2 – STANYTHING

In a case like this, the routine compares the two bytes of record 1 that precede the hyphen (“ST”) against the first two bytes of record 2 (“ST”). The resulting score is 96.

Example name scores(96) routines(spelling,nonblank) fldnames(pr_first_01)

Scoring Values

Score Description

100 For an exact match (excluding blank versus blank).

96 Score returned when comparing a hyphenated vs. a non-hyphenated string of data:

If the hyphen is not the first or last character in the string, and the second string matches either the substring preceding the hyphen, or the substring following the hyphen exactly.

2 If the final score is less than 2.

1 Blank field vs. non-blank field

0 Blank field vs. blank field

Deduct from 100 Reason for deduction

-1 For each non-matched doubled character.

-1 For each character error and for each extra character if there are extra characters.

-1 For each extra character if the last character was a mismatch.

-2 For other character errors (transposition, insertion, mismatch, extra characters at the end of either field after taking insertions and doubled characters into consideration).

-10 If the number of character errors is more than or equal to 25% of the length of the shorter field.

-25 If the number of character errors is more than 50% of the length of the shorter field.

Finally, add

+1 If the length of either field is at least 9 characters.

SPELLING When Parmval = DI

Some comparison routines convert non-English characters into digraphs (pairs of English characters) whenever these characters are encountered within a field. The DI parmval converts these non-English characters into the appropriate digraphs listed in the following table.

For example, the character ß appears in the German spelling of street, straße. That character is converted to the digraph SS when it is encountered in a field.

From character To digraph

Ä AE

ý OE

Æ AE

Å AA

Ö OE

Ü UE

ß SS
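Each listed character maps to a fixed digraph. The DI conversion can therefore be sketched with Python's str.maketrans and str.translate, which accept single characters mapped to multi-character strings. The mapping reproduces the characters in the table, using ß for the German sharp s. Only characters in the table are converted. The function name to_digraphs is hypothetical.

```python
# Digraph table taken from the DI parmval description above.
DI_TABLE = str.maketrans({
    "Ä": "AE",
    "ý": "OE",
    "Æ": "AE",
    "Å": "AA",
    "Ö": "OE",
    "Ü": "UE",
    "ß": "SS",
})


def to_digraphs(value: str) -> str:
    """Replace each listed non-English character with its digraph (sketch)."""
    return value.translate(DI_TABLE)
```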

SPELLING When Parmval = SQUISH

The SQUISH parmval is used to blank out single quotes and remove all blanks from the data before the SPELLING routine performs its comparison.

Ex: 978 555 1221 becomes 9785551221, for comparison.
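The SQUISH preprocessing step can be sketched in one line of Python. This assumes that "blanks" means space characters only. The function name squish is hypothetical.

```python
def squish(value: str) -> str:
    """Blank out single quotes, then remove all blanks (SQUISH sketch)."""
    return value.replace("'", " ").replace(" ", "")
```

Because the quotes are first turned into blanks and the blanks are then removed, a value such as O'NEIL is compared as ONEIL.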

STATUS Routine

STATUS When Parmval (STATUS)

The STATUS parmval is used with the STATUS routine to compare fields from two separate records for a specified literal value.

Syntax STATUS=literal_value (literal_value is the value to check)

Scoring Values

Score Description

100 Literal value versus literal value

90 Literal value versus non-literal value

80 Non-literal value versus non-literal value

Example name scores(90) routines(status,nonblank) fldnames(pr_first_01) parmval(STATUS)

STREETS Routine

The STREETS routine compares street names using the following logic:

1. Prior to performing the comparison, the routine changes all periods (.) to blanks and ampersands (&) to pluses (+).
2. Common abbreviated street words are then normalized, and the spelling algorithm is applied.
3. If the spelling algorithm yields a score of less than 80, a modified sound algorithm is applied.
4. If the score is still less than 80 for numeric streets, a word comparison routine is applied.
5. If the two fields do not match exactly, the routine calculates two field lengths by excluding trailing blanks.

Example street_name scores(100, 95) routines(streets) fldnames(pr_best_st_tl)

When comparing numeric streets, if street numbers are different, the score is 0.

Scoring Values

Score Description

100 For an exact match

90 - 99 Varying degrees of acceptable differences

95 For neither field value blank and one field an exact starting substring (6 characters or more in length) of the other, but the difference in length is not greater than two characters.

88 For an exact match of blank field value vs. blank. Maximum field length allowed is 100 bytes.

80 For blank field versus nonblank field

0 For comparing numeric streets, if street numbers are different. For example: 1232ND STREET and 1242ND STREET

Deduct from 98: Reason for deduction

– 1 For each non-matched doubled character

– 1 For each character error and for each extra character if there are extra characters

– 1 For each extra character if the last character was a mismatch

– 2 For other character errors (transposition, insertion, mismatch, extra characters at the end of either field after taking insertions and doubled characters into consideration)

– 10 If the number of character errors is more than 25% of the length of the shorter field

– 25 If the number of character errors is more than 50% of the length of the shorter field

Finally, add:

+1 If the length of either field is at least 9 characters

Score Description

90 If a higher score can be achieved with the word comparison, when more than two words are compared and all but one word agree.

STREETS When Parmval (DI)

The DI parmval is used with the STREETS routine to convert non-English characters into digraphs (pairs of English characters) whenever these characters are encountered within a field. The DI parmval converts these non-English characters into the appropriate digraphs listed in the following table.

For example, the character ß appears in the German spelling of street, straße. That character is converted to the digraph SS when it is encountered in a field.

From characters To digraph

Ä AE

ý OE

Æ AE

Å AA

Ö OE

Ü UE

ß SS

STREETS When Parmval (TYPE)

The TYPE parmval is used with the STREETS routine to prevent specific street words, such as street types, from being recoded to blanks before the comparison score is determined. As a result, records whose street types differ receive lower scores, which makes matching more precise.

The TYPE parmval works only with the oraddrl2 field. With this field, the system examines the whole address line instead of only the first part.

The TYPE parmval should only be necessary when the field being compared includes both street name and street type.

Example street_name scores(80) routines(streets) fldnames(oraddrl2) parmval(TYPE)

For a comparison between ‘Shinliss Rd.’ and ‘Shinliss Ave.’, normal processing would score a 95 because ‘Ave.’ would be blanked out. Using the TYPE parmval, this example would score an 80.

If the TYPE parmval is not used, all of the following street types are blanked out before being compared and scored:

Avenue Ave. Des Rang RG Rue

SUBSTRNG Routine

The SUBSTRNG routine detects the presence of one string of characters (field1) within another string of characters (field2), excluding blank strings. This routine requires two fields: field1 is the first field, and field2 is the second field. This routine returns the higher of the two comparison scores.

Example test_names scores(99) routines(substrng,nonblank) fldnames(pr_first_01, pr_middle_01)

pr_first_01 pr_middle_01

John P.

Paul Johnny

John vs. Johnny scores a 99; P. vs. Paul scores 0; the higher of the two scores is 99, so that score is the returned score.

Scoring Values

Score Description

100 First string is exactly the second string.

99 First string resides at the beginning of the second string.

98 First string resides at the end of the second string.

97 First string resides within the second string.

0 Any other value.
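The SUBSTRNG scoring values, and the choice of the higher of two comparisons, can be sketched in Python. The guide does not state which fields are paired. This sketch assumes the cross-pairing shown in the John/Johnny example:

- field1 of record 1 against field2 of record 2
- field2 of record 1 against field1 of record 2

Values are trimmed before comparison. Function names are hypothetical.

```python
def substrng_score(first: str, second: str) -> int:
    """Score one SUBSTRNG comparison using the scoring values above (sketch)."""
    first, second = first.strip(), second.strip()
    if not first or not second:
        return 0    # blank strings are excluded
    if first == second:
        return 100
    if second.startswith(first):
        return 99
    if second.endswith(first):
        return 98
    if first in second:
        return 97
    return 0


def substrng(rec1: tuple, rec2: tuple) -> int:
    """Return the higher of the two cross comparisons (assumed pairing)."""
    return max(substrng_score(rec1[0], rec2[1]),
               substrng_score(rec1[1], rec2[0]))
```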

SUBSTRNG When Parmval (AND)

The AND parmval is used with the SUBSTRNG routine to take both comparisons into account.

Score Description

100 First field is exactly the second field for both comparisons.

50 First field is exactly the second field for one comparison.

0 Any other value.

Example test_names scores(100) routines(substrng,nonblank) fldnames(pr_first_01, pr_middle_01) parmval(AND)

pr_first_01 pr_middle_01

John Paul

Paul-Henry J

John vs. J. scores a 0; P. vs. Paul scores 50; the higher of the two scores is 50, so that score is the returned score.

TWORET Routine

The TWORET routine tests for the presence of two retail records.

Example street_name scores(100) routines(tworet) fldnames(pr_best_st_tl)

Score Description

100 “1” versus “1”.

0 Any other value.

TWORET When Parmval (LO)

The LO parmval is used with the TWORET routine to detect the presence of two retail records, except that the scoring is reversed: 1 versus 1 = 0; all else = 100.

Example street_name scores(100) routines(tworet) fldnames(pr_best_st_tl) parmval(LO)

Scoring Values

Score Description

100 Any other value.

0 “1” versus “1”

CHAPTER 8 Data Reconstructor

The Data Reconstructor is a flexible, rule-based data reconstruction utility. It features a rich scripting language with powerful IF-THEN-ELSE capabilities and extensive text manipulation facilities.

The program creates an output file based on the fields defined in an output DDL, and then uses reconstruction rules to build these fields from an input file and DDL. The input file and DDL are generally the output of the Matcher step. The reconstruction rules can be used, for example, to create an input file for a database or to create delivery address fields that have specific size constraints.

Input/Output Resources

Input

Parameter file (pfrecons.par) Contains the name of the special rules file.

Rules file See the section “Rules File” on page 8-6 for more information about this file.

Input file File that is loaded into the Data Reconstructor.

DDLs reconinp.ddl, reconsrout.ddl

Output

Output file Contains all the reconstructed data.

Statistics file Contains BDP statistics and information.

Data Reconstructor Parameters

All REQUIRED parameters appear in bold in shaded rows.

Parameter names in this file are not case-sensitive.

Table 8.1 Data Reconstructor Parameters

Name Value Description

INPUT_DDL_FNAME DDL name Name of the input DDL file that defines the layout of the input file. REQUIRED

INPUT_FNAME File name Name of the input file. REQUIRED

IS_ALPHA_FNAME File name Name of a code-page table used to identify which characters are alphabetic. This parameter may be required for the special characters found in many foreign languages. When it is not specified, your operating system’s default test for an alphabetic character is used instead. The code-page specified by this parameter is used by the proper_case and title_case action statements and by the "is alphabetic" and "is alphanumeric" string conditions.

IS_DIGIT_FNAME File name Name of a code-page table used to identify which characters are numeric digits. This parameter may be required for the special characters found in many foreign languages. When it is not specified, your operating system’s default test for a numeric digit is used instead. The code-page specified by this parameter is used by the "is numeric" and "is alphanumeric" string conditions.

MAXIN Numeric The maximum number of records to read. The default is to read all records.

OUTPUT_DDL_FNAME File name Name of the output DDL file that defines the output file. REQUIRED

OUTPUT_FNAME File name Name of the output file. REQUIRED

Table 8.1 Data Reconstructor Parameters (Continued)

Name Value Description

PRINT_NTH_COUNT Numeric Prints the count of every Nth record read. If this is set to ‘0’ or not specified, no records are reported.

RULE_FNAME File name Name of the rules parameter file that contains the reconstruction rules, written in the CFRECONS script language. See “Rules File” on page 8-6. REQUIRED

STAT_FNAME File name Name of the statistics file.

START Numeric The first record number to process in the input file. For example, to process 1000 records starting with record number 442, enter: START=442 MAXIN=1000

TO_UPPER_FNAME File name Name of the code-page table used to translate characters to uppercase. This parameter may be required for the special characters found in many foreign languages or to convert from one code-page to another. When this optional parameter is not specified, your operating system’s default conversion to uppercase is used. The code-page specified by this parameter is used by the proper_case, title_case, and upper_case action statements.

TO_LOWER_FNAME File name Name of the code-page table used to translate characters to lowercase. This parameter may be required for the special characters found in many foreign languages or to convert from one code-page to another. When it is not specified, your operating system’s default conversion to lowercase is used. The code-page specified by this parameter is used by the proper_case, title_case, and lower_case action statements.

Table 8.1 Data Reconstructor Parameters (Continued)

Name Value Description

UPLOW_FNAME File name Name of the file that contains uppercase/lowercase recode information. This file is required by the proper_case action statement: if the rule specified by the USE_RULE parameter contains a proper_case action statement, this parameter is required.

USE_RULE User-defined Specifies the name of the rule to use, found in the file named by the RULE_FNAME parameter. The instructions in this rule are used to reconstruct input records into output records. Only one rule can be executed when the program is running.
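The interaction of START and MAXIN can be sketched with Python's itertools.islice. This assumes that record numbers start at 1 and that MAXIN counts records from the START record onward, as in the START=442 MAXIN=1000 example. The function name select_records is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def select_records(records: Iterable[str], start: int = 1,
                   maxin: Optional[int] = None) -> Iterator[str]:
    """Yield the input records selected by START and MAXIN (sketch).

    Assumptions: record numbers start at 1, and maxin=None stands for the
    default of reading all records.
    """
    stop = None if maxin is None else start - 1 + maxin
    return islice(records, start - 1, stop)
```

With START=442 and MAXIN=1000, this yields records 442 through 1441.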

Parameter File Syntax

The parameter file contains a series of parameter name/value combinations. Each name and its value can be separated by one blank or tab, or by an equal sign.

For example: parameter-name parameter-value

or parameter-name=parameter-value

You can include additional blanks or tabs to improve readability. Parameter values can optionally be enclosed in double quotes (") or single quotes ('). If the first non-blank/non-tab character in a line is an asterisk (*) the entire line will be treated as a comment.

Blank lines are ignored.

Parameter names are not case-sensitive, but parameter values are. For example, all of the following forms are valid:

INPUT_DDL_FNAME "../dict/parsout.ddl"
Input_DDL_FName ../dict/parsout.ddl
INPUT_DDL_FNAME=../dict/parsout.ddl
INPUT_DDL_FNAME ="../dict/parsout.ddl"

The Data Reconstructor uses extended semantics when parsing its parameter file. Not all utilities support all of the syntax options described in this section.
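The syntax rules above are simple enough to sketch as a small Python parser. It is not the Data Reconstructor's own parser. It only illustrates the rules stated in this section:

- The name and value are separated by blanks, tabs, or an equal sign.
- The value may be enclosed in single or double quotes.
- Lines that begin with an asterisk are comments.
- Blank lines are ignored.
- Names are not case-sensitive.

The function name parse_parameters is hypothetical.

```python
import re

# name, optional blanks/tabs, optional "=", optional blanks/tabs, value
LINE = re.compile(r"^[ \t]*([^ \t=]+)[ \t]*(?:=[ \t]*)?(.*?)[ \t]*$")


def parse_parameters(text: str) -> dict:
    """Parse parameter lines into a dict keyed by upper-cased name (sketch)."""
    params = {}
    for line in text.splitlines():
        stripped = line.strip(" \t")
        if not stripped or stripped.startswith("*"):
            continue  # blank line or comment line
        match = LINE.match(line)
        if match is None:
            continue
        name, value = match.group(1), match.group(2)
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # optional surrounding quotes
        params[name.upper()] = value  # names are not case-sensitive
    return params
```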

Rules File

The rule file is a plain text file containing data reconstruction rules written in the data reconstructor scripting language.

A rule file can contain a single rule or many rules.

Each rule begins with a rule keyword and ends with an endrule keyword.

A rules parameter file can have many rules defined, but only one rule can be executed when the program runs.

The rules are constructed and implemented using nested IF-THEN-ELSE expression logic. This allows powerful selection and conditional data reconstruction features.

Rules File Requirements

Every rule must have a name that can be used with the USE_RULE parameter. The name must immediately follow the rule keyword and can be a maximum of 32 characters long. It must begin with a letter and can contain any combination of letters, numbers and underscore characters (_).
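These naming requirements map directly onto a regular expression. The Python sketch below assumes that "letters" means the ASCII letters A to Z and a to z. It does not check the reserved words, and the function name is hypothetical.

```python
import re

# A letter, then up to 31 more letters, digits, or underscores (32 maximum).
RULE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,31}")


def is_valid_rule_name(name: str) -> bool:
    """Check the rule-name requirements stated above (sketch)."""
    return RULE_NAME.fullmatch(name) is not None
```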

Rule Script Language

Data reconstruction rules are written in a rich script language, making it possible to tailor actions to specific business, country, and language requirements. Existing data elements and literal values can be combined to create new data elements based on markers found within the record (such as Customer Data Parser and Geocoder type fields and flag fields). Nested IF-THEN-ELSE statements can be used to create refined logic that takes many factors into account when rebuilding data.

Rules can be very simple or extraordinarily complex depending on individual requirements.

The statements, fields, keywords, semantics and syntax of this language are described, in detail, in the following sections.

Reserved Words

The following words have special meaning and cannot be used except for their intended purposes as described in this document:

alphabetic, alphanumeric, and, AND, append, append:0spaces, append:2spaces, append:pack, BLANKS, contains, CONTAINS, copy, copy_all, endif, endrule, ENDS_WITH, else, EQ, GE, GT, if, in, IN, is, LE, LT, left_justify, left_justify:full, lower_case, move, NE, numeric, NULLS, or, OUT, Out, pack, perform, proper_case:A, proper_case:G, proper_case:N, proper_case:s, proper_case, proper_case:anyline, proper_case:geography, proper_case:name, proper_case:street, proper_case:a, proper_case:g, proper_case:n, right_justify, right_justify:full, rule, starts_with, STARTS_WITH, then, upper_case, ZEROS

Precedence and Associativity

Precedence controls which operators are executed first in an expression. Operators are grouped into levels of precedence from highest to lowest, as shown in the following table:

Operations                                   Keyword or Symbol
Relational operators (highest precedence)    GT, LT, GE, LE, <, >, <=, >=
Equality operators                           EQ, NE, =, !=, <>, ==
String operators                             contains, starts_with, ends_with, CONTAINS, STARTS_WITH, ENDS_WITH, IS
Logical AND operator                         AND, and, &&
Logical OR operator (lowest precedence)      OR, or, ||

For example, in the following expression, the relational operations (>= and <) are performed first, followed by the equality operation (==), then the logical AND operation, and finally the logical OR operation:

    if(state == "CA" or zip_code >= "10000" AND zip_code < "20000")
        // statement(s);
    endif;

Associativity controls the grouping of operators at the same precedence level. All operations have left-to-right associativity. For example, the following two statements have the same meaning:

    if(prov == "NL" OR prov == "NS" OR prov == "PE" OR prov = "NB")
        move "Atlantic", out.region;
    endif;

    if(((prov == "NL" OR prov == "NS") OR prov == "PE") OR prov = "NB")
        move "Atlantic", out.region;
    endif;
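To show how precedence and associativity group an expression, the following Python sketch fully parenthesizes a list of tokens using precedence climbing. It is not the Data Reconstructor's own parser. `parenthesize`, `LEVELS` and `PRECEDENCE` are hypothetical names, the tokens are assumed to be already split on blanks, and the unary `is` conditions are left out.

```python
# Precedence levels from the table above, lowest (0) to highest (4).
LEVELS = [
    ["OR", "or", "||"],
    ["AND", "and", "&&"],
    ["contains", "starts_with", "ends_with", "CONTAINS", "STARTS_WITH", "ENDS_WITH"],
    ["EQ", "NE", "=", "!=", "<>", "=="],
    ["GT", "LT", "GE", "LE", "<", ">", "<=", ">="],
]
PRECEDENCE = {op: level for level, ops in enumerate(LEVELS) for op in ops}


def parenthesize(tokens: list) -> str:
    """Fully parenthesize a flat list of operand and operator tokens.

    Uses precedence climbing; every operator is left-associative."""
    pos = 0

    def parse(min_level: int) -> str:
        nonlocal pos
        left = tokens[pos]  # an operand (field name or literal)
        pos += 1
        while pos < len(tokens) and PRECEDENCE[tokens[pos]] >= min_level:
            op = tokens[pos]
            pos += 1
            # Left associativity: the right operand may only contain
            # operators that bind more tightly than op.
            right = parse(PRECEDENCE[op] + 1)
            left = f"({left} {op} {right})"
        return left

    return parse(0)


print(parenthesize("state == CA or zip >= 10000 AND zip < 20000".split()))
# ((state == CA) or ((zip >= 10000) AND (zip < 20000)))
```

Because the right operand is parsed at one level above the current operator, a chain of ORs groups from the left, which matches the prov example above.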


Comments

The compiler recognizes C style, C++ style, and shell style comments.

Style    Description

C        Begins with /* and ends with */, and includes all characters in between. Comments can span multiple lines. Only C-style comments can be embedded in the middle of a line.

             /*
              #...
              Example of C-style comments in the Reconstructor.
             */

C++      Begins with // and extends to the end of the line. For multi-line comments, the comment portion of each line must begin with //.

             //
             //...
             // Example of C++ style comments in the Reconstructor.
             //

Shell    Similar to C++ style comments, except that # is used instead of //. Once started, the comment extends to the end of the line.

             #
             #...
             # Example of shell style comments in the Reconstructor.
             #
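The three comment styles can be illustrated with a small Python sketch that strips comments from rule-file text. `strip_comments` is a hypothetical helper and is not part of CFRECONS. It assumes, as most compilers do but this guide does not state, that comment markers inside quoted literal values are ordinary characters. It keeps newlines so that line numbers would still match.

```python
def strip_comments(script: str) -> str:
    """Remove C (/* */), C++ (//) and shell (#) comments from rule text."""
    out = []
    i, n = 0, len(script)
    while i < n:
        ch = script[i]
        if ch in "\"'":
            # Copy a quoted literal unchanged, up to and including its closing quote.
            end = script.find(ch, i + 1)
            end = n if end == -1 else end + 1
            out.append(script[i:end])
            i = end
        elif script.startswith("/*", i):
            end = script.find("*/", i + 2)
            stop = n if end == -1 else end + 2
            # Keep any newlines inside the comment so line numbers are preserved.
            out.append("\n" * script.count("\n", i, stop))
            i = stop
        elif script.startswith("//", i) or ch == "#":
            # C++ and shell comments run to the end of the line (newline kept).
            end = script.find("\n", i)
            i = n if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(strip_comments('move "A#B", x; # note\nendrule')))
```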

Fields

Fields are used in the script language to reference either input or output data fields defined in data dictionaries (DDL files), or literal values. When used to refer to a data field, the field name must exactly match the spelling and case of the name in the corresponding data dictionary (DDL) file. Literal values are described in the section “Literal Values.”

Fields cannot exceed 2000 bytes. If the Data Reconstructor encounters a field that is larger than 2000 bytes, it issues an error message and resets the field length to 2 bytes.


Syntax

    [in. | IN. | out. | OUT.]field_name[ [n:n] | (n:n) | [n:*] | (n:*) ]
  or
    "literal value" | 'literal value' | BLANKS | ZEROS | NULLS

Input or Output Dictionary

By default, the source_field in an action statement and the first field in an IF condition are assumed to be input fields, as defined within the input dictionary. Similarly, the destination_field in an action statement and the second field in an IF condition are assumed to be output fields as defined in the output dictionary.

It is possible to override this assumption by prefixing an input field with in. or IN. and an output field with out. or OUT., as shown in the following example:

    move out.newline2, OUT.newline1;

You may decide to explicitly declare your fields as input or output all of the time by always including the IN. or OUT. prefix to improve the readability of your script, as in the following example:

    if(in.gout_fail_level != "0") then
        move in.line1, out.line1;
        move in.line2, out.line2;
        move in.line3, out.line3;
        move in.line4, out.line4;
    endif;
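The default rule and its override can be summarized in a few lines of Python. `resolve_field` is a hypothetical helper written to mirror this section. It only models which dictionary a name is looked up in and is not an API of the product.

```python
from typing import Tuple

# Explicit prefixes and the dictionary each one selects.
PREFIXES = {"in.": "input", "IN.": "input", "out.": "output", "OUT.": "output"}


def resolve_field(ref: str, default: str) -> Tuple[str, str]:
    """Return (dictionary, field_name) for a field reference.

    default is "input" for a source field or the first field of an IF
    condition, and "output" for a destination field or the second field.
    An in./IN. or out./OUT. prefix overrides the default."""
    for prefix, dictionary in PREFIXES.items():
        if ref.startswith(prefix):
            return dictionary, ref[len(prefix):]
    return default, ref


# move out.newline2, OUT.newline1;
print(resolve_field("out.newline2", "input"))   # ('output', 'newline2')
print(resolve_field("OUT.newline1", "output"))  # ('output', 'newline1')
```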


Selecting a Portion of a Field, field[n:n] & field(n:n)

The language has a built-in substring capability that allows you to select a portion of an input or output field by specifying a position and length after the field as [n:n] or (n:n).

Square brackets [n:n] are the preferred syntax. Round brackets (n:n) are provided for mainframe compatibility.

The first n is the beginning position of the substring.

The second n is the length of the substring. The length can be specified as * to indicate the remainder of the field. For example, each of these statements does the same thing:

    move "CANADA", OUT.newline4;
    move "CANADA", OUT.newline4[1:*];

Substring notation can only be used with dictionary fields. It cannot be used with literal values or with the BLANKS, ZEROS and NULLS keywords. Each of these statements generates an error message:

    move BLANKS[1:10], OUT.newline1;     // will generate an error
    move " CANADA"[2:*], OUT.newline2;   // will generate an error
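The position-and-length notation can be modeled in Python. Note that Python slices are 0-based and take an end position rather than a length. `substring` is a hypothetical function for illustration. It treats the field as a fixed-length string and raises an error where the compiler would report "bad [n:n] values". The check that the span must end inside the field is an assumption.

```python
def substring(value: str, start: int, length="*") -> str:
    """Select value[start:length] using the script language's 1-based notation.

    start is the 1-based beginning position; length is a byte count, or "*"
    for the remainder of the field."""
    size = len(value)
    if not 1 <= start <= size:
        raise ValueError(f"bad [{start}:{length}] values for a {size}-byte field")
    if length == "*":
        length = size - start + 1
    # Documented: position and length may not be zero or exceed the field size.
    # Assumption: the selected span must also end inside the field.
    if not 1 <= length <= size or start + length - 1 > size:
        raise ValueError(f"bad [{start}:{length}] values for a {size}-byte field")
    return value[start - 1:start - 1 + length]


city = "NEW YORK".ljust(30)          # a 30-byte field
print(repr(substring(city, 2, 10)))  # 'EW YORK   '
```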

Literal Values

Literal values are string constants consisting of any combination of characters enclosed in double quotes (") or single quotes ('). For example:

    'Trillium Software System'
    "Version 7.nn"

You must end the literal string with the same quote character you start it with. If you include an actual quote character in the string, you can either enter it twice in a row or quote the entire string with the other quote character: 'Mary said "you can quote me!"'
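The following Python sketch shows how a single quoted literal could be read, including the doubled-quote convention. `parse_literal` is hypothetical and handles one literal token only. It does not handle concatenation with the plus operator.

```python
def parse_literal(token: str) -> str:
    """Return the value of a quoted literal such as 'abc' or "abc".

    The literal must end with the quote character it starts with, and a
    quote doubled inside the literal stands for one quote character."""
    if len(token) < 2 or token[0] not in "\"'" or token[-1] != token[0]:
        raise ValueError(f"improperly quoted literal: {token}")
    quote, body = token[0], token[1:-1]
    doubled = quote * 2
    # A quote that is not part of a doubled pair would end the literal early.
    if quote in body.replace(doubled, ""):
        raise ValueError(f"unbalanced quote in literal: {token}")
    return body.replace(doubled, quote)


print(parse_literal("'Mary said \"you can quote me!\"'"))
print(parse_literal("'Mary''s'"))  # Mary's
```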


Binary Data Strings

A binary string constant can be either octal or hexadecimal.

Hexadecimal: the first quote character must be immediately preceded by an uppercase or lowercase x, and each character is represented by its equivalent two-digit hexadecimal value (range 00 to FF).

A special case is made for x"CR" (carriage return), which is considered equivalent to x"0D", and x"LF" (line feed), which is considered equivalent to x"0A". For example: X'5368656C646F6E' or x"CRLF".

Octal: the first quote character must be immediately preceded by an uppercase or lowercase o, and each character is represented by its equivalent three-digit octal value (range 000 to 377). For example: O"110141162164154151156147" or o'015012'.
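The following Python sketch decodes both forms of binary string constant, including the CR and LF shorthand. `decode_binary` is a hypothetical helper. The results are Python bytes objects, so b'Sheldon' is the ASCII reading of those bytes. On an EBCDIC platform the same bytes would represent different characters.

```python
import re

BINARY = re.compile(r"""([xXoO])(["'])(.*)\2""")


def decode_binary(token: str) -> bytes:
    """Decode a hexadecimal (x"...") or octal (o"...") binary string constant."""
    m = BINARY.fullmatch(token)
    if m is None:
        raise ValueError(f"not a binary string constant: {token}")
    kind, body = m.group(1).lower(), m.group(3)
    if kind == "x":
        # CR and LF are accepted as names for 0D and 0A.
        body = body.upper().replace("CR", "0D").replace("LF", "0A")
        return bytes.fromhex(body)
    if len(body) % 3:
        raise ValueError("octal constants use three digits per character")
    # int(..., 8) rejects the digits 8 and 9; bytes() rejects values above 377.
    return bytes(int(body[i:i + 3], 8) for i in range(0, len(body), 3))


print(decode_binary("X'5368656C646F6E'"))           # b'Sheldon'
print(decode_binary('O"110141162164154151156147"'))  # b'Hartling'
```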

Concatenating Literal Values

Literal values can be concatenated (or joined together) using a plus sign as an operator. This can be useful when you need to create a very long literal string or to make your scripts easier to understand.

move "------" + "------" + "------", dashed_line_120ch;

move 'Network Pathways Inc., ' + 'Suite 100-401, ' + '1600 Bedford Hwy, ' + 'Bedford, NS, ' + 'Canada B4A 1E8' , return_address;

Strings do not have to be delimited the same way to be concatenated. For example:

    move "Shel" + 'don', first_name;


BLANKS, ZEROS and NULLS

The BLANKS, ZEROS and NULLS keywords are special literal values that can be used to set a field entirely to blanks, zeros or binary-zeros; or to test if a field contains only blanks, zeros or binary zeros.

When these keywords are used in a script, a literal value is created dynamically with exactly the right number of blanks, zeros or binary zeros to match the size of the other field used in the expression. If all fields in an expression are BLANKS, ZEROS or NULLS keywords, the resulting literal values are one byte long.

In this example, all fields used within the IF conditions are 1 byte long:

    if(BLANKS == BLANKS) then
        // always true
    endif;

    if(BLANKS == ZEROS) then
        // always false
    endif;

In this example, the length of the BLANKS literal will be 10 bytes to match the 10-byte substring selected from the 30-byte city field using the city[2:10] notation:

    if(city[2:10] == BLANKS) then
        // characters 2 through 11 of city are blank
    endif;
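The dynamic sizing of these keywords can be sketched in Python as follows. `expand` and `FILL` are hypothetical names. The caller is assumed to supply the length of the other field in the expression, or nothing when every operand is a keyword.

```python
from typing import Optional

# Fill character for each keyword ("\0" is a binary zero).
FILL = {"BLANKS": " ", "ZEROS": "0", "NULLS": "\0"}


def expand(keyword: str, other_length: Optional[int] = None) -> str:
    """Return the literal a BLANKS, ZEROS or NULLS keyword stands for.

    The literal matches the length of the other field in the expression;
    when every operand is one of these keywords, it is one byte long."""
    return FILL[keyword] * (1 if other_length is None else other_length)


city = "A".ljust(30)                        # 30-byte field holding "A"
print(city[1:11] == expand("BLANKS", 10))   # True: city[2:10] is blank
print(expand("BLANKS") == expand("ZEROS"))  # False
```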


‘IF’ Statements

IF statements allow you to add conditional logic in your scripts to choose between two or more options. For instance, you can choose to build an output address from Geocoder fields or from original input data based on Customer Data Parser and Geocoder flags.

IF Statement Syntax

    if(condition [and | or | AND | OR condition]...) [then]
        action_statement;...
    [else
        action_statement;...]
    endif;

IF statements consist of three parts:

the condition(s) to be evaluated

the action_statement(s) to execute when the condition(s) are TRUE

the action_statement(s) to execute when the condition(s) are FALSE.

When condition(s) evaluate as TRUE, the action_statement(s) following the condition(s) are executed, otherwise the action_statement(s) that follow the else keyword are executed. The condition(s) must be enclosed in round brackets. The then keyword is optional and can be omitted or included to improve readability.

When two fields of unequal lengths are compared, the comparison is made as if the shorter field were padded with blanks to match the length of the longer field.

Example: if the field urban_city_name is 20 bytes long, the following two conditions are equivalent:

    if(urban_city_name == "BOSTON")
    if(urban_city_name == "BOSTON              ")
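The blank-padding rule can be modeled in Python as shown below. `pad_pair`, `equal` and `greater_than` are hypothetical helpers. The sketch compares characters in Python code-point order. The product's actual collating sequence (for example, ASCII versus EBCDIC) may order characters differently.

```python
def pad_pair(a: str, b: str):
    """Pad the shorter of two fields with blanks to the longer length."""
    width = max(len(a), len(b))
    return a.ljust(width), b.ljust(width)


def equal(a: str, b: str) -> bool:
    x, y = pad_pair(a, b)
    return x == y


def greater_than(a: str, b: str) -> bool:
    x, y = pad_pair(a, b)
    return x > y


urban_city_name = "BOSTON".ljust(20)           # 20-byte field
print(equal(urban_city_name, "BOSTON"))        # True
print(greater_than("ABC", "AB"))               # True: "ABC" > "AB "
```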


Conditions

Data Reconstructor conditions include four relational conditions, two equality conditions, and six string conditions:

Condition                                       Description

Relational Conditions
    field1 GT field2, field1 > field2
        Greater Than: true if field1 is greater than field2.
    field1 GE field2, field1 >= field2
        Greater Than Or Equal To: true if field1 is greater than field2 or field1 is equal to field2.
    field1 LT field2, field1 < field2
        Less Than: true if field1 is less than field2.
    field1 LE field2, field1 <= field2
        Less Than Or Equal To: true if field1 is less than field2 or field1 is equal to field2.

Equality Conditions
    field1 EQ field2, field1 == field2, field1 = field2
        Equal To: true if field1 is equal to field2.
    field1 NE field2, field1 != field2, field1 <> field2
        Not Equal To: true if field1 is not equal to field2.

String Conditions
    field1 is numeric
        String is Numeric: true if field1 contains only numeric characters. Leading and trailing blanks are trimmed from the field before the comparison is made. The optional IS_DIGIT_FNAME parameter can specify a table that indicates which characters are considered numeric.
    field1 is alphabetic
        String is Alphabetic: true if field1 contains only alphabetic characters. Leading and trailing blanks are trimmed from the field before the comparison is made. The optional IS_ALPHA_FNAME parameter can specify a table that indicates which characters are considered alphabetic.
    field1 is alphanumeric
        String is Alphanumeric: true if field1 contains only alphabetic or numeric characters. Leading and trailing blanks are trimmed from the field before the comparison is made. The optional IS_ALPHA_FNAME and IS_DIGIT_FNAME parameters can be used to specify which characters are alphabetic and which are numeric.
    field1 CONTAINS field2, field1 contains field2, field1 ~= field2
        String Contains: true if field2 is found anywhere within field1. Leading and trailing blanks are trimmed from both fields before the comparison is made.
    field1 STARTS_WITH field2, field1 starts_with field2, field1 ~< field2
        String Starts With: true if field1 starts with field2. Leading and trailing blanks are trimmed from both fields before the comparison is made.
    field1 ENDS_WITH field2, field1 ends_with field2, field1 ~> field2
        String Ends With: true if field1 ends with field2. Leading and trailing blanks are trimmed from both fields before the comparison is made.
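The six string conditions can be sketched in Python. All function names here are hypothetical. The `digits` and `letters` arguments stand in for the optional IS_DIGIT_FNAME and IS_ALPHA_FNAME tables, and their ASCII defaults are assumptions. Treating an all-blank field as not numeric, not alphabetic and not alphanumeric is also an assumption, because the table above does not cover that case.

```python
import string

DIGITS = string.digits
LETTERS = string.ascii_letters


def _trim(field: str) -> str:
    """Remove leading and trailing blanks only."""
    return field.strip(" ")


def is_numeric(field: str, digits: str = DIGITS) -> bool:
    value = _trim(field)
    return bool(value) and all(ch in digits for ch in value)


def is_alphabetic(field: str, letters: str = LETTERS) -> bool:
    value = _trim(field)
    return bool(value) and all(ch in letters for ch in value)


def is_alphanumeric(field: str, letters: str = LETTERS, digits: str = DIGITS) -> bool:
    value = _trim(field)
    return bool(value) and all(ch in letters or ch in digits for ch in value)


def contains(field1: str, field2: str) -> bool:
    return _trim(field2) in _trim(field1)


def starts_with(field1: str, field2: str) -> bool:
    return _trim(field1).startswith(_trim(field2))


def ends_with(field1: str, field2: str) -> bool:
    return _trim(field1).endswith(_trim(field2))
```

Because only leading and trailing blanks are trimmed, a value with an internal blank such as "B4A 1E8" fails the alphanumeric test.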


Example The following example uses all twelve conditions:

    if(zip_code GT "10000" AND zip_code LT "50000" AND
       pr_rev_group GE "008" AND pr_rev_group LE "010" AND
       pr_gout_fail_level == "0" AND state != "NY" AND
       first_name starts_with "PH" AND last_name ends_with "ING" AND
       company_name contains "TAXI" AND in.birth_date is numeric AND
       postal_code[1:1] is alphabetic AND company_name is alphanumeric) then
        move "1", flag;
    else
        move "0", flag;
    endif;


Logical Operators, AND and OR

IF conditions can be combined using logical AND and OR operators to create compound conditions as described in the following table.

Logical Operators               Description
condition1 AND condition2       Logical AND: true only if both condition1 and condition2 are true.
condition1 and condition2
condition1 && condition2
condition1 OR condition2        Logical OR: true if either condition1 or condition2 is true.
condition1 or condition2
condition1 || condition2

The order of evaluation of compound conditions can be altered by using brackets to group the conditions that are to be evaluated first:

    if((pr_rev_group == "000" OR pr_rev_group == "009") AND pr_gout_fail_level == "0") then


Nested ‘IF’ statements

You can create nested IF statements, in which one IF statement is embedded within another as an action statement:

    if(condition1) then
        # statements to execute if condition1 is true
        ...
        if(condition2) then
            # statements to execute if condition1 and
            # condition2 are true
            ...
            if(condition3) then
                # statements to execute if conditions
                # 1, 2 and 3 are true
                ...
            endif;
        else
            # more statements to execute if condition1 is true
            # and condition2 is false
            ...
        endif;
    endif;

Action Statements

Action statements use the following syntax:

Syntax

    verb[:modifier] [source_field][,] [destination_field];
  or
    perform rule_name;

Some action statements may include a modifier that changes their operation slightly. If present, the modifier must immediately follow the verb and be delimited from it with a single colon.


For example, the append:2spaces statement works like the append statement with the exception that two spaces are used for a delimiter instead of one. The comma separating the source-field from the destination-field is optional and can be included to improve readability.
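The verb, optional modifier and optional comma can be separated with a regular expression. The following Python sketch is illustrative only. `parse_action` is a hypothetical helper. It handles simple quoted literals but not doubled quotes, concatenation with the plus operator or IF statements. It does not check verbs or modifiers against the reserved words.

```python
import re

# verb[:modifier] arguments ;
ACTION = re.compile(r"\s*([A-Za-z_]+)(?::(\w+))?\s*(.*?)\s*;\s*")
# An argument is a quoted literal or a run of characters other than blanks and commas.
ARGUMENT = re.compile(r"\"[^\"]*\"|'[^']*'|[^\s,]+")


def parse_action(statement: str):
    """Split an action statement into (verb, modifier, arguments)."""
    m = ACTION.fullmatch(statement)
    if m is None:
        raise ValueError(f"not an action statement: {statement}")
    verb, modifier, rest = m.groups()
    return verb, modifier or "", ARGUMENT.findall(rest)


print(parse_action("append:2spaces postal_code, line3;"))
# ('append', '2spaces', ['postal_code', 'line3'])
```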

Specific action statements take no, one or two arguments, as described in the following sections.

Action Statements

The script language has only one action statement that requires no arguments:

copy_all
    Copies all corresponding input fields to output fields. Fields are considered to correspond if they have the same name in both the input and output data DDL files. Any output fields that do not correspond to an input field are reset to blanks. When the input and output DDL dictionary names are the same, copy_all moves the entire input record to the output record instead of performing field-by-field moves, as an optimization to save processing time.

Because the copy_all statement has the side-effect of resetting to blanks any output field that has no corresponding input field, it should always be used at the beginning of your script.
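The field-matching behavior of copy_all can be sketched in Python with records held as dictionaries. `copy_all` here is a hypothetical model, not the product's implementation. All fields are assumed to be text. Data-type conversion and the whole-record optimization are not modeled.

```python
def copy_all(input_record: dict, output_lengths: dict) -> dict:
    """Sketch of copy_all on records held as {field_name: value} dicts.

    output_lengths maps each output field name to its length. Fields with
    the same name in both dictionaries are copied, padded or truncated to
    the output length; every other output field is reset to blanks."""
    output = {}
    for name, length in output_lengths.items():
        value = input_record.get(name, "")
        output[name] = value[:length].ljust(length)
    return output


print(copy_all({"first_name": "SHELDON", "city": "BEDFORD"},
               {"first_name": 10, "full_name": 5}))
```

Running copy_all first and then the other statements avoids losing data to the blank reset, which is why the guide recommends placing it at the start of a script.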


Action Statements

The script language has the following action statements that take a single argument. Except for perform, whose argument is the name of a rule, the lone argument specifies a destination field and cannot be a literal value.

Statement    Description

pack
    Removes all blank characters from the destination field.
upper_case
    Converts all of the characters in the destination field to uppercase.
lower_case
    Converts all of the characters in the destination field to lowercase.
title_case
    Converts all of the characters in the destination field to a mix of uppercase and lowercase. The first alphabetic character, and any alphabetic character that follows a non-alphabetic character, is converted to uppercase; the remaining characters are converted to lowercase. A special exception is made for apostrophe-s, which is converted to lowercase. For example, "MARY-JANE'S BAKERY" would be changed to "Mary-Jane's Bakery".
right_justify
    Right justifies the contents of the destination field, removing any trailing blanks.
left_justify
    Left justifies the contents of the destination field, removing any leading blanks.
right_justify:full
    Right justifies the contents of the destination field and converts each occurrence of multiple blanks to a single blank. For example, given a 20-character field containing the value "EXPIRY 20001127 ", right_justify:full produces " EXPIRY 20001127".
left_justify:full
    Left justifies the contents of the destination field and converts each occurrence of multiple blanks to a single blank. For example, given a 20-character field containing the value " THE PIT STOP ", left_justify:full produces "THE PIT STOP ".
proper_case
    Converts all characters in the destination field to a mix of uppercase and lowercase using an external table defined by UPLOW_FNAME. When no corresponding entries are found, the destination field is still converted to mixed uppercase and lowercase using title_case logic.
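The title_case and :full justification rules can be sketched in Python on fixed-length strings. The function names are hypothetical. The guide does not say whether the apostrophe-s exception applies when more letters follow the s. This sketch assumes it applies only at the end of a word, so that O'SULLIVAN becomes O'Sullivan.

```python
def title_case(field: str) -> str:
    """Uppercase the first letter of each alphabetic run; lowercase the rest."""
    out = []
    prev_alpha = False
    for i, ch in enumerate(field):
        if not ch.isalpha():
            out.append(ch)
            prev_alpha = False
            continue
        # Apostrophe-s stays lowercase. Assumption: only when the s ends the word.
        next_alpha = i + 1 < len(field) and field[i + 1].isalpha()
        apostrophe_s = (ch in "sS" and i > 0 and field[i - 1] in "'\u2019"
                        and not next_alpha)
        out.append(ch.lower() if prev_alpha or apostrophe_s else ch.upper())
        prev_alpha = True
    return "".join(out)


def _collapse(field: str) -> str:
    """Join the blank-separated words of field with single blanks."""
    return " ".join(word for word in field.split(" ") if word)


def left_justify_full(field: str) -> str:
    return _collapse(field).ljust(len(field))


def right_justify_full(field: str) -> str:
    return _collapse(field).rjust(len(field))


print(title_case("MARY-JANE'S BAKERY"))  # Mary-Jane's Bakery
```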


Statement    Description

proper_case:a, proper_case:A, proper_case:anyline
    Indicates that the proper_case statement is not for any specific line type. Only the "A" line-type entries in the table are searched. This is the default operation when no modifier is specified.
proper_case:n, proper_case:N, proper_case:name
    proper_case for a field containing name information. Searches the "N" line-type entries in the table, followed by the "A" line-type entries if a match was not found in the "N" entries.
proper_case:s, proper_case:S, proper_case:street
    proper_case for a field containing street information. Searches the "S" line-type entries in the table, followed by the "A" line-type entries if a match was not found in the "S" entries.
proper_case:g, proper_case:G, proper_case:geography
    proper_case for a field containing geography information. Searches the "G" line-type entries in the table, followed by the "A" line-type entries if a match was not found in the "G" entries.
perform
    Causes all of the statements in another rule to be executed. For example,
        perform fix_name_line;
    executes all of the statements in the previously defined fix_name_line rule in the same rule file. The compiler does not support forward referencing: you can only perform a rule that has already been defined earlier within the rule file.
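The search order for each proper_case modifier can be expressed as a lookup table. This Python sketch is a model only. The layout of the UPLOW_FNAME table and whether lookups are made per word or per field are not described here, so `table` is a hypothetical in-memory stand-in keyed by line type and uppercase value, and `fallback` stands in for the title_case logic.

```python
# Line types searched by each proper_case modifier, in search order.
SEARCH_ORDER = {
    "": ["A"], "a": ["A"], "A": ["A"], "anyline": ["A"],
    "n": ["N", "A"], "N": ["N", "A"], "name": ["N", "A"],
    "s": ["S", "A"], "S": ["S", "A"], "street": ["S", "A"],
    "g": ["G", "A"], "G": ["G", "A"], "geography": ["G", "A"],
}


def proper_case(value, modifier, table, fallback):
    """Return the first table match in search order, else fallback(value)."""
    for line_type in SEARCH_ORDER[modifier]:
        match = table.get((line_type, value.upper()))
        if match is not None:
            return match
    return fallback(value)


table = {("N", "MCDONALD"): "McDonald", ("A", "PO BOX"): "PO Box"}
print(proper_case("MCDONALD", "name", table, str.title))    # McDonald
print(proper_case("MCDONALD", "street", table, str.title))  # Mcdonald
```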

Action Statements

The following script language action statements require two arguments. In each of these statements, the first argument specifies a source field or a literal value, and the second argument specifies the destination field.

Action statement    Description

copy
    Copies the contents of one field to another field, adjusting the data type if necessary to match the description of the output field in the data dictionary. For example, a field might be converted from an INTEGER to an ASCII CHARACTER field by a copy statement. If the first argument in a copy statement is a literal value, a move operation is performed instead of a copy operation.


move
    Moves one text field to another. Unlike the copy statement, no conversion from one data type to another is attempted. If the source field is longer than the destination field, it is truncated during the move. If the source field is shorter than the destination field, the destination field is padded with blanks after the move.
append
    Appends the contents of one field to the end of the contents of another field after first adding a single blank character as a separator. If the destination field is currently empty (all blanks), a move operation is performed instead of an append operation. This makes it possible to perform a series of append operations on the same destination field without creating unwanted blanks at the beginning of the field. If there is not enough room at the end of the destination field, the source field is truncated to fit. There must be at least 2 blanks at the end of the destination field before an append operation is attempted.
append_pack
    Like the append statement, but without the blank separator.
append:pack, append:0spaces
    Appends the contents of one field directly to the end of the contents of another field. There must be at least 1 blank at the end of the destination field before this operation is attempted.
append:2spaces
    Appends the contents of one field to the end of the contents of another field after first adding two blank characters as a separator. This may be required in some countries (for example, Canada) to separate the postal code from the remainder of the line. If there is not enough room at the end of the destination field, the source field is truncated to fit. There must be at least 3 blanks at the end of the destination field before an append:2spaces operation is attempted.
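The following Python sketch models move and the append variants on fixed-length string fields. `move` and `append` are hypothetical functions. It assumes the empty-destination move rule, which is documented for append, applies to every separator. It also assumes that an append with too little room leaves the destination unchanged.

```python
def move(source: str, dest_len: int) -> str:
    """move: truncate or blank-pad source to the destination length."""
    return source[:dest_len].ljust(dest_len)


def append(source: str, dest: str, separator: str = " ") -> str:
    """append (separator " "), append:pack ("") or append:2spaces ("  ").

    dest is the current fixed-length destination value; the new value
    is returned."""
    size = len(dest)
    used = dest.rstrip(" ")
    if not used:
        # An all-blank destination gets a plain move, so no leading blanks.
        return move(source, size)
    # At least len(separator) + 1 trailing blanks are required, else no change.
    if size - len(used) < len(separator) + 1:
        return dest
    # Anything that does not fit is truncated.
    return move(used + separator + source, size)


line = " " * 20
line = append("1600", line)
line = append("BEDFORD HWY", line)
print(repr(line))  # '1600 BEDFORD HWY    '
```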

Overlapping Fields

Move, append and append_pack/append:pack operations with source and destination fields that overlap in memory are fully supported by the Data Reconstructor. These operations are completed as if a temporary copy of the source field had been made before the operation started.

This is illustrated in the following example.


    move "TRILLIUM", out.temp;
    move out.temp[2:4], out.temp[1:4];   // following this move the out.temp
                                         // field will contain "RILLLIUM"
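The temporary-copy behavior can be modeled in Python with a bytearray. `move_substring` is a hypothetical helper. If [n:n] is read as start position and length, out.temp[2:4] selects "RILL". Writing that over positions 1 to 4 of "TRILLIUM" leaves "RILLLIUM". Moving out.temp[2:*] to out.temp[1:*] instead would leave "RILLIUM" followed by a blank.

```python
def move_substring(buf: bytearray, src_start: int, src_len: int,
                   dst_start: int, dst_len: int) -> None:
    """Move buf[src_start:src_len] to buf[dst_start:dst_len] (1-based, in place)."""
    # Copy the source bytes first, so overlapping ranges behave as documented.
    temp = bytes(buf[src_start - 1:src_start - 1 + src_len])
    # Truncate or blank-pad to the destination length, as move does.
    buf[dst_start - 1:dst_start - 1 + dst_len] = temp[:dst_len].ljust(dst_len, b" ")


temp_field = bytearray(b"TRILLIUM")
move_substring(temp_field, 2, 4, 1, 4)  # like out.temp[2:4] -> out.temp[1:4]
print(temp_field.decode())              # RILLLIUM
```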

‘IF’ Statements are Action Statements

The IF statement, which is described in the section “‘IF’ Statements,” is the final form of action statement. Because IF statements are themselves action statements, it is possible to write complex nested if-then-else logic, as described in the section “Nested ‘IF’ statements.”

Sample Rules File #1

    rule LABEL1
    if(uk_out_fail_level = "0") then
        if(uk_out_dpndthorough_name <> BLANKS) then
            move uk_out_house_number, nwaddrl3;
            append uk_out_dpndthorough_name, nwaddrl3;
            append uk_out_dpndthorough_desc, nwaddrl3;
            append uk_out_thorough_name, nwaddrl4;
            append uk_out_through_desc, nwaddrl4;
        else
            move uk_out_house_number, nwaddrl3;
            append uk_out_thorough_name, nwaddrl3;
            append uk_out_through_desc, nwaddrl3;
        endif;
    endif;
    endrule

The preceding sample shows one rule definition, called LABEL1, which acts only on records with a Fail Level of 0. It always populates the output field nwaddrl3 and, when the input field uk_out_dpndthorough_name is not blank, also populates nwaddrl4.

Both nwaddrl3 and nwaddrl4 are therefore populated if there is data in the dependent thoroughfare name field (uk_out_dpndthorough_name); otherwise only nwaddrl3 is populated.


String Variables in the Data Reconstructor

String variables must be declared with the STRING keyword before they are used, either at the beginning of:

The Rules file, before any rules are defined.

A specific rule, before the first action statement.

String variable names begin with a dollar sign, can be a maximum of 32 characters long, and are case-sensitive. For example:

    $NAME

String variables have a default length of 256 characters, unless a different length is specified when they are first declared:

    STRING $LAST_NAME[30];      // 30 characters long
    STRING $last_name;          // 256 characters long
    STRING $BigBuffer[10000];   // 10,000 characters long
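A STRING declaration can be parsed with a short regular expression. This Python sketch is illustrative only. `parse_declaration` is hypothetical and assumes comments have already been removed from the line. Counting the leading $ toward the 32-character limit is an assumption.

```python
import re

DECLARATION = re.compile(r"STRING\s+(\$\w+)\s*(?:\[(\d+)\])?\s*;")


def parse_declaration(line: str):
    """Return (name, length) for a declaration such as STRING $X[30];"""
    m = DECLARATION.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a STRING declaration: {line}")
    name, length = m.group(1), int(m.group(2) or 256)  # 256 is the default length
    if len(name) > 32:  # assumption: the 32-character limit includes the $
        raise ValueError(f"string variable name too long: {name}")
    return name, length


print(parse_declaration("STRING $LAST_NAME[30];"))  # ('$LAST_NAME', 30)
```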

String variables may be used any place in a rule that a DDL field name can be used. For example:

    rule Sample1
    STRING $name[50];

    move in.first_name, $name;
    append in.last_name, $name;
    move $name, out.full_name;

    endrule;


Running the Data Reconstructor on UNIX and 32-Bit PC Platforms

To execute the cfrecons program, use the following command-line syntax:

    cfrecons -parmfile parm_file_name [-parmecho echo_file_name]

where:

cfrecons Data reconstructor program

-parmfile Keyword that indicates the parameter file follows

parm_file_name Name of the program parameter file

-parmecho Keyword that indicates the parameter echo file follows

echo_file_name Name of the parameter echo file, to which any parameter processing errors are written (optional)
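The following Python sketch validates a CFRECONS argument list using the long and short keywords listed in this chapter. The error text is modeled on Table 8.2. `parse_cfrecons_args` and the file names are hypothetical. Case-insensitive keyword matching is an assumption based on the uppercase -PARMFILE in the mainframe JCL sample. Treating a keyword with no value as a bad command line is also an assumption.

```python
# Documented keywords and their long and short spellings.
KEYWORDS = {"-parmfile": "parmfile", "-pf": "parmfile",
            "-parmecho": "parmecho", "-pe": "parmecho"}


def parse_cfrecons_args(args):
    """Validate a CFRECONS argument list; return {"parmfile": ..., "parmecho": ...}."""
    options = {}
    i = 0
    while i < len(args):
        key = KEYWORDS.get(args[i].lower())
        if key is None or i + 1 >= len(args):
            raise ValueError(f"ERROR: bad command line Invalid argument '{args[i]}'")
        options[key] = args[i + 1]
        i += 2
    if "parmfile" not in options:
        raise ValueError("ERROR: 'parmfile' argument missing from command line")
    return options


print(parse_cfrecons_args(["-parmfile", "recons.par", "-parmecho", "recons.echo"]))
```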


IBM Mainframe Sample Data Reconstructor JCL

The following sample Job Control Language (JCL) runs the Data Reconstructor:

//BOB$RECS JOB 9901,'RHARRIS-UD-BIN040',CLASS=Z,
//         NOTIFY=RHARRIS,MSGCLASS=X
/*MESSAGE *********************************************************
/*MESSAGE RUNNING CFRECONS
/*MESSAGE *********************************************************
//CFRECONS EXEC PGM=CFRECONS,PARM='/-PARMFILE PF',
//         REGION=0M
//STEPLIB  DD DISP=SHR,
//         DSN=UDP01.TRIL7VNN.MOD.NECIFER.LOADLIB
//         DD DSN=CEE.SCEERUN,DISP=SHR
//         DD DSN=CEE.SCEERUN2,DISP=SHR
//CEEDUMP  DD DUMMY,DCB=BLKSIZE=133
//SYSOUT   DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//TRILMSGS DD DUMMY
//INPUT    DD DISP=SHR,
//         DSN=TRIL.BOBH.TEST.CFRECONS.INPUT
//OUTPUT   DD UNIT=WORK,
//         DISP=(NEW,CATLG,DELETE),
//         DCB=(RECFM=FB,LRECL=410,BLKSIZE=0),
//         SPACE=(CYL,(1,1),RLSE),
//         DSN=TRIL.BOBH.TEST.CFRECONS.OUTPUT
//OUTDDL   DD DISP=SHR,
//         DSN=WYL.UD.BOB.LIB(OUTDDL)
//INPDDL   DD DISP=SHR,
//         DSN=WYL.UD.BOB.LIB(INPDDL)
//PF       DD DISP=SHR,
//         DSN=WYL.UD.BOB.LIB(PFRECONS)
//RULES    DD DISP=SHR,
//         DSN=WYL.UD.BOB.LIB(PFRULES)
//STAT     DD SYSOUT=*
/*


Error Messages

The following errors are returned by the main CFRECONS program:

Table 8.2 Data Reconstructor Error Messages

Message    Description

ERROR: bad command line Invalid argument 'name'
    The command line used to run CFRECONS contained an unrecognized argument. The only valid command line arguments are -parmfile or -pf, and -parmecho or -pe. Any other argument generates this error.

ERROR: 'parmfile' argument missing from command line
    Usage: CFRECONS -parmfile filename [-parmecho filename]
    or     CFRECONS -pf filename [-pe filename]

ERROR: cannot open input file: file-name.
    An error occurred trying to open the input data file. Confirm that the file-name is correct.

ERROR: cannot open output file: file-name.
    An error occurred trying to open the output data file. Confirm that the file-name is correct and that you have the authority to write that file.

ERROR: unable to allocate buffer space
    The program could not allocate the necessary buffer space. If possible, close other applications and try running CFRECONS again. If the problem persists, contact Technical Support for assistance.

ERROR: reading record: file-name.
    An error occurred trying to read a record from the input data file.

ERROR: writing record: file-name.
    An error occurred trying to write a record to the output data file.

ERROR: unable to open stat file: file-name.
    An error occurred trying to open the stat file. Confirm that the file-name is correct, includes the full absolute or relative path name, and that you have the appropriate authority to write a file with that path name.

ERROR: skipping record: file-name:
    On mainframe and AS/400 platforms: an error occurred reading past input records to position the file to the record specified in the START parameter.


ERROR: unable to position file to record nnn: file-name:
    An error occurred setting the starting position of the file to the record specified by the START parameter.


Parameter Echo File Error Messages

Whenever CFRECONS encounters a problem with a parameter read from the parameter file, it writes an error message to the parameter echo file, if one was specified on the CFRECONS command line.

Table 8.3 CFRECONS Echo File Error Messages

Message    Description

("quoted-value), closing quote is missing.
    The parameter value was started with a double quote (") or a single quote (') and no closing quote was found. For example, each of the following lines causes this error:
        USE_RULE "UK1
        USE_RULE 'UK1
        USE_RULE "UK1'
        USE_RULE 'UK1"

("quoted-value), improperly quoted string.
    The parameter value was enclosed in double quotes (") or single quotes (') but the closing quote was not the last character in the line. For example, each of the following lines causes this error:
        USE_RULE "UK1" OR NOT
        USE_RULE 'MURPHY'S LAW'
        USE_RULE 'UK1' * Specify Rule to use

"name" is not a valid parameter name.
    The specified parameter name was not recognized as one of the fifteen parameters.

"name" required parameter is missing.
    The specified parameter name, one of the six required parameters documented in the section “Data Reconstructor Parameters” on page 8-3, was not found in the parameter file.

nn parameter errors; no parameter errors
    A count of the total number of errors encountered in the parameter file.
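The two quoting errors can be reproduced with a short Python check. `check_quoted_value` is a hypothetical helper, and its messages paraphrase Table 8.3. It assumes a parameter line consists of the parameter name, one or more blanks, and then the value. Unquoted values are not checked.

```python
def check_quoted_value(line: str):
    """Return the echo-file error for a parameter line's quoted value, or None."""
    parts = line.split(None, 1)
    value = parts[1].strip() if len(parts) == 2 else ""
    if not value or value[0] not in "\"'":
        return None  # unquoted values are not checked here
    close = value.find(value[0], 1)
    if close == -1:
        return '("quoted-value), closing quote is missing.'
    if close != len(value) - 1:
        return '("quoted-value), improperly quoted string.'
    return None


print(check_quoted_value("USE_RULE 'MURPHY'S LAW'"))
```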


Rules File Error Messages

When the rules file is compiled, it is checked for errors. Any problems are reported to stderr. If any script in the rule file fails to compile cleanly, CFRECONS exits with the following message:

ERROR: compile errors in "file-name" rule file.

Notice that each compiler error message begins with the rule file name and contains the line number in the rule file where the error occurred.

Table 8.4 CFRECONS Error Messages

Message Description

line nn: ‘field-name’ not found in input dictionary at ‘token’
line nn: ‘field-name’ not found in output dictionary at ‘token’
    The identified field-name was not found in either the input or output data dictionary (DDL) file. The input and output dictionaries are specified by INPUT_DDL_FNAME and OUTPUT_DDL_FNAME. Dictionary field names are case sensitive and must be spelled correctly. The first field-name referenced in an action statement or IF condition is assumed to be in the input dictionary; the second field-name is assumed to be in the output dictionary. This assumption can be changed by prefixing the field-name with IN. or OUT. In the following example, if the new_addr1 and new_addr2 fields occur only in the output dictionary, this error is reported for both the IF and the move statements:
        if(new_addr1 EQ BLANKS) then
            move new_addr2, new_addr1;
        endif;
    These errors can be corrected by using the OUT. prefix, as in:
        if(OUT.new_addr1 EQ BLANKS) then
            move OUT.new_addr2, new_addr1;
        endif;

line nn: [n,n] should be [n:n] at ‘token’
line nn: (n,n) should be (n:n) at ‘token’
    The proper syntax to select a portion of a field is field-name[n:n]. Change the comma to a colon.


line nn: (n:n) not valid with string literals at ‘token’
line nn: [n:n] not valid with string literals at ‘token’
    You cannot use [n:n] to select a portion of a literal value. Instead, modify the actual literal value:
        move "Jon", first_name;

line nn: [n:n] only valid with text fields at ‘token’
    You can only use [n:n] notation to select a portion of a field that is defined in the data dictionary as a text field. This feature would be ambiguous with non-text fields, such as integer and packed-decimal fields, and is not supported.

line nn: bad [n:n] values at ‘token’
    [n:n] is used to select a portion, or substring, of a text field. The first n specifies a beginning position within the text field, with 1 indicating the first character. The second n specifies the length of the substring. The length can be specified as * to indicate the remainder of the field. The position and length values cannot be zero or greater than the size of the field.

line nn: can not write to a literal at ‘token’
    Literal values can be used as the source field in an action statement but not as the destination field. The destination field must always be a field found in the output dictionary. For example, the following move statement causes this error:
        move first_name, "Sheldon";

line nn: can not write to an input field at ‘token’
    Fields defined in the input dictionary cannot be used as the destination field in an action statement. The destination must always be a field found in the output dictionary. For example, the following move statement causes this error:
        move first_name, IN.first_name;
    When the same field-name occurs in both the input and output data dictionaries, the input dictionary is used when the field-name is referenced as a source field and the output dictionary is used when it is referenced as a destination field. Output fields can be used as a source by preceding the field-name with OUT., as shown in the following example:
        move OUT.state, state-key;
        append_pack "-", state-key;
        append_pack seqno, state-key;


Table 8.4 CFRECONS Error Messages

Message
    Description

line nn: empty rule at ‘token’
    A rule must contain at least one statement.

line nn: internal error, bad pointer value in routine-name at ‘token’
    This error indicates an internal problem in the CFRECONS program or real-time library. In the unlikely event you receive this error, please contact Technical Support for assistance.

line nn: literal storage exhausted at ‘token’
    There is a limit to the total size of all the literal values you use in your scripts, currently about 100 KB. This error message means that your rule file has too many literal values. If possible, reduce the size of your rule file, for example by placing each rule in its own file. If this does not eliminate the problem, contact Technical Support for assistance.

line nn: missing "endrule" at ‘token’
    Every rule must start with a rule keyword and end with an endrule keyword. If the endrule keyword is missing, this error message is reported.

line nn: missing arguments at ‘token’
    Every action statement takes zero, one or two arguments. This error means you have supplied fewer arguments than the statement requires. For example, upper_case takes one argument and append takes two arguments. Each of the following statements causes this error:
        upper_case;
        append country;

line nn: missing rule name at ‘token’
    Every rule must have a name that immediately follows the rule keyword that introduces the rule.

line nn: missing semi-colon at ‘token’
    Every action statement must end in a semicolon.

line nn: table overflow, too many actions at ‘token’
    There is a limit to the size of a script that CFRECONS can compile, currently about 20,000 statements. This error message means that your rule file has too many action statements. If possible, reduce the size of your rule file, for example by placing each rule in its own file. If this does not eliminate the problem, contact Technical Support for assistance.


line nn: table overflow, too many expressions at ‘token’
    There is a limit to the size of a script that CFRECONS can compile, currently about 20,000 statements. This error message means that your rule file has too many expressions. If possible, reduce the size of your rule file, for example by placing each rule in its own file. If this does not eliminate the problem, contact Technical Support for assistance.

line nn: too many arguments at ‘token’
    Every action statement takes zero, one or two arguments. This error means you have supplied more arguments than the statement accepts. For example, upper_case takes one argument, so the following statement causes this error:
        upper_case state, zip;

line nn: bad quoted string at ‘token’
    A literal value was not properly delimited by either double quotes (") or single quotes ('). Quoted strings cannot extend over multiple lines. If necessary, you can break a long string into two strings and use a plus sign to concatenate them. For example:
        move "------" + "------", dashed_line;

line nn: illegal character (hex=0xFF) at ‘token’
    An invalid character was encountered in the rule file. The hex value of the offending character is displayed in the error message. For example, field and rule names can contain an underscore character (_) but not a dash character (-). Using a dash in a rule name generates this error, as in the following example:
        rule first-rule
            // statements
        endrule

line nn: syntax error: cannot back up
    This error indicates an internal problem with the CFRECONS compiler. In the unlikely event you receive this error, please contact Technical Support for assistance.


line nn: parser stack overflow
    This error indicates an internal problem with the CFRECONS compiler. In the unlikely event you receive this error, please contact Technical Support for assistance.

line nn: parse error; also virtual memory exceeded
    This error indicates a CFRECONS internal problem. If you receive this error, contact Technical Support for assistance (978-901-0000).

line nn: parse error
    Any unrecognized construct in a script file that does not trigger one of the other compiler errors generates a parse error.

line nn: rule 'rule-name' is already defined
    An attempt was made to define a rule with a name that already exists in the rule file. Each rule in the file must have a unique name.

line nn: unknown rule
    A perform statement tried to execute a rule that was not defined above it in the rule file. Rules must be defined before they can be used; that is, the rule to be executed must appear in the rule file before the perform statement that references it.
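The “already defined” and “unknown rule” errors both follow from a single top-to-bottom pass over the rule file. The Python sketch below reproduces that ordering constraint. The function check_rule_order is hypothetical, and the `perform NAME;` syntax it recognizes is an assumption based on the descriptions in this table. It is not a full CFRECONS grammar.

```python
import re

# Assumed syntax, modelled on the descriptions in this table:
#   rule NAME ... endrule      and      perform NAME;
RULE_HEADER = re.compile(r"^\s*rule\s+(\S+)")
PERFORM = re.compile(r"^\s*perform\s+(\w+)\s*;")


def check_rule_order(script: str) -> list:
    """Single top-to-bottom pass that reports duplicate and undefined rules."""
    defined = set()
    errors = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        header = RULE_HEADER.match(line)
        if header:
            name = header.group(1)
            if name in defined:
                errors.append(f"line {lineno}: rule '{name}' is already defined")
            defined.add(name)
            continue
        call = PERFORM.match(line)
        # A perform may only refer to a rule whose header appeared earlier.
        if call and call.group(1) not in defined:
            errors.append(f"line {lineno}: unknown rule")
    return errors
```

Moving the definition of a rule above the first perform that references it clears the “unknown rule” report in this model, which mirrors the define-before-use requirement stated above.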

CHAPTER 9 Create Common Module

The Create Common module is a parameter-driven module that functions similarly to the Commonizer function of the Matcher. Create Common is a powerful, user-customizable program: you set parameters that direct it to commonize Matcher output data at up to 10 levels (for example, household level, commercial level, and so on). This module provides two major functions:

Commonization - Commonization allows the user to give the module special instructions (such as the decision routines) that tell it to copy data from a specified field into other specified fields across matched record sets. (These records are linked by a common match key.) The user controls the selection and location of data through entries in the parameter file. See page 9-2.

Survivorship - The user-defined designation of a “survivor” record among a group of records linked by a common key, using survivor selection rules. This function flags a single record at any level, indicating the “best” record of the matched set. See page 9-6.


This module is used on data files that have been through a Matching process.

Input files must be sorted by grouping keys prior to processing (for levels 1-10).
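Records are grouped by comparing adjacent keys, so a matched set can only be processed as a unit if its records are contiguous. The following Python sketch uses hypothetical helpers and the key fields matched_hhld and matched_indv_in_matched_hhld from the sample parameter file later in this chapter. It shows a sort-sequence check of the kind that would produce a “sequence error”, and the level-by-level grouping that sorted input makes possible. It is an illustration, not the cfcrcdrv implementation.

```python
from itertools import groupby
from operator import itemgetter


def in_sequence(records, key_fields):
    """True if records are in ascending order of the level keys, level 01 key first."""
    keys = [tuple(rec[f] for f in key_fields) for rec in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def matched_sets(records, key_field):
    """Yield (key, records) for each run of consecutive records sharing key_field.

    groupby only merges adjacent records, which is why the input must be sorted:
    an unsorted file would split one matched set into several.
    """
    for key, run in groupby(records, key=itemgetter(key_field)):
        yield key, list(run)
```

Level 2 sets are formed inside each level 1 set by calling matched_sets again on that set's records with the level 2 key.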

User Common Data (Commonization)

Commonization is the act of selecting specified data from one field and copying that data to other fields in other records across a matched set of records. In the Create Common module, commonization is controlled by parameter entries and by the decision routines specified in the parameter file.

These routines allow data to be evaluated anywhere in the record and then commonized across the matched record set. The user can commonize data in the existing field or a new field. Also, the data to commonize can be sourced from another field.

You can instruct the program to commonize data across a matched set of records for up to 10 levels.

The structure of the parameter entries defines where the data is commonized:

LEV_01_COMMON_FIELDS c1,c2,c3,c4

where:

c1 = test field
c2 = decision routine
c3 = from field (data from here goes to the target field)
c4 = target field

For example:

LEV_01_COMMON_FIELDS phone_number,MOST_NBNZ,phone_number,hhld_phone_number
                     date_open,LOWEST_NBNZ,date_open,hhld_date_open
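The following minimal Python sketch makes the c1,c2,c3,c4 layout concrete. It splits one entry into its four parts and separates a LITERAL value from its routine name. The CommonRule class and parse_common_entry function are hypothetical, and the sketch assumes a literal value never contains a comma.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommonRule:
    test_field: str          # c1
    routine: str             # c2, e.g. MOST_NBNZ or LITERAL
    literal: Optional[str]   # value in parentheses (LITERAL only), else None
    from_field: str          # c3
    target_field: str        # c4


# Routine name, optionally followed by "(value)".
ROUTINE = re.compile(r"^([A-Z_]+)\s*(?:\((.*)\))?$")


def parse_common_entry(entry: str) -> CommonRule:
    """Split one c1,c2,c3,c4 entry; assumes no commas inside a LITERAL value."""
    parts = [p.strip() for p in entry.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected c1,c2,c3,c4, got {entry!r}")
    test, spec, src, target = parts
    m = ROUTINE.match(spec)
    if m is None:
        raise ValueError(f"unrecognised decision routine {spec!r}")
    return CommonRule(test, m.group(1), m.group(2), src, target)
```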

See “Create Common Parameters” on page 9-8 for a complete description of these parameters and decision routines.


Commonization Example 1

In this example, the parameter LEV_01_COMMON_FIELDS uses the LONGEST decision routine at level 1. The longest value in the test field (phone_number) is commonized through all records into the hhld_phone_number field.

LEV_01_COMMON_FIELDS phone_number,LONGEST,phone_number,hhld_phone_number

Input
         phone_number    hhld_phone_number
Rec 1    555-1234
Rec 2    555-1234
Rec 3    978-555-1234

Output
         phone_number    hhld_phone_number
Rec 1    555-1234        978-555-1234
Rec 2    555-1234        978-555-1234
Rec 3    978-555-1234    978-555-1234

Record 3 is the only record that contains an area code (hence, its value is the longest), so its data is commonized into the target field, hhld_phone_number, of every record in the set.
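Example 1 can be reproduced with a short Python sketch of LONGEST commonization over a single matched set. Flipping one flag gives SHORTEST. The function is hypothetical. Ignoring trailing blanks and letting the first record win a tie are both assumptions, since the guide does not specify either.

```python
def commonize_by_length(records, test_field, from_field, target_field, longest=True):
    """Sketch of LONGEST (longest=True) or SHORTEST (longest=False) commonization."""
    def data_length(rec):
        # Assumption: trailing blanks are fixed-width padding and do not count.
        return len(rec[test_field].rstrip())

    pick = max if longest else min
    chosen = pick(records, key=data_length)  # max/min return the first of equal candidates
    value = chosen[from_field]
    for rec in records:
        rec[target_field] = value  # same value copied into every record of the set
    return records
```

Applied to the three input records above with test and from field phone_number, every record receives 978-555-1234 in hhld_phone_number, while phone_number itself is left unchanged.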


Commonization Example 2

In this example, the parameter LEV_02_COMMON_FIELDS is set to use the MOST decision routine at level 2. The most frequently occurring value in the test field (in this case, date_of_birth) is commonized through all records into the common_date_of_birth field.

LEV_02_COMMON_FIELDS date_of_birth,MOST,date_of_birth,common_date_of_birth

Input
         date_of_birth    common_date_of_birth
Rec 1    1/30/1976
Rec 2    1/30/1976
Rec 3    2/14/1976

Output
         date_of_birth    common_date_of_birth
Rec 1    1/30/1976        1/30/1976
Rec 2    1/30/1976        1/30/1976
Rec 3    2/14/1976        1/30/1976

The data “1/30/1976” occurs the most in the date_of_birth field, so it will be commonized into the target field common_date_of_birth.
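A comparable Python sketch for the MOST routine counts each value in the matched set and copies the most frequent one into the target field. The function name is hypothetical. The tie-breaking rule (the first record holding a top-count value wins) is an assumption.

```python
from collections import Counter


def commonize_most(records, test_field, from_field, target_field):
    """Sketch of MOST commonization over one matched set."""
    counts = Counter(rec[test_field] for rec in records)
    top = max(counts.values())
    # Assumption: on a tie, the first record holding a top-count value wins.
    winner = next(rec for rec in records if counts[rec[test_field]] == top)
    value = winner[from_field]
    for rec in records:
        rec[target_field] = value
    return records
```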


Commonization Example 3

In this parameter entry, LEV_01_COMMON_FIELDS is set to use the LITERAL routine (the literal value is placed in parentheses). The routine searches the source_fld field of each record in the matched set for the literal value ‘ACME’ and commonizes that value, populating it in the cust_name field of every record.

LEV_01_COMMON_FIELDS source_fld,LITERAL (ACME),source_fld,cust_name

Input
         source_fld    cust_name
Rec 1    OAK
Rec 2    ACME
Rec 3    JONES

Output
         source_fld    cust_name
Rec 1    OAK           ACME
Rec 2    ACME          ACME
Rec 3    JONES         ACME
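The LITERAL routine can be sketched in Python the same way. The sketch looks for a record whose test field equals the literal and, if one exists, copies its from field into every record. The function name is hypothetical. Leaving the target field untouched when no record holds the literal is an assumption, because the guide does not describe that case.

```python
def commonize_literal(records, test_field, literal, from_field, target_field):
    """Sketch of LITERAL commonization over one matched set."""
    # Exact comparison: the guide requires the literal to be the same length as the test field.
    match = next((rec for rec in records if rec[test_field] == literal), None)
    if match is None:
        return records  # assumption: nothing is commonized when no record holds the literal
    value = match[from_field]
    for rec in records:
        rec[target_field] = value
    return records
```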


Selecting the Surviving Record

The user can designate a “survivor” record from among a group of records linked by a common key. The record flagged as the survivor is assigned a user-chosen flag value (the assigned value of the survivor rule that selected it). The criteria used to choose the survivor are set up in the parameter file, using the decision routines.

See “Create Common Parameters” on page 9-8 for a complete description of these parameters and decision routines.

When choosing a survivor, several search criteria can be applied in turn, as shown in the example below.

Example

In the example below, let’s assume we want Record 4 to be chosen as the survivor:

#   Title  First Name  Middle Name(s)  Surname  Gender  DOB       Phone #
1.         John        Simon James     Smith    M       19590618  978-555-0004
2.         John        James           Smyth    M       19590618  978-555-0005
3.         John                        Smith    M       19600618  978-555-0006
4.         John        S               Smith    M       19600618  978-555-0007
5.  Mr.    J           S               Smith    M       19600618  978-555-0007
6.         J                           Smith            19600618  978-555-0009


To ensure Record 4 is flagged as the Survivor

1. In the parameter file, the LEV_01_ASSIGN_SURVIVOR parameter could be populated as:

LEV_01_ASSIGN_SURVIVOR date_of_birth,HIGHEST,indv_survivor,1
                       phone_number,MOST,indv_survivor,2

2. This parameter tells the program to first use the HIGHEST decision routine. This routine looks for the highest numeric value in the test field (date_of_birth). If this value is found in only one record, that record’s indv_survivor field is flagged with a ‘1’, making it the survivor.

3. In this case, Records 3, 4, 5 and 6 all contain the highest date of birth, so the program continues with those four records, trying to narrow the search further.

4. The program now searches using the MOST decision routine. This routine looks for the most frequently occurring value in the test field (in this case, phone_number). Since 978-555-0007 occurs most often (in Records 4 and 5), those two records remain.

5. Since there is still a tie and no more survivor rules remain, the program simply takes the first of the remaining records, making Record 4 the survivor.

6. Record 4’s indv_survivor field is flagged with a ‘2’, the value assigned by the rule in effect when the survivor was chosen.
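The narrowing procedure in steps 2 through 6 can be expressed as a loop over the survivor rules. This Python sketch is hypothetical. It implements only the HIGHEST and MOST routines used in this example and treats date_of_birth as a digit-only field. It follows the tie-break described in step 5: the first remaining record wins and is flagged with the value of the last rule applied.

```python
from collections import Counter


def assign_survivor(records, rules):
    """Sketch of LEV_NN_ASSIGN_SURVIVOR; rules are (test_field, routine, target_field, value).

    Each rule narrows the candidate set; the first rule that leaves one record
    decides. If a tie survives the last rule, the first remaining record wins.
    Returns the 0-based index of the survivor.
    """
    candidates = list(range(len(records)))
    target = value = None
    for test_field, routine, target, value in rules:
        vals = {i: records[i][test_field] for i in candidates}
        if routine == "HIGHEST":
            best = max(int(v) for v in vals.values())  # assumes digit-only values
            candidates = [i for i in candidates if int(vals[i]) == best]
        elif routine == "MOST":
            counts = Counter(vals.values())
            top = max(counts.values())
            candidates = [i for i in candidates if counts[vals[i]] == top]
        else:
            raise ValueError(f"routine {routine} not covered by this sketch")
        if len(candidates) == 1:
            break
    survivor = candidates[0]
    if target is not None:
        records[survivor][target] = value  # flag value of the deciding (or last) rule
    return survivor
```

With the six records above and the two LEV_01_ASSIGN_SURVIVOR rules, the sketch returns index 3 (Record 4) and sets its indv_survivor field to 2.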


Create Common Parameters

Certain parameters contain up to ten levels for commonization. Please note that all required parameters appear in bold and shaded:

Table 9.1 Create Common Module Parameters

Parameter                   Value         Description
MAXIN                       Numeric       Maximum number of input records
MAXOUT                      Numeric       Maximum number of output records
TITLE                       User-defined  Text displayed on the output report
INPUT_DDL                   File name     Name of the file and record of the input data dictionary
INPUT_FNAME                 File name     Full path and name of the input data file
OUTPUT_DDL                  File name     Name of the file and record of the output data dictionary
OUTPUT_FNAME                File name     Full path and name of the output data file
MAX_ARRAY_RECORDS           Numeric       Number of records held in memory for a level 1 set (Default=10000)
REALLOC_RECORDS             Numeric       If the MAX_ARRAY_RECORDS limit is reached, the number of records stored in memory is increased by this value (Default=2000)
STAT_FNAME                  File name     Full path and name of the statistics file
LEV_NN_KEY_FIELD_NAME       Field name    Name of the field that contains the key used to group records for evaluation at each commonization level (NN = 01-10; at least level 01 is required)


LEV_NN_ASSIGN_SURVIVOR      c1,c2,c3,c4   Contains the survivor assignment rules: the information necessary for flagging the surviving record (NN = 01-10). Works in conjunction with the decision routines; see “Decision Routines” on page 9-10.
                                          Format: c1 = test field, c2 = decision routine, c3 = target field, c4 = assigned value
                                          Example:
                                          LEV_02_ASSIGN_SURVIVOR date_of_birth,HIGHEST,indv_survivor,1
                                                                 date_open,LOWEST,indv_survivor,2
LEV_NN_COMMON_FIELDS        c1,c2,c3,c4   Contains the commonization rules: the information necessary for commonizing data across records (NN = 01-10; at least level 01 is required). Works in conjunction with the decision routines; see “Decision Routines” on page 9-10.
                                          Format: c1 = test field, c2 = decision routine, c3 = from field (data from here goes to the target field), c4 = target field
                                          Example:
                                          LEV_01_COMMON_FIELDS phone_number,MOST_NBNZ,phone_number,hhld_phone_number
                                                               date_open,LOWEST_NBNZ,date_open,hhld_date_open
LEV_NN_SURVIVOR_FIELD_NAME  Field name    Name of the field to hold the survivor flag (NN = 01-10)
LEV_NN_SURVIVOR_VALUES      Numeric       List of survivor values in order of precedence (NN = 01-10). The user-designated survivor values (the assigned values used in LEV_NN_ASSIGN_SURVIVOR) are listed under this parameter.


Decision Routines

The decision routines are the program rules and instructions used in the cfcrcdrv parameter file, pfcrcdrv.par. They control two functions: which data is searched for and how commonization functions within the program, and which records are selected for survivorship.

Routines marked “For commonization only” can’t be used to determine a surviving record.

Table 9.2 Decision Routines

Decision Routine Description

LOWEST Lowest Numeric Value for Selected Data Field

LOWEST_NB Lowest Non-Blank Numeric Value for Selected Data Field

LOWEST_NZ Lowest Non-Zero Numeric Value for Selected Data Field

LOWEST_NBNZ Lowest Non-Blank/Non-Zero Numeric Value for Selected Data Field

HIGHEST Highest Numeric Value for Selected Data Field

HIGHEST_NB Highest Non-Blank Numeric Value for Selected Data Field

HIGHEST_NZ Highest Non-Zero Numeric Value for Selected Data Field

HIGHEST_NBNZ Highest Non-Blank/Non-Zero Numeric Value for Selected Data Field

LOWCHAR Lowest Character Value for Selected Data Field

LOWCHAR_NB Lowest Non-Blank Character Value for Selected Data Field

LOWCHAR_NZ Lowest Non-Zero Character Value for Selected Data Field

LOWCHAR_NBNZ Lowest Non-Blank/Non-Zero Character Value for Selected Data Field

HIGHCHAR Highest Character Value for Selected Data Field

HIGHCHAR_NB Highest Non-Blank Character Value for Selected Data Field

HIGHCHAR_NZ Highest Non-Zero Character Value for Selected Data Field


HIGHCHAR_NBNZ Highest Non-Blank/Non-Zero Character Value for Selected Data Field

LEAST Least Occurring Value for Selected Field

LEAST_NB Least Occurring Non-Blank Value for Selected Field

LEAST_NZ Least Occurring Non-Zero Value for Selected Field

LEAST_NBNZ Least Occurring Non-Blank/Non-Zero Value for Selected Field

LITERAL The specified literal value of the selected data field, placed in parentheses. The literal value must be the same length as the test field. This example searches for the literal value 978-436-8900:
    LEV_01_COMMON_FIELDS phone_number,LITERAL (978-436-8900),phone_number,hhld_phone_number

LONGEST Compares the length of the test field data on one record against the length of the data in the same field on another record, and commonizes the longer of the two values. For example, if Field1 = Smith and Field2 = Smit, the test field contents “Smith” (the longer of the two) are commonized.

MOST Most Occurring Value for Selected Data Field

MOST_NB Most Occurring Non-Blank Value for Selected Data Field

MOST_NZ Most Occurring Non-Zero Value for Selected Data Field

MOST_NBNZ Most Occurring Non-Blank/Non-Zero Value for Selected Data Field

SHORTEST Compares the length of the test field data on one record against the length of the data in the same field on another record, and commonizes the shorter of the two values. For example, if Field1 = Smith and Field2 = Smit, the test field contents “Smit” (the shorter of the two) are commonized.

SURVIVOR Survivor Value Found in List (For Commonization Only)


Decision Routine Selections for a Single Field

In the examples below, we consider 10 records and how the contents of a single field in those records apply to nine different decision routines.

Record #     Field contents
Record 1     123
Record 2     123
Record 3     456
Record 4     ___
Record 5     ___
Record 6     ___
Record 7     000
Record 8     000
Record 9     000
Record 10    000

(___ = blank field)


Table 9.3 Sample Decision Routine Results

Routine        Searches for                               Value to commonize (records)
HIGHEST        Highest numeric value                      456 (Record 3)
LOWEST         Lowest numeric value                       ___ (Records 4, 5 and 6)
LOWEST_NB      Lowest non-blank numeric value             000 (Records 7-10)
LOWEST_NZ      Lowest non-zero numeric value              ___ (Records 4, 5 and 6)
LOWEST_NBNZ    Lowest non-blank, non-zero numeric value   123 (Records 1 and 2)
LEAST          Least occurring value                      456 (Record 3)
MOST           Most occurring value                       000 (Records 7-10)
MOST_NZ        Most occurring non-zero value              ___ (Records 4, 5 and 6)
MOST_NBNZ      Most occurring non-blank, non-zero value   123 (Records 1 and 2)
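The results in Table 9.3 can be checked with a small Python model of the numeric and frequency routines. It is a sketch under stated assumptions, not the program's logic:

- A blank field (shown as ___) is modelled as a string of spaces that sorts below every number.
- _NB drops blank values, and _NZ drops values equal to zero.
- Ties go to the first value encountered.

The function names are hypothetical. The character routines (LOWCHAR, HIGHCHAR) are not covered.

```python
from collections import Counter


def is_blank(v):
    return v.strip() == ""


def is_zero(v):
    return not is_blank(v) and int(v) == 0


def numeric_key(v):
    # Assumption: a blank sorts below every number, as Table 9.3 implies.
    return (0, 0) if is_blank(v) else (1, int(v))


def decide(values, routine):
    """Return the value selected by one numeric or frequency decision routine."""
    base, _, qualifier = routine.partition("_")  # e.g. "LOWEST", "NBNZ"
    pool = [v for v in values
            if not ("NB" in qualifier and is_blank(v))
            and not ("NZ" in qualifier and is_zero(v))]
    if base == "HIGHEST":
        return max(pool, key=numeric_key)
    if base == "LOWEST":
        return min(pool, key=numeric_key)
    counts = Counter(pool)  # keeps first-seen order, so ties go to the earliest value
    if base == "MOST":
        return max(counts, key=counts.get)
    if base == "LEAST":
        return min(counts, key=counts.get)
    raise ValueError(f"{routine} is not covered by this sketch")


def matching_records(values, routine):
    """Return the selected value and the 1-based record numbers that hold it."""
    chosen = decide(values, routine)
    return chosen, [i + 1 for i, v in enumerate(values) if v == chosen]
```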


Sample Parameter File (with Commonization Notes)

*************************************************************************
* Sample parameter file for cfcrcdrv *
*************************************************************************
TITLE                       "Test run of create common program"
INPUT_DDL                   ../dict/crcminp.ddl,CRCM
INPUT_FNAME                 ../data/crcm.inp
OUTPUT_DDL                  ../dict/crcmout.ddl,CRCM
OUTPUT_FNAME                ../data/crcm.out
STAT_FNAME                  crcm.rpt
LEV_01_KEY_FIELD_NAME       matched_hhld
LEV_02_KEY_FIELD_NAME       matched_indv_in_matched_hhld
LEV_02_ASSIGN_SURVIVOR      date_of_birth,HIGHEST,indv_survivor,1
                            date_open,LOWEST,indv_survivor,2
LEV_01_COMMON_FIELDS        phone_number,MOST_NBNZ,phone_number,hhld_phone_number
                            date_open,LOWEST_NBNZ,date_open,hhld_date_open
LEV_02_COMMON_FIELDS        taxid,MOST_NBNZ,taxid,indv_taxid
                            date_of_birth,LOWEST_NBNZ,date_of_birth,common_date_of_birth
                            date_open,LOWEST,date_open,indv_date_open
LEV_02_SURVIVOR_FIELD_NAME  indv_survivor
LEV_02_SURVIVOR_VALUES      1 2

Explanations for the above parameter file

1. The parameter LEV_02_ASSIGN_SURVIVOR first uses the HIGHEST decision routine at level 2. This routine looks for the highest numeric value in the test field (date_of_birth). If that value is found in only one record, that record’s indv_survivor field is flagged with a ‘1’.

If the HIGHEST routine does not isolate a single record, the program then applies the LOWEST decision routine to the remaining records. This routine looks for the lowest numeric value in the test field (in this case, the date_open field). The survivor chosen at this step is flagged with a ‘2’ in its indv_survivor field.

2. The parameter LEV_01_COMMON_FIELDS first uses the MOST_NBNZ decision routine at level 1. The most frequently occurring non-blank, non-zero value in the test field (in this case, phone_number) is commonized through all records into the hhld_phone_number field.

The program then continues on, using the LOWEST_NBNZ decision routine on the date_open field. The lowest non-blank, non-zero numeric value in the test field (date_open) will be commonized through all other records into the hhld_date_open field.

3. The parameter LEV_02_COMMON_FIELDS first uses the MOST_NBNZ decision routine at level 2. The most frequently occurring non-blank, non-zero value in the test field (in this case, the taxid field) is commonized through all records into the indv_taxid field.

The program then continues on, and uses the LOWEST_NBNZ decision routine on the date_of_birth field. The lowest non-blank, non-zero numeric value in the test field (the date_of_birth field) will be commonized through all other records into the common_date_of_birth field.

Lastly, the program continues on, and uses the LOWEST decision routine on the date_open field. The lowest numeric value in the test field (the date_open field) will be commonized through all other records into the indv_date_open field.


IBM Mainframe Sample JCL for cfcrcdrv

//&USERCRCM JOB &JOBNO,'&USER-UD-BIN&BIN',CLASS=Z,
//             NOTIFY=&USER,MSGCLASS=X
//**********************************************************
//*
//* SOURCE IS &PROJPREF.&TRILVER.SAMP.UTILITY.JCLLIB(CFCRCDRV)
//*
//* CHANGE &UNIT TO UNIT DESIGNATION (DISK, 3390, ETC)
//* CHANGE &VOL TO VOLUME DESIGNATION (DISK, 3390, ETC)
//* CHANGE &BASEPREF TO BASE SOFTWARE DSN PREFIX
//* CHANGE &PROJPREF TO PROJECT PREFIX (TRILPROJ)
//* CHANGE &TRILVER TO TRILLIUM SOFTWARE VERSION (TRILNVN)
//*
//**********************************************************
//CFCRCDRV EXEC PGM=CFCRCDRV,PARM='/-PF PARMFILE',
//             REGION=0M
//STEPLIB  DD DISP=SHR,
//             DSN=&BASEPREF.&TRILVER.LOADLIB
//         DD DSN=CEE.SCEERUN,DISP=SHR
//         DD DSN=CEE.SCEERUN2,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//STAT     DD SYSOUT=*
//TRILMSGS DD DUMMY
//PF       DD DISP=SHR,
//             DSN=&PROJPREF.&TRILVER.SAMP.UTIL.PARMLIB(PFCRCDRV)
//INPUT    DD DISP=SHR,
//             DSN=&PROJPREF.&TRILVER.SAMP.UTIL.CRCDRVIN
//OUTPUT   DD UNIT=&UNIT,
//             DISP=(NEW,CATLG,DELETE),
//             VOL=SER=&VOL,
//             DCB=(RECFM=FB,LRECL=97,BLKSIZE=0),
//             SPACE=(TRK,(5,5),RLSE),
//             DSN=&PROJPREF.&TRILVER.SAMP.UTIL.CRCDRVOT
//INPDDL   DD DISP=SHR,
//             DSN=&PROJPREF.&TRILVER.SAMP.UTIL.DDLLIB(CRCIPDDL)
//OUTDDL   DD DISP=SHR,
//             DSN=&PROJPREF.&TRILVER.SAMP.UTIL.DDLLIB(CRCOPDDL)
//*


Running Create Common

On UNIX and 32-bit PC Platforms

cfcrcdrv -pf <parameter file> [-pe <parameter echo file>]

where:

cfcrcdrv                The Create Common driver program.
-pf                     Keyword that indicates the parameter file follows.
<parameter file>        The driver parameter file containing the program instructions.
-pe                     Keyword that indicates the parameter echo file follows.
<parameter echo file>   Optional output text file generated by the program that displays errors that occur when loading the parameter file (generally used when first debugging a process).

Example

cfcrcdrv -pf ..\parms\pfcrcdrv.par -pe ..\data\parmecho.txt


Error Messages

This table shows the error messages returned by Create Common.

Table 9.4 Create Common Error Messages

CFCRCDRV ERROR: (Message)
    Error Description

Parm Processing Error, status = 2
    The parameter file is present but incorrect. Check the path and filename.

Parm Processing Error, status = 3
    The parameter echo file is present but incorrect. Check the path and filename.

Parm Processing Error, status = 4
    The program has encountered an error with a parameter entry. Use the parameter echo file (-pe) to determine which entry is incorrect.

Parmlist Problem
    The parameter file is missing from the command line.

open cfcmcrcm failed.
    The parameter file is missing. Check the file name in the parm directory.

malloc for curr_key failed.
    Memory could not be allocated for the buffer that stores the record with the current match key. Try closing running programs to free up system memory.

malloc for prev_key failed.
    Memory could not be allocated for the buffer that stores the record with the previous match key. Try closing running programs to free up system memory.

malloc for this_key failed.
    Memory could not be allocated for the buffer that stores the record with this match key. Try closing running programs to free up system memory.

malloc failed for set_ctl.
    Memory could not be allocated for the array that holds all the household or individual members of a matched set.

input file is empty;
    The input file contains no records.

process cfcmcrcm failed.
    An error occurred while processing data. This condition cannot be corrected by the user in batch.

close cfcmcrcm failed.
    The close call failed. This condition cannot be corrected by the user in batch.

copyrecord record failed .

sequence error.
    The input file is not in the appropriate sort sequence.

later malloc failed for set_ctl.
    The array that holds the household or individual members of a match was enlarged using the REALLOC_RECORDS value, and the memory reallocation failed.

realloc set failed.
    The array that holds the household or individual members of a match was enlarged using the REALLOC_RECORDS value, and the memory reallocation failed.

Trillium Software System™ Batch User’s Guide Index I-1

in non-OS/390 platforms 7-71 Index cfprsdrv programs Numerics UNIX platforms 32 bit PC 4-52 1 5-4, 6-2 cfprsdsp A parse display program 4-117 cfprsdsp programs Absolute (Comparison routine) 7-95 in Non-OS/390 platforms 4-125 ADD_ENDING 3-9 CFWINKEY Add’l Street Line Info Sample Parameter File 6-6 for Parser Report 4-120 Sample Parameter File #2 6-6 ALIAS 3-10 CFWINMAT ARITHMETIC_COMPARE 1-7 Parameters 7-84 ARRAY1 (ARRAY1,n) Comparison CHANGE_DDNAME routine 7-98 parsing parameter 4-46 Attributes 4-2, 5-2 City Directory AUSTRALIAN_CATEGORY_VALUE 4-14 definition 4-8 Auxiliary City Directory Client Specific Data definition 4-9 for Parser Report 4-119 B CLIENT_WDPAT_FNAME 3-10 BPREPOS (Business Data Parser Common data segment Repository) format 7-56 output record review codes 5-26 Comparison Routine Details 7-95 BRAZILIAN_CATEGORY_VALUE 4-14 Comparison Routine list Business Data Parser individual routine descriptions logic flow 5-2 7-95- 7-161 Busname (Comparison routine) 7-100 Comprehension codes 4-3, 4-4 Confidence codes 4-3, 4-4 C Converter CANADIAN_CATEGORY_VALUE 4-14 Overview 2-1 cfcondrv 2-2 Parameter List 2-4 Error Messages 2-52 Record Select and Bypass running on UNIX and 32-bit PC Functionality 2-8 2-50 Comparison Operators 2-10 Sample JCL 2-51 Rule File Examples 2-11 cffxmdsp 7-68 Rules File Subset Parameters Program Execution 7-76 2-9 cfmatdsp 7-68 sample recode table 1-19 cfmatdsp program COUNTRY 3-11

Index I-2 Index

Country Router 3-1 sonal Names 4-5 COUNTRY_CODE_FIELD 3-4 Multiple Parsers 4-58 COUNTRY_CODES 3-11 Name and Record Generation 4-49 COUNTRY_LIST 3-5 Name Pattern Levels 4-61 COUNTRY_NAME 3-12 NEIGHBORHOOD_FORMAT_OPTI COUNTRY_PREFIX 3-12 ON 4-20 Customer Data Parser 4-1 other special DDL fields CFPRSDRV driver 4-11 in PREPOS DDL 4-12 CHANGE_DDNAME special Output 4-64 functionality 4-46 Output display program 4-117 Comma Reversed Names 4-6 Parameter List 4-118 DEFAULT_ORIGIN 4-15 Pattern Leveling 4-60 DETAIL_DISPLAY 4-15 pfparser.par parm list 4-14 DISPFNAME 4-16 PREPOS Display Program Error Messages Codes Section 4-66 4-124 Geocode Section 4-82 Driver parm file (pfprsdrv.par) Geographic Section 4-78 4-40 Input Geographic Match Sec- Error Messages 4-54 tion 4-83 Functional Capabilities 4-4 Review Codes and Review Groups How Multiple Parsers Work 4-58 4-108 ISALFILE 4-16 Review Group Hierarchy 4-113 ISNMFILE 4-17 running cfprsdrv on UNIX and PC JOIN_LINES special join 4-52 functionality 4-47 Special DDL fields 4-12 KEEP_CHARACTER 4-18 Street Pattern Levels 4-63 KEEP_DELIMITER 4-19 tuning 4-105 LIMIT_GENDER_ASSIGNMENT D 4-19 Line identification 4-2 Data Reconstructor Line Pattern Identification 4-123 Input 8-2 Line Pattern Identification Codes DDL 4-59 input record 4-8 Log File 4-92 DELIM 3-5 Bad Name, Street Patterns and Detail File City Problem Section definition 4-10 4-97 Detail file 4-104-?? Logic Flow 4-2 samples 4-104 Determining Business vs. Per- DIFFER 7-105

Trillium Software System™ Batch User’s Guide Index I-3

DISP_CONFIDENCE0 4-15 See Auxiliary City table Display file 4-3, 5-2 See City directory use with tuning methods 4-105 GERMAN_CATEGORY_VALUE 4-16 DROP_PERIOD 3-12 Global Data Router 3-1 E Error Messages 3-30 OS/390 Execution 3-25 ELIMINATE_DUPLICATE_DWELLINGS parameter syntax 3-3 4-16 Parameters 3-4 F Rules File 3-8 Parameter List 3-9 Field Comparison Routine Lists Sample File 3-27 See Comparison Routine list running the program 3-24 FIELD_COMPARE 1-12 Sample Log File 3-29 FIELD_SCAN 1-13 Sample Parm File 3-8 FIELD_SCAN_TABLE 1-15 GLOBAL_GEOG_FNAME 3-5 FIELDS 3-5 Grade Score (Threshold) File Matching in Comparison Routine list format definition 7-4 7-26 FLAGFM Comparison Routine 7-107 FLAGYN Comparison routine 7-110 H FREQ_ALL_FIELDS 1-9 Heading FREQ_PER 1-10, 2-13 for Parser report 4-119 FREQUENCY 1-16 Household, Individual, and Business FRSTNAME (Comparison routine) Matching 7-26 7-111 HOUSENO (01) Comparison routine Functional Capabilities of the Parser 7-119 5-3 HOUSENO (NORANGE) Comparison Functions routine 7-118 Primary Parser functions 5-3 HOUSENO (Parity) Comparison Secondary Parser functions 5-3 routine 7-118 G HOUSENO Comparison Routine 7-116 GENER (95) Comparison routine 7-116 I GENER (Comparison Routine) 7-115 IGNORE_BUSINESS_MEANINGS 4-16 GEOG_CITY_CHG_RECODE 3-13 INP_DDL 3-5 GEOG_PREFIX 3-14 INP_FIELD_TRAN_FNAME 2-16 Geographic Data INP_FIELD_TRAN01-10 2-15 for Parser report 4-120 INP_FNAME 3-5 Geographic directories 4-98 INP_FREQ 2-31

Index I-4 Index

INP_TRAN01-10 2-16 TO_UPPER_FNAME 1-5 Input J See Grade Pattern list Input files JOIN_LINES See City directory parsing parameter 4-47 See Parser Parameters 5-4, 6-2 K See Pattern table Keywords See Word/Phrase table in parser parameter file 4-14, 4-38 Input Name and Address (INA) definition 4-8 L Input Name and Address Area (INA) LENGTH_OVERRIDE 2-17 4-2 LIMIT_GENDER_ASSIGNMENT 4-19 Investigator 1-1 LINE1 4-20 ARITHMETIC_COMPARE 1-4 Log file Driver 1-1 See Statistics file 5-2 Error Messages 1-30 use with tuning methods 4-105 FIELD_COMPARE 1-4 LOG_FNAME 3-5 FIELD_SCAN 1-4 LOG_NTH_COUNT 3-6 FIELD_SCAN_TABLE 1-4 FREQ_PER 1-4 M FREQUENCY 1-4 mastat 7-36 INP_DDL01 1-4 Master File Records 7-5 INP_FNAME01 1-4 Master File Records, Candidate INP_RNAME01 1-4 Records 7-9 MAXIN01 1-4 Match Testing Early exit 7-34 MULTI_FIELD_LOOKUP 1-5 Matched Information 7-9 NTH_SAMP 1-5 Matcher Comparison Routines 7-94 NUMERIC_RANGE_COMPARE 1-5 Matching Prevention 7-31 Order of Operations 1-2, 1-3 Matching Propagation 7-32 OUTFLD_FNAME 1-5 MAX_NUMB_NAMES 4-43 OUTREC_FNAME 1-5 MAX_PR_REV_GROUP 4-118 Parameter List 1-4 MAXIN 3-6 PRINT_NTH_COUNT 1-5 Misc Geographic Data running on UNIX and PC 1-33 for Parser Report 4-120 Sample Driver Parameter File 1-6 MULTI_FIELD_LOOKUP 1-17 Sample Output Statistics 1-29 Multiple parsers SINGLE_FIELD_LOOKUP 1-5 See also Parser Parameters START 1-5 MXDNAME (Comparison routine) STAT_FNAME 1-5 7-120

Trillium Software System™ Batch User’s Guide Index I-5

N OUT_FIELD_COMPARE 2-29 Name Data OUT_FNAME 3-6 OUT_FREQ 2-31 for Parser report 4-120 OUT_MULTI_RECODE 2-34 Name Pattern Levels 4-61, 4-63 OUT_RANGE_RECODE 2-37 NAME_PARSING_DEPTH 4-21 OUT_RECODE 2-39 NEWLINE 3-6 OUT_SCAN_RECODE 2-41 NO_SPECIAL_BUSINESS_SERVICE OUT_SCAN_TABLE 2-46 4-22 NO_SPECIAL_CHARACTER_LOOKUP_ OUTFLD_FNAME 1-22 OUTREC_FNAME 1-23 SERVICE 4-22 NO_SPECIAL_COMMA_NAME_REVER OUTRPRT_DDNAME 4-44 SE_SERVICE 4-23 P NO_SPECIAL_HOUSE_SERVICE 4-23 Parameter files/lists NO_SPECIAL_INTERNET_BUSINESS_ format 7-26 SERVICE 4-24 see Grade Pattern list NO_SPECIAL_JOIN_STREET_ADJACE Parameter values NT_ALPHA_SERVICE 4-24 in parser parameter file 4-14 NO_SPECIAL_NAME_PREPARATION Parser Context Diagram 4-7 4-24 Parser Display Report NO_SPECIAL_ORDINAL_STREET_SER field descriptions 4-119 VICE 4-25 Parser Parameter File NOMATCH_FNAME 3-6 definition 4-8, 5-4 NTH_SAMP 2-18 Parser Scrub report Numeric Grades, Grade Patterns, and description 4-101 Field/Comparison Routine Listing sample 4-99 7-28 Parsing Parameters NUMERIC_RANGE_COMPARE 1-20 JOIN_LINES 4-47 O PARTIAL1 (10) Comparison routine 7-125 ONECOM (Comparison Routine) 7-124 PARTIAL1 (Comparison routine) 7-125 ORDER_OF_OPERATIONS 2-18 PARTIAL1 (FM) Comparison Routine ORG_RECORD field 7-126 in DDLs 4-12 PARTIAL1 (GN) Comparison Routine OUT_ARITHMETIC_COMPARE 2-19 7-127 OUT_ARITHMETIC_RECODE 2-21 PARTIAL1 (MF) Comparison routine OUT_BUILD_OR_LIST 2-23 7-128 OUT_CHANGE_RECODE 2-27 PARTIAL1 (MU) Comparison routine OUT_CHANGE_TABLE 2-29 7-129 OUT_DDL 3-6


PARTIAL1(YN) Comparison routine 7-130
PARTIAL2 7-135, 7-137
PARTIAL2 (RSOUNDEX1) Comparison Routine 7-137
PARTIAL2 (RSOUNDEX2) Comparison Routine 7-142
PARTIAL2 (SOUNDEX1) Comparison Routine 7-135
PARTIAL2 (SOUNDEX2) Comparison Routine 7-140
PARTIAL2 Comparison routine 7-133
pastat
  Parsing Statistics Report 4-116
Pattern Leveling 4-60
Pattern Table
  definition 4-8, 5-5
pfccflds.par 7-26
pfcoflds.par 7-26
pfcopats.par 7-29
pfhhflds.par 7-26
pfhhpats.par 7-29
pfindrv.par 1-6
pfinflds.par 7-26
pfinpats.par 7-29
PFMATDRV
  sample parm file 7-25
Possible Combinations of OUTSCAN_RECODE Scan Operators 2-45
POSTCODE (Comparison Routine) 7-148
POSTCODE When Parmval = TSB 7-149
pr_name_sect 4-65, 4-87, 4-90
pr_rev_group 4-70
PREFIX (Comparison Routine) 7-150
PREPOS (Customer Data Parser Repository)
  description 4-3
  Field list 4-65
  non-display and display field values 4-64
PREPOS_FORMAT_OPTION 4-28
PREVENT (Comparison routine) 7-151
PREVENT Field 7-31
Primary Parser Functions 4-4, 5-3
PRIMARY_AUX_CITYFNAME 4-28
PRIMARY_AUX_MAIN 4-29
PRIMARY_CITY_NAME_TYPE 4-29
PRIMARY_CITYFNAME 4-29
PRIMARY_DETFNAME=PARSDET 4-29, 4-30
PRIMARY_GEO_CATEGORY=(user-defined value) 4-30
PRIMARY_GEO_VALUE 4-30, 4-31
PRIMARY_LOGFNAME=PARSLOG 4-14
PRIMARY_PATTERNFNAME 4-31
PRIMARY_WORDFNAME 4-31, 4-32
PRINT_NTH_COUNT 1-24, 3-7
Probabilistic Matching 7-33
Program Execution
  cfmatdrv 7-66
  cfmatdsp 7-71
Programs
  cffxmdsp 7-68
  cfprsdsp 4-117
  cfwinmat 7-84
Propagation
  Transitivity 7-32
R
recode table
  for the converter 1-26, 2-40
Record Information
  for Parser report 4-119
Reference Matching 7-6
Report File


  definition 4-10
Report file
  format 4-101
Review Code Data
  for Parser report 4-119
Review codes for output records 4-3
RULES_DDNAME 3-7
RULES_ECHO 3-7
S
Sample CMI parameter file 7-11
Sample JCL
  cfcrcdrv 9-16
  cffxmdsp 7-77
  cfmatdrv 7-67
  cfmatdsp 7-72
  cfprsdrv 4-53
  cfprsdsp 4-126
  cfwinmat - Update Mode 7-89
Sample Matcher Parm File
  pfmatdrv 7-25
Sample Parser Log File 4-92
SAVE_COUNTRY_FIELD 3-7
SAVE_WEIGHT_FIELD 3-7
Scores
  for APTNO Comparison routine 7-96
  for Frstname 7-112
Secondary Parser Functions 5-3
SINGLE_FIELD_LOOKUP 1-25
SKIP_DELIMITER 4-32
SOCSEC (Comparison Routine) 7-152
SORT_DWELLINGS 4-33
SPELLING (Comparison Routine) 7-157
START 1-27, 3-7
STAT_FNAME 1-27
Statistics File
  definition 5-5
Street Data
  for Parser report 4-120
STREET_PARSING_DEPTH 4-35
STREETS (Comparison routine) 7-160
SUBSTRNG (Comparison Routine) 7-163
Summary Statistics 7-5, 7-9
Summary Statistics file (mastat) 7-36
T
TO_LOWER_FNAME 2-47
TO_UPPER_FNAME 1-28, 2-48
Transitivity
  in Matching Prevention 7-32
TRILLIUM 4-100
TWORET (Comparison Routine) 7-165
U
UNITED_KINGDOM_CATEGORY_VALUE 4-36
UNITED_STATES_CATEGORY_VALUE 4-36
Updated Records (input to Matcher) 7-5
UPLOW_FNAME 2-49
User Common Data Routines 7-56
USER1 4-37
USER1 - USER9, USERA 4-37
W
Window Keys
  Sample Rules File 6-11
Window Matching
  Input/Output 7-5
Word/Phrase Table
  definition 4-8, 5-4
