
Matching Discussion DMSC, October 2012 rev.

Record matching and ownership algorithm: http://platinum.ohiolink.edu/dms/DMSdocs/match.htm
Implemented at OhioLINK, February‐June 2012.

Problems:

- Bad matches
- Duplicate OCLC records
- Non‐contributed records

BAD MATCHES: (these examples have now been cleaned up; this is how they looked before clean‐up)

BAD MATCH 1:

- different OCLC numbers
- no other 0xx fields to match on
- matched on 110matchkey (titles 245|a and imprints match)
- validity check passed because it does not look at 245|n|p

Local record:

001 1480947
245 00 Reports service :|p West Africa series
246 10 West Africa series
260 [New York] |b American Universities Field Staff
300 v. |b ill., ports., maps.|c29 cm

Matched at Central:

001 1480945
245 00 Reports service: |p Southeast Europe series
246 10 Southeast Europe series
260 [New York] :|b American Universities Field Staff
300 v. |b ill., ports., maps.|c29 cm

BAD MATCH 2:

- different OCLC numbers
- match on 020
- validity check passed because the first three words of the title match
- records are serials, so 260|c is not checked

Local record:

001 639045
020 0306651408
245 10 Mössbauer effect data index:|b covering the 1970 literature
260 New York:|b IFI/Plenum, |c[1972]

Matched at Central:

001 135791
020 0306651408
245 10 Mossbauer effect data index, covering the 1969 literature.
260 0 New York ,|b IFI/Plenum |c[1970]

BAD MATCH 3:

- different OCLC numbers
- match on 020
- 100|a, 245|a, and 260|b|c match
- validity check passed because it does not look at 245|n|p or 300

Local record:

001 668194718
020 9781606994412
020 1606994417
100 1 Gottfredson, Floyd.
245 10 's . |n [Volume 1],|p"Race to Death Valley" /|c by Floyd Gottfredson ; series editiors: David Gerstein and .
246 30 Mickey Mouse. |p "Race to Death Valley"
246 30 Race to Death Valley
260 Seattle, WA : |b Fantagraphics Books ;|a[S. l.] : |b distributed in the U.S. by W.W. Norton, |c c2011.
300 286 p. :|b chiefly ill. (some col.) ;|c23 x 27 cm

Matched at Central:

001 730996921
020 9781606994412 (v. 1)
020 1606994417 (v. 1)
020 9781606994955 (v. 2)
020 1606994956 (v. 2)
100 1 Gottfredson, Floyd
245 10 Walt Disney's Mickey Mouse /|c by Floyd Gottfredson
246 30 Mickey Mouse
260 Seattle :|b Books, |c c2011-
300 v. :|b chiefly ill. ;|c23 x 27 cm

BAD MATCH 4:

- different OCLC numbers
- match on 022|y
- validity check passed because the first three words of the title match

Local record:

001 55201861
010 2005255010
022 0 1557-7988 |y 0076-6879 |2 1
245 10 Methods in enzymology |h [electronic resource]
260 New York, NY : |b Academic Press, |c 1955-

Matched at Central:

001 34488341
022 |y 0076-6879
245 00 Methods in enzymology index |h [electronic resource]
246 13 Methods in enzymology index CD-ROM
260 San Diego, Calif. :|b Academic Press ;|a San Francisco, Calif. :|b Lightbinders, |c c1995-
300 computer laser optical discs :|b col. ; |c 4 3/4 in. +|e 1 guide (6 p. : ill. ; 12 cm.)
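
Bad matches 2 and 4 both passed the validity check because the first three words of the titles match. The sketch below illustrates that kind of comparison. It is not the vendor's implementation: the function names and the normalization steps (dropping subfield codes, folding diacritics, ignoring punctuation) are assumptions made for illustration only.

```python
import re
import unicodedata


def first_three_words(title: str) -> tuple:
    """Return the first three normalized words of a 245 title string."""
    # Drop MARC subfield codes such as "|b" or "|h" (pipe plus one character).
    text = re.sub(r"\|.", " ", title)
    # Fold diacritics so that "Mössbauer" and "Mossbauer" compare equal (assumed).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Keep lowercase alphanumeric words only; punctuation is ignored.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return tuple(words[:3])


def title_check_passes(title_a: str, title_b: str) -> bool:
    """Hypothetical validity check: compare only the first three title words."""
    return first_three_words(title_a) == first_three_words(title_b)


# 245 fields from bad matches 2 and 4 above.
local_2 = "Mössbauer effect data index:|b covering the 1970 literature"
central_2 = "Mossbauer effect data index, covering the 1969 literature."
local_4 = "Methods in enzymology |h [electronic resource]"
central_4 = "Methods in enzymology index |h [electronic resource]"

print(title_check_passes(local_2, central_2))
print(title_check_passes(local_4, central_4))
```

Under these assumptions both comparisons pass, because the words that tell the items apart ("1970" versus "1969", "index") fall after the third word.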

Solution to bad matches, implemented 5/8/2012: Records do not merge if both have 001 fields (OCLC numbers) that differ.
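
The 5/8/2012 rule can be sketched as a guard that runs before any other match result is accepted. This is only an illustration: the dict-based record layout and the function names are hypothetical, and the flag `matched_on_other_keys` stands in for whatever the matching algorithm has already decided.

```python
def oclc_numbers_conflict(record_a: dict, record_b: dict) -> bool:
    """True when both records carry an 001 and the two values differ."""
    num_a = record_a.get("001")
    num_b = record_b.get("001")
    return bool(num_a) and bool(num_b) and num_a != num_b


def may_merge(record_a: dict, record_b: dict, matched_on_other_keys: bool) -> bool:
    """Hypothetical merge decision with the differing-001 guard applied first."""
    if oclc_numbers_conflict(record_a, record_b):
        return False  # differing OCLC numbers: keep the records separate
    return matched_on_other_keys


# 001 values from bad match 1; the match keys agreed, so the flag is True.
local = {"001": "1480947"}
central = {"001": "1480945"}
print(may_merge(local, central, matched_on_other_keys=True))
```

With the guard in place, the pair from bad match 1 is kept as two records even though the other match keys agree.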

One of our major goals was to merge new and old OCLC records (001 in one record matches 019 in another record). We also wanted to merge OCLC and non‐OCLC records for the same item. To do this, we had to allow records to merge even if their 001 fields did not match. Because of the bad matches described above, we had to turn this feature off; that is, we returned to the requirement that if the 001s do not match, the records are unique. Side effects:

- Cannot merge old and new OCLC records based on 001/019
- Cannot merge non‐OCLC records into OCLC records for the same item
- Can still merge SkyRiver and OCLC records, because that is a different process
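
To make the first side effect concrete, here is a minimal sketch of a 001/019 link test, the kind of check that can no longer trigger a merge. The record layout, the function names, and the OCLC numbers in the example are all hypothetical.

```python
def oclc_ids(record: dict) -> set:
    """Current OCLC number (001) plus any superseded numbers (019)."""
    ids = set(record.get("019", []))
    if record.get("001"):
        ids.add(record["001"])
    return ids


def linked_by_001_019(record_a: dict, record_b: dict) -> bool:
    """True if one record's 001 appears among the other's 001/019 values."""
    a_001 = record_a.get("001")
    b_001 = record_b.get("001")
    return (a_001 is not None and a_001 in oclc_ids(record_b)) or (
        b_001 is not None and b_001 in oclc_ids(record_a)
    )


# Made-up numbers: the newer record lists the older number in its 019.
old_record = {"001": "1111111"}
new_record = {"001": "2222222", "019": ["1111111"]}

print(linked_by_001_019(old_record, new_record))   # linked through 019
print(old_record["001"] != new_record["001"])      # but the 001s differ
```

The two records are linked through 019, yet their 001 values differ, so under the current rule they stay separate.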

DUPLICATE RECORDS: Multiple instances of an OCLC number. Caused during validity check; some criterion makes the system believe they are different items.

DUPLICATE RECORDS 1:

- same OCLC number
- one has 533 and the other does not

Record at Central:

001 184905387
245 00 Software engineering for multi-agent systems V |h[electronic resource]
260 Berlin ;|a New York :|bSpringer,|cc2007.
300 1 online resource (xii, 231 p.) :|bill.
533 Electronic reproduction .|b Berlin :|Springer,|d2008. |n System requirements: Adobe Acrobat Reader and Internet browser; text in HTML and PDF. |n Mode of access: World Wide Web. |n Title from screen …

Dup at Central:

001 184905387
245 00 Software engineering for multi-agent systems V |h[electronic resource]
260 Berlin ;|a New York :|bSpringer,|cc2007
300 xii, 231 p. :|bill. ;|c24 cm

DUPLICATE RECORDS 2:

- same OCLC number
- 260|c do not match

Record at Central:

001 1564718
022 0589-1132
111 20 Conference on Group Processes
245 10 Group processes :|b transactions of the ... conference /
246 03 Transactions of the ... conference
260 01 New York, N.Y. :|b The Foundation,|c1954-
300 v. :|bill. ;|c24 cm
310 Annual
362 0 1st-5th

Dup at Central:

001 1564718
111 2 Conference on Group Processes |n(1st :|d1954 :|c Ithaca, N.Y.)
245 10 Group processes :|b transactions of the first conference,
260 New York :|b J. Macy, Jr. Foundation,|cc1955
300 334 p. :|bill. ;|c24 cm

DUPLICATE RECORDS 3:

- same OCLC number
- one has GMD 245|h; the other does not

Record at Central:

001 21903537
100 1 Raab, James M
245 00 Ground-water resources of Adams County /|c by James M. Raab ; Diane Hamilton, cartographer
255 Scale [ca. 1:63,000]
260 Columbus :|b Ohio Dept. of Natural Resources, Division of Water, Ground-Water Resources Section,|c1989
300 1 map :|b col. ;|c100 x 67 cm

Dup at Central:

001 21903537
100 1 Raab, James M
245 10 Ground-water resources of Adams County |h [cartographic material] /|c by James M. Raab ; Diane Hamilton, cartographer
255 Scale [ca. 1:63,000]
260 Columbus :|b Ohio Dept. of Natural Resources, Division of Water, Ground-Water Resources Section,|c1989
300 1 map :|b col. ;|c100 x 67 cm

SOLUTION implemented 10/12/2012: Remove 533 from the list of fields that require a separate record. Other types of dups will continue to occur; reassess when it comes time to renew the contract for the matching algorithm.
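
The effect of removing 533 from that list can be sketched as follows. The full list of fields is not given in these notes, so the set {"533"} stands in for it; the function name and the dict-based record layout are also assumptions.

```python
def needs_separate_record(record_a: dict, record_b: dict, separating_tags: set) -> bool:
    """True if any tag in separating_tags is present in exactly one record."""
    for tag in separating_tags:
        if (tag in record_a) != (tag in record_b):
            return True
    return False


# Abbreviated fields from duplicate records 1 above.
central = {
    "001": "184905387",
    "300": "1 online resource (xii, 231 p.) :|bill.",
    "533": "Electronic reproduction",
}
dup = {
    "001": "184905387",
    "300": "xii, 231 p. :|bill. ;|c24 cm",
}

# Stand-in list before the change: 533 forces a separate record.
print(needs_separate_record(central, dup, {"533"}))
# Stand-in list after the change: 533 no longer separates the two.
print(needs_separate_record(central, dup, set()))
```

This covers only the 533 case; the 260|c and 245|h differences in duplicate records 2 and 3 are not addressed by it.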

NON‐CONTRIBUTED RECORDS: no‐load options caused certain legitimate records to be suppressed from Central because they lack 260.

Archival record:

001 431996071
100 1 Resnik, Judith,|d1949-1986
245 10 Judith Resnik papers,|f1976-1986
300 4|fcubic feet |a (4|fboxes)

RDA record:

001 794365164
020 2870312776
020 9782870312773
100 1 Albertson, Fred,|d1952-
245 10 Mars and Rhea Silvia in Roman art /|c Fred C. Albertson
264 1 Bruxelles :|b Éditions Latomus,|c2012
300 241 pages, xxxiii pages of plates :|b illustrations ;|c24 cm
336 text|2rdacontent
337 unmediated|2rdamedia
338 volume|2rdacarrier

SOLUTION implemented 8/30/2012: Eliminate the no‐load step altogether.
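
For reference, a no-load test of the kind described above might look like the sketch below. The function name and record layout are hypothetical; the only rule modeled is "suppress any record that lacks 260", and the field values are abbreviated from the two examples.

```python
def blocked_by_no_load(record: dict) -> bool:
    """Hypothetical no-load test: suppress any record that has no 260."""
    return "260" not in record


archival = {
    "001": "431996071",
    "245": "Judith Resnik papers,|f1976-1986",
    "300": "4|fcubic feet |a (4|fboxes)",
}
rda = {
    "001": "794365164",
    "245": "Mars and Rhea Silvia in Roman art /|c Fred C. Albertson",
    "264": "Bruxelles :|b Éditions Latomus,|c2012",
}

print(blocked_by_no_load(archival))  # no imprint field at all
print(blocked_by_no_load(rda))       # imprint is in 264, not 260
```

Both example records fail such a test: the archival record has no imprint field, and the RDA record carries its imprint in 264 rather than 260.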