Matching Scores
68 Bridge St., Suite 307, Suffield, CT 06708
+1 888-779-6578
[email protected]
www.DataLadder.com

Get Clean, Stay Clean Solution

DataMatch Enterprise™ Server + API is a component written by Data Ladder for state-of-the-art fuzzy matching, data formatting, and data cleansing. Its most common uses are duplicate prevention, inquiry, deduplication, and merge/purge. The DataMatch Enterprise™ API splits and cases names and addresses, generates match keys for phonetic matching, generates 3-grams for a more accurate fuzzy match, and grades matching records. The component provides a compact and efficient solution to the problems of data quality and duplication on any Windows-based system.

High Performance and Scalability: Delivers results quickly regardless of the size of the database.

Quick Implementation: No advance preparation needed to start.

Intuitive Interface: Execute data projects within a matter of days.

Robust Matching Technology: Find what you're looking for with the world's best matching and deduplication technology.

Seamless Integration with Databases: Operates apart from and links to current databases for maximum speed and efficiency.

Syncs with Data in Real-Time: Instantaneous updates work in conjunction with matching updates as part of the API.

How it works

[Screenshot: Live Search Demo 3.1.13.1 (1.0.7.7) window with the callouts Load Selected Project, Run Search Process, Go to Settings Window, and Disable/Enable Live Search Options. A search for "Victor" returns scored records from the Customer Master and New Prospect data sources; the log reports a search time of 120 ms.]

So what do you do if there are inconsistencies or variations in your data? Even worse, what if there are different errors in both a database and a search engine? Data Ladder's DataMatch Enterprise™ Server + API finds the right data, even with incomplete information. Our algorithms can find the areas of similarity regardless of which fields they are located in or how the data is aligned.

Our platform is a robust approach to making imperfect data usable. It can make the right connections with any type of structured data. From spelling errors to redundancies, our tool can work through many of the common issues found in large amounts of data.

DataMatch Enterprise™ Server + API can handle many of the issues that compromise your data systems. Our system is scalable: even with large datasets, the information can be analyzed with lightning-fast response times. The result for you? Increased accuracy and less manual work. Our software integrates directly with your database, yet functions independently and doesn't affect any other applications.
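The 3-gram fuzzy matching mentioned in the product description can be illustrated with a minimal sketch. It is not Data Ladder's algorithm, which this brochure does not describe. The function names, the padding scheme, and the use of the Dice coefficient are all assumptions made for illustration.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of 3-character substrings of a normalized string.

    Assumption: lower-casing, whitespace collapsing, and padding with two
    leading spaces and one trailing space are illustrative choices only.
    """
    normalized = " ".join(text.lower().split())
    if not normalized:
        return set()
    padded = f"  {normalized} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over the two 3-gram sets, scaled to 0-100."""
    grams_a, grams_b = trigrams(a), trigrams(b)
    if not grams_a or not grams_b:
        return 0.0
    shared = len(grams_a & grams_b)
    return 100.0 * 2 * shared / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    # A misspelling still shares most of its 3-grams with the intended word.
    print(round(trigram_similarity("Jackson", "Jacson"), 2))  # 66.67
    print(trigram_similarity("Jackson", "Victor"))            # 0.0
```

Because the comparison works on overlapping character fragments rather than whole words, a single dropped or swapped letter lowers the score without reducing it to zero.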
*As seen in 20 different independent match comparisons, DataMatch Enterprise™ Server found 5-10% more matches than any competitor or in-house solution.

                     Match Accuracy
                     40K Records   400K Records   40K to 4M Records   Speed       Purchase / Licensing Costs
Data Ladder          96%           91%            95%                 Very Fast   Low
IBM Quality Stage    88%           87%            91%                 Fast        High ($250K+)
SAS DataFlux         84%           84%            81%                 Fast        High ($250K+)

Note: The above tests were completed on internal test data (external confirmation in process). These tests were done using our proprietary algorithms; no pre-processed algorithmic results were used.

API Architecture Diagrams

[Figure: two flow diagrams, each with Client 1, Client 2, ... Client N sending records to a back-end.
A) Insert new record; the back-end's uniqueness check logic checks whether the record exists, using the fuzzy logic of the DME API; if the record exists, the back-end informs the client about the duplicate; if not, the record is inserted into the DB and the back-end layer is informed about record uniqueness.
B) Insert new record; the back-end tries to insert the record into the DB; a trigger on insert calls the DME API, which uses fuzzy logic and informs the DB about uniqueness; if the record exists, the transaction is rolled back; if not, the transaction is committed; the back-end layers are informed about record uniqueness.]

Fig 1. Client server architecture
A) DME API is used as an intermediate layer between DB and a business layer
B) DME API is called from DB triggers

Overview

There are two fundamental parts to the DataMatch Enterprise™ Server + API:
- Record Indexing
- Record Matching

These can be utilized in different scenarios:
- Data capture incorporating duplicate prevention
- Single data source matching
- Cross data source matching

[Screenshot: Real Time Duplicate Check tab of the Live Search Demo. The frontend searches lastname "Andersen" (90%) with Auto Match set to 90 and Manual Review set to 80. The backend reports record status DUPLICATE for "Mr Gary Anderssen, Mobil Oil Canada" with a score of about 98.2.]

Match Definitions

A match definition is the set of rules applied to the fields used in the matching process. The match definition for one field consists of:
- Matching type, which can be Fuzzy or Exact.
- Phonetic: before either comparison (Fuzzy or Exact), the input can be transformed to its phonetic equivalent. Example: the phonetic transformations of the words Dayton and Deighton are equal.
- Level: if the matching type is Fuzzy, a Level value is required. It defines the threshold for the comparator. If the result of the comparison is equal to or higher than Level, the match is considered successful.

Matching Scores

The matching score is the average of the matching scores of the individual fields. If any field scores below its Level, the complete score is 0.

[Screenshot: Unique Check Logic tab of the Live Search Demo, with threshold settings for Definitely Match, Potential Duplicate, and Not a Duplicate, and a results grid of scored record pairs from dbo.Employees.]

Back-End Part - Database and API Service

Suppressing management as additional service layer

After adding "John Smith" it is impossible to add "Johnny Smith", because the first record was saved as a unique record in the database and in the cache of DME API Services.
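The scoring rule and the "John Smith" / "Johnny Smith" behaviour described above can be sketched together. This is a minimal illustration under stated assumptions, not the DME API: `FieldRule`, `matching_score`, `UniqueCache`, and the threshold values are hypothetical names and numbers, and Python's `difflib.SequenceMatcher` stands in for the product's comparator, which the brochure does not specify.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical per-field match definition: matching type plus Level."""
    name: str
    fuzzy: bool = True
    level: float = 75.0  # threshold on a 0-100 scale; unused for Exact


def field_score(rule: FieldRule, a: str, b: str) -> float:
    """Score one field on a 0-100 scale."""
    a, b = a.strip().lower(), b.strip().lower()
    if not rule.fuzzy:
        return 100.0 if a == b else 0.0
    # Stand-in comparator (assumption), not the product's algorithm.
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def matching_score(rules, rec_a: dict, rec_b: dict) -> float:
    """Average of the field scores; 0 if any field falls below its Level."""
    scores = []
    for rule in rules:
        score = field_score(rule, rec_a[rule.name], rec_b[rule.name])
        threshold = rule.level if rule.fuzzy else 100.0
        if score < threshold:
            return 0.0
        scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


class UniqueCache:
    """In-memory stand-in for the cache of records accepted as unique."""

    def __init__(self, rules, auto_match: float = 85.0):
        self.rules = list(rules)
        self.auto_match = auto_match  # hypothetical duplicate threshold
        self.records = []

    def try_add(self, record: dict) -> bool:
        """Store the record unless it matches one that is already cached."""
        for existing in self.records:
            if matching_score(self.rules, record, existing) >= self.auto_match:
                return False  # duplicate: reject the insert
        self.records.append(record)
        return True


if __name__ == "__main__":
    rules = [FieldRule("firstname"), FieldRule("lastname")]
    cache = UniqueCache(rules)
    print(cache.try_add({"firstname": "John", "lastname": "Smith"}))    # True
    print(cache.try_add({"firstname": "Johnny", "lastname": "Smith"}))  # False
```

In this sketch "John" and "Johnny" score 80 and the identical last names score 100, so the average of 90 exceeds the assumed threshold of 85 and the second insert is rejected. A real deployment would place this check in the back-end layer or behind a database trigger, as in Fig 1.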