United States Patent [19]                    [11] Patent Number: 6,023,536
Visser                                       [45] Date of Patent: Feb. 8, 2000
US006023536A

[54] CHARACTER STRING CORRECTION SYSTEM AND METHOD USING ERROR PATTERN
[75] Inventor: Eric M. Visser, Kawasaki, Japan
[73] Assignee: Fujitsu Limited, Kawasaki, Japan
[21] Appl. No.: 08/668,222
[22] Filed: Jun. 21, 1996
[30] Foreign Application Priority Data
     Jul. 3, 1995 [JP] Japan 7-167676
[51] Int. Cl.: G06K 9/72
[52] U.S. Cl.: 382/310; 382/229; 707/532
[58] Field of Search: 382/229, 231, 310, 311, 309, 185-187, 177; 395/793-796; 707/532, 533, 534

[56] References Cited

U.S. PATENT DOCUMENTS
Re. 35,738   2/1998    Woo, Jr. et al.      382/311
4,328,561    5/1982    Convis et al.
4,979,227    12/1990   Mittelbach et al.    382/310
5,161,245    11/1992   French               382/310
5,315,671    5/1994    Higuchi              382/309

OTHER PUBLICATIONS
Simpson, A., "Mastering WordPerfect 5.1 & 5.2 for Windows", pp. 362-365, 1993.
Novell, Inc., "WordPerfect 6.1 User's Guide", p. 468, screen capture, 1994.

Primary Examiner: Amelia Au
Assistant Examiner: Larry J. Prikockis
Attorney, Agent, or Firm: Staas & Halsey LLP

[57] ABSTRACT

A character string correction system corrects a spelling error. An error pattern is preliminarily set and stored in the memory, etc. A processor reads an input character string character by character, and compares the read character with the error pattern. If the input character string matches an error pattern, it is assumed that an error exists. The input character is replaced with one of the alternative characters. Using the character in the string or the character string corrected with an alternative character, a dictionary (TRIE table) is searched. If a corresponding word is detected in the dictionary, the word is output as one of the recognition results.

10 Claims, 21 Drawing Sheets

U.S. Patent    Feb. 8, 2000    Sheet 1 of 21    6,023,536

[Drawing sheets 1 to 21. Only the figure labels below survive.]

Front-page drawing and FIG. 4 (Sheet 4): block diagram with the labels permanent memory, dictionary (TRIE table), error pattern, error condition, input module, processor, temporary memory, error pattern, analysis path, morpheme derivation, alternative character, output module.
FIGS. 1A to 1D (Sheet 1, PRIOR ART): TRIE tables, among them "root" and "r-", each with the columns INPUT CHARACTER, DICTIONARY WORD, and TRIE TABLE LINK.
FIG. 2 (Sheet 2, PRIOR ART): flowchart. S1 reading TRIE table "root"; S2 reading leftmost character of input character string and shifting input pointer to right; S3 does input character match TRIE table entry?; does character string to be processed match dictionary entry?; recognizing character string; S7 discarding last character; S8 any characters left in character string to be processed?; S9 analysis failed.
FIG. 3 (Sheet 3): block diagram with the labels input character string, dictionary storage unit, retrieving unit, error pattern storage unit, candidate for recognized word.
FIGS. 6A to 6C (Sheet 6): structure of an analysis path (current TRIE table, last-read character, substituted character or characters, error pattern in progress, error statistics so far, input pointer position); the root path; an example path for "cat".
FIG. 8 (Sheet 8): flowchart (2), steps S17 to S25. Obtaining new path, TRIE table, and morpheme derivation under the assumption that the input character is correct; are errors allowed on current path?; is there any error pattern in progress?; reading error pattern; is there any error pattern left in progress?; selecting one error pattern; is error pattern applicable?
FIG. 9 (Sheet 9): flowchart, steps S31 to S42. Computing alternative character (string) and writing it to temporary memory; selecting one alternative character (string) from temporary memory; does alternative character match TRIE table entry?; reading next TRIE table and computing new path; has a morpheme been recognized?; writing morpheme derivation and new path to temporary memory.
FIG. 10 (Sheet 10): flowchart, steps S51 to S54. Does input character match TRIE table entry?; reading next TRIE table from permanent memory, computing new path and writing it to temporary memory; is morpheme recognized?; writing morpheme derivation to temporary memory.
FIG. 12 (Sheet 12): example analysis paths (1.1) to (1.5), some carrying one error of weight 0.6 or 0.4.
FIG. 14 (Sheet 14): example analysis paths (2.1) to (2.5).
FIG. 15 (Sheet 15): example analysis path (3.1).
FIGS. 17A to 17E (Sheet 17): TRIE tables, among them "a-", "ar-", "aro-", and "from-".
FIG. 19 (Sheet 19): example analysis paths (2.6), (3.3), (4.5), and (4.6).
FIGS. 21A and 21B (Sheet 21): TRIE tables, among them "ve-", "ven-", and "vene-".
Sheets 5, 7, 11, 13, 16, 18, and 20: [drawings not recoverable]
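The correction scheme summarized in the abstract (read the input character by character, try a stored error pattern where one matches, substitute the alternative characters, and search the TRIE dictionary) can be illustrated with a rough Python sketch. This is not the patented procedure itself. The names `TrieNode`, `build_trie`, `follow`, and `correct`, the sample dictionary, the two error patterns, and their weights are all assumptions made for illustration, and the weights are simply summed without any claim about how the patent uses them.

```python
from typing import List, Optional, Tuple


class TrieNode:
    """One dictionary node: child links plus a flag marking a complete word."""

    def __init__(self) -> None:
        self.children = {}
        self.is_word = False


def build_trie(words) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def follow(node: TrieNode, text: str) -> Optional[TrieNode]:
    """Walk `text` from `node`; return None if the trie has no such branch."""
    for ch in text:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


# Hypothetical error patterns: (erroneous text, alternative text, weight).
ERROR_PATTERNS = [
    ("ra", "ar", 0.6),  # assumed transposition error
    ("k", "c", 0.4),    # assumed substitution error
]


def correct(text: str, trie: TrieNode, patterns=ERROR_PATTERNS,
            max_errors: int = 1) -> List[Tuple[str, int, float]]:
    """Return (word, error count, summed weight) for every dictionary match."""
    results = []

    def walk(pos: int, node: TrieNode, out: str, errors: int, weight: float):
        if pos == len(text):
            if node.is_word:
                results.append((out, errors, weight))
            return
        # Branch 1: assume the input character is correct.
        child = node.children.get(text[pos])
        if child is not None:
            walk(pos + 1, child, out + text[pos], errors, weight)
        # Branch 2: assume an error pattern applies at this position.
        if errors < max_errors:
            for wrong, alt, w in patterns:
                if text.startswith(wrong, pos):
                    nxt = follow(node, alt)
                    if nxt is not None:
                        walk(pos + len(wrong), nxt, out + alt,
                             errors + 1, weight + w)

    walk(0, trie, "", 0, 0.0)
    return sorted(results, key=lambda r: (r[1], r[0]))


trie = build_trie({"cat", "car", "art"})
print(correct("cra", trie))
```

With the sample data, the misspelled input "cra" reaches the dictionary word "car" through the assumed "ra" to "ar" pattern. Keeping both branches alive at each character is what lets a correct reading and a corrected reading be reported side by side.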
CHARACTER STRING CORRECTION SYSTEM AND METHOD USING ERROR PATTERN

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a morphological analysis and, more specifically, to a character string correction sys-

The second column, titled "dictionary word", indicates whether the string of characters read up to this point corresponds to a dictionary entry or not. In the example shown in FIGS. 1A through 1D, this is done by giving the part of speech of the entry if it does (for example, "Art", "N", "Prop. N", and "Prep" respectively indicate an article, noun, proper noun, and preposition), and the empty-set symbol "φ" if it doesn't.
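The three-column table layout named in the figure labels (input character, dictionary word, TRIE table link) can be sketched in Python as follows. The actual table contents of FIGS. 1A through 1D are not available here, so the three sample words, the helper names `build_tables` and `lookup`, and the convention of naming a table after its prefix plus a hyphen are assumptions for illustration only. `None` stands in for the empty-set symbol.

```python
from typing import Dict, List, Optional, Tuple

# One table row, keyed by input character:
# (dictionary word column, TRIE table link column).
# None plays the role of the empty-set symbol in either column.
Row = Tuple[Optional[str], Optional[str]]
TrieTable = Dict[str, Row]


def build_tables(words: Dict[str, str]) -> Dict[str, TrieTable]:
    """Build linked TRIE tables from a {word: part of speech} mapping."""
    tables: Dict[str, TrieTable] = {"root": {}}
    for word, pos in words.items():
        name = "root"
        for i, ch in enumerate(word):
            is_last = i == len(word) - 1
            old_pos, old_link = tables[name].get(ch, (None, None))
            link = old_link if is_last else word[: i + 1] + "-"
            tables[name][ch] = (pos if is_last else old_pos, link)
            if not is_last:
                name = link
                tables.setdefault(name, {})
    return tables


def lookup(tables: Dict[str, TrieTable], text: str) -> List[Tuple[str, str]]:
    """Return every prefix of `text` that is a dictionary word, with its tag."""
    found = []
    name = "root"
    for i, ch in enumerate(text):
        row = tables[name].get(ch)
        if row is None:          # no entry for this input character
            break
        pos, link = row
        if pos is not None:      # second column is not the empty-set symbol
            found.append((text[: i + 1], pos))
        if link is None:         # no further table to follow
            break
        name = link
    return found


# Hypothetical mini-dictionary using the part-of-speech tags from the text.
tables = build_tables({"a": "Art", "at": "Prep", "cat": "N"})
print(tables["root"])
print(lookup(tables, "at"))
```

In this sketch the row for "a" in the root table carries both a part of speech and a link, which shows how one row can mark a complete word and still lead on to longer words.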