Method of Accommodating for Carbon/Electret Telephone Set Variability in Automatic Speaker Verification
(19) Europäisches Patentamt / European Patent Office / Office européen des brevets
(11) Publication number: 0 654 781 A2
(12) EUROPEAN PATENT APPLICATION
(21) Application number: 94308231.3
(51) Int. Cl.6: G10L 5/06
(22) Date of filing: 09.11.94
(30) Priority: 19.11.93 US 155973
(43) Date of publication of application: 24.05.95 Bulletin 95/21
(84) Designated Contracting States: DE ES FR GB IT
(71) Applicant: AT & T Corp., 32 Avenue of the Americas, New York, NY 10013-2412 (US)
(72) Inventor: Sachs, Richard M., 64 Sunset Place, Middletown, New Jersey 07748 (US)
     Inventor: Schoeffler, Max S., 17 Kenwood Lane, Matawan, New Jersey 07747 (US)
(74) Representative: Watts, Christopher Malcolm Kelway, Dr. et al, AT&T (UK) Ltd., 5, Mornington Road, Woodford Green, Essex, IG8 0TU (GB)
(54) Method of accommodating for carbon/electret telephone set variability in automatic speaker verification.

(57) In a speaker verification system, a method of compensating for differences in speech samples obtained during registration and those obtained during verification due to the use of different types of microphones is provided by filtering at least one of the samples such that the similarities of the two samples are increased. The filtered sample is used within the speaker verification matching process. A two-way comparison is disclosed in which both a verification speech sample and a reference sample are filtered with nonlinear microphone characteristics such as carbon microphone characteristics. A four-way comparison is also disclosed in which patterns produced from unfiltered verification and reference samples and patterns produced from the filtered verification and reference samples are compared to identify a match. A score is determined for each comparison. The comparison having the best score is used to determine if a match has occurred.

[FIG. 6, Verification (4-way), flowchart: prompt; receive verification speech sample; carbon filter sample; produce verification pattern; produce carbon filtered verification pattern; 4-way comparison with two stored reference patterns; pick "best" score; close enough?; second try; grant access or deny; begin service or end call; end.]

Jouve, 18, rue Saint-Denis, 75001 PARIS

Background of the Invention

The present invention is generally directed to speaker verification, and more particularly, to a method of accommodating variability among different types of telephone handsets, in order to improve the accuracy of speaker verification.

Speaker Verification (SV) is a speaker-dependent pattern-matching process in which a subscriber's speech sample presented for verification is processed to produce a verification pattern. This verification pattern is compared to an SV reference pattern that is typically produced from speech samples previously provided in the course of a so-called registration session. A "match" between the verification and reference patterns occurs when their characteristics are substantially similar. Otherwise, a "mismatch" is said to have occurred.

A typical application of SV is a telephony-based security system. A subscriber "registers" with the system by providing speech samples over a telephone link and an SV reference pattern is produced. Subsequently, a caller, seeking access to, for example, a service or some secure data, calls the system and presents his/her speech sample for verification as described above. If a match occurs, the desired access is granted. If there is a mismatch, it is presumed that a so-called imposter, pretending to be a subscriber, was the caller and access is denied.

Many times, SV is complicated by the fact that the verification pattern is different from the SV reference pattern due to circumstances such as, illustratively, the use of different types of telephone handset microphones, e.g., linear (such as electret) and non-linear (such as carbon). Other examples include different background noises and different speaking levels. These differences can cause characteristics of the speech sample provided during registration and the speech sample provided during any particular SV verification session to be different from one another. The corresponding patterns will then also be different, possibly resulting in an incorrect "mismatch" determination.

In particular, an electret microphone performs a fairly linear transformation on incoming speech samples and, as such, minimally distorts them. A carbon microphone, on the other hand, performs a non-linear transformation on the speech samples by, for example, compressing high-volume speech levels and suppressing low background noise levels, the latter often being referred to in the art as "enhancement." As such, the carbon microphone distorts the speech samples to a significant extent. Because of the variability in the effects that these different types of microphones have on the samples, it is difficult to discriminate between a mismatch caused by using different types of microphones and a mismatch caused by comparing an SV reference pattern to a verification pattern generated from a speech sample provided by an imposter.

Thus, a subscriber who registers using one type of telephone handset microphone and attempts to be "verified" using another type of handset microphone is more likely to be denied access than one who registers and attempts to be verified using the same type of handset microphone.

Summary of the Invention

In accordance with the present invention, the problem of compensating for variability in speech samples due to the use of different types of microphones is solved by filtering at least one of the samples in accordance with the characteristics of one of the microphone types and using the filtered sample within the matching process.

In general, it is not possible to determine whether any particular speech sample originated from any particular type of microphone. Therefore, in preferred embodiments, both the verification speech sample and the SV reference sample are filtered with typical carbon-microphone characteristics. Consequently, any variability which may have resulted from using different types of handset microphones is reduced. Variability originating from other properties of the speech sample such as added background noise, and telephone network distortion or variable speaking level is also reduced. For example, if the samples are generated by an electret microphone, the filtering causes the samples to have similar characteristics to samples that would have been generated by a carbon microphone. If the samples are generated by a carbon microphone, the filtering will result in samples which, although now different, retain their essential character as carbon microphone speech samples. Thus, no matter which type of microphone was used to provide the two samples, their filtered versions both have carbon-microphone-like characteristics.

The principal consequence of the foregoing is that because the invention reduces the variability between samples provided using different microphone types, that variability need not be taken into account when establishing criteria under which a "match" will occur. Indeed, the invention allows those criteria to be made more stringent while not increasing the level of incorrect rejection (the latter being the declaration of a mismatch when the caller is, in fact, the subscriber).

In an alternative embodiment of the invention, patterns produced from unfiltered versions of the verification and reference samples are used along with the patterns produced from the filtered versions of the verification and reference samples as described above. Comparisons are made between each version of the verification pattern and each version of the reference pattern. The results are then used to determine whether a match has occurred. This approach could, in theory, improve the overall system performance, for reasons that are explained in detail hereinbelow.

Variability in the patterns can arise from factors other than differences in microphone type. For example, background noise derived acoustically or from telephone-network-based circuitry may introduce variability into the patterns. Other factors such as variable speaking level or variability arising from other properties of the utterance not related to speaker differences may also introduce variability which may result in a mismatch determination.

Indeed, the principles of the invention can be used to address such other variabilities. In particular, the invention generally encompasses the concept of processing at least one of the recognition and verification speech samples so that the properties characterizing the processed speech sample are more similar to the properties of the other speech sample than is the unprocessed speech sample. The processing could thus be noise-reduction processing or volume-normalization processing, or whatever processing is

[...] platform 10 which implements the principles of the present invention. At the heart of service platform 10 is a microprocessor 11 and various standard peripherals with which it communicates over bus 13. The peripherals include random access memory (RAM) 12, read-only memory 14, hard disk memory 16, telephone interface 18, digital signal processor (DSP) 19 and a number of other peripherals indicated at 15. (Although not shown in the FIG., DSP 19 may have its own memory elements and/or a direct connection to various memory elements within the system, such as disk memory 16.)

Service platform 10 is accessible only by subscribing individuals referred to herein as "subscribers." The process of becoming a subscriber includes a "registration" process wherein the subscriber is asked to recite utterances which are converted into reference speech samples. This is illustratively carried out during a telephone call made to the system from rotary telephone set 31 via telephone central office (CO) 20 and a telephone line 21 extending from CO 20 to telephone interface 18.
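
The four-way comparison outlined in the Summary can be illustrated with a short sketch. It is not the applicants' implementation: the function names (`carbon_like_filter`, `make_pattern`, `build_patterns`, `distance`, `four_way_verify`), the tanh compression curve, the frame-energy pattern, the mean-squared distance, and every parameter value are assumptions made for illustration only. The sketch shows one pattern being produced from the unfiltered sample and one from the filtered sample, each verification pattern being scored against each stored reference pattern, and the best score deciding the match.

```python
import math


def carbon_like_filter(sample, drive=3.0, noise_floor=0.05):
    """Hypothetical memoryless stand-in for 'carbon-microphone characteristics'."""
    filtered = []
    for x in sample:
        mag = abs(x)
        # Soft compression of high levels (assumed tanh curve, unity at mag == 1).
        compressed = math.tanh(drive * mag) / math.tanh(drive)
        # Attenuation of low-level content near the assumed noise floor.
        gate = mag * mag / (mag * mag + noise_floor * noise_floor)
        filtered.append(math.copysign(compressed * gate, x))
    return filtered


def make_pattern(sample, frame=4):
    """Toy pattern (an assumption): log energy of each non-overlapping frame."""
    pattern = []
    for start in range(0, len(sample) - frame + 1, frame):
        energy = sum(v * v for v in sample[start:start + frame]) / frame
        pattern.append(math.log(energy + 1e-9))
    return pattern


def build_patterns(sample):
    """Produce one pattern from the unfiltered sample and one from the filtered sample."""
    return {
        "unfiltered": make_pattern(sample),
        "filtered": make_pattern(carbon_like_filter(sample)),
    }


def distance(p, q):
    """Mean squared difference of two equal-length patterns (lower is better)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def four_way_verify(verification_sample, reference_patterns, threshold):
    """Score all four verification/reference pairs and decide on the best one."""
    verification_patterns = build_patterns(verification_sample)
    scores = {
        (v_name, r_name): distance(v_pat, r_pat)
        for v_name, v_pat in verification_patterns.items()
        for r_name, r_pat in reference_patterns.items()
    }
    best_pair = min(scores, key=scores.get)
    return scores[best_pair] <= threshold, best_pair, scores


# Illustrative use: register on a linear (electret-like) signal, then verify
# with the same signal passed through the carbon-like stand-in.
registered = [0.5 * math.sin(0.7 * n) for n in range(32)]
reference_patterns = build_patterns(registered)
accepted, best_pair, scores = four_way_verify(
    carbon_like_filter(registered), reference_patterns, threshold=0.01
)
```

In this toy case the best-scoring pair is the unfiltered verification pattern against the filtered reference pattern, because both then carry the carbon-like characteristic. A two-way comparison would use only the filtered/filtered pair.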