CONTENT

COMPREHENSIVE FORENSIC AUDIO ANALYSIS

IKAR Lab 3
Tasks solved with IKAR Lab 3
SIS 4
EdiTracker 7
Sound Cleaner 9
Caesar 10
STC-H246 11
Equipping the Modern Audio Forensic Laboratory 12

AUTOMATED VOICE DATABASES AND OPERATIVE SPEAKER SEARCH

Methods of biometric features extraction and comparison 15
VoiceNet 18
National voice search and information systems 22
Trawl X 24

TRAINING, THEORETICAL AND PRACTICAL SUPPORT

Training courses and methodologies 27

AUDIO FORENSIC EXAMINATION AND SPEECH ENHANCEMENT

Audio Forensic experience and range of tasks 29

COMPANY OVERVIEW

Speech Technology Center (STC) is the world’s leading manufacturer of Audio Forensic Products.

STC employs 250 staff, more than 10% of whom hold a PhD, including 5 audio forensic experts with more than 5 years of practical experience.

STC Audio Forensic Achievements include:
- 2001-2002: investigation of the tapes from the submarine “Kursk”, with restoration of more than 120 hours of information.
- 2003: participation in the investigation of the terrorist phone recordings at the Moscow Trade Center.
- Over 100 identification cases in different languages.
- Forensic audio examinations for the USA, Great Britain, Belgium, India, Colombia, the Philippines and others.

STC experts are members of the Scientific Council of the Federal Forensic Expert Center of Russia, the Russian Federation Justice Department Expert Group, the Audio Engineering Society (AES), the International Society of Air Safety Investigators (ISASI), the International Speech Communication Association (ISCA) and the Institute of Electrical and Electronics Engineers (IEEE).

Conferences and Exhibitions

European Network of Forensic Science Institutes Expert Working Group Forensic Speech and Audio Analysis (ENFSI FSAAWG), International Association for Forensic Phonetics and Acoustics (IAFPA), Intelligence Support Systems (ISS), Audio Engineering Society (AES), CeBIT, SpeechTek, Milipol, Interspeech, DSA, NATIA, Safety&Security Asia.

COMPREHENSIVE FORENSIC AUDIO ANALYSIS

IKAR Lab

Professional hardware and software set for advanced audio/speech signal analysis

A unique software and hardware set that ensures comprehensive forensic analysis of analog and digital recordings.

Application

Analysis of audio information in specialized laboratories and forensic science centers, research and educational institutions and companies.

By Russian Federation Ministry of Justice order N156-2004.09.02, the Speech Interactive System of sound analysis and processing (SIS, the core software of the IKAR-Lab set) is included in the training curriculum for state forensic examiners.

Tasks solved with IKAR Lab:

- Speaker identification.
- Authenticity analysis of analog or digital audio recordings.
- Audio equipment testing and identification.
- Analysis of noises, diagnosis of the acoustic environment and recording conditions.
- Speech enhancement and audio restoration.
- Text transcription of low quality recordings.

IKAR-Lab includes:

- SIS 6.x - sound analysis and editing software application.
- Sound Cleaner Premium - professional real-time noise suppression and speech enhancement software.
- EdiTracker - software module for authenticity analysis of audio recordings.
- Transcriber Caesar - software for fast and convenient speech documentation.
- STC-H246 - external measuring input/output device.
- Professional microphone and headphones.
- Connecting cables.

SIS

Basic functionality

1. Signal input, segmentation and editing
- Precise signal input with STC-H246: sampling rate up to 200 kHz, 24 bit.
- Real-time dynamic spectrum visualization.
- Visualization of several waveforms in one window.

PIC 1

- Signal editing functions.
- Synchronization of several signals by shifting them by a specified time interval.
- Merging of two mono signals into stereo and vice versa.
- Various segmentation modes:
  - Temporary marks.
  - Unlimited number of permanent marks with text comments.

2. Signal visualization and analysis
- True multi-window interface.
- Different visualization modes for one or several signals:
  - Waveform.
  - Dynamic spectrogram (sonogram).
  - Averaged FFT power spectrum.
  - Cepstrum.
  - Autocorrelation.
  - Fundamental frequency (pitch) curve.
  - LPC frequency response (Linear Prediction Coefficients).
  - Histogram.
  - Partial correlation.
  - Zero crossing frequency.

PIC 2

- Presets for voice and recording channel analysis:
  - Male voice: Tenor; Baritone; Basso; ContrBasso.
  - Female voice: Soprano; MezzoSoprano; Contralto.
  - Child.
  - Microphone or telephone line.
- Visualization of several characteristics of a speech signal in one window:
  - Dynamic formant tracks, fundamental frequency curve and spectrogram.
  - Cepstrum and fundamental frequency curve.
  - Waveform of several signals with different sampling rates and bit depths.

PIC 3 (labels: formant tracks, fundamental frequency)

- Manual editing of formant tracks and fundamental frequency curve.
- Synchronization of several windows with different visualization modes for comparative analysis.
- Automatic search for comparable sound fragments for instrumental analysis, including automatic selection of vowels for further formant analysis.

PIC 4

- Automatic comparison of fundamental frequency statistics with results in a text report.

PIC 5

3. Noise suppression and signal preprocessing

- Built-in noise suppression filters and preparation of speech recordings for comparative instrumental analysis. Signal before and after noise suppression:

PIC 6

4. Additional modules (plug-ins)

- Support for third-party DirectX plug-ins.
- EdiTracker module for recording authenticity analysis.

EdiTracker

The unique software for audio authenticity analysis

Application

Fast analysis of audio authenticity in specialized laboratories and forensic science centers.

Functionality

Forensic investigation of the authenticity of audio recordings:
- Testing and calculation of the technical characteristics of the recording device: total harmonic distortion (THD), frequency response, wow and flutter (detonation), parasitic amplitude modulation.

PIC 7: THD calculation; parasitic modulation calculation

- Detection of traces of previous digital processing. Digital processing of an analog signal leaves traces in the signal that EdiTracker can detect automatically.

PIC 8, 9

- Search for traces of editing by the phase shift of harmonics. EdiTracker automatically finds the harmonic components of the signal and checks the continuity of their phase.
- Background noise scanning: analysis of the continuity of the background noise. An unexplained abrupt change in its characteristics can suggest previous editing.
- Auditory analysis: detailed list of linguistic features of tampering.
- The final report is generated automatically.
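The phase-continuity idea can be illustrated with a minimal Python sketch. It is not EdiTracker's algorithm, which the brochure does not describe. It assumes a steady tone of known frequency (for example mains hum, an assumption made here for illustration), and the function names `harmonic_phase_track` and `phase_jumps` are hypothetical. The phase of the tone is measured frame by frame against a reference that runs on absolute sample indices, so removing samples shows up as a phase jump.

```python
import cmath
import math

def harmonic_phase_track(signal, sample_rate, freq_hz, frame_len):
    """Phase (radians) of a tone of known frequency in consecutive frames.

    The reference oscillator runs on absolute sample indices, so an
    unedited steady tone yields the same phase in every frame.
    """
    phases = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        acc = 0j
        for n in range(start, start + frame_len):
            acc += signal[n] * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
        phases.append(cmath.phase(acc))
    return phases

def phase_jumps(phases, tolerance_rad=0.3):
    """Indices of frames whose phase differs from the previous frame by more than the tolerance."""
    jumps = []
    for i in range(1, len(phases)):
        diff = phases[i] - phases[i - 1]
        diff = math.atan2(math.sin(diff), math.cos(diff))  # wrap to (-pi, pi]
        if abs(diff) > tolerance_rad:
            jumps.append(i)
    return jumps

# Synthetic check: a 50 Hz tone at 8 kHz, then the same tone with 40 samples
# (a quarter period) cut out at sample 4000.
RATE, FREQ, FRAME = 8000, 50.0, 800
tone = [math.sin(2 * math.pi * FREQ * n / RATE) for n in range(RATE)]
edited = tone[:4000] + tone[4040:]
print(phase_jumps(harmonic_phase_track(tone, RATE, FREQ, FRAME)))    # []
print(phase_jumps(harmonic_phase_track(edited, RATE, FREQ, FRAME)))  # [5]
```

The frame length is chosen to hold a whole number of tone periods, which keeps the per-frame phase estimate stable. Real recordings would need frequency tracking and a tolerance tuned to the noise level.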


Highlights

- Automated detection and analysis of “suspicious” spots in the recording.
- Automated identification of the recording device.
- Visualization of all steps of the analysis.
- Automatic logging of each type of analysis and composition of the final protocol for the expert’s report.
- Built-in instructions for each type of analysis.
- Faster authenticity examination procedure and more reliable expert conclusions.
- Practical recommendations for audio authenticity examination.

Sound Cleaner

Professional real-time noise suppression and speech enhancement software

Award of International forum “Security and Safety Technologies”

Application

- Noise suppression and enhancement of poor quality recordings.
- Text decoding (transcription) of noisy recordings.
- Audio pre-processing for identification.
- Noise suppression while collecting field data.

Functionality

- Sound input from line output and microphones or from sound files, followed by processing.
- Saving results in WAV format.
- Real-time noise suppression; filter settings can be changed without stopping playback.
- Automatic report on sound processing for the expert’s conclusion.
- Preset noise suppression schemes for typical cases.
- Saving filter parameters for later use.
- Can be used in third-party sound editors as a DirectX plug-in.

Highlights

- Patented methods.
- Efficient suppression of all types of noise.
- Adaptive to changing noises.
- Easy to use; results are immediately audible.
- Filters can be combined to process a noisy recording simultaneously.
- Processing of low quality recordings for further instrumental ID analysis: enhancement of “weak” formants.

Filters

Sound Cleaner can be supplied as a separate product or as a part of IKAR-Lab.
- Sound Cleaner contains 19 different modules of speech processing and enhancement:
  - 2 adaptive filters of broadband noises.
  - Parametric equalizer.
  - Harmonic filter.
  - Adaptive inverse filter.
  - Adaptive filter of tonal noises.
  - Impulse noise filter.
  - Dynamic filter (weak signal amplifying/weakening, strong signal weakening).
  - Anti-reverberation filter.
  - Stereo signal filtering (time and frequency processing).
  - Others.
- Practical recommendations for noise suppression and text decoding.

Caesar

Audio transcribing software

Application

Verbatim transcription and text protocols of forensic audio evidence.

Functionality

- Integration of a high-quality digital recorder with a text editor.
- Listening to and transcribing speech from the very start of downloading, without interruption.
- Automatic spell check.
- Automatic Gain Control (AGC).
- Noise suppression mode removes constant noises in the room.
- Sound stretcher slows down or speeds up speech without pitch distortion.

PIC 10

Highlights

- Transcription speed is significantly increased compared to any common text editing tool.
- Linking audio fragments to text allows transcription quality control and makes correction an easy task.
- MS Word can be used as the text editor.
- Easy-to-use, user friendly interface.
- Pedal for playback control.

Delivery Set

- Software application.
- Audio input/output device.

STC-H246

Sound input/output devices

The STC H216 or STC-H246 audio signal input/output device can be used for signal input/output in the IKAR Lab set.

STC-H246 technical characteristics

PIC 11

Device for measurement and generation of electric signals in the audio range in accordance with technical specifications. STC-H246 has a Certificate of measuring equipment type approval.
- Inputs/outputs:
  - analog: linear, balanced (symmetrical) and unbalanced (unsymmetrical).
  - digital: SPDIF coaxial and optical.
- ADC/DAC resolution: 24 bit.
- Nominal voltage level for analog inputs and outputs: 2.0 V.
- Signal-to-noise ratio in bypass channel (without weighting): 112 dB.
- Harmonic distortion rate (without weighting): 0.003%.
- Frequency response flatness in bypass channel: ±0.01 dB.
- Sampling rate for digital signals: 32; 44.1; 48; 88.2; 96; 192 kHz.
- Sampling rate for analog signals: 4; 8; 10; 11.025; 11.167; 16; 22.05; 32; 44.1; 48; 96; 192; 200 kHz.

Results of the RightMark Audio Analyzer testing

Testing chain: external loopback (line-out - line-in). 24-bit, 192 kHz.

Results:

Parameter                                          Value          Rating
Frequency response flatness (40 Hz - 15 kHz), dB   +0.02, -0.01   Perfect
Noise level, dB (A)                                -113.7         Perfect
Dynamic range, dB (A)                              113.4          Perfect
Harmonic distortion, %                             0.0002         Perfect
Intermodulation distortion + noise, %              0.0023         Perfect
Channel crosstalk, dB                              -109.7         Perfect
Intermodulation 10 kHz, %                          0.0032         Perfect

Grade: Perfect

Equipping the Modern Audio Forensic Laboratory

Main principles of an expert laboratory organization

Laboratory equipment ensures support for all basic audio forensic tasks:
- Lab core: server of the VoiceNet system or file server (if there is no voice database) for storage and registration of the incoming data.
- Experts’ working places are set up according to the quantity and types of the handled tasks (identification, diagnostics, input and storage of signals).
- All working places have network access to the file server.

Typical forensic laboratory equipment set

VoiceNet system server or file server (if there is no voice database) for storage and registration of the incoming data.

PIC 12

AWP - Automated working places for experts:

- Working place N1 - for signal input and authenticity analysis.
- Working place N2 - for identification analysis.
- Working place N3 - for noise suppression and text enhancement.

Automated working places:

- Working place N1, for visual examination and image capture of the investigated objects, and for input and authenticity analysis of the audio evidence:
  - SIS with the EdiTracker module.
  - Measuring input/output device STC-H246.
  - Players for various media.
  - Digital camera.

- Working place N2, for identification analysis:
  - SIS.
  - Measuring input/output device STC-H246.

- Working place N3, for noise suppression and text enhancement of low quality voice recordings:
  - SIS.
  - Sound Cleaner.
  - Transcriber Caesar.
  - Measuring input/output device STC-H246.

Each working place must be equipped with audio equipment:

- Dynamic headphones.
- Desktop active loudspeakers.
- High quality dynamic microphone.

Additional equipment:

- Local area network equipment.
- Equipment for producing hard copies of expert reports (local network laser printer).
- Professional digital voice recorder for recording voice samples.

Notes:
- Several working places can be combined in one (2 and 3, 1 and 4).
- Each working place is a workstation with the VoiceNet Operator application (if there is a national voice database system) or a workstation for registration and archiving of incoming audio data.

AUTOMATED VOICE DATABASES AND FAST SPEAKER SEARCH FOR INVESTIGATIONS

Methods of biometric features extraction and comparison

VoiceNet and Trawl X use language- and content-independent methods of biometric feature extraction and comparison. They ensure the efficiency and accuracy of the systems and surpass other systems existing in the market.

The primary method of comparison is the spectral-formant method. The pitch statistics method and the voice models method are used as auxiliary methods of comparison.

Operating speed and accuracy of voice biometric feature extraction and comparison are the highlights of the methods. Reliability is measured by the EER (Equal Error Rate), the error rate at the operating point where the FRR (False Rejection Rate) equals the FAR (False Acceptance Rate). Operating speed is the time needed for extraction and comparison.
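The EER definition can be made concrete with a minimal Python sketch. It assumes similarity scores where a higher value means "same speaker"; the function name `equal_error_rate` and the example scores are hypothetical and are not taken from VoiceNet or Trawl X. The threshold is swept over all observed scores and the point where FAR and FRR are closest is reported.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) for similarity scores (higher = more similar).

    genuine_scores: scores of same-speaker comparisons.
    impostor_scores: scores of different-speaker comparisons.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # FRR: same-speaker pairs rejected because their score is below t.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # FAR: different-speaker pairs accepted because their score reaches t.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Illustrative scores only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.0%} at threshold {threshold}")
```

With finite score sets FAR and FRR rarely coincide exactly, so the sketch averages the two rates at the closest point.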

Spectral-formant method

The spectral-formant method (SFM) is based on the unique shape of each person’s vocal tract, which is reflected in the visible speech (spectrograms) of different people. The difference in spectra can be seen most vividly in the formant tracks of voiced speech fragments. The spectral-formant method used in VoiceNet and Trawl X is based on the extraction and comparison of the positions and dynamics of three or more formants.

An example of formant representation of the phrase “Forensic audio” pronounced by two different persons is shown in picture 13 (the horizontal axis is time in seconds, the vertical axis is frequency in Hertz, and the energy level is depicted by the darkness of the trace).

The spectral-formant method has been protected by a Russian patent since 2004.

PIC 13

Method reliability was tested on the officially registered voice database RUSTEN and is given in table 1.

EER for the spectral-formant method depending on the duration of the speech recording:

TAB 1
Speech duration, sec   96x96  48x96  48x48  32x96  32x48  32x32  16x96  16x48  16x16
EER, %                 8      9.9    11     11.7   12.8   13.9   13.6   15     17.9

This method is used as the primary one for the following reasons:
- Compared with the others, the method works with low quality recordings. It accepts signals with a signal-to-noise ratio as low as 12 dB. Ideal for multi-modal investigations.
- Method reliability depends little on the emotionality of the speech.
- High speed of biometric features extraction (see page 24).

Pitch statistics method

The pitch statistics method (PSM) uses 16 different pitch parameters, including the average pitch value, maximum, minimum, median, percentage of areas with rising pitch, pitch logarithm variation, pitch logarithm asymmetry (skewness), pitch logarithm excess (kurtosis) and 8 more parameters.
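A minimal Python sketch of the eight named parameters follows. It assumes the pitch contour is already available as one fundamental-frequency value in Hz per voiced frame. The exact definitions STC uses are not given in the brochure, so the formulas below (population variance, standardized moments, frame-to-frame rise count) and the name `pitch_statistics` are assumptions, and the remaining 8 parameters are not reproduced.

```python
import math
import statistics

def pitch_statistics(f0_hz):
    """Summarise a voiced-frame pitch contour (Hz, one value per frame)."""
    if len(f0_hz) < 2:
        raise ValueError("need at least two voiced frames")
    logs = [math.log(f) for f in f0_hz]
    mean_log = statistics.fmean(logs)
    var_log = statistics.pvariance(logs, mu=mean_log)
    std_log = math.sqrt(var_log)
    if std_log > 0:
        # Standardised third and fourth central moments of log-pitch.
        z = [(x - mean_log) / std_log for x in logs]
        skewness = statistics.fmean([v ** 3 for v in z])
        excess = statistics.fmean([v ** 4 for v in z]) - 3.0
    else:
        skewness = excess = 0.0
    # Share of frame-to-frame steps in which the pitch rises.
    rising = sum(b > a for a, b in zip(f0_hz, f0_hz[1:]))
    return {
        "mean_hz": statistics.fmean(f0_hz),
        "max_hz": max(f0_hz),
        "min_hz": min(f0_hz),
        "median_hz": statistics.median(f0_hz),
        "rising_percent": 100.0 * rising / (len(f0_hz) - 1),
        "log_variance": var_log,
        "log_skewness": skewness,
        "log_excess": excess,
    }

stats = pitch_statistics([100, 110, 120, 110, 100])  # illustrative contour
```

Two recordings could then be compared by a distance between such parameter vectors; how the parameters are weighted in practice is not stated in the source.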

EER values for the pitch statistics method depending on the speech duration are given in table 2. Implementation of this algorithm became possible thanks to a fully automated pitch extractor created by STC professionals. An example of automated pitch extraction in the phrase “Forensic audio” pronounced by two different persons is shown in picture 14.

PIC 14

Method reliability was tested on the officially registered voice database RUSTEN and is given in table 2.

EER for the pitch statistics method depending on the duration of the speech recording:

TAB 2
Speech duration, sec    40*40  20*40  20*20  10*40  10*20  10*10
EER weighted value, %   15.9   17.0   17.7   18.4   18.9   19.7

The high speed of comparison and speaker search is the advantage of this method. However, its dependence on the speaker’s emotional state at the moment of speech production makes this method auxiliary in the VoiceNet and Trawl X systems.

Voice models method

The main idea of the method is to create a model of the speaker’s individual acoustic features. The modeling is based on Gaussian mixture models (GMM).

The degree of difference between two speakers can be estimated by the distance between them in the space of speaker-dependent features extracted from the recordings.
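The brochure states only that the modeling is GMM based. As a minimal Python sketch (not STC's implementation), the code below scores feature vectors against a diagonal-covariance GMM whose parameters are given rather than trained, and uses the difference in average log-likelihood between two speaker models as a simple dissimilarity measure. The model values, the two-dimensional features and the function names are all hypothetical.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    m = max(component_logs)
    return m + math.log(sum(math.exp(c - m) for c in component_logs))

def average_log_likelihood(frames, model):
    """Mean per-frame log-likelihood; model = (weights, means, variances)."""
    return sum(gmm_log_likelihood(x, *model) for x in frames) / len(frames)

# Hypothetical two-dimensional speaker models and test frames.
model_a = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
model_b = ([1.0], [[5.0, 5.0]], [[1.0, 1.0]])
frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
score_gap = average_log_likelihood(frames, model_a) - average_log_likelihood(frames, model_b)
```

A positive `score_gap` means the frames fit model A better than model B. Training the mixtures (for example with expectation-maximization) is outside the scope of this sketch.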

EER for this method is less than 5% for signals 20 sec long.

The reliability of the method increases when several recordings of the same speaker are available.

At the same time, this method places high requirements on the quality of the signal. Together with the long time needed for feature extraction and card creation, this makes the method auxiliary for VoiceNet and Trawl X.

Additional methods of signal processing

The use of various verification and speaker recognition technologies shows that the reliability of voice biometric systems depends on the quality of the signal, in particular on broadband noises and distortions in the transmission channels. To compensate for these noises and distortions automatically, a noise suppressor and a frequency compensator are built into VoiceNet and Trawl X. Dynamic spectra of a speech signal (recorded from a real telephone channel) before and after noise suppression are shown in pic 6. The unprocessed signal cannot be used for identification by any of the methods, whereas the processed signal can be used in full.

VoiceNet

Software system for voice database management and automatic speakers’ search

Industrial award “ZUBR 2007” in “Information Security”, Moscow, Russia

Application
- Local (regional) and federal voice databases for criminal case investigations.
- Fast search for offenders using voice recordings.

Functionality
- Speech recording input and storage in the database.
- Storage of additional information, including images, personal information and other text information describing the speaker and the circumstances of the case.
- Automatic noise suppression and normalization of the distortions from the sound transmission channel.
- Database management: unlimited number of voice recordings, editing of the speaker’s card template in accordance with national and federal legislation.
- Preset templates for speakers’ cards.

PIC 15

- Automatic calculation of voice biometric features using three independent methods.
- Different types of automatic search (here «known» means an identified person):
  - “known among known” (to avoid registration of the person under another name or with false personal information).
  - “unknown among known” or vice versa (unknown speaker ID or known person involved in other crimes).
  - “unknown among other unknown” (unknown person involved in other crimes).
- Equal Error Rate calculation for each search.
- Remote data input, remote database access, creation of distributed systems.
- Protection of biometric data from unauthorized access and modification with a digital signature.
- Flexible access control policy, secure transfer of biometric data.
- The recordings found by the search can be further examined with IKAR-Lab (page 3).

Highlights

- High automation: the highest possible degree of automation of speech processing and analysis. When all processes must be strictly controlled by an operator, a manual mode is available.

- Low quality recordings accepted: built-in automatic noise suppression and spectrum correction significantly expand the capability of the system to process and analyze poor quality signals (telephone or radio channel).

- High search reliability: simultaneous use of three content- and language-independent biometric methods ensures high search reliability:
  - Spectral-formant method (SFM).
  - Pitch statistics method (PSM).
  - Voice models method (VMM).

- Adaptability and flexible settings: the speakers’ index contains information fields which can be changed by the supplier or user in compliance with national legislation or department standards.

- High security of biometric information: modern cryptographic methods, digital signature and access restriction protect biometric data from unauthorized use and modification.

- Scalability: the system has no limitations on the quantity of data or the number of users. The system’s capacity can be increased with more powerful computers, additional operators’ stations and new voice databases.

- Audio format compatibility: the system operates with all audio formats using the codecs installed in the operating system. Standard Ethernet channels are used for data exchange between the workstations.

- Comprehensive approach: the system allows both searching and forensic audio examination for further submission of the results to court.

Sources of recordings

- Recordings received in investigations.
- Recordings received in trials (protocols of examination, confrontations, and court).
- Telephone calls to police and emergency services.
- Recordings provided by the victims.
- Voice samples taken from the suspects for registration in the voice bank and further examination.

Operating procedure

Operators receive the recordings with the accompanying information (on data media or via a communication channel). The operator listens to the recording and performs preliminary manual or automatic processing:
- Segmentation of the speakers.
- Deletion of non-speech fragments.
- Noise suppression and speech enhancement for low quality recordings.

The operator fills in a speaker’s card, including accompanying text and graphic information about the person and details of the recording and the crime. The name of the operator and the creation time are added automatically. Biometric features are calculated automatically and stored together with the card in the corresponding section of the voice database. Each card with biometric information is secured against falsification by a digital signature. The operator then starts a search for speakers whose biometric voice features match or are close to the voice of the speaker in the new card or in any card from the database.
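How a card can be protected against later modification is sketched below in Python. The brochure does not say which signature scheme VoiceNet uses. This sketch substitutes a keyed HMAC-SHA256 tag from the Python standard library for a true public-key digital signature, and the card fields, key and function names are hypothetical.

```python
import hashlib
import hmac
import json

def sign_card(card, key):
    """Return a hex tag over a canonical JSON encoding of the card (HMAC, not a public-key signature)."""
    payload = json.dumps(card, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_card(card, key, tag):
    """True if the card content still matches the stored tag."""
    return hmac.compare_digest(sign_card(card, key), tag)

# Hypothetical card content and key.
key = b"demo-key-not-for-production"
card = {
    "operator": "operator_01",
    "created": "2009-01-01T10:00:00",
    "speaker": "unknown",
    "features": [0.12, 0.87, 0.45],
}
tag = sign_card(card, key)

tampered = dict(card, speaker="someone else")
print(verify_card(card, key, tag))      # True
print(verify_card(tampered, key, tag))  # False
```

Serializing with sorted keys makes the tag independent of field order, so only a change in content invalidates it.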

PIC 16

Results

The results of a VoiceNet search are presented in a text report (pic. 16). All search tasks and results are saved in log files. To present the results as evidence in court, a forensic investigation should be carried out by means of IKAR-Lab.

VoiceNet basic characteristics

Operating speed

Operating speed is defined by the time spent on voice biometric feature extraction and by the number of pair comparisons performed per unit of time. On an 8-core PC, 26 thousand recordings can be compared in 1 hour.
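As a rough worked example in Python, the quoted figure can be turned into a search-time estimate. This assumes that "26 thousand recordings per hour" means 26,000 pair comparisons per hour and that the rate scales linearly with database size and number of queries; both are assumptions, and the function name is hypothetical.

```python
COMPARISONS_PER_HOUR = 26_000  # figure quoted above for an 8-core PC

def search_time_hours(database_cards, queries=1):
    """Estimated hours to compare each query against every card in the database."""
    return database_cards * queries / COMPARISONS_PER_HOUR

print(round(COMPARISONS_PER_HOUR / 3600, 1))  # comparisons per second
print(search_time_hours(100_000))             # one query against 100,000 cards
```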

Search reliability

Search reliability is defined by EER.

On the one hand, EER depends on the signal’s duration and on how comparable the speaker’s emotional and physical states are; on the other hand, it depends on the signal’s quality (signal-to-noise ratio, frequency response flatness, bandwidth, acoustic and electric distortions and others).

VoiceNet was tested on the officially registered database RUSTEN and the following results have been obtained:

Signal requirements

For the declared search reliability the sound signal (speech recording) must meet the following requirements:
- “Pure” speech duration: not less than 16 seconds.
- Signal-to-noise ratio in the 300...3400 Hz band: not less than 12 dB.
- Frequency response flatness in the 300...3400 Hz band: no worse than 15 dB.
- No echo or reverberation distortions.
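The requirements above can be expressed as a simple pre-check, sketched here in Python. The `RecordingQuality` structure and its field names are hypothetical. How the measurements themselves are obtained is not covered, and "no worse than 15 dB" is read here as a frequency response unevenness of at most 15 dB.

```python
from dataclasses import dataclass

@dataclass
class RecordingQuality:
    speech_seconds: float          # duration of "pure" speech
    snr_db: float                  # signal-to-noise ratio in 300...3400 Hz
    response_unevenness_db: float  # frequency response unevenness in 300...3400 Hz
    has_echo_or_reverb: bool

def requirement_failures(q):
    """Return the list of unmet requirements; an empty list means the recording qualifies."""
    failures = []
    if q.speech_seconds < 16:
        failures.append("less than 16 s of pure speech")
    if q.snr_db < 12:
        failures.append("signal-to-noise ratio below 12 dB")
    if q.response_unevenness_db > 15:
        failures.append("frequency response unevenness above 15 dB")
    if q.has_echo_or_reverb:
        failures.append("echo or reverberation present")
    return failures

good = RecordingQuality(20.0, 18.0, 6.0, False)
poor = RecordingQuality(10.0, 8.0, 20.0, True)
print(requirement_failures(good))
print(requirement_failures(poor))
```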

Delivery set

The VoiceNet complex includes two or more networked PCs with software applications operating as client-server:
- Server - software module providing storage and comparison of speakers’ cards.
- FormeBuilder (optional) - software application for voice database creation, user registration and access policy, database structuring and editing of database sections.
- Operator - software application for creating, viewing and editing database sections, and for managing speakers’ cards.

National voice search and information system

To ensure fast processing of search requests from regional and federal law enforcement agencies, a National Voice Search and Information System based on VoiceNet must include databases at two levels:
- Central.
- Regional.

The central database includes the regional databases. Regional databases contain information only about the criminals and crimes registered in their own territories. Thus, the structure of the national voice search and information system can be presented as shown in picture 17.

The central database is physically located in the administrative center. Operators of the central and regional law enforcement agencies can access the central database via Ethernet channels. All databases operate under VoiceNet-Server and MS Windows Server 2003.

A multi-processor server station or cluster increases the speed of complex calculations, e.g. search (comparison) among a great number of cards. The VoiceNet operator, working under the MS Windows XP operating system, adds cards, performs search tasks and analyzes the search results.

PIC 17: Central voice database structure

Information exchange between the server (voice database) and remote client (operator) is done via secured VPN (Virtual Private Network). All transferred information is encrypted with the highly reliable Blowfish symmetric algorithm.

Central and state databases are synchronized automatically with periodicity specified by a user.

Search results can be used by operators-analysts for further forensic speaker identification by means of specialized IKAR-Lab set (page 3).

Thus, the structure of the national voice search and information system for n regions can be presented as shown in picture 18.

PIC 18: National voice database system

Trawl X

Software application designed for real-time automatic speaker search

Gold Medal of the industrial award “ZUBR 2008” in “Information Security”, Moscow, Russia

Application

Trawl X is used as a part of the telephone interception system. Trawl X accepts speech recordings from any kind of telephone lines. The system provides real-time search of the recordings containing speech of a specified person(s) given in speech samples. Picture 19 shows one of the realizations of the system.

Trawl X was designed for search within a limited list of suspects.

PIC 19

Functionality

The system was designed for maximum usability. The operator needs no experience or special knowledge in voice forensic examination or voice recording analysis. The user copies the incoming recordings, which contain voices of unknown speakers, into one of the folders. The system compares the incoming recordings with the samples stored in a pre-defined folder. Once the system attributes an incoming recording to one of the samples, it moves the recording to the folder named after that sample. If the speech cannot be attributed to any of the samples, the file is moved to the unknown speaker folder. The decision threshold, expressed as the ratio of False Rejection Rate to False Acceptance Rate, is the only parameter that has to be set. As for every product, STC provides operator training for the system; for Trawl X it takes only one day.
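The folder-based workflow described above can be sketched in Python as follows. This is an illustration, not the Trawl X implementation. The scoring callback `score_fn`, the folder name `unknown_speaker` and the use of a plain similarity threshold (instead of the FRR-to-FAR ratio setting mentioned in the text) are assumptions.

```python
import shutil
from pathlib import Path

UNKNOWN = "unknown_speaker"  # hypothetical name of the unknown speaker folder

def choose_folder(scores, threshold):
    """Pick the best-matching sample name, or UNKNOWN if no score reaches the threshold.

    scores: mapping of sample name to similarity score (higher = more similar).
    """
    if not scores:
        return UNKNOWN
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else UNKNOWN

def sort_incoming(incoming_dir, output_dir, score_fn, threshold):
    """Move each file in incoming_dir into output_dir/<sample name or UNKNOWN>.

    score_fn(path) is a caller-supplied function returning the scores mapping
    for one recording; the speaker comparison itself is not shown here.
    """
    for path in sorted(Path(incoming_dir).iterdir()):
        if not path.is_file():
            continue
        target = Path(output_dir) / choose_folder(score_fn(path), threshold)
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target / path.name))

print(choose_folder({"suspect_a": 0.82, "suspect_b": 0.40}, threshold=0.7))
print(choose_folder({"suspect_a": 0.82, "suspect_b": 0.40}, threshold=0.9))
```

Raising the threshold trades false acceptances for false rejections, which is the single tuning decision the text leaves to the operator.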

Highlights

- Maximum automation: processing and analysis of voice recordings intercepted from digital (duplex) phone lines is completely automated. Note: modern digital communication systems (including cell phones), and consequently phone call logging equipment, provide two-channel voice recordings in which each channel contains the speech of only one speaker. The unique algorithm developed by STC enables automatic extraction and deletion of non-speech fragments, which can have a negative impact on search reliability.

- User friendly: the user’s task is just to copy the files.

- Works with low quality speech signals: the built-in automatic noise suppression, speech enhancement and frequency response correction functions, together with several methods of biometric feature extraction, considerably widen the possibilities of the Trawl X system for working with low quality speech recordings.

- High search reliability: high reliability of search is ensured by the simultaneous use of three content- and language-independent methods of biometric feature extraction and comparison (unlike other similar systems):
  - Spectral-formant method (SFM).
  - Pitch statistics method (PSM).
  - Voice models method (VMM).

- Scalability: the system has no limitations on the quantity of processed files or on the number of operators. System capacity is increased by using a more powerful PC.

- Audio format compatibility: the system operates with all audio formats using the codecs installed in the operating system.

Basic features

Operating speed and accuracy rate are the same as for VoiceNet (page 18).

TRAINING, THEORETICAL AND PRACTICAL SUPPORT

Training courses and methodologies

The user’s qualifications and skills in sound processing and speech treatment play an important role in operating STC hardware and software complexes.

The STC training team, together with the company’s audio professionals and experts, provides the following specialized training courses:

- Using the IKAR Lab hardware and software set for audio forensics (basic and advanced courses).
- Speech enhancement in field work using STC hardware devices.
- «Sound Cleaner»: speech enhancement of low quality recordings using the Sound Cleaner software complex.

The training is focused on acoustics, psychoacoustics, theory of speech production, digital sound processing and spectral analysis. It is divided into two parts, theory and practice, so the trainees can apply their theoretical knowledge while improving their practical skills with the IKAR Lab set.

Original STC methodologies are also included into the training courses:

- Speech enhancement and text transcription of poor quality recordings.
- Language independent speaker identification (this original STC methodology is approved by the Scientific Methodological Board of the Federal Forensic Expert Center of Russia, Russian Federation Justice Department Expert Group).
- Authentication (this original STC methodology is approved by the Scientific Methodological Board of the Federal Forensic Expert Center of Russia, Russian Federation Justice Department Expert Group).

The trainees are supplied with training materials for self-study, texts of the methodologies and sound examples. After the training STC issues a company certificate.

Since 1993 STC professional audio experts have participated in and given lectures at seminars and conferences held by Russian governmental organizations (Ministry of Interior, Drug Enforcement, Ministry of Justice).

AUDIO FORENSIC EXAMINATION AND SPEECH ENHANCEMENT

Audio Forensic experience and range of tasks

STC is a recognized leader in forensic audio examination and speech enhancement.

The chief expert of the STC expert department has over 20 years of experience in Audio Forensics and is a member of the Scientific Methodological Board of the Federal Forensic Expert Center of Russia, Russian Federation Justice Department Expert Group, a member of the International Society of Air Safety Investigators (ISASI) and vice-president of the “Russian Biometric Society”.

The head of the expert department has over 15 years of experience in Audio Forensics and is a member of the Scientific Methodological Board of the Federal Forensic Expert Center of Russia, Russian Federation Justice Department Expert Group, and a member of AES.

In 2001-2002 the STC expert team restored more than 120 hours of speech information recorded on the nuclear submarine “Kursk” after it was raised from the seabed.

In the last 5 years many difficult cases have been handled for governmental structures and private customers from Russia, the USA, Belgium, the United Kingdom, India, Colombia and the Philippines.

The following range of tasks lies in the field of our competence:

- Language independent voice identification.
- Authentication of analog and digital recordings.
- Identification of the recording equipment.
- Text decoding of low quality recordings.
- Noise suppression and speech enhancement.
- Analysis of the acoustic environment.