
POLYGRAPH 101: FOR THE CRIMINAL PRACTITIONER

Kenneth L. DAVIS 07NOV19

OBJECTIVES

I. Pros & Cons on whether or not to take a Polygraph

II. How are Relevant Test Questions Formulated & by Whom?

III. What is an “Inconclusive”/“No Opinion” Test Result?

IV. Why is the Post-Test Interview More Important Than the Test Result?

How can we Choose… When we ASSUME this!

PROS & CONS

When Considering the Following:

• Suspect(s)
• Accused
• Witnesses and/or Criminal Informant(s)
• Victim(s)

SUSPECT(S)

PRO:
o Helps to gauge who’s involved & what their involvement may be
o May gain Admissions & Confessions
o Cost/Time Effective*

CON:
o If there’s involvement…the Spotlight is on THEM
o Admissions & Confessions may be obtained

UNCHARTED WATER:
• TITLE IX Investigations!
• Family/Custody-CYS/Sexual Harassment

ACCUSED

PRO:
o May shed light on new information involving case
o Typically the Comm. does not engage unless there are exigent circumstances

CON:
o Could undermine Case

ILLUSTRATIONS: 1 (dating of injury) 2 (Det. SS) 3 (Wayne/Susq)

UNCHARTED WATER:
• ASSET FORFEITURE & Proffer$

WITNESSES AND/OR CRIMINAL INFORMANT(S)

PRO:
o Could aid the investigation - add Probable Cause
o Assist with Officer Safety

CON:
o Could change the Direction of the Investigation

VICTIM(S)

PRO:
o Comm. does NOT test victim(s)
o May add Credibility to Victim

CON:
o Could undermine Case & have negative impact on Victim

ILLUSTRATIONS: 1 (recent + vs -)

UNCHARTED WATER:
• Civil Cases

PROS & CONS

Other Things to Consider:

• Who’s the Examiner?
• When is the Polygraph Test being Implemented?

ILLUSTRATION 1* cost/time effective (Cco)

UNCHARTED WATER:
• Discovery – Quality Control

RELEVANT TEST QUESTION FORMULATION

Category Types of Relevant Questions:
1) Primary
2) Secondary

Who Determines the Relevant Test Questions?
• For the Comm.:
  o Investigator
  o District Attorney
• In the Private Sector:
  o The person who hired them

RELEVANT TEST QUESTION FORMULATION

HOWEVER!

The Examiner has the Ultimate Decision on the wording of the test question to follow validated testing procedures/formats!

RELEVANT TEST QUESTION FORMULATION

What Dictates the Number of Relevant Test Questions?

• The Information that is trying to be Verified
• The Validated Test Format that Best Incorporates the Aforementioned

SOMETHING To CONSIDER:
• GAMESMANSHIP**

ILLUSTRATION 1** gamesmanship (TD case denial)

INCONCLUSIVE/NO OPINION TEST RESULT

Simply – for that Validated Test Format – Not enough Numeric data for a Definitive Test Result

What Causes That?
• Possible Examiner Error…Did the examiner do anything to mitigate that?
• Outside Influences***
• Examinee Not Testable
• Examinee employing Countermeasures/Purposely Non-Cooperative (PNC)

ILLUSTRATIONS: 1 outside influences***

INCONCLUSIVE/NO OPINION TEST RESULT

If Not:
• Examiner Error
• Outside Influences
• Examinee (testable)

You Conduct a Post-Test Interview

• If Countermeasures/PNC
• You Conduct a Post-Test Interview
• Plus – Possible Consciousness of Guilt

WHY IS THE POST-TEST INTERVIEW MORE IMPORTANT THAN THE TEST RESULT?

First – let’s Discuss this Term!

DIFFERENTIAL SALIENCE

WHY IS THE POST-TEST INTERVIEW MORE IMPORTANT THAN THE TEST RESULT?

Post-Test Interviews HELP get an Understanding as to Why perhaps that person’s test result was DI-Deception Indicated or SR-Significant Responses.

ILLUSTRATION: 1 (13yr/#of times in bedroom)

HENCE: The IMPORTANCE of Post-Test Interviews & GAMESMANSHIP!

QUESTIONS

Thank You!

In many but not all jurisdictions, the Frye standard has been superseded by the Daubert standard. States still following Frye include Illinois, Minnesota, New York, Pennsylvania, and Washington, among others. Effective July 1, 2013, Florida no longer adheres to the Frye standard.

United States v. Lee, 315 F.3d 206 (3d Cir. 2003), cert. denied, 540 U.S. 858 (2003). Defendant pleads guilty to possession and transport of child pornography, and to enticing minor by computer to engage in sex. District court conditions supervised release on defendant's submission to periodic polygraph exams. Sentence affirmed. Among other arguments, defendant contends polygraph exam is unnecessarily burdensome, because polygraph results would be inadmissible at any revocation hearing. Actually, some courts have held admissible following Daubert, but Third Circuit need not reach that issue now, because polygraph could still be used by probation officer to enhance defendant's supervision and treatment.

Nawrocki v. Twp. of Coolbaugh, No. 01-1196 (3d Cir. Apr. 8, 2002) (unpublished). Teacher #1 receives various anonymous threatening and pornographic letters, and teacher #2 is suspected of being their source. School district arranges for polygraph of teacher #2, and polygrapher concludes that teacher #2 is responsible for letters. Teacher #1 files criminal charges, but teacher #2 is acquitted. Teacher #2 then brings malicious prosecution claims against teacher #1 under section 1983. District court enters judgment as matter of law for teacher #1, in part because polygraph results supplied probable cause. Admissibility affirmed. Polygraph results were not admitted for their truth, but to show probable cause. Although most appellate courts have continued to exclude polygraph evidence notwithstanding Daubert, district court did not err in admitting polygraph findings for this limited purpose.

317 Pa.Super. 118 (1983)

463 A.2d 1113

COMMONWEALTH of Pennsylvania v. Keith O. SMITH, Appellant.

Superior Court of Pennsylvania.

Petition for Allowance of Appeal Denied January 20, 1984.

Michael A. DeFino, Philadelphia, for appellant.

Robert B. Lawler, Assistant District Attorney, Philadelphia, for Com., appellee.

Before CERCONE, P.J., and CAVANAUGH and WIEAND, JJ.

WIEAND, Judge:

On appeal from an order denying P.C.H.A. relief, Keith O. Smith asks us to review issues which, on direct appeal to the Supreme Court, were not decided, apparently because the issues had not been preserved for appellate review. Appellant makes this request by asking us to hold that contentions regarding the alleged involuntariness of his confession possessed arguable merit and that prior counsel, therefore, must have been ineffective for failing to preserve them for determination on direct appeal.

Smith had been tried and convicted of robbery, conspiracy and murder of the second degree for his part in the holdup of a Philadelphia flower shop and the shooting of the proprietor. On direct appeal to the Supreme Court the conviction was upheld and the judgment of sentence was affirmed per curiam. See: Commonwealth v. Smith, 468 Pa. 375, 362 A.2d 990 (1976). A dissenting opinion, authored by Justice (now Chief Justice) Roberts, suggested that the majority had avoided reaching the merits of appellant's contention that his confession had been coerced by an inadequate understanding of his rights with respect to a polygraph examination because the issue had not been preserved in written post verdict motions. After his direct appeal had been denied, appellant filed a pro se P.C.H.A. petition, alleging that prior counsel had been ineffective for failing to preserve the issue deemed waived by the Supreme Court. Counsel was appointed to file an amended petition, which contained the additional contention that prior counsel had been ineffective in failing to preserve for review the assertion that appellant's confession was involuntary because of undue pre-arraignment delay, in violation of Commonwealth v. Futch, 447 Pa. 389, 290 A.2d 417 (1972). At a hearing on the amended petition, the representative of the Commonwealth stipulated that the issues which appellant wished to raise in post conviction proceedings had not been preserved by prior counsel in post verdict motions. However,

[317 Pa. Superior Ct. 122] there was otherwise no attempt to show that prior counsel had rendered constitutionally ineffective assistance. The P.C.H.A. court dismissed the petition, and Smith appealed.

After appellant had been arrested and had waived his Miranda rights, he was asked if he was willing to submit to a polygraph examination. Although warned at least twice of his right to remain silent and make no statement, he unhesitatingly agreed to take the examination and signed a written consent form.1 During a pre-polygraph interview, appellant confessed to his participation in the robbery but denied that he had shot the proprietor. No polygraph examination was thereafter conducted. During a pre-trial suppression hearing, in which appellant's confession was found to be voluntary, appellant contended, as he now does on appeal from the denial of P.C.H.A. relief, that the threatened use of the polygraph was coercive and that, therefore, his confession was involuntary and should have been suppressed. He argues specifically that he should have been warned (1) that the taking of the polygraph test was not mandatory; (2) that the results of the test were not admissible as evidence in a court; and (3) that a favorable test result would not necessarily require the police to release him from custody. These instructions, he urges us to hold, must be given in addition to Miranda warnings whenever questioning of a suspect takes place in a polygraph situation.

The polygraph has been acknowledged by the courts of this Commonwealth to be a valuable tool in the investigative process. See: Commonwealth v. Hernandez, 498 Pa. 405, 415, 446 A.2d 1268, 1273 (1982); Commonwealth v. Smith, 487 Pa. 626, 631, 410 A.2d 787, 790 (1980); Commonwealth v. Blagman, 458 Pa. 431, 435-436, 326 A.2d 296,

[317 Pa. Superior Ct. 123] [317 Pa. Superior Ct. 124] 298-299 (1974). Its use does not per se render a confession involuntary. Commonwealth v. Jones, 341 Pa. 541, 548, 19 A.2d 389, 393 (1941); Commonwealth v. Hipple, 333 Pa. 33, 39, 3 A.2d 353, 355-356 (1939). See: Thompson v. Cox, 352 F.2d 488 (10th Cir. 1965); United States v. McDevitt, 328 F.2d 282 (6th Cir. 1964). A confession is not involuntary merely because it was made in anticipation of, during, or following a polygraph examination. See: 89 ALR3d 236, and cases there gathered. In Pennsylvania, an inculpatory statement made during a pretest interview was held admissible in Commonwealth v. Cain, 471 Pa. 140, 146, 369 A.2d 1234 (1974) (Opinion of Eagen, J., in support of affirmance). Other decisions have impliedly found Miranda warnings adequate, for they have permitted evidentiary use of confessions when they have been made voluntarily in polygraph settings. See also: Commonwealth v. Hernandez, supra; Commonwealth v. Hitson, 482 Pa. 404, 393 A.2d 1169 (1978); Commonwealth v. Dussinger, 478 Pa. 182, 386 A.2d 500 (1978) (plurality opinion); Commonwealth v. Cunningham, 471 Pa. 577, 370 A.2d 1172 (1977); Commonwealth v. Cain, supra; Commonwealth v. Johnson, 467 Pa. 146, 354 A.2d 886 (1976); Commonwealth v. Jones, 457 Pa. 423, 322 A.2d 119 (1974); Commonwealth v. Blagman, supra; Commonwealth v. Marabel, 445 Pa. 435, 283 A.2d 285 (1971); Commonwealth v. Camm, 443 Pa. 253, 277 A.2d 325 (1971), cert. denied, 405 U.S. 1046, 92 S.Ct. 1320, 31 L.Ed.2d 589 (1972). In Wyrick v. Fields, 459 U.S. 42, 103 S.Ct. 394, 74 L.Ed.2d 214 (1982), the defendant, a soldier accused of rape, gave an inculpatory statement to investigators after being confronted with the results of a polygraph test showing his answers to be untruthful. The Court of Appeals granted habeas corpus relief on grounds that appellant's waiver of counsel had not been voluntary. The Supreme Court reversed. The defendant had been informed of his Miranda rights and had signed a written waiver thereof. This was sufficient to demonstrate a voluntary waiver. See also: Keiper v. Cupp, 509 F.2d 238 (9th Cir. 1975); People v. Mason, 29 Ill.App.3d 121, 329 N.E.2d 794 (1975); Grey v. State, Ind., 404 N.E.2d 1348 (1980); State v. Bowden, 342 A.2d 281 (Me. 1975); Lee v. State, 338 So.2d 395 (Miss. 1976); State v. Clifton, 271 Or. 177, 531 P.2d 256 (1975); McAdoo v. State, 65 Wis.2d 596, 223 N.W.2d 521 (1974).

There are no Pennsylvania decisions which have held Miranda warnings inadequate to insure voluntariness of a confession given during a pre-polygraph interview.2 If a suspect freely agrees to submit to a polygraph examination, as did the appellant in the instant case, there would seem to be no reason for holding Miranda warnings inadequate. Not only do such warnings advise the suspect that he has a right to remain silent and answer no questions, but they also instruct him regarding his right to consult counsel. Where an accused voluntarily confesses despite such warnings, his confession will not be rendered more voluntary because a consent to take a polygraph examination was accompanied by warnings that the results of the test have no evidentiary value and that his immediate release will not depend thereon.

Moreover, if a later judicial decision were to impose a requirement for warnings such as appellant suggests, it would not follow that his counsel had been ineffective for failing to preserve and present such an argument in post trial motions. Counsel is presumed to be effective. Commonwealth v. Miller, 494 Pa. 229, 233, 431 A.2d 233, 235 (1981); Commonwealth v. Strickland, 306 Pa.Super. 516, 526, 452 A.2d 844, 849 (1982); Commonwealth v. Norris, 305 Pa.Super. 206, 209-10, 451 A.2d 494, 496 (1982). He will not be deemed ineffective because he fails to predict and/or argue all possible future developments in the law. See: Commonwealth v. Arthur, 488 Pa. 262, 268 n. 3, 412 A.2d 498, 501 n. 3 (1980), cert. denied, 449 U.S. 862, 101 S.Ct. 166, 66 L.Ed.2d 79 (1980); Commonwealth v. Triplett,

[317 Pa. Superior Ct. 125] 476 Pa. 83, 89, 381 A.2d 877, 881 (1977); Commonwealth v. McCloud, 312 Pa.Super. 29, 42, 458 A.2d 219, 225 (1983); Commonwealth v. Warren, 307 Pa.Super. 221, 224, 453 A.2d 5, 7 (1982); Commonwealth v. Brinton, 303 Pa.Super. 14, 18-19, 449 A.2d 54, 57-58 (1982). The law at the time of appellant's trial and at the time of filing post trial motions did not require that, in order to be voluntary, a confession during a pre-polygraph examination be preceded by warnings in addition to those required by the Supreme Court in Miranda.3 Thus, counsel cannot be held ineffective for failing to preserve the absence of additional warnings in post trial motions.

Appellant's contention that his confession was the product of pre-arraignment delay is wholly lacking in merit.4 The rule in Commonwealth v. Futch, supra, now embodied in Pa.R.Crim.P. 130(a), has resulted in a three-prong test for determining when delay will mandate exclusion of inculpatory evidence. "The delay must be unnecessary; evidence that is prejudicial must be obtained; and the incriminating evidence must be reasonably related to the delay." Commonwealth v. Smith, supra 487 Pa. at 630, 410 A.2d at 789-790, quoting Commonwealth v. Williams, 455 Pa. 569, 572, 319 A.2d 419, 420 (1974). The only time period relevant to this determination is the amount of time "the accused is in custody up until the incriminating evidence is obtained. It would be ludicrous to suggest that the time following the statement was related to its procurement." Commonwealth v. Smith, supra 487 Pa. at 631, 410 A.2d at 790. See: Commonwealth v. Van Cliff, 483 Pa. 576, 587, 397 A.2d 1173, 1179 (1979), cert. denied, 441 U.S. 964, 99 S.Ct. 2412, 60 L.Ed.2d 1070 (1979); Commonwealth v. Goodwin, 460 Pa. 516, 527-528, 333 A.2d 892, 897 (1975); Commonwealth v. Futch, supra 447 Pa. at 394, 290 A.2d at 419. Furthermore, it is now settled that time

[317 Pa. Superior Ct. 126] devoted to a polygraph examination, when agreed to by the accused and administered promptly thereafter, will not be considered "unnecessary delay." Commonwealth v. Hernandez, supra 498 Pa. at 415, 446 A.2d at 1273; Commonwealth v. Smith, 487 Pa. at 631, 410 A.2d at 790; Commonwealth v. Hitson, supra 482 Pa. at 407, 393 A.2d at 1171; Commonwealth v. Blagman, supra 458 Pa. at 436, 326 A.2d at 299.

Appellant was arrested at or about 7:30 p.m. on May 5, 1974. His pre-polygraph interview commenced at 9:20 p.m., after he had been processed and warned of his Miranda rights and had signed a consent to submit to a polygraph examination. He began to give a statement, was warned again of his rights to remain silent and to have counsel present, and confessed at or about 9:40 p.m., less than two and a half hours after his arrest. The confession was thereafter reduced to writing and signed by appellant at 11:00 p.m., three and a half hours after arrest. The suppression court's finding that there had been no unreasonable delay, therefore, was fully sustained by the record, and a contrary contention in post verdict motions would have been frivolous. Counsel was not ineffective for failing to preserve this issue for appellate review.

Order affirmed.

FOOTNOTES

1. The written consent provided as follows: I, Keith Smith, do hereby voluntarily and willingly, submit to a detector examination in order to show, if possible, the truth and honesty of my statements, and I hereby release the Philadelphia Police Department, and the examiner, . . . from any and all claims resulting from, or arising out of this examination.

2. Appellant has not cited a decision from any jurisdiction which holds that the Constitution requires additional warnings where a confession is given in connection with a polygraph test. Instead, he relies on dicta appearing in United States v. Little Bear, 583 F.2d 411 (8th Cir. 1978) and Hunter v. State, 590 P.2d 888, 901 (Alaska 1979). He also relies upon the dissenting opinion of Justice Roberts in Commonwealth v. Smith, supra.

3. Miranda v. Arizona, 384 U.S. 436, 86 S.Ct. 1602, 16 L.Ed.2d 694 (1966).

4. Appellant was arrested in 1974, prior to the decision in Commonwealth v. Davenport, 471 Pa. 278, 370 A.2d 301 (1977), where the Supreme Court first enunciated the six hour rule.

Consciousness of Guilt

Virtually all United States courts, so long as a proper evidentiary foundation is made, permit evidence of conduct that indicates the consciousness of guilt by the defendant. The basis of admissibility of such evidence is that certain post‐criminal act behavior is probative of a guilty mind. Behavior that may be indicative of a “guilty mind” encompasses a wide range of acts, including but not limited to flight, concealment or destruction of evidence, alteration of appearance, attempted suicide, intimidation or bribes of prosecution witnesses, and use of false identification or aliases. Evidence, Criminal Cases‐Consciousness of Guilt, 29 Am. Jur. 2d § 318 (2018).

Curriculum Vitae

• CID Special Agent since 1972; 34 yrs investigative experience, 20 yrs as examiner.

• Federally Certified Polygraph Examiner

• 3 Term Past President, American Polygraph Association

• Trained - DOD Polygraph Institute & UVA - FBI Advanced Course.

• B.S. Criminal Justice - MA, Management

• Polygraph Lectures to Secret Service, CIA, DEA, Mexican Govt., Canadian Govt., DOD Polygraph Institute, APA and AAPP, various State Associations, US Army JAG School

Utility - How Polygraph Stacks Up

• Widacki, Jan & Horvath, Frank - Journal of Forensic Sciences 23 - Relative Utility and Validity of the Polygraph Technique and Three Other Common Methods of Criminal Investigation
  - 80 subjects/20 groups of 4 (3 innocent ...)

Method         Correct   Error   Inconclusive   Accuracy
Polygraph         18        1          1           95%
Handwriting       17        1          2           94%
Eyewitness         7        4          9           64%
Fingerprints       4        0         16          100%
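The percentages in the table above can be reproduced from the counts if accuracy is taken as correct calls divided by the calls actually made, with inconclusives left out. That reading is an assumption inferred from the numbers, not something the slide states. The sketch below recomputes each row under that assumption; the function names are illustrative only.

```python
# Counts transcribed from the table above: (correct, error, inconclusive),
# 20 cases per method.
results = {
    "Polygraph": (18, 1, 1),
    "Handwriting": (17, 1, 2),
    "Eyewitness": (7, 4, 9),
    "Fingerprints": (4, 0, 16),
}


def accuracy_excluding_inconclusive(correct, error):
    """Share of correct calls among the cases where a call was made.

    Assumption: this is how the slide's percentage column was derived.
    """
    return correct / (correct + error)


def decision_rate(correct, error, inconclusive):
    """Share of all cases in which the method reached any call at all."""
    return (correct + error) / (correct + error + inconclusive)


for method, (correct, error, inconclusive) in results.items():
    acc = accuracy_excluding_inconclusive(correct, error)
    decided = decision_rate(correct, error, inconclusive)
    print(f"{method:<13} accuracy={acc:.0%}  decided={decided:.0%}")
```

Printing the decision rate next to the accuracy shows why the 100% figure for fingerprints rests on only 4 of 20 cases.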

• Deliver envelope, sign for package with disguised signature, eyewitness, steal item in package and lie to polygraph
• NOTE: Fingerprints were of poor quality

What is a Polygraph?

• The term "polygraph" literally means "many writings." The name refers to the manner in which selected physiological activities are simultaneously recorded
• First used in 1906 by Sir James Mackenzie, a famous heart specialist in London, UK
• First admitted as evidence in an assault trial, 2 Feb 1935, State (WI) v Loniello & Grignano (Leonard Keeler ran the test)
• First used in military criminal case in the "Hess Crown Jewels" Case in 1948.

Polygraph History

• The polygraph is not an invented instrument but a compilation of several well established medical & psychological recording tools used over 100 years
  - 1875 - Angelo Mosso - Scientific Cradle
  - 1895 - Cesare Lombroso - Hydrosphygmograph
  - 1897 - A. Sticker - Galvanic skin phenomenon
  - 1906 - Sir James Mackenzie - Ink Polygraph
  - 1907 - S. Veraguth - Sweat gland activity
  - 1914 - Vittorio Benussi - Respiration
  - 1918 - Harold Burt - Inspiration/Exhalation
  - 1917 - William Marston - Discontinuous BP

Polygraph Theory

• One of the more commonly accepted theories is that an examinee's fear of detection produces a measurable physiological reaction that can be recorded when the examinee responds deceptively. There are over a dozen other theories that also support this psychophysiological reaction to deception.
• Psychological Set
• Salience - Gravitation towards that which is most immediately important
• Cognitive Dissonance - resulting from inconsistency between one's beliefs and one's actions
• All incorporate the Fear of Detection

What is Psychological Set

• "From all the energies about us, our sense organs select only certain ones. The others are tuned out just as effectively as we tune out the voice of one speaker on the radio so that we may hear that of another.
• ....Although several stimuli compete, only those fitting the need of the moment are selected."

Floyd L. Ruch, Psychology and Life 1948

What is Psychological Set

• "Selective Attention - When the mind takes in only some of the information it is exposed to, it is engaging in selective attention. Selective attention is an indispensable adaptive function. We cannot possibly attend to, let alone process, all the information that impinges on our faculties at any given moment. So, we focus on that which seems to us most important and filter out the rest."

Abnormal Psychology, Bootzin, et al. 1993

[Figure: Reticular Activating System (RAS) - relevant stimuli vs. noise]

What is Psychological Set

• "Double-Bind Effect: A situation constructed where only two escapes are obvious (other options are logically possible, but not allowed). For example, you can be truthful or you can lie; this restricts your obvious choices to only two - a double-bind."
  - Would prefer to tell the truth but not allowed
  - Talked out of truth and encouraged to lie

Psychological Set: Its Origin, Theory and Application, Polygraph, Vol 30, 196, James Matte & Robert Grove

What is Psychological Set

• Cognitive Dissonance: Set choices are not just based on the options available; choices are also influenced by the stakes involved in each choice.
• Based upon the link between the belief about a specific event and the threat of abandoning that belief, creating a conflict resolved by outside issues, not the specific issue.
  - Innocent believes he has come to be honest
  - Comparison questions create belief dissonance

Psychological Set: Its Origin, Theory and Application, Polygraph, Vol 30, 196, James Matte & Robert Grove

Polygraph Procedure

• Pre-test interview phase

• Data collection phase

• Data analysis phase

• Post-test interview - NDI Release

Pre-test Interview

• Rights Advisement
• Polygraph Statement of Consent
• Collection of Biographical & Medical Info
  - Fitness to test
  - Family, health and school/work history
• Instrumentation
  - Explanation of components, physiology and "FFF"
• Review of case facts
  - Ensure validity of questions
  - Reduce anxiety, anger and emotional baggage
• Question Review
  - Reinforce psychological set

Data Collection Phase

• Attachment of the Components
  - Pneumograph Assembly
  - Electro-dermal sensors
  - Cardio cuff
• Acquaintance Test (numbers test)
• Administration of the Polygraph
  - Usually 3-4 charts

Data Analysis

• Conduct numeric analysis
  - 7 position scale
    • Assign values of + or - 1, 2, or 3 to each comparison group (relevant vs. comparison)
  - 3 position scale
    • Assign values of + or - to each comparison group (relevant vs. comparison)
• Objective Scoring
  - Measure each component line length
  - Assign value from table
  - Determine score

Polygraph Conclusions

• "DI" Deception Indicated
• "NDI" No Deception Indicated
• "No Opinion" (AKA: Inconclusive)
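As a rough illustration of how the numeric analysis feeds these three conclusions, the sketch below totals the per-comparison scores from every chart and maps the total to DI, NDI, or No Opinion. The cutoff values, the sign convention, and the function name are all assumptions made for the example; the slides give no thresholds, and each validated test format has its own decision rules.

```python
# Illustrative cutoffs only: the source does not state decision thresholds.
NDI_CUTOFF = 6    # hypothetical: totals at or above this read "NDI"
DI_CUTOFF = -6    # hypothetical: totals at or below this read "DI"


def conclusion(chart_scores, scale=7):
    """Total the per-comparison scores from all charts and name the result.

    chart_scores: one list of integer scores per chart.
    scale: 7 allows -3..+3 per comparison, 3 allows -1..+1.
    Assumed sign convention: positive means the comparison question drew
    the larger reaction, negative means the relevant question did.
    """
    if scale not in (3, 7):
        raise ValueError("scale must be 3 or 7")
    limit = 3 if scale == 7 else 1
    total = 0
    for chart in chart_scores:
        for value in chart:
            if not -limit <= value <= limit:
                raise ValueError(f"score {value} is outside the {scale}-position scale")
            total += value
    if total >= NDI_CUTOFF:
        return total, "NDI"
    if total <= DI_CUTOFF:
        return total, "DI"
    return total, "No Opinion"


# Made-up score sheets, three relevant questions per chart.
print(conclusion([[1, 2, 1], [2, 1, 1], [0, 1, 2]]))
print(conclusion([[-2, -1, -3], [-1, -2, 0]]))
print(conclusion([[1, -1, 0], [0, 1, -1]], scale=3))
```

A total that lands between the two cutoffs is reported as No Opinion, which matches the description of an inconclusive as a lack of sufficient numeric data rather than a finding about the person.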

People are not inconclusive; polygraph charts sometimes are!

What is an "Inconclusive"

• A lack of valid data upon which to form an accurate opinion.
  - No different from fingerprints
  - No different from DNA
  - No different from witness descriptions

[Figure: No Deception Indicated - reactions to comparison questions vs. relevant questions at Spot 1, Spot 2, Spot 3]

[Figure: "Bell Curve" Distribution - symmetric distribution (50% / 50%) and asymmetric distribution, with DI and NDI regions]

Types of Questions

• Four basic question types used in specific issue polygraph testing. Two are analyzed for the detection of deception.
  - Irrelevant Questions
  - Relevant Questions
  - Comparison Questions
  - Symptomatic Questions

Relevant Questions

• A question asked during a polygraph examination that pertains directly to the matter under investigation for which the examinee is being tested.
  - The more physical the better!
  - The less abstract the better!
  - The simpler the better

Relevant Questions

• Address the primary issue
• Test direct involvement
• Use an action verb to describe the act committed
• Generally requires a "No" answer.
  - Exception for "victim"

Example Relevant Questions

• Did you steal that computer?
• Did you strike that man with that hammer?
• Did you make any of the writings on that check?
• Did you shoot that man?
• Did you set fire to that building?
• Did you use any cocaine while in that house?
• Did you personally receive any money for awarding that contract?

Comparison Questions

• In most specific issue polygraph testing formats, it is a test question that is a "Probable Lie." A "probable lie" is the denial of a misdeed that a person has more than likely engaged in, participated in or has considered.
• It is a question for which the physiological response is designed to be compared to the physiological response of the relevant question.
• Generally separated from the relevant issue by time or situation

Example Comparison Questions

• Before 2002, did you ever [steal from] [lie to] anyone who placed their trust in you?
• Between your 12th and 21st birthday, did you ever steal anything at all you haven't told me about?
• Did you ever provide false information on any official document?
• Prior to this year, did you ever participate in a sexual act you would be likely to lie about?

Accuracy

• American Polygraph Association (Vol 29, 2000, No 3)
• The National Academy for Scientific Investigative Study conducted a Field Validity Study of 3009 confirmed federal polygraph tests from 1998 and 1999. The study found examiners correctly identified 100% of NDI examinees and 99.5% of DI examinees, excluding Inconclusive results*.

*94.8% NDI / 90.5% DI with inconclusive results included

Accuracy: "Phillip Crewson Study"

A review of literature published between 1986 and 2001 was conducted to evaluate studies reporting the accuracy and reliability of screening and diagnostic tests in polygraph, medicine, and psychology. Data for 198 studies were collected from 145 articles. Accuracy estimates are the combined average of sensitivity and specificity across all studies found within a particular category (1.00 = 100% accuracy).
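To make the accuracy measure described above concrete, the sketch below averages sensitivity and specificity for each study and then averages those figures across a category. This is one reading of "combined average"; the review may have pooled studies differently. The counts are invented for illustration and are not taken from the review, and "positive" is assumed here to mean a deceptive examinee.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual positives (e.g. deceptive examinees) correctly identified."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of actual negatives (e.g. truthful examinees) correctly identified."""
    return true_neg / (true_neg + false_pos)


def study_accuracy(true_pos, false_neg, true_neg, false_pos):
    """Simple average of sensitivity and specificity for one study."""
    return (sensitivity(true_pos, false_neg) + specificity(true_neg, false_pos)) / 2


def category_accuracy(studies):
    """Mean of the per-study accuracy figures within one category."""
    scores = [study_accuracy(*counts) for counts in studies]
    return sum(scores) / len(scores)


# Hypothetical counts per study: (true_pos, false_neg, true_neg, false_pos)
made_up_studies = [(45, 5, 40, 10), (30, 10, 35, 5)]
print(f"{category_accuracy(made_up_studies):.2f}")
```

Averaging the two rates keeps a test from looking accurate merely because one outcome is far more common than the other in the sample.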

[Figure: Accuracy (0.50 to 1.00) of diagnostic and screening tests in psychology, medicine, and polygraph]

Accuracy of Various Diagnostic Tools

The average accuracy reported for 37 diagnostic polygraph studies (specific issue) was similar to:
• MRI (17 studies)
• CT (19 studies)
• Ultrasound (38 studies)
• MMPI (17 studies)

[Figure: Accuracy (0.50 to 1.00) for MRI, Polygraph, CT, US, Plain Film, and MMPI]

MMPI had the lowest reported accuracy.

Diagnostic Accuracy by Target Condition

[Figure: Diagnostic accuracy (0.50 to 1.00) by target condition: Acute Appendicitis-CT, Brain Tumor, Carotid Artery Disease, Acute Appendicitis-US, Breast Cancer-US, Deception-Polygraph]

Conclusion

"The level of accuracy and agreement reported in the polygraph literature is consistent with the medical and psychological literature."

Executive Summary: A Comparative Analysis of Polygraph with other Screening and Diagnostic Tools, National Polygraph Consultants, Inc.

http://www.nationalpolygraphconsultants.com/execSummary.pdf

Problem with Polygraph Research

• Polygraph involves human interaction
  - Every human is different
  - Every polygraph examiner is different
  - Every issue is different
• Real life reactions differ from the laboratory
  - Fear of detection is difficult to replicate in the lab
  - Real life involves
    • Anger
    • Fear
    • Social stigma
    • Outside influences

Accuracy

• 80 research projects published since 1980
• 6,380 polygraph examinations or sets of charts from examinations
• 12 studies of the validity of field examinations, following 2,174 field examinations, providing an average accuracy of 98%.
• Researchers conducted 11 studies involving the reliability of independent analyses of 1,609 sets of charts from field examinations confirmed by independent evidence, providing an average accuracy of 92%.
• Researchers conducted 41 studies involving the accuracy of 1,787 laboratory simulations of polygraph examinations, producing an average accuracy of 80%.
• Researchers conducted 16 studies involving the reliability of independent analyses of 810 sets of charts from laboratory simulations, producing an average accuracy of 81%.

What do they all have in common?

• Medical tests
• Psychological tests
• Vision tests
• Hearing tests
• Academic tests
• Polygraph tests

Accuracy depends upon the skill of the examiner and the cooperation of the examinee

Example Chart

[Figure: example polygraph chart with tracings for thoracic breathing, abdominal breathing, electro-dermal activity, and cardio activity]

NDI: Reaction at 4 & 6 compared to 5

[Figure: polygraph charts 2-4 showing the question sequence]

Countermeasures

Can the test be beaten? How do you do it?
- Spontaneous Countermeasures (Occur often)
  • Ineffective but used by both DI & NDI people
  • Usually obvious, usually late and often exaggerated!
- Physical Countermeasures (Most often taught)
  • Usually delayed
  • Over reactive - difficult to control - exaggerated
  • Movement sensors and new algorithms very effective at detection
- Mental Countermeasures (Difficult to master)
- Drugs - None known to differentiate between comparison and relevant questions

Countermeasures on Wrong Question

[Figure: polygraph software display of charts 1-4 showing countermeasures applied on the wrong question]

Real Reactions v. Countermeasures

Polygraph & Military Court

• I doubt, though, that the rule of per se exclusion is wise, and some later case might present a more compelling case for introduction of the testimony than this one does. Though the considerable discretion given to the trial court in admitting or excluding scientific evidence is not a constitutional mandate, see Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 587 (1993), there is some tension between that rule and our holding today. And, as Justice Stevens points out, there is much inconsistency between the Government's extensive use of polygraphs to make vital security determinations and the argument it makes here, stressing the inaccuracy of these tests.

Consenting - Kennedy J. SUPREME COURT OF THE UNITED STATES US v. Scheffer Polygraph & Military Court • The stated reasons for the adoption of Rule 707 do not rely on any special military concern. They merely invoke three interests: (1) the interest in excluding unreliable evidence; (2) the interest in protecting the trier of fact from being misled by an unwarranted assumption that the polygraph evidence has "an aura of near infallibility"; and (3) the interest in avoiding collateral debates about the admissibility of particular test results.

• It seems clear that those interests pose less serious concerns in the military than in the civilian context. Disputes about the qualifications of the examiners, the equipment, and the testing procedures should seldom arise with respect to the tests conducted by the military. Moreover, there surely is no reason to assume that military personnel who perform the fact-finding function are less competent than ordinary jurors to assess the reliability of particular results, or their relevance to the issues. Thus, there is no identifiable military concern that justifies the President's promulgation of a special military rule that is more burdensome to defendants in military than the evidentiary rules applicable to the trial of civilians.

Stevens, J., dissenting
SUPREME COURT OF THE UNITED STATES
US v. Scheffer

Brief History of the Polygraph

Since the dawn of civilization, mankind has sought ways to distinguish truthfulness from lying in those individuals suspected of criminal wrongdoing. Various inventive techniques for the verification of truth and the detection of deception have been tried over the centuries, many of these being ridiculous and cruel. Despite their primitiveness, each technique was based on the assumption that some form of physiological reaction occurred within a person when confronted with certain stimuli regarding a specific event under investigation, and that this physiological reaction would, in turn, be manifested in certain recognizable external symptoms that were indicative of honesty or deception.

Polygraphy — the science of truth verification based upon psychophysiological analogues — is barely 100 years old.

In 1730, British novelist Daniel Defoe wrote an essay entitled "An Effectual Scheme for the Immediate Preventing of Street Robberies and Suppressing all Other Disorders of the Night", wherein he recommended that taking the pulse of a suspicious fellow was a practical, effective and humane method for distinguishing truthfulness from lying. Defoe's was an early and insightful suggestion to employ medical science in the fight against crime.

In 1878, science first came to the aid of the truthseeker through the research of Italian physiologist Angelo Mosso. It was then that Mosso used an instrument called a plethysmograph in his research on emotion and fear in subjects undergoing questioning and he studied the effects of these variables on their cardiovascular and respiratory activity. Mosso studied blood circulation and breathing patterns and how these changed under certain stimuli. The use of the plethysmograph revealed periodic undulations or waves in a subject's blood pressure caused by the respiratory cycle in response to certain stimuli. Angelo Mosso was the first scientist to report on experiments in which he observed that a person's breathing pattern changed under certain stimuli, and that this change, in turn, caused variations in their blood pressure and pulse rate.

Although not for the purpose of detecting deception, Sir James Mackenzie, M.D., constructed the clinical polygraph in 1892, an instrument to be used for medical examinations with the capability to simultaneously record undulated line tracings of the vascular pulses (radial, venous and arterial), by way of a stylus onto a revolving drum of smoked paper.

Until the end of the 19th century, no measuring device for the detection of deception had ever been used. The first use of a scientific instrument designed to measure physiological responses for this purpose came in 1895 when Italian physician, psychiatrist and pioneer criminologist Cesare Lombroso modified an existing instrument called a hydrosphygmograph and used this modified device in his experiments to measure the physiological changes that occurred in a crime suspect's blood pressure and pulse rate during a police interrogation.

Notably, Lombroso's early device for measuring pulse rate and blood pressure is similar to the cardiosphygmograph component of the contemporary polygraph. Although Cesare Lombroso did not invent the hydrosphygmograph, he is accorded the distinction of being the first person to have used the instrument successfully as a means for determining truthfulness from deception in crime suspects. On several occasions, he used the hydrosphygmograph in actual cases to assist the police in the identification of criminals.

In 1906, Sir James Mackenzie refined his clinical polygraph of 1892 when he devised the clinical ink polygraph with the help of Lancashire watchmaker, Sebastian Shaw. This instrument used a clockwork mechanism for the paper-rolling and time-marker movements and it produced ink recordings of physiological functions that were easier to acquire and to interpret. Interestingly, it has been written that the modern polygraph is really a modification of Dr. Mackenzie's clinical ink polygraph.

In 1914, Italian psychologist Vittorio Benussi discovered a method for calculating the quotient of the inhalation to exhalation time as a means of verifying the truth and detecting deception in a subject. Using a pneumograph — a device that recorded a subject's breathing patterns — Benussi conducted experiments regarding the respiratory symptoms of lying. He concluded that lying caused an emotional change within a subject that resulted in detectible respiratory changes that were indicative of deception.

Dr. William Moulton Marston, an American attorney and psychologist, is credited with inventing an early form of the lie detector when, in 1915, he developed the discontinuous systolic blood pressure test which would later become one component of the modern polygraph. Dr. Marston's technique used a standard blood pressure cuff and a stethoscope to take intermittent systolic blood pressure readings of a suspect during questioning for the purpose of detecting deception.

In 1921, John A. Larson, a Canadian psychologist employed by the Berkeley Police Department, in California, developed what many consider to be the original lie detector when he added the item of respiration rate to that of blood pressure. He named his instrument the polygraph — a word derived from the Greek language meaning many writings — since it could read several physiological responses at the same time and document these responses on a revolving drum of smoked paper. Using his polygraph, John A. Larson was the first person to continually and simultaneously measure changes in a subject's pulse rate, blood pressure and respiratory rate during an interrogation. His polygraph was used extensively, and with much success, in criminal investigations.

In 1925, Leonarde Keeler, who had gained firsthand experience in polygraph as a result of working with John A. Larson at the Berkeley Police Department, worked to devise a polygraph that used inked pens for recording the relative changes in a subject's blood pressure, pulse rate and respiratory patterns, thus eliminating the need for smoking the paper and then preserving it with shellac.

In 1926, the Keeler Polygraph came on the market as the new and improved lie detector, an enhanced version of John A. Larson's polygraph.

In 1938, Leonarde Keeler further refined the polygraph when he added a third physiological measuring component for the detection of deception, the psychogalvanometer, which measured changes in a subject's galvanic skin resistance during questioning, thus signaling the birth of the polygraph as we know it today.

In 1939, Leonarde Keeler patented what is now understood as the prototype of the modern polygraph — the Keeler Polygraph. Today, Leonarde Keeler is known as the father of polygraph.

In 1947, John E. Reid of Chicago, Illinois, developed the Control Question Technique (CQT), a polygraph technique that incorporated control (comparison) questions which were designed to be emotionally arousing for non-deceptive subjects and less emotionally arousing for deceptive subjects than the relevant questions previously used. The Control Question Technique (CQT) replaced the Relevant/Irrelevant Question Technique (RIT), which used relevant or irrelevant questions during a polygraph examination. The Reid Control Question Technique was a major breakthrough in polygraph methodology.

In 1948, Leonarde Keeler founded the world's first polygraph school — the Keeler Polygraph Institute — in Chicago, Illinois.

In 1960, Cleve Backster, building upon the Reid Control Question Technique, developed the Backster Zone Comparison Technique (ZCT), a polygraph technique which primarily involved an alteration of the Reid question sequencing.

Cleve Backster also introduced a quantification system of chart analysis, thus making it more objective and scientific than before. This system for the numerical evaluation of the physiological data collected from the polygraph charts has been adopted as standard procedure in the polygraph field today.

Since 1962, the study of the use of computers in the physiological detection of deception has progressed through several phases.

In the late 1970s, Dr. Joseph F. Kubis, of Fordham University in New York City, was the first researcher to explore potential computer applications for the purpose of polygraph chart analysis.

During the 1980s, research was conducted on computerized polygraph at the University of Utah by Drs. John C. Kircher and David C. Raskin and, in 1988, they developed the Computer Assisted Polygraph System (CAPS), which incorporated the first algorithm to be used for evaluating physiological data collected for diagnostic purposes.

In 1992, the polygraph made its official entrance into the computer age.

In 1993, statisticians Dr. Dale E. Olsen and John C. Harris at Johns Hopkins University Applied Physics Laboratory, in Maryland, completed a software program called PolyScore, which used a sophisticated mathematical algorithm to analyze the polygraph data and to estimate a probability or degree of deception or truthfulness in a subject.

PolyScore 3.0 Polygraph Software was developed by analyzing the data from polygraph examinations administered in 624 real criminal cases in which 303 suspects were non-deceptive and 321 suspects were deceptive.

In 2003, PolyScore 5.1 Polygraph Software was developed by analyzing the data from polygraph examinations administered in 1,411 real criminal cases provided by the United States Department of Defense Polygraph Institute for study and comparison purposes.

PolyScore is a computerized polygraph chart scoring algorithm that uses statistical probability to arrive at truthfulness or deception. It has been shown that validated algorithms have exceeded 98 per cent accuracy in quantifying, analyzing and evaluating the physiological data collected from polygraph examinations administered in real criminal cases.

In 2003, the U.S. Department of Energy commissioned a review committee of The National Academy of Sciences to study the scientific evidence on the polygraph. In this endeavour, the committee sifted through existing evidence in the polygraph research literature and did not conduct any new laboratory or field research on polygraph testing for, as they clearly reported, real-world conditions are difficult, if not impossible, to replicate in a mock-crime setting or a laboratory environment for the purpose of assessing polygraph effectiveness.

The review committee of The National Academy of Sciences concluded that, although there may be alternative techniques to polygraph testing, none can outperform the polygraph, nor do any of these yet show promise of supplanting the polygraph in the near future.

Withstanding more than a century of research, development and widespread use, the polygraph test remains the most effective means of verifying the truth and detecting deception.
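
Accuracy figures such as the 98 per cent quoted above are easier to weigh alongside the proportion of examinees who are actually deceptive. The following is a minimal sketch in Python, not part of PolyScore or of the National Academy of Sciences review, of how such a figure could translate into predictive values. It assumes, purely for illustration, that the quoted accuracy applies equally to deceptive and truthful examinees; the function name `predictive_values` and the base rates are hypothetical.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) for a test, given an assumed base rate of deception.

    sensitivity: share of deceptive examinees the test flags as deceptive
    specificity: share of truthful examinees the test clears as truthful
    base_rate:   assumed share of examinees who are actually deceptive
    """
    true_pos = sensitivity * base_rate
    false_neg = (1 - sensitivity) * base_rate
    true_neg = specificity * (1 - base_rate)
    false_pos = (1 - specificity) * (1 - base_rate)

    # PPV: of those flagged as deceptive, the share who really are deceptive.
    ppv = true_pos / (true_pos + false_pos)
    # NPV: of those cleared as truthful, the share who really are truthful.
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical base rates; 0.98 stands in for the accuracy figure quoted
    # in the text and is assumed to hold for both groups of examinees.
    for base_rate in (0.01, 0.10, 0.50):
        ppv, npv = predictive_values(0.98, 0.98, base_rate)
        print(f"base rate {base_rate:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions, the share of "deceptive" results that are correct falls as deception becomes rarer among those tested, even though the accuracy figure itself does not change.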

Polygraph Testing Of Sex Offenders

Treatment of Sex Offenders: Strengths and Weaknesses in Assessment and Intervention. Polygraph Testing Of Sex Offenders* Don Grubin

Introduction breaching a number of basic ethical principles relating to autonomy and non-malfeasance From tentative beginnings in the (Chaffin, 2011; Cross & Saxe, 2001; Meijer et 1990s, post conviction sex offender testing al, 2008), and, common to all critical commen- (PCSOT) has become increasingly incorpo- taries, lacks research to show that it is effec- rated into sex offender treatment and super- tive (Rosky, 2013). vision in both the United States and United Kingdom. McGrath et al (2010), for example, To what extent, then, does PCSOT reported that nearly 80% of community adult make a positive contribution to sex offender sex offender programs in the US and over half treatment and management, a question some- of residential ones make use of polygraph test- times simplified to, ‘does it work?’ As a first ing to inform treatment or supervision, while consideration, it must be able to differenti- in the UK mandatory testing of high risk sex ate truth telling from deception reliably, and offenders on parole was introduced in 2014 it should facilitate the disclosure of clinically after a number of trials. Its spread to other relevant information. If it meets these require- countries is likely, with a number of jurisdic- ments, it then needs to be demonstrated that tions actively considering its use. in doing so it has a beneficial impact on treat- ment and/or management. But even if PC- The growing influence of PSCOT, how- SOT does ‘work’ in this way, if in the process it ever, is not without controversy. The speed crosses ethical or legal red lines then it would with which it has been embraced by programs be hard to justify continued reliance on it. has tended to outpace evidence, with much of its impetus coming from clinical experience Polygraph Testing supported by a research base of limited ro- bustness. Only recently have more well de- As indicated above, there are two pri- signed studies been carried out. 
Although this mary outputs from a polygraph test, each of is not unusual when new procedures are intro- which complements the other. duced, PCSOT carries with it significant bag- gage associated with polygraph testing more The first, and what people usually as- generally. Thus, while proponents claim that sociate with polygraph testing, is test outcome, PCSOT makes important contributions to sex that is, whether an examinee ‘passes’ or ‘fails’ offender treatment and management by bring- the test. Although the focus is typically on ing to attention changes in risk, facilitating ‘,’ determination of truthfulness is disclosures, and perhaps encouraging offend- equally important. In order to shift attention ers to modify their behaviour (Grubin, 2008; away from the polygraph as a ‘lie detector,’ Levenson, 2009), others are more sceptical. therefore, many practitioners now refer to it Commentators, for example, have argued that as a means of ‘credibility assessment’ (Raskin the type of polygraph test used in PCSOT lacks et al, 2014). The fundamental questions here, validation, is unscientific and potentially dan- of course, are how accurate polygraphy is in gerous (Ben-Shakhar, 2008; Iacono, 2008), detecting deception and confirming honesty, may adversely affect the therapeutic alliance and whether that level of accuracy is sufficient between offender and therapist or supervisor for the setting in which it is being used. Un- (McGrath et al, 2010; Vess, 2011), is based fortunately, this second question is often over- on manipulation or intimidation, potentially looked, an important oversight when translat-

*This chapter appears in the; Treatment of Sex Offenders pp 133-156 Date: 01 March 2016 Polygraph Testing of Sex Offenders Don Grubin © Springer International Publishing Switzerland 2016 DOI 10.1007/978-3-319-25868-3_6 Print ISBN 978-3-319-25866-9 Online ISBN 978-3-319-25868-3 It appears with permission of Springer.

Polygraph, 2016, 45 (2) 97 Don Grubin

ing research findings regarding accuracy into bold countenance or a false tongue. (Defoe, practice – what may not be accurate enough 1730/quoted in Matte, 1996) in a context or in a court of law may be sufficient for investigating crime or Fairly, though, Defoe also noted, “The when used post conviction. experiment perhaps has not been try’d.”

The second output of a polygraph test While the phrase ‘a tremor in the blood’ is disclosure. Numerous studies have report- is so often quoted by those who write about ed that individuals report information during the history of the polygraph that it is in danger a polygraph examination they would otherwise of becoming a cliché, it nonetheless lays the have kept to themselves. Critics sometimes groundwork for both the basis of polygraph dismiss this effect as being a ‘bogus pipeline to testing, and some of the misconceptions asso- the truth’ as they say it depends on an exam- ciated with it. inee believing that the polygraph ‘works’, and that disclosures would not occur if examinees The involuntary physiological respons- did not hold this belief. This assertion, how- es associated with guilt and deception recog- ever, begs two questions: the extent to which nized by Defoe are now known to be caused disclosures are in fact dependent on a belief by activity in the autonomic nervous system. in the accuracy of the polygraph test, and if These responses, however, are not unique to they are, the level of accuracy required to trig- deception – lots of things can make the blood ger this effect. As will be discussed later in tremor besides guilt and lying, and no physi- this chapter, while many social psychology ological variable has yet been discovered that studies have demonstrated that disclosures is specific to deception. Because of this, it is do increase when subjects believe they are at- sometimes concluded that polygraphy, or any tached to a ‘lie detector,’ the strength of this other technique that relies on recording and effect is unclear. A third more philosophical interpreting physiological activity, cannot pos- consideration also arises in respect of this is- sibly work. 
But there need not be a unique sue – if disclosures are a function of a belief in physiological lie response for polygraph test- polygraph accuracy, but polygraphy is shown ing to be effective; instead, what matters is to meet the level of accuracy required to trig- whether physiological reactivity recorded in ger this belief, is it still correct to refer to the the context of a polygraph examination dis- phenomenon as a ‘bogus’ pipeline? criminates truth telling from deception at levels sufficiently above chance to make the Thus, although discussions about technique meaningful and worthwhile. False PCSOT often get bogged down in arguments positive and false negative findings occur with about accuracy levels and the basis of disclo- every test and investigation; more relevant is sures, both issues are more complex than they being able to quantify their frequency, and en- appear at face value. sure that whatever actions follow a test result take this error rate into account. What the polygraph records A second misconception that can be That there is an association between seen in Defoe’s observations is that physiolog- deception and physiological activity has been ical responses associated with deception are known for centuries. One of the earliest and the result of emotion, especially the emotions clearest expressions of this was by Daniel De- of fear and anxiety. This mistake leads some foe, who when writing about the prevention of to argue that anxious individuals, either in- street robberies in the 18th century, observed: herently or because they are made anxious by the circumstances of the test, are likely to Guilt carries fear always about with it; wrongly ‘fail’ for this reason. Other critics are there is a tremor in the blood of the thief that, concerned that in order for the test to work if attended to, would effectually discover him polygraph examiners must induce anxiety or . . . 
take hold of his wrist and feel his pulse, fear in examinees, which is ethically dubious there you shall find his guilt; a fluttering heart, (BPS, 1986; Vess, 2011). There is also a be- an unequal pulse, a sudden palpitation shall lief that psychopathic individuals, because evidently confess he is the man, in spite of a of their low levels of anxiety and emotional

98 Polygraph, 2016, 45 (2) Polygraph Testing Of Sex Offenders

responsiveness, are more likely to ‘beat’ the are simply told to pick a number and then to test. But though there is uncertainty regard- lie when asked if they have done so), and not ing the mode of action of polygraphy and the all polygraph formats require lying at all but neuropsychological basis of the physiological instead relate to the ‘recognition’ of relevant reactions it records, it is clear that emotional items. reactivity is only part of the story, and that a number of cognitive processes associated with The reality is that we are well short deception contribute to what the polygraph of understanding the mode of action of the observes. Anxiety and fear, except insofar polygraph, indicated by the number of theo- as they indicate that the examinee takes the ries proposed to explain it (National Research examination seriously, are likely to be minor Council, 2003; Nelson, 2015), although it is components at best. More will be said about now accepted that a range of mental processes this later. are involved in addition to emotion. Important are concepts and factors such as ‘differential Cardiovascular, respiratory, and elec- salience,’ (that is, differing degrees of impor- trodermal activity measured by recording de- tance or threat represented by the questions vices as opposed to being observed indirect- asked on the test), the cognitive work involved ly began to be used as a means of detecting in lying and in inhibiting truth telling (truth deception in the late 19th and first part of telling being the default position), autobi- the 20th centuries, mainly on their own but ographical memory, orienting to ‘threat,’ and in some cases together, both in and attention, (Senter et al, 2010; Nelson, 2015), the United States (Alder, 2007; Krapohl and which interact to produce in the au- Shaw, 2015). Criminologists, psychologists, tonomic nervous system that can be seen in a and physicians such as Cesare Lombroso, number of peripheral physiological processes. 
Hugo Munsterberg, Georg Sticker, Vittorio Benussi, Walter Summers, William Marston, While a lengthy discussion regarding John Larson and Leonarde Keeler researched the physiological and psychological mecha- and applied their various techniques, some- nisms underlying polygraphy cannot be pur- times with phenomenal claims of success. In sued here, the fundamental point is that con- the 1930s this work coalesced into instru- ducting a successful polygraph test is about ments that could simultaneously record data more than simply attaching the recording from the three physiological systems, giving hardware and then asking questions. In- rise to what became known as the polygraph. stead, the examiner must work at ensuring Although the hardware has improved since that whatever reactions are recorded are pro- then, and the process has become digitalised duced because the examinee is deceptive to so that ink pens writing on moving paper are the questions being asked, rather than by oth- no longer required, little has changed in terms er possible causes of autonomic arousal. This of the basic physiology that is recorded. is achieved in a lengthy pre-test interview, and requires examiner training and skill, in oth- In what way is activity in these phys- er words, a competent examiner. Given that iological systems associated with deception? the process is so heavily dependent on the ex- Traditionally polygraph examiners are taught aminer’s capabilities it has been argued that that what they are observing is a ‘fight flight, polygraphy should not be seen as a ‘scientific or freeze’ response caused by the fear of being test’ (BPS, 2004), but this is perhaps more of a caught out in a lie and the consequences that semantic than a practical issue – operator skill follow, implicitly accepting an emotional basis is important in all forms of scientific testing. to the test’s mode of action. 
There are a num- But whether ‘scientific’ or not, what matters ber of major problems with this explanation, is whether, in the hands of a competent exam- however: response characteristics that are as- iner, polygraph testing can be shown to be a sociated with deception on the polygraph test reliable means of distinguishing truth telling are not identical to what is seen in a ‘fight, from deception. flight or freeze’ scenario, deceptive responses are recorded even where there is little anxiety In terms of PCSOT, there is no need to and no consequence attached to being caught induce anxiety in examinees, anxious individ- out (for example, in tests where examinees uals are no more or less likely to ‘fail’ the test,

Polygraph, 2016, 45 (2) 99 Don Grubin

and, because the generation of fear or anxiety highly supportive of a meaningful association is irrelevant, psychopaths are no more or less between what the polygraph records, truth likely to wrongly ‘pass’ the test (Raskin & Hare, telling and deception. 1978; Patrick & Iacono, 1989). Furthermore, as will be discussed later, the examinee does The National Academies review was not need to be deceived about the accuracy of carried out on behalf of the US Department of polygraphy nor manipulated in other ways for Energy, triggered by allegations of the test to be successful. at the Los Alamos nuclear weapons facility, and was designed to advise on the use of poly- Polygraph accuracy graph testing for personnel security vetting. Its overall conclusion was that an error rate of While the physiological targets of poly- 10 to 20% was too high for this type of appli- graph testing have not changed much since cation given the low levels of deception likely the 1930s, numerous testing techniques, to be found in the population to be tested (one question formats, scoring systems, and spe- hopes that there are not many spies working cialised applications have emerged since then, in federal agencies), and the disproportionate often introduced with little empirical support. number of false positive findings such an error The plethora of approaches and the associated rate would imply. Although polygraph propo- lack of standardisation have made it difficult nents disagree with this conclusion, arguing to provide clear estimates of polygraph accu- that it is based on a misconception of the way racy. in which security vetting is undertaken be- cause in this setting it acts as an initial screen A number of initiatives have meant rather than providing a definitive outcome, the situation has improved (Kraphol & Shaw, more important in terms of PCSOT is the re- 2015). 
Chart scoring, as opposed to decisions view’s observation that polygraphy becomes based on a global overview of the polygraph viable when the underlying rate of deception is chart, was introduced in the 1960s, a hard- over 10% – a rate which most observers, even ening of testing protocols took place between those critical of polygraphy, would accept is the 1960s and 1990s, increased acceptance of probably exceeded in sex offender populations. blind scoring of charts as a means of Quali- ty Control to overcome the risk of examiner For a number of reasons, however, the bias became more commonplace in the 1990s, National Academies Review is not the end of research in the early 2000s better clarified the story, at least in terms of PCSOT. Its es- response patterns that are indicative of de- timate of accuracy is based on single issue, ception (and just as importantly, response ‘diagnostic’ tests, that is, tests in which a sin- patterns that aren’t) and the amount of vari- gle known issue is being investigated, for ex- ance explained by the different physiological ample, whether an individual was involved in channels, and in the late 2000s the American a bank robbery. Although this is sometimes Polygraph Association undertook an exercise the case in PCSOT, as when the focus is on to validate testing techniques (American Poly- specific behaviors reported to have occurred graph Association, 2011). All of this has pro- during an offence, or where the matter of con- vided a better scientific basis on which to eval- cern is whether the offender is responsible for uate the efficacy of polygraph testing. a new crime, the majority of tests carried out in PCSOT are screening in nature. In screen- The most definitive review of polyg- ing tests a number of behaviors are explored, raphy accuracy to date has been carried out but there is not a known event that underpins by the National Academies of Science in the the thrust of the exam. United States. 
It concluded that “polygraph tests can discriminate lying from truth telling Screening tests are generally consid- at rates well above chance, though well below ered to be less accurate then single issue tests, perfection” (National Research Council, 2003, although there are insufficient trials from p. 4). Accuracy for the most commonly used which to determine their precise level of ac- test format, the comparison question test (a curacy. Screening tests however tend to have version of which is employed in PCSOT), was higher false positive rates (tests which wrongly estimated to be between 81 to 91%, which is label an examinee as deceptive). Two studies

100 Polygraph, 2016, 45 (2) Polygraph Testing Of Sex Offenders

used anonymous surveys with sex offenders in the US to ask about their experiences of being wrongly accused of deception, and also of instances where deception had been missed (Grubin & Madsen, 2006; Kokish et al, 2005). The findings were very similar, with responses from offenders in both studies suggesting an accuracy rate for PCSOT between 80 and 90%, reassuringly similar to the National Academies estimate.

Because of its likely error rate, the utility of PCSOT tends to be emphasised rather than its accuracy, with disclosures seen as more important than test outcome. In addition, it is recommended that outcome in screening tests is reported as ‘significant response’ or ‘no significant response’ rather than ‘deception indicated’ or ‘no deception indicated’ as it is in single issue tests. However, a more recent initiative has expressed polygraph test outcome as a probability statement with confidence intervals derived from data normed on large sets of confirmed tests. Referred to as the ‘Empirical Scoring System’ (ESS), this allows a better judgment to be made about the degree of confidence one can have in a given test result (Nelson et al, 2011). Although the data base on which ESS is built could be larger, and while it still requires independent validation, this type of approach provides greater on polygraph test accuracy in environments such as PCSOT.

The error rate associated with polygraphy, and its screening function in most PCSOT settings, means that it is probably a mistake to talk about an individual ‘passing’ or ‘failing’ the test. One doesn’t pass or fail a screening exam of any sort. The aim of screening is to identify those who require further investigation. In the case of PCSOT, significant responses to some questions are observed, which might be thought of as ‘screening positive’, but this is different from failing a test. It is therefore probably more sensible to think in terms of positive and negative predictive values: the former referring to the likelihood of a true positive (that is, deception) when an individual shows a significant response, and the latter to the likelihood of truthfulness when no significant responses are recorded. It is usually the case that one is higher than the other, providing an indication of whether one should be more confident in deceptive or truthful calls (the first where the positive predictive value is higher, the second when the negative predictive value is).

There remains the problem of examiner competence and its impact on test accuracy. However, if properly trained examiners use correct techniques that are administered properly then their accuracy rate should be similar to that reported in the research literature. Ensuring this is the case requires a well-constructed quality assurance and quality control program, which unfortunately many PCSOT programs lack. But this is a reason to improve programs rather than to dismiss polygraphy. Provided it is in place the important question becomes not whether polygraph is ‘accurate’, but whether accuracy in the range of 80 to 90% is accurate enough.

The answer to this question will depend on how test outcome is used. An error rate of 10 to 20% is clearly too high to warrant sending someone to prison or taking away their livelihood, but not too high to inform decisions about treatment engagement, changes in monitoring conditions, or the need for further investigation into possible transgressions. This is particularly the case when one remembers that typically we make these types of decision based on our own determination of whether or not someone is deceptive, even though in experimental settings the ability of the average person to do so accurately is rarely above 60% (Bond & DePaulo, 2006; Vrij, 2000).

Utility and Disclosure

Polygraphy is known to increase the likelihood that an examinee will disclose previously unknown information. There are many anecdotal accounts of this phenomenon in both investigative and screening settings, but the best evidence for this effect is found in sex offender testing where numerous studies describe significant increases in self-report of previous offence types and victims, deviant sexuality, and risky behaviors (for example, Ahlmeyer et al, 2000; Grubin et al, 2004; Heil et al, 2003; Hindman & Peters, 2001; Madsen et al, 2004). This work, however, lacks robustness in that comparisons are usually made in terms of what was known about an offender before and after polygraph testing rather than

Polygraph, 2016, 45 (2) 101 Don Grubin

with contemporaneous comparison groups in which polygraph testing is not used. As critics readily point out, this makes it difficult to disentangle the effects of polygraphy from other factors such as treatment impact or changes in supervision.

The lack of a comparison group with which to determine polygraph efficacy in facilitating disclosures has been addressed in two large UK studies, both of which confirmed the findings of earlier work that showed increases in disclosure when polygraphy is used. In one of these studies polygraph testing was voluntary (Grubin, 2010), while in the other it was a mandatory condition of a parole license (Gannon et al, 2012; Gannon et al, 2014).

In the trial of voluntary testing (Grubin, 2010), the supervision of nearly 350 polygraphed offenders was compared with 180 sex offenders from probation areas where polygraphy was not used. Just over 40% of eligible offenders agreed to be tested, of whom 47% were tested more than once. The majority were taking part in treatment programs. Probation officers reported that new disclosures relevant to treatment or supervision were made in 70% of first tests, compared with 14% of the non-polygraphed offenders making similar types of disclosure in the previous six months. A similar difference was found in respect of retests (only in this case the comparison was with three months before). The disclosures made by polygraphed offenders were rated as ‘medium’ or ‘high’ severity (the former relating to behaviors indicative of increased risk, the latter to actual breaches or offences) in over 40% of cases. The odds of a polygraphed offender making a disclosure relevant to his treatment or supervision were 14 times greater than they were for non-polygraphed offenders.

Although the test and comparison groups reported in Grubin (2010) did not differ on demographic or criminological variables, the fact that those tested were volunteers could have introduced bias. Because of this the mandatory trial described by Gannon et al (2012 and 2014) was considered necessary before a decision could be reached about implementing mandatory testing nationwide (it was a requirement set by the UK Parliament). Like the earlier study, a comparison group was used. Unlike it, the mandatory trial was limited to high risk offenders (defined as those released on parole following a prison sentence of a year or more), and though many had undertaken sex offender treatment in prison, relatively few were involved in community treatment programs. The focus of the mandatory trial, therefore, was on the impact of polygraph testing on supervision only.

There were over 300 offenders in each group, which again did not differ on demographic variables. Although the mandatory trial involved an overall higher risk group and many fewer were in treatment than in the voluntary trial, its findings were similar. Significant increases were found in the number of individuals who made what were referred to as ‘clinically relevant disclosures’ and in the number of disclosures these individuals made in the polygraph group. This was particularly noticeable in respect of sexual and risk related behaviors. However, the odds ratio of a disclosure being made was lower at 3.1.

In both studies significantly more actions were taken by probation officers who managed offenders subject to polygraphy than by probation officers supervising comparison offenders. One interesting finding reported in Gannon et al (2012) was that while 73% of interviewed probation officers believed the offenders they supervised were ‘open and honest’ with them, this was the case for only 25% of the probation officers who supervised polygraphed offenders. This is perhaps an explanation for the finding in Grubin (2010) that whereas probation officers of polygraphed offenders were more likely to increase risk ratings, risk ratings were more likely to be decreased in the comparison group.

Although the impact of polygraph testing on disclosures is clear, the question still remains whether it is simply a ‘bogus pipeline’ effect. As described earlier, this refers to the increase in disclosures being the product of a belief that the polygraph ‘works,’ the implication being that disclosures would dry up in the absence of such a belief. As one critic commented in a newspaper article, it relies on offenders “not knowing how to use Google” (London Daily Telegraph, 2012).

A number of social psychology studies have demonstrated that subjects who believe


they are attached to a ‘lie detector’ appear to be more honest in their answers to questions regarding attitudes and behaviors, which has been interpreted as a reflection of social desirability or acquiescence biases (Jones & Sigall, 1971; Roese & Jamieson, 1993). But the effect is not in fact that great – a meta-analysis of 31 published reports found a mean effect size of d=.41, which is in the small to moderate range (Roese & Jamieson, 1993).

Another factor to take into account when considering the ‘bogus pipeline’ hypothesis is that all of the bogus pipeline studies are based on the use of a near 100% lie detector. It is not clear from them what would happen if, rather than being sold as being 100% accurate, the ‘lie detector’ was instead said to have an accuracy rate “well above chance, though well below perfection” as described by the National Academies in respect of polygraph testing (National Research Council, 2003). In as yet unpublished research our group found that a ‘lie detector’ claimed to have a 75% accuracy rate (i.e., a level below that attributed to polygraphy) appears to elicit disclosures with a frequency similar to that of a near 100% accurate lie detector. This would seem to suggest that if part of the increase in disclosures brought about by polygraph testing is due to a belief in its lie detecting properties, then whatever else it may be the pipeline is not a bogus one.

Regardless of the merits and impact of the ‘bogus pipeline effect,’ the much more psychologically interesting question is what makes individuals disclose in this setting anyway, bogus pipeline or not. It may be that offenders disclose because they believe they will be, or have been, ‘caught out’ by the polygraph, which would be consistent with research showing that one of the best predictors of whether a suspect will confess to a crime is the belief that there is good evidence against them (Gudjonsson et al, 2004). As indicated above, however, the ‘bogus pipeline effect’ itself is unlikely to be the entire reason for increased disclosures, explaining only a small part of the variance. It could be that a polygraph test allows the offender an opportunity to change his account in a face-saving manner (after all, he was found out by a ‘lie detector’), or it may simply be that the dynamics of the interview itself are different from what takes place in normal supervision. Whatever the reason, the effect deserves increased research attention, and consideration given as to how to enhance it.

One further issue to address in respect of disclosures is whether the circumstances of a polygraph test result in offenders making false admissions in order to please polygraph examiners or to explain a ‘failed’ test. Because many of the disclosures made in PCSOT are in any case difficult if not impossible to verify (for example, how can one determine whether or not an offender has been masturbating to deviant fantasies?) it can be a challenge to confirm their veracity. What little research there is in relation to this suggests that false admissions occur, but not often. Two studies have addressed this question using anonymous surveys with sex offenders in the US who were asked whether they had ever made false disclosures in a polygraph test (Grubin & Madsen, 2006; Kokish et al, 2005). In both studies fewer than 10% of offenders indicated that they had done so; in the Grubin & Madsen (2006) study, those who reported making false admissions had higher scores on the NEO neuroticism scale and lower scores on the conscientiousness scale, suggesting that those who make false admissions during a polygraph test may share characteristics with those who make false confessions in police interviews (Gudjonsson et al, 2004; Gudjonsson & Pearse, 2011). In any case, while the issue is not trivial, it does not seem to be a major problem.

Proponents of PCSOT argue that whatever the reason for increased disclosure by offenders who undergo polygraph tests, the effect is genuine and valuable. They ask whether critics are really suggesting that this information should not be sought or used because of concerns regarding the evidence base for the mechanisms that generate it. But resolution of this issue perhaps depends more on how PCSOT is implemented than on the academic arguments regarding polygraph itself.

The implementation of PCSOT

The initial use of polygraph testing with sex offenders was as a clinical assessment to assist treatment providers in gaining fuller histories with which to inform treatment


plans. The term ‘post conviction sex offender test’ started to be used in the 1990s in reference to tests administered to individuals under court order, court supervision or court ordered treatment, with the intention of enhancing treatment or improving supervision (Holden, 2000). Its aim was to generate more complete information about an offender’s history, sexual interests and functioning, and offending behavior based on disclosures and test outcome. This has been referred to as adding “incremental validity to treatment planning and risk management decisions” with which to improve decision making (Colorado, 2011), and can perhaps be thought of more simply as ‘information gain.’

In the late 1990s the ‘Containment Model’ was developed by practitioners in Colorado (English, 1998). It has since become the basis of many PCSOT programs in the United States, although it has not taken root in the United Kingdom. The Containment Model refers to a triangle formed by supervision officer, treatment provider and polygraph examiner, although others may also be involved, in which the offender is ‘contained.’ It depends on good communication between agencies, with information obtained by one informing the actions of others.

While the Containment Model has clear attractions from a public protection perspective, it implies that all sex offenders require high levels of external control to keep them from reoffending. Compliance in the immediate term may be obtained, but whether it leads to longer term change is uncertain. And though some offenders may require ‘containment,’ others genuinely seek to improve their internal controls and engage with treatment and supervision. In other words, there are some offenders who work with treatment providers and supervisors, and there are others who work against them. For the latter group containment may be necessary, with the polygraph serving primarily as a lie detector to indicate when risk is increasing (related to this is a finding of Cook et al (2014) that recidivism rates were higher in offenders who avoided or delayed their polygraph), but for the former group of offenders polygraphy can act as a truth facilitator, encouraging them to discuss problematic thoughts and behaviors and providing reassurance that their risk is stable. It should be remembered that polygraphy not only detects lies, it also catches offenders telling the truth.

Whether or not following a strict containment approach, PCSOT has moved away from being an accessory of treatment to assume a more central role in offender supervision. It remains, however, the servant of those working directly with the offender, functioning to provide information about whatever is most relevant at the time. In this respect, different test types are relevant depending on the offender’s circumstances.

Test structure

Before describing the types of test used in PCSOT, the basic structure of a polygraph session needs to be described. The typical format employed in PCSOT is the ‘comparison question technique.’ It consists of three phases: a pre-test interview, the examination itself, and a post-test interview.

The pre-test is the longest part of the examination, and can take from one to two hours. Amongst other matters information is collected about the examinee’s background and current behavior, and the test questions are established and reviewed in detail. Many disclosures take place during this part of the process.

The polygraph examination consists of 10 to 12 questions, of which just 3 or 4 target the areas of concern and are referred to as the ‘relevant questions.’ Responses to the relevant questions are compared with so-called comparison questions to determine whether or not they are indicative of deception. More will be said about this shortly. Polygraph questions need to be simple, answerable with a yes or no, and relate to specific behaviour rather than mental state, intention or motivation.

In the post-test interview the outcome of the exam is fed back, with the examinee given an opportunity to explain deceptive responses. In the UK study of voluntary testing, one third of disclosures were made during the post-test (Grubin, 2010).

As referred to above, in the comparison question technique relevant questions are evaluated against comparison ones. If physiological responses to the former are greater than the latter, the examinee is judged to be deceptive; vice versa and the examinee is considered truthful. The comparison questions often take the form of a ‘probable lie,’ that is, questions that the examinee is unlikely to be able to answer truthfully. Examples of probable lies are, ‘have you ever lied to a loved one?’ or ‘have you ever stolen from a family member?’ The theory is that truthful subjects will find these questions more concerning than the relevant ones because of their implications and thus show greater responses to them, while the deceptive examinee will be more responsive to the relevant questions because they represent more of a threat. The strength with which relevant questions exert a greater pull on the examinee than the comparison ones has been called ‘relevant issue gravity’ (Ginton, 2009), which is a tidy way of packaging the various cognitive processes that determine autonomic arousal in response to polygraph questions.

The probable lie approach has been criticised on a number of grounds. First, the underlying theory that the differential response to the two question types relates to truthful individuals being more worried about what are in effect less serious comparison questions is frankly implausible (Ben-Shakhar, 2008; National Research Council, 2003). But given that the technique has been shown to be able to identify deception this suggests we need a new theory, not that the technique itself is faulty. Others are concerned that the probable lie approach means the test is based on deceiving the examinee, and requires the examinee to be forced into a position of having to lie (Cross & Saxe, 2001; Meijer et al, 2008; Vess, 2012). This ethical objection, however, is based on a misconception – the cognitive work of the probable lie doesn’t arise from the lie, but from the uncertainty associated with the question. Indeed, comparison questions can take the form of a ‘directed lie’ in which the examinee is instructed to lie to a question such as ‘Have you ever made a mistake?,’ which involves neither manipulation nor dishonesty. More will be said about directed lies later in this chapter.

Test types

There are four basic types of polygraph test used in PCSOT, some of which have variants to them (American Polygraph Association, 2009).

Sex History Exams. The purpose of this test is to obtain a fuller and more accurate account of an offender’s sexual history, including the type and range of deviant behaviors in which he has engaged, the age at which they commenced, and his history of involvement in unknown or unreported offenses. There are two forms of this exam, one that focuses on unreported victims of contact offenses, the other on sexually deviant behavior more generally and offenses that don’t involve force such as voyeurism or internet related offending. The rationale for the separation is that the more severe potential consequences associated with the former behaviors may contaminate responses to the latter. Prior to the polygraph exam the offender completes a sex history questionnaire, usually as part of sex offender treatment. The questionnaire is the focus of the examination, but only selected questions are asked during the test itself.

The intention of the Sex History Exams is to develop a better understanding of risk and of treatment need. There can be a tendency, however, for examiners to dig for much more detail than is needed to achieve these aims, making the procedure an unrealistic exercise in recall for the offender as well as a potentially humiliating one; more information is not necessarily better information. In addition, because it is based on a lengthy questionnaire which covers behaviors that have taken place over many years, the risk of false positive outcomes (that is, wrongly ‘failing’ the test) is increased. This is an important consideration given that about half of American community and a third of residential sex offender treatment programs for adult males require the sex history exam to be passed in order for treatment to be completed successfully.

A further problematic issue associated with Sex History Exams is what to do about self-incriminating disclosures. Programs typically try to get around this by ensuring that only general information about past offenses is obtained, but in some states even this


minimal level of disclosure needs to be passed to the authorities. In reality, however, this is not a difficulty unique to polygraph testing and applies to treatment programs generally. Whatever solution works for the program should be sufficient for PCSOT.

The following are two examples of how Sex History Exams can be helpful to treatment (these and subsequent examples are taken from the UK polygraph trials):

An offender on parole following a conviction for the indecent assault of his stepdaughter disclosed during a Sex History Examination a large amount of previously unknown pornography use and cross dressing. Subsequent to the test he began to discuss this and his sexual fantasies more generally in treatment for the first time.

An offender in his fifties with no sex offending history was convicted of internet related offences. In a Sex History Examination he admitted to stealing underwear from his sister’s house, to sexual fantasies regarding schoolgirls, and to sitting in cinema car parks to watch young girls. Based on this and other fantasy related information he disclosed during the test new treatment targets regarding fantasy and fantasy modification were identified and delivered.

Critics argue that information from Sex History Exams tells us nothing new in that it would be a surprise if offenders hadn’t engaged in deviant behaviours besides their offenses, and that there is little evidence to show that the additional information adds meaningfully to risk assessment or treatment provision (Rosky, 2012). This criticism seems odd, however, given that sex history questions are asked routinely in sex offender assessment and are considered an important part of the evaluation, the only difference being that there is more likelihood of getting an honest account during a polygraph examination.

Instant Offense Exam. This exam type explores behavior that took place during the instant offense where there is inconsistency between victim and offender accounts, or where the offender denies important aspects of what took place. A variant of this test relates to prior allegations where there hasn’t been a conviction. Like the sex history exam, this test is directly relevant to treatment. Also like the sex history exam there is a risk that the examiner will go on a fishing exercise seeking detail that doesn’t take treatment any further. Used properly, however, it can overcome denial that is blocking treatment progress.

Below is an example of how an Instant Offense Exam assisted treatment in a perhaps unexpected way:

An offender was on license having committed an indecent assault on a child in a supermarket when intoxicated. He admitted the offense, but denied any memory of having pushed his groin into the girl’s back as reported by her mother even though he accepted this could have happened. Much time was spent in the treatment group trying to overcome his ‘denial’. On an Instant Offense Exam he was questioned about his lack of recall, and he was found truthful. The consistency of his self-report taken together with the test result led to his account of partial amnesia being accepted, allowing treatment to move beyond this issue.

Some critics believe this sort of information would be obtained anyway in the course of treatment, but whether or not this is the case, supporters of PCSOT argue that the disclosures come much earlier when polygraphy is used. There is little evidence with which to determine either of these claims.

Offenders may see the Instant Offense Exam as an opportunity to prove their ‘innocence’ in the face of a wrongful conviction. Although there may be a time and place for this issue to be explored, PCSOT is not it. The instant offense exam, therefore, must be used with caution.

Maintenance Exam. The Maintenance Exam is the workhorse of PCSOT. It addresses an offender’s compliance with the terms and conditions of probation, parole or treatment. It is a screening test that typically covers a wide range of issues in the pre-test, following


which 3 or 4 specific questions are asked on absence of disclosures. Given the 10 to 20% the test itself. Maintenance Exams can ad- error rate of polygraph testing it is hard to jus- dress sexual thoughts and fantasies so long tify sanctions such as prison recall based on as they are linked to masturbatory behavior. a failed test alone (although this does occur The aims of the test are to identify behaviors in some US states, it is prohibited in the UK), indicative of increased risk so that interven- but a deceptive test does provide a warning tions can take place, confirm when offenders sign that all may not be well. Depending on are not engaging in problematic behavior, and the risk represented by the offender the re- deter offenders from engaging in risky behav- sponse could range from the probation officer iors in the first place. Its primary purpose is addressing the issue in supervision with him, to prevent reoffending rather than to detect re- to not relaxing restrictions such as curfews or offenses after they have occurred. exclusion zones, to, in especially high risk cas- es, putting the offender under surveillance. Two examples of Maintenance Exams illustrate their potential value: Maintenance Exams are carried out regularly, to set protocols – for example, in the An offender on parole license disclosed UK they take place at 6-monthly intervals, but he had recently started a relationship sooner if the offender fails a test or concerns with a young woman (one of his license emerge between exams. This gives rise to a conditions being that he informed his risk of habituation or sensitization, resulting probation officer of any new relation- in fewer disclosures and false negative test ships). Although that was the extent results (Branaman and Gallagher, 2005). 
To of his disclosure, his offender manager counter this PCSOT policies usually recom- met with the new girlfriend and found mend that a different examiner is introduced not only that she was a single mother, after a set number of tests have been under- but also that the offender was groom- taken. Again, however, research relating to ing her child in a manner similar to this issue is sparse, and it is not clear the ex- his instant offense. He was recalled to tent to which habituation occurs, or whether prison. the suggested remedy is effective.

An offender with a history of involve- Monitoring Exams. Monitoring Ex- ment with sex offender networks had a ams are specific issue tests that take place license condition not to associate with where there is concern that an offender may known sex offenders. Following a de- have committed a new offense, or breached a ceptive test he admitted to breaching license condition. As in Maintenance Tests, this condition. When his probation no sanction follows a failed test in the absence officer later explored this with him he of disclosure, but a failed test may indicate the admitted to marked feelings of lone- need for further investigation. On the other liness and isolation following a move hand, a passed test can offer reassurance to from a probation hostel. Steps were supervisors. taken to address his isolation, and on his next Maintenance Test he said he The following is an example of how a was no longer reliant on his former sex monitoring exam can contribute to manage- offender contacts and much more set- ment: tled in himself; he showed no signifi- cant responses to questions relating to A 24 year old man was on parole hav- associating with other sex offenders. ing been convicted of unlawful sexual intercourse with a 14 year old girl. His In neither of these cases can it be probation officer believed he was still demonstrated that offenses were prevented, in a sexual relationship with his vic- but it would be hard to argue that the out- tim, but this was persistently denied comes were not worthwhile. by the offender, who was compliant with a night-time curfew and a tag. A difficulty faced by Maintenance Tests He denied any wrongdoing during the is how to respond to a deceptive result in the pre-test interview, but he was decep-

Polygraph, 2016, 45 (2) 107 Don Grubin

tive on the test. In the post-test interview he admitted to regular contact with the girl as well as a low level of sexual activity with her. The probation officer passed this information to the police and the offender was arrested. When interviewed by the police the girl reported regularly spending a night a week in the offender’s home (a place his tag confirmed him to be), where in addition to the sexual activity he had described she said they also engaged in sexual intercourse.

Beating the test

Somewhat incongruously, the same critics who argue that polygraphy does not reliably differentiate truth telling from deception nonetheless also invariably raise the issue of countermeasures, that is, physical or psychological techniques used to manipulate responses on the test to enable examinees to appear truthful when they are being deceptive (Ben-Shakhar, 2008; London Daily Telegraph, 2012). They argue that false negative findings, whether the result of error or countermeasures, mean that ‘dangerous’ offenders can ‘beat’ the test and remain free in the community.

It is almost certainly the case that some offenders ‘beat’ the test, but the reality is that without polygraphy many more ‘beat’ their supervisors and treatment providers. For example, as referred to earlier, in the absence of polygraphy probation officers are more likely to reduce their risk assessments than they are when polygraphy is used (Grubin, 2010). Decisions, however, should not be based on polygraphy alone – PCSOT is just one part of the information package.

It is also the case that countermeasure techniques exist and can be taught, and there are a number of websites that offer to do so. But in order to be successful practice is required – theory is not sufficient, and the examinee needs feedback when attached to the polygraph (Honts et al, 1985). Most sex offenders do not have access to this type of coaching, and without it their charts usually show tell-tale signs of their attempts to manipulate the test. It should also be remembered that polygraph examiners read the same websites as their examinees.

PCSOT, treatment benefit, and risk reduction

Probation officers like PCSOT. In the English probation trials (Grubin, 2010; Gannon et al, 2014) over 90% rated polygraphy as being ‘somewhat’ or ‘very’ helpful, with very few tests considered by officers to have had either no or a negative impact. But while subjectively probation officers may believe polygraphy makes their jobs easier, this is not the same as being able to demonstrate objectively that PCSOT results in improved treatment outcome or a genuine reduction in risk (Rosky, 2013).

Evidence regarding reduction in recidivism is extremely thin, although the absence of evidence should not be confused with evidence of absence. It is difficult to carry out randomized control trials of PCSOT for a range of reasons, not the least of which is a reluctance by criminal justice agencies to ‘experiment’ with dangerous sex offenders. Furthermore, the low levels of recidivism that make treatment programs difficult to evaluate create similar problems for PCSOT, although significant increases in prison recall for breaches have been demonstrated (Grubin, 2010; Gannon et al, 2014).

Two early studies, although not of PCSOT per se, point in the right direction. Abrams & Ogard (1986) compared recidivism rates of 35 probationers (few of whom were sex offenders) from two counties in Oregon required to take periodic polygraph tests, with 243 offenders from a county where supervision did not involve polygraphy. Over 2 years 31% of the polygraphed men committed an offense or infringement compared with 74% of those who were not polygraphed. But the number of polygraphed offenders was small, the samples were not matched, nor is it clear whether there was selection bias in choosing those who underwent polygraphy. Also in Oregon, Edson (1991) reported that 95% of 173 sex offenders on parole or probation who were required to undertake periodic polygraph testing did not reoffend over 9 years, but there was no comparison group in this study at all.
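The caveat that the polygraphed group in Abrams & Ogard (1986) was small can be made concrete with a rough calculation. The sketch below is an illustration only, not part of either study: it treats the reported percentages (31% of 35 and 74% of 243) as simple binomial proportions and computes an approximate 95% Wald interval for each. The function name `wald_interval` is hypothetical, and the calculation ignores the unmatched samples and possible selection bias noted above.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p, n, level=0.95):
    """Approximate confidence interval for a proportion p observed in n cases.

    Assumption: a simple binomial model with a normal approximation,
    which is only a rough guide when n is small.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Percentages and group sizes as reported in the text.
polygraphed = wald_interval(0.31, 35)
comparison = wald_interval(0.74, 243)

print("polygraphed group:", polygraphed)
print("comparison group: ", comparison)
```

Under these assumptions the interval for the 35 polygraphed probationers is roughly three times as wide as the interval for the 243 comparison offenders, which is one way of seeing why the small sample limits what can be concluded.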

108 Polygraph, 2016, 45 (2) Polygraph Testing Of Sex Offenders

McGrath et al (2007) carried out the one randomized trial of PCSOT in the literature, comparing 104 sex offenders in Vermont who received treatment in programs that included PCSOT with 104 matched offenders in programs where polygraphy was not used. At 5 year follow-up they found no difference in sex offense recidivism rates, but they did find a significantly lower rate of reconviction for non-sexual violent offenses. But though the study was well designed its results are difficult to interpret because while the research was sound, the way in which PCSOT was delivered was not. Offenders undertook polygraph examinations on average just once every 22 months, dissipating the likelihood that polygraphy would have much of an impact on behavior. Even so, the reduction in violent offending is notable.

In trying to determine the impact of PCSOT there is another issue to consider. It is well established in relation to sex offender interventions generally that to be effective they should adhere to the ‘risk-need-responsivity’ principle – that is, they should target high risk individuals, reflect treatment need, and be responsive to cognitive and cultural differences between offenders (Andrews et al, 2011). PCSOT does not tend to be delivered in this way because it is an assessment procedure rather than an intervention as such. After all, a screening technique for a medical condition is not judged on the basis of whether it improves survival rates for that condition – that is the role of what follows – but on its success in identifying at risk individuals. Expecting PCSOT to reduce recidivism may be an unreal expectation.

So how then is PCSOT to be judged? Rather than focus on recidivism perhaps attention should be focused instead on the value of the information gained, as one would in an evaluation of screening instruments generally. The frequency and content of disclosures, the impact of test outcome on decision making, and actions taken after a polygraph test could all form part of a cost-value analysis to determine the value added by PCSOT compared with the cost of administering it. In other words, to what extent does PCSOT better enable probation officers to monitor risk and initiate timely interventions, and are treatment targets better identified, when polygraph is used? The question then becomes, ‘is PCSOT worth it?’

Internet offenders

Men who download indecent images of children from the internet present a particular challenge for those carrying out risk assessments. Typically, little is known about relevant risk factors and they often have no criminal history. It is estimated, however, that around 50% of men convicted of internet offences have committed undetected sexual assaults on children, and the majority show pedophilic sexual arousal patterns (Seto, 2013). It has been suggested that applying PCSOT techniques in a preconviction setting to men arrested for downloading offenses could assist in differentiating low from high risk offenders (where risk relates to contact offending against children), enabling police resources to be better focused and criminal justice interventions to be more accurately targeted in terms of custody and treatment. That this can be done was demonstrated in a small study in which 31 apparently low risk internet offenders underwent sex history type polygraph examinations preconviction, where it was found that only 8 (26%) could be confirmed as genuinely low risk (Grubin et al, 2014). A number of police forces in England are now exploring this application of polygraphy further.

Legal considerations

The legal situation in the United Kingdom is more straightforward than it is in the United States. In the UK, the Offender Management Act 2007 sets out the statutory position regarding the mandatory testing of sex offenders on parole (Offender Management Act, 2007). Offenders must have been sentenced to a year or more in prison in order to ensure that the polygraph condition is proportionate. The legislation prohibits the use of evidence from polygraph tests in criminal proceedings, although this information can form the basis of criminal investigation, and it can also be used in civil proceedings. The Act is supported by a statutory instrument containing polygraph ‘rules’ which govern the conduct of polygraph sessions and set out the requirements that must be met by examiners. The 2007 legislation mandated a time-limited period to allow mandatory polygraph testing to be evaluated on a pilot basis in a small number

Polygraph, 2016, 45 (2) 109 Don Grubin

of probation regions, after which the Secretary of State for Justice was required to return to Parliament for approval to extend mandatory testing nationwide. Following the successful evaluation of the pilot (Gannon et al, 2012), Parliamentary approval was granted in 2013, and mandatory testing throughout England and Wales became effective in January 2014.

Although the Offender Management Act 2007 prohibits the use of the results of mandatory testing in criminal proceedings, there is no legislation that prevents polygraph testing in general from being used as evidence in the British courts. It is sometimes claimed that case law prevents the use of polygraph evidence, but this is not true (Stockdale & Grubin, 2012). Whether polygraphy evidence should be allowed in criminal proceedings is too complicated an issue to be explored here, apart from observing that while polygraphy can be a valuable investigative tool it is not clear that it can add much to the decision making process in court.

The position regarding PCSOT in North America is more haphazard. The main issue for the courts has been whether PCSOT breaches Fifth Amendment rights against self-incrimination. In considering this question the Supreme Court ruled in McKune v. Lile that it does not, albeit in a tight 5 to 4 decision. It observed that the treatment program of which it was part served ‘a vital penological purpose’. On the other hand, in United States v. Antelope (2005) the Federal 9th Circuit Appeal Court ruled that a paroled offender could not be compelled to waive his Fifth Amendment rights and take a polygraph exam with the threat of prison recall if he did not. This has made it even more necessary for programs to ensure that they properly address the self-incrimination issue, both in terms of PCSOT and more generally.

PCSOT is hardly used in Canada (McGrath et al, 2010) and it therefore does not appear to have been an issue for the Canadian courts, apart from one case where a prisoner applied for judicial review of a Parole Board decision not to release him partly on the basis that the decision was made before he had undertaken a polygraph examination – in this case the Court decided that the polygraph test results would not have changed anything in the Parole Board’s decision (Aney v. Canada, 2005). In general, however, the Canadian Courts allow polygraph disclosures to be used in criminal proceedings so long as the is not told that they came from a polygraph test.

Ethics

Commentators rightly distinguish between practice standards and ethical principles, observing that the two do not necessarily overlap (Chaffin, 2011). Even where the delivery of PCSOT is well managed and delivered, potential ethical objections don’t go away. When discussing PCSOT, a number of ethical issues are frequently raised. These tend to relate to a lack of respect for autonomy, intrusiveness, and compulsion, as well as special considerations that arise when testing special groups such as adolescents, the intellectually disabled, and individuals with mental disorder.

Some of these objections relate to a misconception of what happens in PCSOT, others to its questionable implementation. For example, Cross and Saxe (2001) refer to PCSOT as ‘psychological manipulation’ on the basis that examiners deceive offenders by telling them that the polygraph is error free. While this may occur, it is certainly not good practice, nor is there any reason for examiners to make out that the test is any more accurate than it actually is. Indeed, the British Psychological Society (2004) observes that participants should be informed of known error rates, a sentiment with which it is hard to disagree. There is no reason to believe that PCSOT would cease to be effective in these circumstances.

Cross and Saxe (2001), Meijer et al (2008) and Vess (2011) all argue that the test itself is based on deception when the probable lie technique is used given the hypocrisy involved in demanding the offender to be honest. Vess (2011) and McGrath et al (2010) wonder in addition what damage this might do to the therapeutic relationship. But as indicated earlier, the probable lie technique is not in fact dependent on the examinee lying even though this is what tends to be taught (indeed, as referred to above other critics refer to this theory being deficient), but on uncertainty. Regardless, the use of ‘directed lies’ overcomes this


objection, and also avoids the risk of the examinee admitting to transgressions that have nothing to do with his sexual risk.

Chaffin (2011), although concerned mainly with the testing of adolescents, focuses on PCSOT ‘extracting confessions’ from examinees, stating, “The polygraph is fundamentally a coercive interrogation tool for extracting involuntary confessions” (p. 320). PCSOT, however, need not, and should not, involve interrogation. It is instead an interview process in which lying is explicitly discouraged. The questions asked during PCSOT are asked by assessors and treatment providers anyway – the fact that PCSOT encourages disclosure of information relevant to treatment and risk management is in itself not an ethical issue.

Mandatory PCSOT is of course coercive in that there are penalties for non-cooperation. But PCSOT examinees are convicted offenders, who by virtue of their criminal convictions are required to accept a range of restrictive and coercive measures such as conditions on where they live, limitations on employment, curfews, and treatment requirements. Indeed, the European Court of Human Rights has ruled that penile plethysmography (a technique in which penile arousal in response to sexual stimuli is measured and recorded) can be made a compulsory part of sex offender treatment on the grounds of public safety (Gazan, 2002); one might think this considerably more ‘invasive’ than polygraphy. Provided that the questions asked during the polygraph test are directly relevant to treatment or supervision, the process does not seem any more coercive than these other measures, or any more morally problematic.

Another objection to PCSOT is that it carries with it the implication that sex offenders are not to be trusted, and that this itself the relationship between supervisors and offenders. There is no evidence, however, that this is the case, while what evidence there is suggests it does not (Grubin, 2010). Indeed, this implication is often implicit in any case. One should not underestimate the benefits of an offender being able to demonstrate that he is being truthful in his dealings with those supervising him, and the positive impact this can have on the therapeutic relationship.

There remains the question, however, of special groups. About half of adolescent treatment programs in the United States, for example, incorporate PCSOT (McGrath et al, 2010), and the American Polygraph Association PCSOT model policy allows for testing juveniles down to the age of 12. As Chaffin (2011) points out, given the increased vulnerability of juveniles and adolescents to coercion and suggestion, and differences in the way that risk, treatment and rehabilitation are conceptualised in this group, one can’t assume that PCSOT approaches are appropriate for them. He could have added that it is not even clear that polygraphy itself works in the same way as it does in adults given differences in brain maturity and psychological development, and that the American Polygraph Association age threshold appears arbitrary. Because of these and similar issues mandatory polygraph testing in the UK does not apply to offenders who are under the age of 18.

Does this mean that polygraph testing of those under 18 is unethical? Testing offenders younger than 18 has its advocates (Jensen et al, 2015). Even Chaffin (2011), who considers the ethical concerns to be “substantial,” doesn’t go that far, although his view is contingent on the ability of those supporting its use in this group to prove that it provides more benefit than harm. Unless and until this evidence is produced, however, it probably makes sense to use PCSOT with great caution with those under 18, with decisions made on a consideration of individual cases rather than based on a blanket policy of PCSOT for all.

In terms of other special groups, such as those with intellectual disability and mental disorder, the position is similar. PCSOT has the potential to be of benefit, but caution needs to be used, by examiners who are aware of the pitfalls.

Finally, one might ask whether it is unethical not to use PCSOT in the treatment and supervision of sex offenders. If the information obtained during polygraph examination adds significantly to what is otherwise known about treatment need and risk, is it right to deny the potential benefits of PCSOT to an offender? When asked, many offenders themselves reported that they find polygraph testing to be helpful (Grubin & Madsen, 2006;


Kokish et al, 2005). If PCSOT does reduce risk, how can one explain to a future victim why it did not form part of the offender’s treatment and supervision package?

Conclusion

Does PCSOT increase community safety? Does it enhance sex offender treatment? Although the evidence is supportive, the benefits of PCSOT have yet to be conclusively demonstrated. Objections made by many of its critics, however, are based on opinion rather than fact. But what would count as definitive evidence? For ideological reasons some will never be convinced.

Given the complexity of sex offender management, simply collecting data on numbers of disclosures, reconvictions, and the like will tell us little more than we already know. More thought needs to be directed to which offenders are most likely to benefit, the needs that PCSOT should target in those offenders, and whether modifications are necessary depending on the characteristics of the individual taking part. In other words, consideration should be given to how the ‘risk-needs-responsivity’ principle can be made to apply to PCSOT.

In the meantime, those who deliver PCSOT need to ensure that examiners are properly trained and supervised, protocols for the process are sound, and good quality control procedures are in place. In turn, those who make use of it must know the right questions to ask of it, how much weight to give its results, and how to integrate it with everything else they do with an offender. It should not be forgotten, however, that PCSOT remains just one tool in the box, and like any tool if it is not used with care it can cause harm.


References

Abrams, S. and Ogard, E. (1986). Polygraph surveillance of probationers. Polygraph, 15:174-182.

Alder, K. (2007). The Lie Detectors: The History of an American Obsession. New York: Free Press.

Ahlmeyer, S., Heil, P., McKee, B., English, K. (2000). The impact of polygraphy on admissions of victims and offenses in adult sexual offenders. Sexual Abuse: A Journal of Research and Treatment, 12:123-139.

American Polygraph Association (2009). Model Policy for Post-Conviction Sex Offender Testing. Retrieved at: http://www.polygraph.org/files/model_policy_for_post-conviction_sex_offender_testing_final.pdf

American Polygraph Association (2011). Committee Report on Validated Techniques, Polygraph, 40:194-305.

Andrews, D. A., Bonta, J., Wormith, J. S. (2011). The Risk-Need-Responsivity (RNR) Model: Does Adding the Good Lives Model Contribute to Effective Crime Prevention? Criminal Justice and Behavior, 38:735-755.

Aney v. Canada (2005). 2005 FC 182.

Ben-Shakhar, G. (2008). The case against the use of polygraph examinations to monitor post-conviction sex offenders. Legal and Criminological Psychology, 13:191-207.

Bond, C.F. and DePaulo, B.M. (2006). Accuracy of deception judgments. Personality and Social Psychology Review, 10:213-234.

Branaman, T. F. and Gallagher, S. N. (2005). Polygraph testing in sex offender treatment: A review of limitations. American Journal of Forensic Psychology, 23:45-64.

British Psychological Society (1986). Report of the working group on the use of the polygraph in criminal investigation and personnel screening. Bulletin of the British Psychological Society, 39:81-94.

British Psychological Society (2004). A Review of the Current Scientific Status and Fields of Application of Polygraphic Deception Detection. Report (26/05/04) from the BPS Working Party. Retrieved from: http://www.bps.org.uk/sites/default/files/documents/polygraphic_deception_detection_-_a_review_of_the_current_scientific_status_and_fields_of_application.pdf.

Chaffin, M. (2011). The Case of Juvenile Polygraphy as a Clinical Ethics Dilemma. Sexual Abuse: A Journal of Research and Treatment, 23:314-328.

Colorado (2011). Colorado Sex Offender Management Board: Standards and Guidelines for the Assessment, Evaluation, Treatment and Behavioral Monitoring of Adult Sex Offenders. Retrieved from: https://cdpsdocs.state.co.us/somb/ADULT/FINAL_2012_Adult_Standards_120712.pdf

Cook, R., Barkley, W., Anderson, P. B. (2014). The sexual history polygraph examination and its influences on recidivism. Journal of Social Change, 6:1-10.

Cross, T. and Saxe, L. (2001). Polygraph testing and sexual abuse: The lure of the magic lasso. Child Maltreatment, 6:195-206.


Edson, C.F. (1991). Sex Offender Treatment. Jackson County, OR: Department of Corrections.

English, K. (1998). The containment approach: An aggressive strategy for the community management of adult sex offenders. Psychology, Public Policy, & Law Special Issue: Sex Offenders: Scientific, Legal, and Public Policy Perspectives, 4:218-235.

Gannon, T. A., Wood, J. L., Vasquez, E. A., Fraser, I. (2012). The Evaluation of the Mandatory Polygraph Pilot (Ministry of Justice Research Series 14/12). Retrieved from https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/217436/evaluation-of-mandatory-polygraph-pilot.pdf.

Gannon, T. A., Wood, J. L., Pina, A., Tyler, N., Barnoux, M. F., Vasquez, E. A. (2014). An evaluation of mandatory polygraph testing for sexual offenders in the United Kingdom. Sexual Abuse: A Journal of Research and Treatment, 26:178-203.

Gazan, F. (2002). Penile Plethysmography Before the European Court of Human Rights. Sexual Abuse: A Journal of Research and Treatment, 14:89-93.

Ginton, A. (2009). Relevant Issue Gravity (RIG) strength – a new concept in PDD that reframes the notion of psychological set and the role of attention in CQT polygraph examinations. Polygraph, 38:204-217.

Grubin, D. (2008). The case for polygraph testing of sex offenders. Legal and Criminological Psychology, 13:177-189.

Grubin, D. (2010). A Trial of Voluntary Polygraphy Testing in 10 English Probation Areas. Sexual Abuse: A Journal of Research and Treatment, 22:266-278.

Grubin, D., Madsen, L., Parsons, S., Sosnowski, D., Warberg, B. (2004). A prospective study of the impact of polygraphy on high-risk behaviors in adult sex offenders. Sexual Abuse: A Journal of Research and Treatment, 16:209-222.

Grubin, D. and Madsen, L. (2006). The accuracy and utility of post conviction polygraph testing with sex offenders. British Journal of Psychiatry, 188:479-483.

Grubin, D., Joyce, A., Holden, E. J. (2014). Polygraph Testing of ‘Low Risk’ Offenders Arrested for Downloading Indecent Images of Children. Sexual Offender Treatment, 9:1-10.

Gudjonsson, G. H., Sigurdsson, J. F., Bragason, O. O., Einarsson, E., Valdimarsdottir, E. V. (2004). Confessions and denials and the relationship with personality. Legal and Criminological Psychology, 9:121-133.

Gudjonsson, G. H. and Pearse, J. (2011). Suspect interviews and false confessions. Current Directions in Psychological Science, 20:33-37.

Heil, P., Ahlmeyer, S., Simons, D. (2003). Crossover sexual offenses. Sexual Abuse: A Journal of Research and Treatment, 15:221-236.

Hindman, J. and Peters, J. (2001). Polygraph testing leads to better understanding adult and juvenile sex offenders. Federal Probation, 65:8-15.

Holden, E. J. (2000). Pre and post-conviction polygraph: Building blocks for the future – procedures, principles, and practices. Polygraph, 29:69-92.

Honts, C. R., Hodes, R. L., Raskin, D. C. (1985). Effects of physical countermeasures on the physiological detection of deception. Journal of Applied Psychology, 70:177-187.

Iacono, W. G. (2008). Effective policing: Understanding how polygraph tests work and are used. Criminal Justice and Behavior, 35:1295-1308.

Jensen, T. M., Shafer, K., Roby, C., Roby, J. L. (2015). Sexual history disclosure polygraph outcomes: Do juvenile and adult sex offenders differ? Journal of Interpersonal Violence, 30:928-944.

Jones, E. E. and Sigall, H. (1971). The bogus pipeline: A new paradigm for measuring affect and attitude. Psychological Bulletin, 76:349-364.

Kokish, R., Levenson, J., Blasingame, G. (2005). Post-conviction sex offender polygraph examination: Client-reported perceptions of utility and accuracy. Sexual Abuse: A Journal of Research and Treatment, 17:211-221.

Krapohl, D. J. and Shaw, P. K. (2015). Fundamentals of Polygraph Practice. : Elsevier.

Levenson, J. S. (2009). Sex offender polygraph examination: An evidence-based case management tool for social workers. Journal of Evidence-Based Social Work, 6:361-375.

London Daily Telegraph (2012). The Awkward Truth About Lie Detectors. Retrieved from http://blogs.telegraph.co.uk/news/tomchiversscience/100173508/the-awkward-truth-about-lie-detectors.

McKune v. Lile (2002). 536 U.S. 24.

McGrath, R. J., Cumming, G. F., Hoke, S. E., Bonn-Miller, M. O. (2007). Outcomes in a community sex offender treatment program: A comparison between polygraphed and matched non-polygraphed offenders. Sexual Abuse: A Journal of Research and Treatment, 19:381-393.

McGrath, R. J., Cumming, G. F., Burchard, B. L., Zeoli, S., Ellerby, L. (2010). Current Practices and Emerging Trends in Sexual Abuser Management: The Safer Society 2009 North American Survey. Brandon, VT: Safer Society Press.

Madsen, L., Parsons, S., Grubin, D. (2004). A preliminary study of the contribution of periodic polygraph testing to the treatment and supervision of sex offenders. British Journal of Forensic Psychiatry and Psychology, 15:682-695.

Matte, J. A. (1996). Forensic Psychophysiology Using the Polygraph: Scientific Truth Verification, Lie Detection. New York: J.A.M. Publications.

Meijer, E. H., Verschuere, B., Merckelbach, H., Crombez, G. (2008). Sex offender management using the polygraph: A critical review. International Journal of Law and Psychiatry, 31:423-429.

National Research Council (2003). The polygraph and lie detection. Committee to Review the Scientific Evidence on the Polygraph. Washington, DC: The National Academies Press.

Nelson, R. (2015). Scientific basis of polygraph testing. Polygraph, 44:28-61.

Nelson, R., Handler, M., Shaw, P., Gougler, M., Blalock, B., Russell, C., Cushman, B., Oelrich, M. (2011). Using the empirical scoring system. Polygraph, 40:67-78.

Offender Management Act 2007. Retrieved at: http://www.legislation.gov.uk/ukpga/2007/21/pdfs/ukpga_20070021_en.pdf.


Patrick, C. J. and Iacono, W. G. (1989). Psychopathy, threat and polygraph test accuracy. Journal of Applied Psychology, 74:347-355.

Raskin, D.C. and Hare, R. D. (1978). Psychopathy and detection of deception in a prison population. Psychophysiology, 15:126–36.

Raskin, D. C., Honts, C. R., Kircher, J. C. (Eds.) (2014). Credibility Assessment: Scientific Research and Applications. San Diego: Elsevier.

Roese, N. J. and Jamieson, D. W. (1993). Twenty years of bogus pipeline research: a critical review and meta-analysis. Psychological Bulletin, 114:363-381.

Rosky, J. W. (2013). The (F)utility of post-conviction polygraph testing. Sexual Abuse: A Journal of Research and Treatment, 25:259-281.

Senter, S., Weatherman, D., Krapohl, D., Horvath, F. (2010). Psychological set and differential salience: A proposal for reconciling theory and terminology in polygraph testing. Polygraph, 39:109-117.

Seto, M. (2013). Internet Sex Offenders. Washington D.C.: American Psychological Association.

Stockdale, M. and Grubin, D. (2012). The admissibility of polygraph evidence in English criminal proceedings. Journal of Criminal Law, 76:232-253.

United States v. Antelope (2005). 395 F.3d 1128.

Vess, J. (2011). Ethical practice in sex offender assessment: Consideration of actuarial and polygraph methods. Sexual Abuse: A Journal of Research and Treatment, 23:381-396.

Vrij, A. (2000). Detecting Lies and Deceit. Chichester, England: Wiley.


Scientific Basis for Polygraph Testing

Raymond Nelson

Abstract

Published scientific literature is reviewed for comparison question polygraph testing and its application to diagnostic and screening contexts. The review summarizes the literature for all aspects of the testing procedure including the pretest interview, test data collection, test data analysis, and a proffer of the physiological and psychological basis for polygraph testing. Polygraph accuracy information is summarized for diagnostic and screening exams. Evidence is reviewed for threats to polygraph accuracy and the contribution of polygraph results to incremental validity or increased decision accuracy by professional consumers of polygraph test results. The polygraph is described as a probabilistic and non-deterministic test, involving both physiological recording and statistical methods. Probabilistic tests, statistical models, and scientific tests in general are needed when neither deterministic observation nor physical measurement are possible. Event-specific diagnostic polygraphs have been shown to provide mean accuracy of .89 with a 95% confidence range from .83 to .95. Multi-issue screening polygraphs have been shown to provide accuracy rates, with a mean of .85 and a 95% confidence range of .77 to .93.

Keywords: Polygraph, lie detection, signal detection, test data analysis, scientific basis.
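The accuracy figures in the abstract can be read as a point estimate with a symmetric 95% interval. As a minimal sketch, and assuming (this is not stated in the abstract) that the intervals are normal-approximation intervals centered on the mean, the implied standard error can be backed out from the reported bounds. The function name `implied_standard_error` is hypothetical and used here only for illustration.

```python
from statistics import NormalDist


def implied_standard_error(lower, upper, level=0.95):
    """Back out a standard error from a symmetric confidence interval.

    Assumption: the interval was built as mean +/- z * SE under a
    normal approximation; the source does not say how it was computed.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)


# Figures as reported in the abstract.
diagnostic = {"mean": 0.89, "lower": 0.83, "upper": 0.95}
screening = {"mean": 0.85, "lower": 0.77, "upper": 0.93}

for name, r in (("diagnostic", diagnostic), ("screening", screening)):
    midpoint = (r["lower"] + r["upper"]) / 2
    se = implied_standard_error(r["lower"], r["upper"])
    print(f"{name}: midpoint={midpoint:.2f}, implied SE={se:.3f}")
```

In both cases the midpoint of the reported interval matches the reported mean, which is consistent with a symmetric interval. The wider screening interval corresponds to a larger implied standard error.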

Polygraph examinations, like other scientific and forensic tests, can take the form of either diagnostic tests or screening tests. The difference between diagnostic and screening exams is that diagnostic examinations involve the existence of a known problem, in the form of symptoms, evidence, allegations, or incidental circumstances that suggest an individual may have some involvement, for which the examination results are intended to support a positive or negative diagnostic conclusion. Screening tests include all tests conducted in the absence of a known incident, known allegation, or known problem.

The purpose of diagnostic tests is to form a conclusion that may serve as a basis for action. This action will often affect the future of an individual in terms of rights, liberties or health. For this reason, it is difficult to imagine an ethical justification for the selection of a testing technique that provides something less than the highest achievable level of diagnostic accuracy. Diagnostic tests achieve high levels of decision accuracy, in part, by restricting the test to a single issue of concern.

In contrast, screening tests are intended to add incremental validity to risk management decisions that are made in the absence of any known problem. This is accomplished both by gathering information and by investigating the possible involvement of an individual in one or more issues of concern. Screening tests should not be used alone as the basis for action that may affect an individual's rights, liberties, or health. Absence of any known problem is the defining characteristic of a screening test (Wilson & Jungner, 1968; Raffle & Muir Gray, 2007). Screening polygraph tests address the objective of adding incrementally

Author note: Raymond Nelson is a polygraph examiner and psychotherapist who has authored numerous papers and studies on many aspects of the polygraph test. Mr. Nelson is a researcher associated with the Lafayette Instrument Company, is the curriculum director of an accredited polygraph training program, and is an elected member of the Board of Directors of the American Polygraph Association - currently serving in the role of President. Mr. Nelson has taught at several polygraph training programs, is an active speaker and presenter at both international conferences, and has been an expert witness in several court jurisdictions regarding both polygraph and psychotherapy matters. The views and opinions expressed in this publication are those of the author and not necessarily those of the Lafayette Instrument Company or the American Polygraph Association. Inquiries and correspondence can be sent to [email protected].

28 Polygraph, 2015, 44(1) Scientific Basis of Polygraph

to risk management through a combination of smaller goals that may include both the discriminate ability of the test result, and the capability of the testing process to develop information. Polygraph screening programs can also involve a third goal in the form of increased deterrence of problems (American Polygraph Association, 2009a; 2009b). Deterrence objectives may be achieved by deterring higher-risk persons from access to high-risk environments (e.g., police applicant screening or government/operational security screening), or through dissuasion of (or decreased) non-compliance with policies, rules, and regulations (e.g., operational security policies).

Discussion

According to the American Polygraph Association (2011), polygraph examinations consist of three phases: 1) a pretest interview, 2) an in-test data collection phase, and 3) test data analysis. Each of these phases has an important effect on both test accuracy and the usefulness of the test result. For this reason, all assumptions and procedures considered fundamental to the polygraph test should ideally be based on generally accepted knowledge or evidence and theoretical constructs for which there exists published and replicated empirical support.

Polygraph pretest interview

At its most basic, an interview is merely a conversation with a purpose (Hodgson, 1987), and, as indicated by Kahn and Cannel (1957), the success of many professional endeavors depends in part on the ability to get information from others. The polygraph pretest interview is intended to orient the examinee to the testing procedures, the purpose of the test, and the investigation target questions. The basic premise of interviewing holds that people will report more useful information when they are prompted to do so by an interested listener who builds rapport through the use of conversation and interview questions. Polygraph pretest interviews are intended to allow truthful examinees to become accustomed to - or habituated to - the cognitive and emotional impact of hearing and responding to test stimulus questions that describe their possible involvement in problematic behaviors, while also sensitizing or increasing the awareness and response potential of deceptive examinees to test questions that describe their past behavior.

The polygraph pretest interview is a process, involving several steps (American Polygraph Association, 2009a, 2009b; Department of Defense Polygraph Institute, 2002), including: a free-narrative interview (Powell & Snow, 2007), semi-structured interview (Lindlof & Taylor, 2002), or structured interview (Drever, 1995), a thorough review of the test question stimuli, and a practice or orientation test. The first objective of the pretest interview is to establish a positive identification and introduction, and to clarify the roles of the examiner and examinee. The examiner will also introduce the examinee to the examination room, including the use of audio or video recording devices, and all of the polygraph sensors that will later be attached to the examinee.

The next stage of the process consists of making an initial determination of the suitability of the examinee and obtaining informed consent for testing. This is done after a review of the rights of the examinee during testing, including the right to terminate the examination at any time. Ideally, informed consent should also include information about who will receive the information and results from the examination, and where to obtain more information about the strengths and limitations of the polygraph procedure. The examiner will then engage the examinee in a brief discussion about the case background and personal background of the examinee, in order to continue to establish an adequate and suitable testing rapport. The examiner will also provide more information about the psychological and physiological basis for the polygraph test, and will provide answers to any questions the examinee may have regarding the testing procedures.

A practice test or acquaintance test should be conducted as part of standardized field practice (American Polygraph Association, 2009a, 2009b; Department of Defense, 2006a). The purpose of this test is to

Polygraph, 2015, 44(1) 29 Nelson

orient the examinee to the testing procedure before commencing the actual examination. Research by Kircher, Packard, Bell & Bernhardt (2001) has shown that this can contribute to increased test accuracy. A scientific view, supported by recent studies involving non-naive examinees who are fully aware of the details of the testing procedures (Honts & Reavy, 2009; Honts & Alloway, 2007; Nelson, Handler, Blalock & Hernandez, 2012; Rovner, 1986), holds that the effectiveness of evidence-based scientific tests is not dependent on the examinee's belief system. The purpose of the acquaintance test is not to demonstrate or convince the examinee to believe that the polygraph test is infallible, but to orient the examinee to the testing procedures. Regardless of the examiner's and examinee's attitudes or beliefs concerning the acquaintance test, scientific studies (Bradley & Janisse, 1981; Horneman & O'Gorman, 1985; Horowitz, Kircher & Raskin, 1986; Kirby, 1981; Widup, R, Jr & Barland, 1994) have shown that the use of an acquaintance test does not harm and may at times increase the accuracy of the polygraph examination result. The actual reason for this effect may have more to do with ensuring that the instrument and sensors are adjusted and functioning adequately and that the examinee has had an opportunity to practice complying with behavioral instructions.

The next stage of the pretest interview will be a free-narrative interview, a structured interview or semi-structured interview. Free-narrative interviews are characterized by the use of simple and common language, an absence of coercive techniques, an opportunity for the interviewee to communicate details at the level of one's own choosing, along with encouragement to elaborate. Free-narrative interviews conducted during polygraph testing may include direct or probing questions regarding a known or alleged incident, before proceeding to construct polygraph test questions. Free-narrative interview strategies are useful during diagnostic investigations, but are not well suited toward use in polygraph screening tests which are conducted in the absence of a known or alleged incident. Pretest interviews conducted during polygraph screening exams, whether pertaining to operational security, law enforcement pre-employment, or post-conviction supervision, will take the form of either a structured interview or semi-structured interview.

Structured interviews differ from semi-structured interviews in that structured interviews are conducted verbatim, without deviation from the interview protocol (General Accounting Office, 1991; Campion, Campion, & Hudson, 1994; Kvale, 1996). In contrast, semi-structured interviews are conducted using a structured content and question outline, for which the interviewer is permitted to present interview questions in a manner that is individualized based on the personalities, education levels, and rapport between the interviewer and interviewee. Although structured interviews may be preferred by some researchers and program administrators for their consistency, structured interviews make little use of the skill and expertise of the interviewer.

Semi-structured interviews are intended to make more effective use of interviewer skill and expertise to access rich information regarding the interview content. Like structured interviews, semi-structured interviews should be anchored by a defined interview schedule or interview protocol, with clearly formulated operational definitions that describe the behavioral issues of concern. Compared to structured interview methods, semi-structured interview strategies both depend on and foster greater interviewing skill. Like structured interview methods, semi-structured interview protocols require that all interview topics and questions are addressed at some point during an interview.

In the last stage of the pretest interview – following the free-narrative interview or semi-structured interview – the examiner will develop and review the test questions with the examinee (American Polygraph Association, 2009a, 2009b; Department of Defense, 2002). Test question language will be adjusted to ensure correct understanding and to account for information or admissions that the examinee may provide during the interview or while developing the test questions. Relevant questions will describe the possible behavioral involvement

30 Polygraph, 2015, 44(1) Scientific Basis of Polygraph

of the examinee in the issue or issues of concern. These questions will generally avoid issues related to memory, intent, and motivation. However, some investigative testing protocols will allow for test questions regarding memory if an examinee admits the alleged behavioral act and the issue of memory or motivation is the target of the investigation (American Polygraph Association, 2009a).

When a polygraph examination consists of multiple series of test questions, the examiner will review each series of questions separately, then conduct the in-test data collection phase for each question series before reviewing and collecting data for each subsequent question series. When a polygraph consists of multiple series of test questions, there is no evaluation or discussion of the results of any individual series of questions until all test question series have been fully recorded and analyzed. If an acquaintance test was not conducted earlier it may be conducted after reviewing the test questions and before proceeding to the in-test phase of the exam. Some earlier polygraph testing formats employed a procedure analogous to an acquaintance test, though following the first presentation of the test stimuli during the in-test phase of the exam.

In-test data collection

The second phase of the polygraph examination is that of in-test data collection. This may be accomplished using any of a variety of validated diagnostic or screening test formats (American Polygraph Association, 2011b; Department of Defense, 2002). All screening and diagnostic polygraph techniques include relevant questions (RQs) that describe the examinee's possible involvement in the behavioral issues under investigation. Effective relevant questions will be simple, direct, and should avoid legal or clinical jargon and words for which the correct meaning may be ambiguous, confusing or not recognizable to persons unfamiliar with legal or professional vocabulary. Each relevant question must address a single behavioral issue.

Relevant questions of event-specific diagnostic polygraphs are constructed with the assumption of non-independent criterion variance. The scientific and probabilistic meaning of this is that the RQs have a common or shared source of response variance because the external criterion states of different RQs may (and do) affect one another. The practical meaning of this is that all RQs must address behavior within a single incident of concern.

Multi-issue screening polygraphs, conducted in the absence of a known allegation or incident, may be constructed with relevant questions that describe distinct behaviors for which the external criterion states are assumed to vary independently (i.e., external criterion states are assumed to be exclusive or not interact and affect one another). There is evidence that response variance for these questions is not actually independent (Barland, Honts & Barger, 1989; Podlesny & Truslow, 1993; Raskin, Honts & Kircher, 2014), and for this reason field practices do not permit both positive and negative test results within a single examination. Regardless of whether conducted for diagnostic or screening purposes, all polygraph examinations are ultimately interpreted at the level of the test as a whole, though subtotal scores for individual RQs may be evaluated according to standardized procedures.

Most polygraph examinations in the United States today are conducted with some variant of the comparison question technique (CQT). The CQT was first described in publication by Summers (1939) while he was head of the Psychology Department at the Fordham University Graduate School in New York. The CQT was popularized within the polygraph profession by Reid (1947) and Backster (1963). It is the most commonly used and exhaustively researched family of polygraph techniques in use today. In addition to RQs, these polygraph techniques also include comparison questions (CQs; referred to in earlier polygraph literature as control questions). When scoring a test, examiners will numerically and statistically evaluate differences in responses to RQs and CQs.

The traditional form of comparison

question is the probable-lie comparison (PLC) question, while some evidence-based contemporary CQTs make use of the directed-lie comparison (DLC). Examiners who use PLCs will maneuver the examinee into denying a common behavioral issue that is not the target of the examination. Probable-lie comparison questions have been the basis of some criticism of the polygraph technique due to their manipulative nature, and also the uncertainty surrounding the veracity of the examinee regarding these questions (Furedy, 1989; Lykken, 1981; Office of Technology Assessment, 1983; Saxe, 1991). Some of these criticisms rest on an inaccurate assumption that the polygraph measures actual lies per se. The polygraph, like many scientific tests, records responses to stimuli. Polygraph instruments do not actually measure lies, but instead discriminate deception and truth-telling through the use of probability models and statistical reference data that describe the differences in the patterns of reactions of truthful and deceptive persons when responding to RQs and CQs.

Although not a control in the strictest sense, CQs serve a similar function as a control in that they allow an examiner to effectively parse and compare diagnostic variance and other sources of variance. Response variance of CQs is not completely independent of the investigation target issue in the same way that scientific controls are - because responses for CQs and RQs are from the same examinee. This model of testing can be thought of as analogous to the way that data is acquired from the subjects in a two-way repeat measures ANOVA design - in which each subject serves as his or her own control set. In this way, each polygraph examination serves as a form of single subject scientific experiment.

Directed lie comparison (DLC) questions have been introduced as an alternative to the use of PLCs (Barland, 1981; Research Division Staff, 1995a; 1995b). DLCs are used in polygraph techniques developed by the United States (U.S.) government for use in polygraph screening programs, and in diagnostic polygraph techniques developed by researchers at the University of Utah (Honts & Raskin, 1988; Kircher, Honts & Raskin, 1997) and at the U.S. Department of Defense (Honts & Reavy, 2009). The major difference between PLC and DLC techniques is that DLC techniques are transparent and can be used without the need to maneuver or manipulate the examinee into denying a common behavioral issue.

DLCs have been shown in numerous studies, summarized by Blalock, Nelson, Handler and Shaw (2011; 2012), to perform classification tasks with equal efficiency and similar statistical distributions of numerical scores (American Polygraph Association, 2011) compared to PLC exams. Some researchers have suggested that DLCs are less ethically complicated than PLCs because they do not require the examiner to psychologically manipulate the examinee (Honts & Raskin, 1988; Honts & Reavy, 2009; Horowitz et al., 1997; Kircher, Packard, Bell, & Bernhardt, 2001; Raskin & Kircher, 1990). DLC examinations have also been shown to retain effectiveness in different languages and cultures (Nelson, Handler & Morgan, 2012).

In addition to PLC and DLC questions, other variants of comparison questions have been suggested and argued, including exclusive comparison questions and non-exclusive (i.e., inclusive) comparison questions. Studies have shown all of these CQ variants to perform with similar effectiveness, for which accuracy does not differ at a statistically significant level (Amsel, 1999; Honts & Reavy, 2009; Horvath & Palmatier, 2008; Horvath, 1988; Palmatier, 1991). A recent meta-analytic survey (American Polygraph Association, 2011b) has further solidified this conclusion, demonstrating that the same polygraph techniques perform with equivalent effectiveness and no significant differences in the sampling distributions of criterion deceptive and criterion truthful scores, when the techniques are employed with PLC or DLC questions. Scientific assumptions underlying scoring models are the same for PLC and DLC techniques, assuming only that examinees will respond differently to relevant and comparison stimuli as a function of deception in response to RQs.

All polygraph techniques may include other procedural questions that are not

numerically scored. Procedural questions designed for other technical examination purposes have not been supported by scientific studies, including: overall truth questions (Abrams, 1984; Hilliard, 1979), outside issue questions - referred to also as “symptomatic” questions - that attempt to inquire about interference from outside the scope of the examination questions (Honts, Amato & Gordon, 2004; Krapohl & Ryan, 2001), guilt-complex questions (Podlesny, Raskin & Barland, 1976), and sacrifice relevant questions regarding an examinee's intent to answer truthfully (Capps, 1991; Horvath, 1994). The absence of evidence to support their validity has led to the abandonment of the use of most technical questions.

Only two non-scored technical questions remain widely used today, and these are used only in a procedural sense. Though not numerically scored, and not included in structured or statistical decision models, outside issue questions have been retained as a structural and procedural part of some test formats. Likewise, sacrifice questions are valued for the purported purpose of absorbing and discarding the examinee's initial response to the first question that describes the investigation target issue. These questions are also not numerically scored and not included in structured or statistical decision models. Un-scored sacrifice questions are included in virtually all modern polygraph techniques in use today.

A basic principle of measurement and testing is to obtain several measurements for each issue of concern. This is accomplished during polygraph testing by the use of several component sensors, each of which is designed to monitor increases or changes in activity in the autonomic nervous system, and by the standard practice of aggregating or combining the responses to several presentations of each test stimulus (Bell, Raskin, Honts, & Kircher, 1999; Kircher & Raskin, 1988; Raskin, Kircher, Honts, & Horowitz, 1988; Reid, 1947; Research Division Staff, 1995a; 1995b). Field polygraph procedures (Handler & Nelson, 2008; Kircher & Raskin, 1988; Department of Defense, 2006a) require that test stimuli are presented a minimum of three times and as many as five times. The common method is to repeat the entire series of test questions, while pausing the recording and deflating the cardio sensor in between repetitions. Some examination protocols (Department of Defense 1995a, 1995b; Handler, Nelson & Blalock, 2008) achieve several repetitions of the test questions without pausing the examination.

Test data analysis – scoring of polygraph examinations

Prior to informing the examinee or others of the results of the polygraph examination, the examiner must analyze the test data. Procedures for test data analysis are designed to partition and compare the sources of response variance: variance in response to RQs and variance in response to CQs. Responses are numerically coded and the result is compared to cutscores that represent normative expectations for deceptive or truthful persons. The overarching theory of polygraph testing is that responses to RQs and CQs vary significantly as a function of deception and truth-telling in response to the RQs.

The basic premise of numerical scoring of polygraph exams was first described by Kubis (1962), using a method similar to that of Likert (1932), who showed how to reduce subjectivity using numerical coding of ordinal non-linear response data. Numerical scoring was popularized within the polygraph profession by Backster (1963) as the seven-position scoring system, and has been subject to further development and refinement through empirical study. The procedural construct for evaluation of differences in reaction to RQs and CQs can be traced to Summers (1939), who used a question sequence consisting of three relevant target questions and three comparative response questions repeated three times. Resulting data are found to produce different distributions for the different criterion groups, and these distributions can be used to classify other case observations. Variants of this model are observed in both signal detection and signal discrimination theory (Wickens, 1991; 2002).
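
The classification idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a published scoring model: the two normal reference distributions, their means and standard deviations, the .05 tolerance for error, and the sign convention (negative totals lean deceptive, positive totals lean truthful) are all hypothetical values chosen for the example, as are the function names `grand_total` and `classify`.

```python
from statistics import NormalDist

# Hypothetical reference distributions of grand total scores.
# The means and standard deviations are made up for illustration only.
# Assumed sign convention: negative totals lean deceptive, positive lean truthful.
DECEPTIVE_REF = NormalDist(mu=-9.0, sigma=8.0)
TRUTHFUL_REF = NormalDist(mu=9.0, sigma=8.0)
ALPHA = 0.05  # assumed tolerance for error


def grand_total(scores_by_question):
    """Sum integer scores over every presentation of every relevant question."""
    return sum(sum(presentations) for presentations in scores_by_question.values())


def classify(total, alpha=ALPHA):
    """Classify a grand total against the two hypothetical reference distributions."""
    # Probability that a truthful person would produce a total this low or lower.
    p_if_truthful = TRUTHFUL_REF.cdf(total)
    # Probability that a deceptive person would produce a total this high or higher.
    p_if_deceptive = 1.0 - DECEPTIVE_REF.cdf(total)
    if p_if_truthful <= alpha:
        return "deceptive"
    if p_if_deceptive <= alpha:
        return "truthful"
    return "inconclusive"


# Three relevant questions, each presented three times (made-up integer scores).
example = {"R1": [-2, -1, -1], "R2": [-1, 0, -2], "R3": [-1, -1, -1]}
total = grand_total(example)
print(total, classify(total))  # prints: -10 deceptive
```

With the made-up scores shown, the three relevant questions sum to a grand total of -10, a value the hypothetical truthful distribution would rarely produce, so the sketch returns a deceptive classification. A total near zero is unlikely under neither distribution and is left inconclusive.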

There are three commonly used variants of the seven-position scoring system in use today, including the model developed by the U.S. Government (Department of Defense, 2006a; 2006b), the model published by ASTM International (2002), and the one developed by researchers from the University of Utah (Bell, Raskin, Honts, & Kircher, 1999; Kircher & Raskin, 1988; Raskin & Hare, 1978) and described by Handler (2006) and Handler and Nelson (2008). Differences between these seven-position methods are procedural and may be inconsequential in terms of test accuracy (American Polygraph Association, 2011b).

A commonly used modification of the seven-position scoring model is the three-position system defined by the Department of Defense (2006a, 2006b). Three-position scoring models are favored by some examiners for their simplicity and reliability, though there is a known increase in inconclusive test results (American Polygraph Association, 2011b) when using numerical cutscores intended for seven-position scores. The Empirical Scoring System (Nelson, Krapohl & Handler, 2008; Nelson & Handler, 2010; Nelson et al., 2011) is a statistically referenced, standardized, and evidence-based modification of the three-position and seven-position scoring models, with test accuracy comparable to other scoring models, though without the increase in inconclusive results.

As a theoretical matter, scoring of polygraph examinations is not different from the evaluation of other scientific tests in medicine, psychology, and forensics, and involves four basic concerns: 1) the identification of observable or measurable criteria (i.e., scoring features), 2) transformation of scoring features to numerical values, and reduction of numerical values to a grand total index for the examination as a whole and subtotal indices for the individual examination items, 3) statistical reference distributions to calculate statistical classifiers and numerical cutscores, and 4) structured decision policies.

Discussions of polygraph scoring methods are inseparable from discussions of test theory, including both decision theory and signal detection theory. Decision theory (Greenberg, 1982; Lehmann, 1950; Lieblich, Ben-Shakhar, Kugelmass, & Cohen, 1978; Pratt, Raiffa, & Schlaifer, 1995; Wald, 1939), like statistical learning theory (Hastie, Tibshirani & Friedman, 2001), is concerned with making optimal decisions. Signal detection theory is concerned with identifying and separating useful information from background noise or random information (Green & Swets, 1966; Marcum, 1947; Schonhoff & Giordano, 2006; Swets, 1964; Swets, 1996; Tanner, Wilson, & Swets, 1954). Signal detection theory includes two fundamental models: signal detection (e.g., Yes or No) (Wickens, 2002) and signal discrimination models (e.g., A or B) (Wickens, 1991). CQT polygraph testing represents the second form, of these two, in that polygraph decisions attempt to achieve diagnostic accuracy when placing or predicting individual examinee membership into criterion categories of deception and truth-telling. Recent efforts (Nelson, Krapohl & Handler, 2008; Nelson et al., 2011; Nelson & Handler, 2010) have begun to make more extensive use of statistical decision theory to quantify the probability of erroneous polygraph test results.

Physiological reaction features. Scoring of polygraph examinations begins with the identification of observable or measurable physiological responses that are correlated with deception and which can be combined into an efficient and effective diagnostic model. A number of studies - largely funded by the U.S. Department of Defense and conducted at the University of Utah and Johns Hopkins University - have described investigations into the identification and extraction of polygraph scoring features (Bell, Raskin, Honts, & Kircher, 1999; Harris, Horner, & McQuarrie, 2000; Kircher, Kristjiansson, Gardner, & Webb, 2005; Kircher & Raskin, 1988; Raskin et al., 1988). These efforts are reflected in field practice standards published by the Department of Defense (2006a; 2006b), by ASTM International (2002), and in publications on the Empirical Scoring System (Nelson & Handler, 2010; Nelson et al., 2011).

A small number of physiological indicators have repeatedly been shown to be correlated with deception in structural

decision models presently used in field polygraph programs. They are: respiration – observed as the respiration line length (Kircher & Raskin, 1988), respiration excursion length (Kircher & Raskin, 2002), or as either sustained decreases in respiration amplitude for three or more respiratory cycles, slowing of respiration rate for three or more cycles, or temporary increases in respiratory baseline of three cycles or more, and apnea; electrodermal activity – observed or measured as an increase in skin conductance (decrease in resistance), increased duration of response, and multiple responses; and cardiovascular activity – in the form of increase in relative blood pressure, increased duration of response, slowing of heart-rate, and decrease in finger blood-pulse volume.

Scoring features have been described as either primary or secondary (Bell et al., 1999; Department of Defense, 2006a; 2006b). Primary features are those that capture the greatest degree of variance in deceptive and truthful responses to RQs and CQs within each of the recorded physiological channels. Secondary features are correlated with differences in deceptive and truthful responses at statistically significant levels, but have weaker correlation coefficients compared to primary features. Also, secondary features provide information that is so strongly correlated with their primary counterparts that the added information is largely redundant and may not be additive to the effectiveness of some structural models. Some computerized scoring algorithms (Honts & Devitt, 1992; Kircher & Raskin, 1988; Krapohl, 2002; Krapohl & McManus, 1999; MacLaren & Krapohl, 2003; Nelson, Krapohl & Handler 2008; Raskin et al., 1988) in use today, and the evidence-based Empirical Scoring System (Nelson & Handler, 2010; Nelson et al., 2011) have been designed to use only primary features, forgoing reaction features considered secondary in importance. Primary features, sometimes referred to as “Kircher features,” are the following: respiration – observed as excursion length or correlated patterns, electrodermal activity - observed as the amplitude of vertical increase, and cardiovascular activity - observed as the amplitude of vertical increase in relative blood pressure.

Numerical transformations. Numerical scores, in the form of non-parametric integers of positive or negative value, are assigned to each presentation of each RQ by comparing the strength of reaction to each RQ with the strength of reaction to the CQs presented in sequence with the RQs. A fundamental assumption during comparison question testing is that both truthful and deceptive examinees may exhibit some degree of reaction to relevant question stimuli. Indeed, Ansley (1999), Ansley and Krapohl (2000) and Offe and Offe (2007) have shown empirically that it is not the presence or absence of a response, nor the linear magnitude of response to the relevant questions that discriminates deception from truth-telling. Instead, the simple relative magnitude or degree of response to CQs, relative to the degree of response to the RQs, is the differentiating characteristic between deceptive and truthful examinees.

Deceptive examinees generally exhibit larger magnitude of change in autonomic activity in response to relevant stimuli than comparison stimuli, while truthful examinees will generally exhibit larger magnitude of change to comparison stimuli than to relevant stimuli. Deceptive scores are assigned when the magnitude of change to relevant question stimuli is greater than the change to comparison question stimuli. Conversely, truthful scores are assigned whenever the degree of change in response to the comparison question stimuli is greater than responses to the relevant question stimuli. Numerous scientific reviews of countless scientific studies have affirmed the validity of the operational construct that responses to relevant and comparison stimuli vary as a function of deception or truth-telling regarding a past behavior (American Polygraph Association, 2011; Ansley, 1983; 1990; Abrams, 1973; 1977; 1989; National Research Council/National Academy of Science, 2003; Nelson & Handler, 2013; Office of Technology Assessment, 1983; Podlesny & Raskin, 1978; Raskin, Honts & Kircher, 2014).

Decision cutscores and reference distributions. Numerical test scores are translated into categorical test results

through the comparison of test scores with cutscores that are anchored to reference distributions that describe the statistical density or probability of obtaining each particular score within the range of possible test scores. Because all scientific data are a combination of both diagnostic variance (i.e., explained variance) and unexplained variance (i.e., error variance, random variance or uncontrolled variance), individual scores can be expected to vary somewhat within each examination. For this reason, aggregated test scores have been found to provide the greatest diagnostic efficiency. This is often in the form of a grand total score, though subtotal scores are also used with some polygraph techniques. Grand total and subtotal scores are compared to cutscores and statistical reference distributions to determine the likelihood that an observed test score has occurred due simply to uncontrolled error variance or random chance.

Probability cutscores are an expression of our tolerance for uncertainty or error, expressed as a statistical probability of error, often using the Greek letter α, declared prior to conducting an examination. A common probability cutscore in polygraph and other scientific disciplines is .05, with the goal of constraining the proportion of errors to 5% or less while attempting to provide a minimum confidence level of 95% for the categorical test result. Alternative probability boundaries of .10 and even .01, representing intended confidence levels of 90% and 99%, are sometimes used when testing objectives indicate a need for fewer inconclusive or unresolved test results (.10) or for fewer errors (.01).

Cutscores have also been determined using performance curves (Bell et al., 1999) and through heuristic experience. Regardless of the method used to determine numerical cutscores, all decision cutscores will have some associated statistical information to describe the level of significance or probability of error. The relationship between numerical scores and associated statistical reference distributions can be calculated mathematically, and can also be conveniently determined using published reference tables.

Reference distributions have been summarized (American Polygraph Association, 2011; Nelson & Handler 2015) in the form of descriptive statistics that inform us about the location (i.e., mean or average), dispersion (i.e., variance or standard deviation) and shape of the distribution of scores observed in the sampling data for criterion deceptive and criterion truthful persons. Published reference distributions can be used to calculate the margin of uncertainty, in the form of a level of statistical significance, odds ratio, confidence level or probability of error, associated with any possible test score. Test results are said to be statistically significant when the probability of error is less than or equal to a declared probability cutscore or alpha level (i.e., p <= α). This is equivalent to the condition when a test score equals or exceeds a cutscore.

Decision rules. Decision rules are the practical and procedural comparison of test scores with either traditional cutscores (Bell et al., 1999; Department of Defense, 2006a; 2006b; Kircher & Raskin, 1988; Raskin et al., 1988) or cutscores selected for their level of statistical significance and probability of error using published reference distributions (American Polygraph Association, 2011; Krapohl & McManus, 1999; Krapohl, 2002; Nelson, Krapohl & Handler, 2008; Nelson & Handler, 2010; 2015; Nelson et al., 2011). Procedurally, following the assignment of numerical scores, all scores are aggregated by summing the subtotal scores for all presentations of each RQ stimulus. Subtotal scores are then summed to achieve a grand-total score for event-specific diagnostic exams. Procedural decision rules are constructed with regard for assumptions of independent and non-independent criterion variance of the RQs of event-specific diagnostic exams and multi-issue screening exams.

Procedural decision rules for event-specific diagnostic examinations, for which the criterion variance of the several RQs is assumed to be non-independent or dependent (i.e., all RQs address a single event for which the criterion status of different test stimuli may be strongly related), will make

use of the grand total score to make the most accurate classification possible regarding the test as a whole. Some decision rules, such as those used by the Department of Defense (2006a; 2006b), as described by Light (1999), or those developed by Senter and Dollins (2002), may also make use of subtotal scores in an attempt to reduce inconclusive results, increase test sensitivity to deception, or reduce false-negative errors. Decision results involving the grand total, referred to herein as the grand-total-rule (GTR), are accomplished by comparing the grand total score to cutscores for deceptive and truthful classifications. When using the GTR, test results are statistically significant and a categorical conclusion is made if the grand total score equals or exceeds one of the cutscores.

A two-stage modification of the GTR, referred to herein as the two-stage-rule (TSR), was described by Senter (2003) and Senter and Dollins (2002; 2008). The TSR allows the use of subtotal scores to achieve a categorical conclusion when the grand-total score is not statistically significant (i.e., inconclusive). When used, subtotal scores should be compared with cutscores that are statistically corrected for the known inflation of alpha, and associated potential increase in false-positive errors, that results from the use of multiple statistical comparisons (Abdi, 2007; Nelson and Handler, 2010; Nelson et al., 2011; Nelson, Krapohl & Handler, 2008). Use of a simple statistical correction, referred to as a Bonferroni correction, can prevent an increase in false-positive errors when using subtotal scores of event-specific diagnostic exams.

By definition, the criterion states of the RQs of multi-issue screening exams are assumed to vary independently. For this reason, grand total scores are generally not used with multi-issue screening exams, for which the subtotal scores are more commonly used. Scores for multi-issue screening polygraphs are commonly evaluated using the individual subtotal scores (Department of Defense, 2006a; 2006b; Nelson and Handler, 2010; Nelson et al., 2011; Nelson, Krapohl & Handler, 2008), using a rule referred to herein as the subtotal-score-rule (SSR). The SSR is executed by comparing each subtotal score to cutscores derived from statistical distributions of subtotal scores of examinations constructed of questions for which the criterion state was assumed to vary independently (American Polygraph Association, 2011; Nelson and Handler, 2010; Nelson et al., 2011).

Although the SSR involves the use of individual subtotal scores, previous research (Barland, Honts & Barger, 1989; Podlesny & Truslow, 1993; Raskin, Honts & Kircher, 2014; Raskin, Kircher, Honts & Horowitz, 1988) has shown that although comparison question polygraph tests are effective at differentiating individuals who are truthful or deceptive, these tests are not as effective at determining the exact question or questions, within a series, to which an individual has lied or told the truth. The reasons for this may have to do with both the psychological and attentional demands of multiple independent stimulus targets, and the mathematical and statistical complexities that result from aggregating the sensitivity, specificity, false-positive and false-negative rates of multiple independent results. Test questions of multiple issue exams also have a shared, non-independent, source of response variance in the form of the examinee. For these reasons, the final classification of examination results as belonging to the groups of deceptive or truthful persons is always determined at the level of the test as a whole.

When using the SSR, the test result is classified as deceptive if any independent question produces a result that is statistically significant for deception, while truthful classifications require that the results of all independent questions are statistically significant for truth-telling. Field practices (American Polygraph Association, 2009a; Department of Defense, 2006a; 2006b) do not support the interpretation of responses to some questions as truthful and other responses as deceptive within a single examination. Of course, statistical methods involving regression, variance and covariance may provide capabilities not available within the simple procedural rubric of the SSR.

Subtotal cutscores for truth-telling should ideally be determined using

Polygraph, 2015, 44(1) 37 Nelson

procedures to statistically correct for the potential reduction of test specificity when requiring multiple statistically significant truthful scores before a truthful classification is made. A common solution for this correction in the statistical and mathematical sciences, as has been described in procedural methods for polygraph scoring (Nelson and Handler, 2010; Nelson et al., 2011), is the Šidák correction, used when requiring multiple statistically significant independent probability events (Abdi, 2007). Some procedures involve the use of subtotal scores with traditional cutscores that are not derived from statistical reference distributions but are instead based on classification performance curves or heuristic experience (Bell et al., 1999; Department of Defense, 2006a; 2006b).

Physiological basis for the polygraph

Although a thorough and detailed description of the physiological responses recorded by the polygraph is beyond the scope of this paper, a practical description of polygraph physiology is inextricably linked with the need to translate changes in recorded data into the form of test scores and test results. In contrast to earlier models for test data analysis that relied somewhat heavily on pattern recognition as a means of monitoring and observing physiological activity (Department of Defense, 2004), evidence-based models in use today will employ only those physiological features that are amenable to measurement and that have been shown, through published and replicated peer reviewed scientific studies, to be correlated at statistically significant levels with differences in response to different types of test stimuli that occur as a function of deception or truth-telling regarding past behavior (Bell et al., 1999; Harris, Horner & McQuarrie, 2000; Kircher, Kristjansson, Gardner & Webb, 2005; Kircher & Raskin, 1988; Podlesny & Truslow, 1993; Raskin, Kircher, Honts & Horowitz, 1988). Polygraph recording instrumentation has tended to focus on the acquisition of physiological response data that is of practical use to the task of scoring and interpreting polygraph test results, with few capabilities beyond that objective. For example: polygraph instrumentation is not used to evaluate cardiovascular or respiratory health.

Polygraph instrumentation consists minimally of three component sensors: two pneumograph sensors (thoracic and abdominal) to record breathing movement activity, electrical sensors to record autonomic activity in the palmar or distal regions (Handler, Nelson, Krapohl & Honts, 2010), and cardiovascular sensors to record relative changes in blood pressure (American Polygraph Association, 2009a, 2009b; Department of Defense 2006a). Vasomotor sensors (Kircher & Raskin, 1988; Bell et al., 1999) are regarded as optional components of the polygraph instrument. Field testing protocols since 2007 have recommended the use of activity sensors to aid in the detection of countermeasure activity, and these sensors are now required by the American Polygraph Association (2011a) as of January 1, 2012. This core combination of required sensors has been studied for several decades, and has been empirically shown to produce numerical scores that are structurally correlated with the criterion states of deception and truth-telling in statistical reference distributions from development and validation samples used in both field and laboratory studies (American Polygraph Association, 2011; Bell, Raskin, Honts & Kircher, 1999; Harris & Olsen, 1994; Harris, Horner & McQuarrie, 2000; Horowitz, Kircher, Honts & Raskin, 1997; Kircher & Raskin, 1988; Kircher, Kristjansson, Gardner & Webb, 2005; MacLaren & Krapohl, 2003; Offe & Offe, 2007; Olsen, Harris & Chiu, 1994; Raskin, Kircher, Honts & Horowitz, 1988).

The physiological mechanics of polygraph responses during comparison question tests occur in the context of the autonomic nervous system (ANS) which includes both sympathetic (S/ANS) and parasympathetic (PS/ANS) components (Bear, Barry, & Paradiso, 2007; Costanzo, 2007; Maton et al., 1993; Paradiso, Bear, & Connors, 2007; Silverthorn, 2009; Standring, 2005). The ANS regulates involuntary processes including cardiac rhythm, respiration, salivation, perspiration, and other forms of arousal. S/ANS activity is responsible for stimulation of the internal organs in response to activity demands.
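
As a brief numerical aside on the subtotal-score corrections discussed earlier (the Bonferroni correction for multiple comparisons and the Šidák correction for multiple independent probability events), the following is a minimal Python sketch of the underlying arithmetic. It assumes fully independent comparisons. The function names, the alpha of .05, the three-question example and the .90 per-question rate are illustrative assumptions only, and are not taken from any published polygraph scoring procedure.

```python
# Illustrative sketch only: textbook formulas under an independence assumption.
# All names and example values here are hypothetical, not from a scoring manual.

def familywise_error(alpha: float, n: int) -> float:
    """Chance of at least one false positive across n independent
    comparisons that each use the same per-comparison alpha."""
    return 1.0 - (1.0 - alpha) ** n


def bonferroni_alpha(alpha: float, n: int) -> float:
    """Per-comparison alpha that keeps the familywise rate at or below alpha."""
    return alpha / n


def sidak_alpha(alpha: float, n: int) -> float:
    """Per-comparison alpha that makes the familywise rate exactly alpha
    when the n comparisons are independent."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n)


def joint_rate(per_question_rate: float, n: int) -> float:
    """Chance that all n independent questions each produce a result that
    individually occurs with probability per_question_rate."""
    return per_question_rate ** n


if __name__ == "__main__":
    alpha, n = 0.05, 3  # hypothetical values
    print(f"uncorrected familywise error: {familywise_error(alpha, n):.4f}")
    print(f"Bonferroni per-question alpha: {bonferroni_alpha(alpha, n):.4f}")
    print(f"Sidak per-question alpha:      {sidak_alpha(alpha, n):.4f}")
    print(f"joint rate for all-of-n rule:  {joint_rate(0.90, n):.4f}")
```

Under these assumptions, three uncorrected comparisons at alpha = .05 carry a familywise false-positive risk of about .14, and requiring three independent results that each occur with probability .90 yields a joint probability of about .73. This is the kind of alpha inflation and specificity loss that the corrections are meant to offset.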


PS/ANS activity serves to reduce physiological activation to the minimum level necessary to ensure both longevity and adequate response to situational demands. PS/ANS and S/ANS activity are therefore in homeostatic balance with respect to real or perceived demands. The alternative to homeostatic balance is a general state of disease that may eventually lead to death. For this reason, every form of change in the ANS can be thought of as intended to maintain homeostasis and survival.

Polygraph examiners are primarily interested in recording and observing S/ANS activity, but it is important to understand that some activity, as with some cardiovascular data and respiration responses of interest to polygraph examiners, may actually be the result of changes in PS/ANS activity. (For more information about the dual innervation of the autonomic nervous system the reader is directed to more complete works by Janig (2006), Porges (2014), Handler and Reicherter (2014)). The process of change, with the goal of maintaining homeostasis, is referred to as allostasis (Sterling & Eyer, 1988; Berntson & Cacioppo, 2007). Changes recorded during polygraph testing can be thought of as allostatic changes (Handler, Rovner & Nelson, 2008) that occur in an attempt to attain or maintain homeostasis.

Observable and recordable changes in physiological activity that are structurally correlated with deception and truth-telling during comparison question testing include the following three features: 1) subtle and temporary respiratory suppression (i.e., suppression or reduction of respiratory movement), 2) relative magnitude of phasic electrodermal activity indicative of increased S/ANS activity, 3) relative magnitude of phasic response in the moving average of relative blood pressure. These measurable reactions have been described in several publications (ASTM International, 2002; Bell et al., 1999; Department of Defense, 2006a, 2006b; Harris, Horner & McQuarrie, 2000; Kircher & Raskin, 1988; Kircher et al., 2005; Krapohl & McManus, 1999; Raskin & Hare, 1978; Raskin et al., 1988). Physiological responses for these three primary sensors (respiratory suppression, electrodermal activity, and cardiovascular activity) are easily observed and recorded.

Common misconceptions about the polygraph include the notion that the polygraph measures deep or rapid breathing, sweaty palms or sweating activity, and rapid or increasing heart rate activity. Of these, the first two, increased respiratory activity and sweating activity, are known to be inaccurate and unsatisfactory explanatory models for polygraph reactions. Only the third, heart rate, has been included in validated statistical classifiers for deception and truth-telling. However, it is slowing of cardiac rhythm, not increase, which is correlated with deception (Kircher et al., 2005; Raskin & Hare, 1978). Pulse rate is included in the statistical model for the PolyScore algorithm (Blackwell, 1998; Dollins, Krapohl & Dutton, 1999; Dollins, Krapohl & Dutton, 2000; Harris & Olsen, 1994; Olsen et al., 1991; Olsen et al., 1994; Olsen, Harris, Capps & Ansley, 1997). Changes in heart rate are not included in other validated statistical models for scoring comparison question tests. Pulse rate activity is therefore rarely included in polygraph decisions in field settings.

Manual analysis of the relative or absolute magnitude of change or response in phasic activity is easily accomplished for electrodermal activity and cardiovascular activity. Electrodermal data has been shown to be a strong indicator of S/ANS arousal (Boucsein, 2012), and to be the most robust and reliable contributor to the final score and resulting classification of comparison question polygraph test results (Ansley & Krapohl, 2000; Harris & Olsen, 1994; Kircher, 1981; 1983; Kircher & Raskin, 2002; Kircher & Raskin, 1988; Kircher et al., 2005; Krapohl & McManus, 1999; Nelson, Krapohl & Handler, 2008; Olsen et al., 1997; Raskin et al., 1988).

Reactions to test stimuli can be evaluated through either non-parametric observation or through linear measurement. However, polygraph instrument manufacturers have not completely standardized the signal processing and feature extraction methods whereby data obtained during polygraph testing are to be anchored to linear changes


in physiological activity. As a result, measured responses to polygraph stimuli are used only in automated statistical classification models within the polygraph testing paradigm and may not be directly related to measurements of similar physiological activity as utilized in medical fields. Most polygraph scoring paradigms will use a non-parametric feature extraction method.

Cardiovascular responses during comparison question testing have been shown to be correlated with the criterion categories of deception and truth-telling at statistically significant levels (Bell et al., 1999; Harris et al., 2000; Kircher & Raskin, 1988; Kircher et al., 2005; Nelson et al., 2008; Raskin et al., 1988). The structural correlation for cardiovascular response activity has been shown to be weaker than that of electrodermal response data, though stronger than that of respiratory response data. Diagnostic features and the interpretation of cardiovascular response activity were described by Handler and Reicherter (2008) and Handler, Geddes and Reicherter (2007). Some field polygraph examiners make use of photoelectric plethysmograph data, a form of cardiovascular recording for which information has been described by Handler and Krapohl (2007), Geddes (1974) and Honts, Handler, Shaw & Gougler (2015).

Of the three physiological sensors, respiratory data has been found to be the most susceptible to disruption from voluntary activity during polygraph testing. Respiration data has the weakest structural coefficients of the required polygraph sensors (Harris & Olsen, 1994; Harris et al., 2000; Kircher & Raskin, 1988; Kircher et al., 2005; Nelson, Krapohl & Handler, 2008; Olsen et al., 1997; Raskin et al., 1988). However, field examiners have learned to evaluate respiration data for indicators of cooperation or non-cooperation during testing, in addition to evaluating respiration data for indicators of deception and truth-telling. Some research (Kircher et al., 2005) has suggested that pneumograph data may be less diagnostic during comparison question tests conducted using DLC exams, while other findings have shown that respiration data of DLC exams does contain useable diagnostic information (Honts & Handler, 2014).

Respiratory suppression, though accurately measured by the curvilinear distance (Kircher & Raskin, 1988; Raskin et al., 1988; Timm, 1982) or sum of absolute magnitude of change in y-axis excursion (Kircher & Raskin, 2002), is not easily measured without mechanical devices. Field polygraph examiners are taught to evaluate recorded data for the presence or absence of reaction patterns that have been described as correlated with the criterion categories of deception and truth-telling (Raskin & Hare, 1978; Bell et al., 1999; Harris et al., 2000; Kircher & Raskin, 1988; Kircher et al., 2005; Raskin et al., 1988). Pattern features shown to be correlated with respiratory suppression and CQT criterion categories are few, and include the following: 1) a subtle and temporary reduction of the tidal or inhalation volume resulting in a reduction of the y-axis (vertical) magnitude of the respiratory tracings for multiple respiratory cycles following the onset of the test question stimulus, 2) a subtle and temporary slowing of respiratory rate for multiple respiratory cycles following the onset of the test question stimulus, and 3) a subtle and temporary elevation of the exhalation baseline or residual volume for multiple respiratory cycles following the stimulus onset. Apnea is also correlated with differences in deception and truth-telling (Bell et al., 1999; Kircher & Raskin, 1988), but can be easily feigned.

Polygraph sensors, while capable of recording sympathetic autonomic responses to test stimuli, are non-robust against disruptive somatic or physical activity that is sometimes not easily observed. In response to concerns about the potential for attempted faking during testing (i.e., countermeasures), somatic activity sensors have been developed to detect and record both overt and covert physical activity. There is indication in the literature that somatic activity sensors can increase examiners’ ability to observe and detect these attempts (Ogilvie & Dutton, 2008; Stephenson & Barry, 1986). In the absence of recorded data of artifacted, odd or uninterpretable quality that indicates overt or covert physical activity, field examiners will assume that responses recorded by the respiration, electrodermal and cardiovascular


sensors have their origins in the ANS and are not altered or contaminated by covert somatic activity.

Psychological basis for the polygraph

A satisfactory psychological theory will parsimoniously and holistically account for the variety of known and observed phenomena associated with the polygraph test. Such a theory will explain electrodermal responses, cardiovascular responses, and respiratory responses, to both PLC and DLC questions, and will contribute to our understanding of test accuracy with both psychopathic and non-psychopathic persons. Moreover, a sound understanding of the psychological basis of the polygraph test will enable us to better understand issues of test suitability and unsuitability (i.e., for whom the test may or may not work). A comprehensive theoretical understanding will also address the psychological basis for responses to different testing paradigms, such as the CQT and other polygraph and lie-detection paradigms, including the concealed-information-test (CIT), which uses similar recorded physiological signals as a basis for ipsative calculations of the statistical significance of differences in responses to different test stimuli. Finally, a satisfactory psychological theory for polygraph testing will achieve a coherent integration of scientific knowledge regarding the polygraph with extant knowledge in related fields of science including cognitive, social and behavioral psychology, psychophysiology, signal detection theory, decision theory, statistical learning theory and more.

While a comprehensive discussion of the psychological basis for polygraph testing is beyond the scope of this paper, a brief explanation will hold that the psychological basis for responses to polygraph test stimuli involves a constellation of simple psychological mechanisms including cognition, emotion, and behavioral conditioning (Handler & Nelson, 2007; Handler, Shaw & Gougler, 2010; Kahn, Nelson, & Handler, 2009; Senter, Weatherman, Krapohl, & Horvath, 2010). Recently, preliminary process theory, related to orienting theory (Barry, 1996), has been suggested as a potentially parsimonious explanation for observed differences in response to different test stimuli (Palmatier & Rovner, 2014), though more discussion is needed to fully understand the advantages and limitations of this theory as applied to the polygraph. Until more detailed evidence is described, a general construct would suggest that all responses to test stimuli result from some combination of mental activity, emotion, and behavioral conditioning. All of these may play a role in physiological reactions that are loaded differentially for different types of polygraph test stimuli (i.e., relevant and comparison questions) as a function of deception or truth-telling in response to relevant stimuli that describe a behavioral issue of concern. It will be important to refrain from attempting to define which single emotion, or define the exact focus of attention and cognition within the examinee until such time as evidence exists to verify a more detailed description.

Field examiners have tended historically to simplify the explanation of polygraph psychology to a minimum level that satisfies both themselves and their examinees. This was often done using a scientifically unsatisfactory explanation of “psychological set” (see Handler & Nelson, 2007) as related to the fight-or-flight response that has been attributed to Cannon (1929). Although now regarded as an inadequate model for both polygraph responses and stress responses in general (Bracha et al., 2004; Taylor et al., 2000), the application of this now problematic hypothesis holds that examinees will focus their attention and physiological response to the question or issue that presents the greatest immediate threat to their survival and well-being. The most obvious evidence of the limitations of the “psychological set” hypothesis is that it cannot account for the effectiveness of DLCs, and does not adequately explain test effectiveness with psychopaths who have been shown to have low levels of fear conditioning (Birbaumer et al., 2005). Additionally, the “psychological set” requires the assumption that polygraph sensors can identify different types of emotions, though the literature does not support this notion (Kahn, Nelson, & Handler, 2009). Moreover, this explanation suffers from a fundamental vulnerability to


suggestions that it is pseudoscientific because it cannot satisfy a fundamental scientific requirement for falsifiability (Popper, 1959).

Handler & Nelson (2007) described the troublesome origins of the term “psychological set” which does not appear in the scientific psychological literature in the form employed by polygraph examiners. Differential salience has been suggested as a more general and parsimonious psychological theory that is more consistent with the field of scientific psychology, including emotion, cognition and conditioned learning as a basis of response to polygraph stimuli (Senter, Weatherman, Krapohl & Horvath, 2010).

Polygraph responses might also be accounted for using the conceptual framework of behavioral conditioning, as first described by Pavlov (1927), and learning theory, including the concepts of sensitization and habituation (Domjan, 2010; Groves & Thompson, 1970). A conditioned learning model for responses to polygraph stimuli suggests that involvement in a serious transgression amounts to a form of single-trial behavioral conditioning with test questions functioning as a conditioned stimulus. Polygraph interviewing theory holds that a thorough and effective pretest interview will give the truthful examinee an opportunity to habituate to test questions, while causing a deceptive examinee to become sensitized to the test questions as a conditioned stimulus.

Cognitive-behavioral theory, which includes cognition, emotion and behavioral/experiential learning as a basis of physiological response, has also been suggested as an explanatory hypothesis for the variety of known polygraph phenomena (Kahn, Nelson & Handler, 2009), and this model is consistent with the salience hypothesis described by Handler and Nelson (2007) and Senter et al. (2010). A generalization of the cognitive-behavioral model for polygraph reactions suggests that truth-telling presents simpler cognitive and emotional task demands than deception.

A cognitive-behavioral and differential salience model would hold that physiological responses to a repeated sequence of polygraph test stimuli will be loaded onto different types of test stimuli as a function of deception and truth-telling regarding the investigation target issues, and that the basis of observed responses can be thought of as originating in cognition, memory, emotion and conditioned experience relative to the test stimuli. Relative differences in response to different types of test stimuli can be compared with statistical reference distributions and evaluated for their level of statistical significance to quantify the margin of uncertainty regarding a categorical conclusion of deception or truth telling. This form of theoretical explanation is fundamentally testable and therefore fundamentally scientific (Popper, 1959).

Accuracy of polygraph examinations

Results from several decades of scientific study have consistently supported the validity of the hypothesis that the combination of instrumental recording and statistical modeling can discriminate deception and truth-telling at rates significantly greater than chance. Scientific reviews of peer reviewed polygraph studies have borne this out repeatedly. Abrams (1989) surveyed the published literature and reported an accuracy level of .89. Honts and Peterson (1997), Raskin (2002), and Raskin & Podlesny (1979) reported the accuracy of polygraph studies as exceeding .90. The systematic review completed by the Office of Technology Assessment (1983) suggested that laboratory studies had an average unweighted accuracy of .83, with slightly higher accuracy, .85, from field studies at the time. Crewson (2001) reported an accuracy rate of .88 for diagnostic polygraphs in a comparison with medical and psychological tests. The National Research Council (2003) concluded with reservation that the polygraph differentiated deception from truth-telling at rates that were significantly greater than chance though less than perfect, and reported a median ROC of .89 for field studies and .86 for laboratory studies.

Different types of studies offer different advantages and disadvantages. Field studies offer assumed ecological validity, but are accompanied by a lack of experimental


control, and by inconsistent case confirmation and non-random case selection, making generalization of some field study results troublesome or impossible. Laboratory studies offer the potential for random sampling and sufficient experimental control to study questions of causality, but have unknown ecological validity. The general trend in psychological research has been a high level of correspondence between field and laboratory studies (Anderson, Lindsay & Bushman, 1999), and this fact underscores the need to avoid confusing ecological validity with external validity.

External validity and ecological validity are not synonymous. External validity, the ability to generalize results to field settings, is often achieved from scientific studies in laboratory settings with imperfect ecological validity. Previous studies by the Office of Technology Assessment (1983), the National Research Council (2003), the American Polygraph Association (2011), and Pollina et al. (2004) showed no statistically significant differences in the results of polygraph test accuracy in field and laboratory studies.

The most recent scientific review of comparison question polygraph techniques in present use (American Polygraph Association, 2011) reported a mean accuracy of .89 for event-specific diagnostic polygraphs, with some evidence-based methods having been shown to provide mean accuracy levels in excess of .90. Multi-issue polygraphs, of the types used in operational security, law enforcement pre-employment, and post-conviction screening programs, have been shown to have a mean accuracy rate of .85. More important than mean accuracy statistics are the 95% confidence ranges that surround those point estimates, described later in this report, especially the lower limit of test accuracy.

Earlier published scientific reviews (Abrams, 1973; Ansley, 1983; Ansley, 1990) have reported higher rates of test accuracy, often in the upper 90s. Results from these earlier systematic reviews are now thought to be confounded by sampling methodologies that may have overemphasized examiner self-report, possibly without researcher access to the recorded physiological data or numerical scores, and may have overemphasized the use of confession information as the case confirmation and selection criteria. These factors can potentially introduce non-random and non-representative case selection criteria that can systematically exclude both false-negative and false-positive error cases for which a confession is not likely to be obtained. Advocacy research involving proprietary polygraph techniques seems to have also resulted in the production of exaggerated accuracy estimates, often near perfect, sometimes involving the principal investigator as examiner, scorer and technique developer/proprietor. Individual studies reporting near-perfect accuracy have been described as seriously methodologically flawed (American Polygraph Association, 2011).

As with all forms of scientific testing, diagnostic tests conducted in response to a single issue of concern, for which there is evidence of a problem, will provide greater overall accuracy than multi-issue exams (American Polygraph Association, 2011; Crewson, 2001) that are intended to simultaneously test several issues for which the criterion variance is assumed to be independent (i.e., a person could lie to one or more investigation target questions while not lying to other investigation targets). Multiple-issue examinations involve more probabilistic and statistical decisions, and therefore a greater aggregated potential for error and uncertainty compared to single issue exams. Other causes for differences in accuracy among diagnostic and screening exams may involve the competing attentional demands of multiple test target stimuli, and the potential that screening exam formats may at times be systematically biased for test sensitivity, with the goal of slightly over-predicting problems that can be resolved upon further investigation. It is also possible that some screening studies were completed using sub-optimal decision rules and cutscores that were not derived through scientific analysis.

The aggregated sensitivity rate for deception during diagnostic polygraphs was reported as .84 (95% confidence range .73 to .93) and the aggregated specificity rates for truth telling during diagnostic polygraphs


was reported as .77 (95% confidence range .65 to .85) (American Polygraph Association, 2011). Evidence indicates that deceptive persons have a statistically significantly greater than chance probability of failing a polygraph test while truthful persons have a statistically significantly greater than chance probability of passing a polygraph.

Aggregated error estimates for polygraph diagnostic tests were calculated in the most recent meta-analytic survey using 24 peer reviewed scientific studies involving 8,975 confirmed scores of field and lab studies, and were reported as .08 for false negative errors (i.e., deceptive persons who pass the polygraph) and .12 for false positive errors (i.e., truthful persons who fail the polygraph). Inconclusive rates were reported as .09 for deceptive persons and .13 for truthful persons. The 95% confidence range for decision accuracy of event-specific diagnostic polygraphs was .83 to .95 (American Polygraph Association, 2011). Quite obviously, deviations from empirically validated testing protocols may decrease expected test accuracy, and, of course, these accuracy estimates assume that each examination is conducted on a suitable examinee.

The aggregated sensitivity rate for deception during multi-issue polygraphs, of the type employed in polygraph screening programs, was reported as .77 (95% confidence range .60 to .90) and the aggregated specificity rate for truth-telling during multi-issue criterion independent polygraphs was reported as .72 (95% confidence range .63 to .81) (American Polygraph Association, 2011). Evidence indicates that deceptive persons have a statistically significantly greater than chance probability of failing a test constructed of multiple independent issues, while truthful persons have a statistically significantly greater than chance probability of passing a similarly constructed multi-issue polygraph.

Aggregated error estimates for polygraph tests constructed from test questions for which the criterion variance is assumed to be independent were calculated from 14 peer reviewed studies involving 1,194 confirmed scores of field and lab studies, and were reported as .11 for false negative errors and .14 for false positive errors. Inconclusive rates were also reported as .11 for deceptive persons and .14 for truthful persons. The 95% confidence interval for the unweighted average accuracy rate for polygraph examinations constructed of test questions for which the criterion variance of the relevant questions was assumed to be independent was reported as .77 to .93 (American Polygraph Association, 2011).

For diagnostic polygraphs, the Positive Predictive Value (PPV), or probability that a failed polygraph result is correct, was reported as .89 (95% confidence range .81 to .99), while the Negative Predictive Value (NPV), or probability that a passed polygraph result is correct, was reported as .91 (95% confidence range .82 to .99) (American Polygraph Association, 2011). For screening polygraphs, PPV was reported as .83 (95% confidence range .71 to .94), while NPV was reported as .88 (95% confidence range .78 to .97).

Conservative judgment necessitates the selection of the lower end of the confidence limit as the boundary that we can be confident polygraph accuracy exceeds. Therefore, diagnostic polygraphs can be assumed to provide accuracy over .81 for deceptive results, and over .82 for truthful results, while screening polygraphs can be assumed to provide accuracy over .71 for deceptive results and over .78 for truthful results. However, PPV and NPV are non-resistant to differences in base-rates, and figures reported herein apply only to balanced groups of polygraph exams. Common inferential estimates of test accuracy (e.g., sensitivity, specificity, inconclusive and error rates) are resistant to differences in base-rates and can be more useful when interpreting the meaning of the result of a single examination, such as when a court is evaluating an individual case.

Threats to polygraph accuracy

Because polygraph tests, like all tests, are inherently probabilistic (i.e., they are neither deterministic observation nor physical measurement), they are not perfect. No probabilistic test is completely immune to

44 Polygraph, 2015, 44(1) Scientific Basis of Polygraph

potential error or threats to test accuracy. Although the National Research Council (2003) could not find any scientific evidence that any personality type or endogenous factors significantly affect polygraph test accuracy, it is commonly understood that polygraph test accuracy may be compromised or reduced by the health, level of functioning or suitability of the examinee.

Abrams (1975) showed that polygraph test accuracy was reduced significantly with the level of functional maturity for young juveniles. In other studies, Abrams and Weinstein (1974) showed the polygraph cannot be expected to be accurate with subjects who have chronic mental health diagnoses within the psychotic spectrum of disorders, and further showed that polygraph accuracy is unstable for people whose intellectual abilities are below the lower limit of the normal range (Abrams, 1974).

Although developmental problems, low intellectual functioning, low functional maturity, and psychosis can adversely affect polygraph accuracy, there is no evidence that psychopathic personality issues will adversely affect polygraph test accuracy. Barland and Raskin (1975) studied criminal suspects with high psychopathic deviate scores on MMPI testing and showed no significant differences in the ability to detect deception. Patrick and Iacono (1989) also showed no significant differences in the detection of deception among psychopathic and non-psychopathic inmates. Raskin and Hare (1978) reported the same conclusion with a different sample of inmate subjects. Balloun and Holmes (1979) showed that polygraph accuracy using a guilty knowledge test paradigm was also not significantly different for college students with high and low psychopathic deviate scores on MMPI testing.

Although both the Office of Technology Assessment (1983) and the National Research Council (2003) expressed concern at the notion that polygraph test accuracy may be lower for persons with dangerous personality profiles, both reported that the published scientific evidence does not support, and consistently refutes, the hypothesis that psychopaths believe their lies and can therefore defeat the polygraph. In summary, polygraph testing with psychopathic persons can be assumed to be similar - as accurate and as inaccurate - as that with non-psychopathic persons. Regardless, public and media reactions may tend to simplistically assume that a person has “beaten” the polygraph whenever a testing error is observed. Some proportion of testing errors should not be surprising unless the proportion of errors can be shown as exceeding the 95% confidence interval for normally expected error rates.

In response to concerns that examinations conducted under friendly circumstances, such as those conducted under attorney-client privilege, have less validity than those conducted by law enforcement examiners, Honts and Peterson (1997) described flawed logic and reliance on a false hypothesis as the basis of this concern, and summarized the findings reported by Honts (1997) who investigated the hypothesis through logic, case analysis, and meta-analysis. At the present time there is no basis of evidence to support, and available evidence contradicts, the notion that exams conducted under attorney-client privilege offer reduced accuracy or validity. These findings underscore the value of structured quantitative methods and the importance of objectivity and reproducibility when analyzing test data.

A final concern regarding polygraph test accuracy involves the possibility that countermeasures (i.e., faking) might be effective at altering the test outcome. This hypothesis represents an important concern for U.S. government and operational security programs, as well as for law enforcement pre-employment screening, and post-conviction supervision screening tests with convicted offenders. Despite the importance of this concern, only a small number of studies have been published on the topic of faking and polygraph accuracy.

Rovner (1979; 1986) and Rovner, Raskin and Kircher (1979) showed that having access to information about the polygraph technique was insufficient to significantly alter test accuracy. A concerning finding in these studies was that truthful subjects who attempted to employ

Polygraph, 2015, 44(1) 45 Nelson

countermeasures, in an attempt to increase their assurance of passing, actually increased their likelihood of being classified as deceptive. These findings led the National Research Council (2003) to conclude that countermeasure use by truthful examinees was not advisable.

Timm (1991) reported post-hypnotic suggestion to be ineffective as a polygraph countermeasure, while Ben-Shakhar and Dolev (1996) along with Elaad and Ben-Shakhar (1991) suggested that mental efforts may have an effect on the electrodermal channel primarily. Studies by Iacono, Boisvenu and Fleming (1984) and Iacono, Cerri, Patrick and Fleming (1992) showed benzodiazepines and stimulant medications to be ineffective countermeasures, though an earlier study by Waid, Orne and Orne (1981) indicated that meprobamate may adversely affect polygraph results. An inherent limitation to our acquisition of additional knowledge in this area will be that obtaining ethics committee approval to fully explore the effects of drugs or psychiatric medication on polygraph test results may be difficult or unlikely.

Concerning results regarding polygraph countermeasures have been described by Honts (1987), Honts, Amato and Gordon (2004), Honts and Hodes (1983), Honts, Hodes and Raskin (1985), Honts, Raskin and Kircher (1987), and by Honts, Raskin, Kircher and Hodes (1988), whose collective work began to suggest that human polygraph experts are not as effective as they claim at differentiating countermeasure use from other artifacts. Additionally, Honts et al. (1988) and Honts and Reavy (2009) found that spontaneous countermeasure use was not uncommon among both deceptive and truthful examinees. Raskin and Kircher (1990) reported that training in physical countermeasures can reduce polygraph test accuracy, and Honts, Raskin and Kircher (1994) reported mental and physical countermeasures as equally effective. In a different study, Honts, Winbush and Devitt (1994) reported that mental countermeasures can be used to defeat guilty knowledge tests. In another study, Honts and Amato (2001) again reported that countermeasure attempts by truthful subjects resulted in the production of more deceptive test scores. Somatic activity sensors, and testing procedures intended to elucidate both mental and physical countermeasures, became more widely used following these studies, and replication of these studies is needed using contemporary testing instrumentation and methodologies.

Activity sensors are designed to be sensitive to somatic/behavioral nervous system activity while remaining robust against recording the effects of ANS activity of interest to polygraph test data analytic models. The rationale for activity sensors involves the fact that polygraph component sensors, while intended to be sensitive to sympathetic autonomic nervous system activity, are non-robust against also recording the effects of somatic activity. Stephenson and Barry (1986) along with Ogilvie and Dutton (2008) showed that the addition of an activity sensor can increase the detection of somatic activity, and this may reduce the occurrence of false accusations of countermeasure use. It is assumed that observable activity indicates that the recorded polygraph data is likely to be an adulterated composite of both autonomic and somatic activity. Correspondingly, the absence of somatic activity would indicate the recorded autonomic data is most likely unaltered and authentic.

Statistical methods have not yet been widely exploited in countermeasure detection. However, the OSS-3 algorithm (Nelson et al., 2008) includes a procedural requirement to review data for interpretable data quality and mark any segments of data that are artifacted by movement or other problem activity. The OSS-3 algorithm will aggregate the number and location of indicated artifact events and then calculate the statistical probability that observed artifacts have occurred due to random causes. The algorithm will alert the examiner to the possibility that an examinee may have attempted to systematically or intentionally alter the recorded physiological data whenever the likelihood falls below an established alpha boundary for statistical significance. Statistical methods have not yet been exhaustively studied, and additional research is needed regarding the application of statistical and computational methods to


detect countermeasure attempts.

Strategies have been described involving both covert physical/muscle activity and mental activities. An alternative, social, countermeasure strategy will be an attempt to convince the examiner to ignore recorded indicators of deception as the result of some alternative cause.

Other forms of potential countermeasures may involve the use of medications or drugs, or physical exhaustion, and the use of mental efforts such as hypnosis, meditation or mental activity. In general, polygraph countermeasures can be expected to attempt to either dampen or exaggerate responses to the entire set of test questions, resulting in an increased likelihood of an inconclusive test result. Alternatively, examinees may attempt to strategically alter responses to relevant or comparison test stimuli. Because it appears unwise to exaggerate responses to relevant stimuli, and because the reliable suppression of responses to only some test questions will present non-trivial complexities to the examinee, polygraph countermeasure strategies will most likely involve attempts to either dampen or disrupt the test as a whole or to augment or increase responses to comparison question stimuli.

The goal of countermeasures or faking attempts, in terms of test data analysis, is to substantially alter the diagnostic and error variance contained in the recorded data such that it is uninterpretable or such that there is no correlation between the deceptive or truthful criterion state and reaction differences that occur in response to different types of test stimulus questions. This would produce a test result that is inconclusive because the data are either uninterpretable or not statistically significant.

A sophisticated countermeasure objective would propose to alter the recorded test data such that the mathematical, statistical and computational methods to partition and compare the sources of variance would result in a direct reversal of the valence of the correlation coefficients for the criterion states of deception and truth-telling. Successful countermeasure attempts of this type would also require that the adulteration of both diagnostic and error variance in the recorded physiological data is accomplished in a manner that will convince trained examiners of the authentic quality of the recorded data. The inherent difficulty of this challenge is made more difficult by the fact that polygraph testing, as with other forms of testing, can include mechanisms and analytical procedures designed to quantify the probability that a person has attempted to engage in countermeasures.

In response to the complexity of the issues, assertions and conclusions surrounding the potential for countermeasures during polygraph testing, the National Research Council (2003) wrote the following:

“Because the effective application of mental or physical countermeasures on the part of examinees would require skill in distinguishing between relevant and comparison questions, skill in regulating physiological response, and skill in concealing countermeasures from trained examiners, claims that it is easy to train examinees to “beat” both the polygraph and trained examiners require scientific supporting evidence to be credible. However, we are not aware of any such research.” (p.147).

The literature review of the National Research Council (2003) was unable to produce evidence supporting the hypothesis that polygraph countermeasures and faking attempts are effective at assisting deceptive persons to defeat comparison question polygraphs as they are presently used in field settings. However, this should not be interpreted as support of the infallibility of the polygraph, nor an assertion that nobody has ever passed a polygraph in error.

Although all test paradigms may have some potential vulnerability to exploitation, data at this time suggest that systematic efforts to alter the polygraph results are not well supported by evidence. Neither are claims that the polygraph test is infallible. Polygraph test results remain a probabilistic


estimate of the degree of uncertainty surrounding a categorical conclusion. What is known at this time is that virtually all guilty or deceptive persons who agree to undergo polygraph testing may attempt to engage in some form of activity in an attempt to achieve a negative test result.

Polygraph has been shown to be an effective, even if imperfect, tool for discriminating deception and truth-telling even in field studies involving persons who were suspected of actual, sometimes serious, crimes - some of whom can be assumed to have attempted to engage in some form of countermeasure to pass the test while lying. However, even a remote potential for effective countermeasure use - and the relationship of this to polygraph accuracy in government operational security, law enforcement pre-employment, and post-conviction supervision of convicted offenders in community settings - will mean that the interplay between countermeasure attempts and the authenticity of recorded test data will remain an important area of concern. Regardless of the effectiveness or ineffectiveness of countermeasures and faking attempts, additional and continuous research is needed to more fully understand the vulnerability to countermeasures of contemporary polygraph testing procedures.

Contribution of polygraph results to professional decisions

In an attempt to quantify the value and contribution of polygraph results to professional judgment, Honts and Schweinle (2009) described the use of the Information Gain Index (IGI, Wells & Olson, 2002) with diagnostic and screening polygraphs. IGI is a measurement of the increase in decision accuracy that results from a method, and offers the advantage of providing information across the spectrum of prior base-rates – something other statistical metrics have failed to do efficiently. This method compares the increase in decision accuracy provided by polygraph results with unassisted professional judgment – for which Vrij (2008) showed that police officers achieve a decision accuracy of 56% when attempting to determine deception or truth telling. The importance of the IGI statistic is that it provides information about the increase in decision accuracy across the entire range of possible base-rates.

Honts and Schweinle (2009) showed that diagnostic polygraphs significantly increase the decision accuracy for both deceptive and truthful examinees, peaking at 27 times the accuracy of unassisted decisions, under assumed high base-rate conditions, with statistically significant increases in decision accuracy, compared to unassisted expert judgment, from base-rates .01 to .97 for deceptive outcomes and .03 to .99 for truthful outcomes. Screening polygraphs, often conducted under assumed low base-rate conditions, showed a statistically significant increase in the accuracy of decisions regarding deception, but no significant increase in the accuracy of decisions regarding truth telling. The increase in decision accuracy for deceptive outcomes was statistically significant, compared to unassisted lie-detection, from base-rates .02 to .83, and was statistically significant for truthful outcomes from base-rates .11 to .99.

Handler, Honts and Nelson (2013) further evaluated the IGI statistic using polygraph screening tests of the type used in operational security, law-enforcement screening, and post-conviction sex offender supervision programs. Handler et al. showed statistically significant increases in the accuracy of screening decisions, compared to unassisted lie detection, from base-rates .01 to .94 for deceptive outcomes and from base-rates .07 to .99 for truthful outcomes. This suggests that effective use of polygraph testing, though probabilistic and imperfect, has the potential to increase the effectiveness of professional decision making.

Conclusions

Evidence exists to support the scientific validity of polygraph testing in both diagnostic and screening contexts, and there is sufficient evidence to warrant continued interest in both research and practice of the instrumental and statistical discrimination of deception and truth telling in both forensic and screening programs. Available evidence can describe and account for virtually all aspects of the polygraph test, including


general test theory, testing procedures, decision theory, and signal detection theory, test accuracy, and vulnerability to countermeasures or faking, along with the psychological and physiological basis of responses to polygraph stimuli.

The polygraph test, like other scientific tests, is a probabilistic test that involves the recording of physiological responses to stimuli and uses statistical decision theory to quantify the margin of error or level of statistical significance – or alternatively the odds or confidence level – associated with the test result (Nelson, 2014a; 2014b; 2014c; 2014d; 2014e). The polygraph test achieves its objectives through the structural combination of physiological responses that have been shown to be reliable proxies that are correlated at statistically significant levels with differences in responses that are loaded onto different types of test stimuli (i.e., RQs and CQs) as a function of deception and truth-telling regarding a past behavior.

The need for a test that can discriminate deception and truth-telling arises from the fact that evidence may not exist, or may not yet have been uncovered to enable a deterministic conclusion or physical measurement. A probabilistic test of deception – with accuracy significantly greater than both chance and unassisted lie detection – is the scientific alternative to the near chance accuracy rates of unaided human inference. Scientific tests are needed whenever a perfect deterministic observation is not possible.

Tests are often needed to make informed conclusions about events in the past, which can no longer be observed directly or deterministically, and also to understand the potential for future events which have not yet occurred and which therefore cannot yet be directly or deterministically observed. Polygraph test results refer to both the likelihood that a past behavior has occurred, and to the future potential that information or evidence will be uncovered to confirm a conclusion.

Scientific tests are also needed when there is a desire to measure amorphous phenomena that cannot be subjected to physical measurement. Tests are not needed when direct mechanical or linear measurement is possible, in which case we simply measure the item of interest. Measurement, as compared to testing, involves mechanical measurement error. Testing of any type may involve human sources of variance and other random or uncontrolled sources of error variance in addition to the diagnostic variance contained within and expressed by the testing data.

All test data, including polygraph test data, are a combination of diagnostic variance (also referred to as explained variance, controlled variance, or signal) and error variance (also referred to as random variance, unexplained variance, uncontrolled variance, or noise). Ideally, testing data will include a large portion of diagnostic variance and a small portion of error variance, but no test is perfect. Scientific tests are expected only to quantify and account for the margin of error surrounding a test result, and to account for the basis of assumptions related to the testing procedures.

Because scientific tests are used to evaluate amorphous phenomena that cannot be subjected to deterministic observation or physical measurement, all test results are probabilistic – including when they are simplified to categorical test results. Quantification of the margin of error or level of statistical significance associated with a test result will enable referring professionals and consumers of testing results to make better-informed conclusions about the meaning and usefulness of a test result.

Concerns about the ethics of polygraph testing, and especially polygraph screening programs, have sometimes pointed to the lack of perfection and the false positive error rate as the basis for argument against the use of the polygraph. Expectations for deterministic perfection – in which test results are not affected by uncontrolled variance, human behavior, or random error – are not realistic, and frustration or disappointment regarding a lack of deterministic perfection is not warranted in a scientific testing context.


It is important to remember that one of the operational goals of testing - in medicine, psychology, forensics, polygraph programs and other testing contexts - is the reduction of harm resulting from both false-positive and false-negative errors. Any practical testing method that achieves accuracy significantly greater than chance has the potential for reducing such harm if used effectively. It is also important to note that screening tests of some types may be intended to slightly over-predict the presence of problems – with the goal of correcting errors with subsequent diagnostic testing. While false-positive errors can be identified and corrected with additional testing and investigation, identification of false-negative errors is sometimes not possible until a problem has escalated to a degree that can sometimes permanently affect individual lives and futures. It is equally important to remember that neither polygraph test results, nor any form of test result, should be used alone as the basis for decisions that affect the rights and liberties of individuals. There are no published policies or standards of practice for screening polygraphs that advise or require the use of polygraph test results alone as a sufficient basis for professional decision-making.

As always, more information and research are desirable in some areas, pertaining to both theoretical constructs and practical concerns. Theories are satisfactory only as long as they account for known and observed phenomena. The emergence of any evidence or phenomena not accounted for by our current theories should be taken as an indicator of the need and obligation to continue to revise our knowledge and assumptions in response to the new information. Failure to make revisions to working theories is an indicator of stasis, and is a characteristic of pseudoscientific endeavors. Scientists are continuously upgrading all working theories in response to an ever-increasing volume of known and observed phenomena. All theories in the realm of science are expected to evolve over time and to move towards an integrated framework with other theories from other fields of science. For this reason, it will be important for the polygraph profession to continue to make use of new information from related fields of science.

Practical areas for which more information is needed include the accuracy of both diagnostic and screening polygraphs, the contribution of the results of diagnostic polygraphs to investigation outcomes, and the contribution of screening polygraphs to case and program outcome measures such as rule-violations, corruption, dereliction and recidivism. As long as some persons are motivated to engage in deception, and as long as others attempt to use scientific technologies to detect deception, there will be some who are interested in developing countermeasure strategies to evade detection. For this reason, there will be continued and ongoing interest in developing knowledge regarding the potential vulnerabilities of the polygraph test, and what additional methods can be applied to counter those vulnerabilities.

Though it is colloquially referred to as a “lie detector” test as a term of convenience, science and scientific reason do not suppose that the polygraph actually measures lies per se. All test results are probability statements. The alternative to a probabilistic understanding of polygraph test results is to encourage false expectations and frustration that the lies themselves can somehow be subjected to deterministic observation or physical measurement. In reality, the principles of physiology and psychology are sufficiently complex and variable that a probabilistic model is necessary and unavoidable.

Because lies per se are amorphous temporal events, lie detection will likely remain a probabilistic and imperfect task. It will be important to remain aware that the goals of scientific testing are often to quantify and measure phenomena that cannot be subjected to deterministic observation or mechanical measurement. Polygraph test results are a measurement of the uncertainty surrounding a categorical conclusion based on differences in responses that are loaded onto different types of test stimuli as a function of deception or truth-telling regarding a behavioral concern.
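The base-rate dependence of PPV and NPV noted earlier can be made concrete with a short calculation. The following is a minimal sketch and not part of the cited meta-analysis: it applies Bayes' rule to the aggregated multi-issue screening figures quoted in the text (sensitivity .77, specificity .72, false negative rate .11, false positive rate .14) and, as a simplifying assumption, computes predictive values among conclusive results only. The function name `predictive_values` and the example base-rates are illustrative.

```python
def predictive_values(sensitivity, specificity, fn_rate, fp_rate, base_rate):
    """Return (PPV, NPV) among conclusive results for a given base-rate.

    Assumption: inconclusive results are excluded, so each predictive
    value is the share of correct decisions among all decisions of
    that kind (failed or passed).
    """
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must lie strictly between 0 and 1")

    deceptive = base_rate          # prior proportion of deceptive examinees
    truthful = 1.0 - base_rate     # prior proportion of truthful examinees

    true_pos = sensitivity * deceptive   # deceptive persons who fail
    false_pos = fp_rate * truthful       # truthful persons who fail
    true_neg = specificity * truthful    # truthful persons who pass
    false_neg = fn_rate * deceptive      # deceptive persons who pass

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rates taken from the text; the base-rates are hypothetical examples.
for base_rate in (0.05, 0.50, 0.90):
    ppv, npv = predictive_values(0.77, 0.72, 0.11, 0.14, base_rate)
    print(f"base-rate {base_rate:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

Under these assumptions the sensitivity and specificity inputs never change, yet a failed result is correct far less often at a low base-rate than in a balanced group. This is the sense in which the text describes sensitivity and specificity as resistant to base-rates and PPV and NPV as applying only to balanced groups of exams.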


References

Abdi, H. (2007). Bonferroni and Šidák corrections for multiple comparisons. In N.J. Salkind (Ed.), Encyclopedia of Measurement and Statistics. Sage.

Abrams, S. (1973). Polygraph validity and reliability: A review. Journal of Forensic Sciences, 18, 313-326.

Abrams, S. (1974). The validity of the polygraph with schizophrenics. Polygraph, 3, 328-337.

Abrams, S. (1975). A response to Lykken on the polygraph. American Psychologist, 30, 709-711.

Abrams, S. (1977). A polygraph handbook for attorneys. Lexington, MA: Lexington Books.

Abrams, S. (1989). The complete polygraph handbook. Lexington, MA: Lexington Books.

Abrams, S. (1984). The question of the intent question. Polygraph, 13, 326-332.

Abrams, S. & Weinstein, E. (1974). The validity of the polygraph with retardates. Journal of Police Science and Administration, 2, 39766.

American Polygraph Association (2009a). Model Policy for Post-conviction Sex Offender Testing. [Electronic version] Retrieved January 25, 2012, from http://www.polygraph.org.

American Polygraph Association (2009b). Model Policy for Law Enforcement Public Services Pre-Employment Screening Examinations. [Electronic version] Retrieved January 25, 2012, from http://www.polygraph.org.

American Polygraph Association (2011a). By-laws: American Polygraph Association, effective 1-1-2012. [Electronic version] Retrieved January 6, 2012, from http://www.polygraph.org.

American Polygraph Association (2011b). Meta-analytic survey of criterion accuracy of validated polygraph techniques. Polygraph, 40(4), 196-305.

Amsel, T. T. (1999). Exclusive or nonexclusive comparison questions: A comparative field study. Polygraph, 28, 273-283.

Anderson, C. A., Lindsay, J. J., & Bushman, B. J. (1999). Research in the psychological laboratory: Truth or triviality? Current Directions In Psychological Science, 8, 3-9.

Ansley, N. (1983). A compendium on polygraph validity. Polygraph, 12, 53-61.

Ansley, N. (1990). Law notes: Civil and criminal cases. Polygraph, 19, 72-102.

Ansley, N. (1999, July). The frequency of appearance of evaluative criteria in polygraph charts. Defense Personnel Research Center.

Ansley, N. & Krapohl, D.J. (2000). The frequency of appearance of evaluative criteria in field polygraph charts. Polygraph, 29, 169-176.

ASTM (2002). Standard Practices for Interpretation of Psychophysiological Detection of Deception (Polygraph) Data (E 2229-02). ASTM International.

Backster, C. (1963). Standardized polygraph notepack and technique guide: Backster zone comparison technique. Cleve Backster: New York.


Balloun, K. D. & Holmes, D.S. (1979). Effects of repeated examinations on the ability to detect guilt with a polygraphic examination: A laboratory experiment with a real crime. Journal of Applied Psychology, 64, 316-322.

Barland, G. H. (1981). A Validity and Reliability Study of Counterintelligence Screening Test. Fort George G. Meade, Maryland: Security Support Battalion, 902d Military Intelligence Group. [Reprinted in Polygraph 41 (1) 1-27].

Barland, G. H., Honts, C. R. & Barger, S.D. (1989). Studies of the accuracy of security screening polygraph examinations. DTIC AD Number A304654. Department of Defense Polygraph Institute.

Barland, G. H. & Raskin, D.C. (1975a). An evaluation of field techniques in detection of deception. Psychophysiology, 12 (3), 321-330.

Barland, G. H. & Raskin, D.C. (1975b). Psychopathy and detection of deception in criminal suspects. Psychophysiology, 12, 224.

Barry, R. J. (1996). Preliminary process theory: Towards an integrated account of the psychophysiology of cognitive processes. Acta Neurobiologiae Experimentalis, 56(1), 469-484.

Bear, M. F., Barry, W. C. & Paradiso, M.A. (2007). Neuroscience: Exploring the brain: Third Edition. Philadelphia, PA: Lippincott Williams & Wilkins.

Bell, B. G., Raskin, D. C., Honts, C. R. & Kircher, J.C. (1999). The Utah numerical scoring system. Polygraph, 28 (1), 1-9.

Ben-Shakhar, G. & Dolev, K. (1996). Psychophysiological detection through the guilty knowledge technique: Effects of mental countermeasures. Journal of Applied Psychology, 81, 273-281.

Berntson, G. G., & Cacioppo, J. T. (2007). Integrative physiology: Homeostasis, allostasis and the orchestration of systemic physiology, in Cacioppo, J. T., Tassinary, L. G., and Berntson, G. G. (Eds.) Handbook of Psychophysiology, 3rd edition (pp. 433-449). New York: Cambridge University Press.

Birbaumer, N., Veit, R., Lotze, M., Erb, M., Hermann, C., Grodd, W. & Flor, H. (2005). Deficient fear conditioning in psychopathy: a functional magnetic resonance imaging study. Archives of general psychiatry, 62, 799-805.

Blackwell, J. N. (1998). PolyScore 33 and psychophysiological detection of deception examiner rates of accuracy when scoring examination from actual criminal investigations. Available at the Defense Technical Information Center. DTIC AD Number A355504/PAA. [Reprinted in Polygraph, 28 (2), 149-175].

Blalock, B., Nelson, R., Handler, M. & Shaw, P. (2011). A position paper on the use of directed lie comparison questions in diagnostic and screening polygraphs. Police Polygraph Digest, (2011), 2-5.

Blalock, B., Nelson, R., Handler, M. & Shaw, P. (2012). The empirical basis for the use of directed lie comparison questions in diagnostic and screening polygraphs. APA Magazine, 45(1), 36-39.

Bracha, H. S., Ralston, T. C., Matsukawa, J. M., Williams, A. E. & Bracha, A.S. (2004). Does "Fight or Flight" need updating? Psychosomatics, 45 (5), 448–449.


Bradley, M. T. & Janisse, M.P. (1981). Accuracy demonstrations, threat, and the detection of deception: Cardiovascular, electrodermal, and pupillary measures. Psychophysiology, 18, 307-315.

Boucsein, W. (2012). Electrodermal activity. New York: Plenum Press.

Campion, M., Campion, J. & Hudson, J. (1994). Structured Interviewing: A note on incremental validity and alternative question types. Journal of Applied Psychology, 79, 998-1002.

Cannon, W. B. (1929). Bodily changes in pain, hunger, fear, and rage. New York: Appleton-Century-Crofts.

Capps, M. H. (1991). Predictive value of the sacrifice relevant. Polygraph, 20, 1-6.

Costanzo, L. (2007). Physiology. Hagerstown, MD: Lippincott Williams & Wilkins.

Department of Defense (2004a). Federal psychophysiological detection of deception examiner handbook. [Retrieved from http://www.antipolygraph.org on 2-23-2008].

Crewson, P. E. (2001). A comparative analysis of polygraph with other screening and diagnostic tools. Research Support Service. Report No. DoDPI01-R-0003. Reprinted in Polygraph 32 (57-85).

Department of Defense (2004b). Test data analysis: DoDPI numerical evaluation scoring system. [Retrieved from http://www.antipolygraph.org on 6-28-2007].

Department of Defense (2006a). Federal psychophysiological detection of deception examiner handbook. Reprinted in Polygraph, 40 (1), 2-66.

Department of Defense (2006b). Test data analysis: DoDPI numerical evaluation scoring system. [Retrieved from http://www.antipolygraph.org on 3-31-2007].

Department of Defense Polygraph Institute (2002). Department of Defense Polygraph Institute Law Enforcement Pre-employment Test. [Retrieved from http://www.antipolygraph.org on 6-2-2011].

Dollins, A. B., Krapohl, D. J. & Dutton, D. (1999). A comparison of computer programs designed to evaluate psychophysiological detection of deception examinations: Bakeoff 1. Department of Defense Polygraph Institute. Reprinted in Polygraph 29 (237-247).

Domjan, M. (2010). Principles of learning and behavior, 6th edition. Cengage/Wadsworth.

Drever, E. (1995). Using Semi-Structured Interviews in Small-Scale Research. A Teacher's Guide. Scottish Council for Research in Education, Edinburgh.

Elaad, E. & Ben-Shakhar, G. (1991). Effects of mental countermeasures on psychophysiological detection in the guilty knowledge test. International journal of psychophysiology: official journal of the International Organization of Psychophysiology, 11, 99-108.

Furedy, J. J. (1989). The North American CQT polygraph and the legal profession: a case of Canadian credulity and a cause for cultural concern. The Criminal Law Quarterly, 31, 431-451.

General Accounting Office (1991). Using structured interviewing techniques. Program Evaluation and Methodology Division [Retrieved online http://www.gao.gov/policy/10_1_5.pdf on 10-20-2009].

Green, D. M. & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Greenberg, I. (1982). The role of deception in decision theory. Journal of Conflict Resolution, 26, 139-156.

Groves, P. M. & Thompson, R.F. (1970). Habituation: A dual-process theory. Psychological Review, 77, 419-450.

Handler, M. (2006). The Utah PLC. Polygraph, 35, 139-148.

Handler, M., Geddes, L. & Reicherter, J. (2007). A discussion of two diagnostic features of the cardiovascular channel. Polygraph, 36 (2), 70-83.

Handler, M., Honts, C., Krapohl, D., Nelson, R. & Griffen, S. (2009). Integration of pre-employment polygraph screening into the police selection process. Journal of Police and Criminal Psychology, 24, 69-86.

Handler, M. & Krapohl, D. (2007). The use and benefits of the photoelectric plethysmograph in polygraph testing. Polygraph, 36, 18-25.

Handler, M. & Nelson, R. (2007). Polygraph terms for the 21st Century. Polygraph, 36, 157-164.

Handler, M. & Nelson, R. (2008). Utah approach to comparison question polygraph testing. European Polygraph, 2, 83-110.

Handler, M., Honts, C. & Nelson, R. (2013). Information gain of the Directed Lie Screening Test. Polygraph, 42, 192-202.

Handler, M., Nelson, R. & Blalock, B. (2008). A focused polygraph technique for PCSOT and law enforcement screening programs. Polygraph, 37 (2), 100-111.

Handler, M., Nelson, R., Krapohl, J. & Honts, C. (2010). An EDA Primer for Polygraph Examiners. Polygraph, 39, 68-108.

Handler, M., & Reicherter, J. (2008). Respiratory blood pressure fluctuations observed during polygraph examinations. Polygraph, 37 (4), 256-262.

Handler, M., Rovner, L. & Nelson, R. (2008). The concept of allostasis in polygraph testing. Polygraph, 38, 228-233.

Handler, M., Shaw, P. & Gougler, M. (2010). Some thoughts about feelings: A study of the role of cognition and emotion in polygraph testing. Polygraph, 39, 139-154.

Harris, J. C. & Olsen, D.E. (1994). Polygraph Automated Scoring System. Patent Number: 5,327,899. U.S. Patent and Trademark Office.

Harris, J., Horner, A. & McQuarrie, D. (2000). An evaluation of the criteria taught by the department of defense polygraph institute for interpreting polygraph examinations. Johns Hopkins University, Applied Physics Laboratory. SSD-POR-POR-00-7272.

Hastie, T., Tibshirani, R. & Friedman, J. (2001). Elements of statistical learning. Springer.

Hilliard, D. L. (1979). A cross analysis between relevant questions and a generalized intent to answer truthfully question. Polygraph, 8, 73-77.

Hodgson, P. (1987). A practical guide to successful interviewing. Maidenhead: McGraw-Hill.

Honts, C. R. (1987). Interpreting research on polygraph countermeasures. Journal of Police Science and Administration, 15, 204-209.

Honts, C. R. (1997). Is it time to reject the friendly polygraph examiner hypothesis (FPEH)? Paper presented at the American Psychological Society at the 9th annual meeting, Washington, D. C., May 23-26, 1997.

Honts, C. R. & Alloway, W.R. (2007). Information does not affect the validity of a comparison question test. Legal and Criminological Psychology, 12, 311-320.

Honts, C. R. & Amato, S. (2001). Psychophysiological credibility assessment. Journal of Forensic Psychology Practice, 1, 87-99.

Honts, C., Amato, S. & Gordon, A. (2004). Effects of outside issues on the comparison question test. Journal of General Psychology, 131 (1), 53-74.

Honts, C. R. & Devitt, M.K. (1992). Bootstrap decision making for polygraph examinations. Department of Defense Polygraph Institute report No DoDPI92-R-0002.

Honts, C. & Handler, M. (2014). Scoring respiration when using directed lie comparison questions. Polygraph, 43 (3), 71-78.

Honts, C., Handler, M., Shaw, P. & Gougler, M. (2015). The vasomotor response in the comparison question test. Polygraph, 44 (1), 62-78.

Honts, C. R. & Hodes, R.L. (1983). The detection of physical countermeasures. Polygraph, 12, 7-17.

Honts, C. R., Hodes, R. L. & Raskin, D.C. (1985). Effects of physical countermeasures on the physiological detection of deception. Journal of Applied Psychology, 70, 177-187.

Honts, C. R. & Peterson, C.F. (1997). Brief of the Committee of Concerned Social Scientists as Amicus Curiae United States v Scheffer. Available from the author.

Honts, C. R. & Raskin, D. C. (1988). A field study of the validity of the directed-lie control question. Journal of Police Science and Administration, 16, 56-61.

Honts, C. R., Raskin, D. C. & Kircher, J.C. (1987). Effects of physical countermeasures and their electromyographic detection during polygraph tests for deception. Psychophysiology, 1, 241-247.

Honts, C. R., Raskin, D. C. & Kircher, J.C. (1994). Mental and physical countermeasures reduce the accuracy of polygraph tests. Journal of Applied Psychology, 79, 252-259.

Honts, C. R., Raskin, D. C., Kircher, J. C. & Hodes, R.L. (1988). Effects of spontaneous countermeasures on the physiological detection of deception. Journal of Police Science and Administration, 16, 91-94.

Honts, C. R. & Reavy, R. (2009). Effects of comparison question type and between test stimulation on the validity of comparison question test. Final progress report on contract No. W911NF-07-1-0670, submitted to the Defense Academy of Credibility Assessment (DACA). Boise State University.

Honts, C. R., Winbush, M. & Devitt, M.K. (1994). Physical and mental countermeasures can be used to defeat guilty knowledge tests. Psychophysiology, 31, S57.

Horneman, C. J. & O'Gorman, J.G. (1985). Detectability in the card test as a function of the subject's verbal response. Psychophysiology, 22, 330-333.

Horowitz, S. W., Kircher, J. C. & Raskin, D.C. (1986). Does stimulation test accuracy predict accuracy of polygraph test? Psychophysiology, 23, 442.

Horowitz, S. W., Kircher, J. C., Honts, C. R. & Raskin, D.C. (1997). The role of comparison questions in physiological detection of deception. Psychophysiology, 34, 108-115.

Horvath, F. & Palmatier, J. (2008). Effect of two types of control questions and two question formats on the outcomes of polygraph examinations. Journal of Forensic Sciences, 53(4), 1-11.

Horvath, F. S. (1988). The utility of control questions and the effects of two control question types in field polygraph techniques. Journal of Police Science and Administration, 16, 198-209.

Horvath, F. S. (1994). The value and effectiveness of the sacrifice relevant question: An empirical assessment. Polygraph, 23, 261-279.

Iacono, W. G., Boisvenu, G. A. & Fleming, J.A. (1984). Effects of diazepam and methylphenidate on the electrodermal detection of guilty knowledge. Journal of Applied Psychology, 69, 289-299.

Iacono, W. G., Cerri, A. M., Patrick, C. J. & Fleming, J.A. (1992). Use of antianxiety drugs as countermeasures in the detection of guilty knowledge. Journal of Applied Psychology, 77, 60-64.

Janig, W. (2006). The Integrative Action of the Autonomic Nervous System: Neurobiology of Homeostasis. Cambridge University Press.

Kahn, R. L. and C. F. Cannell (1957). The psychological basis of the interview. The dynamics of interviewing: theory, technique, and cases. New York, John Wiley & Sons: 22-64.

Kahn, J., Nelson, R. & Handler, M. (2009). An exploration of emotion and cognition during polygraph testing. Polygraph, 38, 184-197.

Kirby, S. L. (1981). The comparison of two stimulus tests and their effect on the polygraph technique. Polygraph, 10, 63-76.

Kircher, J. C. (1981). Computerized chart evaluation in the detection of deception. Masters thesis: University of Utah.

Kircher, J. C. (1983). Computerized decision making and patterns of activation in the detection of deception. Dissertation Abstracts International, 44, 345.

Kircher, J. & Raskin, D. (2002). Computer methods for the psychophysiological detection of deception. In Murray Kleiner (Ed.), Handbook of Polygraph Testing. San Diego: Academic Press.

Kircher, J. C., Kristjiansson, S. D., Gardner, M. K. & Webb, A. (2005). Human and computer decision-making in the psychophysiological detection of deception. University of Utah.

Kircher, J. C. & Raskin, D.C. (1988). Human versus computerized evaluations of polygraph data in a laboratory setting. Journal of Applied Psychology, 73, 291-302.

Kircher, J. C., Packard, T., Bell, B. G. & Bernhardt, P. C. (2001). Effects of Prior Demonstrations of Polygraph Accuracy on Outcomes of Probable-Lie and Directed-lie Polygraph Tests. Final report to the U. S. Department of Defense Polygraph Institute, Ft. Jackson, SC. Salt Lake City: University of Utah, Department of Educational Psychology.

Krapohl, D. J. (2002). Short report: Update for the objective scoring system. Polygraph, 31, 298-302.

Krapohl, D. & McManus, B. (1999). An objective method for manually scoring polygraph data. Polygraph, 28, 209-222.

Krapohl, D. J. & Ryan, A.H. (2001). Final comment on the belated look at symptomatic questions. Polygraph, 30, 218-219.

Kvale, S. (1996). Interviews: An Introduction to Qualitative Research Interviewing. Sage Publications.

Lehmann, E. L. (1950). Some principles of the theory of testing hypotheses. Annals of Mathematical Statistics, 21 (1), 1-26.

Lieblich, I., Ben-Shakhar, G., Kugelmass, S. & Cohen, Y. (1978). Decision theory approach to the problem of polygraph interrogation. Journal of Applied Psychology, 63, 489-498.

Light, G. D. (1999). Numerical evaluation of the Army zone comparison test. Polygraph, 28, 37-45.

Lindlof, T. R. & Taylor, B.C. (2002). Qualitative communication research methods. Thousand Oaks: Sage.

Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 5-55.

Lykken, D. T. (1981). A tremor in the blood: Uses and abuses of the lie detector. New York: McGraw-Hill.

MacLaren, V. & Krapohl, D. (2003). Objective Assessment of Comparison Question Polygraphy. Polygraph, 32, 107-126.

Marcum, J. I. (1947). A statistical theory of target detection by pulsed radar. Rand Corporation. [Retrieved online http://www.rand.org 2011-11-2].

Maton, A., Hopkins, J., McLaughlin, C., Johnson, S., Quon Warner, M., LaHart, D. & Wright, J. (1993). Human biology and health. Englewood Cliffs, New Jersey, USA: Prentice Hall.

National Research Council (2003). The polygraph and lie detection. Washington, D.C.: National Academy of Sciences.

Nelson, R. (2014a). What does the polygraph measure? APA Magazine, 47(2) 39-47.

Nelson, R. (2014b). : What does the polygraph measure? (in 600 words or less). APA Magazine, 47 (3), 36-37.

Nelson, R. (2014c). Take 3: What does the polygraph measure? (in 250 words or less). APA Magazine, 47 (4), 60.

Nelson, R. (2014d). Short Answer: What does the polygraph measure? (in 150 words or less). APA Magazine, 47 (5), 27.

Nelson, R. (2014e). Sound-bite: What does the polygraph measure? (in 50 words or less). APA Magazine, 47 (6), 29.

Nelson, R. & Handler, M. (2010). Empirical scoring system: NPC quick reference. Lafayette Instrument Company. Lafayette, IN.

Nelson, R. & Handler, M. (2012). Monte Carlo study of criterion validity of the directed lie screening test using the Empirical Scoring System and the Objective Scoring System version 3. Polygraph, 41, 145-155.

Nelson, R. & Handler, M. (2013). A brief history of scientific reviews of polygraph accuracy research. APA Magazine, 47 (6), 22-28.

Nelson, R. & Handler, M. (2015). Statistical reference distributions for comparison question polygraphs. Polygraph, 44 (1), 91-114.

Nelson, R., Handler, M., Blalock, B. & Hernandez, N. (2012). Replication and extension study of Directed Lie Screening Tests: criterion validity with the seven and three Position models and the Empirical Scoring System. Polygraph 41 (3), 186-198.

Nelson, R., Handler, M., & Morgan, C. (2012). Criterion validity of the directed lie screening test and the empirical scoring system with inexperienced examiners and non-naive examinees in a laboratory setting. Polygraph 41 (3), 176-185.

Nelson, R., Handler, M., Shaw, P., Gougler, M., Blalock, B., Russell, C., Cushman, B. & Oelrich, M. (2011). Using the Empirical Scoring System. Polygraph, 40, 67-78.

Nelson, R., Krapohl, D. & Handler, M. (2008). Brute force comparison: A Monte Carlo study of the Objective Scoring System version 3 (OSS-3) and human polygraph scorers. Polygraph, 37, 185-215.

Offe, H. & Offe, S. (2007). The comparison question test: does it work and if so how? Law and Human Behavior, 31, 291-303.

Office of Technology Assessment (1983). The validity of polygraph testing: A research review and evaluation. Printed in Polygraph, 12, 198-319.

Ogilvie, J. & Dutton, D. (2008). Improving the detection of physical countermeasures with chair sensors. Polygraph, 37 (2), 136-148.

Olsen, D. E., Ansley, N., Feldberg, I. E., Harris, J. C. & Cristion, J.A. (1991). Recent developments in polygraph technology. Johns Hopkins APL Technical Digest, 12, 347-357.

Olsen, D. E., Harris, J. C. & Chiu, W.W. (1994). The development of a physiological detection of deception scoring algorithm. Psychophysiology, 31, S11.

Olsen, D. E., Harris, J. C., Capps, M. H. & Ansley, N. (1997). Computerized polygraph scoring system. Journal of Forensic Sciences, 42, 61-70.

Palmatier, J. J. (1991). Analysis of two variations of control question polygraph testing utilizing exclusive and nonexclusive controls. Unpublished doctoral dissertation.

Palmatier, J. & Rovner, L. (2014). Credibility assessment: Preliminary process theory, the polygraph process, and construct validity. International Journal of Psychophysiology, [available online: http://www.ncbi.nlm.nih.gov/pubmed/24933412].

Paradiso, M. A., Bear, M. F. & Connors, B.W. (2007). Neuroscience: Exploring the Brain. Hagerstown, MD: Lippincott Williams & Wilkins.

Patrick, C. J. & Iacono, W.G. (1989). Psychopathy, threat and polygraph test accuracy. Journal of Applied Psychology, 74, 347-355.

Pavlov, I. P. (1927). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex. Sage Publications Inc. 2009.

Podlesny, J. A. & Raskin, D. C. (1978). Effectiveness of techniques and physiological measures in the detection of deception. Psychophysiology, 15, 344-359.

Podlesny, J., Raskin, D. & Barland, G. (1976). Effectiveness of techniques and physiological measures in the detection of deception. Report No. 76-5, Contract 75-N1-99-001 LEAA (available through Department of Psychology, University of Utah, Salt Lake City).

Popper, K. R. (1959). The logic of scientific discovery. Routledge.

Porges, S. (2011). The Polyvagal Theory: Neurophysiological Foundations of Emotions, Attachment, Communication, and Self-regulation. Norton.

Pratt, J., Raiffa, H. & Schlaifer, R. (1995). Introduction to statistical decision theory. Cumberland, RI: MIT Press.

Pollina, D. A., Dollins, A. B., Senter, S. M., Krapohl, D. J. & Ryan, A. H. (2004). Comparison of polygraph data obtained from individuals involved in mock crimes and actual criminal investigations. Journal of Applied Psychology, 89, 1099-1105.

Popper, K. R., (1959). The Logic of Scientific Discovery. The German version is currently in print by Mohr Siebeck (ISBN 3-16-148410-X), the English one by Routledge publishers (ISBN 0-415- 27844-9).

Powell, M. B. & Snow, P.C. (2007). Guide to questioning children during the free-narrative phase of an investigative interview. Australian Psychologist, 42 (1), 57-65.

Raffle, A. E. & Muir Gray, J.A. (2007). Screening. Oxford University Press.

Raskin, D. C. & Honts, C. R. (2002). The comparison question test. In M. Kleiner (Ed.), Handbook of Polygraph Testing. San Diego: Academic Press.

Raskin, D. C., Honts, C. R. & Kircher, J. C. (2014). Credibility assessment: Scientific research and applications. Academic Press.

Raskin, D. C. & Hare, R.D. (1978). Psychopathy and detection of deception in a prison population. Psychophysiology, 15, 126-136.

Raskin, D. C. & Kircher, J.C. (1990, May 2). Development of a computerized polygraph system and physiological measures for detection of deception and countermeasures: A pilot study. Scientific Assessment Technologies, Inc.

Raskin, D. C. & Podlesny, J.A. (1979). Truth and deception: A reply to Lykken. Psychological Bulletin, 86, 54-59.

Raskin, D., Kircher, J. C., Honts, C. R. & Horowitz, S.W. (1988). A study of the validity of polygraph examinations in criminal investigations. Final Report, National Institute of Justice, Grant No. 85-IJ-CX-0040.

Reid, J. E. (1947). A revised questioning technique in lie detection tests. Journal of Criminal Law and Criminology, 37, 542-547. Reprinted in Polygraph 11, 17-21.

Research Division Staff (1995a). Psychophysiological detection of deception accuracy rates obtained using the test for espionage and sabotage. DTIC AD Number A330774. Department of Defense Polygraph Institute. Fort Jackson, SC. Reprinted in Polygraph, 27, (3), 171-180.

Research Division Staff (1995b). A comparison of psychophysiological detection of deception accuracy rates obtained using the counterintelligence scope Polygraph and the test for espionage and sabotage question formats. DTIC AD Number A319333. Department of Defense Polygraph Institute. Fort Jackson, SC. Reprinted in Polygraph, 26 (2), 79-106.

Reicherter, J. & Handler, M. (2014). A physiology manual for PDD lifelong learners of the science. Available from the authors. Retrieved online at [http://www.polygraph.org/files/apa_psychophysiology_study_guide.pdf] on 11-30-2014.

Rovner, L. I. (1979). The effects of information and practice on the accuracy of physiological detection of deception. Unpublished doctoral dissertation: University of Utah.

Rovner, L. I. (1986). Accuracy of physiological detection of deception for subjects with prior knowledge. Polygraph, 15 (1), 1-39.

Rovner, L. I., Raskin, D. C. & Kircher, J.C. (1979). Effects of information and practice on detection of deception. Psychophysiology, 16, 197-198 (abstract).

Standring, S. (2005). Gray's anatomy (39th ed.). Elsevier Churchill Livingstone.

Saxe, L. (1991). Science and the CQT polygraph: A theoretical critique. Integrative Physiological and Behavioral Science, 26, 223-231.

Schonhoff, T. A. & Giordano, A.A. (2006). Detection and estimation theory and its applications. New Jersey: Pearson Education.

Senter, S. M. (2003). Modified general question test decision rule exploration. Polygraph, 32, 251-263.

Senter, S. M. & Dollins, A.B. (2002). New decision rule development: Exploration of a two-stage approach. Report number DoDPI00-R-0001. Department of Defense Polygraph Institute Research Division, Fort Jackson, SC.

Senter, S. M. & Dollins, A.B. (2008). Optimal decision rules for evaluating psychophysiological detection of deception data: an exploration. Polygraph, 37 (2), 112-124.

Senter, S., Weatherman, D., Krapohl, D. & Horvath, F. (2010). Psychological set or differential salience: A proposal for reconciling theory and terminology in polygraph testing. Polygraph, 39 (20), 109-117.

Silverthorn, D. U. (2009). Human physiology: An integrated approach (4 ed.). Pearson/Benjamin Cummings.

Stephenson, M. & Barry, G. (1986). Use of a motion chair in the detection of physical countermeasures. [Reprinted in Polygraph, 17, 21-27].

Sterling, P. & Eyer, J. (1988). Allostasis: a new paradigm to explain arousal pathology. In: Fisher, S., Reason, J. (Eds.) Handbook of Life Stress, Cognition and Health. New York: Wiley and Sons.

Summers, W. G. (1939). Science can get the confession. Fordham Law Review, 8, 334-354.

Swets, J. A. (1964). Signal detection and recognition by human observers. New York: Wiley.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnosis. Mahwah, NJ: Erlbaum.

Tanner, J., Wilson, P. & Swets, J. (1954). A decision-making theory of visual detection. Psychological Review, 61 (6), 401-409.

Taylor, S. E., Klein, L. C., Lewis, B. P., Gruenewald, T. L., Gurung, R. A. & Updegraff, J.A. (2000). Biobehavioral responses to stress in females: Tend-and-befriend, not fight-or-flight. Psychological Review, 107, 411-429.

Timm, H. W. (1982). Analyzing deception from respiration patterns. Journal of Police Science and Administration, 10, 47-51.

Timm, H. W. (1991). Effect of posthypnotic suggestions on the accuracy of preemployment polygraph testing. Journal of Forensic Sciences, 36, 1521-1535.

Vrij, A. (2008). Detecting Lies and Deceit, Pitfalls and Opportunities, 2nd edition. West Sussex, England: Wiley & Sons.

Wald, A. (1939). Contributions to the theory of statistical estimation and testing hypotheses. Annals of Mathematical Statistics, 10 (4), 299-326.

Waid, W. M., Orne, E. C. & Orne, M.T. (1981). Selective memory for social information, alertness, and physiological arousal in the detection of deception. Journal of Applied Psychology, 66, 224-232.

Wells, G. L. & Olson, E.A. (2002). Eyewitness identification: Information gain from incriminating and exonerating behaviors. Journal of Experimental Psychology: Applied, 8, 155–167.

Wickens, T. D. (1991). Maximum-likelihood estimation of a multivariate Gaussian rating model with excluded data. Journal of Mathematical Psychology, 36, 213-234.

Wickens, T. D. (2002). Elementary signal detection theory. New York: Oxford.

Widup, R. & Barland, G.H. (1994). Effect of the location of the numbers test on examiner decision rates in criminal psychophysiological detection of deception tests. Department of Defense Polygraph Institute.

Wilson, J. & Jungner, G. (1968). Principles and practice of screening for disease. WHO Chronicle Geneva: World Health Organization, 22 (11):473.

REPORT DOCUMENTATION PAGE

Form Approved OMB No. 0704-0188

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.

1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE: June 12, 2001

3. REPORT TYPE AND DATES COVERED: from August 2000 - June 2001

4. TITLE AND SUBTITLE: Comparative Analysis of Polygraph with Other Screening and Diagnostic Tools

5. FUNDING NUMBERS: DoDPI00-P-0030

6. AUTHOR(S): Philip E. Crewson, Ph.D.

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Research Support Service, 21515 Welby Terrace, Ashburn, VA 20148

8. PERFORMING ORGANIZATION REPORT NUMBER: DoDPI01-R-0003

9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES): DoD Polygraph Institute, 7540 Pickens Avenue, Fort Jackson, South Carolina

10. SPONSORING / MONITORING AGENCY REPORT NUMBER

11. SUPPLEMENTARY NOTES

12a. DISTRIBUTION / AVAILABILITY STATEMENT: Public release, distribution unlimited

12b. DISTRIBUTION CODE

13. ABSTRACT (Maximum 200 words)

The purpose of this study was to conduct a limited review of literature published between January 1986 and May 2001 concerning the accuracy and reliability of screening and diagnostic tests in polygraph, medicine, and psychology. Out of 5,189 hits produced by the literature search, 1,158 articles and abstracts were reviewed, 145 were found to be useful resulting in data on 198 studies. For field screening assessments, the sensitivity of polygraph, medical, and psychological tools was .59, .79, and .74 respectively. Specificity of polygraph, medical, and psychological screening was .90, .94, and .78. For field diagnostic assessments, the sensitivity of polygraph, medical, and psychological tools was .92, .83, and .72. Specificity of polygraph, medical, and psychological diagnostic testing was .83, .88, and .67 respectively. Agreement was measured with kappa. Among readers in polygraph, medicine, and psychology kappa was .77, .56, and .79 respectively. Reports in the literature of polygraph's accuracy and reliability (agreement) on specific issues appear to be consistent with published studies on medical and psychological assessment tools. However, there is an enormous range of accuracy and agreement not only within polygraph but also medicine and psychology. Although there were very few polygraph screening studies, accuracy reports were lower than those in medicine and psychology.

20020722 249

14. SUBJECT TERMS 15. NUMBER OF PAGES 160 16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT: Unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified

20. LIMITATION OF ABSTRACT: None

NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89) USAPPC V1.00 Prescribed by ANSI Std. Z39-18 298-102

Report No. DoDPI01-R-0003

Comparative Analysis of Polygraph with Other Screening and Diagnostic Tools

Philip E. Crewson, Ph.D. Research Support Service, 21515 Welby Terrace, Ashburn, VA 20148

June 12, 2001

Department of Defense Polygraph Institute, 7540 Pickens Avenue, Fort Jackson, SC 29207

Project: A Comparative Analysis of Polygraph with other Screening and Diagnostic Tools

Date: June 12, 2001

Investigator: Philip E. Crewson, Ph.D. Research Support Service 21515 Welby Terrace Ashburn, VA 20148 703-858-7248 703-858-7247 (fax) e-mail: [email protected]

Funded by: Department of Defense Polygraph Institute, Contract No. DABT60-01-P-3017

Abstract

The purpose of this study was to conduct a limited review of literature published between January 1986 and May 2001 concerning the accuracy and reliability of screening and diagnostic tests in polygraph, medicine, and psychology. Out of the 5,189 hits produced by the literature search, 1,158 articles and abstracts were reviewed, 145 were found to be useful resulting in data on 198 studies. For field screening assessments, the sensitivity of polygraph, medical, and psychological tools was .59, .79, and .74 respectively. Specificity of polygraph, medical, and psychological screening was .90, .94, and .78. For field diagnostic assessments, the sensitivity of polygraph, medical, and psychological tools was .92, .83, and .72. Specificity of polygraph, medical, and psychological diagnostic testing was .83, .88, and .67 respectively. Agreement was measured with kappa. Among readers in polygraph, medicine, and psychology kappa was .77, .56, and .79 respectively. Reports in the literature of polygraph's accuracy and reliability (agreement) on specific issues appear to be consistent with published studies on medical and psychological assessment tools. However, there is an enormous range of accuracy and agreement not only within polygraph but also medicine and psychology. Although there were very few polygraph screening studies, accuracy reports were lower than those in medicine and psychology.

An Executive Summary is available on page 26 of this report.

Introduction

The purpose of this study was to conduct a limited review of the literature concerning the accuracy and reliability of screening and diagnostic tests in polygraph, medicine, and psychology. Measures in common use today for evaluating assessment tools assume perfection is the benchmark of a tool's efficacy. This inevitably causes disappointment in the performance of assessment tools, since they rarely produce 100% accuracy or reliability unless significant tradeoffs are made. The premise of this study is that something less than perfection is the common outcome of assessment tool studies. What follows is an effort to put the reported accuracy and reliability of polygraph in context with studies from the medical and psychological literature. It is important to recognize that comparing assessment tools across different disciplines and technologies will not clarify whether or not polygraph is an accurate or reliable means for detecting truth and deception. It does, however, place the less than perfect performance of polygraph alongside other commonly used diagnostic and screening tools.

The literature review focuses on the validity (accuracy) and reliability (agreement) of polygraph as it compares to other assessment tools outside the framework of the detection of deception. The primary focus is on common medical (diagnostic radiology) assessment tools such as ultrasound (US), x-rays, computed tomography (CT), and magnetic resonance imaging (MRI) along with psychological assessment tools such as the Minnesota Multiphasic Personality Inventory (MMPI) and the Diagnostic and Statistical Manual of Mental Disorders (DSM-III and IV). Polygraph's approach involves a human reader using technology to measure and interpret physiological conditions and responses to make a diagnosis. This process is very similar to the mechanics involved in diagnostic radiology. This connection between reader, technology, and examinee is less evident in the psychological literature where many of the assessment tools are paper and pencil tests with established scoring systems that require no reader interpretation. An effort was made to locate assessment tools in education and personnel screening but few were found that either a) reported data that were comparable or b) were not already incorporated into the psychology literature.
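For readers less familiar with the accuracy and agreement measures named in this report, the short Python sketch below shows how sensitivity, specificity, and Cohen's kappa are conventionally computed from simple counts. It is an illustration only, not the procedure used in this review: the function names and all counts are hypothetical, and the kappa shown is the standard two-reader, two-category form, which is assumed here rather than taken from the report.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly positive cases (e.g. deceptive, diseased) the tool flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of truly negative cases (e.g. truthful, healthy) the tool clears."""
    return true_neg / (true_neg + false_pos)


def cohen_kappa(both_pos: int, only_first_pos: int,
                only_second_pos: int, both_neg: int) -> float:
    """Chance-corrected agreement between two readers making yes/no calls."""
    n = both_pos + only_first_pos + only_second_pos + both_neg
    observed = (both_pos + both_neg) / n
    # Agreement expected by chance, from each reader's marginal call rates.
    first_pos = both_pos + only_first_pos
    second_pos = both_pos + only_second_pos
    expected = (first_pos * second_pos
                + (n - first_pos) * (n - second_pos)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
print(f"sensitivity = {sensitivity(true_pos=45, false_neg=5):.2f}")
print(f"specificity = {specificity(true_neg=40, false_pos=10):.2f}")
print(f"kappa       = {cohen_kappa(40, 10, 5, 45):.2f}")
```

With these made-up counts the two readers agree on 85 of 100 cases, but half of that agreement would be expected by chance, which is why kappa is lower than raw percent agreement.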

A very utilitarian approach was used in gathering data for this comparative analysis.

Study selection depended on whether an abstract contained sufficient information either a) to use in the analysis or b) to lead this investigator to believe usable data could be obtained from the

article text. What follows is a very specific delineation of the methodology used. Consumers of

this report should be warned that the summary measures calculated from this set of research

studies may be different if another selection of studies is used.

Method

In addition to information provided by the Department of Defense Polygraph Institute

(DoDPI) research staff, a literature search was conducted through MEDLINE of the National

Library of Medicine, PsycInfo Direct of the American Psychological Association, and an index

of polygraph studies on the National Polygraph Consultants website. The search was limited to research published in the past fifteen years, but some results from review articles include earlier

studies. The keywords used in the literature search are listed in Table 1.

Table 1: Search Keywords

screening                  multi-rater
screening test             test reliability
screening evaluation       personnel testing
screening techniques       security screening
screening accuracy         psychological testing
diagnostic                 MMPI
diagnostic test            validity
diagnostic evaluation      accuracy
diagnostic techniques      sensitivity
diagnostic accuracy        specificity
reliability                area under the curve
agreement                  receiver operating characteristic curve
kappa                      ROC
percent agreement          test validity
rater agreement

DoDPI staff provided assistance in collecting hard copy versions of polygraph articles

and also gave guidance on relevant polygraph studies not found during the literature search. The

polygraph studies were reviewed first to identify measures of accuracy and agreement that would

be common to all three professions. Accuracy and agreement measures were reported in several different forms and in several cases had to be converted to maintain consistency. Table 2 lists the common measures used for this review. These terms are more common in the medical

literature but their meanings are directly transferable to polygraph and psychology.

Table 2: Summary of common terms used and not used in this study.

Terms Used            Definition                                              Also Known As

Sensitivity (Se)      The proportion of diseased cases with a positive        True Positive Rate (TPR)
                      test (perfect accuracy = 1.0)

Specificity (Sp)      The proportion of non-diseased cases with a negative    True Negative Rate (TNR)
                      test (perfect accuracy = 1.0)

Total Accuracy        (Se+Sp)/2 (perfect accuracy = 1.0)                      Lykken's formula

Percent Agreement     The proportion of all readings conducted by two
                      readers in which their interpretations agreed.

Kappa (k)             Coefficient representing agreement obtained between
                      two readers beyond chance. A value of 1 represents
                      perfect agreement. A value of 0 represents no
                      agreement.

Terms Not Used        Definition                                              Also Known As

False Positive Rate   The proportion of non-diseased cases with a positive    1-Specificity
                      test (perfect accuracy = 0.0)

False Negative Rate   The proportion of diseased cases with a negative        1-Sensitivity
                      test (perfect accuracy = 0.0)

Total Accuracy        # of correct interpretations ÷ # of total
                      interpretations

The primary measures used in this report are sensitivity, specificity, and kappa.

Sensitivity reflects the proportion of diseased cases correctly identified by an assessment tool. In polygraph, disease is analogous to deception. A sensitivity of 1.0 indicates the tool correctly identifies 100% of cases with the target condition. Specificity reflects the proportion of non- diseased (truthful) cases correctly identified by an assessment tool. A specificity of 1.0 indicates the tool correctly identifies 100% of cases without the target condition. Sensitivity and

specificity are also averaged to give one combined estimate of accuracy. The combined

accuracy measure is the same as that advocated by Lykken (1983) and used in a review of

polygraph studies by McCauley and Forman (1988). The fourth measure is kappa. Kappa is a

coefficient that represents agreement obtained between two readers beyond what would be

expected by chance alone. A value of 1.0 represents perfect agreement. A value of 0.0

represents no agreement. Kappa can also range to -1.0 (perfect disagreement) but there are no

negative kappas reported in this study. A list of common terms that are not used is also

provided in Table 2.
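The accuracy measures defined above can be sketched in a few lines of Python. This is only an illustration: the class and function names are hypothetical, and the counts describe an invented study of 50 deceptive and 50 truthful examinees, not data from any study in this review.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts obtained by comparing test calls against ground truth."""
    true_pos: int   # deceptive (diseased) cases called positive
    false_neg: int  # deceptive (diseased) cases called negative
    true_neg: int   # truthful (non-diseased) cases called negative
    false_pos: int  # truthful (non-diseased) cases called positive


def sensitivity(c: ConfusionCounts) -> float:
    """Proportion of deceptive (diseased) cases with a positive test."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Proportion of truthful (non-diseased) cases with a negative test."""
    return c.true_neg / (c.true_neg + c.false_pos)


def combined_accuracy(c: ConfusionCounts) -> float:
    """Average of sensitivity and specificity, (Se + Sp) / 2."""
    return (sensitivity(c) + specificity(c)) / 2


# Hypothetical counts, invented for illustration only.
example = ConfusionCounts(true_pos=46, false_neg=4, true_neg=41, false_pos=9)

print(f"Se = {sensitivity(example):.2f}")
print(f"Sp = {specificity(example):.2f}")
print(f"Combined = {combined_accuracy(example):.2f}")
```

Because the combined measure averages the two rates rather than pooling all cases, it does not depend on how many deceptive versus truthful cases a study happens to include.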

Upon review, each study was categorized as either analog (laboratory) or field-based (actual cases) and by whether it measured an assessment tool in a screening or diagnostic

application. Screening applications involve the use of an assessment tool on a general

population in which there is no specific evidence of disease. As an example, screening

mammography is routinely used on asymptomatic women in the hope of finding disease at an

early stage. Diagnostic correlates with the polygraph specific issue test and is reserved for

studies where there is prior evidence a condition exists, such as when a test is ordered after a

clinical examination of a patient suggests an abnormality. As an example, diagnostic mammography is used on symptomatic women: those who have discovered a lump or other abnormality in the breast.

1 Fleiss, J.L. (1981). Statistical Methods for Rates and Proportions. John Wiley, New York. Landis, J.R. and G.G. Koch (1977). "The Measurement of Observer Agreement for Categorical Data." Biometrics

Abstracts were reviewed for evidence of comparable measures of accuracy and

agreement. Exploratory studies, newsletters, commentaries, non-established scales, duplications,

and studies that were unlikely to produce appropriate statistics were avoided. Agreement studies

that didn't compare interpretations between two or more human raters were not used. If accuracy

or agreement were reported separately by various control groups (sex, race, age) an effort was

made to calculate an average. Scales that did not have an established cutoff for disease were not

used. Only studies investigating a procedure in common use were used. A procedure was treated as not yet in common use when the text described it with words and phrases such as "preliminary," "could be used," or "potential for." When accuracy was presented for both a newly proposed technique and an older established one, only the

data for the established technique were used. Studies involving the use of two or more

procedures to form a decision and studies designed to stage the progression of known disease

were also excluded. Medical studies involving invasive scopes were not used; nor did this

review include any medical diagnostic tests outside radiology, such as pathology or cardiology.

When accuracy was reported at several cutoffs, the first disease cutoff was used if there was no

other indication of recommended practice. Contrary to the review conducted by the Office of

Technology Assessment (OTA) in 1983, inconclusive results were not used in the accuracy

estimates. Although inconclusives were rarely mentioned in the medical and psychological

literature, when they were mentioned, they were explicitly excluded from the accuracy estimates.

Inconclusive interpretations were used for agreement statistics when the data were available.

Data collected on accuracy, agreement, number of subjects, and number of studies were

entered into a spreadsheet. These data were double-verified for accuracy. The spreadsheet was used to sort studies and quantify summary measures. A mean, median, minimum, and maximum value were calculated to summarize the overall results of the screening and diagnostic studies found in this inquiry. This approach is similar to prior reviews. No statistical analysis was conducted nor is it recommended.
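The summary step described above (mean, median, minimum, maximum, and study count) can be sketched as follows. The function name is hypothetical, and the two input values are the sensitivities of the two field screening polygraph studies listed in this report's data tables, used here only to show the calculation.

```python
from statistics import mean, median


def summarize(values):
    """Return the summary measures used in this report for one list of estimates."""
    return {
        "mean": mean(values),
        "median": median(values),
        "minimum": min(values),
        "maximum": max(values),
        "studies": len(values),
    }


# Sensitivities of the two field screening polygraph studies in the data tables.
field_screening_sensitivity = [0.45, 0.73]
summary = summarize(field_screening_sensitivity)

for name, value in summary.items():
    print(f"{name}: {value}")
```

With only two studies the mean and median coincide, which is a reminder of how little data underlies some of the summary rows.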

Weaknesses

Before continuing, the results of this review should be put into context by clearly noting

several weaknesses in both study design and application. This report contains a fifteen-year

snapshot of three literature domains, not definitive estimates of diagnostic test performance.

Therefore the greatest concern is overgeneralization of the results beyond their simple intent to

frame the science of accuracy and reliability. In addition, this review should be viewed with the

following caveats in mind:

1. This is not a systematic review of the literature in polygraph, medicine, or psychology.

Specific rules were followed to collect examples of the relevant body of literature, but

there was also a very utilitarian perspective taken in obtaining a sampling of accuracy and

agreement reports on commonly known assessment tools.

2. The summary statistics reported for polygraph, medicine, and psychology should not be

interpreted as generalizable to all assessment tools or applications within these

professions. The summary statistics are simply a method of conveying the central

tendency and variation of accuracy and reliability estimates reported in the literature.

They are not statements of accuracy for a particular procedure or profession. There is

much more that would need to be done to develop that level of precision. As an example,

a systematic and replicable review should include special analytical techniques such as

meta-analysis, study quality scoring, exclusion of low quality studies, and exhaustive

disease-technology specific literature searches. This would be an enormous undertaking

that far exceeds the objective of this study.

3. Similar to one of the weaknesses mentioned above, this review did not make any effort to

determine the quality of the research that produced the statistics reported in the tables that

follow.

4. It should be noted that the tools used for polygraph, medicine, and psychology are not

directly comparable in either their technology, application, or patient populations.

Results

The results of this literature review are separated into several sections. After reviewing the results of the search effort, the overall results for screening and diagnostic accuracy will be presented. This will be followed by a rank-ordered comparison of accuracy as it relates to common medical and psychological diseases (e.g. appendicitis, depression). A rank-ordered comparison will also be provided by assessment technique (MRI, MMPI, etc.). The results section will conclude with a review of reader agreement.

Literature Search

A search for polygraph studies was conducted on April 1, 2001 through the index provided by National Polygraph Consultants (www.nationalpolygraphconsultants.com). Out of

152 articles found, 42 were reviewed, data from 16 articles were used representing 51 separate studies (see Table 3). A search for medical studies was conducted on April 30, 2001 through the

PubMed index (www.ncbi.nlm.nih.gov/PubMed). The search looked for keywords in both the title and abstract and covered the time frame 1/1/1986 through 4/30/2001. Since there were tens of thousands of hits in PubMed, the search was refined to focus on any keywords in the title or abstract that contained both a common imaging modality (plain film, mammography, ultrasound,

CT, MRI) and kappa, sensitivity, specificity, or receiver operating characteristic curve (ROC).

Abstracts and/or articles from 933 articles were reviewed. Data from 90 of these articles were used representing 90 separate studies. A search for psychological literature was conducted via

PsycInfo Direct (http://www.psycinfo.com) on April 29, 2001 for keywords in the abstract. The search covered 1/1/1985 to 4/29/2001. Out of 3,975 articles found, 183 were reviewed, and data from 39 articles were used, representing 57 separate studies.

Table 3: Search Results

Field             Hits     Articles    Articles    Studies
                           Reviewed    Used        Reported

Polygraph           152          42          16          51
Medical           1,065         933          90          90
Psychological     3,975         183          39          57

Total             5,189       1,158         145         198

Although analog studies were very common in the polygraph literature, none were found in the psychology literature and only two were found in the medical literature. As a result, most comparisons mentioned in the report focus on describing field polygraph accuracy (bolded in tables); however, the tables also include results for analog polygraph along with analog and field studies averaged into one combined accuracy measure. Some articles reported more than one accuracy or agreement estimate. As a result, the count of studies provided in some of the tables may sum to more than what is reported in Table 3. A complete listing of the data used in this report is contained in the Appendix.

Accuracy of Screening Techniques

Five polygraph screening studies were found. Based on three analog studies, the mean sensitivity of polygraph screening (.76) is greater than that reported in the two field polygraph screening studies (.59). Specificity in analog polygraph screening studies (.82) is less than in field screening studies (.90). For ten medical screening studies, both the mean sensitivity (.79) and specificity (.94) are greater than those of polygraph. Psychology screening reports (36 studies) have greater sensitivity (.74) than field polygraph but lower specificity (.78).

Table 4: Accuracy of screening techniques in polygraph, medicine, and psychology

                             Polygraph
                      Analog    Field    Combined*    Medicine    Psychology

Sensitivity (TPR)
  Mean                 0.76     0.59      0.67         0.79        0.74
  Median               0.67     0.59      0.63         0.78        0.79
  Minimum              0.61     0.45      0.53         0.51        0.11
  Maximum              1.00     0.73      0.86         0.97        1.00
  Studies                 3        2         5           10          36

Specificity (TNR)
  Mean                 0.82     0.90      0.86         0.94        0.78
  Median               0.83     0.90      0.87         0.93        0.85
  Minimum              0.63     0.87      0.75         0.87        0.00
  Maximum              1.00     0.93      0.97         1.00        1.00
  Studies                 3        2         5           10          36

Combined Accuracy
  Mean                 0.79     0.74      0.77         0.86        0.76
  Median               0.72     0.74      0.73         0.85        0.78
  Minimum              0.65     0.69      0.67         0.76        0.42
  Maximum              1.00     0.80      0.90         0.99        0.98
  Studies                 3        2         5           10          36

Number of Subjects
  Mean                   50      467       258       56,581         996
  Median                 40      467       253       19,758         307
  Minimum                40      200       120           79          55
  Maximum                71      733       402      202,070      16,235
  Studies                 3        2         5           10          36

* (analog + field)/2

Overall, the mean reported combined accuracy of screening polygraph (.74) is similar to screening psychology studies (.76) but lower than the mean combined screening accuracy for medical (.86) studies. The range between the minimum and maximum combined accuracy estimates from the literature is very different for polygraph (.69 to .80), medicine (.76 to .99), and psychology (.42 to .98). On average, polygraph screening studies use about half (467) as many subjects as psychology (996) studies and far fewer than reported for medical screening studies (56,581).

Accuracy of Diagnostic Techniques

There are 37 field polygraph, 94 medical, and 51 psychology diagnostic studies reported

in Table 5. Mean sensitivity and specificity reported in field polygraph diagnostic studies are

greater than those based on analog diagnostic studies. The mean sensitivity reported for

medicine (.83) and psychology (.72) are lower than field polygraph (.92) studies. Although

polygraph field studies have a mean specificity (.83) that is greater than psychology studies (.67),

polygraph's specificity is similar but lower than that reported in the medical studies (.88).

Overall, the mean combined diagnostic accuracies of polygraph (.88) and medical (.86) studies are very similar. The ranges between the minimum and maximum combined accuracy estimates from the literature are very similar for polygraph (.64 to 1.0), medicine (.60 to 1.0), and psychology (.50 to .93). On average, polygraph diagnostic studies use about half as many subjects (108) as psychology (218) and fewer than half as many as medicine (284).

Table 5: Accuracy of diagnostic techniques in polygraph, medicine, and psychology

                             Polygraph
                      Analog    Field    Combined     Medicine    Psychology

Sensitivity (TPR)
  Mean                 0.89     0.92      0.91         0.83        0.72
  Median               0.92     0.95      0.94         0.85        0.71
  Minimum              0.63     0.71      0.67         0.25        0.37
  Maximum              1.00     1.00      1.00         1.00        0.96
  Studies                18       37        55           94          51

Specificity (TNR)
  Mean                 0.78     0.83      0.81         0.88        0.67
  Median               0.79     0.90      0.85         0.93        0.65
  Minimum              0.49     0.43      0.46         0.44        0.41
  Maximum              0.97     1.00      0.99         1.00        0.95
  Studies                18       37        55           94          51

Combined Accuracy
  Mean                 0.84     0.88      0.86         0.86        0.70
  Median               0.85     0.90      0.87         0.88        0.69

Number of Subjects
  Mean                   72      108        90          284         218
  Median                 55       64        60          124          84
  Minimum                15       16        16           23          29
  Maximum               192      959       576        4,811       1,079
  Studies                18       37        55           89          51

Accuracy by Target Condition

Table 6 reports screening and diagnostic accuracy by target condition and assessment technique used. The list is ordered from highest to lowest mean combined accuracy. Diagnosing acute appendicitis with computed tomography (CT) has the greatest combined accuracy (.96).

Based on five studies, CT has a sensitivity of .95 and specificity of .98 in the diagnosis of acute appendicitis.

Table 6: Rank ordered "Combined Accuracy" on common medical & psychological diseases

                                                    Average Accuracy

                                          Sensitivity   Specificity   Combined   Number of
Target Condition          Technique       (TPR)         (TNR)         Accuracy   Studies

Acute Appendicitis        CT              0.95          0.98          0.96        5
Brain Tumor               MRI             0.93          0.98          0.95        2
Carotid Artery Disease    US              0.89          0.93          0.91       14
Acute Appendicitis        US              0.84          0.97          0.91        2
Breast Cancer             US              0.92          0.87          0.90        3
Deception                 Polygraph       0.92          0.83          0.88       37
Breast Cancer             MRI             0.98          0.74          0.86        3
Breast Cancer (screen)    Plain Film      0.79          0.92          0.86        4
Multiple Sclerosis        MRI             0.73          0.93          0.83        2
Breast Cancer             Plain Film      0.78          0.83          0.80        7
Alcohol Abuse (screen)    MAST*           0.80          0.78          0.79        4
Deception (screen)        Polygraph       0.59          0.90          0.74        2
Personality Disorders     DSM-IV**        0.84          0.60          0.72        3
Depression                MMPI            0.68          0.65          0.67       25

*Also included a study using MMPI
**Also included studies using ICD-10 and a Personality Index

Diagnosing depression with the MMPI has the lowest mean combined accuracy (.67) reported in

Table 6. Based on 37 studies, diagnostic field polygraph studies have an average combined accuracy of .88. This is similar to using ultrasound to diagnose carotid artery disease (.91), acute appendicitis (.91), and breast cancer (.90). It is also similar to using MRI (.86) and plain film (.86) to diagnose breast cancer. The combined accuracy of screening polygraph is one of the lowest reported in Table 6.

Accuracy by Evaluation Tool

Table 7 reports accuracy by type of evaluation tool. Similar to Table 6, the list is ordered

from highest to lowest mean combined accuracy. Based on 37 field studies, diagnostic (specific

issue) polygraph has the highest combined accuracy (.88). Overall, however, the combined

diagnostic accuracy reported in field polygraph studies is very similar to those reported in MRI

(.87), CT (.86), and ultrasound (.86) diagnostic studies. The MMPI, either screening (.61) or

diagnostic (.67), has the lowest average combined accuracy.

Table 7: Rank ordered "Combined Accuracy" of diagnostic & screening tools

                                    Average Accuracy

                          Sensitivity   Specificity   Combined   Number of
Evaluation Tool           (TPR)         (TNR)         Accuracy   Studies

Polygraph                 0.92          0.83          0.88       37
MRI                       0.86          0.88          0.87       17
CT                        0.83          0.89          0.86       19
US                        0.84          0.87          0.86       38
Plain Film                0.77          0.85          0.81       12
MAST (screening)          0.64          0.92          0.78        3
Polygraph (screening)     0.59          0.90          0.74        2
DSM-IV                    0.72          0.68          0.70        1
MMPI                      0.68          0.65          0.67       17
MMPI (screening)          0.70          0.53          0.61        5

Inter-Rater Agreement

Agreement among raters is measured as either the percent of cases in which two raters agree on an interpretation or the proportion of agreement beyond that expected by chance, which is represented by the kappa coefficient. Although these are very common measures of agreement, neither was reported often in the literature reviewed for this study.

As an example, the search of the psychology literature produced only one study that reported between-rater percent agreement. All agreement studies reported in Table 8 are based on field studies. There were only three screening studies found that reported agreement data and these were all polygraph. There were no analog studies found.

The eight polygraph studies reporting percent agreement averaged 91% among polygraph

examiners compared to 81% for physicians (based on five studies). Kappa coefficients were

found in all three disciplines. Based on six studies in the psychology literature, the mean kappa

among psychologists is .79. This is similar to polygraph examiners (.77) but greater than reports

on physicians (.56). It is important to note that kappa is a chance corrected measure. This means

that the kappa coefficient depends on both agreement and the distribution of cases used in a

particular study. Two studies with identical percent agreements can have dramatically different kappas if the distribution of subject diagnoses (the proportion of subjects with and without disease) varies. As a result, it is very difficult to compare kappa from one study to the next, either within the same discipline or between two disciplines.
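The dependence of kappa on the case distribution can be illustrated with a short Python sketch. The function name and both sets of counts are hypothetical; each invented study has 100 cases and two readers who agree on 90 of them, but the share of positive calls differs.

```python
def cohens_kappa(both_pos, both_neg, only_first_pos, only_second_pos):
    """Return (percent agreement, kappa) for two readers making yes/no calls.

    Arguments are counts from a 2x2 table of the two readers' interpretations.
    """
    total = both_pos + both_neg + only_first_pos + only_second_pos
    observed = (both_pos + both_neg) / total

    # Proportion of cases each reader called positive.
    first_pos = (both_pos + only_first_pos) / total
    second_pos = (both_pos + only_second_pos) / total

    # Agreement expected by chance, given each reader's call rates.
    expected = first_pos * second_pos + (1 - first_pos) * (1 - second_pos)
    return observed, (observed - expected) / (1 - expected)


# Hypothetical study A: positive and negative calls are evenly balanced.
agree_a, kappa_a = cohens_kappa(both_pos=45, both_neg=45,
                                only_first_pos=5, only_second_pos=5)

# Hypothetical study B: positive calls dominate.
agree_b, kappa_b = cohens_kappa(both_pos=85, both_neg=5,
                                only_first_pos=5, only_second_pos=5)

print(f"Study A: agreement = {agree_a:.2f}, kappa = {kappa_a:.2f}")
print(f"Study B: agreement = {agree_b:.2f}, kappa = {kappa_b:.2f}")
```

Both invented studies show 90% agreement, yet kappa is about .80 in the balanced study and about .44 in the skewed one, because chance agreement is much higher when nearly all cases receive the same call.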

Table 8: Inter-rater agreement on diagnostic cases among polygraph examiners, physicians, and psychologists

                        Polygraph
                        Examiners    Physicians    Psychologists

Percent Agree
  Mean                   91%          81%           88%
  Median                 91%          80%           88%
  Minimum                77%          77%           88%
  Maximum               100%          85%           88%
  Studies                  8            5             1

Kappa (bi-rater)
  Mean                  0.77         0.56          0.79
  Median                0.80         0.60          0.79
  Minimum               0.53         0.34          0.64
  Maximum               1.00         0.72          0.91
  Studies                  9           13             6

Number of Subjects
  Mean                   102          150           174
  Median                  69          138           113
  Minimum                 21           41            76
  Maximum                402          308           331
  Studies                  9           14             6

Conclusion

The purpose of this study was to conduct a limited review and analysis of the literature concerning the accuracy and reliability of screening and diagnostic tests in polygraph, medicine, and psychology. Out of the 5,189 hits produced by the literature search, 1,158 articles and abstracts were reviewed, 145 were found to be useful resulting in data on 198 studies. The results of this review have shown there is an enormous range in reports of accuracy and agreement not only in polygraph but also medicine (limited to diagnostic radiology) and psychology. Overall, polygraph research on specific issue tests reports accuracy results similar to medicine. In contrast, polygraph screening studies report lower accuracy than medical studies but are similar to what is reported in the psychology literature.

To put these results into perspective, it is worth reviewing several methodological issues raised almost two decades ago in the Office of Technology Assessment's (OTA) report on the

"Scientific Validity of Polygraph Testing." These issues, taken directly from the Conclusion of the OTA report, are as follows:

• Accuracy is affected by factors such as reader training, experience, personal bias, and

examinee characteristics

• Cases and readers are often selectively chosen rather than randomly

• Criteria for ground truth are inadequate in some studies

• There is wide variability in results from multiple studies

This review found that these same methodological deficits are very evident in the medical and psychological literature. Polemics on polygraph often correctly identify these issues but either overstate them or fail to mention that these same problems afflict much of the research in medicine and psychology.2

To put the results of this review in context, Table 9 contrasts the average accuracy found in field diagnostic studies with those reported in the OTA study in 1983. Since the OTA study included inconclusive results in accuracy estimates (and this study does not), it is not surprising that this current literature review reports a somewhat greater level of accuracy. As can be seen, however, the level of polygraph accuracy found by OTA is in the same "ballpark" as that reported in medicine and psychology.

Table 9: OTA findings compared to study results (field diagnostic cases)

                                       Average Accuracy

                               Sensitivity   Specificity   Combined   Number of
Aggregate Measures             (TPR)         (TNR)         Accuracy   Studies

OTA Findings (w/Incl.)         .86           .76           .81        10
Current Polygraph Findings     .92           .83           .88        37
Medicine                       .83           .88           .86        94
Psychology                     .72           .67           .70        51
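The effect of including inconclusive results, which separates the OTA estimates from those of this review, can be sketched in Python. The function names and counts are hypothetical, and the sketch assumes that an inconclusive result is counted in the denominator as a call that was not correct; the exact OTA procedure is not reproduced here.

```python
def sensitivity_excluding_inconclusives(correct, incorrect, inconclusive):
    """Sensitivity over definite calls only (the approach used in this review)."""
    return correct / (correct + incorrect)


def sensitivity_including_inconclusives(correct, incorrect, inconclusive):
    """Sensitivity when inconclusives count as not correct (an assumption)."""
    return correct / (correct + incorrect + inconclusive)


# Hypothetical counts for 100 deceptive examinees, invented for illustration.
counts = {"correct": 80, "incorrect": 8, "inconclusive": 12}

excluded = sensitivity_excluding_inconclusives(**counts)
included = sensitivity_including_inconclusives(**counts)

print(f"Excluding inconclusives: {excluded:.2f}")
print(f"Including inconclusives: {included:.2f}")
```

With these invented counts the estimate falls from about .91 to .80 once inconclusives are included, a gap in the same direction as the difference between the two polygraph rows of Table 9.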

The findings presented in Table 9 are consistent with OTA's conclusion that research into

specific issue polygraph testing has shown the technique has some validity. It does not,

however, answer the question of polygraph validity in screening tests. Although not very

extensive, this review reports three analog and two field screening polygraph studies. Table 10

puts these alongside what was found for medical and psychological screening. As can be seen,

the mean combined accuracy of medical screening (.86) is greater than the mean reported for

analog or field polygraph screening studies. Both field (.74) and analog (.79) polygraph studies are similar to psychology screening studies (.76), but the reported sensitivity of field (.59) polygraph screening is noticeably lower than the mean found for medical and psychological screening.

2 For a recent example, see Aftergood, Steven. "Polygraph testing and the DOE National Laboratories." SCIENCE Online. Vol 290 (5493), Nov 3, 2000: 939-940.

Table 10: Rank ordered findings for screening studies

                                             Average Accuracy

                                        Sensitivity   Specificity   Combined   Number of
Aggregate Measures                      (TPR)         (TNR)         Accuracy   Studies

Medicine (Field)                        .79           .94           .86        10
Current Polygraph Findings (Analog)     .76           .82           .79         3
Psychology (Field)                      .74           .78           .76        36
Current Polygraph Findings (Field)      .59           .90           .74         2

Summary

There has been much debate over the past 30 years about polygraph and its accuracy, reliability, utility, and lack of theoretical foundation. It should be recognized from this literature review, however, that many of these same issues could be raised about medical and psychological diagnostic tools. Based on the results of this review, it is unlikely polygraph research will be able to reach a level of accuracy and reliability to satisfy its opponents. It suffers from the same flaws as many other diagnostic tools: it will not be 100% accurate, nor will its application from one subject to the next or by one examiner to another be invariant.

The accuracy of humans assessing humans is unlikely to be 100%. As has been shown in this brief survey of the medical and psychological literature, there is wide variation in the accuracy of diagnostic tools from one application to the next. In fact, there is often wide variation between studies focused on one diagnostic tool, such as has been seen in past polygraph reviews. Since perfection remains elusive, some professions have learned to accept and manage this uncertainty. As an example, training, standardization, and an ongoing review of procedures are used to establish a baseline of acceptable practice, and alternative mechanisms are developed and employed to help clarify equivocal test results.

Recommendations

One of the goals of this study is to compare polygraph research methodology to that used in medicine (diagnostic radiology) and psychology. The author does not claim to be an expert in all these methodologies, so what follows are general impressions from the literature review and personal experience.

1. Room for the "Medical" Perspective! Most research found in the medical profession uses

very specific terminology (sensitivity, specificity, odds ratios) and other methodological

approaches (receiver operating characteristic curves) for assessing diagnostic tests. Greater

use of similar terminologies and methodological approaches in polygraph may make the

results of polygraph research more meaningful to those outside the polygraph and

psychology community. There may also be a reservoir of methodologies and knowledge in

the health technology assessment field that could be applied to polygraph research.

2. Greater Focus on Accuracy and Reliability? The polygraph literature reviewed for this study

occasionally delved more into providing a sophisticated analysis of process and metrics

instead of clarifying how the outcomes of these factors affect accuracy and reliability. In

several cases, basic measures of accuracy or reliability were not immediately apparent

(sensitivity, specificity, and kappa had to be hand calculated from frequencies in tables).

3. Analog Generalizability? Although analog studies are practically nonexistent in the medical

and psychological literature, they appear to serve an important role in polygraph research and

are very valuable in assessing the internal validity of a test. It is difficult, however, to see

how analog studies are generalizable to an applied (clinical) setting. As with medical and

psychological tools, it is unlikely polygraph will be able to demonstrate efficacy with a heavy

reliance on laboratory studies.

4. Screening Studies? Compared to the number of specific issue polygraph studies, there were

relatively few screening studies available for review. Although developing a suitable gold

standard (ground truth) for an evaluation of field polygraph screening is a very difficult

problem to surmount, similar issues have been faced by medicine and psychology. Based on

the medical and psychological literature, there appears to be a considerable range in the

quality of gold standards. The point suggested from the literature is that lack of a "pure" gold

standard has not stopped screening research in psychology and medicine.

5. Too Many Inconclusives? Although inconclusive test results were a common element in

polygraph research, they were seldom mentioned in the medical and psychological literature.

Any accuracy or reliability study which selects out only obvious interpretations will inflate

accuracy estimates and threaten both the legitimacy of the research and the assessment technique.

Executive Summary

A Comparative Analysis of Polygraph with other Screening and Diagnostic Tools

Method

A limited review of literature published between January 1986 and May 2001 was conducted to evaluate studies reporting the accuracy and reliability of screening and diagnostic tests in polygraph, medicine, and psychology. Data for 198 studies were collected from 145 articles. Accuracy estimates are the combined average of sensitivity and specificity across all studies found within a particular category (1.00 = 100% accuracy).

Diagnostic and Screening Accuracy

For field diagnostic assessments, the accuracy of polygraph, medical, and psychological tools was .88, .86, and .70 respectively. For field screening assessments, the accuracy of polygraph, medical, and psychological tools was .74, .86, and .76 respectively.

[Figure: Accuracy of diagnostic and screening assessments for psychology, medicine, and polygraph; axis from 0.50 to 1.00]

Accuracy of Various Diagnostic Tools

The average accuracy reported for 37 diagnostic polygraph studies (specific issue) was similar to MRI (17 studies), CT (19 studies), and ultrasound (38 studies). MMPI had the lowest reported accuracy (17 studies).

[Figure: Diagnostic Accuracy by Assessment Tool; axis from 0.50 to 1.00]

Accuracy by Target Condition

The average diagnostic accuracy for detecting deception with polygraph was similar to diagnosing breast cancer with MRI or ultrasound (US).

[Figure: Diagnostic Accuracy by Target Condition, listing Acute Appendicitis-CT, Brain Tumor, Carotid Artery Disease, Acute Appendicitis-US, Breast Cancer-US, Deception-Polygraph, Breast Cancer-MRI, Multiple Sclerosis, Breast Cancer-xray, Personality Disorders, and Depression; axis from 0.50 to 1.00]

Agreement (kappa)

Averaging a standard measure of agreement across the reviewed literature suggests polygraph and psychology studies report similar levels of agreement. A kappa value of 1.0 represents 100% agreement beyond what would be expected by chance.

[Figure: Reader Agreement (Kappa); axis from 0.50 to 1.00]

Conclusion

The level of accuracy and agreement reported in the polygraph literature is consistent with the medical and psychological literature.

Data Tables

Polygraph Studies

Field Accuracy Studies

                     Screening                      Diagnostic
            Se     Sp     Total %  Cases    Se     Sp     Total %  Cases
Mean        0.59   0.90   0.74     467      0.92   0.83   0.88     108
Median      0.59   0.90   0.74     467      0.95   0.90   0.90      64
Minimum     0.45   0.87   0.69     200      0.71   0.43   0.64      16
Maximum     0.73   0.93   0.80     733      1.00   1.00   1.00     959
Count       2      2      2        2        37     37     37        37

First Author                                    Year   Se     Sp     Total %  Cases

Screening
Jayne-screening                                 1989   0.45   0.93   0.69     200
Brownlie                                        1997   0.73   0.87   0.80     733

Diagnostic
Bersh-GQT                                       1969   0.97   0.89   0.93      68
Bersh-GQT & ZOC majority                        1969   0.71   0.80   0.76      59
Bersh-GQT and ZOC                               1969   0.93   0.92   0.93     157
Bersh-ZOC                                       1969   0.89   0.94   0.92      89
Horvath                                         1971   0.85   0.91   0.88      40
Hunter                                          1973   0.88   0.86   0.87      20
Slowick                                         1975   0.85   0.93   0.89      30
Wicklander                                      1975   0.95   0.93   0.94      20
Barland-judicial                                1976   1.00   0.43   0.72      41
Barland-panel                                   1976   0.98   0.45   0.72      64
Raskin-nonnumerical                             1976   0.93   0.69   0.81      16
Raskin-numerical                                1976   1.00   0.95   0.98      16
Horvath                                         1977   0.77   0.51   0.64      56
Davidson                                        1979   0.88   0.81   0.85      20
Yamamura & Miyake                               1980   0.80   0.94   0.87      95
Edwards                                         1981   0.98   0.98   0.98     959
Kleinmuntz                                      1982   0.75   0.63   0.69      80
Putnam                                          1983   0.99   0.95   0.97     285
Elaad-blind                                     1985   0.77   0.77   0.77      60
Elaad-blind                                     1985   0.77   0.90   0.84      60
Elaad & Schahar                                 1985   0.99   0.95   0.97     174
Yankee-Experienced examiners-without incl       1985   1.00   0.99   1.00      51
Yankee-Inexperienced examiners-without incl     1985   1.00   0.99   1.00      51
Patrick                                         1987   1.00   0.90   0.95      81
Patrick-blind                                   1987   0.98   0.55   0.77      69
Honts & Driscoll-ranked scores                  1988   0.94   0.65   0.80      25
Honts & Driscoll-std num scores                 1988   0.97   0.77   0.87      25
Honts-blind                                     1988   1.00   0.80   0.90      21
Raskin                                          1988   0.95   0.96   0.96      85
Raskin-blind                                    1988   0.94   0.86   0.90      70
Franz-blind                                     1989   1.00   0.97   0.99      81
Matte-blind                                     1989   1.00   1.00   1.00     114
Murray                                          1989   1.00   0.86   0.93     171
Arellano-blind                                  1990   1.00   1.00   1.00      40
Patrick & Iacono                                1991   0.97   0.56   0.77     402
Krapohl et al-specific issue                    2001   0.87   0.92   0.90     221
Krapohl et al-specific issue                    2001   0.84   0.97   0.91      76

Polygraph Studies

Analog Accuracy Studies

          Screening                      Diagnostic
          Se    Sp    Total %  Cases     Se    Sp    Total %  Cases
Mean      0.76  0.82  0.79     50        0.89  0.78  0.84     72
Median    0.67  0.83  0.72     40        0.92  0.79  0.85     55
Minimum   0.61  0.63  0.65     40        0.63  0.49  0.60     15
Maximum   1.00  1.00  1.00     71        1.00  0.97  0.98     192
Count     3     3     3        3         18    18    18       18

First Author                            Year  Se    Sp    Total %  Cases
Screening
Correa & Adams-preemployment            1981  1.00  1.00  1.00     40
Ansley-screening                        1989  0.61  0.83  0.72     71
Honts                                   1999  0.67  0.63  0.65     40
Diagnostic
Barland                                 1975  0.83  0.71  0.77     72
Podlesny                                1978  0.81  0.96  0.89     40
Raskin                                  1978  1.00  0.95  0.98     48
Rovner                                  1978  0.90  0.85  0.88     72
Dawson                                  1980  1.00  0.70  0.85     24
Hammond                                 1980  0.96  0.67  0.82     62
Bradley-electrodermal                   1981  0.82  0.86  0.84     192
Bradley-heart rate                      1981  0.63  0.63  0.63     192
Szecko                                  1981  0.71  0.49  0.60     30
Ginton                                  1982  1.00  0.85  0.93     15
Honts                                   1982  1.00  0.67  0.84     21
Honts                                   1982  1.00  0.67  0.84     38
Kischer                                 1983  0.94  0.97  0.96     100
Honts & Driscoll-ranked scores          1987  0.95  0.86  0.91     41
Honts & Driscoll-std num scores         1987  0.86  0.86  0.86     44
Barland et al-multiple issue approach   1990  0.93  0.75  0.84     100
Barland et al-single issue approach     1990  0.91  0.83  0.87     100
Blackwell-Examiners                     1996  0.86  0.73  0.79     108

Polygraph Studies

Field Agreement Studies

          Screening                 Diagnostic
          Agree %  Kappa  Cases     Agree %  Kappa  Cases
Mean      0.98            53        0.91     0.77   102
Median    0.99            60        0.91     0.80   69
Minimum   0.95            40        0.77     0.53   21
Maximum   0.99            60        1.00     1.00   402
Count     3        0      3         8        9      9

First Author                              Year  Agree %  Kappa  Cases
Screening
Edel & Moore                              1984  0.95            40
Yankee-Experienced examiners-with incl    1985  0.99            60
Yankee-Inexperienced examiners-with incl  1985  0.99            60
Diagnostic
Elaad                                     1985  0.77     0.53   60
Elaad                                     1985  0.83     0.67   60
Patrick                                   1987  0.86     0.60   69
Honts                                     1988  0.90     0.81   21
Raskin                                    1988  0.91     0.80   70
Franz                                     1989  0.99     0.98   81
Matte                                     1989  1.00     1.00   114
Arellano                                  1990  1.00     1.00   40
Patrick & Iacono                          1991           0.53   402
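The agreement tables report two figures per study: raw percent agreement between evaluators and kappa, which discounts the agreement expected by chance. The sketch below assumes kappa here means Cohen's kappa for two raters (the tables do not say which variant each study used). The function name, the decision labels "DI" and "NDI", and the ratings are hypothetical illustrations, not data from any listed study.

```python
def agreement_and_kappa(rater_a: list[str], rater_b: list[str]) -> tuple[float, float]:
    """Return (percent agreement, Cohen's kappa) for two raters' decisions."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters made the same call.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each label, the product of the two raters' base rates.
    labels = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(lab) / n) * (rater_b.count(lab) / n) for lab in labels)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined, report 1.0.
        return observed, 1.0
    return observed, (observed - expected) / (1 - expected)


# Hypothetical decisions on ten charts; the raters disagree on the last one.
rater_a = ["DI", "DI", "NDI", "NDI", "DI", "NDI", "DI", "NDI", "DI", "NDI"]
rater_b = ["DI", "DI", "NDI", "NDI", "DI", "NDI", "DI", "NDI", "DI", "DI"]
agree, kappa = agreement_and_kappa(rater_a, rater_b)
print(f"Agree={agree:.2f} Kappa={kappa:.2f}")
```

Because kappa removes chance agreement, it is typically lower than the raw agreement figure for the same study, which matches the pattern in the rows that report both.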

Medical Studies

Field Accuracy Studies

          Screening                      Diagnostic
          Se    Sp    Total %  Cases     Se    Sp    Total %  Cases
Mean      0.79  0.94  0.86     56581     0.83  0.88  0.86     284
Median    0.78  0.93  0.85     19758     0.85  0.93  0.88     124
Minimum   0.51  0.87  0.76     79        0.25  0.44  0.60     23
Maximum   0.97  1.00  0.99     202070    1.00  1.00  1.00     4811
Count     10    10    10       10        94    94    94       89

First Author                                    Tech              Year  Se    Sp    Total %  Cases
Screening
Baines-breast cancer                            Mammography       1988  0.75  0.94  0.85     44718
Unk author-breast cancer                        Mammography       1992  0.91  0.96  0.94     72706
Burhenne-breast cancer                          Mammography       1995  0.84  0.93  0.88     201937
Beam-breast cancer                              Mammography       1996  0.79  0.90  0.85     79
Heinzen-breast cancer                           Mammography       2000  0.78  0.92  0.85     202070
Mettlin-prostate cancer                         US                1991  0.77  0.89  0.83     2425
Levi-congenital anomalies                       US                1995  0.51  1.00  0.76     25046
Strandell-endometrial pathology                 US                1999  0.73  0.87  0.80     103
Lennon-neural tube and ventral wall defects     US                1999  0.97  1.00  0.99     2257
van Nagell-ovarian cancer                       US                2000  0.81  0.99  0.90     14469
Diagnostic
Stark-Hepatic metastases                        CT                1987  0.80  0.94  0.87     135
Shackford-minor head injuries                   CT                1992  1.00  0.51  0.76     2166
Pasanen-unjaundiced cholestasis                 CT                1994  0.53  0.86  0.70     33
van Gils-paraganglioma of the head/neck         CT                1994  0.73  0.94  0.84     60
Budoff-coronary artery disease                  CT                1996  0.95  0.44  0.70     710
Rao-appendicitis                                CT                1997  0.98  0.98  0.98     100
Mushlin-multiple sclerosis                      CT                1997  0.25  0.95  0.60     303
Mushlin-brain tumor                             CT                1997  0.93  1.00  0.97     303
Mushlin-cerebrovascular disease                 CT                1997  0.88  0.95  0.92     303
D'Ippolito-appendicitis                         CT                1998  0.91  1.00  0.96     52
Miller-acute flank pain                         CT                1998  0.96  1.00  0.98     106
Vieweg-acute flank pain                         CT                1998  0.98  0.98  0.98     105
Keberle-throat tumors                           CT                1999  0.88  1.00  0.94     99
Lane-appendicitis                               CT                1999  0.96  0.99  0.98     300
Garcia-appendicitis                             CT                1999  0.94  0.94  0.94     139
Valk-colorectal cancer                          CT                1999  0.69  0.96  0.83     115
Kurtz-ovarian cancer                            CT                1999  0.92  0.89  0.91     213
Walker-appendicitis                             CT                2000  0.94  1.00  0.97     65
Joseph-open-globe injuries                      CT                2000  0.75  0.93  0.84     200
von Kummer-stroke damage                        CT                2001  0.64  0.85  0.75     786
Stafford-blunt abdominal trauma                 CT no contrast    1999  0.89  0.57  0.73     195
Stafford-blunt abdominal trauma                 CT with contrast  1999  0.84  0.94  0.89     199
Martelli-breast cancer                          Mammography       1990  0.73  0.80  0.77     1708
Elmore-breast cancer                            Mammography       1994  0.70  0.94  0.82     150
Cwikla-breast cancer                            Mammography       1998  0.70  0.57  0.64     70
Fenlon-breast cancer                            Mammography       1998  0.81  0.82  0.82     44
Drew-breast cancer                              Mammography       1999  0.88  0.89  0.88     285
Zonderland-breast cancer                        Mammography       1999  0.83  0.97  0.90     4811
Moss-breast cancer                              Mammography       1999  0.79  0.83  0.81     559
Stark-Hepatic metastases                        MR                1987  0.82  0.99  0.91     135
Barronian-imaging of the knee                   MR                1989  0.67  0.86  0.77     23
Glashow-anterior cruciate and meniscal lesions  MR                1989  0.83  0.84  0.84     47
Mooney-multiple sclerosis                       MR                1990  0.88  0.94  0.91
Mooney-brain infarct                            MR                1990  0.88  1.00  0.94
Mooney-brain tumor                              MR                1990  0.93  0.95  0.94
Mooney-other brain disease                      MR                1990  0.91  0.92  0.92
Young-carotid artery stenosis                   MR                1994  0.89  0.82  0.86     70
Levine-osteomyelitis                            MR                1994  0.77  1.00  0.89     26
Ascher-Endometriosis                            MR                1995  0.76  0.60  0.68     31
Mussurakis-breast cancer                        MR                1996  0.99  0.56  0.78     57
Mushlin-multiple sclerosis                      MR                1997  0.58  0.91  0.75     303
Mushlin-brain tumor                             MR                1997  0.93  1.00  0.97     303
Mushlin-cerebrovascular disease                 MR                1997  1.00  1.00  1.00     303
Regan-acute cholecystitis                       MR                1998  0.91  0.79  0.85     72
Kurtz-ovarian cancer                            MR                1999  0.98  0.88  0.93     280
Drew-breast lesions                             MR                1999  0.99  0.91  0.95     285
Blanchard-rotator cuff tears                    MR                1999  0.79  0.81  0.80     38
Razumovsky-acute cerebral ischemia              MR                1999  0.84  1.00  0.92     30
Adamek-pancreatic cancer                        MR                2000  0.84  0.97  0.91     124
Schroter-Creutzfeldt-Jakob disease              MR                2000  0.67  0.93  0.80     220
Imbriaco-breast masses                          MR                2001  0.96  0.7E  0.86     49
Scott-orthopedic fractures                      PLAIN FILM        1993  0.7£  0.83  0.81     60

Medical Studies

Analog Accuracy Studies

          Screening                      Diagnostic
          Se    Sp    Total %  Cases     Se    Sp    Total %  Cases
Mean                                     0.81  0.73  0.77     92
Median                                   0.81  0.73  0.77     92
Minimum                                  0.79  0.59  0.71     80
Maximum                                  0.83  0.86  0.83     104
Count     0     0     0        0         2     2     2        2

First Author                        Tech  Year  Se    Sp    Total %  Cases
Hill-foreign bodies in human tissue US    1997  0.83  0.59  0.71     80
Orlinsky-radiolucent foreign body   US    2000  0.79  0.86  0.83     104

Medical Studies

Field Agreement Studies

          Screening                 Diagnostic
          Agree %  Kappa  Cases     Agree %  Kappa  Cases
Mean                                0.81     0.56   150
Median                              0.80     0.60   138
Minimum                             0.77     0.34   41
Maximum                             0.85     0.72   308
Count     0        0      0         5        13     14

First Author                              Tech         Year  Agree %  Kappa  Cases
Freed-ureteral stone disease              CT           1998           0.69   103
Grotta-stroke                             CT           1999  0.77     0.39   70
Weishaupt-osteoarthritis                  CT           1999           0.60   308
Kurtz-ovarian cancer                      CT           1999           0.65   213
Elmore-breast cancer                      Mammography  1994  0.78     0.47   150
Baker-breast cancer                       Mammography  1996           0.34   60
Brant-Zawadzki-lumbar disc abnormalities  MR           1995  0.80     0.58   125
Mussurakis-breast cancer                  MR           1996           0.42   57
Weishaupt-osteoarthritis                  MR           1999           0.41   308
Kurtz-ovarian cancer                      MR           1999           0.70   179
Masdeu-head trauma                        SPECT        1994  0.83     0.72   41
Wong-early pregnancy complications        US           1998  0.85            151
Kurtz-ovarian cancer                      US           1999           0.66   264
Ihlberg-vein grafts                       US           2000           0.69   69

Psychology Studies

Field Accuracy Studies

          Screening                      Diagnostic
          Se    Sp    Total %  Cases     Se    Sp    Total %  Cases
Mean      0.74  0.78  0.76     996       0.72  0.67  0.70     218
Median    0.79  0.85  0.78     307       0.71  0.65  0.69     84
Minimum   0.11  0.00  0.42     55        0.37  0.41  0.50     29
Maximum   1.00  1.00  0.98     16235     0.96  0.95  0.93     1079
Count     36    36    36       36        51    51    51       51

First Author                                         Year  Se    Sp    Total %  Cases
Bradley-alcohol screening                            1998  0.80  0.86  0.83     771
Bradley-alcohol screening                            1998  0.78  0.89  0.83     771
Brooks-neuropsychological screening                  1990  0.80  1.00  0.90     175
Glascoe-developmental screening                      1993  0.72  0.76  0.74     89
Steer-major depression                               1999  0.97  0.99  0.98     120
Bradley-alcohol screening                            1998  0.52  0.85  0.69     771
Bradley-alcohol screening                            1998  0.35  0.98  0.66     771
Dent-memory problems in multiple sclerosis           2000  0.93  0.48  0.71     61
Bradley-alcohol screening                            1998  0.91  0.77  0.84     771
Bradley-alcohol screening                            1998  0.75  0.89  0.82     771
Parikh-post-stroke depression                        1988  0.86  0.90  0.88     80
Baird-autism at 18 months of age                     2000  0.38  0.98  0.68     16235
Chen-attention-deficit hyperactivity                 1994  0.23  1.00  0.62     122
Scheinberg-eating disorders                          1993  0.93  0.41  0.67     1112
Scheinberg-eating disorders                          1993  1.00  0.38  0.69     1112
Gureje                                               1990  0.68  0.70  0.69     787
Pomeroy-depression                                   2001  0.91  0.65  0.78     87
Razavi-adjustment and major depressive disorders     1990  0.70  0.75  0.73     210
Inwald-performance of government security personnel  1991  0.60  0.76  0.68     307
Johnson-pathological gamblers                        1998  1.00  0.85  0.93     423
Benussi-alcoholism                                   1982  1.00  0.94  0.97     104
Yersin-alcoholism                                    1989  0.70  0.92  0.81     268
Erford-Math Essential skills screen                  1998  0.98  0.88  0.93     100
Uhlmann-dementia                                     1991  0.81  0.97  0.89     209
Inwald-performance of government security personnel  1991  0.45  0.73  0.59     307
Colligan-alcoholism                                  1988  0.74  0.84  0.79     2144
Colligan-alcoholism                                  1988  1.00  0.00  0.50     2144
Colligan-alcoholism                                  1988  0.62  0.34  0.48     2144
Hirschfeld-bipolar spectrum disorder                 2000  0.73  0.90  0.82     198
Sherman-Pediatric Language Acquisition               1999  0.11  0.73  0.42     84
Hiatt-job performance problems                       1988  0.68  0.73  0.70     55
de las Cuevas-Severity of Dependence Scale (SDS)     2000  0.98  0.94  0.96     100
Birtchnell-depressive disorders                      1989  0.94  0.86  0.90     133
Bradley-alcohol screening                            1998  0.48  0.86  0.67     771
Bradley-alcohol screening                            1998  0.91  0.72  0.82     771
Bradley-alcohol screening                            1998  0.82  0.85  0.83     771
Berument-Autism                                      1999  0.85  0.75  0.80     200
Kogan-Geriatric Depression Scale                     1994  0.64  0.73  0.69     59
Laprise-Geriatric Depression                         1998  0.96  0.46  0.71     66
Blais-FRANTIC AVOIDANCE                              1999  0.63  0.95  0.79     76
Blais-UNSTABLE RELATIONSHIPS                         1999  0.94  0.91  0.93     76
Blais-IDENTITY DISTURBANCE                           1999  0.73  0.84  0.79     76
Blais-IMPULSIVITY                                    1999  0.55  0.70  0.63     76
Blais-SUICIDAL                                       1999  0.96  0.41  0.69     76
Blais-AFFECTIVE INSTABILITY                          1999  0.91  0.42  0.67     76
Blais-CHRONIC EMPTINESS                              1999  0.52  0.79  0.66     76
Blais-POORLY CONTROLLED ANGER                        1999  0.73  0.53  0.63     76
Blais-STRESS-RELATED PARANOIA                        1999  0.51  0.60  0.56     76
Kogan-Geriatric Depression Scale                     1994  0.79  0.69  0.74     59
Laprise-Geriatric Depression                         1998  0.89  0.56  0.73     66
Merson-personality disorders                         1994  0.95  0.50  0.73     29
Ivnick-MAYO VERBAL COMPREHENSION FACTOR S            2000  0.55  0.85  0.70     1079
Chaffee-expressive and receptive language scales     1990  0.88  0.45  0.67     152
Ivnick-ATTENTION-CONCENTRATION SCORE                 2000  0.71  0.70  0.71     1079
Ivnick-LEARNING FACTOR SCORE                         2000  0.77  0.84  0.81     1079
Ivnick-PERCEPTUAL ORGANIZATION SCORE                 2000  0.70  0.83  0.77     1079
Ivnick-RETENTION SCORE                               2000  0.88  0.80  0.84     1079
BOONE 1994-depression                                1994  0.61  0.62  0.62     62
WETZLER 1998-depression                              1998  0.64  0.65  0.65     113
BEN-PORATH 1991-depression                           1991  0.66  0.64  0.65     73
BEN-PORATH 1991-depression                           1991  0.63  0.61  0.62     87
MUNLEY 1997-depression                               1997  0.71  0.71  0.71     84
GREENBLATT 1999-depression                           1999  0.54  0.56  0.55     75

Psychology Studies

Field Agreement Studies

          Screening                 Diagnostic
          Agree %  Kappa  Cases     Agree %  Kappa  Cases
Mean               0.61   510       0.88     0.79   174
Median             0.61   510       0.88     0.79   113
Minimum            0.61   510       0.88     0.64   76
Maximum            0.61   510       0.88     0.91   331
Count     0        1      1         1        6      6

First Author                               Year  Agree %  Kappa  Cases
Screening
Lavigne-DSM-III-R with preschool children  1994           0.61   510
Diagnostic
Klin-autism                                2000  0.88     0.71   131
DSM-III Phase Two Field Trials             1980           0.72   331
DSM-III Phase Two Field Trials             1980           0.64   331
Blais-NINE SCALE PERSONALITY DISORDER      1999           0.85   76
Hogervors-Alzheimer's disease              2000           0.90   82
Hilsenroth-Schizophrenia                   1998           0.91   95

Articles Used

Articles Used-Polygraph

Aftergood, Steven. "Polygraph testing and the DOE National Laboratories." SCIENCE Online. Vol 290 (5493), Nov 3, 2000: 939-940.

Ansley, Norman. 1989. Accuracy and utility of RI screening by student examiners at DODPI. Polygraph and Personnel Security Research. Office of Security. Fort George G. Meade, MD.

Ansley, Norman. 1990. "The validity and reliability of polygraph decisions in real cases." Polygraph 19(3): 169-181. (Table 2, pg 174)

Barland, Gordon H., Charles R. Honts, and Steven D. Barger. 1990. The detection of deception for multiple issues. DoDPI Project No. DoDPI89-P-0005.

Blackwell, Joan N. 1996. Polyscore: A comparison of accuracy. DoDPI Report no: DoDPI94-P-0006.

Brownlie C, Johnson GJ, and B. Knill. 1997. "Validation study of the relevant/irrelevant screening format" Unpublished report from government project. Provided by DoDPI.

Correa, Eileen Israel and Henry E. Adams. 1981. The validity of the preemployment polygraph examination and the effects of motivation. Polygraph. 143-155.

Edel, Eugene and Lane A. Moore Jr. 1984. "Analysis of agreement in polygraph charts." Polygraph 193-197. (Table II).

Honts CR and S. Amato. 1999. "The automated polygraph examination." Contract No. 110224-1998-MO.

Honts, Charles R and Lawrence N. Driscoll. 1987. "An evaluation of the reliability and validity of rank order and standard numerical scoring of polygraph charts." Polygraph 241-255.

Honts, Charles R and Lawrence N. Driscoll. 1988. "A field validity study of the rank order scoring system (ROSS) in multiple issue control question tests." Polygraph 1-13

Jayne, Brian C. 1989. A comparison between the predictive value of two common preemployment screening procedures. The Investigator 5(3)

Krapohl, Donald J, Donnie W. Dutton, and Andrew H. Ryan. 2001. The Rank Order Scoring System: Replication and extension with field data. Forthcoming Polygraph 2001.

Krapohl, Donald J, Kendall W. Shull, and Andrew A. Ryan. 2000. Does the confession criterion in case selection inflate polygraph validity estimates? Paper submitted to the Forensic Science Communications, November 8, 2000. Polygraph Institute, Ft. Jackson SC.

Lykken, D.T. 1983. A tremor in the blood: Uses and abuses of the lie detector. New York: McGraw-Hill.

McCauley, Clark and Robert F. Forman. 1988. "A review of the Office of Technology Assessment report on polygraph validity." Basic and Applied Social Psychology 9(2): 73-84.

Patrick, CJ. and W.G. Iacono. 1991. "Validity of the control question polygraph test: The problem of sampling bias." Journal of Applied Psychology 76(2): 229-238.

Scientific validity of polygraph testing: A research review and evaluation. Office of Technology Assessment. U.S. Congress: Report No. OTATMH15, November 1983.

Yankee, William J, James M. Powell, and Ross Newland. 1985. An investigation of the accuracy and consistency of polygraph chart interpretation by inexperienced and experienced examiners. Polygraph 14(2): 108-117.

Articles Used-Medical

Adamek HE, Albert J, Breer H, Weitz M, Schilling D, Riemann JF. Pancreatic cancer detection with magnetic resonance cholangiopancreatography and endoscopic retrograde cholangiopancreatography: a prospective controlled study. Lancet 2000 Jul 15;356(9225):190-3

Ascher SM, Agrawal R, Bis KG, Brown ED, Maximovich A, Markham SM, Patt RH, Semelka RC. Endometriosis: appearance and detection with conventional and contrast-enhanced fat-suppressed spin-echo techniques. J Magn Reson Imaging 1995 May-Jun;5(3):251-7

Author unk; Specificity of screening in United Kingdom trial of early detection of breast cancer. BMJ 1992 Feb 8;304(6823):346-9

Ayers M, Prince M, Ahmadi S, Baran DT. Reconciling quantitative ultrasound of the calcaneus with X-ray-based measurements of the central skeleton. J Bone Miner Res 2000 Sep;15(9):1850-5

Baines CJ, McFarlane DV, Miller AB. Sensitivity and specificity of first screen mammography in 15 NBSS centres. Can Assoc Radiol J 1988 Dec;39(4):273-6

Baker JA, Kornguth PH, Lo JY, Floyd CE. 1996. Artificial neural network: Improving the quality of breast biopsy recommendations. Radiology 198:131-135.

Barronian AD, Zoltan JD, Bucon KA. Magnetic resonance imaging of the knee: correlation with arthroscopy. Arthroscopy 1989;5(3):187-91

Beam CA, Layde PM, Sullivan DC. 1996. Variability in the interpretation of screening mammograms by US radiologists. Arch Intern Med 156: 209-213

Belsky M, Gaitini D, Goldsher D, Hoffman A, Daitzchman M. Color-coded duplex ultrasound compared to CT angiography for detection and quantification of carotid artery stenosis. Eur J Ultrasound 2000 Sep;12(1):49-60

Blanchard TK, Bearcroft PW, Constant CR, Griffin DR, Dixon AK. Diagnostic and therapeutic impact of MRI and arthrography in the investigation of full-thickness rotator cuff tears. Eur Radiol 1999;9(4):638-42

Bozkurt T, Richter F, Lux G. Ultrasonography as a primary diagnostic tool in patients with inflammatory disease and tumors of the small intestine and large bowel. J Clin Ultrasound 1994 Feb;22(2):85-91

Braccini G, Lamacchia M, Boraschi P, Bertellotti L, Marrucci A, Goletti O, Perri G. Ultrasound versus plain film in the detection of pneumoperitoneum. Abdom Imaging 1996 Sep-Oct;21(5):404-12

Brant-Zawadzki MN, Jensen MC, Obuchowski N, Ross JS, Modic MT. Interobserver and intraobserver variability in interpretation of lumbar disc abnormalities. A comparison of two nomenclatures. Spine 1995 Jun 1;20(11):1257-63

Britton PD, Coulden RA. The use of duplex Doppler ultrasound in the diagnosis of breast cancer. Clin Radiol 1990 Dec;42(6):399-401

Budoff MJ, Georgiou D, Brody A, Agatston AS, Kennedy J, Wolfkiel C, Stanford W, Shields P, Lewis RJ, Janowitz WR, Rich S, Brundage BH. Ultrafast computed tomography as a diagnostic modality in the detection of coronary artery disease: a multicenter study. Circulation 1996 Mar 1;93(5):898-904

Burhenne LJ, Burhenne HJ, Kan L. 1995. Quality-oriented mass mammography screening. Radiology 194:185-188.

Cicinelli E, Romano F, Anastasio PS, Blasi N, Parisi C, Galantino P. Transabdominal sonohysterography, transvaginal sonography, and hysteroscopy in the evaluation of submucous myomas. Obstet Gynecol 1995 Jan;85(1):42-7

Currie IC, Jones AJ, Wakeley CJ, Tennant WG, Wilson YG, Baird RN, Lamont PM. Non-invasive aortoiliac assessment. Eur J Vasc Endovasc Surg 1995 Jan;9(1):24-8

Cwikla JB, Buscombe JR, Kelleher SM, Parbhoo SP, Thakrar DS, Hinton J, Deery AR, Crow J, Hilson AJ. Comparison of accuracy of scintimammography and X-ray mammography in the diagnosis of primary breast cancer in patients selected for surgical biopsy. Clin Radiol 1998 Apr;53(4):274-80

DePriest PD, Varner E, Powell J, Fried A, Puls L, Higgins R, Shenson D, Kryscio R, Hunter JE, Andrews SJ, et al. The efficacy of a sonographic morphology index in identifying ovarian cancer: a multi-institutional investigation. Gynecol Oncol 1994 Nov;55(2):174-8

D'Ippolito G, de Mello GG, Szejnfeld J. The value of unenhanced CT in the diagnosis of acute appendicitis. Rev Paul Med 1998 Nov-Dec;116(6):1838-45

Drew PJ, Turnbull LW, Chatterjee S, Read J, Carleton PJ, Fox JN, Monson JR, Kerin MJ. Prospective comparison of standard triple assessment and dynamic magnetic resonance imaging of the breast for the evaluation of symptomatic breast lesions. Ann Surg 1999 Nov;230(5):680-5

Elmore JG, Wells CK, Lee CH, Howard DH, Feinstein AR. 1994. Variability in Radiologist's Interpretations of Mammograms. NEJM. 331(22): 1493-1499.

Fenlon HM, N, Tierney S, Gorey T, Ennis JT. Tc-99m tetrofosmin scintigraphy as an adjunct to plain-film mammography in palpable breast lesions. Clin Radiol 1998 Jan;53(1):17-24

Freed KS, Paulson EK, Frederick MG, Preminger GM, Shusterman DJ, Keogan MT, Vieweg J, Smith RH, Nelson RC, Delong DM, Leder RA. Interobserver variability in the interpretation of unenhanced helical CT for the diagnosis of ureteral stone disease. J Comput Assist Tomogr 1998 Sep-Oct;22(5):732-7

Garcia Pena BM, Mandl KD, Kraus SJ, Fischer AC, Fleisher GR, Lund DP, Taylor GA. Ultrasonography and limited computed tomography in the diagnosis and management of appendicitis in children. JAMA 1999 Sep 15;282(11):1041-6

Gebel M, Caselitz M, Bowen-Davies PE, Weber S. A multicenter, prospective, open label, randomized, controlled phase IIIb study of SH U 508 A (Levovist) for Doppler signal enhancement in the portal vascular system. Ultraschall Med 1998 Aug;19(4):148-56

Gentile AT, Moneta GL, Lee RW, Masser PA, Taylor LM Jr, Porter JM. Usefulness of fasting and postprandial duplex ultrasound examinations for predicting high-grade superior mesenteric artery stenosis. Am J Surg 1995 May;169(5):476-9

Glashow JL, Katz R, Schneider M, Scott WN. Double-blind assessment of the value of magnetic resonance imaging in the diagnosis of anterior cruciate and meniscal lesions. J Bone Joint Surg Am 1989 Jan;71(1):113-9

Grab D, Flock F, Stohr I, Nussle K, Rieber A, Fenchel S, Brambs HJ, Reske SN, Kreienberg R. Classification of asymptomatic adnexal masses by ultrasound, magnetic resonance imaging, and positron emission tomography. Gynecol Oncol 2000 Jun;77(3):454-9

Grotta JC, Chiu D, Lu M, Patel S, Levine SR, Tilley BC, Brott TG, Haley EC Jr, Lyden PD, Kothari R, Frankel M, Lewandowski CA, Libman R, Kwiatkowski T, Broderick JP, Marler JR, Corrigan J, Huff S, Mitsias P, Talati S, Tanne D. Agreement and variability in the interpretation of early CT changes in stroke patients qualifying for intravenous rtPA therapy. Stroke 1999 Aug;30(8):1528-33

Hedges, Larry V.; Hasselblad, Vic. Meta-analysis of screening and diagnostic tests. Psychological Bulletin. 1995 Jan Vol 117(1) 167-178

Heinzen MT, Yankaskas BC, Kwok RK. Comparison of woman-specific versus breast-specific data for reporting screening mammography performance. Acad Radiol 2000 Apr;7(4):232-6

Hill R, Conron R, Greissinger P, Heller M. Ultrasound for the detection of foreign bodies in human tissue. Ann Emerg Med 1997 Mar;29(3):353-6

Ihlberg L, Alback A, Roth WD, Edgren J, Lepantalo M. Interobserver agreement in duplex scanning for vein grafts. Eur J Vasc Endovasc Surg 2000 May;19(5):504-8

Imbriaco M, Del Vecchio S, Riccardi A, Pace L, Di Salle F, Di Gennaro F, Salvatore M, Sodano A. Scintimammography with 99mTc-MIBI versus dynamic MRI for non-invasive characterization of breast masses. Eur J Nucl Med 2001 Jan;28(1):56-63

Johansson M, Jensen G, Aurell M, Friberg P, Herlitz H, Klingenstierna H, Volkmann R. Evaluation of duplex ultrasound and captopril renography for detection of renovascular hypertension. Kidney Int 2000 Aug;58(2):774-82

Johnson MB, Wilkinson ID, Wattam J, Venables GS, Griffiths PD. Comparison of Doppler ultrasound, magnetic resonance angiographic techniques and catheter angiography in evaluation of carotid stenosis. Clin Radiol 2000 Dec;55(12):912-20

Joseph DP, Pieramici DJ, Beauchamp NJ Jr. Computed tomography in the diagnosis and prognosis of open-globe injuries. Ophthalmology 2000 Oct;107(10):1899-906

Keberle M, Kenn W, Tschammler A, Wittenberg G, Hilgarth M, Hoppe F, Hahn D. Current value of double-contrast pharyngography and of computed tomography for the detection and for staging of hypopharyngeal, oropharyngeal and supraglottic tumors. Eur Radiol 1999;9(9):1843-50

Kurtz AB, Tsimikas JV, Tempany CM, Hamper UM, Arger PH, Bree RL, Wechsler RJ, Francis IR, Kuhlman JE, Siegelman ES, Mitchell DG, Silverman SG, Brown DL, Sheth S, Coleman BG, Ellis JH, Kurman RJ, Caudry DJ, McNeil BJ. Diagnosis and staging of ovarian cancer: comparative values of Doppler and conventional US, CT, and MR imaging correlated with surgery and histopathologic analysis-report of the Radiology Diagnostic Oncology Group. Radiology 1999 Jul;212(1):19-27

Lane MJ, Liu DM, Huynh MD, Jeffrey RB Jr, Mindelzun RE, Katz DS. Suspected acute appendicitis: nonenhanced helical CT in 300 consecutive patients. Radiology 1999 Nov;213(2):341-6

Lennon CA, Gray DL. Sensitivity and specificity of ultrasound for the detection of neural tube and ventral wall defects in a high-risk population. Obstet Gynecol 1999 Oct;94(4):562-6

Levi S, Schaaps JP, De Havay P, Coulon R, Defoort P. End-result of routine ultrasound screening for congenital anomalies: the Belgian Multicentric Study 1984-92. Ultrasound Obstet Gynecol 1995 Jun;5(6):366-71

Levine SE, Neagle CE, Esterhai JL, Wright DG, Dalinka MK. Magnetic resonance imaging for the diagnosis of osteomyelitis in the diabetic patient with a foot ulcer. Foot Ankle Int 1994 Mar;15(3):151-6

Lindsell DR. Ultrasound imaging of pancreas and biliary tract. Lancet 1990 Feb 17;335(8686):390-3

Ma OJ, Mateer JR. Trauma ultrasound examination versus chest radiography in the detection of hemothorax. Ann Emerg Med 1997 Mar;29(3):312-5

Maggino T, Gadducci A, D'Addario V, Pecorelli S, Lissoni A, Stella M, Romagnolo C, Federghini M, Zucca S, Trio D, et al. Prospective multicenter study on CA 125 in postmenopausal pelvic masses. Gynecol Oncol 1994 Aug;54(2):117-23

Martelli G, Pilotti S, Coopmans de Yoldi G, Viganotti G, Fariselli G, Lepera P, Moglia D. Diagnostic efficacy of physical examination, mammography, fine needle aspiration cytology (triple-test) in solid breast lumps: an analysis of 1708 consecutive cases. Tumori 1990 Oct 31;76(5):476-9

Masdeu JC, Van Heertum RL, Kleiman A, Anselmi G, Kissane K, Horng J, Yudd A, Luck D, Grundman M. Early single-photon emission computed tomography in mild head trauma. A controlled study. J Neuroimaging 1994 Oct;4(4):177-81

Mettlin C, Lee F, Drago J, Murphy GP. The American Cancer Society National Prostate Cancer Detection Project. Findings on the detection of early prostate cancer in 2425 men. Cancer 1991 Jun 15;67(12):2949-58

Miller OF, Rineer SK, Reichard SR, Buckley RG, Donovan MS, Graham IR, Goff WB, Kane CJ. Prospective comparison of unenhanced spiral computed tomography and intravenous urogram in the evaluation of acute flank pain. Urology 1998 Dec;52(6):982-7

Mooney C, Mushlin AI, Phelps CE. 1990. Targeting assessments of magnetic resonance imaging in suspected multiple sclerosis. Med Decis Making 10:77-94.

Moss HA, Britton PD, Flower CD, Freeman AH, Lomas DJ, Warren RM. How reliable is modern breast imaging in differentiating benign from malignant breast lesions in the symptomatic population? Clin Radiol 1999 Oct;54(10):676-82

Mushlin AI, Mooney C, Holloway RG, Detsky AS, Mattson DH, Phelps CE. 1997. The cost-effectiveness of magnetic resonance imaging for patients with equivocal neurological symptoms. International Journal of Technology Assessment in Health Care 13(1):21-34.

Mussurakis S, Buckley DL, Coady AM, Turnbull LW, Horsman A. Observer variability in the interpretation of contrast enhanced MRI of the breast. Br J Radiol 1996 Nov;69(827):1009-16

Nozoe T, Matsumata T, Sugimachi K. Usefulness of preoperative transvaginal ultrasonography for women with advanced gastric carcinoma. Am J Gastroenterol 1999 Sep;94(9):2509-12

Orlinsky M, Knittel P, Feit T, Chan L, Mandavia D. The comparative accuracy of radiolucent foreign body detection using ultrasonography. Am J Emerg Med 2000 Jul;18(4):401-3

Pasanen PA, Partanen K, Pikkarainen P, Alhava E, Pirinen A, Janatuinen E. A prospective study on the value of ultrasound, computed tomography and endoscopic retrograde cholangiopancreatography in the diagnosis of unjaundiced cholestasis. In Vivo 1994 Mar-Apr;8(2):227-30

Perre CI, Koot VC, de Hooge P, Leguit P. The value of ultrasound in the evaluation of palpable breast tumours: a prospective study of 400 cases. Eur J Surg Oncol 1994 Dec;20(6):637-40

Rao PM, Rhea JT, Novelline RA, Mostafavi AA, Lawrason JN, McCabe CJ. Helical CT combined with contrast material administered only through the colon for imaging of suspected appendicitis. AJR Am J Roentgenol 1997 Nov;169(5):1275-80

Razumovsky AY, Gillard JH, Bryan RN, Hanley DF, Oppenheimer SM. TCD, MRA and MRI in acute cerebral ischemia. Acta Neurol Scand 1999 Jan;99(1):65-76

Regan F, Schaefer DC, Smith DP, Petronis JD, Bohlman ME, Magnuson TH. The diagnostic utility of HASTE MRI in the evaluation of acute cholecystitis. Half-Fourier acquisition single-shot turbo SE. J Comput Assist Tomogr 1998 Jul-Aug;22(4):638-42

Robertson PL, Berlangieri SU, Goergen SK, Waugh JR, Kalff V, Stevens SN, Hicks RJ, Fabiny RP, Ugoni A, Kelly MJ. Comparison of ultrasound and blood pool scintigraphy in the diagnosis of lower limb deep venous thrombosis. Clin Radiol 1994 Jun;49(6):382-90

Rosen CL, Brown DF, Sagarin MJ, Chang Y, McCabe CJ, RE. Ultrasonography by emergency physicians in patients with suspected ureteral colic. J Emerg Med 1998 Nov-Dec;16(6):865-70

Rozycki GS, Ballard RB, Feliciano DV, Schmidt JA, Pennington SD. Surgeon-performed ultrasound for the assessment of truncal injuries: lessons learned from 1540 patients. Ann Surg 1998 Oct;228(4):557-67

Saidi MH, Sadler RK, Theis VD, Akright BD, Farhart SA, Villanueva GR. Comparison of sonography, sonohysterography, and hysteroscopy for evaluation of abnormal uterine bleeding. J Ultrasound Med 1997 Sep;16(9):587-91

Schröter A, Zerr I, Henkel K, Tschampa HJ, Finkenstaedt M, Poser S. Magnetic resonance imaging in the clinical diagnosis of Creutzfeldt-Jakob disease. Arch Neurol 2000 Dec;57(12):1751-7

Schwerk WB, Wichtrup B, Rothmund M, Ruschoff J. Ultrasonography in the diagnosis of acute appendicitis: a prospective study. Gastroenterology 1989 Sep;97(3):630-9

Scott WW Jr, Rosenbaum JE, Ackerman SJ, Reichle RL, Magid D, Weiler JC, Gitlin JN. Subtle orthopedic fractures: teleradiology workstation versus film interpretation. Radiology 1993 Jun;187(3):811-5

Shackford SR, Wald SL, Ross SE, Cogbill TH, Hoyt DB, Morris JA, Mucha PA, Pachter HL, Sugerman HJ, O'Malley K, et al. The clinical utility of computed tomographic scanning and neurologic examination in the management of patients with minor head injuries. J Trauma 1992 Sep;33(3):385-94

Simonovsky V. The diagnosis of cirrhosis by high resolution ultrasound of the liver surface. Br J Radiol 1999 Jan;72(853):29-34

Srinivasan J, Mayberg MR, Weiss DG, Eskridge J. Duplex accuracy compared with angiography in the Veterans Affairs Cooperative Studies Trial for Symptomatic Carotid Stenosis. Neurosurgery 1995 Apr;36(4):648-53

Stafford RE, McGonigal MD, Weigelt JA, Johnson TJ. Oral contrast solution and computed tomography for blunt abdominal trauma: a randomized study. Arch Surg 1999 Jun;134(6):622-6

Stark DD, Wittenberg J, Butch RJ, Ferrucci JT Jr. Hepatic metastases: randomized, controlled comparison of detection with MR imaging and CT. Radiology 1987 Nov;165(2):399-406

Strandell A, Bourne T, Bergh C, Granberg S, Asztely M, Thorburn J. The assessment of endometrial pathology and tubal patency: a comparison between the use of ultrasonography and X-ray hysterosalpingography for the investigation of infertility patients. Ultrasound Obstet Gynecol 1999 Sep;14(3):200-4

Thomas B, Falcone RE, Vasquez D, Santanello S, Townsend M, Hockenberry S, Innes J, Wanamaker S. Ultrasound evaluation of blunt abdominal trauma: program implementation, initial experience, and learning curve. J Trauma 1997 Mar;42(3):384-8; discussion 388-90

Thourani VH, Pettitt BJ, Schmidt JA, Cooper WA, Rozycki GS. Validation of surgeon-performed emergency abdominal ultrasonography in pediatric trauma patients. J Pediatr Surg 1998 Feb;33(2):322-8

Tsuda H, Kawabata M, Kawabata K, Yamamoto K, Hidaka A, Umesaki N. Comparison between transabdominal and transvaginal ultrasonography for identifying endometrial malignancies. Gynecol Obstet Invest 1995;40(4):271-3

Valk PE, Abella-Columna E, Haseman MK, Pounds TR, Tesar RD, Myers RW, Greiss HB, Hofer GA. Whole-body PET imaging with [18F]fluorodeoxyglucose in management of recurrent colorectal cancer. Arch Surg 1999 May;134(5):503-11

van Gils AP, van den Berg R, Falke TH, Bloem JL, Prins HJ, Dillon EH, van der Mey AG, Pauwels EK. MR diagnosis of paraganglioma of the head and neck: value of contrast enhancement. AJR Am J Roentgenol 1994 Jan;162(1):147-53

van Nagell JR Jr, DePriest PD, Reedy MB, Gallion HH, Ueland FR, Pavlik EJ, Kryscio RJ. The efficacy of transvaginal sonographic screening in asymptomatic women at risk for ovarian cancer. Gynecol Oncol 2000 Jun;77(3):350-6

Vieweg J, Teh C, Freed K, Leder RA, Smith RH, Nelson RH, Preminger GM. Unenhanced helical computerized tomography for the evaluation of patients with acute flank pain. J Urol 1998 Sep;160(3 Pt 1):679-84

von Kummer R, Bourquain H, Bastianello S, Bozzao L, Manelfe C, Meier D, Hacke W. Early prediction of irreversible brain damage after ischemic stroke at CT. Radiology 2001 Apr;219(1):95-100

Walker S, Haun W, Clark J, McMillin K, Zeren F, Gilliland T. The value of limited computed tomography with rectal contrast in the diagnosis of acute appendicitis. Am J Surg 2000 Dec;180(6):450-4; discussion 454-5

Weishaupt D, Zanetti M, Boos N, Hodler J. MR imaging and CT in osteoarthritis of the lumbar facet joints. Skeletal Radiol 1999 Apr;28(4):215-9

Wong TW, Lau CC, Yeung A, Lo L, Tai CM. Efficacy of transabdominal ultrasound examination in the diagnosis of early pregnancy complications in an emergency department. J Accid Emerg Med 1998 May;15(3):155-8

Young GR, Humphrey PR, Shaw MD, Nixon TE, Smith ET. Comparison of magnetic resonance angiography, duplex ultrasound, and digital subtraction angiography in assessment of extracranial internal carotid artery stenosis. J Neurol Neurosurg Psychiatry 1994 Dec;57(12):1466-78

Zielke A, Hasse C, Sitter H, Rothmund M. Influence of ultrasound on clinical decision making in acute appendicitis: a prospective study. Eur J Surg 1998 Mar;164(3):201-9

Zonderland HM, Coerkamp EG, Hermans J, van de Vijver MJ, van Voorthuisen AE. Diagnosis of breast cancer: contribution of US as an adjunct to mammography. Radiology 1999 Nov;213(2):413-22

Zuberi SM, Matta N, Nawaz S, Stephenson JB, McWilliam RC, Hollman A. Muscle ultrasound in the assessment of suspected neuromuscular disease in childhood. Neuromuscul Disord 1999 Jun;9(4):203-7

Articles Used-Psychological

American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, 3rd Edition. Washington, D.C., APA, 1980.

Baird, Gillian; Charman, Tony; Baron-Cohen, Simon; Cox, Anthony; Swettenham, John; Wheelwright, Sally; Drew, Auriol. A screening instrument for autism at 18 months of age: A 6-year follow-up study. Journal of the American Academy of Child & Adolescent Psychiatry. 2000 Jun Vol 39(6) 694-702

Berument, Sibel Kazak; Rutter, Michael; Lord, Catherine; Pickles, Andrew; Bailey, Anthony. Autism screening questionnaire: Diagnostic validity. British Journal of Psychiatry. 1999 Nov Vol 175 444-451

Birtchnell, John; Evans, Chris; Deahl, Martin; Masters, Nigel. The Depression Screening Instrument (DSI): A device for the detection of depressive disorders in general practice. Journal of Affective Disorders. 1989 Mar-Jun Vol 16(2-3) 269-281

Blais, Mark A.; Hilsenroth, Mark J.; Fowler, J. Christopher. Diagnostic efficiency and hierarchical functioning of the DSM-IV borderline personality disorder criteria. Journal of Nervous & Mental Disease. 1999 Mar Vol 187(3) 167-173

Bradley KA, Boyd-Wickizer J, Powell SH, Burman ML. 1998. Alcohol screening questionnaires in women: A critical review. JAMA 280(2): 166-171

Brooks, David A.; Williams, Ronald N; Dean, Raymond S.; Wood, Tina M.; et al. The predictive validity of a neuropsychological screening measure. International Journal of Neuroscience. 1990 Mar Vol 51(1-2) 83-88

Chaffee, C. Anne; Cunningham, Charles E.; Secord-Gilbert, Margaret; Elbard, Heather; et al. Screening effectiveness of the Minnesota Child Development Inventory expressive and receptive language scales: Sensitivity, specificity, and predictive value. Psychological Assessment. 1990 Mar Vol 2(1) 80-85

Chen, Wei J.; Faraone, Stephen V.; Biederman, Joseph; Tsuang, Ming T. Diagnostic accuracy of the Child Behavior Checklist scales for attention-deficit hyperactivity disorder: A receiver-operating characteristic analysis. Journal of Consulting & Clinical Psychology. 1994 Oct Vol 62(5) 1017-1025

Colligan, Robert C; Davis, Leo J.; Morse, Robert M.; Offord, Kenneth P. Screening medical patients for alcoholism with the MMPI: A comparison of seven scales. Journal of Clinical Psychology. 1988 Jul Vol 44(4) 582-592

de las Cuevas, ; Sanz, Emilio J.; de la Fuente, Juan A.; Padilla, Jonathan; Berenguer, Juan C. The Severity of Dependence Scale (SDS) as screening test for benzodiazepine dependence: SDS validation study. Addiction. 2000 Feb Vol 95(2) 245-250

Dent, Alexandra; Lincoln, Nadina B. Screening for memory problems in multiple sclerosis. British Journal of Clinical Psychology. 2000 Sep Vol 39(3) 311-315

Erford, Bradley T.; Bagley, Donna L.; Hopper, James A.; Lee, Ramona M.; Panagopulos, Kathleen A.; Preller, Denise B. Reliability and validity of the Math Essential Skill Screener-Elementary Version (MESS-E). Psychology in the Schools. 1998 Apr Vol 35(2) 127-135

Foa, Edna B.; Cashman, Laurie; Jaycox, Lisa; Perry, Kevin. The validation of a self-report measure of posttraumatic stress disorder: The Posttraumatic Diagnostic Scale. Psychological Assessment. 1997 Dec Vol 9(4) 445-451

Glascoe, Frances Page; Byrne, Karen E. The accuracy of three developmental screening tests. Journal of Early Intervention. 1993 Fal Vol 17(4) 368-379

Gross, Kristin; Keyes, Megan D.; Greene, Roger L. Assessing depression with the MMPI and MMPI-2. Journal of Personality Assessment. 2000 Dec Vol 75(3) 464-477

Gureje, O.; Obikoya, B. The GHQ-12 as a screening tool in a primary care setting. Social Psychiatry & Psychiatric Epidemiology. 1990 Sep Vol 25(5) 276-280

Hiatt, Deirdre; Hargrave, George E. Predicting job performance problems with psychological screening. Journal of Police Science & Administration. 1988 Jun Vol 16(2) 122-135

Hilsenroth, Mark J.; Fowler, J. Christopher; Padawer, Justin R. The Rorschach Schizophrenia Index (SCZI): An examination of reliability, validity, and diagnostic efficiency. Journal of Personality Assessment. 1998 Jun Vol 70(3) 514-534

Hirschfeld, Robert M. A.; Williams, Janet B. W.; Spitzer, Robert L.; Calabrese, Joseph R.; Flynn, Laurie; Keck, Paul E., Jr.; Lewis, Lydia; McElroy, Susan L.; Post, Robert M.; Rapport, Daniel J.; Russell, James M.; Sachs, Gary S.; Zajecka, John. Development and validation of a screening instrument for bipolar spectrum disorder: The Mood Disorder Questionnaire. American Journal of Psychiatry. 2000 Nov Vol 157(11) 1873-1875

Hogervorst, Eef; Barnetson, L.; Jobst, K. A.; Nagy, Zs.; Combrinck, M.; Smith, A. D. Diagnosing dementia: Interrater reliability assessment and accuracy of the NINCDS/ADRDA criteria versus CERAD histopathological criteria for Alzheimer's disease. Dementia & Geriatric Cognitive Disorders. 2000 Mar-Apr Vol 11(2) 107-113

Hyler, Steven E.; Skodol, Andrew E.; Kellman, H. David; Oldham, John M.; et al. Validity of the Personality Diagnostic Questionnaire-Revised: Comparison with two structured interviews. American Journal of Psychiatry. 1990 Aug Vol 147(8) 1043-1048

Inwald, Robin E.; Brockwell, Albert L. Predicting the performance of government security personnel with the IPI and MMPI. Journal of Personality Assessment. 1991 Jun Vol 56(3) 522-535

Ivnik, Robert J.; Smith, Glenn E.; Petersen, Ronald C; Boeve, Bradley F.; Kokmen, Emre; Tangalos, Eric G. Diagnostic accuracy of four approaches to interpreting neuropsychological test data. Neuropsychology. 2000 Apr Vol 14(2) 163-177

Johnson, Edward E.; Hamer, Robert M.; Nora, Rena M. The Lie/Bet questionnaire for screening pathological gamblers: A follow-up study. Psychological Reports. 1998 Dec Vol 83(3, Pt 2) 1219-1224

Klin, Ami; Lang, Jason; Cicchetti, Domenic V.; Volkmar, Fred R. Brief report: Interrater reliability of clinical diagnosis and DSM-IV criteria for autistic disorder: Results of the DSM-IV Autism Field Trial. Journal of Autism & Developmental Disorders. 2000 Apr Vol 30(2) 163-167

Kogan, Evan S.; Kabacoff, Robert I.; Hersen, Michel; Van Hasselt, Vincent B. Clinical cutoffs for the Beck Depression Inventory and the Geriatric Depression Scale with older adult psychiatric outpatients. Journal of Psychopathology & Behavioral Assessment. 1994 Sep Vol 16(3) 233-242

Laprise, Rejeanne; Vezina, Jean. Diagnostic performance of the Geriatric Depression Scale and the Beck Depression Inventory with nursing home residents. Canadian Journal on Aging. 1998 Win Vol 17(4) 401-413

Lavigne, John V.; Arend, Richard; Rosenbaum, Diane; Sinacore, James; et al. Interrater reliability of the DSM-III-R with preschool children. Journal of Abnormal Child Psychology. 1994 Dec Vol 22(6) 679-690

Maki, Naruhiko; Ikeda, Manabu; Hokoishi, Kazuhiko; Nebu, Akihiko; Komori, Kenjiro; Shigenobu, Kazue; Fukuhara, Ryuji; Hirono, Nobutsugu; Nakata, Hideki; Tanabe, Hirotaka. Validity of the Short-Memory Questionnaire in vascular dementia. International Journal of Geriatric Psychiatry. 2000 Dec Vol 15(12) 1143-1146

Merson, S.; Tyrer, P.; Duke, Peter J.; Henderson, F. Interrater reliability of ICD-10 guidelines for the diagnosis of personality disorders. Journal of Personality Disorders. 1994 Sum Vol 8(2) 89-95

Parikh, Rajesh M.; Eden, Dianne T.; Price, Thomas R; Robinson, Robert G. The sensitivity and specificity of the Center for Epidemiologic Studies Depression Scale in screening for post-stroke depression. International Journal of Psychiatry in Medicine. 1988 Vol 18(2) 169-181

Pomeroy, Ian M.; Clark, Christopher R; Philp, Ian. The effectiveness of very short scales for depression screening in elderly medical patients. International Journal of Geriatric Psychiatry. 2001 Mar Vol 16(3) 321-326

Razavi, Darius; Delvaux, Nicole; Farvacques, Christine; Robaye, Edmond. Screening for adjustment disorders and major depressive disorders in cancer in-patients. British Journal of Psychiatry. 1990 Jan Vol 156 79-83

Scheinberg, Zvi; Koslowsky, Meni; Bleich, Avi; Mark, Mordechai; et al. Sensitivity, specificity, and positive predictive value as measures of prediction accuracy: The case of the EAT-26. Educational & Psychological Measurement. 1993 Fal Vol 53(3) 831-839

Sherman, Tracy; Shulman, Brian B. Specificity and sensitivity ratios of the Pediatric Language Acquisition Screening Tool for Early Referral-Revised. Infant-Toddler Intervention. 1999 Dec Vol 9(4) 315-330

Steer, Robert A.; Cavalieri, Thomas A.; Leonard, Douglas M.; Beck, Aaron T. Use of the Beck Depression Inventory for primary care to screen for major depression disorders. General Hospital Psychiatry. 1999 Mar-Apr Vol 21(2) 106-111

Storgaard, Heidi; Nielsen, Susanne D.; Gluud, Christian. The validity of the Michigan Alcoholism Screening Test (MAST). Alcohol & Alcoholism. 1994 Sep Vol 29(5) 493-502

Uhlmann, Richard F.; Larson, Eric B. Effect of education on the Mini-Mental State Examination as a screening test for dementia. Journal of the American Geriatrics Society. 1991 Sep Vol 39(9) 876-880

Articles Used-Security

Jayne, Brian C. 1989. A comparison between the predictive value of two common preemployment screening procedures. The Investigator 5(3)

Articles Reviewed

Articles Reviewed-Polygraph

Abrams, Stanley, Polygraph validity and reliability: A review. Journal of Forensic Sciences, Year: 1973, Volume: 18, Page: 313-326,

Abrams, Stanley, The Validity Of Polygraph Technique With Children. Polygraph, Year: 1994, Volume: 23, Page: 115-116,

Abrams, Stanley, The validity of the polygraph technique with children Journal of Police Science and Administration, Year: 1975, Volume: 3, Page: 310-311,

Abrams, Stanley, The validity of the polygraph with schizophrenics Polygraph, Year: 1974, Volume: 3, Page:,

Abrams, Stanley, Validity of the polygraph: A bibliography. Polygraph, Year: 1972, Volume: 1, Page: 97-101,

Alpher, Victor S.; Blanton, Richard L., The accuracy of lie detection: Why lie tests based on the polygraph should not be admitted into evidence today. Law and Psychology Review, Year: 1985, Volume: 9, Page: 67-75,

Ansley, Norman, A bibliography on validity and reliability of polygraph techniques Polygraph, Year: 1982,

Ansley, Norman, A compendium on polygraph validity Polygraph, Year: 1983, Volume: 12, Page: 53-61,

Ansley, Norman, A compendium on polygraph validity The Narc Officer, Year: 1989, Volume: 5, Page: 43-45,

Ansley, Norman, A review of the scientific literature on the validity, reliability, and utility of polygraph techniques (3rd ed.), Location: Fort Meade, MD, Publisher: National Security Agency, Year: 1983,

Ansley, Norman, Accuracy and utility of RI screening by student examiners at DoDPI, Location: Fort Meade, MD, Publisher: National Security Agency, Year: 1989,

Ansley, Norman, Quick reference guide to polygraph validity, Year: 1983,

Ansley, Norman, Research on the validity of the relevant-irrelevant technique as used in screening Polygraph, Year: 1980, Volume: 9, Page: 153-161,

Ansley, Norman, The validity and reliability of polygraph decisions in real cases Polygraph, Year: 1990, Volume: 19, Page: 169-181, Comments2: Reprinted by the American Polygraph Association. Library has 1 copies including one signed by the author.,

Ansley, Norman. 1989. Accuracy and utility of RI screening by student examiners at DODPI. Polygraph and Personnel Security Research. Office of Security. National Security Agency. Fort George G. Meade, MD.

Ansley, Norman. 1990. "The validity and reliability of polygraph decisions in real cases." Polygraph 19(3): 169-181. (Table 2, pg 174)

Ansley, Norman; Garwood, Marcia; Barland, Gordon H., The accuracy and utility of polygraph testing, Location: Washington, DC, Publisher: Department of Defense, Year: 1984, Comments2: Reprinted in Polygraph, 13, 1-143. Reprinted as a separate publication by the American Polygraph Association, 1984.,

Ansley, Norman; Weir, Raymond J., Statistical study on the use of the Polygraph as a personnel security screening aid, Year: 1953,

APL program increases polygraphs reliability The Johns Hopkins University Gazette, Year: 1993, Volume: 12, Page: 40, Comments2: Johns Hopkins University,

Ash, Philip, The polygraph in pre-employment screening: Research on pre-employment use Polygraph, Year: 1980, Volume: 9, Page: 17-26,

Barland, Gordon H., Charles R. Honts, and Steven D. Barger. 1990. The detection of deception for multiple issues. DoDPI Project No. DoDPI89-P-0005.

Barland, Gordon H., A bibliography of scientific studies pertaining to the validity of current lie detection techniques. , Editor: Ansley, Norman Legal admissibility of the polygraph, Location: Springfield, IL, Publisher: Charles C. Thomas, Year: 1975, Page: 328-332,

Barland, Gordon H., Bibliography on validity Polygraph, Year: 1972, Volume: 1, Page: 256-262,

Barland, Gordon H., Bibliography: The use of polygraphs in pre-employment screening Polygraph, Year: 1974, Volume: 3, Page: 231-234,

Barland, Gordon H., The reliability of chart evaluations , Editor: Ansley, Norman Legal admissibility of the polygraph, Location: Springfield, IL, Publisher: Charles C. Thomas, Year: 1975, Page: 120-132,

Barland, Gordon H., The validity of the polygraph technique, Editor: Ansley, Norman Legal admissibility of the polygraph, Location: Springfield, IL, Publisher: Charles C. Thomas, Year: 1975, Page: 153-167,

Barland, Gordon H.; Honts, Charles R.; Barger, Steven D., A laboratory study of the validity of the ZCT: An executive summary, Location: Fort McClellan, AL 36205, Publisher: Department of Defense Polygraph Institute, GovReport: DoDPI88-P-0002; DoDPI89-P-0005; DoDPI89-P-0007, Comments2: See ProCite record numbers 571, 572 and 2460,

Barland, Gordon H.; Honts, Charles R.; Barger, Steven D., Executive summaries: A laboratory study of the validity of the ZCT. The detection of deception for multiple issues. The relative utility of skin resistance and skin conductance., Location: Fort McClellan, Alabama, Publisher: Department of Defense Polygraph Institute, GovReport: DoDPI89-P-0005, Comments2: DTIC AD#A339187/PAA,

Barland, Gordon H.; Honts, Charles R.; Barger, Steven D., The validity of detection of deception for multiple issues Psychophysiology, Year: 1989, Volume: 26, Page: 13, Comments2: [abstract],

Barland, Gordon H.; Raskin, David C., Studies of the accuracy of security screening polygraph examinations, Year: 1976, GovReport: Report No. 76-1 (Contract 75-NI-99-0001), National Institute of the Law Enforcement and Criminal Justice, Law Enforcement Assistance Administration, U.S. Department of Justice; and Department of Psychology, University of Utah,

Barland, Gordon H.; Raskin, David C., Validity and reliability of polygraph examinations of criminal suspects, Editor: Prokasy, W. F.; Raskin, David C., Location: New York, Publisher: Academic Press, Year: 1973, Page: 417-477,

Bersh, Philip J.; Brisentine, Robert A., Jr., The reliability of blind interpretation of polygraph records for lie detection purposes, GovReport: Report prepared for the DoD Joint Services Group for Coordinated Lie Detection Research,

Blackwell, Joan N. 1996. Polyscore: A comparison of accuracy. DoDPI Report no: DoDPI94-P-0006.

Bradley, Michael T., Accuracy demonstrations, threat and the detection of deception: Cardiovascular, electrodermal and pupillary measures, Location: Canada, Publisher: University of Manitoba, Year: 1980,

Bradley, Michael T.; Janisse, Michel P., Accuracy demonstrations, threat, and the detection of deception: Cardiovascular, electrodermal, and pupillary measures Psychophysiology, Year: 1981, Volume: 18, Page: 307-315, Comments2: Reprinted in Polygraph 10(2), 77-91,

Brownlie C, Johnson GJ, and B. Knill. 1997. "Validation study of the relevant/irrelevant screening format" Unpublished report from government project. Provided by DoDPI.

Cabot, R. E., Validity of the polygraph Law and Order, Year: 1977, Volume: 25, Page: 52-53,

Carroll, Douglas, The accuracy of polygraph lie detection Bulletin of the British Psychological Society, Year: 1985, Volume: 38, Page: A56, Comments2: [abstract],

Cestaro, Victor L., A comparison between decision accuracy rates obtained using the polygraph instrument and the computer voice stress analyzer (CVSA) in the absence of jeopardy, Location: Fort McClellan, AL, Publisher: Department of Defense Polygraph Institute, GovReport: DoDPI95-R-0002, Comments2: DTIC # ADA300334, reprinted in Polygraph 25(2), 117-127,

Cestaro, Victor L., A comparison of accuracy rates between detection of deception examinations using the polygraph and the computer voice stress analyzer in a mock crime scenario, Location: Fort McClellan, AL, Publisher: Department of Defense Polygraph Institute, GovReport: DoDPI95-P-0004, Comments2: DTIC AD Number A318985,

Chamberlin, Charles S., Research that debates the validity of the polygraph raises a different set of questions Security World, Year: 1985, Volume: 22, Page: 23,

Clarkson, Douglas B.; Martin, Doug, Improving the reliability of polygraph tests, Comments2: Contract no. N00014-93-1-0451, DTIC AD # A303889,

Correa, Eileen Israel and Henry E. Adams. 1981. The validity of the preemployment polygraph examination and the effects of motivation. Polygraph. 143-155.

Cross, Theodore P.; Saxe, Leonard, A critique of the validity of polygraph testing in child sexual abuse cases Journal of Child Sexual Abuse, Year: 1992, Volume: 1, Page: 19-33,

Crowe, Michael J.; Peters, R. Douglas; Suarez, Yolanda M.; Claeren, Lisa, The research project to compare the relative validity of the positive control and the relevant-irrelevant polygraph techniques, Location: Jacksonville, AL, Publisher: Jacksonville State University, Year: 1988, GovReport: Contract #MDA 904-87-C-2293,

Cureton, Edward E., A consensus as to the validity of polygraph procedures Tennessee Law Review, Year: 1953, Volume: 22, Page: 18-32, 728-742,

Dougherty, Denise, Critique of DoD (1984), The accuracy and utility of polygraph testing, and comparison with OTA report, The scientific validity of polygraph testing: A research review and evaluation, Location: Washington, DC, Publisher: Office of Technology Assessment, ,

Dunlap, William P.; Silver, Clayton; Bittner, Alvah C., Estimating reliability with small samples: Increased precision with averaged correlations. Human Factors, Year: 1986, Volume: 28, Page: 685-690,

Edel, Eugene and Lane A. Moore Jr. 1984. "Analysis of Agreement in polygraph charts." Polygraph 193-197. (Table II).

Edel, Eugene C.; Jacoby, Jacob, Examiner reliability in polygraph chart analysis: Identification of physiological responses. Journal of Applied Psychology, Year: 1975, Volume: 60, Page: 632-634,

Edwards, Robert H, A survey: reliability of polygraph examinations conducted by Virginia polygraph examiners, Comments2: Published in Polygraph (1981), 10(4), 229-272.,

Elaad, Eitan, The accuracy of human decisions and objective measurements in psychophysiological detection of knowledge Journal of Psychology, Year: 1994, Volume: 128, Page: 267-280,

Elaad, Eitan; Schahar, Esther, Polygraph field validity Polygraph, Year: 1985, Volume: 14, Page: 217-223,

Elaad, Eitan; Schahar, Esther, Polygraph field validity, Editor: Nachshon, Israel Scientific interrogation in criminal investigation, Year: 1976, Comments2: Selected papers presented at the First National Conference on the Scientific Interrogation Criminal Investigation, Bar-Ilan University, Ramat-Gan, Israel,

Elaad, Eitan; Schahar, Esther, Polygraph field validity. Crime and Social Deviance, Year: 1978, Volume: 6, Page: 16-24, Comments2: [text in Hebrew],

Eminent stress researcher comments on the validity of the polygraph Security Letter, Year: 1976, Volume: 6, Page: 3,

FBI expert produces evidence of high accuracy of polygraph tests APA Newsletter, Year: 1986, Volume: 19, Page: 10,

Forman, Robert F., The reliability and validity of three polygraph techniques using the field practice model. Dissertation Abstracts International, Year: 1985, Volume: 45, Page: 3622, Comments2: (University Microfilm International No. 8503690),

Furedy, John J.; Heslegrave, Ronald J., Validity of the lie detector: A psychophysiological perspective. Criminal Justice and Behavior, Year: 1988, Volume: 15, Page: 219-246,

Garwood, Marcia, Two issues on the validity of personnel screening polygraph examinations. Polygraph, Year: 1985, Volume: 14, Page: 209-216,

Gerstein, Lawrence H.; Barke, Charles R.; Johnson, Stephen D., Internal validity studies of a telephone preemployment measure. Journal of Employment Counseling, Year: 1989, Volume: 26, Page: 77-83,

Gibbons, John H., Scientific validity of polygraph testing Testimony before the Subcommittee on Legislation and National Security, Committee on Government Operations, U.S. House of Representatives., Location: Washington, DC, Publisher: GPO,

Gibbons, John H., Scientific validity of polygraph testing. Polygraph, Year: 1983, Volume: 12, Page: 196-197,

Goodman, David J., An investigation into the relative effects of repeated tests upon the validity of diagnosis of results obtained in lie detector examinations, Year: 1952, Comments2: Paper written while a student at Bowling Green State University. Pages 10-11 are missing, probably contained tabular data which could not be copied at that time.,

Gugas, Chris, A scientifically accurate method of personnel screening. Police, Year: 1961, Page: 19-24,

Hawkins, Darren, Experts debate the accuracy of polygraph tests Daily Chronicle, Location: University of Utah, Page: 1,

Hikita, Yoshio; Suzuki, Akihiro, An experimental study on the reliability of the judgments between CQT technique and POT technique. National Research Institute of Police Science Bulletin (Polygraph Report), Year: 1965, Volume: 21, Page: 23-64,

Honts CR and S. Amato. 1999. "The automated polygraph examination." Contract No. 110224-1998-MO.

Honts, Charles R and Lawrence N. Driscoll. 1987. "An evaluation of the reliability and validity of rank order and standard numerical scoring of polygraph charts." Polygraph 241-255.

Honts, Charles R and Lawrence N. Driscoll. 1988. "A field validity study of the rank order scoring system (ROSS) in multiple issue control question tests." Polygraph 1-13

Honts, Charles R.; Driscoll, Lawrence N., A field validity study of rank order scoring system (ROSS) in multiple issue control question tests. Polygraph, Year: 1988, Volume: 17, Page: 1-16,

Honts, Charles R.; Driscoll, Lawrence N., An evaluation of the reliability and validity of rank order and standard numerical scoring of polygraph charts. Polygraph, Year: 1987, Volume: 16, Page: 241-257,

Honts, Charles R.; Raskin, David C., A field study of the validity of the directed lie control question. Journal of Police Science and Administration, Year: 1988, Volume: 16, Page: 56-61,

Honts, Charles R.; Raskin, David C.; Kircher, John C., Mental and physical countermeasures reduce the accuracy of polygraph tests Journal of Applied Psychology, Year: 1994, Volume: 79, Page: 252-259,

Horvath, Frank S., The accuracy and reliability of police polygraphic (lie detector) examiners' judgments of truth and deception: The effect of selected variables Dissertation Abstracts International, Year: 1974, Volume: 36, Page: 420,

Horvath, Frank S.; Reid, John E., The reliability of polygraph examiner diagnosis of truth and deception. Journal of Criminal Law, Criminology and Police Science, Year: 1971, Volume: 62, Page: 276-281, Comments2: Reprinted in Polygraph 11:1, p. 91-99.,

Hunter, Fred L., Polygraph reliability in identifying the truthful subject. APA Newsletter, Year: 1971, Volume: 6, Page: 11-23,

Hunter, Fred L.; Ash, Philip, The accuracy and consistency of polygraph examiners' diagnosis. Journal of Police Science and Administration, Year: 1973, Volume: 1, Page: 370-375,

Iacono, W. G.; Lykken, D. T., The validity of the lie detector: two surveys of scientific opinion Journal of Applied Psychology, Year: 1997, Volume: 82, Page: 426-433,

Iacono, William G., Can we determine the accuracy of polygraph tests?, Editor: Ackles, P. K.; Jennings, J. Richard; Coles, Michael G. H., Advances in psychophysiology, Location: Greenwich, CT, Publisher: JAI Press, Year: 1991, Volume: 4, Page: -,

Iacono, William G., Validity of polygraph examinations in the field setting. Psychophysiology, Year: 1984, Volume: 21, Page: 567, Comments2: [abstract],

Jayne, Brian C. 1989. A comparison between the predictive value of two common preemployment screening procedures. The Investigator 5(3)

Jones, Hugh E.; Salter, Sarah, Polygraph accuracy: An analog study Polygraph, Year: 1989, Volume: 18, Page: 69-74,

Kalbfleisch, Pamela J., Accuracy in deception detection: A quantitative review Dissertation Abstracts International, Year: 1985, Volume: 46, Page: 4453, Comments2: (University Microfilms No. AAD86-03433), 102 p.,

Kaye, D. H., The validity of tests: Caveant Omnes. Jurimetrics Journal, Year: 1987, Volume: 27, Page: 349-361,

Kenery, W. H., The psychological stress evaluator-The theory, validity, and legal status of an innovative lie detector. Indiana Law Journal, Year: 1980, Volume: 55, Page: 349-374,

Kradz, Michael P.; Bartone, John C., Investigative proof of the reliability and value of the psychological stress evaluator in science, medicine and law, Year: 1981, Comments2: Master of Arts thesis. Washington, DC: Abbe Publishing Association.,

Krapohl, Donald J, Donnie W. Dutton, and Andrew H. Ryan. 2001. The Rank Order Scoring System: Replication and extension with field data. Forthcoming Polygraph 2001.

Krapohl, Donald J, Kendall W. Shull, and Andrew H. Ryan. 2000. Does the confession criterion in case selection inflate polygraph validity estimates? Paper submitted to the Forensic Science Communications, November 8, 2000. Polygraph Institute, Ft. Jackson SC.

Lahri, S. K.; Ganguly, A. K., An experimental study of the accuracy of polygraph technique in diagnosis of deception with volunteer and criminal subjects. Polygraph, Year: 1978, Volume: 7, Page: 89-94,

Langer, Eleanor, Lie detectors: Sleuthing by polygraph increasingly popular: Claims of accuracy are unproved Science, Year: 1964, Volume: 144, Page: 395-397, 465-466,

Langley, L. L., The polygraph lie detector: Its physiological basis, reliability and admissibility Alabama Lawyer, Year: 1955, Volume: 16, Page: 209-224,

Link, Frederick C, Polygraph accuracy: are we asking the right questions? part II The Interrogation News, Year: 1984, Volume: 2,

Link, Frederick C, Polygraph accuracy: are we asking the right questions? The Interrogation News, Year: 1984, Volume: 1, Page: 4-5,

Lykken, David T., The validity of tests: Caveat emptor Jurimetrics Journal, Year: 1987, Volume: 27, Page: 263-270,

Lykken, David T., The validity of the guilty knowledge technique: The effects of faking Journal of Applied Psychology, Year: 1960, Volume: 44, Page: 258-262, Comments2: Reprinted in Polygraph, 7, 42-48.,

Manthorne, Dave; Perrier, David C., Reliability of polygraph chart interpretation based upon numerical evaluation, Year: 1981,

McCauley, Clark and Robert F. Forman. 1988. "A review of the Office of Technology Assessment report on polygraph validity." Basic and Applied Social Psychology 9(2): 73-84.

McCauley, Clark; Forman, Robert F., A review of the Office of Technology Assessment report on polygraph validity. Basic and Applied Social Psychology, Year: 1988, Volume: 9, Page: 73-84,

McDaniel, Michael A.; Jones, John W., Predicting employee theft: A quantitative review of the validity of a standardized measure of dishonesty. Journal of Business and Psychology, Year: 1988, Volume: 2, Page: 327-345,

New report gives high reliability and validity to the polygraph Security Letter, Year: 1976, Volume: 6, Page: 1-2,

Office of Technology Assessment, The validity of polygraph testing., Year: 1983, Page: -, Comments2: Preliminary draft. Printed in Polygraph, 12, 198-319.,

Patrick, C.J. and W.G. Iacono. 1991. "Validity of the control question polygraph test: The problem of sampling bias." Journal of Applied Psychology 76(2): 229-238.

Patrick, Christopher J., The validity of lie detection with criminal psychopaths, Location: Canada, Publisher: University of British Columbia, Year: 1987,

Patrick, Christopher J.; Iacono, William G., Psychopathy, threat and polygraph test accuracy Journal of Applied Psychology, Year: 1989, Volume: 74, Page: 347-355,

Patrick, Christopher J.; Iacono, William G., Validity of the control question polygraph test: The problem of sampling bias. Journal of Applied Psychology, Year: 1991, Volume: 76, Page: 229-238,

Polygraph validity found inconclusive Security, Year: 1987, Volume: 24, Page: 10,

Psychologists group says jury is still out on polygraph validity Law Enforcement News, Year: 1986, Volume: 11, Page: 3,

Quigley-Fernandez, Barbara; Tedeschi, James T., The bogus pipeline as lie detector: Two validity studies. Journal of Personality and Social Psychology, Year: 1978, Volume: 36, Page: 247-256,

Rafky, David M.; Sussman, Richard C., Polygraphic reliability and validity: Individual components and stress of issue in criminal tests. Journal of Police Science and Administration, Year: 1985, Volume: 13, Page: 283-294,

Rapp, John F., The comparative reliability of polygraph evidence. Review of Litigation, Year: 1985, Volume: 4, Page: 151-172,

Raskin, David C, Estimates of validity and the role of base rates in the interpretation of polygraph results Psychophysiology, Year: 1984, Volume: 21, Page: 567, Comments2: [abstract],

Raskin, David C, Methodological issues in estimating polygraph accuracy in field applications. Canadian Journal of Behavioural Science, Year: 1987, Volume: 19, Page: 389-404,

Raskin, David C.; Kircher, John C.; Honts, Charles R.; Horowitz, Steven W., Validity of control question polygraph tests in criminal investigation. Psychophysiology, Year: 1988, Volume: 25, Page: 476,

Report of the investigation of the validity of the psychological stress evaluator, Year: 1980, Page: -, Comments2: Prepared for the Commonwealth of Virginia Department of Commerce.,

Roman, Deborah D.; Tuley, Michael L.; Villaneuva, Michael R.; Mitchell, W. E., Evaluating MMPI validity in a forensic psychiatric population. Criminal Justice and Behavior, Year: 1990, Volume: 17, Page: 186-198,

Rovner, Louis I., Scientific review: 'An investigation of the accuracy and consistency of polygraph chart interpretation by inexperienced and experienced examiners' by Yankee, Powell & Newland. APA Newsletter, Year: 1986, Volume: 19, Page: 11-12,

Rovner, Louis I., Scientific review: 'Polygraph field validity' by Elaad & Shakhar. APA Newsletter, Year: 1986, Volume: 19, Page: 15-16,

Sackett, Paul R., Honesty testing for personnel selection: Assessing the validity of 'paper and pencil' test. Personnel Administrator, Year: 1985, Volume: 30, Page: 67-70, 72, 74, 76, 121,

Saxe, Leonard; Dougherty, Denise; Cross, Theodore P., The validity of polygraph testing: scientific analysis and public controversy American Psychologist, Year: 1985, Volume: 40, Page: 355-366,

Sevilla, Charles M., Reliability of polygraph examination. American Jurisprudence, Year: 1977, Volume: 14,

Shull, Kendall W.; Crowe, Michael J., Effects of two methods of comparing relevant and control questions on the accuracy of psychophysiological detection of deception, Location: Fort McClellan, AL, Publisher: Department of Defense Polygraph Institute, GovReport: DoDPI93-R-0002, Comments2: DTIC AD Number A304538,

Simpson, B. Allen, The polygraph: Concept, usage and validity. Psychology, Year: 1986, Volume: 23, Page: 42-45,

Slowik, Stanley M.; Buckley, Joseph P., III, Relative accuracy of polygraph examiner diagnosis of respiration, blood pressure and GSR recordings. Journal of Police Science and Administration, Year: 1975, Volume: 3, Page: 305-309,

Smith, R. Jeffrey, Polygraph tests: Dubious validity. Science, Year: 1984, Volume: 224, Page: 1217,

Steinbrook, Robert, The polygraph test - A flawed diagnostic method. The New England Journal of Medicine, Year: 1992, Volume: 327, Page: 122-123,

Stone, Donald H., Pre-employment inquiries: Drug testing, alcohol screening, physical exams, honesty testing, genetics screening-Do they discriminate? Akron Law Review, Year: 1991, Volume: 25, Page: 367,

Sussmann, Mario; Robertson, Donald U., The validity of validity: an analysis of validation study designs. Journal of Applied Psychology, Year: 1986, Volume: 71, Page: 461-468,

Suzuki, Akihiro, Theoretical inquiry on the polygraph test - Concerning the legal argument on the reliability critic. Kakeiken Shiryo (Police Science Data), Year: 1968, Volume: 49, Page: 24-28, Comments2: [text in Japanese],

Suzuki, Akihiro; Watanabe, Shoichi; Ohnishi, Kazuo; Matsuno, Katsunori; Arasuna, Masana, Polygraph examiners' judgements in chart interpretation: Reliability of judgement. Reports of the National Research Institute of Police Science, Year: 1973, Volume: 26, Page: 34-39, Comments2: [text in Japanese; English abstract],

Suzuki, Sadao, Theoretical considerations on the polygraph test: With regard to legal discussions as to its reliability., Comments2: Reference unknown. [English translation from Japanese] (Available from DoDPI),

Swets, John A., Measuring the accuracy of diagnostic systems. Science, Year: 1988, Volume: 240, Page: 1285-1293,

Tests measure polygraph accuracy. LEAA Newsletter, Year: 1977, Page: 214,

The Polygraph: Another report supports its reliability. Security Letter, Year: 1979, Volume: 9, Page: Part 1,4,

Validity and reliability of the polygraph. Assets Protection, Year: 1979, Volume: 4, Page: 34-36,

Wasyliw, Orest E.; Grossman, Linda; Haywood, Thomas W.; Cavanaugh, James L., Jr., The detection of malingering in criminal forensic groups: MMPI validity scales. Journal of Personality Assessment, Year: 1988, Volume: 52, Page: 321-333,

Widacki, Jan; Horvath, Frank S., An experimental investigation of the relative validity and utility of the polygraph technique and three other methods of criminal identification. Journal of Forensic Sciences, Year: 1978, Volume: 23, Page: 596-601, Comments2: Reprinted in Polygraph, 7, 215-221.,

Williams, Vergil L., Response to Cross and Saxe's 'A Critique of the Validity of Polygraph Testing in Child Sexual Abuse Cases'. Journal of Child Sexual Abuse, Year: 1995, Volume: 4, Page: 55-71,

Yamaoka, Kazunobu; Yamamura, Takehiko, Reliability and validity of polygraph technique. Keisato Gakuron Shuu [Reports of Police Science], Year: 1982, Volume: 36, Page: 134-150, Comments2: [text in Japanese],

Yankee, William J, James M. Powell, and Ross Newland. 1985. An investigation of the accuracy and consistency of polygraph chart interpretation by inexperienced and experienced examiners. Polygraph 14(2): 108-117.

Articles Reviewed-Medical

Abdel-Nabi HH, Doerr RJ. Multicenter clinical trials of monoclonal antibody B72.3-GYK-DTPA 111In (111In-CYT-103; OncoScint CR103) in patients with colorectal carcinoma. Targeted Diagn Ther 1992;6:73-88

Abrams BJ, Sukumvanich P, Seibel R, Moscati R, Jehle D. Ultrasound for the detection of intraperitoneal fluid: the role of Trendelenburg positioning. Am J Emerg Med 1999 Mar;17(2):117-20

Abril JC, Berjano P, Diaz A. Concordance between hip ultrasonography and hip arthrography in the assessment of developmental dysplasia of the hip. J Pediatr Orthop B 1999 Oct;8(4):264-7

Acampa W, Cuocolo A, Sullo P, Varrone A, Nicolai E, Pace L, Petretta M, Salvatore M. Direct comparison of technetium 99m-sestamibi and technetium 99m-tetrofosmin cardiac single photon emission computed tomography in patients with coronary artery disease. J Nucl Cardiol 1998 May-Jun;5(3):265-74

Achiron R, Goldenberg M, Lipitz S, Mashiach S, Oelsner G. Transvaginal Doppler sonography for detecting ectopic pregnancy: is it really necessary. Isr J Med Sci 1994 Nov;30(11):820-5

Adamek HE, Albert J, Breer H, Weitz M, Schilling D, Riemann JF. Pancreatic cancer detection with magnetic resonance cholangiopancreatography and endoscopic retrograde cholangiopancreatography: a prospective controlled study. Lancet 2000 Jul 15;356(9225):190-3

Adamek HE, Albert J, Weitz M, Breer H, Schilling D, Riemann JF. A prospective evaluation of magnetic resonance cholangiopancreatography in patients with suspected bile duct obstruction. Gut 1998 Nov;43(5):680-3

Aguirre GK, Zarahn E, D'Esposito M. The variability of human, BOLD hemodynamic responses. Neuroimage 1998 Nov;8(4):360-9

Ahn CY, DeBruhl ND, Gorczyca DP, Shaw WW, Bassett LW. Comparative silicone breast implant evaluation using mammography, sonography, and magnetic resonance imaging: experience with 59 implants. Plast Reconstr Surg 1994 Oct;94(5):620-7

Akahoshi K, Chijiiwa Y, Nakano I, Nawata H, Ogawa Y, Tanaka M, Nagai E, Tsuneyoshi M. Diagnosis and staging of pancreatic cancer by endoscopic ultrasound. Br J Radiol 1998 May;71(845):492-6

al-Abadi H. Fine needle aspiration biopsy vs. ultrasound-guided transrectal random core biopsy of the prostate. Comparative investigations in 246 cases. Acta Cytol 1997 Jul-Aug;41(4):981-6

Alam MS, Takeuchi R, Kasagi K, Misaki T, Miyamoto S, Iida Y, Hidaka A, Konishi J. Value of combined technetium-99m hydroxy methylene diphosphonate and thallium-201 imaging in detecting bone metastases from thyroid carcinoma. Thyroid 1997 Oct;7(5):705-12

Alasaarela E, Leppilahti J, Hakala M. Ultrasound and operative evaluation of arthritic shoulder joints. Ann Rheum Dis 1998 Jun;57(6):357-60

Alcaraz MJ, De la Morena EJ, Polo A, Ramos A, De la Cal MA, Gonzalez Mandly A. A comparative study of magnetic resonance cholangiography and direct cholangiography. Rev Esp Enferm Dig 2000 Jul;92(7):427-38

Allen MW, Hendi P, Bassett L, Phelps ME, Gambhir SS. A study on the cost effectiveness of sestamibi scintimammography for screening women with dense breasts for breast cancer. Breast Cancer Res Treat 1999 Jun;55(3):243-58

Alloatti S, Molino A, Bonfant G, Ratibondi S, Bosticardo GM. Measurement of vascular access recirculation unaffected by cardiopulmonary recirculation: evaluation of an ultrasound method. Nephron 1999 Jan;81(1):25-30

al-Nahhas AM, Jawad AS, Norman A, McCready VR. 99Tcm-MDP blood-pool phase in the assessment of repetitive strain injury. Nucl Med Commun 1997 Oct;18(10):927-31

Alonso JC, Soriano A, Zarca MA, Guerra P, Alcazar R, Molino C. Breast cancer detection with sestamibi-Tc-99m and Tl-201 radionuclides in patients with non conclusive mammography. Anticancer Res 1997 May-Jun;17(3B):1661-5

Amanullah AM, Kiat H, Friedman JD, Berman DS. Adenosine technetium-99m sestamibi myocardial perfusion SPECT in women: diagnostic efficacy in detection of coronary artery disease. J Am Coll Cardiol 1996 Mar 15;27(4):803-9

Amanullah AM, Aasa M. Significance of ST segment depression during adenosine-induced coronary hyperemia in angina pectoris and correlation with angiographic, scintigraphic, hemodynamic, and echocardiographic variables. Int J Cardiol 1995 Feb;48(2):167-76

Ambrosetti P, Jenny A, Becker C, Terrier TF, Morel P. Acute left colonic diverticulitis--compared performance of computed tomography and water-soluble contrast enema: prospective evaluation of 420 patients. Dis Colon Rectum 2000 Oct;43(10):1363-7

Anagnostopoulos C, Gunning MG, Davies G, Francis J, Underwood SR. Simultaneous biplane first-pass radionuclide ventriculography using 99Tcm-tetrofosmin: a comparison with magnetic resonance imaging. Nucl Med Commun 1998 May;19(5):435-41

Andren-Sandberg A, Lindberg CG, Lundstedt C, Ihse I. Computed tomography and laparoscopy in the assessment of the patient with pancreatic cancer. J Am Coll Surg 1998 Jan;186(1):35-40

Angus BJ, Smith MD, Suputtamongkol Y, Mattie H, Walsh AL, Wuthiekanun V, Chaowagul W, White NJ. Pharmacokinetic-pharmacodynamic evaluation of ceftazidime continuous infusion vs intermittent bolus injection in septicaemic melioidosis. Br J Clin Pharmacol 2000 May;49(5):445-52 Corrected and republished in: Br J Clin Pharmacol 2000 Aug;50(2):184-91

Anticancer Res 1998 May-Jun;18(3C):2159-61

Antinori A, De Rossi G, Ammassari A, Cingolani A, Murri R, Di Giuda D, De Luca A, Pierconti F, Tartaglione T, Scerrati M, Larocca LM, Ortona L. Value of combined approach with thallium-201 single-photon emission computed tomography and Epstein-Barr virus DNA polymerase chain reaction in CSF for the diagnosis of AIDS-related primary CNS lymphoma. J Clin Oncol 1999 Feb;17(2):554-60

Arbab AS, Koizumi K, Toyama K, Arai T, Yoshitomi T, Araki T. Detection of lung lesions and lymph nodes with 201Tl SPET. Nucl Med Commun 1998 May;19(5):411-6

Arbit E, Galicich JH. Importance of image-guided stereotactic biopsy to confirm diagnosis in an oncological setting. Ann Surg Oncol 1994 Sep;1(5):368-72

Arioglu E, Duncan-Morin J, Sebring N, Rother KI, Gottlieb N, Lieberman J, Herion D, Kleiner DE, Reynolds J, Premkumar A, Sumner AE, Hoofnagle J, Reitman ML, Taylor SI. Efficacy and safety of troglitazone in the treatment of lipodystrophy syndromes. Ann Intern Med 2000 Aug 15;133(4):263-74

Aro AR, de Koning HJ, Absetz P, Schreck M. Psychosocial predictors of first attendance for organised mammography screening. J Med Screen 1999;6(2):82-8

Asakura H, Nakai A, Ishikawa G, Suzuki S, Araki T. Prediction of uterine dehiscence by measuring lower uterine segment thickness prior to the onset of labor: evaluation by transvaginal ultrasonography. J Nippon Med Sch 2000 Oct;67(5):352-6

Ascher SM, Agrawal R, Bis KG, Brown ED, Maximovich A, Markham SM, Part RH, Semelka RC. Endometriosis: appearance and detection with conventional and contrast-enhanced fat-suppressed spin-echo techniques. J Magn Reson Imaging 1995 May-Jun;5(3):251-7

Asencio F, Aguilo J, Salvador JL, Villar A, De la Morena E, Ahamad M, Escrig J, Puche J, Viciano V, Sanmiguel G, Ruiz J. Video-laparoscopic staging of gastric cancer. A prospective multicenter comparison with noninvasive techniques. Surg Endosc 1997 Dec;11(12):1153-8

Ashkinazi IYa, Vershinina EA. Pain sensitivity in chronic psychoemotional stress in humans. Neurosci Behav Physiol 1999 May-Jun;29(3):333-7

Aversa A, Bonifacio V, Moretti C, Frajese G, Fabbri A. Re-dosing of prostaglandin-E1 versus prostaglandin-E1 plus phentolamine in male erectile dysfunction: a dynamic color power Doppler study. Int J Impot Res 2000 Feb;12(1):33-40

Avril N, Rose CA, Schelling M, Dose J, Kuhn W, Bense S, Weber W, Ziegler S, Graeff H, Schwaiger M. Breast imaging with positron emission tomography and fluorine-18 fluorodeoxyglucose: use and limitations. J Clin Oncol 2000 Oct 15;18(20):3495-502

Ayers M, Prince M, Ahmadi S, Baran DT. Reconciling quantitative ultrasound of the calcaneus with X-ray-based measurements of the central skeleton. J Bone Miner Res 2000 Sep;15(9):1850-5

Baba S, Akira K, Suzuki H, Imachi M. Use of nuclear magnetic resonance spectroscopy and selective C-labeling for pharmacokinetic research in man: detection of benzoic acid conversion to hippuric acid. Biol Pharm Bull 1995 May;18(5):643-7

Babaian RJ, Sayer J, Podoloff DA, Steelhammer LC, Bhadkamkar VA, Gulfo JV. Radioimmunoscintigraphy of pelvic lymph nodes with 111indium-labeled monoclonal antibody CYT-356. J Urol 1994 Dec;152(6 Pt 1):1952-5

Babikian V, Sloan MA, Tegeler CH, DeWitt LD, Fayad PB, Feldmann E, Gomez CR. Transcranial Doppler validation pilot study. J Neuroimaging 1993 Oct;3(4):242-9

Bain G, Bodley R, Jamous A, Williams S, Silver J. A comparison of the chest radiograph and computerised tomography in assessing lung changes in acute spinal injuries--an assessment of their prevalence and the accuracy of the chest X-ray compared with CT in their assessment. Paraplegia 1995 Mar;33(3):121-5

Baines CJ, McFarlane DV, Miller AB. Sensitivity and specificity of first screen mammography in 15 NBSS centres. Can Assoc Radiol J 1988 Dec;39(4):273-6

Baines CJ, Miller AB, Wall C, McFarlane DV, Simor IS, Jong R, Shapiro BJ, Audet L, Petitclerc M, Ouimet-Oliva D, et al. Sensitivity and specificity of first screen mammography in the Canadian National Breast Screening Study: a preliminary report from five centers. Radiology 1986 Aug;160(2):295-8

Baines CJ. The Canadian National Breast Screening Study: a perspective on criticisms. Ann Intern Med 1994 Feb 15;120(4):326-34

Baird AE, Austin MC, McKay WJ, Donnan GA. Sensitivity and specificity of 99mTc-HMPAO SPECT cerebral perfusion measurements during the first 48 hours for the localization of cerebral infarction. Stroke 1997 May;28(5):976-80

Bakkevold KE, Arnesjo B, Kambestad B. Carcinoma of the pancreas and papilla of Vater: presenting symptoms, signs, and diagnosis related to stage and tumour site. A prospective multicentre trial in 472 patients. Norwegian Pancreatic Cancer Trial. Scand J Gastroenterol 1992 Apr;27(4):317-25

Barber PA, Darby DG, Desmond PM, Gerraty RP, Yang Q, Li T, Jolley D, Donnan GA, Tress BM, Davis SM. Identification of major ischemic change. Diffusion-weighted imaging versus computed tomography. Stroke 1999 Oct;30(10):2059-65

Barber PA, Demchuk AM, Hudon ME, Pexman JH, Hill MD, Buchan AM. Hyperdense sylvian fissure MCA "dot" sign: A CT marker of acute ischemia. Stroke 2001 Jan;32(1):84-8

Barentsz JO, Berger-Hartog O, Witjes JA, Hulsbergen-van der Kaa C, Oosterhof GO, VanderLaak JA, Kondacki H, Ruijs SH. Evaluation of chemotherapy in advanced urinary bladder cancer with fast dynamic contrast-enhanced MR imaging. Radiology 1998 Jun;207(3):791-7

Baronciani D, Atti G, Andiloro F, Bartesaghi A, Gagliardi L, Passamonti C, Petrone M. Screening for developmental dysplasia of the hip: from theory to practice. Collaborative Group DDH Project. Pediatrics 1997 Feb;99(2):E5

Barron B, Hanna C, Passalaqua AM, Lamki L, Wegener WA, Goldenberg DM. Rapid diagnostic imaging of acute, nonclassic appendicitis by leukoscintigraphy with sulesomab, a technetium 99m-labeled antigranulocyte antibody Fab' fragment. LeukoScan Appendicitis Clinical Trial Group. Surgery 1999 Mar;125(3):288-96

Barronian AD, Zoltan JD, Bucon KA. Magnetic resonance imaging of the knee: correlation with arthroscopy. Arthroscopy 1989;5(3):187-91

Barry BP, Hall N, Cornford E, Broderick NJ, Somers JM, Rose DH. Improved ultrasound detection of renal scarring in children following urinary tract infection. Clin Radiol 1998 Oct;53(10):747-51

Bartolozzi C, Donati F, Cioni D, Crocetti L, Lencioni R. MnDPDP-enhanced MRI vs dual-phase spiral CT in the detection of hepatocellular carcinoma in cirrhosis. Eur Radiol 2000;10(11):1697-702

Ba-Ssalamah A, Heinz-Peer G, Schima W, Schibany N, Schick S, Prokesch RW, Kaider A, Teleky B, Wrba F, Lechner G. Detection of focal hepatic lesions: comparison of unenhanced and SHU 555 A-enhanced MR imaging versus biphasic helical CTAP. J Magn Reson Imaging 2000 Jun;11(6):665-72

Bateman TM, Gray RJ, Whiting JS, Sethna DH, Berman DS, Matloff JM, Swan HJ, Forrester JS. Prospective evaluation of ultrafast cardiac computed tomography for determination of coronary bypass graft patency. Circulation 1987 May;75(5):1018-24

Baum RA, Rutter CM, Sunshine JH, Blebea JS, Blebea J, Carpenter JP, Dickey KW, Quinn SF, Gomes AS, Grist TM, et al. Multicenter trial to evaluate vascular magnetic resonance angiography of the lower extremity. American College of Radiology Rapid Technology Assessment Group. JAMA 1995 Sep 20;274(11):875-80

Baxter GM, Kincaid W, Jeffrey RF, Millar GM, Porteous C, Morley P. Comparison of colour Doppler ultrasound with venography in the diagnosis of axillary and subclavian vein thrombosis. Br J Radiol 1991 Sep;64(765):777-81

Bazzoli F, Zagari M, Pozzato P, Varoli O, Fossi S, Ricciardiello L, Alampi G, Nicolini G, Sottili S, Simoni P, Roda A, Roda E. Evaluation of short-term low-dose triple therapy for the eradication of Helicobacter pylori by factorial design in a randomized, double-blind, controlled study. Aliment Pharmacol Ther 1998 May;12(5):439-45

Beaulieu CF, Jeffrey RB Jr, Karadi C, Paik DS, Napel S. Display modes for CT colonography. Part II. Blinded comparison of axial CT and virtual endoscopic and panoramic endoscopic volume-rendered studies. Radiology 1999 Jul;212(1):203-12

Beck JG, Stanley MA, Baldwin LE, Deagle EA 3rd, Averill PM. Comparison of cognitive therapy and relaxation training for panic disorder. J Consult Clin Psychol 1994 Aug;62(4):818-26

Beck RW, Kupersmith MJ, Cleary PA, Katz B. Fellow eye abnormalities in acute unilateral optic neuritis. Experience of the optic neuritis treatment trial. Ophthalmology 1993 May;100(5):691-7; discussion 697-8

Beets-Tan RG, Beets GL, Vliegen RF, Kessels AG, Van Boven H, De Bruine A, von Meyenfeldt MF, Baeten CG, van Engelshoven JM. Accuracy of magnetic resonance imaging in prediction of tumour-free resection margin in rectal cancer surgery. Lancet 2001 Feb 17;357(9255):497-504

Bellin MF, Roy C, Kinkel K, Thoumas D, Zaim S, Vanel D, Tuchmann C, Richard F, Jacqmin D, Delcourt A, Challier E, Lebret T, Cluzel P. Lymph node metastases: safety and effectiveness of MR imaging with ultrasmall superparamagnetic iron oxide particles—initial clinical experience. Radiology 1998 Jun;207(3):799-808

Belsky M, Gaitini D, Goldsher D, Hoffman A, Daitzchman M. Color-coded duplex ultrasound compared to CT angiography for detection and quantification of carotid artery stenosis. Eur J Ultrasound 2000 Sep;12(1):49-60

Benedetto C, Valensise H, Marozio L, Giarola M, Massobrio M, Romanini C. A two-stage screening test for pregnancy-induced hypertension and preeclampsia. Obstet Gynecol 1998 Dec;92(6):1005-11

Benfeldt E. In vivo microdialysis for the investigation of drug levels in the dermis and the effect of barrier perturbation on cutaneous drug penetration. Studies in hairless rats and human subjects. Acta Derm Venereol Suppl (Stockh) 1999;206:1-59

Bentin S, Deouell LY, Soroker N. Selective visual streaming in face recognition: evidence from developmental prosopagnosia. Neuroreport 1999 Mar 17;10(4):823-7

Berghammer P, Obwegeser R, Mulauer-Ertl S, Karanikas G, Wiltschke C, Kubista E, Sinzinger H, Zielinski C. 99m-Tc-tetrofosmin scintigraphy and breast cancer. Gynecol Oncol 1999 Apr;73(1):87-90

Bergman H, Chertkow H, C, Stern J, Rush C, Whitehead V, Dixon R. HM-PAO (CERETEC) SPECT brain scanning in the diagnosis of Alzheimer's disease. J Am Geriatr Soc 1997 Jan;45(1):15-20

Berland LL, Koslin DB, Routh WD, Keller FS. Renal artery stenosis: prospective evaluation of diagnosis with color duplex US compared with angiography. Work in progress. Radiology 1990 Feb;174(2):421-3

Berlangieri SU, Scott AM, Knight SR, Fitt GJ, Hennessy OF, Tochon-Danguy HJ, Clarke CP, McKay WJ. F-18 fluorodeoxyglucose positron emission tomography in the non-invasive staging of non-small cell lung cancer. Eur J Cardiothorac Surg 1999 Sep;16 Suppl 1:S25-30

Berman DS, Kiat H, Friedman JD, Wang FP, van Train K, Matzer L, Maddahi J, Germano G. Separate acquisition rest thallium-201/stress technetium-99m sestamibi dual-isotope myocardial perfusion single-photon emission computed tomography: a clinical validation study. J Am Coll Cardiol 1993 Nov 1;22(5):1455-64

Berna L, Chico A, Matias-Guiu X, Mato E, Catafau A, Alonso C, Mora J, Mauricio D, Rodriguez-Espinosa J, Man C, Flotats A, Martin JC, Estorch M, Carrio I. Use of somatostatin analogue scintigraphy in the localization of recurrent medullary thyroid carcinoma. Eur J Nucl Med 1998 Nov;25(11):1482-8

Berrouschot J, Barthel H, von Kummer R, Knapp WH, Hesse S, Schneider D. 99m technetium-ethyl-cysteinate-dimer single-photon emission CT can predict fatal ischemic brain edema. Stroke 1998 Dec;29(12):2556-62

Bettoni L, Bortone E, Dascola I, Delsoldato S, Giorgi C, Mancia D. The masseteric inhibitory reflex in the diagnosis of multiple sclerosis. Electromyogr Clin Neurophysiol 1998 Jan-Feb;38(1):11-7

Binder T, Assayag P, Baer F, Flachskampf F, Kamp O, Nienaber C, Nihoyannopoulos P, Pierard L, Steg G, Vanoverschelde JL, Van der Wouw P, Meland N, Marelli C, Lindvall K. NC100100, a new echo contrast agent for the assessment of myocardial perfusion-safety and comparison with technetium-99m sestamibi single-photon emission computed tomography in a randomized multicenter study. Clin Cardiol 1999 Apr;22(4):273-82

Birney TJ, White JJ Jr, Berens D, Kuhn G. Comparison of MRI and discography in the diagnosis of lumbar degenerative disc disease. J Spinal Disord 1992 Dec;5(4):417-23

Bizollon T, Rode A, Bancel B, Gueripel V, Ducerf C, Baulieux J, Trepo C. Diagnostic value and tolerance of Lipiodol-computed tomography for the detection of small hepatocellular carcinoma: correlation with pathologic examination of explanted livers. J Hepatol 1998 Mar;28(3):491-6

Blachere H, Latrabe V, Montaudon M, Valli N, Couffinhal T, Raherisson C, Leccia F, Laurent F. Pulmonary embolism revealed on helical CT angiography: comparison with ventilation-perfusion radionuclide lung scanning. AJR Am J Roentgenol 2000 Apr;174(4):1041-7

Blaivas M, Lambert MJ, Harwood RA, Wood JP, Konicki J. Lower-extremity Doppler for deep venous thrombosis—can emergency physicians be accurate and fast? Acad Emerg Med 2000 Feb;7(2):120-6

Blanchard TK, Bearcroft PW, Constant CR, Griffin DR, Dixon AK. Diagnostic and therapeutic impact of MRI and arthrography in the investigation of full-thickness rotator cuff tears. Eur Radiol 1999;9(4):638-42

Blankenberg FG, Loh NN, Bracci P, D'Arceuil HE, Rhine WD, Norbash AM, Lane B, Berg A, Person B, Coutant M, Enzmann DR. Sonography, CT, and MR imaging: a prospective comparison of neonates with suspected intracranial ischemia and hemorrhage. AJNR Am J Neuroradiol 2000 Jan;21(1):213-8

Blend MJ, Hyun H, Kozloff M, Levi H, Mills GQ, Gasparini M, Buraggi G, Hughes L, Pinsky CM, Goldenberg DM. Improved staging of B-cell non-Hodgkin's lymphoma patients with 99mTc-labeled LL2 monoclonal antibody fragment. Cancer Res 1995 Dec 1;55(23 Suppl):5764s-5770s

Blomqvist L, Machado M, Rubio C, Gabrielsson N, Granqvist S, Goldman S, Holm T. Rectal tumour staging: MR imaging using pelvic phased-array and endorectal coils vs endoscopic ultrasonography. Eur Radiol 2000;10(4):653-60

Blomqvist L, Ohlsen H, Hindmarsh T, Jonsson E, Holm T. Local recurrence of rectal cancer: MR imaging before and after oral superparamagnetic particles vs contrast-enhanced computed tomography. Eur Radiol 2000;10(9):1383-9

Blomqvist L, Rubio C, Holm T, Machado M, Hindmarsh T. Rectal adenocarcinoma: assessment of tumour involvement of the lateral resection margin by MRI of resected specimen. Br J Radiol 1999 Jan;72(853):18-23

Bluemke DA, Stillman AE, Bis KG, Grist TM, Baum RA, D'Agostino R, Maiden ES, Pierro JA, Yucel EK. Carotid MR angiography: phase II study of safety and efficacy for MS-325. Radiology 2001 Apr;219(1):114-22

Blum JE, Handmaker H, Rinne NA. The utility of a somatostatin-type receptor binding peptide radiopharmaceutical (P829) in the evaluation of solitary pulmonary nodules. Chest 1999 Jan;115(1):224-32

Boam WD, Miser WF, Yuill SC, Delaplain CB, Gayle EL, MacDonald DC. Comparison of ultrasound examination with bone scintiscan in the diagnosis of stress fractures. J Am Board Fam Pract 1996 Nov-Dec;9(6):414-7

Boardman LA, Peipert JF, Brody JM, Cooper AS, Sung J. Endovaginal sonography for the diagnosis of upper genital tract infection. Obstet Gynecol 1997 Jul;90(1):54-7

Bogers HA, Sedelaar JP, Beerlage HP, de la Rosette JJ, Debruyne FM, Wijkstra H, Aarnink RG. Contrast-enhanced three-dimensional power Doppler angiography of the human prostate: correlation with biopsy outcome. Urology 1999 Jul;54(1):97-104

Boilleau G, Pujol JL, Ychou M, Faurous P, Marty-Ane C, Michel FB, Godard P. Detection of lymph node metastases in lung cancer: comparison of 131I-anti-CEA-anti-CA 19-9 immunoscintigraphy versus computed tomography. Lung Cancer 1994 Sep;11(3-4):209-19

Bokura H, Kobayashi S, Yamaguchi S. Distinguishing silent lacunar infarction from enlarged Virchow-Robin spaces: a magnetic resonance imaging and pathological study. J Neurol 1998 Feb;245(2):116-22

Bom HS, Song HC, Kim JY, Choi SK, Rew JS, Yoon CM. Differentiation of malignant from benign pancreatic mass by Tl-201 abdominal SPECT. J Korean Med Sci 1995 Apr;10(2):93-6

Bongers V, Bakker J, Beutler JJ, Beek FJ, De Klerk JM. Assessment of renal artery stenosis: comparison of captopril renography and gadolinium-enhanced breath-hold MR angiography. Clin Radiol 2000 May;55(5):346-53

Bonvalot S, Bouvard N, Lothaire P, Maurel J, Galateau F, Segol P, Gignoux M. Contribution of cervical ultrasound and ultrasound fine-needle aspiration biopsy to the staging of thoracic oesophageal carcinoma. Eur J Cancer 1996 May;32A(5):893-5

Boonbaichaiyapruck S, Panpunnang S, Siripornpitak S, Nitjapanich S, Sritara C, Sritara P. Utilization of electron beam CT scan in diagnosis of pulmonary embolism. J Med Assoc Thai 1997 Aug;80(8):527-33

Boraschi P, Braccini G, Grassi L, Campatelli A, Di Vito A, Mosca F, Perri G. Incidentally discovered adrenal masses: evaluation with gadolinium enhancement and fat-suppressed MR imaging at 0.5 T. Eur J Radiol 1997 May;24(3):245-52

Boroojerdi B, Hungs M, Mull M, Topper R, Noth J. Interhemispheric inhibition in patients with multiple sclerosis. Electroencephalogr Clin Neurophysiol 1998 Jun;109(3):230-7

Borzellino G, De Manzoni G, Ricci F. Detection of abdominal adhesions in laparoscopic surgery. A controlled study of 130 cases. Surg Laparosc Endosc 1998 Aug;8(4):273-6

Bosio M. Cystosonography with echocontrast: a new imaging modality to detect vesicoureteric reflux in children. Pediatr Radiol 1998 Apr;28(4):250-5

Bozkurt T, Richter F, Lux G. Ultrasonography as a primary diagnostic tool in patients with inflammatory disease and tumors of the small intestine and large bowel. J Clin Ultrasound 1994 Feb;22(2):85-91

Braccini G, Lamacchia M, Boraschi P, Bertellotti L, Marrucci A, Goletti O, Perri G. Ultrasound versus plain film in the detection of pneumoperitoneum. Abdom Imaging 1996 Sep-Oct;21(5):404-12

Bradley M, Bladon J, Barker H. D-dimer assay for deep vein thrombosis: its role with colour Doppler sonography. Clin Radiol 2000 Jul;55(7):525-7

Brady PG, Goldschmid S, Chappel G, Slone FL, Boyd WP. A comparison of biopsy techniques in suspected focal liver disease. Gastrointest Endosc 1987 Aug;33(4):289-92

Brambati B, Cislaghi C, Tului L, Alberti E, Amidani M, Colombo U, Zuliani G. First-trimester Down's syndrome screening using nuchal translucency: a prospective study in patients undergoing chorionic villus sampling. Ultrasound Obstet Gynecol 1995 Jan;5(1):9-14

Brammer MJ. Multidimensional wavelet analysis of functional magnetic resonance images. Hum Brain Mapp 1998;6(5-6):378-82

Brand E, Schorr U, Ringel J, Beige J, Distler A, Sharma AM. Aldosterone synthase gene (CYP11B2) C-344T polymorphism in Caucasians from the Berlin Salt-Sensitivity Trial (BeSST).

Brandt-Mainz K, Müller SP, Gorges R, Sailer B, Bockisch A. The value of fluorine-18 fluorodeoxyglucose PET in patients with medullary thyroid cancer. Eur J Nucl Med 2000 May;27(5):490-6

Brant-Zawadzki MN, Jensen MC, Obuchowski N, Ross JS, Modic MT. Interobserver and intraobserver variability in interpretation of lumbar disc abnormalities. A comparison of two nomenclatures. Spine 1995 Jun 1;20(11):1257-63

Brasel KJ, Borgstrom DC, Weigelt JA. Cost-effective prevention of pulmonary embolus in high-risk trauma patients. J Trauma 1997 Mar;42(3):456-60; discussion 460-2

Brass LM, Walovitch RC, Joseph JL, Leveille J, Marchand L, Hellman RS, Tikofsky RS, Masdeu JC, Hall KM, Van Heertum RL. The role of single photon emission computed tomography brain imaging with 99mTc-bicisate in the localization and definition of mechanism of ischemic stroke. J Cereb Blood Flow Metab 1994 Jan;14 Suppl 1:S91-8

Brawer MK, Ploch NR, Bigler SA. Prostate cancer tumor location as predicted by digital rectal examination transferred to ultrasound and ultrasound-guided prostate needle biopsy. J Cell Biochem Suppl 1992;16H:74-7

Brittenden J, Watmough D, Heys SD, Eremin O. Preliminary clinical evaluation of a combined optical Doppler ultrasound instrument for the detection of breast cancer. Br J Radiol 1995 Dec;68(816):1344-8

Britton PD, Coulden RA. The use of duplex Doppler ultrasound in the diagnosis of breast cancer. Clin Radiol 1990 Dec;42(6):399-401

Bromley B, Lieberman E, Laboda L, Benacerraf BR. Echogenic intracardiac focus: a sonographic sign for fetal Down syndrome. Obstet Gynecol 1995 Dec;86(6):998-1001

Brossner C, Madersbacher S, Bayer G, Pycha A, Klingler HC, Maier U. Comparative study of two different TRUS-guided sextant biopsy techniques in detecting prostate cancer in one biopsy session. Eur Urol 2000 Jan;37(1):65-71

Brown J, Buckley D, Coulthard A, Dixon AK, Dixon JM, Easton DF, Eeles RA, Evans DG, Gilbert FG, Graves M, Hayes C, Jenkins JP, Jones AP, Keevil SF, Leach MO, Liney GP, Moss SM, Padhani AR, Parker GJ, Pointon LJ, Ponder BA, Redpath TW, Sloane JP, Turnbull LW, Walker LG, Warren RM. Magnetic resonance imaging screening in women at genetic risk of breast cancer: imaging and analysis protocol for the UK multicentre study. UK MRI Breast Screening Study Advisory Group. Magn Reson Imaging 2000 Sep;18(7):765-76

Brugge WR, Lee MJ, RW, Mathisen DJ. Endoscopic ultrasound staging criteria for esophageal cancer. Gastrointest Endosc 1997 Feb;45(2):147-52

Buadu LD, Murakami J, Murayama S, Hashiguchi N, Toyoshima S, Sakai S, Yabuuchi H, Masuda K, Kuroki S, Ohno S. Colour Doppler sonography of breast masses: a multiparameter analysis. Clin Radiol 1997 Dec;52(12):917-23

Buccheri G, Biggi A, Ferrigno D, D'Angeli B, Vassallo G, Leone A, Taviani M, Comino A. Imaging lung cancer by scintigraphy with Indium 111-labeled F(ab')2 fragments of the anticarcinoembryonic antigen monoclonal antibody F023C5. Cancer 1992 Aug 15;70(4):749-59

Buchmann I, Reinhardt M, Eisner K, Bunjes D, Altehoefer C, Finke J, Moser E, Glatting G, Kotzerke J, Guhlmann CA, Schirrmeister H, Reske SN. 2-(fluorine-18)fluoro-2-deoxy-D-glucose positron emission tomography in the detection and staging of malignant lymphoma. A bicenter trial. Cancer 2001 Mar 1;91(5):889-99

Buchpiguel CA, Mathias SC, Itaya LY, Barros NG, Portela LA, Freitas JM, Caramelli P, Carrilho PE, Bacheschi LA, Hironaka FH, Nitrini R. Brain SPECT in dementia. A clinical-scintigraphic correlation. Arq Neuropsiquiatr 1996 Sep;54(3):375-83

Budoff MJ, Georgiou D, Brody A, Agatston AS, Kennedy J, Wolfkiel C, Stanford W, Shields P, Lewis RJ, Janowitz WR, Rich S, Brundage BH. Ultrafast computed tomography as a diagnostic modality in the detection of coronary artery disease: a multicenter study. Circulation 1996 Mar 1;93(5):898-904

Buraggi GL, Gasparini M, Seregni E, Bombardieri E, Regalia E, Maffioli L. Pilot, multicenter and prospective trials with an anti-CEA antibody. Int J Biol Markers 1992 Jul-Sep;7(3):189-92

Buraggi GL, Gasparini M, Seregni E. Immunoscintigraphy of colorectal carcinoma with an anti-CEA monoclonal antibody: a critical review. Int J Rad Appl Instrum B 1991;18(1):45-50

Burak Z, Argon M, Memis A, Erdem S, Balkan Z, Duman Y, Ustun EE, Erhan Y, Ozkilic H. Evaluation of palpable breast masses with 99Tcm-MIBI: a comparative study with mammography and ultrasonography. Nucl Med Commun 1994 Aug;15(8):604-12

Burk DL Jr, Karasick D, Kurtz AB, Mitchell DG, Rifkin MD, Miller CL, Levy DW, Fenlin JM, Bartolozzi AR. Rotator cuff tears: prospective comparison of MR imaging with arthrography, sonography, and surgery. AJR Am J Roentgenol 1989 Jul;153(1):87-92

Bury T, Corhay JL, Duysinx B, Daenen F, Ghaye B, Barthelemy N, Rigo P, Bartsch P. Value of FDG-PET in detecting residual or recurrent nonsmall cell lung cancer. Eur Respir J 1999 Dec;14(6):1376-80

Buscail L, Escourrou J, Moreau J, Delvaux M, D, Lapeyre F, Tregant P, Frexinos J. Endoscopic ultrasonography in chronic pancreatitis: a comparative prospective study with conventional ultrasonography, computed tomography, and ERCP. Pancreas 1995 Apr;10(3):251-7

Buscarini L, Fornari F, Bolondi L, Colombo P, Livraghi T, Magnolfi F, Rapaccini GL, Salmi A. Ultrasound-guided fine-needle biopsy of focal liver lesions: techniques, diagnostic accuracy and complications. A retrospective study on 2091 biopsies. J Hepatol 1990 Nov;11(3):344-8

Bussar-Maatz R, Weissbach L. Retroperitoneal lymph node staging of testicular tumours. TNM Study Group. Br J Urol 1993 Aug;72(2):234-40

CA, Bartolone C, Warner DL, Aizenstein R, Hibblen J, Yaghmai B, Wiley TE, Layden TJ. The inaccuracy of duplex ultrasonography in predicting patency of transjugular intrahepatic portosystemic shunts. Gastroenterology 1998 May;114(5):975-80

Cain MP, Garra B, Gibbons MD. Scrotal-inguinal ultrasonography: a technique for identifying the nonpalpable inguinal testis without laparoscopy. J Urol 1996 Aug;156(2 Pt 2):791-4

Boegard T, Rudling O, Dahlstrom J, Dirksen H, Petersson IF, Jonsson K. Bone scintigraphy in chronic knee pain: comparison with magnetic resonance imaging. Ann Rheum Dis 1999 Jan;58(1):20-6

Calvo MM, Calderon A, Heras I, Duran M, Orive V, Cabriada J, Astigarraga E. Magnetic resonance study of the pancreatic duct. Rev Esp Enferm Dig 1999 Apr;91(4):287-96

Candenas M, Villa R, Fernandez Collar R, Moina MJ, Pintado S, Garcia Saez F, Alvarez FV. Maternal serum alpha-fetoprotein screening for neural tube defects. Report of a program with more than 30000 screened pregnancies. Acta Obstet Gynecol Scand 1995 Apr;74(4):266-9

Carmeci C, Jeffrey RB, McDougall IR, Nowels KW, Weigel RJ. Ultrasound-guided fine-needle aspiration biopsy of thyroid masses. Thyroid 1998 Apr;8(4):283-9

Carter R, Hemingway D, Cooke TG, Pickard R, Poon FW, McKillop JA, McArdle CS. A prospective study of six methods for detection of hepatic colorectal metastases. Ann R Coll Surg Engl 1996 Jan;78(1):27-30

Castellani M, Reschini E, Longari V, Paracchi A, Corbetta S, Marotta G, Gerundini P. Role of Tc-99m sestamibi scintigraphy in the diagnosis and surgical decision-making process in primary hyperparathyroid disease. Clin Nucl Med 2001 Feb;26(2):139-44

Catalano MF, Lahoti S, Geenen JE, Hogan WJ. Prospective evaluation of endoscopic ultrasonography, endoscopic retrograde pancreatography, and secretin test in the diagnosis of chronic pancreatitis. Gastrointest Endosc 1998 Jul;48(1):11-7

Catalona WJ, Richie JP, deKernion JB, Ahmann FR, Ratliff TL, Dalkin BL, Kavoussi LR, MacFarlane MT, Southwick PC. Comparison of prostate specific antigen concentration versus prostate specific antigen density in the early detection of prostate cancer: receiver operating characteristic curves. J Urol 1994 Dec;152(6 Pt 1):2031-6

Catalona WJ, Southwick PC, Slawin KM, Partin AW, Brawer MK, Flanigan RC, Patel A, Richie JP, Walsh PC, Scardino PT, Lange PH, Gasior GH, Loveland KG, Bray KR. Comparison of percent free PSA, PSA density, and age-specific PSA cutoffs for prostate cancer detection and staging. Urology 2000 Aug 1;56(2):255-60

Cazzato S, Zompatori M, Burzi M, Baruzzi G, Falcone F, Poletti V. Bronchoalveolar lavage and transbronchial lung biopsy in alveolar and/or ground-glass opacification. Monaldi Arch Chest Dis 1999 Apr;54(2):115-9

Celani MG, Righetti E, Migliacci R, Zampolini M, Antoniutti L, Grandi FC, Ricci S. Comparability and validity of two clinical scores in the early differential diagnosis of acute stroke. BMJ 1994 Jun 25;308(6945):1674-6

Champault G. The use of laparoscopic ultrasound in the assessment of pancreatic cancer. Wiad Lek 1997;50 Suppl 1 Pt 1:195-203

Chan YL, Leung CB, Yu SC, Yeung DK, Li PK. Comparison of non-breath-hold high resolution gadolinium-enhanced MRA with digital subtraction angiography in the evaluation on allograft renal artery stenosis. Clin Radiol 2001 Feb;56(2):127-32

Chang JJ, Shinohara K, Bhargava V, Presti JC Jr. Prospective evaluation of lateral biopsies of the peripheral zone for prostate cancer detection. J Urol 1998 Dec;160(6 Pt 1):2111-4

Chao CL, Lee YT, Chieng PU, Ho FM, Shiau YC, Shen SJ, Su CT, Huang PJ. Diagnosis of coronary artery disease using dipyridamole thallium-201 imaging. J Formos Med Assoc 1994 Nov-Dec;93(11-12):906-10

Chapple KS, Spencer JA, Windsor AC, Wilson D, Ward J, Ambrose NS. Prognostic value of magnetic resonance imaging in the management of fistula-in-ano. Dis Colon Rectum 2000 Apr;43(4):511-6

Chetanneau A, Baum RP, Lehur PA, Liehn JC, Perkins AC, Bares R, Bourguet P, Herry JY, Saccavini JC, Chatal JF. Multi-centre immunoscintigraphic study using indium-111-labelled CEA-specific and/or 19-9 monoclonal antibody F(ab')2 fragments. Eur J Nucl Med 1990;17(5):223-9

Chiti A, Fanti S, Savelli G, Romeo A, Bellanova B, Rodari M, van Graafeiland BJ, Monetti N, Bombardieri E. Comparison of somatostatin receptor imaging, computed tomography and ultrasound in the clinical management of neuroendocrine gastro-entero-pancreatic tumours. Eur J Nucl Med 1998 Oct;25(10):1396-403

Chiti A, van Graafeiland BJ, Savelli G, Ferrari L, Seregni E, Castellani MR, Bombardieri E. Imaging of neuroendocrine gastro-entero-pancreatic tumours using radiolabelled somatostatin analogues. Ital J Gastroenterol Hepatol 1999 Oct;31 Suppl 2:S190-4

Cho JY, Kim SH, Lee SE. Peripheral hypoechoic lesions of the prostate: evaluation with color and power Doppler ultrasound. Eur Urol 2000 Apr;37(4):443-8

Chui DW, Gooding GA, McQuaid KR, Griswold V, Grendell JH. Hydrocolonic ultrasonography in the detection of colonic polyps and tumors. N Engl J Med 1994 Dec 22;331(25):1685-8

Chung HW, Kim YH, Hong SH, Kim SS, Chung JK, Seong SC, Kang HS. Indirect signs of anterior cruciate ligament injury on SPET: comparison with MRI and arthroscopy.

Chung MH, Lee HG, Kwon SS, Park SH. MR imaging of solitary pulmonary lesion: emphasis on tuberculomas and comparison with tumors. J Magn Reson Imaging 2000 Jun;11(6):629-3

Cicinelli E, Romano F, Anastasio PS, Blasi N, Parisi C, Galantino P. Transabdominal sonohysterography, transvaginal sonography, and hysteroscopy in the evaluation of submucous myomas. Obstet Gynecol 1995 Jan;85(1):42-7

Circulation 1997 Aug 5;96(3):785-92

Cohen MS. Parametric analysis of fMRI data using linear systems methods. Neuroimage 1997 Aug;6(2):93-1

Cole FH Jr, Thomas JE, Wilcox AB, Haiford HH 3rd. Cerebral imaging in the asymptomatic preoperative bronchogenic carcinoma patient: is it worthwhile? Ann Thorac Surg 1994 Apr;57(4):838-40

Collier BD, Abdel-Nabi H, Doerr RJ, Harwood SJ, Olsen J, Kaplan EH, Winzelberg GG, Grossman SJ, Krag DN, Mitchell EP. Immunoscintigraphy performed with In-111-labeled CYT-103 in the management of colorectal cancer: comparison with CT. Radiology 1992 Oct;185(1):179-86

Colosimo C, Ruscalleda J, Korves M, La Ferla R, Wool C, Pianezzola P, Kirchin MA. Detection of intracranial metastases: a multicenter, intrapatient comparison of gadobenate dimeglumine-enhanced MRI with routinely used contrast agents at equal dosage. Invest Radiol 2001 Feb;36(2):72-81

Coltman CA Jr, Thompson IM Jr, Feigl P. Prostate Cancer Prevention Trial (PCPT) update. Eur Urol 1999;35(5-6):544-7

Convit A, de Asis J, de Leon MJ, Tarshish CY, De Santi S, Rusinek H. Atrophy of the medial occipitotemporal, inferior, and middle temporal gyri in non-demented elderly predict decline to Alzheimer's disease. Neurobiol Aging 2000 Jan-Feb;21(1):19-26

Cookson MS, Floyd MK, Ball TP Jr, Miller EK, Sarosdy MF. The lack of predictive value of prostate specific antigen density in the detection of prostate cancer in patients with normal rectal examinations and intermediate prostate specific antigen levels. J Urol 1995 Sep;154(3):1070-3

Cornel JH, Bax JJ, Fioretti PM, Visser FC, Maat AP, Boersma E, van Lingen A, Elhendy A, Roelandt JR. Prediction of improvement of ventricular function after revascularization. 18F-fluorodeoxyglucose single-photon emission computed tomography vs low-dose dobutamine echocardiography. Eur Heart J 1997 Jun;18(6):941-8

Cornud F, Belin X, Piron D, Chretien Y, Flam T, Casanova JM, Helenon O, Mejean A, Thiounn N, Moreau JF. Color Doppler-guided prostate biopsies in 591 patients with an elevated serum PSA level: impact on Gleason score for nonpalpable lesions. Urology 1997 May;49(5):709-15

Cosnard G, Duprez T, Grandin C, Smith AM, Munier T, Peeters A. Fast FLAIR sequence for detecting major vascular abnormalities during the hyperacute phase of stroke: a comparison with MR angiography. Neuroradiology 1999 May;41(5):342-6

Cot M, Le Hesran JY, Miailhes P, Esveld M, Etya'ale D, Breart G. Increase of birth weight following chloroquine chemoprophylaxis during the first pregnancy: results of a randomized trial in Cameroon. Am J Trop Med Hyg 1995 Dec;53(6):581-5

Coubes P, Awad IA, Antar M, Magdinec M, Sufka B. Comparison and spacial correlation of interictal HMPAO-SPECT and FDG-PET in intractable temporal lobe epilepsy. Neurol Res 1993 Jun;15(3):160-8

Cramer MJ, Verzijlbergen JF, Niemeyer MG, van der Wall EE, Zwinderman AH, Ascoop CA, Pauwels EK. 99Tcm-sestamibi SPECT with combined dipyridamole and exercise stress in coronary artery disease. Nucl Med Commun 1994 Jul;15(7):554-9

Crane JM, Van den Hof M, Armson BA, Liston R. Transvaginal ultrasound in the prediction of preterm delivery: singleton and twin gestations. Obstet Gynecol 1997 Sep;90(3):357-63

Cremerius U, Wildberger JE, Borchers H, Zimny M, Jakse G, Günther RW, Buell U. Does positron emission tomography using 18-fluoro-2-deoxyglucose improve clinical staging of testicular cancer? Results of a study in 50 patients. Urology 1999 Nov;54(5):900-4

Cross MJ, Harms SE, Cheek JH, Peters GN, Jones RC. New horizons in the diagnosis and treatment of breast cancer using magnetic resonance imaging. Am J Surg 1993 Dec;166(6):749-53; discussion 753-5

Cundiff D, McCarthy K, Savarese JJ, Kaiko R, Thomas G, Grandy R, Goldenheim P. Evaluation of a cancer pain model for the testing of long-acting analgesics. The effect of MS Contin in a double-blind, randomized crossover design. Cancer 1989 Jun 1;63(11 Suppl):2355-9

Currie IC, Jones AJ, Wakeley CJ, Tennant WG, Wilson YG, Baird RN, Lamont PM. Non-invasive aortoiliac assessment. Eur J Vasc Endovasc Surg 1995 Jan;9(1):24-8

Cwikla JB, Buscombe JR, Kelleher SM, Parbhoo SP, Thakrar DS, Hinton J, Deery AR, Crow J, Hilson AJ. Comparison of accuracy of scintimammography and X-ray mammography in the diagnosis of primary breast cancer in patients selected for surgical biopsy. Clin Radiol 1998 Apr;53(4):274-80

Dadparvar S, Krishna L, Miyamoto C, Brady LW, Brown SJ, Bender H, Slizofski WJ, Eshleman J, Chevres A, Woo DV. Indium-111-labeled anti-EGFr-425 scintigraphy in the detection of malignant gliomas. Cancer 1994 Feb 1;73(3 Suppl):884-9

Dagli MS, Ingeholm JE, Haxby JV. Localization of cardiac-induced signal change in fMRI. Neuroimage 1999 Apr;9(4):407-15

Davidson BL, Elliott CG, Lensing AW. Low accuracy of color Doppler ultrasound in the detection of proximal leg vein thrombosis in asymptomatic high-risk patients. The RD Heparin Arthroplasty Group. Ann Intern Med 1992 Nov 1;117(9):735-8

de Bruine JF, Limburg M, van Royen EA, Hijdra A, Hill TC, van der Schoot JB. SPET brain imaging with 201 diethyldithiocarbamate in acute ischaemic stroke. Eur J Nucl Med 1990;17(5):248-51

de Kanter AY, van Geel AN, Paul MA, van Eijck CH, Henzen-Logmans SC, Kruyt RH, Krenning EP, Eggermont AM, Wiggers T. Controlled introduction of the sentinel node biopsy in breast cancer in a multi-centre setting: the role of a coordinator for quality control. Eur J Surg Oncol 2000 Nov;26(7):652-6

de Ledinghen V, Lecesne R, Raymond JM, Gense V, Amouretti M, Drouillard J, Couzigou P, Silvain C. Diagnosis of choledocholithiasis: EUS or magnetic resonance cholangiography? A prospective controlled study. Gastrointest Endosc 1999 Jan;49(1):26-31

De Rosa V, Mangoni di Stefano ML, Brunetti A, Caraco C, Graziano R, Gallo MS, Maffeo A. Computed tomography and second-look surgery in ovarian cancer patients. Correlation, actual role and limitations of CT scan. Eur J Gynaecol Oncol 1995;16(2):123-9

de Vries LD, Dijkhuizen FP, Mol BW, Brolmann HA, Moret E, Heintz AP. Comparison of transvaginal sonography, saline infusion sonography, and hysteroscopy in premenopausal women with abnormal uterine bleeding. J Clin Ultrasound 2000 Jun;28(5):217-23

Deasy NP, Conry BG, Lewis JL, Ford TF, Russell GA, Basu R, Flanagan JJ. Local staging of prostate cancer with 0.2 T body coil MRI. Clin Radiol 1997 Dec;52(12):933-7

Debatin JF, Nadel SN, Gray L, Friedman HS, Trotter P, Hockenberger B, Oakes WJ. Phase III clinical evaluation of gadoteridol injection: experience in pediatric neuro-oncologic MR imaging. Pediatr Radiol 1992;22(2):93-8

DeBella K, Poskitt K, Szudek J, Friedman JM. Use of "unidentified bright objects" on MRI for diagnosis of neurofibromatosis 1 in children. Neurology 2000 Apr 25;54(8):1646-51

Del Pizzo JJ, Sklar GN, You-Cheong JW, Levin B, Krebs T, Jacobs SC. Helical computerized tomography arteriography for evaluation of live renal donors undergoing laparoscopic nephrectomy. J Urol 1999 Jul;162(1):31-4

Denti M, Monteleone M, Trevisan C, De Romedis B, Barmettler F. Magnetic resonance imaging versus arthroscopy for the investigation of the osteochondral humeral defect in anterior shoulder instability. A double-blind prospective study. Knee Surg Sports Traumatol Arthrosc 1995;3(3):184-6

DePriest PD, Varner E, Powell J, Fried A, Puls L, Higgins R, Shenson D, Kryscio R, Hunter JE, Andrews SJ, et al. The efficacy of a sonographic morphology index in identifying ovarian cancer: a multi-institutional investigation. Gynecol Oncol 1994 Nov;55(2):174-8

Desai SS. Early diagnosis of spinal tuberculosis by MRI. J Bone Joint Surg Br 1994 Nov;76(6):863-9

Desberg AL, Paushter DM, Lammert GK, Hale JC, Troy RB, Novick AC, Nally JV Jr, Weltevreden AM. Renal artery stenosis: evaluation with color Doppler flow imaging. Radiology 1990 Dec;177(3):749-53

Devereux JG, Foster PJ, Baasanhu J, Uranchimeg D, Lee PS, Erdenbeleig T, Machin D, Johnson GJ, Alsbirk PH. Anterior chamber depth measurement as a screening tool for primary angle-closure glaucoma in an East Asian population. Arch Ophthalmol 2000 Feb;118(2):257-63

Devizzi L, Maffioli L, Bonfante V, Viviani S, Balzarini L, Gasparini M, Valagussa P, Bombardieri E, Santoro A, Bonadonna G. Comparison of gallium scan, computed tomography, and magnetic resonance in patients with mediastinal Hodgkin's disease. Ann Oncol 1997;8 Suppl 1:53-6

Devonec M. Diagnostic value of transrectal ultrasonography in prostatic cancer. Prog Clin Biol Res 1987;243B:1-10

Di Martino E, Nowak B, Hassan HA, Hausmann R, Adam G, Buell U, Westhofen M. Diagnosis and staging of head and neck cancer: a comparison of modern imaging modalities (positron emission tomography, computed tomography, color-coded duplex sonography) with panendoscopic and histopathologic findings. Arch Otolaryngol Head Neck Surg 2000 Dec;126(12):1457-61

Dietemann JL, Thibaut-Menard A, Warter JM, Neugroschl C, Tranchant C, Gillis C, Eid MA, Bogorin A. MRI in multiple sclerosis of the spinal cord: evaluation of fast short-tau inversion-recovery and spin-echo sequences. Neuroradiology 2000 Nov;42(11):810-3

Dietlein M, Scheidhauer K, Voth E, Theissen P, Schicha H. Fluorine-18 fluorodeoxyglucose positron emission tomography and iodine-131 whole-body scintigraphy in the follow-up of differentiated thyroid cancer. Eur J Nucl Med 1997 Nov;24(11):1342-8

Dixon RM, Meire HB, Leigh C, Posner J, Rolan PE. Evaluation of non-invasive techniques to assess vasoconstriction in healthy volunteers using methoxamine as a probe drug. Cephalalgia 1996 Nov;16(7):507-17

Djavan B, Zlotta A, Remzi M, Ghawidel K, Basharkhah A, Schulman CC, Marberger M. Optimal predictors of prostate cancer on repeat prostate biopsy: a prospective study of 1,051 men. J Urol 2000 Apr;163(4):1144-8; discussion 1148-9

D'Ippolito G, de Mello GG, Szejnfeld J. The value of unenhanced CT in the diagnosis of acute appendicitis. Rev Paul Med 1998 Nov-Dec;116(6):1838-45

Dodd GD 3rd, Memel DS, Baron RL, Eichner L, Santiguida LA. Portal vein thrombosis in patients with cirrhosis: does sonographic detection of intrathrombus flow allow differentiation of benign and malignant thrombus? AJR Am J Roentgenol 1995 Sep;165(3):573-7

Dogruca Z, Kabasakal L, Yapar F, Nisil C, Vural VA, Onsel Q. A comparison of Tl-201 stress-reinjection-prone SPECT and Tc-99m-sestamibi gated SPECT in the differentiation of inferior wall defects from artifacts. Nucl Med Commun 2000 Aug;21(8):719-27

Dresel S, Kirsch CM, Tatsch K, Zachoval R, Hahn K, Goldenberg DM. Detection of hepatocellular carcinoma with a new alpha-fetoprotein antibody imaging kit. J Clin Oncol 1997 Jul;15(7):2683-90

Drew PJ, Turnbull LW, Chatterjee S, Read J, Carleton PJ, Fox JN, Monson JR, Kerin MJ. Prospective comparison of standard triple assessment and dynamic magnetic resonance imaging of the breast for the evaluation of symptomatic breast lesions. Ann Surg 1999 Nov;230(5):680-5

Droste DW, Kriete JU, Stypmann J, Castrucci M, Wichter T, Tietje R, Weltermann B, Young P, Ringelstein EB. Contrast transcranial Doppler ultrasound in the detection of right-to-left shunts: comparison of different procedures and different contrast agents. Stroke 1999 Sep;30(9):1827-32

Duboeuf F, Jergas M, Schott AM, Wu CY, Gluer CC, Genant HK. A comparison of bone densitometry measurements of the central skeleton in post-menopausal women with and without vertebral fracture. Br J Radiol 1995 Jul;68(811):747-53

Duerinckx AJ, Urman MK. Two-dimensional coronary MR angiography: analysis of initial clinical results. Radiology 1994 Dec;193(3):731-8

Duffy SW, Chen HH, Tabar L, Fagerberg G, Paci E. Sojourn time, sensitivity and positive predictive value of mammography screening for breast cancer in women aged 40-49. Int J Epidemiol 1996 Dec;25(6):1139-45

Durham RM, Zuckerman D, Wolverson M, Heiberg E, Luchtefeld WB, Herr DJ, Shapiro MJ, Mazuski JE, Salimi Z, Sundaram M. Computed tomography as a screening exam in patients with suspected blunt aortic injury. Ann Surg 1994 Nov;220(5):699-704

Ecochard R, Marret H, Rabilloud M, Bradai R, Boehringer H, Girotto S, Barbato M. Sensitivity and specificity of ultrasound indices of ovulation in spontaneous cycles. Eur J Obstet Gynecol Reprod Biol 2000 Jul;91(1):59-64

Egawa S, Suyama K, Takashima R, Mizoguchi H, Kuwao S, Baba S. Prospective evaluation of prostate cancer detection by prostate-specific antigen-related parameters. Int J Urol 1999 Oct;6(10):493-501

el-Azab M, Mohsen T, el-Diasty T, Shokeir AA. Doppler ultrasonography in evaluation of potential live kidney donors: a prospective study. J Urol 1996 Sep;156(3):878-80

Elhendy A, Trocino G, Salustri A, Cornel JH, Roelandt JR, Boersma E, van Domburg RT, Krenning EP, El-Said GM, Fioretti PM. Low-dose dobutamine echocardiography and rest-redistribution thallium-201 tomography in the assessment of spontaneous recovery of left ventricular function after recent myocardial infarction. Am Heart J 1996 Jun;131(6):1088-96

Elias A, Le Corff G, Bouvier JL, Benichou M, Serradimigni A. Value of real time B mode ultrasound imaging in the diagnosis of deep vein thrombosis of the lower limbs. Int Angiol 1987 Apr-Jun;6(2):175-82

Elkohen M, Beregi JP, Deklunder G, Artaud D, Mounier-Vehier C, Carre AG. A prospective study of helical computed tomography angiography versus angiography for the detection of renal artery stenoses in hypertensive patients. J Hypertens 1996 Apr;14(4):525-8

Emshoff R, Rudisch A. Validity of clinical diagnostic criteria for temporomandibular disorders: clinical versus magnetic resonance imaging diagnosis of temporomandibular joint internal derangement and osteoarthrosis. Oral Surg Oral Med Oral Pathol Oral Radiol Endod 2001 Jan;91(1):50-5

Erasmus JJ, McAdams HP, Rossi SE, Goodman PC, Coleman RE, Patz EF. FDG PET of pleural effusions in patients with non-small cell lung cancer. AJR Am J Roentgenol 2000 Jul;175(1):245-9

Erbel R, Engberding R, Daniel W, Roelandt J, Visser C, Rennollet H. Echocardiography in diagnosis of aortic dissection. Lancet 1989 Mar 4;1(8636):457-61

Errington ML, Ferguson JM, Gillespie IN, Connell HM, Ruckley CV, Wright AR. Complete pre-operative imaging assessment of abdominal aortic aneurysm with spiral CT angiography. Clin Radiol 1997 May;52(5):369-77

Eskew LA, Bare RL, McCullough DL. Systematic 5 region prostate biopsy is superior to sextant method for diagnosing carcinoma of the prostate. J Urol 1997 Jan;157(1):199-202; discussion 202-3

Evangelista A, Garcia-del-Castillo H, Gonzalez-Alujas T, Dominguez-Oronoz R, Salas A, Permanyer-Miralda G, Soler-Soler J. Diagnosis of ascending aortic dissection by transesophageal echocardiography: utility of M-mode in recognizing artifacts. J Am Coll Cardiol 1996 Jan;27(1):102-7

Evans AJ, Sostman HD, Witty LA, Paulson EK, Spritzer CE, Hertzberg BS, Carroll BA, Tapson VF, Saltzman HA, DeLong DM. Detection of deep venous thrombosis: prospective comparison of MR imaging and sonography. J Magn Reson Imaging 1996 Jan-Feb;6(1):44-51

Everaert K, Oostra C, Delanghe J, Vande Walle J, Van Laere M, Oosterlinck W. Diagnosis and localization of a complicated urinary tract infection in neurogenic bladder disease by tubular proteinuria and serum prostate specific antigen. Spinal Cord 1998 Jan;36(1):33-8

Ezekowitz MD, Migliaccio F, Farlow D, Pope C, Denny D, Markowitz D, Hammers L, AH A, Hirsh J. Comparison of platelet scintigraphy, impedance plethysmography gray scale and color flow duplex ultrasound and venography for the diagnosis of venous thrombosis. Prog Clin Biol Res 1990;355:23-7

Falkai P, Schneider T, Greve B, Klieser E, Bogerts B. Reduced frontal and occipital lobe asymmetry on the CT-scans of schizophrenic patients. Its specificity and clinical significance. J Neural Transm Gen Sect 1995;99(1-3):63-77

Farrell S, Hayes T, Shaw M. A negative SimpliRED D-dimer assay result does not exclude the diagnosis of deep vein thrombosis or pulmonary embolus in emergency department patients. Ann Emerg Med 2000 Feb;35(2):121-5

Fenlon HM, Phelan N, Tierney S, Gorey T, Ennis JT. Tc-99m tetrofosmin scintigraphy as an adjunct to plain-film mammography in palpable breast lesions. Clin Radiol 1998 Jan;53(1):17-24

Feuerstein IM, Jicha DL, Pass HI, Chow CK, Chang R, Ling A, Hill SC, Dwyer AJ, Travis WD, Horowitz ME, et al. Pulmonary metastases: MR imaging with surgical correlation: a prospective study. Radiology 1992 Jan;182(1):123-9

Fexa J, Nemec J, Novak Z, Zimak J, Bednar J. Sonography in the evaluation of treatment of differentiated thyroid cancer. First results in 158 patients. Neoplasma 1990;37(4):461-5

Ficaro EP, Fessler JA, Shreve PD, Kritzman JN, Rose PA, Corbett JR. Simultaneous transmission/emission myocardial perfusion tomography. Diagnostic accuracy of attenuation-corrected 99mTc-sestamibi single-photon emission computed tomography. Circulation 1996 Feb 1;93(3):463-73

Fielding JR, Steele G, Fox LA, Heller H, Loughlin KR. Spiral computerized tomography in the evaluation of acute flank pain: a replacement for excretory urography. J Urol 1997 Jun;157(6):2071-3

Filella X, Molina R, Ballesta AM, Gil MJ, Allepuz C, Rioja LA. Value of PSA (prostate-specific antigen) in the detection of prostate cancer in patients with urological symptoms. Results of a multicentre study. Eur J Cancer 1996 Jun;32A(7):1125-8

Filippi M, Barkhof F, Bressi S, Yousry TA, Miller DH. Inter-rater variability in reporting enhancing lesions present on standard and triple dose gadolinium scans of patients with multiple sclerosis. Mult Scler 1997 Aug;3(4):226-30

Filippi M, Mastronardo G, Bastianello S, Rocca MA, Rovaris M, Gasperini C, Pozzilli C, Comi G. A longitudinal brain MRI study comparing the sensitivities of the conventional and a newer approach for detecting active lesions in multiple sclerosis. J Neurol Sci 1998 Jul 15;159(1):94-101

Filippi M, Rovaris M, Capra R, Gasperini C, Yousry TA, Sormani MP, Prandini F, Horsfield MA, Martinelli V, Bastianello S, Kuhne I, Pozzilli C, Comi G. A multi-centre longitudinal study comparing the sensitivity of monthly MRI after standard and triple dose gadolinium-DTPA for monitoring disease activity in multiple sclerosis. Implications for phase II clinical trials. Brain 1998 Oct;121(Pt 10):2011-20

Filippi M, Yousry T, Campi A, Kandziora C, Colombo B, Voltz R, Martinelli V, Spuler S, Bressi S, Scotti G, Comi G. Comparison of triple dose versus standard dose gadolinium-DTPA for detection of MRI enhancing lesions in patients with MS. Neurology 1996 Feb;46(2):379-84

Finazzi G, Falanga A, Galli M, Cortelazzo S, Remuzzi A, Barbui T. Recombinant versus high-sensitivity conventional thromboplastin: a randomized clinical study in patients on oral anticoagulation. Thromb Haemost 1994 Dec;72(6):804-7

Fishman A, Altaras M, Bernheim J, Cohen I, Beyth Y, Tepper R. The value of transvaginal sonography in the preoperative assessment of myometrial invasion in high and low grade endometrial cancer and in comparison to frozen section in grade 1 disease. Eur J Gynaecol Oncol 2000;21(2):128-30

Flamen P, Lerut A, Van Cutsem E, De Wever W, Peeters M, Stroobants S, Dupont P, Bormans G, Hiele M, De Leyn P, Van Raemdonck D, Coosemans W, Ectors N, Haustermans K, Mortelmans L. Utility of positron emission tomography for the staging of patients with potentially operable esophageal carcinoma. J Clin Oncol 2000 Sep 15;18(18):3202-10

Fletcher BD, Kauffman WM, Kaste SC, Winer-Muram HT, Fang L, Chen G, Hudson M. Use of Tl-201 to detect untreated pediatric Hodgkin disease. Radiology 1995 Sep;196(3):851-5

Fletcher JG, Johnson CD, Welch TJ, MacCarty RL, Ahlquist DA, Reed JE, Harmsen WS, Wilson LA. Optimization of CT colonography technique: prospective trial in 180 patients. Radiology 2000 Sep;216(3):704-11

Fletcher S, Jones RG, Rayner HC, Harnden P, Hordon LD, Aaron JE, Oldroyd B, Brownjohn AM, Turney JH, Smith MA. Assessment of renal osteodystrophy in dialysis patients: use of bone alkaline phosphatase, bone mineral density and parathyroid ultrasound in comparison with bone histology. Nephron 1997;75(4):412-9

Foley EF, Kolecki RV, Schirmer BD. The accuracy of laparoscopic ultrasound in the detection of colorectal cancer liver metastases. Am J Surg 1998 Sep;176(3):262-4

Forbes K, Stevenson AJ. The use of power Doppler ultrasound in the diagnosis of isolated deep venous thrombosis of the calf. Clin Radiol 1998 Oct;53(10):752-4

Formica CA, Nieves JW, Cosman F, Garrett P, Lindsay R. Comparative assessment of bone mineral measurements using dual X-ray absorptiometry and peripheral quantitative computed tomography. Osteoporos Int 1998;8(5):460-7

Franzen C, Altfeld M, Hegener P, Hartmann P, Arendt G, Jablonowski H, Rockstroh J, Diehl V, Salzberger B, Fatkenheuer G. Limited value of PCR for detection of Toxoplasma gondii in blood from human immunodeficiency virus-infected patients. J Clin Microbiol 1997 Oct;35(10):2639-41

Franzen D, Sechtem U, Hopp HW. Comparison of angioscopic, intravascular ultrasonic, and angiographic detection of thrombus in coronary stenosis. Am J Cardiol 1998 Nov 15;82(10):1273-5, A9

Frattali CM, Sonies BC, Chi-Fishman G, Litvan I. Effects of physostigmine on swallowing and oral motor functions in patients with progressive supranuclear palsy: A pilot study. Dysphagia 1999 Summer;14(3):165-8

Freed KS, Paulson EK, Frederick MG, Preminger GM, Shusterman DJ, Keogan MT, Vieweg J, Smith RH, Nelson RC, Delong DM, Leder RA. Interobserver variability in the interpretation of unenhanced helical CT for the diagnosis of ureteral stone disease. J Comput Assist Tomogr 1998 Sep-Oct;22(5):732-7

French SD, Green S, Forbes A. Reliability of methods commonly used to detect manipulable lesions in patients with chronic low-back pain. J Manipulative Physiol Ther 2000 May;23(4):231-8

Fridriksson T, Kini N, Walsh-Kelly C, Hennes H. Serum neuron-specific enolase as a predictor of intracranial lesions in children with head trauma: a pilot study. Acad Emerg Med 2000 Jul;7(7):816-20

Fritscher-Ravens A, Petrasch S, Reinacher-Schick A, Graeven U, König M, Schmiegel W. Diagnostic value of endoscopic ultrasonography-guided fine-needle aspiration cytology of mediastinal masses in patients with intrapulmonary lesions and nondiagnostic bronchoscopy. Respiration 1999;66(2):150-5

Fritz P, Rieden K, Lenarz T, Haels J, zum Winkel K. Radiological evaluation of temporal bone disease: high-resolution computed tomography versus conventional X-ray diagnosis. Br J Radiol 1989 Feb;62(734):107-13

Fukuda H, Nakagawa T, Shibuya H. Metastases to pelvic lymph nodes from carcinoma in the pelvic cavity: diagnosis using thin-section CT. Clin Radiol 1999 Apr;54(4):237-42

Galjee MA, van Rossum AC, Doesburg T, van Eenige MJ, Visser CA. Value of magnetic resonance imaging in assessing patency and function of coronary artery bypass grafts. An angiographically controlled study. Circulation 1996 Feb 15;93(4):660-6

Gallacher SJ, Kelly P, Shand J, Logue FC, Cooke T, Boyle IT, McKillop JH. A comparison of 10 MHz ultrasound and 201-thallium/99m-technetium subtraction scanning in primary hyperparathyroidism. Postgrad Med J 1993 May;69(811):376-80

Garcia Pena BM, Mandl KD, Kraus SJ, Fischer AC, Fleisher GR, Lund DP, Taylor GA. Ultrasonography and limited computed tomography in the diagnosis and management of appendicitis in children. JAMA 1999 Sep 15;282(11):1041-6

Gardeil F, Greene R, Stuart B, Turner MJ. Subcutaneous fat in the fetal abdomen as a predictor of growth restriction. Obstet Gynecol 1999 Aug;94(2):209-12

Garg S, Fortling B, Chadwick D, Robinson MC, Hamdy FC. Staging of prostate cancer using 3-dimensional transrectal ultrasound images: a pilot study. J Urol 1999 Oct;162(4):1318-21

Garre Sanchez MC, Sola Perez J, Bas Bernal A, Mercader Martinez J, Lopez Alanis A. Ultrasound-guided fine-needle aspiration biopsy. Study of the cost per patient and comparison with computed tomography-guided biopsy. Rev Esp Enferm Dig 1997 Apr;89(4):297-304

Gasche C, Moser G, Turetschek K, Schober E, Moeschl P, Oberhuber G. Transabdominal bowel sonography for the detection of intestinal complications in Crohn's disease. Gut 1999 Jan;44(1):112-7

Gasparini M, Buraggi GL, Regalia E, Maffioli L, Balzarini L, Gennari L. Comparison of radioimmunodetection with other imaging methods in evaluating local relapses of colorectal carcinoma. Cancer 1994 Feb 1;73(3 Suppl):846-9

Gateno J, Miloro M, Hendler BH, Horrow M. The use of ultrasound to determine the position of the mandibular condyle. J Oral Maxillofac Surg 1993 Oct;51(10):1081-6; discussion 1086-7

Gautier JF, Mourier A, de Kerviler E, Tarentola A, Bigard AX, Villette JM, Guezennec CY, Cathelineau G. Evaluation of abdominal fat distribution in noninsulin-dependent diabetes mellitus: relationship to insulin resistance. J Clin Endocrinol Metab 1998 Apr;83(4):1306-11

Gdeedo A, Van Schil P, Corthouts B, Van Mieghem F, Van Meerbeeck J, Van Marck E. Comparison of imaging TNM [(i)TNM] and pathological TNM [pTNM] in staging of bronchogenic carcinoma. Eur J Cardiothorac Surg 1997 Aug;12(2):224-7

Gebel M, Caselitz M, Bowen-Davies PE, Weber S. A multicenter, prospective, open label, randomized, controlled phase IIIb study of SH U 508 a (Levovist) for Doppler signal enhancement in the portal vascular system. Ultraschall Med 1998 Aug;19(4):148-56

Gebhard F, Authenrieth M, Strecker W, Kinzl L, Hehl G. Ultrasound evaluation of gravity induced anterior drawer following anterior cruciate ligament lesion. Knee Surg Sports Traumatol Arthrosc 1999;7(3):166-72

Geijer M, Sihlbom H, Gothlin JH, Nordborg E. The role of CT in the diagnosis of sacro-iliitis. Acta Radiol 1998 May;39(3):265-8

Gentile AT, Moneta GL, Lee RW, Masser PA, Taylor LM Jr, Porter JM. Usefulness of fasting and postprandial duplex ultrasound examinations for predicting high-grade superior mesenteric artery stenosis. Am J Surg 1995 May; 169(5) Al6-9

Gerspacher-Lara R, Pinto-Silva RA, Serufo JC, Rayes AA, Drummond SC, Lambertucci JR. Splenic palpation for the evaluation of morbidity due to schistosomiasis mansoni. Mem Inst Oswaldo Cruz 1998;93 Suppl 1:245-8

Ghaye B, Dondelinger RF, Dewe W. Percutaneous CT-guided lung biopsy: sequential versus spiral scanning. A randomized prospective study. Eur Radiol 1999;9(7):1317-20

Giancarlo T, Palmieri A, Giacomarra V, Russolo M. Pre-operative evaluation of cervical adenopathies in tumours of the upper aerodigestive tract. Anticancer Res 1998 Jul-Aug;18(4B):2805-9

Giesen RJ, Huynen AL, Aarnink RG, de la Rosette JJ, vd Kaa C, Oosterhof GO, Debruyne FM, Wijkstra H. Computer analysis of transrectal ultrasound images of the prostate for the detection of carcinoma: a prospective study in radical prostatectomy specimens. J Urol 1995 Oct;154(4):1397-400

Girschick HJ, Krauspe R, Tschammler A, Huppertz HI. Chronic recurrent osteomyelitis with clavicular involvement in children: diagnostic value of different imaging techniques and therapy with non-steroidal anti-inflammatory drugs. Eur J Pediatr 1998 Jan;157(1):28-33

Glaser J, Mann O, Pausch J. Diagnosis of chronic pancreatitis by means of a sonographic secretin test. Int J Pancreatol 1994 Jun;15(3):195-200

Glashow JL, Katz R, Schneider M, Scott WN. Double-blind assessment of the value of magnetic resonance imaging in the diagnosis of anterior cruciate and meniscal lesions. J Bone Joint Surg Am 1989 Jan;71(1):113-9

Gmeinwieser J, Feuerbach S, Hohenberger W, Albrich H, Strotzer M, Hofstadter F, Geissler A. Spiral-CT in diagnosis of vascular involvement in pancreatic cancer. Hepatogastroenterology 1995 Jul-Aug;42(4):418-22

Gnudi S, Malavolta N, Ripamonti C, Caudarella R. Ultrasound in the evaluation of osteoporosis: a comparison with bone mineral density at distal radius. Br J Radiol 1995 May;68(809):476-80

Gochman RF, Karasic RB, Heller MB. Use of portable ultrasound to assist urine collection by suprapubic aspiration. Ann Emerg Med 1991 Jun;20(6):631-5

Goessens WH, Mouton JW, van der Meijden WI, Deelen S, van Rijsoort-Vos TH, Lemmens-den Toom N, Verbrugh HA, Verkooyen RP. Comparison of three commercially available amplification assays, AMP CT, LCx, and COBAS AMPLICOR, for detection of Chlamydia trachomatis in first-void urine. J Clin Microbiol 1997 Oct;35(10):2628-33

Goffinet F, Paris J, Heim N, Nisand I, Breart G. Predictive value of Doppler umbilical artery velocimetry in a low risk population with normal fetal biometry. A prospective study of 2016 women. Eur J Obstet Gynecol Reprod Biol 1997 Jan;71(1):11-9

Goldenberg DM, Abdel-Nabi H, Sullivan CL, Serafini A, Seldin D, Barron B, Lamki L, Line B, Wegener WA. Carcinoembryonic antigen immunoscintigraphy complements mammography in the diagnosis of breast carcinoma. Cancer 2000 Jul 1;89(1):104-15

Golebiowski M, Barcikowska M, Pfeffer A. Magnetic resonance imaging-based hippocampal volumetry in patients with dementia of the Alzheimer type. Dement Geriatr Cogn Disord 1999 Jul-Aug;10(4):284-8

Goletti O, Celona G, Galatioto C, Viaggi B, Lippolis PV, Pieri L, Cavina E. Is laparoscopic sonography a reliable and sensitive procedure for staging colorectal cancer? A comparative study. Surg Endosc 1998 Oct;12(10):1236-41

Gomez-Pastrana D, Torronteras R, Caro P, Anguita ML, Barrio AM, Andres A, Navarro J. Diagnosis of tuberculosis in children using a polymerase chain reaction. Pediatr Pulmonol 1999 Nov;28(5):344-51

Gordon ML, Cohen NL. Efficacy of auditory brainstem response as a screening test for small acoustic neuromas. Am J Otol 1995 Mar;16(2):136-9

Gorelik U, Ulish Y, Yagil Y. The use of standard imaging techniques and their diagnostic value in the workup of renal colic in the setting of intractable flank pain. Urology 1996 May;47(5):637-42

Gormley GJ, Ng J, Cook T, Stoner E, Guess H, Walsh P. Effect of finasteride on prostate-specific antigen density. Urology 1994 Jan;43(1):53-8; discussion 58-9

Gottlieb RH, Voci SL, Cholewinski SP, Hartley DF, Rubens DJ, Orloff MS, Bronsther OL. Sonography: a useful tool to detect the mechanical causes of renal transplant dysfunction. J Clin Ultrasound 1999 Jul-Aug;27(6):325-33

Grab D, Flock F, Stohr I, Nussle K, Rieber A, Fenchel S, Brambs HJ, Reske SN, Kreienberg R. Classification of asymptomatic adnexal masses by ultrasound, magnetic resonance imaging, and positron emission tomography. Gynecol Oncol 2000 Jun;77(3):454-9

Gracia Marco R, Aguilar Garcia-Iturrospe EJ, Fernandez Lopez L, Cejas Mendez MR, Herreros Rodriguez O, Diaz Ramirez A, Hernandez Martinez J, Keshavan MS. Hypofrontality in schizophrenia: influence of normalization methods. Prog Neuropsychopharmacol Biol Psychiatry 1997 Nov;21(8):1239-56

Grandjean H, Sarramon MF. Sonographic measurement of nuchal skinfold thickness for detection of Down syndrome in the second-trimester fetus: a multicenter prospective study. The AFDPHE Study Group. Association Francaise pour le Depistage et la Prevention des Handicaps de l'Enfant. Obstet Gynecol 1995 Jan;85(1):103-6

Grasel RP, Outwater EK, Siegelman ES, Capuzzi D, Parker L, Hussain SM. Endometrial polyps: MR imaging features and distinction from endometrial carcinoma. Radiology 2000 Jan;214(1):47-52

Grauer A, Raue F, Ziegler R. Clinical usefulness of a new chemiluminescent two-site immunoassay for human calcitonin. Exp Clin Endocrinol Diabetes 1998;106(4):353-9

Greenspan JD, Joy SE, McGillis SL, Checkosky CM, Bolanowski SJ. A longitudinal study of somesthetic perceptual disorders in an individual with a unilateral thalamic lesion. Pain 1997 Aug;72(1-2):13-25

Greenwell TJ, Woodhams S, Denton ER, MacKenzie A, Rankin SC, Popert R. One year's clinical experience with unenhanced spiral computed tomography for the assessment of acute loin pain suggestive of renal colic. BJU Int 2000 Apr;85(6):632-6

Greim CA, Roewer N, Laux G, Schulte am Esch J. On-line estimation of left ventricular stroke volume using transoesophageal echocardiography and acoustic quantification. Br J Anaesth 1996 Sep;77(3):365-9

Greven KM, Williams DW 3rd, Keyes JW Jr, McGuirt WF, Watson NE Jr, Case LD. Can positron emission tomography distinguish tumor recurrence from irradiation sequelae in patients treated for larynx cancer? Cancer J Sci Am 1997 Nov-Dec;3(6):353-7

Grotta JC, Chiu D, Lu M, Patel S, Levine SR, Tilley BC, Brott TG, Haley EC Jr, Lyden PD, Kothari R, Frankel M, Lewandowski CA, Libman R, Kwiatkowski T, Broderick JP, Marler JR, Corrigan J, Huff S, Mitsias P, Talati S, Tanne D. Agreement and variability in the interpretation of early CT changes in stroke patients qualifying for intravenous rtPA therapy. Stroke 1999 Aug;30(8):1528-33

Grunshaw ND, Renwick IG, Scarisbrick G, Nasmyth DG. Prospective evaluation of ultrasound in distal ileal and colonic obstruction. Clin Radiol 2000 May;55(5):356-62

Grunwald F, Menzel C, Bender H, Palmedo H, Willkomm P, Ruhlmann J, Franckson T, Biersack HJ. Comparison of 18FDG-PET with 131iodine and 99mTc-sestamibi scintigraphy in differentiated thyroid cancer. Thyroid 1997 Jun;7(3):327-35

Gryspeerdt S, Clabout L, Van Hoe L, Berteloot P, Vergote IB. Intraperitoneal contrast material combined with CT for detection of peritoneal metastases of ovarian cancer. Eur J Gynaecol Oncol 1998;19(5):434-7

Guerriero S, Ajossa S, Mais V, Paoletti AM, Melis GB. The screening of tubal abnormalities in the infertile couple. J Assist Reprod Genet 1996 May;13(5):407-12

Gulec SA, Serafini AN, Moffat FL, Vargas-Cuba RD, Sfakianakis GN, Franceschi D, Crichton VZ, Subramanian R, Klein JL, De Jager RL. Radioimmunoscintigraphy of colorectal carcinoma using technetium-99m-labeled, totally human monoclonal antibody 88BV59H21-2. Cancer Res 1995 Dec 1;55(23 Suppl):5774s-5776s

Gunning MG, Anagnostopoulos C, Davies G, Knight CJ, Pennell DJ, Fox KM, Pepper J, Underwood SR. Simultaneous assessment of myocardial viability and function for the detection of hibernating myocardium using ECG-gated 99Tcm-tetrofosmin emission tomography: a comparison with 201Tl emission tomography combined with cine magnetic resonance imaging. Nucl Med Commun 1999 Mar;20(3):209-14

Gupta NC, Esterbrooks DJ, Hilleman DE, Mohiuddin SM. Comparison of adenosine and exercise thallium-201 single-photon emission computed tomography (SPECT) myocardial perfusion imaging. The GE SPECT Multicenter Adenosine Study Group. J Am Coll Cardiol 1992 Feb;19(2):248-57

Gupta NC, Graeber GM, Rogers JS 2nd, Bishop HA. Comparative efficacy of positron emission tomography with FDG and computed tomographic scanning in preoperative staging of non-small cell lung cancer. Ann Surg 1999 Feb;229(2):286-91

Gussenhoven EJ, van der Lugt A, Pasterkamp G, van den Berg FG, Sie LH, Vischjager M, The SH, Li W, Pieterman H, van Urk H. Intravascular ultrasound predictors of outcome after peripheral balloon angioplasty. Eur J Vasc Endovasc Surg 1995 Oct;10(3):279-88

Guyatt GH, Lefcoe M, Walter S, Cook D, Troyan S, Griffith L, King D, Zylak C, Hickey N, Carrier G. Interobserver variation in the computed tomographic evaluation of mediastinal lymph node size in patients with potentially resectable lung cancer. Canadian Lung Oncology Group. Chest 1995 Jan;107(1):116-9

Haferkamp A, Brkovic D, Wiesel M, Staehler G, Dorsam J. Role of color-coded Doppler sonography in the assessment of internal ureteral stent patency. J Endourol 1999 Apr;13(3):199-203

Hagspiel KD, Neidl KF, Eichenberger AC, Weder W, Marincek B. Detection of liver metastases: comparison of superparamagnetic iron oxide-enhanced and unenhanced MR imaging at 1.5 T with dynamic CT, intraoperative US, and percutaneous US. Radiology 1995 Aug;196(2):471-8

Hakim LS, Kulaksizoglu H, Mulligan R, Greenfield A, Goldstein I. Evolving concepts in the diagnosis and treatment of arterial high flow priapism. J Urol 1996 Feb;155(2):541-8

Haller H, Matecjcic N, Rukavina B, Krasevic M, Rupcic S, Mozetic D. Transvaginal sonography and hysteroscopy in women with postmenopausal bleeding. Int J Gynaecol Obstet 1996 Aug;54(2):155-9

Halligan S, Northover J, Bartram CI. Vaginal endosonography to diagnose enterocoele. Br J Radiol 1996 Nov;69(827):996-9

Halvorsen RA Jr, Panushka C, Oakley GJ, Letourneau JG, Adcock LL. Intraperitoneal contrast material improves the CT detection of peritoneal metastases. AJR Am J Roentgenol 1991 Jul;157(1):37-40

Hamann GF, Schatzer-Klotz D, Frohlig G, Strittmatter M, Jost V, Berg G, Stopp M, Schimrigk K, Schieffer H. Femoral injection of echo contrast medium may increase the sensitivity of testing for a patent foramen ovale. Neurology 1998 May;50(5):1423-8

Hans D, Srivastav SK, Singal C, Barkmann R, Njeh CF, Kantorovich E, Gluer CC, Genant HK. Does combining the results from multiple bone sites measured by a new quantitative ultrasound device improve discrimination of hip fracture? J Bone Miner Res 1999 Apr;14(4):644-51

Haramati LB. CT-guided automated needle biopsy of the chest. AJR Am J Roentgenol 1995 Jul;165(1):53-5

Harding K, Evans S, Newnham J. Screening for the small fetus: a study of the relative efficacies of ultrasound biometry and symphysiofundal height. Aust N Z J Obstet Gynaecol 1995 May;35(2):160-4

Hardy PA, Yue G. Measurement of magnetic resonance T2 for physiological experiments. J Appl Physiol 1997 Sep;83(3):904-11

Haring HP, Dilitz E, Pallua A, Hessenberger G, Kampfl A, Pfausler B, Schmutzhard E. Attenuated corticomedullary contrast: An early cerebral computed tomography sign indicating malignant middle cerebral artery infarction. A case-control study. Stroke 1999 May;30(5):1076-82

Harkins SW, Taylor JR, Mattay VS. Response to tacrine in patients with dementia of the Alzheimer's type: cerebral perfusion change is related to change in mental status. Int J Neurosci 1996 Feb;84(1-4):149-56

Harland CC, Kale SG, Jackson P, Mortimer PS, Bamber JC. Differentiation of common benign pigmented skin lesions from melanoma by high-resolution ultrasound. Br J Dermatol 2000 Aug;143(2):281-9

Harris GJ, Lewis RF, Satlin A, English CD, Scott TM, Yurgelun-Todd DA, Renshaw PF. Dynamic susceptibility contrast MR imaging of regional cerebral blood volume in Alzheimer disease: a promising alternative to nuclear medicine. AJNR Am J Neuroradiol 1998 Oct;19(9):1727-32

Harrow EM, Abi-Saleh W, Blum J, Harkin T, Gasparini S, Addrizzo-Harris DJ, Arroliga AC, Wight G, Mehta AC. The utility of transbronchial needle aspiration in the staging of bronchogenic carcinoma. Am J Respir Crit Care Med 2000 Feb;161(2 Pt 1):601-7

Hartnell GG, Cohen MC, Meier RA, Finn JP. Magnetic resonance angiography demonstration of congenital heart disease in adults. Clin Radiol 1996 Dec;51(12):851-7

Hatsukami TS, Primozich JF, Zierler RE, Harley JD, Strandness DE Jr. Color Doppler imaging of infrainguinal arterial occlusive disease. J Vasc Surg 1992 Oct;16(4):527-31; discussion 531-3

Hatsukami TS, Ross R, Polissar NL, Yuan C. Visualization of fibrous cap thickness and rupture in human atherosclerotic carotid plaque in vivo with high-resolution magnetic resonance imaging. Circulation 2000 Aug 29;102(9):959-64

Hauge K, Flo K, Riedhart M, Granberg S. Can ultrasound-based investigations replace laparoscopy and hysteroscopy in infertility? Eur J Obstet Gynecol Reprod Biol 2000 Sep;92(1):167-70

Hawighorst H, Schoenberg SO, Knapstein PG, Knopp MV, Schaeffer U, Essig M, van Kaick G. Staging of invasive cervical carcinoma and of pelvic lymph nodes by high resolution MRI with a phased-array coil in comparison with pathological findings. J Comput Assist Tomogr 1998 Jan-Feb;22(1):75-81

Hawnaur JM, Hutchinson CE, Isherwood I. Clinical evaluation of fast spin echo sequences for cranial magnetic resonance imaging at 0.5 tesla. Br J Radiol 1994 May;67(797):423-8

Hays JT, Mahmarian JJ, Cochran AJ, Verani MS. Dobutamine thallium-201 tomography for evaluating patients with suspected coronary artery disease unable to undergo exercise or vasodilator pharmacologic stress testing. J Am Coll Cardiol 1993 Jun;21(7):1583-90

Heaney RP, Avioli LV, Chesnut CH 3rd, Lappe J, Recker RR, Brandenburger GH. Osteoporotic bone fragility. Detection by ultrasound transmission velocity. JAMA 1989 May 26;261(20):2986-90

Heinle SK, Noblin J, Goree-Best P, Mello A, Ravad G, Mull S, Mammen P, Grayburn PA. Assessment of myocardial perfusion by harmonic power Doppler imaging at rest and during adenosine stress: comparison with (99m)Tc-sestamibi SPECT imaging. Circulation 2000 Jul 4;102(1):55-60

Heinzen MT, Yankaskas BC, Kwok RK. Comparison of woman-specific versus breast-specific data for reporting screening mammography performance. Acad Radiol 2000 Apr;7(4):232-6

Helms G, Stawiarz L, Kivisakk P, Link H. Regression analysis of metabolite concentrations estimated from localized proton MR spectra of active and chronic multiple sclerosis lesions. Magn Reson Med 2000 Jan;43(1):102-10

Hems TE, Birch R, Carlstedt T. The role of magnetic resonance imaging in the management of traction injuries to the adult brachial plexus. J Hand Surg [Br] 1999 Oct;24(5):550-5

Hendel RC, Berman DS, Cullom SJ, Follansbee W, Heller GV, Kiat H, Groch MW, Mahmarian JJ. Multicenter clinical trial to evaluate the efficacy of correction for photon attenuation and scatter in SPECT myocardial perfusion imaging. Circulation 1999 Jun 1;99(21):2742-9

Hendrix NW, Grady CS, Chauhan SP. Clinical vs. sonographic estimate of birth weight in term parturients. A randomized clinical trial. J Reprod Med 2000 Apr;45(4):317-22

Herholz K, Hölzer T, Bauer B, Schroder R, Voges J, Ernestus RI, Mendoza G, Weber-Luxenburger G, Lottgen J, Thiel A, Wienhard K, Heiss WD. 11C-methionine PET for differential diagnosis of low-grade gliomas. Neurology 1998 May;50(5):1316-22

Herlitz H, Landin K, Widgren B. Relationship between sodium-lithium countertransport and insulin sensitivity in mild hypertension. J Intern Med 1996 Mar;239(3):235-40

Heussel CP, Kauczor HU, Heussel GE, Fischer B, Beglich M, Mildenberger P, Thelen M. Pneumonia in febrile neutropenic patients and in bone marrow and blood stem-cell transplant recipients: use of high-resolution computed tomography. J Clin Oncol 1999 Mar;17(3):796-805

Hibbard JU, Tart M, Moawad AH. Cervical length at 16-22 weeks' gestation and risk for preterm delivery. Obstet Gynecol 2000 Dec;96(6):972-8

Hill R, Conron R, Greissinger P, Heller M. Ultrasound for the detection of foreign bodies in human tissue. Ann Emerg Med 1997 Mar;29(3):353-6

Hinkle GH, Burgers JK, Neal CE, Texter JH, Kahn D, Williams RD, Maguire R, Rogers B, Olsen JO, Badalament RA. Multicenter radioimmunoscintigraphic evaluation of patients with prostate carcinoma using indium-111 capromab pendetide. Cancer 1998 Aug 15;83(4):739-47

Ho YL, Lin LC, Yen RF, Wu CC, Chen MF, Huang PJ. Significance of dobutamine-induced ST-segment elevation and T-wave pseudonormalization in patients with Q-wave myocardial infarction: simultaneous evaluation by dobutamine stress echocardiography and thallium-201 SPECT. Am J Cardiol 1999 Jul 15;84(2):125-9

Hockel M, Vorndran B, Schlenger K, Baussmann E, Knapstein PG. Tumor oxygenation: a new predictive parameter in locally advanced cancer of the uterine cervix. Gynecol Oncol 1993 Nov;51(2):141-9

Hoegerle S, Altehoefer C, Ghanem N, Brink I, Moser E, Nitzsche E. 18F-DOPA positron emission tomography for tumour detection in patients with medullary thyroid carcinoma and elevated calcitonin levels. Eur J Nucl Med 2001 Jan;28(1):64-71

Hoekstra HJ, Boeve WJ, Kamman RL, Mooyaart EL. Clinical applicability of human in vivo localized phosphorus-31 magnetic resonance spectroscopy of bone and soft tissue tumors. Ann Surg Oncol 1994 Nov;1(6):504-11

Holden A. The role of colour and duplex Doppler ultrasound in the assessment of thyroid nodules. Australas Radiol 1995 Nov;39(4):343-9

Hollenbeck M, Hubert N, Meusel F, Grabensee B. Increasing sensitivity and specificity of Doppler sonographic detection of renal transplant rejection with serial investigation technique. Clin Investig 1994 Aug;72(8):609-15

Homsy YL, Tripp BM, Lambert R, Campos A, Capolicchio G, Dinh L, Chheda H. The captopril renogram: a new tool for diagnosing and predicting obstruction in childhood hydronephrosis. J Urol 1998 Oct;160(4):1446-9

Hopper KD, Keeton NC, Kasales CJ, Mahraj R, Van Slyke MA, Patrone SV, Singer PS, Tenhave TR. Utility of low mA 1.5 pitch helical versus conventional high mA abdominal CT. Clin Imaging 1998 Jan-Feb;22(1):54-9

Hori M, Murakami T, Kim T, Takahashi S, Oi H, Tomoda K, Narumi Y, Nakamura H. Sensitivity of double-phase helical CT during arterial portography for detection of hypervascular hepatocellular carcinoma. J Comput Assist Tomogr 1998 Nov-Dec;22(6):861-7

Hosotani H, Ohashi Y, Yamada M, Tsubota K. Reversal of abnormal corneal epithelial cell morphologic characteristics and reduced corneal sensitivity in diabetic patients by aldose reductase inhibitor, CT-112. Am J Ophthalmol 1995 Mar;119(3):288-94

Hricak H, White S, Vigneron D, Kurhanewicz J, Kosco A, Levin D, Weiss J, Narayan P, Carroll PR. Carcinoma of the prostate gland: MR imaging with pelvic phased-array coils versus integrated endorectal-pelvic phased-array coils. Radiology 1994 Dec;193(3):703-9

Huang JA, Wang PY, Chang MC, Chia LG, Yang DY, Wu TC. Allen score in clinical diagnosis of intracranial hemorrhage. Zhonghua Yi Xue Za Zhi (Taipei) 1994 Dec;54(6):407-11

Huang PJ, Yen RF, Chieng PU, Chen ML, Su CT. Do beta-blockers affect the diagnostic sensitivity of dobutamine stress thallium-201 single photon emission computed tomographic imaging? J Nucl Cardiol 1998 Jan-Feb;5(1):34-9

Huang TJ, Chen CY, Fan GF, Liao YS, Tu YK, Wong HF, Hsu RW. Posterior bow and vanishing line signs in diagnosis of burst fractures of the spine on plain radiographs. J Formos Med Assoc 1996 Dec;95(12):929-32

Huang Z, Komori S, Sawanobori T, Kohno I, Sano S, Ishihara T, Umetani K, Ijiri H, Koizumi K, Araki T, Kamiya K, Tada Y, Tamura K. Dipyridamole thallium-201 single-photon emission computed tomography for prediction of perioperative cardiac events in patients with arteriosclerosis obliterans undergoing vascular surgery. Jpn Circ J 1998 Apr;62(4):274-8

Hundley WG, Hamilton CA, Thomas MS, Herrington DM, Salido TB, Kitzman DW, Little WC, Link KM. Utility of fast cine magnetic resonance imaging and display for the detection of myocardial ischemia in patients not well suited for second harmonic stress echocardiography. Circulation 1999 Oct 19;100(16):1697-702

Hunerbein M, Rau B, Schlag PM. Laparoscopy and laparoscopic ultrasound for staging of upper gastrointestinal tumours. Eur J Surg Oncol 1995 Feb;21(1):50-5

Hunerbein M, Totkas S, Moesta KT, Ulmer C, Handke T, Schlag PM. The role of transrectal ultrasound-guided biopsy in the postoperative follow-up of patients with rectal cancer. Surgery 2001 Feb;129(2):164-9

Husband DJ, Grant KA, Romaniuk CS. MRI in the diagnosis and treatment of suspected malignant spinal cord compression. Br J Radiol 2001 Jan;74(877):15-23

Hustinx R, Paulus P, Jacquet N, Jerusalem G, Bury T, Rigo P. Clinical evaluation of whole-body 18F-fluorodeoxyglucose positron emission tomography in the detection of liver metastases. Ann Oncol 1998 Apr;9(4):397-401

Hutchinson M, Rusinek H, Nenov VI, Feinberg DA, Johnson G. Segmentation analysis in functional MRI: activation sensitivity and gray-matter specificity of RARE and FLASH. J Magn Reson Imaging 1997 Mar-Apr;7(2):361-4

Ihlberg L, Alback A, Roth WD, Edgren J, Lepantalo M. Interobserver agreement in duplex scanning for vein grafts. Eur J Vasc Endovasc Surg 2000 May;19(5):504-8

Ikezoe J, Kadowaki K, Morimoto S, Takashima S, Kozuka T, Nakahara K, Kuwahara O, Takeuchi N, Yasumitsu T, Nakano N, et al. Mediastinal lymph node metastases from nonsmall cell bronchogenic carcinoma: reevaluation with CT. J Comput Assist Tomogr 1990 May-Jun;14(3):340-4

Imanishi M, Yano M, Okumura M, Kimura G, Kawano Y, Oda J, Hayashida K, Ishida Y, Takamiya M, Omae T. Aspirin renography in diagnosis of unilateral renovascular hypertension. Hypertens Res 1998 Sep;21(3):209-13

Imbeault P, Prud'Homme D, Tremblay A, Despres JP, Mauriege P. Adipose tissue metabolism in young and middle-aged men after control for total body fatness. J Clin Endocrinol Metab 2000 Jul;85(7):2455-62

Imbriaco M, Del Vecchio S, Riccardi A, Pace L, Di Salle F, Di Gennaro F, Salvatore M, Sodano A. Scintimammography with 99mTc-MIBI versus dynamic MRI for non-invasive characterization of breast masses. Eur J Nucl Med 2001 Jan;28(1):56-63

Indinnimeo M, Cicchini C, Stazi A, Alessandrini G, Ghini C, Fanelli F, Pavone P. Magnetic resonance imaging using endoanal coil in anal canal tumors after radiochemotherapy or local excision. Int Surg 2000 Apr-Jun;85(2):143-6

Ishii K, Sasaki M, Matsui M, Sakamoto S, Yamaji S, Hayashi N, Mori T, Kitagaki H, Hirono N, Mori E. A diagnostic method for suspected Alzheimer's disease using H(2)15O positron emission tomography perfusion Z score. Neuroradiology 2000 Nov;42(11):787-94

Jacobs I, Stabile I, Bridges J, Kemsley P, Reynolds C, Grudzinskas J, Oram D. Multimodal approach to screening for ovarian cancer. Lancet 1988 Feb 6;1(8580):268-71

Jager HR, Albrecht T, Curati-Alasonatti WL, Williams EJ, Haskard DO. MRI in neuro-Behcet's syndrome: comparison of conventional spin-echo and FLAIR pulse sequences. Neuroradiology 1999 Oct;41(10):750-8

Jager L, Menauer F, Holzknecht N, Scholz V, Grevers G, Reiser M. Sialolithiasis: MR sialography of the submandibular duct--an alternative to conventional sialography and US? Radiology 2000 Sep;216(3):665-71

Jalili M, Crade M, Davis AL. Carotid blood-flow velocity changes detected by Doppler ultrasound in determination of brain death in children. A preliminary report. Clin Pediatr (Phila) 1994 Nov;33(11):669-74

Jenkins CN, Thuau H. Ultrasound imaging in assessment of fractures of the orbital floor. Clin Radiol 1997 Sep;52(9):708-11

Johannsson G, Marin P, Lonn L, Ottosson M, Stenlof K, Bjorntorp P, Sjostrom L, Bengtsson BA. Growth hormone treatment of abdominally obese men reduces abdominal fat mass, improves glucose and lipoprotein metabolism, and reduces diastolic blood pressure. J Clin Endocrinol Metab 1997 Mar;82(3):727-34

Johansson M, Jensen G, Aurell M, Friberg P, Herlitz H, Klingenstierna H, Volkmann R. Evaluation of duplex ultrasound and captopril renography for detection of renovascular hypertension. Kidney Int 2000 Aug;58(2):774-82

Johnson LL, Seldin DW. Clinical experience with technetium-99m teboroxime, a neutral, lipophilic myocardial perfusion imaging agent. Am J Cardiol 1990 Oct 16;66(13):63E-67E

Johnson LL, Seldin DW. The role of antimyosin antibodies in acute myocardial infarction. Semin Nucl Med 1989 Jul;19(3):238-46

Johnson MB, Wilkinson ID, Wattam J, Venables GS, Griffiths PD. Comparison of Doppler ultrasound, magnetic resonance angiographic techniques and catheter angiography in evaluation of carotid stenosis. Clin Radiol 2000 Dec;55(12):912-20

Johnson PT, Halpern EJ, Kuszyk BS, Heath DG, Wechsler RJ, Nazarian LN, Gardiner GA, Levin DC, Fishman EK. Renal artery stenosis: CT angiography-comparison of real-time volume-rendering and maximum intensity projection algorithms. Radiology 1999 May;211(2):337-43

Jones L, Pressdee DJ, Lamont PM, Baird RN, Murphy KP. A phase contrast (PC) rephase/dephase sequence of magnetic resonance angiography (MRA): a new technique for imaging distal run-off in the pre-operative evaluation of peripheral vascular disease. Clin Radiol 1998 May;53(5):333-7

Jongbloets LM, Lensing AW, Koopman MM, Buller HR, ten Cate JW. Limitations of compression ultrasound for the detection of symptomless postoperative deep vein thrombosis. Lancet 1994 May 7;343(8906):1142-4

Joseph DP, Pieramici DJ, Beauchamp NJ Jr. Computed tomography in the diagnosis and prognosis of open-globe injuries. Ophthalmology 2000 Oct;107(10):1899-906

Jouglard J, Kozak-Ribbens G, de Haro L, Cozzone PJ. Research into individual predisposition to develop acute rhabdomyolysis attributed to fenoverine. Hum Exp Toxicol 1996 Oct;15(10):815-20

Joypaul BV, Kennedy N, Hanson J, Holley M, Browning M, Newman EL, Cuschieri A. Immunoscintigraphy of primary colorectal cancers with indium-111 monoclonal antibody B72.3. J R Coll Surg Edinb 1994 Feb;39(1):39-43

Jucquois I, Nihoyannopoulos P, D'Hondt AM, Roelants V, Robert A, Melin JA, Glass D, Vanoverschelde JL. Comparison of myocardial contrast echocardiography with NC100100 and (99m)Tc sestamibi SPECT for detection of resting myocardial perfusion abnormalities in patients with previous myocardial infarction. Heart 2000 May;83(5):518-24

Juweid M, Sharkey RM, Behr T, Swayne LC, Rubin AD, Hanley D, Herskovic T, Markowitz A, Siegel J, Goldenberg DM. Targeting and initial radioimmunotherapy of medullary thyroid carcinoma with 131I-labeled monoclonal antibodies to carcinoembryonic antigen. Cancer Res 1995 Dec 1;55(23 Suppl):5946s-5951s

Kacl GM, Hagspiel KD, Marincek B. Focal nodular hyperplasia of the liver: serial MRI with Gd-DOTA, superparamagnetic iron oxide, and Gd-EOB-DTPA. Abdom Imaging 1997 May-Jun;22(3):264-

Kallen K, Burtscher IM, Holtas S, Ryding E, Rosen I. 201Thallium SPECT and 1H-MRS compared with MRI in chemotherapy monitoring of high-grade malignant astrocytomas. J Neurooncol 2000;46(2):173-85

Kallen K, Geijer B, Malmstrom P, Andersson AM, Holtas S, Ryding E, Rosen I. Quantitative 201Tl SPET imaging in the follow-up of treatment for brain tumour: a sensitive tool for the early identification of response to chemotherapy? Nucl Med Commun 2000 Mar;21(3):259-67

Kallikazaros IE, Tsioufis CP, Stefanadis CI, Pitsavos CE, Toutouzas PK. Closed relation between carotid and ascending aortic atherosclerosis in cardiac patients. Circulation 2000 Nov 7;102(19 Suppl 3):III263-8

Kalofonos HP, Giannakenas C, Kosmas C, Apostolopoulos D, Onienadum A, Petsas T, Dimopoulos D, Epenetos AA, Vassilakos PJ. Radioimmunoscintigraphy in patients with ovarian cancer. Acta Oncol 1999;38(5):629-34

Kandarpa K, Piwnica-Worms D, Chopra PS, Admas DF, Hunink MG, Donaldson MC, Whittemore AD, Mannick JA, Harrington DP. Prospective double-blinded comparison of MR imaging and aortography in the preoperative evaluation of abdominal aortic aneurysms. J Vasc Interv Radiol 1992 Feb;3(1):83-9

Kaneko T, Nakao A, Inoue S, Harada A, Nonami T, Itoh S, Endo T, Takagi H. Intraportal endovascular ultrasonography in the diagnosis of portal vein invasion by pancreatobiliary carcinoma. Ann Surg 1995 Dec;222(6):711-8

Kaneko T, Nakao A, Nomoto S, Endo T, Ito S, Takagi H. Intraportal endovascular ultrasonography for assessment of vascular invasion by biliary tract cancer. Gastrointest Endosc 1998 Jan;47(1):33-41

Karadeniz T, Topsakal M, Ariman A, Erton H, Basak D. Judgment of color Doppler ultrasound with respect to cavernous artery occlusion pressure in dynamic infusion cavernosometry when evaluating arteriogenic impotence. Urol Int 1996;57(2):85-8

Karadeniz T, Topsakal M, Aydogmus A, Beksan M. Role of RigiScan in the etiologic differential diagnosis of erectile dysfunction. Urol Int 1997;59(1):41-5

Karlsson B, Granberg S, Hellberg P, Wikland M. Comparative study of transvaginal sonography and hysteroscopy for the detection of pathologic endometrial lesions in women with postmenopausal bleeding. J Ultrasound Med 1994 Oct;13(10):757-62

Katzenwadel A, Popken G, Buitrago-Tellez CH, Schultze-Seemann W, Langer M, Sommerkamp H. Digitization and postprocessing of plain-film radiographs for assessment of stone fragmentation after extracorporeal Shockwave lithotripsy. J Endourol 1995 Dec;9(6):433-8

Kaul S, Senior R, Dittrich H, Raval U, Khattar R, Lahiri A. Detection of coronary artery disease with myocardial contrast echocardiography: comparison with 99mTc-sestamibi single-photon emission computed tomography.

Kazam E, Rudelli R, Monte W, Rubenstein WA, Ramirez de Arellano E, Kairam R, Paneth N. Sonographic diagnosis of cisternal subarachnoid hemorrhage in the premature infant. AJNR Am J Neuroradiol 1994 Jun;15(6):1009-20

Keberle M, Kenn W, Tschammler A, Wittenberg G, Hilgarth M, Hoppe F, Hahn D. Current value of double-contrast pharyngography and of computed tomography for the detection and for staging of hypopharyngeal, oropharyngeal and supraglottic tumors. Eur Radiol 1999;9(9):1843-50

Kedar RP, Cosgrove D, McCready VR, Bamber JC, Carter ER. Microbubble contrast agent for color Doppler US: effect on breast masses. Work in progress. Radiology 1996 Mar;198(3):679-86

Keevil SF, Alstead EM, Dolke G, Brooks AP, Armstrong P, Farthing MJ. Non-invasive assessment of diffuse liver disease by in vivo measurement of proton nuclear magnetic resonance relaxation times at 0.08 T. Br J Radiol 1994 Nov;67(803):1083-7

Keijer JT, van Rossum AC, van Eenige MJ, Bax JJ, Visser FC, Teule JJ, Visser CA. Magnetic resonance imaging of regional myocardial perfusion in patients with single-vessel coronary artery disease: quantitative comparison with (201)Thallium-SPECT and coronary angiography. J Magn Reson Imaging 2000 Jun;11(6):607-15

Kelly IE, Han TS, Walsh K, Lean ME. Effects of a thiazolidinedione compound on body fat and fat distribution of patients with type 2 diabetes. Diabetes Care 1999 Feb;22(2):288-93

Keltner JL, Johnson CA, Spurr JO, Beck RW. Visual field profile of optic neuritis. One-year follow-up in the Optic Neuritis Treatment Trial. Arch Ophthalmol 1994 Jul;112(7):946-53

Kemeny V, Droste DW, Hermes S, Nabavi DG, Schulte-Altedorneburg G, Siebler M, Ringelstein EB. Automatic embolus detection by a neural network. Stroke 1999 Apr;30(4):807-10

Kermode J, Butt W, Shann F. Comparison between prostaglandin E1 and epoprostenol (prostacyclin) in infants after heart surgery. Br Heart J 1991 Aug;66(2):175-8

Kessler O, Franco I, Jayabose S, Reda E, Levitt S, Brock W. Is contralateral exploration of the kidney necessary in patients with Wilms tumor? J Urol 1996 Aug;156(2 Pt 2):693-5

Khalkhali I, Iraniha S, Cutrone JA, Diggles LE, Klein SR. Scintimammography with Tc-99m sestamibi. Acta Med Austriaca 1997;24(2):46-9

Khalkhali I, Villanueva-Meyer J, Edell SL, Connolly JL, Schnitt SJ, Baum JK, Houlihan MJ, Jenkins RM, Haber SB. Diagnostic accuracy of 99mTc-sestamibi breast imaging: multicenter trial results. J Nucl Med 2000 Dec;41(12):1973-9

Khaw KT, Saverymuttu SH, Joseph AE. Correlation of 111indium WBC scintigraphy with ultrasound in the detection and assessment of inflammatory bowel disease. Clin Radiol 1990 Dec;42(6):410-3

Kiat H, Iskandrian AS, Villegas BJ, Starling MR, Berman DS. Arbutamine stress thallium-201 single-photon emission computed tomography using a computerized closed-loop delivery system. Multicenter trial for evaluation of safety and diagnostic accuracy. The International Arbutamine Study Group. J Am Coll Cardiol 1995 Nov 1;26(5):1159-67

Kikuchi A, Sultana J, Okai T, Kobayashi K, Taketani Y. Intrauterine sonography for preoperative assessment of cervical invasion in endometrial carcinoma. Gynecol Oncol 1997 Jun;65(3):415-20

Kim JJ, Jung HC, Song IS, Choi KW, Kim CY, Han JK, Choi BI, Park JG, Lee KU, Choe KJ, Kim WH. Preoperative evaluation of the curative resectability of gastric cancer by abdominal computed tomography and ultrasonography: a prospective comparison study. Korean J Intern Med 1997 Jan;12(1):1-6

Kim NK, Choi JS, Sohn SK, Min JS. Transrectal ultrasonography in preoperative staging of rectal cancer. Yonsei Med J 1994 Dec;35(4):396-403

Kinoshita T, Okudera T, Tamura H, Ogawa T, Hatazawa J. Assessment of lacunar hemorrhage associated with hypertensive stroke by echo-planar gradient-echo T2*-weighted MRI. Stroke 2000 Jul;31(7):1646-50

Kirchhof K, Hahnel S, Jansen O, Zake S, Sartor K. Gadolinium-enhanced magnetic resonance dacryocystography in patients with epiphora. J Comput Assist Tomogr 2000 Mar-Apr;24(2):327-31

Kisilevsky BS, Muir DW, Low JA. Maturation of responses elicited by a vibroacoustic stimulus in a group of high-risk fetuses. Matern Child Nurs J 1990 Fall;19(3):239-50

Klijanienko J, Cote JF, Thibault F, Zafrani B, Meunier M, Clough K, Asselain B, Vielh P. Ultrasound-guided fine-needle aspiration cytology of nonpalpable breast lesions: Institut Curie's experience with 198 histologically correlated cases. Cancer 1998 Feb 25;84(1):36-41

Knapp R, Frauscher F, Helweg G, Judmaier W, Strasser H, Bartsch G, zur Nedden D. Blood pressure changes after extracorporeal shock wave nephrolithotripsy: prediction by intrarenal resistive index. Eur Radiol 1996;6(5):665-9

Knecht MF, Heinrich F. Clinical evaluation of an immunoturbidimetric D-dimer assay in the diagnostic procedure of deep vein thrombosis and pulmonary embolism. Thromb Res 1997 Dec 1;88(5):413-7

Knollmann FD, Bocksch W, Spiegelsberger S, Hetzer R, Felix R, Hummel M. Electron-beam computed tomography in the assessment of coronary artery disease after heart transplantation. Circulation 2000 May 2;101(17):2078-82

Ko YS, Lin LH, Chen DF. Laboratory aid and ultrasonography in the diagnosis of appendicitis in children. Zhonghua Min Guo Xiao Er Ke Yi Xue Hui Za Zhi 1995 Nov-Dec;36(6):415-9

Kofinas AD, Penry M, Hatjis CG. Compliance-weight deficit index. Combining umbilical artery resistance and growth deficit for predicting intrauterine growth retardation and poor perinatal outcome. J Reprod Med 1994 Aug;39(8):595-600

Kogutt MS, Goldwag SS, Gupta KL, Kaneko K, Humbert JR. Correlation of transcranial Doppler ultrasonography with MRI and MRA in the evaluation of sickle cell disease patients with prior stroke. Pediatr Radiol 1994;24(3):204-6

Köhler A, Hoffmann R, Platz A, Bino M. Diagnostic value of duplex ultrasound and liquid crystal contact thermography in preclinical detection of deep vein thrombosis after proximal femur fractures. Arch Orthop Trauma Surg 1998;117(1-2):39-42

Kohlhaufl M, Brand P, Selzer T, Scheuch G, Meyer T, Weber N, Schulz H, Haussinger K, Heyder J. Diagnosis of emphysema in patients with chronic bronchitis: a new approach. Eur Respir J 1998 Oct;12(4):793-8

Koito K, Namieno T, Ichimura T, Hirokawa N, Syonai T, Hareyama M, Katsuramaki T, Hirata K, Nishi M. Power Doppler sonography: evaluation of hepatocellular carcinoma after treatment with transarterial embolization or percutaneous ethanol injection therapy. AJR Am J Roentgenol 2000 Feb;174(2):337-41

Kole AC, Plukker JT, Nieweg OE, Vaalburg W. Positron emission tomography for staging of oesophageal and gastroesophageal malignancy. Br J Cancer 1998 Aug;78(4):521-7

Kolecki RV, Golub RM, Sigel B, Machi J, Kitamura H, Hosokawa T, Justin J, Schwartz J, Zaren HA. Accuracy of viscera slide detection of abdominal wall adhesions by ultrasound. Surg Endosc 1994 Aug;8(8):871-4

Kook SH, Park HW, Lee YR, Lee YU, Pae WK, Park YL. Evaluation of solid breast lesions with power Doppler sonography. J Clin Ultrasound 1999 Jun;27(5):231-7

Kopf D, Bockisch A, Steinert H, Hahn K, Beyer J, Neumann HP, Hensen J, Lehnert H. Octreotide scintigraphy and catecholamine response to an octreotide challenge in malignant phaeochromocytoma. Clin Endocrinol (Oxf) 1997 Jan;46(1):39-44

Korner T, Kropf J, Gressner AM. Serum laminin and hyaluronan in liver cirrhosis: markers of progression with high prognostic value. J Hepatol 1996 Nov;25(5):684-8

Korogi Y, Takahashi M, Katada K, Ogura Y, Hasuo K, Ochi M, Utsunomiya H, Abe T, Imakita S. Intracranial aneurysms: detection with three-dimensional CT angiography with volume rendering-comparison with conventional angiographic and surgical findings. Radiology 1999 May;211(2):497-506

Kostakoglu L, Divgi CR, Hilton S, Cordon-Cardo C, Scott AM, Kalaigian H, Finn RD, Schlom J, Larson SM. Preselection of patients with high TAG-72 antigen expression leads to targeting of 94% of known metastatic tumor sites with monoclonal antibody I-131-CC49. Cancer Invest 1994;12(6):551-8

Koudriavtseva T, Pozzilli C, Di Biasi C, Iannilli M, Trasimeni G, Gasperini C, Argentino C, Gualdi GF. High-dose contrast-enhanced MRI in multiple sclerosis. Neuroradiology 1996 May;38 Suppl 1:S5-9

Kovacs JA, Urowitz MB, Gladman DD, Zeman R. The use of single photon emission computerized tomography in neuropsychiatric SLE: a pilot study. J Rheumatol 1995 Jul;22(7):1247-53

Kowalchuk RM, Kneeland JB, Dalinka MK, Siegelman ES, Dockery WD. MRI of the knee: value of short echo time fast spin-echo using high performance gradients versus conventional spin-echo imaging for the detection of meniscal tears. Skeletal Radiol 2000 Sep;29(9):520-4

Kraft E, Schwarz J, Trenkwalder C, Vogl T, Pfluger T, Oertel WH. The combination of hypointense and hyperintense signal changes on T2-weighted magnetic resonance imaging sequences: a specific marker of multiple system atrophy? Arch Neurol 1999 Feb;56(2):225-8

Kramer BS, Gohagan J, Prorok PC, Smart C. A National Cancer Institute sponsored screening trial for prostatic, lung, colorectal, and ovarian cancers. Cancer 1993 Jan 15;71(2 Suppl):589-93

Kramer S, Schulz-Wendtland R, Hagedorn K, Bautz W, Lang N. Magnetic resonance imaging and its role in the diagnosis of multicentric breast cancer. Anticancer Res 1998 May-Jun;18(3C):2163-4

Kramer S, Schulz-Wendtland R, Hagedorn K, Bautz W, Lang N. Magnetic resonance imaging in the diagnosis of local recurrences in breast cancer.

Krasna MJ, Mao YS, Sonett J, Gamliel Z. The role of thoracoscopic staging of esophageal cancer patients. Eur J Cardiothorac Surg 1999 Sep;16 Suppl 1:S31-3

Krause M, Weindler J, Ruprecht KW. Does warming of anesthetic solutions improve analgesia and akinesia in retrobulbar anesthesia? Ophthalmology 1997 Mar;104(3):429-32

Krause T, Eisenmann N, Reinhardt M, Bathmann J, Altehoefer C, Finke J, Moser E. Bone marrow scintigraphy using technetium-99m antigranulocyte antibody in malignant lymphomas. Ann Oncol 1999 Jan;10(1):79-85

Kroboth PD, MM, Bauer KS, Tullock W, Wright CE, Sweeney JA. Do alprazolam-induced changes in saccadic eye movement and psychomotor function follow the same time course? J Clin Pharmacol 1998 Apr;38(4):337-46

Krouse JH. Computed tomography stage, allergy testing, and quality of life in patients with sinusitis. Otolaryngol Head Neck Surg 2000 Oct;123(4):389-92

Krumme B, Blum U, Schwertfeger E, Flügel P, Hollstin F, Schollmeyer P, Rump LC. Diagnosis of renovascular disease by intra- and extrarenal Doppler scanning. Kidney Int 1996 Oct;50(4):1288-92

Krupnick AS, Teitelbaum DH, Geiger JD, Strouse PJ, Cox CS, Blane CE, Polley TZ. Use of abdominal ultrasonography to assess pediatric splenic trauma. Potential pitfalls in the diagnosis. Ann Surg 1997 Apr;225(4):408-14

Kuhl CK, Mielcareck P, Klaschik S, Leutner C, Wardelmann E, Gieseke J, Schild HH. Dynamic breast MR imaging: are signal intensity time course data useful for differential diagnosis of enhancing lesions? Radiology 1999 Apr;211(1):101-10

Kuhn H, Gietzen FH, Leuner C, Schafers M, Schober O, Strunk-Muller C, Obergassel L, Freick M, Gockel B, Lieder F, Raute-Kreinsen U. Transcoronary ablation of septal hypertrophy (TASH): a new treatment option for hypertrophic obstructive cardiomyopathy. Z Kardiol 2000;89 Suppl 4:IV41-54

Kulkarni AV, Chumas PD, Drake JM, Armstrong DC. The reliability of the "absent cistern sign" in assessing LP shunt function. Can J Neurol Sci 1999 Feb;26(1):40-3

Kuo PC, Alfrey EJ, Li K, Vagelos R, Valantine H, Garcia G, Dafoe DC. Magnetic resonance imaging-derived parameter of portal flow predicts volume-mediated pulmonary hypertension in liver transplantation candidates. Surgery 1995 Oct;118(4):685-91

Kurita Y, Suzuki A, Masuda H, Ushiyama T, Suzuki K, Fujita K. Transition zone volume-adjusted prostate-specific antigen value predicts extracapsular carcinoma of the prostate in patients with intermediate prostate-specific antigen levels. Eur Urol 1998;33(1):32-8

Kuroda Y, Matsui M, Yukitake M, Kurohara K, Takashima H, Takashima Y, Endo C, Kato A, Mihara F. Assessment of MRI criteria for MS in Japanese MS and HAM/TSP. Neurology 1995 Jan;45(1):30-3

Kurtaran A, Leimer M, Kaserer K, Yang Q, Angelberger P, Niederle B, Virgolini I. Combined use of 111In-DTPA-D-Phe-1-octreotide (OCT) and 123I-vasoactive intestinal peptide (VIP) in the localization diagnosis of medullary thyroid carcinoma (MTC). Nucl Med Biol 1996 May;23(4):503-7

Kurtz AB, Tsimikas JV, Tempany CM, Hamper UM, Arger PH, Bree RL, Wechsler RJ, Francis IR, Kuhlman JE, Siegelman ES, Mitchell DG, Silverman SG, Brown DL, Sheth S, Coleman BG, Ellis JH, Kurman RJ, Caudry DJ, McNeil BJ. Diagnosis and staging of ovarian cancer: comparative values of Doppler and conventional US, CT, and MR imaging correlated with surgery and histopathologic analysis-report of the Radiology Diagnostic Oncology Group. Radiology 1999 Jul;212(1):19-27

Kwong JS, Adler BD, Padley SP, Muller NL. Diagnosis of diseases of the trachea and main bronchi: chest radiography vs CT. AJR Am J Roentgenol 1993 Sep;161(3):519-22

Laakso MP, Soininen H, Partanen K, Lehtovirta M, Hallikainen M, Hanninen T, Helkala EL, Vainio P, Riekkinen PJ Sr. MRI of the hippocampus in Alzheimer's disease: sensitivity, specificity, and analysis of the incorrectly classified subjects. Neurobiol Aging 1998 Jan-Feb;19(1):23-31

Laczika K, Mitterbauer G, Korninger L, Knobl P, Schwarzinger I, Kapiotis S, Haas OA, Kyrle PA, Pont J, Oehler L, et al. Rapid achievement of PML-RAR alpha polymerase chain reaction (PCR)-negativity by combined treatment with all-trans-retinoic acid and chemotherapy in acute promyelocytic leukemia: a pilot study. Leukemia 1994 Jan;8(1):1-5

Lahde S, Hyrynkangas K, Merikanio J, Pokela R, Jokinen K, Karkola P. Computed tomography and mediastinoscopy in the assessment of resectability of lung cancer. Acta Radiol 1989 Mar-Apr;30(2):169-73

Laine K, Maatta T, Varonen H, Makela M. Diagnosing acute maxillary sinusitis in primary care: a comparison of ultrasound, clinical examination and radiography. Rhinology 1998 Mar;36(1):2-6

Laissy JP, Blanc F, Soyer P, Assayag P, Sibert A, Tebboune D, Arrive L, Brochet E, Hvass U, Langlois J, et al. Thoracic aortic dissection: diagnosis with transesophageal echocardiography versus MR imaging. Radiology 1995 Feb;194(2):331-6

Laissy JP, Menegazzo D, Debray MP, Toublanc M, Ravery V, Dumont E, Schouman-Claeys E. Renal carcinoma: diagnosis of venous invasion with Gd-enhanced MR venography. Eur Radiol 2000;10(7):1138-43

Laivuori HM, Laakso M, Tikkanen MJ, Cacciatore B, Ylikorkala RO, Kaaja RJ. Short-term metabolic effects of isradipine and metoprolol in pre-eclampsia. J Hypertens 1999 Aug;17(8):1189-94

Lane MJ, Liu DM, Huynh MD, Jeffrey RB Jr, Mindelzun RE, Katz DS. Suspected acute appendicitis: nonenhanced helical CT in 300 consecutive patients. Radiology 1999 Nov;213(2):341-6

Lansberg MG, Albers GW, Beaulieu C, Marks MP. Comparison of diffusion-weighted MRI and CT in acute stroke. Neurology 2000 Apr 25;54(8):1557-61

Larcom PG, Lotke PA, Steinberg ME, Holland G, Foster S. Magnetic resonance venography versus contrast venography to diagnose thrombosis after joint surgery. Clin Orthop 1996 Oct;(331):209-15

Laurent F, Latrabe V, Vergier B, Montaudon M, Vernejoux JM, Dubrez J. CT-guided transthoracic needle biopsy of pulmonary nodules smaller than 20 mm: results with an automated 20-gauge coaxial cutting needle. Clin Radiol 2000 Apr;55(4):281-7

Lechner P, Lind P, Goldenberg DM. Can postoperative surveillance with serial CEA immunoscintigraphy detect resectable rectal cancer recurrence and potentially improve tumor-free survival? J Am Coll Surg 2000 Nov;191(5):511-8

Lee DH, Vellet AD, Eliasziw M, Vidito L, Ebers GC, Rice GP, Hewett L, Dunlavy S. MR imaging field strength: prospective evaluation of the diagnostic accuracy of MR for diagnosis of multiple sclerosis at 0.5 and 1.5 T. Radiology 1995 Jan;194(1):257-62

Lee DS, Cheon GJ, Ann JY, Chung JK, Lee MC. Reproducibility of assessment of myocardial function using gated 99Tc(m)-MIBI SPECT and quantitative software. Nucl Med Commun 2000 Dec;21(12):1127-34

Lee H, Frank MS, Rowberg AH, Choi HS, Kim Y. A new method for computed tomography image compression using adjacent slice data. Invest Radiol 1993 Aug;28(8):678-85

Lee MG, Baker ME, Sostman HD, Spritzer CE, Paine S, Paulson EK, Keogan MT. The diagnostic accuracy/efficacy of MRI in differentiating hepatic hemangiomas from metastatic colorectal/breast carcinoma: a multiple reader ROC analysis using a jackknife technique. J Comput Assist Tomogr 1996 Nov-Dec;20(6):905-13

Lehericy S, Cohen L, Bazin B, Samson S, Giacomini E, Rougetet R, Hertz-Pannier L, Le Bihan D, Marsault C, Baulac M. Functional MR evaluation of temporal and frontal language dominance compared with the Wada test. Neurology 2000 Apr 25;54(8):1625-33

Lehmann R, Borovicka J, Kunz P, Crelier G, Boesiger P, Fried M, Schwizer W, Spinas GA. Evaluation of delayed gastric emptying in diabetic patients with autonomic neuropathy by a new magnetic resonance imaging technique and radio-opaque markers. Diabetes Care 1996 Oct;19(10):1075-82

Leitha T, Walter R, Schlick W, Dudczak R. 99mTc-anti-CEA radioimmunoscintigraphy of lung adenocarcinoma. Chest 1991 Jan;99(1):14-9

Lennon CA, Gray DL. Sensitivity and specificity of ultrasound for the detection of neural tube and ventral wall defects in a high-risk population. Obstet Gynecol 1999 Oct;94(4):562-6

Leodolter A, Kahl S, Dominguez-Munoz JE, Gerards C, Glasbrenner B, Malfertheiner P. Comparison of two tubeless function tests in the assessment of mild-to-moderate exocrine pancreatic insufficiency. Eur J Gastroenterol Hepatol 2000 Dec;12(12):1335-8

Leslie DF, Johnson CD, MacCarty RL, Ward EM, Ilstrup DM, Harmsen WS. Single-pass CT of hepatic tumors: value of globular enhancement in distinguishing hemangiomas from hypervascular metastases. AJR Am J Roentgenol 1995 Dec;165(6):1403-6

Levi S, Schaaps JP, De Havay P, Coulon R, Defoort P. End-result of routine ultrasound screening for congenital anomalies: the Belgian Multicentric Study 1984-92. Ultrasound Obstet Gynecol 1995 Jun;5(6):366-71

Levine SE, Neagle CE, Esterhai JL, Wright DG, Dalinka MK. Magnetic resonance imaging for the diagnosis of osteomyelitis in the diabetic patient with a foot ulcer. Foot Ankle Int 1994 Mar;15(3):151-6

Liaw YS, Yang PC, Yu CJ, Chang DB, Wang HJ, Lee LN, Kuo SH, Luh KT. Direct determination of cryptococcal antigen in transthoracic needle aspirate for diagnosis of pulmonary cryptococcosis. J Clin Microbiol 1995 Jun;33(6):1588-91

Lichtenstein D, Meziere G, Biderman P, Gepner A. The comet-tail artifact: an ultrasound sign ruling out pneumothorax. Intensive Care Med 1999 Apr;25(4):383-8

Lindsell DR. Ultrasound imaging of pancreas and biliary tract. Lancet 1990 Feb 17;335(8686):390-3

Lipp RW, Silly H, Ranner G, Dobnig H, Passath A, Leb G, Krejs GJ. Radiolabeled octreotide for the demonstration of somatostatin receptors in malignant lymphoma and lymphadenopathy. J Nucl Med 1995 Jan;36(1):13-8

Lizak MJ, Datiles MB, Aletras AH, Kador PF, Balaban RS. MRI of the human eye using magnetization transfer contrast enhancement. Invest Ophthalmol Vis Sci 2000 Nov;41(12):3878-81

Loftus WK, Metreweli C, Sung JJ, Yang WT, Leung VK, Set PA. Ultrasound, CT and colonoscopy of colonic cancer. Br J Radiol 1999 Feb;72(854):144-8

London JF, Shawker TH, Doppman JL, Frucht HH, Vinayek R, Stark HA, Miller LS, Miller DL, Norton JA, Jensen RT, et al. Zollinger-Ellison syndrome: prospective assessment of abdominal US in the localization of gastrinomas. Radiology 1991 Mar;178(3):763-7

Lote K, Egeland T, Hager B, Skullerud K, Hirschberg H. Prognostic significance of CT contrast enhancement within histological subgroups of intracranial glioma. J Neurooncol 1998 Nov;40(2):161-70

Lovejoy JC, Bray GA, Bourgeois MO, Macchiavelli R, Rood JC, Greeson C, Partington C. Exogenous androgens influence body composition and regional body fat distribution in obese postmenopausal women--a clinical research center study. J Clin Endocrinol Metab 1996 Jun;81(6):2198-203

Lovejoy JC, Bray GA, Greeson CS, Klemperer M, Morris J, Partington C, Tulley R. Oral anabolic steroid treatment, but not parenteral androgen treatment, decreases abdominal fat in obese, older men. Int J Obes Relat Metab Disord 1995 Sep;19(9):614-24

Low RN, Saleh F, Song SY, Shiftan TA, Barone RM, Lacey CG, Goldfarb PM. Treated ovarian cancer: comparison of MR imaging with serum CA-125 level and physical examination--a longitudinal study. Radiology 1999 May;211(2):519-28

Lowe MJ, Mock BJ, Sorenson JA. Functional connectivity in single and multislice echoplanar imaging using resting-state fluctuations. Neuroimage 1998 Feb;7(2):119-32

Lowe VJ, Fletcher JW, Gobar L, Lawson M, Kirchner P, Valk P, Karis J, Hubner K, Delbeke D, Heiberg EV, Patz EF, Coleman RE. Prospective investigation of positron emission tomography in lung nodules. J Clin Oncol 1998 Mar;16(3):1075-84

Lowe VJ, Hoffman JM, DeLong DM, Patz EF, Coleman RE. Semiquantitative and visual analysis of FDG-PET images in pulmonary abnormalities. J Nucl Med 1994 Nov;35(11):1771-6

Lu C, Chang TC, Hsiao YL, Kuo MS. Ultrasonographic findings of papillary thyroid carcinoma and their relation to pathologic changes. J Formos Med Assoc 1994 Nov-Dec;93(11-12):933-8

Lucht R, Knopp MV, Brix G. Elastic matching of dynamic MR mammographic images. Magn Reson Med 2000 Jan;43(1):9-16

Lugo-Vicente H, Ortiz VN, Irizarry H, Camps JI, Pagan V. Pediatric thyroid nodules: management in the era of fine needle aspiration. J Pediatr Surg 1998 Aug;33(8):1302-5

Lupo L, Angelelli G, Pannarale O, Altomare D, Macarini L, Memeo V. Improved accuracy of computed tomography in local staging of rectal cancer using water enema. Int J Colorectal Dis 1996;11(2):60-4

Ma OJ, Mateer JR. Trauma ultrasound examination versus chest radiography in the detection of hemothorax. Ann Emerg Med 1997 Mar;29(3):312-5; discussion 315-6

Maas LC, Harris GJ, Satlin A, English CD, Lewis RF, Renshaw PF. Regional cerebral blood volume measured by dynamic susceptibility contrast MR imaging in Alzheimer's disease: a principal components analysis. J Magn Reson Imaging 1997 Jan-Feb;7(1):215-9

Maddahi J, Kiat H, Van Train KF, Prigent F, Friedman J, Garcia EV, Alazraki N, DePuey EG, Nichols K, Berman DS. Myocardial perfusion imaging with technetium-99m sestamibi SPECT in the evaluation of coronary artery disease. Am J Cardiol 1990 Oct 16;66(13):55E-62E

Maggino T, Gadducci A, D'Addario V, Pecorelli S, Lissoni A, Stella M, Romagnolo C, Federghini M, Zucca S, Trio D, et al. Prospective multicenter study on CA 125 in postmenopausal pelvic masses. Gynecol Oncol 1994 Aug;54(2):117-23

Maier AG, Kersting-Sommerhoff B, Reeders JW, Judmaier W, Annweiler AA, Meusel M, Wallengren NO. Staging of rectal cancer by double-contrast MR imaging using the rectally administered superparamagnetic iron oxide contrast agent ferristene and IV gadodiamide injection: results of a multicenter phase II trial. J Magn Reson Imaging 2000 Nov;12(5):651-60

Maier M, Steinborn M, Schmitz C, Stabler A, Kohler S, Pfahler M, Durr HR, Refior HJ. Extracorporeal shock wave application for chronic plantar fasciitis associated with heel spurs: prediction of outcome by magnetic resonance imaging. J Rheumatol 2000 Oct;27(10):2455-62

Mammen EF, Comp PC, Gosselin R, Greenberg C, Hoots WK, Kessler CM, Larkin EC, Liles D, Nugent DJ. PFA-100 system: a new method for assessment of platelet dysfunction. Semin Thromb Hemost 1998;24(2):195-202

Manfredi R, Pirronti T, Bonomo L, Marano P. Accuracy of computed tomography and magnetic resonance imaging in staging bronchogenic carcinoma. MAGMA 1996 Sep-Dec;4(3-4):257-62

Mani R, Regan F, Sheridan J, Batty V. Duplex ultrasound scanning for diagnosing lower limb deep vein thrombosis. Dermatol Surg 1995 Apr;21(4):324-6

Mantaka P, Strauss AD, Strauss LG, Moehler M, Goldschmidt H, Oberdorfer F, Kontaxakis G, Van Kaick G. Detection of treated liver metastases using fluorine-18-fluordeoxyglucose (FDG) and positron emission tomography (PET). Anticancer Res 1999 Sep-Oct;19(5C):4443-50

Mantoni M, Strandberg C, Neergaard K, Sloth C, Jorgensen PS, Thamsen H, Torholm C, Paaske BP, Rasmussen SW, Christensen SW, Wille-Jorgensen P. Triplex US in the diagnosis of asymptomatic deep venous thrombosis. Acta Radiol 1997 Mar;38(2):327-31

Manyak MJ, Hinkle GH, Olsen JO, Chiaccherini RP, Partin AW, Piantadosi S, Burgers JK, Texter JH, Neal CE, Libertino JA, Wright GL Jr, Maguire RT. Immunoscintigraphy with indium-111-capromab pendetide: evaluation before definitive therapy in patients with prostate cancer. Urology 1999 Dec;54(6):1058-63

Marcus MA, Murphy L, Pi-Sunyer FX, Albu JB. Insulin sensitivity and serum triglyceride level in obese white and black women: relationship to visceral and truncal subcutaneous fat. Metabolism 1999 Feb;48(2):194-9

Maria BL, Drane WE, Quisling RG, Ringdahl DM, Mickle JP, Mendenhall NP, Marcus RB Jr, McCollough WM, Hamed LM, Eskin TA, et al. Value of thallium-201 SPECT imaging in childhood brain tumors. Pediatr Neurosurg 1994;20(1):11-8

Mariani PP, Adriani E, Bellelli A, Maresca G. Magnetic resonance imaging of tunnel placement in posterior cruciate ligament reconstruction. Arthroscopy 1999 Oct;15(7):733-40

Marin P. Testosterone and regional fat distribution. Obes Res 1995 Nov;3 Suppl 4:609S-612S

Markel A, Weich Y, Gaitini D. Doppler ultrasound in the diagnosis of venous thrombosis. Angiology 1995 Jan;46(1):65-73

Markl M, Schneider B, Hennig J, Peschl S, Winterer J, Krause T, Laubenberger J. Cardiac phase contrast gradient echo MRI: measurement of myocardial wall motion in healthy volunteers and patients. Int J Card Imaging 1999 Dec;15(6):441-52

Marks LS, Dorey FJ, Macairan ML, Park C, deKernion JB. Three-dimensional ultrasound device for rapid determination of bladder volume. Urology 1997 Sep;50(3):341-8

Marks MP, Holmgren EB, Fox AJ, Patel S, von Kummer R, Froehlich J. Evaluation of early computed tomographic findings in acute ischemic stroke. Stroke 1999 Feb;30(2):389-92

Maroto A, Gines A, Salo J, Claria J, Gines P, Anibarro L, Jimenez W, Arroyo V, Rodes J. Diagnosis of functional kidney failure of cirrhosis with Doppler sonography: prognostic value of resistive index. Hepatology 1994 Oct;20(4 Pt 1):839-44

Martelli G, Pilotti S, Coopmans de Yoldi G, Viganotti G, Fariselli G, Lepera P, Moglia D. Diagnostic efficacy of physical examination, mammography, fine needle aspiration cytology (triple-test) in solid breast lumps: an analysis of 1708 consecutive cases. Tumori 1990 Oct 31;76(5):476-9

Martinez JM, Echevarria M, Borrell A, Puerto B, Ojuel J, Fortuny A. Fetal heart rate and nuchal translucency in detecting chromosomal abnormalities other than Down syndrome. Obstet Gynecol 1998 Jul;92(1):68-71

Martins FE, Reis JP. Visual erotic stimulation test for initial screening of psychogenic erectile dysfunction: a reliable noninvasive alternative? J Urol 1997 Jan;157(1):134-9

Maruyama R, Noguchi T, Takano M, Takagi K, Morita N, Kikuchi R, Uchida Y. Usefulness of magnetic resonance imaging for diagnosing deep anorectal abscesses. Dis Colon Rectum 2000 Oct;43(10 Suppl):S2-5

Marwick TH, Brunken R, Meland N, Brochet E, Baer FM, Binder T, Flachskampf F, Kamp O, Nienaber C, Nihoyannopoulos P, Pierard L, Vanoverschelde JL, van der Wouw P, Lindvall K. Accuracy and feasibility of contrast echocardiography for detection of perfusion defects in routine practice: comparison with wall motion and technetium-99m sestamibi single-photon emission computed tomography. The Nycomed NC100100 Investigators. J Am Coll Cardiol 1998 Nov;32(5):1260-9

Masdeu JC, Van Heertum RL, Kleiman A, Anselmi G, Kissane K, Horng J, Yudd A, Luck D, Grundman M. Early single-photon emission computed tomography in mild head trauma. A controlled study. J Neuroimaging 1994 Oct;4(4):177-81

Mason GC, Lilford RJ, Porter J, Nelson E, Tyrell S. Randomised comparison of routine versus highly selective use of Doppler ultrasound in low risk pregnancies. Br J Obstet Gynaecol 1993 Feb;100(2):130-3

Massari M, De Simone M, Cioffi U, Rosso L, Chiarelli M, Gabrielli F. Value and limits of endorectal ultrasonography for preoperative staging of rectal carcinoma. Surg Laparosc Endosc 1998 Dec;8(6):438-44

Mattman A, Feldman H, Forster B, Li D, Szasz I, Beattie BL, Schulzer M. Regional HmPAO SPECT and CT measurements in the diagnosis of Alzheimer's disease. Can J Neurol Sci 1997 Feb;24(1):22-8

Matzer L, Kiat H, Wang FP, Van Train K, Germano G, Friedman J, Berman DS. Pharmacologic stress dual-isotope myocardial perfusion single-photon emission computed tomography. Am Heart J 1994 Dec;128(6 Pt 1):1067-76

Mauriege P, Imbeault P, Prud'Homme D, Tremblay A, Nadeau A, Despres JP. Subcutaneous adipose tissue metabolism at menopause: importance of body fatness and regional fat distribution. J Clin Endocrinol Metab 2000 Jul;85(7):2446-54

Mazzanti M, Germano G, Kiat H, Kavanagh PB, Alexanderson E, Friedman JD, Hachamovitch R, Van Train KF, Berman DS. Identification of severe and extensive coronary artery disease by automatic measurement of transient ischemic dilation of the left ventricle in dual-isotope myocardial perfusion SPECT. J Am Coll Cardiol 1996 Jun;27(7):1612-20

McMillan WD, McCarthy WJ, Bresticker MR, Pearce WH, Schneider JR, Golan JF, Yao JS. Mesenteric artery bypass: objective patency determination. J Vasc Surg 1995 May;21(5):729-40; discussion 740-1

Menegazzo D, Laissy JP, Durrbach A, Debray MP, Messin B, Delmas V, Mignon F, Schouman-Claeys E. Hemodialysis access fistula creation: preoperative assessment with MR venography and comparison with conventional venography. Radiology 1998 Dec;209(3):723-8

Menzel J, Hoepffner N, Sulkowski U, Reimer P, Heinecke A, Poremba C, Domschke W. Polypoid tumors of the major duodenal papilla: preoperative staging with intraductal US, EUS, and CT--a prospective, histopathologically controlled study. Gastrointest Endosc 1999 Mar;49(3 Pt 1):349-57

Merchant TE, Schwartz LH, Ballon D, Koutcher JA, Scher HI, Minsky BD. Intracavitary birdcage resonator: applications to the human prostate. J Magn Reson Imaging 1995 May-Jun;5(3):365-8

Method MW, Serafini AN, Averette HE, Rodriguez M, Penalver MA, Sevin BU. The role of radioimmunoscintigraphy and computed tomography scan prior to reassessment laparotomy of patients with ovarian carcinoma. A preliminary report. Cancer 1996 Jun 1;77(11):2286-93

Mettlin C, Lee F, Drago J, Murphy GP. The American Cancer Society National Prostate Cancer Detection Project. Findings on the detection of early prostate cancer in 2425 men. Cancer 1991 Jun 15;67(12):2949-58

Michael TA, Antonescu A, Bhambi B, Balasingam S. Accuracy and usefulness of atrial pacing in conjunction with transthoracic echocardiography in the detection of cardiac ischemia. Am J Cardiol 1996 Jan 15;77(2):187-90

Michael TA, Rao G, Balasingam S. Accuracy and usefulness of atrial pacing in conjunction with transesophageal echocardiography in the detection of cardiac ischemia (a comparative study with scintigraphic tomography and coronary arteriography). Am J Cardiol 1995 Mar 15;75(8):563-7

Michielsen PP, Duysburgh IK, Francque SM, Van der Planken M, Van Marck EA, Pelckmans PA. Ultrasonically guided fine needle puncture of focal liver lesions. Review and personal experience. Acta Gastroenterol Belg 1998 Apr-Jun;61(2):158-63

Miller OF, Rineer SK, Reichard SR, Buckley RG, Donovan MS, Graham IR, Goff WB, Kane CJ. Prospective comparison of unenhanced spiral computed tomography and intravenous urogram in the evaluation of acute flank pain. Urology 1998 Dec;52(6):982-7

Miller SM, Ackermann R. Color Doppler sonography of the prostate. Urol Int 1996;57(3):158-64

Miller WJ, Baron RL, Dodd GD 3rd, Federle MP. Malignancies in patients with cirrhosis: CT sensitivity and specificity in 200 consecutive transplant patients. Radiology 1994 Dec;193(3):645-50

Mitchell DG, Saini S, Weinreb J, De Lange EE, Runge VM, Kuhlman JE, Parisky Y, Johnson CD, Brown JJ, Schnall M, et al. Hepatic metastases and cavernous hemangiomas: distinction with standard- and triple-dose gadoteridol-enhanced MR imaging. Radiology 1994 Oct;193(1):49-57

Mitchell R, Buckler HM, Matson P, Lieberman B, Burger HG, Hilton B, Horne G, Dyson M, Robertson WR. Oestradiol and immunoreactive inhibin-like secretory patterns following controlled ovarian hyperstimulation with urinary (Metrodin) or recombinant follicle stimulating hormone (Puregon). Hum Reprod 1996 May;11(5):962-7

Moehrle M, Blum A, Rassner G, Juenger M. Lymph node metastases of cutaneous melanoma: diagnosis by B-scan and color Doppler sonography. J Am Acad Dermatol 1999 Nov;41(5 Pt 1):703-9

Mol BW, Hajenius PJ, Engelsbel S, Ankum WM, van der Veen F, Hemrika DJ, Bossuyt PM. Can noninvasive diagnostic tools predict tubal rupture or active bleeding in patients with tubal pregnancy? Fertil Steril 1999 Jan;71(1):167-73

Molloy J, Martin JF, Baskerville PA, Fraser SC, Markus HS. S-nitrosoglutathione reduces the rate of embolization in humans. Circulation 1998 Oct 6;98(14):1372-5

Moretti JL, Defer G, Tamgac F, Weinmann P, Belin C, Cesaro P. Comparison of brain SPECT using 99mTc-bicisate (L,L-ECD) and [123I]IMP in cortical and subcortical strokes. J Cereb Blood Flow Metab 1994 Jan;14 Suppl 1:S84-90

Moss HA, Britton PD, Flower CD, Freeman AH, Lomas DJ, Warren RM. How reliable is modern breast imaging in differentiating benign from malignant breast lesions in the symptomatic population? Clin Radiol 1999 Oct;54(10):676-82

Mourier A, Gautier JF, De Kerviler E, Bigard AX, Villette JM, Garnier JP, Duvallet A, Guezennec CY, Cathelineau G. Mobilization of visceral adipose tissue related to the improvement in insulin sensitivity in response to physical training in NIDDM. Effects of branched-chain amino acid supplements. Diabetes Care 1997 Mar;20(3):385-91

Mullin WJ, Heithoff KB, Gilbert TJ Jr, Renfrew DL. Magnetic resonance evaluation of recurrent disc herniation: is gadolinium necessary? Spine 2000 Jun 15;25(12): 1493-50:9

Murri L, Gori S, Massetani R, Bonanni E, Marcella F, Milani S. Evaluation of acute ischemic stroke using quantitative EEG: a comparison with conventional EEG and CT scan. Neurophysiol Clin 1998 Jun;28(3):249-57

Mussurakis S, Buckley DL, Coady AM, Turnbull LW, Horsman A. Observer variability in the interpretation of contrast enhanced MRI of the breast. Br J Radiol 1996 Nov;69(827):1009-16

Mussurakis S, Buckley DL, Drew PJ, Fox JN, Carleton PJ, Turnbull LW, Horsman A. Dynamic MR imaging of the breast combined with analysis of contrast agent kinetics in the differentiation of primary breast tumours. Clin Radiol 1997 Jul;52(7):516-26

Mussurakis S, Gibbs P, Horsman A. Peripheral enhancement and spatial contrast uptake heterogeneity of primary breast tumours: quantitative assessment with dynamic MRI. J Comput Assist Tomogr 1998 Jan-Feb;22(1):35-46

Nagel E, Lehmkuhl HB, Bocksch W, Klein C, Vogel U, Frantz E, Ellmer A, Dreysse S, Fleck E. Noninvasive diagnosis of ischemia-induced wall motion abnormalities with the use of high-dose dobutamine stress MRI: comparison with dobutamine stress echocardiography. Circulation 1999 Feb 16;99(6):763-70

Naguib M, Malabarey T, AlSatli RA, Al Damegh S, Samarkandi AH. Predictive models for difficult laryngoscopy and intubation. A clinical, radiologic and three-dimensional computer imaging study. Can J Anaesth 1999 Aug;46(8):748-59

Nair MK, Webber RL, Johnson MP. Comparative evaluation of Tuned Aperture Computed Tomography for the detection of mandibular fractures. Dentomaxillofac Radiol 2000 Sep;29(5):297-301

Nakajima K, Taki J, Shuke N, Bunko H, Takata S, Hisada K. Myocardial perfusion imaging and dynamic analysis with technetium-99m tetrofosmin. J Nucl Med 1993 Sep;34(9):1478-84

Nakayama E, Yoshiura K, Yuasa K, Tabata O, Araki K, Kanda S, Ozeki S, Shinohara M. Detection of bone invasion by gingival carcinoma of the mandible: a comparison of intraoral and panoramic radiography and computed tomography. Dentomaxillofac Radiol 1999 Nov;28(6):351-6

Natali PG, Mottolese M, Donnorso RP, Buffa R, Bussolati G, Coggi G, Corradi G, Coscia-Porrazzi L, Ferretti M, Rondanelli E, et al. Use of monoclonal antibodies to human breast-tumor-associated antigens in the diagnosis of fine-needle aspirates of breast nodules: results of a multicenter study. Int J Cancer 1990 Jan 15;45(1):12-5

Natsugoe S, Yoshinaka H, Shimada M, Shirao K, Nakano S, Kusano C, Baba M, Fukumoto T, Takao S, Aikou T. Assessment of cervical lymph node metastasis in esophageal carcinoma using ultrasonography. Ann Surg 1999 Jan;229(1):62-6

Navarrete-Navarro P, Vazquez G, Bosch JM, Fernandez E, Rivera R, Carazo E. Computed tomography vs clinical and multidisciplinary procedures for early evaluation of severe abdomen and chest trauma--a cost analysis approach. Intensive Care Med 1996 Mar;22(3):208-12

Neumann DR, Esselstyn CB Jr, Madera A, Wong CO, Lieber M. Parathyroid detection in secondary hyperparathyroidism with 123I/99mTc-sestamibi subtraction single photon emission computed tomography. J Clin Endocrinol Metab 1998 Nov;83(11):3867-71

Neves PL, Faisca M, Gomes V, Cacodcar S, Bernardo I, Anunciada AI, Viegas E, Martins H, da Silva AM. Risk factors for left ventricular hypertrophy: role of Na(+)-Li+ countertransport. Kidney Int Suppl 1996 Jun;55:S160-2

Nguyen BC, Stanford W, Thompson BH, Rossi NP, Kernstine KH, Kern JA, Robinson RA, Amorosa JK, Mammone JF, Outwater EK. Multicenter clinical trial of ultrasmall superparamagnetic iron oxide in the evaluation of mediastinal lymph nodes in patients with primary lung carcinoma. J Magn Reson Imaging 1999 Sep;10(3):468-73

Nicholson P, Higgins T, Forgarty E, Moore D, Dowling F. Three-dimensional spiral CT scanning in children with acute torticollis. Int Orthop 1999;23(1):47-50

Nienaber CA, Spielmann RP, von Kodolitsch Y, Siglow V, Piepho A, Jaup T, Nicolas V, Weber P, Triebel HJ, Bleifeld W. Diagnosis of thoracic aortic dissection. Magnetic resonance imaging versus transesophageal echocardiography. Circulation 1992 Feb;85(2):434-47

Nienaber CA, von Kodolitsch Y, Brockhoff CJ, Koschyk DH, Spielmann RP. Comparison of conventional and transesophageal echocardiography with magnetic resonance imaging for anatomical mapping of thoracic aortic dissection. A dual noninvasive imaging study with anatomical and/or angiographic validation. Int J Card Imaging 1994 Mar;10(1):1-14

Nies C, Leppek R, Sitter H, Klotter HJ, Riera J, Klose KJ, Schwerk WB, Rothmund M. Prospective evaluation of different diagnostic techniques for the detection of liver metastases at the time of primary resection of colorectal carcinoma. Eur J Surg 1996 Oct;162(10):811-6

Nollert G, Jonas RA, Reichart B. Optimizing cerebral oxygenation during cardiac surgery: a review of experimental and clinical investigations with near infrared spectrophotometry. Thorac Cardiovasc Surg 2000 Aug;48(4):247-5

Norberg M, Egevad L, Holmberg L, Sparen P, Norlen BJ, Busch C. The sextant protocol for ultrasound-guided core biopsies of the prostate underestimates the presence of cancer. Urology 1997 Oct;50(4):562-6

Nozoe T, Matsumata T, Sugimachi K. Usefulness of preoperative transvaginal ultrasonography for women with advanced gastric carcinoma. Am J Gastroenterol 1999 Sep;94(9):2509-12

Nucl Med Commun 2000 Jul;21(7):651-8

Nygren AT, Jogestrand T. Detection of patent foramen ovale by transcranial Doppler and carotid duplex ultrasonography: a comparison with transoesophageal echocardiography. Clin Physiol 1998 Jul;18(4):327-30

Oak SN, Kulkarni B, Chaubal N. Color flow Doppler sonography: a reliable alternative to voiding cystourethrogram in the diagnosis of vesicoureteral reflux in children. Urology 1999 Jun;53(6):1211-4

O'Brien JT, Metcalfe S, Swann A, Hobson J, Jobst K, Ballard C, McKeith I, Gholkar A. Medial temporal lobe width on CT scanning in Alzheimer's disease: comparison with vascular dementia, depression and dementia with Lewy bodies. Dement Geriatr Cogn Disord 2000 Mar-Apr;11(2):114-8

Ohbayashi N, Yamada I, Yoshino N, Sasaki T. Sjogren syndrome: comparison of assessments with MR sialography and conventional sialography. Hemodialysis access fistula creation: preoperative assessment with MR venography and comparison with conventional venography. Radiology 1998 Dec;209(3):723-8

Ohkuchi A, Minakami H, Sato I, Mori H, Nakano T, Tateno M. Predicting the risk of pre-eclampsia and a small-for-gestational-age infant by quantitative assessment of the diastolic notch in uterine artery flow velocity waveforms in unselected women. Ultrasound Obstet Gynecol 2000 Aug;16(2):171-8

Ohuchi N, Yoshida K, Kimura M, Ouchi A, Shiiba K, Ohnuki K, Fukao A, Abe R, Matsuno S, Mori S. Comparison of false negative rates among breast cancer screening modalities with or without mammography: Miyagi trial. Jpn J Cancer Res 1995 May;86(5):501-6

Okegawa T, Noda H, Nutahara K, Higashihara E. Comparisons of the various combinations of free, complexed, and total prostate-specific antigen for the detection of prostate cancer. Eur Urol 2000 Oct;38(4):380-7

Olbricht CJ, Paul K, Prokop M, Chavan A, Schaefer-Prokop CM, Jandeleit K, Koch KM, Galanski M. Minimally invasive diagnosis of renal artery stenosis by spiral computed tomography angiography. Kidney Int 1995 Oct;48(4):1332-7

Olgin J, Rosenberg H, Allen G, Seestedt R, Chance B. A blinded comparison of noninvasive, in vivo phosphorus nuclear magnetic resonance spectroscopy and the in vitro halothane/caffeine contracture test in the evaluation of malignant hyperthermia susceptibility. Anesth Analg 1991 Jan;72(1):36-47

Olin JW, Piedmonte MR, Young JR, DeAnna S, Grubb M, Childs MB. The utility of duplex ultrasound scanning of the renal arteries for diagnosing significant renal artery stenosis. Ann Intern Med 1995 Jun 1;122(11):833-8

Olsen EG, Olsen H. Influence of ethanol ingestion on the cornea. Acta Ophthalmol (Copenh) 1993 Oct;71(5):696-8

Olson BR, Cartledge T, Sebring N, Defensor R, Nieman L. Short-term fasting affects luteinizing hormone secretory dynamics but not reproductive function in normal-weight sedentary women. J Clin Endocrinol Metab 1995 Apr;80(4):1187-93

Oostveen J, Prevo R, den Boer J, van de Laar M. Early detection of sacroiliitis on magnetic resonance imaging and subsequent development of sacroiliitis on plain radiography. A prospective, longitudinal study. J Rheumatol 1999 Sep;26(9):1953-8

Orlinsky M, Knittel P, Feit T, Chan L, Mandavia D. The comparative accuracy of radiolucent foreign body detection using ultrasonography. Am J Emerg Med 2000 Jul;18(4):401-3

Orsini MG, Kuboki T, Terada S, Matsuka Y, Yatani H, Yamashita A. Clinical predictability of temporomandibular joint disc displacement. J Dent Res 1999 Feb;78(2):650-60

Ouslander JG, Simmons S, Tuico E, Nigam JG, Fingold S, Bates-Jensen B, Schnelle JF. Use of a portable ultrasound device to measure post-void residual volume among incontinent nursing home residents. J Am Geriatr Soc 1994 Nov;42(11):1189-92

Owen RS, Carpenter JP, Baum RA, Perloff LJ, Cope C. Magnetic resonance imaging of angiographically occult runoff vessels in peripheral arterial occlusive disease. N Engl J Med 1992 Jun 11;326(24):1577-81

Owens LV, Farber MA, Young ML, Carlin RE, Criado-Pallares E, Passman MA, Keagy BA, Marston WA. The value of air plethysmography in predicting clinical outcome after surgical treatment of chronic venous insufficiency. J Vasc Surg 2000 Nov;32(5):961-8

Ozdoba C, Remonda L, Heid O, Lovblad KO, Schroth G. High-resolution echo-planar imaging of the brain: is it suitable for routine clinical imaging? Neuroradiology 1997 Dec;39(12):833-40

Paik DS, Beaulieu CF, Jeffrey RB Jr, Karadi CA, Napel S. Visualization modes for CT colonography using cylindrical and planar map projections. J Comput Assist Tomogr 2000 Mar-Apr;24(2):179-88

Pal L, Lapensee L, Toth TL, Isaacson KB. Comparison of office hysteroscopy, transvaginal ultrasonography and endometrial biopsy in evaluation of abnormal uterine bleeding. J Soc Laparoendosc Surg 1997 Apr-Jun;1(2):125-30

Palmedo H, Biersack HJ, Lastoria S, Maublant J, Prats E, Stegner HE, Bourgeois P, Hustinx R, Hilson AJ, Bischof-Delaloye A. Scintimammography with technetium-99m methoxyisobutylisonitrile: results of a prospective European multicentre trial. Eur J Nucl Med 1998 Apr;25(4):375-85

Palmedo H, Schomburg A, Grunwald F, Mailmann P, Boldt I, Biersack HJ. Scintimammography with Tc-99m MIBI in patients with suspicion of primary breast cancer. Nucl Med Biol 1996 Aug;23(6):681-4

Palmer JM, DiSandro M. Diuretic enhanced duplex Doppler sonography in 33 children presenting with hydronephrosis: a study of test sensitivity, specificity and precision. J Urol 1995 Nov;154(5):1885-8

Pandya PP, Goldberg H, Walton B, Riddle A, Shelley S, Snijders RJ, Nicolaides KH. The implementation of first-trimester scanning at 10-13 weeks' gestation and the measurement of fetal nuchal translucency thickness in two maternity units. Ultrasound Obstet Gynecol 1995 Jan;5(1):20-5

Pannek J, Marks LS, Pearson JD, Rittenhouse HG, Chan DW, Shery ED, Gormley GJ, Subong EN, Kelley CA, Stoner E, Partin AW. Influence of finasteride on free and total serum prostate specific antigen levels in men with benign prostatic hyperplasia. J Urol 1998 Feb;159(2):449-53

Pantel J, O'Leary DS, Cretsinger K, Bockholt HJ, Keefe H, Magnotta VA, Andreasen NC. A new method for the in vivo volumetric measurement of the human hippocampus with high neuroanatomical accuracy. Hippocampus 2000;10(6):752-8

Panting JR, Taylor AM, Gatehouse PD, Keegan J, Yang GZ, McGill S, Francis JM, Burman ED, Firmin DN, Pennell DJ. First-pass myocardial perfusion imaging and equilibrium signal changes using the intravascular contrast agent NC100150 injection. J Magn Reson Imaging 1999 Sep;10(3):404-10

Panza JA, Laurienzo JM, Curiel RV, Quyyumi AA, Cannon RO 3rd. Transesophageal dobutamine stress echocardiography for evaluation of patients with coronary artery disease. J Am Coll Cardiol 1994 Nov 1;24(5):1260-7

Pappalardo G, Polettini E, Frattaroli FM, Casciani E, D'Orta C, D'Amato M, Gualdi GF. Magnetic resonance colonography versus conventional colonoscopy for the detection of colonic endoluminal lesions. Gastroenterology 2000 Aug;119(2):300-4

Paradis AL, Cornilleau-Peres V, Droulez J, Van De Moortele PF, Lobel E, Berthoz A, Le Bihan D, Poline JB. Visual perception of motion and 3-D structure from motion: an fMRI study. Cereb Cortex 2000 Aug;10(8):772-83

Parivar F, Hricak H, Shinohara K, Kurhanewicz J, Vigneron DB, Nelson SJ, Carroll PR. Detection of locally recurrent prostate cancer after cryosurgery: evaluation by transrectal ultrasound, magnetic resonance imaging, and three-dimensional proton magnetic resonance spectroscopy. Urology 1996 Oct;48(4):594-9

Park MS, Yu JS, Kim YH, Kim MJ, Kim JH, Lee S, Cho N, Kim DG, Kim KW. Acute cholecystitis: comparison of MR cholangiography and US. Radiology 1998 Dec;209(3):781-5

Parsey RV, Krishnan KR. A new MRI ratio method for in-vivo estimation of signal hypointensity in aging and Alzheimer's disease. Prog Neuropsychopharmacol Biol Psychiatry 1997 Nov;21(8):1257-67

Pasanen PA, Partanen K, Pikkarainen P, Alhava E, Pirinen A, Janatuinen E. A prospective study on the value of ultrasound, computed tomography and endoscopic retrograde cholangiopancreatography in the diagnosis of unjaundiced cholestasis. In Vivo 1994 Mar-Apr;8(2):227-30

Pasquier F, Hamon M, Lebert F, Jacob B, Pruvo JP, Petit H. Medial temporal lobe atrophy in memory disorders. J Neurol 1997 Mar;244(3):175-81

Patel MR, Kuntz KM, Klufas RA, Kim D, Kramer J, Polak JF, Skillman JJ, Whittemore AD, Edelman RR, Kent KC. Preoperative assessment of the carotid bifurcation. Can magnetic resonance angiography and duplex ultrasonography replace contrast arteriography? Stroke 1995 Oct;26(10):1753-8

Patrux B, Laissy JP, Jouini S, Kawiecki W, Coty P, Thiebot J. Magnetic resonance angiography (MRA) of the circle of Willis: a prospective comparison with conventional angiography in 54 subjects. Neuroradiology 1994 Apr;36(3):193-7

Pavics L, Grunwald F, Barzo P, Ambrus E, Menzel C, Schomburg A, Borda L, Mate E, Bodosi M, Csernay L, et al. Evaluation of cerebral vasoreactivity by SPECT and transcranial Doppler sonography using the acetazolamide test. Nuklearmedizin 1994 Dec;33(6):239-43

Pearle MS, Watamull LM, Mullican MA. Sensitivity of noncontrast helical computerized tomography and plain film radiography compared to flexible nephroscopy for detecting residual fragments after percutaneous nephrostolithotomy. J Urol 1999 Jul;162(1):23-6

Perre CI, Koot VC, de Hooge P, Leguit P. The value of ultrasound in the evaluation of palpable breast tumours: a prospective study of 400 cases. Eur J Surg Oncol 1994 Dec;20(6):637-40

Perrotti M, Kaufman RP Jr, Jennings TA, Thaler HT, Soloway SM, Rifkin MD, Fisher HA. Endo-rectal coil magnetic resonance imaging in clinically localized prostate cancer: is it accurate? J Urol 1996 Jul;156(1):106-9

Pescatore P, Glucker T, Delarive J, Meuli R, Pantoflickova D, Duvoisin B, Schnyder P, Blum AL, Dorta G. Diagnostic accuracy and interobserver agreement of CT colonography (virtual colonoscopy). Gut 2000 Jul;47(1):126-30

Phillips MS, Williams MP, Flower CD. How useful is computed tomography in the diagnosis and assessment of bronchiectasis? Clin Radiol 1986 Jul;37(4):321-5

Philpott GW, Schwarz SW, Anderson CJ, Dehdashti F, Connett JM, Zinn KR, Meares CF, Cutler PD, Welch MJ, Siegel BA. RadioimmunoPET: detection of colorectal carcinoma with positron-emitting copper-64-labeled monoclonal antibody. J Nucl Med 1995 Oct;36(10):1818-24

Piccini P, Pavese N, Palombo C, Pittella G, Distante A, Bonuccelli U. Transcranial Doppler ultrasound in migraine and tension-type headache after apomorphine administration: double-blind crossover versus placebo study. Cephalalgia 1995 Oct;15(5):399-403

Pieterman RM, van Putten JW, Meuzelaar JJ, Mooyaart EL, Vaalburg W, Koeter GH, Fidler V, Pruim J, Green HJ. Preoperative staging of non-small-cell lung cancer with positron-emission tomography. N Engl J Med 2000 Jul 27;343(4):254-61

Pijl H, Ohashi S, Matsuda M, Miyazaki Y, Mahankali A, Kumar V, Pipek R, Iozzo P, Lancaster JL, Cincotta AH, DeFronzo RA. Bromocriptine: a novel approach to the treatment of type 2 diabetes. Diabetes Care 2000 Aug;23(8):1154-61

Pinkas H, Mashiach R, Rabinerson D, Avrech OM, Royburt M, Rufas O, Meizner I, Ben-Rafael Z, Fisch B. Doppler parameters of uterine and ovarian stromal blood flow in women with polycystic ovary syndrome and normally ovulating women undergoing controlled ovarian stimulation. Ultrasound Obstet Gynecol 1998 Sep;12(3):197-200

Pneumaticos SG, Hipp JA, Esses SI. Sensitivity and specificity of dural sac and herniated disc dimensions in patients with low back-related leg pain. J Magn Reson Imaging 2000 Sep;12(3):439-43

Poehlman ET, Dvorak RV, DeNino WF, Brochu M, Ades PA. Effects of resistance training and endurance training on insulin sensitivity in nonobese, young women: a controlled randomized trial. J Clin Endocrinol Metab 2000 Jul;85(7):2463-8

Polo A, Zanette G, Manganotti P, Bertolasi L, De Grandis D, Rizzuto N. Spinal somatosensory evoked potentials in patients with tethered cord syndrome. Can J Neurol Sci 1994 Nov;21(4):325-30

Porte HL, Ernst OJ, Delebecq T, Metois D, Lemaitre LG, Wurtz AJ. Is computed tomography guided biopsy still necessary for the diagnosis of adrenal masses in patients with resectable non-small-cell lung cancer? Eur J Cardiothorac Surg 1999 May;15(5):597-601

Portenoy RK, Maldonado M, Fitzmartin R, Kaiko RF, Kanner R. Oral controlled-release morphine sulfate. Analgesic efficacy and side effects of a 100-mg tablet in cancer pain patients. Cancer 1989 Jun 1;63(11 Suppl):2284-8

Portincasa P, Di Ciaula A, Baldassarre G, Palmieri V, Gentile A, Cimmino A, Palasciano G. Gallbladder motor function in gallstone patients: sonographic and in vitro studies on the role of gallstones, smooth muscle function and gallbladder wall inflammation. J Hepatol 1994 Sep;21(3):430-40

Postert T, Hoppe P, Federlein J, Helbeck S, Ermert H, Przuntek H, Buttner T, Wilkening W. Contrast agent specific imaging modes for the ultrasonic assessment of parenchymal cerebral echo contrast enhancement. J Cereb Blood Flow Metab 2000 Dec;20(12):1709-16

Postle BR, Berger JS, D'Esposito M. Functional neuroanatomical double dissociation of mnemonic and executive control processes contributing to working memory performance. Proc Natl Acad Sci U S A 1999 Oct 26;96(22):12959-64

Postma CT, van Aalen J, de Boo T, Rosenbusch G, Thien T. Doppler ultrasound scanning in the detection of renal artery stenosis in hypertensive patients. Br J Radiol 1992 Oct;65(778):857-60

Pozzoli MM, Fioretti PM, Salustri A, Reijs AE, Roelandt JR. Exercise echocardiography and technetium-99m MIBI single-photon emission computed tomography in the detection of coronary artery disease. Am J Cardiol 1991 Feb 15;67(5):350-5

Prando A, Wallace S. Helical CT of prostate cancer: early clinical experience. Am J Roentgenol 2000 Aug;175(2):343-6

Prati F, Gil R, Di Mario C, Ozaki Y, Bruining N, Camenzind E, de Feyter PJ, Roelandt JR, Serruys PW. Is quantitative angiography sufficient to guide stent implantation? A comparison with three-dimensional reconstruction of intracoronary ultrasound images. G Ital Cardiol 1997 Apr;27(4):328-36

Prauer HW, Weber WA, Romer W, Treumann T, Ziegler SI, Schwaiger M. Controlled prospective study of positron emission tomography using the glucose analogue [18F]fluorodeoxyglucose in the evaluation of pulmonary nodules. Br J Surg 1998 Nov;85(11):1506-11

Predanic M, Vlahos N, Pennisi JA, Moukhtar M, Aleem FA. Color and pulsed Doppler sonography, gray-scale imaging, and serum CA 125 in the assessment of adnexal disease. Obstet Gynecol 1996 Aug;88(2):283-8

Prompeler HJ, Madjar H, Sauerbrei W, Lattermann U, Pfleiderer A. Diagnostic formula for the differentiation of adnexal tumors by transvaginal sonography. Obstet Gynecol 1997 Mar;89(3):428-33

Prough SG, Aksel S, Yeoman R. Luteinizing hormone bioactivity and variable responses to clomiphene citrate in chronic anovulation. Fertil Steril 1990 Nov;54(5):799-804

Proust F, Callonec F, Clavier E, Lestrat JP, Hannequin D, Thiebot J, Freger P. Usefulness of transcranial color-coded sonography in the diagnosis of cerebral vasospasm. Stroke 1999 May;30(5):1091-8

Pruessner JC, Collins DL, Pruessner M, Evans AC. Age and gender predict volume decline in the anterior and posterior hippocampus in early adulthood. J Neurosci 2001 Jan 1;21(1):194-200

Pucci E, Belardinelli N, Regnicolo L, Nolfe G, Signorino M, Salvolini U, Angeleri F. Hippocampus and parahippocampal gyrus linear measurements based on magnetic resonance in Alzheimer's disease. Eur Neurol 1998;39(1):16-25

Pujol JL, Demoly P, Daures JP, Tarhini H, Godard P, Michel FB. Chest tumor response measurement during lung cancer chemotherapy. Comparison between computed tomography and standard roentgenography. Am Rev Respir Dis 1992 May;145(5):1149-54

Pujol JL, Parrat E, Lehmann M, Gautier V, Daures JP, Michel FB, Godard P. Lung cancer chemotherapy. Response-survival relationship depends on the method of chest tumor response evaluation. Am J Respir Crit Care Med 1996 Jan;153(1):243-9

Puppo P, Perachino M, Ricciotti G, Bozzo W. Prostate cancer detection in BPH patients. Arch Esp Urol 1994 Nov;47(9):867-72

Quarles LD, Yohay DA, Carroll BA, Spritzer CE, Minda SA, Bartholomay D, Lobaugh BA. Prospective trial of pulse oral versus intravenous calcitriol treatment of hyperparathyroidism in ESRD. Kidney Int 1994 Jun;45(6):1710-21

Quinn SF, Sheley RC, Szumowski J, Shimakawa A. Evaluation of the iliac arteries: comparison of two-dimensional time of flight magnetic resonance angiography with cardiac compensated fast gradient recalled echo and contrast-enhanced three-dimensional time of flight magnetic resonance angiography. J Magn Reson Imaging 1997 Jan-Feb;7(1):197-203

Racette SB, Horowitz JF, Mittendorfer B, Klein S. Racial differences in lipid metabolism in women with abdominal obesity. Am J Physiol Regul Integr Comp Physiol 2000 Sep;279(3):R944-50

Rachinsky I, Boguslavsky L, Goldstein D, Golan H, Pak I, Katz M, Lantsberg S. Diagnosis of pyogenic pelvic inflammatory diseases by 99mTc-HMPAO leucocyte scintigraphy. Eur J Nucl Med 2000 Dec;27(12):1774-7

Raggi P, Callister TQ, Cooil B, Russo DJ, Lippolis NJ, Patterson RE. Evaluation of chest pain in patients with low to intermediate pretest probability of coronary artery disease by electron beam computed tomography. Am J Cardiol 2000 Feb 1;85(3):283-8

Rajshekhar V, Chandy MJ. Validation of diagnostic criteria for solitary cerebral cysticercus granuloma in patients presenting with seizures. Acta Neurol Scand 1997 Aug;96(2):76-81

Rambaldi R, Poldermans D, Bax JJ, Boersma E, Valkema R, Elhendy A, Vierter WB, Fioretti PM, Roelandt JR, Krenning EP. Dobutamine stress echocardiography and technetium-99m-tetrofosmin/fluorine 18-fluorodeoxyglucose single-photon emission computed tomography and influence of resting ejection fraction to assess myocardial viability in patients with severe left ventricular dysfunction and healed myocardial infarction. Am J Cardiol 1999 Jul 15;84(2):130-4

Ramsbacher J, Schilling AM, Wolf KJ, Brock M. Magnetic resonance myelography (MRM) as a spinal examination technique. Acta Neurochir (Wien) 1997;139(11):1080-4

Ranganath C, Johnson MK, D'Esposito M. Left anterior prefrontal activation increases with demands to recall specific perceptual information. J Neurosci 2000 Nov 15;20(22):RC108

Rao PM, Feltmate CM, Rhea JT, Schulick AH, Novelline RA. Helical computed tomography in differentiating appendicitis and acute gynecologic conditions. Obstet Gynecol 1999 Mar;93(3):417-21

Rao PM, Rhea JT, Novelline RA, Mostafavi AA, Lawrason JN, McCabe CJ. Helical CT combined with contrast material administered only through the colon for imaging of suspected appendicitis. AJR Am J Roentgenol 1997 Nov;169(5):1275-80

Rao PM, Rhea JT. Colonic diverticulitis: evaluation of the arrowhead sign and the inflamed diverticulum for CT diagnosis. Radiology 1998 Dec;209(3):775-9

Rasanen HT, Manninen HI, Vanninen RL, Vainio P, Berg M, Saari T. Mild carotid artery atherosclerosis: assessment by 3-dimensional time-of-flight magnetic resonance angiography, with reference to intravascular ultrasound imaging and contrast angiography. Stroke 1999 Apr;30(4):827-33

Raunest J, Oberle K, Loehnert J, Hoetzinger H. The clinical value of magnetic resonance imaging in the evaluation of meniscal disorders. J Bone Joint Surg Am 1991 Jan;73(1):11-6

Razumovsky AY, Gillard JH, Bryan RN, Hanley DF, Oppenheimer SM. TCD, MRA and MRI in acute cerebral ischemia. Acta Neurol Scand 1999 Jan;99(1):65-76

Read JW, Perko M. Shoulder ultrasound: diagnostic accuracy for impingement syndrome, rotator cuff tear, and biceps tendon pathology. J Shoulder Elbow Surg 1998 May-Jun;7(3):264-71

Regan F, Petronis J, Bohlman M, Rodriguez R, Moore R. Perirenal MR high signal--a new and sensitive indicator of acute ureteric obstruction. Clin Radiol 1997 Jun;52(6):445-50

Regan F, Schaefer DC, Smith DP, Petronis JD, Bohlman ME, Magnuson TH. The diagnostic utility of HASTE MRI in the evaluation of acute cholecystitis. Half-Fourier acquisition single-shot turbo SE. J Comput Assist Tomogr 1998 Jul-Aug;22(4):638-42

Reginster JY, Dethor M, Pirenne H, Dewe W, Albert A. Reproducibility and diagnostic sensitivity of ultrasonometry of the phalanges to assess osteoporosis. Int J Gynaecol Obstet 1998 Oct;63(1):21-8

Reichenberger F, Habicht J, Matt P, Frei R, Soler M, Bolliger CT, Dalquen P, Gratwohl A, Tamm M. Diagnostic yield of bronchoscopy in histologically proven invasive pulmonary aspergillosis. Bone Marrow Transplant 1999 Dec;24(11):1195-9

Reimer P, Jahnke N, Fiebich M, Schima W, Deckers F, Marx C, Holzknecht N, Saini S. Hepatic lesion detection and characterization: value of nonenhanced MR imaging, superparamagnetic iron oxide-enhanced MR imaging, and spiral CT-ROC analysis. Radiology 2000 Oct;217(1):152-8

Reinhold C, Taourel P, Bret PM, Cortas GA, Mehta SN, Barkun AN, Wang L, Tafazoli F. Choledocholithiasis: evaluation of MR cholangiography for diagnosis. Radiology 1998 Nov;209(2):435-42

Renfer LG, Schow D, Thompson IM, Optenberg S. Is ultrasound guidance necessary for transrectal prostate biopsy? J Urol 1995 Oct;154(4):1390-1

Rennie JM, Coughtrey H, Morley R, Evans DH. Comparison of cerebral blood flow velocity estimation with cranial ultrasound imaging for early prediction of outcome in preterm infants. J Clin Ultrasound 1995 Jan;23(1):27-31

Resnick LM, Gupta RK, DiFabio B, Barbagallo M, Mann S, Marion R, Laragh JH. Intracellular ionic consequences of dietary salt loading in essential hypertension. Relation to blood pressure and effects of calcium channel blockade. J Clin Invest 1994 Sep;94(3):1269-76

Rex DK, Vining D, Kopecky KK. An initial experience with screening for colon polyps using spiral CT with and without CT colography (virtual colonoscopy). Gastrointest Endosc 1999 Sep;50(3):309-13

Rhee E, Osborn A, Witt M. The correlation of cavernous systolic occlusion pressure with peak velocity flow using color duplex Doppler ultrasound. J Urol 1995 Feb;153(2):358-60

Richards JR, McGahan JP, Jones CD, Zhan S, Gerscovich EO. Ultrasound detection of blunt splenic injury. Injury 2001 Mar;32(2):95-103

Richards PJ, Riddell L, Reznek RH, Armstrong P, Pinching AJ, Parkin JM. High resolution computed tomography in HIV patients with suspected Pneumocystis carinii pneumonia and a normal chest radiograph. Clin Radiol 1996 Oct;51(10):689-93

Richie JP, Kavoussi LR, Ho GT, Vickers MA, O'Donnell MA, St Laurent D, Chen A, Goldstein DS, Loughlin KR. Prostate cancer screening: role of the digital rectal examination and prostate-specific antigen. Ann Surg Oncol 1994 Mar;1(2):117-20

Rickwood AM, Carty HM, McKendrick T, Williams MP, Jackson M, Pilling DW, Sprigg A. Current imaging of childhood urinary infections: prospective survey. BMJ 1992 Mar 14;304(6828):663-5

Ricotta JJ, Bryan FA, Bond MG, Kurtz A, O'Leary DH, Raines JK, Berson AS, Clouse ME, Calderon-Ortiz M, Toole JF, et al. Multicenter validation study of real-time (B-mode) ultrasound, arteriography, and pathologic examination. J Vasc Surg 1987 Nov;6(5):512-20

Rieber A, Aschoff A, Nussle K, Wruk D, Tomczak R, Reinshagen M, Adler G, Brambs HJ. MRI in the diagnosis of small bowel disease: use of positive and negative oral contrast media in combination with enteroclysis. Eur Radiol 2000;10(9):1377-82

Ries A, Singson P, Bidus M, Barnes JG. Use of the endometrial pipelle in the diagnosis of early abnormal gestations. Fertil Steril 2000 Sep;74(3):593-5

Rietbergen JB, Kranse R, Hoedemaeker RF, Kruger AE, Bangma CH, Kirkels WJ, Schroder FH. Comparison of prostate-specific antigen corrected for total prostate volume and transition zone volume in a population-based screening study. Urology 1998 Aug;52(2):237-46

Rinne D, Baum RP, Hör G, Kaufmann R. Primary staging and follow-up of high risk melanoma patients with whole-body 18F-fluorodeoxyglucose positron emission tomography: results of a prospective study of 100 patients. Cancer 1998 May 1;82(9):1664-71

Rizzi PM, Kane PA, Ryder SD, Ramage JK, Gane E, Tan KC, Portmann B, Karani J, Williams R. Accuracy of radiology in detection of hepatocellular carcinoma before liver transplantation. Gastroenterology 1994 Nov;107(5):1425-9

Robert Y, Rocourt N, Chevalier D, Duhamel A, Carcasset S, Lemaitre L. Helical CT of the larynx: a comparative study with conventional CT scan. Clin Radiol 1996 Dec;51(12):882-5

Roberts CS, Galloway KP, Honaker JT, Hülse G, Seligson D. Sonography for the office screening of suspected rotator cuff tears: early experience of the orthopedic surgeon. Am J Orthop 1998 Jul;27(7):503-6

Robertson PL, Berlangieri SU, Goergen SK, Waugh JR, Kalff V, Stevens SN, Hicks RJ, Fabiny RP, Ugoni A, Kelly MJ. Comparison of ultrasound and blood pool scintigraphy in the diagnosis of lower limb deep venous thrombosis. Clin Radiol 1994 Jun;49(6):382-90

Roehrborn CG, Girman CJ, Rhodes T, Hanson KA, Collins GN, Sech SM, Jacobsen SJ, Garraway WM, Lieber MM. Correlation between prostate size estimated by digital rectal examination and measured by transrectal ultrasound. Urology 1997 Apr;49(4):548-57

Rofsky NM, Weinreb JC, Bernardino ME, Young SW, Lee JK, Noz ME. Hepatocellular tumors: characterization with Mn-DPDP-enhanced MR imaging. Radiology 1993 Jul;188(1):53-9

Rolak LA, Beck RW, Paty DW, Tourtellotte WW, Whitaker JN, Rudick RA. Cerebrospinal fluid in acute optic neuritis: experience of the optic neuritis treatment trial. Neurology 1996 Feb;46(2):368-72

Ronchera-Oms CL, Casillas C, Marti-Bonmati L, Poyatos C, Tomas J, Sobejano A, Jimenez NV. Oral chloral hydrate provides effective and safe sedation in paediatric magnetic resonance imaging. J Clin Pharm Ther 1994 Aug;19(4):239-43

Rorvik J, Halvorsen OJ, Albrektsen G, Ersland L, Daehlin L, Haukaas S. Use of pelvic surface coil MR imaging for assessment of clinically localized prostate cancer with histopathological correlation. Clin Radiol 1999 Mar;54(3):164-9

Rorvik J, Halvorsen OJ, Albrektsen G, Haukaas S. Lymphangiography combined with biopsy and computer tomography to detect lymph node metastases in localized prostate cancer. Scand J Urol Nephrol 1998 Apr;32(2):116-9

Rosch T, Lightdale CJ, Botet JF, Boyce GA, Sivak MV Jr, Yasuda K, Heyder N, Palazzo L, Dancygier H, Schusdziarra V, et al. Localization of pancreatic endocrine tumors by endoscopic ultrasonography. N Engl J Med 1992 Jun 25;326(26):1721-6

Rose DM, Delbeke D, Beauchamp RD, Chapman WC, Sandler MP, Sharp KW, Richards WO, Wright JK, Frexes ME, Pinson CW, Leach SD. 18Fluorodeoxyglucose-positron emission tomography in the management of patients with suspected pancreatic cancer. Ann Surg 1999 May;229(5):729-37; discussion 737-8

Rosen CL, Brown DF, Sagarin MJ, Chang Y, McCabe CJ, Wolfe RE. Ultrasonography by emergency physicians in patients with suspected ureteral colic. J Emerg Med 1998 Nov-Dec;16(6):865-70

Ross BD, Jacobson S, Villamil F, Korula J, Kreis R, Ernst T, Shonk T, Moats RA. Subclinical hepatic encephalopathy: proton MR spectroscopic abnormalities. Radiology 1994 Nov;193(2):457-63

Ross J. Two techniques of laparoscopic Burch repair for stress incontinence: a prospective, randomized study. J Am Assoc Gynecol Laparosc 1996 May;3(3):351-7

Rossi CR, Seno A, Vecchiato A, Foletto M, Tregnaghi A, De Candia A, Rubaltelli L, Montesco C, Lise M. The impact of ultrasound scanning in the staging and follow-up of patients with clinical stage I cutaneous melanoma. Eur J Cancer 1997 Feb;33(2):200-3

Rossi GP, Chiesura-Corona M, Tregnaghi A, Zanin L, Perale R, Soattin S, Pelizzo MR, Feltrin GP, Pessina AC. Imaging of aldosterone-secreting adenomas: a prospective comparison of computed tomography and magnetic resonance imaging in 27 patients with suspected primary aldosteronism. J Hum Hypertens 1993 Aug;7(4):357-63

Rother J, Jonetz-Mentzel L, Fiala A, Reichenbach JR, Herzau M, Kaiser WA, Weiller C. Hemodynamic assessment of acute stroke using dynamic single-slice computed tomographic perfusion imaging. Arch Neurol 2000 Aug;57(8):1161-6

Rotmensch S, Liberati M, Bronshtein M, Schoenfeld-Dimaio M, Shalev J, Ben-Rafael Z, Copel JA. Prenatal sonographic findings in 187 fetuses with Down syndrome. Prenat Diagn 1997 Nov;17(11):1001-9

Rovaris M, Inglese M, van Schijndel RA, Sormani MP, Rodegher M, Comi G, Filippi M. Sensitivity and reproducibility of volume change measurements of different brain portions on magnetic resonance imaging in patients with multiple sclerosis. J Neurol 2000 Dec;247(12):960-5

Rovaris M, Rodegher M, Comi G, Filippi M. Correlation between MRI and short-term clinical activity in multiple sclerosis: comparison between standard- and triple-dose Gd-enhanced MRI. Eur Neurol 1999;41(3):123-7

Roy C, Le Bras Y, Mangold L, Saussine C, Tuchmann C, Pfleger D, Jacqmin D. Small pelvic lymph node metastases: evaluation with MR imaging. Clin Radiol 1997 Jun;52(6):437-40

Roy C, Saussine C, Tuchmann C, Castel E, Lang H, Jacqmin D. Duplex Doppler sonography of the flaccid penis: potential role in the evaluation of impotence. J Clin Ultrasound 2000 Jul-Aug;28(6):290-4

Rozycki GS, Ballard RB, Feliciano DV, Schmidt JA, Pennington SD. Surgeon-performed ultrasound for the assessment of truncal injuries: lessons learned from 1540 patients. Ann Surg 1998 Oct;228(4):557-67

Rozycki GS, Feliciano DV, Ochsner MG, Knudson MM, Hoyt DB, Davis F, Hammerman D, Figueredo V, Harviel JD, Han DC, Schmidt JA. The role of ultrasound in patients with possible penetrating cardiac wounds: a prospective multicenter study. J Trauma 1999 Apr;46(4):543-51; discussion 551-2

Rubin DL, Falk KL, Sperling MJ, Ross M, Saini S, Rothman B, Shellock F, Zerhouni E, Stark D, Outwater EK, Schmiedl U, Kirby LC, Chezmar J, Coates T, Chang M, Silverman JM, Rofsky N, Burnett K, Engel J, Young SW. A multicenter clinical trial of Gadolite Oral Suspension as a contrast agent for MRI. J Magn Reson Imaging 1997 Sep-Oct;7(5):865-72

Ruch DS, Aldridge M, Holden M, Smith TL, Koman LA, Smith BP. Arterial reconstruction for radial artery occlusion. J Hand Surg [Am] 2000 Mar;25(2):282-90

Rudick RA, Cookfair DL, Simonian NA, Ransohoff RM, Richert JR, Jacobs LD, Herndon RM, Salazar AM, Fischer JS, Granger CV, Goodkin DE, Simon JH, Bartoszak DM, Bourdette DN, Braiman J, Brownscheidle CM, Coats ME, Cohan SL, Dougherty DS, Kinkel RP, Mass MK, Munchsauer FE, O'Reilly K, Priore RL, Whitham RH, et al. Cerebrospinal fluid abnormalities in a phase III trial of Avonex (IFNbeta-1a) for relapsing multiple sclerosis. The Multiple Sclerosis Collaborative Research Group. J Neuroimmunol 1999 Jan 1;93(1-2):8-14

Rypins EB, Evans DG, Hinrichs W, Kipper SL. Tc-99m-HMPAO white blood cell scan for diagnosis of acute appendicitis in patients with equivocal clinical presentation. Ann Surg 1997 Jul;226(1):58-65

S Blankenberg FG, Loh NN, Bracci P, D'Arceuil HE, Rhine WD, Norbash AM, Lane B, Berg A, Person B, Coutant M, Enzmann DR. Sonography, CT, and MR imaging: a prospective comparison of neonates with suspected intracranial ischemia and hemorrhage. AJNR Am J Neuroradiol 2000 Jan;21(1):213-8

Sadek AL, Schiotz HA. Transvaginal sonography in the management of ectopic pregnancy. Acta Obstet Gynecol Scand 1995 Apr;74(4):293-6

Saez F, Urresola A, Larena JA, Martin JI, Pijuan JI, Schneider J, Ibanez E. Endometrial carcinoma: assessment of myometrial invasion with plain and gadolinium-enhanced MR imaging. J Magn Reson Imaging 2000 Sep;12(3):460-6

Said B, McCart JA, Libutti SK, Choyke PL. Ferumoxide-enhanced MRI in patients with colorectal cancer and rising CEA: surgical correlation in early recurrence. Magn Reson Imaging 2000 Apr;18(3):305-9

Saidi MH, Sadler RK, Theis VD, Akright BD, Farhart SA, Villanueva GR. Comparison of sonography, sonohysterography, and hysteroscopy for evaluation of abnormal uterine bleeding. J Ultrasound Med 1997 Sep;16(9):587-91

Saifuddin A, Burnett SJ. The value of lumbar spine MRI in the assessment of the pars interarticularis. Clin Radiol 1997 Sep;52(9):666-71

Salinsky MC, Oken BS, Kramer RE, Morehead L. A comparison of quantitative EEG frequency analysis and conventional EEG in patients with focal brain lesions. Electroencephalogr Clin Neurophysiol 1992 Dec;83(6):358-66

San Roman JA, Vilacosta I, Castillo JA, Rollan MJ, Hernandez M, Peral V, Garcimartin I, de la Torre MM, Fernandez-Aviles F. Selection of the optimal stress test for the diagnosis of coronary artery disease. Heart 1998 Oct;80(4):370-6

Sandstede JJ, Bertsch G, Beer M, Kenn W, Werner E, Pabst T, Lipke C, Kretschmer S, Neubauer S, Hahn D. Detection of myocardial viability by low-dose dobutamine Cine MR imaging. Magn Reson Imaging 1999 Dec;17(10):1437-43

Sans M, Andreu V, Bordas JM, Llach J, Lopez-Guillermo A, Cervantes F, Bruguera M, Mondelo F, Montserrat E, Teres J, Rodes J. Usefulness of laparoscopy with liver biopsy in the assessment of liver involvement at diagnosis of Hodgkin's and non-Hodgkin's lymphomas. Gastrointest Endosc 1998 May;47(5):391-5

Sarfati MR, Fox KA, Warneke JA, Fajardo LL, Hunter GC, Rappaport WD. Stereotactic fine-needle aspiration cytology of nonpalpable breast lesions: an analysis of 258 consecutive aspirates. Am J Surg 1994 Dec;168(6):529-31; discussion 531-2

Sarihan H, Sari A, Abes M, Dinc H. Nonpalpable undescending testis. Value of magnetic resonance imaging. Minerva Urol Nefrol 1998 Dec;50(4):233-6

Sato DT, Goff CD, Gregory RT, Robinson KD, Carter KA, Herts BR, Vilsack HB, Gayle RG, Parent FN 3rd, DeMasi RJ, Meier GH. Endoleak after aortic stent graft repair: diagnosis by color duplex ultrasound scan versus computed tomography scan. J Vasc Surg 1998 Oct;28(4):657-63

Sau G, Siracusano S, Aiello I, d'Aloia G, Liguori G, Stener S, Lissiani A, Belgrano E. The usefulness of the somatosensory evoked potentials of the pudendal nerve in diagnosis of probable multiple sclerosis. Spinal Cord 1999 Apr;37(4):258-63

Sauve JS, Thorpe KE, Sackett DL, Taylor W, Barnett HJ, Haynes RB, Fox AJ. Can bruits distinguish high-grade from moderate symptomatic carotid stenosis? The North American Symptomatic Carotid Endarterectomy Trial. Ann Intern Med 1994 Apr 15;120(8):633-7

Scarabino T, Carriero A, Giannatempo GM, Marano R, De Matthaeis P, Bonomo L, Salvolini U. Contrast-enhanced MR angiography (CE MRA) in the study of the carotid stenosis: comparison with digital subtraction angiography (DSA). J Neuroradiol 1999 Jun;26(2):87-91

Schachter J, Hook EW 3rd, McCormack WM, Quinn TC, Chernesky M, Chong S, Girdner JI, Dixon PB, DeMeo L, Williams E, Cullen A, Lorincz A. Ability of the digene hybrid capture II test to identify Chlamydia trachomatis and Neisseria gonorrhoeae in cervical specimens. J Clin Microbiol 1999 Nov;37(11):3668-71

Schelling M, Braun M, Kuhn W, Bogner G, Gruber R, Gnirs J, Schneider KT, Ulm K, Rutke S, Staudach A. Combined transvaginal B-mode and color Doppler sonography for differential diagnosis of ovarian tumors: results of a multivariate logistic regression analysis. Gynecol Oncol 2000 Apr;77(1):78-86

Schillaci O, Monteleone F, D'Andrea N, Picardi V, Cangemi R, Cangemi V, Scopinaro F. Technetium-99m tetrofosmin single photon emission computed tomography in the evaluation of suspected lung cancer. Cancer Biother Radiopharm 1999 Apr;14(2):129-34

Schipper I, Hop WC, Fauser BC. The follicle-stimulating hormone (FSH) threshold/window concept examined by different interventions with exogenous FSH during the follicular phase of the normal menstrual cycle: duration, rather than magnitude, of FSH increase affects follicle development. J Clin Endocrinol Metab 1998 Apr;83(4):1292-8

Schirrmeister H, Guhlmann A, Elsner K, Kotzerke J, Glatting G, Rentschler M, Neumaier B, Träger H, Nussle K, Reske SN. Sensitivity in detecting osseous lesions depends on anatomic localization: planar bone scintigraphy versus 18F PET. J Nucl Med 1999 Oct;40(10):1623-9

Schlaepfer TE, Strain EC, Greenberg BD, Preston KL, Lancaster E, Bigelow GE, Barta PE, Pearlson GD. Site of opioid action in the human brain: mu and kappa agonists' subjective and cerebral blood flow effects. Am J Psychiatry 1998 Apr;155(4):470-3

Schlief R, Deichert U. Hysterosalpingo-contrast sonography of the uterus and fallopian tubes: results of a clinical trial of a new contrast medium in 120 patients. Radiology 1991 Jan;178(1):213-5

Schonewille WJ, Tuhrim S, Singer MB, Atlas SW. Diffusion-weighted MRI in acute lacunar syndromes. A clinical-radiological correlation study. Stroke 1999 Oct;30(10):2066-9

Schoot DC, Hop WC, de Jong FH, van Dessel TJ, Fauser BC. Initial estradiol response predicts outcome of exogenous gonadotropins using a step-down dose regimen for induction of ovulation in polycystic ovary syndrome. Fertil Steril 1995 Dec;64(6):1081-7

Schorr U, Blaschke K, Beige J, Distler A, Sharma AM. G-protein beta3 subunit 825T allele and response to dietary salt in normotensive men. J Hypertens 2000 Jul;18(7):855-9

Schröter A, Zerr I, Henkel K, Tschampa HJ, Finkenstaedt M, Poser S. Magnetic resonance imaging in the clinical diagnosis of Creutzfeldt-Jakob disease. Arch Neurol 2000 Dec;57(12):1751-7

Schutter EM, Sohn C, Kristen P, Mobus V, Crombach G, Kaufmann M, Caffier H, Kreienberg R, Verstraeten AA, Kenemans P. Estimation of probability of malignancy using a logistic model combining physical examination, ultrasound, serum CA 125, and serum CA 72-4 in postmenopausal women with a pelvic mass: an international multicenter study. Gynecol Oncol 1998 Apr;69(1):56-63

Schwarz J, Tatsch K, Gasser T, Arnold G, Oertel WH. [123]IBZM binding predicts dopaminergic responsiveness in patients with parkinsonism and previous dopaminomimetic therapy. Mov Disord 1997 Nov;12(6):898-902

Schwarzler P, Concin H, Bosch H, Berlinger A, Wohlgenannt K, Collins WP, Bourne TH. An evaluation of sonohysterography and diagnostic hysteroscopy for the assessment of intrauterine pathology. Ultrasound Obstet Gynecol 1998 May;11(5):337-42

Schwerk WB, Wichtrup B, Rothmund M, Ruschoff J. Ultrasonography in the diagnosis of acute appendicitis: a prospective study. Gastroenterology 1989 Sep;97(3):630-9

Scopinaro F, Pani R, De Vincentis G, Soluri A, Pellegrini R, Porfiri LM. High-resolution scintimammography improves the accuracy of technetium-99m methoxyisobutylisonitrile scintimammography: use of a new dedicated gamma camera. Eur J Nucl Med 1999 Oct;26(10):1279-88

Scott WW Jr, Rosenbaum JE, Ackerman SJ, Reichle RL, Magid D, Weiler JC, Gitlin JN. Subtle orthopedic fractures: teleradiology workstation versus film interpretation. Radiology 1993 Jun;187(3):811-5

Secher NJ, Hansen PK, Lenstrup C, Eriksen PS. Controlled trial of ultrasound screening for light for gestational age (LGA) infants in late pregnancy. Eur J Obstet Gynecol Reprod Biol 1986 Dec;23(5-6):307-13

Segall GM, Davis MJ. Prone versus supine thallium myocardial SPECT: a method to decrease artifactual inferior wall defects. J Nucl Med 1989 Apr;30(4):548-55

Seiderer M. Phase III clinical studies with gadoteridol for the evaluation of neurologic pathology. A European perspective. Invest Radiol 1992 Aug;27 Suppl 1:S33-8

Seidman AD, Reichman BS, Crown JP, Yao TJ, Currie V, Hakes TB, Hudis CA, Gilewski TA, Baselga J, Forsythe P, et al. Paclitaxel as second and subsequent therapy for metastatic breast cancer: activity independent of prior anthracycline response. J Clin Oncol 1995 May;13(5):1152-9

Semelka RC, Schlund JF, Molina PL, Willms AB, Kahlenberg M, Mauro MA, Weeks SM, Cance WG. Malignant liver lesions: comparison of spiral CT arterial portography and MR imaging for diagnostic accuracy, cost, and effect on patient management. J Magn Reson Imaging 1996 Jan-Feb;6(1):39-43

Sengoku K, Satoh T, Saitoh S, Abe M, Ishikawa M. Evaluation of transvaginal color Doppler sonography, transvaginal sonography and CA 125 for prediction of ovarian malignancy. Int J Gynaecol Obstet 1994 Jul;46(1):39-43

Sengupta DK, Kirollos R, Findlay GF, Smith ET, Pearson JC, Pigott T. The value of MR imaging in differentiating between hard and soft cervical disc disease: a comparison with intraoperative findings. Eur Spine J 1999;8(3):199-204

Senior R, Lahiri A. Enhanced detection of myocardial ischemia by stress dobutamine echocardiography utilizing the "biphasic" response of wall thickening during low and high dose dobutamine infusion. J Am Coll Cardiol 1995 Jul;26(1):26-32

Sensier Y, Bell PR, London NJ. The ability of qualitative assessment of the common femoral Doppler waveform to screen for significant aortoiliac disease. Eur J Vasc Endovasc Surg 1998 Apr;15(4):357-64

Serafini AN, Klein JL, BG, Baum R, Chetanneau A, Pecking A, Fischman AJ, Hoover HC Jr, Wynant GE, Subramanian R, Goroff DK, Hanna MG Jr. Radioimmunoscintigraphy of recurrent, metastatic, or occult colorectal cancer with technetium 99m-labeled totally human monoclonal antibody 88BV59: results of pivotal, phase III multicenter studies. J Clin Oncol 1998 May;16(5):1777-87

Serfaty JM, Chirossel P, Chevallier JM, Ecochard R, Froment JC, Douek PC. Accuracy of three-dimensional gadolinium-enhanced MR angiography in the assessment of extracranial carotid artery disease. AJR Am J Roentgenol 2000 Aug;175(2):455-63

Serle JB, Lustgarten J, Lippa EA, Camras CB, Framm L, Payne JE, Deasy D, Podos SM. Six week safety study of 2% MK-927 administered twice daily to ocular hypertensive volunteers. J Ocul Pharmacol 1992 Spring;8(1):1-9

Servadei F, Faccani G, Roccella P, Seracchioli A, Godano U, Ghadirpour R, Naddeo M, Piazza G, Carrieri P, Taggi F, et al. Asymptomatic extradural haematomas. Results of a multicenter study of 158 cases in minor head injury. Acta Neurochir (Wien) 1989;96(1-2):39-45

Severens JL, Oerlemans HM, Weegels AJ, van 't Hof MA, Oostendorp RA, Goris RJ. Cost-effectiveness analysis of adjuvant physical or occupational therapy for patients with reflex sympathetic dystrophy. Arch Phys Med Rehabil 1999 Sep;80(9):1038-43

Shackford SR, Wald SL, Ross SE, Cogbill TH, Hoyt DB, Morris JA, Mucha PA, Pachter HL, Sugerman HJ, O'Malley K, et al. The clinical utility of computed tomographic scanning and neurologic examination in the management of patients with minor head injuries. J Trauma 1992 Sep;33(3):385-94

Shalev J, Meizner I, Bar-Hava I, Dicker D, Mashiach R, Ben-Rafael Z. Predictive value of transvaginal sonography performed before routine diagnostic hysteroscopy for evaluation of infertility. Fertil Steril 2000 Feb;73(2):412-7

Sham J, Choy D, Kwong PW, Cheng AC, Kwong DL, Yau CC, Wan KY, Au GK. Radiotherapy for nasopharyngeal carcinoma: shielding the pituitary may improve therapeutic ratio. Int J Radiat Oncol Biol Phys 1994 Jul 1;29(4):699-704

Shannon JJ, Bude RO, Orens JB, Becker FS, Whyte RI, Rubin JM, Quint LE, Martinez FJ. Endobronchial ultrasound-guided needle aspiration of mediastinal adenopathy. Am J Respir Crit Care Med 1996 Apr;153(4 Pt 1):1424-30

Sharma AK, Cherry R, Fielding JW. A randomised trial of selective or routine on-table cholangiography. Ann R Coll Surg Engl 1993 Jul;75(4):245-8

Shimizu T, Salvador L, Hughes-Benzie R, Dawson L, Nimrod C, Allanson J. The role of reduced ear size in the prenatal detection of chromosomal abnormalities. Prenat Diagn 1997 Jun;17(6):545-9

Shin SM, Choi JK. Effect of indomethacin phonophoresis on the relief of temporomandibular joint pain. Cranio 1997 Oct;15(4):345-8

Shinghal R, Terris MK. Limitations of transperineal ultrasound-guided prostate biopsies. Urology 1999 Oct;54(4):706-8

Shipley CF 3rd, Simmons CL, Nelson GH. Comparison of transvaginal sonography with endometrial biopsy in asymptomatic postmenopausal women. J Ultrasound Med 1994 Feb;13(2):99-104

Shokeir AA, Provoost AP, el-Azab M, Dawaba M, Nijman RJ. Renal Doppler ultrasound in children with obstructive uropathy: effect of intravenous normal saline fluid load and furosemide. J Urol 1996 Oct;156(4):1455-8

Shonk TK, Moats RA, Gifford P, Michaelis T, Mandigo JC, Izumi J, Ross BD. Probable Alzheimer disease: diagnosis with proton MR spectroscopy. Radiology 1995 Apr;195(1):65-72

Shoup M, Hodul P, Aranha GV, Choe D, Olson M, Leya J, Losurdo J. Defining a role for endoscopic ultrasound in staging periampullary tumors. Am J Surg 2000 Jun;179(6):453-6

Silvestri GA, Hoffman BJ, Bhutani MS, Hawes RH, Coppage L, Sanders-Cliette A, Reed CE. Endoscopic ultrasound with fine-needle aspiration in the diagnosis and staging of lung cancer. Ann Thorac Surg 1996 May;61(5):1441-5; discussion 1445-6

Simonovsky V. The diagnosis of cirrhosis by high resolution ultrasound of the liver surface. Br J Radiol 1999 Jan;72(853):29-34

Singer PS, Hopper KD, Jozefiak JA, Patrone SV, Kasales CJ, Mahraj RP, Tenhave TR, Tully DA. Extended pitch thoracic helical CT 47. Clin Imaging 1998 Jan-Feb;22(1):11-4

Singh AK, Malpass TS, Walker G. Ultrasonic assessment of injuries to the lateral complex of the ankle. Arch Emerg Med 1990 Jun;7(2):90-4

Sirisriro R, Podoloff DA, Part YZ, Curley SA, Kasi LP, Bhadkamkar VA, Kim EE, Murray JL, Smith R, Haynie TP. 99Tcm-IMMU4 imaging in recurrent colorectal cancer: efficacy and impact on surgical management. Nucl Med Commun 1996 Jul;17(7):568-76

Sjogren M, Gustafson L, Wikkelso C, Wallin A. Frontotemporal dementia can be distinguished from Alzheimer's disease and subcortical white matter dementia by an anterior-to-posterior rCBF-SPET ratio. Dement Geriatr Cogn Disord 2000 Sep-Oct;11(5):275-85

Smit BJ, Albrecht CF, Liebenberg RW, Kruger PB, Freestone M, Gouws L, Theron E, Bouic PJ, Etsebeth S, van Jaarsveld PP. A phase I trial of hypoxoside as an oral prodrug for cancer therapy-absence of toxicity. S Afr Med J 1995 Sep;85(9):865-70

Soares SR, Barbosa dos Reis MM, Camargos AF. Diagnostic accuracy of sonohysterography, transvaginal sonography, and hysterosalpingography in patients with uterine cavity diseases. Fertil Steril 2000 Feb;73(2):406-11

Sostman HD, Layish DT, Tapson VF, Spritzer CE, DeLong DM, Trotter P, MacFall JR, Patz EF Jr, Goodman PC, Woodard PK, Foo TK, Färber JL. Prospective comparison of helical CT and MR imaging in clinically suspected acute pulmonary embolism. J Magn Reson Imaging 1996 Mar-Apr;6(2):275-81

Soto JA, Barish MA, Yucel EK, Clarke P, Siegenberg D, Chuttani R, Ferrucci JT. Pancreatic duct: MR cholangiopancreatography with a three-dimensional fast spin-echo technique. Radiology 1995 Aug;196(2):459-64

Soto JA, Barish MA, Yucel EK, Siegenberg D, Ferrucci JT, Chuttani R. Magnetic resonance cholangiography: comparison with endoscopic retrograde cholangiopancreatography. Gastroenterology 1996 Feb;110(2):589-97

Specificity of screening in United Kingdom trial of early detection of breast cancer. BMJ 1992 Feb 8;304(6823):346-9

Speck O, Chang L, DeSilva NM, Ernst T. Perfusion MRI of the human brain with dynamic susceptibility contrast: gradient-echo versus spin-echo techniques. J Magn Reson Imaging 2000 Sep;12(3):381-7

Spencer JA, Chng WJ, Hudson E, Boon AP, Whelan P. Prostate specific antigen level and Gleason score in predicting the stage of newly diagnosed prostate cancer. Br J Radiol 1998 Nov;71(851):1130-5

Spies KP, Fobbe F, El-Bedewi M, Wolf KJ, Distler A, Schulte KL. Color-coded duplex sonography for noninvasive diagnosis and grading of renal artery stenosis. Am J Hypertens 1995 Dec;8(12 Pt 1):1222-31

Spreafico C, Marchiano A, Mazzaferro V, Frigerio LF, Regalia E, Lanocita R, Patelli G, Andreola S, Garbagnati F, Damascelli B. Hepatocellular carcinoma in patients who undergo liver transplantation: sensitivity of CT with iodized oil. Radiology 1997 May;203(2):457-60

Srinivasan J, Mayberg MR, Weiss DG, Eskridge J. Duplex accuracy compared with angiography in the Veterans Affairs Cooperative Studies Trial for Symptomatic Carotid Stenosis. Neurosurgery 1995 Apr;36(4):648-53; discussion 653-5

Stafford RE, McGonigal MD, Weigelt JA, Johnson TJ. Oral contrast solution and computed tomography for blunt abdominal trauma: a randomized study. Arch Surg 1999 Jun;134(6):622-6; discussion 626-7

Stanford W, Brundage BH, MacMillan R, Chomka EV, Bateman TM, Eldredge WJ, Lipton MJ, White CW, Wilson RF, Johnson MR, et al. Sensitivity and specificity of assessing coronary bypass graft patency with ultrafast computed tomography: results of a multicenter study. J Am Coll Cardiol 1988 Jul;12(1):1-7

Stark DD, Wittenberg J, Butch RJ, Ferrucci JT Jr. Hepatic metastases: randomized, controlled comparison of detection with MR imaging and CT. Radiology 1987 Nov;165(2):399-406

Starkman MN, Giordani B, Gebarski SS, Berent S, Schork MA, Schteingart DE. Decrease in cortisol reverses human hippocampal atrophy following treatment of Cushing's disease. Biol Psychiatry 1999 Dec 15;46(12):1595-602

Stokkel MP, Bakker PF, Heine R, Schlosser NJ, Lammers JW, Van Rijk PP. Staging of lymph nodes with FDG dual-headed PET in patients with non-small-cell lung cancer. Nucl Med Commun 1999 Nov;20(11):1001-7

Stokroos RJ, Albers FW, Krikke AP, Casselman JW. Magnetic resonance imaging of the inner ear in patients with idiopathic sudden sensorineural hearing loss. Eur Arch Otorhinolaryngol 1998;255(9):433-6

Stonecipher MR, Jorizzo JL, Monu J, Walker F, Sutej PG. Dermatomyositis with normal muscle enzyme concentrations. A single-blind study of the diagnostic value of magnetic resonance imaging and ultrasound. Arch Dermatol 1994 Oct;130(10):1294-9

Straathof CS, de Bruin HG, Dippel DW, Vecht CJ. The diagnostic accuracy of magnetic resonance imaging and cerebrospinal fluid cytology in leptomeningeal metastasis. J Neurol 1999 Sep;246(9):810-4

Strandell A, Bourne T, Bergh C, Granberg S, Asztely M, Thorburn J. The assessment of endometrial pathology and tubal patency: a comparison between the use of ultrasonography and X-ray hysterosalpingography for the investigation of infertility patients. Ultrasound Obstet Gynecol 1999 Sep;14(3):200-4

Strotzer M, Gmeinwieser J, Schmidt J, Fellner C, Seitz J, Albrich H, Zirngibl H, Feuerbach S. Diagnosis of liver metastases from colorectal adenocarcinoma. Comparison of spiral-CTAP combined with intravenous contrast-enhanced spiral-CT and SPIO-enhanced MR combined with plain MR imaging. Acta Radiol 1997 Nov;38(6):986-92

Stroumbakis N, Cookson MS, Reuter VE, Fair WR. Clinical significance of repeat sextant biopsies in prostate cancer patients. Urology 1997 Mar;49(3A Suppl):113-8

Strupp M, Bruning R, Wu RH, Deimling M, Reiser M, Brandt T. Diffusion-weighted MRI in transient global amnesia: elevated signal intensity in the left mesial temporal lobe in 7 of 10 patients. Ann Neurol 1998 Feb;43(2):164-70

Stumpe KD, Urbinelli M, Steinert HC, Glanzmann C, Buck A, von Schulthess GK. Whole-body positron emission tomography using fluorodeoxyglucose for staging of lymphoma: effectiveness and comparison with computed tomography. Eur J Nucl Med 1998 Jul;25(7):721-8

Subhannachart P, Tangthangtham A, Koanantakool T, Tungsagunwattana S. Percutaneous transthoracic needle aspiration biopsy of localized lung lesions under fluoroscopic or ultrasound guidance. J Med Assoc Thai 1999 Mar;82(3):268-74

Sugiyama M, Atomi Y. Endoscopic ultrasonography for diagnosing choledocholithiasis: a prospective comparative study with ultrasonography and computed tomography. Gastrointest Endosc 1997 Feb;45(2):143-6

Suri R, Gupta S, Gupta SK, Singh K, Suri S. Ultrasound guided fine needle aspiration cytology in abdominal tuberculosis. Br J Radiol 1998 Jul;71(847):723-7

Surwit EA, Childers JM, Krag DN, Katterhagen JG, Gallion HH, Waggoner S, Mann WJ Jr. Clinical assessment of 111In-CYT-103 immunoscintigraphy in ovarian cancer. Gynecol Oncol 1993 Mar;48(3):285-92

Taipale P, Hiilesmaa V. Sonographic measurement of uterine cervix at 18-22 weeks' gestation and the risk of preterm delivery. Obstet Gynecol 1998 Dec;92(6):902-7

Takac I. Analysis of blood flow in adnexal tumors by using color Doppler imaging and pulsed spectral analysis. Ultrasound Med Biol 1998 Oct;24(8):1137-41

Takagi A, Tsurumi Y, Ishii Y, Suzuki K, Kawana M, Kasanuki H. Clinical potential of intravascular ultrasound for physiological assessment of coronary stenosis: relationship between quantitative ultrasound tomography and pressure-derived fractional flow reserve. Circulation 1999 Jul 20;100(3):250-5

Takahashi K, Okada S, Ozaki T, Kitao M, Sugimura K. Diagnosis of pelvic endometriosis by magnetic resonance imaging using "fat-saturation" technique. Fertil Steril 1994 Nov;62(5):973-7

Takeishi Y, Sukekawa H, Saito H, Nishimura S, Shibu T, Sasaki Y, Tomoike H. Left ventricular function and myocardial perfusion during dipyridamole infusion assessed by a single injection of 99Tcm-sestamibi in patients unable to exercise. Nucl Med Commun 1994 Sep;15(9):697-703

Tan TY. Non-contrast high resolution fast spin echo magnetic resonance imaging of acoustic schwannoma. Singapore Med J 1999 Jan;40(1):27-3

Taneja K, Sharma S, Kumar K, Rajani M. Comparison of computed tomography and cineangiography in the demonstration of central pulmonary arteries in cyanotic congenital heart disease. Cardiovasc Intervent Radiol 1996 Mar-Apr;19(2):97-100

Taneja SS, Penson DF, Epelbaum A, Handler T, Lepor H. Does site specific labeling of sextant biopsy cores predict the site of extracapsular extension in radical prostatectomy surgical specimen. J Urol 1999 Oct;162(4):1352-7; discussion 1357-8

Tarin JJ, Conaghan J, Winston RM, Handyside AH. Human embryo biopsy on the 2nd day after insemination for preimplantation diagnosis: removal of a quarter of embryo retards cleavage. Fertil Steril 1992 Nov;58(5):970-6

Tavassoli K, Cavalla P, Porcelli A, Surico N. Ultrasound diagnostic criteria in breast disease. Panminerva Med 1997 Sep;39(3):178-82

Taylor AM, Thorne SA, Rubens MB, Jhooti P, Keegan J, Gatehouse PD, Wiesmann F, Grothues F, Somerville J, Pennell DJ. Coronary artery imaging in grown up congenital heart disease: complementary role of magnetic resonance and x-ray coronary angiography. Circulation 2000 Apr 11;101(14):1670-8

Taylor DJ, Kemp GJ, Radda GK. Bioenergetics of skeletal muscle in mitochondrial myopathy. J Neurol Sci 1994 Dec 20;127(2):198-206

Taylor MJ, Denbow ML, Tanawattanacharoen S, Gannon C, Cox PM, Fisk NM. Doppler detection of arterio-arterial anastomoses in monochorionic twins: feasibility and clinical application. Hum Reprod 2000 Jul;15(7):1632-6

Tepper R, Beyth Y, Altaras MM, Zalel Y, Shapira J, Cordoba M, Cohen I. Value of sonohysterography in asymptomatic postmenopausal tamoxifen-treated patients. Gynecol Oncol 1997 Mar;64(3):386-91

Tepper R, Zalel Y, Altaras M, Ben-Baruch G, Beyth Y. Transvaginal color Doppler ultrasound in the assessment of invasive cervical carcinoma. Gynecol Oncol 1996 Jan;60(1):26-9

Thomas B, Falcone RE, Vasquez D, Santanello S, Townsend M, Hockenberry S, Innes J, Wanamaker S. Ultrasound evaluation of blunt abdominal trauma: program implementation, initial experience, and learning curve. J Trauma 1997 Mar;42(3):384-8; discussion 388-90

Thompson CM, Puterman AS, Linley LL, Hann FM, van der Elst CW, Molteno CD, Malan AF. The value of a scoring system for hypoxic ischaemic encephalopathy in predicting neurodevelopmental outcome. Acta Paediatr 1997 Jul;86(7):757-61

Thourani VH, Pettitt BJ, Schmidt JA, Cooper WA, Rozycki GS. Validation of surgeon-performed emergency abdominal ultrasonography in pediatric trauma patients. J Pediatr Surg 1998 Feb;33(2):322-8

Thurfjell E, Thurfjell MG, Egge E, Bjurstam N. Sensitivity and specificity of computer-assisted breast cancer detection in mammography screening. Acta Radiol 1998 Jul;39(4):384-8

Tilanus-Linthorst MM, Obdeijn IM, Bartels KC, de Koning HJ, Oudkerk M. First experiences in screening women at high risk for breast cancer with MR imaging. Breast Cancer Res Treat 2000 Sep;63(1):53-60

Tomiyama H, Kimura Y, Sakuma Y, Shiojima K, Yamamoto A, Saito I, Ishikawa Y, Yoshida H, Morita S, Doba N. Effects of an ACE inhibitor and a calcium channel blocker on cardiovascular autonomic nervous system and carotid distensibility in patients with mild to moderate hypertension. Am J Hypertens 1998 Jun;11(6 Pt 1):682-9

Tongsong T, Wanapirak C, Srisomboon J, Sirichotiyakul S, Polsrisuthikul T, Pongsatha S. Transvaginal ultrasound in threatened abortions with empty gestational sacs. Int J Gynaecol Obstet 1994 Sep;46(3):297-301

Toni D, Iweins F, von Kummer R, Busse O, Bogousslavsky J, Falcou A, Lesaffre E, Lenzi GL. Identification of lacunar infarcts before thrombolysis in the ECASS I study. Neurology 2000 Feb 8;54(3):684-8

Torstensen ET, Hollinshead RM. Comparison of magnetic resonance imaging and arthroscopy in the evaluation of shoulder pathology. J Shoulder Elbow Surg 1999 Jan-Feb;8(1):42-5

Toubert ME, Cyna-Gorse F, Zagdanski AM, Noel-Wekstein S, Cattan P, Billotey C, Sarfati E, Rain JD. Cervicomediastinal magnetic resonance imaging in persistent or recurrent papillary thyroid carcinoma: clinical use and limits. Thyroid 1999 Jun;9(6):591-7

Tremaine MD, Choroszy CJ, Gordon GH, Menking SA. Diagnosis of deep venous thrombosis by compression ultrasound in knee arthroplasty patients. J Arthroplasty 1992 Jun;7(2):187-92

Trojano M, Avolio C, Ruggieri M, Defazio G, Giuliani F, Paolicelli D, Livrea P. Serum soluble intercellular adhesion molecule-1 in MS: relation to clinical and Gd-MRI activity and to rIFN beta-1b treatment. Mult Scler 1998 Jun;4(3):183-7

Tsuda H, Kawabata M, Kawabata K, Yamamoto K, Hidaka A, Umesaki N. Comparison between transabdominal and transvaginal ultrasonography for identifying endometrial malignancies. Gynecol Obstet Invest 1995;40(4):271-3

Tublin ME, Dodd GD 3rd, Verdile VP. Acute renal colic: diagnosis with duplex Doppler US. Radiology 1994 Dec;193(3):697-701

Turner CH, Peacock M, Timmerman L, Neal JM, Johnson CC Jr. Calcaneal ultrasonic measurements discriminate hip fracture independently of bone mass. Osteoporos Int 1995 Mar;5(2):130-5

Tuzcu EM, Berkalp B, De Franco AC, Ellis SG, Goormastic M, Whitlow PL, Franco I, Raymond RE, Nissen SE. The dilemma of diagnosing coronary calcification: angiography versus intravascular ultrasound. J Am Coll Cardiol 1996 Mar 15;27(4):832-8

Tuzel E, Sevinc M, Obuz F, Sade M, Kirkali Z. Is magnetic resonance imaging necessary in the staging of prostate cancer? Urol Int 1998;61(4):227-31

Ucakhan O, Salih M, Altan S, Özdemir O. Color Doppler ultrasound in ocular Behcet's disease. Eur J Ophthalmol 1997 Jul-Sep;7(3):256-61

Uggowitzer MM, Kugler C, Machan L, Hausegger KA, Groell R, Quehenberger F, Lindbichler F, Schreyer H. Value of echo-enhanced Doppler sonography in evaluation of transjugular intrahepatic portosystemic shunts. AJR Am J Roentgenol 1998 Apr;170(4):1041-6

Uno H, Koide T, Kuriyama M, Ban Y, Deguchi T, Kawada Y. Prostate specific antigen density for discriminating prostate cancer from benign prostatic hyperplasia in the gray zone of prostate-specific antigen. Hinyokika Kiyo 1999 Jul;45(7):457-61

Valk PE, Abella-Columna E, Haseman MK, Pounds TR, Tesar RD, Myers RW, Greiss HB, Hofer GA. Whole-body PET imaging with [18F]fluorodeoxyglucose in management of recurrent colorectal cancer. Arch Surg 1999 May;134(5):503-11; discussion 511-3

Van den Bosch T, Vandendael A, Van Schoubroeck D, Wranz PA, Lombard CJ. Combining vaginal ultrasonography and office endometrial sampling in the diagnosis of endometrial disease in postmenopausal women. Obstet Gynecol 1995 Mar;85(3):349-52

van der Wall EE, Vliegen HW, de Roos A, Bruschke AV. Magnetic resonance techniques for assessment of myocardial viability. J Cardiovasc Pharmacol 1996;28 Suppl 1:S37-44

van Gils AP, van den Berg R, Falke TH, Bloem JL, Prins HJ, Dillon EH, van der Mey AG, Pauwels EK. MR diagnosis of paraganglioma of the head and neck: value of contrast enhancement. AJR Am J Roentgenol 1994 Jan;162(1):147-53

van Heesewijk HP, van der Graaf Y, de Valois JC, Vos JA, Feldberg MA. Chest imaging with a selenium detector versus conventional film radiography: a CT-controlled study. Radiology 1996 Sep;200(3):687-90

van Lankeren W, Gussenhoven EJ, Pieterman H, van Sambeek MR, van der Lugt A. Comparison of angiography and intravascular ultrasound before and after balloon angioplasty of the femoropopliteal artery. Cardiovasc Intervent Radiol 1998 Sep-Oct;21(5):367-74

van Nagell JR Jr, DePriest PD, Reedy MB, Gallion HH, Ueland FR, Pavlik EJ, Kryscio RJ. The efficacy of transvaginal sonographic screening in asymptomatic women at risk for ovarian cancer. Gynecol Oncol 2000 Jun;77(3):350-6

Van Oosterhout AG, Ganzevles PG, Wilmink JT, De Geus BW, Van Vonderen RG, Twijnstra A. Sequelae in long-term survivors of small cell lung cancer. Int J Radiat Oncol Biol Phys 1996 Mar 15;34(5):1037-44

Van Overhagen H, Lameris JS, Berger MY, Tilanus HW, Van Pel R, Klooswijk AI, Schutte HE. Improved assessment of supraclavicular and abdominal metastases in oesophageal and gastro-oesophageal junction carcinoma with the combination of ultrasound and computed tomography. Br J Radiol 1993 Mar;66(783):203-8

Vanbeckevoort D, Van Hoe L, Oyen R, Ponette E, De Ridder D, Deprest J. Pelvic floor descent in females: comparative study of colpocystodefecography and dynamic fast MR imaging. J Magn Reson Imaging 1999 Mar;9(3):373-7

Vanharanta H, Ohnmeiss DD, Aprill CN. Vibration pain provocation can improve the specificity of MRI in the diagnosis of symptomatic lumbar disc rupture. Clin J Pain 1998 Sep;14(3):239-47

Varghese JC, Farrell MA, Courtney G, Osborne H, Murray FE, Lee MJ. A prospective comparison of magnetic resonance cholangiopancreatography with endoscopic retrograde cholangiopancreatography in the evaluation of patients with suspected biliary tract disease. Clin Radiol 1999 Aug;54(8):513-20

Vassiliades VG, Foley WD, Alarcon J, Lawson T, Erickson S, Kneeland JB, Steinberg HV, Bernardino ME. Hepatic metastases: CT versus MR imaging at 1.5T. Gastrointest Radiol 1991 Spring;16(2):159-63

Vetto J, Pommier R, Schmidt W, Wachtel M, DuBois P, Jones M, Thurmond A. Use of the "triple test" for palpable breast lesions yields high diagnostic accuracy and cost savings. Am J Surg 1995 May;169(5):519-22

Vetto JT, Pommier RF, Schmidt WA, Eppich H, Alexander PW. Diagnosis of palpable breast lesions in younger women by the modified triple test is accurate and cost-effective. Arch Surg 1996 Sep;131(9):967-72; discussion 972-4

Vieweg J, Teh C, Freed K, Leder RA, Smith RH, Nelson RH, Preminger GM. Unenhanced helical computerized tomography for the evaluation of patients with acute flank pain. J Urol 1998 Sep;160(3 Pt 1):679-84

Vijverberg PL, Giessen MC, Kurth KH, Dabhoiwala NF, de Reijke TM, van den Tweel JG. Is preoperative transrectal ultrasonography of value in localised prostatic carcinoma? A blind comparative study between preoperative transrectal ultrasonography and the histopathological radical prostatectomy specimen. Eur J Surg Oncol 1992 Oct;18(5):449-55

Villalta S, Prandoni P, Cogo A, Bagatella P, Piccioli A, Bernardi E, Simioni P, Scarano L, Girolami A. The utility of non-invasive tests for detection of previous proximal-vein thrombosis. Thromb Haemost 1995 Apr;73(4):592-6

Vitola JV, Delbeke D, Sandler MP, Campbell MG, Powers TA, Wright JK, Chapman WC, Pinson CW. Positron emission tomography to stage suspected metastatic colorectal carcinoma to the liver. Am J Surg 1996 Jan;171(1):21-6

Volpe CM, Abdel-Nabi HH, Kulaylat MN, Doerr RJ. Results of immunoscintigraphy using a cocktail of radiolabeled monoclonal antibodies in the detection of colorectal cancer. Ann Surg Oncol 1998 Sep;5(6):489-94

von Kummer R, Bourquain H, Bastianello S, Bozzao L, Manelfe C, Meier D, Hacke W. Early prediction of irreversible brain damage after ischemic stroke at CT. Radiology 2001 Apr;219(1):95-100

Vuento MH, Pirhonen JP, Makinen JI, Tyrkko JE, Laippala PJ, Gronroos M, Salmi TA. Screening for endometrial cancer in asymptomatic postmenopausal women with conventional and colour Doppler sonography. Br J Obstet Gynaecol 1999 Jan;106(1):14-20

Wagner UA, Diedrich V, Schmitt O. Determination of skeletal maturity by ultrasound: a preliminary report. Skeletal Radiol 1995 Aug;24(6):417-20

Walker S, Haun W, Clark J, McMillin K, Zeren F, Gilliland T. The value of limited computed tomography with rectal contrast in the diagnosis of acute appendicitis. Am J Surg 2000 Dec;180(6):450-4; discussion 454-5

Waller D, Clarke S, Tsang G, Rajesh P. Is there a role for video-assisted thoracoscopy in the staging of non-small cell lung cancer? Eur J Cardiothorac Surg 1997 Aug;12(2):214-7

Walli R, Michl GM, Muhlbayer D, Brinkmann L, Goebel FD. Effects of troglitazone on insulin sensitivity in HIV-infected patients with protease inhibitor-associated diabetes mellitus. Res Exp Med (Berl) 2000 Apr;199(5):253-62

Wang Q, Takashima S, Fukuda H, Takayama F, Kobayashi S, Sone S. Detection of medullary thyroid carcinoma and regional lymph node metastases by magnetic resonance imaging. Arch Otolaryngol Head Neck Surg 1999 Aug;125(8):842-

Wax JR, Royer D, Mather J, Chen C, Aponte-Garcia A, Steinfeld JD, Ingardia CJ. A preliminary study of sonographic grading of fetal intracardiac echogenic foci: feasibility, reliability and association with aneuploidy. Ultrasound Obstet Gynecol 2000 Aug;16(2):123-7

Webb WR, Gatsonis C, Zerhouni EA, Heelan RT, Glazer GM, Francis IR, McNeil BJ. CT and MR imaging in staging non-small cell bronchogenic carcinoma: report of the Radiologic Diagnostic Oncology Group. Radiology 1991 Mar;178(3):705-13

Weber W, Young C, Abdel-Dayem HM, Sfakianakis G, Weir GJ, Swaney CM, Gates M, Stokkel MP, Parker A, Hines H, Khanvali B, Liebig JR, Leung AN, Sollitto R, Caputo G, Wagner HN Jr. Assessment of pulmonary lesions with 18F-fluorodeoxyglucose positron imaging using coincidence mode gamma cameras. J Nucl Med 1999 Apr;40(4):574-8

Wegener WA, Petrelli N, Serafini A, Goldenberg DM. Safety and efficacy of arcitumomab imaging in colorectal cancer after repeated administration. J Nucl Med 2000 Jun;41(6):1016-20

Weishaupt D, Zanetti M, Boos N, Hodler J. MR imaging and CT in osteoarthritis of the lumbar facet joints. Skeletal Radiol 1999 Apr;28(4):215-9

Weng E, Tran L, Rege S, Safa A, Sadeghi A, Juillard G, Mark R, Santiago S, Brown C, Mandelkern M. Accuracy and clinical impact of mediastinal lymph node staging with FDG-PET imaging in potentially resectable lung cancer. Am J Clin Oncol 2000 Feb;23(1):47-52

Westra SJ, Hurteau J, Galindo A, McNitt-Gray MF, Boechat MI, Laks H. Cardiac electron-beam CT in children undergoing surgical repair for pulmonary atresia. Radiology 1999 Nov;213(2):502-12

Whitlock JP, Evans AJ, Burrell HC, Pinder SE, Ellis IO, Blamey RW, Wilson AR. Digital imaging improves upright stereotactic core biopsy of mammographic microcalcifications. Clin Radiol 2000 May;55(5):374-7

Whitmore SE, Anhalt GJ, Provost TT, Zacur HA, Hamper UM, Helzlsouer KJ, Rosenshein NB. Serum CA-125 screening for ovarian cancer in patients with dermatomyositis. Gynecol Oncol 1997 May;65(2):241-4

Wiebe S, Lee DH, Karlik SJ, Hopkins M, Vandervoort MK, Wong CJ, Hewitt L, Rice GP, Ebers GC, Noseworthy JH. Serial cranial and spinal cord magnetic resonance imaging in multiple sclerosis. Ann Neurol 1992 Nov;32(5):643-50

William RR, Hussein SS, Jeans WD, Wali YA, Lamki ZA. A prospective study of soft-tissue ultrasonography in sickle cell disease patients with suspected osteomyelitis. Clin Radiol 2000 Apr;55(4):307-10

Wilson GR, McLean NR, Chippindale A, Campbell RS, Soames JV, Reed MF. The role of MRI scanning in the diagnosis of cervical lymphadenopathy. Br J Plast Surg 1994 Apr;47(3):175-9

Winchester PA, Lee HM, Khilnani NM, Wang Y, Trost DW, Bush HL Jr, Sos TA. Comparison of two-dimensional MR digital subtraction angiography of the lower extremity with x-ray angiography. J Vasc Interv Radiol 1998 Nov-Dec;9(6):891-9

Wintersperger BJ, Penzkofer HV, Knez A, Weber J, Reiser MF. Multislice MR perfusion imaging and regional myocardial function analysis: complimentary findings in chronic myocardial ischemia. Int J Card Imaging 1999 Dec;15(6):425-34

Winzelberg GG, Grossman SJ, Rizk S, Joyce JM, Hill JB, Atkinson DP, Sudina K, Anderson K, McElwain D, Jones AM. Indium-111 monoclonal antibody B72.3 scintigraphy in colorectal cancer. Correlation with computed tomography, surgery, histopathology, immunohistology, and human immune response. Cancer 1992 Apr 1;69(7):1656-63

Wittram C, Whitehouse GH, Williams JW, Bucknall RC. A comparison of MR and CT in suspected sacroiliitis. J Comput Assist Tomogr 1996 Jan-Feb;20(1):68-72

Wolansky LJ, Bardini JA, Cook SD, Zimmer AE, Sheffet A, Lee HJ. Triple-dose versus single-dose gadoteridol in multiple sclerosis patients. J Neuroimaging 1994 Jul;4(3):141-5

Wolf JS Jr, Cher M, Dall'era M, Presti JC Jr, Hricak H, Carroll PR. The use and accuracy of cross-sectional imaging and fine needle aspiration cytology for detection of pelvic lymph node metastases before radical prostatectomy. J Urol 1995 Mar;153(3 Pt 2):993-9

Wolzt M, Schmetterer L, Rheinberger A, Salomon A, Unfried C, Breiteneder H, Ehringer H, Eichler HG, Fercher AF. Comparison of non-invasive methods for the assessment of haemodynamic drug effects in healthy male and female volunteers: sex differences in cardiovascular responsiveness. Br J Clin Pharmacol 1995 Apr;39(4):347-59

Wong TW, Lau CC, Yeung A, Lo L, Tai CM. Efficacy of transabdominal ultrasound examination in the diagnosis of early pregnancy complications in an emergency department. J Accid Emerg Med 1998 May;15(3):155-8

Wood SL, St Onge R, Connors G, Elliot PD. Evaluation of the twin peak or lambda sign in determining chorionicity in multiple pregnancy. Obstet Gynecol 1996 Jul;88(1):6-9

Woodard PK, Slone RM, Sagel SS, Fleishman MJ, Gutierrez FR, Reiker GG, Pilgram TK, Jost RG. Detection of CT-proved pulmonary nodules: comparison of selenium-based digital and conventional screen-film chest radiographs. Radiology 1998 Dec;209(3):705-9

Wu CH, Lee MM, Huang KC, Ko JY, Sheen TS, Hsieh FJ. A probability prediction rule for malignant cervical lymphadenopathy using sonography. Head Neck 2000 May;22(3):223-8

Wyse LJ, Jones M, Mandel F. Relationship of glycosylated hemoglobin, fetal macrosomia, and birthweight macrosomia. Am J Perinatol 1994 Jul;11(4):260-2

Xiong J, Rao S, Gao JH, Woldorff M, Fox PT. Evaluation of hemispheric dominance for language using functional MRI: a comparison with positron emission tomography. Hum Brain Mapp 1998;6(1):42-58

Yamaguchi A, Ishida T, Nishimura G, Kanno M, Kosaka T, Yonemura Y, Izumi R, Miyazaki I, Matsui O. Detection by CT during arterial portography of colorectal cancer metastases to liver. Dis Colon Rectum 1991 Jan;34(1):37-40

Yang WT, Lam WW, Yu MY, Cheung TH, Metreweli C. Comparison of dynamic helical CT and dynamic MR imaging in the evaluation of pelvic lymph nodes in cervical carcinoma. AJR Am J Roentgenol 2000 Sep;175(3):759-66

Yang WT, Suen M, Ahuja A, Metreweli C. In vivo demonstration of microcalcification in breast cancer using high resolution ultrasound. Br J Radiol 1997 Jul;70(835):685-90

Yang Y, Wen H, Mattay VS, Balaban RS, Frank JA, Duyn JH. Comparison of 3D BOLD functional MRI with spiral acquisition at 1.5 and 4.0 T. Neuroimage 1999 Apr;9(4):446-51

Yao Z, Liu XJ, Shi RF, Dai R, Zhang S, Liu YZ, Tian YQ, Zhang XL. A comparison of 99Tcm-MIBI myocardial SPET and electron beam computed tomography in the assessment of coronary artery disease in two different age groups. Nucl Med Commun 2000 Jan;21(1):43-8

Yarnold JR, Bamber JC, Gibbs J. Tumour growth delay as a clinical endpoint for the measurement of radiation response. Radiother Oncol 1986 Mar;5(3):207-14

Yeh PH, Jaw WC, Wang TC, Yen TY. Evaluation of iliopsoas compartment disorders by computed tomography. Zhonghua Yi Xue Za Zhi (Taipei) 1995 Feb;55(2):172-9

Yen YF, Han KF, Daniel BL, Heiss S, Birdwell RL, Herfkens RJ, Sawyer-Glover AM, Glover GH. Dynamic breast MRI with spiral trajectories: 3D versus 2D. J Magn Reson Imaging 2000 Apr;11(4):351-9

Yeoh EE, Botten R, Russo A, McGowan R, Fraser R, Roos D, Penniment M, Borg M, Sun W. Chronic effects of therapeutic irradiation for localized prostatic carcinoma on anorectal function. Int J Radiat Oncol Biol Phys 2000 Jul 1;47(4):915-24

Yeung CK, Tam YH, Chan YL, Lee KH, Metreweli C. A new management algorithm for impalpable undescended testis with gadolinium enhanced magnetic resonance angiography. J Urol 1999 Sep;162(3 Pt 2):998-1002

Young GR, Humphrey PR, Shaw MD, Nixon TE, Smith ET. Comparison of magnetic resonance angiography, duplex ultrasound, and digital subtraction angiography in assessment of extracranial internal carotid artery stenosis. J Neurol Neurosurg Psychiatry 1994 Dec;57(12):1466-78

Yucha CB, Russ P, Baker S. Detecting i.v. infiltrations using a Venoscope. J Intraven Nurs 1997 Jan-Feb;20(1):50-5

Zannetti S, De Rango P, Parente B, Parlani G, Verzini F, Maselli A, Nardelli L, Cao P. Role of duplex scan in endoleak detection after endoluminal abdominal aortic aneurysm repair. Eur J Vasc Endovasc Surg 2000 May;19(5):531-5

Zarahn E, Aguirre G, D'Esposito M. A trial-based experimental design for fMRI. Neuroimage 1997 Aug;6(2):122-38

Zardawi IM, Hearnden F, Meyer P, Trevan B. Ultrasound-guided fine needle aspiration cytology of impalpable breast lesions in a rural setting. Comparison of cytology with imaging and final outcome. Acta Cytol 1999 Mar-Apr;43(2):163-8

Zhou XH, Gatsonis CA. A simple method for comparing correlated ROC curves using incomplete data. Stat Med 1996 Aug 15;15(15):1687-93

Zidi SH, Prat F, Le Guen O, Rondeau Y, Rocher L, Fritsch J, Choury AD, Pelletier G. Use of magnetic resonance cholangiography in the diagnosis of choledocholithiasis: prospective comparison with a reference imaging method. Gut 1999 Jan;44(1):118-22

Zielke A, Hasse C, Sitter H, Rothmund M. Influence of ultrasound on clinical decision making in acute appendicitis: a prospective study. Eur J Surg 1998 Mar;164(3):201-9

Zimmermann P, Alback T, Koskinen J, Vaalamo P, Tuimala R, Ranta T. Doppler flow velocimetry of the umbilical artery, uteroplacental arteries and fetal middle cerebral artery in prolonged pregnancy. Ultrasound Obstet Gynecol 1995 Mar;5(3):189-97

Zinzani PL, Zompatori M, Bendandi M, Battista G, Fanti S, Barbieri E, Gherlinzoni F, Rimondi MR, Frezza G, Pisi P, Merla E, Gozzetti A, Canini R, Monetti N, Babini L, Tura S. Monitoring bulky mediastinal disease with gallium-67, CT-scan and magnetic resonance imaging in Hodgkin's disease and high-grade non-Hodgkin's lymphoma. Leuk Lymphoma 1996 Jun;22(1-2):131-5

Zizka J, Elias P, Krajina A, Michl A, Lojik M, Ryska P, Maskova J, Hulek P, Safka V, Vanasek T, Bukac J. Value of Doppler sonography in revealing transjugular intrahepatic portosystemic shunt malfunction: a 5-year experience in 216 patients. AJR Am J Roentgenol 2000 Jul;175(1):141-8

Zonderland HM, Coerkamp EG, Hermans J, van de Vijver MJ, van Voorthuisen AE. Diagnosis of breast cancer: contribution of US as an adjunct to mammography. Radiology 1999 Nov;213(2):413-22

Zook PD, Winter TC 3rd, Nyberg DA. Iliac angle as a marker for Down syndrome in second-trimester fetuses: CT measurement. Radiology 1999 May;211(2):447-51

Zuberi SM, Matta N, Nawaz S, Stephenson JB, McWilliam RC, Hollman A. Muscle ultrasound in the assessment of suspected neuromuscular disease in childhood. Neuromuscul Disord 1999 Jun;9(4):203-7

Articles Reviewed-Psychological

Acklin, Marvin W.; McDowell II, Claude J.; Verschell, Mark S.; Chan, Darryl. Interobserver agreement, intraobserver reliability, and the Rorschach comprehensive system. Journal of Personality Assessment. 2000 Feb Vol 74(1) 15-47

Aertgeerts, Bert; Buntinx, F.; Bande-Knops, J.; Vandermeulen, C; Roelants, M.; Ansoms, S.; Fevery, J. The value of CAGE, CUGE, and AUDIT in screening for alcohol abuse and dependence among college freshmen. Alcoholism: Clinical & Experimental Research. 2000 Jan Vol 24(1) 53-57

Andersson, Helle W. The Fagan Test of Infant Intelligence: Predictive validity in a random sample. Psychological Reports. 1996 Jun Vol 78(3, Pt 1) 1015-1026

Appleby, Lawrence; Dyson, Vida; Altman, Edward; Luchins, Daniel J. Assessing substance use in multiproblem patients: Reliability and validity of the Addiction Severity Index in a mental hospital population. Journal of Nervous & Mental Disease. 1997 Mar Vol 185(3) 159-165

Baird, Gillian; Charman, Tony; Baron-Cohen, Simon; Cox, Anthony; Swettenham, John; Wheelwright, Sally; Drew, Auriol. A screening instrument for autism at 18 months of age: A 6-year follow-up study. Journal of the American Academy of Child & Adolescent Psychiatry. 2000 Jun Vol 39(6) 694-702

Balogh, Deborah W.; Merritt, Rebecca D.; Steuerwald, Brian L. Concurrent validity for the Rust Inventory of Schizotypal Cognitions. British Journal of Clinical Psychology. 1991 Nov Vol 30(4) 378-380

Barry, Kristen L.; Fleming, Michael F. The Alcohol Use Disorders Identification Test (AUDIT) and the SMAST-13: Predictive validity in a rural primary care sample. Alcohol & Alcoholism. 1993 Jan Vol 28(1) 33-42

Basco, Monica Ramirez; Bostic, Jeff Q.; Davies, Dona; Rush, A. John; Witte, Brad; Hendrickse, William; Barnett, Vicki. Methods to improve diagnostic accuracy in a community mental health setting. American Journal of Psychiatry. 2000 Oct Vol 157(10) 1599-1605

Bebbington, Paul; Nayani, Tony. The Psychosis Screening Questionnaire. International Journal of Methods in Psychiatric Research. 1995 Apr Vol 5(1) 11-19

Beidel, Deborah C; Turner, Samuel M.; Stanley, Melinda A.; Dancu, Constance V. The Social Phobia and Anxiety Inventory: Concurrent and external validity. Behavior Therapy. 1989 Sum Vol 20(3) 417-427

Bernstein, David E. The science of forensic psychiatry and psychology. Psychiatry, Psychology & Law. 1995 Apr Vol 2(1) 75-80

Berument, Sibel Kazak; Rutter, Michael; Lord, Catherine; Pickles, Andrew; Bailey, Anthony. Autism screening questionnaire: Diagnostic validity. British Journal of Psychiatry. 1999 Nov Vol 175 444-451

Berwick, Donald M.; Murphy, Jane M.; Goldman, Paula A.; Ware, John E.; et al. Performance of a five-item mental health screening test. Medical Care. 1991 Feb Vol 29(2) 169-176

Birtchnell, John; Evans, Chris; Deahl, Martin; Masters, Nigel. The Depression Screening Instrument (DSI): A device for the detection of depressive disorders in general practice. Journal of Affective Disorders. 1989 Mar-Jun Vol 16(2-3) 269-281

Bisson, Jocelyn; Nadeau, Louise; Demers, Andree. The validity of the CAGE scale to screen for heavy drinking and drinking problems in a general population survey. Addiction. 1999 May Vol 94(5) 715-722

Blais, Mark A.; Hilsenroth, Mark J.; Fowler, J. Christopher. Diagnostic efficiency and hierarchical functioning of the DSM-IV borderline personality disorder criteria. Journal of Nervous & Mental Disease. 1999 Mar Vol 187(3) 167-173

Bohn, Michael J.; Babor, Thomas F.; Kranzler, Henry R. The alcohol use disorders identification test (AUDIT): Validation of a screening instrument for use in medical settings. Journal of Studies on Alcohol. 1995 Jul Vol 56(4) 423-432

Borofsky, Gerald L. Assessing the likelihood of reliable workplace behavior: Further contributions to the validation of the Employee Reliability Inventory. Psychological Reports. 1992 Apr Vol 70(2) 563-592

Borofsky, Gerald L.; Smith, Michael. Reductions in turnover, accidents, and absenteeism: The contribution of a pre-employment screening inventory. Journal of Clinical Psychology. 1993 Jan Vol 49(1) 109-116

Boswell, Donald L.; Tarver, Phoebe J.; Simoneaux, John C. The Psychological Screening Inventory's usefulness as a screening instrument for adolescent inpatients. Journal of Personality Assessment. 1994 Apr Vol 62(2) 262-268

Boulet, Jack; Boss, Marvin W. Reliability and validity of the Brief Symptom Inventory. Psychological Assessment. 1991 Sep Vol 3(3) 433-437

Bradley, Katharine A.; Boyd-Wickizer, Jodie; Powell, Suzanne H.; Burman, Marcia L. Alcohol screening questionnaires in women: A critical review. JAMA: Journal of the American Medical Association. 1998 Jul Vol 280(2) 166-171

Braeunig, Peter; Krueger, Stephanie; Shugar, Gerald; Hoeffler, Juergen; Boerner, Ingrid. The Catatonia Rating Scale I: development, reliability, and use. Comprehensive Psychiatry. 2000 Mar-Apr Vol 41(2) 147-158

Bronisch, Thomas; Mombour, W. The modern assessment of personality disorders: II. Reliability and validity of personality disorders. Psychopathology. 1998 Nov-Dec Vol 31(6) 293-301

Brooks, David A.; Williams, Ronald N.; Dean, Raymond S.; Wood, Tina ML; et al. The predictive validity of a neuropsychological screening measure. International Journal of Neuroscience. 1990 Mar Vol 51(1-2) 83-88

Butcher, James N.; Graham, John R. The MMPI-2: A new standard for personality assessment and research in counseling settings. Measurement & Evaluation in Counseling & Development. 1994 Oct Vol 27(3) 131-150

Butcher, James N.; Perry, Julia N.; Atlis, Mera M. Validity and utility of computer-based test interpretation. Psychological Assessment. 2000 Mar Vol 12(1) 6-18

Byrne, Joseph M.; Backman, Joan E.; Bawden, Harry N. Minnesota Child Development Inventory: A normative study. Canadian Psychology. 1995 May Vol 36(2) 115-130

Camara, Wayne J.; Schneider, Dianne L. Integrity tests: Facts and unresolved issues. American Psychologist. 1994 Feb Vol 49(2) 112-119

Canivez, Gary L. Validity and diagnostic efficiency of the Kaufman Brief Intelligence Test in reevaluating students with learning disability. Journal of Psychoeducational Assessment. 1996 Mar Vol 14(1) 4-19

Canivez, Gary L. Validity of the Kaufman Brief Intelligence Test: Comparisons with the Wechsler Intelligence Scale for Children–Third Edition. Assessment. 1995 Jun Vol 2(2) 101-111

Carey, Kate B.; Cocco, Karen M.; Simons, Jeffrey S. Concurrent validity of clinicians' ratings of substance abuse among psychiatric outpatients. Psychiatric Services. 1996 Aug Vol 47(8) 842-847

Carter, Denis R.; Ruiz, Nicholas J. Discriminant validity and reliability studies on the Sexual Addiction Scale of the Disorders Screening Inventory. Sexual Addiction & Compulsivity. 1996 Vol 3(4) 332-340

Castlebury, Frank D.; Hilsenroth, Mark J.; Handler, Leonard; Durham, Thomas W. Use of the MMPI-2 Personality Disorder Scales in the assessment of DSM-IV antisocial, borderline, and narcissistic personality disorders. Assessment. 1997 Jun Vol 4(2) 155-168

Chaffee, C. Anne; Cunningham, Charles E.; Secord-Gilbert, Margaret; Elbard, Heather; et al. Screening effectiveness of the Minnesota Child Development Inventory expressive and receptive language scales: Sensitivity, specificity, and predictive value. Psychological Assessment. 1990 Mar Vol 2(1) 80-85

Chen, Wei I; Faraone, Stephen V.; Biederman, Joseph; Tsuang, Ming T. Diagnostic accuracy of the Child Behavior Checklist scales for attention-deficit hyperactivity disorder: A receiver-operating characteristic analysis. Journal of Consulting & Clinical Psychology. 1994 Oct Vol 62(5) 1017-1025

Cherpitel, Cheryl J. Screening for alcohol problems in the U.S. general population: A comparison of the CAGE and TWEAK by gender, ethnicity, and services utilization. Journal of Studies on Alcohol. 1999 Sep Vol 60(5) 705-711

Chouinard, Marie-Josee; Braun, Claude M. A meta-analysis of the relative sensitivity of neuropsychological screening tests. Journal of Clinical & Experimental Neuropsychology. 1993 Jul Vol 15(4) 591-607

Clarke, David M.; McKenzie, Dean P. A caution on the use of cut-points applied to screening instruments or diagnostic criteria. Journal of Psychiatric Research. 1994 Mar-Apr Vol 28(2) 185-188

Colligan, Robert C; Davis, Leo J.; Morse, Robert M.; Offord, Kenneth P. Screening medical patients for alcoholism with the MMPI: A comparison of seven scales. Journal of Clinical Psychology. 1988 Jul Vol 44(4) 582-592

Cooper, Lucy; Peters, Lorna; Andrews, Gavin. Validity of the Composite International Diagnostic Interview (CIDI) psychosis module in a psychiatric setting. Journal of Psychiatric Research. 1998 Nov-Dec Vol 32(6) 361-368

Craig, Robert J. Sensitivity of MCMI-III scales T (drugs) and B (alcohol) in detecting substance abuse. Substance Use & Misuse. 1997 Aug Vol 32(10) 1385-1393

David, Sandra K.; Riley, William T. The relationship of the Allen Cognitive Level Test to cognitive abilities and psychopathology. American Journal of Occupational Therapy. 1990 Jun Vol 44(6) 493-497

de las Cuevas, Carlos; Sanz, Emilio J.; de la Fuente, Juan A.; Padilla, Jonathan; Berenguer, Juan C. The Severity of Dependence Scale (SDS) as screening test for benzodiazepine dependence: SDS validation study. Addiction. 2000 Feb Vol 95(2) 245-250

Dembo, Richard; Schmeidler, James; Borden, Pollyanne; Sue, Camille Chin; Manning, Darrell. Use of the POSIT among arrested youths entering a juvenile assessment center: A replication and update. Journal of Child & Adolescent Substance Abuse. 1997 Vol 6(3) 19-42

Dent, Alexandra; Lincoln, Nadina B. Screening for memory problems in multiple sclerosis. British Journal of Clinical Psychology. 2000 Sep Vol 39(3) 311-315

Diagnosing dementia: Interrater reliability assessment and accuracy of the NINCDS/ADRDA criteria versus CERAD histopathological criteria for Alzheimer's disease. Dementia & Geriatric Cognitive Disorders. 2000 Mar-Apr Vol 11(2) 107-113

Douglas, Kevin S.; Ogloff, James R. P.; Nicholls, Tonia L.; Grant, Isabel. Assessing risk for violence among psychiatric patients: The HCR-20 violence risk assessment scheme and the Psychopathy Checklist: Screening Version. Journal of Consulting & Clinical Psychology. 1999 Dec Vol 67(6) 917-930

Drebing, Charles E.; Van Gorp, Wilfred G.; Stuck, Andreas E.; Mitrushina, Maura; et al. Early detection of cognitive decline in higher cognitively functioning older adults: Sensitivity and specificity of a neuropsychological screening battery. Neuropsychology. 1994 Jan Vol 8(1) 31-38

Dyson, Vida; Appleby, Lawrence; Altman, Edward; Doot, Martin; Luchins, Daniel J.; Delehant, Michelle. Efficiency and validity of commonly used substance abuse screening instruments in public psychiatric patients. Journal of Addictive Diseases. 1998 Vol 17(2) 57-76

El-Bassel, Nabila; Schilling, Robert F.; Schinke, Steven; Orlandi, Mario; et al. Assessing the utility of the Drug Abuse Screening Test in the workplace. Research on Social Work Practice. 1997 Jan Vol 7(1) 99-114

Epstein, Michael H.; Harniss, Mark K.; Pearson, Nils; Ryser, Gail. The Behavioral and Emotional Rating Scale: Test-retest and inter-rater reliability. Journal of Child & Family Studies. 1999 Sep Vol 8(3) 319-327

Erford, Bradley T.; Bagley, Donna L.; Hopper, James A.; Lee, Ramona M.; Panagopulos, Kathleen A.; Preller, Denise B. Reliability and validity of the Math Essential Skill Screener-Elementary Version (MESS-E). Psychology in the Schools. 1998 Apr Vol 35(2) 127-135

Fals-Stewart, William. Ability of counselors to detect cognitive impairment among substance-abusing patients: An examination of diagnostic efficiency. Experimental & Clinical Psychopharmacology. 1997 Feb Vol 5(1) 39-50

Fertig, Joanne B.; Allen, John P.; Cross, Gerald M. CAGE as a predictor of hazardous alcohol consumption in U.S. Army personnel. Alcoholism: Clinical & Experimental Research. 1993 Dec Vol 17(6) 1184-1887

Flanagan, Dawn P.; McGrew, Kevin S.; Abramowitz, Elaine; Untiedt, Stephanie; et al. Improvement in academic screening instruments? A concurrent validity investigation of the K-FAST, MBA, and WRAT-3. Journal of Psychoeducational Assessment. 1997 Jun Vol 15(2) 99-112

Foa, Edna B.; Cashman, Laurie; Jaycox, Lisa; Perry, Kevin. The validation of a self-report measure of posttraumatic stress disorder: The Posttraumatic Diagnostic Scale. Psychological Assessment. 1997 Dec Vol 9(4) 445-451

Forth, Adelle E.; Brown, Shelley L.; Hart, Stephen D.; Hare, Robert D. The assessment of psychopathy in male and female noncriminals: Reliability and validity. Personality & Individual Differences. 1996 May Vol 20(5) 531-543

Frank, Rowena M.; Byrne, Gerard J. The clinical utility of the Hopkins Verbal Learning Test as a screening test for mild dementia. International Journal of Geriatric Psychiatry. 2000 Apr Vol 15(4) 317-324

Fuhrer, Rebecca; Ritchie, Karen. "Screening for dementia: A comparison of two tests using Receiver Operating Characteristic (ROC) analysis": Comment. International Journal of Geriatric Psychiatry. 1993 Oct Vol 8(10) 867-868

Garrison, Carol Z.; Addy, Cheryl L.; Jackson, Kirby L.; McKeown, Robert E.; et al. The CES-D as a screen for depression and other psychiatric disorders in adolescents. Journal of the American Academy of Child & Adolescent Psychiatry. 1991 Jul Vol 30(4) 636-641

Ghubash, R.; Daradkeh, T.; El-Rufaie, O. F.; Abou-Saleh, M. T. A comparison of the validity of two psychiatric screening questionnaires: The Arabic General Health Questionnaire (AGHQ) and Self-Reporting Questionnaire (SRQ-10) in UAE, using receiver operating characteristic (ROC) analysis. European Psychiatry. 2001 Mar Vol 16(2) 122-126

Glascoe, Frances Page; Byrne, Karen E. The accuracy of three developmental screening tests. Journal of Early Intervention. 1993 Fal Vol 17(4) 368-379

Goethe, John W.; Fischer, Edward H. Validity of the diagnostic interview schedule for detecting alcoholism in psychiatric inpatients. American Journal of Drug & Alcohol Abuse. 1995 Nov Vol 21(4) 565-571

Gottlieb, Gary L.; Gur, Raquel E.; Gur, Ruben C. Reliability of psychiatric scales in patients with dementia of the Alzheimer type. American Journal of Psychiatry. 1988 Jul Vol 145(7) 857-860

Gray, Shelley; Plante, Elena; Vance, Rebecca; Henrichsen, Mary. The diagnostic accuracy of four vocabulary tests administered to preschool-age children. Language, Speech, & Hearing Services in Schools. 1999 Apr Vol 30(2) 196-206

Green, Herman G; Zar, Jerrold H. A BASIC program to compute the sensitivity, specificity, and predictive values of screening tests. Educational & Psychological Measurement. 1989 Spr Vol 49(1) 147-150

Gross, Kristin; Keyes, Megan D.; Greene, Roger L. Assessing depression with the MMPI and MMPI-2. Journal of Personality Assessment. 2000 Dec Vol 75(3) 464-477

Gureje, O.; Obikoya, B. The GHQ-12 as a screening tool in a primary care setting. Social Psychiatry & Psychiatric Epidemiology. 1990 Sep Vol 25(5) 276-280

Hanion, Mark J.; Larson, Stephen; Zacher, Sandy. The Minnesota SOST and sexual reoffending in North Dakota: A retrospective study. International Journal of Offender Therapy & Comparative Criminology. 1999 Mar Vol 43(1) 71-77

Hays, Ron D.; Merz, Jon F.; Nicholas, Ronald. Response burden, reliability, and validity of the CAGE, Short MAST, and AUDIT alcohol screening measures. Behavior Research Methods, Instruments & Computers. 1995 May Vol 27(2) 277-280

Heard, Sharon A.; Ayers, Jerry B. Validity of the American College Test in predicting success on the Pre-Professional Skills Test. Educational & Psychological Measurement. 1988 Spr Vol 48(1) 197-200

Hedges, Larry V.; Hasselblad, Vic. Meta-analysis of screening and diagnostic tests. Psychological Bulletin. 1995 Jan Vol 117(1) 167-178

Hiatt, Deirdre; Hargrave, George E. Predicting job performance problems with psychological screening. Journal of Police Science & Administration. 1988 Jun Vol 16(2) 122-135

Hilsenroth, Mark J.; Fowler, J. Christopher; Padawer, Justin R. The Rorschach Schizophrenia Index (SCZI): An examination of reliability, validity, and diagnostic efficiency. Journal of Personality Assessment. 1998 Jun Vol 70(3) 514-534

Hirschfeld, Robert M. A.; Williams, Janet B. W.; Spitzer, Robert L.; Calabrese, Joseph R.; Flynn, Laurie; Keck, Paul E., Jr.; Lewis, Lydia; McElroy, Susan L.; Post, Robert M.; Rapport, Daniel J.; Russell, James M.; Sachs, Gary S.; Zajecka, John. Development and validation of a screening instrument for bipolar spectrum disorder: The Mood Disorder Questionnaire. American Journal of Psychiatry. 2000 Nov Vol 157(11) 1873-1875

Hogan, Robert; Hogan, Joyce; Roberts, Brent W. Personality measurement and employment decisions: Questions and answers. American Psychologist. 1996 May Vol 51(5) 469-477

Hogervorst, Eef; Barnetson, L.; Jobst, K. A.; Nagy, Zs.; Combrinck, M.; Smith, A. D.

Holden, Ronald R; Grigoriadis, Sophie. Psychometric properties of the Holden Psychological Screening Inventory for a psychiatric offender sample. Journal of Clinical Psychology. 1995 Nov Vol 51(6) 811-819

Holden, Ronald R; Starzyk, Katherine B.; McLeod, Lindsay D.; Edwards, Melanie J. Comparisons among the Holden Psychological Screening Inventory (HPSI), the Brief Symptom Inventory (BSI), and the Balanced Inventory of Desirable Responding (BIDR). Assessment. 2000 Jun Vol 7(2) 163-175

Honts, Charles R. Counterintelligence Scope Polygraph (CSP) test found to be poor discriminator. Forensic Reports. 1992 Apr-Jun Vol 5(2) 215-218

Huppert, Felicia A.; Jorm, Anthony F.; Brayne, Carol; Girling, Deborah M.; et al. Psychometric properties of the CAMCOG and its efficacy in the diagnosis of dementia. Aging, Neuropsychology, & Cognition. 1996 Sep Vol 3(3) 201-214

Hyde, Clive E. The Manchester Scale: A standardised psychiatric assessment for rating chronic psychotic patients. British Journal of Psychiatry. 1989 Nov Vol 155(Suppl 7) 45

Hyler, Steven E.; Skodol, Andrew E.; Oldham, John M.; Kellman, H. David; et al. Validity of the Personality Diagnostic Questionnaire-Revised: A replication in an outpatient sample. Comprehensive Psychiatry. 1992 Mar-Apr Vol 33(2) 73-77

Inwald, Robin E.; Brockwell, Albert L. Predicting the performance of government security personnel with the IPI and MMPI. Journal of Personality Assessment. 1991 Jun Vol 56(3) 522-535

Inwald, Robin. How to evaluate psychological/honesty tests. Personnel Journal. 1988 May Vol 67(5) 40-46

Isohanni, Matti; Winblad, Ilkka; Nieminen, Pentti; Hiltunen, Pirkko; Spalding, Michael. A short DSM-III-R-based diagnostic instrument for screening mental disorders in geriatric institutions. International Psychogeriatrics. 1996 Fal Vol 8(3) 459-468

Iverson, Grant L.; Franzen, Michael D.; Demarest, David S.; Hammond, Jeffrey A. Neuropsychological screening in correctional settings. Criminal Justice & Behavior. 1993 Dec Vol 20(4) 347-358

Ivnik, Robert J.; Smith, Glenn E.; Petersen, Ronald C; Boeve, Bradley F.; Kokmen, Emre; Tangalos, Eric G. Diagnostic accuracy of four approaches to interpreting neuropsychological test data. Neuropsychology. 2000 Apr Vol 14(2) 163-177

James, John A.; Boake, Corwin. MMPI profiles of child abusers and neglecters. International Journal of Family Psychiatry. 1988 Vol 9(4) 351-371

Jason, Leonard A.; Ropacki, Michael T.; Santoro, Nicole B.; Richman, Judith A.; Heatherly, Wendy; Taylor, Renee; Ferrari, Joseph R.; Haney-Davis, Trina M.; Rademaker, Alfred; Dupuis, Josee; Golding, Jacqueline; Plioplys, Audrius V.; Plioplys, Sigita. A screening instrument for chronic fatigue syndrome: Reliability and validity. Journal of Chronic Fatigue Syndrome. 1997 Vol 3(1) 39-59

Johnson, Edward E.; Hamer, Robert M.; Nora, Rena M. The Lie/Bet questionnaire for screening pathological gamblers: A follow-up study. Psychological Reports. 1998 Dec Vol 83(3, Pt 2) 1219-1224

Johnson, Sally; Barrett, Paula M.; Dadds, Mark R.; Fox, Tara; Shortt, Alison. The Diagnostic Interview Schedule for Children, Adolescents, and Parents: Initial reliability and validity data. Behaviour Change. 1999 Vol 16(3) 155-164

Jorgenson, Christabel B.; Jorgenson, David E. Concurrent and predictive validity of an early childhood screening test. International Journal of Neuroscience. 1996 Jan-Apr Vol 84(1-4) 97-102

Journal of Clinical Psychology. 1993 May Vol 49(3) 372-382

Kabacoff, Robert I.; Segal, Daniel L.; Hersen, Michel; Van Hasselt, Vincent B. Psychometric properties and diagnostic utility of the Beck Anxiety Inventory and the State-Trait Anxiety Inventory with older adult psychiatric outpatients. Journal of Anxiety Disorders. 1997 Jan-Feb Vol 11(1) 33-47

Kauhanen, Jussi; Julkunen, Juhani; Salonen, Jukka T. Validity and reliability of the Toronto Alexithymia Scale (TAS) in a population study. Journal of Psychosomatic Research. 1992 Oct Vol 36(7) 687-694

Kazmierczak, David A.; Smyth, Nancy J.; Wodarski, John S. Screening and assessment instruments for alcohol and other drugs. Family Therapy. 1999 Vol 26(2) 103-119

Keenan, P. A.; Lachar, David. Screening preschoolers with special problems: Use of the Personality Inventory for Children (PIC). Journal of School Psychology. 1988 Spr Vol 26(1) 1-11

Kelley, Patricia L.; Jacobs, Rick R.; Farr, James L. Effects of multiple administrations of the MMPI for employee screening. Personnel Psychology. 1994 Fal Vol 47(3) 575-591

Kilpatrick, David A.; Lewandowski, Lawrence J. Validity of screening tests for learning disabilities: A comparison of three measures. Journal of Psychoeducational Assessment. 1996 Mar Vol 14(1) 41-53

Klin, Ami; Lang, Jason; Cicchetti, Domenic V.; Volkmar, Fred R. Brief report: Interrater reliability of clinical diagnosis and DSM-IV criteria for autistic disorder: Results of the DSM-IV Autism Field Trial. Journal of Autism & Developmental Disorders. 2000 Apr Vol 30(2) 163-167

Koeter, Maarten W. Validity of the GHQ and SCL Anxiety and Depression scales: A comparative study. Journal of Affective Disorders. 1992 Apr Vol 24(4) 271-279

Kogan, Evan S.; Kabacoff, Robert I.; Hersen, Michel; Van Hasselt, Vincent B. Clinical cutoffs for the Beck Depression Inventory and the Geriatric Depression Scale with older adult psychiatric outpatients. Journal of Psychopathology & Behavioral Assessment. 1994 Sep Vol 16(3) 233-242

Kresanov, K; Tuominen, J.; Piha, J.; Almqvist, F. Validity of child psychiatric screening methods. European Child & Adolescent Psychiatry. 1998 Jun Vol 7(2) 85-95

Kujala, Paeivi; Portin, R.; Ruutiainen, J. The progress of cognitive decline in multiple sclerosis: A controlled 3-year follow-up. Brain. 1997 Feb Vol 120(2) 289-297

Langevin, Ron; Wright, Percy; Handy, Loraine. Use of the MMPI and its derived scales with sex offenders: II. Reliability and criterion validity. Annals of Sex Research. 1990 Vol 3(4) 453-486

Laprise, Rejeanne; Vezina, Jean. Diagnostic performance of the Geriatric Depression Scale and the Beck Depression Inventory with nursing home residents. Canadian Journal on Aging. 1998 Win Vol 17(4) 401-413

Lavigne, John V.; Arend, Richard; Rosenbaum, Diane; Sinacore, James; et al. Interrater reliability of the DSM-III--R with preschool children. Journal of Abnormal Child Psychology. 1994 Dec Vol 22(6) 679-690

Lease, Suzanne H.; Yanico, Barbara J. Evidence of validity for the Children of Alcoholics Screening Test. Measurement & Evaluation in Counseling & Development. 1995 Jan Vol 27(4) 200-210

Leon, Andrew C; Portera, Laura; Olfson, Mark; Kathol, Roger; Färber, Leslie; , Kira N.; Sheehan, David V. Diagnostic errors of primary care screens for depression and panic disorder. International Journal of Psychiatry in Medicine. 1999 Vol 29(1) 1-11

Leon, Andrew C; Portera, Laura; Olfson, Mark; Weissman, Myrna M.; Kathol, Roger G.; Färber, Leslie; Sheehan, David V.; Pleil, Andreas M. False positive results: A challenge for psychiatric screening in primary care. American Journal of Psychiatry. 1997 Oct Vol 154(10) 1462-1464

Maki, Naruhiko; Ikeda, Manabu; Hokoishi, Kazuhiko; Nebu, Akihiko; Komori, Kenjiro; Shigenobu, Kazue; Fukuhara, Ryuji; Hirono, Nobutsugu; Nakata, Hideki; Tanabe, Hirotaka. Validity of the Short-Memory Questionnaire in vascular dementia. International Journal of Geriatric Psychiatry. 2000 Dec Vol 15(12) 1143-1146

Mann, A. H.; Raven, P.; Pilgrim, J.; Khanna, S.; Velayudham, A.; Suresh, K. P.; Channabasavanna, S. M.; Janca, A.; Sartorius, N. An assessment of the Standardized Assessment of Personality as a screening instrument for the International Personality Disorder Examination: A comparison of informant and patient assessment for personality disorder. Psychological Medicine. 1999 Jul Vol 29(4) 985-989

Margolis, Ronald B.; Williger, Nancy R.; Greenlief, Catherine L.; Dunn, Edwin J.; et al. The sensitivity of the Bender-Gestalt Test as a screening instrument for neuropsychological impairment in older adults. Journal of Psychology. 1989 Mar Vol 123(2) 179-186

Martin, Christopher S.; Pollock, Nancy K.; Bukstein, Oscar G.; Lynch, Kevin G. Inter-rater reliability of the SCID alcohol and substance use disorders section among adolescents. Drug & Alcohol Dependence. 2000 May Vol 59(2) 173-176

Martin, Scott L.; Terris, William. Predicting infrequent behavior: Clarifying the impact on false-positive rates. Journal of Applied Psychology. 1991 Jun Vol 76(3) 484-487

Matson, Johnny L.; Bamburg, Jay W. Reliability of the Assessment of Dual Diagnosis (ADD). Research in Developmental Disabilities. 1998 Jan-Feb Vol 19(1) 89-95

Matson, Johnny L.; Smiroldo, Brandi B.; Hastings, Theresa L. Validity of the Autism/Pervasive Developmental Disorder subscale of the Diagnostic Assessment for the Severely Handicapped-II. Journal of Autism & Developmental Disorders. 1998 Feb Vol 28(1) 77-81

McCormack, Arlene; Selvaggio, Marialena. Screening for pedophiles in youth-oriented community agencies.

McDaniel, Michael A. Biographical constructs for predicting employee suitability. Journal of Applied Psychology. 1989 Dec Vol 74(6) 964-970

McGraw, Kenneth O.; Gordji, Sohrab; Wong, S. P. How many subjects to screen? A practical procedure for estimating multivariate normal probabilities for correlated variables. Journal of Consulting & Clinical Psychology. 1994 Oct Vol 62(5) 960-964

McNiel, Dale E.; Binder, Renee L. Screening for risk of inpatient violence: Validation of an actuarial tool. Law & Human Behavior. 1994 Oct Vol 18(5) 579-586

Merson, S.; Tyrer, P.; Duke, Peter J.; Henderson, F. Interrater reliability of ICD-10 guidelines for the diagnosis of personality disorders. Journal of Personality Disorders. 1994 Sum Vol 8(2) 89-95

Miller, Lucy J.; Lemerand, Pamela A.; Schouten, Peter G. Interpreting evidence of predictive validity for developmental screening tests. Occupational Therapy Journal of Research. 1990 Mar-Apr Vol 10(2) 74-86

Moriarty, Anthony. Deterring the molester and abuser: Pre-employment testing for child and youth care workers. Child & Youth Care Quarterly. 1990 Spr Vol 19(1) 59-66

Musselman, Dominique L.; Hawthorne, Colleen N.; Stoudemire, Alan. Screening for delirium: A means to improved outcome in hospitalized elderly patients. Reviews in Clinical Gerontology. 1997 Aug Vol 7(3) 235-256

Myerholtz, Linda; Rosenberg, Harold. Screening college students for alcohol problems: Psychometric assessment of the SASSI-2. Journal of Studies on Alcohol. 1998 Jul Vol 59(4) 439-446

Naugle, Richard I.; Chelune, Gordon J.; Tucker, Gretchen D. Validity of the Kaufman Brief Intelligence Test. Psychological Assessment. 1993 Jun Vol 5(2) 182-186

O'Donnell, William E.; de Soto, Clinton B.; de Soto, Janet L. Validity and reliability of the revised Neuropsychological Impairment Scale (NIS).

Oldfield, Kenneth; Hutchinson, Janet R. Predictive validity of the Graduate Record Examination with and without range restrictions. Psychological Reports. 1997 Aug Vol 81(1) 211-220

Oliveri, Stefano; Carpenter, Ian G.; Demopoulos, Gill. Validity and reliability of the Winchester Disability Rating Scale (2): A comprehensive screening instrument for the elderly in the community. Gerontology. 1994 Nov-Dec Vol 40(6) 319-324

Parikh, Rajesh M.; Eden, Dianne T.; Price, Thomas R.; Robinson, Robert G. The sensitivity and specificity of the Center for Epidemiologic Studies Depression Scale in screening for post-stroke depression. International Journal of Psychiatry in Medicine. 1988 Vol 18(2) 169-181

Patrick, Jayne. Validation of the MCMI-1 Borderline Personality Disorder scale with a well-defined criterion sample. Journal of Clinical Psychology. 1993 Jan Vol 49(1) 28-32

Patrick, Jayne; Links, Paul; Van Reekum, Rob; Mitton, M. Janice E.; et al. Using the PDQ--R BPD scale as a brief screening measure in the differential diagnosis of personality disorder. Journal of Personality Disorders. 1995 Fal Vol 9(3) 266-274

Perry, Adrienne; Veleno, Pasquale; Factor, David. Inter-rater agreement between direct care staff and psychologists for the diagnosis of autism according to DSM-III, DSM-III-R, and DSM-IV. Journal on Developmental Disabilities. 1998 Aug Vol 6(1) 32-43

Pomeroy, Ian M.; Clark, Christopher R; Philp, Ian. The effectiveness of very short scales for depression screening in elderly medical patients. International Journal of Geriatric Psychiatry. 2001 Mar Vol 16(3) 321-326

Prewett, Peter N. A comparison of two screening tests (the Matrix Analogies Test-Short Form and the Kaufman Brief Intelligence Test) with the WISC-III. Psychological Assessment. 1995 Mar Vol 7(1) 69-72

Rafilson, Fred M.; Sison, Ray. Seven criterion-related validity studies conducted with the National Police Officer Selection Test. Psychological Reports. 1996 Feb Vol 78(1) 163-176

Razavi, Darius; Delvaux, Nicole; Farvacques, Christine; Robaye, Edmond. Screening for adjustment disorders and major depressive disorders in cancer in-patients. British Journal of Psychiatry. 1990 Jan Vol 156 79-83

Rogers, Richard; Ustad, Karen L.; Salekin, Randall T. Convergent validity of the Personality Assessment Inventory: A study of emergency referrals in a correctional setting. Assessment. 1998 Mar Vol 5(1) 3-12

Ross, Helen E.; Gavin, Douglas R.; Skinner, Harvey A. Diagnostic validity of the MAST and the Alcohol Dependence Scale in the assessment of DSM-III alcohol disorders. Journal of Studies on Alcohol. 1990 Nov Vol 51(6) 506-513

Roy-Byrne, Peter P.; Russo, Joan. Interrater agreement among psychiatrists regarding emergency psychiatric assessments. American Journal of Psychiatry. 1999 Nov Vol 156(11) 1840-1841

Sauser, William I.; Hornsby, Jeffrey S.; Benson, Philip G. Psychometric characteristics of a preemployment screening device. Psychological Reports. 1988 Dec Vol 63(3) 971-977

Scanlan, James; Borson, Soo. The Mini-Cog: Receiver operating characteristics with expert and naive raters. International Journal of Geriatric Psychiatry. 2001 Feb Vol 16(2) 216-222

Scarpa, Angela; Luscher, Kristen A.; Smalley, Kadee J.; Pilkonis, Paul A.; Kim, Yookyung; Williams, William C. Screening for personality disorders in a nonclinical population. Journal of Personality Disorders. 1999 Win Vol 13(4) 345-360

Schade, Charles P.; Jones, Everett R, Jr.; Wittlin, Byron J. A ten-year review of the validity and clinical utility of depression screening. Psychiatric Services. 1998 Jan Vol 49(1) 55-61

Scheinberg, Zvi; Koslowsky, Meni; Bleich, Avi; Mark, Mordechai; et al. Sensitivity, specificity, and positive predictive value as measures of prediction accuracy: The case of the EAT-26. Educational & Psychological Measurement. 1993 Fal Vol 53(3) 831-839

Schmitz, N.; Kruse, J.; Heckrath, C; Alberti, L.; Tress, W. Diagnosing mental disorders in primary care: The General Health Questionnaire (GHQ) and the Symptom Check List (SCL-90-R) as screening instruments. Social Psychiatry & Psychiatric Epidemiology. 1999 Jul Vol 34(7) 360-366

Schmitz, N.; Kruse, J.; Tress, W. Application of stratum-specific likelihood ratios in mental health screening. Social Psychiatry & Psychiatric Epidemiology. 2000 Aug Vol 35(8) 375-379

Seagraves, Margaret C; Harrison, Patti L. Test reviews: Phelps Kindergarten Readiness Scale. Journal of Psychoeducational Assessment. 1994 Jun Vol 12(2) 201-204

Selzer, Rob; Hamill, Cheryl; Bowes, Glenn; Patton, George. The Branched Eating Disorders Test: Validity in a nonclinical population. International Journal of Eating Disorders. 1996 Jul Vol 20(1) 57-64

Sherman, Laurie G; Morschauser, Pamela C. Screening for suicide risk in inmates. Psychiatric Quarterly. 1989 Sum Vol 60(2) 119-138

Sherman, Tracy; Shulman, Brian B. Specificity and sensitivity ratios of the Pediatric Language Acquisition Screening Tool for Early Referral-Revised. Infant-Toddler Intervention. 1999 Dec Vol 9(4) 315-330

Simeon, Daphne; Guralnik, Oraa; Gross, Shira; Stein, Dan J.; Schmeidler, James; Hollander, Eric. The detection and measurement of depersonalization disorder. Journal of Nervous & Mental Disease. 1998 Sep Vol 186(9) 536-542

Skodol, Andrew E.; Hyler, Steven E.; Oldham, John M. The valid assessment of personality disorders. American Journal of Psychiatry. 1993 Dec Vol 150(12) 1905

Skodol, Andrew E.; Oldham, John M. Assessment and diagnosis of borderline personality disorder. Hospital & Community Psychiatry. 1991 Oct Vol 42(10) 1021-1028

Smith, Gillan; Fischer, Lane. Assessment of juvenile sexual offenders: Reliability and validity of the Abel Assessment for Interest in Paraphilias. Sexual Abuse: Journal of Research & Treatment. 1999 Jul Vol 11(3) 207-216

Smith, Mark; Breitbart, William S.; Platt, Meredith M. A critique of instruments and methods to detect, diagnose, and rate delirium. Journal of Pain & Symptom Management. 1995 Jan Vol 10(1) 35-77

Social Casework. 1989 Jan Vol 70(1) 37-42

Solomon, Paul R.; Hirschoff, Aliina; Kelly, Bridget; Relin, Mahri; Brush, Michael; DeVeaux, Richard D.; Pendlebury, William W. A 7 minute neurocognitive screening battery highly sensitive to Alzheimer's disease. Archives of Neurology. 1998 Mar Vol 55(3) 349-355

Starcevic, Vladan; Bogojevic, Goran; Kelin, Katarina. Diagnostic agreement between the DSM-IV and ICD-10-DCR personality disorders. Psychopathology. 1997 Nov-Dec Vol 30(6) 328-334

Steer, Robert A.; Cavalieri, Thomas A.; Leonard, Douglas M.; Beck, Aaron T. Use of the Beck Depression Inventory for primary care to screen for major depression disorders. General Hospital Psychiatry. 1999 Mar-Apr Vol 21(2) 106-111

Steffens, David C; Welsh, Kathleen A.; Burke, James R.; Helms, Michael J.; et al. Diagnosis of Alzheimer's disease in epidemiologic studies by staged review of clinical data. Neuropsychiatry, Neuropsychology, & Behavioral Neurology. 1996 Apr Vol 9(2) 107-113

Stiles, Paul G.; McGarrahan, Jane F. The Geriatric Depression Scale: A comprehensive review. Journal of Clinical Geropsychology. 1998 Apr Vol 4(2) 89-110

Stokes, Alan F.; Banich, Marie T.; Elledge, Valorie C. Testing the tests: An empirical evaluation of screening tests for the detection of cognitive impairment in aviators. Aviation, Space, & Environmental Medicine. 1991 Aug Vol 62(8) 783-788

Storgaard, Heidi; Nielsen, Susanne D.; Gluud, Christian. The validity of the Michigan Alcoholism Screening Test (MAST). Alcohol & Alcoholism. 1994 Sep Vol 29(5) 493-502

Teare, Marijo; Fristad, Mary A.; Weiler, Elizabeth B.; Weiler, Ronald A.; Salmon, Paul. Study II: Concurrent validity of the DSM-III-R Children's Interview for Psychiatric Syndromes (ChIPS). Journal of Child & Adolescent Psychopharmacology. 1998 Vol 8(4) 213-219

Trull, Timothy J.; Larson, Stacy L. External validity of two personality disorder inventories. Journal of Personality Disorders. 1994 Sum Vol 8(2) 96-103

Uhlmann, Richard F.; Larson, Eric B. Effect of education on the Mini-Mental State Examination as a screening test for dementia. Journal of the American Geriatrics Society. 1991 Sep Vol 39(9) 876-880

van Heugten, Caroline M.; Dekker, Joost; Deelman, Betto G.; Stehmann-Saris, Fieneke C; Kinebanian, Astrid. A diagnostic test for apraxia in stroke patients: Internal consistency and diagnostic value. Clinical Neuropsychologist. 1999 May Vol 13(2) 182-192

van Velzen, Carol J. M.; Luteijn, Frans; Sending, Agnes; van Hout, Wiljo J. P. J.; Emmelkamp, Paul M. G. The efficacy of the Personality Diagnostic Questionnaire-Revised as a diagnostic screening instrument in an anxiety disorder group. Clinical Psychology & Psychotherapy. 1999 Nov Vol 6(5) 395-403

Volker, Martin A.; Guarnaccia, Vincent; Scardapane, Joseph R. Short forms of the Stanford-Binet Intelligence Scale: Fourth Edition for screening potentially gifted preschoolers. Journal of Psychoeducational Assessment. 1999 Sep Vol 17(3) 226-235

von Cleve, Elizabeth; Jemelka, Ron; Trupin, Eric. Reliability of psychological test scores for offenders entering a state prison system. Criminal Justice & Behavior. 1991 Jun Vol 18(2) 159-165

Ward, Ann; Dockerill, John. The predictive accuracy of the Violent Offender Treatment Program Risk Assessment Scale. Criminal Justice & Behavior. 1999 Mar Vol 26(1) 125-140

Watkins, Marley W. Diagnostic utility of WISC--III subtest variability among students with learning disabilities. Canadian Journal of School Psychology. 1999 Vol 15(1) 11-20

Weiler, Michael David; Bellinger, David; Simmons, Esau; Rappaport, Leonard; Urion, David K.; Mitchell, William; Bassett, Nancy; Burke, Pamela J.; Marmor, Jacilyn; Waber, Deborah. Reliability and validity of a DSM-IV based ADHD screener. Child Neuropsychology. 2000 Mar Vol 6(1) 3-23

Weinstein, Milton C; Berwick, Donald M.; Goldman, Paula A.; Murphy, Jane M.; et al. A comparison of three psychiatric screening tests using receiver operating characteristic (ROC) analysis. Medical Care. 1989 Jun Vol 27(6) 593-607

Wetzler, Scott; Khadivi, Ali; Moser, R. Kevin. The use of the MMPI-2 for the assessment of depressive and psychotic disorders. Assessment. 1998 Sep Vol 5(3) 249-261

Whurr, Renata; Evans, Sara. Children's Acquired Aphasia Screening Test. International Journal of Language & Communication Disorders. 1998 Vol 33(Suppl) 343-344

Wittchen, Hans-Ulrich; Boyer, Patrice. Screening for anxiety disorders: Sensitivity and specificity of the Anxiety Screening Questionnaire (ASQ--15). British Journal of Psychiatry. 1998 Jul Vol 173(Suppl 34) 10-17

Wyshak, ; Barsky, Arthur J.; Klerman, Gerald L. Comparison of psychiatric screening tests in a general medical setting using ROC analysis. Medical Care. 1991 Aug Vol 29(8) 775-785

Zimmerman, Mark; Mattia, Jill I. The reliability and validity of a screening questionnaire for 13 DSM-IV Axis I disorders (the Psychiatric Diagnostic Screening Questionnaire) in psychiatric outpatients. Journal of Clinical Psychiatry. 1999 Oct Vol 60(10) 677-683

Meta-Analytic Survey of Criterion Accuracy of Validated Polygraph Techniques

Report Prepared For

The American Polygraph Association Board of Directors

Nate Gordon, President (2010-2011)

by

The Ad-Hoc Committee on Validated Techniques

Mike Gougler, Committee Chair

Raymond Nelson, Principal Investigator

Mark Handler

Donald Krapohl

Pam Shaw

Leonard Bierman

Introduction

Pamela Shaw, President, American Polygraph Association

Over the past few years the APA has carefully undertaken a number of significant initiatives that have affected its members. Prompted by the National Academy of Sciences report of 2003, and further energized by the ongoing review by the forensic science community, APA leaders and members recognized that there were real dangers ahead if we continued on the road we had always taken.

To strategically plan for and ensure our survival in the years ahead, the APA has been implementing initiatives to set in motion needed improvements. In particular, the APA is seeking to increase the level of science in our practices, to standardize our methodologies, to focus on continuous improvement, to upgrade our education, and broaden our vision to cover not only the interests of members, but to include protection of the public, as well. In other words, we are proactively pursuing increased professionalism in polygraphy.

In pursuit of this goal, the APA Strategic Plan has identified several key milestones along the way. They include measures such as development of best practice model policies, mandating continuing education, and making polygraph research articles available on the APA website for easy access by members. One of the key steps is to require APA members, beginning in 2012, to use methods that can be defended by published research. It is this last step that is the subject of this special edition of Polygraph.

The requirement to use validated testing methods is not a new idea, of course. Other fields such as medicine and psychology eventually came to the same conclusion, albeit many years after the fields were established. It has turned out to be a great thing for them. Try to imagine, if you can, what the fields of medicine and psychology would be like if there were no requirement to validate their methods. Validation serves a number of important functions, not the least of which is protecting the public from misuse, incompetence and . The net effect is not just to help the public but to elevate the profession; a winning solution for everyone. It is from this perspective of “enlightened self-interest” that arise the polygraph initiatives of recent years.

As we approached the year 2012 and the validation requirement for our techniques, it became clear that some APA members may not have received instruction from their polygraph schools as to which methods had scientific support. It seemed reasonable that the APA should undertake a literature review to see what evidence was available, and provide that to the membership. Earlier this year, then-President Nate Gordon established an ad hoc committee to accomplish this task. I was proud to serve with that committee. Over the following months the committee worked feverishly to complete a report that could be available to the membership by the end of the year. In the following pages you will see their findings.

As you read the committee report, there are a few caveats that you need to know. First, as the report itself says, this committee report is not APA policy, but rather a literature summary. There are other literature summaries published by APA in the past 20 years that were likewise not APA policy. The report is for information only, and may be helpful to members already using or intending to use the techniques listed.

Second, the Board has come to be aware of pending research studies on screening techniques. The Board will consider new By Law provisions on screening methodologies to encourage members to use the best methods available. Those decisions will be made in early 2012.

Polygraph, 2011, 40(4) 194 Shaw

And finally, the APA will expend a great deal of effort in its seminars to help members become familiar and competent with techniques that meet the validity requirements.

Many of you have expressed support and encouragement and have championed the ongoing efforts towards validation and higher standards; for that, I thank you. I would also like to acknowledge and thank the committee members for their unrelenting and devoted efforts and for the countless hours they’ve dedicated to this project. Your efforts are key to our future endeavors.

We are at a great time in polygraph history and we can be proud of the steps we are taking to move our profession forward. We must all grow with the knowledge in our field and the demands within our field to ensure our future success. This report is an essential step in that direction and I am proud to provide this document to you in this special issue of Polygraph.

Executive Summary

In 2007 the American Polygraph Association (APA) adopted a Standard of Practice, effective January 1, 2012, that requires APA members to use validated Psychophysiological Detection of Deception (PDD) examination techniques that meet certain levels of criterion accuracy.1 Those requirements state that event-specific diagnostic examinations used for evidentiary purposes must be conducted with techniques that produce a mean criterion accuracy level of .90 or higher, with an inconclusive rate of .20 or lower. Diagnostic examinations conducted using the paired-testing protocol must produce a mean criterion accuracy level of .86 or higher, with inconclusive rates of .20 or lower. Examinations conducted for investigative purposes must be conducted with techniques that produce a mean criterion accuracy level of .80 or higher, with inconclusive rates of .20 or lower.2 The goal is to eliminate the use of un-standardized, non-validated or experimental techniques in field settings where decisions may affect individual lives, community safety, professional integrity, and national security.

There exists today a confusing array of test question formats that are at once similar and dissimilar, and for which there are also alternatives in the selection of a method for test data analysis. Equally confusing is the abundance of published research, and the meaning and applicability of that research to the techniques used in field settings. The APA Board of Directors assumed responsibility for organizing this information in the form of a systematic review and meta-analysis of the published scientific literature which describes the criterion validity of presently available polygraph techniques. In the course of doing this, it has at times been necessary to define what appear to be obvious concepts. One such concept is that of validation, which, as it applies to PDD exams, is stipulated by the APA Standards of Practice (Section 3.2.10) to refer to the combination of: 1) a test question format that conforms to valid principles for target selection, question construction, and in-test presentation of the test stimuli, and 2) a validated method for test data analysis as it applies to a specified test question format. Although many factors may affect the overall effectiveness of PDD examinations, these two parts are recognized as fundamental to the criterion accuracy of PDD examinations. The accuracy of all tests is contingent upon these two activities: obtaining a sufficient quantity of diagnostic information, and interpreting the information correctly.

The two-fold purpose of this meta-analysis was to advise the APA and its membership about which PDD techniques satisfy the standard practice requirements that take place January 1, 2012, and to answer questions about our present knowledge-base regarding the criterion validity of PDD techniques as they are presently used.

The ad hoc committee to examine the evidence on the criterion accuracy of polygraph techniques was appointed by APA President Nate Gordon during the March 2011 meeting of the Board of Directors. The committee was composed of Past President and Board Director Mike Gougler (committee chair), Past-President and Editor-in-Chief Don Krapohl, President Elect Pam Shaw, Board Director Raymond Nelson, and APA Members Mark Handler and Leonard Bierman.

The committee also took into consideration that there are both financial and proprietary issues attached to the formulation of such a list. The stakeholders represent a diverse group of professionals and interests. The effectiveness of the APA and the

1 Criterion accuracy refers generally to the degree to which a test result corresponds with what the test is designed to detect. In the field of PDD, criterion accuracy denotes the ability of a combination of testing and scoring techniques to discriminate between truthful and deceptive examinees, and ranges from 0.00 for no validity to 1.00 for perfect validity. Criterion accuracy is one form of validity, and in some research reports it may be referred to as decision accuracy, or just accuracy.

2 Near the completion of this study the APA Board of Directors enacted a change in standards, endorsing the use of PDD screening techniques for which research indicates an accuracy rate that is significantly greater than chance.

Polygraph, 2011, 40(4) 196 Ad Hoc Committee on Validated Techniques

credibility of the polygraph profession required the committee to give precedence to the accuracy and integrity of the research review over the financial and personal interests of any individual developer of PDD testing techniques. The committee’s default approach was an inclusionary review process in which any stakeholder could submit supportive data and information for consideration. This did not mean that anything submitted would automatically be included or endorsed as valid, but it did mean that all recommendations would be considered.

The committee began its process with a discussion of the merits and strengths of laboratory and field research. Field studies are important to polygraph research as these studies have the advantage of ecological validity[3] and are therefore assumed to have increased generalizability. However, the generalizability of field studies is compromised to some unknown extent by the selection process which necessarily depends on the availability of often-incomplete confirmation data. Real world confirmation data are selective, neither random nor representative of all data, and confirmed cases more often may have correct PDD results than do unconfirmed cases. As a result, field studies may overestimate PDD decision accuracy to some degree. While field studies are highly useful for studying correlations, they provide imperfect measures of criterion validity.

Laboratory studies are also important to polygraph research as these studies can more easily control and reduce research and sampling biases. Use of experimental and quasi-experimental research designs, along with random sampling and random criterion assignment, can increase the generalizability and repeatability of research results. Because of their ability to control a greater number of variables, laboratory studies are fundamental to our ability to study questions of causality and construct validity. However, the generalizability of laboratory studies is complicated by the fact that these studies may not represent the broad range of variables thought to influence the results of field examinations. They are therefore presumed to have ecological validity that is weaker, to some unknown degree, than that of field studies.

The committee took the position that both field and laboratory studies have advantages and disadvantages, and that neither type alone would be sufficient to study all of the issues of concern to polygraph researchers. Both types of research are of vital importance to the study and development of knowledge in the polygraph profession. Differences between criterion accuracy of field and laboratory studies were found to be statistically small and insignificant by the 2002 report on the polygraph by the US National Research Council. For the purpose of reviewing the current state of validation of existing polygraph techniques, the committee gave field and laboratory studies equal consideration.

The committee compiled a list of studies that satisfied the qualitative and quantitative requirements for inclusion in the review, and for which there existed two or more satisfactory publications that describe generalizable evidence of criterion validity. The committee formulated recommendations regarding which techniques satisfied the requirements of the pending 2012 APA provisions for criterion accuracy. The committee then completed a meta-analysis that bench-marked the findings against the 2012 APA standards for evidentiary, paired testing, and investigative examinations. In addition, statistical tests were completed to check for study integrity, and to search for inconsistencies and outlier results.

3 Ecological validity addresses how well the experimental settings, processes, subjects and materials match those in real-life conditions. Though it is not the same as external validity, greater ecological validity may provide more confidence that the findings of the study will generalize to other settings.

197 Polygraph, 2011, 40(4) Executive Summary

Inclusion in the research review required that studies in the meta-analysis be published in Polygraph or other peer-reviewed scientific publications.[4] Studies were considered for selection if they were published by an academic degree-granting institution that was accredited by an accrediting agency recognized by the US Department of Education or foreign equivalent. In addition, research publications of studies funded by government agencies were also considered for selection. Edited academic texts, including individual chapters, were considered. However, studies available only in self-published books were excluded. Additional qualitative requirements were that selected studies must have employed a recognizable PDD technique for which a published description exists for the test structure, test question sequence, and administration. Studies must also have used a recognizable test data analysis (TDA, chart interpretation) method and included a description of the features, numerical transformations (score assignments), decision rules and normative data or cutscores. In addition, studies must have used instrumentation and component sensors that reflect common field practices. Selected studies must have used a variety of confirmation criteria including examinee confessions, high quality forensic evidence, or substantiated evidence that the crime was not committed.[5] Study results based on samples that were subject to experimental manipulation (e.g., fatigue, intoxication, programmed countermeasure use) were not included.[6]

Quantitative information requirements included some form of reliability statistics for each technique as evidence of the generalizability of the results. Several types of statistical measurements for PDD test reliability were reported in the published literature, and all were accepted for inclusion.[7] The committee also evaluated the reliability, generalizability and representativeness of the sample distributions through multivariate ANOVAs using the deceptive and truthful scores. It was expected that multiple samples drawn from the same underlying population, administered the same PDD technique, and scored with the same TDA method would replicate among the sampling distributions of scores. It was also expected that aggregation of the results of replicated sampling distributions would be more representative and generalizable than the results from any single sampling distribution. In addition to sample size information, a minimum of four statistical values were required for the meta-analysis: test sensitivity and specificity, and the inconclusive rates for guilty and innocent cases.

It was important to the credibility of the committee’s findings that studies be accepted or endorsed based on the merits of the included studies. For this reason, access to the published evidence and raw data was made a priority, and the list of validated techniques was constructed according to a systematic review of published research. In addition to a re-analysis of the study data, studies were included when it was possible to calculate a complete dimensional profile of criterion accuracy.

Following the completion of the literature survey, a list of all identified techniques was sent to the school directors or representatives of all APA accredited polygraph schools. School directors or their representatives were invited to provide any

4 The journal Polygraph instituted expert peer review in 2003. Articles published prior to that time were subject only to editorial review. Because Polygraph is an important academic and historic resource, studies published prior to 2003 and without peer review were included in this meta-analysis if they satisfied all of the other qualitative and quantitative requirements for selection.

5 One included study did not meet this requirement, and consisted only of cases confirmed by confession. Consistent with the known concern about inflated accuracy estimations resulting from over-reliance on confession confirmation for sample case selection, this study reported a near-perfect level of decision accuracy.

6 Present standards for research and publication by the APA stipulate that principal investigators should not also serve as study participants (i.e., examinees, examiners, or scorers). However, this was not a requirement in the past, and studies were not excluded from the meta-analysis based on this criterion.

7 One included technique is without published evidence of inter-scorer reliability or agreement.

published studies or citations to published studies for techniques that were not yet identified. One additional technique was suggested for inclusion at that time.[8] In a follow-up mailing, a shorter list was sent to all school directors, or representatives, and other researchers involved in the development or validation of PDD techniques. This list included only those techniques for which published and replicated studies were identified, along with another request to provide any publications or citations for techniques that were not yet included in the survey. Two additional studies were suggested for inclusion at that time.[9] Following the submission of initial results to the APA Board of Directors, and presentation of a preliminary version of this Executive Summary to the APA membership, one additional, recently published, study report was submitted in support of the IZCT.[10]

The committee contacted developers of PDD techniques for which there were insufficient published studies for inclusion in the meta-analysis, and volunteered assistance to anyone requesting it. Two studies were completed, have been accepted for publication, and are awaiting printing, for the Backster You-Phase technique. This technique was included in the meta-analysis. Additional studies were also completed for the Air Force Modified General Question Test (AFMGQT), the Federal You-Phase Technique, and the Directed Lie Screening Test (DLST), which have also been accepted for publication. The result of these additional efforts was that the complete array of techniques in common use today was included in the meta-analysis.

Thirty-eight studies satisfied the qualitative and quantitative requirements for inclusion in the meta-analysis. These studies involved 32 different samples, and described the results of 45 different experiments and surveys. These studies included 295 scorers who provided 11,737 scored results of 3,723 examinations, including 6,109 scores of 2,015 confirmed deceptive examinations and 5,628 scores of 1,708 confirmed truthful exams. Some of the cases were scored by multiple scorers and using multiple TDA methods.

Table 1 at the end of this executive summary shows the findings for those methods for which there is published and replicated evidence of criterion validity that satisfies the requirements of the APA Standards of Practice. Five PDD testing and analysis combinations meet APA 2012 requirements for evidentiary testing, five for paired-testing,[11] and four for investigative examinations.[12]

Two PDD techniques produced accuracy rates that were outliers from and inconsistent with the distribution of results from all other techniques. They were the Integrated Zone Comparison Technique (IZCT) and the Matte Quadri-Track Zone Comparison

8 The Marcy Technique was suggested for inclusion. However, no published studies could be located regarding this technique.

9 The Gordon et al (2000) study of the Integrated Zone Comparison Technique (IZCT) could not be included due to a lack of adequate statistical information. The primary author informed the committee that the report was completed without him seeing the data, which belong to the intelligence service of a foreign government and are therefore unavailable. Data for the Mangan, Armitage and Adams (2008) study of the Matte Quadri-Track Zone Comparison Technique were provided to the committee, and this study was included.

10 A portion of the data for the Shurani (2011) study of the IZCT was provided to the committee and this study was included.

11 All PDD techniques that meet the criterion accuracy requirement for paired-testing also meet the standard requirement for investigative testing, and those techniques that meet the standard requirement for evidentiary work also meet the requirements for paired-testing and investigative testing.

12 All techniques that employed three-position TDA methods consistently exceeded the 2012 boundary requirements for inconclusive rates (20%). Because criterion accuracy rates for techniques with three-position TDA did not differ significantly from seven-position criterion accuracy, field practices that involve an initial analysis with the three-position TDA method may be considered acceptable if inconclusive results are resolved via subsequent analysis with a TDA method that provided both accuracy and inconclusive rates that meet the requirements of the APA 2012 standards.

Technique (MQTZCT). While it is within the realm of possibility that these two techniques are superior to other techniques, studies supporting them proved to have more unresolved methodological issues than others included in this meta-analysis. In addition to the committee’s discovery of anomalous sampling distributions, both of these techniques are supported by studies authored by the developers and proprietors, and for which the developer/proprietor functioned as both principal investigator and study participant. From a scientific perspective, even well designed research generated by advocates of a method who have a vested interest in the outcome, and who act as participants and authors of the study report does not have the compelling power of research not so encumbered by these factors. The techniques have been duly included here because they met the more general requirements outlined in the APA Standards of Practice. The committee advises that because of the potential impact on examiner effectiveness that could result from reliance on outlier results, it would be prudent for examiners to exercise an extra measure of caution before accepting data from studies showing extraordinary effects before they are subject to independent confirmation and extended analysis.

A dimensional profile of criterion accuracy was calculated for each PDD technique, including the unweighted average of the proportions of correct decisions for deceptive and truthful cases, excluding inconclusive results, along with the unweighted average of the proportions of inconclusive results.[13] Results were aggregated for techniques that satisfy the APA 2012 requirements for evidentiary testing, paired testing, investigative testing, and for all PDD techniques included in the meta-analysis. Excluding outlier results, comparison question techniques intended for event-specific (single issue) diagnostic testing, in which the criterion variance of multiple relevant questions is assumed to be non-independent,[14] produced an aggregated decision accuracy rate of .890 (.829 - .951), with a combined inconclusive rate of .110 (.047 - .173). Comparison question PDD techniques designed to be interpreted with the assumption of independence of the criterion variance of multiple relevant questions, produced an aggregated decision accuracy rate of .850 (.773 - .926) with a combined inconclusive rate of .125 (.068 - .183). The combination of all validated PDD techniques, excluding outlier results, produced a decision accuracy level of .869 (.798 - .940) with an inconclusive rate of .128 (.068 - .187). Data at the present time are sufficient to support the polygraph as highly accurate, but insufficient to support an assertion that PDD testing can provide perfect or near-perfect accuracy.

Excluding outlier results, multi-variate analysis showed there were no significant one-way differences in decision accuracy for any of the PDD techniques that satisfy the requirements of the APA 2012 standards of practice. Neither were there any significant one-way differences in PDD techniques at the different levels of validation specified in the APA standards of practice, or for PDD techniques interpreted with decision rules based on an assumption of independence when compared with PDD techniques interpreted with decision rules based on an assumption of non-independence. This illustrates that the APA categorical distinctions are arbitrary, not empirically founded, and scientifically meaningless. These data are insufficient to support the notion that any PDD technique is superior to another, and instead suggest that differences in PDD question formats may be less important than

13 The unweighted average was considered to be a more conservative and realistic calculation of the overall accuracy of all PDD examination techniques. Calculation of the weighted average, or the simple proportion of correct decisions, often results in higher statistical findings that are less robust against differences in base-rates and therefore less generalizable.

14 Independence, in scientific testing, refers to assumptions about whether external factors that affect the criterion state of each question (i.e., truthfulness about past behavior) are assumed to affect the criterion state of other questions. In PDD testing, the results of multi-facet and multi-issue exams are interpreted with decision rules based on the assumption of independence, while the results of event-specific single-issue examinations are more often interpreted with decision rules based on the assumption of non-independence.

previously assumed. Differences in PDD techniques may be limited to assumptions and procedural differences pertaining to TDA methodologies intended for the investigation of independent or non-independent investigation target questions. This should be subject to further study.

The present evidence supports an argument that PDD testing can provide both test sensitivity to deception and test specificity to truth-telling at rates that are significantly greater than chance when conducted and interpreted with the assumptions of criterion independence as well as non-independence among the test questions. Evidence shows that all PDD techniques included in the meta-analysis provide test sensitivity at rates that are significantly greater than chance. However, the present evidence is insufficient to support that every PDD technique is capable of providing test specificity to truth-telling at rates that are significantly greater than chance. Details can be found in the complete report.

If the present results were to be considered an overestimation of PDD accuracy, the major causes of that over-estimation would be deficiencies in the sampling methodologies. One such factor is over-reliance on case confirmation via examinee confession, which may present the potential for the systematic exclusion of false-positive or false-negative errors for which no confession would be obtained. Another sampling concern would be publication or file-drawer bias, in which less favorable results are not submitted for publication and thus not available for inclusion in a meta-analysis or other systematic review. Another potential cause of accuracy overestimation would be the lack of independence between the technique developer, principal investigator and examiner participants in some included studies. If the present results were to be considered an under-estimation of PDD accuracy, the major cause might be argued to be deficiencies in the ecological validity of experimental and survey methodologies of the included studies. The present results are intended only to summarize the presently available publications that satisfy the requirements for inclusion in the meta-analysis. Limitations of the meta-analysis are discussed in the full report.

In closing, no attempt should be made to represent the results of this meta-analytic survey as an enforceable policy or standard. Although the dissemination of a list of validated polygraph techniques, could be viewed by some as a form of de facto APA endorsement of those techniques, the actual role of this meta-analysis is as a thorough summary of the existing PDD literature. Although policies may tend to remain fixed for periods of time, scientific evidence is continuously evolving. The committee offers that questions and discussions of test validity are a matter of science and not mere policy. As such these questions are best answered by scientific evidence. This meta-analysis should be considered an information resource only, and no attempt should be made to represent this list of PDD techniques as the final authority on PDD test validation. Although completed with the goal of creating a comprehensive and inclusive list of validated techniques, it remains possible that other studies and techniques exist but have not been included in this meta-analysis. There exists in the published literature some evidence of validity for PDD techniques that were not able to be included in this meta-analytic survey. Of course, a meta-analysis based on different study selection or inclusion criteria may yield different results. Nothing in this executive summary or the complete report should be construed as preventing the use of any PDD technique for which the criterion accuracy level can be defended with scientific evidence. The information herein is provided to the APA Board to advise its professional membership of the strength of validation of PDD techniques available at this time. This information is intended only to ease the burden on PDD professionals and to help make evidence-based decisions regarding the selection of PDD techniques for use in field settings. It may also assist program administrators, policy makers, and courts to make evidence-based decisions about the informational value of PDD test results in general.
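The unweighted averaging used for the dimensional profile of criterion accuracy can be illustrated with a short Python sketch. The counts below are hypothetical and are not taken from any included study, and the class and function names are assumptions made only for illustration. The sketch contrasts the unweighted average of the per-group proportions with the simple (weighted) proportion of correct decisions, which shifts with the base rate of deceptive and truthful cases.

```python
from dataclasses import dataclass


@dataclass
class ConfirmedCases:
    """Hypothetical decision counts for one criterion group (illustrative only)."""
    correct: int
    errors: int
    inconclusive: int

    def correct_rate(self) -> float:
        # Proportion of correct decisions, excluding inconclusive results.
        return self.correct / (self.correct + self.errors)

    def inconclusive_rate(self) -> float:
        # Proportion of all cases in the group that ended inconclusive.
        return self.inconclusive / (self.correct + self.errors + self.inconclusive)


def unweighted_profile(deceptive: ConfirmedCases, truthful: ConfirmedCases):
    """Average the two groups' proportions so each group counts equally."""
    cd = (deceptive.correct_rate() + truthful.correct_rate()) / 2
    inc = (deceptive.inconclusive_rate() + truthful.inconclusive_rate()) / 2
    return cd, inc


def weighted_correct_rate(deceptive: ConfirmedCases, truthful: ConfirmedCases) -> float:
    """Simple proportion of correct decisions across all decided cases."""
    correct = deceptive.correct + truthful.correct
    decided = correct + deceptive.errors + truthful.errors
    return correct / decided


# Made-up sample with more deceptive than truthful cases.
deceptive = ConfirmedCases(correct=90, errors=10, inconclusive=20)
truthful = ConfirmedCases(correct=24, errors=6, inconclusive=10)

cd, inc = unweighted_profile(deceptive, truthful)
weighted = weighted_correct_rate(deceptive, truthful)
print(f"unweighted CD = {cd:.3f}, unweighted INC = {inc:.3f}, weighted CD = {weighted:.3f}")
```

With these made-up counts the weighted figure is higher than the unweighted one, because the larger deceptive group happens to be scored more accurately than the smaller truthful group.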

Table 1. Mean (standard deviation) and {95% confidence intervals} for correct decisions (CD) and inconclusive results (INC) for validated PDD techniques. References can be found at the end of the complete report. Bracketed numbers refer to the numbered table notes that follow. Each entry is listed as Technique / TDA Method.

Evidentiary Techniques / TDA Method

Federal You-Phase / ESS [1]
    CD = .904 (.032) {.841 to .966}
    INC = .192 (.033) {.127 to .256}
Event-Specific ZCT / ESS
    CD = .921 (.028) {.866 to .977}
    INC = .098 (.030) {.039 to .157}
IZCT / Horizontal [2]
    CD = .994 (.008) {.978 to .999}
    INC = .033 (.019) {.001 to .069}
MQTZCT / Matte [3]
    CD = .994 (.013) {.968 to .999}
    INC = .029 (.015) {.001 to .058}
Utah ZCT DLT / Utah
    CD = .902 (.031) {.841 to .962}
    INC = .073 (.025) {.023 to .122}
Utah ZCT PLT / Utah
    CD = .931 (.026) {.879 to .983}
    INC = .077 (.028) {.022 to .133}
Utah ZCT Combined / Utah
    CD = .930 (.026) {.875 to .984}
    INC = .107 (.028) {.048 to .165}
Utah ZCT CPC-RCMP Series A / Utah
    CD = .939 (.038) {.864 to .999}
    INC = .185 (.041) {.104 to .266}

Paired Testing Techniques / TDA Method

AFMGQT [4,8] / ESS [5]
    CD = .875 (.039) {.798 to .953}
    INC = .170 (.036) {.100 to .241}
Backster You-Phase / Backster
    CD = .862 (.037) {.787 to .932}
    INC = .196 (.040) {.117 to .275}
Federal You-Phase / 7 position
    CD = .883 (.035) {.813 to .952}
    INC = .168 (.037) {.096 to .241}
Federal ZCT / 7 position
    CD = .860 (.037) {.801 to .945}
    INC = .171 (.040) {.113 to .269}
Federal ZCT / 7 pos. evidentiary
    CD = .880 (.034) {.813 to .948}
    INC = .085 (.029) {.028 to .141}

Investigative Techniques / TDA Method

AFMGQT [6,8] / 7 position
    CD = .817 (.042) {.734 to .900}
    INC = .197 (.030) {.138 to .255}
CIT [7] / Lykken Scoring
    CD = .823 (.041) {.744 to .903}
    INC = NA
DLST (TES) [8] / 7 position
    CD = .844 (.039) {.768 to .920}
    INC = .088 (.028) {.034 to .142}
DLST (TES) [8] / ESS
    CD = .858 (.037) {.786 to .930}
    INC = .090 (.026) {.039 to .142}

1 Empirical Scoring System.

2 Generalizability of this outlier result is limited by the fact that no measures of test reliability have been published for this technique. Also, significant differences were found in the sampling distributions of the included studies, suggesting that the sample data are not representative of each other, or that the exams were administered and/or scored differently. One of the studies involved a small sample (N = 12) that was reported in two articles, for which the participating scorer was also the technique developer. One of the publications described the study as a non-study. Both reports indicated that one of the six truthful participants was removed from the study after making a false confession. The reported perfect accuracy rate did not include the false confession. Neither the perfect accuracy nor the .167 false-confession rate is likely to generalize to field settings.

3 Generalizability of this outlier result is limited by the fact that the developers and investigators have advised the necessity of intensive training available only from experienced practitioners of the technique, and have suggested that the complexity of the technique exceeds that which other professionals can learn from the published resources. The developer reported a near-perfect correlation coefficient of .99 for the numerical scores, suggesting an unprecedented high rate of inter-scorer agreement, which is unexpected given the purported complexity of the method. Additionally, the data initially provided to the committee for replication studies included only those cases for which the scorers arrived at the correct decision, excluding scores from those cases for which the scorers did not achieve the correct decision. Missing scores were later provided to the committee for both the Mangan et al (2008) and Shurani and Chavez (2009) studies. However, the resulting sampling means were different from those reported for both replication studies. Because of these discrepancies, the statistical analysis was not re-calculated with the missing scores, and the reported analysis reflects the sampling distribution means as reported. Sampling means for replication studies should be considered devoid of error or uncontrolled variance.

4 Two versions exist for the AFMGQT, with minor structural differences between them. There is no evidence to suggest that the performance of one version is superior to the other. Because replicated evidence would be required to reject a null-hypothesis that the differences are meaningless, and because the selected studies include a mixture of both AFMGQT versions, these results are provided as generalizable to both versions. AFMGQT exams are used in both multi-facet event-specific contexts and multi-issue screening contexts. Both multi-facet and multi-issue examinations were interpreted with decision rules based on an assumption of criterion independence among the RQs.

5 The AFMGQT produced accuracy that is satisfactory for paired testing only when scored with the Empirical Scoring System.

6 There are two techniques for which there are no published studies but which are structurally nearly identical to the AFMGQT: the LEPET and the Utah MGQT. Validity of the AFMGQT can be generalized to these techniques if scored with the same TDA methods.

7 Concealed Information Test, also referred to as the Guilty Knowledge Test (GKT) and Peak of Tension test (POT). The data used here were provided in the meta-analysis report of laboratory research by MacLaren (2001).

8 Studies for these PDD techniques were conducted using decision rules based on the assumption of criterion independence among the testing targets. Accuracy of screening techniques may be further improved by the systematic use of a successive-hurdles approach.
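As a rough illustration of how the 2012 threshold values stated in this summary (mean criterion accuracy of .90, .86 or .80, each with an inconclusive rate of .20 or lower) relate to the figures in Table 1, the following Python sketch returns the most demanding category whose thresholds a pair of mean values would satisfy. The function name and the handling of a missing inconclusive rate are assumptions of this sketch. It looks only at mean values and does not reproduce the committee's replication, reliability or outlier checks.

```python
from typing import Optional

# Thresholds as stated in the text for the 2012 APA Standards of Practice.
MAX_INCONCLUSIVE = 0.20
MIN_ACCURACY = [
    ("evidentiary", 0.90),
    ("paired testing", 0.86),
    ("investigative", 0.80),
]


def highest_category(cd: float, inc: Optional[float]) -> Optional[str]:
    """Return the most demanding category whose thresholds are met, else None.

    inc=None stands for a technique with no reported inconclusive rate
    (shown as NA in Table 1). Skipping the inconclusive check in that case
    is an assumption of this sketch, not a rule stated in the report.
    """
    if inc is not None and inc > MAX_INCONCLUSIVE:
        return None
    for name, minimum in MIN_ACCURACY:
        if cd >= minimum:
            return name
    return None


# Mean CD and INC values copied from Table 1.
examples = {
    "Federal You-Phase / ESS": (0.904, 0.192),
    "AFMGQT / ESS": (0.875, 0.170),
    "AFMGQT / 7 position": (0.817, 0.197),
    "CIT / Lykken Scoring": (0.823, None),
}
for technique, (cd_mean, inc_mean) in examples.items():
    print(technique, "->", highest_category(cd_mean, inc_mean))
```

Because the category list is ordered from most to least demanding, a technique that satisfies the evidentiary threshold is reported only once, which matches the note that techniques meeting a higher requirement also meet the lower ones.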

Report of the Ad Hoc Committee on Validated Techniques

Abstract

Meta-analytic methods were used to calculate the effect size of validated psychophysiological detection of deception (PDD) techniques, expressed in terms of criterion accuracy. Monte Carlo methods were used to calculate statistical confidence intervals. Results were summarized for 45 different samples from experiments and surveys, including scored results from 295 scorers who provided 11,737 scored results of 3,723 examinations, including 6,109 scores of 2,015 confirmed deceptive examinations and 5,628 scores of 1,708 confirmed truthful exams. Fourteen different PDD techniques were supported by a minimum of two published studies each that satisfied the qualitative and quantitative requirements for inclusion in the meta-analysis. Results for the individual studies, and for different PDD techniques, were compared using multivariate analytic methods. Two studies produced outlier results that are not accounted for by the available evidence and which are not generalizable. Excluding outliers, there were no significant differences in criterion accuracy between any of the PDD techniques supported by the selected studies. Excluding outlier results, comparison question techniques intended for event-specific (single issue) diagnostic testing, in which the criterion variance of multiple relevant questions is assumed to be non-independent, produced an aggregated decision accuracy rate of .890 (.829 - .951), with a combined inconclusive rate of .110 (.047 - .173). Comparison question PDD techniques designed to be interpreted with the assumption of independence of the criterion variance of multiple relevant questions (multiple-issue and multiple-facet) produced an aggregated decision accuracy rate of .850 (.773 - .926) with a combined inconclusive rate of .125 (.068 - .183). The combination of all validated PDD techniques, excluding outlier results, produced a decision accuracy of .869 (.798 - .940) with an inconclusive rate of .128 (.068 - .187).
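The abstract does not describe the Monte Carlo procedure itself. As a generic illustration only, and not the committee's actual method, the following Python sketch shows one common way a simulation can produce a percentile interval for a proportion. The accuracy value .869 is the aggregated figure quoted above; the sample size of 200, the number of draws, the seed and the function name are all assumptions chosen for the example.

```python
import random


def monte_carlo_interval(p_hat, n, draws=2000, level=0.95, seed=0):
    """Percentile interval for a proportion from simulated samples.

    Each draw simulates n independent decisions that are correct with
    probability p_hat and records the simulated proportion correct.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    simulated = sorted(
        sum(rng.random() < p_hat for _ in range(n)) / n
        for _ in range(draws)
    )
    tail = (1 - level) / 2
    lower = simulated[int(tail * draws)]
    upper = simulated[int((1 - tail) * draws) - 1]
    return lower, upper


# Hypothetical sample size; .869 is the aggregated accuracy quoted in the abstract.
low, high = monte_carlo_interval(p_hat=0.869, n=200)
print(f"simulated 95% interval: {low:.3f} to {high:.3f}")
```

The interval narrows as the assumed sample size grows, which is why a reported interval cannot be reproduced without knowing the sample sizes and resampling scheme actually used.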

Introduction

Is the polygraph scientifically valid? How accurate is the polygraph? The simplicity of these common questions implies that the accuracy of psychophysiological detection of deception (PDD) tests can be described with a simple answer or with a single numerical index. The present approach to answering these and other questions about criterion validity[1] is a meta-analytic review of all available studies of criterion accuracy for all PDD examination techniques. Because the comparison question test (CQT) is the most commonly used and researched of all PDD techniques, this analysis will be primarily directed toward the CQT. There will be a limited discussion of research supporting the use of the Concealed Information Test (CIT).

The origins of all modern CQT formats can be traced to Reid (1947) who showed that some form of comparison question (CQ), intended to evoke a response from a truthful examinee, could improve test accuracy and reduce the occurrence of false-positive errors. CQT formats will often fall into one of two major families of techniques: techniques that emerged as modifications of the technique described by Reid (1947), and techniques that emerged as modifications of the technique described by Backster (1963). These techniques are conducted using differing, though often similar, procedures based on differing assumptions. These different assumptions and procedures can yield differences in test performance or test accuracy. Some techniques are highly theoretical about the exact nature and cause

1 The terms “accuracy,” “test accuracy,” “criterion accuracy,” and “criterion validity” are used interchangeably and synonymously throughout this document. The term “decision accuracy” is also used to describe criterion validity, but in a more limited sense, referring only to the accuracy of decisions, excluding inconclusive results. In a more complete sense, the term “criterion accuracy” refers to a dimensional set of concerns involving all aspects of test accuracy.

203 Polygraph, 2011, 40(4) Validated Techniques

of emotional or cognitive activity and resultant psychophysiological changes. Other techniques have emphasized an evidence-based scientific approach which forgoes unproven hypotheses and complex psychological assumptions about the exact thoughts and emotions of the examinee.

In general, the family of Zone Comparison Test² (ZCT) formats that emerged from the work of Backster (1963) has been used most effectively for event-specific diagnostic testing. ZCT questions are formulated to describe the examinee's involvement in a single known or alleged behavioral issue of concern, and are interpreted with decision rules based on an assumption of non-independence³ of the criterion variance of the test questions. In contrast, the family of Modified General Question Test⁴ (MGQT) formats that has emerged from the work of Reid (1947) is intended to describe and evaluate the examinee's involvement in different behavioral roles or different levels of involvement in a known or alleged incident. Although research has supported the CQT as capable of providing accuracy at levels that are significantly greater than chance, previous research (Barland, Honts & Barger, 1989; Podlesny & Truslow, 1993; Research Division Staff, 1995a, 1995b) has not supported the effectiveness of polygraph questions at pinpointing the exact behavioral role or level of involvement within an event-specific examination. In addition to their use in multi-facet investigations of known or alleged incidents, MGQT techniques are easily adapted to use in multi-issue screening contexts in which test questions are formulated to describe the examinee's possible involvement in several different behaviors for which there is no known incident or allegation. Both multi-facet and multi-issue MGQT examinations are commonly interpreted with decision rules based on an assumption that the criterion variance of the relevant question (RQ) stimuli is independent. As a matter of field practice, both families of techniques have at times been used under the assumption of both independence and non-independence among the RQs.

Other Reviews

Previous systematic reviews have been completed in an attempt to provide objective answers to the question of PDD accuracy and reconcile this with claims of perfection. Abrams (1973) reviewed polygraph validity studies dating back to the early part of the 20th century, and reported an accuracy rate of .980. Later, Abrams (1977) reported the average accuracy of polygraph validity studies to be .910. Still later, Abrams (1989) summarized the accuracy of polygraph tests as .880.

Ansley (1983) reported the results of 1,964 laboratory cases and 1,113 field cases and described a decision accuracy level of .968, excluding inconclusive results. However, accuracy rates were not reported separately for criterion deceptive and truthful cases, and inconclusive rates were not reported. At that time the Relevant-Irrelevant technique was reported as more accurate (.960) than CQT methods (.952). Ansley (1983) reported the accuracy of CIT formats to be .912. Later, Ansley (1990) summarized the results of 10 field examinations, involving 2,042 criminal cases since 1980, reporting an overall accuracy rate of .980 for deceptive cases and .970 for truthful cases, using the decisions of the original examiners. Ansley (1990) also described the results of 11 studies of blind evaluations of 922 criminal examinations, reporting accuracy levels of .900, with a reported accuracy rate of .940 for deceptive cases and .890 for truthful cases.

2 Sometimes referred to as the historically correct expression “Zone Comparison Technique” as well as the “Zone of Comparison Technique.”

3 Independence, in scientific testing, refers to whether external factors that affect the criterion state of each question (i.e., truthfulness about past behavior) are assumed to affect the criterion state of other questions. In PDD testing, the results of multi-facet and multi-issue exams are interpreted with decision rules based on the assumption of independence, while the results of event-specific single-issue examinations are more often interpreted with decision rules based on the assumption of non-independence.

4 Also referred to as the Modified General Question Technique.

Polygraph, 2011, 40(4) 204 Ad Hoc Committee on Validated Techniques

Honts and Peterson (1997) and Raskin and Honts (2002) reported the accuracy of the polygraph as exceeding .900. This is consistent with the .900 accuracy estimation of Raskin and Podlesny (1979). In contrast, the systematic review completed by the Office of Technology Assessment (OTA, 1983) suggested that laboratory studies had an average unweighted accuracy of .832 with an inconclusive rate of .269, while field studies had an average unweighted accuracy rate of .847 with an inconclusive rate of .042. Crewson (2001) surveyed studies of diagnostic and screening polygraphs⁵ in a comparison with medical and psychological tests, and reported diagnostic polygraphs to have an average accuracy rate of .880. Crewson also reported the average accuracy of screening polygraphs as .740. The Crewson information was also reported by Blackstone (2011) who argued that the confusion between diagnostic and screening polygraphs was a reason the polygraph did not enjoy greater support from the law.⁶

The most recent scientific review was completed by the National Research Council (NRC, 2003) who reported accuracy rates, in terms of the area under the curve (AUC) using the receiver operating characteristic (ROC) analysis, and concluded that laboratory studies had an average AUC of .860 while field studies had an average AUC of .890. More recently, Kokish, Levenson, and Blasingame (2005) reported the results of an opinion survey of sex offenders subject to polygraph monitoring as a condition of supervision and treatment. They reported that the offenders expressed a high rate of agreement with the results of the polygraph, over .900, but cautioned that offenders also claimed a false admission rate of approximately .050.

Although valuable in some ways, none of these previous surveys is capable of providing a satisfactory level of guidance regarding the American Polygraph Association (APA) 2012 standard for the use of validated techniques. These previous studies do not include information describing more recent advances in PDD criterion accuracy, and none of these surveys does an adequate job of providing a complete dimensional profile of criterion validity of individual PDD examination techniques. More importantly, none of these previous surveys satisfies the need for summary information regarding study replication and the level of reliability and generalizability of study results for individual PDD techniques as used in field practice.

All previous reviews of PDD test accuracy are unsatisfying in their ability to answer the present question regarding the validity, criterion accuracy and reliability of PDD techniques in use today. First, previous reviews do not address test accuracy with an adequate description of the procedural combination of the test question sequence and test data analysis (TDA) method applied in the study, both of which are thought to have an important impact on test effectiveness. Second, and related to the first concern, is that none of the previous reviews made an effort to exclude PDD techniques that are no longer taught at accredited training programs or have fallen out of use in the field. As a result, a number of previous reviews are of

5 Diagnostic tests are any tests conducted in response to known problems, known symptoms, known incidents, or known allegations. Screening tests are any tests conducted in the absence of a known problem, and are intended to search for possible problems. In practice, diagnostic tests are commonly formulated around a single issue of concern. Screening tests, because of the absence of any known problems, and because of interest in several types of possible problems, are often constructed around multiple issues. The terms multi-issue and mixed-issue are used interchangeably. It is not the number of issues that defines the distinction between diagnostic and screening tests, but the presence or absence of a known problem.

6 Blackstone (2011) also confuses the distinction between diagnostic and screening polygraphs; first by using the less common terms “forensic” and “utility” instead of the more widely understood terms “diagnostic” and “screening,” and then by attempting to portray single-issue screening exams as diagnostic exams. Blackstone further states that multi-facet exams are screening exams and later that multi-facet exams are distinct from multi-issue exams in that multi-issue exams are conducted in the absence of a known issue, indicating that multi-facet exams are a type of diagnostic exam conducted in response to a known problem. In practice, the criterion variance of the RQs of both multi-issue and multi-facet polygraphs is assumed to be independent, and both types of exams are interpreted with decision rules that reflect this assumption.


little practical use to PDD field examiners, program administrators, policy makers and consumers regarding the merits of different PDD techniques.

Present Objectives

Of primary interest to this review are those PDD techniques for which there exists evidence in support of criterion validity at levels required by the standards of practice of the APA, which effective January 1, 2012, require the use of validated techniques. Those requirements state that event-specific diagnostic examinations conducted for evidentiary purposes, for which it is expected that the results may be used as evidence in a judiciary proceeding, should be conducted using techniques that produce a criterion accuracy level of .900 or higher, excluding inconclusives, and with an inconclusive rate of .200 or lower. Diagnostic examinations conducted using the paired-testing protocol can achieve a very high accuracy rate through the combination of results from examinations conducted with techniques that produce a mean criterion accuracy level of .860 or higher, excluding inconclusives, and with inconclusive rates of .200 or lower. Examinations conducted for investigative purposes should be conducted with techniques that produce a mean criterion accuracy level of .800 or higher, excluding inconclusives, and with inconclusive rates of .200 or lower.⁷ Validated techniques are further required to be supported by published and replicated scientific studies. To be generalizable, the studies should be based on samples that are representative of the general population.

In addition to specifying requirements for criterion accuracy of PDD examination techniques, the APA has adopted a standard specifying that a validated technique consists of the combination of a test question sequence or format which conforms to valid PDD testing principles, coupled with a valid method of test data analysis. The combination of these two core components is a recognition that a valid test must first obtain a suitable quantity of interpretable and meaningful (i.e., diagnostic) information, after which the information must be interpreted effectively. Neglecting either of these would result in unsatisfactory test performance. Moreover, because the results of a single un-replicated study are regarded as inconclusive in the realm of science, validated techniques are further required to be supported by at least two publications.

Selecting evidence-based techniques minimizes exposure that would result from the use of un-standardized, un-validated, sub-optimal, or experimental methods. While the present effort was undertaken to provide useful information in this regard, readers are reminded that the report constitutes a literature review of publications available at the time the report was issued. New instrumentation, new validity research and new methods of analysis will become available in the future. In light of continuing advancements in the field, the findings in this report should be considered a reference, and not a policy of the APA.

The ethics of test administration were not addressed in this meta-analysis. Discussions of PDD test accuracy can sometimes digress into discussions of the ethics surrounding the procedures for test administration of probable-lie CQT formats, for which it is considered necessary to psychologically maneuver the examinee in order to achieve a satisfactory level of test specificity to truthfulness and to constrain false-positive errors to minimal levels. This discussion can also lead to unproductive, and indeed avoidant, deflections about increased incremental validity (i.e., “test utility”) of

7 Near the completion of this report the APA Board of Directors proposed a change to the Standards of Practice specific to screening techniques because of the paucity of available research in this area despite the importance of this application to law enforcement and national security. The proposal would permit the use of screening techniques if research indicates an accuracy significantly greater than chance, and recommends the use of a successive hurdles approach to minimize errors. Because the proposal had not been voted on prior to the completion of this report, no additional analyses of screening methods using the proposed standards are included here.


decisions made by professional consumers of PDD tests.⁸

Discussions of PDD test accuracy may also encompass the ethical complications of conducting a test for which the examiner and examinee purposes may be dissimilar. For example: the examinee’s desire is to generate data and test results to exonerate himself from further scrutiny, while the purpose of the examiner is often to psychologically leverage a confession of guilt or disclosure of information from deceptive examinees. In its most limited and un-scientific use the polygraph can become little more than a prop to enhance the effectiveness of an interview or interrogation, with little or no concern for the test result. It is important when reporting the criterion accuracy of PDD examinations to limit the focus to only those issues pertaining to the level of accuracy of polygraph decisions rather than merely confession rates. In other words, how effective are modern polygraph techniques at correctly classifying examinees as deceptive or truthful?

Test accuracy, in a scientific sense, means several things and can convey information about many important concerns. Foundational among these concerns is the issue of construct validity, which, in its most simplistic representation, refers to the correctness of the underlying constructs, principles, or ideas on which a test is constructed. Simply stated, construct validity refers to whether the PDD test does what it is intended to do. As a practical matter, PDD tests are often referred to as lie-detection tests; therefore, a broad formulation of the question of construct validity would involve whether a PDD test actually tests for or measures lies. Lies are amorphous and therefore cannot be measured per se. Deception is a temporal act, and PDD exams, like many other scientific tests, are scored numerically by measuring or observing the examinee's response to the test stimuli.⁹ Deception or truthfulness is inferred statistically, by observing or measuring the responses to several iterations of the test stimuli, aggregating the responses, and then using structured decision rules to interpret the result through comparison with normative data.

Construct validity can also refer to the correctness of assumptions about the function of the structural components of the PDD test: whether the questions function as intended. Several studies have investigated the construct validity of various types of test questions. For example: overall truth questions have been shown not to function as intended, Hilliard (1979) and Abrams (1984) both providing insight into the general complications and concerns pertaining to questions of intent. Likewise, symptomatic questions, intended to test for or correct for outside issues, have been shown to not function as intended or reputed (Honts, Amato & Gordon, 2004; Krapohl & Ryan, 2001), despite some weak evidence of support in an early study (Capps, Knill & Evans, 1993). Technical questions designed to test for, or make use of, esoteric phenomena such as a guilt-complex have been shown to not function effectively (Podlesny, Raskin & Barland, 1976). Similarly, sacrifice questions, regarding the examinee's intent to answer truthfully regarding the RQs, have been shown not to function as intended (Capps, 1991; Horvath, 1994).
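The inference process described in this section (score the responses to several iterations of the test stimuli, aggregate them, then apply structured decision rules) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the sign convention (negative scores meaning larger reactions to relevant than to comparison questions), and both cutscores are hypothetical placeholders, not values taken from any validated test data analysis method.

```python
def classify_exam(scores_by_chart, deceptive_cutscore=-6, truthful_cutscore=6):
    """Aggregate numerical scores and apply a simple grand-total decision rule.

    scores_by_chart: list of lists. Each inner list holds the scores assigned
    to the relevant questions on one presentation of the test stimuli.
    Assumed convention: negative = larger reaction to the relevant question,
    positive = larger reaction to the comparison question.
    The cutscores are hypothetical placeholders.
    """
    # Aggregate across all questions and all repetitions.
    grand_total = sum(sum(chart) for chart in scores_by_chart)

    # Structured decision rule: compare the total with the cutscores.
    if grand_total <= deceptive_cutscore:
        return grand_total, "deceptive"
    if grand_total >= truthful_cutscore:
        return grand_total, "truthful"
    return grand_total, "inconclusive"

# Three presentations of three relevant questions (made-up scores).
total, decision = classify_exam([[-2, -1, -1], [-1, -2, 0], [-1, 0, -1]])
print(total, decision)
```

Summing every relevant question into one total treats the questions as non-independent, which matches how event-specific examinations are described earlier. A rule for multi-facet or multi-issue examinations would instead evaluate each question's subtotal separately.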

8 The term incremental validity is preferred to the older term utility because it implies an expectation for empirical evidence of increased decision accuracy or decision effectiveness on the part of consumers of PDD test results, as a result of information gained from the polygraph, and not a mere assumption that all information will prove helpful or useful.

9 Deception during PDD exams is inferred empirically in that PDD studies have shown that deception and truth-telling can be determined with the CQT at rates that are significantly greater than chance as a function of the differential magnitude of response to relevant and comparison stimuli. Differences in response magnitude are thought to be a function of the salience of the stimuli. Persons who are being deceptive regarding the relevant stimuli are expected to show responses of generally larger magnitude to relevant than comparison stimuli, while persons who are truthful regarding the relevant stimuli are expected to show generally larger responses to comparison stimuli.


Although hypotheses are abundant, scientific studies have been unable to show evidence of construct validity for the array of technical questions, with the exception of one. The CQ is generally capable of producing larger reactions from truthful persons than RQs. While construct validity is an important concern, this meta-analysis addressed criterion validity, the ability to differentiate deception from truth-telling at a practical level, with an emphasis on the identification of PDD techniques for which there exists published and replicated evidence in support of test accuracy. It did not address questions pertaining to construct validity.

Criterion accuracy of CQT methods used in PDD examinations cannot be adequately described by a single numerical value. Instead, criterion validity for CQT PDD examinations is the result of an interaction of several dimensions of concern, including correct and false hits for deceptive decisions, and correct and false hits for truthful decisions. In field practice, test results are interpreted or expressed in terms of the presence or absence of significant reactions indicative of deception. Categorical test results for individual cases are either positive or negative,¹⁰ and the criterion validity of a test is estimated by partitioning the results of sample cases into true positives, false positives, true negatives and false negatives. Attempts to describe criterion accuracy are further complicated by inconclusive results, for which sample results are partitioned into inconclusive results for deceptive and truthful groups. In addition to the challenges of measuring and describing estimates of PDD test accuracy, different polygraph techniques achieve different dimensional profiles of criterion accuracy. Some techniques may be intended to provide high test sensitivity at the expense of other dimensions of test accuracy, while other techniques may be designed to seek a balance of test sensitivity and test specificity.

This meta-analytic survey is intended to summarize the present state of existing published scientific evidence of criterion validity of PDD examination techniques, and to provide guidance regarding those techniques which can be expected to reliably provide criterion accuracy that satisfies the APA's requirements for precision at the evidentiary, paired testing, and investigative levels. Therefore, this study is limited to those techniques for which the available evidence supports their criterion validity, and does not include PDD techniques with un-replicated research, or techniques for which there is replicated evidence at levels that do not satisfy the requirements of the APA standards.¹¹

Method

A literature survey was conducted to identify published studies that provided usable information regarding the criterion accuracy of identified PDD techniques. The results of un-replicated studies are not useful in meta-analytic research, and were therefore not included. As a practical decision this meant that at least two published studies were required for the inclusion of an examination technique in the meta-analysis. Examination techniques were retained in the meta-analysis if published and replicated studies were identified in support of the validity of a technique, and if the aggregated results of the included studies indicated a reliable and generalizable level of accuracy consistent with the requirements of the APA Standards of Practice for evidentiary testing, paired testing, or investigative testing. However, it was not a requirement that individual studies produce criterion accuracy at the levels specified by the APA.
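The partitioning of sample results into true and false positives, true and false negatives, and inconclusive results, together with the APA accuracy levels quoted under Present Objectives, can be sketched as below. This is an illustration under assumptions: the class and function names and the example counts are hypothetical, and the use of unweighted means across the deceptive and truthful groups is an assumption made here rather than a procedure stated in this section. The thresholds (.900, .860, .800, and a maximum inconclusive rate of .200) are the ones given in the text.

```python
from dataclasses import dataclass

@dataclass
class SampleCounts:
    """Decision counts for one sample, split by criterion (ground-truth) status."""
    true_pos: int        # deceptive cases classified as deceptive
    false_neg: int       # deceptive cases classified as truthful
    inc_deceptive: int   # deceptive cases with inconclusive results
    true_neg: int        # truthful cases classified as truthful
    false_pos: int       # truthful cases classified as deceptive
    inc_truthful: int    # truthful cases with inconclusive results

def accuracy_profile(c: SampleCounts) -> dict:
    """Return a dimensional profile of criterion accuracy for one sample."""
    n_dec = c.true_pos + c.false_neg + c.inc_deceptive
    n_tru = c.true_neg + c.false_pos + c.inc_truthful
    decided_dec = c.true_pos + c.false_neg   # inconclusives excluded
    decided_tru = c.true_neg + c.false_pos
    return {
        "sensitivity": c.true_pos / n_dec,
        "specificity": c.true_neg / n_tru,
        "false_negative_rate": c.false_neg / n_dec,
        "false_positive_rate": c.false_pos / n_tru,
        # Assumption: unweighted mean of the two groups.
        "inconclusive_rate": (c.inc_deceptive / n_dec + c.inc_truthful / n_tru) / 2,
        "decision_accuracy": (c.true_pos / decided_dec + c.true_neg / decided_tru) / 2,
    }

# Accuracy levels from the text, highest first; all share the .200 limit.
APA_LEVELS = [("evidentiary", 0.900), ("paired testing", 0.860), ("investigative", 0.800)]
MAX_INCONCLUSIVE = 0.200

def apa_level(decision_accuracy: float, inconclusive_rate: float):
    """Return the highest level whose requirements are met, or None."""
    if inconclusive_rate > MAX_INCONCLUSIVE:
        return None
    for name, minimum in APA_LEVELS:
        if decision_accuracy >= minimum:
            return name
    return None

# Hypothetical sample of 50 deceptive and 50 truthful cases.
profile = accuracy_profile(SampleCounts(42, 3, 5, 40, 5, 5))
print(profile)
print(apa_level(profile["decision_accuracy"], profile["inconclusive_rate"]))
```

As the Method paragraph notes, the levels apply to the aggregated results for a technique, so a check like `apa_level` would be run on pooled figures rather than on each individual study.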

10 Although PDD test results are interpreted, in field practice, for the presence or absence of significant indicators of deception, the results of scientific tests are often discussed in value-neutral language. Positive test results are designated to signify the presence of the issue or concern that is being tested. Negative test results signify the absence of the concern or issue.

11 Because research is ongoing in all fields of science, and because standards of practice undergo periodic review and necessary modification, readers are reminded that other PDD techniques may satisfy the present requirements of the APA standards. Field examiners, program administrators and quality assurance reviewers are advised to evaluate this information with awareness for new and emerging standards and information.


It was deemed important to be as inclusive as possible with stakeholders (i.e., school directors and developers of PDD techniques) in the selection of studies and techniques for the meta-analysis as it was anticipated that the work of many PDD field examiners and trainers may be affected by the results and recommendations of this meta-analysis. To accomplish this, a long-list of all identifiable PDD techniques was assembled and disseminated to all APA accredited polygraph schools in late March 2011 using the contact information listed at the APA website (www.polygraph.org). School directors or their representatives were invited to advise the committee of any techniques which had not yet been identified, and to provide either citations or copies of studies that could be accessed and reviewed in support of the suggested techniques.¹²

In early June 2011 a short-list was disseminated to all APA accredited polygraph schools, again using the contact information at the APA website, including both techniques and citations for which published and replicated studies were identified. Also, a list was sent describing those techniques for which the basis of publication was inadequate for inclusion in the meta-analysis. School directors or their representatives were again invited to respond and advise the committee of any techniques or any published studies that should be considered for inclusion in the meta-analysis.¹³

Study Selection

Requirements for the selection of individual studies into the meta-analysis were both qualitative and quantitative. Some studies were not designed to function as studies of criterion validity, but were intended to investigate specific research questions, such as the effects of countermeasures on PDD accuracy. Studies designed to examine causality and construct questions may not be useful for answering questions about criterion accuracy; however, studies were included if they provided sufficient information to calculate the criterion accuracy of a survey sample or normal control group that was not subject to experimental manipulation beyond truthfulness or deception.

The APA president and the committee chairperson expressed to the committee members that additional studies should be considered for inclusion in the meta-analysis if there was sufficient time to complete the review and publication process prior to the completion of the meta-analysis. Several studies were subsequently completed and submitted for peer-review and publication. Results from those studies were included once the studies were accepted for publication. All of these “in press” studies were designed as criterion accuracy studies, consistent with the requirements of the meta-analysis.

Qualitative selection requirements. Qualitative requirements for selection and inclusion in the research review were that studies selected must be published in the journal Polygraph or other peer-reviewed scientific publication.¹⁴ Studies were also considered for selection if they were published by an academic degree-granting institution that was accredited by an accrediting agency

12 One technique, developed by Lynn Marcy, was requested to be added to the list for review. However, no published studies could be located regarding that technique. At least one school representative recommended additional research for several PDD techniques.

13 One study was suggested for inclusion in the meta-analysis at that time, the Gordon et al. (2000) field study on the IZCT, for which the 2010-2011 APA President was the developer and has a proprietary interest. However, the Gordon et al. (2000) study included no reliability statistics, no statistical parameters and no description of the sampling distributions of deceptive and truthful scores. The committee was advised by the primary author (Personal communication, June 10, 2011) that he had never seen the data or the cases because they belong to the intelligence service of a foreign government. It was determined that this study could not be included due to the lack of published reliability data, and the inability to evaluate the study data with that from other studies. Exclusion of this study did not prevent the IZCT from being included in the meta-analysis.

14 The journal Polygraph instituted expert peer review in 2003. Articles published prior to that time were subject only to editorial review. Because Polygraph is an important academic and historic resource, studies published prior to 2003 and without peer review were included in this meta-analysis if they satisfied all of the other qualitative and quantitative requirements for selection.


that is recognized by the United States (US) Department of Education or its foreign equivalent. Also considered for selection were research publications of studies funded by government agencies that underwent external peer review. Edited academic texts and their chapters were also included. Studies not subject to editorial or external review, and studies described only in self-published books were not considered.

Selection into the meta-analysis required that studies were conducted in a manner that allows for the confident generalization of the study results to field settings. These requirements included that studies be conducted using instrument recording and component sensors that reflect field practices (i.e., using two pneumograph sensors, electrodermal sensors, and cardiograph arm-cuff sensors), and a PDD technique for which there exists a published description of the examination technique, including rules and procedures for target selection, question formulation, and test presentation of the test stimuli. In addition, it was required that the method of test data analysis be consistent with field practice and supported by a published description of its features, transformations (scoring rules), decision rules and normative data¹⁵ or cutscores. Results from studies that solely utilized automated algorithmic TDA models were not included in the meta-analysis.

Ground truth criteria must have been independent of the polygraph decision.¹⁶ Because the decision of the original examiner could have been influenced by extra-polygraphic information, blind scores were preferred over original examiner decisions.¹⁷ Basic demographic information was required regarding the examinees and the experience and training of the examiners.

Principal investigators for studies selected for inclusion were required to be blind to the criterion status of the cases and study participants.¹⁸ Although the present APA research standards require that the principal investigator not participate in the data collection, this requirement did not exist prior to March 2011, and therefore was not enforced in the selection of studies for the meta-analysis.

Laboratory and field studies. Field studies are important to polygraph research as these studies have the advantage of known ecological validity and are therefore assumed to have increased generalizability in this regard. However, the representativeness and generalizability of field studies are compromised, to some unknown degree, by the inherently non-random case selection process which depends on the availability of confirmation data. While field studies are highly useful for studying correlations, their usefulness is frustrated by the impossibility of controlling enough variables to determine causality and construct validity.

Laboratory studies are also important to polygraph research as these studies can

15 Most PDD TDA methods do not use normative data, but use traditional cutscores that were determined hypothetically or arbitrarily. Although traditional cutscores have been verified as effective, the lack of normative data means that the level of statistical significance for traditional cutscores remains unknown and these cutscores might be suboptimal.

16 Confirmation based on confession alone would exclude inconclusive and error cases, and would tend to inflate accuracy calculations. Judicial outcomes as a criterion are also not independent if polygraph evidence was considered during the judicial proceedings, and could lead to inflated accuracy estimates. One included study (Mangan, Armitage & Adams, 2008) did not meet this requirement, and was based only on sample cases that were confirmed by confession. Not surprisingly, the study resulted in a reported 100% accuracy rate. Verschuere, Meijer, & Merckelbach (2008) argued that the results of this study were a methodological artifact and therefore unreliable.

17 As a practical matter, the inclusion of extra-polygraphic information may be advantageous if it increases the accuracy of field examiners. In the present analysis, answers to questions of criterion validity pertain only to whether or not the PDD exam data contain information that can be scored and interpreted to an accurate conclusion.

18 One study (Gordon et al., 2005) was reported as non-blind in another report by Mohamed et al. (2006), who described the same comparison of fMRI results with those of the PDD examination.

Polygraph, 2011, 40(4) 210 Ad Hoc Committee on Validated Techniques

more easily be constructed using random methods that reduce research and sampling bias, and thereby increase the generalizability of resulting information. Laboratory studies also provide a greater ability to control a broader range of variables, and are important to the study of causality and construct validity. However, the generalizability of laboratory studies is thought to be reduced by the fact that these studies are sometimes conducted in circumstances that imperfectly approximate the ecological conditions of field examinations.

For the purpose of reviewing the current state of validation regarding existing polygraph techniques, the ad hoc committee chose to regard field and laboratory studies with equal consideration if they satisfied the qualitative and quantitative requirements for selection into the meta-analysis. Differences between criterion accuracy of field and laboratory studies have historically been statistically insignificant, and it would be unwise to attempt to opine or hypothesize about the meaning of any differences observed in a single comparison. Research studies and reviews have shown a high level of agreement between field and laboratory studies, and the ultimate cause of any differences should be determined through the study of data. Anderson, Lindsay, and Bushman (1999) recently examined a broad range of laboratory-based psychological research on aggressive behavior and concluded the following, "correspondence between lab- and field-based effect sizes of conceptually similar independent and dependent variables was considerable. In brief, the psychological laboratory has generally produced truths, rather than trivialities." (p. 3). In the area of research directly related to the polygraph the NRC (2003) found no significant differences between the results of laboratory and field research. Similarly, Pollina et al. (2004) found no significant differences in classification accuracy of field and laboratory polygraph research.

Quantitative selection requirements. Quantitative requirements for inclusion in the meta-analysis were that studies provide sufficient information to calculate reliability and criterion validity of the technique employed. In order to calculate a complete dimensional profile of criterion accuracy, reported or available data must minimally include the sample size for truthful and deceptive groups, along with correct decisions, inconclusive rates and error rates for the truthful and deceptive cases. Several of the principal investigators were contacted for additional information regarding the sampling distributions. Quantitatively, studies were excluded only due to the lack of adequate information available to calculate the reliability and generalizability of the study results, and the dimensional profile of criterion accuracy.

Reliability. Several different statistical metrics have been used to describe the reliability of PDD examination techniques, including the Pearson product moment correlation of the numerical scores of study participants, Cohen's Kappa statistics describing the chance-corrected level of agreement between two study participants, Fleiss' Kappa statistics describing the level of agreement between three or more participants, and the uncorrected proportion of decision agreement between study participants. None of these reliability metrics was favored over the others, and studies were not excluded for missing reliability data so long as they included any statistical description of the interrater reliability of numerical scores or decisions.19

Sampling distributions. Mean and standard deviation parameters were required, or at least minimally able to be calculated from available data, for the deceptive and truthful distributions of scores for all selected studies. It was expected that multiple samples drawn from the same underlying population and scored with the same TDA method would produce sampling distributions that do not differ significantly. It was also expected that replication and aggregation of the results of sampling distributions would produce results that are more representative and generalizable

19 One included technique did not meet this requirement. None of the published studies on the IZCT have included any statistical evidence of inter-rater reliability.


than any single sampling distribution, hence one of the reasons for requiring at least two studies of a technique.

Some studies were published with incomplete descriptions of the sampling distributions. Therefore, it was necessary to obtain raw scores from some of the principal investigators in order to calculate these missing statistics.20 Most data were still available, and investigators were willing to provide additional information as requested.21 It was expected that the sample distributions for each PDD technique would not differ at statistically significant levels if the samples were obtained from examinees who were representative of the same underlying population, the PDD technique was administered in a similar manner for each study, and the data were scored and interpreted with a similar application of the rules and procedures for test data analysis.

The absence of significant differences in sampling distributions would be interpreted as indicative that the sample distributions are representative of each other. This would increase our confidence regarding the reliability with which the samples are representative of, and generalizable to, other testing populations. Significant differences would be interpreted as indicative of samples drawn from different populations, or to differences in PDD test administration or the application of the rules and procedures for test data analysis. Confidence in the reliability and reproducibility would be reduced under these circumstances.

Criterion accuracy

Study information for criterion validity was regarded as sufficient for inclusion in the meta-analysis if a study provided enough information to calculate the complete dimensional profile of criterion accuracy, including: test sensitivity and specificity (i.e., test accuracy for deceptive and truthful groups excluding inconclusive results), inconclusive rates for the deceptive and truthful groups, positive predictive value (PPV) (i.e., the proportion of true positives to all positive results) and negative predictive value (NPV) (i.e., the proportion of true negatives to all negative results), and the proportion of correct decisions excluding inconclusive results among deceptive and truthful cases, labeled unweighted accuracy. It was not necessary for a study to provide all of these dimensional descriptions, and studies were included if it was possible for the committee to calculate these statistics from the available information. The complete dimensional profile could be calculated from a minimum of five values: decision accuracy for truthful and deceptive groups, with or without inconclusive results, along with the inconclusive rates for the deceptive and truthful groups and the number of study participants assigned to each group.

The reduction of these important criterion dimensions to a single number cannot be accomplished without neglecting a substantial portion of important information. Another complication was that some measures of accuracy are non-resistant to difference in base-rates or prior probabilities. For example: PPV and NPV are vulnerable to differences in base-rates and inconclusive rates, so these criterion dimensions are less useful for comparison of the accuracy of different techniques for which the studies may be conducted using samples with differing base-rates. The unweighted average of correct decisions, and the unweighted average of the inconclusive rates for deceptive and truthful cases was determined to provide the most usable and generalizable criterion information when comparing studies with potentially different base-rates.
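The dimensional profile described above can be illustrated with a short Python sketch. It is not the committee's software: the `GroupCounts` class, the `criterion_profile` function and the example counts are all hypothetical, and unweighted accuracy is assumed here to be the simple mean of the deceptive-group and truthful-group accuracies after inconclusive results are excluded.

```python
from dataclasses import dataclass


@dataclass
class GroupCounts:
    """Decision counts for one criterion group (hypothetical structure)."""
    correct: int
    errors: int
    inconclusive: int

    @property
    def n(self) -> int:
        return self.correct + self.errors + self.inconclusive


def criterion_profile(deceptive: GroupCounts, truthful: GroupCounts) -> dict:
    """Compute the accuracy dimensions named in the text from raw counts."""
    # Sensitivity and specificity exclude inconclusive results.
    sensitivity = deceptive.correct / (deceptive.correct + deceptive.errors)
    specificity = truthful.correct / (truthful.correct + truthful.errors)
    inc_deceptive = deceptive.inconclusive / deceptive.n
    inc_truthful = truthful.inconclusive / truthful.n
    # A truthful-group error is a false positive; a deceptive-group error
    # is a false negative.
    ppv = deceptive.correct / (deceptive.correct + truthful.errors)
    npv = truthful.correct / (truthful.correct + deceptive.errors)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "inconclusive_deceptive": inc_deceptive,
        "inconclusive_truthful": inc_truthful,
        "ppv": ppv,
        "npv": npv,
        # Assumption: unweighted values are simple means over the two groups.
        "unweighted_accuracy": (sensitivity + specificity) / 2,
        "unweighted_inconclusive": (inc_deceptive + inc_truthful) / 2,
    }


# Hypothetical counts, for illustration only.
profile = criterion_profile(
    deceptive=GroupCounts(correct=40, errors=5, inconclusive=5),
    truthful=GroupCounts(correct=36, errors=9, inconclusive=5),
)
for name, value in profile.items():
    print(f"{name}: {value:.3f}")
```

Because the unweighted values average the two groups without regard to group size, they do not shift when the proportion of deceptive to truthful cases changes, whereas PPV and NPV do.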

20 Statistical descriptions of sampling distributions are now commonly required for publication in scientific journals, including the journal Polygraph. Editors and reviewers of scientific publications did not always require this information in the past because the importance of future meta-analytic research was not always anticipated.

21 Statistical parameters or raw scores could not be obtained for three studies, regarding two techniques, which were conducted by the US Department of Defense. Although committee members were aware that the studies had been subjected to thorough and adequate review by scientists at the Department of Defense, the absence of the data was inconvenient. Reliability statistics were provided in the Department of Defense study reports, and it was decided to retain these studies in the meta-analysis. These studies were later replicated independently.


Moderator variables

Results of included studies and PDD techniques were coded and grouped for conformance with the APA 2012 standards for evidentiary testing, paired-testing, and investigative testing. PDD techniques were also coded for whether they are intended to be interpreted with the assumption of independent or non-independent criterion variance among the RQ stimuli. No other moderators or mediators were coded for this study.

Data analysis

All samples were regarded as biased and imperfect representations of the populations from which they are drawn. This meant that some differences would be expected to be observed between the sample distribution parameters and the population distributions parameters if it were possible to obtain data from the entire population. However, representative samples would be expected to deviate from the population in ways that are not statistically significant. Samples from different studies, if all are representative of the population, would also be expected to differ in ways that are not statistically significant. Alpha was set at .05 for all statistical comparisons.

Multivariate ANOVAs were used to compare the sampling distribution parameters of each study to the sampling distributions of other replication studies for each PDD examination technique. Monte Carlo methods were used to calculate standard errors and statistical confidence intervals for the criterion accuracy profiles of each of the studies included in the meta-analysis. Data were aggregated for those techniques that are intended to be interpreted with the assumption of independence among the RQ stimuli, and those that are interpreted with the assumption of non-independence.

Results of the meta-analysis were not graded or weighted for study quality, other than the quantitative selection criteria that were previously described for inclusion in the meta-analysis. In other words, all studies were considered equally if they satisfied the qualitative requirements for inclusion in the meta-analysis. Study results were weighted by sample size and number of participating scorers, in that studies based on larger samples and studies that involved a greater number of scorers were given proportionally more weight in the meta-analysis. Because some samples were collected with multiple scorers who scored only a subset of the entire sample, weighting values are equivalent to the number of scored results for each study and each PDD technique.

Research questions addressed by this meta-analysis are: 1) which PDD examination techniques have published and replicated evidence of validity that satisfies the APA 2012 standards of practice requirement for decision accuracy and inconclusive rates, 2) what is the overall accuracy of validated PDD techniques interpreted with the assumption of independence among the RQ stimuli, 3) what is the accuracy level of PDD techniques interpreted with the assumption of non-independence among the RQ stimuli, 4) are there significant differences or outliers among any of the validated PDD techniques, and 5) are there any outlier results that are not accounted for by the presently available evidence.

Results

Validated PDD Techniques

Thirty-eight studies satisfied the criteria for inclusion in the meta-analysis. These studies were based on 32 different samples of confirmed cases, from which 45 different samples of scores were obtained. These studies involved 295 scorers who provided 11,737 scored results of 3,723 examinations, including 6,109 scores of 2,015 confirmed deceptive examinations, 5,628 scores of 1,708 confirmed truthful exams. Some of the samples were used in different studies, some were scored using different TDA models, and some samples were scored by more than one scorer. Appendix A shows a list of the included studies and sample sizes. Criterion accuracy reported for each of the included studies can be seen in Appendix B. Reliability statistics for the included studies are shown in Appendix C. Appendix D shows the mean and standard deviations for the sampling distributions of deceptive and truthful scores for the included studies. Fourteen PDD techniques were identified as being supported by published and replicated


studies that met the qualitative and quantitative selection requirements for this meta-analysis. Presented alphabetically, they were:

AFMGQT / Seven-position TDA

The United States Air Force Modified General Question Technique (AFMGQT)22 is a modern variant of the family of CQT formats that have emerged as modifications of the original General Question Technique (Reid, 1947) and Zone Comparison Technique (Backster, 1963). The AFMGQT can be used effectively with two, three or four RQs. AFMGQT examinations are used in both multi-facet event-specific diagnostic contexts and multi-issue screening contexts (e.g., public safety employee selection, government security screening, post-conviction screening programs, etc.). AFMGQT exams conducted in both multi-facet and multi-issue contexts are interpreted with decision rules based on an assumption that the criterion variance of the RQs is independent. Three studies describe the criterion accuracy of the AFMGQT when scored with the seven-position TDA model.

Senter, Waller and Krapohl (2008), using blind seven-position numerical scores of 33 programmed deceptive and 36 programmed truthful examinees who were tested using the AFMGQT following their participation in a mock roadside bombing scenario, reported an unweighted decision accuracy of .849, with an inconclusive rate of .015.

Nelson and Handler (In press) used Monte Carlo methods to study the criterion accuracy of seven-position numerical scores of AFMGQT exams with two, three and four RQs. The Monte Carlo space consisted of 50 criterion truthful cases and 50 criterion deceptive cases. Unweighted decision accuracy was reported as .814, along with an unweighted inconclusive rate of .280.

Nelson, Handler, Morgan and O'Burke (In press) obtained blind numerical scores from three experienced examiners employed by the Iraqi government who used the seven-position TDA model to evaluate a confirmed case sample of AFMGQT (N = 22) exams that were selected from the U.S. Department of Defense confirmed case archive. Eleven of the cases were confirmed as deceptive; the remaining 11 were confirmed as truthful. A total of 66 examination scores were obtained, and unweighted decision accuracy was reported as .761, along with an unweighted inconclusive rate of .242.

Figure 1 shows a mean and standard deviation plot of the subtotal scores23 of the sampling distributions of the three AFMGQT seven-position studies. No mean and standard deviation data were available for the AFMGQT study completed by Senter, Waller and Krapohl (2008). A two-way ANOVA showed that the interaction of sampling distribution and criterion status was not significant [F (1,68) = 0.263, (p = .610)], nor was the main effect for sampling distribution [F (1,68) = 0.013, (p = .910)].

The combined decision accuracy level of these AFMGQT seven-position TDA studies, weighted for sample size and number of scorers, was .822 with a combined inconclusive rate of .191. Reliability for seven-position scores of AFMGQT exams was reported by Senter, Waller and Krapohl (2008) as a kappa statistic of .750. The proportion of overall decision agreement, excluding inconclusive results, for all studies was .965.
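Two of the reliability metrics reported here, Cohen's Kappa and the proportion of decision agreement excluding inconclusive results, can be sketched in Python for a pair of scorers. The function names, the decision labels ("DI", "NDI", "INC") and the example decisions below are hypothetical and are not drawn from any of the included studies.

```python
from collections import Counter


def cohens_kappa(scorer_a, scorer_b):
    """Chance-corrected agreement between two scorers' categorical decisions."""
    n = len(scorer_a)
    observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n
    counts_a, counts_b = Counter(scorer_a), Counter(scorer_b)
    # Agreement expected by chance, from each scorer's marginal proportions.
    categories = set(scorer_a) | set(scorer_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


def decision_agreement(scorer_a, scorer_b, inconclusive="INC"):
    """Uncorrected proportion of agreement, dropping any case that either
    scorer called inconclusive."""
    pairs = [(a, b) for a, b in zip(scorer_a, scorer_b)
             if inconclusive not in (a, b)]
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical decisions for eight exams scored by two examiners.
scorer_a = ["DI", "DI", "NDI", "NDI", "INC", "DI", "NDI", "NDI"]
scorer_b = ["DI", "DI", "NDI", "DI", "INC", "DI", "NDI", "INC"]

print(round(cohens_kappa(scorer_a, scorer_b), 3))        # 0.619
print(round(decision_agreement(scorer_a, scorer_b), 3))  # 0.833
```

The two values differ because kappa discounts the agreement that would be expected by chance and counts inconclusive calls as a category, while the agreement proportion does neither.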

22 Two versions exist for the AFMGQT: version 1 and version 2. Differences between these two techniques are based on unstudied assumptions regarding test structure, and the effect of these differences has not been thoroughly studied. There is no compelling hypothesis suggesting the performance of one version would be different or superior to another. Evidence available at the present time suggests that both versions perform adequately and no significant differences have been identified. Therefore, these results are suggested as generalizable to versions 1 and 2 of the AFMGQT, both of which are represented in the included studies.

23 Subtotal scores were used in this analysis because the AFMGQT is scored with decision rules using only subtotal scores which assume criterion independence among the RQs.


Figure 1. Mean deceptive and truthful subtotal scores for AFMGQT seven-position studies.

AFMGQT / ESS

Three studies describe the criterion accuracy of AFMGQT exams when scored using the ESS.

Nelson and Blalock (In press) transformed seven-position AFMGQT scores from the Senter, Waller and Krapohl (2008) laboratory study to ESS scores, including 33 results for confirmed deceptive cases and 36 results for confirmed truthful cases. Unweighted decision accuracy was .839, with an inconclusive rate of .152.

Nelson, Blalock and Handler (2011) obtained blind ESS scores from two inexperienced examiners and one experienced examiner who used the ESS to evaluate a sample of confirmed AFMGQT exams (N = 22), including 11 exams that were confirmed as deceptive and 11 exams that were confirmed as truthful. A total of 66 examination scores were obtained, and unweighted decision accuracy was .883, with an inconclusive rate of .183.

Nelson, Handler and Senter (In press) used Monte Carlo methods to study the criterion accuracy of ESS scores of AFMGQT exams with two, three and four RQs. The Monte Carlo space consisted of 50 criterion truthful cases and 50 criterion deceptive cases. Unweighted decision accuracy was reported as .876, with an unweighted inconclusive rate of .178.

Figure 2 shows a mean and standard deviation plot of the scores of the sampling distributions of the three AFMGQT ESS studies. A two-way ANOVA showed that the interaction of sampling distribution and criterion status was not significant [F (1,123) = 2.467, (p = .119)], nor was the main effect for sampling distribution [F (1,123) = 0.009, (p = .925)].

The combined decision accuracy level of these AFMGQT ESS studies, weighted for sample size and number of scorers, was .875 with a combined inconclusive rate of .170. Reliability for ESS scores of AFMGQT exams, reported by Nelson, Blalock and Handler (2011) as the bootstrap mean of pair-wise correlation coefficients, was .930.

Backster You-Phase

The Backster You-Phase technique is an event-specific diagnostic technique, based on Backster’s Zone Comparison concept. This technique is scored using TDA rules developed by Cleve Backster and taught almost exclusively at the Backster School of Lie


Figure 2. Mean deceptive and truthful subtotal scores for AFMGQT ESS studies.

Detection (2011). Both generic ZCT (Department of Defense, 2006; Honts, Raskin & Kircher, 1987) and boutique ZCT variants (Gordon et al., 2000; Matte & Reuss, 1989) have been developed from the Backster You-Phase technique. Scores from two recent studies were aggregated to calculate the criterion accuracy profile of Backster You-Phase examinations.

Nelson (In press) used seeding parameters calculated from the composite distributions of two non-selected studies24 to seed a Monte Carlo space of 100 simulated Backster You-Phase exams. The Monte Carlo space consisted of 50 criterion truthful cases and 50 criterion deceptive cases. Unweighted decision accuracy for the Nelson (In press) Monte Carlo study of You-Phase exams was .927, with an inconclusive rate of .321.

Nelson, Handler, Adams and Backster (In press) surveyed the results of seven examiners who provided 144 blind numerical scores for a sample (N = 22) of 11 confirmed deceptive and 11 confirmed truthful You-Phase examinations. These seven scorers had a range of experience from less than one year to more than 30 years. Results from the blind-scores survey produced an unweighted decision accuracy rate of .825 with an inconclusive rate of .117. Figure 3 shows a mean and standard deviation plot for the scores of deceptive and truthful cases. A two-way unbalanced25 ANOVA showed that the interaction of sampling distribution and

24 Honts, Hodes and Raskin (1985) used the Backster You-Phase technique in a countermeasure study for which the traditional arm cuff was replaced with an alternative cardio sensor. Meiron, Krapohl and Ashkenazi (2008) used the Backster You-Phase technique in a study of the Either-Or Rule, using a highly selected sample from which the results of problematic examinations were not included, resulting in a sample that is assumed to be systematically devoid of error variance. Although neither of these studies was usable alone, the parameters that describe the composite distributions of deceptive and truthful scores were assumed to be a more generalizable representation of error or uncontrolled variance along with diagnostic variance for scores from the Backster You-Phase exams.

25 Unbalanced ANOVAs, using the harmonic mean of the sample Ns, were used throughout this study when necessitated by differences in sample sizes. As a result, the total degrees of freedom in the ANOVA summary may not reflect the sum of all samples in the same way as a balanced ANOVA design. Unbalanced ANOVA designs can be expected to provide slightly less statistical power than balanced ANOVA designs.


Figure 3. Mean and standard deviation plot for Backster numerical scores of confirmed You-Phase exams.

criterion status was not significant [F (1,68) = 0.869, (p = .355)], nor was the main effect for sampling distribution [F (1,68) = 0.164, (p = .686)].

The combined results of the two published studies of Backster You-Phase exams, weighted for the sample size and number of scorers, produced a decision accuracy rate of .863 and an inconclusive rate of .196. Reliability of Backster numerical scores of You-Phase exams, reported by Nelson et al. (In press) as a bootstrap mean of pairwise correlation coefficients, was .567.

Concealed Information Test (CIT)

The CIT, also known as a Guilty Knowledge Test (GKT; Lykken, 1959) and related to the Peak of Tension (POT) technique (Ansley, 1992), is an event-specific diagnostic technique that can be used to investigate whether an examinee possesses knowledge or information that would be available only to investigators and a guilty or involved suspect. Like the CQT, the CIT/GKT is based on the principle of salience, including emotion, cognition and behavioral conditioning as the psychological basis of physiological response (Senter, Weatherman, Krapohl & Horvath, 2010). Also, the CIT/GKT is conducted using instrumentation that is similar to CQT methods, including electrodermal sensors, and may include cardiograph and pneumograph sensors. However, the CIT/GKT does not include comparison questions and is not a CQT method. Therefore, the CIT/GKT has not been subject to the same ethical concerns as probable-lie CQT methods regarding manipulating the examinee as a feature of test administration.26 Hypothetical explanations for psychophysiological mechanisms underlying the CIT/GKT have not been limited to emotion and fear as the sole basis of response. Also, the CIT/GKT has remained free from scientific criticisms involving the role of

26 Similarly, directed-lie methods have been shown to work as well as probable-lie methods, and are less subject to the ethical complications regarding manipulating the test subject (Bell, Kircher & Bernhardt, 2008; Blalock, Nelson, Handler & Shaw, 2011; Honts & Reavy, 2009).


examinee confessions as both an objective of the test and as a form of verification of the test results.27

MacLaren (2001) published the results of a meta-analysis of 50 samples in 22 studies involving the CIT/GKT. Thirty-nine of those samples involved 1,070 examinees of which 666 had engaged in a behavioral act for which they had concealed information along with 404 examinees who had no involvement or knowledge of the behavioral details. Eleven samples involved 177 examinees who possessed concealed knowledge of, but did not actually engage in, the behavioral act under investigation.

Using the scoring protocol and test methodology described by Lykken (1959), results reported by MacLaren produced a test sensitivity level of .815 for behaviorally involved examinees, along with a test specificity level of .832 for un-involved examinees who had no concealed knowledge. Unweighted decision accuracy was .823. However, when results included those examinees who possessed concealed information but were behaviorally un-involved, the test sensitivity level was .759, and the unweighted decision accuracy rate was .795.

Directed-lie Screening Test / Seven-position TDA

The Directed-lie Screening Test (DLST) is based on the Test for Espionage and Sabotage (TES) that was developed by the U.S. Department of Defense.28 As indicated by its name, the DLST technique uses directed-lie CQs and is designed for screening exams that are conducted in the absence of any known problem. Screening tests are often constructed and interpreted with multiple issues for which the criterion variance of the CQs is assumed to be independent. Two studies, involving the DLST/TES and seven-position TDA, have been published by the U.S. Department of Defense. Research reports were available, though insufficient to meet the study inclusion criteria, due to the absence of sampling standard deviations in the study reports. Also, data for the Department of Defense studies were not available for this committee to review. However, two DLST/TES replication studies were recently completed and the data from those studies was included in this meta-analysis.

The first published study of the DLST/TES (Research Division Staff, 1995a) was a laboratory experiment involving a mock espionage scenario. Three experienced federally trained examiners scored 94 examinations, involving 26 programmed deceptive examinees and 68 programmed truthful examinees. Results from this study produced an unweighted decision accuracy level of .788, with an inconclusive rate of .155.29

In the second published study of the DLST/TES (Research Division Staff, 1995b) 10 experienced federally trained examiners provided scores for 30 deceptive and 55 truthful laboratory examinations involving a mock espionage scenario. Results from this study produced an unweighted decision

27 Overemphasis on confession confirmation and non-independent criterion has led to criticisms of contamination and overestimation of CQT accuracy.

28 The name Directed Lie Screening Test is used in contexts for which the investigation targets differ from espionage and sabotage.

29 Although all information was included in the published reports, some false-positive errors and inconclusive results were excluded from previously reported statistics for this study. False-positive results were removed from the reported results when the examinee made post-test admissions that would have been viewed as substantive in field settings, causing the results to be viewed as not erroneous. Inconclusive results were removed from the previously reported results because DLST/TES procedures require the immediate re-examination of inconclusive results. It was not possible for the blind scorers to repeat inconclusive examinations, and the investigators elected to describe DLST/TES accuracy without inconclusive results that could not be subject to re-examination. All false-positive and inconclusive results were included in the present results because this was considered a more conservative estimate of criterion accuracy for this technique.


accuracy level of .888, with an inconclusive rate of .009.30

Nelson (In press) used Monte Carlo methods to study the criterion accuracy of the DLST/TES, and reported an unweighted decision accuracy level of .874, with an inconclusive rate of .096. The Monte Carlo space consisted of 50 criterion truthful cases and 50 criterion deceptive cases.

Nelson, Handler, Blalock and Hernández (In press) studied the DLST/TES in a mock espionage scenario with examiner trainees from the Iraqi government. Two scorers, including one experienced federally trained examiner who is also an APA primary instructor, and one international examiner who is an APA member from México, provided blind seven-position scores for 25 programmed deceptive and 24 programmed truthful examinees. Fifty scores were obtained for programmed deceptive examinees and 48 scores were obtained for programmed innocent examinees. The unweighted decision accuracy level was .831 with an inconclusive rate of .092.

Figure 4 shows a mean and standard deviation plot for the seven-position deceptive and truthful scores of DLST/TES exams. No mean and standard deviation data were available for the TES studies completed by the US Department of Defense. A two-way ANOVA showed that the interaction of sampling distribution and criterion status was not significant [F (1,128) = 0.109, (p = .742)], nor was the main effect for sampling distribution [F (1,128) = 0.023, (p = .880)].

The combined results of the four published studies of the DLST/TES with seven-position TDA, weighted for the sample size and the number of scorers, produced a decision accuracy rate of .844, and an inconclusive rate of .088. Reliability of DLST/TES studies completed by the U.S. Department of Defense, calculated via Cohen's Kappa, was reported as .760. The average proportion of pairwise decision agreement for all DLST/TES seven-position studies was .806.
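The weighting scheme described in the data analysis section, in which each study counts in proportion to its number of scored results, can be sketched in Python as a weighted mean. The `StudyResult` structure and the two example studies are hypothetical; this sketch is not the committee's procedure and is not expected to reproduce the combined figures reported in this article, which may rest on additional steps not described here.

```python
from dataclasses import dataclass


@dataclass
class StudyResult:
    """Summary of one study (hypothetical structure)."""
    accuracy: float        # unweighted decision accuracy
    inconclusive: float    # unweighted inconclusive rate
    scored_results: int    # weighting value: number of scored results


def combine(studies):
    """Weighted means of accuracy and inconclusive rate across studies."""
    total = sum(s.scored_results for s in studies)
    accuracy = sum(s.accuracy * s.scored_results for s in studies) / total
    inconclusive = sum(s.inconclusive * s.scored_results for s in studies) / total
    return accuracy, inconclusive


# Hypothetical studies, for illustration only.
studies = [
    StudyResult(accuracy=0.80, inconclusive=0.10, scored_results=100),
    StudyResult(accuracy=0.90, inconclusive=0.20, scored_results=300),
]
combined_accuracy, combined_inconclusive = combine(studies)
print(round(combined_accuracy, 3), round(combined_inconclusive, 3))  # 0.875 0.175
```

With these inputs the larger study pulls the combined values toward its own results, which is the intended effect of weighting by scored results.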

Figure 4. Mean and standard deviation plot for seven-position scores of DLST/TES exams.

30 Two false-positive errors were removed from the previously reported accuracy estimations due to post-test admissions that were deemed by the principal investigator to have been likely to be considered substantive and not erroneous in field settings. One inconclusive result was removed from the previous accuracy estimations. Calculations in this report include all error and inconclusive results.


Directed-lie Screening Test / ESS

Four studies describe the criterion accuracy of the DLST/TES when scored with the ESS.

Nelson and Handler (In press) used Monte Carlo methods to study the criterion accuracy of DLST/TES exams scored with the ESS. The Monte Carlo space consisted of 50 criterion truthful cases and 50 criterion deceptive cases. Unweighted decision accuracy was reported as .831 with an inconclusive rate of .104.

Nelson, Handler and Morgan (In press) used a mock espionage scenario to study the criterion accuracy of ESS scores of DLST/TES exams conducted by seven inexperienced examiner trainees on eight non-naive examinees who were fully conversant with the operation and scoring of PDD examinations including the DLST/TES. Blind scores were obtained for 25 programmed deceptive exams and 24 programmed truthful exams. Unweighted decision accuracy was .854 with an inconclusive rate of .088.

Nelson (In press), using a different Monte Carlo method, compared ESS scores of DLST/TES examinations to seven-position and three-position scores. The Monte Carlo space consisted of 50 criterion truthful cases and 50 criterion deceptive cases. Unweighted decision accuracy of ESS scores was reported as .871, with an inconclusive rate of .048.

Nelson, Handler, Blalock and Hernández (In press) reported the results of blind ESS scores of DLST/TES examinations. Seven-position scores from two scorers, including one experienced federally trained examiner who is also an APA primary instructor, and one international examiner who is an APA member from México, were transformed to ESS scores, including 50 blind scores for 25 programmed guilty and 48 blind scores for 24 programmed innocent examinees. Unweighted decision accuracy was .859, and the unweighted inconclusive rate was .123.

Figure 5 shows a mean and standard deviation plot of the scores from the four DLST/TES ESS studies. A two-way ANOVA showed that the interaction of sampling distribution and criterion status was not significant [F (1,289) = 2.396, (p = .123)], nor was the main effect for sampling distribution [F (3,289) = 0.156, (p = .925)].
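The per-study figures quoted above can be illustrated with a minimal sketch. It assumes that "unweighted" means the simple average of the rates computed separately for criterion-deceptive and criterion-truthful cases, and that inconclusive results are excluded before accuracy is computed. That reading, the function name, and the example counts are illustrative assumptions and are not taken from any of the cited studies.

```python
def unweighted_rates(dec_correct, dec_error, dec_inc,
                     tru_correct, tru_error, tru_inc):
    """Return (unweighted decision accuracy, unweighted inconclusive rate).

    Assumption: each rate is computed within a criterion group and the two
    group rates are then averaged without regard to group size.
    """
    # Decision accuracy per group, excluding inconclusive results.
    dec_acc = dec_correct / (dec_correct + dec_error)
    tru_acc = tru_correct / (tru_correct + tru_error)

    # Inconclusive rate per group, using all results in the group.
    dec_inc_rate = dec_inc / (dec_correct + dec_error + dec_inc)
    tru_inc_rate = tru_inc / (tru_correct + tru_error + tru_inc)

    return (dec_acc + tru_acc) / 2, (dec_inc_rate + tru_inc_rate) / 2


# Hypothetical counts for 50 deceptive and 50 truthful cases.
acc, inc = unweighted_rates(40, 5, 5, 42, 6, 2)
print(f"accuracy={acc:.3f} inconclusive={inc:.3f}")
```

Averaging the two group rates keeps a study with unequal numbers of deceptive and truthful cases from letting the larger group dominate the reported figure.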

Figure 5. Mean and standard deviation plot for ESS scores of DLST/TES exams.
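Several of the studies above transformed seven-position scores to ESS scores. The sketch below shows one way such a transformation could look, assuming the commonly described ESS convention of reducing each score to its sign (three-position) and doubling electrodermal (EDA) scores. The channel labels and example values are hypothetical, and the sketch is not the procedure of any particular cited study.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def to_ess(seven_position_scores):
    """Transform seven-position scores (-3..+3) to ESS-style scores.

    Assumption: every score is reduced to its sign, and EDA scores are
    weighted by 2. Input is a dict mapping channel name to a list of scores.
    """
    ess = {}
    for channel, scores in seven_position_scores.items():
        weight = 2 if channel == "eda" else 1
        ess[channel] = [weight * sign(s) for s in scores]
    return ess


# Hypothetical scores for one relevant question across three charts.
example = {"pneumo": [-1, 0, 2], "eda": [3, -2, 0], "cardio": [1, -1, -3]}
ess_scores = to_ess(example)
total = sum(sum(values) for values in ess_scores.values())
print(ess_scores, total)
```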

[Figure 5 plot: "Mean Deceptive and Truthful Scores from DLST/TES ESS Studies". Series: Deceptive, Truthful. Studies shown: Nelson & Handler; Nelson, Handler & Morgan; Nelson, Handler, Blalock & Hernández; Nelson.]

Polygraph, 2011, 40(4) 220 Ad Hoc Committee on Validated Techniques

The combined decision accuracy level of these studies, weighted for sample size and number of scorers, was .858 with a combined inconclusive rate of .090. Reliability of ESS scores of DLST/TES examinations, reported as the bootstrap mean of the proportion of the pairwise decision agreement, excluding inconclusive results, was .911 for the Nelson, Handler and Morgan (In press) study, and .769 for the Nelson, Handler, Blalock and Hernández (In press) study. The average rate of pairwise decision agreement for these studies was .840.

Federal You-Phase / Seven-position TDA

The Federal You-Phase technique [31] is an event-specific diagnostic technique constructed with two RQs. Two studies describe the criterion accuracy of the Federal You-Phase technique when scored using the seven-position TDA model.

Nelson (2011) used Monte Carlo methods to calculate the criterion accuracy of Federal You-Phase exams when scored with the seven-position TDA model. The Monte Carlo space consisted of 50 criterion truthful cases and 50 criterion deceptive cases. Unweighted decision accuracy was .870, and the unweighted inconclusive rate was .301.

Nelson, Handler, Blalock and Cushman (In press) obtained blind scores from eight inexperienced and two experienced scorers who used the seven-position TDA model to provide 220 scores for a sample of Federal You-Phase examinations (N = 22) selected from the confirmed case archive of the U.S. Department of Defense. Eleven of the cases were confirmed as deceptive, and 11 were confirmed as truthful. Unweighted decision accuracy was .885, with an unweighted inconclusive rate of .108.

Figure 6 shows a mean and standard deviation plot of the scores from the Federal You-Phase seven-position criterion studies. A two-way ANOVA showed that the interaction of sampling distribution and criterion status was not significant [F (1,68) = 0.628, (p = .431)], nor was the main effect for sampling distribution [F (1,68) = 0.001, (p = .977)].

Figure 6. Mean and standard deviation plot for seven-position scores of Federal You-Phase exams.
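Reliability in this report is often given as a bootstrap mean of the proportion of pairwise decision agreement, excluding inconclusive results. The following is a minimal sketch of that idea, assuming that cases are the resampling unit and that a scorer pair is dropped whenever either decision is inconclusive. The report does not spell out these details, so the functions, labels, and data here are illustrative only.

```python
import random
from itertools import combinations

INC = "INC"  # hypothetical label for an inconclusive decision


def pairwise_agreement(cases):
    """Proportion of scorer pairs that agree, ignoring inconclusive decisions.

    `cases` is a list of lists; each inner list holds one decision per scorer.
    """
    agree = total = 0
    for decisions in cases:
        conclusive = [d for d in decisions if d != INC]
        for a, b in combinations(conclusive, 2):
            total += 1
            agree += (a == b)
    return agree / total if total else float("nan")


def bootstrap_mean_agreement(cases, n_boot=1000, seed=0):
    """Mean pairwise agreement over bootstrap resamples of the cases."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(cases) for _ in cases]
        value = pairwise_agreement(resample)
        if value == value:  # skip NaN (no conclusive pairs in the resample)
            estimates.append(value)
    return sum(estimates) / len(estimates)


# Hypothetical decisions from three scorers on three cases.
cases = [["D", "D", "T"], ["T", "T", INC], ["D", "D", "D"]]
observed = pairwise_agreement(cases)
boot_mean = bootstrap_mean_agreement(cases)
print(round(observed, 3), round(boot_mean, 3))
```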

31 Sometimes referred to as the Bi-Zone technique.


The combined decision accuracy level of these studies of Federal You-Phase exams scored with seven-position scoring, weighted for sample size and number of scorers, was .883 with a combined inconclusive rate of .168. Reliability of seven-position scores of Federal You-Phase exams, reported as the bootstrap mean of the proportion of the pairwise decision agreement, excluding inconclusive results, was .852.

Federal You-Phase / ESS

Two studies describe the criterion accuracy of Federal You-Phase examinations when scored with the ESS.

Nelson (2011) used Monte Carlo methods to calculate the criterion accuracy of Federal You-Phase exams when scored with the ESS. The Monte Carlo space consisted of 50 criterion truthful cases and 50 criterion deceptive cases. Unweighted decision accuracy was .897 and the unweighted inconclusive rate was .096.

Nelson, Handler, Blalock and Cushman (In press) reported the criterion accuracy of Federal You-Phase exams using ESS scores that were obtained by transforming 220 blind seven-position scores obtained from eight inexperienced and two experienced scorers who evaluated a sample of Federal You-Phase examinations (N = 22) selected from the confirmed case archive of the U.S. Department of Defense. Eleven of the cases were confirmed as deceptive, and 11 were confirmed as truthful. Unweighted decision accuracy was .906, and the unweighted inconclusive rate was .235.

Figure 7 shows a mean and standard deviation plot of the ESS scores of the Federal You-Phase studies. A two-way ANOVA showed that the interaction of sampling distribution and criterion status was not significant [F (1,68) = 0.155, (p = .695)], nor was the main effect for sampling distribution [F (1,68) = 0.021, (p = .886)].

The combined decision accuracy level of these studies of Federal You-Phase exams, weighted for sample size and number of scorers, was .904 with a combined inconclusive rate of .192, when scored with the ESS TDA method. Reliability, reported as the bootstrap proportion of pair-wise decision agreement excluding inconclusive results, was .897.

Figure 7. Mean deceptive and truthful scores for Federal You-Phase / ESS studies.
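The combined figures for the Federal You-Phase / ESS studies are described as weighted for sample size and number of scorers. The exact weights are not given in this section, so the sketch below simply assumes that each study is weighted by its number of scored results (100 for the Monte Carlo space, 220 for the blind scores). The weighting choice and the helper name are assumptions.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean of `values` using `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# (decision accuracy, inconclusive rate, assumed weight = scored results)
studies = [
    (0.897, 0.096, 100),  # Monte Carlo space of 50 + 50 cases
    (0.906, 0.235, 220),  # 220 blind scores of 22 examinations
]

weights = [n for _, _, n in studies]
combined_accuracy = weighted_mean([a for a, _, _ in studies], weights)
combined_inconclusive = weighted_mean([i for _, i, _ in studies], weights)
print(round(combined_accuracy, 3), round(combined_inconclusive, 3))
```

Under this assumed weighting the sketch gives roughly .903 and .192, close to the reported .904 and .192 but not identical, so the committee's actual weighting may differ in detail.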


Federal ZCT / Seven-position TDA

The Federal ZCT is an event-specific diagnostic technique constructed with three RQs. Three studies describe the criterion accuracy of the Federal ZCT when scored using the seven-position TDA model.

Blackwell (1998) described the criterion accuracy of 100 confirmed Federal ZCT exams, of which 65 examinations were confirmed as deceptive while 35 exams were confirmed as truthful. A total of 195 scored results were obtained for criterion deceptive exams, and 105 scored results were obtained for criterion truthful exams. Three experienced federally trained examiners scored all of the cases using the seven-position TDA method [32]. Unweighted decision accuracy was .793, and the unweighted inconclusive rate was .159.

Krapohl and Cushman (2006) reported the criterion accuracy of Federal ZCT exams scored by a cohort of 10 experienced examiners, each of whom scored 50 confirmed deceptive field examinations and 50 confirmed truthful exams selected from the U.S. Department of Defense confirmed case archive. A total of 1,000 scored results were obtained. Unweighted decision accuracy was .852, and the unweighted inconclusive rate was .198.

Honts, Amato and Gordon (2004) reported the criterion accuracy of Federal ZCT exams that were evaluated by three scorers who used the federal seven-position TDA model. A total of 72 scores were obtained for 24 criterion deceptive exams, and 72 scores were obtained for 24 criterion truthful exams. Unweighted decision accuracy was .958, and the unweighted inconclusive rate was .042.

Figure 8 shows a mean and standard deviation plot of the seven-position scores of the Federal ZCT studies. A two-way ANOVA showed that the interaction of sampling distribution and criterion status was not significant [F (1,204) = 0.706, (p = .402)], nor was the main effect for sampling distribution [F (1,204) = 0.004, (p = .951)].

Figure 8. Mean deceptive and truthful scores for Federal ZCT exams with seven-position TDA.

32 The older, pre-2006, Federal TDA model employed more features than the presently used evidence-based Federal TDA model. However, Kircher et al. (2005) reported that experienced examiners tend to violate rules that do not work and tend to emphasize procedures that do work. Therefore it is possible that the scores of these examiners reflect current training and field practices more closely than might be initially assumed.


The combined decision accuracy level of these seven-position TDA studies of Federal ZCT exams, weighted for sample size and number of scorers, was .860 with a combined inconclusive rate of .171. Reliability of seven-position scores of Federal ZCT exams, reported as the Fleiss' kappa statistic for categorical decisions of multiple raters, was .570, and the pairwise proportion of decision agreement excluding inconclusive results was .800.

Federal ZCT / Seven-position TDA with evidentiary decision rules

Two studies describe the criterion accuracy of the Federal ZCT technique when scored using the seven-position TDA model and evidentiary decision rules.

Krapohl and Cushman (2006) described the criterion accuracy of 100 Federal ZCT exams that were scored by 10 experienced examiners using the seven-position TDA model and evidentiary decision rules [33]. A total of 1,000 scored results were obtained. Examinations were selected from the U.S. Department of Defense confirmed case archive. Fifty of the examinations were confirmed as deceptive, and 50 of the exams were confirmed as truthful. Unweighted decision accuracy was .872, and the unweighted inconclusive rate was .073.

Nelson and Krapohl (2011) reported the criterion accuracy of 60 Federal ZCT exams that were evaluated by six experienced federally trained scorers. Thirty of the examinations were confirmed as deceptive and 30 exams were confirmed as truthful. Each scorer evaluated a random subset of 10 exams. Results were evaluated using the Federal seven-position TDA model and evidentiary decision rules. Unweighted decision accuracy was .870, and the unweighted inconclusive rate was .100.

Figure 9 shows a mean and standard deviation plot of the seven-position scores of the Federal ZCT studies, using evidentiary rules. A two-way ANOVA showed that the interaction of sampling distribution and criterion status was not significant [F (1,146) = 0.001, (p = .981)], nor was the main effect for sampling distribution [F (1,146) = 0.046, (p = .830)].
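Reliability for the Federal ZCT seven-position studies is reported as a Fleiss' kappa statistic for the categorical decisions of multiple raters. A minimal sketch of the standard Fleiss' kappa calculation follows. It assumes every exam is rated by the same number of scorers, and the example tables are hypothetical rather than data from the cited studies.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    table[i][j] is the number of raters who assigned subject i to category j.
    Every row must sum to the same number of raters.
    """
    n_subjects = len(table)
    n_raters = sum(table[0])
    if any(sum(row) != n_raters for row in table):
        raise ValueError("each subject must be rated by the same number of raters")

    # Observed agreement for each subject, then averaged over subjects.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_i) / n_subjects

    # Expected agreement by chance from the marginal category proportions.
    n_categories = len(table[0])
    p_j = [
        sum(row[j] for row in table) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical columns: deceptive, truthful, inconclusive; three scorers per exam.
perfect = [[3, 0, 0], [0, 3, 0], [0, 0, 3]]
mixed = [[2, 0], [1, 1]]
print(fleiss_kappa(perfect), fleiss_kappa(mixed))
```

Because kappa corrects for chance agreement, it can sit well below the raw proportion of pairwise agreement reported alongside it, as with the .570 and .800 figures above.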

Figure 9. Mean deceptive and truthful scores for Federal ZCT exams with seven-position TDA and evidentiary decision rules.

33 Krapohl (2005) and Krapohl and Cushman (2006) showed that evidentiary decision rules can substantially reduce inconclusive rates without a corresponding loss of overall decision accuracy.


The combined decision accuracy level of these seven-position TDA studies of Federal ZCT exams with the seven-position TDA method and evidentiary decision rules, weighted for sample size and number of scorers, was .872 with a combined inconclusive rate of .075. Reliability, calculated as the bootstrap average of pairwise decision agreement excluding inconclusive results, was .870.

Integrated Zone Comparison Technique

The Integrated Zone Comparison Technique (IZCT) (Gordon et al., 2000) is a proprietary event-specific diagnostic technique scored with the Horizontal Scoring System (Gordon, 1999) [34]. Two studies describe the criterion accuracy of this technique.

Gordon et al. (2005; also described in Mohamed et al., 2006) reported the results of a pilot study involving six guilty and five innocent subjects [35] who participated in a laboratory scenario involving a mock shooting incident. Decision accuracy was reported as 1.000, with an unweighted inconclusive rate of .100.

Shurani and Chaves (2010) reported the results of a survey of 84 field examinations conducted with the IZCT, including 44 scores for confirmed deceptive examinees and 40 scores for confirmed truthful examinees. All examinations were reportedly verified by confessions, with extrapolygraphic evidence extant for some exams. Unweighted decision accuracy was .988, with an unweighted inconclusive rate of .061. No reliability statistics were reported for this study, and the committee was unable to calculate interrater reliability from the available data.

Shurani (2011) reported the results of a field study involving three examiners from Costa Rica who used the IZCT along with an additional experimental technique. The sample consisted of 73 cases for which all possible suspects were tested. Forty-eight cases were confirmed, resulting in N = 188 examinations that were conducted using the IZCT with three and four RQs [36]. Two inconclusive results were removed from the reported results. No information was reported regarding the number of exams conducted with three or four RQs. However, data were provided to the committee for 84 examinations reportedly conducted using three RQs, including scores for 36 deceptive cases and 48 truthful cases. Scores for the remaining 104 exams were not made available. No sampling mean or standard deviations were reported, and the committee was unable to compare the means of the sample data provided to the committee with any published information. Results of this study were reported with perfect accuracy and zero inconclusive findings. No reliability statistics were reported for this study, and the committee was unable to calculate interrater reliability from the available data.

Figure 10 shows a mean and standard deviation plot of the sampling distributions of the IZCT studies. A two-way ANOVA showed a significant interaction between the sampling distribution and case status [F (1,173) = 533.771, (p < .001)]. Post-hoc one-way ANOVAs showed that the sampling differences for deceptive cases were not significant. However, the difference in truthful scores for the three samples was significant [F (2,33) = 21.402, (p = .014)]. Truthful scores were significantly greater for the Shurani and Chaves (2010) and Shurani (2011) studies compared to the Gordon et al. (2005) study.

34 A rank order scoring system based on unique developer-devised measurement features.

35 The original pilot study design included six innocent subjects; however, one truthful subject made a false confession to the examiner (Gordon, personal communication July 6, 2011), who was also the primary author of the Gordon et al. (2005) study and developer of the IZCT. Inclusion of the false-positive (false-confession) error case would have resulted in less than perfect accuracy.

36 No published description exists for the use of the IZCT with four RQs. Because the IZCT is scored using a rank order paradigm, inclusion of additional RQs without the inclusion of an equivalent number of additional CQs can be expected to differentially affect the rank-sum scores of relevant and CQs. No published studies described or investigated these statistical complexities.


It was later learned that the Gordon et al. (2005) study was conducted using single-issue IZCT exams while the Shurani and Chaves (2010) sample cases were conducted using multi-facet IZCT exams. It is not clear whether this difference accounts for the significant interaction and differences observed in these sampling distributions [37]. A two-way ANOVA comparison, scores x sampling distribution, of the sampling distributions from the Shurani and Chaves (2010) and Shurani (2011) samples revealed a significant interaction [F (1,164) = 43.140, (p < .001)], suggesting that the scores of deceptive and truthful cases were expressed or interpreted differently in the Shurani and Chaves (2010) and Shurani (2011) study samples. One-way differences were not significant. Scores for the Shurani (2011) study were further from zero than the scores for the Shurani and Chaves (2010) study for both deceptive and truthful cases.

Figure 10. Mean deceptive and truthful scores for IZCT samples.
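The post-hoc comparisons above rely on one-way ANOVAs across study samples. As a minimal sketch of how the F statistic and its degrees of freedom arise, the function below computes a one-way ANOVA from raw scores. The three groups are hypothetical stand-ins for truthful scores from three samples, not data from the IZCT studies.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on raw scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical truthful scores from three samples.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b},{df_w}) = {f_stat:.3f}")
```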

The combined decision accuracy level of these IZCT studies, weighted for sample size and number of scorers, was .994 with a combined inconclusive rate of .033.

No reliability statistics were reported for any of the IZCT studies, and the committee was unable to calculate interrater reliability from the available data.

Matte Quadri-track Zone Comparison Technique

The Matte Quadri-track Zone Comparison Technique (MQTZCT) (Matte & Reuss, 1989) is a proprietary event-specific, single-issue diagnostic technique scored using a modification of the Backster numerical system. Three studies describe the criterion accuracy of the MQTZCT.

37 In a practical sense, differences in assumptions about independence and non-independence among the test questions will result in the use of different decision rules, and these differences may have had a biasing effect on case confirmation and sample selection for these field studies. Rank order scores for all RQs are always relative to all other relevant and comparison test stimuli. Rank order scores are therefore inherently non-independent, and the mathematical justification for the application of a rank-order scoring model to multi-facet exams, for which the decision rules are based on the assumption of independence, is not clear. This non-trivial statistical and decision theoretical complication has not been adequately discussed or studied.
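The non-independence of rank order scores noted in footnote 37 can be seen in a small sketch. It uses a generic rank transform with average ranks for ties, not the Horizontal Scoring System itself, and the response magnitudes are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


# Hypothetical response magnitudes for four test stimuli.
baseline = [4.0, 7.0, 5.5, 6.0]
changed = [4.0, 7.0, 9.0, 6.0]  # only the third stimulus differs

print(ranks(baseline))
print(ranks(changed))
```

Changing a single response alters the ranks assigned to other stimuli, while the sum of ranks stays fixed at n(n + 1)/2, which is why rank scores for different RQs cannot be treated as independent.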


Matte and Reuss (1989) reported the results of 64 deceptive and 58 truthful cases that were confirmed through combinations of confession and other evidence. Unweighted decision accuracy was reported as a perfect 1.000, with an unweighted inconclusive rate of .059.

Mangan, Armitage and Adams (2008) reported the criterion accuracy of a survey of 91 deceptive cases and 45 truthful cases that were confirmed via examinee confession. Decision accuracy was again reported as a perfect 1.000, with an unweighted inconclusive rate of .011.

Shurani, Stein and Brand (2009) reported the criterion accuracy of a survey of 28 deceptive and 29 truthful cases that were confirmed by confession along with additional evidence for some cases. Decision accuracy was reported as .964, with zero inconclusive results.

Figure 11 shows a mean and standard deviation plot of the subtotal scores [38] of the sampling distributions of the three MQTZCT studies [39]. A two-way ANOVA revealed a significant interaction between the sampling distribution and case status [F (1,261) = 361.605, (p < .001)]. Although the different studies appeared to handle deceptive and truthful cases with different effectiveness, post-hoc one-way ANOVAs showed that the differences in scores were not significant for deceptive cases [F (2,141) = 0.389, (p = .678)] or for truthful cases [F (2,122) = 0.264, (p = .768)].

Figure 11. Mean deceptive and truthful per-chart scores for MQTZCT samples.

38 Scores for MQTZCT exams are reported as the subtotal per chart, obtained by summing all numerical scores within each chart. Subtotals described elsewhere in this report involve the between-chart RQ subtotals, obtained by summing the numerical scores for each RQ for all charts.

39 Data initially provided to the ad hoc committee for the Mangan, Armitage and Adams (2008) and Shurani and Chaves (2009) studies included only those scores for which the scorers achieved the correct result, and did not include scores for inconclusive or erroneous results. Missing scores were later provided to the committee for both the Mangan, Armitage and Adams (2008) and Shurani, Stein and Brand (2009) studies. However, the resulting sampling distributions were different from those reported for both studies. Because of these troublesome discrepancies, the statistical analysis was not re-calculated with the missing scores, and the reported analysis reflects the mean scores as reported by Mangan, Armitage and Adams (2008) and Shurani and Chaves (2009). The result of this confound is that sampling distributions, as reported, should be considered systematically devoid of error or unexplained variance, and therefore not generalizable.


The combined decision accuracy level of these MQTZCT studies, weighted for sample size and number of scorers, was .994 with a combined inconclusive rate of .029. Reliability for MQTZCT exams was reported by Matte and Reuss (1989) as .990 [40].

Utah ZCT – Probable Lie Test

The Utah ZCT Probable Lie Test [41], also referred to as the Utah Probable Lie Test (PLT) (Handler 2006; Handler & Nelson, 2008), and the Utah numerical scoring system (Bell et al., 1999; Handler & Nelson, 2008) were developed by researchers at the University of Utah as a modification of the Backster ZCT (Backster, 1963). Two studies describe the criterion accuracy of the Utah PLT.

Honts, Raskin and Kircher (1987) reported the results of 10 programmed deceptive and 10 programmed truthful examinees in a study of polygraph countermeasures [42]. Unweighted decision accuracy of blind numerical scores was .889, with an inconclusive rate of .150 [43].

Kircher and Raskin (1988) reported the results from two scorers, both of whom scored 50 programmed deceptive and 50 programmed truthful examinees in a laboratory study. A total of 200 scored results were obtained. Unweighted decision accuracy of blind numerical scores was .935, with an inconclusive rate of .070.

Figure 12 shows a mean and standard deviation plot of the scores of the sampling distributions of the included Utah PLT studies. A two-way ANOVA showed that the interaction of sampling distribution and criterion status was not significant [F (1,63) = 1.682, (p = .200)], nor was the main effect for sampling distribution [F (1,63) = 0.108, (p = .743)].

The combined decision accuracy level of these Utah PLT studies, weighted for sample size and number of scorers, was .931 with a combined inconclusive rate of .077. Reliability for Utah PLT exams, expressed as the average of kappa statistics for the two studies, was .730, with a pairwise rate of overall decision agreement, excluding inconclusive results, of .975.

40 This statistic was published in the Matte and Reuss (1989) reprint of the dissertation published in the journal Polygraph, but cannot be located in the original dissertations study for the no longer extant Columbia Pacific University.

41 Developers of the Utah technique appear to have given little concern to the name of the test question format, and this format has also been referred to as the Utah 3-question version and the Utah PLT. Polygraph field examiners have used the term Utah ZCT because of the obvious similarities with other ZCT variants. The term Utah ZCT is used in this document to aid in the recognition of the procedural and practical similarities between this technique and other three-question ZCT formats intended for single-issue event-specific testing.

42 Only the non-countermeasure control group cases are included in this analysis.

43 Honts, Raskin and Kircher (1987) reported mean scores but were not required by editorial and publication standards to report standard deviations for the sampling distributions of truthful and deceptive scores at the time of publication. Because data were no longer available to calculate these missing statistics, a blunt estimate of the pooled standard deviation was calculated from the reported t-value for the level of significance of the difference between truthful and deceptive scores.


Figure 12. Mean deceptive and truthful total scores for Utah PLT studies.

Utah ZCT – Directed Lie Test

The Utah ZCT Directed Lie Test (DLT) is a variant of the Utah PLT, using directed-lie CQs in place of probable-lie questions. Two studies describe the criterion accuracy of Utah DLT exams.

Honts and Raskin (1988) reported the criterion accuracy of Utah DLT exams of 25 criminal suspects, including 12 deceptive and 13 truthful persons, whose examinations were later confirmed by confession, evidence, the confession of an alternative suspect, or the retraction of an allegation. Unweighted decision accuracy of blind numerical scores was .958, with an inconclusive rate of .077 [44].

Horowitz, Kircher, Honts and Raskin (1997) reported the results of 15 programmed deceptive and 15 programmed truthful examinees who participated in a laboratory experiment. Unweighted decision accuracy of blind numerical scores was .856, with an inconclusive rate of .067 [45].

Figure 13 shows a mean and standard deviation plot of the scores of the sampling distributions of the included Utah DLT studies. A two-way ANOVA showed that the interaction of sampling distribution and criterion status was not significant [F (4,51) = 0.705, (p = .592)], nor was the main effect for sampling distribution [F (2,51) = 0.009, (p = .991)].

The combined decision accuracy level of these Utah DLT studies, weighted for sample size and number of scorers, was .902 with a combined inconclusive rate of .073. Reliability for Utah DLT exams, expressed as the average of Pearson correlation coefficients for the included studies, was .930.

44 Honts and Raskin (1988) reported mean scores but were not required by editorial and publication standards to report standard deviations for the sampling distributions of truthful and deceptive scores at the time of publication. Because data were no longer available to calculate these missing statistics, a blunt estimate of the pooled standard deviation was calculated from the reported F-ratio for the level of significance of the difference between truthful and deceptive scores.
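Footnotes 43 and 44 describe estimating a pooled standard deviation from a reported t-value or F-ratio. The committee's exact calculation is not given, so the sketch below assumes the standard independent-samples relationship t = (m1 - m2) / (s_p * sqrt(1/n1 + 1/n2)), and the identity F = t^2 for a two-group comparison. The means, test statistics, and group sizes in the example are hypothetical.

```python
import math


def pooled_sd_from_t(mean_a, mean_b, t_value, n_a, n_b):
    """Back out the pooled SD from a two-sample t-value (assumed formula)."""
    return abs(mean_a - mean_b) / (abs(t_value) * math.sqrt(1 / n_a + 1 / n_b))


def pooled_sd_from_f(mean_a, mean_b, f_ratio, n_a, n_b):
    """Same estimate from an F-ratio, using F = t**2 for two groups."""
    return pooled_sd_from_t(mean_a, mean_b, math.sqrt(f_ratio), n_a, n_b)


# Hypothetical truthful and deceptive means with 10 examinees per group.
sd_t = pooled_sd_from_t(10.0, 4.0, 3.0, 10, 10)
sd_f = pooled_sd_from_f(10.0, 4.0, 9.0, 10, 10)
print(round(sd_t, 3), round(sd_f, 3))
```

An estimate recovered this way inherits any rounding in the published test statistic, which is consistent with the footnotes' description of it as a blunt estimate.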

45 Mean and standard deviation statistics were measured to the nearest 1/2 point from Figure 1 in the Horowitz, Kircher, Honts and Raskin (1997) study report.


Figure 13. Mean deceptive and truthful total scores for Utah DLT studies.

Utah ZCT – Canadian Police College/Royal Canadian Mounted Police Version

The Canadian Police College (CPC) and the Royal Canadian Mounted Police (RCMP) have developed a variant of the Utah PLT, referred to as the RCMP Zone or the CPC Series A exam. Three studies describe the criterion accuracy of the Utah RCMP Series A exam.

Honts, Hodes and Raskin (1985) reported the criterion accuracy of Utah PLT exams using the test question sequence of the RCMP Series A exam, including 19 deceptive and 19 truthful cases. Unweighted decision accuracy of blind numerical scores was .833, with an inconclusive rate of .237.

Driscoll, Honts and Jones (1987) reported the criterion accuracy of Utah PLT exams, using the results of 20 programmed deceptive and 20 programmed truthful examinees who were recruited from a group counseling program at a Veterans Center, using the test question sequence of the RCMP Series A exam. Decision accuracy was reported as 1.000, with an unweighted inconclusive rate of .100.

Honts (1996) reported the results of a survey of criterion accuracy of field examinations conducted by Canadian law enforcement officers using the RCMP version of the Utah PLT. Twenty-one of the cases were confirmed as deceptive, and 11 of the cases were confirmed as truthful. Unweighted decision accuracy of blind numerical scores was .969, with an inconclusive rate of .210.

Figure 14 shows a mean and standard deviation plot of the scores of the sampling distributions of the included Utah CPC-RCMP studies. A two-way ANOVA showed that neither the interaction of sampling distributions and criterion status [F (1,99) = 0.562, (p = .455)], nor the main effect for sampling distribution [F (1,99) = 0.109, (p = .742)] was statistically significant.

The combined decision accuracy level of these Utah CPC-RCMP studies, weighted for sample size and number of scorers, was .939 with a combined inconclusive rate of .183. Reliability for Utah RCMP exams was reported by Honts (1996) as Kappa = .480 for categorical decision agreement adjusted for chance agreement. The average pairwise Pearson correlation coefficient for numerical scores of the included studies was .940, and the average proportion of decision agreement, excluding inconclusive results, was .883.


Figure 14. Mean deceptive and truthful total scores for Utah CPC-RCMP studies.

Utah ZCT – Combined PLT, DLT and RCMP Studies

Figure 15 shows a mean and standard deviation plot for the three variants of the Utah PLT. A two-way ANOVA showed that the interaction between the test variant and criterion status was not significant [F (1,246) = 2.553, (p = .111)], nor was the main effect for sampling distribution [F (2,246) = 0.02, (p = .980)]. Because the interaction was approaching a significant level, one-way post-hoc ANOVAs were also completed. Differences between the sampling distributions were not significant for the deceptive scores [F (2,100) = 0.042, (p = .959)] or for the truthful scores [F (2,100) = 0.008, (p = .992)].

Figure 15. Mean deceptive and truthful total scores for three variants of the Utah ZCT.


Unweighted decision accuracy for Blalock, Cushman and Nelson (2009), seven included studies pertaining to the three in a replication study, reported the criterion variants of the Utah technique, weighted for accuracy of a group of nine examiner trainees sample size and number of scorers, was .930, who used the ESS to evaluate a sample of 100 with an unweighted inconclusive rate of .107. exams selected from the U.S. Department of Reliability statistics were averaged for all Defense confirmed case archive. Fifty of the included Utah ZCT studies, and produced an examinations were confirmed as deceptive, average reliability statistic of kappa = .647. and 50 of the exams were confirmed as The average rate of decision agreement truthful. A total of 900 scored results were excluding inconclusive results was .958, and obtained. Unweighted decision accuracy of the average Pearson correlation coefficient for blind numerical scores was .870, with an numerical scores was .913. inconclusive rate of .138.

Event-Specific ZCT / ESS Nelson, Blalock, Oelrich and Cushman (2011), reported the results of a reliability The ESS is an evidence-based TDA study involving 25 experienced examiners who model that includes normative data for ZCT used the ESS to evaluate a sample of 10 examinations and other PDD techniques. examinations selected from the U.S. Because ESS transformations are non- Department of Defense confirmed case parametric, ESS scores are sensitive to archive. Six of the cases were confirmed as differences in response magnitude yet robust deceptive, and four cases were confirmed as against differences in the linearity of response truthful. A total of 250 scored results were magnitude. obtained. The pairwise proportion of decision agreement was .950, and the unweighted Nelson et al. (2011) reported a average of correct decisions excluding summary of five previous criterion accuracy inconclusive results was .958. The studies of ESS scores of ZCT examinations, unweighted inconclusive rate was .102. including results reported by Nelson, Krapohl and Handler (2008), Blalock, Cushman and Handler, Nelson, Goodson and Hicks Nelson (2009), Nelson, Blalock, Oelrich and (2010) reported the criterion accuracy of 19 Cushman (2011), Handler, Nelson, Goodson examiner trainees from the México Policía and Hicks (2010) and Nelson and Krapohl Federal, who used the ESS to evaluate 100 (2011). These studies included 5,192 scored examinations selected from the U.S. results from 140 scorers who evaluated 732 Department of Defense confirmed case individual examinations. Those results archive. Fifty of the examinations were consisted of 2,671 scored results of 384 confirmed as deceptive, and 50 of the exams confirmed deceptive examinations, and 2,521 were confirmed as truthful. A total of 1,900 scored results of 348 confirmed truthful scored results were obtained. Unweighted exams. Examinations included both Federal decision accuracy of blind numerical scores ZCT and Utah ZCT exams. 
Unweighted was .901, with an inconclusive rate of .040. decision accuracy of these scores, excluding inconclusive results was .921, and the Nelson and Krapohl (2011) reported unweighted inconclusive rate was .098. the criterion accuracy of transformed ESS scores from six experienced federally trained Nelson, Krapohl and Handler (2008) examiners who evaluated a sample of 60 reported the criterion accuracy of ESS scores examinations selected from the U.S. of seven inexperienced examiner trainees who Department of Defense confirmed case used the ESS to evaluate a sample of 100 archive. Each examiner scored 10 cases. exams selected from the U.S. Department of Thirty of the examinations were confirmed as Defense confirmed case archive. Fifty of the deceptive, and 30 of the exams were examinations were confirmed as deceptive, confirmed as truthful. Unweighted decision and 50 of the exams were confirmed as accuracy of blind numerical scores was .913, truthful. A total of 700 scored results were with an unweighted inconclusive rate of .020. obtained. Unweighted decision accuracy of blind numerical scores was .872, with an Remaining scores of the Nelson et al. inconclusive rate of .103. (2011) results consisted of 1,382 scored

Polygraph, 2011, 40(4) 232 Ad Hoc Committee on Validated Techniques

results of 572 individual examinations. These results consisted of 741 scored results of 304 confirmed deceptive examinations, and 641 scored results of 268 confirmed truthful exams. Data from the Krapohl and Cushman (2006) study were scored using an automated version of the ESS, including 50 confirmed deceptive examinations and 50 confirmed truthful exams selected from the U.S. Department of Defense confirmed case archive. These exams were also scored by a cohort of 11 examiner trainees from the Colombian Army Counterintelligence Unit, who used the ESS in pairs of two and three examiners to score 10 cases each. Data from the holdout sample used by Krapohl and McManus (1999) were also scored using an automated version of the ESS, including 30 confirmed deceptive examinations and 30 confirmed truthful exams. This holdout sample was also evaluated by a cohort of 35 scorers from Romania, consisting of 15 international polygraph examiners and 20 researchers, psychologists and graduate students, from the University of Iasi in Romania, who used the ESS while working in teams to score subsets of 10 cases each. The holdout sample was also scored by a cohort of 12 examiner trainees from the Panama National Police who worked in teams to score subsets of 10 cases each. In addition, seven examiner trainees from police agencies in the state of Ohio used the ESS to score subsets of 10 cases each from the holdout sample. One subset of 10 cases was scored by two of the Ohio police trainees.

Numerical scores from the Kircher, Kristjiansson, Gardner and Webb (2005) study (N = 80) were transformed to ESS scores, including 40 scores of confirmed deceptive exams and 40 scores of confirmed truthful exams. Data from the OSS development sample (Krapohl, 2002; Krapohl & McManus, 1999; Nelson, Krapohl & Handler, 2008) were evaluated using an automated model of the ESS, including 149 scores for confirmed deceptive exams and 143 scores for confirmed truthful exams. Seven-position scores from two experts who participated in the Kircher and Raskin (1988) study were transformed to ESS scores, including 100 scores for 50 examinations of programmed deceptive examinees, and 100 scores for 50 exams conducted on programmed truthful examinees. Finally, seven-position scores from three expert scorers who evaluated the cases for the Blackwell (1998) study were transformed to ESS scores, including 195 scores for 65 confirmed deceptive examinations, and 105 scores for 35 confirmed truthful exams.

Figure 16 shows a mean and standard deviation plot of the scores of the sampling distributions of the included ZCT ESS studies. A two-way ANOVA showed that the interaction

Figure 16. Mean deceptive and truthful total scores for ZCT ESS studies.


of sampling distribution and criterion status was not significant [F (1,215) = 0.205, (p = .651)], nor was the main effect for sampling distribution [F (5,215) = 0.164, (p = .976)].

The combined decision accuracy level of the included ZCT ESS studies, weighted for sample size and number of scorers, was .922 with a combined inconclusive rate of .098, as reported by Nelson et al. (2011). Reliability statistics for ZCT ESS studies were averaged to produce a pairwise proportion of decision agreement, excluding inconclusive results, of .950, with average Kappa = .585.

Criterion Accuracy For All Validated Techniques

A one-way ANOVA for unweighted decision accuracy showed that differences in unweighted decision accuracy for these 14 PDD techniques were significant [F (13,5119) = 2.753, (p < .001)]. One-way ANOVAs for case status showed that differences in correct decisions were significant for both criterion deceptive cases [F (13,2494) = 1.982, (p = .019)] and criterion truthful cases [F (13,2542) = 2.764, (p < .001)]. Figure 17 shows the mean and confidence intervals for the unweighted accuracy of 14 techniques included in the meta-analysis.
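The pairwise decision agreement and Kappa statistics reported for the ZCT ESS studies can be illustrated with a minimal sketch. It assumes Cohen's kappa for pairs of scorers and averages agreement over all scorer pairs after dropping cases that either scorer called inconclusive; the report does not state which kappa variant or averaging scheme was used, and the decisions below are hypothetical, not study data.

```python
from itertools import combinations

INC = "INC"  # label for an inconclusive result (assumed coding)


def conclusive_pairs(a, b):
    """Keep only the cases on which both scorers reached a conclusive decision."""
    return [(x, y) for x, y in zip(a, b) if x != INC and y != INC]


def pairwise_agreement(decisions_by_scorer):
    """Average, over all scorer pairs, of the proportion of matching decisions."""
    proportions = []
    for a, b in combinations(decisions_by_scorer, 2):
        pairs = conclusive_pairs(a, b)
        if pairs:
            proportions.append(sum(x == y for x, y in pairs) / len(pairs))
    return sum(proportions) / len(proportions)


def cohen_kappa(a, b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    pairs = conclusive_pairs(a, b)
    n = len(pairs)
    observed = sum(x == y for x, y in pairs) / n
    labels = {label for pair in pairs for label in pair}
    expected = sum(
        (sum(x == label for x, _ in pairs) / n)
        * (sum(y == label for _, y in pairs) / n)
        for label in labels
    )
    return (observed - expected) / (1 - expected)


# Hypothetical decisions for 10 cases: D = deceptive, T = truthful
scorer_1 = ["D", "D", "D", "T", "T", "T", "D", "T", INC, "D"]
scorer_2 = ["D", "D", "T", "T", "T", "T", "D", "T", "D", "D"]
scorer_3 = ["D", "D", "D", "T", "T", "D", "D", "T", "D", INC]

agreement = pairwise_agreement([scorer_1, scorer_2, scorer_3])
kappa_12 = cohen_kappa(scorer_1, scorer_2)
print(f"pairwise agreement = {agreement:.3f}, kappa(1,2) = {kappa_12:.3f}")
```

Kappa is lower than raw agreement because it discounts the agreement two scorers would reach by chance given their marginal rates of deceptive and truthful calls, which is one reason a study can report agreement of .950 alongside a much lower Kappa.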

Figure 17. Mean and confidence intervals for unweighted decision accuracy of 14 PDD techniques.

A series of ANOVA contrasts showed that differences were significant only for two PDD techniques, the IZCT and the MQTZCT. Exclusion of these two PDD techniques resulted in no significant differences [F (11,4859) = 0.949, (p = .491)] in the unweighted accuracy for the remaining 12 PDD techniques. One-sample t-tests further confirmed the outlier status of the results of these two techniques: t = 212.268 (p < .001) for both the IZCT and the MQTZCT.⁴⁶ A series of leave-one-out t-tests revealed that none of

46 t-values are the same for both of these techniques because the unweighted mean of weighted sampling means is the same for both techniques (.994).


the other techniques produced outlier results when compared to the results of all other techniques.

APA 2012 criterion validity standards

Table 1 (also shown in the Executive Summary) shows a list of the 14 PDD techniques that satisfied the requirements for inclusion in this meta-analysis at criterion accuracy levels specified in the APA 2012 standard requirements for evidentiary testing, paired-testing, and investigative testing. Also shown in Table 1 are the unweighted decision accuracy and inconclusive rates for each PDD technique. Additional details concerning the sampling distributions and a complete dimensional profile of criterion accuracy for each of these techniques and all included studies can be found in Appendix E.

The combination of all validated PDD techniques, excluding outlier results, produced a decision accuracy of .869 (.036) without inconclusive results. The 95% confidence range was from .798 to .940. The mean inconclusive rate, excluding outlier results, was .128 (.030) with a 95% confidence range of .068 to .187. Aggregated reliability statistics produce a mean Kappa statistic of .642 (.102) with a 95% confidence range of .443 to .842. The mean rate of inter-rater decision agreement, excluding outlier results and excluding inconclusive results, was .901 (.082) with a 95% confidence range from .741 to .999. The mean Pearson correlation coefficient for numerical scores, excluding outlier results, was .876 (.116) with a 95% confidence range of .649 to .999. Table 2 shows the aggregated criterion accuracy profile of all validated CQT PDD techniques, weighted for the sample size and number of scorers.⁴⁷ Also shown in Table 2 is the criterion accuracy profile including outlier results.
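The confidence ranges quoted above appear consistent with a normal approximation, the mean plus or minus 1.96 times the value in parentheses, with the upper limit capped at .999. That reading is an inference from the published numbers rather than a method stated in the report, and the floor of .001 in the sketch below is likewise an assumption based on the limits shown in the tables.

```python
def normal_ci(mean, spread, z=1.96, floor=0.001, ceiling=0.999):
    """Approximate 95% range as mean +/- z * spread, clipped to [floor, ceiling].

    `spread` is the parenthesized value reported next to each mean. The clipping
    bounds are assumptions chosen to match the limits printed in the report.
    """
    lower = max(floor, mean - z * spread)
    upper = min(ceiling, mean + z * spread)
    return lower, upper


# (mean, spread, reported lower, reported upper), excluding outlier results
reported = {
    "decision accuracy": (0.869, 0.036, 0.798, 0.940),
    "inconclusive rate": (0.128, 0.030, 0.068, 0.187),
    "kappa": (0.642, 0.102, 0.443, 0.842),
    "decision agreement": (0.901, 0.082, 0.741, 0.999),
    "Pearson correlation": (0.876, 0.116, 0.649, 0.999),
}

computed = {name: normal_ci(m, s) for name, (m, s, _, _) in reported.items()}
for name, (lower, upper) in computed.items():
    print(f"{name}: {lower:.3f} to {upper:.3f}")
```

The computed limits agree with the reported ones to within about .001, which is the size of difference expected from rounding the inputs to three decimals.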

47 A majority of examinations in field polygraph programs are conducted using PDD techniques that are interpreted with an assumption of criterion independence among the RQs. However, a majority of PDD criterion validity research has been conducted using PDD techniques that are interpreted with an assumption of non-independence. Non-independent examination techniques have greater statistical discrimination power and greater accuracy than independent exam techniques. For this reason, the unweighted average was considered to be a more conservative and generalizable estimate of the overall accuracy of all PDD examination techniques. This was calculated as the unweighted average of the weighted aggregation of independent techniques and the weighted aggregation of non-independent techniques. Calculation of the weighted average would result in an overestimation of accuracy. The unweighted average of PPV and NPV might be more optimistic and flattering under some conditions, but can be expected to be less generalizable to field circumstances when base rates are unknown or different than the base rates in the study samples.
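The dependence of PPV and NPV on base rates, noted in the footnote above, can be sketched with the standard Bayes computation over conclusive results. The inputs are the sensitivity, specificity, and error rates from Table 2 (excluding outlier results); the function name and the choice of base rates are illustrative assumptions, and the report does not state how its own PPV and NPV were aggregated, so small differences from the tabled values are expected.

```python
def predictive_values(sensitivity, specificity, fn_rate, fp_rate, base_rate):
    """PPV and NPV among conclusive results when `base_rate` of cases are deceptive.

    sensitivity / fn_rate: proportion of deceptive cases called deceptive / truthful.
    specificity / fp_rate: proportion of truthful cases called truthful / deceptive.
    Inconclusive results are left out of both numerators and denominators.
    """
    p_deceptive = base_rate
    p_truthful = 1.0 - base_rate
    ppv = (p_deceptive * sensitivity) / (
        p_deceptive * sensitivity + p_truthful * fp_rate
    )
    npv = (p_truthful * specificity) / (
        p_truthful * specificity + p_deceptive * fn_rate
    )
    return ppv, npv


# Table 2, excluding outlier results
rates = dict(sensitivity=0.812, specificity=0.717, fn_rate=0.083, fp_rate=0.144)

by_base_rate = {
    base_rate: predictive_values(base_rate=base_rate, **rates)
    for base_rate in (0.10, 0.50, 0.90)
}
for base_rate, (ppv, npv) in by_base_rate.items():
    print(f"base rate {base_rate:.2f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

At equal base rates the sketch gives values close to the tabled PPV and NPV, while at a low base rate of deception the PPV falls sharply even though sensitivity and specificity are unchanged.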


Table 1. Mean (standard deviation) and {95% confidence intervals} for correct decisions (CD) and inconclusive results (INC) for validated PDD techniques.

Evidentiary Techniques / TDA Method

  Federal You-Phase / ESS¹
    CD  = .904 (.032) {.841 to .966}
    INC = .192 (.033) {.127 to .256}
  Event-Specific ZCT / ESS
    CD  = .921 (.028) {.866 to .977}
    INC = .098 (.030) {.039 to .157}
  IZCT / Horizontal²
    CD  = .994 (.008) {.978 to .999}
    INC = .033 (.019) {.001 to .069}
  MQTZCT / Matte³
    CD  = .994 (.013) {.968 to .999}
    INC = .029 (.015) {.001 to .058}
  Utah ZCT DLT / Utah
    CD  = .902 (.031) {.841 to .962}
    INC = .073 (.025) {.023 to .122}
  Utah ZCT PLT / Utah
    CD  = .931 (.026) {.879 to .983}
    INC = .077 (.028) {.022 to .133}
  Utah ZCT Combined / Utah
    CD  = .930 (.026) {.875 to .984}
    INC = .107 (.028) {.048 to .165}
  Utah ZCT CPC-RCMP Series A / Utah
    CD  = .939 (.038) {.864 to .999}
    INC = .185 (.041) {.104 to .266}

Paired Testing Techniques / TDA Method

  AFMGQT⁴,⁸ / ESS⁵
    CD  = .875 (.039) {.798 to .953}
    INC = .170 (.036) {.100 to .241}
  Backster You-Phase / Backster
    CD  = .862 (.037) {.787 to .932}
    INC = .196 (.040) {.117 to .275}
  Federal You-Phase / 7 position
    CD  = .883 (.035) {.813 to .952}
    INC = .168 (.037) {.096 to .241}
  Federal ZCT / 7 position
    CD  = .860 (.037) {.801 to .945}
    INC = .171 (.040) {.113 to .269}
  Federal ZCT / 7 pos. evidentiary
    CD  = .880 (.034) {.813 to .948}
    INC = .085 (.029) {.028 to .141}

Investigative Techniques / TDA Method

  AFMGQT⁶,⁸ / 7 position
    CD  = .817 (.042) {.734 to .900}
    INC = .197 (.030) {.138 to .255}
  CIT⁷ / Lykken Scoring
    CD  = .823 (.041) {.744 to .903}
    INC = NA
  DLST (TES)⁸ / 7 position
    CD  = .844 (.039) {.768 to .920}
    INC = .088 (.028) {.034 to .142}
  DLST (TES)⁸ / ESS
    CD  = .858 (.037) {.786 to .930}
    INC = .090 (.026) {.039 to .142}

1 Empirical Scoring System.

2 Generalizability of this outlier result is limited by the fact that no measures of test reliability have been published for this technique. Also, significant differences were found in the sampling distributions of the included studies, suggesting that the sample data are not representative of each other, or that the exams were administered and/or scored differently. One of the studies involved a small sample (N = 12) that was reported in two articles, for which the participating scorer was also the technique developer. One of the publications described the study as a non-blind pilot study. Both reports indicated that one of the six truthful participants was removed from the study after making a false confession. The reported perfect accuracy rate did not include the false confession. Neither the perfect accuracy nor the .167 false-confession rate is likely to generalize to field settings.

3 Generalizability of this outlier result is limited by the fact that the developers and investigators have advised the necessity of intensive training available only from experienced practitioners of the technique, and have suggested that the complexity of the technique exceeds that which other professionals can learn from the published resources. The developer reported a near-perfect correlation coefficient of .99 for the numerical scores, suggesting an unprecedented high rate of inter-scorer agreement, which is unexpected given the purported complexity of the method. Additionally, the data initially provided to the committee for replication studies included only those cases for which the scorers arrived at the correct decision, excluding scores from those cases for which the scorers did not achieve the correct decision. Missing scores were later provided to the committee for both the Mangan et al (2008) and Shurani and Chavez (2009) studies. However, the resulting sampling means were different from those reported for both replication studies. Because of these discrepancies, the statistical analysis was not re-calculated with the missing scores, and the reported analysis reflects the sampling distribution means as reported. Sampling means for replication studies should be considered devoid of error or uncontrolled variance.

4 Two versions exist for the AFMGQT, with minor structural differences between them. There is no evidence that the performance of one version is superior to the other. Because replicated evidence would be required to reject a null-hypothesis that the differences are meaningless, and because the selected studies include a mixture of both AFMGQT versions, these results are provided as generalizable to both versions. AFMGQT exams are used in both multi-facet event-specific contexts and multi-issue screening contexts. Both multi-facet and multi-issue examinations were interpreted with decision rules based on an assumption of criterion independence among the RQs.

5 The AFMGQT produced accuracy that is satisfactory for paired testing only when scored with the Empirical Scoring System.

6 There are two techniques for which there are no published studies but which are structurally nearly identical to the AFMGQT: the LEPET and the Utah MGQT. Validity of the AFMGQT can be generalized to these techniques if scored with the same TDA methods.

7 Concealed Information Test, also referred to as the Guilty Knowledge Test (GKT) and Peak of Tension test (POT). The data used here were provided in the meta-analysis report of laboratory research by MacLaren (2001).

8 Studies for these PDD techniques were conducted using decision rules based on the assumption of criterion independence among the testing targets. Accuracy of screening techniques may be further improved by the systematic use of a successive-hurdles approach.


Table 2. Mean (standard deviation) and {95% confidence interval} for criterion accuracy profiles for all validated PDD techniques combined.

                            Excluding outlier results   All included studies
Number of PDD Techniques    12                          14
Number of Studies           39                          45
N Deceptive                 2,067                       2,336
N Truthful                  1,802                       2,031
Total N                     3,869                       4,367
Number Scorers              280                         295
N of Deceptive Scores       5,840                       6,109
N of Truthful Scores        5,399                       5,628
Total Scores                11,239                      11,737
Percent Correct             .869 (.036)                 .887 (.033)
                            {.798 to .940}              {.823 to .951}
Inconclusive                .128 (.030)                 .114 (.028)
                            {.068 to .187}              {.058 to .170}
Sensitivity                 .812 (.056)                 .835 (.051)
                            {.702 to .923}              {.734 to .936}
Specificity                 .717 (.061)                 .751 (.058)
                            {.597 to .838}              {.638 to .864}
FN Errors                   .083 (.038)                 .072 (.035)
                            {.008 to .157}              {.004 to .141}
FP Errors                   .144 (.049)                 .123 (.043)
                            {.048 to .239}              {.039 to .208}
D Inc                       .105 (.042)                 .092 (.037)
                            {.022 to .187}              {.020 to .165}
T Inc                       .151 (.042)                 .136 (.041)
                            {.068 to .234}              {.056 to .216}
PPV                         .854 (.049)                 .874 (.044)
                            {.757 to .950}              {.789 to .960}
NPV                         .899 (.047)                 .911 (.043)
                            {.807 to .990}              {.827 to .995}
D Correct                   .909 (.042)                 .921 (.039)
                            {.826 to .992}              {.844 to .997}
T Correct                   .829 (.056)                 .854 (.049)
                            {.721 to .938}              {.757 to .950}
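The "unweighted" summary rows of Table 2 can be read as simple averages of the corresponding deceptive-case and truthful-case rows. The sketch below checks that reading against the tabled values; it is an observation about the published numbers, not a description of the committee's computation, and the dictionary keys are illustrative names.

```python
def unweighted_average(deceptive_value, truthful_value):
    """Average of the deceptive-case and truthful-case rates, ignoring sample sizes."""
    return (deceptive_value + truthful_value) / 2


# Values copied from Table 2
table_2 = {
    "excluding outlier results": {
        "d_correct": 0.909, "t_correct": 0.829, "percent_correct": 0.869,
        "d_inc": 0.105, "t_inc": 0.151, "inconclusive": 0.128,
    },
    "all included studies": {
        "d_correct": 0.921, "t_correct": 0.854, "percent_correct": 0.887,
        "d_inc": 0.092, "t_inc": 0.136, "inconclusive": 0.114,
    },
}

for label, row in table_2.items():
    accuracy = unweighted_average(row["d_correct"], row["t_correct"])
    inconclusive = unweighted_average(row["d_inc"], row["t_inc"])
    print(f"{label}: accuracy {accuracy:.4f} (reported {row['percent_correct']}), "
          f"inconclusive {inconclusive:.4f} (reported {row['inconclusive']})")
```

Averaging the two case types equally keeps the summary from being dominated by whichever criterion state happens to have more cases in the study samples.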


Table 3 shows the criterion accuracy profile of the weighted aggregation of PDD techniques at the evidentiary, paired testing and investigative levels according to the APA 2012 standards. Also shown in Table 3 is the criterion accuracy profile for evidentiary techniques without the results of the two outlier techniques.

Table 3. Criterion accuracy profiles for evidentiary, paired-testing, and investigative techniques.

                                  Evidentiary         Evidentiary         Paired-testing      Investigative
                                  Techniques          w/o Outlier Studies Techniques          Techniques
Number of Techniques              5                   3                   5                   4
Number of Studies                 21                  15                  12                  12
N Deceptive                       861                 592                 435                 1,040
N Truthful                        776                 547                 408                 847
Total N                           1,637               1,139               843                 1,887
Number Scorers                    174                 159                 56                  65
N of Deceptive Scores             3,297               3,028               1,700               1,112
N of Truthful Scores              3,098               2,869               1,613               917
Total Scores                      6,395               5,897               3,313               2,029
Unweighted Average Accuracy       .910 (.027)         .903 (.028)         .867 (.036)         .844 (.039)
                                  {.857 to .963}      {.847 to .958}      {.796 to .938}      {.767 to .920}
Unweighted Average Inconclusives  .090 (.029)         .095 (.030)         .142 (.036)         .114 (.028)
                                  {.032 to .147}      {.035 to .154}      {.071 to .213}      {.060 to .168}
Sensitivity                       .843 (.050)         .832 (.053)         .828 (.051)         .802 (.047)
                                  {.745 to .941}      {.729 to .935}      {.728 to .928}      {.710 to .893}
Specificity                       .826 (.054)         .816 (.055)         .670 (.071)         .771 (.073)
                                  {.721 to .931}      {.708 to .923}      {.531 to .809}      {.627 to .915}
FN Errors                         .082 (.033)         .089 (.034)         .060 (.032)         .158 (.042)
                                  {.018 to .147}      {.021 to .156}      {.001 to .123}      {.076 to .240}
FP Errors                         .083 (.035)         .090 (.037)         .159 (.052)         .159 (.070)
                                  {.014 to .152}      {.018 to .162}      {.056 to .261}      {.022 to .296}
D Inc                             .080 (.038)         .086 (.041)         .112 (.043)         .038 (.020)
                                  {.005 to .155}      {.004 to .167}      {.028 to .195}      {.001 to .077}
T Inc                             .099 (.044)         .104 (.044)         .170 (.058)         .073 (.015)
                                  {.014 to .185}      {.017 to .191}      {.056 to .284}      {.043 to .102}
PPV                               .915 (.034)         .908 (.037)         .847 (.050)         .860 (.037)
                                  {.848 to .982}      {.836 to .979}      {.749 to .945}      {.788 to .933}
NPV                               .904 (.042)         .898 (.043)         .920 (.046)         .812 (.082)
                                  {.823 to .986}      {.814 to .982}      {.829 to .999}      {.651 to .973}
D Correct                         .911 (.037)         .904 (.039)         .932 (.036)         .837 (.045)
                                  {.839 to .983}      {.828 to .98}       {.862 to .999}      {.749 to .924}
T Correct                         .908 (.038)         .901 (.040)         .804 (.064)         .827 (.072)
                                  {.833 to .983}      {.822 to .980}      {.678 to .930}      {.686 to .968}


Figure 18 shows the mean and statistical confidence intervals for correct decisions and inconclusive results for three levels of criterion validity described by the APA 2012 standards. Two-way ANOVAs, including outlier results, showed a significant interaction between criterion status and validation category for correct decisions [F (1,10896) = 7433.144, (p < .001)] and for inconclusive results [F (1,10896) = 3562.384, (p < .001)], suggesting that techniques at these different categorical levels may handle deceptive and truthful cases with different effectiveness. However, post-hoc one-way ANOVAs showed there was no significant difference in the proportion of correct decisions, inconclusive results, or errors for deceptive or truthful cases.

Figure 18. Mean and confidence interval plot for APA validation categories.

Evidentiary testing techniques

Five PDD techniques were reported to produce both sufficiently high levels of diagnostic accuracy and low inconclusive rates that satisfy the APA 2012 standard requirements for evidentiary testing. Scores from 21 surveys and experiments were summarized to describe the criterion validity of these evidentiary techniques. Studies that support these evidentiary techniques included 174 scorers who provided 7,407 numerical scores for 1,637 confirmed exams, including 3,821 numerical scores for 861 confirmed deceptive examinations and 3,586 numerical scores for 776 confirmed truthful examinations.

Table 4 shows the criterion accuracy profiles for the five PDD techniques that satisfy the APA 2012 requirements for evidentiary/diagnostic testing. Also shown in Table 4 are the number of included studies for each PDD technique, the total number of scored results, reliability along with the mean and standard deviations of the average deceptive and truthful scores of the included studies. Mean test sensitivity, test specificity, and unweighted accuracy have been reported at levels that are statistically significantly greater than chance (50%) for each of these five PDD techniques.


Table 4. Criterion accuracy profiles for evidentiary/diagnostic PDD techniques.

Technique                         Federal         IZCT*           MQTZCT*         Utah PLT        ZCT ESS
                                  You-Phase                                       (combined)
TDA Method                        ESS             Horizontal      Matte           Utah            ESS
Number of Studies                 2               3               3               7               6
N Deceptive                       61              86              183             147             384
N Truthful                        61              93              139             138             348
Total N                           122             179             319             285             732
Number Scorers                    11              8               7               8               140
N of Deceptive Scores             160             86              183             197             2671
N of Truthful Scores              160             93              136             188             2521
Total Scores                      320             179             319             385             5192
Mean D                            -7.512          -21.505         -8.711          -10.885         -10.46
StDev D                           6.184           12.606          2.489           7.878           8.949
Mean T                            6.146           19.626          5.226           9.372           8.219
StDev T                           6.217           4.232           3.479           8.066           8.051
Reliability - Kappa               -               -†              -               .650            0.59
Reliability - Agreement           .900            -†              -               .960            .950
Reliability - Correlation         -               -†              .990‡           .910            -
Unweighted Average Accuracy       .904 (.032)     .994 (.008)     .994 (.013)*    .930 (.028)     .921 (.028)
                                  {.841 to .966}  {.978 to .999}  {.968 to .999}  {.875 to .984}  {.866 to .977}
Unweighted Average Inconclusives  .192 (.033)     .033 (.019)     .029 (.015)     .107 (.030)     .098 (.030)
                                  {.127 to .256}  {.001 to .069}  {.001 to .058}  {.048 to .165}  {.039 to .157}
Sensitivity                       .845 (.052)     .977 (.020)     .967 (.021)     .853 (.049)     .817 (.056)
                                  {.742 to .948}  {.937 to .999}  {.926 to .999}  {.757 to .948}  {.706 to .927}
Specificity                       .757 (.064)     .946 (.035)     .963 (.033)     .809 (.056)     .846 (.051)
                                  {.633 to .882}  {.878 to .999}  {.899 to .999}  {.699 to .918}  {.747 to .946}
FN Errors                         .034 (.026)     .012 (.015)     .011 (.021)     .051 (.031)     .077 (.037)
                                  {.001 to .085}  {.001 to .041}  {.001 to .052}  {.001 to .112}  {.004 to .151}
FP Errors                         .138 (.050)     .001 (.005)     .001 (.015)     .074 (.038)     .064 (.034)
                                  {.039 to .236}  {.001 to .01}   {.001 to .03}   {.001 to .148}  {.001 to .130}
D INC                             .128 (.046)     .012 (.014)     .022 (.001)     .096 (.040)     .106 (.044)
                                  {.037 to .219}  {.001 to .040}  {.022 to .022}  {.017 to .176}  {.020 to .192}
T INC                             .255 (.044)     .054 (.035)     .037 (.029)     .117 (.046)     .089 (.042)
                                  {.170 to .341}  {.001 to .122}  {.001 to .094}  {.027 to .207}  {.008 to .171}
PPV                               .860 (.050)     .999 (.004)     .999 (.015)     .923 (.039)     .931 (.038)
                                  {.761 to .958}  {.991 to .999}  {.970 to .999}  {.847 to .999}  {.857 to .999}
NPV                               .957 (.033)     .989 (.019)     .985 (.021)     .938 (.036)     .912 (.042)
                                  {.892 to .999}  {.952 to .999}  {.944 to .999}  {.867 to .999}  {.830 to .993}
D Correct                         .961 (.029)     .988 (.015)     .989 (.021)     .944 (.034)     .913 (.042)
                                  {.903 to .999}  {.959 to .999}  {.948 to .999}  {.877 to .999}  {.831 to .996}
T Correct                         .846 (.056)     .999 (.005)     .999 (.015)     .916 (.043)     .929 (.037)
                                  {.736 to .956}  {.989 to .999}  {.969 to .999}  {.832 to .999}  {.857 to .999}

* Outlier results that differ significantly from the normal range of the other techniques.

† No reliability data has been published for any of the studies on the IZCT.

‡ A correlation coefficient of .990 is an extraordinary and remarkable finding in any field of research, and suggests an extremely low rate of disagreement between the numerical scores of blind evaluators using the MQTZCT. This statistic cannot be found in the Matte and Reuss (1989) dissertation paper for the now defunct Columbia Pacific University, but was published in the included Matte and Reuss (1989) reprint in Polygraph. Despite this extremely high correlation of numerical scores from different scorers, developers and researchers of the MQTZCT have expressed repeated cautions regarding the lack of generalizability of MQTZCT results without intensive proprietary training.


A two-by-five way ANOVA, criterion status x technique, for correct decisions showed a significant interaction between technique and case status [F (1,7397) = 8944.964, (p < .001)], indicating that these five different evidentiary techniques handled deceptive and truthful cases differently.

Post-hoc one-way ANOVAs showed that differences in the rate of correct decisions for criterion truthful cases were significant [F (4,828) = 3.118 (p = .015)], while differences in the rate of correct decisions for deceptive cases were not significant. A series of ANOVA contrasts showed that differences were significant only for the two outlier techniques. There were no significant differences when the outliers were not included. Figure 19 shows a mean and confidence interval plot for correct decisions and inconclusive results of the five evidentiary techniques.

Figure 19. Means and confidence intervals for evidentiary techniques.
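The one-way ANOVAs reported in this section compare decision rates across techniques. A minimal pure-Python sketch of the F statistic and its degrees of freedom follows; the 0/1 decision data are hypothetical, the function name is illustrative, and the p-value (which requires the F distribution) is omitted. With five techniques the between-groups degrees of freedom are 4, as in the post-hoc result F (4,828) for criterion truthful cases.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual observations around their own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical decisions for three techniques: 1 = correct decision, 0 = error
technique_a = [1, 1, 1, 0, 1, 1, 1, 1]
technique_b = [1, 1, 0, 1, 1, 1, 0, 1]
technique_c = [1, 0, 1, 1, 0, 1, 1, 1]

f_stat, df_between, df_within = one_way_anova(
    [technique_a, technique_b, technique_c]
)
print(f"F ({df_between},{df_within}) = {f_stat:.3f}")
```

A small F such as the one produced here indicates that the differences among group means are no larger than would be expected from the variation within groups.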

Paired-testing techniques

Five techniques were identified as providing a sufficient level of accuracy to satisfy the APA requirement for paired-testing.⁴⁸ Scores from 12 surveys and experiments were summarized to describe the criterion validity of these paired-testing techniques. Studies that support these paired-testing techniques included 56 scorers who provided 3,313 numerical scores for 843 confirmed exams, including 1,700 numerical scores for 435 confirmed deceptive examinations and 1,613 numerical scores for 408 confirmed truthful examinations.

Table 5 shows the criterion accuracy profiles for the five PDD techniques that satisfy the APA 2012 requirements for paired testing. Also shown in Table 5 are the number of included studies for each PDD technique, the total number of scored results, reliability along with the mean and standard deviations of the average deceptive and truthful scores of the included studies.

48 All PDD techniques that meet the APA 2012 standard requirement for evidentiary testing also meet the requirements for paired-testing and investigative testing. Those PDD techniques that meet the criterion accuracy requirement for paired-testing are also sufficiently valid for investigative testing.


Although test sensitivity and unweighted decision accuracy are significantly greater than chance for all five of these paired testing techniques, three of these techniques produced test specificity levels that were not significantly greater than chance (Backster You-Phase/Backster, Federal You-Phase/7-position, & Federal ZCT/7-position).

Table 5. Criterion accuracy profiles for paired-testing techniques.

Technique                         Backster        Federal         Federal ZCT     Federal ZCT     AFMGQT
                                  You-Phase       You-Phase
TDA Method                        Backster        7-position      7-position      7-position      ESS
                                                                                  evidentiary
Number of Studies                 2               2               3               2               3
N Deceptive                       61              61              139             80              94
N Truthful                        61              61              109             80              97
Total N                           122             122             248             160             191
Number Scorers                    8               11              16              16              5
N of Deceptive Scores             127             160             767             530             116
N of Truthful Scores              127             160             677             530             119
Total Scores                      254             320             1,444           1,060           235
Mean D                            -16.055         -7.195          -8.577          -8.263          -2.960
StDev D                           7.417           5.824           9.018           9.032           4.765
Mean T                            5.216           5.999           7.466           7.852           3.738
StDev T                           10.291          5.893           8.472           9.721           4.104
Reliability - Kappa               -               -               .570            -               -
Reliability - Agreement           -               .850            .800            .870            1
Reliability - Correlation         .567            -               -               -               .930
Unweighted Average Accuracy       .862 (.037)     .883 (.035)     .860 (.037)     .880 (.034)     .875 (.039)
                                  {.787 to .932}  {.813 to .952}  {.788 to .931}  {.813 to .948}  {.798 to .953}
Unweighted Average Inconclusives  .196 (.040)     .168 (.037)     .171 (.040)     .085 (.029)     .170 (.036)
                                  {.117 to .275}  {.096 to .241}  {.093 to .249}  {.028 to .141}  {.100 to .241}
Sensitivity                       .836 (.052)     .841 (.050)     .858 (.051)     .804 (.054)     .729 (.065)
                                  {.734 to .938}  {.742 to .939}  {.759 to .957}  {.697 to .911}  {.603 to .856}
Specificity                       .556 (.070)     .632 (.069)     .581 (.073)     .809 (.057)     .700 (.063)
                                  {.418 to .694}  {.497 to .768}  {.438 to .723}  {.698 to .920}  {.577 to .823}
FN Errors                         .007 (.012)     .028 (.023)     .033 (.029)     .110 (.044)     .092 (.046)
                                  {-.016 to .03}  {.001 to .073}  {.001 to .090}  {.024 to .197}  {.002 to .182}
FP Errors                         .207 (.058)     .161 (.051)     .188 (.051)     .109 (.044)     .112 (.047)
                                  {.091 to .322}  {.061 to .261}  {.089 to .287}  {.022 to .196}  {.02 to .204}
D INC                             .156 (.051)     .131 (.046)     .110 (.044)     .087 (.039)     .178 (.056)
                                  {.055 to .257}  {.041 to .221}  {.023 to .196}  {.010 to .163}  {.068 to .289}
T INC                             .236 (.059)     .205 (.057)     .232 (.064)     .083 (.040)     .162 (.047)
                                  {.119 to .354}  {.093 to .318}  {.106 to .358}  {.003 to .162}  {.071 to .254}
PPV                               .801 (.055)     .840 (.053)     .838 (.053)     .880 (.048)     .864 (.058)
                                  {.693 to .909}  {.736 to .943}  {.734 to .943}  {.786 to .974}  {.751 to .977}
NPV                               .987 (.021)     .958 (.035)     .940 (.046)     .880 (.048)     .887 (.052)
                                  {.945 to .999}  {.889 to .999}  {.851 to .999}  {.786 to .974}  {.785 to .989}
D Correct                         .991 (.014)     .968 (.027)     .963 (.033)     .879 (.048)     .888 (.057)
                                  {.963 to .999}  {.916 to .999}  {.898 to .999}  {.786 to .973}  {.777 to .999}
T Correct                         .728 (.073)     .797 (.064)     .756 (.067)     .881 (.048)     .862 (.053)
                                  {.584 to .873}  {.672 to .923}  {.625 to .887}  {.786 to .976}  {.758 to .967}
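One way to read "not significantly greater than chance" in Table 5 is that the lower limit of the 95% confidence interval does not exceed .50. The sketch below applies that reading to the specificity intervals in Table 5; the decision rule is an assumption made for illustration, since the report does not spell out the test it used, and the function name is hypothetical.

```python
CHANCE = 0.50

# Specificity 95% confidence intervals copied from Table 5
specificity_ci = {
    "Backster You-Phase / Backster": (0.418, 0.694),
    "Federal You-Phase / 7-position": (0.497, 0.768),
    "Federal ZCT / 7-position": (0.438, 0.723),
    "Federal ZCT / 7-position evidentiary": (0.698, 0.920),
    "AFMGQT / ESS": (0.577, 0.823),
}


def not_above_chance(intervals, chance=CHANCE):
    """Names of techniques whose interval lower limit does not exceed chance."""
    return [
        name for name, (lower, _upper) in intervals.items() if lower <= chance
    ]


flagged = not_above_chance(specificity_ci)
for name in flagged:
    print(f"specificity not clearly above chance: {name}")
```

Under this reading the flagged techniques are the same three named in the text as having specificity not significantly greater than chance.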


Figure 20 shows a mean and confidence interval plot for correct decisions and inconclusive results of the five paired testing techniques. A two-by-five way ANOVA, criterion status x technique, for correct decisions showed a significant interaction [F (1,3303) = 5891.333, (p < .001)], indicating that these five paired testing techniques produced different rates of correct decisions for deceptive and truthful cases. Post-hoc one-way ANOVAs showed no significant differences in the rate of correct decisions for criterion deceptive cases or criterion truthful cases.

Figure 20. Means and confidence intervals for paired-testing techniques.

A two-by-five way ANOVA, criterion status x technique, for inconclusive decisions showed a significant interaction [F (1,3303) = 5891.333, (p < .001)], indicating that these five paired-testing techniques produced different rates of inconclusive results for deceptive and truthful cases. Post-hoc one-way ANOVAs showed that differences in inconclusive rates were not significant for criterion deceptive or criterion truthful cases.

Investigative testing techniques

Four PDD techniques produced criterion accuracy that satisfies the APA requirement for investigative testing. Scores from 12 surveys and experiments were summarized to describe the criterion validity of these investigative techniques.⁴⁹ Studies that support these investigative techniques included 65 scorers who provided 2,029 numerical scores for 1,887 confirmed exams, including 1,112 numerical scores for 1,040 confirmed deceptive examinations and 917 numerical scores for 847 confirmed truthful examinations.

Table 6 shows the criterion accuracy profiles for the four PDD techniques that satisfy the APA 2012 requirements for investigative testing. Also shown in Table 6 are the number of included studies for each PDD technique, the total number of scored results, reliability along with the mean and standard deviations of the average deceptive and truthful scores of the included studies. Unweighted decision accuracy and test sensitivity have been reported as significantly greater than chance for all four of these investigative techniques. Three of these investigative techniques, the CIT, and

49 One of the included studies was a meta-analysis that summarized the results of laboratory studies using the CIT.


DLST/TES scored with both the seven-position and ESS models, produced test specificity that was significantly greater than chance. Test specificity for one investigative technique, the AFMGQT scored with the seven-position model, was not significantly greater than chance.

Table 6. Criterion accuracy for investigative techniques.

Technique                         CIT/GKT          DLST/TES         DLST/TES         AFMGQT
TDA Method                        Lykken           7-position       ESS              7-position
Number of Studies                 39               4                4                3
N Deceptive                       666              131              149              94
N Truthful                        404              197              149              97
Total N                           1070             328              298              191
Number Scorers                    39               16               5                5
N of Deceptive Scores             666              156              174              116
N of Truthful Scores              404              221              173              119
Total Scores                      1070             377              347              235
Mean D                            -                -2.126           -2.131           -2.607
StDev D                           -                3.959            3.801            4.754
Mean T                            -                3.162            3.412            3.114
StDev T                           -                3.531            3.153            3.705
Reliability - Kappa               -                .760             -                .750
Reliability - Agreement           -                .806             .840             .965
Reliability - Correlation         -                -                -                .940
Unweighted Average Accuracy       .823 (.041)      .844 (.039)      .858 (.037)      .817 (.042)
                                  {.744 to .903}   {.768 to .920}   {.786 to .930}   {.734 to .900}
Unweighted Average Inconclusives  .001 (.001)      .088 (.028)      .090 (.026)      .197 (.030)
                                  {.001 to .001}   {.034 to .142}   {.039 to .142}   {.138 to .255}
Sensitivity                       .815 (.048)      .748 (.062)      .809 (.069)      .783 (.058)
                                  {.721 to .910}   {.626 to .869}   {.674 to .945}   {.669 to .896}
Specificity                       .832 (.067)      .792 (.060)      .751 (.031)      .538 (.068)
                                  {.700 to .963}   {.674 to .909}   {.691 to .811}   {.405 to .672}
FN Errors                         .185 (.048)      .156 (.050)      .112 (.057)      .079 (.050)
                                  {.090 to .279}   {.058 to .255}   {.001 to .224}   {.001 to .177}
FP Errors                         .168 (.067)      .127 (.052)      .146 (.027)      .203 (.057)
                                  {.037 to .300}   {.026 to .229}   {.093 to .2}     {.090 to .315}
D INC                             .001 (.001)      .096 (.041)      .078 (.052)      .137 (.033)
                                  {.001 to .001}   {.016 to .175}   {.001 to .180}   {.071 to .202}
T INC                             .001 (.001)      .081 (.037)      .102 (.014)      .257 (.049)
                                  {.001 to .001}   {.008 to .153}   {.075 to .130}   {.160 to .354}
PPV                               .889 (.037)      .806 (.055)      .848 (.041)      .79 (.059)
                                  {.816 to .961}   {.698 to .914}   {.767 to .928}   {.675 to .905}
NPV                               .732 (.076)      .878 (.054)      .870 (.052)      .874 (.062)
                                  {.583 to .881}   {.772 to .983}   {.768 to .971}   {.753 to .996}
D Correct                         .815 (.048)      .827 (.055)      .878 (.067)      .908 (.053)
                                  {.721 to .910}   {.719 to .935}   {.746 to .999}   {.804 to .999}
T Correct                         .832 (.067)      .861 (.055)      .837 (.027)      .726 (.066)
                                  {.700 to .963}   {.753 to .969}   {.783 to .891}   {.597 to .856}
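The derived rows of Table 6 appear to follow from the primary rates. The sketch below is an illustration only: the relationships it uses are inferred from the tabled values rather than stated in this section, and the function and key names are hypothetical. Under those assumptions it reproduces the DLST/TES (ESS) column to within rounding. D Correct and T Correct exclude inconclusive results, the unweighted average is the simple mean of the two, and PPV and NPV are weighted by the number of deceptive and truthful scores.

```python
def derived_profile(sens, spec, fn, fp, n_d_scores, n_t_scores):
    """Derive decision-level statistics from rates that include inconclusives.

    Assumed relationships (inferred from Table 6, not stated in the text):
      D Correct  = sensitivity / (sensitivity + FN)   # inconclusives excluded
      T Correct  = specificity / (specificity + FP)
      Unweighted = mean(D Correct, T Correct)
      PPV, NPV and the weighted accuracy use counts of scores, so they
      depend on the base rate of deceptive cases in the sample.
    """
    d_correct = sens / (sens + fn)
    t_correct = spec / (spec + fp)
    unweighted = (d_correct + t_correct) / 2

    # Convert rates to expected counts of scores.
    tp, fn_n = sens * n_d_scores, fn * n_d_scores
    tn, fp_n = spec * n_t_scores, fp * n_t_scores

    return {
        "d_correct": d_correct,
        "t_correct": t_correct,
        "unweighted": unweighted,
        "weighted": (tp + tn) / (tp + tn + fp_n + fn_n),
        "ppv": tp / (tp + fp_n),
        "npv": tn / (tn + fn_n),
    }


# DLST/TES scored with the ESS (values taken from Table 6).
profile = derived_profile(sens=0.809, spec=0.751, fn=0.112, fp=0.146,
                          n_d_scores=174, n_t_scores=173)
for name, value in profile.items():
    print(f"{name:>10}: {value:.3f}")
```

Because this sample is almost evenly split between deceptive and truthful scores, the weighted and unweighted figures nearly coincide. With a skewed base rate the weighted figure would drift toward the accuracy of the more numerous group, while the unweighted figure would not.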


Figure 21 shows a mean and confidence interval plot for correct decisions and inconclusive results of the five paired testing techniques. A two-by-four way ANOVA, criterion status x technique, for correct decisions showed a significant interaction [F (1,2021) = 1320.745, (p < .001)], indicating that these four investigative techniques differed in their abilities to correctly classify deceptive and truthful cases. However, post-hoc one-way ANOVAs showed no significant differences in the rate of correct decisions for criterion deceptive or criterion truthful cases.

Figure 21. Means and confidence intervals for investigative techniques.
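The bracketed ranges in Table 6 are numerically consistent with a normal-approximation 95% interval: the estimate plus or minus 1.96 times the value in parentheses, bounded at .001 and .999. This is an inference from the tabled numbers, since this section does not state how the intervals were computed, and `interval_95` with its parameter names is a hypothetical helper written for illustration. A minimal sketch under that assumption:

```python
def interval_95(estimate, spread, lower_bound=0.001, upper_bound=0.999, z=1.96):
    """Normal-approximation interval, clipped to [lower_bound, upper_bound].

    `spread` is the parenthesized value from the table (assumed here to act
    as a standard error); the clipping bounds are inferred from table entries
    such as {.001 to .224} and {.804 to .999}.
    """
    low = max(lower_bound, estimate - z * spread)
    high = min(upper_bound, estimate + z * spread)
    return round(low, 3), round(high, 3)


# Unweighted accuracy, DLST/TES with 7-position scoring: table shows {.768 to .920}
print(interval_95(0.844, 0.039))
# FN errors, DLST/TES with ESS: table shows {.001 to .224}
print(interval_95(0.112, 0.057))
# D Correct, AFMGQT with 7-position scoring: table shows {.804 to .999}
print(interval_95(0.908, 0.053))
```

Some other table entries differ from this calculation by .001, which would be expected if the published intervals were computed from unrounded estimates.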

A two-by-three way ANOVA, criterion status x technique, for inconclusive decisions showed a significant interaction [F (1,953) = 404.177, (p < .001)], indicating that these three investigative techniques produced different rates of inconclusive results for deceptive and truthful cases. Post-hoc one-way ANOVAs showed that differences in inconclusive rates were not significant for deceptive cases, but were significant for truthful cases [F (2,478) = 3.418, (p = .034)]. CIT/GKT results do not include an inconclusive category, and this technique was not included in the two-way analysis for inconclusive results.

Independent and non-independent PDD techniques

Table 7 shows the criterion accuracy profile of four PDD techniques that are interpreted with decision rules based on an assumption of independent criterion variance among the RQs, along with the criterion accuracy profile of PDD techniques that are interpreted with decision rules based on an assumption of non-independence. Scores were summarized from 14 surveys and experiments involving PDD techniques that are interpreted with an assumption of independent criterion variance among the RQs. These studies included 31 scorers who provided 1,194 numerical scores for 1,008 confirmed exams, including 562 numerical scores for 468 confirmed deceptive examinations and 632 numerical scores for 540 confirmed truthful examinations. Excluding outlier results, scores from 24 surveys and experiments were summarized to describe the criterion validity of PDD techniques for which the results are interpreted with decision rules based on an assumption of non-independence of the criterion variance of the RQs. These studies included 210 scorers who provided 8,975 numerical scores for 1,791 confirmed exams, including 4,612 numerical scores for 933 confirmed deceptive examinations and 4,363 numerical scores for 858 confirmed truthful examinations.


Excluding outlier results, comparison question techniques intended for event-specific (single issue) diagnostic testing, in which the criterion variance of multiple relevant questions is assumed to be non-independent, produced an aggregated decision accuracy rate of .890 (.829 - .951), with a combined inconclusive rate of .110 (.047 - .173). Comparison question PDD techniques designed to be interpreted with the assumption of independence of the criterion variance of multiple relevant questions produced an aggregated decision accuracy rate of .850 (.773 - .926) with a combined inconclusive rate of .125 (.068 - .183). The unweighted average of accuracy for independent and non-independent PDD techniques, excluding outlier results, produced a decision accuracy level of .869 (.798 - .940) with an inconclusive rate of .128 (.068 - .187), as shown in Table 7.

Table 7. Criterion accuracy profile of independent and non-independent PDD techniques.

                          Criterion Independent    Non-independent     Non-independent Techniques
                          PDD Techniques           PDD Techniques      with Outlier Results
Number of Techniques      4                        7                   9
Number of Studies         14                       24                  30
N Deceptive               468                      933                 1,202
N Truthful                540                      858                 1,087
Total N                   1,008                    1,791               2,289
Number Scorers            31                       210                 225
N of Deceptive Scores     562                      4,612               4,881
N of Truthful Scores      632                      4,363               4,592
Total Scores              1,194                    8,975               9,473
Percent Correct           .850 (.039)              .890 (.031)         .896 (.030)
                          {.773 to .926}           {.829 to .951}      {.837 to .955}
Inconclusive              .125 (.029)              .110 (.032)         .106 (.031)
                          {.068 to .183}           {.047 to .173}      {.044 to .167}
Sensitivity               .771 (.072)              .833 (.052)         .840 (.050)
                          {.630 to .911}           {.731 to .934}      {.743 to .938}
Specificity               .719 (.047)              .765 (.061)         .775 (.059)
                          {.626 to .811}           {.646 to .884}      {.658 to .891}
FN Errors                 .113 (.058)              .078 (.033)         .074 (.032)
                          {.001 to .226}           {.013 to .143}      {.011 to .138}
FP Errors                 .144 (.039)              .115 (.042)         .109 (.041)
                          {.066 to .221}           {.032 to .197}      {.029 to .189}
D Inc                     .112 (.051)              .093 (.041)         .089 (.039)
                          {.013 to .212}           {.012 to .174}      {.011 to .166}
T Inc                     .136 (.031)              .127 (.049)         .122 (.049)
                          {.076 to .196}           {.030 to .223}      {.027 to .218}
PPV                       .828 (.059)              .886 (.041)         .893 (.039)
                          {.712 to .943}           {.806 to .967}      {.816 to .969}
NPV                       .878 (.049)              .906 (.044)         .910 (.043)
                          {.782 to .973}           {.820 to .993}      {.826 to .995}
D Correct                 .873 (.066)              .915 (.037)         .919 (.036)
                          {.744 to .999}           {.842 to .988}      {.849 to .989}
T Correct                 .831 (.043)              .866 (.049)         .873 (.047)
                          {.746 to .915}           {.770 to .962}      {.780 to .965}


Figure 22 shows the mean and statistical confidence intervals for correct decisions and inconclusive rates for PDD techniques interpreted with decision rules based on assumptions of independent and non-independent criterion variance, excluding outlier results. Two-way ANOVAs, criterion state x independence, excluding outlier results, showed a significant interaction for correct decisions [F (1,10165) = 2656.637, (p < .001)] and inconclusive results [F (1,10165) = 806.839, (p < .001)]. However, post-hoc ANOVAs showed there was no significant one-way difference in the proportion of correct decisions or inconclusive results for criterion deceptive cases or criterion truthful cases.

Figure 22. Mean and confidence interval plot for criterion independent and non-independent (excluding outlier results) PDD techniques.

Discussion

Fourteen PDD techniques (shown in Figure 17) meet the requirements of the APA 2012 standards for test validation. These techniques are supported by published descriptions of the protocol for test administration and test data analysis, using instrumentation representative of that used in field practice, and by published and replicated empirical support for the criterion accuracy of a published method for test data analysis.

Five PDD techniques have published evidence of validity that meets the APA 2012 requirements for evidentiary testing, including unweighted decision accuracy over .900 along with inconclusive rates under .200. These five PDD techniques are, in alphabetical order, the Federal You-Phase technique scored with the Empirical Scoring System (ESS), the IZCT, the MQTZCT, the Utah ZCT (including PLT, DLT, and CPC-RCMP variants) scored with the Utah numerical scoring system, and any variant of an event-specific three-question ZCT scored with the ESS. Statistical analysis revealed two statistical outliers, the IZCT and the MQTZCT. Two-way ANOVAs indicated that there were no significant differences between the other evidentiary techniques when the outlier results were not included.

Five other PDD techniques were found to produce criterion accuracy that meets the APA 2012 standard requirements for paired-testing, with unweighted decision accuracy over .860 along with inconclusive rates under .200. These PDD techniques are, in alphabetical order, the AFMGQT when scored with the ESS, the Backster You-Phase technique scored with the Backster numerical scoring system, the Federal You-Phase


technique scored with the Federal seven-position TDA model, the Federal ZCT scored with the Federal seven-position TDA model, and the Federal ZCT scored with the seven-position TDA model and interpreted with evidentiary decision rules. Although this level of validation is intended to serve the needs for criterion accuracy in paired-testing situations, the majority of PDD examinations are not intended for paired testing or evidentiary use in courtroom settings. It is therefore inevitable that many field examinations may be conducted with the PDD techniques in this list though not intended for use in courtroom settings. Although a significant interaction was observed between criterion status and PDD techniques, indicating that different PDD techniques may provide subtle differences in accuracy with criterion deceptive and criterion truthful cases, none of the one-way main effects was significant for decision accuracy or inconclusive results among the deceptive or truthful cases. The present evidence does not support a conclusion that any of these techniques provides accuracy that is different from the other techniques, and instead suggests this group of PDD techniques provides overall criterion accuracy of similar effectiveness.

Four additional PDD techniques were found to satisfy the APA 2012 standard requirements for investigative testing. These four PDD techniques are, in alphabetical order, the CIT/GKT, the DLST/TES scored with the seven-position TDA method, the DLST/TES scored with the ESS, and the AFMGQT50 when scored with the seven-position TDA method. Although there may be subtle differences in the accuracy of these techniques with criterion deceptive and criterion truthful cases, there were no significant main effect differences for decision accuracy or inconclusive results among the deceptive or truthful cases. These results suggest that this group of PDD techniques provides overall criterion accuracy of similar effectiveness, and the present evidence does not support a conclusion that any of these techniques has accuracy different from the other techniques.

Outlier results

Two outliers were identified: the IZCT and the MQTZCT. Research for both of these techniques reported near-perfect accuracy, and these results were found to be statistical outliers to the distribution of results predicted by all other studies on all other techniques, including the other evidentiary techniques in which these two studies are grouped. These two techniques rely on support from the most problematic research of all studies included in the meta-analysis.

One of the two studies included in support of the IZCT (Gordon et al., 2005; also described by Mohamed et al., 2006) is a very small study described in one publication as a non-blind pilot study. The use of pilot studies to answer questions about criterion accuracy is troublesome. Additionally, both reports indicated that one of the 12 participants in the Gordon et al. (2005) study, a programmed innocent participant, made a false-confession to the examiner, also the primary author, during the pre-test interview. That participant was removed from the experiment, which illuminates the non-blind study design. A false confession in field PDD programs would not be immediately distinguishable from an authentic confession. In field polygraph programs a pre-test confession would be viewed as a practical and successful form of resolution of the matter under investigation. Authentic confessions are regarded as PDD successes, and it is therefore necessary to regard false-confessions as problems. In a field situation, it would only be later, when additional evidence is available, that the confession would be identified as an error and would be viewed as problematic.

Inclusion of this error into the study results would have resulted in a false-positive (i.e., false-confession) rate of .167 and less than perfect test accuracy. Instead, the results from the Gordon et al. (2005) study were provided without the false confession,

50 Because the Utah MGQT and the LEPET are structurally virtually identical to the AFMGQT, and use the same scoring regimen, it is reasonable to generalize the AFMGQT validation findings to these two techniques.


along with a reported decision accuracy rate of 1.000. It is possible that neither the reported decision accuracy rate of 1.000 nor the false confession rate of .167 is representative of IZCT performance in field settings. An argument could be offered that since this was a non-blind pilot study, which was not designed to serve as a criterion accuracy study, removing errors from the reported study result was justified. Pilot studies like this help guide decisions about the funding and design of more rigorous research into areas such as fMRI or other methods for lie detection. However, the selective exclusion of unfavorable data from a study of criterion accuracy requires strong justification.

An additional concern regarding the evidence supporting the IZCT is the fact that the sampling distributions from the three included studies differ significantly. Significant differences are the result of several possible conditions, including: 1) the samples were selected from different populations, 2) the IZCT was administered differently to the different study samples, or 3) the study samples were scored and interpreted with a different application of the TDA rules. It is also possible that the observed significant differences are the result of a highly selective sampling methodology, in which examinations are included based on the examiner's or investigator's judgment of good or confident results, such as would occur in the context of a direct admission regarding the investigation targets. Over-reliance on confession confirmation could have the effect of systematically excluding both false-negative and false-positive error cases, for which no confession would be likely to be obtained.

Regardless of the reason, deceptive and truthful scores were expressed in significantly different ways in the three different studies on the IZCT. The meaning of these significant differences to this meta-analysis is that the included studies appear to be based on samples that are not representative of each other, and it is unknown whether one or more of the studies is not representative of the population of examinees.

A third problematic concern with the IZCT is that none of the published studies included any reliability statistics and calculations of interrater reliability could not be completed from the available data. The absence of reliability statistics does not allow estimates of the generalizability of the study results to results that would be obtained from other examiners or other scorers. Coupled with the significant interaction effects between sampling distribution and case status, the present evidence is insufficient to support the notion that other practitioners would obtain scores or results similar to those reported in the published studies.

Studies supporting the generalizability of the MQTZCT, the other statistical outlier, are limited by some interesting and unique factors. First, the developer of the MQTZCT and previous authors themselves seem to have cautioned against the generalizability of this technique by emphasizing the need for intensive and specialized training available only from practitioners of the method. Indeed, they have asserted that the complexity of the technique, and its related psychological hypotheses, are such that other trained PDD examiners should not reasonably expect to learn or properly execute the MQTZCT based on information available in the published sources. An emphasis on strict compliance with a complex and precise system of many rules gives the impression that the technique should be regarded as fragile, non-robust, and easily disrupted by even slight departures from stipulated procedures.

A second, equally important concern involves the fact that a significant interaction was found between sampling distribution and case status. Although one-way differences were not significant within the deceptive or truthful groups, the significant interaction effect indicates that the scores of criterion deceptive and criterion truthful cases are expressed or interpreted in different ways within the sampling distributions of the three included studies on the MQTZCT. In other words, the data are not congruent even among the studies used to support the MQTZCT. This significant interaction suggests the possibility that the included studies are based on samples that are not representative of each other. It is unknown whether one or more of the studies is not representative of the population of all examinees, reducing our confidence in the potential for generalizability of the reported results.


A third concern involving the MQTZCT is that the reported reliability coefficient of .990 was published in the Matte and Reuss (1989) reprint of the dissertation published in the journal Polygraph, but cannot be located in the original dissertation study for the no longer extant Columbia Pacific University. This is both unfortunate and concerning because the unprecedentedly high rate of inter-scorer agreement is unexpected given the purported complexity of the method.

A final confound to the generalizability of the results of the included studies on the MQTZCT is that the data provided to the committee initially included numerical scores for only those cases for which the scorers achieved the correct result. Data available to the ad-hoc committee did not initially include numerical scores for those cases for which the scorers achieved erroneous or inconclusive results. Missing scores were later provided to the committee for both the Mangan, Armitage and Adams (2008) and Shurani, Stein and Brand (2009) studies. However, the resulting sampling distribution means, calculated with the missing scores, were different from those reported for both studies. Because of these discrepancies, the statistical analysis was not re-calculated with the missing scores, and this analysis reflects the mean scores as reported by Mangan, Armitage and Adams (2008), and by Shurani and Chavez (2009). Field data are always a combination of diagnostic (i.e., controlled or explained) variance and error variance (i.e., uncontrolled or unexplained variance). The sampling means reported in the Mangan, Armitage and Adams (2008) and Shurani, Stein and Brand (2009) studies are systematically devoid of error variance. Given that a significant interaction effect was observed between sampling distribution and case status, the present evidence is insufficient to support the generalizability of the reported study results.

Possible mediators of these outlier results include the possibility that these techniques are simply superior to others. The role of proprietary, personal and financial interests, including business relationships between technique developers and principal investigators, cannot be overlooked, however, and the serious methodological and empirical confounds surrounding the supporting research undermine confidence in the study results and reported accuracy of these techniques. From a scientific perspective, even well designed research generated by advocates of a method who have a vested interest in the outcome, and who act as participants and authors of the study report, does not have the compelling power of research not so encumbered by these potentially compromising factors.51

Regardless of what factors contribute to these exceptional results, the confounds associated with the supporting studies undermine confidence that they represent the true accuracies. Expectations that these outlier results will generalize to field settings should be delayed until more complete independent replication studies and extended analysis are completed.

Criterion accuracy

Excluding outliers, the aggregated unweighted accuracy52 of all PDD techniques was .869 (.798 - .940), with an unweighted inconclusive rate of .128 (.068 - .187). All 14 PDD techniques included in this meta-analysis produced unweighted decision accuracy levels that were significantly greater than chance. Excluding outliers, there were no significant one-way differences in the

51 Questions may arise as to why these studies and techniques were included in the meta-analysis after identifying so many serious confounds. The techniques were ultimately included in the meta-analysis because they met the more general requirements outlined in the APA Standards of Practice. It was also determined that the meta-analysis is more complete, and therefore more helpful and informative to interested readers, with the inclusion of these studies and techniques.

52 The unweighted average was considered to be a more conservative and realistic calculation of the overall accuracy of all PDD examination techniques. Calculation of the weighted average, or the simple proportion of correct decisions, often results in higher statistical findings that are less robust against differences in base-rates and therefore less generalizable.


unweighted decision accuracy of any of the 14 PDD techniques, and no significant one-way differences in correct decisions, inconclusive results, or errors for criterion deceptive or criterion truthful cases. Neither were there any significant differences in the aggregated criterion accuracy of PDD techniques at the evidentiary, paired-testing, and investigative levels. Some practical differences were observed in the criterion accuracy profiles of these techniques. All five techniques included at the evidentiary level produced statistically significant effect sizes for both test sensitivity to deception and test specificity to truth-telling.

At the paired testing level all five techniques also produced test sensitivity to deception that was significantly greater than chance, though only two of these techniques, the Federal ZCT scored with the seven-position evidentiary rules, and the AFMGQT scored with the ESS, produced test specificity to truth-telling that was significantly greater than chance. Specificity to truth-telling was not significantly greater than chance for the Backster You-Phase technique with Backster scoring, Federal ZCT with seven-position scoring or Federal You-Phase with seven-position scoring.

For investigative techniques, all four techniques produced test sensitivity to deception that was significantly greater than chance. Specificity to truth-telling was significantly greater than chance for the CIT, and the DLST/TES format when scored with both the seven-position and ESS models, but was not significantly greater than chance for the AFMGQT when scored with the seven-position model.

Excluding outlier results, published and replicated empirical evidence for seven CQT formats, intended for event-specific diagnostic testing for which the results are interpreted using decision rules based on the assumption of non-independence of the criterion variance of the RQs, produced an aggregated unweighted accuracy rate of .890 (.829 - .951) along with an inconclusive rate of .110 (.047 - .173). These techniques are, in alphabetical order, the AFMGQT when scored with the ESS, the Backster You-Phase technique scored with the Backster numerical scoring system, the Federal You-Phase technique scored with the Federal seven-position TDA model, the Federal You-Phase technique scored with the ESS, the Federal ZCT scored with the Federal seven-position TDA model, the Federal ZCT scored with the seven-position TDA model and interpreted with evidentiary decision rules, the Utah ZCT (including PLT, DLT, and CPC-RCMP variants) scored with the Utah numerical scoring system, and any variant of an event-specific three-question ZCT scored with the ESS.

Published and replicated empirical evidence exists for four PDD techniques that are interpreted with decision rules based on the assumption of independent criterion variance among the RQs. These techniques produced an aggregated unweighted accuracy level of .850 (.773 - .926) with an inconclusive rate of .125 (.068 - .183). In alphabetical order, these techniques are: the DLST/TES scored with the seven-position TDA method, the DLST/TES scored with the ESS, the AFMGQT when scored with the seven-position TDA method, and the AFMGQT when scored with the ESS.

Despite these observed and practical differences, excluding outlier results, no significant differences were found for decision accuracy or inconclusive results among the PDD techniques that satisfy the requirements of the APA 2012 standards. Similarly, no significant differences were found for decision accuracy or inconclusive results for PDD techniques interpreted with the assumption of independence or non-independence among the RQs.

Not all techniques reviewed possessed sufficient empirical support to meet the APA standards for inclusion. Some named PDD techniques were found to lack any published evidence of support that could be used to calculate the sampling distributions, reliability and criterion accuracy profiles needed for inclusion in this meta-analysis; these are listed in Appendix F. Appendix G provides a summary of published studies that could not be included in the meta-analysis. Appendix H contains a description of techniques for which there exists a single un-replicated study that met the requirements for inclusion in this meta-analysis. These techniques could not be included in the meta-analysis as the APA Standard requires a minimum of two


published studies. Appendix I lists those PDD techniques found to have published and replicated evidence of support, but the reported criterion accuracy did not satisfy the validity requirements of the APA 2012 standards.

PDD techniques that make use of the three-position TDA model are not included in the meta-analysis and are therefore not included in Table 1. The criterion accuracy profiles of PDD techniques that make use of the three-position TDA model are shown in Appendix I. The unweighted decision accuracies were significant for all of the techniques based on three-position TDA methods, but not equal for the deceptive and truthful cases. All techniques that employed three-position TDA methods consistently exceeded the 2012 limit for inconclusive decisions (20%). Because criterion accuracy rates for techniques with three-position TDA did not differ significantly from seven-position criterion accuracy, an initial analysis with the three-position TDA method may be considered acceptable if inconclusive results are resolved via subsequent analysis with a TDA method that provides both accuracy and inconclusive rates that meet the requirements of the APA 2012 standards.

Some readers will note that two versions exist for the AFMGQT with minor structural differences between them.53 There is no evidence to suggest that the performance of one version is superior to the other. Considering that rigorous and replicated evidence would be required to reject a null-hypothesis that the differences are meaningless, and considering that the included studies include a mixture of both AFMGQT versions, these results are provided as generalizable to both versions of the AFMGQT.

Two widely used and recognizable techniques, the LEPET and the Utah MGQT (four-question version of the Utah technique), were not included in the meta-analysis because no published studies could be located in support of these techniques. However, both of these PDD techniques are structurally nearly identical to the AFMGQT. We can find no reason why validation data for AFMGQT cannot be generalized to these techniques if scored with the same TDA methods.

Comparison With Previous Systematic Reviews

We did not test the level of significance of the difference between the present accuracy estimations with those of the OTA (1983), though it can be easily seen that the .847 mean accuracy rate of field studies is outside the 95% confidence interval (.865 to .977) for PDD techniques that meet the APA 2012 requirements for evidentiary testing. This observed difference is most likely due to the exclusion of results from studies that do not conform to recognizable field practices, and to the exclusion of results of PDD techniques that do not produce satisfactory results according to the APA 2012 standards of practice. There is little justification for use in field practice, and therefore little justification for inclusion into accuracy estimations, of PDD techniques that have been supplanted by more effective methods. Inclusion of arcane or substandard methods into accuracy estimation would be the equivalent of attempting to answer an automobile industry question regarding corporate fuel economy while including all makes and models from the 1960s and 1970s gas-guzzling era into calculations of present-day economy. Techniques which produce substandard and unsatisfactory criterion accuracy were therefore excluded from the meta-analysis.

These results are consistent with the results of Honts and Peterson (1997), Raskin and Podlesny (1979), Abrams (1977; 1989), and Ansley's (1990) findings regarding blind evaluation of PDD test data. Results of this meta-analysis are also consistent with the results of the more recent National Research Council (2003) who reported an accuracy rate of laboratory studies as .860 along with an aggregated rate of .890 for field studies, using studies that met their selection criteria. Because the present analysis includes only

53 The AFMGQT is used in both multi-facet investigations of known incidents and multi-issue screening contexts. Both types of exams, multi-facet and multi-issue, are interpreted with decision rules based on the assumption of independent criterion variance among the RQs.


techniques as they are documented and used in field settings, we suggest that the present results provide a more helpful and practical answer to PDD professionals, program managers, and professional consumers of PDD results who are faced with the need to make evidence-based decisions regarding the selection and field use of presently available PDD techniques.

These results are more conservative than those reported by Ansley (1983; 1990) and those of Abrams (1973), which warrants further discussion. It is unlikely that the PDD test has become less accurate during the last three decades. A more realistic possibility is that the samples included in the early literature reviews by Ansley were more vulnerable to overestimation of test accuracy as a result of sample selection methodology. Ansley (1990) stated that court decisions and evidence are sometimes unreliable, and expressed a preference for confession confirmation of PDD examination results. Over-emphasis on confession confirmation includes the potential for unintended systematic exclusion of false-negative and false-positive errors, both of which conditions are unlikely to lead to a confession, and therefore the confirmation criterion. Confessions themselves are the result of a non-random decision to pursue further discussion with and disclosure from the examinee. If the decision to pursue a confession is based in part on the results of a polygraph exam, then confirmation via confession is non-independent from the test result and therefore self-fulfilling.

The impact of these methodological issues could be sampling distributions that inflate PDD test accuracy.54 This phenomenon may not be limited to confirmation via confession; all field samples that are selected through the availability and quality of confirmation data are potentially non-random and non-representative. This same concern, regarding the non-independence of confirmation data, applies to investigation results and judicial outcomes that are based in part on the information resulting from a polygraph exam. It is possible that this phenomenon underlies the general trend in the literature in which the results of PDD field studies have generally outperformed the results of laboratory experiments.55 Despite potential or observed sampling differences between field and laboratory studies, the NRC (2003) found no significant differences between the results of high quality field and laboratory studies.

Results of this meta-analysis are consistent with the systematic review of Crewson (2003) regarding the accuracy of diagnostic polygraphs. However, these results depart from Crewson's conclusion regarding screening polygraphs. The accuracy rate found for screening polygraphs in this meta-analysis was higher than that reported by Crewson, and the difference is statistically significant (t [1008] = .002). While the exact cause of this difference cannot be known from the present data, we note that the studies and techniques used by Crewson could not be included in this meta-analysis. Four of the screening studies reported by Crewson involved the Relevant-Irrelevant technique (Ansley, 1989; Brownlie, Johnson & Knill, 1997; Honts & Amato, 1999; Jayne, 1989), and the remaining study involved the Reid Technique. Included studies pertaining to criterion independent screening polygraphs were not available at the time of Crewson's review of the published scientific literature.
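The confession-confirmation concern discussed above can be made concrete with a small simulation. This is a hypothetical sketch, not an analysis from the meta-analysis: the base rate, true accuracy, and confession probability are invented for illustration, inconclusive results are ignored, and the model assumes that a confession is pursued only after a deceptive test result and that only deceptive examinees confess.

```python
import random


def simulate_confession_confirmation(n=20_000, base_rate=0.5, accuracy=0.85,
                                     p_confess=0.5, seed=1):
    """Compare accuracy over all exams with accuracy over confirmed exams.

    All parameter values are hypothetical. A case enters the "confirmed"
    sample only if the examinee confesses, which in this model can happen
    only after a deceptive test result for a truly deceptive examinee.
    """
    rng = random.Random(seed)
    all_correct = []
    confirmed_correct = []
    for _ in range(n):
        deceptive = rng.random() < base_rate
        correct = rng.random() < accuracy
        called_deceptive = deceptive if correct else not deceptive
        all_correct.append(correct)
        # Selection step: false negatives and false positives never yield
        # a confession, so they never reach the confirmed sample.
        confesses = called_deceptive and deceptive and rng.random() < p_confess
        if confesses:
            confirmed_correct.append(called_deceptive == deceptive)
    true_acc = sum(all_correct) / n
    confirmed_acc = sum(confirmed_correct) / len(confirmed_correct)
    return true_acc, confirmed_acc, len(confirmed_correct)


true_acc, confirmed_acc, n_confirmed = simulate_confession_confirmation()
print(f"accuracy over all exams:       {true_acc:.3f}")
print(f"accuracy over confirmed exams: {confirmed_acc:.3f} (n = {n_confirmed})")
```

Under these assumptions every confirmed case is a correct deceptive call, so the confirmed sample shows perfect accuracy regardless of the true accuracy used to generate the data. That is the self-fulfilling pattern the text describes.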

54 A possible example of this phenomenon can be seen in Mangan et al., (2008) who reported the results of a survey of the confession-confirmed test results of one experienced examiner. The reported results were 100% accurate, a finding in accord with what would be expected to arise from a confession-based selection bias.
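
The selection effect described in the text and in footnote 54 can be illustrated with a small Monte Carlo sketch. This is not the committee's method and uses no data from the report: the function name, the 85% sensitivity and specificity, the base rate, and the confession rate are all hypothetical values chosen for illustration. The sketch assumes that a confession is sought only after a deceptive test result, that only deceptive examinees confess, and that there are no inconclusive results.

```python
import random


def simulate_confession_bias(n_cases=20000, base_rate=0.5, sensitivity=0.85,
                             specificity=0.85, confession_rate=0.6, seed=0):
    """Return (accuracy over all cases, accuracy over confirmed cases, confirmed count).

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    correct_all = 0
    confirmed = 0
    correct_confirmed = 0
    for _ in range(n_cases):
        deceptive = rng.random() < base_rate            # ground truth for this examinee
        if deceptive:
            failed_test = rng.random() < sensitivity    # deceptive examinee scored deceptive
        else:
            failed_test = rng.random() >= specificity   # truthful examinee scored deceptive
        correct = failed_test == deceptive
        correct_all += correct
        # Assumption: a confession is pursued only after a failed test, and
        # only deceptive examinees confess. False negatives and false
        # positives therefore never enter the confession-confirmed sample.
        if failed_test and deceptive and rng.random() < confession_rate:
            confirmed += 1
            correct_confirmed += correct
    return correct_all / n_cases, correct_confirmed / confirmed, confirmed


overall, confirmed_only, n_confirmed = simulate_confession_bias()
print(f"accuracy over all cases:       {overall:.3f}")
print(f"accuracy over confirmed cases: {confirmed_only:.3f} (n={n_confirmed})")
```

Under these assumptions every confession-confirmed case is a correct decision by construction, so the confirmed sample shows perfect accuracy even though the simulated test errs in roughly 15% of all cases. Relaxing the assumptions (for example, allowing some confessions after a passed test) would reduce the inflation without removing it.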

55 An alternative explanation would hold that the difference is the result of differences in ecological and external validity of the test circumstances. These hypotheses have not been thoroughly evaluated and it would be unwise to attempt to reach any conclusion with the current state of understanding. The NRC (2003) reported that this trend is not inconsistent with experience in other fields of testing and science and should be the focus of future research.

253 Polygraph, 2011, 40(4) Validated Techniques

Moderators and Mediators

There was no indication that the study results were a function of, or influenced by, sample sizes. Results were not coded for examinee characteristics, including age, gender, ethnicity, culture, education, or socio-economic status, nor were the studies coded for their quality or methodology. Sample results based on examinees who were subject to some form of experimental manipulation (e.g., medications, fatigue, chronic physical or chronic mental health problems, level of functioning, countermeasure training or instructions, etc.) were not included, and these factors were not evaluated.

Excluding outliers, no significant differences were found in the criterion accuracy of PDD techniques suitable for evidentiary testing, paired testing, and investigative testing. This suggests that these categorical distinctions are arbitrary and therefore meaningless in a scientific sense. However, the value of standardized requirements for test precision becomes clearer when considering policy decisions that emphasize or require the use of evidence-based methods and restrict the use of un-validated or experimental methods. The scientific value of categorical distinctions becomes more obvious when considering the difficulty in answering questions about the scientific accuracy, and the complications that result from the inclusion into accuracy estimates of less accurate and arcane methods that have been supplanted or replaced by more effective modern alternatives. Ethically, it is difficult to imagine some justification, when decisions affect individual lives, community safety and national security, for the use of methods which the scientific evidence has shown to be sub-optimal or sub-standard.

Comparison of accuracy rates for PDD techniques interpreted with the assumption of criterion independence versus non-independence showed no significant differences in decision accuracy. However, a significant interaction effect for inconclusive results suggests there may be subtle differences in inconclusive rates for these types of exams. This would seem to suggest that the selection of different examination strategies, involving independent or non-independent RQs, is a practical matter that should be determined by the needs of the testing circumstances.

Ancillary Analysis

One ancillary analysis was completed. Results were calculated for CQT formats with the exclusion of those studies that did not satisfy a more rigorous set of selection criteria. First, PDD techniques were excluded from the ancillary analysis if both test sensitivity to deception and test specificity to truth-telling were not both statistically significantly greater than chance. This resulted in the exclusion of the AFMGQT, Federal ZCT, and Federal You-Phase techniques when these are scored with the seven-position TDA model, in addition to the Backster You-Phase technique. Statistical outliers, not accounted for by the available evidence, were also excluded. This resulted in the exclusion of the IZCT and MQTZCT and several studies that were seriously confounded. Exclusion of techniques for which there are no published statistics describing test reliability also resulted in the exclusion of the IZCT. Similarly, PDD techniques and studies were excluded if there were significant interaction or main effect differences between the sampling distributions, indicating that the sample distributions are not representative of each other. This also resulted in the exclusion of the IZCT and MQTZCT. Studies were also excluded if statistical descriptions of the sampling distributions were not available or could not be calculated from the available data. This resulted in the removal of two studies on the DLST (Research Division Staff, 1995a; 1995b), one study on the AFMGQT (Senter, Waller & Krapohl, 2008), and two studies on the MQTZCT (Shurani, Stein & Brand, 2009; Shurani, 2011).

CQT formats retained for ancillary analysis produced a combined decision accuracy rate of .898 (.840 - .955) and an inconclusive rate of .092 (.033 - .150)56 for PDD techniques interpreted with decision rules based on an assumption of

56 Calculated as the weighted average of unweighted decision accuracy and the unweighted inconclusive rate.

Polygraph, 2011, 40(4) 254 Ad Hoc Committee on Validated Techniques

non-independence of the criterion variance of the RQs. PDD techniques interpreted with decision rules based on an assumption of independent criterion variance produced a decision accuracy rate of .857 (.782 - .932) and an inconclusive rate of .117 (.058 - .177). The aggregated decision accuracy rate for all studies and PDD techniques included in the ancillary analysis was .883 (.817 - .950) with an inconclusive rate of .116 (.056 - .175).57 Two-way ANOVAs showed that neither main effect nor interactions were significant when comparing the decision accuracy for the ancillary analysis with that of the entire meta-analysis. The interaction effect was significant for inconclusive results [F (1,1992) = 17.335, (p < .001)]. Inconclusive rates were slightly higher for truthful cases with non-independent techniques, and slightly higher for deceptive cases with criterion independent PDD techniques. Post-hoc ANOVAs showed that none of the one-way differences were significant, indicating that these small differences are unlikely to be noticed by field examiners.

One-way ANOVAs showed that the results of the ancillary analysis did not differ significantly from the results of the entire meta-analysis for correct decisions [F (1,5471) = 0.08, (p = 0.777)] or inconclusive results [F (1,5471) = 0.08, (p = 0.777)]. This indicates that the use of more rigorous study selection requirements would be unlikely to produce meta-analytic results that differ from the results of this study.

Limitations

Two obvious limitations pertain to this analysis. First, studies were not coded for field and laboratory studies and no attempt was made to investigate any effects from differences in study design. Instead, field and laboratory results were included with equal consideration and the results of all studies were combined regardless of design. Secondly, there was no attempt to investigate decision accuracy at the level of the individual questions for any of the included PDD techniques. Related to this second confound is the fact that the results of studies involving the DLST/TES and AFMGQT PDD were achieved using decision rules that are based on an assumption of criterion independence among the RQs. Generalizability of the results of this meta-analysis may depend, in part, on the correctness of this assumption.

Some of the included studies are impaired by obvious research confounds, the most noticeable of which is that some samples were selected with an emphasis on examinee confession as a central feature of the criterion. Another important confound, observed in some of the included studies, was that the primary author was also the developer of a PDD technique for which there exists some form of proprietary, or financial interest. Indeed, it would appear that one of the markers for these kinds of studies (and typical of advocacy research elsewhere) is that the reported near-perfect accuracy demonstrations are statistical outliers to the distribution of results from less confounded studies.

The absence of critical information and critical commentary in some included study reports gives the impression of a file-drawer bias in which less than favorable results are not submitted for publication. Another version of this problem seems to have occurred in the context of this meta-analysis, in which some of the study data initially provided to the committee, and some of the published sampling means, included only those results for which the scorers achieved the correct results, initially withholding the results of inconclusive and error cases. The result of this is that published sampling means for some studies are systematically devoid of error or uncontrolled variance and must therefore be considered not generalizable.

Confounds related to individual studies can complicate the meaning and interpretation of the results of the meta-analysis. These concerns represent an example of the value and need for scientific rigor and independence when evaluating the effectiveness of PDD and lie detection methods. Study selection and inclusion rules for the meta-analysis were intended to be as

57 Calculated as the unweighted average of all studies included in the ancillary analysis.
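
Footnotes 56 and 57 distinguish weighted from unweighted averages, and the report states elsewhere that meta-analytic weights were assigned according to sample size and number of scorers. The sketch below shows how the two kinds of average can differ. The three study records are invented for illustration, and using the product of sample size and scorer count as the weight is an assumption; the report does not give the exact weighting formula.

```python
def unweighted_mean(values):
    """Simple average: every study counts equally."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Average in which each study counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical studies: (decision accuracy, sample size, number of scorers).
# These numbers are not taken from the meta-analysis.
studies = [
    (0.90, 100, 3),
    (0.80, 50, 1),
    (0.95, 20, 10),
]

accuracies = [acc for acc, _, _ in studies]
# Assumed weighting: sample size multiplied by number of scorers.
weights = [n * scorers for _, n, scorers in studies]

print(f"unweighted mean accuracy: {unweighted_mean(accuracies):.3f}")
print(f"weighted mean accuracy:   {weighted_mean(accuracies, weights):.3f}")
```

With these invented figures the small single-scorer study pulls the unweighted mean down more than the weighted mean, which is the practical reason a reader needs to know which kind of average a reported rate represents.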


inclusive as possible, yet maintain a level of scientific rigor. To reduce the impact of these confounds on the meta-analysis, aggregated results have been provided both with and without outlier results.

Another confounding issue with some studies is that the level of education, training and knowledge regarding psychological, physiological and testing principles may be significantly greater for participating examiners than for most field examiners. Most blind scoring studies of PDD accuracy involve highly experienced experts. Studies of the ESS have been an exception to this trend, making use of inexperienced examiners and scorers, and recent studies by Honts and his colleagues have involved students trained to collect the study data.

Meta-analysis always involves the imposition of study selection rules, and it is always possible that a meta-analysis based on a different set of inclusion criteria would lead to different results. All studies in this analysis were regarded equally if they met the publication requirements and provided sufficient information to evaluate the criterion accuracy and reliability or generalizability of the study results. Qualitative requirements for inclusion in this study pertained only to whether included studies satisfactorily represented field testing instrumentation and components, and satisfactorily represented a PDD technique for which a published description exists for the test question sequence and method for test data analysis. Meta-analytic weighting values were assigned according to the sample size and number of scorers for each study, though there were no obvious effects related to sample size. Although previous statistical analyses have not identified any significant differences in the results of field and laboratory studies, it is possible that meta-analytic results would be slightly different if the included studies were coded and weighted for other dimensions, including study quality, design, sampling methodology, proprietary interests, or inclusion of the primary author as a study participant.

Another limitation of this analysis is that none of the included studies involved juvenile examinees. As a result, a conservative evaluation of the present results would suggest that our present knowledge-base can be considered applicable only to physically and mentally healthy adults of normal functional characteristics. A more generous interpretation would recognize that there is little difference between adults and older juveniles in terms of physiology as measured or utilized by modern polygraph sensors and little difference in the psychological bases for polygraph reactions between adults and older, developmentally mature, juveniles. Generalizing these results to persons who are known outliers compared to the expected distribution of persons from the normative population (i.e., persons whose functional characteristics are outside the normal range) should be done with great caution.

Some of the included studies lacked complete information, though it was possible to calculate the reliability, sampling distributions, and the dimensional profile of criterion accuracy from raw data that was provided to the ad hoc committee. Sample data were not available for some studies, notably those from the U.S. Government. Those studies include the development and validation studies on the TES (DLST) and the AFMGQT techniques. These studies did report reliability and accuracy data that was sufficient to include them in the meta-analysis, and all of these studies have been replicated independently.

An obvious limitation of this meta-analysis is that it did not include the results of computerized scoring algorithms.

One other issue deserves mention. The principal investigator for this meta-analysis was also the primary author of a number of included studies.58 The committee was aware that his research was significantly, and at times solely, involved in studies that were proffered to validate some of the included techniques. Some of these techniques

58 Mr. Nelson has no financial, proprietary or personal interest in any of the PDD techniques or methodologies included in this meta-analysis.


would not have been included without these studies. As such, it was the responsibility of the committee to weigh his judgments against factors that may have diminished his independence. This was of vital concern, inasmuch as a bias in study selection or data analysis would seriously compromise the integrity of the final report. Upon closer scrutiny, reservations regarding these two potentially conflicting roles (principal author of the meta-analysis and author of studies included in the meta-analysis) were mitigated by the lack of any apparent personal interests in the outcomes of those studies and his limited participatory role in any study (never having conducted or scored any of the examinations). His published or pending studies did not reveal any discernible pattern of preference for or against particular polygraph techniques. Finally, all decisions for study inclusion were made collectively among the committee members based on the merits of the research: no single committee member had absolute authority to exclude or include a given study. While reasonable individual opinions may differ in some parts of this report, the committee took deliberate care to ensure that personal interests among those in the committee would not be cause for criticism of the report. Full disclosure of the relationship between the principal author of this report and his research is provided here to meet the standard ethical obligation in scientific reports.

Recommendations

Because no significant differences were found among the 14 PDD techniques included in this meta-analysis, no attempt should be made to describe these techniques in terms of a rank order regarding effectiveness. Available evidence does not support any PDD technique as superior to others. Attempts at establishing any hierarchy of efficacy are therefore unwarranted. Instead, less attention should be given to named PDD techniques and meaningless differences in PDD test formats. More emphasis should be given to test construction details for which there is replicated evidence of their contribution to criterion accuracy. More emphasis should be given to the important practical and decision theoretic differences in PDD techniques for which the RQs are interpreted as independent or non-independent.

One practical area of needed research involves the generalizability of normative data and accuracy estimates for PDD techniques interpreted with the assumption of criterion independence, including both multi-facet and multi-issue exams. Another practical area of needed research involves the use of DLCs with additional PDD test formats.

Continued research and improvement is needed for all PDD techniques, and these improvements should be fully integrated into both training and field practices. Additional studies should be completed to increase the knowledge base regarding moderator variables such as examinee characteristics (e.g., juveniles, older persons, persons with mental illness, and persons with medical health complications) in addition to crime details or characteristics that lead to the most effective use of PDD examinations. Additional research is needed in screening examinations, including studies pertaining to the decision theoretic complexities inherent to examinations constructed with multiple independent targets. Researchers should continue to increase their use of Monte Carlo models and other statistical methods that can be used to provide answers to complex research problems that are difficult to investigate through other methods. Results of Monte Carlo studies should be compared to those from live experiments from both field and laboratory settings.

A number of mediator variables have been suggested as having a significant effect on the accuracy of the PDD exam, and some of these involve complex psychological and linguistic assumptions that may or may not be fully testable. Untestable hypotheses should be discarded in favor of testable ones, and additional research should be conducted to understand the merits of procedural and structural hypotheses that have been suggested as related to test accuracy. Evidence from scientific studies should become a standing expectation, and developers and practitioners of PDD exams should resist the temptation to include authority-based and anecdotal theories which have not been tested.

Increased research standards are needed, including requirements for transparency and statements of interest from


all authors and participants. More importantly, because research on PDD test effectiveness is a process of testing the test, primary authors should be required to refrain from also functioning as a study participant. It is especially important, when the primary author is also the PDD technique developer or lacks independence due to a financial or business relationship with the developer, that the research data and methodology be subjected to rigorous objective and external review before a profession or the community is encouraged to rely on the research results.

Researchers should be required to provide statistical descriptions of the sampling distributions. This will facilitate more effective comparison of sampling distributions, and will increase the ability to evaluate and understand the representativeness and generalizability of study results. Just as the results of single un-replicated studies are of little actual value to meta-analytic research, the results of studies that employ a single expert scorer are of little actual value to the profession. Researchers should be encouraged or required to use multiple scorer participants of varied training and experience. Both examiners and examinees should be randomly selected whenever possible. This will increase the ability to study and understand the generalizability of PDD methods.

Some studies included in the meta-analysis are not adequately identified regarding the type of study, and the effect is potentially misleading for the profession. Pilot studies and surveys should be clearly identified as such, and should not be included in future systematic reviews or meta-analysis of criterion accuracy. Criterion studies should also be clearly identified from studies designed to evaluate moderator or mediator variables or questions of construct and causality.

Reliability statistics should be required for all studies, unless precluded by the study design (e.g., computer algorithm or simulation studies). Primary authors should be required to make all raw data and numerical scores available for review and extended analysis.

The results of computerized statistical TDA algorithms should be included in future studies of this type. The use of computers and statistical decision theory is still not common in TDA methods for PDD exams. Instead, TDA methods in field use emphasize manual scoring methods with integer-level and rank-level precision that should be considered blunt and unreliable compared to the precision and reliability that can be obtained via automated measurement and statistical analysis. There is a growing basis of evidence that indicates that computer algorithms can be equally or more effective than manual TDA methods as long as the data are of satisfactory quality. Because PDD examination results may play a decision support role in matters that affect individual lives, community safety and national security, the developers of computer algorithms should be required to provide complete descriptions of the operational procedures, in addition to the evaluation criteria, data transformation and aggregation methodology, normative data, and statistical basis in decision theory, signal detection theory, signal discrimination theory, regression analysis or machine learning.

Continued development and refinement of PDD testing methodologies is needed. PDD testing procedures have changed little over the past decade. Development efforts during this time have focused on improvements to test data analytic methods, including decision rules (Senter, 2003; Senter & Dollins, 2002; 2004; 2008), statistical algorithm development (Nelson, Krapohl & Handler, 2008), numerical transformations (Krapohl, 2010; Nelson, Krapohl & Handler, 2008; Nelson et al., 2011), and an increased use of normative data to calculate error rates and optimal decision cutscores (Krapohl, 2010; Nelson & Handler, 2010; Nelson, Krapohl & Handler, 2008; Nelson et al., 2011). Additional research and more detailed investigation and comparison of numerical transformation models is needed, including seven-position, three-position, ESS, rank-order transformation methods, those of computer algorithms, and the application of these transformations to examinations constructed from independent and non-independent examination targets.

PDD component sensors have changed little for several decades. This may be a mixed blessing. Although critics may point to this as a stagnation of research and development, there is a considerable published knowledge


base supporting and describing the effectiveness of the presently used array of PDD component sensors. It would be premature to abandon that knowledge base in an attempt to satisfy a collective hunger for new methods. Any replacement of the presently used component sensors must be accompanied by published and replicated evidence that the data and information provided by the new sensors is as good as, or better than, the data and information from the presently used sensors. Additionally, the use of new and improved sensors will face a substantial and non-trivial burden of developing and demonstrating the incorporation of new data into new or existing normative data and new or existing structural decision models. Despite these general cautions about the replacement of PDD components and physiological measures, it is also clear that the PDD test remains imperfect and in need of continued advancement. PDD test methods will not be improved without replacing less effective methods and techniques with more effective ones.

To give greater confidence in the effectiveness of the IZCT and the MQTZCT (or any proprietary methods59), they should be subject to replication by independent researchers, who did not develop the techniques, have no business relationship with the developers, did not conduct the exams, analyze the data, or report the findings at the time of the exams. All raw data and numerical scores should be made available for extended analysis. If these techniques are inherently superior to others, there should be no great difficulty in confirming this through high-quality independent research.

Finally, this meta-analysis should be repeated at some future time, with the inclusion of new and emerging information. Future meta-analytic studies should code and evaluate for moderators such as examiner characteristics, examinee characteristics, and mediators such as study quality, financial interests, and other possible mediators.

Conclusions

Results of this meta-analysis show that a number of studies are of satisfactory rigorous quality to provide a basis of empirical support describing the generalizability of an array of PDD techniques at criterion accuracy levels that significantly exceed chance expectations. Although somewhat arbitrary, the APA 2012 standards of practice and requirements for test accuracy are helpful to the profession. The goal of professional standards is to promote the use of effective methods, and discourage the use of less effective and unproven methods. Fourteen PDD techniques were found to be supported by multiple published studies and to satisfy the requirements of the APA 2012 Standards of Practice. Normative data are available for each of these 14 PDD techniques. We note that despite the imperfections of the polygraph, the NRC (2003) reported that none of the potential new technologies was ready to replace the polygraph, and this condition appears not to have changed at the present time.

APA standards do not themselves impose qualitative or methodological requirements for scientific evidence,60 and no quantitative requirements are stated beyond the requirement for two publications that indicate certain levels of precision for examination decisions. Herein rests a potential weakness of the APA standards: a simplistic interpretation of the requirements suggests that anyone could press ink onto

59 In fairness to the developers of these methods, every “lie detection” method in the past 100 years that was researched by the developer or by an enthusiastic user evaluating his or her own examinations has reported accuracy approaching perfection. It is one of the hallmarks of advocacy research in all fields, not just lie detection. The trend includes Marston’s discontinuous blood pressure technique, Summer’s Pathometer in the 1930s, MacNitt with the Relevant-Irrelevant technique in the 1940s, Lykken’s GKT, Farwell’s “Brain Fingerprinting,” and the Computer Voice Stress Analyzer. In each case the authors reported stellar accuracy, usually greater than 99%. In all these cases, however, subsequent research was either absent or resulted in accuracies significantly lower than the original reports.

60 The APA has adopted a standard for research, which can be found online at www.polygraph.org and printed in the journal Polygraph.


paper two times in self-published volumes, claiming perfect or near perfect accuracy, and subsequently claim compliance with the APA standards for test validation at the highest categorical level – for evidentiary testing. While this would be viewed by scientific thinkers with some skepticism, the example illustrates the need in meta-analytic research for the definition of more rigorous study inclusion and exclusion criteria. Just as not all evidence is good evidence, not all publications are useful. Publication itself is not an endorsement of fact, and merely indicates that editors and reviewers agreed that the work would be of some interest to the profession.

Nothing in this document should be taken to suggest that we presently know everything we need to know, or everything there is to know, about PDD testing. There is always more to learn and there is always a need for continued research. More information will undoubtedly become available in the future, and it is incumbent on professionals to continue to incorporate practices based on new and improving evidence from high quality scientific studies. To do otherwise is to subject the future of the profession to opinion, which will be vulnerable to personalities, politics, and personal interests. In the strictest sense, PDD techniques for which there is an inadequate basis of published and replicated scientific studies must be considered experimental, regardless of how long they have existed. However, the suggestion of abandoning un-validated, ineffective, or experimental methods that have long been used in field practice is not without controversy.

It can be, and has been, argued that some unstudied or experimental methods may work as well or better than some proven methods, and may provide specialized benefits in certain ways. Conversely, it is also possible that experimental and unproven methods do not work as well as those with evidence of scientific validity. In the worst circumstances, the use of experimental methods could result in otherwise-avoidable adverse consequences. A conservative assessment would suggest that the practice of conducting experimental methods on the public, when effective evidence-based methods are available and easily implemented with no additional costs, may be considered reasonable under some circumstances, but calls for compelling justification and includes ethical requirements for informed consent and notification.

Finally, this meta-analysis should be considered an information resource only, and the results of this study should not be interpreted as APA policy. No attempt should be made to represent or interpret this document or the results of this study as the only or final authority on PDD test validation. Other, equally reasonable, approaches are also possible regarding the evaluation of the scientific literature on PDD testing.

This project was completed with the goal of summarizing the existing published scientific literature regarding PDD techniques and criterion accuracy, and to provide a convenient resource to those who may wish to avoid the burden of reviewing the research literature for themselves. Every effort has been extended to not only provide conclusions, but also in-depth explanations that underlie those conclusions so that readers can better understand their basis.

The information herein is provided to the APA Board to advise its professional membership of the strength of validation of PDD techniques in present use. This information is intended only to help PDD professionals make informed decisions regarding the selection of PDD techniques for use in field settings. It may also assist program administrators, policy makers, and courts to make evidence-based decisions about the informational value of PDD test results in general. Nothing should prevent the use of any PDD technique that is supported by scientific research that demonstrates an accuracy rate significantly greater than chance, so long as that use is compliant with the requirements of local laws, regulations, and enforceable standards.

How to cite this document:

American Polygraph Association (2011). Meta-analytic survey of criterion accuracy of validated polygraph techniques. [Electronic version] Retrieved , from http://www.polygraph.org.


References

* indicates studies that were included in the meta-analysis.

indicates studies that were cited only in the appendices.

Abrams, S. (1973). Polygraph validity and reliability: A review. Journal of Forensic Sciences, 18, 313-326.

Abrams, S. (1977). A polygraph handbook for attorneys. Lexington, MA: Lexington Books.

Abrams, S. (1989). The complete polygraph handbook. Lexington, MA: Lexington Books.

Abrams, S. (1984). The question of the intent question. Polygraph, 13, 326-332.

Anderson, C. A., Lindsay, J. J., & Bushman, B. J. (1999). Research in the psychological laboratory: Truth or triviality? Current Directions In Psychological Science, 8, 3-9.

Ansley, N. (1983). A compendium on polygraph validity. Polygraph, 12, 53-61.

Ansley, N. (1989). Accuracy and utility of RI screening by student examiners at DODPI. Polygraph and Personnel Security Research. Office of Security. National Security Agency. Fort George G. Meade, MD.

Ansley, N. (1990). The validity and reliability of polygraph decisions in real cases. Polygraph, 19, 169-181.

Ansley, N. (1992). The history and accuracy of guilty knowledge and peak of tension tests. Polygraph, 21, 174-247.

Backster, C. (1963). Standardized polygraph notepack and technique guide: Backster zone comparison technique. Cleve Backster: New York.

Backster School of Lie Detection (2011). Basic polygraph examiner's course chart interpretation notebook. Backster School of Lie Detection: San Diego.

Barland, G. H., Honts, C. R., & Barger, S. D. (1989). Studies of the accuracy of security screening polygraph examinations. Department of Defense Polygraph Institute.

Barland, G. H. & Raskin, D. C. (1975). Psychopathy and detection of deception in criminal suspects. Psychophysiology, 12, 224.

Bell, B. G., Kircher, J. C., & Bernhardt, P. C. (2008). New measures improve the accuracy of the directed-lie test when detecting deception using a mock crime. Physiology and Behavior, 94, 331-340.

Bell, B. G., Raskin, D. C., Honts, C. R., & Kircher, J. C. (1999). The Utah numerical scoring system. Polygraph, 28(1), 1-9.

Blackstone, K. (2011). Polygraph, Sex Offenders, and the Court: What Professionals Should Know About Polygraph..., and a Lot More. Concord, MA: Emerson Books.

*Blackwell, J. N. (1998). PolyScore 33 and psychophysiological detection of deception examiner rates of accuracy when scoring examination from actual criminal investigations. Available at the Defense Technical Information Center. DTIC AD Number A355504/PAA. Reprinted in Polygraph, 28(2), 149-175.

*Blalock, B., Cushman, B., & Nelson, R. (2009). A replication and validation study on an empirically based manual scoring system. Polygraph, 38, 281-288.

Blalock, B., Nelson, R., Handler, M., & Shaw, P. (2011). A position paper on the use of directed lie comparison questions in diagnostic and screening polygraphs. Police Polygraph Digest, 2-5.

Brownlie, C., Johnson, G. J., & Knill, B. (1997). Validation study of the relevant/irrelevant screening format. Unpublished report.

Capps, M. H. (1991). Predictive value of the sacrifice relevant. Polygraph, 20(1), 1-8.

Capps, M. H. & Ansley, N. (1992). Comparison of two scoring scales. Polygraph, 21, 39-43.

Capps, M. H., Knill, B. L., & Evans, R. K. (1993). Effectiveness of the symptomatic questions. Polygraph, 22, 285-298.

Correa, E. J. & Adams, H. E. (1981). The validity of the pre-employment polygraph examination and the effects of motivation. Polygraph, 10, 143-155.

Crewson, P. E. (2001). A comparative analysis of polygraph with other screening and diagnostic tools. Research Support Service. Report No. DoDPI01-R-0003. Reprinted in Polygraph, 32, 57-85.

Department of Defense (2006). Federal psychophysiological detection of deception examiner handbook. Reprinted in Polygraph, 40(1), 2-66.

*Driscoll, L. N., Honts, C. R., & Jones, D. (1987). The validity of the positive control physiological detection of deception technique. Journal of Police Science and Administration, 15, 46-50. Reprinted in Polygraph, 16(3), 218-225.

Forman, R. F. & McCauley, C. (1986). Validity of the positive control polygraph test using the field practice model. Journal of Applied Psychology, 71, 691-698. Reprinted in Polygraph, 16(2), 145-160.

Ganguly, A. K., Lahri, S. K., & Bhaseen, V. (1986). Detection of deception by conventional qualitative method and its confirmation by quantitative method - An experimental study in polygraphy. Polygraph, 15, 203-210.

Ginton, A., Daie, N., Elaad, E., & Ben-Shakhar, G. (1982). A method for evaluating the use of the polygraph in a real-life situation. Journal of Applied Psychology, 67, 131-137.

Gordon, N. J. (1999). The academy for scientific investigative training's horizontal scoring system and examiner's algorithm system for chart interpretation. Polygraph, 28, 56-64.

Gordon, N. J., Fleisher, W. L., Morsie, H., Habib, W., & Salah, K. (2000). A field validity study of the integrated zone comparison technique. Polygraph, 29, 220-225.

*Gordon, N. J., Mohamed, F. B., Faro, S. H., Platek, S. M., Ahmad, H., & Williams, J. M. (2005). Integrated zone comparison polygraph technique accuracy with scoring algorithms. Physiology & Behavior, 87(2), 251-254. (Same study is described in Mohamed, F. B., Faro, S. H., Gordon, N. J., Platek, S. M., Ahmad, H., & Williams, J. M. (2006).)

Handler, M. (2006). The Utah PLC. Polygraph, 35, 139-148.

Handler, M. & Nelson, R. (2008). Utah approach to comparison question polygraph testing. European Polygraph, 2, 83-119.

Handler, M. & Nelson, R. (In press). Criterion validity of the United States Air Force Modified General Question Technique and three position scoring. Polygraph.

*Handler, M., Nelson, R., Goodson, W., & Hicks, M. (2010). Empirical Scoring System: A cross-cultural replication and extension study of manual scoring and decision policies. Polygraph, 39, 200-215.

Harwell, E. M. (2000). A comparison of 3- and 7-position scoring scales with field examinations. Polygraph, 29, 195-197.

Hilliard, D. L. (1979). A cross analysis between relevant questions and a generalized intent to answer truthfully question. Polygraph, 8, 73-77.

*Honts, C. R. (1996). Criterion development and validity of the CQT in field application. The Journal of General Psychology, 123, 309-324.

Honts, C. R., & Amato, S. L. (1999). The automated polygraph examination: Final report of U. S. Government Contract No. 110224-1998-MO. Boise State University.

*Honts, C. R., Amato, S. & Gordon, A. (2004). Effects of outside issues on the comparison question test. Journal of General Psychology, 131(1), 53-74.

Honts, C. R. & Driscoll, L. N. (1987). An evaluation of the reliability and validity of rank order and standard numerical scoring of polygraph charts. Polygraph, 16, 241-257.

Honts, C. R. & Hodes, R. L. (1983). The detection of physical countermeasures. Polygraph, 12, 7-17.

*Honts, C. R., Hodes, R. L., & Raskin, D. C. (1985). Effects of physical countermeasures on the physiological detection of deception. Journal of Applied Psychology, 70(1), 177-187.

Honts, C. R. & Peterson, C. F. (1997). Brief of the Committee of Concerned Social Scientists as Amicus Curiae United States v Scheffer. Available from the author.

*Honts, C. R. & Raskin, D. (1988). A field study of the validity of the directed lie control question. Journal of Police Science and Administration, 16(1), 56-61.

*Honts, C. R., Raskin, D. C., & Kircher, J. C. (1987). Effects of physical countermeasures and their electromyographic detection during polygraph tests for deception. Psychophysiology, 1, 241-247.

Honts, C. R., & Reavy, R. (2009). Effects of Comparison Question Type and Between Test Stimulation on the Validity of Comparison Question Test. US Army Research Office: Grant Number W911NF-07-1-0670.

*Horowitz, S. W., Kircher, J. C., Honts, C. R., & Raskin, D. C. (1997). The role of comparison questions in physiological detection of deception. Psychophysiology, 34, 108-115.

Horvath, F. S. (1977). The effect of selected variables on interpretation of polygraph records. Journal of Applied Psychology, 62, 127-136.

Horvath, F. S. (1988). The utility of control questions and the effects of two control question types in field polygraph techniques. Journal of Police Science and Administration, 16(3), 198-209. Reprinted in Polygraph, 20, 7-25.

Horvath, F. S. (1994). The value and effectiveness of the sacrifice relevant question: An empirical assessment. Polygraph, 23, 261-279.

Horvath, F. & Palmatier, J. (2008). Effect of two types of control questions and two question formats on the outcomes of polygraph examinations. Journal of Forensic Sciences, 53(4), 1-11.

Horvath, F. S. & Reid, J. E. (1971). The reliability of polygraph examiner diagnosis of truth and deception. Journal of Criminal Law, Criminology and Police Science, 62, 276-281.

Hunter, F. L., & Ash, P. (1973). The accuracy and consistency of polygraph examiners' diagnosis. Journal of Police Science and Administration, 1, 370-375.

Jayne, B. (1989). A comparison between the predictive value of two common preemployment screening procedures. The Investigator, 5(3).

Jayne, B. C. (1990). Contributions of physiological recordings in the polygraph technique. Polygraph, 19, 105-117.

Kircher, J. C., Kristjiansson, S. D., Gardner, M. K., & Webb, A. (2005). Human and computer decision-making in the psychophysiological detection of deception. University of Utah.

*Kircher, J. C. & Raskin, D. C. (1988). Human versus computerized evaluations of polygraph data in a laboratory setting. Journal of Applied Psychology, 73, 291-302.

Kokish, R., Levenson, J. S., & Blasingame, G. D. (2005). Post-conviction sex offender polygraph examination: Client-reported perceptions of utility and accuracy. Sexual Abuse: A Journal of Research and Treatment, 17, 211-21.

Krapohl, D. J. (1998). A comparison of 3- and 7-position scoring scales with laboratory data. Polygraph, 27, 210-218.

Krapohl, D. J. (2002). Short report: Update for the objective scoring system. Polygraph, 31, 298-302.

Krapohl, D. J. (2005). Polygraph decision rules for evidentiary and paired testing (Marin protocol) applications. Polygraph, 34, 184-192.

Krapohl, D. J. (2006). Validated polygraph techniques. Polygraph, 35(3), 149-155.

Krapohl, D. J. (2010). Short report: A test of the ESS with two-question field cases. Polygraph, 39, 124-126.

*Krapohl, D. J. & Cushman, B. (2006). Comparison of evidentiary and investigative decision rules: A replication. Polygraph, 35(1), 55-63.

Krapohl, D. J., Dutton, D. W. & Ryan, A. H. (2001). The rank order scoring system: Replication and extension with field data. Polygraph, 30, 172-181.

Krapohl, D. J. & McManus, B. (1999). An objective method for manually scoring polygraph data. Polygraph, 28, 209-222.

Krapohl, D. J. & Norris, W. F. (2000). An exploratory study of traditional and objective scoring systems with MGQT field cases. Polygraph, 29, 185-194.

Krapohl, D. J. & Ryan, A. H. (2001). A belated look at symptomatic questions. Polygraph, 30, 206-212.

Krapohl, D. J., Senter, S. M., & Stern, B. A. (2005). An exploration of methods for the analysis of multiple-issue Relevant/Irrelevant screening data. Polygraph, 34(1), 47-62.

Lykken, D. T. (1959). The GSR in the detection of guilt. Journal of Applied Psychology, 43, 385-388.

*MacLaren, V. V. (2001). A quantitative review of the guilty knowledge test. The Journal of Applied Psychology, 86, 674-683.

*Mangan, D. J., Armitage, T. E., & Adams, G. C. (2008). A field study on the validity of the Quadri-Track Zone Comparison Technique. Physiology and Behavior, 17-23.

Matte, J. A. (1990). Validation study on the polygraph Quadri-Zone Comparison Technique. Research Abstract LD 01452, Vol. 1502, 1989, University Microfilm International (UMI), Ann Arbor, MI.

Matte, J. A. (2010). A field study of the Backster Zone Comparison Technique's Either Or Rule and scoring system versus two other scoring systems when relevant question elicits strong response. European Polygraph, 4, 53-69.

*Matte, J. A. & Reuss, R. M. (1989). A field validation study of the Quadri-Zone Comparison Technique. Polygraph, 18, 187-202.

Meiron, E., Krapohl, D. J., & Ashkenazi, T. (2008). An assessment of the Backster “Either-Or” Rule in polygraph scoring. Polygraph, 37, 240-249.

Mohamed, F. B., Faro, S. H., Gordon, N. J., Platek, S. M., Ahmad, H., & Williams, J. M. (2006). Brain mapping of deception and truth telling about an ecologically valid situation: functional MR imaging and polygraph investigation--initial experience. Radiology, 238, 679-88.

National Research Council (2003). The Polygraph and Lie Detection. Washington, D.C.: National Academy of Sciences.

*Nelson, R. (In press). Monte Carlo study of criterion validity of Backster You-Phase examinations. Polygraph.

*Nelson, R. (In press). Monte Carlo study of criterion validity of the Directed Lie Screening Test using the seven-position, three-position and Empirical Scoring Systems. Polygraph.

*Nelson, R. (2011). Monte Carlo study of criterion validity for two-question zone comparison tests with the Empirical Scoring System, seven-position, and three-position scoring models. Polygraph, 40, 146-156.

*Nelson, R. & Blalock, B. (In press). Extended analysis of Senter, Waller and Krapohl's AFMGQT examination data with the Empirical Scoring System and the Objective Scoring System, version 3. Polygraph.

*Nelson, R., Blalock, B., & Handler, M. (2011). Criterion validity of the Empirical Scoring System and the Objective Scoring System, version 3 with the USAF Modified General Question Technique. Polygraph, 40, 172-179.

*Nelson, R., Blalock, B., Oelrich, M., & Cushman, B. (2011). Reliability of the Empirical Scoring System with expert examiners. Polygraph, 40, 131-139.

Nelson, R. & Handler, M. (2010). Empirical Scoring System. Lafayette Instrument Company.

*Nelson, R. & Handler, M. (In press). Monte Carlo study of the United States Air Force Modified General Question Technique with two, three and four questions. Polygraph.

*Nelson, R., Handler, M., Adams, G., & Backster, C. (In press). Survey of reliability and criterion validity of Backster numerical scores of You-Phase exams from confirmed field investigations. Polygraph.

*Nelson, R., Handler, M., Blalock, B., & Cushman, B. (In press). Blind scoring of confirmed federal You-Phase examinations by experienced and inexperienced examiners: Criterion validity with the Empirical Scoring System and the seven-position model. Polygraph.

*Nelson, R., Handler, M., Blalock, B., & Hernández, N. (In press). Replication and extension study of Directed Lie Screening Tests: Criterion validity with the seven- and three-position models and the Empirical Scoring System. Polygraph.

*Nelson, R., Handler, M., & Morgan, C. (In press). Criterion validity of the Directed Lie Screening Test and the Empirical Scoring System with inexperienced examiners and non-naive examinees in a laboratory setting. Polygraph.

*Nelson, R., Handler, M., Morgan, C., & O’Burke, P. (In press). Criterion validity of the United States Air Force Modified General Question Technique and Iraqi scorers. Polygraph.

*Nelson, R., Handler, M., & Senter, S. (In press). Monte Carlo study of criterion validity of the Directed Lie Screening Test using the Empirical Scoring System and the Objective Scoring System version 3. Polygraph.

*Nelson, R., Handler, M., Shaw, P., Gougler, M., Blalock, B., Russell, C., Cushman, B., & Oelrich, M. (2011). Using the Empirical Scoring System. Polygraph, 40(2), 67-78.

*Nelson, R. & Krapohl, D. (2011). Criterion validity of the Empirical Scoring System with experienced examiners: Comparison with the seven-position evidentiary model using the Federal Zone Comparison Technique. Polygraph, 40, 79-85.

*Nelson, R., Krapohl, D., & Handler, M. (2008). Brute force comparison: A Monte Carlo study of the Objective Scoring System version 3 (OSS-3) and human polygraph scorers. Polygraph, 37, 185-215.

Office of Technology Assessment (1983). The validity of polygraph testing: A research review and evaluation. Washington, D.C.: U.S. Congress, Office of Technology Assessment.

Patrick, C. J. & Iacono, W. G. (1989). Psychopathy, threat and polygraph test accuracy. Journal of Applied Psychology, 74, 347-355.

Patrick, C. J. & Iacono, W. G. (1991). Validity of the control question polygraph test: The problem of sampling bias. Journal of Applied Psychology, 76, 229-238.

Podlesny, J. A. & Raskin, D. C. (1978). Effectiveness of techniques and physiological measures in the detection of deception. Psychophysiology, 15, 344-359.

Podlesny, J., Raskin, D., & Barland, G. (1976). Effectiveness of Techniques and Physiological Measures in the Detection of Deception. Report No. 76-5, Contract 75-N1-99-001 LEAA (available through Department of Psychology, University of Utah, Salt Lake City).

Podlesny, J. A. & Truslow, C. M. (1993). Validity of an expanded-issue (modified general question) polygraph technique in a simulated distributed-crime-roles context. Journal of Applied Psychology, 78, 788-797.

Pollina, D. A., Dollins, A. B., Senter, S. M., Krapohl, D. J., & Ryan, A. H. (2004). Comparison of polygraph data obtained from individuals involved in mock crimes and actual criminal investigations. Journal of Applied Psychology, 89, 1099-105.

Raskin, D. C. & Hare, R. D. (1978). Psychopathy and detection of deception in a prison population. Psychophysiology, 15, 126-136.

Raskin, D. C. & Honts, C. R. (2002). Handbook of polygraph testing. In M. Kleiner (Ed.), Handbook of Polygraph Testing. San Diego: Academic Press.

Raskin, D. C. & Podlesny, J. A. (1979). Truth and deception: A reply to Lykken. Psychological Bulletin, 86, 54-59.

Reid, J. E. (1947). A revised questioning technique in lie detection tests. Journal of Criminal Law and Criminology, 37, 542-547. Reprinted in Polygraph, 11, 17-21.

Reid, J. E. & Inbau, F. E. (1977). Truth and deception: The polygraph ('lie detector') technique (2nd ed). Baltimore, MD: Williams & Wilkins.

*Research Division Staff (1995a). A comparison of psychophysiological detection of deception accuracy rates obtained using the counterintelligence scope Polygraph and the test for espionage and sabotage question formats. DTIC AD Number A319333. Department of Defense Polygraph Institute. Fort Jackson, SC. Reprinted in Polygraph, 26(2), 79-106.

*Research Division Staff (1995b). Psychophysiological detection of deception accuracy rates obtained using the test for espionage and sabotage. DTIC AD Number A330774. Department of Defense Polygraph Institute. Fort Jackson, SC. Reprinted in Polygraph, 27(3), 171-180.

Research Division Staff (2001). Test of a mock theft scenario for use in the Psychophysiological Detection of Deception: IV. Report No. DoDPI00-R-0002. Department of Defense Polygraph Institute. Reprinted in Polygraph, 30(4), 244-253.

Rovner, L. I. (1986). Accuracy of physiological detection of deception for subjects with prior knowledge. Polygraph, 15(1), 1-39.

Senter, S. M. (2003). Modified general question test decision rule exploration. Polygraph, 32, 251-263.

Senter, S. M. & Dollins, A. B. (2002). New decision rule development: Exploration of a two-stage approach. Report number DoDPI00-R-0001. Department of Defense Polygraph Institute Research Division, Fort Jackson, SC. Reprinted in Polygraph, 37, 149-164.

Senter, S. & Dollins, A. B. (2004). Comparison of question series and decision rules: A replication. Polygraph, 33, 223-233.

Senter, S. M. & Dollins, A. B. (2008). Optimal decision rules for evaluating psychophysiological detection of deception data: an exploration. Polygraph, 37(2), 112-124.

*Senter, S., Waller, J., & Krapohl, D. (2008). Air Force Modified General Question Test validation study. Polygraph, 37(3), 174-184.

Senter, S., Weatherman, D., Krapohl, D., & Horvath, F. (2010). Psychological set or differential salience: A proposal for reconciling theory and terminology in polygraph testing. Polygraph, 39(2), 109-117.

*Shurani, T. (2011). Polygraph verification test. European Polygraph, 16.

*Shurani, T. & Chaves, F. (2010). Integrated Zone Comparison Technique and ASIT PolySuite algorithm: A field validity study. European Polygraph, 4(2), 71-80.

*Shurani, T., Stein, E. & Brand, E. (2009). A Field Study on the Validity of the Quadri-Track Zone Comparison Technique. European Polygraph, 1, 5-24.

Slowik, S. M. & Buckley, J. P., III (1975). Relative accuracy of polygraph examiner diagnosis of respiration, blood pressure and GSR recordings. Journal of Police Science and Administration, 3, 305-309.

Van Herk, M. (1990). Numerical evaluation: Seven point scale +/-6 and possible alternatives: A discussion. The Newsletter of the Canadian Association of Police Polygraphists, 7, 28-47. Reprinted in Polygraph, 20(2), 70-79.

Verschuere, B., Meijer, E., & Merckelbach, H. (2008). The Quadri-Track Zone Comparison Technique: It's just not science. A critique to Mangan, Armitage, and Adams (2008). Physiology and Behavior, 1-2, 27-28.

Wicklander, D. E. & Hunter, F. L. (1975). The influence of auxiliary sources of information in polygraph diagnosis. Journal of Police Science and Administration, 3, 405-409.

Appendix A

Sample Sizes of Included Studies

PDD Technique | Study | Total N | N Deceptive | N Truthful | Total Scores | Deceptive Scores | Truthful Scores | Scorers
AFMGQT (7-position) | Senter, Waller & Krapohl (2008) [1] | 69 | 33 | 36 | 69 | 33 | 36 | 1
AFMGQT (7-position) | Nelson, Handler, Morgan & O'Burke (In press) [2] | 22 | 11 | 11 | 66 | 33 | 33 | 3
AFMGQT (7-position) | Nelson, Handler, & Senter (In press) [3, A] | - | - | - | 100 | 50 | 50 | 1
AFMGQT (ESS) | Nelson, Blalock & Handler (2011) [2] | - | - | - | 66 | 33 | 33 | 3
AFMGQT (ESS) | Nelson & Blalock (In press) [1] | - | - | - | 69 | 33 | 36 | 1
AFMGQT (ESS) | Nelson, Handler, & Senter (In press) [3, A] | 100 | 50 | 50 | 100 | 50 | 50 | 1
Backster You-Phase (Backster) | Nelson, Handler, Adams & Backster (In press) [4] | 22 | 11 | 11 | 154 | 77 | 77 | 7
Backster You-Phase (Backster) | Nelson (In press) | 100 | 50 | 50 | 100 | 50 | 50 | 1
CIT | MacLaren 2001 | 1,070 | 666 | 404 | 1,070 | 666 | 404 | 39
DLST/TES (7-position) | Research Division Staff 1995a | 94 | 26 | 68 | 94 | 26 | 68 | 3
DLST/TES (7-position) | Research Division Staff 1995b | 85 | 30 | 55 | 85 | 30 | 55 | 10
DLST/TES (7-position) | Nelson (In press) [B] | 100 | 50 | 50 | 100 | 50 | 50 | 1
DLST/TES (7-position) | Nelson, Handler, Blalock & Hernández (In press) [5, C] | 49 | 25 | 24 | 98 | 50 | 48 | 2
DLST/TES (ESS) | Nelson & Handler (In press) | 100 | 50 | 50 | 100 | 50 | 50 | 1
DLST/TES (ESS) | Nelson, Handler & Morgan (In press) | 49 | 24 | 25 | 49 | 24 | 25 | 1
DLST/TES (ESS) | Nelson (In press) [B] | - | - | - | 100 | 50 | 50 | 1
DLST/TES (ESS) | Nelson, Handler, Blalock & Hernández (In press) [5, C] | - | - | - | 98 | 50 | 48 | 2
Federal You-Phase (7-position) | Nelson (2011) [D] | 100 | 50 | 50 | 100 | 50 | 50 | 1
Federal You-Phase (7-position) | Nelson, Handler, Blalock & Cushman (In press) [6, E] | - | - | - | 220 | 110 | 110 | 10
Federal You-Phase (ESS) | Nelson (2011) [D] | 100 | 50 | 50 | 100 | 50 | 50 | 1
Federal You-Phase (ESS) | Nelson, Handler, Blalock & Cushman (In press) [6, E] | - | - | - | 220 | 110 | 110 | 10
Federal ZCT (7-position) | Blackwell (1998) | 100 | 65 | 35 | 300 | 195 | 105 | 3
Federal ZCT (7-position) | Krapohl & Cushman (2006) [7, F] | 100 | 50 | 50 | 1,000 | 500 | 500 | 10
Federal ZCT (7-position) | Honts, Amato & Gordon (2004) as reported in Honts in Grahnag (2004) | 48 | 24 | 24 | 144 | 72 | 72 | 3
Federal ZCT (7-position evidentiary) | Krapohl & Cushman (2006) [7, F] | - | - | - | 1,000 | 500 | 500 | 10
Federal ZCT (7-position evidentiary) | Nelson & Krapohl (2011) [8, G] | 60 | 30 | 30 | 60 | 30 | 30 | 6
IZCT (Horizontal) | Shurani & Chaves (2010) | 84 | 44 | 40 | 84 | 44 | 40 | 4
IZCT (Horizontal) | Gordon, Mohamed, Faro, Platek, Ahmad & Williams (2005) | 11 | 6 | 5 | 11 | 6 | 5 | 1
IZCT (Horizontal) | Shurani (2011) | 84 | 36 | 48 | 84 | 36 | 48 | 3
MQTZCT (Matte) | Matte & Reuss (1989) dissertation | 122 | 64 | 58 | 122 | 64 | 58 | 2
MQTZCT (Matte) | Shurani, Stein & Brand (2009) | 57 | 28 | 29 | 57 | 28 | 29 | 4
MQTZCT (Matte) | Mangan, Armitage & Adams (2008) | 140 | 91 | 49 | 140 | 91 | 49 | 1
Utah-RCMP/CPC (Utah) | Honts, Hodes & Raskin (1985) | 38 | 19 | 19 | 38 | 19 | 19 | 1
Utah-RCMP/CPC (Utah) | Driscoll, Honts & Jones (1987) | 40 | 20 | 20 | 40 | 20 | 20 | 1
Utah-RCMP/CPC (Utah) | Honts (1996) | 32 | 21 | 11 | 32 | 21 | 11 | 1
Utah-DLC (Utah) | Honts & Raskin (1988) | 25 | 12 | 13 | 25 | 12 | 13 | 1
Utah-DLC (Utah) | Horowitz, Kircher, Honts & Raskin (1997) | 30 | 15 | 15 | 30 | 15 | 15 | 1
Utah-DLC (Utah) | Kircher & Raskin (1988) | 100 | 50 | 50 | 200 | 100 | 100 | 2
Utah-DLC (Utah) | Honts, Raskin & Kircher (1987) | 20 | 10 | 10 | 20 | 10 | 10 | 1
ZCT (ESS) | Nelson, Krapohl & Handler (2008) [7] | - | - | - | 700 | 350 | 350 | 7
ZCT (ESS) | Nelson, Blalock, Oelrich & Cushman (2011) [7] | - | - | - | 250 | 150 | 100 | 25
ZCT (ESS) | Nelson & Krapohl (2011) [8, G] | - | - | - | 60 | 30 | 30 | 6
ZCT (ESS) | Nelson et al (2011) | 572 | 304 | 268 | 1,382 | 741 | 641 | 74
ZCT (ESS) | Blalock, Cushman & Nelson (2009) [7] | - | - | - | 900 | 450 | 450 | 9
ZCT (ESS) | Handler, Nelson, Goodson & Hicks (2010) [7] | - | - | - | 1,900 | 950 | 950 | 19

[1]-[8] Sample scores based on the same sample cases. [A]-[G] Sample scores published in the same study.

Appendix B

Criterion Accuracy of Included Studies

PDD Technique | Study | Sens. | Spec. | FN | FP | D-INC | T-INC | Unweighted Accuracy | Unweighted INC
AFMGQT (7-position) | Senter, Waller & Krapohl (2008) [1] | .758 | .917 | .212 | .083 | .030 | .001 | .849 | .015
AFMGQT (7-position) | Nelson, Handler, Morgan & O'Burke (In press) [2] | .818 | .364 | .001 | .333 | .182 | .303 | .761 | .242
AFMGQT (7-position) | Nelson, Handler, & Senter (In press) [3, A] | .780 | .420 | .040 | .200 | .140 | .420 | .814 | .280
AFMGQT (ESS) | Nelson, Blalock & Handler (2011) [2] | .831 | .616 | .010 | .175 | .158 | .208 | .883 | .183
AFMGQT (ESS) | Nelson & Blalock (In press) [1] | .511 | .862 | .211 | .027 | .277 | .028 | .839 | .152
AFMGQT (ESS) | Nelson, Handler, & Senter (In press) [3, A] | .806 | .639 | .067 | .131 | .127 | .229 | .876 | .178
Backster You-Phase (Backster) | Nelson, Handler, Adams & Backster (In press) [4] | .943 | .543 | .009 | .274 | .048 | .183 | .828 | .116
Backster You-Phase (Backster) | Nelson (In press) | .668 | .592 | .019 | .079 | .313 | .329 | .927 | .321
CIT | MacLaren 2001 | .815 | .832 | .185 | .168 | .001 | .001 | .823 | .000
DLST/TES (7-position) | Research Division Staff 1995a | .654 | .676 | .154 | .206 | .192 | .118 | .788 | .155
DLST/TES (7-position) | Research Division Staff 1995b | .833 | .909 | .167 | .073 | .000 | .018 | .880 | .009
DLST/TES (7-position) | Nelson (In press) [B] | .910 | .677 | .037 | .184 | .053 | .139 | .874 | .096
DLST/TES (7-position) | Nelson, Handler, Blalock & Hernández (In press) [5, C] | .583 | .940 | .271 | .020 | .145 | .039 | .831 | .092
DLST/TES (ESS) | Nelson & Handler (In press) | .917 | .587 | .036 | .253 | .047 | .160 | .831 | .104
DLST/TES (ESS) | Nelson, Handler & Morgan (In press) | .625 | .950 | .210 | .040 | .165 | .010 | .854 | .088
DLST/TES (ESS) | Nelson (In press) [B] | .935 | .730 | .046 | .195 | .020 | .075 | .871 | .048
DLST/TES (ESS) | Nelson, Handler, Blalock & Hernández (In press) [5, C] | .665 | .839 | .207 | .040 | .126 | .119 | .859 | .123
Federal You-Phase (7-position) | Nelson (2011) [D] | .833 | .417 | .010 | .138 | .157 | .444 | .870 | .301
Federal You-Phase (7-position) | Nelson, Handler, Blalock & Cushman (In press) [6, E] | .844 | .730 | .036 | .171 | .119 | .097 | .885 | .108
Federal You-Phase (ESS) | Nelson (2011) [D] | .813 | .729 | .050 | .126 | .090 | .102 | .897 | .096
Federal You-Phase (ESS) | Nelson, Handler, Blalock & Cushman (In press) [6, E] | .859 | .770 | .027 | .143 | .145 | .325 | .906 | .235
Federal ZCT (7-position) | Blackwell (1998) | .923 | .448 | .015 | .295 | .062 | .257 | .793 | .159
Federal ZCT (7-position) | Krapohl & Cushman (2006) [7, F] | .824 | .560 | .044 | .180 | .132 | .260 | .853 | .196
Federal ZCT (7-position) | Honts, Amato & Gordon (2004) as reported in Honts in Grahnag (2004) | .917 | .917 | .001 | .083 | .083 | .000 | .958 | .042
Federal ZCT (7-pos. evidentiary) | Krapohl & Cushman (2006) [7, F] | .792 | .824 | .122 | .116 | .086 | .060 | .872 | .073
Federal ZCT (7-pos. evidentiary) | Nelson & Krapohl (2011) [8, G] | .933 | .667 | .000 | .233 | .067 | .133 | .870 | .100
IZCT (Horizontal) | Shurani & Chaves (2010) | .955 | .900 | .023 | .001 | .023 | .100 | .988 | .061
IZCT (Horizontal) | Gordon, Mohamed, Faro, Platek, Ahmad & Williams (2005) | .999 | .800 | .001 | .001 | .001 | .200 | .999 | .100
IZCT (Horizontal) | Shurani (2011) | .999 | .999 | .001 | .001 | .001 | .001 | .999 | .000
MQTZCT (Matte) | Matte & Reuss (1989) dissertation | .969 | .914 | .000 | .001 | .031 | .086 | .999 | .059
MQTZCT (Matte) | Shurani, Stein & Brand (2009) | .929 | 1.000 | .071 | .001 | .000 | .001 | .964 | .000
MQTZCT (Matte) | Mangan, Armitage & Adams (2008) | .978 | 1.000 | .001 | .001 | .022 | .001 | .999 | .011
Utah-RCMP/CPC (Utah) | Honts, Hodes & Raskin (1985) | .895 | .421 | .001 | .211 | .105 | .368 | .833 | .237
Utah-RCMP/CPC (Utah) | Driscoll, Honts & Jones (1987) | .900 | .900 | .001 | .001 | .100 | .100 | .999 | .100
Utah-RCMP/CPC (Utah) | Honts (1996) | .714 | .818 | .048 | .001 | .238 | .182 | .969 | .210
Utah-DLC (Utah) | Honts & Raskin (1988) | .917 | .846 | .083 | .001 | .001 | .154 | .958 | .077
Utah-DLC (Utah) | Horowitz, Kircher, Honts & Raskin (1997) | .733 | .867 | .133 | .133 | .133 | .001 | .856 | .067
Utah-DLC (Utah) | Kircher & Raskin (1988) | .880 | .860 | .060 | .060 | .060 | .080 | .935 | .070
Utah-DLC (Utah) | Honts, Raskin & Kircher (1987) | .800 | .700 | .001 | .200 | .200 | .100 | .889 | .150
ZCT (ESS) | Nelson, Krapohl & Handler (2008) [7] | .749 | .814 | .154 | .077 | .097 | .109 | .872 | .103
ZCT (ESS) | Nelson, Blalock, Oelrich & Cushman (2011) [7] | .793 | .930 | .073 | .001 | .133 | .070 | .958 | .102
ZCT (ESS) | Nelson & Krapohl (2011) [8, G] | .833 | .633 | .001 | .133 | .167 | .233 | .913 | .200
ZCT (ESS) | Nelson et al (2011) | .863 | .789 | .047 | .093 | .103 | .107 | .921 | .105
ZCT (ESS) | Blalock, Cushman & Nelson (2009) [7] | .773 | .727 | .122 | .102 | .104 | .171 | .870 | .138
ZCT (ESS) | Handler, Nelson, Goodson & Hicks (2010) [7] | .865 | .881 | .103 | .089 | .040 | .039 | .901 | .040

[1]-[8] Sample scores based on the same sample cases. [A]-[G] Sample scores published in the same study.
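
The Unweighted Accuracy and Unweighted INC columns of Appendix B appear consistent with simple averages over the criterion deceptive and criterion truthful groups. The Python sketch below rests on an assumption inferred from the tabulated values, not on a formula stated in this section: that unweighted accuracy is the mean of the two decision accuracies with inconclusive results excluded, and that the unweighted inconclusive rate is the mean of D-INC and T-INC. The function names are hypothetical.

```python
def unweighted_accuracy(sens, spec, fn, fp):
    """Mean of deceptive and truthful decision accuracy, inconclusives excluded.

    Assumed relationship, inferred from the Appendix B values.
    """
    deceptive_accuracy = sens / (sens + fn)  # correct share of decided deceptive cases
    truthful_accuracy = spec / (spec + fp)   # correct share of decided truthful cases
    return (deceptive_accuracy + truthful_accuracy) / 2


def unweighted_inconclusive(d_inc, t_inc):
    """Mean inconclusive rate across the two criterion groups (assumed)."""
    return (d_inc + t_inc) / 2


# Two AFMGQT (7-position) rows copied from Appendix B:
# study -> (Sens., Spec., FN, FP, D-INC, T-INC, tabled accuracy, tabled INC)
rows = {
    "Senter, Waller & Krapohl (2008)": (.758, .917, .212, .083, .030, .001, .849, .015),
    "Nelson, Handler, Morgan & O'Burke (In press)": (.818, .364, .001, .333, .182, .303, .761, .242),
}

for study, (sens, spec, fn, fp, d_inc, t_inc, tab_acc, tab_inc) in rows.items():
    acc = unweighted_accuracy(sens, spec, fn, fp)
    inc = unweighted_inconclusive(d_inc, t_inc)
    print(f"{study}: accuracy {acc:.4f} (tabled {tab_acc}), INC {inc:.4f} (tabled {tab_inc})")
```

Under these assumptions, both rows agree with the tabled values to within about .001, which is the rounding precision of the table.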

Appendix C

Reliability Statistics for Included Studies

PDD Technique | Study | Fleiss' Kappa | Correlation | Decision Agreement
AFMGQT (7-position) | Senter, Waller & Krapohl (2008) [1] | .750 | .930 | .940
AFMGQT (7-position) | Nelson, Handler, Morgan & O'Burke (In press) [2] | - | 1.000 | -
AFMGQT (7-position) | Nelson, Handler, & Senter (In press) [3, A] | - | - | -
AFMGQT (ESS) | Nelson, Blalock & Handler (2011) [2] | - | 1.000 | .931
AFMGQT (ESS) | Nelson & Blalock (In press) [1] | - | - | -
AFMGQT (ESS) | Nelson, Handler, & Senter (In press) [3, A] | - | - | -
Backster You-Phase (Backster) | Nelson, Handler, Adams & Backster (In press) [4] | - | - | .567
Backster You-Phase (Backster) | Nelson (In press) | - | - | -
CIT | MacLaren 2001 | - | - | -
DLST/TES (7-position) | Research Division Staff 1995a | .760 | .890 | -
DLST/TES (7-position) | Research Division Staff 1995b | - | - | -
DLST/TES (7-position) | Nelson (In press) [B] | - | - | -
DLST/TES (7-position) | Nelson, Handler, Blalock & Hernández (In press) [5, C] | - | .722 | -
DLST/TES (ESS) | Nelson & Handler (In press) | - | - | -
DLST/TES (ESS) | Nelson, Handler & Morgan (In press) | - | .911 | -
DLST/TES (ESS) | Nelson (In press) [B] | - | - | -
DLST/TES (ESS) | Nelson, Handler, Blalock & Hernández (In press) [5, C] | - | .769 | -
Federal You-Phase (7-position) | Nelson (2011) [D] | - | - | -
Federal You-Phase (7-position) | Nelson, Handler, Blalock & Cushman (In press) [6, E] | - | .852 | -
Federal You-Phase (ESS) | Nelson (2011) [D] | - | - | -
Federal You-Phase (ESS) | Nelson, Handler, Blalock & Cushman (In press) [6, E] | - | .897 | -
Federal ZCT (7-position) | Blackwell (1998) | .570 | .800 | -
Federal ZCT (7-position) | Krapohl & Cushman (2006) [7, F] | - | - | -
Federal ZCT (7-position) | Honts, Amato & Gordon (2004) as reported in Honts in Grahnag (2004) | - | - | -
Federal ZCT (7-position evidentiary) | Krapohl & Cushman (2006) [7, F] | - | .870 | -
Federal ZCT (7-position evidentiary) | Nelson & Krapohl (2011) [8, G] | - | - | -
IZCT (Horizontal) | Shurani & Chaves (2010) | - | - | -
IZCT (Horizontal) | Gordon, Mohamed, Faro, Platek, Ahmad & Williams (2005) | - | - | -
IZCT (Horizontal) | Shurani (2011) | - | - | -
MQTZCT (Matte) | Matte & Reuss (1989) dissertation | - | - | .990
MQTZCT (Matte) | Shurani, Stein & Brand (2009) | - | - | -
MQTZCT (Matte) | Mangan, Armitage & Adams (2008) | - | - | -
Utah-RCMP/CPC (Utah) | Honts, Hodes & Raskin (1985) | .480 | .950 | .880
Utah-RCMP/CPC (Utah) | Driscoll, Honts & Jones (1987) | - | - | .860
Utah-RCMP/CPC (Utah) | Honts (1996) | - | .930 | .910
Utah-DLC (Utah) | Honts & Raskin (1988) | - | - | .940
Utah-DLC (Utah) | Horowitz, Kircher, Honts & Raskin (1997) | - | - | .920
Utah-DLC (Utah) | Kircher & Raskin (1988) | .730 | .990 | .970
Utah-DLC (Utah) | Honts, Raskin & Kircher (1987) | .730 | .960 | -
ZCT (ESS) | Nelson, Krapohl & Handler (2008) [7] | .610 | - | -
ZCT (ESS) | Nelson, Blalock, Oelrich & Cushman (2011) [7] | - | .950 | -
ZCT (ESS) | Nelson & Krapohl (2011) [8, G] | - | - | -
ZCT (ESS) | Nelson et al (2011) | - | - | -
ZCT (ESS) | Blalock, Cushman & Nelson (2009) [7] | .560 | - | -
ZCT (ESS) | Handler, Nelson, Goodson & Hicks (2010) [7] | .590 | - | .840

[1]-[8] Sample scores based on the same sample cases. [A]-[G] Sample scores published in the same study.

271 Polygraph, 2011, 40(4) Validated Techniques

Appendix D

Means and Standard Deviations of Criterion Deceptive and Criterion Truthful Scores

| PDD Technique | Study | Note | Mean D | StDev D | Mean T | StDev T |
|---|---|---|---|---|---|---|
| AFMGQT (7-position) | Senter, Waller & Krapohl (2008) | 1 | - | - | - | - |
| AFMGQT (7-position) | Nelson, Handler, Morgan & O'Burke (In press) | 2 | -2.995* | 4.727* | 2.365* | 3.879* |
| AFMGQT (7-position) | Nelson, Handler, & Senter (In press) | 3, A | -2.827* | 4.504* | 3.556* | 3.766* |
| AFMGQT (ESS) | Nelson, Blalock & Handler (2011) | 2 | -3.850* | 4.730* | 4.530* | 5.180* |
| AFMGQT (ESS) | Nelson & Blalock (In press) | 1 | -2.000* | 5.030* | 3.420* | 3.470* |
| AFMGQT (ESS) | Nelson, Handler, & Senter (In press) | 3, A | -3.031* | 4.535* | 3.265* | 3.661* |
| Backster You-Phase (Backster) | Nelson, Handler, Adams & Backster (In press) | 4 | -19.649 | 6.482 | 3.612 | 10.010 |
| Backster You-Phase (Backster) | Nelson (In press) | | -12.460 | 8.353 | 6.820 | 10.572 |
| CIT | MacLaren (2001) | | - | - | - | - |
| DLST/TES (7-position) | Research Division Staff (1995a) | | - | - | - | - |
| DLST/TES (7-position) | Research Division Staff (1995b) | | - | - | - | - |
| DLST/TES (7-position) | Nelson (In press) | B | -2.418* | 3.818* | 2.653* | 3.618* |
| DLST/TES (7-position) | Nelson, Handler, Blalock & Hernández (In press) | 5, C | -1.833* | 4.099* | 3.670* | 3.443* |
| DLST/TES (ESS) | Nelson & Handler (In press) | | -2.442* | 3.531* | 2.086* | 3.460* |
| DLST/TES (ESS) | Nelson, Handler & Morgan (In press) | | -1.271* | 3.131* | 4.660* | 2.299* |
| DLST/TES (ESS) | Nelson (In press) | B | -3.031* | 5.104* | 3.265* | 3.935* |
| DLST/TES (ESS) | Nelson, Handler, Blalock & Hernández (In press) | 5, C | -1.781* | 3.437* | 3.636* | 2.917* |
| Federal You-Phase (7-position) | Nelson (2011) | D | -6.398 | 4.914 | 5.485 | 5.106 |
| Federal You-Phase (7-position) | Nelson, Handler, Blalock & Cushman (In press) | 6, E | -7.991 | 6.733 | 6.514 | 6.680 |
| Federal You-Phase (ESS) | Nelson (2011) | D | -6.685 | 6.881 | 6.735 | 6.045 |
| Federal You-Phase (ESS) | Nelson, Handler, Blalock & Cushman (In press) | 6, E | -8.606 | 5.842 | 6.018 | 7.107 |
| Federal ZCT (7-position) | Blackwell (1998) | | -10.385 | 9.510 | 6.981 | 7.495 |
| Federal ZCT (7-position) | Krapohl & Cushman (2006) | 7, F | -6.264 | 10.863 | 9.776 | 8.212 |
| Federal ZCT (7-position) | Honts, Amato & Gordon (2004) as reported in Honts in Granhag (2004) | | -8.420 | 6.837 | 6.640 | 9.187 |
| Federal ZCT (7-position evidentiary) | Krapohl & Cushman (2006) | 7, F | -6.264 | 10.863 | 9.776 | 8.212 |
| Federal ZCT (7-position evidentiary) | Nelson & Krapohl (2011) | 8, G | -9.600 | 7.356 | 6.926 | 10.709 |
| IZCT (Horizontal) | Shurani & Chavez (2010) | | -8.847 | 15.264 | 21.181 | 5.097 |
| IZCT (Horizontal) | Gordon, Mohamed, Faro, Platek, Ahmad & Williams (2005) | | -36.000 | 12.946 | 8.750 | 1.635 |
| IZCT (Horizontal) | Shurani (2011) | | -19.667 | 9.607 | 28.948 | 5.963 |
| MQTZCT (Matte) | Matte & Reuss (1989) dissertation | | -9.148+ | 2.843+ | 3.099+ | 6.002+ |
| MQTZCT (Matte) | Shurani, Stein & Brand (2009) | | -6.949+ | 1.630+ | 5.388+ | 1.246+ |
| MQTZCT (Matte) | Mangan, Armitage & Adams (2008) | | -10.037+ | 2.995+ | 7.190+ | 3.189+ |
| Utah-RCMP/CPC (Utah) | Honts, Hodes & Raskin (1985) | | -11.950 | 6.520 | 9.000 | 10.660 |
| Utah-RCMP/CPC (Utah) | Driscoll, Honts & Jones (1987) | | -10.700 | 6.000 | 10.350 | 6.470 |
| Utah-RCMP/CPC (Utah) | Honts (1996) | | -15.000 | 5.564 | 8.170 | 5.270 |
| Utah-DLC (Utah) | Honts & Raskin (1988) | | -11.500 | 5.803 | 9.000 | 5.803 |
| Utah-DLC (Utah) | Horowitz, Kircher, Honts & Raskin (1997) | | -7.000 | 13.500 | 8.500 | 11.500 |
| Utah-DLC (Utah) | Kircher & Raskin (1988) | | -7.710 | 8.420 | 10.785 | 8.671 |
| Utah-DLC (Utah) | Honts, Raskin & Kircher (1987) | | -14.000 | 7.490 | 9.600 | 7.490 |
| ZCT (ESS) | Nelson, Krapohl & Handler (2008) | 7 | -9.606 | 9.743 | 9.162 | 8.564 |
| ZCT (ESS) | Nelson, Blalock, Oelrich & Cushman (2011) | 7 | -10.740 | 8.263 | 8.690 | 4.585 |
| ZCT (ESS) | Nelson & Krapohl (2011) | 8, G | -11.833 | 7.764 | 6.000 | 9.592 |
| ZCT (ESS) | Nelson et al (2011) | | -11.354 | 9.392 | 7.373 | 9.270 |
| ZCT (ESS) | Blalock, Cushman & Nelson (2009) | 7 | -11.253 | 9.786 | 7.191 | 8.785 |
| ZCT (ESS) | Handler, Nelson, Goodson & Hicks (2010) | 7 | -7.953 | 10.017 | 11.212 | 8.619 |

Notes 1-8: Sample scores based on the same sample cases.
Notes A-G: Sample scores published in the same study.
* Means and standard deviations are reported as subtotal scores for individual questions.
+ Means and standard deviations are reported for subtotal scores for individual test charts.

Polygraph, 2011, 40(4) 272 Ad Hoc Committee on Validated Techniques

Appendix E-1

AFMGQT / Seven-position TDA

| Study | Senter, Waller & Krapohl (2008) | Nelson, Handler, Morgan & O'Burke (In press) | Nelson & Handler (In press) |
|---|---|---|---|
| Sample N | 69 | 22 | 100 |
| N Deceptive | 33 | 11 | 50 |
| N Truthful | 36 | 11 | 50 |
| Scorers | 12 | 3 | 1 |
| D Scores | 33 | 33 | 50 |
| T Scores | 36 | 33 | 50 |
| Total Scores | 69 | 66 | 100 |
| Mean D | -2.000 | -2.995 | -2.827 |
| StDev D | 5.030 | 4.727 | 4.504 |
| Mean T | 3.420 | 2.365 | 3.556 |
| StDev T | 3.470 | 3.879 | 3.766 |
| Reliability Kappa | .750 | - | - |
| Reliability Agreement | .930 | .999 | - |
| Reliability Correlation | .940 | - | - |
| Unweighted Average Accuracy | .849 | .761 | .814 |
| Unweighted Inconclusives | .015 | .242 | .280 |
| Sensitivity | .758 | .818 | .780 |
| Specificity | .917 | .364 | .420 |
| FN Errors | .212 | .000 | .040 |
| FP Errors | .083 | .333 | .200 |
| D-INC | .030 | .182 | .140 |
| T-INC | .000 | .303 | .420 |
| PPV | .901 | .711 | .796 |
| NPV | .812 | .999 | .913 |
| D Correct | .781 | .999 | .951 |
| T Correct | .917 | .522 | .677 |
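Several rows in these criterion accuracy tables can be derived from the others. The sketch below shows one set of relationships that reproduces the Senter, Waller & Krapohl (2008) figures to three decimals. The formulas are inferred from the reported numbers rather than quoted from the report, and the function name `derived_metrics` is an illustrative assumption.

```python
def derived_metrics(sens, spec, fn, fp, d_inc, t_inc):
    """Derive summary rows from per-group outcome rates.

    sens, fn and d_inc are the shares of deceptive cases scored correct,
    wrong and inconclusive; spec, fp and t_inc are the same for truthful
    cases. All formulas are assumptions inferred from the reported values.
    """
    # Share correct among cases that received a decision (inconclusives excluded).
    d_correct = sens / (sens + fn)
    t_correct = spec / (spec + fp)
    return {
        "d_correct": d_correct,
        "t_correct": t_correct,
        # Unweighted: the two groups count equally regardless of sample size.
        "unweighted_accuracy": (d_correct + t_correct) / 2,
        "unweighted_inconclusives": (d_inc + t_inc) / 2,
        # Predictive values computed as if base rates were equal.
        "ppv": sens / (sens + fp),
        "npv": spec / (spec + fn),
    }


# Input rates for Senter, Waller & Krapohl (2008) as reported in the table.
senter_2008 = derived_metrics(
    sens=0.758, spec=0.917, fn=0.212, fp=0.083, d_inc=0.030, t_inc=0.000
)
```

Other columns do not always satisfy these relationships exactly, since some reported rates are floored at .001 or capped at .999.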

Appendix E-2

AFMGQT / ESS

| Study | Nelson, Blalock & Handler (In press) | Nelson & Blalock (In press) | Nelson, Handler & Senter (In press) |
|---|---|---|---|
| Sample N | 22 | 69 | 100 |
| N Deceptive | 11 | 33 | 50 |
| N Truthful | 11 | 36 | 50 |
| Scorers | 3 | 1 | 1 |
| D Scores | 33 | 33 | 50 |
| T Scores | 33 | 36 | 50 |
| Total Scores | 66 | 69 | 100 |
| Mean D | -3.850 | -2.000 | -3.031 |
| StDev D | 4.730 | 5.030 | 4.535 |
| Mean T | 4.530 | 3.420 | 3.265 |
| StDev T | 5.180 | 3.470 | 3.661 |
| Reliability Kappa | - | - | - |
| Reliability Agreement | .999 | - | - |
| Reliability Correlation | .931 | - | - |
| Unweighted Average Accuracy | .883 | .839 | .876 |
| Unweighted Inconclusives | .183 | .152 | .178 |
| Sensitivity | .831 | .511 | .806 |
| Specificity | .616 | .862 | .639 |
| FN Errors | .010 | .211 | .067 |
| FP Errors | .175 | .027 | .131 |
| D-INC | .158 | .277 | .127 |
| T-INC | .208 | .028 | .229 |
| PPV | .826 | .951 | .860 |
| NPV | .984 | .803 | .905 |
| D Correct | .988 | .708 | .923 |
| T Correct | .779 | .970 | .830 |

Appendix E-3

Backster You-Phase

| Study | Nelson, Handler, Adams & Backster (In press) | Nelson (In press) |
|---|---|---|
| Sample N | 22 | 100 |
| N Deceptive | 11 | 50 |
| N Truthful | 11 | 50 |
| Scorers | 7 | 1 |
| D Scores | 77 | 50 |
| T Scores | 77 | 50 |
| Total Scores | 154 | 100 |
| Mean D | -19.649 | -12.460 |
| StDev D | 6.482 | 8.353 |
| Mean T | 3.612 | 6.820 |
| StDev T | 10.010 | 10.572 |
| Reliability Kappa | - | - |
| Reliability Agreement | - | - |
| Reliability Correlation | .567 | - |
| Unweighted Average Accuracy | .825 | .927 |
| Unweighted Inconclusives | .117 | .321 |
| Sensitivity | .948 | .668 |
| Specificity | .532 | .592 |
| FN Errors | .001 | .019 |
| FP Errors | .286 | .079 |
| D-INC | .052 | .313 |
| T-INC | .182 | .329 |
| PPV | .768 | .894 |
| NPV | .999 | .969 |
| D Correct | .999 | .972 |
| T Correct | .650 | .882 |

Appendix E-4

Concealed Information Test / Guilty Knowledge Test

as reported by MacLaren (2001)

MacLaren, V. V. (2001). A quantitative review of the guilty knowledge test. Journal of Applied Psychology, 86, 674-683.

Results are reported with all informed participants, and with only those informed participants who also engaged in the behavioral acts.

| Mean (St. Er.) {95% CI} | Informed/guilty and uninformed participants | All informed and uninformed participants |
|---|---|---|
| Number of studies | 39 | 50 |
| N Deceptive | 666 | 843 |
| N Truthful | 404 | 404 |
| Total N | 1070 | 1243 |
| Unweighted Accuracy | .823 (.011) {.801 to .846} | .795 (.043) {.711 to .880} |
| Unweighted Inconclusives | - | - |
| Sensitivity | .815 (.014) {.789 to .842} | .759 (.053) {.655 to .864} |
| Specificity | .832 (.019) {.795 to .868} | .832 (.068) {.698 to .965} |
| FN Errors | .185 (.014) {.158 to .211} | .241 (.053) {.136 to .345} |
| FP Errors | .168 (.019) {.132 to .205} | .168 (.068) {.035 to .302} |
| D-INC | - | - |
| T-INC | - | - |
| PPV | .889 (.011) {.868 to .909} | .904 (.041) {.824 to .984} |
| NPV | .732 (.021) {.69 to .774} | .623 (.075) {.477 to .770} |
| D Correct | .815 (.014) {.789 to .842} | .759 (.053) {.655 to .864} |
| T Correct | .832 (.019) {.795 to .868} | .832 (.068) {.698 to .965} |
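The bracketed intervals in this table are consistent with a normal-approximation interval of the mean plus or minus 1.96 standard errors. That reading is an assumption inferred from the reported figures, not a method the report states. A minimal sketch, with the hypothetical helper `normal_ci`, applied to the unweighted accuracy of .823 (.011):

```python
def normal_ci(mean: float, std_err: float, z: float = 1.96) -> tuple[float, float]:
    """Two-sided interval mean +/- z * std_err (z = 1.96 for roughly 95%)."""
    half_width = z * std_err
    return mean - half_width, mean + half_width


# Unweighted accuracy from the first data column: .823 with standard error .011.
lower, upper = normal_ci(0.823, 0.011)
```

Because the published mean and standard error are already rounded to three decimals, bounds recomputed this way can differ from the printed ones by about .001.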

Appendix E-5

Directed Lie Screening Test (TES) / Seven-position TDA

| Study | Research Division Staff (1995a) | Research Division Staff (1995b) | Nelson (In press) | Nelson, Handler, Blalock & Hernández (In press) |
|---|---|---|---|---|
| Sample N | 94 | 85 | 100 | 49 |
| N Deceptive | 26 | 30 | 50 | 25 |
| N Truthful | 68 | 55 | 50 | 24 |
| Scorers | 3 | 10 | 1 | 2 |
| D Scores | 26 | 30 | 50 | 50 |
| T Scores | 68 | 55 | 50 | 48 |
| Total Scores | 94 | 85 | 100 | 98 |
| Mean D | - | - | -2.418 | -1.833 |
| StDev D | - | - | 3.818 | 4.099 |
| Mean T | - | - | 2.653 | 3.670 |
| StDev T | - | - | 3.618 | 3.443 |
| Reliability Kappa | .760 | - | - | - |
| Reliability Agreement | .890 | - | - | .722 |
| Reliability Correlation | - | - | - | - |
| Unweighted Average Accuracy | .788 | .880 | .874 | .831 |
| Unweighted Inconclusives | .155 | .009 | .096 | .092 |
| Sensitivity | .654 | .833 | .910 | .583 |
| Specificity | .676 | .909 | .677 | .940 |
| FN Errors | .154 | .167 | .037 | .271 |
| FP Errors | .206 | .073 | .184 | .020 |
| D-INC | .192 | .001 | .053 | .145 |
| T-INC | .118 | .018 | .139 | .039 |
| PPV | .761 | .920 | .832 | .967 |
| NPV | .815 | .845 | .948 | .776 |
| D Correct | .810 | .833 | .961 | .683 |
| T Correct | .767 | .926 | .786 | .979 |

Appendix E-6

Directed Lie Screening Test (TES) / ESS

| Study | Nelson & Handler (In press) | Nelson, Handler & Morgan (In press) | Nelson (In press) | Nelson, Handler, Blalock & Hernández (In press) |
|---|---|---|---|---|
| Sample N | 100 | 49 | 100 | 49 |
| N Deceptive | 50 | 24 | 50 | 25 |
| N Truthful | 50 | 25 | 50 | 24 |
| Scorers | 1 | 1 | 1 | 2 |
| D Scores | 50 | 24 | 50 | 50 |
| T Scores | 50 | 25 | 50 | 48 |
| Total Scores | 100 | 49 | 100 | 98 |
| Mean D | -2.442 | -1.271 | -3.031 | -1.781 |
| StDev D | 3.531 | 3.131 | 5.104 | 3.437 |
| Mean T | 2.086 | 4.660 | 3.265 | 3.636 |
| StDev T | 3.460 | 2.299 | 3.935 | 2.917 |
| Reliability Kappa | - | - | - | - |
| Reliability Agreement | - | .911 | - | .769 |
| Reliability Correlation | - | - | - | - |
| Unweighted Average Accuracy | .831 | .854 | .871 | .859 |
| Unweighted Inconclusives | .104 | .088 | .048 | .123 |
| Sensitivity | .917 | .625 | .935 | .665 |
| Specificity | .587 | .950 | .730 | .839 |
| FN Errors | .036 | .210 | .046 | .207 |
| FP Errors | .253 | .040 | .195 | .040 |
| D-INC | .047 | .165 | .020 | .126 |
| T-INC | .160 | .010 | .075 | .119 |
| PPV | .784 | .940 | .827 | .943 |
| NPV | .942 | .819 | .941 | .802 |
| D Correct | .962 | .749 | .953 | .763 |
| T Correct | .699 | .960 | .789 | .954 |

Appendix E-7

Federal You-Phase / Seven-position TDA

| Study | Nelson (In press) | Nelson, Handler, Blalock & Cushman (In press) |
|---|---|---|
| Sample N | 100 | 22 |
| N Deceptive | 50 | 11 |
| N Truthful | 50 | 11 |
| Scorers | 1 | 10 |
| D Scores | 50 | 110 |
| T Scores | 50 | 110 |
| Total Scores | 100 | 220 |
| Mean D | -6.398 | -7.991 |
| StDev D | 4.914 | 6.733 |
| Mean T | 5.485 | 6.514 |
| StDev T | 5.106 | 6.680 |
| Reliability Kappa | - | - |
| Reliability Agreement | - | .852 |
| Reliability Correlation | - | - |
| Unweighted Average Accuracy | .870 | .885 |
| Unweighted Inconclusives | .301 | .108 |
| Sensitivity | .833 | .844 |
| Specificity | .417 | .730 |
| FN Errors | .010 | .036 |
| FP Errors | .138 | .171 |
| D-INC | .157 | .119 |
| T-INC | .444 | .097 |
| PPV | .858 | .832 |
| NPV | .977 | .953 |
| D Correct | .988 | .959 |
| T Correct | .751 | .810 |

Appendix E-8

Federal You-Phase / ESS

| Study | Nelson (In press) | Nelson, Handler, Blalock & Cushman (In press) |
|---|---|---|
| Sample N | 100 | 22 |
| N Deceptive | 50 | 11 |
| N Truthful | 50 | 11 |
| Scorers | 1 | 10 |
| D Scores | 50 | 110 |
| T Scores | 50 | 110 |
| Total Scores | 100 | 220 |
| Mean D | -6.685 | -8.606 |
| StDev D | 6.881 | 5.842 |
| Mean T | 6.735 | 6.018 |
| StDev T | 6.045 | 7.107 |
| Reliability Kappa | - | - |
| Reliability Agreement | - | 0.9 |
| Reliability Correlation | - | - |
| Unweighted Average Accuracy | .897 | .906 |
| Unweighted Inconclusives | .096 | .235 |
| Sensitivity | .813 | .859 |
| Specificity | .729 | .770 |
| FN Errors | .050 | .027 |
| FP Errors | .126 | .143 |
| D-INC | .090 | .145 |
| T-INC | .102 | .325 |
| PPV | .866 | .857 |
| NPV | .936 | .966 |
| D Correct | .942 | .970 |
| T Correct | .853 | .843 |

Appendix E-9

Federal ZCT / Seven-position TDA

| Study | Blackwell (1998) | Krapohl & Cushman (2006) | Honts, Amato & Gordon (2004) as reported in Granhag (2004) |
|---|---|---|---|
| Sample N | 100 | 100 | 48 |
| N Deceptive | 65 | 50 | 24 |
| N Truthful | 35 | 50 | 24 |
| Scorers | 3 | 10 | 3 |
| D Scores | 195 | 500 | 72 |
| T Scores | 105 | 500 | 72 |
| Total Scores | 300 | 1,000 | 144 |
| Mean D | -10.385 | -6.264 | -8.420 |
| StDev D | 9.510 | 10.863 | 6.837 |
| Mean T | 6.981 | 9.776 | 6.640 |
| StDev T | 7.495 | 8.212 | 9.187 |
| Reliability Kappa | .570 | - | - |
| Reliability Agreement | .800 | - | - |
| Reliability Correlation | - | - | - |
| Unweighted Average Accuracy | .793 | .852 | .958 |
| Unweighted Inconclusives | .159 | .198 | .042 |
| Sensitivity | .923 | .824 | .917 |
| Specificity | .448 | .556 | .917 |
| FN Errors | .015 | .044 | .000 |
| FP Errors | .295 | .180 | .083 |
| D-INC | .062 | .132 | .083 |
| T-INC | .257 | .264 | .001 |
| PPV | .758 | .821 | .917 |
| NPV | .967 | .927 | .999 |
| D Correct | .984 | .949 | .999 |
| T Correct | .603 | .755 | .917 |

Appendix E-10

Federal ZCT / Seven-position TDA with Evidentiary Rules

| Study | Krapohl & Cushman (2006) | Nelson & Krapohl (2011) |
|---|---|---|
| Sample N | 100 | 60 |
| N Deceptive | 50 | 30 |
| N Truthful | 50 | 30 |
| Scorers | 10 | 6 |
| D Scores | 500 | 30 |
| T Scores | 500 | 30 |
| Total Scores | 1,000 | 60 |
| Mean D | -6.264 | -9.600 |
| StDev D | 10.863 | 7.356 |
| Mean T | 9.776 | 6.926 |
| StDev T | 8.212 | 10.709 |
| Reliability Kappa | - | - |
| Reliability Agreement | 0.870 | - |
| Reliability Correlation | - | - |
| Unweighted Average Accuracy | .872 | .870 |
| Unweighted Inconclusives | .073 | .100 |
| Sensitivity | .792 | .933 |
| Specificity | .824 | .667 |
| FN Errors | .122 | .001 |
| FP Errors | .116 | .233 |
| D-INC | .086 | .067 |
| T-INC | .060 | .133 |
| PPV | .872 | .800 |
| NPV | .871 | .999 |
| D Correct | .999 | .999 |
| T Correct | 0.88 | .741 |

Appendix E-11

Integrated Zone Comparison Technique / Horizontal Scoring System

| Study | Gordon, Mohamed, Faro, Platek, Ahmad & Williams (2005) | Shurani & Chaves (2010) | Shurani (2011) |
|---|---|---|---|
| Sample N | 11 | 84 | 84 |
| N Deceptive | 6 | 44 | 36 |
| N Truthful | 5 | 40 | 48 |
| Scorers | 1 | 4 | 1 |
| D Scores | 6 | 44 | 36 |
| T Scores | 5 | 40 | 48 |
| Total Scores | 11 | 84 | 84 |
| Mean D | -36.000 | -8.847 | -19.667 |
| StDev D | 12.946 | 15.264 | 9.607 |
| Mean T | 8.750 | 21.181 | 28.948 |
| StDev T | 1.635 | 5.097 | 5.963 |
| Reliability Kappa | - | - | - |
| Reliability Agreement | - | - | - |
| Reliability Correlation | - | - | - |
| Unweighted Average Accuracy | .999 | .988 | .999 |
| Unweighted Inconclusives | .100 | .061 | .001 |
| Sensitivity | .999 | .955 | .999 |
| Specificity | .800 | .900 | .999 |
| FN Errors | .001 | .023 | .001 |
| FP Errors | .001 | .001 | .001 |
| D-INC | .001 | .023 | .001 |
| T-INC | .200 | .100 | .001 |
| PPV | .999 | .999 | .999 |
| NPV | .999 | .975 | .999 |
| D Correct | .999 | .977 | .999 |
| T Correct | .999 | .999 | .999 |

Appendix E-12

Matte Quadri-Track Zone Comparison Technique

| Study | Matte & Reuss (1989) | Shurani, Stein & Brand (2009) | Mangan, Armitage & Adams (2008) |
|---|---|---|---|
| Sample N | 122 | 57 | 140 |
| N Deceptive | 64 | 28 | 91 |
| N Truthful | 58 | 29 | 49 |
| Scorers | 2 | 4 | 1 |
| D Scores | 64 | 28 | 91 |
| T Scores | 58 | 29 | 49 |
| Total Scores | 122 | 57 | 140 |
| Mean D | -9.148 | -6.949* | -10.037* |
| StDev D | 2.843 | 1.630* | 2.995* |
| Mean T | 3.099 | 5.388* | 7.190* |
| StDev T | 6.002 | 1.246* | 3.189* |
| Reliability Kappa | - | - | - |
| Reliability Agreement | - | - | - |
| Reliability Correlation | .990 | - | - |
| Unweighted Average Accuracy | .999 | .964 | .999 |
| Unweighted Inconclusives | .059 | .001 | .011 |
| Sensitivity | .969 | .929 | .978 |
| Specificity | .914 | .999 | .999 |
| FN Errors | .001 | .071 | .001 |
| FP Errors | .001 | .001 | .001 |
| D-INC | .031 | .001 | .022 |
| T-INC | .086 | .001 | .001 |
| PPV | .999 | .999 | .999 |
| NPV | .999 | .933 | .999 |
| D Correct | .999 | .929 | .999 |
| T Correct | .999 | .999 | .999 |

Appendix E-13

Utah PLC / Utah Numerical Scoring

| Study | Kircher & Raskin (1988) | Honts, Raskin & Kircher (1987) |
|---|---|---|
| Sample N | 100 | 20 |
| N Deceptive | 50 | 10 |
| N Truthful | 50 | 10 |
| Scorers | 1 | 1 |
| D Scores | 50 | 10 |
| T Scores | 50 | 10 |
| Total Scores | 100 | 20 |
| Mean D | -7.710 | -14.000 |
| StDev D | 8.420 | 7.490 |
| Mean T | 10.785 | 9.600 |
| StDev T | 8.671 | 7.490 |
| Reliability Kappa | .730 | .730 |
| Reliability Agreement | .990 | .960 |
| Reliability Correlation | .970 | - |
| Unweighted Average Accuracy | .935 | .889 |
| Unweighted Inconclusives | .070 | .150 |
| Sensitivity | .880 | .800 |
| Specificity | .860 | .700 |
| FN Errors | .060 | .000 |
| FP Errors | .060 | .200 |
| D-INC | .060 | .200 |
| T-INC | .080 | .100 |
| PPV | .936 | .800 |
| NPV | .935 | .999 |
| D Correct | .936 | .999 |
| T Correct | .935 | .778 |

Appendix E-14

Utah DLC / Utah Numerical Scoring

| Study | Honts & Raskin (1988) | Horowitz, Kircher, Honts & Raskin (1997) |
|---|---|---|
| Sample N | 25 | 30 |
| N Deceptive | 12 | 15 |
| N Truthful | 13 | 15 |
| Scorers | 1 | 1 |
| D Scores | 12 | 15 |
| T Scores | 13 | 15 |
| Total Scores | 25 | 30 |
| Mean D | -11.500 | -7.000 |
| StDev D | 5.803 | 13.500 |
| Mean T | 9.000 | 8.500 |
| StDev T | 5.803 | 11.500 |
| Reliability Kappa | - | - |
| Reliability Agreement | - | - |
| Reliability Correlation | .940 | .920 |
| Unweighted Average Accuracy | .958 | .856 |
| Unweighted Inconclusives | .077 | .067 |
| Sensitivity | .917 | .733 |
| Specificity | .846 | .867 |
| FN Errors | .083 | .133 |
| FP Errors | .001 | .133 |
| D-INC | .001 | .133 |
| T-INC | .154 | .001 |
| PPV | .999 | .846 |
| NPV | .910 | .867 |
| D Correct | .917 | .846 |
| T Correct | .999 | .867 |

Appendix E-15

Utah RCMP Zone / Utah Numerical Scoring

| Study | Honts, Hodes & Raskin (1985) | Honts (1996) | Driscoll, Honts & Jones (1987) |
|---|---|---|---|
| Sample N | 38 | 32 | 40 |
| N Deceptive | 19 | 21 | 20 |
| N Truthful | 19 | 11 | 20 |
| Scorers | 1 | 1 | 1 |
| D Scores | 19 | 21 | 20 |
| T Scores | 19 | 11 | 20 |
| Total Scores | 38 | 32 | 40 |
| Mean D | -11.950 | -15.000 | -10.700 |
| StDev D | 6.520 | 5.564 | 6.000 |
| Mean T | 9.000 | 8.170 | 10.350 |
| StDev T | 10.660 | 5.270 | 6.470 |
| Reliability Kappa | .480 | - | - |
| Reliability Agreement | .950 | .930 | - |
| Reliability Correlation | .880 | .910 | .860 |
| Unweighted Average Accuracy | .833 | .969 | .999 |
| Unweighted Inconclusives | .237 | .210 | .100 |
| Sensitivity | .895 | .714 | .900 |
| Specificity | .421 | .818 | .900 |
| FN Errors | .001 | .048 | .001 |
| FP Errors | .211 | .001 | .001 |
| D-INC | .105 | .238 | .100 |
| T-INC | .368 | .182 | .100 |
| PPV | .810 | .999 | .999 |
| NPV | .999 | .945 | .999 |
| D Correct | .999 | .938 | .999 |
| T Correct | .667 | .999 | .999 |

Appendix E-16

Utah PLT DLC Combined / Utah Numerical Scoring

| Technique | Utah PLC | Utah DLC | RCMP |
|---|---|---|---|
| Sample N | 120 | 55 | 110 |
| N Deceptive | 60 | 27 | 60 |
| N Truthful | 60 | 28 | 50 |
| Scorers | 2 | 2 | 3 |
| D Scores | 60 | 27 | 60 |
| T Scores | 60 | 28 | 50 |
| Total Scores | 120 | 55 | 110 |
| Mean D | -10.855 | -9.250 | -12.550 |
| StDev D | 7.955 | 9.652 | 6.028 |
| Mean T | 10.193 | 8.750 | 9.173 |
| StDev T | 8.081 | 8.652 | 7.467 |
| Reliability Kappa | .730 | - | .480 |
| Reliability Agreement | .975 | - | .940 |
| Reliability Correlation | .970 | .930 | .883 |
| Unweighted Average Accuracy | .927 | .902 | .939 |
| Unweighted Inconclusives | .083 | .073 | .185 |
| Sensitivity | .867 | .815 | .833 |
| Specificity | .833 | .857 | .700 |
| FN Errors | .050 | .111 | .017 |
| FP Errors | .083 | .071 | .080 |
| D-INC | .083 | .074 | .150 |
| T-INC | .083 | .071 | .220 |
| PPV | .912 | .919 | .912 |
| NPV | .943 | .885 | .977 |
| D Correct | .945 | .880 | .980 |
| T Correct | .909 | .923 | .897 |
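The combined Utah PLC column appears to pool the two studies reported in Appendix E-13. The sketch below reproduces three of its entries under two inferred assumptions: decision rates are pooled by weighting each study by its number of cases, while score means are simple averages of the study means. Neither rule is stated in the report, and the helper `pooled_rate` is hypothetical.

```python
from statistics import mean

# Per-study figures copied from Appendix E-13 (Utah PLC).
plc_studies = [
    {"study": "Kircher & Raskin (1988)", "n_d": 50, "n_t": 50,
     "sens": 0.880, "spec": 0.860, "mean_d": -7.710},
    {"study": "Honts, Raskin & Kircher (1987)", "n_d": 10, "n_t": 10,
     "sens": 0.800, "spec": 0.700, "mean_d": -14.000},
]


def pooled_rate(studies, rate_key, n_key):
    """Case-weighted average of a rate across studies (assumed pooling rule)."""
    total_cases = sum(s[n_key] for s in studies)
    return sum(s[rate_key] * s[n_key] for s in studies) / total_cases


combined_sens = pooled_rate(plc_studies, "sens", "n_d")    # weighted by deceptive cases
combined_spec = pooled_rate(plc_studies, "spec", "n_t")    # weighted by truthful cases
combined_mean_d = mean(s["mean_d"] for s in plc_studies)   # unweighted mean of study means
```

These values round to the .867 sensitivity, .833 specificity and -10.855 Mean D shown for Utah PLC; the same two rules also reproduce the Utah DLC sensitivity and Mean D from Appendix E-14.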

Appendix E-17

Zone Comparison Techniques / ESS

| Study | Nelson, Krapohl & Handler (2008) | Nelson, Blalock, Cushman & Oelrich (2011) | Nelson & Krapohl (2011) | Nelson et al. (2011) | Blalock, Cushman & Nelson (2009) | Handler, Nelson, Goodson & Hicks (2010) |
|---|---|---|---|---|---|---|
| Sample N | 100 | 10 | 60 | 562 | 100 | 100 |
| N Deceptive | 50 | 6 | 30 | 298 | 50 | 50 |
| N Truthful | 50 | 4 | 30 | 264 | 50 | 50 |
| Scorers | 7 | 25 | 6 | 74 | 9 | 19 |
| D Scores | 350 | 150 | 30 | 741 | 450 | 950 |
| T Scores | 350 | 100 | 30 | 641 | 450 | 950 |
| Total Scores | 700 | 250 | 60 | 1,382 | 900 | 1,900 |
| Mean D | -9.634 | -10.740 | -11.833 | -11.354 | -11.253 | -7.953 |
| StDev D | 8.475 | 8.263 | 7.764 | 9.392 | 9.786 | 10.017 |
| Mean T | 8.849 | 8.690 | 6.000 | 7.373 | 7.191 | 11.212 |
| StDev T | 7.457 | 4.585 | 9.592 | 9.270 | 8.785 | 8.619 |
| Reliability Kappa | .610 | - | - | - | .560 | .590 |
| Reliability Agreement | - | .950 | - | - | - | - |
| Reliability Correlation | - | - | - | - | - | .840 |
| Unweighted Average Accuracy | .872 | .958 | .913 | .883 | .870 | .867 |
| Unweighted Inconclusives | .103 | .102 | .200 | .114 | .138 | .115 |
| Sensitivity | .749 | .793 | .833 | .804 | .773 | .666 |
| Specificity | .814 | .930 | .633 | .761 | .727 | .873 |
| FN Errors | .154 | .073 | .001 | .090 | .122 | .196 |
| FP Errors | .077 | .001 | .133 | .117 | .102 | .035 |
| D-INC | .097 | .133 | .167 | .105 | .104 | .138 |
| T-INC | .109 | .070 | .233 | .122 | .171 | .092 |
| PPV | .907 | .999 | .862 | .873 | .883 | .950 |
| NPV | .841 | .927 | .999 | .894 | .856 | .817 |
| D Correct | .829 | .916 | .999 | .899 | .864 | .773 |
| T Correct | .914 | .999 | .826 | .867 | .877 | .961 |

Appendix F

PDD Techniques for which no published studies could be located or for which no published studies could be included in the meta-analysis.

Backster Exploratory
Backster SKY
Backster ZCT
IZCT Screening
Law Enforcement Pre-Employment Test (LEPET)*
Marcy Technique
Matte Quinque Track
Matte SGK
RCMP B Series
Searching Peak of Tension
Utah MGQT*

* Although there is no published research to describe these techniques, the LEPET and Utah MGQT formats are structurally nearly identical to the AFMGQT. Validation data for the AFMGQT can be generalized to these techniques if they are evaluated with the TDA models described in the published studies.

Appendix G

Studies that could not be included in the meta-analysis

These studies were excluded for a variety of reasons, including the use of instrumentation that does not match that used in the field, the use of test question sequences or test data analysis models that do not match field testing protocols, non-representative study samples, or insufficient statistical data to calculate the criterion accuracy profile or sampling distributions.

Ansley (1989) produced a study on the effectiveness of Relevant-Irrelevant screening exams with student examiners. Crewson (2001) included the results of this study in a survey of diagnostic and screening tests in the polygraph, medicine and psychology professions. However, the study report is not available, and the study could not be included in the meta-analysis.

Barland, Honts and Barger (1989) studied the accuracy of multi-issue screening exams. No reliability data and no sampling means or standard deviations were available. Data are not available for further analysis. Numerical scoring was completed according to older training and field practice protocols from the U.S. Department of Defense and may not reflect current practices. Without the ability to compare these results to the results from other studies, this series of studies could not be included in the meta-analysis.

Bell, Kircher and Bernhardt (2008) compared the Utah PLC and DLC methods, and concluded no significant differences exist. TDA was limited to automated methods, and therefore was not included in the meta-analysis.

Brownlie, Johnson, and Knill (1997) completed a study on the Relevant-Irrelevant technique. The study is described in the report from the NRC (2003) and Crewson (2001). Crewson (2001) included the results of this study in a survey of diagnostic and screening tests in the polygraph, medicine and psychology professions. However, the study report is not available, and the study could not be included in the meta-analysis.

Correa and Adams (1981) reported the results of a study of the RI technique. The testing procedure does not match field practice, and includes the use of a thermistor respiration sensor instead of standard thoracic and abdominal pneumograph sensors. In addition, an EKG was used in place of the cardiograph sensors. Results, as reported, were 100% accuracy. No data could be obtained for review.

Forman and McCauley (1986) reported the results of a study on the positive-control technique. Reported results did not provide mean or standard deviation statistics for the distributions of scores. Additionally, the study employed a quasi-numerical scoring system that does not reflect PCT field practice as described in other studies.

Ganguly, Lahri and Bhaseen (1986) reported the results of a study based on the Reid Technique. However, testing procedures do not reflect field practices in that only the pneumograph and cardiograph data were evaluated during the study.

Ginton, Daie, Elaad and Ben-Shakhar (1982) reported the results of an interesting field study involving a unique modification of the MGQT technique, using an overall truth question at position 3. This study was not included because this question sequence does not reflect field practices.

Gordon, Fleisher, Morsie, Habib, and Salah (2000) reported the results of a field study of the IZCT, including 309 reported confirmed cases and 1 error. No reliability data was included in the publication, nor any statistical description of the sampling distributions of deceptive and truthful scores. The authors advised that the data belong to the intelligence service of a foreign government, and the primary author informed the ad hoc committee (personal communication June 10, 2011) that he completed the study report without ever seeing the data.

Honts and Amato (1999) reported the results of an automated presentation of the Relevant-Irrelevant polygraph technique. This procedure does not reflect field practices, and the information could therefore not be included in the meta-analysis.

Honts and Hodes (1983) reported the same results as experiment 1 in the Honts, Hodes and Raskin (1985) study, which did not include a standard cardiovascular arm cuff.

Honts, Hodes and Raskin (1985) experiment 1 involved the Backster You-Phase technique, but did not include a standard cardio cuff and therefore could not be included in the meta-analysis.

Honts and Reavy (2009) used the Federal ZCT in a large-scale laboratory experiment. However, this study was designed to evaluate and compare the effectiveness of PLCs and DLCs. Data as reported could not be used to calculate a dimensional profile of generalizable criterion accuracy estimates.

Horowitz, Kircher, Honts and Raskin (1997) included the results of a study of the RI techniques. RI examination results could not be included because the scoring protocol did not reflect field practices (global analysis and subjective evaluation of consistent and significant responses), and involved the use of seven-position numerical scoring procedures in which the RQ responses were compared with the responses to neutral questions.

Horvath (1988) reported the results of a study on the Reid technique, involving two blind evaluators who used a 7-position scoring model described as similar to the one used by Barland and Raskin (1975) with +/- 5 decision thresholds. This is not the Reid scoring method, which has been described as a three position TDA model that does not employ fixed cut scores. They also included a fifth RQ which is not consistent with current field practices.
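A fixed-threshold decision rule of the kind described for this study can be sketched as follows. The sign convention (negative totals indicate deception) matches the negative Mean D and positive Mean T values reported in the appendices. Treating a score exactly at the threshold as a decision, and the function name `classify_total`, are assumptions made for illustration; this is not the Reid scoring method, which the text describes as having no fixed cut scores.

```python
def classify_total(total_score: int, cut: int = 5) -> str:
    """Classify a grand total score with symmetric +/- cut thresholds.

    Returns "DI" (deception indicated), "NDI" (no deception indicated)
    or "INC" (inconclusive). Boundary scores count as decisions, which
    is an assumption, not something the source specifies.
    """
    if total_score <= -cut:
        return "DI"
    if total_score >= cut:
        return "NDI"
    return "INC"


# Hypothetical totals scored against the +/- 5 thresholds.
calls = [classify_total(score) for score in (-9, -5, -2, 0, 4, 7)]
```

Raising `cut` to 6 widens the inconclusive band, so a total of -5 that was a DI call under the +/- 5 rule becomes inconclusive.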

Horvath and Palmatier (2008) reported the results of an MGQT format structured like the Reid technique. The study involved two blind evaluators who used a 7-position scoring model described as similar to the one used by Barland and Raskin (1975) with +/- 6 decision thresholds. This is not the Reid scoring method, which has been described as a three position TDA model that did not have cut scores. They also included a fifth RQ which is not consistent with field practices. Additionally, there was only one evaluator so reliability statistics could not be calculated.

Horvath and Reid (1971) reported the results of a study on the Reid technique. The study could not be included in the meta-analysis for several reasons, including a highly selective sample and the lack of a clearly structured decision model. Of the original 75 polygraph exams, 40 were chosen by the author for inclusion in the study sample, and 35 exams were removed from the sample because the author felt they were too easy to score. The act of selecting out exams from individual case files which they felt were not appropriate for the study brings into question whether the resulting sample would be representative of real world cases. The examiners were precluded from making an inconclusive call, and were required to make a DI or NDI call for every case. Some cases had five RQs, which is inconsistent with field testing techniques currently used. No mean and standard deviation scores were provided and no data could be provided to calculate them. Additionally, no interrater reliability statistics were reported.

Hunter and Ash (1973) reported the results of a study of the Reid technique. The Reid technique does not employ a structured decision model or fixed cutscores, but relies on impressionistic decisions from the examiner. The report does not include any inter-rater reliability information and does not include any sampling distribution data that can be used to compare the sampling distributions. Additionally, the study sample was constructed using a highly selective process involving verified cases conducted by the primary author. Without data on reliability and without sampling distribution parameters, or access to the examination data, the sample is of unverifiable representativeness and generalizability.

Jayne (1989) reported the results of a field study on the predictive value of polygraph screening tests. The study design was intended to compare screening polygraph accuracy with that of other preemployment screening methods. In addition to highly selective and non-random case selection requirements, the criterion status of the sample screening cases was determined as a function of the results of a subsequent diagnostic polygraph regarding an employee theft investigation that was independent of the screening polygraph. Although an innovative attempt to study future outcomes related to polygraph screening, the results of this study are not suitable for use as a criterion accuracy study.

Jayne (1990) reported the result of a study of the Reid technique. Although the results were previously described by Krapohl (2006), this study could not be included in the present meta-analysis because two different scoring approaches were used, neither of which reflect field practices. One scoring method involved making precise linear measurements of the test data. The other, more common, numerical scoring method was completed while excluding the fourth RQ, which was described as a secondary RQ that could also be scored as a CQ. A test for which the nature and purpose of the stimulus is decided post-hoc lacks scientific rigor. Current field practices do not endorse the exclusion of individual RQs or the use of a secondary RQ as a CQ. In addition, no decision rules or numerical cutscores were used in this study, and decisions were made according to the subjective opinion of the scorer.

Krapohl (2005) reported the results of seven-position evidentiary scoring of Federal ZCT exams. Study data included cases from four different archival samples for which the sampling distributions could not be effectively compared to the sampling distributions from other studies.

Krapohl (2010) showed that seven-position scores of You-Phase exams could be transformed to ESS scores. Sample data was a highly selective and non-representative sample of Backster You-Phase exams, reported by Meiron, Krapohl & Ashkenazi (2008).

Matte (1990) is an abstract of a dissertation study completed at an institution that was approved by the State of California but not accredited by an institution recognized by the U.S. Government or foreign equivalent. UMI later became part of ProQuest, which subsequently adopted a policy that only those dissertations from regionally accredited universities would be listed and available to the public. The dissertation is maintained by ProQuest with Matte as the author and ProQuest as the publisher. Data from this dissertation study was included in this meta-analysis because the data was previously published in the journal Polygraph (Matte & Reuss, 1989).

Matte (2010) reported the results of a study of the Backster Either-Or rule. However, no truthful cases were included in the sample data. This study cannot be construed as a criterion accuracy study, and could not be included in the meta-analysis.

Meiron, Krapohl and Ashkenazi (2008) reported the results of a study of the Backster Either-Or rule. Data, as reported at the 2008 APA annual conference in Indianapolis, included a highly selective and non-random sample from which the results of problematic examinations were not included. The result of this form of highly selective sampling is that the sample data are systematically devoid of error variance.

Patrick and Iacono (1991) reported the results of a study of the Federal ZCT while reviewing the CQs between charts. Instrumentation recorded both skin conductance and heart rate, which were scored against pre-stimulus levels in a manner that does not reflect field practices.

Patrick and Iacono (1989) in a replication of an earlier study by Raskin and Hare (1978) reported the results of a study involving a technique that resembles the Federal ZCT, using the Utah scoring system and a grand total decision rule. This study is interesting but does not resemble field practices closely enough to be included in the present meta-analysis of criterion accuracy.

Podlesny and Raskin (1978) recorded eight different physiological channels using a test question sequence that resembles the Federal ZCT with a guilt-complex question instead of the symptomatic question at position 8. Numerical scoring was based on the method described by Barland and Raskin (1975) and Raskin and Hare (1978) which is an early version of the Utah numerical scoring system. Guilt complex questions and differences between exclusive and non-exclusive CQs were studied. In addition, the CQT was compared to the GKT. The absence of standard deviations prevents the comparison of the sampling distribution of numerical scores with those from other studies. This study was designed to evaluate a number of important questions, but could not fit into a clear category of test question sequence and TDA model for which comparable replication studies could be located. As a result, this study could not be included in the meta-analysis of criterion accuracy of PDD techniques presently used in field settings.

Raskin and Hare (1978) reported the results of an important study using examinees who were considered to be criminal psychopaths. Results from this study could not be included in the meta-analysis of criterion accuracy of field PDD methods because of the use of non-standard testing instrumentation that did not include a standard blood-pressure cardio-activity cuff sensor.

Research Division Staff (2001) described a study on the development of a laboratory scenario used to manipulate research subjects. This study was not designed to be a criterion accuracy study.

Rovner (1986) in an interesting study of polygraph countermeasures did not report decisions, errors and inconclusive results separately for truthful and deceptive cases. Data are unavailable for further analysis. As a result, the complete dimensional profile of criterion accuracy could not be calculated and the study could not be included as a criterion accuracy study.

Senter and Dollins (2002) published an interesting and informative study of decision rules that is not suitable for use as a criterion study.

Senter (2003) published an interesting and informative study of decision rules that is not suitable for use as a criterion study.

Senter and Dollins (2004) published an interesting and informative study of decision rules that is not suitable for use as a criterion study.

Senter and Dollins (2008) published an interesting and informative study of decision rules that is not suitable for use as a criterion study.

Slowik and Buckley (1975) reported the results of a study of the Reid technique. The study sample was selected from verified cases but does not describe the verification process. The Reid technique does not employ a structured decision model or fixed cutscores, but relies on impressionistic decisions from the examiner. The report does not include any inter-rater reliability information and does not include any sampling distribution parameters that can be used to compare the sampling distributions. Without sampling distribution statistics, or access to the examination data, the sample is of unverifiable representativeness and unknown generalizability.

Van Herk (1990) reported the results of a pilot study and what may be the first publication on three-position TDA. One-third of the cases were unconfirmed, and consequently, this study could not be included in the meta-analysis.

Wicklander and Hunter (1975) reported the results of a study of the Reid technique. The study sample was selected from verified cases but does not describe the verification process. The Reid technique does not employ a structured decision model or fixed cutscores, but relies on impressionistic decisions from the examiner. The report does not include any inter-rater reliability information and does not include any sampling distribution data that can be used to compare the sampling distributions. Without sampling distribution statistics, or access to the examination data, the sample is of unverifiable representativeness and unknown generalizability.

Appendix H

Techniques for which there exists only one study that met the qualitative and quantitative criteria for inclusion in the meta-analysis.

The APA Standards of Practice require a minimum of two published studies of a given technique.

Arther Technique

Horvath, F.S. (1977) reported the results of a field study using the Arther modification of the Reid technique.

Reid Technique

Horvath (1988) reported the results of a laboratory study using the Reid technique.

These studies were previously combined by Krapohl (2006). However, subsequent review suggests that the Reid and Arther techniques are sufficiently dissimilar that the results of these studies cannot be regarded as replicating each other.

The Reid Technique differs from other CQT formats in some important ways. It sometimes employs a fifth RQ while all other PDD techniques are limited to four RQs. Also, the Reid technique does not employ fixed numerical cutscores or a structured decision model. Instead, decisions are made impressionistically by the examiner, using information from the test data, pretest interview, behavioral observations, and case file information. Inclusion of clinical impressions in the decision process severely restricts the ability to validate it in the way that has been done with the other techniques. Although the Reid technique can be credited as the source of many important innovations, and is indeed the wellspring from which all other CQTs have evolved, it is currently not being taught at any polygraph schools accredited by the APA. The number of practitioners using the technique has decreased substantially since the closure of the Reid Polygraph School. The majority of the other techniques in the meta-analysis are currently taught and/or practiced on a considerable scale. A review of the research in support of the technique resulted in an inability to satisfy the requirements of the study selection criteria, including interrater reliability statistics and normative parameters with which to calculate the generalizability of sample data. Results described in the published literature, and data available to the committee, do not permit the statistical treatments applied to all of the other methods. Despite these limitations, the average accuracy level of studies on the Reid technique was not significantly different from the results of this meta-analysis.

Appendix I-1

AFMGQT / Three-position TDA

Two usable studies on the AFMGQT with three-position TDA produced a combined unweighted average accuracy level of .816 (.059), with an inconclusive rate of .443 (.044). The weighted average of correct decisions for the AFMGQT three-position model was .989 (.016) for criterion deceptive cases and .643 (.116) for criterion truthful cases. The weighted average inconclusive rates were .237 (.060) for criterion deceptive cases and .648 (.067) for criterion truthful cases.

Study                         Nelson & Handler (In press)   Handler & Nelson (In press)
Sample N                      100                           22
N Deceptive                   50                            11
N Truthful                    50                            11
Scorers                                                     3
D Scores                      50                            33
T Scores                      50                            33
Total Scores                  100                           66
Mean D                        -1.886                        -1.903
StDev D                       3.161                         2.986
Mean T                        2.427                         1.424
StDev T                       2.557                         2.722
Reliability Kappa             -                             -
Reliability Agreement         -                             .945
Reliability Correlation       -                             -
Unweighted Accuracy           .869                          .740
Unweighted Inconclusives      .457                          .421
Sensitivity                   .737                          .780
Specificity                   .260                          .180
FN Errors                     .007                          .010
FP Errors                     .088                          .186
D-INC                         .256                          .209
T-INC                         .658                          .633
PPV                           .894                          .807
NPV                           .974                          .947
D Correct                     .991                          .987
T Correct                     .748                          .492
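The derived rows of this table appear to follow from the six basic rates (sensitivity, specificity, FN and FP errors, and the two inconclusive rates). As a minimal sketch, the Python below recomputes them for the Nelson & Handler column. The function name `derived_metrics` is hypothetical, and the formulas are assumptions inferred from the tabled values rather than definitions quoted from the report.

```python
def derived_metrics(sens, spec, fn, fp, d_inc, t_inc):
    """Recompute derived accuracy rows from the six basic rates.

    Assumed relationships (inferred from the tabled values):
      D Correct = sens / (sens + fn)   correct decisions, inconclusives excluded
      T Correct = spec / (spec + fp)
      PPV       = sens / (sens + fp)   treats both criterion groups equally
      NPV       = spec / (spec + fn)
    """
    d_correct = sens / (sens + fn)
    t_correct = spec / (spec + fp)
    return {
        "PPV": sens / (sens + fp),
        "NPV": spec / (spec + fn),
        "D Correct": d_correct,
        "T Correct": t_correct,
        "Unweighted Accuracy": (d_correct + t_correct) / 2,
        "Unweighted Inconclusives": (d_inc + t_inc) / 2,
    }


# Basic rates from the Nelson & Handler (In press) column.
result = derived_metrics(sens=.737, spec=.260, fn=.007, fp=.088,
                         d_inc=.256, t_inc=.658)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the recomputed values agree with the tabled rows to within about .001, which is the size of difference expected when the inputs are already rounded to three decimals.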

Appendix I-2

Army MGQT / Seven-position TDA

Two usable studies on the Army MGQT resulted in an unweighted accuracy level of .694 (.043), with an inconclusive rate of .133 (.038). The weighted average of correct decisions for the Army MGQT, using the seven-position TDA model, was .999 (.050) for criterion deceptive cases and .389 (.085) for criterion truthful cases. The weighted average inconclusive rates were .043 (.034) for criterion deceptive cases and .224 (.065) for criterion truthful cases.

Study                         Krapohl & Norris (2000)       Blackwell (1999)
Sample N                      32                            100
N Deceptive                   16                            80
N Truthful                    16                            20
Scorers                       3                             3
D Scores                      16                            240
T Scores                      16                            60
Total Scores                  32                            300
Mean D                        -                             -
StDev D                       -                             -
Mean T                        -                             -
StDev T                       -                             -
Reliability Kappa             -                             -
Reliability Agreement         -                             -
Reliability Correlation       .750                          .907
Unweighted Accuracy           .833                          .660
Unweighted Inconclusives      .219                          .125
Sensitivity                   .813                          .967
Specificity                   .500                          .250
FN Errors                     .001                          .001
FP Errors                     .250                          .533
D-INC                         .188                          .033
T-INC                         .250                          .217
PPV                           .765                          .644
NPV                           .999                          .999
D Correct                     .999                          .999
T Correct                     .667                          .319

Appendix I-3

Directed-Lie Screening Test / Three-position TDA

Two usable studies on the DLST/TES with three-position TDA produced a combined unweighted average accuracy level of .869 (.037), with an inconclusive rate of .228 (.043). The weighted average of correct decisions for the DLST three-position model was .827 (.060) for criterion deceptive cases and .912 (.043) for criterion truthful cases. The weighted average inconclusive rates were .247 (.063) for criterion deceptive cases and .210 (.060) for criterion truthful cases.

Study                         Nelson (In press)             Nelson, Handler, Blalock & Hernández (In press)
Sample N                      100                           49
N Deceptive                   50                            25
N Truthful                    50                            24
Scorers                       1                             2
D Scores                      50                            50
T Scores                      50                            48
Total Scores                  100                           98
Mean D                        -1.585                        -1.458
StDev D                       2.382                         2.784
Mean T                        1.719                         2.470
StDev T                       2.253                         1.853
Reliability Kappa             -                             -
Reliability Agreement         -                             0.762
Reliability Correlation       -                             -
Unweighted Accuracy           .893                          .817
Unweighted Inconclusives      .208                          .248
Sensitivity                   .829                          .415
Specificity                   .595                          .848
FN Errors                     .033                          .228
FP Errors                     .127                          .009
D-INC                         .138                          .355
T-INC                         .277                          .141
PPV                           .867                          .979
NPV                           .947                          .788
D Correct                     .962                          .645
T Correct                     .824                          .989

Appendix I-4

Federal You-Phase / Three-position TDA

Two usable studies on the Federal You-Phase technique with three-position TDA produced a combined unweighted average accuracy level of .881 (.041), with an inconclusive rate of .282 (.046). The weighted average of correct decisions for the Federal You-Phase three-position model was .977 (.023) for criterion deceptive cases and .786 (.078) for criterion truthful cases. The weighted average inconclusive rates were .181 (.055) for criterion deceptive cases and .384 (.072) for criterion truthful cases.

Study                         Nelson, Handler, Blalock & Cushman (In press)   Nelson (In press)
Sample N                      100                                             22
N Deceptive                   50                                              11
N Truthful                    50                                              11
Scorers                       1                                               10
D Scores                      50                                              110
T Scores                      50                                              110
Total Scores                  100                                             220
Mean D                        -4.720                                          -5.394
StDev D                       3.345                                           4.219
Mean T                        3.901                                           3.982
StDev T                       3.804                                           4.432
Reliability Kappa             -                                               -
Reliability Agreement         -                                               0.877
Reliability Correlation       -                                               -
Unweighted Accuracy           .889                                            .878
Unweighted Inconclusives      .387                                            .235
Sensitivity                   .740                                            .826
Specificity                   .380                                            .530
FN Errors                     .001                                            .027
FP Errors                     .107                                            .143
D-INC                         .260                                            .145
T-INC                         .513                                            .325
PPV                           .874                                            .852
NPV                           .997                                            .952
D Correct                     .999                                            .968
T Correct                     .780                                            .788

Appendix I-5

Federal ZCT / Three-position TDA (+/-6)

Two usable studies on the Federal ZCT with three-position TDA, using traditional cutscores (+/-6 and -3), produced a combined unweighted average accuracy level of .883 (.048), with an inconclusive rate of .318 (.043). The weighted average of correct decisions for the Federal ZCT three-position model with traditional cutscores was .990 (.016) for criterion deceptive cases and .675 (.095) for criterion truthful cases. The weighted average inconclusive rates were .158 (.050) for criterion deceptive cases and .477 (.069) for criterion truthful cases.

Study                         Blackwell (1998)              Capps & Ansley (1992)
Sample N                      100                           100
N Deceptive                   52                            65
N Truthful                    48                            35
Scorers                       1                             3
D Scores                      52                            195
T Scores                      48                            105
Total Scores                  100                           300
Mean D                        -9.640                        -5.938
StDev D                       5.146                         5.503
Mean T                        4.780                         4.867
StDev T                       5.461                         5.580
Reliability Kappa             -                             .360
Reliability Agreement         -                             .660
Reliability Correlation       -                             -
Unweighted Average Accuracy   .977                          .779
Unweighted Inconclusives      .386                          .293
Sensitivity                   .769                          .851
Specificity                   .438                          .314
FN Errors                     .001                          .010
FP Errors                     .021                          .238
D-INC                         .231                          .138
T-INC                         .542                          .448
PPV                           .974                          .781
NPV                           .999                          .968
D Correct                     .999                          .988
T Correct                     .955                          .569

Appendix I-6

Federal ZCT / Three-position TDA (+/-4)

Two usable studies on the Federal ZCT with three-position TDA, using improved cutscores (+/-4 and -3), produced a combined unweighted average accuracy level of .939 (.028), with an inconclusive rate of .269 (.044). The weighted average of correct decisions for the Federal ZCT three-position model with improved cutscores was .932 (.042) for criterion deceptive cases and .946 (.035) for criterion truthful cases. The weighted average inconclusive rates were .314 (.066) for criterion deceptive cases and .225 (.059) for criterion truthful cases.

Study                         Krapohl (1998)                Harwell (2000)
Sample N                      100                           88
N Deceptive                   50                            60
N Truthful                    50                            28
Scorers                       5                             3
D Scores                      250                           180
T Scores                      250                           84
Total Scores                  500                           264
Mean D                        -7.700                        -6.194
StDev D                       5.876                         5.576
Mean T                        6.300                         5.113
StDev T                       5.317                         5.521
Reliability Kappa             -                             -
Reliability Agreement         -                             .990
Reliability Correlation       .900                          -
Unweighted Accuracy           .929                          .937
Unweighted Inconclusives      .264                          .296
Sensitivity                   .604                          .689
Specificity                   .768                          .631
FN Errors                     .068                          .017
FP Errors                     .032                          .071
D-INC                         .328                          .294
T-INC                         .200                          .298
PPV                           .950                          .906
NPV                           .919                          .974
D Correct                     .899                          .976
T Correct                     .960                          .898

Appendix I-7

Positive Control Technique / Seven-position TDA*

Two studies on the Positive Control Technique produced a combined unweighted average accuracy level of .820 (.043), with an inconclusive rate of .292 (.045). The weighted average of correct decisions for the Positive Control Technique was .679 (.080) for criterion deceptive cases and .962 (.032) for criterion truthful cases. The weighted average inconclusive rates were .333 (.066) for criterion deceptive cases and .250 (.063) for criterion truthful cases.

Study                         Driscoll, Honts & Jones (1987)   Forman & McCauley (1986)
Sample N                      40                               38
N Deceptive                   20                               22
N Truthful                    20                               16
Scorers                       1                                1
D Scores                      20                               22
T Scores                      20                               16
Total Scores                  40                               38
Mean D                        -2.000                           -
StDev D                       3.800                            -
Mean T                        6.600                            -
StDev T                       5.700                            -
Reliability Kappa             -                                -
Reliability Agreement         -                                .800
Reliability Correlation       .840                             .800
Unweighted Accuracy           .889                             .777
Unweighted Inconclusives      .450                             .131
Sensitivity                   .350                             .545
Specificity                   .650                             .750
FN Errors                     .100                             .318
FP Errors                     .001                             .063
D-INC                         .550                             .136
T-INC                         .350                             .125
PPV                           .999                             .897
NPV                           .867                             .702
D Correct                     .778                             .632
T Correct                     .999                             .923

* Driscoll, Honts & Jones (1987) used seven-position numerical scoring. Forman & McCauley (1986) used a non-numerical TDA approach; that study is included here only for comparison purposes.

Appendix I-8

Relevant-Irrelevant Technique

One usable published study exists for the Relevant-Irrelevant technique (Krapohl, Senter & Stern, 2005), which resulted in an unweighted accuracy rate of .732 (.044). Krapohl (2006) previously reported the accuracy of the RI technique at .83 with zero inconclusives, while including the results of Correa and Adams (1981), who reported an accuracy level of 100% in a laboratory study that employed methodology that does not reflect field practices.

Study                         Krapohl, Senter & Stern (2005)
Sample N                      100
N Deceptive                   59
N Truthful                    41
Scorers                       1
D Scores                      59
T Scores                      41
Total Scores                  100
Mean D                        -
StDev D                       -
Mean T                        -
StDev T                       -
Reliability Kappa             -
Reliability Agreement         .700
Reliability Correlation       -
Unweighted Accuracy           .732
Unweighted Inconclusives      .001
Sensitivity                   .831
Specificity                   .634
FN Errors                     .169
FP Errors                     .366
D-INC                         .001
T-INC                         .001
PPV                           .694
NPV                           .789
D Correct                     .831
T Correct                     .634

Appendix I-9

Zone Comparison Techniques / Rank Order Scoring System

Two usable studies on the ZCT with Rank Order TDA produced a combined unweighted average accuracy level of .886 (.036), with an inconclusive rate of .213 (.042). The weighted average of correct decisions for the Zone Comparison Technique with Rank Order TDA was .885 (.050) for criterion deceptive cases and .887 (.053) for criterion truthful cases. The weighted average inconclusive rates were .200 (.061) for criterion deceptive cases and .225 (.057) for criterion truthful cases.

Study                         Krapohl, Dutton & Ryan (2001)   Honts & Driscoll (1987)
Sample N                      100                             60
N Deceptive                   50                              30
N Truthful                    50                              30
Scorers                       3                               2
D Scores                      50                              30
T Scores                      50                              30
Total Scores                  100                             60
Mean D                        -24.650                         -20.900
StDev D                       18.970                          8.660
Mean T                        7.400                           13.400
StDev T                       17.060                          8.660
Reliability Kappa             -                               -
Reliability Agreement         -                               -
Reliability Correlation       -                               .930
Unweighted Accuracy           .879                            .906
Unweighted Inconclusives      .150                            .317
Sensitivity                   .720                            .600
Specificity                   .720                            .633
FN Errors                     .120                            .033
FP Errors                     .080                            .100
D-INC                         .100                            .367
T-INC                         .200                            .267
PPV                           .900                            .857
NPV                           .857                            .950
D Correct                     .857                            .947
T Correct                     .900                            .864

Addendum to the 2011 Meta-analytic Survey – MSU-MGQT

Raymond Nelson

This variant of the Modified General Question Technique was first described by Horvath (1988) and was replicated by Horvath and Palmatier (2008). Both studies were designed to investigate effects related to types of comparison questions. Both studies were conducted at Michigan State University, and the name “Michigan State University Modified General Question Technique” (MSU-MGQT) is used herein. Operation and execution of the MSU-MGQT are, in some ways, similar to those of other MGQT formats, and the details can be found in the previously cited studies.

Horvath (1988) described a laboratory study involving a test question sequence similar to that described by Reid (1947), though including a fifth relevant question that was used to describe the guilty status of the participant. Data were evaluated by two blind evaluators who used a 7-position scoring model described as similar to the one used by Barland and Raskin (1975), with grand total decision thresholds set at +/-5. Unweighted decision accuracy of blind numerical scores was .871, with an inconclusive rate of .025.

Figure 1. Mean and standard deviations for the scores from truthful and deceptive samples in the two MSU-MGQT studies.

Horvath and Palmatier (2008) reported the results of another study of comparison question types, also involving the MSU-MGQT format and also including a fifth relevant question that was used to describe the guilty status of the participant. Scoring tasks were completed using a 7-position scoring model described as similar to the one used by Barland and Raskin (1975), with grand total decision thresholds set at +/-6. Unweighted decision accuracy of blind numerical scores was .882, with an inconclusive rate of .167.
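A grand total decision rule with symmetric thresholds, as used in both studies, can be sketched as follows. This is an illustration only: the function name, the sample scores, and the treatment of a total exactly equal to the threshold as a decision are assumptions, not details taken from either study. The sign convention (negative totals for deceptive cases, positive for truthful cases) follows the score means reported for the deceptive and truthful samples.

```python
def grand_total_decision(scores, cutscore):
    """Classify an exam from the sum of its numerical scores.

    Assumptions: a total at or beyond the threshold counts as a decision;
    negative totals indicate deception, positive totals truthfulness.
    """
    total = sum(scores)
    if total <= -cutscore:
        return "deceptive"
    if total >= cutscore:
        return "truthful"
    return "inconclusive"


# Hypothetical seven-position scores (each between -3 and +3); total is -5.
exam = [-2, -1, 0, -1, -1]
at_five = grand_total_decision(exam, cutscore=5)
at_six = grand_total_decision(exam, cutscore=6)
print(at_five)  # deceptive
print(at_six)   # inconclusive
```

The same total produces a decision at +/-5 but not at +/-6, which illustrates how a wider threshold can move borderline exams into the inconclusive category.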

Figure 1 shows a mean and standard deviation plot of the scores of the sampling distributions of the included MSU-MGQT studies. A two-way ANOVA showed that the interaction of sampling distribution and criterion status was significant [F (1,92) = 8.7, (p = .004)]. The source of the interaction is at least partially attributable to the fact that scores for both deceptive and truthful participants were substantially stronger for the Horvath (1988) study, and both were closer to zero in the Horvath and Palmatier (2008) study. One-way ANOVAs showed no significant differences in the scores of the two studies for either the deceptive samples [F (1,46) = 0.231, (p = .633)] or truthful samples [F (1,46) = 0.147, (p = .703)].

Table 1 shows the summary of the two studies combined. Table 2 shows the profile and statistical confidence intervals for the criterion accuracy metrics. Table 3 shows a summary of the individual studies.

Table 1. Summary of MSU-MGQT studies.

Number of studies             2
Total N                       50
N Deceptive                   25
N Truthful                    25
Number of Examiners/Scorers   3
Total Scores                  100
D Scores                      50
T Scores                      50
Mean D                        -18.650
StDev D                       -
Mean T                        15.300
StDev T                       -
Reliability Kappa             -
Reliability Agreement         0.95
Reliability Correlation       0.92

Table 2. Criterion accuracy and confidence intervals for MSU-MGQT studies.

Metric                        Value           Confidence interval
Unweighted Average Accuracy   .877 (.049)     {.781 to .972}
Unweighted Inconclusives      .110 (.043)     {.025 to .195}
Sensitivity                   .820 (.068)     {.686 to .954}
Specificity                   .740 (.072)     {.599 to .881}
FN Errors                     .120 (.064)     {.001 to .246}
FP Errors                     .100 (.060)     {.001 to .218}
D-INC                         .060 (.047)     {.001 to .153}
T-INC                         .160 (.073)     {.017 to .303}
PPV                           .891 (.065)     {.765 to .999}
NPV                           .860 (.075)     {.713 to .999}
D Correct                     .872 (.068)     {.739 to .999}
T Correct                     .881 (.072)     {.740 to .999}
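The intervals in Table 2 are consistent with a normal-approximation interval around each estimate, truncated to the range .001 to .999. The sketch below assumes that the parenthesized value is a standard error and that a multiplier of 1.96 was used. Neither assumption is stated in the table, and `confidence_interval` is a hypothetical helper.

```python
def confidence_interval(estimate, std_error, z=1.96, lower=0.001, upper=0.999):
    """Normal-approximation interval, clipped to [lower, upper].

    z=1.96 and the clipping bounds are assumptions chosen to match
    the pattern of the tabled intervals.
    """
    half_width = z * std_error
    lo = max(lower, estimate - half_width)
    hi = min(upper, estimate + half_width)
    return round(lo, 3), round(hi, 3)


accuracy_ci = confidence_interval(.877, .049)   # unweighted average accuracy
fn_ci = confidence_interval(.120, .064)         # lower bound is clipped
ppv_ci = confidence_interval(.891, .065)        # upper bound is clipped
print(accuracy_ci, fn_ci, ppv_ci)
```

Under these assumptions the results agree with Table 2 to within about .001, a difference that would be expected if the published intervals were computed from unrounded standard errors.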

Table 3. Summary of individual studies using the MSU-MGQT.

Study                         Horvath (1988)                Horvath & Palmatier (2008)
Sample N                      20                            30
N Deceptive                   10                            15
N Truthful                    10                            15
Scorers                       2                             1
D Scores                      20                            30
T Scores                      20                            30
Total Scores                  40                            60
Mean D                        -24.500                       -12.800
StDev D                       -                             17.200
Mean T                        18.800                        11.800
StDev T                       -                             12.900
Reliability Kappa             -
Reliability Agreement         0.95
Reliability Correlation       0.920
Unweighted Average Accuracy   0.871                         0.882
Unweighted Inconclusives      0.025                         0.167
Sensitivity                   0.900                         0.767
Specificity                   0.800                         0.700
FN Errors                     0.100                         0.133
FP Errors                     0.150                         0.067
D-INC                         0.000                         0.100
T-INC                         0.050                         0.233
PPV                           0.857                         0.920
NPV                           0.889                         0.840
D Correct                     0.900                         0.852
T Correct                     0.842                         0.913

The combined decision accuracy level of the MSU-MGQT studies, weighted for sample size and number of scorers, was .877 with a combined inconclusive rate of .11. Reliability for MSU-MGQT exams, expressed as the concordance rate for categorical decisions, was reported by Horvath (1988) at .95, with a correlation coefficient of .92 for numerical scores of the two evaluators.
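The weighting step can be sketched as follows, assuming that the weight for each study is its Total Scores entry in Table 3 (40 and 60). That choice of weight is an inference from the tabled values, and `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(values, weights):
    """Weighted average of per-study rates."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Total Scores from Table 3: Horvath (1988), Horvath & Palmatier (2008).
weights = [40, 60]

rates = {
    "Sensitivity": [.900, .767],
    "Specificity": [.800, .700],
    "FN Errors":   [.100, .133],
    "FP Errors":   [.150, .067],
    "D-INC":       [.000, .100],
    "T-INC":       [.050, .233],
}
combined = {name: weighted_mean(vals, weights) for name, vals in rates.items()}
for name, value in combined.items():
    print(f"{name}: {value:.3f}")

# Correct-decision rates exclude inconclusive results (assumed definition).
d_correct = combined["Sensitivity"] / (combined["Sensitivity"] + combined["FN Errors"])
t_correct = combined["Specificity"] / (combined["Specificity"] + combined["FP Errors"])
accuracy = (d_correct + t_correct) / 2
print(f"Unweighted average accuracy: {accuracy:.3f}")
```

Under these assumptions the combined rates and the resulting accuracy agree with Table 2 to within about .001.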

References

Barland, G. H. & Raskin, D.C. (1975). Psychopathy and detection of deception in criminal suspects. Psychophysiology, 12, 224.

Horvath, F. S. (1988). The utility of control questions and the effects of two control question types in field polygraph techniques. Journal of Police Science and Administration, 16, 198-209.

Horvath, F. & Palmatier, J. (2008). Effect of two types of control questions and two question formats on the outcomes of polygraph examinations. Journal of Forensic Sciences, 53(4), 1-11.

Reid, J. E. (1947). A revised questioning technique in lie detection tests. Journal of Criminal Law and Criminology, 37, 542-547. Reprinted in Polygraph 11, 17-21.

Psychological Set or Differential Salience: A Proposal for Reconciling Theory and Terminology in Polygraph Testing

Stuart Senter, Dan Weatherman, Donald Krapohl, and Frank Horvath1

Abstract

What is now known as the Comparison Question Technique (CQT) is based on the assumption that truthful persons will be more physiologically responsive to comparison questions than to relevant (incident-related) test questions whereas for deceptive persons the opposite will be true. Years of research have confirmed this expectation. While the term “Psychological Set” has been accepted in the field to refer to this difference in responsiveness, the term has very limited value. It does not accommodate non-CQT procedures and it is neither understood nor applied in the scientific literature as it is by polygraph examiners. In this paper it is proposed that the CQT phenomenon is better described by the concept of “Differential Salience,” a term which has a stronger scientific foundation. Moreover, the concept of differential salience describes what is observed physiologically in common polygraph testing methodologies aside from the CQT.

Introduction

The use of what was termed a “comparative response question,” a question specifically introduced during polygraph testing to provide a stimulus against which to compare and evaluate the significance of the physiological reactions to relevant questions, was introduced by John E. Reid in 1947. From that time such a “control question” became commonplace in the field. Today the “control question” is more appropriately referred to as a comparison question. The use of such questions is the basis for Comparison Question Techniques (CQT).

In the early 1960’s Cleve Backster applied the term “Psychological Set” to polygraph testing in an attempt to explain the functioning of the CQT. That concept, along with two others advanced by Backster, “super-dampening” and “anti-climax dampening,” joined the lexicon of polygraph examiners from that time forward (Backster, 1960a, 1960b). Backster introduced all of these terms to explain different patterns of physiological responding which he observed during CQT polygraph testing. The latter two terms, super-dampening and anti-climax dampening, were original to Backster. According to Matte and Grove (2001), however, Backster attributed the term “psychological set” to the author of a 1948 psychology textbook (Ruch, 1948). However, in a recent paper Handler (2007) reported that “psychological set” was not mentioned in the Ruch text. When Handler discussed this with Backster, Backster said that he himself had created the term “psychological set” (Handler, personal communication December 15, 2007).

There is some confusion regarding how these three concepts differ. The commonly accepted definition for “psychological set” seems to be what was originally used by Backster to describe his concept of “anti-climax dampening.” For instance, here is what Backster (1960b) wrote about that concept:

“The anticlimax dampening concept is formulated on the well-validated psychological principle that a person’s fears, , and apprehensions are channeled toward the situation which holds greatest immediate threat to his self-preservation or general well-being.” pg 1.

1 The views expressed in this article are solely those of the authors, and do not necessarily represent those of the Department of Defense or the US Government. The principal author can be reached at [email protected].

109 Polygraph, 2010, 39(2) Psychological Set or Differential Salience

Most polygraph examiners today would likely recognize this definition of anticlimax dampening as being the same as the contemporary definition of the term “psychological set.” Indeed, Matte and Grove (2001) advised that this same definition was used in 1965 by DACA’s predecessor organization, the US Army Military Police School (USAMPS), to describe “psychological set.”

For the purpose of this paper we have focused on what is the prevailing field understanding of “psychological set.” Examinees will be more physiologically responsive to stimuli (test questions) that pose the greatest threat to their well-being or interests. If more than one type of threat is presented, reactivity will tend toward that which is perceived as the greatest threat with diminished or no reactivity to the secondary threat. When the two threats include both relevant questions and probable-lie comparison (PLC) questions, “psychological set” is said to explain why liars respond more greatly to the former and truthtellers to the latter.

Central to “psychological set,” and to the concepts “super-dampening” and “anti-climax dampening” is the notion that an examinee’s fears will orient his attention, that the physiological responding will tell where those fears are directed, and that deception and truthfulness can be inferred by the pattern of physiological responding. Though superficially appealing, the theory is insufficient to explain those circumstances where polygraph testing continues to produce accurate results despite the lack of examinees’ fears. For example, previous writers have pointed out that examinees will still react in the predicted way during polygraph testing even when there are no threats against the examinee (Davidson, 1968; Lieblich, Naftali, Shumueli, & Kugelmass, 1974). Examinees will even respond when answering truthfully to a chosen card in a card test (Gustafson & Orne, 1964; Kugelmass, Lieblich, & Bergman, 1967). In these studies there is no threat to the examinee’s self-preservation or general well-being, an essential component of the “psychological set” theory.

Similarly, the directed-lie comparison (DLC) question technique (Honts & Raskin, 1988; Department of Defense Polygraph Institute Research Division Staff, 1998) offers a clear challenge to the jeopardy presumption of “psychological set”. Recall that during DLC testing the polygraph examiner not only gives the examinee permission to lie on the DLCs, but there is an explicit agreement struck that the examinee will lie to the DLCs. Fear of detection is eliminated as an explanation for an examinee’s reactions to DLC questions. The fear-based “psychological set” concept does not support the effectiveness of DLC polygraph testing, nor does it explain the accuracy of polygraph testing in non-threatening situations. The prevailing definition of “psychological set” therefore is inadequate, prompting the current search for a more fitting and scientifically supported concept that can account for such effects.

The concealed information test (CIT), also known as the guilty knowledge test (GKT), represents another situation where psychological set fails to account for all possible responses that occur on the test. The CIT is based on recognition, and not fear of detection. Over the course of the test, the examinee is simply asked to repeat plausible stimuli contained in a particular crime scene. The foundation of the test is on what the examinee knows or recognizes, not the emotional reactions that may emanate from the stimuli, such as fear or perception of threat (Ben-Shakhar & Elaad, 2003). Ultimately, the narrow scope of the psychological set cannot account for the robust effects of the CIT.

One substantial problem with respect to “psychological set” is that no branch of the behavioral sciences, including what is often said to be polygraphy’s parent science, psychophysiology, recognizes how those in the polygraph field apply the term. A search of the EBSCO behavioral science database returned 39 abstracts or article summaries that included the term ‘Psychological Set’ ranging in time from 1956 to 2006. Two of these instances were simply incidental pairings of the words ‘psychological’ and ‘set’ and did not refer to an integrated concept. Without exception, the remaining 37 abstracts used the term psychological set to describe some sort of mindset, expectation, world view, or approach taken in a problem solving situation. The term was never used in the context of channeling or focusing concern on the greatest of a group of potential threats, as the term is used in the polygraph community. This peculiarity gives rise to confusion and provides a basis of derision for polygraph critics (Furedy, 1991). Even as a best case, “psychological set” is viewed as non-scientific jargon. A replacement expression would not only have to have an accepted definition within the larger , but it would also have to deal with those phenomena which “psychological set” fails to address. We propose that the appropriate expression is “differential salience.”

Defining Salience

The adjective “salient” is described as “prominent” or “conspicuous” based on the Random House Unabridged Dictionary (2006). The American Heritage Dictionary of the English Language (2006) characterizes “salient” as “strikingly conspicuous” or “prominent.” The WordNet® 3.0 system produced by Princeton University describes “salient” as “having a quality that thrusts itself into attention.” Finally, the Kernerman English Multilingual Dictionary defines “salient” as “main,” “chief,” “most noticeable.” These sources also define “salience” as “the state or condition of being salient.”

Based on the definitions above, salience indicates that a stimulus is prominent, conspicuous, and/or striking. As the source or cause of salience is unspecified in these definitions, the implication is that a given stimulus can be salient for a wide variety of reasons. A stimulus that is considered to be salient may be threatening, novel, surprising, familiar, complicated, pertinent, or otherwise significant. Furthermore, different stimuli will possess different degrees of salience, in the same sense that different stimuli would be threatening, novel, surprising, etc., to different degrees. It may not always be possible to know why, in any particular case, one item is more salient than another.

Salience in the Scientific Literature

Conducting a search for “salience” or “salient” using the EBSCO PsychINFO database for the behavioral sciences indicates that these terms exist in over 15,000 published articles, book chapters, and dissertations. Salience is a broad reaching term, covering the arenas of memory, attention, speech pragmatics, perception, emotion, cognition, social contexts, and many, many others. There is little to debate regarding the pervasiveness or general scientific acceptance of the term “salience.” More importantly, there is also a large body of literature that argues for the role of salience in lie detection, and more specifically, polygraph applications.

Salience and Lie Detection

Vendemia, Buzan, Green, and Schillaci (2005) and Vendemia, Buzan, and Simon-Dack (2005) described a model of deception that included salience as a key component, among other critical factors. This model is shown in Figure 1. As shown, salience is proposed as a critical component of the physiological measures indicative of deception, among a model that includes memory, emotion, and attentional components.

Handler and Nelson (2007) recently argued for the inclusion of salience in the polygraph lexicon, specifically for the purpose of explaining the PLC question test. However, multiple authors have also included the notion of salience in theories of lie detection. Wolpe, Foster, and Langleben (2005) suggested that from “a neuropsychological perspective, both the CQT and GKT are ‘forced-choice’ protocols that seek to detect differences in psychological salience between questions by examining the physiologic responses of the subject to target and baseline conditions.”

In other work, Offe and Offe (2007) stated that “the basic assumption of the CQT is that RQ will have a higher significance for guilty subjects and CQ for innocent subjects and that these differences in significance will be reflected in the physiological variables recorded as the reaction.” In this work, Offe and Offe (2007) concluded that “criticism cannot be sustained that it would be impossible to systematically achieve a differential significance of relevant and comparison questions and to measure it physiologically.”

111 Polygraph, 2010, 39(2) Psychological Set or Differential Salience

Honts (2004) also described the rationale of the comparison question test as assessing credibility based on the differential reactions caused by two different types of questions. Differential salience between relevant and comparison questions as perceived by the examinee generated these different reactions. Honts used the general description of differences in salience tied to different questions to explain the diagnostic value of both PLC and DLC question tests.

Figure 1. Proposed relationship of the underlying theoretical constructs involved in the process of deception. From Vendemia, Buzan, and Simon-Dack (2005).

Salience in the Context of the Psychophysiological Detection of Deception

Reid (1962) recognized “emotionally weighted” response differences not only between relevant and comparison questions, but also observed degrees of salience across the relevant questions themselves. The simple difference in the degree of salience across stimuli provides the basic theoretical foundation for all polygraph methodologies listed below. It is well accepted, at least given current scientific knowledge, that there is no unique “lie” response. It is only the variability in the salience of different stimuli that permits an inference of “lying” or that an item is recognized as unique amongst other items.

Polygraph, 2010, 39(2) 112 Senter, Weatherman, Krapohl, Horvath

The following passages discuss how differential salience can be used to explain the effectiveness of major categories of polygraph testing approaches.

PLC Question Test

In the PLC question test, examinees are guided to commit that they are ‘not the kind of person’ that would perform a particular transgression, such as lying, stealing, or harming another. Later, examinees are specifically asked whether they have ever done such things. The assumption is that all examinees have done these things, but that the presentation of such questions will be particularly uncomfortable to the truthful participant who has little concern over the relevant questions. In contrast, guilty examinees are expected to have little concern over the comparison questions, engaged instead by the relevant questions.

The different response magnitudes produced by truthful and deceptive examinees during the polygraph data collection process can be attributed to the differential salience that the relevant and comparison questions hold for the two groups of examinees. Comparison questions are more salient (perhaps because they are more threatening, pertinent, or otherwise significant) than relevant questions for truthful examinees. Relevant questions are more salient (again due to increased relative threat or significance) than comparison questions for deceptive examinees. Thus, the PLC question test is effective due to the different levels of threat or pertinence (encompassed by the term ‘salience’) possessed by the different classes of questions, for deceptive and truthful examinees. This coincides with the reasoning presented by Honts (2004) and Wolpe et al. (2005) described earlier.

While the differential salience of the two categories of questions for truthful and deceptive examinees has been demonstrated empirically from inspection of the objective physiological data, it has also been obtained when examinees are debriefed following polygraph examinations (National Research Council, 2003). Horvath (1988) asked examinees in a laboratory study to provide subjective ratings of the questions that had been asked during the testing process. These ratings, even though provided by persons who were unaware of the theoretical basis for PLC testing, were clearly supportive of the underlying premise. Those examinees who were “deceptive” during the testing rated the relevant questions as of significantly greater concern to them than the comparison questions. On the other hand, “truthful” examinees expressed significantly greater concern for the comparison questions than for the relevant items. These subjective ratings, in addition to the objective scoring of the physiological data, showed that the salience of the comparison questions was greater for “truthful” examinees and the salience of the relevant questions was greater for those who were “deceptive.” It can be concluded that it is this differential salience that accounts for the diagnostic value of PLC polygraph testing. Moreover, because similar findings have been reported in more recent studies, there is compelling reason to suggest that this differential salience generalizes across a variety of settings (Honts, 2003; Horowitz, Kircher, Honts, & Raskin, 1997; Offe & Offe, 2007).

DLC Question Test

The flexibility of the term salience can also be used to explain the processes included in the DLC question test. This test is distinct from the PLC question test in that participants are instructed to lie to the comparison questions. In this context, comparison questions do not present different levels of threat to truthful or deceptive examinees, but rather, different levels of cognitive engagement. Relevant questions, in this type of test, still exist as a potential threatening stimulus, depending on the examinee.

Truthful participants should experience little threat from the relevant questions, but should view the DLCs as more salient, as these questions require them to perform a cognitive task, as directed by the polygraph examiner. Also, truthful examinees are likely vigilant for such questions, relative to the other questions, which likely contributes to their differential salience. In contrast, deceptive examinees should feel threatened by the relevant questions, which detract from the prominence or salience of the DLC questions.
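The direction of the expected difference in comparison question tests can be illustrated with a toy calculation. The sketch below is only an illustration of the reasoning, not a validated scoring procedure: the function names, the use of a simple mean difference, and the response magnitudes (in arbitrary units) are all assumptions made for this example and do not come from the studies cited.

```python
from statistics import mean


def differential_salience(relevant, comparison):
    """Mean response to relevant questions minus mean response to
    comparison questions (hypothetical index, arbitrary units)."""
    return mean(relevant) - mean(comparison)


def expected_pattern(diff, tolerance=0.0):
    """Describe which question class drew the larger responses."""
    if diff > tolerance:
        return "relevant questions more salient"
    if diff < -tolerance:
        return "comparison questions more salient"
    return "no differential salience"


# Invented response magnitudes for two hypothetical examinees.
examinee_a = differential_salience(relevant=[2.0, 3.0, 4.0],
                                   comparison=[5.0, 6.0, 7.0])
examinee_b = differential_salience(relevant=[6.0, 7.0, 8.0],
                                   comparison=[2.0, 3.0, 4.0])

print(expected_pattern(examinee_a))  # pattern predicted for truthful examinees
print(expected_pattern(examinee_b))  # pattern predicted for deceptive examinees
```

Under the account given above, the first pattern is the one predicted for truthful examinees and the second for deceptive examinees; a difference near zero corresponds to the absence of differential salience.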

In principle, the DLC questions are more salient for truthful examinees than are relevant questions due to the cognitive task required of DLC questions and potentially due to examinee vigilance. For deceptive examinees, the relevant questions are more salient than the DLC questions due to the threat that they represent. For these reasons we can expect greater response magnitudes to the comparison questions for truthful examinees and greater responses to relevant questions for deceptive examinees in the context of the DLC question test. This fits with Honts’ (2004) and Wolpe et al.’s (2005) description of the comparison question test, though the source/origination of salience is certainly different than that of the PLC question test.

Concealed Information Test

The concealed information test involves the presentation of elements (sometimes called ‘keys’) present at a crime scene, in addition to similar but unrelated control (also referred to as foil) items that were not present at the crime. The key items could include the type of weapon used to stab a victim, the caliber of weapon used, the color of the victim’s shirt, and so on. In theory, the individual who committed the crime will know the elements or keys of the crime, and these stimuli will be familiar and more prominent than the control stimuli. For the person who did not commit the crime, the key stimuli should not be prominent or familiar relative to the control stimuli.

The concealed information test is effective because of the differential salience of the key stimuli relative to the control stimuli, from the standpoint of the guilty individual. Key stimuli are recognized and familiar for the guilty individual, relative to control stimuli, and larger physiological response magnitudes are produced. Innocent individuals undergoing a concealed information test should not find the key stimuli familiar relative to the control stimuli. Thus, for innocent individuals, there should be no differential salience for the key stimuli relative to the control stimuli.

Versions of the concealed information test, including the peak of tension and searching peak of tension approaches, operate on similar principles. The key value or location of the item of interest holds a special prominence to the deceptive examinee, and thereby, greater salience, relative to the other stimuli presented. Thus, presentation of the key value or location should produce the greatest physiological response, relative to the other items in the presentation sequence. In accordance with this line of reasoning, Wolpe et al. (2005) suggested that the purpose of the concealed information test was to assess the salience of information presented to the examinee.

Relevant/Irrelevant Test

The relevant/irrelevant test operates on the premise that a particular relevant question that represents the greatest threat to the examinee will produce significant and consistent physiological responses relative to other relevant questions. For example, an individual who has committed an undetected theft will likely be more concerned about or more threatened by this question than relevant questions dealing with issues about which the examinee has nothing to hide. In this context, questions that pertain to the issues about which examinees are hiding information are more salient than those questions that do not, due to the different level of threat, concern, or prominence that they represent. This differential salience across questions produces greater physiological responses to the relevant question or questions of concern.

Concluding Comments

It is important to note that all polygraph approaches are effective or diagnostic to the extent that the stimuli and questions that they include are salient in the expected direction. For instance, Offe and Offe (2007) demonstrated that participants for whom erroneous decisions were made did not rate comparison questions and relevant questions in the predicted direction. In other words, for these participants, relevant questions and comparison questions did not show differential salience. This was not true for participants on whom correct decisions were made.

In the same vein, the concealed information test is only effective to the extent that the key stimuli are salient for the guilty examinee and not for innocent examinees. If ineffective key stimuli are chosen, for example when the guilty examinee does not remember a particular element of the crime, then this key will not be familiar or differentially salient to the examinee, relative to the foil (or control) items. No differences in response magnitudes between key and control items are likely to be demonstrated in such cases. (Note: The term “buffer” usually refers to the opening item, not to the noncritical ones which are usually called foils, controls, non-critical items or similar terms.)

Summary

The concept of salience and more specifically, “differential salience” provides a defensible and comprehendible theoretical framework through which a variety of polygraph approaches can be explained. The expression differential salience allows for multiple explanations of physiological arousal to come to bear, not just being limited to threatening circumstances, as with conventional polygraph explanations. Using salience, we can account for physiological responses that occur due to a variety of reasons, and in the context of a variety of polygraph formats. The notion that such responses appearing in polygraph examinations occur due to a single source (e.g., fear of detection) is incompatible with the psychophysiological literature. Salience provides a vehicle to broaden and enhance our understanding of the multiple factors that come to bear in polygraph testing. It is time to change the way we present what we do in terms of common scientific understandings.

References

Backster, C. (1960a). Outside "super-dampening" factor. Annual school research series of polygraph technique trends.

Backster, C. (1960b). "Anticlimax dampening" concept. Annual school research series of polygraph technique trends.

Ben-Shakhar, G., & Elaad, E. (2003). The validity of psychophysiological detection of information with the guilty knowledge test: A meta-analytic review. Journal of Applied Psychology, 88(1), 131-151.

Davidson, P.O. (1968). Validity of the guilty knowledge technique: The effect of motivation. Journal of Applied Psychology, 52(1), 62-65.

Department of Defense Polygraph Institute Research Division Staff (1998). Psychophysiological detection of deception accuracy rates obtained using the test for espionage and sabotage. Polygraph, 27(1), 68-73.

Furedy, J. J. (1991). Alice-in-Wonderland terminological usage in, and communicational concerns about, that peculiarly American flight of technological fancy: The CQT polygraph. Integrative Physiological & Behavioral Science, 26(3), 241-247.

Gustafson, L.A., and Orne, M. (1964, April 17). The effect of "lying" in "lie detection" studies. Paper presented at the 35th annual meeting of the Eastern Psychological Association, Philadelphia, PA.

Handler, M., & Nelson, R. (2007). Polygraph terms for the 21st Century. Polygraph, 36(3), 157-164.

Honts, C. R. (2003). Participants perceptions support rationale of comparison question tests for psychophysiological detection of deception. Psychophysiology, 40, S48.

Honts, C. R. (2004). The psychophysiological detection of deception. In Granhag, P., & Strömwall, L. (Eds), The Detection of Deception in Forensic Contexts. (pp 103-123) New York, NY: Cambridge University Press.

Honts, C.R., and Raskin, D.C. (1988). A field study of the validity of the directed lie control question. Journal of Police Science and Administration, 16(1), 56-61.

Horowitz, S. W., Kircher, J. C., Honts, C. R., & Raskin, D. C. (1997). The role of comparison questions in physiological detection of deception. Journal of Applied Psychology, 79, 252-259.

Horvath, F. (1988). The utility of control questions and the effects of two control question types in field polygraph techniques. Journal of Police Science and Administration, 16(3), 198-209.

Krapohl, D.J. (2001). A brief rejoinder to Matte & Grove regarding “psychological set.” Polygraph, 30(3), 203-205.

Kugelmass, S., Lieblich, I., & Bergman, Z. (1967). The role of "lying" in psychophysiological detection. Psychophysiology, 3(3), 312-315.

Lieblich, I., Naftali, G., Shumueli, J., & Kugelmass, S. (1974). Efficiency of GSR detection of information with repeated presentation of series of stimuli in two motivational states. Journal of Applied Psychology, 59(1), 113-115.

Matte, J.A., & Grove, R.N. (2001). Psychological set: Its origin, theory and application. Polygraph, 30(3), 196-202.

No Author (2006). The American Heritage Dictionary of the English Language (4th ed.). Houghton Mifflin Company.

No Author (2006). Kernerman English Multilingual Dictionary (Beta Version). K Dictionaries Ltd.

No Author (2006). Random House Unabridged Dictionary. Random House, Inc.

No Author (2006). WordNet® 3.0. Princeton University.

Offe, H., & Offe, S. (2007). The comparison question test: Does it work, and if so how? Law and Human Behavior, 31, 291-303.

Reid, J. E. (1947). A revised questioning technique in lie detection tests. Journal of Criminal Law and Criminology of Northwestern University, 37(6).

Reid, J.E. (1962, Aug). The Emotionally Weighted Question in Lie-Detector Testing. Paper presented at the 9th Annual Meeting of the American Academy of Polygraph Examiners, Chicago, IL.

Ruch, F.L. (1948). Psychology and Life. Scott Foresman: Chicago.

Vendemia, J. M. C., Buzan, R. F., Green, E. P., & Schillaci, M. J., (2005). Effects of preparedness to deceive on ERP waveforms in a two-stimulus paradigm. Journal of Neurotherapy, 9(3), 45-70.

Vendemia, J. M. C., Buzan, R. F., & Simon-Dack, S. L. (2005). Reaction time of motor responses in two-stimulus paradigms involving deception and congruity with varying levels of difficulty. Behavioural Neurology, 16, 25–36.

Wolpe, P.R., Foster, K.R. & Langleben, D.D. (2005). Emerging neurotechnologies for lie-detection: Promises and perils. The American Journal of Bioethics, 5(2), 39-49.
