Making Crime Analysis More Inferential

Dr Michael Townsley
School of Criminology and Criminal Justice, Griffith University
21–27 April 2014 / International Summit On Scientific Criminal Analysis

Outline
• Defining What Analysis Is
• Five principles of statistical reasoning
• Three strategies to avoid errors

What Do We Mean by Analysis?
Analysis is not simply descriptive. It must include some component of reasoning, inference or interpretation. Regurgitating numerical values or summarising the situation is not analysis.
Need a system for doing this, comprising:
• Appropriate theory
• Methods to generate and test hypotheses
A system will allow you to generate knowledge about the criminal environment.

Theories for Crime Analysis: Environmental Criminology
crime = motivation + opportunity
• Rational choice
• Routine activity
• Crime pattern theory

Problems
Let’s acknowledge the range of factors limiting analysts from doing their work:
Organisational
• Tasking
• Operational imperatives
Individual
• Training
• Highly variable performance
• Cognitive biases

Humans find patterns anywhere
• Apophenia is the experience of seeing patterns or connections in random or meaningless data.
• Pareidolia is a type of apophenia involving the perception of images or sounds in random stimuli (seeing faces in inanimate objects)

Seeing Faces (image slides)
• The hungry helicopter eats delicious soldiers
• These boxes are planning something.
• Cookie Monster spotted in Aisle 4

Principles of Statistics
The field of statistics is decision making under uncertainty.
Without being overly simplistic, the entirety of statistics can be distilled into five core principles:
1. rates over counts
2. making comparisons
3. retrospective versus prospective
4. sampling bias
5. Simpson’s paradox
Because decisions in operational law enforcement need to be made with incomplete data, in imperfect conditions and under significant time pressure, a statistical approach should enable better analysis, or at least avoid common pitfalls.

Statistical Principle 1: Frequencies Versus Rates
• A rate is a frequency adjusted for the underlying population at risk
• The denominator is usually a resident population, but could be another unit at risk, such as the number of properties

International comparisons
Using data from UNODC: police-recorded assaults in Chile, the US, Australia and New Zealand.
[Figure: two panels, "count" and "rate", of recorded assaults by year (2004 to 2010) for the US, Australia, NZ and Chile; x-axis "year", y-axis "amount".]

Issues
• significance of counts for practical purposes
• what is the population at risk?
• relevance with respect to time (by day, seasonal patterns)
• most crimes are measured at point level, but calculating a rate requires some level of aggregation (to streets or areas, say)

Aggregation effects (Safe as Houses project)
What is the population at risk? How relevant are the boundaries here?

Statistical Principle 2: Making Comparisons
• Crime figures are meaningless without reference to a comparison area or some baseline crime level.
• A confounding issue is regression to the mean
• Comparison groups can be misleading due to contextual differences
• Hot spot maps (with hot and cold areas) do not constitute a valid comparison
• Comparisons need to cover the causal factor as well as crime

Some connection between street network and drug arrests
Source: Eck (1997)

Strong relationship between poor place management and drug arrests
Source: Eck (1997)

Statistical Principle 3: Retrospective Versus Prospective Risks
• Risk factors can be computed through different types of studies: retrospective and prospective
• Retrospective studies take a group that experienced an outcome and examine their past
• Prospective studies follow a population, recording their lifestyle and whether the outcome occurs
• Many risk factors are computed using a retrospective study, but expressed in prospective terms.

Sexual assaults on public transport
A study claims that 80% of sexual assaults take place on the public transport system. The inference drawn is that there is a high chance of victimisation if you use public transport. These are two different statements:
• if victimised, there is a high chance of having used public transport (what the study says); and
• if using public transport, there is a high chance of being victimised (what gets communicated).
Each of these statements is a conditional probability (the proportion of an event within a subsample). But here the subsamples and events have been swapped.

Let’s Look at Data to Make This Concrete
                       Victimised   Not victimised    Total
Public transport               80           10,000   10,080
Not public transport           20           11,000   11,020
Total                         100           21,000   21,100
• if victimised (N=100), 80% used public transport (retrospective)
• if using public transport (N=10,080), about 0.8% were victimised (prospective).
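
The two conditional probabilities can be checked directly from the cell counts. What follows is a minimal sketch, not part of the original slides: the names `counts`, `retrospective` and `prospective` are illustrative, and the only data used are the four cells of the table above.

```python
# Cell counts copied from the 2x2 table on the slide.
# The dictionary layout and all names are illustrative assumptions.
counts = {
    "public_transport": {"victimised": 80, "not_victimised": 10_000},
    "no_public_transport": {"victimised": 20, "not_victimised": 11_000},
}


def retrospective(counts, outcome, exposure):
    """P(exposure | outcome): share of people with this outcome who had this exposure."""
    outcome_total = sum(row[outcome] for row in counts.values())
    return counts[exposure][outcome] / outcome_total


def prospective(counts, exposure, outcome):
    """P(outcome | exposure): share of people with this exposure who had this outcome."""
    exposure_total = sum(counts[exposure].values())
    return counts[exposure][outcome] / exposure_total


p_transport_given_victim = retrospective(counts, "victimised", "public_transport")
p_victim_given_transport = prospective(counts, "public_transport", "victimised")

print(f"P(public transport | victimised) = {p_transport_given_victim:.1%}")  # 80.0%
print(f"P(victimised | public transport) = {p_victim_given_transport:.2%}")  # 0.79%
```

Both functions read the same cell (80) and differ only in the denominator: the column total for the retrospective figure, the row total for the prospective one. That difference in denominators is the swap of subsample and event described above.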

To make a retrospective comparison
From the table above:
• if victimised (N=100), 80% used public transport
• if not victimised (N=21,000), 48% used public transport.

To make a prospective comparison
From the table above:
• if using public transport (N=10,080), about 0.8% were victimised
• if not using public transport (N=11,020), 0.2% were victimised.

Retrospective vs Prospective
1. Crime analysts will virtually always have retrospective studies, so this problem will come up
2. Make sure valid comparisons are made. Compare conditional probabilities appropriately
3. Retrospective proportions overstate the size of the risk factor.

Statistical Principle 4: Selection Bias
• occurs naturally whenever secondary data analysis is conducted
• differences between victimisation surveys and official statistics: various filters operate on which offences are reported to police, and which of those get recorded
• survivorship bias

Spatial Selection Bias
• Ratcliffe (2001) catalogues the ways that geocoding can go wrong:
  • Out of date property parcel map
  • Abbreviations and misspellings
  • Local name variations
  • Address duplication
  • Non-existent addresses
  • Non-addresses
• Bichler and Balchak (2007) found distinctive systematic biases in geocoding errors in the major GIS applications.
• Ratcliffe (2001) found that between 5 and 7% of records were geocoded to incorrect census tracts.

Statistical Principle 5: Simpson’s Paradox
Table: Aggregate Crime Rates for Areas 1 and 2
         Area 1   Area 2
Total      7.75     7.20
Area 2 is safer than Area 1, in aggregate. It would be worth considering what examples of best practice might be transferred to Area 1. To do so, we look at different crime types.

Crime Rates by Crime Types
                 Area 1                   Area 2
Crime Type       Freq   Denom.   Rate     Freq   Denom.   Rate
Assault           256   41,250   6.21      430   54,000   7.96
Comm. Burglary    178    2,800   6.36       30      350   8.57
Car Theft          69   20,850   3.31       66   18,750   3.52
Total             503   64,900   7.75      526   73,100   7.20
Area 2 has a higher crime rate for every individual crime type.

Explaining Simpson’s Paradox
• Operates when patterns of rates (or proportions) calculated for an entire sample are not consistent with the patterns for subgroups of the data.
• Arises because the denominators differ between subgroups, and because only proportions or rates are relied on as indicators of activity.
• Usually a sign of a lurking variable
• NOTE! This directly contradicts the First Principle listed here (rates over counts).

Spatial Principles
Spatial data and analyses have a number of unique attributes that need to be controlled for:
• modifiable areal unit problem: when point-level information is aggregated to arbitrary administrative boundaries
• spatial autocorrelation: things that are close are more similar than distant things.

Three strategies to avoid errors
1. be more scientific. Come to my next talk!
2. employ more sophisticated methods. Upskill analysts and collaborate with researchers
3. be more focused and use crime theories. Read the Eck (1997) chapter, “What Do Those Dots Mean?”

Bibliography
Bichler, G. and Balchak, S. (2007). Address Matching Bias: Ignorance Is Not Bliss. Policing: An International Journal of Police Strategies & Management, 30(1):32–60.
Eck, J. E. (1997). What Do Those Dots Mean? Mapping Theories with Data. In Weisburd, D. L. and McEwen, T., editors, Crime Mapping and Crime Prevention, volume 8 of Crime Prevention Studies, pages 377–406. Criminal Justice Press, Monsey, NY.
Ratcliffe, J. H. (2001). On the Accuracy of Tiger-Type Geocoded Address Data in Relation to Cadastral and Census Areal Units. International Journal of Geographical Information Science, 15(5):473–485.
