source link: https://journals.sagepub.com/doi/full/10.1177/0093854818824378

The Accuracy of the Violent Offender Identification Directive Tool to Predict Future Gun Violence

First Published January 17, 2019

Research Article

Abstract

We evaluate the Violent Offender Identification Directive (VOID) tool, a risk prediction instrument implemented within a police department to identify offenders likely to be involved with future gun violence. VOID uses a variety of static measures of prior criminal history that are available in police records management systems. The VOID tool is assessed for predictive accuracy by taking a historical sample and calculating scores for over 200,000 individuals known to the police at the end of 2012, and predicting 103 individuals involved with gun violence (either as a shooter or a victim) during 2013. Despite weights for the instrument being determined in an ad hoc manner, the VOID tool does very well in predicting involvement with gun violence compared with an optimized logistic regression and generalized boosted models. We discuss theoretical reasons why such ad hoc instruments are likely to perform well in identifying chronic offenders for all police departments.

Introduction

Gun violence presents a growing concern to scholars and practitioners alike (Rozel & Mulvey, 2017). Research indicates that a number of factors are associated with involvement in criminal gun violence as a perpetrator or victim, including characteristics such as age, sex, race, economic and social disadvantage, area of residence, criminal history, social network characteristics, and previous exposure to violence, among others (Braga, Papachristos, & Hureau, 2010; Fowler, Dahlberg, Haileyesus, & Annest, 2015; Nofziger & Kurtz, 2005; Papachristos, Braga, & Hureau, 2012; Pear, Castillo-Carniglia, Kagawa, Cerdá, & Wintemute, 2018). Moreover, like many other aspects of criminal offending and victimization (Spelman, 1995), gun violence concentrates among a small number of individuals (Papachristos et al., 2012). Thus, in line with the Risk–Needs–Responsivity model of intervention (Andrews & Bonta, 2006; Andrews, Bonta, & Wormith, 2006), focusing interventions on those individuals who are most at risk may be a useful approach to attempt to reduce gun violence. Predicting future involvement in gun violence, however, is a challenging problem. Shootings, and violence in general, are relatively rare occurrences.

Notably, although risk assessment tools are commonly used to predict recidivism in sentencing contexts (Monahan & Skeem, 2016), validated instruments have not typically been used by police departments to predict future violence—particularly gun violence—among large populations of known individuals. In the current study, we apply a novel risk prediction tool—the Violent Offender Identification Directive (VOID)—in predicting future gun violence from information available in police records. This evaluation of the VOID tool has two benefits for the wider criminal justice research and practitioner community. First, we quantify how well one can predict future involvement in gun violence in a large sample of individuals known to the police. Second, we use items typically available in police record management systems, meaning that much of the instrument can be replicated in other jurisdictions.

Violence Risk Assessment

There are three basic approaches to risk assessment: actuarial, structured professional judgment, and unstructured clinical judgment (Dolan & Doyle, 2000; Meehl, 1954; Singh, Desmarais, & Van Dorn, 2013). Actuarial tools assign weights to research-based risk factors to produce a weighted combination of risk factors known as a risk score. Structured professional judgment directs the users’ attention to specific, empirically validated risk factors. Unstructured clinical judgment, by contrast, rests on more subjective clinical evaluations. In general, actuarial tools and structured clinical judgments both provide reasonably accurate risk assessments, and both perform better than unstructured clinical judgments (Dawes, 1979; Douglas, Yeomans, & Boer, 2005; Grove, Zald, Lebow, Snitz, & Nelson, 2000; Hanson, 2009; Kahneman, 2011; Meehl, 1954; Tetlock & Gardner, 2015).

Many risk assessment tools, using a variety of both unchanging or “static” risk factors (e.g., age at first arrest, gender) and changing or “dynamic” risk factors (e.g., residential stability, relationship status), have been found to predict violent behavior with some accuracy (Campbell, French, & Gendreau, 2009; Douglas et al., 2005), and no single tool emerges as the best predictor across multiple studies and meta-analyses (Campbell et al., 2009; Douglas et al., 2005; Fazel, Singh, Doll, & Grann, 2012; Glover, Nicholson, Hemmati, Bernfeld, & Quinsey, 2002; Kroner, Mills, & Reddon, 2005; Singh et al., 2013; Yang, Wong, & Coid, 2010). Common violence risk assessment tools (e.g., Historical, Clinical, Risk Management [HCR-20; C. D. Webster, Eaves, Douglas, & Wintrup, 1995]; Level of Service Inventory–Revised [LSI-R; Andrews & Bonta, 1995]; Structured Assessment of Violence Risk in Youth [SAVRY; Borum, Bartel, & Forth, 2003]; Violence Risk Appraisal Guide [VRAG; Quinsey, Harris, Rice, & Cormier, 2006]) typically include different combinations of static and dynamic criminal history/involvement items, measures of individuals’ stability and resources, contextual factors such as neighborhood disadvantage, and psychological characteristics.

Risk assessment tools are commonly used in corrections—particularly probation and parole—to determine which individuals are likely to recidivate (Miller & Maloney, 2013; Monahan & Skeem, 2016). Thus, risk assessment tools have generally been developed to follow a known set of individuals about whom a large amount of information can be gathered. Police, by contrast, are interested in predicting future violence among a much larger set of individuals. Moreover, police departments are often limited to whatever information is available in their record management systems: Whereas police may have data on dynamic factors such as individuals’ gang membership or employment status, their information is frequently limited to prior criminal history and victimizations, along with basic demographic information (Jennings, 2006). While the lack of dynamic risk factors available in police record management systems prevents police departments from matching specific treatments to those predicted as being of high risk (a common goal of the majority of risk assessment instruments), the static characteristics available still tend to be predictive of involvement in future violence (Berk, Sherman, Barnes, Kurtz, & Ahlman, 2009; Neuilly, Zgoba, Tita, & Lee, 2011).

Early case studies have suggested that police departments traditionally have not relied on quantitative indicators of prior criminal history to identify chronic offenders, but rather target individuals who are simply believed to be currently active in committing crime based on officer intelligence (Martin & Sherman, 1986; Spelman, 1990). Although risk assessment tools for use by police officers have become increasingly popular in recent years (Messing, Campbell, Wilson, Brown, & Patchell, 2017; Storey, Kropp, Hart, Belfrage, & Strand, 2014), most of these focus on predicting future violence involvement of victims or perpetrators who are identified by police and “screened” via questionnaires to develop risk scores. In addition, common police risk assessment methods (e.g., the Lethality Screen used as part of the widely adopted Lethality Assessment Program [Messing et al., 2017]) tend to focus on specific contexts for violence, particularly domestic or intimate partner violence. Although several instruments used by police, including the Ontario Domestic Assault Risk Assessment (ODARA; Hilton et al., 2004) and Brief Spousal Assault Form for the Evaluation of Risk (B-SAFER; Kropp & Hart, 2004), have been found to effectively predict domestic or intimate partner violence (Hilton, Harris, Rice, Houghton, & Eke, 2008; Messing & Thaller, 2013; Storey et al., 2014), all focus exclusively on domestic violence in this context and remain limited in their ability to predict gun violence among individuals who have not been involved in previous incidents. A main reason for this is that collecting dynamic indicators, or otherwise conducting in-depth “screening,” for an entire set of individuals known to a police department is simply not feasible. However, the success of these instruments in predicting domestic or intimate partner recidivism does demonstrate the utility of risk assessment for police.

Given that a variety of static risk factors appear to be efficacious in predicting violence, however (Mills, Kroner, & Hemmati, 2003), it may be that police departments can achieve accurate predictions of future gun violence among a large population of individuals using only the data that they have available (e.g., prior arrests, victimizations, and other types of contact with the police). In what follows, we evaluate the predictive power of the VOID, a risk prediction instrument that is currently being used to this end by an upstate New York police department to identify people who may be involved in future gun violence, either as victims or as perpetrators, with a goal of providing interventions for those individuals prior to any involvement in violence.

The VOID

In brief, the VOID is a risk prediction instrument that is used in conjunction with two coordinated interventions, one that is enforcement-based and another that is prevention-based (i.e., social services). First, the police department calculates risk scores from the VOID tool. Next, supplementing this information with additional intelligence, the department identifies the 10 offenders at highest risk—the “top ten in crisis.” These 10 people are then prioritized for enforcement and preventive attention, and these interventions comprise the VOID.

The identification of the “top 10 in crisis” list occurs in two stages. In the first stage, numeric scores are calculated for persons known to the police department via prior police contacts. These are known as the VOID scores, based mainly on information about individuals’ prior criminal histories, and the algorithm by which the scores are generated is known as the VOID tool. These scores are then ranked, and analysts from the department’s Crime Analysis Center (CAC) review a large number of the highly ranked individuals, from among whom CAC personnel select the 10 individuals to be placed on the VOID list. Analysts use not only prior criminal history but also current human intelligence to determine which individuals to place on the list (Abrahamse, Ebener, Greenwood, Fitzgerald, & Kosin, 1991)—so the 10 individuals placed on the list do not necessarily have the 10 highest VOID scores.

Examples of intelligence used by analysts may include whether an individual is regularly out in public (so at higher risk) or “laying low” (at lower risk). Some pieces of information that obviously influence an individual’s risk are not captured in the police database (such as whether the individual is deceased, has moved outside of the department’s jurisdiction, or is currently incarcerated). These pieces of information can be used to eliminate individuals who may have high risk according to their VOID score, but should not be the target of any police intervention. Thus, although the initial construction of the scores is purely actuarial, officers’ subjective judgment is used to formulate the smaller pool of targeted individuals.

The VOID list is reconstructed on a monthly basis, and two scores are calculated for all known offenders. One, a historical score, takes into account all of an offender’s previous involvement in the various kinds of incidents. The other, a 90-day score, includes only the last 90 days’ involvement in those incidents. The department places more emphasis on the 90-day score, inasmuch as the object of VOID is to identify the people at high risk of near-term violence. Additional intelligence is gathered by the crime analysis unit on a day-to-day basis and at VOID meetings with Enhanced Supervision Unit (ESU) officers, as well as through debriefings of offenders and conferences with arresting officers.

Once placed on the VOID list, an individual is subject to a two-pronged effort to prevent his or her involvement in gun violence. Detectives initiate (or continue) investigations to build prosecutable cases against VOID list members, thereby removing them from the street and from further risk of involvement in gun violence. Such long-term investigations typically focus solely on drug offenders, so VOID helps to prioritize resources on individuals who are at high risk of violence offending and victimization.

At the same time, officers assigned to the ESU perform service-based outreach to individuals on the list, as well as to other parties who might exercise prosocial influences over those individuals. The ESU’s main function is to act as a liaison between the individuals and different social service agencies, as well as to collect intelligence on the current behavior of the top 10 in crisis. The current status of the individuals on the VOID list is reviewed at biweekly meetings that include detectives, ESU, crime analysts, command staff, and probation and parole representatives. Although the scores are reconstructed and reviewed every month, turnover of individuals on the list is infrequent: once on the list, individuals tend to remain until all parties involved deem it appropriate to remove them. To date, most individuals placed on the list have stayed on it for around 6 to 12 months.

Development of the VOID Tool

The VOID tool was developed in 2012 by members of the police department’s analysis unit. Tasked in 2012 with formulating a mechanism for identifying people at very high risk of near-term involvement in gun violence (particularly a fatal or nonfatal shooting or a “shots fired” incident, and not merely a robbery with a gun or a menacing), staff identified a number of people who appeared from street intelligence to be at high risk. Nominations were sought from sworn personnel whose assignments (or whose approaches to their jobs) made them knowledgeable sources, such as community policing officers, detectives, and other patrol officers. The analysis unit proceeded to examine the backgrounds of these nominated high-risk individuals for markers that would substantiate the intel-based designation as high risk. The markers included, for example, weapons possession, truancy, and individual or family involvement in a street gang. This analysis of individual backgrounds confirmed that the identified individuals were people whose histories and circumstances were those that are plausibly associated with gun violence.

With a view toward making the risk scoring more systematic and broadly applicable, data were assembled on all shooters and shooting victims since 2009. Information from the record management system on their involvement in various kinds of incidents was collated and examined to identify those that were most prevalent, including but not limited to criminal possession of marijuana, involvement in shootings and shots fired incidents, resisting arrest, gang involvement, field interviews, a history as a missing person, domestic assault, and disorderly conduct. Each of these was used as a risk factor for constructing the VOID scores. Weights that were commensurate with the aggregate frequencies of the various incidents were formulated for each of the risk factors. Some types of incidents are also given greater weight based on context. For example, field interviews conducted in a known gang location are given additional weight above other field interviews.

Method

Analysis Plan

This analysis examined only the predictive accuracy of the VOID tool—that is, how well the VOID scores predicted future involvement in gun violence. We did not examine the accuracy of the subjective judgments made by officers, nor did we evaluate the effectiveness of the VOID intervention to reduce future gun violence. To assess the predictive accuracy of the tool, we computed VOID scores and analyzed involvement in gun violence for a period of time that predates the VOID intervention, lest we confound the impact of the interventions with the predictions of the scores. Scores were computed based on information through the end of 2012, and these scores were used to predict involvement in gun violence during 2013. Involvement in gun violence may entail either being an offender or a victim, and we subsequently conduct analyses of both of these subgroups as well as of the combination of either type of involvement. These dates were chosen because they represent the earliest period for which all of the same information was available to replicate the scores before the VOID program started in 2014. Given the rarity of gun violence in this jurisdiction, we do not attempt to assess the near-term predictive validity of the VOID tool in this analysis (e.g., how well the 90-day scores predict violence in the next month). Given that individuals tend to stay on the VOID top-10 list for an extended period of time, though (typically 6 to 12 months), assessing the long-term predictive validity of the tool has utility for how the instrument is used in practice.

Sample

We constructed VOID scores based on historical data through 2012 for every individual who was assigned a unique identifier (a Master Name Index or MNI) in the police department’s record management system prior to 2013, a total of 237,232 individuals. These individuals had at least one police contact between 2000 and 2012. We then used these scores to predict whether during 2013 an individual was either a suspect in a shooting or a victim of a shooting, as determined by CAC’s “Shots Fired MNIs” database.1

Outcome Measure

The outcome being predicted is involvement in gun crime in 2013, either as a victim or a suspected offender. A total of 59 separate shooting incidents in 2013 involved an identified victim, offender, and/or suspect, which together comprised 118 people. We calculated VOID scores for only the 237,232 individuals who had a field intelligence card, arrest, or involvement in an incident prior to 2013. This is the universe of individuals known to the police as potential future gun-involved persons. Including only individuals from this subset of MNIs reduced the number of people involved in gun violence in 2013 and known to the police prior to 2013 to 103 individuals. That is, 15 individuals who were involved in gun violence in 2013 did not have any prior contact with the police; many were unintended victims of shootings.

While many risk assessment instruments focus specifically on predicting future offending, VOID does not distinguish between committing a shooting and being the victim of a shooting. The motivation is the police department’s belief that there tends to be a large amount of overlap between the two groups, a belief validated in a variety of prior criminological research (Jennings, Piquero, & Reingle, 2012). Similar predictive policing applications (such as Chicago’s Heat List) include both victimizations and offending as one outcome (Papachristos, 2016). A second reason is that the police department does not vary the intervention based on whether it believes an individual is at high risk of victimization or offending: individuals get the same treatment as described (targeted investigations, ESU monitoring, and suggested social services) in either case. Prior gun violence interventions have sometimes shown that preventing victimization is more effective than reducing offending (Papachristos & Kirk, 2015), and so it is reasonable for police to focus resources on individuals at high risk of victimization just as much as on those at high risk of offending.

Of the 103 individuals being predicted in this sample, the overlap between victimization and offending is slight: only seven individuals were involved with both. For suspects, there was a total of 54 individuals, and for victims there was a total of 56 individuals. We provide separate analyses for three different outcomes: being a victim of a shooting, a suspect in a shooting, or the combination of being either a victim or a suspect in a shooting.

VOID Scores

Table 1 lists the VOID tool risk factors and the weights used for each individual item. Specifically, the VOID risk scoring tool incorporates information on offenders’

  • arrests (aggravated assaults, arson, burglary, etc.)

  • field interviews (with additional weight if a field interview is in a gang area)

  • victimizations (assault, aggravated assault, robbery, menacing, and reckless endangerment)

  • status as a suspect in a case (same offense types as victimizations)

  • appearance as a subject of a crime analyst bulletin (a “be on the lookout” bulletin disseminated from the CAC)

  • involvement in a juvenile offending incident (obtained from a specific database of juvenile incidents)

  • involvement in a runaway incident from a local facility

  • prior involvement in shots fired (as either victim, offender, or witness)

  • truancy

  • involvement in jail incidents (incidents of misconduct at a local jail)

  • known gang membership

Table 1: VOID Items and Weights


All of these factors are cumulative (with the exception of gang membership), and each is given a weight. Weights are applied to specific incidents, such that an offender’s score increases with each additional incident. For example, an arrest for robbery or aggravated assault carries a weight of 2. So, if a known offender had three previous robbery arrests and two previous aggravated assault arrests (and no points for anything else), that individual’s VOID score would be 3 × 2 + 2 × 2 = 10. Online supplementary analysis provides descriptive statistics of the VOID tool, as well as bivariate associations between each risk factor and involvement in gun violence (see Supplemental Material, available in the online version of this article).
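The weighted-count logic of the scoring rule can be made concrete in a short sketch. Only the weight of 2 for robbery and aggravated assault arrests is taken from the text; the remaining item names and weights below are hypothetical placeholders, as the full weight table appears in Table 1.

```python
# Hypothetical sketch of the VOID scoring logic: each incident type carries
# a fixed weight, and an individual's score is the weighted count of incidents.
# Only the weight of 2 for robbery/aggravated-assault arrests comes from the
# text; the other entries are invented for illustration.
WEIGHTS = {
    "arrest_robbery": 2,
    "arrest_aggravated_assault": 2,
    "field_interview": 1,            # assumed weight
    "field_interview_gang_area": 2,  # assumed: extra weight in gang locations
}

def void_score(incidents):
    """Sum the weight of each incident in an individual's history."""
    return sum(WEIGHTS.get(kind, 0) for kind in incidents)

history = ["arrest_robbery"] * 3 + ["arrest_aggravated_assault"] * 2
print(void_score(history))  # 3 * 2 + 2 * 2 = 10, matching the worked example
```

Because the factors are cumulative, each additional incident of a given type adds its full weight again, so chronic contact with the police compounds the score.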

Results

We assess the predictive accuracy of the VOID tool in absolute terms, showing that the tool discriminates between those at low and high risk well enough to be useful in practice. We also assess the relative accuracy of the VOID tool compared with five other optimized models: four based on logistic regression (King & Zeng, 2001; Tollenaar & van der Heijden, 2013) and a fifth based on a machine learning technique known as generalized boosted regression models, or GBM (Ridgeway, 1999).2 These models test whether the same items used in the VOID tool can provide more accurate predictions when combined with optimized regression weights.

As several of the indicators are correlated, we fit two separate logistic regression models: the first includes each item individually as a predictor, eliminating predictors with zero results in the outcome; the second uses simple sum scores of violent arrests, nonviolent arrests, violent juvenile arrests, nonviolent juvenile arrests, victimizations, suspected offenses, and each of the remaining items individually in the model. We then fit each of these models using traditional logistic regression, as well as using a rare events logistic model (King & Zeng, 2001; Piquero, MacDonald, Dobrin, Daigle, & Cullen, 2005), which corrects for potential small sample bias in logistic regression given that there are only 103 instances of the violent outcome in this sample. The GBM model includes all of the original items and allows up to three-way interactions and trees with up to 100 leaves. For each of these new models, we used the same pre-2013 data to predict gun violence in 2013 (either as a victim, offender, or combining either status). Because there is no “hold-out sample” for the regression models (i.e., cases that are not used to generate estimates of model parameters), we compare the model predictions to the 2013 data. This should provide a strong advantage to the new regression models compared with the VOID risk tool, as they are validated on the same sample that was used to generate model parameters. However, subsequent results show that both the logistic regressions and the generalized boosted predictions perform similarly to the original VOID tool.
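The model comparison can be sketched in code. The snippet below is a minimal illustration on synthetic count data, not a reproduction of the study's models: scikit-learn's ordinary logistic regression and gradient boosting stand in for the rare events logit and GBM implementations described above, and the data-generating process is invented.

```python
# Illustrative only: fit a logistic regression and a gradient-boosted
# classifier to synthetic incident-count data with a rare binary outcome,
# mimicking the structure of the comparison described in the text.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 20_000                                  # stand-in for the ~237,000 individuals
X = rng.poisson(0.3, size=(n, 8))           # counts of prior incidents by type
logit = -6 + 0.8 * X.sum(axis=1)            # rare outcome driven by total contacts
y = rng.random(n) < 1 / (1 + np.exp(-logit))

logit_model = LogisticRegression(max_iter=1000).fit(X, y)
gbm_model = GradientBoostingClassifier(max_depth=3).fit(X, y)

for name, model in [("logit", logit_model), ("gbm", gbm_model)]:
    auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
    print(name, round(auc, 3))
```

As in the paper, the comparison here is in-sample: the same rows are used to fit the models and to score them, which favors the fitted models over any fixed ad hoc weighting.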

The predictive accuracy of diagnostic tests is often evaluated by way of a receiver operating characteristic (ROC) curve, which plots the rate of true positives (the “sensitivity”) against the rate of false positives (one minus the “specificity”; Davis & Goadrich, 2006). The ROC curve predicting involvement in either offending or victimization for the VOID scores, as well as the other optimized model predictions, is shown in Figure 1 (ROC graphs for the individual outcomes of specifically offending or victimization show very similar patterns). We would prefer high rates of true positives (correctly predicting individuals associated with gun violence) and low rates of false positives (falsely predicting someone will become involved in gun violence), though at any level of predictive accuracy, a higher rate of true positives can be achieved only by accepting a higher rate of false positives.

Figure 1: ROC Curve for the Different Classification Models Predicting Either a Suspect in a Shooting or a Victim of a Shooting

Note. The darker and thicker line is the curve based on the VOID scores, and the lighter lines are those based on the other regression or machine learning models. ROC = receiver operating characteristic; VOID = Violent Offender Identification Directive.
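A curve like the one in Figure 1 can be computed from any vector of risk scores and observed outcomes. The sketch below uses synthetic scores (not the study's data) purely to show the mechanics.

```python
# Compute ROC coordinates from scores and binary outcomes with scikit-learn.
# The scores here are synthetic: cases are drawn from a higher-mean
# distribution than non-cases, so the curve bows above the diagonal.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(0, 1, 990),    # non-involved individuals
                         rng.normal(2, 1, 10)])    # rare gun-violence cases
y = np.concatenate([np.zeros(990), np.ones(10)])

# fpr = false positive rate (1 - specificity); tpr = true positive rate (sensitivity)
fpr, tpr, thresholds = roc_curve(y, scores)
```

Each point on the curve corresponds to one score threshold: lowering the threshold flags more people, raising both `tpr` and `fpr`, which is the trade-off described above.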

Table 2 displays the area under the curve (AUC) statistics, along with their standard errors and the lower and upper bounds of the 95% confidence intervals, for the VOID scores as well as the five other models considered. This breakdown is reported for three outcomes: being a suspect in gun violence, being a victim of gun violence, and the union of the two categories (either a victim or a suspect).

Table 2: AUC Statistics for Each Model


For each outcome, VOID does quite well compared with each of the other models in terms of the AUC statistic, although the confidence intervals for the different models substantively overlap. Every score is well above 0.5, and most have an AUC of over 0.85, so all of the models predict each outcome much better than chance. An AUC of over 0.85 means that there is over an 85% likelihood of a gun violence case scoring higher on VOID than a non-gun violence case. Comparing the different outcomes of being a victim or a suspect, each prediction method (including VOID) does slightly better at predicting suspects than victims of shootings, although each is clearly better than random.
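The pairwise reading of the AUC given above can be checked directly on synthetic scores: the AUC is numerically the fraction of (case, non-case) pairs in which the case receives the higher score (with ties counted as half).

```python
# Verify that the AUC equals the probability that a randomly chosen case
# outscores a randomly chosen non-case (the Mann-Whitney interpretation).
# Synthetic scores, not the study's data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
case_scores = rng.normal(2.0, 1.0, 200)
noncase_scores = rng.normal(0.0, 1.0, 2000)

# fraction of (case, non-case) pairs where the case scores higher, ties half
wins = (case_scores[:, None] > noncase_scores[None, :]).mean()
ties = (case_scores[:, None] == noncase_scores[None, :]).mean()
pairwise = wins + 0.5 * ties

y = np.r_[np.ones(200), np.zeros(2000)]
s = np.r_[case_scores, noncase_scores]
assert abs(roc_auc_score(y, s) - pairwise) < 1e-6  # the two quantities agree
```

So an AUC of 0.85 says nothing directly about how many flagged individuals become involved in gun violence; it is a statement about rankings of pairs, which is why the top-score analyses below matter for practice.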

With an offender population of nearly 240,000, though, one should be more concerned with how well the highest scores predict involvement in gun violence. The crime analysts who use the tool do not have the capability to examine a large proportion of the population to make additional intelligence-based assessments, and so in practice there are limits to the threshold the analysts can set to further investigate whether an individual should be considered for placement on the VOID list. The ROC curves show that the top few percent of each prediction identifies about 60% of those who subsequently become involved in shootings as either a suspect or victim. However, just 1% of the population is still nearly 2,400 individuals.

Tables 3 and 4 present the classification rates, sensitivity (the proportion of true positives captured), the PPV (the positive predictive value, also sometimes known as precision), and the diagnostic odds ratio (Singh, 2013).3 These statistics are provided for the top 1,000 scores in Table 3 (VOID only includes the top 959 because there are tied scores after that), as well as the top 5,000 scores in Table 4 for the VOID tool as well as the other models. The cut-offs were based on current practice (assessing the top 1,000 scores), as well as the authors’ estimates of how many more scores the analysts can reasonably be expected to examine based on personal observations of the process (likely somewhere around 5,000 scores). Different ways to select an optimal cut-off, such as Youden’s index (Krzanowski & Hand, 2009), select over 37,000 cases at a VOID score of over two in this sample, which is an infeasible number of individuals for the analysts to effectively review on a regular basis.4
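These statistics all follow from a 2 × 2 confusion table at a fixed top-N cutoff. The sketch below computes them using the figures reported in this section for the VOID tool's top roughly 1,000 scores (32 of the 103 gun-violence individuals captured, out of 237,232 people scored); the function itself is generic.

```python
# Classification statistics at a top-N cutoff: sensitivity, positive
# predictive value (precision), and the diagnostic odds ratio.
def topn_stats(flagged, true_pos, total_pos, population):
    sensitivity = true_pos / total_pos      # share of all cases captured
    ppv = true_pos / flagged                # share of flagged who are cases
    fn = total_pos - true_pos               # cases missed
    fp = flagged - true_pos                 # non-cases flagged
    tn = population - flagged - fn          # non-cases correctly not flagged
    dor = (true_pos * tn) / (fp * fn)       # diagnostic odds ratio
    return sensitivity, ppv, dor

# Figures reported for the VOID tool's top ~1,000 scores in this section
sens, ppv, dor = topn_stats(flagged=1000, true_pos=32,
                            total_pos=103, population=237_232)
print(round(sens, 3), round(ppv, 3), round(dor, 1))
```

The tension described in the text is visible in the arithmetic: enlarging `flagged` can only raise sensitivity, but with `total_pos` fixed at 103 it mechanically drives the PPV toward zero.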

Table 3: Classification Statistics of Top 1,000 Scores by Different Methods


Table 4: Classification Statistics of Top 5,000 Scores by Different Methods


For the top 1,000 scores in Table 3, the VOID tool does a very good job relative to the rate of gun violence in the population, capturing 32 individuals who later become involved in gun violence as either a suspect or a victim. It is important to note, though, that predicting gun violence is difficult: the number of people associated with gun violence remains low relative to the number of people identified by the tool. The PPV is only slightly over 3%; that is, only around 3% of those within the top 1,000 scores go on to actually be involved in gun violence. When breaking the outcome down into victims or suspects, the results are similar: as a ratio measure the VOID tool does quite well in capturing individuals involved in gun violence, but the PPV is smaller still for each individual outcome.

Widening the net to examine the top 5,000 scores (displayed in Table 4) results in a larger capture rate for VOID as well as the other predictive models examined. For the combined outcome, the different models result in capturing around 60 individuals, just slightly over half of the sample. This widening of the net, though, increases sensitivity but decreases PPV; when examining the top 5,000 scores, only slightly over 1% of the sample end up subsequently being involved in gun violence as either a victim or a suspect.

Given how the VOID tool is currently used in practice (i.e., analysts use field intelligence to decide which of the 1,000 or so individuals with the highest scores to place on the top-10 list), it is unlikely that the alternative models presented here would greatly outperform the VOID tool as it is currently applied.

Figure 2 shows how many of the individuals with the top N scores (shown on the x axis) became involved in gun violence (as either a victim or suspect). The figure is, then, essentially a continuous graphical representation of the number of true positives (the Total Gun Violence column) in Table 4. The VOID score ranking is displayed as a darker and thicker line, and rankings based on the other models are displayed as thinner gray lines. For the top 1,000 scores, the line for each model falls between 30 and 40 individuals, and by the top 5,000 scores each tool successfully predicts involvement in gun violence for around 60 individuals. While the predictions from the other regression models slightly outperform the VOID scores at the lowest rankings, their performance is similar by around the top 5,000 scores. VOID outperforms the other models past the top 11,000 scores, which produces its much larger AUC, but this is largely irrelevant to the way the tool is currently used. Up to around the 10,000 mark in Figure 2, the GBM prediction captures the most gun-involved individuals.

Figure 2: Number of Gun Violent Cases (Either as a Victim or Suspect) Within Top Rankings of Each Model

Note. The darker and thicker line is the curve based on the VOID scores, and the lighter lines are those based on the other regression or machine learning models. VOID = Violent Offender Identification Directive.

This suggests that widening the net of initial VOID scores from the top 1,000 to around the top 5,000 could be beneficial, depending on the capacity of the analysts to evaluate that many cases. Even so, the VOID scores perform reasonably well compared with the optimal regression models, even when only considering the top 1,000 scores. There is a strong limitation, though, in predicting an outcome as rare as gun violence: one must be willing to tolerate a large proportion of false positives to capture a substantive number of individuals likely to be involved with future gun violence (Berk, 2011).
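The capture curves in Figure 2 can be computed with only a few lines of code. The sketch below (in Python for illustration; the original analyses were fit in R and SPSS) ranks individuals by risk score and counts, cumulatively, how many of the top N were subsequently involved in gun violence. The data here are invented purely to show the mechanics.

```python
# Illustrative sketch of how a Figure 2-style capture curve is built:
# rank everyone by risk score, then cumulatively count how many of the
# top N were subsequently involved in gun violence.
def capture_curve(scores, involved):
    """involved[i] is True if person i was later involved in gun violence."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    curve, captured = [], 0
    for i in order:
        captured += involved[i]
        curve.append(captured)
    return curve

# Toy data: four individuals, two of whom were later involved.
print(capture_curve([5, 1, 9, 3], [False, True, True, False]))  # [1, 1, 1, 2]
```

Plotting `curve` against N (1 through the number of scored individuals) reproduces the shape of the lines in Figure 2 for any of the models.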

Discussion

The analyses reported here indicate that the VOID tool is predictive of future violence, and that optimized models using the same information are unlikely to appreciably improve upon the VOID tool. Moreover, we believe the ad hoc tool created by analysts performed as well as optimized regression models for several reasons, and that it would be practical for other police departments to implement similar risk prediction tools.

First, because many of the individual items used to signal risk within police databases are correlated, their contributions to predicting violence are likely in large part redundant. The work of Dawes (1979) is particularly applicable to the situation of generating risk scores from police records. In that article, Dawes showed that replacing ad hoc weights in a regression equation with optimal weights reduces prediction error only slightly when the predictor variables all have a positive effect on the outcome and all have positive inter-item correlations. This is a situation that readily applies to information available in police databases. Chronic offenders tend to be crime generalists, and any police contact is likely associated with a higher propensity for future criminal behavior. Any one particular act is only a small signal, but aggregated over many events it produces a stronger signal that the offender is at high risk.
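Dawes's point is easy to verify by simulation. In the hypothetical sketch below (all numbers invented), the outcome is generated from unequal "optimal" weights, yet an improper unit-weighted sum of the positively intercorrelated predictors correlates with the outcome nearly as well as the optimally weighted score.

```python
# Simulation of Dawes (1979): with positively intercorrelated predictors that
# all point in the same direction, equal ("improper") weights predict almost
# as well as the optimal weights. All values here are invented.
import random

random.seed(7)

def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

true_beta = [3.0, 2.0, 1.0]          # unequal "optimal" weights
unit_score, optimal_score, outcome = [], [], []
for _ in range(20000):
    shared = random.gauss(0, 1)      # shared factor induces positive correlations
    x = [0.7 * shared + 0.7 * random.gauss(0, 1) for _ in range(3)]
    unit_score.append(sum(x))        # improper equal weights
    optimal_score.append(sum(b * xi for b, xi in zip(true_beta, x)))
    outcome.append(optimal_score[-1] + random.gauss(0, 2))

print(f"unit weights:    r = {corr(unit_score, outcome):.3f}")
print(f"optimal weights: r = {corr(optimal_score, outcome):.3f}")
```

With these simulation settings the two correlations differ by only a few hundredths, mirroring Dawes's argument that optimization buys little once the predictors all point the same way.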

This finding that simple tools based on static risk factors can achieve nearly equivalent prediction accuracy compared with more complicated instruments is long-standing in recidivism research (Dressel & Farid, 2018; Harcourt, 2007; Tollenaar & van der Heijden, 2013). As such, it is not surprising that we found the same pattern when predicting future involvement in violence based on police records. In fact, recent methods capitalize on this fact by taking complicated models and reducing them to simpler instruments (Goel, Rao, & Shroff, 2016; Jung, Concannon, Shroff, Goel, & Goldstein, 2017; Zeng, Ustun, & Rudin, 2017). All of these factors make us confident that an instrument like VOID could be applied by police departments beyond the one we studied. Even if a department does not have access to the specific items used in the VOID instrument, combining scores from past criminal incidents available in police records is still likely to predict future violence reasonably well.
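The reduction from a fitted model to a simple checklist can be as mechanical as rescaling and rounding, in the spirit of the select-regress-and-round approach of Jung et al. (2017). The item names and coefficient values below are hypothetical, purely to illustrate the mechanics.

```python
# Hypothetical sketch of the "select, regress, and round" idea (Jung et al.,
# 2017): rescale fitted regression coefficients and round them to small
# integers, yielding a transparent point score an analyst can tally by hand.
coefs = {"prior_arrests": 0.41, "prior_shootings": 1.22, "field_stops": 0.18}

scale = 1 / min(abs(c) for c in coefs.values())   # smallest coefficient -> 1 point
points = {name: round(c * scale) for name, c in coefs.items()}
print(points)  # {'prior_arrests': 2, 'prior_shootings': 7, 'field_stops': 1}
```

An individual's risk score is then just the sum of the points for each item present in their record, much like the ad hoc VOID weights.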

Conclusion

The analysis in the current study provides two major contributions to the wider criminal justice community. The first is illustrating the potential to predict future violence based on data often available to police departments. The second is the utility of predictive modeling for focusing police resources on high-risk gun offenders.

There are several limitations to these findings, though. First, we only evaluate the efficacy of the general VOID tool to predict violence in a historical sample. We did not evaluate the prediction accuracy of the officers' subjective judgments, nor did we evaluate the effectiveness of the more dynamic 90-day VOID scores. Identifying whether these components or other factors predict near-term risk of gun violence (as opposed to risk of gun violence in the next year) would provide more actionable information to a police department. Accomplishing such an assessment was not possible with the current agency, as shootings are too rare and historical data could not be compiled from all of the databases further into the past. However, an analysis of whether very recent contacts with the police are diagnostic enough to inform policing decisions could be accomplished using data from an agency with a larger sample of shootings.

Evaluating the subjective judgments of intelligence officers offers an additional potential way to reduce the number of false positives. Even when limiting attention to the top 1,000 scores, the probability of a high-risk individual being subsequently involved in gun violence (either as a victim or suspect) was only around 3%. When widening the tool to examine the top 5,000 scores, its sensitivity increased, capturing around 60% of those involved in gun violence in the next year, but the PPV decreased to around 1%. When predicting a rare outcome, one must accept a large number of false positives to capture even a modest number of true positives (Berk, 2011); it may be the case that incorporating officer judgment can further improve the accuracy of the VOID instrument, either within the top scores or by incorporating further information to identify high-risk individuals not flagged by their prior police contacts.

A second limitation of the article is that we did not evaluate the effectiveness of the VOID program in reducing gun violence. Simply identifying the individuals at the most risk does not necessarily mean the police department will be effective in curbing their future gun violence. It is no doubt a difficult population to intervene with, and similar programs have not been found to be effective (Saunders, Hunt, & Hollywood, 2016). We have, however, validated that the tool has diagnostic utility in predicting those likely to be involved with future gun violence, which is presumably a first step before identifying whether a program effectively reduces future involvement with gun violence. The subjective judgment component of the VOID program in practice makes conducting outcome evaluations difficult, as those identified through intelligence will by design differ from those who were not chosen to be on the final focused list.

A final limitation is that we do not tackle ethical issues regarding the implementation of such a tool to identify chronic offenders. Although the police department under study does not use any demographic characteristics, such as age, race, or gender, to determine VOID scores, that does not mean the instrument does not produce disparate outcomes among disadvantaged groups. Using prior criminal history creates a large disparity in identifying young minority males with high VOID scores in this jurisdiction, and it is likely to do so in other jurisdictions if police departments adopt similar instruments (Lum & Isaac, 2016). Because of the small number of individuals associated with gun violence, and the fact that many encounters do not collect this information, it was not possible to conduct additional analyses on these person-based characteristics in this sample. Future research, though, should assess the fairness of police risk prediction instruments toward cognizable groups (in particular, minorities), alongside their accuracy (Berk, 2016; Corbett-Davies, Pierson, Feller, Goel, & Huq, 2017).

Relatedly, the VOID tool incorporates items that can reasonably be deemed out of the control of the individual, such as prior victimizations. Most police departments implementing chronic offender lists will likely use arrest histories readily available in police records, but these will often include instances in which charges were never ultimately filed against the individual. Although arrest and victimization histories may be predictive of future involvement in gun violence, it is an ethical question whether such markers should be used by the police to intervene with such individuals, as any intervention may infringe on an individual's rights. This critique applies equally to the use of field stops as a component in the predictive system, which can create a feedback loop in which individuals are continually targeted as high risk based solely on prior discretionary police behavior (Ferguson, 2017).

Because the tool predicts violent offending as well as victimization, it raises additional questions about what the police should do with the predictions. This department uses service-based approaches as well as traditional law enforcement tactics. Because the approaches are used simultaneously in this jurisdiction, we cannot say whether service-based approaches (which mitigate most of the ethical critiques of using the predictions) would work independent of traditional policing tactics (Papachristos, 2016). We do, however, demonstrate that the tool is effective in predicting who is at high risk of future violent gun offending or gun victimization, and as such it could be used for either strategy.

Finally, the subjective decision-making component of the VOID program, which identifies the top 10 "in crisis," adds an additional stage that could result in the disproportionate targeting of vulnerable groups. Although there are still current debates about what exactly it means to be fair when making such predictions (Berk, 2016), police departments will need to address this problem alongside the question of whether such chronic offender programs are ultimately effective in reducing future violence.

Notes

1.
While this database had overlap with victims as recorded in the police department's records management system, the Crime Analysis Center's (CAC) shooting database had a much larger number of listed suspects (presumably identified through intelligence not recorded electronically in the records management system).

2.
The traditional logistic regression models were fit in SPSS, the rare events logistic regression models were fit using the Zelig R library (Choirat et al., 2018), and the generalized boosted regression was fit using the gbm R library (Ridgeway, 2013). The gbm implementation is an ensemble method built from decision trees, an approach shown to perform better at predicting future acts of murder (Berk, Sherman, Barnes, Kurtz, & Ahlman, 2009; Neuilly, Zgoba, Tita, & Lee, 2011). We did not include a comparison with random forests, as the dataset was too large and caused out-of-memory errors when using the R randomForest library (Liaw & Wiener, 2002).
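The boosting idea behind gbm can be sketched in a few lines: each stage fits a small tree (here a one-split "stump") to the current residuals, and the ensemble prediction is the shrunken sum over stages. The toy code below (Python rather than the R gbm library used in the article) illustrates the mechanism only; it is not the gbm implementation itself.

```python
# Toy gradient boosting with one-split decision "stumps": each stage fits the
# current residuals, and predictions are the learning-rate-shrunken sum.
def stump_fit(x, resid):
    """Best single-split stump on 1-D feature x predicting residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, resid) if xi <= t]
        right = [r for xi, r in zip(x, resid) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, n_stages=50, lr=0.1):
    """Fit n_stages stumps sequentially to residuals; return the ensemble."""
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(n_stages):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = stump_fit(x, resid)
        stumps.append(s)
        pred = [pi + lr * s(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(lr * s(xi) for s in stumps)

# A step function that a single stump can represent; boosting recovers it.
model = boost([0, 1, 2, 3, 4, 5, 6, 7], [0, 0, 0, 0, 1, 1, 1, 1])
```

Real implementations such as gbm use deeper trees, many features, and loss functions suited to binary outcomes, but the stagewise fit-to-residuals structure is the same.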

3.
The PPV is calculated as TP/(TP + FP), where TP is the number of true positives and FP is the number of false positives. Sensitivity is calculated as TP/(TP + FN), where FN is the number of false negatives. The diagnostic odds ratio is calculated as (TP × TN)/(FP × FN), where TN is the number of true negatives.
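These three measures are straightforward to compute from confusion-matrix counts. The counts below are illustrative only, chosen in the spirit of the top-1,000 analysis (roughly 35 true positives among 1,000 flagged individuals, 103 outcomes overall, about 200,000 scored).

```python
# The three diagnostic measures from Note 3, computed from confusion counts.
def diagnostics(tp, fp, fn, tn):
    ppv = tp / (tp + fp)            # positive predictive value
    sensitivity = tp / (tp + fn)    # true positive rate
    dor = (tp * tn) / (fp * fn)     # diagnostic odds ratio
    return ppv, sensitivity, dor

# Illustrative counts only, not the article's exact figures.
ppv, sens, dor = diagnostics(tp=35, fp=965, fn=68, tn=198_932)
print(f"PPV={ppv:.1%}, sensitivity={sens:.1%}, DOR={dor:.0f}")
```

Note how a PPV of only a few percent can coexist with a very large diagnostic odds ratio when the outcome is this rare.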

4.
It would also be possible to select an optimal cut-off if the police department could articulate a cost for false positives and false negatives (or at least a relative cost between the two, for example, that a false negative is 100 times more costly than a false positive). As in many applications, however, even this relative estimate is difficult to make, so it was not an option in the current analysis.
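If such a relative cost could be stated, the cut-off choice reduces to a small optimization over candidate list sizes. A hypothetical sketch, reusing a Figure 2-style cumulative capture curve:

```python
# Sketch of Note 4's idea: given a relative cost (a false negative is
# cost_ratio times as costly as a false positive), pick the list size that
# minimizes expected cost.
def best_cutoff(capture, total_outcomes, cost_ratio):
    """capture[n-1] = true positives among the top n scores."""
    best_n, best_cost = None, float("inf")
    for n, tp in enumerate(capture, start=1):
        fp = n - tp                    # flagged but never involved
        fn = total_outcomes - tp       # involved but not flagged
        cost = fp + cost_ratio * fn
        if cost < best_cost:
            best_n, best_cost = n, cost
    return best_n

# With a toy capture curve, a high cost on misses pushes toward a wide net.
print(best_cutoff([1, 1, 2, 2, 3], total_outcomes=3, cost_ratio=100))  # 5
```

Lowering `cost_ratio` below 1 (false positives costlier than misses) pushes the optimal list size back toward the top of the ranking.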

ORCID iD
Andrew P. Wheeler https://orcid.org/0000-0003-2255-1316

Supplemental Material
Supplemental Material is available in the online version of this article at http://journals.sagepub.com/home/cjb.

References

Abrahamse, A. F., Ebener, P. A., Greenwood, P. W., Fitzgerald, N., Kosin, T. E. (1991). An experimental evaluation of the Phoenix repeat offender program. Justice Quarterly, 8, 141-168.
Andrews, D. A., Bonta, J. (1995). The Level of Service Inventory—Revised. Toronto, Ontario, Canada: Multi-Health Systems.
Andrews, D. A., Bonta, J. (2006). The psychology of criminal conduct (4th ed.). Cincinnati, OH: Anderson.
Andrews, D. A., Bonta, J., Wormith, S. J. (2006). The recent past and near future of risk and/or need assessment. Crime & Delinquency, 52, 7-27.
Berk, R. (2011). Asymmetric loss functions for forecasting in criminal justice settings. Journal of Quantitative Criminology, 27, 107-123.
Berk, R. (2016, May). A primer on fairness in criminal justice risk assessments. The Criminologist, 41(6), 6-9.
Berk, R., Sherman, L., Barnes, G., Kurtz, E., Ahlman, L. (2009). Forecasting murder within a population of probationers and parolees: A high stakes application of statistical learning. Journal of the Royal Statistical Society: Series A (Statistics in Society), 172, 191-211.
Borum, R., Bartel, P., Forth, A. (2003). SAVRY: Structured Assessment of Violence Risk in Youth (Consultation version). Tampa: Florida Mental Health Institute, University of South Florida.
Braga, A., Papachristos, A. V., Hureau, D. M. (2010). The concentration and stability of gun violence at micro places in Boston, 1980-2008. Journal of Quantitative Criminology, 26, 33-53.
Campbell, M. A., French, S., Gendreau, P. (2009). The prediction of violence in adult offenders: A meta-analytic comparison of instruments and methods of assessment. Criminal Justice and Behavior, 36, 567-590.
Choirat, C., Gandrud, G., Honaker, J., Imai, K., King, G., Lau, O. (2018). Zelig: Everyone’s statistical software. Available from http://zeligproject.org/
Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., Huq, A. (2017). Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 797-806). Halifax, Nova Scotia, Canada: Association for Computing Machinery.
Davis, J., Goadrich, M. (2006). The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning (pp. 233-240). Pittsburgh, PA: Association for Computing Machinery.
Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34, 571-582.
Dolan, M., Doyle, M. (2000). Violence risk prediction: Clinical and actuarial measures and the role of the Psychopathy Checklist. The British Journal of Psychiatry, 177, 303-311.
Douglas, K. S., Yeomans, M., Boer, D. P. (2005). Comparative validity analysis of multiple measures of violence risk in a sample of criminal offenders. Criminal Justice and Behavior, 32, 479-510.
Dressel, J., Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1), eaao5580.
Fazel, S., Singh, J. P., Doll, H., Grann, M. (2012). Use of risk assessment instruments to predict violence and antisocial behaviour in 73 samples involving 24,827 people: Systematic review and meta-analysis. British Medical Journal, 345, Article e4692.
Ferguson, A. G. (2017). The rise of big data policing: Surveillance, race, and the future of law enforcement. New York: New York University Press.
Fowler, K. A., Dahlberg, L. L., Haileyesus, T., Annest, J. L. (2015). Firearm injuries in the United States. Preventive Medicine, 79, 5-14.
Glover, A. J., Nicholson, D. E., Hemmati, T., Bernfeld, G. A., Quinsey, V. L. (2002). A comparison of predictors of general and violent recidivism among high-risk federal offenders. Criminal Justice and Behavior, 29, 235-249.
Goel, S., Rao, J. M., Shroff, R. (2016). Precinct or prejudice? Understanding racial disparities in New York City’s stop-and-frisk policy. The Annals of Applied Statistics, 10, 365-394.
Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12, 19-30.
Hanson, R. K. (2009). The psychological assessment of risk for crime and violence. Canadian Psychology/Psychologie Canadienne, 50, 172-182.
Harcourt, B. (2007). Against prediction: Profiling, policing, and punishing in an actuarial age. Chicago, IL: The University of Chicago Press.
Hilton, N. Z., Harris, G. T., Rice, M. E., Houghton, R. E., Eke, A. W. (2008). An in-depth actuarial assessment for wife assault recidivism: The domestic violence risk appraisal guide. Law and Human Behavior, 32, 150-163.
Hilton, N. Z., Harris, G. T., Rice, M. E., Lang, C., Cormier, C. A., Lines, K. J. (2004). A brief actuarial assessment for the prediction of wife assault recidivism: The Ontario Domestic Assault Risk Assessment. Psychological Assessment, 16, 267-275.
Jennings, W. G. (2006). Revisiting prediction models in policing: Identifying high-risk offenders. American Journal of Criminal Justice, 31, 35-50.
Jennings, W. G., Piquero, A. R., Reingle, J. M. (2012). On the overlap between victimization and offending: A review of the literature. Aggression and Violent Behavior, 17, 16-26.
Jung, J., Concannon, C., Shroff, R., Goel, S., Goldstein, D. G. (2017). Simple rules for complex decisions. Social Science Research Network. doi:10.2139/ssrn.2919024
Kahneman, D. (2011). Thinking, fast and slow. New York, NY: Farrar, Straus and Giroux.
King, G., Zeng, L. (2001). Logistic regression in rare events data. Political Analysis, 9, 137-163.
Kroner, D. G., Mills, J. F., Reddon, J. R. (2005). A coffee can, factor analysis, and prediction of antisocial behavior: The structure of criminal risk. International Journal of Law and Psychiatry, 28, 360-374.
Kropp, P. R., Hart, S. D. (2004). The development of the Brief Spousal Assault Form for the Evaluation of Risk (B-SAFER): A tool for criminal justice professionals. Ottawa, Ontario: Department of Justice, Government of Canada. Retrieved from http://canada.justice.gc.ca/eng/rp-pr/fl-lf/famil/rr05_fv1-rr05_vf1/rr05_fv1.pdf
Krzanowski, W. J., Hand, D. J. (2009). ROC curves for continuous data. London, England: Chapman and Hall.
Liaw, A., Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.
Lum, K., Isaac, W. (2016). To predict and serve? Significance, 13(5), 14-19.
Martin, S. E., Sherman, L. W. (1986). Catching career criminals: Proactive policing and selective apprehension. Justice Quarterly, 3, 171-192.
Meehl, P. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis: University of Minnesota Press.
Messing, J. T., Campbell, J., Wilson, S. J., Brown, S., Patchell, B. (2017). The lethality screen: The predictive validity of an intimate partner violence risk assessment for use by first responders. Journal of Interpersonal Violence, 32, 205-226.
Messing, J. T., Thaller, J. (2013). The average predictive validity of intimate partner violence risk assessment instruments. Journal of Interpersonal Violence, 28, 1537-1558.
Miller, J., Maloney, C. (2013). Practitioner compliance with risk/needs assessment tools: A theoretical and empirical assessment. Criminal Justice and Behavior, 40, 716-736.
Mills, J. F., Kroner, D. G., Hemmati, T. (2003). Predicting violent behavior through a static-stable variable lens. Journal of Interpersonal Violence, 18, 891-904.
Monahan, J., Skeem, J. L. (2016). Risk assessment in criminal sentencing. Annual Review of Clinical Psychology, 12, 489-513.
Neuilly, M. A., Zgoba, K. M., Tita, G. E., Lee, S. S. (2011). Predicting recidivism in homicide offenders using classification tree analysis. Homicide Studies, 15, 154-176.
Nofziger, S., Kurtz, D. (2005). Violent lives: A lifestyle model linking exposure to violence to juvenile violent offending. Journal of Research in Crime and Delinquency, 42, 3-26.
Papachristos, A. V. (2016, July 29). Commentary: CPD’s crucial choice: Treat its list as offenders or as potential victims? Chicago Tribune. Retrieved from http://www.chicagotribune.com/news/opinion/commentary/ct-gun-violence-list-chicago-police-murder-perspec-0801-jm-20160729-story.html
Papachristos, A. V., Braga, A. A., Hureau, D. M. (2012). Social networks and the risk of gunshot injury. Journal of Urban Health, 89, 992-1003.
Papachristos, A. V., Kirk, D. S. (2015). Changing the street dynamic: Evaluating Chicago’s group violence reduction strategy. Criminology & Public Policy, 14, 525-558.
Pear, V. A., Castillo-Carniglia, A., Kagawa, R. M., Cerdá, M., Wintemute, G. J. (2018). Firearm mortality in California, 2000-2015: The epidemiologic importance of within-state variation. Annals of Epidemiology, 28, 309-315.e2.
Piquero, A., MacDonald, J., Dobrin, A., Daigle, L., Cullen, F. T. (2005). Self-control, violent offending, and homicide victimization: Assessing the general theory of crime. Journal of Quantitative Criminology, 21, 55-71.
Quinsey, V. L., Harris, G. T., Rice, M. E., Cormier, C. A. (2006). The law and public policy: Violent offenders: Appraising and managing risk. Washington, DC: American Psychological Association.
Ridgeway, G. (1999). The state of boosting. Computing Science and Statistics, 31, 172-181.
Ridgeway, G. (2013). gbm: Generalized Boosted Regression Models (R Package Version 2.1).
Rozel, J. S., Mulvey, E. P. (2017). The link between mental illness and firearm violence: Implications for social policy and clinical practice. Annual Review of Clinical Psychology, 13, 445-469.
Saunders, J., Hunt, P., Hollywood, J. S. (2016). Predictions put into practice: A quasi-experimental evaluation of Chicago’s predictive policing pilot. Journal of Experimental Criminology, 12, 347-371.
Singh, J. P. (2013). Predictive validity performance indicators in violence risk assessment: A methodological primer. Behavioral Sciences & the Law, 31, 8-22.
Singh, J. P., Desmarais, S. L., Van Dorn, R. A. (2013). Measurement of predictive validity in violence risk assessment studies: A second-order systematic review. Behavioral Sciences & the Law, 31, 55-73.
Spelman, W. (1990). Repeat offender programs for law enforcement. Washington, DC: Police Executive Research Forum.
Spelman, W. (1995). Criminal careers of public places. Crime Prevention Studies, 4, 115-144.
Storey, J. E., Kropp, P. R., Hart, S. D., Belfrage, H., Strand, S. (2014). Assessment and management of risk for intimate partner violence by police officers using the Brief Spousal Assault Form for the Evaluation of Risk. Criminal Justice and Behavior, 41, 256-271.
Tetlock, P. E., Gardner, D. (2015). Superforecasting: The art and science of prediction. New York, NY: Crown Publishers.
Tollenaar, N., van der Heijden, P. G. M. (2013). Which method predicts recidivism best? A comparison of statistical, machine learning and data mining predictive models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176, 565-584.
Webster, C. D., Eaves, D., Douglas, K. S., Wintrup, A. (1995). The HCR-20 scheme: The assessment of dangerousness and risk. Vancouver, British Columbia, Canada: Simon Fraser University, Forensic Psychiatric Services Commission of British Columbia.
Yang, M., Wong, S. C., Coid, J. (2010). The efficacy of violence prediction: A meta-analytic comparison of nine risk assessment tools. Psychological Bulletin, 136, 740-767.
Zeng, J., Ustun, B., Rudin, C. (2017). Interpretable classification models for recidivism prediction. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180, 689-722.

Andrew P. Wheeler is an assistant professor of criminology at the University of Texas at Dallas in the School of Economic, Political & Policy Sciences. He received his doctoral degree in criminal justice from the University at Albany, State University of New York (SUNY). His research focuses on the spatial analysis of crime at micro places, evaluating police interventions to reduce crime, and practical problems faced by crime analysts.

Robert E. Worden is the director of the John F. Finn Institute for Public Safety, Inc., and an associate professor of criminal justice at the University at Albany, SUNY. He holds a PhD in political science from the University of North Carolina at Chapel Hill. His scholarship has appeared in Justice Quarterly, Criminology, Law & Society Review, and other academic journals, and it has been funded by the National Institute of Justice, the Bureau of Justice Assistance, the Laura and John Arnold Foundation, the New York State Division of Criminal Justice Services, and other sponsors. He served on the National Research Council’s Committee to Review Research on Police Policies and Practices, whose report, Fairness and Effectiveness in Policing: The Evidence, was published by the National Academies Press in 2004. He is the coauthor (with Sarah J. McLean) of Mirage of Police Reform. In 2018, he was recognized by the Police Section of the Academy of Criminal Justice Sciences with the O.W. Wilson Award for outstanding contributions to police education, research, and practice.

Jasmine R. Silver is an assistant professor at Rutgers University, Newark. She holds a PhD in criminal justice from the University at Albany, SUNY, and her work centers broadly on the roles of ideology and moral intuition in shaping perceptions of crime and justice. Her published research appears in Justice Quarterly, Law and Society Review, Law and Human Behavior, Criminology, and Punishment and Society.

