Yesterday, I suggested that the overrepresentation of blacks among the victims of police shootings may not be primarily attributable to racial animus on the part of police officers. In response, a paper by Ross (2015; PLOS ONE), which claims the following, was brought to my attention:
There is no relationship between county-level racial bias in police shootings and crime rates (even race-specific crime rates), meaning that the racial bias observed in police shootings in this data set is not explainable as a response to local-level crime rates.
To his credit, Ross made his dataset publicly available (see S1_File.zip in the Supporting Information), so I carried out a few analyses on it myself.
The total number of victims in the dataset with known race is 647. There are 260 counties with at least one police shooting of a victim with known race (8% of total counties). 23% of counties have 100% black victims, 55% of counties have 100% white victims, and the remaining 23% have some mix of black and white victims. The mean percentage of victims who are black is 33%. Furthermore, 90% of counties have five or fewer victims, 80% have three or fewer, and 57% have exactly one victim; the mean number of victims per county is 2.5.
The low number of victims per county seems to me to be a serious limitation of Ross's analysis. How can one reliably model an aggregate-level variable (such as the ratio of black to white victims, or the proportion of victims who are black) that, for most counties, is based on only one or two observations?
I calculated the 95% confidence interval for the proportion of victims who are black in each of the 260 counties. (Thanks to Emil Kirkegaard for calculating confidence intervals for cases where p = 1 using prop.test in R.) This confidence interval overlapped with p = .13 (the proportion of blacks in the general population) in 87% of counties. And when restricting the analysis to unarmed victims, the confidence interval for the proportion of victims who are black overlapped with p = .13 in 89% of counties. In other words, for >85% of counties, one cannot rule out that the true proportion of victims who are black is equal to the proportion of blacks in the general population.
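To see how wide such intervals become when a county has only one or two victims, here is a minimal Python sketch. It is not the Stata or R code actually used for the figures above. It assumes the interval is the continuity-corrected Wilson score interval, which is my understanding of what R's prop.test reports by default for a single proportion, and the function names wilson_cc_interval and overlaps are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_cc_interval(x, n, conf_level=0.95):
    """Continuity-corrected Wilson score interval for x successes in n trials.

    Assumption: this mirrors the default one-sample interval of R's
    prop.test; it is an illustration, not the original analysis code.
    """
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    p_hat = x / n
    z22n = z ** 2 / (2 * n)
    denom = 1 + 2 * z22n

    def bound(p_c, sign):
        # Score-interval bound evaluated at the continuity-shifted estimate.
        root = sqrt(p_c * (1 - p_c) / n + z22n / (2 * n))
        return (p_c + z22n + sign * z * root) / denom

    p_up = p_hat + 0.5 / n  # estimate shifted up by the continuity correction
    p_lo = p_hat - 0.5 / n  # estimate shifted down by the continuity correction
    upper = 1.0 if p_up >= 1 else min(bound(p_up, +1), 1.0)
    lower = 0.0 if p_lo <= 0 else max(bound(p_lo, -1), 0.0)
    return lower, upper


def overlaps(x, n, p0=0.13):
    """True if the interval for x black victims out of n contains p0."""
    lower, upper = wilson_cc_interval(x, n)
    return lower <= p0 <= upper


# A county with one victim, who is black (x = 1, n = 1).
print(wilson_cc_interval(1, 1), overlaps(1, 1))
# A county with one victim, who is white (x = 0, n = 1).
print(wilson_cc_interval(0, 1), overlaps(0, 1))
```

Under these assumptions, a county with a single victim yields an interval that contains .13 whatever that victim's race, so such a county cannot count as evidence of a departure from the population proportion.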
The limitation of low numbers of cases per county was not lost on Ross. In the Methods section he notes that:
County-level police shooting rates are estimated using binomial probabilities, and a prior, estimated from the data, under hierarchical partial pooling. Hierarchical pooling allows information collected in other counties within the United States to partially inform the parameter estimates in a focal county, which improves out-of-sample predictive inference globally... Prior to the introduction of multi-level modeling methods, relative risk ratios at local levels were very hard to infer... The multi-level Bayesian methods used here, partially (rather than fully) pool information across counties, allowing for more stable estimates in relative risk ratios
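As a rough illustration of what partial pooling does, here is a minimal Python sketch. It is not Ross's model: it swaps in a much simpler beta-binomial shrinkage estimate, the county counts are made up, and the function name partial_pool and the prior_strength parameter are hypothetical. Each county's estimate is pulled toward the overall proportion, and counties with fewer victims are pulled harder.

```python
def partial_pool(counts, prior_strength=4.0):
    """Shrink each county's proportion toward the pooled proportion.

    counts: list of (black_victims, total_victims) pairs, one per county.
    prior_strength: hypothetical number of pseudo-observations carried by
    the prior; larger values mean stronger pooling.
    """
    total_x = sum(x for x, _ in counts)
    total_n = sum(n for _, n in counts)
    pooled = total_x / total_n
    # Beta(a, b) prior centred on the pooled proportion.
    a = prior_strength * pooled
    b = prior_strength * (1 - pooled)
    # Posterior mean of a beta-binomial model for each county.
    return [(x + a) / (n + a + b) for x, n in counts]


# Made-up counts for four counties, purely for illustration.
example = [(1, 1), (0, 1), (0, 2), (3, 10)]
for (x, n), est in zip(example, partial_pool(example)):
    print(f"raw = {x / n:.2f}  partially pooled = {est:.2f}  (n = {n})")
```

The sketch shows the general mechanism only. The stability gained for small counties comes from borrowing information from other counties, not from additional observations in the county itself.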
Not being familiar with these methods myself, I am not in a particularly strong position to judge their efficacy. But it seems to me that Ross's dependent variable will have been subject to considerable measurement error. Therefore, I'm not sure he can really be confident that there is no relationship between racial bias in police shootings and race-specific crime rates at the county level. This is not to say that there is a relationship, but simply that the evidence Ross presents for there not being one may not be very compelling. I would be interested to hear others' perspectives. (Stata .do file available upon request.)