Thursday 19 January 2017

More on the SPLC's hate crime data

Last week, I posted about the Southern Poverty Law Center's post-election hate crime report. The post showed that Hillary vote share is positively associated with SPLC hate crimes per capita across US states, but is not associated with a measure of hate crimes per capita based on FBI data. This suggests that the SPLC data might have included a non-trivial number of false reports.

However, as Emil Kirkegaard pointed out, the FBI hate crime measure might itself be invalid, i.e., it might not be a good measure of the true frequency of hate crimes. To check this, and to further investigate the possibility of false reports in the SPLC data, I obtained two measures of racial prejudice and one objective measure of violent crime. The two measures of racial prejudice are N-word search frequency (taken from Google) and years with anti-miscegenation laws (taken from the Washington Post). The measure of violent crime is simply the homicide rate, taken from the FBI.

Consistent with the claim that the FBI measure is invalid, it was negatively correlated with both measures of racial prejudice, as well as with the homicide rate. Thus, rather than capturing the true frequency of hate crimes, it probably just picks up state-level differences in how certain crimes are reported.


However, all three new measures were also negatively correlated with SPLC hate crimes per capita, which calls into question the veracity of the SPLC's data. Results are shown below.


Finally, Hillary vote share is not significantly correlated with any of the three new measures (but is positively correlated with SPLC hate crimes per capita).


Incidentally, homicide rate was strongly associated with both N-word search frequency (r = .61) and years with anti-miscegenation laws (r = .56), but these two variables were only weakly associated with one another (r = .25).

Thursday 5 January 2017

Were there false reports in the SPLC's post-election hate crime data?

On 29 November, the Southern Poverty Law Center (SPLC) reported that there had been 867 hate crimes in the United States since the election result was announced. The SPLC's report provided a breakdown of these 867 hate crimes by state. The authors of the report noted the following:
The 867 hate incidents described here come from two sources — submissions to the #ReportHate page on the SPLC website and media accounts. Incidents were limited to real-world events; the count does not include instances of online harassment. We have excluded incidents that authorities have determined to be hoaxes; however, it was not possible to confirm the veracity of all reports.
Yesterday on Twitter I came across a plot showing a relatively strong negative relationship between SPLC hate crimes per capita and Trump vote share across US states. (Unfortunately, I can't seem to find the link again, so apologies to whoever made the plot.) In other words, there were more reported hate crimes in states with fewer Trump voters. This suggests that the SPLC might not have been able to exclude all hoaxes or false reports from their data.

At the suggestion of my friend Roberto Cerina, I investigated this possibility further by comparing the SPLC data to state-level hate crime data from the FBI's Uniform Crime Reports. Specifically, I obtained hate crimes per capita in each state for each year from 2011 to 2015 (as well as Hillary vote share in 2016 for each state). Washington DC was excluded, due to being an outlier. No FBI data were available for Hawaii. 

The correlations between the FBI measures of hate crimes per capita in different years were all very strong, ranging from r = .78 to r = .92 (p < 0.001 in all cases). This suggests that there is a high degree of persistence in hate crimes per capita from year to year: states with more hate crimes per capita in one year tend to have more hate crimes per capita in the next year. However, the correlations between the FBI measures and the SPLC measure of hate crimes per capita were much weaker, ranging from r = .17 to r = .38 (some significant, some not). Given the high degree of persistence from year to year in the FBI data, I computed the average FBI hate crimes per capita from 2011 to 2015. This variable was correlated with SPLC hate crimes per capita at r = .28 (p = 0.049).
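
As a minimal sketch of the two calculations described above (correlating one year's per-capita rates with another's, and averaging the rates across years), the following uses made-up numbers rather than the actual FBI figures. The function names `pearson_r` and `average_rate` are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def average_rate(rates_by_year):
    """Average each state's rate across years (one inner list per year)."""
    n_years = len(rates_by_year)
    return [sum(state_rates) / n_years for state_rates in zip(*rates_by_year)]

# Hypothetical per-capita rates for four states in two years (not real data).
fbi_2014 = [1.0, 2.0, 3.0, 4.0]
fbi_2015 = [1.5, 2.5, 2.5, 4.5]

fbi_avg = average_rate([fbi_2014, fbi_2015])
print("year-to-year r:", pearson_r(fbi_2014, fbi_2015))
print("average rates:", fbi_avg)
```

With the real data, the same correlation would be run once between each pair of years, and once between the multi-year average and the SPLC measure.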

To investigate the possibility of false reporting, I first looked to see whether Hillary vote share or average FBI hate crimes per capita is a better predictor of SPLC hate crimes per capita. Results are shown below:


Consistent with the plot I saw on Twitter, Hillary vote share is strongly associated with SPLC hate crimes per capita. When average FBI hate crimes per capita is added to the model, it enters non-significantly, and Hillary vote share remains a strong predictor. Next, I looked to see whether Hillary vote share is correlated with hate crimes per capita in previous years, as reported by the FBI. Results are shown below:





Hillary vote share is not associated with hate crimes per capita in 2015, 2014, 2013 or 2012. By contrast, the previous year's hate crimes per capita is strongly associated with hate crimes per capita in each year. (It should be noted that there was a marginally significant correlation between Hillary vote share and hate crimes per capita in 2011: r = .26, p = 0.068.) In other words, SPLC hate crimes per capita is better predicted by Hillary vote share than by FBI hate crimes per capita, whereas FBI hate crimes per capita in any given year is better predicted by the previous year's (or the average) hate crimes per capita.

As an alternative strategy, I regressed SPLC hate crimes per capita on average FBI hate crimes per capita, and saved the residuals. I then plotted the residuals against Hillary vote share. Note that the residual for a particular state quantifies how many more SPLC hate crimes that state had relative to the number that would have been expected based on previous years' hate crimes (as reported by the FBI). The plot is shown below:


There is a relatively strong positive correlation. In states with more Hillary votes, there were more hate crimes reported by the SPLC relative to the number that would have been expected based on previous years' hate crimes.
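
The residual step described above can be sketched as follows. This is only an illustration under stated assumptions: the input values are invented, the function name `ols_residuals` is hypothetical, and a simple one-predictor least-squares fit is assumed.

```python
def ols_residuals(y, x):
    """Residuals from a least-squares regression of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (
        sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        / sum((xi - mean_x) ** 2 for xi in x)
    )
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus the value predicted from x.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical per-capita values for four states (not real data).
fbi_avg = [1.0, 2.0, 3.0, 4.0]
splc = [2.0, 2.5, 5.0, 4.5]

resid = ols_residuals(splc, fbi_avg)
print(resid)
```

A positive residual marks a state with more SPLC reports than its FBI average would predict. These residuals are what would then be plotted against Hillary vote share.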

Please note that this analysis in no way proves that the SPLC were unable to exclude a large number of hoaxes or false reports, or even a small number. Rather, it is merely suggestive. Comments are welcome, and all data are available upon request.

Tuesday 3 January 2017

How much room for renegotiation did Cameron have prior to the EU referendum?

Prior to the EU referendum last year, David Cameron attempted to renegotiate the terms of Britain's membership with the leaders of other EU member states. Many commentators (though not all) regarded his renegotiation as a failure, arguing that he achieved very little of substance. It is therefore interesting to pose the question, could he feasibly have obtained more concessions from other EU leaders?

I came across some interesting polling, which suggests the answer is almost certainly not. In February of 2016, Lord Ashcroft polled citizens of all 27 other EU member states. First, he asked them:
As you may know, the United Kingdom will have a referendum within the next two years to decide whether or not to remain a member of the European Union. Would you prefer to see the UK remain a member of the EU, or would you prefer the UK to leave, or does it not matter to you either way?
Results were as follows:


A majority in almost every member state said they would prefer the UK to remain a member of the EU. On average, 60% answered "Remain", 30% answered "Doesn't matter", and only 10% answered "Leave". However, Lord Ashcroft also asked EU citizens:
The UK Prime Minister, David Cameron, is negotiating with the leaders of other EU countries to change the terms of the UK's membership of the EU. Which of the following statements comes closest to your view? It is important that the UK should remain a member of the EU. If the UK does not like the terms of EU membership it should leave.
Results were as follows:


A majority in most member states said they would prefer the UK to leave the EU, rather than to renegotiate the terms of its membership. On average, 57% answered, "If the UK does not like the terms of EU membership it should leave", while only 43% answered, "It is important that the UK should remain a member of the EU".

Monday 7 November 2016

What would the electoral college have looked like in 2012 if illegal immigrants could vote?

According to Pew Research, the states with the largest illegal immigrant populations are as follows:


Of these states, Romney won Texas, Georgia, Arizona and North Carolina in the 2012 election, garnering him 80 electoral college votes (38, 16, 11 and 15 respectively). Would he have still won them if illegal immigrants could vote?

Most illegal immigrants are Hispanics. 71% of Hispanics voted for Obama in 2012, while 27% voted for Romney. The Hispanic turnout rate was 48%. Therefore, the results in Texas, Georgia, Arizona and North Carolina would have been as follows if illegal immigrants could vote:


Romney would probably still have won all four of these states. However, his margins of victory would have been quite substantially reduced. 
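
The arithmetic behind this counterfactual can be sketched as below. The turnout rate and vote shares are the ones quoted above. The vote totals and population figure in the example call are placeholders rather than real state data, the function name is hypothetical, and the sketch assumes the whole estimated illegal immigrant population is treated as eligible.

```python
HISPANIC_TURNOUT = 0.48  # Hispanic turnout rate quoted above
OBAMA_SHARE = 0.71       # share of Hispanic voters choosing Obama
ROMNEY_SHARE = 0.27      # share of Hispanic voters choosing Romney

def counterfactual_votes(romney_votes, obama_votes, illegal_population):
    """Add the votes illegal immigrants would cast under the stated assumptions."""
    new_voters = illegal_population * HISPANIC_TURNOUT
    return (
        romney_votes + new_voters * ROMNEY_SHARE,
        obama_votes + new_voters * OBAMA_SHARE,
    )

# Placeholder inputs for an imaginary state (not real data).
romney, obama = counterfactual_votes(1_000_000, 900_000, 100_000)
margin = romney - obama
print(f"Romney {romney:,.0f}, Obama {obama:,.0f}, margin {margin:,.0f}")
```

In this made-up case the Romney margin shrinks from 100,000 to roughly 79,000 votes without changing sign, which is the same pattern described for the four states.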

Tuesday 4 October 2016

How much does the pill increase the risk of depression?

A recent article in The Guardian claims that "The pill is linked to depression – and doctors can no longer ignore it". It does so on the basis of a new study published in JAMA Psychiatry, which tracked all women aged 15-34 in Denmark over a number of years, and measured (among other things) their use of oral contraceptives, and their use of anti-depressants. The study's headline results were that: 
a) women who used oral contraceptives had a 23% higher risk of using anti-depressants for the first time 
b) adolescent females who used oral contraceptives had an 80% higher risk of using anti-depressants for the first time 
The author of the Guardian article seems to regard these results as highly alarming. Yet there are several reasons not to be too consternated. 

First, from what I can tell, the JAMA Psych study was largely correlational, rather than causal: only age and year were controlled for in the main analyses. Second, the authors also measured women's diagnoses of depression, and, for that outcome, the increase in risk associated with oral contraceptive use was only ~10% for women overall (but it was still around 70% for adolescents). Third, and most importantly, the effect sizes do not appear to be very large.

The crude incidence rate for women overall was ~1.7 per 100 person-years. In other words, if 100 women who were not using oral contraceptives lived for 1 year, 1.7 of them would be expected to use anti-depressants for the first time during that year. The results above imply that this figure rises to 2.1 for women using oral contraceptives. The crude incidence rate for adolescents was ~0.9 per 100 person-years. So if 100 adolescents who were not using oral contraceptives lived for 1 year, 0.9 of them would be expected to use anti-depressants for the first time during that year. The results above imply that this figure rises to 1.7 for adolescents using oral contraceptives. (The crude incidence rates for diagnoses of depression were much lower.)
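
The conversion from relative to absolute risk used above can be sketched as follows. The function name is hypothetical. Because the baseline rates are only quoted approximately (~1.7 and ~0.9), the outputs may differ slightly from the figures in the text.

```python
def exposed_rate(baseline_per_100, relative_increase):
    """Incidence per 100 person-years after applying a relative risk increase."""
    return baseline_per_100 * (1 + relative_increase)

# Baselines and relative increases as quoted above.
women = exposed_rate(1.7, 0.23)        # 23% higher risk for women overall
adolescents = exposed_rate(0.9, 0.80)  # 80% higher risk for adolescents

print(f"women overall: {women:.2f} per 100 person-years (+{women - 1.7:.2f})")
print(f"adolescents:   {adolescents:.2f} per 100 person-years (+{adolescents - 0.9:.2f})")
```

On these inputs, the absolute increase is well under one additional first-time user per 100 person-years in both groups.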

Sunday 10 July 2016

Is county-level racial bias in police shootings unrelated to race-specific crime rates?

Yesterday, I suggested that the overrepresentation of blacks among the victims of police shootings may not be primarily attributable to racial animus on the part of police officers. In response, a paper by Ross (2015; PLOS ONE), which claims the following, was brought to my attention:
There is no relationship between county-level racial bias in police shootings and crime rates (even race-specific crime rates), meaning that the racial bias observed in police shootings in this data set is not explainable as a response to local-level crime rates.
To his credit, Ross made his dataset publicly available (see S1_File.zip in the Supporting Information), so I carried out a few analyses on it myself. 

The total number of victims in the dataset with known race is 647. There are 260 counties with at least one police shooting of a victim with known race (8% of total counties). 23% of counties have 100% black victims, 55% of counties have 100% white victims, and the remaining 23% have some mix of black and white victims. The mean percentage of victims who are black is 33%. Furthermore, 90% of counties have five or fewer victims, 80% have three or fewer, and 57% have exactly one victim; the mean number of victims per county is 2.5. 

The low number of victims per county seems to me to be a serious limitation of Ross's analysis. How can one reliably model an aggregate-level variable (such as the ratio of black to white victims, or the proportion of victims who are black) that, for most counties, is based on only one or two observations? 

I calculated the 95% confidence interval for the proportion of victims who are black in all 260 counties. (Thanks to Emil Kirkegaard for calculating confidence intervals for cases where p = 1 using prop.test in R.) This confidence interval overlapped with p = .13 (the proportion of blacks in the general population) in 87% of counties. And when restricting the analysis to unarmed victims, the confidence interval for the proportion of victims who are black overlapped with p = .13 in 89% of counties. In other words, for >85% of counties, one cannot rule out that the true proportion of victims who are black is equal to the proportion of blacks in the general population.
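
To illustrate why so many intervals overlap with p = .13, here is a minimal sketch of a continuity-corrected Wilson score interval, which, as far as can be told from its documentation, is the kind of interval R's prop.test reports. The function name `prop_ci` is hypothetical, and the fixed correction of 0.5 is a simplifying assumption (prop.test can shrink it in some cases).

```python
from math import sqrt
from statistics import NormalDist

def prop_ci(successes, n, conf_level=0.95, yates=0.5):
    """Wilson score interval with continuity correction for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    p_hat = successes / n
    z22n = z ** 2 / (2 * n)

    def bound(p_c, sign):
        half = z * sqrt(p_c * (1 - p_c) / n + z22n / (2 * n))
        return (p_c + z22n + sign * half) / (1 + 2 * z22n)

    p_up = p_hat + yates / n  # corrected estimate for the upper bound
    p_lo = p_hat - yates / n  # corrected estimate for the lower bound
    upper = 1.0 if p_up >= 1 else bound(p_up, +1)
    lower = 0.0 if p_lo <= 0 else bound(p_lo, -1)
    return lower, upper

# A county with a single victim, who is black (observed proportion = 1).
lower, upper = prop_ci(1, 1)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
print("overlaps .13:", lower <= 0.13 <= upper)
```

For a county with one victim, the interval runs from about .05 to 1, so it includes .13 even though the observed proportion is 1.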

The limitation of low numbers of cases per county was not lost on Ross. In the Methods section he notes that:
County-level police shooting rates are estimated using binomial probabilities, and a prior, estimated from the data, under hierarchical partial pooling. Hierarchical pooling allows information collected in other counties within the United States to partially inform the parameter estimates in a focal county, which improves out-of-sample predictive inference globally... Prior to the introduction of multi-level modeling methods, relative risk ratios at local levels were very hard to infer... The multi-level Bayesian methods used here, partially (rather than fully) pool information across counties, allowing for more stable estimates in relative risk ratios
Not being familiar with these methods myself, I am not in a particularly strong position to judge their efficacy. But it seems to me that Ross's dependent variable will have been subject to considerable measurement error. Therefore, I'm not sure he can really be confident that there is no relationship between racial bias in police shootings and race-specific crime rates at the county level. This is not to say that there is a relationship, but simply that the evidence Ross presents for there not being one may not be very compelling. I would be interested to hear others' perspectives. (Stata .do file available upon request.)

Saturday 9 July 2016

Does racial animus explain killings of black civilians by US police?

This post examines the distribution of victims of police shootings by race, and by sex. Data on individuals killed by police were taken from the Washington Post database for 2015 and 2016. According to these data, over the last two years, 27% of those killed by police were black, and 39% of those killed by police while unarmed were black. Insofar as blacks represent only 13% of the US population, this implies that blacks are overrepresented among the victims of police shootings.

However, blacks are, for whatever reason, also overrepresented among the perpetrators of violent crime. It is possible that blacks are more likely to be killed by police because they are more likely to get into violent confrontations with police, or because police officers practice statistical discrimination. The chart below shows, from left to right, the racial distribution of: the US population (taken from the US Census Bureau); those killed by police; those killed by police while unarmed; murder offenders (taken from the FBI); and alleged police killers (taken from the FBI, and averaged over the last five years of available data to reduce sampling error).


Blacks represent 13% of the US population, 27% of those killed by police, 39% of those killed by police while unarmed, 53% of murder offenders, and 39% of alleged police killers. These figures suggest that blacks may not be overrepresented among the victims of police shootings once involvement in murder or police killings is adjusted for. By way of comparison, consider the corresponding distributions by sex, which are shown in the chart below. Men are much more likely to be killed by police than women. But they are about as likely to be killed by police as one would expect on the basis of their involvement in murder or police killings. 
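
The comparison for blacks can be made explicit by dividing the share among those killed by police by each candidate benchmark. This is only a sketch using the percentages quoted above. The dictionary keys and function name are hypothetical labels.

```python
# Black share (%) of each group, as quoted above.
black_share = {
    "population": 13,
    "killed_by_police": 27,
    "killed_unarmed": 39,
    "murder_offenders": 53,
    "police_killers": 39,
}

def representation_ratio(outcome, benchmark):
    """Share in the outcome group divided by share in the benchmark group."""
    return black_share[outcome] / black_share[benchmark]

for benchmark in ("population", "murder_offenders", "police_killers"):
    ratio = representation_ratio("killed_by_police", benchmark)
    print(f"killed by police vs {benchmark}: {ratio:.2f}")
```

A ratio above 1 indicates overrepresentation relative to the benchmark. The ratio is about 2.1 against the general population, but below 1 against both murder offenders and alleged police killers.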


Given the highly sensitive nature of the subject matter, some caveats are in order. First, I am not arguing that blacks deserve to be killed more by police than members of other races. Rather, I am simply pointing out that the overrepresentation of blacks among the victims of police shootings may not be primarily attributable to racial animus on the part of police officers. Second, I am not denying that there are cases of racially motivated violence against blacks by police officers. Such cases are of course deplorable. Third, I am not denying that there is a problem with police violence in the United States. The rate of police shootings in the US seems disproportionate even to the US's comparatively high homicide rate.

Finally, I am happy to send the dataset I have assembled to anyone who wants it.