Blog Post #5

Ella Trembanis

2024/10/06

Demographics & Voter Files

However sophisticated their statistical techniques may be, even the greatest election forecasters can be humbled by hard-to-track changes in voter turnout. News reports unsurprisingly focus on the attention-grabbing headline story of voter behavior – whom undecided voters are inclined to support – and place less emphasis on the underlying issue of voter turnout. But of course, polls start to lose their sheen if they appear to be a poor proxy for the intentions of those who will ultimately cast their ballots.

Though Census data is an invaluable resource in analysts’ efforts to reconcile noisy, overinclusive data with recognized demographic patterns, it does not provide the full picture. Not only can the Census itself be overinclusive – non-citizens, among other non-voting residents, may be counted – but since it is conducted only once every ten years, its accuracy steadily declines between counts. In the aftermath of the COVID-19 pandemic, which shifted large numbers of workers into long-term remote work positions and gave some the opportunity to move across district and state lines, we should be particularly cautious about interpreting Census data as convincing evidence of who (still) lives where.

Voter files – though not a panacea by any means – help address some of these shortcomings. These state records give political campaigns the opportunity to target voters at a highly granular level, and allow forecasters to seek a second opinion about a population’s demographic characteristics, and by extension its likely turnout.

To kick off this week’s blog, I took a brief look at voter files from my home state of Delaware.

I first ran a linear regression with several demographic predictors to see which variables seemed to be the best predictors of turnout. The summarized output, below, interestingly suggests that race and party registration have little observable relationship with voter turnout: none of their coefficients is statistically significant at conventional levels.

## 
## Call:
## lm(formula = svi_vote_all_general_pres_pct ~ sii_age_range + 
##     sii_gender + sii_race + svi_party_registration + sii_education_level + 
##     sii_homeowner, data = vf_de)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -95.581 -24.629   9.771  24.059  85.261 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              30.7598    17.7714   1.731 0.083518 .  
## sii_age_rangeB           17.3342     1.5268  11.353  < 2e-16 ***
## sii_age_rangeC           22.6238     1.5491  14.605  < 2e-16 ***
## sii_age_rangeD           31.0418     1.3985  22.197  < 2e-16 ***
## sii_age_rangeE           36.6073     1.5246  24.011  < 2e-16 ***
## sii_age_rangeF           39.2695     1.6560  23.714  < 2e-16 ***
## sii_genderM              -4.2739     0.8270  -5.168 2.43e-07 ***
## sii_genderU              -7.8137     2.5018  -3.123 0.001796 ** 
## sii_raceB                 2.1753     3.1272   0.696 0.486697    
## sii_raceH                -1.9438     3.5801  -0.543 0.587172    
## sii_raceN                 2.8391    16.0844   0.177 0.859896    
## sii_raceO               -15.8210    24.8782  -0.636 0.524835    
## sii_raceU                -5.1535     6.9498  -0.742 0.458396    
## sii_raceW                 4.2272     2.9979   1.410 0.158565    
## svi_party_registrationD  10.1942    17.4745   0.583 0.559657    
## svi_party_registrationG  10.0633    21.0731   0.478 0.632989    
## svi_party_registrationL  13.2126    19.5199   0.677 0.498504    
## svi_party_registrationR   9.1524    17.4831   0.524 0.600641    
## svi_party_registrationT  -6.5079    39.0859  -0.167 0.867765    
## svi_party_registrationU   0.8759    17.4808   0.050 0.960041    
## svi_party_registrationW -24.1822    24.7141  -0.978 0.327872    
## sii_education_levelB     11.0025     0.9875  11.142  < 2e-16 ***
## sii_education_levelC     12.1717     1.3908   8.751  < 2e-16 ***
## sii_education_levelD      8.7111     4.7995   1.815 0.069565 .  
## sii_education_levelE      9.5529     1.7538   5.447 5.28e-08 ***
## sii_homeownerR           -7.4689     1.9879  -3.757 0.000173 ***
## sii_homeownerU           -5.8447     0.9846  -5.936 3.05e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 34.9 on 7423 degrees of freedom
##   (3180 observations deleted due to missingness)
## Multiple R-squared:  0.1814,	Adjusted R-squared:  0.1785 
## F-statistic: 63.25 on 26 and 7423 DF,  p-value: < 2.2e-16

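That party registration seems to play such a small role is particularly counterintuitive since Delaware is a reliably blue state, in which Republican voters might conceivably be less motivated to vote in non-competitive federal races. The standard errors on the party coefficients are large, however (roughly 17 to 39 points), so the model may simply be unable to distinguish the parties rather than showing that they turn out at similar rates. Further research is needed to draw any detailed conclusions, but perhaps bipartisan interest in more competitive down-ballot races is driving this distribution.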
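
To make the coefficients in the regression output above more concrete, here is a minimal Python sketch of how a dummy-coded model turns into a predicted value. The numbers are copied from the output; the function name `predicted_turnout` is my own, and I am assuming the outcome is measured in percentage points. Each categorical variable has one omitted reference level, so a voter at every reference level is predicted at the intercept alone.

```python
# Coefficients copied from the regression output above (assumed to be
# percentage points). Only the terms needed for this illustration appear.
COEFS = {
    "(Intercept)": 30.7598,
    "sii_age_rangeB": 17.3342,
    "sii_age_rangeF": 39.2695,
    "sii_education_levelB": 11.0025,
    "sii_homeownerR": -7.4689,
}


def predicted_turnout(active_terms):
    """Add the intercept to the coefficients of the dummy terms that apply.

    Any variable not listed stays at its reference (omitted) level and
    contributes nothing beyond the intercept.
    """
    return COEFS["(Intercept)"] + sum(COEFS[term] for term in active_terms)


baseline = predicted_turnout([])
oldest = predicted_turnout(["sii_age_rangeF"])
oldest_educ_b = predicted_turnout(["sii_age_rangeF", "sii_education_levelB"])

print(f"All reference levels:      {baseline:.2f}")
print(f"Age range F:               {oldest:.2f}")
print(f"Age range F + education B: {oldest_educ_b:.2f}")
```

Read this way, moving from the reference age bracket to the one coded F adds about 39 points to predicted turnout, holding the other variables fixed. With an R-squared of roughly 0.18, these are averages over a very noisy outcome, not individual-level forecasts.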

Figures I-III, below, summarize voter turnout rates across three of the variables the regression found to be highly significant: age, education level, and home ownership. In this regard, Delaware seems to be in lockstep with the conventional wisdom that older, wealthier (proxied here by home ownership), and more highly educated individuals are more likely to vote. Not only do these groups often have an easier time shouldering the opportunity costs of voting, but they may also have distinctive policy interests that motivate them to participate. For instance, retirees may find it easier to take time out of their days to go to the polls, and they may also be impassioned about defending Social Security.

This Week’s Prediction

This week’s model is an elaboration on the Time for More Change approach I proposed in last week’s blog. Time for More Change borrows its basic framework from Abramowitz’s Time for Change model, with a few tweaks made to account for the specific conditions of the 2024 race: it swaps out Biden’s June approval for Harris’s support in recent poll averages, incorporates the Index of Consumer Sentiment to acknowledge feelings – not just facts – about the state of the economy, and uses a binary variable to indicate participation in the previous administration.

Time for More Change predicts a Harris two-party popular vote share of 50.69%, with an upper bound of 54.22% and a lower bound of 47.15% at the 80% confidence level. This reflects the popular consensus that 2024 will be an exceptionally close contest, although even a very poor model could have stumbled into a prediction around the 50% mark.
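
One rough way to read that interval is to back out the standard error it implies. The sketch below assumes the interval is symmetric and approximately normal. That is my simplification for illustration: an interval from a small-sample regression would usually rest on a t distribution, which would imply a somewhat smaller standard error. Variable names are illustrative.

```python
from statistics import NormalDist

point = 50.69                  # predicted Harris two-party share (from the text)
lower, upper = 47.15, 54.22    # 80% interval bounds (from the text)

# ASSUMPTION: the interval is symmetric and approximately normal.
# A two-sided 80% interval runs from the 10th to the 90th percentile.
z80 = NormalDist().inv_cdf(0.90)
half_width = (upper - lower) / 2
implied_se = half_width / z80

# Under the same assumption, the chance that the share exceeds 50%.
p_majority = 1 - NormalDist(mu=point, sigma=implied_se).cdf(50.0)

print(f"half-width: {half_width:.3f} points")
print(f"implied standard error: {implied_se:.2f} points")
print(f"P(share > 50%): {p_majority:.2f}")
```

Under these assumptions the implied standard error is a little under 3 points, and a 50.69% point estimate translates to only about a 60% chance of a popular-vote majority, which is one way of seeing how little the model separates the candidates.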

The real stress test for Time for More Change comes from the electoral college prediction, which I have added for this week.

The most pressing issue I faced in constructing the electoral college extension of Time for More Change – besides some disastrous human error that had me temporarily predicting a blue Texas and a red Massachusetts – was accounting for the large majority of states that lack current polling data. I used Biden’s 2020 vote share by state and the most recent national polling average to fill in the gaps in these cases, but this combination is by no means a perfect proxy for state polls. Fortunately, all of the major swing states for 2024 have state-level polling.
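
The text does not spell out exactly how the 2020 state results and the national average are combined, so the following is only one plausible version: a uniform national swing, sketched in Python. The function names and all input numbers are hypothetical and chosen only to show the arithmetic.

```python
def impute_state_share(state_share_2020, national_share_2020, national_poll_now):
    """Shift a state's 2020 two-party share by the national swing since 2020."""
    national_swing = national_poll_now - national_share_2020
    return state_share_2020 + national_swing


def state_estimate(state_poll, state_share_2020, national_share_2020, national_poll_now):
    """Prefer a current state poll; otherwise fall back to the swing-adjusted 2020 result."""
    if state_poll is not None:
        return state_poll
    return impute_state_share(state_share_2020, national_share_2020, national_poll_now)


# HYPOTHETICAL inputs, not real polling or election figures.
NATIONAL_2020 = 52.0
NATIONAL_NOW = 51.0

unpolled = state_estimate(None, 60.0, NATIONAL_2020, NATIONAL_NOW)  # 60.0 + (51.0 - 52.0)
polled = state_estimate(49.5, 50.5, NATIONAL_2020, NATIONAL_NOW)    # poll takes priority

print(unpolled, polled)
```

A uniform swing assumes every unpolled state moves with the country as a whole. That is a strong assumption, and it is one reason imputed states deserve less trust than polled ones.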

Time for More Change predicts that Harris will receive 287 electoral college votes, which would give her the presidency. While this estimate in itself is not out of the realm of possibility (give or take a few votes from Maine and Nebraska, which award some of their electors by congressional district, a system not yet incorporated in this model), it does produce a slightly sketchy map.
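
For readers curious about the mechanics, a winner-take-all tally can be sketched in a few lines of Python. The electoral vote counts below are the current allocations for four states; the predicted shares are hypothetical placeholders, not the model’s actual output, and the function name is my own.

```python
# Electoral votes under the current apportionment for a few states.
ELECTORAL_VOTES = {"DE": 3, "MA": 11, "PA": 19, "TX": 40}

# HYPOTHETICAL predicted Democratic two-party shares (percent).
predicted_share = {"DE": 58.0, "MA": 64.0, "PA": 50.4, "TX": 47.0}


def tally(predicted, electoral_votes, threshold=50.0):
    """Winner-take-all: award all of a state's votes if the share tops the threshold."""
    return sum(
        votes
        for state, votes in electoral_votes.items()
        if predicted[state] > threshold
    )


dem_votes = tally(predicted_share, ELECTORAL_VOTES)
print(dem_votes)  # DE + MA + PA = 3 + 11 + 19
```

Extended to all states and D.C., the total is compared against the 270 votes needed to win. Handling Maine and Nebraska properly would mean adding separate rows for their congressional districts instead of treating each state as a single block.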

In Figure IV, below, the model has Oregon and Virginia swinging toward the Republicans and Ohio, West Virginia, North Dakota, and South Carolina breaking for Harris. It is difficult to evaluate the model’s prediction of the swing states – even though we have more recent polling in those cases, they are genuinely up for grabs, so possible errors in their predicted electoral college vote are not as obvious as the improbable North Dakota guess.
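
A cheap guard against surprises like these is to compare each predicted winner with the 2020 result and review anything that flips. The Python sketch below uses the six states named above plus a Delaware entry, which is a hypothetical placeholder included only to show a state that passes the check.

```python
# 2020 presidential winners (D or R) for the states discussed above.
WINNER_2020 = {
    "OR": "D", "VA": "D", "OH": "R", "WV": "R", "ND": "R", "SC": "R", "DE": "D",
}

# Model calls as described in the text. The DE entry is a HYPOTHETICAL
# placeholder so the check has a non-flipped state to pass over.
MODEL_CALL = {
    "OR": "R", "VA": "R", "OH": "D", "WV": "D", "ND": "D", "SC": "D", "DE": "D",
}


def flipped_states(previous, predicted):
    """Return, in alphabetical order, the states whose predicted winner differs."""
    return sorted(state for state in predicted if predicted[state] != previous[state])


flips = flipped_states(WINNER_2020, MODEL_CALL)
print(flips)
```

A flip is not automatically an error, since swing states flip by definition, but a flagged state that was lopsided in 2020 and has no current polling is a good place to start looking for data or coding problems.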

There is plenty more to be done with this model – its forecasting uncertainty is poorly defined, it is essentially blind to campaign factors and vulnerable to unrepresentative polls, and the Time for Change approach’s focus on incumbent party vote share means that it says very little about Trump’s viability as a candidate.

See you next week!