How FiveThirtyEight’s 2020 Forecasts Did And What We’ll Be Thinking About For 2022



Welcome to the latest installment of what we’re hoping to make a biennial practice: an after-action review of FiveThirtyEight’s election forecasts, which in this past cycle covered the House, Senate and presidency. I realize there might not be too much appetite for reading about 2020 forecasts now that it’s June 2021, but let’s start with the major takeaways, which are fairly straightforward this year:

  • Even in a year when the polls were mediocre to poor, our forecasts largely identified the right outcomes. They correctly identified the winners of the presidency (Joe Biden), the U.S. Senate (Democrats, after the Georgia runoffs) and the U.S. House (Democrats, although by a narrower-than-expected margin). They were also largely accurate at the level of individual states and races, calling the outcome correctly in 48 of 50 presidential states (we also missed Maine’s 2nd Congressional District), 32 of 35 Senate races and 417 of 435 House races.
  • More importantly from our point of view, our models were generally well-calibrated. That is to say, the share of races we called correctly was roughly in line with the probabilities listed by our models (there’s a short sketch of that calculation just after this list). For instance, based on the probabilities associated with the Deluxe version of our Senate forecasts, the model expected to get 88 percent of its calls right; in fact, it got 91 percent of them right. (If anything, our models were slightly underconfident; that is, there were fewer upsets than predicted.) Moreover, when upsets did occur, they tended to come in races where the model had assigned them relatively high odds. For example, although we had Sen. Susan Collins as an underdog in Maine, we still gave her a 41 percent chance of victory there, so the race was highly competitive.
  • However, nearly all of our models’ misses came in the same direction; namely, Republicans won in races where Democrats were favored. On the one hand, this is a fairly common pattern: One party often wins most of the toss-up races. Indeed, this logic is embedded in our models, which assume that errors are somewhat correlated from race to race. (Recognizing these correlations was a big reason that our presidential model gave Donald Trump a better chance than other models did in 2016.) On the other hand, it may be that polling errors and related forecast errors are becoming even more correlated, owing to rising partisanship and a decline in split-ticket voting. The number of House seats that Democrats wound up with (222) was outside the 80 percent confidence interval in our forecast, which may be a reflection of this. The more correlated race outcomes are, the wider the confidence intervals in forecasting how many seats a party will win overall, since it’s less likely that a miss in one direction (say, a Republican upset in one race) will be offset by a miss in the opposite direction (a Democratic upset win in a different race). If so, we’ll want to address this in our model moving forward.

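To make that calibration idea concrete, here’s a minimal sketch in Python of how “expected correct” works. The probabilities and outcomes below are invented for illustration, not actual model output; the point is simply that a forecast’s expected number of correct calls is the sum of the favorites’ win probabilities.

```python
# Minimal sketch of a calibration check. The probabilities and outcomes
# below are invented for illustration; they are not our model's output.

def expected_correct(favorite_probs):
    # If the forecast is well calibrated, the expected number of correct
    # calls is simply the sum of the favorites' win probabilities.
    return sum(favorite_probs)

# Ten hypothetical races: the favorite's win probability, and whether
# the favorite actually won.
probs = [0.55, 0.58, 0.65, 0.70, 0.78, 0.85, 0.91, 0.96, 0.98, 0.99]
favorite_won = [False, True, True, False, True, True, True, True, True, True]

print(f"Expected correct: {expected_correct(probs):.1f} of {len(probs)}")
print(f"Actual correct:   {sum(favorite_won)} of {len(probs)}")
# Over hundreds of races, these two numbers should land close together
# if the model is well calibrated.
```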
OK, now some ground rules before we proceed further. I’ll be looking at a total of 529 forecasts: 435 U.S. House races; 35 Senate races; the presidential outcome in all 50 states, the District of Columbia and the five congressional districts in the two states (Maine and Nebraska) that split their electoral votes, for 56 presidential contests; plus the three topline forecasts for which party would control the Senate and the House and who would win the presidency via the Electoral College (435 + 35 + 56 + 3 = 529). I’m also only looking at how our final forecasts did on Election Day. (For a look at how our forecasts performed at earlier points in the 2020 cycle, please check out our updated interactive where we evaluate how our many, many forecasts have done over the years.) One last note: For Georgia, where both Senate races went to runoffs, we’ll evaluate our forecasts based on what they said as of Nov. 3, which incorporated the possibility of a runoff in January. (Those forecasts correctly had Democrats favored to win Georgia’s special Senate election, eventually won by now-Sen. Raphael Warnock, but incorrectly had Republican incumbent Sen. David Perdue favored to hold onto his seat, which was instead won by Democratic Sen. Jon Ossoff.)

Here’s a look at all 529 forecasts combined, including our presidential forecasts plus the Deluxe version of our congressional forecasts. In the chart below, I’ve broken these forecasts down into four categories: toss-up (where the leader had between a 50 and 60 percent chance of winning); lean (a 60 to 75 percent chance); likely (a 75 to 95 percent chance); and solid (a 95 percent or greater chance), showing first how all races combined fared and then how races where Democrats or Republicans were favored did.
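If it helps to see those category cutoffs concretely, here’s a tiny illustrative helper (our own sketch, not production code) that maps a favorite’s win probability to the four categories:

```python
def rating_category(favorite_prob):
    """Map the favorite's win probability to the categories used in the
    charts below. Thresholds follow the definitions above; boundary
    cases are assigned to the higher category, an arbitrary choice on
    our part. Illustrative only, not production code."""
    if favorite_prob >= 0.95:
        return "solid"
    if favorite_prob >= 0.75:
        return "likely"
    if favorite_prob >= 0.60:
        return "lean"
    return "toss-up"  # the favorite's chance is between 50 and 60 percent

for p in (0.52, 0.68, 0.83, 0.97):
    print(f"{p:.0%} favorite -> {rating_category(p)}")
```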

How well our 2020 forecasts did

Final FiveThirtyEight presidential forecast and final Deluxe version of our congressional forecasts as of Nov. 3, 2020, versus actual results

All races combined

CATEGORY     NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up      22          12              9             55%                41%
Lean         31          21              24            68%                77%
Likely       77          68              73            88%                95%
Solid        399         397             399           99%                100%
All races    529         497             505           94%                95%

Where Democrats were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt D)   12          7               1             55%                8%
Lean D             15          11              8             70%                53%
Likely D           40          35              36            87%                90%
Solid D            222         221             222           99%                100%
All races          289         273             267           94%                92%

Where Republicans were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt R)   10          6               8             55%                80%
Lean R             16          10              16            65%                100%
Likely R           37          33              37            88%                100%
Solid R            177         176             177           99%                100%
All races          240         224             238           93%                99%

One big takeaway here is that, perhaps unsurprisingly, we did best in calling races correctly when we were more confident of the outcome. We got 41 percent of our toss-up calls right, along with 77 percent of our “lean” calls, 95 percent of our “likely” calls, and 100 percent (399 of 399) of our “solid” calls.

Also, despite going just nine for 22 in toss-up races, our model did about as well as it was expecting to, or just slightly better than that. Based on the probabilities attached to each forecast, our model’s expectation was that it would get about 497 of the 529 races right in an average year. In fact, it got 505 of them correct (albeit not a statistically significant difference from 497).
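For a rough sense of why that eight-race gap isn’t significant, here’s a back-of-the-envelope sketch. It assumes races are independent (they aren’t, which would only widen the spread) and approximates each category with a single favorite-win probability taken from the table above:

```python
import math

# Back-of-the-envelope significance check: (number of races, approximate
# favorite-win probability) per category, from the table above. The
# single-probability-per-category simplification is our own assumption.
categories = [
    (22, 0.55),   # toss-up
    (31, 0.68),   # lean
    (77, 0.88),   # likely
    (399, 0.99),  # solid
]

expected = sum(n * p for n, p in categories)
# Variance of a sum of independent Bernoulli trials is sum of p * (1 - p).
std_dev = math.sqrt(sum(n * p * (1 - p) for n, p in categories))

print(f"Expected correct: about {expected:.0f} (the model's own figure was 497)")
print(f"Standard deviation: about {std_dev:.1f}")
print(f"Actual 505 correct is {(505 - expected) / std_dev:.1f} std. devs. above expectation")
# That's just under two standard deviations: suggestive, but shy of
# conventional significance, and correlation between races widens the
# true spread further.
```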

These topline numbers, however, conceal significant differences in how the respective parties performed. Democrats won just one of the 12 toss-up races where they were narrowly favored (that is, where they had between a 50 and 60 percent chance of winning), while Republicans won eight of their 10 toss-ups. Meanwhile, Democrats won eight of the 15 “lean Democratic” races — just barely more than 50 percent — while Republicans won 16 of the 16 “lean Republican” races. Democrats did win 36 of the 40 “likely Democratic” races, and all of their “solid Democratic” races, so there was a limit to how far up the probabilistic scale the upsets went. 

Still, the large majority of the races that were expected to be close went the same way — toward Republicans — especially in the U.S. House. Of course, that’s what you might expect in an election cycle where the average poll had roughly a 5-point Democratic bias.

Evaluating our presidential forecasts

Next, let’s break out the numbers for the presidential race specifically:

How well our 2020 presidential forecast did

Final FiveThirtyEight presidential forecast as of Nov. 3, 2020, versus actual results

All races combined

CATEGORY     NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up      3           2               2             57%                67%
Lean         6           4               4             66%                67%
Likely       13          12              13            90%                100%
Solid        35          35              35            98%                100%
All races    57          52              54            91%                95%

Where Democrats were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt D)   2           1               1             58%                50%
Lean D             4           3               2             68%                50%
Likely D           7           6               7             90%                100%
Solid D            19          19              19            99%                100%
All races          32          29              29            91%                91%

Where Republicans were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt R)   1           1               1             55%                100%
Lean R             2           1               2             61%                100%
Likely R           6           5               6             89%                100%
Solid R            16          16              16            98%                100%
All races          25          23              25            91%                100%

The sample sizes aren’t really large enough here to say much about calibration, although, for what it’s worth, the model did decently enough, getting two of the three toss-up races right along with four of the six “lean” races. It also got all of the “likely” and “solid” forecasts correct. All of the misses, though, worked in Trump’s favor: Biden was favored to win Florida, North Carolina and Maine’s 2nd Congressional District, but Trump won them instead. Still, none of those were huge upsets: Trump’s chances were 31 percent in Florida, 36 percent in North Carolina and 43 percent in Maine’s 2nd.

Probably the more controversial part of our forecast was that it gave Biden an 89 percent chance of winning the Electoral College, which struck some observers as too high given that the outcome of the election took several days to determine.

I’m not super sympathetic to this critique, to be honest. For one thing, Biden’s 89 percent chance of victory did not mean an 89 percent chance of a blowout. On the contrary, precisely the reason that Biden’s chances were so high is that he could endure a fairly large polling error and still come out ahead, something that had not been true for Hillary Clinton four years earlier. And indeed, although the final outcome was fairly close, Biden did have some margin to spare: He could have lost any two of the four closest states (Georgia, Wisconsin, Arizona, Pennsylvania) and still have won the Electoral College despite the large polling error.
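As a concrete check on that claim, here’s a quick enumeration using the actual 2020 electoral-vote counts (Biden finished with 306):

```python
from itertools import combinations

BIDEN_EVS = 306   # Biden's actual 2020 electoral-vote total
NEEDED = 270      # electoral votes needed to win

closest = {"Georgia": 16, "Wisconsin": 10, "Arizona": 11, "Pennsylvania": 20}

# Flip every pair of Biden's four closest states to Trump and check
# whether Biden still clears 270.
for a, b in combinations(closest, 2):
    remaining = BIDEN_EVS - closest[a] - closest[b]
    outcome = "still wins" if remaining >= NEEDED else "loses"
    print(f"Without {a} and {b}: {remaining} electoral votes -> Biden {outcome}")
# The worst case (Pennsylvania plus Georgia) leaves exactly 270: just enough.
```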

Evaluating our congressional forecast

Let’s turn back to our congressional forecast and compare the Lite, Classic and Deluxe versions of our model. We sometimes describe the Lite forecast as being “polls only,” although as I’ll explain in a moment, that undersells it to some degree. The Classic forecast layers in a whole bunch of additional data that Lite does not use, such as fundraising figures and the results of the previous congressional race in that state or district. The Deluxe version, meanwhile, uses all of that data plus expert forecasts from groups such as The Cook Political Report. Here’s how each version of our congressional forecast did:

How well our Deluxe congressional forecast did

Final Deluxe version of FiveThirtyEight’s congressional forecast as of Nov. 3, 2020, versus actual results

All races combined

CATEGORY     NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up      19          10              7             55%                37%
Lean         25          17              20            68%                80%
Likely       64          56              60            87%                94%
Solid        364         362             364           99%                100%
All races    472         445             451           94%                96%

Where Democrats were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt D)   10          6               0             55%                0%
Lean D             11          8               6             71%                55%
Likely D           33          29              29            87%                88%
Solid D            203         202             203           >99%               100%
All races          257         244             238           95%                93%

Where Republicans were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt R)   9           5               7             55%                78%
Lean R             14          9               14            66%                100%
Likely R           31          27              31            88%                100%
Solid R            161         160             161           99%                100%
All races          215         201             213           94%                99%
How well our Classic congressional forecast did

Final Classic version of FiveThirtyEight’s congressional forecast as of Nov. 3, 2020, versus actual results

All races combined

CATEGORY     NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up      19          10              10            55%                53%
Lean         32          17              21            67%                66%
Likely       77          67              73            87%                95%
Solid        344         342             344           99%                100%
All races    472         441             450           93%                95%

Where Democrats were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt D)   9           5               1             55%                11%
Lean D             14          9               6             66%                43%
Likely D           39          34              35            87%                90%
Solid D            196         195             196           >99%               100%
All races          258         243             238           94%                92%

Where Republicans were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt R)   10          5               9             55%                90%
Lean R             18          12              17            67%                94%
Likely R           38          33              38            87%                100%
Solid R            148         147             148           99%                100%
All races          214         198             212           92%                99%
How well our Lite congressional forecast did

Final Lite version of FiveThirtyEight’s congressional forecast as of Nov. 3, 2020, versus actual results

All races combined

CATEGORY     NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up      23          13              19            54%                83%
Lean         37          25              30            68%                81%
Likely       103         89              97            87%                94%
Solid        309         307             309           99%                100%
All races    472         434             455           92%                96%

Where Democrats were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt D)   4           2               1             54%                25%
Lean D             13          9               7             69%                54%
Likely D           44          38              38            87%                86%
Solid D            192         191             192           >99%               100%
All races          253         240             238           95%                94%

Where Republicans were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt R)   19          10              18            54%                95%
Lean R             24          16              23            68%                96%
Likely R           59          51              59            87%                100%
Solid R            117         116             117           99%                100%
All races          219         194             217           88%                99%

It turns out that the “simple” Lite version of our model called 455 of the 472 races correctly, as compared to 450 for Classic and 451 for Deluxe. That’s a surprise, perhaps, especially given that our Lite forecast relies more heavily on polls, and the polls didn’t have a very good year. 

However, there are a couple of complications. One is that the Lite model isn’t so simple, really. In races with a lot of polling, it just defaults to the polling. But many congressional races — most races for the House, for instance — don’t get very much polling. Instead, in those cases, Lite relies on a system called CANTOR that looks at polling in similar states and districts. I’m skipping some steps here, but in identifying comparable districts, CANTOR relies heavily on our partisan lean index, which is based on recent voting in presidential and state legislative elections. So if Lite is performing well, it may just mean that state and district partisanship (as used in CANTOR) is a super important metric, maybe more so than the more complicated soup of data that Classic and Deluxe look at.
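To illustrate the general idea (a conceptual sketch of partisan-lean-based inference, not CANTOR’s actual algorithm), imputing an unpolled district’s margin from polled districts with similar leans might look something like this:

```python
# Conceptual sketch of inferring an unpolled district's margin from
# polled districts with similar partisan leans. This illustrates the
# general idea described above; it is not CANTOR's actual code.

def infer_margin(target_lean, polled):
    """Estimate a district's margin by shifting each polled district's
    margin by the difference in partisan lean, weighting more similar
    districts more heavily."""
    total_weight = 0.0
    weighted_margin = 0.0
    for lean, polled_margin in polled:
        # Assume (for illustration) that margin moves point-for-point
        # with partisan lean, and downweight dissimilar districts.
        adjusted = polled_margin + (target_lean - lean)
        weight = 1.0 / (1.0 + abs(target_lean - lean))
        weighted_margin += weight * adjusted
        total_weight += weight
    return weighted_margin / total_weight

# (partisan lean, polled Democratic margin) for hypothetical districts.
polled_districts = [(+5.0, 3.0), (+8.0, 7.5), (-2.0, -4.0)]
print(f"Inferred margin for a D+6 district: D+{infer_margin(+6.0, polled_districts):.1f}")
```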

The other complication is that Lite probably got lucky. It correctly called an uncanny 19 of 23 races that it deemed to be toss-ups, while Classic got 10 of 19 of its toss-ups right and Deluxe got seven of its 19 toss-ups right. There’s not much skill in forecasting toss-ups, but if each version of the model had gotten its expected number of toss-ups right, then Deluxe would have called 454 races correctly, as compared to 449 for Lite and 448 for Classic.

Still, there are some things to consider for future election cycles. In a world with extremely little split-ticket voting, it may be hard to improve on measures such as the partisan lean of a district when forecasting congressional races. Conversely, fundraising — which the Classic model relies upon fairly heavily — may be a misleading metric in a world in which races are highly nationalized and a high fundraising total doesn’t necessarily indicate an abundance of local support.

Here’s another concern from our House forecast: Democrats’ final seat total — 222 seats, down from 235 at the conclusion of the 2018 midterms — fell outside of the 80 percent confidence interval in our Deluxe forecast, which spanned between 225 and 254 Democratic seats. It was also outside the 80 percent interval in the Lite and Classic versions.

This wasn’t a total disaster unto itself: Forecasts are supposed to miss the 80 percent confidence interval 20 percent of the time, and Democrats’ seat total was within the 90 percent confidence interval. Still, as I mentioned earlier, we’ll want to study whether the correlation in forecast errors has become even higher than we were assuming before. If so, that would yield wider confidence intervals in our topline forecasts since it’s less likely that errors in different races cancel out (i.e., that Democrats would pull off an upset victory in one seat to compensate for an upset loss in another seat).
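To see why higher correlation widens that interval, here’s a toy Monte Carlo sketch (invented numbers, not our model): each race’s error blends a shared national error with an independent race-level error, and as the shared component grows, the distribution of total seats won spreads out.

```python
import random
import statistics

def seat_total_stdev(n_races=100, n_sims=5_000, shared_weight=0.0):
    """Toy model: a party wins a race when its error term breaks its way;
    the error blends a shared national component with an independent
    race-level component. All parameters are illustrative."""
    totals = []
    for _ in range(n_sims):
        national = random.gauss(0, 1)          # shared polling error
        seats = 0
        for _ in range(n_races):
            local = random.gauss(0, 1)         # race-specific error
            error = shared_weight * national + (1 - shared_weight) * local
            if error > 0:
                seats += 1
        totals.append(seats)
    return statistics.stdev(totals)

for w in (0.0, 0.3, 0.6):
    print(f"shared-error weight {w:.1f}: seat-total std. dev. ~{seat_total_stdev(shared_weight=w):.1f}")
# Higher correlation between races means a wider spread of possible seat
# totals, and therefore wider confidence intervals on the topline forecast.
```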

The biggest upsets of 2020

Speaking of upsets, the next table contains a comprehensive list: every race that at least one version of our models got wrong.

The biggest upsets of 2020 per our forecasts

Races in which at least one version of the FiveThirtyEight forecast rated the eventual winner as an underdog

                              WINNER’S CHANCES
OFFICE      RACE              DELUXE    CLASSIC   LITE
House       IA-1              12.5%     6.7%      13.1%
House       IA-2              12.2%     6.0%      16.3%
House       FL-26             18.1%     13.8%     19.3%
House       FL-27             18.5%     31.8%     21.6%
House       TX-23             25.9%     24.9%     31.6%
House       CA-48             32.0%     34.2%     18.3%
House       CA-39             26.0%     35.5%     23.5%
Senate      NC                31.9%     29.4%     25.5%
President   FL                -         30.9%     -
House       NY-2              43.0%     31.0%     30.8%
President   NC                -         36.1%     -
House       CA-21             41.8%     39.1%     29.8%
Senate      ME                41.0%     41.4%     30.2%
House       GA-7              43.3%     30.4%     39.0%
House       SC-1              35.7%     42.3%     38.2%
President   ME-2              -         42.7%     -
House       CA-25             45.3%     33.4%     52.1%
House       NY-11             41.5%     38.8%     58.6%
Senate      GA (Perdue)       42.6%     47.4%     49.2%
House       UT-4              43.9%     46.3%     50.5%
House       NM-2              44.5%     43.4%     56.1%
House       VA-5              50.9%     40.3%     53.0%
House       NJ-2              49.8%     54.6%     43.3%
House       PA-10             52.2%     54.5%     44.5%
House       OK-5              49.1%     50.0%     54.2%
House       IN-5              49.8%     54.8%     50.1%
House       MI-3              55.8%     47.5%     54.3%
Senate      IA                57.7%     50.2%     49.98%
House       CO-3              61.2%     47.5%     61.4%

We only published one version of our presidential forecast this year. It is most analogous to our Classic congressional forecast, so its results are listed in that column.

One thing to note is that there weren’t any particularly remarkable upsets, such as a 50-to-1 or 100-to-1 longshot winning. No candidate with less than a 5 percent chance of winning won, and only two candidates with less than a 10 percent chance did — Republicans Ashley Hinson and Mariannette Miller-Meeks in Iowa’s 1st and 2nd Congressional Districts, respectively — and then only in the Classic version of our model.

Still, the table provides a good indication of where the pain points were for Democrats. They considerably underperformed expectations in Iowa along with a series of districts with a large number of Hispanic voters, such as Texas’s 23rd Congressional District and Florida’s 26th Congressional District. That led to a considerably narrower House majority than Democrats were hoping for. Conversely, the only upsets in races where Republicans were favored were for Perdue’s Senate seat and in Georgia’s 7th Congressional District, where Democrat Carolyn Bourdeaux beat Republican Rich McCormick.

Quick thoughts on model changes for 2022

At the end of each election cycle, I like to consider what changes we might make to our models in forthcoming cycles. I’ll be brief here, since I’ve already touched on most of these.

One set of questions concerns how reliable the polling will be going forward and what changes we might make in response. The changes we made to our pollster ratings earlier this year — namely, no longer privileging live-caller telephone polls — will have some knock-on effects on our models, since the models use those ratings to determine how much weight to assign to polls from different firms.
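As a rough illustration of that mechanism (the letter grades and weights below are invented for illustration, not our actual ratings formula), a ratings-weighted polling average might look like this:

```python
# Hypothetical sketch of how pollster ratings can feed poll weights.
# The grade-to-weight mapping is invented; it is not FiveThirtyEight's
# actual formula.

GRADE_WEIGHT = {"A": 1.0, "B": 0.7, "C": 0.4, "D": 0.2}

def weighted_polling_average(polls):
    """polls: list of (margin, pollster_grade) tuples."""
    total = sum(GRADE_WEIGHT[g] for _, g in polls)
    return sum(m * GRADE_WEIGHT[g] for m, g in polls) / total

polls = [(+4.0, "A"), (+7.0, "C"), (+2.0, "B")]
print(f"Weighted average margin: D+{weighted_polling_average(polls):.1f}")
```

If live-caller polls no longer get a ratings premium, their grades, and thus their effective weight in an average like this one, move toward those of other methodologies.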

Beyond that, though, I think questions about polling accuracy are probably more pertinent to 2024 than 2022. That’s partly because our congressional model already relies on a mix of polling and non-polling indicators and so is less sensitive to inaccurate polling than our presidential forecast might be. It’s also partly because the polls did quite well in the previous midterm election in 2018, so it seems prudent to see how they fare in 2022 before necessarily concluding that congressional polling is broken. 

Also, I’d note that our models are already fairly conservative with respect to how accurate they expect the polling to be. That is, they assume that polling errors are common. That’s how our forecasts managed to be well-calibrated despite a fairly large polling error in 2020.
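One common way to build in that kind of conservatism (a sketch of the general technique; our exact error distributions and parameters aren’t shown here) is to simulate outcomes with a wide, fat-tailed error distribution rather than taking a polling lead at face value:

```python
import numpy as np

def win_probability(poll_lead, error_scale=4.0, df=5, n_sims=100_000):
    """Chance the polling leader actually wins if the true polling error
    is fat-tailed (Student's t). The scale and degrees of freedom here
    are illustrative assumptions, not our model's actual parameters."""
    errors = error_scale * np.random.standard_t(df, size=n_sims)
    return float(np.mean(poll_lead + errors > 0))

for lead in (2, 5, 8):
    print(f"{lead}-point lead -> roughly {win_probability(lead):.0%} chance of winning")
# Fat tails keep win probabilities modest even for solid polling leads,
# which is how a forecast can stay well calibrated in a year when the
# polls themselves miss badly.
```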

What else to look at before 2022? Prior to last year, we’d already increased the weight given to partisanship in the Deluxe and Classic models and reduced the weight associated with variables such as candidate experience and fundraising. That looks to have been a good first step, but I suspect when we throw the 2020 results in and reanalyze the data, it will compel some further changes in that direction. And as I mentioned, while our models already assume that the errors between different races are correlated, those correlations may be even higher in an era of greater partisanship, so we’ll want to look at that, too.

Finally, there’s the question of what might happen to election analysis in an America where the Republicans might seek to nullify or invalidate election results. In some sense, these questions are external to our model, which is intended to forecast the results assuming all legal votes are counted and that electoral outcomes are respected. But we’ll keep our minds open as to whether there are ways to present our forecasts that make the scope and mission of our model clearer, or different types of model outputs that could shine light on these questions. In the meantime, we’ll continue to cover the threats to electoral democracy through our reporting.
