How FiveThirtyEight’s 2020 Forecasts Did And What We’ll Be Thinking About For 2022



Welcome to the latest installment of what we’re hoping to make a biennial practice: an after-action review of FiveThirtyEight’s election forecasts, which in this past cycle covered the House, Senate and presidency. I realize there might not be too much appetite for reading about 2020 forecasts now that it’s June 2021, but let’s start with the major takeaways, which are fairly straightforward this year:

  • Even in a year when the polls were mediocre to poor, our forecasts largely identified the right outcomes. They correctly identified the winners of the presidency (Joe Biden), the U.S. Senate (Democrats, after the Georgia runoffs) and the U.S. House (Democrats, although by a narrower-than-expected margin). They were also largely accurate at the level of individual states and races, calling the outcome correctly in 48 of 50 presidential states (we also missed Maine’s 2nd Congressional District), 32 of 35 Senate races and 417 of 435 House races.
  • More importantly from our point of view, our models were generally well-calibrated. That is to say, the share of races we called correctly was roughly in line with the probabilities listed by our models (there’s a short sketch of that calculation just after this list). For instance, based on the probabilities associated with the Deluxe version of our Senate forecasts, the model expected to get 88 percent of its calls right; in fact, it got 91 percent of them right. (If anything, our models were slightly underconfident; that is, there were fewer upsets than predicted.) Moreover, when upsets did occur, they tended to come in races where the model had assigned them relatively high odds. For example, although we had Sen. Susan Collins as an underdog in Maine, we still gave her a 41 percent chance of victory there, so the race was highly competitive.
  • However, nearly all of our models’ misses came in the same direction; namely, Republicans won in races where Democrats were favored. On the one hand, this is a fairly common pattern: One party often wins most of the toss-up races. Indeed, this logic is embedded in our models, which assume that errors are somewhat correlated from race to race. (Recognizing these correlations was a big reason that our presidential model gave Donald Trump a better chance than other models did in 2016.) On the other hand, it may be that polling errors and related forecast errors are becoming even more correlated, owing to rising partisanship and a decline in split-ticket voting. The number of House seats that Democrats wound up with (222) was outside the 80 percent confidence interval in our forecast, which may be a reflection of this. The more correlated race outcomes are, the wider the confidence intervals in forecasting how many seats a party will win overall, since it’s less likely that a miss in one direction (say, a Republican upset in one race) will be offset by a miss in the opposite direction (a Democratic upset win in a different race). If so, we’ll want to address this in our model moving forward.

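To make that calibration idea concrete, here’s a minimal sketch in Python of how “expected correct” works. The probabilities and outcomes below are invented for illustration, not actual model output; the point is simply that a forecast’s expected number of correct calls is the sum of the favorites’ win probabilities.

```python
# Minimal sketch of a calibration check. The probabilities and outcomes
# below are invented for illustration; they are not our model's output.

def expected_correct(favorite_probs):
    # If the forecast is well calibrated, the expected number of correct
    # calls is simply the sum of the favorites' win probabilities.
    return sum(favorite_probs)

# Ten hypothetical races: the favorite's win probability, and whether
# the favorite actually won.
probs = [0.55, 0.58, 0.65, 0.70, 0.78, 0.85, 0.91, 0.96, 0.98, 0.99]
favorite_won = [False, True, True, False, True, True, True, True, True, True]

print(f"Expected correct: {expected_correct(probs):.1f} of {len(probs)}")
print(f"Actual correct:   {sum(favorite_won)} of {len(probs)}")
# Over hundreds of races, these two numbers should land close together
# if the model is well calibrated.
```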
OK, now some ground rules before we proceed further. I’ll be looking at a total of 529 forecasts: 435 U.S. House races; 35 Senate races; the presidential outcome in all 50 states, the District of Columbia and the five congressional districts in the two states (Maine and Nebraska) that split their electoral votes, for 56 presidential contests; plus the three topline forecasts for which party would control the Senate and the House and who would win the presidency via the Electoral College (435 + 35 + 56 + 3 = 529). I’m also only looking at how our final forecasts did on Election Day. (For a look at how our forecasts performed at earlier points in the 2020 cycle, please check out our updated interactive where we evaluate how our many, many forecasts have done over the years.) One last note: For Georgia, where both Senate races went to runoffs, we’ll evaluate our forecasts based on what they said as of Nov. 3, which incorporated the possibility of a runoff in January. (Those forecasts correctly had Democrats favored to win Georgia’s special Senate election, eventually won by now-Sen. Raphael Warnock, but incorrectly had Republican incumbent Sen. David Perdue favored to hold onto his seat, which was instead won by Democratic Sen. Jon Ossoff.)

Here’s a look at all 529 forecasts combined, including our presidential forecasts plus the Deluxe version of our congressional forecasts. In the chart below, I’ve broken these forecasts down into four categories: toss-up (where the leader had between a 50 and 60 percent chance of winning); lean (a 60 to 75 percent chance); likely (a 75 to 95 percent chance); and solid (a 95 percent or greater chance), showing first how all races combined fared and then how races where Democrats or Republicans were favored did.
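If it helps to see those category cutoffs concretely, here’s a tiny illustrative helper (our own sketch, not production code) that maps a favorite’s win probability to the four categories:

```python
def rating_category(favorite_prob):
    """Map the favorite's win probability to the categories used in the
    charts below. Thresholds follow the definitions above; boundary
    cases are assigned to the higher category, an arbitrary choice on
    our part. Illustrative only, not production code."""
    if favorite_prob >= 0.95:
        return "solid"
    if favorite_prob >= 0.75:
        return "likely"
    if favorite_prob >= 0.60:
        return "lean"
    return "toss-up"  # the favorite's chance is between 50 and 60 percent

for p in (0.52, 0.68, 0.83, 0.97):
    print(f"{p:.0%} favorite -> {rating_category(p)}")
```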

How well our 2020 forecasts did

Final FiveThirtyEight presidential forecast and final Deluxe version of our congressional forecasts as of Nov. 3, 2020, versus actual results

All races combined

CATEGORY     NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up      22          12              9             55%                41%
Lean         31          21              24            68%                77%
Likely       77          68              73            88%                95%
Solid        399         397             399           99%                100%
All races    529         497             505           94%                95%

Where Democrats were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt D)   12          7               1             55%                8%
Lean D             15          11              8             70%                53%
Likely D           40          35              36            87%                90%
Solid D            222         221             222           99%                100%
All races          289         273             267           94%                92%

Where Republicans were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt R)   10          6               8             55%                80%
Lean R             16          10              16            65%                100%
Likely R           37          33              37            88%                100%
Solid R            177         176             177           99%                100%
All races          240         224             238           93%                99%

One big takeaway here is that, perhaps unsurprisingly, we did best in calling races correctly when we were more confident of the outcome. We got 41 percent of our toss-up calls right, along with 77 percent of our “lean” calls, 95 percent of our “likely” calls, and 100 percent (399 of 399) of our “solid” calls.

Also, despite going just nine for 22 in toss-up races, our model did about as well as it was expecting to, or just slightly better than that. Based on the probabilities attached to each forecast, our model’s expectation was that it would get about 497 of the 529 races right in an average year. In fact, it got 505 of them correct (albeit not a statistically significant difference from 497).
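For a rough sense of why that eight-race gap isn’t significant, here’s a back-of-the-envelope sketch. It assumes races are independent (they aren’t, which would only widen the spread) and approximates each category with a single favorite-win probability taken from the table above:

```python
import math

# Back-of-the-envelope significance check: (number of races, approximate
# favorite-win probability) per category, from the table above. The
# single-probability-per-category simplification is our own assumption.
categories = [
    (22, 0.55),   # toss-up
    (31, 0.68),   # lean
    (77, 0.88),   # likely
    (399, 0.99),  # solid
]

expected = sum(n * p for n, p in categories)
# Variance of a sum of independent Bernoulli trials is sum of p * (1 - p).
std_dev = math.sqrt(sum(n * p * (1 - p) for n, p in categories))

print(f"Expected correct: about {expected:.0f} (the model's own figure was 497)")
print(f"Standard deviation: about {std_dev:.1f}")
print(f"Actual 505 correct is {(505 - expected) / std_dev:.1f} std. devs. above expectation")
# That's just under two standard deviations: suggestive, but shy of
# conventional significance, and correlation between races widens the
# true spread further.
```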

These topline numbers, however, conceal significant differences in how the respective parties performed. Democrats won just one of the 12 toss-up races where they were narrowly favored (that is, where they had between a 50 and 60 percent chance of winning), while Republicans won eight of their 10 toss-ups. Meanwhile, Democrats won eight of the 15 “lean Democratic” races — just barely more than 50 percent — while Republicans won 16 of the 16 “lean Republican” races. Democrats did win 36 of the 40 “likely Democratic” races, and all of their “solid Democratic” races, so there was a limit to how far up the probabilistic scale the upsets went. 

Still, the large majority of the races that were expected to be close went the same way — toward Republicans — especially in the U.S. House. Of course, that’s what you might expect in an election cycle where the average poll had roughly a 5-point Democratic bias.

Evaluating our presidential forecasts

Next, let’s break out the numbers for the presidential race specifically:

How well our 2020 presidential forecast did

Final FiveThirtyEight presidential forecast as of Nov. 3, 2020, versus actual results

All races combined

CATEGORY     NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up      3           2               2             57%                67%
Lean         6           4               4             66%                67%
Likely       13          12              13            90%                100%
Solid        35          35              35            98%                100%
All races    57          52              54            91%                95%

Where Democrats were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt D)   2           1               1             58%                50%
Lean D             4           3               2             68%                50%
Likely D           7           6               7             90%                100%
Solid D            19          19              19            99%                100%
All races          32          29              29            91%                91%

Where Republicans were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt R)   1           1               1             55%                100%
Lean R             2           1               2             61%                100%
Likely R           6           5               6             89%                100%
Solid R            16          16              16            98%                100%
All races          25          23              25            91%                100%

The sample sizes aren’t really large enough here to say much about calibration, although, for what it’s worth, the model did decently enough, getting two of the three toss-up races right along with four of the six “lean” races. It also got all of the “likely” and “solid” forecasts correct. All of the misses, though, worked in Trump’s favor: Biden was favored to win Florida, North Carolina and Maine’s 2nd Congressional District, but Trump won them instead. Still, none of those were huge upsets: Trump’s chances were 31 percent in Florida, 36 percent in North Carolina and 43 percent in Maine’s 2nd.

Probably the more controversial part of our forecast was that it gave Biden an 89 percent chance of winning the Electoral College, which struck some observers as too high given that the outcome of the election took several days to determine.

I’m not super sympathetic to this critique, to be honest. For one thing, Biden’s 89 percent chance of victory did not mean an 89 percent chance of a blowout. On the contrary, precisely the reason that Biden’s chances were so high is that he could endure a fairly large polling error and still come out ahead, something that had not been true for Hillary Clinton four years earlier. And indeed, although the final outcome was fairly close, Biden did have some margin to spare: He could have lost any two of the four closest states (Georgia, Wisconsin, Arizona, Pennsylvania) and still have won the Electoral College despite the large polling error.
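As a concrete check on that claim, here’s a quick enumeration using the actual 2020 electoral-vote counts (Biden finished with 306):

```python
from itertools import combinations

BIDEN_EVS = 306   # Biden's actual 2020 electoral-vote total
NEEDED = 270      # electoral votes needed to win

closest = {"Georgia": 16, "Wisconsin": 10, "Arizona": 11, "Pennsylvania": 20}

# Flip every pair of Biden's four closest states to Trump and check
# whether Biden still clears 270.
for a, b in combinations(closest, 2):
    remaining = BIDEN_EVS - closest[a] - closest[b]
    outcome = "still wins" if remaining >= NEEDED else "loses"
    print(f"Without {a} and {b}: {remaining} electoral votes -> Biden {outcome}")
# The worst case (Pennsylvania plus Georgia) leaves exactly 270: just enough.
```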

Evaluating our congressional forecast

Let’s turn back to our congressional forecast and compare the Lite, Classic and Deluxe versions of our model. We sometimes describe the Lite forecast as being “polls only,” although as I’ll explain in a moment, that undersells it to some degree. The Classic forecast layers in a whole bunch of additional data that Lite does not use, such as fundraising figures and the results of the previous congressional race in that state or district. The Deluxe version, meanwhile, uses all of that data plus expert forecasts from groups such as The Cook Political Report. Here’s how each version of our congressional forecast did:

How well our Deluxe congressional forecast did

Final Deluxe version of FiveThirtyEight’s congressional forecast as of Nov. 3, 2020, versus actual results

All races combined

CATEGORY     NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up      19          10              7             55%                37%
Lean         25          17              20            68%                80%
Likely       64          56              60            87%                94%
Solid        364         362             364           99%                100%
All races    472         445             451           94%                96%

Where Democrats were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt D)   10          6               0             55%                0%
Lean D             11          8               6             71%                55%
Likely D           33          29              29            87%                88%
Solid D            203         202             203           >99%               100%
All races          257         244             238           95%                93%

Where Republicans were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt R)   9           5               7             55%                78%
Lean R             14          9               14            66%                100%
Likely R           31          27              31            88%                100%
Solid R            161         160             161           99%                100%
All races          215         201             213           94%                99%
How well our Classic congressional forecast did

Final Classic version of FiveThirtyEight’s congressional forecast as of Nov. 3, 2020, versus actual results

All races combined

CATEGORY     NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up      19          10              10            55%                53%
Lean         32          17              21            67%                66%
Likely       77          67              73            87%                95%
Solid        344         342             344           99%                100%
All races    472         441             450           93%                95%

Where Democrats were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt D)   9           5               1             55%                11%
Lean D             14          9               6             66%                43%
Likely D           39          34              35            87%                90%
Solid D            196         195             196           >99%               100%
All races          258         243             238           94%                92%

Where Republicans were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt R)   10          5               9             55%                90%
Lean R             18          12              17            67%                94%
Likely R           38          33              38            87%                100%
Solid R            148         147             148           99%                100%
All races          214         198             212           92%                99%
How well our Lite congressional forecast did

Final Lite version of FiveThirtyEight’s congressional forecast as of Nov. 3, 2020, versus actual results

All races combined

CATEGORY     NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up      23          13              19            54%                83%
Lean         37          25              30            68%                81%
Likely       103         89              97            87%                94%
Solid        309         307             309           99%                100%
All races    472         434             455           92%                96%

Where Democrats were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt D)   4           2               1             54%                25%
Lean D             13          9               7             69%                54%
Likely D           44          38              38            87%                86%
Solid D            192         191             192           >99%               100%
All races          253         240             238           95%                94%

Where Republicans were favored

CATEGORY           NO. RACES   EXPECTED WINS   ACTUAL WINS   EXPECTED CORRECT   ACTUAL CORRECT
Toss-up (tilt R)   19          10              18            54%                95%
Lean R             24          16              23            68%                96%
Likely R           59          51              59            87%                100%
Solid R            117         116             117           99%                100%
All races          219         194             217           88%                99%

It turns out that the “simple” Lite version of our model called 455 of the 472 races correctly, as compared to 450 for Classic and 451 for Deluxe. That’s a surprise, perhaps, especially given that our Lite forecast relies more heavily on polls, and the polls didn’t have a very good year. 

However, there are a couple of complications. One is that the Lite model isn’t so simple, really. In races with a lot of polling, it just defaults to the polling. But many congressional races — most races for the House, for instance — don’t get very much polling. Instead, in those cases, Lite relies on a system called CANTOR that looks at polling in similar states and districts. I’m skipping some steps here, but in identifying comparable districts, CANTOR relies heavily on our partisan lean index, which is based on recent voting in presidential and state legislative elections. So if Lite is performing well, it may just mean that state and district partisanship (as used in CANTOR) is a super important metric, maybe more so than the more complicated soup of data that Classic and Deluxe look at.
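To illustrate the general idea (a conceptual sketch of partisan-lean-based inference, not CANTOR’s actual algorithm), imputing an unpolled district’s margin from polled districts with similar leans might look something like this:

```python
# Conceptual sketch of inferring an unpolled district's margin from
# polled districts with similar partisan leans. This illustrates the
# general idea described above; it is not CANTOR's actual code.

def infer_margin(target_lean, polled):
    """Estimate a district's margin by shifting each polled district's
    margin by the difference in partisan lean, weighting more similar
    districts more heavily."""
    total_weight = 0.0
    weighted_margin = 0.0
    for lean, polled_margin in polled:
        # Assume (for illustration) that margin moves point-for-point
        # with partisan lean, and downweight dissimilar districts.
        adjusted = polled_margin + (target_lean - lean)
        weight = 1.0 / (1.0 + abs(target_lean - lean))
        weighted_margin += weight * adjusted
        total_weight += weight
    return weighted_margin / total_weight

# (partisan lean, polled Democratic margin) for hypothetical districts.
polled_districts = [(+5.0, 3.0), (+8.0, 7.5), (-2.0, -4.0)]
print(f"Inferred margin for a D+6 district: D+{infer_margin(+6.0, polled_districts):.1f}")
```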

The other complication is that Lite probably got lucky. It correctly called an uncanny 19 of 23 races that it deemed to be toss-ups, while Classic got 10 of 19 of its toss-ups right and Deluxe got seven of its 19 toss-ups right. There’s not much skill in forecasting toss-ups, but if each version of the model had gotten its expected number of toss-ups right, then Deluxe would have called 454 races correctly, as compared to 449 for Lite and 448 for Classic.

Still, there are some things to consider for future election cycles. In a world with extremely little split-ticket voting, it may be hard to improve on measures such as the partisan lean of a district when forecasting congressional races. Conversely, fundraising — which the Classic model relies upon fairly heavily — may be a misleading metric in a world in which races are highly nationalized and a high fundraising total doesn’t necessarily indicate an abundance of local support.

Here’s another concern from our House forecast: Democrats’ final seat total — 222 seats, down from 235 at the conclusion of the 2018 midterms — fell outside of the 80 percent confidence interval in our Deluxe forecast, which spanned between 225 and 254 Democratic seats. It was also outside the 80 percent interval in the Lite and Classic versions.

This wasn’t a total disaster unto itself: Forecasts are supposed to miss the 80 percent confidence interval 20 percent of the time, and Democrats’ seat total was within the 90 percent confidence interval. Still, as I mentioned earlier, we’ll want to study whether the correlation in forecast errors has become even higher than we were assuming before. If so, that would yield wider confidence intervals in our topline forecasts since it’s less likely that errors in different races cancel out (i.e., that Democrats would pull off an upset victory in one seat to compensate for an upset loss in another seat).
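To see why higher correlation widens that interval, here’s a toy Monte Carlo sketch (invented numbers, not our model): each race’s error blends a shared national error with an independent race-level error, and as the shared component grows, the distribution of total seats won spreads out.

```python
import random
import statistics

def seat_total_stdev(n_races=100, n_sims=5_000, shared_weight=0.0):
    """Toy model: a party wins a race when its error term breaks its way;
    the error blends a shared national component with an independent
    race-level component. All parameters are illustrative."""
    totals = []
    for _ in range(n_sims):
        national = random.gauss(0, 1)          # shared polling error
        seats = 0
        for _ in range(n_races):
            local = random.gauss(0, 1)         # race-specific error
            error = shared_weight * national + (1 - shared_weight) * local
            if error > 0:
                seats += 1
        totals.append(seats)
    return statistics.stdev(totals)

for w in (0.0, 0.3, 0.6):
    print(f"shared-error weight {w:.1f}: seat-total std. dev. ~{seat_total_stdev(shared_weight=w):.1f}")
# Higher correlation between races means a wider spread of possible seat
# totals, and therefore wider confidence intervals on the topline forecast.
```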

The biggest upsets of 2020

Speaking of upsets, the next table contains a comprehensive list: every race that at least one version of our models got wrong.

The biggest upsets of 2020 per our forecasts

Races in which at least one version of the FiveThirtyEight forecast rated the eventual winner as an underdog

                              WINNER’S CHANCES
OFFICE      RACE              DELUXE    CLASSIC   LITE
House       IA-1              12.5%     6.7%      13.1%
House       IA-2              12.2%     6.0%      16.3%
House       FL-26             18.1%     13.8%     19.3%
House       FL-27             18.5%     31.8%     21.6%
House       TX-23             25.9%     24.9%     31.6%
House       CA-48             32.0%     34.2%     18.3%
House       CA-39             26.0%     35.5%     23.5%
Senate      NC                31.9%     29.4%     25.5%
President   FL                -         30.9%     -
House       NY-2              43.0%     31.0%     30.8%
President   NC                -         36.1%     -
House       CA-21             41.8%     39.1%     29.8%
Senate      ME                41.0%     41.4%     30.2%
House       GA-7              43.3%     30.4%     39.0%
House       SC-1              35.7%     42.3%     38.2%
President   ME-2              -         42.7%     -
House       CA-25             45.3%     33.4%     52.1%
House       NY-11             41.5%     38.8%     58.6%
Senate      GA (Perdue)       42.6%     47.4%     49.2%
House       UT-4              43.9%     46.3%     50.5%
House       NM-2              44.5%     43.4%     56.1%
House       VA-5              50.9%     40.3%     53.0%
House       NJ-2              49.8%     54.6%     43.3%
House       PA-10             52.2%     54.5%     44.5%
House       OK-5              49.1%     50.0%     54.2%
House       IN-5              49.8%     54.8%     50.1%
House       MI-3              55.8%     47.5%     54.3%
Senate      IA                57.7%     50.2%     49.98%
House       CO-3              61.2%     47.5%     61.4%

We only published one version of our presidential forecast this year. It is most analogous to our Classic congressional forecast, so its results are listed in that column.

One thing to note is that there weren’t any particularly remarkable upsets, such as a 50-to-1 or 100-to-1 longshot winning. No candidate with less than a 5 percent chance of winning won, and only two candidates with less than a 10 percent chance did — Republicans Ashley Hinson and Mariannette Miller-Meeks in Iowa’s 1st and 2nd Congressional Districts, respectively — and then only in the Classic version of our model.

Still, the table provides a good indication of where the pain points were for Democrats. They considerably underperformed expectations in Iowa along with a series of districts with a large number of Hispanic voters, such as Texas’s 23rd Congressional District and Florida’s 26th Congressional District. That led to a considerably narrower House majority than Democrats were hoping for. Conversely, the only upsets in races where Republicans were favored were for Perdue’s Senate seat and in Georgia’s 7th Congressional District, where Democrat Carolyn Bourdeaux beat Republican Rich McCormick.

Quick thoughts on model changes for 2022

At the end of each election cycle, I like to consider what changes we might make to our models in forthcoming cycles. I’ll be brief here, since I’ve already touched on most of these.

One set of questions concerns how reliable the polling will be going forward and what changes we might make in response. The changes we made to our pollster ratings earlier this year — namely, no longer privileging live-caller telephone polls — will have some knock-on effects on our models, since the models use those ratings to determine how much weight to assign to polls from different firms.
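As a rough illustration of that mechanism (the letter grades and weights below are invented for illustration, not our actual ratings formula), a ratings-weighted polling average might look like this:

```python
# Hypothetical sketch of how pollster ratings can feed poll weights.
# The grade-to-weight mapping is invented; it is not FiveThirtyEight's
# actual formula.

GRADE_WEIGHT = {"A": 1.0, "B": 0.7, "C": 0.4, "D": 0.2}

def weighted_polling_average(polls):
    """polls: list of (margin, pollster_grade) tuples."""
    total = sum(GRADE_WEIGHT[g] for _, g in polls)
    return sum(m * GRADE_WEIGHT[g] for m, g in polls) / total

polls = [(+4.0, "A"), (+7.0, "C"), (+2.0, "B")]
print(f"Weighted average margin: D+{weighted_polling_average(polls):.1f}")
```

If live-caller polls no longer get a ratings premium, their grades, and thus their effective weight in an average like this one, move toward those of other methodologies.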

Beyond that, though, I think questions about polling accuracy are probably more pertinent to 2024 than 2022. That’s partly because our congressional model already relies on a mix of polling and non-polling indicators and so is less sensitive to inaccurate polling than our presidential forecast might be. It’s also partly because the polls did quite well in the previous midterm election in 2018, so it seems prudent to see how they fare in 2022 before necessarily concluding that congressional polling is broken. 

Also, I’d note that our models are already fairly conservative with respect to how accurate they expect the polling to be. That is, they assume that polling errors are common. That’s how our forecasts managed to be well-calibrated despite a fairly large polling error in 2020.
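One common way to build in that kind of conservatism (a sketch of the general technique; our exact error distributions and parameters aren’t shown here) is to simulate outcomes with a wide, fat-tailed error distribution rather than taking a polling lead at face value:

```python
import numpy as np

def win_probability(poll_lead, error_scale=4.0, df=5, n_sims=100_000):
    """Chance the polling leader actually wins if the true polling error
    is fat-tailed (Student's t). The scale and degrees of freedom here
    are illustrative assumptions, not our model's actual parameters."""
    errors = error_scale * np.random.standard_t(df, size=n_sims)
    return float(np.mean(poll_lead + errors > 0))

for lead in (2, 5, 8):
    print(f"{lead}-point lead -> roughly {win_probability(lead):.0%} chance of winning")
# Fat tails keep win probabilities modest even for solid polling leads,
# which is how a forecast can stay well calibrated in a year when the
# polls themselves miss badly.
```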

What else to look at before 2022? Prior to last year, we’d already increased the weight given to partisanship in the Deluxe and Classic models and reduced the weight associated with variables such as candidate experience and fundraising. That looks to have been a good first step, but I suspect when we throw the 2020 results in and reanalyze the data, it will compel some further changes in that direction. And as I mentioned, while our models already assume that the errors between different races are correlated, those correlations may be even higher in an era of greater partisanship, so we’ll want to look at that, too.

Finally, there’s the question of what might happen to election analysis in an America where the Republicans might seek to nullify or invalidate election results. In some sense, these questions are external to our model, which is intended to forecast the results assuming all legal votes are counted and that electoral outcomes are respected. But we’ll keep our minds open as to whether there are ways to present our forecasts that make the scope and mission of our model clearer, or different types of model outputs that could shine light on these questions. In the meantime, we’ll continue to cover the threats to electoral democracy through our reporting.
