How Our 2022 Midterm Forecasts Performed


Let’s get this out of the way up front: There was a wide gap between the perception of how well polls and data-driven forecasts did in 2022 and the reality of how they did … and the reality is that they did pretty well.

While some polling firms badly missed the mark, in the aggregate the polls had one of their most accurate cycles in recent history. As a result, FiveThirtyEight’s forecasts had a pretty good year, too. Media proclamations of a “red wave” occurred largely despite polls that showed a close race for the U.S. Senate and a close generic congressional ballot. It was the pundits who made the red wave narrative, not the data.

With that said, the polls weren’t perfect.

  • Polling averages and forecasts did slightly underestimate Democrats, though the differences were modest — certainly less than the extent to which they underestimated Republicans in 2016 and 2020.
  • Some pollsters — such as Trafalgar Group and Rasmussen Reports, which have a history of Republican-leaning polling — had a conspicuously poor year. 
  • There are different methods of polling aggregation and forecasting. The margins in the polling averages from RealClearPolitics were on average 1.3 percentage points more favorable to Republicans in the most competitive Senate races than those published by FiveThirtyEight. Similarly, RCP’s generic ballot polling average was 1.3 points more favorable to the GOP than FiveThirtyEight’s. In this article, I’ll only be evaluating FiveThirtyEight’s forecasts, but methodological choices made a difference.
  • Finally, Democrats’ relatively strong year — although there were some precedents for it — defied a lot of midterm history. It’s not just that the polls did better than the conventional wisdom; they also did well relative to political science or “fundamentals”-based forecasting methods.

So let’s dig into the FiveThirtyEight forecast. As you may know if you follow our work closely, we publish three different versions of our congressional and gubernatorial forecasts. Version one is a Lite forecast that sticks as much as possible to the polls themselves. (In races that have little polling, Lite makes inferences from the generic ballot and from polls of other races.) Our Classic forecast blends the polls with other data — for instance, information on candidate fundraising, incumbency and the voting history of the state or district. Finally, our Deluxe forecast adds in another layer, namely race ratings from outside groups such as The Cook Political Report. Deluxe is the default when you pull up our forecast interactive and the version that we use most often when describing our forecasts.

But there was, in some ways, a fourth version of our model this year. Because of a data processing error, our Deluxe version was using outdated House race ratings from one of the expert groups, Inside Elections. Essentially, those ratings were frozen in time as of late September. The impact on the forecast was minor, but not to the point of being trivial. In this article, I’ll evaluate our Deluxe forecasts both as published (that is, with outdated Inside Elections ratings) and as revised with the correct ratings. (Ironically, the as-published forecasts were actually slightly more accurate than the revised ones — more on that below.)

But first, here were the topline numbers for the various versions of our forecast:

[Chart: Dot plot with 80 percent confidence intervals of the final Senate and House forecasts in each version of FiveThirtyEight’s 2022 model versus actual election results. Election results were relatively close to the forecast means and fell within the 80 percent confidence intervals for both the Senate and the House in every version of the model.]

Both the Democrats’ one-seat gain in the Senate and Republicans’ nine-seat gain in the House were well within the 80 percent confidence intervals established by our various models. True, the actual results were not in the dead center of the range: Democrats did somewhat better than the average forecasted result in both chambers. But it’s hard to hit an exact bullseye (although we get lucky and come close now and then) — that’s the whole reason to express uncertainty in a forecast.

In percentage terms, the forecasts gave Democrats somewhere between a 41 and 50 percent chance of keeping control of the Senate. Even using the 41 percent number, you would have had a decent-sized (and ultimately winning) bet on Democrats relative to prediction market odds, which put their chances at 32 percent. That is to say, the FiveThirtyEight forecasts were more bullish on Democrats than the conventional wisdom. And the Lite and Classic forecasts  — which rely entirely on objective indicators and not expert ratings — saw the Senate as a true dead heat.
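For the record, here’s the arithmetic on that edge (a stylized calculation that assumes a simple binary prediction market paying $1 per winning share; this is an illustration of the gap, not betting advice):

```python
# Rough expected-value sketch: model probability vs. implied market probability.
# The 41 and 32 percent figures come from the text above; the payout structure
# (a binary contract that pays $1 if the event happens) is an assumption.

model_prob = 0.41   # low-end FiveThirtyEight estimate of Democrats holding the Senate
market_prob = 0.32  # implied probability from prediction market prices

price = market_prob                  # cost of one "Democrats hold" share, in dollars
expected_payout = model_prob * 1.0   # each share pays $1 if Democrats hold the Senate

ev_per_dollar = (expected_payout - price) / price
print(f"Expected value: {ev_per_dollar:+.1%} per dollar staked")
# Expected value: +28.1% per dollar staked -- a sizable edge, if the model is right
```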

Meanwhile, Republicans won the aggregate popular vote for the House by 2.8 percentage points. That is pretty close to the target established by our forecasts, which projected Republicans to win it by margins ranging from 2.4 to 4.0 points.

In other words, Republicans won about as many national votes as expected. There was not any sort of disproportionate youth turnout wave or other Democratic turnout surge. Instead, according to exit polls, more Republican-identified than Democratic-identified voters turned out in November. 

However, Democrats did an especially good job of translating votes into seats. How? Republicans ran up the score in uncompetitive races while Democrats eked out tight ones. A big part of the story is candidate quality. In many swing states and districts, Republicans offered voters far-right, inexperienced and/or scandal-plagued candidates, turning off independent voters. It may also have been that Democrats did a better job of directing financial and other resources to the highest-stakes races. Differences on the margin mattered: Democrats won four of the six Senate races and four of the five gubernatorial races decided by 5 percentage points or fewer.

Next, let’s check the calibration of the FiveThirtyEight forecasts, which is a way to see if the leading candidate won about as often as advertised. (For instance, did candidates who had a 70 percent chance win around 70 percent of the time?) We break our forecasts down into four categories: toss-up (where the leader had between a 50 and 60 percent chance of winning); lean (a 60 to 75 percent chance); likely (a 75 to 95 percent chance) and solid (a 95 percent or greater chance). Here were the numbers for the various versions of the forecasts — first splitting the results by whether Democrats or Republicans were favored, then showing all races combined.
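In code, the check is straightforward. Here’s a minimal sketch (illustrative only, not our production code) of how you might bucket races by the favorite’s win probability and compare expected wins — that is, the sum of the favorites’ win probabilities in each bucket, which is equivalent to races multiplied by average odds — against actual wins:

```python
# Minimal calibration-check sketch (illustrative, not FiveThirtyEight's actual code).
# Each race is (favorite_win_prob, favorite_won); probabilities describe the
# favorite, so they always fall in [0.5, 1.0].

BUCKETS = [
    ("Toss-up", 0.50, 0.60),
    ("Lean",    0.60, 0.75),
    ("Likely",  0.75, 0.95),
    ("Solid",   0.95, 1.01),  # upper bound just above 1.0 so p = 1.0 lands here
]

def calibration_table(races):
    """races: list of (favorite_win_prob, favorite_won) tuples."""
    rows = []
    for name, lo, hi in BUCKETS:
        bucket = [(p, won) for p, won in races if lo <= p < hi]
        n = len(bucket)
        expected_wins = sum(p for p, _ in bucket)        # sum of favorites' odds
        actual_wins = sum(1 for _, won in bucket if won)
        rows.append((name, n, expected_wins, actual_wins))
    return rows

# Hypothetical example: three toss-ups and one solid favorite
races = [(0.55, True), (0.52, False), (0.58, True), (0.97, True)]
for name, n, exp, act in calibration_table(races):
    if n:
        print(f"{name:8s} races={n}  expected={exp:.1f}  actual={act}")
```

In a well-calibrated forecast, expected and actual wins should roughly match within each bucket, up to sampling noise.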

How well our Lite midterms forecast did

Final Lite version of FiveThirtyEight’s House, Senate and gubernatorial forecasts as of Nov. 8, 2022, versus actual results

Democrats favored

CATEGORY          ODDS    RACES  EXP. WINS  EXP. CORRECT  ACTUAL WINS  ACTUAL CORRECT
Toss-up (tilt D)  50-60%      7          4           55%            3             43%
Lean D            60-75%     26         18           68%           23             88%
Likely D          75-95%     36         32           88%           36            100%
Solid D           ≥95%      168        167           99%          168            100%
All races                   237        220           93%          230             97%

Republicans favored

CATEGORY          ODDS    RACES  EXP. WINS  EXP. CORRECT  ACTUAL WINS  ACTUAL CORRECT
Toss-up (tilt R)  50-60%     11          6           53%            3             27%
Lean R            60-75%     13          9           67%            8             61%
Likely R          75-95%     44         38           86%           40             91%
Solid R           ≥95%      201        200           99%          201            100%
All races                   269        252           94%          252             94%

All races combined

CATEGORY          ODDS    RACES  EXP. WINS  EXP. CORRECT  ACTUAL WINS  ACTUAL CORRECT
Toss-up           50-60%     18         10           54%            6             33%
Lean              60-75%     39         26           67%           31             79%
Likely            75-95%     80         69           87%           76             95%
Solid             ≥95%      369        367           99%          369            100%
All races                   506        472           93%          482             95%

Includes special elections that took place on Nov. 8, 2022.

Expected wins are the number of races multiplied by the favorite’s odds of winning in each category.

Overall, calibration of the Lite forecast was pretty good, but with some asymmetries between the parties. Based on our forecast, Republicans were supposed to win 252 races (combining House, Senate and gubernatorial contests) and they in fact won exactly 252. Democrats were supposed to win 220 races and instead won 230. In the aggregate, the Lite forecasts were slightly underconfident — meaning there were somewhat fewer upsets than expected — although that’s what you might expect in a cycle where the polls had a strong year.

How well our Classic midterms forecast did

Final Classic version of FiveThirtyEight’s House, Senate and gubernatorial forecasts as of Nov. 8, 2022, versus actual results

Democrats favored

CATEGORY          ODDS    RACES  EXP. WINS  EXP. CORRECT  ACTUAL WINS  ACTUAL CORRECT
Toss-up (tilt D)  50-60%     14          8           55%           13             93%
Lean D            60-75%     23         16           68%           16             70%
Likely D          75-95%     30         26           88%           30            100%
Solid D           ≥95%      172        171          >99%          172            100%
All races                   239        221           92%          231             97%

Republicans favored

CATEGORY          ODDS    RACES  EXP. WINS  EXP. CORRECT  ACTUAL WINS  ACTUAL CORRECT
Toss-up (tilt R)  50-60%      9          5           54%            2             22%
Lean R            60-75%     14          9           66%            7             50%
Likely R          75-95%     26         23           89%           25             96%
Solid R           ≥95%      218        217          >99%          217            >99%
All races                   267        254           95%          251             94%

All races combined

CATEGORY          ODDS    RACES  EXP. WINS  EXP. CORRECT  ACTUAL WINS  ACTUAL CORRECT
Toss-up           50-60%     23         13           55%           15             65%
Lean              60-75%     37         25           67%           23             62%
Likely            75-95%     56         49           88%           55             98%
Solid             ≥95%      390        389          >99%          389            >99%
All races                   506        475           94%          482             95%

Includes special elections that took place on Nov. 8, 2022.

Expected wins are the number of races multiplied by the favorite’s odds of winning in each category.

The calibration story is basically the same for our Classic forecasts. Note that there were very few long-shot upsets: Favorites won 55 of the 56 races labeled “likely” and 389 of the 390 rated “solid.”

And last but not least, Deluxe followed more or less the same script. I’ll present both the published and revised versions of Deluxe together since they make for a fun comparison:

How well our published Deluxe midterms forecast did

Final Deluxe version of FiveThirtyEight’s House, Senate and gubernatorial forecasts (which were affected by a data processing error) as of Nov. 8, 2022, versus actual results

Democrats favored

CATEGORY          ODDS    RACES  EXP. WINS  EXP. CORRECT  ACTUAL WINS  ACTUAL CORRECT
Toss-up (tilt D)  50-60%      8          4           55%            7             88%
Lean D            60-75%     19         13           67%           16             84%
Likely D          75-95%     30         26           87%           29             97%
Solid D           ≥95%      182        181          >99%          182            100%
All races                   239        224           94%          234             98%

Republicans favored

CATEGORY          ODDS    RACES  EXP. WINS  EXP. CORRECT  ACTUAL WINS  ACTUAL CORRECT
Toss-up (tilt R)  50-60%      8          4           55%            3             38%
Lean R            60-75%      8          5           65%            6             75%
Likely R          75-95%     24         21           87%           19             79%
Solid R           ≥95%      227        226          >99%          226            >99%
All races                   267        259           96%          254             95%

All races combined

CATEGORY          ODDS    RACES  EXP. WINS  EXP. CORRECT  ACTUAL WINS  ACTUAL CORRECT
Toss-up           50-60%     16          9           55%           10             63%
Lean              60-75%     27         18           67%           22             82%
Likely            75-95%     54         47           87%           48             89%
Solid             ≥95%      409        408          >99%          408            >99%
All races                   506        481           95%          488             96%

Includes special elections that took place on Nov. 8, 2022.

Expected wins are the number of races multiplied by the favorite’s odds of winning in each category.

How well our revised Deluxe midterms forecast did

What the final Deluxe version of FiveThirtyEight’s House, Senate and gubernatorial forecasts would have said on Nov. 8, 2022, in the absence of a data processing error, versus actual results

Democrats favored

CATEGORY          ODDS    RACES  EXP. WINS  EXP. CORRECT  ACTUAL WINS  ACTUAL CORRECT
Toss-up (tilt D)  50-60%     12          7           55%            7             58%
Lean D            60-75%     16         11           68%           15             94%
Likely D          75-95%     24         21           87%           24            100%
Solid D           ≥95%      182        181          >99%          182            100%
All races                   234        220           94%          228             97%

Republicans favored

CATEGORY          ODDS    RACES  EXP. WINS  EXP. CORRECT  ACTUAL WINS  ACTUAL CORRECT
Toss-up (tilt R)  50-60%     12          6           54%            2             17%
Lean R            60-75%     13          9           66%            6             46%
Likely R          75-95%     23         20           88%           22             96%
Solid R           ≥95%      224        223          >99%          223            >99%
All races                   272        259           95%          253             93%

All races combined

CATEGORY          ODDS    RACES  EXP. WINS  EXP. CORRECT  ACTUAL WINS  ACTUAL CORRECT
Toss-up           50-60%     24         13           55%            9             38%
Lean              60-75%     29         20           68%           21             72%
Likely            75-95%     47         41           87%           46             98%
Solid             ≥95%      406        405          >99%          405            >99%
All races                   506        479           94%          481             95%

Includes special elections that took place on Nov. 8, 2022.

Expected wins are the number of races multiplied by the favorite’s odds of winning in each category.

Note that the as-published version of the Deluxe model actually made more correct “calls” (488) than the revised version did (481), even though the published version was using out-of-date Inside Elections ratings! Some of this probably just reflects luck in the closest contests. Deluxe (as published) identified the winners correctly in 32 of 43 “toss-up” and “lean” races (74 percent), while Deluxe (revised) went 30-of-53 (57 percent) in these categories.

However, the published version of the Deluxe ratings was also somewhat more optimistic for Democrats than the revised version. Since Democrats had a pretty good night overall, this helped it get a few more calls right. Mostly this reflects that the conventional wisdom grew more bearish on Democrats between late September and Election Day — and the conventional wisdom in September was closer to what actually transpired. So in some ways, using the late-September version of the Inside Elections ratings was a blessing in disguise.

Next up, a chart you’ll love if you want to give us a hard time: the biggest upsets of the year.

The biggest upsets of 2022

Races in which at least one version of the final FiveThirtyEight forecast rated the eventual winner as an underdog

Values are the eventual winner’s pre-election win probability in each version of the forecast.

OFFICE    RACE    WINNING PARTY   LITE    CLASSIC   DELUXE (PUB.)   DELUXE (REV.)
House     WA-3    D               15.3%      4.0%            2.2%            4.6%
House     CO-8    D               24.9      15.1             9.0            17.7
House     OH-1    D               35.9      29.5            16.1            29.9
House     OH-13   D               42.5      34.8            18.6            33.9
House     NY-4    R               47.6      28.9            22.3            29.5
House     NM-2    D               31.1      39.2            22.4            37.2
House     NC-13   D               54.5      43.0            23.4            39.1
House     NY-17   R               36.2      43.8            29.9            41.5
House     NY-3    R               46.8      33.7            31.7            41.1
Governor  AZ      D               33.3      36.2            32.0            34.2
House     CA-13   R               35.3      35.9            33.4            45.2
Senate    GA      D               47.7      46.2            36.8            39.6
Senate    PA      D               55.5      53.4            42.7            46.0
House     PA-7    D               46.3      33.4            43.9            32.4
House     TX-15   R               63.9      38.4            45.9            60.1
Governor  WI      D               49.9      56.9            47.0            48.9
House     TX-34   D               73.6      56.6            47.8            50.9
Senate    NV      D               41.6      45.9            48.8            50.8
House     AK-1    D               47.8      39.1            50.4            48.2
House     VA-2    R               43.4      61.1            52.2            66.9
House     NV-1    D               17.9      33.7            53.2            45.8
House     RI-2    D               17.2      48.1            53.7            43.8
House     PA-17   D               48.5      46.4            54.2            41.2
House     NY-19   D               70.3      51.7            57.6            45.4
House     PA-8    D               68.2      53.5            59.3            46.4
House     CT-5    D               58.3      55.0            60.7            47.3
House     CA-22   R               39.1      39.2            60.9            47.3
House     NV-3    D               39.3      56.4            61.5            51.8
House     IL-17   D               66.0      57.3            62.2            49.3
House     CA-27   R               52.8      37.8            63.4            50.8
House     NY-22   R               46.2      37.7            64.2            47.7
House     NH-1    D               49.5      50.9            67.0            58.2
House     IL-6    D               89.7      49.6            67.3            65.4
House     MD-6    D               70.3      47.9            71.5            69.1
House     OR-6    D               31.8      59.8            71.9            61.3

Deluxe (published) reflects the version of the Deluxe forecast published on FiveThirtyEight as of midnight on Election Day (Nov. 8, 2022). Deluxe (revised) reflects what the forecasts would have said if we had corrected a data processing error.

The major upset here is in Washington’s 3rd Congressional District, where Democrat Marie Gluesenkamp Perez defeated Republican Joe Kent despite having only a 2 percent chance in Deluxe (as published) and a 4 percent chance in Classic and Deluxe (revised). That’s a big upset, but it’s also about what you’d expect. As you can see from the calibration charts, an upset or two like this is par for the course given that we made forecasts for more than 500 races. So this is a sign of solid calibration.
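A quick back-of-the-envelope check makes the point (the inputs here are rough assumptions in the spirit of the calibration tables above, not exact model outputs): with roughly 400 races rated “solid” and favorites in that bucket averaging around a 99.5 percent win probability, a couple of solid-race upsets are expected across the full slate.

```python
# Back-of-the-envelope sketch with rough, assumed inputs (not exact model output):
# if ~400 races are rated "solid" and favorites in that bucket win about 99.5
# percent of the time, how surprising is an upset like WA-3?

n_solid = 400      # approximate number of "solid" races in each forecast version
p_upset = 0.005    # assumed average upset probability for a "solid" favorite

p_zero_upsets = (1 - p_upset) ** n_solid
expected_upsets = n_solid * p_upset
print(f"P(zero solid-race upsets): {p_zero_upsets:.0%}")    # roughly 13%
print(f"Expected solid-race upsets: {expected_upsets:.1f}") # about 2
```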

What we might change — and what we don’t think we’ll change — for 2024 and 2026

I typically close these forecast reviews by considering what modifications I might make to our models in the future. In certain ways, though, this election lowered my stress level a bit. The polls had a relatively good year, even if that by no means rules out problems going forward. 

Moreover, one of the core hypotheses of our forecast is that polling bias is unpredictable: Polls will be biased against Republicans in some years and against Democrats in others, but it’s hard to predict the direction of the bias in advance. That was the case in 2022, when Democrats were modestly underrated by the polls — albeit with some misses in both directions — after Republicans considerably overperformed their polls in 2016 and 2020.

However, there are a few things that I’m thinking about:

1. Given that the Deluxe forecasts haven’t really outperformed Lite or Classic since we introduced the current version of the model in 2018, there’s a question of what utility they serve. In principle, the expert ratings used in Deluxe can add a lot of value by considering measures of candidate quality that may be hard to spot in objective indicators, or because the groups that publish these ratings have access to inside information such as internal polling. But they can also introduce a subjective or “vibes”-based element, which certainly didn’t help in 2022.

There’s also a potential issue of recursiveness. If the expert groups partly look to the FiveThirtyEight forecast for guidance in how they rate races, but the FiveThirtyEight model in turn uses the expert ratings, the two methods become less independent from one another.

I’m not sure what we’ll do about Deluxe quite yet, but it’s a fairly close call between keeping things as is, scrapping the Deluxe forecast, and keeping Deluxe but making it a secondary version and Classic the default version.

2. Our model has a pretty sophisticated method for considering how the results in different states and districts are correlated — for instance, it understands that demographically similar states and districts tend to move in the same direction. But it likely understates intrastate correlations.

That was an issue this cycle, as Republicans experienced a localized “red wave” in Florida and New York despite having a disappointing election nationally. These effects were partly the result of turnout differentials driven by candidates at the top of the ticket, such as the tailwind from Republican Gov. Ron DeSantis’s strength in Florida or the drag from the lack of enthusiasm for Democratic Gov. Kathy Hochul in New York.

The problem is that our model underestimates the degree to which a district in upstate New York and one in downstate New York are correlated with one another, even if the two districts are fairly different demographically. Conversely, it slightly overestimates the degree to which a district in New York and one in Pennsylvania are correlated. We will do some due diligence on how common these patterns have been in past elections — and how much practical effect they have on the model.
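As a stylized illustration of the structure in question (a sketch with made-up error sizes, not our model’s actual covariance math), you can decompose each race’s polling error into national, state and district components. The size of the state component is what determines how tightly two New York districts move together relative to a New York district and a Pennsylvania one:

```python
import numpy as np

# Stylized error decomposition (illustrative only; the error magnitudes and the
# three-component structure are assumptions, not FiveThirtyEight's actual model).
# Each race's polling error = national shift + state shift + district noise.
# A bigger state term means districts within the same state swing together more,
# even when they differ demographically.

rng = np.random.default_rng(0)
SIGMA_NATIONAL, SIGMA_STATE, SIGMA_DISTRICT = 2.0, 2.0, 3.0  # percentage points

def simulate_errors(states, n_sims=100_000):
    """states: one state label per district. Returns array (n_sims, n_districts)."""
    labels = sorted(set(states))
    national = rng.normal(0, SIGMA_NATIONAL, size=(n_sims, 1))
    # One shared draw per state, reused by every district in that state
    state_shift = {s: rng.normal(0, SIGMA_STATE, size=(n_sims, 1)) for s in labels}
    district = rng.normal(0, SIGMA_DISTRICT, size=(n_sims, len(states)))
    return national + np.hstack([state_shift[s] for s in states]) + district

# Three districts: two in New York, one in Pennsylvania
errors = simulate_errors(["NY", "NY", "PA"])
corr = np.corrcoef(errors.T)
print(f"NY-NY correlation: {corr[0, 1]:.2f}")  # ~0.47: shares national + state terms
print(f"NY-PA correlation: {corr[0, 2]:.2f}")  # ~0.24: shares only the national term
```

Tuning the relative sizes of those components is exactly the kind of due diligence described above.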

3. Finally, this is more in the category of “note to ourselves,” but we need to review our internal processes for double-checking that data inputs are working properly, given the error involving the Inside Elections ratings. Our models combine and aggregate a lot of different data sources, and that’s part of what makes them robust — but complex models can also introduce more opportunities for error.

There’s also one department where I’m not considering major changes, which is our process for determining which polls we include in our averages and forecasts.

Despite complaints both before and after the election about Republican-leaning polling firms “flooding the zone,” our overall forecasts and polling averages were both fairly accurate and relatively unbiased in 2022. It doesn’t seem prudent to me to have continued to “trust the process” after 2016 and 2020, when polling averages had a strong pro-Democratic bias, but then to panic and radically revise our method after polling averages had a slight pro-Republican bias in 2022.

That doesn’t mean we won’t consider changes around the margins. But I’ve been thinking about these issues for a long time, and our polling averages and our model already have a lot of defense mechanisms against zone-flooding. The most important is our house-effects adjustment: If a polling firm consistently shows Democratic- or Republican-leaning results, the model detects that and adjusts its results accordingly. Expressly partisan polls (such as an internal poll for a campaign or the RNC) also receive special handling: Basically, the model assumes they are biased until proven otherwise. And our pollster ratings are designed to be self-correcting. When we update our ratings later this year with results from 2022, pollsters such as Trafalgar and Rasmussen will take a hit, which will give them less influence in the polling averages in 2024.
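To illustrate the general idea behind a house-effects adjustment (a simplified sketch, not our actual implementation, which estimates these effects jointly with trend lines and other factors), you can estimate each firm’s persistent lean as its average deviation from the consensus in the races it polls, then subtract that lean out:

```python
from collections import defaultdict
from statistics import mean

# Simplified house-effects sketch (illustrative only). Each poll is a tuple of
# (pollster, race, margin), where margin is the Democratic margin in points.

def house_effects(polls):
    """Estimate each pollster's lean as its mean deviation from the race consensus."""
    by_race = defaultdict(list)
    for pollster, race, margin in polls:
        by_race[race].append(margin)
    consensus = {race: mean(margins) for race, margins in by_race.items()}

    deviations = defaultdict(list)
    for pollster, race, margin in polls:
        deviations[pollster].append(margin - consensus[race])
    return {pollster: mean(devs) for pollster, devs in deviations.items()}

def adjust(polls, effects):
    """Subtract each firm's estimated lean from its raw margins."""
    return [(p, race, margin - effects[p]) for p, race, margin in polls]

# Hypothetical data: Firm B runs ~3 points more Republican than Firm A
polls = [("A", "PA-Sen", 2.0), ("B", "PA-Sen", -1.0),
         ("A", "GA-Sen", 1.0), ("B", "GA-Sen", -2.0)]
effects = house_effects(polls)
print(effects)                  # {'A': 1.5, 'B': -1.5}
print(adjust(polls, effects))   # both firms now agree on each race's margin
```

The payoff of this design is that a flood of leaning polls mostly shifts the house-effect estimates rather than the average itself.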

Finally, we don’t want to have to make a lot of ad-hoc decisions about which polls to include or not, both because that would be extremely time-consuming and because it would introduce avenues for bias when everyone is stressed out in the middle of an election campaign.

Keep in mind that this is a long-term process: It takes many election cycles to determine which polling firms are most reliable. For instance, some of the polling firms that were least accurate in 2022 were actually the most accurate in 2020. I think it’s an enormous mistake in forecasting to constantly “fight the last war” when you have many years or a larger batch of data to evaluate. It’s exactly the sort of mistake that vibes-driven pundits make: They assume that whatever happened in the previous election or two will happen again. Our approach is to create a good process and to play the long game — and it works out pretty well more often than not.


