The Death Of Polling Is Greatly Exaggerated



Polls probably aren’t at the top of your mind right now. We’re more than four months removed from the 2020 election, and we still have almost 20 months to go until the midterms.

That’s why it’s the perfect time to launch the latest update to our FiveThirtyEight pollster ratings, which we just released today! We’d encourage you to go check out the ratings — as well as our revamped interactive, which features individual pages for each pollster with more detail than ever before on how we calculate the ratings.


No, but seriously … I think it’s nice to have a little distance from the heat of an election cycle when talking about polls. When I first looked at the performance of the polls in November, it came just after the election had been called for Joe Biden — and after several anxious days of watching states slowly report their mail ballots, which produced a “blue shift” in several states where the polls had initially appeared to miss badly. We also didn’t yet know that Democrats would win control of the U.S. Senate, thanks to a pair of January runoffs in Georgia. Meanwhile, then-President Donald Trump was still refusing to concede. In that environment, a decidedly mediocre year for the polls was being mistaken for a terrible one, a conclusion that wasn’t really justified.

So what does 2020 look like with the benefit of more hindsight — and the opportunity to more comprehensively compare the polls against the final results? The rest of this article will consist of four parts:

  • First, our review of how the polls did overall in 2020, using the same format that we’ve traditionally applied when updating our pollster ratings.
  • Second, a look at which polling firms did best and worst in 2020.
  • Third, our evaluation of how the polls have performed both in the short run and long run based on various methodological categories. And we’ll announce an important change to how our pollster ratings will be calculated going forward. Namely — breaking news here — it’s no longer clear that live-caller telephone polls are outperforming other methods, so they’ll no longer receive privileged status in FiveThirtyEight’s pollster ratings and election models.
  • Finally, some other, relatively minor technical notes about changes in how we’re calculating the pollster ratings. Some of you may want to skip this last part.

How the polls did in 2020

Our pollster ratings database captures all polls conducted in the final 21 days of presidential primary elections since 2000, as well as general elections for president, governor, U.S. Senate and House since 1998. It also includes polls on special elections and runoffs for these offices. So, technically speaking, the data you’ll see below covers the entire 2019-20 election cycle, though the majority of it comes from elections on Nov. 3, 2020. We’re also classifying the Georgia Senate runoffs, held on Jan. 5, 2021, as part of the 2019-20 cycle.

First up, let’s start with our preferred way to evaluate poll accuracy: calculating the average error observed in the polls. We do this by comparing the margin between the top two finishers in the poll to the actual results; for example, if a poll had Biden leading Trump by 2 percentage points in a state and Trump actually won by 4 points, that would be a 6-point error. 
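If it helps to see that calculation spelled out, here's a minimal Python sketch; the function name and numbers are purely illustrative, not code we actually run:

    def poll_error(poll_margin, actual_margin):
        """Absolute difference, in percentage points, between the margin a poll
        showed between the top two finishers and the actual margin."""
        return abs(poll_margin - actual_margin)

    # The example above: Biden +2 in the poll, Trump +4 (i.e., Biden -4) in the result.
    print(poll_error(poll_margin=2, actual_margin=-4))  # prints 6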

In the table below, we calculate the average error for all polls in our database for 2019-20 and how that compares with previous cycles, excluding polling firms banned by FiveThirtyEight and weighting by how prolific a pollster was in a given cycle. We also break out the polling error by office.

It was a mediocre year for the polls in 2020

Weighted-average error (in percentage points) in polls in the final 21 days of the campaign

Cycle | Pres. Primary | Pres. General | Governor | U.S. Senate | U.S. House | Combined
1998 | – | – | 8.1 | 7.5 | 7.1 | 7.7
1999-2000 | 7.9 | 4.4 | 4.9 | 6.0 | 4.4 | 5.6
2001-02 | – | – | 5.3 | 5.4 | 5.4 | 5.4
2003-04 | 7.0 | 3.2 | 5.4 | 5.3 | 5.8 | 4.8
2005-06 | – | – | 5.1 | 5.2 | 6.5 | 5.7
2007-08 | 7.5 | 3.5 | 4.4 | 4.6 | 5.8 | 5.5
2009-10 | – | – | 4.7 | 5.0 | 7.0 | 5.8
2011-12 | 8.9 | 3.7 | 4.9 | 4.7 | 5.6 | 5.3
2013-14 | – | – | 4.5 | 5.3 | 6.8 | 5.3
2015-16 | 10.2 | 4.9 | 5.4 | 5.0 | 5.5 | 6.8
2017-18 | – | – | 5.1 | 4.2 | 5.1 | 4.9
2019-20 | 10.2 | 5.0 | 6.4 | 5.9 | 6.4 | 6.3
All years | 9.2 | 4.3 | 5.4 | 5.4 | 6.3 | 6.0

A dash indicates a cycle with no presidential election.

Pollsters that are banned by FiveThirtyEight are not included in the averages. So as not to give a more prolific pollster too much influence over the average, polls are weighted by one over the square root of the number of polls each pollster conducted in a specific category.
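For the curious, here's a rough Python sketch of that weighting, with made-up pollsters and errors; it isn't our production code, just an illustration of the one-over-square-root rule:

    from collections import Counter
    from math import sqrt

    def weighted_average_error(polls):
        """polls is a list of (pollster, error) pairs for one category.
        Each poll gets a weight of 1 / sqrt(its pollster's poll count)."""
        counts = Counter(name for name, _ in polls)
        weighted = [(1 / sqrt(counts[name]), err) for name, err in polls]
        return sum(w * e for w, e in weighted) / sum(w for w, _ in weighted)

    polls = [("Pollster A", 3.0), ("Pollster A", 5.0), ("Pollster A", 7.0),
             ("Pollster B", 2.0)]
    print(round(weighted_average_error(polls), 1))  # Pollster B counts for more per poll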

Looking at all the polls in the 2019-20 cycle, the average error was 6.3 percentage points. That makes it the third-worst of the 12 election cycles included in our pollster ratings, better only than 1998 (an average error of 7.7 points) and 2015-16 (6.8 points). Don’t read too much into the difference between 2019-20 and 2015-16, though: every subcategory of polls in 2015-16 (e.g., U.S. Senate polls) was equally or more accurate than in 2019-20, and 2015-16’s worse combined number is largely a matter of mix, since a bigger share of its polls were presidential primary polls, which tend to carry much higher errors.

Breaking the results down by election type doesn’t make 2019-20 look much better. It was the second-worst out of 12 gubernatorial cycles and the third-worst out of 12 U.S. Senate cycles. In races for the U.S. House, 2020’s performance was closer to average. But 2020 had the highest average error of the six presidential general election cycles used in the pollster ratings (albeit only a tenth of a point worse than 2016). And it was tied with 2016 for being the worst cycle for presidential primary polls … although the primary calendar offered some decent excuses for why polling those races was tough.


But while polling accuracy was mediocre in 2020, it also wasn’t any sort of historical outlier. The overall average error of 6.3 points in 2019-20 is only slightly worse than the average error across all polls since 1998, which is 6.0 points. There were also presidential years before the period our pollster ratings cover, such as in 1948 and 1980, when the polls exhibited notably larger errors than in 2020.

So while the polling industry has major challenges — including, as we’ll detail later, the fact that live-caller telephone polls may no longer be the industry gold standard — it’s also premature to conclude that the sky is falling. As you can see from the chart above, there isn’t any particularly clear statistical trend showing that polls have gotten worse over time. Yes, both 2016 and 2020 were rather poor years, but sandwiched between them was an excellent year for the polls in 2018. And in their most recent test, the Georgia Senate runoffs, the polls were extremely accurate.

Of course, there’s a lot more to unpack here. Why have the polls been pretty accurate in recent years in “emerging” swing states, such as Georgia and Arizona, but largely terrible in the Upper Midwest? Why did they do poorly in 2016 and 2020 but pretty well in Trump-era elections — like the Georgia runoffs or the Alabama Senate special election in 2017 — when Trump himself wasn’t on the ballot? We don’t really have time to explore the landscape of theories in the midst of this already very long article, although these are topics we’ve frequently covered at FiveThirtyEight. At the same time, I hope this macro-level view has been helpful and an evolution beyond the somewhat misinformed “polling is broken!” narrative.

Next, let’s review a couple of other metrics to gauge how accurate the polls were. One of them makes 2020 look a bit better — while the other makes it look worse and gets at what we think is the strongest reason for concern going forward: not that the polls were necessarily that inaccurate, but that almost all the misses came in the same direction, underestimating GOP support.

First, the hits and misses, or how often the polls “called” the winner. By this measure, the 2019-20 cycle was pretty average, historically speaking. Across the cycle, the winner was identified correctly in 79 percent of polls, which matches the 79 percent hit rate across all cycles in our database.

Still, 4 in 5 polls got the 2020 winners right

Weighted-average share of polls (in percent) that correctly identified the winner in the final 21 days of the campaign

Cycle | Pres. Primary | Pres. General | Governor | U.S. Senate | U.S. House | Combined
1998 | – | – | 85 | 87 | 49 | 75
1999-2000 | 95 | 67 | 82 | 83 | 53 | 76
2001-02 | – | – | 90 | 81 | 73 | 82
2003-04 | 94 | 78 | 70 | 82 | 70 | 78
2005-06 | – | – | 90 | 92 | 72 | 84
2007-08 | 79 | 94 | 95 | 95 | 82 | 88
2009-10 | – | – | 85 | 86 | 74 | 81
2011-12 | 62 | 82 | 91 | 87 | 71 | 77
2013-14 | – | – | 80 | 77 | 74 | 77
2015-16 | 85 | 71 | 68 | 78 | 78 | 78
2017-18 | – | – | 74 | 73 | 80 | 75
2019-20 | 79 | 80 | 92 | 72 | 81 | 79
All years | 82 | 79 | 83 | 81 | 73 | 79

Pollsters that are banned by FiveThirtyEight are not included in the averages. So as not to give a more prolific pollster too much influence over the average, polls are weighted by one over the square root of the number of polls each pollster conducted in a specific category.

In fact, this hit rate has been remarkably consistent over time. With the exception of 2007-08, when a remarkable 88 percent of polls identified the right winner, every cycle since 1998 has seen somewhere between 75 percent and 84 percent of winners identified correctly. So, as a rough rule of thumb, you can expect polls to be right about four out of five times … of course, that also means they’ll miss about one out of five times.

Looking at hits and misses, though, isn’t really our preferred way to judge polling accuracy. Sure, Biden held on to win Wisconsin, for example, so the polls were technically “right.” But no pollster should be bragging about a Biden win by less than a full percentage point when the polling average had him up by 8.4 points there. Likewise, Biden won the national popular vote and Democrats won the popular vote for the U.S. House — but in both cases by narrower-than-expected margins. Meanwhile, the polls happened to get some of the closest states in the presidential race right, such as Georgia and Arizona. But the polls don’t always get so lucky.

Anyway, there’s another, more important metric by which poll performance in 2020 was rather concerning. That is statistical bias, which measures not the magnitude of the polling error but the direction (Democratic or Republican) in which the polls missed.
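Here's a quick sketch of the difference between the two metrics, using hypothetical polls of a single race; positive bias values correspond to a Democratic (D+) lean in the table below's notation:

    def absolute_error(poll_margin, actual_margin):
        """Magnitude of the miss, in points (always non-negative)."""
        return abs(poll_margin - actual_margin)

    def signed_bias(poll_margin, actual_margin):
        """Direction of the miss: positive = overestimated the Democrat,
        negative = overestimated the Republican."""
        return poll_margin - actual_margin

    # Two hypothetical polls of a race the Democrat actually wins by 1 point.
    poll_margins, actual = [8, -6], 1
    print([absolute_error(p, actual) for p in poll_margins])  # [7, 7]  same error size
    print([signed_bias(p, actual) for p in poll_margins])     # [7, -7] opposite directions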

Polls in 2020 considerably overestimated Democrats

Weighted-average statistical bias (in percentage points) in polls in the final 21 days of the campaign

Cycle | Pres. General | Governor | U.S. Senate | U.S. House | Combined
1998 | – | R+5.8 | R+4.5 | R+0.9 | R+3.8
1999-2000 | R+2.4 | R+0.2 | R+2.8 | D+1.2 | R+1.8
2001-02 | – | D+3.5 | D+2.0 | D+1.4 | D+2.6
2003-04 | D+1.1 | D+1.9 | D+0.8 | D+2.1 | D+1.4
2005-06 | – | D+0.4 | R+2.1 | D+1.1 | D+0.1
2007-08 | D+1.0 | R+0.1 | D+0.1 | D+1.4 | D+0.9
2009-10 | – | R+0.2 | R+0.8 | D+1.3 | D+0.4
2011-12 | R+2.5 | R+1.6 | R+3.1 | R+3.2 | R+2.8
2013-14 | – | D+2.3 | D+2.7 | D+3.9 | D+2.8
2015-16 | D+3.3 | D+3.1 | D+2.8 | D+3.4 | D+3.0
2017-18 | – | R+0.9 | EVEN | R+0.8 | R+0.5
2019-20 | D+4.2 | D+5.6 | D+5.0 | D+6.1 | D+4.8
All years | D+1.3 | D+0.9 | D+0.7 | D+1.2 | D+1.1

Bias is calculated only for races in which the top two finishers are a Democrat and a Republican. Therefore, it is not calculated for presidential primaries. Pollsters that are banned by FiveThirtyEight are not included in the averages. So as not to give a more prolific pollster too much influence over the average, polls are weighted by one over the square root of the number of polls each pollster conducted in a specific category.

On average in the 2019-20 cycle, polls underestimated the performance of the Republican candidate by a whopping 4.8 percentage points! So the big issue in 2020 wasn’t that the polls were that inaccurate — they were only slightly more inaccurate than usual — but that they almost all missed in the same direction.

Interestingly, the bias was actually smaller for Trump’s presidential race against Biden (4.2 points) than in races for Congress or governor. But either way, that isn’t a good performance: It’s the largest bias in either direction in the cycles covered by our pollster ratings database, exceeding the previous record of a 3.8-point Republican bias in 1998.

If you went back before 1998, it’s likely you could find years with larger bias. Presidential polls and congressional generic ballot polls massively underestimated Republicans in 1980, for instance, with the presidential polls off by about 7 points. And we estimate that the final generic ballot polling average underestimated Republicans by about 5 points in the GOP wave year of 1994.

In general, there hasn’t been much consistency in which direction the bias runs from year to year. A Democratic overperformance against the polls in 2011-12 was followed by a Republican one in 2013-14, for example.

However, we think there’s good reason to expect that these types of mistakes in one direction or the other — what we sometimes call systematic polling errors — will be more of an issue going forward. How come? The systematic errors aren’t necessarily a function of the polls themselves. Rather, they arise because, in a time of intense political polarization and little ticket-splitting, race outcomes are highly correlated with one another up and down the ballot.

Put differently, there’s less chance for errors — overestimating the Democrat in one state, and the Republican in another — to cancel each other out. If something about the polls caused them to overestimate the Democratic presidential candidate’s performance in Iowa, for example, they will probably do the same in a similar state such as Wisconsin. Likewise, if the polls overestimate the Democratic presidential candidate’s performance in Iowa, they’ll probably also overestimate the Democratic Senate candidate’s performance in that state.

The old cliche that the Electoral College is really “50 separate contests” is highly misleading in our nationalized, polarized electoral climate. Everything is connected, and for better or worse, you need some relatively fancy math to get a decent estimate of a party’s chance of winning the presidency, or the Senate.

We know this will sound a little self-serving since we’re in the business of building election forecasts — and we’re not trying to turn this into an episode of “Model Talk” — but it’s precisely because of these correlations that election forecasting models are so valuable. They can help us understand how polling errors work under real-world conditions.

But these correlations also make evaluating poll accuracy harder. Why? Because they have the effect of reducing the effective sample size. Technically speaking, more than 500 races took place on Nov. 3 if you consider races for Congress, races for governor, and each state’s Electoral College votes. That sounds like a lot of data. But if all the outcomes are highly correlated, they may not tell you as much as you’d think.

Suppose, for example, you had a polling error caused by the fact that Democrats were more likely to stay at home during the COVID-19 pandemic and were therefore more likely to respond to surveys. That sort of issue could leave your polls with a Democratic bias in nearly all those races. And what looked like many failures — underestimating Republicans in dozens of contests! — could really have had just one root cause. So while it might sound flip to write off Nov. 3, 2020, as “just one bad day” for pollsters — and even I wouldn’t go quite that far — it’s closer to the truth than you might think.
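If you want to see why correlated errors shrink the effective sample size, here's a small Monte Carlo sketch; the error sizes are assumptions chosen for illustration, not estimates from our data:

    import numpy as np

    rng = np.random.default_rng(42)
    n_sims, n_races = 10_000, 50

    # One shared, national polling error per simulated cycle, plus small race-level noise.
    shared = rng.normal(0.0, 4.0, size=(n_sims, 1))       # the single root cause
    local = rng.normal(0.0, 1.5, size=(n_sims, n_races))  # independent per-race noise
    errors = shared + local

    # Share of simulated cycles in which at least 45 of the 50 races
    # miss in the same direction -- many "failures," one cause.
    same_way = np.maximum((errors > 0).sum(axis=1), (errors < 0).sum(axis=1))
    print((same_way >= 45).mean())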

That said, there is also a question of whether it’s significant that the polls have continued to be biased in the same direction. Namely, in three of the past four cycles (2013-14, 2015-16 and 2019-20), the polls have all had a meaningful Democratic-leaning bias. Again, though, we’re dealing with a small sample size. (If you flipped a coin four times and it came up heads three times, that would be nothing remarkable at all.) It’s also worth noting that the polls had a meaningful Republican-leaning bias in the cycle just before that, 2011-12.

Let me be clear — and this reflects my viewpoint as a journalist and an avid consumer of polls, because I’m not a pollster myself — from my perch in the rafters, I don’t see 2020 as having been anything particularly remarkable. I think it’s mostly other critics and journalists (who perhaps haven’t spent as much time comparing 2020 with past elections, such as 1980) who lack perspective.

But the reason polls have tended not to show a consistent bias over time is that people who actually do conduct polls work really hard to keep it that way. Most pollsters are not going to go into 2022 or 2024 thinking that 2020 was just bad luck. They’ll scrutinize the reasons for the polling error. They’ll decide whether continued problems are likely going forward or whether much of the error was unique to circumstances particular to 2020, such as COVID-19. And they’ll correct accordingly, or perhaps even overcorrect.

The industry will also course-correct at a macro level. Techniques that worked comparatively well in 2020 will be imitated; polling firms that were comparatively successful will win more business.

And as I hinted at earlier, our pollster ratings will be making a course correction, too — we’ll no longer be giving bonus points to live-caller polls. But before we get to that, let’s take a quick look at how different pollsters fared in 2020.

Which pollsters did best in 2020?

All right, then … so which pollsters made the best of a bad 2020? In an article last year, we covered how the pollsters did in the 2020 primaries, so I’ll stick with the general election here. Here is the average error, share of correct calls, and statistical bias for all firms with at least 10 qualifying polls — plus ABC News/The Washington Post, which I’m including for transparency’s sake since ABC News owns FiveThirtyEight:

AtlasIntel was 2020’s most accurate pollster

Average error (in percentage points) of polls in the final 21 days before the 2020 general election, for pollsters that conducted at least 10 polls. Positive statistical bias figures indicate a Democratic lean relative to the results; negative figures indicate a Republican lean.

Pollster | No. of Polls | Races Called Correctly | Statistical Bias | Avg. Error
Swayable | 68 | 93% | +5.1 | 5.8
SurveyMonkey | 58 | 97% | +3.8 | 4.7
Morning Consult | 54 | 80% | +4.6 | 4.8
Change Research | 44 | 69% | +5.7 | 6.1
Data for Progress | 42 | 75% | +5.0 | 5.0
YouGov | 37 | 82% | +5.3 | 5.3
Emerson College | 36 | 58% | +3.0 | 4.1
Ipsos | 32 | 73% | +4.6 | 4.6
Public Policy Polling | 31 | 63% | +7.2 | 7.2
Trafalgar Group | 28 | 50% | -2.1 | 2.6
Siena College/NYT Upshot | 25 | 76% | +5.5 | 5.5
RMG Research | 25 | 80% | +6.0 | 6.0
Civiqs | 22 | 73% | +5.3 | 5.6
Gravis Marketing | 21 | 90% | +4.8 | 4.8
SurveyUSA | 17 | 79% | +2.4 | 4.6
Rasmussen Reports/POR | 17 | 68% | +1.0 | 2.8
Opinion Savvy/InsiderAdvantage | 15 | 47% | +0.8 | 3.5
Research Co. | 14 | 96% | +5.2 | 5.2
AtlasIntel | 14 | 71% | +1.0 | 2.2
Harris Insights & Analytics | 13 | 81% | +3.2 | 3.3
SSRS | 12 | 75% | +7.1 | 7.1
Citizen Data | 12 | 58% | +7.0 | 7.0
Quinnipiac University | 12 | 63% | +7.1 | 7.1
Monmouth University | 11 | 45% | +10.1 | 10.1
ABC News/The Washington Post* | 7 | 71% | +5.5 | 5.5

*ABC News/The Washington Post had fewer than 10 qualifying polls but is listed for transparency since ABC News is FiveThirtyEight’s parent company.

First, let’s give a shout-out to the pollsters with the lowest average error. Those were AtlasIntel (2.2 percentage points), Trafalgar Group (2.6 points), Rasmussen Reports/Pulse Opinion Research (2.8 points), Harris Insights & Analytics (3.3 points) and Opinion Savvy/InsiderAdvantage (3.5 points).

These firms have a few things in common. First, none of them are primarily live-caller pollsters; instead, they use a varied mix of methods: online, IVR (or interactive voice response, i.e., an automated poll using prerecorded questions) and text messaging.

Indeed, the live-caller polls didn’t have a great general election. There aren’t that many of them in the table above. But of the ones that did make the list, SSRS (a 7.1 percentage point average error), Quinnipiac University (7.1 points) and Monmouth University (10.1 points) all had poor general election cycles. Siena College/The New York Times Upshot (5.5 points) and ABC News/The Washington Post (5.5 points) did a bit better by comparison.

One thing you might notice about these non-live-caller pollsters who had a good 2020 is that some (though not all) have a reputation for being Trump- or Republican-leaning. This is in part for reasons beyond the polls themselves. For instance, the pollsters may like to appear on conservative talk shows or conduct polling on behalf of conservative-leaning outlets. But it’s also because, in 2020, they tended to show more favorable results for Trump than the average poll did.

Here, though, is where it’s important to draw a distinction between house effects and bias. House effects are how a poll compares with other polls. Bias is how the poll compares against the actual election results. And in the long run, it’s bias that matters; there’s nothing wrong with having a house effect if you turn out to be right! So in a year when most polls underestimated Trump and Republicans, the polls with Trump-leaning house effects mostly turned out to be both more accurate and less biased, although Trafalgar Group still wound up with a modest Republican bias (2.4 points).
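Here's a toy numerical sketch of that distinction, loosely modeled on the Pennsylvania example discussed below and using invented margins:

    # Dem-minus-Rep margins, in points, for one hypothetical state.
    field_average = 7.0   # average margin across everyone else's polls
    our_poll = -2.0       # a hypothetical Republican-leaning pollster
    actual_result = 1.2   # the Democrat wins narrowly

    house_effect = our_poll - field_average     # -9.0: far more Republican than the field
    our_bias = our_poll - actual_result         # -3.2: missed toward the GOP, but modestly
    field_bias = field_average - actual_result  # +5.8: the consensus missed by more
    print(round(house_effect, 1), round(our_bias, 1), round(field_bias, 1))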


One more observation: Some of these pollsters probably deserve a bit more credit than they got. I say that even though there isn’t a lot of love lost between FiveThirtyEight and at least one of these polling firms: Trafalgar Group. Trafalgar Group has major issues with transparency, for instance, and we’ve criticized them for it. But their polling was pretty good last cycle, and they didn’t get a lot of credit for it because they happened to “call” some of the close states wrong. These pollsters often showed Biden narrowly losing states such as Wisconsin, Michigan and Pennsylvania that he instead narrowly won. Nonetheless, a poll that showed, for example, Biden losing Pennsylvania by 2 points was actually slightly closer to the mark than one that had him winning it by 7, given Biden’s final margin of victory there (1.2 points). So, yes, in some cases these pollsters were too bullish on Republicans, but not to the same extent that most other pollsters were too bullish on Democrats.

Of course, one could argue that these polling firms got lucky in a different respect. If your polls are always Republican-leaning, then you’re going to look like a genius whenever the polling averages happen to miss Republican support. You’ll be one of the worst-performing pollsters in other cycles, however.

I think this is a valid point … but only if a polling firm really does have a long track record of always leaning in the same direction. That’s an apt description for Rasmussen Reports/Pulse Opinion Research, for example, which has been Republican-leaning for many years. Trafalgar Group, however, is relatively new — their first entry in our polling database comes from the 2016 primaries. It’s hard to criticize them too much when, at least in 2016 and 2020, they were correct to show better results for Trump than the consensus of other polls. And for what it’s worth, the final Trafalgar Group polls also correctly showed Democrats winning the Georgia runoffs.

Perhaps one final lesson is that there is value in averaging, aggregating, and having inclusive rules for which polls are included. Some of the pollsters I mentioned above didn’t have terribly strong pollster ratings heading into the 2020 general election cycle, either because they were relatively new or they had mixed track records. But while our polling averages assign somewhat less weight to polls from firms with worse pollster ratings, we do include them and they can still have a fair amount of impact on our numbers. If we’d limited our polling averages only to so-called “gold standard” pollsters, they would have been less accurate. That brings us to our next topic.

Live-caller polls aren’t outperforming other methods.

Until this update, FiveThirtyEight’s pollster ratings were based on a combination of a pollster’s accuracy in the past plus two methodological questions: 

  • Does the pollster participate in industry groups or initiatives (defined more precisely below) associated with greater transparency?
  • And does the pollster conduct its polls via live telephone calls, including calls placed to cellphones?

Essentially, pollsters got bonus points for meeting these criteria — not out of the generosity of our hearts (although we do think that transparency is a good thing unto itself) but because these characteristics had been associated with higher accuracy historically.

As I’ll describe below, the transparency criterion still works pretty well. The live-caller-with-cellphones standard has become more of a problem, though, for several reasons.

For one, nearly all live-caller polls now include calls placed to cellphones. On the one hand, that’s good news since the clear majority of adults are now wireless-only. But it also removes a point of differentiation for us in calculating the pollster ratings. Polling cellphones is more expensive than polling landlines, so when some pollsters included them and others didn’t, it had served as a proxy for a pollster’s overall level of rigor in its polling operation. But now that everyone who does live-caller polls is calling cellphones, that proxy is no longer as useful.

Second, it no longer makes sense to designate an entire polling firm based on which methodology it uses. Polling firms switch methodologies from time to time; some former live-caller pollsters are moving online, for example. Moreover, many pollsters mix and match methods over the course of an election cycle depending on what sort of survey they’re conducting. In other words, the methodology is really a characteristic of a poll and not a pollster, so that’s how we’re now classifying it for purposes of the pollster ratings.

But, most importantly, there’s just not much evidence that live-caller polls are consistently outperforming other methods as far as poll accuracy goes.

In the table below, I’ve shown the advanced plus-minus score for all polls in our database since 2016 based on their methodology. Advanced plus-minus, described in more detail here, compares how a poll did with others of the same election type (e.g., other presidential primary polls) or, where possible, the same exact election (e.g., other polls of the 2020 Iowa Democratic caucus), controlling for the poll’s sample size and how close to the election it was conducted. The key thing to understand here is that negative advanced plus-minus scores are good; they mean that a poll had less error than expected based on these characteristics.
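As a drastically simplified illustration (ignoring the sample-size and timing adjustments, and using made-up errors), a plus-minus score compares each poll's error to the average error of the other polls of the same race:

    def simple_plus_minus(errors):
        """For each poll's error, subtract the average error of the other polls
        of the same race; negative values mean the poll beat its peers."""
        total, n = sum(errors), len(errors)
        return [round(e - (total - e) / (n - 1), 2) for e in errors]

    # Hypothetical errors, in points, for four polls of one race.
    print(simple_plus_minus([2.0, 5.0, 6.0, 7.0]))  # [-4.0, 0.0, 1.33, 2.67]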

So which type of poll has been doing best? It’s sort of a mess:

How different polling methods stack up in the short term …

Advanced plus-minus scores for polls in elections from 2016 to 2020. Negative scores indicate more accurate polling.

Method | No. of Polls | Advanced +/-
Live phone calls
Any Live Phone Component | 1,202 | -0.0
Live Phone Hybrid | 210 | +0.5
Live Phone Exclusively | 992 | -0.2
IVR/automated phone calls
Any Interactive Voice Response (IVR) Component | 977 | +0.1
IVR Hybrid | 667 | -0.3
IVR Exclusively | 310 | +0.7
Online
Any Online Component | 1,706 | +0.4
Online Hybrid | 671 | -0.0
Online Exclusively | 1,035 | +0.7
Text messaging
Any Text Component | 275 | -0.1
Text Hybrid | 245 | -0.1
Text Exclusively | 30 | -0.4

Pollsters that are banned by FiveThirtyEight are not included in the averages.

Polls that include a live-phone component (alone or in conjunction with other methods) have an advanced plus-minus of 0.0 since 2016, as compared with polls with an IVR component, which have a score of +0.1. That’s not much different, obviously; it means the live-caller polls were about a tenth of a point more accurate. Meanwhile, polls with an online component had a score of +0.4. That’s a bit worse, but it’s not that meaningful a distinction statistically given that this category tends to be dominated by a few, large polling firms that have rather different track records from one another. Finally, polls that have a text-message component have an advanced plus-minus of -0.1, although this is a relatively new method and a fairly small sample of polls.

Of course, all of this is complicated by the fact that many polls now use a mixture of methods, such as combining IVR calls to landlines with an online panel. Mixed-mode polls seem to be doing fine, too. It is perhaps worth noting, though, that pure IVR polls that don’t include an online component have struggled, with an advanced plus-minus score of +0.7 since 2016. This may be because such polls have no way to reach voters who don’t have landlines, as many states prohibit automated calls to cellphones. There may be an argument, then, for excluding landline-only polls from our averages going forward, although these have become rare enough that it may soon be a moot point.

What if we expand our sample to the entire pollster ratings database since 1998? Does that provide for clearer methodological winners and losers?

How different polling methods stack up in the long run

Advanced plus-minus scores for polls in elections from 1998 to 2020. Negative scores indicate more accurate polling.

Method | No. of Polls | Advanced +/-
Live phone calls
Any Live Phone Component | 5,393 | -0.1
Live Phone Hybrid | 258 | +0.3
Live Phone Exclusively | 5,135 | -0.1
IVR/automated phone calls
Any Interactive Voice Response (IVR) Component | 3,219 | -0.3
IVR Hybrid | 834 | -0.3
IVR Exclusively | 2,385 | -0.3
Online
Any Online Component | 2,373 | +0.3
Online Hybrid | 790 | -0.1
Online Exclusively | 1,583 | +0.4
Text messaging
Any Text Component | 275 | -0.1
Text Hybrid | 245 | -0.1
Text Exclusively | 30 | -0.4

Pollsters that are banned by FiveThirtyEight are not included in the averages.

No, not really. Live-caller polls (alone or in combination with other methods) have an advanced plus-minus of -0.1 since 1998, versus a score of -0.3 for IVR polls. I think you could maybe argue that phone polls in general (live or IVR) have been more successful than online polls, which have an advanced plus-minus of +0.3 over the entire sample. But again, “online” is a broad category that spans a wide range of techniques — and some online pollsters have been considerably more accurate than others. The main takeaway seems to be that, with the possible exception of landline-only polls (in an environment where few voters still use landlines), methodology alone doesn’t tell you all that much.

So for all these reasons, we’ll no longer be giving a bonus to live-caller pollsters in our pollster ratings. We don’t think it’s a particularly close decision, in fact.

But that’s emphatically not the same as saying that “anything goes” or that all polls are equal. For one thing, our research finds that pollsters that meet the transparency criterion still are outperforming others, so we’ll continue to use that. We sometimes refer to this as the “NCPP/AAPOR/Roper” standard because a pollster meets it by belonging to the (now largely inactive) National Council on Public Polls, by participating in the American Association for Public Opinion Research Transparency Initiative or by contributing data to the Roper Center for Public Opinion Research iPoll archive. (Unless it becomes active again, we’ll discontinue eligibility based on NCPP membership soon.)

Since 2016, polls from firms that meet the NCPP/AAPOR/Roper criteria have an advanced plus-minus score of -0.1, considerably better than the score of +0.5 for polls from other firms. And across the entire sample, since 1998, polls from NCPP/AAPOR/Roper firms have an advanced plus-minus of -0.4, as compared with +0.1 for those from other pollsters. Transparency is a robust indicator of poll accuracy and still counts for a lot, in other words.

Another check on the idea that “anything goes” — which we probably haven’t emphasized enough when discussing pollster ratings in the past — is that our ratings are designed to be more skeptical toward pollsters for which we don’t have much data. In calculating our averages, a pollster that hasn’t had any polls graded in our pollster ratings database is assumed to be considerably below average if it doesn’t meet the NCPP/AAPOR/Roper criteria. But this “new pollster penalty” gradually phases out once a firm has conducted around 20 recent polls.

And, of course, in the long run, the most important factor in our pollster ratings is that a polling organization is getting good results. The more polls a pollster conducts, the more its rating is purely a function of how accurate its polls are and not any assumptions based on what its methodological practices are.

So congratulations to the pollsters who had largely accurate results despite a difficult environment in 2020. And my sympathies to the ones who didn’t. Polling remains vital to the democratic experiment, and although I’m not a pollster, I know how frustrating it can be to produce polls for a media environment that sometimes doesn’t get that.

Most of you will probably want to drop off at this point; there are just a few, largely technical notes to follow. Before you go, though, here’s the link again to the new pollster ratings, and here’s where you can find the raw data behind them.

Methodological notes

I thought I told you to leave and go enjoy the spring weather! But transparency is vital in our pollster ratings project, so we do want to note a few odds and ends that reflect changes in how the pollster ratings are calculated this year. These are in no particular order of importance:

  • As described earlier, we’re now classifying methodology based on the individual poll rather than on the pollster. In some cases, for polls we entered in our database long ago and didn’t record the methodology, we had to go back and impute it based on the methodology that the pollster generally used at that time. If you see any methodologies that you think are listed incorrectly, drop us a note at polls@fivethirtyeight.com.
  • We’re now excluding presidential primary polls if a candidate receiving at least 15 percent in the poll dropped out, or if any combination of candidates receiving at least 25 percent in the poll dropped out. Previously, we only excluded polls because of dropouts if one of the top two candidates in the poll dropped out. Only a small number of polls are affected by this change.
  • As described at length here, advanced plus-minus scores are calculated in several stages. In the first stage, we run a nonlinear regression analysis where we seek to predict the error in the poll based on some basic characteristics of the poll. The regression now includes the following factors: the poll’s margin of sampling error, the type of election (presidential general, presidential primary, U.S. House, U.S. Senate or governor), the number of days between the poll and the election, and the number of unique pollsters surveying the race. This set of variables has been slightly simplified from previous versions of our pollster ratings. 
  • Previously, in conducting the regression analysis described above, we fixed the coefficient associated with the poll’s margin of sampling error such that it matched the theoretical margin of sampling error described here. In theory, for example, in a poll of 500 voters where one candidate leads 55-45, you should know exactly how much sampling error there is. Now, however, we’re deriving that coefficient from the regression rather than treating it as a fixed parameter (a simplified illustration of the regression setup appears below, after these notes on sample size). This is because we’ve discovered that, empirically, a poll’s sample size is less important than it theoretically should be in contributing to a poll’s overall error. That is to say, a poll of 2,000 voters will tend to have less error than a poll of 500 voters, but not as much less as you’d expect, holding other factors constant. Why is this the case? It’s probably for a combination of reasons.
    • Demographic weights and other decisions the pollster makes provide information above and beyond what the sample size implies.
    • Pollsters may fail to publish results stemming from polls with small sample sizes that they perceive to be outliers.
    • Or they may herd toward other polls.
    • The sample size alone does not account for design effects.
    • And an increasing number of polls (especially online polls) use non-probability sampling, under which the assumptions of traditional margin-of-error formulas do not really apply.

In short, while you should pay attention to sample size and a pollster’s margin of sampling error, there are also a lot of things that these don’t tell you. At some point, we will probably also change how sample sizes are used in determining the weights assigned to polls in our polling averages.
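For the technically inclined, here's a rough sketch of the kind of first-stage fit described in the notes above, using ordinary least squares on synthetic data as a stand-in for our actual nonlinear regression; it omits the election-type categories, and none of the numbers are real:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 400

    # Synthetic poll characteristics (all invented for illustration).
    log_sample = rng.uniform(np.log(300), np.log(2500), n)  # log of sample size
    days_out = rng.uniform(0, 21, n)                        # days before the election
    n_pollsters = rng.integers(1, 15, n).astype(float)      # unique pollsters in the race

    # Synthetic observed error with some dependence on those characteristics.
    error = (5.0 - 0.8 * (log_sample - np.log(300)) + 0.08 * days_out
             - 0.05 * n_pollsters + rng.normal(0, 1.5, n))

    # Ordinary least squares as a simple stand-in for the nonlinear fit.
    X = np.column_stack([np.ones(n), log_sample, days_out, n_pollsters])
    coefs, *_ = np.linalg.lstsq(X, error, rcond=None)

    expected_error = X @ coefs
    plus_minus = error - expected_error  # negative = less error than expected
    print(coefs.round(2), plus_minus[:3].round(2))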

  • Finally, we have slightly modified and simplified the formula for calculating predictive plus-minus, the final stage in our ratings, which is what the letter grades associated with each pollster are derived from. (Yeah, I know the formula below looks complicated, but it’s actually simpler than before.) The formula now is as follows:

PPM = (max(-2, APM + herding_penalty) * disc_pollcount + prior * 18) / (18 + disc_pollcount)

In the formula, PPM stands for predictive plus-minus and APM stands for advanced plus-minus. The herding_penalty is applied when pollsters show an unnaturally low amount of variation relative to other polls of the same race that had already been conducted at the time the poll was released; see the description here.

disc_pollcount is the discounted poll count, where older polls receive a lower weight based on how long ago they were conducted. Thus, for example, a poll conducted in 2020 gets full weight, a poll conducted in 2012 gets a weight of 0.56, and one from 1998 gets a weight of 0.20.
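The per-year discount isn't spelled out above, but the examples imply a decay of roughly 0.93 per year of age; here's a sketch under that assumption:

    def poll_weight(poll_year, current_year=2020, annual_decay=0.93):
        """Assumed exponential discount by age; 0.93 per year is inferred from the
        examples in the text (0.93**8 is about 0.56 and 0.93**22 about 0.20),
        not an official constant."""
        return annual_decay ** (current_year - poll_year)

    for year in (2020, 2012, 1998):
        print(year, round(poll_weight(year), 2))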

Finally, prior is calculated as follows:

prior = 0.66 - 0.68 * _isncppaaporroper - 0.022 * min(18, disc_pollcount)

Where _isncppaaporroper takes on a value of 1 if a pollster meets the NCPP/AAPOR/Roper transparency standard and 0 otherwise.
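Putting the pieces together, here's a direct Python transcription of the two formulas above, with purely hypothetical inputs:

    def predictive_plus_minus(apm, herding_penalty, disc_pollcount, is_ncpp_aapor_roper):
        """Direct transcription of the PPM and prior formulas above."""
        prior = 0.66 - 0.68 * is_ncpp_aapor_roper - 0.022 * min(18, disc_pollcount)
        return (max(-2, apm + herding_penalty) * disc_pollcount + prior * 18) / (
            18 + disc_pollcount)

    # A hypothetical transparent pollster with a modestly good track record.
    print(round(predictive_plus_minus(apm=-0.3, herding_penalty=0.0,
                                      disc_pollcount=10, is_ncpp_aapor_roper=1), 3))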

That’s all, folks! Don’t hesitate to drop us a line if you have any other questions.

CORRECTION (March 25, 2021, 10:53 a.m.): Two tables in this article previously flipped the data for the primary and general elections. The two tables have been updated.
