
Quixote1818

(28,926 posts)
Mon Nov 2, 2020, 10:14 PM Nov 2020

If you combine 10 polls and average them out, doesn't the margin of error become pretty small?


For example, take 10 polls in PA putting Biden up around 5.5%. What kind of margin of error would you get with that many polls?


Any statistics experts here?
13 replies

relayerbob

(6,544 posts)
1. Depending on the values involved, it would reduce it some if they are all pretty close.
Mon Nov 2, 2020, 10:18 PM
Nov 2020

If the ten polls were widely divergent, however, it could technically increase, though that's unlikely in this situation.

 

mr_lebowski

(33,643 posts)
2. It could shrink it; by how much would depend on how the total sample size
Mon Nov 2, 2020, 10:19 PM
Nov 2020

changes.

Believe it or not, the confidence interval you get from 5,000 samples isn't that much smaller than the one you get from 1,000 samples, just as a for instance. OTOH, the difference between 10 and 50 samples is HUGE.
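That diminishing return is easy to see from the standard margin-of-error formula. A quick sketch, assuming a 50/50 split (the worst case) and a 95% confidence level:

```python
import math

def moe(n, p=0.5, z=1.96):
    """95% margin of error (in percentage points) for a proportion p from n respondents."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

for n in (10, 50, 1000, 5000):
    print(f"n = {n:5d}: MoE = +/- {moe(n):.1f} points")
```

Going from 1,000 to 5,000 respondents only cuts the MoE from about +/- 3.1 to +/- 1.4 points, while going from 10 to 50 cuts it from about +/- 31 to +/- 14.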

 

Hoyt

(54,770 posts)
4. The Margin of Error does tend to decrease with a larger sample size. But how the question is asked,
Mon Nov 2, 2020, 10:21 PM
Nov 2020

who is in the pollster's universe, how the sample is drawn, other bias, etc., make me wonder if it really means much for this election.

Not claiming to be an "expert," just someone who occasionally has to rectally extract an answer to questions on sampling.

regnaD kciN

(26,044 posts)
6. There are two different factors...
Mon Nov 2, 2020, 10:30 PM
Nov 2020

1) Increasing sample size does lower the MoE.

2) However, that's assuming all the polls use a reliable likely-voter model, or ones that cancel out each other's errors. If, for whatever reason, they all make the same flawed assumption, it's of no help at all. To give a prime example: in 2016, the problem was that the polls weren't weighted for education level. Now, they are, which should reduce the likelihood of a 2016-level error. However, even if all these polls use educational weighting, if they all assume that "white males without a college education" will make up 15% of the electorate, and they turn out to make up 20% of those who actually vote, then that will introduce a pretty serious polling error, no matter whether they used education level in their polling or not.
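To put rough numbers on that turnout-model point (the vote splits below are invented purely for illustration, not from any poll):

```python
# Hypothetical splits, purely for illustration: suppose white non-college men
# break 30/65 Biden/Trump and everyone else breaks 58/40. Compare Biden's
# margin when that group is 15% of the electorate (the poll's assumption)
# versus 20% (the actual turnout).
def biden_margin(group_share):
    biden = group_share * 30 + (1 - group_share) * 58
    trump = group_share * 65 + (1 - group_share) * 40
    return biden - trump

print(f"assumed 15% share: Biden {biden_margin(0.15):+.2f}")
print(f"actual  20% share: Biden {biden_margin(0.20):+.2f}")
```

With these made-up splits, a 5-point miss on that one group's share of the electorate moves the topline by more than 2.5 points, comparable to a whole poll's MoE, and no amount of sample size fixes it.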

unc70

(6,110 posts)
7. Short answer: not really; long answer, don't try it at home
Mon Nov 2, 2020, 10:31 PM
Nov 2020

Increasing sample size, including by pooling, might reduce the margin of error, but not always and not by as much as one might expect. And pooling is unlikely to be statistically valid in the first place.

Trying to combine mismatched samples of supposedly similar populations puts one well on the way to committing statistical sin and experimental design malpractice.

Just say no (even if Excel or even SAS might let you).

Blue_true

(31,261 posts)
8. Depends upon the MOE of each poll.
Mon Nov 2, 2020, 10:34 PM
Nov 2020

The MOE in the case you laid out would be an average, so it would be larger than some polls, smaller than others.

 

Drunken Irishman

(34,857 posts)
9. People get too caught up on the MOE.
Mon Nov 2, 2020, 10:38 PM
Nov 2020

The MOE is only there to let you know that, if the poll's sample is sound, the true number can fall anywhere within that range of the reported total.

But it doesn't inherently make the poll valid.

You can have a poll with a MOE that is +/- 2.5 that has Trump up 3 in a state he inevitably loses by ten - or, well outside the MOE.

What averaging can do is create a consensus, which is more accurate than picking and choosing what polls are right or wrong. This is because it neutralizes the outliers - polls like the aforementioned that may have Biden down three, when every other poll has him up in the state - with some showing a significant lead.

Because, at the end of the day, if you have ten polls, it's likely six or seven are hitting a consensus within a couple points, while the other three or four might be major outliers.

If you average 'em, the outliers are swamped by the consensus.
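A toy example of that swamping effect (the margins below are invented): seven polls cluster near Biden +5, and three are outliers, including one with him down 3.

```python
# Ten hypothetical PA polls, Biden margin in points: seven cluster near +5,
# three are outliers (one even has Biden down 3).
polls = [5.5, 5.0, 6.0, 4.5, 5.5, 6.5, 5.0, -3.0, 1.0, 11.0]

mean = sum(polls) / len(polls)
ordered = sorted(polls)
median = (ordered[4] + ordered[5]) / 2  # average the two middle values of ten
print(f"average: {mean:+.1f}, median: {median:+.2f}")
```

Even with three outliers in the mix, the average lands about a point off the consensus cluster, and the median barely moves at all.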

dsc

(52,155 posts)
10. It isn't as direct as that
Mon Nov 2, 2020, 10:57 PM
Nov 2020

You can't treat the polls as one giant sample, nor can you assume independence of the polls (there is a known herding effect). That said, the national polls were correct in 2016 (certainly well within the MOE); the problem was that Hillary overperformed where it did her no good (Texas, California, New York, Arizona, among others) and underperformed where it was deadly (Pennsylvania, Wisconsin, and Michigan). That was largely because of the weighting issue, which did affect state polls. It should be noted that people who were polling congressional districts (CDs) were seeing the problem she had.

The polls are quite likely accurate this time and Biden seems to be in very good shape. They would have to be historically off for him to lose at this point. I don't think they are. The CD polls agree, the state polls agree.

But yes, an average of polls is a better predictor over time than any one poll would be. It just isn't as if you now have a sample of 10,000 instead of 1,000 when you take 10 polls of 1,000 and average them. Incidentally, the relationship goes with the square root of the sample size, so even if pooling were legitimate, the MOE would shrink by about 2/3, not 9/10.
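That square-root point checks out numerically. A quick sketch of the best case, naively treating ten 1,000-person polls as one 10,000-person sample (using the standard 95% formula at a 50/50 split):

```python
import math

def moe(n):
    """95% margin of error (points) for a 50/50 proportion from n respondents."""
    return 100 * 1.96 * math.sqrt(0.25 / n)

one_poll = moe(1000)          # a single 1,000-person poll
naive_pool = moe(10 * 1000)   # pretend ten such polls were one big sample
print(f"one poll: +/- {one_poll:.1f}, pooled: +/- {naive_pool:.1f}")
print(f"reduction factor: {one_poll / naive_pool:.2f}  (= sqrt(10))")
```

The MoE drops from about +/- 3.1 to about +/- 1.0, a shrink of roughly two-thirds, and that's the ceiling; real pooled polls are correlated, so the actual gain is smaller.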

kurtcagle

(1,602 posts)
11. Dartboards
Mon Nov 2, 2020, 11:07 PM
Nov 2020

Imagine each poll as a cluster of darts thrown by players of different skill, but no one can actually see the board itself, only the ring around the board. Really good players will tend to cluster near the bullseye, not so good players may have a more scattershot distribution with the answer somewhere in the middle. Some of the players will also tend to naturally pull to the left or right uniformly, creating bias.

An aggregator looks at the past history of a given pollster and assigns a weight to each pollster that reflects how closely they hit their mark in the past. When the variance (the spread) is high, either because the population is undecided or there is a hidden variable that isn't being taken into account, then the margin of error goes up. Small sample sizes can also make the MoE go up, but once you get beyond a sample size of 700 or so, sample size has relatively little effect on MoE.

A factor that can make a bigger difference is stability. In 2016, the lead kept shifting back and forth between Clinton and Trump, meaning that there were hidden factors that no one was taking into account - in this case, educational level. This meant that there was critical information that was lost when the different sets were averaged together. Once that factor was identified and properly weighted, the models were actually pretty good, as it explained a lot of the variability.

The PA polls at this point are all very consistent, meaning that the variance is comparatively small. This makes it easier to compare apples to apples. Now Pennsylvania is complicated because it is basically three states - one that looks a lot like New England, one that looks like the small-city urban Midwest, and one that looks like Kentucky. This means that when you sample PA, you have to be cognizant of what part you're sampling from and how you're doing the sampling. If you put too much weight in the middle, then PA should be a great state for Trump, but both east and west PA look far better for Biden. The larger the sample size (which is essentially what an election is), the more likely the state goes for Democrats, but Democrats tend to vote less often than Republicans unless they are highly motivated to go to the polls, as I believe is the case this year.

So, to answer your question, it's complicated. Yes, combining polls can reduce the MoE, but only in certain situations, and even then not always consistently.

keithbvadu2

(36,731 posts)
12. Poll accuracy
Tue Nov 3, 2020, 12:44 AM
Nov 2020

I read once that poll accuracy is determined by the number of folks polled.

Take the square root of the number polled.

Divide 100 by that square root.

It gives you the approximate +/- accuracy.

400 folks polled is +/- 5% accurate.

1600 folks polled is +/- 2.5% accurate.

Very rough and simplistic.
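That rule of thumb in code form:

```python
import math

def rough_moe(n):
    """Rule-of-thumb margin of error: 100 / sqrt(n), in percentage points."""
    return 100 / math.sqrt(n)

print(rough_moe(400))   # 5.0
print(rough_moe(1600))  # 2.5
```

It's rough but not far off: the exact 95% formula at a 50/50 split works out to 98/sqrt(n), which the 100/sqrt(n) shortcut approximates within a couple percent.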

A statistics class would hammer you with more details.

Especially; are the folks polled a valid cross section of the voters?

Or; are they mostly true believers in the issue/candidate or mostly non-believers?

Is it a neutral poll or a push poll that guides you toward the answers they really want?

(from Statistics 101 class) A telephone poll taken in the 1920s or 1930s indicated that a certain candidate would win.
Didn't happen. The other guy won.
It turned out that it was not a sample of the general population:
only the relatively well-to-do could afford phones, so the 'common folk' weren't sampled.
