2016 Postmortem

Loki Liesmith

(4,602 posts) Sun Sep 25, 2016, 05:16 PM

More Analysis of 538 Results vs. Poll Averages: Is 538 providing added value?

Last edited Sun Sep 25, 2016, 10:58 PM

Continued from here.

Technical Stuff: I took JPEGs of both the 538 probability graphs and the Huffington Post national poll trendlines over the same interval, used numpy to convert the JPEGs into arrays of data, and then used a spline from scipy to put both curves onto the same 117-day grid. I then subtracted the Clinton trendline (HuffPo) from the 538 probability estimate for a Clinton win. I did the same for the Trump numbers.
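For anyone who wants to try the resample-and-subtract step, here is a minimal sketch. It assumes each curve has already been extracted as (day, value) points. It also swaps plain linear interpolation in for the scipy spline so that it runs with no dependencies. The function names and the input format are illustrative, not the code actually used for the plots.

```python
from bisect import bisect_right


def resample_linear(xs, ys, grid):
    """Linearly interpolate the curve (xs, ys) at every point in grid.

    xs must be strictly increasing. Grid points outside [xs[0], xs[-1]]
    are clamped to the nearest endpoint value.
    """
    out = []
    for g in grid:
        if g <= xs[0]:
            out.append(ys[0])
        elif g >= xs[-1]:
            out.append(ys[-1])
        else:
            i = bisect_right(xs, g) - 1          # xs[i] <= g < xs[i + 1]
            t = (g - xs[i]) / (xs[i + 1] - xs[i])
            out.append(ys[i] + t * (ys[i + 1] - ys[i]))
    return out


def residuals(model_curve, poll_curve, n_days=117):
    """Put both curves on a shared daily grid and return model minus polls.

    Each curve is a (days, values) pair of equal-length lists.
    """
    grid = list(range(n_days))
    model = resample_linear(model_curve[0], model_curve[1], grid)
    polls = resample_linear(poll_curve[0], poll_curve[1], grid)
    return [m - p for m, p in zip(model, polls)]
```

Calling residuals once with the Clinton curves and once with the Trump curves gives the two residual series discussed in the results.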

Results: As you can see, there is VERY LITTLE structure in the residuals (the fluctuations left after subtracting the poll average, left-side plots) as a function of time. Certainly no meaningful temporal structure. So Silver's model extracts no trend apart from the poll average. And plotting Trump residuals vs. Clinton residuals, it's clear there is no dependency between the two. It looks like, apart from the poll average trend, the fluctuations in 538 probabilities are close to a zero-mean Gaussian (noise) process. Small changes in Clinton's win probability have nothing to do with small changes in Trump's probability.
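One rough way to put a number on "temporal structure" is the lag-1 autocorrelation of a residual series. This is a sketch of that check, not the analysis behind the plots, and the function name is made up for illustration. A value near 0 is consistent with uncorrelated noise, while a value near 1 would point to a slowly drifting series.

```python
def lag1_autocorrelation(series):
    """Return the lag-1 autocorrelation of a sequence of numbers.

    Computed as sum(d[i] * d[i + 1]) / sum(d[i] ** 2), where d holds the
    deviations of the series from its mean.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / denom
```

A single number like this will miss longer-range patterns, so it is best read alongside the residual plot rather than in place of it.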

Conclusions: If Silver's model were providing added value over simple poll averages, one would expect that, after removing the poll average, a reduction in Clinton's numbers should show some dependence on an increase in Trump's numbers and vice versa. Clearly this is not the case.
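The dependence between the two residual series can be summarized with a Pearson correlation coefficient. The sketch below is a dependency-free version with an illustrative function name; numpy users would more likely reach for np.corrcoef. If a drop in Clinton's residual tended to come with a rise in Trump's, the coefficient would be clearly negative; a value near 0 matches the "no dependency" reading.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```

Keep in mind that Pearson correlation only captures linear dependence, so a near-zero value is evidence against a linear link, not proof that the series are independent.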

Whole thing looks like a random walk.

What am I saying? Nate Silver has argued that his models extract the real signal masked by noisy data. The results here indicate that nearly all of that signal is the smoothed national poll average.

In short, this does NOT inspire confidence that fast changes in the 538 model are anything but noise.

11 replies

zanana1

(6,110 posts)

1. Excellent analysis

Reply to Loki Liesmith (Original post)

Sun Sep 25, 2016, 05:28 PM

It made me feel like I'd lost 20 IQ points.

Loki Liesmith

(4,602 posts)

2. Sorry?

Reply to zanana1 (Reply #1)

Sun Sep 25, 2016, 05:30 PM

Can I explain something better?

Persondem

(1,936 posts)

3. It's probably the first 2 paragraphs.

Reply to Loki Liesmith (Reply #2)

Sun Sep 25, 2016, 08:26 PM

Most people have no idea what a residual is ... etc. You explained it very well in the bottom part of your post.

Loki Liesmith

(4,602 posts)

5. Added some (hopefully) clarifying text

Reply to Persondem (Reply #3)

Sun Sep 25, 2016, 10:59 PM

BzaDem

(11,142 posts)

4. Even if the model is not much more predictive than a poll average

Reply to Loki Liesmith (Original post)

Sun Sep 25, 2016, 09:45 PM

I still find a few things useful about the model:

1. It takes into account pollster quality, house effects, and other variables not directly accounted for by the average. Even if all of those end up canceling each other out in a poll average (which Nate himself admits is frequently the case), it helps me see the effects of particular polls in context. I like knowing whether a particular result is a "good" result or "bad" result for Clinton, and the top lines often do not tell us that without more info.

Perhaps I shouldn't ever be asking about particular polls, because the noise is too high. But what can I say -- I'm a political junkie. Polling averages take longer to converge (sometimes a lot longer), even after a major event affects the race in a way that matches our intuition. So I still look at individual polls to a limited extent (with a heaping dose of caution), and his model helps me make sense of them.

2. His model extrapolates poll results in some states (and national polls) to other states that, for whatever reason, don't have recent poll data. This is one reason state poll averages take so long to converge -- state polls are often a lagging indicator, since there are a lot of states, and polling is time-consuming and expensive. Sure, this extrapolation could be wrong, but according to him, his model is weighted heavily towards the real data once it comes rolling in. His model provides his best guess on what could be going on in the absence of current polling, which (to satisfy my own curiosity) I would otherwise have to do myself.

3. His model takes into account past results to a limited extent, even when they have been superseded by later results from the same pollster. Whether this is a good idea is actually somewhat of an open question. We don't have enough data points to say either way. I think the intuition behind it is that if one candidate consistently polls ahead on average over a long period, that should not be completely discounted when the polls tighten up a bit (as things like temporary enthusiasm or lack thereof can affect current polls and then disappear later). Of course, we wouldn't want to take into account past results too much, since then the model would take too long to take into account new data. But most simple averages ignore old polls completely, since otherwise you have to decide how much weight to give them, and then it starts sounding suspiciously like a model.
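To make the weighting idea in points 1 and 3 concrete, here is a toy sketch of a poll average that down-weights older polls and scales each poll by a pollster quality score. This is NOT 538's actual procedure. The exponential half-life, the quality scores, and the tuple layout are all assumptions chosen purely for illustration.

```python
def weighted_poll_average(polls, half_life_days=14.0):
    """Average poll margins with recency and quality weights.

    polls is a list of (margin, age_days, quality) tuples, where quality is
    a hypothetical non-negative pollster rating. A poll's weight is its
    quality times 0.5 ** (age_days / half_life_days), so a poll that is one
    half-life old counts half as much as a fresh poll of equal quality.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for margin, age_days, quality in polls:
        weight = quality * 0.5 ** (age_days / half_life_days)
        weighted_sum += weight * margin
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no polls with positive weight")
    return weighted_sum / total_weight
```

With a fresh poll at +4 and a two-week-old poll at 0, both of quality 1.0 and the default half-life, the result is 4 / 1.5, or about +2.67. Picking the half-life is exactly the "how much weight to give old polls" decision that makes a simple average start to look like a model.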

Perhaps all this is bogus reasoning. I'm no statistician. But those are my thoughts on why I like Nate's model, even though it often matches the poll averages.

Persondem

(1,936 posts)

6. Got question for you. I know it's a stretch, but could Silver be using his adjusted numbers

Reply to Loki Liesmith (Original post)

Mon Sep 26, 2016, 06:00 PM

to determine his adjusted numbers? From looking at FL the other day, it seemed that his trend line was really off, unless he was using his adjusted numbers (it seemed that the trend in recent polls was in Clinton's favor, but according to Silver it was a net +1.2% for Trump). If he is, that's a subtle way to cook the numbers a certain way. I suppose I need to see his exact procedure for determining the "trend" in the polls to know for sure.

Thoughts?

BlueInPhilly

(870 posts)

7. No

Reply to Loki Liesmith (Original post)

Mon Sep 26, 2016, 06:33 PM

He stacks it with macroeconomic variables and spurious coefficients that may or may not have real value.

In the long run, his model will probably be more accurate. But the noise is just too much and too unreliable for hourly forecasts. Macroeconomic variables, by nature, are long-term metrics.