2016 Postmortem
More Analysis of 538 Results vs. Poll Averages: Is 538 providing added value?
Last edited Sun Sep 25, 2016, 10:58 PM
Continued from here.
Technical Stuff: I took JPEGs of both the 538 probability graphs and the Huffington Post national poll trendlines over the same interval, used numpy to convert the JPEGs into arrays of data, and then used a spline from scipy to put both curves onto the same 117-day grid. I then subtracted the Clinton trendline (HuffPo) from the 538 probability estimate for a Clinton win. I did the same for the Trump numbers.
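For anyone who wants to try the resampling step, here is a rough sketch of how it could look. It assumes the curves have already been digitized into (day, value) points, and it uses synthetic stand-in data rather than the real 538 or HuffPo numbers. The function name `resample`, the variable names, and the choice of `UnivariateSpline` with no smoothing are assumptions for illustration, not necessarily what was used for the plots in this post.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def resample(days, values, grid, smoothing=0.0):
    """Fit a cubic spline to digitized (day, value) points and evaluate it on grid.

    smoothing=0.0 makes the spline pass through every point; a larger value
    smooths over digitization jitter. (Hypothetical helper, not the original code.)
    """
    days = np.asarray(days, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(days)  # the spline needs increasing x values
    spline = UnivariateSpline(days[order], values[order], k=3, s=smoothing)
    return spline(grid)


# Common 117-day grid, as described in the post.
grid = np.arange(117, dtype=float)

# Synthetic stand-ins for the two digitized curves (NOT real data).
days_538 = np.linspace(0, 116, 300)
win_prob_538 = 70 + 10 * np.sin(days_538 / 20)   # percent
days_poll = np.linspace(0, 116, 180)
poll_avg = 45 + 2 * np.sin(days_poll / 20)       # percent

# Put both curves on the same grid, then subtract the poll trendline.
residual = resample(days_538, win_prob_538, grid) - resample(days_poll, poll_avg, grid)
print(residual.shape)
```

The same two calls would be repeated for the Trump curves. A nonzero `smoothing` may be worth trying if the digitized points are noisy.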
Results: As you can see, there is VERY LITTLE structure in the residuals (the fluctuations left after subtracting the poll average, left-side plots) as a function of time. Certainly no meaningful temporal structure. So Silver's model extracts no trend apart from the poll average. And plotting Trump residuals vs. Clinton residuals, it's clear there is no dependency between the two. It looks like, apart from the poll average trend, the fluctuations in 538 probabilities are close to a zero-mean Gaussian (noise) process. Small changes in Clinton's win probability have nothing to do with small changes in Trump's probability.
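The eyeball checks above could also be put into numbers. The sketch below is one possible way to do it, using synthetic noise in place of the real residuals. The function name `residual_diagnostics` and the particular choice of tests (Pearson correlation, lag-1 autocorrelation, and scipy's `normaltest`) are assumptions for illustration; the plots in this post were judged by eye.

```python
import numpy as np
from scipy import stats


def residual_diagnostics(clinton_resid, trump_resid):
    """Summarize dependence, temporal structure, and normality of two residual series.

    Hypothetical helper: returns a dict of plain floats.
    """
    c = np.asarray(clinton_resid, dtype=float)
    t = np.asarray(trump_resid, dtype=float)

    # Linear dependence between the two residual series.
    r, r_p = stats.pearsonr(c, t)

    # Lag-1 autocorrelation: a crude check for temporal structure.
    lag1 = np.corrcoef(c[:-1], c[1:])[0, 1]

    # D'Agostino-Pearson test; a small p-value argues against Gaussian residuals.
    _, normal_p = stats.normaltest(c)

    return {
        "mean": float(c.mean()),
        "pearson_r": float(r),
        "pearson_p": float(r_p),
        "lag1_autocorr": float(lag1),
        "normality_p": float(normal_p),
    }


# Synthetic stand-ins for the 117-day residuals (NOT real data).
rng = np.random.default_rng(0)
clinton = rng.normal(0.0, 1.0, 117)
trump = rng.normal(0.0, 1.0, 117)

for name, value in residual_diagnostics(clinton, trump).items():
    print(f"{name}: {value:.3f}")
```

A Pearson r near zero together with a lag-1 autocorrelation near zero would be in line with the "no dependency, no temporal structure" reading. With only 117 points none of these tests is very powerful, so treat the output as a sanity check.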
Conclusions: If Silver's model were providing added value over simple poll averages, one would expect that, after removing the poll average, a reduction in Clinton's numbers should show some dependence on an increase in Trump's numbers and vice versa. Clearly this is not the case.
Whole thing looks like a random walk.
What am I saying? Nate Silver has argued that his models extract the real signal masked by noisy data. The results here indicate that nearly all of that signal is the smoothed national poll average.
In short, this does NOT inspire confidence that fast changes in the 538 model are anything but noise.
zanana1
(6,110 posts)It made me feel like I'd lost 20 IQ points.
Loki Liesmith
(4,602 posts)Can I explain something better?
Persondem
(1,936 posts)Most people have no idea what a residual is ... etc. You explained it very well in the bottom part of your post.
Loki Liesmith
(4,602 posts)
BzaDem
(11,142 posts)I still find a few things useful about the model:
1. It takes into account pollster quality, house effects, and other variables not directly accounted for by the average. Even if all of those end up canceling each other out in a poll average (which Nate himself admits is frequently the case), it helps me see the effects of particular polls in context. I like knowing whether a particular result is a "good" result or "bad" result for Clinton, and the top lines often do not tell us that without more info.
Perhaps I shouldn't ever be asking about particular polls, because the noise is too high. But what can I say -- I'm a political junkie. Polling averages take longer to converge (sometimes a lot longer), even after a major event affects the race in a way that matches our intuition. So I still look at individual polls to a limited extent (with a heaping dose of caution), and his model helps me make sense of them.
2. His model extrapolates poll results in some states (and national polls) to other states that, for whatever reason, don't have recent poll data. This is one reason state poll averages take so long to converge -- state polls often are a lagging indicator, since there are a lot of states, and polling is time-consuming and expensive. Sure, this extrapolation could be wrong, but according to him, his model is weighted heavily towards the real data once it comes rolling in. His model provides his best guess on what could be going on in the absence of current polling, which (to satisfy my own curiosity) I would otherwise have to do myself.
3. His model takes into account past results to a limited extent, even when they have been superseded by later results from the same pollster. Whether this is a good idea is actually somewhat of an open question. We don't have enough data points to say either way. I think the intuition behind it is that if one candidate consistently polls ahead on average over a long period, that should not be completely discounted when the polls tighten up a bit (as things like temporary enthusiasm or lack thereof can affect current polls and then disappear later). Of course, we wouldn't want to take into account past results too much, since then the model would take too long to take into account new data. But most simple averages ignore old polls completely, since otherwise you have to decide how much weight to give them, and then it starts sounding suspiciously like a model.
Perhaps all this is bogus reasoning. I'm no statistician. But those are my thoughts on why I like Nate's model, even though it often matches the poll averages.
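To make the "how much weight to give old polls" question in point 3 concrete, here is a toy sketch of one common approach, an exponentially decayed average. This is NOT how the 538 model works (its procedure isn't described here). The function name `decayed_average`, the 14-day half-life, and the example numbers are all made up for illustration.

```python
import numpy as np


def decayed_average(ages_days, margins, half_life_days=14.0):
    """Average poll margins, halving a poll's weight every half_life_days.

    Hypothetical helper: ages_days is how old each poll is, margins is the
    corresponding lead in points. The half-life is an arbitrary assumption.
    """
    ages = np.asarray(ages_days, dtype=float)
    margins = np.asarray(margins, dtype=float)
    weights = 0.5 ** (ages / half_life_days)
    return float(np.sum(weights * margins) / np.sum(weights))


# A fresh poll at +4 and a two-week-old poll at +1 (made-up numbers).
print(decayed_average([0, 14], [4.0, 1.0]))  # 3.0
```

A very short half-life behaves like a simple average of only the newest polls, while a very long one treats old and new polls alike. Picking the half-life is exactly the kind of choice that turns an average into a model.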
Persondem
(1,936 posts)to determine his adjusted numbers? From looking at FL the other day, it seemed that his trend line was really off, unless he was using his adjusted numbers (it seemed that the trend in recent polls was in Clinton's favor, but according to Silver it was a net +1.2% for Trump). If he is, that's a subtle way to cook the numbers a certain way. I suppose I need to see his exact procedure for determining the "trend" in the polls to know for sure.
Thoughts?
BlueInPhilly
(870 posts)He stacks it with macroeconomic variables and spurious coefficients that may or may not have real value.
In the long run, his model may well be more accurate. But the noise is just too much and too unreliable for hourly forecasts. Macroeconomic variables, by nature, are long-term metrics.
Loki Liesmith
(4,602 posts)my graphic appears to have disappeared...hmmm
geek tragedy
(68,868 posts)national level polling. If so, is it just that the state polls generally but noisily conform to the same general trends as the national polls?
DemocratSinceBirth
(99,710 posts)Peer-reviewed research suggests simple models work the best:
http://www.sciencedirect.com/science/article/pii/S014829631500140X
BlueInPhilly
(870 posts)Occam's razor definitely applies to mathematical models.