Polls cont.

January 14, 2008

When designing a survey, one can determine the number of individuals who must be sampled to achieve a specified margin of error. Suppose the pollsters want a margin of error of \pm 3 points. Using the normal approximation, the width of a confidence interval with confidence level (1-\alpha ) for a binomial proportion \theta estimated from n individuals is
w=2z_{\frac{\alpha }{2}}\sqrt{\frac{\theta (1-\theta)}{n}}.
Solving for n gives the number of individuals who need to be sampled as a function of the confidence interval width and the proportion,
n=\left( \frac{2z_{\frac{\alpha }{2}}}{w}\right) ^{2}\theta (1-\theta ).
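We can substitute 0.06 for w, since a margin of error of \pm 3 points corresponds to a total width of 6 points. From calculus we know that \theta (1-\theta ), and therefore n, is maximized when the proportion \theta is equal to one-half, so using that value covers the worst case. Setting \alpha to the conventional 0.05, for which z_{\frac{\alpha }{2}} is approximately 1.96, and computing n gives the number of individuals needed to be sampled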
n\approx 1067
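As a quick check on the arithmetic, here is a minimal sketch of the sample-size formula in Python. The function name sample_size and its defaults are my own choices for illustration, and the critical value comes from the standard library's normal distribution rather than a table.

```python
import math
from statistics import NormalDist


def sample_size(width, alpha=0.05, theta=0.5):
    """Individuals needed for a confidence interval of the given total width.

    width: total width w of the interval (0.06 for +/- 3 points)
    alpha: one minus the confidence level
    theta: assumed proportion; 0.5 is the worst case
    """
    # Upper alpha/2 critical value of the standard normal (about 1.96 for alpha = 0.05).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (2 * z / width) ** 2 * theta * (1 - theta)


n = sample_size(0.06)
print(round(n, 2))   # 1067.07
print(math.ceil(n))  # 1068
```

The formula returns a little more than 1067, so a pollster who wants the width to be no larger than 6 points would round up to 1068. Passing a proportion other than one-half, for example sample_size(0.06, theta=0.3), shows how much smaller the required sample becomes away from the worst case.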
Around 1,000 people sampled from the population will provide a confidence interval 6 points wide, which is a margin of error of \pm 3 points. Once the population is sampled and the parameters are estimated, what conclusions can be made? The computations return an estimate of the proportion, \hat{\theta}, and a \pm 3 point confidence interval around the estimate. The interpretation of this confidence interval is tricky and is often gotten wrong in the media. The interpretation is this: if 100 independent random samples were taken from the population and a 95% confidence interval were computed from each one, about 95 of those intervals would contain the true value of the proportion. For the one interval actually reported, the true parameter value may or may not lie inside it. The study is designed so that it is likely to capture the true parameter value, but that is far from a certainty. And that is what the pundits and journalists should remember. Try not to draw too much information from a poll that is statistically derived. Or at least understand the uncertainty involved in such games.
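The repeated-sampling interpretation can be illustrated with a small simulation. This is only a sketch under assumptions of my own: a hypothetical true proportion of 0.45, polls of 1,067 people, and 2,000 repeated polls. The function name coverage is likewise invented for the example. Each simulated poll builds its own 95% interval, and the code counts how often that interval contains the true value.

```python
import math
import random
from statistics import NormalDist


def coverage(theta=0.45, n=1067, alpha=0.05, polls=2000, seed=1):
    """Fraction of simulated polls whose interval contains the true theta.

    theta, n, polls and seed are illustrative assumptions, not real poll data.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(polls):
        # One poll: n independent respondents, each a "yes" with probability theta.
        yes = sum(rng.random() < theta for _ in range(n))
        estimate = yes / n
        # Half-width of the normal-approximation interval around the estimate.
        half_width = z * math.sqrt(estimate * (1 - estimate) / n)
        if estimate - half_width <= theta <= estimate + half_width:
            hits += 1
    return hits / polls


print(coverage())  # a fraction close to 0.95
```

The fraction printed should land near 0.95, but it is the procedure that has that success rate. Any single poll in the loop either captured the true proportion or missed it, and in practice nobody knows which.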

With the presidential primaries in full swing, it’s important to review the statistical nature of polls. The media can sometimes overestimate the value of information from these surveys and rely on a tool whose results are themselves random. First issue, is the sample biased? The underlying assumption that a scientific poll relies on is the random sample. If the sample generated is not random, then the results should be examined closely. Second issue, what is the underlying population that is being sampled? The population needs to be clearly defined and should be reported when citing results from a poll. Is the population sampled registered voters, members of a particular party, people listed in the phone book? The poll is designed to infer characteristics of the population, and if there is confusion about its precise definition, then interpretation of the results can be misleading. Third issue, the media often misinterpret the meaning of significance and confidence levels. This final issue will be addressed with an example in the next post.