In recent years, marketing research survey analysts have increasingly insisted that banner tabulations show "significance tests" between many pairs of banner points. In my experience, with few exceptions, such a request indicates that its originator does not really understand statistical significance.

The July 2000 issue of PC Magazine contains an excellent example of how statistical significance is all too often misused. For many years, the July issue of that magazine, which is distributed at the PC Expo in New York, has published the results of a "Service and Reliability" survey conducted by mail among a sample selected from the magazine's subscriber list.  These results are frequently quoted in other publications and by computer manufacturers in their advertising.

In the 2000 report, certain items display ratings in the form of symbols: a green up-arrow denotes "significantly above average," a gray horizontal bar denotes "average" and a red down-arrow indicates "significantly worse than average." Other items are shown as averages or percentages. For the most part, there is no way to match the rating symbols with numeric data for any vendor.

Only the column titled "Units needing repair in the past 12 months" shows both a percent and the resulting rating symbol.  Reading down, one finds Dell rated "above average" overall, with 25% of units needing repair in the past 12 months, but e_machines, with 22%, and Apple, with 24%, are rated "average."  At the other end of the scale, "locally built computers," with 34% of units needing repair, displays the scarlet mark indicating "significantly worse than average," while Acer with 39%, AST with 36% and Packard Bell with 35%, are rated as "average." 

Since the table provides the number of responses for each vendor, one can easily establish what is happening here:  Dell has a base of 2,243 and "locally built" a base of 2,971, whereas for the "average" brands we have 207 for Apple, 129 for e_machines, 213 for Acer, 98 for AST and 360 for Packard Bell. Assuming no errors and ignoring methodological issues, one readily sees that those brands rated significantly better or worse than average owe this distinction more to the number of responses they received than to what those responses indicate.
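The effect of the base sizes can be made concrete with a rough calculation. The sketch below uses only the percentages and bases quoted above and computes the margin of error for each brand under the usual normal approximation. The 95% confidence level is an assumption (the magazine did not disclose the level it used), and the function name `margin_of_error` is purely illustrative.

```python
from math import sqrt
from statistics import NormalDist

# (share of units needing repair, number of responses), as quoted in the text
BRANDS = {
    "Dell": (0.25, 2243),
    "e_machines": (0.22, 129),
    "Apple": (0.24, 207),
    "locally built": (0.34, 2971),
    "Acer": (0.39, 213),
    "AST": (0.36, 98),
    "Packard Bell": (0.35, 360),
}


def margin_of_error(p, n, confidence=0.95):
    """Half-width of the normal-approximation interval for a proportion.

    The 95% default is an assumption; the survey's actual level is unknown.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)


def report():
    """Print each brand's percentage, base and margin of error in points."""
    for brand, (p, n) in BRANDS.items():
        moe = margin_of_error(p, n)
        print(f"{brand:14s} {p:4.0%}  base {n:5d}  +/- {moe * 100:.1f} points")


if __name__ == "__main__":
    report()
```

Under these assumptions, the two large bases carry a margin of under two percentage points, while the small bases carry margins of roughly five to nine and a half points. A brand with a small base can therefore sit well away from the average and still be labeled "average."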

The survey actually shows that e_machines and Apple rate better than average, while Acer, AST and Packard Bell rate worse than average on this particular measurement. However, these brands did not receive enough responses for their differences from the average to pass the test at the arbitrary (and undisclosed) confidence level selected for it.

If I were looking to buy a computer, this result would certainly be "significant" to me, whether or not the survey results for these brands are "statistically significant."

Except for statisticians, few people understand that a significant difference between two sample groups indicates only that, if no difference existed in the population, a sample difference this large would arise from sampling error alone with a probability smaller than some arbitrarily chosen limit. This is not at all the same as the probability that such a difference actually exists in the population, as is too often stated.
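A small simulation may help show what a significance test does and does not say. The sketch below repeatedly draws two sample groups from the same population, so that no real difference exists, and counts how often a pooled two-proportion z-test flags the pair as "significantly different." All of the settings (a 30% true rate, 200 respondents per group, 2,000 trials, a critical value of 1.96 for a two-sided 95% test) are assumptions chosen for illustration and do not come from the PC Magazine survey.

```python
import random
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for the difference between two sample proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0  # both groups all-yes or all-no: no measurable difference
    return (x1 / n1 - x2 / n2) / se


def false_alarm_rate(p=0.30, n=200, trials=2000, z_crit=1.96, seed=0):
    """Share of trials flagged 'significant' when both groups share rate p."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(trials):
        # Two groups drawn from the same population (hypothetical settings).
        x1 = sum(rng.random() < p for _ in range(n))
        x2 = sum(rng.random() < p for _ in range(n))
        if abs(two_proportion_z(x1, n, x2, n)) > z_crit:
            flagged += 1
    return flagged / trials


if __name__ == "__main__":
    print(f"flagged with no real difference: {false_alarm_rate():.1%}")
```

With a 95% test, something on the order of one comparison in twenty should be flagged even though the two groups are drawn from the same population. The marker on a table describes how unusual the sample would be if nothing were going on. It does not give the odds that something is.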

Furthermore, without more information, statistical significance says nothing about whether such a difference, even if it does exist in the population, is meaningful to the analysis, nor does it provide an indication of causality.

Yet standard operating procedure today seems to be for analysts to plow through reams of data looking for statistically significant differences so that they can attempt to assign some kind of meaning to them. When faced with a pile of tables littered with significance markers and a deadline to issue a report, it is almost impossible to do otherwise. 

Even if you are far more statistically savvy than the editors of PC Magazine, this practice is just as likely to make you miss the true significance of what you are looking at.

_________________

In the interest of full disclosure, we should note that one of our clients handled the PC Magazine Service and Reliability surveys for Ziff-Davis for many years (although not in 2000) and that, while we never tabulated that project for them, we did help design the data sets and implement the significance testing as requested by the Ziff-Davis research department.

 

Copyright 1999-2006 Jan Werner Data Processing - Last modified: March 19, 2002