Part 2 of 2. Click here for Part 1
At every stage, from planning through data collection all the way to reporting, market research is fraught with pitfalls: places where the wrong decision, however innocently made, can torpedo a program and produce inaccurate results. Those inaccuracies will ultimately lead a business down a misguided path. With so much at stake, companies have to get it right the first time, and that means some serious planning.
Last time, we looked at how errors in data collection can lead to some pretty erroneous claims, in political polling and market research alike. But that's just the front end. In the final stages of a research program, even one that has run perfectly up to this point, the wrong decisions in analytics and reporting can lead to claims that range from the spurious to the patently false. These are the kinds of mistakes that give you the Edsel, the Apple Newton, and perhaps even New Coke.
So what should you be on the lookout for? Here are six tips to ensure that data analytics and reporting do not lead you astray.
Yes, there is more than one way to analyze raw data, and knowing which one will best illuminate your data matters. Take correlation: in grade school, the taller students read at a higher level than the shorter ones, but that doesn't mean height and intelligence are related. It just means that older kids read better than younger kids, and older kids are also taller than younger kids. Age is a confounding variable, and a simple correlation would miss that distinction. And that's just the basic stuff. Within regression alone, for example, one must choose between linear and logistic models, and decide whether a decision tree or a random forest fits the problem. And don't even get me started on parabolas.
In advanced analytics, there will always be a temptation to pile data sets on top of one another and build one super set that supports broad, bold claims. Resist this temptation. Only combine like with like, not generally similar with generally similar. Example: you know from your research that 20% of California bloggers still live with their parents, as do 25% in Oregon and 30% in Washington. That is as far as your data can take you. You cannot claim that 25% of Pacific Coast bloggers live with their parents. Confused? Consider the states' populations: 20% of California's bloggers is a far larger group than even 30% of Washington's.
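Here is a quick sketch of the arithmetic, using rough state population figures (assumed for illustration, and still ignoring that blogger density may vary by state): a properly population-weighted average lands well below the naive 25%.

```python
# Approximate populations in millions (illustrative assumption, not census data).
data = {
    "California": {"pop": 39.0, "rate": 0.20},
    "Oregon":     {"pop": 4.2,  "rate": 0.25},
    "Washington": {"pop": 7.7,  "rate": 0.30},
}

# Naive approach: treat the three states as equal-sized.
naive = sum(d["rate"] for d in data.values()) / len(data)

# Weighted approach: let each state count in proportion to its population.
weighted = (sum(d["pop"] * d["rate"] for d in data.values())
            / sum(d["pop"] for d in data.values()))

print(f"naive average:    {naive:.1%}")     # 25.0%
print(f"weighted average: {weighted:.1%}")  # about 21.9%
```

California's sheer size drags the combined figure toward its 20% rate, which is exactly why the flat average overstates the regional number.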
It gets worse. Recently we came across a back-to-school study that asked respondents how much they consider social media when making electronics, clothing, and school-supply purchases. It found that roughly 10% did so for school supplies, 20% for clothing, and 70% for electronics. All of that is fine. The study overreached, however, when it went on to claim that 30% of back-to-school shoppers consult social media before making purchases. This, for what should be obvious reasons, is nowhere near accurate: three discrete categories were combined into an artificial "back-to-school shopper" category, in which the electronics respondents inflated the average.
Analysts and statisticians should always show the n value (sample size) on every single graph, chart, or any other data-based claim, visual or otherwise. This is because percentages alone tell you little to nothing. Example: 100% of bloggers love their job. Great. But how many bloggers are we talking about? 100,000? 1 million? Just the one writing these words (sometimes)? There is simply no way to tell without showing your work.
Sample-size manipulation is another hallmark of shady data analysis. Without getting too far into the weeds, sample sizes can be trimmed or extended until the results look the way you want them to, a practice commonly known as p-hacking. Check out FiveThirtyEight's explainer on p-hacking, complete with an interactive model!
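This toy simulation (my own sketch, not FiveThirtyEight's model) shows one common form of the trick: "peeking" at a growing sample and stopping the moment the numbers look significant. Under that rule, a perfectly fair coin "shows bias" far more often than the nominal 5% error rate would suggest.

```python
import math
import random

random.seed(0)

def peeking_experiment(max_n=500):
    """Flip a fair coin, but 'peek' after every flip and stop as soon as
    the running average looks significant. Honest analysis would fix the
    sample size in advance instead."""
    heads = 0
    for n in range(1, max_n + 1):
        heads += random.random() < 0.5
        if n >= 30:
            # Crude z-test of the observed rate against the true rate, 0.5.
            z = (heads / n - 0.5) / math.sqrt(0.25 / n)
            if abs(z) > 1.96:           # "significant" at the 5% level
                return n, heads / n     # stop early, report the fluke
    return max_n, heads / max_n         # ran honestly to the end

# Count how often peeking lets a fair coin look biased.
early_stops = sum(1 for _ in range(200) if peeking_experiment()[0] < 500)
print(f"{early_stops}/200 runs 'found' a bias in a fair coin")
```

A fixed-n test would flag roughly 5% of runs by chance; repeated peeking multiplies the opportunities to hit a fluke, so the false-positive rate climbs well past that.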
Data and analytics can be confusing, especially for the uninitiated, so in reporting it is paramount to be clear and accurate. Saying that 55% of bloggers are handsome is nowhere near the same thing as saying that 55% of handsome people are bloggers. It seems obvious, but this is easy to whiff. So it is easy to see how a market researcher could erroneously claim that 20% of middle-aged men love your product when, in actuality, 20% of the people who love your product are middle-aged men.
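The two statements divide the same overlap by different totals. A minimal sketch with invented survey counts (all three numbers are hypothetical, chosen to match the 20% in the example above):

```python
# Hypothetical survey counts, invented purely for illustration.
middle_aged_men = 1000   # respondents who are middle-aged men
love_product = 400       # respondents who love the product
both = 80                # middle-aged men who also love the product

# Same numerator, different denominators -- and very different claims.
p_love_given_mam = both / middle_aged_men   # share of middle-aged men who love it
p_mam_given_love = both / love_product      # share of fans who are middle-aged men

print(f"P(love | middle-aged man) = {p_love_given_mam:.0%}")   # 8%
print(f"P(middle-aged man | love) = {p_mam_given_love:.0%}")   # 20%
```

Only 8% of middle-aged men love the product, even though 20% of the product's fans are middle-aged men. Swap the denominator and you swap the story.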
That should give you some idea of what should be on your radar. Yet with something so intricate and significant, perhaps it is best left to the professionals?