Hypothesis Testing for Business Decisions
Go No Go Decision Making
In college, I struggled with statistics. Professors seemed to want to teach us the “what” and “how” of statistics, but not the “why.” They used “not” language to describe results: “We cannot reject the null hypothesis.” People struggle with understanding the meaning of sentences containing the word “not.” I confess, I am one of them.
My wife speaks “not” language fluently. If I ask her if she wants to go see a movie and have dinner, she will reply “I wouldn’t want to see anything too violent and I wouldn’t want to eat too much.” We’ve been married almost 30 years, probably because I took a course shortly after we were married that taught me how to speak “not” language.
What does this have to do with go/no go decision making? A lot, I think.
Hypothesis Testing in the Real World
A statistician recently commented about a published paper using the QI Macros Z-Test. He wrote:
The result was not significant, thus the authors concluded "the means are the same". Obviously this is wrong, as the null hypothesis can never be accepted; only rejected or fail to be rejected. But the authors insisted they have proven that the means are the same because the QI Macros software explicitly stated this: "Means are the same".
(Note: QI Macros also stated that it “Cannot reject the null hypothesis.”)
From a purely statistical point of view, he is correct. As the American Statistical Association points out:
You cannot prove the null hypothesis; you can only reject it at some level of significance. “By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.” You can find out more about ASA’s position on p-values here.
For anyone with a real-world, real-time need for good/bad, pass/fail, go/no go decision making, that distinction is not very helpful. Few people understand these nuances. Fewer still can remember them after their Six Sigma training. They want to know what it means when we say “reject the null hypothesis” or “cannot reject the null hypothesis (accept the null hypothesis).”
If you cannot reject the null hypothesis (accept the null hypothesis) that Mean 1 = Mean 2, then for the purposes of decision making, the means are the same. Note that I did not say that the means are equal. For that you could use an equivalence test.
Here’s an Example Using QI Macros Test Data.
Notice that all statistics packages will give you a p value, but leave you to draw your own conclusions. This is fine for statisticians, but challenging for everyone else.
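To make the gap between a p value and a conclusion concrete, here is a minimal Python sketch of a two-sample z-test that reports both. It is not how QI Macros works internally; the function name `two_sample_z_test`, the 0.05 default, and the wording of the verdicts are assumptions made for illustration. It also uses the sample standard deviations in place of known ones, which is only reasonable for fairly large samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def two_sample_z_test(sample1, sample2, alpha=0.05):
    """Two-sided z-test of H0: Mean 1 = Mean 2, with a plain-language verdict.

    Hypothetical helper for illustration; alpha is the risk level you choose.
    """
    m1, m2 = mean(sample1), mean(sample2)
    # Standard error of the difference between the two sample means.
    se = sqrt(stdev(sample1) ** 2 / len(sample1)
              + stdev(sample2) ** 2 / len(sample2))
    z = (m1 - m2) / se
    # Two-sided p value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    if p_value < alpha:
        verdict = "Reject the null hypothesis: treat the means as different."
    else:
        verdict = "Cannot reject the null hypothesis: treat the means as the same."
    return z, p_value, verdict


# Made-up measurements, not the QI Macros test data.
before = [10.0, 10.2, 9.8, 10.1, 9.9]
after = [12.0, 12.2, 11.8, 12.1, 11.9]
z, p_value, verdict = two_sample_z_test(before, after)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print(verdict)
```

The statistics stop at `p_value`. The last few lines of the function are the part most packages leave to the reader.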
If you reject the null hypothesis that Mean 1 = Mean 2, then from a decision-making point of view, the means are different. But wait; from a statistical point of view, we can’t prove that either. If your p value is 0.001, there is still one chance in 1,000 of seeing a difference at least this large when the means are really the same. If your p value is 0.06, you cannot reject the null hypothesis at the usual 0.05 level, yet a difference this large would turn up only about 6% of the time if the means were really the same. Is that good enough? Loosely speaking, if the p value is greater than 0.5, the means look more similar than different; if it is less than 0.5, they look more different than similar. What level of risk can you tolerate?
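The question of risk tolerance can be shown in a few lines of Python. In this sketch the same p value leads to a different call depending on the significance level you picked before looking at the data. The helper name `decide` and the three risk levels are illustrative assumptions, not part of any statistics package.

```python
def decide(p_value, alpha):
    """Return 'different' if p_value is below the chosen risk level, else 'same'."""
    return "different" if p_value < alpha else "same"


# One p value, three levels of risk tolerance.
for alpha in (0.01, 0.05, 0.10):
    print(f"p = 0.06, alpha = {alpha:.2f}: treat the means as {decide(0.06, alpha)}")
```

At the 0.01 and 0.05 levels, a p value of 0.06 says "same"; at the 0.10 level, it says "different." The data did not change, only the risk you agreed to accept.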
All of this seems to be a way of hedging our bets, a way of being able to avoid the blame. If we cannot say the means are the same or different, why bother using hypothesis testing in the first place?
Perhaps the writers of the paper could have said the means are not different. But wait, if the means are not different, doesn’t that mean they are the same? Perhaps they should have used an equivalence test.
Here’s an equivalence test using the same data. QI Macros will say that we “cannot conclude the means are equivalent.” So the means aren’t different, but they aren’t equivalent either.
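How can the same data be neither different nor equivalent? The sketch below runs both tests side by side, using the common "two one-sided tests" (TOST) approach for equivalence. This is an assumption about method, not a description of the QI Macros calculation. The function name `compare_means`, the `margin` parameter (how far apart two means can be and still count as equivalent), and the data are all made up for illustration, and the normal approximation is a simplification for such small samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def compare_means(sample1, sample2, margin, alpha=0.05):
    """Run a difference test and an equivalence test (TOST) on the same data."""
    diff = mean(sample1) - mean(sample2)
    se = sqrt(stdev(sample1) ** 2 / len(sample1)
              + stdev(sample2) ** 2 / len(sample2))
    norm = NormalDist()

    # Difference test: the null hypothesis is "the means are the same."
    p_difference = 2 * (1 - norm.cdf(abs(diff) / se))

    # Equivalence test: the null hypothesis is "the means differ by at
    # least `margin`." Two one-sided tests; the larger p value decides.
    p_lower = 1 - norm.cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = norm.cdf((diff - margin) / se)      # H0: diff >= +margin
    p_equivalence = max(p_lower, p_upper)

    return {
        "different": p_difference < alpha,
        "equivalent": p_equivalence < alpha,
        "p_difference": p_difference,
        "p_equivalence": p_equivalence,
    }


# Small, noisy, made-up samples.
line_a = [9.0, 10.0, 11.0, 12.0, 13.0]
line_b = [10.0, 11.0, 12.0, 13.0, 14.0]
result = compare_means(line_a, line_b, margin=1.5)
print(result["different"], result["equivalent"])
```

With small, noisy samples like these, both flags come back `False`: there is not enough evidence to call the means different, and not enough to call them equivalent. The two tests start from opposite null hypotheses, so failing one does not mean passing the other.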
My goal has never been to turn the masses into statisticians; my goal is to help the average user get actionable insights from their data to improve their product or service.
Statisticians probably think I’m an idiot. The lay user finds QI Macros helpful because it provides not only p values but also analysis, both statistical and actionable. I think this is a feature, not a bug.
Here's My Point
Stop quibbling. Take a stand. In or out; good or bad; pass or fail; go or no go. The means are either the same or different. Use the results to make a decision about how to improve your process and move on.
Rights to reprint this article in company periodicals are freely given with the inclusion of the following tag line: "© 2017 Jay Arthur, the KnowWare® Man, (888) 468-1537, firstname.lastname@example.org."