Growing Concern About Statistical Errors Triggers Statement On P-Values

American Statistical Association Wants To Change How Scientists Use Statistical Inference
“We teach it because it’s what we do; we do it because it’s what we teach.” It was this type of circularity, along with other concerns brought to the attention of the American Statistical Association (ASA) in 2014, that prompted the ASA Board to develop a policy statement on p-values and statistical significance. The ASA’s goal was “to shed light on an aspect of our field that is too often misunderstood and misused in the broader research community.”
Funny Video
To illustrate this confusion, journalist Christie Aschwanden shared a funny video in a recent article at fivethirtyeight.com about how poorly even scientists understand the definition of a p-value [go to https://tinyurl.com/pv62zro and click on the short video].
Controversial Topic
According to the ASA, the statement development process was lengthier and more controversial than anticipated. In addition to the statement, the ASA invited commentaries from a variety of investigators; some, such as Sander Greenland and Ken Rothman, commented individually as well as participating in a multi-authored commentary. Titles of the single-author commentaries include:
· “It’s Not the P-values’ Fault”
· “P Values Are Not What They Are Cracked Up To Be”
· “Is Reform Possible Without A Paradigm Shift?”
· “Don’t Throw Out The Error Control Baby With The Bad Statistics Bathwater”
· “Disengaging From Statistical Significance”
Longer Paper
The longer, multi-authored contribution is entitled “Statistical Tests, P-values, Confidence Intervals, and Power: A Guide To Misinterpretations,” co-authored by Sander Greenland, Stephen Senn, Kenneth Rothman, John Carlin, Charles Poole, Steven Goodman, and Douglas Altman. It addresses no fewer than 25 misinterpretations and closes with a set of guidelines (see the link and note at the end of this article).
According to the ASA, “Nothing in the ASA statement is new. Statisticians and others have been sounding the alarm about these matters for decades, to little avail. What is new is that ASA has never before issued guidance on a matter of statistical practice.” With this statement, the ASA hopes “to draw renewed and vigorous attention to changing the practice of science with regards to the use of statistical inference.”
Set Of Principles
The ASA statement presents a set of principles to guide the conduct and interpretation of science (a brief simulation sketch after the list illustrates the first two). They are:
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
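To make the first two principles concrete, here is a minimal simulation sketch (our illustration, not part of the ASA statement; the two-sample t-test and sample sizes are arbitrary choices). When the hypothesis of equal means is exactly true, p-values are uniformly distributed, so roughly 5% of tests still yield p < 0.05; a small p-value flags incompatibility between data and model, not a 5% probability that the hypothesis is true.

# Simulation sketch (illustrative, not from the ASA statement): under a
# true null hypothesis, p-values are approximately uniform, so about 5%
# of tests fall below 0.05 even though the null holds in every run.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n = 10_000, 30

# Two-sample t-tests where the null (equal means) is exactly true.
pvals = np.array([
    stats.ttest_ind(rng.normal(0.0, 1.0, n), rng.normal(0.0, 1.0, n)).pvalue
    for _ in range(n_sims)
])

print(f"Fraction of p < 0.05 under a true null: {np.mean(pvals < 0.05):.3f}")

Running this prints a fraction near 0.05, matching the uniform distribution of p-values under the null: the p-value is a statement about the data given the model, not about the model given the data.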
Guide To Misinterpretations
The Guide to Misinterpretations written by Greenland and colleagues covers at least 14 misinterpretations related to single p-values, 4 related to p-value comparisons and predictions, 5 related to confidence intervals, and 2 common misinterpretations related to power calculations. It offers the following guidelines to minimize the harms of current practice (a brief code sketch after the list illustrates the first guideline):
1. Correct and careful interpretation of statistical tests demands examining the sizes of effect estimates and confidence limits, as well as precise P-values.
2. Careful interpretation also demands critical examination of the assumptions and conventions used for the statistical analysis—not just the usual statistical assumptions, but also the hidden assumptions about how results were generated and chosen for presentation.
3. It is simply false to claim that statistically non-significant results support a test hypothesis, because the same results may be even more compatible with alternative hypotheses—even if the power of the test is high for those alternatives.
4. Interval estimates aid in evaluating whether the data are capable of discriminating among various hypotheses about effect sizes, or whether statistical results have been misrepresented as supporting one hypothesis when those results are better explained by other hypotheses.
5. Correct statistical evaluation of multiple studies requires a pooled analysis or meta-analysis…all the earlier cautions apply.
6. Any opinion offered about the probability, likelihood, certainty, or similar property for a hypothesis cannot be derived from statistical methods alone.
7. All statistical methods…make extensive assumptions about the sequence of events that led to the results presented—not only in the data generation, but in the analysis choices…research reports should describe in detail the full sequence of events that led to the statistics presented…
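As one way of following the first guideline, the sketch below (our illustration with hypothetical data; it is not code from the Guide) reports the effect estimate and its 95% confidence limits alongside the precise p-value from a pooled two-sample t-test, rather than a bare significant/non-significant verdict.

# Illustrative sketch (hypothetical data, not from the Guide): report the
# effect estimate and confidence limits together with the precise p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treated = rng.normal(0.3, 1.0, 40)   # hypothetical treatment arm
control = rng.normal(0.0, 1.0, 40)   # hypothetical control arm

n1, n2 = len(treated), len(control)
diff = treated.mean() - control.mean()               # effect estimate
# Pooled standard error, consistent with the equal-variance t-test below.
sp2 = ((n1 - 1) * treated.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
tcrit = stats.t.ppf(0.975, n1 + n2 - 2)
lo, hi = diff - tcrit * se, diff + tcrit * se        # 95% confidence limits
p = stats.ttest_ind(treated, control).pvalue         # precise p-value

print(f"estimate = {diff:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), p = {p:.4f}")

Reporting all three numbers lets a reader judge the size and precision of the effect, which the p-value alone cannot convey (see principle 5 above).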
[Ed. Note:
· To access the ASA statement, go to: https://tinyurl.com/hu8ut6l
· To access the 20 supplemental commentaries published with the statement, go to: https://tinyurl.com/z443259
· To access the Greenland and colleagues Guide to Misinterpretations, follow the commentaries link above, go to the very bottom of the page, locate the box showing the number 21, and click through to #21 for the Guide to Misinterpretations.]