Probability Value

In the 1920s and 1930s, English statistician and biologist Ronald Fisher mathematically combined Mendelian genetics with Darwin’s hypothesis of natural selection, creating what became known as the modern evolutionary synthesis, thereby establishing evolution as biology’s primary paradigm. Fisher’s work revolutionized experimental design and the use of statistical inference.

In his approach, Fisher expressly wanted to avoid the subjectivity involved in Bayesian inference, which became popular in the 1980s and remains so. Bayes’ theorem is now badly abused in science, medicine, and law to conclude causation, when this shaggy approach at best suggests conditional plausibility in the absence of critical data.

Fisher statistically assessed significance using a probability value (p-value). The p-value is the probability of obtaining data at least as extreme as that observed if the hypothesis under test were true; it is not the probability that a proposed hypothesis is plausible.
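To make the distinction concrete, here is a minimal sketch of what a p-value computes; the coin-flip scenario and the function name binomial_p_value are illustrative choices, not from the source. Everything is calculated under the null hypothesis of a fair coin, so the number measures only how surprising the data would be if that hypothesis held.

    # Illustrative sketch: the p-value for observing 60 heads in 100 flips of a
    # supposedly fair coin. The calculation assumes the null hypothesis (a fair
    # coin); the result says nothing about how plausible the rival "biased coin"
    # hypothesis is.
    from math import comb

    def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
        """Two-sided p-value: probability, under the null, of a count at least
        as far from the expected count as the one observed."""
        expected = flips * p_null
        observed_deviation = abs(heads - expected)
        total = 0.0
        for k in range(flips + 1):
            if abs(k - expected) >= observed_deviation:
                total += comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        return total

    print(binomial_p_value(60, 100))  # roughly 0.057: surprise under the null, nothing more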

The problem is that the p-value by itself is not of particular interest. What scientists want is a measure of the credibility of their conclusions, based on observed data. The p-value neither measures that nor is it part of the formula that provides it. ~ Steven Goodman

Scientists now use the p-value as a backhanded way of determining whether their data and attendant conclusions are valid. This is a fundamental misconception.

This pernicious error creates the illusion that the p-value alone measures the credibility of a conclusion, which opens the door to the mistaken notion that the dividing line between scientifically justified and unjustified claims is set by whether the p-value has crossed a “bright line” of significance, to the exclusion of external considerations like prior evidence, understanding of mechanism, or experimental design and conduct. ~ Steven Goodman

Random variation alone can easily lead to large disparities in p-values. ~ Swiss zoologist Valentin Amrhein et al
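A small simulation makes this point concrete; it is an illustrative sketch rather than anything from the source, the effect size and sample size are arbitrary choices, and the two-sample t-test comes from SciPy.

    # Illustrative simulation: the same experiment, with the same true effect,
    # replicated 20 times. Only sampling noise differs between replications,
    # yet the resulting p-values range from "highly significant" to unremarkable.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(seed=1)
    true_effect = 0.4      # standardized mean difference between the groups
    n_per_group = 30       # a modest, fairly typical sample size
    replications = 20

    p_values = []
    for _ in range(replications):
        control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        treated = rng.normal(loc=true_effect, scale=1.0, size=n_per_group)
        _, p = ttest_ind(treated, control)
        p_values.append(p)

    print([round(p, 3) for p in sorted(p_values)])  # identical design, scattered p-values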

Fisher used “significance” only to suggest whether an observation was worth following up on.

This is in stark contrast to the modern practice of making claims based on a single documentation of statistical significance. ~ Steven Goodman

p-values used in the conventional, dichotomous way decide whether a result refutes or supports a scientific hypothesis. Bucketing results into ‘statistically significant’ and ‘statistically non-significant’ makes people think that the items assigned in that way are categorically different. The false belief that crossing the threshold of statistical significance is enough to show that a result is ‘real’ has led scientists and journal editors to privilege such results, thereby distorting the literature. Statistically significant estimates are biased upwards, whereas statistically non-significant estimates are biased downwards. Consequently, any discussion that focuses on estimates chosen for their significance will be biased. On top of this, the rigid focus on statistical significance encourages researchers to choose data and methods that yield statistical significance for some desired (or simply publishable) result, or that yield statistical non-significance for an undesired result, such as potential side effects of drugs – thereby invalidating conclusions. ~ Valentin Amrhein et al
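The upward bias that Amrhein et al describe can be reproduced in a few lines. The simulation below is an illustrative sketch with arbitrary parameter values: when a small true effect is studied with modest power, the replications that clear the p < 0.05 threshold are mostly the ones that happened to overestimate the effect, so the "significant" subset exaggerates it.

    # Illustrative sketch of selection bias: with a small true effect and modest
    # power, replications that cross p < 0.05 are largely those that, by chance,
    # overestimated the effect.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(seed=2)
    true_effect, n_per_group, replications = 0.2, 50, 5000

    all_estimates, significant_estimates = [], []
    for _ in range(replications):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(true_effect, 1.0, n_per_group)
        estimate = treated.mean() - control.mean()
        _, p = ttest_ind(treated, control)
        all_estimates.append(estimate)
        if p < 0.05:
            significant_estimates.append(estimate)

    print("true effect:                ", true_effect)
    print("mean of all estimates:      ", round(float(np.mean(all_estimates)), 3))
    print("mean of 'significant' ones: ", round(float(np.mean(significant_estimates)), 3))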