A few weeks ago I was analysing data from an experiment I’d been working on for a while, when the literal worst thing possible happened (ok, exaggeration). P = 0.0649. Non-significant findings. Any other PhD students reading this will sympathise, because at that moment the world seems to be imploding. After internally screaming for a few minutes, I started to think about this. Why is this number the be-all and end-all of my work when I can clearly see visual trends, and when I know which experimental limitations have caused this statistical non-significance?
For anyone not familiar with P values: a P value is the outcome of some fancy statistical calculations that judge the strength of the evidence against a ‘null hypothesis’ (the assumption that there is no real effect). Generally, a P value of less than 0.05 is accepted as pointing to ‘significant’ findings (the null hypothesis can be rejected), i.e. the pattern in your data is unlikely to be down to chance or a fluke. In slightly-less-layman’s terms, P = 0.05 means that if there were genuinely no effect, you would only see data this extreme about 5% of the time; it does not quite mean you can be 95% confident that your findings are true, although it is very often interpreted that way. Statistical tests have been used for years by scientists to almost ‘prove’ in publications that their research is dependable; however, there are many inherent problems in this P value addiction.
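To make that a bit more concrete, here is a rough sketch in Python of where a P value actually comes from. The numbers, group names and the choice of scipy’s two-sample t-test are all my own made-up illustration, not anything from a real experiment:

```python
# A minimal, made-up sketch: computing a P value for a two-sample comparison.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=10.0, scale=2.0, size=30)  # hypothetical control group
treated = rng.normal(loc=11.2, scale=2.0, size=30)  # hypothetical treated group

# Welch's t-test: the P value is the probability of seeing a difference at
# least this large if the two groups really came from the same distribution.
t_stat, p_value = stats.ttest_ind(control, treated, equal_var=False)
print(f"t = {t_stat:.2f}, P = {p_value:.4f}")
```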
Firstly, a P value doesn’t provide solid proof on its own, and doesn’t show the whole picture. P values are just one part of the story and should be considered alongside other evidence, such as effect sizes and confidence intervals. There have been numerous studies demonstrating how ‘significant’ P values can still turn out to be false positives.
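You can see why false positives are baked in with a quick simulation (again, my own toy example rather than anything from a paper): if you run lots of experiments where there is genuinely no effect, roughly 5% of them will still sneak under the P < 0.05 line.

```python
# Toy simulation: with no real effect, about 5% of tests are still 'significant'.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    # Both groups are drawn from the SAME distribution, so the null hypothesis is true.
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"False positive rate: {false_positives / n_experiments:.3f}")  # roughly 0.05
```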
Secondly, P values can be misused, whether on purpose or by accident. Statistical analysis is genuinely difficult – there are many variables, different tests, and a wealth of parameters to consider. Yet every researcher is expected to do these calculations, often with little help, which is how mistakes get made and false conclusions drawn. Data can also be manipulated to show what the researcher wants it to show: it is possible to run a large number of statistical tests on a dataset and report only the one that gives a significant P value. This sounds drastic, but it does happen.
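Here is a hedged sketch of that ‘run lots of tests, keep the significant one’ problem. I’ve invented twenty outcome measures, none of which truly differ between the two groups; the chance that at least one of them looks ‘significant’ is far higher than 5%:

```python
# Made-up illustration of multiple testing: 20 outcomes with no real differences.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_outcomes = 20
group_a = rng.normal(size=(n_outcomes, 30))  # 20 unrelated measurements, group A
group_b = rng.normal(size=(n_outcomes, 30))  # the same measurements, group B

p_values = [stats.ttest_ind(group_a[i], group_b[i]).pvalue for i in range(n_outcomes)]
print(f"Smallest P value out of {n_outcomes} tests: {min(p_values):.4f}")
print(f"'Significant' tests at P < 0.05: {sum(p < 0.05 for p in p_values)}")

# With 20 independent tests there is roughly a 1 - 0.95**20 ≈ 64% chance of at
# least one false positive, which is why corrections like Bonferroni exist.
```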
My worry is that statistically significant findings are not always a marker of ‘good’ or ‘robust’ research, and non-significant findings can sometimes be just as useful as significant ones. Having said that, statistics is still a great tool for researchers and shouldn’t be neglected – if in doubt, try to get your stats checked by a statistician. The problem is that researchers want CONFIDENCE. They want to report their findings with certainty, and P values have become the default way of quantifying it. In a ‘publish or perish’ community, showing ‘visual trends’ is simply not enough proof.
(I am not a statistician, and I’m aware that this post glosses over quite a huge topic and omits some technical terminology, but there are some great articles linked below, particularly the first source.)