‘Correlation does not imply causation’ is a statistical mantra. Most good high school and undergraduate statistics courses teach this, and most good science bloggers, journalists and scientists repeat it over and over again. But when and how far does that mantra extend into regression model territory? And what of the no-man’s-land surrounding this mysterious terra statistica?
Causal language refers to definitive statements that describe a cause and effect between two variables. It is in the same camp as the active voice, which is increasingly being promoted as the ‘way to write’ for scientists. Passive voice and non-directional language, once the standard of scientific writing, are now seen by some as vague, ambiguous and open to misinterpretation. But in our rush to be active, confident and ‘own’ our research results, are we risking misinterpretation and misunderstanding of science at the other end of the scale? “Building more roads increases bee abundance” might sound dramatic, convincing and galvanising…but it doesn’t mean quite the same thing as “Bee abundance was associated with the number of grassy road verges in the landscape”.*
I’ve just reviewed a manuscript (for a pretty good general science journal) in which the authors made a simple mistake that most of us have made at some point: they measured a handful of small-scale interacting variables at various sites across an ecosystem type, ran a generalised linear model with those variables, and then implied causation across broadly relevant landscapes, animal communities and ecosystem functions. Their sampling method was not specifically designed to measure the variable they were interested in, so they were missing some of the information they needed to support the claims they wanted to make. On top of that, the handful of variables they measured were based on ecological interactions that are known to be influenced by other, often more significant, environmental drivers that weren’t considered in their study.
I am a relative academic novice, but this is not the first manuscript I’ve reviewed that has taken this approach. The general topic and study aims are always interesting and topical, and the Introduction well-written and reasonable, so I usually get halfway through the Methods with positive thoughts before alarm bells ring. The Discussion then becomes a wild affair, claiming the moon and stars with little or no evidence to back up the connection.
Am I crazy, or being too picky? I don’t believe so. Like most peer reviewers, I am not a statistician, and I don’t claim to be any kind of expert in advanced statistics or modelling techniques. But I know the basic tenets of ecological data analysis, and implying causation from correlation is not one of them. Why do apparently experienced researchers submit papers that do this, especially where multiple confounding factors are involved, but not acknowledged by the study design or analysis methods? And why do editors send these papers out for review?
Is it because this is how we are taught science from an early age? When I was studying for my undergraduate science degree, I wrote lots of ‘cause and effect’ papers as assignments. Simple, controlled experiments are the most effective way to teach science at this level, where results need to be obtained within a three-hour practical session, or a few weeks of the teaching semester. Light affects seedling growth in a greenhouse, oxygen levels affect fish-tank communities, etc. But this isn’t always reality, and it rarely applies to field ecology studies. Like many young researchers, I learned this the hard way when I started my PhD.
Data collected through field ecology studies are rarely controlled, for many reasons, including time, money and the impossibility of a researcher being in multiple places at once. This is completely fine (as long as the data are analysed appropriately), and it is part of what makes ecology so exciting. But the study design and data analyses need to acknowledge this and factor it into the study. Even then, it can still be difficult to make definitive statements.
Is it because of the modern obsession with predictive modelling? As the number of published papers based on data modelling increases, could newer researchers assume that they need to use similar modelling techniques to remain ‘current’ or contribute to ecological theory and knowledge? This type of modelling is not suited to all data types or all research questions, but few contemporary modelling-focused studies clarify this. If you do a quick Google search of regression and causality, you will find many apparently reputable blogs and statistical sites that appear to promote the use of regression to make ‘strong’ assumptions about research data, without clarifying the dirty details.
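Those dirty details are easy to demonstrate with simulated data. The sketch below is a hypothetical illustration (not based on any real study, and using the bee/road-verge example only as a stand-in): an unmeasured confounder drives two variables, which then correlate strongly with each other despite neither causing the other, and the association disappears once the confounder is regressed out.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Hypothetical confounder: say, landscape productivity, which drives both
# the number of grassy road verges (x) and bee abundance (y).
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)  # x is caused by z, not by y
y = 3.0 * z + rng.normal(size=n)  # y is caused by z, not by x

# x and y are strongly correlated, even though neither causes the other.
r_xy = np.corrcoef(x, y)[0, 1]

# Regressing the confounder out of each variable and correlating the
# residuals (a partial correlation) removes the apparent association.
x_resid = x - np.polyval(np.polyfit(z, x, 1), z)
y_resid = y - np.polyval(np.polyfit(z, y, 1), z)
r_partial = np.corrcoef(x_resid, y_resid)[0, 1]

print(f"raw correlation: {r_xy:.2f}, after controlling for z: {r_partial:.2f}")
```

A regression of y on x alone would report a highly ‘significant’ coefficient here, which is exactly the trap: the model is fine, but the causal interpretation is not, because the relevant driver was never measured.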
Or is it because of the modern obsession with novelty and certainty in science? Although there are great journals that don’t use novelty as the main criterion for accepting a paper, the broader culture around how science is received and rewarded suggests otherwise. It is ingrained in researchers, through media, academia, politics and institutional goals, that we need to produce ‘wow’ research to get ahead and have an impact. This article suggests that this pressure might have unintended consequences, encouraging researchers to claim novelty where there is none, simply to meet that criterion.
Is it a combination of all of these factors? Most likely. Because we live in a complex world, and very rarely is one isolated factor ever the single cause of anything. Of course, it’s nigh impossible to identify every interaction and causal factor in a single system. But ecology benefits from studies that acknowledge limitations, state what they didn’t study and identify peripheral factors that need further attention. Too much generalisation and overuse of ‘active’ causal language can increase misunderstanding about the process of scientific research and reduce emphasis on proven causal relationships.
* This is a fictional statement based on fact.
© Manu Saunders 2015
Interesting – I’d never connected active/passive and causation/correlation before. I would have said they’re unconnected, but you make an intriguing case that the active voice is part of a culture of overclaiming. This doesn’t change my stance on the active voice (pro, pro, pro!) but is definitely food for thought.
It’s certainly true that one can hypothesize causation when one sees correlation; one just can’t conclude it. So some of this may be a matter of framing in Discussions – do you think?
Yes, the framing of the Discussion is often where the unsupported claims appear. But, when these claims are introduced in the Abstract & Intro too, the methods and data analysis need to back this up.
I think there is a difference between speculation and extrapolation based on your results (which is fine and usually necessary in ecology papers) and unsupported claims of the broader impact of your results – the fine line between them comes down to the use of causal language! 🙂
Very interesting especially the bit about the active and passive voice – I got very agitated about this when I first started editing Ecological Entomology and even had something published in Nature about it http://www.nature.com/nature/journal/v381/n6582/abs/381467a0.html – I was of course attacked viciously 😉 (although also received a lot of supportive private emails)
Thanks for the link! What a great piece – I will link to that in a future post 🙂 The argument for active voice has filtered down from the business world, where it has more effective application, but it is rarely relevant to science.
Language is a tool, and passive/active voice is not necessarily linked with critical and non-critical thinking (and writing). The pattern of sound manuscript writing until the conclusions section, where great leaps of faith and wild extrapolations rear their trickster heads and ears, can be found in all fields of science: medicine, ecology, biology, etc.
I posit this is the value of, and a good case for, making journal clubs or scientific-journal review classes a mandatory part of science degrees. They foster reviewing the literature, constructive discourse and critical thinking. This carries over not only into future writing, but also into future scientific preparedness and experimental design.
One book on the ‘Required Reading’ list would be “Do Lemmings Commit Suicide? Beautiful hypotheses and ugly facts” by Dennis Chitty (1996). 🙂 If I recall correctly, there is also some discussion of writing and voice in the book.
Thanks Macrobe. Yes, it’s surprising that constructive discourse and critical thinking exercises don’t seem to be as inherent to science degrees as they are to humanities degrees. Journal review classes are a great idea, even just more reading and writing! I’m working on a post about this, so stay tuned!