All impact metrics are wrong, but (with more data) some are useful.

A couple of years ago I wrote about some of the limitations of relying on Altmetrics as an indicator of a paper’s impact, because they don’t pick up all online mentions.

Yes, impact metrics are flawed; experts have been pointing this out for years. And I’m not singling out Altmetrics here; a few different impact metrics are used by different journals for the same goal, e.g. PlumX, Dimensions, CrossRef Event Data.

Despite their flaws, we’re all still using them to demonstrate how our work is reaching global audiences. I used them recently in a promotion application and a major grant application.

But I’m now questioning whether I will keep using them, because they are deeply flawed and consistently misused and misinterpreted. They are a measure of quantity without any context: the number of shares or mentions, with no indication of how or why the work is being shared.

This is problematic for a few reasons.

In communication terms, this is the difference between reach and impact, or awareness and engagement. You can measure reach with very simple metrics: the number of followers, readers, or page views. But these numbers tell you nothing about real impact: how many of those people actually engaged with the content, took it on board, took action or made changes because of what they read. And if they took action, was it appropriate for the problem, or was it misguided because they had misunderstood the evidence?

These things really matter if you want to understand how research impacts society. If you use metrics of reach to demonstrate impact, you are telling a very small, and probably misleading, part of the story.

Impact metrics for scholarly works are well-known from citation counts and journal impact factors. The goal is to demonstrate that a paper or journal is having an impact on its discipline, because it is being cited a lot. Of course there are caveats to this assumption, but the principle stands. Most researchers cite a paper because it has informed their own arguments; therefore, number of citations is a reasonable indicator of impact on a research field.

The assumptions of these scholarly impact metrics are based on a lot of other assumptions associated with the rigour of the peer review system. That is, a paper isn’t available to cite until it’s already passed through rigorous peer assessment. In addition, the researchers who end up citing that paper are specialist experts in their field and are equipped with skills to decide that the paper is fit for purpose, i.e. relevant & rigorous enough to be cited in their own paper. Even if a researcher cites a paper in criticism or to highlight knowledge gaps, it is still a valid contribution to that discipline because it is helping knowledge of that discipline grow.

Yes, truly dodgy papers do slip through the cracks of the peer review system and are therefore available to be cited. What if researchers keep citing those dodgy papers in criticism, thus giving them high citation counts? I’m not aware of any research on this, but truly flawed papers should theoretically not get higher citation counts than rigorous papers. There might be a flurry of citations immediately after a dodgy paper is published, from opinion pieces and commentaries that point out its flaws, but over time citations should decline. I for one tend to avoid citing truly flawed papers; hopefully others do too!

These quality assumptions don’t apply to social engagement metrics. Because these metrics measure pings from a range of social media, news and blog websites, the potential audience for the paper increases exponentially. And, unlike peer reviewed primary literature, this audience has a huge range of levels of expertise. Most audience members will not have the specialist expertise to judge the scholarly rigour and quality of a paper.

Most importantly, online impact metrics are confounded by time: they are only meaningful for papers published after journals adopted these metrics.

If we just measure how many times a paper is shared on social media or mentioned in news stories or on blogs, we are overlooking how and why it is being shared. Is it being cited on denial websites to promote dangerous agendas? Is it being cited misleadingly in news stories that misinterpret the findings? Is it being tweeted or blogged about by other experts in the field who are pointing out the flaws in it?

These are all important questions that cannot be answered by the simple metric of ‘online impact’, aka reach.

Here are some examples of why high online impact metrics don’t translate well to scholarly quality.

For some contrast, let’s check some of the most highly-cited papers in ecology, written by some of the leading ecologists of our time:

  • Jane Lubchenco’s most cited first-authored paper (1998) has an Altmetric score of 186.
  • Gretchen Daily’s most cited first-authored paper (2009) has an Altmetric score of 16.
  • Daniel Simberloff’s most cited first-authored paper (1999) has an Altmetric score of 51.
  • Robert Whittaker’s most cited first-authored paper (1972) has an Altmetric score of 6.
  • I checked the current online impact metrics of the most-cited ecology papers from the last 10 years, listed in this October 2018 post at Dynamic Ecology. For the first list (most-cited), the range of scores was 0 to 1,912, average 552 (one paper was published in a journal that didn’t use online impact metrics). For the second list (most-cited that aren’t about global change, statistical methods, ecosystem services, or microbiomes), the range of scores was 0 to 234, average 50. You get the idea. (For journals that used PlumX, I only counted social media & news mentions to ensure comparison with Altmetrics.)
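The comparison above comes down to simple descriptive statistics (range and average) over two lists of scores. A minimal sketch of that calculation; the score lists here are made-up placeholder numbers for illustration, not the actual data behind the figures quoted above:

```python
def summarise(scores):
    """Return (min, max, mean) of a list of online impact scores."""
    return min(scores), max(scores), sum(scores) / len(scores)

# Hypothetical Altmetric-style scores for two lists of papers
# (placeholder values only).
most_cited = [0, 40, 150, 600, 1900]
most_cited_excluding_hot_topics = [0, 5, 30, 55, 230]

for name, scores in [("most-cited", most_cited),
                     ("excluding hot topics", most_cited_excluding_hot_topics)]:
    lo, hi, mean = summarise(scores)
    print(f"{name}: range {lo}-{hi}, average {mean:.0f}")
```

Even on toy numbers, the point is visible: the papers on hot topics dominate the tail of the distribution, which is exactly why a raw average of mentions says little about scholarly quality.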

So what is the point of online reach metrics? They don’t measure research quality, rigour or relevance to society’s needs. All they tell us is that people are talking about a paper. But without associated content analysis, they don’t tell us why people are talking about it. This is pretty important if we want to judge a paper’s impact.


© Manu Saunders 2020


6 thoughts on “All impact metrics are wrong, but (with more data) some are useful.”

  1. Ken Hughes February 16, 2020 / 10:47 AM

    A good scientific article on these ideas is Trilling et al. (2016), “From Newsworthiness to Shareworthiness”. It points out specific features that make an article more likely to proliferate on social media, such as the presence of conflict, issues involving Western countries, and a negative or positive tone instead of a neutral tone. It also notes that shares on Twitter tend to saturate after one day on average, which is, of course, indicative of the point you’re making: social media attention isn’t conducive to lasting impact.


  2. Black Sea FM March 2, 2020 / 8:21 PM

    In general, I welcome the post. I suggest, however, that the commentary on the Journal Impact Factor is insufficiently sceptical.

