Dix, A: Citations and Sub-Area Bias in the UK Research Assessment Process

Abstract: This paper presents a citation-based analysis of selected results of REF2014, the periodic UK research assessment process. Data for the Computer Science and Informatics sub-panel includes ACM topic sub-area information, allowing a level of analysis hitherto impossible. While every effort is made during the REF process to be fair, the results suggest systematic latent bias may have emerged between sub-areas. Furthermore this may have had a systematic effect benefiting some institutions relative to others, and potentially also introducing gender bias. Metric-based analysis could in future be used as part of the human-assessment process to uncover and help eradicate latent bias.

License: Creative Commons Attribution 4.0 International (CC-BY 4.0)

File: ASCW15_dix-citations-and-sub-areas-bias-in-the-uk-research-assessment-process.pdf

9 Comments

  1. This is very interesting work, as it shows convincingly that bibliometrics can be used to uncover latent biases in an evaluation exercise.

    The REF does not seem to be the only expert-opinion-based evaluation in which inter-area biases play a role; in the latest iteration of VHB Jourqual, a ranking of journals relevant to business research that is likewise based on expert opinion, the authors made the conscious decision not to publish an overall rating of all the journals included. Instead they published tables only by sub-discipline, and they explicitly advise people against making cross-area comparisons (see Introduction to VHB-JOURQUAL3). There is some further discussion of the problems of inter-area comparison on the site, unfortunately only in German: IV. Ergänzende Hinweise und Reaktionen zu VHB-JOURQUAL3 (Supplementary notes and reactions to VHB-JOURQUAL3, 2012) and V. Fragen zur Auswertung von VHB-JOURQUAL3 (Questions on the analysis of VHB-JOURQUAL3, 2015).

    As I am not familiar with the UK REF, I do have a rather simple question: how is the final category into which a research output is classified determined?

    1. For most of the REF sub-panels the final category is a matter of discussion between the panellists who evaluated the output. For the computing sub-panel, however, we marked independently, discussed wide discrepancies, and then (because we are computer scientists) put everything into a big algorithm – and therein almost certainly lay some of the problems, as the needs of the algorithm drove the allocation of reviewers – a classic socio-technical failure story!
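
      A very rough sketch of that process (purely hypothetical – the threshold and the simple averaging below are illustrative assumptions, not the actual sub-panel algorithm):

      ```python
      # Hypothetical sketch only: independent marks, wide discrepancies flagged
      # for discussion, remaining marks combined. Threshold and averaging rule
      # are assumptions for illustration, not the real REF procedure.
      def reconcile(marks_by_output, discrepancy_threshold=1):
          """marks_by_output: output id -> list of independent marks (assumed 0-4 scale)."""
          to_discuss = []
          combined = {}
          for output, marks in marks_by_output.items():
              if max(marks) - min(marks) > discrepancy_threshold:
                  to_discuss.append(output)  # wide discrepancy: discuss before trusting the number
              combined[output] = sum(marks) / len(marks)  # simple average stands in for the real rule
          return combined, to_discuss

      combined, to_discuss = reconcile({"output-A": [3, 4], "output-B": [1, 4]})
      ```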

  2. Graham, I didn’t calculate correlation coefficients, as I don’t doubt that there is some predictive power between citations and REF scores. The issue is the other systematic effects … the residual variance is not random :-/

    BTW. I have pointers to all the data used and my own summary spreadsheets at: http://alandix.com/ref2014/

  3. Re Anon review.

    This is true: fields do differ a lot in the number of citations. However, the difference measured here is both very large and in the opposite direction to what one would expect given the differences in citation behaviour between fields.

    To deal with this issue, one of the seven citation measures used (they all broadly agree – no cherry picking!) corrects for field differences using Scopus cross-field citation data. However, correcting for these differences makes things ‘worse’, in the sense that the discrepancy between the REF scores and the citation metrics grows.
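
    (For context, the usual shape of such a field-normalised measure – whether or not it is exactly the one used here – is a ratio of actual to expected citations: normalised impact = citations received / average citations for outputs of the same field and year, so a value above 1 means more cited than is typical for that field.)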

    For example, 25.9% of submitted web papers were in the top 1% worldwide according to the Scopus within-area citation data, yet only 17.6% of web outputs were rated 4*; that is, on average, a web paper would need to be amongst the top 0.6% worldwide to get a 4*. In contrast, for logic you only need to be in the top 6%.
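
    (As a back-of-envelope check, assuming purely for illustration that within-area citation rank and the REF ranking order the web outputs in the same way: the 4* boundary would sit at roughly 1% × 17.6 / 25.9 ≈ 0.68% worldwide – the same order as the quoted top 0.6%, the exact value depending on the underlying unrounded data.)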

    Now it could be that globally web research sucks, which would help account for this figure … discuss 😉

  4. The plots use rank order. Is there any difference if the raw values are used? In both cases, what are the correlation coefficients (e.g. for Figure 2)?

  5. This is an interesting analysis, and I like Figure 2 and the discussion of how information currently in the public domain is being used for policy decisions. It is important to note, though, that citation counts vary between sub-fields of a discipline, and so it is not surprising that the relationship between citations and quality scores also varies between sub-fields.
