Abstract: In this paper we advocate increased use of textual data to develop new bibliometric methods. To demonstrate the potential of text, we propose a new bibliometric method that combines natural language processing with traditional bibliometric techniques to improve predictions of high-impact science. Drawing on the vast amount of scholarly data now available online, we assemble a universe of scientific topics and use article text to measure the topical distance between citing and cited papers. We show that accounting for topical distance improves our ability to predict scientific impact. Citations from both topically distant and topically proximate papers provide more insight into an article’s impact potential than citations from papers of middling similarity.
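The abstract does not say how topics are represented or how topical distance is computed. As a minimal sketch, assume each paper is summarized by a nonnegative topic-weight vector (for example, the output of a topic model) and that topical distance is one minus the cosine similarity of those vectors. The function names and the tercile cut points (1/3 and 2/3) used to label citations as proximate, middling, or distant are hypothetical illustrations, not the authors' method.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity between two topic-weight vectors (assumed representation)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # treat an empty topic vector as unrelated
    return dot / (norm_p * norm_q)


def topical_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Hypothetical distance: 1 - cosine similarity; lies in [0, 1] for nonnegative vectors."""
    return 1.0 - cosine_similarity(p, q)


def classify_citation(distance: float, low: float = 1 / 3, high: float = 2 / 3) -> str:
    """Bucket a citing/cited distance; the thresholds are illustrative assumptions."""
    if distance < low:
        return "proximate"
    if distance > high:
        return "distant"
    return "middling"


def citation_profile(
    cited_topics: Sequence[float],
    citing_topics_list: List[Sequence[float]],
    low: float = 1 / 3,
    high: float = 2 / 3,
) -> Dict[str, int]:
    """Count a cited paper's incoming citations by topical-distance bucket."""
    counts = {"proximate": 0, "middling": 0, "distant": 0}
    for citing_topics in citing_topics_list:
        d = topical_distance(citing_topics, cited_topics)
        counts[classify_citation(d, low, high)] += 1
    return counts


if __name__ == "__main__":
    cited = [0.7, 0.2, 0.1]  # made-up topic weights for a cited paper
    citers = [
        [0.7, 0.2, 0.1],  # same topic mix: distance 0
        [0.2, 0.7, 0.1],  # partially overlapping topics
        [0.0, 0.0, 1.0],  # mostly unrelated topic
    ]
    print(citation_profile(cited, citers))
```

With the made-up vectors above, the three citations fall into different buckets, giving one proximate, one middling, and one distant citation. Counts like these could then be used as separate features in an impact model, which is one way to let distant and proximate citations carry different weight from middling ones; the model itself is not sketched here.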
License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)