Hidden Citations Obscure True Impact in Science

Read original: arXiv:2310.16181 - Published 5/14/2024 by Xiangyi Meng, Onur Varol, Albert-László Barabási

Overview

References are important for signaling previous knowledge, but can also be misused as measures of scientific impact
When a discovery becomes common knowledge, citations may suffer from "obliteration by incorporation," where the discovery is now so well-known it's not referenced explicitly
This leads to "hidden citations," where a discovery is clearly credited in the text without a reference to the original publication
The researchers used unsupervised machine learning to systematically identify these hidden citations across a large corpus of scientific papers

Plain English Explanation

The researchers found that for influential discoveries, hidden citations (textual credit without a reference) outnumber official citations. This hidden impact emerges regardless of the publishing venue or academic discipline.

The prevalence of hidden citations is not driven by the official citation count, but rather by how much the discovery is discussed within the papers. The more a discovery is talked about, the less visible it becomes to standard citation metrics.

This suggests that traditional bibliometric measures like citation counts offer a limited perspective on the true impact of a scientific discovery. To get a more complete picture, we need to analyze the full text of the scientific literature, not just the reference lists.

Technical Explanation

The researchers used unsupervised machine learning techniques to systematically identify "hidden citations" - instances where a discovery is clearly credited in the text of a paper without an accompanying citation to the original publication. They applied this approach to a large corpus of scientific papers across different disciplines.
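
To make the definition concrete, here is a minimal sketch of how explicit and hidden citations could be tallied for one discovery. It is not the authors' method: the paper relies on unsupervised interpretable machine learning, while this sketch substitutes plain catchphrase matching. The `Paper` record, the `count_citations` function, the "Foo effect" catchphrase, and the reference IDs are all hypothetical names invented for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record: full text plus the IDs of the works it references."""
    text: str
    references: set = field(default_factory=set)


def count_citations(papers, catchphrase, foundational_ids):
    """Return (explicit, hidden) citation counts for one discovery.

    explicit: papers whose reference list includes a foundational paper.
    hidden:   papers that mention the catchphrase but cite none of them.
    Assumption: a simple case-insensitive phrase match stands in for the
    machine-learning step described in the paper.
    """
    pattern = re.compile(r"\b" + re.escape(catchphrase) + r"\b", re.IGNORECASE)
    explicit = hidden = 0
    for paper in papers:
        cites = bool(paper.references & foundational_ids)
        mentions = pattern.search(paper.text) is not None
        if cites:
            explicit += 1
        elif mentions:
            hidden += 1
    return explicit, hidden


# Toy corpus with a made-up discovery ("Foo effect") and made-up reference IDs.
corpus = [
    Paper("We apply the Foo effect to thin films.", {"smith1950"}),
    Paper("The foo effect is well known.", set()),
    Paper("Because of the Foo effect, the signal decays.", {"other2001"}),
    Paper("An unrelated study of food webs.", set()),
]
explicit, hidden = count_citations(corpus, "Foo effect", {"smith1950"})
print(explicit, hidden)  # 1 explicit citation, 2 hidden citations
```

In this toy corpus, two papers credit the discovery by name without referencing it, so a count based only on reference lists would capture one of three acknowledgments.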

The results showed that for influential discoveries, hidden citations outnumber official citations. This pattern held true regardless of the publishing venue or academic field. The prevalence of hidden citations was not driven by citation counts, but rather by the degree of discussion around the discovery within the full text of the papers. The more a discovery was mentioned and discussed, the less visible it became to standard bibliometric analysis.

The researchers argue that this highlights the limitations of relying solely on citation metrics to quantify the true impact of scientific work. By extracting knowledge from the full text, rather than just reference lists, a more complete picture of a discovery's influence can be obtained.

Critical Analysis

The research provides a novel and insightful approach to measuring the true impact of scientific discoveries beyond traditional citation-based metrics. However, some caveats and limitations should be considered:

The method for identifying "hidden citations" relies on unsupervised machine learning, which may introduce biases or miss certain types of textual credit. Further validation and refinement of the approach could strengthen the findings.
The study is correlational in nature, so it cannot definitively determine the causes of the observed patterns. Other factors beyond just discussion frequency may influence the prevalence of hidden citations.
The researchers do not address potential issues around the automated retrieval and processing of citations, which can introduce errors or miss important contextual information. Incorporating human validation or complementary methods could help strengthen the analysis.
While the researchers highlight the limitations of citation-based metrics, more research is needed to understand how hidden citations and other text-based measures can be effectively incorporated into research evaluation and impact assessment.

Conclusion

This study sheds important light on the phenomenon of "hidden citations" - instances where scientific discoveries are clearly credited in the text of papers without accompanying formal citations. The researchers demonstrate that for influential discoveries, these hidden citations outnumber the official citation counts.

This finding suggests that traditional bibliometric measures provide an incomplete picture of a discovery's true impact and influence. To gain a more comprehensive understanding, it is necessary to extract knowledge from the full text of the scientific literature, not just the reference lists.

While the study has some limitations, it represents an important step towards developing more nuanced and holistic approaches to assessing the impact of scientific work. As the volume of published research continues to grow, methods like this will be increasingly crucial for navigating the complex landscape of scientific knowledge and achievement.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Hidden Citations Obscure True Impact in Science

Xiangyi Meng, Onur Varol, Albert-László Barabási

References, the mechanism scientists rely on to signal previous knowledge, lately have turned into widely used and misused measures of scientific impact. Yet, when a discovery becomes common knowledge, citations suffer from obliteration by incorporation. This leads to the concept of hidden citation, representing a clear textual credit to a discovery without a reference to the publication embodying it. Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations. We find that for influential discoveries hidden citations outnumber citation counts, emerging regardless of publishing venue and discipline. We show that the prevalence of hidden citations is not driven by citation counts, but rather by the degree of the discourse on the topic within the text of the manuscripts, indicating that the more discussed is a discovery, the less visible it is to standard bibliometric analysis. Hidden citations indicate that bibliometric measures offer a limited perspective on quantifying the true impact of a discovery, raising the need to extract knowledge from the full text of the scientific corpus.

5/14/2024

Why do you cite? An investigation on citation intents and decision-making classification processes

Lorenzo Paolini (Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy), Sahar Vahdati (Nature-inspired machine intelligence group, SCaDS.AI center, Technical University of Dresden, Germany; Institute for Applied Computer Science, InfAI - Dresden, Germany), Angelo Di Iorio (Department of Computer Science and Engineering, University of Bologna, Bologna, Italy), Robert Wardenga (Institute for Applied Computer Science, InfAI - Dresden, Germany), Ivan Heibi (Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy, Digital Humanities Advanced Research Centre), Silvio Peroni (Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy, Digital Humanities Advanced Research Centre)

Identifying the reason for which an author cites another work is essential to understand the nature of scientific contributions and to assess their impact. Citations are one of the pillars of scholarly communication and most metrics employed to analyze these conceptual links are based on quantitative observations. Behind the act of referencing another scholarly work there is a whole world of meanings that needs to be proficiently and effectively revealed. This study emphasizes the importance of trustfully classifying citation intents to provide more comprehensive and insightful analyses in research assessment. We address this task by presenting a study utilizing advanced Ensemble Strategies for Citation Intent Classification (CIC) incorporating Language Models (LMs) and employing Explainable AI (XAI) techniques to enhance the interpretability and trustworthiness of models' predictions. Our approach involves two ensemble classifiers that utilize fine-tuned SciBERT and XLNet LMs as baselines. We further demonstrate the critical role of section titles as a feature in improving models' performances. The study also introduces a web application developed with Flask and currently available at http://137.204.64.4:81/cic/classifier, aimed at classifying citation intents. One of our models sets as a new state-of-the-art (SOTA) with an 89.46% Macro-F1 score on the SciCite benchmark. The integration of XAI techniques provides insights into the decision-making processes, highlighting the contributions of individual words for level-0 classifications, and of individual models for the metaclassification. The findings suggest that the inclusion of section titles significantly enhances classification performances in the CIC task. Our contributions provide useful insights for developing more robust datasets and methodologies, thus fostering a deeper understanding of scholarly communication.

7/19/2024

Past, Present, and Future of Citation Practices in HCI

Jonas Oppenlaender

Science is a complex system comprised of many scientists who individually make collective decisions that, due to the size and nature of the academic system, largely do not affect the system as a whole. However, certain decisions at the meso-level of research communities, such as the Human-Computer Interaction (HCI) community, may result in deep and long-lasting behavioral changes in scientists. In this article, we provide evidence on how a change in editorial policies introduced at the ACM CHI Conference in 2016 launched the CHI community on an expansive path, denoted by a year-by-year increase in the mean number of references included in CHI articles. If this near-linear trend continues undisrupted, an article in CHI 2030 will include on average almost 130 references. The trend towards more citations reflects a citation culture where quantity is prioritized over quality, contributing to both author and peer reviewer fatigue. This article underscores the profound impact that meso-level policy adjustments have on the evolution of scientific fields and disciplines, urging stakeholders to carefully consider the broader implications of such changes.

9/11/2024

CausalCite: A Causal Formulation of Paper Citations

Ishan Kumar, Zhijing Jin, Ehsan Mokhtarian, Siyuan Guo, Yuen Chen, Mrinmaya Sachan, Bernhard Schölkopf

Citation count of a paper is a commonly used proxy for evaluating the significance of a paper in the scientific community. Yet citation measures are widely criticized for failing to accurately reflect the true impact of a paper. Thus, we propose CausalCite, a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers. CausalCite is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings. TextMatch encodes each paper using text embeddings from large language models (LLMs), extracts similar samples by cosine similarity, and synthesizes a counterfactual sample as the weighted average of similar papers according to their similarity values. We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts on a previous dataset of 1K papers, (test-of-time) awards for past papers, and its stability across various subfields of AI. We also provide a set of findings that can serve as suggested ways for future researchers to use our metric for a better understanding of the quality of a paper. Our code is available at https://github.com/causalNLP/causal-cite.

5/29/2024