Measuring publication relatedness using controlled vocabularies

Read original: arXiv:2408.15004 - Published 8/28/2024 by Emil Dolmer Alnor

Overview

  • Measuring the relatedness between scientific publications is important for various areas like bibliometrics and science policy.
  • Controlled vocabularies can be a promising basis for measuring relatedness, addressing issues with citation-based or text-based similarity.
  • Several controlled-vocabulary-based relatedness measures have been developed, but their accuracy and suitability for different research questions have not been comprehensively tested.
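As a minimal illustration of the idea, assume each publication is represented by the set of controlled-vocabulary terms assigned to it. A simple overlap score (the Jaccard index) then gives a first approximation of relatedness. This is a generic baseline sketched for illustration, not one of the measures evaluated in the paper, and the term sets are made up.

```python
def jaccard_relatedness(terms_a, terms_b):
    """Share of controlled-vocabulary terms that two publications have in common."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0  # no indexing terms at all: treat as unrelated
    return len(a & b) / len(a | b)


# Hypothetical indexing terms, invented for this example.
pub_1 = {"Neoplasms", "Gene Expression", "Mice"}
pub_2 = {"Neoplasms", "Gene Expression", "Humans"}
pub_3 = {"Citation Analysis", "Bibliometrics"}

print(jaccard_relatedness(pub_1, pub_2))  # 2 shared terms out of 4 distinct terms
print(jaccard_relatedness(pub_1, pub_3))  # no shared terms
```

Exact overlap ignores terms that are close in meaning but not identical, which is one reason more refined measures are of interest.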

Plain English Explanation

Researchers often need to understand how scientific publications relate to one another, for example when evaluating the impact of research or organizing scientific knowledge. One way to measure this "relatedness" is to compare the standardized keywords (controlled-vocabulary terms) assigned to each publication, rather than relying on citation links between papers or on the wording of the text itself. This can avoid some of the problems that arise with those other methods.

Several keyword-based relatedness measures have been proposed, but the author of this paper wanted to see how well they actually work and which are best suited to different research questions. The paper reviews the existing measures, develops a new one, and tests them all against real data about topics in the biomedical field.

Technical Explanation

The paper reviews several existing controlled-vocabulary-based relatedness measures and develops a new measure, then benchmarks the performance of these measures using the TREC Genomics dataset as ground truth.
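One common way to benchmark a relatedness measure against topic labels is to treat each publication as a seed, rank all other publications by their relatedness to it, and check how highly the publications on the same topic are ranked. The sketch below does this with mean average precision. The summary does not describe the paper's actual evaluation protocol, so the metric, the `jaccard` stand-in measure, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def jaccard(a, b):
    """Stand-in relatedness measure: overlap of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_precision(ranked_ids, relevant_ids):
    """Mean of precision values at each rank where a relevant item appears."""
    hits, precisions = 0, []
    for rank, pub_id in enumerate(ranked_ids, start=1):
        if pub_id in relevant_ids:
            hits += 1
            precisions.append(hits / rank)
    return mean(precisions) if precisions else 0.0


def benchmark(measure, terms, topics):
    """Mean average precision over all seeds that have a same-topic publication."""
    scores = []
    for seed in terms:
        others = [p for p in terms if p != seed]
        ranked = sorted(others, key=lambda p: measure(terms[seed], terms[p]),
                        reverse=True)
        relevant = {p for p in others if topics[p] == topics[seed]}
        if relevant:
            scores.append(average_precision(ranked, relevant))
    return mean(scores) if scores else 0.0


# Toy data: hypothetical term sets and ground-truth topic labels.
terms = {
    "p1": {"Neoplasms", "Gene Expression"},
    "p2": {"Neoplasms", "Gene Expression", "Mice"},
    "p3": {"Alzheimer Disease", "Brain"},
    "p4": {"Alzheimer Disease", "Brain", "Mice"},
}
topics = {"p1": "t1", "p2": "t1", "p3": "t2", "p4": "t2"}

print(benchmark(jaccard, terms, topics))
```

Because `benchmark` takes the measure as an argument, different relatedness measures can be compared on the same ground truth by swapping in a different function.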

The new measure is based on the semantic similarity between the keywords or terms used to describe the publications. It accounts for both exact term matches and the semantic relationships between terms.
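To make the idea of combining exact matches with semantic relationships concrete, the sketch below assumes that each term has a position in a hierarchy, written as a dotted code such as "A.1.2". Two terms are more similar the longer the path they share from the root, and two publications are compared by averaging each term's best match in the other publication. The coding scheme, the function names, and the scoring rule are hypothetical; this is not the formula from the paper.

```python
def term_similarity(code_a, code_b):
    """Shared path depth from the root, divided by the depth of the deeper term."""
    parts_a, parts_b = code_a.split("."), code_b.split(".")
    shared = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        shared += 1
    return shared / max(len(parts_a), len(parts_b))


def soft_relatedness(codes_a, codes_b):
    """Average best-match similarity, taken in both directions."""
    if not codes_a or not codes_b:
        return 0.0
    best_a = [max(term_similarity(a, b) for b in codes_b) for a in codes_a]
    best_b = [max(term_similarity(b, a) for a in codes_a) for b in codes_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


# Hypothetical hierarchy codes: "A.1.2" and "A.1.5" are siblings under "A.1".
pub_x = {"A.1.2", "B.3"}
pub_y = {"A.1.5", "B.3"}

print(soft_relatedness(pub_x, pub_y))  # roughly 0.83
```

With exact matching only, these two publications share one of three distinct terms. The soft score is higher because the sibling terms under "A.1" receive partial credit.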

The benchmark tests showed that the new measure and the one proposed by Ahlgren et al. (2020) each had different strengths and weaknesses. These results can help researchers decide which method to use when studying topics like interdisciplinarity, information retrieval, clustering of science, and how researchers switch between different research topics.

Critical Analysis

The paper provides a comprehensive evaluation of several controlled-vocabulary-based relatedness measures, which is a valuable contribution to the field. However, the author acknowledges that the benchmark test using the TREC Genomics dataset may not represent the full range of possible research questions and publication types.

Additionally, while the new measure developed in the paper shows promising performance, its effectiveness may depend on the specific domain and controlled vocabulary being used. Further research could explore how the measures perform in other scientific fields or with different controlled vocabularies.

The paper also does not delve into potential biases or limitations of controlled vocabularies themselves, which could impact the accuracy and applicability of the relatedness measures. Exploring these issues could be an interesting area for future work.

Conclusion

This paper provides an important evaluation of different methods for measuring the relatedness between scientific publications using controlled vocabularies. The results can help researchers choose the appropriate technique for their particular research needs, whether that's studying interdisciplinary connections, improving information retrieval, or understanding how researchers navigate different research topics over time. While the methods have some limitations, this work represents a significant step forward in developing more robust and reliable ways to analyze the relationships between scientific publications.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Measuring publication relatedness using controlled vocabularies

Emil Dolmer Alnor

Measuring the relatedness between scientific publications has important applications in many areas of bibliometrics and science policy. Controlled vocabularies provide a promising basis for measuring relatedness because they address issues that arise when using citation or textual similarity to measure relatedness. While several controlled-vocabulary-based relatedness measures have been developed, there exists no comprehensive and direct test of their accuracy and suitability for different types of research questions. This paper reviews existing measures, develops a new measure, and benchmarks the measures using TREC Genomics data as a ground truth of topics. The benchmark tests show that the new measure and the measure proposed by Ahlgren et al. (2020) have differing strengths and weaknesses. These results inform a discussion of which method to choose when studying interdisciplinarity, information retrieval, clustering of science, and researcher topic switching.

8/28/2024

Seed-based information retrieval in networks of research publications: Evaluation of direct citations, bibliographic coupling, co-citations and PubMed related article score

Peter Sjögårde, Per Ahlgren

In this contribution, we deal with seed-based information retrieval in networks of research publications. Using systematic reviews as a baseline, and publication data from the NIH Open Citation Collection, we compare the performance of the three citation-based approaches direct citation, co-citation, and bibliographic coupling with respect to recall and precision measures. In addition, we include the PubMed Related Article score as well as combined approaches in the comparison. We also provide a fairly comprehensive review of earlier research in which citation relations have been used for information retrieval purposes. The results show an advantage for co-citation over bibliographic coupling and direct citation. However, combining the three approaches outperforms the exclusive use of co-citation in the study. The results further indicate, in line with previous research, that combining citation-based approaches with textual approaches enhances the performance of seed-based information retrieval. The results from the study may guide approaches combining citation-based and textual approaches in their choice of citation similarity measures. We suggest that future research use more structured approaches to evaluate methods for seed-based retrieval of publications, including comparative approaches as well as the elaboration of common data sets and baselines for evaluation.

6/14/2024

A Guide to Similarity Measures

Avivit Levy, B. Riva Shalom, Michal Chalamish

Similarity measures play a central role in various data science application domains for a wide assortment of tasks. This guide describes a comprehensive set of prevalent similarity measures to serve both non-experts and professionals. Non-experts who wish to understand the motivation for a measure as well as how to use it may find a friendly and detailed exposition of the formulas of the measures, whereas experts may find a glance at the principles of designing similarity measures and ideas for a better way to measure similarity for their desired task in a given application domain.

8/16/2024

Decoding Knowledge Claims: The Evaluation of Scientific Publication Contributions through Semantic Analysis

Luca D'Aniello, Nicolas Robinson-Garcia, Massimo Aria, Corrado Cuccurullo

The surge in scientific publications challenges the use of publication counts as a measure of scientific progress, requiring alternative metrics that emphasize the quality and novelty of scientific contributions rather than sheer quantity. This paper proposes the use of Relaxed Word Mover's Distance (RWMD), a semantic text similarity measure, to evaluate the novelty of scientific papers. We hypothesize that RWMD can more effectively gauge the growth of scientific knowledge. To test such an assumption, we apply RWMD to evaluate seminal papers, with Hirsch's H-Index paper as a primary case study. We compare RWMD results across three groups: 1) H-Index-related papers, 2) scientometric studies, and 3) unrelated papers, aiming to discern redundant literature and hype from genuine innovations. Findings suggest that emphasizing knowledge claims offers a deeper insight into scientific contributions, marking RWMD as a promising alternative method to traditional citation metrics, thus better tracking significant scientific breakthroughs.

9/4/2024