We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields

Read original: arXiv:2310.14870 - Published 7/17/2024 by Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad

Overview

  • Natural Language Processing (NLP) is a rapidly advancing field with the potential to significantly impact the world
  • However, this progress comes with substantial risks that require broad engagement across various fields of study
  • This paper quantifies the degree of influence between NLP and 23 other fields, analyzing NLP publications, citations, and cross-field engagement over time

Plain English Explanation

Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language. As NLP has advanced, it has become poised to substantially influence many aspects of our lives, from improved voice assistants to automated language translation.

However, this rapid progress also comes with significant risks and challenges that require input and collaboration from a wide range of academic disciplines. To better understand the state of this cross-disciplinary engagement, the researchers in this paper analyzed a large dataset of NLP publications, citations, and the connections between NLP and 23 other fields of study.

Their analysis revealed that, unlike most areas of research, the diversity of fields that NLP draws on has declined over time. In the 1980s, NLP papers cited a wide range of fields, but by 2022, this cross-field engagement had reached an all-time low. NLP has also become more insular: its researchers cite other NLP papers at an increasing rate, and fewer "bridge" papers connect NLP to other disciplines.

Furthermore, the researchers found that NLP's citations are dominated by computer science, with less than 8% of NLP citations going to linguistics and less than 3% to math and psychology. This suggests that NLP may be overlooking important perspectives and insights from these other fields.

Overall, this study highlights the urgent need for NLP researchers to reflect on and actively address the declining cross-disciplinary engagement in their field. Maintaining a diverse range of influences is crucial for ensuring that NLP develops in a responsible and well-rounded manner.

Technical Explanation

The researchers in this paper sought to quantify the degree of influence between NLP and 23 other fields of study. They analyzed a dataset of approximately 77,000 NLP papers, 3.1 million citations from NLP papers to other publications, and 1.8 million citations from other papers to NLP work.

To measure cross-field engagement, the team developed a metric called the Citation Field Diversity Index (CFDI). This index ranges from 0 to 1, with higher values indicating that citations are spread across a more diverse set of fields. The researchers tracked the CFDI of NLP over time, finding that it had declined from 0.58 in 1980 to 0.31 in 2022 (an all-time low).
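The summary above does not give the CFDI formula, so the following is only a minimal sketch under an assumption: that the index behaves like a Gini-Simpson diversity score, i.e. one minus the sum of squared citation shares per field. The function name `citation_field_diversity` and the toy citation lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def citation_field_diversity(cited_fields):
    """Gini-Simpson style diversity: 1 - sum of squared field shares.

    ASSUMPTION: this formula is an illustration of a 0-to-1 diversity
    index; the summary does not state the paper's exact CFDI definition.
    """
    counts = Counter(cited_fields)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one citation")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical toy data: the field of each paper cited by some NLP papers.
concentrated = ["Computer Science"] * 8 + ["Linguistics"] + ["Psychology"]
balanced = ["Computer Science"] * 4 + ["Linguistics"] * 3 + ["Psychology"] * 3

print(round(citation_field_diversity(concentrated), 2))  # 0.34
print(round(citation_field_diversity(balanced), 2))      # 0.66
```

Under this assumed formula, a score of 0 means every citation goes to a single field, and with N fields the score can reach at most 1 - 1/N, so it approaches but never equals 1.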

In addition, the analysis showed that NLP has become increasingly insular: researchers cite other NLP papers at a higher rate, and fewer "bridge" papers connect NLP to other disciplines. The distribution of NLP citations was also highly skewed, with over 80% going to computer science, less than 8% to linguistics, and less than 3% to math and psychology.
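As a rough illustration of the two quantities described above, the sketch below computes per-field citation shares and the fraction of citations that stay inside NLP. It is not the authors' pipeline: the helper names `field_shares` and `in_field_rate` are hypothetical, and the toy counts are invented to resemble the reported skew, not drawn from the paper's data.

```python
from collections import Counter


def field_shares(cited_fields):
    """Return each field's share of all citations, largest share first."""
    counts = Counter(cited_fields)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one citation")
    return {field: n / total for field, n in counts.most_common()}


def in_field_rate(cited_is_nlp):
    """Fraction of citations whose target is itself an NLP paper."""
    flags = list(cited_is_nlp)
    if not flags:
        raise ValueError("need at least one citation")
    return sum(flags) / len(flags)


# Hypothetical toy data: 100 outgoing citations from NLP papers.
toy_citations = (
    ["Computer Science"] * 82
    + ["Linguistics"] * 7
    + ["Math"] * 2
    + ["Psychology"] * 2
    + ["Other"] * 7
)

for field, share in field_shares(toy_citations).items():
    print(f"{field}: {share:.0%}")
```

Computing these shares per publication year, rather than once over the whole corpus, would be one way to see whether the skew and the in-field rate grow over time.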

Critical Analysis

While this study provides valuable insights into the state of cross-disciplinary engagement in NLP, there are a few potential limitations and areas for further research:

  • The analysis is limited to the 23 fields explicitly included in the study, and there may be other relevant disciplines that were not considered.
  • The paper does not delve into the potential reasons behind the declining cross-field engagement, such as institutional or funding pressures, cultural factors, or the inherent technical nature of NLP research.
  • The findings may not be generalizable to all subfields of NLP, as the researchers did not examine potential differences in cross-disciplinary engagement across various NLP applications or techniques.

Additionally, one could argue that the decline in cross-field engagement is not necessarily a negative outcome if NLP is becoming more focused and efficient within its core domains. However, the authors make a compelling case that maintaining diverse influences is crucial for the responsible development of NLP technologies, which have the potential to significantly impact society.

Conclusion

This study provides a sobering look at the state of cross-disciplinary engagement in the field of Natural Language Processing. The researchers found that, unlike most academic fields, NLP's engagement with a diverse range of disciplines has steadily declined over the past few decades, reaching an all-time low in 2022.

This trend towards insularity is concerning, as it suggests that NLP may be overlooking important perspectives and insights from fields like linguistics, mathematics, and psychology. Maintaining a broad range of influences is crucial for ensuring that NLP develops in a responsible and well-rounded manner, especially given its potential to substantially impact the world.

The findings of this paper suggest that NLP researchers and the broader academic community must actively work to address this trend and foster greater cross-disciplinary collaboration. Only by doing so can we unlock the full potential of Natural Language Processing while mitigating its inherent risks and challenges.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields

Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad

Natural Language Processing (NLP) is poised to substantially influence the world. However, significant progress comes hand-in-hand with substantial risks. Addressing them requires broad engagement with various fields of study. Yet, little empirical work examines the state of such engagement (past or current). In this paper, we quantify the degree of influence between 23 fields of study and NLP (on each other). We analyzed ~77k NLP papers, ~3.1m citations from NLP papers to other papers, and ~1.8m citations from other papers to NLP papers. We show that, unlike most fields, the cross-field engagement of NLP, measured by our proposed Citation Field Diversity Index (CFDI), has declined from 0.58 in 1980 to 0.31 in 2022 (an all-time low). In addition, we find that NLP has grown more insular -- citing increasingly more NLP papers and having fewer papers that act as bridges between fields. NLP citations are dominated by computer science; Less than 8% of NLP citations are to linguistics, and less than 3% are to math and psychology. These findings underscore NLP's urgent need to reflect on its engagement with various fields.

7/17/2024

The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research

Mohamed Abdalla, Jan Philip Wahle, Terry Ruas, Aurélie Névéol, Fanny Ducel, Saif M. Mohammad, Karen Fort

Recent advances in deep learning methods for natural language processing (NLP) have created new business opportunities and made NLP research critical for industry development. As one of the big players in the field of NLP, together with governments and universities, it is important to track the influence of industry on research. In this study, we seek to quantify and characterize industry presence in the NLP community over time. Using a corpus with comprehensive metadata of 78,187 NLP publications and 701 resumes of NLP publication authors, we explore the industry presence in the field since the early 90s. We find that industry presence among NLP authors has been steady before a steep increase over the past five years (180% growth from 2017 to 2022). A few companies account for most of the publications and provide funding to academic researchers through grants and internships. Our study shows that the presence and impact of the industry on natural language processing research are significant and fast-growing. This work calls for increased transparency of industry influence in the field.

7/17/2024

Collaboration or Corporate Capture? Quantifying NLP's Reliance on Industry Artifacts and Contributions

Will Aitken, Mohamed Abdalla, Karen Rudie, Catherine Stinson

Impressive performance of pre-trained models has garnered public attention and made news headlines in recent years. Almost always, these models are produced by or in collaboration with industry. Using them is critical for competing on natural language processing (NLP) benchmarks and correspondingly to stay relevant in NLP research. We surveyed 100 papers published at EMNLP 2022 to determine the degree to which researchers rely on industry models, other artifacts, and contributions to publish in prestigious NLP venues and found that the ratio of their citation is at least three times greater than what would be expected. Our work serves as a scaffold to enable future researchers to more accurately address whether: 1) Collaboration with industry is still collaboration in the absence of an alternative or 2) if NLP inquiry has been captured by the motivations and research direction of private corporations.

6/26/2024

Connecting the Dots in News Analysis: Bridging the Cross-Disciplinary Disparities in Media Bias and Framing

Gisela Vallejo, Timothy Baldwin, Lea Frermann

The manifestation and effect of bias in news reporting have been central topics in the social sciences for decades, and have received increasing attention in the NLP community recently. While NLP can help to scale up analyses or contribute automatic procedures to investigate the impact of biased news in society, we argue that methodologies that are currently dominant fall short of addressing the complex questions and effects addressed in theoretical media studies. In this survey paper, we review social science approaches and draw a comparison with typical task formulations, methods, and evaluation metrics used in the analysis of media bias in NLP. We discuss open questions and suggest possible directions to close identified gaps between theory and predictive models, and their evaluation. These include model transparency, considering document-external information, and cross-document reasoning rather than single-label assignment.

6/21/2024