Predicting Sustainable Development Goals Using Course Descriptions -- from LLMs to Conventional Foundation Models

arXiv:2402.16420. Published 4/24/2024 by Lev Kharlashkin, Melany Macias, Leo Huovinen, Mika Hamalainen


Overview

This paper explores using language models to predict Sustainable Development Goals (SDGs) from course descriptions.
The researchers use a large language model (LLM), PaLM 2, to generate training data from noisy, human-authored course descriptions, and then train several smaller, conventional language models on that data.
They compare these smaller models on SDG prediction. The best performer in their experiments was BART, with an F1-score of 0.786.

Plain English Explanation

The paper looks at using artificial intelligence (AI) models to predict the Sustainable Development Goals (SDGs) that are relevant to the content of university course descriptions. SDGs are a set of global goals established by the United Nations to address challenges like poverty, inequality, and climate change.

The researchers used two kinds of AI models. First, a very large language model (LLM), Google's PaLM 2, read the often messy course descriptions written by people and produced training examples from them. Then the researchers used this generated data to train several smaller language models to identify the SDGs associated with the topics covered in each course.

By comparing these smaller models, the researchers find which approach best captures information about sustainable development in educational content. The authors frame this as a step toward better adoption of the SDGs at the university level, and it could also inform the use of AI systems for tasks like curriculum planning, student advising, or automated assessment of educational materials.

Technical Explanation

The paper presents a study that uses language models to predict the Sustainable Development Goals (SDGs) associated with university course descriptions. The researchers use a large language model (LLM), PaLM 2, to generate training data from the course descriptions, and then use that data to train several smaller language models to predict SDGs for university courses.

The input consists of human-authored university course descriptions, which the authors describe as noisy. The smaller models are trained on the PaLM 2-generated data to predict which of the 17 SDGs defined by the United Nations apply to a given course.
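The data-generation step can be pictured with a small sketch. It assumes, hypothetically, that the LLM is asked to answer with SDG numbers such as "SDG 4, SDG 13"; the actual prompt and output format used by the authors are not described in this summary. The hypothetical helper `parse_sdg_labels` turns such a reply into a 17-element multi-hot vector, a common target format for training a multi-label classifier.

```python
import re

NUM_SDGS = 17  # the UN defines 17 Sustainable Development Goals


def parse_sdg_labels(llm_reply: str) -> list[int]:
    """Convert a hypothetical LLM reply like 'SDG 4, SDG 13' into a
    17-element multi-hot vector; index i corresponds to SDG i + 1."""
    labels = [0] * NUM_SDGS
    # Find every 'SDG <number>' mention, tolerating optional spaces and case.
    for match in re.findall(r"\bSDG\s*(\d{1,2})\b", llm_reply, flags=re.IGNORECASE):
        goal = int(match)
        if 1 <= goal <= NUM_SDGS:  # ignore numbers that are not valid SDGs
            labels[goal - 1] = 1
    return labels


reply = "Relevant goals: SDG 4 (Quality Education), SDG 13 (Climate Action)"
vector = parse_sdg_labels(reply)
print([i + 1 for i, v in enumerate(vector) if v])  # prints [4, 13]
```

Discarding out-of-range numbers is a deliberate choice here: generated labels can contain mistakes, and a silent filter keeps malformed replies from producing invalid training targets.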

Among the smaller models trained on the generated data, BART performed best in the authors' experiments, reaching an F1-score of 0.786. The results suggest that combining the two model types, with an LLM producing training data and a smaller model making the predictions, is a workable approach for this application.
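The F1-score combines precision and recall. This summary does not say which averaging scheme the authors used for their multi-label setting, so the sketch below (an assumption, not the authors' evaluation code) computes both micro-averaged F1, which pools true and false positives across all SDGs, and macro-averaged F1, which averages the per-SDG scores.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: list[list[int]], y_pred: list[list[int]]) -> tuple[float, float]:
    """Micro and macro F1 for multi-hot label vectors (one row per course)."""
    num_labels = len(y_true[0])
    tp = [0] * num_labels
    fp = [0] * num_labels
    fn = [0] * num_labels
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                tp[j] += 1  # label correctly predicted
            elif p:
                fp[j] += 1  # label predicted but not in the ground truth
            elif t:
                fn[j] += 1  # label in the ground truth but missed
    micro = f1(sum(tp), sum(fp), sum(fn))
    # Labels with no true or predicted positives contribute 0.0 to the macro average.
    macro = sum(f1(tp[j], fp[j], fn[j]) for j in range(num_labels)) / num_labels
    return micro, macro


# Toy example with 3 labels instead of 17 to keep it readable.
y_true = [[1, 0, 1], [1, 1, 0]]
y_pred = [[1, 0, 0], [1, 1, 1]]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro={micro:.3f}, macro={macro:.3f}")  # prints micro=0.750, macro=0.667
```

The two averages diverge when performance is uneven across goals: macro F1 weights a rarely seen SDG as heavily as a common one, so it is more sensitive to weak classes.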

Critical Analysis

The paper evaluates several smaller language models on predicting SDGs from course descriptions. One limitation lies in the data itself: the human-authored course descriptions are noisy, and because the training labels are generated by PaLM 2, any errors or biases in the LLM's output may carry over into the smaller models trained on them.

One area that could be explored further is the interpretability of the models' SDG predictions. Understanding the specific textual cues the models use to associate courses with SDGs could provide insights into the models' reasoning and potentially lead to improvements.
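One simple, model-agnostic way to look for such textual cues is occlusion: remove one word at a time and measure how much the model's score for an SDG drops. The sketch uses a hypothetical keyword-based scorer, `toy_sdg13_score`, as a stand-in for a trained model's probability for SDG 13 (Climate Action); with a real classifier, its prediction function would be passed in instead. This is a generic technique, not one the paper reports using.

```python
from typing import Callable


def toy_sdg13_score(text: str) -> float:
    """Hypothetical stand-in for a trained model's SDG 13 probability:
    the fraction of a small climate vocabulary that appears in the text."""
    vocab = {"climate", "emissions", "carbon", "warming"}
    words = set(text.lower().split())
    return len(vocab & words) / len(vocab)


def occlusion_attributions(
    text: str, score_fn: Callable[[str], float]
) -> list[tuple[str, float]]:
    """Score drop when each word is removed; larger drops suggest stronger cues."""
    words = text.split()
    base = score_fn(text)
    drops = []
    for i, word in enumerate(words):
        occluded = " ".join(words[:i] + words[i + 1:])
        drops.append((word, base - score_fn(occluded)))
    # Most influential words first; ties keep their original order.
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


description = "Students model carbon emissions and regional climate policy"
for word, drop in occlusion_attributions(description, toy_sdg13_score)[:3]:
    print(f"{word}: {drop:.2f}")  # prints carbon, emissions, climate, each 0.25
```

Occlusion only needs black-box access to the model's scores, which makes it easy to apply to any of the compared classifiers, though it costs one extra prediction per word.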

Additionally, the paper does not deeply examine the potential societal implications of using AI systems for this task. There could be concerns around algorithmic bias or unintended consequences that should be considered, especially given the importance of the SDGs.

Overall, the study represents a valuable contribution to research on applying language models to educational datasets. The findings could inform the development of intelligent systems to support sustainable development initiatives in higher education.

Conclusion

This paper investigates the use of language models to predict the Sustainable Development Goals (SDGs) associated with university course descriptions, using a large language model (PaLM 2) to generate training data and smaller, conventional language models to make the predictions. The best-performing model in the researchers' experiments was BART, with an F1-score of 0.786.

The results highlight the potential of language models to support tasks related to sustainable development in education, such as curriculum planning, student advising, and content evaluation. However, questions of model interpretability and potential societal impacts remain open for further work.

Overall, this work contributes valuable insights into the strengths and weaknesses of different AI modeling approaches for extracting sustainability-relevant information from educational data. As AI systems become more widely adopted in educational settings, research like this will be crucial for ensuring these technologies are developed and deployed responsibly.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Predicting Sustainable Development Goals Using Course Descriptions -- from LLMs to Conventional Foundation Models

Lev Kharlashkin, Melany Macias, Leo Huovinen, Mika Hamalainen

We present our work on predicting United Nations sustainable development goals (SDG) for university courses. We use an LLM named PaLM 2 to generate training data given a noisy human-authored course description as input. We use this data to train several different smaller language models to predict SDGs for university courses. This work contributes to better university level adaptation of SDGs. The best performing model in our experiments was BART with an F1-score of 0.786.

4/24/2024

Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals

Qingyang Wu, Ying Xu, Tingsong Xiao, Yunze Xiao, Yitong Li, Tianyang Wang, Yichi Zhang, Shanghai Zhong, Yuwei Zhang, Wei Lu, Yifan Yang

Large Language Models (LLMs) have emerged as potent tools for advancing the United Nations' Sustainable Development Goals (SDGs). However, the attitudinal disparities between LLMs and humans towards these goals can pose significant challenges. This study conducts a comprehensive review and analysis of the existing literature on the attitudes of LLMs towards the 17 SDGs, emphasizing the comparison between their attitudes and support for each goal and those of humans. We examine the potential disparities, primarily focusing on aspects such as understanding and emotions, cultural and regional differences, task objective variations, and factors considered in the decision-making process. These disparities arise from the underrepresentation and imbalance in LLM training data, historical biases, quality issues, lack of contextual understanding, and skewed ethical values reflected. The study also investigates the risks and harms that may arise from neglecting the attitudes of LLMs towards the SDGs, including the exacerbation of social inequalities, racial discrimination, environmental destruction, and resource wastage. To address these challenges, we propose strategies and recommendations to guide and regulate the application of LLMs, ensuring their alignment with the principles and goals of the SDGs, and therefore creating a more just, inclusive, and sustainable future.

4/23/2024

On the performativity of SDG classifications in large bibliometric databases

Matteo Ottaviani, Stephan Stahlschmidt

Large bibliometric databases, such as Web of Science, Scopus, and OpenAlex, facilitate bibliometric analyses, but are performative, affecting the visibility of scientific outputs and the impact measurement of participating entities. Recently, these databases have taken up the UN's Sustainable Development Goals (SDGs) in their respective classifications, which have been criticised for their diverging nature. This work proposes using the feature of large language models (LLMs) to learn about the data bias injected by diverse SDG classifications into bibliometric data by exploring five SDGs. We build an LLM that is fine-tuned in parallel by the diverse SDG classifications inscribed into the databases' SDG classifications. Our results show high sensitivity in model architecture, classified publications, fine-tuning process, and natural language generation. The wide arbitrariness at different levels raises concerns about using LLM in research practice.

5/7/2024


Causal Machine Learning for Cost-Effective Allocation of Development Aid

Milan Kuzmanovic, Dennis Frauen, Tobias Hatt, Stefan Feuerriegel

The Sustainable Development Goals (SDGs) of the United Nations provide a blueprint of a better future by 'leaving no one behind', and, to achieve the SDGs by 2030, poor countries require immense volumes of development aid. In this paper, we develop a causal machine learning framework for predicting heterogeneous treatment effects of aid disbursements to inform effective aid allocation. Specifically, our framework comprises three components: (i) a balancing autoencoder that uses representation learning to embed high-dimensional country characteristics while addressing treatment selection bias; (ii) a counterfactual generator to compute counterfactual outcomes for varying aid volumes to address small sample-size settings; and (iii) an inference model that is used to predict heterogeneous treatment-response curves. We demonstrate the effectiveness of our framework using data with official development aid earmarked to end HIV/AIDS in 105 countries, amounting to more than USD 5.2 billion. For this, we first show that our framework successfully computes heterogeneous treatment-response curves using semi-synthetic data. Then, we demonstrate our framework using real-world HIV data. Our framework points to large opportunities for a more effective aid allocation, suggesting that the total number of new HIV infections could be reduced by up to 3.3% (~50,000 cases) compared to the current allocation practice.

6/18/2024