Towards Feature Engineering with Human and AI's Knowledge: Understanding Data Science Practitioners' Perceptions in Human&AI-Assisted Feature Engineering Design

2405.14107

Published 5/24/2024 by Qian Zhu, Dakuo Wang, Shuai Ma, April Yi Wang, Zixin Chen, Udayan Khurana, Xiaojuan Ma


Abstract

As AI technology continues to advance, the importance of human-AI collaboration becomes increasingly evident, with numerous studies exploring its potential in various fields. One vital field is data science, including feature engineering (FE), where both human ingenuity and AI capabilities play pivotal roles. Despite the existence of AI-generated recommendations for FE, there remains a limited understanding of how to effectively integrate and utilize humans' and AI's knowledge. To address this gap, we design a readily-usable prototype, human&AI-assisted FE in Jupyter notebooks. It harnesses the strengths of humans and AI to provide feature suggestions to users, seamlessly integrating these recommendations into practical workflows. Using the prototype as a research probe, we conducted an exploratory study to gain valuable insights into data science practitioners' perceptions, usage patterns, and their potential needs when presented with feature suggestions from both humans and AI. Through qualitative analysis, we discovered that the Creator of the feature (i.e., AI or human) significantly influences users' feature selection, and the semantic clarity of the suggested feature greatly impacts its adoption rate. Furthermore, our findings indicate that users perceive both differences and complementarity between features generated by humans and those generated by AI. Lastly, based on our study results, we derived a set of design recommendations for future human&AI FE design. Our findings show the collaborative potential between humans and AI in the field of FE.


Overview

The paper explores the potential of human-AI collaboration in the field of data science, specifically in feature engineering (FE).
It presents a prototype called "human&AI-assisted FE" that integrates human and AI-generated feature suggestions into a Jupyter notebook workflow.
The study investigates data science practitioners' perceptions, usage patterns, and needs when presented with feature suggestions from both humans and AI.
The findings highlight the importance of the feature creator's identity (human or AI) and the semantic clarity of the suggested features in user adoption.
The paper also proposes design recommendations for future human-AI collaboration in FE.

Plain English Explanation

As artificial intelligence (AI) advances, collaboration between humans and AI is becoming increasingly important, particularly in data science. One area where this collaboration can help is feature engineering (FE): the process of creating and selecting the most relevant features (or characteristics) from data to improve the performance of machine learning models.
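As a minimal illustration of what "creating" a feature means, the sketch below derives two new columns from raw tabular records. The dataset, the column names, the reference year, and the specific features are hypothetical and are not taken from the paper.

```python
# Hypothetical raw records; each dict is one row of a tabular dataset.
raw_rows = [
    {"price": 300000.0, "area_sqft": 1500.0, "year_built": 1990},
    {"price": 450000.0, "area_sqft": 1800.0, "year_built": 2010},
]

REFERENCE_YEAR = 2024  # assumption: the year used to compute building age


def engineer_features(row: dict) -> dict:
    """Return a copy of the row with two derived (engineered) features."""
    out = dict(row)  # copy so the raw data is left unchanged
    # Ratio feature: combines two raw columns into one more informative value.
    out["price_per_sqft"] = row["price"] / row["area_sqft"]
    # Transformation feature: turns an absolute year into an age.
    out["age_years"] = REFERENCE_YEAR - row["year_built"]
    return out


engineered = [engineer_features(r) for r in raw_rows]
for r in engineered:
    print(r["price_per_sqft"], r["age_years"])
```

Selecting features would then mean deciding which of the raw and derived columns a model actually receives.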

The researchers behind this study recognized that while AI can generate recommendations for feature engineering, there is still a limited understanding of how to effectively integrate and utilize the knowledge of both humans and AI. To address this gap, they designed a prototype called "human&AI-assisted FE" that allows users to access and incorporate feature suggestions from both human experts and AI systems into their Jupyter notebook workflows.

By using this prototype as a research tool, the researchers conducted an exploratory study to gain insights into how data science practitioners perceive, use, and potentially benefit from these feature suggestions. The study revealed that the identity of the feature creator (human or AI) and the clarity of the feature description significantly influence whether users decide to incorporate the suggested features into their work.

For example, users may be more inclined to use features suggested by human experts, perceiving them as more intuitive and meaningful. The study also found that users see human- and AI-generated features as both different and complementary, suggesting that combining the two can be valuable in the feature engineering process.

Based on these findings, the researchers derived a set of design recommendations for future human-AI collaboration in feature engineering. These recommendations aim to help create more effective and user-friendly tools that leverage the strengths of both humans and AI to improve the data science workflow.

Overall, this study highlights the promising potential of human-AI collaboration in the field of feature engineering, and offers insights that can guide the development of more advanced and integrated tools to support data science practitioners in their work.

Technical Explanation

The study presented in the paper explores the potential of human-AI collaboration in the context of feature engineering (FE), a crucial step in the data science workflow. The researchers designed a prototype called "human&AI-assisted FE" that integrates feature suggestions from both human experts and AI systems into a Jupyter notebook environment.
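The summary does not describe the prototype's internals, so the following is only a hypothetical sketch of how a notebook tool might represent suggestions that carry their creator (human or AI) alongside a description, and then apply the ones a user accepts. The `FeatureSuggestion` class, its fields, the `apply_accepted` helper, and the example features are illustrative assumptions, not the prototype's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class FeatureSuggestion:
    """One suggested feature, tagged with who created it (hypothetical schema)."""
    name: str
    creator: str  # "human" or "AI"
    description: str  # natural-language explanation shown to the user
    transform: Callable[[Row], float]  # computes the feature from one row


def apply_accepted(rows: List[Row], suggestions: List[FeatureSuggestion],
                   accepted: List[str]) -> List[Row]:
    """Return new rows with only the user-accepted suggestions added as columns."""
    chosen = [s for s in suggestions if s.name in accepted]
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for s in chosen:
            new_row[s.name] = s.transform(row)
        result.append(new_row)
    return result


suggestions = [
    FeatureSuggestion("bmi", "human",
                      "Body mass index: weight (kg) divided by height (m) squared",
                      lambda r: r["weight_kg"] / r["height_m"] ** 2),
    FeatureSuggestion("weight_x_height", "AI",
                      "Product of weight and height",
                      lambda r: r["weight_kg"] * r["height_m"]),
]

rows = [{"weight_kg": 80.0, "height_m": 2.0}]
print(apply_accepted(rows, suggestions, accepted=["bmi"]))
```

Keeping `creator` and `description` on every suggestion is what would let an interface show users the two attributes the study found to matter.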

To investigate how practitioners perceive and use this human-AI collaborative approach, the researchers conducted an exploratory study in which participants worked with the prototype, which presented feature suggestions from both human and AI sources. Through qualitative analysis, the researchers identified several key insights:

Creator Identity: The identity of the feature creator (human or AI) significantly influenced the participants' feature selection. Users tended to favor features suggested by human experts, perceiving them as more intuitive and meaningful.
Semantic Clarity: The semantic clarity of the suggested features strongly affected their adoption rate. Features with clear, understandable descriptions were more likely to be incorporated into participants' workflows.
Complementarity: Participants recognized both differences and complementarity between features generated by humans and those generated by AI. They expressed interest in leveraging the strengths of both human and AI contributions in the feature engineering process.
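Because these findings tie adoption to both who created a feature and how clearly it is described, one way a tool could act on them is to always display both attributes. The sketch below is a hypothetical rendering helper, assuming each suggestion is a plain dictionary with `name`, `creator`, `expression`, and an optional `description`. When no description exists, it falls back to the raw expression and marks it, so the user can see that the feature is less self-explanatory. None of these names or example features come from the paper.

```python
from typing import Dict, List, Optional

Suggestion = Dict[str, Optional[str]]


def render_suggestion(s: Suggestion) -> str:
    """Format one suggestion as a single line showing its creator and meaning."""
    description = s.get("description")
    if description:
        meaning = description
    else:
        # No semantic description available: show the raw expression instead.
        meaning = f"(no description) {s['expression']}"
    return f"[{s['creator']}] {s['name']}: {meaning}"


def render_all(suggestions: List[Suggestion]) -> List[str]:
    """Render every suggestion in the given order, so human- and AI-created
    suggestions can appear together in one list."""
    return [render_suggestion(s) for s in suggestions]


# Hypothetical suggestions for illustration only.
suggestions = [
    {"name": "price_per_sqft", "creator": "Human",
     "expression": "price / area_sqft",
     "description": "Sale price per square foot of living area"},
    {"name": "f_017", "creator": "AI",
     "expression": "log(area_sqft) * year_built", "description": None},
]

for line in render_all(suggestions):
    print(line)
```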

Based on these findings, the researchers derived a set of design recommendations for future human-AI collaboration in feature engineering. These recommendations aim to enhance the integration and utilization of human and AI knowledge in the data science domain.

The study's experimental design and the insights gained provide valuable guidance for the development of more effective and user-centered tools that support data science practitioners in their feature engineering tasks. By harnessing the strengths of both human ingenuity and AI capabilities, the researchers demonstrate the potential for fruitful collaboration in this important field.

Critical Analysis

The study presented in the paper offers valuable insights into the potential of human-AI collaboration in feature engineering, a critical component of the data science workflow. However, the research also has some limitations that deserve consideration.

One key limitation is the relatively small sample size of the exploratory study, which may limit the generalizability of the findings. While the qualitative approach provides rich insights, a larger-scale study with a more diverse set of participants could further validate and expand the understanding of human-AI collaboration in feature engineering.

Additionally, the study focuses on the perceptions and usage patterns of the prototype tool, but does not directly assess the impact of human-AI collaboration on the quality or performance of the feature engineering process. Future research could explore the tangible benefits of this collaborative approach, such as improved model performance or increased efficiency in feature selection.

Another area for further exploration is the long-term implications of human-AI collaboration in feature engineering. As AI systems continue to advance, it will be crucial to understand how the dynamic between human and AI contributions evolves over time, and how to maintain an effective balance between the two.

Despite these limitations, the study's findings and design recommendations provide a solid foundation for advancing the field of human-AI collaboration in data science. The insights gained can inform the development of more user-centric and effective tools that leverage the strengths of both humans and AI to enhance the feature engineering process.

Conclusion

This study highlights the promising potential of human-AI collaboration in the field of feature engineering, a crucial component of the data science workflow. By designing a prototype called "human&AI-assisted FE" and conducting an exploratory study with data science practitioners, the researchers have gained valuable insights into the perceptions, usage patterns, and needs of users when presented with feature suggestions from both human experts and AI systems.

The key findings suggest that the identity of the feature creator and the semantic clarity of the suggested features play a significant role in user adoption. Additionally, the study reveals that users recognize the complementarity between human and AI-generated features, indicating the value of integrating both in the feature engineering process.

Based on these insights, the researchers propose a set of design recommendations to guide the development of future human-AI collaboration tools in data science. These recommendations aim to improve how human and AI knowledge are integrated and used, with the goal of making feature engineering more efficient and effective.

As AI technology continues to advance, the importance of human-AI collaboration in data science will only grow. This study provides a solid foundation for further exploration and innovation in this critical area, paving the way for more effective and user-centric tools that leverage the combined strengths of humans and AI to drive progress in the field of data science.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Leveraging Knowledge Graphs for Interpretable Feature Generation

Mohamed Bouadi, Arta Alavi, Salima Benbernou, Mourad Ouziri

The quality of Machine Learning (ML) models strongly depends on the input data, as such Feature Engineering (FE) is often required in ML. In addition, with the proliferation of ML-powered systems, especially in critical contexts, the need for interpretability and explainability becomes increasingly important. Since manual FE is time-consuming and requires case specific knowledge, we propose KRAFT, an AutoFE framework that leverages a knowledge graph to guide the generation of interpretable features. Our hybrid AI approach combines a neural generator to transform raw features through a series of transformations and a knowledge-based reasoner to evaluate features interpretability using Description Logics (DL). The generator is trained through Deep Reinforcement Learning (DRL) to maximize the prediction accuracy and the interpretability of the generated features. Extensive experiments on real datasets demonstrate that KRAFT significantly improves accuracy while ensuring a high level of interpretability.

6/4/2024

cs.LG cs.AI


Human Expertise in Algorithmic Prediction

Rohan Alur, Manish Raghavan, Devavrat Shah

We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach focuses on the use of human judgment to distinguish inputs which 'look the same' to any feasible predictive algorithm. We argue that this framing clarifies the problem of human/AI collaboration in prediction tasks, as experts often have access to information -- particularly subjective information -- which is not encoded in the algorithm's training data. We use this insight to develop a set of principled algorithms for selectively incorporating human feedback only when it improves the performance of any feasible predictor. We find empirically that although algorithms often outperform their human counterparts on average, human judgment can significantly improve algorithmic predictions on specific instances (which can be identified ex-ante). In an X-ray classification task, we find that this subset constitutes nearly 30% of the patient population. Our approach provides a natural way of uncovering this heterogeneity and thus enabling effective human-AI collaboration.

5/24/2024

cs.LG cs.AI cs.HC

Fiper: a Visual-based Explanation Combining Rules and Feature Importance

Eleonora Cappuccio, Daniele Fadda, Rosa Lanzilotti, Salvatore Rinzivillo

Artificial Intelligence algorithms have now become pervasive in multiple high-stakes domains. However, their internal logic can be obscure to humans. Explainable Artificial Intelligence aims to design tools and techniques to illustrate the predictions of the so-called black-box algorithms. The Human-Computer Interaction community has long stressed the need for a more user-centered approach to Explainable AI. This approach can benefit from research in user interface, user experience, and visual analytics. This paper proposes a visual-based method to illustrate rules paired with feature importance. A user study with 15 participants was conducted comparing our visual method with the original output of the algorithm and textual representation to test its effectiveness with users.

4/29/2024

cs.HC cs.AI

Harmonizing Human Insights and AI Precision: Hand in Hand for Advancing Knowledge Graph Task

Shurong Wang, Yufei Zhang, Xuliang Huang, Hongwei Wang

Knowledge graph embedding (KGE) has caught significant interest for its effectiveness in knowledge graph completion (KGC), specifically link prediction (LP), with recent KGE models cracking the LP benchmarks. Despite the rapidly growing literature, insufficient attention has been paid to the cooperation between humans and AI on KG. However, humans' capability to analyze graphs conceptually may further improve the efficacy of KGE models with semantic information. To this effect, we carefully designed a human-AI team (HAIT) system dubbed KG-HAIT, which harnesses the human insights on KG by leveraging fully human-designed ad-hoc dynamic programming (DP) on KG to produce human insightful feature (HIF) vectors that capture the subgraph structural feature and semantic similarities. By integrating HIF vectors into the training of KGE models, notable improvements are observed across various benchmarks and metrics, accompanied by accelerated model convergence. Our results underscore the effectiveness of human-designed DP in the task of LP, emphasizing the pivotal role of collaboration between humans and AI on KG. We open avenues for further exploration and innovation through KG-HAIT, paving the way towards more effective and insightful KG analysis techniques.

5/16/2024

cs.LG cs.AI