Taxonomy and Analysis of Sensitive User Queries in Generative AI Search

Original paper: arXiv:2404.08672, published 4/16/2024 by Hwiyeol Jo, Taiwoo Park, Nayoung Choi, Changbong Kim, Ohjoon Kwon, Donghyeon Jeon, Hyunwoo Lee, Eui-Hyeon Lee, Kyoungho Shin, Sun Suk Lim, Kyungmi Kim, Jihye Lee, and Sun Kim

Overview

• This paper proposes a taxonomy of sensitive user queries in generative AI search and analyzes sensitive queries submitted by real users of a national-scale search engine.

• The paper aims to understand which types of sensitive queries users submit and how search engines can handle them effectively.

• The research provides insights that can help improve the safety and trustworthiness of generative AI search tools.

Plain English Explanation

• This paper looks at the different kinds of sensitive or personal questions that people might ask when using AI-powered search engines.

• The researchers wanted to understand what types of sensitive queries users might have and how search engines can best respond to them.

• For example, a person might use a search engine to look for information on a sensitive health condition or a personal legal issue. The researchers analyzed these types of queries to help make AI search tools safer and more trustworthy for users.

• The findings from this research can help AI model developers improve the way their search engines handle sensitive information and protect user privacy.

Technical Explanation

• The paper categorizes different types of sensitive user queries, including those related to health, finance, and personal identity.

• The researchers analyzed sensitive queries from actual users of a national-scale search engine to identify patterns and trends.

• Their taxonomy organizes sensitive queries into systematic categories, such as queries seeking medical advice or involving illegal activities.

• The analysis provides insights into how users interact with generative AI search tools for complex and personal information needs.

• The findings can help search engine developers enhance the safety and trustworthiness of their systems when handling sensitive user queries.

Critical Analysis

• The paper acknowledges limitations in the query log data, as it may not capture the full breadth of sensitive user queries.

• There is also a need for further research to understand how users' perceptions of sensitivity may differ from the researchers' taxonomy.

• Additional work is required to explore ethical considerations around the handling of sensitive information in generative AI search tools.

Conclusion

• This research provides a valuable taxonomy and analysis of sensitive user queries in generative AI search engines.

• The findings can help search engine developers improve the safety and trustworthiness of their systems when dealing with personal and sensitive user information.

• Continued research in this area is crucial as generative AI search tools become more prevalent in people's daily lives.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper to verify the details.

Related Papers

Taxonomy and Analysis of Sensitive User Queries in Generative AI Search

Hwiyeol Jo, Taiwoo Park, Nayoung Choi, Changbong Kim, Ohjoon Kwon, Donghyeon Jeon, Hyunwoo Lee, Eui-Hyeon Lee, Kyoungho Shin, Sun Suk Lim, Kyungmi Kim, Jihye Lee, Sun Kim

Although there has been a growing interest among industries to integrate generative LLMs into their services, limited experience and scarce resources act as a barrier to launching and servicing large-scale LLM-based conversational services. In this paper, we share our experiences in developing and operating generative AI models within a national-scale search engine, with a specific focus on the sensitivity of user queries. We propose a taxonomy for sensitive search queries, outline our approaches, and present a comprehensive analysis report on sensitive queries from actual users.

4/16/2024

Hybrid Semantic Search: Unveiling User Intent Beyond Keywords

Aman Ahluwalia, Bishwajit Sutradhar, Karishma Ghosh, Indrapal Yadav, Arpan Sheetal, Prashant Patil

This paper addresses the limitations of traditional keyword-based search in understanding user intent and introduces a novel hybrid search approach that leverages the strengths of non-semantic search engines, Large Language Models (LLMs), and embedding models. The proposed system integrates keyword matching, semantic vector embeddings, and LLM-generated structured queries to deliver highly relevant and contextually appropriate search results. By combining these complementary methods, the hybrid approach effectively captures both explicit and implicit user intent. The paper further explores techniques to optimize query execution for faster response times and demonstrates the effectiveness of this hybrid search model in producing comprehensive and accurate search outcomes.

9/9/2024

Using LLMs to Investigate Correlations of Conversational Follow-up Queries with User Satisfaction

Hyunwoo Kim, Yoonseo Choi, Taehyun Yang, Honggu Lee, Chaneon Park, Yongju Lee, Jin Young Kim, Juho Kim

With large language models (LLMs), conversational search engines shift how users retrieve information from the web by enabling natural conversations to express their search intents over multiple turns. Users' natural conversation embodies rich but implicit signals of users' search intents and evaluation of search results to understand user experience with the system. However, it is underexplored how and why users ask follow-up queries to continue conversations with conversational search engines and how the follow-up queries signal users' satisfaction. From qualitative analysis of 250 conversational turns from an in-lab user evaluation of Naver Cue:, a commercial conversational search engine, we propose a taxonomy of 18 follow-up query patterns from conversational search, comprising two major axes: (1) users' motivations behind continuing conversations (N = 7) and (2) actions of follow-up queries (N = 11). Compared to the existing literature on query reformulations, we uncovered a new set of motivations and actions behind follow-up queries, including asking for subjective opinions or providing natural language feedback on the engine's responses. To analyze conversational search logs with our taxonomy in a scalable and efficient manner, we built an LLM-powered classifier (73% accuracy). With our classifier, we analyzed 2,061 conversational tuples collected from real-world usage logs of Cue: and examined how the conversation patterns from our taxonomy correlate with satisfaction. Our initial findings suggest some signals of dissatisfaction, such as Clarifying Queries, Excluding Condition, and Substituting Condition with follow-up queries. We envision our approach could contribute to automated evaluation of conversational search experience by providing satisfaction signals and grounds for realistic user simulations.

7/19/2024

Generative AI Search Engines as Arbiters of Public Knowledge: An Audit of Bias and Authority

Alice Li, Luanne Sinnamon

This paper reports on an audit study of generative AI systems (ChatGPT, Bing Chat, and Perplexity) which investigates how these new search engines construct responses and establish authority for topics of public importance. We collected system responses using a set of 48 authentic queries for 4 topics over a 7-day period and analyzed the data using sentiment analysis, inductive coding and source classification. Results provide an overview of the nature of system responses across these systems and provide evidence of sentiment bias based on the queries and topics, and commercial and geographic bias in sources. The quality of sources used to support claims is uneven, relying heavily on News and Media, Business and Digital Media websites. Implications for system users emphasize the need to critically examine Generative AI system outputs when making decisions related to public interest and personal well-being.

5/24/2024