Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies

2309.13063

Published 5/13/2024 by Chirag Shah, Ryen W. White, Reid Andersen, Georg Buscher, Scott Counts, Sarkar Snigdha Sarathi Das, Ali Montazer, Sathish Manivannan, Jennifer Neville, Xiaochuan Ni and 6 others

cs.IR cs.AI cs.CL


Abstract

Log data can reveal valuable information about how users interact with Web search services, what they want, and how satisfied they are. However, analyzing user intents in log data is not easy, especially for emerging forms of Web search such as AI-driven chat. To understand user intents from log data, we need a way to label them with meaningful categories that capture their diversity and dynamics. Existing methods rely on manual or machine-learned labeling, which are either expensive or inflexible for large and dynamic datasets. We propose a novel solution using large language models (LLMs), which can generate rich and relevant concepts, descriptions, and examples for user intents. However, using LLMs to generate a user intent taxonomy and apply it for log analysis can be problematic for two main reasons: (1) such a taxonomy is not externally validated; and (2) there may be an undesirable feedback loop. To address this, we propose a new methodology with human experts and assessors to verify the quality of the LLM-generated taxonomy. We also present an end-to-end pipeline that uses an LLM with human-in-the-loop to produce, refine, and apply labels for user intent analysis in log data. We demonstrate its effectiveness by uncovering new insights into user intents from search and chat logs from the Microsoft Bing commercial search engine. The proposed work's novelty stems from the method for generating purpose-driven user intent taxonomies with strong validation. This method not only helps remove methodological and practical bottlenecks from intent-focused research, but also provides a new framework for generating, validating, and applying other kinds of taxonomies in a scalable and adaptable way with reasonable human effort.


Overview

Analyzing user intents in web search log data is challenging, especially for emerging forms of search like AI-driven chat
Existing methods for labeling user intents are either expensive (manual) or inflexible (machine-learned)
The paper proposes a new methodology that uses large language models (LLMs) to generate user intent taxonomies, adding human validation to address the limitations of relying on an LLM-generated taxonomy alone

Plain English Explanation

When people use web search engines or AI-powered chat services, their interactions and the information they're looking for can reveal valuable insights. By understanding user intents - the reasons why people are searching or chatting - companies can improve their services and better meet user needs.

However, analyzing user intents from the massive amounts of log data generated by these services is challenging, especially for newer technologies like AI-driven chat. Existing methods for categorizing user intents either rely on expensive manual labeling by humans, or machine learning models that may not be flexible enough to capture the diversity of intents, particularly as user behaviors evolve.

The researchers propose a novel solution using large language models (LLMs). These powerful AI systems can generate rich descriptions, examples, and taxonomies of user intents. However, using an LLM-generated taxonomy directly poses two key challenges:

The taxonomy may not be externally validated or grounded in real-world user behavior.
There's a risk of an undesirable feedback loop, where the model's own biases get encoded into the taxonomy.

To address these issues, the researchers propose a new methodology that combines the generative power of LLMs with human validation and refinement. This allows them to produce a high-quality, purpose-driven user intent taxonomy that can then be used to analyze log data in an effective and scalable way.

Technical Explanation

The paper presents an end-to-end pipeline that uses an LLM with human-in-the-loop to generate, refine, and apply labels for user intent analysis in web search and chat logs.
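The three stages (generate, validate, apply) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The `LLM` type stands in for any prompt-to-text client, and no real API is called. The prompt wording, the "name: description" reply format, and the example categories are all hypothetical. The human reviewer is modeled as a simple approve/reject callback.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for an LLM client: takes a prompt, returns the reply text.
LLM = Callable[[str], str]


@dataclass
class IntentCategory:
    name: str
    description: str


def generate_taxonomy(llm: LLM, sample_logs: list[str]) -> list[IntentCategory]:
    """Stage 1 (generate): ask the LLM for candidate categories, one 'name: description' per line."""
    prompt = (
        "Propose user intent categories for the log entries below, one per line, "
        "formatted as 'name: description'.\n" + "\n".join(sample_logs)
    )
    categories = []
    for line in llm(prompt).splitlines():
        if ":" in line:  # skip lines that do not follow the requested format
            name, description = line.split(":", 1)
            categories.append(IntentCategory(name.strip(), description.strip()))
    return categories


def review_taxonomy(
    categories: list[IntentCategory], approve: Callable[[IntentCategory], bool]
) -> list[IntentCategory]:
    """Stage 2 (validate): a human reviewer, modeled as a callback, keeps or drops each category."""
    return [c for c in categories if approve(c)]


def label_entry(llm: LLM, categories: list[IntentCategory], entry: str) -> str:
    """Stage 3 (apply): the LLM picks one approved category; any other reply becomes 'Unlabeled'."""
    menu = "\n".join(f"{c.name}: {c.description}" for c in categories)
    prompt = f"Categories:\n{menu}\n\nReturn the single best category name for: {entry}"
    reply = llm(prompt).strip()
    return reply if reply in {c.name for c in categories} else "Unlabeled"


def fake_llm(prompt: str) -> str:
    """Canned replies so the sketch runs offline; the categories are made up for illustration."""
    if prompt.startswith("Propose"):
        return "Information: find a fact\nCreation: produce new content\n(malformed line)"
    return "Creation" if "poem" in prompt else "Information"


if __name__ == "__main__":
    taxonomy = generate_taxonomy(fake_llm, ["capital of France", "write a poem"])
    approved = review_taxonomy(taxonomy, approve=lambda c: bool(c.description))
    for query in ["capital of France", "write a poem"]:
        print(query, "->", label_entry(fake_llm, approved, query))
```

The reviewer is a callback so that the human-in-the-loop step stays explicit. Nothing the LLM proposes reaches the labeling stage unless it has been approved. Labels outside the approved set are mapped to a fallback value instead of being accepted silently.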

First, the LLM is used to produce an initial user intent taxonomy, generating rich concepts, descriptions, and examples. Human experts and assessors then review and validate this taxonomy. They verify its quality, check it for bias, and confirm that it aligns with real-world user behaviors observed in the log data.
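This summary does not say which statistic the authors used to check assessor judgments, so the following is only an illustration. Cohen's kappa is a standard chance-corrected measure of agreement between two annotators who label the same items. It could be used, for example, to compare an assessor's labels with the LLM's labels on a shared sample. The function name and the label values are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance agreement."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both annotators used one identical label everywhere; kappa is undefined,
        # and this sketch simply reports 1.0 in that case.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    assessor = ["Info", "Info", "Create", "Create", "Info", "Other"]
    llm_labels = ["Info", "Create", "Create", "Create", "Info", "Other"]
    print(round(cohens_kappa(assessor, llm_labels), 3))
```

In the example, the two annotators agree on 5 of 6 items. After correcting for the agreement their label frequencies would produce by chance, kappa is 17/23, about 0.739. That is noticeably lower than the raw 83% agreement.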

With the validated taxonomy in place, the researchers apply it to the log data, categorizing user intents at scale. This allows them to uncover new insights into how people are using web search and conversational AI services.
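Once every log entry carries a label, comparing the intent mix across services reduces to counting. The sketch below assumes that labeled entries arrive as `(source, label)` pairs, for instance `"search"` versus `"chat"`. The function name, label names, and sample data are illustrative and are not taken from the paper.

```python
from collections import Counter, defaultdict


def intent_distribution(labeled_logs: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Per log source, the share of entries assigned to each intent label."""
    counts = defaultdict(Counter)  # source -> Counter of labels
    for source, label in labeled_logs:
        counts[source][label] += 1
    shares = {}
    for source, label_counts in counts.items():
        total = sum(label_counts.values())
        shares[source] = {label: n / total for label, n in label_counts.items()}
    return shares


if __name__ == "__main__":
    sample = [
        ("search", "Information"),
        ("search", "Information"),
        ("search", "Navigation"),
        ("chat", "Creation"),
        ("chat", "Information"),
    ]
    for source, dist in intent_distribution(sample).items():
        print(source, {label: round(share, 2) for label, share in dist.items()})
```

Normalizing to shares rather than raw counts keeps sources with very different traffic volumes comparable.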

The key innovations of this work are:

The method for generating purpose-driven user intent taxonomies with strong human validation, which addresses the limitations of existing approaches.
The end-to-end pipeline that enables scalable, adaptable user intent analysis for evolving web search and chat technologies.

Critical Analysis

The researchers acknowledge several important caveats and limitations of their approach:

The human validation process, while crucial, still relies on subjective judgments and may not fully capture the nuances of user intents.
The taxonomy and labeling process are tailored to the specific dataset and use case, so they may not generalize perfectly to other contexts.
There are still open questions about the long-term robustness and stability of the LLM-generated taxonomies, and how they might evolve over time.
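One way to make the stability question measurable is to compare the intent-label distributions produced in two time windows. This is an assumption of this sketch, not something the summary attributes to the authors. Jensen-Shannon divergence (base 2) gives a symmetric score from 0, meaning identical mixes, to 1, meaning no labels in common. A rising score might flag that the taxonomy deserves another review round. The function name, the sample distributions, and any alerting threshold are hypothetical.

```python
import math


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence (log base 2, so 0 <= result <= 1) between two label distributions.

    Both inputs are assumed to be probability distributions (values sum to 1).
    """
    labels = set(p) | set(q)
    # Midpoint distribution; a label missing from one side counts as probability 0.
    m = {label: 0.5 * (p.get(label, 0.0) + q.get(label, 0.0)) for label in labels}

    def kl_to_m(dist: dict[str, float]) -> float:
        # KL(dist || m); zero-probability terms contribute nothing and are skipped.
        return sum(
            dist[label] * math.log2(dist[label] / m[label])
            for label in labels
            if dist.get(label, 0.0) > 0.0
        )

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


if __name__ == "__main__":
    january = {"Information": 0.6, "Creation": 0.3, "Navigation": 0.1}
    june = {"Information": 0.4, "Creation": 0.5, "Navigation": 0.1}
    print(f"drift score: {js_divergence(january, june):.3f}")
```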

Additionally, one could question whether the human-in-the-loop approach truly solves the issue of model biases, or if there are still latent biases that get encoded into the final taxonomy.

Overall, however, the proposed methodology represents a significant advancement in user intent analysis, addressing key practical and methodological challenges. The work also provides a new framework for generating, validating, and applying taxonomies in a scalable and adaptable way - a capability that could be useful for a wide range of applications beyond just user intent analysis.

Conclusion

This research tackles the important challenge of understanding user intents from web search and chat log data, which can reveal valuable insights for improving digital services. By combining the generative power of large language models with human validation, the researchers have developed a novel methodology that addresses the limitations of existing approaches.

The resulting system not only enables more effective user intent analysis, but also provides a new framework for generating and validating taxonomies in a scalable and adaptable way. This work has the potential to drive important advancements in user-centric design and research, with implications for a broad range of digital technologies and services.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper for full details.

Related Papers

Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs

Syed Mekael Wasti, Ken Q. Pu, Ali Neshati

The evolution of Large Language Models (LLMs) has showcased remarkable capacities for logical reasoning and natural language comprehension. These capabilities can be leveraged in solutions that semantically and textually model complex problems. In this paper, we present our efforts toward constructing a framework that can serve as an intermediary between a user and their user interface (UI), enabling dynamic and real-time interactions. We employ a system that stands upon textual semantic mappings of UI components, in the form of annotations. These mappings are stored, parsed, and scaled in a custom data structure, supplementary to an agent-based prompting backend engine. Employing textual semantic mappings allows each component to not only explain its role to the engine but also provide expectations. By comprehending the needs of both the user and the components, our LLM engine can classify the most appropriate application, extract relevant parameters, and subsequently execute precise predictions of the user's expected actions. Such an integration evolves static user interfaces into highly dynamic and adaptable solutions, introducing a new frontier of intelligent and responsive user experiences.

4/17/2024

cs.HC cs.AI cs.CL cs.LG

Large language models can accurately predict searcher preferences

Paul Thomas, Seth Spielman, Nick Craswell, Bhaskar Mitra

Relevance labels, which indicate whether a search result is valuable to a searcher, are key to evaluating and optimising search systems. The best way to capture the true preferences of users is to ask them for their careful feedback on which results would be useful, but this approach does not scale to produce a large number of labels. Getting relevance labels at scale is usually done with third-party labellers, who judge on behalf of the user, but there is a risk of low-quality data if the labeller doesn't understand user needs. To improve quality, one standard approach is to study real users through interviews, user studies and direct feedback, find areas where labels are systematically disagreeing with users, then educate labellers about user needs through judging guidelines, training and monitoring. This paper introduces an alternate approach for improving label quality. It takes careful feedback from real users, which by definition is the highest-quality first-party gold data that can be derived, and develops a large language model prompt that agrees with that data. We present ideas and observations from deploying language models for large-scale relevance labelling at Bing, and illustrate with data from TREC. We have found large language models can be effective, with accuracy as good as human labellers and similar capability to pick the hardest queries, best runs, and best groups. Systematic changes to the prompts make a difference in accuracy, but so too do simple paraphrases. Measuring agreement with real searchers requires high-quality gold labels, but with these we find that models produce better labels than third-party workers, for a fraction of the cost, and these labels let us train notably better rankers.

5/20/2024

cs.IR cs.AI cs.CL cs.LG

Using Large Language Models to Enrich the Documentation of Datasets for Machine Learning

Joan Giner-Miguelez, Abel Gómez, Jordi Cabot

Recent regulatory initiatives like the European AI Act and relevant voices in the Machine Learning (ML) community stress the need to describe datasets along several key dimensions for trustworthy AI, such as the provenance processes and social concerns. However, this information is typically presented as unstructured text in accompanying documentation, hampering their automated analysis and processing. In this work, we explore using large language models (LLM) and a set of prompting strategies to automatically extract these dimensions from documents and enrich the dataset description with them. Our approach could aid data publishers and practitioners in creating machine-readable documentation to improve the discoverability of their datasets, assess their compliance with current AI regulations, and improve the overall quality of ML models trained on them. In this paper, we evaluate the approach on 12 scientific dataset papers published in two scientific journals (Nature's Scientific Data and Elsevier's Data in Brief) using two different LLMs (GPT3.5 and Flan-UL2). Results show good accuracy with our prompt extraction strategies. Concrete results vary depending on the dimensions, but overall, GPT3.5 shows slightly better accuracy (81.21%) than FLAN-UL2 (69.13%) although it is more prone to hallucinations. We have released an open-source tool implementing our approach and a replication package, including the experiments' code and results, in an open-source repository.

5/27/2024

cs.DL cs.AI cs.CL

Large language models as oracles for instantiating ontologies with domain-specific knowledge

Giovanni Ciatto, Andrea Agiollo, Matteo Magnini, Andrea Omicini

Background. Endowing intelligent systems with semantic data commonly requires designing and instantiating ontologies with domain-specific knowledge. Especially in the early phases, those activities are typically performed manually by human experts possibly leveraging on their own experience. The resulting process is therefore time-consuming, error-prone, and often biased by the personal background of the ontology designer. Objective. To mitigate that issue, we propose a novel domain-independent approach to automatically instantiate ontologies with domain-specific knowledge, by leveraging on large language models (LLMs) as oracles. Method. Starting from (i) an initial schema composed by inter-related classes and properties and (ii) a set of query templates, our method queries the LLM multiple times, and generates instances for both classes and properties from its replies. Thus, the ontology is automatically filled with domain-specific knowledge, compliant to the initial schema. As a result, the ontology is quickly and automatically enriched with manifold instances, which experts may consider to keep, adjust, discard, or complement according to their own needs and expertise. Contribution. We formalise our method in a general way and instantiate it over various LLMs, as well as on a concrete case study. We report experiments rooted in the nutritional domain where an ontology of food meals and their ingredients is semi-automatically instantiated from scratch, starting from a categorisation of meals and their relationships. There, we analyse the quality of the generated ontologies and compare ontologies attained by exploiting different LLMs. Finally, we provide a SWOT analysis of the proposed method.

4/8/2024

cs.AI cs.CL cs.IR cs.LG cs.LO