Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset

Read original: arXiv:2404.10505 - Published 4/17/2024 by Mahta Bakhshizadeh, Christian Jilek, Markus Schroder, Heiko Maus, Andreas Dengel


Overview

The RLKWiC dataset provides detailed data on real-life knowledge work in context. It was recorded by monitoring the computer interactions of eight participants over two months.
This dataset aims to support research on context-based user modeling, knowledge work support, and personal information management.

Plain English Explanation

The RLKWiC dataset is a collection of data that provides a detailed look at how people do their work in the real world. It captures information about the various activities and interactions that people have while they are working, such as the programs they use, the documents they access, and the emails they send and receive.

The goal of this dataset is to help researchers better understand how people actually work in their day-to-day lives, rather than just relying on observations or self-reports. This could be useful for developing new tools and technologies that can better support people in their work, such as personalized digital assistants or systems that can anticipate a user's needs based on their context.

The dataset could also be used to study how people manage their personal information and how they switch between different tasks and projects throughout the day. This could lead to insights that help people be more productive and organized in their work.

Technical Explanation

The RLKWiC dataset was collected through a two-month longitudinal study of eight knowledge workers. Participants installed a data collection tool on their computers that logged user activities such as application usage, document access, email communication, and web browsing. The data was anonymized to protect the participants' privacy.
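The summary does not say how the anonymization was done. The sketch below shows one common technique, pseudonymization with a keyed hash (HMAC-SHA256). It replaces identifiers with stable tokens, so the same person still maps to the same token across events. This is an illustrative assumption, not the RLKWiC authors' method, and it is weaker than full anonymization. The field names (`user`, `email_from`, `email_to`) and `SECRET_KEY` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key known only to the data collectors (assumption for illustration).
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable token using HMAC-SHA256, truncated to 16 hex chars."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def pseudonymize_event(event: dict, fields=("user", "email_from", "email_to")) -> dict:
    """Return a copy of a logged activity event with the listed identifier fields replaced."""
    cleaned = dict(event)  # leave the original event untouched
    for field in fields:
        if cleaned.get(field) is not None:
            cleaned[field] = pseudonymize(cleaned[field])
    return cleaned


# Hypothetical logged event: identifiers are replaced, the application name is kept.
event = {"user": "alice@example.org", "app": "Thunderbird", "email_to": "bob@example.org"}
print(pseudonymize_event(event))
```

A keyed hash is used rather than a plain hash so that someone without the key cannot recover identities by hashing guessed email addresses.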

The dataset includes a wide range of features, such as timestamps, user identifiers, application names, file names, email headers, and web URLs. It also includes explicated contexts, textual contents, and semantic information. This rich data allows researchers to investigate patterns of knowledge work and to develop context-aware user models that can better support individual users.
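The sketch below shows how timestamped activity records like these can reveal work patterns. It computes the time spent per application and the number of application switches, a simple proxy for the task switching mentioned above. The record layout (`ActivityEvent` and its fields) is a hypothetical simplification, not the actual RLKWiC schema. The sketch also assumes each event lasts until the next logged event for the same user.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ActivityEvent:
    timestamp: datetime  # when the event was logged
    user: str            # (pseudonymized) user identifier
    app: str             # foreground application name
    resource: str = ""   # file name, URL, or email subject, if any (hypothetical field)


def dwell_times(events):
    """Seconds per application, assuming each event lasts until the next one.

    The last event has no successor, so its duration is not counted.
    """
    totals = defaultdict(float)
    ordered = sorted(events, key=lambda e: e.timestamp)
    for current, nxt in zip(ordered, ordered[1:]):
        totals[current.app] += (nxt.timestamp - current.timestamp).total_seconds()
    return dict(totals)


def app_switches(events):
    """Count how often consecutive events move to a different application."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if a.app != b.app)


# Hypothetical log for one user on one morning.
log = [
    ActivityEvent(datetime(2024, 1, 8, 9, 0, 0), "u1", "Outlook", "Weekly report"),
    ActivityEvent(datetime(2024, 1, 8, 9, 5, 0), "u1", "Word", "report.docx"),
    ActivityEvent(datetime(2024, 1, 8, 9, 25, 0), "u1", "Firefox", "https://example.org"),
    ActivityEvent(datetime(2024, 1, 8, 9, 30, 0), "u1", "Word", "report.docx"),
]
print(dwell_times(log))   # {'Outlook': 300.0, 'Word': 1200.0, 'Firefox': 300.0}
print(app_switches(log))  # 3
```

With data from several participants, the log would first be grouped by `user` so that durations and switches are never computed across different people's events.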

Critical Analysis

The RLKWiC dataset provides a valuable resource for researchers studying knowledge work and personal information management. However, there are some limitations to consider:

The dataset may not be representative of all knowledge workers: it covers only eight participants, recruited from a specific organizational context.
The data collection tool may have introduced some bias or distortion in the observed user behaviors.
The anonymization process may have removed or obscured some potentially relevant contextual information.

Additionally, the dataset does not capture other important aspects of knowledge work, such as subjective experiences, cognitive processes, or collaborative interactions. Further research would be needed to understand the broader context and implications of the observed user behaviors.

Conclusion

The RLKWiC dataset represents an important step forward in the study of real-life knowledge work. By providing detailed, context-rich data on user activities and interactions, this dataset opens up new avenues for research on personalized digital assistants, knowledge work support, and personal information management. While the dataset has some limitations, it sets the stage for more comprehensive and insightful investigations into the complex and dynamic nature of knowledge work in the modern digital landscape.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset

Mahta Bakhshizadeh, Christian Jilek, Markus Schroder, Heiko Maus, Andreas Dengel

Over the years, various approaches have been employed to enhance the productivity of knowledge workers, from addressing psychological well-being to the development of personal knowledge assistants. A significant challenge in this research area has been the absence of a comprehensive, publicly accessible dataset that mirrors real-world knowledge work. Although a handful of datasets exist, many are restricted in access or lack vital information dimensions, complicating meaningful comparison and benchmarking in the domain. This paper presents RLKWiC, a novel dataset of Real-Life Knowledge Work in Context, derived from monitoring the computer interactions of eight participants over a span of two months. As the first publicly available dataset offering a wealth of essential information dimensions (such as explicated contexts, textual contents, and semantics), RLKWiC seeks to address the research gap in the personal information management domain, providing valuable insights for modeling user behavior.

4/17/2024


Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets

Desiree Heim, Christian Jilek, Adrian Ulges, Andreas Dengel

Current publicly available knowledge work data collections lack diversity, extensive annotations, and contextual information about the users and their documents. These issues hinder objective and comparable data-driven evaluations and optimizations of knowledge work assistance systems. Due to the considerable resources needed to collect such data in real-life settings and the necessity of data censorship, collecting such a dataset appears nearly impossible. For this reason, we propose a configurable, multi-agent knowledge work dataset generator. This system simulates collaborative knowledge work among agents producing Large Language Model-generated documents and accompanying data traces. Additionally, the generator captures all background information, given in its configuration or created during the simulation process, in a knowledge graph. Finally, the resulting dataset can be utilized and shared without privacy or confidentiality concerns. This paper introduces our approach's design and vision and focuses on generating authentic knowledge work documents using Large Language Models. In our study, human raters assessed 53% of the generated and 74% of the real documents as realistic, which demonstrates the potential of our approach. Furthermore, we analyze the authenticity criteria mentioned in the participants' comments and elaborate on potential improvements for identified common issues.

9/9/2024

KNOW: A Real-World Ontology for Knowledge Capture with Large Language Models

Arto Bendiken

We present KNOW (the Knowledge Navigator Ontology for the World), the first ontology designed to capture everyday knowledge to augment large language models (LLMs) in real-world generative AI use cases such as personal AI assistants. Our domain is human life, both its everyday concerns and its major milestones. We have limited the initial scope of the modeled concepts to only established human universals: spacetime (places, events) plus social (people, groups, organizations). The inclusion criteria for modeled concepts are pragmatic, beginning with universality and utility. We compare and contrast previous work such as Schema.org and Cyc, as well as attempts at a synthesis of knowledge graphs and language models, noting how LLMs already encode internally much of the commonsense tacit knowledge that took decades to capture in the Cyc project. We also make available code-generated software libraries for the 12 most popular programming languages, enabling the direct use of ontology concepts in software engineering. We emphasize simplicity and developer experience in promoting AI interoperability.

5/31/2024

RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models

Zhuoran Jin, Pengfei Cao, Chenhao Wang, Zhitao He, Hongbang Yuan, Jiachun Li, Yubo Chen, Kang Liu, Jun Zhao

Large language models (LLMs) inevitably memorize sensitive, copyrighted, and harmful knowledge from the training corpus; therefore, it is crucial to erase this knowledge from the models. Machine unlearning is a promising solution for efficiently removing specific knowledge by modifying models post hoc. In this paper, we propose a Real-World Knowledge Unlearning benchmark (RWKU) for LLM unlearning. RWKU is designed based on the following three key factors: (1) For the task setting, we consider a more practical and challenging unlearning setting, where neither the forget corpus nor the retain corpus is accessible. (2) For the knowledge source, we choose 200 real-world famous people as the unlearning targets and show that such popular knowledge is widely present in various LLMs. (3) For the evaluation framework, we design the forget set and the retain set to evaluate the model's capabilities across various real-world applications. Regarding the forget set, we provide four membership inference attack (MIA) methods and nine kinds of adversarial attack probes to rigorously test unlearning efficacy. Regarding the retain set, we assess locality and utility in terms of neighbor perturbation, general ability, reasoning ability, truthfulness, factuality, and fluency. We conduct extensive experiments across two unlearning scenarios, two models, and six baseline methods and obtain some meaningful findings. We release our benchmark and code publicly at http://rwku-bench.github.io for future work.

6/18/2024