Is Medieval Distant Viewing Possible? : Extending and Enriching Annotation of Legacy Image Collections using Visual Analytics

Read original: arXiv:2208.09657 - Published 4/12/2024 by Christofer Meinecke, Estelle Guéville, David Joseph Wrisley, Stefan Janicke


Overview

The paper describes a method for working with historical image datasets, such as medieval manuscript images, that have conflicting or overlapping metadata.
The researchers aimed to (1) create a more uniform set of descriptive labels to bridge the combined dataset, and (2) establish a high-quality hierarchical classification that can be used as input for supervised machine learning.
To achieve these goals, the researchers developed visualization and interaction mechanisms that enable experts to combine, regularize, and extend the vocabulary used to describe these image datasets.

Plain English Explanation

The researchers were working with historical images, such as those from medieval manuscripts, whose descriptions were inconsistent or contradictory. Since fixing this metadata by hand would be very time-consuming, they wanted a way to create a more uniform set of labels for the images and to organize those labels into a clear hierarchy.

To do this, they developed interactive visual tools that let experts examine the relationships between the different labels and metadata. This helped the experts combine, organize, and expand the vocabulary used to describe the images. The goal was a high-quality set of labels and categories that could be used to train machine learning models on the historical image data.

Technical Explanation

The paper works with two pre-annotated sets of medieval manuscript images whose metadata conflict and overlap. Because manually reconciling the two legacy ontologies would be very expensive, the researchers aimed to (1) create a more uniform set of descriptive labels that bridges the combined dataset, and (2) establish a high-quality hierarchical classification that can serve as input for subsequent supervised machine learning.

To achieve these goals, the researchers developed visualization and interaction mechanisms that let medievalists combine, regularize, and extend the vocabulary used to describe these and other cognate image datasets. The visual interfaces give experts an overview of relationships in the data that goes beyond the sum total of the metadata. Word and image embeddings, together with co-occurrences of labels across the datasets, enable batch re-annotation of images and recommendation of label candidates, and support composing a hierarchical classification of labels.
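The two signals named above, label co-occurrence and embedding similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy image IDs, the labels, and the three-dimensional vectors are all hypothetical, and a real system would take its embeddings from a trained word or image model.

```python
from collections import Counter
from itertools import combinations
import math


def cooccurrence_counts(annotations):
    """Count how often each unordered pair of labels appears on the same image."""
    counts = Counter()
    for labels in annotations.values():
        # Sort and deduplicate so each pair has one canonical key.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def recommend_labels(query, embeddings, top_k=3):
    """Rank all other labels by embedding similarity to the query label."""
    q = embeddings[query]
    scored = [(other, cosine(q, vec))
              for other, vec in embeddings.items() if other != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: image IDs mapped to labels from two merged collections.
annotations = {
    "img_001": ["king", "crown", "throne"],
    "img_002": ["monarch", "crown"],
    "img_003": ["king", "crown"],
    "img_004": ["horse", "knight"],
}

# Hypothetical label embeddings (made-up vectors for illustration only).
embeddings = {
    "king": [0.9, 0.1, 0.0],
    "monarch": [0.85, 0.15, 0.05],
    "crown": [0.6, 0.4, 0.0],
    "horse": [0.0, 0.2, 0.9],
}

pairs = cooccurrence_counts(annotations)
suggestions = recommend_labels("king", embeddings, top_k=2)
```

In this toy data, "king" and "monarch" never appear on the same image, yet both co-occur with "crown" and their vectors are close, which is the kind of evidence an expert might use to merge two labels from different collections or to place them under a shared parent in the hierarchy.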

Critical Analysis

The paper addresses an important challenge in working with historical image datasets, where the quality and consistency of metadata can be a significant obstacle. The researchers' approach of leveraging expert knowledge and interactive visualization tools to regularize and extend the vocabulary is a reasonable solution, though the effectiveness would depend on the specific datasets and experts involved.

One potential limitation is the scalability of the approach, as manually reconciling and curating metadata can become prohibitively time-consuming for very large datasets. The researchers do not provide details on the size or complexity of the datasets they worked with, so it's unclear how well the method would scale.

Additionally, the paper does not discuss potential biases or blind spots that may be introduced by relying on expert knowledge. The experts' perspectives and interpretations could shape the final hierarchical classification, which in turn could affect the performance of any subsequent machine learning models.

Overall, the research presents a thoughtful approach to a challenging problem, but further investigation into the scalability and potential biases of the method would be valuable.

Conclusion

This paper describes an approach to working with historical image datasets that have inconsistent or conflicting metadata. By developing interactive visualization tools, the researchers enabled experts to combine, organize, and expand the vocabulary used to describe the images. The aim is a high-quality hierarchical classification that can serve as input for supervised machine learning models.

While the approach has some limitations in terms of scalability and potential biases, it represents an important step forward in making historical image collections more accessible and useful for research and applications. The techniques developed in this paper could be broadly applicable to other domains grappling with the challenges of working with legacy data.



Related Papers


Is Medieval Distant Viewing Possible? : Extending and Enriching Annotation of Legacy Image Collections using Visual Analytics

Christofer Meinecke, Estelle Guéville, David Joseph Wrisley, Stefan Janicke

Distant viewing approaches have typically used image datasets close to the contemporary image data used to train machine learning models. To work with images from other historical periods requires expert annotated data, and the quality of labels is crucial for the quality of results. Especially when working with cultural heritage collections that contain myriad uncertainties, annotating data, or re-annotating legacy data, is an arduous task. In this paper, we describe working with two pre-annotated sets of medieval manuscript images that exhibit conflicting and overlapping metadata. Since a manual reconciliation of the two legacy ontologies would be very expensive, we aim (1) to create a more uniform set of descriptive labels to serve as a bridge in the combined dataset, and (2) to establish a high-quality hierarchical classification that can be used as a valuable input for subsequent supervised machine learning. To achieve these goals, we developed visualization and interaction mechanisms, enabling medievalists to combine, regularize and extend the vocabulary used to describe these, and other cognate, image datasets. The visual interfaces provide experts an overview of relationships in the data going beyond the sum total of the metadata. Word and image embeddings as well as co-occurrences of labels across the datasets enable batch re-annotation of images, recommendation of label candidates and support composing a hierarchical classification of labels.

4/12/2024

Multimodal Metadata Assignment for Cultural Heritage Artifacts

Luis Rei, Dunja Mladenić, Mareike Dorozynski, Franz Rottensteiner, Thomas Schleider, Raphael Troncy, Jorge Sebastián Lozano, Mar Gaitán Salvatella

We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers and use the focal loss to handle class imbalance. Tabular data and late fusion are handled by Gradient Tree Boosting. We also show how we leveraged specific data models and taxonomy in a Knowledge Graph to create the dataset and to store classification results. All individual classifiers accurately predict missing properties in the digitized silk artifacts, with the multimodal approach providing the best results.

6/4/2024

Annotating Ambiguous Images: General Annotation Strategy for High-Quality Data with Real-World Biomedical Validation

Lars Schmarje, Vasco Grossmann, Claudius Zelenka, Johannes Brunger, Reinhard Koch

In the field of image classification, existing methods often struggle with biased or ambiguous data, a prevalent issue in real-world scenarios. Current strategies, including semi-supervised learning and class blending, offer partial solutions but lack a definitive resolution. Addressing this gap, our paper introduces a novel strategy for generating high-quality labels in challenging datasets. Central to our approach is a clearly designed flowchart, based on a broad literature review, which enables the creation of reliable labels. We validate our methodology through a rigorous real-world test case in the biomedical field, specifically in deducing height reduction from vertebral imaging. Our empirical study, leveraging over 250,000 annotations, demonstrates the effectiveness of our strategy's decisions compared to their alternatives.

4/30/2024


No Need to Sacrifice Data Quality for Quantity: Crowd-Informed Machine Annotation for Cost-Effective Understanding of Visual Data

Christopher Klugmann, Rafid Mahmood, Guruprasad Hegde, Amit Kale, Daniel Kondermann

Labeling visual data is expensive and time-consuming. Crowdsourcing systems promise to enable highly parallelizable annotations through the participation of monetarily or otherwise motivated workers, but even this approach has its limits. The solution: replace manual work with machine work. But how reliable are machine annotators? Sacrificing data quality for high throughput cannot be acceptable, especially in safety-critical applications such as autonomous driving. In this paper, we present a framework that enables quality checking of visual data at large scales without sacrificing the reliability of the results. We ask annotators simple questions with discrete answers, which can be highly automated using a convolutional neural network trained to predict crowd responses. Unlike the methods of previous work, which aim to directly predict soft labels to address human uncertainty, we use per-task posterior distributions over soft labels as our training objective, leveraging a Dirichlet prior for analytical accessibility. We demonstrate our approach on two challenging real-world automotive datasets, showing that our model can fully automate a significant portion of tasks, saving costs in the high double-digit percentage range. Our model reliably predicts human uncertainty, allowing for more accurate inspection and filtering of difficult examples. Additionally, we show that the posterior distributions over soft labels predicted by our model can be used as priors in further inference processes, reducing the need for numerous human labelers to approximate true soft labels accurately. This results in further cost reductions and more efficient use of human resources in the annotation process.

9/4/2024