WildlifeReID-10k: Wildlife re-identification dataset with 10k individual animals

Read original: arXiv:2406.09211 - Published 6/18/2024 by Lukáš Adam, Vojtěch Čermák, Kostas Papafitsoros, Lukas Picek

Overview

A new wildlife re-identification dataset, WildlifeReID-10k, with more than 214k images of 10k individual animals
A diverse range of animals, from marine turtles to African herbivores
An argument that standard random splits are inadequate for wildlife re-identification, and a proposed similarity-aware split in their place
Similarity-aware splits for both closed-set and open-set settings, plus a baseline using MegaDescriptor
Dataset and code publicly available in the wildlife-datasets library

Plain English Explanation

The researchers have created a new dataset called WildlifeReID-10k that contains over 214,000 images of 10,000 individual animals. This dataset is a collection of 30 existing wildlife re-identification datasets, with additional processing steps. The animals included are extremely diverse, ranging from marine turtles and primates to birds, African herbivores, marine mammals, and even domestic animals.

The researchers argue that the standard method of randomly splitting a dataset into training and testing sets is not adequate for wildlife re-identification. Many images in wildlife datasets look very alike, so a random split can place nearly the same image in both sets. The test then rewards recognizing pictures the model has effectively already seen, rather than measuring its ability to distinguish between individual animals. Instead, they propose a new "similarity-aware" split, which takes the similarity of images into account when dividing the data.

To support fair comparison between methods, the dataset includes similarity-aware splits for both "closed-set" and "open-set" settings. The authors also provide a baseline performance using MegaDescriptor, a foundational model for wildlife re-identification, and host a leaderboard with the best results. Finally, they have made the dataset and the code used to create it publicly available in the wildlife-datasets library, making it easy for others to use and build upon.

Technical Explanation

The researchers have created a new wildlife re-identification dataset called WildlifeReID-10k, which contains over 214,000 images of 10,000 individual animals. This dataset is a curated collection of 30 existing wildlife re-identification datasets, with additional processing steps to ensure consistency and high quality.

The animals included in the dataset are highly diverse, ranging from marine turtles and primates to birds, African herbivores, marine mammals, and domestic animals. The researchers argue that this diversity is important for evaluating the performance of wildlife re-identification models, as they need to be able to handle a wide range of animal species and appearances.

A key contribution of this work is a new "similarity-aware" dataset split, proposed in place of the standard random split. The authors argue that similar images are so ubiquitous in wildlife datasets that a random split cannot properly test a model's ability to distinguish between individual animals. The similarity-aware split is based on the similarity of extracted features: similar images are grouped together, so that the training and testing sets are more distinct.
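The idea can be illustrated with a minimal sketch. This is not the authors' procedure or the wildlife-datasets implementation: the function names, the cosine-similarity threshold, and the union-find clustering are all assumptions chosen for illustration. It links images of the same individual whose feature vectors are very similar, then assigns each resulting cluster as a whole to either the training or the testing set.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_aware_split(features, identities, threshold=0.95,
                           test_fraction=0.5, seed=0):
    """Return (train_idx, test_idx); similar images never straddle the split.

    The threshold and the clustering rule are illustrative assumptions.
    """
    n = len(features)
    parent = list(range(n))  # union-find forest over image indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link images of the same individual whose features are very similar.
    for i in range(n):
        for j in range(i + 1, n):
            same_animal = identities[i] == identities[j]
            if same_animal and cosine(features[i], features[j]) >= threshold:
                parent[find(i)] = find(j)

    # Gather the clusters, then group them by individual.
    clusters = defaultdict(list)
    for i in range(n):
        clusters[find(i)].append(i)
    by_identity = defaultdict(list)
    for members in clusters.values():
        by_identity[identities[members[0]]].append(members)

    rng = random.Random(seed)
    train, test = [], []
    for groups in by_identity.values():
        rng.shuffle(groups)
        quota = test_fraction * sum(len(g) for g in groups)
        n_test = 0
        for k, members in enumerate(groups):
            # The first cluster always goes to training, so every individual
            # is seen during training; later clusters fill the test quota.
            if k > 0 and n_test < quota:
                test.extend(members)
                n_test += len(members)
            else:
                train.extend(members)
    return sorted(train), sorted(test)


# Toy data: images 0/1 and 2/3 are near-duplicates of individual "a".
features = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99], [1.0, 1.0]]
identities = ["a", "a", "a", "a", "b"]
train_idx, test_idx = similarity_aware_split(features, identities)
print(train_idx, test_idx)
```

In the toy data, each near-duplicate pair lands entirely on one side of the split, which is the property a random split cannot guarantee. The pairwise comparison is quadratic in the number of images, so a dataset of this size would need a more scalable clustering step.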

To promote fair method comparison, the researchers have included similarity-aware splits for both "closed-set" and "open-set" settings. In the closed-set setting, every individual in the testing set also appears in the training set, while in the open-set setting, some individuals are present only in the testing set. They have also provided a baseline performance using MegaDescriptor, a foundational model for wildlife re-identification, and they host a leaderboard with the best results.
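The difference between the two settings can be sketched as follows. This is a simplified illustration that splits at random within each individual and ignores image similarity; the function names and the fractions are assumptions, not the wildlife-datasets API.

```python
import random
from collections import defaultdict


def closed_set_split(identities, test_fraction=0.2, seed=0):
    """Every test individual also appears in training.

    Individuals with a single image stay in training only.
    """
    rng = random.Random(seed)
    by_identity = defaultdict(list)
    for idx, identity in enumerate(identities):
        by_identity[identity].append(idx)
    train, test = [], []
    for indices in by_identity.values():
        rng.shuffle(indices)
        # Hold out at least one image, but always keep one for training.
        n_test = min(len(indices) - 1,
                     max(1, round(test_fraction * len(indices))))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def open_set_split(identities, unseen_fraction=0.25, test_fraction=0.2, seed=0):
    """Like the closed-set split, but some individuals appear only in testing."""
    rng = random.Random(seed)
    unique = sorted(set(identities))
    rng.shuffle(unique)
    n_unseen = max(1, round(unseen_fraction * len(unique)))
    unseen = set(unique[:n_unseen])

    # Split the remaining (known) individuals in the closed-set manner.
    known_idx = [i for i, ident in enumerate(identities) if ident not in unseen]
    train_k, test_k = closed_set_split(
        [identities[i] for i in known_idx], test_fraction, seed)
    train = [known_idx[i] for i in train_k]
    test = [known_idx[i] for i in test_k]

    # All images of the unseen individuals go to the test set.
    test += [i for i, ident in enumerate(identities) if ident in unseen]
    return sorted(train), sorted(test)


identities = ["a"] * 5 + ["b"] * 5 + ["c"] * 5 + ["d"] * 5
closed_train, closed_test = closed_set_split(identities)
open_train, open_test = open_set_split(identities)
print(len(closed_train), len(closed_test), len(open_train), len(open_test))
```

In the open-set case, a method has to recognize that a test image belongs to an individual it has never seen, in addition to matching the known ones.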

Finally, the researchers have made the entire WildlifeReID-10k dataset and the code used to create it publicly available in the wildlife-datasets library, ensuring that the dataset is both highly curated and easy for others to use and build upon.

Critical Analysis

The researchers have made a significant contribution to the field of wildlife re-identification by creating the WildlifeReID-10k dataset. The diversity of animals included and the proposed similarity-aware dataset split are important advances that address limitations of previous datasets and evaluation methods.

However, the researchers do acknowledge some potential limitations of their work. For example, they note that the dataset may still not capture the full range of variation in wildlife appearances, and that the similarity-aware split may not be the only way to properly evaluate wildlife re-identification models.

Additionally, while the researchers have provided a baseline performance using the MegaDescriptor model, it would be interesting to see how other state-of-the-art models perform on this dataset. This could help researchers better understand the current capabilities and limitations of wildlife re-identification systems.

Overall, the WildlifeReID-10k dataset and the proposed similarity-aware split represent an important step forward in the field of wildlife re-identification. Because the dataset and code are publicly available, others can build upon this work and continue advancing the state of the art.

Conclusion

The researchers have introduced a new wildlife re-identification dataset called WildlifeReID-10k, which contains over 214,000 images of 10,000 individual animals from a diverse range of species. They argue that the standard random dataset splits are inadequate for wildlife re-identification tasks and propose a new similarity-aware split to better evaluate model performance.

The researchers have included similarity-aware splits for both closed-set and open-set settings, and have provided a baseline performance using the MegaDescriptor model. By making the dataset and code publicly available, the researchers have created a valuable resource for the wildlife re-identification community, enabling further research and advancement in this important field.

Related Papers

WildlifeReID-10k: Wildlife re-identification dataset with 10k individual animals

Lukáš Adam, Vojtěch Čermák, Kostas Papafitsoros, Lukas Picek

We introduce a new wildlife re-identification dataset WildlifeReID-10k with more than 214k images of 10k individual animals. It is a collection of 30 existing wildlife re-identification datasets with additional processing steps. WildlifeReID-10k contains animals as diverse as marine turtles, primates, birds, African herbivores, marine mammals and domestic animals. Due to the ubiquity of similar images in datasets, we argue that the standard (random) splits into training and testing sets are inadequate for wildlife re-identification and propose a new similarity-aware split based on the similarity of extracted features. To promote fair method comparison, we include similarity-aware splits both for closed-set and open-set settings, use MegaDescriptor - a foundational model for wildlife re-identification - for baseline performance and host a leaderboard with the best results. We publicly publish the dataset and the codes used to create it in the wildlife-datasets library, making WildlifeReID-10k both highly curated and easy to use.

6/18/2024

SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification

Lukáš Adam, Vojtěch Čermák, Kostas Papafitsoros, Lukáš Picek

This paper introduces the first public large-scale, long-span dataset with sea turtle photographs captured in the wild -- SeaTurtleID2022 (https://www.kaggle.com/datasets/wildlifedatasets/seaturtleid2022). The dataset contains 8729 photographs of 438 unique individuals collected within 13 years, making it the longest-spanned dataset for animal re-identification. All photographs include various annotations, e.g., identity, encounter timestamp, and body parts segmentation masks. Instead of standard random splits, the dataset allows for two realistic and ecologically motivated splits: (i) a time-aware closed-set with training, validation, and test data from different days/years, and (ii) a time-aware open-set with new unknown individuals in test and validation sets. We show that time-aware splits are essential for benchmarking re-identification methods, as random splits lead to performance overestimation. Furthermore, a baseline instance segmentation and re-identification performance over various body parts is provided. Finally, an end-to-end system for sea turtle re-identification is proposed and evaluated. The proposed system based on Hybrid Task Cascade for head instance segmentation and ArcFace-trained feature-extractor achieved an accuracy of 86.8%.

5/1/2024

Addressing the Elephant in the Room: Robust Animal Re-Identification with Unsupervised Part-Based Feature Alignment

Yingxue Yu, Vidit Vidit, Andrey Davydov, Martin Engilberge, Pascal Fua

Animal Re-ID is crucial for wildlife conservation, yet it faces unique challenges compared to person Re-ID. First, the scarcity and lack of diversity in datasets lead to background-biased models. Second, animal Re-ID depends on subtle, species-specific cues, further complicated by variations in pose, background, and lighting. This study addresses background biases by proposing a method to systematically remove backgrounds in both training and evaluation phases. And unlike prior works that depend on pose annotations, our approach utilizes an unsupervised technique for feature alignment across body parts and pose variations, enhancing practicality. Our method achieves superior results on three key animal Re-ID datasets: ATRW, YakReID-103, and ELPephants.

5/24/2024

PetFace: A Large-Scale Dataset and Benchmark for Animal Identification

Risa Shinoda, Kaede Shiohara

Automated animal face identification plays a crucial role in the monitoring of behaviors, conducting of surveys, and finding of lost animals. Despite the advancements in human face identification, the lack of datasets and benchmarks in the animal domain has impeded progress. In this paper, we introduce the PetFace dataset, a comprehensive resource for animal face identification encompassing 257,484 unique individuals across 13 animal families and 319 breed categories, including both experimental and pet animals. This large-scale collection of individuals facilitates the investigation of unseen animal face verification, an area that has not been sufficiently explored in existing datasets due to the limited number of individuals. Moreover, PetFace also has fine-grained annotations such as sex, breed, color, and pattern. We provide multiple benchmarks including re-identification for seen individuals and verification for unseen individuals. The models trained on our dataset outperform those trained on prior datasets, even for detailed breed variations and unseen animal families. Our result also indicates that there is some room to improve the performance of integrated identification on multiple animal families. We hope the PetFace dataset will facilitate animal face identification and encourage the development of non-invasive animal automatic identification methods.

8/21/2024