Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset

Read original: arXiv:2408.06507 - Published 8/14/2024 by Stefano Puliti, Emily R. Lines, Jana Müllerová, Julian Frey, Zoe Schindler, Adrian Straker, Matthew J. Allen, Lukas Winiwarter, Nataliia Rehush, Hristina Hristova and 23 others


Overview

Laser scanning offers great potential for automating forest data collection
Deep learning (DL) shows promise for automating tree species identification from laser scans
However, progress is hindered by the lack of large, diverse, openly available labeled datasets of single-tree point clouds

Plain English Explanation

Deep learning is a machine learning technique that can learn to identify tree species automatically from laser scanning data. Terrestrial, mobile, and drone laser scanners capture detailed 3D information about forests. Combining this data with deep learning could automate species identification, which matters for forest management and research.

However, a key challenge has been the lack of large, diverse datasets of labeled tree point clouds that deep learning models can be trained on. Without these datasets, the models struggle to accurately and robustly identify tree species, especially when encountering new, unseen data. This has slowed progress in developing effective deep learning-based tree species classification systems.

Technical Explanation

To address this challenge, the researchers created the FOR-species20K benchmark dataset. It contains over 20,000 tree point clouds from 33 species, captured using terrestrial (TLS), mobile (MLS), and drone laser scanning (ULS), mostly across European forests, with some data from other regions.

The researchers then used this dataset to benchmark several deep learning models for tree species classification. These fall into two groups: point cloud-based methods (PointNet++, MinkNet, MLP-Mixer, DGCNN) and multi-view image-based methods (SimpleView, DetailView, YOLOv5).
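To illustrate what "image-based" can mean for a point cloud, here is a minimal sketch that projects a single tree's points onto a vertical plane and counts points per pixel. It is a generic illustration under stated assumptions, not the preprocessing used by SimpleView, DetailView, or YOLOv5 in the paper; the function name `side_view_image`, the grid size, and the toy tree are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z), with z as height


def side_view_image(points: Sequence[Point], size: int = 8,
                    axis: int = 0) -> List[List[int]]:
    """Project a 3D point cloud onto a vertical plane, counting points per pixel.

    axis=0 looks along x (image columns follow y); axis=1 looks along y
    (image columns follow x). Rows follow height, with the treetop in row 0.
    """
    if not points:
        raise ValueError("empty point cloud")
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")

    horiz = 1 if axis == 0 else 0
    hs = [p[horiz] for p in points]
    zs = [p[2] for p in points]
    h_min, z_min = min(hs), min(zs)

    # One shared scale for both directions preserves the tree's aspect ratio.
    extent = max(max(hs) - h_min, max(zs) - z_min) or 1.0

    image = [[0] * size for _ in range(size)]
    for h, z in zip(hs, zs):
        col = min(int((h - h_min) / extent * size), size - 1)
        row = size - 1 - min(int((z - z_min) / extent * size), size - 1)
        image[row][col] += 1
    return image


if __name__ == "__main__":
    # Toy "tree": a vertical trunk plus a wider crown at the top.
    trunk = [(0.0, 0.0, float(z)) for z in range(0, 6)]
    crown = [(0.0, float(y), 7.0) for y in (-2, -1, 0, 1, 2)]
    for row in side_view_image(trunk + crown, size=8, axis=0):
        print("".join("#" if count else "." for count in row))
```

Rendering the same tree from several viewing directions would give a set of 2D images that a standard image classifier could consume, which is the general idea behind multi-view approaches.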

The 2D image-based models generally performed better (average overall accuracy of 0.77) than the 3D point cloud-based models (average overall accuracy of 0.72), with consistent results across scanning platforms and sensors. The top-performing model, DetailView, was particularly robust, handling data imbalances well and generalizing effectively across tree sizes.
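Overall accuracy (OA) is the fraction of trees whose predicted species matches the label. The sketch below computes it alongside per-class recall on made-up labels, to show why class imbalance matters when reading an OA figure. The labels and predictions are toy data, not results from the paper, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose prediction equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """For each true class, the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in totals.items()}


if __name__ == "__main__":
    # Toy, imbalanced data: 8 "spruce" trees and 2 "beech" trees.
    y_true = ["spruce"] * 8 + ["beech"] * 2
    # A degenerate classifier that always predicts the majority class.
    y_pred = ["spruce"] * 10

    recalls = per_class_recall(y_true, y_pred)
    print(overall_accuracy(y_true, y_pred))       # 0.8
    print(recalls)                                # {'spruce': 1.0, 'beech': 0.0}
    print(sum(recalls.values()) / len(recalls))   # 0.5
```

In this toy case the OA looks respectable even though the minority species is never recognized, which is why robustness to data imbalance is worth reporting alongside OA.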

Critical Analysis

The FOR-species20K dataset is a valuable contribution to the field, as it provides a large, diverse, and openly available resource for developing and benchmarking deep learning models for tree species classification from laser scanning data. This helps address the lack of suitable datasets that has previously hindered progress in this area.

However, the paper does not delve into potential limitations or areas for further research. For example, it would be interesting to understand how the performance of these models might be affected by factors such as the quality of the laser scanning data, the presence of occlusions or understory vegetation, or the geographic and environmental diversity of the training and test data.

Additionally, the researchers could have explored the interpretability and explainability of the deep learning models, as understanding the features and decision-making processes used by the models could lead to valuable insights for improving their performance and robustness.

Conclusion

The FOR-species20K dataset and the accompanying model benchmark are an important step toward automating forest data capture and analysis. By providing a large, diverse, and openly available dataset, the researchers have laid a foundation for future work in this field, which could ultimately lead to more efficient forest management and research.



Related Papers


Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset

Stefano Puliti, Emily R. Lines, Jana Müllerová, Julian Frey, Zoe Schindler, Adrian Straker, Matthew J. Allen, Lukas Winiwarter, Nataliia Rehush, Hristina Hristova, Brent Murray, Kim Calders, Louise Terryn, Nicholas Coops, Bernhard Hofle, Samuli Junttila, Martin Krůček, Grzegorz Krok, Kamil Král, Shaun R. Levick, Linda Luck, Azim Missarov, Martin Mokroš, Harry J. F. Owen, Krzysztof Stereńczak, Timo P. Pitkanen, Nicola Puletti, Ninni Saarinen, Chris Hopkinson, Chiara Torresan, Enrico Tomelleri, Hannah Weiser, Rasmus Astrup

Proximally-sensed laser scanning offers significant potential for automated forest data capture, but challenges remain in automatically identifying tree species without additional ground data. Deep learning (DL) shows promise for automation, yet progress is slowed by the lack of large, diverse, openly available labeled datasets of single tree point clouds. This has impacted the robustness of DL models and the ability to establish best practices for species classification. To overcome these challenges, the FOR-species20K benchmark dataset was created, comprising over 20,000 tree point clouds from 33 species, captured using terrestrial (TLS), mobile (MLS), and drone laser scanning (ULS) across various European forests, with some data from other regions. This dataset enables the benchmarking of DL models for tree species classification, including both point cloud-based (PointNet++, MinkNet, MLP-Mixer, DGCNNs) and multi-view image-based methods (SimpleView, DetailView, YOLOv5). 2D image-based models generally performed better (average OA = 0.77) than 3D point cloud-based models (average OA = 0.72), with consistent results across different scanning platforms and sensors. The top model, DetailView, was particularly robust, handling data imbalances well and generalizing effectively across tree sizes. The FOR-species20K dataset, available at https://zenodo.org/records/13255198, is a key resource for developing and benchmarking DL models for tree species classification using laser scanning data, providing a foundation for future advancements in the field.

8/14/2024

PureForest: A Large-scale Aerial Lidar and Aerial Imagery Dataset for Tree Species Classification in Monospecific Forests

Charles Gaydon, Floryne Roche

Knowledge of tree species distribution is fundamental to managing forests. New deep learning approaches promise significant accuracy gains for forest mapping, and are becoming a critical tool for mapping multiple tree species at scale. To advance the field, deep learning researchers need large benchmark datasets with high-quality annotations. To this end, we present the PureForest dataset: a large-scale, open, multimodal dataset designed for tree species classification from both Aerial Lidar Scanning (ALS) point clouds and Very High Resolution (VHR) aerial images. Most current public Lidar datasets for tree species classification have low diversity as they only span a small area of a few dozen annotated hectares at most. In contrast, PureForest has 18 tree species grouped into 13 semantic classes, and spans 339 km² across 449 distinct monospecific forests, and is to date the largest and most comprehensive Lidar dataset for the identification of tree species. By making PureForest publicly available, we hope to provide a challenging benchmark dataset to support the development of deep learning approaches for tree species identification from Lidar and/or aerial imagery. In this data paper, we describe the annotation workflow, the dataset, the recommended evaluation methodology, and establish a baseline performance from both 3D and 2D modalities.

5/15/2024

Mining Field Data for Tree Species Recognition at Scale

Dimitri Gominski, Daniel Ortiz-Gonzalo, Martin Brandt, Maurice Mugabowindekwe, Rasmus Fensholt

Individual tree species labels are particularly hard to acquire due to the expert knowledge needed and the limitations of photointerpretation. Here, we present a methodology to automatically mine species labels from public forest inventory data, using available pretrained tree detection models. We identify tree instances in aerial imagery and match them with field data with close to zero human involvement. We conduct a series of experiments on the resulting dataset, and show a beneficial effect when adding noisy or even unlabeled data points, highlighting a strong potential for large-scale individual species mapping.

8/29/2024

Towards general deep-learning-based tree instance segmentation models

Jonathan Henrich, Jan van Delden

The segmentation of individual trees from forest point clouds is a crucial task for downstream analyses such as carbon sequestration estimation. Recently, deep-learning-based methods have been proposed which show the potential of learning to segment trees. Since these methods are trained in a supervised way, the question arises how general models can be obtained that are applicable across a wide range of settings. So far, training has been mainly conducted with data from one specific laser scanning type and for specific types of forests. In this work, we train one segmentation model under various conditions, using seven diverse datasets found in literature, to gain insights into the generalization capabilities under domain-shift. Our results suggest that a generalization from coniferous dominated sparse point clouds to deciduous dominated high-resolution point clouds is possible. Conversely, qualitative evidence suggests that generalization from high-resolution to low-resolution point clouds is challenging. This emphasizes the need for forest point clouds with diverse data characteristics for model development. To enrich the available data basis, labeled trees from two previous works were propagated to the complete forest point cloud and are made publicly available at https://doi.org/10.25625/QUTUWU.

5/6/2024