EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos

Read original: arXiv:2405.19644 - Published 5/31/2024 by Ryo Fujii, Masashi Hatano, Hideo Saito, Hiroki Kajita

Overview

This paper introduces EgoSurgery-Phase, a dataset of egocentric open surgery videos for the task of surgical phase recognition.
The dataset comprises 15 hours of real open surgery video spanning 9 distinct surgical phases, captured with a camera attached to the surgeon's head, and provides eye-gaze data alongside the video.
The researchers propose a gaze-guided masked autoencoder (GGMAE), which uses the surgeon's gaze to guide the masking process when learning representations for phase recognition.

Plain English Explanation

The EgoSurgery-Phase dataset is a collection of videos recorded from the perspective of a surgeon during open surgery procedures. These "egocentric" videos provide a unique view of the surgical workflow, capturing the surgeon's movements and interactions with the surgical site.

The key challenge addressed in this research is automatically recognizing the different phases or steps within a surgical procedure. For example, during a hip replacement, there may be distinct phases like incision, tissue dissection, implant insertion, and closing. Being able to accurately identify these phases in real time could assist surgeons and provide valuable insights for surgical training and optimization.

To tackle this challenge, the researchers developed a dataset that includes annotations from surgical experts, who have labeled the various phases present in each video. This dataset can then be used to train machine learning models to learn the visual patterns and transitions associated with different surgical phases.

The paper explores the use of a technique called masked autoencoder modeling, which has shown promise for learning useful representations from video data in a self-supervised manner, that is, without manual labels. By intentionally masking or hiding certain parts of the input video and asking the model to reconstruct them, the model is forced to learn a more robust and generalizable understanding of the underlying surgical activities. In this paper, the surgeon's eye gaze guides which regions are masked.

Technical Explanation

The EgoSurgery-Phase dataset comprises 15 hours of real open surgery video spanning 9 distinct surgical phases, all captured with an egocentric camera attached to the surgeon's head. In addition to video and phase annotations, the dataset provides the surgeon's eye gaze. According to the authors, it is the first publicly available real open surgery video dataset for surgical phase recognition.
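Phase annotations of this kind are usually expanded into one label per frame before a model is trained on them. The following is a minimal sketch of that step, not the dataset's actual format: the `(start_second, end_second, phase_name)` tuple layout, the function name `segments_to_frame_labels`, and the phase names in the usage note are all hypothetical.

```python
def segments_to_frame_labels(segments, num_frames, fps, background="unlabeled"):
    """Expand phase segments into a per-frame label list.

    segments: iterable of (start_second, end_second, phase_name) tuples.
              This layout is an assumption made for illustration only.
    num_frames: total number of frames in the video.
    fps: frames per second of the video.
    """
    labels = [background] * num_frames
    for start_s, end_s, phase in segments:
        # Convert times to frame indices and clamp them to the video length.
        first = max(0, int(round(start_s * fps)))
        last = min(num_frames, int(round(end_s * fps)))
        for frame in range(first, last):  # end frame is exclusive
            labels[frame] = phase
    return labels
```

For instance, `segments_to_frame_labels([(0.0, 2.0, "incision"), (2.0, 3.0, "closure")], num_frames=8, fps=2)` labels the first four frames "incision", the next two "closure", and leaves the last two "unlabeled".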

To tackle surgical phase recognition, the researchers propose a gaze-guided masked autoencoder (GGMAE). A masked autoencoder is trained to reconstruct the original video from a partially masked input, which pushes it to learn a representation of the surgical activity without labels. In GGMAE, the surgeon's gaze acts as a prior on which spatial regions are semantically rich (for example, the surgical field) and guides the masking process accordingly. The learned representations are then used for the downstream task of phase classification.
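The masking and reconstruction steps can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, patches are represented as plain lists of numbers, and the reconstruction error is computed over masked patches only, as is common for masked autoencoders. The paper's abstract says gaze guides the masking process but does not give the sampling rule, so the optional `weights` argument simply assumes that patches with a larger gaze weight are more likely to be masked.

```python
import math
import random


def sample_mask(num_patches, mask_ratio, weights=None, seed=0):
    """Return the sorted indices of the patches to hide from the encoder.

    With weights=None every patch is equally likely to be masked.
    With weights (e.g. values from a gaze heatmap, an assumption here),
    patches with larger weights are more likely to be masked.
    """
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    if weights is None:
        weights = [1.0] * num_patches
    eps = 1e-8  # keeps the division defined for zero weights
    keys = []
    for idx, w in enumerate(weights):
        u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is finite
        # Efraimidis-Spirakis key: weighted sampling without replacement.
        keys.append((math.log(u) / (w + eps), idx))
    keys.sort(reverse=True)
    return sorted(idx for _, idx in keys[:num_masked])


def masked_mse(original, reconstructed, masked_indices):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked_indices:
        for a, b in zip(original[i], reconstructed[i]):
            total += (a - b) ** 2
            count += 1
    return total / count if count else 0.0
```

Calling `sample_mask(16, 0.75)` hides 12 of 16 patches uniformly at random, while passing a weight list concentrates the mask on the highly weighted patches. Whether the original method masks gaze regions more or less often than the rest of the frame should be checked against the paper.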

On EgoSurgery-Phase, the authors report that GGMAE improves on the previous state-of-the-art recognition method by 6.4% in Jaccard and on the masked autoencoder-based method by 3.1% in Jaccard, which supports the use of gaze-guided self-supervised learning for surgical video understanding.
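The paper's abstract reports its improvements in terms of the Jaccard index, which for a given phase is the number of frames where the prediction and the ground truth both name that phase, divided by the number of frames where either does. The sketch below computes it from per-frame labels; the function names are hypothetical, phases are assumed to be integer IDs, and how the paper aggregates the score (for example, per video or over the whole test set) is not stated in the abstract.

```python
def phase_jaccard(y_true, y_pred, num_phases):
    """Per-phase Jaccard index from frame-level labels.

    Phases that appear in neither sequence are skipped, since their
    union is empty and the ratio is undefined.
    """
    scores = {}
    for phase in range(num_phases):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == phase and p == phase)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == phase or p == phase)
        if union > 0:
            scores[phase] = inter / union
    return scores


def mean_jaccard(y_true, y_pred, num_phases):
    """Average of the per-phase Jaccard scores that are defined."""
    scores = phase_jaccard(y_true, y_pred, num_phases)
    return sum(scores.values()) / len(scores) if scores else 0.0
```

With ground truth `[0, 0, 1, 1, 2, 2]` and predictions `[0, 1, 1, 1, 2, 0]`, the per-phase scores are 1/3, 2/3, and 1/2, so the mean Jaccard is 0.5.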

Critical Analysis

The EgoSurgery-Phase dataset is a valuable resource for the research community: most existing phase recognition methods target minimally invasive surgery, largely because public open surgery video datasets are scarce. The authors describe it as the first publicly available real open surgery video dataset for phase recognition, and the accompanying eye-gaze data is a further strength.

However, the dataset is limited to open surgery videos, and further research would be needed to explore the applicability of the techniques to minimally invasive or robotic surgical procedures, which have different visual characteristics.

Additionally, the paper does not provide a detailed analysis of the performance of the masked autoencoder models across different surgical phases or procedures. A more in-depth evaluation could shed light on the model's strengths, weaknesses, and potential areas for improvement.

While self-supervised learning is a promising direction, the abstract reports comparisons only against the previous state-of-the-art recognition method and a masked autoencoder-based method. A comparison with other self-supervised techniques, such as contrastive learning, would help contextualize the findings and identify the most effective strategies for surgical video understanding.

Conclusion

The EgoSurgery-Phase dataset and the exploration of masked autoencoder models for surgical phase recognition represent an important step forward in the field of computer-assisted surgery. By leveraging egocentric video data and self-supervised learning techniques, this research has the potential to enhance surgical workflows, improve training programs, and ultimately contribute to better patient outcomes.

As the field of computer vision continues to advance, the integration of these technologies into the surgical domain will become increasingly crucial. The insights gained from this work can serve as a foundation for future research, ultimately leading to more intelligent and assistive systems for healthcare professionals.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos

Ryo Fujii, Masashi Hatano, Hideo Saito, Hiroki Kajita

Surgical phase recognition has gained significant attention due to its potential to offer solutions to numerous demands of the modern operating room. However, most existing methods concentrate on minimally invasive surgery (MIS), leaving surgical phase recognition for open surgery understudied. This discrepancy is primarily attributed to the scarcity of publicly available open surgery video datasets for surgical phase recognition. To address this issue, we introduce a new egocentric open surgery video dataset for phase recognition, named EgoSurgery-Phase. This dataset comprises 15 hours of real open surgery videos spanning 9 distinct surgical phases all captured using an egocentric camera attached to the surgeon's head. In addition to video, the EgoSurgery-Phase offers eye gaze. As far as we know, it is the first real open surgery video dataset for surgical phase recognition publicly available. Furthermore, inspired by the notable success of masked autoencoders (MAEs) in video understanding tasks (e.g., action recognition), we propose a gaze-guided masked autoencoder (GGMAE). Considering the regions where surgeons' gaze focuses are often critical for surgical phase recognition (e.g., surgical field), in our GGMAE, the gaze information acts as an empirical semantic richness prior to guiding the masking process, promoting better attention to semantically rich spatial regions. GGMAE significantly improves the previous state-of-the-art recognition method (6.4% in Jaccard) and the masked autoencoder-based method (3.1% in Jaccard) on EgoSurgery-Phase. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.

5/31/2024

EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos

Ryo Fujii, Hideo Saito, Hiroki Kajita

Surgical tool detection is a fundamental task for understanding egocentric open surgery videos. However, detecting surgical tools presents significant challenges due to their highly imbalanced class distribution, similar shapes and similar textures, and heavy occlusion. The lack of a comprehensive large-scale dataset compounds these challenges. In this paper, we introduce EgoSurgery-Tool, an extension of the existing EgoSurgery-Phase dataset, which contains real open surgery videos captured using an egocentric camera attached to the surgeon's head, along with phase annotations. EgoSurgery-Tool has been densely annotated with surgical tools and comprises over 49K surgical tool bounding boxes across 15 categories, constituting a large-scale surgical tool detection dataset. EgoSurgery-Tool also provides annotations for hand detection with over 46K hand-bounding boxes, capturing hand-object interactions that are crucial for understanding activities in egocentric open surgery. EgoSurgery-Tool is superior to existing datasets due to its larger scale, greater variety of surgical tools, more annotations, and denser scenes. We conduct a comprehensive analysis of EgoSurgery-Tool using nine popular object detectors to assess their effectiveness in both surgical tool and hand detection. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.

6/7/2024

Thoracic Surgery Video Analysis for Surgical Phase Recognition

Syed Abdul Mateen, Niharika Malvia, Syed Abdul Khader, Danny Wang, Deepti Srinivasan, Chi-Fu Jeffrey Yang, Lana Schumacher, Sandeep Manjanna

This paper presents an approach for surgical phase recognition using video data, aiming to provide a comprehensive understanding of surgical procedures for automated workflow analysis. The advent of robotic surgery, digitized operating rooms, and the generation of vast amounts of data have opened doors for the application of machine learning and computer vision in the analysis of surgical videos. Among these advancements, Surgical Phase Recognition (SPR) stands out as an emerging technology that has the potential to recognize and assess the ongoing surgical scenario, summarize the surgery, evaluate surgical skills, offer surgical decision support, and facilitate medical training. In this paper, we analyse and evaluate both frame-based and video clipping-based phase recognition on a thoracic surgery dataset consisting of 11 classes of phases. Specifically, we utilize ImageNet ViT for image-based classification and VideoMAE as the baseline model for video-based classification. We show that Masked Video Distillation (MVD) exhibits superior performance, achieving a top-1 accuracy of 72.9%, compared to 52.31% achieved by ImageNet ViT. These findings underscore the efficacy of video-based classifiers over their image-based counterparts in surgical phase recognition tasks.

6/14/2024

Robust Surgical Phase Recognition From Annotation Efficient Supervision

Or Rubin, Shlomi Laufer

Surgical phase recognition is a key task in computer-assisted surgery, aiming to automatically identify and categorize the different phases within a surgical procedure. Despite substantial advancements, most current approaches rely on fully supervised training, requiring expensive and time-consuming frame-level annotations. Timestamp supervision has recently emerged as a promising alternative, significantly reducing annotation costs while maintaining competitive performance. However, models trained on timestamp annotations can be negatively impacted by missing phase annotations, leading to a potential drawback in real-world scenarios. In this work, we address this issue by proposing a robust method for surgical phase recognition that can handle missing phase annotations effectively. Furthermore, we introduce the SkipTag@K annotation approach to the surgical domain, enabling a flexible balance between annotation effort and model performance. Our method achieves competitive results on two challenging datasets, demonstrating its efficacy in handling missing phase annotations and its potential for reducing annotation costs. Specifically, we achieve an accuracy of 85.1% on the MultiBypass140 dataset using only 3 annotated frames per video, showcasing the effectiveness of our method and the potential of the SkipTag@K setup. We perform extensive experiments to validate the robustness of our method and provide valuable insights to guide future research in surgical phase recognition. Our work contributes to the advancement of surgical workflow recognition and paves the way for more efficient and reliable surgical phase recognition systems.

6/27/2024