PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery

Read original: arXiv:2409.01184 - Published 9/4/2024 by Adrito Das, Danyal Z. Khan, Dimitrios Psychogyios, Yitong Zhang, John G. Hanrahan, Francisco Vasconcelos, You Pang, Zhen Chen, Jinlin Wu, Xiaoyang Zou and 22 others

Overview

The paper presents the PitVis-2023 Challenge, which tasked participants with recognizing surgical steps and instruments in videos of endoscopic pituitary surgery.
Pituitary surgery is a complex procedure that requires precise coordination between the surgeon and the surgical team.
Automated recognition of the surgical workflow could assist surgeons, improve surgical training, and enhance patient safety.

Plain English Explanation

The paper describes a challenge called PitVis-2023, which is focused on improving computer vision techniques for analyzing videos of pituitary surgeries. Pituitary surgery is a delicate procedure where the surgeon operates on a small gland at the base of the brain. These surgeries require careful coordination between the surgeon and the entire surgical team.

The goal of the PitVis-2023 Challenge is to develop algorithms that can automatically recognize the different steps or "workflow" of the pituitary surgery as it unfolds in the video. Being able to automatically identify the various stages of the surgery could help surgeons in several ways. It could provide real-time guidance to assist the surgical team, improve training for new surgeons, and enhance patient safety by detecting any deviations from the standard surgical process.

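According to the paper's abstract, participants were given 25 surgery videos, and 9 teams from 6 countries submitted 18 entries built on deep learning models. The best-performing entries looked at how the video changes over time and learned to recognize steps and instruments together. The overall aim is to leverage computer vision and machine learning techniques to better understand and support this complex surgical procedure.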

Technical Explanation

The paper introduces the PitVis-2023 Challenge, which focuses on workflow recognition in videos of endoscopic pituitary surgeries. Pituitary surgeries are delicate procedures that require precise coordination between the surgeon and the surgical team. Automating the recognition of the surgical workflow could provide real-time guidance to surgeons, improve surgical training, and enhance patient safety by detecting deviations from standard practice.

The challenge involves developing computer vision algorithms that analyze surgery videos and identify which surgical step is being performed and which surgical instruments are in use. According to the abstract, pituitary surgery makes this harder than other minimally invasive procedures: the smaller working space limits and distorts vision, and steps and instruments switch more frequently.
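
As a rough illustration of what step recognition produces, frame-by-frame predictions can be collapsed into an ordered timeline of steps. The sketch below is not taken from the paper: the function name `frames_to_segments` and the step labels are hypothetical, and it assumes one predicted step label per video frame.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame step labels into (label, first_frame, last_frame) runs.

    `frame_labels` is assumed to hold one predicted step label per frame,
    in temporal order. Frame indices are 0-based and inclusive.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length - 1))
        start += length
    return segments


# Hypothetical labels for an 8-frame clip (not real PitVis step names).
predicted = ["step_A", "step_A", "step_A", "step_B", "step_B",
             "step_A", "step_C", "step_C"]
for label, first, last in frames_to_segments(predicted):
    print(f"{label}: frames {first}-{last}")
```

A step that recurs, like `step_A` here, appears as two separate segments. Frequent switching of this kind is one reason the abstract describes pituitary surgery as requiring more precise predictions.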

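According to the abstract, participants were provided with 25 videos, and 18 submissions from 9 teams were evaluated using macro-F1 score. The top-performing models combined spatio-temporal and multi-task methods, improving macro-F1 by more than 50% in step recognition and more than 10% in instrument recognition over purely spatial single-task models. The paper also highlights the potential benefits of this type of computer-assisted surgical guidance system, including: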

Providing real-time feedback and decision support to the surgical team
Enhancing surgical training by analyzing expert performance
Improving patient safety by detecting deviations from standard surgical workflows
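
The challenge abstract reports results as macro-F1 scores. As a minimal sketch of that metric, the code below averages per-class F1 with equal weight per class, assuming one integer class label per frame. The function name `macro_f1` and the toy labels are illustrative only, and the challenge's exact evaluation protocol (for example, how scores are aggregated across videos) is not described here.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example: three hypothetical step classes over six frames.
truth = [0, 0, 1, 1, 2, 2]
preds = [0, 0, 1, 2, 2, 2]
print(round(macro_f1(truth, preds), 4))
```

Because every class counts equally, a short, rarely occurring step affects the score as much as a long one, which makes the metric sensitive to errors on brief steps.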

Critical Analysis

The paper introduces an interesting challenge focused on a clinically important problem, and its abstract reports benchmark results and a publicly available dataset. With 25 videos provided to participants, however, it is difficult to assess how well the reported results would carry over to wider clinical use.

Some potential limitations or areas for further research include:

The diversity and quality of the video dataset used to train the algorithms
The ability of the computer vision models to generalize beyond the specific pituitary surgery use case
The integration of the workflow recognition system into the existing surgical workflow and its acceptance by surgeons
The potential for false positives or other errors that could impact surgical decision-making

Additionally, ethical considerations such as patient privacy, bias in the training data, and accountability for the algorithm's outputs in a high-stakes medical setting deserve attention.

Overall, the PitVis-2023 Challenge represents an important step towards leveraging computer vision to enhance surgical precision and safety. However, more details are needed to fully evaluate the technical approach and its potential real-world impact.

Conclusion

The PitVis-2023 Challenge aims to develop computer vision algorithms that can automatically recognize the surgical workflow in videos of endoscopic pituitary surgeries. Pituitary surgery is a complex procedure that requires careful coordination between the surgeon and the entire surgical team. Automating the recognition of the different steps or stages of the surgery could provide real-time guidance to the surgical team, improve training for new surgeons, and enhance patient safety by detecting deviations from standard practice.

The paper highlights the potential benefits of this type of computer-assisted surgical guidance system and provides benchmark results on a publicly available dataset. Further research is needed to address the limitations and ethical considerations, but the PitVis-2023 Challenge represents an important step towards leveraging advanced computer vision techniques to support complex medical procedures and improve patient outcomes.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery

Adrito Das, Danyal Z. Khan, Dimitrios Psychogyios, Yitong Zhang, John G. Hanrahan, Francisco Vasconcelos, You Pang, Zhen Chen, Jinlin Wu, Xiaoyang Zou, Guoyan Zheng, Abdul Qayyum, Moona Mazher, Imran Razzak, Tianbin Li, Jin Ye, Junjun He, Szymon Płotka, Joanna Kaleta, Amine Yamlahi, Antoine Jund, Patrick Godau, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Dominik Rivoir, Alejandra Pérez, Santiago Rodriguez, Pablo Arbeláez, Danail Stoyanov, Hani J. Marcus, Sophia Bano

The field of computer vision applied to videos of minimally invasive surgery is ever-growing. Workflow recognition pertains to the automated recognition of various aspects of a surgery, including which surgical steps are performed and which surgical instruments are used. This information can later be used to assist clinicians when learning the surgery, during live surgery, and when writing operation notes. The Pituitary Vision (PitVis) 2023 Challenge tasks the community with step and instrument recognition in videos of endoscopic pituitary surgery. This is a unique task when compared to other minimally invasive surgeries due to the smaller working space, which limits and distorts vision, and the higher frequency of instrument and step switching, which requires more precise model predictions. Participants were provided with 25 videos, with results presented at the MICCAI-2023 conference as part of the Endoscopic Vision 2023 Challenge in Vancouver, Canada, on 08-Oct-2023. There were 18 submissions from 9 teams across 6 countries, using a variety of deep learning models. A commonality between the top performing models was incorporating spatio-temporal and multi-task methods, with greater than 50% and 10% macro-F1-score improvement over purely spatial single-task models in step and instrument recognition respectively. The PitVis-2023 Challenge therefore demonstrates that state-of-the-art computer vision models in minimally invasive surgery are transferable to a new dataset, with surgery-specific techniques used to enhance performance, progressing the field further. Benchmark results are provided in the paper, and the dataset is publicly available at: https://doi.org/10.5522/04/26531686.

9/4/2024

Vision-Based Neurosurgical Guidance: Unsupervised Localization and Camera-Pose Prediction

Gary Sarwin, Alessandro Carretta, Victor Staartjes, Matteo Zoli, Diego Mazzatenta, Luca Regli, Carlo Serra, Ender Konukoglu

Localizing oneself during endoscopic procedures can be problematic due to the lack of distinguishable textures and landmarks, as well as difficulties due to the endoscopic device such as a limited field of view and challenging lighting conditions. Expert knowledge shaped by years of experience is required for localization within the human body during endoscopic procedures. In this work, we present a deep learning method based on anatomy recognition, that constructs a surgical path in an unsupervised manner from surgical videos, modelling relative location and variations due to different viewing angles. At inference time, the model can map an unseen video's frames on the path and estimate the viewing angle, aiming to provide guidance, for instance, to reach a particular destination. We test the method on a dataset consisting of surgical videos of transsphenoidal adenomectomies, as well as on a synthetic dataset. An online tool that lets researchers upload their surgical videos to obtain anatomy detections and the weights of the trained YOLOv7 model are available at: https://surgicalvision.bmic.ethz.ch.

5/16/2024

Thoracic Surgery Video Analysis for Surgical Phase Recognition

Syed Abdul Mateen, Niharika Malvia, Syed Abdul Khader, Danny Wang, Deepti Srinivasan, Chi-Fu Jeffrey Yang, Lana Schumacher, Sandeep Manjanna

This paper presents an approach for surgical phase recognition using video data, aiming to provide a comprehensive understanding of surgical procedures for automated workflow analysis. The advent of robotic surgery, digitized operating rooms, and the generation of vast amounts of data have opened doors for the application of machine learning and computer vision in the analysis of surgical videos. Among these advancements, Surgical Phase Recognition (SPR) stands out as an emerging technology that has the potential to recognize and assess the ongoing surgical scenario, summarize the surgery, evaluate surgical skills, offer surgical decision support, and facilitate medical training. In this paper, we analyse and evaluate both frame-based and video clipping-based phase recognition on a thoracic surgery dataset consisting of 11 classes of phases. Specifically, we utilize ImageNet ViT for image-based classification and VideoMAE as the baseline model for video-based classification. We show that Masked Video Distillation (MVD) exhibits superior performance, achieving a top-1 accuracy of 72.9%, compared to 52.31% achieved by ImageNet ViT. These findings underscore the efficacy of video-based classifiers over their image-based counterparts in surgical phase recognition tasks.

6/14/2024

SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction

Çağhan Köksal, Ghazal Ghazaei, Felix Holm, Azade Farshad, Nassir Navab

Graph-based holistic scene representations facilitate surgical workflow understanding and have recently demonstrated significant success. However, this task is often hindered by the limited availability of densely annotated surgical scene data. In this work, we introduce an end-to-end framework for the generation and optimization of surgical scene graphs on a downstream task. Our approach leverages the flexibility of graph-based spectral clustering and the generalization capability of foundation models to generate unsupervised scene graphs with learnable properties. We reinforce the initial spatial graph with sparse temporal connections using local matches between consecutive frames to predict temporally consistent clusters across a temporal neighborhood. By jointly optimizing the spatiotemporal relations and node features of the dynamic scene graph with the downstream task of phase segmentation, we address the costly and annotation-burdensome task of semantic scene comprehension and scene graph generation in surgical videos using only weak surgical phase labels. Further, by incorporating effective intermediate scene representation disentanglement steps within the pipeline, our solution outperforms the SOTA on the CATARACTS dataset by 8% accuracy and 10% F1 score in surgical workflow recognition.

7/30/2024