SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos

Read original: arXiv:2405.20333 - Published 5/31/2024 by Chinedu Innocent Nwoye, Nicolas Padoy

Introduction

This paper presents SurgiTrack, a system for fine-grained multi-class and multi-tool tracking in surgical videos. The goal is to accurately identify and track a wide range of surgical instruments during medical procedures, which can provide valuable insights for surgical training, workflow analysis, and automation.

Plain English Explanation

SurgiTrack automatically detects and follows surgical instruments in video recordings of minimally invasive operations. Beyond recognizing which category each tool belongs to, it aims to keep a consistent identity for each individual tool, even when several tools of the same category are in use or when a tool leaves the camera view or the body and later returns.

This capability is useful for several reasons. It can help train new surgeons by providing detailed feedback on their instrument usage. It can also analyze the workflow of experienced surgeons to identify opportunities for improving efficiency. Additionally, the tracking data could eventually be used to automate certain routine surgical tasks.

The key idea in SurgiTrack is to use the direction a tool comes from as a clue to who is holding it. Tools of the same category look nearly identical, so appearance alone is often not enough to tell them apart after an occlusion or re-insertion. The operator's identity is not recorded in the video, but the direction from which a tool originates serves as a proxy for it.

Technical Explanation

SurgiTrack uses YOLOv7 to detect tools in each video frame, identifying which instruments are present and where they are located in the image. On top of the detections, an attention mechanism models the direction from which each tool originates. Because this direction acts as a proxy for the tool's operator, it gives the tracker a re-identification cue that visual appearance alone cannot provide.

The method is developed and evaluated on CholecTrack20, a dataset whose labels describe tool trajectories from three perspectives: intraoperative, intracorporeal, and visibility. These perspectives correspond to different temporal extents of a tool track, and they cover situations such as a tool going out of the body or out of the camera view.
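The abstract quoted under Related Papers names three trajectory perspectives but does not spell out their rules. The following is a minimal sketch of one plausible reading, in which a single physical tool keeps one intraoperative identity for the whole procedure, gets a new intracorporeal identity each time it re-enters the body, and gets a new visibility identity each time it reappears in the camera view. The function name `label_tracks`, the state strings, and the numbering scheme are all hypothetical and are not taken from the paper or the dataset.

```python
def label_tracks(states):
    """Assign per-frame track IDs to ONE physical tool under three perspectives.

    `states` is a list with one entry per frame, each one of the hypothetical
    values "visible", "out_of_view" (in the body but off camera), or
    "out_of_body". Frames where the tool is not visible get no label (None).
    """
    intracorporeal_id = 0
    visibility_id = 0
    # Assumption: the tool starts outside the body and unseen.
    was_out_of_body = True
    was_hidden = True
    labels = []
    for state in states:
        if state == "visible":
            if was_out_of_body:
                intracorporeal_id += 1  # new track after (re-)insertion
            if was_hidden:
                visibility_id += 1      # new track after any disappearance
            was_out_of_body = False
            was_hidden = False
            labels.append({
                "intraoperative": 1,    # one identity for the whole procedure
                "intracorporeal": intracorporeal_id,
                "visibility": visibility_id,
            })
        else:
            was_hidden = True
            if state == "out_of_body":
                was_out_of_body = True
            labels.append(None)
    return labels


states = ["visible", "visible", "out_of_view", "visible", "out_of_body", "visible"]
for frame, label in enumerate(label_tracks(states)):
    print(frame, label)
```

Under this reading, the same tool in the last frame has intraoperative ID 1, intracorporeal ID 2, and visibility ID 3, which shows why the three perspectives define tracks of different duration.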

At inference time, SurgiTrack processes the video frames sequentially and links each new detection to an existing tool identity. The association is done with a harmonizing bipartite matching graph, which handles the different trajectory perspectives while minimizing conflicts and keeping tool identities consistent through occlusions and re-insertions.
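To make frame-to-frame association concrete, here is a minimal sketch of one-to-one bipartite matching between existing tracks and new detections. It is not the paper's harmonizing graph: the cost function, the weight `w_dir`, the threshold `max_cost`, and the dictionary fields (`cls`, `box`, `dir`) are assumptions made for illustration, with `dir` standing in for a unit vector describing the direction a tool originates from. The optimal assignment is found by exhaustive search, which is only practical for the handful of tools visible in one frame.

```python
from itertools import permutations
from math import inf


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pair_cost(track, det, w_dir=0.5):
    """Hypothetical cost: low when boxes overlap and directions agree."""
    if track["cls"] != det["cls"]:
        return inf  # never link different tool categories
    dot = sum(p * q for p, q in zip(track["dir"], det["dir"]))  # unit vectors
    return (1.0 - iou(track["box"], det["box"])) + w_dir * (1.0 - dot)


def associate(tracks, dets, max_cost=1.0):
    """Return {track_id: detection_index} minimizing the total cost.

    A track may stay unmatched at a penalty of `max_cost`; pairs costing
    more than `max_cost` are never linked.
    """
    n = len(tracks)
    # Pad with None so every track has the option of staying unmatched.
    slots = list(range(len(dets))) + [None] * n
    best, best_total = {}, inf
    for perm in set(permutations(slots, n)):
        total = 0.0
        for track, d in zip(tracks, perm):
            cost = max_cost if d is None else pair_cost(track, dets[d])
            if cost > max_cost:
                total = inf
                break
            total += cost
        if total < best_total:
            best_total = total
            best = {t["id"]: d for t, d in zip(tracks, perm) if d is not None}
    return best


# Two graspers that differ mainly in the direction they come from.
tracks = [
    {"id": 1, "cls": "grasper", "box": (10, 10, 50, 50), "dir": (1.0, 0.0)},
    {"id": 2, "cls": "grasper", "box": (60, 10, 100, 50), "dir": (-1.0, 0.0)},
]
dets = [
    {"cls": "grasper", "box": (62, 12, 102, 52), "dir": (-1.0, 0.0)},
    {"cls": "grasper", "box": (12, 12, 52, 52), "dir": (1.0, 0.0)},
]
print(associate(tracks, dets))  # {1: 1, 2: 0}
```

For larger problems, the exhaustive search would normally be replaced by the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`, which solves the same assignment problem in polynomial time.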

The paper evaluates SurgiTrack on the CholecTrack20 dataset, where it outperforms baselines and state-of-the-art methods while retaining real-time inference capability.

Critical Analysis

The SurgiTrack system represents a significant advance in the field of surgical video analysis. By achieving such high accuracy on fine-grained multi-tool tracking, it opens up new possibilities for applications in surgical training, workflow optimization, and automation.

That said, the paper does acknowledge some limitations of the current approach. For example, the dataset used for training, while large, may not fully capture the diversity of real-world surgical procedures and tools. Additionally, the tracking algorithm can still fail in challenging situations, such as when tools are heavily occluded or move in complex patterns.

Further research is needed to address these limitations and improve the robustness and generalizability of the system. Potential directions include exploring few-shot or unsupervised learning techniques to reduce reliance on extensive labeled data, as well as developing more sophisticated tracking models that can handle a wider range of occlusion and motion patterns.

Conclusion

SurgiTrack shows how tool detection, direction-based re-identification, and bipartite matching can be combined to track individual surgical tools across occlusions and re-insertions. According to the abstract, it outperforms baselines and state-of-the-art methods on CholecTrack20 while running in real time. Tracking of this kind could support surgical training, workflow analysis, and computer-assisted intervention, although robustness and generalization to other procedures remain open questions.

Related Papers

SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos

Chinedu Innocent Nwoye, Nicolas Padoy

Accurate tool tracking is essential for the success of computer-assisted intervention. Previous efforts often modeled tool trajectories rigidly, overlooking the dynamic nature of surgical procedures, especially tracking scenarios like out-of-body and out-of-camera views. Addressing this limitation, the new CholecTrack20 dataset provides detailed labels that account for multiple tool trajectories in three perspectives: (1) intraoperative, (2) intracorporeal, and (3) visibility, representing the different types of temporal duration of tool tracks. These fine-grained labels enhance tracking flexibility but also increase the task complexity. Re-identifying tools after occlusion or re-insertion into the body remains challenging due to high visual similarity, especially among tools of the same category. This work recognizes the critical role of the tool operators in distinguishing tool track instances, especially those belonging to the same tool category. The operators' information is, however, not explicitly captured in surgical videos. We therefore propose SurgiTrack, a novel deep learning method that leverages YOLOv7 for precise tool detection and employs an attention mechanism to model the originating direction of the tools, as a proxy to their operators, for tool re-identification. To handle diverse tool trajectory perspectives, SurgiTrack employs a harmonizing bipartite matching graph, minimizing conflicts and ensuring accurate tool identity association. Experimental results on CholecTrack20 demonstrate SurgiTrack's effectiveness, outperforming baselines and state-of-the-art methods with real-time inference capability. This work sets a new standard in surgical tool tracking, providing dynamic trajectories for more adaptable and precise assistance in minimally invasive surgeries.

5/31/2024

SurgTrack: CAD-Free 3D Tracking of Real-world Surgical Instruments

Wenwu Guo, Jinlin Wu, Zhen Chen, Qingxiang Zhao, Miao Xu, Zhen Lei, Hongbin Liu

Vision-based surgical navigation has received increasing attention due to its non-invasive, cost-effective, and flexible advantages. In particular, a critical element of the vision-based navigation system is tracking surgical instruments. Compared with 2D instrument tracking methods, 3D instrument tracking has broader value in clinical practice, but is also more challenging due to weak texture, occlusion, and lack of Computer-Aided Design (CAD) models for 3D registration. To solve these challenges, we propose the SurgTrack, a two-stage 3D instrument tracking method for CAD-free and robust real-world applications. In the first registration stage, we incorporate an Instrument Signed Distance Field (SDF) modeling the 3D representation of instruments, achieving CAD-freed 3D registration. Due to this, we can obtain the location and orientation of instruments in the 3D space by matching the video stream with the registered SDF model. In the second tracking stage, we devise a posture graph optimization module, leveraging the historical tracking results of the posture memory pool to optimize the tracking results and improve the occlusion robustness. Furthermore, we collect the Instrument3D dataset to comprehensively evaluate the 3D tracking of surgical instruments. The extensive experiments validate the superiority and scalability of our SurgTrack, by outperforming the state-of-the-arts with a remarkable improvement. The code and dataset are available at https://github.com/wenwucode/SurgTrack.

9/5/2024

EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos

Ryo Fujii, Hideo Saito, Hiroki Kajita

Surgical tool detection is a fundamental task for understanding egocentric open surgery videos. However, detecting surgical tools presents significant challenges due to their highly imbalanced class distribution, similar shapes and similar textures, and heavy occlusion. The lack of a comprehensive large-scale dataset compounds these challenges. In this paper, we introduce EgoSurgery-Tool, an extension of the existing EgoSurgery-Phase dataset, which contains real open surgery videos captured using an egocentric camera attached to the surgeon's head, along with phase annotations. EgoSurgery-Tool has been densely annotated with surgical tools and comprises over 49K surgical tool bounding boxes across 15 categories, constituting a large-scale surgical tool detection dataset. EgoSurgery-Tool also provides annotations for hand detection with over 46K hand-bounding boxes, capturing hand-object interactions that are crucial for understanding activities in egocentric open surgery. EgoSurgery-Tool is superior to existing datasets due to its larger scale, greater variety of surgical tools, more annotations, and denser scenes. We conduct a comprehensive analysis of EgoSurgery-Tool using nine popular object detectors to assess their effectiveness in both surgical tool and hand detection. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.

6/7/2024

SURGIVID: Annotation-Efficient Surgical Video Object Discovery

Çağhan Koksal, Ghazal Ghazaei, Nassir Navab

Surgical scenes convey crucial information about the quality of surgery. Pixel-wise localization of tools and anatomical structures is the first task towards deeper surgical analysis for microscopic or endoscopic surgical views. This is typically done via fully-supervised methods which are annotation greedy and in several cases, demanding medical expertise. Considering the profusion of surgical videos obtained through standardized surgical workflows, we propose an annotation-efficient framework for the semantic segmentation of surgical scenes. We employ image-based self-supervised object discovery to identify the most salient tools and anatomical structures in surgical videos. These proposals are further refined within a minimally supervised fine-tuning step. Our unsupervised setup reinforced with only 36 annotation labels indicates comparable localization performance with fully-supervised segmentation models. Further, leveraging surgical phase labels as weak labels can better guide model attention towards surgical tools, leading to $\sim 2\%$ improvement in tool localization. Extensive ablation studies on the CaDIS dataset validate the effectiveness of our proposed solution in discovering relevant surgical objects with minimal or no supervision.

9/14/2024