MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation
Abstract
Large Language Models (LLMs), known for their versatility with textual data, are increasingly being explored for their potential to enhance medical image segmentation, a crucial task for accurate diagnostic imaging. This study enhances Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, leads to substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning with a Multi-Scale Fusion Block for aggregating features across different scales. The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79 and improvements in accuracy, precision, and the Jaccard Index. These results demonstrate the effectiveness of LLM-based transformers in refining medical image segmentation, highlighting their potential to significantly boost model accuracy and robustness. The source code and our implementation are available at: https://bit.ly/3zf2CVs
Community
The paper presents a novel method of enhancing Vision Transformer (ViT)-based medical image segmentation models by integrating pre-trained frozen transformer blocks from Large Language Models (LLMs), significantly improving segmentation performance across various medical imaging modalities.
- Frozen LLM Transformer Integration: Introduces a pre-trained, frozen transformer block from LLMs into the encoder of a ViT model, resulting in substantial performance improvements in medical image segmentation.
- Hybrid Attention and Multi-Scale Fusion: Proposes a Hybrid Attention Mechanism combining global and local feature learning, alongside a Multi-Scale Fusion Block to aggregate features across scales, enhancing segmentation precision.
- Extensive Evaluation: Demonstrates effectiveness across 10 medical imaging modalities, achieving higher accuracy, precision, and Dice scores, with thorough ablation studies confirming the advantages of the LLM-based approach.
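The frozen-LLM integration described above can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: a generic `nn.TransformerEncoderLayer` stands in for the pre-trained LLM block (the paper uses actual LLM layers), and the linear adapters, dimensions, and residual wiring are assumptions for the sketch.

```python
import torch
import torch.nn as nn


class FrozenLLMSegEncoder(nn.Module):
    """ViT-style encoder stage with a frozen LLM transformer block inserted.

    Sketch only: the "LLM block" here is a generic TransformerEncoderLayer
    standing in for a pre-trained LLM layer, kept frozen during training.
    Trainable linear adapters project between the vision token width and
    the (typically larger) LLM hidden width.
    """

    def __init__(self, vit_dim=256, llm_dim=512, n_heads=8):
        super().__init__()
        self.vit_block = nn.TransformerEncoderLayer(
            vit_dim, n_heads, batch_first=True
        )
        self.to_llm = nn.Linear(vit_dim, llm_dim)    # trainable adapter in
        self.llm_block = nn.TransformerEncoderLayer(
            llm_dim, n_heads, batch_first=True
        )
        self.from_llm = nn.Linear(llm_dim, vit_dim)  # trainable adapter out
        # Freeze the stand-in LLM block: its weights are never updated.
        for p in self.llm_block.parameters():
            p.requires_grad = False

    def forward(self, tokens):
        # tokens: (batch, num_patches, vit_dim) patch embeddings
        x = self.vit_block(tokens)
        # Residual connection around the frozen LLM block via the adapters.
        x = x + self.from_llm(self.llm_block(self.to_llm(x)))
        return x
```

In this sketch only the ViT block and the two adapters receive gradients, which mirrors the paper's key idea: the LLM layer contributes its pre-trained representations without any fine-tuning cost.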