MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation
Abstract
Large Language Models (LLMs), known for their versatility with textual data, are increasingly being explored for their potential to enhance medical image segmentation, a crucial task for accurate diagnostic imaging. This study enhances Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, leads to substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning with a Multi-Scale Fusion Block for aggregating features across different scales. The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79 and improvements in accuracy, precision, and the Jaccard Index. These results demonstrate the effectiveness of LLM-based transformers in refining medical image segmentation, highlighting their potential to significantly boost model accuracy and robustness. The source code and our implementation are available at: https://bit.ly/3zf2CVs
Community
The paper presents a novel method of enhancing Vision Transformer (ViT)-based medical image segmentation models by integrating pre-trained frozen transformer blocks from Large Language Models (LLMs), significantly improving segmentation performance across various medical imaging modalities.
- Frozen LLM Transformer Integration: Introduces a pre-trained, frozen transformer block from LLMs into the encoder of a ViT model, resulting in substantial performance improvements in medical image segmentation.
- Hybrid Attention and Multi-Scale Fusion: Proposes a Hybrid Attention Mechanism combining global and local feature learning, alongside a Multi-Scale Fusion Block to aggregate features across scales, enhancing segmentation precision.
- Extensive Evaluation: Demonstrates effectiveness across 10 medical imaging modalities, achieving higher accuracy, precision, and Dice scores, with thorough ablation studies confirming the advantages of the LLM-based approach.
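The frozen-LLM integration described above can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: a generic `nn.TransformerEncoderLayer` stands in for the pre-trained LLM block (the paper uses actual LLM layers), and the linear adapters, dimensions, and residual wiring are assumptions for the sketch.

```python
import torch
import torch.nn as nn


class FrozenLLMSegEncoder(nn.Module):
    """ViT-style encoder stage with a frozen LLM transformer block inserted.

    Sketch only: the "LLM block" here is a generic TransformerEncoderLayer
    standing in for a pre-trained LLM layer, kept frozen during training.
    Trainable linear adapters project between the vision token width and
    the (typically larger) LLM hidden width.
    """

    def __init__(self, vit_dim=256, llm_dim=512, n_heads=8):
        super().__init__()
        self.vit_block = nn.TransformerEncoderLayer(
            vit_dim, n_heads, batch_first=True
        )
        self.to_llm = nn.Linear(vit_dim, llm_dim)    # trainable adapter in
        self.llm_block = nn.TransformerEncoderLayer(
            llm_dim, n_heads, batch_first=True
        )
        self.from_llm = nn.Linear(llm_dim, vit_dim)  # trainable adapter out
        # Freeze the stand-in LLM block: its weights are never updated.
        for p in self.llm_block.parameters():
            p.requires_grad = False

    def forward(self, tokens):
        # tokens: (batch, num_patches, vit_dim) patch embeddings
        x = self.vit_block(tokens)
        # Residual connection around the frozen LLM block via the adapters.
        x = x + self.from_llm(self.llm_block(self.to_llm(x)))
        return x
```

In this sketch only the ViT block and the two adapters receive gradients, which mirrors the paper's key idea: the LLM layer contributes its pre-trained representations without any fine-tuning cost.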