Model Card for TITAN-preview
[Preprint](https://arxiv.org/abs/2411.19666) | [GitHub Repo](https://github.com/mahmoodlab/TITAN) | [Cite]
What is TITAN?
TITAN (Transformer-based pathology Image and Text Alignment Network) is a multimodal whole-slide foundation model pre-trained using visual self-supervised learning and vision-language alignment. It leverages 335,645 whole-slide images (WSIs) from a diverse set of internally collected neoplastic, infectious, and inflammatory cases at Mass General Brigham. Additionally, TITAN utilizes over 182,000 pathology reports and more than 423,000 synthetic captions generated by PathChat, our pathology co-pilot. TITAN's slide embeddings achieve state-of-the-art performance on diverse downstream tasks, including linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation.
This is a preview release; further updates and improvements will follow.
Requesting Access
As mentioned in the gated prompt, you must agree to the outlined terms of use, and the primary email on your Hugging Face account must match your institutional email. If your primary email is a personal address (@gmail/@hotmail/@qq), your request will be denied. To fix this: (1) add your official institutional email to your HF account and confirm it through the verification email, and (2) set your institutional email as the primary email on your HF account. Other reasons an access request may be denied include mistakes in the submitted form, for example: the full name contains abbreviations, the affiliation is not spelled out, the described research use is insufficient, or the email domain is not recognized.
Model Description
- Developed by: Mahmood Lab AI for Pathology @ Harvard/BWH
- Model type: Pretrained vision-language encoders
- Pretraining dataset: Mass-340K, sourced from private histology collections (BWH / MGH), in addition to slides from the public GTEx consortium.
- Repository: https://github.com/mahmoodlab/TITAN
- Preprint: https://arxiv.org/abs/2411.19666
- License: CC-BY-NC-ND-4.0
Requirements
```
torch==2.0.1
timm==1.0.3
einops==0.6.1
einops-exts==0.0.4
transformers==4.46.0
```
Model Usage
TITAN-preview is a vision-language model trained on CONCH v1.5 patch features extracted from 512x512-pixel patches at 20x magnification.
Following authentication (using `huggingface_hub`), both TITAN-preview (slide and language encoders) and CONCH v1.5 (patch encoder) can be loaded using the commands below:
```python
from huggingface_hub import login
from transformers import AutoModel

login()  # login with your User Access Token, found at https://huggingface.co/settings/tokens

# load the TITAN slide and language encoders
titan = AutoModel.from_pretrained('MahmoodLab/TITAN', trust_remote_code=True)
# retrieve the CONCH v1.5 patch encoder and its matching image transform
conch, eval_transform = titan.return_conch()
```
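The returned patch encoder and transform can then be used to embed individual tiles. Below is a minimal sketch, assuming a hypothetical tile image `patch.png` and assuming that the encoder's forward pass returns patch embeddings directly (the actual API may differ):

```python
# Minimal sketch: embedding a single tile with CONCH v1.5.
# 'patch.png' is a hypothetical 512x512 tile at 20x magnification; the
# assumption that conch(batch) returns the embedding directly may not match
# the actual interface.
import torch
from PIL import Image

img = Image.open('patch.png').convert('RGB')
batch = eval_transform(img).unsqueeze(0)  # (1, 3, H, W)
with torch.inference_mode():
    patch_feature = conch(batch)  # one feature vector per tile
```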
You can directly use TITAN-preview for slide-level feature extraction. TITAN builds a feature grid from CONCH v1.5 patch features using the patch coordinates and the distance between adjacent patches. As patch coordinates are always saved at the slide's level 0 magnification, TITAN takes `patch_size_lv0`, which represents the distance between two adjacent patches at level 0 magnification: 1024 if the slide was scanned at 40x, or 512 if at 20x. This information is saved in our demo TCGA features.
Slide-level feature extraction can be done in the following way:
```python
import h5py
import torch
from transformers import AutoModel

# load model
titan = AutoModel.from_pretrained('MahmoodLab/TITAN', trust_remote_code=True)
titan = titan.to('cuda')

# load CONCH v1.5 demo features
h5_path = 'TCGA_demo_features/TCGA-RM-A68W-01Z-00-DX1.4E62E4F4-415C-46EB-A6C8-45BA14E82708.h5'
with h5py.File(h5_path, 'r') as file:
    features = torch.from_numpy(file['features'][:]).to('cuda')
    coords = torch.from_numpy(file['coords'][:]).to('cuda')
    patch_size_lv0 = file['coords'].attrs['patch_size_level0']

# extract slide embedding
with torch.autocast('cuda', torch.float16), torch.inference_mode():
    slide_embedding = titan.encode_slide_from_patch_features(features, coords, patch_size_lv0)
```
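The demo features store `patch_size_level0` as an HDF5 attribute. If your own feature files do not, one way to recover it is from the spacing of adjacent patch coordinates; the sketch below assumes patches were tiled on a regular, non-overlapping grid and is not part of the TITAN API:

```python
# Hedged sketch: infer the level-0 patch spacing from the coordinate grid.
# Assumes a regular, non-overlapping tiling; not part of the TITAN API.
import numpy as np

def infer_patch_size_lv0(coords: np.ndarray) -> int:
    xs = np.sort(np.unique(coords[:, 0]))
    gaps = np.diff(xs)
    return int(gaps[gaps > 0].min())  # smallest horizontal step between patches
```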
These pre-extracted features can then be used for slide-level classification (via linear probing), retrieval (via L2 distance), and other machine learning settings, without task-specific finetuning.
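As an illustration, a linear probe and an L2-distance retrieval over pre-extracted slide embeddings might look as follows; the `train_embeddings`, `train_labels`, `test_embeddings`, `test_labels`, `query_embedding`, and `database_embeddings` tensors are hypothetical placeholders:

```python
# Illustrative sketch of downstream use; all variable names are hypothetical.
import torch
from sklearn.linear_model import LogisticRegression

# linear probing: fit a logistic regression classifier on frozen embeddings
clf = LogisticRegression(max_iter=1000)
clf.fit(train_embeddings.numpy(), train_labels.numpy())
accuracy = clf.score(test_embeddings.numpy(), test_labels.numpy())

# retrieval: rank database slides by L2 distance to a query embedding
dists = torch.cdist(query_embedding.unsqueeze(0), database_embeddings)  # (1, N)
top5 = dists.squeeze(0).topk(k=5, largest=False).indices  # closest slides
```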
We have also released all TCGA TITAN-preview features in `TCGA_TITAN_features.pkl`. More detailed linear probe and zero-shot evaluations are demonstrated in our GitHub repository.
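The released features can be loaded with the standard `pickle` module; the sketch below assumes (without guarantee) that the file contains a mapping from TCGA slide IDs to embeddings, so inspect the container before use:

```python
# Hedged sketch: loading the released TCGA slide features. The internal
# structure of the pickle (assumed here to be a dict keyed by slide ID) is
# an assumption, so inspect it before indexing into it.
import pickle

with open('TCGA_TITAN_features.pkl', 'rb') as f:
    tcga_features = pickle.load(f)
print(type(tcga_features))
```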
License and Terms of Use
This model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of the TITAN model and its derivatives, which include models trained on outputs from the TITAN model or datasets created from the TITAN model, is prohibited and requires prior approval. Downloading the model requires prior registration on Hugging Face and agreeing to the terms of use. By downloading this model, you agree not to distribute, publish or reproduce a copy of the model. If another user within your organization wishes to use the TITAN model, they must register as an individual user and agree to comply with the terms of use. Users may not attempt to re-identify the deidentified data used to develop the underlying model. If you are a commercial entity, please contact the corresponding author.
Contact
For any additional questions or comments, contact Faisal Mahmood ([email protected]), Tong Ding ([email protected]), Sophia J. Wagner ([email protected]), Andrew H. Song ([email protected]), or Richard J. Chen ([email protected]).
Acknowledgements
The project was built on top of amazing repositories such as ViT, iBOT, OpenCLIP, LGSSL, and Timm (ViT model implementation). We thank the authors and developers for their contributions.
BibTeX
If you found our work useful in your research, please consider citing our work at:
Ding, T.*, Wagner, S.J.*, Song, A.H.*, Chen, R.J.*, et al. Multimodal Whole Slide Foundation Model for Pathology, arXiv, 2024
```bibtex
@misc{ding2024multimodalslidefoundationmodel,
      title={Multimodal Whole Slide Foundation Model for Pathology},
      author={Tong Ding and Sophia J. Wagner and Andrew H. Song and Richard J. Chen and Ming Y. Lu and Andrew Zhang and Anurag J. Vaidya and Guillaume Jaume and Muhammad Shaban and Ahrong Kim and Drew F. K. Williamson and Bowen Chen and Cristina Almagro-Perez and Paul Doucet and Sharifa Sahai and Chengkuan Chen and Daisuke Komura and Akihiro Kawabe and Shumpei Ishikawa and Georg Gerber and Tingying Peng and Long Phi Le and Faisal Mahmood},
      year={2024},
      eprint={2411.19666},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2411.19666},
}
```