Overview
C2S-Pythia-410m-cell-type-conditioned-cell-generation is built on the Pythia-410m architecture developed by EleutherAI and fine-tuned using Cell2Sentence (C2S) on a collection of single-cell RNA sequencing (scRNA-seq) datasets from CellxGene and the Human Cell Atlas. Cell2Sentence adapts large language models (LLMs) to single-cell biology by converting scRNA-seq data into "cell sentences": sequences of gene names ordered by descending expression level. This model is trained specifically for cell type-conditioned single-cell generation, that is, generating realistic single-cell profiles conditioned on a specified cell type.
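The cell-sentence encoding described above can be illustrated with a minimal sketch. This is not the Cell2Sentence library's own preprocessing code; the function name `to_cell_sentence`, the choice to drop zero-count genes, the tie-breaking rule, and the toy counts are all assumptions made for illustration.

```python
def to_cell_sentence(expression, gene_names, top_k=None):
    """Return gene names ordered by descending expression as one string.

    Illustrative only: genes with zero expression are dropped (an assumption),
    and ties keep their input order because Python's sort is stable.
    """
    expressed = [(gene, value) for gene, value in zip(gene_names, expression) if value > 0]
    ranked = sorted(expressed, key=lambda pair: -pair[1])
    names = [gene for gene, _ in ranked]
    if top_k is not None:
        # Optional truncation to the top_k most highly expressed genes.
        names = names[:top_k]
    return " ".join(names)


# Toy example with made-up counts for five genes.
genes = ["CD3E", "MS4A1", "GAPDH", "LYZ", "NKG7"]
counts = [5, 0, 12, 1, 5]
sentence = to_cell_sentence(counts, genes)
print(sentence)  # GAPDH CD3E NKG7 LYZ
```

Passing `top_k` truncates the sentence to the most highly expressed genes, which is one plausible way to cap sentence length before feeding text to a language model.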
Training Data
This model was trained on over 57 million human and mouse cells gathered from over 800 single-cell RNA sequencing datasets from CellxGene and the Human Cell Atlas. This dataset covers a broad range of cell types and conditions from multiple tissues in both human and mouse.
Each cell sentence used in training was limited to the 200 most highly expressed genes.
Tasks
This model is designed for:
- Cell type-conditioned single-cell generation: Generating single-cell profiles conditioned on specific cell types, allowing for the creation of synthetic cells that reflect the gene expression patterns of targeted cell types.
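A rough sketch of how cell type-conditioned generation might be run with the Hugging Face `transformers` library follows. The model ID is taken from this card, but the prompt wording in `build_prompt` is hypothetical: the exact template used during fine-tuning is not given here, so consult the Cell2Sentence GitHub repository for the real one. The helper names and sampling settings are likewise assumptions.

```python
MODEL_ID = "vandijklab/C2S-Pythia-410m-cell-type-conditioned-cell-generation"


def build_prompt(cell_type, n_genes=200):
    # Hypothetical prompt wording; replace with the template from the
    # Cell2Sentence repository before relying on the output.
    return (
        f"Generate a list of {n_genes} genes in order of descending "
        f"expression for a {cell_type} cell. Cell sentence:"
    )


def parse_cell_sentence(generated_text, max_genes=200):
    """Split generated text into gene names, keeping the first occurrence of each."""
    seen, genes = set(), []
    for token in generated_text.split():
        # Strip trailing punctuation only; gene names may contain internal periods.
        gene = token.rstrip(".,")
        if gene and gene not in seen:
            seen.add(gene)
            genes.append(gene)
    return genes[:max_genes]


def generate_cell(cell_type, max_new_tokens=512):
    """Sample one synthetic cell sentence for the given cell type (downloads the model)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(cell_type), return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sampling settings are illustrative, not recommended values
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens, not the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    text = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return parse_cell_sentence(text)
```

Calling `generate_cell("T cell")` would download the weights and return a list of gene names. Duplicate gene names are removed in `parse_cell_sentence` because a gene can occupy only one rank in a cell sentence.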
Cell2Sentence Links
- GitHub: https://github.com/vandijklab/cell2sentence
- Paper: https://www.biorxiv.org/content/10.1101/2023.09.11.557287v3
Pythia Links
- Paper: https://arxiv.org/pdf/2304.01373
- Hugging Face: https://huggingface.co/EleutherAI/pythia-410m