How are state embeddings characterized in cell state mode, and how to perform vs null mode?

#438
by ZYSK-huggingface - opened

Hi!
Thank you so much for your diligent work and your patient replies. I have a question about how the state and goal cell embeddings are characterized in cell state mode.

For example, I have a number of state A cells and a number of state B cells, and I perturb genes to identify which genes shift cells from state A to state B.

As I understand it, the state A and state B embeddings are fixed, each calculated up front by averaging the embeddings of the cells in that state. So is it necessary to make sure that both state A and state B have a considerable number of cells in my input dataset?
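To make the idea concrete, here is a minimal sketch of what is described above: each state embedding is the element-wise mean of that state's cell embeddings, and a perturbation is scored by how much closer it moves a cell to the goal state. This is not Geneformer's code. The function names, the toy 2-dimensional embeddings, and the use of cosine similarity as the distance measure are all assumptions made for illustration.

```python
import math

def mean_embedding(embeddings):
    """Element-wise mean of a list of equal-length embedding vectors."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(vec[i] for vec in embeddings) / n for i in range(dim)]

def cosine_similarity(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def shift_toward_goal(original, perturbed, goal_centroid):
    """Positive when the perturbed cell is more similar to the goal state."""
    return (cosine_similarity(perturbed, goal_centroid)
            - cosine_similarity(original, goal_centroid))

# Hypothetical toy embeddings (real cell embeddings have many more dimensions).
state_a_cells = [[1.0, 0.0], [0.8, 0.2]]
state_b_cells = [[0.0, 1.0], [0.2, 0.8]]

# Fixed state embeddings, computed once from the available cells.
centroid_a = mean_embedding(state_a_cells)
centroid_b = mean_embedding(state_b_cells)

# One state A cell before and after a hypothetical in silico perturbation.
original_cell = [1.0, 0.0]
perturbed_cell = [0.6, 0.4]
shift = shift_toward_goal(original_cell, perturbed_cell, centroid_b)
print(f"shift toward state B: {shift:.3f}")
```

Under this sketch, a state with very few cells gives a centroid that depends heavily on each individual cell, which is why the number of cells per state matters for how stable the state embedding is.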

I am also concerned about cases where the number of cells in each state is imbalanced. For example, if state A is more frequent than state B, then perturbing ① from state A to state B and ② from state B to state A does not seem to be a fair comparison. In ①, state A has more cells to perturb, so the results seem more convincing. In ②, state B has fewer cells to perturb, so the results seem less convincing.

Apart from the questions above, I am now trying to explore the vs null mode. From your documentation and answers, if I understand correctly, I can perturb single genes instead of all genes and still obtain p-values and FDR with vs null mode.

However, there seem to be two different scenarios, and I am not sure whether both of them work with vs null mode.
① Perturb single genes one by one using the in silico perturber, put all the intermediate results in the same directory, and then use vs null mode with the in silico perturber stats, which will automatically compare each gene with the other genes? (What should the parameter 'genes_perturbed=' be then? 'all' or something else?)
② If I perturb a gene list, what will happen then?

I ask because sometimes I want to perturb only genes of interest to see their impact on the cell state shift. The second question may be important because perturbing 'all' genes runs too slowly.

Thank you for your question. Please refer to the documentation regarding the null mode:
https://geneformer.readthedocs.io/en/latest/geneformer.in_silico_perturber.html

This is useful if you have a set of null distribution cells you would like to compare to. Please see my answers in the other discussion you started, #439, which I think will be helpful for you to understand the process and apply it to answer your biological question.
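As a rough illustration of what comparing against null distribution cells involves, the sketch below takes, for each perturbed gene, the embedding shifts measured in the cells of interest and in the null cells, computes a p-value per gene, and then corrects the p-values for multiple testing. It is not Geneformer's implementation: a simple permutation test on the difference in means is swapped in for whatever test the library uses, the Benjamini-Hochberg correction is an assumed choice for the FDR step, and the gene names and shift values are invented toy data.

```python
import random

def permutation_pvalue(test_shifts, null_shifts, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the difference in mean shift."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(test_shifts) - mean(null_shifts))
    pooled = list(test_shifts) + list(null_shifts)
    n_test = len(test_shifts)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_test]) - mean(pooled[n_test:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical per-cell shifts for two genes (toy numbers, not real results).
test_shifts = {
    "GENE_A": [0.30, 0.28, 0.35, 0.31, 0.29, 0.33],
    "GENE_B": [0.01, -0.02, 0.00, 0.02, -0.01, 0.01],
}
null_shifts = {
    "GENE_A": [0.01, 0.00, -0.01, 0.02, 0.00, -0.02],
    "GENE_B": [0.00, 0.01, -0.01, 0.02, -0.02, 0.00],
}

genes = sorted(test_shifts)
pvals = {g: permutation_pvalue(test_shifts[g], null_shifts[g]) for g in genes}
fdr = dict(zip(genes, benjamini_hochberg([pvals[g] for g in genes])))
for g in genes:
    print(g, f"p={pvals[g]:.4f}", f"FDR={fdr[g]:.4f}")
```

Note that in this sketch each gene is compared with the same gene's shifts in the null cells, so a gene can only be tested if it was perturbed in both sets of cells.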

ctheodoris changed discussion status to closed
