We are planning to perform scRNA-seq on the 10x platform for ~48 patients and ~16 healthy volunteers enrolled in an ongoing longitudinal immunomonitoring cohort, with samples measured at baseline and at follow-up (1 year). Because we are interested in granulocytes, we fix fresh samples as they are collected during the study. Our pilot data showed that, in contrast to PBMCs, this approach lets us reliably identify granulocyte clusters in the fixed samples.
Because fixed samples can only be stored for up to 1 year according to the 10x instructions, we have to run scRNA-seq in batches. However, inclusion rates vary over time, which makes planning the experiments difficult.
My thinking was to make representative pools (e.g., 12 patients and 4 controls per batch), with groups balanced for age and sex as far as possible, to mitigate confounding batch effects. The follow-up timepoint will be difficult to measure together with the baseline samples (because of the fixation window), but patients receive no treatment; the study is mainly intended to track the natural progression of the disease.
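As a concrete sketch of what such pooling could look like, here is a minimal stratified round-robin allocation in plain Python. The cohort sizes match the numbers above, but the subject IDs, age bins, and sex labels are made up for illustration; in practice you would load your real enrollment table and strata instead:

```python
import random

random.seed(0)  # reproducible toy example

# Hypothetical cohort: in reality, load your enrollment table here.
patients = [{"id": f"P{i:02d}", "sex": random.choice("MF"),
             "age_bin": random.choice(["<40", "40-60", ">60"])} for i in range(48)]
controls = [{"id": f"C{i:02d}", "sex": random.choice("MF"),
             "age_bin": random.choice(["<40", "40-60", ">60"])} for i in range(16)]

def assign_batches(subjects, n_batches):
    """Shuffle within sex/age strata, then deal subjects round-robin
    across batches so each batch gets a similar demographic mix."""
    strata = {}
    for s in subjects:
        strata.setdefault((s["sex"], s["age_bin"]), []).append(s)
    batches = [[] for _ in range(n_batches)]
    i = 0
    for stratum in strata.values():
        random.shuffle(stratum)
        for s in stratum:
            batches[i % n_batches].append(s)
            i += 1
    return batches

n_batches = 4  # targets 12 patients + 4 controls per batch
patient_batches = assign_batches(patients, n_batches)
control_batches = assign_batches(controls, n_batches)

for b in range(n_batches):
    batch = patient_batches[b] + control_batches[b]
    n_female = sum(s["sex"] == "F" for s in batch)
    print(f"batch {b}: {len(batch)} subjects, {n_female} female")
```

Dealing subjects round-robin within each stratum (rather than assigning whole strata to batches) keeps both batch sizes and demographic composition close to equal; you could extend the stratum key with disease severity or site if those are also potential confounders.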
Are there things we can account for at this planning stage to limit batch effects and improve our chances of correcting for them if they do arise?
Or am I overthinking this, and is scRNA-seq batch correction more powerful than I realise? For example, I see many papers combining scRNA-seq data from tens of different studies, but I'm skeptical about how that is even possible.