NOTICE: Cross posting from Biostars for more coverage
Hello, I hope you are doing well! I recently started a post-bacc fellowship (it will likely last 18 months), and I currently do not have a solidified personal project to work on. Most of the other projects going on in my lab right now are either low-hanging fruit that will be left to interns or outside of my interests/expertise. I have spent my first month or so reading literature and developing a project plan/proposal. I wanted to share my ideas in an open forum so that I can receive feedback and suggestions for how I should go about this.
Project idea: creating a curated intercellular communication database that includes integrated cell type specific information regarding intercellular interactions.
Problem/knowledge gap: many intercellular communication databases exist and are used in cell-cell communication inference from single cell/spatial transcriptomic data, and they vary widely in annotations, level of curation, sources of information used, biological specificity, etc. I am primarily concerned with databases focused strictly on intercellular interactions in human and mouse. To my knowledge, there are currently no databases that contain annotations of the cell types [that are likely] involved in a given intercellular interaction; I personally think including cell type specific information regarding interactions can be very helpful when conducting L-R interaction inference from transcriptomics data. I also think that annotations regarding the biological context of interactions can be improved. In my opinion, CellChatDB is currently the best annotated DB, as it includes higher-order biological context for each interaction in the DB, e.g. pathways associated with L-R pairs. I think that sourcing more granular information, e.g. the distinct roles of ligand/receptor complexes in the associated pathway ("L binds to R1-R2 complex to activate XYZ signalling pathway, which results in ABC changes in cell behavior/other intracellular pathways/etc."), would improve on this further.
Aim/Solution: in this project, I will attempt to construct a DB that addresses the concerns presented in the Problem/knowledge gap section. I will source data on proteins involved in intercellular communication whose interactions are supported by previous literature, so that all interaction pairs in the DB are experimentally validated. I will incorporate cell type specific information regarding these interactions; as of now, I foresee that this information will be sourced from literature and from single cell/spatial transcriptomics data used to estimate cell type specific expression of ligands/receptors (one idea I have had so far for easily incorporating cell type specific information is using methods/data from the Human Protein Atlas for single cell expression of genes).
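To make the cell type annotation idea more concrete, here is a minimal sketch of how one DB record could be annotated from per-cell-type expression. Everything in it is an assumption for illustration: the `Interaction` record layout, the `annotate` helper, the expression values, and the cutoff of 20 are all made up, and the table only stands in for the kind of per-cell-type summary one might derive from a resource such as the Human Protein Atlas. A receptor complex is treated as present in a cell type only if every subunit passes the cutoff.

```python
from dataclasses import dataclass, field

# Hypothetical mean expression per cell type (arbitrary units, invented values).
EXPRESSION = {
    "TGFB1":  {"macrophage": 85.0, "fibroblast": 12.0, "T cell": 40.0},
    "TGFBR1": {"macrophage": 5.0,  "fibroblast": 60.0, "T cell": 22.0},
    "TGFBR2": {"macrophage": 8.0,  "fibroblast": 75.0, "T cell": 30.0},
}


@dataclass
class Interaction:
    """One ligand-receptor record; the field names are illustrative only."""
    ligand: str
    receptor_subunits: list[str]
    pathway: str
    evidence: list[str] = field(default_factory=list)  # literature IDs would go here
    sender_cell_types: set[str] = field(default_factory=set)
    receiver_cell_types: set[str] = field(default_factory=set)


def expressing_cell_types(gene: str, threshold: float) -> set[str]:
    """Cell types in which `gene` meets or exceeds the expression threshold."""
    return {ct for ct, value in EXPRESSION.get(gene, {}).items() if value >= threshold}


def annotate(interaction: Interaction, threshold: float = 20.0) -> Interaction:
    """Fill in likely sender/receiver cell types from the expression table."""
    interaction.sender_cell_types = expressing_cell_types(interaction.ligand, threshold)
    # A multi-subunit receptor needs all subunits expressed in the same cell type.
    subunit_sets = [expressing_cell_types(g, threshold) for g in interaction.receptor_subunits]
    interaction.receiver_cell_types = set.intersection(*subunit_sets) if subunit_sets else set()
    return interaction


record = annotate(Interaction("TGFB1", ["TGFBR1", "TGFBR2"], pathway="TGFb"))
print(record.sender_cell_types, record.receiver_cell_types)
```

A hard threshold is the simplest possible choice here; a real version would probably need something more careful (e.g. fraction of expressing cells, or agreement with literature evidence) before calling a cell type a likely sender or receiver.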
Challenges/Considerations: There is much to consider, and I think I will have a better idea of what obstacles exist as I continue to work on this project and read more literature. As of now, I think the primary challenge for me is assessing the value or potential impact of the results of this endeavour. Will a CCC DB tool that includes cell type specific information be all that helpful to the researchers using it? Am I considering the desires and needs of researchers studying intercellular communication? Another pragmatic issue is whether or not this project idea will provide me with enough intellectual stimulation (DB construction has never seemed all that glamorous when it comes to methods lol). What are some other challenges I will need to consider (please chime in on any questions asked in this section and throughout the post)?
As of now, I don't have much to show regarding the development of this project idea, and I think a part of that has to do with the lack of feedback or discussion I have had regarding this, so I am opening the floor to any and all feedback. Let me know if some things need to be clarified. I think my chief concern may be the possibility of reinventing a subpar wheel.