HoloCell: New Foundation Model Integrates Three Layers of Cellular Data
Researchers have developed HoloCell, a generative foundation model that integrates epigenomic, transcriptomic, and proteomic data from individual cells into a unified framework. The model was trained on approximately 468 million single-cell profiles across these three omics layers, and the authors present it as the first to combine all three major cellular data types in a single model. This advance could enable a more comprehensive understanding of cellular states and support in silico simulation of cellular systems.
HoloCell is a large-scale generative foundation model with more than 860 million parameters, trained on the Human-Multi-Omics-Corpus of approximately 468 million single-cell profiles spanning epigenomics, transcriptomics, and proteomics. The model uses a hierarchical tokenization strategy that encodes cis-regulatory elements, genes, and proteins as structured tokens within a shared framework, which lets it handle cells in which one or more modalities are missing. The researchers evaluated HoloCell on several tasks: single-omics representation learning, paired and unpaired multi-omics integration, and cross-modal generation using iterative diffusion and remasking. The authors report superior performance in capturing cell heterogeneity as an integrated system and in simulating the flow of multi-omics information in silico. They position HoloCell as a step toward a virtual cell, offering both systematic characterization and generative simulation of cellular systems.
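To make the ideas of a shared token sequence with missing modalities and iterative remasking more concrete, here is a minimal toy sketch in Python. It is not the HoloCell implementation, whose details the summary above does not give. The names Token, tokenize_cell, generate, and toy_predict, the layer labels, and the discretized "high"/"low" levels are all assumptions made for illustration, and toy_predict is a hypothetical stand-in for a trained model.

```python
from collections import Counter
from dataclasses import dataclass

MASK = "[MASK]"
LAYERS = ("epi", "rna", "prot")  # assumed labels for the three omics layers


@dataclass(frozen=True)
class Token:
    layer: str    # which omics layer the token belongs to
    feature: str  # e.g. a regulatory element, gene, or protein name
    value: str    # discretized level, or MASK when unobserved


def tokenize_cell(profile, vocab):
    """Flatten one cell into a single sequence shared by all layers.

    profile: {layer: {feature: level}}; a whole layer may be absent.
    vocab:   {layer: [feature, ...]} fixes the feature order.
    Unobserved features become MASK tokens instead of being dropped.
    """
    tokens = []
    for layer in LAYERS:
        observed = profile.get(layer, {})
        for feature in vocab[layer]:
            tokens.append(Token(layer, feature, observed.get(feature, MASK)))
    return tokens


def generate(tokens, predict, steps=4):
    """Fill MASK tokens over several rounds.

    Each round predicts every open position, commits only the most
    confident guesses, and leaves the rest masked for the next round.
    predict(tokens, i) must return (value, confidence).
    """
    tokens = list(tokens)
    fixed = {i for i, t in enumerate(tokens) if t.value != MASK}
    for step in range(1, steps + 1):
        open_idx = [i for i in range(len(tokens)) if i not in fixed]
        if not open_idx:
            break
        guesses = {i: predict(tokens, i) for i in open_idx}
        # Commit a growing share each round; the last round commits all.
        if step == steps:
            keep_n = len(open_idx)
        else:
            keep_n = max(1, len(open_idx) * step // steps)
        ranked = sorted(open_idx, key=lambda i: guesses[i][1], reverse=True)
        for i in ranked[:keep_n]:
            t = tokens[i]
            tokens[i] = Token(t.layer, t.feature, guesses[i][0])
            fixed.add(i)
    return tokens


def toy_predict(tokens, i):
    """Hypothetical predictor: most common observed level, fake confidence."""
    seen = Counter(t.value for t in tokens if t.value != MASK)
    value = seen.most_common(1)[0][0] if seen else "low"
    return value, 1.0 / (1 + i)


if __name__ == "__main__":
    vocab = {"epi": ["peak1"], "rna": ["GENE_A", "GENE_B"], "prot": ["PROT_A"]}
    rna_only = {"rna": {"GENE_A": "high", "GENE_B": "high"}}
    for tok in generate(tokenize_cell(rna_only, vocab), toy_predict, steps=2):
        print(tok)
```

In this sketch a cell measured only at the RNA layer still yields a full-length sequence, and the generation loop never overwrites observed tokens. A real model would replace toy_predict with learned predictions over a far larger vocabulary.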
What's missing
The preprint does not provide information on: (1) computational requirements and inference time for practical use; (2) validation on independent external datasets beyond the training corpus; (3) comparison with other recent multi-omics integration methods; (4) limitations regarding cell types or tissues underrepresented in the training data; (5) availability of code and model weights for reproducibility.
What different sources said
- bioRxiv (Center): HoloCell: A Generative Foundation Model for Holistic Cellular Modeling