Publications | Jun 11 | 83% confidence

New Method Uses Interpretable AI Features to Predict Enzyme Functions in Microbial Proteins

Center 100%

1 source

Researchers used sparse autoencoder (SAE) features derived from the ESMC-6B protein language model to predict Enzyme Commission (EC) numbers for microbial proteins, achieving 78.9% top-1 accuracy on a benchmark of 4,868 enzymes without task-specific training. The approach addresses a longstanding challenge: millions of microbial proteins have unknown enzymatic functions, and most deep learning methods offer little mechanistic insight. The method is both interpretable, linking predictions to biological concepts such as catalytic triads and Rossmann folds, and computationally lightweight, with the potential to screen billions of uncharacterized proteins.

A preprint posted to arXiv presents a framework for predicting enzyme function in microbial proteins using a 16,384-dimensional codebook of interpretable biological features extracted from the ESMC-6B protein language model via a sparse autoencoder (SAE). On a balanced benchmark of 4,868 SwissProt microbial enzymes spanning 161 EC subclasses, the binary SAE features achieved 78.9% top-1 and 88.5% top-5 accuracy, outperforming 3-mer sequence baselines (57.3%) by 21.6 percentage points, a relative improvement of roughly 38%. In a more challenging leave-one-EC3-class-out evaluation, designed to simulate the discovery of genuinely novel enzyme classes, the SAE features recovered the correct EC1 superclass in 47.7% of cases, compared with 26.6% for sequence-based methods. That is 3.3 times the random baseline of 14.3% (one in seven top-level EC classes). Each SAE feature was annotated using GPT-5, and the discriminative features mapped onto known biochemical mechanisms: catalytic triad geometry for hydrolases, NAD(P)H-binding Rossmann folds for oxidoreductases, and phosphate-binding P-loops for transferases. The authors applied the approach to the ESM Atlas of 7.7 million protein clusters, identifying 169,859 candidate 'dark enzyme-like' proteins distributed across all major microbial phyla. The method requires no GPU-intensive inference at prediction time, making it scalable to the billions of proteins in large metagenomic databases.
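The leave-one-EC3-class-out protocol described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each protein is represented by the set of SAE feature indices active for it, and it swaps in a nearest-neighbor lookup by Jaccard similarity as the predictor, since the summary does not say how predictions are made from the features. The helper names (`ec_prefix`, `leave_one_ec3_out`, `predict_ec1`, `ec1_recovery_rate`) and the toy records are hypothetical.

```python
from collections import defaultdict


def ec_prefix(ec: str, level: int) -> str:
    """Return the first `level` fields of an EC number ('3.4.21.4', 3 -> '3.4.21')."""
    return ".".join(ec.split(".")[:level])


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two sets of active feature indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leave_one_ec3_out(proteins):
    """Yield (held_out_class, train, test), holding out one whole EC3 class at a time."""
    by_class = defaultdict(list)
    for p in proteins:
        by_class[ec_prefix(p["ec"], 3)].append(p)
    for held_out, test in by_class.items():
        train = [p for p in proteins if ec_prefix(p["ec"], 3) != held_out]
        yield held_out, train, test


def predict_ec1(query_features: frozenset, train) -> str:
    """ASSUMPTION: predict the EC1 class of the most similar training protein."""
    best = max(train, key=lambda p: jaccard(query_features, p["features"]))
    return ec_prefix(best["ec"], 1)


def ec1_recovery_rate(proteins) -> float:
    """Fraction of held-out proteins whose top-level EC class is recovered."""
    hits = total = 0
    for _, train, test in leave_one_ec3_out(proteins):
        for p in test:
            hits += predict_ec1(p["features"], train) == ec_prefix(p["ec"], 1)
            total += 1
    return hits / total


# Hypothetical toy records; the feature indices are made up for illustration.
TOY = [
    {"ec": "3.4.21.4", "features": frozenset({1, 2, 3})},
    {"ec": "3.1.1.1", "features": frozenset({1, 2, 4})},
    {"ec": "1.1.1.1", "features": frozenset({7, 8, 9})},
    {"ec": "1.2.1.3", "features": frozenset({7, 8, 10})},
]

# Chance level when guessing uniformly among the seven top-level EC classes.
RANDOM_BASELINE = 1 / 7

if __name__ == "__main__":
    print(f"EC1 recovery on toy data: {ec1_recovery_rate(TOY):.2f}")
    print(f"Random baseline: {RANDOM_BASELINE:.1%}")
```

Because every protein sharing the held-out third-level class is removed from the training side, a correct EC1 prediction has to come from a related but distinct class. That is what makes this split a stand-in for encountering a novel enzyme class.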

What's missing

As a preprint, this work has not yet undergone peer review. Key limitations and open questions include: the benchmark relies on SwissProt-annotated enzymes, which may not fully represent the diversity of truly novel or poorly characterized enzymes; the accuracy of the GPT-5-generated feature annotations has not been independently validated; the 169,859 dark enzyme candidates are computationally predicted and lack experimental confirmation; and it is unclear how performance degrades for multi-functional enzymes. The study also does not report false positive rates for the dark enzyme candidate screen.

What different sources said

arXiv q-bio | Center
Interpretable enzyme function prediction via sparse autoencoder features of ESMC across the microbial protein universe

Publications

Gut Bacteria Enzyme Found to Break Down Heat-Processed Food Compounds, Producing Novel Biogenic Amines

Researchers have discovered that an enzyme in common gut bacteria can degrade N-epsilon-carboxymethyllysine (CML), a compound formed during thermal food processing, producing previously unknown biogenic amines. The enzyme, ornithine decarboxylase SpeC from enterobacteria, acts on CML and related modified lysine derivatives through a low-level 'underground' catalytic activity. This finding suggests a previously unrecognized communication axis between thermally processed dietary compounds and gut microbial physiology, with potential implications for host health.

1 source | Jun 13

Publications

Full-Length Gene Sequencing Reveals Two Distinct Bacterial Communities in Black-Legged Ticks Expanding Into Canada

Researchers used Oxford Nanopore full-length 16S rRNA gene sequencing to characterize the microbiome of Ixodes scapularis black-legged ticks collected in Nova Scotia, Canada, distinguishing between tick-adapted bacteria and environmentally acquired bacteria. The study comes as I. scapularis — the primary vector of Lyme disease — is rapidly expanding northward into Canada due to climate change. The findings suggest that environmentally derived bacteria in tick microbiomes are not mere contamination, which has implications for how tick microbiome data is collected and interpreted across surveillance studies.

1 source | Jun 13

Publications

Study Identifies Metabolic Link Between Cell Envelope Stress and Biofilm Formation in Bacteria

Researchers have discovered that the metabolite acetyl-CoA directly inhibits enzymes that degrade the bacterial signaling molecule c-di-GMP, connecting cell envelope biosynthesis stress to biofilm formation in Pseudomonas aeruginosa. The study found that sub-inhibitory concentrations of antibiotics targeting early peptidoglycan biosynthesis — but not other antibiotic classes — elevate c-di-GMP levels by reducing phosphodiesterase activity, with acetyl-CoA competing for the enzyme active site. Because the relevant enzyme domain is broadly conserved across bacterial species, this checkpoint mechanism may be widespread and could have implications for understanding antibiotic-induced biofilm responses.

1 source | Jun 13
