Publications · Jun 11 · 83% confidence

Compiler-Friendly State Space Duality Enables Portable, Efficient Mamba-2 Inference Without Custom Kernels

Center 100%

1 source

Researchers have demonstrated a compiler-friendly implementation of Mamba-2 state space model inference built from standard JAX primitives, with no custom CUDA or Triton kernels. The approach exploits structural properties of the state space duality (SSD) recurrence (diagonal dynamics, fixed-size chunking, and static control flow) so that a single-source codebase runs unmodified on both Google TPU v6e and NVIDIA L40S hardware. The result removes the dependence on hardware-specific kernels while keeping inference efficient: cached decoding runs 27–36 times faster than full-prefix recomputation at a 4096-token context.
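To make the "fixed-size chunking" and "static control flow" properties concrete, here is a minimal JAX sketch of a chunked evaluation of a diagonal linear recurrence, checked against a step-by-step scan. It is not the authors' code. The function names, the single-head layout, the scalar per-step decay `a`, and the toy sizes are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

def ssm_sequential(a, x, B, C):
    """Reference: h_t = a_t * h_{t-1} + outer(x_t, B_t); y_t = h_t @ C_t."""
    def step(h, inp):
        a_t, x_t, B_t, C_t = inp
        h = a_t * h + jnp.einsum("p,n->pn", x_t, B_t)
        return h, jnp.einsum("pn,n->p", h, C_t)
    h0 = jnp.zeros((x.shape[1], B.shape[1]), dtype=x.dtype)
    _, y = jax.lax.scan(step, h0, (a, x, B, C))
    return y

def ssd_chunked(a, x, B, C, chunk=4):
    """Chunked form (hypothetical sketch): einsums inside fixed-size chunks,
    plus a short scan that carries the state across chunk boundaries."""
    T, P = x.shape
    N = B.shape[1]
    nc = T // chunk  # assumes T is a multiple of chunk, so all shapes are static
    xc = x.reshape(nc, chunk, P)
    Bc = B.reshape(nc, chunk, N)
    Cc = C.reshape(nc, chunk, N)
    cs = jnp.cumsum(jnp.log(a).reshape(nc, chunk), axis=1)  # log-decay prefix sums

    # Intra-chunk term: masked decay matrix L[i, j] = prod_{k=j+1..i} a_k for j <= i.
    mask = jnp.tril(jnp.ones((chunk, chunk), dtype=bool))
    diff = cs[:, :, None] - cs[:, None, :]
    L = jnp.where(mask, jnp.exp(jnp.where(mask, diff, 0.0)), 0.0)
    scores = jnp.einsum("cin,cjn->cij", Cc, Bc) * L
    y_intra = jnp.einsum("cij,cjp->cip", scores, xc)

    # State contributed by each chunk, decayed to the chunk's last position.
    decay_to_end = jnp.exp(cs[:, -1:] - cs)
    chunk_states = jnp.einsum("cj,cjp,cjn->cpn", decay_to_end, xc, Bc)
    chunk_decay = jnp.exp(cs[:, -1])

    def carry(h, inp):
        s, d = inp
        return d * h + s, h  # emit the state entering this chunk

    h0 = jnp.zeros((P, N), dtype=x.dtype)
    _, h_in = jax.lax.scan(carry, h0, (chunk_states, chunk_decay))

    # Inter-chunk term: the incoming state, decayed to each position in the chunk.
    y_inter = jnp.einsum("ci,cin,cpn->cip", jnp.exp(cs), Cc, h_in)
    return (y_intra + y_inter).reshape(T, P)

# Toy sizes, chosen only for the demonstration.
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
T, P, N = 16, 8, 4
a = jax.random.uniform(k1, (T,), minval=0.5, maxval=0.99)
x = jax.random.normal(k2, (T, P))
B = jax.random.normal(k3, (T, N))
C = jax.random.normal(k4, (T, N))

y_ref = ssm_sequential(a, x, B, C)
y_chunked = ssd_chunked(a, x, B, C, chunk=4)
max_err = float(jnp.max(jnp.abs(y_ref - y_chunked)))
```

Because the chunk size is a fixed Python integer, every array shape is known at trace time and the only loop is a `jax.lax.scan` over chunks. That is the kind of structure a compiler such as XLA can lower for either backend without hand-written kernels.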

A preprint posted to arXiv presents a method for running Mamba-2 sequence models efficiently across different accelerator hardware without relying on hardware-specific fused kernels. The authors express the state space duality (SSD) recurrence in standard JAX primitives, relying on its diagonal per-head dynamics, fixed-size chunking, einsum-dominated compute, and static control flow, to produce a single-source inference path compatible with both TPU and GPU backends. On a Google Cloud TPU v6e, batch-1 prefill achieves approximately 140 TFLOPS (15% model FLOP utilisation, described as the roofline ceiling for this regime), while cached decode reaches up to 64% hardware bandwidth utilisation. Across five Mamba-2 checkpoints ranging from 130M to 2.7B parameters, cached decoding is 27–36 times faster than full-prefix recomputation at a 4096-token context. Numerical accuracy is validated against the Triton reference implementation (mamba_ssm v2.2.2): WikiText-103 perplexity matches within ±0.0005 points, and hidden states agree to float32 rounding tolerance. The work addresses a key portability bottleneck in deploying state space models at scale, where prior high-throughput implementations were tightly coupled to specific GPU kernel ecosystems.
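The gap between cached decoding and full-prefix recomputation follows from the recurrence itself: the state after the prefix is a fixed-size summary, so producing the next token needs one update rather than a pass over the whole context. The JAX sketch below illustrates this on a toy diagonal recurrence. It is an illustrative assumption, not the paper's implementation. The names `decode_step` and `prefill`, the single-head layout, and the sizes are hypothetical.

```python
import jax
import jax.numpy as jnp

def decode_step(h, inp):
    """One cached step: constant work regardless of how long the prefix was."""
    a_t, x_t, B_t, C_t = inp
    h = a_t * h + jnp.einsum("p,n->pn", x_t, B_t)  # scalar decay plus rank-1 update
    y_t = jnp.einsum("pn,n->p", h, C_t)
    return h, y_t

def prefill(a, x, B, C):
    """Run the recurrence over a whole prefix; return the final state and all outputs."""
    h0 = jnp.zeros((x.shape[1], B.shape[1]), dtype=x.dtype)
    return jax.lax.scan(decode_step, h0, (a, x, B, C))

# Toy sizes, chosen only for the demonstration.
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(1), 4)
T, P, N = 12, 8, 4
a = jax.random.uniform(k1, (T,), minval=0.5, maxval=0.99)
x = jax.random.normal(k2, (T, P))
B = jax.random.normal(k3, (T, N))
C = jax.random.normal(k4, (T, N))

# Cached path: keep the (P, N) state from the first T-1 tokens, then take one step.
h_cache, _ = prefill(a[:-1], x[:-1], B[:-1], C[:-1])
h_new, y_cached = decode_step(h_cache, (a[-1], x[-1], B[-1], C[-1]))

# Recompute path: rerun all T tokens just to obtain the last output.
_, y_all = prefill(a, x, B, C)
y_recomputed = y_all[-1]

outputs_match = bool(jnp.allclose(y_cached, y_recomputed, rtol=1e-4, atol=1e-4))
```

The cache here is a single `(P, N)` array whose size does not grow with context length, in contrast to a Transformer key-value cache, which grows with every token.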

What's missing

Performance on other widely used accelerators (e.g., NVIDIA A100/H100 or AMD GPUs) and on multi-device configurations is not reported. Throughput comparisons against optimized Transformer inference systems are also absent, leaving open how this approach fares relative to attention-based alternatives at equivalent model scales. The work is a preprint and has not yet undergone formal peer review.

What different sources said

arXiv cs.AI · Center
Compiler-First State Space Duality and Portable $O(1)$ Autoregressive Caching for Inference

Publications

Gut Bacteria Enzyme Found to Break Down Heat-Processed Food Compounds, Producing Novel Biogenic Amines

Researchers have discovered that an enzyme in common gut bacteria can degrade N-epsilon-carboxymethyllysine (CML), a compound formed during thermal food processing, producing previously unknown biogenic amines. The enzyme, ornithine decarboxylase SpeC from enterobacteria, acts on CML and related modified lysine derivatives through a low-level 'underground' catalytic activity. This finding suggests a previously unrecognized communication axis between thermally processed dietary compounds and gut microbial physiology, with potential implications for host health.

1 source · Jun 13

Publications

Full-Length Gene Sequencing Reveals Two Distinct Bacterial Communities in Black-Legged Ticks Expanding Into Canada

Researchers used Oxford Nanopore full-length 16S rRNA gene sequencing to characterize the microbiome of Ixodes scapularis black-legged ticks collected in Nova Scotia, Canada, distinguishing between tick-adapted bacteria and environmentally acquired bacteria. The study comes as I. scapularis — the primary vector of Lyme disease — is rapidly expanding northward into Canada due to climate change. The findings suggest that environmentally derived bacteria in tick microbiomes are not mere contamination, which has implications for how tick microbiome data is collected and interpreted across surveillance studies.

1 source · Jun 13

Publications

Study Identifies Metabolic Link Between Cell Envelope Stress and Biofilm Formation in Bacteria

Researchers have discovered that the metabolite acetyl-CoA directly inhibits enzymes that degrade the bacterial signaling molecule c-di-GMP, connecting cell envelope biosynthesis stress to biofilm formation in Pseudomonas aeruginosa. The study found that sub-inhibitory concentrations of antibiotics targeting early peptidoglycan biosynthesis — but not other antibiotic classes — elevate c-di-GMP levels by reducing phosphodiesterase activity, with acetyl-CoA competing for the enzyme active site. Because the relevant enzyme domain is broadly conserved across bacterial species, this checkpoint mechanism may be widespread and could have implications for understanding antibiotic-induced biofilm responses.

1 source · Jun 13
