Publications · Jun 10 · 87% confidence

AdaGC: New Adaptive Gradient Clipping Method Improves Large Language Model Training Stability


1 source

Researchers have proposed AdaGC, an adaptive per-tensor gradient clipping scheme designed to prevent loss spikes, a form of training instability in large language model pretraining. Loss spikes are a longstanding problem that arises when data outliers, hardware faults, numerical precision issues, and hyperparameter settings combine to corrupt optimizer state. The method, accepted at ICML 2026, reduced spike scores to zero and improved downstream accuracy by up to 2.48% over the standard global gradient clipping baseline across three tested models.

AdaGC is a gradient clipping technique that addresses loss spikes in large-scale language model pretraining by bounding each tensor's gradient norm relative to a tensor-wise exponential moving average (EMA) of its historical clipped gradient norms, rather than applying a single global clip threshold. The authors argue that loss spikes are rarely caused by a single factor but instead emerge from the confluence of data outliers, hardware faults, numerical precision problems, and hyperparameter choices, all of which manifest as abnormal gradients that contaminate optimizer moment estimates. AdaGC is described as optimizer-agnostic, meaning it can be combined with standard optimizers as well as newer ones like Muon and Lion, and it introduces negligible additional memory overhead. Experiments on Llama-2 7B, Mixtral 8x1B, and ERNIE 10B-A1.4B showed that AdaGC consistently reduced spike scores to zero and improved downstream task accuracy by 1.32%, 1.27%, and 2.48% respectively compared to global gradient clipping. The method also reduces inter-device communication costs: global gradient clipping must aggregate gradient norms across all tensors in distributed training setups, whereas a per-tensor threshold does not depend on that global aggregate. The paper was accepted at ICML 2026 and code has been made publicly available.
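The per-tensor rule described above can be illustrated with a minimal sketch. This is not the authors' released code: the class name `PerTensorEmaClipper`, the default values of `rel_threshold` and `beta`, and the first-step initialization (seeding the average with the first observed norm) are assumptions made for illustration only. Gradients are plain Python lists keyed by a tensor name to keep the example self-contained.

```python
import math


def l2_norm(values):
    """L2 norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


class PerTensorEmaClipper:
    """Illustrative per-tensor clipper; names and defaults are hypothetical."""

    def __init__(self, rel_threshold=1.05, beta=0.98):
        self.rel_threshold = rel_threshold  # allowed ratio of norm to EMA (assumed value)
        self.beta = beta                    # EMA decay (assumed value)
        self.ema_norm = {}                  # tensor name -> EMA of clipped norms

    def clip(self, grads):
        """Return clipped copies of `grads` (dict: name -> list of floats)."""
        clipped = {}
        for name, grad in grads.items():
            norm = l2_norm(grad)
            ema = self.ema_norm.get(name)
            if ema is None:
                # Assumption: seed the EMA with the first norm and skip clipping.
                scale = 1.0
                self.ema_norm[name] = norm
            else:
                limit = self.rel_threshold * ema
                scale = min(1.0, limit / norm) if norm > 0.0 else 1.0
                # Track the clipped norm so an outlier cannot inflate the average.
                self.ema_norm[name] = self.beta * ema + (1.0 - self.beta) * norm * scale
            clipped[name] = [scale * v for v in grad]
        return clipped
```

Because each tensor is compared only with its own history, the threshold needs one scalar of state per tensor and no norm computed over the whole model, which is consistent with the memory and communication claims above. Updating the average with the clipped norm rather than the raw norm is what keeps a single abnormal gradient from raising the threshold for later steps.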

What's missing

The paper does not report the wall-clock training time overhead that AdaGC introduces relative to global gradient clipping (GlobalGC), nor does it evaluate scaling behavior beyond 10B parameters.

What different sources said

arXiv cs.LG · Center
AdaGC: Enhancing LLM Pretraining Stability via Adaptive Gradient Clipping

Publications

Gut Bacteria Enzyme Found to Break Down Heat-Processed Food Compounds, Producing Novel Biogenic Amines

Researchers have discovered that an enzyme in common gut bacteria can degrade N-epsilon-carboxymethyllysine (CML), a compound formed during thermal food processing, producing previously unknown biogenic amines. The enzyme, ornithine decarboxylase SpeC from enterobacteria, acts on CML and related modified lysine derivatives through a low-level 'underground' catalytic activity. This finding suggests a previously unrecognized communication axis between thermally processed dietary compounds and gut microbial physiology, with potential implications for host health.

1 source · Jun 13

Publications

Full-Length Gene Sequencing Reveals Two Distinct Bacterial Communities in Black-Legged Ticks Expanding Into Canada

Researchers used Oxford Nanopore full-length 16S rRNA gene sequencing to characterize the microbiome of Ixodes scapularis black-legged ticks collected in Nova Scotia, Canada, distinguishing between tick-adapted bacteria and environmentally acquired bacteria. The study comes as I. scapularis — the primary vector of Lyme disease — is rapidly expanding northward into Canada due to climate change. The findings suggest that environmentally derived bacteria in tick microbiomes are not mere contamination, which has implications for how tick microbiome data is collected and interpreted across surveillance studies.

1 source · Jun 13

Publications

Study Identifies Metabolic Link Between Cell Envelope Stress and Biofilm Formation in Bacteria

Researchers have discovered that the metabolite acetyl-CoA directly inhibits enzymes that degrade the bacterial signaling molecule c-di-GMP, connecting cell envelope biosynthesis stress to biofilm formation in Pseudomonas aeruginosa. The study found that sub-inhibitory concentrations of antibiotics targeting early peptidoglycan biosynthesis — but not other antibiotic classes — elevate c-di-GMP levels by reducing phosphodiesterase activity, with acetyl-CoA competing for the enzyme active site. Because the relevant enzyme domain is broadly conserved across bacterial species, this checkpoint mechanism may be widespread and could have implications for understanding antibiotic-induced biofilm responses.

1 source · Jun 13
