Publications · Jun 11 · 85% confidence

Researchers Develop Method to Estimate Rare Harmful Outputs in Language Models

Center 100% · 1 source

A team of researchers has developed a statistical technique for efficiently estimating how likely a large language model is to produce rare but harmful outputs. The paper has been accepted for presentation at ICML 2026. Current safety evaluations focus on identifying harmful inputs but largely ignore the probabilistic nature of model outputs and their tail behavior. The method matters because even very low-probability harmful outputs, on the order of 1 in 10,000, will occur frequently when models are queried billions of times daily.
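To make the scale argument concrete, here is the back-of-the-envelope arithmetic. It assumes, for illustration only, N = 10^9 queries per day (the article says only "billions") and that each query independently yields a harmful output with probability p = 10^-4:

```latex
\mathbb{E}[\text{harmful outputs per day}] = Np = 10^{9} \times 10^{-4} = 10^{5},
\qquad
\Pr[\text{at least one}] = 1 - (1 - p)^{N} \approx 1 - e^{-Np} = 1 - e^{-10^{5}} \approx 1.
```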

The paper, posted to arXiv, addresses a gap in AI safety evaluation: existing benchmarks identify inputs likely to produce harmful outputs but do not quantify how probable those harmful outputs actually are. The authors propose using importance sampling, a statistical technique for estimating rare-event probabilities. They construct "unsafe" versions of a target language model that make harmful outputs artificially more frequent, which allows efficient probability estimation without exhaustive brute-force sampling. On benchmarks covering both misuse and misalignment scenarios, the method matches the accuracy of standard Monte Carlo estimation while requiring 10 to 20 times fewer samples. Concretely, the technique can estimate harmful-output probabilities as low as 10^-4 (1 in 10,000) using only 500 samples. Beyond probability estimation, the authors find that their harmfulness estimates can also reveal how sensitive a model is to small perturbations in its input and can serve as a predictor of real-world deployment risks. The work argues that rigorous rare-event estimation is both necessary and practically achievable for responsible AI safety evaluation.
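The core idea of importance sampling can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the 1,000-item output space, the `HARMFUL` set, and the mixture-style "unsafe" proposal are all hypothetical stand-ins (the paper builds its unsafe variants from the target language model itself). Samples are drawn from the proposal q, and each harmful sample is reweighted by p(x)/q(x). This keeps the estimate unbiased as long as q(x) > 0 wherever the target p gives a harmful output nonzero probability. For contrast, naive Monte Carlo with 500 samples at p = 10^-4 expects only 500 × 10^-4 = 0.05 harmful samples, so it will usually return an estimate of exactly 0.

```python
import random

# --- Hypothetical toy setup (not from the paper) ---
NUM_OUTPUTS = 1_000             # size of the toy output space
HARMFUL = frozenset(range(5))   # output indices we label "harmful"
P_HARM_TRUE = 1e-4              # ground-truth harmful mass, used only to build p


def target_probs():
    """Toy target model p(x): harmful outputs share total mass P_HARM_TRUE."""
    harm_each = P_HARM_TRUE / len(HARMFUL)
    safe_each = (1 - P_HARM_TRUE) / (NUM_OUTPUTS - len(HARMFUL))
    return [harm_each if x in HARMFUL else safe_each for x in range(NUM_OUTPUTS)]


def unsafe_proposal(p, mix=0.5):
    """Toy 'unsafe' proposal q(x): with probability `mix`, emit a uniformly
    chosen harmful output; otherwise sample from p. Keeping p in the mixture
    (with mix < 1) ensures q(x) > 0 wherever p(x) > 0."""
    u = 1 / len(HARMFUL)
    return [(1 - mix) * px + (mix * u if x in HARMFUL else 0.0)
            for x, px in enumerate(p)]


def importance_sampling_estimate(p, q, n, rng):
    """Unbiased estimate of P_p(harmful) from n samples drawn from q:
    the mean of 1[x is harmful] * p(x) / q(x)."""
    xs = rng.choices(range(len(q)), weights=q, k=n)
    return sum(p[x] / q[x] for x in xs if x in HARMFUL) / n


def monte_carlo_estimate(p, n, rng):
    """Naive estimate: fraction of n samples drawn from p that are harmful."""
    xs = rng.choices(range(len(p)), weights=p, k=n)
    return sum(x in HARMFUL for x in xs) / n


if __name__ == "__main__":
    rng = random.Random(0)
    p = target_probs()
    q = unsafe_proposal(p)
    print("true value:", P_HARM_TRUE)
    print("IS, 500 samples:", importance_sampling_estimate(p, q, 500, rng))
    print("MC, 500 samples:", monte_carlo_estimate(p, 500, rng))
```

Mixing the target distribution back into the proposal (often called defensive importance sampling) is a design choice made here to keep the weights on safe outputs bounded; it is not necessarily how the paper constructs its unsafe models.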

What's missing

It is unclear whether the 'unsafe' model variants used for importance sampling introduce their own biases that could systematically skew probability estimates. The study's benchmarks may not cover the full diversity of real-world deployment contexts, and the method's performance on multimodal or instruction-tuned models beyond those tested remains an open question.

What different sources said

arXiv cs.AI · Center
Estimating Tail Risks in Language Model Output Distributions

Publications

Gut Bacteria Enzyme Found to Break Down Heat-Processed Food Compounds, Producing Novel Biogenic Amines

Researchers have discovered that an enzyme in common gut bacteria can degrade N-epsilon-carboxymethyllysine (CML), a compound formed during thermal food processing, producing previously unknown biogenic amines. The enzyme, ornithine decarboxylase SpeC from enterobacteria, acts on CML and related modified lysine derivatives through a low-level 'underground' catalytic activity. This finding suggests a previously unrecognized communication axis between thermally processed dietary compounds and gut microbial physiology, with potential implications for host health.

1 source · Jun 13

Publications

Full-Length Gene Sequencing Reveals Two Distinct Bacterial Communities in Black-Legged Ticks Expanding Into Canada

Researchers used Oxford Nanopore full-length 16S rRNA gene sequencing to characterize the microbiome of Ixodes scapularis black-legged ticks collected in Nova Scotia, Canada, distinguishing between tick-adapted bacteria and environmentally acquired bacteria. The study comes as I. scapularis — the primary vector of Lyme disease — is rapidly expanding northward into Canada due to climate change. The findings suggest that environmentally derived bacteria in tick microbiomes are not mere contamination, which has implications for how tick microbiome data is collected and interpreted across surveillance studies.

1 source · Jun 13

Publications

Study Identifies Metabolic Link Between Cell Envelope Stress and Biofilm Formation in Bacteria

Researchers have discovered that the metabolite acetyl-CoA directly inhibits enzymes that degrade the bacterial signaling molecule c-di-GMP, connecting cell envelope biosynthesis stress to biofilm formation in Pseudomonas aeruginosa. The study found that sub-inhibitory concentrations of antibiotics targeting early peptidoglycan biosynthesis — but not other antibiotic classes — elevate c-di-GMP levels by reducing phosphodiesterase activity, with acetyl-CoA competing for the enzyme active site. Because the relevant enzyme domain is broadly conserved across bacterial species, this checkpoint mechanism may be widespread and could have implications for understanding antibiotic-induced biofilm responses.

1 source · Jun 13
