TellWell
Publications · 3d ago · Confidence 92%: the share of independent, credible sources corroborating the core facts.

Scaffold Effects on GAIA: Controlled Study Shows Prompting Methods Significantly Impact AI Model Performance Measurements

Center 100%
1 source

A pre-registered controlled study comparing three prompting scaffolds (ReAct, multi-agent, and planner-executor) across five AI models found that scaffold choice alone can shift measured accuracy by up to 28 percentage points on the same tasks. The results challenge two assumptions: that more capable models depend less on scaffolding, and that capability rankings stay stable across prompting approaches. They suggest that published AI capability scores may conflate model ability with scaffold effectiveness, which makes direct performance comparisons unreliable unless prompting methodology is controlled.

Researchers ran a pre-registered controlled comparison of three prompting scaffolds across five models from Anthropic, Google, and OpenAI on GAIA validation tasks at difficulty levels 1 and 2. The study held tasks and conditions fixed and varied only the scaffold, and found that scaffold choice alone produced accuracy differences of up to 28 percentage points within a single model. The hypothesis that more capable models would be less sensitive to scaffolding was rejected: the most capable Anthropic model (Opus) showed the largest gains from structured scaffolds at the harder difficulty level. The predicted advantages of particular scaffolds (multi-agent over ReAct, and planner-executor on file-reading tasks) did not hold consistently across model families, suggesting that model family rather than capability tier determines scaffold sensitivity. The results indicate that a capability measurement taken under a single scaffold is a conditional estimate, and that scaffold-driven performance gaps may not reliably shrink as models improve.
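To make the two headline quantities concrete, here is a minimal sketch of how a per-model scaffold spread (in percentage points) and a per-scaffold model ranking could be computed from a table of accuracies. Everything in it is an assumption for illustration: the model names, the accuracy values, and the helper functions `scaffold_spread_pp` and `ranking` are hypothetical and are not taken from the study.

```python
from typing import Dict, List

# Hypothetical placeholder accuracies (fraction of tasks solved).
# These are NOT results from the study; they only illustrate the arithmetic.
# Layout: model name -> scaffold name -> accuracy on the same fixed task set.
ACCURACY: Dict[str, Dict[str, float]] = {
    "model_a": {"react": 0.40, "multi_agent": 0.55, "planner_executor": 0.62},
    "model_b": {"react": 0.50, "multi_agent": 0.48, "planner_executor": 0.52},
}


def scaffold_spread_pp(by_scaffold: Dict[str, float]) -> float:
    """Best minus worst accuracy across scaffolds, in percentage points."""
    values = list(by_scaffold.values())
    return round((max(values) - min(values)) * 100, 1)


def ranking(accuracy: Dict[str, Dict[str, float]], scaffold: str) -> List[str]:
    """Models ordered from best to worst under a single scaffold."""
    return sorted(accuracy, key=lambda m: accuracy[m][scaffold], reverse=True)


# Spread within each model when only the scaffold changes.
spreads = {model: scaffold_spread_pp(s) for model, s in ACCURACY.items()}

# Model ordering under each scaffold; differing orders mean rankings are unstable.
scaffolds = ["react", "multi_agent", "planner_executor"]
rankings = {s: ranking(ACCURACY, s) for s in scaffolds}

for model, spread in spreads.items():
    print(f"{model}: scaffold spread = {spread} pp")
for s in scaffolds:
    print(f"{s}: {rankings[s]}")
```

With these placeholder numbers the ordering of the two models flips between the ReAct and multi-agent scaffolds, which is the kind of ranking instability the summary describes.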

What's missing

The study does not discuss potential implications for how AI capability benchmarks should be reported or standardized in the future, nor does it address whether existing published capability scores should be reinterpreted in light of these findings.

What different sources said

  • Scaffold Effects on GAIA: A Controlled Comparison
