Self-Harness: LLM Agents That Autonomously Improve Their Own Operating Systems
Researchers introduced Self-Harness, a system in which large language model agents automatically improve their own operating harnesses without human intervention. The approach uses an iterative loop to identify model-specific weaknesses and generate targeted fixes, raising Terminal-Bench-2.0 pass rates by roughly 14 to 21 percentage points across three diverse base models. This work addresses the scalability challenge of manually engineering harnesses for rapidly evolving and diverse LLMs.
Self-Harness is a new paradigm that enables LLM-based agents to autonomously refine the harnesses—the systems that mediate their interaction with environments—through an iterative three-stage process. The method first identifies model-specific failure patterns from execution traces (Weakness Mining), then generates minimal, targeted harness modifications tied to those failures (Harness Proposal), and finally validates changes through regression testing (Proposal Validation). When tested on Terminal-Bench-2.0 with three diverse base models (MiniMax M2.5, Qwen3.5-35B-A3B, and GLM-5), Self-Harness consistently improved performance, with pass rates increasing from 40.5% to 61.9%, 23.8% to 38.1%, and 42.9% to 57.1% respectively. Qualitative analysis showed that the system generates model-specific, executable changes rather than generic instructions, suggesting a scalable alternative to manual harness engineering as LLM diversity and evolution accelerate.
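To make the three stages concrete, here is a minimal Python sketch of a mine, propose, validate loop. It is not the authors' implementation. Every name in it is hypothetical: `Trace`, `Harness`, `mine_weaknesses`, `propose`, `validate`, `self_harness`, the `failure_tag` labels, the `min_count` threshold and the toy task set. In the paper, proposals are generated by the LLM from execution traces. Here `propose` just appends a placeholder rule string, and "regression testing" is modeled as requiring that the candidate solve a strict superset of the tasks the current harness solves.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Trace:
    task_id: str
    passed: bool
    failure_tag: Optional[str] = None  # hypothetical label for the failure pattern


@dataclass(frozen=True)
class Harness:
    rules: tuple[str, ...] = ()  # stand-in for prompts, tools, guards, etc.


def mine_weaknesses(traces: list[Trace], min_count: int = 2) -> list[str]:
    """Stage 1 (Weakness Mining): failure patterns that recur across tasks."""
    counts = Counter(t.failure_tag for t in traces if not t.passed and t.failure_tag)
    return [tag for tag, n in counts.most_common() if n >= min_count]


def propose(harness: Harness, weakness: str) -> Harness:
    """Stage 2 (Harness Proposal): one minimal change tied to one weakness."""
    return Harness(harness.rules + (f"fix:{weakness}",))


def validate(candidate_passed: set[str], baseline_passed: set[str]) -> bool:
    """Stage 3 (Proposal Validation): gain at least one task, regress none."""
    return candidate_passed > baseline_passed  # strict superset


def self_harness(
    harness: Harness,
    run: Callable[[Harness], list[Trace]],
    iterations: int = 3,
) -> tuple[Harness, set[str]]:
    def passed(h: Harness) -> set[str]:
        return {t.task_id for t in run(h) if t.passed}

    best = passed(harness)
    tried: set[str] = set()  # skip weaknesses whose fix was already evaluated
    for _ in range(iterations):
        weaknesses = [w for w in mine_weaknesses(run(harness)) if w not in tried]
        if not weaknesses:
            break
        for weakness in weaknesses:
            tried.add(weakness)
            candidate = propose(harness, weakness)
            candidate_passed = passed(candidate)
            if validate(candidate_passed, best):
                harness, best = candidate, candidate_passed
    return harness, best


# Toy environment (entirely hypothetical): each task needs one fix,
# and applying fix:wrong_cwd happens to break t1.
TASKS = {"t1": None, "t2": "missing_timeout", "t3": "missing_timeout",
         "t4": "wrong_cwd", "t5": "wrong_cwd", "t6": "rare_issue"}
CONFLICTS = {"wrong_cwd": "t1"}


def toy_run(harness: Harness) -> list[Trace]:
    traces = []
    for task_id, need in TASKS.items():
        broken = any(f"fix:{tag}" in harness.rules and victim == task_id
                     for tag, victim in CONFLICTS.items())
        ok = (need is None or f"fix:{need}" in harness.rules) and not broken
        traces.append(Trace(task_id, ok, None if ok else (need or "regressed")))
    return traces


final, solved = self_harness(Harness(), toy_run)
print(final.rules, sorted(solved))  # ('fix:missing_timeout',) ['t1', 't2', 't3']
```

In this toy run the timeout fix is accepted. The working-directory fix would solve t4 and t5 but breaks the previously passing t1, so validation rejects it. The single `rare_issue` failure never reaches the `min_count` threshold. The strict-superset rule is only one reading of "regression testing". A real system might instead compare pass rates on a held-out regression suite.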
What's missing
The paper does not discuss the computational cost or overhead of the Self-Harness iterative process, nor does it compare Self-Harness with alternative automated harness optimization approaches. It also does not address whether the approach generalizes to benchmarks beyond Terminal-Bench-2.0.
What different sources said
- arXiv cs.LG
In-Context Learning for Latent Space Bayesian Optimization
Related
Gut Bacteria Enzyme Found to Break Down Heat-Processed Food Compounds, Producing Novel Biogenic Amines
Researchers have discovered that an enzyme in common gut bacteria can degrade N-epsilon-carboxymethyllysine (CML), a compound formed during thermal food processing, producing previously unknown biogenic amines. The enzyme, ornithine decarboxylase SpeC from enterobacteria, acts on CML and related modified lysine derivatives through a low-level 'underground' catalytic activity. This finding suggests a previously unrecognized communication axis between thermally processed dietary compounds and gut microbial physiology, with potential implications for host health.
Full-Length Gene Sequencing Reveals Two Distinct Bacterial Communities in Black-Legged Ticks Expanding Into Canada
Researchers used Oxford Nanopore full-length 16S rRNA gene sequencing to characterize the microbiome of Ixodes scapularis black-legged ticks collected in Nova Scotia, Canada, distinguishing between tick-adapted bacteria and environmentally acquired bacteria. The study comes as I. scapularis — the primary vector of Lyme disease — is rapidly expanding northward into Canada due to climate change. The findings suggest that environmentally derived bacteria in tick microbiomes are not mere contamination, which has implications for how tick microbiome data is collected and interpreted across surveillance studies.
Study Identifies Metabolic Link Between Cell Envelope Stress and Biofilm Formation in Bacteria
Researchers have discovered that the metabolite acetyl-CoA directly inhibits enzymes that degrade the bacterial signaling molecule c-di-GMP, connecting cell envelope biosynthesis stress to biofilm formation in Pseudomonas aeruginosa. The study found that sub-inhibitory concentrations of antibiotics targeting early peptidoglycan biosynthesis — but not other antibiotic classes — elevate c-di-GMP levels by reducing phosphodiesterase activity, with acetyl-CoA competing for the enzyme active site. Because the relevant enzyme domain is broadly conserved across bacterial species, this checkpoint mechanism may be widespread and could have implications for understanding antibiotic-induced biofilm responses.