TellWell
Publications · 3h ago · Confidence 92% (the share of independent, credible sources corroborating the core facts)

Claw-SWE-Bench: New Benchmark for Evaluating AI Coding Agents Across Multiple Languages

Center 100%
1 source

Researchers have introduced Claw-SWE-Bench, a benchmark for fairly evaluating how well AI agents such as OpenClaw perform on software engineering tasks in 8 programming languages. It addresses limitations of existing evaluation methods by standardizing testing conditions, including fixed prompts, runtime budgets, and workspace requirements. The results show that adapter design and model choice substantially affect both agent performance and cost, and the standardized setup makes comparisons between AI coding systems more reproducible.

Claw-SWE-Bench is a multilingual benchmark of 350 GitHub issue-resolution instances drawn from 43 repositories in 8 languages. It measures how well general-purpose AI agents perform on coding tasks under standardized conditions. The researchers found that existing benchmarks such as SWE-bench do not adequately accommodate different agent architectures (or "claws"), which makes fair comparison difficult. The benchmark comes in a full version and a lighter 80-instance subset for faster validation.

Testing revealed large performance differences. With the same underlying model, OpenClaw reached only 19.1% Pass@1 with a minimal adapter but 73.4% with the full adapter, a 54.3-percentage-point gap that shows how much adapter design matters. Across the model and harness combinations tested, Pass@1 varied by 29.4 percentage points depending on the model and by 27.4 percentage points depending on the harness, and systems with similar accuracy could differ substantially in API cost.
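The article reports results as Pass@1 and as percentage-point differences between configurations but does not spell out the arithmetic. This Python sketch assumes the usual SWE-bench-style reading: Pass@1 is the share of instances an agent resolves on its single attempt, and a "variation" is the max-minus-min spread across configurations. The helper names (`pass_at_1`, `spread_pp`, `cost_per_resolved`) and the toy outcomes and costs are hypothetical; only the 19.1% and 73.4% figures come from the article.

```python
# Hypothetical helpers for SWE-bench-style scoring; not the benchmark's own code.


def pass_at_1(resolved: list[bool]) -> float:
    """Pass@1 in percent: share of instances resolved on the single attempt.

    Each entry is True if the agent's one patch made that instance's tests pass
    (assumed definition).
    """
    if not resolved:
        raise ValueError("no instances to score")
    return 100.0 * sum(resolved) / len(resolved)


def spread_pp(scores: dict[str, float]) -> float:
    """Max minus min Pass@1 across configurations, in percentage points."""
    return max(scores.values()) - min(scores.values())


def cost_per_resolved(total_cost_usd: float, resolved: list[bool]) -> float:
    """Total API spend divided by the number of resolved instances."""
    n_resolved = sum(resolved)
    return float("inf") if n_resolved == 0 else total_cost_usd / n_resolved


if __name__ == "__main__":
    # Reported Pass@1 for OpenClaw with the same model (figures from the article).
    adapters = {"minimal": 19.1, "full": 73.4}
    print(f"adapter gap: {round(spread_pp(adapters), 1)} pp")  # adapter gap: 54.3 pp

    # Toy outcomes for two configurations (invented for illustration).
    toy_a = [True, True, False, True]
    toy_b = [True, False, True, True]
    print(pass_at_1(toy_a), pass_at_1(toy_b))  # 75.0 75.0

    # Same accuracy, different spend: cost per resolved instance separates them.
    print(cost_per_resolved(3.0, toy_a), cost_per_resolved(6.0, toy_b))  # 1.0 2.0
```

Cost per resolved instance is one simple way to compare systems whose Pass@1 is similar but whose API spend differs; the benchmark itself may report cost in a different form.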

What different sources said

  • Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

Related

Publications · Confidence 82%

Genetic Drift, Not Selection, Drives Rapid Feather Color Evolution in Island Bird Radiation

A new study of an island bird radiation found that rapid evolution of feather coloration is driven primarily by genetic drift in small populations rather than by sexual or ecological selection. The researchers combined whole-genome data with detailed plumage measurements from every species in the radiation to test whether the evolution of signaling traits correlates with speciation rates. The findings suggest that neutral demographic processes play a central role in generating phenotypic diversity during island radiations, challenging assumptions about the mechanisms behind rapid evolution.

1 source · 3m ago
Publications · Confidence 82%

New AI Model Improves Prediction of Therapeutic Peptide Function from Protein Sequences

Researchers developed a lightweight CNN classifier that predicts whether peptide sequences have therapeutic properties, trained on a database of 54,655 peptides across 48 functional categories. The model uses a novel negative sampling strategy to reduce false positive rates from over 60% in previous approaches to 2.1%. This advancement could accelerate drug discovery by enabling faster computational screening of peptide candidates before expensive experimental testing.

1 source · 11m ago
Publications · Confidence 82%

Study Shows Different Metabolic Stress Models Produce Distinct Effects on Human Neuronal Networks

Researchers tested three common in vitro metabolic stress models on human-derived neuronal networks and found that each produced different patterns of neuronal activity and cell damage. The models were hypoxia alone, oxygen-glucose deprivation (OGD), and hypoxia combined with glutamate exposure. The findings suggest that the choice of experimental model significantly affects results and that combining electrophysiological and structural analyses is important for accurately assessing metabolic stress in stroke research.

1 source · 11m ago