TellWell
Publications · 3d ago · Confidence 92% (the share of independent, credible sources corroborating the core facts)

iOSWorld: New Benchmark Tests AI Agents' Ability to Understand Personal Data on iPhones

Center 100%
1 source

Researchers have created iOSWorld, the first interactive iOS simulator benchmark designed to test whether AI agents can reason about a user's personal identity, history, and preferences stored on a device. The benchmark comprises 26 custom-built iOS apps populated with connected data such as messages, transactions, and financial records, along with 133 tasks of varying difficulty. It addresses a gap in existing mobile agent benchmarks, which typically test isolated instructions rather than personalized, multi-app reasoning.

iOSWorld is a new open-source benchmark that evaluates how well AI agents can perform tasks that require understanding personal data spread across multiple iPhone apps. It features 26 newly built iOS applications containing realistic, connected data, including transactions, messages, travel records, social relationships, and financial activity. Tasks fall into three categories: single-app tasks (27), multi-app tasks spanning 2 to 8 apps (60), and memory and personalization tasks that require inference from patterns in personal data (46). When tested on frontier and open-source computer-use models, the best-performing configuration achieved 52% overall accuracy but only 37% on multi-app tasks, suggesting significant room for improvement. The researchers found that giving models privileged access to XML accessibility trees improved frontier models' scores by up to 26 percentage points, though smaller models did not benefit from this additional input. The benchmark is being released as open source, with all apps, data, tasks, evaluation rubrics, and code.
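As a quick check of the task breakdown reported above, the short sketch below tallies the three category counts and prints each category's share of the benchmark. The names `TASK_COUNTS` and `category_shares` are illustrative only and are not part of the iOSWorld release; the counts are the ones quoted in this summary.

```python
# Task counts per category, as quoted in the summary above.
# TASK_COUNTS and category_shares are hypothetical names used for illustration.
TASK_COUNTS = {
    "single-app": 27,
    "multi-app": 60,
    "memory and personalization": 46,
}


def category_shares(counts):
    """Return each category's share of all tasks, as a percentage."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


total_tasks = sum(TASK_COUNTS.values())
shares = category_shares(TASK_COUNTS)

print(f"total tasks: {total_tasks}")  # total tasks: 133
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

The three counts sum to the 133 tasks stated in the summary, and multi-app tasks, the category with the lowest reported accuracy, make up roughly 45% of the benchmark.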

What different sources said

  • iOSWorld: A Benchmark for Personally Intelligent Phone Agents
