Waymo Develops New Benchmark Model to Compare Autonomous Driving Safety Against Human Drivers

Waymo has created a new computer model called the Reference Driver, developed with TU Delft and published in Nature Communications, designed to more accurately measure how its autonomous vehicles perform compared to human drivers. The model uses active inference theory to simulate how human drivers behave before and during crashes, improving upon previous models that only captured last-second reactive maneuvers. The advancement comes as Waymo scales operations to more cities and faces increased regulatory scrutiny following incidents like a January crash involving a child in Santa Monica.
Waymo, the Alphabet-owned robotaxi company, has unveiled a new behavioral benchmark model for evaluating autonomous driving safety against human driving standards. Developed in collaboration with TU Delft and published in Nature Communications, the Reference Driver model uses active inference, a framework based on the theory that drivers constantly imagine possible futures and take actions to reach the safest outcome, to simulate human behavior in traffic conflicts and crash scenarios more realistically. Unlike previous industry models, which focused on replicating last-second reactive maneuvers, the Reference Driver can reproduce the full sequence of human driving behavior leading up to a crash, including the internal "surprise" a driver experiences during conflicts. Waymo is making the research code available under an academic, non-commercial license to encourage broader collaboration. The timing is significant: the company faces heightened regulatory attention following a January incident in Santa Monica in which a Waymo vehicle struck a child at 6 mph after decelerating from 17 mph.
What's missing
The study's own limitations and open questions are not detailed in the article. Additionally, the article does not provide information about how the new Reference Driver model's predictions compare quantitatively to the previous model's performance, or specific metrics demonstrating improved accuracy beyond the conceptual framework described.
What different sources said
- The Verge (Left): Waymo built a virtual driver to study how humans react to surprises on the road
- TechCrunch (Center): Waymo says it built a better benchmark for comparing robotaxis to humans
- Engadget (Center): Waymo made a virtual human driver to improve its robotaxis