Stroop Task Reveals Significant Attention Control Limitations in Leading AI Models

Researchers tested major AI systems, including GPT-4o, Claude, and Gemini, on the Stroop task, a classic psychology experiment that measures attention and focus. Performance declined sharply as the word lists grew longer: GPT-4o's accuracy fell from 91% on five-word lists to 15% on forty-word lists, suggesting the models struggle to suppress trained responses and stay focused on instructions. The findings highlight fundamental differences between how AI systems and human brains manage attention and executive control.
A study led by Suketu Patel administered the Stroop task to several leading large language models. In this well-established psychological test, participants name the ink color of each word, and the word itself may spell out a different color. The AI systems performed well on short lists of five words, but accuracy declined sharply as list length increased: GPT-4o dropped from 91% accuracy to 57% at ten words and 15% at forty words, while Claude 3.5 Sonnet fell to 24% accuracy on forty-word lists. When matching and mismatched color words appeared in the same list, performance deteriorated further, with some models showing near-zero accuracy on mismatched items. The researchers found that the models increasingly defaulted to reading the words themselves rather than identifying ink colors, suggesting they could not consistently suppress their most heavily trained responses. This contrasts sharply with humans, who, despite being naturally biased toward reading words, can maintain high accuracy across long lists of conflicting stimuli. The gap indicates a fundamental difference in how biological brains and current AI systems manage attention and executive control.
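To make the scoring concrete, here is a minimal Python sketch of how a Stroop-style word list could be generated and how a set of responses could be scored. It is an illustration only, not the researchers' actual protocol: the color set, the function names `make_stroop_list` and `score`, and the `congruent_fraction` parameter are all assumptions introduced here. A response counts as correct when it names the ink color, and as a word-reading error when it names the conflicting word instead.

```python
import random

# Hypothetical color set; the study's actual stimuli are not specified here.
COLORS = ["red", "green", "blue", "yellow"]


def make_stroop_list(n_items, congruent_fraction=0.0, seed=0):
    """Build a list of (word, ink) pairs.

    congruent_fraction is the approximate share of items whose word
    matches its ink color; 0.0 yields only mismatched items.
    """
    rng = random.Random(seed)
    items = []
    for _ in range(n_items):
        ink = rng.choice(COLORS)
        if rng.random() < congruent_fraction:
            word = ink  # matching item
        else:
            # mismatched item: the word names any color except the ink
            word = rng.choice([c for c in COLORS if c != ink])
        items.append((word, ink))
    return items


def score(items, responses):
    """Return accuracy and the rate of word-reading errors."""
    if not items or len(items) != len(responses):
        raise ValueError("items and responses must be non-empty and equal in length")
    correct = 0
    word_reads = 0
    for (word, ink), response in zip(items, responses):
        if response == ink:
            correct += 1
        elif response == word:
            # the responder read the word instead of naming the ink
            word_reads += 1
    n = len(items)
    return {"accuracy": correct / n, "word_reading_rate": word_reads / n}


if __name__ == "__main__":
    stimuli = make_stroop_list(40)
    # A responder that always reads the word gets every mismatched item wrong.
    print(score(stimuli, [word for word, _ in stimuli]))
```

Under this scoring, the pattern described above would show up as accuracy falling and the word-reading rate rising as `n_items` grows from five to forty.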
What different sources said
- Science Daily
A classic brain test exposed AI's biggest weakness