Apple's New On-Device AI Architecture Overcomes Memory Limitations Using Flash Storage
Apple announced AFM 3 Core Advanced, a 20-billion-parameter on-device AI model that stores its weights in NAND flash memory instead of DRAM, working around the DRAM capacity limit that has capped the size of on-device models. The architecture uses a novel routing mechanism that makes expert-selection decisions once per prompt rather than once per token, which lets it operate within the bandwidth limits of consumer hardware. The design enables more capable local AI processing without cloud connectivity, addressing a key limitation for enterprise AI deployment.
Apple unveiled its third-generation foundation models at WWDC26, including AFM 3 Core Advanced, which targets the memory constraint that has long limited on-device AI. Traditional on-device deployments require the entire set of model weights to fit in DRAM, which caps practical parameter counts far below those of server-side models. Apple's solution stores the full 20-billion-parameter model in NAND flash memory and uses a specialized routing mechanism called Instruction-Following Pruning, which selects experts once per prompt and then generates every token with that same expert configuration. This avoids the bandwidth bottleneck that would arise if weights had to move between flash and DRAM for every generated token. The active parameter count scales dynamically from 1 billion to 4 billion depending on task complexity, and the AFM 3 family includes both on-device models and server-based models running within Apple's Private Cloud Compute boundary.
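To make the flash-traffic argument concrete, here is a minimal Python sketch. Apple has not published how its router works, so everything below is an assumption: it uses a generic top-k selection over hypothetical expert blocks (not Apple's Instruction-Following Pruning predictor), random scores in place of a learned router, made-up sizes (64 experts, 8 active, 200 generated tokens), and a simplified memory model in which DRAM holds only the experts used by the previous token. It counts how many expert blocks would have to be read from flash under per-token versus per-prompt selection.

```python
import random

def top_k(scores, k):
    """Indices of the k highest scores (a generic top-k router, assumed)."""
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return frozenset(ranked[:k])

def count_flash_loads(selections):
    """Count expert blocks read from flash, assuming (simplification) that
    DRAM holds only the experts chosen for the previous token."""
    resident = frozenset()
    loads = 0
    for chosen in selections:
        loads += len(chosen - resident)  # experts not in DRAM come from flash
        resident = chosen
    return loads

rng = random.Random(0)
NUM_EXPERTS, TOP_K, NUM_TOKENS = 64, 8, 200  # hypothetical sizes

# Per-token routing: fresh router scores for every generated token.
per_token = [
    top_k([rng.random() for _ in range(NUM_EXPERTS)], TOP_K)
    for _ in range(NUM_TOKENS)
]

# Per-prompt routing: score once from the prompt, reuse for every token.
prompt_choice = top_k([rng.random() for _ in range(NUM_EXPERTS)], TOP_K)
per_prompt = [prompt_choice] * NUM_TOKENS

print("per-token flash loads: ", count_flash_loads(per_token))
print("per-prompt flash loads:", count_flash_loads(per_prompt))  # always TOP_K
```

With per-prompt selection the load count equals the number of selected experts no matter how long the response is, while per-token selection grows roughly with the number of generated tokens. The same arithmetic frames the active-parameter range: 1 billion to 4 billion active parameters is 5% to 20% of the 20-billion total.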
What's missing
The coverage does not discuss competitive responses from other chip manufacturers or AI companies working on similar on-device solutions, nor does it report performance benchmarks or latency comparisons between this approach and cloud-based alternatives, which is what users would actually experience.
How coverage differed
VentureBeat's coverage is technical and detailed, focusing on the architectural innovation and engineering constraints. The source balances Apple's disclosures with expert commentary from researchers like Awni Hannun at Anthropic, noting both what Apple has explained and what remains undisclosed regarding energy, thermal, and bandwidth metrics critical to real-world deployment.
What different sources said
- VentureBeat (Center)
On-device AI agents hit a hard memory limit. Apple's new architecture routes around it.