Researchers Discover Hidden Encoding Subspace in LLM Agents for Detecting Covert Data Exfiltration
Computer scientists have identified a shared low-dimensional computational pattern in large language models that activates when they covertly encode sensitive data using methods like Base64 or ROT13, even when output-side detection fails. The discovery reveals that this encoding computation leaves a mechanistic signature in the model's internal structure that can be monitored in real-time. This finding enables development of MIRAGE, a detection system that achieves 91.8% accuracy in identifying agentic data exfiltration attempts, substantially outperforming traditional output-only detection methods.
Researchers analyzing nine encoding families across eight LLM models from five different architecture families discovered that when models are prompted to covertly encode sensitive information, the underlying computation relies on a consistent low-dimensional subspace in the residual stream. A logistic-regression probe trained on eight encoding methods successfully recovered the ninth at near-perfect accuracy (AUC 0.975-1.000), demonstrating that the detection mechanism reads actual computation rather than surface-level features. The same directional signal exhibits a second signature at the planning token, where polarity flips to indicate whether the model will execute encoding inline versus delegating to a tool call. The researchers developed MIRAGE (Model-Internal Readout of Agentic Generation Exfiltration), a real-time monitoring system leveraging both signals, achieving 91.8% AUC on 126 exfiltration scenarios compared to 51.8% for output-only detection. However, performance varies significantly by model architecture, with false-positive rates ranging from 0% on Qwen-7B to 100% on Phi-3.5, indicating that detection effectiveness depends on the host model's geometric properties.
What's missing
The paper does not discuss potential defenses beyond the empirical observation that attacks suppressing the subspace also destroyed encoding fidelity; it remains unclear whether adversaries could develop attacks that maintain both fidelity and subspace suppression. Additionally, the generalizability of findings to future model architectures and the practical deployment challenges of real-time monitoring in production systems are not addressed.
What different sources said
- arXiv cs.CLCenter
MIRAGE: A Polarity-Flipping Encoding Subspace in LLM Agents
Related

Chinese EV Makers BYD and Xpeng Accelerate Humanoid Robot Development to Compete with Tesla
Chinese electric vehicle manufacturers including BYD and Xpeng are expanding beyond automobiles to develop and commercialize humanoid robots, viewing AI advances as a path to a new market. This represents a strategic shift for major EV makers who have traditionally focused on electric cars and autonomous driving technology. The move signals intensifying competition in robotics as Chinese firms seek to diversify revenue streams and compete globally in emerging AI-driven sectors.
Bill Gates warns tech giants that data center expansion cannot raise household power costs
Bill Gates told major tech companies on CNBC that they lack permission to increase residential electricity bills through data center construction, despite the economic and competitive pressures driving expansion. The warning comes as 48 data center projects worth $156 billion were blocked or stalled in 2025, and public opposition has reached unprecedented levels with 70% of Americans opposing data centers near their homes. Gates's message underscores that tech companies must secure genuine community support and absorb infrastructure costs themselves, not pass them to ratepayers.

Major Delhi Data Centre Fire Destroys Equipment Worth Hundreds of Crores, Disrupts Internet Services
A fire broke out on the third floor of ST Telemedia GDC's data centre facility in Delhi's Greater Kailash on June 5, 2026, destroying equipment and causing significant service disruptions for Google, Netflix, and multiple local internet service providers. The fire, categorized as a massive blaze, started in the battery room and was extinguished after several hours, with two firefighters injured but no loss of life reported. The incident highlights vulnerabilities in data centre fire safety protocols and raises questions about whether inert gas suppression systems were adequately stocked.