New Training Method Helps AI Models Better Handle Conflicting Instructions Based on Source Trustworthiness
Researchers have introduced Gravity-Weighted Direct Preference Optimization (GW-DPO), a training technique that teaches large language models to prioritize instructions from more trustworthy sources over less trustworthy ones. According to the researchers, current models tend to treat all instructions equally, which leaves them vulnerable to malicious prompt injections and to conflicts between legitimate but competing directives. The approach could improve AI safety by enabling models to enforce a principled hierarchy when they receive instructions from sources with different levels of authority.
A new arXiv preprint proposes a solution to a fundamental vulnerability in large language models: their inability to distinguish between instructions from sources with different levels of authority. The researchers formalized a five-level instruction hierarchy problem and developed Gravity-Weighted DPO, a preference-optimization training objective that scales penalties according to the structural distance between conflicting instruction levels. When combined with hierarchy-specific delimiter tokens and Instructional Segment Embeddings, the method improved on standard training approaches with Llama-3.1-8B-Instruct, achieving better adherence to priority rules while reducing over-refusal rates. The work addresses both security concerns (malicious prompt injections) and practical challenges (resolving legitimate competing instructions), and its ablation studies indicate that the choice of hierarchy depth involves a tradeoff between generality and specialization.
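To make the idea of distance-scaled penalties concrete, here is a minimal Python sketch. It is not the preprint's formulation: the summary above does not give the actual weighting function, so the linear `gravity_weight`, its `alpha` parameter, and the 0-to-4 level numbering are all illustrative assumptions. Only the underlying DPO loss is the standard one.

```python
import math

NUM_LEVELS = 5  # the preprint describes a five-level hierarchy; numbering 0..4 is assumed


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair: -log(sigmoid(margin))."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return softplus(-margin)


def gravity_weight(level_a: int, level_b: int, alpha: float = 1.0) -> float:
    """Hypothetical weight that grows linearly with the gap between two levels.

    Assumption for illustration only; the preprint's real weighting may differ.
    """
    distance = abs(level_a - level_b)
    return 1.0 + alpha * distance / (NUM_LEVELS - 1)


def gw_dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                ref_chosen_logp: float, ref_rejected_logp: float,
                level_a: int, level_b: int,
                beta: float = 0.1, alpha: float = 1.0) -> float:
    """DPO loss scaled by the (assumed) gravity weight of the conflicting levels."""
    base = dpo_loss(policy_chosen_logp, policy_rejected_logp,
                    ref_chosen_logp, ref_rejected_logp, beta)
    return gravity_weight(level_a, level_b, alpha) * base


# Same log-probabilities, different hierarchy gaps between the conflicting instructions.
adjacent_levels = gw_dpo_loss(-1.0, -2.0, -1.5, -1.5, level_a=1, level_b=2)
distant_levels = gw_dpo_loss(-1.0, -2.0, -1.5, -1.5, level_a=0, level_b=4)
print(adjacent_levels, distant_levels)
```

Under these assumptions, a conflict between the highest and lowest levels is penalized more heavily than one between adjacent levels, even when the model's log-probabilities are identical. That is the intuition behind scaling by structural distance.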
What's missing
The preprint does not provide evaluation results on real-world adversarial prompt injection attacks or downstream task performance beyond the hierarchy adherence metrics. The generalizability of the approach to models beyond Llama-3.1-8B-Instruct and to production-scale deployments remains unclear. Additionally, the paper does not discuss computational overhead or training costs compared to standard DPO.
What different sources said
- arXiv cs.CL
Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization