Publications

Jun 10

83% confidence

Researchers Develop Method for Converting Static Mixture-of-Experts Language Models into Dynamic Ones

Center 100%

1 source

Researchers have introduced Zero-Expert Self-Distillation Adaptation (ZEDA), a framework that converts static Mixture-of-Experts (MoE) language models into dynamic ones capable of skipping more than 50% of expert computations. MoE architectures scale large language models efficiently through sparse activation, but existing dynamic variants typically require training from scratch or task-specific tuning. ZEDA offers a low-cost post-training conversion path that could meaningfully reduce serving costs for already-deployed AI models.

A team of researchers has proposed ZEDA (Zero-Expert Self-Distillation Adaptation), a framework designed to make already-trained Mixture-of-Experts (MoE) language models more computationally efficient at inference time. MoE models route each input through a subset of specialized sub-networks called experts. Dynamic MoE variants go further by adjusting how many experts are activated for each input, so that simpler inputs can bypass unnecessary computation. ZEDA achieves this by injecting parameter-free zero-output experts into each MoE layer and then adapting the model through a two-stage self-distillation process, using the original frozen model as a teacher alongside a group-level balancing loss.

Tested on Qwen3-30B-A3B and GLM-4.7-Flash across 11 benchmarks covering mathematics, code generation, and instruction following, ZEDA eliminated over 50% of expert floating-point operations (FLOPs) with only marginal accuracy loss. It outperformed the strongest existing dynamic MoE baseline by 6.1 and 4.0 points on the two respective models and achieved approximately 1.20× end-to-end inference speedup. The work addresses a practical gap in the field: most prior dynamic MoE research has focused on pre-training rather than on adapting fully trained models already in deployment.
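The zero-expert idea can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the shapes, the use of single weight matrices as experts, and the choice to take a softmax over the selected top-k logits are all assumptions made for illustration. It shows only the routing mechanics (a router slot that outputs nothing and costs nothing), not the self-distillation or the balancing loss.

```python
import numpy as np

def route_with_zero_experts(logits, num_real, top_k):
    """Pick top-k router slots per token; slots >= num_real are zero experts.

    Hypothetical helper: returns the chosen slot indices and the fraction of
    selected slots that landed on zero experts (i.e. skipped expert calls).
    """
    top_idx = np.argsort(-logits, axis=-1)[:, :top_k]
    skipped_fraction = float((top_idx >= num_real).mean())
    return top_idx, skipped_fraction

def moe_layer(x, router_w, expert_ws, top_k):
    """Toy MoE layer whose router has extra columns for zero-output experts."""
    num_real = len(expert_ws)
    logits = x @ router_w  # shape: (tokens, num_real + num_zero)
    top_idx, skipped = route_with_zero_experts(logits, num_real, top_k)

    # Assumption: gate weights are a softmax over the selected logits.
    sel = np.take_along_axis(logits, top_idx, axis=-1)
    gates = np.exp(sel - sel.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):
            e = top_idx[t, slot]
            if e >= num_real:
                continue  # zero expert: no parameters, no compute, no output
            out[t] += gates[t, slot] * (x[t] @ expert_ws[e])
    return out, skipped
```

In this sketch a token whose top-k slots are all zero experts passes through the layer with no expert computation at all, which is the kind of per-input saving the summary describes.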

What's missing

The study does not report results on models beyond Qwen3-30B-A3B and GLM-4.7-Flash, leaving generalizability to other MoE architectures uncertain. Hardware-specific conditions (GPU type, batch size, serving configuration) underlying the 1.20× speedup claim are not detailed in the abstract, making real-world replication difficult to assess. The paper has not yet undergone formal peer review, as it is a preprint on arXiv.
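A back-of-the-envelope calculation suggests why those serving details matter. The sketch below applies an Amdahl's-law style estimate under two assumptions that the summary does not confirm: that skipped expert FLOPs translate proportionally into saved time, and that "over 50%" can be treated as exactly 0.5. The function names are hypothetical.

```python
def end_to_end_speedup(expert_time_share, skipped_fraction):
    """Amdahl-style estimate: only the expert share of runtime shrinks."""
    remaining = 1.0 - expert_time_share * skipped_fraction
    return 1.0 / remaining

def implied_expert_time_share(speedup, skipped_fraction):
    """Invert the estimate: what expert share of runtime yields this speedup?"""
    return (1.0 - 1.0 / speedup) / skipped_fraction

# With the reported figures, expert computation would account for roughly
# one third of end-to-end time under these assumptions.
share = implied_expert_time_share(speedup=1.20, skipped_fraction=0.5)
print(f"implied expert share of runtime: {share:.3f}")
```

Because the expert share of runtime depends on GPU type, batch size, and sequence length, the same reduction in FLOPs could produce a noticeably different speedup in another serving setup.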

What different sources said

arXiv cs.AI (Center)
Pruning and Distilling Mixture-of-Experts into Dense Language Models

Publications

Gut Bacteria Enzyme Found to Break Down Heat-Processed Food Compounds, Producing Novel Biogenic Amines

Researchers have discovered that an enzyme in common gut bacteria can degrade N-epsilon-carboxymethyllysine (CML), a compound formed during thermal food processing, producing previously unknown biogenic amines. The enzyme, ornithine decarboxylase SpeC from enterobacteria, acts on CML and related modified lysine derivatives through a low-level 'underground' catalytic activity. This finding suggests a previously unrecognized communication axis between thermally processed dietary compounds and gut microbial physiology, with potential implications for host health.

1 source, Jun 13

Publications

Full-Length Gene Sequencing Reveals Two Distinct Bacterial Communities in Black-Legged Ticks Expanding Into Canada

Researchers used Oxford Nanopore full-length 16S rRNA gene sequencing to characterize the microbiome of Ixodes scapularis black-legged ticks collected in Nova Scotia, Canada, distinguishing between tick-adapted bacteria and environmentally acquired bacteria. The study comes as I. scapularis — the primary vector of Lyme disease — is rapidly expanding northward into Canada due to climate change. The findings suggest that environmentally derived bacteria in tick microbiomes are not mere contamination, which has implications for how tick microbiome data is collected and interpreted across surveillance studies.

1 source, Jun 13

Publications

Study Identifies Metabolic Link Between Cell Envelope Stress and Biofilm Formation in Bacteria

Researchers have discovered that the metabolite acetyl-CoA directly inhibits enzymes that degrade the bacterial signaling molecule c-di-GMP, connecting cell envelope biosynthesis stress to biofilm formation in Pseudomonas aeruginosa. The study found that sub-inhibitory concentrations of antibiotics targeting early peptidoglycan biosynthesis — but not other antibiotic classes — elevate c-di-GMP levels by reducing phosphodiesterase activity, with acetyl-CoA competing for the enzyme active site. Because the relevant enzyme domain is broadly conserved across bacterial species, this checkpoint mechanism may be widespread and could have implications for understanding antibiotic-induced biofilm responses.

1 source, Jun 13
