New Benchmark Datasets Introduced for Lead-Lag Forecasting in Social Platforms
Researchers have created two large benchmark datasets for lead-lag forecasting (LLF), a machine learning task that predicts delayed outcomes from early signals on social platforms. The datasets include 2.3 million arXiv papers tracking views-to-citations and 3 million GitHub repositories tracking pushes/stars-to-forks over multi-year periods. This work establishes LLF as a unified forecasting problem and provides standardized datasets to advance research in predicting long-term impacts from early user interactions.
Researchers have formalized and introduced benchmark datasets for lead-lag forecasting (LLF), a time-series prediction task in which early interactions on social platforms (such as views, likes, or downloads) are used to predict later, higher-impact outcomes (such as citations, sales, or reviews). The study presents two high-volume datasets: arXiv, with 2.3 million papers tracking the relationship between accesses and citations, and GitHub, with 3 million repositories tracking the relationship between pushes/stars and forks. The datasets span multi-year horizons, capture the full spectrum of outcomes, and avoid survivorship bias. The researchers verified lead-lag dynamics through statistical and classification tests, benchmarked both parametric and non-parametric baseline models, and documented all technical details of data curation and cleaning. This work establishes LLF as a novel forecasting paradigm within the time-series community and provides an empirical foundation for systematically exploring these patterns in social and usage data.
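The summary does not say which statistical tests or baselines the authors used. With that caveat, the Python below is a minimal sketch of two generic ideas behind the task, run on synthetic numbers rather than the actual datasets. The first is a lagged Pearson correlation that checks whether an early signal (for example, monthly views) leads a later outcome (for example, citations). The second is a one-variable log-linear least-squares fit that serves as a simple parametric baseline for predicting a later count from an early count. The function names (`lagged_correlation`, `best_lag`, `fit_log_linear`, `predict`) and all data values are hypothetical illustrations.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def lagged_correlation(lead, lag, k):
    """Correlation between lead[t] and lag[t + k], over the overlapping window."""
    n = len(lead) - k
    return pearson(lead[:n], lag[k:k + n])


def best_lag(lead, lag, max_lag):
    """Hypothetical helper: the shift k in [0, max_lag] where lead best aligns with lag."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(lead, lag, k))


def fit_log_linear(early, later):
    """Least-squares fit of log1p(later) = a + b * log1p(early) across items."""
    xs = [math.log1p(v) for v in early]
    ys = [math.log1p(v) for v in later]
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def predict(a, b, early_count):
    """Map an early count to a predicted later count with the fitted baseline."""
    return math.expm1(a + b * math.log1p(early_count))


if __name__ == "__main__":
    # Synthetic monthly series: "citations" are views shifted by 3 months and scaled by 0.1.
    views = [5, 40, 120, 80, 30, 15, 60, 20, 10, 8, 5, 3]
    citations = [0.0] * 3 + [v / 10 for v in views[:-3]]
    print("estimated lag (months):", best_lag(views, citations, max_lag=5))  # 3 for this series

    # Synthetic cross-section: early counts per item and later outcomes generated
    # from log1p(later) = 0.5 + 0.8 * log1p(early), so the fit should recover (0.5, 0.8).
    early = [10, 100, 1000, 50]
    later = [math.expm1(0.5 + 0.8 * math.log1p(e)) for e in early]
    a, b = fit_log_linear(early, later)
    print("fitted a, b:", round(a, 3), round(b, 3))
    print("prediction for 200 early views:", round(predict(a, b, 200), 1))
```

The log transform is a common choice for heavy-tailed engagement counts such as views and citations; a non-parametric baseline would replace the straight-line fit with, for example, a nearest-neighbour or tree model over the same early-signal features.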
What different sources said
- arXiv cs.LG
Benchmark Datasets for Lead-Lag Forecasting on Social Platforms