AI Summary • Published on Dec 1, 2025
Estimating the divergence times of evolutionary lineages from molecular sequence data is a critical task in reconstructing evolutionary history. However, current Bayesian relaxed-clock methods, which rely on repeated evaluations of the full phylogenetic likelihood, are computationally demanding, especially for large genomic datasets. Divergence-time estimates are also sensitive to uncertainty or errors in fossil placement and in the specification of prior distributions. Because node-age estimates depend heavily on the accuracy of fossil information, problems such as fossil misplacement or poorly specified priors can produce considerable inaccuracies and conflicting results. There is a clear need for methods that are both computationally efficient and robust to these common sources of fossil calibration uncertainty.
This study proposes solutions based on the phylogenetic pairwise composite likelihood, introducing two Adjusted Pairwise Likelihood (APW) formulations: APW1 and APW2. Both apply asymptotic moment-matching weights to the pairwise composite likelihood within a Bayesian Markov Chain Monte Carlo (MCMC) framework. An unadjusted pairwise likelihood treats overlapping sequence pairs as if they were independent and therefore tends to overstate the information in the data; the weights correct for this, so the adjusted likelihood better approximates the behavior of the full likelihood at a fraction of its computational cost. Unlike the full phylogenetic likelihood, which becomes more expensive with longer alignments, the pairwise composite likelihood depends on the data only through the site-pattern counts of each sequence pair, so once these counts are tabulated its cost per evaluation remains constant as sequence length increases. This makes it well suited to large phylogenomic datasets. The adjustment weights for APW1 and APW2 are derived from the asymptotic properties of the composite likelihood ratio test (LRT) statistic, specifically by estimating sensitivity (J) and variability (H) matrices using independent subsets of the alignment. The methodology supports both Jukes-Cantor (JC) and General Time Reversible (GTR) substitution models. The performance of the APW methods was assessed through extensive simulations across a range of fossil-calibration scenarios: correctly placed fossils with correct priors, misleading or less informative priors, and misplaced fossils. The methods were also applied to a real-world genome-scale dataset of modern birds.
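To make the pieces above concrete, here is a minimal Python/NumPy sketch of an adjusted pairwise log-likelihood under JC69 only (GTR is omitted). It assumes the pairwise path lengths d_ij are supplied from node ages and rates elsewhere, and it takes the sensitivity matrix (expected negative Hessian of the composite log-likelihood) and the variability matrix (covariance of its score) as given, without showing how they are estimated from alignment subsets. The two weights are standard moment-matching calibrations from the composite-likelihood literature (a first-moment rescaling and a Satterthwaite-type two-moment rescaling); whether APW1 and APW2 use exactly these forms is an assumption, and all function names are hypothetical rather than MrBayes code.

```python
import numpy as np

# Hypothetical helper names; an illustrative sketch, not the MrBayes implementation.
NUC = {"A": 0, "C": 1, "G": 2, "T": 3}


def pattern_counts(seq_a, seq_b):
    """Tabulate the 4x4 site-pattern counts for one sequence pair.

    Done once per pair before MCMC; gaps and ambiguity codes are skipped.
    """
    counts = np.zeros((4, 4))
    for x, y in zip(seq_a, seq_b):
        if x in NUC and y in NUC:
            counts[NUC[x], NUC[y]] += 1
    return counts


def jc_pair_loglik(counts, d):
    """JC69 log-likelihood of one pair whose path length is d (> 0) subs/site."""
    e = np.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # joint prob. of one specific pattern with x == y
    p_diff = 0.25 * (0.25 - 0.25 * e)  # joint prob. of one specific pattern with x != y
    n_same = np.trace(counts)
    n_diff = counts.sum() - n_same
    return n_same * np.log(p_same) + n_diff * np.log(p_diff)


def pairwise_loglik(tables, dist):
    """Unadjusted pairwise composite log-likelihood, summed over pairs i < j.

    tables maps (i, j) -> 4x4 counts; dist holds path lengths d_ij (assumed to be
    derived from node ages and rates elsewhere). Each evaluation loops over pairs
    only, so its cost does not depend on alignment length.
    """
    return sum(jc_pair_loglik(c, dist[i, j]) for (i, j), c in tables.items())


def moment_matching_weights(sensitivity, variability):
    """First- and second-moment calibration weights for a composite likelihood.

    With A = sensitivity^{-1} @ variability, the composite LRT statistic is
    asymptotically a sum of chi-square(1) terms weighted by the eigenvalues of A.
    w1 rescales it to have mean p (the number of parameters); w2 rescales it to
    have the mean and variance of a chi-square with tr(A)^2 / tr(A^2) d.o.f.
    """
    A = np.linalg.solve(sensitivity, variability)
    p = A.shape[0]
    tr_a, tr_a2 = np.trace(A), np.trace(A @ A)
    return p / tr_a, tr_a / tr_a2


def adjusted_pairwise_loglik(tables, dist, weight):
    """Adjusted log-likelihood: the composite log-likelihood times a weight."""
    return weight * pairwise_loglik(tables, dist)


# Usage with toy data (not from the study).
seqs = ["ACGTACGTAC", "ACGTACGAAC", "ACCTACGAAC"]
tables = {(i, j): pattern_counts(seqs[i], seqs[j])
          for i in range(len(seqs)) for j in range(i + 1, len(seqs))}
dist = np.array([[0.0, 0.1, 0.2], [0.1, 0.0, 0.2], [0.2, 0.2, 0.0]])
w1, w2 = moment_matching_weights(np.eye(2), np.diag([1.0, 3.0]))  # toy matrices
print(adjusted_pairwise_loglik(tables, dist, w2))
```

The count tables are built once, so each MCMC step only loops over sequence pairs; this is the source of the length-independent cost. The sensitivity and variability matrices in the usage lines are arbitrary illustrative values, not estimates from any alignment.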
The simulations showed that APW methods produce node-age estimates comparable to those obtained from the full likelihood. Crucially, they were more robust to fossil misplacement and prior misspecification, largely because composite likelihoods are less sensitive to local calibration errors. In correctly calibrated settings, APW2 and APW1 achieved the highest overall credible interval (CI) coverage, often exceeding the nominal 0.95 level, and APW2 consistently yielded the lowest root mean square error (rMSE). In scenarios with incorrect or less informative priors, or with misplaced fossils, the APW methods generally maintained higher CI coverage and showed less degradation in rMSE than the full likelihood, which often had the lowest coverage. The APW methods also offered a substantial computational advantage: on the modern bird dataset they ran approximately 27 times faster than the full likelihood, and in simulations they remained markedly faster for long alignments (e.g., an order of magnitude faster for 50,000-site alignments). APW2 sometimes produced wider credible intervals, reflecting greater uncertainty, particularly for younger nodes, but these intervals often covered the point estimates from the full likelihood. In the modern bird analysis, both the APW2 and full-likelihood approaches independently suggest that the radiation of modern birds began before the K-Pg boundary.
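The two accuracy metrics reported above, CI coverage and rMSE, can be computed as in the short sketch below. It assumes a single pool of simulated (node, replicate) ages, absolute age units, and the posterior mean as the point estimate; how the study aggregates across nodes and replicates is not stated in this summary, and the function names are hypothetical.

```python
import numpy as np


def ci_coverage(true_ages, lower, upper):
    """Fraction of true node ages that fall inside their credible intervals."""
    true_ages, lower, upper = (np.asarray(a, dtype=float) for a in (true_ages, lower, upper))
    return float(np.mean((lower <= true_ages) & (true_ages <= upper)))


def rmse(true_ages, estimates):
    """Root mean square error of point estimates (e.g., posterior means)."""
    true_ages = np.asarray(true_ages, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - true_ages) ** 2)))


# Toy values (not from the study): three simulated nodes.
true = [10.0, 20.0, 30.0]
print(ci_coverage(true, lower=[8.0, 21.0, 25.0], upper=[12.0, 25.0, 35.0]))  # 2 of 3 intervals cover
print(rmse(true, [11.0, 22.0, 30.0]))  # sqrt(5/3)
```

Coverage above the nominal 0.95 indicates conservative (wide) intervals, which is why coverage is read together with rMSE and interval width rather than on its own.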
The adjusted pairwise likelihood (APW) methods offer a significant advance for Bayesian node dating, providing a framework that is both computationally efficient and robust to the inherent uncertainties in fossil calibration. This makes APW particularly well suited to the increasingly large phylogenomic datasets produced by high-throughput sequencing technologies, for which traditional full-likelihood methods become prohibitively expensive. By balancing estimation precision with methodological robustness, APW1 and APW2 outperformed both the unadjusted pairwise likelihood and the full likelihood under challenging simulation conditions. Their implementation in MrBayes, a widely used Bayesian phylogenetic program, makes them accessible and compatible with existing phylogenetic models. Because APW provides reliable estimates even with uncertain or imperfect fossil priors, it addresses a critical bottleneck in evolutionary research and supports more confident reconstruction of evolutionary timelines, especially for deep divergences where fossil evidence is often sparse or contentious.