BEAST 2 Help Me Choose

StarBeast3 template – species clock model panel

Here, we consider two species clock models: the strict clock and the multispecies coalescent relaxed clock (Ogilvie et al. 2017).

Initial exploration

For an initial run, when you have no prior experience with this type of data, running with a strict clock, a simple substitution model and a simple tree prior is a good way to find out whether there is signal in the data.

A relaxed clock analysis can then be added if the data contain a good number of mutations and a reasonable number of taxa. Once it has run, inspect the posterior distribution of the clock's coefficient of variation. If this distribution is hugging zero, there is little rate variation among branches, and the data support a strict clock. If it does not touch zero, there is rate variation among branches, and the data support a relaxed clock. If you are unsure, you can run a path sampling or nested sampling analysis to compare the two models with Bayesian hypothesis testing.
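As a rough first look before any formal model comparison, the lower tail of the sampled coefficient of variation (CV) can be checked programmatically. The Python sketch below assumes you have already extracted the CV samples from the trace log into a list (the log column name depends on your analysis and is not shown). The helper names, the 10% burn-in, the 2.5% quantile and the 0.01 cut-off are all illustrative choices, not BEAST or Tracer defaults.

```python
def lower_quantile(samples, q):
    """Nearest-rank style lower quantile of a list of numbers."""
    ordered = sorted(samples)
    return ordered[int(q * (len(ordered) - 1))]


def cv_touches_zero(cv_samples, burnin_frac=0.1, cutoff=0.01):
    """Heuristic: does the posterior of the clock CV reach down to zero?

    cv_samples  -- sampled coefficient-of-variation values, in log order
    burnin_frac -- fraction of the trace discarded as burn-in (illustrative)
    cutoff      -- CV value treated as "effectively zero" (illustrative)
    """
    start = int(len(cv_samples) * burnin_frac)
    kept = cv_samples[start:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    # If even the 2.5% lower quantile is essentially zero, the distribution
    # is hugging zero, which points towards a strict clock.
    return lower_quantile(kept, 0.025) < cutoff


# Hypothetical traces: 10 burn-in samples followed by 90 post-burn-in values.
trace_near_zero = [0.5] * 10 + [0.001 * i for i in range(90)]
trace_away = [0.5] * 10 + [0.3 + 0.001 * i for i in range(90)]
print(cv_touches_zero(trace_near_zero))  # True
print(cv_touches_zero(trace_away))       # False
```

This only automates the visual check; path sampling or nested sampling remains the principled way to choose between the two clock models.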

Estimating the clock rate

By default, the clock rate is fixed at 1. To estimate it, open the `Mode` menu and uncheck `Automatic set clock rate`. Then tick the `estimate` checkbox next to `Clock.rate` in the `Species Clock Model` tab. Finally, set the clock rate prior distribution in the `Priors` tab, using an informed estimate.
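One common way to encode an informed estimate is a log-normal prior on the clock rate, centred on a published or independently obtained rate. The sketch below is an illustration under stated assumptions, not a BEAUti feature: it picks log-space parameters M and S so that the prior median equals a rate estimate and the central 95% interval spans a chosen fold range either side of it. The helper names and the numbers (1e-3 substitutions per site per time unit, a two-fold range) are hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5% quantile of the standard normal distribution


def lognormal_prior_from_estimate(median_rate, fold_95):
    """Log-space (M, S) for a log-normal prior on the clock rate.

    The prior median is median_rate, and 95% of the prior mass lies between
    median_rate / fold_95 and median_rate * fold_95.
    """
    if median_rate <= 0 or fold_95 <= 1:
        raise ValueError("need median_rate > 0 and fold_95 > 1")
    m = math.log(median_rate)        # the median of a log-normal is exp(M)
    s = math.log(fold_95) / Z_975    # half-width of the 95% interval in log space
    return m, s


def central_95(m, s):
    """Central 95% interval of a log-normal with log-space parameters m, s."""
    return math.exp(m - Z_975 * s), math.exp(m + Z_975 * s)


# Hypothetical estimate: 1e-3 substitutions/site/time unit, known to within 2x.
M, S = lognormal_prior_from_estimate(1e-3, 2.0)
print(round(M, 3), round(S, 3))   # -6.908 0.354
print(central_95(M, S))           # close to (0.0005, 0.002)
```

If you enter such values into a log-normal prior in the `Priors` tab, check whether M is being interpreted in log space or in real space, since the two give very different priors.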

Relaxed clock model

Under this model, each branch in the species tree has its own substitution rate, and these rates are drawn independently from the same log-normal distribution. The mean of this log-normal distribution is fixed at 1 (to avoid non-identifiability with the tree heights and the clock rate), while its standard deviation is estimated. Unlike the relaxed clock model used in a standard concatenated phylogenetic analysis, here the gene trees inherit their branch substitution rates from the species tree. Thus, even though there could be dozens or hundreds of gene trees, the number of branch rates grows only with the size of the species tree.
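To make the model concrete, the following Python sketch draws one rate per species-tree branch and shows how a gene-tree branch could pick up its rate from the species branches it passes through. Two points are assumptions for illustration: sigma is taken to be the standard deviation of log(rate), and a gene-tree branch's rate is taken as the time-weighted average of the species-branch rates along it (our reading of Ogilvie et al. 2017, not code from StarBeast3). The helper names and the `segments` input format are hypothetical.

```python
import random


def draw_species_branch_rates(n_branches, sigma, rng):
    """Draw i.i.d. log-normal branch rates with mean 1.

    sigma is the standard deviation of log(rate). Setting the log-space mean
    to -sigma**2 / 2 gives E[rate] = exp(mu + sigma**2 / 2) = 1.
    """
    mu = -0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


def gene_branch_rate(segments, species_rates):
    """Rate of one gene-tree branch, as a time-weighted average.

    segments: hypothetical list of (species_branch_index, time_spent) pairs for
    the species-tree branches that this gene-tree branch passes through.
    """
    total_time = sum(t for _, t in segments)
    return sum(species_rates[i] * t for i, t in segments) / total_time


rng = random.Random(1)
species_rates = draw_species_branch_rates(8, 0.3, rng)  # 8 species-tree branches
# A gene-tree branch spending 1 time unit in branch 0 and 3 units in branch 5:
print(gene_branch_rate([(0, 1.0), (5, 3.0)], species_rates))
```

However many gene trees are added, the only branch-rate parameters are the eight values in `species_rates`, which is the scaling property described above.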

When running the relaxed clock model, keep an eye on the estimated standard deviation of the log-normal clock. If the estimate is small (e.g. below 0.1), the data are very clock-like and a strict clock may be adequate. If the estimate is large (e.g. above 1), there may be problems with the dataset or the model, such as convergence issues, poor signal in the data (perhaps due to a bad multiple sequence alignment), errors in the taxon mapping, or various other issues.
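The 0.1 and 1 thresholds are easier to interpret as amounts of rate variation. Assuming the logged standard deviation is sigma, the standard deviation of log(rate), the coefficient of variation of the branch rates is sqrt(exp(sigma^2) - 1), so sigma = 0.1 corresponds to a CV of about 0.10 and sigma = 1 to a CV of about 1.31. The sketch below (hypothetical helper names) applies that conversion together with the rule of thumb from the text.

```python
import math


def lognormal_cv(sigma):
    """Coefficient of variation of a log-normal with log-space SD sigma.

    CV = sqrt(exp(sigma**2) - 1); it does not depend on the mean.
    """
    return math.sqrt(math.expm1(sigma ** 2))


def read_clock_sd(sigma, low=0.1, high=1.0):
    """Rule-of-thumb reading of an estimated log-normal clock SD."""
    if sigma < low:
        return "very clock-like: a strict clock may be adequate"
    if sigma > high:
        return "large: check convergence, the alignment and the taxon mapping"
    return "no warning from this rule of thumb"


for sigma in (0.05, 0.1, 0.5, 1.0, 1.5):
    print(sigma, round(lognormal_cv(sigma), 3), read_clock_sd(sigma))
```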

References

Douglas, J, Jiménez-Silva, CL, Bouckaert, RR. "StarBeast3: Adaptive Parallelized Bayesian Inference under the Multispecies Coalescent." Systematic Biology (2022). doi:10.1093/sysbio/syac010.

Douglas, J, Zhang, R, Bouckaert, R. "Adaptive dating and fast proposals: Revisiting the phylogenetic relaxed clock model." PLoS Computational Biology 17.2 (2021): e1008322. doi:10.1371/journal.pcbi.1008322.

Ogilvie, HA, Bouckaert, RR, Drummond, AJ. "StarBEAST2 brings faster species tree inference and accurate estimates of substitution rates." Molecular Biology and Evolution 34.8 (2017): 2101-2114. doi:10.1093/molbev/msx126.

Disclaimer: The above is the opinion of the author RB. If you do not agree, or spot a mistake, contact the author, or discuss this in the issues area or raise a new issue. A link will be added from this page to make sure others can find it.