Projects using RevBayes

Nicolas Lartillot, Bastien Boussau

Last modified on September 29, 2023

Coevolution between substitution rates and life-history traits


In mammals, substitution rates vary substantially between lineages. This variation may be correlated with life-history traits: small mammals, with short generation times, tend to evolve faster. Here, the aim is to measure the correlation between substitution rate and body mass or other life-history traits in mammals. This is done by building a model of the joint evolutionary process of the substitution rate and life-history traits across branches, using a variant of the Brownian model used for modelling the auto-correlated molecular clock (Lartillot and Delsuc, 2012).
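As a rough illustration of what a joint Brownian process means, the sketch below draws the change in log substitution rate and log body mass along a single branch. It is written in plain Python rather than Rev, and the function names, variances, and correlation value are hypothetical choices for illustration, not values or code from Lartillot and Delsuc (2012).

```python
import math
import random


def simulate_branch(start, branch_time, var_rate, var_mass, corr, rng):
    """Return end-of-branch (log rate, log mass) under a bivariate Brownian motion.

    Over a branch of duration t, the increments are bivariate normal with
    covariance t * Sigma, where Sigma is defined by the two variances per
    unit time and the correlation `corr` (all assumed values).
    """
    sd_rate = math.sqrt(var_rate * branch_time)
    sd_mass = math.sqrt(var_mass * branch_time)
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    # Hand-written Cholesky factor of the 2x2 correlation matrix.
    d_rate = sd_rate * z1
    d_mass = sd_mass * (corr * z1 + math.sqrt(1.0 - corr ** 2) * z2)
    return start[0] + d_rate, start[1] + d_mass


def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rng = random.Random(42)
    # Many independent branches, all starting from (0, 0), with a negative
    # rate/mass correlation chosen only for the example.
    ends = [simulate_branch((0.0, 0.0), 1.0, 0.2, 1.0, -0.5, rng)
            for _ in range(5000)]
    print("sample correlation:", sample_correlation(ends))
```

In the actual project, the covariance matrix would be a parameter estimated from the data rather than fixed in advance, and the process would run along every branch of the tree.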

A relaxed molecular clock informed by life-history and germ-line development in primates


In primates, the mutation rate per generation is mostly determined by the number of replications in the germline. The developmental process of the germline is relatively well characterized, and a model of its modulation as a function of life history (age of puberty, generation time, etc.) has been proposed (Amster and Sella, 2016). Here, the aim is to use this model (and the empirically measured values reported in the article of Amster and Sella) to calibrate the molecular clock (and its variation) in simian primates, using information about sexual maturity and generation time in extant species.

Nucleotide compositional variation across species and its impact on phylogenetic reconstruction


The models that we have considered in the tutorials are homogeneous across branches. As a result, they predict the same nucleotide composition in all species. In practice, this is not the case. In some extreme situations, not accounting for compositional variation can result in phylogenetic reconstruction artifacts (typically, unrelated species with similar compositional biases artifactually cluster together). A simple but particularly striking example is shown in Foster (2004). In this article, a model with branch-specific nucleotide composition is introduced to improve phylogenetic reconstruction in such cases. Since then, Heaps et al. have proposed improved models. Here, the aim is to design versions of these models with branch-specific equilibrium frequencies, and to see if they improve phylogenetic reconstruction.

Multi-gene phylogenetic reconstruction of the phylogeny of mammals


Here, the aim is to design a model for multi-gene phylogenetic reconstruction. Genes may share the same species phylogeny but may differ in their rate of evolution and in their GC content. The model can be used to reconstruct the phylogeny or to estimate the variance in substitution rates and in GC content across genes. The data is available from the article

Correlation between GC composition of ribosomal RNA and growth temperature in Archaea


The idea is to model the correlated evolution of rRNA GC content and growth temperature across Archaea, and use this model to estimate the correlation and to infer ancestral temperatures along the phylogeny.

Are patterns of absence/presence of genes across genomes informative about the phylogeny?


The idea is to model the process of gain and loss of genes across a phylogeny, and to apply this model to presence/absence data for genes across metazoans. See the articles of Pisani et al. (2015) and Ryan et al. (2013) for two analyses giving different results on the same dataset.
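One simple way to formalize gain and loss, offered here only as a possible starting point, is a two-state continuous-time Markov chain for each gene (absent = 0, present = 1). The Python sketch below computes its transition probabilities along a branch from the closed-form solution. The function name and the rate values are hypothetical and are not taken from Pisani et al. or Ryan et al.

```python
import math


def presence_absence_probs(gain, loss, t):
    """Transition matrix P[i][j] of a two-state gain/loss chain over time t.

    State 0 means the gene is absent, state 1 means it is present.
    `gain` is the rate 0 -> 1 and `loss` is the rate 1 -> 0.
    """
    total = gain + loss
    decay = math.exp(-total * t)
    pi_present = gain / total  # stationary probability of presence
    pi_absent = loss / total   # stationary probability of absence
    return [
        [pi_absent + pi_present * decay, pi_present * (1.0 - decay)],
        [pi_absent * (1.0 - decay), pi_present + pi_absent * decay],
    ]


if __name__ == "__main__":
    # Hypothetical rates: genes are lost four times faster than they are gained.
    for row in presence_absence_probs(gain=0.5, loss=2.0, t=0.3):
        print(row)
```

With these probabilities, the likelihood of a presence/absence pattern can be computed by the usual pruning algorithm. Real analyses typically also need a correction for genes that are absent from every sampled genome and therefore never observed.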

Convergent evolution toward subterranean lifestyle in isopods, and its consequences on the rate of genome evolution


Saclier et al. have analysed the evolutionary patterns in a group of isopod species in which there has been a large number of independent transitions from surface to underground lifestyle. This gives an opportunity for modeling and investigating the impact of these transitions on genomic sequence evolution.

Exploring the feasibility of importance sampling for phylogenetic inference


Bayesian phylogenetic inference classically relies on MCMC, a computationally intensive algorithm. This is a problem as data sets have been increasing in size, resulting in increased computational and environmental footprints. In this project, you will investigate an alternative algorithm, importance sampling, as follows. First, you will perform MCMC inference on a small alignment (a single gene) and on a big alignment. Second, you will subsample a limited number of samples from the posterior distribution obtained on the small alignment. Third, you will evaluate the posterior probability of these samples according to the big alignment. Fourth, you will reweight these samples according to the ratio of their posterior probabilities on the big alignment versus the small one. As a result, on the big alignment, you will have one posterior distribution obtained through importance sampling and one obtained using MCMC. On the big alignment, does the faster importance sampling approach produce estimates that are similar to those of the MCMC approach?
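The reweighting step can be sketched in a few lines of Python. This is a minimal sketch, assuming that the unnormalized log posterior of each subsampled point has already been computed under both alignments (for example, read from log files). The function names are hypothetical, and the effective sample size is included as a common diagnostic, not as part of the project description.

```python
import math


def normalized_importance_weights(log_post_big, log_post_small):
    """Self-normalized importance weights for samples drawn from the small-data posterior.

    Each weight is proportional to posterior_big / posterior_small, computed in
    log space and shifted by the maximum to avoid numerical overflow.
    """
    log_w = [b - s for b, s in zip(log_post_big, log_post_small)]
    shift = max(log_w)
    w = [math.exp(x - shift) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


def effective_sample_size(weights):
    """Kish effective sample size: n for uniform weights, 1 if one weight dominates."""
    return 1.0 / sum(w * w for w in weights)


def weighted_mean(values, weights):
    """Importance-sampling estimate of a posterior mean (weights must sum to 1)."""
    return sum(v * w for v, w in zip(values, weights))


if __name__ == "__main__":
    # Hypothetical numbers: three sampled values of some parameter, with their
    # log posterior densities under the big and the small alignment.
    tree_lengths = [1.10, 1.25, 0.95]
    log_big = [-10500.2, -10498.7, -10503.1]
    log_small = [-812.4, -812.9, -811.8]
    w = normalized_importance_weights(log_big, log_small)
    print("weights:", w)
    print("ESS:", effective_sample_size(w))
    print("weighted mean:", weighted_mean(tree_lengths, w))
```

A very low effective sample size would indicate that the small-alignment posterior is a poor proposal for the big-alignment posterior, which is one way to judge whether the approach is feasible on a given data set.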

Analysis of the SEMG2 gene in Primates


Here the project is to analyze an alignment of the SEMG2 gene from primate species that differ in their mating systems.

Analysis of mitochondrial protein evolution in Daphnia


Here the project is to analyze a concatenated alignment of 15 DNA sequences coding for proteins from 29 strains of Daphnia pulex, some of which reproduce sexually (named S1 to S14), and others, asexually (named A1 to A14). Sexual reproduction is assumed to be the ancestral condition.

Power analysis


Given a limited number of sites, it may be difficult to get high support for all the nodes in a phylogeny. The aim of this project is to investigate how one could predict how many sites should be analyzed for a particular branch to be resolved with high posterior probability.

Investigate codon-position models vs codon models for phylogenetic reconstruction


Coding sequences are typically modelled at the codon level, using an alphabet with 61 states (64 codons minus 3 stop codons). These models are often used to study natural selection. Another way to model coding sequences is to partition the data into three categories: first, second, and third codon positions. The purpose of this project is to try both models and compare them, either in terms of posterior predictive simulations, or in terms of phylogenetic reconstruction.
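The two data representations can be illustrated with a short Python sketch. It assumes the standard genetic code (other codes, such as mitochondrial ones, have different stop codons), and the function names are hypothetical.

```python
from itertools import product

# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def sense_codons():
    """List the codon states that remain once stop codons are excluded."""
    codons = ("".join(triplet) for triplet in product("ACGT", repeat=3))
    return [codon for codon in codons if codon not in STOP_CODONS]


def split_by_codon_position(sequence):
    """Split an in-frame coding sequence into its 1st, 2nd and 3rd codon positions."""
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return sequence[0::3], sequence[1::3], sequence[2::3]


if __name__ == "__main__":
    print(len(sense_codons()))                    # size of the codon alphabet
    print(split_by_codon_position("ATGGCCAAA"))   # three position-specific partitions
```

In a codon model, each triplet is a single character with 61 possible states. In a codon-position model, each of the three partitions is a nucleotide alignment that can be given its own substitution model and rate.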

What did proteins look like close to the origin of life?


Ten residues (ASDGLIPTEV) represent a consensus view of the amino acids plausibly available through prebiotic chemistry. Giacobelli et al. took the C-terminal domain of a ribosomal protein, UL11, and replaced all “recent” amino acids by one of the 10 ancient ones, to generate a vast number of proteins which might have been functional before all amino acids were available. They then selected the most viable proteins among those. In this project, the goal is to try to reverse-evolve in silico the C-terminal domain of UL11 towards the reduced set of 10 amino acids, and see how it compares to the experimental results of Giacobelli et al.
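A possible first step, sketched below in Python, is to locate the positions of a protein sequence that fall outside the reduced alphabet and would therefore need to be replaced. The function name and the example sequence are hypothetical, and the sequence is not the UL11 domain.

```python
# The 10 residues listed above as plausibly available through prebiotic chemistry.
ANCIENT_RESIDUES = set("ASDGLIPTEV")


def recent_positions(protein):
    """Return (index, residue) pairs for residues outside the reduced alphabet."""
    return [(i, aa) for i, aa in enumerate(protein) if aa not in ANCIENT_RESIDUES]


if __name__ == "__main__":
    toy_sequence = "MKAGLDEV"  # made-up sequence, for illustration only
    print(recent_positions(toy_sequence))
```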

Detecting changes in protein sequences associated with changes to the C4 metabolism in plants


The C4 metabolism is used by several groups of plants as an adaptation to photosynthesize in hot and dry conditions. In the rubisco protein, several sites seem to be associated with this metabolic change (see Parto and Lartillot, 2016). The aim of this project is to build a model to try and find these sites using RevBayes, using the amino acid sequences of the protein.