A usability patch for phylip drawtree

13 Jan

The drawtree program in the PHYLIP package is still one of the most beautiful ways to visualize small unrooted trees (especially for teaching or publications). However, since it was developed throughout the 1980s and 1990s, its user interface is a bit dated.

If you just want to plot the tree with default settings and skip the input/preview loop, apply this little patch to drawtree.c, recompile drawtree and execute this script (make sure drawtree is on your $PATH, or modify the script accordingly). You also need pdfcrop and ps2pdf installed. Your tree is then converted directly into a PDF file with the same name as your tree file.
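As a rough idea of what such a wrapper can look like, here is a minimal Python sketch (not the linked script) under explicit assumptions: the patched drawtree reads its answers from stdin and writes PostScript to PHYLIP's default output file plotfile in the working directory. The stdin answers in DRAWTREE_ANSWERS and the helper names build_commands and tree_to_pdf are hypothetical and must be adapted to what your patched binary actually asks. Only ps2pdf and pdfcrop are called, with their usual input/output file arguments.

```python
"""Hypothetical wrapper: tree file -> drawtree -> ps2pdf -> pdfcrop."""
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path

# ASSUMPTION: the patched drawtree writes PostScript to PHYLIP's default
# output file "plotfile" in the current working directory.
PLOTFILE = "plotfile"
# HYPOTHETICAL stdin answers: tree file name, then "Y" to accept the default
# settings. Adapt this to the prompts your patched binary really shows.
DRAWTREE_ANSWERS = "{tree}\nY\n"


def build_commands(tree_file: str) -> list[list[str]]:
    """Return the ps2pdf and pdfcrop calls that turn plotfile into <tree>.pdf."""
    stem = Path(tree_file).with_suffix("")  # e.g. "mytree.nwk" -> "mytree"
    uncropped = f"{stem}.uncropped.pdf"
    return [
        ["ps2pdf", PLOTFILE, uncropped],        # PostScript -> PDF
        ["pdfcrop", uncropped, f"{stem}.pdf"],  # trim the white margins
    ]


def tree_to_pdf(tree_file: str) -> Path:
    """Run the whole pipeline; needs drawtree, ps2pdf and pdfcrop on $PATH."""
    for tool in ("drawtree", "ps2pdf", "pdfcrop"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on $PATH")
    # PHYLIP programs typically prompt before overwriting an existing
    # output file, so remove an old plotfile first.
    Path(PLOTFILE).unlink(missing_ok=True)
    subprocess.run(["drawtree"], input=DRAWTREE_ANSWERS.format(tree=tree_file),
                   text=True, check=True)
    commands = build_commands(tree_file)
    for cmd in commands:
        subprocess.run(cmd, check=True)
    Path(commands[0][2]).unlink()  # drop the intermediate, uncropped PDF
    return Path(commands[1][2])


if __name__ == "__main__":
    import sys
    print(tree_to_pdf(sys.argv[1]))
```

Keeping the command construction separate from execution makes it easy to inspect (or print) the exact ps2pdf/pdfcrop calls before running anything.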

Visualizing RF/likelihood landscapes of RAxML tree searches

15 Dec

Finding the maximum likelihood tree is an NP-hard problem. With RAxML, you usually conduct something like 100 tree searches in order to find a maximum likelihood estimate (MLE) tree. Depending on the shape of your likelihood surface, many of these searches will end up in different local optima. Usually, you will only consider the tree with the best likelihood. However, if you tested various partitioning schemes (e.g., an unpartitioned super-matrix, one partition per gene, some genes plus some additional proteins, searches with or without individual branch length optimization), then the likelihoods of the resulting trees cannot be compared across different partitioning schemes.

However, it is straightforward to compare their RF-distances. If you concatenate all trees into one file (one tree per line) and run RAxML with -f r, you obtain a triangular matrix of the pairwise topological distances between the trees (the Robinson-Foulds or RF-distances). Below, you see a heatmap visualization of the RF-distances of 80 trees. An unpartitioned super-matrix was used to infer 40 trees (red), and the other 40 trees are based on a partitioned dataset (blue). The heatmap.2 function (from the R package gplots) clusters the topological distances, so you can easily see which trees are very close to each other. I thought it would be nice to inspect the likelihoods of the respective trees at the same time. So I replaced what is usually a dendrogram with a barplot that indicates the relative likelihoods of the trees. Relative means that each tree's log-likelihood is divided by the average log-likelihood of all trees inferred under the same partitioning scheme. Smaller bars indicate a higher likelihood, and the MLE tree of each partitioning scheme is marked with an additional red bar below its likelihood bar. But still: bars of different colours are not comparable with each other.
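The relative likelihood behind the bars can be sketched in a few lines of Python. This is a minimal illustration of the rule described above, not the code from the modified heatmap.2; the function name relative_likelihoods is hypothetical, and the input is assumed to be one log-likelihood and one partitioning-scheme label per tree, in the same order.

```python
from collections import defaultdict


def relative_likelihoods(lnls, schemes):
    """Divide each log-likelihood by the mean log-likelihood of its scheme.

    Log-likelihoods are negative, so a better (less negative) tree gets a
    ratio below 1, i.e. a shorter bar. Also returns, per scheme, the index
    of the tree with the highest log-likelihood (the MLE tree)."""
    groups = defaultdict(list)
    for lnl, scheme in zip(lnls, schemes):
        groups[scheme].append(lnl)
    means = {scheme: sum(vals) / len(vals) for scheme, vals in groups.items()}
    relative = [lnl / means[scheme] for lnl, scheme in zip(lnls, schemes)]
    best = {}
    for i, (lnl, scheme) in enumerate(zip(lnls, schemes)):
        if scheme not in best or lnl > lnls[best[scheme]]:
            best[scheme] = i
    return relative, best


# Made-up log-likelihoods for five trees under two schemes.
lnls = [-1000.0, -1010.0, -990.0, -2000.0, -2020.0]
schemes = ["unpartitioned"] * 3 + ["partitioned"] * 2
rel, best = relative_likelihoods(lnls, schemes)
print([round(r, 3) for r in rel])  # [1.0, 1.01, 0.99, 0.995, 1.005]
print(best)                        # {'unpartitioned': 2, 'partitioned': 3}
```

Because each scheme is normalized by its own mean, a ratio of 0.99 in one scheme and 0.995 in another says nothing about which tree is better overall, which is exactly why bars of different colours are not comparable.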

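[Figure: heatmap of the pairwise RF-distances of 80 RAxML trees (red: unpartitioned, blue: partitioned), with relative-likelihood barplots in place of the dendrogram]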

Okay, so what is the recipe to quickly recreate the plot? Download the modified heatmap.2 code and source it in your script. The signature of the plot function is rfDistancesWithLikelihood(rfDistFile, lnlFile, lnlCol, catCol).
rfDistFile is the RAxML_RF-Distances.runId file produced by “RAxML -f r”. lnlFile contains the likelihoods of the trees. With lnlCol you specify the column that contains the likelihoods, and catCol is the column that assigns the trees to the different partitioning schemes. The zip archive contains example files. Important: when you call RAxML to produce the RF-distances, the order of the trees must be the same as in the lnlFile.
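If you want to work with the RF-distances outside of R, the pairwise list can be turned into a full symmetric matrix. This Python sketch assumes that each data line of RAxML_RF-Distances.runId has the form `i j: absoluteRF relativeRF` with tree indices starting at 0; please check this against your own file. read_rf_matrix and check_tree_count are hypothetical helpers. The count check only catches a mismatch in the number of trees, not a different ordering, so keeping the same order as in the lnlFile remains your responsibility.

```python
import re

# ASSUMED line format: "<i> <j>: <absolute RF> <relative RF>", e.g. "0 1: 4 0.5"
LINE_RE = re.compile(r"^\s*(\d+)\s+(\d+):\s+(\d+)\s+([0-9.]+)")


def read_rf_matrix(lines, use_relative=False):
    """Build a symmetric n x n matrix from pairwise RF-distance lines."""
    pairs = []
    n = 0
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or unexpected lines
        i, j = int(match.group(1)), int(match.group(2))
        value = float(match.group(4)) if use_relative else int(match.group(3))
        pairs.append((i, j, value))
        n = max(n, i + 1, j + 1)
    zero = 0.0 if use_relative else 0
    matrix = [[zero] * n for _ in range(n)]
    for i, j, value in pairs:
        matrix[i][j] = matrix[j][i] = value  # fill both triangles
    return matrix


def check_tree_count(matrix, lnl_rows):
    """Fail early if the RF matrix and the likelihood table disagree in size."""
    if len(matrix) != len(lnl_rows):
        raise ValueError(
            f"{len(matrix)} trees in the RF file but {len(lnl_rows)} likelihood rows")


if __name__ == "__main__":
    example = ["0 1: 4 0.5", "0 2: 2 0.25", "1 2: 6 0.75"]
    print(read_rf_matrix(example))  # [[0, 4, 2], [4, 0, 6], [2, 6, 0]]
```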

A few things we see in this specific heatmap: for the partitioned analysis, there is a local optimum that is topologically distant from the MLE tree for this partitioning scheme, but not much worse in terms of likelihood. For the searches on the unpartitioned dataset, it seems to make a big difference whether we search under GTRGAMMA or GTRCAT.

In general, be careful when interpreting the clustering: it depends on the clustering method (use the argument “clustmethod” to switch between single, complete or average linkage). Also, the RF-distance has caveats: a single rogue taxon that takes distant positions in two compared trees can lead to extremely high RF-distances, even if the trees are topologically identical once the rogue is removed.
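The rogue-taxon effect is easy to reproduce with a toy RF-distance computed from bipartitions (splits). In this Python sketch, trees are written as nested tuples (an assumption made only for illustration; there is no Newick parsing), and the helpers leaves, splits, rf_distance and prune are hypothetical, not RAxML's implementation. Two 9-taxon trees that differ only in the position of the rogue X share no non-trivial split, so their RF-distance is 12, the maximum 2(n - 3) for n = 9 taxa; after pruning X they are identical.

```python
def leaves(node):
    """All taxon names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def splits(tree):
    """Non-trivial bipartitions of an unrooted tree given as nested tuples.

    Each split is stored as the side without a fixed reference taxon, so the
    same bipartition always gets the same representation."""
    taxa = leaves(tree)
    ref = min(taxa)
    result = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = below if ref not in below else taxa - below
        if 2 <= len(side) <= len(taxa) - 2:  # skip trivial splits
            result.add(side)
        return below

    walk(tree)
    return result


def rf_distance(t1, t2):
    """Robinson-Foulds distance: number of splits found in only one tree."""
    return len(splits(t1) ^ splits(t2))


def prune(node, taxon):
    """Remove a taxon and collapse the resulting single-child nodes."""
    if isinstance(node, str):
        return None if node == taxon else node
    kept = [c for c in (prune(child, taxon) for child in node) if c is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


# Caterpillar tree on A..H; the rogue X sits next to A in t1 and next to H in t2.
t1 = (((((((("A", "X"), "B"), "C"), "D"), "E"), "F"), "G"), "H")
t2 = ((((((("A", "B"), "C"), "D"), "E"), "F"), "G"), ("H", "X"))

print(rf_distance(t1, t2))                          # 12
print(rf_distance(prune(t1, "X"), prune(t2, "X")))  # 0
```

Pruning suspected rogues (or using a rogue-aware distance) before computing the heatmap is one way to check whether a distant-looking cluster is a genuine topological difference.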


Reboot

15 Dec

Maybe you still have this blog on your radar. From now on, I will use it mostly to write about research-related topics that I encounter during my PhD.
