r - How to create Newick tree format from raw morphology data in R on Mac OSX

Question

I'm trying to teach myself how to do phylogenetics for historical linguistics in R. I've found a public data set (https://www.cs.rice.edu/~nakhleh/CPHL/IEDATA_112603), and I want to get a Newick format tree from it, so that I can visualize it following these instructions: https://www.r-phylo.org/wiki/HowTo/InputtingTrees. I'm running R 3.4.1 on Mac OS 10.12.6.

Here's what I've done so far. I copied the data and used R and a text editor to transform it into a Nexus data file. Since Nexus (as I understand it) can't distinguish between the individual characters 1 and 2, and the combined character 12, I turned all values in the original data set over 9 into letters of the alphabet, in sequence (a-q). Anyone can download it from here: https://ucla.box.com/s/i4fbeagcw8lombg3xuhczfk3h0y7v54m

The problem is, I can't find any instructions or code or guidance to interpret the raw data as a tree. I've found one Python script (Convert csv to Newick tree), but I don't know Python. Can anyone point me in the direction of the right software/library/tutorial, or otherwise help me figure out what my next step should be?

score 2 · Accepted Answer

I finally found a colleague who could help me. I did not need to convert the data to Newick or Nexus to make a tree from it; I needed to convert it to a phyDat object (see the phangorn package for R). I used phangorn's as.phyDat() function to convert the linguistic data into "phylogenetic data", specifying type = "USER", which let me define my own levels for the data. There's a more detailed example at cran.r-project.org/web/packages/phangorn/vignettes/…. Then I could create trees from it using the regular phangorn functions.
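As a rough illustration of that conversion, here is a minimal sketch. The language names and the character matrix are made up for the example (they are not taken from the data set in the question), and it uses the phyDat() constructor directly with a simple neighbour-joining tree, which is only one of several ways to get from a phyDat object to a tree.

```r
library(phangorn)  # attaches ape as well, which provides write.tree()

# Hypothetical toy data: rows are languages (taxa), columns are characters.
# States above 9 are assumed to have been recoded as letters ("a", "b"),
# as described in the question.
chars <- matrix(
  c("1", "1", "2", "a", "1",
    "1", "1", "2", "a", "2",
    "1", "2", "3", "b", "2",
    "2", "2", "3", "b", "3",
    "2", "3", "1", "b", "3"),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("LangA", "LangB", "LangC", "LangD", "LangE"), NULL)
)

# type = "USER" means we declare the set of allowed states ourselves.
states <- c("1", "2", "3", "a", "b")
ling <- phyDat(chars, type = "USER", levels = states)

# Quick distance-based tree: Hamming distances, then neighbour joining.
d <- dist.hamming(ling)
tree <- NJ(d)

# write.tree() with no file argument returns the Newick string.
newick <- write.tree(tree)
print(newick)
```

From here, plot(tree) draws the tree, and write.tree(tree, file = "tree.nwk") would save the Newick string to a file (the file name is just an example).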

score 0 · Accepted Answer

Using phangorn might be a good approach in R (have a look at the "Constructing phylogenetic trees" vignette).

browseVignettes(package = "phangorn")

However, to properly infer the tree, I would advise you to use dedicated phylogenetic inference software with more options (phangorn is excellent for exploratory analysis but can be limited).

I suggest you use the BEAST software, which has an entire tutorial dedicated to phylogenetic linguistics (https://www.luke.maurits.id.au/files/research/papers/beastling.pdf). Luke Maurits's tutorial on GitHub is really well explained (https://github.com/lmaurits/BEASTling/blob/master/docs/tutorial.rst).

Also, regarding your problem with ambiguous character states in your NEXUS file (i.e. a taxon that shows both state 1 and state 2 for the same character), you can code them in the NEXUS file as (12). For example, this is valid NEXUS:

#NEXUS

BEGIN DATA;
DIMENSIONS NTAX=2 NCHAR=3;
FORMAT DATATYPE=STANDARD SYMBOLS="12";

MATRIX
t1 1(12)2
t2 111
;
END;
