
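I have a dataset with several levels of branching, starting from the top: stock -> mbranch -> sbranch -> lsbranch. I would like to visualize these levels as a tree in Newick format. Each stock contains different language groups, and I would like to build a separate tree for each of these top-level groups.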

For example, my data is formatted as follows:

sample= data.frame("stock" = c("A", "A", "B", "B", "B"), "mbranch" = c("C", "D", "E", "F", "G"), "sbranch" = c("H", "O", NA, "K", "L"), "lsbranch" = c("I", "J", NA, "M", "N"), "name" = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam"))

I am trying to output a Newick tree, something like:

tree = "(A(C(H(I(Andrea))),D(O(J(Kevin)))),B(E(Charlie),F(K(M(Naomi))),G(L(N(Sam)))));"
plot(read.dendrogram(tree))

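I am doing this so that I can later compute a distance matrix between the nodes of the output tree.

Can the write.tree function take data like this and build a tree from it (my actual dataset is much larger)? Or, more generally, is there a function that outputs a tree format? Thanks.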


1 Answer


You can use the ape::read.tree() function to read your Newick-format tree:

library(ape)
## Newick puts each internal node label after its closing parenthesis
tree = "(((((Andrea)I)H)C,(((Kevin)J)O)D)A,((Charlie)E,(((Naomi)M)K)F,(((Sam)N)L)G)B);"
my_tree <- read.tree(text = tree)
plot(my_tree)

You can then use ape::write.tree to save the tree to a Newick file:

write.tree(my_tree, file = "my_file_name.tre")
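Since the question mentions wanting a distance matrix between nodes later on, here is a minimal sketch of how that could look once a tree is a "phylo" object. It uses a small stand-in tree rather than the full hierarchy, and it assumes every edge counts as one step, because the data carry no branch lengths; both choices are assumptions for illustration only.

```r
library(ape)

## Stand-in tree: stocks A and B as internal nodes, names as tips
tree <- read.tree(text = "((Andrea,Kevin)A,(Charlie,Naomi,Sam)B);")

## Assumption: unit branch lengths, so distances count edges on the path
tree$edge.length <- rep(1, nrow(tree$edge))

## Pairwise distances between tips only (rows/columns named by tip label)
tip_dist <- cophenetic(tree)

## Pairwise distances between all nodes, tips and internal, by node number
node_dist <- dist.nodes(tree)
```

With these unit lengths, two names in the same stock are 2 steps apart and two names in different stocks are 4 steps apart.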

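To convert the table into a "phylo" object for ape, you can use this function (it may need some tweaking; as written, it only links each name to its stock and ignores the intermediate levels):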

## The function
data.frame.to.phylo <- function(sample){
    ## Making an edge table
    edge_table <- rbind(
        ## The root connecting to each stock
        cbind("root", unique(sample$stock)),
        ## All the nodes connecting to the tips
        cbind(sample$stock, sample$name)
        )

    ## Translating the values in the edge table into edge IDs
    ## The order must be tips, root, nodes
    element_names <- c(unique(sample$name), "root", unique(sample$stock))
    element_ids   <- seq_along(element_names)

    ## Looping through each ID and name
    for(element in element_ids) {
        edge_table <- ifelse(edge_table == element_names[element], element_ids[element], edge_table)
    }

    ## Make numeric
    edge_table <- apply(edge_table, 2, as.numeric)

    ## Build the phylo object
    phylo_object <- list()
    phylo_object$edge <- edge_table
    phylo_object$tip.label <- unique(sample$name)
    phylo_object$node.label <- c("root", unique(sample$stock))
    phylo_object$Nnode <- length(phylo_object$node.label)

    ## Forcing the class to be "phylo"
    class(phylo_object) <- "phylo"
    return(phylo_object)
}

## The data
sample = data.frame("stock" = c("A", "A", "B", "B", "B"), "mbranch" = c("C", "D", "E", "F", "G"), "sbranch" = c("H", "O", NA, "K", "L"), "lsbranch" = c("I", "J", NA, "M", "N"), "name" = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam"))

## Plotting the data.frame for testing the function
plot(data.frame.to.phylo(sample))
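The function above links each name directly to its stock. As one possible extension, the sketch below walks all the levels and builds a Newick string directly, skipping NA levels. Note that df_to_newick is a hypothetical helper written for this example, not an ape function, and it assumes the columns are ordered from the top level down to the tips and that labels contain no characters that are special in Newick (parentheses, commas, colons, semicolons).

```r
## Hypothetical helper: data frame of nested levels -> Newick string
df_to_newick <- function(df, levels = names(df)) {
    ## One path per row, from top level to tip, with NA levels dropped
    paths <- lapply(seq_len(nrow(df)), function(i) {
        path <- vapply(df[i, levels, drop = FALSE], as.character, character(1))
        unname(path[!is.na(path)])
    })
    paths <- paths[lengths(paths) > 0]

    ## Recursively group the paths by their first label
    build <- function(paths) {
        heads <- vapply(paths, `[`, character(1), 1)
        parts <- vapply(unique(heads), function(h) {
            ## Whatever remains below label h
            rest <- lapply(paths[heads == h], `[`, -1)
            rest <- rest[lengths(rest) > 0]
            ## A label with nothing below it is a tip; otherwise the
            ## children go in parentheses, followed by the node label
            if (length(rest) == 0) h else paste0("(", build(rest), ")", h)
        }, character(1))
        paste(parts, collapse = ",")
    }

    paste0("(", build(paths), ");")
}

## The example data (named dat here to avoid masking base::sample)
dat <- data.frame(
    stock    = c("A", "A", "B", "B", "B"),
    mbranch  = c("C", "D", "E", "F", "G"),
    sbranch  = c("H", "O", NA, "K", "L"),
    lsbranch = c("I", "J", NA, "M", "N"),
    name     = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam")
)

## One tree for the whole table
newick_all <- df_to_newick(dat)
## "(((((Andrea)I)H)C,(((Kevin)J)O)D)A,((Charlie)E,(((Naomi)M)K)F,(((Sam)N)L)G)B);"

## One tree per stock
newick_by_stock <- lapply(split(dat, dat$stock), df_to_newick)
```

The resulting strings can then be passed to read.tree(text = ...). Because most internal nodes here have a single child, some downstream functions may need such singleton nodes collapsed first.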

Cheers, Thomas

Answered 2020-07-28T15:11:58.320