I apologize if this has already been asked using terminology I don't recognize, but it doesn't appear to have been.

I am using the function comm2sci in the library taxize to search for the scientific names for a database of over 120,000 rows of common names. Here is a subset of 10:

commnames <- c("WESTERN CAPERCAILLIE", "AARDVARK", "AARDWOLF", "ABACO ISLAND BOA", 
"ABBOTT'S DAY GECKO", "ABDIM'S STORK", "ABRONIA GRAMINEA", 
"ABYSSINIAN BLUE WINGED GOOSE", 
"ABYSSINIAN CAT", "ABYSSINIAN GROUND HORNBILL")

When searching the NCBI database with this function, it asks for user input if the common name is generic rather than species-specific. For example, the following call asks for clarification for "AARDVARK", expecting '1', '2', or 'return' (for 'NA').

install.packages("taxize")
library(taxize)
ncbioutput <- comm2sci(commnames, db = "ncbi")  # query the NCBI database

Because of this, I cannot rely on this function to find the names of the 120000 species without me sitting and entering 'return' every few minutes. I know this question sounds taxize specific - but I've had this situation in the past with other functions as well. My question is: is there a general way to place the comm2sci call in a conditional statement that will return a specific value when user input is prompted? Or otherwise write a function that will return some input when prompted?

All searches related to this tell me how to ask for user input but not how to override user queries. These are two of the question threads I've found, but I can't seem to apply them to my situation: Make R wait for console input?, Switch R script from non-interactive to interactive

I hope this was clear. Thank you very much for your time!

1 Answer

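So the get_* functions, which are used internally, all ask for user input by default when there is more than one option. However, each of those functions has a sister function with a trailing underscore, e.g., get_uid_, that does not prompt for input and returns all the data. You can use that to get all the data, then process it however you like.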

I made some changes to comm2sci, so update first: devtools::install_github("ropensci/taxize")

Here's an example.

library(taxize)
commnames <- c("WESTERN CAPERCAILLIE", "AARDVARK", "AARDWOLF", "ABACO ISLAND BOA", 
               "ABBOTT'S DAY GECKO", "ABDIM'S STORK", "ABRONIA GRAMINEA", 
               "ABYSSINIAN BLUE WINGED GOOSE", 
               "ABYSSINIAN CAT", "ABYSSINIAN GROUND HORNBILL")

Then use get_uid_ to get all the data

ids <- get_uid_(commnames)

Process the results in ids however you like. Here, for brevity, we'll just grab the first row of each

ids <- lapply(ids, function(z) z[1,])
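One caveat with taking the first row: a name with no NCBI match may not come back as a data frame at all. Below is a minimal sketch of a more defensive version. It uses a hand-built list in place of a real get_uid_ result so that it runs offline. The mock values, the first_row helper, and the assumption that an unmatched name yields NULL are mine, not taken from taxize.

```r
# Mock stand-in for the list returned by get_uid_(); the second AARDVARK row
# and the unmatched entry are made up for illustration.
mock_ids <- list(
  "AARDVARK" = data.frame(uid = c("9818", "0"),
                          scientificname = c("Orycteropus afer", "placeholder"),
                          stringsAsFactors = FALSE),
  "NOT A REAL ANIMAL" = NULL  # assumed shape of a query with no match
)

# Hypothetical helper: first row if there is one, otherwise NULL.
first_row <- function(z) {
  if (is.data.frame(z) && nrow(z) > 0) z[1, , drop = FALSE] else NULL
}

picked <- lapply(mock_ids, first_row)

# Separate the queries that matched from those that did not.
matched <- Filter(Negate(is.null), picked)
unmatched <- names(picked)[vapply(picked, is.null, logical(1))]

# Character vector of uids, named by the original query.
uids <- vapply(matched, function(z) z[["uid"]], character(1))
print(uids)
print(unmatched)
```

Keeping the unmatched names in their own vector makes it easy to retry them later, for example against a different database.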

Then pull out the uids

ids <- as.uid(unname(vapply(ids, "[[", "", "uid")), check = FALSE)

And pass them to comm2sci

comm2sci(ids)

$`100830`
[1] "Tetrao urogallus"

$`9818`
[1] "Orycteropus afer"

$`9680`
[1] "Proteles cristatus"

$`51745`
[1] "Chilabothrus exsul"

$`8565`
[1] "Gekko"

$`39789`
[1] "Ciconia abdimii"

$`278977`
[1] "Abronia graminea"

$`8865`
[1] "Cyanochen cyanopterus"

$`9685`
[1] "Felis catus"

$`153643`
[1] "Bucorvus abyssinicus"

Note that NCBI returns the common names from get_uid/get_uid_, so you can just go ahead and pull them out from there
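As a rough sketch of that last point, the names can be read straight off the get_uid_ result and collected into one lookup table. The example uses a hand-built list so it runs without network access. The column names scientificname and commonname are my assumption about what get_uid_ returns (only uid appears in the code above), so check names(ids[[1]]) on real output first.

```r
# Hand-built stand-in for get_uid_() output; column names other than `uid`
# are assumed, and the commonname values are illustrative.
mock_ids <- list(
  "AARDWOLF" = data.frame(uid = "9680",
                          scientificname = "Proteles cristatus",
                          commonname = "aardwolf",
                          stringsAsFactors = FALSE),
  "ABYSSINIAN CAT" = data.frame(uid = "9685",
                                scientificname = "Felis catus",
                                commonname = "domestic cat",
                                stringsAsFactors = FALSE)
)

# Take the first candidate for each query and bind everything into one table,
# keeping the original query string alongside the NCBI fields.
lookup <- do.call(rbind, lapply(names(mock_ids), function(nm) {
  first <- mock_ids[[nm]][1, , drop = FALSE]
  data.frame(query = nm, first, stringsAsFactors = FALSE)
}))
rownames(lookup) <- NULL

print(lookup[, c("query", "uid", "scientificname")])
```

With 120,000 rows, a table like this can be merged back onto the original data by the query column.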

Answered 2017-07-21T17:46:05.043