5

我正在尝试从 NCBI 网站获取 FASTA 文件,我使用以下功能

getncbiseq <- function(accession){
  dbs <- c()
  for (i in 1:numdbs){
    db <- dbs[i]
    choosebank(db)
    resquery <- try(query(".tmpquery", paste("AC=", accession)),silent = TRUE)
    if (!(inherits(resquery, "try-error"))){
      queryname <- "query2"
      thequery <- paste("AC=",accession,sep="")
      query(`queryname`,`thequery`)
      # see if a sequence was retrieved:
      seq <- getSequence(query2$req[[1]])
      closebank()
      return(seq)
    }
    closebank()
  }
  print(paste("ERROR: accession",accession,"was not found"))
}    

当我尝试检索序列时

mydata <- getncbiseq("NC_001477")

getSequence(query2$req[[1]]) 中的错误:找不到对象“query2”

还有更好的方法来缩短这些循环功能吗?

如果我使用

query('queryname','the query')
#or 
query("queryname","thequery")

我收到另一个错误

查询错误(“queryname”,“thequery”):无效请求:“(^) 处的未知列表:\"(^)thequery\""

4

2 回答 2

3

感谢您的大力帮助。我被困在这一点上一整天。我终于在 Windows 10 下使用 R3.4.0(32bits) 得到了以下代码:-

getncbiseq <- function(accession)
{
require("seqinr") # this function requires the SeqinR R package
# first find which ACNUC database the accession is stored in:
dbs <- c("genbank","refseq","refseqViruses","bacterial")
numdbs <- length(dbs)
for (i in 1:numdbs)
{
db <- dbs[i]
choosebank(db)
# check if the sequence is in ACNUC database 'db':
resquery <- try(query(".tmpquery", paste("AC=", accession)), silent = TRUE)

if (!(inherits(resquery, "try-error"))) {
  queryname <- "query2"
  thequery <- paste("AC=", accession, sep="")
  query2 <- query(queryname, thequery)
  # see if a sequence was retrieved:
  seq <- getSequence(query2$req[[1]])
  closebank()
  return(seq)
}
closebank()
}
print(paste("ERROR: accession",accession,"was not found"))
}
于 2017-04-24T18:48:19.340 回答
2

我认为您打算将调用分配给query()名为 的变量query2,但您忘记了。尝试这个:

if (!(inherits(resquery, "try-error"))) {
  queryname <- "query2"
  thequery <- paste("AC=", accession, sep="")
  query2 <- query(queryname, thequery)
  # see if a sequence was retrieved:
  seq <- getSequence(query2$req[[1]])
  closebank()
  return(seq)
}

正如您所提到的,您的其余代码也有一些可能会改进的怪癖和扭结。

更新:

这是使用向量而不是显式 for 循环的代码重构sapplydbs后者通常被 R 人不赞成):

processdbs <- function(x, y) {
    choosebank(x)
    resquery <- try(query(".tmpquery", paste("AC=", y)), silent = TRUE)
    if (!(inherits(resquery, "try-error"))) {
      queryname <- "query2"
      thequery  <- paste("AC=", y, sep="")
      query2 <- query(queryname, thequery)

      # see if a sequence was retrieved:
      seq <- getSequence(query2$req[[1]])
      closebank()
      return(seq)
    }
    closebank()
}

getncbiseq <- function(accession) {
   dbs <- c("genbank","refseq","refseqViruses","bacterial")
   result <- sapply(dbs, processdbs, y=accession)
   closebank()

   print(paste("ERROR: accession",accession,"was not found"))
}

您可能需要做一些额外的工作来检查result向量并确定是否在任何地方检索到序列。

于 2016-06-16T10:11:26.737 回答