r - R code for Retrieving the List of Names from Ensembl database

Question

I wrote this program to convert Entrez IDs into gene names using R, but I get the following error:

Error in .checkKeysAreWellFormed(keys) : 
keys must be supplied in a character vector with no NAs

The program:

a <- read.csv("C:\\list.csv", header = FALSE)
a2 <- a[!is.na(a)]
for (i in 1:length(a))
{
    if (is.na(a[i]) == TRUE)
    {
        next;
    } else {
        a2 <- c(a2, a[i]);
        a3 <- lookUp(a2, 'org.Hs.eg', 'SYMBOL')
    }
}

And the list looks like this: (list.csv)

5921,9315,10175,58155,1112,1974,2033,2309,3015,3192,5217,5411,5527,6660,8125,9743,10439,11174,23077,23097,26520,56929,84146,109,1073,1783,1809,1839,3169,3187,3768,4857,5066,5496,5594,5683,5885,6328,7490

Where is the problem?

score 2 · Accepted Answer

lookUp comes from the Bioconductor package annotate. We can reproduce the error above

> library(annotate)
> lookUp(list("123"), 'org.Hs.eg', 'SYMBOL')
Error in .checkKeysAreWellFormed(keys) : 
  keys must be supplied in a character vector with no NAs

and correct it by supplying a character vector rather than a list

> lookUp("123", 'org.Hs.eg', 'SYMBOL')
$`123`
[1] "PLIN2"
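To connect this to the program in the question: read.csv returns a data frame, which is a list of columns, so a[i] is itself a list and c(a2, a[i]) builds a list rather than a character vector. The following base R sketch illustrates this without needing any Bioconductor package. The in-memory text and the variable names are assumptions for illustration, not the asker's actual file.

```r
# Hypothetical in-memory stand-in for list.csv (one comma-separated line).
csv_text <- "5921,9315,10175"

# read.csv() returns a data frame, i.e. a list with one element per column.
a <- read.csv(text = csv_text, header = FALSE)
print(is.list(a))       # a data frame is a list
print(is.character(a))  # and it is not a character vector

# Combining a plain vector with a[1] (a one-column data frame) gives a list,
# which is the kind of object that lookUp() rejects.
combined <- c(5921L, a[1])
print(class(combined))

# Flatten the data frame into a plain character vector of IDs instead.
eid <- as.character(unlist(a, use.names = FALSE))
print(eid)
```

A vector built this way can be passed to lookUp directly, with no loop needed.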

If your file "list.csv" really contains the single line you show, then I might use

eid = strsplit(readLines("C:\\list.csv"), ",")[[1]]

to get a character vector of Entrez ids; class(eid) will be "character". Clean it and do the look-up

lookUp(eid[!is.na(eid)], "org.Hs.eg", "SYMBOL")

But the more "modern" approach is

select(org.Hs.eg.db, eid, "SYMBOL")

which will deal with NAs and invalid keys without fuss

> select(org.Hs.eg.db, c(NA, "123", "xyz"), "SYMBOL")
  ENTREZID SYMBOL
1      123  PLIN2
2      xyz   <NA>
Warning message:
In .select(x, keys, cols, keytype, jointype = jointype) :
  'NA' keys have been removed
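Putting the pieces together, one possible end-to-end sketch follows. It assumes a single-line file like the one in the question. The temporary file, the shortened ID list, and the trimming step are illustrative additions, and naming keytype and columns explicitly is a choice made here for clarity. The lookup runs only if org.Hs.eg.db is installed.

```r
# Hypothetical temporary file standing in for C:\list.csv.
path <- tempfile(fileext = ".csv")
writeLines("5921,9315,10175,58155", path)

# Read the single line and split on commas to get a character vector.
eid <- strsplit(readLines(path), ",")[[1]]
eid <- trimws(eid)                      # drop stray spaces around IDs
eid <- eid[!is.na(eid) & nzchar(eid)]   # drop NAs and empty fields

# Run the lookup only when the annotation package is available.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  # Calling AnnotationDbi::select explicitly avoids clashes with other
  # packages that also define select(), such as dplyr.
  symbols <- AnnotationDbi::select(
    org.Hs.eg.db::org.Hs.eg.db,
    keys = eid,
    columns = "SYMBOL",
    keytype = "ENTREZID"
  )
  print(symbols)
}
```

The result is a data frame with one row per key, so unmatched IDs show up as NA in the SYMBOL column instead of stopping the script.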
