r - Duplicate 'row.names' are not allowed in R

Question

Here is my problem in R:

mtable <- read.table(paste(".folder_1362704682.4574","/groups.txt",sep=""),sep="\t",comment.char='',skip=0, header=TRUE, fill=TRUE,check.names=FALSE)

The first folder part (or the whole paste()) is normally held in a variable; it is hard-coded here for debugging purposes.

I always get this message:

Error in read.table(paste(".frunc_1362704682.4574", "/groups.txt", sep = ""),  :
  duplicate 'row.names' are not allowed

However, if I look at the file, which has this header:

root_node_name  node_name       node_id #genes_in_root_node     #genes_in_node  #genes_with_variable=1_in_root_node     #genes_with_variable=1_in_node  raw_p_underrepresentation_of_variable=1 raw_p_overrepresentation_of_variable=1  FWER_underrepresentation        FWER_overrepresentation FDR_underrepresentation FDR_overrepresentation

I can't see any duplicates... :( I read in another discussion that I should try:

mtable <- read.table(paste(".frunc_1362704682.4574","/groups.txt",sep=""),sep="\t",comment.char='',skip=0, header=TRUE, fill=TRUE,check.names=FALSE,row.names=NULL)

This works, but afterwards all the headers are shifted one column to the right:

> head(mtable, n=1)
           row.names                            root_node_name  node_name
1 molecular_function trans-hexaprenyltranstransferase activity GO:0000010
  node_id #genes_in_root_node #genes_in_node
1   17668                   2           2419
  #genes_with_variable=1_in_root_node #genes_with_variable=1_in_node
1                                   0                        0.74491
  raw_p_underrepresentation_of_variable=1
1                                       1
  raw_p_overrepresentation_of_variable=1 FWER_underrepresentation
1                                      1                        1
  FWER_overrepresentation FDR_underrepresentation FDR_overrepresentation
1

Any ideas how to get this right? :(

Edit:

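OK, as the commenters said, this is mostly a problem with the rows... silly me, I thought it came from the header. But I don't want to name the rows; it should be easy to just read them... oO It can't be that hard, can it?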

File contents:

molecular_function      trans-hexaprenyltranstransferase activity       GO:0000010      17668   2       2419    0       0.74491 1       1       1       -1      -1
molecular_function      single-stranded DNA specific endodeoxyribonuclease activity     GO:0000014      17668   5       2419    0       0.478885        1       1       1       -1      -1
molecular_function      lactase activity        GO:0000016      17668   1       2419    0       0.863086        1       1       1       -1      -1
molecular_function      alpha-1,3-mannosyltransferase activity  GO:0000033      17668   3       2419    0       0.64291 1       1       1       -1      -1
molecular_function      tRNA binding    GO:0000049      17668   27      2419    7       0.975698        0.0663832       1       1       -1      -1
molecular_function      fatty-acyl-CoA binding  GO:0000062      17668   20      2419    6       0.986407        0.0460431       1       1       -1      -1
molecular_function      L-ornithine transmembrane transporter activity  GO:0000064      17668   1       2419    0       0.863086        1       1       1       -1      -1
molecular_function      S-adenosylmethionine transmembrane transporter activity GO:0000095      17668   1       2419    0       0.863086        1       1       1       -1      -1

score 11 · Accepted Answer

According to the R documentation here,

If there is a header and the first row contains one fewer field 
than the number of columns, the first column in the input is used
for the row names. Otherwise if row.names is missing, the rows are numbered.

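...so I would suggest that your first row may contain one fewer field than the number of columns, and read.table() therefore uses the first column (which contains multiple copies of molecular_function) as the row names.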
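The documented rule can be reproduced with a small sketch. The inline data and the column names `a` and `b` are hypothetical; `text =` simply feeds `read.table()` a character vector instead of a file. With a header one field shorter than the data rows, a repeated first column triggers the error, while a unique first column is silently turned into row names.

```r
# Hypothetical tab-separated data: the header has 2 fields, each data row has 3.
dup_rows    <- c("a\tb", "x\t1\t2", "x\t3\t4")  # first column repeats
unique_rows <- c("a\tb", "x\t1\t2", "y\t3\t4")  # first column is unique

# Repeated first column: read.table() tries to use it as row names and fails.
err <- tryCatch(
  read.table(text = dup_rows, sep = "\t", header = TRUE),
  error = function(e) conditionMessage(e)
)
print(err)

# Unique first column: it is consumed as row names, leaving columns a and b.
ok <- read.table(text = unique_rows, sep = "\t", header = TRUE)
print(ok)
```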

score 1 · Accepted Answer

@adrianoesch's answer (https://stackoverflow.com/a/22408965/2236315) should help.

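Note that if you open the file in a text editor, you should see that there are fewer header fields than columns in the lines below the header. In my case, the dataset was missing a "," at the end of the last header field.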
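One way to check for this without an editor is `count.fields()`, which reports the number of fields on each line. The lines below are a hypothetical stand-in for the real file; for a real file, pass its path instead of `textConnection(lines)`. Setting `quote = ""` assumes the file uses no quoting.

```r
# Hypothetical stand-in for the file: 2 header fields, 3 fields per data row.
lines <- c("a\tb", "x\t1\t2", "x\t3\t4")

# Count fields per line. comment.char = "" matches the question's call;
# quote = "" is an assumption that the file contains no quoted fields.
n_fields <- count.fields(textConnection(lines), sep = "\t",
                         quote = "", comment.char = "")
print(n_fields)

# Line numbers whose field count differs from the header (line 1).
mismatch <- which(n_fields != n_fields[1])
print(mismatch)
```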

score 0 · Accepted Answer

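I had the same problem: my text file had a lot of tab-separated whitespace at the bottom. As a result, every row name on those lines was the same (i.e. blank). It happened because I had converted the file from Excel.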
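A minimal sketch of that situation, assuming a file written with row names (so the header is one field shorter) whose trailing lines contain only tabs. Dropping whitespace-only lines before parsing avoids the duplicate empty row names. The data and variable names are hypothetical; for a real file, `readLines()` on its path would replace the inline vector.

```r
# Hypothetical file content: header one field shorter than the data rows,
# plus two tab-only lines left over from empty spreadsheet cells.
lines <- c("val1\tval2", "r1\t1\t2", "r2\t3\t4", "\t\t", "\t\t")

# The tab-only lines all have an empty first field, so row names repeat.
bad <- tryCatch(read.table(text = lines, sep = "\t", header = TRUE),
                error = function(e) e)
if (inherits(bad, "error")) message("read.table failed: ", conditionMessage(bad))

# Drop lines consisting only of whitespace (tabs included) before parsing.
clean <- lines[!grepl("^[[:space:]]*$", lines)]
df <- read.table(text = clean, sep = "\t", header = TRUE)
print(df)
```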

score 0 · Accepted Answer

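I have auto-generated data files in which one column is empty apart from its header. I didn't want to edit each file by hand (and risk messing it up). The best workaround I found was in question #4066607: include "row.names=NULL" in the arguments.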

DF<-read.csv(file, ..... , row.names=NULL)

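It's not perfect, but it lets me load the file. Unlike the behavior described in another answer (an extra row-number column being forced in), I got the original first column labeled "row.names", and all the headers shifted one column to the right... but it lets me get at all the data.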
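If the shifted names are the only problem, they can be realigned after loading. This sketch assumes, as the question's output suggests (the last column, `FDR_overrepresentation`, is empty), that the surplus field is an empty trailing one, for example a trailing tab on every data line. The inline data is hypothetical. The code drops that last column and moves every name one position to the left.

```r
# Hypothetical data: 2 header fields, but every data line ends with a tab,
# so each row has 3 fields and the third one is empty.
lines <- c("a\tb", "x\t1\t", "x\t3\t")

# row.names = NULL avoids the duplicate error, but the names come out shifted:
# the columns are "row.names", "a", "b" and the empty field lands in "b".
df <- read.table(text = lines, sep = "\t", header = TRUE, row.names = NULL)
print(names(df))

# Only realign if the surplus last column really is empty.
last <- df[[ncol(df)]]
stopifnot(all(is.na(last) | as.character(last) == ""))

# Drop the empty last column and shift every name one position to the left.
df <- setNames(df[-ncol(df)], names(df)[-1])
print(df)
```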
