regex - R: create a list of files without a specific string

Question

I'm trying to create a list of files from a directory that contains files with the following patterns:

Name_Surname_12345_noe_xy.xls  
Name_Surname_12345_xy.xls

Here, xy can be one or two characters.

Now I want a list of all files whose names do not contain "noe". So far I can only read the "noe" files, using

fl = list.files(pattern = "noe.+xls$", recursive = TRUE, full.names = TRUE)

but I can't find a way to exclude them. Any suggestions?

Many thanks
Markus

score 3 · Accepted Answer

Get all the files, then use grep with invert=TRUE to select only the ones that don't contain noe:

> all
[1] "Name_Surname_123425_xy.xls"    "Name_Surname_1234445_xy.xls"  
[3] "Name_Surname_12345_noe_xy.xls" "Name_Surname_12345_xy.xls"    
[5] "Name_Surname_13245_noe_xy.xls"
> all[grep("noe_xy.xls",all,invert=TRUE)]
[1] "Name_Surname_123425_xy.xls"  "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_xy.xls"
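The same idea can be run end to end, starting from list.files(). This is a minimal sketch, assuming the noe part is always written as "_noe_" followed by the one- or two-character suffix described in the question (and that the suffix itself contains no underscore). The temporary directory and the file names in it are hypothetical, created only so the example runs on its own. value = TRUE makes grep return the matching names instead of their positions.

```r
# Hypothetical demo directory filled with names that follow the question's
# pattern; "xy" and "a" stand in for the one- or two-character suffix.
dir <- file.path(tempdir(), "noe_demo")
invisible(dir.create(dir, showWarnings = FALSE))
names_in <- c("Name_Surname_12345_noe_xy.xls", "Name_Surname_12345_xy.xls",
              "Name_Surname_13245_noe_a.xls",  "Name_Surname_13245_a.xls")
invisible(file.create(file.path(dir, names_in)))

# List every .xls file under the directory.
all_xls <- list.files(dir, pattern = "\\.xls$", recursive = TRUE, full.names = TRUE)

# Keep the paths that do NOT end in _noe_<1-2 chars>.xls; the pattern is
# anchored at the end, so only the file-name part can match.
non_noe <- grep("_noe_[^_]{1,2}\\.xls$", all_xls, invert = TRUE, value = TRUE)
basename(non_noe)
```

Because invert = TRUE returns the elements that don't match, there is no negative indexing involved, so an empty or complete match still gives a sensible result.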

Always make sure you check the edge cases where all or none of the files match:

> all[grep("xls",all,invert=TRUE)]
character(0)
> all[grep("fnord",all,invert=TRUE)]
[1] "Name_Surname_123425_xy.xls"    "Name_Surname_1234445_xy.xls"  
[3] "Name_Surname_12345_noe_xy.xls" "Name_Surname_12345_xy.xls"    
[5] "Name_Surname_13245_noe_xy.xls"
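Logical indexing with !grepl() is another form that passes both edge-case checks, since grepl() returns one TRUE/FALSE per element rather than a list of positions. A small sketch; the helper name drop_matching and the sample vector are hypothetical:

```r
# Hypothetical helper: drop every element of x whose text matches pattern.
# !grepl() yields one TRUE/FALSE per element, so the index always has
# length(x) entries, even when nothing (or everything) matches.
drop_matching <- function(x, pattern) {
  x[!grepl(pattern, x)]
}

files <- c("Name_Surname_12345_noe_xy.xls", "Name_Surname_12345_xy.xls",
           "Name_Surname_13245_noe_xy.xls")

drop_matching(files, "_noe_")  # keeps only "Name_Surname_12345_xy.xls"
drop_matching(files, "xls")    # everything matches: character(0)
drop_matching(files, "fnord")  # nothing matches: all three names are kept
```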

Using grep with a negative index looks equivalent, but it fails in one of these edge cases:

> all
[1] "Name_Surname_123425_xy.xls"    "Name_Surname_1234445_xy.xls"  
[3] "Name_Surname_12345_noe_xy.xls" "Name_Surname_12345_xy.xls"    
[5] "Name_Surname_13245_noe_xy.xls"
> all[-grep("noe_xy.xls",all)] # strip out the noe_xy.xls files
[1] "Name_Surname_123425_xy.xls"  "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_xy.xls"  

# works. Now strip out any xls files (should leave nothing):

> all[-grep("xls",all)]
character(0)

# yup, that works too. Now strip out 'fnord' files, shouldn't remove anything:

> all[-grep("fnord",all)]
character(0)

Epic fail! The reason is left as an exercise for the reader.
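One way to work through the exercise is to inspect the intermediate values. A minimal sketch with a hypothetical two-element vector: when nothing matches, grep() returns integer(0), negating it gives integer(0) again, and indexing with an empty index selects nothing at all.

```r
files <- c("Name_Surname_12345_noe_xy.xls", "Name_Surname_12345_xy.xls")

idx <- grep("fnord", files)   # no match: grep() returns integer(0)
neg <- -idx                   # negating an empty vector is still integer(0)
files[neg]                    # indexing with integer(0) selects nothing: character(0)

# The invert = TRUE form from the answer avoids the problem:
files[grep("fnord", files, invert = TRUE)]  # both names are kept
```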
