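Sorry for the general question. I am looking for pointers on tidying a data folder that contains many .txt files. They all have different titles, and the vast majority have the same dimensions, i.e. the same number of columns. The pain is that some files, despite having the same number of columns, have different column names. That is, some other variables were measured in those files.
I want to weed out these files, and I cannot do that by simply comparing the number of columns. Is there a way to pass the name of a column and check how many files in the directory have that column, so that I can move them into a different folder?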
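One possible way to do this, as a minimal sketch: read only the top of each file, take its column names, and keep the paths of the files whose header contains the column of interest. The helper name `files_with_column`, the demo folder, and the column names `time`, `speed` and `force` are made up for illustration; the sketch assumes tab-separated files with a header row, as in the question.

```r
# Hypothetical helper (not from the original post): return the paths of the
# files in `dir` whose header contains the column `column`.
files_with_column <- function(dir, column) {
  files <- list.files(dir, pattern = "\\.txt$", ignore.case = TRUE,
                      full.names = TRUE)
  has_column <- vapply(files, function(f) {
    # nrows = 1 keeps the read cheap; only the column names are needed
    header <- names(read.delim(f, sep = "\t", header = TRUE, nrows = 1))
    column %in% header
  }, logical(1))
  unname(files[has_column])
}

# Demo on made-up files written to a temporary folder
demo_dir <- file.path(tempdir(), "header_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_demo <- function(df, name) {
  write.table(df, file.path(demo_dir, name), sep = "\t",
              quote = FALSE, row.names = FALSE)
}
write_demo(data.frame(time = 1:2, speed = 3:4), "a.txt")
write_demo(data.frame(time = 1:2, force = 3:4), "b.txt")
write_demo(data.frame(time = 1:2, speed = 5:6), "c.TXT")

hits <- files_with_column(demo_dir, "force")
length(hits)    # how many files have the column
basename(hits)  # which files they are
```

Note that `read.delim` rewrites column names into syntactically valid R names by default, so the name passed in should be the one R reports via `names()`, not necessarily the raw text in the file.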
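Update:
I created a dummy folder with files that reflect the problem; see the link below to access the files on my Google Drive. In this folder I included 4 files that contain the problem columns.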
https://drive.google.com/drive/folders/1IDq7BwfQNkGb9y3RvwlLE3FeMQc38taD?usp=sharing
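The problem is that the code seems able to find the files that match the selection criterion, i.e. the actual name of the problem column, but I cannot extract the true index of such files in the list. Any pointers?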
library(data.table)
# read in an example file that contains the problem columns
df_var <- read.delim("ctrl_S3127064__3S_DMSO_00_none.TXT", header = TRUE, sep = "\t")
# read in a file that I want to use as the reference
df_standard <- read.delim("ctrl__S162465_20190111_T8__3S_2DG_3mM_none.TXT", header = TRUE, sep = "\t")
# get the column names of each file
standar.names <- names(df_standard)
var.names <- names(df_var)
same.titles <- var.names %in% standar.names
dff.titles <- !var.names %in% standar.names
# confirm that the only 3 problem columns are columns 129, 130 and 131
mismatched.names <- colnames(df_var[129:131])
# visually check the names of the problematic columns
mismatched.names
# get current working directory and list all files in this directory
wd <- getwd()
files_in_wd <- list.files(wd)
# create an empty list and read the first 2 rows of every file in wd
l_files <- list()
for (i in seq_along(files_in_wd)) {
  l_files[[i]] <- read.delim(file = files_in_wd[i],
                             sep = "\t",
                             header = TRUE,
                             nrows = 2)
}
# get column names of all files
column_names <- lapply(l_files, names)
# get the unique names of the mismatched columns
unique_names <- unique(mismatched.names)
unique_names[1]
# decide which files to move
# here "to_keep" is an integer vector that I don't understand:
# I thought the numbers would be the IDs/indices of the list elements,
# but I have fewer than 10 files and the numbers in to_keep are around 1000.
# This is probably because they are positions in the unlisted (flattened) vector of column names,
# but if I use to_keep <- which(column_names %in% unique_names[1]) it returns an empty vector
to_keep <- which(unlist(column_names) %in% unique_names[1])
# now if I subset the file list using to_keep, files_to_keep is NA NA NA
files_to_keep <- files_in_wd[to_keep]
# once I have the list of targeted files, I can move them into a new folder using file.move
library(filesstrings)
file.move(files_to_keep, "C:/Users/mli/Desktop/weeding/need to reanalysis")
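A likely explanation for the large indices, sketched below on made-up data: `unlist(column_names)` flattens every file's column names into one long vector, so `which()` reports positions within that vector rather than positions in the list of files. With roughly 130 columns per file, a match in the eighth or ninth file lands near position 1000, which is past the end of `files_in_wd` and therefore yields `NA`. Testing each list element separately gives one `TRUE`/`FALSE` per file instead. The file names, column names and the variables `target`, `has_target`, `to_move` and `files_to_move` are illustrative assumptions, not part of the original code.

```r
# Made-up stand-ins for files_in_wd and column_names from the code above
files_in_wd <- c("f1.TXT", "f2.TXT", "f3.TXT")
column_names <- list(
  c("time", "speed", "area"),
  c("time", "force", "area"),
  c("time", "speed", "area")
)
target <- "force"  # stands in for unique_names[1]

# Flattening loses the file boundaries: this is a position among all
# column names of all files, not the index of a file
flat_position <- which(unlist(column_names) %in% target)

# One logical value per file: does this file's header contain the target?
has_target <- vapply(column_names, function(nms) target %in% nms, logical(1))

# Now the indices line up with files_in_wd
to_move <- which(has_target)
files_to_move <- files_in_wd[to_move]
files_to_move
```

Under these assumptions, `files_to_move` holds real file names, so it could be passed to `file.move()` in place of `files_to_keep`.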