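I have a few thousand *.csv files (each with a unique name), but the header columns are the same in every file, e.g. "Timestamp", "System_Name", "CPU_ID", etc.
My question is: how can I replace the values in "System_Name" (a system name such as "as12535.org.at", or any other combination of characters) and anonymize them? I would be grateful for any hint or pointer in the right direction.
The structure of the CSV files is shown below: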
"Timestamp","System_Name","CPU_ID","User_CPU","User_Nice_CPU","System_CPU","Idle_CPU","Busy_CPU","Wait_IO_CPU","User_Sys_Pct"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
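I tried the anonymizer R package, which works well on a single vector, but I ran into trouble applying it to the thousands of CSV files I read into R. What I tried is the following: build a list that holds one data frame per CSV file.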
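# initialize a list
# note: setwd() returns the previous working directory, so r.path holds the old path
r.path <- setwd("mypath")
ldf <- list()
# creates the list of all the csv files in my directory - but filter for
# files with Unix in the filename for testing.
listcsv <- dir(pattern = ".UnixM.")
# seq_along() avoids looping over 1:0 when no file matches
for (i in seq_along(listcsv)) {
  ldf[[i]] <- read.csv(file = listcsv[i])
}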
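I am racking my brain because I cannot anonymize the System_Name column, or even replace some of its characters (for pseudo-anonymization), while looping over the list (ldf) and the data frames it contains.
My list ldf (holding one data frame per CSV file) looks like this: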
summary(ldf)
Length Class Mode
[1,] 5 data.frame list
[2,] 5 data.frame list
[3,] 5 data.frame list
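To make the in-memory part concrete, here is a minimal sketch of what I mean by replacing some characters in System_Name for every data frame in the list. The small ldf built here is only a stand-in for the real list, and mask_name() is a made-up helper (not part of the anonymizer package) that swaps every digit for "X"; any other masking rule could go in its place.

```r
# Hypothetical stand-in for ldf: two small data frames with the same columns.
ldf <- list(
  data.frame(Timestamp = "1161025010002000",
             System_Name = "as06240.org.xyz:LZ",
             CPU_ID = "-1", stringsAsFactors = FALSE),
  data.frame(Timestamp = "1161025010002000",
             System_Name = "as12535.org.at",
             CPU_ID = "-1", stringsAsFactors = FALSE)
)

# mask_name() is a made-up helper: it replaces every digit with "X".
mask_name <- function(x) gsub("[0-9]", "X", as.character(x))

# lapply() visits each data frame, rewrites the column and returns the
# modified copy, so the result is again a list of data frames.
ldf_masked <- lapply(ldf, function(df) {
  df$System_Name <- mask_name(df$System_Name)
  df
})

ldf_masked[[1]]$System_Name
```

Masking digits like this is only pseudo-anonymization: the domain part of the name is still readable.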
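How can I read in all the CSV files, change or anonymize all or part of the "System_Name" column, and do this in a loop in R for every CSV in my directory? It does not need to be super elegant, I am happy as long as it gets the job done :-)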
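The kind of loop I have in mind might look roughly like the sketch below, in base R only. Everything specific in it is an assumption for illustration: the in_dir and out_dir folders, the two demo files it creates so that it runs on its own, and the "host_0001"-style pseudonyms. The idea is to keep one lookup table across all files so that the same system name always maps to the same pseudonym.

```r
# Sketch only: in_dir, out_dir and the "host_" prefix are assumptions.
in_dir  <- file.path(tempdir(), "csv_in")
out_dir <- file.path(tempdir(), "csv_out")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

# Two tiny demo files so the sketch runs on its own.
demo <- data.frame(Timestamp = "1161025010002000",
                   System_Name = c("as06240.org.xyz:LZ", "as12535.org.at"),
                   CPU_ID = "-1", stringsAsFactors = FALSE)
write.csv(demo, file.path(in_dir, "a_UnixM_1.csv"), row.names = FALSE)
write.csv(demo[2, ], file.path(in_dir, "b_UnixM_2.csv"), row.names = FALSE)

files <- list.files(in_dir, pattern = "UnixM.*\\.csv$", full.names = TRUE)

# Lookup table: real name -> pseudonym, grown as new names turn up.
lookup <- character(0)

for (f in files) {
  # Read every column as text so long timestamps are not turned into numbers.
  df <- read.csv(f, stringsAsFactors = FALSE, colClasses = "character")

  # Assign the next free pseudonym to each name not seen in earlier files.
  new_names <- setdiff(unique(df$System_Name), names(lookup))
  if (length(new_names) > 0) {
    ids <- length(lookup) + seq_along(new_names)
    lookup[new_names] <- sprintf("host_%04d", ids)
  }

  df$System_Name <- unname(lookup[df$System_Name])

  # Write to a separate folder so the original files stay untouched.
  write.csv(df, file.path(out_dir, basename(f)), row.names = FALSE)
}
```

The lookup vector is the key back to the real names, so it would have to be stored separately from the anonymized files, or discarded if the mapping must not be reversible.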