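I have text files containing behavioral data from a task. However, the first 18 lines of each file are descriptive information (date, time, ID number, etc.), all in one block of text. The actual column names and data start on line 19. It is not an ideal format, but I have to keep it.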

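From reading about the readLines() and writeLines() functions, it seems I need to read the text file into R, reorganize the data, and then write it back out as a text file with the same block of text in the first 18 lines. I'm not sure how this would actually work: do I need to combine readLines() and read.delim() somehow, or will readLines() also read all the data below line 18, as read.delim(location, skip=18) does?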
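To make the difference between the two readers concrete, here is a minimal sketch on a small stand-in file. The three-line header, the column names, and the `n_header` variable are assumptions for illustration only; the real files have a longer header. It suggests that readLines() returns every line as plain text unless `n` is given, so the header can be captured with `n` while read.table() (or read.delim()) parses the rest with `skip`.

```r
# Build a small stand-in file: 3 header lines followed by 2 data rows.
# (Hypothetical content; the real header is longer.)
path <- tempfile(fileext = ".txt")
writeLines(c("# header line 1",
             "# header line 2",
             "# item ecode onset",
             "1 13 9.998",
             "2 4 10.999"), path)
n_header <- 3  # assumed header length for this toy file

# With no `n`, readLines() returns every line of the file as text,
# header and data alike.
all_lines <- readLines(path)

# With `n`, only the header block is kept, verbatim.
header <- readLines(path, n = n_header)

# read.table() skips the header and parses the remaining rows.
dat <- read.table(path, skip = n_header, header = FALSE,
                  col.names = c("item", "ecode", "onset"))
```

Here `header` is a character vector that can be written back unchanged later, while `dat` is an ordinary data frame that can be edited.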

For reference, here is an example of the text file I am working with:


 # Non-editable header begin --------------------------------------------------------------------------------

#  data format...............: continuous
#  setname...................: 200ICAready
#  filename..................: none_specified
#  filepath..................: none_specified
#  nchan.....................: 29
#  pnts......................: 666445
#  srate.....................: 500
#  nevents...................: 1792
#  generated by (bdf)........: 
#  generated by (set)........: 200ICAready
#  reported in ..............: 
#  prog Version..............: 7.0.0
#  creation date.............: 10-Sep-2021 16:21:24
#  user Account..............: 
# 
#  Non-editable header end --------------------------------------------------------------------------------




# item   bepoch   ecode             label         onset           diff       dura   b_flags    a_flags    enable        bin
#                                                 (sec)           (msec)     (msec)    (binary)   (binary)


1       0            13               ""          9.9980          0.00      0.0     00000000     00000000      1    [       ]
2       0             4               ""         10.9990       1001.00      0.0     00000000     00000000      1    [       ]
3       0            10               ""         11.1990        200.00      0.0     00000000     00000000      1    [       ]
4       0            14               ""         11.3990        200.00      0.0     00000000     00000000      1    [       ]
5       0            13               ""         12.7320       1333.00      0.0     00000000     00000000      1    [       ]
6       0             1               ""         13.7320       1000.00      0.0     00000000     00000000      1    [       ]
7       0             7               ""         13.9320        200.00      0.0     00000000     00000000      1    [       ]

The result should look like this:


 # Non-editable header begin --------------------------------------------------------------------------------

#  data format...............: continuous
#  setname...................: 200ICAready
#  filename..................: none_specified
#  filepath..................: none_specified
#  nchan.....................: 29
#  pnts......................: 666445
#  srate.....................: 500
#  nevents...................: 1792
#  generated by (bdf)........: 
#  generated by (set)........: 200ICAready
#  reported in ..............: 
#  prog Version..............: 7.0.0
#  creation date.............: 10-Sep-2021 16:21:24
#  user Account..............: 
# 
#  Non-editable header end --------------------------------------------------------------------------------




# item   bepoch   ecode             label         onset           diff       dura   b_flags    a_flags    enable        bin
#                                                 (sec)           (msec)     (msec)    (binary)   (binary)


1       0            13               ""          9.9980          0.00      0.0     00000000     00000000      1    [       ]
2       0             4               ""         10.9990       1001.00      0.0     00000000     00000000      1    [       ]
3       0            10               ""         11.1990        200.00      0.0     00000000     00000000      1    [       ]
4       0            15               ""         11.2500       200.00       0.0     00000000     00000000      1    [       ]
5       0            14               ""         11.3990        200.00      0.0     00000000     00000000      1    [       ]
6       0            13               ""         12.7320       1333.00      0.0     00000000     00000000      1    [       ]
7       0             1               ""         13.7320       1000.00      0.0     00000000     00000000      1    [       ]
8       0             19              ""         13.9320        200.00      0.0     00000000     00000000      1    [       ]

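So I need R to temporarily store the non-editable header section while the data are processed, and then write out a text file that includes the header.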

Edit: I have read the header and the data separately, and am now trying to find a way to combine them correctly. Neither c(header, datafile) nor merge(header, datafile) worked.
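One likely reason those attempts fail is that the header is a character vector and the data is a data frame: c() on the two produces a list, and merge() is meant for joining data frames on key columns. A possible alternative is to skip combining them in memory and instead write them to the file one after the other. The sketch below assumes a toy header and data frame (both hypothetical) and a tab separator; the real files would need their own column spacing.

```r
# Hypothetical stand-ins for the header lines and the processed data.
path <- tempfile(fileext = ".txt")
header <- c("# Non-editable header begin",
            "# nchan: 29",
            "# Non-editable header end")
dat <- data.frame(item = 1:2, ecode = c(13L, 4L), onset = c(9.998, 10.999))

# Step 1: write the header block verbatim (this creates/overwrites the file).
writeLines(header, path)

# Step 2: append the data rows below it, without quotes or row/column names.
write.table(dat, path, append = TRUE, quote = FALSE,
            row.names = FALSE, col.names = FALSE, sep = "\t")

# Read the file back as text to check the layout.
result <- readLines(path)
```

The header lines land first and unchanged, followed by one line per data row.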

1 Answer

Take a look at my code. It should be very fast.

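library(tidyverse)
library(data.table)
library(fs)

# Read only the data rows, skipping the header block and column-name lines.
# The last field is read as two columns (bin, bin2), which are then
# pasted back together into a single bin column.
dataRead = function(file) fread(
  file = file, skip=26,
  col.names = c("item","bepoch","ecode","label","onset","diff",
                "dura","b_flags","a_flags","enable","bin","bin2"),
  colClasses = c("integer", "integer", "integer", "character",
                 "double", "double", "double", "character",
                 "character", "integer", "character", "character")) %>%
  as_tibble() %>%
  mutate(bin = str_c(bin, "    ", bin2)) %>% select(-bin2)

# Target width of each output column, in the same order as the columns.
width = c(1, 5, 9, 10, 11, 9, 6, 11, 11, 5, 8)
files = dir_ls("txtfiles", regexp = "\\.txt$")
if(length(files)>0){
  for(i in 1:length(files)){
    # Read the header as a single text column (no "|" is expected in it).
    header = fread(file = files[i], nrows=24, sep = "|", header=FALSE)
    df = dataRead(files[i])
    # Placeholder edit: replace this with the real data changes.
    df = df %>% mutate(bin = "[xxxx]")
    # Pad every column to its target width to keep the layout aligned.
    df = df %>% mutate(across(everything(),
                              ~str_pad(.x, width[which(names(df)==cur_column())])))
    # Overwrite the file with the header, then append the padded data rows.
    fwrite(header, files[i], append = FALSE, quote = FALSE, col.names = FALSE)
    fwrite(df, files[i], append = TRUE, col.names = FALSE, sep = " ", quote = FALSE)
  }
}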

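This program processes every txt file in the txtfiles folder. It reads the header and the data, mutates the data as a tibble, and then writes both back to the same text file.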
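The line counts passed to skip and nrows depend on the exact header length, which may differ between files. As a hedged alternative in base R, the start of the data could be located by pattern instead. This sketch assumes that data rows begin with the integer item number while header and column-name lines begin with "#" or are blank, as in the sample; `find_data_start` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: index of the first line that looks like a data row.
find_data_start <- function(lines) {
  # Assumption: data rows start with a digit (after optional whitespace);
  # header and column-name lines start with "#" or are blank.
  which(grepl("^\\s*[0-9]", lines))[1]
}

# Toy file contents standing in for readLines(path).
lines <- c("# Non-editable header begin",
           "",
           "# item ecode onset",
           "",
           "1 13 9.998",
           "2 4 10.999")

start <- find_data_start(lines)

# Everything above the first data row is kept verbatim as the header.
header <- lines[seq_len(start - 1)]

# The remaining lines are parsed into a data frame.
dat <- read.table(text = lines[start:length(lines)],
                  col.names = c("item", "ecode", "onset"))
```

With this approach the same code can handle files whose header block has a different number of lines, as long as the stated assumption about the first character of each row holds.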

answered 2021-09-22T22:06:11.470