regex - Regex and file processing

Question

This question relates to R but really isn't language specific per se. I have a bunch of csv files with this general format "sitename_03082015.csv". The files have 5 columns and various rows

Host    MaximumIn   MaximumOut  AverageIn   AverageOut
device1 30.63 Kbps  0 bps       24.60 Kbps  0 bps
device2 1.13 Mbps   24.89 Kbps  21.76 Kbps  461 bps
device5 698.44 Kbps 37.71 Kbps  17.49 Kbps  3.37 Kbps

I ultimately want to read in all the files and merge which I can do but during the merge I want to read the site name and date and add it to each related line so the output looks like this

Host      MaximumIn     MaximumOut  AverageIn   AverageOut  Site Name   Date
device1   30.63 Kbps    0 bps       24.60 Kbps  0 bps       SiteA       3/7/15
device12  1.13 Mbps     24.89 Kbps  21.76 Kbps  461 bps     SiteA       3/8/15
device1   698.44 Kbps   37.71 Kbps  17.49 Kbps  3.37 Kbps   SiteB       3/7/15
device2   39.08 Kbps    1.14 Mbps   10.88 Kbps  27.06 Kbps  SiteB       3/8/15
device3   123.43 Kbps   176.86 Kbps 8.62 Kbps   3.78 Kbps   SiteB       3/9/15

With my R code I can do the following:

#Get list of file names
filenames<- list.files(pattern = ".csv$")

#This extracts everything up to the underscore to get site name
siteName <- str_extract(string = filenames, "[^_]*")

# Extract date from file names use
date <- str_extract(string = filenames, "\\d{8}" )

With the below R code I can merge all the files but that will be without the added columns of site name and date that I want.

myDF<-do.call("rbind", lapply(filenames, read.table, header=TRUE, sep=","))

I just can't get my head around how to do the extracts for site and date adding and populating the columns to create my ideal dataframe which is the second table above.

The solution that best worked for me was posted below :)

score 2 · Accepted Answer

我立即想到的方法是cbind在阅读带有附加信息的信息时做，然后再做rbind。与此类似的东西：

 myDF<-do.call("rbind", 
          lapply(filenames, 
                 function(x) cbind(read.table(x, header=TRUE, sep=","), 
                                              "Site Name" = str_extract(string = x, "[^_]*"),
                                              "Date" = as.Date(str_extract(string = x, "\\d{8}"), "%m%d%Y"))))

score 1 · Accepted Answer

我做了类似的事情，可以在这里应用。您可以添加更多用逗号分隔的文件名。也可以类似地提取站点。如果您需要更多帮助，请告诉我。

    ##Assuming your csv files are saved in location C:/"
    library(stringr)

    ##List all filenames
    fileNames <- c("hist_03082015.csv","hist_03092015.csv")

    ##Create a empty dataframe to save all output to
    final_df <- NULL

    for (i in fileNames) {
    ##Read CSV
    df <- read.csv(paste("C:/",i,sep=""),header = TRUE,
                 sep = ",",colClasses='character')
    ##Extract date from filename into a column
    df$Date <- gsub("\\D","",i)
    ##Convert string to date
    df$Date <-as.Date(paste(str_sub(df$Date, 1, 2),
              str_sub(df$Date,  3,-5),
              str_sub(df$Date, 5,-1),sep="-"),"%d-%m-%Y")
    ##save all data into 1 dataframe 
    final_df <- rbind(final_df,df)
    print(summary(final_df))
    }

regex - Regex and file processing

2 回答 2

Related

Reference