This question relates to R but really isn't language specific per se. I have a bunch of csv files with this general format "sitename_03082015.csv". The files have 5 columns and various rows
Host MaximumIn MaximumOut AverageIn AverageOut
device1 30.63 Kbps 0 bps 24.60 Kbps 0 bps
device2 1.13 Mbps 24.89 Kbps 21.76 Kbps 461 bps
device5 698.44 Kbps 37.71 Kbps 17.49 Kbps 3.37 Kbps
I ultimately want to read in all the files and merge which I can do but during the merge I want to read the site name and date and add it to each related line so the output looks like this
Host MaximumIn MaximumOut AverageIn AverageOut Site Name Date
device1 30.63 Kbps 0 bps 24.60 Kbps 0 bps SiteA 3/7/15
device12 1.13 Mbps 24.89 Kbps 21.76 Kbps 461 bps SiteA 3/8/15
device1 698.44 Kbps 37.71 Kbps 17.49 Kbps 3.37 Kbps SiteB 3/7/15
device2 39.08 Kbps 1.14 Mbps 10.88 Kbps 27.06 Kbps SiteB 3/8/15
device3 123.43 Kbps 176.86 Kbps 8.62 Kbps 3.78 Kbps SiteB 3/9/15
With my R code I can do the following:
#Get list of file names
filenames<- list.files(pattern = ".csv$")
#This extracts everything up to the underscore to get site name
siteName <- str_extract(string = filenames, "[^_]*")
# Extract date from file names use
date <- str_extract(string = filenames, "\\d{8}" )
With the below R code I can merge all the files but that will be without the added columns of site name and date that I want.
myDF<-do.call("rbind", lapply(filenames, read.table, header=TRUE, sep=","))
I just can't get my head around how to do the extracts for site and date adding and populating the columns to create my ideal dataframe which is the second table above.
The solution that best worked for me was posted below :)