Say you have a CSV file. Each row of the file contains numbers, vectors, and dates. The elements of each vector are separated by semicolons, with a leading semicolon. For example, a vector y in this CSV file looks like ";1;2;4;7;2". The vectors have different lengths. I couldn't read this file using
read.table()
or
read.csv()
even after trying approaches similar to those described in "How to read a .csv file containing apostrophes into R?". Below is a simplified version of what three lines in the CSV file might look like:
1,6,;2;3.1;45;31.2;3,2,;1;1;1;1;1;5,10/22/1938 1:25
2,5,;1;22;12;1.4;66,7,;2;3;4;5;6;7;8;6;9,11/25/1938 1:25
3,1,;1;2;3;4;5;6;7;8;9,3.2,;1;2;3;4;5;6;7;9;10;11,11/25/1958 1:25
and here it is with spaces after the commas, to make it a bit more readable:
1, 6, ;2;3.1;45;31.2;3, 2, ;1;1;1;1;1;5, 10/22/1938 1:25
2, 5, ;1;22;12;1.4;66, 7, ;2;3;4;5;6;7;8;6;9, 11/25/1938 1:25
3, 1, ;1;2;3;4;5;6;7;8;9, 3.2, ;1;2;3;4;5;6;7;9;10;11, 11/25/1958 1:25
Each line has the same number of commas; the only major difference between lines is that the vectors can have different lengths. Note that some fields may be blank. I think it makes most sense for the output to be a list of lists. I was thinking of writing my own function that would look something like this (I'm not very proficient with lists yet, so my terminology may be way off):
data <- empty list of a list
while (we haven't reached the end of the file){ #don't know the function to do this
temp = get first line of file #don't know the function to do this
if temp is not empty{ #don't know the function to do this
indices = which(temp==',')
indices.col = which(temp==';')
put temp[1:(indices[1]-1)] in the (counter,1) location of data;
put temp[(indices[1]+1):(indices[2]-1)] in the (counter,2) location of data;
store the vector and deal with the semicolons somehow in (counter,3) location of data;
}
}
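For what it's worth, the loop above could probably be sketched in base R along the following lines. This is only a rough sketch under my own assumptions: the three sample lines stand in for the real file, `parse_vec` is a made-up helper name, and the field names (`id`, `x`, `v1`, `x2`, `v2`, `date`) are invented for illustration.

```r
# The three sample lines from above, used in place of a real file.
# With an actual file this would be something like
#   lines <- readLines("myfile.csv")   # hypothetical file name
lines <- c(
  "1,6,;2;3.1;45;31.2;3,2,;1;1;1;1;1;5,10/22/1938 1:25",
  "2,5,;1;22;12;1.4;66,7,;2;3;4;5;6;7;8;6;9,11/25/1938 1:25",
  "3,1,;1;2;3;4;5;6;7;8;9,3.2,;1;2;3;4;5;6;7;9;10;11,11/25/1958 1:25"
)

# parse_vec is a hypothetical helper: ";1;2;4" -> c(1, 2, 4)
parse_vec <- function(field) {
  parts <- strsplit(field, ";", fixed = TRUE)[[1]]
  # drop the empty piece produced by the leading ';'
  as.numeric(parts[nzchar(parts)])
}

# Skip empty lines, then split each remaining line on commas.
lines <- lines[nzchar(lines)]
fields <- strsplit(lines, ",", fixed = TRUE)

# Build a list of lists, one inner list per row of the file.
data <- lapply(fields, function(f) {
  list(
    id   = as.numeric(f[1]),
    x    = as.numeric(f[2]),
    v1   = parse_vec(f[3]),
    x2   = as.numeric(f[4]),
    v2   = parse_vec(f[5]),
    date = f[6]
  )
})
```

With this layout, `data[[2]]$v1` would be the first vector of the second row, and `sum(data[[2]]$v1)` its sum.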
Would there be an easier way to do this, maybe using read.table in a way that I missed? I'm not set on using a list of lists. I basically want to do some regression analysis of the form y = mx + b, where x is one of the numerical entries and y is the scalar output of a function applied to one of the vector entries (e.g. sum(vector) = a * first entry of row + b), so perhaps keep that in mind. Also note that the file could be generated with some character other than a semicolon separating the vector elements.
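To make the end goal concrete, here is a rough sketch of what I imagine the read.csv route plus the regression might look like, assuming the semicolon-separated vectors can simply be read as plain strings and split afterwards. The column names and the `vec_sum` helper are my own inventions, the sample text stands in for the real file, and three rows are obviously far too few for a meaningful fit.

```r
# Sample rows from above; with a real file, pass the file name instead of text=.
csv_text <- "1,6,;2;3.1;45;31.2;3,2,;1;1;1;1;1;5,10/22/1938 1:25
2,5,;1;22;12;1.4;66,7,;2;3;4;5;6;7;8;6;9,11/25/1938 1:25
3,1,;1;2;3;4;5;6;7;8;9,3.2,;1;2;3;4;5;6;7;9;10;11,11/25/1958 1:25"

# Column names are hypothetical; the vector columns come in as character strings.
df <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE,
               col.names = c("id", "x", "v1", "x2", "v2", "date"))

# vec_sum is a hypothetical helper: the sum of each ';'-separated vector.
# If the file used a different separator, only `sep` would change.
vec_sum <- function(col, sep = ";") {
  vapply(strsplit(col, sep, fixed = TRUE),
         function(p) sum(as.numeric(p[nzchar(p)])),
         numeric(1))
}

df$y <- vec_sum(df$v1)                        # scalar summary of the first vector
df$when <- as.POSIXct(df$date, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Regression of the form y = m*x + b, with x the second column.
fit <- lm(y ~ x, data = df)
coef(fit)
```

Any other scalar function of the vector (mean, max, length, and so on) could replace `sum` inside `vec_sum`.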