I want to run an ARIMA model, using PyFlux, over thousands of CSV files (and various parameter combinations). Here is some Python code:
import pandas as pd
import pyflux as pf

# filenames holds the paths of thousands of CSV files
for index, csvfile in enumerate(filenames):
    data = pd.read_csv(csvfile)
    model = pf.ARIMA(data=data, ar=4, ma=4, integ=0, target='sunspot.year')
    x = model.fit("MLE")
    # append this file's fit summary to the tuple collected for it
    list_of_results[index] = list_of_tuples[index] + (x.summary(),)
I can also load these CSVs into BigQuery. Since each file (or each BigQuery result set) can be run through the ARIMA model independently, I want to parallelize this step so that the fits run concurrently and I save significant time.
Is there a way to achieve this with Google Cloud Dataflow?