I am trying to merge two large matrices by row.names in R with merge, but it is taking quite some time. Is there a way to parallelize merge, perhaps with the foreach library? Or are there faster solutions that do the job?
I have 8 cores and 24 GB of RAM. Both matrices are about 1.4 GB each and consist of ~900 rows and ~22000 columns.
Here is the code to reproduce a small example of my data set:
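# Two small data frames that share some, but not all, row names
df1 <- data.frame(x = 1:3, y = 1:3, row.names = c('r1', 'r2', 'r3'))
df2 <- data.frame(z = 5:7, row.names = c('r1', 'r3', 'r7'))
# Full outer join on row names (keeps r1, r2, r3 and r7)
dfMerged <- merge(df1, df2, by = "row.names", all = TRUE)
# Replace the NAs introduced by non-matching rows with 0
dfMerged[is.na(dfMerged)] <- 0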
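
For comparison, one direction that might avoid merge altogether is to keep the data as numeric matrices, preallocate a zero-filled result, and copy each input into place by row name. This is only a minimal sketch on the same toy values, assuming row names are unique within each matrix and that the two inputs share no column names; m1, m2, allRows and merged are names made up for illustration, and I have not timed it on the full-size data.

```r
# Same toy values as above, stored as plain numeric matrices (illustrative names).
m1 <- matrix(c(1, 2, 3, 1, 2, 3), nrow = 3,
             dimnames = list(c("r1", "r2", "r3"), c("x", "y")))
m2 <- matrix(c(5, 6, 7), nrow = 3,
             dimnames = list(c("r1", "r3", "r7"), "z"))

# Union of row names, which plays the role of all = TRUE in merge().
allRows <- union(rownames(m1), rownames(m2))

# Preallocate the result with zeros, so no separate NA replacement is needed.
# Assumption: column names of m1 and m2 do not overlap.
merged <- matrix(0, nrow = length(allRows), ncol = ncol(m1) + ncol(m2),
                 dimnames = list(allRows, c(colnames(m1), colnames(m2))))

# Copy each input block into place, indexing by row and column name.
merged[rownames(m1), colnames(m1)] <- m1
merged[rownames(m2), colnames(m2)] <- m2
```

Unlike dfMerged, the result here keeps the row names as dimnames instead of a Row.names column, and the rows come out in order of first appearance rather than sorted.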