2

I am working with database tables with dbplyr

I have a local table and want to join it with a large (150m rows) table on the database

The database PRODUCTION is read only

# Set up the connection and point to the table

library(odbc); library(dbplyr)    

my_conn_string <- paste("Driver={Teradata};DBCName=teradata2690;DATABASE=PRODUCTION;UID=",
                            t2690_username,";PWD=",t2690_password, sep="")

t2690 <- dbConnect(odbc::odbc(), .connection_string=my_conn_string)

order_line <- tbl(t2690, "order_line") #150m rows

I also have a local table, let's call it orders

# fill df with random data

orders <- data.frame(matrix(rexp(50), nrow = 100000, ncol = 5))

names(orders) <- c("customer_id", paste0(rep("variable_", 4), 1:4))

let's say I wanted to join these two tables, I get the following error:

complete_orders <- orders %>% left_join(order_line)

> Error: `x` and `y` must share the same src, set `copy` = TRUE (may be slow)

The issue is, if I were to set copy = TRUE, it would try to download the whole of order_line and my computer would quickly run out of memory

Another option could be to upload the orders table to the database. The issue here is that the PRODUCTION database is read only - I would have to upload to a different database. Trying to copy across databases in dbplyr results in the same error.

The only solution I have found is to upload into the writable database and use sql to join them, which is far from ideal

4

2 回答 2

4

I have found the answer, you can use the in_schema() function within the tbl pointer to work across schemas within the same connection

# Connect without specifying a database
my_conn_string <- paste("Driver={Teradata};DBCName=teradata2690;UID=",
                            t2690_username,";PWD=",t2690_password, sep="")    

# Upload the local table to the TEMP db then point to it
orders <- tbl(t2690, in_schema("TEMP", "orders"))

order_line <-  tbl(t2690, in_schema("PRODUCTION", "order_line"))

complete_orders <- orders %>% left_join(order_line)
于 2018-05-22T08:19:33.230 回答
0

Another option could be to upload the orders table to the database. The issue here is that the PRODUCTION database is read only - I would have to upload to a different database. Trying to copy across databases in dbplyr results in the same error.

In your use case, it seems (based on the accepted answer) that your databases are on the same server and it's just a matter of using in_schema. If this were not the case, another approach would be that given here, which in effect gives a version of copy_to that works on a read-only connection.

于 2021-06-17T14:20:55.690 回答