I have developed an online survey that stores my data in a Microsoft SQL 2005 database. I have written a set of outlier checks on my data in R. The general workflow for these scripts is:
- Read data from SQL database with sqlQuery()
- Perform outlier analysis
- Write offending respondents back to database in separate table using sqlSave()
The table I am writing back to has the structure:
CREATE TABLE outliers2(
modelid int
, password varchar(50)
, reason varchar(50),
Constraint PK_outliers2 PRIMARY KEY(modelid, reason)
)
GO
As you can see, I've set the primary key to be modelid and reason. The same respondent may be an outlier for multiple checks, but I do not want to insert the same modelid and reason combo for any respondent.
Since we are still collecting data, I would like to be able to update these scripts on a daily / weekly basis as I develop the models I am estimating on the data. Here is the general form of the sqlSave()
command I'm using:
sqlSave(db, db.insert, "outliers2", append = TRUE, fast = FALSE, rownames = FALSE)
where db
is a valid ODBC Connection and db.insert
has the form
> head(db.insert)
modelid password reason
1 873 abkd WRONG DIRECTION
2 875 ab9d WRONG DIRECTION
3 890 akdw WRONG DIRECTION
4 905 pqjd WRONG DIRECTION
5 941 ymne WRONG DIRECTION
6 944 okyt WRONG DIRECTION
sqlSave()
chokes when it tries to insert a row that violates the primary key constraint and does not continue with the other records for the insert. I would have thought that setting fast = FALSE
would have alleviated this problem, but it doesn't.
Any ideas on how to get around this problem? I could always drop
the table at the beginning of the first script, but that seems pretty heavy handed and will undoubtedly lead to problems down the road.