sql - Converting an RQDA file to an SQL file

Question

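I am using RQDA, an R package that I run in RStudio, to code text manually. The resulting rqda file is an SQL (SQLite) database. I coded statements in the text with different codes and grouped the codes into code categories (for example, the code category "actor_party" with the related codes "socialist", "liberal", "conservative", and so on). I have finished coding and want to use the result for a social network analysis. To do that, I want to create an SQL database in which every code category has its own column, containing the specific code in each row. Each coding can be identified by the following attributes: catid (the code category number), fid (the file ID), and selfirst (the starting position of the coding). In this way, a specific catid, fid, and selfirst is selected for each coded statement so that SQLite can identify each coding as unique (in addition, as you can see in the R script below,
I am using RStudio version 0.99.879, RQDA version 0.2-7, and RSQLite 1.0.0.

So, using the following R code: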

library(RSQLite) # load Package RSQLite
setwd("C:/...")

system("ls *.rqda", show=TRUE)
sqlite <- dbDriver("SQLite")
# specifying the file
qdadb <- dbConnect(sqlite,"My_data.rqda")


dbListTables(qdadb)
dbListFields(qdadb, "coding") # that's where the codings are stored


catid <- dbGetQuery(qdadb, "select distinct(catid) from treecode where status = 1 ORDER BY catid")
i <- 1
table <- dbGetQuery(qdadb, "select fid, selfirst from coding where status = 1 GROUP BY fid, selfirst")
while(i <= max(catid)) {
   ids <- dbGetQuery(qdadb, paste("select cid from treecode where (catid = ",i," and status = 1)", sep=""));
   t <- dbGetQuery(qdadb, paste("select cid, fid, selfirst from coding where (cid in (", paste(as.character(ids$cid), sep="' '", collapse=","), ") and status = 1)", sep=""));
   table <- merge(table, t, by = c("fid","selfirst"), all.x = T);
   i <- i + 1;
   }
# warnings are created because of the same columns which are duplicated by the merging

colnames(table) <- c("fid", "selfirst", dbGetQuery(qdadb, "select name from codecat where status = 1")[,1]) # each coding is identified by its f(ile)id and selfirst (the starting position of the coding)

# see below for an example of such a created table

library(car) # Companion to Applied Regression package

# years - catid = 1
table$A00_time_frame <- recode(table$A00_time_frame, '1 = 2010; 2 = 2011; 3 = 2012; 4 = 2013; 5 = 2014; 6 = 2015')

# Sources - catid = 2
ids <- dbGetQuery(qdadb, "select cid from treecode where (catid = 2 and status = 1)")[,1]
values <- dbGetQuery(qdadb, paste("select name from freecode where (id in(", paste(ids, collapse = ","), ") and status = 1)"))[,1]
table$B00_source <- recode(table$B00_source, paste0("'", paste(ids,"'='", values, collapse = "';'", sep=""),"'", sep=""))

# Claimant type - catid = 3
ids <- dbGetQuery(qdadb, "select cid from treecode where (catid = 3 and status = 1)")[,1]
values <- dbGetQuery(qdadb, paste("select name from freecode where (id in(", paste(ids, collapse = ","), ") and status = 1)"))[,1]
table$C00_claimant_type <- recode(table$C00_claimant_type, paste0("'", 
paste(ids,"'='", values, collapse = "';'", sep=""),"'", sep=""))

and so on, up to catid = 20

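This works and looks like this: example_table [the table continues to row 844; only fid is in ascending order]

Even though this runs and the resulting table matches the total number of codings, some errors occur. Some codes are not linked to the correct statement (they are linked to the correct code category, but not to the correct coded statement).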
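One thing that might be worth checking (a hypothesis, not a confirmed diagnosis): the recode steps in the script above fetch `ids` from treecode and `values` from freecode in two separate queries and then pair them by position. SQL does not guarantee any row order without ORDER BY, so the two vectors may not line up. The sketch below uses invented toy data frames in place of the real tables to show the effect, and a keyed lookup with `match()` as one possible alternative.

```r
# Toy stand-ins for two RQDA tables; all values are invented for illustration.
treecode <- data.frame(cid = c(12, 10, 11), catid = c(2, 2, 2))
freecode <- data.frame(id = c(10, 11, 12),
                       name = c("source_a", "source_b", "source_c"),
                       stringsAsFactors = FALSE)

# Two separate lookups, as in the script above: each comes back in its own row order.
ids    <- treecode$cid[treecode$catid == 2]     # 12, 10, 11
values <- freecode$name[freecode$id %in% ids]   # names for ids 10, 11, 12

# Pairing by position attaches the wrong name whenever the two orders differ.
by_position <- setNames(values, ids)

# Pairing by key looks up each id explicitly, so row order no longer matters.
by_key <- setNames(freecode$name[match(ids, freecode$id)], ids)

print(by_position)
print(by_key)
```

In this toy case the positional pairing labels id 12 as "source_a", while the keyed lookup labels it "source_c". Whether the real database shows the same mismatch would have to be checked against the actual treecode and freecode rows.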

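I am still a beginner with R(Studio) and cannot explain what is going wrong.

Does anyone know what the problem or error might be here, and how to solve it?

On request, I am happy to share my file :)

Any suggestions or help are very welcome!!

Edit: Here is a link to a subset of my data so that you can reproduce the problem (the file is in rqda format because I think the conversion itself may be the problem).
In addition, here are two examples of where to look.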

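After creating "table" in R, the following rows can be identified:

1. fid 95, selfirst 4553, then the coding "Welt", then "E02_European_Commission" + "G10_Cameroon".
   However, if you check the original coded rqda file, the code "Cameroon" is not in this file; it is in fid 70, selfirst 5082, year "2010", in "Welt".

2. fid 90, selfirst 959, and year "2011" show the code "CDU", and the last row, "special claimant", shows the name "Martin Schulz".
   However, if you check the codings in the original rqda file, the code "Martin Schulz" has no codings attached to it in the subset.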

I hope these two examples illustrate the problem and give you an idea of where to look.

Sorry for not providing this at the start!

score 1 · Accepted Answer

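Maybe simplify the code first, to get a better view of what might be going wrong? Personally, I would rely more on SQL than on R to pull all the information together: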

t <- dbGetQuery(qdadb, "SELECT codecat.name, coding.cid, coding.fid, coding.selfirst 
       FROM treecode, coding, codecat 
       WHERE treecode.cid = coding.cid 
       AND treecode.catid = codecat.catid
       AND treecode.status = 1
       AND coding.status = 1")
head(reshape(t, idvar = c("fid", "selfirst"), timevar = "name", direction = "wide"))

I am not sure whether this is what you are looking for, or whether it works any better, but it seems like simpler code to evaluate.
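If code names are wanted in the cells instead of numeric cids, the query above could be extended with a join on freecode, so that each name is matched to its id inside SQL rather than by position in R. This is a minimal sketch, assuming the table and column names used in the question (coding, treecode, codecat, freecode); the in-memory database and all of its rows are made up for illustration and do not come from the asker's file.

```r
library(DBI)
library(RSQLite)

# In-memory toy database; the rows below are invented for illustration only.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "codecat", data.frame(catid = 1:2,
  name = c("A00_time_frame", "B00_source"), status = 1, stringsAsFactors = FALSE))
dbWriteTable(con, "freecode", data.frame(id = 1:4,
  name = c("2010", "2011", "source_a", "source_b"), status = 1, stringsAsFactors = FALSE))
dbWriteTable(con, "treecode", data.frame(cid = 1:4, catid = c(1, 1, 2, 2), status = 1))
dbWriteTable(con, "coding", data.frame(cid = c(1, 3, 2, 4), fid = c(1, 1, 2, 2),
  selfirst = c(100, 100, 250, 250), status = 1))

# One row per coding, with category name and code name resolved by key in SQL.
long <- dbGetQuery(con, "
  SELECT codecat.name AS category, freecode.name AS code,
         coding.fid, coding.selfirst
  FROM coding
  JOIN treecode ON treecode.cid = coding.cid
  JOIN codecat  ON codecat.catid = treecode.catid
  JOIN freecode ON freecode.id = coding.cid
  WHERE coding.status = 1 AND treecode.status = 1
    AND codecat.status = 1 AND freecode.status = 1
  ORDER BY coding.fid, coding.selfirst, codecat.catid")

# One column per category, one row per (fid, selfirst) pair.
wide <- reshape(long, idvar = c("fid", "selfirst"),
                timevar = "category", direction = "wide")
print(wide)

dbDisconnect(con)
```

The wide table gets one `code.<category>` column per category. If a statement carries two codes from the same category, `reshape()` keeps only the first and issues a warning, so such cases would need separate handling.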
