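I collected some tweets with the twitteR package and then exported them to a neo4j database, following various tutorials by Nicole White. I extracted the tweets into a data frame called kdf and did some basic cleaning with the stringr functions that Nicole demonstrates. Then I sent the data from R to neo4j. The essential part of my code is: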
library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/", username="xxxx", password="xxxx")
clear(graph)
addConstraint(graph, "Tweet", "id")
addConstraint(graph, "User", "username")
addConstraint(graph, "Hashtag", "hashtag")
addConstraint(graph, "Tags", "ent_tag")
query = "
CREATE (tweet:Tweet {id: {tweetID}})
SET tweet.text = {text}
CREATE (user:User {name: {Username}})
CREATE (user)-[:TWEETED]->(tweet)
FOREACH(reply_to_sn IN CASE {reply_to_sn} WHEN NULL then [] else [{reply_to_sn}] END |
MERGE (replytouser:User {username:{reply_to_sn}})
CREATE (tweet)-[:IN_REPLY_TO]->(replytouser)
)
FOREACH(retweet_sn IN CASE {retweet_sn} WHEN NULL THEN [] ELSE [{retweet_sn}] END |
MERGE(retweet_user:User {username: {retweet_sn}})
CREATE (tweet)-[:RETWEET_OF]->(retweet_user)
)
FOREACH(hastag_nodes IN CASE {hashtag_nodes} WHEN NULL then [] else [{hashtag_nodes}] END |
MERGE (h:Hashtag {hashtag :{hashtag_nodes}})
CREATE (tweet)-[:HASHTAG]->(h)
)
FOREACH(mentioned_users IN CASE {mentioned_users} WHEN NULL then [] else [{mentioned_users}] END |
MERGE (m:User {username :{mentioned_users}})
CREATE (tweet)-[:MENTIONED]->(m)
)
"
tx = newTransaction(graph)
for (i in 1:nrow(kdf)) {
  row = kdf[i, ]
  appendCypher(tx, query,
               tweetID = row$id,
               text = row$text,
               Username = row$screenName,
               reply_to_sn = row$replyToSN,
               retweet_sn = getRetweetSN(row$text),
               hashtag_nodes = getHashtags(row$text),
               mentioned_users = getMentions(row$text))
}
commit(tx)
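After that, I used Watson's Alchemy API to extract named entities from all of the text. The result is stored in ent_tbl, which contains three variables: tweetid, etext, and etype. Now I am trying to export this data to the same neo4j database as well, joining it to the tweets by id. Here is the other part of the code: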
query="
MATCH(t:ent_tag {id : $twid, type :$etype, text :$etext})
MATCH(tw:tweet {tweetID : $twid })
CREATE (tw)-[:HAS_ENT]->(t)
"
tx = newTransaction(graph)
for (i in 1:nrow(ent_tbl)) {
  row = ent_tbl[i, ]
  appendCypher(tx, query,
               twid = row2$tweetid,
               etype = row2$etype,
               etext = row2$etext)
}
commit(tx)
Although I do not get any error when I commit this, summary(graph) does not show the relationship I expected to see between the tag (t) and the tweet (tw).
> summary(graph)
   This          To    That
1  User     TWEETED   Tweet
2 Tweet  RETWEET_OF    User
3 Tweet     HASHTAG Hashtag
4 Tweet   MENTIONED    User
5 Tweet IN_REPLY_TO    User
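One check that may help narrow this down is to compare the node labels the second query tries to MATCH against the labels the first query actually creates. Labels in neo4j are case-sensitive, and a MATCH that finds nothing simply yields no rows, so the CREATE after it never runs and no error is raised. The sketch below is only an illustration in base R: `node_labels` is a hypothetical helper (not part of RNeo4j), the regular expression is a rough approximation of Cypher node patterns, and `import_query` is a shortened copy of the import query above.

```r
# Hypothetical helper: pull node labels such as "Tweet" out of patterns
# like (tweet:Tweet {...}). Relationship types ([:TWEETED]) start with "["
# and are therefore not picked up by this pattern.
node_labels <- function(cypher) {
  hits <- regmatches(
    cypher,
    gregexpr("\\(\\s*\\w*\\s*:\\s*\\w+", cypher, perl = TRUE)
  )[[1]]
  unique(sub("^.*:\\s*", "", hits))
}

# Shortened copy of the import query (only the node-creating patterns).
import_query <- "
CREATE (tweet:Tweet {id: {tweetID}})
CREATE (user:User {name: {Username}})
MERGE (h:Hashtag {hashtag: {hashtag_nodes}})
"

# The MATCH clauses of the entity query, as posted.
entity_query <- "
MATCH(t:ent_tag {id : $twid, type :$etype, text :$etext})
MATCH(tw:tweet {tweetID : $twid })
"

created <- node_labels(import_query)
wanted  <- node_labels(entity_query)

# Labels the entity query looks for that the import never created.
never_created <- setdiff(wanted, created)
print(never_created)
```

Here both `ent_tag` and `tweet` come back as never created: the import uses the label `Tweet` (capital T) and stores the tweet id in the property `id` rather than `tweetID`, and no query shown ever creates an `ent_tag` node. If that is indeed the cause, a revised query would presumably MATCH `(tw:Tweet {id: ...})` and MERGE the entity node instead of MATCHing it, but that is an assumption I have not verified against this database.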
Why is this happening? Here is my db.schema in neo4j: