mysql - Check whether a row exists in a table before adding it with SQL

Question

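I am building a Twitter crawler application in Python using the Tweepy and MySQLdb modules.

It will fetch millions of tweets, so performance is a concern. Before adding a tweet, I want to check whether its tweet_id already exists in the table, and do both in the same query.

The table schema is: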

  *id* |   tweet_id             |     text
  _____|________________________|______________________________
    1  |   259327533444925056   |     sample tweet1
  _____|________________________|______________________________
    2  |   259327566714923333   |     this is a sample tweet2

This is the code I tried, but it runs two queries:

# check that the tweet doesn't exist first
q = "SELECT COUNT(*) FROM tweets WHERE tweet_id = %s"
cur.execute(q, (tweet.id,))  # let the driver quote the value
result = cur.fetchone()
found = result[0]
if found == 0:
    q = "INSERT INTO lexicon_nwindow (tweet_id, text) VALUES (%s, %s)"
    cur.execute(q, (tweet.id, tweet.text))

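Would making tweet_id unique and simply inserting the tweets raise an exception on duplicates and be inefficient?

So what is the best-performing way to achieve this with a single query?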

score 1 · Accepted Answer

If you make tweet_id the primary key (dropping the id field), you can use INSERT IGNORE or REPLACE INTO. One solution for two problems.

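If you want to keep the id field, set it as an index/unique and make it auto-increment. I would avoid this approach if I knew that tweet_id could be used as the primary key.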
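As a rough sketch of the INSERT IGNORE route, assuming `cur` is an open MySQLdb cursor and a hypothetical `tweets` table whose primary key is `tweet_id` (the table name, column types, and the helper name `insert_tweet_ignore` are illustrative, not taken from the question):

```python
# Hypothetical schema: tweet_id is the primary key, so duplicates are
# rejected by the server instead of being checked for in Python.
CREATE_TWEETS_SQL = """
CREATE TABLE IF NOT EXISTS tweets (
    tweet_id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    text     VARCHAR(255)    NOT NULL
)
"""

INSERT_IGNORE_SQL = "INSERT IGNORE INTO tweets (tweet_id, text) VALUES (%s, %s)"


def insert_tweet_ignore(cur, tweet_id, text):
    """Insert one tweet in a single statement.

    Returns True if a row was added, False if the tweet_id already existed.
    `cur` is assumed to be a DB-API cursor such as one from MySQLdb.
    """
    # Parameters are passed separately so the driver handles quoting.
    cur.execute(INSERT_IGNORE_SQL, (tweet_id, text))
    # For INSERT IGNORE, the affected-row count is 0 when the row was skipped.
    return cur.rowcount == 1
```

Note that IGNORE turns other errors for that statement into warnings as well, not only duplicate-key errors, and that REPLACE INTO deletes the existing row and inserts a new one rather than leaving it untouched.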

Hope this helps.

Harry

score 0 · Accepted Answer

The answer is: profile, don't speculate.

I don't mean to be dismissive. We don't know what will be fastest:

SELECT + (in code) conditional INSERT
REPLACE INTO
INSERT IGNORE
INSERT ... SELECT ... WHERE NOT EXISTS (...)
INSERT and (in code) ignore error

We don't know the rate of data, the frequency of duplicates, the server configuration, whether there are multiple writers simultaneously, etc.

Profile, don't speculate.
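One way to act on this is a small timing harness. The sketch below is only illustrative: `time_strategy` and `compare_strategies` are hypothetical helper names, and each strategy is assumed to be a callable `insert_one(tweet_id, text)` that wraps one of the approaches listed above and returns True when a row was actually inserted.

```python
import time


def time_strategy(insert_one, rows):
    """Run insert_one(tweet_id, text) over rows.

    Returns (elapsed_seconds, inserted_count).
    """
    inserted = 0
    start = time.perf_counter()
    for tweet_id, text in rows:
        if insert_one(tweet_id, text):
            inserted += 1
    return time.perf_counter() - start, inserted


def compare_strategies(strategies, rows):
    """Time every strategy in the dict {name: insert_one} on the same rows."""
    rows = list(rows)  # materialise once so each strategy sees identical input
    return {name: time_strategy(fn, rows) for name, fn in strategies.items()}
```

For the numbers to be comparable, each strategy should start from the same table state (for example a freshly truncated table), and the sample rows should contain roughly the share of duplicates expected from the real tweet stream.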

score 0 · Accepted Answer

# check that the tweet doesn't exist first
q = "SELECT COUNT(*) FROM tweets WHERE tweet_id = %s"
cur.execute(q, (tweet.id,))  # let the driver quote the value
result = cur.fetchone()
found = result[0]
if found == 0:
    q = "REPLACE INTO lexicon_nwindow (tweet_id, text) VALUES (%s, %s)"
    cur.execute(q, (tweet.id, tweet.text))

score 0 · Accepted Answer

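Use INSERT ... SELECT instead of INSERT ... VALUES, and add a WHERE clause to your SELECT that checks whether your tweet.id is already in the table:

# single statement: the row is inserted only if the tweet_id is not present yet
q = """INSERT INTO lexicon_nwindow (tweet_id, text)
SELECT %s, %s FROM DUAL
WHERE NOT EXISTS (SELECT 1 FROM tweets WHERE tweet_id = %s)"""
cur.execute(q, (tweet.id, tweet.text, tweet.id))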
