mysql - MySQL performs very poorly when a stored procedure iterates over a large 15M-row table

Question

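I have a stored procedure that opens a CURSOR on a select statement that iterates over a 15M-row table (the table is a simple import of a large CSV).

I need to normalize that data by inserting various pieces of each row into 3 different tables (capturing the auto-increment IDs, using them in foreign key constraints, and so on).

So I wrote a simple stored procedure that opens a CURSOR, FETCHes the fields into variables, and executes 3 insert statements.

I'm on a small DB server with a default MySQL installation (1 CPU, 1.7 GB RAM), and I had hoped this task would take a few hours. I'm at 24+ hours, and top shows about 85% of CPU time in I/O wait (wa).

I think I have some kind of terrible inefficiency. Any ideas on improving the efficiency of the task? Or just on determining where the bottleneck is?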

root@devapp1:/mnt/david_tmp# vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1    256  13992  36888 1466584    0    0     9    61    1    1  0  0 98  1
 1  2    256  15216  35800 1466312    0    0    57  7282  416  847  2  1 12 85
 0  1    256  14720  35984 1466768    0    0    42  6154  387  811  2  1 10 87
 0  1    256  13736  36160 1467344    0    0    51  6979  439  934  2  1  9 89

DROP PROCEDURE IF EXISTS InsertItemData;

DELIMITER $$
CREATE PROCEDURE InsertItemData() BEGIN 
    DECLARE spd TEXT;
    DECLARE lpd TEXT;
    DECLARE pid INT;
    DECLARE iurl TEXT;

    DECLARE last_id INT UNSIGNED;
    DECLARE done INT DEFAULT FALSE;

    DECLARE raw CURSOR FOR select t.shortProductDescription, t.longProductDescription, t.productID, t.productImageURL 
                           from frugg.temp_input t;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
    OPEN raw;

    read_loop: LOOP
        FETCH raw INTO spd, lpd, pid, iurl;

        IF done THEN
            LEAVE read_loop;
        END IF;

        INSERT INTO item (short_description, long_description) VALUES (spd, lpd);
        SET last_id = LAST_INSERT_ID();
        INSERT INTO item_catalog_map (catalog_id, catalog_unique_item_id, item_id) VALUES (1, CAST(pid AS CHAR), last_id);
        INSERT INTO item_images (item_id, original_url) VALUES (last_id, iurl);
    END LOOP;

    CLOSE raw;
END$$
DELIMITER ;

score 1 · Accepted Answer

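MySQL will almost always perform better executing straight SQL statements than looping inside a stored procedure.

That said, if you are using InnoDB tables, your procedure will run faster inside a START TRANSACTION / COMMIT block.

Even better would be to add an AUTO_INCREMENT column to the records in frugg.temp_input and query against that table: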
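As a minimal sketch of that suggestion, assuming the three target tables are InnoDB and the session runs with autocommit enabled (the MySQL default), the existing procedure can be called inside one explicit transaction. With autocommit on, each INSERT in the loop is committed on its own, which would be consistent with the heavy I/O wait in the vmstat output above.

```sql
-- Assumption: item, item_catalog_map and item_images are InnoDB tables.
-- Run the whole procedure as a single transaction instead of
-- committing after every individual INSERT.
START TRANSACTION;

CALL InsertItemData();

COMMIT;
```

A single transaction covering roughly 45M inserts also builds up a large amount of undo data, so committing in batches may turn out to be preferable on a server this small.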

DROP TABLE IF EXISTS temp_input2;

CREATE TABLE temp_input2 (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    shortProductDescription TEXT, 
    longProductDescription TEXT,
    productID INT,
    productImageURL TEXT,
    PRIMARY KEY (id)
);

START TRANSACTION;

INSERT INTO 
    temp_input2
SELECT
    NULL AS id,
    shortProductDescription, 
    longProductDescription,
    productID,
    productImageURL
FROM
    frugg.temp_input;

INSERT 
    INTO item 
(
    id, 
    short_description, 
    long_description
) 
SELECT 
    id,
    shortProductDescription AS short_description, 
    longProductDescription AS long_description
FROM
    temp_input2
ORDER BY
    id;

INSERT INTO 
    item_catalog_map
(
    catalog_id, 
    catalog_unique_item_id, 
    item_id
)
SELECT 
    1 AS catalog_id,
    CAST(productID AS CHAR) AS catalog_unique_item_id,
    id AS item_id
FROM
    temp_input2
ORDER BY
    id;

INSERT INTO 
    item_images 
(
    item_id, 
    original_url
) 
SELECT 
    id AS item_id,
    productImageURL AS original_url
FROM
    temp_input2
ORDER BY
    id;

COMMIT;

Better still would be to add an AUTO_INCREMENT field to frugg.temp_input before loading the .CSV file into it, which saves the extra step of creating and loading temp_input2 shown above.
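One way this could look, as a sketch only: the staging table is created with the AUTO_INCREMENT key up front, and the CSV is loaded with an explicit column list so that MySQL generates id for each row. The file path, the delimiters, the header line, and the column order in the file are all assumptions here, since the question does not show how the CSV was imported.

```sql
-- Staging table with the surrogate key already in place.
-- Column types are copied from temp_input2 above.
CREATE TABLE frugg.temp_input (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    shortProductDescription TEXT,
    longProductDescription TEXT,
    productID INT,
    productImageURL TEXT,
    PRIMARY KEY (id)
);

-- Hypothetical path and CSV format; adjust to the real file.
-- id is left out of the column list, so it is auto-generated.
LOAD DATA INFILE '/path/to/products.csv'
INTO TABLE frugg.temp_input
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(shortProductDescription, longProductDescription, productID, productImageURL);
```

With id present in frugg.temp_input, the three INSERT ... SELECT statements above can read from frugg.temp_input directly instead of temp_input2.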

score 1 · Accepted Answer

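My idea is similar to what Ross offered, but without knowing your tables, your indexes, or what the "auto-increment" column names are, I would just insert directly. However, I don't see any checking for duplicates, in case you encounter any. I would insert something like the following and have appropriate indexes to help the re-join (based on the short and long product descriptions).

I would just try inserting from a select, and then inserting from that select joined back to the new rows, for example: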

INSERT INTO item 
      ( short_description, 
        long_description ) 
   SELECT
        t.ShortProductDescription,
        t.LongProductDescription
     from
        frugg.temp_input t;

done, 15 million inserted... into items table... Now, add to the catalog map table...

INSERT INTO item_catalog_map
      ( catalog_id,
        catalog_unique_item_id,
        item_id )
   SELECT
         1 as Catalog_id,
         CAST( t.productID as CHAR) as catalog_unique_item_id,
         item.AutoIncrementIDColumn as item_id
      from
         frugg.temp_input t
            JOIN item on t.ShortProductDescription = item.short_description
                     AND t.LongProductDescription = item.long_description;

done, all catalog map entries with corresponding "Item ID" inserted...

INSERT INTO item_images
      ( item_id,
        original_url )
   SELECT
         item.AutoIncrementIDColumn as item_id,
         t.productImageURL as original_url
      from
         frugg.temp_input t
            JOIN item on t.ShortProductDescription = item.short_description
                     AND t.LongProductDescription = item.long_description;

done with the image URLs.
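The re-join on the two description columns relies on the index and the duplicate check mentioned at the start of this answer, neither of which is shown. A rough sketch of both follows, assuming the description columns in item are TEXT (so the index needs prefix lengths). The index name and the prefix length of 100 are illustrative choices, not values from the question.

```sql
-- Prefix index to support the join back to item.
-- TEXT columns cannot be indexed without a prefix length.
CREATE INDEX idx_item_descriptions
    ON item (short_description(100), long_description(100));

-- Look for description pairs that occur more than once in the source.
-- Any pair returned here would match several item rows in the joins
-- above and produce extra rows in item_catalog_map and item_images.
SELECT
    t.shortProductDescription,
    t.longProductDescription,
    COUNT(*) AS occurrences
FROM
    frugg.temp_input t
GROUP BY
    t.shortProductDescription,
    t.longProductDescription
HAVING
    COUNT(*) > 1
LIMIT 10;
```

If the second query returns rows, the description pair does not identify an item uniquely, and joining on a generated id, as in the other answer, avoids the problem.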
