
Background:

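We have large flat files totaling roughly 60GB and are inserting them into a database. During the inserts we are seeing incremental performance degradation.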

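  • We have 174 million records and expect to insert another 50 million
  • We split the main table into 1000+ tables by the first two characters of the entity name, e.g. entity_aa, entity_ab ... entity_zz
  • During each insert, three queries run: (a) a range-based search against another table, (b) a check whether the record has already been inserted, (c) an insert into the details (entity_briefs) table
  • We added entity_briefs to handle frequent search queries, but found that, as rows are inserted, it gets progressively slower whether or not we ALTER TABLE entity (or entity_briefs) DISABLE (or ENABLE) KEYS
  • The machine has 4 CPUs, Gigs of disk space, and 2GB RAM. The OS is Linux CentOS (5.4) 32-bit
  • We found that not all 4 CPUs are being utilized
  • We ran 4 import scripts at once, but overall performance is still not satisfactory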

The table in question:

CREATE TABLE `entity_briefs` (
`entity_brief_id` bigint(11) NOT NULL auto_increment,
`entity_id` bigint(11) default NULL,
`entity_table_prefix` char(2) default NULL,
`string_1` varchar(255) default NULL,
`string_2` varchar(255) default NULL,
`zip` varchar(25) default NULL,
`phone` bigint(11) default NULL,
PRIMARY KEY  (`entity_brief_id`),
KEY `idx_entity_id` (`entity_id`),
KEY `idx_entity_table_prefix` (`entity_table_prefix`),
KEY `idx_zip` (`zip`),
KEY `idx_string_1` (`string_1`),
KEY `idx_string_2` (`string_2`),
KEY `idx_phone` (`phone`)
);

mysqltuner.pl output:

 >>  MySQLTuner 1.1.1 - Major Hayden <major@mhtx.net>
 >>  Bug reports, feature requests, and downloads at http://mysqltuner.com/
 >>  Run with '--help' for additional options and output filtering
Please enter your MySQL administrative login: xxxxx
Please enter your MySQL administrative password:xxxxx

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.0.85-community
[OK] Operating on 32-bit architecture with less than 2GB RAM

-------- Storage Engine Statistics -------------------------------------------
[--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 101M (Tables: 1344)
[!!] InnoDB is enabled but isn't being used
[!!] Total fragmented tables: 1

-------- Security Recommendations  -------------------------------------------
ERROR 1142 (42000) at line 1: SELECT command denied to user 'xxxx'@'localhost' for table 'user'
[OK] All database users have passwords assigned

-------- Performance Metrics -------------------------------------------------
[--] Up for: 5d 15h 53m 55s (2M q [4.395 qps], 9K conn, TX: 1B, RX: 425M)
[--] Reads / Writes: 51% / 49%
[--] Total buffers: 34.0M global + 2.7M per thread (500 max threads)
[OK] Maximum possible memory usage: 1.3G (67% of installed RAM)
[OK] Slow queries: 0% (9/2M)
[OK] Highest usage of available connections: 1% (5/500)
[!!] Key buffer size / total MyISAM indexes: 8.0M/105.3M
[!!] Key buffer hit rate: 94.1% (72M cached / 4M reads)
[!!] Query cache is disabled
[OK] Temporary tables created on disk: 7% (101 on disk / 1K total)
[!!] Thread cache is disabled
[!!] Table cache hit rate: 0% (64 open / 277K opened)
[OK] Open file limit used: 0% (127/18K)
[OK] Table locks acquired immediately: 99% (2M immediate / 2M locks)
[!!] Connections aborted: 38%

-------- Recommendations -----------------------------------------------------
General recommendations:
    Add skip-innodb to MySQL configuration to disable InnoDB
    Run OPTIMIZE TABLE to defragment tables for better performance
    Enable the slow query log to troubleshoot bad queries
    Set thread_cache_size to 4 as a starting value
    Increase table_cache gradually to avoid file descriptor limits
    Your applications are not closing MySQL connections properly
Variables to adjust:
    key_buffer_size (> 105.3M)
    query_cache_size (>= 8M)
    thread_cache_size (start at 4)
    table_cache (> 64)

Requirement: What optimization strategies can we use to speed up the inserts?


1 Answer


Some general suggestions, since I don't have a silver bullet for you:

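I don't think you can expect inserts not to slow down as the tables grow. Database insert time generally scales with database size; the trick is to make overall performance acceptable given that expectation.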

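If things are slowing down and the CPU isn't pegged, you are probably I/O bound on database access. If you find that is the case, you may want to try faster drives, RAID 0, a faster drive controller, and so on. You might even consider building the database on a solid-state drive and then copying it to a traditional hard drive once it is built. These should be much faster for the random-access behavior MySQL produces on the file system, although I understand you will "wear them out" over time. Still, you can get a terabyte of solid-state storage for under $10,000.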
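One rough way to check the "I/O bound" guess on Linux is to compare two samples of the aggregate `cpu` line in `/proc/stat` and see how much of the elapsed CPU time was spent in iowait. The sketch below is only an illustration; the function names are mine, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def iowait_fraction(before, after):
    """Share of elapsed CPU time spent waiting on I/O between two samples."""
    # Assumed field order: user, nice, system, idle, iowait, irq, softirq, steal.
    # Only the first eight fields are summed; later guest fields are ignored.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total == 0 or len(deltas) < 5:
        return 0.0
    return deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat (the aggregate 'cpu' line)."""
    with open(path) as f:
        return f.readline()
```

To use it, call `cpu_times(read_cpu_line())` once, wait a few seconds while the import is running, call it again, and pass both samples to `iowait_fraction`. A consistently high value while the CPUs sit mostly idle would support the I/O-bound theory; `iostat` or `vmstat` give the same information if they are installed.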

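Also take a good look at optimizing your insert process. Disabling indexes during the inserts, as you mention, will not stop the gradual slowdown but should speed up the overall process significantly. I gather from your description that you have some kind of insert script logic that does selects and inserts rather than a simple LOAD of a flat file. You are running three separate queries per insert, probably with several round trips of data between the client and the database. Look especially at the range select, and make sure that query by itself doesn't have bad performance characteristics with respect to table size.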
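To illustrate cutting down the round trips, here is a minimal sketch that groups rows into batches and builds one multi-row INSERT per batch instead of one statement per row. I don't know what your import script looks like, so the helper names and the batch size are made up; the `%s` placeholders assume a DB-API style MySQL driver.

```python
from itertools import islice


def batched(rows, size):
    """Yield lists of up to `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def multi_row_insert(table, columns, row_count):
    """Build one parameterized INSERT statement covering `row_count` rows."""
    # %s is the placeholder style of common MySQL DB-API drivers (assumption).
    one_row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    return "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([one_row] * row_count))


def insert_statements(table, columns, rows, batch_size=1000):
    """Yield (sql, flat_params) pairs, one pair per batch of rows."""
    for chunk in batched(rows, batch_size):
        # Flatten the rows so the values line up with the placeholders.
        params = [value for row in chunk for value in row]
        yield multi_row_insert(table, columns, len(chunk)), params
```

Each `(sql, params)` pair would then go to something like `cursor.execute(sql, params)`, so a thousand rows cost one round trip rather than a thousand. Keep the batch small enough that a single statement stays under the server's `max_allowed_packet`. If the per-row checks can be moved out of the script entirely, a plain `LOAD DATA INFILE` will likely be faster still.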

Another possibility may be throwing a lot more RAM at the problem and using it as a disk cache. If that "other table" that you are running those range selects on isn't being modified during your insertfest, perhaps you can get that in memory to cut down on drive seeking, if you determine that seek time is indeed the performance bound here.
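If that "other table" is small enough and really is static during the import, one way to get it in memory is to load it into the import script itself and answer the range lookups with a binary search. This is only a sketch: I'm assuming the table boils down to non-overlapping, inclusive `(start, end, value)` ranges, which your question doesn't actually say, and the class name is mine.

```python
from bisect import bisect_right


class RangeIndex:
    """In-memory lookup over non-overlapping, inclusive [start, end] ranges."""

    def __init__(self, ranges):
        # ranges: iterable of (start, end, value) tuples (assumed layout).
        self._ranges = sorted(ranges, key=lambda r: r[0])
        self._starts = [r[0] for r in self._ranges]

    def find(self, key):
        """Return the value whose range contains `key`, or None."""
        # Locate the last range whose start is <= key.
        i = bisect_right(self._starts, key) - 1
        if i < 0:
            return None
        start, end, value = self._ranges[i]
        return value if key <= end else None
```

You would build the index once from a single `SELECT` over the range table and then call `find()` per record, which replaces query (a) with a lookup that never touches the disk. With only 2GB of RAM on a 32-bit box, check that the table actually fits before going this route.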

answered 2010-02-05T03:45:32.887