
I have a CSV file from the US Census that looks like this:

"ZIP5","ZIP4","ZIP9","STATE CODE","STATE","COUNTY CODE","COUNTY NAME","CBSA CODE","CBSA  TITLE","CBSA LSAD","METRO DIVISION CODE","METRO DIVISION TITLE","METRO DIVISION LSAD","CSA   CODE","CSA TITLE","CSA LSAD"
"04841",,"04841","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan Statistical Area",,,,,,
"04843",,"04843","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan     Statistical Area",,,,,,
"04846",,"04846","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan Statistical Area",,,,,,
"04847",,"04847","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan Statistical Area",,,,,,
"04848",,"04848","23","ME","027","Waldo County",,,,,,,,,
"04849",,"04849","23","ME","027","Waldo County",,,,,,,,,
"04850",,"04850","23","ME","027","Waldo County",,,,,,,,,
"04851",,"04851","23","ME","013","Knox County","40500","Rockland, ME","Micropolitan Statistical Area",,,,,,
"04852",,"04852","23","ME","015","Lincoln County",,,,,,,,,

The file has over 2 million records. Most records do not have data in every field.

Here is the MySQL record layout I defined for the CSV file above:

+----------------------+------------------+------+-----+---------+----------------+
| Field                | Type             | Null | Key | Default | Extra          |
+----------------------+------------------+------+-----+---------+----------------+
| id                   | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| ZIP5                 | varchar(5)       | NO   |     | NULL    |                |
| ZIP4                 | varchar(5)       | NO   |     | NULL    |                |
| ZIP9                 | varchar(10)      | NO   |     | NULL    |                |
| STATE_CODE           | varchar(2)       | NO   |     | NULL    |                |
| STATE                | varchar(2)       | NO   |     | NULL    |                |
| COUNTY_CODE          | varchar(3)       | NO   |     | NULL    |                |
| COUNTY_NAME          | varchar(50)      | NO   |     | NULL    |                |
| CBSA_CODE            | varchar(5)       | NO   |     | NULL    |                |
| CBSA_TITLE           | varchar(50)      | NO   |     | NULL    |                |
| CBSA_LSAD            | varchar(50)      | NO   |     | NULL    |                |
| METRO_DIVISION_CODE  | varchar(5)       | NO   |     | NULL    |                |
| METRO_DIVISION_TITLE | varchar(50)      | NO   |     | NULL    |                |
| METRO_DIVISION_LSAD  | varchar(50)      | NO   |     | NULL    |                |
| CSA_CODE             | varchar(3)       | NO   |     | NULL    |                |
| CSA_TITLE            | varchar(50)      | NO   |     | NULL    |                |
| CSA_LSAD             | varchar(50)      | NO   |     | NULL    |                |
+----------------------+------------------+------+-----+---------+----------------+

(I just realized I should probably make ZIP5 the primary key?)

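I have read that if you have an empty field in a CSV file, you should change it to \N, but is there an easy way to do that? I could write a PHP program to do it, but with over 2 million records it would take a long time, and my server doesn't have much RAM.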
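Converting empty fields to \N does not need much RAM if the file is streamed one row at a time instead of being loaded into memory. Below is a minimal PHP sketch, assuming the input is the census CSV shown above; the function names and file paths are hypothetical. It writes each empty field as an unquoted \N (which LOAD DATA reads as NULL when the escape character is the default backslash) and re-quotes every other field. The header row is copied through, so the converted file would be loaded with ENCLOSED BY '"', LINES TERMINATED BY '\n' and IGNORE 1 LINES, and the NULLs only stick if the optional columns allow NULL.

```php
<?php
// Hypothetical helper: formats one parsed CSV row for LOAD DATA.
// Empty fields become an unquoted \N, which LOAD DATA reads as NULL
// (assuming the default FIELDS ESCAPED BY '\\').
function row_to_load_data_line(array $row): string
{
    $out = array();
    foreach ($row as $field) {
        if ($field === null || $field === '') {
            $out[] = '\N';
        } else {
            // Escape backslashes, double embedded quotes, then enclose in quotes.
            $field = str_replace(array('\\', '"'), array('\\\\', '""'), $field);
            $out[] = '"' . $field . '"';
        }
    }
    return implode(',', $out) . "\n";
}

// Streams the CSV one row at a time, so memory use stays flat no matter
// how many rows the file has. Returns the number of rows written.
function convert_empty_to_null(string $inPath, string $outPath): int
{
    $in = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    $rows = 0;
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($row === array(null)) {
            continue; // fgetcsv returns array(null) for a blank line
        }
        fwrite($out, row_to_load_data_line($row));
        $rows++;
    }
    fclose($in);
    fclose($out);
    return $rows;
}

// Example (placeholder file names):
// convert_empty_to_null('zipcodes.csv', 'zipcodes_nulls.csv');
```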

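What is the simplest way to import this CSV file into MySQL successfully? Is there some parameter on MySQL's LOAD command that handles this? As it works now, it complains about data truncation on ZIP5, and when I look in MySQL, the zip codes contain the quote characters and only the first 4 digits. Thanks!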


2 Answers


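First, I don't see a primary key on the table you posted above. There should always be a primary key. Usually we add a column named id with AUTO_INCREMENT. For zip codes and similar data, a composite key over 2-3 columns can also be convenient. As always, it depends.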

As for the import, you have a few options:

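  1. Run a script locally to generate SQL INSERT statements, then feed the data to the MySQL server through whatever interface you have available.

  2. Upload the CSV file to the server and import it with the command-line mysql client. MySQL has a built-in CSV importer, though I've never liked it ;)

  3. Run a script on the server that adds one row at a time. In PHP you can read the CSV line by line and insert each row (remember to adjust set_time_limit and memory_limit accordingly). One caveat for option 3: if you run it through a browser instead of from the command line, the browser will most likely time out. Don't worry, though: the script itself keeps running until it finishes (as long as PHP does not abort it when the client disconnects; see ignore_user_abort()).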
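Option 1 could be sketched in PHP roughly as follows. It assumes the CSV columns appear in the same order as the column list you pass in (the auto-increment id is left out), that the optional columns allow NULL, and that the server does not run with NO_BACKSLASH_ESCAPES, because addslashes() escapes quotes with a backslash. The function names are hypothetical. The file is streamed, so memory stays flat, and each batch becomes one multi-row INSERT, which is usually much faster than one statement per row.

```php
<?php
// Hypothetical helper: turns a batch of CSV rows into one multi-row
// INSERT statement. Empty fields become NULL.
// Assumes the server does not use NO_BACKSLASH_ESCAPES (see addslashes()).
function rows_to_insert_sql(string $table, array $columns, array $rows): string
{
    $cols = '`' . implode('`, `', $columns) . '`';
    $tuples = array();
    foreach ($rows as $row) {
        $vals = array();
        foreach ($row as $field) {
            $vals[] = ($field === null || $field === '')
                ? 'NULL'
                : "'" . addslashes($field) . "'";
        }
        $tuples[] = '(' . implode(', ', $vals) . ')';
    }
    return "INSERT INTO `{$table}` ({$cols}) VALUES\n" . implode(",\n", $tuples) . ";\n";
}

// Streams the CSV and writes one INSERT per $batchSize rows to a .sql file,
// which can then be fed to the server, e.g. `mysql your_db < zipcodes.sql`.
function csv_to_sql_file(string $csvPath, string $sqlPath, string $table, array $columns, int $batchSize = 1000): void
{
    $in = fopen($csvPath, 'r');
    $out = fopen($sqlPath, 'w');
    fgetcsv($in, 0, ',', '"', '\\'); // skip the header row
    $batch = array();
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($row === array(null)) {
            continue; // blank line
        }
        $batch[] = $row;
        if (count($batch) === $batchSize) {
            fwrite($out, rows_to_insert_sql($table, $columns, $batch));
            $batch = array();
        }
    }
    if ($batch) {
        fwrite($out, rows_to_insert_sql($table, $columns, $batch)); // last partial batch
    }
    fclose($in);
    fclose($out);
}
```

For the table in the question, $columns would be the 16 column names from ZIP5 through CSA_LSAD, in the CSV's order.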

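I think I have a CSV importer somewhere (written for huge CSV files, such as geotagging data). Let me know if you need it and I'll try to find it and post it here.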

Unfortunately, I couldn't find my CSV importer, but here is the first fgetcsv example from the PHP manual with a few modifications:

<?php
set_time_limit(3600); // 1 hour max script execution time. Adjust it according to your expectations.

// Placeholder connection details; replace them with your own.
$pdo = new PDO('mysql:host=localhost;dbname=your_db;charset=utf8', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

if (($handle = fopen("test.csv", "r")) !== FALSE) {
    // Map the CSV header to column names ("CBSA  TITLE" -> "CBSA_TITLE").
    // If your table's column names differ, adjust them here.
    $header = array_map(function ($name) {
        return preg_replace('/\s+/', '_', trim($name));
    }, fgetcsv($handle, 1000, ","));

    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $values = array();
        foreach ($header as $i => $column) {
            // Skip empty fields. Columns that are skipped must have a default
            // value (or allow NULL) in your MySQL schema (table).
            if (isset($data[$i]) && $data[$i] !== '') {
                $values[$column] = $data[$i];
            }
        }
        if (!$values) {
            continue; // blank line
        }

        $keys = "`" . implode("`, `", array_keys($values)) . "`";
        $placeholders = implode(", ", array_fill(0, count($values), "?"));
        // Bind the values instead of concatenating them, so quotes in the data are safe.
        $statement = $pdo->prepare("INSERT INTO `table_name` ({$keys}) VALUES ({$placeholders})");
        $statement->execute(array_values($values));
    }
    fclose($handle);
}

I hope this helps. I haven't tested it, but it looks fine ;)

answered 2012-10-13T11:33:21.857

Try the following LOAD command after changing the file path, and adjust the line terminator if needed.

LOAD DATA INFILE 'your_file.csv' IGNORE
INTO TABLE zipcodes
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(ZIP5, ZIP4, ZIP9, STATE_CODE, STATE, COUNTY_CODE, COUNTY_NAME, CBSA_CODE, 
CBSA_TITLE, CBSA_LSAD, METRO_DIVISION_CODE, METRO_DIVISION_TITLE, 
METRO_DIVISION_LSAD, CSA_CODE, CSA_TITLE, CSA_LSAD);
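On the question of a LOAD parameter for empty fields: LOAD DATA can read each field into a user variable and convert empty strings to NULL in a SET clause with NULLIF(). Writing that out by hand for 16 columns is tedious, so here is a small PHP sketch that generates the statement. The function name is hypothetical, the column names are assumed to come from trusted code (they are not escaped), and this only helps if those columns accept NULL; with the NOT NULL layout from the question, MySQL cannot store NULL there and will either substitute '' with a warning or reject the row, depending on the SQL mode.

```php
<?php
// Hypothetical helper: builds a LOAD DATA statement that reads every field
// into a user variable (@ZIP5, @ZIP4, ...) and stores NULLIF(@var, '') in
// the matching column, so empty fields become NULL instead of ''.
// Column names are interpolated as-is, so they must come from trusted code.
function build_load_data_sql(string $file, string $table, array $columns): string
{
    $vars = array();
    $sets = array();
    foreach ($columns as $col) {
        $vars[] = "@{$col}";
        $sets[] = "`{$col}` = NULLIF(@{$col}, '')";
    }
    return "LOAD DATA INFILE '" . addslashes($file) . "' IGNORE\n"
        . "INTO TABLE `{$table}`\n"
        . "FIELDS TERMINATED BY ',' ENCLOSED BY '\"'\n"
        . "LINES TERMINATED BY '\\r\\n'\n"
        . "IGNORE 1 LINES\n"
        . "(" . implode(', ', $vars) . ")\n"
        . "SET " . implode(', ', $sets);
}

// Example: build the statement and run it with $pdo->exec($sql),
// or print it and paste it into the mysql client.
// $sql = build_load_data_sql('your_file.csv', 'zipcodes', array('ZIP5', 'ZIP4', 'ZIP9'));
```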
answered 2012-10-13T14:28:55.570