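So, I'm trying to import a .csv file directly into MySQL with the LOAD DATA LOCAL INFILE command, but I'm running into fields that still carry leftover Excel formula markup, and I don't know how to get rid of it. In the sample below, the content of the first field is preceded by an =.

The table structure currently allows the first field to be a VARCHAR(100), but if possible I'd like to make it an INT. Here is a sample of the CSV content being uploaded.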

"MID","DBA Name","Partner ID","Partner Name","Sub Partner ID","Sub Partner Name","Active Months","Bonus Amount","Bonus Applied Date","Partner Percentage","Partner Share","Total Payment"
="0008788014065741","company2","7968","me","11839","Joe Blow","0","$50.00","","","","$350.64"
="0008788014065756","company2","7968","you","11839","Joe Blow","0","$50.00","","","","$294.60"

Here is the MySQL load command I'm using to import the data:

sql = """
    LOAD DATA LOCAL INFILE '%(upload)s' IGNORE INTO TABLE `%(table)s`
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' 
    LINES TERMINATED BY '\\r\\n' 
    IGNORE 1 LINES ;
      """ % {"upload": file, "table": report}
self.db.query( sql )

Is it possible to manipulate the import based on a regex or something similar? I don't know, I'm just grasping at straws here...

Thanks for your input!

1 Answer

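You can do this with LOAD DATA INFILE in two ways.

The first is to read the first field's value as is into a user variable and then strip the equal sign and double quotes from it in the SET clause. In addition, you will most likely want to apply other transformations while loading the data, for example:

  • Set actual NULLs when your fields are empty
  • Strip the dollar sign from currency values
  • You may have to convert date values (but your sample data has none, so there is nothing to infer the format from)

LOAD DATA LOCAL INFILE '/path/to/your/file.csv' 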
IGNORE INTO TABLE table_name
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' 
    LINES TERMINATED BY '\r\n'
    IGNORE 1 LINES 
(@MID, DBAName, PartnerID, PartnerName, SubPartnerID, SubPartnerName, ActiveMonths,
 @BonusAmount, @BonusAppliedDate, @PartnerPercentage, @PartnerShare, @TotalPayment)
SET MID = TRIM(BOTH '"' FROM SUBSTR(@MID, 2)), -- here we get rid of equal sign and double quotes
    BonusAmount  = TRIM(LEADING '$' FROM NULLIF(@BonusAmount, '')),
    BonusAppliedDate = NULLIF(@BonusAppliedDate, ''),
    PartnerPercentage = NULLIF(@PartnerPercentage, ''),
    PartnerShare = TRIM(LEADING '$' FROM NULLIF(@PartnerShare, '')),
    TotalPayment = TRIM(LEADING '$' FROM NULLIF(@TotalPayment, ''));
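
To make the SET expressions above easier to follow, here is a rough Python equivalent of what they do to each raw value. It is only an illustration of the transformations, not part of the load itself; the helper names `clean_mid` and `clean_money` are made up for this sketch, and it assumes amounts shaped like the ones in the sample (no thousands separators).

```python
from decimal import Decimal
from typing import Optional


def clean_mid(raw: str) -> str:
    """Mimic TRIM(BOTH '"' FROM SUBSTR(@MID, 2))."""
    # SUBSTR(@MID, 2) drops the first character (the '='),
    # then TRIM(BOTH '"' ...) strips the surrounding double quotes.
    return raw[1:].strip('"')


def clean_money(raw: str) -> Optional[Decimal]:
    """Mimic TRIM(LEADING '$' FROM NULLIF(@value, ''))."""
    # NULLIF(@value, '') turns an empty string into NULL (None here).
    if raw == "":
        return None
    # TRIM(LEADING '$' ...) removes dollar signs at the start only.
    return Decimal(raw.lstrip("$"))


print(clean_mid('="0008788014065741"'))  # 0008788014065741
print(clean_money("$350.64"))            # 350.64
print(clean_money(""))                   # None
```

The empty-string case matters because the target columns are numeric or date columns, where a real NULL is usually preferable to whatever an empty string would be coerced to.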

The second approach is to take advantage of the LINES STARTING BY clause:

LOAD DATA LOCAL INFILE '/path/to/your/file.csv' 
IGNORE INTO TABLE table_name
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' 
    LINES TERMINATED BY '\r\n' STARTING BY '='
    IGNORE 1 LINES 
(MID, DBAName, PartnerID, PartnerName, SubPartnerID, SubPartnerName, ActiveMonths,
 @BonusAmount, @BonusAppliedDate, @PartnerPercentage, @PartnerShare, @TotalPayment)
SET BonusAmount  = TRIM(LEADING '$' FROM NULLIF(@BonusAmount, '')), -- every @variable used here must appear in the list above
    BonusAppliedDate = NULLIF(@BonusAppliedDate, ''),
    PartnerPercentage = NULLIF(@PartnerPercentage, ''),
    PartnerShare = TRIM(LEADING '$' FROM NULLIF(@PartnerShare, '')),
    TotalPayment = TRIM(LEADING '$' FROM NULLIF(@TotalPayment, ''));
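
As a simplified model of what LINES STARTING BY '=' does to each line, the Python sketch below follows the documented behavior: everything up to and including the prefix is discarded, and a line that does not contain the prefix is skipped entirely. The function name `apply_starting_by` is hypothetical and exists only for this illustration.

```python
from typing import Optional


def apply_starting_by(line: str, prefix: str) -> Optional[str]:
    """Return the part of the line after the prefix, or None if the line is skipped."""
    index = line.find(prefix)
    if index == -1:
        # No prefix anywhere in the line: the whole line is skipped.
        return None
    # Everything up to and including the first prefix is discarded.
    return line[index + len(prefix):]


row = '="0008788014065741","company2","7968"'
print(apply_starting_by(row, "="))                 # "0008788014065741","company2","7968"
print(apply_starting_by('"MID","DBA Name"', "="))  # None
```

Once the = is gone, the first field is an ordinary quoted value, so OPTIONALLY ENCLOSED BY '"' handles it and MID needs no SET expression. One caveat of this approach: the prefix is matched anywhere in the line, so a row without a leading = that contains one inside a later field would be read from that point on.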

Now, if your target table schema looks like

CREATE TABLE table_name 
(
    MID BIGINT, 
    DBAName           VARCHAR(100),
    PartnerID         INT,
    PartnerName       VARCHAR(100),
    SubPartnerID      INT,
    SubPartnerName    VARCHAR(100),
    ActiveMonths      INT,
    BonusAmount       DECIMAL(19, 2),
    BonusAppliedDate  DATE,
    PartnerPercentage DECIMAL(3, 2),
    PartnerShare      DECIMAL(19, 2),
    TotalPayment      DECIMAL(19, 2)
);

then after loading with either method we get the following in the table:

mysql> SELECT * FROM table_name;
+---------------+----------+-----------+-------------+--------------+----------------+--------------+-------------+------------------+-------------------+--------------+--------------+
| MID           | DBAName  | PartnerID | PartnerName | SubPartnerID | SubPartnerName | ActiveMonths | BonusAmount | BonusAppliedDate | PartnerPercentage | PartnerShare | TotalPayment |
+---------------+----------+-----------+-------------+--------------+----------------+--------------+-------------+------------------+-------------------+--------------+--------------+
| 8788014065741 | company2 |      7968 | me          |        11839 | Joe Blow       |            0 |       50.00 | NULL             |              NULL |         NULL |       350.64 |
| 8788014065756 | company2 |      7968 | you         |        11839 | Joe Blow       |            0 |       50.00 | NULL             |              NULL |         NULL |       294.60 |
+---------------+----------+-----------+-------------+--------------+----------------+--------------+-------------+------------------+-------------------+--------------+--------------+
2 rows in set (0.00 sec)
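
On the question of storing MID as an INT: the sample values are larger than the maximum of MySQL's signed INT (2147483647), which is why a BIGINT column is used in the schema here, and any numeric type drops the leading zeros. A small Python check of both points, using one MID value from the question's sample:

```python
INT_MAX = 2**31 - 1       # upper bound of a signed MySQL INT
BIGINT_MAX = 2**63 - 1    # upper bound of a signed MySQL BIGINT

mid_text = "0008788014065741"
mid_number = int(mid_text)  # leading zeros are lost in the conversion

print(mid_number)                # 8788014065741
print(mid_number <= INT_MAX)     # False
print(mid_number <= BIGINT_MAX)  # True
# The original text can be rebuilt only if the width is known to be fixed.
print(str(mid_number).zfill(len(mid_text)) == mid_text)  # True
```

If the leading zeros are significant for your application, keeping MID in a VARCHAR column may be the safer choice.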
Answered 2013-09-27T18:56:27.540