
I'm using Google Cloud's import API to load a CSV file from Google Cloud Storage into a Cloud SQL MySQL 5.7 database:

https://cloud.google.com/sql/docs/mysql/admin-api/v1beta4/instances/import

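The import API issues a LOAD DATA INFILE command to the MySQL database that uses the 'utf8' character set. However, the data in my CSV file is encoded as 'utf8mb4', which is a superset of 'utf8'. The load fails whenever a string cannot be encoded as 'utf8':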

Exception: CloudSQL Exception: {'kind': 'sql#operation', 'selfLink': 'https://www.googleapis.com/sql/v1beta4/projects/***', 'targetProject': '***', 'targetId': '***', 'targetLink': 'https://www.googleapis.com/sql/v1beta4/projects/***', 'name': '0211c99e-0633-42f1-9ee1-069473308273', 'operationType': 'IMPORT', 'status': 'RUNNING', 'user': '***', 'insertTime': '2019-01-14T02:36:39.861Z', 'startTime': '2019-01-14T02:36:39.972Z', 'error': {'kind': 'sql#operationErrors', 'errors': [{'kind': 'sql#operationError', 'code': 'ERROR_RDBMS', 'message': "Import CSV error: Error 1300: Invalid utf8 character string: ''Afikanisitani|'Apekanikana|A Phu Han (Afghanistan)|A Phú Hãn '\n"}]}, 'importContext': {'kind': 'sql#importContext', 'uri': '***', 'database': '**', 'importUser': '', 'csvImportOptions': {'table': '***'}}}
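To find which rows trigger this error, a small scan can help. MySQL's 'utf8' stores at most 3 bytes per character, so any character above U+FFFF (4 bytes in UTF-8, such as emoji) only fits in 'utf8mb4'. The sketch below assumes the CSV can be read as UTF-8 from a local path; the function name find_non_bmp_chars is hypothetical. It does not report byte sequences that are not valid UTF-8 at all; reading such a file this way raises UnicodeDecodeError instead.

```python
def find_non_bmp_chars(path, encoding="utf-8"):
    """Return (line_number, column, character) for every character that
    needs 4 bytes in UTF-8, i.e. lies outside the Basic Multilingual Plane.

    MySQL's 3-byte 'utf8' cannot store these characters; 'utf8mb4' can.
    """
    hits = []
    # newline="" keeps the file's own line endings untouched while iterating.
    with open(path, encoding=encoding, newline="") as f:
        for line_no, line in enumerate(f, start=1):
            for col, ch in enumerate(line, start=1):
                # Code points above U+FFFF are exactly the 4-byte UTF-8 ones.
                if ord(ch) > 0xFFFF:
                    hits.append((line_no, col, ch))
    return hits
```

For example, find_non_bmp_chars('myCSVFile.csv') returns a list of (line, column, character) tuples; an empty list suggests the 4-byte explanation does not apply to that file.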

Related post: 'message': "Import CSV error: Error 1300: Invalid utf8 character string:"

Is there a way to specify the 'utf8mb4' character set (or any other character set) with the import API?

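I tried adding a character-set entry with the value 'utf8mb4' to the 'csvImportOptions' dictionary, but the import API seems to accept only the 'table' and 'columns' keys in that dictionary.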

Note that if I run the LOAD DATA INFILE command directly from a MySQL client, the CSV imports without any problem:

LOAD DATA INFILE 'myCSVFile.csv'
INTO TABLE my_table
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

Here is the Python code I use to call the import API:

from pprint import pprint

from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()

service = discovery.build('sqladmin', 'v1beta4', credentials=credentials)

# Project ID of the project that contains the instance.
project = 'my-project'  # TODO: Update placeholder value.

# Cloud SQL instance ID. This does not include the project ID.
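instance = 'my-instance'  # TODO: Update placeholder value.

# Hypothetical placeholders: GCS URI of the CSV, target database, target table.
gcs_uri = 'gs://my-bucket/myCSVFile.csv'  # TODO: Update placeholder value.
database = 'my_database'  # TODO: Update placeholder value.
table = 'my_table'  # TODO: Update placeholder value.

instances_import_request_body = {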
    "importContext": {
        "kind": "sql#importContext",
        "fileType": "CSV",
        "uri": gcs_uri,
        "database": database,
        "csvImportOptions": {
            "table": table
        }
    }
}

request = service.instances().import_(project=project, instance=instance, body=instances_import_request_body)
response = request.execute()  # returns an operation resource, not the final result
pprint(response)
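The import call returns an operation resource, and the error quoted in the question was already attached while 'status' was still 'RUNNING'. A minimal polling sketch, assuming the same discovery-built service object, could check for errors before checking for completion. operations().get is part of the sqladmin v1beta4 API; wait_for_operation and ImportFailed are hypothetical names.

```python
import time


class ImportFailed(Exception):
    """Raised when a Cloud SQL operation reports errors."""


def wait_for_operation(service, project, operation_name,
                       poll_seconds=5.0, timeout_seconds=600.0):
    """Poll a Cloud SQL Admin API operation until its status is DONE.

    Returns the final operation resource. Raises ImportFailed as soon as the
    operation carries errors (even while RUNNING), or TimeoutError if it has
    not finished within timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        op = service.operations().get(project=project,
                                      operation=operation_name).execute()
        # Errors can appear before status flips to DONE, so check them first.
        errors = op.get("error", {}).get("errors", [])
        if errors:
            raise ImportFailed("; ".join(e.get("message", "") for e in errors))
        if op.get("status") == "DONE":
            return op
        if time.monotonic() > deadline:
            raise TimeoutError(
                f"operation {operation_name} still {op.get('status')}")
        time.sleep(poll_seconds)
```

Usage, continuing the script above: final_op = wait_for_operation(service, project, response['name']).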

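Additional data point: I'm fairly sure the LOAD DATA INFILE query generated by the Google API defaults to the 'utf8' character set.

This fails with the same error message as the API: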

LOAD DATA INFILE 'problematic.csv'
INTO TABLE my_table
CHARACTER SET utf8
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '\"'
ESCAPED BY '\"';

ERROR 1300 (HY000): Invalid utf8 character string: ''Afikanisitani|'Apekanikana|A Phu Han (Afghanistan)|A Phú Hãn '

This works:

LOAD DATA INFILE 'problematic.csv'
INTO TABLE my_table
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '\"'
ESCAPED BY '\"';

Query OK, 75641 rows affected (1.14 sec)
Records: 75641  Deleted: 0  Skipped: 0  Warnings: 0

The documentation here is incorrect, because it states that the generated command uses utf8mb4: https://cloud.google.com/sql/docs/mysql/import-export/importing

  LOAD DATA INFILE ... CHARACTER SET 'utf8mb4'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '\"'.


The solution is to set the following flag on your Google Cloud SQL instance:

character-set-server: utf8mb4

Configuring database flags

CLI command:

gcloud sql instances patch [INSTANCE_NAME] --database-flags character-set-server=utf8mb4

Database flags are listed under the settings collection as databaseFlags:

gcloud sql instances describe [INSTANCE_NAME]

Warning: any flag not included in the patch command will be reset to its default value.
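Because flags omitted from a patch are reset, one option when doing this through the Admin API instead of gcloud is to read the current databaseFlags, merge in the new value, and send the full list back. A minimal sketch, assuming the same discovery-built sqladmin service object as in the question: instances().get and instances().patch are Admin API methods, while merge_database_flags and set_flags_preserving_existing are hypothetical helpers.

```python
def merge_database_flags(existing, updates):
    """Return a databaseFlags list that keeps every existing flag and
    adds or overrides the ones in `updates` (a name -> value dict).

    Existing flags keep their original order; new flags are appended.
    """
    merged = {f["name"]: dict(f) for f in existing}  # copy, don't mutate input
    for name, value in updates.items():
        merged[name] = {"name": name, "value": value}
    return list(merged.values())


def set_flags_preserving_existing(service, project, instance, updates):
    """Read the instance's current flags, merge in `updates`, and patch.

    Assumption: flag names in `updates` use the same spelling the API
    reports under settings.databaseFlags.
    """
    inst = service.instances().get(project=project, instance=instance).execute()
    existing = inst.get("settings", {}).get("databaseFlags", [])
    body = {"settings": {"databaseFlags": merge_database_flags(existing, updates)}}
    return service.instances().patch(project=project, instance=instance,
                                     body=body).execute()
```

For example: set_flags_preserving_existing(service, project, instance, {'character_set_server': 'utf8mb4'}). The underscore spelling of the flag name there is an assumption; compare it with the names shown by gcloud sql instances describe before patching. The patch call also returns an operation, which can be polled like the import.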

answered 2019-01-14T05:17:18.053