
I can't get either the default crawler classifier or a custom classifier to work against many of my CSV files. The classification is listed as 'UNKNOWN'. I've tried re-running existing classifiers as well as creating new ones. Is anyone aware of a specific custom classifier configuration for CSV files that works for files of any size?

I'm also unable to find any errors specific to this issue in the logs.

Although I have seen references to issues with JSON files over 1MB in size, I can't find anything describing the same issue for CSV files, nor a solution to the problem.


1 Answer


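The default CSV classifier supported by the Glue Crawler:

CSV - Checks for the following delimiters: comma (,), pipe (|), tab (\t), semicolon (;), and Ctrl-A (\u0001). Ctrl-A is the Unicode control character for Start Of Heading.

If your files use any other delimiter, the default CSV classifier will not recognize them. In that case you will have to write a grok pattern.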
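
To narrow down whether the delimiter is the culprit, it can help to check a sample of each file locally before running the crawler again. The following is a minimal sketch in plain Python, not Glue's actual classification logic: the function name `consistent_delimiters` and the "same field count on every row, at least two fields" rule are assumptions made for illustration. It only reports which of the delimiters listed above split the sample consistently.

```python
import csv
import io

# Delimiters listed above for the built-in CSV classifier.
DEFAULT_DELIMITERS = [",", "|", "\t", ";", "\u0001"]


def consistent_delimiters(sample: str) -> list[str]:
    """Return the default delimiters that split every non-empty row of
    `sample` into the same number of fields (two or more).

    Hypothetical helper for local diagnosis only; Glue's own heuristics
    may differ.
    """
    matches = []
    for delim in DEFAULT_DELIMITERS:
        reader = csv.reader(io.StringIO(sample), delimiter=delim)
        rows = [row for row in reader if row]  # skip blank lines
        widths = {len(row) for row in rows}
        # Exactly one distinct width, and that width is at least 2.
        if len(widths) == 1 and next(iter(widths)) >= 2:
            matches.append(delim)
    return matches
```

If the returned list is empty for the first few lines of a file, the file probably uses a delimiter outside the default set (or has ragged rows), which is the case where the answer suggests a grok pattern.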

Answered 2019-05-30T08:50:54.720