See the EDIT below.
We use MarkLogic Content Pump (mlcp) to load data into an ML8 database. We have a development environment where everything works, and a production environment where mlcp never gets past counting the files to process.
We have 2.1 million JSON documents to load.
On the development server (ML8 + CentOS 6) we see:
15/07/13 13:19:35 INFO contentpump.ContentPump: Hadoop library version: 2.0.0-alpha
15/07/13 13:19:35 INFO contentpump.LocalJobRunner: Content type is set to MIXED. The format of the inserted documents will be determined by the MIME type specification configured on MarkLogic Server.
15/07/13 13:19:35 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
15/07/13 13:23:06 INFO input.FileInputFormat: Total input paths to process : 2147329
15/07/13 13:24:08 INFO contentpump.LocalJobRunner: completed 0%
15/07/13 13:34:43 INFO contentpump.LocalJobRunner: completed 1%
15/07/13 13:43:42 INFO contentpump.LocalJobRunner: completed 2%
15/07/13 13:51:15 INFO contentpump.LocalJobRunner: completed 3%
The job completes and the data loads fine.
Now, with the same data on a different machine, the production server (ML8 + CentOS 7), we get:
15/07/14 17:02:21 INFO contentpump.ContentPump: Hadoop library version: 2.6.0
15/07/14 17:02:21 INFO contentpump.LocalJobRunner: Content type is set to MIXED. The format of the inserted documents will be determined by the MIME type specification configured on MarkLogic Server.
Apart from the different OS, the prod server also runs a newer mlcp build: its Hadoop library is 2.6.0 instead of 2.0.0. If we use the same command to import a directory with only 2,000 files, it works on prod...
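For reference, the quickest way to compare what each box is running is mlcp's built-in version command (the exact fields printed vary a bit between releases):

# Run on both dev and prod to compare the mlcp build and bundled Hadoop library
mlcp.sh version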
With the full data set, the job gets stuck while counting the files to process: the "Total input paths to process" line never shows up...
What could be the problem?
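One workaround we may try (a sketch only; the paths below are made up): since the stall happens while mlcp enumerates all 2.1M input paths, packing the documents into a few large zip archives and loading those with -input_compressed true leaves mlcp only a handful of paths to scan:

# Hypothetical layout: bundle the JSON files into a few archives first
cd /data/json
zip -q -r /data/batches/part-01.zip part-01/   # repeat per batch directory

# Then import the archives instead of 2.1M individual files
mlcp.sh import -host localhost -port 8140 \
    -username ashraf -password '***' \
    -input_file_path /data/batches \
    -input_compressed true \
    -mode local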
EDIT: We set mlcp logging to DEBUG and tested with a small sample2.zip.
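(For completeness: we enabled DEBUG by editing log4j.properties in mlcp's conf directory; adjust the path if your layout differs. These are the logger names our install uses:)

# <mlcp-install-dir>/conf/log4j.properties
log4j.logger.com.marklogic.contentpump=DEBUG
log4j.logger.com.marklogic.mapreduce=DEBUG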
The result:
[ashraf@77-72-150-125 ~]$ mlcp.sh import -host localhost -port 8140 -username ashraf -password duurz44m -input_file_path /home/ashraf/sample2.zip -input_compressed true -mode local -output_uri_replace "\".*,''\"" -output_uri_prefix incoming/linkedin/ -output_collections incoming,incoming/linkedin -output_permissions slush-dikw-node-role,read
15/07/16 16:36:31 DEBUG contentpump.ContentPump: Command: IMPORT
15/07/16 16:36:31 DEBUG contentpump.ContentPump: Arguments: -host localhost -port 8140 -username ashraf -password duurz44m -input_file_path /home/ashraf/sample2.zip -input_compressed true -mode local -output_uri_replace ".*,''" -output_uri_prefix incoming/linkedin/ -output_collections incoming,incoming/linkedin -output_permissions slush-dikw-node-role,read
15/07/16 16:36:31 INFO contentpump.ContentPump: Hadoop library version: 2.6.0
15/07/16 16:36:31 DEBUG contentpump.ContentPump: Running in: localmode
15/07/16 16:36:31 INFO contentpump.LocalJobRunner: Content type is set to MIXED. The format of the inserted documents will be determined by the MIME type specification configured on MarkLogic Server.
15/07/16 16:36:32 DEBUG contentpump.LocalJobRunner: Thread pool size: 4
15/07/16 16:36:32 INFO input.FileInputFormat: Total input paths to process : 1
15/07/16 16:36:33 DEBUG contentpump.LocalJobRunner: Thread Count for Split#0 : 4
15/07/16 16:36:33 DEBUG contentpump.CompressedDocumentReader: Starting file:/home/ashraf/sample2.zip
15/07/16 16:36:33 DEBUG contentpump.MultithreadedMapper: Running with 4 threads
15/07/16 16:36:33 DEBUG mapreduce.ContentWriter: Connect to localhost
15/07/16 16:36:33 DEBUG mapreduce.ContentWriter: Connect to localhost
15/07/16 16:36:33 DEBUG mapreduce.ContentWriter: Connect to localhost
15/07/16 16:36:33 DEBUG mapreduce.ContentWriter: Connect to localhost
15/07/16 16:36:34 INFO contentpump.LocalJobRunner: completed 0%
15/07/16 16:36:39 INFO contentpump.LocalJobRunner: completed 100%
2015-07-16 16:39:11.483 WARNING [19] (AbstractRequestController.runRequest): Error parsing HTTP headers: Premature EOF, partial header line read: ''
15/07/16 16:39:12 DEBUG contentpump.CompressedDocumentReader: Closing file:/home/ashraf/sample2.zip
15/07/16 16:39:12 INFO contentpump.LocalJobRunner: com.marklogic.contentpump.ContentPumpStats:
15/07/16 16:39:12 INFO contentpump.LocalJobRunner: ATTEMPTED_INPUT_RECORD_COUNT: 1993
15/07/16 16:39:12 INFO contentpump.LocalJobRunner: SKIPPED_INPUT_RECORD_COUNT: 0
15/07/16 16:39:12 INFO contentpump.LocalJobRunner: Total execution time: 160 sec
Only the first JSON file ends up in the database; the rest are dropped/lost.
Could newlines inside the JSON files be causing the problem?
(AbstractRequestController.runRequest): Error parsing HTTP headers: Premature EOF, partial header line read: ''
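To double-check how many documents actually landed, we count the incoming collection over the REST API; a minimal check, assuming a REST instance on the default port 8000 and digest auth (swap in your own host, port, and credentials):

# total="..." in the search response is the estimated document count for the collection
curl --digest -u ashraf:'***' \
    "http://localhost:8000/v1/search?collection=incoming" | grep -o 'total="[0-9]*"'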
Any hints would be great.
Hugo