Questions tagged [kylo]


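0 votes
1 answer
1809 views

apache-nifi - CSV to JSON with a dynamic schema using NiFi

I receive CSV files from a third party. The schema of these files is dynamic; the only things I can be sure of are:

  1. Every column that contains data will also have a header name.
  2. The file will always have a header row.
  3. Header names will always be a string of letters with no spaces or dots (so, reasonably "clean").
  4. Values should be treated as strings, because I am not sure what they will send.

To use this kind of data in my system, I am thinking of using MongoDB as a staging area. The number of columns, the column order, and the column names are not constant from one load to the next, so I think MongoDB would make a good staging area.

I read about the ConvertRecord processor, which looks ideal for CSV-to-JSON conversion, but I don't have a schema. I just want each row to become one document, with the header names as keys and the cell values as values.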
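To make the desired mapping concrete, here is a minimal sketch in plain Python (not NiFi, and not the custom Java processor mentioned in this question). It assumes a comma-delimited file with a header row; the function names `csv_rows_as_documents` and `write_json_lines` are made up for illustration. Rows are streamed one at a time, so memory use should stay flat regardless of file size.

```python
import csv
import io
import json
from typing import Iterator, TextIO


def csv_rows_as_documents(source: TextIO) -> Iterator[dict]:
    """Yield one dict per CSV row, keyed by the header names.

    The reader is lazy: only the current row is held in memory.
    """
    reader = csv.DictReader(source)  # first row is taken as the header
    for row in reader:
        # The csv module returns every field as str, which matches the
        # "treat all values as strings" requirement.
        yield dict(row)


def write_json_lines(source: TextIO, sink: TextIO) -> int:
    """Write each row as one JSON document per line; return the row count."""
    count = 0
    for doc in csv_rows_as_documents(source):
        sink.write(json.dumps(doc) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample data, shaped like the rows in the template below.
    sample = io.StringIO(
        "name,age,address\n"
        'Rakesh Prasad,0,"address 12 33333, 444441"\n'
        'rakesh Prasad1,,"address 12 33333, 444442"\n'
    )
    out = io.StringIO()
    write_json_lines(sample, out)
    print(out.getvalue(), end="")
```

Because no schema is declared anywhere, a new or reordered column simply shows up as a new key in the emitted documents.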

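How should I do this? Also, the file will be in the 25-30 GB range, so I don't want to overwhelm my system.

I tried doing this with my own processor (in Java) and was able to get what I wanted, but it seems to take too much time and does not look optimal.

Please let me know whether this can be achieved with the existing processors.

Thanks, Rakesh

Updated: 09/05/2018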

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><template encoding-version="1.2"><description></description><groupId>a2bd0551-0165-1000-7c6a-a32ca4db047c</groupId><name>csv_to_json_no_schema_v1</name><snippet><connections><id>91bc4a66-704c-3a2f-0000-000000000000</id><parentGroupId>defb04c4-c15c-3a07-0000-000000000000</parentGroupId><backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold><backPressureObjectThreshold>10000</backPressureObjectThreshold><destination><groupId>defb04c4-c15c-3a07-0000-000000000000</groupId><id>bb6c25ae-f2b6-386a-0000-000000000000</id><type>PROCESSOR</type></destination><flowFileExpiration>0 sec</flowFileExpiration><labelIndex>1</labelIndex><name></name><selectedRelationships>success</selectedRelationships><source><groupId>defb04c4-c15c-3a07-0000-000000000000</groupId><id>eb6cd54a-e1f1-3871-0000-000000000000</id><type>PROCESSOR</type></source><zIndex>0</zIndex></connections><connections><id>ad804e3c-f233-3556-0000-000000000000</id><parentGroupId>defb04c4-c15c-3a07-0000-000000000000</parentGroupId><backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold><backPressureObjectThreshold>10000</backPressureObjectThreshold><destination><groupId>defb04c4-c15c-3a07-0000-000000000000</groupId><id>64b15a56-8a5f-3297-0000-000000000000</id><type>PROCESSOR</type></destination><flowFileExpiration>0 sec</flowFileExpiration><labelIndex>1</labelIndex><name></name><selectedRelationships>invalid</selectedRelationships><source><groupId>defb04c4-c15c-3a07-0000-000000000000</groupId><id>bb6c25ae-f2b6-386a-0000-000000000000</id><type>PROCESSOR</type></source><zIndex>0</zIndex></connections><connections><id>c30bd123-c436-36ce-0000-000000000000</id><parentGroupId>defb04c4-c15c-3a07-0000-000000000000</parentGroupId><backPressureDataSizeThreshold>1 
GB</backPressureDataSizeThreshold><backPressureObjectThreshold>10000</backPressureObjectThreshold><destination><groupId>defb04c4-c15c-3a07-0000-000000000000</groupId><id>8a0e37da-acd2-3d72-0000-000000000000</id><type>PROCESSOR</type></destination><flowFileExpiration>0 sec</flowFileExpiration><labelIndex>1</labelIndex><name></name><selectedRelationships>valid</selectedRelationships><source><groupId>defb04c4-c15c-3a07-0000-000000000000</groupId><id>bb6c25ae-f2b6-386a-0000-000000000000</id><type>PROCESSOR</type></source><zIndex>0</zIndex></connections><connections><id>247d2139-26b7-31fe-0000-000000000000</id><parentGroupId>defb04c4-c15c-3a07-0000-000000000000</parentGroupId><backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold><backPressureObjectThreshold>10000</backPressureObjectThreshold><destination><groupId>defb04c4-c15c-3a07-0000-000000000000</groupId><id>1297bea9-b30f-3f45-0000-000000000000</id><type>PROCESSOR</type></destination><flowFileExpiration>0 sec</flowFileExpiration><labelIndex>1</labelIndex><name></name><selectedRelationships>failure</selectedRelationships><source><groupId>defb04c4-c15c-3a07-0000-000000000000</groupId><id>8a0e37da-acd2-3d72-0000-000000000000</id><type>PROCESSOR</type></source><zIndex>0</zIndex></connections><connections><id>45e5403f-99f7-3ddf-0000-000000000000</id><parentGroupId>defb04c4-c15c-3a07-0000-000000000000</parentGroupId><backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold><backPressureObjectThreshold>10000</backPressureObjectThreshold><destination><groupId>defb04c4-c15c-3a07-0000-000000000000</groupId><id>9f8f32f7-130c-35bd-0000-000000000000</id><type>PROCESSOR</type></destination><flowFileExpiration>0 
sec</flowFileExpiration><labelIndex>1</labelIndex><name></name><selectedRelationships>success</selectedRelationships><source><groupId>defb04c4-c15c-3a07-0000-000000000000</groupId><id>8a0e37da-acd2-3d72-0000-000000000000</id><type>PROCESSOR</type></source><zIndex>0</zIndex></connections><controllerServices><id>88b0195a-34b2-34f0-0000-000000000000</id><parentGroupId>defb04c4-c15c-3a07-0000-000000000000</parentGroupId><bundle><artifact>nifi-record-serialization-services-nar</artifact><group>org.apache.nifi</group><version>1.6.0</version></bundle><comments></comments><descriptors><entry><key>Schema Write Strategy</key><value><name>Schema Write Strategy</name></value></entry><entry><key>schema-access-strategy</key><value><name>schema-access-strategy</name></value></entry><entry><key>schema-registry</key><value><identifiesControllerService>org.apache.nifi.schemaregistry.services.SchemaRegistry</identifiesControllerService><name>schema-registry</name></value></entry><entry><key>schema-name</key><value><name>schema-name</name></value></entry><entry><key>schema-version</key><value><name>schema-version</name></value></entry><entry><key>schema-branch</key><value><name>schema-branch</name></value></entry><entry><key>schema-text</key><value><name>schema-text</name></value></entry><entry><key>Date Format</key><value><name>Date Format</name></value></entry><entry><key>Time Format</key><value><name>Time Format</name></value></entry><entry><key>Timestamp Format</key><value><name>Timestamp Format</name></value></entry><entry><key>Pretty Print JSON</key><value><name>Pretty Print JSON</name></value></entry><entry><key>suppress-nulls</key><value><name>suppress-nulls</name></value></entry></descriptors><name>JsonRecordSetWriter</name><persistsState>false</persistsState><properties><entry><key>Schema Write 
Strategy</key><value>no-schema</value></entry><entry><key>schema-access-strategy</key></entry><entry><key>schema-registry</key></entry><entry><key>schema-name</key></entry><entry><key>schema-version</key></entry><entry><key>schema-branch</key></entry><entry><key>schema-text</key></entry><entry><key>Date Format</key></entry><entry><key>Time Format</key></entry><entry><key>Timestamp Format</key></entry><entry><key>Pretty Print JSON</key></entry><entry><key>suppress-nulls</key></entry></properties><state>ENABLED</state><type>org.apache.nifi.json.JsonRecordSetWriter</type></controllerServices><controllerServices><id>c3e80a29-498b-36d4-0000-000000000000</id><parentGroupId>defb04c4-c15c-3a07-0000-000000000000</parentGroupId><bundle><artifact>nifi-record-serialization-services-nar</artifact><group>org.apache.nifi</group><version>1.6.0</version></bundle><comments></comments><descriptors><entry><key>schema-access-strategy</key><value><name>schema-access-strategy</name></value></entry><entry><key>schema-registry</key><value><identifiesControllerService>org.apache.nifi.schemaregistry.services.SchemaRegistry</identifiesControllerService><name>schema-registry</name></value></entry><entry><key>schema-name</key><value><name>schema-name</name></value></entry><entry><key>schema-version</key><value><name>schema-version</name></value></entry><entry><key>schema-branch</key><value><name>schema-branch</name></value></entry><entry><key>schema-text</key><value><name>schema-text</name></value></entry><entry><key>csv-reader-csv-parser</key><value><name>csv-reader-csv-parser</name></value></entry><entry><key>Date Format</key><value><name>Date Format</name></value></entry><entry><key>Time Format</key><value><name>Time Format</name></value></entry><entry><key>Timestamp Format</key><value><name>Timestamp Format</name></value></entry><entry><key>CSV Format</key><value><name>CSV Format</name></value></entry><entry><key>Value Separator</key><value><name>Value 
Separator</name></value></entry><entry><key>Skip Header Line</key><value><name>Skip Header Line</name></value></entry><entry><key>ignore-csv-header</key><value><name>ignore-csv-header</name></value></entry><entry><key>Quote Character</key><value><name>Quote Character</name></value></entry><entry><key>Escape Character</key><value><name>Escape Character</name></value></entry><entry><key>Comment Marker</key><value><name>Comment Marker</name></value></entry><entry><key>Null String</key><value><name>Null String</name></value></entry><entry><key>Trim Fields</key><value><name>Trim Fields</name></value></entry><entry><key>csvutils-character-set</key><value><name>csvutils-character-set</name></value></entry></descriptors><name>CSVReader</name><persistsState>false</persistsState><properties><entry><key>schema-access-strategy</key></entry><entry><key>schema-registry</key></entry><entry><key>schema-name</key></entry><entry><key>schema-version</key></entry><entry><key>schema-branch</key></entry><entry><key>schema-text</key></entry><entry><key>csv-reader-csv-parser</key></entry><entry><key>Date Format</key></entry><entry><key>Time Format</key></entry><entry><key>Timestamp Format</key></entry><entry><key>CSV Format</key></entry><entry><key>Value Separator</key></entry><entry><key>Skip Header Line</key><value>true</value></entry><entry><key>ignore-csv-header</key><value>true</value></entry><entry><key>Quote Character</key></entry><entry><key>Escape Character</key></entry><entry><key>Comment Marker</key></entry><entry><key>Null String</key></entry><entry><key>Trim 
Fields</key></entry><entry><key>csvutils-character-set</key></entry></properties><state>ENABLED</state><type>org.apache.nifi.csv.CSVReader</type></controllerServices><processors><id>8a0e37da-acd2-3d72-0000-000000000000</id><parentGroupId>defb04c4-c15c-3a07-0000-000000000000</parentGroupId><position><x>0.0</x><y>227.99996948242188</y></position><bundle><artifact>nifi-standard-nar</artifact><group>org.apache.nifi</group><version>1.6.0</version></bundle><config><bulletinLevel>WARN</bulletinLevel><comments></comments><concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount><descriptors><entry><key>record-reader</key><value><identifiesControllerService>org.apache.nifi.serialization.RecordReaderFactory</identifiesControllerService><name>record-reader</name></value></entry><entry><key>record-writer</key><value><identifiesControllerService>org.apache.nifi.serialization.RecordSetWriterFactory</identifiesControllerService><name>record-writer</name></value></entry></descriptors><executionNode>ALL</executionNode><lossTolerant>false</lossTolerant><penaltyDuration>30 sec</penaltyDuration><properties><entry><key>record-reader</key><value>c3e80a29-498b-36d4-0000-000000000000</value></entry><entry><key>record-writer</key><value>88b0195a-34b2-34f0-0000-000000000000</value></entry></properties><runDurationMillis>0</runDurationMillis><schedulingPeriod>0 sec</schedulingPeriod><schedulingStrategy>TIMER_DRIVEN</schedulingStrategy><yieldDuration>1 
sec</yieldDuration></config><name>ConvertRecord</name><relationships><autoTerminate>false</autoTerminate><name>failure</name></relationships><relationships><autoTerminate>false</autoTerminate><name>success</name></relationships><state>STOPPED</state><style/><type>org.apache.nifi.processors.standard.ConvertRecord</type></processors><processors><id>9f8f32f7-130c-35bd-0000-000000000000</id><parentGroupId>defb04c4-c15c-3a07-0000-000000000000</parentGroupId><position><x>11.0</x><y>483.0</y></position><bundle><artifact>nifi-standard-nar</artifact><group>org.apache.nifi</group><version>1.6.0</version></bundle><config><bulletinLevel>WARN</bulletinLevel><comments></comments><concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount><descriptors><entry><key>Log Level</key><value><name>Log Level</name></value></entry><entry><key>Log Payload</key><value><name>Log Payload</name></value></entry><entry><key>Attributes to Log</key><value><name>Attributes to Log</name></value></entry><entry><key>attributes-to-log-regex</key><value><name>attributes-to-log-regex</name></value></entry><entry><key>Attributes to Ignore</key><value><name>Attributes to Ignore</name></value></entry><entry><key>attributes-to-ignore-regex</key><value><name>attributes-to-ignore-regex</name></value></entry><entry><key>Log prefix</key><value><name>Log prefix</name></value></entry><entry><key>character-set</key><value><name>character-set</name></value></entry></descriptors><executionNode>ALL</executionNode><lossTolerant>false</lossTolerant><penaltyDuration>30 sec</penaltyDuration><properties><entry><key>Log Level</key><value>info</value></entry><entry><key>Log Payload</key><value>false</value></entry><entry><key>Attributes to Log</key></entry><entry><key>attributes-to-log-regex</key><value>.*</value></entry><entry><key>Attributes to Ignore</key></entry><entry><key>attributes-to-ignore-regex</key></entry><entry><key>Log 
prefix</key></entry><entry><key>character-set</key><value>UTF-8</value></entry></properties><runDurationMillis>0</runDurationMillis><schedulingPeriod>0 sec</schedulingPeriod><schedulingStrategy>TIMER_DRIVEN</schedulingStrategy><yieldDuration>1 sec</yieldDuration></config><name>LogAttribute</name><relationships><autoTerminate>true</autoTerminate><name>success</name></relationships><state>STOPPED</state><style/><type>org.apache.nifi.processors.standard.LogAttribute</type></processors><processors><id>bb6c25ae-f2b6-386a-0000-000000000000</id><parentGroupId>defb04c4-c15c-3a07-0000-000000000000</parentGroupId><position><x>670.0</x><y>225.0</y></position><bundle><artifact>nifi-standard-nar</artifact><group>org.apache.nifi</group><version>1.6.0</version></bundle><config><bulletinLevel>WARN</bulletinLevel><comments></comments><concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount><descriptors><entry><key>validate-csv-schema</key><value><name>validate-csv-schema</name></value></entry><entry><key>validate-csv-header</key><value><name>validate-csv-header</name></value></entry><entry><key>validate-csv-delimiter</key><value><name>validate-csv-delimiter</name></value></entry><entry><key>validate-csv-quote</key><value><name>validate-csv-quote</name></value></entry><entry><key>validate-csv-eol</key><value><name>validate-csv-eol</name></value></entry><entry><key>validate-csv-strategy</key><value><name>validate-csv-strategy</name></value></entry></descriptors><executionNode>ALL</executionNode><lossTolerant>false</lossTolerant><penaltyDuration>30 
sec</penaltyDuration><properties><entry><key>validate-csv-schema</key><value>NotNull,ParseInt(),Optional(ParseInt()),Null</value></entry><entry><key>validate-csv-header</key><value>true</value></entry><entry><key>validate-csv-delimiter</key><value>,</value></entry><entry><key>validate-csv-quote</key><value>"</value></entry><entry><key>validate-csv-eol</key><value>\n</value></entry><entry><key>validate-csv-strategy</key><value>Line by line validation</value></entry></properties><runDurationMillis>0</runDurationMillis><schedulingPeriod>0 sec</schedulingPeriod><schedulingStrategy>TIMER_DRIVEN</schedulingStrategy><yieldDuration>1 sec</yieldDuration></config><name>ValidateCsv</name><relationships><autoTerminate>false</autoTerminate><name>invalid</name></relationships><relationships><autoTerminate>false</autoTerminate><name>valid</name></relationships><state>STOPPED</state><style/><type>org.apache.nifi.processors.standard.ValidateCsv</type></processors><processors><id>eb6cd54a-e1f1-3871-0000-000000000000</id><parentGroupId>defb04c4-c15c-3a07-0000-000000000000</parentGroupId><position><x>688.0</x><y>0.0</y></position><bundle><artifact>nifi-standard-nar</artifact><group>org.apache.nifi</group><version>1.6.0</version></bundle><config><bulletinLevel>WARN</bulletinLevel><comments></comments><concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount><descriptors><entry><key>File Size</key><value><name>File Size</name></value></entry><entry><key>Batch Size</key><value><name>Batch Size</name></value></entry><entry><key>Data Format</key><value><name>Data Format</name></value></entry><entry><key>Unique FlowFiles</key><value><name>Unique 
FlowFiles</name></value></entry><entry><key>generate-ff-custom-text</key><value><name>generate-ff-custom-text</name></value></entry><entry><key>character-set</key><value><name>character-set</name></value></entry><entry><key>schema.name</key><value><name>schema.name</name></value></entry></descriptors><executionNode>ALL</executionNode><lossTolerant>false</lossTolerant><penaltyDuration>30 sec</penaltyDuration><properties><entry><key>File Size</key><value>0B</value></entry><entry><key>Batch Size</key><value>1</value></entry><entry><key>Data Format</key><value>Text</value></entry><entry><key>Unique FlowFiles</key><value>false</value></entry><entry><key>generate-ff-custom-text</key><value>name,age,int_val,address Rakesh Prasad,0,99,"address 12 33333, 444441" rakesh Prasad1,1,,"address 12 33333, 444442" rakesh Prasad2,2,55,"address 12 33333, 444443" rakesh Prasad3,,33,"address 12 33333, 444444"</value></entry><entry><key>character-set</key><value>UTF-8</value></entry><entry><key>schema.name</key><value>empData</value></entry></properties><runDurationMillis>0</runDurationMillis><schedulingPeriod>1 day</schedulingPeriod><schedulingStrategy>TIMER_DRIVEN</schedulingStrategy><yieldDuration>1 sec</yieldDuration></config><name>GenerateFlowFile</name><relationships><autoTerminate>false</autoTerminate><name>success</name></relationships><state>STOPPED</state><style/><type>org.apache.nifi.processors.standard.GenerateFlowFile</type></processors><processors><id>1297bea9-b30f-3f45-0000-000000000000</id><parentGroupId>defb04c4-c15c-3a07-0000-000000000000</parentGroupId><position><x>450.0</x><y>539.0</y></position><bundle><artifact>nifi-standard-nar</artifact><group>org.apache.nifi</group><version>1.6.0</version></bundle><config><bulletinLevel>WARN</bulletinLevel><comments></comments><concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount><descriptors><entry><key>Log Level</key><value><name>Log Level</name></value></entry><entry><key>Log Payload</key><value><name>Log 
Payload</name></value></entry><entry><key>Attributes to Log</key><value><name>Attributes to Log</name></value></entry><entry><key>attributes-to-log-regex</key><value><name>attributes-to-log-regex</name></value></entry><entry><key>Attributes to Ignore</key><value><name>Attributes to Ignore</name></value></entry><entry><key>attributes-to-ignore-regex</key><value><name>attributes-to-ignore-regex</name></value></entry><entry><key>Log prefix</key><value><name>Log prefix</name></value></entry><entry><key>character-set</key><value><name>character-set</name></value></entry></descriptors><executionNode>ALL</executionNode><lossTolerant>false</lossTolerant><penaltyDuration>30 sec</penaltyDuration><properties><entry><key>Log Level</key><value>info</value></entry><entry><key>Log Payload</key><value>false</value></entry><entry><key>Attributes to Log</key></entry><entry><key>attributes-to-log-regex</key><value>.*</value></entry><entry><key>Attributes to Ignore</key></entry><entry><key>attributes-to-ignore-regex</key></entry><entry><key>Log prefix</key></entry><entry><key>character-set</key><value>UTF-8</value></entry></properties><runDurationMillis>0</runDurationMillis><schedulingPeriod>0 sec</schedulingPeriod><schedulingStrategy>TIMER_DRIVEN</schedulingStrategy><yieldDuration>1 sec</yieldDuration></config><name>LogAttribute</name><relationships><autoTerminate>true</autoTerminate><name>success</name></relationships><state>STOPPED</state><style/><type>org.apache.nifi.processors.standard.LogAttribute</type></processors><processors><id>64b15a56-8a5f-3297-0000-000000000000</id><parentGroupId>defb04c4-c15c-3a07-0000-000000000000</parentGroupId><position><x>837.0</x><y>482.0000305175781</y></position><bundle><artifact>nifi-standard-nar</artifact><group>org.apache.nifi</group><version>1.6.0</version></bundle><config><bulletinLevel>WARN</bulletinLevel><comments></comments><concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount><descriptors><entry><key>Log 
Level</key><value><name>Log Level</name></value></entry><entry><key>Log Payload</key><value><name>Log Payload</name></value></entry><entry><key>Attributes to Log</key><value><name>Attributes to Log</name></value></entry><entry><key>attributes-to-log-regex</key><value><name>attributes-to-log-regex</name></value></entry><entry><key>Attributes to Ignore</key><value><name>Attributes to Ignore</name></value></entry><entry><key>attributes-to-ignore-regex</key><value><name>attributes-to-ignore-regex</name></value></entry><entry><key>Log prefix</key><value><name>Log prefix</name></value></entry><entry><key>character-set</key><value><name>character-set</name></value></entry></descriptors><executionNode>ALL</executionNode><lossTolerant>false</lossTolerant><penaltyDuration>30 sec</penaltyDuration><properties><entry><key>Log Level</key><value>info</value></entry><entry><key>Log Payload</key><value>false</value></entry><entry><key>Attributes to Log</key></entry><entry><key>attributes-to-log-regex</key><value>.*</value></entry><entry><key>Attributes to Ignore</key></entry><entry><key>attributes-to-ignore-regex</key></entry><entry><key>Log prefix</key></entry><entry><key>character-set</key><value>UTF-8</value></entry></properties><runDurationMillis>0</runDurationMillis><schedulingPeriod>0 sec</schedulingPeriod><schedulingStrategy>TIMER_DRIVEN</schedulingStrategy><yieldDuration>1 sec</yieldDuration></config><name>LogAttribute</name><relationships><autoTerminate>true</autoTerminate><name>success</name></relationships><state>STOPPED</state><style/><type>org.apache.nifi.processors.standard.LogAttribute</type></processors></snippet><timestamp>09/05/2018 01:32:27 EDT</timestamp></template>

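0 votes
1 answer
1024 views

apache-nifi - Running record count from the SplitRecord processor in NiFi

Is there a way to get the fragment index from the SplitRecord processor in NiFi? I am splitting a very large xls file (4 million records) with "Records Per Split" = 100000.

Now I want to process only the first 2 splits, to check the quality of the file, and reject the rest of the file.

I can see a fragment index in other split processors (e.g. SplitJson), but not in SplitRecord. Is there some other workaround?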
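For clarity, this is the behaviour I am after, sketched in plain Python rather than NiFi. The helper names `indexed_splits` and `first_splits` are hypothetical; the point is only that each split carries a running index and that reading stops once the first few splits have been taken.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def indexed_splits(
    records: Iterable[T], records_per_split: int
) -> Iterator[Tuple[int, List[T]]]:
    """Yield (fragment_index, chunk) pairs of at most records_per_split records."""
    if records_per_split <= 0:
        raise ValueError("records_per_split must be positive")
    it = iter(records)
    index = 0
    while True:
        chunk = list(islice(it, records_per_split))
        if not chunk:
            return  # source exhausted
        yield index, chunk
        index += 1


def first_splits(
    records: Iterable[T], records_per_split: int, max_splits: int
) -> List[Tuple[int, List[T]]]:
    """Return only the first max_splits splits; later records are never read."""
    return list(islice(indexed_splits(records, records_per_split), max_splits))


if __name__ == "__main__":
    # Small stand-in for the 4 million records / 100000 per split case.
    for idx, chunk in first_splits(range(10), 3, 2):
        print(idx, chunk)
```

With an index like this on every split, keeping only indexes 0 and 1 and dropping everything else would be a simple routing decision.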

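0 votes
1 answer
592 views

apache-nifi - Getting the total file count from the FetchHDFS processor

Is there a way to get the total number of files from a single run of the FetchHDFS processor?

My use case: read all files from a directory (HDFS), concatenate them, and then process the result further. I need to hold back the merge processor until all files are in the queue, so I need the file count in order to set "Minimum Number of Entries".

I could use Wait/Notify, but I would still need the total count to set the flag correctly.

In any case, having this as an attribute of FetchHDFS or of any file-listing processor sounds illogical.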
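To illustrate what I mean by "hold the merge until the count is reached", here is a rough sketch in plain Python, not NiFi. It assumes the listing step can stamp every file with a batch id and the total count of that batch; `Item`, `tag_listing`, and `CountingMerger` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Item:
    name: str
    batch_id: str     # identifies one listing run
    batch_count: int  # total number of files in that run


def tag_listing(names: Iterable[str], batch_id: str) -> List[Item]:
    """Stamp every listed file with the total count of its listing run."""
    names = list(names)
    return [Item(n, batch_id, len(names)) for n in names]


class CountingMerger:
    """Hold items per batch and release them only when the batch is complete."""

    def __init__(self) -> None:
        self._bins: Dict[str, List[Item]] = {}

    def offer(self, item: Item) -> Optional[List[str]]:
        bin_ = self._bins.setdefault(item.batch_id, [])
        bin_.append(item)
        if len(bin_) < item.batch_count:
            return None  # still waiting for the rest of the batch
        del self._bins[item.batch_id]
        return [i.name for i in bin_]


if __name__ == "__main__":
    merger = CountingMerger()
    for item in tag_listing(["a.csv", "b.csv", "c.csv"], "run-1"):
        print(item.name, "->", merger.offer(item))
```

The merge step never needs a fixed "Minimum Number of Entries" here, because each item carries the count it belongs to.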

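Update #2 (merge processor): according to the configuration, the merge processor should release files every 300 seconds. In my use case the total number of input files is 2000, but they arrive slowly (over roughly 200 seconds). So the configuration below should be enough to merge all the files, but it does not work: I can still see the merge processor releasing files at shorter intervals.

[image]

Update #3: the total size of all 1600 files is 318 KB, far below the 128 MB bin size.

[image]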

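0 votes
1 answer
988 views

apache-nifi - NiFi UpdateAttribute not working for a dynamic variable

I am trying to get the number of files processed by ListHDFS, so the flow looks like this:

ListHDFS -> UpdateAttribute -> LogAttribute

I configured UpdateAttribute according to the documentation (see attachment). Strangely, I don't even see "fileCount" under the "View data provenance" option.

[image]

What am I missing?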

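0 votes
0 answers
360 views

apache-nifi - How to do error handling in NiFi templates

I create templates in NiFi and then import them into Kylo. Each template is specific to one of my business use cases and consists of many (30+) NiFi processors.

For my unit testing and some sanity, I route each processor's failure relationship to its own LogAttribute.

But when I import such a template into Kylo, Kylo does not treat a failed flow as an error, because all failures simply end up logged by LogAttribute.

I could instead route every processor's failure relationship in the template to a single LogAttribute, and then forward that flow to Kylo's error port.

Is this a correct solution? If not, what do other developers do? I think NiFi should have some kind of SINK processor where I could dump all my failure relationships.

Please comment.

Thanks,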

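0 votes
1 answer
139 views

security - Kylo security implementation in the HDP Kylo sandbox?

I am trying to apply security in Kylo, for example so that feeds and categories created by one user are not visible to other users.

Is this possible in the Kylo HDP sandbox?

If so, what changes do I need to make?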

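0 votes
1 answer
1936 views

apache-nifi - NiFi processor not parsing JSON correctly

I am using EvaluateJsonPath to extract one specific value from JSON. I am using the following JSONPath expression:

This is the JSON document I am evaluating the JSONPath against:

When I use the configuration above (the specific JSONPath query) in an online JSONPath tester (see attached image), I get the expected result. But somehow NiFi returns an empty array.

[image]

Template: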

0 votes
0 answers
61 views

hive - Creating a new feed fails with: Error saving feed, duplicate key ProcessGroupDTO:b20b995a-0165-1000-0479-b059731bba5b

When I create a new feed, I run into this problem:

[image]

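0 votes
1 answer
124 views

kylo - How to add custom category properties in Kylo?

I want to define some extra properties when creating a category in Kylo, for example something like a flag. If the flag is Y, the category should be passed as metadata to some database. Is this possible? If so, please suggest how to do it.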

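0 votes
0 answers
48 views

kylo - I want to set up Kylo as a web application, not just an ingestion tool. What are the risks, and is there a better approach?

Is it acceptable to use Kylo not only as an ingestion tool, but also to extend it to serve queries that read data from Hive, or to run queries in Hive through a (custom) API? Does exposing Kylo together with Hive for running queries pose any security risks?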