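I'm currently having trouble loading ORC files from Azure Data Factory. When a file is too large, the ADF pipeline reports that our self-hosted integration runtime failed with an OutOfMemory exception because the Java maximum heap size is too small to complete the load.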

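I have already tried several fixes, such as increasing the heap size through environment variables and even through registry keys (a bit of a hack). The VM hosting the self-hosted integration runtime has more than 100 GB of RAM.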

It still fails, because whenever ADF calls the integration runtime, these values appear to be overwritten by the "default" values. Any ideas?

'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.nio.BufferOverflowException:Unable to retrieve Java exception..,Source=Microsoft.DataTransfer.Richfile.OrcTransferPlugin,StackTrace= at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext()
at Microsoft.DataTransfer.Common.Shared.DeserializeControllerBase.GetEstimatedRowSize()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializeController..ctor(DataTable targetSchema, IEnumerable`1 streams, OrcFormatSetting settings, IErrorRowOutput errorRowOutput)
at Microsoft.DataTransfer.ClientLibrary.OrcSerializer.Deserialize(TransferStream stream)
at Microsoft.DataTransfer.Runtime.DeserializationStageProcessor.<Deserialize>d__14.MoveNext()
at Microsoft.DataTransfer.Runtime.TypeConversionStageProcessor.<CreateDataReader>d__5.MoveNext()
at Microsoft.DataTransfer.Runtime.SerializationStageProcessor.<Serialize>d__11.MoveNext()
at Microsoft.DataTransfer.Runtime.BinarySinkStageProcessor.<PopulateStreamName>d__10.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.MultipartWriteSink.ConsumeStreams(IEnumerable`1 streams),''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,StackTrace= at Microsoft.DataTransfer.Richfile.Bridge.BaseObjectBridge.CallObject[TEnum](TEnum methodEnum, jValue[] args)
at Microsoft.DataTransfer.Richfile.Bridge.Orc.OrcBatchReaderBridge.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext(),'
Job ID: daee1a1d-b880-ecb2-e56c-a59397547668
Log ID: Warning        
TraceComponentId: TransferClientLibrary
TraceMessageId: TasksCoordinatorFatalErrorCallback
@logId: Warning
jobId: daee1a1d-b880-ecb2-e56c-a59397547668
activityId: c643b611-8356-4f49-b6d6-e87ea50670e5
eventId: TasksCoordinatorFatalErrorCallback
1 Answer

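Microsoft found a bug on their side. When loading .orc files, this error can occur if the file contains char or varchar column types. Casting all of those columns to the string type fixed the error. Microsoft acknowledged it as a bug on the Azure Data Factory side and said a fix would take about six months from now.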
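As a rough illustration of the workaround, here is a minimal sketch that rewrites Hive/ORC-style column type strings so that every char(n) or varchar(n) becomes string. It assumes you control the job that produces the .orc files and can feed it an adjusted schema. The function names and the example columns are hypothetical and are not part of ADF or any ORC library.

```python
import re
from typing import Dict

# Matches Hive/ORC-style char(n) and varchar(n) type names, case-insensitively.
_CHAR_TYPE = re.compile(r"\b(?:var)?char\s*\(\s*\d+\s*\)", re.IGNORECASE)


def widen_char_types(type_string: str) -> str:
    """Replace every char(n)/varchar(n) occurrence in a type string with string."""
    return _CHAR_TYPE.sub("string", type_string)


def widen_schema(columns: Dict[str, str]) -> Dict[str, str]:
    """Apply widen_char_types to each column type in a name -> type mapping."""
    return {name: widen_char_types(col_type) for name, col_type in columns.items()}


# Hypothetical schema of a file that triggers the error.
example = {
    "id": "bigint",
    "code": "char(3)",
    "name": "VARCHAR(255)",
    "tags": "array<varchar(20)>",
}
widened = widen_schema(example)
print(widened)
```

The regex also rewrites char/varchar types nested inside complex types such as array or struct, since the answer says all such columns had to be converted.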

Answered 2021-07-07T09:00:14.083