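I'm currently running into a problem loading ORC files from Azure Data Factory. When a file is too large, the ADF pipeline reports that our self-hosted integration runtime failed with an OutOfMemory exception, because the Java maximum heap size is too small to complete the load.
I have already tried several fixes, such as increasing the heap size through environment variables and even through registry keys (a bit of a hack). The VM hosting the self-hosted integration runtime has more than 100 GB of RAM.
It still fails, though, because when the integration runtime is queried from ADF, these values always seem to be overridden by "default" values. Any ideas?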
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.nio.BufferOverflowException:Unable to retrieve Java exception..,Source=Microsoft.DataTransfer.Richfile.OrcTransferPlugin,StackTrace= at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext()
at Microsoft.DataTransfer.Common.Shared.DeserializeControllerBase.GetEstimatedRowSize()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializeController..ctor(DataTable targetSchema, IEnumerable`1 streams, OrcFormatSetting settings, IErrorRowOutput errorRowOutput)
at Microsoft.DataTransfer.ClientLibrary.OrcSerializer.Deserialize(TransferStream stream)
at Microsoft.DataTransfer.Runtime.DeserializationStageProcessor.<Deserialize>d__14.MoveNext()
at Microsoft.DataTransfer.Runtime.TypeConversionStageProcessor.<CreateDataReader>d__5.MoveNext()
at Microsoft.DataTransfer.Runtime.SerializationStageProcessor.<Serialize>d__11.MoveNext()
at Microsoft.DataTransfer.Runtime.BinarySinkStageProcessor.<PopulateStreamName>d__10.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.MultipartWriteSink.ConsumeStreams(IEnumerable`1 streams),''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,StackTrace= at Microsoft.DataTransfer.Richfile.Bridge.BaseObjectBridge.CallObject[TEnum](TEnum methodEnum, jValue[] args)
at Microsoft.DataTransfer.Richfile.Bridge.Orc.OrcBatchReaderBridge.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext(),'
Job ID: daee1a1d-b880-ecb2-e56c-a59397547668
Log ID: Warning
TraceComponentId: TransferClientLibrary
TraceMessageId: TasksCoordinatorFatalErrorCallback
@logId: Warning
jobId: daee1a1d-b880-ecb2-e56c-a59397547668
activityId: c643b611-8356-4f49-b6d6-e87ea50670e5
eventId: TasksCoordinatorFatalErrorCallback
message: 'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.nio.BufferOverflowException:Unable to retrieve Java exception..,Source=Microsoft.DataTransfer.Richfile.OrcTransferPlugin,StackTrace= at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext()
at Microsoft.DataTransfer.Common.Shared.DeserializeControllerBase.GetEstimatedRowSize()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializeController..ctor(DataTable targetSchema, IEnumerable`1 streams, OrcFormatSetting settings, IErrorRowOutput errorRowOutput)
at Microsoft.DataTransfer.ClientLibrary.OrcSerializer.Deserialize(TransferStream stream)
at Microsoft.DataTransfer.Runtime.DeserializationStageProcessor.<Deserialize>d__14.MoveNext()
at Microsoft.DataTransfer.Runtime.TypeConversionStageProcessor.<CreateDataReader>d__5.MoveNext()
at Microsoft.DataTransfer.Runtime.SerializationStageProcessor.<Serialize>d__11.MoveNext()
at Microsoft.DataTransfer.Runtime.BinarySinkStageProcessor.<PopulateStreamName>d__10.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.MultipartWriteSink.ConsumeStreams(IEnumerable`1 streams),''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,StackTrace= at Microsoft.DataTransfer.Richfile.Bridge.BaseObjectBridge.CallObject[TEnum](TEnum methodEnum, jValue[] args)
at Microsoft.DataTransfer.Richfile.Bridge.Orc.OrcBatchReaderBridge.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext(),'
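For what it's worth, one way to check whether the heap setting is being picked up at all is to start a plain JVM on the same VM and print what it sees. The sketch below assumes the environment variable in question is `_JAVA_OPTIONS` (the post above does not name it), and the class name `HeapCheck` is made up for illustration. It only shows what a freshly launched JVM picks up; the integration runtime loads Java through its own bridge, so its effective value may still differ.

```java
// Minimal sketch: report the max heap a freshly started JVM would use.
// Assumption: the heap size is being set through the _JAVA_OPTIONS
// environment variable, e.g. "-Xms256m -Xmx16g".
class HeapCheck {
    /** Maximum heap the current JVM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Formats a byte count as whole mebibytes (rounded down). */
    static String formatMiB(long bytes) {
        return (bytes / (1024L * 1024L)) + " MiB";
    }

    public static void main(String[] args) {
        // Show the raw variable so a missing or misspelled setting is obvious.
        String opts = System.getenv("_JAVA_OPTIONS");
        System.out.println("_JAVA_OPTIONS = " + (opts == null ? "(not set)" : opts));
        System.out.println("Effective max heap = " + formatMiB(maxHeapBytes()));
    }
}
```

If the reported max heap does not match the `-Xmx` value, the variable is probably not set at machine level or is not visible to new processes; if it does match but the pipeline still fails, the override is more likely happening inside the integration runtime itself.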