When I run generate_statistics_from_tfrecord on Dataflow, it currently fails with this exception:
File "/usr/local/lib/python3.6/dist-packages/tensorflow_data_validation/statistics/stats_impl.py", line 687, in extract_output
accumulator.partial_accumulators))
File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/statistics/stats_impl.py", line 589, in _for_each_generator
self._generators, zip(*args))]
File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/statistics/stats_impl.py", line 588, in <listcomp>
return [func(gen, *args_for_func) for gen, args_for_func in zip(
File "/usr/local/lib/python3.6/dist-packages/tensorflow_data_validation/statistics/stats_impl.py", line 686, in <lambda>
self._for_each_generator(lambda gen, acc: gen.extract_output(acc),
File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/statistics/generators/basic_stats_generator.py", line 844, in extract_output
self._weight_feature is not None)
File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/statistics/generators/basic_stats_generator.py", line 565, in _make_feature_stats_proto
num_quantiles_histogram_buckets, has_weights)
File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/statistics/generators/basic_stats_generator.py", line 421, in _make_numeric_stats_proto
quantiles, total_num_values, num_histogram_buckets)
File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/utils/quantiles_util.py", line 188, in generate_equi_width_histogram
list(quantiles), total_count, num_buckets)
File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/utils/quantiles_util.py", line 252, in generate_equi_width_buckets
(quantiles[curr_index] - quantiles[curr_index - 1]))
IndexError: list index out of range
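Our other datasets don't have this problem, so I assume one of our features contains bad data. What kinds of malformed data could cause this?
Thanks for your help!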
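One way to narrow down the suspect feature is to scan the decoded examples for values that a numeric histogram step would likely choke on. The sketch below assumes, without confirmation from the traceback, that non-finite floats (NaN or +/-infinity) are a plausible trigger for the failure in generate_equi_width_buckets. The helper name `find_nonfinite_features` and its input format (each example already decoded into a dict of feature name to list of numbers) are hypothetical. Decoding the TFRecord files themselves (for instance with `tf.train.Example`) is left out.

```python
import math
from typing import Dict, Iterable, List, Union

Number = Union[int, float]


def find_nonfinite_features(
    examples: Iterable[Dict[str, List[Number]]]
) -> Dict[str, int]:
    """Count, per feature, how many values are NaN or +/-infinity.

    Hypothetical debugging helper: assumes each example has already been
    decoded into a dict mapping feature name -> list of numeric values.
    Features with no non-finite values are omitted from the result.
    """
    counts: Dict[str, int] = {}
    for example in examples:
        for name, values in example.items():
            # math.isfinite is False for NaN, inf and -inf; True for ints.
            bad = sum(1 for v in values if not math.isfinite(v))
            if bad:
                counts[name] = counts.get(name, 0) + bad
    return counts


if __name__ == "__main__":
    sample = [
        {"age": [34.0], "income": [float("nan")]},
        {"age": [29.0], "income": [52000.0], "score": [float("inf")]},
        {"age": [41.0], "income": [float("nan")]},
    ]
    print(find_nonfinite_features(sample))  # {'income': 2, 'score': 1}
```

If a feature shows up in the result, dropping or imputing those values before running the statistics job would test whether they are the actual cause.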