tensorflow - GMM.fit 分段错误

Question

我现在正在尝试在我的实验中使用 GMM。但我有以下问题。我对这个错误感到很困惑。

import tensorflow as tf
class GMMDataLoader:
    def __init__(self, points, batch_size):
        self.points = points
        self.batch_size = batch_size
        num_points = points.shape[0]
        self.num_points = num_points
        dim = points.shape[1]
        self.count = 0
        #self.x = tf.constant(self.points)
        print ('Loaded in a total of %d points, the dimension is %d'%(num_points, dim))


    def next_batch(self, batch_size=128):
        self.count += 1
        count = self.count + 1
        print ('batch [%d]'%count) 
        num_points = self.num_points
        x = tf.constant(self.points)
        indices = tf.random_uniform(tf.constant([batch_size]),
                                          minval=0, maxval=num_points-1,
                                          dtype=tf.int32,
                                          seed=10)
        return tf.gather(x, indices), None


import numpy as np
x = np.random.random((10000, 2048)).astype('float32')

loader = GMMDataLoader(x, 128)

gmm_model = tf.contrib.factorization.GMM(initial_clusters='random', num_clusters=100, random_seed=666)

gmm_model.fit(input_fn=loader.next_batch)

当我运行此代码时，出现以下错误：

Loaded in a total of 10000 points, the dimension is 2048
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp19vzg37k
WARNING:tensorflow:From /u/usr/usr/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/contrib/factorization/python/ops/gmm_ops.py:59: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /u/usr/usr/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/contrib/factorization/python/ops/gmm_ops.py:353: calling reduce_logsumexp (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /u/usr/usr/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/contrib/factorization/python/ops/gmm_ops.py:377: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /u/usr/usr/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/contrib/factorization/python/ops/gmm.py:170: get_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_global_step
2018-01-21 13:25:54.515678: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-01-21 13:25:55.440734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1062] Found device 0 with properties:
name: Tesla P100-SXM2-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.4805
pciBusID: 0000:89:00.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-01-21 13:25:56.339431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1062] Found device 1 with properties:
name: Tesla P100-SXM2-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.4805
pciBusID: 0000:8a:00.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-01-21 13:25:56.339489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1077] Device peer to peer matrix
2018-01-21 13:25:56.339527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] DMA: 0 1
2018-01-21 13:25:56.339536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] 0:   Y Y
2018-01-21 13:25:56.339543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1093] 1:   Y Y
2018-01-21 13:25:56.339564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1152] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0000:89:00.0, compute capability: 6.0)
2018-01-21 13:25:56.339574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1152] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla P100-SXM2-16GB, pci bus id: 0000:8a:00.0, compute capability: 6.0)
2018-01-21 13:28:44.093288: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x564187890370
Segmentation fault

任何人都知道如何解决它？

score 1 · Accepted Answer

这是由于在数据样本和高斯之间的距离计算中分配了过多的内存。

已提交修复程序，很快将可用。如果您愿意，可以通过在此处注释掉它来禁用 Tensorflow 图的该部分：

https://github.com/tensorflow/tensorflow/blob/3d86d8ce14989ca65a59ad4cf37f690694bf6267/tensorflow/contrib/factorization/python/ops/gmm_ops.py#L443

tensorflow - GMM.fit 分段错误

1 回答 1

Related

Reference