
My current task is to perform binary classification using numeric and ordinal categorical data.

For this task, I have effectively copied the code from the following TensorFlow tutorial: https://www.tensorflow.org/tutorials/structured_data/feature_columns

At the moment, I am struggling to work out why the following error occurs and how to fix it.

Error:

ValueError: slice index 0 of dimension 0 out of bounds. for 'strided_slice' (op: 'StridedSlice') with input shapes: [0], [1], [1], [1] and with computed input tensors: input[1] = <0>, input[2] = <1>, input[3] = <1>.

Code:

import tensorflow as tf
import pandas as pd
from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# A utility method to create a tf.data dataset from a Pandas Dataframe
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop('target').astype('float64')
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
        ds = ds.batch(batch_size)
    return ds

scaler = StandardScaler()
vals = ["Low", "Med", "High"]


# Categorical data preparation 
# Creativity, Productivity, Optimism, Pessimism is ordinal categorical --> numerical
for c in features[2:]:
    if c != "Creativity":
        data[c] = pd.Categorical(data[c], categories = vals, ordered = True)
        data[c] = data[c].cat.codes / 2
        
    else:
        data[c] = pd.Categorical(data[c], categories = ["No", "Yes"], ordered = False)
        data[c] = data[c].cat.codes.astype('float64')
        
data.loc[:, ["Social", "Exercise"]] = scaler.fit_transform(X = data.loc[:, ["Social", "Exercise"]].values)

# Splitting Data
train, test = train_test_split(data, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)

# Tried reducing batch size as per a solution to a similar StackOverflow query 
batch_size = 2
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)


feature_columns = []

# numeric cols
for header in features:
    feature_columns.append(feature_column.numeric_column(header))

# This is to concatenate all of these features for each example into single vectors
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)

# Model Architecture
model = tf.keras.Sequential([
  feature_layer,
  layers.Dense(128, activation='relu'),
  layers.Dense(128, activation='relu'),
  layers.Dropout(.1),
  layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(train_ds,validation_data=val_ds, epochs=10)

Data:

Every feature: ['Creativity', 'Productivity', 'Optimism', 'Pessimism']
A batch of Creativity: tf.Tensor([0. 0.], shape=(2,), dtype=float64)
A batch of Productivity: tf.Tensor([0.5 0.5], shape=(2,), dtype=float64)
A batch of Optimism: tf.Tensor([-0.5 -0.5], shape=(2,), dtype=float64)
A batch of Pessimism: tf.Tensor([-0.5 -0.5], shape=(2,), dtype=float64)
A batch of targets: tf.Tensor([0. 0.], shape=(2,), dtype=float64)

Any help in understanding where this error occurs and how to resolve it would be greatly appreciated!

**Edit:** After running this in Google Colaboratory, I get this error when `model.fit()` runs:

    ValueError: Feature (key: Creativity) cannot have rank 0. Given: Tensor("sequential_2/Cast:0", shape=(), dtype=float32)

1 Answer


So, after doing some research, I found that this error is caused by `ds = ds.batch(batch_size)`, which was not indented correctly: it sat inside the `if shuffle:` block, so with `shuffle=False` the dataset was never batched at all.

The `df_to_dataset` function should be written as follows (I must have copied it from the tutorial incorrectly):

# A utility method to create a tf.data dataset from a Pandas Dataframe
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop('target').astype('float64')
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds
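
To see why the indentation matters: an unbatched `tf.data.Dataset` built with `from_tensor_slices` yields each feature as a rank-0 scalar, while `DenseFeatures` requires at least rank 1, hence the "cannot have rank 0" error. A minimal sketch (using a tiny made-up DataFrame, not the asker's data) that compares the element shapes before and after `.batch()`:

```python
import tensorflow as tf
import pandas as pd

# Hypothetical two-row frame just to inspect element shapes
df = pd.DataFrame({"Creativity": [0.0, 1.0], "target": [0.0, 1.0]})
labels = df.pop("target")
ds = tf.data.Dataset.from_tensor_slices((dict(df), labels))

unbatched = next(iter(ds))          # one example at a time
batched = next(iter(ds.batch(2)))   # one batch of two examples

print(unbatched[0]["Creativity"].shape)  # () -> rank 0, triggers the ValueError
print(batched[0]["Creativity"].shape)    # (2,) -> rank 1, what DenseFeatures expects
```

With the fix, `ds.batch(batch_size)` runs unconditionally, so validation and test datasets (created with `shuffle=False`) are batched too.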

answered 2020-08-10T12:54:57.720