I am trying to do 4-bit quantization using this example. First, I get the following warning:

WARNING:tensorflow:AutoGraph could not transform <bound method Default8BitQuantizeConfig.set_quantize_activations of <tensorflow_model_optimization.python.core.quantization.keras.default_8bit.default_8bit_quantize_registry.Default8BitQuantizeConfig object at 0x7fb0208015c0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: expected an indented block (<unknown>, line 14)
WARNING: AutoGraph could not transform <bound method Default8BitQuantizeConfig.set_quantize_activations of <tensorflow_model_optimization.python.core.quantization.keras.default_8bit.default_8bit_quantize_registry.Default8BitQuantizeConfig object at 0x7fb020806550>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: expected an indented block (<unknown>, line 14)

Then, after reading this documentation, I found that I can quantize my network to 4 bits, but I don't understand whether this is only possible for Dense layers or for all layer types (such as Conv2D). (See the Conv2D sketch after my UPD code below.)

I also don't understand how to work with the weights afterwards, since NumPy only works with float32 (there is no 4-bit dtype).
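
My current understanding (just my assumption, not something I found in the tfmot docs) is that low-bit quantization is simulated in float32 ("fake quantization"): values are rounded onto a grid of 2^4 = 16 levels but stored as ordinary float32. A minimal NumPy sketch of what I mean:

import numpy as np

def fake_quantize(x, num_bits=4, x_min=-1.0, x_max=1.0):
    # Round onto (2**num_bits - 1) uniform steps between x_min and x_max,
    # then map back: the result has at most 16 distinct values for num_bits=4,
    # but its dtype is still float32.
    steps = 2 ** num_bits - 1
    scale = (x_max - x_min) / steps
    q = np.round((np.clip(x, x_min, x_max) - x_min) / scale)
    return (q * scale + x_min).astype(np.float32)

w = np.random.uniform(-1, 1, size=(3, 3)).astype(np.float32)
print(fake_quantize(w))

If that is right, it would explain why everything I get back is float32, but I would like to confirm it.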

UPD: I finally figured out how to do quantization-aware training:

import numpy as np
import tensorflow as tf
from tensorflow import keras
import tensorflow_model_optimization as tfmot

LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer

class DefaultDenseQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # Configure how to quantize weights.
    def get_weights_and_quantizers(self, layer):
      return [(layer.kernel, LastValueQuantizer(num_bits=4, symmetric=True, narrow_range=False, per_axis=False))]

    # Configure how to quantize activations.
    def get_activations_and_quantizers(self, layer):
      return [(layer.activation, MovingAverageQuantizer(num_bits=4, symmetric=False, narrow_range=False, per_axis=False))]

    def set_quantize_weights(self, layer, quantize_weights):
      # Add this line for each item returned in `get_weights_and_quantizers`,
      # in the same order.
      layer.kernel = quantize_weights[0]

    def set_quantize_activations(self, layer, quantize_activations):
      # Add this line for each item returned in `get_activations_and_quantizers`,
      # in the same order.
      layer.activation = quantize_activations[0]

    # Configure how to quantize outputs (may be equivalent to activations).
    def get_output_quantizers(self, layer):
      return []

    def get_config(self):
      return {}

QAT_model = tfmot.quantization.keras.quantize_annotate_model(keras.Sequential([
    tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.Dense(2, activation='relu', input_shape=x_train.shape[1:]),
        DefaultDenseQuantizeConfig()),
    tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.Dense(2, activation='relu'), DefaultDenseQuantizeConfig()),
    tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.Dense(10, activation='softmax'), DefaultDenseQuantizeConfig()),
]))

with tfmot.quantization.keras.quantize_scope(
  {'DefaultDenseQuantizeConfig': DefaultDenseQuantizeConfig}):
  # Use `quantize_apply` to actually make the model quantization aware.
  quantized_model = tfmot.quantization.keras.quantize_apply(QAT_model)

quantized_model.summary()

quantized_model.compile(optimizer='adam',                      # a good default optimizer to start with
              loss='sparse_categorical_crossentropy',          # how the "error" is measured; the network minimizes this loss
              metrics=['accuracy'])                            # what to track during training

quantized_model.fit(x_train, y_train, epochs=3)

val_loss, val_acc = quantized_model.evaluate(x_test, y_test)
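
Regarding my question above about other layer types: my assumption is that the same config can be attached to a Conv2D layer, because Conv2D also exposes layer.kernel and layer.activation. A sketch of what I would try (untested, and the input shape is just a placeholder):

# Sketch (my assumption, untested): reuse DefaultDenseQuantizeConfig for a
# Conv2D layer, which also has .kernel and .activation attributes.
conv_annotated = tfmot.quantization.keras.quantize_annotate_model(keras.Sequential([
    tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(28, 28, 1)),
        DefaultDenseQuantizeConfig()),
    tf.keras.layers.Flatten(),
    tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.Dense(10, activation='softmax'),
        DefaultDenseQuantizeConfig()),
]))

with tfmot.quantization.keras.quantize_scope(
        {'DefaultDenseQuantizeConfig': DefaultDenseQuantizeConfig}):
    conv_qat_model = tfmot.quantization.keras.quantize_apply(conv_annotated)

conv_qat_model.summary()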

But I still can't figure out how to access the 4-bit quantized weights. I used np.array( quantized_model.get_weights() ), but of course it gives me float32, and the quantized arrays contain fewer elements than those of the original model. How can this be explained?
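
If it helps, this is how I would expect to be able to inspect them (a sketch of my assumption, not verified): the QAT model apparently keeps the float32 kernels plus extra min/max variables created by the quantizers, and the "4-bit" values only exist as fake-quantized float32 tensors during the forward pass.

# List all variables of the QAT model: kernels/biases are still float32,
# and the quantizers add extra min/max variables next to them.
for v in quantized_model.weights:
    print(v.name, v.shape, v.dtype.name)

# My assumption: the fake-quantized kernel can be reproduced with TensorFlow's
# built-in op tf.quantization.fake_quant_with_min_max_vars -- the result stays
# float32 but is restricted to at most 16 distinct levels for num_bits=4.
# (Assuming the kernel variable's name still contains 'kernel'.)
kernel = next(w for w in quantized_model.trainable_weights if 'kernel' in w.name)
fq = tf.quantization.fake_quant_with_min_max_vars(
    kernel,
    min=tf.reduce_min(kernel),
    max=tf.reduce_max(kernel),
    num_bits=4,
    narrow_range=False)
print(np.unique(fq.numpy()).size)   # expect at most 16 distinct values

Is this roughly what happens internally, or am I missing something?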
