我正在尝试进行 4 位量化并使用此示例 首先我收到以下警告:
WARNING:tensorflow:AutoGraph could not transform <bound method Default8BitQuantizeConfig.set_quantize_activations of <tensorflow_model_optimization.python.core.quantization.keras.default_8bit.default_8bit_quantize_registry.Default8BitQuantizeConfig object at 0x7fb0208015c0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: expected an indented block (<unknown>, line 14)
WARNING: AutoGraph could not transform <bound method Default8BitQuantizeConfig.set_quantize_activations of <tensorflow_model_optimization.python.core.quantization.keras.default_8bit.default_8bit_quantize_registry.Default8BitQuantizeConfig object at 0x7fb020806550>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: expected an indented block (<unknown>, line 14)
比阅读此文档后,我发现可以将我的网络量化为 4 位,但我不明白是否只有密集层或所有(如 Conv2D)有可能?
我也不明白如何使用权重,因为 numpy 只能使用 float32。
UPD:我终于弄清楚如何进行量化意识训练
LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer
class DefaultDenseQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
# Configure how to quantize weights.
def get_weights_and_quantizers(self, layer):
return [(layer.kernel, LastValueQuantizer(num_bits=4, symmetric=True, narrow_range=False, per_axis=False))]
# Configure how to quantize activations.
def get_activations_and_quantizers(self, layer):
return [(layer.activation, MovingAverageQuantizer(num_bits=4, symmetric=False, narrow_range=False, per_axis=False))]
def set_quantize_weights(self, layer, quantize_weights):
# Add this line for each item returned in `get_weights_and_quantizers`
# , in the same order
layer.kernel = quantize_weights[0]
def set_quantize_activations(self, layer, quantize_activations):
# Add this line for each item returned in `get_activations_and_quantizers`
# , in the same order.
layer.activation = quantize_activations[0]
# Configure how to quantize outputs (may be equivalent to activations).
def get_output_quantizers(self, layer):
return []
def get_config(self):
return {}
QAT_model = tfmot.quantization.keras.quantize_annotate_model( keras.Sequential([
tfmot.quantization.keras.quantize_annotate_layer( tf.keras.layers.Dense(2, activation='relu', input_shape= x_train.shape[1:]), DefaultDenseQuantizeConfig() ),
tfmot.quantization.keras.quantize_annotate_layer( tf.keras.layers.Dense(2, activation='relu'), DefaultDenseQuantizeConfig() ),
tfmot.quantization.keras.quantize_annotate_layer( tf.keras.layers.Dense(10, activation='softmax'), DefaultDenseQuantizeConfig() )
]) )
with tfmot.quantization.keras.quantize_scope(
{'DefaultDenseQuantizeConfig': DefaultDenseQuantizeConfig}):
# Use `quantize_apply` to actually make the model quantization aware.
quantized_model = tfmot.quantization.keras.quantize_apply(QAT_model)
quantized_model.summary()
quantized_model.compile(optimizer='adam', # Good default optimizer to start with
loss='sparse_categorical_crossentropy', # how will we calculate our "error." Neural network aims to minimize loss.
metrics=['accuracy']) # what to track
quantized_model.fit(x_train, y_train, epochs=3)
val_loss, val_acc = quantized_model.evaluate(x_test, y_test)
但我仍然无法理解如何访问 4 位量化权重。我使用np.array( quantized_model.get_weights() )
但当然它给了我 float32 而且量化数组中的元素数量少于原始模型中的元素数量。这怎么解释?