我正在尝试使用自定义正则表达式信息类型标记字符串值(以表格格式传递),但是当我在表中添加多行时出现问题。如果我通过单行,它会成功标记 string_value 并返回编码的字符串。我正在为此使用python库。
出于演示目的,自定义信息类型当前设置为字符串中的任何值,并且包装密钥存在于云 KMS 中(出于安全原因,在此处将其删除)。
以下是我正在使用的配置:
# Construct FPE configuration dictionary
crypto_replace_ffx_fpe_config = {
"crypto_key": {
"kms_wrapped": {
"wrapped_key": wrapped_key,
"crypto_key_name": key_name,
}
}
}
# Add surrogate type
if surrogate_type:
crypto_replace_ffx_fpe_config["surrogate_info_type"] = {
"name": surrogate_type
}
# Construct inspect configuration dictionary
inspect_config = {
#"info_types": [{"name": info_type} for info_type in info_types],
#"min_likelihood": "VERY_UNLIKELY",
"custom_info_types": [
{
"info_type": {
"name": "custom"
},
"exclusion_type": "EXCLUSION_TYPE_UNSPECIFIED",
"likelihood": "POSSIBLE",
"regex": {
"pattern": "(?:.*)"
#"pattern": ".*"
}
}
]
}
# Construct deidentify configuration dictionary
deidentify_config = {
"info_type_transformations": {
"transformations": [
{
"primitive_transformation": {
"crypto_deterministic_config": crypto_replace_ffx_fpe_config
}
}
]
}
}
item={
"table":{
"headers":[{
"name":header
} for header in data_headers
],
"rows":[
{
"values":[
{
"string_value":"asa s.com"
}
]
}, #Issue starts when the below row is added having any value in string_value
{
"values":
[
{
"string_value":"14562@gmail.com"
}
]
}
]
}
}
# Call the API
response = dlp.deidentify_content(
parent,
inspect_config=inspect_config,
deidentify_config=deidentify_config,
item=item,
)
# Print results
return response.item.table
如果我发送一行数据,得到响应为
headers {
name: "token"
}
rows {
values {
string_value: "EMAIL_ADDRESS(XX):XXXXXXXXXXXXXXXXXXX="
}
}
当我发送多行的项目时,我得到的是我最初发送到 api 的内容:例如:
headers {
name: "token"
}
rows {
values {
string_value: "asa s.com"
}
}
rows {
values {
string_value: "14562@gmail.com"
}
}