google-cloud-platform - How to use custom info types in a Data Loss Prevention (Google Cloud Platform) de-identification template?

Question

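I am building a PII de-identification application with Data Loss Prevention (GCP), and I use a de-identification template to hold the de-identification rules.

Problem: I cannot figure out how to use a custom info type in a de-identification template.

Here is a sample de-identification template: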

{
  "deidentifyTemplate":{
    "displayName":"Email and id masker",
    "description":"De-identifies emails and ids with a series of asterisks.",
    "deidentifyConfig":{
      "infoTypeTransformations":{
        "transformations":[
          {
            "infoTypes":[
              {
                "name":"EMAIL_ADDRESS"
              }
            ],
            "primitiveTransformation":{
              "characterMaskConfig":{
                "maskingCharacter":"*"
              }
            }
          }
        ]
      }
    }
  }
}

The example above uses a built-in info type (EMAIL_ADDRESS). In the documentation, a custom info type snippet looks like this:

    "inspect_config":{
      "custom_info_types":[
        {
          "info_type":{
            "name":"CUSTOM_ID"
          },
          "regex":{
            "pattern":"[1-9]{2}-[1-9]{4}"
          },
          "likelihood":"POSSIBLE"
        }
      ]
    }
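
To see which strings the CUSTOM_ID pattern above would flag, here is a minimal local sketch using Python's `re` module. It only approximates the behaviour: the DLP service evaluates the regex itself and applies likelihood scoring, neither of which is modelled here, and `mask_custom_ids` is a hypothetical helper name, not part of any DLP API.

```python
import re

# Same pattern as the CUSTOM_ID snippet: two digits 1-9, a hyphen, four digits 1-9.
CUSTOM_ID_PATTERN = re.compile(r"[1-9]{2}-[1-9]{4}")


def mask_custom_ids(text: str, masking_character: str = "*") -> str:
    """Replace every character of each match with masking_character (local approximation only)."""
    return CUSTOM_ID_PATTERN.sub(lambda m: masking_character * len(m.group()), text)


if __name__ == "__main__":
    # "12-3456" matches; "10-2345" does not, because the pattern excludes the digit 0.
    print(mask_custom_ids("Badge 12-3456 belongs to Ann; 10-2345 is not matched."))
```

Note that the character class `[1-9]` excludes zero, so IDs containing a 0 would not be detected with this pattern.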

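The REST documentation for de-identification templates has no valid object definition for inspect_config; it is only valid in inspection templates.

Is it possible to use a custom info type in a de-identification template (infoTypeTransformations)?

Here is the link to the REST documentation.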

score 2 · Accepted Answer

Yes, you can use custom info types. What you need to do is create both a de-identification template and an inspection template.
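
As a rough sketch of the inspection-template half, the function below builds a request dict that could be passed to `DlpServiceClient.create_inspect_template`. The function name, the display name, and the `locations/global` parent are assumptions for illustration; the custom info type itself is copied from the question. The block only builds the payload and does not call the API.

```python
def build_inspect_template_request(project_id: str, template_id: str) -> dict:
    """Build a create_inspect_template request that defines the CUSTOM_ID info type."""
    return {
        "parent": f"projects/{project_id}/locations/global",  # assumed location
        "template_id": template_id,
        "inspect_template": {
            "display_name": "Custom ID inspector",  # illustrative name
            "inspect_config": {
                "custom_info_types": [
                    {
                        "info_type": {"name": "CUSTOM_ID"},
                        "regex": {"pattern": "[1-9]{2}-[1-9]{4}"},
                        "likelihood": "POSSIBLE",
                    }
                ]
            },
        },
    }


# Possible usage (requires credentials, so it is left commented out):
# from google.cloud import dlp_v2
# request = build_inspect_template_request("<project>", "<templateId>")
# dlp_v2.DlpServiceClient().create_inspect_template(request=request)
```

The de-identification template would then presumably list `{"name": "CUSTOM_ID"}` under `infoTypes`, in the same place where the question's template lists `EMAIL_ADDRESS`.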

Then, when you call the API, pass both templates as parameters. Using the DLP client library for Python, here is some sample pseudocode:

from google.cloud import dlp_v2

dlp_client = dlp_v2.DlpServiceClient()

# The request is a dict, so keys are strings followed by colons.
response = dlp_client.deidentify_content(
    request={
        "inspect_template_name": "projects/<project>/locations/global/inspectTemplates/<templateId>",
        "deidentify_template_name": "projects/<project>/locations/global/deidentifyTemplates/<templateId>",
        "parent": "<parent>",
        "item": {"value": "<item>"},  # "<item>" stands for the text to de-identify
    }
)
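
The call above can also be assembled from plain values. The sketch below is one way to build the request dict; `build_deidentify_request` is a hypothetical helper, and the `locations/global` parent is assumed from the template paths in the answer. It only constructs the payload, so it runs without credentials.

```python
def build_deidentify_request(
    project_id: str,
    inspect_template_id: str,
    deidentify_template_id: str,
    text: str,
) -> dict:
    """Build a deidentify_content request that references both templates by name."""
    parent = f"projects/{project_id}/locations/global"  # assumed location
    return {
        "parent": parent,
        "inspect_template_name": f"{parent}/inspectTemplates/{inspect_template_id}",
        "deidentify_template_name": f"{parent}/deidentifyTemplates/{deidentify_template_id}",
        "item": {"value": text},
    }


# Possible usage (requires credentials, so it is left commented out):
# from google.cloud import dlp_v2
# request = build_deidentify_request("<project>", "<templateId>", "<templateId>", "id 12-3456")
# response = dlp_v2.DlpServiceClient().deidentify_content(request=request)
# print(response.item.value)
```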

score 1 · Accepted Answer

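We can use custom info types in a de-identification template by using stored info types.

We can create a stored info type with an API call, and the stored info type can then be referenced just like a built-in info type.

Creating a stored info type

A few global variables and dependencies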

import os

from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
default_project = os.environ['GOOGLE_PROJECT']  # project id
parent = f"projects/{default_project}"

# details of custom info types
custom_info_id = "<unique-id>" # example: IP_ADDRESS
custom_info_id_pattern = r"<regex pattern>"

Building the request payload

info_config = {
    "display_name": custom_info_id,
    "description": custom_info_id,
    "regex": {
        "pattern": custom_info_id_pattern
    }
}

Making the API call

response = dlp.create_stored_info_type(request={
    "parent": parent,
    "config": info_config,
    "stored_info_type_id": custom_info_id
})

How to reference the stored info type

Use the stored_info_type_id as the info type name in the de-identification template:

          {
            "info_types":[
              {
                "name":"IP_ADDRESS"  # the stored_info_type_id defined above
              }
            ],
            "primitive_transformation":{
              "character_mask_config":{
                "characters_to_ignore":[
                  {
                    "characters_to_skip":"."
                  }
                ],
                "masking_character":"*"
              }
            }
          },
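
For context, the fragment above is one entry in the template's list of transformations. A minimal sketch of a full request for `DlpServiceClient.create_deidentify_template` might look like the following; the function name, display name, and template ID are assumptions, and the block only builds the payload without calling the API.

```python
def build_deidentify_template_request(
    project_id: str, template_id: str, info_type_name: str
) -> dict:
    """Build a create_deidentify_template request that masks one info type."""
    return {
        "parent": f"projects/{project_id}",
        "template_id": template_id,
        "deidentify_template": {
            "display_name": f"{info_type_name} masker",  # illustrative name
            "deidentify_config": {
                "info_type_transformations": {
                    "transformations": [
                        {
                            # The name is the stored_info_type_id used at creation time.
                            "info_types": [{"name": info_type_name}],
                            "primitive_transformation": {
                                "character_mask_config": {
                                    "masking_character": "*",
                                    # Leave the dots between octets unmasked.
                                    "characters_to_ignore": [
                                        {"characters_to_skip": "."}
                                    ],
                                }
                            },
                        }
                    ]
                }
            },
        },
    }


# Possible usage (requires credentials, so it is left commented out):
# from google.cloud import dlp_v2
# request = build_deidentify_template_request("<project>", "<templateId>", "IP_ADDRESS")
# dlp_v2.DlpServiceClient().create_deidentify_template(request=request)
```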
