google-bigquery - How to write a de-identification template in Google BigQuery

Question

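I am trying to de-identify certain columns of a CSV file in Google Cloud Services. The CSV file has 10 columns, including ID, FirstName, LastName, DOB, and so on. I want to mask the FirstName and LastName fields by replacing their contents with * characters.

I read about the process of writing a de-identification template from this link.

I am trying to use a record transformation to mask only the first-name and last-name fields, but I get an array-out-of-bounds error when running the job.

Do I have to list every column in the de-identification template, or only the fields I need to mask?

The CSV file looks like this: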

ID   FirstName  LastName  D_O_B      Facility  EncounterNum  EncounterDate  EncounterTime            visitNum
101  Sean       John      8/27/1968  LI        333           4/8/2016       2018-09-02 13:00:00 UTC  1
501  bla        bla       7/13/1947  LI        337           3/14/2016      2018-09-03 21:05:00 UTC  67
851  Julius     Caesar    8/15/1988  LI        339           5/17/2016      2018-09-03 21:25:00 UTC  89

The de-identification template I am using is as follows:

{
  "deidentifyTemplate": {
    "description": "Record transformation on Names trial",
    "deidentifyConfig": {
      "recordTransformations": {
        "fieldTransformations": [
          {
            "fields": [
              {
                "name": "FirstName"
              },
              {
                "name": "LastName"
              }
            ],
            "primitiveTransformation": {
              "characterMaskConfig": {
                "maskingCharacter": "*"
              }
            }
          }
        ]
      }
    }
  }
}

I expect the output to be a table in BigQuery with the FirstName and LastName columns masked. Instead, I get an array-out-of-bounds error.
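For a quick check outside of a job, the record transformation above could be paired with a small inline table built from the CSV. The sketch below is only an illustration: `build_table_request` and `SAMPLE` are hypothetical names, the sample is trimmed to four columns, the tab delimiter is an assumption, and the `item.table` layout (`headers`, `rows`, `stringValue`) is my understanding of the Cloud DLP table content item rather than something shown in the question. The row-length check is just one thing worth ruling out; nothing here establishes what actually caused the error.

```python
import csv
import io
import json

# Trimmed sample from the question; the tab delimiter is an assumption.
SAMPLE = (
    "ID\tFirstName\tLastName\tD_O_B\n"
    "101\tSean\tJohn\t8/27/1968\n"
    "851\tJulius\tCaesar\t8/15/1988\n"
)


def build_table_request(text, fields_to_mask, mask_char="*", delimiter="\t"):
    """Build a request body: a table item plus a record transformation."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]

    # The fields named in the transformation must exist in the header.
    missing = [f for f in fields_to_mask if f not in header]
    if missing:
        raise ValueError(f"fields not found in header: {missing}")

    # Every data row should have exactly one value per header.
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no} has {len(row)} values, expected {len(header)}"
            )

    return {
        "item": {
            "table": {
                "headers": [{"name": h} for h in header],
                "rows": [
                    {"values": [{"stringValue": v} for v in row]} for row in data
                ],
            }
        },
        "deidentifyConfig": {
            "recordTransformations": {
                "fieldTransformations": [
                    {
                        # Only the fields to be masked are listed here.
                        "fields": [{"name": f} for f in fields_to_mask],
                        "primitiveTransformation": {
                            "characterMaskConfig": {"maskingCharacter": mask_char}
                        },
                    }
                ]
            }
        },
    }


request = build_table_request(SAMPLE, ["FirstName", "LastName"])
print(json.dumps(request, indent=2))
```

Building the table from the parsed header keeps the field names in the transformation and the column names in the data in one place, so a typo such as `Lastname` fails locally instead of inside the job.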

score 0 · Accepted Answer

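I am not sure about the role of the DLP API here, but I tried the following de-identification configuration and it worked for me. Use the Cloud DLP API with the following request body: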

{
  "item": {
    "value": "My name is John Doe and I live nowhere."
  },
  "inspectConfig": {
    "includeQuote": true,
    "infoTypes": [
      {
        "name": "FIRST_NAME"
      },
      {
        "name": "LAST_NAME"
      }
    ]
  },
  "deidentifyConfig": {
    "infoTypeTransformations": {
      "transformations": [
        {
          "infoTypes": [
            {
              "name": "FIRST_NAME"
            },
            {
              "name": "LAST_NAME"
            }
          ],
          "primitiveTransformation": {
            "characterMaskConfig": {
              "maskingCharacter": "*"
            }
          }
        }
      ]
    }
  }
}
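The same request body can be generated from code rather than written by hand. This is a minimal sketch using only the standard library; `build_mask_request` is a hypothetical helper name, and the URL and authentication needed to actually send the body are not shown in the answer, so they are left out here.

```python
import json


def build_mask_request(text, info_types, mask_char="*"):
    """Build a de-identify request that masks the given infoTypes in free text."""
    return {
        "item": {"value": text},
        "inspectConfig": {
            "includeQuote": True,
            "infoTypes": [{"name": n} for n in info_types],
        },
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [
                    {
                        # Apply one character mask to all listed infoTypes.
                        "infoTypes": [{"name": n} for n in info_types],
                        "primitiveTransformation": {
                            "characterMaskConfig": {"maskingCharacter": mask_char}
                        },
                    }
                ]
            }
        },
    }


body = build_mask_request(
    "My name is John Doe and I live nowhere.",
    ["FIRST_NAME", "LAST_NAME"],
)
payload = json.dumps(body, indent=2)
print(payload)
```

Note that this configuration uses `infoTypeTransformations` on free text, whereas the question uses `recordTransformations` on named columns; the helper only reproduces the former.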

Result:

{
  "item": {
    "value": "My name is **** *** and I live nowhere."
  },
  "overview": {
    "transformedBytes": "7",
    "transformationSummaries": [
      {
        "infoType": {
          "name": "FIRST_NAME"
        },
        "transformation": {
          "characterMaskConfig": {
            "maskingCharacter": "*"
          }
        },
        "results": [
          {
            "count": "1",
            "code": "SUCCESS"
          }
        ],
        "transformedBytes": "4"
      },
      {
        "infoType": {
          "name": "LAST_NAME"
        },
        "transformation": {
          "characterMaskConfig": {
            "maskingCharacter": "*"
          }
        },
        "results": [
          {
            "count": "1",
            "code": "SUCCESS"
          }
        ],
        "transformedBytes": "3"
      }
    ]
  }
}
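The overview in the response can be sanity-checked programmatically. The sketch below parses a trimmed copy of the response above and confirms that the per-infoType byte counts add up to the overall total; `summarize` is a hypothetical helper name, and the numeric fields are converted with `int()` because they arrive as JSON strings.

```python
import json

# Trimmed copy of the response shown above (transformation details omitted).
RESPONSE = """
{
  "item": {"value": "My name is **** *** and I live nowhere."},
  "overview": {
    "transformedBytes": "7",
    "transformationSummaries": [
      {"infoType": {"name": "FIRST_NAME"},
       "results": [{"count": "1", "code": "SUCCESS"}],
       "transformedBytes": "4"},
      {"infoType": {"name": "LAST_NAME"},
       "results": [{"count": "1", "code": "SUCCESS"}],
       "transformedBytes": "3"}
    ]
  }
}
"""


def summarize(response):
    """Return (total transformed bytes, bytes per infoType, all results OK)."""
    overview = response["overview"]
    summaries = overview["transformationSummaries"]
    per_type = {
        s["infoType"]["name"]: int(s["transformedBytes"]) for s in summaries
    }
    all_ok = all(
        r["code"] == "SUCCESS" for s in summaries for r in s["results"]
    )
    return int(overview["transformedBytes"]), per_type, all_ok


total, per_type, all_ok = summarize(json.loads(RESPONSE))
print(total, per_type, all_ok)
```

Here "John" accounts for 4 masked bytes and "Doe" for 3, which matches the overall count of 7.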
