I am working through the Google Cloud DLP API documentation provided here, and this question is specifically about deidentify_with_fpe().

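My question is: what format do the arguments need to be in when passed to the function so that it returns de-identified data? My current code is: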

def deidentify_with_fpe(
    string,
    info_types,
    alphabet=1,
    project='XXXX-data-development',
    surrogate_type=None,
    key_name='projects/XXXX-data-development/locations/global/keyRings/google-dlp-test-global/cryptoKeys/google-dlp-test-key-global',
    wrapped_key=WRAPPED
):
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string using Format Preserving Encryption (FPE).
    Args:
        string: The string to deidentify (will be treated as text).
        info_types: A list of info type names to look for, e.g.
            ['FIRST_NAME'].
        alphabet: The set of characters to replace sensitive ones with. For
            more information, see https://cloud.google.com/dlp/docs/reference/
            rest/v2beta2/organizations.deidentifyTemplates#ffxcommonnativealphabet
        project: The Google Cloud project id to use as a parent resource.
        surrogate_type: The name of the surrogate custom info type to use. Only
            necessary if you want to reverse the deidentification process. Can
            be essentially any arbitrary string, as long as it doesn't appear
            in your dataset otherwise.
        key_name: The name of the Cloud KMS key used to encrypt ('wrap') the
            AES-256 key. Example:
            key_name = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/
            keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME'
        wrapped_key: The encrypted ('wrapped') AES-256 key to use. This key
            should be encrypted using the Cloud KMS key specified by key_name.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Note: the wrapped key is read in from a file (see WRAPPED)
    # Import the client library
    import google.cloud.dlp_v2

    # Instantiate a client from the service account key file
    dlp = google.cloud.dlp_v2.DlpServiceClient.from_service_account_json(
        '/Users/callumsmyth/virtual_envs/google_dlp_test/XXXX.json'
    )

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # The library expects the wrapped key as a binary string. WRAPPED is read
    # from the file in binary mode, so the base64 decode is left disabled.
    import base64

    # wrapped_key = base64.b64decode(wrapped_key)

    # Construct FPE configuration dictionary
    crypto_replace_ffx_fpe_config = {
        "crypto_key": {
            "kms_wrapped": {
                "wrapped_key": wrapped_key,
                "crypto_key_name": key_name,
            }
        },
        "common_alphabet": alphabet,
    }

    # Add surrogate type
    if surrogate_type:
        crypto_replace_ffx_fpe_config["surrogate_info_type"] = {
            "name": surrogate_type
        }

    # Construct inspect configuration dictionary
    inspect_config = {
        "info_types": [{"name": info_type} for info_type in info_types]
    }

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "crypto_replace_ffx_fpe_config": crypto_replace_ffx_fpe_config
                    }
                }
            ]
        }
    }

    # Convert string to item
    item = {"value": string}

    # Call the API
    response = dlp.deidentify_content(
        parent,
        inspect_config=inspect_config,
        deidentify_config=deidentify_config,
        item=item,
    )

    # Print results
    print(response.item.value)

where WRAPPED is defined as:

with open('mysecret.txt.encrypted', 'rb') as f:
    WRAPPED = f.read()

and mysecret.txt.encrypted was generated in the terminal by this command:

--keyring google-dlp-test-global --key google-dlp-test-key-global \
--plaintext-file google-token.txt \
--ciphertext-file mysecret.txt.encrypted

where google-token.txt was generated from here.

The error I get when calling deidentify_with_fpe('My name is john smith', ['FIRST_NAME']) is as follows:

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "Could not de-identify all content due to transformation errors. See the error details for an overview of all the transformation errors encountered."
    debug_error_string = "{"created":"@1581675678.839972000","description":"Error received from peer ipv4:216.58.213.10:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Could not de-identify all content due to transformation errors. See the error details for an overview of all the transformation errors encountered.","grpc_status":3}"

The above exception was the direct cause of the following exception:

InvalidArgument: 400 Could not de-identify all content due to transformation errors. See the error details for an overview of all the transformation errors encountered.

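So I think my problem is related to the key, as it was before being encrypted. I cannot see in the documentation how to obtain that key or how to pass it into the function.

I appreciate that this is a long post, and any reply would be much appreciated. I have spent far too long trying to get this working and feel I am close.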

3 Answers

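Error: "google.api_core.exceptions.InvalidArgument: 400 Could not de-identify all content due to transformation errors. See the error details for an overview of all the transformation errors encountered."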

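This is a general error raised when de-identification of free-form text fails because of one or more transformation errors. Unfortunately, the Python library does not appear to expose the error details.

According to the service documentation [1], the detected token must be at least two characters long: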

The input value:

- Must be at least two characters long (or the empty string).
- Must be encoded as ASCII.
- Comprised of the characters specified by an "alphabet," which is the set of between 2 and 64 allowed characters in the input value. For more information, see the alphabet field in CryptoReplaceFfxFpeConfig.


[1] https://cloud.google.com/dlp/docs/transformations-reference#fpe
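
Since the client does not surface the per-transformation errors, one option is to check candidate values locally against the three rules quoted above before calling the API. The following is only a minimal sketch: `fpe_input_problems` is a hypothetical helper, not part of the DLP library, and it assumes the alphabet is available as a plain string of allowed characters.

```python
import string


def fpe_input_problems(value: str, alphabet: str) -> list[str]:
    """Return the quoted FPE input rules that `value` appears to violate.

    Hypothetical helper for local debugging; the DLP service remains the
    authority on what it accepts.
    """
    problems = []
    # Rule 1: at least two characters long, or the empty string.
    if len(value) == 1:
        problems.append("must be at least two characters long (or empty)")
    # Rule 2: must be encoded as ASCII.
    if not value.isascii():
        problems.append("must be encoded as ASCII")
    # Rule 3: every character must belong to the configured alphabet.
    outside = sorted(set(value) - set(alphabet))
    if outside:
        problems.append("characters outside the alphabet: " + "".join(outside))
    return problems


# A name checked against a digits-only alphabet, then two numeric values.
print(fpe_input_problems("john", string.digits))
print(fpe_input_problems("4", string.digits))
print(fpe_input_problems("1234", string.digits))
```

An empty list means none of the three quoted rules is violated; it does not guarantee that the service will accept the value.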

answered 2020-03-04T23:51:01.803

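Change alphabet to one of the following instead of 1:

Comprised of the characters specified by an alphabet. Valid options:

  • NUMERIC
  • HEXADECIMAL
  • UPPER_CASE_ALPHA_NUMERIC
  • ALPHA_NUMERIC

The input value:

  • Must be at least two characters long (or the empty string).
  • Must be comprised of the characters specified by an alphabet. The alphabet can be comprised of between 2 and 95 characters. (An alphabet with 95 characters includes all printable characters in the US-ASCII character set.)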
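
To see which of the four options could cover a given value, the character sets can be compared locally. This is a rough sketch under stated assumptions: the character sets below are my reading of the FfxCommonNativeAlphabet reference ([0-9], [0-9A-F], [0-9A-Z], [0-9A-Za-z]), and `smallest_common_alphabet` is a hypothetical helper. If the numeric value 1 corresponds to NUMERIC, as I believe the enum defines it, a detected first name such as "john" would fall outside that alphabet.

```python
import string

# Assumed character sets for the four common alphabets, smallest first.
COMMON_ALPHABETS = {
    "NUMERIC": string.digits,
    "HEXADECIMAL": string.digits + "ABCDEF",
    "UPPER_CASE_ALPHA_NUMERIC": string.digits + string.ascii_uppercase,
    "ALPHA_NUMERIC": string.digits + string.ascii_uppercase + string.ascii_lowercase,
}


def smallest_common_alphabet(value):
    """Return the name of the first (smallest) alphabet covering `value`.

    Returns None when no common alphabet covers it, in which case a custom
    alphabet would be needed.
    """
    for name, chars in COMMON_ALPHABETS.items():
        if set(value) <= set(chars):
            return name
    return None


for sample in ("12345", "DEADBEEF", "JOHN", "john", "john smith"):
    print(sample, "->", smallest_common_alphabet(sample))
```

The returned name can be passed as the `alphabet` argument in place of 1. A value containing a space, such as "john smith", is not covered by any of the four sets.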

answered 2021-05-26T03:10:13.407

If your input has the format 111-222-333, then your custom alphabet field should be: "customAlphabet": "-0123456789"
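
As a sketch of how that could look in the dictionary-style configuration used in the question: the snippet below derives the alphabet from sample inputs and places it under `custom_alphabet`, which I am assuming is the snake_case form of `customAlphabet` that the Python client expects. Both helper functions are hypothetical, and the key values in the usage lines are placeholders.

```python
import string


def build_custom_alphabet(samples, base=string.digits):
    """Combine a base character set with every character seen in samples."""
    chars = set(base)
    for sample in samples:
        chars.update(sample)
    # Sort for a stable, readable alphabet string.
    return "".join(sorted(chars))


def fpe_config_with_custom_alphabet(wrapped_key, key_name, custom_alphabet):
    """Build an FPE config dict that uses a custom alphabet.

    The custom alphabet takes the place of "common_alphabet"; the two are
    alternatives, so only one is set here.
    """
    return {
        "crypto_key": {
            "kms_wrapped": {
                "wrapped_key": wrapped_key,
                "crypto_key_name": key_name,
            }
        },
        "custom_alphabet": custom_alphabet,
    }


alphabet = build_custom_alphabet(["111-222-333"])
print(alphabet)

# Placeholder key material, for illustration only.
config = fpe_config_with_custom_alphabet(b"<wrapped key bytes>", "<kms key name>", alphabet)
print(sorted(config))
```

Including all ten digits in the base set, as the answer does, means values with digits not present in the samples still fit the alphabet.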

answered 2021-12-06T13:50:26.083