I am calling the DLP API to mask person names and email addresses in text, using the following request:
Request
{
  "item": {
    "value": "Eleanor Rigby\nPharmacist\neleanor.rigby@example.com"
  },
  "deidentifyConfig": {
    "infoTypeTransformations": {
      "transformations": [
        {
          "infoTypes": [ { "name": "EMAIL_ADDRESS" } ],
          "primitiveTransformation": {
            "characterMaskConfig": {
              "maskingCharacter": "#",
              "reverseOrder": false,
              "charactersToIgnore": [
                {
                  "charactersToSkip": ".@"
                }
              ]
            }
          }
        },
        {
          "infoTypes": [ { "name": "PERSON_NAME" } ],
          "primitiveTransformation": {
            "replaceConfig": {
              "newValue": {
                "stringValue": "(person)"
              }
            }
          }
        }
      ]
    }
  },
  "inspectConfig": {
    "infoTypes": [ { "name": "EMAIL_ADDRESS" }, { "name": "PERSON_NAME" } ]
  }
}
API call
curl -s \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://dlp.googleapis.com/v2/projects/$PROJECT_ID/content:deidentify \
-d @gcp-dlp/input/text-request.json
Response
{
  "item": {
    "value": "(person)\nPharmacist\n(person)#######.#####@#######.###(person)"
  },
  "overview": {
    "transformedBytes": "50",
    "transformationSummaries": [
      {
        "infoType": {
          "name": "EMAIL_ADDRESS"
        },
        "transformation": {
          "characterMaskConfig": {
            "maskingCharacter": "#",
            "charactersToIgnore": [
              {
                "charactersToSkip": ".@"
              }
            ]
          }
        },
        "results": [
          {
            "count": "1",
            "code": "SUCCESS"
          }
        ],
        "transformedBytes": "25"
      },
      {
        "infoType": {
          "name": "PERSON_NAME"
        },
        "transformation": {
          "replaceConfig": {
            "newValue": {
              "stringValue": "(person)"
            }
          }
        },
        "results": [
          {
            "count": "3",
            "code": "SUCCESS"
          }
        ],
        "transformedBytes": "25"
      }
    ]
  }
}
Request (text only)
Eleanor Rigby
Pharmacist
eleanor.rigby@example.com
Response (text only)
(person)
Pharmacist
(person)#######.#####@#######.###(person)
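The input text contains a person name and an email address. Both are detected and masked as expected. However, an extra (person) tag is added before and after the masked email address.
This is a very simple example, but I have observed this behavior in every document I process this way.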
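As a sanity check on the numbers in the response overview, here is a small Python sketch that adds up byte lengths. It is not part of my original request. The list of candidate PERSON_NAME spans is purely my assumption, since the response does not say which spans were matched. Under that assumption, the 3 findings and 25 transformed bytes reported for PERSON_NAME would be consistent with the name on the first line plus two matches inside the email address.

```python
# Byte-length check against the numbers reported in the response overview.
text = "Eleanor Rigby\nPharmacist\neleanor.rigby@example.com"
email = "eleanor.rigby@example.com"

# Assumption: these are the three PERSON_NAME findings. The response does
# not list the matched spans, so this is only a guess to test the arithmetic.
assumed_person_spans = ["Eleanor Rigby", "eleanor", "rigby"]


def byte_len(s: str) -> int:
    """Length of s in UTF-8 bytes."""
    return len(s.encode("utf-8"))


total_bytes = byte_len(text)
email_bytes = byte_len(email)
person_bytes = sum(byte_len(s) for s in assumed_person_spans)

print("input bytes:", total_bytes)
print("EMAIL_ADDRESS bytes:", email_bytes)
print("assumed PERSON_NAME bytes:", person_bytes)
```

If the guess is right, the PERSON_NAME findings overlap the EMAIL_ADDRESS finding, which might explain the stray (person) tags around the masked address.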
Why is the person entity detected multiple times?