
I am trying to understand the difference between the detect_pii_entities and contains_pii_entities functions of the boto3 Comprehend client. I tried the following code snippet:

import boto3

str_text = """
Hello Zhang Wei, I am John. Your AnyCompany Financial Services, LLC credit card account 1111-0000-1111-0008 has a minimum payment of $24.53 that is due by July 31st. Based on your autopay settings, we will withdraw your payment on the due date from your bank account number XXXXXX1111 with the routing number XXXXX0000. 

Your latest statement was mailed to 100 Main Street, Any City, WA 98121. 
After your payment is received, you will receive a confirmation text message at 206-555-0100. 
If you have questions about your bill, AnyCompany Customer Service is available by phone at 206-555-0199 or email at support@anycompany.com.
"""

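# Create the Comprehend client (region and credentials come from the environment)
client = boto3.client('comprehend')

# detect_pii_entities returns each PII entity with its character offsets
detect_pii = client.detect_pii_entities(
    Text=str_text,
    LanguageCode='en'
)
print("detect pii: ", detect_pii)

# contains_pii_entities returns document-level labels without offsets
contains_pii = client.contains_pii_entities(
    Text=str_text,
    LanguageCode='en'
)
print("contains pii: ", contains_pii)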

The output I get is:

detect_pii:  {'Entities': [{'Score': 0.9996908903121948, 'Type': 'NAME', 'BeginOffset': 52, 'EndOffset': 61}, {'Score': 0.9999550580978394, 'Type': 'NAME', 'BeginOffset': 68, 'EndOffset': 72}, {'Score': 0.9627901911735535, 'Type': 'CREDIT_DEBIT_NUMBER', 'BeginOffset': 134, 'EndOffset': 153}, {'Score': 0.9714980125427246, 'Type': 'DATE_TIME', 'BeginOffset': 201, 'EndOffset': 210}, {'Score': 0.9999960660934448, 'Type': 'BANK_ACCOUNT_NUMBER', 'BeginOffset': 320, 'EndOffset': 330}, {'Score': 0.999988317489624, 'Type': 'BANK_ROUTING', 'BeginOffset': 355, 'EndOffset': 364}, {'Score': 0.9999522566795349, 'Type': 'ADDRESS', 'BeginOffset': 406, 'EndOffset': 441}, {'Score': 0.9999591112136841, 'Type': 'PHONE', 'BeginOffset': 525, 'EndOffset': 537}, {'Score': 0.999980092048645, 'Type': 'PHONE', 'BeginOffset': 633, 'EndOffset': 645}, {'Score': 0.9995272159576416, 'Type': 'EMAIL', 'BeginOffset': 658, 'EndOffset': 680}], 'ResponseMetadata': {'RequestId': '80d513d3-83b3-4ebc-915a-1e2c731d1eb4', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '80d513d3-83b3-4ebc-915a-1e2c731d1eb4', 'content-type': 'application/x-amz-json-1.1', 'content-length': '827', 'date': 'Fri, 04 Mar 2022 16:03:42 GMT'}, 'RetryAttempts': 0}}

contains_pii: {'Labels': [{'Name': 'DATE_TIME', 'Score': 0.9986850023269653}, {'Name': 'EMAIL', 'Score': 0.9985549449920654}, {'Name': 'BANK_ACCOUNT_NUMBER', 'Score': 0.8221991658210754}, {'Name': 'BANK_ROUTING', 'Score': 0.6654205918312073}, {'Name': 'CREDIT_DEBIT_NUMBER', 'Score': 1.0}, {'Name': 'PHONE', 'Score': 1.0}], 'ResponseMetadata': {'RequestId': 'f0361d1a-afad-4b4f-9877-fdbb5c297936', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'f0361d1a-afad-4b4f-9877-fdbb5c297936', 'content-type': 'application/x-amz-json-1.1', 'content-length': '285', 'date': 'Fri, 04 Mar 2022 16:03:42 GMT'}, 'RetryAttempts': 0}}

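In the second case, NAME and ADDRESS are missing, and possibly some other PII labels as well. How should I use contains_pii_entities? The documentation suggests that names and addresses should be available, and the Comprehend API in the console returns all of the PII labels.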
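To make the discrepancy concrete, here is a minimal sketch that diffs the two responses. The helper name `missing_labels` is hypothetical (it is not part of boto3), and the two dictionaries are trimmed copies of the responses above with the scores left out.

```python
def missing_labels(detect_response, contains_response):
    """Return the entity types reported by detect_pii_entities that have
    no matching label in the contains_pii_entities response."""
    detected = {entity["Type"] for entity in detect_response.get("Entities", [])}
    labeled = {label["Name"] for label in contains_response.get("Labels", [])}
    return sorted(detected - labeled)


# Trimmed copies of the two boto3 responses shown above (scores omitted).
detect_pii = {"Entities": [{"Type": t} for t in [
    "NAME", "NAME", "CREDIT_DEBIT_NUMBER", "DATE_TIME", "BANK_ACCOUNT_NUMBER",
    "BANK_ROUTING", "ADDRESS", "PHONE", "PHONE", "EMAIL",
]]}
contains_pii = {"Labels": [{"Name": n} for n in [
    "DATE_TIME", "EMAIL", "BANK_ACCOUNT_NUMBER", "BANK_ROUTING",
    "CREDIT_DEBIT_NUMBER", "PHONE",
]]}

print(missing_labels(detect_pii, contains_pii))  # ['ADDRESS', 'NAME']
```

For the responses above, this confirms that NAME and ADDRESS are the only types that detect_pii_entities reports and contains_pii_entities does not.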

Output on the AWS console:

{
    "Labels": [
        {
            "Name": "EMAIL",
            "Score": 1
        },
        {
            "Name": "DATE_TIME",
            "Score": 1
        },
        {
            "Name": "NAME",
            "Score": 0.8311530351638794
        },
        {
            "Name": "BANK_ROUTING",
            "Score": 0.7879412174224854
        },
        {
            "Name": "ADDRESS",
            "Score": 0.6723417043685913
        },
        {
            "Name": "BANK_ACCOUNT_NUMBER",
            "Score": 0.6297846436500549
        },
        {
            "Name": "CREDIT_DEBIT_NUMBER",
            "Score": 1
        },
        {
            "Name": "PHONE",
            "Score": 1
        }
    ]
}

I am not sure what I am missing when using the boto3 package. boto3 version used: 1.18.12
