I'm new to Google Cloud DLP. I ran a POST https://dlp.googleapis.com/v2beta1/inspect/operations request to scan a .parquet file in a Google Cloud Storage directory, and I also used cloudStorageOptions to save the .csv output.
The .parquet file is 53.93 MB.
When I make the API call on the .parquet file, I get:
"processedBytes": "102308122",
"infoTypeStats": [{
"infoType": {
"name": "AMERICAN_BANKERS_CUSIP_ID"
},
"count": "1"
}, {
"infoType": {
"name": "IP_ADDRESS"
},
"count": "17"
}, {
"infoType": {
"name": "US_TOLLFREE_PHONE_NUMBER"
},
"count": "148"
}, {
"infoType": {
"name": "EMAIL_ADDRESS"
},
"count": "30"
}, {
"infoType": {
"name": "US_STATE"
},
"count": "22"
}]
When I convert the .parquet file to .csv, I get a 360.58 MB file. If I then make the same API call on the .csv file, I get:
"processedBytes": "377530307",
"infoTypeStats": [{
"infoType": {
"name": "CREDIT_CARD_NUMBER"
},
"count": "56546"
}, {
"infoType": {
"name": "EMAIL_ADDRESS"
},
"count": "372527"
}, {
"infoType": {
"name": "NETHERLANDS_BSN_NUMBER"
},
"count": "5"
}, {
"infoType": {
"name": "US_TOLLFREE_PHONE_NUMBER"
},
"count": "1331321"
}, {
"infoType": {
"name": "AUSTRALIA_TAX_FILE_NUMBER"
},
"count": "52269"
}, {
"infoType": {
"name": "PHONE_NUMBER"
},
"count": "28"
}, {
"infoType": {
"name": "US_DRIVERS_LICENSE_NUMBER"
},
"count": "114"
}, {
"infoType": {
"name": "US_STATE"
},
"count": "141383"
}, {
"infoType": {
"name": "KOREA_RRN"
},
"count": "56144"
}],
Clearly, not all of the infoTypes are detected when I scan the .parquet file, compared with scanning the .csv file, where I verified that all EmailAddresses were detected.
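To make the gap between the two scans easier to see, the counts can be lined up per infoType. The following is only a minimal sketch in Python: the helper names `counts_by_info_type` and `compare_results` are hypothetical, and the two dictionaries are abbreviated copies of the results quoted above, not a fresh API response.

```python
def counts_by_info_type(result):
    """Map each infoType name to its count (the API returns counts as strings)."""
    return {
        stat["infoType"]["name"]: int(stat["count"])
        for stat in result.get("infoTypeStats", [])
    }


def compare_results(parquet_result, csv_result):
    """Return (name, parquet_count, csv_count) rows, using 0 when a type is absent."""
    parquet_counts = counts_by_info_type(parquet_result)
    csv_counts = counts_by_info_type(csv_result)
    names = sorted(set(parquet_counts) | set(csv_counts))
    return [
        (name, parquet_counts.get(name, 0), csv_counts.get(name, 0))
        for name in names
    ]


# Abbreviated copies of the two results shown in the question.
parquet_result = {
    "processedBytes": "102308122",
    "infoTypeStats": [
        {"infoType": {"name": "IP_ADDRESS"}, "count": "17"},
        {"infoType": {"name": "US_TOLLFREE_PHONE_NUMBER"}, "count": "148"},
        {"infoType": {"name": "EMAIL_ADDRESS"}, "count": "30"},
        {"infoType": {"name": "US_STATE"}, "count": "22"},
    ],
}
csv_result = {
    "processedBytes": "377530307",
    "infoTypeStats": [
        {"infoType": {"name": "CREDIT_CARD_NUMBER"}, "count": "56546"},
        {"infoType": {"name": "EMAIL_ADDRESS"}, "count": "372527"},
        {"infoType": {"name": "US_TOLLFREE_PHONE_NUMBER"}, "count": "1331321"},
        {"infoType": {"name": "US_STATE"}, "count": "141383"},
    ],
}

rows = compare_results(parquet_result, csv_result)
for name, parquet_count, csv_count in rows:
    print(f"{name:28} parquet={parquet_count:>8} csv={csv_count:>8}")
```

Types that appear in only one scan show up with a count of 0 on the other side, which makes it obvious which findings the .parquet scan misses entirely.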
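I couldn't find any documentation on compressed files such as Parquet, so I'm assuming that Google Cloud DLP doesn't offer this capability.
Any help would be greatly appreciated.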