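I'm seeing behavior in the Google DLP library that confuses me, and I'd appreciate some clarification. I'm using the Java wrapper library, google-cloud-dlp version 0.34.0-beta. Given the input: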
Collection<String> input = Lists.newArrayList("Jenny Tutone 2665 Agua Vista Dr Los Gatos CA 95030 (408) 867-5309 or 408.867.5309x100");
I see the output:
███ █ ████ or █
If I pass in the same string as a collection of substrings:
Collection<String> input = Lists.newArrayList("Jenny Tutone", "2665 Agua Vista Dr", "Los Gatos", "CA 95030", "(408) 867-5309", "or", "408.867.5309x100");
I see very different results:
███, 2665 █, █ Gatos, █ 95030, █, or, █
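As a sanity check that the two inputs really carry the same text, here is a minimal sketch that joins the substrings with single spaces and compares the result to the single-string input. The class name `InputCheck` and the method `sameText` are made up for illustration and are not part of the DLP library or of the code in this question.

```java
import java.util.Arrays;
import java.util.List;

public class InputCheck {
    // The single-string input from the first call.
    static final String WHOLE =
            "Jenny Tutone 2665 Agua Vista Dr Los Gatos CA 95030 (408) 867-5309 or 408.867.5309x100";

    // The same text, split into the substrings used in the second call.
    static final List<String> PARTS = Arrays.asList(
            "Jenny Tutone", "2665 Agua Vista Dr", "Los Gatos", "CA 95030",
            "(408) 867-5309", "or", "408.867.5309x100");

    // Hypothetical helper: true when the substrings, joined by single spaces,
    // reproduce the single-string input exactly.
    static boolean sameText() {
        return String.join(" ", PARTS).equals(WHOLE);
    }

    public static void main(String[] args) {
        System.out.println(sameText());
    }
}
```

If this check passes, the only difference between the two calls is how the text is divided into items, not the text itself.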
I'm using every InfoType I could find, 67 in total. Am I doing something wrong here? This is the core of the code that calls the Google DLP library:
private Collection<String> redactContent(Collection<String> input,
                                         String replacement,
                                         Likelihood minLikelihood,
                                         List<InfoType> infoTypes) {
    // Replace select info types with chosen replacement string
    final Collection<RedactContentRequest.ReplaceConfig> replaceConfigs = infoTypes.stream()
            .map(it -> RedactContentRequest.ReplaceConfig.newBuilder().setInfoType(it).setReplaceWith(replacement).build())
            .collect(Collectors.toCollection(LinkedList::new));

    final InspectConfig inspectConfig =
            InspectConfig.newBuilder()
                    .addAllInfoTypes(infoTypes)
                    .setMinLikelihood(minLikelihood)
                    .build();

    long itemCount = 0;
    try (DlpServiceClient dlpClient = DlpServiceClient.create(settings)) {
        // Google's DLP library is limited to 100 items per request, so the requests need to be chunked if the
        // number of input items is greater.
        Stream.Builder<Stream<ContentItem>> streamBuilder = Stream.builder();
        for (long processed = 0; processed < input.size(); processed += maxItemsPerRequest) {
            Collection<ContentItem> items =
                    input.stream()
                            .skip(processed)
                            .limit(maxItemsPerRequest)
                            .filter(item -> item != null && !item.isEmpty())
                            .map(item ->
                                    ContentItem.newBuilder()
                                            .setType(MediaType.PLAIN_TEXT_UTF_8.toString())
                                            .setData(ByteString.copyFrom(item.getBytes(Charset.forName("UTF-8"))))
                                            .build()
                            )
                            .collect(Collectors.toCollection(LinkedList::new));

            RedactContentRequest request = RedactContentRequest.newBuilder()
                    .setInspectConfig(inspectConfig)
                    .addAllItems(Collections.unmodifiableCollection(items))
                    .addAllReplaceConfigs(replaceConfigs)
                    .build();

            RedactContentResponse contentResponse = dlpClient.redactContent(request);
            itemCount += contentResponse.getItemsCount();
            streamBuilder.add(contentResponse.getItemsList().stream());
        }

        return streamBuilder.build()
                .flatMap(stream -> stream.map(item -> item.getData().toStringUtf8()))
                .collect(Collectors.toCollection(LinkedList::new));
    }
}
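
For anyone who wants to reproduce the chunking step without calling the DLP service, the following is a minimal sketch of that step in isolation. It assumes the input is available as a `List`, and the names `ChunkSketch` and `partition` are hypothetical. Unlike the method above, it does not filter out null or empty items, so chunk sizes here always equal the limit except possibly for the last chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkSketch {
    // Hypothetical helper: split `input` into consecutive chunks holding at most
    // `maxItemsPerRequest` items each, preserving the original order.
    static <T> List<List<T>> partition(List<T> input, int maxItemsPerRequest) {
        if (maxItemsPerRequest <= 0) {
            throw new IllegalArgumentException("maxItemsPerRequest must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < input.size(); start += maxItemsPerRequest) {
            int end = Math.min(start + maxItemsPerRequest, input.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(input.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Jenny Tutone", "2665 Agua Vista Dr", "Los Gatos", "CA 95030",
                "(408) 867-5309", "or", "408.867.5309x100");
        // A small limit is used here only to make the chunk boundaries visible.
        for (List<String> chunk : partition(input, 3)) {
            System.out.println(chunk);
        }
    }
}
```

Since the code in the question already uses Guava's `Lists`, `Lists.partition(list, size)` would likely serve the same purpose, provided the input is a `List` rather than a general `Collection`.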