
I need to mask data in more than 10 tables, and each table has more than 100 columns.

I tried to mask the data using the Pentaho PDI tool, but I couldn't figure out how to set up the masking with it.

How should I perform data masking with Pentaho? I think one way is to use the step named "Replace in string", but I couldn't change any string when I tried it.

My questions are:

  1. Is "Replace in string" the correct way to do data masking?
  2. If it is, how should I fill in the values in the respective fields?

I want to replace part of a value with a masking character such as *. For example, the value "this is sample value" should become "txxx xx xxxxxx xxxxe", or something like this.

[Screenshot of PDI]

Please help.


2 Answers


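This isn't about Kettle, it's about regular expressions. I can confirm that the "Replace in string" step behaves strangely and unpredictably when a regular expression is used in it. The official documentation also says very little about the "Replace in string" step. In any case, you can use the Regex Evaluation step to capture the parts you need and replace the content in the original string.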

But there is a workaround that makes it easier:

[image]
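The capture-and-replace idea from the first paragraph of this answer can be illustrated in plain JavaScript, outside PDI. This is only a sketch of the regex logic, not the configuration of the Regex Evaluation step, and the function name `maskByCapture` is hypothetical. It assumes the goal stated in the question: keep the first and last character and replace the letters and digits in between with "x".

```javascript
// Hypothetical helper: shows the regex logic only, not a PDI step configuration.
function maskByCapture(value) {
  // Capture the first character, the middle part, and the last character.
  // [\s\S] is used instead of . so that line breaks are captured as well.
  var parts = /^([\s\S])([\s\S]*)([\s\S])$/.exec(value);
  if (parts === null) {
    // Strings shorter than two characters have no middle part to mask.
    return value;
  }
  // Mask letters and digits in the middle; spaces and punctuation are kept.
  var middle = parts[2].replace(/[A-Za-z0-9]/g, "x");
  return parts[1] + middle + parts[3];
}

console.log(maskByCapture("this is sample value")); // txxx xx xxxxxx xxxxe
```

Splitting the value into three captured groups keeps the masking rule in one place (the middle group), so changing the mask character or the character class does not affect the preserved ends.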

answered 2015-09-18T08:24:35.830

JavaScript step with str.replace

This can be done with a JavaScript step, for example:

//variable
var str = data_to_mask;

//first letter
var first = str.match(/^[A-Za-z0-9]/);

//last letter
var last = str.match(/[A-Za-z0-9]$/);

//replace all with "x"
str = str.replace(/[A-Za-z0-9]/gi, "x");

//get the first and the last letter back
str = str.replace(/^[A-Za-z0-9]/, first);
str = str.replace(/[A-Za-z0-9]$/, last);

(I think Simar's answer works well too, and maybe it's a bit more elegant :)
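As a possible variation on the snippet in this answer, the three replace calls can be folded into one regular expression that skips the first and last position. This is a sketch under assumptions: the function name `maskKeepEnds` is hypothetical, and in a PDI JavaScript step the argument would be an input field such as `data_to_mask`. It also returns null values untouched, which the original snippet does not handle.

```javascript
// Hypothetical helper; in PDI the argument would be a field such as data_to_mask.
function maskKeepEnds(value) {
  // Leave missing values alone instead of failing on them.
  if (value === null || value === undefined) {
    return value;
  }
  // (?!^) rejects a match at the start of the string,
  // (?!$) rejects a character that is immediately followed by the end.
  // Every other letter or digit is replaced with "x".
  return String(value).replace(/(?!^)[A-Za-z0-9](?!$)/g, "x");
}

console.log(maskKeepEnds("this is sample value")); // txxx xx xxxxxx xxxxe
```

Only lookaheads are used, not lookbehinds, so the pattern does not depend on newer regex features; whether a given PDI version's script engine handles it the same way should still be checked on real data.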

answered 2015-09-18T08:36:34.703