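I am trying to create a Shiny app that lets users select columns to encrypt. If the data is the same, the value in each row should always be the same on subsequent runs. That is, if the customer name = "John", you always get "A" when running this process; if the customer name changes to "Jon", you might get "C"... but if it changes back to "John", you get "A" again. This will be used to "mask" sensitive data for analysis.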

Also, if anyone knows a way to "decrypt" these columns by storing a key for later use, that would be much appreciated.

A simple version of what I am trying to accomplish (requires the digest library):

library(digest)

test <- data.frame(CustomerName=c("John Snow","John Snow","Daffy Duck","Daffy Duck","Daffy Duck","Daffy Duck","Daffy Duck","Joe Farmer","Joe Farmer","Joe Farmer","Joe Farmer"),
               LoanNumber=c("12548","45878","45796","45813","45125","45216","45125","45778","45126","32548","45683"),
               LoanBalance=c("458463","5412548","458463","5412548","458463","5412548","458463","5412548","458463","5412548","2484722"),
               FarmType=c("Hay","Dairy","Fish","Hay","Dairy","Fish","Hay","Dairy","Fish","Hay","Dairy"))


test[,1] <- sapply(test[,1],digest,algo="sha1")

Example output:

                                   CustomerName LoanNumber LoanBalance FarmType
1  5c96f777a14f201a6a9b79623d548f7ab61c7a11      12548      458463      Hay
2  5c96f777a14f201a6a9b79623d548f7ab61c7a11      45878     5412548    Dairy
3  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45796      458463     Fish
4  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45813     5412548      Hay
5  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45125      458463    Dairy
6  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45216     5412548     Fish
7  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45125      458463      Hay
8  b0db86a39b9617cef61a8986fd57af7960eec9f4      45778     5412548    Dairy
9  b0db86a39b9617cef61a8986fd57af7960eec9f4      45126      458463     Fish
10 b0db86a39b9617cef61a8986fd57af7960eec9f4      32548     5412548      Hay
11 b0db86a39b9617cef61a8986fd57af7960eec9f4      45683     2484722    Dairy

Modified data frame (removed the "h" from John):

test <- data.frame(CustomerName=c("Jon Snow","Jon Snow","Daffy Duck","Daffy Duck","Daffy Duck","Daffy Duck","Daffy Duck","Joe Farmer","Joe Farmer","Joe Farmer","Joe Farmer"),
               LoanNumber=c("12548","45878","45796","45813","45125","45216","45125","45778","45126","32548","45683"),
               LoanBalance=c("458463","5412548","458463","5412548","458463","5412548","458463","5412548","458463","5412548","2484722"),
               FarmType=c("Hay","Dairy","Fish","Hay","Dairy","Fish","Hay","Dairy","Fish","Hay","Dairy"))

test[,1] <- sapply(test[,1],digest,algo="sha1")

New output:

                                   CustomerName LoanNumber LoanBalance FarmType
1  2cabeabb3b50e04d3b46ea2c68ab12c7350cd87f      12548      458463      Hay
2  2cabeabb3b50e04d3b46ea2c68ab12c7350cd87f      45878     5412548    Dairy
3  b0187b6ff2322fa86004d4d22cd479f3cdc345d2      45796      458463     Fish
4  b0187b6ff2322fa86004d4d22cd479f3cdc345d2      45813     5412548      Hay
5  b0187b6ff2322fa86004d4d22cd479f3cdc345d2      45125      458463    Dairy
6  b0187b6ff2322fa86004d4d22cd479f3cdc345d2      45216     5412548     Fish
7  b0187b6ff2322fa86004d4d22cd479f3cdc345d2      45125      458463      Hay
8  2127453066c45db6ba7e2f6f8c14d22796c3fd54      45778     5412548    Dairy
9  2127453066c45db6ba7e2f6f8c14d22796c3fd54      45126      458463     Fish
10 2127453066c45db6ba7e2f6f8c14d22796c3fd54      32548     5412548      Hay
11 2127453066c45db6ba7e2f6f8c14d22796c3fd54      45683     2484722    Dairy

What I expected:

                                   CustomerName LoanNumber LoanBalance FarmType
1  2cabeabb3b50e04d3b46ea2c68ab12c7350cd87f      12548      458463      Hay
2  2cabeabb3b50e04d3b46ea2c68ab12c7350cd87f      45878     5412548    Dairy
3  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45796      458463     Fish
4  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45813     5412548      Hay
5  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45125      458463    Dairy
6  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45216     5412548     Fish
7  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45125      458463      Hay
8  b0db86a39b9617cef61a8986fd57af7960eec9f4      45778     5412548    Dairy
9  b0db86a39b9617cef61a8986fd57af7960eec9f4      45126      458463     Fish
10 b0db86a39b9617cef61a8986fd57af7960eec9f4      32548     5412548      Hay
11 b0db86a39b9617cef61a8986fd57af7960eec9f4      45683     2484722    Dairy

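Am I misunderstanding how this works? If I apply the same logic to multiple columns, I get the same values for the columns that did not change, but the problem persists for any column with a modified value. I tried vectorizing the digest function to make sure my sapply call was not the problem, with the same result. Any ideas?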

1 Answer

I think I have answered my own question... right after posting it here, of course :).

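The digest function has a serialize argument with the following documentation: a logical variable indicating whether the object should be serialized using serialize (in ASCII form). Setting this to FALSE allows the digest output of given character strings to be compared with known control output. It also allows the use of raw vectors, such as the output of non-ASCII serialization.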
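To illustrate the "known control output" part, here is a minimal sketch that compares digest output with the widely published SHA-1 test vector for the string "abc". It assumes the digest package is installed; the variable names are only illustrative.

```r
library(digest)

# With serialize = FALSE, digest() hashes the characters of the string itself,
# so the result can be checked against the published SHA-1 test vector for "abc".
raw_hash <- digest("abc", algo = "sha1", serialize = FALSE)
print(raw_hash == "a9993e364706816aba3e25717850c26c9cd0d89d")

# With the default serialize = TRUE, the serialized R object is hashed instead,
# so the result is not the SHA-1 of the plain string.
serialized_hash <- digest("abc", algo = "sha1")
print(serialized_hash == raw_hash)
```

The first comparison should be TRUE and the second FALSE, since serialization wraps the string in extra R object data before hashing.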

Setting serialize to FALSE seems to solve the problem, and I get the expected output.

Example:

test[,1] <- sapply(test[,1],digest,algo="sha1",serialize = FALSE)
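A likely explanation for the original behaviour, assuming CustomerName was stored as a factor (the default for data.frame() before R 4.0.0): with serialization on, each element is hashed together with the full set of factor levels, so renaming one customer changes every hash in the column. The sketch below hashes plain character values instead and shows that unchanged names keep their hash. Because a hash cannot be reversed, it also keeps a lookup table as one possible way to "decrypt" later. The helper mask_column and the key table are hypothetical names for illustration, not part of digest.

```r
library(digest)

# Hypothetical helper: hash each value of a column as a plain string.
mask_column <- function(x, algo = "sha1") {
  # as.character() drops factor levels, so each hash depends only on the value itself.
  vapply(as.character(x), digest, character(1),
         algo = algo, serialize = FALSE, USE.NAMES = FALSE)
}

original <- c("John Snow", "Daffy Duck", "Joe Farmer")
modified <- c("Jon Snow", "Daffy Duck", "Joe Farmer")

masked_original <- mask_column(original)
masked_modified <- mask_column(modified)

# Only the edited name should differ between the two runs.
print(masked_original == masked_modified)

# Assumed approach for reversing the mask: store the original/masked pairs
# somewhere access-controlled and look the originals up when needed.
key <- unique(data.frame(CustomerName = original,
                         Masked = masked_original,
                         stringsAsFactors = FALSE))
recovered <- key$CustomerName[match(masked_original, key$Masked)]
print(identical(recovered, original))
```

In a Shiny app the same helper could be applied to whichever columns the user selects, for example with test[cols] <- lapply(test[cols], mask_column), where cols is the assumed vector of selected column names.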
Answered on 2016-10-13T17:03:26.587