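I'm trying to build a Shiny app that lets the user pick columns to encrypt. As long as the data is the same, the value in each row should always be the same on subsequent runs. That is, if the customer name is "John", running the process should always give "A"; if the customer name changes to "Jon", you might get "C"... but if it changes back to "John", you get "A" again. This will be used to "mask" sensitive data for analysis.
Also, if anyone knows a way to "decrypt" these columns later by storing a key for later use, that would be much appreciated.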
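To make the "store a key and decrypt later" idea concrete, here is a minimal sketch, not a definitive implementation. It assumes that a keyed hash (`hmac()` from the digest package) is acceptable as the masking step and that "decrypting" can be done with a saved lookup table, since a hash by itself cannot be reversed. The helper names `mask_column` and `unmask_column` and the secret string are hypothetical.

```r
library(digest)

# Hypothetical helper: mask one column and return the lookup table
# that is needed to reverse the masking later.
mask_column <- function(values, secret) {
  values <- as.character(values)   # hash plain strings, not factor objects
  originals <- unique(values)
  # Keyed hash of each distinct string; the same input and secret
  # always produce the same token.
  tokens <- vapply(originals,
                   function(v) as.character(hmac(secret, v, algo = "sha256")),
                   character(1), USE.NAMES = FALSE)
  lookup <- data.frame(original = originals, token = tokens,
                       stringsAsFactors = FALSE)
  list(masked = lookup$token[match(values, lookup$original)],
       lookup = lookup)
}

# Hypothetical helper: map tokens back to the original values.
unmask_column <- function(masked, lookup) {
  lookup$original[match(masked, lookup$token)]
}

res <- mask_column(c("John Snow", "Daffy Duck", "John Snow"),
                   secret = "my-secret-key")   # placeholder secret
restored <- unmask_column(res$masked, res$lookup)
```

The lookup table (`res$lookup`) and the secret would have to be stored somewhere safe, because anyone who holds the table can undo the masking.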
Here is a simple version of what I have tried (requires the digest library):
library(digest)

# Sample data: customer names with their loans
test <- data.frame(CustomerName=c("John Snow","John Snow","Daffy Duck","Daffy Duck","Daffy Duck","Daffy Duck","Daffy Duck","Joe Farmer","Joe Farmer","Joe Farmer","Joe Farmer"),
                   LoanNumber=c("12548","45878","45796","45813","45125","45216","45125","45778","45126","32548","45683"),
                   LoanBalance=c("458463","5412548","458463","5412548","458463","5412548","458463","5412548","458463","5412548","2484722"),
                   FarmType=c("Hay","Dairy","Fish","Hay","Dairy","Fish","Hay","Dairy","Fish","Hay","Dairy"))
# Replace each value in the first column with its SHA-1 digest
test[,1] <- sapply(test[,1],digest,algo="sha1")
Sample output:
                               CustomerName LoanNumber LoanBalance FarmType
1  5c96f777a14f201a6a9b79623d548f7ab61c7a11      12548      458463      Hay
2  5c96f777a14f201a6a9b79623d548f7ab61c7a11      45878     5412548    Dairy
3  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45796      458463     Fish
4  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45813     5412548      Hay
5  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45125      458463    Dairy
6  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45216     5412548     Fish
7  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45125      458463      Hay
8  b0db86a39b9617cef61a8986fd57af7960eec9f4      45778     5412548    Dairy
9  b0db86a39b9617cef61a8986fd57af7960eec9f4      45126      458463     Fish
10 b0db86a39b9617cef61a8986fd57af7960eec9f4      32548     5412548      Hay
11 b0db86a39b9617cef61a8986fd57af7960eec9f4      45683     2484722    Dairy
Modified data frame (the "h" removed from "John"):
# Same data, except "John Snow" is now "Jon Snow"
test <- data.frame(CustomerName=c("Jon Snow","Jon Snow","Daffy Duck","Daffy Duck","Daffy Duck","Daffy Duck","Daffy Duck","Joe Farmer","Joe Farmer","Joe Farmer","Joe Farmer"),
                   LoanNumber=c("12548","45878","45796","45813","45125","45216","45125","45778","45126","32548","45683"),
                   LoanBalance=c("458463","5412548","458463","5412548","458463","5412548","458463","5412548","458463","5412548","2484722"),
                   FarmType=c("Hay","Dairy","Fish","Hay","Dairy","Fish","Hay","Dairy","Fish","Hay","Dairy"))
# Replace each value in the first column with its SHA-1 digest
test[,1] <- sapply(test[,1],digest,algo="sha1")
New output:
                               CustomerName LoanNumber LoanBalance FarmType
1  2cabeabb3b50e04d3b46ea2c68ab12c7350cd87f      12548      458463      Hay
2  2cabeabb3b50e04d3b46ea2c68ab12c7350cd87f      45878     5412548    Dairy
3  b0187b6ff2322fa86004d4d22cd479f3cdc345d2      45796      458463     Fish
4  b0187b6ff2322fa86004d4d22cd479f3cdc345d2      45813     5412548      Hay
5  b0187b6ff2322fa86004d4d22cd479f3cdc345d2      45125      458463    Dairy
6  b0187b6ff2322fa86004d4d22cd479f3cdc345d2      45216     5412548     Fish
7  b0187b6ff2322fa86004d4d22cd479f3cdc345d2      45125      458463      Hay
8  2127453066c45db6ba7e2f6f8c14d22796c3fd54      45778     5412548    Dairy
9  2127453066c45db6ba7e2f6f8c14d22796c3fd54      45126      458463     Fish
10 2127453066c45db6ba7e2f6f8c14d22796c3fd54      32548     5412548      Hay
11 2127453066c45db6ba7e2f6f8c14d22796c3fd54      45683     2484722    Dairy
What I expected:
                               CustomerName LoanNumber LoanBalance FarmType
1  2cabeabb3b50e04d3b46ea2c68ab12c7350cd87f      12548      458463      Hay
2  2cabeabb3b50e04d3b46ea2c68ab12c7350cd87f      45878     5412548    Dairy
3  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45796      458463     Fish
4  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45813     5412548      Hay
5  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45125      458463    Dairy
6  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45216     5412548     Fish
7  10bf345ab114c20df2d1eedbbe7e7cd6b969db05      45125      458463      Hay
8  b0db86a39b9617cef61a8986fd57af7960eec9f4      45778     5412548    Dairy
9  b0db86a39b9617cef61a8986fd57af7960eec9f4      45126      458463     Fish
10 b0db86a39b9617cef61a8986fd57af7960eec9f4      32548     5412548      Hay
11 b0db86a39b9617cef61a8986fd57af7960eec9f4      45683     2484722    Dairy
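Am I misunderstanding how this works? If I apply the same logic to multiple columns, I get the same values for the columns that did not change, but the problem persists for any column with a modified value. I also tried vectorizing the digest function to make sure my sapply call was not the problem, and got the same result. Any ideas?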
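One possible explanation, sketched here under an assumption: if `CustomerName` is stored as a factor (the default for `data.frame()` before R 4.0.0), `sapply()` hands `digest()` one-element factors that still carry every level of the column. Because `digest()` serializes the whole object by default, changing any one name changes the levels and therefore every hash. The helper name `hash_string` is hypothetical.

```r
library(digest)

# Assumption: the column is a factor rather than a character vector
a <- factor(c("John Snow", "Daffy Duck", "Joe Farmer"))
b <- factor(c("Jon Snow",  "Daffy Duck", "Joe Farmer"))

# Each element passed to digest() is a factor that keeps all levels,
# so the "Daffy Duck" hashes differ between a and b
hash_factor_a <- sapply(a, digest, algo = "sha1")
hash_factor_b <- sapply(b, digest, algo = "sha1")
same_when_factor <- hash_factor_a[2] == hash_factor_b[2]

# Hypothetical helper: hash the raw string itself, without serializing
# the surrounding R object
hash_string <- function(x) {
  vapply(as.character(x), digest, character(1),
         algo = "sha1", serialize = FALSE, USE.NAMES = FALSE)
}
same_when_string <- hash_string(a)[2] == hash_string(b)[2]
```

With the conversion to character and `serialize = FALSE`, an unchanged name keeps its hash no matter what happens to the other rows, and the result matches a SHA-1 of the plain string computed by other tools.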