python - How to encrypt DataFrame columns with the PySeal library

Question

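I am working on fully homomorphic encryption, because only fully homomorphic encryption allows computations to be performed on encrypted data, and this mechanism is provided by the PySeal library, the Python bindings for Microsoft's SEAL library. My DataFrame has 3 columns. I want to use PySeal to encrypt every value in each column so that I can perform computations on those values.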

df

| SNP  | ID     | Effect|
|:---- |:------:| -----:|
| 21515| 1      | 0.5   |
| 21256| 2      | 0.7   |
| 21286| 3      | 1.7   |

Relevant PySeal documentation: https://github.com/Lab41/PySEAL/blob/master/SEALPythonExamples/examples.py

score 1 · Accepted Answer

Interesting question. I can help you use the library together with pandas, but not with choosing secure encryption parameters such as the moduli.

First, some imports:

import pandas
import seal
from seal import Ciphertext, \
    Decryptor, \
    Encryptor, \
    EncryptionParameters, \
    Evaluator, \
    IntegerEncoder, \
    FractionalEncoder, \
    KeyGenerator, \
    Plaintext, \
    SEALContext

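Now we set the encryption parameters. I don't know enough to advise you on how to choose these values, but choosing them correctly is essential for adequate security. To quote the documentation: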

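"It is critical to understand how these different parameters behave, how they affect the encryption scheme, performance, and the security level… Due to the complexity of this topic, we highly recommend the user to directly consult an expert in homomorphic encryption and RLWE-based encryption schemes to determine the security of their parameter choices."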

parms = EncryptionParameters()
parms.set_poly_modulus("1x^2048 + 1")
parms.set_coeff_modulus(seal.coeff_modulus_128(2048))
parms.set_plain_modulus(1 << 8)
context = SEALContext(parms)
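For orientation, here is roughly what the three setters above define. SEAL 2.x implements the FV scheme (also written BFV); the symbols n, t, q, R_t and R_q below are the usual textbook notation, introduced here for illustration only and not part of the PySeal API.

```latex
\begin{aligned}
&\text{poly modulus:}      && x^{n} + 1, \qquad n = 2048\\
&\text{plain modulus:}     && t = 2^{8} = 256\\
&\text{coeff modulus:}     && q = \textstyle\prod_i q_i \quad (\text{primes returned by coeff\_modulus\_128}(2048))\\
&\text{plaintext space:}   && R_t = \mathbb{Z}_t[x]/(x^{n} + 1)\\
&\text{fresh ciphertext:}  && (c_0, c_1) \in R_q \times R_q, \qquad R_q = \mathbb{Z}_q[x]/(x^{n} + 1)
\end{aligned}
```

One practical consequence: every plaintext coefficient is reduced modulo t = 256, so an encoded result only decodes correctly while the computation keeps its coefficients small relative to t.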

Next we set up the keys, the encoders, the encryptor, the evaluator and the decryptor.

iEncoder = IntegerEncoder(context.plain_modulus())
fEncoder = FractionalEncoder(
    context.plain_modulus(), context.poly_modulus(), 64, 32, 3)

keygen = KeyGenerator(context)
public_key = keygen.public_key()
secret_key = keygen.secret_key()
encryptor = Encryptor(context, public_key)
evaluator = Evaluator(context)
decryptor = Decryptor(context, secret_key)
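In SEAL 2.x, IntegerEncoder with its default base 2 writes a non-negative integer in binary and uses the bits as the coefficients of the plaintext polynomial; decoding evaluates the polynomial at x = 2. The pure-Python sketch below only imitates that idea, to show why multiplying by an encrypted 2 (used further down) doubles a value: 2 encodes to the polynomial x, and multiplying by x shifts every coefficient up one degree. The helper names are hypothetical, reduction modulo x^2048 + 1 is skipped because the degrees stay tiny, and negative numbers are not handled.

```python
# Illustrative sketch only: imitates the idea behind a base-2 integer encoder.
# None of these helpers are part of PySeal.

T = 1 << 8  # plain modulus used in the answer


def encode_base2(value):
    """Binary digits of a non-negative int, lowest degree first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    coeffs = []
    while value:
        coeffs.append(value & 1)
        value >>= 1
    return coeffs or [0]


def poly_mul(a, b, t=T):
    """Multiply two coefficient lists, reducing each coefficient mod t."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % t
    return out


def decode_base2(coeffs):
    """Evaluate the polynomial at x = 2."""
    return sum(c * 2 ** k for k, c in enumerate(coeffs))


two = encode_base2(2)
print(two)  # [0, 1], i.e. the polynomial x
print(decode_base2(poly_mul(encode_base2(21515), two)))  # 43030
```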

Let's define some convenience functions that we will use with the DataFrames to encrypt and decrypt values.

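def iencrypt(ivalue):
    # encode the integer as a plaintext polynomial, then encrypt it
    iplain = iEncoder.encode(ivalue)
    out = Ciphertext()
    encryptor.encrypt(iplain, out)
    return out

def fencrypt(fvalue):
    # same as iencrypt, but for floats via the FractionalEncoder
    fplain = fEncoder.encode(fvalue)
    out = Ciphertext()
    encryptor.encrypt(fplain, out)
    return out

def idecrypt(enc_value):
    # decrypt into a Plaintext, then decode it back to a Python int
    plain = Plaintext()
    decryptor.decrypt(enc_value, plain)
    return iEncoder.decode_int32(plain)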

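Finally, we define an integer multiplication operation that can be used with pandas. To keep this answer short, we won't demonstrate operations on floats, but implementing them shouldn't be hard.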

def i_multiplied(multiplier):
    m_plain = iEncoder.encode(multiplier)
    out = Ciphertext()
    encryptor.encrypt(m_plain, out)
    def aux(enc_value):
        # this is an in-place operation, so there is nothing to return
        evaluator.multiply(enc_value, out)
    return aux

Note that Evaluator.multiply is an in-place operation, so when we use it on a DataFrame, it mutates the values inside it!

Now let's get to work:

df = pandas.DataFrame(dict(
    SNP=[21515, 21256, 21286],
    ID=[1, 2, 3],
    Effect=[0.5, 0.7, 1.7])
)
print("Input/Plaintext Values:")
print(df.head())

This prints your example:

Input/Plaintext Values:
     SNP  ID  Effect
0  21515   1     0.5
1  21256   2     0.7
2  21286   3     1.7

Now let's create an encrypted DataFrame:

enc_df = pandas.DataFrame(dict(
    xSNP=df['SNP'].apply(iencrypt),
    xID=df['ID'].apply(iencrypt),
    xEffect=df['Effect'].apply(fencrypt))
)

print("Encrypted Values:")
print(enc_df.head())

Prints:

Encrypted Values:

   xSNP
0  <seal.Ciphertext object at 0x7efcccfc2df8>  <seal.Ciphertext object a
1  <seal.Ciphertext object at 0x7efcccfc2d88>  <seal.Ciphertext object a
2  <seal.Ciphertext object at 0x7efcccfc2dc0>  <seal.Ciphertext object a

It's just a DataFrame full of seal.Ciphertext objects.

Now let's perform an operation.

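# multiply in place: applymap's return value is discarded, but the
# Ciphertext objects inside enc_df are mutated (pandas >= 2.1 deprecates
# applymap in favor of DataFrame.map)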
enc_df[['xSNP','xID']].applymap(i_multiplied(2))

print("Encrypted Post-Op Values:")
print(enc_df.head())

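You won't notice any difference in the printed values at this point: all we did was mutate the objects inside the DataFrame, so it prints the same memory references as before.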
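Why this works: enc_df[['xSNP','xID']] builds a new DataFrame, but its object cells hold references to the very same Python objects, so mutating them is visible through enc_df as well, and the printed addresses do not change. A pandas-only sketch of that behaviour, assuming a hypothetical mutable Box class as a stand-in for seal.Ciphertext and a doubling function as a stand-in for the aux closure. It uses per-column Series.map rather than applymap, since applymap is deprecated in recent pandas.

```python
import pandas as pd


class Box:
    """Hypothetical mutable stand-in for seal.Ciphertext."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"<Box object at {id(self):#x}>"


def double_in_place(box):
    # Like evaluator.multiply: mutates its argument and returns None.
    box.value *= 2


enc_df = pd.DataFrame({"xSNP": [Box(21515), Box(21256), Box(21286)],
                       "xID": [Box(1), Box(2), Box(3)]})
before = enc_df.to_string()

subset = enc_df[["xSNP", "xID"]]  # new DataFrame, same Box objects inside
for col in subset.columns:
    subset[col].map(double_in_place)  # return value discarded, as in the answer

print(enc_df.to_string() == before)       # True: same reprs, same addresses
print([b.value for b in enc_df["xSNP"]])  # [43030, 42512, 42572]
```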

Now let's decrypt and look at the results:

enc_df[['xSNP','xID']]=enc_df[['xSNP','xID']].applymap(idecrypt)

print("Decrypted Post-Op Values:")
print(enc_df[['xSNP','xID']].head())

This prints:

Decrypted Post-Op Values:
    xSNP  xID
0  43030    2
1  42512    4
2  42572    6

This is exactly what you would expect from multiplying the integer columns by 2.
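When experimenting, a cheap way to check an encrypted pipeline is to run the same operation on the plaintext DataFrame and compare. A sketch with plain pandas; here the "decrypted" frame is simply the output printed above, typed in by hand, whereas in practice you would compare against the frame produced by idecrypt.

```python
import pandas as pd

df = pd.DataFrame(dict(SNP=[21515, 21256, 21286], ID=[1, 2, 3], Effect=[0.5, 0.7, 1.7]))

# Same operation without encryption: multiply the integer columns by 2.
expected = (df[["SNP", "ID"]] * 2).rename(columns={"SNP": "xSNP", "ID": "xID"})

# Decrypted values as printed above (copied by hand for this sketch).
decrypted = pd.DataFrame({"xSNP": [43030, 42512, 42572], "xID": [2, 4, 6]})

# Raises AssertionError describing the mismatch if anything disagrees.
pd.testing.assert_frame_equal(decrypted, expected)
print(expected)
```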

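To use this for real, you would have to serialize the encrypted DataFrame before sending it to another party for processing, and they would send it back to you for decryption. The library forces you to use pickle for this. That is unfortunate from a security standpoint, because you should never unpickle untrusted data. Can the server trust the client not to put anything nasty into the pickle, and can the client trust the server not to do the same when it returns the answer? In general the answer to both is no, especially since the client already distrusts the server; otherwise it wouldn't be using homomorphic encryption! Clearly these Python bindings are more of a tech demo, but I think the limitation is worth pointing out.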
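The risk is concrete: unpickling can run code chosen by whoever produced the bytes. Below is a minimal, harmless sketch of that, followed by the "restricting globals" mitigation described in the Python pickle documentation (overriding Unpickler.find_class). The class names and the allow-list contents are hypothetical. An allow-list only narrows the attack surface, and whether it can accommodate seal.Ciphertext depends on how PySeal pickles its objects, which this sketch does not check.

```python
import io
import pickle
from collections import OrderedDict


class Nasty:
    """Any picklable object can choose what runs when it is unpickled."""

    def __reduce__(self):
        # pickle.loads will call eval("6 * 7"): a harmless stand-in for
        # whatever code an attacker would really want to run
        return (eval, ("6 * 7",))


blob = pickle.dumps(Nasty())
print(pickle.loads(blob))  # 42: code chosen by the sender ran on load


class AllowListUnpickler(pickle.Unpickler):
    # Hypothetical allow-list: a real one must name exactly the classes
    # your payload legitimately contains, and nothing else.
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data):
    return AllowListUnpickler(io.BytesIO(data)).load()


try:
    restricted_loads(blob)
except pickle.UnpicklingError as exc:
    print("rejected:", exc)  # rejected: global builtins.eval is forbidden

print(restricted_loads(pickle.dumps(OrderedDict(a=1))))  # allowed class loads fine
```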

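The library also offers batching operations, which I haven't demonstrated. They probably make more sense in the context of DataFrames, since operating on many values at once should perform better.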
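Batching in SEAL 2.x (the PolyCRTBuilder class) packs many values into the slots of a single plaintext using the Chinese Remainder Theorem over polynomials, so one homomorphic operation acts on every slot at once. As a rough analogy only, the sketch below does the same with plain integers: three values are packed into one number modulo a product of coprime moduli, one multiplication doubles all three, and unpacking recovers each slot. The moduli are arbitrary small primes chosen for illustration. Note also that SEAL's batching requires the plain modulus to be a prime congruent to 1 modulo 2 × 2048 here, so the plain modulus of 1 << 8 used above would have to change.

```python
from math import prod

# Toy integer analogy of CRT batching; this is not SEAL code.
MODULI = [101, 103, 107]  # pairwise coprime "slots" (arbitrary primes)
M = prod(MODULI)


def pack(values):
    """Combine one value per slot into a single integer mod M (CRT)."""
    x = 0
    for v, m in zip(values, MODULI):
        n = M // m
        # this term is v mod m and 0 mod every other modulus
        x += v * n * pow(n, -1, m)
    return x % M


def unpack(x):
    """Read each slot back out by reducing modulo its own modulus."""
    return [x % m for m in MODULI]


ids = pack([1, 2, 3])       # the ID column packed into one number
twos = pack([2, 2, 2])
doubled = (ids * twos) % M  # a single multiplication...
print(unpack(doubled))      # ...doubles every slot: [2, 4, 6]
```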
