sql - How do you do PostgreSQL full-text search on encoded or encrypted data?

Question

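For various reasons that aren't important here, we store chunks of text in PostgreSQL in either an encrypted or a base64-encoded format. However, we'd like to be able to use PostgreSQL's full-text search to find and return data whose unencrypted/decoded form matches the search query.

How would one go about accomplishing this? I've seen other posts mention the ability to build tsvector values before sending the data to the database, but I was hoping for something available on the Postgres side (at least for the base64 text).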

score 6 · Accepted Answer

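Encrypted values

For encrypted values, you can't. Even if you build the tsvector client-side, it will contain a readable form of the text you encrypted, which makes it unacceptable for most applications. Observe: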

regress=> SELECT to_tsvector('my secret password is CandyStrip3r');
               to_tsvector                
------------------------------------------
 'candystrip3r':5 'password':3 'secret':2
(1 row)

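...whoops. It doesn't matter whether you create that value client-side instead of using to_tsvector; it will still hold your password in cleartext. You could encrypt the tsvector, but then you couldn't use it for full-text search.

Sure, given the encrypted value: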
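To make the client-side case concrete, here is a minimal sketch of what a client-built value looks like when it reaches the server. The literal is hand-written for illustration and simply mirrors the to_tsvector output above:

```sql
-- A tsvector assembled outside the database is still sent and stored
-- as readable lexemes; nothing about it is encrypted.
SELECT 'candystrip3r:5 password:3 secret:2'::tsvector AS client_built;
```

Anyone who can read that column (or an index built on it) can read the words.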

CREATE EXTENSION pgcrypto;

regress=> SELECT encrypt( convert_to('my s3kritPassw1rd','utf-8'), '\xdeadbeef', 'aes');
                              encrypt                               
--------------------------------------------------------------------
 \x10441717bfc843677d2b76ac357a55ac5566ffe737105332552f98c2338480ff
(1 row)

you could (but shouldn't) do something like this:

regress=> SELECT to_tsvector( convert_from(decrypt('\x10441717bfc843677d2b76ac357a55ac5566ffe737105332552f98c2338480ff', '\xdeadbeef', 'aes'), 'utf-8') );
    to_tsvector     
--------------------
 's3kritpassw1rd':2
(1 row)

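...but if the problems with that aren't immediately obvious after scrolling right in the code display box, then you really should be getting someone else to do your security design for you ;-)

There has been a lot of research on ways to perform operations on encrypted values without decrypting them, such as adding two encrypted numbers together to produce a result that is encrypted with the same key, so the process doing the addition doesn't need the ability to decrypt the inputs in order to produce the output. Some of this might be applicable to fts, but it's well beyond my level of expertise in the area, and it is likely to be horribly inefficient and/or cryptographically weak.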

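Base64-encoded values

For base64, you decode the value before feeding it into to_tsvector. Because decode returns a bytea, and you know the encoded data is text, you need to use convert_from to turn the bytea into text in the database encoding, e.g.: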

regress=> SELECT encode(convert_to('some text to search','utf-8'), 'base64');
            encode            
------------------------------
 c29tZSB0ZXh0IHRvIHNlYXJjaA==
(1 row)

regress=> SELECT to_tsvector(convert_from( decode('c29tZSB0ZXh0IHRvIHNlYXJjaA==', 'base64'), getdatabaseencoding() ));
     to_tsvector     
---------------------
 'search':4 'text':2
(1 row)

In this case I've used the database encoding as the input to convert_from, but you need to make sure you use the encoding that the underlying base64 encoded text was in. Your application is responsible for getting this right. I suggest either storing the encoding in a 2nd column or ensuring that your application always encodes the text as utf-8 before applying base64 encoding.
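If you want the search to be indexed on the Postgres side, one possible approach is an expression index over the decoded text. The following is only a sketch under stated assumptions: the table docs, the column body_b64, the helper b64_to_text and the index name are all hypothetical, the application is assumed to always encode text as UTF-8 before applying base64, and the 'english' text search configuration is assumed. As far as I know convert_from is not marked IMMUTABLE, so the helper declares that itself; that promise only holds while the encoding really is fixed.

```sql
-- Hypothetical table: body_b64 holds base64 of UTF-8 encoded text.
CREATE TABLE docs (
    id       serial PRIMARY KEY,
    body_b64 text NOT NULL
);

-- Hypothetical helper: base64 -> bytea -> text, always treating the bytes as UTF-8.
-- Declared IMMUTABLE so it can appear in an index expression (an assumption
-- that is only safe because the source encoding is hard-coded here).
CREATE FUNCTION b64_to_text(b64 text) RETURNS text
    LANGUAGE sql IMMUTABLE STRICT
    AS $$ SELECT convert_from(decode(b64, 'base64'), 'UTF8') $$;

-- GIN index over the tsvector of the decoded text.
CREATE INDEX docs_body_fts_idx ON docs
    USING gin (to_tsvector('english', b64_to_text(body_b64)));

INSERT INTO docs (body_b64)
VALUES (encode(convert_to('some text to search', 'UTF8'), 'base64'));

-- The WHERE clause repeats the indexed expression so the planner can use the index.
SELECT id, b64_to_text(body_b64) AS body
FROM docs
WHERE to_tsvector('english', b64_to_text(body_b64))
      @@ to_tsquery('english', 'search & text');
```

Note that the index stores the decoded lexemes in readable form, which is fine for base64 (it is an encoding, not a secret) but is exactly why the same trick is not acceptable for encrypted values.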
