regex - Searching for null-terminated characters in Mongo Database v3.0.11?

Question

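We have a number of strings in our MongoDB instance that contain null-terminating characters, and we need to find out which ones they are. Knowing that Mongo uses PCRE regular expressions, we looked up the correct syntax for matching a null character (see "Can a PCRE regex match a null character?") and searched for it like this: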

db.updates_v2.find({'longDescription': /.*\x00.*/ }).count()

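However, this returns 0. We know the null characters are there, because DocumentDB refused to accept them during the migration. In addition, we ran the following query, which confirmed that longDescription is the culprit: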

db.updates_v2.find().forEach(function(doc){
... for (var key in doc) {
...     if ( /.*\x00.*/.test(doc[key]) )
... print(key)
... }
... });
longDescription
longDescription
longDescription
...
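
The scan above calls RegExp.test() on every value, which coerces non-string fields to strings before matching. A minimal sketch of a stricter check follows, assuming only top-level string fields matter. The name fieldsWithNul is a hypothetical helper, not something from the original post, and the shell usage in the trailing comment is an assumption that mirrors the loop above.

```javascript
// Hypothetical helper (not from the original post): returns the names of
// top-level string fields whose value contains a NUL character.
// Written in ES5 so that it should also load in an older mongo shell.
function fieldsWithNul(doc) {
  var hits = [];
  for (var key in doc) {
    if (!Object.prototype.hasOwnProperty.call(doc, key)) continue;
    var value = doc[key];
    // Only inspect real strings; RegExp.test() would coerce other types.
    if (typeof value === 'string' && value.indexOf('\u0000') !== -1) {
      hits.push(key);
    }
  }
  return hits;
}

// Plain-JavaScript check, runnable in Node:
var sample = { _id: 1, title: 'ok', longDescription: 'wot wot in the \u0000' };
console.log(fieldsWithNul(sample)); // [ 'longDescription' ]

// Assumed mongo shell usage, mirroring the loop above:
// db.updates_v2.find().forEach(function (doc) {
//   var hits = fieldsWithNul(doc);
//   if (hits.length > 0) { print(doc._id + ': ' + hits.join(', ')); }
// });
```

Nested documents and arrays are not inspected in this sketch.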

I also tested the regular expression in Node (albeit a different regex engine):

> test = "wot wot in the \0"
'wot wot in the \u0000'
> test2 = "wot wot in the wat"
'wot wot in the wat'
> regex = /.*\x00.*/
> test2.match(regex)
null
> test.match(regex)
[ 'wot wot in the \u0000',
  index: 0,
  input: 'wot wot in the \u0000',
  groups: undefined ]

This is a problem when migrating from MongoDB to AWS DocumentDB, because the latter does not accept \0 characters in strings.

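We really need to be able to extract these reliably, so that we can create a script that can find the offending entries, strip the null characters, and update the entries. Any ideas?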
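
For reference, the find / strip / update script described above could be sketched as follows. It filters client-side because the server-side regex query returns 0. The name nulFreeChanges is hypothetical, only top-level string fields are handled, and the shell loop in the trailing comment is an untested assumption rather than a verified fix.

```javascript
// Hypothetical helper: builds a { field: cleanedValue } map for every
// top-level string field that contains NUL; returns null if nothing changes.
function nulFreeChanges(doc) {
  var changes = {};
  var changed = false;
  for (var key in doc) {
    if (!Object.prototype.hasOwnProperty.call(doc, key)) continue;
    var value = doc[key];
    if (typeof value === 'string' && value.indexOf('\u0000') !== -1) {
      // Remove every NUL character, not just a trailing one.
      changes[key] = value.replace(/\u0000/g, '');
      changed = true;
    }
  }
  return changed ? changes : null;
}

// Plain-JavaScript check, runnable in Node:
var sample = { _id: 1, longDescription: 'wot wot in the \u0000' };
console.log(JSON.stringify(nulFreeChanges(sample)));
// {"longDescription":"wot wot in the "}

// Assumed mongo shell usage (client-side scan, one update per document):
// db.updates_v2.find().forEach(function (doc) {
//   var changes = nulFreeChanges(doc);
//   if (changes !== null) {
//     db.updates_v2.update({ _id: doc._id }, { $set: changes });
//   }
// });
```

Using $set with only the changed fields leaves the rest of each document untouched. It would be prudent to try the loop on a copy of the collection first.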
