google-bigquery - Error: Failed to parse regular expression "": pattern too large - compile failed

Question

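I have run into the following behavior.

I have a BQ query that extracts 100 fields using the REGEXP_EXTRACT function.

I added one more expression and got this error: Failed to parse regular expression "": pattern too large - compile failed.

When I query that expression on its own, everything runs fine; the error only appears inside the larger query.

Here is a reproduction of the core problem, based on the GitHub sample data and a simple regular expression: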

    SELECT repository.description,
    REGEXP_EXTRACT(repository.description,r'(?:\w){0}(\w)') as Pos1,
    REGEXP_EXTRACT(repository.description,r'(?:\w){1}(\w)') as Pos2,
    REGEXP_EXTRACT(repository.description,r'(?:\w){2}(\w)') as Pos3,
.
. here it goes on and on in the same pattern
.
    REGEXP_EXTRACT(repository.description,r'(?:\w){198}(\w)') as Pos199,
    REGEXP_EXTRACT(repository.description,r'(?:\w){199}(\w)') as Pos200,
    REGEXP_EXTRACT(repository.description,r'(?:\w){200}(\w)') as Pos201,
    FROM [publicdata:samples.github_nested] LIMIT 1000

It returns:

Failed to parse regular expression "(?:\w){162}(\w)": pattern too large - compile failed

But when I run:

    SELECT repository.description,
    REGEXP_EXTRACT(repository.description,r'(?:\w){162}(\w)') as Pos163,
    FROM [publicdata:samples.github_nested] LIMIT 1000

Everything runs fine...

Is there a limit on the number of REGEXP_EXTRACT calls that can be used in a single query, or on their combined complexity?

score 0 · Accepted Answer

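I'll look into this issue. As a workaround: it looks like what you are trying to do is split the field into a separate field for each character position, so "abc" becomes {pos1: "a", pos2: "b", pos3: "c"}. Is that right? If so, you might want to try the LEFT() and RIGHT() functions, as in: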

    LEFT(repository.description, 1) as pos1,
    RIGHT(LEFT(repository.description, 2), 1) as pos2,
    RIGHT(LEFT(repository.description, 3), 1) as pos3

This should use fewer resources than compiling 200 regular expressions (although it is still unlikely to be fast).
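
Since writing 201 of these columns by hand is tedious, here is a rough sketch of how the list could be generated. It assumes legacy SQL's LEFT(str, n) and RIGHT(str, n) argument order; the helper names `position_columns` and `build_query` are made up for illustration. Note that this is not an exact replacement for the regex version: the regex only matches word characters and is not anchored to the start of the string, and for a description shorter than n characters RIGHT(LEFT(..., n), 1) would most likely return the last character rather than NULL.

```python
# Hypothetical helpers (not part of any BigQuery client library): they only
# build the query text for the LEFT()/RIGHT() workaround.

def position_columns(field, count):
    """Return SELECT-list expressions pos1..pos<count> for `field`."""
    columns = []
    for n in range(1, count + 1):
        if n == 1:
            # The first character needs no RIGHT() wrapper.
            expr = "LEFT({0}, 1)".format(field)
        else:
            # Take the first n characters, then keep only the last of them.
            expr = "RIGHT(LEFT({0}, {1}), 1)".format(field, n)
        columns.append("{0} AS pos{1}".format(expr, n))
    return columns


def build_query(field, table, count, limit=1000):
    """Assemble a legacy SQL query selecting `field` plus its position columns."""
    select_list = ",\n  ".join([field] + position_columns(field, count))
    return "SELECT {0}\nFROM {1}\nLIMIT {2}".format(select_list, table, limit)


query = build_query(
    "repository.description",
    "[publicdata:samples.github_nested]",
    201,
)
print(query)
```

The printed text can be pasted into the query editor as-is; changing the last argument of `build_query` adjusts how many position columns are produced.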
