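I've run into the following behavior:
I have a BQ query containing 100 fields extracted with the REGEXP_EXTRACT function.
I added one new expression and got the following error: Failed to parse regular expression "": pattern too large - compile failed.
When I query this expression on its own, everything runs fine; inside the larger query, I get the error.
Here is a reproduction of the core of the problem, based on the GitHub sample data and a simple regular expression: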
SELECT repository.description,
REGEXP_EXTRACT(repository.description,r'(?:\w){0}(\w)') as Pos1,
REGEXP_EXTRACT(repository.description,r'(?:\w){1}(\w)') as Pos2,
REGEXP_EXTRACT(repository.description,r'(?:\w){2}(\w)') as Pos3,
.
. here it goes on and on in the same pattern
.
REGEXP_EXTRACT(repository.description,r'(?:\w){198}(\w)') as Pos199,
REGEXP_EXTRACT(repository.description,r'(?:\w){199}(\w)') as Pos200,
REGEXP_EXTRACT(repository.description,r'(?:\w){200}(\w)') as Pos201,
FROM [publicdata:samples.github_nested] LIMIT 1000
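To reproduce this without typing out all 201 columns, the full query text can be generated with a short script. This is only a minimal sketch in Python; the function name build_query and its n_columns parameter are made up for illustration and are not part of BigQuery:

```python
def build_query(n_columns=201):
    """Build the repro query text with n_columns REGEXP_EXTRACT columns.

    build_query and n_columns are hypothetical names used for this sketch.
    """
    lines = ["SELECT repository.description,"]
    for i in range(n_columns):
        # Column Pos(i+1) skips i word characters and captures the next one,
        # matching the pattern used in the query above.
        lines.append(
            "REGEXP_EXTRACT(repository.description,r'(?:\\w){%d}(\\w)') as Pos%d,"
            % (i, i + 1)
        )
    lines.append("FROM [publicdata:samples.github_nested] LIMIT 1000")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the query so it can be pasted into the BigQuery console.
    print(build_query())
```

Calling build_query with a smaller n_columns makes it easy to check how many columns the query accepts before the error appears.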
It returns:
Failed to parse regular expression "(?:\w){162}(\w)": pattern too large - compile failed
But when I run just:
SELECT repository.description,
REGEXP_EXTRACT(repository.description,r'(?:\w){162}(\w)') as Pos163,
FROM [publicdata:samples.github_nested] LIMIT 1000
everything runs fine...
Is there a limit on the number of REGEXP_EXTRACT calls that can be used in a single query, or on their combined complexity?