regex - Regular expression parser written in Prolog

Question

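I've been struggling with this homework problem for hours. We have to parse a regular expression with Prolog. For the most part, the predicates I have work, but some combinations of regular expression and string cause them to run out of stack space in SWI-Prolog. Here is a sample with two regex/string combinations, one that works and one that doesn't: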

star(star(char(a))), []
star(star(char(a))), [a]

The first one works; the second one runs out of stack.

Here are the predicates I'm using:

re_match(epsilon, []).
re_match(char(Letter), [Letter]).
re_match(star(_), []).
re_match(seq(Rx1, Rx2), List) :- append(List1, List2, List),  re_match(Rx2, List2),  re_match(Rx1, List1).
re_match(alt(Rx1, Rx2), List) :- re_match(Rx1, List); re_match(Rx2, List).
re_match(star(Rx), List) :- append(List1, List2, List), re_match(Rx, List1), re_match(star(Rx), List2).

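I'm not sure what I need to change to get this working, and I don't know what else to try.

Also, changing List :- append(List1, List2, List) to [H|T] makes one of the examples fail to evaluate to true.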

score 6 · Accepted Answer

Consider using DCG notation, which is more readable and makes it easier to reason about termination properties:

:- op(100, xf, *).

rexp(eps)      --> [].
rexp([T])      --> [T].
rexp(_*)       --> [].
rexp(R*)       --> rexp(R), rexp(R*).
rexp(s(R1,R2)) --> rexp(R1), rexp(R2).
rexp((R1|R2))    --> ( rexp(R1) ; rexp(R2) ).

Example that uses length/2 to generate increasingly longer lists, and thereby enumerate strings matched by the regular expression:

?- length(Ls, _), phrase(rexp(s(([a]|[b]),[c]*)), Ls).
Ls = [a] ;
Ls = [b] ;
Ls = [a, c] ;
Ls = [b, c] ;
Ls = [a, c, c] ;
etc.
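
To check one specific list instead of enumerating matches, phrase/2 can be called with a ground list. The sketch below repeats the grammar so that it loads on its own, and makes one change that is not part of the answer above: the star rule requires each iteration to consume at least one element, so that a nested star such as ([a]*)* (the case from the question) does not recurse forever on an empty match. here//1 is a hypothetical helper name introduced only for this illustration.

```prolog
:- op(100, xf, *).

rexp(eps)      --> [].
rexp([T])      --> [T].
rexp(_*)       --> [].
% Assumed variant of the star rule: S0 and S1 are the remaining input
% before and after one iteration; the iteration must have consumed something.
rexp(R*)       --> here(S0), rexp(R), here(S1), { S0 \== S1 }, rexp(R*).
rexp(s(R1,R2)) --> rexp(R1), rexp(R2).
rexp((R1|R2))  --> ( rexp(R1) ; rexp(R2) ).

% here(S): S is the input remaining at this point (hypothetical helper).
here(S, S, S).
```

With these rules, both of the following queries should succeed and terminate:

```prolog
?- phrase(rexp(s(([a]|[b]),[c]*)), [a,c,c]).
?- phrase(rexp(([a]*)*), [a]).
```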

score 5 · Accepted Answer

I don't have access to SWI-Prolog right now, but here is a guess:

Try changing

re_match(star(Rx), List) :- append(List1, List2, List),
                            re_match(Rx, List1),
                            re_match(star(Rx), List2).

to

re_match(star(Rx), List) :- append([H|List1], List2, List),
                            re_match(Rx, [H|List1]),
                            re_match(star(Rx), List2).

This forces re_match to "eat something" each time it iterates over the star construct.
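
The reason this matters: with the original clause, append/3 can first split [a] into [] and [a]. The inner star(char(a)) matches [], and the recursive call is then the same goal, re_match(star(star(char(a))), [a]), which repeats without end. A sketch of the full matcher with the change applied is below; everything except the last clause is the question's code, with only the layout adjusted.

```prolog
:- use_module(library(lists)).   % append/3

re_match(epsilon, []).
re_match(char(Letter), [Letter]).
re_match(star(_), []).
re_match(seq(Rx1, Rx2), List) :-
    append(List1, List2, List),
    re_match(Rx2, List2),
    re_match(Rx1, List1).
re_match(alt(Rx1, Rx2), List) :-
    (   re_match(Rx1, List)
    ;   re_match(Rx2, List)
    ).
re_match(star(Rx), List) :-
    % The first chunk [H|List1] is non-empty, so List2 is strictly
    % shorter than List and the recursion must eventually stop.
    append([H|List1], List2, List),
    re_match(Rx, [H|List1]),
    re_match(star(Rx), List2).
```

With this version, both examples from the question should succeed when called with a ground list:

```prolog
?- once(re_match(star(star(char(a))), [])).
?- once(re_match(star(star(char(a))), [a])).
```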
