parsing - 解析文本时在 DCG 中处理状态/上下文

Question

如何在解析文本时传递状态（并在需要时更改它）！？

https://www.metalevel.at/prolog/dcg

该示例正在计数..

不知道我应该如何通过初始状态。我必须将其作为调用参数还是作为列表中的第一个元素？每当我想使用 State 时，我是否必须将 state(S) 作为首要目标？

======

假设我必须解析字符串“添加项目 5”。

如果状态是 "list" ，假设它应该打印 "5=>list"

如果“设置”然后“5 =>设置”

或者可能是更复杂的字符串“new list.add 5”......其中解析“new list”应该设置State=list，以便解析下一部分知道上下文并且解释是“add 5 to a list ” 与“将 5 加到一组中”。

无需使用这些示例，我只是想弄清楚何时使用半上下文表示法以及如何将上下文作为参数传递给 first/top 规则。

如您所见，我很困惑，但这是我在互联网上可以找到的唯一示例，并且序言文档一如既往地密集，简洁且不是很有帮助；（没有示例。

澄清：

sent([X,Y],Ctx) -->  make(X,Ctx), add(Y,Ctx).
make(V,Ctx) --> [make,V], {say(">make ~w |ctx: ~w", [V,Ctx])}.
add(V,_Ctx) --> [add,V], {say(">add ~w", [V])}.

在这个例子中，我在每个级别传递上下文，即使我没有在 add() 中使用它。我可以为 add() 删除它，但如果有子规则我必须保留它。

我知道使用状态线程只需要在顶级规则中声明 Ctx 并且无论何时使用它

score 5 · Accepted Answer

DCG 是一种解析方法，由于它们与宿主语言（当然是 Prolog）的紧密和轻量级集成而引起了一些兴趣。确实如此轻量级，以至于 DCG 子句中包含的解析实际上没有任何特定的内容，只有差异列表使（相当）有效的模式匹配成为可能。

如果使用 DCG 进行解析，则无法在不同子句之间线程化状态，因为差异列表用于匹配标记。

因此，使用传统方式，在参数中手动传递状态，或者切换到扩展 DCG - 例如，pack( edcg )。

score 1 · Accepted Answer

也许这个例子有帮助。我暂时不考虑半上下文技巧，作为稍后的讨论。

我们想计算一个原子中出现和a出现的次数。任何其他字符都归为一个类别（但也算在内）。bcdropped

因此，“状态”是a, b, c,的当前计数dropped。状态应使用映射来实现，在这种情况下是 SWI-Prolog 字典。

三种方法：

和foldl/4
和phrase/3
使用原版 Prolog

使用以下代码：

?-
count_with_foldl(abgdbabba,DictWithCounts).
DictWithCounts = counts{a:3,b:4,dropped:2}.

?- 
count_with_phrase(abgdbabba,DictWithCounts).
DictWithCounts = counts{a:3,b:4,dropped:2} ;
false.

?- 
count_with_recursion(abgdbabba,DictWithCounts).
DictWithCounts = counts{a:3,b:4,dropped:2}.

代码：

% ===
% Morph DictIn to DictOut so that:
% Only for Keys [a,b,c]:
% If Key exists in DictIn, DictOut is DictIn with the count for Key incremented
% If Key notexists in DictIn, DictOut is DictIn with a new entry Key with count 1
% ===

inc_for_key(Key,DictIn,DictOut) :-
   memberchk(Key,[a,b,c]),
   !,
   add_it(Key,DictIn,DictOut).
   
inc_for_key(Key,DictIn,DictOut) :-
   \+memberchk(Key,[a,b,c]),
   add_it(dropped,DictIn,DictOut).

add_it(Key,DictIn,DictOut) :-
   (get_dict(Key,DictIn,Count) -> succ(Count,CountNew) ; CountNew=1),
   put_dict(Key,DictIn,CountNew,DictOut).

% ===
% Using foldl to count
% ===

count_with_foldl(Atom,DictWithCounts) :-
   atom_chars(Atom,Chars),
   foldl(inc_for_key,Chars,counts{},DictWithCounts).

% ===
% Using a DCG to count
% ===

dcg_count(Dict,Dict)      --> [].
dcg_count(DictIn,DictOut) --> [C], { inc_for_key(C,DictIn,Dict2) }, dcg_count(Dict2,DictOut).

count_with_phrase(Atom,DictWithCounts) :-
   atom_chars(Atom,Chars),
   phrase(dcg_count(counts{},DictWithCounts),Chars).

% ===
% Using standard Prolog to count
% ===

count_with_recursion(Atom,DictWithCounts) :-
   atom_chars(Atom,Chars),
   count_rec(Chars,counts{},DictWithCounts).
   
count_rec([],Dict,Dict).
count_rec([C|Cs],DictIn,DictOut) :- inc_for_key(C,DictIn,Dict2), count_rec(Cs,Dict2,DictOut).

通过plunit测试：

:- begin_tests(counting).

test("count using foldl/4, #1", true(DictWithCounts == counts{a:3,b:4,dropped:2})) :-
   count_with_foldl(abgdbabba,DictWithCounts).
   
test("count whith phrase/2, #1", [true(DictWithCounts == counts{a:3,b:4,dropped:2}),nondet]) :- 
   count_with_phrase(abgdbabba,DictWithCounts).

test("count whith recursion, #1", [true(DictWithCounts == counts{a:3,b:4,dropped:2})]) :- 
   count_with_recursion(abgdbabba,DictWithCounts).
   
:- end_tests(counting).

DCG 中的单个状态变量

在上面，DCG 在参数位置 1 处采用“输入状态”，它们发展成“输出状态”，最终在参数位置 2 处返回。

dcg_count(DictIn,DictOut)

这完全类似于函数式编程，其中不可变结构被传递给函数并由函数返回。在这里，结构通过谓词“相关”，这是另一种看待事物的方式（这种方式有时会与必须及时向前计算的机器的现实相冲突）。

请注意，上面的状态变量是基本的 - 没有变量。

如果该状态变量是较大结构的叶子上的“空单元”的名称，则单个状态变量就足够了。较大的结构“在它的叶子上生长”，一个 DCG 填充空单元格，但为下一次调用留下另一个空单元格。

例如，这是一个 DCG，它在其末尾增长了一个“开放列表” Fin。Fin始终是应由规则填充的未绑定变量：

在原子中替换tag为:：

collect(Fin) --> [],         { Fin = [] }.
collect(Fin) --> [t,a,g], !, collect(Fin2), { Fin = [':'|Fin2] }.
collect(Fin) --> [C],        collect(Fin2), { Fin = [C|Fin2] }.

collect(Atom,Collected) :-
   atom_chars(Atom,Chars),
   phrase(collect(Collected),Chars).

?- collect(xytagyx,Out).
Out = [x,y,:,y,x] ;
false.

传统上，DCG 代码编写得更紧凑（这种方式也使其能够进行尾调用优化，这一点不容忽视）：

collect([])        --> [].
collect([':'|Fin]) --> [t,a,g], !, collect(Fin).
collect([C|Fin])   --> [C],        collect(Fin).

collect(Atom,Collected) :-
   atom_chars(Atom,Chars),
   phrase(collect(Collected),Chars).

在这种情况下，每个 DCG 调用只能看到打开列表远端的空单元格，而Collectecd表示它的开始：

Collected ----> [|]        
               /   \       
              x    [|]     
                  /   \    
                 :   ~empty cell~ <--- Fin

所以，当遇到y规则

collect([C|Fin]) --> [C], collect(Fin).

只是Fin尽其所能，将列表增加 1 个列表单元，但将其保留为“打开列表”以继续附加：

Collected ----> [|]        
               /   \       
              x    [|]     
                  /   \    
                 :    [|]
                     /   \
            C ----> y   ~empty cell~ <--- Fin

规则

collect(Fin) --> [], { Fin = [] }.

将打开的列表转换为正确的列表，将空单元格设置为[]：

Collected ----> [|]        
               /   \       
              x    [|]     
                  /   \    
                 :    [|]
                     /   \
                    y    []

这当然正是DCG Primer的树示例中发生的情况：

tree_nodes(nil) --> [].
tree_nodes(node(Name, Left, Right)) -->
        tree_nodes(Left),
        [Name],
        tree_nodes(Right).

在这里，在它的叶子上增长的数据结构不是一个列表，而是一个二叉树。

parsing - 解析文本时在 DCG 中处理状态/上下文

2 回答 2

Related

Reference