lisp - LISP 中的 NFA 识别器

Question

我必须在lisp中定义一个函数，给定一个正则表达式和一个 e-NFA 作为输入，如果该表达式被自动机接受，它将返回 true。

首先，我定义了一个函数，它使用这些运算符 {|、*、+、...} 从正则表达式生成 e-NFA（作为 cons-cell）。

例如：使用表达式（或 ab），输出将是：

((INITIAL 0) (DELTA 0 EPSILON 1) (DELTA 0 EPSILON 3) (DELTA 2 EPSILON 5) (DELTA 4 EPSILON 5) (DELTA 1 A 2) (DELTA 3 B 4) (FINAL 5))

我想出了这个想法：我编写了函数识别或（它处理或案例）：

(defun nfa-recognize-or (fa input init final)
 (let ((arc (cdr (car fa))))
  (cond ((and (equal (first arc) init) (equal (second arc) 'epsilon))
         (nfa-recognize-or (cdr fa) input (third arc) final))
        ((not (equal (first arc) init))
         (nfa-recognize-or (cdr fa) input init final))
        ((and (equal (first arc) init) (equal (second arc) (car input)))
         (nfa-recognize-or fa (cdr input) (third arc) final))
        ((equal (third arc) final)
         T)
  )
 )
)

如果我这样调用函数：

(nfa-recognize-or (cdr fa) '(a) 0 5)

它返回“堆栈溢出”。问题是，经过一些调用，fa 的值 =

(DELTA 1 A 2) (DELTA 3 B 4) (FINAL 5))

与以前一样 init = 2 和 final = 5。此时，程序应该考虑的下一个状态应该是

(DELTA 2 EPSILON 5)

为了返回 TRUE，但这是不可能的，因为此时 NFA 已“消耗”，我无法回溯它以验证其他状态。

你有什么建议吗？

score 2 · Accepted Answer

I'll start at a high level and work down into details. I won't give a complete answer since it is not short but hopefully it will be enough to set you on the right path.

given a regular expression and an e-NFA as input, it returns true if the expression is accepted by the automata.

For simplicity, I will assume that the regular expression is given in a form that is compatible with a string of characters from the alphabet E plus the usual (, ), +, *. I also assume the e-NFA is given in a form which contains all the necessary information to constitute all the information you'd expect in the formal definition of an e-NFA. The difficulty of translating between the actual formats and my preferred formats will not be considered.

I always recommend figuring out how you'd solve a problem by hand before you write any code to do it. How would you solve this problem on a test? How can you tell if an e-NFA accepts a regular expression? More fundamentally, what is a reasonable interpretation of that requirement? My interpretation is as follows:

An e-NFA M accepts a regular expression r if L(r) is a subset of L(M).

In other words, if w is a string matched by r, it should be accepted by M. This first step is important because it casts the problem into one that we can begin to solve formally. We need to see if the language of one thing is a subset of the language of another. There is no straightforward formal machinery to answer this question directly for regular expressions of which I'm aware. However, there is a well-known construction to answer questions like this with finite automata: I refer to this construction as the Cartesian Product Machine construction, and it is used to produce a finite automaton which accepts a language based on the languages of two other finite automata. In particular: if L \ R = 0 (where 0 is the empty set and \ is set difference), then L is a subset of R. The Cartesian Product Machine construction allows us to construct a machine for L \ R given machines for L and R. We already have one for M; it is given. All we need is a machine for L(r) and we are ready to proceed with the deterministic algorithm for producting a machine for the difference of languages. Then, all that remains is checking whether the resulting language is empty. For details of the Cartesian Product Machine construction, see my other answer here. Don't worry that you will have NFAs; the construction works the same as for DFAs as noted here.

Given a regular expression r, there is an algorithm for producing an NFA which accepts L(r). I don't have a ready link for this so I'll gloss over it. Basically, we define some recursive rules for constructing e-NFAs based on each term of a regular expression. Here are the rules:

1. Alphabet symbol a: M(a)

--->()-a->(O)


2. Concatenation rs: M(rs)

--->[M(r)]-e->[M(s)]

* Note: one -e-> from each accepting state of M(r) to initial state of M(s)
        initial state is that of M(r)
        accepting states are those of M(s)


3. M(r+s):

-->()-e->[M(r)]
    \-e->[M(s)]

* Note: new initial state added
        accepting states are those of M(r) and M(s)

4. M(r*):

     /--e--\
    V       \
--->()-e->[M(r)]

* Note: one -e-> from each accepting state of M(r) to initial state
        new initial state
        accepting state is only the new initial state

Now we have an NFA for your regular expression and we know how to construct the Cartesian Product Machine for the difference. What we end up with is a big e-NFA representing the difference of L(r) and L(M). We have already stated that the difference of these languages is empty iff L(r) is a subset of L(M), and L(r) is a subset of L(M) iff M accepts r. The only question that's left is this: is the language of our Cartesian product machine empty or not?

There are lots of ways to answer this question, but the most direct would be simply to perform a graph search algorithm starting at the initial state to see if any of the accepting states are reachable. If a graph search shows that a state is reachable, then there are some strings in the language. If none of the states reachable from the initial state is accepting, the language is empty. Any search algorithm for directed graphs - depth-first, breadth-first, etc. - will do fine. Note that the graph is not acyclic so don't use any methods that require acyclic graphs.

score 0 · Accepted Answer

我想我已经弄清楚了你想要做什么。

编辑：我以为您正在尝试将正则表达式转换为 e-NFA，但似乎您想做其他事情。无论如何，我会暂时离开这个。

让我们尝试在创建一个re->enfa采用以下参数的函数方面取得一些进展：

某种 Lisp 格式的正则表达式
一个状态的符号，它将有一个 epsilon 转换到对应于这个正则表达式的 sub-e-NFA
对应离开状态的符号

我们将使用符号来识别状态，以便我们可以轻松地提出唯一标识符。我们将编写一些小函数以防我们改变主意

(defun new-state () (gensym))
(defun transition (from to on) `((delta ,from ,on ,to)))
(defun nfa-union (&rest args) (apply #'append args))

现在让我们看一下我们将在函数中放入的两种情况。一个用于连接两个正则表达式，一个用于交替。

(defun concat-nfa (a b s0 sn)
  (let ((m (new-state)))
    (nfa-union (re->enfa a s0 m) (re->enfa a m sn))))
(defun or-nfa (a b s0 sn)
  (let ((s1 (new-state))
        (s2 (new-state))
        (sk (new-state))
        (sl (new-state)))
    (nfa-union (transition s0 s1 'epsilon)
               (transition s0 s2 'epsilon)
               (transition sk sn 'epsilon)
               (transition sl sn 'epsilon)
               (re->enfa a s1 sk)
               (re->enfa b s2 sl))))

然后您可以定义re->enfa可能的样子：

(defun re->enfa (x s0 sn)
  (if (atom x) (transition s0 sn x)
    (case (car x)
      (or (if (cddr x) (or-nfa (cadr x) (cons 'or (cddr x) s0 sn)
              (re->enfa (cadr x) s0 sn)))
      (concat  (if (cddr x) (concat-nfa (cadr x) (cons 'concat (cddr x) s0 sn)
              (re->enfa (cadr x) s0 sn)))
      (* (kleene-nfa (cadr x) s0 sn))
      (+ (re->nfa `(concat ,(cadr x) (* ,(cadr x))))))))

这应该会给你一个起点，还有一些事情需要填写。你还可以做一些改变。例如，您可能希望以+只需要计算一次内部表达式的 e-NFA 的方式实现。

您还需要添加一个函数，该函数添加初始和最终状态的标记并 wraps re->enfa。

lisp - LISP 中的 NFA 识别器

2 回答 2

Related

Reference