4

我有一个关于 DFA 最小化的问题。因此,我使用了众所周知的技术将正则表达式转换为 NFA,然后使用 goto / 闭包算法从中构造 DFA。现在的问题是如何最小化它?我在这里看过有关它的演讲:http ://www.youtube.com/watch?v=T9Z66NF5YRk,但我仍然无法理解这一点。什么是 DFA 最小化?这只是合并相同的状态(在相同字符上进入相同状态的状态)还是不同的东西?

所以,我从以下语法开始:

%digit = '0'..'9'
%letter = 'a'..'z' | 'A'..'Z'
%exponent = ("e" | "E") ("+" | "-")? digit+

T_INT = digit+
T_FLOAT = T_INT exponent
T_IDENTIFIER = (letter | "$" | "_") (letter | "$" | "_" | digit)*

并最终得到以下 DFA(表示为 JSON):

{
    "START": [{
        "type": "range",
        "from": 36,
        "to": 36,
        "shift": "1"
    }, {
        "type": "range",
        "from": 48,
        "to": 57,
        "shift": "2"
    }, {
        "type": "range",
        "from": 65,
        "to": 90,
        "shift": "1"
    }, {
        "type": "range",
        "from": 95,
        "to": 95,
        "shift": "1"
    }, {
        "type": "range",
        "from": 97,
        "to": 122,
        "shift": "1"
    }],
    "1": [{
        "type": "range",
        "from": 36,
        "to": 36,
        "shift": "1"
    }, {
        "type": "range",
        "from": 48,
        "to": 57,
        "shift": "1"
    }, {
        "type": "range",
        "from": 65,
        "to": 90,
        "shift": "1"
    }, {
        "type": "range",
        "from": 95,
        "to": 95,
        "shift": "1"
    }, {
        "type": "range",
        "from": 97,
        "to": 122,
        "shift": "1"
    }, {
        "shift": ["t_identifier"]
    }],
    "2": [{
        "type": "range",
        "from": 48,
        "to": 57,
        "shift": "2"
    }, {
        "type": "range",
        "from": 69,
        "to": 69,
        "shift": "3"
    }, {
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "3"
    }, {
        "shift": ["t_int"]
    }],
    "3": [{
        "type": "range",
        "from": 43,
        "to": 43,
        "shift": "5"
    }, {
        "type": "range",
        "from": 45,
        "to": 45,
        "shift": "5"
    }, {
        "type": "range",
        "from": 48,
        "to": 57,
        "shift": "4"
    }],
    "4": [{
        "type": "range",
        "from": 48,
        "to": 57,
        "shift": "4"
    }, {
        "shift": ["t_float"]
    }],
    "5": [{
        "type": "range",
        "from": 48,
        "to": 57,
        "shift": "4"
    }]
}

那么如何最小化它呢?

更新:

好的,这是我的算法。给定以下 DFA:

{
    0: [{
        from: 97,
        to: 97,
        shift: 1
    }],
    1: [{
        from: 97,
        to: 97,
        shift: 3
    }, {
        from: 98,
        to: 98,
        shift: 2
    }],
    2: [{
        from: 98,
        to: 98,
        shift: 4
    }],
    3: [{
        from: 97,
        to: 97,
        shift: 3
    }, {
        from: 98,
        to: 98,
        shift: 4
    }],
    4: [{
        from: 98,
        to: 98,
        shift: 4
    }]
}

这就是我为最小化它所做的事情:

  1. 对于每个状态(在我的示例中编号为 0、1、2、3、4),获取标识此类状态的唯一哈希(例如,对于 state0,这将是:from=97,to=97,shift=1,对于 state1,此将是:from=97,to=97,shift=3&from=98,to=98,shift=2 等等...)

  2. 比较获得的哈希值,如果我们找到两个相同的哈希值,则将其合并。在我的例子中,state2 hash 为:from=98&shift=4&to=98,state4 hash 为:from=98&shift=4&to=98,它们是相同的,所以我们可以将它们合并到新的 state5 中,之后 DFA 将看起来像这样:

    {
    0: [{
        from: 97,
        to: 97,
        shift: 1
    }],
    1: [{
        from: 97,
        to: 97,
        shift: 3
    }, {
        from: 98,
        to: 98,
        shift: 5
    }],
    3: [{
        from: 97,
        to: 97,
        shift: 3
    }, {
        from: 98,
        to: 98,
        shift: 5
    }],
    5: [{
        from: 98,
        to: 98,
        shift: 5
    }]
    

    }

  3. 继续这个,直到我们没有任何变化,所以下一步(合并状态 1 和 3)会将 DFA 转换为以下形式:

    {
    0: [{
        from: 97,
        to: 97,
        shift: 6
    }],
    6: [{
        from: 97,
        to: 97,
        shift: 6
    }, {
        from: 98,
        to: 98,
        shift: 5
    }],
    5: [{
        from: 98,
        to: 98,
        shift: 5
    }]
    

    }

  4. 没有更多相同的状态,这意味着我们完成了。

第二次更新:

好的,给定以下正则表达式: 'a' ('ce')* ('d' | 'xa' | 'AFe')+ | 'b' ('ce')* ('d' | 'xa' | 'AFe')+ 'ce'+ 我有以下 DFA (START -> start state, ["accept"] -> so to说转换到接受状态):

{
    "START": [{
        "type": "range",
        "from": 98,
        "to": 98,
        "shift": "1.2"
    }, {
        "type": "range",
        "from": 97,
        "to": 97,
        "shift": "17.18"
    }],
    "1.2": [{
        "type": "range",
        "from": 120,
        "to": 120,
        "shift": "10"
    }, {
        "type": "range",
        "from": 100,
        "to": 100,
        "shift": "6.7"
    }, {
        "type": "range",
        "from": 65,
        "to": 65,
        "shift": "8"
    }, {
        "type": "range",
        "from": 99,
        "to": 99,
        "shift": "4"
    }],
    "10": [{
        "type": "range",
        "from": 97,
        "to": 97,
        "shift": "6.7"
    }],
    "6.7": [{
        "type": "range",
        "from": 99,
        "to": 99,
        "shift": "15"
    }, {
        "type": "range",
        "from": 120,
        "to": 120,
        "shift": "13"
    }, {
        "type": "range",
        "from": 100,
        "to": 100,
        "shift": "6.7"
    }, {
        "type": "range",
        "from": 65,
        "to": 65,
        "shift": "11"
    }],
    "15": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "14.accept"
    }],
    "14.accept": [{
        "type": "range",
        "from": 99,
        "to": 99,
        "shift": "16"
    }, {
        "shift": ["accept"]
    }],
    "16": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "14.accept"
    }],
    "13": [{
        "type": "range",
        "from": 97,
        "to": 97,
        "shift": "6.7"
    }],
    "11": [{
        "type": "range",
        "from": 70,
        "to": 70,
        "shift": "12"
    }],
    "12": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "6.7"
    }],
    "8": [{
        "type": "range",
        "from": 70,
        "to": 70,
        "shift": "9"
    }],
    "9": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "6.7"
    }],
    "4": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "2.3"
    }],
    "2.3": [{
        "type": "range",
        "from": 120,
        "to": 120,
        "shift": "10"
    }, {
        "type": "range",
        "from": 100,
        "to": 100,
        "shift": "6.7"
    }, {
        "type": "range",
        "from": 65,
        "to": 65,
        "shift": "8"
    }, {
        "type": "range",
        "from": 99,
        "to": 99,
        "shift": "5"
    }],
    "5": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "2.3"
    }],
    "17.18": [{
        "type": "range",
        "from": 120,
        "to": 120,
        "shift": "25"
    }, {
        "type": "range",
        "from": 100,
        "to": 100,
        "shift": "22.accept"
    }, {
        "type": "range",
        "from": 65,
        "to": 65,
        "shift": "23"
    }, {
        "type": "range",
        "from": 99,
        "to": 99,
        "shift": "20"
    }],
    "25": [{
        "type": "range",
        "from": 97,
        "to": 97,
        "shift": "22.accept"
    }],
    "22.accept": [{
        "type": "range",
        "from": 120,
        "to": 120,
        "shift": "28"
    }, {
        "type": "range",
        "from": 100,
        "to": 100,
        "shift": "22.accept"
    }, {
        "type": "range",
        "from": 65,
        "to": 65,
        "shift": "26"
    }, {
        "shift": ["accept"]
    }],
    "28": [{
        "type": "range",
        "from": 97,
        "to": 97,
        "shift": "22.accept"
    }],
    "26": [{
        "type": "range",
        "from": 70,
        "to": 70,
        "shift": "27"
    }],
    "27": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "22.accept"
    }],
    "23": [{
        "type": "range",
        "from": 70,
        "to": 70,
        "shift": "24"
    }],
    "24": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "22.accept"
    }],
    "20": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "18.19"
    }],
    "18.19": [{
        "type": "range",
        "from": 120,
        "to": 120,
        "shift": "25"
    }, {
        "type": "range",
        "from": 100,
        "to": 100,
        "shift": "22.accept"
    }, {
        "type": "range",
        "from": 65,
        "to": 65,
        "shift": "23"
    }, {
        "type": "range",
        "from": 99,
        "to": 99,
        "shift": "21"
    }],
    "21": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "18.19"
    }]
}

故事是一样的,我如何最小化它?如果我遵循经典的 Hopcroft 算法来构建所有这些表,确定不可区分的状态,将它们合并在一起等等,那么我将获得包含 15 个状态的 DFA(使用此工具:http ://regexvisualizer.apphb.com/使用这个正则表达式 a(ce) (d|xa|AFe)+|b(ce) (d|xa|AFe)+ce+ 来检查)。以下是使用 Hopcroft 算法缩小后 DFA 的样子:

Hopcroft 的最小化 DFA

我想出的算法,在“重新思考” Hopcroft 的算法之后,构建的 DFA 比您在上面看到的要小(对不起图像质量,我不得不一步一步地重新绘制它以了解为什么它更小):

我的算法

这就是它的工作原理,关于“状态等价”的决定是基于给定状态(例如“START”)的哈希函数的结果,如果我们从该状态开始,可以构建可以从 DFA 构造的短字符串. 给定上面的 DFA 和 START 状态,我们可以构造以下字符串:98->120, 98->100, 98->65, 98->99, 97->120, 97->100, 97->65 , 97->99 所以让它成为 START 状态的哈希函数的结果。如果我们为 DFA 中的每个状态运行此函数,我们将看到对于某些状态,此函数为我们提供相同的结果(“1.2”、“6.7”、“2.3”和“10”、“13”和“15” ,“16”和“11”,“8”,“26”,“23”和“12”,“9”,“4”,

我看到我在某个地方错了,但不明白我的算法产生的最小化 DFA 有什么问题?

4

3 回答 3

5

您提出的算法没有完全最小化,因为它没有检测到行为相同的复杂结构。要理解这一点,请查看此 DFA(由JFLAP绘制):

在此处输入图像描述

最小化将结合 q1 和 q2,但概述的算法无法做到。

与此相反,Hopcroft 的算法最初会这样划分:

   {q0, q1, q2}, {q3}

然后拆分第一组,因为 q0 没有到 q3 的转换:

   {q0}, {q1, q2}, {q3}

并且不会进一步拆分,因为 q1 和 q2 的行为相同。

于 2012-06-21T11:32:54.853 回答
4

让您的原始 DFA 称为 M1。简单来说,构造一个最小化的 DFA(称为 M2)意味着将其转换为包含最少状态数的 DFA。因此,M2 中的状态数将少于 M1 中的状态数。这里需要注意的重要一点是 M1 和 M2 必须是等价的,这意味着它们必须定义相同的正则语言。构建最小化的 DFA 不仅涉及寻找相同的状态,还涉及以下内容:

  1. 移除“不可到达”状态:
    这涉及从 DFA 的初始状态移除不可到达的状态,对于任何输入字符串。

  2. 去除“死”或“陷阱”状态:
    这涉及去除终止于自身的非接受状态。它们也被称为 TRAP 状态。

  3. 删除“不可区分”状态:
    这涉及删除对于任何输入字符串无法相互区分的状态。

还有一些流行的算法用于 DFA 最小化:

可能值得通过这些算法!

于 2012-06-21T09:14:23.663 回答
1

鉴于您有将 NFA 确定为 DFA 的代码,最小化它的最简单解决方案是 Brzozowski 算法。您将需要实现一个函数来反转 NFA,但这相当简单。(反转转换,交换开始和接受状态)

一旦你有一个确定和反向函数,Brzozowski 最小化被实现为:

minimize(nfa) = determinize(reverse(determinize(reverse(nfa)))) 

恕我直言,这是一个非常优雅的解决方案

于 2013-06-26T19:40:22.183 回答