0

我正在尝试在 Python3 中实现 Mihov 和 Maurel ( http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698 ) 描述的最小有限状态转换器。我的程序正在运行但需要很多时间,所以我决定使用 cProfile 来确定耗时的部分。结果是这样的:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  103.844  103.844 {built-in method builtins.exec}
        1    0.000    0.000  103.844  103.844 <string>:1(<module>)
        1    0.095    0.095  103.843  103.843 fst.py:16(__init__)
     3049    0.007    0.000  103.639    0.034 fst.py:154(find_minimized)
     3049    3.388    0.001  103.427    0.034 fst.py:178(member)
  2765927    5.173    0.000   99.582    0.000 {built-in method builtins.all}
 52885795   54.989    0.000   94.861    0.000 fst.py:184(<genexpr>)
100430746   21.104    0.000   21.104    0.000 fst.py:217(transition)
 94725124   18.815    0.000   18.815    0.000 fst.py:224(output)
     2248    0.179    0.000    0.205    0.000 fst.py:190(copy_state)
     3160    0.026    0.000    0.030    0.000 fst.py:251(clear_state)
     2248    0.014    0.000    0.015    0.000 fst.py:140(new_state)
    11888    0.011    0.000    0.013    0.000 fst.py:231(set_output)
     7494    0.007    0.000    0.013    0.000 fst.py:265(left_string_division)
    11593    0.007    0.000    0.009    0.000 fst.py:124(longest_common_prefix)
     8498    0.007    0.000    0.009    0.000 fst.py:148(set_transition)
    39099    0.006    0.000    0.006    0.000 {method 'add' of 'set' objects}
     7118    0.003    0.000    0.004    0.000 fst.py:205(state_output)
    50320    0.004    0.000    0.004    0.000 {built-in method builtins.len}
    13660    0.003    0.000    0.003    0.000 fst.py:245(final)
     9256    0.002    0.000    0.002    0.000 {method 'pop' of 'dict' objects}
     3442    0.001    0.000    0.002    0.000 fst.py:238(set_final)
     2248    0.001    0.000    0.001    0.000 {method 'keys' of 'dict' objects}
      704    0.001    0.000    0.001    0.000 {method 'split' of 'str' objects}
     3160    0.001    0.000    0.001    0.000 {method 'discard' of 'set' objects}
      547    0.000    0.000    0.000    0.000 fst.py:211(set_state_output)
      112    0.000    0.000    0.000    0.000 fst.py:131(new_temp_state)
      112    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

所以耗时的部分似乎是以下函数(代表 Mihov & Maurel 中的引理 3):

def member(self,state) :
    for test_state in self.states - set(self.temp_states) :
        if (((state in self.final_states and test_state in self.final_states) or \
            (state not in self.final_states and test_state not in self.final_states)) and \
            (not(state in self.final_states) or (self.state_output(state) == self.state_output(test_state))) and \
            (all((self.transition(state,sym) == self.transition(test_state,sym) and \
            self.output(state,sym) == self.output(test_state,sym)) for sym in self.sigma))) :
            return test_state
    return False

if 条件的以下部分似乎是导致效率低下的原因:

(all((self.transition(state,sym) == self.transition(test_state,sym) and \
self.output(state,sym) == self.output(test_state,sym)) for sym in self.sigma))

所以我现在的问题是我如何才能更有效地寻找等效状态......

4

0 回答 0