我正在尝试在 Python3 中实现 Mihov 和 Maurel ( http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698 ) 描述的最小有限状态转换器。我的程序正在运行但需要很多时间,所以我决定使用 cProfile 来确定耗时的部分。结果是这样的:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 103.844 103.844 {built-in method builtins.exec}
1 0.000 0.000 103.844 103.844 <string>:1(<module>)
1 0.095 0.095 103.843 103.843 fst.py:16(__init__)
3049 0.007 0.000 103.639 0.034 fst.py:154(find_minimized)
3049 3.388 0.001 103.427 0.034 fst.py:178(member)
2765927 5.173 0.000 99.582 0.000 {built-in method builtins.all}
52885795 54.989 0.000 94.861 0.000 fst.py:184(<genexpr>)
100430746 21.104 0.000 21.104 0.000 fst.py:217(transition)
94725124 18.815 0.000 18.815 0.000 fst.py:224(output)
2248 0.179 0.000 0.205 0.000 fst.py:190(copy_state)
3160 0.026 0.000 0.030 0.000 fst.py:251(clear_state)
2248 0.014 0.000 0.015 0.000 fst.py:140(new_state)
11888 0.011 0.000 0.013 0.000 fst.py:231(set_output)
7494 0.007 0.000 0.013 0.000 fst.py:265(left_string_division)
11593 0.007 0.000 0.009 0.000 fst.py:124(longest_common_prefix)
8498 0.007 0.000 0.009 0.000 fst.py:148(set_transition)
39099 0.006 0.000 0.006 0.000 {method 'add' of 'set' objects}
7118 0.003 0.000 0.004 0.000 fst.py:205(state_output)
50320 0.004 0.000 0.004 0.000 {built-in method builtins.len}
13660 0.003 0.000 0.003 0.000 fst.py:245(final)
9256 0.002 0.000 0.002 0.000 {method 'pop' of 'dict' objects}
3442 0.001 0.000 0.002 0.000 fst.py:238(set_final)
2248 0.001 0.000 0.001 0.000 {method 'keys' of 'dict' objects}
704 0.001 0.000 0.001 0.000 {method 'split' of 'str' objects}
3160 0.001 0.000 0.001 0.000 {method 'discard' of 'set' objects}
547 0.000 0.000 0.000 0.000 fst.py:211(set_state_output)
112 0.000 0.000 0.000 0.000 fst.py:131(new_temp_state)
112 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
所以耗时的部分似乎是以下函数(代表 Mihov & Maurel 中的引理 3):
def member(self,state) :
for test_state in self.states - set(self.temp_states) :
if (((state in self.final_states and test_state in self.final_states) or \
(state not in self.final_states and test_state not in self.final_states)) and \
(not(state in self.final_states) or (self.state_output(state) == self.state_output(test_state))) and \
(all((self.transition(state,sym) == self.transition(test_state,sym) and \
self.output(state,sym) == self.output(test_state,sym)) for sym in self.sigma))) :
return test_state
return False
if 条件的以下部分似乎是导致效率低下的原因:
(all((self.transition(state,sym) == self.transition(test_state,sym) and \
self.output(state,sym) == self.output(test_state,sym)) for sym in self.sigma))
所以我现在的问题是我如何才能更有效地寻找等效状态......