I'll start at a high level and work down into details. I won't give a complete answer since it is not short but hopefully it will be enough to set you on the right path.
given a regular expression and an e-NFA as input, it returns true if the expression is accepted by the automata.
For simplicity, I will assume that the regular expression is given in a form that is compatible with a string of characters from the alphabet E
plus the usual (, ), +, *
. I also assume the e-NFA
is given in a form which contains all the necessary information to constitute all the information you'd expect in the formal definition of an e-NFA
. The difficulty of translating between the actual formats and my preferred formats will not be considered.
I always recommend figuring out how you'd solve a problem by hand before you write any code to do it. How would you solve this problem on a test? How can you tell if an e-NFA
accepts a regular expression? More fundamentally, what is a reasonable interpretation of that requirement? My interpretation is as follows:
An e-NFA
M
accepts a regular expression r
if L(r)
is a subset of L(M)
.
In other words, if w
is a string matched by r
, it should be accepted by M
. This first step is important because it casts the problem into one that we can begin to solve formally. We need to see if the language of one thing is a subset of the language of another. There is no straightforward formal machinery to answer this question directly for regular expressions of which I'm aware. However, there is a well-known construction to answer questions like this with finite automata: I refer to this construction as the Cartesian Product Machine construction, and it is used to produce a finite automaton which accepts a language based on the languages of two other finite automata. In particular: if L \ R = 0
(where 0
is the empty set and \
is set difference), then L
is a subset of R
. The Cartesian Product Machine construction allows us to construct a machine for L \ R
given machines for L
and R
. We already have one for M
; it is given. All we need is a machine for L(r)
and we are ready to proceed with the deterministic algorithm for producting a machine for the difference of languages. Then, all that remains is checking whether the resulting language is empty. For details of the Cartesian Product Machine construction, see my other answer here. Don't worry that you will have NFA
s; the construction works the same as for DFAs as noted here.
Given a regular expression r
, there is an algorithm for producing an NFA
which accepts L(r)
. I don't have a ready link for this so I'll gloss over it. Basically, we define some recursive rules for constructing e-NFA
s based on each term of a regular expression. Here are the rules:
1. Alphabet symbol a: M(a)
--->()-a->(O)
2. Concatenation rs: M(rs)
--->[M(r)]-e->[M(s)]
* Note: one -e-> from each accepting state of M(r) to initial state of M(s)
initial state is that of M(r)
accepting states are those of M(s)
3. M(r+s):
-->()-e->[M(r)]
\-e->[M(s)]
* Note: new initial state added
accepting states are those of M(r) and M(s)
4. M(r*):
/--e--\
V \
--->()-e->[M(r)]
* Note: one -e-> from each accepting state of M(r) to initial state
new initial state
accepting state is only the new initial state
Now we have an NFA for your regular expression and we know how to construct the Cartesian Product Machine for the difference. What we end up with is a big e-NFA
representing the difference of L(r)
and L(M)
. We have already stated that the difference of these languages is empty iff L(r)
is a subset of L(M)
, and L(r)
is a subset of L(M)
iff M
accepts r
. The only question that's left is this: is the language of our Cartesian product machine empty or not?
There are lots of ways to answer this question, but the most direct would be simply to perform a graph search algorithm starting at the initial state to see if any of the accepting states are reachable. If a graph search shows that a state is reachable, then there are some strings in the language. If none of the states reachable from the initial state is accepting, the language is empty. Any search algorithm for directed graphs - depth-first, breadth-first, etc. - will do fine. Note that the graph is not acyclic so don't use any methods that require acyclic graphs.