Skip to content

CIS 2620 Notes 3.1

Published: at 07:48 PM

Operations on Regular Languages

Lets take the following two languages:

A1={w w ends with a}A_1 = \{w | \text{ w ends with a}\} A2={w w contains at least one b}A_2 = \{w | \text{ w contains at least one b} \}

M_1
M_2

However, what if I gave the set A=A1A2A = A_1 \cap A_2. Is it possible to construct a DFA for AA from the DFAs for A1A_1 and A2A_2? Let us call this DFA MM. What state should MM maintain?

In some way, we want to simultaneously execute both M1M_1 and M2M_2 on its input. What if we were to store the state of M as some sort of tuple, like (state of M1M_1, state of M2M_2)? This way, we can essentially simulate two DFAs, where we have a state (q,s)(q, s), and for a processed letter, we go to the states that each DFA would have individually gone to, giving us (q,s)(q’, s’).

For the previous DFA, that would look like this:

DFA

This is the DFA MM, that accepts a string iff it is accepted by M1M_1 and M2M_2.

Formal Definition

Given two DFAs: M1=(Q1,Σ,q01,F1,δ1)M_1 = (Q_1, \Sigma, q_{01}, F_1, \delta_1) M2=(Q2,Σ,q02,F2,δ2)M_2 = (Q_2, \Sigma, q_{02}, F_2, \delta_2)

For MM where L(M)=L(M1)L(M2)L(M) = L(M_1) \cap L(M_2), MM is defined as the following:

This method, however, creates a large number of states. Specifically, it creates the m1×m2m_1 \times m_2 states, where m1m_1 and m2m_2 are the number of states of their respective DFAs. However, it is proven that this is the optimal solution.

Closure Under Intersection

Theorem: The class of regular languages is closed under intersection

Proof: Consider two regular languages A1A_1 and A2A_2. By definition of regular languages, there must exist DFAs M1M_1 and M2M_2 that define the two languages. Using the product construction just defined, we can construct MM, which will define A1A2A_1 \cap A_2. This proves that A1A2A_1 \cap A_2 is regular.

Union

Theorem: The class of regular languages is closed under union

Proof: Consider two regular languages A1A_1 and A2A_2. By definition of regular languages, there must exist DFAs M1M_1 and M2M_2 that define the two languages. Is there a way we can modify the product/intersection construction to create a DFA that works on the union of these two languages?

It is actually very simple to do this. Currently, the above construction simulates both DFAs. However, the set of accepting states for it is only the cartesian product of the set of accepting states for both DFAs. That is to say, (q,s)F(q, s) \in F if and only if qF1sF2q \in F_1 \land s \in F_2. If we were to change that to be that the set of accepting states was whether qF1q \in F_1 or sF2s \in F_2, then this would be a valid DFA for the union of the languages, since a string ww is accepting if and only if it is accepted by M1M_1 or M2M_2.

Formal Definition

Given two DFAs: M1=(Q1,Σ,q01,F1,δ1)M_1 = (Q_1, \Sigma, q_{01}, F_1, \delta_1) M2=(Q2,Σ,q02,F2,δ2)M_2 = (Q_2, \Sigma, q_{02}, F_2, \delta_2)

For MM where L(M)=L(M1)L(M2)L(M) = L(M_1) \cap L(M_2), MM is defined as the following:

This also proves the above theorem.

Closure Under Complementation

The complement of a language is simply the set of strings that are not in the language. ¬A={wwA}\neg A = \{w | w \notin A\}

If we know AA to be regular, then can we conclude ¬A\neg A is also regular?

Yes. All we need to do is toggle the role of accepting and rejecting states. If qq was an accepting state in MM, then it is now a rejecting state in MM’ and vice versa.

Formal Definition

Given two DFAs: M1=(Q1,Σ,q01,F1,δ1)M_1 = (Q_1, \Sigma, q_{01}, F_1, \delta_1) M2=(Q2,Σ,q02,F2,δ2)M_2 = (Q_2, \Sigma, q_{02}, F_2, \delta_2)

For MM where L(M)=L(M1)L(M2)L(M) = L(M_1) \cup L(M_2), MM is defined as the following:

Language Concatenation

The concatenation of strings uu and vv is defined as u.vu.v. The concatenation of two languages AA and BB is defined as A.B={u.vuAvB}A.B = \{u.v | u \in A \land v \in B \}. That is, a string ww is in A.BA.B if it can be split into two (not necessarily equal) strings such that the first part is in AA and the second part is in BB.

Example

Theorem: The class of regular languages is closed under concatenation

Proof: Consider two regular languages A1A_1 and A2A_2. By definition of regular languages, there must exist DFAs M1M_1 and M2M_2 that define the two languages. We can construct a DFA that defines MM where M=M1.M2M = M_1.M_2

To construct this, however, we need another important concept

NFAs with E\mathcal{E}-Transitions

An E\mathcal{E}-Transition is a transition in which the state may change without consuming any input. In the execution tree, this is basically creating two threads without actually consuming a letter. Remember that a string is accepted as long as some execution path succeeds.

Consider the following E\mathcal{E}-NFA

NFA

If you try, you can see that this accepts abab, but does not accept baba.

The transition function now takes the structure of Δ:Q×[Σ{E}2Q\Delta : Q \times [\Sigma \cup \{\mathcal{E}\} \to 2^Q. The empty transitions do not increase expressiveness: an E\mathcal{E}-NFA can still be compiled into an equivalent DFA just by modifying the subset construction.

Towards Concatenation Construction

Given an input ww, MM needs to check that it is possible to split it into two parts such that each are accepted by M1M_1 and M2M_2 respectively. Intuitively, MM should first act like M1M_1, and then when M1M_1 is in a final state, start acting like M2M_2 for the remainder of the string. The challenge comes with when deciding to switch or and try with a different partition of substrings.

However, using an E\mathcal{E}-NFA, we can get a pretty easy solution to this. When we are a final state for M1M_1, we can simply put a E\mathcal{E}-transition to the initial state of M2M_2. This way, M1M_1 will continue running on its own thread for the entire string. However, for every substring for which M1M_1 is satisfied, we will be at a final state, so we will start a new thread at the beginning of M2M_2 for the remaining of the string in that instance. This way, we will be able to test every partition for which the first substring is valid under M1M_1.

DFA

Formal Definition

Given two DFAs: M1=(Q1,Σ,q01,F1,δ1)M_1 = (Q_1, \Sigma, q_{01}, F_1, \delta_1) M2=(Q2,Σ,q02,F2,δ2)M_2 = (Q_2, \Sigma, q_{02}, F_2, \delta_2)

For MM where L(M)=L(M1).L(M2)L(M) = L(M_1).L(M_2), MM is defined as the following E\mathcal{E}-NFA:

This also proves the above theorem that regular languages are closed under construction.

Extra Operations

Prefix Operation A string uu is defined as a prefix of ww if w=u.vw = u.v for some string vv. Prefix(A)=set of all prefixes of all strings of APrefix(A) = \text{set of all prefixes of all strings of A}

Theorem: Regular languages are closed under the prefix operation

Shuffle Operation A string uu is a shuffle of ww if it can be obtained by permuting(shuffling) the characters in ww. Shuffle(A)=set of all strings that are a shuffle of strings in AShuffle(A) = \text{set of all strings that are a shuffle of strings in } A

Theorem: Regular languages are not closed under the shuffle operation

To recap, regular languages are closed under union, intersection, complementation, concatenation, prefix, but not shuffle.