equivalent regular expressions


Let Σ be an alphabet, and E(Σ) be the set of all regular expressionsMathworldPlanetmath over Σ. Two expressions p,q are said to be equivalentMathworldPlanetmathPlanetmathPlanetmathPlanetmath, written pq, if they describe the same languagePlanetmathPlanetmath: L(p)=L(q).

This relationMathworldPlanetmathPlanetmath is clearly an equivalence relation on E(Σ), and therefore partitions E(Σ) into equivalence classesMathworldPlanetmath. Furthermore, if ,, and * are interpreted as operationsMathworldPlanetmath on E(Σ), then it is clear that respects each of these operations, and so is a congruence relationPlanetmathPlanetmath on E(Σ).

Let E=E(Σ)/, the set of equivalence classes. Members of E are denoted [p]. For simplicity, we drop the square brackets around p from now on.

The following identitiesPlanetmathPlanetmathPlanetmathPlanetmath (in E) are easily established: for any p,q,rE:

  1. 1.

    pq=qp

  2. 2.

    pp=p

  3. 3.

    p=p

  4. 4.

    p(qr)=(pq)r

  5. 5.

    p(qr)=(pq)r

  6. 6.

    p(qr)=pqpr

  7. 7.

    (pq)r=prqr

  8. 8.

    p=

  9. 9.

    p=

  10. 10.

    *p=p

  11. 11.

    p*=p

  12. 12.

    pp*=p*p

  13. 13.

    p*p*=p*

  14. 14.

    (p*)*=p*

Identities 1,3,4 establish that E is a commutative monoid with as the “additionPlanetmathPlanetmath”, and as the identity. Likewise, identities 5,10,11 establish that E is a monoid with as the “multiplication”, and * as the identity elementMathworldPlanetmath. By identities 6 through 9, E with the two operations form a semiring ( being the addition and the multiplication). Lastly, identity 2 says that E is an idempotent semiring.

Now, as a idempotent semiring, the binary relation such that pq iff pq=q (or L(p)L(q)). It is not hard to see the following implicationMathworldPlanetmath:

pqrq  implies  p*rq. (1)

Assume the left hand side of the implication. In other words, L(pqr)L(q). Then L(p)L(q)L(r)L(q), which implies that L(r)L(q), and L(p)L(q)L(q), which, by inductionMathworldPlanetmath, implies that L(p)nL(q)L(q), and hence L(p)+L(q)L(q). Now, L(p*r)=L(p)*L(r)=L(r)L(p)+L(r)L(q)L(p)+L(q)L(q). Hence we arrive at the right hand side of the implication.

This implication, together with identities 12 and 13, show that E, with binary operationsMathworldPlanetmath , and the unary operation *, is a Kleene algebra.

Remarks.

  1. 1.

    If we impose the condition *p, the above implication can be written as

    pqr=q  implies  p*r=q. (2)

    Suppose xL(q)=L(pqr)=L(p)L(q)L(r). We use induction on the length of x. If |x|=0, then xL(r)L(p)*L(r), since L(p) does not contain the empty wordPlanetmathPlanetmathPlanetmath λ. Suppose now that |x|>0. Then either x=yz where yL(p) and zL(q), or xL(r). In the former case, since y is not the empty word by the imposed condition, z is a strictly shorter word than x, which, by induction, is in L(p*r)=L(p)*L(r). As a result, x=yzL(p)L(p)*L(r)L(p)*L(r). In the latter case, we have xL(p)*L(r). In either case, xL(p*r), and the implication is proved.

  2. 2.

    Regular expressions can be thought of as well-formed formulas in a formal system. A sentenceMathworldPlanetmath is of the form p=q where p,q are wffs. An interpretationMathworldPlanetmathPlanetmath of the sentence p=q may be defined as the equation L(p)=L(q). A sentence is valid if its interpretation is true. The list of identities above are all valid sentences, and can in fact be thought of as axioms of the system. There are two rules of inferencesMathworldPlanetmath:

    1. (a)

      formal variable substitution, and

    2. (b)

      from pqr=q infer p*r=q, given that p*=p is not valid (implication 2 above).

    The system is completePlanetmathPlanetmathPlanetmath if all valid sentences may be derived from the axioms by rules of inferences. We have the following results:

    • If the set of axioms is finite, and (a) as the sole rule of inference, then the system is not complete.

    • However, the system is complete if the (finite) set of axioms above, and both rules (a) and (b) are used.

    • In fact, with (a) and (b), all the axioms we need are 1, 4, 5, 6, 7, 8, 10, 13, 14, and none can be removed to keep the system complete.

References

Title equivalent regular expressions
Canonical name EquivalentRegularExpressions
Date of creation 2013-03-22 18:57:58
Last modified on 2013-03-22 18:57:58
Owner CWoo (3771)
Last modified by CWoo (3771)
Numerical id 10
Author CWoo (3771)
Entry type Definition
Classification msc 20M35
Classification msc 68Q70