star-free
Let ℛ and 𝒢ℛ be the families of languages represented by regular expressions
and generalized regular expressions respectively. It is well-known that
ℛ=𝒢ℛ. |
This means that the additional symbols ∩ and ¬ in a generalized regular expressions are extraneous: removing them will not affect the representational power of the expressions with respect to regular languages.
What if we remove *, the Kleene star symbol, instead? With regard to regular expressions, removing * severely limits the power of the expressions. By induction, the represented languages are all finite. In this entry, we briefly discuss what happens when * is removed from the generalized regular expressions.
Definition. Let Σ be an alphabet. A language L over Σ is said to be star-free if it can be expressed by a generalized regular expression without *. In other words, a star-free language is a language that can be obtained by applying the operations of union, concatenation
, and complementation to ∅ and atomic languages (singleton subsets of Σ) a finite number of times.
If we denote 𝒮ℱ the family of star-free languages (over some alphabet Σ), then 𝒮ℱ is the smallest set of languages over Σ such that
-
•
∅∈𝒮ℱ,
-
•
{a}∈𝒮ℱ for any a∈Σ,
-
•
if L,M∈𝒮ℱ, then L∪M,LM,Lc∈𝒮ℱ.
A shorter characterization of a star-free language is a language with star height 0 with respect to representations by generalized regular expressions.
In relations to finite and regular languages, we have the following:
ℱ⊆𝒮ℱ⊆ℛ, | (1) |
where ℱ denotes the family of finite languages over Σ.
Furthermore, it is easy to see that 𝒮ℱ is closed under Boolean operations, so that 𝒮ℱ contains infinite languages, for example ¬∅ represents Σ*. As a result, the first inclusion must be strict. This example also shows that languages representable by expressions including the Kleene star may still be star-free. Here’s another example: {ab}* over the alphabet {a,b}. This language can be represented as
λ∪(abΣ*∩Σ*ab∩¬(Σ*a2Σ*)∩¬(Σ*b2Σ*)) |
The expression above, of course, is not star-free, and includes the symbol λ representing the empty word. However, Σ* is just ¬∅, and λ is just Σ*∩¬(aΣ*)∩¬(bΣ*). Some substitutions show that {ab}* is indeed star-free.
Is the second inclusion strict? Are there regular languages such that representations by expressions including the Kleene star is inevitable? The following proposition answers the question:
Proposition 1.
A language L is star-free iff there exists a non-negative integer n such that, for any words u,v,w over Σ, uvnw∈L iff uvn+1w∈L.
A language satisfying the second statement in the proposition is known as noncounting, so the proposition can be restated as: a language is star-free iff it is noncounting.
As a result of this fact, we see that languages such as {(ab)2}* is not star-free, although it is regular. Indeed, if we pick u=v=w=ab as in the statement of the proposition above, we see that uvnw is in the language iff uvn+1w is not in the language, for any n≥0. Therefore, the second inclusion in chain (1) above is also strict.
The above proposition also strengthens chain (1): denote by 𝒯(∞) the family of locally testable languages, then
𝒯(∞)⊂𝒮ℱ⊂ℛ. | (2) |
The first inclusion is due to the fact that, for any k-testable language L (over some Σ), we have
swk(uvkw)=swk(uvk+1w) |
(the definition of swk(u) is found in the entry on locally testable languages). Note the first inclusion is also strict. For example, the language represented by (ab)*∪(ba)* is star-free but not locally testable.
References
- 1 A. Ginzburg, Algebraic Theory of Automata, Academic Press (1968).
-
2
A. Salomaa, Formal Languages
, Academic Press, New York (1973).
Title | star-free |
---|---|
Canonical name | Starfree |
Date of creation | 2013-03-22 18:59:05 |
Last modified on | 2013-03-22 18:59:05 |
Owner | CWoo (3771) |
Last modified by | CWoo (3771) |
Numerical id | 16 |
Author | CWoo (3771) |
Entry type | Definition |
Classification | msc 68Q42 |
Classification | msc 68Q45 |
Synonym | star free |
Synonym | non-counting |
Synonym | noncounting |
Related topic | StarHeight |