You are here
Home ›Dyck language
Primary tabs
Dyck language
The importance of using parentheses can be illustrated by looking at the following expression:
There is no ambiguity in computing the result, which is . If we remove all the parentheses in the expression, we get
which does not make much sense, unless we know the order of arithmetic operations in advance. In addition, without using parentheses, the result will differ depending on how the order of operations is assigned.
Now, if we remove all the symbols in the first expression above except the parentheses, we get
an expression known as a word of “well-balanced” parentheses.
Formally, let be an alphabet consisting of the left and right parentheses. Given word over , let be the number of occurrences of the left parentheses in minus the number of occurrences of the right parentheses in .
Definition. A word over is said to be a word of well-balanced parentheses, if
1. , and
2. for any prefix of .
For simplicity, we also say that is a well-balanced word over .
Given this definition, the word above is well-balanced, but and are not.
Definition. The set of well-balanced words over is called the parenthesis language or Dyck language over , and is denoted by .
By induction, it is not hard to see that can be generated by the following grammar:
1. terminal set is ,
2. non-terminal set is the singleton consisting of the start symbol ,
3. productions are (the empty word), , and .
As a result, is context-free. Furthermore, is a deterministic language, and hence unambiguous.
More generally, one can consider expressions involving more than one type of parentheses, such as [], , and .
Definition. Let be an alphabet consisting of types of parentheses, a left and a right one for each type. The Dyck language over , written , is the language generated by the following grammar:
1. terminal set is ,
2. non-terminal set is the singleton consisting of the start symbol ,
3. productions are (the empty word), , and for each in .
As before, is context-free, and deterministic in particular, and hence unambiguous.
Words in are also called well-balanced. However, it is a little more complicated to characterize what a well-balanced word is. The two criteria above for the case , while necessary, are not sufficient enough to describe the “well-nestedness” of parentheses when . For example, if , and the parentheses considered are and , then the word satisfy both criteria above, but fail to be well-nested.
In order to fully characterize a well-balanced word over , we first define, for each , the function much the same way as : so that is the number of left parentheses , minus the number of right parentheses . Call a word partially balanced if, for every :
1. , and
2. for every prefix of .
Next, write , where each is a symbol in . Let be the prefix . Given a position in , if is a left parenthesis, say , then there is a corresponding right parenthesis in to the right of , positioned at, say , satisfying the equation . This is a straightforward result of the two criteria above. Let be the least such position such that the equation holds. Now, if is right parenthesis, then for some position , we have . This means that, given any position in , there is a unique pair of positions , such that
-
either or , and
-
.
Define, for each , the word to be the subword of with starting position and ending position . Now, we are ready to state the last criterion in order that be well-balanced:
- 3.
for each position in , the word is partially balanced.
It can be shown, the set of words satisfying all three criteria above is . Furthermore, if , the third criterion can be derived from the first two criteria.
Other than being deterministic, some basic properties of :
-
, for any .
-
More generally, is insertion closed: if , then .
-
is also deletion closed: if , then .
-
is not prefix-free.
-
Suppose is a function, such that for each , maps and to some and respectively such that . Then the extension , when restricted to , is a language homomorphism from to .
Remark. It can be shown that the number of words of length in is the -th Catalan number. For a proof of this, see this entry. The idea is to visualize a word in as a path in a two-dimensional lattice, which can be generated as follows: given a word of length , the path starts from (which corresponds to the first symbol of ). If point is on the path, then the next point on the path is , where if the -th symbol of is , otherwise . So the increase from one point to the next is either , or . As a result, the path has the property that it never dips below the -axis, and it ends at . Paths defined this way are also known as Dyck paths.
References
- 1 D. C. Kozen, Automata and Computability, Springer, New York (1997).
- 2 J.E. Hopcroft, J.D. Ullman, Formal Languages and Their Relation to Automata, Addison-Wesley, (1969).
Mathematics Subject Classification
68Q45 Formal languages and automata68Q42 Grammars and rewriting systems
- Forums
- Planetary Bugs
- HS/Secondary
- University/Tertiary
- Graduate/Advanced
- Industry/Practice
- Research Topics
- LaTeX help
- Math Comptetitions
- Math History
- Math Humor
- PlanetMath Comments
- PlanetMath System Updates and News
- PlanetMath help
- PlanetMath.ORG
- Strategic Communications Development
- The Math Pub
- Testing messages (ignore)
- Other useful stuff
Recent Activity
new question: Linear Algebra Combination Problem! by unlord
new question: Computation of $\varphi(2000)$ by jeremyboden
new question: Computation of $\varphi(2000)$ by jeremyboden
May 21
new question: pure subgroups by lvoyster
new correction: Typo in M\"obius function? by Aleph Zero
new collection: analytic number theory by Aleph Zero
May 20
new question: Taylor's Series Query! by unlord
new question: Laplace transform by J
new question: Residue Calculus by J
May 19
new Education: Project: PlanetMath Outlines Series by unlord


