Login
word
Given a set $\Sigma$ , a word (or a string) over $\Sigma$ is a juxtaposition (variously called concatenation or multiplication) of a finite number of elements in $\Sigma$ . The juxtaposition is taken as an associative binary operation on $\Sigma$ . A word with zero number of elements is called an empty word, typically denoted by $\lambda$ or $\epsilon$ . The set of words over $\Sigma$ is denoted $\Sigma^*$ .
Examples.
- If $\Sigma = \lbrace a,b,c,\ldots, x,y,z\rbrace$ , the English alphabet written in the lower case, then ``good'', ``mathematics'', ``fasluiwh'' are all words (without the double quotes) over $\Sigma$ , where as ``PlanetMath'' is not, because it contains upper case letters, which are not in $\Sigma$ .
- Let $\Sigma=\lbrace 0,1,2,3,4,5,6,7,8,9,+,=\rbrace$ . Then ``$12$ '', ``$0345$ '', ``$9+3$ '', ``$87=123$ '', ``$++231++$ '', ``$6+7=13$ '', ``$7=$ '' are also words over $\Sigma$ .
- The notion of words is used extensively in group theory. The juxtaposition here is the group multiplication, as the multiplication is associative. In other words, if $g_1,g_2,...,g_m$ are elements in $G$ then we can form the word $w=g_1g_2\cdots g_m\in G$ . For example, in the free group $\langle a,b|\quad\rangle$ a word could be the commutator $[a,b]=aba^{-1}b^{-1}$ .
Remarks
- $\Sigma^*$ is a monoid with juxtaposition as the monoid multiplication, and $\lambda$ , the empty word, as the multiplicative identity.
- For any element $a$ in $\Sigma$ , define $a^0=\lambda$ , and $a^{n+1}=a^na$ for non-negative integers $n$ . Then $a^{n+m}=a^na^m$ since juxtaposition is associative.
- Words, by definition, are finite in length. This notion can be generalized: an infinite word, or more precisely, a $\omega$ -word, over an alphabet $\Sigma$ is just a function from $\mathbb{N}$ to $\Sigma$ . The set of all words over $\Sigma$ , finite or infinite, is $\Sigma^* \cup \Sigma^{\mathbb{N}}$ , and is denoted by $\Sigma^{\infty}$ or $\Sigma^{\omega}$ .
Subwords
A word $u$ is called a subword of $v$ if $v=xuy$ , for some words $x$ and $y$ (may be empty words). If $u$ is a subword of $v$ , we also say that $u$ occurs in $v$ , or that $v$ contains $u$ . For example, ``math'' is a subword of ``mathematics''.
Given the equation $v=xuy$ , we call the triple $(x,u,y)$ an occurrence of $u$ in $v$ . The collection of occurrences of $u$ in $v$ is denoted $O(u,v)$ . The number of occurrences of $u$ in $v$ defined as the cardinality of $O(u,v)$ , and written $|u|_v$ . The position of occurrence $(x,u,y)$ of $u$ in $v$ is the length of $x$ plus $1$ .
For example, the number of occurrences of subword $a^3$ in $a^3ba^5c$ is $4$ , since $$O(a^3,a^3ba^5c)=\lbrace (\lambda,a^3,ba^5c),(a^3b,a^3,a^2c), (a^3ba, a^3, ac), (a^3ba^2,a^3,c)\rbrace.$$ The positions of these occurrences are $1,5,6$ , and $7$ , respectively.
Generating Words using Rules
Some of the words in the second example above, such as ``$++231++$ '' and ``$7=$ '', do not make any mathematical sense. The way to define words that make sense is through a process called definition by recursion. First, we declare that certain words over $\Sigma$ are sensible. Then, we have a set of rules or a grammar that dictates how new sensible words can be formed from the old ones. Any word that can be formed from the old words by these rules in a finite number of steps is called sensible.
In the last example, we could declare that all symbols $0,1,\ldots,9$ are sensible words. To form new sensible words, we have the rules:
- if $a,b$ do not contain either $+$ or $=$ , then $ab$ is a sensible word;
- if a two sensible words $a,b$ do not contain the symbol $=$ , then $a+b$ and $a=b$ are sensible words;
- the only sensible words are the initially declared sensible words and those that can be formed by the previous two rules.
- $a$
- $a_1+a_2+\cdots +a_n$
- $a_1+a_2+\cdots + a_n = b_1+b_2+\cdots + b_m$ .
Generally, any collection of words is called a language. The collection of all sensible words described above is called the language generated by $0,\ldots,9$ under the rules above. In logic, one calls these sensible words well-formed formulas, or formulas or wff for short.
