Login
language
Let $\Sigma$ be an alphabet. We then define the following using the powers of an alphabet and infinite union, where $n \in \mathbb{Z}$ . \begin{eqnarray*} \Sigma^+ & = & \bigcup_{n=1}^{\infty} \Sigma^n \\ \Sigma^* & = & \bigcup_{n=0}^{\infty} \Sigma^n = \Sigma^+ \cup \{\lambda\} \end{eqnarray*}where $\lambda$ is the element called empty string. A string is an element of $\Sigma^*$ , meaning that it is a grouping of symbols from $\Sigma$ one after another (via concatenation). For example, $abbc$ is a string, and $cbba$ is a different string. A string is also commonly called a word. $\Sigma^+$ , like $\Sigma^*$ , contains all finite strings except that $\Sigma^+$ does not contain the empty string $\lambda$ . Given a string $s\in\Sigma^*$ , a string $t$ is a substring of $s$ if $s = utv$ for some strings $u,v\in\Sigma^*$ . For example, $lp, al, ha, alpha$ , and $\lambda$ (the empty string) are all substrings of the string $alpha$ .
Definition. A language over an alphabet $\Sigma$ is a subset of $\Sigma^*$ , meaning that it is a set of strings made from the symbols in the alphabet $\Sigma$ .
Take for example an alphabet $\Sigma = \{\clubsuit, \wp, 63, a, A \}$ . The following are all languages over $\Sigma$ :
- $\{aaa, \lambda, A\wp63, 63\clubsuit, AaAaA\}$ ,
- $\{\wp a,\wp aa, \wp aaa, \wp aaaa, \cdots\}$ ,
- The empty set $ \varnothing $ . In the context of languages, $\varnothing$ is called the empty language.
- $\lbrace 63 \rbrace$
- $\lbrace a^{2n}\mid n\ge 0 \rbrace$
Given a language $L$ over $\Sigma$ , the alphabet of $L$ is defined as the maximal subset $\Sigma(L)$ of $\Sigma$ such that every symbol in $\Sigma(L)$ occurs in some word in $L$ . Equivalently, define the alphabet of a word $w$ to be the set $\Sigma(w)$ of all symbols that occur in $w$ , then $\Sigma(L)$ is the union of all $\Sigma(w)$ , where $w$ ranges over $L$ .
Remark. A language can also be described in terms of ``infinite'' alphabets. For example, in model theory, a language is built from a set of symbols, together with a set of variables. These sets are often infinite. Another way of generalizing the notion of a language is to allow the strings to have infinite lengths. The way to do this is to think of a string as a partial function $f$ from some set $X$ to the alphabet $A$ such that $|\operatorname{dom}(f)|<|X|$ . Then the length of a string $f:X\to A$ is just $|\operatorname{dom}(f)|$ . This specializes to the finite case if we take $X$ to be the set of all non-negative integers.
