Let Σ be an alphabet. We then define the following using the powers of an alphabet and infiniteMathworldPlanetmath union, where n.

Σ+ = n=1Σn
Σ* = n=0Σn=Σ+{λ}

where λ is the element called empty string. A string is an element of Σ*, meaning that it is a grouping of symbols from Σ one after another (via concatenationMathworldPlanetmath). For example, abbc is a string, and cbba is a different string. A string is also commonly called a word. Σ+, like Σ*, contains all finite strings except that Σ+ does not contain the empty string λ. Given a string sΣ*, a string t is a substring of s if s=utv for some strings u,vΣ*. For example, lp,al,ha,alpha, and λ (the empty string) are all substrings of the string alpha.

Definition. A language over an alphabet Σ is a subset of Σ*, meaning that it is a set ( of strings made from the symbols in the alphabet Σ.

Take for example an alphabet Σ={,,63,a,A}. The following are all languages over Σ:

  • {aaa,λ,A63,63,AaAaA},

  • {a,aa,aaa,aaaa,},

  • The empty setMathworldPlanetmath . In the context of languages, is called the empty language.

  • {63}

  • {a2nn0}

A language L is said to be proper if the empty string does not belong to it: λL. A proper language is also said to be λ-free. Otherwise, it is improper. In the examples above, all but the first and the last examples are λ-free. L is a finite language if L is a finite setMathworldPlanetmath, and atomic if it is a singleton subset of Σ, such as the fourth example above. A language can be arbitrarily formed, or constructed via some set of rules called a formal grammar.

Given a language L over Σ, the alphabet of L is defined as the maximal subset Σ(L) of Σ such that every symbol in Σ(L) occurs in some word in L. Equivalently, define the alphabet of a word w to be the set Σ(w) of all symbols that occur in w, then Σ(L) is the union of all Σ(w), where w ranges over L.

Remark. A language can also be described in terms of “infinite” alphabets. For example, in model theoryMathworldPlanetmath, a language is built from a set of symbols, together with a set of variables. These sets are often infinite. Another way of generalizing the notion of a language is to allow the strings to have infinite lengths. The way to do this is to think of a string as a partial functionMathworldPlanetmath f from some set X to the alphabet A such that |dom(f)|<|X|. Then the length of a string f:XA is just |dom(f)|. This specializes to the finite case if we take X to be the set of all non-negative integers.

Title language
Canonical name Language
Date of creation 2013-03-22 12:17:10
Last modified on 2013-03-22 12:17:10
Owner mps (409)
Last modified by mps (409)
Numerical id 28
Author mps (409)
Entry type Definition
Classification msc 03C07
Classification msc 68Q45
Synonym λ-free
Related topic Alphabet
Related topic ContextFreeLanguage
Related topic RegularLanguage
Related topic DeterministicFiniteAutomaton
Related topic NonDeterministicFiniteAutomaton
Related topic KleeneStar
Related topic FormalGrammar
Related topic FirstOrderLanguage
Related topic TermsAndFormulas
Related topic Word
Defines string
Defines empty language
Defines substring
Defines proper language
Defines improper language
Defines alphabet of a language
Defines alphabet of a word
Defines finite language
Defines atomic language