encoding words
Let be an alphabet, and the set of all words over . An encoding of words over is, loosely speaking, an assignment of words to numbers such that the numbers uniquely identify the words.
Definition. An encoding for a language is a one-to-one function .
If is finite, then it is easy to find an encoding for . We are interested mainly in encoding infinite sets. By the definition above, can not be encoded if it is uncountable. We can therefore limit the discussion to that is at most countably infinite by listing some common methods of encoding .
Among the methods liststed, is an enumerated set . In the first method, is assumed to be finite, and countably infinite in the last three. In addition, is assumed to be in the first three methods.
Method 1. First, set . In addition, for the empty word , we set . Next, inductively define on the length of . Suppose now that has been defiend. Set
It is easy to see that if any non-empty word , with , where . Then
Then is an encoding of . This is really just the base- digital representation of integers, where each can be thought of as digits used by the representation. The only difference here is that is not used as a “digit” (every letter gets mapped to a positive integer), except when the word is empty.
For example, let . Then the words , and have code numbers , and .
It is easy to see that the encoding is one-to-one (see proof here (http://planetmath.org/UniquenessOfDigitalRepresentation)).
Method 2. Pick three positive number such that are coprime, with . Set and . Inductively define on length of . Suppose now that has been defined. Then
For example, .
To see that is injective, we make the following series of observations:
-
1.
is injective on . In addition, either or for any .
-
2.
iff .
-
3.
If for some , then . First, note that , and if , then . So suppose , with and . Then . If , then . Since , . But if , and would not be coprime as . On the other hand, if , then , again impossible. So must be a letter, and therefore is .
- 4.
A variation of this method is to set .
If we set and , then the range of is the set of all positive integers.
Method 3. The third method utilizes the uniqueness of prime decomposition of integers. First, define by . Then, for any , with , define
where is the -th prime number (for example, ). We again set . By the fundamental theorem of arithmetic, and the fact that is a bijection, is injective (and maps onto the set of positive integers). This method is known as the multiplicative encoding of Gödel.
Method 4. Once an encoding is found for , an encoding for can be obtained by restricting the domain of to . Depending on how is defined, other methods of encoding via are possible. We illustrate one example.
Let , where are disjoint non-empty finite sets not containing the empty word. Encode as follows: suppose and . Define such that:
-
1.
and , where and ;
-
2.
, where , and .
Essentially, the first digits are reserved for encoding words and . is easily seen to be injective.
Method 5. Let be the language consisting of all words of length . Define by , where is a pairing function that codes pairs of positive integers. Since is an injection (actually maps onto the set of positive integers), so is . Using , one can encode the language of all length words. Define by . Again, is an injection. By induction, one can encode the language of all words of length , for any positive integer .
Method 6. Let be the language consisting of all words of length at most . We can utilize Method 5 to code . First, let , where is a letter not in . Define by , where is the length of . Then , the language of all length words over . It is easy to see that is one-to-one. Then is an encoding for , where is defined in Method 5 that encodes , via the modified version of the pairing function , where .
Method 7. Can Method 5 be used to encode ? The answer is yes. However, a direct extension of does not work. By this we mean that , given by where , though a function, is not injective. For any positive integer , there is a word of length for every , such that . Instead, define so that if , and . It is easy to see that is injective, since both and are (in fact, is a bijection).
Remark. An encoding for can be thought of as a partial function from to , whose domain is . is said to be effective if is a recursive set. Equivalently, the partial function on given by
is Turing-computable. An enumeration of can be thought of as an encoding for . If is finite, any enumeration of is effective. Assume that is effectively enumerated, whether or not is finite (so that in the definition of can be effectively chosen). Then it is not hard to see that all of the encodings described above are effective. In fact, all of the sets described are primitive recursive.
Title | encoding words |
---|---|
Canonical name | EncodingWords |
Date of creation | 2013-03-22 19:06:09 |
Last modified on | 2013-03-22 19:06:09 |
Owner | CWoo (3771) |
Last modified by | CWoo (3771) |
Numerical id | 20 |
Author | CWoo (3771) |
Entry type | Feature |
Classification | msc 94A60 |
Classification | msc 68Q05 |
Classification | msc 68Q45 |
Classification | msc 03D20 |