|
Let $A$ be an alphabet. A code over $A$ is any subset $C$ of the set of words $A^*$ on the alphabet $A$ such that $C$ has ``uniquue factorization into letters,'' i.e., such that for whenever $a_1\ldots a_n=b_1\ldots b_m$ , with all $a_i,b_j\in C$ , then we have $n=m$ and $a_i=b_i$ for all $i$ . In other words, every ``word over $A$ '' generated by
$C$ (considered as an alphabet) can be uniquely factored into ``letters'' in C.
An example of a subset of $A^*$ which is not a code is given by $C=\lbrace ab, c, a, bc \rbrace$ . Here the word $abc$ can be written either as $(ab)c$ or as $a(bc)$ in terms of elements of $C$ . Since $ab \ne a$ nor $c\ne bc$ , $C$ is not a code.
If we fix a length $n$ for the words, i.e. we require that $C\subset A^n$ , then we call $C$ a block code, and call $n$ the block length of the code. An important property of a code is the code's minimum distance, given by the minimum Hamming distance between any pair of words in $C$ .
This notion of code is obviously very general. In practice (i.e., in coding theory) one typically takes codes with a little more structure. See, in particular, linear codes.
|