# Arithmetic encoding

*Arithmetic coding* is a technique for achieving near-optimal entropy encoding.

An arithmetic encoder takes a string of symbols as input and produces a rational number in the interval $[0,1)$ as output. As each symbol is processed, the encoder will restrict the output to a smaller interval.

Let $N$ be the number of distinct symbols in the input; let $x_{1},x_{2},\dots,x_{N}$ represent the symbols, and let $P_{1},P_{2},\dots,P_{N}$ represent the probability of each symbol appearing. At each step in the process, the output is restricted to the current interval $[y,y+R)$. Partition this interval into $N$ disjoint subintervals:

$$I_{1}=[y,\;y+P_{1}R)$$

$$I_{2}=[y+P_{1}R,\;y+P_{1}R+P_{2}R)$$

$$\vdots$$

$$I_{N}=\left[y+\sum_{i=1}^{N-1}P_{i}R,\;y+R\right)$$

Therefore the size of $I_{i}$ is $P_{i}R$. If the next symbol is $x_{i}$, then restrict the output to the new interval $I_{i}$.
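This restriction step can be sketched in Python. The sketch below uses exact rationals from the standard `fractions` module to avoid floating-point issues; the function name and the `probs` mapping are illustrative assumptions, not part of the original description.

```python
from fractions import Fraction

def arithmetic_encode(message, probs):
    """Encode `message` by repeatedly restricting the interval [y, y+R).

    `probs` maps each symbol to its probability (Fractions summing to 1);
    the dict's iteration order fixes the symbol order x_1 ... x_N.
    Returns the final interval as the pair (y, R).
    """
    symbols = list(probs)
    y, R = Fraction(0), Fraction(1)          # start with all of [0, 1)
    for s in message:
        # cumulative probability of the symbols that precede s
        low = sum((probs[x] for x in symbols[:symbols.index(s)]), Fraction(0))
        # restrict the output to the subinterval I_i of size P_i * R
        y, R = y + low * R, probs[s] * R
    return y, R
```

For example, with probabilities 1/2, 1/4, 1/4 for symbols `a`, `b`, `c`, the message `"aab"` ends in the interval $[1/8, 3/16)$ of size $1/16 = \tfrac12\cdot\tfrac12\cdot\tfrac14$.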

Note that at each stage, all the possible intervals are pairwise disjoint. Therefore a specific sequence of symbols produces exactly one unique output range, and the process can be reversed.
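The reversal can be sketched as follows: given any number in the final interval, the decoder finds which subinterval contains it, emits that symbol, and rescales. As with any arithmetic decoder, it must know the probabilities and the message length (or use an end-of-message symbol); the function name here is an illustrative assumption.

```python
from fractions import Fraction

def arithmetic_decode(value, probs, length):
    """Recover `length` symbols from a rational `value` in the final interval."""
    symbols = list(probs)
    out = []
    for _ in range(length):
        # find the subinterval I_i that contains `value`
        low = Fraction(0)
        for s in symbols:
            high = low + probs[s]
            if low <= value < high:
                out.append(s)
                # rescale `value` back to [0, 1) relative to I_i
                value = (value - low) / probs[s]
                break
            low = high
    return "".join(out)
```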

Since arithmetic encoders are typically implemented on binary computers, the actual output of the encoder is generally the shortest sequence of bits representing the fractional part of a rational number in the final interval.
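One way to find such a bit sequence is a greedy search: extend the binary fraction one bit at a time, always taking the largest value that stays below the interval's upper end, and stop as soon as the value lands inside the interval. This is a sketch of that idea, not a production bit-emitting encoder (real implementations emit bits incrementally with finite-precision registers).

```python
from fractions import Fraction

def shortest_bits(y, R):
    """Return the shortest bit string b1 b2 ... with 0.b1b2... in [y, y+R)."""
    bits = []
    v, step = Fraction(0), Fraction(1)   # v is the value of the bits so far
    while not (y <= v < y + R):
        step /= 2
        if v + step < y + R:             # largest value still below the interval's top
            v += step
            bits.append('1')
        else:
            bits.append('0')
    return ''.join(bits)
```

For the interval $[1/8, 3/16)$ this yields `001`, i.e. the binary fraction $0.001_2 = 1/8$.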

Suppose our entire input string contains $M$ symbols: then $x_{i}$ appears exactly $P_{i}M$ times in the input. Therefore, the size of the final interval will be

$$R_{f}=\prod_{i=1}^{N}P_{i}^{P_{i}M}$$

The number of bits necessary to write a binary fraction in this range is

$$\begin{aligned}
-\log_{2}R_{f} &= -\log_{2}\prod_{i=1}^{N}P_{i}^{P_{i}M} \\
&= \sum_{i=1}^{N}-\log_{2}P_{i}^{P_{i}M} \\
&= \sum_{i=1}^{N}-P_{i}M\log_{2}P_{i}
\end{aligned}$$

By Shannon’s source coding theorem, this is the total entropy of the original message. Therefore arithmetic encoding is near-optimal entropy encoding.
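The agreement between the interval size and the entropy can be checked numerically. The message `"aabc"` and its probabilities below are an illustrative assumption chosen so that each $x_i$ appears exactly $P_iM$ times.

```python
from math import log2, prod

# Hypothetical 4-symbol message "aabc" (M = 4) with matching probabilities.
probs = {'a': 0.5, 'b': 0.25, 'c': 0.25}
counts = {'a': 2, 'b': 1, 'c': 1}        # each count equals P_i * M

# Size of the final interval: R_f = product of P_i^(P_i * M)
R_f = prod(p ** counts[s] for s, p in probs.items())

bits_needed = -log2(R_f)                                       # -log2 R_f
entropy = sum(-counts[s] * log2(p) for s, p in probs.items())  # total entropy
# both come out to 6.0 bits for this message
```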

## Mathematics Subject Classification

68P30, 94A24

