# 2. Stochastic maps

Any conditional distribution $p(y|x)$ on finite sets  $X$ and $Y$ can be represented as a matrix as follows. Let ${\mathcal{V}}X=\{\varphi:X\rightarrow{\mathbb{R}}\}$ denote the vector space  of real valued functions on $X$ and similarly for $Y$. ${\mathcal{V}}X$ is equipped with Dirac basis $\{\delta_{x}:X\rightarrow{\mathbb{R}}|x\in X\}$, where

 $\delta_{x}(x^{\prime})=\left\{\begin{matrix}1&\mbox{if }x=x^{\prime}\\ 0&\mbox{else}.\end{matrix}\right.$

Given a conditional distribution $p(y|x)$ construct matrix ${\mathfrak{m}}_{p}$ with entry $p(y|x)$ in column $\delta_{x}$ and row $\delta_{y}$. Matrix ${\mathfrak{m}}_{p}$ is stochastic: it has nonnegative entries and its columns sum to 1. Alternatively, given a stochastic matrix ${\mathfrak{m}}:{\mathcal{V}}X\rightarrow{\mathcal{V}}Y$, we can recover the conditional distribution. The Dirac basis induces Euclidean metric

 ${\langle}\bullet|\bullet{\rangle}:{\mathcal{V}}X\otimes{\mathcal{V}}X% \rightarrow{\mathbb{R}}:\left\langle\sum\alpha_{x}\delta_{x}\left|\sum\beta_{x% }\delta_{x}\right\rangle\right.=\sum\alpha_{x}\beta_{x}$ (1)

which identifies vector spaces with their duals ${\mathcal{V}}X\approx({\mathcal{V}}X)^{*}$. Let $p_{\mathfrak{m}}(y|x):={\langle}\delta_{y}|{\mathfrak{m}}(\delta_{x}){\rangle}$.

###### Definition 2.

The category  of stochastic maps ${\mathtt{Stoch}}$ has function spaces ${\mathcal{V}}X$ for objects and stochastic matrices ${\mathfrak{m}}:{\mathcal{V}}X\rightarrow{\mathcal{V}}Y$ with respect to Dirac bases for arrows. We identify of $({\mathcal{V}}X)^{*}$ with ${\mathcal{V}}X$ using the Dirac basis without further comment below.

###### Definition 3.

The dual of surjective  stochastic map ${\mathfrak{m}}:{\mathcal{V}}X\rightarrow{\mathcal{V}}Y$ is the composition ${\mathfrak{m}}^{\natural}:={\mathcal{V}}Y\xrightarrow{{\mathfrak{m}}^{*}\circ ren% }{\mathcal{V}}X$, where $ren$ is the unique map making diagram ${\mathfrak{m}}^{\perp}$ of ${\mathfrak{m}}$ with columns renormalized to sum to 1. The stochastic dual ${\mathfrak{m}}^{\natural}$ is commute. Precomposing ${\mathfrak{m}}^{*}$ with $ren$ renormalizes 11If ${\mathfrak{m}}$ is not surjective, i.e. if one of the rows has all zero entries, then the renormalization is not well-defined. its columns to sum to 1. The stochastic dual of a stochastic transform is stochastic; further, if ${\mathfrak{m}}$ is stochastic then $({\mathfrak{m}}^{\natural})^{\natural}={\mathfrak{m}}$.

Category ${\mathtt{Stoch}}$ is described in terms of braid-like generators and relations in . A more general, but also more complicated, category of conditional distributions was introduced by Giry , see .

###### Example 1 (deterministic functions).

Let $\mathtt{FSet}$ be the category of finite sets. Define faithful functor  ${\mathcal{V}}:\mathtt{FSet}\rightarrow{\mathtt{Stoch}}$ taking set $X$ to ${\mathcal{V}}X$ and function $f:X\rightarrow Y$ to stochastic map ${\mathcal{V}}f:{\mathcal{V}}X\rightarrow{\mathcal{V}}Y:\delta_{x}\mapsto\delta% _{f(x)}$. It is easy to see that ${\mathcal{V}}(X\times Y)={\mathcal{V}}X\otimes{\mathcal{V}}Y$ and ${\mathcal{V}}(X\cup Y)={\mathcal{V}}X\times{\mathcal{V}}Y$.

We introduce special notation for commonly used functions:

• Set inclusion. For any inclusion $i:X\hookrightarrow Y$ of sets, let $\iota:={\mathcal{V}}i:{\mathcal{V}}X\rightarrow{\mathcal{V}}Y$ denote the corresponding stochastic map. Two important examples are

• Point inclusion. Given $x\in X$ define $\iota_{x}:{\mathbb{R}}\rightarrow{\mathcal{V}}X:1\mapsto\delta_{x}$.

• Diagonal map. Inclusion $\Delta:X\hookrightarrow X\times X:x\mapsto(x,x)$ induces $\iota_{\Delta}:{\mathcal{V}}X\rightarrow{\mathcal{V}}X\otimes{\mathcal{V}}X:% \delta_{x}\mapsto\delta_{x}\otimes\delta_{x}$.

• Terminal map. Let $\omega_{X}:{\mathcal{V}}X\rightarrow{\mathbb{R}}:\delta_{x}\mapsto 1$ denote the terminal map induced by $X\rightarrow\{\bullet\}$.

• Projection  . Let $\pi_{XY,X}:{\mathcal{V}}X\otimes{\mathcal{V}}Y\rightarrow{\mathcal{V}}X:\delta% _{x}\otimes\delta_{y}\mapsto\delta_{x}$ denote the projection induced by $pr_{X\times Y,X}:X\times Y\rightarrow X:(x,y)\mapsto x$.

###### Proposition 1 (dual is Bayes over uniform distribution).

The dual of a stochastic map applies Bayes rule to compute the posterior distribution ${\langle}{\mathfrak{m}}^{\natural}(\delta_{y})|\delta_{x}{\rangle}=p_{% \mathfrak{m}}(x|y)$ using the uniform probability distribution.

Proof: The uniform distribution  is the dual $\omega_{X}^{\natural}:{\mathbb{R}}\rightarrow{\mathcal{V}}X:1\mapsto\frac{1}{|% X|}\sum_{x}\delta_{x}$ of the terminal map $\omega_{X}:{\mathcal{V}}X\rightarrow{\mathbb{R}}$. It assigns equal probability $p_{\omega^{\natural}}(x)=\frac{1}{|X|}$ to all of $X$’s elements, and can be characterized as the maximally uninformative distribution  . Let ${\mathfrak{m}}:{\mathcal{V}}X\rightarrow{\mathcal{V}}Y$. The normalized transpose  is

 ${\mathfrak{m}}^{\natural}(\delta_{y})=\sum_{x}\frac{p_{\mathfrak{m}}(y|x)}{% \sum_{x^{\prime}}p_{\mathfrak{m}}(y|x^{\prime})}\delta_{x}=\sum_{x}\frac{p_{% \mathfrak{m}}(y|x)\cdot p_{\omega^{\natural}}(x)}{\sum_{x^{\prime}}p_{% \mathfrak{m}}(y|x^{\prime})p_{\omega^{\natural}}(x^{\prime})}\delta_{x}=\sum_{% x}p_{\mathfrak{m}}(x|y)\cdot\delta_{x}.\,\,\blacksquare$
###### Remark 1.

Note that $p_{\mathfrak{m}}(x|y):={\langle}{\mathfrak{m}}^{\natural}(\delta_{y})|\delta_{% x}{\rangle}\neq{\langle}\delta_{y}|{\mathfrak{m}}(\delta_{x}){\rangle}=:p_{% \mathfrak{m}}(y|x)$. Dirac’s bra-ket notation must be used with care since stochastic matrices are not necessarily symmetric     .

###### Corollary 2 (preimages).

The dual $({\mathcal{V}}f)^{\natural}:{\mathcal{V}}Y\rightarrow{\mathcal{V}}X$ of stochastic map ${\mathcal{V}}f:{\mathcal{V}}X\rightarrow{\mathcal{V}}Y$ is conditional distribution

 $p_{{\mathcal{V}}f}(x|y)=\left\{\begin{matrix}\frac{1}{|f^{-1}(y)|}&\mbox{if }f% (x)=y\\ 0&\mbox{else}.\end{matrix}\right.$ (2)
 $({\mathcal{V}}f)^{\natural}(\delta_{y})=\frac{1}{|f^{-1}(y)|}\sum_{\{x|f(x)=y% \}}\delta_{x}.\,\,\blacksquare$

The support    of $p_{{\mathcal{V}}f}(X|y)$ is $f^{-1}(y)$. Elements in the support are assigned equal probability, thereby treating them as an undifferentiated list. Dual $({\mathcal{V}}f)^{\natural}$ thus generalizes the inverse image  $f^{-1}:Y\rightarrow\underline{2}^{X}$. Conveniently however, the dual $({\mathcal{V}}X)^{\natural}$ simply flips the domain and range of ${\mathcal{V}}f$, whereas the inverse image maps to powerset $\underline{2}^{X}$, an entirely new object.

###### Corollary 3 (marginalization with respect to uniform distribution).

Precomposing ${\mathcal{V}}X\otimes{\mathcal{V}}Y\xrightarrow{{\mathfrak{m}}}{\mathcal{V}}Z$ with the dual $\pi_{X}^{\natural}$ to ${\mathcal{V}}X\otimes{\mathcal{V}}Y\xrightarrow{\pi_{X}}{\mathcal{V}}X$ marginalizes $p_{\mathfrak{m}}(z|x,y)$ over the uniform distribution on $Y$.

Proof: By Corollary 2 we have $\pi^{\natural}_{X}:{\mathcal{V}}X\rightarrow{\mathcal{V}}X\otimes{\mathcal{V}}% Y:\delta_{y}\mapsto\frac{1}{|Y|}\sum_{y\in Y}\delta_{x}\otimes\delta_{y}$. It follows immediately that

 $p_{{\mathfrak{m}}\circ\pi^{\natural}_{X}}(z|x)=\frac{1}{|Y|}\sum_{y\in Y}p_{% \mathfrak{m}}(z|x,y).\,\,\blacksquare$

Precomposing with $\pi_{X}^{\natural}$ treats inputs from $Y$ as extrinsic noise. Although duals can be defined so that they implement Bayes’ rule with respect to other probability distributions, this paper restricts attention to the simplest possible renormalization of columns, Definition 2. The uniform distribution is convenient since it uses minimal  prior knowledge (it depends only on the number of elements in the set) to generalize pre-images to the stochastic case, Proposition 2.

## References

Title 2. Stochastic maps 2StochasticMaps 2014-04-23 0:51:47 2014-04-23 0:51:47 rspuzio (6075) rspuzio (6075) 11 rspuzio (6075) Feature msc 60J20