2. Stochastic maps


Any conditional distribution p(y|x) on finite setsMathworldPlanetmath X and Y can be represented as a matrix as follows. Let 𝒱X={φ:X} denote the vector spaceMathworldPlanetmath of real valued functions on X and similarly for Y. 𝒱X is equipped with Dirac basis {δx:X|xX}, where

δx(x)={1if x=x0else.

Given a conditional distribution p(y|x) construct matrix 𝔪p with entry p(y|x) in column δx and row δy. Matrix 𝔪p is stochastic: it has nonnegative entries and its columns sum to 1. Alternatively, given a stochastic matrix 𝔪:𝒱X𝒱Y, we can recover the conditional distribution. The Dirac basis induces Euclidean metric

|:𝒱X𝒱X:αxδx|βxδx=αxβx (1)

which identifies vector spaces with their duals 𝒱X(𝒱X)*. Let p𝔪(y|x):=δy|𝔪(δx).

Definition 2.

The categoryMathworldPlanetmath of stochastic maps 𝚂𝚝𝚘𝚌𝚑 has function spaces 𝒱X for objects and stochastic matrices 𝔪:𝒱X𝒱Y with respect to Dirac bases for arrows. We identify of (𝒱X)* with 𝒱X using the Dirac basis without further comment below.

Definition 3.

The dual of surjectivePlanetmathPlanetmath stochastic map 𝔪:𝒱X𝒱Y is the composition 𝔪:=𝒱Y𝔪*ren𝒱X, where ren is the unique map making diagram 𝔪 of 𝔪 with columns renormalized to sum to 1. The stochastic dual 𝔪 is

commute. Precomposing 𝔪* with ren renormalizes 11If 𝔪 is not surjective, i.e. if one of the rows has all zero entries, then the renormalization is not well-defined. its columns to sum to 1. The stochastic dual of a stochastic transform is stochastic; further, if 𝔪 is stochastic then (𝔪)=𝔪.

Category 𝚂𝚝𝚘𝚌𝚑 is described in terms of braid-like generators and relations in [2]. A more general, but also more complicated, category of conditional distributions was introduced by Giry [3], see [5].

Example 1 (deterministic functions).

Let 𝙵𝚂𝚎𝚝 be the category of finite sets. Define faithful functorMathworldPlanetmath 𝒱:𝙵𝚂𝚎𝚝𝚂𝚝𝚘𝚌𝚑 taking set X to 𝒱X and function f:XY to stochastic map 𝒱f:𝒱X𝒱Y:δxδf(x). It is easy to see that 𝒱(X×Y)=𝒱X𝒱Y and 𝒱(XY)=𝒱X×𝒱Y.

We introduce special notation for commonly used functions:

  • Set inclusion. For any inclusion i:XY of sets, let ι:=𝒱i:𝒱X𝒱Y denote the corresponding stochastic map. Two important examples are

    • Point inclusion. Given xX define ιx:𝒱X:1δx.

    • Diagonal map. Inclusion Δ:XX×X:x(x,x) induces ιΔ:𝒱X𝒱X𝒱X:δxδxδx.

  • Terminal map. Let ωX:𝒱X:δx1 denote the terminal map induced by X{}.

  • ProjectionPlanetmathPlanetmath. Let πXY,X:𝒱X𝒱Y𝒱X:δxδyδx denote the projection induced by prX×Y,X:X×YX:(x,y)x.

Proposition 1 (dual is Bayes over uniform distribution).

The dual of a stochastic map applies Bayes rule to compute the posterior distribution m(δy)|δx=pm(x|y) using the uniform probability distribution.

Proof: The uniform distributionMathworldPlanetmath is the dual ωX:𝒱X:11|X|xδx of the terminal map ωX:𝒱X. It assigns equal probability pω(x)=1|X| to all of X’s elements, and can be characterized as the maximally uninformative distributionPlanetmathPlanetmath [4]. Let 𝔪:𝒱X𝒱Y. The normalized transposeMathworldPlanetmath is

𝔪(δy)=xp𝔪(y|x)xp𝔪(y|x)δx=xp𝔪(y|x)pω(x)xp𝔪(y|x)pω(x)δx=xp𝔪(x|y)δx.
Remark 1.

Note that p𝔪(x|y):=𝔪(δy)|δxδy|𝔪(δx)=:p𝔪(y|x). Dirac’s bra-ket notation must be used with care since stochastic matrices are not necessarily symmetricMathworldPlanetmathPlanetmathPlanetmathPlanetmath [1].

Corollary 2 (preimages).

The dual (Vf):VYVX of stochastic map Vf:VXVY is conditional distribution

p𝒱f(x|y)={1|f-1(y)|if f(x)=y0𝑒𝑙𝑠𝑒. (2)

Proof: By the proof of PropositionPlanetmathPlanetmath 1

(𝒱f)(δy)=1|f-1(y)|{x|f(x)=y}δx.

The supportMathworldPlanetmathPlanetmathPlanetmath of p𝒱f(X|y) is f-1(y). Elements in the support are assigned equal probability, thereby treating them as an undifferentiated list. Dual (𝒱f) thus generalizes the inverse imagePlanetmathPlanetmath f-1:Y2¯X. Conveniently however, the dual (𝒱X) simply flips the domain and range of 𝒱f, whereas the inverse image maps to powerset 2¯X, an entirely new object.

Corollary 3 (marginalization with respect to uniform distribution).

Precomposing VXVYmVZ with the dual πX to VXVYπXVX marginalizes pm(z|x,y) over the uniform distribution on Y.

Proof: By Corollary 2 we have πX:𝒱X𝒱X𝒱Y:δy1|Y|yYδxδy. It follows immediately that

p𝔪πX(z|x)=1|Y|yYp𝔪(z|x,y).

Precomposing with πX treats inputs from Y as extrinsic noise. Although duals can be defined so that they implement Bayes’ rule with respect to other probability distributions, this paper restricts attention to the simplest possible renormalization of columns, Definition 2. The uniform distribution is convenient since it uses minimalPlanetmathPlanetmath prior knowledge (it depends only on the number of elements in the set) to generalize pre-images to the stochastic case, Proposition 2.

References

  • 1 P A M Dirac (1958): The Principles of Quantum Mechanics. Oxford University Press.
  • 2 Tobias Fritz (2009): A presentationMathworldPlanetmath of the category of stochastic matrices. arXiv:0902.2554v1 .
  • 3 M Giry (1981): A categoricalPlanetmathPlanetmath approach to probability theory. In B Banaschewski, editor: Categorical Aspects of Topology and Analysis, Springer.
  • 4 E T Jaynes (1957): Information theory and statistical mechanics. Phys. Rev. 106(4), pp. 620–630.
  • 5 P Panangaden (1998): Probabilistic relationsMathworldPlanetmathPlanetmath. In C Baier, M Huth, M Kwiatkowska & M Ryan, editors: PROBMIV’98, pp. 59–74.
Title 2. Stochastic maps
Canonical name 2StochasticMaps
Date of creation 2014-04-23 0:51:47
Last modified on 2014-04-23 0:51:47
Owner rspuzio (6075)
Last modified by rspuzio (6075)
Numerical id 11
Author rspuzio (6075)
Entry type Feature
Classification msc 60J20