2. Stochastic maps

Any conditional distribution p(y|x) on finite setsMathworldPlanetmath X and Y can be represented as a matrix as follows. Let 𝒱⁢X={φ:X→ℝ} denote the vector spaceMathworldPlanetmath of real valued functions on X and similarly for Y. 𝒱⁢X is equipped with Dirac basis {δx:X→ℝ|x∈X}, where

δx(x′)={1if ⁢x=x′0else.

Given a conditional distribution p(y|x) construct matrix 𝔪p with entry p(y|x) in column δx and row δy. Matrix 𝔪p is stochastic: it has nonnegative entries and its columns sum to 1. Alternatively, given a stochastic matrix 𝔪:𝒱⁢X→𝒱⁢Y, we can recover the conditional distribution. The Dirac basis induces Euclidean metric

⟨∙|∙⟩:𝒱⁢X⊗𝒱⁢X→ℝ:⟨∑αx⁢δx|∑βx⁢δx⟩=∑αx⁢βx (1)

which identifies vector spaces with their duals 𝒱⁢X≈(𝒱⁢X)*. Let p𝔪(y|x):=⟨δy|𝔪(δx)⟩.

Definition 2.

The categoryMathworldPlanetmath of stochastic maps 𝚂𝚝𝚘𝚌𝚑 has function spaces 𝒱⁢X for objects and stochastic matrices 𝔪:𝒱⁢X→𝒱⁢Y with respect to Dirac bases for arrows. We identify of (𝒱⁢X)* with 𝒱⁢X using the Dirac basis without further comment below.

Definition 3.

The dual of surjectivePlanetmathPlanetmath stochastic map 𝔪:𝒱⁢X→𝒱⁢Y is the composition 𝔪♮:=𝒱⁢Y→𝔪*∘r⁢e⁢n𝒱⁢X, where r⁢e⁢n is the unique map making diagram 𝔪⟂ of 𝔪 with columns renormalized to sum to 1. The stochastic dual 𝔪♮ is

commute. Precomposing 𝔪* with r⁢e⁢n renormalizes 11If 𝔪 is not surjective, i.e. if one of the rows has all zero entries, then the renormalization is not well-defined. its columns to sum to 1. The stochastic dual of a stochastic transform is stochastic; further, if 𝔪 is stochastic then (𝔪♮)♮=𝔪.

Category 𝚂𝚝𝚘𝚌𝚑 is described in terms of braid-like generators and relations in [2]. A more general, but also more complicated, category of conditional distributions was introduced by Giry [3], see [5].

Example 1 (deterministic functions).

Let 𝙵𝚂𝚎𝚝 be the category of finite sets. Define faithful functorMathworldPlanetmath 𝒱:𝙵𝚂𝚎𝚝→𝚂𝚝𝚘𝚌𝚑 taking set X to 𝒱⁢X and function f:X→Y to stochastic map 𝒱⁢f:𝒱⁢X→𝒱⁢Y:δx↦δf⁢(x). It is easy to see that 𝒱⁢(X×Y)=𝒱⁢X⊗𝒱⁢Y and 𝒱⁢(X∪Y)=𝒱⁢X×𝒱⁢Y.

We introduce special notation for commonly used functions:

  • •

    Set inclusion. For any inclusion i:X↪Y of sets, let ι:=𝒱⁢i:𝒱⁢X→𝒱⁢Y denote the corresponding stochastic map. Two important examples are

    • –

      Point inclusion. Given x∈X define ιx:ℝ→𝒱⁢X:1↦δx.

    • –

      Diagonal map. Inclusion Δ:X↪X×X:x↦(x,x) induces ιΔ:𝒱⁢X→𝒱⁢X⊗𝒱⁢X:δx↦δx⊗δx.

  • •

    Terminal map. Let ωX:𝒱⁢X→ℝ:δx↦1 denote the terminal map induced by X→{∙}.

  • •

    ProjectionPlanetmathPlanetmath. Let πX⁢Y,X:𝒱⁢X⊗𝒱⁢Y→𝒱⁢X:δx⊗δy↦δx denote the projection induced by p⁢rX×Y,X:X×Y→X:(x,y)↦x.

Proposition 1 (dual is Bayes over uniform distribution).

The dual of a stochastic map applies Bayes rule to compute the posterior distribution ⟨m♮(δy)|δx⟩=pm(x|y) using the uniform probability distribution.

Proof: The uniform distributionMathworldPlanetmath is the dual ωX♮:ℝ→𝒱⁢X:1↦1|X|⁢∑xδx of the terminal map ωX:𝒱⁢X→ℝ. It assigns equal probability pω♮⁢(x)=1|X| to all of X’s elements, and can be characterized as the maximally uninformative distributionPlanetmathPlanetmath [4]. Let 𝔪:𝒱⁢X→𝒱⁢Y. The normalized transposeMathworldPlanetmath is

Remark 1.

Note that p𝔪(x|y):=⟨𝔪♮(δy)|δx⟩≠⟨δy|𝔪(δx)⟩=:p𝔪(y|x). Dirac’s bra-ket notation must be used with care since stochastic matrices are not necessarily symmetricMathworldPlanetmathPlanetmathPlanetmathPlanetmath [1].

Corollary 2 (preimages).

The dual (V⁢f)♮:V⁢Y→V⁢X of stochastic map V⁢f:V⁢X→V⁢Y is conditional distribution

p𝒱⁢f(x|y)={1|f-1⁢(y)|if ⁢f⁢(x)=y0𝑒𝑙𝑠𝑒. (2)

Proof: By the proof of PropositionPlanetmathPlanetmath 1


The supportMathworldPlanetmathPlanetmathPlanetmath of p𝒱⁢f(X|y) is f-1⁢(y). Elements in the support are assigned equal probability, thereby treating them as an undifferentiated list. Dual (𝒱⁢f)♮ thus generalizes the inverse imagePlanetmathPlanetmath f-1:Y→2¯X. Conveniently however, the dual (𝒱⁢X)♮ simply flips the domain and range of 𝒱⁢f, whereas the inverse image maps to powerset 2¯X, an entirely new object.

Corollary 3 (marginalization with respect to uniform distribution).

Precomposing V⁢X⊗V⁢Y→mV⁢Z with the dual πX♮ to V⁢X⊗V⁢Y→πXV⁢X marginalizes pm(z|x,y) over the uniform distribution on Y.

Proof: By Corollary 2 we have πX♮:𝒱⁢X→𝒱⁢X⊗𝒱⁢Y:δy↦1|Y|⁢∑y∈Yδx⊗δy. It follows immediately that


Precomposing with πX♮ treats inputs from Y as extrinsic noise. Although duals can be defined so that they implement Bayes’ rule with respect to other probability distributions, this paper restricts attention to the simplest possible renormalization of columns, Definition 2. The uniform distribution is convenient since it uses minimalPlanetmathPlanetmath prior knowledge (it depends only on the number of elements in the set) to generalize pre-images to the stochastic case, Proposition 2.


  • 1 P A M Dirac (1958): The Principles of Quantum Mechanics. Oxford University Press.
  • 2 Tobias Fritz (2009): A presentationMathworldPlanetmath of the category of stochastic matrices. arXiv:0902.2554v1 .
  • 3 M Giry (1981): A categoricalPlanetmathPlanetmath approach to probability theory. In B Banaschewski, editor: Categorical Aspects of Topology and Analysis, Springer.
  • 4 E T Jaynes (1957): Information theory and statistical mechanics. Phys. Rev. 106(4), pp. 620–630.
  • 5 P Panangaden (1998): Probabilistic relationsMathworldPlanetmathPlanetmath. In C Baier, M Huth, M Kwiatkowska & M Ryan, editors: PROBMIV’98, pp. 59–74.
Title 2. Stochastic maps
Canonical name 2StochasticMaps
Date of creation 2014-04-23 0:51:47
Last modified on 2014-04-23 0:51:47
Owner rspuzio (6075)
Last modified by rspuzio (6075)
Numerical id 11
Author rspuzio (6075)
Entry type Feature
Classification msc 60J20