5.6 The general syntax of inductive definitions

So far, we have been discussing only particular inductive types: $\mathbf{0}$ , $\mathbf{1}$ , $\mathbf{2}$ , $\mathbb{N}$ , coproducts, products, $\Sigma$ -types, $\mathsf{W}$ -types, etc. However, an important aspect of type theory is the ability to define new inductive types, rather than being restricted only to some particular fixed list of them. In order to be able to do this, however, we need to know what sorts of “inductive definitions” are valid or reasonable.

To see that not everything which “looks like an inductive definition” makes sense, consider the following “constructor” of a type $C$ :

•

$g:(C\to\mathbb{N})\to C$ .

The recursion principle for such a type $C$ ought to say that given a type $P$ , in order to construct a function $f:C\to P$ , it suffices to consider the case when the input $c : C$ is of the form $g(\alpha)$ for some $\alpha:C\to\mathbb{N}$ . Moreover, we would expect to be able to use the “recursive data” of $f$ applied to $\alpha$ in some way. However, it is not at all clear how to “apply $f$ to $\alpha$ ”, since both are functions with domain $C$ .

We could write down a “recursion principle” for $C$ by just supposing (unjustifiably) that there is some way to apply $f$ to $\alpha$ and obtain a function $P\to\mathbb{N}$ . Then the input to the recursion rule would ask for a type $P$ together with a function

h:(C\to\mathbb{N})\to(P\to\mathbb{N})\to P

(5.6.1)

where the two arguments of $h$ are $\alpha$ and “the result of applying $f$ to $\alpha$ ”. However, what would the computation rule for the resulting function $f:C\to P$ be? Looking at other computation rules, we would expect something like “ $f(g(\alpha))\equiv h(\alpha,f(\alpha))$ ” for $\alpha:C\to\mathbb{N}$ , but as we have seen, “ $f(\alpha)$ ” does not make sense. The induction principle of $C$ is even more problematic; it’s not even clear how to write down the hypotheses. (See also http://planetmath.org/node/87788Exercise 5.7,http://planetmath.org/node/87866Exercise 5.8.)

This example suggests one restriction on inductive definitions: the domains of all the constructors must be covariant functors of the type being defined, so that we can “apply $f$ to them” to get the result of the “recursive call”. In other words, if we replace all occurrences of the type being defined with a variable $X:\mathcal{U}$ , then each domain of a constructor must be an expression that can be made into a covariant functor of $X$ . This is the case for all the examples we have considered so far. For instance, with the constructor ${\mathsf{inl}}:A\to A+B$ , the relevant functor is constant at $A$ (i.e. $X\mapsto A$ ), while for the constructor $\mathsf{succ}:\mathbb{N}\to\mathbb{N}$ , the functor is the identity functor ( $X\mapsto X$ ).

However, this necessary condition is also not sufficient. Covariance prevents the inductive type from occurring on the left of a single function type, as in the argument $C\to\mathbb{N}$ of the “constructor” $g$ considered above, since this yields a contravariant functor rather than a covariant one. However, since the composite of two contravariant functors is covariant, double function types such as $((X\to\mathbb{N})\to\mathbb{N})$ are once again covariant. This enables us to reproduce Cantorian-style paradoxes.

For instance, consider an “inductive type” $D$ with the following constructor:

•

$k:((D\to\mathsf{Prop})\to\mathsf{Prop})\to D$ .

Assuming such a type exists, we define functions

	$\displaystyle r$	$\displaystyle:D\to(D\to\mathsf{Prop})\to\mathsf{Prop},$
	$\displaystyle f$	$\displaystyle:(D\to\mathsf{Prop})\to D,$
	$\displaystyle p$	$\displaystyle:(D\to\mathsf{Prop})\to(D\to\mathsf{Prop})\to\mathsf{Prop},$
by
	$\displaystyle r(k(\theta))$	$\displaystyle:\!\!\equiv\theta,$
	$\displaystyle f(\delta)$	$\displaystyle:\!\!\equiv k({\lambda}x.\,(x=\delta)),$
	$\displaystyle p(\delta)$	$\displaystyle:\!\!\equiv{\lambda}x.\,\delta(f(x)).$

Here $r$ is defined by the recursion principle of $D$ , while $f$ and $p$ are defined explicitly. Then for any $\delta:D\to\mathsf{Prop}$ , we have $r(f(\delta))={\lambda}x.\,(x=\delta)$ .

In particular, therefore, if $f(\delta)=f(\delta^{\prime})$ , then we have a path $s:({\lambda}x.\,(x=\delta))=({\lambda}x.\,(x=\delta^{\prime}))$ . Thus, $\mathsf{happly}(s,\delta):(\delta=\delta)=(\delta=\delta^{\prime})$ , and so in particular $\delta=\delta^{\prime}$ holds. Hence, $f$ is “injective” (although a priori $D$ may not be a set). This already sounds suspicious — we have an “injection” of the “power set” of $D$ into $D$ — and with a little more work we can massage it into a contradiction.

Suppose given $\theta:(D\to\mathsf{Prop})\to\mathsf{Prop}$ , and define $\delta:D\to\mathsf{Prop}$ by

\delta(d):\!\!\equiv\exists(\gamma:D\to\mathsf{Prop}).\,(f(\gamma)=d)\times% \theta(\gamma).

(5.6.2)

We claim that $p(\delta)=\theta$ . By function extensionality, it suffices to show $p(\delta)(\gamma)=_{\mathsf{Prop}}\theta(\gamma)$ for any $\gamma:D\to\mathsf{Prop}$ . And by univalence, for this it suffices to show that each implies the other. Now by definition of $p$ , we have

	$\displaystyle p(\delta)(\gamma)$	$\displaystyle\equiv\delta(f(\gamma))$
		$\displaystyle\equiv\exists(\gamma^{\prime}:D\to\mathsf{Prop}).\,(f(\gamma^{% \prime})=f(\gamma))\times\theta(\gamma^{\prime}).$

Clearly this holds if $\theta(\gamma)$ , since we may take $\gamma^{\prime}:\!\!\equiv\gamma$ . On the other hand, if we have $\gamma^{\prime}$ with $f(\gamma^{\prime})=f(\gamma)$ and $\theta(\gamma^{\prime})$ , then $\gamma^{\prime}=\gamma$ since $f$ is injective, hence also $\theta(\gamma)$ .

This completes the proof that $p(\delta)=\theta$ . Thus, every element $\theta:(D\to\mathsf{Prop})\to\mathsf{Prop}$ is the image under $p$ of some element $\delta:D\to\mathsf{Prop}$ . However, if we define $\theta$ by a classic diagonalization:

\theta(\gamma):\!\!\equiv\neg p(\gamma)(\gamma)\quad\text{for all $\gamma:D\to% \mathsf{Prop}$}

then from $\theta=p(\delta)$ we deduce $p(\delta)(\delta)=\neg p(\delta)(\delta)$ . This is a contradiction: no proposition can be equivalent to its negation. (Supposing $P\Leftrightarrow\neg P$ , if $P$ , then $\neg P$ , and so $\mathbf{0}$ ; hence $\neg P$ , but then $P$ , and so $\mathbf{0}$ .)

Remark 5.6.3.

There is a question of universe size to be addressed. In general, an inductive type must live in a universe that already contains all the types going into its definition. Thus if in the definition of $D$ , the ambiguous notation $\mathsf{Prop}$ means $\mathsf{Prop}_{\mathcal{U}}$ , then we do not have $D:\mathcal{U}$ but only $D:\mathcal{U}^{\prime}$ for some larger universe $\mathcal{U}^{\prime}$ with $\mathcal{U}:\mathcal{U}^{\prime}$ . In a predicative theory, therefore, the right-hand side of (5.6.2) lives in $\mathsf{Prop}_{\mathcal{U}^{\prime}}$ , not $\mathsf{Prop}_{\mathcal{U}}$ . So this contradiction does require the propositional resizing axiom mentioned in §3.5 (http://planetmath.org/35subsetsandpropositionalresizing).

This counterexample suggests that we should ban an inductive type from ever appearing on the left of an arrow in the domain of its constructors, even if that appearance is nested in other arrows so as to eventually become covariant. (Similarly, we also forbid it from appearing in the domain of a dependent function type.) This restriction is called strict positivity (ordinary “positivity” being essentially covariance), and it turns out to suffice.

In conclusion, therefore, a valid inductive definition of a type $W$ consists of a list of constructors. Each constructor is assigned a type that is a function type taking some number (possibly zero) of inputs (possibly dependent on one another) and returning an element of $W$ . Finally, we allow $W$ itself to occur in the input types of its constructors, but only strictly positively. This essentially means that each argument of a constructor is either a type not involving $W$ , or some iterated function type with codomain $W$ . For instance, the following is a valid constructor type:

c:(A\to W)\to(B\to C\to W)\to D\to W\to W.

(5.6.4)

All of these function types can also be dependent functions ( $\Pi$ -types).¹¹In the language of §5.4 (http://planetmath.org/54inductivetypesareinitialalgebras), the condition of strict positivity ensures that the relevant endofunctor is polynomial. It is well-known in category theory that not all endofunctors can have initial algebras; restricting to polynomial functors ensures consistency. One can consider various relaxations of this condition, but in this book we will restrict ourselves to strict positivity as defined here.

Note we require that an inductive definition is given by a finite list of constructors. This is simply because we have to write it down on the page. If we want an inductive type which behaves as if it has an infinite number of constructors, we can simply parametrize one constructor by some infinite type. For instance, a constructor such as $\mathbb{N}\to W\to W$ can be thought of as equivalent to countably many constructors of the form $W\to W$ . (Of course, the infinity is now internal to the type theory, but this is as it should be for any foundational system.) Similarly, if we want a constructor that takes “infinitely many arguments”, we can allow it to take a family of arguments parametrized by some infinite type, such as $(\mathbb{N}\to W)\to W$ which takes an infinite sequence of elements of $W$ .

Now, once we have such an inductive definition, what can we do with it? Firstly, there is a recursion principle stating that in order to define a function $f:W\to P$ , it suffices to consider the case when the input $w : W$ arises from one of the constructors, allowing ourselves to recursively call $f$ on the inputs to that constructor. For the example constructor (5.6.4), we would require $P$ to be equipped with a function of type

d:(A\to W)\to(A\to P)\to(B\to C\to W)\to(B\to C\to P)\to D\to W\to P\to P.

(5.6.5)

Under these hypotheses, the recursion principle yields $f:W\to P$ , which moreover “preserves the constructor data” in the evident way — this is the computation rule, where we use covariance of the inputs. For instance, in the example (5.6.4), the computation rule says that for any $\alpha:A\to W$ , $\beta:B\to C\to W$ , $\delta:d$ , and $\omega:W$ , we have

f(c(\alpha,\beta,\delta,\omega))\equiv d(\alpha,f\circ\alpha,\beta,f\circ\beta% ,\delta,\omega,f(\omega)).

(5.6.5)

The induction principle for a general inductive type $W$ is only a little more complicated. Of course, we start with a type family $P:W\to\mathcal{U}$ , which we require to be equipped with constructor data “lying over” the constructor data of $W$ . That means the “recursive call” arguments such as $A\to P$ above must be replaced by dependent functions with types such as $\mathchoice{\prod_{a:A}\,}{\mathchoice{{\textstyle\prod_{(a:A)}}}{\prod_{(a:A)% }}{\prod_{(a:A)}}{\prod_{(a:A)}}}{\mathchoice{{\textstyle\prod_{(a:A)}}}{\prod% _{(a:A)}}{\prod_{(a:A)}}{\prod_{(a:A)}}}{\mathchoice{{\textstyle\prod_{(a:A)}}% }{\prod_{(a:A)}}{\prod_{(a:A)}}{\prod_{(a:A)}}}P(\alpha(a))$ . In the full example of (5.6.4), the corresponding hypothesis for the induction principle would require

\displaystyle d:\mathchoice{\prod_{\alpha:A\to W}\,}{\mathchoice{{\textstyle% \prod_{(\alpha:A\to W)}}}{\prod_{(\alpha:A\to W)}}{\prod_{(\alpha:A\to W)}}{% \prod_{(\alpha:A\to W)}}}{\mathchoice{{\textstyle\prod_{(\alpha:A\to W)}}}{% \prod_{(\alpha:A\to W)}}{\prod_{(\alpha:A\to W)}}{\prod_{(\alpha:A\to W)}}}{% \mathchoice{{\textstyle\prod_{(\alpha:A\to W)}}}{\prod_{(\alpha:A\to W)}}{% \prod_{(\alpha:A\to W)}}{\prod_{(\alpha:A\to W)}}}\Bigl{(}\mathchoice{\prod_{a% :A}\,}{\mathchoice{{\textstyle\prod_{(a:A)}}}{\prod_{(a:A)}}{\prod_{(a:A)}}{% \prod_{(a:A)}}}{\mathchoice{{\textstyle\prod_{(a:A)}}}{\prod_{(a:A)}}{\prod_{(% a:A)}}{\prod_{(a:A)}}}{\mathchoice{{\textstyle\prod_{(a:A)}}}{\prod_{(a:A)}}{% \prod_{(a:A)}}{\prod_{(a:A)}}}P(\alpha(a))\Bigr{)}\to\mathchoice{\prod_{\beta:% B\to C\to W}\,}{\mathchoice{{\textstyle\prod_{(\beta:B\to C\to W)}}}{\prod_{(% \beta:B\to C\to W)}}{\prod_{(\beta:B\to C\to W)}}{\prod_{(\beta:B\to C\to W)}}% }{\mathchoice{{\textstyle\prod_{(\beta:B\to C\to W)}}}{\prod_{(\beta:B\to C\to W% )}}{\prod_{(\beta:B\to C\to W)}}{\prod_{(\beta:B\to C\to W)}}}{\mathchoice{{% \textstyle\prod_{(\beta:B\to C\to W)}}}{\prod_{(\beta:B\to C\to W)}}{\prod_{(% \beta:B\to C\to W)}}{\prod_{(\beta:B\to C\to W)}}}\Bigl{(}\mathchoice{\prod_{(% b:B)}\,}{\mathchoice{{\textstyle\prod_{(b:B)}}}{\prod_{(b:B)}}{\prod_{(b:B)}}{% \prod_{(b:B)}}}{\mathchoice{{\textstyle\prod_{(b:B)}}}{\prod_{(b:B)}}{\prod_{(% b:B)}}{\prod_{(b:B)}}}{\mathchoice{{\textstyle\prod_{(b:B)}}}{\prod_{(b:B)}}{% \prod_{(b:B)}}{\prod_{(b:B)}}}\mathchoice{\prod_{(c:C)}\,}{\mathchoice{{% \textstyle\prod_{(c:C)}}}{\prod_{(c:C)}}{\prod_{(c:C)}}{\prod_{(c:C)}}}{% \mathchoice{{\textstyle\prod_{(c:C)}}}{\prod_{(c:C)}}{\prod_{(c:C)}}{\prod_{(c% :C)}}}{\mathchoice{{\textstyle\prod_{(c:C)}}}{\prod_{(c:C)}}{\prod_{(c:C)}}{% \prod_{(c:C)}}}P(\beta(b,c))\Bigr{)}\to\\ \displaystyle\mathchoice{\prod_{\delta:D}\,}{\mathchoice{{\textstyle\prod_{(% \delta:D)}}}{\prod_{(\delta:D)}}{\prod_{(\delta:D)}}{\prod_{(\delta:D)}}}{% \mathchoice{{\textstyle\prod_{(\delta:D)}}}{\prod_{(\delta:D)}}{\prod_{(\delta% :D)}}{\prod_{(\delta:D)}}}{\mathchoice{{\textstyle\prod_{(\delta:D)}}}{\prod_{% (\delta:D)}}{\prod_{(\delta:D)}}{\prod_{(\delta:D)}}}\mathchoice{\prod_{\omega% :W}\,}{\mathchoice{{\textstyle\prod_{(\omega:W)}}}{\prod_{(\omega:W)}}{\prod_{% (\omega:W)}}{\prod_{(\omega:W)}}}{\mathchoice{{\textstyle\prod_{(\omega:W)}}}{% \prod_{(\omega:W)}}{\prod_{(\omega:W)}}{\prod_{(\omega:W)}}}{\mathchoice{{% \textstyle\prod_{(\omega:W)}}}{\prod_{(\omega:W)}}{\prod_{(\omega:W)}}{\prod_{% (\omega:W)}}}P(\omega)\to P(c(\alpha,\beta,\delta,\omega)).

(5.6.6)

The corresponding computation rule looks identical to (5.6.5). Of course, the recursion principle is the special case of the induction principle where $P$ is a constant family. As we have mentioned before, the induction principle is also called the eliminator, and the recursion principle the non-dependent eliminator.

As discussed in §1.10 (http://planetmath.org/110patternmatchingandrecursion), we also allow ourselves to invoke the induction and recursion principles implicitly, writing a definitional equation with $:\!\!\equiv$ for each expression that would be the hypotheses of the induction principle. This is called giving a definition by (dependent) pattern matching. In our running example, this means we could define $f:\mathchoice{\prod_{w:W}\,}{\mathchoice{{\textstyle\prod_{(w:W)}}}{\prod_{(w:% W)}}{\prod_{(w:W)}}{\prod_{(w:W)}}}{\mathchoice{{\textstyle\prod_{(w:W)}}}{% \prod_{(w:W)}}{\prod_{(w:W)}}{\prod_{(w:W)}}}{\mathchoice{{\textstyle\prod_{(w% :W)}}}{\prod_{(w:W)}}{\prod_{(w:W)}}{\prod_{(w:W)}}}P(w)$ by

f(c(\alpha,\beta,\delta,\omega)):\!\!\equiv\cdots

where $\alpha:A\to W$ and $\beta:B\to C\to W$ and $\delta:D$ and $\omega:W$ are variables that are bound in the right-hand side. Moreover, the right-hand side may involve recursive calls to $f$ of the form $f(\alpha(a))$ , $f(\beta(b,c))$ , and $f(\omega)$ . When this definition is repackaged in terms of the induction principle, we replace such recursive calls by $\bar{\alpha}(a)$ , $\bar{\beta}(b,c)$ , and $\bar{\omega}$ , respectively, for new variables

	$\displaystyle\bar{\alpha}$	$\displaystyle:\mathchoice{\prod_{a:A}\,}{\mathchoice{{\textstyle\prod_{(a:A)}}% }{\prod_{(a:A)}}{\prod_{(a:A)}}{\prod_{(a:A)}}}{\mathchoice{{\textstyle\prod_{% (a:A)}}}{\prod_{(a:A)}}{\prod_{(a:A)}}{\prod_{(a:A)}}}{\mathchoice{{\textstyle% \prod_{(a:A)}}}{\prod_{(a:A)}}{\prod_{(a:A)}}{\prod_{(a:A)}}}P(\alpha(a))$
	$\displaystyle\bar{\beta}$	$\displaystyle:\mathchoice{\prod_{(b:B)}\,}{\mathchoice{{\textstyle\prod_{(b:B)% }}}{\prod_{(b:B)}}{\prod_{(b:B)}}{\prod_{(b:B)}}}{\mathchoice{{\textstyle\prod% _{(b:B)}}}{\prod_{(b:B)}}{\prod_{(b:B)}}{\prod_{(b:B)}}}{\mathchoice{{% \textstyle\prod_{(b:B)}}}{\prod_{(b:B)}}{\prod_{(b:B)}}{\prod_{(b:B)}}}% \mathchoice{\prod_{(c:C)}\,}{\mathchoice{{\textstyle\prod_{(c:C)}}}{\prod_{(c:% C)}}{\prod_{(c:C)}}{\prod_{(c:C)}}}{\mathchoice{{\textstyle\prod_{(c:C)}}}{% \prod_{(c:C)}}{\prod_{(c:C)}}{\prod_{(c:C)}}}{\mathchoice{{\textstyle\prod_{(c% :C)}}}{\prod_{(c:C)}}{\prod_{(c:C)}}{\prod_{(c:C)}}}P(\beta(b,c))$
	$\displaystyle\bar{\omega}$	$\displaystyle:P(\omega).$

Then we could write

f:\!\!\equiv\mathsf{ind}_{W}(P,\,{\lambda}\alpha.\,{\lambda}\bar{\alpha}.\,{% \lambda}\beta.\,{\lambda}\bar{\beta}.\,{\lambda}\delta.\,{\lambda}\omega.\,{% \lambda}\bar{\omega}.\,\cdots)

where the second argument to $\mathsf{ind}_{W}$ has the type of (5.6.6).

We will not attempt to give a formal presentation of the grammar of a valid inductive definition and its resulting induction and recursion principles and pattern matching rules. This is possible to do (indeed, it is necessary to do if implementing a computer proof assistant), but provides no additional insight. With practice, one learns to automatically deduce the induction and recursion principles for any inductive definition, and to use them without having to think twice.

Title	5.6 The general syntax of inductive definitions
\metatable