Zipf’s law
Zipf’s law (named for Harvard linguistics professor George Kingsley Zipf) models the occurrence of distinct objects in particular sorts of collections. Zipf’s law says that the $i$th most frequent object will appear $1/i^{\theta}$ times the frequency of the most frequent object, or that the $i$th most frequent object from an object “vocabulary” of size $V$ occurs

$$O(i) = \frac{n}{i^{\theta}\, H_{\theta}(V)}$$

times in a collection of $n$ objects, where $H_{\theta}(V) = \sum_{k=1}^{V} k^{-\theta}$ is the harmonic number of order $\theta$ of $V$.
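As a numerical check, assume the rank-$i$ count is $O(i) = n/(i^{\theta} H_{\theta}(V))$ with $H_{\theta}(V) = \sum_{k=1}^{V} k^{-\theta}$. The Python sketch below computes both quantities directly from these definitions. The function names and the example parameters ($V = 5$, $n = 1000$, $\theta = 1$) are hypothetical, chosen only for illustration.

```python
def harmonic_number(V: int, theta: float) -> float:
    """Generalized harmonic number H_theta(V) = sum_{k=1}^{V} 1 / k**theta."""
    return sum(1.0 / k**theta for k in range(1, V + 1))


def zipf_expected_count(i: int, V: int, n: int, theta: float) -> float:
    """Expected occurrences of the i-th most frequent object (rank starts at 1)
    in a collection of n objects drawn from a vocabulary of size V."""
    if not 1 <= i <= V:
        raise ValueError("rank i must satisfy 1 <= i <= V")
    return n / (i**theta * harmonic_number(V, theta))


if __name__ == "__main__":
    V, n, theta = 5, 1000, 1.0  # hypothetical example parameters
    counts = [zipf_expected_count(i, V, n, theta) for i in range(1, V + 1)]
    for rank, c in enumerate(counts, start=1):
        print(f"rank {rank}: {c:.1f}")
    # Dividing by H_theta(V) normalizes the counts so they sum to n.
    print(f"total: {sum(counts):.1f}")
```

Two properties follow from the definition and are easy to confirm with this sketch: the counts over all $V$ ranks sum to $n$, and the rank-1 count divided by the rank-$i$ count equals $i^{\theta}$.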
[Figure omitted (generated by GNU Octave and gnuplot)]
Zipf’s law typically holds when the “objects” themselves have a property (such as length or size) which is modelled by an exponential distribution or other skewed distribution that places restrictions on how often “larger” objects can occur.
One example where Zipf’s law applies is the frequency of word occurrence in English texts. The commonality of English words follows an exponential distribution, and the nature of communication makes it more efficient to favor shorter words. Hence the most common words tend to be short and appear often, following Zipf’s law.
The value of $\theta$ typically ranges between 1 and 2, and is between 1.5 and 2 for English text.
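To get a rough idea of the exponent $\theta$ (the power in the $1/i^{\theta}$ rank-frequency relation) for a particular text, one can rank word counts and fit a straight line to log(frequency) against log(rank); under Zipf’s law the slope is $-\theta$. The sketch below is an assumption-laden illustration, not a method from this entry: the tokenizer (lowercase runs of letters and apostrophes) and the function names are hypothetical, and an ordinary least-squares fit on a log-log plot is only a crude estimator.

```python
import math
import re
from collections import Counter


def rank_frequencies(text: str) -> list[int]:
    """Word counts sorted from most to least frequent (rank 1 first).
    Tokenization here is a simplifying assumption: lowercase runs of letters and apostrophes."""
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def estimate_theta(freqs: list[int]) -> float:
    """Least-squares slope of log(frequency) versus log(rank).
    Zipf's law predicts a slope of -theta, so the negated slope is returned."""
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx
```

For example, `estimate_theta(rank_frequencies(text))` gives a rough estimate for a long text. On short texts the many words that occur only once form a flat tail in the log-log plot, which can noticeably distort the fitted slope.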
Another example is the populations of cities. These follow Zipf’s law, with a few very populous cities falling off to very numerous cities with small populations. In this case, societal forces supply the same type of “restrictions” that limit how often longer English words are used.
A final example is the income of companies. Once again the ranked incomes follow Zipf’s law, with competition pressures limiting the range of incomes available to most companies and determining the few most successful ones.
The underlying theme is that efficiency, competition, or attention with regard to resources or information tends to make Zipf’s law hold for the ranking of the objects or data of concern.
Mathematics Subject Classification
60E05 Distributions: general theory
68P20 Information storage and retrieval
94A99 None of the above, but in MSC2010 section 94Axx