

SFI Working Paper Abstract

2000

Title:

Two Regimes in the Frequency of Words and the Origins of Complex Lexicons: Zipf’s Law Revisited

Author(s):

Ramon Ferrer Cancho and Ricard V. Solé

Files: [gzipped postscript] [postscript]

Paper #:

00-12-068

Abstract:

Zipf's law states that the frequency of a word is a power function of its rank, with an exponent usually accepted to be close to (-)1. Three lines of evidence show that word frequency as a function of rank instead follows two different exponents, $\approx (-)1$ for the first regime and $\approx (-)2$ for the second: large deviations between the predicted and actual number of different words in a text, disagreement between the predicted and actual exponent of the probability density function, and statistics from a large corpus. The implications of this change in exponents for the metrics of texts and for the origins of complex lexicons are analyzed.
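
As an illustration of what a two-regime rank-frequency curve looks like in practice, the following minimal Python sketch builds a rank-frequency list and estimates the log-log slope over a chosen range of ranks. It is not the authors' procedure: the function names, the least-squares fit, the break rank `BREAK_RANK = 100`, and the synthetic frequencies are all assumptions made for this example, and the abstract does not state where the change of exponent occurs.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs, lo, hi):
    """Least-squares slope of log(frequency) against log(rank).

    Ranks are 1-indexed and the range lo..hi is inclusive.
    """
    xs = [math.log(r) for r in range(lo, hi + 1)]
    ys = [math.log(freqs[r - 1]) for r in range(lo, hi + 1)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical synthetic data: exponent -1 up to BREAK_RANK, exponent -2 after it.
# The break rank and the scale are illustrative assumptions, not values from the paper.
BREAK_RANK = 100
SCALE = 1e6
synthetic = [
    SCALE * r ** -1.0
    if r <= BREAK_RANK
    else (SCALE / BREAK_RANK) * (r / BREAK_RANK) ** -2.0
    for r in range(1, 1001)
]

slope_first = loglog_slope(synthetic, 1, BREAK_RANK)
slope_second = loglog_slope(synthetic, BREAK_RANK, 1000)
print(f"first regime slope:  {slope_first:.3f}")
print(f"second regime slope: {slope_second:.3f}")
```

For a real text, `rank_frequency(text.lower().split())` would supply the frequency list. Where to split the ranks between the two fits would have to be chosen or estimated from the data, and a plain least-squares fit on log-log axes is only a rough estimator of a power-law exponent.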