SFI Working Paper Abstract
2000
| Title: | Two Regimes in the Frequency of Words and the Origins of Complex Lexicons: Zipf’s Law Revisited |
| Author(s): | Ramon Ferrer Cancho and Ricard V. Solé |
| Paper #: | 00-12-068 |
| Abstract: | Zipf's law states that the frequency of a word is a power function of its rank. The exponent of the power function is usually accepted to be close to (-)1. Large deviations between the predicted and actual number of distinct words in a text, disagreements between the predicted and actual exponent of the probability density function, and statistics on a large corpus make it evident that word frequency as a function of rank follows two different exponents, $\approx (-)1$ for the first regime and $\approx (-)2$ for the second. The implications of this change in exponents for the metrics of texts and for the origins of complex lexicons are analyzed. |
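
The two-regime claim in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it builds a synthetic rank-frequency curve with exponent 1 up to a breakpoint rank and exponent 2 beyond it, then recovers each exponent as the least-squares slope on a log-log scale. The breakpoint value (1000), the function names, and the rank ranges are assumptions chosen for illustration; the abstract does not state where the regimes meet, and no real corpus data is used.

```python
import math


def two_regime_frequency(rank, breakpoint=1000, alpha1=1.0, alpha2=2.0):
    """Unnormalized frequency of the word at `rank` under two power-law regimes.

    `breakpoint` is a hypothetical crossover rank, not a value from the paper.
    """
    if rank <= breakpoint:
        return rank ** -alpha1
    # Scale the second regime so the curve is continuous at the breakpoint.
    return breakpoint ** (alpha2 - alpha1) * rank ** -alpha2


def loglog_slope(ranks, freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Fit each regime separately on the synthetic curve.
low_ranks = list(range(1, 1001))
high_ranks = list(range(1001, 10001))
slope_low = loglog_slope(low_ranks, [two_regime_frequency(r) for r in low_ranks])
slope_high = loglog_slope(high_ranks, [two_regime_frequency(r) for r in high_ranks])

print(f"first regime slope:  {slope_low:.3f}")
print(f"second regime slope: {slope_high:.3f}")
```

On real text the fitted slopes would only approximate these values, and the crossover rank would have to be estimated from the data rather than fixed in advance.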


