Sunday, September 9, 2018

Zipf's law: causes & derivable from Pareto

I specified the relationship between the Pareto and Zipf distributions on Wikipedia:


Wikipedia's excuse for Zipf's prevalence:


Preferential Attachment
A simple, common, and possibly accurate view is that there is a preferential attachment ("the rich get richer") going on, e.g. people are more strongly attracted to large cities because there are more opportunities. A common math example for preferential attachment is the Yule process.

Yule process
This is one of the best contenders for explaining Zipf's law and other Pareto (power law) distributions. In biology, the number of species in a genus (or members of a species?) seems to follow a Yule process that results in a Pareto (power law) in its long tail. This process says a new species is more likely to form in a genus in proportion to the number of species already in that genus. It's a simple "the big get bigger in proportion to their size". It assumes that no species die out. If extinction were included, the tail would probably die off more quickly, which is what realistic data shows. Overall, this process gives a hump, or at least a dip at the front end, that is commonly seen in log-log plots of real-world data. Zipf's law is rho = 0.
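To make the "big get bigger in proportion to their size" rule concrete, here is a minimal sketch in Python. It is a simplified, Simon-style version of the process, not the exact Yule model: the function name and the new_genus_prob parameter are my own assumptions for illustration, and no species ever dies out, as in the description above.

```python
import random
from collections import Counter

def simulate_preferential_growth(steps, new_genus_prob=0.1, seed=0):
    """Grow genera one species at a time and return genus sizes, largest first.

    new_genus_prob is a hypothetical parameter: the chance that a new
    species founds its own genus instead of joining an existing one.
    """
    rng = random.Random(seed)
    species_genus = [0]  # one entry per species, holding its genus index
    n_genera = 1
    for _ in range(steps):
        if rng.random() < new_genus_prob:
            # The new species starts a brand-new genus.
            species_genus.append(n_genera)
            n_genera += 1
        else:
            # Picking a random existing species and copying its genus
            # selects a genus with probability proportional to its size.
            species_genus.append(rng.choice(species_genus))
    return sorted(Counter(species_genus).values(), reverse=True)

if __name__ == "__main__":
    sizes = simulate_preferential_growth(20000)
    print("number of genera:", len(sizes))
    print("ten largest genera:", sizes[:10])
```

Plotting log(size) against log(rank) for the returned list is one way to check by eye how close the result is to a straight line.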

Deviation of Front and Tail from straight-line log-log plots
A simple power law such as Pareto (the continuous form of Zipf's) is a straight line on a log-log plot. But most real data forms a hump, possibly more of one than the Yule process gives. So the front and tail ends dip down from the straight line. The preferential attachment idea is amenable to this: after cities get beyond a certain size, there are drawbacks. If a city is below a certain size, there's no benefit because of the limited ability to cooperate. If a word is used too often, it's not conveying information. If a word is used too rarely, no one understands it. So a double Pareto has been suggested to cover the front and back tails, but it gives too sharp a hump in the middle, so it seems a triple Pareto (power law) would be better. But in many cases, a primary Pareto with a secondary Pareto for the head or tail adjustment might be close enough to the best possible.
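The straight-line claim can be checked numerically: for value = C / rank^s, log(value) = log(C) - s*log(rank), so a least-squares fit in log-log space should return a slope of -s. The sketch below uses only the standard library; the function name loglog_slope is my own, not from any source. On real data, fitting the head and the tail separately and comparing the slopes is one rough way to see the dips.

```python
import math

def loglog_slope(ranks, values):
    """Least-squares slope of log(values) against log(ranks)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

if __name__ == "__main__":
    ranks = list(range(1, 1001))
    zipf_values = [1.0 / r for r in ranks]  # pure 1/rank, i.e. s = 1
    print("fitted slope:", loglog_slope(ranks, zipf_values))
```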

BTW, Zipf-Mandelbrot is simply a slightly more general form of Zipf's law, throwing in another constant that might be used to create a more general Pareto distribution (continuous form) by scaling it in a way that results in a CDF = 1 at infinity.
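As a sketch of what the extra constant does, the usual finite form of Zipf-Mandelbrot gives rank k a weight 1/(k + q)^s, normalized over ranks 1..N. The names n, s, and q below are the conventional ones as I understand them, and the function name is my own. Setting q = 0 gives back plain Zipf.

```python
def zipf_mandelbrot_pmf(n, s=1.0, q=0.0):
    """Probabilities for ranks 1..n with weight 1 / (rank + q)**s."""
    weights = [(k + q) ** -s for k in range(1, n + 1)]
    total = sum(weights)  # normalizing constant so the probabilities sum to 1
    return [w / total for w in weights]

if __name__ == "__main__":
    plain = zipf_mandelbrot_pmf(1000, s=1.0, q=0.0)
    shifted = zipf_mandelbrot_pmf(1000, s=1.0, q=2.7)
    print("p(1)/p(2) with q = 0:  ", plain[0] / plain[1])
    print("p(1)/p(2) with q = 2.7:", shifted[0] / shifted[1])
```

A positive q flattens the front of the distribution, which is one way to bend the head of the log-log line downward.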

This chapter by a physicist is excellent.

This paper says Zipf's law works because it is halfway between order and disorder. Its normalized entropy per character is 1/2, half the maximum possible. (At N=100 it's 0.37 and at N=1,000 it's 0.44, using normalized entropy = H(CDF)/log(N) in a spreadsheet).


Other possible explanations for Zipf's law
Zipf's law usually refers to systems that have an exponent of s = 1. The reason for this simple 1/rank has been considered a mystery for a long time. A lot of theories have been proposed (two possibilities are mentioned above, but I don't know if they give s = 1).

An IEEE article (with more comments here) claims many experts say Metcalfe's law should be that the "value" of the network is N*log(N) instead of N^2 and that this gives rise to Zipf's law. N^2 assumes the value of every additional connection is the same, with every node connected to every other node. The log(N) factor allows for a loss in efficiency of the connections as the number of nodes increases. They point out the harmonic sum 1 + 1/2 + 1/3 + ... + 1/N =~ ln(N) + 0.577. This is not surprising since the integral of 1/x is ln(x). So the network gives ln(N) + 0.577 value to each of the N nodes, and the total network value is N*[ln(N) + 0.577]. But the node math can't be converted (at least directly) to words and city populations because the nodes have an equal number of connections and the same distributions. Maybe ideas (for words) or occupations (for city populations) could be treated like nodes. Maybe cities or words could be treated as different networks.
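The harmonic-sum approximation is easy to check directly. The sketch below compares the exact sum with ln(N) + 0.577 (the constant is the Euler-Mascheroni constant) and then forms the N*[ln(N) + 0.577] style network value. The function names are my own, chosen for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, the "0.577"

def harmonic(n):
    """Exact harmonic sum 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def network_value(n):
    """Total value if each of n nodes gets the harmonic-sum value."""
    return n * harmonic(n)

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        approx = math.log(n) + EULER_GAMMA
        print(n, harmonic(n), approx, network_value(n))
```

The gap between the exact sum and the approximation shrinks roughly like 1/(2N), so the approximation is already close for modest N.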


