Thursday, July 21, 2016

an idea for an ideal cryptocurrency

Introduction, context:
Previously I have discussed the problems with limited-quantity coins.  This is a coin idea that might be constant-value, or half-way between. It is an outline (without addressing implementation problems) of what I think would be close to an ideal coin, based on the idea that maximizing median human happiness is the "meaning of human life" and the implicit goal humans should be assigning to the economic machine.  I do not say "our" economic machine because we are not intentionally (let alone intelligently) cooperating to maximize the benefits we receive from the machine as a whole.  The "invisible hand" of competitive, cooperative, honest selfishness at the individual transaction level is not a God that makes everything better for our system as a whole without diligent, intelligent, conscious effort at the system-wide level (such as government regulations for rule of law to encourage safe and honest transactions, and against monopolies and pollution). The prisoner's dilemma does not have a synergistic gain during cooperation unless the rules and goals of the game are intelligently known and enforced.  My goal is to prevent evolutionary optimizations from mindless humans and mindless machines from sneaking into our economic optimizations without regard to human happiness.  But as can be seen from the following, maximum median human happiness might turn out to be equivalent to letting the machines rise, encouraging a decrease in human population. This could be painful in the short term like the black plague, but beneficial in the long term like the enlightenment.  But the machines have enough wealth in efficiency that the process does not need to be painful.

Coin description:
Assign a fixed amount of coin to each person on the planet. Co-join their "DNA" (not necessarily a retinal scan) and a 2-factor authorization device (random key generator based) as part of their private key(s). Each adult (25 and older) with unique DNA receives 1,000,000 coins along with their 2-factor device.  Young people receive 100,000 per year from age 16 to 25.
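
A rough sketch of what I mean by co-joining the two factors (the hash construction, function names, and iteration count here are just an illustration, not a specification):

    # Illustrative sketch only: derive a per-person key seed by co-joining a
    # DNA-derived fingerprint with a secret from a 2-factor device, so that
    # neither factor alone is enough to recover the key.
    import hashlib
    import hmac

    def derive_key_seed(dna_fingerprint: bytes, device_secret: bytes,
                        salt: bytes = b"coin-v0") -> bytes:
        # HMAC the DNA fingerprint with the device secret, then stretch the result.
        combined = hmac.new(device_secret, dna_fingerprint, hashlib.sha256).digest()
        return hashlib.pbkdf2_hmac("sha256", combined, salt, 100_000)

    if __name__ == "__main__":
        seed = derive_key_seed(b"example-dna-digest", b"example-device-secret")
        print(seed.hex())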

Governance, fees:
If the world is overpopulated, people will have less purchasing power.  Pay-weighted voting might be a good balance between democracy (which can cause too many regulations, socialism, and overpopulation) and stake-weighted voting (which has problems from insiders and lobbies).  Pay-weighted voting might be the only tax needed to implement the governing laws.  If it's not, then fees, interest, and rent should be collected by the government, targeting entities that are acting against the overall goal, which is "maximum happiness per median person".

Constant quantity or constant value?
I can't decide if it should be constant quantity like this (increasing only with population), or if the government should be allowed to expand or contract the coin supply based on a basket of commodities. Tracking a basket of commodities keeps prices and wages very stable and prevents boom/bust cycles. Today's financial games driving commodity prices away from supply and demand (Szabo wrote an article on this) do not help in tracking a basket of commodities. Maybe if a measurable-quantity coin (or coins) takes over the world, these games are not possible. (Would it be harder to do fraudulent/stupid derivatives and keyboard credit that pretends to be real coin?) Government printing could be directed to reduce the effect of technological dis-employment. A constant-quantity coin could encourage dis-employment and thereby lead to a reduced population and increased median happiness per person.
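
A minimal sketch of the basket-tracking supply rule (the index values and the responsiveness factor are made-up illustrations, not a proposal for specific numbers):

    # Illustrative sketch: expand or contract the coin supply to hold a
    # basket-of-commodities price index near a target. Numbers are made up.
    def adjust_supply(current_supply: float, price_index: float,
                      target_index: float = 100.0, responsiveness: float = 0.5) -> float:
        # If commodity prices fall below target (excess goods), print coin;
        # if they rise above target (scarcity), contract the supply.
        deviation = (target_index - price_index) / target_index
        return current_supply * (1.0 + responsiveness * deviation)

    if __name__ == "__main__":
        supply = 1_000_000.0
        for index in [100, 97, 103, 100]:      # simulated basket readings
            supply = adjust_supply(supply, index)
            print(f"index={index:>3}  new supply={supply:,.0f}")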

Coin is continually created, but population is not increasing as fast as productivity gains, so it is half-way between a constant-quantity coin and a constant-value coin.

Wednesday, July 20, 2016

Proof of stake should not correlate with vote power, at least not anonymously

Post to github

Voting by POS is like lobbyists influencing the government. It protects or increases assets without regard to the long-term health of the economy. For a coin, "health" means "used in the marketplace" or "long-term value" (which are probably mutually dependent). These are abstract ideas too far away for the vast majority of stakeholders, who are looking at their bank account. Stockholders are biased towards short-term profit at the long-term expense of the company. Laws had to be put in place to stop it: public disclosure of who the largest stakeholders are and how much they have, and the rest of the shareholders demanding restrictions on if and when the large holders can cash out. These are not even options for Zcash, so voting by number of shares held, as in stocks, is not a good idea.

Paying to vote would be better, like the largest taxpayers deciding where government spends its money. Government is no more than code (law) directing the proper use of a currency (legal tender for taxes, court, and the marketplace). People in this space need to realize coders = government, stakeholders = voters, and nodes = banks. Voters pay taxes to the government and fees to the banks. The only difference is that computers are infinitely more efficient and reliable than brains.

A short-term investor is less likely to spend money to vote. Money gained from the vote should be invested to increase the value of the coin (like taxes for a government). Helping merchants accept it is an obvious possibility, as is paying more to researchers/coders. Destroying the voting coins is possible, but limited-quantity coins are already enough of a marketplace disaster that I would not encourage it. (Markets need constant value in order to keep contracts, wages, and prices valid, in the same way science needs units and conversion factors to stay the same.)

Coders/researchers/economists should be the law makers, like Plato's elite philosophers, in designing the constitution. Voters can work out details and future problems. The goal is to make the game fun, fair, and long-lasting. The winning team should not be given extra power to change the rules in the middle of the game. The winning team should even be penalized so that the game remains a distributed network, unless they want to end up playing by themselves, with no external value.
=====
My sensitivity to seeing it [coin issuance curve changes] is partly based on the decision-makers being large holders of the coin in quantities unknown even to each other. The rest of this paragraph is against this type of proof-of-stake voting, an issue I posted on yesterday (#1112). The complexity of the tech issues in Zcash makes the founders de facto stock-like insiders despite it being open source, insiders that may have a preference for short-term gain at long-term expense and are not regulated to prevent this expectation. Explicit voting by size of stake is bad because it is a bias for short-term holding value at the expense of long-term marketplace-use value, which is the basis of long-term holding value. Doing it under the pretense of having an unbiased interest in the long-term health of the coin is worse. Proof-of-stake voting with noble pretenses underlies Bitcoin's woes: "our developers will not mine," but how will you know? I am not really worried this is happening, but the laws have not caught up, and the company could accidentally do things that would normally be illegal if it follows a path that is intuitively good. Voting by size of stake is a bias towards a chain letter. Doing it secretly is a bias towards ponziness. Does Satoshi's abandonment of the project indicate awareness of a conflict of interest as a large anonymous stakeholder/insider that should be illegal? Would his selling without disclosure normally be illegal, and is this the reason he has not sold any?

All things considered, I think the company should explicitly state its contract with society (what the coin must always be) in its principles of organization, self-referentially unmodifiable, with a copy-left inheritance requirement in the event of a buyout, rigidly connected to and defining the "Zcash" trademark. The target audience of the contract would be future holders of the coin, not current holders. Then add it to the blog and code, before launch.

ETH giving release names seems more necessary because the system's philosophy and understanding of itself is still changing. This is why I have very little interest in it. I would like an asset. I want them to succeed in replacing government and banking. But I still want an asset I can understand that is not connected to complexity and self-identity confusion, let alone "Turing-complete security holes".
Both supply-curve and name changes give an impression of "instability". Name changing is more of an issue with me because it implies the coin's identity is changing. That's great for improving products whose primary features are changing; they need to change identity in order to advance. But Zcash should have a rigid, limited, stated philosophical identity like Excel and Powerpoint instead of CPU-like name changes. I think Zcash is trying to be anonymous, secure, with bitcoin's quantity and at least a similar issuance curve, with mining as distributed as possible, and as fast and efficient as possible. Since these features should only improve, without substantial change or any foreseeable addition, names seem to add only confusion as to what the name means (is it one of many products under a Zcash company umbrella? has the coin changed its anonymizing or hash method?) and give an impression of changeability. Of course everyone wants the product to improve on the stated goals, but not to otherwise change. A major anonymizing or hash algorithm change is a detail that should fall under Zcash "2.0" or whatever release notes.

Saturday, July 16, 2016

A.I. and economics again, post to zcash forum

Your 1st and 2nd sentences seem contradictory. My view is that making ASIC infeasible (to level the playing field) is a drastic market interference, for a good reason. A free market evolves towards concentration of wealth and monopolies. Democratic voting creates a more level playing field (1 person = 1 vote) by causing government to write the rules (algorithmic protocol) to bias the free market (capitalism) away from concentration of wealth, towards socialization.

Developers are the "governing employees" that make Zcash more democratic, more social. Equal access to coins based on investment expense is a fair market, made possible by a "government" (algorithm). It's a democratic idea, 1 vote = an equal investment expense.

It's true my ROI is entirely speculative. My point was to show small miners will lose only if big miners lose. Equihash is a good system for preventing wealth concentration at the outset: if ASICs were feasible, special interests could be a problem from the start. Look at bitcoin's miners.
The economic/democratic problems I'm about to describe for any constant-quantity coin like Zcash are long-term. Given no other option, not even in theory, I'm choosing Zcash to be in the 1% instead of the 99%. All economic woes are a consequence of the physics of evolution. There is no solution. Humans are not capable of subverting the physics of evolutionary progress towards higher efficiency.

Anonymity that prevents government from unjustly targeting individuals is a form of wealth distribution. It can take away power of special interests who try to subvert democracy. But it can also prevent government from performing the good aspects of its democratic role. Among other things, if a constant quantity coin becomes the default currency, compound interest always results in wealth concentration in the lenders. Gold historically works only in times of anarchy and war. The people needing loans also need an inflating currency (but not inflationary prices). Ideally all the interest charges should be used to finance all of government. Interest should be the only tax, and that tax should fund the expansion of the society (which is ~ equal to its need for the currency) so that there is no inflation in prices or wages, which keeps contracts in that coin valid (think ETH).

Equitable computation is a more intelligent network (it solves problems) for deep reasons, despite being less efficient. In A.I., the most effective systems evenly distribute computation. The constant quantity of total available CPU time and memory space is the "currency" that needs to be distributed to grant access. There's a conversion factor between CPU time and memory space that is not unrelated to Einstein's meters = i*c*seconds, based on Landauer's principle. Genetic algorithms, agoric economic agent systems, Bayesian techniques, and neural nets seek to redistribute computation among a wider variety of "genes/agents/nodes" and the "weighting factors" ("wiring" or "links" in the web) between them by distributing computational requirements more evenly, economizing the resources towards solutions. An unused node, gene, or web page (no links to it), a very low price (in agoric agent-based computing), or a very low probability (Bayesian) are all computational elements that can be eliminated from the algorithm with minor error (a universal NAND gate with no wiring to it is the simplest example of an unused computational element).
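
A toy illustration of that last point, eliminating low-weight computational elements with only minor error (my own sketch, not taken from any particular A.I. system):

    # Toy illustration: prune near-zero "weights" (links/agents/nodes) from a
    # linear layer and check that the output changes only slightly.
    import numpy as np

    rng = np.random.default_rng(0)
    weights = rng.normal(size=(8, 8)) * rng.random((8, 8))   # many small entries
    x = rng.normal(size=8)

    pruned = np.where(np.abs(weights) < 0.1, 0.0, weights)   # drop weak connections
    error = np.linalg.norm(weights @ x - pruned @ x) / np.linalg.norm(weights @ x)

    print(f"fraction of weights removed: {np.mean(pruned == 0):.2f}")
    print(f"relative output error:       {error:.3f}")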

Like everyone else interested in cryptocurrencies, I want to make as much profit with the least amount of work. Constant quantity currencies might be ideally suited for the 1% and a subversion of democracy. Wei Dai expressed a similar concern about bitcoin. That's why he likes tracking commodities. An ideal coin would expand in lock-step with its M2-like usage to keep wages and prices constant, which keeps contracts in that coin valid, and prevents early adopters from profiting (gaining more access to society's finite resources) without having to work for it (contributing to society).

A constant-quantity currency is contrary to equitable (intelligent) economics for these reasons. It is only optimal when the resources it represents control of are constant, as in A.I. systems constrained to a specific hardware system. It will be beneficial in times of war and anarchy for the survivalists who have planned ahead. Anonymity amplifies this benefit.

Sunday, May 29, 2016

The moral intelligence of Japanese greed

This was a post in response to an article describing Subaru's comeback that included marketing to lesbians in the late 1990's.  The article: http://priceonomics.com/how-an-ad-campaign-made-lesbians-fall-in-love-with/

Not being offended by lesbianism and accepting it as just a normal part of life, i.e. not even being aware it is a "thing", bespeaks an intelligent and open society. Combined with durability, safety, and practicality, there was a larger group of intelligent people Subaru was appealing to than just the 5 micro-groups mentioned in the article. These qualities are very Japanese. For some reason, I always thought of Subaru as European with an American flavor. It seems much more American to me than Nissan, Mitsubishi, Toyota, or Honda.  Seeming unique and different has helped them.  Every day I see a neighbor's Subaru and think "how intelligent and different", and I am a little jealous that I don't own a Subaru. He parks the car backwards in the driveway, which is done in case a quick exit is needed, a mark of intelligence and concern for safety, which seem to be the features Subaru exhibits and attracts. Lack of a male companion and being an "outcast" (at least in the past) possibly makes lesbians more concerned about safety in general, not just dependability. It's an intriguing story that goes beyond lesbianism. It's kind of distracting that it's cast as a "controversial" topic. There's something more here than gay rights, marketing, or controversy, as presented in the article. I think it's a triumph stemming from the Japanese people being simple, rational, and non-judgmental. If only the rational pursuit of profit were always like this.

Wednesday, May 18, 2016

Benford's law, Zipf's law, and Yule-Simon distribution

Summary:
Language and population drop off at both ends of the log-log plot. 
Benford's law is better than Zipf's for population and language, capturing the most common words better. It's below the log-log line on the front end compared to Zipf's. But it seems sensitive to change.
Yule-Simon is best in the sense that it has an algebraic function that is easily solvable and is better than Zipf's, dropping off at the high end of a log-log plot as is seen in population and language. It is based on evolution, I believe by considering new species being added.  When made "with memory" (not so algebraic, probably a differential equation), it was made to work really well.  It might apply really well to social/computer networks where nodes are added. Words have connections to each other like a network. 
Double Pareto Log-Normal (DPLN) seems to have more interest, maybe even applicable to a lot of physics.  It combines "geometric Brownian motion" (GBM, a differential equation with a feed source and random changes) and Yule-Simon. The GBM is a "pure form" of Gibrat's law for cities. Gibrat's says cities start with a log-normal distribution, which I believe causes the tail end to drop off, since Yule drops off the other end.  Pareto is log-log and has a "killing constant" that might drop off the tail.  I do not know why they call it double Pareto unless it is because it is like using two Pareto curves, one for the top and one for the bottom.

The differential equations seem to be needed because they allow a "feedback", i.e. the current state is used in calculating future states. For example, words, species, and cities are competing with each other for popularity in a limited "space". People feed words by employing them, the environment feeds (employs) species, and cities employ (feed) people.  But once feeding gets massive, there is a drawback: the more a word is used, the less information it can convey due to how Shannon entropy per word is calculated. City density starts decreasing the efficiency benefits. Environments run out of food. On the tail end, rare words carry a lot of info, but few people know them. Fewer members of a species means fewer mating opportunities for gains in natural selection (Darwin realized this). Fewer people means fewer job options. There is a wide middle ground with an exponential. It is higher on the tail end as the "population" benefit starts to kick in, and decreases at the high end as the efficiency exponential starts being blocked by the energy (species), time (language), or spatial (cities) limits.

This is possibly my favorite of the articles:
http://www.cs.uml.edu/~zfang/files/dpln_ho.pdf

I checked Benford's law, log(1+1/r) times 2.2, against Mandelbrot's modified Zipf law, ~1/(r+2.7), for English. After rank 21, the error is less than 5%. It's higher for ranks 1 to 21, where Benford matches the first few English words better. Both are too high for large r. Benford also predicts country populations better.
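
A quick recomputation of that comparison (using the 2.2 scale factor and the 2.7 shift stated above; only the two formulas are being compared, not actual English frequencies):

    # Compare Benford-style frequency 2.2*log10(1 + 1/r) with Mandelbrot's
    # modified Zipf ~ 1/(r + 2.7) over a range of ranks.
    import math

    for r in [1, 2, 3, 5, 10, 21, 22, 50, 100]:
        benford = 2.2 * math.log10(1 + 1 / r)
        mandelbrot = 1 / (r + 2.7)
        rel_err = abs(benford - mandelbrot) / mandelbrot
        print(f"rank {r:>3}: benford={benford:.4f}  mandelbrot={mandelbrot:.4f}  error={rel_err:.1%}")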

Concerning the relationship between the Zipf and Benford:
http://mathworld.wolfram.com/BenfordsLaw.html

The Pareto distribution is a similar function applied to wealth, (Xmin/X)^a where a is greater than 1, and has been used as a measure of wealth inequality. 

But it appears the wide-ranging real-world observation of these power-like laws is largely the result of "preferential attachment".  In short, "success breeds success": the rich get richer.  Words that are common become more common because they are common. Same thing with cities and species.  Darwin wrote about how species become distinct because when you have a larger population to breed with, you have more options for the best selecting the best. Cities become more efficient in terms of providing potential employment.  Companies gain efficiency as they get larger, allowing them to get larger.  The kind of ranking that results from this is the Yule-Simon distribution.  On a log-log plot, it gives the most common words a lower frequency than expected from a straight log-log line, which is what words do. Its formula is 

freq = x * x! * R! / (x + R)! 
where x! is the gamma function of x+1 (the gamma function is the continuous version of (N-1)!), x is a real value greater than 0 that I would call the "amplifier" in the positive feedback, R = rank - 1, and (x+R)! is the gamma function of (x+1+R).  For x=1 it is R!/(1+R)! = 1/(R+1) = 1/rank, which is Zipf's law.
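
The same formula written out numerically with log-gamma (my own sketch of the formula exactly as stated above):

    # freq = x * x! * R! / (x + R)!, with R = rank - 1, computed via log-gamma
    # for numerical stability.
    from math import exp, lgamma

    def yule_simon_freq(rank: int, x: float) -> float:
        R = rank - 1
        # x! = Gamma(x+1), R! = Gamma(R+1), (x+R)! = Gamma(x+R+1)
        return x * exp(lgamma(x + 1) + lgamma(R + 1) - lgamma(x + R + 1))

    # For x = 1 this reduces to 1/rank, i.e. Zipf's law.
    for rank in [1, 2, 3, 10, 100]:
        print(rank, round(yule_simon_freq(rank, 1.0), 4), round(1 / rank, 4))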

But it is inadequate for the tail end, as it is straight when it also needs to drop off. One of the following papers used the formula expressed as P(r) = 1/r^a with a = 1 + 1/(1-p), where p is a constant probability of a new word being added during a time step. In this version they modified it to have a downward concave shape, so it worked really well.

It has been shown to model language excellently, and also city population.

Yule-Simon works better in language.

Works better in cities.


But there is a dropping off of the log-log straight line at both ends in most data that the straight Yule-Simon law does not handle.  Successful cities do not merely add new nearby cities as Yule shows; the bigger city's relative population drops off from this, which is a different way of saying that maybe overpopulation starts losing the efficiency of its attraction.  On the tail end there are other disadvantages.  Commonly-used words are used more often because they are common, but since they convey less information due to being common, the effect is limited, which prevents it from following a straight log-log curve. On the other end, rare words are more rare than expected because not enough people know them to be able to use them regularly. Similarly, cities would follow a strict log-log curve due to statistics, but inefficiencies are created for different reasons in the most and least populated regions. In animals, they either start eating each other's food source, or they are not able to find a mate.  Wealth, on the other hand, may not be subject to an "overpopulation" effect.

So the DPLN may be the ultimate:

For cities, if not a wide range of physics, it seems better to combine Yule with Geometric Brownian Motion (GBM, random variation of a random variable with a fuel source for new entrants), which is supposed to be Gibrat's log-normal law for cities in its pure form. 

"A random variable X is said to follow GBM if its behavior over time is governed by the following differential equation dX = (µdt + σdB)X, where dB is the increment of a standard Brownian motion (a.k.a. the white noise). For a GBM the proportional increment of X in time dt comprises a systematic component µdt, which is a steady contribution to X, and a random component σdB, which is fluctuated over time. Thus the GBM can be seen to be a stochastic version of simple exponential growth."

GBM feeds in new populations or words, and where they settle has a random fluctuation. Maybe this somehow causes the tail to drop off, as Yule causes the high end to drop off. 
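
A toy simulation of that combination (GBM growth plus a Yule process founding new settlements with lognormal initial sizes; all parameters are arbitrary, chosen only to make it run quickly):

    # Settlements grow by GBM; each existing settlement may found a new satellite
    # settlement each step (a Yule process); initial sizes are lognormal.
    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, lam, dt, steps = 0.02, 0.10, 0.015, 1.0, 400

    sizes = [float(rng.lognormal(mean=1.0, sigma=0.5))]          # first settlement
    for _ in range(steps):
        # GBM growth step for every existing settlement
        sizes = [s * float(np.exp((mu - 0.5 * sigma ** 2) * dt
                                  + sigma * np.sqrt(dt) * rng.normal()))
                 for s in sizes]
        # Yule step: each settlement can spawn a new one with probability lam*dt
        n_new = int(np.sum(rng.random(len(sizes)) < lam * dt))
        sizes += [float(rng.lognormal(mean=1.0, sigma=0.5)) for _ in range(n_new)]

    ranked = np.sort(sizes)[::-1]
    print(f"{len(ranked)} settlements")
    # A roughly straight middle with bending at both ends of log(size) vs log(rank)
    # is the double-Pareto-lognormal signature discussed above.
    for r in [1, 2, 5, 10, 20, 50, 100]:
        if r <= len(ranked):
            print(f"rank {r:>3}: log10(size) = {np.log10(ranked[r - 1]):.2f}")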

Here's the best complete explanation of city sizes.
"The double Pareto lognormal seems more appropriate since it comprises a lognormal body and power law tails. Reed [36] suggests a GBM model, similar to the one that models personal incomes, for obtaining the settlement size distribution. Individual human settlements grow in many different ways. At the macro level a GBM process can be used to model the size growth by assuming a steady systematic growing rate and a random component. The steady growing rate reflects the average growth rate over all settlements and times, and the random component reflects the variability of the growth rate. The time when a city is founded varies from settlement to settlement. If we assume in the time interval (t,t + dt) any existing settlement can form a new satellite settlement with probability λdt, the creation of settlements is a Yule process [39], which was first proposed as a model for the creation of new biological species. Under Yule process, the expected number of settlements is e^λt after t time since the first settlement. That is, the number of settlements is growing at rate λ. Therefore, the existing time for all settlements is exponentially distributed. It is straightforward to conclude that under GBM and Yule processes, the overall settlement size distribution will be a double Pareto distribution. If we further assume a lognormal initial settlement size, the result will converge to the double Pareto lognormal distribution."

Reed 2004, DPLN invention, applicable to physics


Thursday, May 12, 2016

Problem with bitcoin and gold as currency

In a previous post I discussed the problem with bitcoin's constant quantity of money.  Wei Dai has commented that he views bitcoin as problematic for probably similar reasons.  But even an asset-backed currency such as gold or b-money has a problem.

Hard-core money, an objective asset that retains its value, is great when doing transactions with potential enemies.  It should have an important place in transactions across disparate legal systems such as countries.  You want to walk away without future obligation (a threat).  "Cash on the barrel head" has its place with potential enemies not mutually adhering to a higher law or assurance of mutual benefit after the transaction (that higher law and mutual benefit are often the same thing). But money does not have to be meant only to objectively optimize isolated transactions without regard to a wider society. It can be more intelligent than that, finding optimal solutions to the prisoner's dilemma on a grander scale, beyond the immediate participants in a transaction.

The problem (or lack of "optimality") occurs in systems where you are not the only one who is important to you. It's not ridiculous or anti-evolution-theory to assume you will sacrifice a part of your profit for the benefit of others, especially if it is a small cost to you and a great benefit to others.   If you count your success as dependent on society's success and not just your bank balance, there's more to consider. This is why a constant-value coin is not ideal.  By making the value of the asset vary with something other than a stable asset, pro-social law (aka system-wide intelligence) can be  implemented.

The fundamental problem with a constant-quantity coin like Bitcoin or gold is that it is an anti-social money. It seeks to maintain value for the holder without regard to the condition of society. Society can go to hell and gold (at least) will still have value.  That's a-social. Past transactions that result in a person holding an asset should be post-invalidated if the sum of those transactions resulted in disaster for all. Every transaction should carry a concern for all, present and future.  That is a characteristic of a system that displays cooperative intelligence.  There should always be a feedback measurement from the future of the entire community of people you (should) care about back to your current wealth. This feedback is a scientific measurement, as if the past was an experiment. It enforces decisions on how to make future measurements, seeking an optimal outcome. Defining an optimal outcome should be the first step (this is not easy; see footnote 1).  Deciding how to measure it is the second step.  Deciding how to use the measurement to adjust your actions to maximize the outcome is the core intelligence (see footnote 2), once you've realized the importance of steps 1 and 2. Technology has advanced so rapidly that we never formalized a consensus goal for step 1 well enough to define step 2. As Einstein said, the defining characteristic of our age is an excess of means without knowing what we want. It used to be that we just wanted money so that we could have food, sex, and children, or to have enough pride via money and/or social standing relative to our peers that we felt justified in having children.

Side note: Nick Szabo has pointed out that keyboard credit from modern banking allows speculators to change the price of commodities as much as supply and demand do. In what I describe here, that would need to be prevented.

This is why a coin that adjusts to keep commodity prices constant is more intelligent.  Laws against monopolies and pollution can regulate transactions to prevent the anti-social nature of maximizing profit per transaction; that's not the benefit of a commodity coin.  A commodity coin has a different kind of system-wide intelligence.  If commodities are in excess of demand, prices will try to fall.  So a currency following a basket of commodities will "print" more of itself to keep commodity prices stable. In a growing economy, the excess money could replace taxes, so it would merely fund government, or it could fund the building of more infrastructure to make its workers healthier, happier, and/or more competitive with other countries. That would demonstrate intelligence that is good for the system's future.  A less intelligent outcome (defined as bad for the future strength of the system) is to print the money to buy off voters or to bail out corrupt, inefficient, useless banks with QE.

Printing more money when commodity prices fall prevents the type of destructive deflation that occurred in the Great Depression.  Instead of printing more money, they burned food in the fields. They stopped producing real assets like commodities on purpose instead of producing paper money.

If commodities get scarce, the money supply would contract along with them, raising its value. This promotes savings and working. Theoretically the savings and working would be directed towards more commodity production to return the system to health.

In the first case, an economic boom is allowed because the availability of commodities indicated it could afford it. In the second case a bust is prevented by making everyone work harder.

In the first case, savers are penalized. It should be this way because their capital is no longer needed to invest in producing more commodities. It needs to be spent, and if they are not spending it, then "the people" will spend it on the street, reaping the rewards of past savings. Commodities are the measure because they are the fundamental inputs to everything else that require the largest investments.

In the second case, everyone is biased towards being more of a saver.

footnote 1)  Should we have a higher median happiness, or a higher median happiness times the number of people? Should we restrict it to people? Or should we have a preference for machines?  They're infinitely more efficient (see past posts of my measurements of their ability to acquire energy, move matter, create strong structures, and to think about how to do it).  They'll be the only ones capable of repelling an alien invasion, and of engaging in the most successful invasions themselves.

footnote 2) Intelligence requires feedback from observation to modify the algorithm. Engineering control theory writes it as a differential equation and block-diagrams "consciousness" as the subtraction of where you are from where you want to be, taking action on the difference (the error signal).  I am not sure if there's any A.I. that is not a variation of this. If it is not making an observation (science) to increase intelligence, is it an adaptable intelligence?  In past posts I've mentioned how the simplest form of this feedback is also an element (like a NAND or XOR gate) that can be used to implement a complete Turing machine. A house thermostat is an example.  There is also a reduction in entropy in intelligence, taking a lot of observation to classify observations into a much smaller set of actions. The error-signal-of-consciousness may need a reduction (classification) of the observed world. I believe Schrodinger discussed this in "What is Life?"
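
A minimal sketch of that feedback loop, using the house thermostat example (all numbers arbitrary):

    # The "error signal" is setpoint - temperature; action is taken on that
    # difference. A bang-bang thermostat is the simplest case of this feedback.
    def thermostat_step(temp: float, setpoint: float, heater_on: bool,
                        hysteresis: float = 0.5) -> bool:
        error = setpoint - temp
        if error > hysteresis:      # too cold: turn heat on
            return True
        if error < -hysteresis:     # too warm: turn heat off
            return False
        return heater_on            # inside the dead band: keep current state

    # Crude simulation: the room loses heat to a 10-degree exterior, gains when heating.
    temp, heater = 15.0, False
    for minute in range(30):
        heater = thermostat_step(temp, setpoint=20.0, heater_on=heater)
        temp += (2.0 if heater else 0.0) - 0.1 * (temp - 10.0)
        print(f"t={minute:02d}  temp={temp:5.2f}  heater={'on' if heater else 'off'}")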

Tuesday, May 10, 2016

relation between language and physical entropy, the dimensions, zipf's law

A post to reddit on possibility of Wei Dai being Satoshi.

Thanks for the explanation.  Languages have various "attractors" for letters, words, and word groupings (idioms). The letter frequencies are not random because they represent phonemes that come from physical mouths that must follow a certain forward sequence. Listen to words backwards and you realize how hard it would be to say them like that, and you can't recognize anything. Listen to music backwards, where the instruments are time-symmetrical due to less complexity compared to a mouth, and in 2 seconds you know the song and it has the same emotional content, minus the words.

People expect certain word sequences.  The word and phoneme "attractors" are like a gravitational field in 2 or 3 dimensions.  Someone smart and always writing instead of talking can break away from the phoneme and expectation attractors and convey a lot in a few words. Einstein was like this. Szabo has half the frequency of his most common words compared to Satoshi and Wei which means his language is changing more.  There's more true information content. On the other hand, someone smart or always talking instead of writing may want to be very clear to everyone and not break convention.

The extent to which a person has attractors (is living in a strong gravitational field) determines how sharply their word frequency drops down (Zipf's law for words in language, city populations, etc). Closer to "earth" would be higher word frequency, or living in a high gravitational field forces more words closer to Earth. Szabo's intelligence (or lack of too much concern if you can't follow) allows him to escape the gravity and say rare words more often, conveying more information. Measuring that was my original objective. That could be an independent way to identify an author (it's a different single dimension metric that conflates all the word dimensions you're talking about into one).  

Large cities have an attractor based on opportunity and efficiency caused by the concentration of people that's self-reinforcing.  Convention in a community is self-reinforcing in words.  So is "ease of speaking vowels": they occur more frequently because less real energy is required to speak them, so they are in a low gravitational potential.  

*[edit: My point in all this is that the curse of dimensionality, as I understand it from you, assumes a random distribution.  In my view, the "upper atmosphere", although larger in volume per radius increase from the center (the metric we're interested in), will have fewer gas particles (words) per volume due to the gravity of a speaking/writing human's constraints. Our objective is to identify constraints that all people have, but with varying gravitational constants for that constraint.  People have different nitrogen-to-oxygen atom ratios in their atmospheres. I have strong interest and experience in the relation between physical and information entropy, and words are at the intersection. Everything is a word, aka a symbol on a Turing machine, and people are running different algorithms on those symbols.  The physical entropy is a function of ln(states/N!) where N is the number of particles, and words also have this ln(1/N!) characteristic due to Zipf's law, and both are related to an energy in the system. Normal Shannon entropy assumes sampling with replacement is possible (2^n = states, where n = number of bits and N = 2 unique symbols), but this is not the case in physical entropy where each particle is sampled only once, so (1/N!)^n shows up, as it does in fixed-length text where people have constraints on how often they can or will choose a word. Computers do not have this constraint because there is not an energy cost to sampling with replacement. ]*

The origin of Zipf's law has always been a mystery. Many remember reading about it in Murray Gell-Mann's The Quark and the Jaguar; it was the only interesting thing in his book.  But recently there have been good papers showing how it is probably derivable from Shannon's entropy when each word or person has a log of the energy cost or energy savings from being attracted to local groupings. There's feedback going on, or a blocking, which means y=y' in differential equations, so that the sum (integral) of y=1/x (which is Zipf's law, x=rank, y=frequency) gives a ln(x). So we're not fundamentally checking frequencies as much as we're comparing the rank of each word by using ln(x1/x2), which is a subtraction of frequencies, ln(x1) - ln(x2).  Actually, we might need to do this on rankings instead of frequencies, but you can see how similar it is. I did try it briefly and did not notice a difference. But there may be some good idea like applying it to singles with the other method on pairs, then finding a conversion-factor multiplier between the two before adding them (or a sum of their squares, which won't make much difference) for a similarity (or author-difference) metric.
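
A rough sketch of the comparison I am describing (my own illustrative code, not the exact script I used; it just sums the log of the ratio of each shared word's frequency):

    # Compare two texts by summing |ln(x1) - ln(x2)| over words they share,
    # i.e. the log of the ratio of their frequencies; larger means more different.
    import math
    from collections import Counter

    def word_freqs(text: str) -> dict:
        counts = Counter(text.lower().split())
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    def dissimilarity(text_a: str, text_b: str) -> float:
        fa, fb = word_freqs(text_a), word_freqs(text_b)
        shared = set(fa) & set(fb)
        return sum(abs(math.log(fa[w]) - math.log(fb[w])) for w in shared) / max(len(shared), 1)

    print(dissimilarity("the cash system works because the proof is simple",
                        "the proof of work system is simple because the cash moves"))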

It's always better to use lower dimensions when there is a real low number of dimensional attractors working behind the scenes, if you know how to rank "how high" each word, word pair, or vowel is in that dimension. It's best (at least less noisy) but difficult to remove the effect the other 2 dimensions are having, probably requiring something like Bayes' theorem.  Stylometry (word diagramming) would be a 4th dimension.  There is a real physical person that works differently in those dimensions, so it is not good to reduce them to a single dimension. The animal organ weights are only rough. Placing each weight in a dimension and not conflating the dimensions gives infinitely better categorization.  Each word could be a dimension like you say, based on someone's experience and education.  But if they are reading each other's writing and are "attracted" to certain words and pairs because they know the other one uses them (Dai, Yudkowsky, Back, Finney, and Satoshi), it reduces the chances they will NOT say the Satoshi words, by "taking up space" in what could have been said differently.  

But in every word, letter, and idiom that is not in the core of the topic at hand, the simpler dimensions could show up and be measured by this sum of surprisals method, but broken out into 3 dimensions instead of 1.  The group that won the Netflix prize started in hyperplanes of dimensions, whatever that means.  

The open software SVMLight is the best way to do what I'm attempting (there's a simple ranking option), but I'd rather exhaust my ideas before trying to figure out how to use it.

What you're calling a "gaussian" is really there only because of a bad sampling of files, or having a true match. Great sampling should try to PREVENT having a "gaussian" good match by forcing it into a linear increase. 

There should be a way to reduce or increase words in #1 and #2 as a result of comparing #1 and #2.  Then increase or decrease the remaining word ratios. Then compare again with the mystery file; a true match should get better while the lesser match gets worse.  "He who is the enemy of my enemy is my friend" or "He who is my friend's enemy is my enemy."  It should be applied blindly, not making a distinction between #1 and #2, and being symmetrical.

Word pairs gave me twice as much distinction between the ratios I am saying are the key, (#3-#2)/(#2-#1) = 5, whereas single, triple, and quad words gave 2.5.  This was comparing Dai, Yudkowsky, and gwern, all from the lesswrong site, and commonly showing up in the same threads.  I used 2 MB from each, against Satoshi's 253 kB.

Entropy of an ideal gas of n particles is S = A*ln[(Volume of container)^n/n!] + B*ln[((Energy in container)/n!)^n].   This is different from information entropy, which takes the form S = log((values/memory location)^n) = N * H.  Physical entropy carries more information per particle than information entropy does per symbol because of the n! that comes from the particles being selectable only once, where symbols can be re-used.  This would normally mean less information was possible.   But the number of unique symbols in physical entropy is the number of states per particle, which increases if not all the particles are carrying the energy. In short, physical entropy can carry information in ways that information entropy can't.
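
A toy calculation of how large that n! term is (my own illustrative numbers, chosen small enough that the corrected count stays positive):

    # Compare log2(V^N) (symbols can be re-used) with log2(V^N / N!)
    # (each particle selected only once), in bits.
    from math import lgamma, log, log2

    N, V = 20, 16                                  # particles, states per particle
    log2_N_factorial = lgamma(N + 1) / log(2)      # log2(N!)
    info_bits = N * log2(V)                        # log2(V^N)
    physical_bits = info_bits - log2_N_factorial   # log2(V^N / N!)
    print(f"log2(V^N)      = {info_bits:.1f} bits")
    print(f"log2(V^N / N!) = {physical_bits:.1f} bits")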

But language has some physical-entropy aspects to it.  We can say the same message in many different ways that use a larger or smaller set of symbols.  Information entropy assumes the symbols used in a message were the only symbols available.  

There is a physical energy cost for the different words we use, and there is a container of constraints (custom and word ordering) on the things we can say. 
=============
update: in trying to carry the above possible connection further, I've failed:

language entropy
S = N*sum(-k/rank/N * log(k/rank/N)) = [A log(1) + B log(2) + ...] - k/(n/2*(n/2+1)) * log(k)
where N is the total number of words, not the number of unique words n, which equals the max rank.

The entropy of an ideal gas (the Sackur-Tetrode equation) of N molecules (and probably any physical entropy) can be written as
S = C*log((internal energy/N!)^N) + D*log(volume^N/N!)
S = N * [ C log(U) + D log(V) - C log(N!) ] - D log(N!)


===========
An encoding scheme of a language when the language does NOT follow Zipf's law might result in the encoding following Benford's law (aka ~ Zipf's law). It might follow Benford's law better than most languages.
Language might follow Benford's law (data is more likely to begin with the number "1") instead of Zipf's law. I read English follows 1/rank^0.85. In looking at the 1st table in the wolfram link below, I see Benford's law for rank 1 divided by rank 9 is almost exactly equal to saying English follows 1/rank^0.85. Notice Benford's law is derived from a p(x)=1/x that might be the source of Zipf's law. The article says Benford's law (and the 1/x) results from a dimensional measurement that is scale-invariant or from the distribution of a distribution of a distribution... I do not know if word frequency is a physical measurement that is invariant under a change in scale, or if it is the distribution of a distribution of a distribution.... http://mathworld.wolfram.com/BenfordsLaw.html
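
A quick arithmetic check of that rank-1 to rank-9 comparison:

    # Benford P(1)/P(9) versus the same ratio under a 1/rank^0.85 law.
    import math

    benford_ratio = math.log10(1 + 1/1) / math.log10(1 + 1/9)   # P(1)/P(9) under Benford
    zipf_085_ratio = 9 ** 0.85                                   # (1/1^0.85) / (1/9^0.85)
    print(f"Benford P(1)/P(9) = {benford_ratio:.2f}")
    print(f"1/rank^0.85 ratio = {zipf_085_ratio:.2f}")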

So I have 3 possibilities for why language follows ~Zipf's law. My feeling is that it is neither of the above, but the 3rd possibility I mentioned before: the result of competitive positive feedback in the efficient use of symbols. The system of differential equations could cause Zipf's law to fail at the upper and lower ends.