Moore's law has almost come to a halt for our typical computing devices. By "typical" I mean silicon-based, non-reversible, synchronous computing. Reversible computing might achieve a lot more, but it is a suspect idea due to quantum-based dispersion of precision when running computations forward and backward. Switching to something like carbon nanotubes could extend the limits, but those and any other material will still have limits arising from causes like those below. Asynchronous computing avoids the synchronization limit below, but breaking tasks up into parts for disparate computing has fundamental problems. On the other hand, brains are mostly asynchronous and capable.
The computing limit for irreversible synchronous non-quantum silicon computing is based on:
1) A logic device requires a certain minimum number of atoms.
2) Heat is released per bit change in irreversible computing, aka Landauer's principle, Q = k*T*ln(2). There may be a way around this. It would not change the basic limit in entropy cost, which ultimately comes at an energy cost, but it might allow computing without a local increase in heat.
3) Probability of error increases as the energy we use to store or transmit a bit gets near Landauer's limit. I think the probability of error per computation or transmission is based on e^(-E/Q), where E is the energy per bit used to transmit or store the bit and Q is Landauer's limit. A modest 100 M transistor 3 GHz device would have an error every hour if E is 40x Q (a sketch of this estimate follows the list).
4) The speed of light is finite.
5) Atoms break apart if they get too hot.
6) The amount of heat a cold sink can extract out of a computing device is limited, even if it is only 1-layer thick.
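To make item 3 concrete, here is a small sketch of the e^(-E/Q) error model. The switching-activity factor and the transistor and clock counts are knobs I chose; with different assumptions about how many bits are at risk per cycle, the "one error per hour" threshold moves around, so treat the output as illustrative rather than a reproduction of the figures above.

import math

k_B = 1.38e-23                               # Boltzmann constant, J/K

def expected_errors_per_hour(e_over_q, n_transistors=1e8, clock_hz=3e9, activity=1.0):
    # Item 3 above: per-operation error probability modeled as exp(-E/Q).
    p_err = math.exp(-e_over_q)
    ops_per_hour = n_transistors * clock_hz * activity * 3600
    return p_err * ops_per_hour

# Each extra 10*Q of energy per bit cuts the error rate by a factor of e^10 (~22,000).
for margin in (40, 50, 60):
    print(margin, expected_errors_per_hour(margin))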
In order for transistors to change state in step with each other, the distance across the chip is limited by the speed of light and the processor speed. The faster the processor, the smaller the chip has to be. For example, if the longest trace from the clock input to a remote transistor is 4x the chip die diameter and every transistor has to acquire the needed state in 1/4 of a clock cycle, then the max diameter at 3 GHz owing to the speed of light is 3E8 m/s / 3 GHz / 4 / 4 = 6 mm. This is approximately what is seen in current chips. Since the number of transistors increases as diameter squared, and the size limit due to the speed of light increases linearly as clock speed decreases, more synchronous computing can be done with slower chips. This also reduces the heat problem so that more layers of transistors can be used. This is why chip speed is not increasing. To take advantage of a slower speed, wider channels (128 bits and higher instead of 64) or more cores would be needed. More cores is currently the method that takes advantage of the largest die diameter allowed by the speed of light. A 3 MHz chip could do 1,000x more computing at a cost of 1,000,000x more transistors. It would not need to be 6 meters per side as this implies, because it would dissipate 1,000x less energy per second, which means it could be more layers thick.
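Here is a small sketch of that arithmetic. The factor-of-4 trace length and the 1/4-cycle settling budget are the assumptions taken from the paragraph above.

C_LIGHT = 3e8                 # speed of light, m/s

def max_die_diameter(clock_hz, trace_factor=4, cycle_fraction=0.25):
    # Longest trace = trace_factor * diameter must be traversed in cycle_fraction of a clock.
    return C_LIGHT / clock_hz * cycle_fraction / trace_factor

d_3ghz = max_die_diameter(3e9)
d_3mhz = max_die_diameter(3e6)
print(d_3ghz * 1e3, "mm at 3 GHz")            # ~6 mm, as estimated above
print(d_3mhz, "m at 3 MHz")                   # ~6 m before accounting for extra layers
# Transistor count scales with area (diameter squared), so a 1,000x slower clock
# allows ~1,000,000x more transistors and ~1,000x more total synchronous computation.
print((d_3mhz / d_3ghz) ** 2, "x more transistors")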
Based on the speed of light, the above was pretty good at predicting current chip diameter. Now I'll try to predict chip speeds based on Landauer's limit. I explained energy usage per bit needs to be at least 40x Q to have a 50% chance of error every hour. It only needs to be 50x to stretch that to 1 error every 100,000 hours. I'll assume in practice it is currently 100x. A 14 nm AMD Ryzen 7 1800X chip has 4.8 billion transistors (14 mm per side) and uses a max of 95 watts. This is not an Intel chip, but Intel says a 14 nm die has a max of 105 C. I'll guess transistors actually reach about 150 C locally. It runs at 4 GHz. I'll guess the average transistor changes state only 1/4 of the time (once per 4 clock cycles). So, 100x Landauer's limit times the number of transistors times the clock speed divided by 4 = 100*(150+273 kelvin)*1.38E-23*ln(2)*4.8E9*4E9/4 = 2 watts. Actually, if every transistor needs to avoid an error for 100,000 hours with my 2x energy safety/inefficiency factor, then the 1/4 maybe should not be applied, giving 8 watts on an adjusted basis. They seem to be using 95/8 = 12x more energy than the limit of what they could. The limit here seems to be chip temperature: they've made it as fast as they could without "melting", and they've made the dies as big as they could for that speed. Smaller chip processes allow a squared factor (or more) of less energy usage while allowing a squared factor more transistors per die of the same size. So they can continue to make them smaller at the same speed and the heat profile will not change. The 50x limit implies they could go SQRT(12) = 3.5x smaller with my 100x estimate of a hard limit. That would be down to 14/3.5 = 4 nm. This is far from the limit based on heat dissipation; my assumptions could have made it 1 nm. I see 5 nm is scheduled for about 2020. Since atoms are only 0.2 nm wide, that must be getting close to the size limit before they have to resort to slower processor speeds to enable larger dies (or more layers).
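A sketch of that power estimate. The 100x Landauer margin, the 150 C local temperature, and the 1/4 activity factor are the assumptions stated above, and the chip figures are the Ryzen numbers quoted there.

import math

k_B = 1.38e-23                                # Boltzmann constant, J/K

def landauer_power(n_transistors, clock_hz, temp_k, margin, activity):
    q = k_B * temp_k * math.log(2)            # Landauer limit per bit change
    return margin * q * n_transistors * clock_hz * activity

p = landauer_power(4.8e9, 4e9, 150 + 273, margin=100, activity=0.25)
print(round(p, 1), "W")                       # ~2 W; ~8 W without the 1/4 activity factor
print(round(95 / (p * 4), 1), "x actual power over this estimated floor")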
Friday, January 26, 2018
Sunday, January 21, 2018
Compressing short texts
I want to compress text as much as possible to fit into the 512-byte memo field of Zcash and Hush.
I didn't have any luck searching the internet for how to compress short texts... until I had finished doing it from scratch. Normal compression algorithms are not useful because they need to include various types of look-up tables in the data, which can make small texts even bigger.
I used Huffman-like look-up tables for the language at hand, using the known frequency of letters and words to make decisions, combined with other tricks based on knowing things like a capital letter will probably follow a period. I used 2 bits to tell the decoder how many bits were about to follow, depending on which table contained the word or letter to replace. 4, 6, 7, or 12 bits followed the 2 bits. The 4-bit table was for the 16 most common characters (e, t, a, i, o, l, s, n, newline, period, comma, going from memory). The 12-bit table was for less-common words; I had a table of 4096 words. A 6-bit table was for the rest of the ASCII-printable characters, and the 7-bit table was for the most common words.
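A minimal sketch of this kind of encoder. The table contents and helper names below are placeholders standing in for the real frequency-derived tables, so the output lengths are only illustrative.

# 2-bit prefix selects one of four tables; a fixed-width index follows.
COMMON_CHARS = list("etaiolsn\n.,")           # 4-bit table (up to 16 entries)
OTHER_CHARS  = [c for c in map(chr, range(32, 127)) if c not in COMMON_CHARS]  # 6-bit table
COMMON_WORDS = ["the", "and", "to", "of"]     # 7-bit table (up to 128 entries)
RARE_WORDS   = ["difficulty", "compression"]  # 12-bit table (up to 4096 entries)

def emit(bits, value, width):
    bits.append(format(value, "0{}b".format(width)))

def encode(tokens):
    # tokens: a list of single characters or whole words.
    bits = []
    for tok in tokens:
        if tok in COMMON_CHARS:
            emit(bits, 0b00, 2); emit(bits, COMMON_CHARS.index(tok), 4)
        elif tok in COMMON_WORDS:
            emit(bits, 0b01, 2); emit(bits, COMMON_WORDS.index(tok), 7)
        elif tok in RARE_WORDS:
            emit(bits, 0b10, 2); emit(bits, RARE_WORDS.index(tok), 12)
        else:                                 # any other printable ASCII character
            emit(bits, 0b11, 2); emit(bits, OTHER_CHARS.index(tok), 6)
    return "".join(bits)

print(encode(["the", " ", "difficulty"]))     # 9 + 8 + 14 = 31 bits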
But here's a better job of it in precise detail:
https://academic.oup.com/comjnl/article-pdf/24/4/324/1039097/240324.pdf
He and I both got compression down to about 45% of the original English files. He did it with only 250 words instead of 4096, so his method is really good. We're using almost exactly the same scheme, but he used only 4 bits for the most common letters, which is why he was able to do as well with a lot less. It's hard to do better no matter how big the tables are, but my next improvement could be to follow his method exactly and add another table of 4096 words, using 12 bits after the 4-bit prefix.
His method has a big additional advantage: because the codes stay a multiple of 4 bits, a common procedure can be applied: the Burrows-Wheeler transform (BWT) followed by move-to-front (MTF), then arithmetic coding (AC). BWT gets like symbols closer together, which enables MTF to more frequently assign a low position number. BWT and MTF do not do compression; AC does the compression after MTF, making use of the lower entropy due to the more frequent low position numbers. By itself, the BWT-MTF-AC method got the original file down to 60% using 4-bit sequences. It did not benefit my method because my odd-length codes offset the bits all the time, making 4-bit sequences more random. 4-bit sequences are better than bytes for short messages because AC has to attach a "dictionary" to the data (i.e., a list of each symbol's count in the original), and 4 bits means I need only 16 counts of up to 256 each if no symbol is more than 25% of a 1000-byte pre-compression text (since my goal is 512 bytes and I won't get more than 50% on text already pre-compressed). That means only 16 extra bytes are needed versus 256 for 8-bit symbols (I'll just list the counts in a fixed order of fixed length instead of including both symbol and count). MTF will be a little longer since the ranking on 1000 bytes will require a distance of up to 2000 for 4-bit symbols instead of 1000.
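A minimal sketch of the BWT and MTF steps on a small sequence of 4-bit symbols, using a naive O(n^2 log n) BWT with an explicit end marker; the arithmetic coding stage is omitted.

def bwt(seq, end=0xFF):
    # Naive Burrows-Wheeler transform: sort all rotations, keep the last column.
    s = list(seq) + [end]                     # explicit end-of-block marker
    rotations = [s[i:] + s[:i] for i in range(len(s))]
    rotations.sort()
    return [rot[-1] for rot in rotations]

def mtf(seq, alphabet):
    # Move-to-front: clustered symbols produce mostly small rank numbers.
    table = list(alphabet)
    out = []
    for sym in seq:
        i = table.index(sym)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

nibbles = [3, 1, 3, 1, 3, 1, 2, 2, 2]         # example 4-bit symbols
transformed = bwt(nibbles)
ranks = mtf(transformed, alphabet=list(range(16)) + [0xFF])
print(transformed)
print(ranks)    # mostly small ranks, which the arithmetic coder then compresses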
The 4-bit method is also immediately applicable to UTF-8.
Wednesday, December 27, 2017
Using difficulty to get constant-value dev fees
I posted this to the bitcoin-ml and bitcoin-dev mailing lists and to reddit and bitcointalk, but no one seems interested.
========
Has anyone used difficulty to get constant-dollar developer or node
fees? Difficulty is exactly proportional to network hashrate, and
network hashrate is closely proportional to coin price.
Say a coin is currently $1.23 and someone wants to get a fixed income
from the coin like $0.01 each time something occurs. To achieve this
they could use a constant that is multiplied by the difficulty:
fee = 0.0123 * difficulty_at_$1.23_per_coin / current_difficulty / reward_per_block_at_$1.23 * current_reward_per_block
Dollar value here is constant-value relative to when the ratio was
determined (when difficulty was at $1.23). If hash power is not able
to keep up with coin price (which is a temporary effect), the value
would be larger than expected. Otherwise, the real-world value slowly
decreases as hashing efficiency increases, which may be a desired
effect if it is for dev fees because software gets outdated. But
Moore's law has gotten very slow for computers. Hashing should get
closer to being a constant hardware cost per hash.
To get constant value:
Q1/Q0 = D0 / D1 * moores_law_adjustment
Q = quantity of coin in circulation
D = difficulty
0 = baseline
1 = current
Electricity is more than half the current cost of hashing and
could soon be 3/4 or more of the cost. Worldwide electricity cost is
very stable and possibly the best single-commodity measure of constant
value.
Also, all coins for a given POW (if not in an even more general sense) will have the same factor above, adjusted only by multiplying by 2^(x-y) where x is the number of leading zeros in maxTarget for the other coin, and y is the number in the coin above.
In a very idealized situation where the algorithms running on computer hardware control all physical resources with a certain constant level of efficiency, any increase in hardware capabilities (Moore's law) would be proportional to the increase in the efficiency of the economic system, so there would not be a Moore's law adjustment. This is a hyper-idealized situation in the distant future, but I wanted to point out that the quantity ratio with difficulty has deep roots.
========
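For reference, here is a sketch of the fee rule quoted above in code. The baseline difficulty and block reward are illustrative placeholders for values that would be recorded when the coin traded at $1.23.

# Constant-dollar fee via difficulty.
BASE_FEE_COIN   = 0.0123        # coin amount chosen at the baseline price
BASE_DIFFICULTY = 1_500_000.0   # difficulty when the coin was $1.23 (example value)
BASE_REWARD     = 12.5          # block reward at the baseline (example value)

def dev_fee(current_difficulty, current_reward):
    # Fee shrinks in coin terms as difficulty (a proxy for price) rises,
    # keeping its dollar value roughly constant.
    return (BASE_FEE_COIN * BASE_DIFFICULTY / current_difficulty
            / BASE_REWARD * current_reward)

# If difficulty doubles (price roughly doubles), the coin-denominated fee halves.
print(dev_fee(current_difficulty=3_000_000.0, current_reward=12.5))   # 0.00615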
Tuesday, November 21, 2017
Best difficulty algorithm
I'm changing this as I find improvements. Last change: 12/1/2017.
Newest information is here:
https://github.com/zawy12/difficulty-algorithms/issues/3
This first one appears to be the best all around:
# Jacob Eliosoff's EMA (exponential moving average)
# https://en.wikipedia.org/wiki/Moving_average#Application_to_measuring_computer_performance
# ST = solvetime, T = target solvetime
# height = most recently solved block
# N = 70
# MTP should not be used
for (i = height - 10 to height) {   # check timestamps starting 10 blocks into the past
    previous_max = max
    max = timestamp[i] if timestamp[i] > max
}
ST = max - previous_max
ST = max(T/100, min(T*10, ST))
next_D = previous_D * ( T/ST + e^(-ST/T/N) * (1 - T/ST) )

A word of caution with the above EMA: when converting to and from the bits field (aka target) to difficulty, it is important to not make a consistently wrong "rounding" error. For example, if previous difficulty was 100,000, it is important that nothing in the code make it consistently +50 more or consistently -50 less (0.05% error). That would cause the EMA at N=70 to have 3.5% error in solvetime. At 0.5% error per block, there is 35% error in the solvetimes (difficulty is 30% too high or low). The error that develops seems to be based on about 1.005^70 = 41%. If half the time it is +1,000 too high and the other half -1,000 too low, then it's OK; just don't be consistently wrong in the same direction. Error in the value for e = 2.7183 does not hurt it.
In a simple moving average (SMA), a multiplicative error like this only causes a proportional error in solvetime, not a compounded error.
There is a T/solvetime ratio in two places. It must be the same in both places. I don't know how it would be coded to give something different, but it's something to be aware of.
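For reference, a runnable Python sketch of the EMA rule above. The timestamp handling mirrors the 10-block running-max scan in the pseudocode; it is an illustration, not the exact production code.

import math

def ema_next_difficulty(timestamps, difficulties, T=600, N=70):
    # timestamps, difficulties: lists indexed by block height, oldest first.
    running_max = timestamps[-11]        # start 10 blocks into the past
    previous_max = running_max
    for ts in timestamps[-10:]:
        previous_max = running_max
        if ts > running_max:
            running_max = ts
    ST = running_max - previous_max      # most recent solvetime, out-of-order safe
    ST = max(T / 100, min(T * 10, ST))   # clamp solvetime to [T/100, 10*T]
    previous_D = difficulties[-1]
    return previous_D * (T / ST + math.exp(-ST / T / N) * (1 - T / ST))

# Example: blocks arriving exactly on target leave difficulty unchanged.
ts = [i * 600 for i in range(20)]
D = [100_000.0] * 20
print(ema_next_difficulty(ts, D))        # ~100,000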
==========
The following attempts to make it smoother while having some ability to respond faster to big hashrate changes.
# Dynamic EMA difficulty algo (Jacob Eliosoff's EMA and Zawy's adjustable window).
# Bitcoin Cash dev(?) came up with the median of three to reduce timestamp errors.
# For EMA origins see
# https://en.wikipedia.org/wiki/Moving_average#Application_to_measuring_computer_performance
# "Dynamic" means it triggers to a faster-responding value for N if a substantial change
# in hashrate is detected. It increases from that event back to Nmax.
Nmax=100   # max EMA window
Nmin=25    # min EMA window
A=10, B=2, C=0.37   # or A, B, C = 20, 1.65, 0.45
# TS = timestamp, T = target solvetime, i.e. 600 seconds
# Find the most recent unusual 20-block event
for (i = height-Nmax to height) {   # height = current block index
    if ( (median(TS[i],TS[i-1],TS[i-2]) - median(TS[i-20],TS[i-21],TS[i-22]))/T/A > B
         or
         (median(TS[i],TS[i-1],TS[i-2]) - median(TS[i-20],TS[i-21],TS[i-22]))/T/A < C ) {
        unusual_event = height - i + Nmin
    }
}
N = min(Nmax, unusual_event)
# now do the EMA shown above with this N

Here's another algorithm that seems to be about as good as the EMA:
# Weighted-Weighted Harmonic Mean (WWHM) difficulty algorithm
# Original idea from Tom Harding (Deger8) called "WT-144"
# No limits, filtering, or tempering should be attempted.
# MTP should not be used.
# set constants
N=60          # See Masari coin for live data with N=60
T=600         # coin's Target solvetime. If this changes, nothing else needs to be changed.
adjust=0.99   # 0.98 for N=30, 0.99 for N=60
k = (N+1)/2 * adjust * T   # there is not a missing N.
# height = most recently solved block
# algorithm
d=0, t=0, j=0
previous_max = timestamp[height - N]
for ( i = height-N+1; i < height+1; i++) {   # N most recent blocks
    max_timestamp = max(timestamp[i], previous_max)
    solvetime = max_timestamp - previous_max
    solvetime = 1 if solvetime < 1
    # for N=60, 10*T solvetimes drive difficulty too far down, so:
    solvetime = 10*T if solvetime > 10*T
    previous_max = max_timestamp
    j++
    t += solvetime * j
    d += D[i]   # sum the difficulties; this is how WWHM is different from Tom's WT.
}
t = T*N if t < T*N   # in case of startup weirdness, keep t reasonable
next_D = d * k / t
# Limits on next_D do not need to be used because of the above solvetime restrictions,
# but if you still want limits on D's change per block & expect max 5x hash rate
# changes, or want to replace the solvetime restrictions, use:
# limit = 5^(3/N)
# next_D = previous_D*limit if next_D > previous_D*limit
# next_D = previous_D/limit if next_D < previous_D/limit
Monday, November 13, 2017
Maximizing options as the basis of memory-less intelligence
There seems to be some big news in A.I. and cosmology. To give an example of how far-reaching this idea is, view walking upright and the ability to use our hands as something that maximizes our options rather than something that gives us more power in any other sense. This simple algorithm can successfully play games without any training at all, other than defining "maximize future options" for a given set of rules.
I am far from certain his general view is correct (or even expressed clearly enough to criticize). I still can't view it as any more than a method of problem solving when his claims are much greater than that, but I can't exactly understand his claims. It sounds like he is saying things appear intelligent because nature somehow keeps options open.
Ted Talk
https://www.ted.com/talks/alex_wissner_gross_a_new_equation_for_intelligence
General idea:
https://physics.aps.org/articles/v6/46
How it's good at playing atari without training:
http://entropicai.blogspot.com/2017/06/solved-atari-games.html#more
On freedom in society making us more powerful:
https://www.edge.org/response-detail/26181
Basic physics of causal entropy and intelligence
http://www.alexwg.org/publications/PhysRevLett_110-168702.pdf
How it can predict the cosmological constant by following the anthropic principle:
http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-12353.pdf
Sunday, October 15, 2017
Smoothing coin release rate for bitcoin
If a coin wants to implement a continuous reward adjustment that releases the same number of coins at the same rate as bitcoin, use:
C/block = 69.3*(0.5)^(height/210,000)
But to implement it in an existing coin, it takes a little more work to find a solution:
The halving events in BTC seem absurd in being such a powerful step function. Due to rounding error and trying to keep the same quantity of satoshis emitted every 4 years, it takes some effort to find a formula that ends at exactly 1/2 as many coins per block after 4 years (210,000 blocks) and gives the exact same number of satoshis in those 4 years. Here's what I came up with.
Once every 4 hours, starting half-way into a halving event, set coins awarded to
BTC=C*int(1E8*A*B^(N/8750))/1E8
where
C = number of coins supposed to be emitted in this halving event.
A = 1.072026880076
B = 0.46636556
N = blocks after the 1/2-way point, up to 8750 when N goes back to 1 and C is cut in half.
8750 is 4 years' worth of 4-hour periods. I didn't like 5, 7, or 10 hours because they are not a multiple of a day. 2 hours would have been harder to check in Excel, having 2*8750 adjustments per halving. Other options were not an integer number of blocks. I guess 8 hours is an option.
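A sketch for checking this kind of schedule numerically. It assumes N counts 4-hour adjustment periods (24 ten-minute blocks each) and that the 4-year window runs from half-way through one halving era to half-way through the next; those readings of the description, and the comparison window, are my own.

A = 1.072026880076
B = 0.46636556
BLOCKS_PER_PERIOD = 24         # 4 hours of 10-minute blocks
PERIODS = 8750                 # ~4 years of 4-hour periods
C = 50.0                       # nominal coins/block of the era the window starts in

def stepped_reward(n):
    # The formula above, with satoshi truncation via int().
    return C * int(1e8 * A * B ** (n / PERIODS)) / 1e8

total = sum(stepped_reward(n) * BLOCKS_PER_PERIOD for n in range(PERIODS))
btc_total = 105_000 * C + 105_000 * C / 2     # half an era at C, half at C/2
print(total, btc_total)                       # close, though not exact under this reading
print(stepped_reward(PERIODS))                # ends near C/2 = 25 coins/block

# The smooth formula sums to ~10.5 million over one era, same as 50 * 210,000:
print(sum(69.3 * 0.5 ** (h / 210_000) for h in range(210_000)))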
I guess it could be improved to not have to reset C and N every 4 years.
I believe there is a starting point that is better in the sense that it results in a simpler equation that is good for all time, like ln(2) = 0.693 into a halving event or maybe 1/e into it or 1/e before the end.
Wednesday, October 11, 2017
The ideal currency (new)
This does not replace my previous ideal currency article that talks about a p2p coin that depends on "reputation" as the coin itself. I want to connect the lowering of physical entropy on Earth to currency.
Previously I described how all characteristics of an ideal currency such as Nick Szabo's list (scarcity, fungibility, divisibility, durability, and transferability) can be derived from the desire to have constant value in space and time. The best measure of "value" results in a more specific definition: ideal constant value currency is proportional to the net work energy available per person. A legal system creates the relevance of the currency and guides system-wide goals. Otherwise it is an anarchy-type asset where individuals are not necessarily cooperating for a larger goal such as survival and promotion of a society. I'm not talking about that type of currency. The assumed legal system dictates how the energy may be used, which includes enforcing the concept of ownership of "the work energy assets" and enforcing law (settling disputes, collecting taxes, etc) in that particular currency (unless the transfer of other "work energy assets" in certain cases is more appropriate). Intellectual property does not have a visible net work energy that is proportional to its numerical value, but it ostensibly increases the net work energy available in other assets by making the use of them more efficient, even if only by entertaining people which may enable them to work better. Assets have a real net physical work energy, but in calculating how much more currency should be in the system due to those assets, the costs of anything such as intellectual property needed in its conversion to work should be first subtracted out.
There is waste energy as heat when work energy is expended, and the amount depends on the form of the original energy. There is also wasted energy when work energy is expended to get other work energy where it is needed. These and other forms of waste are not included in the "net total work in the system per person" that I'm talking about. So, I can't say this work is exactly based on the Gibbs free energy (G = U + pV - TS) of a set of atoms in some system, because that is measured before these wastes. Gibbs free energy is the precise definition of "available work energy". So what I'm talking about is the Gibbs free energy minus the waste, which includes the cost of the intellectual property. Gibbs free energy includes a subtraction from the total that is due to the energy having an amount of disorder (entropy) at a given temperature (S*T). In a sense, the waste and I.P. expense is like a pre-existing "disorder" (or inefficiency) in the assets. If the size of the system under a common legal and currency control is stable, the new total net work energy coming into the system in a given time is equal to the waste energy going out. The infrastructure that acquires the input energy is a potential energy that will be depleted over time as the infrastructure depreciates. Every asset is similarly a potential energy. So the net work energy I've defined is probably better viewed as a potential energy, and new energy coming in and going out in a stable system keeps that potential energy constant. If the incoming energy is greater than the waste going out, the potential energy of the system is increasing, which should be accompanied by an increase in currency. Also, if the I.P. costs in the system decrease, it is like the S*T part of Gibbs free energy decreasing, enabling more of the potential energy to be converted into work energy. This also demands more currency creation.
The currency is proportional to the net work energy in the system, but not equal to it. All assets in a legal system should have a reference such as a document that defines the owner. The currency gains control of a portion of those assets (say, 10%) by owners having debts as well as assets which places a lien on their assets, so they do not exactly have full ownership of the assets they own. The debts may be expressible in other assets, but the legal system typically allows settlement in the currency. So not all debt is currency, but all currency is based on a debt. An immediate question I have is "Should the total debt-currency (as a percentage of the assets in the system) be constant?" My first guess is "yes" to keep things simple and therefore more measurable and predictable.
The rest is highly suspect material that I need to investigate further. I have it here for my future reference.
coins = bytes = DNA = synapses => used to create economically beneficial arrangements of atoms and potential energy
The usefulness of energy depends on the form it is in as well as the pre-existing order of the matter it needs to move. For example, oil in the ground is not as valuable as oil in a tanker. Gold in a vein is more useful than gold dispersed in sea water. So the order in the mass of commodities has value, like energy commodities, due to making itself more amenable for energy to move. A.I. systems, like evolution and economies, use energy to move mass to make copies of themselves to repeat the process. More precisely, the historical position of mass and potential energy gradients causes matter to form in a self-replicating way. Genes, brains, and A.I. are not forces that do anything of their own free will, but they are the result of forces from pre-existing potential energy gradients that created them. They are enzymes that allow energy and mass to move in an efficient direction, not forces. The following is mathematically exactly true: Intelligence = efficient prediction = compression = science. I am referring to the "density" of each of these, not the total abilities. For example, science seeks to predict the most in the least number of bytes. The least number of bytes is known as Occam's razor in science and is the 2nd of the two fundamental tenets of science. The first is that observations should have the potential to prove a theory wrong (falsifiability), and that those observations always support the theory (reproducible observations). So the 1st science tenet is prediction and the 2nd is compression or efficiency. Total currency in an A.I. system = bytes / time that are destroyed in CPU computations and memory writes. Every computation in a CPU and memory write to RAM or a hard drive generates heat and entropy. The theoretical minimal entropy per byte destroyed is S = kb * ln(2), where kb is Boltzmann's constant. The minimal heat energy created (the energy lost) is Q = Temperature * S. In economics, the "bytes" are "dollars" that represent energy spent, like a CPU computation, to create a mass of a commodity (like the storage of a byte). When we write to memory in A.I., we are creating value that can be used in the future. Typically the writes are assigning weights to the connections in neural nets, or probabilities to a Bayesian net, or making copies of a gene in genetic algorithms. Bytes in evolution are DNA. The bytes in our bodies are cellular energy like glucose and energy stored in the crystals of DNA. Energy-based commodities are spent to create mass-based commodities that are used for an economic system to replicate (expand), just like evolution and A.I. Total currency is the total available commodities per economic cycle. Approximating a constant number of economizing agents, like people or neurons in a brain or nodes in a neural net, means that the currency is also bytes per economic cycle per economizing agent. Agents compete for the limited number of bytes in order to increase the number of bytes per agent per cycle. Coins are bytes that represent a percent ownership of the total commodities available per economic cycle per person.