Tuesday, November 24, 2015

entropy: relation between physical and information

== The relation between S and H ==

H is intensive S
Shannon entropy H appears to be more like bits per symbol of a message (see examples below for proof) which has a better parallel with '''intensive''' entropy (S per volume or S per mole), NOT the usual '''extensive''' entropy S.  This allows "n" distinct symbols like a,b,c in a Shannon message of length N to correlate with "n" energy levels in N particles or energy modes.

H as possible extensive S?  Not really
However, a larger block of matter will have more long-wavelength phonons, so more Shannon symbols will be needed to represent the larger number of energy levels, and H may not be usable as intensive S on all occasions.  But the same error will apply if you try to use intensive S.  So intensive S seems to be exactly equal to Shannon entropy, as long as the same reference size block is used. This problem does not seem to apply to gases. But does a double-sized volume of gas with double the number of molecules have double the entropy, as I would like when using Shannon entropy?    Yes, according to the equation I found (at the bottom) for the S of a gas:  S = N*( a*ln(U/N) + b*ln(V/N) + c ).  If N, U, and V all double, then the entropy merely doubles.

Why Shannon entropy H is not like the usual extensive entropy S
Examples of Shannon entropy (using the online Shannon entropy calculator): http://www.shannonentropy.netmark.pl/
01, 0011, and 00101101 all have Shannon entropy H = log2(2) = 1
abc, aabbcc, abcabcabc, and abccba all have Shannon entropy H = log2(3) = 1.58

The important point is to notice it does not depend on the message length, whose analogy is number of particles.  Clearly physical extensive entropy increases with number of particles.
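The length independence claimed above can be checked with a small Python sketch. The function name shannon_entropy is my own illustrative choice and is not taken from the online calculator; it just applies the count-based formula H = sum(count/N*log2(N/count)).

```python
from collections import Counter
from math import log2

def shannon_entropy(message):
    """Shannon entropy H in bits per symbol, from observed symbol counts."""
    n = len(message)
    counts = Counter(message)
    return sum(c / n * log2(n / c) for c in counts.values())

# H stays the same as the message grows; only N*H grows with length.
for msg in ["01", "0011", "00101101", "abc", "aabbcc", "abcabcabc", "abccba"]:
    h = shannon_entropy(msg)
    print(msg, "H =", round(h, 3), "N*H =", round(h * len(msg), 3))
```

H is 1 for every binary example and log2(3) for every three-symbol example, while N*H scales with message length the way extensive entropy scales with particle count.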

Suggested conversion between H and regular extensive S
To convert Shannon entropy H to extensive entropy S or vice versa use
S = kb*ln(2)*N*H
where kb is Boltzmann's constant, ln(2) is the conversion from bits to nats, and N is the number of particles or other oscillators to which you've assigned a "Shannon" symbol representing the distinct energy level (if volume is constant) or microstate (a combination of volume and energy). A big deal should not be made out of kb (and therefore S) having units of J/K because temperature is a precise measure of a kinetic energy distribution, so it is unitless in a deep sense. kb is merely a slope that allows zero heat energy and zero temperature to meet at the same point, both of which are Joules per particle.
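A minimal Python sketch of the conversion S = kb*ln(2)*N*H follows. The function name extensive_entropy and the choice of one mole of three-state particles are illustrative assumptions, not part of the notes above.

```python
from math import log, log2

K_B = 1.380649e-23       # Boltzmann constant in J/K (exact SI value)
N_AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)

def extensive_entropy(h_bits_per_symbol, n_particles):
    """Convert Shannon H (bits per symbol) to physical entropy S in J/K."""
    return K_B * log(2) * n_particles * h_bits_per_symbol

# Assumed example: one mole of particles, each in 1 of 3 equally likely states.
s_mole = extensive_entropy(log2(3), N_AVOGADRO)
print(s_mole, "J/K, which equals R*ln(3)")
```

The ln(2) factor only changes the logarithm base, so the result is the same as kb*N*ln(3) computed directly in nats.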

Physical entropy of an Einstein solid derivable from information theory?
Consider an Einstein solid (http://hyperphysics.phy-astr.gsu.edu/hbase/therm/einsol.html) with 3 possible energy states (0, 1, 2) in each of N oscillators. Let the average energy per oscillator be 1, which means the total energy is q=N.  The physical extensive entropy S is (by the reference above) very close to S = kb*ln(2)*2N  = 1.38*N*kb for large N. You can verify this by plugging in example N's, or by Stirling's approximation and some math.

Shannon entropy for this system is as follows: each oscillator's energy is represented by 1 of 3 symbols of equal probability (i.e. think of a, b, c substituted for energies 0,1,2). Then H = log2(3) = 1.585 which does not depend on N (see my Shannon entropy examples above). Using the H to S conversion equation above gives S=N*kb*ln(2)*log2(3) = ln(3)*N*kb = 1.10*N*kb. This is 36% lower than the Debye model's S/kb=1.72 per oscillator (equivalently, the Debye value is 56% higher). I think this means there are 14 possible states of even probability in the message of oscillators for each oscillator N giving S = k*N*ln(14) and K.E.= N*energy/N.

log2(3) = ln(3)/ln(2)

Note that the physical S is about 25% more than the S derived from information theory, 2*ln(2) versus ln(3). The reason is that the physical entropy is not restricted to independently random oscillator energy levels which average 1.  It only requires that the total energy be q=N.
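The comparison between 2*ln(2) and ln(3) can be checked numerically with the sketch below. It assumes the standard Einstein-solid multiplicity (q+N-1)!/(q!*(N-1)!) with q = N, which does not cap the energy of any single oscillator at 2; the function name is my own.

```python
from math import comb, log

def einstein_entropy_per_oscillator(n):
    """ln(multiplicity)/N for an Einstein solid holding q = N energy quanta.

    Assumes the standard multiplicity C(q + N - 1, q), with no upper limit
    on the energy of an individual oscillator.
    """
    q = n
    omega = comb(q + n - 1, q)
    return log(omega) / n

for n in (10, 100, 1000, 10000):
    print(n, round(einstein_entropy_per_oscillator(n), 4))

print("2*ln(2) =", round(2 * log(2), 4), " ln(3) =", round(log(3), 4))
```

The per-oscillator value climbs toward 2*ln(2) as N grows, while the independent three-symbol estimate stays at ln(3).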

You might say this proves Shannon entropy H is not exactly like intensive entropy S.  But I wonder if this idealized multiplicity is a physical reality. Or maybe kb contains a "hidden" 25% reduction factor in which case I would need to correct my H to S conversion above by a factor of 0.75 if used in multiplicity examples instead of classical thermodynamics.   Another possibility is that my approach in assigning a symbol to an energy level and a symbol location to a particle is wrong.  But it's hard to imagine the usefulness of looking at it any other way.

Going further towards Debye model of solids
However, I might be doing it wrong.  The oscillators might also be all 1's, or half 2's and half 0's (all 1's has log2(1)=0). This would constitute 3 different types of signal sources as the possible source of the total energy, with entropy of maybe H=log2(3)+log2(2)+log2(1)=2.58. Times ln(2) gives 1.79, which is higher than the 1.38 but close to Debye's model of solids, which predicts temperatures 1/0.806 higher than Einstein's for a given total energy, giving S/kb=1.72 (because U/N in both models ~T*S). The model itself is supposed to be the problem with the Einstein solid, so H should not come out closer based on it, unless I'm cheating or re-interpreting in a way that makes it closer to the Debye model. To complete this, I need to convert the Debye model to a Shannon signal source, assigning a symbol to each energy level. Each symbol would occur in the signal source (the solid) based on how many modes allow it. So N would not be the number of particles but the number of phonons, which determine Planck radiation even at low temperatures, and heat capacity. It is strange that energy level quantity does not change the entropy, just the amount of variation in the possibilities. This probably is the way the physical entropy is calculated, so it's guaranteed to be correct.

Debye Temp ~ s*(N/V)^(1/3)  where s=sonic velocity of the solid.  "Einstein Temp"  = 0.806*Debye temp.  This means my treatment

C = a*T^3 (heat capacity)

Entropy of a single harmonic oscillator (giving rise to phonons?) at "high" temp is
for solids S=k*(ln(kT/hf) + 1), where f is the frequency of the oscillations, which is higher for stronger bonds.  So if Earth acquires stronger bonds per oscillator and maintains a constant temperature, S decreases.  For independent oscillators in 1 D I think it might be S=kN*(ln(kT/hf/N) + 1).  In terms of S=n*H entropy:  S=X*[-N/X*[ln(N/X)-1]]  where X=kT/hf. For 3D X^3.  See Vibrational Thermodynamics of Materials, Fultz paper.

Example that includes volume
The entropy of an ideal gas according to internal energy page.
S = k * N *[ c*ln(U/N)  + R*ln(V/N) ]
by using log rules applied to Wikipedia's internal energy page, where c is heat capacity. You could use two sets of "Shannon" symbols, one for internal energy states U and a set for V, or use one set to represent the microstates determined by U and V to give S=constant*N*ln(2)*H.   H using distinct symbols for microstates is thereby the Shannon entropy of an ideal gas.

ideal gas not near absolute zero, more accurately:
S=kN*( ln( V/N*(4*pi*m*U/(3*N*h^2))^(3/2) ) + 5/2 )
Sackur–Tetrode equation
S~N*( ln(V/N*(mU/N)^3/2 + a) +b)
S~N*(ln(V/N) + a*ln(mU/N) +b)
The separate ln()'s are because there are 2 different sets of symbols that need to be on the same absolute base, but have "something", maybe different "symbol lengths" as a result of momentum being the underlying quantity and it affects pressure per particle as v whereas it affects U as v^2.  In PV=nRT, T is v^2 for a given mass, and P is v.  P1/P2 = sqrt(m2/m1) when T1 and T2 are the same for a given N/V.
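The extensivity of the Sackur-Tetrode entropy (doubling N, U, and V doubles S) can be checked with a short Python sketch. The helium-like mass, 300 K temperature, and molar volume are assumed example inputs, and the function name is hypothetical.

```python
from math import log, pi

K_B = 1.380649e-23         # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J*s

def sackur_tetrode(n, u, v, m):
    """Entropy in J/K of a monatomic ideal gas of n particles of mass m (kg),
    with total internal energy u (J) in volume v (m^3)."""
    states = (v / n) * (4 * pi * m * u / (3 * n * H_PLANCK**2)) ** 1.5
    return K_B * n * (log(states) + 2.5)

# Assumed example: about one mole of a helium-like gas near room conditions.
n = 6.02214076e23
m = 6.646e-27              # approximate mass of a helium-4 atom, kg
u = 1.5 * n * K_B * 300.0  # U = (3/2) N kb T at T = 300 K
v = 0.0249                 # m^3, roughly one mole at 1 bar and 300 K

s1 = sackur_tetrode(n, u, v, m)
s2 = sackur_tetrode(2 * n, 2 * u, 2 * v, m)
print(s1, s2, s2 / s1)
```

Because only the ratios V/N and U/N appear inside the logarithm, the bracket is intensive and the prefactor N carries all of the size dependence, which is the same split as N*H.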

edit to Wikipedia: (entropy article)

The closest connection between entropy in information theory and physical entropy can be seen by assigning a symbol to each distinct way a quantum energy level (microstate) can occur per mole, kilogram, volume, or particle of homogeneous substance. The symbols will thereby "occur" in the unit substance with different probability corresponding to the probability of each microstate. Physical entropy on a "per quantity" basis is called "[[Intensive_and_extensive_properties|intensive]]" entropy as opposed to total entropy which is called "extensive" entropy.  When there are N moles, kilograms, volumes, or particles of the substance, the relationship between this assigned Shannon entropy H in bits and physical extensive entropy in nats is:
:S = k_\mathrm{B} \ln(2) N H
where ln(2) is the conversion factor from base 2 of Shannon entropy to the natural base e of physical entropy.  [[Landauer's principle]] demonstrates the reality of this connection: the minimum energy E required and therefore heat Q generated by an ideally efficient memory change or logic operation from irreversibly erasing or merging N*H bits of information will be S times the temperature,
:E = Q = T k_\mathrm{B} \ln(2) N H,
where H is in information bits and E and Q are in physical Joules. This has been experimentally confirmed.{{Citation |author1=Antoine Bérut |author2=Artak Arakelyan |author3=Artyom Petrosyan |author4=Sergio Ciliberto |author5=Raoul Dillenschneider |author6=Eric Lutz |doi=10.1038/nature10872 |title=Experimental verification of Landauer’s principle linking information and thermodynamics |journal=Nature |volume=483 |issue=7388 |pages=187–190 |date=8 March 2012 |url=http://www.physik.uni-kl.de/eggert/papers/raoul.pdf|bibcode = 2012Natur.483..187B }}
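As a rough numerical illustration of E = T*kb*ln(2)*N*H, the Python sketch below evaluates the Landauer limit. The function name landauer_energy and the 300 K example are assumptions for illustration only.

```python
from math import log

K_B = 1.380649e-23  # Boltzmann constant, J/K

def landauer_energy(temperature_k, n_symbols, h_bits_per_symbol):
    """Minimum heat in Joules from irreversibly erasing N*H bits at temperature T."""
    return temperature_k * K_B * log(2) * n_symbols * h_bits_per_symbol

# One bit at 300 K, then an assumed message of 8 symbols at 2 bits per symbol.
print(landauer_energy(300.0, 1, 1.0))
print(landauer_energy(300.0, 8, 2.0))
```

At 300 K the limit is on the order of 3e-21 J per bit, and it scales linearly with N*H.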

(scott note: H = sum(count/N*log2(N/count)) )

BEST wiki article
When a file or message is viewed as all the data a source will ever generate, the Shannon entropy H in '''bits per symbol''' is
H = - \sum_{i=1}^{n} p_i \log_2 (p_i) = \sum_{i=1}^{n}count_i/N \log_2 (N/count_i)

where i indexes each distinct symbol, p_i is the "probability of symbol i being received", and count_i is the number of times symbol i occurs in the message that is N symbols long. This equation gives "bits" per symbol because it uses a logarithm of base 2. It becomes "entropy per symbol" or "normalized entropy" that ranges from 0 to 1 when the logarithm base is equal to n, the number of distinct symbols. This can be calculated by multiplying H in bits/symbol by ln(2)/ln(n). The "bits per symbol" of data that has only 2 symbols is therefore also "entropy per symbol".
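The rescaling by ln(2)/ln(n) can be sketched in Python as follows. The function name normalized_entropy is illustrative, and returning 0 for a message with a single distinct symbol is my own convention for the case where ln(n) = 0.

```python
from collections import Counter
from math import log, log2

def normalized_entropy(message):
    """Entropy per symbol with log base n (n = number of distinct symbols)."""
    n_len = len(message)
    counts = Counter(message)
    n_distinct = len(counts)
    if n_distinct < 2:
        return 0.0  # assumed convention: a single-symbol message has zero entropy
    h_bits = sum(c / n_len * log2(n_len / c) for c in counts.values())
    return h_bits * log(2) / log(n_distinct)

for msg in ["AAAA", "AAAB", "ABAB", "ABCD", "AABBCCDD", "aaaaBBBB2222"]:
    print(msg, round(normalized_entropy(msg), 3))
```

Messages with equally frequent symbols reach 1 whatever the alphabet size, and two-symbol messages give the same number as H in bits per symbol.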

The "entropy" of a message, file, source, or other data is S=N*H, in keeping with Boltzmann's [[H-theorem]] for physical entropy that Claude Shannon's 1948 paper cited as analogous to his information entropy.  To clarify why Shannon's H is often called "entropy" instead of "entropy per symbol" or "bits per symbol", section 1.6 of Shannon's paper refers to H as the "entropy of the set of probabilities" which is not the same as the entropy of N symbols. H is analogous to physics' [[Intensive and extensive properties|intensive]] entropy S_o, which is on a per mole or per kg basis and is different from the more common extensive entropy S.  In section 1.7 Shannon more explicitly says H is "entropy per symbol" or "bits per symbol".

It is instructive to see H values in bits per symbol (log2) for several examples of short messages by using an online entropy (bits per symbol) calculator: http://www.shannonentropy.netmark.pl/ or http://planetcalc.com/2476/
* H=0 for "A", "AAAAA", and "11111"
* H=1 for "AB", "ABAB", "0011", and "0110101001011001" (eight 1's and eight 0's)
* H=1.58 for "ABC", "ABCABCABCABC", and "aaaaBBBB2222"
* H=2 for "ABCD" and "AABBCCDD"

The entropy in each of these short messages is H*N.

Claude Shannon's 1948 paper was the first to define an "entropy" for use in information theory. His H function (formally defined below) is named after Boltzmann's [[H-theorem]] which was used to define physical entropy by S=kb*N*H of an ideal gas. Boltzmann's H is entropy in [[nats_(unit)|nats]] per particle, so each symbol in a message has an analogy to each particle in an ideal gas, and the probability of a specific symbol is analogous to the probability of a particle's microstate. The "per particle" basis does not work in solids because their particles are interacting.  But the information entropy maintains mathematical equivalency with bulk [[Intensive and extensive properties|intensive]] physical entropy S_o, which is on a per mole or per kilogram basis (molar entropy and specific entropy). H*N is mathematically analogous to total [[Intensive and extensive properties|extensive]] physical entropy S. If the probability of a symbol in a message represents the probability of a microstate per mole or kg, and each symbol represents a specific mole or kg, then S=kb*ln(2)*N*H with H in bits. Temperature is a measure of the average kinetic energy per particle in an ideal gas (Kelvins = Joules*2/3/kb) so the Joules/Kelvins units of kb are fundamentally unitless (Joules/Joules), so S is fundamentally an entropy in the same sense as H.
[[Landauer's principle]] shows a change in information entropy is equal to a change in physical entropy S when the information system is perfectly efficient. In other words, a device can't irreversibly change its information entropy content without causing an equal or greater increase in physical entropy.   Sphysical=k*ln(2)*(H*N)bits where k, being greater than Boltzmann's constant kb, represents the system's inefficiency at erasing data and thereby losing energy previously stored in bits as heat dQ=T*dS.

on the information and entropy article:  (my best version)

The Shannon entropy H in information theory has units of bits per symbol (Chapter 1, section 7). For example, the messages "AB" and "AAABBBABAB" both have Shannon entropy H=log2(2) = 1 bit per symbol because the 2 symbols A and B occur with equal probability.  So when comparing it to physical entropy, the physical entropy should be on a "per quantity" basis which is called "[[Intensive_and_extensive_properties|intensive]]" entropy instead of the usual total entropy which is called "extensive" entropy.  The "shannons" of a message are its total "extensive" information entropy, which is H times the number of symbols in the message.
A direct and physically real relationship between H and S can be found by assigning a symbol to each microstate that occurs per mole, kilogram, volume, or particle of a homogeneous substance, then calculating the H of these symbols. By theory or by observation, the symbols (microstates) will occur with different probabilities and this will determine H. If there are N moles, kilograms, volumes, or particles of the unit substance, the relationship between H (in bits per unit substance) and physical extensive entropy in nats is:
:S = k_\mathrm{B} \ln(2) N H
where ln(2) is the conversion factor from base 2 of Shannon entropy to the natural base e of physical entropy.  N*H is the amount of information in bits needed to describe the state of a physical system with entropy S.  [[Landauer's principle]] demonstrates the reality of this by stating the minimum energy E required (and therefore heat Q generated) by an ideally efficient memory change or logic operation by irreversibly erasing or merging N*H bits of information will be S times the temperature which is
:E = Q = T k_\mathrm{B} \ln(2) N H
where H is in informational bits and E and Q are in physical Joules. This has been experimentally confirmed.{{Citation |author1=Antoine Bérut |author2=Artak Arakelyan |author3=Artyom Petrosyan |author4=Sergio Ciliberto |author5=Raoul Dillenschneider |author6=Eric Lutz |doi=10.1038/nature10872 |title=Experimental verification of Landauer’s principle linking information and thermodynamics |journal=Nature |volume=483 |issue=7388 |pages=187–190 |date=8 March 2012 |url=http://www.physik.uni-kl.de/eggert/papers/raoul.pdf|bibcode = 2012Natur.483..187B }}
Temperature is a measure of the average kinetic energy per particle in an ideal gas (Kelvins = 2/3*Joules/kb) so the J/K units of kb are fundamentally unitless (Joules/Joules). kb is the conversion factor from energy in 3/2*Kelvins to Joules for an ideal gas. If kinetic energy measurements per particle of an ideal gas were expressed as Joules instead of Kelvins, kb in the above equations would be replaced by 2/3, because kb*Kelvins = 2/3*Joules. This shows S is a true statistical measure of microstates that does not have a fundamental physical unit other than "nats" which is just a statement of which logarithm base was chosen by convention.

Consider 8 molecules of gas, each with 2 equally probable internal energy states and 2 equally probable positions in a box.  If at first they are on the same side of the box with half in 1 energy state and the other half in the other energy state, then the state of the system could be written ABABABAB. Now if they are allowed to distribute evenly in the box keeping their previous energy levels it is written ABCDABCD.   For an ideal gas:   S ~ N*ln(U*V) where U and V are averages per particle.

An ideal gas uses internal energy and volume.  Solids seem to not include volume, and depend on phonons, which have bond-stretching energies in place of rotational energies.  But it seems like solids should follow S~N*ln(U) = k*ln(2)*H where H would get into quantum probabilities of each energy state.  The problem is that phonon waves across the solid are occurring.  So instead of U, an H like this might need to be calculated from phase space (momentum and position of the atoms and their electrons?).
comment Wikipedia
Leegrc, "bits" are the invented name for when log base 2 is used. There is, like you say, no "thing" in the DATA itself you can point to. Pointing to the equation itself to declare a unit is, like you are thinking, suspicious. But physical entropy itself is in "nats" for natural units for the same reason (they use base "e"). The only way to take out this "arbitrary unit" is to make the base of the logarithm equal to the number of symbols. The base would be just another variable to plug a number in. Then the range of the H function would stay between 0 and 1. Then it is a true measure of randomness of the message per symbol. But by sticking with base two, I can look at any set of symbols and know how many bits (in my computing system that can only talk in bits) would be required to convey the same amount of information. If I see a long file of 26 letters having equal probability, then I need H = log2(26) = 4.7 bits to re-code each letter in 1's and 0's. There are H=4.7 bits per letter.
PAR, as far as I know, H should be used blind without knowledge of prior symbol probabilities, especially if looking for a definition of entropy. You are talking about watching a transmitter for a long time to determine probabilities, then looking at a short message and using the H function with the prior probabilities.

Let me give an example of why a blind and simple H can be extremely useful. Let's say there is a file that has 8 bytes in it. One moment it says AAAABBBB and the next moment it says ABCDABCD. I apply H blindly not knowing what the symbols represent. H=1 in the first case and H=2 in the second. H*N went from 8 to 16. Now someone reveals the bytes were representing microstates of 8 gas particles. I know nothing else. Not the size of the box they were in, not if the temperature had been raised, not if a partition had been lifted, and not even if these were the only possible microstates (symbols). But there was a physical entropy change everyone agrees upon from S1=kb*ln(2)*8 to S2=kb*ln(2)*16. So I believe entropy H*N as I've described it is as fundamental in information theory as it is in physics. Prior probabilities and such are useful but need to be defined how they are used. H on a per message basis will be the fundamental input to those other ideas, not to be brushed aside or detracted from.
I agree you can shorten up the H equation by entering the p's directly by theory or by experience. But you're doing the same thing as me when I calculate H for large N, except that I do not make any assumption about the symbol probabilities. You and I will get the same entropy H and "extensive" entropy N*H for a SOURCE. Your N*H extensive entropy is -N*sum(p*log(p)). The online entropy calculators and I use N*H = N*sum[ count/N*log(N/count) ] (they usually give H without the N). These are equal for large N if the source and channel do not change. "My" H can immediately detect if a source has deviated from its historical average. "My" H will fluctuate around the historical or theoretical average H for small N. You should see this method is more objective and more general than your declaration it can't be applied to a file or message without knowing prior p's. For example, let a partition be removed to allow particles in a box to get to the other side. You would immediately calculate the N*H entropy for this box from theory. "My" N*H will increase until it reaches your N*H as the particles reach maximum entropy. This is how thermodynamic entropy is calculated and measured. A message or file can have an H entropy that deviates from the expected H value of the source.
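The claim that the count-based H approaches the source's H for large N, and fluctuates around it for small N, can be illustrated with a short Python simulation. The four-symbol source and its probabilities are assumed for the example, and the function names are my own.

```python
import random
from collections import Counter
from math import log2

def empirical_entropy(symbols):
    """H in bits per symbol computed blindly from observed counts."""
    n = len(symbols)
    return sum(c / n * log2(n / c) for c in Counter(symbols).values())

def source_entropy(probs):
    """H in bits per symbol computed from known source probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

random.seed(0)  # fixed seed so the run is repeatable
alphabet = "ABCD"
probs = [0.5, 0.25, 0.125, 0.125]  # assumed source probabilities
h_source = source_entropy(probs)

for n in (10, 100, 10_000, 100_000):
    msg = random.choices(alphabet, weights=probs, k=n)
    print(n, round(empirical_entropy(msg), 4), "source H =", h_source)
```

Short messages give an H that wanders around the source value of 1.75 bits per symbol, and long ones settle onto it.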
The distinct symbols A, B, C, D are distinct microstates at the lowest level. The "byte" POSITION determines WHICH particle (or microvolume if you want) has that microstate: that is the level to which this applies. The entropy of any one of them is "0" by the H function, or "meaningless" as you stated. A sequence of these "bytes" tells the EXACT state of each particle and system, not a particular microstate (because a microstate does not care about the order unless it is relevant to its probability). A single MACROstate would be combinations of these distinct states. One example macrostate of this is when the gas might be in any one of these 6 distinct states: AABB, ABAB, BBAA, BABA, ABBA, or BAAB. You can "go to a higher level" than using A and B as microstates, and claim AA, BB, AB, and BA are individual microstates with certain probabilities. But the H*N entropy will come out the same. There was not an error in my AAAABBBB example and I did not make an assumption. It was observed data that "just happened" to have equally likely probabilities (so that my math was simple). I just blindly calculated the standard N*H entropy, and showed how it gives the same result physics gets when a partition is removed and the macrostate H*N entropy went from 8 to 16 as the volume of the box doubled. The normal S increases S2-S1=kb*N*ln(2) as it always should when a mid-box partition is removed.
I can derive your entropy from the way the online calculators and I use Shannon's entropy, but you can't go the opposite way.
Now is the time to think carefully, check the math, and realize I am correct. There are a lot of problems in the article because it does not distinguish between intensive Shannon entropy H in bits/symbol and extensive entropy N*H in bits (or "shannons" to be more precise, to distinguish it from the "bit" count of a file which may not have 1's and 0's of equal probability).
BTW, the entropy of an ideal gas is S~N*log2(u*v) where u and v are internal energy and volume per particle. u*v gives the number of microstates per particle. Quantum mechanics determines that u can take on a very large number of values and v is the requirement that the particles are not occupying the same spot, roughly 1000 different places per particle at standard conditions. The energy levels will have different probabilities. By merely observing large N and counting, H will automatically include the probabilities.
In summary, there are only 3 simple equations I am saying. They precisely lay the foundation of all further information entropy considerations. These equations should replace 70% of the existing article. These are not new equations, but defining them and how to use them is hard to come across since there is so much clutter and confusion regarding entropy as a result of people not understanding these statements.
1) Shannon's entropy is "intensive" bits/symbol = H = sum[ count/N*log2(N/count) ] where N is the length of a message and count is for each distinct symbol.
2) Absolute ("extensive") information entropy is in units of bits or shannons = N*H.
3) S = kb*ln(2)*N*H where each N has a distinct microstate which is represented by a symbol. H is calculated directly from these symbols for all N. This works from the macro down to the quantum level.
Example: 3 interacting particles with sum total energy 2 and possible individual energies 0,1,2 may have possible energy distributions 011, 110, 101, 200, 020, or 002. I believe the order is not relevant to what is called a microstate, so you have only 2 symbols for 2 microstates, and the probability for each is 50-50. Maybe there is usually something that skews this towards low energies. I would simply call each one of the 6 "sub-micro states" a microstate and let the count be included in H. Assuming equal p's again, the first case gives log2(2)=1 and the 2nd log2(6)=2.58. I believe the first one is the physically correct entropy (the approach, that is, not the exact number I gave). If I had let 0,1,2 be the symbols, then it would have 3*1.46 = 4.38 which is wrong.
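The three counts in this example (2 unordered microstates, 6 ordered ones, and 0/1/2 treated as symbols) can be reproduced by brute-force enumeration. This Python sketch assumes equal probability for every ordered distribution, as the example does, and the variable names are my own.

```python
from collections import Counter
from itertools import product
from math import log2

LEVELS = (0, 1, 2)  # allowed energy of each particle
N_PARTICLES = 3
TOTAL_ENERGY = 2

# Ordered distributions such as (0, 1, 1) and (2, 0, 0).
ordered = [c for c in product(LEVELS, repeat=N_PARTICLES) if sum(c) == TOTAL_ENERGY]
# Ignoring order leaves only the patterns {0, 1, 1} and {0, 0, 2}.
unordered = {tuple(sorted(c)) for c in ordered}

h_unordered = log2(len(unordered))  # first case in the text
h_ordered = log2(len(ordered))      # second case in the text

# Third case: pool every particle energy from all ordered distributions
# and treat 0, 1, 2 as the symbols of one long message.
pooled = [energy for c in ordered for energy in c]
n = len(pooled)
h_symbols = sum(k / n * log2(n / k) for k in Counter(pooled).values())

print(len(unordered), h_unordered)
print(len(ordered), round(h_ordered, 2))
print(round(h_symbols, 2), round(N_PARTICLES * h_symbols, 2))
```

The pooled symbol frequencies come out as 1/2, 1/3, and 1/6 for energies 0, 1, and 2, which is where the 1.46 bits per particle comes from.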
Physically, because of the above, when saying S=k*ln(2)*NH, it requires that you look at specific entropy S_o and make it = k*ln(2)*H, so you'll have the correct H. This back-calculates the correct H. This assumes you are like me and can't derive Boltzmann's thermodynamic H from first (quantum or not) principles. I may be able to do it for an ideal gas. I tried to apply H to Einstein's oscillators (he was not aware of Shannon's entropy at the time) for solids, and I was 25% lower than his multiplicity, which is 25% lower than the more accurate Debye model. So a VERY simplistic approach to entropy with information theory was only 40% lower than experiment and good theory, for the one set of conditions I tried. I assumed the oscillators had only 3 energy states and got S=1.1*N*kb where Debye via Einstein said S=1.7*N*kb.
My point is this: looking at a source of data and choosing how we group the data into symbols can result in different values for H and NH, [edit: if not independent]. Using no grouping on the original data is no compression and is the only one that does not use an algorithm plus lookup table. Higher grouping on independent data means more memory is required with no benefit to understanding (better understanding=lower NH). People with bad memories are forced to develop better compression methods (lower NH), which is why smart people can sometimes be so clueless about the big picture, reading too much with high NH in their brains and thinking too little, never needing to reduce the NH because they are so smart. Looking for a lower NH by grouping the symbols is the simplest compression algorithm. The next step up is run-length encoding, a variable symbol length. All compression and pattern recognition create some sort of "lookup table" (symbols = weighting factors) to run through an algorithm that may combine symbols to create on-the-fly higher-order symbols in order to find the lowest NH to explain higher original NH. The natural, default non-compressed starting point should be to take the data as it is and apply the H and NH statistics, letting each symbol be a microstate. Perfect compression for generalized data is not a solvable problem, so we can't start from the other direction with an obvious standard.
This lowering of NH is important because compression is 1 of 3 requirements for intelligence. Intelligence is the ability to acquire the highest profit divided by noise*log(memory*computation) in the largest number of environments. Memory on a computing device has a potential energy cost and computation has a kinetic energy cost. The combination is internal energy U. Specifically, for devices with a fixed volume, in both production machines and computational machines, profit = Work output/[k*Temp*N*ln(U/N)] = Work/(kTNH). This is Carnot efficiency W/Q, except the work output includes acquisition of energy from the environment so that the ratio can be larger than 1. The thinking machine must power itself from its own work production, so I should write (W-Q)/Q instead. W-Q feeds back to change Q to improve the ratio. The denominator represents a thinking machine plus its body (environment manipulator) that moves particles, ions (in brains), or electrons (in computers) to model much larger objects in the external world to try different scenarios before deciding where to invest W-Q. "Efficient body" means trying to lower k for a given NH. NH is the thinking machine's algorithmic efficiency for a given k. NH has physical structure with U losses, but that should be a conversion factor moved out to be part of the kT so that NH could be a theoretical information construct. The ultimate body is bringing kT down to kb at 0 C. The goal of life and a more complete definition of intelligence is to feed Work back to supply the internal energy U and to build physical structures that hold more and more N operating at lower and lower k*T. A Buddhist might say we only need to stop being greedy and stop trying to raise N (copies of physical self, kT, aka the number of computations) and U and we could leave k alone. This assumes constant volume, otherwise replace U/N with V/N*(U/N)^(3/2) (for an ideal gas, anyway; maybe UV/NN is ok for solids).
Including volume means we want to make it lower to lower kTNH. So a denser thinking machine.  The universe itself increases V/N (Hubble expansion) but it cancels in determining Q because it causes U/N to decrease at possibly the same rate. This keeps entropy and energy CONSTANT on a universal COMOVING basis (ref: Weinberg's 1977 famous book "First 3 Minutes"), which causes entropy to be emitted (not universally "increased" as the laymen's books still say) from gravitational systems like Earth and Galaxies. The least action principle (the most general form of Newton's law, better than Hamiltonian & Lagrangian for developing new theories, see Feynman's red books) appears to me to have an inherent bias against entropy, preferring PE over KE over all time scales, and thereby tries to lower temp and raise the P.E. part of U for each N on Earth. This appears to be the source of evolution and why machines are replacing biology, killing off species 50,000 times faster than the historical rate. The legal requirement of all public companies is to dis-employ workers because they are expensive and to extract as much wealth from society as possible so that the machine can grow. Technology is even replacing the need for shareholders and skill (2 guys started MS, Apple, google, youtube, facebook, and snapchat and you can see the trend in decreasing intelligence and age and increasing random luck needed to get your first $billion). Silicon, carbon-carbon, and metals are higher energy bonds (which least action prefers over kinetic energy) enabling lower N/U and k, and even capturing 20 times more Work energy per m^2 than photosynthesis. Ions that brains have to model objects with still weigh 100,000 times more than the electrons computers use.
In the case of the balance and 13 balls, we applied the balance like asking a question and organized things to get the most data out of the test. We may seek more NH answers from people or nature than we give in order to profit, but in producing W, we want to spend as little NH as possible.
[edit: I originally backtracked on dependency but corrected it, and I made a lot of errors with my ratios from not letting k be positive for the ln().]Ywaz (talk) 23:09, 8 December 2015 (UTC)

Rubik's cube
Let the number of quarter turns from a specific disorder to ordered state be a microstate with a probability based on number of turns.  Longer routes to solution are more likely.  The shortest route uses the least energy which is indicative of intelligence. That's why we prize short solutions.  The N in N*H = N*sum(count/N*log2(N/count)) is the number of turns.  Count is the number of different routes with that N.  Speaking in terms Carnot understood and reversing time, the fall from low entropy to high entropy should be fast in order to minimize work production (work absorption when forward in time).  The problem is always determining if a single particular step is the most efficient towards the goal or not. There's no incremental measure for profit increase in the problems we can't solve. Solving problems is a decrease in entropy. Is working backwards to generate the most mixed up state in as few turns as possible a beginning point?

Normalized entropy
There may not be a fundamental cost to a large memory, i.e., a large number of symbols aka classifications. That is potential energy which is an investment in infrastructure. Maybe there is a "comparison", "lookup", or even transmission cost (more bits are still required), but maybe those are reversible. Maybe you can take large entropy, classify it the same as with microbits, and the before and after entropy cost is the same, but maybe computation is not reversible (indeed, dispersion calls it into question) but memory access is, so a larger memory set for classification is less energy to computation.  If there are no obvious dependencies to the H function, the larger number of symbols absorbing more bits/symbol in the message will have the same NH.  If H is not equal to 1 for binary, then this may be an unusual situation.  Let i=1 to n, where n is the number of unique symbols in a message N symbols long; then H=sum(count_i/N*log2(N/count_i)). So

H' = H*ln(2)/ln(n) where "2" should be replaced if the H was not calculated with log base 2.

 = normalized entropy where each "bit" position can take on n states, i.e., an access to a memory of symbols has n choices, the number of symbols, aka the number of distinct microstates.  H' varies from 0 to 1, which Shannon did not point out, but let "entropy" vary in definition based on the log base. So if there is not a cost to accessing a large memory bank and it speeds up computation or discussion, then even if NH is the same, NH' will be lower for the larger memory bank (the division makes it smaller).  But again, NH is only the same between the two anyway if symbol frequencies are equal and this occurs only if you see no big patterns at all and it looks like noise in both cases, or if you have really chosen the optimum set of symbols for the data (physics equations seem to be seeking smallest NH, not smallest NH').  It seems like a larger number of symbols available will almost always detect more patterns, with no problems about narrow-mindedness if the training data set is large enough. The true "disorder" of a physical object relative to itself (per particle or per symbol) rather than to its mass or size (So) or information is H' = S/ln(number of microstates).  If one is trying to compare the total entropy of objects or data that use completely different sets of microstates (symbols) that are very different, the measure N*H' is still completely objective.
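
As a rough illustration of the conversion H' = H*ln(2)/ln(n), here is a minimal Python sketch. The function names `shannon_h` and `normalized_h` are made up for this example, and treating a one-symbol message as H' = 0 is an assumption (ln(1) = 0 would otherwise divide by zero).

```python
from collections import Counter
from math import log


def shannon_h(message, base=2.0):
    """Shannon entropy per symbol: sum(count/N * log(N/count))."""
    n_total = len(message)
    counts = Counter(message)
    return sum(c / n_total * log(n_total / c, base) for c in counts.values())


def normalized_h(message):
    """H' = H*ln(2)/ln(n): H re-expressed in log base n (n = distinct symbols)."""
    n_distinct = len(set(message))
    if n_distinct < 2:
        return 0.0  # assumption: a single-symbol message has no disorder
    return shannon_h(message, 2.0) * log(2.0) / log(n_distinct)


for msg in ("01", "abcabc", "aaab"):
    # H is in bits/symbol; H' is unitless and stays between 0 and 1
    print(msg, round(shannon_h(msg), 4), round(normalized_h(msg), 4))
```

For "abcabc" the base-2 value is log2(3) bits/symbol while H' comes out as 1, which is the point of the normalization: equal symbol frequencies always give H' = 1 regardless of n.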

With varying-length symbols, NH' more seriously loses all connection with physical entropy and energy on a per bit level.  It has to be strictly on a per symbol level.  Varying symbols lose all connection to physics. Consider the following:  n times symbol bit length is no longer representing a measure of the bit memory space required because bit length is varying.  N divided by symbol bit length is also not the number of symbols in a message.  There are more memory locations addressed, with total memory required being n*average symbol length.  It seems like it would be strictly "symbols only", especially if the log base is chosen based on n.  The log base loses the proportional connection to "energy" (if energy were still connected to number of bits instead of symbols).

For varying-length symbols there is also a problem when trying to relate computations to energy and memory even if I keep base 2.  To keep varying symbol lengths on a bit basis for the entropy in regards to computation energy required per information entropy, I need the following: For varying bit-length symbols where symbol i has bit length Li, I get
H=1/N*sum ( count_i * Li * log2(N/(count_i*Li) ) )
where N has to be in bits instead of symbols and the log has to be in base 2. This is the average bit variation per bit communicated (and a real entropy, since N is in bits). Inside the log is the ability of the count to reduce the number of bits needed to be transmitted.  Remember for positive H it is p*log(1/p) so this represents a higher probability of "count" occurring for a particular bit if L is higher.  I don't know how to convert the log to a standard base where it varies from 0 to 1. Because they are varying length, it kind of loses relevance. Maybe the sum frequency of symbol occurring times its length divided by sum (count_i/N*Li/(sum( L)/n))

Log base two on varying symbol bit-lengths loses its connection to bits required to re-code, so the above has to be used.

So in order to keep information connected to physics, you need to stick with log base 2 so you can count the changes in energy accurately. The symbols can be any bit length as long as they are all the same length.

So H' is a disorder measurement per symbol, comparable across all types of systems with varying number of n and N symbols. It varies from 0 to 1.   It is "per energy" only if there is a fixed energy cost per symbol transmission and storage without regard to symbol length. If someone has remembered a lot of symbols of varying lengths, they might know a sequence of bits immediately as 2 symbols and get a good NH' on the data at hand but not a better NH if they had remembered all sequences of that length. If they know the whole message as 1 symbol, they already know the message: the entropy is 0 in both measures, which is the best in both.   Being able to utilize the symbol for profit is another matter (low k factor as above). A huge memory will have a cost that raises k. Knowing lots of facts but not knowing how to use them is a high k (no profit).  Facts without profit are meaningless.  You might as well read data and just store it.

Maybe "patterns" or "symbols" or "microstates" to be recognized are the ones that lead to profit. You run scenarios and look at the output. That requires patterns as constraints and capabilities. Or look at profit patterns and reverse-invent scenarios of possible patterns under the constraints.

Someone who knows many symbols of a Rubik's cube is one who recognizes the patterns that lead to the quickest profit. The current state plus the procedure to take is a pattern that leads to profit.  The goal is speed. Energy losses for computation and turn cost are not relevant.  They will have the lowest NH', so H' is relevant.
Thanks for the link to indistinguishable particles. The clearest explanation seems to be here, [[Gibbs_paradox#The_mixing_paradox|the mixing paradox]]. The idea is this: if we need to know the kTNH energy required (think NH for a given kT) to return to the initial state at the level 010, 100, 001 with correct sequence from a certain final sequence, then we need to count the microstates at that low level. Going the other way, "my" method should be mathematically the same as "yours" if it is required to NOT specify the exact initial and final sequences, since those were implicitly not measured.  Measuring the initial state sequences without the final state sequences would be changing the base of the logarithm mid-stream. H is in units of true entropy per symbol when the base of the logarithm is equal to the number of distinct symbols.  In this way H always varies from 0 to 1 for all objects and data packets, giving a true disorder (entropy) per symbol (particle). You multiply by ln(2)/ln(n) to change base from 2 to n symbols. Therefore the ultimate objective entropy (disorder or information) in all systems, physical or information, when applied to data that accurately represents the source should be

Entropy = N H = \sum_{i=1}^{n} count_i \log_n (N/count_i)

 Entropy = sum(count*logn(N/count))

where i=1 to n distinct symbols in data N symbols long. Shannon did not specify which base H uses, so it is a valid H.  To convert it to nats of normal physical entropy or entropy in bits, multiply by ln(n) or log2(n). The count/N is inverted to make H positive. In this equation, with the ln(2) conversion factor, this entropy of "data" is physically the same as the entropy of "physics" if the symbols are indistinguishable, and we use energy to change the state of our system E=kT*NH where our computer system has a k larger than kb due to inefficiency.  Notice that changes in entropy will be the same without regard to k, which seems to explain why ultimately distinguishable states get away with using higher-level microstate definitions that are different with different absolute entropy. For thermo, kb appears to be what makes it possible to not care about the deeper states that were ultimately distinguishable.

In the equation above, there is a penalty if you choose a larger symbol set. Maybe that accounts for the extra memory space required to define symbols.

The best wiki articles are going to be like this: you derive what is the simplest but perfectly accurate view, then find the sources using that conclusion to justify its inclusion.
So if particles (symbols) are distinguishable and we use that level of distinguishability, the count at the 010 level has to be used. Knowing the sequence means knowing EACH particle's energy. The "byte-position" in a sequence of bits represents WHICH particle. This is not mere symbolism because the byte positions on a computer have a physical location in a volume, so that memory core and CPU entropy changes are exactly the physical entropy changes if they are at 100% efficiency ([[Landauer's principle]]). (BTW the isotope method won't work better than different molecules because it has more mass. This does not affect temperature, but it affects pressure, which means the count has to be different so that pressure is the same. So if you do not do anything that changes P/n in PV=nRT, using different gases will have no effect to your measured CHANGE in entropy, and you will not know if they mixed or not. ) [[User:Ywaz|Ywaz]] ([[User talk:Ywaz|talk]]) 11:48, 10 December 2015 (UTC)

By using indistinguishable states, physics seems to be using a non-fundamental set of symbols, which allows it to define states that work in terms of energy and volume as long as kb is used.  The ultimate, as far as physicists might know, might be phase space (momentum and position) as well as spin, charge, potential energy and whatever else.  Momentum and position per particle are 9 more variables because unlike energy momentum is a 6D vector (including angular), and a precise description of the "state" of a system would mean which particle has the quantities matters, not just the total.  Thermo gets away with just assigning states based on internal energy and volume, each per particle. I do not see kb in the ultimate quantum description of entropy unless they are trying to bring it back out in terms of thermo. If charge, spin, and particles are made up of even smaller distinguishable things, it might be turtles all the way down, in which case, defining physical entropy as well as information entropy in the base of the number of symbols used (our available knowledge) might be best.

I didn't make it up. It's normally called normalized entropy, although they normally refer to this H with logn "as normalized entropy" when according to Shannon they should say "per symbol" and use NH to call it an entropy.  I'm saying there's a serious objectivity to it that I did not realize until reading about indistinguishable states.
I hope you agree "entropy/symbol" is a number that should describe a certain variation in a probability distribution, and that if a set of n symbols were made of continuous p's, then a set of m symbols should have the same continuous distribution.  But you can't do that (get the same entropy number) for the exact same "extrapolated" probability distributions if they use a differing number of symbols.  You have to let the log base equal the number of symbols. I'll get back to the issue of more symbols having a "higher resolution".  The point is that any set of symbols can have the same H and have the same continuous distribution if extrapolated.
If you pick a base like 2, you are throwing in an arbitrary element, and then have to call it (by Shannon's own words) "bits/symbol" instead of "entropy/symbol".  Normalized entropy makes sense because of the following
entropy in bits/symbol = log2(2^("avg" entropy variation/symbol))
entropy per symbol = logn(n^("avg" entropy variation/symbol))
The equation I gave above is the normalized entropy that gives this 2nd result.
Previously we showed for a message of N bits, NH=MH' if the bits are converted to bytes and you calculate H' based on the byte symbols using the same log base as the bits, and if the bits were independent.  M = number of byte symbols = N/8.  This is fine for digital systems that have to use a certain amount of energy per bit.  But what if energy is per symbol?  We would want NH = M/8*H' because the byte system used 8 times fewer symbols.
By using log base n, H=H' for any set of symbols with the same probability distribution, and N*H=M/8*H.
Bytes can take on an infinite number of different p distributions for the same H value, whereas bits are restricted to a certain pair of values for p0 and p1 (or p1 and p0) for a certain H, since p0=1-p1. So bytes have more specificity, which could allow for higher compression or describing things like 6-vector momentum instead of just a single scalar for energy, using the same number of SYMBOLS. The normalized entropy allows them to have the same H to get the same kTNH energy without going through contortions.  So for N particles let's say bits are being used to describe each one's energy with entropy/particle H, and bytes are used to describe their momentums with entropy/particle H'.  Momentums uniquely describe the energy (but not vice versa). NH=NH'. And our independence property does not appear to be needed: H' can take on specific values of p's that satisfy H=H', not some sort of average of those sets. Our previous method of NH=MH' is not as nice, violating Occam's razor.
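
The earlier claim that NH = MH' for independent bits regrouped into bytes, and that the log-base-n values then coincide, can be checked exactly from the probability distributions rather than from sampled data. This is only a sketch under stated assumptions: the bit probability `p1` and message length `n_bits` are arbitrary illustrative choices, and the helper names are made up.

```python
from itertools import product
from math import log2


def entropy_bits(probs):
    """Shannon entropy in bits of a probability distribution."""
    return sum(p * log2(1.0 / p) for p in probs if p > 0)


def byte_distribution(p1, width=8):
    """Probabilities of all 2**width symbols built from independent bits."""
    dist = []
    for bits in product((0, 1), repeat=width):
        ones = sum(bits)
        dist.append(p1 ** ones * (1.0 - p1) ** (width - ones))
    return dist


p1 = 0.2               # assumed probability that a bit is 1
n_bits = 8000          # assumed message length N in bits
m_bytes = n_bits // 8  # M = N/8 byte symbols

h_bit = entropy_bits([p1, 1.0 - p1])          # H, bits per bit-symbol
h_byte = entropy_bits(byte_distribution(p1))  # H', bits per byte-symbol

print(n_bits * h_bit, m_bytes * h_byte)       # N*H and M*H' agree
print(h_bit / log2(2), h_byte / log2(256))    # log-base-n values agree
```

The agreement relies on the bits being independent; with correlated bits the byte entropy would be lower than 8 times the bit entropy.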

entropy and kolmogorov complexity
== Shannon entropy of a universal measure of Kolmogorov complexity   ==
What is the relation of Kolmogorov complexity to [[Information theory]]? It seems very close to Shannon entropy, in that both are maximised by randomness, although one seems to deal more with the message than the context. [[User:Cesiumfrog|Cesiumfrog]] ([[User talk:Cesiumfrog|talk]]) 00:03, 14 October 2010 (UTC)
:Shannon entropy is a statistical measure applied to data which shows how efficiently the symbols are transferring bits based solely on how many times the symbols occur relative to each other. It's like the dumbest, simplest way of looking for patterns in data, which is ironically why it is useful. Its biggest most basic use is for taking data with a large number of different symbols and stating how many bits are needed to say the same thing without doing any compression on the data.
:Kolmogorov complexity is the absolute extreme in the direction Shannon entropy takes step 0. It is the highest possible level of intelligent compression of a data set. It is not computable in most cases, it's just a theoretical idea.  But  Shannon entropy is always computable and blind. By "compression" I mean an algorithm has been added to the data, and the data reduced to a much smaller set that is the input to the algorithm to generate the original data. Kolmogorov is the combination in bits of the program plus the smaller data set.
:I would use Shannon entropy to help determine the "real" length in bits of a program that is proposing to be better than others at getting close to the idealized Kolmogorov complexity. Competing programs might be using a different or larger set of functions, each of which I would assign a symbol to. Programs that use a larger set of functions would be penalized when the Shannon entropy measure is applied. There are a lot of problems with this starting point that I'll get to, but I would assign a symbol to each "function" like a=next line, b=*,  e=OR, f=space between arguments, and so on. Numbers would remain number symbols  1,2,3, because they are already efficiently encoded.  Then the Shannon entropy (in bits) and therefore the "reduced" level attempted Kolmogorov complexity (Shannon entropy in bits) is
: N*H = - N \sum_i f_i \log_2 (f_i) = \sum_i count_i \log_2 (N/count_i)
:where i=1 to n is for each of n distinct symbols, f_i is "frequency of symbol i occurring in program",  and count_i is the number of times symbol i occurs in the program that is N symbols long.  This is the Shannon bits (the unit is called "shannons") in the program.   The reason this is a better measure of k is because if
:normalized H = \sum_i count_i/N \log_n (N/count_i)
:is < 1, then there is an inefficiency in the way the symbols themselves were used (which has no relevance to the efficiency of the logic of the algorithm) that is more efficient when expressed as bits.  Notice the log base is "n" in this 2nd equation. This H is normalized and is the truest entropy per symbol. To calculate things in this base use logn(x) = ln(x)/ln(n)
:But there is a big problem.  Suppose you want to define a function to be equal to a longer program and just assign a symbol to it, so the program length is reduced to nearly "1". Or maybe a certain function in a certain language is more efficient for certain problems.   So there needs to be a reference "language" (set of allowable functions) to compare one K program to the next. All standard functions have a known optimal expression in Boolean logic: AND, NOT, and OR. So by choosing those 3 as the only functions, any program in any higher-level language can be reduced back to a standard. Going even further, these can be reduced back to a transistor count or to a single universal logic gate like NAND or NOR, or a single universal reversible logic gate like Toffoli or Fredkin.  Transistors are just performing a function too, so I guess they are universal too, but I am not sure.  The universal gates can be wired to perform all logic operations and are Turing complete.  So any program in any language can be re-expressed into a single fundamental function. The function itself will not even need a symbol in the program to identify it (as I'll show), so that the only code in the program is describing the wiring between "addresses". So the complexity describes the complexity of the physical connections, the path the electrons or photons in the program take, without regard to the distance between them.  Example:  the sequence of symbols ABCCDA&CEFA&G-A would mean "perform A NAND B and send output to C, perform C NAND D and send output to A and C, perform E NAND F and send output to A and G, go to A". To define input and output addresses, I would use something like ABD>(program)>G. The "goto" symbol "-" is not used if the program is expanded to express Levin's modified K complexity, which penalizes loops by expanding them. It thereby takes into account computational time (and therefore a crude measure of energy) as well as a different type of program length.
:The program length by itself could or should be a measure of the computational infrastructure required (measured as energy required to create the transistors), which is another reason it should be in terms of something like a single gate: so its cost to implement can be measured.  What's the cost to build an OR versus a Fourier transform? Answer: Count the required transistors or the NAND gates (which are made up of a distinct number of transistors). All basic functions already have had their Boolean logic and transistors reduced to a minimal level so it's not arbitrary or difficult.
:I think this is how you can get a comparable K in all programs in all languages:  express them in a single function with distinct symbols representing the input and output addresses. Then calculate Shannon's N*H to get its K or Levin complexity in absolute physical terms. Using [[Landauer's principle]], counting the number of times a bit address (at the input or output of a NAND gate) changes state will give the total energy in joules that the program used, Joules=k*T*ln(2)*N where k is larger than Boltzmann's constant kb as a measure of the inefficiency of the particular physical implementation of the NAND gates. If a theoretically reversible gate is used there is theoretically no computational energy loss, except for the input and output changes.  [[User:Ywaz|Ywaz]] ([[User talk:Ywaz|talk]]) 22:20, 12 December 2015 (UTC)
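
To make the proposal concrete, here is a small Python sketch that applies the N*H and normalized-H formulas to the example wiring string and evaluates the Landauer expression Joules=k*T*ln(2)*N. It is only an illustration: the function names, the 300 K temperature, and the figure of 1000 bit changes are assumptions, and no attempt is made to actually simulate the NAND network.

```python
from collections import Counter
from math import log, log2

K_B = 1.380649e-23  # Boltzmann constant in J/K


def total_shannon_bits(program):
    """N*H = sum(count_i * log2(N / count_i)), in bits (shannons)."""
    n_total = len(program)
    return sum(c * log2(n_total / c) for c in Counter(program).values())


def normalized_h(program):
    """sum(count_i/N * log_n(N / count_i)), using log base n."""
    n_total = len(program)
    n_distinct = len(set(program))
    if n_distinct < 2:
        return 0.0  # assumption: one distinct symbol means zero entropy
    return sum(c / n_total * log(n_total / c, n_distinct)
               for c in Counter(program).values())


def landauer_joules(bit_changes, temperature_k=300.0, k=K_B):
    """Joules = k*T*ln(2)*N; k = K_B is the ideal (100% efficient) case."""
    return k * temperature_k * log(2) * bit_changes


program = "ABCCDA&CEFA&G-A"  # the wiring example from the text
print("N*H in bits:", total_shannon_bits(program))
print("normalized H:", normalized_h(program))
print("ideal energy for 1000 bit changes (J):", landauer_joules(1000))
```

A real implementation would pass a k larger than `K_B` to `landauer_joules` to reflect the inefficiency of the hardware.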

simplifying ST equation for ideal gas and looking at it from an entropy view.
== Derivation from uncertainty principle and relation to information entropy ==
Volume and energy are not the direct source of the states that generate entropy, so I wanted to express it in terms of x*p/h' number of  states for each N.  Someone above asked for a derivation from the uncertainty principle ("the U.P.") and he says it's pretty easy. S-T pre-dates U.P., so it may be only for historical reasons that a more efficient derivation is not often seen.

The U.P. says x'p'>h/4pi where x'p' are the standard deviations of x and p.  The x and p below are the full range, not the 34.1% of the standard deviation so I multiplied x and p each by 0.341. It is 2p because p could be + or - in x,y, and z. By plugging the variables in and solving, it comes very close to the S-T equation. For Ω/N=1000 this was even more accurate than S-T, 0.4% lower.

Sackur-Tetrode equation:

S = k_B \ln\left(\frac{\Omega^N}{N!}\right) \approx k_B N \left(\ln\left(\frac{\Omega}{N} \right) +1\right)


\Omega= \left(\frac{2px}{\hbar/2}\right)^{3} = \left(\frac{2p_x x_x}{\hbar/2}\right)\left(\frac{2p_y x_y}{\hbar/2}\right)\left(\frac{2p_z x_z}{\hbar/2}\right)

\sigma=0.341, x=\sigma V^\frac{1}{3},  p = \sigma\left(\frac{2mU*b}{N!}\right)^\frac{1}{2}, U*b=K.E.=3/2 k_B T

Stirling's approximation N!=(N/e)^N is used in two places, which results in a 1/N^(5/2) and an e^(5/2), which is where the 5/2 factor comes from.  The molecules' internal energy U is kinetic energy for the monoatomic gas case for which the S-T applies. b=1 for monoatomic, and it may simply be changed for other non-monoatomic gases that have a different K.E./U ratio. The equation for p is the only difficult part of getting from the U.P. to the S-T equation and it is difficult only because the thermodynamic measurements T (kinetic energy per atom) and V are an energy and a distance where the U.P. needs x*p or t*E. This strangeness is where the 3/2 and 5/2 factors come from. The 2m is to get 2*m*(1/2*m*v^2) = p^2, where v is the speed. Boltzmann's entropy assumes it is a max for the given T, V, and P which I believe means the N's are evenly distributed in x^3 and assumes all are carrying the same magnitude p momentum.

To show how this can come directly from information theory, first remember that Shannon's H function is valid only for a random variable. In this physical case, there are only 2 possible values that each phase space can have: with an atom in it, or not, so it is like a binary file. But unlike normal information entropy, some or many of the atoms may have zero momentum as the others carry more: the total energy just has to be the same. So physical entropy can use anywhere from N to 1 symbols (atoms) to carry the same message (the energy), whereas information entropy is stuck with N. Where information entropy has a ln(1/N^N)=-N*ln(N) factor, physical entropy has ln(1/N!)=-N*ln(N)+N, which is a higher entropy.  My correction to Shannon's H shown below is known as the sum of the surprisals and is equal to the information (entropy) content in bits. The left sum is the information contributions from the empty phase space slots and the right sum is from those where an atom occurs. The left sum is about 0.7 without regard to N (1 if ln() had been used) and the right sum is about 17*N for a gas at room temperature (for Ω/N ~ 100,000 states/atom):

H*N = \sum_{i=1}^{\Omega-N} \log_2\left(\frac{\Omega}{\Omega-i N/(\Omega-N)}\right) + \sum_{j=0}^{N-1} \log_2\left(\frac{\Omega}{N-j}\right) \approx \log_2\left(\frac{\Omega^N}{N!}\right)

Physical entropy S then comes directly from this Shannon entropy:

S = H*N k_B \ln(2)
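
The second (occupied-slot) sum can be checked numerically against \log_2(\Omega^N/N!) and against the Stirling form N(\ln(\Omega/N)+1). The Python sketch below does only that; it does not evaluate the first (empty-slot) sum. The atom count and the Ω/N ratio are arbitrary assumptions chosen to keep the loop short, and the function names are made up.

```python
from math import lgamma, log, log2

K_B = 1.380649e-23  # Boltzmann constant in J/K


def occupied_sum_bits(omega, n):
    """Second sum: sum_{j=0}^{N-1} log2(omega / (N - j))."""
    return sum(log2(omega / (n - j)) for j in range(n))


def exact_bits(omega, n):
    """log2(omega**N / N!) computed with lgamma(N + 1) = ln(N!)."""
    return (n * log(omega) - lgamma(n + 1)) / log(2)


def stirling_bits(omega, n):
    """Stirling form N*(ln(omega/N) + 1), converted from nats to bits."""
    return n * (log(omega / n) + 1.0) / log(2)


n_atoms = 10_000            # assumed, small enough to sum directly
omega = 100_000 * n_atoms   # assumed omega/N = 100,000 states per atom

bits = occupied_sum_bits(omega, n_atoms)
print("bits per atom:", bits / n_atoms)
print("exact log2(omega^N/N!) per atom:", exact_bits(omega, n_atoms) / n_atoms)
print("Stirling form per atom:", stirling_bits(omega, n_atoms) / n_atoms)
print("S = H*N*k_B*ln(2) in J/K:", bits * K_B * log(2))
```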

A similar procedure can be applied to phonons in solids to go from information theory to physical entropy. For reference here is a 1D oscillator in solids:

S = k_B \left[ \ln \left( \frac{k_B T}{hf}\right) +1 \right]
[[User:Ywaz|Ywaz]] ([[User talk:Ywaz|talk]]) 02:37, 15 January 2016 (UTC)
email to Jon Bain, NYU 1/17/2016  (this contains errors.  See above for the best stuff.)
You have a nice entropy presentation, but I wanted to show you Shannon's H is an "intensive" entropy which is entropy per symbol of a message as he states explicitly in section 1.7 of his paper.  This has much more of a direct analogy to Boltzmann's H-theorem that Shannon says was his inspiration.

Specific entropy in physics is S/kg or S/mole which is what both the H's are. Regular extensive entropy S for both Shannon and Boltzmann is:


So conceptually it seems the only difference between Shannon/Boltzmann and any other physical entropy is that Shannon/Boltzmann make use of independent (non-interacting) symbols (particles) because they are available. It seems like they could be forced to generalize towards Gibbs or QM entropy as needed, mostly by just changing definitions of the variables (semantics).

Boltzmann's N particles
Let O{N!} = "O!" = # of QM-possible mAcrostates for N at E
Only a portion of N may carry E (and therefore p) and each Ni may not occupy less than (xp/h')^3 phase space:
O{N!} = (2xp/h')^3N = [2x*sqrt(2m*E/N!)/h']^3N ,   N!=(N/e)^N
2xp because p could be + or -.
ln [ (O{N!})^3 / N! ] = ln [O^3N*(1/e)^(-3/2*N) / (N/e)^N ] = 5/2*N*[ln(O/N)+1]
O= 2xP/N/h' = 2xp/h'
In order to count O possible states I have to consider that some of the Ni's did not have the minimum xp'/h', which is why S-T has error for small N: they can't be zero by QM.
S=k*ln(O{N!}/N!)=5/2*k*N*[ln(O/N) + 1]
   results in a constant in the S-T equation:  S=kN(ln(O/N) + 5/2)

S = k*ln(Gamma) = k*ln("O!"/N!) = k*5/2*N*[ln(O/N)+1]
S = k*N*(-1)*sum[Oi/N*(ln(N/Oi) - const)]
S = k*N*sum[Oi/N*const - Oi/N*ln(N/Oi)]

Shannon: N=number of symbols in the data, O^O=distinct symbols,


So the Shannon entropy appears to me to be Boltzmann's entropy except that particles can be used only once and symbols can be used a lot.  It seems to me Shannon entropy math could be applied to Boltzmann's physics (or vice versa) to get the right answer with only a few changes to semantics.

If I sent messages with uniquely-colored marbles from a bag with gaps in the possible time slots, I am not sure there is a difference between Shannon and Boltzmann entropy.

Can the partition function be expressed as a type of N!/O! ?

The partition function's macrostates seem to work backwards to get to Shannon's H because measurable macrostates are the starting point.

The S-T equation for a mono-atomic gas is S=kN*(ln(O/N)+5/2) where O is  (2px/h')^3=(kT/h'f)^3 in the V=x^3 volume where h' is an adjusted h-bar to account for p*x in the uncertainty principle being only 1 standard deviation (34.1%) instead of 100% of the possibilities, so h'=h-bar/(0.341)^2 and there are no other constants or pi's in S-T equation except for 5/2 which comes from [(xp/h')^3N]/N! = [x*sqrt(2m*E/N!)/h']^3 ~ (sqrt(E/e))^(3N*sqrt(E)) / (N/e)^N = 5/2

k is just a conversion factor from the kinetic energy per N as measured by temperature in order to get it into joules when you use dQ=dS*T.  If we measure the kinetic energy of the N's in Joules instead of temp, then k is absent. Of course ln(2) is the other man-made historical difference.

If the logarithm base is made N, then it is a normalized entropy that ranges from 0 to 1 for all systems, giving a universal measure of how close the system is to maximum or minimum entropy.

I think your equations should keep "const" inside the parenthesis like this because it depends on N and I believe it is a simple ratio. I think the const is merely the result of "sampling without replacement" factorials.
They say Boltzmann's entropy is when the system has reached thermal equilibrium so that S=N*H works, or something like that, and that Gibbs is more general where the system's macrostate measurements may have a lower entropy than Boltzmann's calculation.  I think this is like a message where you have calculated the p's, but it sends you something different.  Or that you calculate S=N*H and the source's H turns out to be different in the future.
post to rosettacode.org on "entropy" where they give code for H in many languages.

==  This is not exactly "entropy". H is in bits/symbol. ==
This article is confusing Shannon entropy with information entropy and incorrectly states Shannon entropy H has units of bits.

There are many problems in applying H= -1*sum(p*log(p)) to a string and calling it the entropy of that string. H is called entropy but its units are bits/symbol, or entropy/symbol if the correct log base is chosen. For example, H of 01 and 011100101010101000011110 are exactly the same "entropy", H=1 bit/symbol, even though the 2nd one obviously carries more information entropy than the 1st.  Another problem is that if you simply re-express the same data in hexadecimal, H gives a different answer for the same information entropy. The best and real information entropy of a string is 4) below. Applying 4) to binary data gives the entropy in "bits", but since the data was binary, its units are also a true "entropy" without having to specify "bits" as a unit.

Total entropy (in an arbitrarily chosen log base, which is not the best type of "entropy") for a file is S=N*H where N is the length of the file. Many times in [http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20Theory%20of%20Communication.pdf Shannon's book] he says H is in units of "bits/symbol", "entropy/symbol", and "information/symbol".  Some people don't believe Shannon, so [https://schneider.ncifcrf.gov/ here's a modern respected researcher's home page] that tries to clear the confusion by stating the units out in the open.

Shannon called H "entropy" when he should have said "specific entropy" which is analogous to physics' S0 that is on a per kg or per mole basis instead of S.  On page 13 of [http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20Theory%20of%20Communication.pdf Shannon's book], you easily can see Shannon's horrendous error that has resulted in so much confusion.  On that page he says H, his "entropy", is in units of "entropy per symbol". This is like saying some function "s" is called "meters" and its results are in "meters/second". He named H after Boltzmann's H-theorem where H is a specific entropy on a per molecule basis.  Boltzmann's entropy S = k*N*H = k*ln(states).

There are 4 types of entropy of a file N symbols long with n unique types of symbols, followed by two relations to physical entropy (5 and 6):

1) Shannon (specific) entropy '''H = sum(count_i / N * log(N / count_i))'''
where count_i is the number of times symbol i occurred in N.
Units are bits/symbol if log is base 2, nats/symbol if natural log.

2) Normalized specific entropy: '''Hn = H / log(n).'''
The division converts the logarithm base of H to n. Units are entropy/symbol. Ranges from 0 to 1. When it is 1 it means each symbol occurred equally often, N/n times. Near 0 means all symbols except 1 occurred only once, and the rest of a very long file was the other symbol. "Log" is in same base as H.

3) Total entropy '''S' = N * H.'''
Units are bits if log is base 2, nats if ln().

4) Normalized total entropy '''Sn' = N * H / log(n).'''  See "gotcha" below in choosing n.
Unit is "entropy". It varies from 0 to N

5) Physical entropy S of a binary file when the data is stored perfectly efficiently (using Landauer's limit): '''S = S' * kB / log2(e)''' = S' * kB * ln(2), with S' in bits.

6) Macroscopic information entropy of an ideal gas of N identical molecules in its most likely random state (n=1 and N is known a priori):  '''S' = S / kB''' = ln(states^N/N!) = N*[ln(states/N)+1], in nats (divide by ln(2) for bits). No division by ln(n) is applied here because ln(1)=0.

*Gotcha: a data generator may have the option of using say 256 symbols but only use 200 of those symbols for a set of data. So it becomes a matter of semantics if you choose n=256 or n=200, and neither may work (giving the same entropy when expressed in a different symbol set) because an implicit compression has been applied.
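
The four file measures 1) to 4) can be sketched in a few lines of Python. This is only an illustration: the function name `file_entropies` and the dictionary keys are made up, n is taken as the number of symbols actually observed (one side of the gotcha above), and a one-symbol file is treated as having zero normalized entropy.

```python
from collections import Counter
from math import log2


def file_entropies(data):
    """Return measures 1) to 4) for a string, using log base 2."""
    n_total = len(data)
    counts = Counter(data)
    n_distinct = len(counts)  # assumption: n = symbols actually observed
    h = sum(c / n_total * log2(n_total / c) for c in counts.values())
    norm = log2(n_distinct) if n_distinct > 1 else 1.0  # avoid log2(1) = 0
    return {
        "H": h,                         # 1) bits/symbol
        "Hn": h / norm,                 # 2) entropy/symbol, 0 to 1
        "S_total": n_total * h,         # 3) bits
        "Sn_total": n_total * h / norm  # 4) "entropy", 0 to N
    }


short = file_entropies("01")
longer = file_entropies("011100101010101000011110")
print(short)   # H is 1 bit/symbol
print(longer)  # H is also 1 bit/symbol, but the total is 24 bits, not 2
```

The two strings are the ones used earlier in this post: H cannot tell them apart, while the totals 3) and 4) grow with the length of the file.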
rosetta stone, on the main entropy page:

Calculate the Shannon entropy H of a given input string.

Given the discrete random variable X that is a string of N "symbols" (total characters) consisting of n different characters (n=2 for binary), the Shannon entropy of X in '''bits/symbol''' is :
:H_2(X) = -\sum_{i=1}^n \frac{count_i}{N} \log_2 \left(\frac{count_i}{N}\right)

where count_i is the count of character n_i.

For this task, use X="1223334444" as an example. The result should be 1.84644... bits/symbol.  This assumes X was a random variable, which may not be the case, or it may depend on the observer.
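
A minimal Python sketch of the task, assuming the string is treated as a sequence of independent symbols. The function name `entropy_bits_per_symbol` is just an illustrative choice.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(message):
    """H_2(X) = -sum(count_i/N * log2(count_i/N)) over the distinct characters."""
    n_total = len(message)
    counts = Counter(message)
    return -sum((c / n_total) * math.log2(c / n_total) for c in counts.values())

h = entropy_bits_per_symbol("1223334444")
print(f"{h:.5f} bits/symbol")  # prints 1.84644 bits/symbol
```

Because only the relative frequencies enter, "abc" and "aabbcc" give the same value, which is the message-length independence discussed earlier.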

This coding problem calculates the "specific" or "[[wp:Intensive_and_extensive_properties|intensive]]" entropy that finds its parallel in physics with "specific entropy" S0 which is entropy per kg or per mole, not like physical entropy S and therefore not the "information" content of a file. It comes from Boltzmann's H-theorem where S=k_B  N  H where N=number of molecules. Boltzmann's H is the same equation as Shannon's H, and it gives the specific entropy H on a "per molecule" basis.

The "total", "absolute", or "[[wp:Intensive_and_extensive_properties|extensive]]" information entropy is
:S=H_2 N bits
This is not the entropy being coded here, but it is the closest to physical entropy and a measure of the information content of a string. But it does not look for any patterns that might be available for compression, so it is a very restricted, basic, and certain measure of "information". Every binary file with an equal number of 1's and 0's will have S=N bits. All hex files with equal symbol frequencies will have S=N \log_2(16) bits of entropy. The total entropy in bits of the example above is S = 10*1.84644 = 18.4644 bits.

The H function does not look for any patterns in data or check if X was a random variable.  For example, X=000000111111 gives the same calculated entropy in all senses as Y=010011100101. For most purposes it is usually more relevant to divide the gzip length by the length of the original data to get an informal measure of how much "order" was in the data.
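
The point about patterns can be illustrated with a rough Python sketch. It uses `zlib` (the same DEFLATE compression gzip uses) as a stand-in for gzip, and the two 4000-character test strings are assumptions made up for the illustration, since the 12-character strings above are too short to compress meaningfully.

```python
import math
import random
import zlib
from collections import Counter

def entropy_bits_per_symbol(message):
    n_total = len(message)
    return sum(c / n_total * math.log2(n_total / c) for c in Counter(message).values())

# Same symbol counts, very different amounts of order.
ordered = "0" * 2000 + "1" * 2000
chars = list(ordered)
random.Random(0).shuffle(chars)   # fixed seed so the run is repeatable
shuffled = "".join(chars)

results = {}
for name, text in (("ordered", ordered), ("shuffled", shuffled)):
    compressed_len = len(zlib.compress(text.encode("ascii"), 9))
    results[name] = (entropy_bits_per_symbol(text), compressed_len / len(text))
    print(name, results[name])
```

Both strings have H = 1 bit/symbol, but the ordered one compresses to a much smaller fraction of its length, which is the informal "order" measure suggested above.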

Two other "entropies" are useful:

Normalized specific entropy:
:H_n=\frac{H_2 * \log(2)}{\log(n)}
which varies from 0 to 1 and it has units of "entropy/symbol" or just 1/symbol. For this example, H_n = 0.923.

Normalized total (extensive) entropy:
:S_n = \frac{H_2 N * \log(2)}{\log(n)}
which varies from 0 to N and does not have units. It is simply the "entropy", but it needs to be called "total normalized extensive entropy" so that it is not confused with Shannon's (specific) entropy or physical entropy. For this example, S_n = 9.23.
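
The four measures can be computed together. This is a sketch under the assumption that n is taken as the number of distinct symbols actually observed (see the gotcha about choosing n); the function and key names are illustrative.

```python
import math
from collections import Counter

def entropy_measures(message):
    """Return H (bits/symbol), Hn, total S = N*H (bits), and Sn = N*Hn."""
    n_total = len(message)           # N, message length
    counts = Counter(message)
    n_symbols = len(counts)          # n, taken here as the symbols actually seen
    h = sum(c / n_total * math.log2(n_total / c) for c in counts.values())
    h_n = h / math.log2(n_symbols) if n_symbols > 1 else 0.0
    return {"H": h, "Hn": h_n, "S": n_total * h, "Sn": n_total * h_n}

measures = entropy_measures("1223334444")
for name, value in measures.items():
    print(f"{name} = {value:.4f}")
```

For "1223334444" this gives H ≈ 1.8464, Hn ≈ 0.9232, S ≈ 18.4644, and Sn ≈ 9.2322, matching the values quoted in the text.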

Shannon himself is the reason his "entropy/symbol" H function is very confusingly called "entropy". That's like calling a function that returns a speed a "meter". See section 1.7 of his classic [http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20Theory%20of%20Communication.pdf  A Mathematical Theory of Communication] and search on "per symbol" and "units" to see he always stated his entropy H has units of "bits/symbol" or "entropy/symbol" or "information/symbol".  So it is legitimate to say entropy NH is "information".

In keeping with Landauer's limit, the physics entropy generated from erasing N bits is S = H_2 N k_B \ln(2) if the bit storage device is perfectly efficient.  This can be solved for H2*N to (arguably) get the number of bits of information that a physical entropy represents. 
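
As a rough sketch of that conversion in both directions: the function names are illustrative, the Boltzmann and Avogadro constants are the exact SI values, and the 146.22 figure is the molar entropy of neon quoted later in this post, assumed here to be in J/(K·mol).

```python
import math

K_B = 1.380649e-23        # Boltzmann constant in J/K (exact SI value)
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)

def bits_to_physical_entropy(bits):
    """S = (H_2 * N) * k_B * ln(2), in J/K, for a perfectly efficient device."""
    return bits * K_B * math.log(2)

def physical_entropy_to_bits(s_joule_per_kelvin):
    """Solve S = bits * k_B * ln(2) for the number of bits."""
    return s_joule_per_kelvin / (K_B * math.log(2))

s_example = bits_to_physical_entropy(18.4644)   # the 10-symbol example string
bits_per_neon_atom = physical_entropy_to_bits(146.22 / AVOGADRO)
print(s_example, bits_per_neon_atom)
```

The second number is (arguably) the bits of information per atom that neon's standard molar entropy represents, which comes out at roughly 25 bits.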
== From Shannon's H to ideal gas S ==
This is a way I can go from information theory to the Sackur-Tetrode equation by simply using the sum of the surprisals or in a more complicated way by using Shannon's H. It gives the same result and has 0.16% difference from ST for neon at standard conditions.

Assume each of the N atoms can either be still or be moving with the total energy divided among "i" of them. The message the set of atoms sends is "the total kinetic energy in this volume is E". The length of each possible message is the number of moving atoms "i".  The number of different symbols they can use is N different energy levels as the number of moving atoms ranges from 1 to N.  When fewer atoms are carrying the same total kinetic energy E, they will each have a larger momentum which increases the number of possible states they can have inside the volume in accordance with the uncertainty principle. A complication is that momentum increases as the square root of energy and can go in 3 different directions (it's a vector, not just a magnitude), so there is a 3/2 power involved. The information theory entropy S in bits is the sum of the surprisals. The log gives the information each message sends, summed over N messages.

S_2 = \sum_{i=1}^{N} \log_2\left(\frac{\Omega_i}{i} \right)  where  \Omega_i = \Omega_N*\left(\frac{N}{i} \right)^\frac{3}{2}

To convert this to physical entropy, change the base of the logarithm to ln(): S=k_B \ln(2) S_2
You can make the following substitutions to get the Sackur-Tetrode equation:

\Omega_N = \left(\frac{xp}{\frac{\hbar}{2\sigma^2}}\right)^3, x=V^\frac{1}{3}, p=\left(2mU/N\right)^\frac{1}{2}, \sigma = 0.341, U/N=\frac{3}{2} k_B T
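
Why the sum of surprisals lands on the Sackur-Tetrode form can be seen by evaluating the sum in closed form. The following is a sketch of the algebra that assumes Stirling's approximation ln N! ≈ N ln N - N, a step not stated in the original:

```latex
\begin{aligned}
S_2 \ln 2 &= \sum_{i=1}^{N} \ln\!\left(\frac{\Omega_N}{i}\left(\frac{N}{i}\right)^{3/2}\right)
  = N\ln\Omega_N + \tfrac{3}{2}\,N\ln N - \tfrac{5}{2}\,\ln N! \\
 &\approx N\ln\Omega_N + \tfrac{3}{2}\,N\ln N - \tfrac{5}{2}\left(N\ln N - N\right)
  = N\left[\ln\frac{\Omega_N}{N} + \frac{5}{2}\right]
\end{aligned}
```

The 5/2 is the same constant that appears in the Sackur-Tetrode equation, and \Omega_N/N plays the role of the argument of its logarithm, with the constant factor set by the choice of \sigma.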

The probability of encountering an atom in a certain momentum state depends (through the total energy constraint) on the state of the other atoms. So the states of the individual atoms are not independent random variables with regard to the other atoms, so I can't write H as a function of the state of each atom (I can't use S=N*H) directly. But by looking at the math, it seems valid to re-interpret an S=ΩH as an S=N*H. The H inside () is entropy per state. The 1/i makes it entropy per moving atom, then the sum over N gives total entropy.  The sum for i over N was for the H, but then the 1/i re-interpreted it to be per atom. So it is an odd type of S=NH. Notice my sum for j does not actually use j as it is just adding the same p_i up for all states.

S_2 = \sum_{i=1}^{N}\left[ \frac{1}{i} *\left( -1*\sum_{j=1}^{\Omega_i} \frac{i}{\Omega_i}\log_2 \frac{i}{\Omega_i}\right)\right] = \sum_{i=1}^{N} \log_2\left(\frac{\Omega_i}{i} \right)
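
The step between the two forms can be spelled out. Since the summand does not depend on j, the inner sum is just \Omega_i copies of the same term:

```latex
-\sum_{j=1}^{\Omega_i} \frac{i}{\Omega_i}\log_2\frac{i}{\Omega_i}
  = -\,\Omega_i \cdot \frac{i}{\Omega_i}\log_2\frac{i}{\Omega_i}
  = i\,\log_2\frac{\Omega_i}{i},
\qquad
\frac{1}{i}\cdot i\,\log_2\frac{\Omega_i}{i} = \log_2\frac{\Omega_i}{i}
```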

Here's how I used Shannon's H in a more direct but complicated way but got the same result:

There will be 3 symbols to count in order to measure the Shannon entropy: empty states (there are Ω-N of them), states with a moving atom (i of them), and states with a still atom (N-i).  Total energy determines what messages the physical system can send, not the length of the message. It can send many different messages. This is why it's hard to connect information entropy to physical entropy: physical entropy has more freedom than a normal message source. So in this approximation, N messages will have their Shannon H calculated and averaged. Total energy is evenly split among "i" moving atoms, where "i" will vary from 1 to N, giving N messages. The number of phase states (the length of each message) increases as "i" (moving atoms) decreases because each atom has to have more momentum to carry the total energy. The Ωi states (message length) for a given "i" of moving atoms is a 3/2 power of energy because the uncertainty principle determining number of states is 1D and volume is 3D, and momentum is a square root of energy.

S_2 = \frac{1}{N} \sum_{i=1}^{N} \left[ H_i \Omega_i \right]

Again use S=k_B \ln(2) S_2 to convert to physical entropy. Shannon's entropy Hi for the 3 symbols sums -p*log2(p) over the probabilities of encountering an empty state, a moving-atom state, and a "still" atom state. I am employing a cheat beyond the above reasoning by counting only 1/2 the entropy of the empty states. Maybe that's a QM effect.

H_i= -0.5*\frac{\Omega_i - N}{\Omega_i} \log_2\left(\frac{\Omega_i-N}{\Omega_i}\right) -  \frac{i}{\Omega_i} \log_2\left(\frac{i}{\Omega_i}\right) -  \frac{N-i}{\Omega_i} \log_2\left(\frac{N-i}{\Omega_i}\right)

Notice H*Ω simplifies to the count equation programmers use. E=empty state, M=state with moving atom, S=state with still atom.

S_i=H_i \Omega_i= 0.5 E \log_2\left(\frac{\Omega_i}{E}\right) + M\log_2\left(\frac{\Omega_i}{M}\right) + S \log_2\left(\frac{\Omega_i}{S}\right)

By some miracle the above simplifies to the previous equation. I couldn't do it, so I wrote a Perl program to calculate it directly to compare it to the ST equation for neon gas at ambient conditions for a 0.2 micron cube (so small to reduce the number of loops to N<1 million). I confirmed the ST equation is correct with the official standard molar entropy S0 for neon: 146.22 entropy/mole / 6.022E23 * N. It was within 0.20%. I changed P, T, or N by 1/100 and 100x and the difference ranged from 0.24% to 0.12%.
  #!/usr/bin/perl
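  # neon gas entropy by Sackur-Tetrode (ST), sum of surprisals (SS), and Shannon's H (SH)
  use strict; use warnings;
  # neon, 1 atm, 0.2 micron sides cube (SI units)
  my ($T, $V, $kB, $m, $h, $P) = (298, 8E-21, 1.381E-23, 20.8*1.66E-27, 6.6262E-34, 101325);
  my $N = int($P*$V/$kB/$T+0.5);   # number of atoms from the ideal gas law
  my $U = $N*3/2*$kB*$T;           # total kinetic energy
  # log() is the natural log in Perl, so dividing by log(2.718) is effectively dividing by 1
  my $ST = $kB*$N*(log($V/$N*(4*3.142/3*$m*$U/$N/$h**2)**1.5)+5/2)/log(2.718);
  my $x = $V**0.33333;  my $p = (2*$m*$U/$N)**0.5;   # cube side and momentum per atom
  my $O = ($x*$p/($h/(4*3.142*0.341**2)))**3;        # Omega_N: states when all N atoms move
  my ($SH, $SS) = (0, 0);
  for (my $i=1; $i<$N; $i++) {
      my $Oi = $O*($N/$i)**1.5;                      # Omega_i: states when i atoms move
      $SH += 0.5*($Oi-$N)*log($Oi/($Oi-$N)) + $i*log($Oi/$i) + ($N-$i)*log($Oi/($N-$i));
      $SS += log($Oi/$i);
  }
  # i=N terms are added separately because the still-atom term (N-i=0) drops out
  $SH += 0.5*($O-$N)*log($O/($O-$N)) + $N*log($O/$N);
  $SS += log($O/$N);
  $SH = $kB*$SH/log(2.718)/$N;
  $SS = $kB*$SS/log(2.718);
  print "SH=$SH, SS=$SS, ST=$ST, SH/ST=".$SH/$ST.", N=".int($N).", Omega=".int($O).", O/N=".int($O/$N)."\n";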
[[User:Ywaz|Ywaz]] ([[User talk:Ywaz|talk]]) 19:54, 27 January 2016 (UTC)
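
For readers without Perl, here is a hedged Python cross-check of the sum-of-surprisals (SS) result against Sackur-Tetrode (ST). It is a sketch, not a port of the full script: it omits the Shannon-H variant, copies the physical constants from the Perl script (including its 20.8 amu mass for neon), uses `math.pi` in place of 3.142, and adds a closed form via `math.lgamma` that is an addition of this sketch and not in the original.

```python
import math

K_B = 1.381e-23        # Boltzmann constant, J/K (value used in the Perl script)
H_PLANCK = 6.6262e-34  # Planck constant, J*s (value used in the Perl script)
SIGMA = 0.341          # the sigma from the substitution for Omega_N

def neon_entropies(T=298.0, V=8e-21, P=101325.0, m=20.8 * 1.66e-27):
    """Return (ST, SS by direct sum, SS by closed form, N), entropies in J/K."""
    n_atoms = int(P * V / (K_B * T) + 0.5)      # ideal gas law
    u_total = 1.5 * n_atoms * K_B * T           # total kinetic energy
    st = K_B * n_atoms * (
        math.log(V / n_atoms
                 * (4 * math.pi / 3 * m * u_total / n_atoms / H_PLANCK**2) ** 1.5)
        + 2.5)
    x = V ** (1 / 3)                            # side of the cube
    p = math.sqrt(2 * m * u_total / n_atoms)    # momentum when all atoms move
    omega_n = (x * p / (H_PLANCK / (4 * math.pi * SIGMA**2))) ** 3
    # Direct sum of surprisals ln(Omega_i / i) for i = 1..N.
    ss_sum = sum(math.log(omega_n * (n_atoms / i) ** 1.5 / i)
                 for i in range(1, n_atoms + 1))
    # Same sum in closed form: N ln(Omega_N) + 1.5 N ln N - 2.5 ln N!
    ss_closed = (n_atoms * math.log(omega_n)
                 + 1.5 * n_atoms * math.log(n_atoms)
                 - 2.5 * math.lgamma(n_atoms + 1))
    return st, K_B * ss_sum, K_B * ss_closed, n_atoms

st, ss_sum, ss_closed, n_atoms = neon_entropies()
print(f"N={n_atoms}, SS/ST={ss_sum / st:.5f}, closed/loop={ss_closed / ss_sum:.9f}")
```

The closed form avoids the loop entirely, so the same comparison can be run for much larger volumes than the 0.2 micron cube.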

Monday, November 23, 2015

FDA, pharmaceuticals, chemo, heart surgery, and supplements

The FDA will not allow "snake oil" to be advertised as a cure for a disease. It's a great idea.  However, in order to meet the FDA's burden of proof, it is a very expensive process to prove a treatment can help a disease.  The same burden of proof is not required for any surgery because there is no way to give the surgery that many times, especially in multiple centers trying to use different techniques, or even have a placebo group.  This is how the cardiology field gets away with literally selling snake oil and killing people who could have lived.   Honest groups of cardiologists have even published the vast failure of heart surgeries that should have been putting in tubes instead (basically, if you are not in the wake of a recent heart attack, you should be skeptical of the need for open heart surgery over a stent, according to the medical field's own research). The oncology field commits the same crimes under the pretense of investigating new drugs (for the past 4 decades...with no improvements) that sell false hope.  It's not distinguishable from snake oil. The details are usually arguable, but in many specific cases it is  clear. You can't get doctors to testify against other doctors in most cases or they can get in deep systematic trouble with their peers, especially with the heads of medical boards that issue the licenses needed to testify.  Researchers face the same problem: be careful what truths or opinions you tell or your funding dries up. 
My point in the comparison is that heart surgeries and chemotherapies that are known to be harmful and not provide any benefit are being prescribed, and yet people who sell very safe inexpensive compounds are not allowed to tell you (by FDA regulations and severe penalties) that they stop Parkinson's in the test tube, in animals, and prevent PD in epidemiological studies.  Instead, supplements require the same level of proof as usually toxic and expensive pharmaceuticals. Even the majority of private and public funding goes to pharmaceuticals that there is every reason to believe will not help (at least in cancer), as opposed to the safe and cheap supplements that already have supporting evidence. Where's the logic in this if it is not because of some sort of accidental or intentional conspiracy in our economics? 
Then there is advertising:  GM1 phase 3 trials were complete over 5 years ago. But almost no one here has heard about GM1.  See
It reverted and stopped the disease from progressing, which is not being said of nilotinib.  It was 77 patients instead of 12.  The data is published.  Nilotinib study is not.  It was a phase 3 trial.  Nilotinib is only a phase 1 trial that is supposed to be only looking at safety. I wonder if a lot of professionals are aghast that they are saying these things through the media before publishing, especially since it was supposed to look only at safety. On the other hand only 12 people and the researchers doing such an outlandish thing like going public could mean it really is that good.  I just hope it is not cold fusion all over again, if you remember that fiasco where they went public without the science to back it up. Getting back to the point of advertising:  I came across GM1 accidentally because I was looking for the trial on nilotinib.
Two years after treatment stopped, they had progressed to where "standard care" patients had been 2 years earlier.  In other words, the benefits seemed to permanently reverse the condition by 2 years even after treatment was stopped. That's my reading of figure 2 in the link above.
It's not a conspiracy theory, but I think it's just how economics works.  We all seek profit.  Doctors, politicians, pharmaceuticals, and researchers should not be expected to act any different than we do. I assume everyone acts about like mechanics, plumbers, painters, and AC repairmen.  My experience with them does not usually fall under the heading of "honest" and "fair".  If the consumer is not knowledgeable, then he should expect to be taken to the cleaners.
I do not know of something better than GM1, but that does not mean there are not 5 other compounds out there with similar proof.  It's just one I accidentally saw.  Gallic acid might be just as good and it's super cheap, but it probably hasn't been tested in people, except for the benefits noted from black tea and grape seed.

Thursday, November 5, 2015

Mechanical and Electrical Analogy: deriving of mass from charges, posted to talk page in wikipedia

I would love to see a theoretical physics discussion of the origin of the analogy between electrical and mechanical components. On the surface, you could say we find it easy to think in terms of linear systems, so we think of components that act linearly. This leads directly to the same simple differential equations for different systems, especially when conservation of energy is a natural focal point for optimizing engineering applications.  The most natural of the possible analogies is the impedance analogy which is the one most commonly used and first cited in the Wikipedia article. The reason for this is that charge in electrical components is simply replaced with meters in the mechanical components. There are no other changes. Capacitors allow charge to build up for a fixed dielectric distance and the analogous spring allows meters to build up for a fixed number of charges (which are the source of resisting compression). For small compressions and non-saturating amounts of charge, both are linear.  The same direct relation exists for inductors and mass: inductance (magnetism) from a classical view (pre-quantum) is a relativistic effect of charge build up per unit length, not a thing unto itself.  See Schwartz, Feynman, and Wikipedia. For small changes, it is again linear so V=L*di/dt instead of having to resort to full-blown relativistic equations. So it seems mass could be viewed from a pre-quantum perspective as the relativistic effect of (quark?) charges being brought closer together as a result of length contraction.  Again, it's linear for small changes in velocity so F=ma instead of full-blown relativistic calculations. In short, linear electrical components control charge/length where length is held constant by the component, and mechanical systems do the same but hold the charge constant and allow lengths to change. 
This is simple enough that there should be some references out there that delve into the source of the analogies and thereby allow it to be included in the article.

Monday, November 2, 2015

Internal depth and romantic love

How do you define depth?
Here is my attempt for both men and women: Ability to surrender heart, mind, body, and soul to the one he/she loves. This is a two-way street.
All the talk about intelligence, experience, money, and appearance are missing the mark. These are like pre-requisites. Depth seals the deal.

Intelligence and maturity could be different types of depth that could make life with someone easier, but how are these really different from wealth?  The  type I am thinking about is about the ability to be intimate.

Qualities that seem related to this:
1) comfortable and content with self.
2) honest.
3) not a whiner
4) no preoccupation with appearance
The 4 above can be summed up as "mature".
Women who know how to let a man be a man are a big plus, as are men who know how to be manly when given permission.

I could replace "not a whiner" with  "ability to be fair". "Respectful" would also fall under "fair". "Whining" and "not fair" implies trying to get something material or emotional out of the other person without just cause or compensation. So someone in love with you is not showing depth if unconsciously trying to extract too much.  Not being able to detect that you are being harmed by their actions or requests is a lack of "depth".  Intelligence is thereby deeply connected to depth and maturity, but it is definitely not the key.
Many here have mentioned preoccupation with appearance.  "Narcissism" usually has roots in unconscious low self-esteem that results from childhood traumas. It can appear confident, happy, intelligent, etc, and yet have absolutely no depth when it comes to romantic intimacy. A deep person can see "something is off" when trying to get to know a "narcissist". I don't like using the word because it's an incredibly brutal accusation against someone who has had a serious injury in childhood, and there are all levels of it including "normal" people.  Above all, the pure narcissist's self esteem can't be threatened. This is a permanent road block to true intimacy. They can love, but not be loved.  The internal person the narcissist might have been has often been killed off by childhood traumas, be it faulty/bad relationships (or lack thereof) with parents or whatever.
"Mature" is not a hardened type of maturity.
"Confident" is a key to attraction, maturity, and depth. But it is not a requirement for anything other than attraction. Someone with low self-esteem and conscious of it can have plenty of depth, even more than they realize. But I do not think they can or should end up with someone confident. Love is fair even if fairness does not lead to love.
The confident find the confident. The down-trodden find the down-trodden. If God is merciful, the shallow will find the shallow.

Friday, October 16, 2015

neurogenesis notes from TED talk

Notes from this ted talk

Here's her list for improving neurogenesis in the hippocampus for improving mood and memory:

good sleep
calorie restriction
intermittent fasting
(she left out heat baths)

omega-3 oils (fish oil, canola)
low sugar
food that needs more chewing
vit A, vit D, vit B's, vit E, zinc, folic acid, caffeine, no saturated fat
She mentioned also resveratrol and curcumin, but these are only in animal models. These do not make it to the brain in humans in any formulation, certainly not in the concentrations that have been found to work in animals.

hurts it:
sleep deprivation

Monday, September 28, 2015

my theory of every thing, biology being replaced, post to MIT review

@0.426771640856  Wiki has a better interpretation of the Bremermann limit:  "The limit has been further analysed in later literature as the maximum rate at which a system with energy spread can evolve into an orthogonal and hence distinguishable state to another".  I adhere to the many-worlds theory, so the problem it is "solving" is the act of actually taking every possible route, not looking for the one that solves an optimization problem.

The narrow goal of A.I. is optimization. Chess is a specific problem but you're shifting gears to an optimization problem when there is no defined problem, or rather, you're assuming there is a very grand problem that existence is solving, harking back to Adams's Hitchhiker's Guide, which is undoubtedly not the originator of the idea. I mean, the guide itself was stolen straight out of Asimov's Foundation. My view is that nothingness by definition can't exist and "everything" is being created as a result of the logical impossibility of nothingness. A finite universe seems like it would leave holes of nothingness. Related to this everythingness is Gödel's incompleteness theorem which indicates you have to have an infinite number of axioms to have a logically complete system, or an inconsistent set of axioms. My nothingness axiom fulfills both conditions: it is a recursively self-destructive axiom whose output is every other axiom.

But if we look at pieces of this everythingness such as gravitational systems emitting entropy (which keeps the Universe's comoving-volume entropy constant) (final entropy of the Universe is an open question and measured to be constant per comoving volume due to the observed homogeneity of heat, i.e. there is no heat transfer), then we can see a shift towards local order, i.e., an optimization problem being solved. The Earth intuitively feels like a computer because life seems to be going towards more local order, which is possible because it is an open system, and we can measure the amount of excess entropy it is generating. Answers to problems are fewer bits than the data behind the question so answers are less entropy.

Evolution appears to be the principle of least action (a more general form of Newton's laws), which maximizes potential energy at the cost of kinetic energy over all time scales, which creates high-energy bonds and produces less heat in systems, utilizing excess energy that comes to the Earth. We release excess entropy via 17 random low energy photons for every directional photon from the sun, so that we have the opportunity to create local order.

Seeking the highest energy potential energy bonds limits the number of choices which creates "copies" which is less entropy. Less kinetic energy means less heat, which means less entropy. This is why the mass on Earth is shifting from biological bonds to silicon, metallic, and carbon bonds with oxygen removed. These elemental (more order) high energy bonds enable machines to be 20x, 100x, and 10,000,000 times more efficient than photosynthesis, muscles, and brains, respectively, at acquiring energy to move matter to make copies of the thinking machine's support structure. Brains need to move ions weighing 40,000 times more than the electrons in CPUs which is why brains are inherently outdated. Biology was not able to directly smelt metals and metalloids. Thinking machines will move electrons to model larger bits of matter in the external world in order to repeat the reproduction process more efficiently.  The Great Depression began as muscles were replaced by machines on farm and factory. We had to shift to thinking jobs in order to remain useful to the corporate machine (which seeks fewest employees and fewest shareholders). The current problem is deciding how to distribute the wealth as more and more muscles and brains are not needed. Wealth inequality could eventually kill 90% of the human population due to people being more and more irrelevant to the efficiency-seeking evolutionary process we call economics, but that is not going to stop the biosphere from being replaced by a more efficient entropy-reducing technology.

You did ask for thoughts.

Saturday, September 26, 2015

Kepler on Gravity: Newton who?

"If I have seen further, it is by standing on the shoulders of giants." - Sir Isaac Newton, modifying a quote attributed to 12th century philosopher Bernard of Chartres.

"Before Kepler, all men were blind, Kepler had one eye, and Newton had two eyes." -Voltaire who first referred to Newton's refinements of Kepler's contributions as "laws".

Kepler was aware of every aspect of universal gravity except explicitly saying 1/R^2, and his most surprising gravity axiom shows he assumed either F=ma, or "Einstein's" equivalence principle between inertia and gravity.

Kepler's axioms of gravity contain the following statements.  This is my wording, but I'll spell out the caveats and give the exact wording further down. They are all correct.

1) objects at rest in space stay at rest, if not affected by gravity
2) gravity is proportional to mass
3) mass = density*volume
4) applies to all substances (corporeal). no substance is without mass
5) decreases with distance
6) extends forever if no other gravitational mass interferes
7) center of a mass has zero gravity
8) Earth and moon would collide if you let them go from resting positions. Two masses M1 and M2 released from rest at a distance R will collide after travelling D1=R*M2/(M1+M2) and D2=R*M1/(M2+M1). Closer to his wording and more simply written: D2/D1=M1/M2. It's correct. I believe it originates from Galileo's investigations, but Kepler was unsuccessful in convincing Galileo gravity extended from the moon to cause the tides. Can physicists prove 8 without using "Newton's" laws and "Newtonian" gravity?
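
Axiom 8 is easy to check numerically. The Python sketch below just evaluates the two distances as written above; the function name is illustrative, and the Earth and moon figures are approximate modern values that are not from Kepler or from the original post.

```python
def meeting_distances(m1, m2, r):
    """Distances travelled by masses m1 and m2, released at rest a distance r apart."""
    d1 = r * m2 / (m1 + m2)   # the heavier body moves the shorter distance
    d2 = r * m1 / (m1 + m2)
    return d1, d2

# Approximate modern values (assumptions for illustration): kg, kg, metres.
EARTH_MASS, MOON_MASS, EARTH_MOON_DISTANCE = 5.972e24, 7.342e22, 3.844e8

d_earth, d_moon = meeting_distances(EARTH_MASS, MOON_MASS, EARTH_MOON_DISTANCE)
print(f"Earth moves {d_earth / 1000:.0f} km, moon moves {d_moon / 1000:.0f} km")
print(f"D2/D1 = {d_moon / d_earth:.2f}, M1/M2 = {EARTH_MASS / MOON_MASS:.2f}")
```

The two distances always add up to R and their ratio equals the inverse mass ratio, so the bodies meet at their common center of mass.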

The author of the newest translation of Kepler's "Astronomia Nova" (wherein Kepler states his gravity axioms and his 3 laws of planetary motion) is wrong in claiming that Kepler's gravity "[does not contain] the least notion that gravity extends to any other bodies than the Earth and the moon". It is so ingrained in modern thinking that Newton discovered gravity, that many excuses have to be made against Kepler's perfectly correct gravity (which was missing the 1/R^2 if you exclude the logical conclusion of his gravity and planetary laws). What was Newton's dwarf-on-shoulders of giants contribution? Kepler seemed to be simply unaware there was not an additional "motive" force other than Galileo's inertia that kept planets going around the Sun.

In Bethune's words from 1830 (London edition, not the 1932 Boston edition)
"[Kepler] also conjectured that the irregularities in the moon's motion were caused by the joint action of the sun and earth, and recognized the mutual action of the sun and planets, when he declared the mass and density of the sun to be so great that the united attraction of the other planets cannot remove it from its place."
Professor Forbes in 1909 in "History of Astronomy"
"it must be obvious that [Kepler] had at that time some inkling of the meaning of his laws--universal gravitation. From that moment the idea of universal gravitation was in the air, and hints and guesses were thrown out by many"  
In the words of Bethune again:
"Many who are but superficially acquainted with the History of Astronomy, are apt to suppose that Newton's great merit was in his being the first to suppose an attractive force existing in and between the different bodies composing the solar system. This idea is very erroneous .. the general notion of an attractive force between the sun, moon, and planets was very commonly entertained before Newton was born, and may be traced back to Kepler, who was probably the first modern philosopher who suggested it. "
Now compare this to Newton, known for denying credit to his peers, acting as if he discovered the attraction of gravity that was explained in the introduction to Kepler's book on planetary motion:
"Kepler’s laws, although not rigidly true, are sufficiently near to the truth to have led to the discovery of the law of attraction of the bodies of the solar system. The deviation from complete accuracy is due to the facts, that the planets are not of inappreciable mass, that, in consequence, they disturb each other's orbits about the Sun..."    
Here are more points in support of the view that Kepler was thinking in terms of a "universal" gravity:

1) Kepler thought magnetism must be in the Sun and planets because it was in the Earth, and he emphasized the similarities between gravity and magnetism (but did not claim they were the same but is accused of claiming magnetism was gravity).  He also correctly jumped to the conclusion that if the Earth spins, then so does the Sun, showing he viewed the Sun as a celestial body like the planets.

2) Kepler was justifiably accused of trying too hard to generalize basic ideas. Examples: "Harmonies" of the solar system being related to music, wondering if the Earth might breathe in some way analogous to fish (predating Gaia ideas), and imagining a story about travelling to the moon to meet potential inhabitants.

3) I have not seen anything to indicate he could be accused of thinking the Earth is different from any other planet. His comments that were in line with older thoughts of assigning personalities to planets seemed very much allegorical and an appeasement to the language and thought of the time.

4) He tried to apply his three planetary laws to the moon (see quote below), indicating he could think of the Earth acting as a "Sun" and the moon like a "planet", and since he stated gravity is always present, he had to be thinking it was an integral part of his planetary laws, and therefore applicable, if not equally applicable to all planets. As he states, you only need to make a correction for density if you know the volume in order to get the mass.

5) Kepler supported Bruno's opinion that stars were Suns with planets or moons in their orbits.

"Short History of Astronomy" by Arthur Berry, 1898
"The "Epitome of Copernican Astronomy" (1618)  contains the first clear statement that the two fundamental laws of planetary motion established for the case of Mars were true also for the other planets (no satisfactory proof being, however, given), and that they applied also to the motion of the moon round the earth, "
Universality of gravity aside, it is hard to overestimate the precision and completeness of Kepler's gravity axioms stated in the introduction to Astronomia Nova. It was presented as proof that the Earth can't be the center of the universe, and was in support of the idea that physics should be applied to astronomy. This was his most widely distributed text, and one of the few you can find for free on the internet. (This was his most famous book and it contains his 3 laws, but you can't find the rest of this text on the internet in any language except Latin. Only his 8 gravity axioms are easy to find.).  After reading his axioms, it is not possible to hold the opinion that Newton discovered gravity.  The only element of Newtonian gravity that is not blatant is 1/R^2, which is a geometrical effect of conserved "rays" of any type being emitted from a point source. It is also a derivable consequence of just the first two of Kepler's laws and Galileo's inertia. But he did not do this derivation (his math skills were easily up to the task which Einstein marveled at), possibly because he was thinking the planets were being subjected not only to gravity, but also to some sort of friction and a sideways force from the Sun in the form of magnetism (or another unknown force) to overcome the friction.

Decades before Newton was thinking about falling objects, Kepler described with perfect precision in "meters" and "kg" how far the Earth would move "up" if we were able to stop the moon's orbit and drop it towards the Earth. He said this applied equally well to every mass. I can't derive this axiom unless I use F=ma and the knowledge that gravity causes an acceleration, i.e. Galileo's distance=1/2*a*t^2.  If you can derive this without using F=ma, please say so in the comments.

Other than not saying 1/R^2 explicitly, there are two potential problems with his gravity axioms.

One is that he used the phrase "cognate bodies" (mutua inter cognata corpora unitionem seu conjunctionem) which some have claimed means he was only talking about the Earth-moon system. There are many lines of reasoning that show this is a big assumption, as I've described above.

"If he knew gravity was universal, why didn't he talk more about it?"  He did place it in the introduction to his most famous work and call the statements axioms.  His friend Galileo could not believe the tides were caused by the moon, so going further might have seemed premature. The Sun carrying such a force might also have caused more problems with the church (on top of the attempt to burn his mother at the stake). But more likely, I think, gravity is such an obvious everyday thing that Kepler could have considered it trivial and less important than determining the rules that govern the positions of the planets. Gravity was incredibly easier to see and understand than the rules that govern the wandering planets as viewed from Earth. We can easily say "ellipse" now, but determining it and proving it to others from the point of view of Earth's orbit in the midst of these other ellipses (that differed from circular by only 0.4% in the case of Mars) made Newton's mathematical restatement of Kepler's laws (and connecting it to Galileo's work) trivial by comparison (as Einstein explained). Kepler was no less capable than Newton in optics and calculus-like work. What more needed to be said about gravity beyond Galileo?  Kepler had gravity correct far beyond Galileo's beliefs, but he could not see how it, together with inertia, was enough to determine his laws without any other magnetic, animal, or friction forces. Gravity for him probably appeared to be obvious, everyday, simple, and solved, and much less interesting than solving the riddle of the planets.

Getting back to potential problems with his axioms: More advanced forms of the "cognate bodies" complaint are that he may have thought each planet ("cognate body") would have a different gravity such as a different "gravitational constant" or distance rule, especially if they had a magnetic effect that was interfering with it and thereby modifying it.  Others think that maybe he thought masses were attracted only to their own "cognate" masses, which by some strange reasoning would include only their satellites but not other planets. Hence, they say, Newton was endowed with "universal" gravitation.

But if axiom 8 works on other planets with a different gravity constant, then (as I discuss below) the inertial force would have to change in lockstep with the change in the gravitational force (the "Einstein" principle of equivalence would remain intact because axiom 8 depends on it). This means the ratio of the gravity force to the inertial force would have to remain the same. But this is the same as saying only the density has changed in other "cognate bodies", and he was fully aware of the effect of changes in density. His gravity adjusts for changes in density.

You might object that his knowledge of inertia was not advanced enough to make the above claim. Besides his comments on inertia and knowledge of Galileo (who discovered Newton's first law), his "Dream" book more than anything shows how well he understood inertia. He estimated that only with extreme precautions could aliens transport thin humans with strong bones (from riding goats since childhood), dosed with opiates, to the moon in only 4 hours by a "blast off" method accelerating into space, with great aliens standing on top of each other's shoulders to give the humans a boost "as if by gun powder", predating rocket ships.  He mentions easier travel after blast off, if it were not for the lack of air and the cold, and a similar deceleration process at the moon.  At 10 g, the most a person might be able to tolerate, I get that the acceleration would need to last 300 seconds to get halfway to the moon in 2 hours (4 hours for the full trip). That would require about a 3000-mile-high stack of aliens reaching out into space, all within the parameters of his description. He had to understand inertia, falling bodies, and gravity well enough for this to come out so accurately in accordance with Newton's laws, with a good guess as to what a human body can withstand. I would have been hard-pressed to come this close without knowing 10 g, v=at, and s=1/2*a*t^2. He had some sort of access to the last two via Galileo. It is interesting that he made the travelling time "a little less than 4 hours" for another reason: at 1 g acceleration above Earth's gravity to halfway and 1 g deceleration for the rest of the way, I get 3.88 hours. I could find no mention of gravity on the moon being less, except his mentioning that the inhabitants grow to a very large size, which is more likely when gravity is less.
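The 10 g estimate can be roughly reproduced with nothing more than v=at and s=1/2*a*t^2. The sketch below is only a back-of-the-envelope check under assumptions of my own: a mean Earth-moon distance of 384,400 km, a constant 10 g boost from rest, and a coast at constant speed with gravity and air ignored. The function name is made up for illustration. Under those assumptions it gives a stack of roughly 2,700 miles and a little under 2 hours to the halfway point.

```python
G0 = 9.81                  # standard gravity in m/s^2
MOON_DISTANCE_M = 3.844e8  # mean Earth-moon distance in meters (assumed value)
METERS_PER_MILE = 1609.344


def boost_then_coast(accel_g, burn_s, target_m):
    """Accelerate from rest at accel_g for burn_s seconds, then coast.

    Uses v = a*t and s = 1/2*a*t^2 and ignores gravity and air drag.
    Returns (boost distance in m, coast speed in m/s, total time in s).
    """
    a = accel_g * G0
    burn_distance = 0.5 * a * burn_s ** 2   # length of the "stack of aliens"
    speed = a * burn_s                      # speed at the end of the boost
    coast_time = (target_m - burn_distance) / speed
    return burn_distance, speed, burn_s + coast_time


# 10 g for 300 seconds, then coast to the halfway point.
stack_m, speed, t_half_s = boost_then_coast(10, 300, MOON_DISTANCE_M / 2)
stack_miles = stack_m / METERS_PER_MILE
hours_to_halfway = t_half_s / 3600
print(f"stack height: {stack_miles:.0f} miles")
print(f"time to halfway point: {hours_to_halfway:.2f} hours")
```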

Another potential problem is that in two different English translations he says the Earth is less "attracted" to a stone than the stone "seeks" the Earth. His Latin is "trahat" and "petit", which can also be translated as "to pull forth" and "to strive for" or "travels to".  If he meant it as some type of modern "force" then it is wrong, as the force is supposed to be equal on both. But this would conflict with his 8th axiom.  He may have meant it in a "distance" or "velocity" sense, in which case it is correct and in accordance with his 8th axiom.

Some might complain they did not have a distinct concept of "mass" so the translation could be in error. However, the Latin was "moles" which is "mass", "weight", or "load", so I think it is sufficiently clear he meant it as you weigh it, and he distinctly indicated moles = density * volume, which is mass.  He may not have thought about it as an absolute value of mass translatable to any planet as we know it, but he kept his comments on gravity perfectly correct by speaking in terms of ratios of mass.

Another complaint is that he thought gravity was magnetism, but his two-stones example did not require that they be magnets. He entertained the idea that magnetism or some other force from the Sun kept planets moving against a supposed friction. He also appears to have ascribed an "animal force" for this movement, inherent to bodies, but he may have specifically meant for this to mean inertia. He sometimes mentions that gravity and mass have a parallel in the magnetic force in that magnets have an invisible type of action at a distance.

I can confirm axiom 8 only by using F=ma and Galileo's distance=1/2*a*t^2 for each mass, plugging in a=F/m, noting that they collide at the same time t, and then letting F, 1/2, and t^2 cancel when I divide the two Galileo distance equations to get D1/D2=m2/m1 as Kepler states. This is an interesting simple result and it could be named "Kepler's law of mass attraction." It leaves open the possibility of gravity changing as any function of distance, but remains valid only if the well-known "Einstein" equivalence of the inertial force and gravity force on the mass holds, allowing me to make the substitution of "a" above.   How did he do it when he was not supposed to know F=ma?  Do his first two laws give this result when taken to the limit of an orbit of zero?  He is saying "if you take out my planetary laws by stopping the moon's orbit and remove all other forces on it, you will have a gravitational force that will cause the Earth and moon to come together at this particular point in space."  So he could see gravity clearly when he stopped thinking about "magnetic", "animal", and frictional forces in planets by making the orbits stop.
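The claim that D1/D2=m2/m1 holds for any distance dependence of the force can be checked numerically. What follows is a minimal sketch, not a careful orbital integrator. The function name, the three force laws, the time step, and the 53-to-1 mass ratio (taken from Kepler's "fifty-fourth part") are my own illustrative choices, in arbitrary units. Two bodies start at rest, the same force magnitude acts on both, and each accelerates as a=F/m.

```python
def meeting_displacements(m1, m2, separation, force, dt=1e-3):
    """Release two bodies at rest and step them forward until they meet.

    force(r) is the magnitude of the mutual attraction at separation r.
    The same magnitude acts on both bodies, and each accelerates as a = F/m.
    Returns (d1, d2), the distance each body has moved.
    """
    x1, x2 = 0.0, float(separation)
    v1 = v2 = 0.0
    # Stop just short of contact so force(r) is never evaluated at r = 0.
    while x2 - x1 > 1e-6 * separation:
        f = force(x2 - x1)
        v1 += (f / m1) * dt   # body 1 is pulled toward body 2
        v2 -= (f / m2) * dt   # body 2 is pulled toward body 1
        x1 += v1 * dt
        x2 += v2 * dt
    # The crude final step may overshoot the meeting point, but the ratio
    # of the two displacements does not depend on where the loop stops.
    return x1, separation - x2


# Three arbitrary force laws (arbitrary units) to show the ratio is the same.
force_laws = {
    "constant": lambda r: 1.0,
    "inverse square": lambda r: 1.0 / r ** 2,
    "linear": lambda r: r,
}

ratios = {}
for name, law in force_laws.items():
    d_earth, d_moon = meeting_displacements(53.0, 1.0, 1.0, law)
    ratios[name] = d_earth / d_moon
    print(f"{name}: D1/D2 = {ratios[name]:.6f}, m2/m1 = {1.0 / 53.0:.6f}")
```

Because both velocities start at zero and receive impulses in the fixed ratio m2/m1 at every step, the displacements keep that ratio throughout, which is the same cancellation of F and t described above.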

Finally I am getting to Kepler's own words, which are equal in weight to all the above. All the above was needed to defend against the various remarks that try to diminish Kepler's contributions. Part of the problem is that there is no full English translation of Astronomia Nova, and no free version available on the internet in any language other than Latin.  This is as astonishing as anything else. Everyone has had access to English Newton, but not German Kepler. And the dominance of British and US English speakers in world economics for the past 2+ centuries could not have helped Kepler's case.

In the quote below he indicates the earth/moon volume ratio is 54, but it is actually 49, showing he had a 3% error in the ratio of the diameters.
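The 3% figure follows from volume scaling as the cube of the diameter. A quick check, using only the two volume ratios stated above and assuming both bodies are treated as spheres:

```python
kepler_volume_ratio = 54.0   # Earth/moon volume ratio as read from Kepler's quote
actual_volume_ratio = 49.0   # approximate modern value used in the text

# Volume scales as diameter cubed, so the diameter ratio is the cube root.
kepler_diameter_ratio = kepler_volume_ratio ** (1.0 / 3.0)
actual_diameter_ratio = actual_volume_ratio ** (1.0 / 3.0)

diameter_error = kepler_diameter_ratio / actual_diameter_ratio - 1.0
print(f"error in the diameter ratio: {diameter_error:.1%}")
```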

Kepler, on gravity in his introduction to "Astronomia Nova":
"It is therefore plain that the [common] theory of gravity is erroneous. The true theory of gravity is founded on the following axioms : Every corporeal substance, so far forth as it is corporeal, has a natural fitness for resting in every place where it may be situated by itself beyond the sphere of influence of a body cognate with it. Gravity is a mutual affection between cognate bodies towards union or conjunction (similar in kind to the magnetic virtue), so that the earth attracts a stone much rather than the stone seeks the earth. ...If two stones were placed in any part of the world near each other, and beyond the sphere of influence of a third cognate body, these stones, like two magnetic needles, would come together in the intermediate point, each approaching the other by a space proportional to the comparative mass of the other. If the moon and earth were not retained in their orbits by their animal force or some other equivalent, the earth would mount to the moon by a fifty-fourth part of their distance, and the moon fall towards the earth through the other fifty-three parts, and they would there meet, assuming, however, that the substance of both is of the same density. If the earth should cease to attract its waters to itself all the waters of the sea would be raised and would flow to the body of the moon. The sphere of the attractive virtue which is in the moon extends as far as the earth, and entices up the waters; but as the moon flies rapidly across the zenith, and the waters cannot follow so quickly, a flow of the ocean is occasioned in the torrid zone towards the westward. If the attractive virtue of the moon extends as far as the earth, it follows with greater reason that the attractive virtue of the earth extends as far as the moon and much farther; and, in short, nothing which consists of earthly substance anyhow constituted although thrown up to any height, can ever escape the powerful operation of this attractive virtue."
Part of the problem with not giving Kepler his due may be summed up by this passage from Berry's "Short History of Astronomy":
"as one reads chapter after chapter without a lucid, still less a correct
idea, it is impossible to refrain from regrets that the intelligence of
Kepler should have been so wasted, and it is difficult not to suspect at
times that some of the valuable results which lie embedded in this great
mass of tedious speculation were arrived at by a mere accident. On the
other hand it must not be forgotten that such accidents have a habit of
happening only to great men,"
But before believing any negative comments about Kepler, it's good to check Kepler's words against what others say he said.  Compare Sir David Brewster's comments to what Kepler actually said.

Brewster's libel:
"Although Kepler, in his Commentaries on Mars, had considered it
probable that the waters of our ocean are attracted by the moon, as iron
is by a loadstone, yet this opinion seems to have been a very transient
one, as he long afterwards, in his System of Harmonies, stated his firm
belief that the earth is an enormous living animal, and enumerates even
the analogies between its habits and those of known animated beings. He
considered the tides as waves produced by the spouting out of water
through its gills, and he explains their relation to the solar and lunar
motions by supposing that the terrene monster has, like other animals,
its daily and nightly alternations of sleeping and waking."
Here are Kepler's actual words:
"What so like breathing, especially of those fish who draw water into their
mouths and spout it out again through their gills, as that wonderful
tide! For although it is so regulated according to the course of the
moon, that, in the preface to my 'Commentaries on Mars,' I have
mentioned it as probable that the waters are attracted by the moon, as
iron by the loadstone, yet if anyone uphold that the earth regulates its
breathing according to the motion of the sun and moon, as animals have
daily and nightly alternations of sleep and waking, I shall not think
his philosophy unworthy of being listened to; especially if any flexible
parts should be discovered in the depths of the earth, to supply the
functions of lungs or gills."
Kepler seems sensible, fun, and open-minded to new ideas without abandoning his belief that the moon's gravity was the source of the tides.

Is it merely an English bias that has placed Newton so far above Germany's Kepler?  Einstein, another German, greatly admired Kepler's skill and grieved that Galileo did not give him more support.

The foundation of physics is the interaction between observation and mathematical condensation. Galileo successfully used observation to form ideas Kepler needed to form an idea of universal gravity. The physical "experimentalist" Brahe took careful measurements. Kepler religiously adhered to them, needing only a 0.4% error in Mars and a 2% error in Mercury (after a tremendously complex deduction of orbits) to abandon the perfect circles that he, above all others, wanted to be true. He is very often accused of being soft, flaky, and religious in sentiment. But it was an incredibly strong faith in observation guiding theory without exception or imprecision, together with incredibly difficult mathematical work (according to Einstein), that enabled him to provide the physics Newton needed to do the mathematical combination of Kepler's gravity, Kepler's orbits, and Galileo's students' inertia.  In the English-speaking world Newton is treated as the real great beginning or great leap forward in physics, but I think giving that place to Kepler's work is closer to the truth.