Tuesday, November 24, 2015

entropy: relation between physical and information

== The relation between S and H ==

H is intensive S
Shannon entropy H appears to be more like bits per symbol of a message (see the examples below), which has a better parallel with '''intensive''' entropy (S per volume or S per mole), NOT the usual '''extensive''' entropy S.  This allows "n" distinct symbols like a,b,c in a Shannon message of length N to correlate with "n" energy levels in N particles or energy modes.

H as possible extensive S?  Not really
However, a larger block of matter will have more long-wavelength phonons, so more Shannon symbols will be needed to represent the larger number of energy levels, so maybe it can't be taken as intensive S on all occasions.  But the same error will apply if you try to use intensive S.  So intensive S seems to be exactly equal to Shannon, as long as the same reference size block is used. This problem does not seem to apply to gases. But does a double-sized volume of gas with double the number of molecules have double the entropy, as I would need in order to use Shannon entropy?    Yes, according to the equation I found (at the bottom) for S for a gas:  S = N*( a*ln(U/N) + b*ln(V/N) + c ).  If N, U, and V all double, then entropy merely doubles.

Why Shannon entropy H is not like the usual extensive entropy S
Examples of Shannon entropy (using the online Shannon entropy calculator): http://www.shannonentropy.netmark.pl/
01, 0011, and 00101101 all have Shannon entropy H = log2(2) = 1
abc, aabbcc, abcabcabc, and abccba all have Shannon entropy H = log2(3) = 1.58

The important point is to notice it does not depend on the message length, whose analogy is number of particles.  Clearly physical extensive entropy increases with number of particles.
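
The examples above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `shannon_entropy` is my own choice for illustration, and it computes H from observed symbol counts, which is what the online calculators appear to do.

```python
from collections import Counter
from math import log2

def shannon_entropy(message: str) -> float:
    """Shannon entropy H in bits per symbol, from observed symbol counts."""
    n = len(message)
    counts = Counter(message)
    # H = sum over distinct symbols of (count/N) * log2(N/count)
    return sum((c / n) * log2(n / c) for c in counts.values())

if __name__ == "__main__":
    for msg in ["01", "0011", "00101101", "abc", "aabbcc", "abcabcabc", "abccba"]:
        h = shannon_entropy(msg)
        # H stays the same as the message grows; only N*H grows with length.
        print(f"{msg!r}: H = {h:.3f} bits/symbol, N*H = {len(msg) * h:.3f} bits")
```

H is the same for every message built from the same symbol proportions, while N*H scales with message length.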

Suggested conversion between H and regular extensive S
To convert Shannon entropy H to extensive entropy S or vice versa use
S = N*kb*ln(2)*H
where kb is Boltzmann's constant, ln(2) is the conversion from bits to nats, and N is the number of particles or other oscillators to which you've assigned a "Shannon" symbol representing the distinct energy level (if volume is constant) or microstate (a combination of volume and energy) of each. A big deal should not be made out of kb (and therefore S) having units of J/K because temperature is a precise measure of a kinetic energy distribution, so it is unitless in a deep sense. It is merely a slope that allows zero heat energy and zero temperature to meet at the same point, both of which are Joules per particle.

Physical entropy of an Einstein solid derivable from information theory?
Consider an Einstein solid (http://hyperphysics.phy-astr.gsu.edu/hbase/therm/einsol.html) with 3 possible energy states (0, 1, 2) in each of N oscillators. Let the average energy per oscillator be 1, which means the total energy q=N.  The physical extensive entropy S is (by the reference above) very close to S = kb*ln(2)*2N  = 1.38*N*kb for large N. You can verify this by plugging in example N's, or by Stirling's approximation and some math.

Shannon entropy for this system is as follows: each oscillator's energy is represented by 1 of 3 symbols of equal probability (i.e. think of a, b, c substituted for energies 0,1,2). Then H = log2(3) = 1.585 which does not depend on N (see my Shannon entropy examples above). Using the H to S conversion equation above gives S=N*kb*ln(2)*log2(3) = ln(3)*N*kb = 1.10*N*kb. The Debye model value is about 56% higher than this. I think this means there are 14 possible states of even probability in the message of oscillators for each oscillator N giving S = k*N*ln(14) and K.E.= N*energy/N.

log2(3) = ln(3)/ln(2)

Note that the physical S is about 25% more than the S derived from information theory, 2*ln(2) versus ln(3). The reason is that the physical entropy is not restricted to independently random oscillator energy levels which average 1.  It only requires the total energy be q=N.
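
The 2*ln(2) figure can be checked numerically. The sketch below assumes the standard Einstein-solid multiplicity from the hyperphysics reference, Omega = (q+N-1)!/(q!*(N-1)!), which puts no upper limit on the energy of a single oscillator; the function names are mine. It uses log-gamma so that large N does not overflow.

```python
from math import lgamma, log

def ln_multiplicity(num_oscillators: int, quanta: int) -> float:
    """ln of Omega = (q + N - 1)! / (q! * (N - 1)!), computed with log-gamma."""
    n, q = num_oscillators, quanta
    # Gamma(k + 1) = k!, so (q+N-1)! = Gamma(q+N), q! = Gamma(q+1), (N-1)! = Gamma(N)
    return lgamma(q + n) - lgamma(q + 1) - lgamma(n)

def entropy_per_oscillator(num_oscillators: int) -> float:
    """S / (N * kb) when the total energy is q = N quanta."""
    return ln_multiplicity(num_oscillators, num_oscillators) / num_oscillators

if __name__ == "__main__":
    for n in (10, 1_000, 1_000_000):
        print(n, round(entropy_per_oscillator(n), 4))
    print("2*ln(2) =", round(2 * log(2), 4), "  ln(3) =", round(log(3), 4))
```

For small N the value sits below 2*ln(2) and approaches it as N grows, while the independent-symbol estimate stays at ln(3).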

You might say this proves Shannon entropy H is not exactly like intensive entropy S.  But I wonder if this idealized multiplicity is a physical reality. Or maybe kb contains a "hidden" 25% reduction factor in which case I would need to correct my H to S conversion above by a factor of 0.75 if used in multiplicity examples instead of classical thermodynamics.   Another possibility is that my approach in assigning a symbol to an energy level and a symbol location to a particle is wrong.  But it's hard to imagine the usefulness of looking at it any other way.

Going further towards Debye model of solids
However, I might be doing it wrong.  The oscillators might also be all 1's or half 2's and half 0's (all 1's has log2(1)=0). This would constitute 3 different types of signal sources as the possible source of the total energy with entropy of maybe H=log2(3)+log2(2)+log2(1)=2.58. Times ln(2) gives 1.79 which is higher than the 1.38, but close to Debye's model of solids which predicts temps 1/0.806 more than Einstein for a given total energy which is S/kb=1.72  (because U/N in both models ~T*S). The model itself is supposed to be the problem with the Einstein solid, so H should not come out closer based on it, unless I'm cheating or re-interpreting in a way that makes it closer to the Debye model. To complete this, I need to convert the Debye model to a Shannon signal source, assigning a symbol to each energy level. Each symbol would occur in the signal source (the solid) based on how many modes are allowing it. So N would not be particles but number of phonons, which determine Planck radiation even at low temps, and heat capacity. It is strange that energy level quantity does not change the entropy, just the amount of variation in the possibilities. This probably is the way the physical entropy is calculated so it's guaranteed to be correct.

Debye Temp ~ s*(N/V)^(1/3)  where s=sonic velocity of the solid.  "Einstein Temp"  = 0.806*Debye temp.  This means my treatment

C = a*T^3 (heat capacity)

Entropy of a single harmonic oscillator (giving rise to phonons?) at "high" temp is
for solids S=k*(ln(kT/hf) + 1) where f is frequency of the oscillations which is higher for stronger bonds.  So if Earth acquires stronger bonds per oscillator and maintains a constant temperature, S decreases.  For independent oscillators in 1 D I think it might be S=kN*(ln(kT/hf/N) + 1).  In terms of S=n*H entropy:  S=X*[-N/X*[ln(N/X)-1]]  where X=kT/hf. For 3D X^3.  See Vibrational Thermodynamics of Materials, Fultz paper.
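
As a numerical check on the single-oscillator entropy, the sketch below compares the high-temperature form S/k = ln(kT/hf) + 1 with the exact quantum harmonic oscillator entropy, S/k = x/(e^x - 1) - ln(1 - e^-x) with x = hf/kT. The exact expression is the standard textbook result and is not taken from this post; the function names are mine.

```python
from math import expm1, log

def oscillator_entropy_exact(kt_over_hf: float) -> float:
    """S/k of one quantum harmonic oscillator; the argument is kT/(hf)."""
    x = 1.0 / kt_over_hf  # x = hf / kT
    # expm1(x) = e^x - 1 and -expm1(-x) = 1 - e^-x, both accurate for small x
    return x / expm1(x) - log(-expm1(-x))

def oscillator_entropy_high_temp(kt_over_hf: float) -> float:
    """High-temperature approximation S/k = ln(kT/hf) + 1."""
    return log(kt_over_hf) + 1.0

if __name__ == "__main__":
    for ratio in (1.0, 10.0, 100.0):
        print(ratio, oscillator_entropy_exact(ratio), oscillator_entropy_high_temp(ratio))
```

Raising f at fixed T lowers kT/hf, and both expressions then give a smaller S, which is the "stronger bonds, lower entropy" point made above.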

Example that includes volume
The entropy of an ideal gas according to internal energy page.
S = k * N *[ c*ln(U/N)  + R*ln(V/N) ]
by using log rules applied to Wikipedia's internal energy page, where c is heat capacity. You could use two sets of "Shannon" symbols, one for internal energy states U and a set for V, or use one set to represent the microstates determined by U and V to give S=constant*N*ln(2)*H.   H using distinct symbols for microstates is thereby the Shannon entropy of an ideal gas.

ideal gas not near absolute zero, more accurately:
S = k*N*( ln( V/N*(4*pi*m*U/(3*N*h^2))^(3/2) ) + 5/2 )
Sackur–Tetrode equation
S ~ N*( ln( V/N*(m*U/N)^(3/2) ) + b )
S ~ N*( ln(V/N) + a*ln(m*U/N) + b )
The separate ln()'s are because there are 2 different sets of symbols that need to be on the same absolute base, but have "something", maybe different "symbol lengths" as a result of momentum being the underlying quantity and it affects pressure per particle as v whereas it affects U as v^2.  In PV=nRT, T is v^2 for a given mass, and P is v.  P1/P2 = sqrt(m2/m1) when T1 and T2 are the same for a given N/V.
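
The claim made earlier that doubling N, U, and V simply doubles S can be checked against the Sackur–Tetrode equation. This is a sketch under assumed illustrative values (one mole of helium at 273.15 K in 22.4 litres, with an approximate helium-4 atomic mass); none of these numbers come from the post, and the function name is mine.

```python
from math import log, pi

K_B = 1.380649e-23         # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J*s

def sackur_tetrode(n: float, u: float, v: float, m: float) -> float:
    """Entropy in J/K of a monatomic ideal gas with N particles,
    internal energy U (J), volume V (m^3), and particle mass m (kg)."""
    states = (v / n) * (4 * pi * m * u / (3 * n * H_PLANCK**2)) ** 1.5
    return K_B * n * (log(states) + 2.5)

# Assumed illustrative values, not from the text.
N = 6.02214076e23            # one mole of atoms
M_HELIUM = 6.6464731e-27     # approximate mass of a helium-4 atom, kg
U = 1.5 * N * K_B * 273.15   # U = (3/2) N kb T for a monatomic ideal gas
V = 0.0224                   # m^3

s1 = sackur_tetrode(N, U, V, M_HELIUM)
s2 = sackur_tetrode(2 * N, 2 * U, 2 * V, M_HELIUM)

if __name__ == "__main__":
    print("S =", s1, "J/K;  S(doubled)/S =", s2 / s1)
```

Because U and V enter only as U/N and V/N inside the logarithm, the bracket is unchanged by the doubling and S scales with the N in front.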

edit to Wikipedia: (entropy article)

The closest connection between entropy in information theory and physical entropy can be seen by assigning a symbol to each distinct way a quantum energy level (microstate) can occur per mole, kilogram, volume, or particle of homogeneous substance. The symbols will thereby "occur" in the unit substance with different probability corresponding to the probability of each microstate. Physical entropy on a "per quantity" basis is called "[[Intensive_and_extensive_properties|intensive]]" entropy as opposed to total entropy which is called "extensive" entropy.  When there are N moles, kilograms, volumes, or particles of the substance, the relationship between this assigned Shannon entropy H in bits and physical extensive entropy in nats is:
:S = k_\mathrm{B} \ln(2) N H
where ln(2) is the conversion factor from base 2 of Shannon entropy to the natural base e of physical entropy.  [[Landauer's principle]] demonstrates the reality of this connection: the minimum energy E required and therefore heat Q generated by an ideally efficient memory change or logic operation from irreversibly erasing or merging N*H bits of information will be S times the temperature,
:E = Q = T k_\mathrm{B} \ln(2) N H,
where H is in information bits and E and Q are in physical Joules. This has been experimentally confirmed.{{Citation |author1=Antoine Bérut |author2=Artak Arakelyan |author3=Artyom Petrosyan |author4=Sergio Ciliberto |author5=Raoul Dillenschneider |author6=Eric Lutz |doi=10.1038/nature10872 |title=Experimental verification of Landauer’s principle linking information and thermodynamics |journal=Nature |volume=483 |issue=7388 |pages=187–190 |date=8 March 2012 |url=http://www.physik.uni-kl.de/eggert/papers/raoul.pdf|bibcode = 2012Natur.483..187B }}
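
To get a feel for the size of the Landauer energy, here is a minimal sketch that evaluates E = T*kb*ln(2)*N*H directly. The function name and the 300 K example temperature are my own assumptions for illustration.

```python
from math import log

K_B = 1.380649e-23  # Boltzmann constant, J/K

def landauer_energy(temperature_k: float, n_symbols: float, h_bits_per_symbol: float) -> float:
    """Minimum heat in Joules from irreversibly erasing N*H bits at temperature T."""
    return temperature_k * K_B * log(2) * n_symbols * h_bits_per_symbol

if __name__ == "__main__":
    # One symbol carrying one bit, erased at an assumed 300 K.
    print(landauer_energy(300.0, 1, 1.0), "J per bit")
    # A message of 8 symbols at H = 2 bits/symbol is 16 bits.
    print(landauer_energy(300.0, 8, 2.0), "J for 16 bits")
```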

(scott note: H = sum(count/N*log2(N/count)) )

BEST wiki article
When a file or message is viewed as all the data a source will ever generate, the Shannon entropy H in '''bits per symbol''' is
H = - \sum_{i=1}^{n} p_i \log_2 (p_i) = \sum_{i=1}^{n}count_i/N \log_2 (N/count_i)

where i is for each distinct symbol, p_i is the "probability" of symbol i being received, and count_i is the number of times symbol i occurs in the message that is N symbols long. This equation is "bits" per symbol because it is in a logarithm of base 2. It is "entropy per symbol" or "normalized entropy" that ranges from 0 to 1 when the logarithm base is equal to the n distinct symbols. This can be calculated by multiplying H in bits/symbol by ln(2)/ln(n). The "bits per symbol" of data that has only 2 symbols is therefore also "entropy per symbol".

The "entropy" of a message, file, source, or other data is S=N*H, in keeping with Boltzmann's [[H-theorem]] for physical entropy that Claude Shannon's 1948 paper cited as analogous to his information entropy.  To clarify why Shannon's H is often called "entropy" instead of "entropy per symbol" or "bits per symbol", section 1.6 of Shannon's paper refers to H as the "entropy of the set of probabilities" which is not the same as the entropy of N symbols. H is analogous to physics' [[Intensive and extensive properties|intensive]] entropy S_o, which is on a per mole or per kg basis and is different from the more common extensive entropy S.  In section 1.7 Shannon more explicitly says H is "entropy per symbol" or "bits per symbol".

It is instructive to see H values in bits per symbol (log2) for several examples of short messages by using an online entropy (bits per symbol) calculator: http://www.shannonentropy.netmark.pl/ and http://planetcalc.com/2476/
* H=0 for "A", "AAAAA", and "11111"
* H=1 for "AB", "ABAB", "0011", and "0110101001011001" (eight 1's and eight 0's)
* H=1.58 for "ABC", "ABCABCABCABC", and "aaaaBBBB2222"
* H=2 for "ABCD" and "AABBCCDD"

The entropy in each of these short messages is H*N.

Claude Shannon's 1948 paper was the first to define an "entropy" for use in information theory. His H function (formally defined below) is named after Boltzmann's [[H-theorem]] which was used to define physical entropy by S=kb*N*H of an ideal gas. Boltzmann's H is entropy in [[nats_(unit)|nats]] per particle, so each symbol in a message has an analogy to each particle in an ideal gas, and the probability of a specific symbol is analogous to the probability of a particle's microstate. The "per particle" basis does not work in solids because the particles are interacting.  But the information entropy maintains mathematical equivalency with bulk [[Intensive and extensive properties|intensive]] physical entropy S_o, which is on a per mole or per kilogram basis (molar entropy and specific entropy). H*N is mathematically analogous to total [[Intensive and extensive properties|extensive]] physical entropy S. If the probability of a symbol in a message represents the probability of a microstate per mole or kg, and each symbol represents a specific mole or kg, then S=kb*ln(2)*N*H, with H in bits. Temperature is a measure of the average kinetic energy per particle in an ideal gas (Kelvins = Joules*2/3/kb) so the Joules/Kelvins units of kb are fundamentally unitless (Joules/Joules), so S is fundamentally an entropy in the same sense as H.
[[Landauer's principle]] shows a change in information entropy is equal to a change in physical entropy S when the information system is perfectly efficient. In other words, a device can't irreversibly change its information entropy content without causing an equal or greater increase in physical entropy.   Sphysical=k*ln(2)*(H*N)bits where k, being greater than Boltzmann's constant kb, represents the system's inefficiency at erasing data and thereby losing energy previously stored in bits as heat dQ=T*dS.

on the information and entropy article:  (my best version)

The Shannon entropy H in information theory has units of bits per symbol (Chapter 1, section 7). For example, the messages "AB" and "AAABBBABAB" both have Shannon entropy H=log2(2) = 1 bit per symbol because the 2 symbols A and B occur with equal probability.  So when comparing it to physical entropy, the physical entropy should be on a "per quantity" basis which is called "[[Intensive_and_extensive_properties|intensive]]" entropy instead of the usual total entropy which is called "extensive" entropy.  The "shannons" of a message are its total "extensive" information entropy, which is H times the number of symbols in the message.
A direct and physically real relationship between H and S can be found by assigning a symbol to each microstate that occurs per mole, kilogram, volume, or particle of a homogeneous substance, then calculating the H of these symbols. By theory or by observation, the symbols (microstates) will occur with different probabilities and this will determine H. If there are N moles, kilograms, volumes, or particles of the unit substance, the relationship between H (in bits per unit substance) and physical extensive entropy in nats is:
:S = k_\mathrm{B} \ln(2) N H
where ln(2) is the conversion factor from base 2 of Shannon entropy to the natural base e of physical entropy.  N*H is the amount of information in bits needed to describe the state of a physical system with entropy S.  [[Landauer's principle]] demonstrates the reality of this by stating the minimum energy E required (and therefore heat Q generated) by an ideally efficient memory change or logic operation by irreversibly erasing or merging N*H bits of information will be S times the temperature which is
:E = Q = T k_\mathrm{B} \ln(2) N H
where H is in informational bits and E and Q are in physical Joules. This has been experimentally confirmed.{{Citation |author1=Antoine Bérut |author2=Artak Arakelyan |author3=Artyom Petrosyan |author4=Sergio Ciliberto |author5=Raoul Dillenschneider |author6=Eric Lutz |doi=10.1038/nature10872 |title=Experimental verification of Landauer’s principle linking information and thermodynamics |journal=Nature |volume=483 |issue=7388 |pages=187–190 |date=8 March 2012 |url=http://www.physik.uni-kl.de/eggert/papers/raoul.pdf|bibcode = 2012Natur.483..187B }}
Temperature is a measure of the average kinetic energy per particle in an ideal gas (Kelvins = 2/3*Joules/kb) so the J/K units of kb are fundamentally unitless (Joules/Joules). kb is the conversion factor from energy in 3/2*Kelvins to Joules for an ideal gas. If kinetic energy measurements per particle of an ideal gas were expressed as Joules instead of Kelvins, kb in the above equations would be replaced by 3/2. This shows S is a true statistical measure of microstates that does not have a fundamental physical unit other than "nats" which is just a statement of which logarithm base was chosen by convention.

8 molecules of gas, with 2 equally probable internal energy states and 2 equally probable positions in a box.  If at first they are on the same side of the box with half in 1 energy state and the other half in the other energy state, then the state of the system could be written ABABABAB. Now if they are allowed to distribute evenly in the box keeping their previous energy levels it is written ABCDABCD.   For an ideal gas:   S ~ N*ln(U*V) where U and V are averages per particle.

An ideal gas uses internal energy and volume.  Solids seem to not include volume, and depend on phonons which have bond-stretching energies in place of rotational energies.  But it seems like solids should follow S~N*ln(U) = k*ln(2)*H where H would get into quantum probabilities of each energy state.  The problem is that phonon waves across the solid are occurring.  So instead of U, an H like this might need to be calculated from phase-space (momentum and position of the atoms and their electrons?)
comment Wikipedia
Leegrc, "bits" are the invented name for when log base 2 is used. There is, like you say, no "thing" in the DATA itself you can point to. Pointing to the equation itself to declare a unit is, like you are thinking, suspicious. But physical entropy itself is in "nats" for natural units for the same reason (they use base "e"). The only way to take out this "arbitrary unit" is to make the base of the logarithm equal to the number of symbols. The base would be just another variable to plug a number in. Then the range of the H function would stay between 0 and 1. Then it is a true measure of randomness of the message per symbol. But by sticking with base two, I can look at any set of symbols and know how many bits (in my computing system that can only talk in bits) would be required to convey the same amount of information. If I see a long file of 26 letters having equal probability, then I need H = log2(26) = 4.7 bits to re-code each letter in 1's and 0's. There are H=4.7 bits per letter.
PAR, as far as I know, H should be used blind without knowledge of prior symbol probabilities, especially if looking for a definition of entropy. You are talking about watching a transmitter for a long time to determine probabilities, then looking at a short message and using the H function with the prior probabilities.

Let me give an example of why a blind and simple H can be extremely useful. Let's say there is a file that has 8 bytes in it. One moment it says AAAABBBB and the next moment it says ABCDABCD. I apply H blindly not knowing what the symbols represent. H=1 in the first case and H=2 in the second. H*N went from 8 to 16. Now someone reveals the bytes were representing microstates of 8 gas particles. I know nothing else. Not the size of the box they were in, not if the temperature had been raised, not if a partition had been lifted, and not even if these were the only possible microstates (symbols). But there was a physical entropy change everyone agrees upon from S1=kb*ln(2)*8 to S2=kb*ln(2)*16. So I believe entropy H*N as I've described it is as fundamental in information theory as it is in physics. Prior probabilities and such are useful, but how they are used needs to be defined. H on a per message basis will be the fundamental input to those other ideas, not to be brushed aside or detracted from.
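
The 8-byte example can be worked through in code. This is a sketch with function names of my own choosing; it applies H blindly to each message and then converts N*H to physical entropy with S = kb*ln(2)*N*H.

```python
from collections import Counter
from math import log, log2

K_B = 1.380649e-23  # Boltzmann constant, J/K

def shannon_entropy(message: str) -> float:
    """H in bits per symbol from observed symbol counts."""
    n = len(message)
    return sum((c / n) * log2(n / c) for c in Counter(message).values())

def extensive_bits(message: str) -> float:
    """N*H, the total ("extensive") information entropy in bits."""
    return len(message) * shannon_entropy(message)

def physical_entropy(message: str) -> float:
    """S = kb*ln(2)*N*H in J/K, treating each symbol as one particle's microstate."""
    return K_B * log(2) * extensive_bits(message)

before, after = "AAAABBBB", "ABCDABCD"
delta_s = physical_entropy(after) - physical_entropy(before)

if __name__ == "__main__":
    print(extensive_bits(before), extensive_bits(after))
    print("S2 - S1 in units of kb:", delta_s / K_B)
```

The change comes out as kb*ln(2)*8, which matches kb*N*ln(2) for N = 8 particles.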
I agree you can shorten up the H equation by entering the p's directly by theory or by experience. But you're doing the same thing as me when I calculate H for large N, but I do not make any assumption about the symbol probabilities. You and I will get the same entropy H and "extensive" entropy N*H for a SOURCE. Your N*H extensive entropy is -N*sum(p*log(p)). The online entropy calculators and I use N*H = N*sum[ count/N*log(N/count) ] (they usually give H without the N). These are equal for large N if the source and channel do not change. "My" H can immediately detect if a source has deviated from its historical average. "My" H will fluctuate around the historical or theoretical average H for small N. You should see this method is more objective and more general than your declaration it can't be applied to a file or message without knowing prior p's. For example, let a partition be removed to allow particles in a box to get to the other side. You would immediately calculate the N*H entropy for this box from theory. "My" N*H will increase until it reaches your N*H as the particles reach maximum entropy. This is how thermodynamic entropy is calculated and measured. A message or file can have an H entropy that deviates from the expected H value of the source.
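
The claim that the count-based H approaches the probability-based H for large N can be illustrated with a small simulation. This is a sketch only: the three-symbol source with probabilities 0.5, 0.25, 0.25 and the fixed random seed are assumptions I chose for the example.

```python
import random
from collections import Counter
from math import log2

def shannon_entropy(symbols) -> float:
    """Count-based H in bits per symbol for any sequence of symbols."""
    n = len(symbols)
    return sum((c / n) * log2(n / c) for c in Counter(symbols).values())

def source_entropy(probabilities) -> float:
    """Probability-based H = -sum(p*log2(p)) for a source with known p's."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

SYMBOLS = "ABC"
PROBS = [0.5, 0.25, 0.25]  # assumed source statistics for illustration

def sample_message(n: int, seed: int):
    """Draw n independent symbols from the assumed source."""
    return random.Random(seed).choices(SYMBOLS, weights=PROBS, k=n)

if __name__ == "__main__":
    print("source H =", source_entropy(PROBS))
    for n in (10, 100, 10_000, 100_000):
        print(n, round(shannon_entropy(sample_message(n, seed=0)), 4))
```

Short messages scatter around the source value, and the scatter shrinks as N grows.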
The distinct symbols A, B, C, D are distinct microstates at the lowest level. The "byte" POSITION determines WHICH particle (or microvolume if you want) has that microstate: that is the level to which this applies. The entropy of any one of them is "0" by the H function, or "meaningless" as you stated. A sequence of these "bytes" tells the EXACT state of each particle and system, not a particular microstate (because microstate does not care about the order unless it is relevant to its probability). A single MACROstate would be combinations of these distinct states. One example macrostate of this is when the gas might be in any one of these 6 distinct states: AABB, ABAB, BBAA, BABA, ABBA, or BAAB. You can "go to a higher level" than using A and B as microstates, and claim AA, BB, AB, and BA are individual microstates with certain probabilities. But the H*N entropy will come out the same. There was not an error in my AAAABBBB example and I did not make an assumption. It was observed data that "just happened" to have equally likely probabilities (so that my math was simple). I just blindly calculated the standard N*H entropy, and showed how it gives the same result physics gets when a partition is removed and the macrostate H*N entropy went from 8 to 16 as the volume of the box doubled. The normal S increases S2-S1=kb*N*ln(2) as it always should when a mid-box partition is removed.
I can derive your entropy from the way the online calculators and I use Shannon's entropy, but you can't go the opposite way.
Now is the time to think carefully, check the math, and realize I am correct. There are a lot of problems in the article because it does not distinguish between intensive Shannon entropy H in bits/symbol and extensive entropy N*H in bits (or "shannons" to be more precise, to distinguish it from the "bit" count of a file which may not have 1's and 0's of equal probability).
BTW, the entropy of an ideal gas is S~N*log2(u*v) where u and v are internal energy and volume per particle. u*v gives the number of microstates per particle. Quantum mechanics determines that u can take on a very large number of values and v is the requirement that the particles are not occupying the same spot, roughly 1000 different places per particle at standard conditions. The energy levels will have different probabilities. By merely observing large N and counting, H will automatically include the probabilities.
In summary, there are only 3 simple equations I am saying. They precisely lay the foundation of all further information entropy considerations. These equations should replace 70% of the existing article. These are not new equations, but defining them and how to use them is hard to come across since there is so much clutter and confusion regarding entropy as a result of people not understanding these statements.
1) Shannon's entropy is "intensive" bits/symbol = H = sum[ count/N*log2(N/count) ] where N is the length of a message and count is for each distinct symbol.
2) Absolute ("extensive") information entropy is in units of bits or shannons = N*H.
3) S = kb*ln(2)*N*H where each N has a distinct microstate which is represented by a symbol. H is calculated directly from these symbols for all N. This works from the macro down to the quantum level.
Example: 3 interacting particles with sum total energy 2 and possible individual energies 0,1,2 may have possible energy distributions 011, 110, 101, 200, 020, or 002. I believe the order is not relevant to what is called a microstate, so you have only 2 symbols for 2 microstates, and the probability for each is 50-50. Maybe there is usually something that skews this towards low energies. I would simply call each one of the 6 "sub-micro states" a microstate and let the count be included in H. Assuming equal p's again, the first case gives log2(2)=1 and the 2nd log2(6)=2.58. I believe the first one is the physically correct entropy (the approach, that is, not the exact number I gave). If I had let 0,1,2 be the symbols, then it would have 3*1.46 = 4.38 which is wrong.
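
The three counts in this example can be reproduced by brute-force enumeration. This is a sketch with variable names of my own; like the text, it assumes equal probabilities for the first two cases, and for the third case it pools the symbols 0, 1, 2 over all six distributions.

```python
from collections import Counter
from itertools import product
from math import log2

LEVELS = (0, 1, 2)
TOTAL_ENERGY = 2

# All ordered ways for 3 particles to share a total energy of 2.
states = [s for s in product(LEVELS, repeat=3) if sum(s) == TOTAL_ENERGY]

# Order ignored: only the multisets {0,1,1} and {0,0,2} remain.
unordered = {tuple(sorted(s)) for s in states}

h_unordered = log2(len(unordered))  # equal probabilities assumed
h_ordered = log2(len(states))       # equal probabilities assumed

# Treat 0, 1, 2 as the symbols and pool them over all ordered states.
pooled = Counter(energy for s in states for energy in s)
n = sum(pooled.values())
h_symbol = sum((c / n) * log2(n / c) for c in pooled.values())

if __name__ == "__main__":
    print(len(states), "ordered states,", len(unordered), "unordered")
    print(h_unordered, round(h_ordered, 3), round(h_symbol, 3), round(3 * h_symbol, 3))
```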
Physically, because of the above, when saying S=k*ln(2)*NH, it requires that you look at specific entropy S_o and make it = k*ln(2)*H, so you'll have the correct H. This back-calculates the correct H. This assumes you are like me and can't derive Boltzmann's thermodynamic H from first (quantum or not) principles. I may be able to do it for an ideal gas. I tried to apply H to Einstein's oscillators (he was not aware of Shannon's entropy at the time) for solids, and I was 25% lower than his multiplicity, which is 25% lower than the more accurate Debye model. So a VERY simplistic approach to entropy with information theory was only 40% lower than experiment and good theory, for the one set of conditions I tried. I assumed the oscillators had only 3 energy states and got S=1.1*N*kb where Debye via Einstein said S=1.7*N*kb.
My point is this: looking at a source of data and choosing how we group the data into symbols can result in different values for H and NH, [edit: if not independent]. Using no grouping on the original data is no compression and is the only one that does not use an algorithm plus lookup table. Higher grouping on independent data means more memory is required with no benefit to understanding (better understanding=lower NH). People with bad memories are forced to develop better compression methods (lower NH), which is why smart people can sometimes be so clueless about the big picture, reading too much with high NH in their brains and thinking too little, never needing to reduce the NH because they are so smart. Looking for a lower NH by grouping the symbols is the simplest compression algorithm. The next step up is run-length encoding, a variable symbol length. All compression and pattern recognition create some sort of "lookup table" (symbols = weighting factors) to run through an algorithm that may combine symbols to create on-the-fly higher-order symbols in order to find the lowest NH to explain higher original NH. The natural, default non-compressed starting point should be to take the data as it is and apply the H and NH statistics, letting each symbol be a microstate. Perfect compression for generalized data is not a solvable problem, so we can't start from the other direction with an obvious standard.
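
The effect of grouping on N*H can be seen with two short messages. This is a sketch only: the helper name `block_nh` and the two example strings are my own, chosen so that one message has dependent neighbours and the other does not.

```python
from collections import Counter
from math import log2

def block_nh(message: str, block: int) -> float:
    """N*H in bits when the message is cut into non-overlapping
    symbols of `block` characters each."""
    symbols = [message[i:i + block] for i in range(0, len(message), block)]
    n = len(symbols)
    # N*H = N * sum(count/N * log2(N/count)) = sum(count * log2(N/count))
    return sum(c * log2(n / c) for c in Counter(symbols).values())

dependent = "AB" * 8      # strictly alternating, so neighbours are dependent
independent = "AAABBABB"  # every pair AA, AB, BA, BB occurs once

if __name__ == "__main__":
    print(block_nh(dependent, 1), block_nh(dependent, 2))
    print(block_nh(independent, 1), block_nh(independent, 2))
```

Grouping the alternating message into pairs drops N*H from 16 bits to 0, while the message with no pair structure stays at 8 bits either way.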
This lowering of NH is important because compression is 1 of 3 requirements for intelligence. Intelligence is the ability to acquire highest profit divided by noise*log(memory*computation) in the largest number of environments. Memory on a computing device has a potential energy cost and computation has a kinetic energy cost. The combination is internal energy U. Specifically, for devices with a fixed volume, in both production machines and computational machines, profit = Work output/[k*Temp*N*ln(U/N)] = Work/(kTNH). This is Carnot efficiency W/Q, except the work output includes acquisition of energy from the environment so that the ratio can be larger than 1. The thinking machine must power itself from its own work production, so I should write (W-Q)/Q instead. W-Q feeds back to change Q to improve the ratio. The denominator represents a thinking machine plus its body (environment manipulator) that moves particles, ions (in brains), or electrons (in computers) to model much larger objects in the external world to try different scenarios before deciding where to invest W-Q. "Efficient body" means trying to lower k for a given NH. NH is the thinking machine's algorithmic efficiency for a given k. NH has physical structure with U losses, but that should be a conversion factor moved out to be part of the kT so that NH could be a theoretical information construct. The ultimate body is bringing kT down to kb at 0 C. The goal of life and a more complete definition of intelligence is to feed Work back to supply the internal energy U and to build physical structures that hold more and more N operating at lower and lower k*T. A Buddhist might say we only need to stop being greedy and stop trying to raise N (copies of physical self, kT, aka the number of computations) and U and we could leave k alone. This assumes constant volume, otherwise replace U/N with V/N*(U/N)^(3/2) (for an ideal gas, anyway; maybe UV/NN is ok for solids).
Including volume means we want to make it lower to lower kTNH. So a denser thinking machine.  The universe itself increases V/N (Hubble expansion) but it cancels in determining Q because it causes U/N to decrease at possibly the same rate. This keeps entropy and energy CONSTANT on a universal COMOVING basis (ref: Weinberg's famous 1977 book "First 3 Minutes"), which causes entropy to be emitted (not universally "increased" as the laymen's books still say) from gravitational systems like Earth and Galaxies. The least action principle (the most general form of Newton's law, better than Hamiltonian & Lagrangian for developing new theories, see Feynman's red books) appears to me to have an inherent bias against entropy, preferring PE over KE over all time scales, and thereby tries to lower temp and raise the P.E. part of U for each N on Earth. This appears to be the source of evolution and why machines are replacing biology, killing off species 50,000 times faster than the historical rate. The legal requirement of all public companies is to dis-employ workers because they are expensive and to extract as much wealth from society as possible so that the machine can grow. Technology is even replacing the need for shareholders and skill (2 guys started MS, Apple, google, youtube, facebook, and snapchat and you can see the trend in decreasing intelligence and age and increasing random luck needed to get your first $billion). Silicon, carbon-carbon, and metals are higher energy bonds (which least action prefers over kinetic energy) enabling lower N/U and k, and even capturing 20 times more Work energy per m^2 than photosynthesis. Ions that brains have to model objects with still weigh 100,000 times more than the electrons computers use.
In the case of the balance and 13 balls, we applied the balance like asking a question and organized things to get the most data out of the test. We may seek more NH answers from people or nature than we give in order to profit, but in producing W, we want to spend as little NH as possible.
[edit: I originally backtracked on dependency but corrected it, and I made a lot of errors with my ratios from not letting k be positive for the ln().]Ywaz (talk) 23:09, 8 December 2015 (UTC)

Rubik's cube
Let the number of quarter turns from a specific disorder to ordered state be a microstate with a probability based on number of turns.  Longer routes to solution are more likely.  The shortest route uses the least energy which is indicative of intelligence. That's why we prize short solutions.  The N in N*H = N*sum(count/N*log2(N/count)) is the number of turns.  Count is the number of different routes with that N.  Speaking in terms Carnot understood and reversing time, the fall from low entropy to high entropy should be fast in order to minimize work production (work absorption when forward in time).  The problem is always determining if a single particular step is the most efficient towards the goal or not. There's no incremental measure for profit increase in the problems we can't solve. Solving problems is a decrease in entropy. Is working backwards to generate the most mixed up state in as few turns as possible a beginning point?

Normalized entropy
There may not be a fundamental cost to a large memory, i.e., a large number of symbols aka classifications. That is potential energy which is an investment in infrastructure. Maybe there is a "comparison", "lookup", or even transmission cost (more bits are still required), but maybe those are reversible. Maybe you can take large entropy, classify it the same as with microbits, and the before and after entropy cost is the same, but maybe computation is not reversible (indeed, dispersion calls it into question) but memory access is, so a larger memory set for classification costs less energy in computation.  If there are no obvious dependencies to the H function, the larger number of symbols absorbing more bits/symbol in the message will have the same NH.  If H is not equal to 1 for binary, then this may be an unusual situation.  Let i=1 to n, where n is the number of unique symbols in a message N symbols long; then H=sum(count_i/N*log(N/count_i)). So

H' = H*ln(2)/ln(n) where "2" should be replaced if the H was not calculated with log base 2.

 = normalized entropy where each "bit" position can take on n states, i.e., an access to a memory of symbols has n choices, the number of symbols, aka the number of distinct microstates.  H' varies from 0 to 1, which Shannon did not point out; he let "entropy" vary in definition based on the log base. So if there is not a cost to accessing a large memory bank and it speeds up computation or discussion, then even if NH is the same, NH' will be lower for the larger memory bank (the division makes it smaller).  But again, NH is only the same between the two anyway if symbol frequencies are equal, and this occurs only if you see no big patterns at all and it looks like noise in both cases, or if you have really chosen the optimum set of symbols for the data (physics equations seem to be seeking smallest NH, not smallest NH').  It seems like a larger number of symbols available will almost always detect more patterns, with no problems about narrow-mindedness if the training data set is large enough. The true "disorder" of a physical object relative to itself (per particle or per symbol) rather than to its mass or size (So) or information is H' = S/ln(number of microstates).  If one is trying to compare the total entropy of objects or data that use completely different sets of microstates (symbols), the measure N*H' is still completely objective.

With varying-length symbols, NH' has to lose all connection with physical entropy and energy on a per bit level.  It has to be strictly on a per symbol level.  Varying symbols lose all connection to physics. Consider the following:  n times symbol bit length no longer represents a measure of the bit memory space required because bit length is varying.  N divided by symbol bit length is also not the number of symbols in a message.  There are more memory locations addressed, with total memory required being n*average symbol length.  It seems like it would be strictly "symbols only", especially if the log base is chosen based on n.  The log base loses the proportional connection to "energy" (if energy were still connected to number of bits instead of symbols).

For varying-length symbols there is also a problem when trying to relate computations to energy and memory even if I keep base 2.  To keep varying symbol lengths on a bit basis for the entropy in regards to computation energy required per information entropy, I need the following: For varying bit-length symbols where symbol i has bit length Li, I get
H=1/N*sum ( count_i * Li * log2(N/(count_i*Li) ) )
This is average bit (and real entropy since N is in bits) variation per bit communicated.
where N has to be in bits instead of symbols and the log base has to be in bits. Inside the log is the ability of the count to reduce the number of bits needed to be transmitted.  Remember for positive H it is p*log(1/p), so this represents a higher probability of "count" occurring for a particular bit if L is higher.  I don't know how to convert the log to a standard base where it varies from 0 to 1. Because they are varying length, it kind of loses relevance. Maybe the sum of the frequency of each symbol occurring times its length, divided by the average length: sum(count_i/N*Li/(sum(L)/n))
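As a rough check of the bit-weighted formula above, here is a minimal Python sketch. It assumes each symbol is given as a string of '0'/'1' characters so that Li is simply the string length; the function names `bit_weighted_entropy` and `symbol_entropy` are made up for this illustration. When every symbol has the same length, the bit-weighted value should reduce to the ordinary per-symbol H, which is what the first printed pair shows.

```python
from collections import Counter
from math import log2

def bit_weighted_entropy(symbols):
    """H = 1/N * sum(count_i * L_i * log2(N / (count_i * L_i))), with N in bits."""
    counts = Counter(symbols)
    n_bits = sum(len(s) for s in symbols)   # N: total message length in bits
    total = 0.0
    for sym, count in counts.items():
        weight = count * len(sym)           # bits of the message occupied by symbol i
        total += weight * log2(n_bits / weight)
    return total / n_bits

def symbol_entropy(symbols):
    """Ordinary Shannon H per symbol: sum(count_i/M * log2(M/count_i))."""
    m = len(symbols)
    return sum(c / m * log2(m / c) for c in Counter(symbols).values())

# Hypothetical example messages, already split into symbols.
fixed = ["00", "01", "10", "11", "00", "01"]     # all symbols 2 bits long
varying = ["0", "10", "110", "0", "0", "10"]     # symbol lengths 1, 2 and 3 bits

print(bit_weighted_entropy(fixed), symbol_entropy(fixed))      # the two values agree
print(bit_weighted_entropy(varying), symbol_entropy(varying))  # the two values differ
```

With varying lengths the two numbers no longer agree, which is the loss of the simple bits-per-symbol connection described in the text.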

Log base two on varying symbol bit-lengths loses its connection to bits required to re-code, so the above has to be used.

So in order to keep information connected to physics, you need to stick with log base 2 so you can count the changes in energy accurately. The symbols can be any number of bits long as long as they are all the same length.

So H' is a disorder measurement per symbol, comparable across all types of systems with varying number of n and N symbols. It varies from 0 to 1.   It is "per energy" only if there is a fixed energy cost per symbol transmission and storage without regard to symbol length. If someone has remembered a lot of symbols of varying lengths, they might know a sequence of bits immediately as 2 symbols and get a good NH' on the data at hand but not a better NH if they had remembered all sequences of that length. If they know the whole message as 1 symbol they already know the message it is 0 in both and the best in both.   Being able to utilize the symbol for profit is another matter (low k factor as above). A huge memory will have a cost that raises k. Knowing lots of facts but not being able to know how to use them is a high k (no profit).  Facts without profit are meaningless.  You might as well read data and just store it.

Maybe "patterns" or "symbols" or "microstates" to be recognized are the ones that lead to profit. You run scenarios and look at the output. That requires patterns as constraints and capabilities. Or look at profit patterns and reverse-invent scenarios of possible patterns under the constraints.

Someone who knows many symbols of a Rubik's cube is one who recognizes the patterns that lead to the quickest profit. The current state plus the procedure to take is a pattern that leads to profit.  The goal is speed. Energy losses for computation and turn cost are not relevant.  They will have the lowest NH', so H' is relevant.
Thanks for the link to indistinguishable particles. The clearest explanation seems to be here, [[Gibbs_paradox#The_mixing_paradox|the mixing paradox]]. The idea is this: if we need to know the kTNH energy required (think NH for a given kT) to return to the initial state at the level 010, 100, 001 with correct sequence from a certain final sequence, then we need to count the microstates at that low level. Going the other way, "my" method should be mathematically the same as "yours" if it is required to NOT specify the exact initial and final sequences, since those were implicitly not measured.  Measuring the initial state sequences without the final state sequences would be changing the base of the logarithm mid-stream. H is in units of true entropy per symbol when the base of the logarithm is equal to the number of distinct symbols.  In this way H always varies from 0 to 1 for all objects and data packets, giving a true disorder (entropy) per symbol (particle). You multiply by ln(2)/ln(n) to change base from 2 to n symbols. Therefore the ultimate objective entropy (disorder or information) in all systems, physical or information, when applied to data that accurately represents the source should be

Entropy = N*H = \sum_{i=1}^{n} count_i \log_n (N/count_i)

 Entropy = sum(count*logn(N/count))

where i=1 to n distinct symbols in data N symbols long. Shannon did not specify which base H uses, so it is a valid H.  To convert it to nats of normal physical entropy or entropy in bits, multiply by ln(n) or log2(n). The count/N is inverted to make H positive. In this equation, with the ln(2) conversion factor, this entropy of "data" is physically same as the entropy of "physics" if the symbols are indistinguishable, and we use energy to change the state of our system E=kT*NH where our computer system has a k larger than kb due to inefficiency.  Notice that changes in entropy will be the same without regard to k, which seems to explain why ultimately distinguishable states get away with using higher-level microstates definitions that are different with different absolute entropy. For thermo, kb is what appears to have fixed not caring about the deeper states that were ultimately distinguishable.

In the equation above, there is a penalty if you chose a larger symbol set. Maybe that accounts for the extra memory space required to define symbols.

The best wiki articles are going to be like this: you derive what is the simplest but perfectly accurate view, then find the sources using that conclusion to justify its inclusion.
So if particles (symbols) are distinguishable and we use that level of distinguishability, the count at the 010 level has to be used. Knowing the sequence means knowing EACH particle's energy. The "byte-position" in a sequence of bits represents WHICH particle. This is not mere symbolism because the byte positions on a computer have a physical location in a volume, so that memory core and CPU entropy changes are exactly the physical entropy changes if they are at 100% efficiency ([[Landauer's principle]]). (BTW the isotope method won't work better than different molecules because it has more mass. This does not affect temperature, but it affects pressure, which means the count has to be different so that pressure is the same. So if you do not do anything that changes P/n in PV=nRT, using different gases will have no effect to your measured CHANGE in entropy, and you will not know if they mixed or not. ) [[User:Ywaz|Ywaz]] ([[User talk:Ywaz|talk]]) 11:48, 10 December 2015 (UTC)

By using indistinguishable states, physics seems to be using a non-fundamental set of symbols, which allows it to define states that work in terms of energy and volume as long as kb is used.  The ultimate, as far as physicists might know, might be phase space (momentum and position) as well as spin, charge, potential energy and whatever else.  Momentum and position per particle are 9 more variables because unlike energy momentum is a 6D vector (including angular), and a precise description of the "state" of a system would mean which particle has the quantities matters, not just the total.  Thermo gets away with just assigning states based on internal energy and volume, each per particle. I do not see kb in the ultimate quantum description of entropy unless they are trying to bring it back out in terms of thermo. If charge, spin, and particles are made up of even smaller distinguishable things, it might be turtles all the way down, in which case, defining physical entropy as well as information entropy in the base of the number of symbols used (our available knowledge) might be best.

I didn't make it up. It's normally called normalized entropy, although they normally refer to this H with logn "as normalized entropy" when according to Shannon they should say "per symbol" and use NH to call it an entropy.  I'm saying there's a serious objectivity to it that I did not realize until reading about indistinguishable states.
I hope you agree "entropy/symbol" is a number that should describe a certain variation in a probability distribution, and that if a set of n symbols were made of continuous p's, then a set of m symbols should have the same continuous distribution.  But you can't do that (get the same entropy number) for the exact same "extrapolated" probability distributions if they use a differing number of symbols.  You have to let the log base equal the number of symbols. I'll get back to the issue of more symbols having a "higher resolution".  The point is that any set of symbols can have the same H and have the same continuous distribution if extrapolated.
If you pick a base like 2, you are throwing in an arbitrary element, and then have to call it (by Shannon's own words) "bits/symbol" instead of "entropy/symbol".  Normalized entropy makes sense because of the following
entropy in bits/symbol = log2(2^("avg" entropy variation/symbol))
entropy per symbol = logn(n^("avg" entropy variation/symbol))
The equation I gave above is the normalized entropy that gives this 2nd result.
Previously we showed for a message of N bits, NH=MH' if the bits are converted to bytes and you calculate H' based on the byte symbols using the same log base as the bits, and if the bits were independent.  M = number of byte symbols = N/8.  This is fine for digital systems that have to use a certain amount of energy per bit.  But what if energy is per symbol?  We would want NH = M/8*H' because the byte system used 8 fewer symbols.
By using log base n, H=H' for any set of symbols with the same probability distribution, and N*H=M/8*H.
Bytes can take on an infinite number of different p distributions for the same H value, whereas bits are restricted to a certain pair of values for p0 and p1 (or p1 and p0) for a certain H, since p0=1-p1. So bytes have more specificity, which could allow for higher compression or describing things like 6-vector momentum instead of just a single scalar for energy, using the same number of SYMBOLS. The normalized entropy allows them to have the same H to get the same kTNH energy without going through contortions.  So for N particles let's say bits are being used to describe each one's energy with entropy/particle H, and bytes are used to describe their momentums with entropy/particle H'.  Momentums uniquely describe the energy (but not vice versa). NH=NH'. And our independence property does not appear to be needed: H' can take on specific values of p's that satisfy H=H', not some sort of average of those sets. Our previous method of NH=MH' is not as nice, violating Occam's razor.

entropy and kolmogorov complexity
== Shannon entropy of a universal measure of Kolmogorov complexity   ==
What is the relation of Kolmogorov complexity to [[Information theory]]? It seems very close to Shannon entropy, in that both are maximised by randomness, although one seems to deal more with the message than the context. [[User:Cesiumfrog|Cesiumfrog]] ([[User talk:Cesiumfrog|talk]]) 00:03, 14 October 2010 (UTC)
:Shannon entropy is a statistical measure applied to data which shows how efficiently the symbols are transferring bits based solely on how many times the symbols occur relative to each other. It's like the dumbest, simplest way of looking for patterns in data, which is ironically why it is useful. Its biggest most basic use is for taking data with a large number of different symbols and stating how many bits are needed to say the same thing without doing any compression on the data.
:Kolmogorov complexity is the absolute extreme in the direction Shannon entropy takes step 0. It is the highest possible level of intelligent compression of a data set. It is not computable in most cases, it's just a theoretical idea.  But  Shannon entropy is always computable and blind. By "compression" I mean an algorithm has been added to the data, and the data reduced to a much smaller set that is the input to the algorithm to generate the original data. Kolmogorov is the combination in bits of the program plus the smaller data set.
:I would use Shannon entropy to help determine the "real" length in bits of a program that is proposing to be better than others at getting close to the idealized Kolmogorov complexity. Competing programs might be using different or larger sets of functions, each of which I would assign a symbol to. Programs that use a larger set of functions would be penalized when the Shannon entropy measure is applied. There are a lot of problems with this starting point that I'll get to, but I would assign a symbol to each "function" like a=next line, b=*,  e=OR, f=space between arguments, and so on. Numbers would remain number symbols 1,2,3, because they are already efficiently encoded.  Then the Shannon entropy (in bits) and therefore the "reduced" level attempted Kolmogorov complexity (Shannon entropy in bits) is
: N*H = - N \sum_i f_i \log_2 (f_i) = \sum_i count_i \log_2 (N/count_i)
:where i=1 to n is for each of n distinct symbols, f_i is "frequency of symbol i occurring in program",  and count_i is the number of times symbol i occurs in the program that is N symbols long.  This is the Shannon bits (the unit is called "shannons") in the program.   The reason this is a better measure of k is because if
:normalized H = \sum_i count_i/N \log_n (N/count_i)
:is < 1, then there is an inefficiency in the way the symbols themselves were used (which has no relevance to the efficiency of the logic of the algorithm) that is more efficient when expressed as bits.  Notice the log base is "n" in this 2nd equation. This H is normalized and is the truest entropy per symbol. To calculate things in this base use logn(x) = ln(x)/ln(n)
:But there is a big problem.  Suppose you want to define a function to be equal to a longer program and just assign a symbol to it, so the program length is reduced to nearly "1". Or maybe a certain function in a certain language is more efficient for certain problems.   So there needs to be a reference "language" (set of allowable functions) to compare one K program to the next. All standard functions have a known optimal expression in Boolean logic: AND, NOT, and OR. So by choosing those 3 as the only functions, any program in any higher-level language can be reduced back to a standard. Going even further, these can be reduced back to a transistor count or to a single universal logic gate like NAND or NOR, or a single universal reversible logic gate like Toffoli or Fredkin.  Transistors are just performing a function too, so I guess they are universal too, but I am not sure.  The universal gates can be wired to perform all logic operations and are Turing complete.  So any program in any language can be re-expressed in terms of a single fundamental function. The function itself will not even need a symbol in the program to identify it (as I'll show), so that the only code in the program is describing the wiring between "addresses". So the complexity describes the complexity of the physical connections, the path the electrons or photons in the program take, without regard to the distance between them.  Example:  the sequence of symbols ABCCDA&CEFA&G-A would mean "perform A NAND B and send output to C, perform C NAND D and send output to A and C, perform E NAND F and send output to A and G, go to A". To define input and output addresses, I would use something like ABD>(program)>G. The "goto" symbol "-" is not used if the program is expanded to express Levin's modified K complexity, which penalizes loops by expanding them. It thereby takes into account computational time (and therefore a crude measure of energy) as well as a different type of program length.
:The program length by itself could or should be a measure of the computational infrastructure required (measured as energy required to create the transistors), which is another reason it should be in terms of something like a single gate: so its cost to implement can be measured.  What's the cost to build an OR versus a Fourier transform? Answer: Count the required transistors or the NAND gates (which are made up of a distinct number of transistors). All basic functions have already had their Boolean logic and transistors reduced to a minimal level, so it's not arbitrary or difficult.
:I think this is how you can get a comparable K in all programs in all languages:  express them in a single function with distinct symbols representing the input and output addresses. Then calculate Shannon's N*H to get its K or Levin complexity in absolute physical terms. Using [[Landauer's principle]], counting the number of times a bit address (at the input or output of a NAND gate) changes state will give the total energy in joules that the program used, Joules=k*T*ln(2)*N where k is larger than Boltzmann's constant kb as a measure of the inefficiency of the particular physical implementation of the NAND gates. If a theoretically reversible gate is used there is theoretically no computational energy loss, except for the input and output changes.  [[User:Ywaz|Ywaz]] ([[User talk:Ywaz|talk]]) 22:20, 12 December 2015 (UTC)
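:A minimal Python sketch of the two calculations in this comment, under stated assumptions: every character of the example wiring string (including "&" and "-") is treated as one symbol, the temperature of 300 K and the count of 1e9 state changes are arbitrary illustration values, and the names `total_shannon_bits` and `landauer_joules` are hypothetical. The `inefficiency` factor stands in for k/kb.

```python
from collections import Counter
from math import log, log2

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def total_shannon_bits(program):
    """N*H = sum(count_i * log2(N / count_i)) over the symbols of the program."""
    n = len(program)
    return sum(c * log2(n / c) for c in Counter(program).values())

def landauer_joules(state_changes, temperature_k=300.0, inefficiency=1.0):
    """Joules = k*T*ln(2)*N, with k = inefficiency * k_B and inefficiency >= 1."""
    return inefficiency * K_B * temperature_k * log(2) * state_changes

program = "ABCCDA&CEFA&G-A"          # the NAND wiring example from the text
print(total_shannon_bits(program))    # Shannon bits in the wiring description
print(landauer_joules(1e9))           # ideal-limit energy for 1e9 bit changes at 300 K
print(landauer_joules(1e9, inefficiency=1000.0))  # same count on a 1000x less efficient gate
```

:The second number only gives the Landauer lower bound; a real implementation would need a measured inefficiency factor, which this sketch does not attempt to estimate.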

simplifying ST equation for ideal gas and looking at it from an entropy view.
== Derivation from uncertainty principle and relation to information entropy ==
Volume and energy are not the direct source of the states that generate entropy, so I wanted to express it in terms of x*p/h' number of  states for each N.  Someone above asked for a derivation from the uncertainty principle ("the U.P.") and he says it's pretty easy. S-T pre-dates U.P., so it may be only for historical reasons that a more efficient derivation is not often seen.

The U.P. says x'p'>h/4pi where x'p' are the standard deviations of x and p.  The x and p below are the full range, not the 34.1% of the standard deviation so I multiplied x and p each by 0.341. It is 2p because p could be + or - in x,y, and z. By plugging the variables in and solving, it comes very close to the S-T equation. For Ω/N=1000 this was even more accurate than S-T, 0.4% lower.

Sackur-Tetrode equation:

S = k_B \ln\left(\frac{\Omega^N}{N!}\right) \approx k_B N \left(\ln\left(\frac{\Omega}{N} \right) +1\right)


\Omega= \left(\frac{2px}{\hbar/2}\right)^{3} = \left(\frac{2p_x x_x}{\hbar/2}\right)\left(\frac{2p_y x_y}{\hbar/2}\right)\left(\frac{2p_z x_z}{\hbar/2}\right)

\sigma=0.341, x=\sigma V^\frac{1}{3},  p = \sigma\left(\frac{2mU*b}{N!}\right)^\frac{1}{2}, U*b=K.E.=3/2 k_B T

Stirling's approximation N!=(N/e)^N is used in two places, which results in a 1/N^(5/2) and an e^(5/2); that is where the 5/2 factor comes from.  The molecules' internal energy U is kinetic energy for the monoatomic gas case to which the S-T equation applies. b=1 for monoatomic gases, and it may simply be changed for other non-monoatomic gases that have a different K.E./U ratio. The equation for p is the only difficult part of getting from the U.P. to the S-T equation, and it is difficult only because the thermodynamic measurements T (kinetic energy per atom) and V are an energy and a distance where the U.P. needs x*p or t*E. This strangeness is where the 3/2 and 5/2 factors come from. The 2m is to get 2*m*(1/2)*m*v^2 = p^2, where v is the speed (not the volume V). Boltzmann's entropy assumes it is a max for the given T, V, and P, which I believe means the N's are evenly distributed in x^3 and assumes all are carrying the same magnitude of momentum p.
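The size of the error introduced by Stirling's approximation can be checked numerically. The sketch below compares ln(Ω^N/N!) computed with the log-gamma function against N*(ln(Ω/N)+1); the choice Ω/N=1000 follows the example above, while the particular N values and the function names are assumptions made for illustration.

```python
from math import lgamma, log

def ln_exact(omega, n):
    """ln(omega^N / N!), using lgamma(N + 1) = ln(N!)."""
    return n * log(omega) - lgamma(n + 1)

def ln_stirling(omega, n):
    """N * (ln(omega / N) + 1), which follows from N! ~ (N/e)^N."""
    return n * (log(omega / n) + 1)

def relative_error(n, states_per_particle=1000):
    omega = states_per_particle * n
    exact = ln_exact(omega, n)
    return (ln_stirling(omega, n) - exact) / exact

for n in (10, 1000, 100000):
    print(n, relative_error(n))   # the relative error shrinks as N grows
```

The approximation always overestimates slightly because the simple form of Stirling's formula drops the sqrt(2*pi*N) factor, which matters only for small N.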

To show how this can come directly from information theory, first remember that Shannon's H function is valid only for a random variable. In this physical case, there are only 2 possible values that each phase space slot can have: with an atom in it, or not, so it is like a binary file. But unlike normal information entropy, some or many of the atoms may have zero momentum as the others carry more: the total energy just has to be the same. So physical entropy can use anywhere from N to 1 symbols (atoms) to carry the same message (the energy), whereas information entropy is stuck with N. Where information entropy has a log(1/N^N)=-N*log(N) factor, physical entropy has log(1/N!)≈-N*log(N)+N (natural log, by Stirling's approximation), which is a higher entropy.  My correction to Shannon's H shown below is known as the sum of the surprisals and is equal to the information (entropy) content in bits. The left sum is the information contribution from the empty phase space slots and the right sum is from those where an atom occurs. The left sum is about 0.7 without regard to N (1 if ln() had been used) and the right sum is about 17*N for a gas at room temperature (for Ω/N ~ 100,000 states/atom):

H*N = \sum_{i=1}^{\Omega-N} \log_2\left(\frac{\Omega}{\Omega-i N/(\Omega-N)}\right) + \sum_{j=0}^{N-1} \log_2\left(\frac{\Omega}{N-j}\right) \approx \log_2\left(\frac{\Omega^N}{N!}\right)

Physical entropy S then comes directly from this Shannon entropy:

S = H*N k_B \ln(2)
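A small numerical sketch of the step above, limited to the right-hand sum (the occupied slots). Because the product of (N-j) for j=0..N-1 is N!, that sum equals log2(Ω^N/N!) exactly, and multiplying by k_B*ln(2) converts the bits to J/K. The values of N and Ω and the function names are assumptions chosen only to keep the run fast; the left-hand sum over empty slots is not evaluated here.

```python
from math import lgamma, log, log2

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def occupied_surprisal_bits(omega, n):
    """Right-hand sum: sum over j = 0..N-1 of log2(omega / (N - j))."""
    return sum(log2(omega / (n - j)) for j in range(n))

def log2_omega_n_over_factorial(omega, n):
    """log2(omega^N / N!), evaluated with lgamma to avoid huge numbers."""
    return (n * log(omega) - lgamma(n + 1)) / log(2)

def physical_entropy(bits):
    """S = (H*N) * k_B * ln(2), converting Shannon bits to J/K."""
    return bits * K_B * log(2)

n = 1000
omega = 100000 * n          # about 100,000 states per atom, as in the text
bits = occupied_surprisal_bits(omega, n)
print(bits, log2_omega_n_over_factorial(omega, n))  # the two agree
print(physical_entropy(bits))                        # entropy in J/K for these N atoms
```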

A similar procedure can be applied to phonons in solids to go from information theory to physical entropy. For reference here is a 1D oscillator in solids:

S = k_B \left[ \ln \left( \frac{k_B T}{hf}\right) +1 \right]
[[User:Ywaz|Ywaz]] ([[User talk:Ywaz|talk]]) 02:37, 15 January 2016 (UTC)
email to Jon Bain, NYU 1/17/2016  (this contains errors.  See above for the best stuff.)
You have a nice entropy presentation, but I wanted to show you Shannon's H is an "intensive" entropy which is entropy per symbol of a message as he states explicitly in section 1.7 of his paper.  This has much more of a direct analogy to Boltzmann's H-theorem that Shannon says was his inspiration.

Specific entropy in physics is S/kg or S/mole which is what both the H's are. Regular extensive entropy S for both Shannon and Boltzmann is:


So conceptually it seems the only difference between Shannon/Boltzmann and any other physical entropy is that Shannon/Boltzmann make use of independent (non-interacting) symbols (particles) because they are available. It seems like they could be forced to generalize towards Gibbs or QM entropy as needed, mostly by just changing definitions of the variables (semantics).

Boltzmann's N particles
Let O{N!} = "O!" = # of QM-possible mAcrostates for N at E
Only a portion of N may carry E (and therefore p) and each Ni may not occupy less than (xp/h')^3 phase space:
O{N!} = (2xp/h')^3N = [2x*sqrt(2m*E/N!)/h']^3N ,   N!=(N/e)^N
2xp because p could be + or -.
ln [ (O{N!})^3 / N! ] = ln [O^3N*(1/e)^(-3/2*N) / (N/e)^N ] = 5/2*N*[ln(O/N)+1]
O= 2xP/N/h' = 2xp/h'
In order to count O possible states I have to consider than some of the Ni's did not have the minimum xp'/h' which is why S-T has error for small N: they can't be zero by QM.
S=k*ln(O{N!}/N!)=5/2*k*N*[ln(O/N) + 1]
   results in a constant in the S-T equation:  S=kN(ln(O/N) + 5/2)

S = k*ln(Gamma) = k*ln("O!"/N!) = k*5/2*N*(ln(O/N)+1)
S = k*N*(-1*sum[Oi/N*(ln(N/Oi) - const)])
S = k*N*sum[Oi/N*const - Oi/N*ln(N/Oi)]

Shannon: N=number of symbols in the data, O^O=distinct symbols,


So the Shannon entropy appears to me to be Boltzmann's entropy except that particles can be used only once and symbols can be used a lot.  It seems to me Shannon entropy math could be applied to Boltzmann's physics (or vice versa) to get the right answer with only a few changes to semantics.

If I sent messages with uniquely-colored marbles from a bag with gaps in the possible time slots, I am not sure there is a difference between Shannon and Boltzmann entropy.

Can the partition function be expressed as a type of N!/O! ?

The partition function's macrostates seems to work backwards to get to Shannon's H because measurable macrostates are the starting point.

The S-T equation for a mono-atomic gas is S=kN*(ln(O/N)+5/2) where O is (2px/h')^3=(kT/h'f)^3 in the V=x^3 volume, where h' is an adjusted h-bar to account for p*x in the uncertainty principle being only 1 standard deviation (34.1%) instead of 100% of the possibilities, so h'=h-bar/(0.341)^2, and there are no other constants or pi's in the S-T equation except for 5/2, which comes from [(xp/h')^3N]/N! = [x*sqrt(2m*E/N!)/h']^3 ~ (sqrt(E/e))^(3N*sqrt(E)) / (N/e)^N = 5/2

k is just a conversion factor from the kinetic energy per N as measured by temperature in order to get it into joules when you use dQ=dS*T.  If we measure the kinetic energy of the N's in Joules instead of temp, then k is absent. Of course ln(2) is the other man-made historical difference.

If the logarithm base is made N, then it is a normalized entropy that ranges from 0 to 1 for all systems, giving a universal measure of how close the system is to maximum or minimum entropy.

I think your equations should keep "const" inside the parenthesis like this because it depends on N and I believe it is a simple ratio. I think the const is merely the result of "sampling without replacement" factorials.
They say Boltzmann's entropy is when the system has reached thermal equilibrium so that S=N*H works, or something like that, and that Gibbs is more general, where the system's macrostate measurements may have a lower entropy than Boltzmann's calculation.  I think this is like a message where you have calculated the p's, but it sends you something different.  Or that you calculate S=N*H and the source's H turns out to be different in the future.
post to rosettacode.org on "entropy" where they give code for H in many languages.

==  This is not exactly "entropy". H is in bits/symbol. ==
This article is confusing Shannon entropy with information entropy and incorrectly states Shannon entropy H has units of bits.

There are many problems in applying H= -1*sum(p*log(p)) to a string and calling it the entropy of that string. H is called entropy but its units are bits/symbol, or entropy/symbol if the correct log base is chosen. For example, H of 01 and 011100101010101000011110 are exactly the same "entropy", H=1 bit/symbol, even though the 2nd one obviously carries more information entropy than the 1st.  Another problem is that if you simply re-express the same data in hexadecimal, H gives a different answer for the same information entropy. The best and real information entropy of a string is 4) below. Applying 4) to binary data gives the entropy in "bits", but since the data was binary, its units are also a true "entropy" without having to specify "bits" as a unit.
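Both problems can be seen in a few lines of Python. This is only an illustrative sketch: the function name `shannon_h` is arbitrary, and the hex string is produced by grouping the 24 bits of the example into 4-bit digits.

```python
from collections import Counter
from math import log2

def shannon_h(s):
    """H = sum(count_i/N * log2(N/count_i)), in bits/symbol."""
    n = len(s)
    return sum(c / n * log2(n / c) for c in Counter(s).values())

short = "01"
long_ = "011100101010101000011110"

# Re-express the same 24 bits as 6 hexadecimal digits.
as_hex = format(int(long_, 2), "x").zfill(len(long_) // 4)

print(shannon_h(short), shannon_h(long_))         # same H for both binary strings
print(len(short) * shannon_h(short),
      len(long_) * shannon_h(long_))               # N*H separates them: 2 vs 24 bits
print(as_hex, shannon_h(as_hex),
      len(as_hex) * shannon_h(as_hex))             # hex form gives a different H and N*H
```

The hex result depends on the handful of digits that happen to occur in such a short string, which is part of why H alone is a poor stand-in for the information content of a particular string.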

Total entropy (in an arbitrarily chosen log base, which is not the best type of "entropy") for a file is S=N*H where N is the length of the file. Many times in [http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20Theory%20of%20Communication.pdf Shannon's book] he says H is in units of "bits/symbol", "entropy/symbol", and "information/symbol".  Some people don't believe Shannon, so [https://schneider.ncifcrf.gov/ here's a modern respected researcher's home page] that tries to clear the confusion by stating the units out in the open.

Shannon called H "entropy" when he should have said "specific entropy" which is analogous to physics' S0 that is on a per kg or per mole basis instead of S.  On page 13 of [http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20Theory%20of%20Communication.pdf Shannon's book], you easily can see Shannon's horrendous error that has resulted in so much confusion.  On that page he says H, his "entropy", is in units of "entropy per symbol". This is like saying some function "s" is called "meters" and its results are in "meters/second". He named H after Boltzmann's H-theorem where H is a specific entropy on a per molecule basis.  Boltzmann's entropy S = k*N*H = k*ln(states).

There are 4 types of entropy of a file N symbols long with n unique types of symbols:

1) Shannon (specific) entropy '''H = sum(count_i / N * log(N / count_i))'''
where count_i is the number of times symbol i occurred in N.
Units are bits/symbol if log is base 2, nats/symbol if natural log.

2) Normalized specific entropy: '''Hn = H / log(n).'''
The division converts the logarithm base of H to n. Units are entropy/symbol. Ranges from 0 to 1. When it is 1 it means each symbol occurred equally often, N/n times. Near 0 means all symbols except 1 occurred only once, and the rest of a very long file was the other symbol. "Log" is in same base as H.

3) Total entropy '''S' = N * H.'''
Units are bits if log is base 2, nats if ln().

4) Normalized total entropy '''Sn' = N * H / log(n).'''  See "gotcha" below in choosing n.
Unit is "entropy". It varies from 0 to N

5) Physical entropy S of a binary file when the data is stored perfectly efficiently (using Landauer's limit): '''S = S' * kB / log(e)'''

6) Macroscopic information entropy of an ideal gas of N identical molecules in its most likely random state (n=1 and N is known a priori):  '''S' = S / kB''' = ln(states^N/N!) ≈ N*[ln(states/N)+1], in nats. (Dividing by ln(n) is not possible here because ln(1)=0.)

*Gotcha: a data generator may have the option of using say 256 symbols but only use 200 of those symbols for a set of data. So it becomes a matter of semantics if you chose n=256 or n=200, and neither may work (giving the same entropy when expressed in a different symbol set) because an implicit compression has been applied.
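The four file measures and the gotcha can be sketched in Python as follows. The function name `entropy_measures`, its optional `n` argument (the assumed alphabet size), and the 200-symbol test data are all hypothetical choices for illustration; by default n is taken as the number of distinct symbols actually seen.

```python
from collections import Counter
from math import log

def entropy_measures(data, n=None, base=2):
    """Return measures 1) to 4) for a sequence of symbols.

    n is the alphabet size used for normalization; if omitted, the number of
    distinct symbols observed in the data is used.
    """
    counts = Counter(data)
    length = len(data)                       # N: number of symbols in the file
    if n is None:
        n = len(counts)
    h = sum(c / length * log(length / c, base) for c in counts.values())
    hn = h / log(n, base) if n > 1 else 0.0  # a single-symbol alphabet has no disorder
    return {"H": h, "Hn": hn, "S'": length * h, "Sn'": length * hn}

data = bytes(range(200))                     # 200 distinct byte values, each used once
print(entropy_measures(data))                # n = 200 observed symbols, so Hn is 1
print(entropy_measures(data, n=256))         # n = 256 possible symbols, so Hn is below 1
```

The two calls give the same H and S' but different normalized values, which is exactly the ambiguity the gotcha describes.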
rosetta stone, on the main entropy page:

Calculate the Shannon entropy H of a given input string.

Given the discrete random variable X that is a string of N "symbols" (total characters) consisting of n different characters (n=2 for binary), the Shannon entropy of X in '''bits/symbol''' is :
:H_2(X) = -\sum_{i=1}^n \frac{count_i}{N} \log_2 \left(\frac{count_i}{N}\right)

where count_i is the count of character n_i.

For this task, use X="1223334444" as an example. The result should be 1.84644... bits/symbol.  This assumes X was a random variable, which may not be the case, or it may depend on the observer.
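The count form of H above can be sketched in a few lines of Python. This is only an illustrative sketch, not the reference solution for the task; the function name `shannon_entropy` is my own choice, and it treats every character of the string as one symbol.

```python
from collections import Counter
from math import log2


def shannon_entropy(message: str) -> float:
    """Specific (per-symbol) Shannon entropy H in bits/symbol."""
    n_total = len(message)          # N, the message length
    counts = Counter(message)       # count_i for each distinct symbol
    # H = sum(count_i / N * log2(N / count_i)), same as the -p*log2(p) form
    return sum(c / n_total * log2(n_total / c) for c in counts.values())


print(round(shannon_entropy("1223334444"), 5))  # 1.84644
```

Note that the result depends only on the symbol frequencies, so "01" and "0011" both give 1 bit/symbol.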

This coding problem calculates the "specific" or "[[wp:Intensive_and_extensive_properties|intensive]]" entropy that finds its parallel in physics with "specific entropy" S0 which is entropy per kg or per mole, not like physical entropy S and therefore not the "information" content of a file. It comes from Boltzmann's H-theorem where S=k_B  N  H where N=number of molecules. Boltzmann's H is the same equation as Shannon's H, and it gives the specific entropy H on a "per molecule" basis.

The "total", "absolute", or "[[wp:Intensive_and_extensive_properties|extensive]]" information entropy is
:S=H_2 N bits
This is not the entropy being coded here, but it is the closest to physical entropy and a measure of the information content of a string. But it does not look for any patterns that might be available for compression, so it is a very restricted, basic, and certain measure of "information". Every binary file with an equal number of 1's and 0's will have S=N bits. All hex files with equal symbol frequencies will have S=N \log_2(16) bits of entropy. The total entropy in bits of the example above is S= 10*1.84644 = 18.4644  bits.

The H function does not look for any patterns in data or check if X was a random variable.  For example, X=000000111111 gives the same calculated entropy in all senses as Y=010011100101. For most purposes it is usually more relevant to divide the gzip length by the length of the original data to get an informal measure of how much "order" was in the data.
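A rough way to see the difference between H and a compression-based measure of "order" is sketched below. It is an illustration under assumptions: it uses Python's `zlib` (the DEFLATE compressor that gzip also uses) rather than the gzip tool itself, the helper names are hypothetical, and the 10,000-symbol test strings are made up because 12 symbols are too short for a compressor to show anything.

```python
import random
import zlib
from collections import Counter
from math import log2


def shannon_entropy(message: str) -> float:
    """Per-symbol Shannon entropy in bits/symbol (frequency counts only)."""
    n_total = len(message)
    return sum(c / n_total * log2(n_total / c)
               for c in Counter(message).values())


def compression_ratio(message: str) -> float:
    """Compressed length divided by original length (informal 'order' measure)."""
    raw = message.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


# The two strings from the text: same symbol counts, so the same H.
X, Y = "000000111111", "010011100101"
print(shannon_entropy(X), shannon_entropy(Y))

# Longer strings with identical counts but very different amounts of order.
random.seed(0)                       # fixed seed so the run is repeatable
ordered = "0" * 5000 + "1" * 5000
symbols = list(ordered)
random.shuffle(symbols)
shuffled = "".join(symbols)

print(shannon_entropy(ordered), shannon_entropy(shuffled))
print(compression_ratio(ordered), compression_ratio(shuffled))
```

H is identical for the ordered and shuffled strings, while the compressed size of the ordered string is far smaller, which is the informal "order" the H function cannot see.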

Two other "entropies" are useful:

Normalized specific entropy:
:H_n=\frac{H_2 * \log(2)}{\log(n)}
which varies from 0 to 1 and it has units of "entropy/symbol" or just 1/symbol. For this example, H<sub>n</sub> = 0.923.

Normalized total (extensive) entropy:
:S_n = \frac{H_2 N * \log(2)}{\log(n)}
which varies from 0 to N and does not have units. It is simply the "entropy", but it needs to be called "total normalized extensive entropy" so that it is not confused with Shannon's (specific) entropy or physical entropy. For this example, S<sub>n</sub> = 9.23.
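The four information measures discussed here (specific, normalized specific, total, and normalized total) can be computed together. The sketch below is illustrative only; the function name, the dictionary keys, and the optional `n` argument are my own assumptions. The `n` argument exists because of the "gotcha" about choosing the alphabet size: by default it is taken as the number of distinct symbols actually seen.

```python
from collections import Counter
from math import log2


def entropy_measures(message, n=None):
    """Return H (bits/symbol), Hn (0..1), S = N*H (bits), and Sn = N*Hn (0..N)."""
    N = len(message)
    counts = Counter(message)
    if n is None:
        n = len(counts)              # assume the alphabet is the symbols seen
    H = sum(c / N * log2(N / c) for c in counts.values())
    Hn = H / log2(n) if n > 1 else 0.0   # change of log base from 2 to n
    return {"H": H, "Hn": Hn, "S": N * H, "Sn": N * Hn}


result = entropy_measures("1223334444")
print({key: round(value, 4) for key, value in result.items()})
```

Passing a larger `n` (for example 256 for a byte-oriented source that happened to use only 4 symbols) lowers Hn and Sn but leaves H and S unchanged.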

Shannon himself is the reason his "entropy/symbol" H function is very confusingly called "entropy". That's like calling a function that returns a speed a "meter". See section 1.7 of his classic [http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20Theory%20of%20Communication.pdf  A Mathematical Theory of Communication] and search on "per symbol" and "units" to see he always stated his entropy H has units of "bits/symbol" or "entropy/symbol" or "information/symbol".  So it is legitimate to say entropy NH is "information".

In keeping with Landauer's limit, the physics entropy generated from erasing N bits is S = H_2 N k_B \ln(2) if the bit storage device is perfectly efficient.  This can be solved for H2*N to (arguably) get the number of bits of information that a physical entropy represents. 
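The Landauer conversion in both directions is a one-line formula each way. The sketch below assumes the exact SI value of Boltzmann's constant and a perfectly efficient storage device, as stated above; the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in J/K (exact SI value)


def bits_to_physical_entropy(total_bits: float) -> float:
    """S = (H_2 * N) * k_B * ln(2), in J/K, for total_bits = H_2 * N."""
    return total_bits * K_B * math.log(2)


def physical_entropy_to_bits(entropy_j_per_k: float) -> float:
    """Solve the same relation for H_2 * N, the (arguable) bit count."""
    return entropy_j_per_k / (K_B * math.log(2))


# The 10-symbol example string carries 18.4644 bits in total.
s_example = bits_to_physical_entropy(18.4644)
print(s_example)                          # on the order of 1.8e-22 J/K
print(physical_entropy_to_bits(s_example))
```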
== From Shannon's H to ideal gas S ==
This is a way I can go from information theory to the Sackur-Tetrode equation by simply using the sum of the surprisals or in a more complicated way by using Shannon's H. It gives the same result and has 0.16% difference from ST for neon at standard conditions.

Assume each of the N atoms can either be still or be moving with the total energy divided among "i" of them. The message the set of atoms send is "the total kinetic energy in this volume is E". The length of each possible message is the number of moving atoms "i".  The number of different symbols they can use is N different energy levels as the number of moving atoms ranges from 1 to N.  When fewer atoms are carrying the same total kinetic energy E, they will each have a larger momentum which increases the number of possible states they can have inside the volume in accordance with the uncertainty principle. A complication is that momentum increases as the square root of energy and can go in 3 different directions (it's a vector, not just a magnitude), so there is a 3/2 power involved. The information theory entropy S in bits is the sum of the surprisals. The log gives the information each message sends, summed over N messages.

S_2 = \sum_{i=1}^{N} \log_2\left(\frac{\Omega_i}{i} \right)  where  \Omega_i = \Omega_N*\left(\frac{N}{i} \right)^\frac{3}{2}

To convert this to physical entropy, change the base of the logarithm to ln(): S=k_B \ln(2) S_2
You can make the following substitutions to get the Sackur-Tetrode equation:

\Omega_N = \left(\frac{xp}{\frac{\hbar}{2\sigma^2}}\right)^3, x=V^\frac{1}{3}, p=\left(2mU/N\right)^\frac{1}{2}, \sigma = 0.341, U/N=\frac{3}{2} k_B T
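The step from the sum of surprisals to the Sackur-Tetrode form can be sketched with Stirling's approximation. This is a reconstruction, not part of the original derivation, and it assumes N is large enough that ln N! ≈ N ln N − N. All symbols are the ones defined above, with \hbar = h/(2\pi).

```latex
\begin{align}
S_2 \ln 2 &= \sum_{i=1}^{N} \ln\left[\frac{\Omega_N}{i}\left(\frac{N}{i}\right)^{3/2}\right] \\
          &= N \ln \Omega_N + \tfrac{3}{2} N \ln N - \tfrac{5}{2} \ln N! \\
          &\approx N \left[\ln\frac{\Omega_N}{N} + \frac{5}{2}\right] \\
\frac{\Omega_N}{N} &= \frac{V}{N}\left(\frac{2mU}{N}\right)^{3/2}\left(\frac{4\pi\sigma^2}{h}\right)^{3}
\end{align}
```

Multiplying by k_B gives S ≈ k_B N [ln(Ω_N/N) + 5/2], which has the Sackur-Tetrode form. The argument of the logarithm equals the ST argument (V/N)(4πmU/(3Nh²))^(3/2) exactly only if (4πσ²)² = 2π/3, that is σ ≈ 0.339. With σ = 0.341 the argument is about 3% larger, which shifts the entropy by roughly 0.03 k_B per atom.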

The probability of encountering an atom in a certain momentum state depends (through the total energy constraint) on the state of the other atoms. So the probability of the states of the individual atoms are not a random variable with regard to the other atoms, so I can't write H as a function of the state of each atom (I can't use S=N*H) directly. But by looking at the math, it seems valid to re-interpret an S=ΩH as an S=N*H. The H inside () is entropy per state. The 1/i makes it entropy per moving atom, then the sum over N gives total entropy.  The sum for i over N was for the H, but then the 1/i re-interpreted it to be per atom. So it is an odd type of S=NH. Notice my sum for j does not actually use j as it is just adding the same p_i up for all states.

S_2 = \sum_{i=1}^{N}\left[ \frac{1}{i} *\left( -1*\sum_{j=1}^{\Omega_i} \frac{i}{\Omega_i}\log_2 \frac{i}{\Omega_i}\right)\right] = same =  \sum_{i=1}^{N} \log_2\left(\frac{\Omega_i}{i} \right)

Here's how I used Shannon's H in a more direct but complicated way but got the same result:

There will be N symbols to count in order to measure the Shannon entropy: empty states (there are Ω-N of them), states with a moving atom (i of them), and states with a still atom (N-i).  Total energy determines what messages the physical system can send, not the length of the message. It can send many different messages. This is why it's hard to connect information entropy to physical entropy: physical entropy has more freedom than a normal message source. So in this approximation, N messages will have their Shannon H calculated and averaged. Total energy is evenly split among "i" moving atoms, where "i" will vary from 1 to N, giving N messages. The number of phase states (the length of each message) increases as "i" (moving atoms) decreases because each atom has to have more momentum to carry the total energy. The Ωi states (message length) for a given "i" of moving atoms is a 3/2 power of energy because the uncertainty principle determining number of states is 1D and volume is 3D, and momentum is a square root of energy.

S_2 = \frac{1}{N} \sum_{i=1}^{N} \left[ H_i \Omega_i \right]

Again use S=k_B \ln(2) S_2 to convert to physical entropy. Shannon's entropy Hi for the 3 symbols is the sum of the probability of encountering an empty state, a moving-atom state, and a "still" atom state. I am employing a cheat beyond the above reasoning by counting only 1/2 the entropy of the empty states. Maybe that's a QM effect.

H_i= -0.5*\frac{\Omega_i - N}{\Omega_i} \log_2\left(\frac{\Omega_i-N}{\Omega_i}\right) -  \frac{i}{\Omega_i} \log_2\left(\frac{i}{\Omega_i}\right) -  \frac{N-i}{\Omega_i} \log_2\left(\frac{N-i}{\Omega_i}\right)

Notice H*Ω simplifies to the count equation programmers use. E=empty state, M=state with moving atom, S=state with still atom.

S_i=H_i \Omega_i= 0.5 E \log_2\left(\frac{\Omega_i}{E}\right) + M\log_2\left(\frac{\Omega_i}{M}\right) + S \log_2\left(\frac{\Omega_i}{S}\right)

By some miracle the above simplifies to the previous equation. I couldn't do it, so I wrote a Perl program to calculate it directly to compare it to the ST equation for neon gas at ambient conditions for a 0.2 micron cube (so small to reduce the number of loops to N < 1 million). The ST equation was confirmed correct with the official standard molar entropy S<sup>0</sup> for neon: 146.22 entropy/mole / 6.022E23 * N. It was within 0.20%. I changed P, T, or N by 1/100 and 100x and the difference was between 0.12% and 0.24%.

  #!/usr/bin/perl
  # neon gas entropy by Sackur-Tetrode (ST), sum of surprisals (SS), and Shannon's H (SH)
  $T=298; $V=8E-21; $kB=1.381E-23; $m=20.8*1.66E-27; $h=6.6262E-34; $P=101325;  # neon, 1 atm, 0.2 micron sides cube
  $N = int($P*$V/$kB/$T+0.5); $U=$N*3/2*$kB*$T;
  $ST = $kB*$N*(log($V/$N*(4*3.142/3*$m*$U/$N/$h**2)**1.5)+5/2)/log(2.718);
  $x = $V**0.33333;  $p = (2*$m*$U/$N)**0.5; $O = ($x*$p/($h/(4*3.142*0.341**2)))**3;
  for ($i=1;$i<$N;$i++) {  $Oi=$O*($N/$i)**1.5;
      $SH += 0.5*($Oi-$N)*log($Oi/($Oi-$N)) + $i*log($Oi/$i)  + ($N-$i)*log($Oi/($N-$i));
      $SS += log($Oi/($i));     }
  $SH += 0.5*($O-$N)*log($O/($O-$N)) + $N*log($O/$N); # for $i=$N
  $SH = $kB*$SH/log(2.718)/$N;
  $SS = $kB*$SS/log(2.718);
  print "SH=$SH, SS=$SS, ST=$ST, SH/ST=".$SH/$ST.", N=".int($N).", Omega=".int($O).", O/N=".int($O/$N);
[[User:Ywaz|Ywaz]] ([[User talk:Ywaz|talk]]) 19:54, 27 January 2016 (UTC)
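The same comparison can be sketched in Python as a cross-check of the three quantities: Sackur-Tetrode (ST), sum of surprisals (SS), and the averaged H_i Ω_i form (SH). This is an illustrative re-implementation under assumptions, not a translation the author provided: it uses standard values for the physical constants and 20.18 u for the mass of neon (the Perl listing uses rounded constants and 20.8), it sums i from 1 to N as in the formulas, and the function and key names are my own.

```python
import math

K_B = 1.380649e-23        # Boltzmann's constant, J/K
H_PLANCK = 6.62607015e-34 # Planck's constant, J*s
AMU = 1.66053907e-27      # atomic mass unit, kg
N_A = 6.02214076e23       # Avogadro's number, 1/mol


def neon_entropies(T=298.0, V=8e-21, P=101325.0, mass_amu=20.18, sigma=0.341):
    """Entropy in J/K of an ideal gas in a 0.2 micron cube, three ways."""
    m = mass_amu * AMU
    N = int(P * V / (K_B * T) + 0.5)      # number of atoms from PV = N k T
    U = 1.5 * N * K_B * T                 # total kinetic energy
    # Sackur-Tetrode equation
    st = K_B * N * (math.log(
        V / N * (4 * math.pi * m * U / (3 * N * H_PLANCK ** 2)) ** 1.5) + 2.5)
    # Number of states when all N atoms share the energy
    x = V ** (1.0 / 3.0)
    p = math.sqrt(2 * m * U / N)
    omega_n = (x * p / (H_PLANCK / (4 * math.pi * sigma ** 2))) ** 3
    ss = 0.0
    sh = 0.0
    for i in range(1, N + 1):
        omega_i = omega_n * (N / i) ** 1.5
        ss += math.log(omega_i / i)       # surprisal of message i
        # H_i * Omega_i: half-weighted empty states, moving atoms, still atoms
        s_i = -0.5 * (omega_i - N) * math.log1p(-N / omega_i)
        s_i += i * math.log(omega_i / i)
        if i < N:                         # no still atoms when i == N
            s_i += (N - i) * math.log(omega_i / (N - i))
        sh += s_i
    return {"N": N, "ST": st, "SS": K_B * ss, "SH": K_B * sh / N}


result = neon_entropies()
print(result)
print("SS/ST =", result["SS"] / result["ST"], "SH/ST =", result["SH"] / result["ST"])
```

Natural logs are used throughout, so multiplying the sums by k_B gives physical entropy directly without the ln(2) conversion. `math.log1p` is used for the empty-state term only to avoid rounding error when Ω_i is much larger than N.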

Monday, November 23, 2015

FDA, pharmaceuticals, chemo, heart surgery, and supplements

The FDA will not allow "snake oil" to be advertised as a cure for a disease. It's a great idea.  However, in order to meet the FDA's burden of proof, it is a very expensive process to prove a treatment can help a disease.  The same burden of proof is not required for any surgery because there is no way to give the surgery that many times, especially in multiple centers trying to use different techniques, or even have a placebo group.  This is how the cardiology field gets away with literally selling snake oil and killing people who could have lived.   Honest groups of cardiologists have even published the vast failure of heart surgeries that should have been putting in tubes instead (basically, if you are not in the wake of a recent heart attack, you should be skeptical of the need for open heart surgery over a stent, according to the medical field's own research). The oncology field commits the same crimes under the pretense of investigating new drugs (for the past 4 decades...with no improvements) that sell false hope.  It's not distinguishable from snake oil. The details are usually arguable, but in many specific cases it is  clear. You can't get doctors to testify against other doctors in most cases or they can get in deep systematic trouble with their peers, especially with the heads of medical boards that issue the licenses needed to testify.  Researchers face the same problem: be careful what truths or opinions you tell or your funding dries up. 
My point in the comparison is that heart surgeries and chemotherapies that are known to be harmful and not provide any benefit are being prescribed, and yet people who sell very safe inexpensive compounds are not allowed to tell you (by FDA regulations and severe penalties) that they stop Parkinson's in the test tube, in animals, and prevent PD in epidemiological studies.  Instead, supplements require the same level of proof as usually toxic and expensive pharmaceuticals. Even so, the majority of private and public funding goes to pharmaceuticals that there is every reason to believe will not help (at least in cancer), as opposed to the safe and cheap supplements that already have supporting evidence. Where's the logic in this if it is not because of some sort of accidental or intentional conspiracy in our economics? 
Then there is advertising:  GM1 phase 3 trials were complete over 5 years ago. But almost no one here has heard about GM1.  See
It reverted and stopped the disease from progressing, which is not being said of nilotinib.  It was 77 patients instead of 12.  The data is published.  Nilotinib study is not.  It was a phase 3 trial.  Nilotinib is only a phase 1 trial that is supposed to be only looking at safety. I wonder if a lot of professionals are aghast that they are saying these things through the media before publishing, especially since it was supposed to look only at safety. On the other hand only 12 people and the researchers doing such an outlandish thing like going public could mean it really is that good.  I just hope it is not cold fusion all over again, if you remember that fiasco where they went public without the science to back it up. Getting back to the point of advertising:  I came across GM1 accidentally because I was looking for the trial on nilotinib.
Two years after treatment stopped, they had progressed to where "standard care" patients had been 2 years earlier.  In other words, the benefits seemed to permanently reverse the condition by 2 years even after treatment was stopped. That's my reading of figure 2 in the link above.
It's not a conspiracy theory, but I think it's just how economics works.  We all seek profit.  Doctors, politicians, pharmaceuticals, and researchers should not be expected to act any different than we do. I assume everyone acts about like mechanics, plumbers, painters, and AC repairmen.  My experience with them does not usually fall under the heading of "honest" and "fair".  If the consumer is not knowledgeable, then he should expect to be taken to the cleaners.
I do not know of something better than GM1, but that does not mean there are not 5 other compounds out there with similar proof.  It's just one I accidentally saw.  Gallic acid might be just as good and it's super cheap, but it probably hasn't been tested in people, except for the benefits noted from black tea and grape seed.

Friday, November 13, 2015

parkinson's: GM1 Ganglioside a cure?

Again and again, I accidentally come across some compound that seems to have profound effects for PD, and yet there is no one getting paid to tell us about it, so we never hear about it unless it is through each other.  
For example, here's a phase 2 trial completed 4 YEARS AGO for a compound shown in the 1980's to help PD and it shows patients having a REVERSAL of symptoms for as long as they got the 2/day injections.   The patients were forced to stop taking it and began showing normal progression of PD in the 2 years since.   It seems to be very expensive as it needs to be derived from animals, but it might be much cheaper if there was a larger need for it.
JS Schneider has been publishing on GM1 for decades
http://www.ncbi.nlm.nih.gov/pubmed/26099170  (2015, 40 patients)
the clinical trial:
https://clinicaltrials.gov/ct2/show/NCT00037830  (subcutaneous injection, 100 mg, 2 doses per day)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3532888/  (2012, 77 patients)
http://www.ncbi.nlm.nih.gov/pubmed/20206941 (2010, 16 patients, end of 5 year study)
http://www.ncbi.nlm.nih.gov/pubmed/9633704  (1998, 45 patients)
http://www.ncbi.nlm.nih.gov/pubmed/7783880  (1995 worked in 10 patients)
http://www.ncbi.nlm.nih.gov/pubmed/1613817  (1992 in cats)
http://www.ncbi.nlm.nih.gov/pubmed/1350379  (1992, primates)
http://www.ncbi.nlm.nih.gov/pubmed/2568945 (1989 mice)

http://www.ncbi.nlm.nih.gov/pubmed/6141701  (1983, different researcher, rats)

making body generate GM1, worked a little, using sialidase from bacteria

The link below is the first of 5 research papers in the 1980's showing its benefit for PD in animals.  There are literally 100 natural compounds that have shown the same results in animals, although many of them will probably not reach the human brain, including the most powerful. 
Three years ago GM1 was discussed on this web site:
And an interesting comment from one poster (3 years ago)
"In the last five years I have seen dozens of these studies. I never see where they actually go into treatment. I am receiving the same treatment I would have gotten twenty years ago. Seems odd."
Moreover, this natural compound may be the underlying reason nilotinib works.  The tyrosine kinase inhibitors like nilotinib induce GM1  (see 2006 article http://www.jimmunol.org/content/176/2/864.long ).   So it begs the question: exactly why are the researchers interested in leukemia's tyrosine kinase inhibitors in the first place?  Did they or Novartis see the work on GM1 and say, hmmm, is there a pharmaceutical that can do this?
There are 61 papers with GM1 Ganglioside and Parkinson's in the abstract.  Here's one of them:
"The nigral neurons of PD subjects that were severely deficient in GM1 showed subnormal levels of tyrosine phosphorylated RET. Also in PD brain, GM1 levels in the occipital cortex, a region of limited PD pathology, were significantly below age-matched controls, suggesting the possibility of systemic GM1 deficiency as a risk factor in PD. This would accord with our finding that mice with partial GM1 deficiency represent a faithful recapitulation of the human disease. Together with the previously demonstrated age-related decline of GM1 in human brain, this points to gradual development of subthreshold levels of GM1 in the brain of PD subjects below that required for effective GDNF signaling. This hypothesis offers a dramatically different explanation for the etiology of sporadic PD as a manifestation of acquired resistance to GDNF. "

There is a company trying to make this compound available to people with PD and it seems like it already has FDA approval as an orphan drug, so it could be used off-label for Parkinson's.
"The actual approval status is that the US FDA on December 3, 2012, designated GM-1 as an orphan drug in acute SCI treatment, and GM-1 is now in the US FDA IDE NDA (New Drug Application) and approval process."
"The authors of the letter go on to mention the renewed activity around GM-1 at the Federal Drug Administration (FDA). We were particularly encouraged and interested to hear about this, but upon searching the FDA site, we found only that the orphan drug designation had been given, but that, unfortunately, it was not approved for use in acute spinal cord injury (Figure 1).
"We then noted that the Sponsor for the application at the FDA was TRB Chemedica International S.A. and we wondered whether, if we wanted to use the drug “off-label” for acute spinal cord injury (as Geisler and Coleman are apparently suggesting), it would be available to us. In searching the site of this company that manufactures and sells drugs related to ophthalmology and joint diseases for any mention of GM-1, we found that it was listed in a history of the company back in the 1990s as a potential treatment for Parkinson’s Disease (Figure 2). Direct inquiries to the company on its availability yielded no response.
"It then appears that this renewed activity around the “actual approval status” does not afford the practitioner access to the drug, even if he/she wanted to use it off-label."

 Currently only 2 grams per brain and 1 gram per day is needed:
 Ovine GM1 ganglioside is currently produced from brain tissue supplied by GRI to Avanti Polar Lipids in Alabaster, Alabama.  Ovine GM1 ganglioside is currently produced for research use only.  Levels of GM1 ganglioside from brain are approximately 1.5 – 2.0 grams per brain.  Levels in spinal cord and other tissues such as salivary gland, adrenal gland, liver, kidney spleen, intestinal mucosa and many others are significantly elevated and maximum yield per lambs is expected to approach 4 to 5 grams based on preliminary data.

As a reminder, a "cure" for Parkinson's was discovered 20 years ago, with a safety study almost the same as this one:
It was known even in the 1980's but mad cow disease eliminated the cow-brain source of GM1.
This researcher JS Schneider keeps doing clinical trials (he has 235 papers, most on PD, about 27 on GM1), and the results are always the same: patients were better than at the start of the study for as long as they took it:
He first looked into it in 1989 in mice
The first research on this for PD was in rats in 1983....32 years ago!
The only reason GM1 is not available to us now is because it requires investment in flocks of sheep, so the pharmaceuticals were not interested.  It will remain a problem because even with genetically modified sheep, they can still only get 1 or 2 g per brain, and 0.2 gram is the daily dose used in the trials, 1 sheep brain per 10 days. There is a PhD veterinarian and his M.S. degreed wife who are licensing the raising of the genetically modified sheep, now over 4000 ewes:
And here's the punch line:  nilotinib may be working the same as GM1.  Nilotinib's cousin imatinib "upregulates" GM1, so if GM1 is not available in enough quantity from the sheep, it can be combined with nilotinib to amplify it:  "Inhibition of Bcr/Abl tyrosine kinase activity by imatinib induces a high surface expression of GM1"

Thursday, November 5, 2015

Mechanical and Electrical Analogy: deriving of mass from charges, posted to talk page in wikipedia

I would love to see a theoretical physics discussion of the origin of the analogy between electrical and mechanical components. On the surface, you could say we find it easy to think in terms of linear systems, so we think of components that act linearly. This leads directly to the same simple differential equations for different systems, especially when conservation of energy is a natural focal point for optimizing engineering applications.  The most natural of the possible analogies is the impedance analogy which is the one most commonly used and first cited in the Wikipedia article. The reason for this is that charge in electrical components is simply replaced with meters in the mechanical components. There are no other changes. Capacitors allow charge to build up for a fixed dielectric distance and the analogous spring allows meters to build up for a fixed number of charges (which are the source of resisting compression). For small compressions and non-saturating amounts of charge, both are linear.  The same direct relation exists for inductors and mass: inductance (magnetism) from a classical view (pre-quantum) is a relativistic effect of charge build up per unit length, not a thing unto itself.  See Schwartz, Feynman, and Wikipedia. For small changes, it is again linear so V=L*di/dt instead of having to resort to full-blown relativistic equations. So it seems mass could be viewed from a pre-quantum perspective as the relativistic effect of (quark?) charges being brought closer together as a result of length contraction.  Again, it's linear for small changes in velocity so F=ma instead of full-blown relativistic calculations. In short, linear electrical components control charge/length where length is held constant by the component, and mechanical systems do the same but hold the charge constant and allow lengths to change. 
This is simple enough that there should be some references out there that delve into the source of the analogies and thereby allow it to be included in the article.

Monday, November 2, 2015

Internal depth and romantic love

How do you define depth?
Here is my attempt for both men and women: Ability to surrender heart, mind, body, and soul to the one he/she loves. This is a two-way street.
All the talk about intelligence, experience, money, and appearance are missing the mark. These are like pre-requisites. Depth seals the deal.

Intelligence and maturity could be different types of depth that could make life with someone easier, but how are these really different from wealth?  The  type I am thinking about is about the ability to be intimate.

Qualities that seem related to this:
1) comfortable and content with self.
2) honest.
3) not a whiner
4) no preoccupation with appearance
The 4 above can be summed up as "mature".
Women who know how to let a man be a man are a big plus, as are men who know how to be manly when given permission.

I could replace "not a whiner" with  "ability to be fair". "Respectful" would also fall under "fair". "Whining" and "not fair" implies trying to get something material or emotional out of the other person without just cause or compensation. So someone in love with you is not showing depth if unconsciously trying to extract too much.  Not being able to detect that you are being harmed by their actions or requests is a lack of "depth".  Intelligence is thereby deeply connected to depth and maturity, but it is definitely not the key.
Many here have mentioned preoccupation with appearance.  "Narcissism" usually has roots in unconscious low self-esteem that results from childhood traumas. It can appear confident, happy, intelligent, etc, and yet have absolutely no depth when it comes to romantic intimacy. A deep person can see "something is off" when trying to get to know a "narcissist". I don't like using the word because it's an incredibly brutal accusation against someone who has had a serious injury in childhood, and there are all levels of it including "normal" people.  Above all, the pure narcissist's self esteem can't be threatened. This is a permanent road block to true intimacy. They can love, but not be loved.  The internal person the narcissist might have been has often been killed off by childhood traumas, be it faulty/bad relationships (or lack thereof) with parents or whatever.
"Mature" is not a hardened type of maturity.
"Confident" is a key to attraction, maturity, and depth. But it is not a requirement for anything other than attraction. Someone with low self-esteem and conscious of it can have plenty of depth, even more than they realize. But I do not think they can or should end up with someone confident. Love is fair even if fairness does not lead to love.
The confident find the confident. The down-trodden find the down-trodden. If God is merciful, the shallow will find the shallow.

Saturday, October 31, 2015

Apigenin for parkinson's

Honey, pollen, and propolis ("bee glue" resin from trees) appear to be MAO inhibitors in the test tube. 

Especially propolis.  The active ingredient in it appears to be APIGENIN, which is available as a supplement. It protects against a-syn aggregation, ROS, increases BDNF, is an MAO inhibitor, and increases dopamine uptake. 

But like xanthohumol (which may work by increasing activity of GM1, like the Bcr/Abl inhibitor nilotinib) it also stops excess glutamate, which is key in many kinds of neurodegeneration, like that caused by hypoxia.

Luteolin and apigenin are the main active ingredients in chamomile (tea) and artichoke, which can be bought as extracts.  Their molecular weights are less than 400 (270) so they might cross the blood-brain barrier. Luteolin was found to not be bioavailable in one study, although its metabolites might be the active ingredients.  200 mg for either is the dose needed to copy the work in mice, but that is a bit high.

Apigenin is about 0.005% in orange juice, which is a decent amount. Hesperidin is 10x higher. Naringin in grapefruit juice is also 10x higher.  400 g grapefruit juice contains 0.2 g naringin.

Parsley and celery have the highest amounts, about 2% in fresh wet parsley, but maybe only 1% seems to reach blood stream. So 10 grams of parsley should give 200 mg Apigenin.

luteolin and apigenin  "enhancing monoamine uptake" "monoamine transporter activators"

Apigenin is an inhibitor of the liver enzyme CYP2C9, so the list in the link below needs to be consulted if you're taking medication. It will amplify the effect of any drugs listed in the "substrate" column by not letting them be broken down.  Most notably NSAIDs.

"In conclusion, apigenin and luteolin protected the dopaminergic neurons probably by reducing oxidative damage, neuroinflammation and microglial activation along with enhanced neurotrophic potential. The above results propose both these flavonoids as promising molecules in the therapeutics of PD. "

apigenin promotes neurogenesis

Bee products are MAO inhibitors, propolis is stronger.

Apigenin is the main active ingredient in propolis, and targets the preferred MAO-B (less cheese effect) strongest

"has been used as a sedative and tranquilizer in Brazilian folk medicine and as a natural anxiolytic agent. Flavonoids from Passiflora edulis and P. alata have been shown to improve behavioral performance in rats [95]. The phytochemicals that contribute most to the effects of Passiflora are flavonoids such as apigenin-8-C-β-digitoxopyranoside, apigenin-8-C-β-boivinopyranoside (Figure 6), and luteolin-8-C-β-boivinopyranoside [88]. Apigenin and its derivatives are known to have anticarcinogenic, antioxidant, and anti-inflammatory properties [96]. Subchronic treatment with apigenin in APP/PS1 mice model downregulates BACE, β-CTF, and β-amyloid deposition and restores BDNF expression leading to increased memory and synaptic plasticity by ERK1/2/CREB-mediated prevention of AD [97]. Recently, we studied apigenin and found that it also plays a vital role in neurodegenerative disease. It exerts its anti-inflammatory effect on LPS-activated microglia and inhibits NO and PGE2 production by scavenging free radicals. Moreover, apigenin suppresses ERK1/2, p38 MAPK, and JNK and modulates NGF-induced neurite outgrowth in PC12 cells [88]. Additionally, apigenin has an apparent permeability coefficient in the BBB, and thus it serves as an effective phytochemical for the treatment of neurodegenerative diseases "