The conversion factor from physical entropy to information entropy (in random bits) comes from Landauer's limit: (physical entropy) = (information bits)*kb*ln(2). The number of yes/no questions that have to be asked to determine which state a physical system is in equals Shannon's entropy in bits. That is not Shannon's intensive, specific entropy H, but his extensive, total entropy of a data-generating source: S=N*H, where H=1 bit/symbol if the N bits are mutually independent and equally probable.
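A minimal sketch of the conversion above, assuming the SI value of kb and the ideal case of N independent, equally probable bits (so H = 1 bit/symbol and S = N*H = N bits). The function names are hypothetical, chosen only for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def bits_to_joules_per_kelvin(s_bits: float) -> float:
    """Physical entropy = (information bits) * kb * ln(2)."""
    return s_bits * K_B * math.log(2)


def joules_per_kelvin_to_bits(s_physical: float) -> float:
    """Inverse: Sbits = S / (kb * ln(2))."""
    return s_physical / (K_B * math.log(2))


# N mutually independent, equally probable bits: H = 1 bit/symbol, so S = N*H = N bits.
N = 8
H = 1.0
s_bits = N * H
s_physical = bits_to_joules_per_kelvin(s_bits)
print(s_physical)                             # about 7.66e-23 J/K
print(joules_per_kelvin_to_bits(s_physical))  # about 8 bits (up to float rounding)
```

The round trip is just multiplication and division by the same constant, which is the point: the bit count and the J/K value carry the same information.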
Landauer's limit states that 1 bit of information irreversibly changing state releases entropy kb*ln(2), which for a given T is a heat release Q=T*S=T*kb*ln(2), implying there was a stored potential energy that was the bit. This shows that entropy is information entropy: the ln(2) converts from ln() to log2(). kb is a simple conversion factor from temperature in kelvins to energy per particle in joules. If T were measured in joules (as an energy per particle, set by the average 1/2 mv^2 of the particles) instead of kelvins, then kb=1 and entropy would be unitless (joules/joules). So kb is not a fundamental constant like h. c also does not have fundamental units if you accept time=i*distance, as Einstein mentioned in Appendix 2 of his book, which allows use of the simpler Euclidean space instead of Minkowski space without error or qualification and in keeping with Occam's razor.
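To put a number on Q=T*kb*ln(2), here is a small sketch at an assumed room temperature of 300 K (the temperature and the helper name are illustrative, not from the text). The last line expresses T as an energy per particle (kT in joules), which shows kb dropping out so that only the ln(2) base conversion remains.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def landauer_heat(temperature_k: float, n_bits: float = 1.0) -> float:
    """Minimum heat Q = T*S for irreversibly erasing n_bits,
    where S = n_bits * kb * ln(2) is the entropy released."""
    entropy = n_bits * K_B * math.log(2)  # J/K
    return temperature_k * entropy        # J


T = 300.0
q = landauer_heat(T)
print(q)             # about 2.87e-21 J per bit at 300 K

# Express T as an energy per particle (kT, in joules): kb drops out and only ln(2) remains.
kT = K_B * T
print(q / kT)        # about 0.693 = ln(2)
```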
Shannon's "entropy" (specific, intensive) is H=sum(-p*log(p)), and he stated 13 times in his paper that H has units of bits, entropy, or information PER SYMBOL, not bits (total entropy) as most people assume. An information source generates entropy S=N*H, where N is the number of symbols emitted. H is a "specific entropy" based on the probabilities of the n unique symbols among the N total symbols. H is not a "total entropy" as is usually believed; its physical parallel is So=entropy/mole. Physical S=N*So (N in moles) and information S=N*H. It is rare to find texts that explain this.
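The difference between the specific H (bits per symbol) and the total S=N*H (bits) can be sketched for a single message. This assumes p is estimated from symbol counts in that message and that symbols are independent (order is ignored); the helper names are hypothetical.

```python
import math
from collections import Counter


def specific_entropy(message: str) -> float:
    """Shannon's H = sum(-p*log2(p)) in bits PER SYMBOL.
    p is estimated as (count of a symbol)/(total symbols); symbol order is
    ignored, i.e. the symbols are treated as mutually independent."""
    n_total = len(message)
    if n_total == 0:
        return 0.0
    counts = Counter(message)
    return sum(-(c / n_total) * math.log2(c / n_total) for c in counts.values())


def total_entropy(message: str) -> float:
    """Extensive entropy S = N*H in bits, N = number of symbols emitted."""
    return len(message) * specific_entropy(message)


msg = "aabc"                   # p(a)=1/2, p(b)=1/4, p(c)=1/4
print(specific_entropy(msg))   # 1.5 (bits/symbol)
print(total_entropy(msg))      # 6.0 (bits)
```

Doubling the message with the same symbol frequencies leaves H unchanged but doubles S, which is the intensive/extensive distinction (like So versus S=N*So).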
An ideal monatomic gas (Sackur-Tetrode equation) has an entropy from N mutually independent gas particles of S=kb*sum(ln(total states/i^(5/2))), where the sum is over i=1 to N. Stirling's formula approximates this as S=kb*N*[ln(states/particle)+5/2]. I can't derive that from Shannon's total entropy S=N*H, even though I showed in the first paragraph that the final entropies are exactly the same. You can't directly correlate an informatic "symbol" in a physical system (such as the particles in a gas or the phonons in a solid) with the symbols in Shannon entropy, because physical entropy is constrained by a total energy that can be distributed among fewer than the N available particles or phonons. Once you've characterized a source of symbols in Shannon entropy, you know the number of symbols, and that is what you use to calculate the entropy. But the energy in a physical system can be distributed among N or fewer particles, which results in an N! divisor instead of the N^N divisor in Shannon's N*H = N*log[(states in N symbols)/N] (after applying a log rule). This gives physical entropy more possible ways to use the N particles than information has to use N symbols. 1 particle carrying the total energy is a possible macrostate (not counting the minimal QM state for the others), but information entropy has no external variable like this, "constrained only by the sum", that would let it use fewer symbols. Physical entropy seems to always be S=kb*N*[ln(states/particle)+c], and the difference from information entropy is the c. But c is generally not a huge factor and might become insignificant in bulk matter where the energy is spread equally between bulks; physical entropy is then S=N*So. Information entropy is exactly of this form (S=N*H). You can't derive So from H, but it's always true that Sbits=S/(kb*ln(2)) (i.e., the exact state a system is in can be specified by answering Sbits yes/no questions).
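A quick numerical check of the Stirling step, as a sketch. It assumes "states/particle" means total states / N^(5/2), which is what makes the sum and the Stirling form agree; the per-particle value of 20 nats is hypothetical. ln(N!) is computed with math.lgamma(N+1) so large N does not overflow.

```python
import math


def sackur_tetrode_sum(ln_total_states: float, n: int) -> float:
    """S/kb = sum_{i=1..N} ln(total_states / i**(5/2))
           = N*ln(total_states) - (5/2)*ln(N!), using lgamma(N+1) = ln(N!)."""
    return n * ln_total_states - 2.5 * math.lgamma(n + 1)


def sackur_tetrode_stirling(ln_total_states: float, n: int) -> float:
    """Stirling form S/kb = N*[ln(states/particle) + 5/2],
    with states/particle read as total_states / N**(5/2) (an assumption)."""
    ln_states_per_particle = ln_total_states - 2.5 * math.log(n)
    return n * (ln_states_per_particle + 2.5)


LN_STATES_PER_PARTICLE = 20.0  # hypothetical value, chosen only for illustration
for n in (10, 1_000, 100_000):
    ln_total = LN_STATES_PER_PARTICLE + 2.5 * math.log(n)
    exact = sackur_tetrode_sum(ln_total, n)
    approx = sackur_tetrode_stirling(ln_total, n)
    print(n, exact, approx, (approx - exact) / exact)  # relative gap shrinks as N grows
```

The +5/2 inside the bracket is the constant c for this case; it comes from the ln(N!) term, not from any per-symbol probability.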
To show the exact equivalence between physical and informational entropy, I have to precisely define and constrain an informational situation to match physical entropy. It goes like this: the message to be sent will consist of 1 to N symbols. The number of states each symbol can have (the size of the "alphabet") goes from n to n/N as the message length goes from 1 to N. I would like to correlate n with total energy, but, disastrously for my desire for simplicity, this is not so. The number of states for each message length from 1 to N depends on a combination of the size of the box and the total translational energy. To add more trouble, momentum goes as v and energy as v^2, which results in a 5/2 exponent (otherwise it would be an exponent of 3, from the 3 dimensions).
The simplest way to view the exact correlation is to treat a gas in a box as a sequence of messages of length 1 to N, based on the number of particles carrying the total translational energy (heat), and then calculate the number of states (the size of the alphabet) each of them may possess. So the correlation is a sum of the entropies of messages of increasing length, with a decreasing alphabet size in each message. The way the alphabet size decreases from 1 to N is given by an equation determined by the size of the box and the total heat. It's nice to finally understand it, but I hate this conclusion! The energy is distributed among N particle velocities, though it may use fewer than N, and it's the energy per particle times the time to cross the box that determines how many states are possible for each of the N that are carrying the energy.
So Shannon's entropy is a lot simpler and comes out to LESS entropy if you try to make N particles in a physical system equivalent to N unique symbols. The simplest physical entropy is that of independent harmonic oscillators in 1D sharing a total energy, not necessarily evenly: S=kb*ln[(states/oscillator)^N / N!], which is S=kb*N*[ln(states/particle)+1] for large N, where states/particle = (states/oscillator)/N. So even in the simplest case, the c remains. Shannon entropy is of a fundamentally different form: S~log((states/symbol)^N) = N*log(states/symbol) when the symbols are mutually independent (no patterns in the data and equal symbol probabilities). For example, for random binary data S=log2(2^N) = N bits. So it is hard to see the precise connection even in the simplest case, although true/false questions immediately show them to be identical quantities with a simple conversion factor. Stirling's approximation becomes exact in the limit of large N, and Shannon's H in a way depends on an infinite N to get exact p's, so the approximation is not a problem to me.
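The oscillator case can be compared term by term with Shannon's form, as a sketch. It assumes states/oscillator = N * states/particle (the reading that makes the Stirling form come out as N*[ln(states/particle)+1]) and compares against Shannon with the same number of states per symbol; the sizes N = 1000 and 1000 states/particle are hypothetical. Everything is in nats (S/kb), except the last line, which uses log2 for the random-binary example.

```python
import math


def oscillator_entropy_exact(states_per_particle: float, n: int) -> float:
    """S/kb = ln[(states/oscillator)**N / N!] with states/oscillator = N * states/particle."""
    states_per_oscillator = n * states_per_particle
    return n * math.log(states_per_oscillator) - math.lgamma(n + 1)


def oscillator_entropy_large_n(states_per_particle: float, n: int) -> float:
    """Stirling form S/kb = N*[ln(states/particle) + 1]; the '+1' is the constant c."""
    return n * (math.log(states_per_particle) + 1)


def shannon_entropy_nats(states_per_symbol: float, n: int) -> float:
    """Independent, equiprobable symbols: S = ln(states_per_symbol**N) = N*ln(states_per_symbol)."""
    return n * math.log(states_per_symbol)


n, q = 1_000, 1_000.0  # hypothetical sizes, for illustration only
print(oscillator_entropy_exact(q, n))     # close to the Stirling form below
print(oscillator_entropy_large_n(q, n))
print(oscillator_entropy_large_n(q, n) - shannon_entropy_nats(q, n))  # about N (c = 1 per particle)

# Random binary data: 2 states/symbol, so S = log2(2**N) = N bits.
print(n * math.log2(2))  # 1000.0
```

With the same count of states per particle and per symbol, the physical form exceeds Shannon's by c*N, which is the "LESS entropy" point in the paragraph above.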
I have not contradicted anything user346 (physics stackexchange question) has said, but I wanted to show why the connection is not trivial except when looking at the specific entropy of bulk matter. QM uses S=sum(-p*log(p)), but Shannon entropy is S=N*sum(-p*log(p)). They come out the same because the p's are calculated differently. Physical p=(certain macrostate)/(total microstates), but the numerator and denominator are not simply determined by counting. Information p=(count of a given symbol)/(total symbols) for a given source. And yet both require the same number of bits (yes/no questions) to identify the exact microstate (after applying the kb*ln(2) conversion).
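A sketch of the "same number of yes/no questions" claim for the simplest case, assuming W equally probable microstates (W = 1024 is hypothetical). The Gibbs form S = -kb*sum(p*ln(p)) divided by kb*ln(2) is compared with the number of questions a bisection strategy needs, which is ceil(log2(W)). Function names are illustrative only.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def gibbs_entropy(probs):
    """Physical (Gibbs) entropy S = -kb * sum(p*ln(p)) in J/K."""
    return -K_B * sum(p * math.log(p) for p in probs if p > 0)


def questions_to_identify(n_microstates: int) -> int:
    """Yes/no questions a bisection strategy needs to single out one of
    n equally likely microstates; equals ceil(log2(n))."""
    questions = 0
    remaining = n_microstates
    while remaining > 1:
        remaining = (remaining + 1) // 2  # each answer leaves at most half (rounded up)
        questions += 1
    return questions


w = 1024                              # hypothetical number of equiprobable microstates
s = gibbs_entropy([1.0 / w] * w)
print(s / (K_B * math.log(2)))        # about 10 bits
print(questions_to_identify(w))       # 10
```

For W that is not a power of 2 the question count rounds up, so the entropy in bits is the lower bound that the questioning strategy approaches.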
But there's a problem (in calling black hole entropy the maximum amount of information bits that can be stored), which was mentioned in the comments to his answer. In an information system we require the bits to be reliable. We can never get 100% reliability because of thermal fluctuations. At this limit of 1 bit = kb*ln(2) we have a 49.9999% probability of any particular bit not being in the state we expected. The Landauer limit is definitely a limit. The energy required to break a bond that is holding one of these bits in a potential memory system is "just below" (actually equal to) the average kinetic energy of the thermal agitations. Landauer's limit assumes the energy required to break our memory-bond is E=T*kb*ln(2), which is slightly weaker than a van der Waals bond, about the weakest thing you can call a "bond" in the presence of thermal agitations.
So we have to decide what level of reliability we want from our bits. Using the black hole limit also seems to add a problem of "accessibility". It is the information content of the system, but it is not an information storage system.
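One crude way to put numbers on the reliability choice, swapped in here as an assumption and not derived above: treat the chance that thermal agitation flips a stored bit as the Boltzmann factor exp(-E/(kb*T)) for a barrier of height E. At the Landauer barrier E=kb*T*ln(2) this gives 50%, in line with the near-50% figure above, and inverting it gives the barrier needed for a chosen error rate. It ignores how often fluctuations "test" the barrier (attempt rate and storage time), and the helper names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def flip_probability(barrier_j: float, temperature_k: float) -> float:
    """Crude Boltzmann estimate of the chance thermal agitation overcomes the barrier."""
    return math.exp(-barrier_j / (K_B * temperature_k))


def barrier_for_error_rate(p_err: float, temperature_k: float) -> float:
    """Barrier E = kb*T*ln(1/p_err) needed for a target error probability."""
    return K_B * temperature_k * math.log(1.0 / p_err)


T = 300.0
landauer_barrier = K_B * T * math.log(2)
print(flip_probability(landauer_barrier, T))  # about 0.5

for p in (1e-3, 1e-9, 1e-15):
    e = barrier_for_error_rate(p, T)
    print(p, e / (K_B * T))  # barrier in units of kb*T: about 6.9, 20.7, 34.5
```

Under this estimate each factor of 10 in reliability costs about kb*T*ln(10) more barrier energy per bit, so a reliable bit stores noticeably more than kb*T*ln(2).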