I have been wondering for the last few years if I could think of a "fundamental unit of intelligence" in the same way the "bit" is a measure of raw uncompressed information, and in the way a complete Turing machine (in my mind a "fundamental unit of thought") can be constructed out of nothing but NAND gates (NAND, unlike XOR, is functionally complete). The "intelligence" (or value?) of the bits is what they represent, and the "intelligence" of the NAND gates is how many there are and the memory they hold (bit states), assuming they are wired efficiently. There are certain ways of wiring for maximum efficiency for certain applications (for example, to implement specific mathematical formulas), and for simple Turing machines it may not be hard to get close to an ideal.
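To make the functional-completeness point concrete, here is a minimal sketch (function names are mine, for illustration) building the usual gates out of nothing but NAND. XOR alone can't do this: composing XORs only ever produces linear (parity-like) functions of the inputs, so AND can never appear.

```python
def nand(a: bool, b: bool) -> bool:
    """The one primitive gate everything below is built from."""
    return not (a and b)

def not_(a):
    # NOT from a single NAND with both inputs tied together
    return nand(a, a)

def and_(a, b):
    # AND = NOT(NAND)
    return nand(nand(a, b), nand(a, b))

def or_(a, b):
    # OR via De Morgan: a OR b = NAND(NOT a, NOT b)
    return nand(nand(a, a), nand(b, b))

def xor(a, b):
    # XOR from four NANDs
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))
```

With memory (flip-flops, also buildable from NANDs) added, this gate set is enough to wire up a full Turing-complete machine.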
Since I've mixed "intelligence" in with my bits and NAND gates, I need to define "intelligence" better. Rather than philosophizing ad nauseam, I'll just jump to the end and tell you the answer to life, the Universe, and everything so that I can define intelligence. The answer is to the question "how can we quickly convert energy to structure?" This is the meaning of life, its goal: to take high-energy photons, chemical bonds, heat differentials, nuclear bonds, and gravitational fields and to reduce these extractable sources of energy into ... well, exactly what I am not sure. Making exact or imperfect copies of the things that do the extraction, for more extraction, is a beginning step. More and more complex and "intelligent" forms of life seem to come along with the ability to extract energy from the reduced forms of energy. It seems to increase the rate at which the heat death of the Universe is achieved. The heat death is normally viewed as a perfectly random system of fundamental particles not moving (at least relative to each other), assuming the Universe does not collapse and begin again. But "random" is only the result of perception. It might be more accurate to say the final structure ("randomness") is perfectly logical in that it extracted all the energy sources available. To be more concrete, imagine we mine all the matter in the solar system to place a shield around the Sun that captures all its photons, so that we can create high-energy bonds of Silicon, Carbon, or whatever can be used to get the most out of each photon, so that lower-energy photons do not leave the solar system. When we've idealized the structure that surrounds the Sun, the extracted photons could be converted to particles for more structure. Maybe gravitons could be captured, but that leads to things that take me off-track. But then we have high-energy sources again in the Si-Si and C-C bonds and whatnot. So it seems avoiding the heat death of the Universe is possible.
Could "we" recycle energy at 100% efficiency so that each galaxy has perpetual "happiness"? The galaxies are supposedly moving apart such that one day they will never be able to influence each other, not even by exchanging light.
Well, ANYWAY, all the AI I see seems to be about "prediction", which is a better way of saying "pattern recognition". "Compression" seems to be intimately connected with effective prediction. Or rather, compression algorithms are the most efficient way of representing a given type of "media". A compression algorithm is better if it is tailored to the type of media, in the same way the brain is pre-coded to develop so that it can do certain tasks better. The most efficient compressor will be an algorithm designed for a specific type of media that then designs an even better instance of itself based on the totality of the data being compressed: it creates a "look up table" and an algorithm to decompress a string of representations based on the data at hand. In other words, a highly tailored algorithm will be created that can convert the data at hand into representations that refer to the look up table. In Numenta's CLA, which is supposed to be like the brain, each cell or segment of synapses is like a "fuzzy word". By this I mean each one is like an item in a look up table, but the item by itself does not have meaning and needs to be combined with other cell or synapse activations to have meaning at the next-higher level of cells. So the tailored algorithm will refer to a string of representations, which will refer to a look up table. The string is read by the algorithm, so the string is the condensed program and the algorithm is the interpreter. The look up table is the memory. But it's hard, and probably not necessary, to keep the algorithm, the string of "representations", and the look up table distinct. The string is intrinsically connected to the look up table and the algorithm. It's all really just an algorithm, as in Kolmogorov complexity.
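The look-up-table-plus-interpreter picture above is essentially how dictionary coders work. Here is a minimal LZ78-style sketch (function names are mine, for illustration): the compressor grows its own look up table from the data at hand and emits a string of references into it, and the decompressor rebuilds the same table while reading that string back.

```python
def lz78_compress(s: str):
    """Return (tokens, table): tokens are (prefix_index, next_char) pairs."""
    table = {"": 0}          # the look up table, grown from the data itself
    tokens = []              # the string of representations
    phrase = ""
    for ch in s:
        if phrase + ch in table:
            phrase += ch     # keep extending the longest known phrase
        else:
            tokens.append((table[phrase], ch))
            table[phrase + ch] = len(table)   # learn a new table entry
            phrase = ""
    if phrase:               # flush any leftover phrase
        tokens.append((table[phrase], ""))
    return tokens, table

def lz78_decompress(tokens) -> str:
    """The interpreter: rebuild the table while reading the token string."""
    entries = [""]
    out = []
    for idx, ch in tokens:
        entry = entries[idx] + ch
        entries.append(entry)
        out.append(entry)
    return "".join(out)
```

Note that the table is never transmitted: it is implicit in the token string, which matches the point that the string, the table, and the algorithm aren't really separable.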
Or rather, it's like physics trying to reduce the observed world to the simplest set of equations (Occam's razor), and this methodology results in the most "accurate" equation because more complex equations have more bits that can define equivalent results. Maybe this is useful in physics at least because smaller equations also allow for describing a wider range of things. Longer equations seem more specific, and they are if they are the shortest possible length and there are no equivalents. But now imagine there are a lot of equivalent equations for a given set of observations, which means they will also be more restrictive with respect to each other in terms of "potential" observations not yet tested. Imagine they are all correct in the potential observations they do cover, but not overlapping, because being too "long" makes them too restrictive. Then a simpler equation, if it can be found, is more important.
OK, well, I still haven't gotten close to where I want to go. AI seems to concentrate on "prediction", and in the cases that come closest to achieving "intelligence" it seems to be looking for compression toward a Kolmogorov-ideal simplicity. Sort of like physics equations are the most "intelligent" thing AI knows. The thing I see missing in AI discussion is "achieving profit", whether or not it is defined as "making copies of oneself". Compression/prediction is a type of seeking profit, and I'm sure there are plenty of more directly interpretable profit-seeking algorithms like least-squares curve fitting, hill climbing, and the calculus of variations. Compression/prediction has inherent in it some sort of "model" of whatever is generating the data. It may be related to the actual thing generating the data, or an even more efficient representation of it. What's needed (I think I'm finally getting to my point) is the ability to identify which "inputs to" or "control knobs in" the generator can be controlled at the least expense to get the most profit. Profit needs to be defined for the algorithm, but an intelligent algorithm is one that most efficiently discovers the most efficient algorithm that can get the most profit from a given data generator. Intelligence needs not only to predict, but to change the data stream or the generator in order to profit. A compressor has a certain level of reversibility to it depending on its desired accuracy. Being able to change the data or the generator seems to insert causality.
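As a minimal example of direct profit-seeking rather than pure prediction, here is a hill climber probing a single "control knob" on a generator; the profit function is invented for illustration, and in general the agent would only see profit through the data stream, not as a formula.

```python
import random

def profit(knob: float) -> float:
    """Hypothetical generator response: peak profit of 10 at knob = 3."""
    return -(knob - 3.0) ** 2 + 10.0

def hill_climb(start=0.0, step=0.5, iters=200):
    """Nudge the knob randomly; keep only moves that increase profit."""
    knob, best = start, profit(start)
    for _ in range(iters):
        candidate = knob + random.uniform(-step, step)
        p = profit(candidate)
        if p > best:
            knob, best = candidate, p
    return knob, best
```

Each probe is an intervention on the generator, not a passive observation, which is exactly the causal step a pure compressor never takes.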