Wednesday, December 3, 2014

elements of an intelligent agent

newer summary:
Understanding = the smallest possible algorithm that can generate the previously unpredictable data being observed ("output") from the still-unpredictable data being observed ("input"). AKA Occam's razor in physics, AKA Kolmogorov complexity. The still-unpredictable data can be viewed as the compressed data and the algorithm as the decompressor.
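As a toy illustration only: Kolmogorov complexity itself is uncomputable, but a general-purpose compressor gives an upper bound on the size of the "compressed data + decompressor" pair, so structured observations score far lower than noise. A minimal Python sketch (the data here are made-up stand-ins):

    import os, zlib

    def description_length_bits(data: bytes) -> int:
        # zlib-compressed size: a crude, computable upper bound on Kolmogorov complexity
        return 8 * len(zlib.compress(data, 9))

    predictable = b"abab" * 1000      # highly regular "observations": a tiny program explains them
    unpredictable = os.urandom(4000)  # noise: no structure for the decompressor to exploit

    print(description_length_bits(predictable))    # small
    print(description_length_bits(unpredictable))  # roughly the raw size, ~32,000 bits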

Intelligence = the ability to optimally change the changeable inputs, and the changeable parts or dials of the data-generating "machine" modeled in the understanding definition above, for maximum profit, divided by the minimal number of bits needed to express the intelligence algorithm (the memory space of the algorithm, which is real mass determined by the technology of the hardware), and also divided by the "FLOPS" needed to run the algorithm (computation time, or real energy) to get the profit.

These last two, algorithm storage and execution, can each carry a weighting factor reflecting the cost of the machine the intelligence algorithm runs on, if the intelligence measure is to be penalized for running on expensive hardware like the brain. The ratio of the two weighting factors can be viewed as a conversion factor, the way c converts between mass and energy and between space and time (meters = i*c*seconds; see Einstein's "Relativity," appendix 2). You sum this up over a set of environments to get a more general measure of intelligence for comparison against other algorithms (a comparison that includes the "intelligence" of their machines, if you include the weighting factors and standardize them all to something like the dollar). This is Hutter intelligence with my addition of hardware costs.

If the environment is competitive, other agents' actions are part of the "environment," so "working with" other agents for profit can be more intelligent than understanding the part of the environment that is NOT other agents; but at least some portion of the agents need to understand the environment from which all agents profit. An agent who understands the non-agent environment "the best" (e.g., a physicist + "evolutionist" + mathematician) may seek to profit without regard to the desires of other agents, but the less intelligent agents may consider him a threat and reduce his profit, and therefore the objective measure of his intelligence, unless he takes them into account and controls their art, and therefore their governments, or uses other methods to enslave or destroy the other agents (e.g., "bad" A.I.).
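A rough sketch of that scoring in Python, with every name and number made up for illustration (w_memory and w_flops stand for the two hardware-cost weighting factors, which could be standardized to something like dollars):

    def intelligence_score(runs, w_memory=1.0, w_flops=1.0):
        """Sum over environments of profit divided by weighted hardware cost:
        bits to store the algorithm plus the FLOPs spent earning the profit.
        The ratio w_memory/w_flops plays the role of a conversion factor
        between the two kinds of cost."""
        total = 0.0
        for run in runs:
            cost = w_memory * run["algorithm_bits"] + w_flops * run["flops_used"]
            total += run["profit"] / cost
        return total

    # One agent scored across two made-up environments:
    runs = [
        {"profit": 120.0, "algorithm_bits": 8e3, "flops_used": 1e9},
        {"profit": 40.0,  "algorithm_bits": 8e3, "flops_used": 2e8},
    ]
    print(intelligence_score(runs, w_memory=1e-3, w_flops=1e-9))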


=======
What is the highest intelligence?  The simplest agent that can evolve itself to find the optimal solution to any problem?  Does the K complexity of a trained network (algorithm and memory) take into account the amount of energy it needs to compute?  The amount of hardware?  Both?  Can K complexity refer to an agent that has not yet replicated and designed itself to solve the problem at hand?

The paper by Legg and Hutter, "A Universal Measure of Intelligence for Artificial Agents," defines intelligence as the sum, over each environment the agent was subjected to (aka problem space), of the profit acquired over time from interacting with that environment divided by 2^(Kolmogorov complexity of the agent plus time of computation). This 2^(Kt) factor comes from Levin's Kt complexity definition of 1973.
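In symbols, as I read it (my paraphrase, not their notation): intelligence(agent) = SUM over environments E of V_E(agent) / 2^Kt(agent), where Kt is Levin's 1973 complexity, which charges the program length in bits plus the log of the running time, so time enters the exponent logarithmically.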

Their language is like this: the agent perceives an Observation and a Reward and applies an Action, and this exchange repeats at the next time step. The total Value (reward) is the sum over the time steps i = 1 to n of (1/i^2) * R_i. They select 1/i^2 somewhat arbitrarily as a weighting factor because it is greedy when the interaction is just beginning, but not so greedy once the interaction has been going on a long time.
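A toy version of that value sum, with the 1/i^2 weighting written out (the rewards here are random placeholders, not anything from the paper):

    import random

    def discounted_value(rewards):
        # Value = sum over time steps i = 1..n of (1/i^2) * R_i
        return sum(r / (i * i) for i, r in enumerate(rewards, start=1))

    # Placeholder interaction: rewards are random and the agent's actions are ignored,
    # just to show where the 1/i^2 weighting enters.
    rewards = [random.random() for _ in range(100)]
    print(discounted_value(rewards))  # early rewards dominate; late ones barely count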

So to answer my questions above, it seems they are giving greater value to simpler agents, but penalizing them for time of computation.  It seems to me that intelligence will be situation-dependent based on the weighting factors you give to the computation energy, time, and resources required (like using a larger physical memory to reduce computation time and energy).
The brain has 10,000 or so genes, and if they average 1,000 or so amino acids (~20 possible amino acids ≈ 4.5 bits each), then the brain is encoded in about 5.6 MBytes. I wonder if this is close to the ideal Kolmogorov complexity of the computing brain, as Eric Baum's book "What is Thought" discussed. But the building of the structure is under the influence of the environment, so the brain is made much more specialized, big, and complex as it is built.

Is the information from the environment given during building part of the coding algorithm, part of the training, or both?  The environment is not just the computer upon which the DNA program is being run.  It is providing data to tell the program how to construct and expand itself, i.e. it is part of the program.  But then again, neural nets assign weightings on top of the program, and maybe expansion is no different, so maybe it can be considered training only.  A GA+NN in an environment can also expand itself as the environment dictates.  Maybe it is just a matter of semantics.  If I define the x part of the environment as that which merely decodes the program and expands it, and the remaining y part as training, then I do not know whether 2^(Kt) is thereby decreased, since decoding may not be part of K (the program) or t (the training and computation).  Could someone call training "decoding" and not penalize an algorithm for it?  A large memory in a large network that results in less computation time/FLOPS to solve a specific problem due to its size (using sparse activation) means that the training needed to specify the network of simple elements must be considered part of the program or of the computation time/FLOPS.  I need to read the two classics I have on Kolmogorov.
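The arithmetic behind that estimate, just to check myself (same round numbers as above; log2(20) ≈ 4.3 bits per amino acid):

    import math

    genes = 10_000                       # rough brain gene count used above
    amino_acids_per_gene = 1_000         # rough average length used above
    bits_per_amino_acid = math.log2(20)  # ~4.32 bits to pick one of 20 amino acids

    total_bits = genes * amino_acids_per_gene * bits_per_amino_acid
    print(total_bits / 8 / 1e6)          # ~5.4 megabytes (5.6 MB if you round up to 4.5 bits)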




OK, so that's about measuring intelligence. What an intelligence needs to do is (a bare skeleton in code follows the list):
1) model environment accurately ...
   1b) and simply. The model needs to be reduced in complexity so that 3) and 4) can be accomplished more easily, in addition to the fundamental Occam's razor reasons: being more sure of having hit upon a more "real" truth, and using fewer computing resources and less time.
2) have or discover goals
   2b) This might mean it needs to model itself. It at least means a model will be implicit. (desire)
3) discover what parts of the environment it can change.
   3b) This might mean it needs to model itself. It at least means a model will be implicit. (sense of power)
4) maximize 2), which is NP-hard in general
5) do each of the above most efficiently in terms of program size, memory size, energy required, and computation time. 
6) work with or merge with other A.I. in order to defeat more intelligent A.I. that are not for you, or to serve more intelligent A.I. that are with you.
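A bare skeleton of those elements as code, purely to make the list concrete; every method here is a hypothetical stub, not a working design:

    class Agent:
        """Skeleton only; each method is a stub standing in for items 1-6 above."""

        def __init__(self):
            self.model = None   # 1, 1b) a compact model of the environment
            self.goals = []     # 2) goals, possibly discovered rather than given

        def update_model(self, observation):
            # 1) compress observations into the simplest model that still predicts them
            pass

        def controllable(self):
            # 3) which parts of the environment (and of itself) it can change
            return []

        def choose_action(self):
            # 4) pick the change that best advances the goals; in general intractable,
            # so it must be approximated cheaply (5), possibly in concert with other agents (6)
            return None

        def step(self, observation, reward):
            self.update_model(observation)
            return self.choose_action()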




