Thursday, February 06, 2025

Artificial Intelligence In A Nutshell

Introduction


Artificial Intelligence is the mathematical implementation of intelligence.  A mathematical implementation is called an algorithm.  In grade school, you learned to multiply two big numbers without a calculator.  Remember?  You learned an algorithm: a series of instructions to calculate a result.  Back before computers, there were humans called "computers" who followed complicated algorithms to compute things.  Then computation was automated with electronics.  Algorithms that instruct computers are usually called computer programs.  To implement an algorithm nowadays, you almost always use a computer.  Henceforth we'll use "computer program" and "algorithm" interchangeably.


Intelligence is applied learning.  You first go to school and then apply what you learned to do things of value. Scientific research is societal-scale learning.  Technology development is societal-scale application of societal-scale learning.


Machine learning is the mathematical implementation of learning.  Applied machine learning is another way of defining "artificial intelligence".  But people too often put the cart before the horse, so to speak.  Learning is hard.  Science is hard.  Research is hard.  They are hard and not just because there is no immediate value.  Even if you value learning for its own sake, it is still hard work.  Worse, you're never really "done".  That's why people fail to recognize that the most important thing in artificial intelligence is the mathematization of learning.


The mathematization of learning is called "Algorithmic Information Theory".  But almost no one in the "AI industry" is properly educated in Algorithmic Information Theory!  We'll get around to the strange and bizarre tale of why that is the case.  But it isn't because it is hard to learn what Algorithmic Information Theory is.  That's the easy part of understanding Artificial Intelligence.


Later you'll learn how AI mathematizes technology in something called Sequential Decision Theory.

But in the next few minutes, you'll learn just how simple and important Algorithmic Information Theory is. 

Here we go!

Algorithmic Information Theory (henceforth "AIT")


Imagine Joe can get a billion dollars if he can guess the next number:


100, 200, 300, 400, 500, ?


Joe would decide to say "600". But before that, Scientist Joe would look at the series of numbers to learn the algorithm:

"Whatever the last number was + 100 is the next number."  


Technologist Joe applies that algorithm to get the billion dollars by deciding to say "600".  Why?  It's not just because Joe looked at the series of numbers, learned an algorithm to follow and can add.  It's also because Joe values money!  But notice: Technology depends on getting the science right, so it pays to understand science as a value-neutral activity.
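
If you'd like to see Joe's rule spelled out as code, here is a minimal sketch in Python (the choice of Python and the variable names are ours; only the "last number + 100" rule comes from the story):

    # A minimal sketch of Joe's rule: the next number is the last number plus 100.
    observed = [100, 200, 300, 400, 500]
    prediction = observed[-1] + 100  # "Whatever the last number was + 100 is the next number."
    print(prediction)                # prints 600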

Let's say the numbers given to Joe were, instead:


100, 200, 301, 400, 500, ?


That nasty little "1" stands out there like a sore thumb.  But Joe, who has common sense, would go ahead and bet that the answer is still 600.  (Below we'll get to a less common guess of 601.)  In fact, there is a mathematical proof that common sense is correct!  That proof is the basis of the mathematization of science called Algorithmic Information Theory (henceforth "AIT").


AIT proves you should bet according to the predictions made by the shortest algorithm that generates all past observations. That algorithm can then keep going to generate predictions of future observations.  If you've heard of "generative AI" you now know what "generative" means.

Before AIT was mathematically proven, this rule about how to select the best algorithm to generate predictions was known as Occam's Razor: The simplest explanation is the best.


Here's the shortest computer program for the above sequence of numbers.  Follow the instructions step by step and you can see how AIT works:

PreviousNumber now becomes 0

Repeat Forever:
    ThisNumber now becomes PreviousNumber + 100
    IF ThisNumber is 300:
        THEN Error now becomes 1
        ELSE Error now becomes 0
    Guess ThisNumber+Error
    PreviousNumber now becomes ThisNumber
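
If you'd rather run the program than follow it by hand, here is the same set of instructions sketched in Python (the language choice is ours; the variable names mirror the English version above).  The real program repeats forever, so this sketch only prints the first six guesses:

    # The program above, sketched in Python.  It would repeat forever;
    # here we stop after six guesses: 100, 200, 301, 400, 500, 600.
    previous_number = 0
    for _ in range(6):
        this_number = previous_number + 100
        error = 1 if this_number == 300 else 0  # IF ThisNumber is 300 THEN Error is 1 ELSE 0
        print(this_number + error)              # Guess ThisNumber + Error
        previous_number = this_number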



A less common guess is 601 because some people think the rule is "Every 3rd number will have 1 added to it." 

But this makes the IF condition longer:

    IF Remainder of ThisNumber/300 is 0:

Remember long division from grade school?  When a number isn't evenly divisible by another number (in this case 300) there is a "remainder". So the condition that predicts 601 is longer.
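
Here is the 601 version sketched in Python.  It is identical to the sketch above except for the IF condition, which takes more characters to write:

    # The alternative rule that predicts 601.  Only the IF condition changed,
    # and it is longer than "this_number == 300".
    previous_number = 0
    for _ in range(6):
        this_number = previous_number + 100
        error = 1 if this_number % 300 == 0 else 0  # IF Remainder of ThisNumber/300 is 0 ...
        print(this_number + error)                  # guesses: 100, 200, 301, 400, 500, 601
        previous_number = this_number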

The ideal artificial scientist can discover this algorithm with another algorithm!  It is an algorithm that creates other algorithms: a meta-algorithm.  Here is that ideal artificial scientist (a toy code sketch follows the list):

Automatically write all possible algorithms.
Execute them all.
Keep only those that output 100, 200, 301, 400, 500...
Select the shortest of those algorithms as the best algorithm.
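
No one can literally write and run all possible algorithms (more on that problem in a moment), but here is a toy sketch in Python of the same idea.  It searches only a tiny, hand-picked space of candidate rules of our own invention, and it measures "shortest" crudely as the length of each rule's text:

    # A toy version of the meta-algorithm.  Each candidate "algorithm" is a little
    # Python expression for the nth number in the series (n = 1, 2, 3, ...).
    observed = [100, 200, 301, 400, 500]

    candidate_rules = [
        "100*n",                              # the plain +100 rule, no error term
        "100*n + (1 if n == 3 else 0)",       # add 1 only at the 3rd number
        "100*n + (1 if n % 3 == 0 else 0)",   # add 1 at every 3rd number
        "100*n + n - n",                      # a needlessly long way of writing 100*n
    ]

    def run(rule, how_many):
        # Execute a candidate rule to generate its first how_many numbers.
        return [eval(rule, {"n": n}) for n in range(1, how_many + 1)]

    # Keep only the rules that reproduce every past observation...
    survivors = [r for r in candidate_rules if run(r, len(observed)) == observed]
    # ...and select the shortest of those as the best rule.
    best = min(survivors, key=len)

    print(best)              # prints "100*n + (1 if n == 3 else 0)"
    print(run(best, 6)[-1])  # its prediction for the 6th number: 600

On this toy space, the surviving rule that predicts 600 really is the shortest, which is the common-sense bet Joe made.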


The most obvious practical problem with this is that there are an infinite number of possible programs.  This is the same problem human scientists face when asked to prove that their theory is the best theory possible.  All a human scientist can do is appeal to Occam's Razor or, if he's a physicist, to Albert Einstein's paraphrase of Occam's Razor:

"A theory should be as simple as possible but no simpler."


Of course, not even Einstein could prove that his theories were the simplest possible.  All he could do was argue that they were simpler than other theories people had come up with.


Now for the Big Reveal

The AI Industry has several "But The Dog Ate My Homework!" excuses for ignoring AIT:


* Artificial scientists can't prove a candidate algorithm is the shortest one possible.  Never mind that not even Einstein could prove his theory was the best one possible.

* AI experts are used to measuring error in statistical terms -- not in algorithmic terms.  For example, they can't even begin to think about errors as what are called "program literals" such as "Error = 1" in the example above.

* Critics of AIT say the language in which to express algorithms is "arbitrary" when, in fact, it is no more arbitrary than science's choice of arithmetic to compute predictions.

* They aren't even aware that Algorithmic Information Theory rigorously formalizes science!


This last excuse is the most troubling of all.  Popular "philosophers of science" like Popper and Kuhn spread ignorance of AIT right at the start of the explosion of data and computation, an explosion that has continued, year after year, for 70 years!  Worse, radical social changes made AIT all the more necessary as a way to select which theory of social causation best fits the explosion of social data.  If you want to know why the social sciences lack a principled criterion for selecting the best theory of the cause of social ills, ask Popper and Kuhn.


Given how divisive politics have become -- especially over who can rightfully don the authority of "science" -- perhaps debates over the "OUGHT" of artificial intelligence "alignment" ought to take a breather and get the "IS" of science right, starting with what science really is [1].  Then our decision-makers might have what they need.  That brings us to the other half of Artificial Intelligence: Sequential Decision Theory.

The Other Half of AI: Sequential Decision Theory

Let's return to our human friend Joe for a moment and imbue him with a magic genie Scientist named "Gene".


Joe asks Gene, "What's the shortest algorithm that generates a simulation of everything that has ever happened in the world?"  


Gene says, "Uh, you don't mean that literally do you, Joe?"


Joe says, "Of course not.  I mean just give me a USB thumb drive that has The Algorithm."  


Gene hands Joe the thumb drive. Joe puts it in his laptop's USB port.  The laptop generates the history of the world right up to the present (it's a magic laptop).  And then it keeps on generating future events faster than Reality unfolds.  This lets Joe see the future.  Joe flies to Las Vegas.  He cleans out the casinos, buys a million Bitcoin with the winnings and flees Las Vegas with the mafia thugs hot on his tail. We'll now return to our regularly scheduled lesson.

What you just witnessed was the application of a world model to get value.  Nothing real is remotely that perfect.  But business schools of "management science" teach you about "Decision Trees".  The leaves have various values, like the billion dollars that Joe is after.  Some leaves have negative values, like the mafia thugs catching up with Joe because Joe didn't pay enough attention to his magic laptop to anticipate their moves.  The forks in the branches are decisions, like the proverbial "fork in the road" where you must decide between the path taken and the path not taken.


But these forks have gambling odds associated with them.  Nothing is certain.  You can't see to the end of each fork in the road.  Your Genie and computer aren't magic maps of the future.  But you can simulate the future with the shortest algorithm your AIT meta-algorithm came up with, using whatever computer you do have to explore future scenarios, looking for the greatest expected value given all the risks along the various paths.


So you provide something called a Sequential Decision Theory algorithm with your value system, otherwise called a utility function.  The algorithm refers to that utility function to decide which paths along the branches have the highest expected value, given the sequence of gambling odds along each path and the pot of gold at the end of it.


The Sequential Decision Theory algorithm uses the shortest algorithm that your AIT meta-algorithm could come up with to calculate the gambling odds for each decision along each path.  These gambling odds are called the "Algorithmic Probabilities" of your decision tree.
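
To make that concrete, here is a toy sketch in Python of picking the best path through a tiny decision tree.  Every branch name, probability and dollar figure in it is made up for illustration; in the scheme just described, the gambling odds would instead be the algorithmic probabilities calculated from the shortest world model your AIT meta-algorithm found, and the values would come from your utility function:

    # A toy decision tree.  A node is one of:
    #   ("decision", {choice: child, ...})        -- a fork where Joe chooses
    #   ("chance",   [(probability, child), ...]) -- a fork where the world chooses
    #   ("leaf",     dollars)                     -- a pot of gold (or a mafia thug)
    tree = ("decision", {
        "stay home": ("leaf", 0),
        "go to Las Vegas": ("chance", [
            (0.6, ("decision", {
                "watch the laptop":  ("leaf", 1_000_000_000),
                "ignore the laptop": ("chance", [(0.5, ("leaf", 1_000_000_000)),
                                                 (0.5, ("leaf", -5_000_000))]),
            })),
            (0.4, ("leaf", -5_000_000)),
        ]),
    })

    def expected_value(node):
        kind, body = node
        if kind == "leaf":
            return body  # Joe's utility function: he simply values dollars
        if kind == "chance":
            return sum(p * expected_value(child) for p, child in body)
        # At a decision fork, take the branch with the highest expected value.
        return max(expected_value(child) for child in body.values())

    print(expected_value(tree))  # expected value of playing the tree as well as possible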

[1] See Hume's Guillotine for the IS vs OUGHT distinction in the philosophy of scientific ethics.  See also https://github.com/jabowery/HumesGuillotine