Monday, April 15, 2024

The Scientific Validity of the Algorithmic Information Criterion for Model Selection

Thesis

The Algorithmic Information Criterion (AIC), based on Kolmogorov Complexity, is a valid and principled criterion for model selection in the natural sciences, despite objections regarding the choice of the underlying Turing Machine.

Supporting Arguments

  1. Universality of NiNOR Gates: A finite, directed cyclic graph (DCG) of N-input NOR (NiNOR) gates can, given a starting state, perform any computation that a Turing Machine with finite memory can execute. This universality suggests that the choice of a specific computational model can be principled, akin to choosing an axiomatic basis in mathematics.
  2. Minimization of NiNOR Complexity: Consider two programs: an instruction set emulator that simulates a directed cyclic graph of NiNOR gates (the graph in turn provides the instruction set), and a second program, written in that instruction set, that outputs a given dataset. NiNOR Complexity is then defined, with no free parameter, as the minimum combined length of these two programs. Because both programs are written in the same non-arbitrary instruction set, this factors out any arbitrary Universal Turing Machine that might be chosen to emulate the instruction set emulator.
  3. Philosophical Consistency with Scientific Methods: With the "arbitrary" parameter removed from Kolmogorov Complexity's definition of Algorithmic Information, Solomonoff's proofs can be revisited with no parameter more subjective than the proof of NOR gate universality. All that must be given up is the notion of infinities. Moreover, this revised Algorithmic Information Criterion for model selection retains its relevance to the dynamical systems of the natural world, a decisive advantage over statistical information criteria.
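
The universality claim in argument 1 can be made concrete with a small sketch. The Python below is an illustration only, not the construction the argument relies on: the helper names (nor, not_, or_, and_, xor_, latch_step, settle) are hypothetical, and the synchronous update rule for the cyclic graph is an assumption made for simplicity. It builds the usual Boolean connectives from NOR alone, then steps a two-gate cyclic graph (a cross-coupled NOR latch) to show that a cycle of NOR gates holds state.

```python
from itertools import product


def nor(*inputs: bool) -> bool:
    """N-input NOR: true only when every input is false."""
    return not any(inputs)


# The familiar connectives, expressed with NOR gates only.
def not_(a: bool) -> bool:
    return nor(a)


def or_(a: bool, b: bool) -> bool:
    return nor(nor(a, b))


def and_(a: bool, b: bool) -> bool:
    return nor(nor(a), nor(b))


def xor_(a: bool, b: bool) -> bool:
    # (a OR b) AND NOT (a AND b)
    return and_(or_(a, b), nor(and_(a, b)))


def check_connectives() -> bool:
    """Compare each NOR-built connective with Python's own over all inputs."""
    return all(
        not_(a) == (not a)
        and or_(a, b) == (a or b)
        and and_(a, b) == (a and b)
        and xor_(a, b) == (a != b)
        for a, b in product([False, True], repeat=2)
    )


def latch_step(state, s: bool, r: bool):
    """One synchronous update of two cross-coupled NOR gates (assumed timing)."""
    q, qn = state
    return nor(r, qn), nor(s, q)


def settle(state, s: bool, r: bool, steps: int = 4):
    """Apply latch_step a fixed number of times with the inputs held constant."""
    for _ in range(steps):
        state = latch_step(state, s, r)
    return state
```

Calling settle((False, True), s=True, r=False) drives the latch to its set state, and a further call with both inputs false leaves that state unchanged, which is the memory a cyclic graph provides and an acyclic one does not.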

Counterarguments

  • Claim of Arbitrary Turing Machine Choice: Critics argue that the choice of Turing Machine in determining Kolmogorov Complexity is arbitrary because one can tailor a machine's instruction set to trivially minimize the complexity of any given dataset.
  • Reductio ad Absurdum on Turing Machine Instruction Set: Critics might argue by reductio ad absurdum, proposing a Turing Machine whose instruction set includes a command that outputs the entire dataset in a single instruction, thus falsely reducing the complexity to an absurdly low value.

Rebuttals

  1. Non-Arbitrariness in Computational Model Choice: The choice of a particular model and its instruction set reflects underlying computational principles (e.g., the universality of NiNOR gates) and is not more arbitrary than foundational decisions in other scientific and mathematical fields.
  2. Logical Flaw in Critics’ Argument: The critics’ approach of arbitrarily defining a Turing Machine’s instruction set to minimize complexity does not account for the complexity of the instruction set itself, in which the dataset is now encoded. By focusing on trivializing the output instruction, critics overlook the cost of the instruction set’s design, which contributes directly to the system's overall complexity. This misrepresents the principle of Kolmogorov Complexity, which aims to measure the minimal description length of the dataset in a way that genuinely reflects its informational content, rather than artificially minimizing it through tailored instruction sets.
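
The accounting behind rebuttal 2 can be shown with a toy example. This is a minimal sketch under loud assumptions: Python source length in bytes stands in for the length of a NiNOR emulator description, and DATA, GENERIC_SRC, TAILORED_SRC, load, and total_length are hypothetical names introduced only for illustration. It demonstrates the bookkeeping, not a proof of the general claim. The "tailored" machine has a one-byte program that emits the whole dataset, but only because the dataset has been copied into the machine's own description.

```python
# Hypothetical 500-byte dataset used only for this illustration.
DATA = b"0123456789" * 50

# A generic machine: the program is simply the literal output.
GENERIC_SRC = "def run(program):\n    return program\n"

# A tailored machine: the single opcode b'\x00' emits the whole dataset,
# which therefore has to be embedded in the machine's description.
TAILORED_SRC = (
    "def run(program):\n"
    "    return " + repr(DATA) + " if program == b'\\x00' else program\n"
)


def load(source: str):
    """Build the machine's run function from its source description."""
    namespace = {}
    exec(source, namespace)
    return namespace["run"]


def total_length(source: str, program: bytes) -> int:
    """Length of the machine description plus length of the program it runs."""
    return len(source.encode("utf-8")) + len(program)


generic_program = DATA
tailored_program = b"\x00"

generic_total = total_length(GENERIC_SRC, generic_program)
tailored_total = total_length(TAILORED_SRC, tailored_program)
```

Both machines output DATA, and the tailored program on its own is a single byte. Once the machine description is counted, however, tailored_total exceeds len(DATA), and in this toy case it also exceeds generic_total: the dataset's length has moved into the instruction set rather than disappeared.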

Conclusion

The critique of the Algorithmic Information Criterion (AIC) that rests on a supposedly arbitrary choice of Turing Machine in Kolmogorov Complexity does not withstand scrutiny. Properly understood and applied, the AIC robustly captures the essential complexity of datasets, consistent with Solomonoff's proofs. This complexity includes the design of the instruction set itself, which should not be arbitrarily minimized to misrepresent the dataset's intrinsic informational content. Thus, the AIC remains a principled and effective method for model selection in the natural sciences. Indeed, on this view, prior criticisms based on the supposedly subjective choice of UTM are not only specious but harmful to the scientific enterprise.
