Mon, May 4, 7:56 PM
To Antonio Badia
Professor Badia,
My motivation in 2005 for suggesting Wikipedia as the corpus for The Hutter Prize For Lossless Compression of Human Knowledge was to factor out bias in Wikipedia, and generate latent identities responsible for that bias. Therefore the recent surge of interest in Algorithmic Bias has left me looking for those who approach the topic from an information theoretic perspective.
That's how I found "The Information Manifold: Why Computers Can't Solve Algorithmic Bias and Fake News".
The importance of this topic is hard to exaggerate. I was one of the earliest (1982) prognosticators of what has now emerged as "The Trump Phenomenon":
There is a tremendous danger that careless promotion of deregulation will be dogmatically (or purposefully) extended to the point that there may form an unregulated monopoly over the information replicated across the nation-wide videotex network, now under development. If this happens, the prophecies of a despotic, "cashless-society" are quite likely to become a reality. My opinion is that this nightmare will eventually be realized but not before the American pioneers have had a chance to reach each other and organize. I base this hope on the fact that the first people to participate in the videotex network will represent some of the most pioneering of Americans, since videotex is a new "territory".
I predicted, then, that the response could be a potentially disastrous imposition of monopolistic censorship. That's why I approach the topic more seriously, and with greater prescience, than the vast majority of recognized experts in the field. To wit: this censorship is now placing the West on the precipice of the modern equivalent of The Thirty Years War, which killed upwards of 20% of the population and ended in The Peace of Westphalia.
It is with this grave perspective that I bring to your attention an omission in your otherwise admirable information theoretic ansatz.
Bias (and fakery) is generated by suboptimal models of the world. Our task, in service of truth, is to find the optimal model given the limited intelligence and data at our disposal.
What the late Ray Solomonoff proved was that if an algorithm generates the world we measure, the smallest program that outputs all of those measurements provides optimal predictions about the world (a formal sketch of the result follows the list below). There are three critiques some may level at this approach to optimal model selection:
1. It is not provable that the world is generated by an algorithm.
2. Since a given program cannot be proven to be the smallest capable of outputting the measurements, it cannot be proven optimal.
3. Since not all measurements are available in a single corpus, some selection process must take place, and this selection may bias the data itself.
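For concreteness, here is the result as it is usually stated in the modern literature (the notation used by Hutter and by Li and Vitányi, a sketch rather than Solomonoff's original framing), where U is a universal prefix machine, ℓ(p) is the length of program p, K(μ) is the Kolmogorov complexity of the true computable distribution μ generating the measurements, and M is the length-weighted mixture over all programs consistent with the data seen so far:

\[
M(x) \;=\; \sum_{p \,:\, U(p)=x*} 2^{-\ell(p)},
\qquad
\sum_{t=1}^{\infty} \mathbb{E}_{\mu}\Big[\big(M(x_t{=}1 \mid x_{<t}) - \mu(x_t{=}1 \mid x_{<t})\big)^{2}\Big] \;\le\; \frac{\ln 2}{2}\,K(\mu).
\]

The finite bound on total prediction error is what licenses calling length-weighted prediction optimal, and since shorter programs dominate the mixture, the smallest program consistent with the measurements carries most of the predictive weight.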
#1 is facile, since all engineering (including the social engineering entailed in dealing with bias and fake news) more or less formally calculates predicted outcomes, and so already treats the world as if it were generated by computable regularities.
#2 Although I can appeal to the long history of science that, for whatever reason, finds this heuristic convincing, and although I can appeal to authorities you likely respect on this exact issue (Minsky, and Chomsky, who considers Minsky the authority in this regard), as well as to recent authoritative articles gaining acceptance, I'll simply state that any program that outputs all prior measurements comes under increasing constraints as its size decreases. These constraints narrow the range of universes in which its predictions can be true. This is the essence of information: information constrains possibilities.
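To make the "constraints increase as size decreases" point concrete, here is a minimal sketch, in Python, of two-part description-length (MDL-style) model comparison. The candidate models, the noise-free data, and the use of bz2 as a stand-in for a real codelength estimate are all illustrative assumptions of mine, not anything from the Hutter Prize itself:

```python
# Two-part MDL sketch: a "program" is a model description plus the residual
# bits needed to reproduce the measurements exactly.
import bz2
import pickle

def codelength_bits(model_params, residual_values):
    """Total description length: pickled model parameters + compressed residuals."""
    model_bytes = pickle.dumps(model_params)
    residual_bytes = bz2.compress(pickle.dumps(residual_values))
    return 8 * (len(model_bytes) + len(residual_bytes))

# Hypothetical measurements generated by y = 2*x + 3 (noise-free).
data = [(x, 2 * x + 3) for x in range(100)]

candidates = {
    "constant": {"c": 103},        # model: y ~ c
    "linear":   {"a": 2, "b": 3},  # model: y ~ a*x + b
}

def residuals(name, params):
    if name == "constant":
        return [y - params["c"] for _, y in data]
    return [y - (params["a"] * x + params["b"]) for x, y in data]

scores = {name: codelength_bits(params, residuals(name, params))
          for name, params in candidates.items()}
print(scores, "-> prefer", min(scores, key=scores.get))
```

The model that actually captures the regularity leaves nothing for the residual stream to encode, so its total description is smaller, and it is also the one whose predictions of the next measurement hold up.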
#3 In addition to the search for unbiased truth ("is"), a plausible universal value we might agree on ("ought") is that avoiding a modern version of The Thirty Years War is highly desirable. To the degree that various parties claim a given corpus is biased, they presumably have reasons that can be backed up by observations. A simple example might be a set of measurements showing that water freezes at 1C and boils at 101C. Claims that these measurements are biased are themselves based on data from other measurements, not just of temperature but of a variety of physical phenomena, which together provide an overcomplete basis set, triangulating on a model of the world that reifies a latent identity, the thermometer in question, and a correction which, when applied to its measurements, brings it into consilience.
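As a toy version of that triangulation, here is a sketch (again in Python; the reference values, the suspect readings, and the constant-offset bias model are assumptions invented for illustration) that reifies the latent identity, the miscalibrated thermometer, and infers the correction that brings its measurements into consilience with independently established reference phenomena:

```python
# Reference values established by independent measurements vs. readings from
# one suspect instrument at the same physical checkpoints.
reference = {"water_freezes_C": 0.0, "water_boils_C": 100.0}
suspect   = {"water_freezes_C": 1.0, "water_boils_C": 101.0}

# Reify the latent identity: a constant per-instrument offset, estimated as the
# mean disagreement across the checkpoints.
offsets = [suspect[k] - reference[k] for k in reference]
bias = sum(offsets) / len(offsets)   # 1.0 degree for this thermometer

def corrected(reading_c: float) -> float:
    """Apply the inferred correction so the instrument's data become consilient."""
    return reading_c - bias

print(f"estimated bias: {bias:+.1f} C; a raw reading of 37.5 C corrects to {corrected(37.5):.1f} C")
```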
Note that #3 accomplishes precisely what I wished to accomplish in suggesting to Marcus Hutter not just Wikipedia but its change log as the corpus. I wanted to identify the culprits responsible for information sabotage. However, since the change log was too large for impoverished but gifted programmers around the world to handle on the computers available at that time, I demurred, accepting that latent identities would have to emerge from the competition.
The Hutter Prize is a small, but essential, step toward an unbiased model of digital content. The funding for the Hutter Prize should be orders of magnitude larger than it is, not only because of the urgent need for such a model, and not only because the simple and rigorously justifiable model selection criterion of a single figure of merit, size, reduces the potential for motivated argumentation, but because no money is paid out without a proportionate improvement in the resulting model. This stands in contrast to the vast sums now being paid to deal with a lack of reliable information -- sums paid for techniques that frequently pour gasoline on the fire of bias.
-- Jim Bowery, Hutter Prize Judging Committee Member
PS: I want to make clear that my motives are different from Marcus Hutter's. In his recent expansion of the scope and size of the prize, he changed the rules to discourage manual model creation and encourage automatic model generation. As a senior scientist at DeepMind (and PhD advisor to its founders), he is, in effect, doing his job. In the original rules I had strongly advocated that a prize entry need consist only of a compressed corpus, so that human beliefs could make direct contributions in the form of large-scale production systems whose rules could serve as search heuristics for optimal models.