Wednesday, April 26, 2017

Rescuing Computer Science With the Relational Dimensions of the Empirical World

First I'm going to make a few radical assertions:

  • A real-world relation is best regarded as a random variable.  Think of measurement.  This is consistent with SQL's default allowance of duplicate rows in an extension.  Counting the duplicates yields a count table that represents the probability distribution of the random variable.  Each relationship (row) in an extension is, therefore, best thought of as a single measurement, or case, and the duplicate row counts are case counts.  A probability distribution (strictly a probability mass function, since the cases are discrete) results from simply dividing each case's count by the total count in the relation.
  • The properties of a measurement (say, time and distance) are the dimensions of the measurement and these correspond to the columns of the extension.
  • Any measurement can be thought of as a low-dimensional selected projection of the empirical world: the universe. The universal extension has a single row -- a row with as many dimensions as the entire history of the universe has properties:  We might call this row "That which is The Case."
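The first assertion can be sketched in a few lines of Python. This is only an illustration under assumed data: the rows, the column meanings (time, distance) and the helper names `case_counts` and `distribution` are hypothetical, not taken from any particular database.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical extension with duplicate rows. Each row is one measurement
# (case) with assumed columns (time_s, distance_m).
rows = [
    (1, 3),
    (1, 3),
    (2, 6),
    (1, 4),
]

def case_counts(rows):
    """Collapse duplicate rows into a count table: row -> number of cases."""
    return Counter(rows)

def distribution(counts):
    """Divide each case count by the total count in the relation."""
    total = sum(counts.values())
    return {row: Fraction(n, total) for row, n in counts.items()}

counts = case_counts(rows)
pmf = distribution(counts)
```

Here the duplicated row (1, 3) has a case count of 2 out of 4 cases, so its relative frequency is 1/2, and the frequencies over all distinct rows sum to 1.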

Now, accepting all of that (which philosophers may well argue against -- particularly if they don't like Descartes, etc.):

In order for the random variable to have meaning, its dimensions must have counts (exponents), just as its duplicate rows do.  For instance, we might think of a relation whose composite dimension is velocity, with columns: time and distance.  Although there might be meaning to a physical dimension of time*distance (time^(+1) * distance^(+1)), that is not the physical dimension we call "velocity".  To obtain a velocity relation, we need distance/time, which is time^(-1) * distance^(+1).  Note that these terms commute because multiplication (like 'and') commutes.  Column order is meaningless, just as row order is meaningless, since addition (like 'or') also commutes.

Now consider the relational dimension of energy where we join the velocity relation with a mass relation and assign column counts thus: time^(-2) * distance^(+2) * mass^(+1).
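One way to make the column counts concrete is to treat a dimension as a mapping from column name to exponent, so that composing dimensions adds exponents. The following is a minimal sketch under that assumption. The helpers `dim`, `mul` and `power` are hypothetical names introduced for illustration.

```python
def dim(**exponents):
    """A dimension as a mapping column -> integer exponent (zeros dropped)."""
    return {k: v for k, v in exponents.items() if v != 0}

def mul(a, b):
    """Compose two dimensions by adding exponents column by column."""
    out = {k: a.get(k, 0) + b.get(k, 0) for k in set(a) | set(b)}
    return {k: v for k, v in out.items() if v != 0}

def power(a, n):
    """Raise a dimension to an integer power by scaling every exponent."""
    return {k: v * n for k, v in a.items() if v * n != 0}

time = dim(time=1)
distance = dim(distance=1)
mass = dim(mass=1)

# distance/time is time^(-1) * distance^(+1)
velocity = mul(distance, power(time, -1))

# mass * velocity^2 is time^(-2) * distance^(+2) * mass^(+1)
energy = mul(mass, power(velocity, 2))
```

Because the mapping is unordered and exponent addition commutes, `mul(time, distance)` and `mul(distance, time)` are equal, while both differ from `velocity`.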

Note that thus far, I have not talked about "units", nor of "types".  First a down-to-earth comment about "units":  It is important to regard "units" as I/O formats (or "representations") with isomorphic transformations between them (1:1 correspondence between a distance measurement in inches and one in feet).  Second is a more philosophical comment about "types" vs "dimensions" that gets to the heart of what I believe is a huge mistake in the foundation of computer science dating to Russell and Whitehead's Principia Mathematica:
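The point about units as representations can be illustrated with a round trip between inches and feet. This is a sketch only: the `INCHES_PER` table and the `convert` helper are hypothetical, and exact fractions are used so that the 1:1 correspondence is not blurred by floating-point rounding.

```python
from fractions import Fraction

# Scale factors relative to an assumed canonical representation (inches).
INCHES_PER = {"inch": Fraction(1), "foot": Fraction(12)}

def convert(value, from_unit, to_unit):
    """Re-express the same distance in another unit; the dimension is unchanged."""
    return Fraction(value) * INCHES_PER[from_unit] / INCHES_PER[to_unit]

feet = convert(30, "inch", "foot")    # same distance, written in feet
back = convert(feet, "foot", "inch")  # the inverse transformation
```

Converting to feet and back returns the original value exactly, which is what it means for the two representations to be isomorphic.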

PM's type theory (and elaborations/variations thereof) is the current foundation of computer science.  Russell used it to develop Relation Arithmetic.  In "My Philosophical Development", writing of Principia Mathematica Part IV, "Relation Arithmetic", Bertrand Russell laments:

"I think relation-arithmetic important, not only as an interesting generalization, but because it supplies a symbolic technique required for dealing with structure. It has seemed to me that those who are not familiar with mathematical logic find great difficulty in understanding what is meant by 'structure', and, owing to this difficulty, are apt to go astray in attempting to understand the empirical world [emphasis JAB]. For this reason, if for no other, I am sorry that the theory of relation-arithmetic has been largely unnoticed."

However, the ultimate project of Principia Mathematica was directed at "the empirical world" in the conclusion of PM: Part VI, "Quantity".  "Quantity" consists of three sections.  The last, section "C", is about "Measurement" in terms of a generalization of the concept of number (section "A") to include units of measurement (mass, length, time, etc.) as commensurable (dimensioned) quantities (section "B", "Vector-Families").

Yet, other than *314:

"Relational real numbers are useful in applying measurement by means of real numbers to vector-families, since it is convenient to have real numbers of the same type as ratios."

I see nothing in Part VI that references anything like "relation numbers" as defined in Part IV.

Before I get into a resolution strategy, I want to add one final issue that is key to understanding relational structure in terms of measurement:

Any value that we assign to a cell in a table has what is called "measurement error".  Note, I'm talking here not of a relation (table) nor of a relationship (row), but of a relatum (cell value) of that relationship.  Take, for instance, a table of velocities with time and distance columns.  Each case (row, or relationship between measured properties) has two measurements for that case: a measured distance and a measured time.  What we call "measurement error" is an estimate of the probability distribution that would prospectively obtain with repeated measurements under the same conditions.  In other words, assigning measurement error, or "fuzziness", is best thought of as imputing missing data -- those prospective measurements just mentioned.  In any rigorous attempt to deal with the fuzziness of the real world, it is important to keep in mind the relational structure of the measurements so that propagation of measurement error is understood in terms of relational composition (aka 'JOIN' to use database jargon).
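As a rough illustration of error propagation as relational composition, each fuzzy cell can be written as a small count table of imputed repeat measurements, and the two tables joined. Everything here is assumed for the sake of the sketch: the sample values, the helper name `join_velocity`, and above all the independence of the distance and time errors, which is what justifies a plain cross join with multiplied counts.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical imputed repeat measurements for a single case, as count tables.
distance_m = Counter({99: 1, 100: 2, 101: 1})
time_s = Counter({9: 1, 10: 2, 11: 1})

def join_velocity(distance_counts, time_counts):
    """Cross join the two count tables, multiplying case counts, and
    record distance/time for every joined row (assumes independent errors)."""
    out = Counter()
    for (d, nd), (t, nt) in product(distance_counts.items(), time_counts.items()):
        out[Fraction(d, t)] += nd * nt
    return out

velocity = join_velocity(distance_m, time_s)
total_cases = sum(velocity.values())
```

The joined table has 4 * 4 = 16 cases in total, and the spread of velocity values across those cases is the propagated "fuzziness" of the derived measurement.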

Now to proceed to the resolution strategy:

Late in Russell's life he admitted he regretted Type Theory, stating it was the most arbitrary thing he and Whitehead did and that it was more of a stopgap than a theory.

As it turns out, Russell admitted this because he was relieved and delighted that he had lived long enough to see the matter resolved in the late 1960s book titled "Laws of Form" by G. Spencer Brown.  The resolution was to include what logicians think of as "paradox" as a primary foundation, if not the primary foundation, of mathematical logic:

Russell's Paradox (the set of all sets that don't contain themselves as members), which motivated PM's Type Theory, is only one form of this protean "paradox".  The most laconic form is:

"This sentence is false."

The resolution provided in GSB's LoF was to introduce the square root of -1 as primary in mathematical logic.  This is otherwise known as the imaginary unit 'i' found throughout all of dynamical systems theory.  Dynamical systems are about changes.  In relational database terms, these are updates.  Relational updates are addition and subtraction of rows.

Under the notion of row-as-relationships-as-case, subtraction entails negative case counts.
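A signed count table makes this concrete. In the sketch below, which uses hypothetical rows and a hypothetical helper `apply_update`, an update is a batch of rows to add and rows to subtract. Subtracting a row that is not present leaves a case count of -1 rather than an error.

```python
from collections import Counter

def apply_update(relation, inserts=(), deletes=()):
    """Return a new signed count table after adding and subtracting rows."""
    updated = Counter(relation)
    updated.update(inserts)     # +1 case per inserted row
    updated.subtract(deletes)   # -1 case per deleted row; may go below zero
    return updated

relation = Counter({("a", 1): 2})
after = apply_update(
    relation,
    inserts=[("b", 2)],
    deletes=[("a", 1), ("c", 3)],   # ("c", 3) was never present
)
```

Python's `Counter.subtract` keeps zero and negative counts, which is why it is used here instead of the `-` operator, which discards them.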

Interestingly, negative case counts permit the emergence of something called Link Theory, which Paul Allen's think tank, Interval Research, supported until its demise.  At that point I supported it at HP's "Internet Chapter II" project, aka "eSpeak", until _its_ demise, at which point Federico Faggin (co-founder of Intel's microprocessor division) underwrote its final support at Boundary Institute.

Link Theory utilized negative case counts to provide a relational description of physics including the core of quantum mechanics -- and was therefore of interest in the quantum computing field.  This is due to the fact that quantum measurement involves projection (as do all measurements -- see my prior invocation of "That which is The Case.") that includes not only ordinary probabilities, but also what are called "probability amplitudes".  Quantum probability amplitudes are complex-valued, and complex values have imaginary components.  Link Theory accommodated QM's imaginary components with a particular symmetry used by George W. Mackey in his 1963 book "Mathematical Foundations of Quantum Mechanics", representing 'i' as a 2x2 spinor matrix:

 0  1
-1  0

See Appendix A of "Link Theory -- From Logic to Quantum Physics".

The -1 in this spinor corresponds to the negative case counts required for relational structure to encompass quantum measurement.
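That the matrix above behaves like 'i' can be checked directly: multiplied by itself it gives the negative of the identity matrix. The sketch below uses plain Python lists. The helper names `matmul` and `as_matrix` are hypothetical, and `as_matrix(a, b)` is the standard real-matrix representation of the complex number a + bi, built from this matrix.

```python
def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

I = [[1, 0], [0, 1]]
J = [[0, 1], [-1, 0]]   # the matrix standing in for 'i'

J2 = matmul(J, J)       # plays the role of i * i
J4 = matmul(J2, J2)     # plays the role of i ** 4

def as_matrix(a, b):
    """Represent the complex number a + b*i as a*I + b*J."""
    return [[a, b], [-b, a]]

# (1 + 2i) * (3 - i) = 5 + 5i, computed with matrices only.
product_matrix = matmul(as_matrix(1, 2), as_matrix(3, -1))
```

The negative entries that appear in `J2` come entirely from the single -1 in `J`, and matrix multiplication of these representations reproduces ordinary complex multiplication.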

Federico Faggin supported this work because hardware design languages needed a formal theory other than conventional logic to model digital circuits with feedback (i.e., memory, state change, etc.).  George Spencer Brown developed his mathematics as a result of inventing minimal circuits in the early days of the transistor -- and found he was working with imaginary logic values.

So, tying this all together to address the original point:  It would appear that the computer science notion of "type" is not only ill-founded -- leading to all manner of confusion regarding "the empirical world" (in Russell's apt descriptive phrase) -- but is recognized as being ill-founded by its founder!

My assertion is that the notion of "type" is rescued by the notion of "unit" and that "abstract type" is rescued by the notion of "dimension" within the relational paradigm. That this might be the case should be no surprise as the natural sciences (particularly physics) most rigorously address "the empirical world".

Once we accept the framework of dimensionality as relational structure, we can see, further, the potential for new modes of schema analysis based on the scientific discipline of dimensional analysis.