Optimum data objects for technical literacy
14 Old Village Road
Acton Mass 01720 USA
Tel: 978 263-3508
Development of technical literacy in most areas of science and technology has been impeded by a lack of understanding of a fundamental structure in information technology, the structure of data objects. There is an optimum structure for data objects: hierarchically interconnected pointers. It is an optimum in much the same sense that wheels should be round, pillars should be vertical, or pipes should be tubular (Lowry, 1991, 1997a).
Using optimum data objects can free workers, students, and educators from several kinds of complexity burden when working with technical information whether or not computers are used. Understanding data object structure leads to a practical "universal language supporting technical literacy." When functionality for many problem domains is expressed in terms of the same optimum data object structure, the functionality can be easily merged into one common language semantics. The result can be highly readable, precise, and complete. It can provide a basic new medium for technical communication and technical education (Lowry, 1996, 1997c).
What is a reasonable structure for data objects? While data objects are among the most important basic structures in any technology, they are so far among the most poorly understood. The situation may have no historical precedent. From arrowheads to turbine blades, the developers of technology have always had good reason to give meticulous attention to the structures of the basic components of their artifacts. Reasons for trying to understand the fine structure of information and its representations include:
Over 30 calendar years and 70 person-years of experience with a series of languages (Lowry, 1977; Van Horn, 1985) (now called Shannon) have clarified the optimization issues. Hypotheses have been presented fairly widely to the technical community along with a $5000 offer for refutations. The response has been limited but supportive. The language as designed in 1974 provides much better simplicity of expression than anything commercially available today. The torrent of needless complexity going into schools as educational technology is largely the result of decades of corporate policies to entangle users in proprietary complexity. This could be judged as one of the most disgraceful episodes in the history of technology (Lowry, 1997b).
Simplification is basic to any engineering and central to information technology. Excess complexity damages quality in almost all dimensions and defeats a central goal of information technology: increasing the productivity of mental effort.
Improved ease of access to technical knowledge could alter assumptions about where just-in-time learning is appropriate. Use of formal language to present technical material could alter the current assumption that the main benefits of educational technology depend on expensive and rapidly changing hardware and support. Paper-based representational technology could reduce much concern about a developing "digital divide".
The simplifications and common technical language could also alter assumptions about what kinds of information are susceptible to technical representation and analysis. Law, organizational administration, equipment manuals, philosophy, etc., may become better expressed with the help of improved formal language.
Optimum data objects could be introduced prior to and as an aid to learning algebra or any technical subject matter beyond arithmetic. Some parts can be introduced earlier when instructing computers to draw simple designs. Hierarchically organized pointers can be treated as a general-purpose mental playground which becomes increasingly rich, eventually encompassing models of any technical system or subject.
Hierarchically interconnected pointer data objects: Needles
Maximizing simplicity favors language semantics based on hierarchically interconnected pointers. They are referred to here as needles by analogy with pine needles, which are pointed at one end and connected to trees at the other. The following hypothesis, its 7 subsidiary hypotheses, and their justifications are amplified in (Lowry, 1997a):
For a sufficiently rich set of deterministically defined applications, whenever the total complexity of a deterministic language definition plus the expression of the applications in it is minimum, then the data objects will form hierarchies where all objects are "pointers" (or needles) that point away from a parent in the hierarchy and have secondary connections to at most two siblings and two children in the hierarchy.
This hypothesis summarizes the sequence of 7 hypotheses described below.
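As a rough illustration of the hypothesis, the Python sketch below models a needle with only the five hierarchy links it names: one parent, two siblings (predecessor and successor), and two children (first and last). The class name `Needle`, its field names, the `add` method, and the `walk` function are hypothetical choices for this sketch; they are not taken from Shannon or from (Lowry, 1997a). The `name` field is a printing label only, not a connection.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)
class Needle:
    """Hypothetical needle: parent, two sibling links, two child links."""
    name: str  # printing label only, not a connection
    parent: Optional[Needle] = field(default=None, repr=False)
    pred: Optional[Needle] = field(default=None, repr=False)
    succ: Optional[Needle] = field(default=None, repr=False)
    first_child: Optional[Needle] = field(default=None, repr=False)
    last_child: Optional[Needle] = field(default=None, repr=False)

    def add(self, name: str) -> Needle:
        """Append a new needle as the last child of self and return it."""
        child = Needle(name, parent=self, pred=self.last_child)
        if self.last_child is None:
            self.first_child = child
        else:
            self.last_child.succ = child
        self.last_child = child
        return child


def walk(root: Needle, depth: int = 0) -> Iterator[tuple[int, str]]:
    """Depth-first walk that follows only first_child and succ links."""
    yield depth, root.name
    child = root.first_child
    while child is not None:
        yield from walk(child, depth + 1)
        child = child.succ


# A tiny accounting-style hierarchy (illustrative data only).
ledger = Needle("ledger")
cash = ledger.add("cash")
cash.add("deposit")
ledger.add("payables")
for depth, name in walk(ledger):
    print("  " * depth + name)
```

Every needle is reachable from the root by following child and successor links, so the hierarchy alone locates all objects; no index or storage position is needed.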
An expression in a language based on needles could be presented in a textual form such as:
82 = count every element where some isotope of it is stable;
Examples illustrating application to descriptions of chemistry, accounting, and particle physics are available (Lowry, 1997c).
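The expression "count every element where some isotope of it is stable" can be mimicked in Python under several assumptions. In this sketch, `every`, `some`, and `count` are hypothetical stand-ins for the quantifiers in the expression, each iterating over the children of a needle, and isotope stability is represented as a "class" connection to a shared `STABLE` or `UNSTABLE` needle. The data are a toy subset of four elements, so the result is 2, not 82; this is not how Shannon itself evaluates the expression.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass(eq=False)
class Needle:
    name: str  # printing label only, not a connection
    cls: Optional[Needle] = field(default=None, repr=False)  # "class" connection
    parent: Optional[Needle] = field(default=None, repr=False)
    succ: Optional[Needle] = field(default=None, repr=False)
    first_child: Optional[Needle] = field(default=None, repr=False)
    last_child: Optional[Needle] = field(default=None, repr=False)

    def add(self, name: str, cls: Optional[Needle] = None) -> Needle:
        """Append a child needle, optionally connected to a class needle."""
        child = Needle(name, cls=cls, parent=self)
        if self.last_child is None:
            self.first_child = child
        else:
            self.last_child.succ = child
        self.last_child = child
        return child


def every(parent: Needle) -> Iterator[Needle]:
    """Uniform iteration over the children of a needle."""
    node = parent.first_child
    while node is not None:
        yield node
        node = node.succ


def some(parent: Needle, test: Callable[[Needle], bool]) -> bool:
    return any(test(n) for n in every(parent))


def count(parent: Needle, test: Callable[[Needle], bool]) -> int:
    return sum(1 for n in every(parent) if test(n))


# Toy data: four elements whose isotopes point to a shared class needle.
STABLE, UNSTABLE = Needle("stable"), Needle("unstable")
elements = Needle("elements")
for elem, isotopes in {
    "H": [("H-1", STABLE), ("H-3", UNSTABLE)],
    "C": [("C-12", STABLE), ("C-14", UNSTABLE)],
    "Tc": [("Tc-98", UNSTABLE), ("Tc-99", UNSTABLE)],
    "Pm": [("Pm-145", UNSTABLE), ("Pm-147", UNSTABLE)],
}.items():
    e = elements.add(elem)
    for iso, kind in isotopes:
        e.add(iso, cls=kind)

# count every element where some isotope of it is stable
result = count(elements, lambda el: some(el, lambda iso: iso.cls is STABLE))
print(result)  # 2 for this toy subset
```

The nesting of `count` and `some` mirrors the nesting in the textual expression; both rely only on the same child iteration over the hierarchy.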
Constraints on data object structures
The main hypothesis above implies a minimization problem. Solving the problem can be broken into the application of a succession of constraints with associated hypotheses. Different kinds of excess complexity arising from different kinds of suboptimal design seem sufficiently independent that relatively simple considerations are adequate to show how the minimization imposes the succession of constraints. The breakdown helps clarify, for each structural constraint, its advantages, its lack of disadvantages, and its independence of subject matter. The overly brief rationale supporting each is expanded in (Lowry, 1997a).
1. The data objects are modular, in the sense that they connect only to each other, not to any embedding space or storage space. Reason: this eliminates complications when inserting and deleting objects.
2. The only features of a data object are a finite number of connections to data objects. Reason: complexities of internal state in objects are simplified by changing them to connections to other objects.
3. The data objects are asymmetrical in their connections. Reason: this is needed for deterministic execution.
4. All objects have the same potential connection types. Reason: combining them simplifies the language and does no harm.
The above four hypotheses are largely common sense. Their correctness strongly supports the existence of some optimum data object structure for rich applications, one that can be characterized in terms of the kinds of interconnection between the objects.
5. The connections allow all the objects which comprise a data state to be located in a hierarchy. A data object will have connections to its immediate neighbors in the hierarchy when they exist: its parent, its successor sibling (and perhaps its predecessor), and its first child (and perhaps its last).
Reason: hierarchy allows simplification by coalescing information. Uniform iteration access over sets of children in the hierarchy provides "functional expressiveness", the ability to have nested expressions which operate on many large aggregates of data.
6. A given data object will have no structural features other than some subset of the following seven kinds of connection to data objects: five kinds of connection to neighbors in the hierarchy, connections to "class" objects which are used to select from among a set of children, and connections to "relatee" objects when the given object represents a relationship between some other object and the relatee. Reason: empirically, this is all that is needed.
7. A data object will have connections to immediate neighbors in the hierarchy and at most one other object. Reason: if there are more, we can simplify by decomposing the objects and rearranging them to provide useful sets of sets, a small but consistent advantage which leads to convergence on a final optimum. Reducing remote connections to one has no effect on the existence of neighbors in the hierarchy or the need for connections to them.
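Hypotheses 1, 2, and 5 through 7 can be illustrated together with a minimal Python sketch, assuming a simple doubly linked representation of sibling chains. The hypothetical `Needle` class below has the five hierarchy connections of hypothesis 5 plus two optional remote slots, `cls` and `relatee`, and `well_formed` accepts at most one of them being set (hypothesis 7). Inserting or unlinking a needle rewrites only neighbor links and never shifts other objects in any storage space, which is the kind of simplification hypothesis 1 points to. All class, field, and method names are illustrative assumptions, and `name` is a printing label, not a connection.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)
class Needle:
    name: str  # printing label only, not a connection
    # Candidate remote connections (hypothesis 6); at most one may be set.
    cls: Optional[Needle] = field(default=None, repr=False)
    relatee: Optional[Needle] = field(default=None, repr=False)
    # The five connections to neighbors in the hierarchy (hypothesis 5).
    parent: Optional[Needle] = field(default=None, repr=False)
    pred: Optional[Needle] = field(default=None, repr=False)
    succ: Optional[Needle] = field(default=None, repr=False)
    first_child: Optional[Needle] = field(default=None, repr=False)
    last_child: Optional[Needle] = field(default=None, repr=False)

    def children(self) -> Iterator[Needle]:
        node = self.first_child
        while node is not None:
            yield node
            node = node.succ

    def insert_after(self, new: Needle) -> None:
        """Make `new` the next sibling of self; only neighbor links change."""
        new.parent, new.pred, new.succ = self.parent, self, self.succ
        if self.succ is not None:
            self.succ.pred = new
        elif self.parent is not None:
            self.parent.last_child = new
        self.succ = new

    def append_child(self, child: Needle) -> Needle:
        if self.last_child is None:
            child.parent, child.pred, child.succ = self, None, None
            self.first_child = self.last_child = child
        else:
            self.last_child.insert_after(child)
        return child

    def unlink(self) -> None:
        """Detach self from its siblings and parent; no other object moves."""
        if self.pred is not None:
            self.pred.succ = self.succ
        elif self.parent is not None:
            self.parent.first_child = self.succ
        if self.succ is not None:
            self.succ.pred = self.pred
        elif self.parent is not None:
            self.parent.last_child = self.pred
        self.parent = self.pred = self.succ = None

    def well_formed(self) -> bool:
        """Check hypothesis 7 and sibling-link consistency over the subtree."""
        if self.cls is not None and self.relatee is not None:
            return False  # more than one remote connection
        prev = None
        for child in self.children():
            if child.parent is not self or child.pred is not prev:
                return False
            if not child.well_formed():
                return False
            prev = child
        return self.last_child is prev


element = Needle("element")  # a hypothetical "class" needle
root = Needle("table")
h = root.append_child(Needle("H", cls=element))
c = root.append_child(Needle("C", cls=element))
h.insert_after(Needle("He", cls=element))
c.unlink()
print([n.name for n in root.children()])  # ['H', 'He']
print(root.well_formed())                 # True
```

Keeping `cls` and `relatee` as separate slots is only a convenience here; the check enforces that a needle uses at most one of them, so each object has its hierarchy neighbors plus at most one remote connection.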
A series of complexity implosions results from optimizing data objects:
Simple irreducibility optima
Optimum data objects appear to be substantively in a class with the invention of the wheel. There are a few dozen cases where optimizing an engineering design terminates with a sharply defined structural constraint which eliminates a design deficiency. For example, constraining wheels to be round eliminates vertical vibration, constraining pillars to be vertical eliminates shear forces, constraining mirrors to be flat eliminates image distortion. Such constraints often have no adverse side-effects which raise significant tradeoff issues. They usually get near-universal acceptance and they have given broad support for technological civilization for centuries. The optimization of data objects appears to fit this pattern.
References
Lowry, E. S. (1977). PROSE Specification. IBM Poughkeepsie Laboratory Technical Report TR 00.2902, November.
Lowry, E. S. (1991). Toward an Optimum Language Data Model. Computer Standards & Interfaces, 13, 105-108.
Lowry, E. S. (1996). Formal Language as a Medium for Technical Education. Proceedings of ED-MEDIA 96, AACE, June, 407.
Lowry, E. S. (1997a). Toward Perfect Information Microstructures. http://www.ultranet.com/~eslowry/tpim, Draft, December 11.
Lowry, E. S. (1997b). Misdirections in Information Technology. http://www.ultranet.com/~eslowry/misdirec, December 11.
Lowry, E. S. (1997c). Formal Language as a Medium for Technical Education. http://www.ultranet.com/~eslowry/edmedium, December 13. An expanded and updated version of Lowry (1996).
Van Horn, E. C. (1985). Expressing product development information in application terms. Proc. IEEE Int. Conf. Computer Design: VLSI in Computers, ICCD'85, October, 82-85.