Method for tracking software lineage Edit

In biology, random genetic variation plays a crucial role in evolution by natural selection, because it produces functional variations in organisms which are transmitted to offspring. These functional variations have influences on success and biological reproduction, which therefore produces differential reproduction of selected genetic variations, which produces adaptive evolution of species, lineages, etc. Similar processes of variation and selection occur in other systems, and are an increasingly important area of research in theoretical and applied computer science. The relevant disciplines are known as "Artificial Life", "Evolutionary Programming", "Genetic Algorithms", "General Evolution Theory", etc.

Through the study of the genetic sequences in individual genome, biologists have determined that random variation occurs each time reproduction occurs, and that these variations propagate and accumulate though successive generations. By comparing the sequences in one individual with sequences found in other individuals, it is possible to deduce and reconstruct the historical sequence of copying errors which derived those sequences from a common ancestor. The procedures involved are amply documented and widely employed in the scientific literature, so we will only summarize some basic heuristics here.

The degree of similarity between two individuals can be used as an index of the number of copying events which intervene between them. For example, since only one copying event intervenes between parent and offspring, there will be relatively little variation between them, whereas the genomes of more distant relatives tend to be less similar, because many copying events intervene between them. With further assumptions about mutation rate one can estimate the precise number of copying events intervening between two individuals based upon use the degree of genetic dissimilarity between them.

Moreover, because genes can be distinguished by their position within a genome, analysis of the specific patterns of information shared by two individuals provides further clues to the ancestry, or copying history, of those patterns. When an unusual (or less than universal) genetic sequence shows up in two individuals in the same genomic location, it is probable that those individuals share a common ancestor, and that that ancestor bore the same trait. In this way, the genome of the common ancestor can be determined probabilistically.

Finally, by correlating this information with knowledge of confirmed individuals, and through other means, it is possible to reconstruct with a high degree of probability the historical sequence of copying errors which intervened between individuals with similar, but non-identical genomes. The genetic history of a lineage can thus be reconstructed.

Those familiar with biology and biotechnology will know that through methods of the sort sketched above, and through other techniques with similar bases in biology and mathematics, it is possible to reconstruct biological pedigrees with a high degree of accuracy based on very limited samples of populations. The techniques work even though genome size remains constant, and even though the genomes do not contain a systematic record of their own pedigrees.

Although these biological techniques have been developed for the analysis of relatively "messy" biological systems, they can be applied to any system in which idiosyncratic patterns of information accumulate within a reproducing lineage.

We disclose several methods of achieving these ends, and disclose further methods which eliminate the need for retrieval of complete software-instances from the field. Another elaboration will couple the random error mechanism with a mechanism of selection in order to increase the fitness of products to their environment, the usefulness of products to their customers, and the profitability of these products to their creators. Finally, a last elaboration will allow vendors to use the information gathered in order to modify the characteristics of already-released software.

kljh Edit

Under an additional disclosure in the next paragraph, non-appendation-based embodiments can alternatively be used to create, accrue and collate lineage information. These methods do not bear the burden of a Variable Portion which grows open-endedly. Rather, the system created is more like that of organisms with fixed-size genome. Under this scheme, as in biology, the inferences one came make are more probabilistic than deterministic. As in biology also, the dynamic of random variation and natural selection in this scheme gives rise to a variety of scientifically interesting, and commercially useful, phenomena.

When mutations are desired, a random or idiosyncratically chosen bit in the Variable Portion is set to its opposite state (0 to 1, or 1 to 0). As a result, copies of this particular program-copy are identifiable by the particular pattern which results. If one of these copies is made to mutate on a further occasion, another randomly selected bit in the Variable Portion will be flipped, and the descendants of that "lineage" will be identifiable by the particular sequence of bits produced by those two random events.

Phrases such as "random or idiosyncratic" should be clarified. As used in biology and in many computer implementations of "random number generators" the term does not necessarily refer to genuinely non-deterministic events. Random number generators for example, often generate pseudorandom numbers by way of deterministic algorithms in essentially error-free hardware. The important point is that "random mutations" in biology "random numbers" in computers and "random errors" in the present invention are idiosyncratic and unpredictable in the context of the data structures to which they are introduced (genome, number stream, or Variable Portion). In the context of the present disclosure, it is important only that when two identical programs are "mutated" the particular bit or bits chosen for flipping will usually differ. Randomness, pseudo- or otherwise, is only one way of achieving this end. Another way, using the technology of our co-pending invention would be to use a fingerprint of the user's computer as a modulus for selecting the bit to be flipped from the range of available bits. Another way, would be to have the Central Database decide which bit to flip, based upon a statistical analysis of all prior cases, so as to minimize ambiguities.

However achieved, random or idiosyncratic mutations in the Variable Portion of a product provide the program with a "genome" which will yield to the kinds of lineage analyses developed in biology, and sketched above. By sampling the genome of individual programs obtained from the "field," it will thus be possible to determine which instances are copies of the originally-distributed mint copy, which instances are copies of "first generation" registered offspring of the mint-copy, and so on. It will be possible to derive pedigrees. By correlating those pedigrees with independent information about the distribution of those pedigrees in space and time. It will be possible to draw inferences about the temporal-spatial niches which favor high rates of purchasing, copying, pedigree-branching, and so on. The methodologies involved were reviewed above, and are well-documented in the literature of biology, ecology, evolutionary systematics, etc.

These inferences are necessarily probabilistic and subject to error, but they are extremely powerful nonetheless. To illustrate one source of error, consider that a particular bit might be selected for flipping on two separate occasions in the history of a particular lineage, and that the two flips could cancel each other out. This weakness can be addressed in a variety of ways. Since the "Variable Portion" of the mint copy can be known, the software which does the bit-flipping could be programmed to avoid this case by selecting randomly among as-yet-unflipped bits only. Even without this fix, however, the resolving power of this system is directly related to the size of the Variable Portion, and substantial resolving power could be achieved with a Variable Portion of only a few thousand bytes. Furthermore, by partitioning the Variable Portion into distinct regions, even greater resolving power can be achieved. For example if the space allocated to the Variable Portion is increased from 1000 to 2000, the chance of such "collisions" is halved. But if the 2000 bit Variable Portion were also partitioned into two regions, with each region undergoing one random mutation per Event, then the chance of a reversal goes from 1 in one thousand to 1 in one million. Other methods of preventing or resolving ambiguities would be to have the Central Database assign or reassign non-unique mutations based while communicating with the software-instance.

Even if ambiguities exist in population of Variable Portions, sophisticated analytical algorithms could resolve such ambiguity by using correlative information such as the time and place from which the copy in question was acquired. Customers could also be queried when such ambiguities are discovered.

It should thus be understood that the system need not be completely reliable in order to be of great utility, that the embodiments just described are only a simple example, and that many variations and improvements can be envisioned which would fall within the scope of the present invention.

Natural Selection, Artificial Selection, and Gene Therapy.

Natural selection as successful algorithm Edit "....Natural selection is the most successful algorithm known for the generation of solutions to problems. Some philosophers of science characterize the algorithm in quite general terms--the differential reproduction of randomly generated successful variations--and assert that it is the only solution-generating algorithm there can be....."

"...the differential reproduction of randomly generated successful variations-...".

Which reduces to: Successful variations replicate. "replicate" and "successful" alludes to the same fact, it says the same thing twice. Natural selection as term is irrelevant to the underlying Aristotelian tautological fallacy.

asdf Edit

Be that as it may, we will now disclose a method by which the variations embedded in the Variable Portion of a product can be the basis for a natural-selection-like process which can be directed toward solutions to problems which include, but are not limited to, the maximizing of sales and the tendency to be copied.

Some of the data in the Variable Portion of a product can be made to encode parameters which affect the utility or attractiveness of that product. This is a standard technique in the branch of computer science known as genetic algorithms and evolutionary programming. In the present context, the designer of the product would probably want to constrain the executing program's use of those parameters carefully, so that mutations could not have unacceptable or fatal effects. But even within such constraints, there are many ways this might be done.

Consider the case of a computer program which, if it is operating in demo mode, runs for a certain number of minutes and then requires the user to either make a purchase or restart the program. From the Vendor's point of view, the optimal number of minutes would give the user enough time to evaluate and appreciate the program, but not so much time as to reduce the probability of purchasing. It may be difficult for the software designers to identify the optimal number of minutes, and in fact the optimal value may well depend upon the market in which the product is being distributed. The present invention addresses many such cases in which it would be desirable for digital products to adapt themselves to local circumstances without direct intervention by the designer.

Because the parameter settings are encoded in the Variable Portion of the product, occasional mutations will cause those parameter settings to vary from one software-instance to another in the field. By definition, and by the logic of natural selection, software-instances with parameter settings which are more conducive to copying in a given environment, will tend to be copied more often and will therefore tend be more widely represented in the field. Thus, simply by encoding some functional parameters of the product in the product's Variable Portion, a process very much like natural selection will tend to occur wherever multiple instances of a program tend to proliferate. The adaptive process will be efficient only if the mutation rate is not so high as to degrade the influence of selective factors, and a variety of other possible adjustments and embellishments can readily be gleaned from the extensive literature on genetic algorithms and evolutionary programming. But the foregoing presentation should be sufficient to demonstrate that that literature has been made relevant and applicable by the invention here disclosed.

This invention is scientifically valuable because it extends the theory and technology of selection theory to the field of software distribution, and it is commercially valuable because it provides a means by which products which thrive on copying can automatically adjust themselves so as to promote their own reproduction. It should also be noted that the applicability of the invention is not restricted to products which are meant to be purchased. It is applicable to software whose purpose is the presentation of commercial messages, because the more widely distributed the software is, the more effective it will be. And for similar reasons it is applicable to non-commercial software which is simply more useful when it is ubiquitous, e.g., a "positive computer virus" released by network administrators whose function was to somehow facilitate network traffic. Many other domains of applicability exist as well, and are intended to fall within the scope of the present invention.

However, from the point of view of a Vendor of a product intended for purchase, the invention just disclosed will be most useful insofar as the parameter values which promote copying also promote purchasing. This may not be the case. For example, in the case of the program which waits N minutes before requiring that the user purchase or restart, a high value of N might maximize copying but minimize purchasing. In that case, the natural selection process (which promotes copying, not "goodness") would actually work against the Vendor's true interest. The following paragraph shows how many of the inventions disclosed so far can be used in concert.

If the sampling process is yoked to purchasing events (as has often been posited above for expositional reasons only) than the Central Database will be able to directly monitor purchasing events rather than copying events per se. However we will now disclose (1) a way in which copying events can be monitored directly and (2) a way in which copying events can be monitored. indirectly.

Copying events can be monitored directly as follows, and as illustrated in FIG. 1. Let the product store a trace of its footprint, physical location, or context in an Auxiliary Region of the Variable Portion, and let its footprint be empirically checked dynamically each time the program runs. If the footprint determined empirically differs from the footprint stored in the Auxiliary Region and if no purchasing event has also occurred, then the program has been copied (or moved) from a former location. Record that event in a Central Database, in an auxiliary region, etc. and update the stored footprint value. Such techniques could be elaborated and implemented in a variety of ways all of which fall within the scope of the present invention.

Copying events can be monitored indirectly as follows. Suppose that purchasing events produce mutations, but copying events do not. Lineages whose members tend to promote purchasing rather than copying will tend to be deep rather than wide-fewer copies of an individual will be made, but those which are made will tend to generate descendant variants through purchases. Lineages whose members tend to promote copying more than purchasing will therefore tend to be wide rather than deep-that is, individuals well tend to have more siblings than grandchildren. Thus by identifying lineages which are wide vs deep, and then examining the parameter values associated with these lineages, it would be possible to identify parameter values which promote purchasing as well as copying.

Once those parameters were identified, the vendor who wished to maximize sales could release a new version of the program with fixed rather than Randomizable settings on those parameters which maximize promote purchases. Alternatively, an ancillary invention disclosed in the following paragraph could be employed.

During the purchasing process, a channel of communication is established between the user's program and the vending system. As exploited elsewhere in this invention, the channel is bi-directional. Some information flows from customer to vending system: customer information, context information, and Variable Portion information. Information also flows from vending system to customer: the password and potentially, as disclosed now, other information which could be used to reset or reprogram the software-instance being purchased. Specifically if the Central System identified a software-instance with evolved characteristics known to be at variance with the desires of the vendor, it could be used to transmit Vendor-selected values to the software-instance, and also transmit a code which would protect values from mutation in the future. These settings would then be stably propagated when copies of the program were redistributed. The information transmitted from the vending system to the software-instance could be embedded in the password, or it could be transmitted as a separate piece of information. It could be transmitted with or without the active participation of the customer. Thus the invention disclosed is quite general, and the embodiments described merely illustrate a few of the ways in which the present disclosures might be used in practice. As an example, and as previously noted, it should not be supposed that the purchase-based sampling process is the only one which could be used to allow vendors to set parameters on their products after those products have been released. Network-aware applications of the sort which are now common on the global internet can, in seconds, exchange information with servers located anywhere in the world, and it would therefore be possible for software-instances to get or give information to the Vendor whenever they are executed and not just when they are purchased. The present disclosures thus apply to any method by which information gathered from populations of variant software-instances is used to set parameters in already-released copies of that software.

Summary: Ramifications and Scope

The present invention increases the convergence, relevance and mutual benefit of computer science, evolutionary biology, economics, and software marketing, and software engineering to each other. Many variations on, and permutations of the disclosures herein can be envisioned, and so the examples, embodiments, and specificities above should not be constued as limiting the scope of the invention, but merely providing illustration of the presently preferred embodiments of this invention. For example, the methods disclosed could be applied to copy-instances not usually thought of as software such as music CDs or photocopied materials; mutations might be desirable under occasions of interest such as software execution as well as the occasions of copying and purchasing discussed above; and so on.

Thus the scope of the invention should be determined by the appended claims and their legal equivalents, rather than by the examples given.

Community content is available under CC-BY-SA unless otherwise noted.