Application of a Genetic Algorithm to Find Optimal Physical Properties of a Protocell to Maximize Steady-State Light Absorption
Download Software Here.
1. A command line C++ program that evolves the physical properties of the 4 chemicals comprising the self-maintaining spot can be downloaded here. The fitness function is defined as the integral of [W] over 20000 timesteps. Again, the chemicalNetwork.data file is necessary, and the location of that file must be specified in the world.cpp constructor.
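The fitness definition above can be sketched as a running sum of [W] over the timesteps. The following is a minimal Python illustration, not the downloadable C++ program: `integrate_w`, the `step` callback, the species ordering and the step size `dt` are all hypothetical names and values chosen for the example.

```python
from typing import Callable, List

State = List[float]  # concentrations in one patch, e.g. [X, A, R, W] (assumed order)


def integrate_w(step: Callable[[State], State], state: State,
                w_index: int = 3, n_steps: int = 20000, dt: float = 0.01) -> float:
    """Left Riemann sum of [W] over n_steps timesteps of size dt."""
    total = 0.0
    for _ in range(n_steps):
        total += state[w_index] * dt
        state = step(state)  # advance the (hypothetical) simulation by one timestep
    return total


def hold(state: State) -> State:
    """Toy dynamics that leave every concentration unchanged."""
    return list(state)


# With [W] held at 0.5, the integral is simply 0.5 * n_steps * dt.
fitness = integrate_w(hold, [0.0, 0.0, 0.0, 0.5])
```

In a spatial run the same sum would presumably be taken over every patch of the surface; that detail is not specified here and is left out of the sketch.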
Introduction
I use the model developed here to evolve the phase separation properties of a simple autocatalytic system that is capable of producing self-maintaining spots. I wish this system to exhibit maximum light absorption within the duration of the fitness assessment. In the first experiment I only alter the physical properties of a fixed chemical network. In the second experiment I introduce random new reactions that produce new particles one at a time on the surface. Novel particles are produced in proportion to the extent to which A undergoes new reactions. New reactions are undertaken by A with other particles, or can be rearrangements of A. New reactions are most likely to occur between or within particles that have high free energies of formation. This predilection is simulated by selecting for those systems capable of the greatest amount of light absorption, since it is these systems that will have the greatest capacity for change. A new reaction can be considered a random "mutation", i.e. the addition of a new random reaction to the chemical network. The unit of selection is the supramolecular assembly in which that new reaction occurs. Greater search through the chemical space of new reactions is undertaken by assemblies with high free energy molecules. Such assemblies must be capable of recycling, or steady-state energy absorption. This is a natural fitness function: only the supramolecular organizations capable of maintaining high energy molecules undertake significant search in the chemical space and so reach regions of greater dynamic stability. Some 'mutations' will take the system to regions of static stability, i.e. not dissipative stability, but these will no longer be capable of further change. Other 'mutations' will take the system towards regions of instability, where the organization is destroyed, resulting in poor light absorption once again, or at least light absorption only due to fluxes at the lower level of organization.
This scenario can be explored by applying the same GA approach used to optimize flux in the well mixed reactor. The algorithm is as follows.
0. Initialize a population of surfaces. Each surface has its own chemistry with initially 0.0219 region spots. In addition, each surface is initialized with a random new reaction. The new reaction is produced according to a generative regime in the same manner as previously, e.g. species with higher free energies of formation are more likely to undergo a bimolecular reaction. Any new species are assigned completely random physical properties! Some of these bimolecular reactions may absorb light, some may release heat. In the extreme case a 1:1 GA can be used, i.e. we have just one surface and we mutate it; if the mutant is better than the parent, it replaces the parent; if not, we mutate the original parent again.
1. Each fitness trial involves assessing the integral of light absorption in a spatial system (perhaps a 1D spatial system for the sake of speed; similar dynamics should be expected on a line as on a 2D surface).
The physical properties of species will also have to be stored in the file. These can be stored in a separate file as a matrix. Ideally we should be able to visualize the end state of each fitness assessment, i.e. it should be stored in a file so that a pastiche of end-states can be viewed all at once and analysed in detail afterwards. The code should run in the command line so that it can be ported to clusters and run remotely. However, during development it would be useful to be able to visualize the simulations in real time.
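The 1:1 GA described above amounts to a simple hill climber: mutate the single parent and keep the child only if it scores better. A minimal Python sketch follows. The function names, the list-of-floats genome and the toy fitness are assumptions made for illustration and are not taken from the downloadable code.

```python
import random
from typing import Callable, List, Tuple

Genome = List[float]


def one_plus_one_ga(parent: Genome,
                    fitness: Callable[[Genome], float],
                    mutate: Callable[[Genome, random.Random], Genome],
                    n_events: int,
                    rng: random.Random) -> Tuple[Genome, float]:
    """Mutate the current parent; replace it only when the child is strictly fitter."""
    parent_fit = fitness(parent)
    for _ in range(n_events):
        child = mutate(parent, rng)
        child_fit = fitness(child)
        if child_fit > parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit


def toy_fitness(genome: Genome) -> float:
    """Stand-in for integrated light absorption: highest when every value is 0."""
    return -sum(x * x for x in genome)


def toy_mutate(genome: Genome, rng: random.Random) -> Genome:
    """Perturb one randomly chosen parameter with Gaussian noise."""
    child = list(genome)
    child[rng.randrange(len(child))] += rng.gauss(0.0, 0.1)
    return child


rng = random.Random(0)
start = [1.0, -2.0, 0.5]
best, best_fit = one_plus_one_ga(start, toy_fitness, toy_mutate, 200, rng)
```

Because the parent is only ever replaced by a strictly fitter child, the stored fitness cannot decrease when the fitness function is deterministic. With a noisy assessment this guarantee no longer holds.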
Results
1. The first experiment allows mutation of the repulsion and attraction properties of the particles in the system. This is obviously somewhat unrealistic; however, the intention at this stage is not to model realistic kinds of variation, but to help us understand how light absorption could be maximized in this particular model. The next experiment will introduce much more realistic kinds of variation, i.e. the production of novel random reactions and species with novel random physical properties.
2. Evolution of physical parameters is difficult for two reasons, i. a single fitness assessment takes some time, and ii. a fitness assessment is noisy due to the random assignment of initial concentrations. I deal with this problem by allowing 5 fitness assessments per parameter set.
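Averaging several assessments per parameter set can be sketched as below. This is an illustrative Python fragment under stated assumptions: `averaged_fitness` and the toy `noisy_assess` function (a fixed score plus uniform noise) are hypothetical and merely stand in for a full simulation run with random initial concentrations.

```python
import random
from statistics import mean


def averaged_fitness(assess, params, n_trials=5):
    """Mean of n_trials independent, noisy fitness assessments of one parameter set."""
    return mean(assess(params) for _ in range(n_trials))


rng = random.Random(1)


def noisy_assess(params):
    """Toy assessment: a 'true' score of 10 plus uniform noise in [-1, 1]."""
    return 10.0 + rng.uniform(-1.0, 1.0)


score = averaged_fitness(noisy_assess, params=None)
```

For independent trials, averaging 5 assessments reduces the standard deviation of the estimate by a factor of the square root of 5, at 5 times the computational cost.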
The fitness of the 1:1 GA is shown below for 4 evolutionary runs, with the corresponding repulsion matrix, the final state of the surface, and the extent of light absorption. Only one trial is used. There is no trend of increasing fitness over many replication events. Fitness assessment is very noisy.
Trial 1:
| | X | A | R | W |
| X | 0.001 | 0.00097 | 0.001 | 0.001 |
| A | 0.000967 | 0.00118 | 0.0219 | 0.001 |
| R | 0.001 | 0.0219 | 0.001 | 0.001 |
| W | 0.001 | 0.001 | 0.004 | 0.001428 |
Trial 2:
| | X | A | R | W |
| X | 0.00038 | 0.00106 | 0.00193 | 0.001 |
| A | 0.00106 | 0.001 | 0.0219 | 0.000811 |
| R | 0.00194 | 0.0219 | 0.00176 | 0.00136 |
| W | 0.001 | 0.00081 | 0.00136 | 0.000989 |
Trial 3:
| | X | A | R | W |
| X | 0.001 | 0.001 | 0.001 | 0.001779 |
| A | 0.001 | 0.001 | 0.02183 | 0.000129 |
| R | 0.001 | 0.0218 | 0.00144 | 0.001 |
| W | 0.001779 | 0.000128 | 0.001 | 0.001 |
Trial 4:
| | X | A | R | W |
| X | 0.00047 | 0.001 | 0.001 | 0.001 |
| A | 0.001 | 0.001 | 0.0310 | 0.000898 |
| R | 0.001 | 0.03099 | 0.001 | 0.00987 |
| W | 0.001 | 0.000898 | 0.000987 | 0.000786 |

The original settings that were optimized by hand cannot really be improved upon by the GA given the range of phenotypic variation available, at least with such noisy fitness assessment. The following results show evolution with 5 trials per fitness assessment.

In light of these unimpressive results, we must question whether there is a bug in the GA code. We test the GA with a simple task that even the 1:1 GA should be able to evolve. Fitness is now defined as minimizing the integral of [A]. This can be done by reducing the extent of phase separation between A and R. If this is not evolvable then the GA is seriously called into question. If it is working then it seems there is no better set of parameters for maximizing light absorption with this range of variation than was hand-designed in the 0.0219 regime. The code for this experiment can be downloaded here. Remember you still need to read the file chemicalReloadable4.data to get the network description. In addition we run an experiment in which we wish to minimize the integral of [W], using the code here. Both experiments were able to satisfy the fitness function. The solution is to abolish phase separation by reducing values in the repulsion matrix to a minimum. Therefore, it appears the GA in the more difficult example earlier was legitimately not able to improve upon the hand-designed light absorption maximizing parameters, given the limited variation available.
Allowing the Stochastic Production of New Reactions on a Surface Selected for Maximizing Light Absorption.
The final modification to the software that will be made is to replace the unrealistic variation operator that exists so far, i.e. that which mutated the physical properties of the chemical network, with a more realistic operator that generates a completely new random rearrangement reaction, that may produce novel species with completely random physical properties. Once this has been done, we can test the capacity of the genetic algorithm to maximize steady state light absorption, with and without physical properties. The biochemical justification for this fitness function is that supramolecular systems with higher free energies are more likely to experience novel reactions than those with lower free energies.
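The bias towards high free energy species can be illustrated by a weighted draw of reactants. The Python sketch below is an assumption about how such a generative regime might look, not the downloadable code: `pick_reactants` is a hypothetical helper, the free energy values are invented, and weights are assumed to be non-negative.

```python
import random
from typing import Dict, List


def pick_reactants(free_energy: Dict[str, float],
                   rng: random.Random, k: int = 2) -> List[str]:
    """Draw k reactants with probability proportional to their free energy of formation."""
    species = sorted(free_energy)                 # fixed order for reproducibility
    weights = [free_energy[s] for s in species]   # assumed non-negative
    return rng.choices(species, weights=weights, k=k)


rng = random.Random(0)
energies = {"ab": 1.0, "ba": 4.0, "abb": 5.0}     # invented values for illustration
draws = [pick_reactants(energies, rng) for _ in range(1000)]

# A species with zero weight is never selected.
only_abb = pick_reactants({"ab": 0.0, "abb": 5.0}, random.Random(0))
```

Drawing with replacement allows the same species to be chosen twice, which corresponds to a reaction between two molecules of one species.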
The code that generates new reactions and runs the more realistic GA is downloadable here.
The results of 4 evolutionary runs using the above method are shown below. The system is more complex to analyse.
Results:
Conclusions:
A Review of the Paradigm.
1. I believe in and seek a subset of chemical systems having the property that
i. They are dissipative systems, many instances of which can form spontaneously with high probability under appropriate planetary conditions. The planetary conditions are such that a continuous flux of matter can be preserved within this system for many millions of years without depletion. This can be called the 'spontaneous recycling system (SRC)'.
ii. A further requirement of this dissipative system is that there is rich variety between instances of these dissipative systems, due to "random" novel chemical reactions that occur within each instance. This will exclude dissipative systems that are purely based on physical flux, e.g. clouds, or based on simple chemical fluxes e.g. ozone, fire, but include dissipative systems comprising organic molecules.
iii. "Selection" acts on the replicating (i.e. discrete selection) or merely proliferating (i.e. continuous selection) instances of these dissipative systems (assume they are at steady state or carrying capacity), to inevitably produce the following outcomes. By selection I mean any process that can change the steady state concentration of a class of instances.
a. An instance may be transformed so as to become a less dissipative system.
b. An instance may be transformed so as to become a more dissipative system.
Either strategy is consistent with a higher or lower steady state concentration of that class of instances. However, only the increasingly dissipative systems are capable of accelerated variation.
iv. The former may or may not withdraw further matter from the SRC, but by definition we assume the SRC is able to maintain itself in the face of this withdrawal. The latter will positively modulate the SRC, increasing the energy flux through the SRC.
v. I assume that the extent of chemical variation (exploration of chemical space) is proportional to the Mass x Free energy of the instance of the dissipative system.
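The proportionality assumed in the last point above can be read as a sampling rule: each instance receives a share of variation events equal to its Mass x Free energy product divided by the total over all instances. A minimal Python sketch follows. The function name and the example numbers are hypothetical, and normalizing to shares is my own illustrative choice.

```python
from typing import List, Tuple


def variation_shares(instances: List[Tuple[float, float]]) -> List[float]:
    """Share of variation events per instance, proportional to mass * free energy."""
    products = [mass * free_energy for mass, free_energy in instances]
    total = sum(products)
    return [p / total for p in products]


# Two instances of equal mass; the second holds three times the free energy.
shares = variation_shares([(1.0, 1.0), (1.0, 3.0)])
```

Under this rule the second instance undergoes three quarters of all variation events, which is the sense in which high free energy assemblies search chemical space faster.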
A Review of the GA Methodology. GA as an epistemological tool, where the fitness function determines what we decide to observe, in a vast range of possibilities constrained by variation.
Imagine that you had been on the earth 3.5 billion years ago. What would you have decided to look at and measure? Here is my justification for my use of a GA that answers this question. The GA is an epistemological tool rather than an ontological reality. The only legitimate use of the GA should be seen as providing a means for us to concentrate our computational attention on subsurfaces of a much larger imagined surface that would otherwise be impossible to simulate. It is intended as an epistemological aid rather than an ontological model of what actually would have been. For this to be the case a modification to my previous paradigm is required. Previously I have been resetting the state of the surface at each "generation" to a random state. This is clearly incompatible with the above claimed use of the GA, which demands that the 'offspring' inherits the final configuration of the 'parent' and has its fitness assessed from that point on.
As Kepa pointed out, biological systems do not just maximize energy flux. I agree. I believe that a small set of chemical systems was capable of increasing its capacity for variation and hence for selection, which follows INEVITABLY from variation. To account for this I acknowledge that previously in the GA model I have said nothing of self-replication, or even proliferation. How does this relate to the GA methodology? Self-replication and proliferation would bias the contents of the surface that were present in our GA sampling method. It is best to think of the early earth to make this clear. Imagine a very large surface, with vast physical and chemical heterogeneity, and that some proportion of this surface has the capacity for sustaining something like the cycle that we have been simulating. We wish to follow the fate of the matter originally present in this cycle. We do this by assuming that random variation of the cycle would arise on this surface. These we represent as explicit variation operators in the GA.
Variation here is a function not only of the chemical properties of the chemical system, but of the kinds of environment that this chemical system may experience. These operators involve, e.g., the addition of random new reactions. A proportion of the cycle's matter will be lost to equilibrium configurations. In these we are not interested and so the GA "window" will be programmed to ignore this subset. A proportion of the recycling system matter will find itself in areas where it can utilize more light energy. (This reminds me of the Parable of the Sower, where Jesus throws seeds randomly and some land on fertile ground!) And finally, a proportion of the recycling system matter will find itself capable of autocatalytic growth and hence proliferation. This subclass achieves increased NET light absorption not by changing its organization with fixed mass, but simply by increasing the mass that has the same organization. Since we assume that the capacity for variation is proportional to the steady-state Mass x Free Energy of a particular organization between variation events, we must have a means of approximating this product for 'parents' and 'offspring' in the GA. Previously I had only been measuring light absorption rate, but this fails to account for alterations to the Mass term, which should bias the GA sampling window. How do I measure the capacity for proliferation of the organization? One possibility is to increase the incident light in a brief square wave in the middle of the fitness assessment and observe how far light absorption is able to increase as a result. Some chemical systems may show higher order increases in the steady state of light absorption, whereas others may only show linear increases. The fitness function would remain the integral of light absorption throughout the experiment, BUT there would be a selection pressure for the proliferation of the light absorbing organization during periods of increased incident light.
This fitness assessment is intended to simulate a human observer following the behaviour of that subset of chemical systems capable of increasing their total capacity for variation and hence of selection.
A very abstract mathematical model of the above process can be produced where we begin with a range of chemical systems, some more capable of cyclic light absorption at steady state than others. Variation occurs as a function of Mass x Free Energy. Variation tends to produce a range of systems, a few being capable of autocatalysis and a few being capable of increased light absorption. What distribution of organizations will the system reach after a long period of variation?
A New Methodology: An Epistemological GA for Observing Early Chemical Evolution on Surfaces.
The following changes to the simulation are made.
1. Random removal of chemical reactions as well as random addition of chemical reactions is permitted. A version in which reactions have a 50% chance of removal as well as 50% chance of addition can be downloaded here.
2. "Offspring" inherit the surface distribution of chemicals present at the end of the "parent's" fitness assessment. Code with removal and addition of reactions, mutation of physical properties of species, and inheritance of the state of the surface at the end of the parent's fitness assessment is downloadable here.
3. In the middle of a fitness assessment there is a 5000 time-step doubling of light intensity.
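The square-wave light schedule in the last change above can be written as a function of the timestep. The Python sketch below assumes a 20000-step assessment with the 5000-step pulse centred in it. The function name, the baseline intensity of 1.0 and the exact centring are illustrative assumptions.

```python
def light_intensity(t: int, n_steps: int = 20000, pulse_len: int = 5000,
                    base: float = 1.0, factor: float = 2.0) -> float:
    """Incident light at timestep t: doubled during a pulse centred in the assessment."""
    start = (n_steps - pulse_len) // 2            # 7500 for the default values
    in_pulse = start <= t < start + pulse_len
    return base * factor if in_pulse else base


# Total incident light over one assessment: 15000 steps at 1.0 plus 5000 steps at 2.0.
total_light = sum(light_intensity(t) for t in range(20000))
```

A system whose absorption more than doubles during the pulse would be showing the higher order response that this regime is meant to reward.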
Results:
You can download the visualizer for XCode here. It reads two files, chemicalReloadable.data and repulsionMatrix.data, and simulates the network with a random initial set of conditions.
It is also useful to be able to load a starting condition file, so that it is possible to re-create more realistically the conditions of the fitness assessment. The visualizer to enable this is downloadable here.
1. Visualization of the fitness in 4 experiments, along with the details of some of the solutions evolved.




There are several points of importance.
1. Fitness is very "noisy", i.e. after a good solution is found, the offspring may only be able to obtain much lower fitness. This may be because parents have discovered a transient means of obtaining fitness by depleting all [W] irreversibly. The offspring of a successful parent may then be left with a low flux starting condition from which it is very difficult to evolve an agent capable of beating the parent, who started off with much more favourable initial conditions. To solve the problem of a hopeless inheritance, we can
i. Assess the fitness by the [W] concentration at the final time-step. (Modification 1).
ii. Revert to random initial conditions, as had been the case previously (Modification 2). This will at least produce more interpretable results and is at the very least an important control case. The aim is to later force all diffusion rates to be fixed, and to see whether the capacity for chemical species to have physical properties helps the evolution of increasing light absorption rates. This modification is not compatible with Modification 1, which should be made independently.
2. The evolved networks are all small, consisting of only 4 reactions. This may be because the timestep used is so high that the introduction of more reactions can result in failure of proper numerical integration due to the potential for higher fluxes in the network. To make sure this is not such an important factor we could reduce the size of the timestep used. (Modification 3: Reduce timestep to 0.005 from 0.01.) Another possibility is that the diffusion potentials that are randomly generated are too high for the diffusion rate used. (Modification 4: A maximum limit of 0.03 can be set on the repulsion potential.)
Code with modification 2, 3 and 4 is downloadable here. Version 13
Code with modification 1, 3 and 4 is downloadable here. Version 14
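The upper limit in Modification 4 can be enforced by clamping each repulsion value after it is generated or mutated. The Python sketch below is illustrative only: the Gaussian perturbation, its width `sigma`, the lower bound of 0 and the enforced symmetry of the matrix are my assumptions and are not taken from Versions 13 or 14.

```python
import random
from typing import List

MAX_REPULSION = 0.03  # upper limit from Modification 4


def mutate_repulsion(matrix: List[List[float]], rng: random.Random,
                     sigma: float = 0.005) -> List[List[float]]:
    """Perturb one pairwise repulsion value and clamp it to [0, MAX_REPULSION]."""
    new = [row[:] for row in matrix]
    i = rng.randrange(len(new))
    j = rng.randrange(len(new))
    value = new[i][j] + rng.gauss(0.0, sigma)
    value = min(MAX_REPULSION, max(0.0, value))
    new[i][j] = value
    new[j][i] = value  # keep the matrix symmetric (an assumption of this sketch)
    return new


rng = random.Random(2)
matrix = [[0.001] * 4 for _ in range(4)]
for _ in range(500):
    matrix = mutate_repulsion(matrix, rng)
```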
Testing the Influence of Space and Phase Separation in Chemical Evolution.
Hypothesis 1: We wish to test the hypothesis that if species are allowed to have physical properties that allow phase separation, then they are more likely to self-organize light absorbing systems. This is tested by using a GA to evolve light absorbing systems with and without the capacity for establishing phase separation. The versions 13 and 14 above are the test cases, with two different evolutionary regimes. Control cases are simply the same as these versions but where the physical properties of chemicals do not permit the establishment of phase separation, i.e. all repulsion values are set to 0.0001.
Expected Conclusion: I expect phase separation to help because I expect it to be more likely for the GA to find chemical reaction subsystems that are isolated from other reactions when phase separation is permitted. Whether this is true remains to be seen. Of course, many random reactions will destroy the recycling capacity of the system, and we do not continue to observe these using the epistemological GA. The epistemological GA is expected to find highly light absorbing systems more easily (in fewer fitness assessments) in the spatial case than in the well-mixed reactor.
The lineages of max fit networks in a 1:1 GA for all 4 arms of the experiment are shown below.
Optimization.
1. For rapid simulation allowing more generations a minimal grid is used consisting of a 2x2 set of patches. The solution can then be transferred to a larger grid to test whether it transfers ok.
2. If the first fitness assessment gives a score at least 10x less than the parent, then it is likely that the other 4 assessments also will give a score lower than the parent, so stop the assessments there, and do another mutation.
Version 15 contains the above modifications and can be downloaded here. It also has modifications 2, 3 and 4 (i.e. random initial conditions). It uses a 4x4 grid, and ceases further fitness assessments if the child's first fitness assessment is less than 1/2 of the parent's fitness. Version 16 has modifications 1, 3 and 4 (i.e. inheritance of the parent's final condition, plus fitness assessed only on final [W]).
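The early-stopping rule can be sketched as follows: run one assessment, and only run the remaining ones if the first score reaches a set fraction of the parent's fitness. This Python fragment is a hedged illustration. `assess_child`, the default of 5 trials and the 1/2 cutoff are taken from the description in the text but arranged in a hypothetical interface, and fitness is assumed to be positive.

```python
def assess_child(assess, child, parent_fitness, n_trials=5, cutoff=0.5):
    """Return (fitness estimate, number of assessments actually run)."""
    first = assess(child)
    if first < cutoff * parent_fitness:
        return first, 1  # hopeless child: skip the remaining assessments
    scores = [first] + [assess(child) for _ in range(n_trials - 1)]
    return sum(scores) / n_trials, n_trials


# Toy assessments that always return the same score.
poor = assess_child(lambda c: 1.0, None, parent_fitness=10.0)
good = assess_child(lambda c: 8.0, None, parent_fitness=10.0)
```

The saving comes at a cost: a child whose first assessment is unluckily low is discarded even if its mean fitness would have exceeded the parent's.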
Results: Version 15. I.e. random initial conditions, small spatial system.
The various solutions evolved are superimposed onto the fitness graphs. The first result is version 15, i.e. with random initial conditions, evolved on a 4 x 4 grid.

Above you see that the fitness increases in approximately 4 punctuations. The graph is for a 1:1 GA and shows each fitness assessment of the child that is made. Labelling the regions of the fitness graph are pictures of the final state of the 4x4 grid. The same color map is used as previously and so only the concentrations of the first three chemicals are indicated. The main point of note is that the highly fit solutions have homogeneous surfaces that have not seemed (at least with these three chemical species) to exploit phase separation. The chemical networks associated with some of these agents are shown below.

Adaptations above: 1. Generation 103: Recycling of ab is promoted by the "catalysis" by bbbbb of ba --> ab. 2. Generation 104: The depleting reaction bbbbb-> bbb + bbbbbbb is abolished, allowing a greater recycling of ab. 3. Generation 257: This non-adaptive reaction is stabilized due to stochastic effects that give this network a higher fitness than a previously evolved network. Note that a new reaction is generated by randomly choosing two molecules from the set of species that have ever existed in the past, EVEN if their current concentration is zero. This is somewhat unrealistic, but has allowed this reaction to be generated.
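The way a new reaction is drawn from the set of species that have ever existed can be sketched as below. This is a guess at one possible scheme, not the author's generator: the sketch assumes that a bimolecular rearrangement pools the a and b units of the two reactants, shuffles them, and splits them into two non-empty products, so that the numbers of a and b units are conserved. The source does not state the rule, and `random_rearrangement` is a hypothetical name.

```python
import random
from typing import List, Tuple

Reaction = Tuple[Tuple[str, str], Tuple[str, str]]


def random_rearrangement(ever_existed: List[str], rng: random.Random) -> Reaction:
    """Pick two reactants (current concentration is ignored) and rearrange their units."""
    r1 = rng.choice(ever_existed)
    r2 = rng.choice(ever_existed)
    units = list(r1 + r2)                  # pool all a and b units
    rng.shuffle(units)
    cut = rng.randrange(1, len(units))     # both products get at least one unit
    p1 = "".join(units[:cut])
    p2 = "".join(units[cut:])
    return (r1, r2), (p1, p2)


rng = random.Random(3)
pool = ["ab", "ba", "abb", "abbb", "bbbbb"]   # species named in the text
reactions = [random_rearrangement(pool, rng) for _ in range(100)]
```

Restricting the draw to species with non-zero current concentration would address the unrealistic feature noted above, at the cost of a smaller reaction space.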

Generation 298: ba + ba --> abb + a increases the production of abb for light absorption. Generation 318: a + aab --> ab + aa increases the production of ab. Generation 341: The reaction abbb + abbb ---> aab + bbbbb is abolished. This means that bbbbb is no longer produced. This is an important motif: old recycling loops are abolished once new recycling loops exist and the old ones are no longer needed. Generation 569: a + ba ---> ab + a. The species a begins to catalyse the conversion of ba into ab.

No apparently useful adaptations are produced above. The new reactions persist because of stochastic selection effects.

Generation 1836: A major adaptation is the development of the reaction a + abbb ---> abb + ab. Note that the blue line on the right representing [abb] barely decreases from the initial concentration.
Next we compare this to a reactor in which the offspring's fitness is assessed using the final condition of the parental reactor. In that case, fitness is assessed only at the final time-step of the experiment.
Results: Version 16. I.e. parental final conditions = offspring initial conditions, small spatial system.

The final network evolved is shown below.

ba undergoes most reactions to be converted back into ab, or directly back into abb. abb is defined as the only light absorbing species.
Results: Version 15 Control. I.e. random initial conditions, non-spatial system, all diffusion potentials = 0.000001.
The fitness in the control case is shown here.

It is interesting that after approximately 9000 events no further adaptation occurs in the 1:1 GA. After 10000 generations, the maximum fitness has increased to a much greater extent (7000 compared to 2000) than in the spatial case. The final network is shown below.

Note that due to an oversight in the GA, the same reaction can be evolved multiple times. This is what has happened. The solution is a simple loop, directly from ba + abbb ---> abb + abb. This is interestingly SIMPLE compared to the case where we allow the evolution of repulsion effects. Why does the well-mixed reactor GA evolve a much simpler network that is capable of a greater rate of recycling?
To check whether this is a general property, let us observe another control case for Version 15.

This very complex network, with fitness only equal to approximately 4000, shows that networks evolved in the well-mixed reactor are not always simple. Also, it seems this network cannot be very effective due to the many species sinks that arise which cannot be recycled. This is the only network in which the original reaction ab + ba --> ab + ab was not deleted.
Modification.
1. Larger species should diffuse more slowly than smaller species.
2. Early results suggest that it is no easier to evolve highly light absorbing systems where phase separation is possible than in a well-mixed reactor. This may not be the case when catalysis is introduced, i.e. we assume that each new reaction depends to some extent on the concentration of an already existing molecule in the system. This molecule may be abiotic (which is what is assumed so far); however, there is also the possibility that it is itself GENERATED BY THE NETWORK ITSELF. This is an excellent way to obtain recursive functional constraints.
3. We wish next to include the capacity for internally generated catalysis vs. externally imposed catalysis. Of course a mixture of these is possible, with internal molecules trapping external catalysts that have not been synthesised by the cell itself, but sequestered from environmental sources.
4. The variation operators are without proper biological basis. In the following experiment I consider an explicit selection experiment that could be conducted by a chemist, and attempt to model it. See here for details of that model.
5. There is no selection pressure for autocatalytic systems in this selection regime. I propose the following. Suppose there is constant depletion of organized high energy matter but continuous replenishment with a low energy food source, and suppose that a high energy chemical organization constructed by previous generations using light energy is capable, even when reduced in total mass, of re-creating its original mass from an abundant precursor. Then there will be selection for organizations capable of growing autocatalytically and hence regenerating light absorbing material at a greater rate than could be achieved by the direct conversion of low energy material into light absorbing matter alone. The hypothesis is simple to test for plausibility in the model. The offspring inherits half the solution from its parent, and the other half of its solution as food molecule 'ab' and side-reactant molecule 'abbb', just as in the initial supply, except without the benefit of extra 'ba' (autocatalyst). This means that the system must evolve a new method of synthesising abb from ab and abbb. This could be by direct non-autocatalytic synthesis; however, this would result in the loss, in each generation, of the concentration of species that were obtained from the parent. All abb would have to be synthesised without the utilization of potentially useful high energy molecules inherited from the parent, which could be used to drive the production of abb, which in turn would be capable of further production of high energy molecules for the offspring, which could then drive the further absorption of high energy molecules. However, 'ba' is actually a high energy molecule since it is one of the products of light absorption, and so perhaps we should add 'abbb', the low energy molecule. This would
