Chrisantha Fernando

A Model of Transcription Factor Binding Site Evolution in Variable Environments

Download Code Below

14th July 2007 Download a variant of Kashtan and Alon's PNAS code (logic gate evolution), introducing logic gate identity mutation, with N-bit OR gates added to each input of the NAND gates.
23rd June 2007 Download Control: TF length = 8, linear T energy profile, genome length = 1000, simple energy minimization function.

 

Introduction

I explore some principles of how transcription factor binding sites (TFBSs) for type I TFs might evolve in changing environments. By this I mean environments with discontinuously variable fitness functions. For example, for half of the time the promoter may confer the greatest fitness when no TF is bound. For the other half it may confer the greatest fitness when the TF is maximally bound. What sequence structure will promoters evolve in such selective environments?

 

Methods

 

I use an evolutionary simulation based on a microbial genetic algorithm (GA). Promoters are 100 nucleotides long. The TF binds to an 8-mer. Its binding strength is a linear sum over an energy matrix E_ij, where i is the position within the TFBS and j is the nucleotide type. The matrix is set, quite arbitrarily, so that the TF binds most strongly when the TFBS has the sequence TTTTTTTT: every entry E_ij with j = T is 1, and all other entries are 0. The matrix is convolved with the promoter to give a sliding-window binding energy over the whole promoter.

Steric hindrance is ignored, that is, the case where the binding of one TF prevents the binding of another. I admit it may be very important when high TFBS densities or fine control are desired. Mutation occurs at a rate of 10^-5 (or 0.01) mutations per site per bacterial lifetime. The population consists of 100 bacteria.
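The sliding-window energy described above can be sketched in a few lines of Python. This is a minimal illustration, not the downloadable code. It assumes a one-hot encoding of the promoter, and the names `make_energy_matrix`, `one_hot` and `sliding_window_energy` are hypothetical. The function returns one score per window start. How these scores combine into a single promoter binding value (for example summed or maximised) is left open here, because the text does not specify it.

```python
import numpy as np

NUCLEOTIDES = "ACGT"
TF_LENGTH = 8  # the TF binds an 8-mer


def make_energy_matrix(length: int = TF_LENGTH) -> np.ndarray:
    """E[i, j]: contribution of nucleotide j at site position i.
    Every T entry is 1 and all others are 0, so TTTTTTTT binds most strongly."""
    E = np.zeros((length, len(NUCLEOTIDES)))
    E[:, NUCLEOTIDES.index("T")] = 1.0
    return E


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (len(seq), 4) array of 0/1 indicators."""
    codes = np.array([NUCLEOTIDES.index(c) for c in seq])
    return np.eye(len(NUCLEOTIDES))[codes]


def sliding_window_energy(promoter: str, E: np.ndarray) -> np.ndarray:
    """Score at every start position s: sum over i of E[i, base at s + i]."""
    x = one_hot(promoter)
    k = E.shape[0]
    return np.array([(x[s:s + k] * E).sum() for s in range(len(promoter) - k + 1)])


# A 24-nt promoter with a perfect TTTTTTTT site flanked by ACGTACGT
E = make_energy_matrix()
scores = sliding_window_energy("ACGTACGT" + "T" * 8 + "ACGTACGT", E)
```

Because E_ij is 1 only for T, each window score is simply the number of Ts in that window. The maximum score of 8 is reached only where the window covers eight Ts.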

MODIFICATION: The above methodology (23rd June 2007) has been abandoned in favour of a modification of Kashtan and Alon's PNAS model of logic gates evolving under fluctuating environments. The code is available above.

 

Results

Some control experiments are shown below.

Control Experiment 1. The graph below shows the sequences observed and the fitness of the population when selecting for zero TF binding to the promoter. One expects T nucleotides to be removed from the promoter.

[Show promoter sequence evolution every N generations]

[Show fitness graph]

Control Experiment 2. The graph below shows the sequences observed and the fitness of the population when selecting for maximal TF binding strength to the promoter. One expects T nucleotides to be selected for at every site in the promoter.

[Show same graphs]
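Both control experiments can be outlined with a microbial GA. The sketch below reconstructs the setup under assumptions; it is not the author's code. It assumes:

- The tournament form of the microbial GA: two random individuals compete. The loser copies each locus from the winner with a hypothetical "infection" probability of 0.5 and is then mutated.
- Total binding is the sum of all window scores.
- Fitness is -|total binding - target|. The target is 0 for Control Experiment 1 and the all-T maximum for Control Experiment 2.

Population size (100), promoter length (100) and the per-site mutation rate (0.01) follow the Methods.

```python
import numpy as np

NUCLEOTIDES = "ACGT"
T = NUCLEOTIDES.index("T")  # integer code for thymine
K = 8             # TF binding-site length (8-mer)
L = 100           # promoter length
POP = 100         # population size
MUT_RATE = 0.01   # per-site mutation probability each time a loser is rewritten
INFECT = 0.5      # hypothetical: probability each locus is copied from the winner

# E[i, j] = 1 if nucleotide j is T, else 0 (binds TTTTTTTT most strongly)
E = np.zeros((K, 4))
E[:, T] = 1.0
# WINDOWS[s, i] = s + i: promoter index of position i in the window starting at s
WINDOWS = np.arange(L - K + 1)[:, None] + np.arange(K)[None, :]
MAX_BINDING = float(E.max(axis=1).sum() * (L - K + 1))  # total for an all-T promoter


def total_binding(genome: np.ndarray) -> float:
    """Assumption: promoter binding = sum of all sliding-window scores."""
    return float(E[np.arange(K), genome[WINDOWS]].sum())


def evolve(target: float, tournaments: int = 20_000, seed: int = 0) -> np.ndarray:
    """Microbial GA: two random promoters compete; the loser copies loci from the
    winner ('infection') and is then mutated. Fitness is -|total binding - target|."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 4, size=(POP, L))
    fit = np.array([-abs(total_binding(g) - target) for g in pop])
    for _ in range(tournaments):
        a, b = rng.choice(POP, size=2, replace=False)
        win, lose = (a, b) if fit[a] >= fit[b] else (b, a)
        infected = rng.random(L) < INFECT
        pop[lose, infected] = pop[win, infected]
        mutated = rng.random(L) < MUT_RATE
        # mutated sites are redrawn uniformly, so some may keep their base
        pop[lose, mutated] = rng.integers(0, 4, size=int(mutated.sum()))
        fit[lose] = -abs(total_binding(pop[lose]) - target)
    return pop


zero = evolve(target=0.0)             # Control Experiment 1: select for no binding
maximal = evolve(target=MAX_BINDING)  # Control Experiment 2: select for maximal binding
t_frac_zero = float((zero == T).mean())
t_frac_max = float((maximal == T).mean())
```

Starting from random promoters, the T fraction is about 0.25. The zero-binding run should drive it down and the maximal-binding run should drive it up. The exact values depend on the seed, the number of tournaments and the assumed infection rate.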

Control Experiment 3. Here I show a graph of the frequency of T nucleotides after a very long evolutionary period (at steady state), as a function of the desired binding strength between TF and promoter. I also show sample promoter sequences obtained for various desired binding energies.

Variable Environment Experiment 4. The desired binding energy calculated in the convolution is oscillated as a square wave between zero and M, with period P. For each pair {M, P}, I determine the mean T composition and the mean fitness obtained. The mean fitness is compared with the fitness expected under perfect adaptation.
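The environment of Experiment 4 can be written down directly. This is a minimal sketch with several assumptions:

- The square wave has a 50% duty cycle and starts in the zero phase.
- Time is counted in GA generations.
- Fitness is the same hypothetical -|binding - target| used above.

A perfectly adapted promoter scores 0 at every step, so mean fitness measures the shortfall from perfect adaptation. The function names are illustrative only.

```python
def desired_binding(t: int, M: float, P: int) -> float:
    """Square wave with period P: 0 for the first half of each period, M for the second.
    With odd P the two phases differ in length by one generation."""
    return 0.0 if (t % P) < P / 2 else M


def mean_fitness_constant(binding: float, M: float, P: int, steps: int) -> float:
    """Mean of -|binding - target| over `steps` generations for a promoter whose
    binding never changes. A perfectly tracking promoter would score 0."""
    return sum(-abs(binding - desired_binding(t, M, P)) for t in range(steps)) / steps


# M = 8, period P = 10 generations, averaged over 10 full periods
results = {b: mean_fitness_constant(b, M=8.0, P=10, steps=100) for b in (0.0, 3.0, 8.0)}
```

Under this absolute-error fitness, every constant binding level between 0 and M earns the same mean fitness, -M/2, over whole periods. A squared-error fitness would instead favour the intermediate level M/2. The choice of fitness function is therefore likely to shape the outcome of Experiment 4.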

 

Biological Observations to Explain Using This Model

 

 

1. Why are TFBSs fuzzy/degenerate? Different sites for the same factor differ by 20-30% of their bases. See Collado-Vides et al., Control site location and transcriptional regulation in Escherichia coli, Microbiological Reviews 1991, 55:371-394. [My hypothesis is that these degenerate TFBSs are the result of selection for weak interaction rather than strong binding.] Why should this be? See below...

2. Why is there sometimes more than one TFBS for the same TF on a promoter? [My hypothesis is that multiple weak TFBSs are better than one strong TFBS for reducing the variance of transcription initiation.] A probability model should be produced to demonstrate or test this claim. The same principle may act in supply chains. There, one may wish to distribute orders among several weak suppliers rather than depend on one strong supplier, in order to minimize variance in supply. I need to learn more probability models to prove this.

3. The distribution of weak TFBSs in the genome. I need to obtain this data by scanning the entire bacterial genome with the TFBS frequency matrices. Is the observed distribution the one expected from a random model without selection for weak connectivity?
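Item 2 asks for a probability model. One minimal candidate rests on two assumptions: sites are occupied independently, each with the same probability p, and their effects on transcription initiation add.

- A single strong site contributes the full effect 1 when occupied.
- Each of n weak sites contributes 1/n when occupied.

Both designs then have mean initiation p. The variance, however, falls from p(1 - p) to p(1 - p)/n. The function names below are hypothetical, and a Monte Carlo check confirms the weak-site formula.

```python
import numpy as np


def strong_site(p: float) -> tuple[float, float]:
    """Mean and variance of one site that gives effect 1 when occupied (Bernoulli(p))."""
    return p, p * (1 - p)


def weak_sites(p: float, n: int) -> tuple[float, float]:
    """Mean and variance of n independent sites, each giving effect 1/n when occupied.
    The number occupied is Binomial(n, p), so the summed effect is Binomial(n, p) / n."""
    return p, p * (1 - p) / n


# Monte Carlo check of the weak-site formula with p = 0.5, n = 4
rng = np.random.default_rng(1)
samples = rng.binomial(4, 0.5, size=200_000) / 4
mc_mean, mc_var = float(samples.mean()), float(samples.var())
```

The result depends on what "weak" means. Suppose each weak site instead keeps the full effect but is occupied n times less often (probability p/n). Total occupancy then has the same mean p but variance p(1 - p/n), which is larger than p(1 - p). So this model supports the hypothesis only when weakness means a smaller per-site effect.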

 

 


©2005 Chrisantha Fernando