Error Threshold and Evolutionary Biology

In his famous quasispecies theory, Manfred Eigen introduced the notion of the error threshold (or error catastrophe). Eigen’s explanation notwithstanding, it is not that simple to say exactly what is meant by “error threshold,” and to this day it is easy to find quite different definitions of this phenomenon. It is much easier to show a figure (I believe that the first to plot this stunning figure were Swetina and Schuster, 1982) that explains the concept of the error threshold without extra words. Here are three figures illustrating the error threshold; anyone asked to find “an error threshold” in them can do so with ease, even without detailed knowledge of the theory behind the whole story:


Figure 1.


Figure 2.


Figure 3.

Now to the theory (and to the mathematical models).

Assume that there exists a population of sequences of length L, where each sequence is composed of 0s and 1s. There are 2^L different sequences. For example, if L=3 then the possible sequences are

000,\,001,\,010,\,011,\,100,\,101,\,110,\,111.

Sequences can reproduce and can mutate into different sequences. Let us assume that sequences reproduce faithfully with rates r_\sigma, where \sigma denotes a given sequence, and mutate such that the mutation rates are \mu(\sigma'\to\sigma)=\mu if the Hamming distance between \sigma and \sigma' is equal to 1, and \mu(\sigma'\to\sigma)=0 otherwise. Let us use some natural ordering of the sequences (e.g., sequence \sigma gets the number whose binary representation is \sigma). If R denotes the diagonal matrix with r_i on the main diagonal and M denotes the matrix with off-diagonal entries m_{ij}=\mu(\sigma_j\to\sigma_i) and diagonal entries m_{ii}=-L\mu, then, using the variables n_i to denote the size of the subpopulation consisting of the i-th sequence and n=(n_1,\ldots,n_{2^L}), we have the dynamical system

\dot n=(R+M)n (1)

that describes how the population evolves. System (1) is linear, with the solution n(t)=\exp\{(R+M)t\}\,n(0); due to the nature of the matrix, all the solutions either tend to infinity or vanish. Since we are interested in the changes in the population structure, we can pass from (1) to the equation for the frequencies p=n/|n|, where |n|=\sum_i n_i:
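To make system (1) concrete, here is a minimal NumPy sketch that builds R and M for a small L and evaluates n(t)=\exp\{(R+M)t\}\,n(0). The helper names (mutation_matrix, solve_linear) and the parameter values (L=3, \mu=0.05, a rate of 2 for sequence 000 and 1 for the rest) are illustrative assumptions, not part of the model itself.

```python
import numpy as np

def mutation_matrix(L, mu):
    """Matrix M over all 2**L binary sequences, numbered by their binary value."""
    N = 2 ** L
    M = np.zeros((N, N))
    for i in range(N):
        for k in range(L):
            j = i ^ (1 << k)      # flip bit k: sequence at Hamming distance 1
            M[i, j] = mu
        M[i, i] = -L * mu         # total mutational outflow from sequence i
    return M

def solve_linear(R, M, n0, t):
    """n(t) = exp((R+M)t) n(0), using the eigendecomposition of the symmetric R+M."""
    w, V = np.linalg.eigh(R + M)
    return V @ (np.exp(w * t) * (V.T @ n0))

L, mu = 3, 0.05                   # assumed toy values
r = np.ones(2 ** L)
r[0] = 2.0                        # assumed: sequence 000 replicates faster
R, M = np.diag(r), mutation_matrix(L, mu)

n0 = np.zeros(2 ** L)
n0[-1] = 1.0                      # start with sequence 111 only
n_t = solve_linear(R, M, n0, t=20.0)
p_t = n_t / n_t.sum()             # frequencies p = n/|n|
print("total size:", n_t.sum())
print("frequencies:", np.round(p_t, 4))
```

The total size grows without bound, while the frequencies settle down, which is the reason for passing from n to p.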

\dot p=(R'+M)p, (2)

where now matrix R' has entries on the main diagonal of the form r_i-\bar{r},\,\bar{r}=\sum_i r_ip_i. System (2) is nonlinear (because of \bar{r}) and, with some technical conditions on R',\, M, has a unique equilibrium \hat{p}, which can be found as the normalized eigenvector of R+M corresponding to the maximal eigenvalue. This equilibrium was dubbed “quasispecies” by Eigen.
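As a sketch of how \hat{p} can be computed directly when L is small, the snippet below takes the eigenvector of R+M for the largest eigenvalue, normalizes it to sum to one, and checks that it is indeed an equilibrium of (2). The function names and the values L=8, \mu=0.02, r_0=10 are assumptions chosen only for illustration.

```python
import numpy as np

def mutation_matrix(L, mu):
    """Matrix M over all 2**L binary sequences, numbered by their binary value."""
    N = 2 ** L
    M = np.zeros((N, N))
    for i in range(N):
        for k in range(L):
            M[i, i ^ (1 << k)] = mu   # Hamming neighbours of sequence i
        M[i, i] = -L * mu
    return M

def quasispecies(r, M):
    """Largest eigenvalue of R+M and the corresponding eigenvector, scaled to sum 1."""
    w, V = np.linalg.eigh(np.diag(r) + M)   # R+M is symmetric; eigenvalues ascending
    v = V[:, -1]
    return w[-1], v / v.sum()

L, mu = 8, 0.02                   # assumed toy values
r = np.ones(2 ** L)
r[0] = 10.0                       # assumed single fit sequence 00000000
M = mutation_matrix(L, mu)

lam, p_hat = quasispecies(r, M)
r_bar = r @ p_hat                 # mean fitness at equilibrium
residual = (np.diag(r - r_bar) + M) @ p_hat   # right-hand side of (2) at p_hat
print("largest eigenvalue:", lam)
print("mean fitness      :", r_bar)
print("max |residual|    :", np.abs(residual).max())
```

Because the columns of M sum to zero, the mean fitness at equilibrium coincides with the largest eigenvalue, which the printout confirms up to rounding.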

First note that even for modest values of L matrix R+M has very large dimension (2^L\times 2^L), therefore it is difficult to find \hat{p} even numerically. To overcome this problem it is usually supposed that the growth rates are such that sequences with the same number of 0s have the same r_i. In this case it is possible to lump system (2) to system (3) which has dimension only (L+1)\times (L+1):

\dot{q}=(R''+M'')q, (3)

where R'' now has distinct r_i-\bar{r} on the main diagonal (the index now runs over i=0,\ldots,L), and the matrix M'' is tridiagonal: the diagonal above the main diagonal has the entries \mu,\,2\mu,\ldots,L\mu, the diagonal below the main diagonal has the entries L\mu,\,(L-1)\mu,\ldots,\mu, and the main diagonal has the entries -L\mu (what is a simple way to prove it?). For system (3) it is possible to find eigenvectors for moderate L. Assume for example that L=30 and that r_0=10,\,r_i=1,\,i=1,\ldots, L. Then, for fixed \mu, it is easy to find the equilibrium \hat{q} of (3) numerically, and this is exactly what is shown in Figures 1, 2, 3. As we change the mutation rate, the equilibrium changes, and there is a sharp boundary (see the figures) beyond which the distribution of classes becomes binomial, which means that the distribution of separate sequences (not classes) is uniform. At this point it is argued that evolution stops: we cannot discriminate by statistical analysis “good” sequences from “bad.”
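The lumped computation can be sketched as follows, under a few assumptions: the function names are hypothetical, class i is taken to be the set of sequences that differ from the fittest sequence in i positions, and the values of \mu in the scan are arbitrary. The snippet first checks numerically (not as a proof) that lumping the full equilibrium for L=6 reproduces the equilibrium of the tridiagonal system, and then scans \mu for L=30, r_0=10, r_i=1.

```python
import numpy as np
from math import comb

def lumped_matrix(r, mu):
    """R + M'' for the classes i = 0..L (tridiagonal, (L+1) x (L+1))."""
    r = np.asarray(r, dtype=float)
    L = len(r) - 1
    A = np.diag(r - L * mu)
    for i in range(L):
        A[i, i + 1] = (i + 1) * mu    # above the diagonal: mu, 2mu, ..., L mu
        A[i + 1, i] = (L - i) * mu    # below the diagonal: L mu, ..., mu
    return A

def class_equilibrium(r, mu):
    """Eigenvector of R + M'' for the largest eigenvalue, scaled to sum 1."""
    w, V = np.linalg.eig(lumped_matrix(r, mu))
    q = V[:, np.argmax(w.real)].real
    return q / q.sum()

def lump_full_equilibrium(r, mu):
    """Equilibrium of the full 2**L system, summed over classes (small L only)."""
    r = np.asarray(r, dtype=float)
    L = len(r) - 1
    N = 2 ** L
    ones = np.array([bin(s).count("1") for s in range(N)])  # class of sequence s
    A = np.diag(r[ones] - L * mu)
    for s in range(N):
        for k in range(L):
            A[s, s ^ (1 << k)] = mu
    w, V = np.linalg.eigh(A)
    p = V[:, -1] / V[:, -1].sum()
    return np.bincount(ones, weights=p, minlength=L + 1)

# Consistency check of the lumping for a small case
r6 = np.ones(7)
r6[0] = 10.0
gap = np.abs(lump_full_equilibrium(r6, 0.05) - class_equilibrium(r6, 0.05)).max()
print("max difference, full vs lumped (L=6):", gap)

# Scan of the mutation rate for the landscape used in the figures
L = 30
r = np.ones(L + 1)
r[0] = 10.0
binomial = np.array([comb(L, i) for i in range(L + 1)]) / 2.0 ** L
for mu in (0.1, 0.2, 0.3, 0.4, 0.5):
    q = class_equilibrium(r, mu)
    dist = np.abs(q - binomial).sum()
    print(f"mu={mu:.1f}  q_0={q[0]:.3e}  L1 distance to binomial={dist:.3e}")
```

For small \mu the fittest class holds most of the population; for large \mu the class distribution is numerically indistinguishable from the binomial one.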

Can this behavior be observed in real life? It is frequently mentioned that the whole quasispecies theory is well suited to modeling virus evolution, in particular due to the viruses’ high mutation rates. Hence the proposed way to deal with viruses: increase the mutation rate, evolution stops, and the viruses go extinct.

Several points to note:

  • The error threshold exists in the systems for frequencies, which are relevant only if |n|\neq 0. It is quite possible that real populations go extinct before the theoretical error threshold is reached. So here is the first question: Is it possible to show that at least in some in vitro populations the error threshold occurs earlier than actual extinction?
  • The figures above are drawn for an extremely unrealistic fitness landscape (this is the two-word name for the set of all r_i). Of course it is possible to suggest different fitness landscapes; the easiest one (multiplicative) is just r_i=\log (1-s)^i, where 0<s<1. It is known that this fitness landscape does not have an error threshold. The next possible formula is r_i=\log (1-s)^{i^\alpha}, where \alpha can be either greater or less than 1. It is known that for \alpha<1 there is an error threshold and for \alpha>1 there is not. But this conclusion is based on limiting procedures, when L\to\infty! And many pictures just do not show any sharp error threshold (compare to Figs. 1,2,3). Here is an example with s=0.1,\,\alpha=0.5,\,L=100: So here is the second question: Is it possible to find the exact conditions for an error threshold (defined in some sense) to exist for the given fitness landscape?
  • For the simple fitness landscape in Figs. 1,2,3 it is possible to find an approximate formula for the critical mutation rate that puts the limits on the lengths of the sequences. Is it possible to find such formulas for other fitness landscapes?
  • All the above discussion deals with the case when the fitness landscape is permutation invariant (i.e., we have only L+1 distinct classes). Real fitness landscapes are not. How would the critical mutation rate change if the fitness landscape is not permutation invariant?
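The questions about other fitness landscapes can at least be explored numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the mean class index divided by L is used as an ad hoc summary (it equals 0.5 for the binomial distribution), and the cutoff q_0<0.01 used to locate the threshold is arbitrary. For the single-peak landscape it also compares the numerically located critical rate with the rough heuristic L\mu\approx r_0-r_1, which is an assumption of this sketch (the fittest class gains r_0 and loses L\mu to mutation) and not a formula derived in the text.

```python
import numpy as np

def class_equilibrium(r, mu):
    """Equilibrium class frequencies of (3) for class rates r[0..L]."""
    r = np.asarray(r, dtype=float)
    L = len(r) - 1
    A = np.diag(r - L * mu)
    for i in range(L):
        A[i, i + 1] = (i + 1) * mu    # above the diagonal: mu, 2mu, ..., L mu
        A[i + 1, i] = (L - i) * mu    # below the diagonal: L mu, ..., mu
    w, V = np.linalg.eig(A)
    q = V[:, np.argmax(w.real)].real  # eigenvector of the largest eigenvalue
    return q / q.sum()

def mean_distance(q):
    """Mean class index divided by L (0.5 for the binomial distribution)."""
    L = len(q) - 1
    return float(np.arange(L + 1) @ q) / L

# Landscape r_i = i**alpha * log(1-s) with the values quoted in the text
L, s, alpha = 100, 0.1, 0.5
r_alpha = np.arange(L + 1) ** alpha * np.log(1 - s)
mus = (1e-4, 1e-3, 1e-2, 1e-1, 1.0)                 # assumed scan values
u_alpha = [mean_distance(class_equilibrium(r_alpha, mu)) for mu in mus]
for mu, u in zip(mus, u_alpha):
    print(f"mu={mu:g}  mean distance / L = {u:.4f}")

# Single-peak landscape: locate the threshold on a grid and compare
L_peak = 30
r_peak = np.ones(L_peak + 1)
r_peak[0] = 10.0
grid = np.arange(0.05, 0.6, 0.005)
q0 = np.array([class_equilibrium(r_peak, mu)[0] for mu in grid])
mu_numeric = float(grid[np.argmax(q0 < 0.01)])      # first mu with q_0 below the cutoff
mu_heuristic = (r_peak[0] - r_peak[1]) / L_peak     # assumed heuristic estimate
print("numerical estimate:", mu_numeric, " heuristic:", mu_heuristic)
```

Plotting the full vectors returned by class_equilibrium against \mu, instead of a single summary number, gives pictures of the same kind as the figures discussed above.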

 

About Artem Novozhilov

I am an applied mathematician interested in studying various evolutionary processes by means of mathematical models. More on my professional activities can be found on my page https://www.ndsu.edu/pubweb/~novozhil/