## Error Threshold and Evolutionary Biology

In his famous quasispecies theory, Manfred Eigen introduced the notion of the error threshold (or error catastrophe). Eigen’s explanation notwithstanding, it is not that simple to define what exactly was meant by “error threshold”, and even now it is easy to find quite different definitions of this phenomenon. It is much easier to show a figure (I believe the first to plot this striking figure were Swetina and Schuster, 1982) that explains the concept of the error threshold without extra words. Here are three figures showing the concept of the error threshold; anyone asked to find “an error threshold” in them can do it with ease, even without detailed knowledge of the theory behind the whole story:

Now to the theory (and to the mathematical models).

Assume that there is a population of sequences of length $L$, each composed of 0s and 1s. There are $2^L$ different sequences. For example, if $L=3$, the possible sequences are $000,\,001,\,010,\,011,\,100,\,101,\,110,\,111.$

Sequences can reproduce and can mutate into other sequences. Let us assume that sequences reproduce faithfully with rates $r_\sigma$, where $\sigma$ denotes a given sequence, and mutate with rates $\mu(\sigma'\to\sigma)=\mu$ if the Hamming distance between $\sigma$ and $\sigma'$ equals 1, and $\mu(\sigma'\to\sigma)=0$ otherwise. Let us use some natural ordering of all the sequences (e.g., sequence $\sigma$ gets the number whose binary representation is $\sigma$). Let $R$ denote the diagonal matrix with $r_i$ on the main diagonal, and let $M$ denote the matrix with entries $m_{ij}=\mu(\sigma_j\to\sigma_i)$ for $i\neq j$ and $m_{ii}=-L\mu$. Then, using $n_i$ to denote the size of the subpopulation consisting of the $i$-th sequence and $n=(n_1,\ldots,n_{2^L})$, we have the dynamical system

$$\dot n=(R+M)n,\qquad (1)$$

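that describes how the population evolves. System (1) is linear, and its solution is $n(t)=e^{(R+M)t}\,n(0)$. For nonnegative initial data, $|n|$ eventually behaves like $e^{\lambda_{\max}t}$, where $\lambda_{\max}$ is the leading (real, dominant) eigenvalue of $R+M$; hence, unless $\lambda_{\max}=0$, all solutions either tend to infinity or vanish. Since we are interested in changes in the population structure, we can pass from (1) to the equation for the frequencies $p=n/|n|$, where $|n|=\sum_i n_i$. Every column of $M$ sums to zero (each sequence has exactly $L$ one-point neighbours), so $\frac{d}{dt}|n|=\sum_i r_in_i$, and differentiating $p_i=n_i/|n|$ yields

$$\dot p=(R'+M)p,\qquad (2)$$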

where $R'$ is the diagonal matrix with entries $r_i-\bar{r}$ on the main diagonal, $\bar{r}=\sum_i r_ip_i$. System (2) is nonlinear (because of $\bar{r}$) and, under some technical conditions on $R'$ and $M$, has a unique equilibrium $\hat{p}$. Indeed, at an equilibrium $(R+M)\hat p=\bar r\hat p$, so $\hat p$ is an eigenvector of $R+M$; the only eigenvector with positive entries is the one corresponding to the maximal eigenvalue, so $\hat p$ is this eigenvector normalized to sum to one. This equilibrium was dubbed “quasispecies” by Eigen.
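For small $L$ the characterization of $\hat p$ as the leading eigenvector of $R+M$ can be checked directly. The sketch below (pure Python, no libraries) builds the explicit $2^L\times 2^L$ matrix $R+M$ with the binary ordering described above and finds $\hat p$ by power iteration on a shifted copy of the matrix. Power iteration is my choice of method, not something prescribed in the text; the function names and the example values $L=4$, $\mu=0.2$, $r_0=10$, $r_i=1$ are hypothetical.

```python
"""Explicit R + M for system (1) with small L, and its quasispecies p-hat."""


def hamming(a, b):
    """Hamming distance between two sequences encoded as integers."""
    return bin(a ^ b).count("1")


def full_matrix(r, mu, L):
    """Return A = R + M, where m_ij = mu(sigma_j -> sigma_i) and m_ii = -L*mu.

    Sequence number i is the sequence whose binary representation is i.
    """
    size = 2**L
    A = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == j:
                A[i][j] = r[i] - L * mu
            elif hamming(i, j) == 1:
                A[i][j] = mu
    return A


def quasispecies(A, iters=2000):
    """Power iteration on A + c*I; returns the Perron eigenvector summing to 1.

    Off-diagonal entries are already >= 0; the shift c makes every diagonal
    entry >= 1 without changing the eigenvectors, so (for mu > 0) the
    iteration converges to the positive eigenvector.
    """
    size = len(A)
    c = 1.0 + max(0.0, -min(A[i][i] for i in range(size)))
    p = [1.0 / size] * size
    for _ in range(iters):
        w = [sum(A[i][j] * p[j] for j in range(size)) + c * p[i]
             for i in range(size)]
        total = sum(w)
        p = [x / total for x in w]
    return p


def residual(A, r, p):
    """max_i |((R + M) p)_i - rbar * p_i| with rbar = sum_i r_i p_i."""
    rbar = sum(ri * pi for ri, pi in zip(r, p))
    return max(abs(sum(a * x for a, x in zip(row, p)) - rbar * pi)
               for row, pi in zip(A, p))


if __name__ == "__main__":
    L, mu = 4, 0.2
    r = [10.0] + [1.0] * (2**L - 1)  # hypothetical single-peak rates
    A = full_matrix(r, mu, L)
    print([format(i, f"0{L}b") for i in range(4)])  # ['0000', '0001', '0010', '0011']
    p = quasispecies(A)
    print(f"p_0 = {p[0]:.4f}, residual = {residual(A, r, p):.1e}")
```

A residual near machine precision confirms $(R+M)\hat p=\bar r\hat p$. The explicit matrix has $4^L$ entries, which is exactly why this approach stops being practical beyond small $L$.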

First note that even for modest values of $L$ the matrix $R+M$ is very large ($2^L\times 2^L$), so it is difficult to find $\hat{p}$ even numerically. To overcome this problem, it is usually assumed that sequences with the same number of 0s have the same growth rate $r_i$. In this case, writing $q_i$ for the total frequency of the sequences with exactly $i$ zeros, system (2) can be lumped into system (3), whose matrix is only $(L+1)\times (L+1)$:

$$\dot{q}=(R''+M'')q,\qquad (3)$$

where $R''$ now has the distinct values $r_i-\bar{r}$ on the main diagonal (the index now runs over $i=0,\ldots,L$), and $M''$ is tridiagonal: the diagonal above the main diagonal is $\mu,\,2\mu,\ldots,L\mu$, the diagonal below it is $L\mu,\,(L-1)\mu,\ldots,\mu$, and the main diagonal entries are all $-L\mu$ (what is a simple way to prove it?). For system (3), eigenvectors are easy to find for moderate $L$. Assume, for example, that $L=30$ and that $r_0=10,\,r_i=1,\,i=1,\ldots, L$. Then, for fixed $\mu$, $\hat{p}$ is easy to find numerically, and this is exactly what is shown in Figures 1, 2, 3. As we change the mutation rate, the equilibrium changes, and there is a sharp boundary (see the figures) beyond which the distribution $\hat{p}$ becomes binomial, which means that the distribution of individual sequences (not classes) is uniform. At this point it is argued that evolution stops: by statistical analysis we can no longer discriminate “good” sequences from “bad” ones.
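System (3) is small enough for a few lines of pure Python. The sketch below applies $M''$ directly from the three diagonals listed above and finds the equilibrium of (3) by shifted power iteration (an assumed method; any eigen-solver would do), using the example landscape $L=30$, $r_0=10$, $r_i=1$. For comparison it prints a back-of-the-envelope estimate that is my own approximation, which may differ from the formula alluded to further below: neglecting back mutation into class 0, the equilibrium equation for $q_0$ gives $r_0-\bar r-L\mu\approx 0$ with $\bar r=r_1+(r_0-r_1)q_0$, hence $q_0\approx 1-L\mu/(r_0-r_1)$ and a critical rate near $(r_0-r_1)/L=0.3$. The function names are hypothetical.

```python
from math import comb


def apply(r, mu, q, shift):
    """Return (diag(r) + M'' + shift*I) q for the tridiagonal system (3)."""
    L = len(r) - 1
    out = []
    for k in range(L + 1):
        acc = (r[k] - L * mu + shift) * q[k]    # main diagonal: r_k - L*mu
        if k > 0:
            acc += (L - k + 1) * mu * q[k - 1]  # below diagonal: L*mu, ..., mu
        if k < L:
            acc += (k + 1) * mu * q[k + 1]      # above diagonal: mu, ..., L*mu
        out.append(acc)
    return out


def lumped_quasispecies(r, mu, max_iter=20000, tol=1e-14):
    """Perron eigenvector of diag(r) + M'' by power iteration, summing to 1.

    R'' = diag(r - rbar) differs from diag(r) by a multiple of the identity,
    so both matrices have the same eigenvectors.
    """
    L = len(r) - 1
    shift = L * mu - min(r) + 1.0  # every diagonal entry becomes >= 1
    q = [1.0 / (L + 1)] * (L + 1)
    for _ in range(max_iter):
        w = apply(r, mu, q, shift)
        total = sum(w)
        w = [x / total for x in w]
        converged = max(abs(a - b) for a, b in zip(w, q)) < tol
        q = w
        if converged:
            break
    return q


if __name__ == "__main__":
    L = 30
    r = [10.0] + [1.0] * L  # r_0 = 10, r_i = 1 for i = 1..L
    binomial = [comb(L, k) / 2**L for k in range(L + 1)]
    for mu in (0.05, 0.1, 0.2, 0.25, 0.35, 0.5):
        q = lumped_quasispecies(r, mu)
        approx = max(0.0, 1 - L * mu / (r[0] - r[1]))  # heuristic for q_0
        gap = max(abs(a - b) for a, b in zip(q, binomial))
        print(f"mu={mu:.2f}  q_0={q[0]:.4f}  heuristic={approx:.4f}  "
              f"max|q - binomial|={gap:.1e}")
```

For $\mu$ well below 0.3, $q_0$ should stay close to the heuristic (for $\mu=0.1$ the exact equilibrium equation pins $q_0$ between $2/3$ and about $0.672$); well above it, the class distribution is essentially binomial.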

Can this behavior be observed in real life? It is frequently mentioned that quasispecies theory is well suited to modeling virus evolution, in particular because of viruses’ high mutation rates. Hence the suggested way to deal with viruses: increase their mutation rates, evolution stops, and the viruses go extinct.

Several points to note:

• The error threshold is a property of the system for frequencies, which is meaningful only while $|n|\neq 0$. It is quite possible that real populations go extinct before the theoretical error threshold is reached. So here is the first question: is it possible to show that, at least in some in vitro populations, the error threshold occurs before actual extinction?
• The figures above are drawn for an extremely unrealistic fitness landscape (the two-word name for the whole collection of $r_i$). Of course, other fitness landscapes can be considered; the simplest one (multiplicative) is just $r_i=\log (1-s)^i$, where $0<s<1$. It is known that this fitness landscape has no error threshold. The next possible formula is $r_i=\log (1-s)^{i^\alpha}$, where $\alpha$ can be either greater or less than 1. It is known that in the former case there is an error threshold and in the latter case there is not. But this conclusion is based on limiting procedures, when $L\to\infty$! And many pictures just do not show any sharp error threshold (compare with Figs. 1, 2, 3). Here is an example with $s=0.1,\,\alpha=0.5,\,L=100$: So here is the second question: is it possible to find the exact conditions for an error threshold (defined in some sense) to exist for a given fitness landscape?
• For the simple fitness landscape in Figs. 1, 2, 3 it is possible to find an approximate formula for the critical mutation rate, which limits the length of the sequences. Is it possible to find such formulas for other fitness landscapes?
• All the above discussion deals with the case when the fitness landscape is permutation invariant (i.e., there are only $L+1$ distinct classes). Real fitness landscapes are not. How would the critical mutation rate change if the fitness landscape is not permutation invariant?
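The same lumped computation works for any permutation-invariant landscape, for example $r_i=\log(1-s)^{i^\alpha}=i^\alpha\log(1-s)$ with $s=0.1$, $\alpha=0.5$, $L=100$ from the second point above. The sketch below scans a few mutation rates (a hypothetical choice of values) and prints $q_0$, the mean class, $\bar r$ (at equilibrium, the growth rate of $|n|$ in (1)), and the residual of the eigenvector equation as a convergence check. As before, shifted power iteration and the function names are my assumptions, not a prescribed algorithm.

```python
from math import log


def landscape(s, alpha, L):
    """r_i = log((1 - s)**(i**alpha)) = i**alpha * log(1 - s) for i = 0..L."""
    return [i**alpha * log(1 - s) for i in range(L + 1)]


def apply(r, mu, q, shift):
    """Return (diag(r) + M'' + shift*I) q for the tridiagonal system (3)."""
    L = len(r) - 1
    out = []
    for k in range(L + 1):
        acc = (r[k] - L * mu + shift) * q[k]    # main diagonal: r_k - L*mu
        if k > 0:
            acc += (L - k + 1) * mu * q[k - 1]  # below diagonal: L*mu, ..., mu
        if k < L:
            acc += (k + 1) * mu * q[k + 1]      # above diagonal: mu, ..., L*mu
        out.append(acc)
    return out


def lumped_quasispecies(r, mu, max_iter=20000, tol=1e-14):
    """Perron eigenvector of diag(r) + M'' by power iteration, summing to 1."""
    L = len(r) - 1
    shift = L * mu - min(r) + 1.0  # every diagonal entry becomes >= 1
    q = [1.0 / (L + 1)] * (L + 1)
    for _ in range(max_iter):
        w = apply(r, mu, q, shift)
        total = sum(w)
        w = [x / total for x in w]
        converged = max(abs(a - b) for a, b in zip(w, q)) < tol
        q = w
        if converged:
            break
    return q


def residual(r, mu, q):
    """max_k |((diag(r) + M'') q)_k - rbar q_k|; zero at an exact equilibrium."""
    rbar = sum(a * b for a, b in zip(r, q))
    return max(abs(a - rbar * b) for a, b in zip(apply(r, mu, q, 0.0), q))


if __name__ == "__main__":
    L = 100
    r = landscape(0.1, 0.5, L)
    for mu in (0.001, 0.002, 0.005, 0.01, 0.02, 0.05):
        q = lumped_quasispecies(r, mu)
        rbar = sum(a * b for a, b in zip(r, q))  # growth rate of |n| in (1)
        mean_k = sum(k * qk for k, qk in enumerate(q))
        print(f"mu={mu:.3f}  q_0={q[0]:.3e}  mean class={mean_k:6.2f}  "
              f"rbar={rbar:+.4f}  residual={residual(r, mu, q):.1e}")
```

Adding the same constant to every $r_i$ leaves the equilibrium of (3) unchanged, because only $r_i-\bar r$ enters (2) and (3), but it shifts $\bar r$ by that constant. Whether $|n|$ grows or goes extinct is therefore a separate question from where the frequencies change, which is the issue behind the first question above.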