The redundancy principle in biology expresses the need for many copies of the same entity (cells, molecules, ions) to fulfill a biological function. Examples are numerous: the disproportionate number of spermatozoa compared to a single egg during fertilization, the large number of neurotransmitters released during neuronal communication compared to the number of receptors, the large number of calcium ions released during a calcium transient in cells, and many more in molecular and cellular transduction, gene activation and cell signaling. This redundancy is particularly relevant when the sites of activation are physically separated from the initial position of the molecular messengers. It often serves to overcome the time constraint of fast-activating pathways. It can be expressed in terms of the theory of extreme statistics to determine its laws and to quantify how the shortest paths are selected. The main goal is to estimate these large numbers from physical principles and mathematical derivations. When a large distance separates the source and the target (a small activation site), the redundancy principle explains that this geometrical gap can be compensated by a large number of copies. Had nature used fewer copies than normal, activation would have taken a much longer time, as finding a small target by chance is a rare event and falls into the class of narrow escape problems.

Molecular rate

The time for the fastest particles to reach a target in the context of redundancy depends on the number of particles and on the local geometry of the target. In most cases, this time sets the rate of activation. This rate should be used instead of the classical Smoluchowski rate, which describes the mean arrival time but not the arrival time of the fastest particle. The statistics of the minimal time to activation set kinetic laws in biology, which can be quite different from the ones associated with average times.

Physical models

Stochastic process

The motion of a particle located at position X_t can be described by the Smoluchowski limit of the Langevin equation:

dX_t=\sqrt{2D}\,dB_t+\frac{1}{\gamma}F(x)\,dt,

where D is the diffusion coefficient of the particle, \gamma is the friction coefficient per unit of mass, F(x) is the force per unit of mass, and B_t is a Brownian motion. This model is classically used in molecular dynamics simulations.
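As an illustration of how the Smoluchowski limit above can be simulated, here is a minimal Euler-Maruyama sketch in one dimension. The harmonic force F(x) = -kx, the step size and all function names are assumptions made for this example and are not part of the model description.

```python
import math
import random


def simulate_overdamped(force, D=1.0, gamma=1.0, x0=0.0, dt=0.01,
                        steps=10_000, seed=0):
    """Euler-Maruyama scheme for dX = sqrt(2D) dB + F(X)/gamma dt (1D sketch)."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * D * dt)  # std of the Brownian term over dt
    x = x0
    path = [x]
    for _ in range(steps):
        x += force(x) / gamma * dt + noise_scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def harmonic_force(x, k=1.0):
    """Hypothetical restoring force F(x) = -k x, used only for illustration."""
    return -k * x


if __name__ == "__main__":
    path = simulate_overdamped(harmonic_force, steps=200_000)
    mean = sum(path) / len(path)
    var = sum((p - mean) ** 2 for p in path) / len(path)
    # For this linear force the stationary variance is D * gamma / k.
    print(f"empirical variance: {var:.3f}")
```

For the harmonic force the stationary variance D\gamma/k gives a quick sanity check of the discretization.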

Jump processes

The position can also evolve by discrete jumps:

x_{n+1}=
\begin{cases} x_n-a, & \text{with probability } l(x_n) \\ x_n+b, & \text{with probability } r(x_n) \end{cases}

, which is for example a model of telomere length dynamics. Here r(x)=\frac{1}{1+\beta x}, with r(x)+l(x)=1.
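The jump process can be explored with a short simulation. The sketch below assumes the form r(x) = 1/(1 + \beta x) for the probability of a jump of size b, with l(x) = 1 - r(x). The parameter values and the clamping of the length at zero are hypothetical choices made only to keep the example well defined.

```python
import random


def right_jump_probability(x, beta):
    """Assumed form r(x) = 1 / (1 + beta * x), valid for x >= 0."""
    return 1.0 / (1.0 + beta * x)


def simulate_jump_process(x0, a, b, beta, steps, seed=0):
    """Iterate x -> x + b with probability r(x), x -> x - a otherwise."""
    rng = random.Random(seed)
    x = x0
    trajectory = [x]
    for _ in range(steps):
        if rng.random() < right_jump_probability(x, beta):
            x += b
        else:
            x = max(x - a, 0.0)  # assumption: the length cannot become negative
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    traj = simulate_jump_process(x0=100.0, a=1.0, b=5.0, beta=0.05, steps=100_000)
    print("time-averaged length:", sum(traj) / len(traj))
    # The mean drift b*r(x) - a*l(x) vanishes at x = b / (a * beta) = 100 here.
```

With these values the trajectory fluctuates around the point where the mean drift b r(x) - a l(x) vanishes, x = b/(a\beta).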

Directed motion process

In a directed motion process, the particle moves along straight lines at constant speed:

\dot{X}=v_0 \mathbf{u},

where \mathbf{u} is a unit vector chosen from a uniform distribution. Upon hitting an obstacle at a boundary point X_0 \in \partial \Omega, the velocity changes to

\dot{X}=v_0 \mathbf{v},

where \mathbf{v} is chosen on the unit sphere in the supporting half space at X_0 from a uniform distribution, independently of \mathbf{u}. This rectilinear motion with constant velocity is a simplified model of spermatozoon motion in a bounded domain \Omega. Other possible models include diffusion on a graph and active motion on a graph.
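The directed motion model can be sketched in two dimensions as follows. The unit disk, the start at its center, the small absorbing arc centered at angle 0 and all names are assumptions chosen for this illustration. After each reflection the new direction is drawn uniformly in the inward half plane.

```python
import math
import random


def first_arrival_time(rng, v0=1.0, half_width=0.05, max_bounces=100_000):
    """Time for one particle starting at the center of the unit disk to reach
    the boundary arc |angle| <= half_width, moving in straight lines."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    x, y = math.cos(theta), math.sin(theta)  # first boundary hit
    t = 1.0 / v0                             # the center is at distance 1
    for _ in range(max_bounces):
        if abs(math.atan2(y, x)) <= half_width:
            return t
        # New direction: uniform angle around the inward normal (-x, -y).
        phi = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
        nx, ny = -x, -y
        vx = nx * math.cos(phi) - ny * math.sin(phi)
        vy = nx * math.sin(phi) + ny * math.cos(phi)
        chord = 2.0 * math.cos(phi)          # length of the next straight segment
        x, y = x + chord * vx, y + chord * vy
        norm = math.hypot(x, y)
        x, y = x / norm, y / norm            # remove round-off drift
        t += chord / v0
    return math.inf


def fastest_arrival(n, seed=0, **kwargs):
    """Shortest arrival time among n independent particles."""
    rng = random.Random(seed)
    return min(first_arrival_time(rng, **kwargs) for _ in range(n))


if __name__ == "__main__":
    for n in (1, 10, 100, 2000):
        print(n, fastest_arrival(n, seed=1))
```

In this toy geometry the fastest arrival time approaches 1/v_0, the travel time along the straight ray from the center to the window, as n grows.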

Mathematical formulation: Computing the rate of arrival time for the fastest

The mathematical analysis of large numbers of molecules, which appear redundant in the traditional activation theory, is used to compute the in vivo time scale of stochastic chemical reactions. The computation relies on asymptotic or probabilistic approaches to estimate the mean time for the fastest particle to reach a small target in various geometries. With N non-interacting i.i.d. Brownian trajectories (ions) in a bounded domain \Omega that bind at a site, the shortest arrival time is by definition

\tau^{1}=\min (t_1,\ldots,t_N),

where t_i are the independent arrival times of the N ions in the medium. The survival distribution of the arrival time of the fastest, \Pr\{\tau^{1}>t\}, is expressed in terms of a single particle:

\Pr\{\tau^{1}>t\}=\left[\Pr\{t_1>t\}\right]^N.

Here \Pr\{t_1>t\} is the survival probability of a single particle prior to binding at the target. This probability is computed from the solution of the diffusion equation in a domain \Omega:

\frac{\partial p(x,t)}{\partial t} =D \Delta p(x,t) \hbox{ for } x \in \Omega,\ t>0

\begin{align}
p(x,0)&=p_0(x) \hbox{ for } x \in \Omega \\
\frac{\partial p}{\partial n}(x,t) &=0 \hbox{ for } x \in \partial \Omega_r\\
p(x,t)&=0 \hbox{ for } x \in \partial \Omega_a,
\end{align}

where the boundary \partial \Omega contains N_R binding sites \partial \Omega_i\subset\partial \Omega (\partial \Omega_a=\bigcup\limits_{i=1}^{N_R}\partial\Omega_i,\ \partial\Omega_r=\partial\Omega-\partial\Omega_a). The single particle survival probability is

\Pr\{t_1>t\} =\int\limits_{\Omega} p(x,t)\,dx,

so that

\Pr\{\tau^{1}=t\} = \frac{d}{dt}\Pr\{\tau^{1}<t\}=N\left(\Pr\{t_1>t\}\right)^{N-1}\Pr\{t_1=t\},

where

\Pr\{t_1=t\}= \oint\limits_{\partial\Omega_a} \frac{\partial p(x,t)}{\partial n}\, dS_x

and, when the N_R windows contribute equally,

\Pr\{t_1=t\}= N_R \oint\limits_{\partial\Omega_1} \frac{\partial p(x,t)}{\partial n}\,dS_x.

The probability density function (pdf) of the arrival time is

\Pr\{\tau^{1}=t\}=N N_R\left[\int\limits_{\Omega} p(x,t)\,dx\right]^{N-1}\oint\limits_{\partial\Omega_1} \frac{\partial p(x,t)}{\partial n}\, dS_x,

which gives the MFPT

\bar\tau^{1}=\int\limits_0^{\infty}\Pr\{\tau^{1}>t\}\, dt = \int\limits_0^{\infty} \left[\Pr\{t_1>t\}\right]^N dt.

The probability \Pr\{t_1>t\} can be computed using short-time asymptotics of the diffusion equation as shown in the next sections.
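The definition of the shortest arrival time as a minimum over N independent arrival times can be sampled directly in the simplest setting. The sketch below assumes a one-dimensional half-line with an absorbing point at distance a from the source, where the first-passage time of a single Brownian particle can be drawn exactly as a^2/(2DZ^2) with Z a standard normal variable. The function names and parameter values are illustrative only.

```python
import random


def sample_fastest_arrival(n, a=1.0, D=1.0, rng=random):
    """Exact sample of min(t_1, ..., t_n) on the half-line.

    Each t_i is distributed as a^2 / (2 D Z^2) with Z standard normal,
    so the fastest particle corresponds to the largest Z^2.
    """
    z2_max = max(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
    return a * a / (2.0 * D * z2_max)


def mean_fastest_arrival(n, trials=500, seed=0, a=1.0, D=1.0):
    """Monte Carlo estimate of the mean shortest arrival time (finite for n >= 3)."""
    rng = random.Random(seed)
    total = sum(sample_fastest_arrival(n, a, D, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, mean_fastest_arrival(n))
```

The estimate decreases slowly with n, which is consistent with the logarithmic dependence discussed in the following sections.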

Explicit computation in dimension 1

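The short-time asymptotics of the diffusion equation are based on the ray method approximation. For the half-line [0,\infty[, the survival pdf is the solution of

\begin{align}
\frac{\partial p(x,t)}{\partial t}& =D \frac{\partial^2 p(x,t)}{\partial x^2}
\quad\mbox{for } x>0,\ t>0 \\
p(x,0)&=\delta(x-a)\quad\mbox{for}\ x>0,\quad p(0,t)=0\quad\mbox{for } t>0,
\end{align}

that is

p(x,t) =\frac{1}{\sqrt{4\pi Dt}}\left[\exp\left\{-\frac{(x-a)^2}{4Dt}\right\}- \exp\left\{-\frac{(x+a)^2}{4Dt}\right\}\right].

The survival probability with D=1 is

\Pr\{t_1>t\}=\int\limits_0^{\infty} p(x,t)\,dx=1-\frac{2}{\sqrt{\pi}} \int\limits_{a/\sqrt{4t}}^{\infty}e^{-u^2}\,du.

To compute the MFPT, we expand the complementary error function

\frac{2}{\sqrt{\pi}} \int\limits_{x}^{\infty}e^{-u^2}\,du =\frac{e^{-x^2}}{\sqrt{\pi}\,x}\left(1-\frac{1}{2x^2}+O(x^{-4})\right)\quad\mbox{for}\ x\gg1,

which gives

\bar\tau^{1}=\int\limits_0^{\infty} \left[\Pr\{t_1>t\}\right]^N dt \approx \int\limits_0^{\infty} \exp\left\{N\ln\left(1-\frac{\sqrt{4t}\,e^{-a^2/(4t)}}{a\sqrt{\pi}}\right)\right\}\, dt \approx \frac{a^2}{4}\int\limits_0^{\infty} \exp \left\{-N\frac{\sqrt{u}\,e^{-1/u}}{\sqrt{\pi}}\right\}du,

leading (the main contribution of the integral is near 0) to

\bar\tau^{1} \approx \frac{a^2}{4D\ln\left(\frac{N}{\sqrt{\pi}}\right)}\quad\mbox{for}\ N\gg1.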
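The integral of the N-th power of the survival probability can also be evaluated numerically and compared with the leading-order estimate a^2/(4D\ln(N/\sqrt{\pi})). The sketch below uses the half-line survival probability erf(a/\sqrt{4Dt}) and a trapezoid rule on a log-spaced time grid. The grid bounds and the number of points are assumptions that are adequate for N of about 10 or more.

```python
import math


def survival_halfline(t, a=1.0, D=1.0):
    """Single-particle survival probability on the half-line: erf(a / sqrt(4 D t))."""
    return math.erf(a / math.sqrt(4.0 * D * t))


def mean_fastest_time(n, a=1.0, D=1.0, points=8000, t_min=1e-6, t_max=1e4):
    """Trapezoid rule for the integral of S(t)^n over t, using t = exp(s)."""
    s_lo, s_hi = math.log(t_min), math.log(t_max)
    ds = (s_hi - s_lo) / points
    total = 0.0
    for i in range(points + 1):
        t = math.exp(s_lo + i * ds)
        weight = 0.5 if i in (0, points) else 1.0
        total += weight * survival_halfline(t, a, D) ** n * t * ds
    return total


def asymptotic_fastest_time(n, a=1.0, D=1.0):
    """Leading-order estimate a^2 / (4 D ln(n / sqrt(pi))) for large n."""
    return a * a / (4.0 * D * math.log(n / math.sqrt(math.pi)))


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, mean_fastest_time(n), asymptotic_fastest_time(n))
```

Because the expansion is logarithmic in N, the two values approach each other only slowly as N increases.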

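This result is reminiscent of Gumbel's law. Similarly, escape from the interval [0,a] is computed from the infinite sum

p(x,t\,|\,y) =\frac{1}{\sqrt{4D\pi t}}\sum\limits_{n=-\infty}^{\infty} \left[\exp \left\{-\frac{(x-y+2na)^2}{4Dt}\right\} -\exp \left\{-\frac{(x+y+2na)^2}{4Dt}\right\} \right].

The conditional survival probability is approximated by

\Pr\{t_1>t\,|\,y\}=\int\limits_0^a p(x,t\,|\,y)\,dx\sim 1-\max\frac{\sqrt{4Dt}}{\delta\sqrt{\pi}}e^{-\delta^2/(4Dt)}\quad\mbox{as}\ t\to0,

where the maximum occurs at

\delta=\min[y,a-y]

for 0<y<a (the shortest ray from y to the boundary). All other integrals can be computed explicitly, leading to

\bar\tau^{1}=\int\limits_0^{\infty} \left[\Pr\{t_1>t\}\right]^N dt \approx \int\limits_0^{\infty} \exp\left\{N\ln\left(1-\frac{\sqrt{4Dt}}{\delta\sqrt{\pi}}e^{-\delta^2/(4Dt)}\right)\right\}dt \approx \frac{\delta^2}{4D\ln\left(\frac{N}{\sqrt{\pi}}\right)}\quad\mbox{for}\ N\gg1.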

Arrival times of the fastest in higher dimensions

The arrival times of the fastest among many Brownian motions are expressed in terms of the shortest distance from the source S to the absorbing window A, measured by the distance

\delta_{min}=d(S,A),

where d is the associated Euclidean distance. Interestingly, the trajectories followed by the fastest particles are as close as possible to the optimal trajectories. In technical language, the trajectories of the fastest among N concentrate near the optimal trajectory (shortest path) as the number N of particles increases. For a diffusion coefficient D and a window of size a, the expected first arrival times of N independent and identically distributed Brownian particles initially positioned at the source S are given by the following asymptotic formulas:

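\bar\tau^{d1} \approx \frac{\delta_{min}^2}{4D\ln\left(\frac{N}{\sqrt{\pi}}\right)}, \hbox{ in dim 1, for } N \gg1,

\bar \tau^{d2} \approx  \frac{\delta_{min}^2}{4D\log\left(\frac{\pi\sqrt{2}N}{8\log\left(\frac{1}{a}\right)}\right)}, \hbox{ in dim 2, for } \frac{N}{\log\left(\frac{1}{a}\right)}\gg1,

\bar\tau^{d3} \approx \frac{\delta_{min}^2}{2D\sqrt{\log\left(N\frac{4a^2}{\pi^{1/2}\delta_{min}^2}\right)}}, \hbox{ in dim } 3, \hbox{ for } \frac{Na^2}{\delta_{min}^2}\gg1.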

These formulas show that the expected arrival time of the fastest particle decays as O(1/\log(N)) in dimensions 1 and 2. They should be used instead of the classical forward rate in models of activation in biochemical reactions. The derivation is based on short-time asymptotics and the Green's function representation of the Helmholtz equation. Note that other distributions could lead to other decays with respect to N.

Optimal Paths

The optimal path for large N

The optimal paths for the fastest can be found using the Wentzell-Freidlin functional in large-deviation theory. These paths correspond to the short-time asymptotics of the diffusion equation from a source to a target. In general, the exact solution is hard to find, especially for a space containing various distributions of obstacles. The Wiener integral representation of the pdf for a pure Brownian motion is obtained for zero drift and a constant diffusion tensor \sigma=D, so that it is given by the probability of a sampled path until it exits at the small window \partial\Omega_a at the random time T:

\Pr\{x(t_1),x(t_2),\ldots,x(t_M)=x,\ T=t \mid x(0)=y\}
=\left[\int\limits_{\Omega} \cdots \int\limits_{\Omega}\prod_{j=1}^{M} \frac{dy_j}{(4\pi D\Delta t)^{d/2}}
\exp \left\{-\frac{|y_j-y_{j-1}|^2}{4D\Delta t}\right\}\right],

where d is the dimension, \Delta t=t/M, t_j=j\Delta t, y_0=x(t_0)=y and y_j=x(t_j) in the product, and T is the exit time in the narrow absorbing window \partial\Omega_a.

Finally,

\langle\tau^{(n)}\rangle=\int\limits_0^{\infty}\exp \left\{n\log\int\limits_{\Omega}p(x,t\mid y)\,dx\right\} dt =\int_0^{\infty} \tau_{\sigma} \Pr\{\text{Path } \sigma\in S_n(y),\ \tau_\sigma=t\}\, dt,

where S_n(y) is the ensemble of shortest paths selected among n Brownian trajectories, starting at point y and exiting between time t and t+dt from the domain \Omega. The probability \Pr\{\text{Path } \sigma\in S_n(y),\ \tau_\sigma=t\} is used to show that the empirical stochastic trajectories of S_n concentrate near the shortest paths starting from y and ending at the small absorbing window \partial \Omega_a, under the condition that \epsilon=\frac{|\partial\Omega_a|}{|\partial\Omega|} \ll 1. The paths of S_n(y) can be approximated using discrete broken lines among a finite number of points and we denote the associated ensemble by \tilde S_n(y). Bayes' rule leads to

\Pr\{\text{Path } \sigma\in\tilde S_n(y) \mid t<\tau_\sigma<t+dt\}=\sum_{m=0}^{\infty} \Pr\{\text{Path } \sigma\in\tilde S_n(y) \mid m,\ \tau_\sigma=t\}\Pr\{m \text{ steps}\},

where \Pr\{m \text{ steps}\}=\Pr\{\text{paths } \sigma \text{ exit in } m \text{ steps},\ \tau_\sigma=t\} is the probability that a path of \tilde S_n(y) exits in m discrete time steps. A path made of broken lines (random walk with a time step \Delta t) can be expressed using the Wiener path integral. The probability of a Brownian path x(s) can be expressed in the limit of a path integral with the functional:

\Pr\{\text{Path } \sigma \mid t<\tau_\sigma<t+dt\} \approx \exp \left(-\int_0^t |\dot x|^2ds \right).

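The survival probability conditioned on starting at y is given by the Wiener representation:

S(t, y)= \int_{x\in\Omega} dx \int_{x(0)=y}^{x(t)=x} \mathcal{D}(x)\exp \left(-\int_0^t |\dot x|^2ds \right),

where \mathcal{D}(x) is the limit Wiener measure: the exterior integral is taken over all end points x and the path integral is over all paths starting from x(0)=y. When we consider n independent paths (\sigma_1,\ldots,\sigma_n) (made of points with a time step \Delta t) that exit in m steps, the probability of such an event is

\Pr\{\text{Paths } \sigma_1,\ldots,\sigma_n\in S_n(y) \mid m,\ \tau_\sigma=m\Delta t\}=
\left(\int\limits_{y_0=y} \cdots \int\limits_{y_j\in\Omega}
\int\limits_{y_m\in\partial\Omega_a} \frac{1}{(4\pi D\Delta t)^{dm/2}}\prod_{j=1}^{m} \exp
\Bigg\{-\frac{1}{4D\Delta t}|y_j-y_{j-1}|^2\Bigg\}
\right)^n
\approx \left(\frac{1}{(4\pi D\Delta t)^{dm/2}}\right)^n\int_{x\in\Omega}
\mathcal{D}(x)\exp \Bigg\{-n\int_0^{m\Delta t}\dot{x}^2ds\Bigg\}.

Indeed, when there are n paths of m steps and the fastest one escapes in m steps, they should all exit in m steps. Using the limit of the path integral, we get heuristically the representation

\Pr\{\text{Path } \sigma\in S_n(y) \mid m,\ \tau_\sigma=m\Delta t\}= \left(\int\limits_{y_0=y} \cdots \int\limits_{y_j\in\Omega}\int\limits_{y_m\in\partial\Omega_a} \frac{1}{(4\pi D\Delta t)^{dm/2}}\prod_{j=1}^{m} \exp \left( -\frac{1}{4D\Delta t} \left[|y_j-y_{j-1}|^2 \right]\right)\right)^n
\approx \int_{x\in\Omega} dx \int_{x(0)=y}^{x(t)=x} \mathcal{D}(x)\exp \left(-n \int\limits_0^{m\Delta t} \dot{x}^2ds \right),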

where the integral is taken over all paths starting at y(0) and exiting at time m\Delta t. This formula suggests that when n is large, the paths that contribute the most are the ones that minimize the exponent. This selects the paths for which the energy functional is minimal, that is

E=\min_{X\in\mathcal P_T}\int\limits_0^T \dot{X}^2ds,

where the minimum is taken over the ensemble \mathcal P_T of regular paths inside \Omega starting at y and exiting in \partial \Omega_a, defined as

\mathcal P_T=\{P(0)=y,\ P(T)\in\partial\Omega_a \hbox{ and } P(s)\in\Omega \hbox{ for } 0\leq s\leq T\}.

This formal argument shows that the random paths associated with the fastest exit time are concentrated near the shortest paths. Indeed, the solutions of the Euler-Lagrange equations for the extremal problem are the classical geodesics between y and a point in the narrow window \partial \Omega_a.
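The selection of the minimal-energy path can be illustrated with broken lines. The sketch below computes the discretized energy, the sum of squared segment lengths divided by the time step, for a straight path and for a detour with the same end points and the same duration T. The sine-shaped detour and all names are hypothetical choices for this example.

```python
import math


def path_energy(points, T):
    """Discretized energy of a broken line traversed with equal time steps over [0, T]."""
    m = len(points) - 1
    dt = T / m
    return sum(math.dist(p, q) ** 2 for p, q in zip(points, points[1:])) / dt


def straight_path(start, end, m):
    """m + 1 equally spaced points on the segment from start to end."""
    return [tuple(s + (e - s) * j / m for s, e in zip(start, end))
            for j in range(m + 1)]


def bumped_path(start, end, m, amplitude):
    """Hypothetical detour: the straight 2D path plus a sine bump in the second coordinate."""
    pts = straight_path(start, end, m)
    return [(x, y + amplitude * math.sin(math.pi * j / m))
            for j, (x, y) in enumerate(pts)]


if __name__ == "__main__":
    start, end, T, m = (0.0, 0.0), (1.0, 0.0), 1.0, 100
    e_straight = path_energy(straight_path(start, end, m), T)
    e_bumped = path_energy(bumped_path(start, end, m, 0.2), T)
    print("straight:", e_straight, "detour:", e_bumped)
    for n in (1, 10, 100):
        # Relative weight exp(-n (E_detour - E_straight)) of the detour.
        print(n, math.exp(-n * (e_bumped - e_straight)))
```

The straight path has energy equal to the squared distance divided by T, and the relative weight of any detour decays exponentially with n, which is the concentration effect described above.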

Fastest escape from a cusp in two dimensions

The formula for the fastest escape generalizes to the case where the absorbing window is located at the end of a funnel-shaped cusp and the particles are initially distributed outside the cusp. The cusp has an opening of size \epsilon and a curvature R. The diffusion coefficient is D. The shortest arrival time, valid for large n, is given by

\tau^ \approx  \frac.

Here

\tilde \epsilon=\frac

and c is a constant that depends on the diameter of the domain. The time taken by the first arrivers is inversely proportional to the size \epsilon of the narrow target. This formula is derived for a fixed geometry and large n; it does not apply in the limit where n is large and \epsilon is small at the same time.

Concluding remarks

How nature sets the disproportionate numbers of particles remains unclear, but these numbers can be estimated using the theory of diffusion. One example is the number of neurotransmitters released during synaptic transmission, around 2000 to 3000, which is set to compensate for the low copy number of receptors, so that the probability of activation is restored to one. In natural processes these large numbers should not be considered wasteful: they are necessary for generating the fastest possible response and for making possible rare events that would otherwise never happen. This property is universal, ranging from the molecular scale to the population level. Nature's strategy for optimizing the response time is not necessarily defined by the physics of the motion of an individual particle, but rather by the extreme statistics that select the shortest paths. In addition, the search for a small activation site selects the particle that arrives first: although these trajectories are rare, they are the ones that set the time scale. We may need to reconsider our estimates of numbers when observing nature, in agreement with the redundancy principle, which quantifies the number of copies required to achieve a biological function.
