The Codon Adaptation Index (CAI) is the most widespread technique for analyzing codon usage bias. Unlike other measures of codon usage bias, such as the effective number of codons (Nc), which measure deviation from uniform codon usage (the null hypothesis), CAI measures the deviation of a given protein-coding gene sequence with respect to a reference set of genes. CAI is used as a quantitative method of predicting the expression level of a gene from its codon sequence.
Rationale
Ideally, the reference set in CAI is composed of highly expressed genes, so that CAI provides an indication of gene expression level under the assumption that there is translational selection to optimize gene sequences according to their expression levels. The rationale is twofold: highly expressed genes must compete for resources (i.e. ribosomes) in fast-growing organisms, and it makes sense for them to also be translated more accurately. Both hypotheses predict that highly expressed genes mostly use codons read by tRNA species that are abundant in the cell.
Implementation
For each amino acid in a gene, the weight of each of its codons, represented by a parameter termed relative adaptiveness ($w_i$), is computed from a reference sequence set as the ratio between the observed frequency of the codon, $f_i$, and the frequency of the most frequent synonymous codon, $\max_j(f_j)$, for that amino acid:

$$w_i = \frac{f_i}{\max_j(f_j)}, \qquad i, j \in [\text{synonymous codons for the amino acid}]$$
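The relative adaptiveness computation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the truncated three-amino-acid codon table, and the toy reference sequences are all assumptions made for the example. Because synonymous codons share the same denominator, the ratio of frequencies is computed here directly as a ratio of counts.

```python
from collections import Counter

# Hypothetical, deliberately truncated codon table (amino acid -> synonymous codons).
# A real analysis would use the full genetic code of the organism.
SYNONYMOUS_CODONS = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def split_codons(seq):
    """Split a coding sequence into triplets, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs, table=SYNONYMOUS_CODONS):
    """Return {codon: w} where w = count(codon) / count(most frequent synonym)."""
    counts = Counter(
        codon for seq in reference_seqs for codon in split_codons(seq.upper())
    )
    weights = {}
    for synonyms in table.values():
        top = max(counts[c] for c in synonyms)
        if top == 0:
            # Amino acid never seen in the reference set: weights are undefined.
            continue
        for c in synonyms:
            weights[c] = counts[c] / top
    return weights


# Toy reference set (assumed data, standing in for highly expressed genes).
reference = ["AAGAAGAAATTCTTCTTT", "AAGGGTGGTGGCTTC"]
weights = relative_adaptiveness(reference)

for codon in sorted(weights):
    print(codon, round(weights[codon], 3))
```

In this toy reference set, AAG occurs three times and AAA once, so AAG receives a weight of 1 and AAA a weight of 1/3. Codons that never occur (here GGA and GGG) receive a weight of 0 in this sketch; practical implementations usually treat such codons specially, which is not shown here.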