Non-coding RNA
A non-coding RNA (ncRNA) is a functional RNA molecule that is not translated into a protein. The DNA sequence from which a functional non-coding RNA is transcribed is often called an RNA gene. Abundant and functionally important types of non- ...
s have been discovered using both experimental and
bioinformatic
Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combin ...
approaches. Bioinformatic approaches can be divided into three main categories. The first involves
homology search
Homology may refer to:
Sciences
Biology
*Homology (biology), any characteristic of biological organisms that is derived from a common ancestor
*Sequence homology, biological homology between DNA, RNA, or protein sequences
*Homologous chromo ...
, although these techniques are by definition unable to find new classes of ncRNAs. The second category includes
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
s designed to discover specific types of ncRNAs that have similar properties. Finally, some discovery methods are based on very general properties of
RNA
Ribonucleic acid (RNA) is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes. RNA and deoxyribonucleic acid ( DNA) are nucleic acids. Along with lipids, proteins, and carbohydra ...
, and are thus able to discover entirely new kinds of ncRNAs.
Discovery by homology search
Homology search refers to the process of searching a
sequence database for RNAs that are similar to already known RNA sequences. Any algorithm that is designed for homology search of nucleic acid sequences can be used, e.g.,
BLAST
Blast or The Blast may refer to:
*Explosion, a rapid increase in volume and release of energy in an extreme manner
*Detonation, an exothermic front accelerating through a medium that eventually drives a shock front
Film
* ''Blast'' (1997 film), ...
. However, such algorithms typically are not as sensitive or accurate as algorithms specifically designed for RNA.
Of particular importance for RNA is its conservation of a
secondary structure
Protein secondary structure is the three dimensional form of ''local segments'' of proteins. The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary struct ...
, which can be modeled to achieve additional accuracy in searches. For example,
Covariance models can be viewed as an extension to a
profile hidden Markov model
Profile or profiles may refer to:
Art, entertainment and media Music
* ''Profile'' (Jan Akkerman album), 1973
* ''Profile'' (Githead album), 2005
* ''Profile'' (Pat Donohue album), 2005
* ''Profile'' (Duke Pearson album), 1959
* '' ''Profi ...
that also reflects conserved secondary structure. Covariance models are implemented in the Infernal software package.
Discovery of specific types of ncRNAs
Some types of RNAs have shared properties that algorithms can exploit. For example, tRNAscan-SE is specialized to finding
tRNA
Transfer RNA (abbreviated tRNA and formerly referred to as sRNA, for soluble RNA) is an adaptor molecule composed of RNA, typically 76 to 90 nucleotides in length (in eukaryotes), that serves as the physical link between the mRNA and the amino a ...
s. The heart of this program is a tRNA homology search based on covariance models, but other tRNA-specific search programs are used to accelerate searches.
The properties of
snoRNA
In molecular biology, Small nucleolar RNAs (snoRNAs) are a class of small RNA molecules that primarily guide chemical modifications of other RNAs, mainly ribosomal RNAs, transfer RNAs and small nuclear RNAs. There are two main classes of snoRNA, ...
s have enabled the development of programs to detect new examples of snoRNAs, including those that might be only distantly related to previously known examples. Computer programs implementing such approaches include snoscan and snoReport.
Similarly, several algorithms have been developed to detect
microRNA
MicroRNA (miRNA) are small, single-stranded, non-coding RNA molecules containing 21 to 23 nucleotides. Found in plants, animals and some viruses, miRNAs are involved in RNA silencing and post-transcriptional regulation of gene expression. m ...
s. Examples include miRNAFold and miRNAminer
Discovery by general properties
Some properties are shared by multiple unrelated classes of ncRNA, and these properties can be targeted to discover new classes. Chief among them is the conservation of an RNA secondary structure. To measure conservation of secondary structure, it is necessary to somehow find homologous sequences that might exhibit a common structure. Strategies to do this have included the use of BLAST between two sequences
or multiple sequences, exploited synteny via orthologous genes
or used
locality sensitive hashing In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets is much smaller than the universe of possible input items.) Since ...
in combination with sequence and structural features.
Mutations that change the
nucleotide
Nucleotides are organic molecules consisting of a nucleoside and a phosphate. They serve as monomeric units of the nucleic acid polymers – deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both of which are essential biomolecul ...
sequence, but preserve secondary structure are called
covariation, and can provide evidence of conservation. Other statistics and probabilistic models can be used to measure such conservation. The first ncRNA discovery method to use structural conservation was QRNA,
which compared the probabilities of an alignment of two sequences based on either an RNA model or a model in which only the primary sequence conserved. Work in this direction has allowed for more than two sequences and included phylogenetic models, e.g., with EvoFold. An approach taken in RNAz involved computing statistics on an input multiple-sequence alignment. Some of these statistics relate to structural conservation, while others measure general properties of the alignment that could affect the expected ranges of the structural statistics. These statistics were combined using a
support vector machine
In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories ...
.
Other properties include the appearance of a
promoter to transcribe the RNA. ncRNAs are also often followed by a
Rho-independent transcription terminator.
Using a combination of these approaches, multiple studies have enumerated candidate RNAs, e.g.,
Some studies have proceeded to manual analysis of the predictions to find a details structural and functional prediction.
See also
*
AbiF RNA motif
*
ARRPOF RNA motif
*
CyVA-1 RNA motif
The CyVA-1 RNA motif is a conserved RNA structure that was discovered by bioinformatics.
CyVA-1 motifs are found in Cyanobacteria, Acidobacteriota, and Verrucomicrobiota. Only one example of the RNA is known in any Acidobacterial organism, a ...
*
List of RNA structure prediction software
References
{{Bioinformatics
Non-coding RNA
Bioinformatics