Z-curve

The Z curve (or Z-curve) method is a bioinformatics algorithm for genome analysis. The Z-curve is a three-dimensional curve that constitutes a unique representation of a DNA sequence; that is, the Z-curve and the DNA sequence can each be uniquely reconstructed from the other. The resulting curve has a zigzag shape, hence the name Z-curve.


Background

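The Z curve method was first introduced in 1994 as a way to visually map a DNA or RNA sequence. Properties of the Z curve, such as its symmetry and periodicity, can give unique information about the DNA sequence. The Z curve is generated from a series of nodes, $P_0, P_1, \ldots, P_N$, with coordinates $x_n$, $y_n$ and $z_n$ ($n = 0, 1, 2, \ldots, N$, where $N$ is the length of the DNA sequence), and is created by connecting the nodes sequentially. The coordinates are defined as

$$
\begin{aligned}
x_n &= (A_n + G_n) - (C_n + T_n) \\
y_n &= (A_n + C_n) - (G_n + T_n) \\
z_n &= (A_n + T_n) - (C_n + G_n)
\end{aligned}
\qquad n = 0, 1, 2, \ldots, N
$$

where $A_n$, $C_n$, $G_n$ and $T_n$ are the numbers of occurrences of each base among the first $n$ bases of the sequence, with $A_0 = C_0 = G_0 = T_0 = 0$, so that the curve starts at the origin.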
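The node definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `z_curve` and the `STEP` table are hypothetical names introduced here, and the sketch assumes the input contains only the letters A, C, G and T.

```python
# Change in (x, y, z) contributed by one base, read off the coordinate
# definitions: x counts A and G positively, y counts A and C positively,
# z counts A and T positively.
STEP = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq):
    """Return the nodes P_0..P_N of the Z curve as a list of (x, y, z) tuples."""
    nodes = [(0, 0, 0)]  # P_0 is the origin because all counts start at zero
    for base in seq.upper():
        dx, dy, dz = STEP[base]  # raises KeyError on any letter other than A, C, G, T
        x, y, z = nodes[-1]
        nodes.append((x + dx, y + dy, z + dz))
    return nodes


if __name__ == "__main__":
    for node in z_curve("ACGT"):
        print(node)
```

Accumulating per-base steps is equivalent to recomputing the cumulative counts at every position, but it needs only one pass over the sequence.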


Applications

Information on the distribution of nucleotides in a DNA sequence can be determined from the Z curve. The four nucleotides are grouped in pairs into six categories according to a shared chemical property, and each category is designated by a letter: purines (R: A, G) and pyrimidines (Y: C, T), amino (M: A, C) and keto (K: G, T) bases, and strong hydrogen bond (S: C, G) and weak hydrogen bond (W: A, T) bases. The x, y, and z components of the Z curve display the distribution of these categories along the DNA sequence being studied. The x-component represents the distribution of purine and pyrimidine bases (R/Y), the y-component the distribution of amino and keto bases (M/K), and the z-component the distribution of strong hydrogen bond and weak hydrogen bond bases (S/W).

The Z-curve method has been used in many different areas of genome research, such as replication origin identification, ab initio gene prediction, isochore identification, genomic island identification and comparative genomics. Analysis of the Z curve has also been shown to be able to predict whether a gene contains introns.
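Because every base moves each of the three components by exactly +1 or -1, the sign pattern of the step between two consecutive nodes identifies the base: the x sign separates purines from pyrimidines, the y sign amino from keto bases, and the z sign weak from strong hydrogen bond bases. The following sketch uses this to recover a sequence from its nodes, which is one way to see why the representation is unique. The names `BASE_FROM_STEP` and `sequence_from_nodes` are hypothetical and introduced here only for illustration.

```python
# Map each step (dx, dy, dz) between consecutive nodes back to its base.
# dx > 0: purine (R),     dx < 0: pyrimidine (Y)
# dy > 0: amino (M),      dy < 0: keto (K)
# dz > 0: weak H-bond (W), dz < 0: strong H-bond (S)
BASE_FROM_STEP = {
    (1, 1, 1): "A",     # R, M, W
    (-1, 1, -1): "C",   # Y, M, S
    (1, -1, -1): "G",   # R, K, S
    (-1, -1, 1): "T",   # Y, K, W
}


def sequence_from_nodes(nodes):
    """Reconstruct the DNA sequence from a list of Z-curve nodes (x, y, z)."""
    bases = []
    for (x0, y0, z0), (x1, y1, z1) in zip(nodes, nodes[1:]):
        step = (x1 - x0, y1 - y0, z1 - z0)
        bases.append(BASE_FROM_STEP[step])  # KeyError if the step is not a valid base
    return "".join(bases)


if __name__ == "__main__":
    nodes = [(0, 0, 0), (1, 1, 1), (0, 2, 0), (1, 1, -1), (0, 0, 0)]
    print(sequence_from_nodes(nodes))
```

Only four of the eight possible sign patterns occur, so a step that is missing from the table indicates nodes that do not come from a valid Z curve.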


Research

Experiments have shown that the Z curve can be used to identify the replication origin in various organisms. One study analyzed the Z curve for multiple species of Archaea and found that the oriC is located at a sharp peak on the curve followed by a broad base. This region was rich in AT bases and contained multiple repeats, as is expected for replication origin sites. This and other similar studies were used to develop a program that predicts origins of replication from the Z curve.

The Z curve has also been used experimentally to determine phylogenetic relationships. In one study, a novel coronavirus in China was analyzed using both sequence analysis and the Z curve method to determine its phylogenetic relationship to other coronaviruses. The study found that similarities and differences between related species can quickly be determined by visually examining their Z curves. An algorithm was created to identify the geometric center and other trends in the Z curves of 24 species of coronaviruses, and the data were used to create a phylogenetic tree. The results matched the tree that was generated using sequence analysis. The Z curve method proved superior because, while sequence analysis creates a phylogenetic tree based solely on coding sequences in the genome, the Z curve method analyzes the entire genome.
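The text does not describe the study's algorithm, so the following is only a rough sketch of one plausible reading of "geometric center": the arithmetic mean of the node coordinates. The function names `z_curve_nodes`, `geometric_center` and `center_distance` are hypothetical, and comparing centers by Euclidean distance is an assumption made here for illustration, not the published method.

```python
from math import dist

# Per-base step in (x, y, z), following the Z-curve coordinate definitions.
STEP = {"A": (1, 1, 1), "C": (-1, 1, -1), "G": (1, -1, -1), "T": (-1, -1, 1)}


def z_curve_nodes(seq):
    """Return the Z-curve nodes P_0..P_N for a sequence of A, C, G, T."""
    nodes = [(0, 0, 0)]
    for base in seq.upper():
        nodes.append(tuple(c + d for c, d in zip(nodes[-1], STEP[base])))
    return nodes


def geometric_center(nodes):
    """Assumed definition: the mean of all node coordinates."""
    n = len(nodes)
    return tuple(sum(p[i] for p in nodes) / n for i in range(3))


def center_distance(seq_a, seq_b):
    """Euclidean distance between the geometric centers of two Z curves."""
    center_a = geometric_center(z_curve_nodes(seq_a))
    center_b = geometric_center(z_curve_nodes(seq_b))
    return dist(center_a, center_b)


if __name__ == "__main__":
    print(geometric_center(z_curve_nodes("ACGT")))
    print(center_distance("AAAA", "TTTT"))
```

Raw centers grow with sequence length, so a real comparison between genomes of different sizes would presumably need some normalization; that choice is left open here.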


External links


* The Z curve database
* Ori-Finder. Centre of Bioinformatics, Tianjin University (TUBIC). http://tubic.tju.edu.cn/Ori-Finder/ A free, web-based program for predicting origins of replication using Z-curves.
* ENCODE threads explorer: Three-dimensional connections across the genome. Nature.
* ZCurve
* Introduction to Z curves. http://tubic.tju.edu.cn/zcurve/introduce.php
* Identify Gene Start Sites Using Z curves. http://tubic.tju.edu.cn/GS-Finder/