Structural bioinformatics

Structural bioinformatics is the branch of bioinformatics that is related to the analysis and prediction of the three-dimensional structure of biological macromolecules such as proteins, RNA, and DNA. It deals with generalizations about macromolecular 3D structures such as comparisons of overall folds and local motifs, principles of molecular folding, evolution, binding interactions, and structure/function relationships, working both from experimentally solved structures and from computational models. The term ''structural'' has the same meaning as in structural biology, and structural bioinformatics can be seen as a part of computational structural biology. The main objective of structural bioinformatics is the creation of new methods of analysing and manipulating biological macromolecular data in order to solve problems in biology and generate new knowledge.


Introduction


Protein structure

The structure of a protein is directly related to its function. The presence of certain chemical groups in specific locations allows proteins to act as enzymes, catalyzing chemical reactions. In general, protein structures are classified into four levels: primary (sequence), secondary (local conformation of the polypeptide chain), tertiary (three-dimensional structure of the protein fold), and quaternary (association of multiple polypeptide structures). Structural bioinformatics mainly addresses interactions among structures, taking their spatial coordinates into consideration. Thus, the primary structure is better analyzed in traditional branches of bioinformatics. However, the sequence imposes restrictions that allow the formation of conserved local conformations of the polypeptide chain, such as alpha-helices, beta-sheets, and loops (secondary structure). In addition, weak interactions (such as hydrogen bonds) stabilize the protein fold. Interactions can be intrachain, i.e., occurring between parts of the same protein monomer (tertiary structure), or interchain, i.e., occurring between different structures (quaternary structure).


Structure visualization

Protein structure visualization is an important issue for structural bioinformatics. It allows users to observe static or dynamic representations of molecules and to detect interactions that may be used to make inferences about molecular mechanisms. The most common types of visualization are:
* Cartoon: this type of visualization highlights differences in secondary structure. In general, α-helices are represented as spiral ribbons, β-strands as arrows, and loops as lines.
* Lines: each amino acid residue is represented by thin lines, which keeps the cost of graphic rendering low.
* Surface: in this visualization, the external shape of the molecule is shown.
* Sticks: each covalent bond between amino acid atoms is represented as a stick. This type of visualization is most often used to visualize interactions between amino acids...


DNA structure

The classic structure of the DNA duplex was first described by Watson and Crick (with contributions from Rosalind Franklin). The DNA molecule is composed of three components: a phosphate group, a pentose, and a nitrogenous base (adenine, thymine, cytosine, or guanine). The DNA double helix is stabilized by hydrogen bonds formed between base pairs: adenine with thymine (A-T) and cytosine with guanine (C-G). Many structural bioinformatics studies have focused on understanding interactions between DNA and small molecules, which have been the focus of several drug design studies.


Interactions

Interactions are contacts established between parts of molecules at different levels. They are responsible for stabilizing protein structures and for carrying out a wide range of activities. In biochemistry, interactions are characterized by the proximity of atom groups or molecular regions that exert an effect upon one another, such as electrostatic forces, hydrogen bonding, and the hydrophobic effect. Proteins can take part in several types of interactions, such as protein-protein interactions (PPI), protein-peptide interactions, protein-ligand interactions (PLI), and protein-DNA interactions.


Calculating contacts

Calculating contacts is an important task in structural bioinformatics. It matters for the correct prediction of protein structure and folding, thermodynamic stability, protein-protein and protein-ligand interactions, and docking and molecular dynamics analyses, among others. Traditionally, computational methods have used a threshold distance between atoms (also called a cutoff) to detect possible interactions. This detection is based on the Euclidean distance and the angles between atoms of given types. However, most methods based on simple Euclidean distance cannot detect occluded contacts. Hence, cutoff-free methods, such as Delaunay triangulation, have gained prominence in recent years. In addition, combinations of criteria, for example physicochemical properties, distance, geometry, and angles, have been used to improve contact determination.
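The cutoff approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the atom labels and coordinates are made up, the 6.0 Å default cutoff is an arbitrary choice rather than a recommended value, and the function name `find_contacts` is hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy atoms: (label, x, y, z), coordinates in angstroms.
ATOMS = [
    ("A:GLY1:CA", 0.0, 0.0, 0.0),
    ("A:ALA2:CA", 3.8, 0.0, 0.0),
    ("A:SER3:CA", 7.6, 0.0, 0.0),
    ("B:LEU1:CA", 3.8, 4.0, 0.0),
]


def find_contacts(atoms, cutoff=6.0):
    """Return (label_a, label_b, distance) for every atom pair within cutoff.

    The cutoff value is an assumption chosen for illustration only.
    """
    contacts = []
    for (name_a, *xyz_a), (name_b, *xyz_b) in combinations(atoms, 2):
        distance = math.dist(xyz_a, xyz_b)  # Euclidean distance
        if distance <= cutoff:
            contacts.append((name_a, name_b, round(distance, 3)))
    return contacts


if __name__ == "__main__":
    for contact in find_contacts(ATOMS):
        print(contact)
```

Because the test is purely distance-based, a pair is reported even when a third atom lies between the two, which is the occlusion problem mentioned above. The all-pairs loop is also quadratic in the number of atoms, so real tools typically use spatial indexing instead.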


Protein Data Bank (PDB)

The Protein Data Bank (PDB) is a database of 3D structure data for large biological molecules, such as proteins, DNA, and RNA. The PDB is managed by an international organization called the Worldwide Protein Data Bank (wwPDB), which is composed of several member organizations, such as PDBe, PDBj, RCSB, and BMRB. They are responsible for keeping copies of PDB data available on the internet at no charge. The number of structures available in the PDB has increased each year; they are typically obtained by X-ray crystallography, NMR spectroscopy, or cryo-electron microscopy.


Data format

The PDB format (.pdb) is the legacy text file format used by the Protein Data Bank to store three-dimensional structures of macromolecules. Because of limitations in its design, the PDB format cannot represent large structures containing more than 62 chains or 99999 atom records. PDBx/mmCIF (macromolecular Crystallographic Information File) is a standard text file format for representing crystallographic information. Since 2014, the PDBx/mmCIF file format (.cif) has replaced the PDB format as the standard for PDB archive distribution. While the PDB format consists of a set of records, each identified by a keyword of up to six characters, the PDBx/mmCIF format uses a key-value structure, where the key is a name that identifies some feature and the value is the variable information.
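To make the contrast between the two layouts concrete, the following Python sketch writes and reads one fixed-column ATOM record and reads a few simple key-value lines. It is a simplified illustration, not a full parser: the function names are hypothetical, the atom values and the entry ID `1ABC` are invented, and real PDBx/mmCIF files also contain loops, quoting, and multi-line values that this sketch ignores.

```python
def format_atom_line(serial, name, res_name, chain, res_seq,
                     x, y, z, occupancy, b_factor, element):
    """Build one fixed-column PDB ATOM record (78 characters)."""
    return ("{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   "
            "{:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}          {:>2}").format(
        "ATOM", serial, name, "", res_name, chain, res_seq, "",
        x, y, z, occupancy, b_factor, element)


def parse_atom_line(line):
    """Read fields from an ATOM record by column position (0-based slices)."""
    return {
        "record": line[0:6].strip(),   # keyword of up to six characters
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "element": line[76:78].strip(),
    }


def parse_key_values(text):
    """Read simple 'key value' lines in the style of PDBx/mmCIF items.

    Simplification: loops, quoted strings, and multi-line values are ignored.
    """
    pairs = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].startswith("_"):
            pairs[parts[0]] = parts[1].strip()
    return pairs


if __name__ == "__main__":
    # Hypothetical atom values, for illustration only.
    atom_line = format_atom_line(1, " N", "MET", "A", 1,
                                 27.340, 24.430, 2.614, 1.00, 9.67, "N")
    print(atom_line)
    print(parse_atom_line(atom_line))
    print(parse_key_values("_entry.id 1ABC\n_cell.length_a 50.000"))
```

The five-column serial field and the single-character chain field are what make very large structures awkward in the legacy format, whereas key-value items have no fixed width.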


Other structural databases

In addition to the Protein Data Bank (PDB), there are several databases of protein structures and other macromolecules. Examples include:
* MMDB: Experimentally determined three-dimensional structures of biomolecules derived from the Protein Data Bank (PDB).
* Nucleic acid Data Base (NDB): Experimentally determined information about nucleic acids (DNA, RNA).
* Structural Classification of Proteins (SCOP): Comprehensive description of the structural and evolutionary relationships between structurally known proteins.
* TOPOFIT-DB: Protein structural alignments based on the TOPOFIT method.
* Electron Density Server (EDS): Electron-density maps and statistics about the fit of crystal structures and their maps.
* CASP Prediction Center: Community-wide, worldwide experiment for protein structure prediction.
* PISCES server for creating non-redundant lists of proteins: Generates PDB lists by sequence identity and structural quality criteria.
* The Structural Biology Knowledgebase: Tools to aid in protein research design.
* ProtCID (the Protein Common Interface Database): Database of similar protein-protein interfaces in crystal structures of homologous proteins.
* AlphaFold: AlphaFold Protein Structure Database.


Structure comparison


Structural alignment

Structural alignment is a method for comparing 3D structures based on their shape and conformation. It can be used to infer the evolutionary relationship among a set of proteins even when sequence similarity is low. Structural alignment implies superimposing one 3D structure onto a second one by rotating and translating atoms in corresponding positions (in general, using the ''Cα'' atoms or the backbone heavy atoms ''C'', ''N'', ''O'', and ''Cα''). Usually, the alignment quality is evaluated based on the root-mean-square deviation (RMSD) of atomic positions, ''i.e.'', a measure of the average distance between corresponding atoms after superimposition:
: \mathrm{RMSD}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\delta_{i}^{2}}
where ''δ_i'' is the distance between atom ''i'' and either the corresponding reference atom in the other structure or the mean coordinate of the ''N'' equivalent atoms. In general, the RMSD is measured in ångströms (Å), where 1 Å is equivalent to 10⁻¹⁰ m. The nearer the RMSD value is to zero, the more similar the structures are.
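As a small worked example of the RMSD definition, the Python sketch below computes the value for two lists of paired coordinates. It assumes the structures have already been superimposed and that atom ''i'' in one list corresponds to atom ''i'' in the other; finding the optimal rotation and translation is a separate step not shown here. The function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D coordinates.

    Assumes both structures are already superimposed and listed in the
    same atom order; no rotation or translation is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical C-alpha coordinates in angstroms.
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 2.0), (7.6, 0.0, 0.0)]
    print(rmsd(reference, model))
```

In this toy case only one of the three atoms is displaced, by 2 Å, so the sum of squared distances is 4 and the RMSD is the square root of 4/3.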


Graph-based structural signatures

Structural signatures, also called fingerprints, are macromolecule pattern representations that can be used to infer similarities and differences. Comparisons among a large set of proteins using RMSD remain a challenge because of the high computational cost of structural alignments. Structural signatures based on graph distance patterns among atom pairs have been used to determine protein-identifying vectors and to detect non-trivial information. Furthermore, linear algebra and machine learning can be used for clustering protein signatures, detecting protein-ligand interactions, predicting ΔΔG, and proposing mutations based on Euclidean distance.
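One way to picture a distance-based signature is a vector that counts atom pairs within a series of increasing cutoffs. The Python sketch below is a deliberately simplified stand-in and not any specific published method: the cutoff values, the function names, and the toy coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations


def distance_signature(coords, cutoffs=(4.0, 6.0, 8.0, 10.0)):
    """Count atom pairs whose distance is within each cutoff (cumulative).

    The cutoffs are arbitrary illustrative values in angstroms.
    """
    distances = [math.dist(a, b) for a, b in combinations(coords, 2)]
    return [sum(1 for d in distances if d <= cutoff) for cutoff in cutoffs]


def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signature vectors of equal length."""
    return math.dist(sig_a, sig_b)


if __name__ == "__main__":
    # Hypothetical toy structures: an extended chain and a compact square.
    extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    compact = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
    sig_extended = distance_signature(extended)
    sig_compact = distance_signature(compact)
    print(sig_extended, sig_compact)
    print(signature_distance(sig_extended, sig_compact))
```

Because each structure is reduced to a fixed-length vector, two structures can be compared without superimposing them, which is the cost advantage over alignment-based RMSD noted above.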


Structure prediction

The atomic structures of molecules can be obtained by several methods, such as X-ray crystallography (XRC), NMR spectroscopy, and 3D electron microscopy. However, these processes can be costly, and some structures, such as those of membrane proteins, are difficult to determine. Hence, computational approaches are needed for determining the 3D structures of macromolecules. Structure prediction methods are classified into comparative modeling and ''de novo'' modeling.


Comparative modeling

Comparative modeling, also known as homology modeling, is the methodology for constructing a three-dimensional structure from the amino acid sequence of a target protein and a template with known structure. The literature has described that evolutionarily related proteins tend to present a conserved three-dimensional structure. In addition, sequences of distantly related proteins with identity lower than 20% can present different folds.
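Since template choice depends on sequence identity, a small sketch of how identity might be computed from a pairwise alignment can be useful. The Python example below makes several assumptions: the sequences are already aligned and of equal length, gaps are written as `-`, and identity is taken over positions where neither sequence has a gap (other denominators are also in use). The function name and the sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned positions where neither sequence has a gap.

    Note: this denominator is one convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Hypothetical aligned target and template fragments.
    target = "MKT-AYIAK"
    template = "MRTLAY-AR"
    identity = percent_identity(target, template)
    print(f"{identity:.1f}% identity")
    print("below 20% threshold" if identity < 20.0 else "at or above 20% threshold")
```

Here five of the seven gap-free positions match, so the toy pair sits well above the 20% level at which, according to the text, folds may start to differ.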


''De novo'' modeling

In structural bioinformatics, ''de novo'' modeling, also known as ''ab initio'' modeling, refers to approaches for obtaining three-dimensional structures from sequences without the need for a homologous known 3D structure. Despite the new algorithms and methods proposed in recent years, ''de novo'' protein structure prediction is still considered one of the remaining outstanding problems in modern science.


Structure validation

After structure modeling, an additional step of structure validation is necessary, since many comparative and ''de novo'' modeling algorithms and tools use heuristics to try to assemble the 3D structure, which can generate many errors. Some validation strategies consist of calculating energy scores and comparing them with those of experimentally determined structures. For example, the DOPE score is an energy score used by the MODELLER tool for determining the best model. Another validation strategy is calculating the φ and ψ backbone dihedral angles of all residues and constructing a Ramachandran plot. The side chains of amino acids and the nature of interactions in the backbone restrict these two angles, and thus the allowed conformations can be visualized on the Ramachandran plot. A large number of amino acids located in non-permissive regions of the plot is an indication of a low-quality model.
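The φ and ψ angles used in a Ramachandran plot are dihedral angles defined by four consecutive backbone atoms: φ by C(i−1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The Python sketch below shows one standard way to compute a signed dihedral angle from four points. The helper names are hypothetical, and the coordinates in the usage example are artificial test geometry, not a real residue.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone angles of one residue from five backbone atom positions."""
    phi = dihedral(prev_c, n, ca, c)
    psi = dihedral(n, ca, c, next_n)
    return phi, psi


if __name__ == "__main__":
    # Artificial coordinates chosen so the angles are easy to check by hand.
    print(phi_psi((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                  (0.0, 1.0, 1.0), (0.0, 1.0, 2.0)))
```

Collecting the (φ, ψ) pair of every residue and plotting them against each other gives the scatter from which outliers in non-permissive regions can be counted.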


Prediction tools

A list of commonly used software tools for protein structure prediction, including comparative modeling, protein threading, ''de novo'' protein structure prediction, and secondary structure prediction, is available in the list of protein structure prediction software.


Molecular docking

Molecular docking (also referred to simply as docking) is a method used to predict the orientation and coordinates of a molecule (ligand) when bound to another one (receptor or target). The binding is mostly through non-covalent interactions, although covalent binding can also be studied. Molecular docking aims to predict possible poses (binding modes) of the ligand when it interacts with specific regions on the receptor. Docking tools use force fields to estimate a score for ranking the poses that favor better interactions between the two molecules. In general, docking protocols are used to predict the interactions between small molecules and proteins. However, docking can also be used to detect associations and binding modes among proteins, peptides, DNA or RNA molecules, carbohydrates, and other macromolecules.


Virtual screening

Virtual screening (VS) is a computational approach used for fast screening of large compound libraries for drug discovery. Usually, virtual screening uses docking algorithms to rank small molecules by their predicted affinity for a target receptor. In recent times, several tools have been used to evaluate the use of virtual screening in the process of discovering new drugs. However, problems such as missing information, inaccurate understanding of drug-like molecular properties, weak scoring functions, or insufficient docking strategies hinder the docking process. Hence, the literature has described virtual screening as not yet a mature technology.


Molecular dynamics

Molecular dynamics (MD) is a computational method for simulating interactions between molecules and their atoms over a given period of time. This method allows the observation of the behavior of molecules and their interactions, considering the system as a whole. To calculate the behavior of the system and thus determine the trajectories, an MD simulation can use Newton's equations of motion, together with molecular mechanics methods (force fields) to estimate the forces that occur between particles.
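The idea of integrating Newton's equations of motion with forces from a potential can be shown on a deliberately tiny system. The Python sketch below propagates a single particle on a one-dimensional harmonic spring using the velocity Verlet scheme, a commonly used integrator. Everything here is an assumption for illustration: the harmonic term stands in for a full force field, the units are arbitrary, and the function names are hypothetical.

```python
def harmonic_force(x, k=1.0, x0=0.0):
    """Force from a harmonic potential 0.5 * k * (x - x0)**2."""
    return -k * (x - x0)


def velocity_verlet(x, v, mass, force_fn, dt, n_steps):
    """Integrate Newton's equation of motion; return the list of (x, v) states."""
    trajectory = [(x, v)]
    force = force_fn(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * force / mass  # half-step velocity update
        x = x + dt * v_half                   # full-step position update
        force = force_fn(x)                   # force at the new position
        v = v_half + 0.5 * dt * force / mass  # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass, k=1.0, x0=0.0):
    """Kinetic plus potential energy for the harmonic toy system."""
    return 0.5 * mass * v * v + 0.5 * k * (x - x0) ** 2


if __name__ == "__main__":
    # Toy parameters in arbitrary units: start displaced and at rest.
    states = velocity_verlet(x=1.0, v=0.0, mass=1.0,
                             force_fn=harmonic_force, dt=0.01, n_steps=1000)
    x_end, v_end = states[-1]
    print(x_end, v_end, total_energy(x_end, v_end, 1.0))
```

Checking that the total energy stays close to its initial value is a simple sanity test for the time step; in a real simulation the force function would sum bonded and non-bonded terms over all atoms.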


Applications

Informatics approaches used in structural bioinformatics are:
* Selection of target: Potential targets are identified by comparing them with databases of known structures and sequences. The importance of a target can be decided on the basis of published literature. A target can also be selected on the basis of its protein domains. Protein domains are building blocks that can be rearranged to form new proteins. They can be studied in isolation initially.
* Tracking X-ray crystallography trials: X-ray crystallography can be used to reveal the three-dimensional structure of a protein. However, in order to study proteins with X-rays, pure protein crystals must be formed, which can take many trials. This creates a need to track the conditions and results of trials. Furthermore, supervised machine learning algorithms can be applied to the stored data to identify conditions that might increase the yield of pure crystals.
* Analysis of X-ray crystallographic data: The diffraction pattern obtained by bombarding electrons with X-rays is the Fourier transform of the electron density distribution. Algorithms are needed that can deconvolve the Fourier transform with partial information (the phase information is missing because detectors can measure only the amplitude of diffracted X-rays, not the phase shifts). Extrapolation techniques such as multiwavelength anomalous dispersion can be used to generate an electron density map, using the location of selenium atoms as a reference to determine the rest of the structure. A standard ball-and-stick model is then generated from the electron density map.
* Analysis of NMR spectroscopy data: Nuclear magnetic resonance spectroscopy experiments produce two-dimensional (or higher-dimensional) data, with each peak corresponding to a chemical group within the sample. Optimization methods are used to convert spectra into three-dimensional structures.
* Correlating structural information with functional information: Structural studies can be used as a probe for structure-function relationships.

