Proteomics is the large-scale study of proteins.[1][2] Proteins are vital parts of living organisms, with many functions. The proteome is the entire set of proteins that is produced or modified by an organism or system. It varies with time and with the distinct requirements, or stresses, that a cell or organism undergoes.[3] Proteomics has enabled the identification of ever-increasing numbers of proteins. It is an interdisciplinary domain that has benefited greatly from the genetic information of various genome projects, including the Human Genome Project.[4] It covers the exploration of proteomes at the overall level of protein composition, structure, and activity, and is an important component of functional genomics.
Proteomics generally refers to the large-scale experimental analysis of proteins and proteomes, but often is used specifically to refer to protein purification and mass spectrometry.
The first studies of proteins that could be regarded as proteomics began in 1975, after the introduction of the two-dimensional gel and mapping of the proteins from the bacterium Escherichia coli.
The word proteome is a portmanteau of protein and genome, and was coined by Marc Wilkins in 1994 while he was a Ph.D. student at Macquarie University.[5] Macquarie University also founded the first dedicated proteomics laboratory in 1995.[6][7]
After genomics and transcriptomics, proteomics is the next step in the study of biological systems. It is more complicated than genomics because an organism's genome is more or less constant, whereas proteomes differ from cell to cell and from time to time. Distinct genes are expressed in different cell types, which means that even the basic set of proteins that are produced in a cell needs to be identified.
In the past this phenomenon was assessed by RNA analysis, but it was found to lack correlation with protein content.[8][9] Now it is known that mRNA is not always translated into protein,[10] and the amount of protein produced for a given amount of mRNA depends on the gene it is transcribed from and on the current physiological state of the cell. Proteomics confirms the presence of the protein and provides a direct measure of the quantity present.
Not only does translation from mRNA cause differences, but many proteins are also subjected to a wide variety of chemical modifications after translation. The most common and widely studied post-translational modifications include phosphorylation and glycosylation. Many of these post-translational modifications are critical to the protein's function.
One such modification is phosphorylation, which happens to many enzymes and structural proteins in the process of cell signaling. The addition of a phosphate to particular amino acids—most commonly serine and threonine[11] mediated by serine-threonine kinases, or more rarely tyrosine mediated by tyrosine kinases—causes a protein to become a target for binding or interacting with a distinct set of other proteins that recognize the phosphorylated domain.
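As a rough illustration of the paragraph above, the following sketch lists the residues in a peptide that could in principle be phosphorylated (serine, threonine, tyrosine) and computes the mass a peptide would have after gaining phosphate groups. The function names and the example sequence are hypothetical. The mass shift used is the monoisotopic mass of HPO3 (about 79.966 Da), the value commonly used in mass spectrometry. The listing shows only candidate positions, not experimentally confirmed sites.

```python
# Monoisotopic mass added by one phosphate group (HPO3), in daltons.
PHOSPHO_SHIFT_DA = 79.96633

# Residues named in the text as phosphorylation targets.
PHOSPHO_RESIDUES = {"S", "T", "Y"}


def candidate_phosphosites(sequence):
    """Return (1-based position, residue) for every S, T, or Y in the sequence.

    These are only candidates: whether a site is actually phosphorylated
    has to be determined experimentally.
    """
    return [
        (position, residue)
        for position, residue in enumerate(sequence.upper(), start=1)
        if residue in PHOSPHO_RESIDUES
    ]


def phosphorylated_mass(unmodified_mass, n_phospho):
    """Return the peptide mass after adding n_phospho phosphate groups."""
    if n_phospho < 0:
        raise ValueError("n_phospho must be non-negative")
    return unmodified_mass + n_phospho * PHOSPHO_SHIFT_DA


if __name__ == "__main__":
    # Hypothetical peptide sequence used only for illustration.
    print(candidate_phosphosites("MASTYK"))
    print(phosphorylated_mass(1000.0, 2))
```

A search for phosphorylated peptides in mass spectrometry data typically looks for this kind of mass difference between the modified and unmodified forms.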
Because protein phosphorylation is one of the most-studied protein modifications, many "proteomic" efforts are geared to determining the set of phosphorylated proteins in a particular cell or tissue-type under particular circumstances. This alerts the scientist to the signaling pathways that may be active in that instance.
Ubiquitin is a small protein that may be affixed to certain protein substrates by enzymes called E3 ubiquitin ligases. Determining which proteins are poly-ubiquitinated helps in understanding how protein pathways are regulated. This is, therefore, an additional legitimate "proteomic" study. Similarly, once a researcher determines which substrates are ubiquitinated by each ligase, determining the set of ligases expressed in a particular cell type is helpful.
In addition to phosphorylation and ubiquitination, proteins may be subjected to (among others) methylation, acetylation, glycosylation, oxidation, and nitrosylation. Some proteins undergo all these modifications, often in time-dependent combinations. This illustrates the potential complexity of studying protein structure and function.
A cell may make different sets of proteins at different times or under different conditions.
In proteomics, there are multiple methods to study proteins. Generally, proteins may be detected by using either antibodies (immunoassays) or mass spectrometry. If a complex biological sample is analyzed, either a very specific antibody needs to be used in quantitative dot blot analysis (QDB), or biochemical separation then needs to be used before the detection step, as there are too many analytes in the sample to perform accurate detection and quantification.
One of the earliest methods for protein analysis has been Edman degradation (introduced in 1967) where a single peptide is subjected to multiple steps of chemical degradation to resolve its sequence. These early methods have mostly been supplanted by technologies that offer higher throughput.
More recently implemented methods use mass spectrometry-based techniques, a development that was made possible by the discovery of "soft ionization" methods developed in the 1980s, such as matrix-assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI). These methods gave rise to the top-down and the bottom-up proteomics workflows where often additional separation is performed before analysis (see below).
For the analysis of complex biological samples, a reduction of sample complexity is required. This may be performed off-line by one-dimensional or two-dimensional separation. More recently, on-line methods have been developed where individual peptides (in bottom-up proteomics approaches) are separated using reversed-phase chromatography and then, directly ionized using ESI; the direct coupling of separation and analysis explains the term "on-line" analysis.
The second quantitative approach uses stable isotope tags to differentially label proteins from two different complex mixtures. Here, the proteins within a complex mixture are labeled isotopically first, and then digested to yield labeled peptides. The labeled mixtures are then combined, and the peptides are separated by multidimensional liquid chromatography and analyzed by tandem mass spectrometry. Isotope-coded affinity tag (ICAT) reagents are widely used isotope tags. In this method, the cysteine residues of proteins are covalently attached to the ICAT reagent, which reduces the complexity of the mixture by omitting peptides that lack cysteine.
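The quantification step described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular software package: it assumes the two mixtures carry a "light" and a "heavy" form of the tag, that paired intensities have already been extracted for each peptide, and that only cysteine-containing peptides are retained. All function names and example values are hypothetical.

```python
def cysteine_peptides(peptides):
    """Keep only peptides that contain at least one cysteine (C).

    This mirrors the complexity reduction described in the text:
    peptides without cysteine carry no tag and are not analyzed.
    """
    return [peptide for peptide in peptides if "C" in peptide.upper()]


def heavy_to_light_ratios(intensities):
    """Compute heavy/light intensity ratios per peptide.

    `intensities` maps a peptide to a (light, heavy) pair of signal
    intensities. Peptides with a non-positive light intensity are
    skipped because the ratio is undefined for them.
    """
    ratios = {}
    for peptide, (light, heavy) in intensities.items():
        if light > 0:
            ratios[peptide] = heavy / light
    return ratios


if __name__ == "__main__":
    # Hypothetical peptides and intensities, for illustration only.
    print(cysteine_peptides(["ACDK", "GGK", "LCR"]))
    print(heavy_to_light_ratios({"ACDK": (100.0, 200.0), "LCR": (0.0, 50.0)}))
```

A ratio near 1 suggests similar abundance in the two mixtures, while ratios far from 1 point to a difference that may merit follow-up.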
Quantitative proteomics using stable isotopic tagging is an increasingly useful tool. Firstly, chemical reactions have been used to introduce tags into specific sites or proteins for the purpose of probing specific protein functionalities. The isolation of phosphorylated peptides has been achieved using isotopic labeling and selective chemistries to capture that fraction of protein from the complex mixture. Secondly, the ICAT technology was used to differentiate between partially purified or purified macromolecular complexes, such as the large RNA polymerase II pre-initiation complex and the proteins complexed with a yeast transcription factor. Thirdly, ICAT labeling was recently combined with chromatin isolation to identify and quantify chromatin-associated proteins. Finally, ICAT reagents are useful for proteomic profiling of cellular organelles and specific cellular fractions.[30]
Another quantitative approach is the accurate mass and time (AMT) tag approach developed by Richard D. Smith and coworkers at Pacific Northwest National Laboratory. In this approach, increased throughput and sensitivity are achieved by avoiding the need for tandem mass spectrometry and by making use of precisely determined separation time information and highly accurate mass determinations for peptide and protein identifications.
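The core idea of matching on accurate mass and separation time can be sketched as a lookup against a reference list. The sketch below is an illustration under stated assumptions, not the published AMT implementation: it assumes a reference table of previously identified peptides with a mass and a normalized elution time (scaled here to 0..1), and the tolerances of 5 ppm and 0.02 are placeholder values chosen for the example. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One reference entry: a known peptide with its mass and elution time."""
    peptide: str
    mass: float  # monoisotopic mass in daltons
    net: float   # normalized elution time (assumed scaled to 0..1)


def ppm_error(observed, reference):
    """Mass difference expressed in parts per million of the reference mass."""
    return (observed - reference) / reference * 1e6


def match_feature(mass, net, tags, ppm_tol=5.0, net_tol=0.02):
    """Return peptides whose mass and elution time both fall within tolerance.

    No tandem mass spectrum is needed: the observed feature is identified
    purely from its measured mass and separation time.
    """
    return [
        tag.peptide
        for tag in tags
        if abs(ppm_error(mass, tag.mass)) <= ppm_tol
        and abs(net - tag.net) <= net_tol
    ]


if __name__ == "__main__":
    # Hypothetical reference entries, for illustration only.
    reference = [
        Tag("PEPTIDEA", 1000.0000, 0.30),
        Tag("PEPTIDEB", 1000.0030, 0.60),
    ]
    print(match_feature(1000.002, 0.31, reference))
```

The two reference peptides in the example have nearly identical masses, so the elution time is what separates them. This is why the approach depends on reproducible separations as well as accurate mass measurement.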
Protein microarrays complement the use of mass spectrometers in proteomics and in medicine. The aim behind protein microarrays is to print thousands of protein-detecting features for the interrogation of biological samples. Antibody arrays are an example in which a host of different antibodies are arrayed to detect their respective antigens from a sample of human blood. Another approach is the arraying of multiple protein types for the study of properties like protein-DNA, protein-protein and protein-ligand interactions. Ideally, the functional proteomic arrays would contain the entire complement of the proteins of a given organism. The first version of such arrays consisted of 5000 purified proteins from yeast deposited onto glass microscope slides. Despite the success of the first chip, implementing protein arrays proved to be a greater challenge. Proteins are inherently much more difficult to work with than DNA. They have a broad dynamic range, are less stable than DNA, and their structure is difficult to preserve on glass slides, even though preserved structure is essential for most assays. The global ICAT technology has striking advantages over protein chip technologies.[30]
One major development to come from the study of human genes and proteins has been the identification of potential new drugs for the treatment of disease. This relies on genome and proteome information to identify proteins associated with a disease, which computer software can then use as targets for new drugs. For example, if a certain protein is implicated in a disease, its 3D structure provides the information to design drugs to interfere with the action of the protein. A molecule that fits the active site of an enzyme, but cannot be released by the enzyme, inactivates the enzyme. This is the basis of new drug-discovery tools, which aim to find new drugs to inactivate proteins involved in disease. As genetic differences among individuals are found, researchers expect to use these techniques to develop personalized drugs that are more effective for the individual.[31]
Proteomics is also used to reveal complex plant-insect interactions that help identify candidate genes involved in the defensive response of plants to herbivory.[32][33][34]
Interaction proteomics is the analysis of protein interactions from scales of binary interactions to proteome- or network-wide. Most proteins function via protein–protein interactions, and one goal of interaction proteomics is to identify binary protein interactions, protein complexes, and interactomes.
Several methods are available to probe protein–protein interactions. While the most traditional method is yeast two-hybrid analysis, a powerful emerging method is affinity purification followed by protein mass spectrometry using tagged protein baits. Other methods include surface plasmon resonance (SPR),[35][36] protein microarrays, dual polarisation interferometry, and microscale thermophoresis, as well as experimental methods such as phage display and in silico computational methods.
Knowledge of protein-protein interactions is especially useful in regard to biological networks and systems biology, for example in cell signaling cascades and gene regulatory networks (GRNs, where knowledge of protein-DNA interactions is also informative). Proteome-wide analysis of protein interactions, and integration of these interaction patterns into larger biological networks, is crucial towards understanding systems-level biology.[37][38]
Expression proteomics includes the analysis of protein expression at a larger scale. It helps identify the main proteins in a particular sample, and those proteins differentially expressed in related samples, such as diseased versus healthy tissue. If a protein is found only in a diseased sample, it may be a useful drug target or diagnostic marker. Proteins with the same or similar expression profiles may also be functionally related. Technologies such as 2D-PAGE and mass spectrometry are used in expression proteomics.[39]
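A comparison of diseased and healthy samples can be sketched as follows. This is a simplified illustration that assumes protein abundances have already been measured and normalized. It uses a log2 fold change with a pseudocount, which is a common convention and not something specified in the text. Function names, protein identifiers, and values are hypothetical.

```python
import math


def log2_fold_changes(diseased, healthy, pseudocount=1.0):
    """Return the log2(diseased / healthy) abundance ratio per protein.

    A pseudocount is added to both values so that proteins missing
    from one sample do not cause a division by zero.
    """
    changes = {}
    for protein in set(diseased) | set(healthy):
        d = diseased.get(protein, 0.0) + pseudocount
        h = healthy.get(protein, 0.0) + pseudocount
        changes[protein] = math.log2(d / h)
    return changes


def disease_only(diseased, healthy):
    """Return proteins detected in the diseased sample but not the healthy one."""
    return sorted(
        protein
        for protein, abundance in diseased.items()
        if abundance > 0 and healthy.get(protein, 0.0) == 0
    )


if __name__ == "__main__":
    # Hypothetical abundances, for illustration only.
    diseased = {"P1": 7.0, "P2": 3.0}
    healthy = {"P1": 1.0, "P3": 3.0}
    print(log2_fold_changes(diseased, healthy))
    print(disease_only(diseased, healthy))
```

In practice such comparisons use replicate measurements and statistical testing. The sketch only shows the arithmetic behind the notion of differential expression.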
Understanding the proteome, the structure and function of each protein and the complexities of protein–protein interactions is critical for developing the most effective diagnostic techniques and disease treatments in the future. For example, proteomics is highly useful in identification of candidate biomarkers (proteins in body fluids that are of value for diagnosis), identification of the bacterial antigens that are targeted by the immune response, and identification of possible immunohistochemistry markers of infectious or neoplastic diseases.[42]
An interesting use of proteomics is using specific protein biomarkers to diagnose disease. A number of techniques make it possible to test for proteins produced during a particular disease, which helps to diagnose the disease quickly. Techniques include western blot, immunohistochemical staining, enzyme-linked immunosorbent assay (ELISA), and mass spectrometry.[28][43] Secretomics, a subfield of proteomics that studies secreted proteins and secretion pathways using proteomic approaches, has recently emerged as an important tool for the discovery of biomarkers of disease.[44]
In proteogenomics, proteomic technologies such as mass spectrometry are used for improving gene annotations. Parallel analysis of the genome and the proteome facilitates discovery of post-translational modifications and proteolytic events,[45] especially when comparing multiple species (comparative proteogenomics).[46]
Mass spectrometry and microarray produce peptide fragmentation information but do not give identification of specific proteins present in the original sample. Due to the lack of specific protein identification, past researchers were forced to decipher the peptide fragments themselves. However, there are currently programs available for protein identification. These programs take the peptide sequences output from mass spectrometry and microarray and return information about matching or similar proteins. This is done through algorithms implemented by the program which perform alignments with proteins from known databases such as UniProt[47] and PROSITE[48] to predict what proteins are in the sample with a degree of certainty.
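The lookup step described above can be illustrated with a deliberately simplified sketch. Real identification programs use alignment and scoring algorithms against databases such as UniProt. The code below substitutes plain exact substring matching against a small in-memory dictionary and ranks proteins by the number of matching peptides. The function names, protein identifiers, and sequences are all hypothetical.

```python
def identify_proteins(peptides, database):
    """Map each protein ID to the peptides found in its sequence.

    Matching is by exact substring, a simplification of the alignment
    step that real identification programs perform.
    """
    hits = {}
    for protein_id, sequence in database.items():
        matched = [peptide for peptide in peptides if peptide in sequence]
        if matched:
            hits[protein_id] = matched
    return hits


def rank_by_peptide_count(hits):
    """Order protein IDs by number of matched peptides, then by ID."""
    return sorted(hits, key=lambda protein_id: (-len(hits[protein_id]), protein_id))


if __name__ == "__main__":
    # Hypothetical database entries and peptides, for illustration only.
    database = {
        "PROT_A": "MKTAYIAKQRQISFVK",
        "PROT_B": "MSHFSRQLEERLGLIEVQ",
    }
    peptides = ["TAYIAK", "QISFVK", "LEER"]
    hits = identify_proteins(peptides, database)
    print(hits)
    print(rank_by_peptide_count(hits))
```

The more independent peptides that map to a protein, the more confident the identification tends to be. Counting peptides is used here only as a crude stand-in for the degree of certainty mentioned in the text.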
Most programs available for protein analysis are not written for proteins that have undergone post-translational modifications.[50] Some programs will accept post-translational modifications to aid in protein identification but then ignore the modification during further protein analysis. It is important to account for these modifications since they can affect the protein's structure. In turn, computational analysis of post-translational modifications has gained the attention of the scientific community. The current post-translational modification programs are only predictive.[51] Chemists, biologists and computer scientists are working together to create and introduce new pipelines that allow for analysis of post-translational modifications that have been experimentally identified for their effect on the protein's structure and function.
Advances in quantitative proteomics would clearly enable more in-depth analysis of cellular systems.[37][38] Biological systems are subject to a variety of perturbations (cell cycle, cellular differentiation, carcinogenesis, biophysical environment, etc.). Transcriptional and translational responses to these perturbations result in functional changes to the proteome in response to the stimulus. Therefore, describing and quantifying proteome-wide changes in protein abundance is crucial to understanding biological phenomena more holistically, at the level of the entire system. In this way, proteomics can be seen as complementary to genomics, transcriptomics, epigenomics, metabolomics, and other -omics approaches in integrative analyses attempting to define biological phenotypes more comprehensively. As an example, The Cancer Proteome Atlas provides quantitative protein expression data for ~200 proteins in over 4,000 tumor samples with matched transcriptomic and genomic data from The Cancer Genome Atlas.[54] Similar datasets in other cell types, tissue types, and species, particularly using deep shotgun mass spectrometry, will be an immensely important resource for research in fields like cancer biology, developmental and stem cell biology, medicine, and evolutionary biology.