Practical proteomics. Concepts, principles and directions of proteomics. The Curse of Isotopic Distribution

INTRODUCTION


Currently, there is a revolution in ideas about the etiology, pathogenesis and therapy of human diseases, which is associated with advances in the field of molecular biology and genetics, molecular medicine and pharmacology.

Significant advances have been made in understanding the structure and function of DNA, RNA and proteins, genome replication and functioning, reverse transcription, DNA modification, repair and recombination, and the transcription and translation of mRNA in prokaryotic and eukaryotic cells. Numerous studies based on new bioanalytical methods have clarified the main pathways by which gene expression is regulated, and recombinant DNA technologies have been studied in detail. Research into the physicochemical basis of hereditary and socially significant human diseases (atherosclerosis, cancer, diabetes mellitus, intracellular infections, neurodegenerative diseases, etc.) has also advanced considerably.

The post-genomic era raises the question of how to put fundamental advances in molecular biology, medicine and pharmacology into practice. The functioning of the genome is reflected in post-genomic events associated with the synthesis of numerous proteins. These proteins are now the subject of a separate scientific field, proteomics. Proteomic research cannot develop without algorithms and analytical methods, or without databases that make it possible to elucidate how biological texts (genomic and protein sequences) function and to develop targeted pharmacological interventions (biotransformation).

Related problems of genomics and proteomics, pharmacogenomics and biotransformatics are implemented on the basis of unique methodological solutions and technological platforms.

Academic centers and research institutes in Russia, the CIS countries, Western Europe, the USA and Canada are currently developing scientific and technological platforms for biomedical and pharmaceutical research and introducing their results into clinical practice.

The aim of the practice is to learn the basics of proteomics and proteomic mapping.

The objectives of the practice are to consolidate and deepen the theoretical knowledge acquired during training, to master methods of working with specialized literature, to collect specific materials in accordance with the recommended questions, and to formalize the results obtained during the internship.

CONCEPTS, PRINCIPLES AND DIRECTIONS OF PROTEOMICS

The beginning of the 21st century marks the start of the era of proteomics. The term is derived from two well-known concepts in biochemistry, “PROTEins” and “genOMe”, and was first used in 1995.

Genomics will of course not disappear. It will continue to develop at the same or perhaps an even faster pace. It is clear, however, that the focus of post-genomic research will shift to inventorying proteins and elucidating the human proteomic map. At first glance, the problem seems completely unsolvable. The human genomic map is essentially the same for all human cells: 23 pairs of chromosomes carrying the same set of genes, with the exception of the sex cells, which carry a single set. A single, general human proteomic map, by contrast, makes no sense, because every cell, every tissue and every biological fluid must have its own proteomic map. The human genome was at the time estimated to contain about 100,000 genes; current estimates put the number of protein-coding genes at about 20,000. Even so, numerous modification reactions can increase the number of proteins in a cell to 10-20 million.
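The sketch below illustrates how modification reactions multiply protein diversity. It rests on a simplifying assumption made purely for illustration: each modifiable site on a protein is independently either modified or unmodified. Under that assumption, n sites give 2^n possible modification patterns (proteoforms) from a single gene product. The site labels are hypothetical, and real cells do not populate every combination.

```python
from itertools import product


def count_proteoforms(n_sites: int) -> int:
    """Number of modification patterns if each of n_sites is independently
    modified or unmodified (a simplifying binary assumption)."""
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    return 2 ** n_sites


def enumerate_proteoforms(sites: list[str]) -> list[tuple[str, ...]]:
    """List every combination of modified sites; () is the unmodified form."""
    patterns = []
    for states in product((False, True), repeat=len(sites)):
        patterns.append(tuple(site for site, on in zip(sites, states) if on))
    return patterns


if __name__ == "__main__":
    sites = ["S12", "T45", "Y101"]  # hypothetical phosphorylation sites
    forms = enumerate_proteoforms(sites)
    print(len(forms), count_proteoforms(len(sites)))  # 8 8
    print(count_proteoforms(7))  # 128
```

With only seven independent sites the count is already 2^7 = 128 forms per gene product. This shows how a modest number of modification reactions can push the number of distinct protein species far beyond the number of genes.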

In this regard, there are currently two definitions of proteomics. The narrow one can be called structural proteomics; the broader one includes both the structural and the functional parts of proteomics. In the narrow sense, proteomics is the science of inventorying proteins through the combined use of three methods: two-dimensional electrophoresis (2D electrophoresis), mass spectrometric (MS) analysis of the molecular weight and sequence of the proteins separated by electrophoresis, and analysis of the results with bioinformatics methods. In essence, structural proteomics is a combination of 2D electrophoresis, mass spectrometry and bioinformatics. The resolving power of two-dimensional electrophoresis has been known for a long time, since the first work of O'Farrell in 1975. The ability of MS analysis to determine the molecular weight and sequence of polypeptide chains very quickly, however, became clear only recently. MS methods have developed so quickly that some companies have already built fully automated systems that determine the molecular mass and sequence of proteins at femtomolar and attomolar concentrations. Combining these methods makes it possible to create a proteomic map of any biological material. Such a map represents a phenotypic manifestation of the genome of a cell, a tissue, or even an entire organ.

In the broad sense, the terms proteomic analysis, or proteomics, cover more than the inventory of the proteins of a biological object. They also include monitoring the reversible post-translational modifications (PTMs) of proteins by specific enzymes, such as phosphorylation, glycosylation, acylation, prenylation, scaffolding, etc.
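The MS step described above ultimately compares measured masses with masses calculated from amino acid sequences. The sketch below shows that calculation. It assumes unmodified linear peptides and standard monoisotopic residue masses rounded to five decimals. The function names are illustrative and do not come from any particular software package.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O for the free N- and C-termini
PROTON = 1.007276   # proton mass, Da


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    try:
        return sum(MONO_RESIDUE[aa] for aa in sequence.upper()) + WATER
    except KeyError as exc:
        raise ValueError(f"unknown residue {exc.args[0]!r}") from None


def mz(sequence: str, charge: int = 1) -> float:
    """m/z of the protonated ion [M + zH]^z+."""
    if charge < 1:
        raise ValueError("charge must be >= 1")
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    print(round(peptide_mass("PEPTIDE"), 3))  # 799.36
    print(round(mz("PEPTIDE", charge=2), 3))
```

Post-translational modifications would add their own mass shifts to these values. They are left out here to keep the example minimal.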

Currently, more than 300 different types of post-translational modification have been characterized using proteomics.

Over the past 5-7 years, the intensive development of MS analysis has given rise to a whole group of areas of proteomic research (Fig. 1). Most of these areas have a biomedical focus; structural and functional proteomics, however, still remain their fundamental basis.

The policies of most countries of the European Union, Russia and the CIS countries are, to one degree or another, connected with the natural desire of the population to live in accordance with international quality standards. Terms such as “ecologically clean area” or “ecologically friendly product”, as well as all kinds of words with the prefix “euro-”, which have become firmly established in everyday life, unfortunately, in most cases, do not have any actual content. At the same time, the desired standards of quality of life established in many countries are the result of complex processes affecting the cultural, social and legal aspects of the development of these states.

Figure 1 - Modern directions of proteomic analysis.

SCIENTIFIC COMMUNICATIONS

S.V. Suchkov1,2, D.A. Gnatenko1, D.S. Kostyushev1, S.A. Krynsky1, M.A. Paltsev3

1 I.M. Sechenov First Moscow State Medical University, Russian Federation 2 A.I. Evdokimov Moscow State Medical and Dental University, Russian Federation

3 RRC “Kurchatov Institute”, Moscow, Russian Federation

Proteomics as a fundamental tool for preclinical screening, assay verification and evaluation of applied therapy

Proteomics is a science that studies the proteins of living organisms, their functions and interactions. Today it is an indispensable component in the creation of preclinical diagnostic protocols. Combined with the achievements of genomics and bioinformatics, proteomics technologies are a powerful tool for the early diagnosis of diseases and for dynamic assessment of the course of pathological processes (in particular, during ongoing pharmacotherapy). The article discusses general and specific aspects of proteomics based on models of cardiovascular and oncological diseases.

Key words: proteomics, diagnostics, prediction, translational medicine.

Introduction

The vast majority of pathological changes in the functioning of cells, tissues and organs are known to be accompanied by deviations from the physiological protein profile of a normal healthy organism. The analysis and prediction of such changes now come to the fore in creating preclinical screening protocols. This means identifying hidden and latent protein “precursors” of a disease and assessing the effectiveness of the therapy applied. The main tasks of proteomics are to search for, identify and separate protein molecules that contribute to susceptibility to a disease or directly to its development, and to determine them quantitatively and qualitatively.

Proteomics is a science that studies the protein composition of biological objects, as well as the structural and functional properties of protein molecules. Its task is to identify and quantify the individual proteins contained in biological samples (blood serum, cerebrospinal fluid, urine, biopsies) at different stages of disease development and during therapy. The totality of all the proteins of the body, i.e., its protein profile, is called the “proteome”.

Modern technological arsenal of proteomics

Fractionation and separation of the proteins contained in a specific biological sample are carried out by electrophoresis in polyacrylamide gel.

S.V. Suchkov1,2, D.S. Kostushev1, S.A. Krynskiy1, D.A. Gnatenko1, M.A. Paltsev3

1 I.M. Sechenov First Moscow State Medical University, Russian Federation 2 A.I. Evdokimov Moscow State Medical Dental University, Russian Federation 3 Kurchatov’s Scientific Institute, Moscow, Russian Federation

Proteomics as a fundamental tool for subclinical screening, tests verification and assessment of applied therapy

Proteomics is a science which studies proteins of the body, interactions of proteins and their biological functions. Today, it is an essential partner in establishing preclinical diagnosis protocols. In conjunction with other sciences such as genomics and bioinformatics, it makes it possible to diagnose diseases at the earliest stages, before clinical onset, and to follow the dynamics of pathological processes in the body and the response to drug therapy. This article discusses general aspects of proteomics as well as specific ones, based on models of cardiac diseases and cancer.

Key words: proteomics, diagnostics, prediction, translational medicine.

BULLETIN OF RAMS /2013/ No. 1

A wide range of methods is used to identify the isolated proteins. The most important are:

Protein microsequencing;

High-performance liquid chromatography (HPLC), including high-resolution HPLC;

Methods of immunochemical testing using monoclonal antibodies to individual antigenic determinants;

Mass spectrometry.

In recent years, the detection of protein molecules has been significantly optimized by the development of a wide panel of protein microchips (biochips) with different types of detection, for example SELDI (surface-enhanced laser desorption/ionization) and/or MALDI (matrix-assisted laser desorption/ionization). Such approaches make it possible to analyze up to 10,000 individual proteins in one sample simultaneously, while recording minute shifts in their concentrations under the influence of various factors. As long as proteins differ in at least one of their inherent parameters (net charge or molecular weight), this approach can separate them and then identify and characterize them.

Mass spectrometry is one of the most promising methods for identifying proteins. It forms ionized particles of the analyte in a vacuum and then analyzes the ratio of ion mass to charge. The various modifications of mass spectrometry are classified according to the ionization and detection methods they use. A time-of-flight (TOF) mass spectrometer records individual ions. It measures how long each ion takes to fly from the source to the detector, derives the ion's mass-to-charge ratio (m/z) from this flight time, and counts the number of ions.
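The link between flight time and m/z can be made concrete with an idealized linear TOF analyzer. In that model, every ion of charge ze is accelerated through the same potential U, so zeU = mv²/2 and the flight time over a drift length L is t = L·sqrt(m / (2zeU)). The sketch below ignores initial velocity spread, reflectrons and instrument calibration constants. The drift length and voltage are hypothetical values chosen for illustration.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def flight_time(mz_da: float, drift_length_m: float, accel_voltage_v: float) -> float:
    """Ideal linear-TOF flight time (s) for an ion with m/z in Da per
    elementary charge: z*e*U = m*v**2/2, so v = sqrt(2*e*U/(m/z)), t = L/v."""
    mass_per_charge_kg = mz_da * DALTON
    velocity = math.sqrt(2 * E_CHARGE * accel_voltage_v / mass_per_charge_kg)
    return drift_length_m / velocity


def mz_from_flight_time(t_s: float, drift_length_m: float, accel_voltage_v: float) -> float:
    """Invert the relation above: m/z = 2*e*U*t**2 / L**2, converted to Da."""
    return 2 * E_CHARGE * accel_voltage_v * t_s ** 2 / (drift_length_m ** 2 * DALTON)


if __name__ == "__main__":
    drift, voltage = 1.0, 20_000.0  # hypothetical 1 m tube, 20 kV acceleration
    for mz in (1000.0, 4000.0):
        t = flight_time(mz, drift, voltage)
        # Heavier ions arrive later; t grows with sqrt(m/z).
        print(f"m/z {mz:.0f}: {t * 1e6:.1f} us, recovered m/z {mz_from_flight_time(t, drift, voltage):.1f}")
```

Under these assumptions an ion of m/z 1000 needs roughly 16 µs. Quadrupling m/z only doubles the flight time, which is why TOF resolution depends on precise timing.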

Chromatographic methods have comparatively lower resolution. They separate proteins according to the physical properties of the molecules: charge (ion-exchange chromatography), hydrophobicity (hydrophobic interaction chromatography), size (gel filtration), and the ability to bind various ligands, for example antibodies (affinity chromatography). All of these are variants of liquid chromatography, because proteins are not volatile and cannot be separated in the gas phase. Proteomic analysis often combines mass spectrometry with liquid chromatography (chromatography-mass spectrometry). Indeed, it was the introduction of mass spectrometry that led to a leap in the development of proteomics.

Finally, proteomics methods include immunochemical analysis using monoclonal antibodies to individual antigenic determinants, linear and conformation-dependent, including a number of cryptic epitopes.

An important role when working with tissue sections is played by immunohistochemical research methods based on specific antigen-antibody interactions. Immunohistochemical methods are highly sensitive and specific, allowing the determination of almost any antigen of interest (the scope of application of the method is limited only by the antibody library available).

Bound antibodies are detected with enzyme or fluorescent labels. Enzyme labels are more common in clinical practice. The immunofluorescence method is more sensitive and specific, but it requires expensive equipment, and fluorescent dyes have a short shelf life. Some techniques use polymer carriers for antibodies, which increases the sensitivity of the reaction.

The final stage of this labor-intensive, multi-stage process is protein identification using databases (bioinformatics).

From the perspective of applied science, bioinformatics does more than store, analyze and process the enormous amounts of data required for scientific and diagnostic procedures. It can also predict the functional properties of particular protein molecules from data on the structure of the genome. In some cases, therefore, the characteristics of the object under study can be determined with a high degree of probability even when almost nothing is known about how groups of molecules interact or about their functions and properties.
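One common way of carrying out the database identification step is peptide mass fingerprinting (PMF). Each database protein is digested in silico with trypsin, and the measured peptide masses are matched against the calculated ones. The sketch below is a simplified illustration under several assumptions. It uses the simplified trypsin rule (cleave after K or R, not before P) and allows at most one missed cleavage. It expects observed masses to be neutral, with no modifications, and uses a hypothetical 20 ppm tolerance. The database is a toy. Production search engines use probabilistic scoring rather than a simple hit count.

```python
import re

MONO_RESIDUE = {  # monoisotopic residue masses, Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in seq) + WATER


def tryptic_peptides(protein: str, missed_cleavages: int = 0) -> list[str]:
    """In silico trypsin digest: cut after K or R unless the next residue is P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def pmf_score(observed, protein, tol_ppm=20.0, missed_cleavages=1):
    """Number of observed masses matching a theoretical peptide within tol_ppm."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(protein, missed_cleavages)]
    return sum(
        any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
        for m in observed
    )


def rank_proteins(observed, database, tol_ppm=20.0):
    """Rank database entries by their PMF hit count, best first."""
    scores = {name: pmf_score(observed, seq, tol_ppm) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    database = {"protA": "GAKPGRSSKAA", "protB": "WWWWKMMMR"}  # toy sequences
    observed = [peptide_mass("SSK"), peptide_mass("GAKPGR") + 0.002]
    print(rank_proteins(observed, database))  # [('protA', 2), ('protB', 0)]
```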

Proteomics as a foundation for scientific research with subsequent implementation of results into clinical practice within the framework of the principles and objectives of translational medicine

Research often requires the analysis of a large number of similar samples. Each study involves material and time costs. These can be minimized with the tissue microarray (tissue matrix) method, in which libraries of tissue samples are created so that many sections can be examined simultaneously on a single slide. The typical sequence of operations in research of this kind is as follows:

Sampling (cells, tissue, biological fluid);

Sample preparation (cell lysis, protein extraction);

Two-dimensional polyacrylamide gel electrophoresis;

Visualization (staining) of protein spots on the gel;

Electropherogram analysis (number of spots, their location);

Isolation of gel areas containing individual protein spots;

In-gel cleavage of individual proteins with trypsin (trypsinization);

Mass spectrometric analysis (determination of amino acid sequences of individual protein fragments);

Identification of each protein and measurement of its concentration, documentation, processing of results;

Interpretation of the obtained data using bioinformatics methods - analysis of databases, obtaining a differential profile of proteins.
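The last steps of this workflow, electropherogram analysis and a differential protein profile, can be sketched as follows. This is a simplified illustration, not the procedure of any particular 2D-gel software. It assumes the spots have already been detected and matched between a control gel and a disease gel, and the spot IDs and volumes are hypothetical. Each spot volume is normalized to the total spot volume of its gel. Spots are flagged when the log2 fold change reaches a hypothetical threshold of 1, i.e. a twofold change.

```python
import math


def normalize_volumes(volumes: dict[str, float]) -> dict[str, float]:
    """Express each spot volume as a fraction of the gel's total spot volume."""
    total = sum(volumes.values())
    return {spot: v / total for spot, v in volumes.items()}


def differential_profile(control, disease, log2_threshold=1.0):
    """Compare spots matched between two gels (volumes assumed positive).
    Spots present on only one gel are skipped in this sketch."""
    c, d = normalize_volumes(control), normalize_volumes(disease)
    profile = {}
    for spot in sorted(c.keys() & d.keys()):
        log2_fc = math.log2(d[spot] / c[spot])
        if log2_fc >= log2_threshold:
            status = "up"
        elif log2_fc <= -log2_threshold:
            status = "down"
        else:
            status = "unchanged"
        profile[spot] = (round(log2_fc, 3), status)
    return profile


if __name__ == "__main__":
    control = {"spot1": 100.0, "spot2": 100.0, "spot3": 200.0}  # toy volumes
    disease = {"spot1": 200.0, "spot2": 50.0, "spot3": 150.0}
    for spot, (lfc, status) in differential_profile(control, disease).items():
        print(spot, lfc, status)
```

Normalizing to total volume compensates for differences in sample loading between gels. A large change in one abundant spot, however, shifts the normalized values of all the others, so real analyses usually add statistical testing across replicate gels.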

Using this procedure, new protein markers have already been discovered and impressive results have been obtained in the field of cardiovascular proteomics and oncoproteomics.

Particular aspects of proteomics

The two main types of proteomics are structural and functional. The first studies the structure of individual proteins. The second considers proteins in their interactions with other proteins and explores the conformational, biochemical and functional changes that result. The set of all cell proteins that interact with a specific target protein molecule is called the “interactome.”
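The definition of the interactome above maps naturally onto a graph, with proteins as nodes and observed interactions as edges. The sketch below assumes that interactions are given as unordered pairs of protein names. The names are placeholders, not experimental data. The interactome of a target is then simply the set of its direct neighbors.

```python
from collections import defaultdict


def build_interaction_network(pairs):
    """Undirected protein-protein interaction graph from (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:  # self-interactions are ignored in this simple sketch
            graph[a].add(b)
            graph[b].add(a)
    return graph


def interactome(graph, target):
    """All proteins that interact directly with `target` (empty if unknown)."""
    return set(graph.get(target, ()))


if __name__ == "__main__":
    pairs = [("Target", "P1"), ("Target", "P2"), ("P1", "P3"), ("P2", "Target")]
    network = build_interaction_network(pairs)
    print(sorted(interactome(network, "Target")))  # ['P1', 'P2']
```

Storing the edges as sets removes duplicate reports of the same interaction, such as ("Target", "P2") and ("P2", "Target").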

Diagnostic purposes are served primarily by structural proteomics. Functional proteomics is more a path of fundamental research, and it is also the foundation for developing fundamentally new drugs directed at specific, individual pharmacotherapeutic targets at the cellular and molecular level.

Blood plasma proteomics

Of all body tissues, blood plasma most fully reflects the protein composition of the body: the plasma proteome includes about 1/10 of all proteins present in the body. Plasma contains the following proteins:

Proteins functioning in plasma;

Immunoglobulins;

Hormones;

Cytokines;

Proteins passing transiently through plasma;

Intracellular proteins that enter the plasma upon cell destruction or increased cell permeability;

Proteins that are absent normally and secreted by malignant cells;

Foreign proteins.

At least half of plasma proteins exist as multiprotein complexes. Special molecular tags introduced into a protein molecule make it possible to isolate such complexes and study them further to characterize the corresponding interactome.

To date, mass spectrometric analysis has identified more than 10,000 plasma proteins on the basis of at least one peptide per protein, and more than 3,000 proteins on the basis of two or more peptides. Almost 900 plasma proteins have been identified with 95% confidence.
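How many plasma proteins count as identified depends largely on how many distinct peptides are required per protein. The sketch below shows such a filter. It assumes peptide identifications are already available as (protein, peptide) pairs; the protein IDs and peptides are toy values. It ignores peptides shared between several proteins, which real protein-inference tools must resolve.

```python
from collections import defaultdict


def proteins_with_min_peptides(psms, min_peptides=2):
    """psms: iterable of (protein_id, peptide_sequence) identifications.
    Returns the proteins supported by at least `min_peptides` distinct peptides."""
    peptides_per_protein = defaultdict(set)
    for protein, peptide in psms:
        peptides_per_protein[protein].add(peptide)  # repeated hits count once
    return {protein for protein, peps in peptides_per_protein.items()
            if len(peps) >= min_peptides}


if __name__ == "__main__":
    psms = [("protA", "PEPK"), ("protA", "GASR"), ("protA", "PEPK"), ("protB", "WLK")]
    print(sorted(proteins_with_min_peptides(psms, min_peptides=1)))  # ['protA', 'protB']
    print(sorted(proteins_with_min_peptides(psms)))                  # ['protA']
```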

Proteomic analysis of blood plasma offers very attractive possibilities. As a standard test sample, however, plasma also has several significant disadvantages. Protein concentrations in plasma span a very wide range (up to 10 orders of magnitude), and proteins of little diagnostic significance predominate. When studying changes in the plasma proteome, for example in cardiovascular disease, one must first find a way to remove these unimportant proteins, which poses a significant challenge. The best sensitivity and specificity would therefore be achieved by studying a biopsy sample of the target organ, although this is not always feasible.

Despite intensive research in this area, new biomarkers are still introduced into clinical practice slowly. The reasons are both objective and subjective. They include a predominantly empirical approach to organizing research without proper theoretical justification, insufficiently developed infrastructural links between research centers, the lack of a unified nomenclature, and problems with systematizing the available data. To a large extent, the factual data remain scattered, because they accumulate faster than science can integrate them.

Cardiovascular proteomics

This section of proteomics is one of the most intensively developing. Databases have already been created on hundreds of proteins of the myocardial proteome, the levels of which change in chronic and acute cardiovascular pathologies. The greatest progress has been made in the study of dilated cardiomyopathy. With this disease, the content of more than 100 proteins changes, which can be divided into 3 main groups:

Proteins associated with energy and metabolism;

Stress-inducible proteins;

Proteins providing contractile functions and formation of the cytoskeleton.

These results are fully consistent with modern ideas about the pathogenesis of dilated cardiomyopathy.

Progress in studying the pathogenesis of coronary heart disease and chronic heart failure has been less significant. These pathologies cannot always be modeled adequately, and some results obtained in animal models are inconsistent with those in humans. Most of the reliable results concern the role of the so-called heat shock proteins (Hsp27) in the development and prevention of coronary heart disease and chronic heart failure. The proteome in reperfusion syndrome receives particular attention. After reperfusion injury, structural changes are detected in the contractile proteins MLC-2 (myosin light chain 2) and in all three proteins of the troponin complex. The signaling mechanisms involved in the pathogenesis of reperfusion syndrome are being studied, although the complete picture of protein interactions has not yet been established. Studies have also examined remote preconditioning of the myocardium before ischemic injury, in which a hypoxic state is induced first in some other organ and then in the heart; this reduces reperfusion damage. However, no candidate molecules for the role of humoral mediators of preconditioning have yet been identified.

Studying the proteomics of atherosclerosis is difficult because of the significant functional heterogeneity of the endothelial tissue phenotype. Nevertheless, models of the protein profile of atherosclerotic plaques have been obtained. They show changes in the content of about 80 proteins in total, including Hsp27, crystallins, tumor necrosis factor α, cathepsins and peroxiredoxins. To create biomarkers of atherosclerosis, it has been proposed to study the profiles of plasma proteins associated with inflammation. The secretion of proteins by atherosclerotic plaques in vitro is also being studied.

In chronic heart failure, the only clinically useful biomarker is B-type natriuretic peptide (BNP). For coronary heart disease the number of biomarkers is much larger (cardiac troponins, creatine kinase, etc.). Their levels, however, rise only in the later stages of ischemia, so new biomarkers are being sought that would allow its early stages to be diagnosed. Another area of interest is biomarkers specific to ischemia rather than to myocardial necrosis. At the moment there is only one such marker, ischemia-modified albumin (IMA). Its low specificity, however, makes it difficult to use except in combination with traditional biomarkers.

Studying the cardiac proteome poses significant challenges. The most accurate method of analysis would be a biopsy, but it is difficult to perform. In the case of studying blood plasma, identifying among the huge mass of proteins those that could have clinical significance is an extremely difficult task. In this regard, in animal studies, perfusion of isolated hearts with blood-substituting solutions is often used, followed by the study of proteins released by tissues into the solution. Another direction is the study of pericardial fluid. Thus, in patients undergoing cardiac surgery, the level of the protein H-FABP (heart-type fatty acid binding protein) in the pericardial fluid was examined. It has been found that the level of this pericardial fluid protein, which is absent from the blood plasma, increases during ischemia.

Proteomics of lung diseases

Proteomic studies of lung diseases use lung tissue, epithelial lining fluid, alveolocytes and blood plasma as samples.

The proteome of the epithelial lining fluid is studied in samples of bronchoalveolar lavage fluid. Some lung-specific proteins, such as glutathione S-transferase and surfactant protein B, are significantly more abundant in this fluid than in plasma. Changes in bronchoalveolar lavage fluid are studied in various diseases: sarcoidosis, cystic fibrosis, mesothelioma, idiopathic fibrosing alveolitis, etc. Bronchoalveolar lavage also makes it possible to isolate alveolar macrophages for subsequent assessment of their proteomic profile.

Obtaining lung tissue samples requires invasive procedures. These studies mainly assess proteome changes in lung cancer. A study by D.P. Carbone found that the levels of SUMO-2 (small ubiquitin-like modifier 2), thymosin β4 and ubiquitin correlate with prognosis in non-small cell lung cancer. Other studies have sought protein patterns that distinguish invasive tumors from normal bronchial epithelium. To make the results more reliable, these studies used laser microdissection during sampling to avoid capturing healthy tissue. However, long-term clinical trials will be required before new biomarkers can be introduced into clinical practice.

Blood plasma studies are also used to build proteomic profiles of adenocarcinomas. For example, a study using stable-isotope labeling with heavy oxygen (18O) found 211 proteins whose levels increased in lung adenocarcinoma in mice and 246 proteins whose levels decreased.
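Counts of increased and decreased proteins such as these come from comparing labeled and unlabeled forms of the same peptides. The sketch below assumes the widely used trypsin-catalyzed 18O method, in which two 18O atoms are incorporated at the peptide C-terminus and each labeled peptide becomes about 4 Da heavier. It also assumes that the tumor-bearing sample is the labeled (heavy) one and that labeling is complete. Overlap between isotopic envelopes, which real analyses must correct for, is ignored. The log2 threshold and intensities are hypothetical.

```python
import math

# Two 18O atoms replace two 16O atoms at the C-terminal carboxyl group.
O18_MINUS_O16 = 17.9991610 - 15.9949146   # Da
DOUBLE_LABEL_SHIFT = 2 * O18_MINUS_O16     # about +4.008 Da


def labeled_mass(light_mass: float) -> float:
    """Mass of a peptide carrying two 18O atoms (complete labeling assumed)."""
    return light_mass + DOUBLE_LABEL_SHIFT


def classify_by_ratio(intensities, log2_threshold=1.0):
    """intensities: {protein: (heavy_intensity, light_intensity)}, where
    heavy = labeled tumor sample and light = control (an assumption).
    Returns (increased, decreased) lists of proteins."""
    increased, decreased = [], []
    for protein, (heavy, light) in intensities.items():
        log2_ratio = math.log2(heavy / light)
        if log2_ratio >= log2_threshold:
            increased.append(protein)
        elif log2_ratio <= -log2_threshold:
            decreased.append(protein)
    return increased, decreased


if __name__ == "__main__":
    print(round(labeled_mass(1000.0), 4))
    toy = {"p1": (400.0, 100.0), "p2": (50.0, 100.0), "p3": (110.0, 100.0)}
    print(classify_by_ratio(toy))  # (['p1'], ['p2'])
```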

Oncoproteomics

The main objectives of oncoproteomics are:

Construction of proteomes and analysis of their dynamics during the emergence and development of various tumors;

Identification of cell signaling pathways leading to tumorigenesis;

Identification of markers for the diagnosis of cancer and for monitoring the response of the tumor and the body to surgery and to different types of therapy;

Determination of the immune response to tumorigenesis.

Tumor markers are macromolecules, usually proteins with a lipid or carbohydrate component. Their presence and concentrations in blood plasma and/or other biological fluids correlate to some extent with the presence and growth of a malignant tumor. The indicators used in tumor diagnosis include both specific tumor markers and substances whose concentrations can change in various pathological processes, including tumor growth. The most specific tumor markers, practically absent in a healthy body, are the embryonic antigens carcinoembryonic antigen and α-fetoprotein. Their synthesis stops in the early stages of embryonic development and is derepressed during malignant transformation. Tumor-associated antigens are molecules (secretory products or membrane glycoproteins) expressed more intensely by tumor cells than by normal cells. They include CA 19-9 and CA 15-3 (membrane glycoproteins), as well as prostate-specific antigen (PSA), a secretory product of prostate glandular cells. Hormones (human chorionic gonadotropin) and substances of other groups (thyroglobulin, β2-microglobulin, etc.) can also act as tumor markers. To predict the course of the disease, markers of proliferative activity and proteins that regulate apoptosis are examined (Fig. 1).

The areas of clinical application of tumor markers are as follows:

Early diagnosis of cancer;

Monitoring and evaluation of treatment effectiveness;

Determination of prognosis.

Based on the above, the main requirements for a tumor marker are sufficiently high sensitivity and specificity, correlation with tumor volume, and the ability to provide information about the location of the tumor.

The sensitivity and specificity of most currently available tumor biomarkers are often insufficient. Markers whose sensitivity exceeds 50% at a specificity of 95% are considered clinically useful, and only a few reach a sensitivity above 70% at this level of specificity. There are two approaches to the search for new tumor markers. The first is targeted research that builds on current knowledge of carcinogenesis and tests specific hypotheses. The second is an empirical search that compares the proteomes of normal and tumor cells, or the serum protein profiles of healthy people and patients, or of people with and without risk factors.
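The criterion above (sensitivity above 50% at 95% specificity) can be computed directly from marker measurements in patients (cases) and healthy controls. The sketch below assumes that a higher marker value is more suspicious, so a value at or above the cut-off counts as positive. Sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The marker values are toy numbers, not clinical data.

```python
def sensitivity_at_specificity(cases, controls, target_specificity=0.95):
    """Return (cut-off, specificity, sensitivity) for the lowest cut-off whose
    specificity reaches the target; values >= cut-off count as positive."""
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    if not 0.0 <= target_specificity <= 1.0:
        raise ValueError("target_specificity must lie in [0, 1]")
    # Ascending candidate cut-offs; +inf always gives specificity 1.0.
    candidates = sorted(set(cases) | set(controls)) + [float("inf")]
    for threshold in candidates:
        specificity = sum(v < threshold for v in controls) / len(controls)
        if specificity >= target_specificity:
            # Sensitivity can only fall as the cut-off rises, so the first
            # cut-off that meets the target gives the best sensitivity.
            sensitivity = sum(v >= threshold for v in cases) / len(cases)
            return threshold, specificity, sensitivity
    raise RuntimeError("unreachable: the +inf cut-off always qualifies")


def is_clinically_useful(sensitivity, specificity):
    """Criterion quoted in the text: sensitivity > 50% at 95% specificity."""
    return specificity >= 0.95 and sensitivity > 0.5


if __name__ == "__main__":
    controls = list(range(1, 21))                       # toy healthy values
    cases = [15, 18, 19, 21, 22, 25, 30, 40, 50, 60]    # toy patient values
    thr, spec, sens = sensitivity_at_specificity(cases, controls)
    print(thr, spec, sens, is_clinically_useful(sens, spec))  # 20 0.95 0.7 True
```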

Let's consider the possibilities of using modern tumor markers in the 3 aspects mentioned above.

1. Diagnostics. Because their sensitivity is insufficient, most tumor markers are unsuitable for screening the general population. Some of them, however, can be used effectively for early diagnosis in risk groups, where the likelihood of disease is higher from the outset. For example, PSA screening is carried out in men over 50 years of age. Screening for α-fetoprotein (a marker of hepatocellular carcinoma) is carried out in patients with liver cirrhosis, and screening for calcitonin (a marker of medullary thyroid cancer) in persons with a family history.

Figure - Proteomics technologies in the diagnosis of cancer (diagram labels: immunohistochemical (IHC) techniques; gene expression analysis; protein microchips; mass spectrometry; automated IHC profiles; serum; direct and indirect methods; database; diagnostics: early diagnosis in risk groups, diagnosis of relapses; treatment monitoring: selecting a treatment method, monitoring treatment effectiveness, identifying side effects; assessing the prognosis, choosing a treatment method, obtaining new antibodies).
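Bayes' theorem gives one way to see why the same marker is more useful in a risk group than in the general population. The positive predictive value (PPV), the probability that a marker-positive person actually has the disease, depends on the prevalence of the disease as well as on sensitivity and specificity. The sketch below uses a hypothetical marker with 70% sensitivity at 95% specificity and two illustrative prevalence values.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical marker and prevalences, for illustration only.
    for label, prevalence in [("general population", 0.001), ("risk group", 0.05)]:
        ppv = positive_predictive_value(0.70, 0.95, prevalence)
        print(f"{label}: PPV = {ppv:.2f}")
```

With these numbers, only about 1% of positive results in the general population are true cases, compared with about 42% in the risk group. This illustrates why screening is restricted to groups in which the disease is more likely from the outset.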

2. Monitoring the course of the disease. Currently, tumor markers are most widely used for these purposes. A sign of successful radical surgery is a persistent decrease in marker concentration. Its subsequent increase indicates, depending on the time and rate of growth, the presence of a residual tumor, the occurrence of a relapse or isolated metastasis.
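The monitoring logic described above (a persistent decrease after surgery versus a later rise) can be expressed as a simple rule over serial measurements. The sketch below is an illustration only: the rule and its parameters (`rise_factor`, `min_consecutive_rises`) are hypothetical and are not clinical criteria. In this sketch, a "sustained rise" means that the most recent measurements increased several times in a row and the latest value is at least twice the nadir, the lowest value reached after surgery.

```python
def assess_marker_trend(values, rise_factor=2.0, min_consecutive_rises=2):
    """Classify serial post-operative tumor-marker measurements
    (hypothetical rule, for illustration only)."""
    if not values:
        raise ValueError("need at least one measurement")
    nadir_index = min(range(len(values)), key=values.__getitem__)
    nadir = values[nadir_index]

    # Count consecutive increases at the end of the series, after the nadir.
    trailing_rises = 0
    tail = values[nadir_index:]
    for previous, current in zip(tail, tail[1:]):
        trailing_rises = trailing_rises + 1 if current > previous else 0

    if trailing_rises >= min_consecutive_rises and values[-1] >= rise_factor * nadir:
        return "sustained rise after nadir"
    if all(current <= previous for previous, current in zip(values, values[1:])):
        return "persistent decrease"
    return "indeterminate"


if __name__ == "__main__":
    print(assess_marker_trend([100, 40, 10, 5, 5]))   # persistent decrease
    print(assess_marker_trend([100, 20, 5, 8, 15]))   # sustained rise after nadir
    print(assess_marker_trend([100, 20, 5, 8, 6]))    # indeterminate
```

As the text notes, a rise is interpreted according to when it occurs and how fast the marker grows. Those judgments remain clinical and are not captured by this rule.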

3. Predicting the course of the disease and determining treatment tactics. The levels of many tumor markers correlate with the volume of the primary tumor and rise sharply with local and distant metastasis. For example, in chronic lymphocytic leukemia, the serum level of deoxythymidine kinase (thymidine kinase) correlates with the course of the disease (stable or progressive).

To predict the course of the disease, the expression of markers of proliferative activity is also determined: Ki-67 protein, PCNA, cyclins (for example, cyclin D1) and inhibitors of cyclin-dependent kinases. The levels of proteins that regulate apoptosis (Bcl-2, Bcl-x, Bax, Bak, etc.) are of great prognostic significance. The apoptosis inhibitors survivin and telomerase have recently been studied intensively. Increased concentrations of these molecules have been demonstrated in tumors at many, though not all, sites, and their expression level correlates with the stage of tumor development. For some types of carcinoma, the course of the disease has been shown to correlate with the level of p53 expression and with the number of mutant forms of this protein.

The type of marker studied and the significance of the result depend on the histological structure and location of the tumor, and the final conclusion is made after a comprehensive assessment together with other factors. An important task in oncology is to identify the signaling pathways involved in carcinogenesis. The role of apoptosis-regulating proteins in this process, such as p53 and proteins of the Bcl-2 family, is beyond doubt. Functional proteomics focuses on the interactomes of these proteins, in other words, on reconstructing the molecular interactions in which these proteins are involved.

The main problem in introducing oncoproteomics into practice is the difficulty of training oncologists to read oncotranscriptome and oncoproteome maps.

Types of protein molecules and features of interactomes

Although some proteins carry out their functions independently, the vast majority require highly specific interactions with other proteins to exhibit their biological activity. Examples of protein-protein interactions found in complex biological systems include:

Protein-protein interactomes in strictly defined cellular compartments;

Messenger proteins that interact with receptors on the outer surface of the cell membrane, which is a necessary condition for triggering signaling cascades;

Proteins that form network and structural links at the intercellular level;

Enzyme inhibitors;

BULLETIN OF RAMS /2013/ No. 1

Modification of proteins by enzymes (often followed by denaturation);

Interactions between protein subunits within multimeric complexes that give rise to allosteric effects;

Protein-protein interactions underlying the motor functions of individual organelles, organs or the body as a whole (muscle contraction).

Protein interactions are usually subdivided into stable and transient, and both types can be mediated by either strong or weak intermolecular bonds.

Stable interactions are observed in proteins built from several subunits or polypeptide chains. Typical examples of protein complexes consisting of several stably linked polypeptide chains are hemoglobin and polymerases.

Transient protein-protein interactions control most intra- and extracellular signaling processes. They usually require a specific set of conditions, such as phosphorylation, a conformational change, or localization to a discrete region of the cell. Transiently interacting proteins take part in a wide range of cellular processes, including catalytic protein modification, transport, storage, signaling, regulation, and receptor and motor functions.

Transient protein-protein interactions also occur during the transport of proteins through membrane pores, during conformational changes of native proteins, at certain stages of the translation cycle, and during the reorganization of cellular structures over the cell cycle (cytoplasmic microfilaments, the nuclear pore complex, etc.).

Proteins can bind to each other through hydrophobic and polar interactions, van der Waals forces and ionic bridges between the binding domains of each partner. These domains may be a small patch of the protein surface made up of only a few amino acid residues. On the other hand, binding regions formed by long polypeptide segments spanning hundreds of amino acids are also widespread; the strength of binding depends on the size and properties of the binding domain. One of the most common dimerization motifs, which stabilizes the resulting complex, is the leucine zipper.

In the leucine zipper, leucine occurs at approximately every seventh position of an α-helix. Because an α-helix makes one turn every 3.6 residues, seven residues span almost exactly two turns, so these leucines line up along one side of the helix, forming an amphipathic helix with one hydrophobic face. The leucine zipper then forms a dimeric protein by joining two parallel α-helices along their hydrophobic faces, like the two halves of a zipper.
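The geometry behind the seven-residue spacing can be checked with a helical-wheel projection. The sketch below assumes an idealized α-helix with 3.6 residues per turn (100° of rotation per residue); the function names are illustrative and not taken from any particular package. It measures how far each repeated residue drifts around the helix from the first one.

```python
DEGREES_PER_RESIDUE = 100.0  # idealized alpha-helix: 360 degrees / 3.6 residues per turn


def wheel_angle(position: int) -> float:
    """Angle (0 to 360 degrees) of a residue on a helical-wheel projection."""
    return (position * DEGREES_PER_RESIDUE) % 360.0


def drift_from_first(spacing: int, repeats: int) -> list[float]:
    """Angular distance from residue 0 of residues placed every `spacing` positions."""
    drifts = []
    for k in range(repeats):
        angle = wheel_angle(k * spacing)
        drifts.append(min(angle, 360.0 - angle))  # shortest way around the wheel
    return drifts


if __name__ == "__main__":
    print(drift_from_first(7, 4))  # [0.0, 20.0, 40.0, 60.0]
    print(drift_from_first(8, 4))  # [0.0, 80.0, 160.0, 120.0]
```

With a spacing of seven, the repeated residues stay within a narrow sector (in real zippers the small drift is roughly absorbed by a slight supercoiling of the two helices), whereas a spacing of eight scatters them around the helix.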

The two Src homology (SH) domains, SH2 and SH3, are examples of transient binding domains; they bind short peptide sequences and are commonly found in signaling proteins. The SH2 domain recognizes only peptide sequences containing phosphorylated tyrosine residues, a sign of an activated protein. In growth factor signaling, for example, ligand binding to the receptor triggers phosphorylation of tyrosine residues on the receptor, and these phosphotyrosines are then recognized by the SH2 domains of downstream signaling proteins. SH3 domains typically recognize proline-rich peptide sequences and are often found in enzymes such as kinases, phospholipases and GTPases, where they serve to recognize target proteins.
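The proline-rich sequences recognized by SH3 domains are often summarized by the minimal core motif PxxP (proline, any two residues, proline). The sketch below simply scans a one-letter sequence for that core. This is a deliberately crude assumption: real SH3 specificity also depends on flanking residues (for example, a nearby arginine), and the test sequence is hypothetical.

```python
import re

# Minimal SH3-binding core: proline, any two residues, proline. The lookahead lets
# overlapping occurrences (common in proline-rich stretches) all be reported.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence: str) -> list[tuple[int, str]]:
    """(0-based start, matched motif) for every PxxP core in a one-letter sequence."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_pxxp("PPLPPKP"))  # [(0, 'PPLP'), (1, 'PLPP'), (3, 'PPKP')]
```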

Conclusion

Proteomics, although a fundamental science, is indispensable for solving a number of practical medical and applied scientific problems. Analysis of various body fluids with modern proteomic techniques can give the diagnostician enough information for an unambiguous diagnosis or for assessing a particular patient's risk of a particular disease. Algorithms for preclinical and clinical monitoring of patients that combine laboratory diagnostic procedures (genomic, transcriptomic and proteomic analyses) with bioinformatic data processing and analysis are the key to detecting a pathological condition at the latent stage, verifying the diagnosis, predicting the type and course of the disease, and monitoring the patient's response to the chosen therapy.

LITERATURE

1. Alaoui-Jamali M.A., Xu Y.J. Proteomic technology for biomarker profiling in cancer: an update. J. Zhejiang Univ. Sci. B. 2006; 6: 411-420.

2. Introduction to molecular diagnostics. Ed. M.S. Paltseva. M.: Medicine. 2010. 368 p.

3. Anderson N.L., Anderson N.G. The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics. 2002; 11: 845-867.

4. Sturgeon C. Perspectives in clinical proteomics conference: translating clinical proteomics into clinical practice. Expert Rev. Proteomics. 2010; 4: 469-471.

5. McGregor E., Dunn M.J. Proteomics of heart disease. Hum. Mol. Genet. 2003; 2: 135-144.

6. Edwards A.V., White M.Y., Cordwell S.J. The role of proteomics in clinical cardiovascular biomarker discovery. Mol. Cell. Proteomics. 2008; 10: 1824-1837.

7. Bowler R.P., Ellison M.C., Reisdorph N. Proteomics in pulmonary medicine. Chest. 2006; 2: 567-574.

A.A. ZAMYATNIN, Doctor of Biological Sciences, A.N. Bach Institute of Biochemistry, RAS

Our story is dedicated to one of the youngest fundamental sciences (if not the youngest), born just a few years ago, at the same time as children who are still in elementary school. Unlike many other sciences, for proteomics we can say exactly under what circumstances it arose, in what year its name appeared and who coined it.

Let's start with the circumstances. In the second half of the 20th century, the analytical methods of biochemistry and molecular biology, as well as computer technology, developed rapidly. The remarkable advances in these fields made it possible to decipher enormous sequences of nucleic acid bases and to record the complete genome of a living organism. The first complete genome to be deciphered, in 1977, was that of bacteriophage phi X-174 (about 5 × 10³ bases), followed by that of the first bacterium, Haemophilus influenzae (1.8 × 10⁶ bases). By the end of the 20th century, the enormous work of deciphering the complete human genome, that is, determining the sequence of approximately 3 billion nucleic acid bases, had been completed. Several billion dollars were spent on this work (about one dollar per base). In total, the genomes of several dozen species of living organisms have already been deciphered. It was during this period that two new biological sciences emerged: the word “genomics” first appeared in the scientific press in 1987, and “bioinformatics” in 1993.

In each biological species, part of the genome consists of regions encoding the amino acid sequences of proteins. In humans, for example, there are about 100,000 such regions (by some estimates up to 300,000, and several million if chemically modified structures are taken into account). It might seem that, knowing the complete genome and the genetic code, one could obtain all the information about protein structure simply by translation. However, things are not so simple. It gradually became apparent that in a given cellular system of the body there is no direct correlation between the sets of mRNAs and proteins. In addition, many proteins synthesized on ribosomes according to the nucleotide sequence undergo chemical modification after synthesis and can exist in the body in both modified and unmodified forms. Just as important, proteins have a variety of spatial structures that cannot yet be derived from linear sequences of nucleotides or even amino acids. Direct isolation and structure determination of all functioning proteins therefore remains an urgent task (to date, the structure has been determined directly for only about 10% of human proteins). Thus, alongside genomics, the term “proteomics” appeared; its object of study is the proteome (a blend of PROTEins and genOME). The proteome was first mentioned in the scientific press in 1995.

It should be added that numerous short fragments of protein precursors, called oligopeptides or simply peptides, play a major role in the life of organisms. They are the reason for such large discrepancies in estimates of the number of protein-peptide components in members of the same biological species. Therefore, along with the terms “proteome” and “proteomics”, the terms “peptidome” and “peptidomics” are now used for the corresponding parts of the proteome and of proteomics. We discussed the diversity of the structures and functions of proteins and peptides earlier in the pages of the newspaper Biology.

So, let's define these new sciences, which have emerged within the lifetime of today's young generation and are closely interconnected (Fig. 1).

Fig. 1. Diagram illustrating the close relationship between the three new biological sciences

Genomics is a science that studies the structure and functions of genes (a genome is the totality of all the genes of an organism).

Bioinformatics is a science that deals with the study of biological information using mathematical, statistical and computer methods.

Proteomics is a science that studies the totality of proteins and their interactions in living organisms (proteome is the totality of all proteins in an organism).

Note also that proteomics broadly includes structural proteomics, functional proteomics, and applied proteomics, which we will discuss separately.

Structural proteomics

The most striking feature of biology is diversity. It is visible at all levels of biological organization (biological species, morphology, chemical structure of molecules, network of regulatory processes, etc.). This fully applies to proteins. The scale of their structural diversity has not yet been fully revealed. Suffice it to say that the number of amino acid residues in one protein can range from two (the minimum structure having a peptide bond) to tens of thousands, and the human titin protein contains 34,350 amino acid residues and is currently the record holder for the largest of all known protein molecules.

To obtain information about the proteome, it must first be isolated and purified from other molecules. Since the number of proteins in the entire proteome (i.e., in the whole organism) is very large, researchers usually take only part of the organism (an organ or tissue) and isolate its protein component using various methods. Over the nearly 200-year history of protein research, many isolation methods have been developed, from simple salt precipitation to modern complex techniques that exploit the various physical and chemical properties of these substances. Once a pure fraction of an individual protein has been obtained, its chemical structure is determined.

In structural proteomics, the structure of not one, but many proteins is determined at once, and to date, a special series of procedures has been developed for this and an arsenal of corresponding high-precision instruments has been created. (A complete set of equipment for proteomic research costs more than one million dollars.)

Fig. 2. Proteomics tools

Figure 2 shows a diagram of the laboratory cycle from sample preparation to structure determination. After isolation and purification (the figure starts from an already isolated and purified preparation), proteins are separated by two-dimensional electrophoresis. The separation proceeds in two directions: in one, protein molecules are separated by mass; in the other, by total electric charge (more precisely, by isoelectric point). As a result of this delicate procedure, identical molecules are grouped together on a special carrier (a gel), forming macroscopic spots, each ideally containing only identical molecules. The number of spots, i.e., of different proteins or peptides, can reach many thousands (Figs. 3 and 4), and automatic devices are used to process and analyze them. The spots are then excised, and the substances they contain are introduced into a complex physical instrument, a mass spectrometer, which is used to determine the chemical (primary) structure of each protein.
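The text does not describe how the mass spectrometer arrives at a structure. A common workflow, assumed here, is to digest each protein (usually with trypsin) and compare the measured peptide masses with masses computed from candidate sequences. The sketch below computes the monoisotopic mass of an unmodified peptide from standard residue masses and applies a simplified trypsin rule (cleave after K or R, but not before P); real search software also handles modifications, charge states and missed cleavages, none of which is modeled here.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.00728   # added for the singly protonated ion [M+H]+


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide (one-letter code)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def trypsin_digest(protein: str) -> list[str]:
    """Simplified trypsin rule: cut after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal piece without K/R
    return peptides


if __name__ == "__main__":
    print(trypsin_digest("RPPGFSPFR"))                         # ['RPPGFSPFR']
    print(round(monoisotopic_mass("RPPGFSPFR"), 2))            # 1059.56
    print(round(monoisotopic_mass("RPPGFSPFR") + PROTON, 2))   # 1060.57
```

For bradykinin (RPPGFSPFR), the only tryptic product is the intact peptide, because its first arginine is followed by proline.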

Fig. 3. Example of a two-dimensional electropherogram of proteins from a mouse liver extract

Fig. 4. Example of a two-dimensional electropherogram of peptides from human cerebrospinal fluid

Fig. 5. Nucleotide sequence of the gene encoding human serum albumin

The primary structure of a protein can also be determined from the results of genomics and bioinformatics. Figure 5 shows the complete coding sequence of the human serum albumin gene. It contains 1830 nitrogenous bases, i.e., 610 codons. Like the vast majority of other coding sequences, it begins with the ATG codon, which encodes a methionine residue, and ends with a stop codon, in this case TAA. Since the stop codon does not encode an amino acid, the sequence encodes a chain of 609 amino acid residues (Fig. 6). However, this chain is not yet a serum albumin molecule, only its precursor. The first 24 amino acid residues (the signal peptide together with a short propeptide) are cleaved off as the protein matures on its way to secretion, and only then is the serum albumin molecule formed, in the form obtained when this protein is isolated. As a result, the mature molecule contains 585 amino acid residues.
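The length arithmetic in this paragraph, and the translation step itself, can be sketched in a few lines. The sketch assumes the standard genetic code and a coding sequence that starts at its first base; the short open reading frame in the example is hypothetical, not the albumin gene from Fig. 5.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str) -> str:
    """Translate a reading frame codon by codon, stopping at the first stop codon."""
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3]]  # KeyError for non-ACGT characters
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def residue_counts(coding_nt: int, removed_n_terminal: int) -> tuple[int, int, int]:
    """Number of codons, precursor length and mature length for a coding sequence."""
    codons = coding_nt // 3
    precursor = codons - 1                 # the last codon is the stop codon
    mature = precursor - removed_n_terminal
    return codons, precursor, mature


if __name__ == "__main__":
    print(translate("ATGAAGTGGTAA"))   # MKW
    print(residue_counts(1830, 24))    # (610, 609, 585)
```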

Fig. 6. Amino acid sequence of the human serum albumin precursor translated from the nucleotide sequence using the genetic code

Fig. 7. Spatial (tertiary) structure of the human serum albumin molecule

However, the amino acid sequence does not reveal the spatial structure of the protein. Thermodynamically, an extended linear chain is unfavorable, so it folds in a sequence-specific manner into a unique spatial structure, which can be determined by two powerful physical methods: X-ray diffraction analysis and nuclear magnetic resonance (NMR) spectroscopy. The first of these has been used to determine the spatial structures of several thousand proteins, including human serum albumin, shown in Fig. 7. This structure, in contrast to the primary structure (the amino acid sequence), is called tertiary; its helical segments, which are elements of the secondary structure, are clearly visible.

Thus, the task of structural proteomics comes down to the isolation, purification, determination of the primary, secondary and tertiary structures of all proteins of a living organism, and its main tools are two-dimensional electrophoresis, mass spectrometry and bioinformatics.

Bioinformatics of proteins

The existence of a huge number of different proteins has created the need for information arrays, databases (or data banks), into which all known information about them is entered. Many general and specialized databases are now freely available on the Internet. General databases contain information about all known proteins of living organisms, i.e., about the global proteome of all living things. One example is SwissProt-TrEMBL (Switzerland-Germany), which currently contains almost 200,000 protein structures determined by analytical methods and almost 2 million more obtained by translating nucleotide sequences. Figures 8 and 9 show how many known proteins have each given number of amino acid residues. The x-axes of these graphs are limited to 2000 residues, but, as mentioned above, significantly larger molecules do occur, although rarely. The figures show that most proteins contain several hundred amino acid residues. These include enzymes and other fairly mobile molecules. Among the larger proteins, many perform supporting or protective functions, holding biological structures together and giving them strength.

Fig. 8. Distribution of known (isolated) proteins by number of amino acid residues

Fig. 9. Distribution of translated amino acid sequences by number of amino acid residues

Fig. 10. Distribution of known natural oligopeptides by number of amino acid residues

In the global proteome, a special place is occupied by small, very mobile molecules containing no more than 50 amino acid residues and possessing a specific spectrum of functional activity. They are called oligopeptides, or simply peptides. For them, i.e., for the global peptidome, a special data bank called EROP-Moscow has been created. EROP is an abbreviation of Endogenous Regulatory OligoPeptides, and “Moscow” indicates that the bank was created and is maintained in our nation's capital. To date, the structures of almost 6000 oligopeptides isolated from representatives of all kingdoms of living organisms have been deciphered. As with large proteins, the number of oligopeptides with a given number of amino acid residues can be plotted (Fig. 10). The graph shows that the most common oligopeptides contain approximately 8–10 amino acid residues. Most of these are molecules involved in the regulation of the nervous system, which are therefore called neuropeptides. The fastest processes in a living organism involve the nervous system, so peptide regulators must be mobile and therefore small. It should be noted, however, that because of the enormous structural and functional diversity of both proteins and peptides, no strict classification has yet been created for them.
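Plots like those in Figs. 8 to 10 are length histograms, and the 50-residue limit mentioned above is a simple length cut-off. A minimal sketch, assuming the sequences are already available as one-letter strings (no particular database export format or API is assumed); the 10-residue bin width and the example inputs are illustrative choices.

```python
from collections import Counter

PEPTIDE_MAX_LENGTH = 50  # oligopeptides: no more than 50 amino acid residues


def classify(sequence: str) -> str:
    """Label a sequence as an oligopeptide or a protein by length alone."""
    return "oligopeptide" if len(sequence) <= PEPTIDE_MAX_LENGTH else "protein"


def length_histogram(sequences: list[str], bin_width: int = 10) -> dict[int, int]:
    """Number of sequences per length bin; each key is the lower edge of its bin."""
    counts = Counter((len(s) // bin_width) * bin_width for s in sequences)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = ["RPPGFSPFR", "DYMGWMDF", "A" * 12, "G" * 60]  # illustrative inputs only
    print(length_histogram(sample))        # {0: 2, 10: 1, 60: 1}
    print([classify(s) for s in sample])   # ['oligopeptide', 'oligopeptide', 'oligopeptide', 'protein']
```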

Thus, in this case, the tasks of bioinformatics are the accumulation of information about the physicochemical and biological properties of proteins, the analysis of this information, cataloging and preparation of an information base and computing tools to identify the mechanisms of their functioning.

Functional proteomics

The presence of a particular protein in the body gives reason to assume that it has (or had) a certain function, and the entire proteome serves to ensure the full functioning of the entire organism. Functional proteomics deals with determining the functional properties of the proteome, and the problems it solves are much more complex than, for example, determining protein-peptide structures.

Clearly, the proteome functions in a multicomponent environment that also contains many molecules of other chemical classes: sugars, lipids, prostaglandins, various ions and many others, including water. Terms such as “sugarome” and “lipidome” may well appear in time. Protein molecules interact with other or similar structures around them, which ultimately produces functional responses, first at the molecular level and then at the macroscopic level. Many such processes involving proteins are already known. They include the interaction of an enzyme with a substrate, an antigen with an antibody, peptides with receptors, toxins with ion channels, etc. (receptors and ion channels are themselves protein structures). To identify the mechanisms of these processes, both experimental studies of individual participants in the interaction and systemic studies using bioinformatics are carried out. Let's look at a few examples of such systemic approaches.

Figure 11 shows representatives of the human proteome (in this case, the peptidome): various gastrins and cholecystokinins localized in the gastrointestinal tract (the amino acid sequences are written in the standard one-letter code, which we explained earlier). The functionally important parts of these peptides are their very similar right-hand (C-terminal) regions. However, the peptides have directly opposite behavioral effects: gastrins make a person feel hungry, and cholecystokinins make a person feel full. Apparently, this difference arises because in the primary sequence of cholecystokinins the tyrosine (Y) residue is shifted by one position relative to gastrins. The same figure shows the primary structure of the peptide cionin from the primitive chordate Ciona intestinalis (Fig. 12). Its structure is homologous to both gastrins and cholecystokinins, and it contains two tyrosine residues located in the same positions as in these two peptides. Unfortunately, its functional properties have not been studied. Appropriate experiments could answer the question of what role the chemical structure in general, and tyrosine residues in particular, plays in producing opposite physiological effects.
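The one-position shift of tyrosine can be made explicit by counting from the C-terminus. The sequences below are textbook C-terminal octapeptides of human gastrin-17 and of cholecystokinin-8, not strings read from Fig. 11, and the one-letter code ignores modifications such as tyrosine sulfation and C-terminal amidation.

```python
# C-terminal octapeptides in one-letter code (modifications not represented).
GASTRIN_C8 = "EAYGWMDF"   # last eight residues of human gastrin-17
CCK_8 = "DYMGWMDF"        # cholecystokinin octapeptide (CCK-8)


def tyr_positions_from_c_end(seq: str) -> list[int]:
    """Positions of tyrosine (Y) counted from the C-terminus (last residue = 1)."""
    return [len(seq) - i for i, aa in enumerate(seq) if aa == "Y"]


def shared_c_terminal_tail(a: str, b: str) -> str:
    """Longest common right-hand (C-terminal) segment of two sequences."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


if __name__ == "__main__":
    print(shared_c_terminal_tail(GASTRIN_C8, CCK_8))   # GWMDF
    print(tyr_positions_from_c_end(GASTRIN_C8))        # [6]
    print(tyr_positions_from_c_end(CCK_8))             # [7]
```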

Fig. 11. Primary structures of representatives of the human peptidome compared with the structure of a tunicate peptide

Fig. 12. The tunicate Ciona intestinalis, which lives in the North Sea

Another example: Fig. 13 shows the amino acid sequences of very similar molecules, which also form a structurally homologous family. These molecules are found in evolutionarily very distant organisms, from insects to mammals. The first line gives the primary structure of bradykinin, which contains 9 amino acid residues and is found in many higher organisms, including humans. Over the years, chemists have synthesized various non-natural analogues of this molecule to find out which part of it is responsible for interacting with the receptor. About 30 years ago, all possible fragments of bradykinin were even synthesized: 8 dipeptides, 7 tripeptides and so on (36 in all, counting the complete molecule), and their activity was then measured in the same biological test. The result was trivial: only the entire molecule showed maximum activity, while each fragment on its own had either trace or zero activity. This laborious work would have been unnecessary if the other bradykinins shown in Fig. 13 had been known at that time and retrieved from the global proteome using bioinformatics. This structurally homologous family clearly shows that all its members share a region that has remained virtually unchanged during biological evolution (a quasi-conserved region), and that this region is the bradykinin molecule of higher organisms, selected as the most perfect in the course of evolution. The example demonstrates that proteomics, together with bioinformatics, can solve fundamental scientific problems quickly (and cheaply).
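The figure of 36 follows from simple counting: a chain of n residues has n − k + 1 contiguous fragments of length k, so for bradykinin (n = 9) there are 8 dipeptides, 7 tripeptides and so on down to the single intact nonapeptide, n(n − 1)/2 = 36 in total. The sketch enumerates them, assuming the standard bradykinin sequence RPPGFSPFR.

```python
BRADYKININ = "RPPGFSPFR"  # Arg-Pro-Pro-Gly-Phe-Ser-Pro-Phe-Arg


def contiguous_fragments(seq: str, min_length: int = 2) -> dict[int, list[str]]:
    """All contiguous fragments of `seq`, grouped by length (min_length..len(seq))."""
    return {
        k: [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for k in range(min_length, len(seq) + 1)
    }


if __name__ == "__main__":
    fragments = contiguous_fragments(BRADYKININ)
    print({k: len(v) for k, v in fragments.items()})
    # {2: 8, 3: 7, 4: 6, 5: 5, 6: 4, 7: 3, 8: 2, 9: 1}
    print(sum(len(v) for v in fragments.values()))  # 36
```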

Fig. 13. Primary structures of natural bradykinin peptides from various living organisms. Quasi-conserved regions are shown in bold.

Fig. 14. Primary structures of the structurally homologous family of endothelins/toxins

And finally, the third example is the structurally homologous family of mammalian endothelins and snake toxins (Fig. 14). Despite the striking similarity of their structures, their functional properties differ dramatically: some are very useful regulators of vascular contraction, while others are deadly. Here the primary structure does not contain enough information to explain the difference in function, and a closer look at the spatial (tertiary) structure is needed. Figures 15 and 16 show the spatial structures of two members of this family, endothelin-1 and sarafotoxin 6b, obtained by NMR spectroscopy. In the figures, they are rotated to achieve maximum spatial similarity, but no rotation makes them coincide completely. Consequently, despite the great similarity of their primary structures, these peptides interact with different receptor structures and therefore produce different physiological effects.
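The figures were rotated “to achieve maximum spatial similarity”, but the article does not say how. One standard way to make such a comparison quantitative (an assumption here, not the method used for Figs. 15 and 16) is least-squares superposition with the Kabsch algorithm, which reports the remaining root-mean-square deviation (RMSD). The sketch assumes the two structures have already been reduced to matched atom pairs, for example the Cα atoms of aligned residues; the coordinates in the example are random, not real NMR data.

```python
import numpy as np


def superposition_rmsd(P, Q) -> float:
    """RMSD of matched (N, 3) coordinate sets after optimal rigid superposition.

    Kabsch algorithm: centre both sets to remove translation, then use the
    SVD of their cross-covariance matrix to find the proper rotation (no
    mirror images) that minimises the summed squared deviations.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c                                       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0   # rule out reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T               # rotates P_c onto Q_c
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.normal(size=(10, 3))                  # hypothetical matched atoms
    t = np.pi / 3
    Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
                   [np.sin(t), np.cos(t), 0.0],
                   [0.0, 0.0, 1.0]])
    Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])     # same shape, rotated and shifted
    print(superposition_rmsd(P, Q))               # essentially zero
    mirror = P * np.array([-1.0, 1.0, 1.0])       # mirror image of P
    print(superposition_rmsd(P, mirror))          # clearly non-zero
```

An RMSD that stays well above zero after the best rotation is the numerical counterpart of “complete homology cannot be obtained by any rotation”.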

Fig. 15. Spatial structure of the human vasoconstrictor peptide endothelin-1

Fig. 16. Spatial structure of sarafotoxin 6b from the Israeli snake Atractaspis engaddensis

Of course, a few specific examples cannot fully characterize the diversity of functional proteomics. Building a picture of the huge network of interactions between proteins and other molecules in the body requires enormous work and all the tools of modern bioinformatics. In fact, this work is only beginning. However, there is reason to believe that our knowledge in this area will grow rapidly every year.

Fig. 17. General outline of the map of carboxylic acid metabolism

One of the first successes on this path is the map of carboxylic acid metabolism created at the A.N. Bach Institute of Biochemistry of the Russian Academy of Sciences (Fig. 17). This map represents a network of reactions with a regular, periodic structure. The approach works because functionally similar metabolites undergo similar biochemical transformations, forming functionally similar derivatives. On the map, the vertical columns contain compounds with the same number of carbon atoms (from 1 to 10), and the horizontal rows contain series of functionally similar metabolites. The chemical structures are connected by numerous arrows indicating which enzymes (proteins) catalyze the corresponding chemical transformations. Isn't this approach reminiscent of D.I. Mendeleev's periodic table of chemical elements? And, like Mendeleev's system, this map has predictive power: with its help, a number of new enzymes were predicted and subsequently discovered experimentally.
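The predictive logic of such a map can be illustrated with a toy grid. In the sketch below everything is hypothetical: rows are two metabolite classes named class_A and class_B, columns are carbon counts from 1 to 10, and a single reaction type A_to_B converts a class_A compound into the class_B compound with the same number of carbon atoms. If that conversion is already known for several columns, the sketch flags columns where both compounds exist but the enzyme has not yet been found. This is a simplified reading of the idea, not the method used to build the map in Fig. 17.

```python
# Hypothetical toy data for a periodic metabolic map.
# Cells are (row, carbon count); a row is a class of functionally similar metabolites.
KNOWN_METABOLITES = (
    {("class_A", n) for n in range(2, 8)}
    | {("class_B", n) for n in range(2, 8)}
)
# A reaction type converts one row into another without changing the carbon count.
REACTION_TYPES = {"A_to_B": ("class_A", "class_B")}
# Conversions for which an enzyme is already known: (reaction type, carbon count).
KNOWN_REACTIONS = {("A_to_B", n) for n in (2, 3, 4, 6)}


def predict_missing(min_support: int = 3) -> list[tuple[str, int]]:
    """Cells where a well-supported reaction type is expected but not yet recorded."""
    predictions = []
    for name, (substrate_row, product_row) in REACTION_TYPES.items():
        support = sum(1 for reaction, _ in KNOWN_REACTIONS if reaction == name)
        if support < min_support:
            continue  # too few examples to trust the pattern
        for n in range(1, 11):  # columns: 1 to 10 carbon atoms
            both_exist = ((substrate_row, n) in KNOWN_METABOLITES
                          and (product_row, n) in KNOWN_METABOLITES)
            if both_exist and (name, n) not in KNOWN_REACTIONS:
                predictions.append((name, n))
    return predictions


if __name__ == "__main__":
    print(predict_missing())  # [('A_to_B', 5), ('A_to_B', 7)]
```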

Similar schemes can be extended to other metabolic processes (for example, carbohydrates, amino acids, etc.), and also used to search for new metabolites of biochemical reactions.

Thus, functional proteomics studies the complex relationships between the structure and function of the proteome.

Practical proteomics

So, the main task of proteomics is to uncover how the huge number of proteins and peptides in a single organism interact. What is the practical significance of this grandiose and expensive work? Pharmacologists and physicians are the first to be interested in its results, since changes in protein composition are very often closely linked to disease. New data from proteomics will therefore be used (and already are being used) to develop new drugs and new treatments quickly for diseases that medicine has been fighting for centuries. Today, 95% of all pharmacological agents act on proteins. With its systems approach, proteomics can help identify newly emerging proteins and assess their importance much more efficiently, which in turn will accelerate the development of new diagnostic tests and therapeutics.

The first practical application of proteomic research took place long before the term “proteomics” appeared, back in the early 20th century, when the role of insulin in the development of such a serious disease as diabetes was discovered. The creation of insulin drugs saved the lives of millions of people.

At present, proteomics, together with genomics and bioinformatics, is focused on creating new drugs (Fig. 18) for which particular proteins serve as molecular targets. Candidate drug targets are sought with bioinformatics, and the object of analysis is the gene. However, after the genome has been analyzed, evidence is needed that the corresponding protein is actively expressed and functional in the cell. Proteomics provides this evidence. In this way, the molecular genetic target for the drug is identified.

Fig. 18. The relationship between genomics, proteomics and bioinformatics in designing new drugs

It should be noted that proteomics can also find targets on its own. If we obtain proteomic maps (like those in Fig. 3 or 4) of normal and pathological tissues, the differences between them show which proteins are important for the development of a given pathological condition; these proteins can then be selected as targets or used for diagnosis. In the future, proteomic blood maps may well be added to routine blood tests. This would require clinics to have special equipment and patients to give blood samples periodically. If a disease then develops, the patient's proteomic map would only need to be compared with his own map compiled while he was healthy to reveal the changes in the protein composition of the blood and determine the cause of the disease. Such comparisons of the proteomes of tumor and normal cells, or of cells before and after exposure to certain (for example, physical or chemical) factors, and the use of biological fluids for diagnostic purposes are all of great interest and open up completely new prospects for medicine, veterinary medicine, pharmacology, the food industry and other fields. There is enormous and interesting work ahead.
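The map comparison described here can be sketched as follows, assuming the spots on the two maps have already been matched and their intensities normalized (steps that real image-analysis software must perform and that are not shown). The spot identifiers, intensities and the two-fold cut-off are hypothetical.

```python
import math


def compare_maps(normal: dict[str, float], disease: dict[str, float],
                 min_fold: float = 2.0) -> dict[str, list]:
    """Compare two proteomic maps given as {spot_id: normalized intensity}.

    Spots present on both maps are reported when their intensity changes at
    least `min_fold` times in either direction (expressed as a log2 ratio);
    spots found on only one map are listed separately.
    """
    threshold = math.log2(min_fold)
    changed = []
    for spot in sorted(normal.keys() & disease.keys()):
        log2_ratio = math.log2(disease[spot] / normal[spot])
        if abs(log2_ratio) >= threshold:
            changed.append((spot, round(log2_ratio, 2)))
    return {
        "changed": changed,
        "only_in_disease": sorted(disease.keys() - normal.keys()),
        "only_in_normal": sorted(normal.keys() - disease.keys()),
    }


if __name__ == "__main__":
    healthy = {"s1": 100.0, "s2": 50.0, "s3": 80.0, "s5": 30.0}
    sick = {"s1": 110.0, "s2": 200.0, "s3": 20.0, "s4": 60.0}
    print(compare_maps(healthy, sick))
    # {'changed': [('s2', 2.0), ('s3', -2.0)], 'only_in_disease': ['s4'], 'only_in_normal': ['s5']}
```

The same function covers the scenario of comparing a patient's map with his own map recorded while he was healthy: the two inputs are simply the earlier and the later map.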

References

1. Sanger F., Air G.M., Barrell B.G., Brown N.L. et al. Nucleotide sequence of bacteriophage phi X-174 DNA.//Nature. 1977. V. 265, No. 5596. P. 687–695.

2. Fleischmann R.D., Adams M.D., White O. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.//Science. 1995. V. 269, No. 5223. P. 496–512.

3. Nature. 2001. 409, No. 6822 (most of the journal issue is devoted to deciphering the human genome).

4. Ferguson-Smith A.C., Ruddle F.H. The genomics of human homeobox-containing loci.//Pathol. Immunopathol. Res. 1988. V. 7, No. 1–2. P. 119–126.

5. Franklin J. Bioinformatics changing the face of information.//Ann. N.Y. Acad. Sci. 1993. V. 700. P. 145–152.

6. Wasinger V.C., Cordwell S.J., Cerpa-Poljak A. et al. Progress with gene-product mapping of the Mollicutes: Mycoplasma genitalium.//Electrophoresis. 1995. V. 16, No. 7. P. 1090–1094.

7. Zamyatnin A.A. The brilliant world of proteins and peptides.//Biology. 2002. No. 25–26. P. 8–13.

8. Gorg A., Weiss W., Dunn M.J. Current two-dimensional electrophoresis technology for proteomics.//Proteomics. 2004. V. 4, No. 12. P. 3665–3685.

9. Ramstrom M., Bergquist J. Miniaturized proteomics and peptidomics using capillary liquid separation and high resolution mass spectrometry.//FEBS Lett. 2004. V. 567, No. 1. P. 92–95.

10. http://au.expasy.org/sprot/

11. http://erop.inbi.ras.ru/

12. Malygin A.G. Metabolism of carboxylic acids (periodic scheme). – M.: “International Education Program”, 1999.

Proteomics is a field of science whose main subject of study is the proteome, the entire set of proteins produced or modified by an organism or system. By studying the variety of proteins, proteomics has helped to discover many new proteins, far more than were known before it emerged as a science. The protein content of cells appears to vary with time and with the various demands or stresses to which cells or organisms are exposed. Proteomics is an interdisciplinary field driven largely by recent genome research projects. It studies proteomes at the level of overall protein composition, structure and activity. Functional proteomics is often cited as the most important component of functional genomics.

Subject of study

Defining proteomics is not as simple as it might seem at first glance. The term usually refers to large-scale experimental analysis of proteins and proteomes, but it is often used more narrowly to mean protein purification and mass spectrometry.

After genomics and transcriptomics, proteomics is the next step in the study of biological systems. It is much more complex than genomics: an organism's genome is more or less constant, whereas its proteome differs from cell to cell and changes over time. Different genes are expressed in different cell types, so even the basic set of proteins produced in a given cell has to be identified experimentally.

History of study

Proteomics, the study of proteins and their structure, is a relatively recent branch of biochemistry. In the past, protein expression was inferred from RNA analysis, but RNA levels were found to correlate poorly with protein content. mRNA is not always translated into protein, and the amount of protein produced from a given amount of mRNA depends on the gene from which it is transcribed and on the current physiological state of the cell. Proteomics confirms the presence of a protein and provides a direct measure of the amount present.

Post-translational modifications

Differences between mRNA and protein arise not only during translation: many proteins also undergo a wide range of chemical modifications after they are synthesized. Many of these post-translational modifications are critical to protein function.

Phosphorylation

One such modification is phosphorylation, which affects many enzymes and structural proteins during cell signaling. The addition of a phosphate to certain amino acids (most commonly serine and threonine, mediated by serine/threonine kinases, and less commonly tyrosine, mediated by tyrosine kinases) makes the protein a target for binding or interaction with a distinct set of other proteins that recognize the phosphorylated domain.
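To make the chemistry concrete, here is a minimal sketch, assuming monoisotopic masses: each phosphate group adds HPO3, about +79.96633 Da, to a serine, threonine or tyrosine residue. The function names and the example sequence are illustrative only, and in practice a phosphosite is assigned from MS/MS evidence, not merely from the presence of an acceptor residue.

```python
# Monoisotopic mass added by one phosphorylation (HPO3), in daltons.
PHOSPHO_SHIFT = 79.96633

# Residues that kinases commonly phosphorylate (Ser, Thr, Tyr).
PHOSPHO_ACCEPTORS = {"S", "T", "Y"}


def candidate_sites(peptide: str) -> list[tuple[int, str]]:
    """Return 1-based positions and residues that could carry a phosphate."""
    return [(i + 1, aa) for i, aa in enumerate(peptide) if aa in PHOSPHO_ACCEPTORS]


def phospho_mass(unmodified_mass: float, n_phosphates: int) -> float:
    """Peptide mass after adding n phosphate groups (hypothetical helper)."""
    if n_phosphates < 0:
        raise ValueError("n_phosphates must be non-negative")
    return unmodified_mass + n_phosphates * PHOSPHO_SHIFT


if __name__ == "__main__":
    peptide = "LRRASLG"  # illustrative example sequence
    print(candidate_sites(peptide))  # [(5, 'S')]
    print(phospho_mass(1000.0, 2))
```

A mass spectrometer therefore sees a phosphopeptide as the unmodified peptide shifted by multiples of about 80 Da, which is what phosphoproteomic searches look for.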

Because protein phosphorylation is one of the most studied protein modifications, many “proteomic” efforts are aimed at identifying the set of phosphorylated proteins in a specific cell or tissue type under specific circumstances.

Ubiquitination

Ubiquitin is a small protein that can be attached to certain protein substrates by enzymes called E3 ubiquitin ligases. Determining which proteins are polyubiquitinated helps to understand how these proteins and their pathways are regulated. Likewise, once a researcher has determined which substrates are ubiquitinated by each ligase, it is useful to determine the set of ligases expressed in a particular cell type.

Other modifications

In addition to phosphorylation and ubiquitination, proteins can undergo methylation, acetylation, glycosylation, oxidation and nitrosylation, among other modifications. Some proteins undergo all of these modifications, often in time-dependent combinations. This illustrates how difficult it can be to study protein structure and function.

Protein expression also depends on conditions. A cell may make different sets of proteins at different times or under different conditions, for example during development, cell differentiation, the cell cycle or carcinogenesis. As mentioned above, proteome complexity is increased further because most proteins can undergo a wide range of post-translational modifications.

Therefore, a proteomics study can become complex very quickly, even if its topic is narrowly restricted. For more ambitious tasks, such as searching for a biomarker of a specific cancer subtype, a proteomics researcher may choose to study serum samples from many cancer patients to minimize confounding factors. Complex experimental designs are thus sometimes necessary to account for the dynamic complexity of the proteome.

Differences from genomics

Proteomics provides a different level of understanding than genomics for several reasons:

  1. The level of transcription of a gene gives only a rough estimate of its level of translation into protein. An mRNA produced in abundance may be degraded rapidly or translated inefficiently, so that only a small amount of protein results.
  2. As mentioned above, many proteins undergo post-translational modifications that greatly affect their functionality. For example, some proteins are not active until they become phosphorylated. Techniques such as phosphoproteomics and glycoproteomics are used to study post-translational modifications.
  3. Many transcripts give rise to more than one protein, through alternative splicing or alternative post-translational modifications.
  4. Many proteins form complexes with other proteins or RNA molecules and function only in the presence of these partners.
  5. The rate of protein degradation plays an important role in how much of a protein is present.
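The point that mRNA level is only a rough guide to protein level can be illustrated with a deliberately simple model. It assumes a constant translation rate per mRNA and first-order protein degradation, so that at steady state synthesis equals degradation and the protein level is mRNA × translation rate / degradation rate. The `Gene` class and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    mrna_copies: float    # steady-state mRNA copies per cell (assumed)
    k_translation: float  # proteins made per mRNA per hour (assumed)
    k_degradation: float  # first-order protein degradation rate, 1/hour (assumed)


def steady_state_protein(gene: Gene) -> float:
    """Protein copies per cell when synthesis equals degradation:
    k_translation * mRNA = k_degradation * P  =>  P = k_translation * mRNA / k_degradation
    """
    if gene.k_degradation <= 0:
        raise ValueError("degradation rate must be positive")
    return gene.k_translation * gene.mrna_copies / gene.k_degradation


if __name__ == "__main__":
    # Two hypothetical genes with identical mRNA levels but different
    # translation efficiency and protein stability.
    genes = [
        Gene("stable, efficiently translated", 10, 100.0, 0.01),
        Gene("unstable, poorly translated", 10, 5.0, 0.5),
    ]
    for g in genes:
        print(f"{g.name}: {steady_state_protein(g):.0f} proteins per cell")
```

With these made-up numbers, the two genes have the same transcript level yet differ about a thousand-fold in protein abundance, which is why direct protein measurement is needed.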

Reproducibility

One major factor affecting the reproducibility of proteomics experiments is that many more peptides elute simultaneously than mass spectrometers can measure. Because tryptic peptides are selected for fragmentation by data-dependent acquisition, this produces stochastic differences between experiments. Early large-scale analyses of the yeast proteome showed considerable variability between laboratories, presumably due in part to technical and experimental differences between them, but reproducibility has improved in more recent mass spectrometric analyses, particularly at the protein level and with newer instruments such as Orbitrap mass spectrometers.
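The stochastic effect of data-dependent acquisition can be sketched as a toy simulation. It assumes that the instrument fragments only the `top_n` most intense co-eluting precursors and that measured intensities fluctuate slightly from run to run. Peptide names, intensities, the noise level and `top_n` are all hypothetical, and real DDA adds features such as dynamic exclusion over many acquisition cycles.

```python
import random


def dda_run(intensities: dict[str, float], top_n: int, noise: float,
            rng: random.Random) -> set[str]:
    """Simulate one data-dependent acquisition cycle: perturb each co-eluting
    peptide's precursor intensity with multiplicative log-normal noise, then
    select the top_n most intense precursors for fragmentation (MS/MS)."""
    observed = {pep: i * rng.lognormvariate(0.0, noise) for pep, i in intensities.items()}
    ranked = sorted(observed, key=observed.get, reverse=True)
    return set(ranked[:top_n])


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of selected peptides (1.0 = identical)."""
    return len(a & b) / len(a | b) if a | b else 1.0


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical: 200 co-eluting peptides, but time for only 20 MS/MS scans.
    peptides = {f"pep{i}": rng.uniform(1e5, 1e6) for i in range(200)}
    run1 = dda_run(peptides, top_n=20, noise=0.3, rng=rng)
    run2 = dda_run(peptides, top_n=20, noise=0.3, rng=rng)
    print(f"overlap between replicate runs: {jaccard(run1, run2):.2f}")
```

Because many precursors have similar intensities, small fluctuations change which of them cross the top-N cutoff, so replicate runs identify overlapping but not identical peptide sets.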

Research methods

Proteomics uses many methods for studying proteins. Typically, proteins are detected either with antibodies (immunoassays) or by mass spectrometry. When a complex biological sample is analyzed, either a highly specific antibody must be used, as in quantitative dot blot (QDB) analysis, or the sample must first be separated biochemically, because it contains too many analytes for accurate detection and quantification.

Protein detection using antibodies (immunoassays)

Antibodies against specific proteins, or against their modified forms, have long been used in biochemistry and cell biology and are among the most common tools of molecular biologists today. Several methods and protocols use antibodies to detect proteins. The enzyme-linked immunosorbent assay (ELISA) has been used for decades to detect and quantify proteins in biological samples. Western blotting can be used to detect and quantify individual proteins: a complex protein mixture is first separated by SDS-PAGE, and the protein of interest is then identified with an antibody.
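Quantification by ELISA is commonly done by reading unknown samples off a standard curve, and the four-parameter logistic (4PL) model is one widely used choice for that curve. The sketch below assumes the curve parameters have already been fitted to standards of known concentration (for example with a nonlinear least-squares routine); the parameter values are hypothetical, and other curve models are also in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourPL:
    """Four-parameter logistic standard curve:
    y = d + (a - d) / (1 + (x / c) ** b)
    a: response at zero concentration, d: response at saturating concentration,
    c: inflection point (half-maximal response), b: slope factor.
    """
    a: float
    b: float
    c: float
    d: float

    def response(self, conc: float) -> float:
        """Predicted signal (e.g. absorbance) for a given concentration."""
        return self.d + (self.a - self.d) / (1 + (conc / self.c) ** self.b)

    def concentration(self, response: float) -> float:
        """Invert the curve to estimate concentration from a measured signal."""
        lo, hi = sorted((self.a, self.d))
        if not lo < response < hi:
            raise ValueError("response outside the quantifiable range of the curve")
        return self.c * ((self.a - self.d) / (response - self.d) - 1) ** (1 / self.b)


if __name__ == "__main__":
    # Hypothetical fitted parameters: absorbance versus concentration in pg/mL.
    curve = FourPL(a=0.05, b=1.2, c=50.0, d=2.5)
    for od in (0.3, 1.0, 2.0):
        print(f"OD {od:.2f} -> {curve.concentration(od):.1f} pg/mL")
```

Signals outside the range between the two plateaus cannot be converted reliably, which is why the sketch rejects them rather than extrapolating.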

Modified proteins can be studied by developing an antibody specific for the modification. For example, some antibodies, known as phospho-specific antibodies, recognize certain proteins only when they are tyrosine-phosphorylated. Antibodies specific for other modifications also exist, and they can be used to determine the set of proteins that carry a given modification.

Proteomics in medicine

Detecting disease at the molecular level is driving a new revolution in diagnosis and treatment. Digital immunoassay technology has pushed the sensitivity of molecular detection down to the attomolar range. This opens the way to new advances in diagnostics and therapy, but such technologies have so far been confined to manual procedures that are poorly suited to efficient routine use.
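A quick calculation shows why attomolar sensitivity is demanding: the number of analyte molecules in a sample is concentration × volume × Avogadro's number. The 100 µL sample volume below is an assumed, illustrative value.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules_in_sample(concentration_molar: float, volume_litres: float) -> float:
    """Number of analyte molecules = concentration (mol/L) x volume (L) x N_A."""
    return concentration_molar * volume_litres * AVOGADRO


if __name__ == "__main__":
    volume = 100e-6  # assumed 100 microlitre sample
    for label, conc in [("1 nM", 1e-9), ("1 pM", 1e-12), ("1 fM", 1e-15), ("1 aM", 1e-18)]:
        print(f"{label}: ~{molecules_in_sample(conc, volume):,.0f} molecules")
```

At 1 aM, a 100 µL sample contains only about 60 target molecules, so an assay at that level is close to counting individual molecules.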

Although antibody-based protein detection is still very common in molecular biology, other methods have been developed that do not rely on antibodies. These methods offer various advantages: they can often determine the sequence of a protein or peptide, they may have higher throughput than antibody-based assays, and they can sometimes identify and quantify proteins for which no antibody exists.

Proteomics methods

One of the earliest methods of protein analysis was Edman degradation (first described in 1950 and automated in 1967), in which a single peptide is subjected to repeated cycles of chemical degradation, each cycle removing and identifying the N-terminal amino acid, to determine its sequence. These methods have mostly been superseded by technologies with higher throughput, and different areas of proteomics rely on different methods.
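The cyclic logic of Edman degradation, and why its read length is limited, can be sketched as a toy model. It assumes a constant repetitive yield per cycle (the fraction of chains correctly shortened), so the usable signal decays geometrically; the yield value and the peptide are hypothetical, and real instruments also contend with background from out-of-phase chains.

```python
def edman_cycles(peptide: str, repetitive_yield: float) -> list[tuple[int, str, float]]:
    """Toy model of Edman degradation.

    Each cycle cleaves off the current N-terminal residue, which is then
    identified. Because each cycle is slightly incomplete, only a fraction
    `repetitive_yield` of the chains is correctly shortened per cycle, so the
    relative signal at cycle n is modelled as repetitive_yield ** (n - 1).
    """
    if not 0 < repetitive_yield <= 1:
        raise ValueError("repetitive_yield must be in (0, 1]")
    results = []
    for cycle, residue in enumerate(peptide, start=1):
        signal = repetitive_yield ** (cycle - 1)
        results.append((cycle, residue, signal))
    return results


if __name__ == "__main__":
    # Hypothetical peptide and an assumed 93% repetitive yield.
    for cycle, residue, signal in edman_cycles("GLSDGEWQQVLNVWGK", repetitive_yield=0.93):
        print(f"cycle {cycle:2d}: {residue}  relative signal {signal:.2f}")
```

With a 93% repetitive yield the relative signal falls to roughly a tenth by cycle 30, one reason Edman sequencing is practical only for the first few dozen residues of a peptide.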

Basic separation methods

Analyzing complex biological samples requires reducing their complexity. This can be done offline by one-dimensional or two-dimensional separation. More recently, online methods have been developed in which individual peptides are separated by reversed-phase chromatography and then ionized directly by electrospray ionization (ESI); this direct coupling of separation and analysis is what the term “online” refers to.
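Electrospray typically produces multiply protonated ions, so the mass spectrometer records m/z = (M + z × 1.007276)/z rather than the neutral mass M. A minimal sketch, assuming an unmodified linear peptide and monoisotopic residue masses; the example sequence is arbitrary.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once for the free N- and C-termini
PROTON = 1.007276  # mass of the proton added per positive charge


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion typically formed by positive-mode ESI."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    m = peptide_mass("PEPTIDE")
    for z in (1, 2, 3):
        print(f"[M+{z}H]{z}+  m/z = {mz(m, z):.4f}")
```

The same peptide thus appears at several m/z values, one per charge state, which the data analysis has to recognize as a single species.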

Hybrid technologies

There are several hybrid technologies that use antibody-based purification of individual analytes followed by mass spectrometric analysis to identify and quantify them. Examples include the MSIA (mass spectrometric immunoassay) method developed by Randall Nelson in 1995 and the SISCAPA (Stable Isotope Standard Capture with Anti-Peptide Antibodies) method introduced by Leigh Anderson in 2004.

Comparative proteomic analyses can reveal the role of proteins in complex biological systems, including reproduction. For example, treatment with the insecticide triazophos increases the content of male accessory gland proteins (Acps) in the brown planthopper (Nilaparvata lugens (Stål)); these proteins can be transferred to females during mating and increase female fecundity (i.e., birth rate). To identify changes in the types of accessory gland proteins and reproductive proteins that mated females receive from males, researchers performed a comparative proteomic analysis of mated N. lugens females. The results indicated that these proteins take part in the reproductive process of both adult female and male N. lugens.

High-throughput proteomic technologies

Proteomics has steadily gained momentum over the past decade. Some of the approaches it has developed are genuinely new, while others build on established methods. Mass spectrometry and microarrays are the most common technologies for the large-scale study of proteins.

Mass spectrometry and profiling

Two mass spectrometry-based methods are currently used for protein profiling. The more established and widely used method relies on high-resolution two-dimensional electrophoresis (2-DE) to separate proteins from different samples in parallel, followed by selection and staining of differentially expressed proteins, which are then identified by mass spectrometry. Despite the advances in 2-DE and the maturity of the method, it has limitations. The main problem is the inability to resolve all the proteins in a sample, given their wide range of expression levels and differing properties.
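The selection of differentially expressed proteins can be sketched as a fold-change filter on matched spot volumes from two gels. This is a simplification under stated assumptions: spot matching and volume normalization are taken as already done, a single 2-fold threshold is used, and real 2-DE studies would add replicate gels and statistical testing. Spot names and values are hypothetical.

```python
import math


def differential_spots(control: dict[str, float], treated: dict[str, float],
                       min_fold: float = 2.0) -> dict[str, float]:
    """Return log2(treated / control) for matched spots whose normalized
    volume changes by at least `min_fold` in either direction.
    Spots missing from either gel, or with non-positive volume, are skipped."""
    threshold = math.log2(min_fold)
    changed = {}
    for spot in control.keys() & treated.keys():
        if control[spot] <= 0 or treated[spot] <= 0:
            continue
        log_ratio = math.log2(treated[spot] / control[spot])
        if abs(log_ratio) >= threshold:
            changed[spot] = log_ratio
    return changed


if __name__ == "__main__":
    # Hypothetical normalized spot volumes from a control and a treated gel.
    control = {"spot1": 1.0, "spot2": 2.0, "spot3": 0.5}
    treated = {"spot1": 1.1, "spot2": 0.5, "spot3": 2.0}
    print(differential_spots(control, treated))
```

Only the spots passing the filter would be excised and sent for mass spectrometric identification.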

The second, quantitative approach uses stable isotope tags to differentially label proteins from two different complex mixtures. The proteins in each mixture are first labeled with isotopes and then digested to produce labeled peptides. The labeled mixtures are then combined, and the peptides are separated by multidimensional liquid chromatography and analyzed by tandem mass spectrometry. Isotope-coded affinity tags (ICAT) are widely used isotopic tags. In this method, the cysteine residues of proteins are covalently attached to the ICAT reagent, which reduces the complexity of the mixture because peptides without cysteine are omitted from the analysis.
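The digestion and cysteine-selection steps can be sketched in silico. This assumes the classic trypsin rule (cleave after K or R unless the next residue is P) with no missed cleavages, and models ICAT capture simply as keeping the peptides that contain cysteine; the protein sequence is a made-up example, and the isotope labeling chemistry and quantification are not modelled.

```python
def tryptic_digest(protein: str) -> list[str]:
    """In-silico trypsin digestion using the classic rule: cleave after K or R,
    except when the next residue is P. Missed cleavages are not modelled."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def icat_captured(peptides: list[str]) -> list[str]:
    """Peptides an ICAT workflow would retain: only cysteine-containing
    peptides carry the tag and survive affinity purification (simplified)."""
    return [p for p in peptides if "C" in p]


if __name__ == "__main__":
    protein = "MAKCLLRPSTEKGCWRAAYK"  # hypothetical protein sequence
    digest = tryptic_digest(protein)
    print(digest)  # ['MAK', 'CLLRPSTEK', 'GCWR', 'AAYK']
    print(icat_captured(digest))  # ['CLLRPSTEK', 'GCWR']
```

In this toy sequence, two of the four tryptic peptides contain cysteine, so ICAT capture halves the number of peptides that reach the mass spectrometer.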

Proteomics, genomics and metabolomics are new directions in biology, characterized by complexity and innovation, and not every researcher is equipped to pursue them.