Virtual screening (VS) is a computational technique used in drug discovery to search libraries of small molecules in order to identify those structures which are most likely to bind to a drug target, typically a protein receptor or enzyme. Virtual screening has been defined as "automatically evaluating very large libraries of compounds" using computer programs. As this definition suggests, VS has largely been a numbers game focusing on how the enormous chemical space of over 10⁶⁰ conceivable compounds can be filtered to a manageable number that can be synthesized, purchased, and tested.

Although searching the entire chemical universe may be a theoretically interesting problem, more practical VS scenarios focus on designing and optimizing targeted combinatorial libraries and enriching libraries of available compounds from in-house compound repositories or vendor offerings. As the accuracy of the method has increased, virtual screening has become an integral part of the drug discovery process. Virtual screening can be used to select in-house database compounds for screening, to choose compounds that can be purchased externally, and to choose which compound should be synthesized next.

Methods

There are two broad categories of screening techniques: ligand-based and structure-based. The remainder of this page follows the flow chart of virtual screening in Figure 1.

Ligand-based methods

Given a set of structurally diverse ligands that bind to a receptor, a model of the receptor can be built by exploiting the collective information contained in that set of ligands. Different computational techniques explore the structural, electronic, molecular shape, and physicochemical similarities of different ligands that could imply their mode of action against a specific molecular receptor or cell line. A candidate ligand can then be compared to the resulting pharmacophore model to determine whether it is compatible with it and therefore likely to bind. Different 2D chemical similarity analysis methods have been used to scan databases to find active ligands.

Another popular approach in ligand-based virtual screening consists of searching for molecules with a shape similar to that of known actives, as such molecules will fit the target's binding site and hence will be likely to bind the target. Shape-based virtual screening has gained significant popularity, and there are a number of prospective applications of this class of techniques in the literature. Pharmacophoric extensions of these 3D methods are also freely available as webservers.

Structure-based methods

The structure-based virtual screening approach includes different computational techniques that consider the structure of the receptor that is the molecular target of the investigated active ligands. Some of these techniques include molecular docking, structure-based pharmacophore prediction, and molecular dynamics simulations. Molecular docking is the most widely used structure-based technique. It applies a scoring function to estimate the fitness of each ligand against the binding site of the macromolecular receptor, helping to choose the ligands with the highest affinity. Currently, there are some webservers oriented to prospective virtual screening.

Hybrid methods

Hybrid methods that rely on structural and ligand similarity were also developed to overcome the limitations of traditional virtual ligand screening (VLS) approaches. These methodologies utilize evolution-based ligand-binding information to predict small-molecule binders and can employ both global structural similarity and pocket similarity. A global structural similarity based approach employs either an experimental structure or a predicted protein model to find structural similarity with proteins in the PDB holo-template library. Upon detecting significant structural similarity, a 2D fingerprint based Tanimoto coefficient metric is applied to screen for small molecules that are similar to ligands extracted from the selected holo PDB templates. The predictions from this method have been experimentally assessed and show good enrichment in identifying active small molecules.

The method described above depends on global structural similarity and is not capable of selecting a particular ligand-binding site in the protein of interest a priori. Further, since the methods rely on 2D similarity assessment for ligands, they are not capable of recognizing stereochemical similarity of small molecules that are substantially different but demonstrate geometric shape similarity. To address these concerns, a new pocket-centric approach, PoLi, capable of targeting specific binding pockets in holo-protein templates, was developed and experimentally assessed.
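The Tanimoto comparison of 2D fingerprints mentioned above can be illustrated with a minimal sketch. It assumes each fingerprint is stored as a set of "on" bit indices; the fingerprints, compound names and the 0.5 cutoff are made-up values for illustration and are not taken from any of the methods described.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def screen(template_fp, library, cutoff=0.5):
    """Return (name, similarity) pairs at or above the cutoff, most similar first."""
    scored = [(name, tanimoto(template_fp, fp)) for name, fp in library.items()]
    hits = [(name, sim) for name, sim in scored if sim >= cutoff]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints (illustrative bit indices only).
template = {1, 4, 7, 9, 12}
library = {
    "cmpd_C": {30, 31},            # no bits shared with the template
    "cmpd_B": {1, 4, 7, 20},       # 3 shared bits, 6 bits in the union
    "cmpd_A": {1, 4, 7, 9, 12},    # identical to the template
}
hits = screen(template, library)
print(hits)
```

Because the coefficient only counts shared bits, two molecules with similar 3D shapes but different 2D substructures score poorly, which is the limitation noted in the text.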

Computing Infrastructure

The computation of pair-wise interactions between atoms, which is a prerequisite for the operation of many virtual screening programs, scales as O(N²), where N is the number of atoms in the system. Due to the quadratic scaling, the computational costs increase quickly.
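A small sketch may make the quadratic growth concrete. It assumes atoms are given as plain 3D coordinate tuples (the coordinates below are invented) and simply enumerates every unordered pair, which is the naive approach without any cutoff or neighbour-list optimization.

```python
import itertools
import math


def pairwise_distances(coords):
    """Return the distance for every unordered pair of atoms.

    For N atoms this visits N * (N - 1) / 2 pairs, i.e. O(N^2) work.
    """
    return {
        (i, j): math.dist(coords[i], coords[j])
        for i, j in itertools.combinations(range(len(coords)), 2)
    }


def pair_count(n_atoms):
    """Number of unordered atom pairs in a system of n_atoms atoms."""
    return n_atoms * (n_atoms - 1) // 2


# Hypothetical coordinates for a four-atom system.
atoms = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0), (1.0, 1.0, 1.0)]
distances = pairwise_distances(atoms)
print(len(distances), pair_count(1000), pair_count(2000))
```

Doubling the system from 1000 to 2000 atoms raises the number of pairs from 499,500 to 1,999,000, roughly a fourfold increase.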

Ligand-based Approach

Ligand-based methods typically require a fraction of a second for a single structure comparison operation. Sometimes a single CPU is enough to perform a large screening within hours. However, several comparisons can be made in parallel in order to expedite the processing of a large database of compounds.

Structure-based Approach

The size of the task requires a parallel computing infrastructure, such as a cluster of Linux systems, running a batch queue processor to handle the work, such as Sun Grid Engine or Torque PBS. A means of handling the input from large compound libraries is needed. This requires a form of compound database that can be queried by the parallel cluster, delivering compounds in parallel to the various compute nodes. Commercial database engines may be too ponderous, and a high-speed indexing engine, such as Berkeley DB, may be a better choice.

Furthermore, it may not be efficient to run one comparison per job, because the ramp-up time of the cluster nodes could easily outstrip the amount of useful work. To work around this, it is necessary to process batches of compounds in each cluster job, aggregating the results into some kind of log file. A secondary process, which mines the log files and extracts high-scoring candidates, can then be run after the whole experiment has finished.
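The batch-and-mine workflow described above can be sketched as follows. This is only an illustration under stated assumptions: the scoring function is a placeholder, the log is an in-memory list of JSON lines rather than files on a cluster, and all names (`batches`, `run_job`, `top_candidates`, `toy_score`) are hypothetical.

```python
import heapq
import json


def batches(compounds, batch_size):
    """Yield successive batches so that each cluster job does enough useful work."""
    for start in range(0, len(compounds), batch_size):
        yield compounds[start:start + batch_size]


def run_job(batch, score):
    """Score one batch and return log lines, one JSON record per compound."""
    return [json.dumps({"id": cid, "score": score(cid)}) for cid in batch]


def top_candidates(log_lines, n):
    """Secondary pass: mine the aggregated log for the n highest-scoring compounds."""
    records = (json.loads(line) for line in log_lines)
    return heapq.nlargest(n, records, key=lambda rec: rec["score"])


def toy_score(compound_id):
    """Placeholder score (hypothetical); a real job would run a docking program."""
    return (compound_id * 37) % 101


compound_ids = list(range(1, 11))
log = []
for batch in batches(compound_ids, batch_size=4):
    log.extend(run_job(batch, toy_score))  # on a cluster, each batch is one job

best = top_candidates(log, n=3)
print(best)
```

The batch size is the tuning knob here: larger batches amortize node start-up time, while smaller batches spread the load more evenly across nodes.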

Accuracy

The aim of virtual screening is to identify molecules of novel chemical structure that bind to the macromolecular target of interest. Thus, the success of a virtual screen is defined in terms of finding interesting new scaffolds rather than the total number of hits. Interpretations of virtual screening accuracy should, therefore, be considered with caution. Low hit rates of interesting scaffolds are clearly preferable to high hit rates of already known scaffolds.

Most tests of virtual screening studies in the literature are retrospective. In these studies, the performance of a VS technique is measured by its ability to retrieve a small set of previously known molecules with affinity to the target of interest (active molecules or just actives) from a library containing a much higher proportion of assumed inactives or decoys. There are several distinct ways to select decoys by matching the properties of the corresponding active molecule, and more recently decoys have also been selected in a property-unmatched manner. The actual impact of decoy selection, either for training or testing purposes, has also been discussed.

By contrast, in prospective applications of virtual screening, the resulting hits are subjected to experimental confirmation (e.g., IC₅₀ measurements). There is consensus that retrospective benchmarks are not good predictors of prospective performance and that, consequently, only prospective studies constitute conclusive proof of the suitability of a technique for a particular target.
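One commonly reported measure in retrospective studies is the enrichment factor, which compares the hit rate among the top-ranked compounds with the hit rate of the whole library. The text does not prescribe a particular metric, so the following is a sketch of that one choice, using an invented ranking of 20 compounds.

```python
def enrichment_factor(ranked_labels, fraction):
    """Enrichment factor for the top `fraction` of a ranked list.

    ranked_labels: booleans ordered best score first
    (True = known active, False = decoy or assumed inactive).
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_actives == 0:
        raise ValueError("the library contains no actives")
    n_top = max(1, round(n_total * fraction))
    hits_top = sum(ranked_labels[:n_top])
    # (hits_top / n_top) divided by (n_actives / n_total)
    return (hits_top * n_total) / (n_top * n_actives)


# Hypothetical retrospective screen: 20 compounds, 4 actives, ranked by score.
ranking = [True, True, False, True, False] + [False] * 14 + [True]
ef_25 = enrichment_factor(ranking, 0.25)
print(ef_25)
```

A value of 1 corresponds to random selection. Such a number says nothing about whether the retrieved actives are novel scaffolds, which is why it should be read with the caution described above.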

Application to drug discovery

Virtual screening is useful for identifying hit molecules as a starting point for medicinal chemistry. As virtual screening has become a more vital and substantial technique within the medicinal chemistry industry, its use has increased rapidly.

Ligand-based methods

Ligand-based methods try to predict how ligands will bind to the receptor without knowing the receptor's structure. Pharmacophore features, such as hydrogen-bond donors and acceptors, are identified for each ligand. Equivalent features are then overlaid, although it is unlikely that there is a single correct solution.

Pharmacophore models

This technique is used when merging the results of searches that use different reference compounds: the descriptors and similarity coefficient are the same, but the active compounds differ. It is beneficial because it is more efficient than using a single reference structure and gives the most accurate performance when the actives are diverse.

A pharmacophore is an ensemble of steric and electronic features that are needed to have an optimal supramolecular interaction or interactions with a biological target structure in order to trigger its biological response. A representative set of actives is chosen, and most methods then look for similar binding. It is preferable to have multiple rigid molecules, and the ligands should be diverse, that is, they should differ in the features that do not take part in binding.

Shape-Based Virtual Screening

Shape-based molecular similarity approaches have been established as important and popular virtual screening techniques. At present, the highly optimized screening platform ROCS (Rapid Overlay of Chemical Structures) is considered the de facto industry standard for shape-based, ligand-centric virtual screening. It uses a Gaussian function to define the molecular volumes of small organic molecules. The selection of the query conformation is less important, which makes shape-based screening well suited to ligand-based modeling: the availability of a bioactive conformation for the query is not the limiting factor for screening; rather, it is the selection of the query compound(s) that is decisive for screening performance.
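The idea of describing molecular volume with Gaussian functions can be sketched as follows. This is not ROCS or its parameterization: it assumes one identical spherical Gaussian per atom, uses placeholder values for the exponent `alpha` and amplitude `p`, keeps only pairwise (first-order) overlaps, and compares the molecules in a fixed orientation, whereas a real tool would also optimize the overlay.

```python
import math


def gaussian_overlap(center_a, center_b, alpha=0.8, p=1.0):
    """Overlap integral of two spherical Gaussians p * exp(-alpha * r^2).

    For equal exponents the Gaussian product rule gives
    p^2 * exp(-alpha * d^2 / 2) * (pi / (2 * alpha))^(3/2).
    alpha and p are placeholder values, not fitted atomic parameters.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    return p * p * math.exp(-alpha * d2 / 2.0) * (math.pi / (2.0 * alpha)) ** 1.5


def volume_overlap(mol_a, mol_b):
    """First-order overlap volume: sum over all atom-atom Gaussian overlaps."""
    return sum(gaussian_overlap(a, b) for a in mol_a for b in mol_b)


def shape_tanimoto(mol_a, mol_b):
    """Shape Tanimoto: O_AB / (O_AA + O_BB - O_AB), for a fixed alignment."""
    o_ab = volume_overlap(mol_a, mol_b)
    o_aa = volume_overlap(mol_a, mol_a)
    o_bb = volume_overlap(mol_b, mol_b)
    return o_ab / (o_aa + o_bb - o_ab)


# Hypothetical atom centres (in angstroms) for a query and a shifted copy.
query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in query]

self_score = shape_tanimoto(query, query)
shifted_score = shape_tanimoto(query, shifted)
print(self_score, shifted_score)
```

A molecule compared with itself scores 1, and the score drops as the overlay worsens, which is the quantity a shape-based screen ranks by.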

Field-Based Virtual Screening

As an improvement on shape-based similarity methods, field-based methods try to take into account all the fields that influence a ligand-receptor interaction while being agnostic of the chemical structure used as a query. Examples of other fields that are used in these methods are electrostatic and hydrophobic fields.

Quantitative Structure-Activity Relationship

Quantitative structure-activity relationship (QSAR) models are predictive models based on information extracted from a set of known active and known inactive compounds. In structure-activity relationships (SARs), by contrast, data is treated qualitatively, and they can be used with structural classes and more than one binding mode. The models prioritize compounds for lead discovery.

Machine learning algorithms

Machine learning algorithms have been widely used in virtual screening approaches. Supervised learning techniques use training and test datasets composed of known active and known inactive compounds. Different ML algorithms have been applied with success in virtual screening strategies, such as recursive partitioning, support vector machines, k-nearest neighbors and neural networks. These models estimate the probability that a compound is active and then rank each compound based on its probability.
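As an illustration of ranking compounds by their estimated probability of being active, here is a minimal k-nearest-neighbours sketch. It assumes fingerprints stored as sets of on-bit indices and Tanimoto similarity as the neighbour measure; the training set, library and value of k are all invented for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def knn_active_probability(query_fp, training_set, k=3):
    """Fraction of the k most similar training compounds that are active."""
    neighbours = sorted(
        training_set,
        key=lambda item: tanimoto(query_fp, item[0]),
        reverse=True,
    )[:k]
    return sum(1 for _, is_active in neighbours if is_active) / len(neighbours)


def rank_library(library, training_set, k=3):
    """Score every library compound and sort by estimated probability, best first."""
    scored = [
        (name, knn_active_probability(fp, training_set, k))
        for name, fp in library.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical training data: (fingerprint, is_active).
training = [
    ({1, 2, 3}, True), ({1, 2, 4}, True), ({2, 3, 5}, True),
    ({10, 11, 12}, False), ({10, 11, 13}, False), ({11, 12, 14}, False),
]
# Hypothetical screening library.
library = {
    "lib_2": {10, 11, 12, 13},
    "lib_3": {1, 2, 3, 10, 11, 12, 13},
    "lib_1": {1, 2, 3, 4},
}
ranked = rank_library(library, training)
print(ranked)
```

In practice the model would be evaluated on a held-out test set before being used to prioritize compounds.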

Substructural analysis in Machine Learning

The first machine learning model used on large datasets was substructure analysis, which was created in 1973. Each fragment substructure makes a continuous contribution to an activity of a specific type. Substructure analysis is a method that overcomes the difficulty of massive dimensionality when it comes to analyzing structures in drug design. An efficient substructure analysis is used for structures that have similarities to a multi-level building or tower. Geometry is used for numbering boundary joints for a given structure from the onset towards the climax. When special static condensation and substitution routines are developed, this method proves to be more productive than previous substructure analysis models.

Recursive partitioning

Recursive partitioning is a method that creates a decision tree using qualitative data. It looks for rules that break the classes up with a low misclassification error, repeating each step until no sensible splits can be found. However, recursive partitioning can have poor predictive ability, even when it produces models that appear to fit the data well.
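The splitting procedure can be sketched with a small tree grown on binary descriptors. This is an illustrative sketch only: it assumes Gini impurity as the split criterion (the text does not name one), and the three descriptors and eight compounds are invented.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (feature index, weighted impurity) of the best split, or None."""
    best = None
    for f in range(len(rows[0])):
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue  # this descriptor does not separate anything
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (f, score)
    return best


def grow(rows, labels):
    """Partition recursively until a node is pure or no split reduces impurity."""
    split = best_split(rows, labels) if gini(labels) > 0 else None
    if split is None or split[1] >= gini(labels):
        return {"leaf": True, "p_active": sum(labels) / len(labels)}
    f = split[0]
    children = {}
    for value in (0, 1):
        part = [(x, y) for x, y in zip(rows, labels) if x[f] == value]
        children[value] = grow([x for x, _ in part], [y for _, y in part])
    return {"leaf": False, "feature": f, "children": children}


def predict(tree, row):
    """Follow the splits down to a leaf and return its fraction of actives."""
    while not tree["leaf"]:
        tree = tree["children"][row[tree["feature"]]]
    return tree["p_active"]


# Hypothetical descriptors: (has_ring, has_amine, has_halogen); 1 = active.
rows = [(1, 1, 0), (1, 1, 1), (1, 0, 0), (1, 0, 1),
        (0, 1, 0), (0, 1, 1), (0, 0, 0), (0, 0, 1)]
labels = [1, 1, 0, 0, 0, 0, 0, 0]
tree = grow(rows, labels)
print(predict(tree, (1, 1, 0)), predict(tree, (0, 1, 1)))
```

Growing the tree until every leaf is pure, as done here, is also what makes such models prone to fitting the training data more closely than they predict new compounds.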

Structure-based methods: protein-ligand docking

A ligand can be placed into the active site of a protein by using a docking search algorithm and a scoring function, in order to identify the most likely pose for an individual ligand while assigning a priority order.

External links

VLS3D – list of over 2000 databases, online and standalone in silico tools
