SAMPL Challenge

SAMPL (Statistical Assessment of the Modeling of Proteins and Ligands) is a set of community-wide blind challenges aimed at advancing computational techniques as standard predictive tools in rational drug design. A broad range of biologically relevant systems of different sizes and levels of complexity, including proteins, host–guest complexes, and drug-like small molecules, has been selected to test the latest modeling methods and force fields in SAMPL. New experimental data, such as binding affinities and hydration free energies, are withheld from participants until the prediction submission deadline, so that the true predictive power of methods can be revealed. The SAMPL5 challenge, for example, contained two prediction categories: the binding affinities of host–guest systems and the distribution coefficients of drug-like molecules between water and cyclohexane. Since 2008, the SAMPL challenge series has attracted interest from scientists engaged in the field of computer-aided drug design (CADD). The current SAMPL organizers include John Chodera, Michael K. Gilson, David Mobley, and Michael Shirts.


Project significance

The SAMPL challenge seeks to accelerate progress in developing quantitative, accurate drug discovery tools by providing prospective validation and rigorous comparisons for computational methodologies and force fields. Computer-aided drug design methods have improved considerably over time, along with the rapid growth of high-performance computing capabilities. However, their applicability in the pharmaceutical industry is still highly limited because of insufficient accuracy. Lacking large-scale prospective validation, methods tend to suffer from over-fitting to pre-existing experimental data. To overcome this, SAMPL challenges have been organized as blind tests: each time, new datasets are carefully designed and collected from academic or industrial research laboratories, and measurements are released shortly after the prediction submission deadline. Researchers can then compare these high-quality, prospective experimental data with the submitted estimates. A key emphasis is on lessons learned, allowing participants in future challenges to benefit from modeling improvements made on the basis of earlier challenges. SAMPL has historically focused on the properties of host–guest systems and drug-like small molecules. These simple model systems require considerably fewer computational resources to simulate than protein systems, and thus converge more quickly. Through careful design, these model systems can be used to focus on one particular simulation challenge or a subset of them. The past several SAMPL host–guest, hydration free energy and log D challenges revealed the limitations of generalized force fields, facilitated the development of solvent models, and highlighted the importance of properly handling protonation states and salt effects.


Participation

Registration and participation are free for SAMPL challenges. Beginning with SAMPL7, challenge participation data was posted on the SAMPL website as well as the GitHub page for the specific challenge. Instructions, input files and results were then provided through GitHub (earlier challenges provided content primarily through D3R for SAMPL4-5, and via other means for earlier SAMPLs). Participants were allowed to submit multiple predictions through the D3R website, either anonymously or with research affiliation. Since the SAMPL2 challenge, all participants have been invited to attend the SAMPL workshops and submit manuscripts describing their results. After a peer-review process, the resulting papers, along with the overview papers that summarize all submitted data, were published in special issues of the Journal of Computer-Aided Molecular Design.


Funding

The SAMPL project was funded by the NIH (grant GM124270-01A1) for the period from September 2018 through August 2022, to allow the design of future SAMPL challenges to drive advances in the areas where they are most needed for modeling efforts. The effort is spearheaded by David L. Mobley (UC Irvine) with co-investigators John D. Chodera (MSKCC), Bruce C. Gibb (Tulane), and Lyle Isaacs (Maryland). Currently, challenges and workshops are run in partnership with the NIH-funded Drug Design Data Resource (D3R), but this will likely change over time as funding for the two projects is not coupled. Funding also allowed a broadening of the scope of SAMPL; through SAMPL6, its role had been seen as primarily focused on physical properties, with D3R handling protein-ligand challenges. However, the funded effort broadened its focus to include systems that will drive improvements in modeling, including potentially suitable protein-ligand systems. This is still in contrast to D3R, which relies on donated datasets of pharmaceutical interest, whereas SAMPL challenges are specifically designed to focus on specific modeling challenges.


History


Earlier SAMPL challenges

The first SAMPL exercise, SAMPL0 (2008), focused on predicting the solvation free energies of 17 small molecules. A research group at Stanford University and scientists at OpenEye Scientific Software carried out the calculations. Despite the informal format, SAMPL0 laid the groundwork for the following SAMPL challenges. The SAMPL1 (2009) and SAMPL2 (2010) challenges were organized by OpenEye and continued to focus on predicting solvation free energies of drug-like small molecules. Attempts were also made to predict binding affinities, binding poses and tautomer ratios. Both challenges attracted significant participation from computational scientists and researchers in academia and industry.


SAMPL3 and SAMPL4

Blinded data sets for host–guest binding affinities were introduced for the first time in SAMPL3 (2011-2012), along with solvation free energies for small molecules and binding affinity data for 500 fragment-like tyrosine inhibitors. All three host molecules were from the cucurbituril family. The SAMPL3 challenge received 103 submissions from 23 research groups worldwide. Unlike the prior three SAMPL events, the SAMPL4 exercise (2013-2014) was coordinated by academic researchers, with logistical support from OpenEye. Datasets in SAMPL4 consisted of binding affinities for host–guest systems and HIV integrase inhibitors, as well as hydration free energies of small molecules. Host molecules included cucurbit[7]uril (CB7) and octa-acid. The SAMPL4 hydration challenge involved 49 submissions from 19 groups. Participation in the host–guest challenge also grew significantly compared to SAMPL3. The workshop was held at Stanford University in September 2013.


SAMPL5

The protein-ligand challenges were separated from SAMPL in SAMPL5 (2015-2016) and were distributed as the new Grand Challenges of the Drug Design Data Resource (D3R). SAMPL5 allowed participants to make predictions of the binding affinities of three sets of host–guest systems: an acyclic CB7 derivative and two hosts from the octa-acid family. Participants were also encouraged to submit predictions for binding enthalpies. A wide array of computational methods were tested, including density functional theory (DFT), molecular dynamics, docking, and metadynamics. Distribution coefficient predictions were introduced for the first time, receiving a total of 76 submissions from 18 research groups or individual scientists for a set of 53 small molecules. The workshop was held in March 2016 at the University of California, San Diego as part of the D3R workshop. The top-performing methods in the host–guest challenge yielded encouraging yet imperfect correlations with experimental data, accompanied by large, systematic shifts relative to experiment.
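
The distinction between correlation and systematic shift can be illustrated with a short Python sketch. It is not SAMPL's evaluation code: the function names and every number below are hypothetical, chosen only to show that a constant offset leaves the Pearson correlation essentially unchanged while inflating the root-mean-square error, and that the mean signed error exposes the offset.

```python
import math

def mean_signed_error(pred, expt):
    """Average of (prediction - experiment); reveals a systematic shift."""
    return sum(p - e for p, e in zip(pred, expt)) / len(pred)

def rmse(pred, expt):
    """Root-mean-square error between predictions and experiment."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))

def pearson_r(pred, expt):
    """Pearson correlation coefficient; insensitive to a constant offset."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_e = sum(expt) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(pred, expt))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_e = sum((e - mean_e) ** 2 for e in expt)
    return cov / math.sqrt(var_p * var_e)

# Hypothetical binding free energies in kcal/mol (illustrative, not SAMPL data).
experiment = [-5.0, -6.2, -7.1, -8.4, -9.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.2]

# Predictions that track experiment with small random-looking errors.
unshifted = [e + d for e, d in zip(experiment, noise)]
# The same predictions with an assumed constant offset of -3 kcal/mol.
shifted = [p - 3.0 for p in unshifted]

for label, pred in (("unshifted", unshifted), ("shifted", shifted)):
    print(label,
          "r =", round(pearson_r(pred, experiment), 3),
          "RMSE =", round(rmse(pred, experiment), 3),
          "MSE =", round(mean_signed_error(pred, experiment), 3))
```

Reporting a correlation measure alongside an error measure is one way to keep the two effects apart: a method can rank compounds well and still be offset from experiment by a roughly constant amount.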


SAMPL6

The SAMPL6 test systems include cucurbituril, octa-acid, tetra-endo-methyl octa-acid, and a series of fragment-like small molecules. The host–guest, conformational sampling and pKa prediction challenges of SAMPL6 are now closed. The SAMPL6 workshop was run jointly with the D3R workshop in February 2018 at the Scripps Institution of Oceanography, and a SAMPL special issue of the Journal of Computer-Aided Molecular Design reported many of the results. A SAMPL6 Part II challenge focused on a small octanol-water partition coefficient prediction set and was followed by a virtual workshop on May 16, 2019 and a joint D3R/SAMPL workshop in San Diego in August 2019. A special issue or special section of JCAMD is planned to report the results. SAMPL6 inputs and results are available via the SAMPL6 GitHub repository.


SAMPL7

SAMPL7 again included host–guest challenges and a physical property challenge. A protein-ligand binding challenge on PHIPA fragments was also included. The host–guest component focused on the binding of several small molecules to octa-acid and exo-octa-acid, the binding of two compounds to a series of cyclodextrin derivatives, and the binding of a series of small molecules to a clip-like host known as TrimerTrip. A SAMPL7 virtual workshop took place and is available online. A SAMPL7 physical properties challenge is currently ongoing. Plans for a EuroSAMPL in-person workshop in Fall 2020 were derailed by COVID-19, and the workshop is being conducted virtually. SAMPL7 inputs and, as challenge components are completed, results are available via the SAMPL7 GitHub repository.


SAMPL8

SAMPL8 included host–guest components on the binding of drugs of abuse to CB8 and of a series of small molecules to Gibb Deep Cavity Cavitands (GDCCs), as detailed on the SAMPL8 GitHub repository. An additional challenge focused on pKa and logD prediction for a series of drug-like molecules.


SAMPL9

SAMPL9 is in the planning stages, except that a SAMPL9 host–guest challenge on a host from Lyle Isaacs' group is currently underway. Details are available on the SAMPL9 GitHub repository.


SAMPL Special Issues


SAMPL Publications

A relatively complete list of SAMPL-related publications is maintained by the SAMPL organizers; more than 150 related papers have been published.


Future challenges

SAMPL is slated to continue its focus on physical property prediction, including logP and logD values, pKa prediction, host–guest binding, and other properties, as well as broadening to include a protein-ligand component. Some data is planned to be collected directly by the SAMPL co-investigators (Chodera, Gibb and Isaacs), but industry partnerships and internships are also proposed.

