Cambridge Structural Database

The Cambridge Structural Database (CSD) is both a repository and a validated, curated resource for the three-dimensional structural data of molecules that generally contain at least carbon and hydrogen, covering a wide range of organic, metal-organic and organometallic molecules. Its entries complement those of other crystallographic databases such as the Protein Data Bank (PDB), the Inorganic Crystal Structure Database (ICSD) and the International Centre for Diffraction Data (ICDD). The data are typically obtained by X-ray crystallography and less frequently by electron diffraction or neutron diffraction. They are submitted by crystallographers and chemists from around the world and are freely accessible, as deposited by the authors, on the Internet via the website of the CSD's parent organization. The CSD is overseen by the Cambridge Crystallographic Data Centre (CCDC), a not-for-profit incorporated company.

The CSD is a widely used repository of small-molecule organic and metal-organic crystal structures. Structures deposited with the CCDC become publicly available for download at the point of publication or with the depositor's consent. They are also scientifically enriched and included in the database used by the software the centre offers. Targeted subsets of the CSD are freely available to support teaching and other activities.


History

The CCDC grew out of the activities of the crystallography group led by Olga Kennard OBE FRS in the Department of Organic, Inorganic and Theoretical Chemistry of the University of Cambridge. From 1965, the group began to collect published bibliographic, chemical and crystal structure data for all small molecules studied by X-ray or neutron diffraction. With the rapid developments in computing taking place at the time, this collection was encoded in electronic form and became known as the Cambridge Structural Database (CSD). The CSD was one of the first numerical scientific databases to begin operations anywhere in the world. It received academic grants first from the UK Office for Scientific and Technical Information and then from the UK Science and Engineering Research Council. These funds, together with subventions from National Affiliated Centres, enabled the development of the CSD and its associated software during the 1970s and 1980s. The CSD System was first released to the United States, Italy and Japan in the early 1970s; by the early 1980s it was being distributed in more than 30 countries, and as of 2014 it was distributed to academics in 70 countries.

During the 1980s, interest in the CSD System from pharmaceutical and agrochemical companies increased significantly. This led to the establishment of the CCDC as an independent company in 1987, with the legal status of a non-profit charitable institution and with its operations overseen by an international board of governors. The CCDC moved into purpose-built premises on the site of the University Department of Chemistry in 1992. Kennard retired as Director in 1997 and was succeeded by David Hartley (1997-2002) and Frank Allen (2002-2008). Colin Groom served as executive director from 1 October 2008 to September 2017, and Juergen Harter was appointed CEO in June 2018.

CCDC software products have diversified to cover applications of crystallographic data in the life sciences and in crystallography. Much of this software development and marketing is carried out by CCDC Software Limited (founded in 1998), a wholly owned subsidiary that covenants all of its profits back to the CCDC. Although the CCDC is a self-administering organization, it retains close links with the University of Cambridge and is a University Partner Institution qualified to train postgraduate students for higher degrees (PhD, MPhil). In October 2013 the CCDC established applications and support operations in the United States, initially at Rutgers, the State University of New Jersey, where it is co-located with the RCSB Protein Data Bank.


Contents

The CSD is updated with about 50,000 new structures each year, as well as with improvements to existing entries. Entries (structures) in the repository are released for public access as soon as the corresponding article has appeared in the peer-reviewed scientific literature. Data can also be deposited and published directly through the CSD, without an accompanying scientific article, as a ''CSD Communication''.

Periodically, general statistics about the breadth of CSD holdings are reported, for example in the January 2014 report. These statistics show that most structures are determined by X-ray diffraction, with less than 1% of structures determined by neutron diffraction or powder diffraction. In them, the number of structures with error-free coordinates is given as a percentage of the structures for which 3D coordinates are present in the CSD. Structure factor files are significant because, for CSD structures determined by X-ray diffraction that have such a file, a crystallographer can verify the interpretation of the observed measurements.

As of January 2019, the top 25 scientific journals in terms of the number of structures published in the CSD were:
1. ''Inorg. Chem.''
2. ''Dalton & J. Chem. Soc., Dalton Trans.''
3. ''Organometallics''
4. ''J. Am. Chem. Soc.''
5. ''Acta Crystallogr. Sect. E''
6. ''Chem. Eur. J.''
7. ''J. Organomet. Chem.''
8. ''Angew. Chem. Int. Ed.''
9. ''Inorg. Chim. Acta''
10. ''Chem. Commun. & J. Chem. Soc.''
11. ''CSD Communications''
12. ''Acta Crystallogr. Sect. C''
13. ''Polyhedron''
14. ''Eur. J. Inorg. Chem.''
15. ''J. Org. Chem.''
16. ''Cryst. Growth Des.''
17. ''CrystEngComm''
18. ''Organic Letters''
19. ''Z. Anorg. Allg. Chem.''
20. ''Acta Crystallogr. Sect. B''
21. ''Tetrahedron''
22. ''J. Mol. Struct.''
23. ''Tetrahedron Lett.''
24. ''Eur. J. Org. Chem.''
25. ''New Journal of Chemistry''

Some structures were also reported as ''Private Communication to the CSD''. These 25 journals account for 704,541 of the 996,193 structures in the CSD, or 70.7%.


Growth trend

Historically, the number of structures in the CSD has grown at an approximately exponential rate, passing 25,000 structures in 1977, 50,000 in 1983, 125,000 in 1992, 250,000 in 2001, 500,000 in 2009 and 1,000,000 on June 8, 2019. The one millionth structure added to the CSD is the crystal structure of 1-(7,9-diacetyl-11-methyl-6H-azepino[1,2-a]indol-6-yl)propan-2-one. ''Note: data for 1923-1964 are aggregated together in the last line of the table.''
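To make "approximately exponential" concrete, the milestones above can be converted into doubling times. The sketch below is only an illustration of that arithmetic, not a CCDC method: it assumes growth is exponential between consecutive milestones and treats each milestone as falling on a whole year (so June 8, 2019 is counted simply as 2019).

```python
import math

# Milestones quoted in the text: (year, number of structures in the CSD).
# Whole years are used, so the June 2019 milestone is treated as 2019.
MILESTONES = [
    (1977, 25_000),
    (1983, 50_000),
    (1992, 125_000),
    (2001, 250_000),
    (2009, 500_000),
    (2019, 1_000_000),
]


def doubling_time(y0, n0, y1, n1):
    """Years needed to double, assuming exponential growth between two points."""
    growth_rate = math.log(n1 / n0) / (y1 - y0)  # continuous growth rate per year
    return math.log(2) / growth_rate


# Doubling time for each interval between consecutive milestones.
for (y0, n0), (y1, n1) in zip(MILESTONES, MILESTONES[1:]):
    print(f"{y0}-{y1}: doubling time ~ {doubling_time(y0, n0, y1, n1):.1f} years")

# A single average rate over the whole 1977-2019 span.
overall = doubling_time(*MILESTONES[0], *MILESTONES[-1])
print(f"1977-2019 overall: ~ {overall:.1f} years")
```

On these figures the doubling time lengthens from about 6 years (1977-1983) to about 10 years (2009-2019), with an overall value near 7.9 years, so the growth has been somewhat slower than a constant exponential in recent decades.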


File format

The primary file format for CSD structure deposition, adopted around 1991, is the Crystallographic Information File (CIF) format. Deposited CSD files can be downloaded in CIF format. The validated and curated CSD entries can be exported in a wide range of formats, including CIF, MOL, Mol2, PDB, SHELX and XMol, using tools in the CSD System. The CCDC uses two different codes to distinguish between the deposited data set and the curated CSD entry. For example, one ''CSD Communication'' of an organic molecule was deposited with the CCDC and assigned the deposition number 'CCDC-991327', which gives free public access to the data as deposited. Selected information was then extracted from the deposited data to prepare the validated and curated CSD entry, which was assigned the refcode 'MITGUT'. As part of the curation process, the CCDC also applies an algorithm, DeCIFer, to help the editors assign chemistry to structures when that information (for example bond types and charges) is missing from the submitted CIF files. The validated and curated entry is included in the CSD System and WebCSD distributions, with availability restricted to those making appropriate contributions.
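Because CIF is a plain-text format built from tag-value pairs, simple single-valued items can be read with a few lines of code. The sketch below is a minimal illustration under stated assumptions, not a CCDC tool: it handles only one-line `_tag value` items (no `loop_` tables, semicolon-delimited text fields or the finer CIF quoting rules), and the sample fragment, including its placeholder deposition number, is invented. The hypothetical `classify_identifier` helper separates the two kinds of code described above, assuming deposition numbers look like `CCDC-991327` or `CCDC 991327` and refcodes are six capital letters with an optional two-digit suffix (as in `MITGUT`). For real files, a dedicated CIF library such as PyCIFRW or gemmi is the safer choice.

```python
import re

# Invented CIF fragment: tag names follow common CIF usage, but the values
# (including the placeholder deposition number) do not describe a real entry.
SAMPLE_CIF = """\
data_example
_database_code_depnum_ccdc_archive 'CCDC 1234567'
_cell_length_a 10.234(5)
_symmetry_space_group_name_H-M 'P 21/c'
"""


def read_simple_items(text):
    """Collect single-line `_tag value` pairs from a CIF data block.

    Deliberately minimal: loop_ tables, multi-line (semicolon-delimited)
    values and the full CIF quoting rules are NOT handled.
    """
    items = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("_"):
            continue  # skip data_ headers, blank lines, comments, loop rows
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # tag without a value on the same line; out of scope
        tag, value = parts
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop matching quotes around the value
        items[tag.lower()] = value  # CIF tags are case-insensitive
    return items


# Assumed identifier shapes (hypothetical patterns, see the lead-in).
DEPOSITION_RE = re.compile(r"^CCDC[- ]?(\d+)$")
REFCODE_RE = re.compile(r"^[A-Z]{6}(\d{2})?$")


def classify_identifier(code):
    """Return 'deposition', 'refcode' or 'unknown' for a CCDC/CSD code."""
    code = code.strip()
    if DEPOSITION_RE.match(code):
        return "deposition"
    if REFCODE_RE.match(code):
        return "refcode"
    return "unknown"


items = read_simple_items(SAMPLE_CIF)
print(items["_symmetry_space_group_name_h-m"])   # P 21/c
print(classify_identifier("CCDC-991327"))        # deposition
print(classify_identifier("MITGUT"))             # refcode
```

Keys are lower-cased because CIF tag names are case-insensitive, so a lookup does not depend on how a particular file capitalizes them.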


Viewing the data

Each data set in the CSD can be openly viewed and retrieved using the free Access Structure service. Through this web-browser-based service, users can view the data set in 2D and 3D, obtain some basic information about the structure, and download the deposited data set. More advanced search functions and curated information are available through the subscription-based CSD System. Besides the CSD System, the structure files may be viewed using one of several open-source computer programs, such as Jmol. Other free programs include MDL Chime, PyMOL, UCSF Chimera, RasMol and WinGX. The CCDC also provides a free version of its visualization program, Mercury. Since 2015, Mercury has also been able to generate 3D-print-ready files from structures in the CSD.


See also

* Crystallographic database
* Mercury
* Protein structure




External links


* The Cambridge Crystallographic Data Centre (CCDC), parent site of the CSD