Publisher Springer Nature contributes millions of chemical-article links

PubChem added more than 26 million links to scientific articles, thanks to contributions from the publisher Springer Nature.  Of these, 1.6 million links point to open access or free-to-read documents! (Read Springer Nature’s press release and presentation about it.)

Springer Nature includes the SpringerLink, SpringerOpen, and BioMed Central research platforms as well as the nature.com website.  Combined, they include more than 10 million scientific documents spanning the primary literature, book chapters, and reference works.  InfoChem, a subsidiary of Springer Nature, identified the chemicals mentioned in these scientific articles using a proprietary approach.

 

What was contributed?

The Springer Nature data collection in PubChem covers over 600 thousand chemical substance records, and contains nearly 4 million scientific article descriptions (of which almost 300 thousand are open- or free-access) and 26.8 million links between chemicals and articles.  The document descriptions include information such as a digital object identifier (DOI), publication title, name of the journal or book, document type, subject matter classification, language, open/free access availability, and publication year.

 

Why is this important?

This contribution, which doubles the number of chemical structures in PubChem with links to the scientific literature, improves the accessibility and discoverability of information about chemicals. Nearly all link content provided by Springer Nature is novel to PubChem, with only 10% of the provided chemical structures having a previous link to the scientific literature.

Integration of the Springer Nature links and data within PubChem has opened new possibilities for organizations and researchers. As a result of the contribution, PubChem added the capability to handle DOI-based annotation content. Additional appropriate DOI-based linked content (articles, data sets, and more) can now be added to PubChem.

 

What is Springer Nature?

PubChem Source page for Springer Nature

Springer Nature is a leading global publisher of research, educational, and professional content, formed through the merger of Nature Publishing Group, Palgrave Macmillan, Macmillan Education, and Springer Science+Business Media.

The SpringerLink research platform provides access to more than 6 million journal articles, 3.7 million book chapters, and more than 480,000 reference works primarily in the areas of science, technology, and medicine.

 

Where can I access the contributed content?

Springer Nature References for CID 19

Each chemical record with a Literature “Springer Nature References” section includes a table containing document links from Springer Nature.  As an example, below are the links to the Springer Nature References section for aspirin (Compound ID 2244 and Substance ID 341138876).  (Read this blog if you are not familiar with how Compounds and Substances in PubChem are different from each other.)

https://pubchem.ncbi.nlm.nih.gov/compound/2244#section=Springer-Nature-References

https://pubchem.ncbi.nlm.nih.gov/substance/341138876#section=Springer-Nature-References

Click an article title to access the document on a Springer Nature website.  To download all the contributed document data for a chemical record in CSV format, click the “Download” button at the top right of the table (see image).  A full table data view is also available (by clicking the corresponding icon), where you can see additional data columns such as the DOI.  By default, the articles are ordered by degree of “relevance” to the chemical as provided by Springer Nature, but the sorting field can be changed through a pulldown menu, and the sort direction can be changed as well.
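The two section links shown above follow the same pattern: the record type, the record identifier, and a "#section=Springer-Nature-References" fragment. As a minimal sketch, assuming that pattern holds for other records that have this section, the link can be assembled in Python. The function name is hypothetical and not part of any PubChem library.

```python
PUBCHEM_BASE = "https://pubchem.ncbi.nlm.nih.gov"
SECTION_FRAGMENT = "Springer-Nature-References"


def springer_nature_section_url(record_type, record_id):
    """Build a link to the Springer Nature References section of a record.

    record_type is "compound" (CID) or "substance" (SID); this hypothetical
    helper only mirrors the two example URLs given in the post.
    """
    if record_type not in ("compound", "substance"):
        raise ValueError("record_type must be 'compound' or 'substance'")
    return f"{PUBCHEM_BASE}/{record_type}/{int(record_id)}#section={SECTION_FRAGMENT}"


# Aspirin, using the identifiers quoted in the post.
print(springer_nature_section_url("compound", 2244))
print(springer_nature_section_url("substance", 341138876))
```

A record without Springer Nature links will simply open at the top of its page, so the sketch does not check whether the section exists.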

 

How to find chemical records with the Springer Nature references?

There are multiple ways to get a complete list of PubChem Substance or Compound records with “Springer Nature References”.  Two of them are described below.

The PubChem Classification Browser provides the means to navigate PubChem contents using various hierarchical classification trees.  The PubChem Compound TOC (Table of Contents) classification tree allows you to find all chemicals with a given annotation section.  In this case, one can click ‘Literature’ to view the subset fields under literature and find the ‘Springer Nature References’ section.  Clicking on the number will then show compound records with that section.

The entire list of chemical substances provided by Springer Nature is available using the PubChem Data Sources page. (Read this blog to learn more about the PubChem Data Sources page.)  Searching for “Springer Nature” from the list of data sources shown on the page will help lead you to the Springer Nature data source page that has a link to the PubChem records provided by Springer Nature.  Alternatively, you can search the PubChem Compound or PubChem Substance database using the query: “Springer Nature”[sourcename].
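The “Springer Nature”[sourcename] query can also be prepared for programmatic use. The sketch below only builds a request URL for the NCBI Entrez E-utilities "esearch" endpoint; using E-utilities is an assumption on our part (the post itself describes the web search box), and the function name is hypothetical. No network request is made.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed route; the post uses the web interface).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Entrez database names for the two PubChem record types.
ENTREZ_DB = {"compound": "pccompound", "substance": "pcsubstance"}


def source_query_url(source_name, record_type="compound", retmax=20):
    """Return an esearch URL for records contributed by a named data source."""
    term = f'"{source_name}"[sourcename]'  # same query syntax as in the post
    params = {"db": ENTREZ_DB[record_type], "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


print(source_query_url("Springer Nature", "substance"))
```

The quotes, space, and square brackets in the query are percent-encoded by urlencode, so the URL is safe to paste into a browser or pass to an HTTP client.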

Introducing the PubChem Target Summary

Have you ever wanted a convenient way  to access information stored in PubChem about a particular biological target? We’ve created the PubChem Target Summary page to help you easily explore PubChem content.

 

Why a PubChem Target Summary page?

PubChem Target Page for EGFR

PubChem contains a wealth of information about chemical substances.  Included in this massive corpus are more than 230 million biological activity results from more than one million biological experiments deposited in the BioAssay database.  Finding all relevant information for a given target can involve a lot of clicking!

This new page provides the means to readily navigate and download PubChem content using a gene-centric data view.  In addition, other pertinent annotation is provided to help give context to the biological target relative to the available PubChem content.  This includes information such as:

  • Protein targets encoded by the gene (i.e., protein gene-products)
  • Known drugs, chemical probes, ligands, and compounds tested against the gene or gene-products
  • Available small molecule and RNAi biological assay experiments for the gene or gene-products
  • Annotated information about the gene and gene-products, such as: biological function, relevance to disease, gene/protein family classifications, gene-gene interactions, and pathways

 

How do I access the new PubChem Target Summary page?

Each PubChem BioAssay Record page with a gene/protein target now has a link to its corresponding PubChem Target Summary page.  This can be located under the “BioAssay Target” section of the Assay Record page.  (See image)

PubChem Target Page for EGFR

In addition, the PubChem Target Summary page for a given gene can be accessed via a web URL that contains the corresponding NCBI Gene ID or Gene Symbol.  For example, the following URLs will give the same PubChem Target Summary page for the human epidermal growth factor receptor (EGFR) gene (Gene ID 1956):

https://pubchem.ncbi.nlm.nih.gov/target/gene/1956

https://pubchem.ncbi.nlm.nih.gov/target/gene/EGFR

In this URL scheme, by default, using a gene symbol will yield the corresponding human gene target.  If PubChem does not have content for the human gene, an orthologous gene with available content will be provided.  If there is more than one orthologous gene, the ortholog with the smallest NCBI Taxonomy identifier is used.  One can navigate between PubChem Target Summary pages for orthologous genes using the “Orthologs” link provided in the summary table at the top of each target page.  (See image)
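The symbol-resolution rule described above (human gene first, otherwise the ortholog with the smallest NCBI Taxonomy identifier that has content) can be sketched as follows. This is only an illustration of the stated rule, not PubChem's actual implementation; the record layout, function names, and the non-human gene IDs are hypothetical placeholders.

```python
from typing import Optional

HUMAN_TAXID = 9606  # NCBI Taxonomy identifier for Homo sapiens


def target_summary_url(gene):
    """Build a Target Summary URL from an NCBI Gene ID or a gene symbol."""
    return f"https://pubchem.ncbi.nlm.nih.gov/target/gene/{gene}"


def pick_gene_for_symbol(candidates) -> Optional[dict]:
    """Choose which gene a symbol resolves to.

    candidates: list of dicts with keys 'gene_id', 'taxid', 'has_content'
    (a hypothetical layout used only for this sketch).
    """
    with_content = [g for g in candidates if g["has_content"]]
    if not with_content:
        return None
    for gene in with_content:
        if gene["taxid"] == HUMAN_TAXID:
            return gene  # the human gene is the default
    # Otherwise fall back to the ortholog with the smallest Taxonomy ID.
    return min(with_content, key=lambda g: g["taxid"])


# Gene ID 1956 is human EGFR (from the post); the other IDs are placeholders.
genes = [
    {"gene_id": 1956, "taxid": 9606, "has_content": False},
    {"gene_id": 900001, "taxid": 10116, "has_content": True},  # rat
    {"gene_id": 900002, "taxid": 10090, "has_content": True},  # mouse
]
chosen = pick_gene_for_symbol(genes)
print(target_summary_url(chosen["gene_id"]))
```

In this made-up case the human gene has no content, so the mouse entry (Taxonomy ID 10090) wins over the rat entry (10116).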

Future directions

The types of targets covered by the PubChem Target Summary page may be expanded to include other known target types, such as cell lines and pathways.  In addition, improvements may be made to expand the annotation content and utility of the PubChem Target Summary page.

Spectral Information in PubChem

Did you know that a growing number of chemicals in PubChem contain spectral information?

There are now more than 300 thousand chemicals with spectral information available, including 13C NMR, 1H NMR, 2D NMR, ATR-IR, FT-IR, GC-MS, Raman, UV-Vis, vapor-phase IR, and more.  This content comes primarily from four data sources: the NIST/EPA/NIH Mass Spectral Library, the Hazardous Substances Data Bank (HSDB), the Human Metabolome Database (HMDB), and a new addition, SpectraBase.

The NIST/EPA/NIH Mass Spectral Library includes images of the top-three peaks of GC-MS or MS-MS spectra along with related metadata and annotation for more than two hundred thousand chemicals. The Library’s annotation includes instrument, collision energy, spectrum type, and associated metadata information.

HSDB content is text-based and includes citations and spectral peak information for thousands of chemicals. HMDB annotation includes links to spectra for thousands of chemicals.

PubChem Spectral_data

SpectraBase, provided by Bio-Rad, a commercial publisher of spectral databases and spectroscopy software, includes images of, annotation about, and links to a diverse set of spectral information for tens of thousands of compounds. SpectraBase content includes extensive annotation and a variety of metadata, such as the instrument, measurement technique, sample source, and spectrum source, in addition to the image of the spectra.

For all four sources, additional spectral information is often available directly from the source and easily accessed using the links on the PubChem page.

 

How to find and access spectral data for a compound?

PubChem records with spectral data have a Table of Contents (TOC) section labelled “Chemical and Physical Properties” with a “Spectral Properties” subsection. One can use the TOC to jump to a given type of spectral data content.  Clicking the SpectraBase image or the HMDB link directs the user to an external web page for that compound, where one can further interact with the spectral information.

PubChem Classification Browser

The PubChem Classification Browser (https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72) can help you locate all PubChem Compound records containing a particular type of spectral information.  The “Spectral Properties” node can be found under the “Chemical and Physical Properties” section in the PubChem Compound Table of Contents (TOC) classification tree.

PubChem presents at the 254th American Chemical Society National Meeting in Washington D.C. (August 20-24, 2017)

On August 20-24, 2017, the 254th American Chemical Society National Meeting will be held in Washington D.C.  The PubChem team will be at the ACS meeting to present new developments and recent changes in PubChem.  Below is a list of presentations that will be given by the PubChem staff.

 

Day 1 (Sunday, August 20)

 

Day 2 (Monday, August 21)

 

Day 3 (Tuesday, August 22)

  • CINF108: PubChem and open data (S. Kim)
    Junior Ballroom 2 – Washington Marriott at Metro Center, 5:00 pm – 5:25 pm

 

Day 4 (Wednesday, August 23)

 

Day 5 (Thursday, August 24)

 

In addition, one of PubChem’s collaborators will give a presentation on our joint effort to develop a new service that provides information on biologics.

PubChem presents at the American Chemical Society National Meeting in San Francisco (April 2-6, 2017)

On April 2-6, 2017, the 253rd American Chemical Society National Meeting will be held in San Francisco, CA, the theme of which is “Advanced Materials, Technologies, Systems & Processes”.  The PubChem team will be at the ACS meeting to present new developments and recent changes in PubChem.  Below is a list of presentations that will be given by the PubChem staff.

 

Day 1 (Sunday, April 2)

 

Day 2 (Monday, April 3)

 

Day 3 (Tuesday, April 4)

 

Day 4 (Wednesday, April 5)

 

Atomic mass changes in PubChem

PubChem is now using the latest International Union of Pure and Applied Chemistry (IUPAC) recommendations for atomic mass and isotopic composition information.  In addition, PubChem is now restricting the allowed isotopes for a given element to those with a half-life of one millisecond or greater.

Fundamental changes within atomic mass information

Normally atomic mass updates are not blog-worthy; however, there are some fundamental changes in the way masses are conceptualized that affect the molecular weight values computed for nearly all compounds in PubChem.

Molecular weight is one of the most frequently requested pieces of information about a chemical.  To compute a molecular weight of a molecule, one consults a periodic chart and sums the average atomic weights of the elements comprising the chemical, while considering any specified isotopic enrichment information.  Although the molecular weight computation seems straightforward, as greater degrees of precision in atomic masses are known, the chemical science community is recognizing complex issues with average atomic weight and isotopic data.

The abundance ratio between different isotopes of a given element is used to determine its average atomic weight.  As the sensitivity of measuring equipment has increased, scientists now notice a distinct difference in these abundance ratios depending on the material source of that element.  To reflect this variation, and as explained in this IUPAC technical report, many elements are now given an atomic weight interval, consisting of a range of known discrete values reflecting the varying isotopic abundance ratios found in different elemental material sources. For example, the atomic weight interval of carbon is 12.0096 to 12.0116.
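To give a feel for what an atomic weight interval means for a whole molecule, the sketch below multiplies the carbon interval quoted above by a carbon count. It considers carbon only; the intervals of the other elements are ignored, so this is an illustration of the arithmetic rather than a full uncertainty estimate. The function name is hypothetical.

```python
# Atomic weight interval of carbon, as quoted in the post.
CARBON_INTERVAL = (12.0096, 12.0116)


def carbon_contribution_range(n_carbons):
    """Return the (low, high) mass contribution of n carbon atoms."""
    low, high = CARBON_INTERVAL
    return n_carbons * low, n_carbons * high


# Aspirin (C9H8O4) has nine carbon atoms.
low, high = carbon_contribution_range(9)
print(f"carbon contributes between {low:.4f} and {high:.4f}")
print(f"spread from carbon alone: {high - low:.4f}")
```

For nine carbon atoms the spread is about 0.018, which already reaches the second decimal place of the molecular weight.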

Another complicating factor is that the abundance ratio of naturally occurring isotopes is not available for all elements.  Some elements, like radon, have neither a stable isotope nor a characteristic isotopic composition in terrestrial materials.  This means that no average atomic weight can be determined!  There is also a growing number of elements that do not exist in nature, being “synthesized” in the lab.  These artificially created elements are unstable, rapidly decaying into other elements.  Importantly, because different isotopes of a given element decay at different rates, the isotopic abundance ratio between isotopes is time-dependent.

All of these considerations contribute to the uncertainty in atomic weight and isotopic information, which in turn impacts the molecular weight of a compound.

What changes did PubChem make?

All molecular weights in the PubChem Compound database were updated as follows:

  • Adoption of “conventional atomic weights”
    To provide a single, representative average atomic-weight value for an element ignoring any material source uncertainties, the latest IUPAC recommendations include a concept of “conventional atomic weight value” whereby most or all atomic-weight variation in normal materials is covered (with an interval of ± 1 in the last digit).  PubChem has adopted this approach for the twelve elements (hydrogen, lithium, boron, carbon, nitrogen, oxygen, magnesium, silicon, sulfur, chlorine, bromine, and thallium) with standard atomic weights given as intervals.
  • Standard atomic weights updated
    Standard atomic weights in PubChem use the latest values provided by IUPAC (except when a conventional atomic weight value is used).  For the thirty-four elements without any abundance information (e.g., technetium), the atomic weight of the most stable, non-theoretical isotope was used, as found in the NuBase2012 evaluation (http://amdc.in2p3.fr/nubase/nubtab12.asc) of nuclear and decay properties.
  • Trimmed precision of molecular weights
    To take into account the uncertainties in elemental abundances and masses, the precision of all molecular weight values was reduced from six to three digits beyond the decimal point.
  • Updated allowed isotopes for elements
    The internal PubChem knowledgebase used to generate the PubChem Compound database from the PubChem Substance database was updated.  (Read this blog if you are not familiar with how these two databases differ from each other.)  As a part of this, only isotopes with an experimentally measured half-life of one millisecond or greater, according to the NuBase2012 evaluation of nuclear and decay properties (http://amdc.in2p3.fr/nubase/nubtab12.asc), are now allowed.  This (slightly) modifies the scope of what can be found in the PubChem Compound database.
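The changes listed above can be illustrated with a short sketch: summing conventional atomic weights, trimming the result to three decimal places, and filtering isotopes by half-life. This is not PubChem's code. The atomic weight values are the conventional values from the 2013 IUPAC report as we understand them and should be checked against that report; the isotope records, their field names, and the use of an infinite half-life to represent stable isotopes are hypothetical.

```python
import math

# Conventional atomic weights for the twelve elements named in the post
# (values assumed from the IUPAC 2013 report; verify before relying on them).
CONVENTIONAL_ATOMIC_WEIGHT = {
    "H": 1.008, "Li": 6.94, "B": 10.81, "C": 12.011,
    "N": 14.007, "O": 15.999, "Mg": 24.305, "Si": 28.085,
    "S": 32.06, "Cl": 35.45, "Br": 79.904, "Tl": 204.38,
}

MIN_HALF_LIFE_SECONDS = 1e-3  # one millisecond, as stated in the post


def molecular_weight(composition):
    """Sum atomic weights for a {symbol: count} dict, rounded to 3 decimals."""
    total = sum(CONVENTIONAL_ATOMIC_WEIGHT[el] * n for el, n in composition.items())
    return round(total, 3)


def allowed_isotopes(isotopes):
    """Keep isotopes whose half-life is at least one millisecond.

    Each record is a dict with 'label' and 'half_life_s'; stable isotopes
    are represented here with math.inf (an assumption of this sketch).
    """
    return [iso for iso in isotopes if iso["half_life_s"] >= MIN_HALF_LIFE_SECONDS]


print(molecular_weight({"C": 9, "H": 8, "O": 4}))  # aspirin, C9H8O4

# Made-up isotope records, for illustration only.
records = [
    {"label": "stable example", "half_life_s": math.inf},
    {"label": "long-lived example", "half_life_s": 3600.0},
    {"label": "short-lived example", "half_life_s": 5e-4},
]
print([iso["label"] for iso in allowed_isotopes(records)])
```

Elements outside the twelve listed here would need their standard atomic weights added to the table before the function could handle them.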

Where can you learn more about this topic?

To learn more about this topic, please read the following:

  • Atomic weights of the elements 2013 (IUPAC Technical Report)
    Meija et al., Pure Appl. Chem. 2016; 88(3): 265-291.
    doi: 10.1515/pac-2015-0305
  • Isotopic compositions of the elements 2013 (IUPAC Technical Report)
    Meija et al., Pure Appl. Chem. 2016; 88(3): 293-306.
    doi: 10.1515/pac-2015-0503

New PubChem Data Sources Page

The PubChem Data Sources page has been updated.

What is the Data Sources page?

As an archive, PubChem contains information from hundreds of sources from all over the world.  Contributors can provide different types of content, such as substances, bioassays, and annotation.  The Data Sources page is an interface that helps one to determine, among other things, who provided what information.

What changed?

As a part of an underlying technology update of PubChem, this page has been completely overhauled with a new look and feel.  The categorization describing the organization types providing content was simplified.  Sources of hierarchical classifications and textual annotations are now included.  There is now a unified data source table containing all primary information.  The updated interface provides new and improved capabilities to navigate as a function of data type, category, and country, while also including keyword searching, counts, and geographic visualization.

  • Filtering capability
    A panel (on the left-hand side of the screen) now summarizes (by count) key aspects of PubChem data sources.  By clicking the check boxes, one can filter the data sources listed.

By type
Classification of the type of information provided to PubChem.  This includes the ability to consider data ‘on-hold’ (to be released at a later date).

By category
General-purpose groupings that describe the contributing organization.

By status
Separates active contributors from legacy.  As explained in this post, some contributors or projects no longer exist (although their contributed data may still have substantial utility or value).

By geographic region
PubChem data contributors span the globe.  One can now filter and visualize by country.

PubChem - Nature Chemistry

  • Expanded sorting capability
    The improved Data Sources page allows users to sort by record counts and last-modified date.  For example, sorting by last-modified date helps to identify organizations that recently updated their content.
  • Exploring sources on a mobile device
    As with other PubChem pages developed in recent years, great effort was taken to make the page adapt to the unique experience of mobile devices.  This means that, without sacrificing features, the layout scales and its complexity adjusts to match the screen size.
  • Improved individual data source page
    If you click on a data source link in PubChem, it now directs you to a dedicated page for that depositor.  Beyond showing contact information with its location displayed on a Google Map, it provides the date content was last updated and the current counts of submitted records.