PubChem adds a “legacy” designation for outdated data

Sometimes information provided to PubChem by data contributors becomes outdated.  To address this, PubChem is introducing a “legacy” designation for collections that are not regularly updated.  This “legacy” designation applies to projects and contributors that appear to no longer be active, as well as to their individual records.  This designation will help PubChem users quickly identify records that may have out-of-date information and/or hyperlinks.

Why a “legacy” designation?

As an archive, PubChem accepts scientific data from contributors and maintains that data even if the contributing project is discontinued. While this helps ensure community access to the information lasts beyond the lifetime of a given scientific endeavor, the archival nature of PubChem does not allow anyone other than the data contributor to modify provided information.  Therefore, some records in PubChem can persist with outdated (or incorrect) data.  To help identify such cases, we are introducing a “legacy” indication for contributors and their records.  Please note that this does not mean that data identified as “legacy” is without value.  Quite to the contrary, some legacy collections successfully collected valuable scientific data for the research community, and are simply no longer updating the information.

How is a “legacy” designation determined?

A “legacy” designation is arrived at via a procedure that is partly manual and partly automated.  It involves examining contributor account information, individual records, and user reports.  For example, if the depositor website does not work for a period of time, attempts are made to contact the submitting organization.  If PubChem staff are unable to make contact with the data contributor or if an organization is no longer updating records, a legacy designation may be initiated.  Please note that a “legacy” designation can be removed at any time, once contact is reestablished and updates resume.

Impacts of legacy designation?

If a data contributor is designated as “legacy”, all records deposited by the contributor are also designated as “legacy”.  While still searchable, these records will clearly indicate that they are “legacy”.  Please note that “legacy” records will not be shown in the “Chemical Vendors” section of Compound Summary pages.  In addition, in the “Substances by Category” section of the Compound Summary page, “legacy” substance records will be found only under “Legacy Depositors”.

Future plans?

The way PubChem implements both manual and automated processes to ascertain a “legacy” indication will likely evolve over time.  In addition, we are looking at the possibility of enabling users to separate out legacy records when searching and analyzing the database.

Laboratory Chemical Safety Summary (LCSS) views now available in PubChem

The PubChem Laboratory Chemical Safety Summary (LCSS) provides pertinent chemical health and safety data for a given PubChem Compound record.  The PubChem LCSS is a community effort involving professionals in health and safety, chemistry librarianship, informatics, and other specialties.

What is LCSS?

The LCSS is based on the format described by the National Research Council in Prudent Practices in the Laboratory: Handling and Management of Chemical Hazards.  Information contained in the PubChem LCSS is a subset of the PubChem Compound summary page content.  It includes a summary of hazard and safety information for a chemical, such as flammability, toxicity, exposure limits, exposure symptoms, first aid, handling, and clean up.

How can I access LCSS?

An LCSS is available for PubChem Compound records with a GHS hazard classification (Globally Harmonized System of Classification and Labeling of Chemicals).  If a PubChem Compound record has an LCSS, the link to view it is provided at the top of the page under the heading “Safety Summary”.  In addition, one can get the complete list of chemicals with an LCSS by visiting the PubChem LCSS webpage or by using the PubChem Classification Browser.

To learn more about LCSS in PubChem, please explore the following webpages:

Significant Update to PubChemRDF!

PubChemRDF 1.5β is now available.  The new version is faster, supports linked data in new formats, features improved search and query functions, and contains new links.

What is PubChemRDF?

PubChemRDF expresses data in a Resource Description Framework (RDF) format using ontological frameworks and semantic web technologies.  It facilitates data sharing and analysis, and integrates with other National Center for Biotechnology Information (NCBI) resources along with external resources across scientific domains.  To learn more about this project, please see our earlier blog post and PubChemRDF release notes.

What is new in PubChemRDF 1.5β?

The 1.5β release contains a number of new features and technological improvements including:

  • Faster Speed
    PubChemRDF data is now served from a triple-store and provides a noticeable speed improvement, especially for records with lots of data.  Previously, RDF was generated on the fly from data stored in disparate data systems.
  • Addition of MeSH
    Major improvements were made to the reference subdomain.  Most notable is the addition of Medical Subject Heading (MeSH) annotation of PubMed records.  This includes MeSH topical descriptors (with optional qualifier) that indicate the subject of an article and MeSH (supplementary) concepts that indicate things like chemicals and diseases discussed in an article.
  • Direct links to authoritative RDF resources
    PubChemRDF now enhances cross-integration by providing direct links to available authoritative RDF resources within applicable subdomains, including: reference, synonym, and inchikey to MeSH RDF; protein to UniProt RDF; protein and substance to PDB RDF; biosystem to Reactome RDF; substance to ChEMBL RDF; and compound to WikiData RDF.  For example, the links to PDB RDF help to distinguish proteins and associated chemical substances found in a Protein Data Bank (PDB) crystal structure.
  • Addition of ‘concept’ subdomain
    A new ‘concept’ subdomain provides the means to annotate PubChemRDF subdomains.  For example, annotation between nodes within the concept subdomain allows a hierarchy of concepts to be created, such as those in the WHO ATC classification.  These can then be applied, such as in the case of adding links from chemical substance synonyms to a WHO ATC classification to indicate its therapeutic and pharmacological properties.
  • New links added between the compound and biosystem subdomains
    Previously, the biosystem subdomain linked only to the protein subdomain.  The added links between the compound and biosystem subdomains help to indicate the chemical structure involved in a given pathway.
  • Support for protein complexes
    Protein complex targets are now distinguished within the bioassay subdomain and are linked to the component protein units.
  • Linked Data using JSON
    JSON-LD (or JavaScript Object Notation for Linked Data) is a method of transporting Linked Data using JSON. This addition helps those wanting to use JSON formatted data, for example, with JavaScript.
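
As a rough illustration of how JSON-LD carries linked data, the sketch below flattens a small JSON-LD node into subject/predicate/object triples using only the Python standard library.  The sample document, its identifiers, and the helper names `expand` and `to_triples` are hypothetical placeholders rather than actual PubChemRDF output, and the prefix handling is a deliberate simplification, not a full JSON-LD processor.

```python
import json

# Hypothetical JSON-LD document; the identifiers and property names are
# illustrative placeholders, not actual PubChemRDF output.
SAMPLE = """
{
  "@context": {"ex": "http://example.org/vocab/"},
  "@id": "http://example.org/compound/CID0001",
  "@type": "ex:Compound",
  "ex:label": "example compound",
  "ex:partOf": {"@id": "http://example.org/pathway/P1"}
}
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand(term, context):
    """Expand a 'prefix:name' compact IRI using the @context prefix map."""
    prefix, sep, name = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + name
    return term  # already a full IRI, or an unknown prefix


def to_triples(doc):
    """Return (subject, predicate, object) tuples for one flat JSON-LD node."""
    context = doc.get("@context", {})
    subject = doc["@id"]
    triples = []
    for key, value in doc.items():
        if key in ("@context", "@id"):
            continue
        if key == "@type":
            triples.append((subject, RDF_TYPE, expand(value, context)))
        else:
            # A nested {"@id": ...} object is a link to another resource;
            # anything else is treated here as a plain literal.
            obj = value["@id"] if isinstance(value, dict) else value
            triples.append((subject, expand(key, context), obj))
    return triples


triples = to_triples(json.loads(SAMPLE))
for triple in triples:
    print(triple)
```

Because the payload is ordinary JSON, the same structure can be read in JavaScript or any other language with a JSON parser; a dedicated JSON-LD library would be the safer choice for real data.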

Where can I learn more about PubChemRDF?

To read more on this topic, please consider exploring these links:

Substance Record Page Released

The PubChem Substance Record page is now available.  It complements an update of the PubChem Compound Summary page released six months ago.

What is the Substance Record Page?

PubChem organizes its data into three databases: Substance, Compound, and BioAssay.  PubChem Substance (accession SID) contains nearly 200 million chemical substance descriptions provided by hundreds of data contributors.  Each record has a webpage that displays contributed information provided by an individual contributor about a particular chemical substance.  This page is called the Substance Record page, and it replaces the PubChem Substance Summary page.

What changed?

The key improvements include:

  • Technology refresh
    As with the recent update to the Compound Summary page, this new page loads much faster by minimizing the amount of data and the time to respond to requests.  The new interface is optimized for both touch- and mouse-based devices.  Using a responsive design, it automatically adapts to the available screen size, making it friendly for desktops, tablets, and phones.
  • What you see, is what we got
    The new page is renamed the Substance Record page as it clearly shows the information provided to PubChem by the contributor.  The older page was called the PubChem Substance Summary page and included additional derived annotation (making it confusing to understand what the contributor provided) and a direct interface to the Compound Summary page (adding user confusion as to the difference between a compound and substance record).
  • URL change
    The old URLs from the Substance Summary page will automatically redirect to the new location for the Substance Record page.  For example, the URL for the Substance Record for SID 12345 is now:

Future plans?

The legacy Substance Summary page will be accessible until October 1, 2015; however, a redirect to the new pages will remain in place.

Our next focus will be on redesigning the BioAssay Summary page.

Compound Summary Page Redesigned

A revamped PubChem Compound Summary page is now available.  Technology has advanced considerably since the last major update in 2011, so this page was given a substantial makeover.

What is the Compound Summary Page?

PubChem is organized as three interconnected databases: Substance, Compound, and BioAssay.  PubChem Compound contains the unique chemical structure content of PubChem Substance, after normalization processing.  Each individual PubChem Compound record (accession CID) has a web page called a “Compound Summary” that recaps all information known about a particular chemical.  For example, take a look at the Compound Summary page for aspirin:

What changed?

Quite a lot.  Improvements were made in four primary areas:

Look and Feel

  • Emphasis on speed
    Considerable attention was focused on making the page load faster.  This was achieved by both reducing the amount of data required to display the page and by improving the PubChem computational infrastructure which reduced the time to respond to requests.
  • Universally device-friendly
    The new interface uses a responsive design approach and is optimized for both touch- and mouse-based interfaces.  In addition, the new page automatically adjusts to the available screen size, making it friendly for desktops, tablets, and phones.
  • Tool tips
    Each section and subsection now has its own built-in help; clicking the “?” immediately explains what that section or subsection is about.
  • Provenance information
    Annotation sources are now prominently highlighted.  The organization providing the data is indicated to the right of the assertion.  Additionally, when provided by the data source, a bibliographic citation is displayed under the assertion.  Lastly, an overall list of annotation sources for the entire Compound Summary page is available in a new “Information Sources” section at the bottom of the web page, helping to summarize additional reference sources.


  • Icon bar
    An icon bar at the top summarizes key types of available information and provides the means to quickly jump to that section.
  • Table of Contents
    The table of contents was reorganized, optimized, and expanded.  On larger screens, the table of contents is now ‘sticky’, staying with you as you move around the web page, making for easier navigation.  On smaller screens, the table of contents is available when clicking the three horizontal bars.
  • Bookmark a given section
    As you navigate to a given section or subsection, the URL changes making it easier to bookmark or share a direct link to a specific part of the page.


New Annotations

  • MeSH synonyms
    Medical Subject Heading (MeSH) is used to manually cross-index the biomedical literature.  Chemical names found in MeSH for the chemical record are now displayed in their own section.
  • FDA
    Unique Ingredient Identifiers (UNIIs) and pharmacological classifications were added from the U.S. Food and Drug Administration (FDA).
  • WHO
    Anatomical Therapeutic Chemical (ATC) codes and International Nonproprietary Names (INNs) were added from the World Health Organization (WHO).
  • Chemical, Safety, and Toxicology
    Expanded physical properties, safety, and toxicology information from various organizations (including NIH, NIOSH, OSHA, EPA, and ICSC) was added.
  • Other Identifiers
    A new section includes key identifiers for a given chemical.  These include FDA Unique Ingredient Identifiers (UNIIs), European Community (EC) numbers, International Chemical Safety Card (ICSC) number, and more.

New Features

  • Downloadable record data
    The available annotation information can now be downloaded in multiple formats, including JSON and XML.
  • Improved printing
    Special attention was paid to format and layout for printing. For example, like book chapters, each section now starts on its own page.  In addition, a “print” button was added.
  • Chemical Vendors
    The chemical vendor section was revamped and added into the table of contents.
  • Added ‘widgets’
    Interactive interfaces are now provided for PDB and pathway data.  In addition, the 2D and 3D chemical structures now have their own respective sections and interfaces.
  • URL interface
    The URL for the Compound Summary page has changed. While old-style URLs will automatically be redirected, the new URL interface can be used to text search PubChem.  For example, these will all work to find the chemical structure of ‘Aspirin’:






Future plans?

This update to the Compound Summary page reflects a new approach to PubChem web pages.  An update to the Substance Summary page using this new framework is anticipated in early 2015.  The old version of the Compound Summary page will remain accessible for now but will be retired within a year. To access the old version of the page, the parameter “&r=summary” must be added to the URL.  For example, to access the old version of the Compound Summary page for Aspirin:


Ten years of service

September 16, 2004 is a special day in the history of PubChem.  It marks the beginning of PubChem as an on-line resource.  Now fast forward ten years.  PubChem provides information daily to many tens of thousands of users.  Despite the passage of time, PubChem’s primary mission remains the same: providing comprehensive information on the biological activities of chemical substances.


PubChem has faced many challenges over the years.  Chief among them is scalability.  For example, within the first year of operation, the amount of available data in PubChem more than doubled.  To this day, the growth of contributors and data remains very strong, with hundreds of contributing organizations, 20% of which provide biological activity information to PubChem.  These data providers represent a highly varied cross-section of academic, commercial, and governmental entities.  Combined, they have contributed information on a significant fraction of all known organic small molecule chemical entities, numbering in the tens of millions.

PubChem was created to archive the output of the recently concluded Molecular Libraries Program (MLP) high-throughput screening (HTS) initiative.  Most of the biological activity results in PubChem (>95%) are from MLP HTS centers; however, it is interesting to note that MLP represents only a small fraction (<1%) of the biological experiments.  All told, there are over 225 million publicly available biological activity reports in PubChem, with approximately two million chemicals having some form of biological testing data.  In addition, RNAi screening experiments are increasingly found in PubChem.


Providing chemical information to researchers in the biomedical science community is a key part of PubChem’s purpose.  Over the years, PubChem introduced and incrementally developed several interfaces, each with its own distinct purpose and set of use cases.  Primary among these is the Entrez search interface, where PubChem is organized as three distinct databases: Substance, Compound, and BioAssay.  Substance provides substance descriptions (accession number: SID), Compound provides the unique small-molecule chemical content of Substance (accession number: CID), and BioAssay provides biological experiment results for substances (accession number: AID).  [Go here to learn more about the difference between Substance and Compound.]  Each of these databases has an advanced search interface and contains numerous indexes and filters, which can be combined to construct elaborate queries.  Additional interfaces exist to search and analyze information in PubChem, including the ability to analyze bioactivity information, download chemical and assay data, search by chemical structure or protein sequence, navigate using integrated classifications, visualize chemical 3-D information, and more.

PubChem continues to evolve the way it provides on-line content.  External search engines (like Google, Bing, and others) are now a key way in which researchers locate data.  In addition, programmatic interfaces now account for a significant portion of PubChem’s overall usage (+50%).  Key programmatic interfaces to PubChem include Entrez Utilities and PUG/REST.
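
To make the two programmatic interfaces concrete, here is a minimal Python sketch that only builds request URLs and makes no network calls.  The URL patterns reflect the publicly documented PUG REST and Entrez Utilities services as we understand them and should be checked against the current documentation; the helper names `pug_rest_property_url` and `esearch_url` are hypothetical and not part of any PubChem library.

```python
from urllib.parse import quote, urlencode

# Base addresses of the two services (assumed current; verify before use).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def pug_rest_property_url(name, properties, fmt="JSON"):
    """Build a PUG REST URL requesting properties for a compound name."""
    props = ",".join(properties)
    # safe="" percent-encodes every reserved character in the name,
    # including "/" which would otherwise split the URL path.
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/property/{props}/{fmt}"


def esearch_url(db, term):
    """Build an Entrez Utilities ESearch URL for a PubChem Entrez database.

    The Entrez database names are pcsubstance, pccompound, and pcassay.
    """
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": term})


print(pug_rest_property_url("aspirin", ["MolecularFormula", "MolecularWeight"]))
print(esearch_url("pccompound", "aspirin"))
```

The resulting URLs can be fetched with any HTTP client.  Building them separately from the request keeps the example testable offline and makes it easy to log exactly what a script asks for.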


The world of information is forever changing and improving.  If the past ten years are any indication of what the future will bring to PubChem, the next ten are sure to be very exciting, with more data from a greater number of sources, additional types of data, increased annotation, improved interfaces, and advancements in ease of access.  With your support as contributors and users, PubChem will continue to serve the needs of the community.

Why contribute your data to PubChem?

PubChem is an open archive of chemical substances and their biological experimental results.  “Open” means that you can put your scientific data in PubChem and that others may use it. What kinds of chemical substances can you provide information about?  All kinds, including small molecule chemicals, RNAs, carbohydrates, peptides, complex mixtures, natural products, and more. And you can also provide the results of your biological experiments with these substances. Appropriate biological experimental results include biological assay screens (such as phenotypic, whole cell, defined target, high throughput, dose-response, validation, etc.), physical property measurements, and beyond.

There are many reasons to upload your data to PubChem:

  • Maximize the benefit of your research.
    When research data is made publicly available, it helps to promote new scientific discovery. Other researchers can find your data, use it, and build upon it. This can lead to new research collaborations and improved insight into your results, thus helping to increase the impact of your research efforts and advance science more quickly. 
  • Save time and effort in open-access data sharing.
    Maintaining your own data archive and user interface takes precious time and adds to research costs. Data sharing requirements by journals and granting agencies may be satisfied by use of the PubChem data archiving platform. PubChem provides high-capacity interfaces, so you know your data will be accessible. Given that PubChem is part of NLM, you can rest assured that your data will be preserved and available without (login or paywall) barriers, now and for the foreseeable future. 
  • Maintain control over when your data becomes public.
    Timing when you release scientific data can be critical.  Release data too soon and you might not be able to file a patent or publish a paper.  If you need to time the release of your data with the publication of a paper, the filing of a patent, or in coordination with a grant administrator, you can set a hold-until date of up to one year in the future.  If anything changes, your hold-until date can be adjusted (shortened or extended). 
  • Share your held data with only those you choose.
    When you first submit data to PubChem, you are assigned stable identifiers for your substances and bioassays.  These identifiers can be used to prove that you have submitted data to PubChem even if the data are not yet publicly viewable.  If your data are on hold, you can log in to your PubChem Upload account and dynamically create unique, private URLs to individual data submissions to share with reviewers and collaborators.  At any moment during the hold-until period, you can revoke access to these URLs.

Sharing scientific data is important. PubChem is upgrading its service to make it easier than ever to rapidly upload information about your chemical substances and biological experiment results. Scientific data, however, can be complex. PubChem Upload provides wizards to help guide you through the process of making data public.  In addition, standard spreadsheet formats and private FTP uploads for large datasets help to streamline data submission.

For more information, please see the following:

If you have any questions concerning these topics, please contact us via email at