Compound Summary Page Redesigned

A revamped PubChem Compound Summary page is now available.  Technology has advanced considerably since the last major update in 2011, so this page was given a substantial makeover.

What is the Compound Summary Page?

PubChem is organized as three interconnected databases: Substance, Compound, and BioAssay.  PubChem Compound contains the unique chemical structure content of PubChem Substance, after normalization processing.  Each individual PubChem Compound record (accession CID) has a web page called a “Compound Summary” that recaps all information known about a particular chemical.  For example, take a look at the Compound Summary page for aspirin:

What changed?

Quite a lot.  Improvements were made in four primary areas:

Look and Feel

  • Emphasis on speed
    Considerable attention was focused on making the page load faster.  This was achieved both by reducing the amount of data required to display the page and by improving the PubChem computational infrastructure, which reduced the time to respond to requests.
  • Universally device-friendly
    The new interface uses a responsive design approach and is optimized for both touch- and mouse-based interfaces.  In addition, the new page automatically adjusts to the available screen size, making it friendly for desktops, tablets, and phones.
  • Tool tips
    Each section and subsection now has its own built-in help.  Clicking the “?” immediately shows what that section or subsection is about.
  • Provenance information
    Annotation sources are now prominently highlighted.  The organization providing the data is indicated to the right of the assertion.  Additionally, when provided by the data source, a bibliographic citation is displayed under the assertion.  Lastly, an overall list of annotation sources for the entire Compound Summary page is available in a new “Information Sources” section at the bottom of the web page, helping to summarize additional reference sources.


  • Icon bar
    An icon bar at the top summarizes key types of available information and provides the means to quickly jump to that section.
  • Table of Contents
    The table of contents was reorganized, optimized, and expanded.  On larger screens, the table of contents is now ‘sticky’, staying with you as you move around the web page, making for easier navigation.  On smaller screens, the table of contents is available when clicking the three horizontal bars.
  • Bookmark a given section
    As you navigate to a given section or subsection, the URL changes, making it easier to bookmark or share a direct link to a specific part of the page.


New Annotations

  • MeSH synonyms
    Medical Subject Heading (MeSH) is used to manually cross-index the biomedical literature.  Chemical names found in MeSH for the chemical record are now displayed in their own section.
  • FDA
    Unique Ingredient Identifiers (UNIIs) and pharmacological classifications were added from the U.S. Food and Drug Administration (FDA).
  • WHO
    Anatomical Therapeutic Chemical (ATC) codes and International Nonproprietary Names (INNs) were added from the World Health Organization (WHO).
  • Chemical, Safety, and Toxicology
    Expanded physical properties, safety, and toxicology information from various organizations (including NIH, NIOSH, OSHA, EPA, and ICSC) was added.
  • Other Identifiers
    A new section includes key identifiers for a given chemical.  These include FDA Unique Ingredient Identifiers (UNIIs), European Community (EC) numbers, International Chemical Safety Card (ICSC) numbers, and more.

New Features

  • Downloadable record data
    The available annotation information can now be downloaded in multiple formats, including JSON and XML.
  • Improved printing
    Special attention was paid to format and layout for printing. For example, like book chapters, each section now starts on its own page.  In addition, a “print” button was added.
  • Chemical Vendors
    The chemical vendor section was revamped and added into the table of contents.
  • Added ‘widgets’
    Interactive interfaces are now provided for PDB and pathway data.  In addition, the 2D and 3D chemical structures now have their own respective sections and interfaces.
  • URL interface
    The URL for the Compound Summary page has changed. While old-style URLs will automatically be redirected, the new URL interface can be used to text search PubChem.  For example, these will all work to find the chemical structure of ‘Aspirin’:






Future plans?

This update to the Compound Summary page reflects a new approach to PubChem web pages.  An update to the Substance Summary page using this new framework is anticipated in early 2015.  The old version of the Compound Summary page will remain accessible for now but will be retired within a year. To access the old version of the page, add the parameter “&r=summary” to the URL.  For example, to access the old version of the Compound Summary page for Aspirin:


Ten years of service

September 16, 2004 is a special day in the history of PubChem.  It marks the beginning of PubChem as an on-line resource.  Now fast forward ten years.  PubChem provides information daily to many tens of thousands of users.  Despite the passage of time, PubChem’s primary mission remains the same: providing comprehensive information on the biological activities of chemical substances.


PubChem has faced many challenges over the years.  Chief among them is scalability.  For example, within the first year of operation, the amount of available data in PubChem more than doubled.  To this day, the growth of contributors and data remains very strong, with hundreds of contributing organizations, 20% of which provide biological activity information to PubChem.  These data providers represent a highly varied cross-section of academic, commercial, and governmental entities.  Combined, they have contributed information on a significant fraction of all known organic small molecule chemical entities, numbering in the tens of millions.
PubChem was created to archive the output of the recently concluded Molecular Libraries Program (MLP) high-throughput screening (HTS) initiative.  Most of the biological activity results in PubChem (>95%) are from MLP HTS centers; however, it is interesting to note that MLP represents only a small fraction (<1%) of the biological experiments.  All told, there are over 225 million publicly available biological activity reports in PubChem, with approximately two million chemicals having some form of biological testing data.  In addition, RNAi screening experiments are increasingly found in PubChem.


Providing chemical information to researchers in the biomedical science community is a key part of PubChem’s purpose.  Over the years, PubChem introduced and incrementally developed several interfaces, each with its own distinct purpose and set of use cases.  Primary among these is the Entrez search interface, where PubChem is organized as three distinct databases: Substance, Compound, and BioAssay.  Substance provides substance descriptions (accession number: SID), Compound provides the unique small-molecule chemical content of Substance (accession number: CID), and BioAssay provides biological experiment results for substances (accession number: AID).  [Go here to learn more about the difference between Substance and Compound.]  Each of these databases has an advanced search interface and contains numerous indexes and filters, which can be combined to construct elaborate queries.  Additional interfaces exist to search and analyze information in PubChem, including the ability to analyze bioactivity information, download chemical and assay data, search by chemical structure or protein sequence, navigate using integrated classifications, visualize chemical 3-D information, and more.

PubChem continues to evolve the way it provides on-line content.  External search engines (like Google, Bing, and others) are now a key way in which researchers locate data.  In addition, programmatic interfaces now account for a significant portion of PubChem’s overall usage (more than 50%).  Key programmatic interfaces to PubChem include Entrez Utilities and PUG/REST.
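
As a small illustration of programmatic access, the sketch below builds a PUG/REST request URL that asks for the CIDs matching a chemical name.  It assumes the publicly documented PUG/REST URL pattern (input, operation, output format); the helper name `cids_by_name_url` is hypothetical, and the code only constructs the URL without sending a request.

```python
from urllib.parse import quote

# Base address of the PUG/REST service (assumed from the public documentation).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cids_by_name_url(name: str, output: str = "JSON") -> str:
    """Build a PUG/REST URL requesting the CIDs that match a chemical name.

    The name is percent-encoded so that spaces and slashes are safe in the path.
    """
    return f"{PUG_REST_BASE}/compound/name/{quote(name, safe='')}/cids/{output}"


if __name__ == "__main__":
    print(cids_by_name_url("aspirin"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client; keeping URL construction separate from the network call makes the helper easy to test offline.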


The world of information is forever changing and improving.  If the past ten years are any indication of what the future will bring to PubChem, the next ten are sure to be very exciting, with more data from a greater number of sources, additional types of data, increased annotation, improved interfaces, and advancements in ease of access.  With your support as contributors and users, PubChem will continue to serve the needs of the community.

Why contribute your data to PubChem?

PubChem is an open archive of chemical substances and their biological experimental results.  “Open” means that you can put your scientific data in PubChem and that others may use it. What kinds of chemical substances can you provide information about?  All kinds, including small molecule chemicals, RNAs, carbohydrates, peptides, complex mixtures, natural products, and more. And you can also provide the results of your biological experiments with these substances. Appropriate biological experimental results include biological assay screens (such as phenotypic, whole cell, defined target, high throughput, dose-response, validation, etc.), physical property measurements, and beyond.

There are many reasons to upload your data to PubChem:

  • Maximize the benefit of your research.
    When research data is made publicly available, it helps to promote new scientific discovery. Other researchers can find your data, use it, and build upon it. This can lead to new research collaborations and improved insight into your results, thus helping to increase the impact of your research efforts and advance science more quickly. 
  • Save time and effort in open-access data sharing.
    Maintaining your own data archive and user interface takes precious time and adds to research costs. Data sharing requirements by journals and granting agencies may be satisfied by use of the PubChem data archiving platform. PubChem provides high-capacity interfaces, so you know your data will be accessible. Given that PubChem is part of NLM, you can rest assured that your data will be preserved and available without login or paywall barriers, now and for the foreseeable future.
  • Maintain control over when your data becomes public.
    Timing when you release scientific data can be critical.  Release data too soon and you might not be able to file a patent or publish a paper.  If you need to time the release of your data with the publication of a paper, the filing of a patent, or in coordination with a grant administrator, you can set a hold-until date of up to one year in the future.  If anything changes, your hold-until date can be adjusted (shortened or extended). 
  • Share your held data with only those you choose.
    When you first submit data to PubChem, you are assigned stable identifiers for your substances and bioassays.  These identifiers can be used to prove that you have submitted data to PubChem even if the data are not yet publicly viewable.  If your data are on hold, you can log in to your PubChem Upload account and dynamically create unique, private URLs to individual data submissions to share with reviewers and collaborators.  At any moment during the hold-until period, you can delete access to these URLs.

Sharing scientific data is important. PubChem is upgrading its service to make it easier than ever to rapidly upload information about your chemical substances and biological experiment results. Scientific data, however, can be complex. PubChem Upload provides wizards to help guide you through the process of making data public.  In addition, the use of standard spreadsheet formats and private FTP uploads for large datasets helps to streamline data submission.

For more information, please see the following:

If you have any questions concerning these topics, please contact us via email at

What is the difference between a substance and a compound in PubChem?

PubChem users sometimes ask about the difference between a substance and a compound.  The question is not surprising as the names “substance” and “compound” alone do not inherently convey the difference.  In PubChem terminology, a substance is a chemical sample description provided by a single source and a compound is a normalized chemical structure representation found in one or more contributed substances.  The distinction is important as PubChem is organized in three separate databases: Compound, Substance, and BioAssay.  The diagram below explains the difference, but let’s explore this further.

[Figure: PubChem's three databases]

To understand the different databases in PubChem, it is helpful to know where the information comes from.  PubChem is an open archive of chemical substances and information about their biological activities.  Data is provided by hundreds of contributors, including publishers, researchers, chemical vendors, pharmaceutical companies, and a number of important chemical biology resources.  Each of these data sources contributes a description of chemical substance samples for which they have information.

PubChem calls these community-provided sample descriptions “substances.”  Each record found in the PubChem Substance database contains information provided by an individual contributor about a particular chemical substance.  Substance records are independent of each other.  Two different Substance records (from the same or different providers) could provide different information about the same chemical structure.  For example, one substance record may give information about the biological role of aspirin, while another may give information about a research grade sample of aspirin.  The Substance database maintains the provenance of chemical substance information in PubChem.  It helps users see who provided what.  As a result, there may be many substance records about a given molecule, presenting a problem for users who are interested in an aggregated view of information on the molecule.  This is where the PubChem Compound database comes into play.
[Figure: PubChem substance vs. compound]
The Compound database is derived from the chemical structure contents found in the Substance database.  Each chemical is computationally examined with a series of validation and normalization steps.  This process results in a normalized representation of the chemical structure for a substance record.  Chemical substances in the Substance database that are not completely described or that fail normalization procedures are not included in the Compound database.  Those substances in the Substance database that pass chemical structure normalization procedures are linked to a “compound” record in the Compound database.  If two substances refer to the same chemical structure, they point to the same compound.  This allows data from different Substance data providers to be aggregated through a common Compound record.  Separate substance records nevertheless remain valuable to users, who, for example, might be interested in the provenance of a substance or a particular state of the chemical (e.g., a different tautomeric form).  In essence, a primary purpose of the PubChem Compound database is to provide a “non-redundant” view of the depositor-contributed chemical structure contents stored in the PubChem Substance database.

So, to answer the question posed at the beginning, what is the difference between a substance and a compound?  A substance is a contributed chemical substance sample description from a particular PubChem data provider.  A compound is a normalized chemical structure representation found in one or more contributed substance descriptions.
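
The relationship can be sketched in a few lines of Python.  This is only a toy model: the `normalize` function below is a hypothetical stand-in that merely trims whitespace, whereas PubChem's actual validation and normalization steps are far more involved, and the SIDs and source names are made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Substance(NamedTuple):
    sid: int                   # substance accession (illustrative values only)
    source: str                # contributing organization (made-up names)
    structure: Optional[str]   # depositor-provided structure, None if not described


def normalize(structure: Optional[str]) -> Optional[str]:
    """Toy stand-in for structure normalization: trims whitespace only.

    Returns None when there is no usable structure.
    """
    if structure is None or not structure.strip():
        return None
    return structure.strip()


def group_into_compounds(substances):
    """Map each normalized structure to the list of SIDs that share it."""
    compounds = defaultdict(list)
    for s in substances:
        key = normalize(s.structure)
        if key is not None:    # substances without a usable structure get no compound
            compounds[key].append(s.sid)
    return dict(compounds)


substances = [
    Substance(1, "Vendor A", "CC(=O)OC1=CC=CC=C1C(=O)O"),
    Substance(2, "Database B", " CC(=O)OC1=CC=CC=C1C(=O)O "),
    Substance(3, "Vendor C", None),
]
compounds = group_into_compounds(substances)
```

Here substances 1 and 2 describe the same structure and so point to one compound, while substance 3 has no structure and is linked to none.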

To read more on this topic, please consider exploring these links:

PubChemRDF is Launched

Introducing PubChemRDF!

The PubChemRDF project encodes PubChem information using the Resource Description Framework (RDF).  One of the aims of the PubChemRDF project is to help researchers work with PubChem data on local computing resources using semantic web technologies.  Another aim is to harness ontological frameworks to help facilitate PubChem data sharing, analysis, and integration with resources external to the National Center for Biotechnology Information (NCBI) and across scientific domains.

What is RDF?

RDF stands for Resource Description Framework and constitutes a family of World Wide Web Consortium (W3C) specifications for data interchange on the Web. RDF breaks down knowledge into discrete, machine-readable pieces called “triples.” Each “triple” is organized as a trio of “subject-predicate-object.” For example, in the phrase “atorvastatin may treat hypercholesterolemia,” the subject is “atorvastatin,” the predicate is “may treat,” and the object is “hypercholesterolemia.” RDF uses a Uniform Resource Identifier (URI) to name each part of the “subject-predicate-object” triple. A URI looks just like a typical web URL.
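
To make the idea concrete, here is a minimal sketch of the atorvastatin example as a triple, written out in the N-Triples line format.  The `http://example.org/` URIs are placeholders invented for illustration; they are not the URIs PubChemRDF actually uses.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/"

triple = Triple(
    subject=EX + "compound/atorvastatin",
    predicate=EX + "vocab/may_treat",
    obj=EX + "disease/hypercholesterolemia",
)


def to_ntriples(t: Triple) -> str:
    """Serialize a triple of URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


if __name__ == "__main__":
    print(to_ntriples(triple))
```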

RDF is a core part of semantic web standards.  As an extension of the existing World Wide Web, the semantic web attempts to make it easier for users to find, share, and combine information.  Semantic web leverages the following technologies: Extensible Markup Language (XML), which provides syntax for RDF; Web Ontology Language (OWL), which extends the ability of RDF to encode information; Resource Description Framework (RDF), which expresses knowledge; and RDF query language (SPARQL), which enables query and manipulation of RDF content.

How can PubChemRDF help your research?

PubChem users have frequently expressed interest in having a downloadable, schema-less database. PubChemRDF enables NoSQL-style access to and querying of PubChem data.  Using PubChemRDF, one can download the desired RDF-formatted data files from the PubChem FTP site, import them into a triplestore, and query them using a SPARQL query interface. There are a number of open-source and commercial triplestores, such as Apache Jena TDB and OpenLink Virtuoso. Other than triplestores, PubChemRDF data can also be loaded into RDF-aware graph databases such as Neo4j, where graph traversal algorithms can be used to query the RDF graphs. Last but not least, the ontological representation of the PubChem knowledge base allows logical inference, such as forward/backward chaining.
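
To give a feel for what querying triples means, the sketch below does simple pattern matching over a tiny in-memory list of triples in plain Python.  It is not a triplestore and does not run SPARQL; it only mimics a single pattern such as `?s ex:may_treat ex:hypercholesterolemia`, where unbound positions act as wildcards.  The `ex:` names and the data are invented for illustration.

```python
# Each triple is a (subject, predicate, object) tuple of illustrative names.
triples = [
    ("ex:atorvastatin", "ex:may_treat", "ex:hypercholesterolemia"),
    ("ex:atorvastatin", "ex:has_role", "ex:statin"),
    ("ex:simvastatin", "ex:may_treat", "ex:hypercholesterolemia"),
]


def match(data, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in data
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Which subjects may treat hypercholesterolemia?
hits = match(triples, predicate="ex:may_treat", obj="ex:hypercholesterolemia")
treating = [s for s, _, _ in hits]
```

A real triplestore answers the same kind of question with indexes and a full query language, so it scales to data volumes where a linear scan like this one would not.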

The RDF data on the PubChem FTP site is arranged in such a way that you only need to download the type of information in which you are interested, so you can avoid downloading parts of PubChem data you will not use.  For example, if you are just interested in computed chemical properties, you only need to download PubChemRDF data in the compound descriptor subdomain. In addition to bulk download, PubChemRDF also provides programmatic data access through a RESTful interface.

Where can you learn more about this?

To get an overview of the PubChemRDF project, please view this presentation.  To learn more about detailed aspects of PubChemRDF and how to use it, please view this presentation. The PubChemRDF Release Notes provide additional technical information about the project.

Additional blog posts will follow on PubChemRDF project topics, including the FTP site layout, the RESTful interface, and ways to utilize PubChemRDF for research purposes, including the use of SPARQL queries.

PubChem Upload 1.0f Released

Submitting your data to PubChem is now easier than ever.  The new PubChem Upload system offers streamlined procedures for data submissions and includes an extensive set of wizards, inline help tips, and templates to assist users.  First released as a beta in April 2013, PubChem Upload is now in final form (1.0f) and replaces the Deposition Gateway as the primary PubChem data submission system.  The PubChem Deposition Gateway, first introduced in April 2005, has been superseded as an interface and will be completely phased out in 2014.

What does it do?

PubChem Upload is a data submission system.  It allows contributors to provide substance descriptions (including chemical structures, names, crosslinks, and comments), assay experiment descriptions, and the results of substances being tested in assays.  There is a great deal of flexibility in the information that can be provided to PubChem.  For example, there are no limits (beyond the practical) on the number of assay readouts or the count of substances per assay that can be provided.  An abbreviated list of PubChem Upload features includes:

  • The means to enter data and descriptive information by web form or by file, based on user preference.
  • Convenient spreadsheet formats (CSV, Excel & OpenOffice) as well as XML-based data specifications accommodate both one-off and frequent data providers.
  • A “Preview” function displays incoming data to show how it will appear in PubChem before being loaded.
  • An automated suite of validation checks helps contributors identify potential issues before data is made public.

Why the new release?

Advances in web technologies gave us the opportunity to enhance the user experience by reducing the time and effort required to make substance descriptions and their associated biological activities available and useful to the public.  For a new contributor who may only be interested in making a quick submission, the new PubChem Upload interface offers a simple decision-tree set of wizards that guides them through the process of publishing their data in PubChem.  Experienced users can bypass the wizards and use the enhanced upload and editing capabilities instead.

There are many improvements over the older Deposition Gateway system. One noteworthy feature is that PubChem Upload offers an expanded ability to edit data directly in the browser.  The spreadsheet editor gives PubChem contributors the ability to upload large spreadsheets with minimal reformatting and to edit those large datasets online.

Potential future directions

PubChem staff place high importance on continuing to improve the submission process and increasing the usefulness of data to the PubChem end-user.  One such direction is the use of controlled vocabulary annotations, or ontologies, such as BAO, GO, and MeSH, to help streamline the description of provided data.  This may, for example, improve the ability of PubChem end-users to utilize and analyze bioactivity results.

The new PubChem Upload system utilizes a RESTful model of data communication between client and server.  As such, it is now technically possible to document and support the creation of upload utilities that can be incorporated into third-party software such as electronic laboratory notebooks (ELNs) and laboratory information management systems (LIMS). Interfacing PubChem Upload directly with a properly configured laboratory data system may dramatically reduce the effort to publish data in PubChem.

Where can I learn more about PubChem Upload?

To get an overview of the PubChem Upload system, please view this presentation.  To get basic information, please read this abbreviated help document.  For a more extensive overview and detailed information about the features, please read the complete help document.