Important Changes to PubChem Web Protocols

PubChem will retire HTTP web URLs in favor of HTTPS by September 30, 2016.

What does this mean to you?

Currently, PubChem supports both HTTP and HTTPS web URLs. For example, both URLs http://pubchem.ncbi.nlm.nih.gov and https://pubchem.ncbi.nlm.nih.gov take you to PubChem. However, by September 30, 2016, the HTTP web protocol will be retired in favor of the HTTPS protocol. Furthermore, the HTTPS web protocol will be implemented according to the HTTPS-Only Standard. Any attempt to access PubChem after September 30, 2016 using a web URL starting with “http:” may no longer work.

For the most part, this change will be invisible to you, as PubChem started to use the HTTPS protocol in early 2014. Today, many sites already use HTTPS when linking to PubChem with a URL. However, links and programs that still access PubChem using the HTTP protocol will need to be updated to use HTTPS.
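
For sites that keep stored links to PubChem, the update can be sketched in a few lines of Python. This is only a minimal illustration, not an official tool: the function name `upgrade_to_https` and the choice to rewrite only the exact host `pubchem.ncbi.nlm.nih.gov` are assumptions made for the example, and URLs with an explicit port are left untouched apart from the scheme.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: only links to this exact host are rewritten.
PUBCHEM_HOST = "pubchem.ncbi.nlm.nih.gov"


def upgrade_to_https(url: str) -> str:
    """Return the URL with the http scheme replaced by https for PubChem links.

    URLs that point elsewhere, or that already use https, are returned
    unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname == PUBCHEM_HOST:
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(upgrade_to_https("http://pubchem.ncbi.nlm.nih.gov"))
```

Running such a function over a list of stored links leaves non-PubChem URLs as they are, so it can be applied to mixed link collections.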

Why the change?

On June 8, 2015, the US federal government issued an HTTPS-only policy for all publicly accessible Federal websites.  As part of this mandate, the National Center for Biotechnology Information (NCBI) recently announced important changes to NCBI Web Protocols to adopt HTTPS on September 30, 2016. A webinar is available on the NCBI YouTube channel that explains how this will affect access to web pages. PubChem resides at NCBI and will adopt the same HTTPS-only policy.

Why is this change being mandated?

The unencrypted HTTP protocol does not protect data from interception or alteration, which can subject users to eavesdropping, tracking, and the modification of received data. It may also expose potentially sensitive information about users to attackers, including browser identities, website contents, search terms, user-submitted information, and more. Many commercial organizations such as banks have already adopted HTTPS-only policies to protect users when using their websites and services.

HTTPS verifies the identity of a website or web service for a connecting client, and encrypts nearly all information sent between the website or service and the user. Protected information includes cookies, user agent details, URL paths, form submissions, and query string parameters. HTTPS is designed to prevent this information from being read or changed while in transit. HTTPS provides a layer of protection for web users; however, it has several important limitations. IP addresses and destination domain names are not encrypted during communication. Even encrypted traffic can reveal some information indirectly, such as time spent on site, or the size of requested resources or submitted information.
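
To make the distinction concrete, the sketch below splits a URL into the part that remains visible on the network (the destination domain name) and the parts that HTTPS protects in transit (the path and query string), following the description above. The function name `split_by_exposure` and the query string `q=aspirin` are made up for illustration; this is not a security analysis tool.

```python
from urllib.parse import urlsplit


def split_by_exposure(url: str) -> dict:
    """Group the pieces of an HTTPS URL by whether they are encrypted in transit.

    Per the description above, the destination domain name is not encrypted,
    while the URL path and query string parameters are.
    """
    parts = urlsplit(url)
    return {
        "visible_to_network": {"host": parts.hostname},
        "encrypted_in_transit": {"path": parts.path, "query": parts.query},
    }


if __name__ == "__main__":
    # The query string here is a hypothetical example.
    print(split_by_exposure("https://pubchem.ncbi.nlm.nih.gov/bioassay/1284?q=aspirin"))
```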

To learn more, visit these websites:

 

PubChem presents at the American Chemical Society National Meeting in San Diego (March 13-17, 2016)

The 251st American Chemical Society National Meeting, themed “Computers in Chemistry”, will be held in San Diego, CA, on March 13-17, 2016.  The PubChem team will be at the ACS meeting to present new developments and recent changes in PubChem.  Below is a list of presentations that will be given by the PubChem staff.

 

Day 1 (Sunday, March 13)

Day 2 (Monday, March 14)

Day 3 (Tuesday, March 15)

Day 4 (Wednesday, March 16)

Day 5 (Thursday, March 17)

Recent PubChem Publications: Read about What’s New!

The PubChem team published an article in the 2016 Nucleic Acids Research Database issue (Kim et al., Nucl. Acids Res., 2016, 44(D1), D1202-D1213, PMID: 26400175).  This article provides an overview of the PubChem Compound and Substance databases, including organization, contents, interfaces, programmatic access and other relevant tools and services.  Considerable changes have been made since these two databases were described in a previous paper published in 2008 (Bolton et al., Ann. Rep. Comput. Chem., 2008, 4, 217-241), and the newly published paper provides updated information on these resources.

Additional papers published about PubChem by the team in 2015 include:

To get a complete list of all articles published by the PubChem team, please visit the PubChem Publication page.

BioAssay Record Page Released

The PubChem BioAssay Record page is now available.  It complements a recent revamp of the PubChem Compound Summary page and the Substance Record page.

What is the BioAssay Record Page?

As explained in a previous post, PubChem organizes data into three primary databases: Substance, Compound, and BioAssay.  The BioAssay database contains over one million biological assay experiments with more than 229 million bioactivity outcomes.  For each assay, PubChem now provides a BioAssay Record page (formerly called the Assay Summary page), which displays information provided by the data contributor about the assay as well as annotations and links to tools that support data interpretation and analysis.

What changed?

The key improvements include:

  • Technology refresh
    As with the recent update to the Compound Summary and the Substance Record pages, the new data-driven BioAssay interface is optimized for both touch- and mouse-based devices.  Using a responsive design, it automatically adapts to the available screen size, making it friendly for desktops, tablets, and mobile phones.
  • Data contents reorganized
    In the new BioAssay Record page, depositor-provided information is presented first, followed by annotations based on third-party curation and PubChem processing.  In the now deprecated Assay Summary page, depositor-provided information was intermingled with annotations from third-party curation and PubChem processing, often causing confusion about data provenance (i.e., the information source).
  • Improved data table
    While a full bioactivity data set is retrieved by default, the data table is partitioned according to activity outcomes (e.g., active, inactive, submicromolar activity, subnanomolar activity, and so on), allowing users to quickly filter results.  In addition, users can download the entire data table or a filtered subset.  To support comparative evaluation, a link to a cross-assay bioactivity analysis page is provided for each compound displayed in the data table.
  • Extended download functionality
    The top bar ‘Download’ button provides access to all downloadable data on the page.  This includes the depositor-provided description, data table results, chemical structures tested in an assay, and annotations shown in an individual section.
  • Integrated Related BioAssay Summary
    At the bottom of the BioAssay Record page, BioAssays from the same assay project and other related BioAssays are displayed in a tabular format, facilitating assay data interpretation and comparison.
  • URL change
    The BioAssay Record page uses a different URL from the now deprecated Assay Summary page.  For example, the URL for the BioAssay Record for AID 1284 is:
     
    https://pubchem.ncbi.nlm.nih.gov/bioassay/1284
     
    Links to the now deprecated Assay Summary page will automatically redirect to the new location for the BioAssay Record page.
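
The URL pattern shown above for AID 1284 can be generated for other assays with a small helper. This is a minimal sketch that assumes the same pattern applies to every assay identifier (AID); the function name `bioassay_record_url` is made up for the example.

```python
# Base of the BioAssay Record URL, taken from the AID 1284 example above.
BIOASSAY_BASE = "https://pubchem.ncbi.nlm.nih.gov/bioassay/"


def bioassay_record_url(aid: int) -> str:
    """Build the BioAssay Record page URL for a given assay identifier (AID)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(aid, bool) or not isinstance(aid, int) or aid <= 0:
        raise ValueError("AID must be a positive integer")
    return f"{BIOASSAY_BASE}{aid}"


if __name__ == "__main__":
    print(bioassay_record_url(1284))
```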

Future plans?

The now deprecated Assay Summary page will remain accessible from a link at the top of the BioAssay Record page until May 2016.

A PubChem Target Summary page, which will help summarize the biological activity and annotation information available in PubChem, is in progress.

PubChem adds a “legacy” designation for outdated data

Sometimes information provided to PubChem by data contributors becomes outdated.  To address this, PubChem is introducing a “legacy” designation for collections that are not regularly updated.  This “legacy” designation applies to projects and contributors that appear to no longer be active, as well as to their individual records.  This designation will help PubChem users quickly identify records that may have out-of-date information and/or hyperlinks.

Why a “legacy” designation?

As an archive, PubChem accepts scientific data from contributors and maintains that data even if the contributing project is discontinued. While this helps ensure community access to the information lasts beyond the lifetime of a given scientific endeavor, the archival nature of PubChem does not allow anyone other than the data contributor to modify provided information.  Therefore, some records in PubChem can persist with outdated (or incorrect) data.  To help identify such cases, we are introducing a “legacy” indication for contributors and their records.  Please note that this does not mean that data identified as “legacy” is without value.  Quite to the contrary, some legacy collections successfully collected valuable scientific data for the research community, and are simply no longer updating the information.

How is a “legacy” designation determined?

A “legacy” designation is arrived at via a semi-manual, semi-automated procedure.  It involves aspects of examining contributor account information, individual records, and user reports.  For example, if the depositor website does not work for a period of time, attempts are made to contact the submitting organization.  If PubChem staff are unable to make contact with the data contributor or if an organization is no longer updating records, a legacy designation may be initiated.  Please note that a “legacy” designation can be removed at any time, when contact is reestablished and updates resume.

Impacts of legacy designation?

If a data contributor is designated as “legacy”, all records deposited by the contributor are also designated as “legacy”.  While still searchable, these records will clearly indicate that they are “legacy”.  Please note that “legacy” records will not be shown in the “Chemical Vendors” section of Compound Summary pages.  In addition, in the “Substances by Category” section of the Compound Summary page, “legacy” substance records will be found only under “Legacy Depositors”.

Future plans?

The way PubChem implements both manual and automated processes to ascertain a “legacy” indication will likely evolve over time.  In addition, we are looking at the possibility of enabling users to separate out legacy records when searching and analyzing the database.

Laboratory Chemical Safety Summary (LCSS) views now available in PubChem

The PubChem Laboratory Chemical Safety Summary (LCSS) provides pertinent chemical health and safety data for a given PubChem Compound record.  The PubChem LCSS is a community effort involving professionals in health and safety, chemistry librarianship, informatics, and other specialties.

What is LCSS?

The LCSS is based on the format described by the National Research Council in Prudent Practices in the Laboratory: Handling and Management of Chemical Hazards.  Information contained in the PubChem LCSS is a subset of the PubChem Compound Summary page content.  It includes a summary of hazard and safety information for a chemical, such as flammability, toxicity, exposure limits, exposure symptoms, first aid, handling, and clean up.

How can I access LCSS?

An LCSS is available for PubChem Compound records with a GHS hazard classification (Globally Harmonized System of Classification and Labeling of Chemicals).  If a PubChem Compound record has an LCSS, the link to view it is provided at the top of the page under the heading “Safety Summary”.  In addition, one can get the complete list of chemicals with an LCSS by visiting the PubChem LCSS webpage or by using the PubChem Classification Browser.

To learn more about LCSS in PubChem, please explore the following webpages:

Significant Update to PubChemRDF!

PubChemRDF 1.5β is now available.  The new version is faster, supports linked data in new formats, features improved search and query functions, and contains new links.

What is PubChemRDF?

PubChemRDF expresses data in a Resource Description Framework (RDF) format using ontological frameworks and semantic web technologies.  It facilitates data sharing and analysis, and integrates with other National Center for Biotechnology Information (NCBI) resources along with external resources across scientific domains.  To learn more about this project, please see our earlier blog post and PubChemRDF release notes.


What is new in PubChemRDF 1.5β?

The 1.5β release contains a number of new features and technological improvements including:

  • Faster Speed
    PubChemRDF data is now served from a triple-store and provides a noticeable speed improvement, especially for records with lots of data.  Previously, RDF was generated on the fly from data stored in disparate data systems.
  • Addition of MeSH
    Major improvements were made to the reference subdomain.  Most notable is the addition of Medical Subject Heading (MeSH) annotation of PubMed records.  This includes MeSH topical descriptors (with optional qualifier) that indicate the subject of an article and MeSH (supplementary) concepts that indicate things like chemicals and diseases discussed in an article.
  • Direct links to authoritative RDF resources
    PubChemRDF now enhances cross-integration by providing direct links to available authoritative RDF resources within applicable subdomains, including: reference, synonym, and inchikey to MeSH RDF; protein to UniProt RDF; protein and substance to PDB RDF; biosystem to Reactome RDF; substance to ChEMBL RDF; and compound to WikiData RDF.  For example, the links to PDB RDF help to distinguish proteins and associated chemical substances found in a Protein Data Bank (PDB) crystal structure.
  • Addition of ‘concept’ subdomain
    A new ‘concept’ subdomain provides the means to annotate PubChemRDF subdomains.  For example, annotation between nodes within the concept subdomain allows a hierarchy of concepts to be created, such as those in the WHO ATC classification.  Such a hierarchy can then be applied, for example by linking chemical substance synonyms to a WHO ATC classification to indicate their therapeutic and pharmacological properties.
  • New links added between the compound and biosystem subdomains
    Previously, the biosystem subdomain linked only to the protein subdomain.  The added links between the compound and biosystem subdomains help to indicate the chemical structure involved in a given pathway.
  • Support for protein complexes
    Protein complex targets are now distinguished within the bioassay subdomain and are linked to the component protein units.
  • Linked Data using JSON
    JSON-LD (or JavaScript Object Notation for Linked Data) is a method of transporting Linked Data using JSON. This addition helps those wanting to use JSON formatted data, for example, with JavaScript.
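
As a rough illustration of what JSON-formatted Linked Data looks like to a program, the Python sketch below reads a small JSON-LD document and lists its statements as subject, predicate, object triples. The document is hypothetical: the `@context`, identifiers, and property names use `example.org` and are not copied from PubChemRDF output. The helper functions `expand` and `to_triples` are also made up for the example and handle only this flat, single-subject case; a full JSON-LD processor does considerably more.

```python
import json

# Hypothetical JSON-LD document for illustration only; the vocabulary and
# identifiers below are invented and do not come from PubChemRDF.
DOC_TEXT = """
{
  "@context": {"ex": "http://example.org/vocab#"},
  "@id": "http://example.org/compound/1",
  "ex:name": "example compound",
  "ex:partOf": {"@id": "http://example.org/pathway/7"}
}
"""


def expand(term: str, context: dict) -> str:
    """Expand a compact 'prefix:name' term using the @context prefix map."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term


def to_triples(doc: dict) -> list:
    """Return (subject, predicate, object) tuples for a flat JSON-LD node."""
    context = doc.get("@context", {})
    subject = doc["@id"]
    triples = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        # A nested object with an @id is a link to another resource.
        obj = value["@id"] if isinstance(value, dict) and "@id" in value else value
        triples.append((subject, expand(key, context), obj))
    return triples


if __name__ == "__main__":
    for triple in to_triples(json.loads(DOC_TEXT)):
        print(triple)
```

Because the data is plain JSON, the same document can be read with `JSON.parse` in JavaScript without any RDF-specific tooling.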

Where can I learn more about PubChemRDF?

To read more on this topic, please consider exploring these links: