March 21, 2018: Dr. Rob Califf to Speak on Data Science at March 23 Grand Rounds

Robert Califf, MD, former FDA Commissioner and current Vice Chancellor for Health Data Science at Duke University School of Medicine, will present at NIH Collaboratory Grand Rounds on Friday, March 23 at 1 pm ET. The webinar will be broadcast live and is open to the public. Following the presentation, Dr. Califf will answer questions from the Grand Rounds audience.

As Director of Duke Forge, Duke’s interdisciplinary center for actionable health data science, Dr. Califf is currently working on initiatives designed to harness biostatistics, machine learning, and sophisticated informatics approaches to improve health and healthcare. Dr. Califf is also an adjunct professor of medicine at Stanford University and is employed by Verily Life Sciences as a scientific advisor. Verily, part of the Alphabet (Google) family of companies, is aimed at transforming the growth of health-related data into practical applications.

Dr. Califf has been a pioneer in the fields of clinical, translational, and outcomes research, and the NIH Collaboratory looks forward to hearing his thoughts on the pragmatic applications of data that will advance health and health care strategies and practice.

Topic: Data Science in the Era of Data Ubiquity

Date: Friday, March 23, 2018, 1:00-2:00 p.m. ET

Meeting Info: To check whether you have the appropriate players installed for UCF (Universal Communications Format) rich media files, go to

To join the online meeting:
Go to

October 10, 2017: NIH Collaboratory Core Working Group Interviews: Reflections from the Phenotypes, Data Standards, and Data Quality Core

At the NIH Collaboratory Steering Committee meeting in May 2017, we asked Drs. Rachel Richesson and W. Ed Hammond, Co-chairs of the Phenotypes, Data Standards, and Data Quality Core, to reflect on the first 5 years of their Core’s work and the challenges ahead.

Both were pleased with how the Core was able to provide guidelines for assessing data quality and the reporting of pragmatic trials, especially around issues with phenotypes and the use of electronic health record data. Future work in this area needs to advance the development of regulations and standards for the collection of clinical data to support learning healthcare systems.

“We’ve built a community in our Core that represents a diverse group of scientists and clinicians showing the many ways to look at data challenges.”
– Dr. Rachel Richesson

In Fall 2017, the Phenotypes, Data Standards, and Data Quality Core merged with the Electronic Health Records Core. The combined Core will continue to work on data standards and quality, and approaches to define clinical phenotypes and endpoints, extract information, and discover errors in data from healthcare systems.

Download the interview (PDF).


New Living Textbook Chapter on Acquiring and Using Electronic Health Record Data for Research

Meredith Nahm Zozus and colleagues from the NIH Collaboratory’s Phenotypes, Data Standards, and Data Quality Core have published a new Living Textbook chapter about key considerations for secondary use of electronic health record (EHR) data for clinical research.

In contrast to traditional randomized controlled clinical trials where data are prospectively collected, many pragmatic clinical trials use data that were primarily collected for clinical purposes and are secondarily used for research. The chapter describes the steps a prospective researcher will take to acquire and use EHR data:

  • Gain permission to use the data. When a prospective researcher wishes to use data, a data use agreement (DUA) is usually required that describes the purpose of the research and the proposed use of the data. This section also describes use of de-identified data and limited data sets.
  • Understand fundamental differences in context. Data collected in routine care settings reflect standard procedures at an individual’s healthcare facility, and are not collected in a standard, structured manner.
  • Assess the availability of health record data. Few assumptions can be made about what is available from an organization’s healthcare records; up-front, detailed discussions about data element collection over time at each facility are required.
  • Understand the available data. A secondary data user must understand both the data meaning and the data quality; both can vary greatly across organizations and affect a study’s ability to support research conclusions.
  • Identify populations and outcomes of interest. Because healthcare facilities are obligated to provide only the minimum necessary data to answer a research question, investigators must identify the needed patients and data elements with specificity and sensitivity to answer the research question given the available data.
  • Consider record linkage. Studies using data from multiple records and sources will require matching data to ensure they refer to the correct patient.
  • Manage the data. The investigator is responsible for receiving, managing, and processing data and must demonstrate that the data are reproducible and support research conclusions.
  • Archive and share the data after the study. Data may be archived and shared to ensure reproducibility, enable auditing for quality assurance and regulatory compliance, or to answer other questions about the research.

Analysis Dataset Available from CTTI

As part of a project that examined the degree to which sponsors of clinical research are complying with federal requirements for the reporting of clinical trial results, the Clinical Trials Transformation Initiative (CTTI) and the authors of the study are making the primary dataset used in the analysis available to the public. The full analysis dataset, study variables, and data definitions are available as Excel worksheets from the CTTI website and on the Living Textbook’s Tools for Research page.

Collaboratory Phenotypes, Data Standards, and Data Quality Core Releases Data Quality Assessment White Paper

The NIH Collaboratory’s Phenotypes, Data Standards, and Data Quality Core has released a new white paper on data quality assessment in the setting of pragmatic research. The white paper, titled Assessing Data Quality for Healthcare Systems Data Used in Clinical Research (V1.0), provides guidance, based on the best available evidence and practice, for assessing data quality in pragmatic clinical trials (PCTs) conducted through the Collaboratory. Topics covered include an overview of data quality issues in clinical research settings, data quality assessment dimensions (completeness, accuracy, and consistency), and a series of recommendations for assessing data quality. Also included as appendices are a set of data quality definitions and review criteria, as well as a data quality assessment plan inventory.

The full text of the document can be accessed through the “Tools for Research” tab on the Living Textbook or can be downloaded directly here (PDF).
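
For readers who want a concrete feel for two of the dimensions named above, the following is a minimal Python sketch of a completeness check and a consistency check on a handful of made-up records. It is not taken from the white paper: the field names (patient_id, birth_date, visit_date, systolic_bp), the sample values, and the helper functions are all hypothetical, chosen only for illustration. Accuracy is left out because it generally requires comparison against an independent reference source, which a toy example cannot supply.

```python
from datetime import date

# Hypothetical toy records; field names and values are illustrative only
# and do not come from the white paper or any real dataset.
records = [
    {"patient_id": "A1", "birth_date": date(1950, 4, 2),
     "visit_date": date(2016, 3, 1), "systolic_bp": 128},
    {"patient_id": "A2", "birth_date": date(1985, 7, 9),
     "visit_date": date(1980, 1, 5), "systolic_bp": 117},
    {"patient_id": "A3", "birth_date": date(1972, 11, 30),
     "visit_date": date(2017, 6, 12), "systolic_bp": None},
    {"patient_id": "A4", "birth_date": None,
     "visit_date": date(2017, 8, 20), "systolic_bp": 142},
]


def completeness(rows, field):
    """Return the fraction of rows in which `field` is present (not None)."""
    if not rows:
        return 0.0
    present = sum(1 for r in rows if r.get(field) is not None)
    return present / len(rows)


def inconsistent_dates(rows):
    """Return patient IDs whose visit date falls before their birth date.

    Rows missing either date are skipped; they are a completeness
    problem, not a consistency problem.
    """
    return [
        r["patient_id"]
        for r in rows
        if r.get("birth_date") is not None
        and r.get("visit_date") is not None
        and r["visit_date"] < r["birth_date"]
    ]


if __name__ == "__main__":
    for field in ("birth_date", "visit_date", "systolic_bp"):
        print(f"{field}: {completeness(records, field):.0%} complete")
    print("visit before birth:", inconsistent_dates(records))
```

In practice, checks like these would be run per data element and per site, since data meaning and quality can vary across organizations.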