Using Electronic Health Record Data in Pragmatic Clinical Trials
Developing and Refining the Research Questions
Karen Staman, MS
As with any type of research study, PCTs begin with a scientific question. A clearly articulated research question defines the phenomena of interest, the purpose for using EHR data, the possible sources of data for detecting those phenomena, and, more specifically, the data requirements, definitions, quality, and data collection plan. In practice, however, defining the data requirements for the EHR and understanding what is actually available at a given institution is often an iterative process. The researcher must consider what information is needed to answer the question, and the available data may in turn influence the research question. The conversation tends to cycle like this: Health System: "What data do you need for the trial?" Researcher: "That depends… What data are available?" This dialogue may lead investigators to refine their research question slightly into one that is more “answerable” given the data that are collected or available.
Anecdotally, many researchers see the potential for discovery in clinical data warehouses derived from EHRs and are excited to use the available data to generate questions and answers. Although this approach is understandable and seems practical, the complexity of EHR architectures, along with their inherent bias and possible error, could produce inaccurate data and lead to misinterpretation of results and erroneous conclusions. Consequently, we assert that the scientific research question should be the fundamental driver of the study design and hence the foundation for any PCT.
Good clinical research practice and ethics dictate that clinical trials collect the data necessary (and only the data necessary) to answer a specific research question (ICH Harmonised Tripartite Guideline 1996). In most PCTs, it is vitally important to identify and measure covariates across study arms or clusters. Doing so requires high-quality (accurate, complete) data on a number of variables so that researchers can assess the comparability of the groups. The objective is to achieve balance between the groups along as many dimensions as are relevant, important, and feasible.
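As an illustration of assessing comparability between groups, the following minimal Python sketch computes two simple summaries for one baseline covariate: completeness (the share of records with a recorded value) and the standardized mean difference between two arms. The function names, the use of `None` to mark a missing value, and the example ages are hypothetical and are not drawn from any Collaboratory trial. The standardized mean difference is one common balance measure, not a method prescribed by this chapter.

```python
from math import sqrt
from statistics import mean, variance


def completeness(values):
    """Return the fraction of records with a non-missing value.

    Assumption: a missing EHR value is represented as None.
    """
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def standardized_mean_difference(group_a, group_b):
    """Return (mean_a - mean_b) divided by the pooled standard deviation.

    Missing values (None) are dropped first. Each group needs at least
    two observed values so that a sample variance can be computed.
    """
    a = [v for v in group_a if v is not None]
    b = [v for v in group_b if v is not None]
    pooled_sd = sqrt((variance(a) + variance(b)) / 2)
    if pooled_sd == 0:
        raise ValueError("no variation in either group; SMD is undefined")
    return (mean(a) - mean(b)) / pooled_sd


# Hypothetical baseline ages for two study arms (illustrative values only).
intervention_age = [60, 62, 64, None]
usual_care_age = [61, 63, 65, 67]

print("completeness, intervention:", completeness(intervention_age))
print("completeness, usual care:", completeness(usual_care_age))
print("SMD for age:", standardized_mean_difference(intervention_age, usual_care_age))
```

In practice, a team would run a check like this for each prespecified covariate. Values that fall outside whatever imbalance or completeness threshold the protocol sets would then prompt a closer look at how the variable is captured in the EHR.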
Because developing a PCT that uses data from the EHR can be extremely complicated, the NIH Collaboratory has developed a set of papers and chapters that explore many of these issues in greater depth, including developing data definitions and phenotyping, planning and conducting data quality assessments, and the steps for acquiring and managing the data. See the Additional Resources listed below.
Additional Resources
A resource chapter describing mechanisms for identifying and evaluating phenotype definitions, with a particular focus on standardization efforts from the Collaboratory
A guide to help those conducting a literature search for publications related to utilizing EHR data for the purpose of characterizing patients, populations, or cohorts
A catalog of identified phenotype-related efforts
List of sources of existing phenotypes