Data Curation Best Practices and Innovations for Real-World Evidence Generated Using Electronic Health Record-Sourced Data

White Paper

Introduction

The FDA describes a three-stage life cycle for data sourced from electronic health records (EHRs), comprising data accrual, curation, and transformation, in its industry guidance titled “Real-World Data: Assessing Electronic Health Records and Medical Claims Data To Support Regulatory Decision-Making for Drug and Biological Products” (depicted in Figure 1 of the guidance) and provides considerations for maintaining data quality, reliability, and relevance at each life cycle stage. The guidance broadly defines data curation as “the processing of source data through the application of standards for exchange, integration, sharing, and retrieval.” This encompasses activities such as data cleaning, error correction, terminology standardization, and integration across systems. While this definition is conceptually valid, data curation activities in practice represent the first step of a complex data management process undertaken and influenced by a diverse array of data holders and managers. These include health systems, industrial and academic researchers, and many other entities that hold or modify source data on their journey to becoming an analytic dataset.

As described by the guidance, data curation processes should neither alter nor obscure the clinical meaning of the source data and should preserve important contextual information. During real-world data (RWD) management operations, multi-source EHR data, both structured and unstructured, may be mapped, harmonized, or otherwise processed across unique platforms to achieve interoperability and analytic usability. However, data curators presently lack consensus on how to respond to regulatory considerations across FDA’s three life cycle stages and how to apply the corresponding guidelines. Practical complications arise when multiple data holders apply varying processes to data during curation without transparent, well-documented procedures. Unstandardized or unclear data curation practices can limit the regulatory utility, relevance, and reliability of EHR-sourced data and, in turn, regulators’ ability to assess exposures, outcomes, and covariates.

In this white paper, we elucidate these underlying considerations to guide multisectoral EHR-sourced data curation processes. Our discussion is grounded in the FDA’s understanding of the EHR data life cycle and is primarily meant to support stakeholder alignment with regulatory expectations, especially where traditional randomized controlled trials (RCTs) are infeasible or unethical as a means of addressing unmet medical need. We discuss how RWD accrued via EHRs are curated to preserve their clinical meaning and analytical reliability, with three distinct actions and considerations in mind:

  • Defining EHR-sourced data curation as both a concept and practice;
  • Implementing real-world best practices and tools to address pertinent regulatory concerns for curation; and
  • Involving artificial intelligence (AI) in the process of EHR-sourced data curation.

Through this white paper, we aim to bridge conceptual and operational gaps by examining the curation phase as a distinctly iterative process within the FDA framework (compared to accrual and transformation). We discuss tangible best practices and tools for standardization and transparency, such as predefined curation procedures, audit trails, and provenance documentation. Importantly, we address emerging AI considerations, an area that experts emphatically call on the real-world evidence (RWE) community to address, since AI tools are rapidly being integrated into EHRs and curation-focused operations. Finally, we propose a series of policy recommendations to support the multi-disciplinary efforts involved in EHR-sourced data curation.

Duke-Margolis Authors

Matt D'Ambrosio

Senior Policy Analyst

Patrick Rodriguez, MA

Policy Analyst
2024 Margolis Intern

Nora Emmott, MPH

Policy Research Associate

Rachele Hendricks-Sturrup, DHSc, MSc, MA, FACTS

Research Director, Real-World Evidence
Senior Team Member