Artificial Intelligence (AI) holds great potential for improving health and health care in the United States and globally through knowledge discovery, detection and monitoring of diseases, development of novel digital therapeutics, and augmentation or automation of clinical decision-making for diagnosis and treatment of patients. Some of these AI-enabled software tools will be medical devices (“Software as a Medical Device” or SaMD) regulated by the U.S. Food and Drug Administration (FDA), while others will fall outside of FDA’s authority. For all AI-enabled SaMD, it will be critical to continuously evaluate each tool after deployment and over time to ensure that it performs within the expected range. This may be particularly challenging for clinical decision support (CDS) tools developed with artificial intelligence and machine learning (AI/ML), given non-standardized electronic health record systems and ever-changing clinical workflows.
Using high-quality real-world data (RWD) to generate real-world evidence (RWE) of the clinical performance of SaMD may allow evaluations to be performed more efficiently and can provide a broader picture of how well a software tool works across subgroups of interest (individuals of different ages, races, geographic areas, associated co-morbidities, etc.). This report explores what types of data are needed to perform these evaluations, whether those data elements exist in common RWD sources, and the current challenges in collecting and using such data. While many of the key takeaways apply more broadly, this report focuses on challenges in the context of post-market performance evaluations of AI/ML-enabled CDS tools. These insights and recommendations are drawn from a two-day virtual private workshop held in July 2020, a literature review, and informational calls.
The report provides recommendations for consideration in the continued development of a future regulatory model for software-based medical devices. We separate these recommendations into three areas: performance measures, data access and privacy, and data sharing and security. Performance measures should reflect the specific benefit-risk profile of each CDS tool, since a tool's accuracy can change as data environments evolve, new workflows or standards of care are introduced, or patient populations shift, and can further vary with potential inappropriate use. Specific data elements that should be monitored include algorithm inputs and outputs, algorithm use characteristics, and patient outcomes. These elements are ideally standardized within and across data sources, with attention to performance bias across diverse patient populations and to generalizability across health systems. The accessibility of data to conduct performance evaluations depends on privacy protection regulations, including the Health Insurance Portability and Accountability Act (HIPAA), the Common Rule, and state and local laws, which can apply differently depending on the entity using the data. Some of these regulations may even require updates to better enable future real-time evaluation of CDS tools. Finally, when the correct data are available and can be accessed, the physical exchange should ideally occur on secure platforms governed by data use agreements.
Some of the issues raised in this paper will be addressed at a Duke-Margolis hosted public webinar, “AI/Machine Learning: Regulation, Development, and Real-World Performance Evaluation,” March 22, 12:30–4:00 pm ET. Registration information is available here.