Generalizability, Domain Shift, and Localization
Screenshot from Dr. Ronald Summers’ presentation at the FDA Public Workshop, Feb. 2020.
By: Ed Butler
The FDA held a two-day public workshop titled “Evolving Role of Artificial Intelligence in Radiological Imaging,” Feb. 25-26, 2020, on the campus of the National Institutes of Health (NIH) in Bethesda, MD. The workshop focused on questions of safety and effectiveness of emerging AI-enabled technologies, such as autonomous AI, adaptive AI, and AI-guided image acquisition systems. One of the more interesting themes discussed at the workshop involved trade-offs between the benefits of generalized performance of AI-enabled machine learning applications and the benefits of localizing performance for specific populations, care settings, and clinical purposes.
The FDA invited speakers with varied perspectives, including NIH labs, practicing radiologists, professional association representatives, and patients. The practical benefits of applying AI-enabled computer vision to radiological imaging were clear in many of the presentations from the NIH and commercial developers of medical AI software devices.
Three related themes emerged from the presentations and panel discussions:
- The FDA’s concern that performance of medical devices should be generalizable across the US population and care settings;
- Training data and the problem of “domain shift,” in which an AI software medical device trained on one population or in one care setting is used in a new setting, potentially resulting in suboptimal performance;
- The ability to make local adjustments—also known as “localization”—to a generalized AI system to improve its performance with a new population, care setting, or intended use. Federated learning may be a way to balance the needs for generalized and localized performance optimization.
1. Generalizability
The current FDA review process generally prefers that medical device performance be generalizable. Questions and statements from most panelists were generally favorable to the way the FDA has approached the regulation of AI software medical devices in radiology, although many speakers had specific recommendations for continued modernization.
Figure 1. Dr. Ochs’ presentation provides insight into the approach the FDA takes to regulate radiological AI and machine learning devices.
Robert Ochs, Deputy Director for Radiological Health at the FDA’s Center for Devices and Radiological Health, provided an overview of the current regulatory landscape for AI and machine learning in radiological imaging. The approach the FDA takes in regulating radiological imaging AI has a profound influence on the commercial availability of products, and consequently on the product strategy of software device manufacturers. Because computer-aided detection (CADe) and computer-aided diagnosis (CADx) applications were considered Class III until Jan. 2020, the burden of proof of clinical effectiveness was very high, resulting in relatively few approvals. Meanwhile, advances in deep learning and the popularity of data science competitions were mobilizing thousands of scientists who produced working CADx models. The agency created a new computer-aided triage and notification (CADt) product class, which now, according to one speaker, accounts for most of the current wave of startup activity.
In Jan. 2020, the FDA down-classified many CADe devices from Class III to Class II, such that faster pathways now exist to gain regulatory clearance or approvals for software medical devices that suggest clinically relevant findings and aid diagnostic decisions.
Dr. Ochs explained some of the issues facing regulators now that technology advances are increasing the feasibility of autonomous devices. He shared (from a lightly edited transcript):
“…We’re going to promote best practices in study designs that match the intended use and indications. We want to avoid tuning to the validation data set, which is a common mistake that we see. The challenges are assessment of the generalizability and robustness of the system. We want to think about pre-specification for algorithm changes and testing protocols, and seek greater insight into premarket performance. A lot of this involves the thoughts you heard earlier about digital health efforts: how to regulate devices that adjust for continuous changes, and do so in a way that’s practical and helpful for our mission.”
The agency has recently begun to approve certain autonomous AI devices capable of determining which images are normal and which require further review. These devices are not without precedent. In 1995, the FDA approved an automated cervical cytology slide reader, BD FocalPoint GS (P950002-S002). More recently the agency approved a device for autonomous detection of diabetic retinopathy (IDx-Dr).
The agency and others asserted that the lessons learned in the regulation of CAD devices can inform the ongoing development of regulatory science related to autonomous AI image analysis devices, where the regulatory bar is even higher.
2. Training Data Enhancement and Domain Shift
One of the emerging issues for AI in radiology image processing arises from enhancing training data sets for specific purposes. Dr. Ronald Summers, Senior Investigator of the Imaging Biomarkers and Computer-Aided Diagnosis Lab of Radiology and Imaging Sciences at NIH, described how Dr. Yuxing Tang, from his group, used generative adversarial networks (GANs) to generate synthetic images for training algorithms. Dr. Tang was able to create synthetic pediatric radiographs from adult images and used them to train a system that was accurate in detecting abnormalities in real pediatric radiographs, as shown in the screenshot above.
Dr. Summers also described challenges including “domain shift,” a scenario in which a system trained on one set of patient data “does not generalize well to another patient population because of differences in the image acquisition technology or in the patients themselves.”
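Domain shift can be made concrete with a toy sketch. Everything below is a synthetic illustration, not any real device or data set: a simple threshold classifier is tuned on one simulated “domain” and then evaluated on a second domain whose feature distribution is shifted, as might happen with a different scanner or patient population.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_domain(shift, n=2000):
    """Simulate one imaging 'domain': a single score feature for normal
    and abnormal studies; `shift` models acquisition/population changes."""
    x = np.concatenate([rng.normal(0.0 + shift, 1.0, n),   # normal
                        rng.normal(2.0 + shift, 1.0, n)])  # abnormal
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return x, y

def fit_threshold(x, y):
    """Pick the decision threshold that maximizes training accuracy."""
    candidates = np.linspace(x.min(), x.max(), 200)
    accs = [np.mean((x > t) == y) for t in candidates]
    return candidates[int(np.argmax(accs))]

# Train on domain A, then evaluate on A and on a shifted domain B.
xa, ya = make_domain(shift=0.0)
xb, yb = make_domain(shift=1.5)  # e.g. a different scanner or population
t = fit_threshold(xa, ya)

acc_a = np.mean((xa > t) == ya)
acc_b = np.mean((xb > t) == yb)
print(f"in-domain accuracy:      {acc_a:.2f}")
print(f"shifted-domain accuracy: {acc_b:.2f}")
```

The decision rule that was well calibrated in domain A systematically misclassifies in domain B, even though nothing about the model changed; only the data distribution moved.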
During the panel discussion, Dr. Summers elaborated, sharing that his team has run experiments in their lab in which systems they initially thought were well trained failed to adapt to new data:
“So, for instance, a developer has a cleared algorithm that he brings to a site. The performance is not as good as the parameters specified in their FDA clearance, so what happens is that they end up training it on a hundred, a thousand, some number of cases at the site. The performance of the algorithm improves, biased of course to their site, but so what? That’s where you want it to work the best, right?”
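The site-level retraining Dr. Summers describes can be sketched with the same kind of toy setup. This is a hedged illustration with synthetic numbers, not any vendor’s actual workflow: a “cleared” threshold tuned on the vendor’s data underperforms at a shifted deployment site, and re-tuning on a few hundred local cases recovers performance there.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_site(mean_normal, mean_abnormal, n):
    """Toy single-feature 'cases' for one site."""
    x = np.concatenate([rng.normal(mean_normal, 1.0, n),
                        rng.normal(mean_abnormal, 1.0, n)])
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return x, y

def best_threshold(x, y):
    ts = np.linspace(x.min(), x.max(), 300)
    return ts[int(np.argmax([np.mean((x > t) == y) for t in ts]))]

# Vendor's cleared model: threshold tuned on the vendor's own data.
xv, yv = sample_site(0.0, 2.0, 5000)
t_vendor = best_threshold(xv, yv)

# New deployment site with a population/acquisition shift.
x_site, y_site = sample_site(1.2, 3.2, 5000)  # held-out local evaluation data
x_tune, y_tune = sample_site(1.2, 3.2, 500)   # "a hundred, a thousand" local cases

acc_before = np.mean((x_site > t_vendor) == y_site)
t_local = best_threshold(x_tune, y_tune)      # the localization step
acc_after = np.mean((x_site > t_local) == y_site)
print(f"before localization: {acc_before:.2f}")
print(f"after localization:  {acc_after:.2f}")
```

As in the quote, the locally tuned model is now biased toward that site, which is exactly where it needs to work best; the regulatory question is how such post-clearance changes should be specified and validated.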
3. Localization and Federated Learning
The idea of locked devices versus localization stimulated some debate, and will no doubt continue to do so. One of the more intriguing ideas was presented by Dr. Peter Chang, Co-Director of the Center for Artificial Intelligence in Diagnostic Medicine at the University of California Irvine. He suggested that federated learning approaches offer new ways to safely enable the localization of algorithms for better performance:
“Next we have a more federated machine learning approach: rather than a single central learner that learns simultaneously across sites, we take the learning algorithm and pass it around to different potential collaborators.
This model itself is again very interesting to think about, because you can imagine that during deployment… rather than giving your institution a fully trained, 100 percent algorithm, I can give you a federated model that’s been trained 90 percent of the way there, and your specific institution is simply the final node in that federated model. You’re the last person to update the weights.
Every institution in that case may have their own unique model to work with, sort of the hyper-optimization mentioned this morning, so certainly a very feasible and interesting approach.”
Federated learning also avoids centralizing training data: because patient data never leaves the originating organization, it offers both a privacy benefit and a data storage benefit. This may be especially useful for complying with laws that restrict moving even anonymized patient data outside the country.
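The mechanics of federated learning can be sketched in a few lines. This is a simplified toy version of federated averaging on synthetic data, not Dr. Chang’s actual system: three simulated “sites” each take one local training step on private data, and only the model weights, never the data, are sent to a central server for weighted averaging.

```python
import numpy as np

rng = np.random.default_rng(2)

def local_gradient_step(w, X, y, lr=0.1):
    """One local logistic-regression gradient step on a site's private
    data; only the updated weights leave the site."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grad = X.T @ (p - y) / len(y)
    return w - lr * grad

def fed_avg(weights, sizes):
    """Server step: average site weights, weighted by local sample count."""
    return np.average(weights, axis=0, weights=np.asarray(sizes, float))

# Three hypothetical sites, each holding private (X, y) that never moves.
sites = []
for _ in range(3):
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
    sites.append((X, y))

w = np.zeros(2)                 # shared global model
for _ in range(50):             # communication rounds
    local = [local_gradient_step(w, X, y) for X, y in sites]
    w = fed_avg(local, [len(y) for _, y in sites])

# Evaluate the global model across all sites (evaluation only).
Xall = np.vstack([X for X, _ in sites])
yall = np.concatenate([y for _, y in sites])
acc = np.mean(((Xall @ w) > 0) == yall)
print(f"global model accuracy: {acc:.2f}")
```

Real deployments add many refinements (multiple local epochs, secure aggregation, differential privacy), but the core pattern is the same: weights circulate, data stays put, and a final local update at one institution yields the site-specific model Dr. Chang describes.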
Asking the right questions
This FDA Public Workshop was an exciting open discussion of issues that matter to the developers of machine learning applications using radiological images. In fact, the number of analogies to self-driving cars speaks to the relevance of the questions of generalizability, domain shift and localization across the spectrum of AI applications in the 21st century. Part of getting to the right regulatory process is asking the right questions, and the workshop was a good example of that in action. We look forward to continued discussion of these issues. As Dr. Chang mentioned, it is not only the AI vendors who are active in this area. Healthcare delivery organizations, especially those with academic resources, are on the forefront of developing and deploying software medical devices:
“… while today we see that most of that algorithm development may be done in commercial entities, I see that in the very near future academic hospitals or university departments will take it upon themselves more and more to build home-grown solutions. In fact, to the point where I imagine there will be a rapid blurring between what we oftentimes consider a research project at a single institution versus full clinical deployment in the hospital. And so certainly the question here is what the scope of potential regulatory considerations might be: will the regulatory burden be placed on companies whose job is to curate and aggregate models from different academic hospitals, or will it in fact fall on the specific institution if it is producing models that a lot of different hospitals are using.”
The existing regulatory toolkit, with its general and special controls, can be supplemented by improved machine learning development practices that provide assurances of safety and effectiveness. It is becoming easier to build powerful models, and with federated learning, it is possible to adjust performance tuning parameters to fit specific needs. The richness, depth, and candor of discussion at this public workshop are positive indicators that the FDA is asking the right questions.
It is also clear that forward-looking healthcare delivery systems and commercial medical technology vendors are pushing the boundaries in a thoughtful, responsible manner. It is encouraging that the FDA and professional organizations are helping to lead the conversation.