Clinical Data is the New Source Code

Oct 14, 2019

Christina Nelson, Communication Specialist

Healthcare organizations have unprecedented opportunities to use their own data to customize solutions that optimize efficiency and improve patient care. To do so, healthcare teams can partner with AI developers to transform data into valuable solutions.

Openly available datasets are not enough

In recent years, organizations including the US National Institutes of Health (NIH) and the Massachusetts Institute of Technology (MIT) Laboratory for Computational Physiology have released a series of medical image datasets to the public. These datasets include ChestX-ray8, ChestX-ray14, the Lung Image Database Consortium collection (LIDC-IDRI), DeepLesion and the MIMIC Chest X-ray Database (MIMIC-CXR).

The increasing availability of medical image datasets presents new opportunities for AI developers across the globe to contribute toward developing new medical imaging solutions. However, each of these datasets comes with its own potential limitations and quality issues.

Accurately labeling medical images requires experienced domain experts, whose time is expensive and who are rarely available at the scale needed to annotate the images in these datasets. The resulting label inaccuracies can degrade the performance of a medical imaging solution. The situation has improved over time as institutions including the Society for Imaging Informatics in Medicine (SIIM), Radiological Society of North America (RSNA) and American College of Radiology (ACR) have sponsored Kaggle competitions that produced high-quality annotated data for detecting intracranial hemorrhage and identifying pneumothorax in chest X-rays.

In addition, machine learning algorithms are subject to bias introduced during data curation. This can occur when the data used to develop a solution is not representative of the patient populations the solution will serve. As a result, the accuracy of the solution may suffer for the underrepresented groups.
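One simple way to look for this kind of mismatch is to compare how often each patient subgroup appears in the training data against how often it appears in the population the solution is meant for. The sketch below is illustrative only: the function names, the age bands, the counts and the 10-percentage-point tolerance are all assumptions made for the example, not values from any dataset mentioned here.

```python
from collections import Counter


def subgroup_shares(groups):
    """Return each subgroup's share of the records as a fraction of the total."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def representation_gaps(train_groups, target_groups, tolerance=0.10):
    """Return subgroups whose training share differs from the target share
    by more than `tolerance` (hypothetical threshold; choose per project)."""
    train = subgroup_shares(train_groups)
    target = subgroup_shares(target_groups)
    gaps = {}
    for group in sorted(set(train) | set(target)):
        diff = train.get(group, 0.0) - target.get(group, 0.0)
        if abs(diff) > tolerance:
            gaps[group] = diff  # positive: over-represented in training data
    return gaps


# Hypothetical age bands for a training set and for the intended patient population.
train_groups = ["18-40"] * 70 + ["41-65"] * 25 + ["65+"] * 5
target_groups = ["18-40"] * 30 + ["41-65"] * 40 + ["65+"] * 30

gaps = representation_gaps(train_groups, target_groups)
for group, diff in gaps.items():
    print(f"{group}: training share differs from target by {diff:+.0%}")
```

A check like this only flags differences in representation. Whether a flagged gap actually harms accuracy still has to be confirmed by measuring model performance on each subgroup.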

To move beyond relying on publicly available medical image data alone, organizations can take advantage of their internal data resources to improve care. They can do this by partnering with AI developers to build custom medical imaging solutions optimized for their patients and their specialties.

Evaluating project feasibility

A feasibility study is worth performing before investing resources in developing a solution because it helps the team understand how clinical data can be used effectively. When evaluating the feasibility of a project, take the following steps:

1. Define and research clinical problems

Begin by identifying the clinical challenge the solution is trying to solve. A review of the scientific literature can establish whether anyone else has made progress in that area.

2. Identify potential issues and challenges with data

A review of public datasets and the client’s own clinical data can determine their suitability for training and validating deep learning models. After reviewing the available data resources, we usually begin by evaluating whether there is enough diverse training data, whether the annotations are reliable and whether there is potential bias in the training or validation dataset. This allows attainable goals to be established for model accuracy.

3. Assess technical and commercial feasibility

A preliminary regulatory assessment is important for setting an appropriate strategy to obtain FDA approval or clearance for diagnostics-related innovations. An initial regulatory review can help determine potential predicate devices, prioritize the most important feature sets to develop and create an enhancement roadmap.
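The annotation reliability check in step 2 can be made concrete by measuring how well two readers agree on the same images. A common statistic for this is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal illustration: the reader labels are invented for the example, and the function is a plain-Python implementation rather than any particular team's tooling.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: based on how often each annotator uses each label.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two readers on the same ten chest X-rays.
reader_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "neg", "pos", "neg"]
reader_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]

kappa = cohens_kappa(reader_1, reader_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up example the readers agree on 8 of 10 images, but kappa is lower than 0.8 because some of that agreement would be expected by chance. Low agreement between readers suggests the labels may need adjudication before they are used for training or validation.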

At CuraCloud, we take a collaborative approach to developing medical imaging solutions with our clinical partners that ensures data is translated into effective solutions. Explore how healthcare teams can collaborate with AI developers to develop medical imaging solutions in our Collaborative Medical AI Development White Paper.