Openly available datasets are not enough
In recent years, organizations including the US National Institutes of Health (NIH) and the Massachusetts Institute of Technology (MIT) Laboratory for Computational Physiology have released a series of medical image datasets to the public. These datasets include ChestX-ray8, the Lung Image Database Consortium (LIDC-IDRI) dataset and the MIMIC Chest X-ray Database (MIMIC-CXR).
The increasing availability of medical image datasets gives AI developers across the globe new opportunities to contribute to medical imaging solutions. However, each of these datasets comes with its own limitations and quality issues.
Accurately labeling medical images requires experienced and expensive domain experts, who are typically unavailable at the scale needed to annotate these datasets. This can result in label inaccuracies that degrade the performance of a medical imaging solution. The problem has improved over time as institutions including the Society for Imaging Informatics in Medicine (SIIM), the Radiological Society of North America (RSNA) and the American College of Radiology (ACR) have sponsored Kaggle competitions that have produced high-quality annotated data for detecting intracranial hemorrhage and identifying pneumothorax disease in chest X-rays.
In addition, machine learning algorithms are subject to bias introduced during data curation. This can occur when the data used to develop a solution is not representative of the populations the solution will serve, and the accuracy of the solution may suffer as a result.
To move beyond relying on publicly available medical image data alone, organizations can draw on their internal data resources to improve care. They can achieve this by partnering with AI developers to build custom medical imaging solutions optimized for their patients and their specialties.