Natural Language Processing for a Cancer Registry

We conducted a feasibility study in collaboration with a pediatric oncology research organization to partially automate the extraction, abstraction, and loading of patient data into a cancer registry shared by a consortium of institutions. Our goals were to reduce the labor required by the manual process and to improve data quality.

The problem we set out to solve was the inefficient use of project coordinators’ time on four manual tasks:

  • Extracting data from source electronic health record systems
  • Manually interpreting the data
  • Assigning values to 1,900 data elements in the REDCap system (abstracting)
  • Entering those values into REDCap

This process took up to three hours per case.

To address the problem, we designed an automated data bridge to ingest data fed automatically from the source electronic health record (EHR) systems, along with PDFs and other electronic files submitted by the consortium sites.

The results of this feasibility study showed that we could develop NLP algorithms capable of mapping a complex set of diagnostic and treatment terminology to the appropriate structured data fields. However, we also found that achieving the needed time savings would require complex integration with the source electronic health record systems.
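To make the term-to-field mapping concrete, here is a deliberately simplified, dictionary-based sketch in Python. It is not the method used in the study. The term list, the REDCap field names (`dx_primary`, `tx_chemo_agent`, `tx_radiation_site`), the coded values, and the helper `abstract_terms` are all hypothetical. The sketch finds known diagnostic and treatment phrases in free text and prefers longer phrases over shorter overlapping ones. It also keeps the matched text as evidence so a coordinator can review each proposed value.

```python
import re
from dataclasses import dataclass

# Hypothetical lookup table: surface phrase -> (REDCap field name, coded value).
# These names and codes are illustrative only, not the registry's data dictionary.
TERM_MAP = {
    "acute lymphoblastic leukemia": ("dx_primary", "ALL"),
    "lymphoblastic leukemia": ("dx_primary", "ALL"),
    "neuroblastoma": ("dx_primary", "NBL"),
    "vincristine": ("tx_chemo_agent", "VCR"),
    "cranial irradiation": ("tx_radiation_site", "CRANIAL"),
}


@dataclass
class Abstraction:
    field: str     # target structured field (hypothetical name)
    value: str     # coded value proposed for that field
    evidence: str  # exact text span matched in the note, kept for human review


def abstract_terms(note: str) -> list[Abstraction]:
    """Propose field values for every known phrase found in a free-text note."""
    taken: list[tuple[int, int]] = []  # character spans already claimed by a match
    results: list[Abstraction] = []
    # Try longer phrases first so they claim text before their shorter substrings.
    for term in sorted(TERM_MAP, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        # Case-insensitive search on the original note keeps offsets aligned.
        for m in re.finditer(pattern, note, flags=re.IGNORECASE):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue  # overlaps a longer phrase that was already matched
            taken.append((m.start(), m.end()))
            field, value = TERM_MAP[term]
            results.append(Abstraction(field, value, m.group(0)))
    return results


if __name__ == "__main__":
    note = "Diagnosed with Acute Lymphoblastic Leukemia; started vincristine."
    for a in abstract_terms(note):
        print(a.field, a.value, repr(a.evidence))
```

On the sample note, the longer phrase "Acute Lymphoblastic Leukemia" claims the text first, so the shorter "lymphoblastic leukemia" entry does not produce a duplicate. A realistic system would also need negation and uncertainty handling (for example, "no evidence of neuroblastoma") and a mapping to the registry's actual data dictionary. This sketch omits both.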