Development of a generalizable natural language processing pipeline to extract physician-reported pain from clinical reports - Generated using publicly-available datasets and tested on institutional clinical reports for cancer patients with bone metastases

Image credit: Hossein Naseri

Abstract

Objective: The majority of cancer patients suffer from severe pain at the advanced stage of their illness. In most cases, cancer pain is underestimated by clinical staff and is not properly managed until it reaches a critical stage. Detecting and addressing cancer pain early may therefore improve the quality of life of cancer patients. The objective of this research project was to develop a generalizable Natural Language Processing (NLP) pipeline to find and classify physician-reported pain in the radiation oncology consultation notes of cancer patients with bone metastases.
Materials and Methods: The texts of 1249 publicly-available hospital discharge notes in the i2b2 database were used as a training and validation set. The MetaMap and NegEx algorithms were implemented for medical term extraction. Sets of NLP rules were developed to score pain terms in each note. By averaging pain scores, each note was assigned one of three verbally-declared pain (VDP) labels: no pain, pain, or no mention of pain. Without further training, the generalizability of our pipeline in scoring individual pain terms was tested independently using 30 hospital discharge notes from the MIMIC-III database and 30 consultation notes of cancer patients with bone metastases from our institution’s radiation oncology electronic health record. Finally, 150 notes from our institution were used to assess the pipeline’s performance at assigning VDP.
Results: Our NLP pipeline detected and quantified pain in the i2b2 summary notes with 93% overall precision and 92% overall recall. Testing on the MIMIC-III database achieved 91% precision and 86% recall. The pipeline detected pain with 89% precision and 82% recall on our institutional radiation oncology corpus. Finally, our pipeline assigned a VDP to each note in our institutional corpus with 84% precision and 82% recall.
Conclusion: Our NLP pipeline enables the detection and classification of physician-reported pain in our radiation oncology corpus. This portable and ready-to-use pipeline can be used to automatically extract and classify physician-reported pain from clinical notes where the pain is not otherwise documented through structured data entry.
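
As a rough illustration of the note-level step described in Materials and Methods (averaging per-term pain scores to assign a VDP label), the following is a minimal sketch and not the published implementation. The score encoding (1.0 for an affirmed pain term, 0.0 for a negated one), the 0.5 threshold, and the function name `assign_vdp` are all assumptions made for this example; the abstract does not specify them.

```python
from statistics import mean
from typing import Sequence


def assign_vdp(term_scores: Sequence[float], threshold: float = 0.5) -> str:
    """Assign a note-level VDP label from per-term pain scores.

    Hypothetical encoding (an assumption, not taken from the paper):
    each detected pain term scores 1.0 if affirmed and 0.0 if negated.
    """
    # No pain terms were detected in the note at all.
    if not term_scores:
        return "no mention of pain"
    # Average the per-term scores and compare against the assumed threshold.
    return "pain" if mean(term_scores) >= threshold else "no pain"


# Example: two affirmed pain terms and one negated pain term in a note.
print(assign_vdp([1.0, 1.0, 0.0]))
```

In this sketch an empty score list is kept distinct from a low average, so that a note with only negated pain terms ("no pain") is not confused with a note that never mentions pain.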

Publication
In Journal of Biomedical Informatics
Hossein Naseri
PhD Student
Julia Khriguian
Advanced Radiation Oncology fellow at the MD Anderson Cancer Center
John Kildea
Associate Professor (tenured) of Medical Physics