Researchers from the Veterans Affairs Health Care System are using natural language processing (NLP) to identify and classify patients. Specifically, they are using NLP models to identify patients with muscle-invasive bladder cancer and patients with metastatic prostate cancer.

Bladder cancer can significantly increase mortality once it invades the muscle, so identifying patients with this diagnosis is extremely important. Researchers aimed to develop an NLP model capable of automatically and accurately identifying patients with muscle-invasive bladder cancer. Patients and pathology results from the Department of Veterans Affairs were used to develop the model and assess its accuracy. The model showed high overall accuracy, predicting muscle invasion at the patient level with 96% accuracy.

Researchers used similar techniques to develop an NLP model capable of accurately identifying patients with metastatic prostate cancer. Radiology reports from the Department of Veterans Affairs were used to develop and assess this model. The model performed well, predicting metastasis status with a sensitivity of 91% and a specificity of 81%. These sensitivity and specificity levels suggest that the NLP model performs in the same range as, or even better than, other methods such as ICD-9/10 billing codes.
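For readers less familiar with these metrics, the sketch below shows how sensitivity and specificity are computed from true and predicted labels. It is only an illustration: the function name and the label lists are hypothetical, and the counts were chosen to reproduce 91% and 81%, not taken from the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = metastatic, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of true cases the model catches
    specificity = tn / (tn + fp)  # share of non-cases the model correctly rules out
    return sensitivity, specificity


# Hypothetical labels, NOT study data: 100 metastatic and 100 non-metastatic patients.
y_true = [1] * 100 + [0] * 100
# Assumed predictions: 91 of the positives flagged, 81 of the negatives cleared.
y_pred = [1] * 91 + [0] * 9 + [0] * 81 + [1] * 19

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this framing, a sensitivity of 91% means the model misses roughly 9 of every 100 patients who truly have metastatic disease, while a specificity of 81% means it wrongly flags roughly 19 of every 100 patients who do not.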

The future of machine learning and natural language processing in patient identification and classification looks promising, as the models have the potential to be highly efficient identification tools. However, the researchers acknowledge that further research is needed to strengthen the models and confirm their validity in other cohorts.