22.06.2023

A Natural Language Processing Tool for Oncology Patients

At the beginning of this year, DataArt’s team and their partner took part in a podcast by PharmaTalkRadio at the Conference Forum. The episode brought together Jamie McCulloch, Vice President of Client Solutions at DataArt, and Mark Shapiro, Chief Operating Officer at xCures. They discussed their collaboration on a natural language processing tool that helps oncology patients identify and understand their treatment options. The solution was built with React.js, Node.js, Python, spaCy, and AWS services (Lambda, EC2, CloudFront, SQS). The DataArt Healthcare and Life Sciences team is pleased to share the highlights of the conversation here on our blog.


Jamie McCulloch: What does the xCures technology platform do for patients?

Mark Shapiro: The platform has always been designed to deliver treatment options to cancer patients and their oncologists. To do that with high quality, we need to look through the patient's medical records. In terms of cancer treatment, the most interesting information is in free text. It is in the pathology and radiology reports and clinic notes recorded by physicians.

If anyone wants to make an accurate, informed prediction of things that would be useful for that patient and their doctor to consider, one needs to read through the free text. The language in free-text cancer notes is full of jargon and abbreviations. It is very domain-specific, even to the point where you will sometimes find made-up compound words. From a natural language processing standpoint, it is a challenging area, and there is no large source of clean annotated training data. We came up with a strategy that we wanted to work on together: to build and custom-train a model on customized oncology training data that could span those three domains (pathology reports, radiology reports, and clinic notes), which have different language features.

Radiologists write and speak differently than clinical or medical oncologists or pathologists, and the model also has to bridge those language differences to other domains like clinical research protocols. We are trying to provide patients and their oncologists with expert-level information that can be read and derived from their medical records and cross-indexed to a lot of other data sources.


Jamie McCulloch: Where is xCures today? What impact for patients and hospital organizations have you seen over the past two years since we started to work together?

Mark Shapiro: The goal has always been to continue improving and scaling the ability to make accurate predictions of the right options for patients. Today we have a number of oncologists who use our platform to help them find information potentially relevant to their patients, including treatment options. We get rave reviews, and people love the product. But it has obviously been a very complex and challenging thing to do, and a long road that has taken many different pieces of AI and natural language processing, including this tool but certainly not limited to it.

The other important piece for us was that it is not just identifying the important information in the patient's medical records and how that relates to other high-quality data sources that may be out in the public domain, but how you standardize all of this stuff so that when you are talking about medical concepts, you are talking about standardized medical concepts that have consistent meaning in different domains.

It is not just natural language processing, where we started focusing on named entity recognition; it is also entity classification and coding, which include things like biomedical ontologies. All of those things need to be captured for the type of inference we do with our recommendation engine, looking for patient options.
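Editor's illustration: the two steps Mark describes here, recognizing entities in a note and then coding them to standardized concepts, can be sketched in a few lines of Python. This is a toy, dictionary-based stand-in and not the custom-trained model built for xCures. The lexicon, the labels, the `extract_entities` helper, and the `DEMO:` concept IDs are all made up for the example; a real system would map to an actual biomedical ontology.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon: surface form -> (entity label, made-up concept ID).
# The abbreviation and the full term share one ID, which is the point of coding.
LEXICON = {
    "non-small cell lung cancer": ("DIAGNOSIS", "DEMO:0001"),
    "nsclc": ("DIAGNOSIS", "DEMO:0001"),
    "egfr": ("BIOMARKER", "DEMO:0002"),
    "osimertinib": ("DRUG", "DEMO:0003"),
}


@dataclass(frozen=True)
class Entity:
    text: str        # the span exactly as written in the note
    start: int       # character offset where the span begins
    end: int         # character offset where the span ends
    label: str       # entity class (named entity recognition + classification)
    concept_id: str  # standardized concept (coding)


# Longest terms first so multi-word phrases win over shorter overlaps.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_entities(note: str) -> list[Entity]:
    """Find known terms in a free-text note and attach a label and concept ID."""
    entities = []
    for match in _PATTERN.finditer(note):
        label, concept_id = LEXICON[match.group(1).lower()]
        entities.append(
            Entity(match.group(1), match.start(1), match.end(1), label, concept_id)
        )
    return entities


note = "Pt with NSCLC, EGFR mutation positive. Started osimertinib."
for ent in extract_entities(note):
    print(ent.label, ent.text, ent.concept_id)
```

Because "NSCLC" and "non-small cell lung cancer" resolve to the same concept ID, downstream logic such as a recommendation engine can reason over concepts rather than spellings. A fixed dictionary cannot cope with the jargon and made-up compound words found in real notes, which is one reason a trained model is needed in practice.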

Jamie McCulloch: How has the platform transformed from your original idea into what you have now? And how many pivots have you had to make?

Mark Shapiro: Over at least two years, we did a pretty exhaustive dive into anything we could find that purported to be biomedical NLP. However, none of it was particularly good at parsing free-text cancer notes. We ultimately decided to custom-train something. We accumulated a lot of training data and came up with some great ideas on how to build custom training data sets that were highly domain-specific and would be able to transfer across a couple of the different linguistic domains we were interested in.

Jamie McCulloch: How is it working with the DataArt team? What was that like from the start?

Mark Shapiro: There is a big difference between natural language processing and what passes for machine learning these days. Although we are utilizing certain machine learning tools, a lot of custom, domain-specific expertise is required for NLP. Because of this, finding the right people and getting them up to speed on a particular, highly refined domain takes time. We were very pleased to work with DataArt and to be able to dive deep into the oncology domain.

Tapping into some of the experts across your organization was helpful, especially at some key challenging points as we were trying to figure out how to move the models forward and improve the evaluation, and we did that at several points. In projects like this, you learn things you did not know before, and sometimes you have to take a step back and move forward in a slightly different direction.

DataArt considers the project with xCures to be one of its most significant collaborations in the interests of patients. It offered a unique opportunity to create something with real, tangible benefits for fellow humans. It was also an inspiring technical challenge because NLP and AI are rapidly evolving areas of technology. The team used this momentum to its advantage, reassessing and pivoting its approach at several stages of the project. The result speaks for itself, and we are excited to see what comes next for xCures.

You can listen to the full conversation on the Pharma Talk Radio podcast, available on Spotify, Apple, and wherever you get your audio hits. 
