AI for Process Automation in the Pharmaceutical Industry

Context

When a medicine is approved, pharma corporations compile datasheets called "Summary of Product Characteristics" (SmPCs). These are continuously reviewed by the European Medicines Agency (EMA) or, in Germany, by the "Bundesinstitut für Arzneimittel und Medizinprodukte" (BfArM).

After a medicine is launched, side effects and other observations made during treatment must be tracked and documented continuously. For this purpose, pharma corporations employ a compliance department that collects all side-effect notifications sent in by doctors in hospitals and practices.

Challenges

Pharma corporations need a reliable process for handling notifications of side effects. However, these notifications arrive in various formats and through different channels. For example, a notification can be reported directly to the EMA or BfArM, sent to the pharma corporation, or published in a social media post. It can also be reported by phone, email or letter, or published on the web.

Organizing, prioritizing and evaluating the severity of these notifications is therefore a critical process in pharma corporations, as it substantially increases their chances of complying with EMA regulations and informing the public about potential side effects in a timely manner. Once the notifications are structured, the SmPCs can be updated automatically.

Potential solution approaches

A fully automated update involves several steps. On the one hand, the free-text information in the SmPC has to be extracted and matched to the relevant categories of the pharma corporation (e.g. active ingredient, interaction effects). Identifying the relevant text can be accomplished with a natural language processing (NLP) model. Commonly used techniques for text classification tasks are TF-IDF weighting, naive Bayes classifiers, word embedding methods and LSTM networks.
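To make the classification step more concrete, the following is a minimal sketch of a multinomial naive Bayes classifier that assigns SmPC sentences to categories. It uses only the Python standard library. The category names, the training sentences and the class name `NaiveBayesTextClassifier` are hypothetical and chosen purely for illustration; a real system would be trained on a much larger labelled corpus.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen during training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, invented for this illustration.
TRAINING_DATA = [
    ("Each tablet contains 500 mg of paracetamol as the active substance.", "active_ingredient"),
    ("The active substance is ibuprofen 400 mg per film-coated tablet.", "active_ingredient"),
    ("Concomitant use with warfarin may increase the risk of bleeding.", "interaction"),
    ("Co-administration with alcohol may enhance sedative effects.", "interaction"),
    ("Common adverse reactions include headache, nausea and dizziness.", "side_effect"),
    ("Rare cases of skin rash and itching have been reported.", "side_effect"),
]

texts, labels = zip(*TRAINING_DATA)
classifier = NaiveBayesTextClassifier().fit(texts, labels)

if __name__ == "__main__":
    print(classifier.predict("Concomitant use with aspirin may increase bleeding risk."))
```

Naive Bayes is shown here because it is the simplest of the techniques listed above to write out in full. TF-IDF weighting, word embeddings or an LSTM would replace the raw token counts or the classifier itself.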

On the other hand, the information in the notifications has to be extracted and categorized, too. Text obtained by scraping websites or social media posts will probably need to be processed with more advanced approaches. Here, BERT or domain-specific variants (such as BioBERT for biomedical language), or supervised learning approaches based on labelled data, might be chosen.

Finally, once both inputs, the SmPC and the notification, have been processed, the actual update request has to be evaluated and prioritized with respect to urgency. In the evaluation step, the processed notification is compared to the processed SmPC to avoid adding redundant information. For the final step, the prioritization, a (boosted) decision tree trained on the historical data set of earlier SmPC updates can be used.
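The comparison in the evaluation step could, for instance, be approximated by a text similarity check. The sketch below covers only this redundancy check, not the prioritization. It represents texts as bag-of-words vectors and uses cosine similarity, which is an assumption made here for illustration and not a method prescribed by the text. The function names, the example entries and the threshold of 0.6 are likewise hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphabetic tokens in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_redundant(notification, smpc_entries, threshold=0.6):
    """Return True if the notification closely matches an existing SmPC entry.

    The threshold is a hypothetical value and would need tuning on real data.
    """
    note_vector = bag_of_words(notification)
    return any(
        cosine_similarity(note_vector, bag_of_words(entry)) >= threshold
        for entry in smpc_entries
    )


# Hypothetical side effects already listed in an SmPC.
SMPC_SIDE_EFFECTS = ["headache and nausea", "skin rash and itching"]

if __name__ == "__main__":
    for note in ["Patient reported nausea and headache",
                 "Patient reported blurred vision"]:
        print(note, "->", "redundant" if is_redundant(note, SMPC_SIDE_EFFECTS) else "new")
```

Only notifications that are not flagged as redundant would be passed on to the prioritization model. In practice, the raw token counts would likely be replaced by the embeddings produced in the earlier extraction steps.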

More Use Cases in Healthcare & Pharma

Genome sequencing

Information extraction for electronic health records

Medical image diagnosis

Tracking of side effects and updating of SmPCs