© unsplash/@freestocks

Healthcare & Pharma

Tracking of side effects and updating of SmPCs


When a medicine is approved, pharma corporations compile datasheets called "Summary of Product Characteristics" (SmPCs). These are continuously reviewed by the European Medicines Agency (EMA) or, in Germany, by the "Bundesinstitut für Arzneimittel und Medizinprodukte" (BfArM).

After the medicine has been launched, side effects and observations made during treatment need to be tracked and documented continuously. For this purpose, pharma corporations employ a compliance department that collects all notifications of side effects sent in by doctors in hospitals or private practices.


Pharma corporations need a reliable process for handling notifications of side effects. However, these notifications can arrive in various formats and through different channels. For example, a notification can be reported directly to the EMA or BfArM, reported to the pharma corporation, or published in a social media post. Additionally, the notification can be made by phone, email or letter, or be published on the web.
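One way to make these different channels manageable is to map every incoming notification onto a single record type before any further processing. The following is a minimal sketch, not a prescribed schema: the class names, field names and example entries (`Notification`, `Recipient`, `Medium`, "ExampleDrug") are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Recipient(Enum):
    """Where the notification was first reported (hypothetical categories)."""
    AUTHORITY = "authority"        # EMA or BfArM
    COMPANY = "company"            # the pharma corporation itself
    SOCIAL_MEDIA = "social_media"  # public post


class Medium(Enum):
    """How the notification was transmitted (hypothetical categories)."""
    PHONE = "phone"
    EMAIL = "email"
    LETTER = "letter"
    WEB = "web"


@dataclass(frozen=True)
class Notification:
    product: str
    text: str
    recipient: Recipient
    medium: Medium
    received: date


def group_by_medium(notifications):
    """Group notifications by medium, e.g. to route them to different parsers."""
    groups = defaultdict(list)
    for notification in notifications:
        groups[notification.medium].append(notification)
    return dict(groups)


# Invented example entries, for illustration only.
inbox = [
    Notification("ExampleDrug", "Patient reported dizziness.",
                 Recipient.COMPANY, Medium.EMAIL, date(2024, 1, 5)),
    Notification("ExampleDrug", "Got a rash after two tablets.",
                 Recipient.SOCIAL_MEDIA, Medium.WEB, date(2024, 1, 6)),
    Notification("ExampleDrug", "Nausea after the first dose.",
                 Recipient.AUTHORITY, Medium.EMAIL, date(2024, 1, 7)),
]
groups = group_by_medium(inbox)
```

With a common record type, the later extraction and prioritization steps do not need to know through which channel a notification arrived.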

Organizing, prioritizing and evaluating the severity of these notifications is therefore a critical process for pharma corporations, as it substantially improves their ability to comply with EMA regulations and to inform the public about potential side effects in a timely manner. Once the notifications are structured, the SmPCs can be updated automatically.

Potential solution approaches

A fully automated update involves several steps. On the one hand, the free-text information in the SmPC has to be extracted and matched to the relevant categories of the pharma corporation (e.g. active ingredient, interaction effects). The relevant text can be identified with a natural language processing (NLP) model. Commonly used techniques for text classification tasks are TF-IDF features, naive Bayes classifiers, word embedding methods and LSTM networks.
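To illustrate the classification of SmPC passages, here is a minimal sketch of a multinomial naive Bayes classifier with Laplace smoothing, written in plain Python. The category names and training sentences are invented for illustration, and a real project would more likely rely on an established machine learning library and a much larger labelled data set.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes over bag-of-words counts."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> number of passages
        self.vocab = set()

    def fit(self, passages, labels):
        for text, label in zip(passages, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior of the category
            score = math.log(self.doc_counts[label] / total_docs)
            total_tokens = sum(self.word_counts[label].values())
            for token in tokens:
                # Laplace (add-one) smoothing avoids zero probabilities
                count = self.word_counts[label][token] + 1
                score += math.log(count / (total_tokens + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training passages and hypothetical category names.
passages = [
    "Each tablet contains 500 mg of paracetamol as the active substance.",
    "The active ingredient is ibuprofen 200 mg per tablet.",
    "Concomitant use with warfarin may increase the risk of bleeding.",
    "Interaction with alcohol can enhance the sedative effect.",
]
labels = ["active_ingredient", "active_ingredient", "interaction", "interaction"]

classifier = NaiveBayesClassifier().fit(passages, labels)
```

Calling `classifier.predict(...)` on a new SmPC passage returns the category whose training vocabulary best matches the passage.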

On the other hand, the information in the notifications has to be extracted and categorized as well. Text scraped from websites or social media posts will probably need to be processed with more advanced approaches. Here, BERT, domain-specific embeddings (such as BioBERT for biomedical language) or supervised learning approaches based on labelled data might be chosen.
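Before any language model can be applied, the scraped page has to be reduced to its visible text. The sketch below shows only this preprocessing step, using Python's standard `html.parser` module; it does not fetch pages or run BERT. The class name `VisibleTextExtractor`, the helper `extract_text` and the sample HTML are assumptions made for illustration.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text content while ignoring <script> and <style> blocks."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML document as a single string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Invented sample page, for illustration only.
sample_html = (
    "<html><head><style>p {color: red}</style>"
    "<script>var x = 1;</script></head>"
    "<body><h1>Forum post</h1>"
    "<p>I got a <b>severe rash</b> after taking the tablets.</p>"
    "</body></html>"
)
post_text = extract_text(sample_html)
```

The resulting plain text can then be passed on to whichever model is chosen for categorizing the notification.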

Finally, once both inputs, the SmPC and the notification, have been processed, the actual update request has to be evaluated and prioritized with respect to urgency. In the evaluation step, the processed notification is compared to the processed SmPC to avoid adding redundant information. For the final step, the prioritization, a (boosted) decision tree trained on the historical data set of earlier SmPC updates can be used.
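The evaluation step could, for instance, be approximated by measuring how similar a notification is to the text already contained in the SmPC. The following sketch uses TF-IDF vectors and cosine similarity in plain Python. It covers only the redundancy check, not the prioritization with a boosted decision tree, and the threshold of 0.3 as well as the example sentences are arbitrary assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (token -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(tok for tokens in tokenized for tok in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency
            tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_similarity(notification, smpc_sentences):
    """Highest similarity between the notification and any SmPC sentence."""
    vectors = tfidf_vectors(list(smpc_sentences) + [notification])
    query = vectors[-1]
    return max((cosine(query, v) for v in vectors[:-1]), default=0.0)


def is_redundant(notification, smpc_sentences, threshold=0.3):
    """Flag a notification whose content already appears in the SmPC."""
    return max_similarity(notification, smpc_sentences) >= threshold


# Invented SmPC sentences, for illustration only.
smpc_sentences = [
    "Common side effects include headache and nausea.",
    "Rare side effects include skin rash.",
]
```

A notification such as "Patient reported headache and nausea." would be flagged as already covered, whereas one describing an effect that is not mentioned in the SmPC would be passed on to the prioritization step.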

Related webinars

Semantic search and understanding of natural text with neural networks: BERT

In this webinar you will get an introduction to the application of BERT for Semantic Search using a real case study: Every year millions of citizens interact with public authorities and are regularly overwhelmed by the technical language used there. We have successfully used BERT to deliver the right answer from government documents with the help of colloquial queries - without having to use technical terms in the queries.

Konrad Schultka

Machine Learning Scientist

Jona Welsch

Machine Learning Scientist

Automated answering of questions with neural networks: BERT

In this webinar we will present a method based on the BERT model for automated answering of questions.

Mattes Mollenhauer

Machine Learning Scientist

Recurrent neural networks: How computers learn to read

The webinar will give an introduction to the functioning of RNNs and illustrate their use in an example project from the field of legal tech.

Fabian Gringel

Machine Learning Scientist