The First Norwegian Clinical Language Model Developed

Norway has taken a significant step forward in the use of artificial intelligence (AI) in the healthcare sector. Researchers at the Norwegian Centre for E-health Research have successfully developed the first Norwegian clinical language model, called NorDeClin-BERT.

(Illustration photo: Colourbox)

This model is based on advanced natural language processing (NLP) and is specially adapted to extract knowledge from clinical texts. This opens a range of possibilities in health research and patient care. However, before the service can be fully developed, researchers must first anonymize patient data.

“The language model is a result of continued pre-training from the Norwegian general language model NorBERT on pseudonymized clinical data from the gastrointestinal surgery department at the University Hospital of North Norway. We hope to get the model approved soon so that more people can use it, thereby providing invaluable assistance in healthcare services,” says Senior Researcher Phuong Dinh Ngo from the Department of Health Data and Analysis at the Norwegian Centre for E-health Research.

What is NorDeClin-BERT?

NorDeClin-BERT is a language model based on BERT technology (Bidirectional Encoder Representations from Transformers) originally developed by Google in 2018.

This language model is trained on Norwegian clinical texts, enabling it to understand medical terms and contexts in a way that general language models cannot. This is crucial for the model's application in the healthcare sector, where precise and accurate text comprehension can be vital.
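As a rough illustration of how a BERT-style model learns from text, the sketch below shows a simplified version of the masked-language-modelling step: some tokens are hidden, and the model is trained to predict them from the surrounding context. It assumes that continued pre-training uses this standard BERT objective, which the article does not state. The function name `mask_tokens`, the example sentence, and the masking rate are made up for illustration, and real BERT training works on subword tokens and uses a more elaborate replacement scheme.

```python
import random

MASK = "[MASK]"  # placeholder token that hides a word from the model


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens (simplified BERT-style masking).

    Returns (masked_tokens, labels). labels[i] holds the original token
    where a mask was applied and None everywhere else, so a model would
    only be scored on the hidden positions.
    """
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Made-up sentence in clinical style, split on whitespace for simplicity.
sentence = "Pasienten ble operert for akutt appendisitt i går".split()
masked, labels = mask_tokens(sentence, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

Training on clinical notes in this way exposes the model to medical vocabulary in context, which is one plausible reason a clinically adapted model handles such text better than a general one.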

“Artificial intelligence is already contributing to solving some tasks within healthcare services, and this project represents a further step in the use of AI in healthcare. I am concerned about the safe use of AI, and here we have a language model trained on real Norwegian health data. That is excellent! I would also like to take this opportunity to congratulate the Norwegian Centre for E-health Research for developing this AI model that addresses culture and language in the Norwegian healthcare system,” says State Secretary in the Ministry of Health and Care Services, Ellen Rønning-Arnesen.

(Photo: Esten Borgos, Borgos Foto AS)

Challenges and Solutions

One of the biggest challenges in developing NorDeClin-BERT has been access to clinical data. Clinical texts contain sensitive personal information, and extensive approvals are required to use this data for research.

The ClinCode group at the Norwegian Centre for E-health Research has worked for four to five years to gain access to the necessary data and has developed methods to pseudonymize the data to protect privacy.
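To give a sense of what pseudonymizing clinical text can involve, here is a minimal rule-based sketch. It is not the ClinCode group's method, which the article does not describe. The patterns, the placeholder tags, the `pseudonymize` function, and the sample note are all assumptions made for illustration. Real systems typically combine trained models with rules and often substitute realistic surrogate values rather than tags.

```python
import re

# Hypothetical patterns, chosen for illustration only.
PATTERNS = [
    (re.compile(r"\b\d{11}\b"), "<ID>"),                     # 11-digit national identity number
    (re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"), "<DATE>"),  # dates written as dd.mm.yyyy
    (re.compile(r"\b\d{8}\b"), "<PHONE>"),                   # 8-digit phone numbers
]


def pseudonymize(text, known_names=()):
    """Replace identifiers matched by PATTERNS, and any known names, with tags."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    for name in known_names:
        # re.escape guards against names containing regex metacharacters.
        text = re.sub(rf"\b{re.escape(name)}\b", "<NAME>", text)
    return text


# Made-up note; "Kari Nordmann" is a common Norwegian placeholder name.
note = "Kari Nordmann, 01017012345, innlagt 03.05.2021. Tlf 91234567."
print(pseudonymize(note, known_names=["Kari Nordmann"]))
# prints: <NAME>, <ID>, innlagt <DATE>. Tlf <PHONE>.
```

Simple rules like these miss misspelled names, unusual date formats, and indirect identifiers, which helps explain why developing robust methods took the group several years.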

“Securing access to clinical data was a significant challenge. We dedicated years to navigating approvals and developing robust pseudonymization methods to protect patient privacy. Now, we are awaiting final approvals to deploy NorDeClin-BERT and look forward to its positive impact on healthcare delivery,” says Researcher Miguel Ángel Tejedor Hernández at the Norwegian Centre for E-health Research.

The Importance of NorDeClin-BERT for the Healthcare Sector

NorDeClin-BERT has the potential to revolutionize how healthcare professionals handle clinical information. The model can be applied to a variety of tasks, such as automatic coding of diagnoses (ICD codes), identifying drug names in texts, and even pseudonymizing text.

This can lead to faster and more accurate processing of patient information, which in turn can improve patient safety and streamline hospital administration.

“Clinical text differs from regular Norwegian text in that doctors and healthcare professionals may write it in different ways. They may use different names with different meanings, so a model capable of decoding and understanding the language used by healthcare professionals is an important innovation to improve both patient care and efficiency in the healthcare sector,” says Phuong Dinh Ngo.

Competition and Collaboration

While we now have our first clinical language model within gastrointestinal surgery, other actors in Norway, such as Helse Vest IKT, Helse Bergen, Helse Fonna, Helse Stavanger, Helse Førde, and DIPS, have been working on similar projects.

“With AI, healthcare professionals will be able to use their time more efficiently, which can contribute to more labour-saving processes. For example, more efficient and better content in medical records, supporting the doctor's work process by combining information from, for example, blood tests, image examinations and journal text, retrieving and quickly utilising new research-based knowledge, asking a digital colleague for advice and, for example, getting suggestions for possible diagnoses, and getting help to assess risks in treatment,” says State Secretary Rønning-Arnesen.

NorDeClin-BERT has benefited from collaboration with its Swedish partners, who have provided valuable insights and resources from the Swedish research infrastructure Health Bank at Stockholm University, as well as collaboration with the gastrointestinal surgery department at the University Hospital of North Norway.

This collaboration has been crucial in accelerating development and ensuring that Norway can stay at the forefront of this technological development.

The Road Ahead

“Using the Norwegian general language model NorBERT together with pseudonymized Norwegian clinical text makes the final model much safer than using only clinical text for the whole model,” says Professor Hercules Dalianis at the Norwegian Centre for E-health Research.

Researchers have already applied for approval of the language model so that it can be shared with other researchers and healthcare institutions. Efforts are underway to secure funding for an implementation study at some hospitals to see how the model performs in a real clinical treatment process.

“Although the project is slated for completion in 2025, we anticipate that the first version of NorDeClin-BERT will be ready for deployment in the healthcare sector by the latter half of 2024, as we continue to push forward with approvals and implementation studies,” concludes Miguel Ángel Tejedor Hernández.

The goal is for the model to become a resource for the entire Norwegian healthcare system, with the possibility of further development and adaptation to more medical fields.

Effects

The development of NorDeClin-BERT marks the beginning of a new era for the use of artificial intelligence in the Norwegian healthcare system. With the potential to improve everything from patient safety to hospital administration, this model is an example of how advanced technology can contribute to better health for everyone. At the same time, it puts Norway on the map as a pioneer in clinical AI research.

What is clinical text?

Clinical text refers to written documentation used in the healthcare sector that contains information about patients' health status, diagnoses, treatments, medications, and other relevant medical details. This includes, among other things:

  • Medical Notes: Documentation written by doctors, nurses, and other healthcare professionals during a patient’s treatment.
  • Discharge Summaries: Summaries of a patient's medical history, diagnosis, and treatment written upon discharge from the hospital.
  • Referrals: Written information sent from a physician to a specialist or another treatment facility.
  • Diagnoses and ICD Codes: Categorization and coding of diseases and medical conditions.
  • Prescriptions: Written orders for medications and treatment plans.

Clinical text is often complex and technical, with specialized medical terms and expressions that can vary depending on the context. Accurate and precise understanding of this text is crucial for the correct treatment and care of patients. This is where clinical language models, such as NorDeClin-BERT, can play an important role by automating and improving the process of interpreting and applying information from clinical texts.