Fact sheet 09-2018

Artificial intelligence and machine learning in healthcare

Increased use of information systems in health services and digitization of patient information generate large amounts of data. The data is stored in many systems and in different formats. About 80 percent of health data is unstructured.

The data has high potential value both for the provision of healthcare services (primary use) and for quality improvement, research, public health, management and planning (secondary use).

The combination of large amounts of varied data, increased computing power and better methods enables fast and automated production of machine-learning algorithms able to analyze complex data with accurate results.

Four types of machine learning

  • In supervised learning, the algorithm learns by comparing its own predictions with sample data that a human “supervisor” (expert) has reviewed and labeled with the correct answers.
  • For unsupervised learning, the goal is to find underlying structures or patterns in the data. The training data has not been reviewed beforehand by a human expert and, therefore, there are no correct answers.
  • Semi-supervised learning combines the use of a small amount of labeled data with larger amounts of unlabeled data.
  • Reinforcement learning means that the algorithm learns by trial and error: it is rewarded when it makes the right decision in a given situation.
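
The difference between the first two types can be illustrated with a minimal sketch. The measurements, labels and function names below are hypothetical and chosen only for illustration; the supervised part uses the expert labels to place a decision threshold, while the unsupervised part ignores the labels and looks for two groups in the same values with a basic 2-means loop.

```python
from statistics import mean

# Toy measurements (hypothetical values) and expert-provided labels.
values = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = [0, 0, 0, 1, 1, 1]


def fit_supervised(values, labels):
    """Use the labels: place a threshold midway between the two class means."""
    mean0 = mean([v for v, y in zip(values, labels) if y == 0])
    mean1 = mean([v for v, y in zip(values, labels) if y == 1])
    return (mean0 + mean1) / 2


def fit_unsupervised(values, iterations=10):
    """Ignore the labels: find two group centres with a basic 2-means loop."""
    centres = [min(values), max(values)]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearest].append(v)
        # Keep the old centre if a group happens to be empty.
        centres = [mean(g) if g else c for g, c in zip(groups, centres)]
    return centres


threshold = fit_supervised(values, labels)
centres = fit_unsupervised(values)
print("supervised threshold:", threshold)
print("unsupervised group centres:", centres)
```

On this toy data both approaches separate the same two groups, but only the supervised one knows what the groups mean, because only it has seen the expert's labels.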

Challenges in machine learning implementation

The algorithm becomes a “black box” when the process behind the results is too complex to understand.

Personal health information must be treated in a privacy-preserving manner. According to GDPR, personal data must be adequate, relevant and limited to what is necessary for the purpose for which it is processed. Patients must be fully informed about the purpose of the data processing so that they can choose whether or not to allow the use of their data in an algorithm.

The algorithm can produce bad predictions. Common causes are overfitting and underfitting. In the case of overfitting, the algorithm performs well on the training data but underperforms on new data because, instead of finding a general pattern that explains the variation, it has picked up random patterns in the training data. Underfitting means that the algorithm does not fully exploit the data provided for its training; this happens when the chosen model is not complex enough. Overfitting and underfitting pull in opposite directions: as model complexity increases, underfitting decreases but the risk of overfitting grows. It is therefore important to find a balance between the two, at the point where the total prediction error is lowest.
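
The trade-off can be made concrete with a small sketch. It assumes toy data (a sine curve with alternating ±0.3 offsets as a deterministic stand-in for noise) and a k-nearest-neighbour average as the model, neither of which comes from the fact sheet. A small k gives a very flexible model that memorizes the training data, and a k equal to the number of training points gives a model too simple to follow the curve.

```python
import math


def knn_predict(train_x, train_y, x, k):
    """Predict by averaging the k training targets whose inputs are closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mean_squared_error(train_x, train_y, xs, ys, k):
    """Average squared prediction error of the k-NN model on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


n = 20
# Training data: sine curve plus alternating +/-0.3 "noise" (toy assumption).
train_x = [i / n for i in range(n)]
train_y = [math.sin(2 * math.pi * x) + (0.3 if i % 2 == 0 else -0.3)
           for i, x in enumerate(train_x)]
# New data: noise-free points halfway between the training inputs.
test_x = [(i + 0.5) / n for i in range(n)]
test_y = [math.sin(2 * math.pi * x) for x in test_x]

train_error = {k: mean_squared_error(train_x, train_y, train_x, train_y, k)
               for k in (1, 2, n)}
test_error = {k: mean_squared_error(train_x, train_y, test_x, test_y, k)
              for k in (1, 2, n)}

for k in (1, 2, n):
    print(f"k={k:2d}  training error={train_error[k]:.4f}  new-data error={test_error[k]:.4f}")
```

With k=1 the training error is zero but the error on new data is not, which is the overfitting pattern described above. With k=20 the model predicts the same average everywhere and does poorly on both sets, which is underfitting. The intermediate setting gives the lowest error on new data in this toy case.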

What is needed for the successful implementation of machine learning?

Machine-learning systems differ from traditional software systems. Machine learning uses self-learning, continuously improving algorithms. Before implementation in real-world conditions, the system must be trained on local data. Each algorithm has its strengths and weaknesses, so it is important to try several algorithms to find out which one works best for the problem at hand. The strategy for developing a machine-learning system for healthcare should involve mapping clinical needs and technological capabilities to identify good cases for pilot projects. Pilots showing good results can then be scaled up.
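
Trying several algorithms on the same problem can be sketched as follows. The example assumes a tiny, made-up data set with one numeric feature and a binary label, two deliberately simple candidate models, and leave-one-out evaluation; all names and values are hypothetical, and a real project would use local clinical data and a wider set of candidates.

```python
from statistics import mean


def majority_model(train):
    """Baseline: always predict the most common label in the training data."""
    labels = [y for _, y in train]
    winner = max(set(labels), key=labels.count)
    return lambda x: winner


def centroid_model(train):
    """Predict the label whose mean feature value is closest to x."""
    centroids = {
        label: mean([x for x, y in train if y == label])
        for label in {y for _, y in train}
    }
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))


def leave_one_out_accuracy(build_model, data):
    """Train on all samples but one, test on the held-out sample, and repeat."""
    hits = 0
    for i, (x, y) in enumerate(data):
        model = build_model(data[:i] + data[i + 1:])
        hits += model(x) == y
    return hits / len(data)


# Hypothetical (feature value, label) pairs.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0), (5.0, 0), (6.0, 0),
        (8.0, 1), (9.0, 1), (10.0, 1), (11.0, 1)]

candidates = {"majority baseline": majority_model, "nearest centroid": centroid_model}
scores = {name: leave_one_out_accuracy(build, data) for name, build in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: accuracy {score:.2f}")
print("best candidate:", best)
```

Each candidate is always scored on samples it was not trained on, so the comparison reflects performance on new data rather than on the training data.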

Machine learning has been tested in many research projects with promising results. If machine learning is going to be used for clinical decision support, a ready-trained model and access to individual patient data are needed. This means that the system must be integrated with the electronic health record (EHR) system or other systems where the data is stored. This can be done in several ways: 1) by building it into the EHR system or radiology system, 2) as a cloud service provided by a third party, or 3) as a service in a private cloud.

Where does machine learning have the highest potential?

The use of machine learning can convert data to knowledge, which will lead to disruptions¹ in at least three healthcare areas, with different time horizons:

  • Interpretation of medical images. This is the most developed area. Many projects have demonstrated effectiveness, with results as good as or even better than those of medical specialists. The area is expected to develop rapidly in the next few years.
  • Prognostics. Algorithms for prognostics are less developed. It will take about five years before this area is mature enough.
  • Diagnostics. This is the most complicated of the three areas. It will take about ten years before such solutions are ready for use in practice.

¹ A disruptive technology is a technology that can dramatically change established structures and business models. If an algorithm takes over large parts of the radiologists’ job of analyzing medical images, this can be considered a disruption.