Central repository of digital pathology slides to support the development of artificial intelligence tools

This is a future call, currently planned for publication on 26/06/2019. All information regarding future IMI Call topics is indicative and subject to change.

Specific challenges to be addressed

Although pathology is the cornerstone of the workup of many diseases, such as cancer, autoimmune diseases and transplant rejection, it still relies heavily on the subjective interpretation of a histology sample by a qualified pathologist, who records observations and conclusions in a report. Once the observations are recorded, the slides are archived, and only the pathologist’s report and diagnoses (considered as raw data in Good Laboratory Practice (GLP) nonclinical studies) remain accessible. As a result, much of the information contained in the histology slides is no longer readily available. This hinders the discovery of new clinico-pathological entities that are relevant to patients’ prognosis and treatment.

Recent developments in high-throughput slide scanners make it possible to render all the information contained in the millions of glass slides produced every year searchable. Ensuring storage of and access to digital slides will overcome the current limitations on accessing and sharing pathology material together with its associated metadata. It will facilitate case consultation, help identify disease sub-types, and help assess the translatability of nonclinical safety observations and animal models, thereby rationalising the design of clinical trials and the use of animal models.

The rise of deep learning and its unexpected effectiveness at interpreting images offer unprecedented opportunities to develop tools for the automated detection, classification and quantification of tissue abnormalities. Many initiatives are therefore already using digitised histopathology slides as a source of data for biomedical research. However, current research efforts focus on a relatively small set of diseases and/or are fragmented and geographically limited, which may hinder their ability to deliver beyond narrowly targeted applications. This is mostly because disease-centric models, although clinically relevant and efficient, cannot easily be extended to more general purposes.

The full transformative potential of deep learning applied to histopathology goes far beyond what is presently being undertaken. In the future, it will provide pathologists with smart suggestions regarding diagnoses and mechanistic or therapeutic hypotheses (e.g. predicting patient outcomes and responses to treatment), significantly improving patient safety and diagnosis. To achieve this ambitious goal, a much larger series of slides, offering broader coverage of tissues and lesions, is required. Such coverage may be difficult to achieve with clinical material alone. Nonclinical toxicology studies, however, provide a highly valuable and abundant source of histopathology slides, covering all normal tissues from multiple species and a wide diversity of lesions. These lesions are similar to those seen in clinical practice, but occur in a purer form and at stages rarely encountered in humans. They will therefore be of great help to the community developing artificial intelligence (AI). They will also likely offer an opportunity to expedite the development of assisted diagnosis tools applicable to both nonclinical safety studies and clinical practice.

The overall scope of the call topic is to collect, host and sustain virtual slides along with associated data and to support the collaborative development of artificial intelligence in pathology.

The funded action will also address the regulatory, legal and ethical challenges associated with the collection, sharing and mining of the virtual slides.

Objective 1: Sustainable infrastructure

To deliver the infrastructure that will host several petabytes of digital slides and make the data accessible for research. This infrastructure represents the hardware layer of the funded action and could take the form of a data centre, either centralised or decentralised. The key success factors for this objective are storage capacity and the ability to exchange large amounts of data rapidly. Achieving this objective is also critical for the sustainability and long-term impact of the funded action. The ambition is that, after the end of the funded action, the repository will be maintained and further developed following a model similar to public genomics repositories (e.g. NCBI GEO, https://www.ncbi.nlm.nih.gov/geo/). It should become the central place for hosting raw digital slides associated with scientific and medical publications. The planned infrastructure is expected to allow pathologists to review difficult cases concurrently. It should also allow them to consolidate large case series, including histopathology and clinical information, in order to establish diagnostic criteria. Sustainability beyond the end of the funded action will rely on a business model that keeps access open and free of charge for non-profit purposes. This will represent a major advantage over the current approach of small, separate databases.

Objective 2: Data

To compile the digital histopathology slides from nonclinical safety studies and clinical series needed to populate the initial version of the repository, and to contribute to the development of tools and AI models. The key success factor is the diversity of lesions, tissues and species, combined with sufficient sample sizes. In addition, the slides will be made publicly available for the development of AI in pathology, in line with the sustainability model described in Objective 1.

Objective 3: Tools

To deliver an honest broker mechanism (see the “Expected key deliverables” and “Suggested architecture of the full proposal” sections) by developing software that ensures the optimal and secure contribution of clinical and nonclinical material. Efforts will also be undertaken to propose a unified, open digital slide format. The action will also develop tools to search, access, upload, register, download and view slides and to annotate them in a harmonised way. In addition, AI models and tools will be developed at a later stage of the funded action, such as general diagnostic assistance, screening of slides for lesions and content-based image retrieval.

Objective 4: Regulatory framework

To advance the regulatory framework for the use of digital pathology slides in nonclinical safety testing, in the evaluation of clinical trials, and in the dissemination and discussion of difficult clinical cases. This will accelerate the adoption of roadmaps for qualifying digital slides for peer review or primary slide reading. It will also do so for the development of AI-based tools for pre-screening and assisted diagnosis. This objective should be achieved by building on existing and ongoing interactions and efforts between health and regulatory authorities and professional societies.
