Practical sessions will take place on January 7th-8th 2021.

All selected participants have been notified and know which practical sessions they will take part in. Please check this website regularly for updates on requirements.


Each practical session will have a maximum of 20 participants. The sessions will be 6 hours long: 3 hours in the morning with a break, and 3 hours in the afternoon with a break.

Please note that half sessions are paired: participants will attend both half sessions on the same day.

Argumentative Analysis of Clinical Trials for Evidence-based Medicine

Organizer: Serena Villata, with Marco Milanesio

Abstract: In this practical session, we will first introduce the main elements of argument mining, a research area in Natural Language Processing which aims at extracting and classifying argumentative structures from text. We will then present how such methods can be applied to clinical text, with a particular focus on Randomised Controlled Trials (RCTs). The practical session will employ a dataset of RCTs extracted from PubMed and annotated with argument components and relations.

Biomedical Text Mining: methods, tools and applications

Organizers: Fenia Christopoulou and Chrysoula Zerva

Abstract: Biomedical Text Mining and Natural Language Processing (BioNLP) are important areas of AI that can assist health experts in biomedical research. The large amount of freely available biomedical text has made BioNLP techniques particularly attractive and useful for clinicians, via methods for automatic knowledge extraction and discovery. Mining knowledge from biomedical text is typically considered much harder than from other types of text because of the domain expertise required. The goal of this tutorial is to provide participants with a hands-on overview of the main downstream tasks and applications of biomedical NLP, and of how these can be used to construct useful tools for biomedical research.

The session will cover three parts:

(i) Overview of key tasks and their variations depending on the target resources and domain of interest (clinical trials, scientific literature, ontologies, etc.), along with a synopsis of the most established NLP techniques in the field,

(ii) Exploration of specific use cases of NaCTeM tools that will allow users to appreciate the potential of combining multiple NLP tasks in order to extract structured information and navigate vast amounts of text. Participants will learn how to use such tools in order to accelerate/facilitate their research,

(iii) Testing and comparing existing NLP models to achieve complex information extraction on biomedical datasets.

Participants will incrementally build a text mining workflow combining a series of NLP tasks, such as Named Entity Recognition (NER), Relation Extraction (RE), Event Extraction (EE) and metaknowledge identification, and will visualise the output using the brat annotation platform (https://brat.nlplab.org/). Through experimentation with these modules they will get the chance to examine different deep learning architectures and to appreciate the impact of different training and fine-tuning setups on model performance, depending on the target domain/task. After the tutorial, participants should be aware of the importance of biomedical NLP in supporting researchers and practitioners in the health and biomedical domains. In addition, they should understand how existing text mining systems work, what features they rely on, and what challenges remain. Participants will also gain an overview of currently available tools and techniques and of how to implement and use biomedical NLP applications in practice.
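
As a hedged illustration of one building block of such a workflow, the minimal sketch below runs named entity recognition over a sentence using the Hugging Face transformers pipeline. It is not the NaCTeM tooling used in the session, and the default general-domain model shown here would be swapped for a biomedical one in practice.

from transformers import pipeline

# Load a token-classification (NER) pipeline.  Without a model argument this
# downloads a default general-domain English model; for biomedical entities
# (drugs, diseases, genes) one would pass a biomedical model via `model=...`.
ner = pipeline("ner", aggregation_strategy="simple")

text = ("The randomised trial coordinated by the University of Manchester "
        "compared aspirin with placebo in patients with diabetes.")

for entity in ner(text):
    # Each result holds the entity type, the matched text span and a confidence score.
    print(entity["entity_group"], "->", entity["word"], round(float(entity["score"]), 3))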

Requirements:

Need to have:

● ability to read/write code in Python

● grasp of basic AI/ML concepts

Good to have:

● familiarity with IPython notebooks (Colab, Jupyter)

● grasp of core NLP concepts

Causal inference for observational clinical data

Organizer: Julie Josse, with Imke Mayer

Abstract: In machine learning, there has been great progress in obtaining powerful predictive models, but these models rely on correlations between variables and do not allow for an understanding of the underlying mechanisms, or of how to intervene on the system to achieve a certain goal. The concepts of causality are fundamental to have levers for action, to formulate recommendations and to answer questions such as: "what would happen if we had acted differently?" The idea is to move towards "human-like AI" that takes reasonable, robust decisions in never-experienced situations. In this tutorial, we will introduce causal inference to answer questions such as: what is the effect of hydroxychloroquine on mortality? We will present techniques in the potential outcomes framework of Rubin to estimate the average treatment effect (propensity weighting, doubly robust methods) as well as heterogeneous treatment effects for personalized medicine. We will leverage powerful machine learning methods for statistical inference. We will also discuss the structural causal model framework of Pearl and tackle the data fusion problem of combining observational and randomized controlled trial data.
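
For reference, the inverse propensity weighting (IPW) estimator of the average treatment effect mentioned above can be written, in standard notation (not specific to the session material), as

\hat{\tau}_{\mathrm{IPW}} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{W_i Y_i}{\hat{e}(X_i)} - \frac{(1 - W_i) Y_i}{1 - \hat{e}(X_i)} \right),

where W_i \in \{0, 1\} is the treatment indicator, Y_i the observed outcome, X_i the covariates, and \hat{e}(X_i) = \hat{P}(W_i = 1 \mid X_i) the estimated propensity score. Doubly robust estimators combine this weighting with an outcome regression model, so that the estimate remains consistent if either of the two models is well specified.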

Requirements: Basic knowledge of classical statistical and ML methods (linear/logistic regression, random forests) can be useful. Knowledge of the R software is not mandatory but can be useful.

Teaching material: Install R and RStudio.
Install and load the following packages:

  • library(ggplot2)
  • library(grf)
  • library(tableone)
  • library(cobalt)
  • library(FactoMineR)
  • library(ranger)
  • library(Matching)

Deep learning for medical imaging

Organizers: Ninon Burgos, with Simona Bottani, Mauricio Diaz Melo, Johann Faouzi and Elina Thibeau-Sutre

Abstract: This practical session will cover two applications of deep learning for medical imaging: classification in the context of computer-aided diagnosis and image synthesis. The first part will guide you through the steps necessary to carry out an analysis aiming to differentiate patients with dementia from healthy controls using structural magnetic resonance images and convolutional neural networks. It will particularly highlight traps to avoid when carrying out this type of analysis. In the second part, you will learn how to translate a medical image of a particular modality into an image of another modality using generative adversarial networks.
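
As a hedged, toy-scale sketch of the first part (CNN-based classification), the snippet below defines a small 2D convolutional classifier in PyTorch and runs one training step on random tensors standing in for image slices. The session itself works with 3D structural MRI and a much more careful pipeline, so treat this only as an illustration of the overall pattern.

import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Toy CNN classifier; random tensors stand in for pre-processed image slices."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 16 * 16, n_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

model = TinyCNN()
images = torch.randn(4, 1, 64, 64)        # 4 fake single-channel 64x64 slices
labels = torch.randint(0, 2, (4,))        # fake patient / control labels
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

logits = model(images)                    # one forward/backward/update step
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
print("loss:", loss.item())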

Requirements: Programming knowledge in languages such as Python (preferred), Matlab or R.

Disease course mapping with longitudinal data

Organizer: Stanley Durrleman, with Juliette Ortholand, Etienne Maheux, Igor Koval, Arnaud Valladier and Pierre-Emmanuel Poulet

Abstract: Longitudinal data consist of the repeated observations of subjects or objects over time. They are ubiquitous in biology and medicine as they inform about the progression of a biological phenomenon such as growth or the progression of a chronic disease.

Analysing longitudinal data requires careful attention because repeated observations of the same subjects are not independent. Linear mixed-effects models have long been the tool of choice to address this problem. They model the progression of the underlying phenomenon and how it manifests itself in variable forms across subjects.
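
For reference, a generic linear mixed-effects model for an observation y_{ij} of subject i at visit j can be written (in standard notation, not the session's exact formulation) as

y_{ij} = x_{ij}^\top \beta + z_{ij}^\top b_i + \varepsilon_{ij}, \qquad b_i \sim \mathcal{N}(0, D), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2),

where \beta are fixed effects shared across the population, b_i are subject-specific random effects that capture the correlation between repeated measurements of the same subject, and x_{ij}, z_{ij} are design vectors that typically include time or age.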

Recent developments have made it possible to address some limitations of these methods. They account for the non-linear dynamics of progression, and they do not use age as a regressor, so that they can compare subjects' data even if the subjects differ in their age at onset or pace of progression.

In this workshop, we will cover the theory of non-linear mixed-effects models and the corresponding inference algorithms: maximum likelihood estimation and expectation-maximisation (EM) algorithms. We will first present these tools in the framework of regression models, and then introduce a particular class of disease progression models called “disease course mapping”.

You will practice using several longitudinal data sets from patients developing neurodegenerative diseases: Alzheimer's and Parkinson's disease. You will learn how disease course mapping allows you to characterise the variability of the progression profiles across subjects, impute missing data, resample data sets at intermediate time-points, predict the future progression of new patients, and even simulate cohorts of virtual patients.

Requirements:

Mathematics: linear algebra, optimisation, differentiation, basics of inferential statistics including maximum likelihood estimation

Computer science: scientific computing and data science with Python

Half session (afternoon): Online learning and experimentation algorithms in mobile health

Organizer: Walter Dempsey, with Kelly Zhang and Zejang Jia

Abstract: Mobile health (mHealth) technologies provide promising ways to deliver interventions outside of clinical settings. Wearable sensors and mobile phones deliver real-time data streams with information about an individual’s current health, including both internal (e.g., mood) and external (e.g., location) contexts. This practical session discusses the algorithms underlying mobile health clinical trials. Attendees will work with mobile health experimental data to better understand online learning and experimentation algorithms, the systems underlying real-time delivery of treatment, and their evaluation using collected data. The course focuses on the relationship between mobile health trial design and constrained optimization.
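
As a hedged illustration of the kind of online experimentation algorithm covered, the sketch below implements Beta-Bernoulli Thompson sampling for deciding, at each decision point, whether to deliver an intervention. The session's actual algorithms and data are more involved (contextual, constrained), so this is only a minimal example.

import numpy as np

# Minimal Beta-Bernoulli Thompson sampling: at each decision point the algorithm
# samples from its posterior over each action's success probability and delivers
# the action with the highest sample.  Purely illustrative.
rng = np.random.default_rng(0)
true_rates = {"no_message": 0.30, "send_message": 0.45}   # unknown to the algorithm
alpha = {a: 1.0 for a in true_rates}                      # Beta(1, 1) priors
beta = {a: 1.0 for a in true_rates}

for t in range(1000):
    # Sample a success probability for each action from its posterior.
    samples = {a: rng.beta(alpha[a], beta[a]) for a in true_rates}
    action = max(samples, key=samples.get)
    # Observe a binary proximal outcome (e.g., activity in the next hour).
    reward = rng.random() < true_rates[action]
    # Posterior update.
    alpha[action] += reward
    beta[action] += 1 - reward

for a in true_rates:
    print(a, "posterior mean:", round(alpha[a] / (alpha[a] + beta[a]), 3))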

Requirements: Participants should be familiar with applied statistics (e.g., regression, constrained optimization), experimental design (e.g., randomized trials, Thompson sampling), and working in IPython notebooks.

Half session (morning): Gaussian Graphical Model exploration and selection in high dimension low sample size setting

Organizers: Stéphanie Allassonnière and Thomas Lartigue

Abstract: Gaussian graphical models (GGM) are often used to describe the conditional correlations between the components of a random vector. In this session, we will compare two state-of-the-art families of GGM inference methods: the nodewise approach and penalised likelihood maximisation. We will demonstrate on synthetic data that, when the sample size is small, the two methods produce graphs with either too few or too many edges compared to the real one. We will then propose a composite procedure that explores a family of graphs with a nodewise numerical scheme and selects a candidate among them with an overall likelihood criterion. We will demonstrate that, when the number of observations is small, this selection method yields graphs closer to the truth, corresponding to distributions with a better Kullback-Leibler divergence with respect to the real distribution than the other two. Finally, we will show the interest of our algorithm on real medical databases.
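
As a hedged sketch of the penalised-likelihood side of the comparison, the snippet below fits a graphical lasso (via scikit-learn's GraphicalLassoCV) to synthetic Gaussian data generated from a sparse precision matrix, and reads the estimated edges off the non-zero entries. The nodewise and composite procedures discussed in the session are not shown here.

import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
p, n = 10, 50                                   # low sample size relative to dimension

# Build a sparse, positive-definite precision matrix (a simple chain graph).
precision = np.eye(p)
for i in range(p - 1):
    precision[i, i + 1] = precision[i + 1, i] = 0.4
covariance = np.linalg.inv(precision)

X = rng.multivariate_normal(np.zeros(p), covariance, size=n)

model = GraphicalLassoCV().fit(X)
est_precision = model.precision_

# Recover the estimated edge set from non-zero off-diagonal entries.
edges = [(i, j) for i in range(p) for j in range(i + 1, p)
         if abs(est_precision[i, j]) > 1e-8]
print("estimated edges:", edges)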

Requirements: Programming knowledge in Python and possibly R. Knowledge of conditional expectation, multidimensional Gaussian models and parameter estimation.

Teaching material:

  • Python 3.6 (may work with Python 2, not guaranteed) with packages:
      • rpy2
      • numpy
      • pandas
      • matplotlib
      • sklearn
      • seaborn
      • networkx
      • xarray
      • scipy
      • jupyter
  • R, with library:
      • GGMselect

Handling heterogeneity in the analysis of biomedical information: statistical learning with multiple data types and federated datasets

Organizer: Marco Lorenzi, with Yann Fraboni, Luigi Antelmi, Irene Balelli, and Andrea Senacheribbe

Abstract: This session focuses on the problem of statistical analysis of heterogeneous data in biomedical studies. Through guided examples, we will first introduce the basics of latent variable modelling for the joint analysis of heterogeneous data types (such as imaging, clinical or biological measurements). We will initially focus on linear approaches, such as partial least squares and canonical correlation analysis. We will then present more flexible methods based on recent advances in deep learning and stochastic variational inference, such as the multi-channel variational autoencoder. We will finally address the problem of deploying latent variable models for federated learning in multi-centric studies, where models must account for data-privacy and heterogeneity across datasets.
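
As a hedged warm-up for the linear part of the session, the sketch below fits canonical correlation analysis with scikit-learn on two synthetic "views" sharing a latent signal. Partial least squares, the multi-channel variational autoencoder and the federated setting are beyond this minimal example.

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 1))                       # shared latent signal

# Two synthetic "channels", e.g. imaging-like and clinical-like measurements.
X = latent @ rng.normal(size=(1, 20)) + 0.5 * rng.normal(size=(n, 20))
Y = latent @ rng.normal(size=(1, 5)) + 0.5 * rng.normal(size=(n, 5))

cca = CCA(n_components=1).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)

# Correlation between the first pair of canonical variates.
print("first canonical correlation:", round(np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1], 3))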

Requirements: Python (mandatory), sklearn and PyTorch (optional). Basics of linear algebra and statistics.

Learning algorithms for prediction of cancer outcomes from multiomic data

Organizers: Magali Richard and Slim Karkar

Abstract: In this session, we will learn how to predict cancer histology and prognosis using multiomic datasets (genomic and epigenomic). Participants will form teams of 2 or 3 people and apply the theoretical notions in two data challenges (a minimal classification sketch is given after the outline below). The session will be organised as follows:

- Lecture on molecular omic data in cancer

- Lecture on machine learning classification methods

- DATA CHALLENGE 1 (moderate complexity, simple omic data, prediction of the histology of lung cancer)

- Feedback on the results of challenge 1

- DATA CHALLENGE 2 (high complexity, integration of multiomic data, survival prediction in the context of lung cancer)

- Feedback on the results of challenge 2
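
The sketch below, referenced above, illustrates the kind of classification pipeline the challenges involve; random matrices stand in for omic features and the labels are simulated, whereas the real challenges use genuine multiomic datasets provided during the session.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 120

expression = rng.normal(size=(n_samples, 500))     # fake gene expression matrix
methylation = rng.normal(size=(n_samples, 300))    # fake methylation matrix
X = np.hstack([expression, methylation])           # naive "multiomic" concatenation
y = rng.integers(0, 2, size=n_samples)             # fake histology labels

clf = make_pipeline(StandardScaler(), LogisticRegression(penalty="l2", max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)
print("cross-validated accuracy:", scores.round(2))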

Requirements: 

- R or Python programming language (R will be a plus, but the data challenges can be solved using Python)

- basic knowledge in statistics and machine learning algorithms

- basic knowledge in cell biology (gene expression and gene methylation)

Machine learning on electrophysiology EEG signals

Organizer: Alexandre Gramfort

Abstract: Understanding how the brain works in healthy and pathological conditions is considered one of the major challenges of the 21st century. After the first electroencephalography (EEG) measurements in 1929, the 1990s saw the birth of modern functional brain imaging with the first functional MRI (fMRI) and full-head magnetoencephalography (MEG) systems. By offering unique, noninvasive insights into the living brain, imaging has revolutionized both clinical and cognitive neuroscience over the last twenty years. After pioneering breakthroughs in physics and engineering, the field of neuroscience now has to face major computational and statistical challenges. The size of the datasets produced by publicly funded population studies (the Human Connectome Project in the USA, the UK Biobank or Cam-CAN in the UK, etc.) keeps increasing, with hundreds of terabytes of data now made available for basic and translational research. New high-density neural electrode grids record signals over hundreds of sensors at thousands of Hz, which also represent large datasets of time series that are complex to model and analyze: non-stationarity, high noise levels, heterogeneity of sensors, strong variability between individuals, and a lack of accurate models for the signals.

In this course you will learn about state-of-the-art machine learning approaches for EEG and MEG signals. You will do so with MNE-Python (https://mne.tools/stable/index.html), which has become a reference tool to process MEG/EEG/sEEG/ECoG data in Python, as well as with the scikit-learn library (https://scikit-learn.org/). You will learn to predict what people attend to given their brain activity, or to predict sleep stages from clinical EEG data. The teaching will be done hands-on using Jupyter notebooks and public datasets, which you will be able to run on Google Colab.
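
As a hedged preview of the sleep-staging exercise, the sketch below loosely follows the MNE sleep tutorial linked under the teaching material: it fetches one Physionet sleep recording, cuts it into 30 s epochs labelled with sleep stages, and trains a scikit-learn classifier on naively flattened epoch data. Dataset helpers and annotation labels are taken from that tutorial and may differ slightly across MNE versions; the course notebooks use a more careful feature pipeline.

import mne
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Download one subject of the Physionet sleep dataset (PSG recording + hypnogram).
[(psg_file, hyp_file)] = mne.datasets.sleep_physionet.age.fetch_data(
    subjects=[0], recording=[1])

raw = mne.io.read_raw_edf(psg_file, preload=True)
raw.set_annotations(mne.read_annotations(hyp_file))

# Map the hypnogram annotations to event codes and cut fixed-length 30 s epochs.
annotation_to_event = {"Sleep stage W": 1, "Sleep stage 1": 2, "Sleep stage 2": 3,
                       "Sleep stage 3": 4, "Sleep stage 4": 4, "Sleep stage R": 5}
events, _ = mne.events_from_annotations(raw, event_id=annotation_to_event,
                                        chunk_duration=30.)
event_id = {"W": 1, "N1": 2, "N2": 3, "N3/4": 4, "REM": 5}
epochs = mne.Epochs(raw, events, event_id=event_id, tmin=0.,
                    tmax=30. - 1. / raw.info["sfreq"], baseline=None, preload=True)

# Naive features: flattened epoch samples (the tutorial uses spectral band
# powers instead, which work much better and are far smaller).
X = epochs.get_data().reshape(len(epochs), -1)
y = epochs.events[:, 2]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=3).round(2))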

Teaching material: https://github.com/agramfort/AI4Health_ml_eeg

Make sure also to check:

https://mne.tools/stable/index.html

https://mne.tools/stable/auto_tutorials/sample-datasets/plot_sleep.html

https://github.com/hubertjb/dl-eeg-tutorial