useR! 2019

Toulouse - France

Program overview

An application to browse the program is also available here.

Keynote speaker information is available on the program page.

Tutorial details are available on the tutorials page.

08:30 Registration Concorde 1+2
09:15 Opening Welcome Concorde 1+2
09:45 Sponsor talk R Consortium Concorde 1+2
10:00 Keynote: Julia Stewart Lowndes R for better science in less time Concorde 1+2

Julia Stewart Lowndes, Mozilla Fellow and Marine Data Scientist at NCEAS

There is huge potential for R to accelerate scientific research, since it not only provides powerful analytical tools that increase reproducibility but also creates a new frontier for communication and publishing when combined with the open web. However, a fundamental shift is needed in scientific culture so that we value and prioritize data science, collaboration, and open practices, and provide training and support for our emerging scientific leaders. I will discuss my work to help catalyze this shift in environmental science through the Ocean Health Index project. Over the past six years, our team has dramatically improved how we work by building an analytical workflow with R that emphasizes communication and training, which has enabled over 20 groups around the world to build from our science and code for ocean management. R has been so transformative for our science, and we shared our path to better science in less time (Lowndes et al. 2017) to encourage others in the scientific community to embrace data science and do the same. Building from this, as a Mozilla Fellow I recently launched Openscapes to engage and empower scientists and help ignite this change more broadly.

11:00 Break Coffee break
11:30 Talks Parallel sessions
  • Applications 1 (Ariane 1+2)
  • Data handling (Concorde 1+2)
  • Education (Cassiopée)
  • Models 1 (Caravelle 2)
  • Multivariate analysis (Guillaumet 1+2)
  • Shiny 1 (Saint-Exupéry)
13:00 Break Lunch
14:00 Talks Parallel sessions
  • Applications 2 (Cassiopée)
  • Bioinformatics 1 (Guillaumet 1+2)
  • Movement & transport (Ariane 1+2)
  • Reproducibility (Saint-Exupéry)
  • Shiny 2 (Concorde 1+2)
  • Social science, marketing & business (Caravelle 2)
15:15 Break Coffee break
15:45 Sponsor talk Airbus Concorde 1+2
15:55 Sponsor talk Openanalytics Concorde 1+2
16:00 Keynote: Julie Josse A missing value tour in R Concorde 1+2

Julie Josse, Professor at École Polytechnique (France)

In many application settings, the data have missing features which make data analysis challenging. An abundant literature addresses missing data, as do more than 150 R packages. Funded by the R Consortium, we have created the R-miss-tastic platform along with a dedicated task view, which aims to give an overview of the main references, contributors, and tutorials and to offer users the keys to analyse their data. This platform highlights that this is an active field of work and that, as usual, different problems require dedicated methods.

In this presentation, I will share my experience on the topic. I will start with the inferential framework, where the aim is to estimate the parameters and their variance as well as possible in the presence of missing data. Recent multiple imputation methods have focused on taking into account the heterogeneity of the data (multiple sources with variables of different natures, etc.). Then I will present recent results in a supervised-learning setting. A striking one is that the widely used method of imputing with the mean prior to learning can be consistent. That such a simple approach can be relevant may have important consequences in practice.

19:00 Gala dinner Conference dinner at the "Cité de l'Espace"
08:30 Registration Concorde 1+2
09:00 Sponsor talk RStudio Concorde 1+2
09:15 Keynote: Joe Cheng Shiny's Holy Grail: Interactivity with reproducibility Concorde 1+2

Joe Cheng, Chief Technology Officer at RStudio (USA)

Since its introduction in 2012, Shiny has become a mainstay of the R ecosystem, providing a solid foundation for building interactive artifacts of all kinds. But that interactivity has always come at a significant cost to reproducibility, as actions performed in a Shiny app are not easily captured for later analysis and replication. In 2016, Shiny gained a "bookmarkable state" feature that makes it possible to snapshot and restore application state via URL. But this feature, though useful, doesn't completely solve the reproducibility problem, as the actual program logic is still locked behind a user interface.

In this talk, I'll discuss some of the approaches that app authors have taken to achieve these ends, along with some surprising and exciting approaches that have recently emerged. These new approaches usefully decrease the implementation effort and code duplication, and may eventually become essential tools for those who wish to combine interactivity with reproducibility.

10:15 Break Comfort break
10:25 Lightning talks Parallel sessions
  • Biostatistics & epidemiology (Ariane 1+2)
  • Open science, education & community (Saint-Exupéry)
  • Spatial & time series (Caravelle 2)
  • Text mining (Cassiopée)
  • Workflow & development (Concorde 1+2)
11:00 Break Coffee break
11:30 Talks Parallel sessions
  • Biostatistics & epidemiology 1 (Guillaumet 1+2)
  • Communities & conferences (Concorde 1+2)
  • Data mining (Cassiopée)
  • Forecasting (Ariane 1+2)
  • Models 2 (Caravelle 2)
  • Programming 1 (Saint-Exupéry)
13:00 Break Lunch
14:00 Talks Parallel sessions
  • Bioinformatics 2 (Guillaumet 1+2)
  • Numerical methods (Caravelle 2)
  • Operations & data products (Concorde 1+2)
  • Programming 2 (Cassiopée)
  • Spatial data & maps (Ariane 1+2)
  • Visualisation (Saint-Exupéry)
15:15 Break Coffee break
15:45 Sponsor talk ThinkR Concorde 1+2
15:55 Sponsor talk Safran Concorde 1+2
16:00 Keynote: Martin Morgan How Bioconductor advances science while contributing to the R language and community Concorde 1+2

Martin Morgan, Professor at Roswell Park Comprehensive Cancer Center (US) and Bioconductor

The Bioconductor project has had a profound influence on the statistical analysis and comprehension of high-throughput genomic data, while contributing many innovations to the R language and community. Bioconductor started in 2002 and has grown to more than 1700 packages downloaded to ½ million unique IP addresses annually; Bioconductor has more than 30,000 citations in the scientific literature, and positively impacts many scientific careers. The desire for open, reproducible science contributes to many aspects of Bioconductor, including literate programming vignettes, multi-package workflows, teaching courses and online material, extended package checks, use of formal (S4) classes, reusable ‘infrastructure’ packages for robust and interoperable code, centralized version control and support, nightly cross-platform builds, and a distinctive release strategy that enables developer innovation while providing user stability. Contrasts between Bioconductor and R provide rich opportunities for reflection on establishing open source communities, how users translate software into science, and software development best practices. The ever-changing environment of scientific computing, especially the emergence of cloud-based computation and very large and heterogeneous public data resources, points to areas where Bioconductor, and R, will continue to innovate.

17:00 Late breaking talk Dirk Eddelbuettel: Software Heritage Concorde 1+2
17:10 Poster Flash poster presentations Concorde 1+2
17:55 Poster Poster session Caravelle 1, Greenhouse L, Greenhouse R
08:30 Registration Concorde 1+2
09:00 Sponsor talk Deloitte Concorde 1+2
09:15 Keynote: Bettina Grün Tools for Model-Based Clustering in R Concorde 1+2

Bettina Grün, Professor at Johannes Kepler Universität Linz (Austria)

Model-based clustering aims at partitioning observations into groups based on either finite or infinite mixture models. The mixture models used differ with respect to their clustering kernel, i.e., the statistical model used for each of the groups. The choice of a suitable clustering kernel allows the model to be adapted to the available data structure as well as to the clustering purpose. We first give an overview of available estimation and inference methods for mixture models as well as their implementations available in R, and highlight common aspects regardless of the clustering kernel.

Then, the design of the R package flexmix is discussed, pointing out how it provides a common infrastructure for fitting different finite mixture models with the EM algorithm. The package thus allows for rapid prototyping and the quick assessment of different model specifications. However, only a specific estimation method and models which share specific characteristics are covered. We thus conclude by highlighting the need for general infrastructure packages to allow for joint use of different estimation and inference methods and post-processing tools.

10:15 Break Comfort break
10:25 Lightning talks Parallel sessions
  • Bioinformatics & biostatistics (Ariane 1+2)
  • Methods & applications (Cassiopée)
  • Models & methods (Caravelle 2)
  • Shiny & web (Concorde 1+2)
  • Switching to R (Saint-Exupéry)
11:00 Break Coffee break
11:30 Talks Parallel sessions
  • Big/high dimensional data (Caravelle 2)
  • Biostatistics & epidemiology 2 (Guillaumet 1+2)
  • Contribution & collaboration (Concorde 1+2)
  • Model deployment (Cassiopée)
  • Performance (Saint-Exupéry)
  • Time series data (Ariane 1+2)
13:00 Break Lunch
14:00 Prizes Datathon and poster prizes Concorde 1+2
14:15 Keynote: Julien Cornebise 'AI for Good' in the R and Python ecosystems Concorde 1+2

Julien Cornebise, Director of Research at Element AI (UK)

Drawing from fifteen years of concrete examples and true stories, from algorithmic uses (computational Bayesian statistics, then deep learning and reinforcement learning at DeepMind) to more recent and more applied "AI for Good" projects (joint with Amnesty International: detecting burned villages on satellite imagery in conflict zones, or studying abuse against women on Twitter, etc.), we discuss here how the tools of several communities sometimes rival and sometimes complement each other - in particular R and Python, with a bit of Lua, C++ and JavaScript thrown into the mix. Indeed, from mathematical statistics to computational statistics to data science to machine learning to the latest "Artificial Intelligence" effervescence, our large variety of practices are co-existing, overlapping, splitting, merging, and co-evolving. Their scientific ingredients are remarkably similar, yet their technical tools differ and can seem alien to each other. By the end of this talk, we hope each attendee will leave with, in the worst case, some interesting stories; in a better case, some more thoughts on how this multitude of approaches play together to their own strengths; and, in the best case, some intense conversations in the Q&A!

15:15 Closing Concorde 1+2
15:30 Break Coffee break