useR! 2019

Toulouse - France


Keynote speakers

We are happy to introduce our six confirmed keynote speakers. Below you will find a short bio and an abstract for each.
Joe Cheng
Chief Technology Officer
at RStudio (US)

Joe Cheng is RStudio's Chief Technology Officer, and the original creator of Shiny. He was the first employee at RStudio, joining founder J.J. Allaire in 2009 to help build the RStudio IDE.

Abstract: Shiny's Holy Grail: Interactivity with reproducibility

    Since its introduction in 2012, Shiny has become a mainstay of the R ecosystem, providing a solid foundation for building interactive artifacts of all kinds. But that interactivity has always come at a significant cost to reproducibility, as actions performed in a Shiny app are not easily captured for later analysis and replication. In 2016, Shiny gained a "bookmarkable state" feature that makes it possible to snapshot and restore application state via URL. But this feature, though useful, doesn't completely solve the reproducibility problem, as the actual program logic is still locked behind a user interface.
    The solution that Shiny app authors have long requested is a way for users of their apps to explore interactively, and then generate a reproducible artifact—like a standalone R script or R Markdown document—that represents a snapshot of the app’s state and logic. Such a script can then be rerun, studied, modified, checked into source control—all the things one cannot do with an interactive app. Such script-generating Shiny apps do exist today, but it is generally thanks to heroic efforts on the part of their authors; the level of implementation effort is high, and the Shiny app logic tends to be fragile and/or involve significant duplication of code.
    In this talk, I'll discuss some of the approaches that app authors have taken to achieve these ends, along with some surprising and exciting approaches that have recently emerged. These new approaches usefully decrease the implementation effort and code duplication, and may eventually become essential tools for those who wish to combine interactivity with reproducibility.
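The "bookmarkable state" feature mentioned above can be enabled with a few lines of Shiny. A minimal sketch (the slider and plot are hypothetical, not from the talk):

```r
library(shiny)

# For bookmarking, the UI must be a function taking `request`
ui <- function(request) {
  fluidPage(
    sliderInput("n", "Observations", min = 10, max = 100, value = 50),
    bookmarkButton(),  # produces a URL encoding the current input state
    plotOutput("hist")
  )
}

server <- function(input, output, session) {
  output$hist <- renderPlot(hist(rnorm(input$n)))
}

# "url" stores state in the query string; "server" saves it to disk
app <- shinyApp(ui, server, enableBookmarking = "url")
```

Restoring the bookmarked URL reproduces the inputs, but, as the abstract notes, not the program logic itself.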

Julien Cornebise
Director of Research
at Element AI (UK)

Julien Cornebise is a Director of Research at Element AI and Head of its London office. He is also an Honorary Associate Professor at University College London. Prior to Element AI, Julien joined DeepMind (later acquired by Google) in 2012 as an early employee. During his four years at DeepMind, he led several fundamental research directions used in early demos and fundraising, and helped create and lead its Health Applied Research Team. Since leaving DeepMind in 2016, he has been working with Amnesty International. Julien holds an MSc in Computer Engineering, an MSc in Mathematical Statistics, and a PhD in Mathematics, specialised in Computational Statistics, from University Paris VI Pierre and Marie Curie and Telecom ParisTech, for which he received the 2010 Savage Award in Theory and Methods from the International Society for Bayesian Analysis.

Abstract: "AI for Good" in the R and Python ecosystems

    Drawing from fifteen years of concrete examples and true stories, from algorithmic uses (computational Bayesian statistics, then deep learning and reinforcement learning at DeepMind) to more recent and more applied "AI for Good" projects (joint with Amnesty International: detecting burned villages in satellite imagery of conflict zones, or studying abuse against women on Twitter, etc.), we discuss here how the tools of several communities sometimes rival and sometimes complement each other, in particular R and Python, with a bit of Lua, C++ and JavaScript thrown into the mix. Indeed, from mathematical statistics to computational statistics to data science to machine learning to the latest "Artificial Intelligence" effervescence, our large variety of practices are co-existing, overlapping, splitting, merging, and co-evolving. Their scientific ingredients are remarkably similar, yet their technical tools differ and can seem alien to each other. By the end of this talk, we hope each attendee will leave with, in the worst case, some interesting stories; in a better case, some more thoughts on how this multitude of approaches play together, each to its own strengths; and, in the best case, some intense conversations in the Q&A!

Bettina Grün
at Johannes Kepler Universität Linz (Austria)

Bettina Grün is an Associate Professor at the Department of Applied Statistics, Johannes Kepler University Linz. She holds a PhD in Applied Mathematics from the Vienna University of Technology, obtained under the supervision of R Core team member Fritz Leisch. She has co-authored several packages available on CRAN, among them flexmix for flexible finite mixture modeling, and is an editor-in-chief of the Journal of Statistical Software.

Abstract: Tools for Model-Based Clustering in R

    Model-based clustering aims at partitioning observations into groups based on either finite or infinite mixture models. The mixture models used differ with respect to their clustering kernel, i.e., the statistical model used for each of the groups. The choice of a suitable clustering kernel makes it possible to adapt the model to the available data structure as well as to the clustering purpose. We first give an overview of available estimation and inference methods for mixture models, as well as their implementations in R, and highlight common aspects regardless of the clustering kernel.
    Then, the design of the R package flexmix is discussed, pointing out how it provides a common infrastructure for fitting different finite mixture models with the EM algorithm. The package thus allows for rapid prototyping and the quick assessment of different model specifications. However, it covers only a specific estimation method and models that share specific characteristics. We thus conclude by highlighting the need for general infrastructure packages that allow for the joint use of different estimation and inference methods and post-processing tools.
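To give a flavor of the interface the talk discusses, here is a minimal flexmix sketch; the simulated data and the choice of two components are illustrative assumptions, not taken from the talk:

```r
library(flexmix)

set.seed(1)
# Toy data: two latent groups with different linear relationships
x <- runif(200)
y <- c(2 * x[1:100] + rnorm(100, sd = 0.3),
       8 - 4 * x[101:200] + rnorm(100, sd = 0.3))
df <- data.frame(x = x, y = y)

# Fit a 2-component mixture of linear regressions via the EM algorithm
m <- flexmix(y ~ x, data = df, k = 2)

summary(m)        # component sizes and log-likelihood
parameters(m)     # per-component regression coefficients
head(clusters(m)) # hard cluster assignments
```

Swapping the clustering kernel is a matter of passing a different model driver (e.g. `model = FLXMRglm(family = "poisson")`), which is the common-infrastructure idea the abstract describes.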

Julie Josse
at École Polytechnique (France)

Julie Josse focuses her research on the development of methods for handling missing values and for matrix completion. She has also specialized in principal components methods to explore and visualize complex data structures. Her fields of application mainly include the biosciences and public health. Julie Josse is dedicated to reproducible research and has developed many packages, including FactoMineR and missMDA, to disseminate her work. She is a member of the R Foundation and of Forwards, a task force to increase the participation of minorities in the R community.

Abstract: A missing value tour in R

    In many application settings, the data have missing features, which makes data analysis challenging. An abundant literature addresses missing data, as do more than 150 R packages. Funded by the R Consortium, we have created the R-miss-tastic platform, along with a dedicated task view, which aims to give an overview of the main references, contributors, and tutorials, offering users the keys to analyse their data. This platform highlights that this is an active field of work and that, as usual, different problems require dedicated methods.
    In this presentation, I will share my experience on the topic. I will start with the inferential framework, where the aim is to best estimate the parameters and their variance in the presence of missing data. The latest multiple imputation methods have focused on taking into account the heterogeneity of the data (multi-source data with variables of different natures, etc.). Then I will present recent results in a supervised-learning setting. A striking one is that the widely used method of imputing with the mean prior to learning can be consistent. That such a simple approach can be relevant may have important consequences in practice.
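The mean-imputation-before-learning procedure mentioned above can be sketched in a few lines of base R. This is a toy illustration with made-up data; the consistency result discussed in the talk concerns prediction in supervised learning, not unbiased estimation of the coefficients themselves:

```r
set.seed(42)
n <- 500
x <- rnorm(n)
y <- 3 * x + rnorm(n)

# Make 30% of x missing completely at random
x_obs <- x
x_obs[sample(n, 0.3 * n)] <- NA

# Impute with the observed mean prior to learning
x_imp <- ifelse(is.na(x_obs), mean(x_obs, na.rm = TRUE), x_obs)

# Any supervised learner can now be trained on the completed data
fit <- lm(y ~ x_imp)
coef(fit)
```

For principal-components-based alternatives, the speaker's missMDA package (mentioned in the bio above) provides `imputePCA()`.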

Martin Morgan
at Roswell Park Comprehensive Cancer Center (US) and Bioconductor

Martin Morgan has worked on the Bioconductor project for the statistical analysis and comprehension of high-throughput genomic data for more than 12 years; he has led the project since 2008. Martin Morgan is Professor in the Biostatistics and Bioinformatics department at Roswell Park Comprehensive Cancer Center in Buffalo, NY, USA. He has previously held positions at the Fred Hutchinson Cancer Research Center and Washington State University. Martin Morgan's original training was in evolutionary genetics, with a PhD from the University of Chicago.

Abstract: How Bioconductor advances science while contributing to the R language and community

    The Bioconductor project has had a profound influence on the statistical analysis and comprehension of high-throughput genomic data, while contributing many innovations to the R language and community. Bioconductor started in 2002 and has grown to more than 1700 packages downloaded to ½ million unique IP addresses annually; Bioconductor has more than 30,000 citations in the scientific literature, and positively impacts many scientific careers. The desire for open, reproducible science contributes to many aspects of Bioconductor, including literate programming vignettes, multi-package workflows, teaching courses and online material, extended package checks, use of formal (S4) classes, reusable ‘infrastructure’ packages for robust and interoperable code, centralized version control and support, nightly cross-platform builds, and a distinctive release strategy that enables developer innovation while providing user stability. Contrasts between Bioconductor and R provide rich opportunities for reflection on establishing open source communities, how users translate software into science, and software development best practices. The ever-changing environment of scientific computing, especially the emergence of cloud-based computation and very large and heterogeneous public data resources, points to areas where Bioconductor, and R, will continue to innovate.

Julia Stewart Lowndes
Mozilla Fellow and
Marine Data Scientist at NCEAS

Julia Stewart Lowndes, PhD, is a marine ecologist, data scientist, and Mozilla Fellow at the National Center for Ecological Analysis and Synthesis (NCEAS), USA. As founding director of Openscapes and science program lead of the Ocean Health Index, and co-founder of Eco-Data-Science and R-Ladies Santa Barbara, she works to increase the value and practice of environmental open data science. She earned her PhD at Stanford University in 2012 studying drivers and impacts of Humboldt squid in a changing climate.

Abstract: R for better science in less time

    There is huge potential for R to accelerate scientific research, since it not only provides powerful analytical tools that increase reproducibility but also creates a new frontier for communication and publishing when combined with the open web. However, a fundamental shift is needed in scientific culture so that we value and prioritize data science, collaboration, and open practices, and provide training and support for our emerging scientific leaders. I will discuss my work to help catalyze this shift in environmental science through the Ocean Health Index project. Over the past six years, our team has dramatically improved how we work by building an analytical workflow with R that emphasizes communication and training, which has enabled over 20 groups around the world to build from our science and code for ocean management. R has been so transformative for our science, and we shared our path to better science in less time (Lowndes et al. 2017) to encourage others in the scientific community to embrace data science and do the same. Building from this, as a Mozilla Fellow I recently launched Openscapes to engage and empower scientists and help ignite this change more broadly.