useR! 2019

Toulouse - France

Tutorials

All materials for the tutorials are gathered in this repo and/or below on this page.

The tutorials took place on Tuesday, July 9. There were 18 tutorials, half in the morning and half in the afternoon. There were beginner and intermediate tracks, with a range of topics in each. The program of the day is provided below (clicking on a title shows details of the tutorial), and information on the tutors is given below the program.

Tutor	Title	Venue
Tuesday Morning 9:00-12:30 (with a break: 10:30-11:00)
Rasmus Bååth	AM1: Get up to speed with Bayesian data analysis in R	Concorde 2
Abstract: Bayesian data analysis is a powerful tool for inferential statistics and prediction, and this tutorial will get you up to speed with doing Bayesian data analysis using R. The goal of the tutorial is for you to get an understanding of what Bayesian data analysis is and why it is useful. After the tutorial you should be able to run simple Bayesian analyses in R. Part one of the tutorial will introduce Bayesian modelling from a simulation-based perspective. Part two will introduce three packages for doing Bayesian data analysis: Rstanarm, Google's CausalImpact, and Facebook's Prophet. Target audience: The target audience is useRs with little or no knowledge of Bayesian statistics, but with basic knowledge of R. Instructions: Make sure to bring a laptop running R with Rstanarm, CausalImpact, and Prophet installed. To make sure these packages work try running the following script in an R session.
Jo-Fai Chow	AM2: Automatic and Explainable Machine Learning with H2O in R	Cassiopée
Abstract: The General Data Protection Regulation (GDPR) is now in place. Are you ready to explain your models? This is a hands-on tutorial for R beginners. I will demonstrate the use of H2O and other R packages for automatic and interpretable machine learning. Participants will be able to follow and build regression and classification models quickly with H2O's AutoML. They will then be able to explain the model outcomes with various methods. Target audience: It is a workshop for R beginners and anyone interested in machine learning. RMarkdown and the rendered HTML will be provided so everyone can follow (even if they don’t want to run the code). Instructions: The latest R version is recommended but any 3.5.x (or 3.4.x) will be fine. All the instructions will be made available at this GitHub repository.
Di Cook	AM3: Visualising High-Dimensional Data	Auditorium Saint-Exupéry
Abstract: This workshop has three, roughly equal, parts: Review basic visualisation and inference with graphics: This part covers making plots using the grammar of graphics and how this fits into statistical inference. We will use the packages ggplot2 and nullabor. Plotting multiple dimensions in a single static plot, adding interaction: The building blocks for viewing high dimensions are generalised pairs plots and parallel coordinate plots, available in the R package GGally. There are many variations and options that will be discussed, along with making these interactive with the plotly package. Using dynamic plots (tours) to examine models in the data space, beyond 3D: This part will cover the use of tours to examine multivariate spaces, in relation to dimension reduction techniques like principal component analysis and t-SNE, supervised and unsupervised classification models. We will also examine high-dimension, low-sample size problems. The tourr and spinifex packages will be used. Target audience: Materials are designed for an intermediate audience, users who are familiar with R, basic visualisation and tidyverse tools, and who would like to improve their knowledge about data visualisation. Instructions: The workshop is interactive; bring your laptop set up with the latest versions of R (>3.5) and RStudio, and these R packages: ggplot2 (and tidyverse), nullabor, GGally, plotly, tourr and spinifex, to work along with the instructor and do challenge exercises. If you have signed up for this tutorial it will be helpful to download a copy of these notes ahead of time, and these will be available here from Jul 3.
Dirk Eddelbuettel	AM4: Extending R with C++	Caravelle 2
Abstract: Rcpp has become the principal venue for extending R with compiled code. It makes it easy to extend R with C or C++, spanning the range from simple one-liners to larger routines and bindings of entire external libraries. We will motivate and introduce Rcpp as a natural extension to R that provides an easy-to-use and powerful interface. Helper functions and tools including RStudio will be used to ease the creation of R extensions. Several examples will introduce basic use cases, including writing code with RcppArmadillo, which is the most widely-used package on top of Rcpp. Target audience: R users wanting to go beyond R. Prior experience with compiled languages like C and C++ is helpful but not required. Instructions: A laptop with Rcpp and RcppArmadillo pre-installed and tested helps, but is not required.
Colin Fay	AM5: Hacking RStudio: Advanced Use of your Favorite IDE	Guillaumet 1
Abstract: Have you ever wanted to become more productive with RStudio? Then this workshop is made for you! You've been wandering the web for a while now, reading about all the things the cool kids do with RStudio: building addins, using project templates, managing connections, playing around with the rstudioapi package... All this seems pretty nifty but so far you have never found the time to sit down and learn how to master these skills. Then you've come to the right place: this workshop will teach you how to push the boundaries of your basic RStudio usage, in order to become more efficient in your day-to-day usage of RStudio. Target audience: We expect people to be familiar with RStudio, and have a little knowledge about programming with R. Knowing how to build a package will be better, but is not mandatory. Instructions: Please come with a recent version of RStudio installed.
Anqi Fu and Balasubramanian Narasimhan	AM6: CVXR: An R Package for Disciplined Convex Optimization (joint work with S. Boyd)	Spot
Abstract: Optimization plays an important role in fitting many statistical models. Some examples include least squares, ridge and lasso regression, isotonic regression, Huber regression, support vector machines, and sparse inverse covariance estimation. CVXR is an R package that provides an object-oriented modeling language for convex optimization. It allows the user to formulate convex optimization problems in a natural mathematical syntax rather than the standard form required by most solvers. The user specifies an objective and set of constraints by combining constants, variables, and parameters using a library of functions with known mathematical properties. CVXR then applies signed disciplined convex programming (DCP) to verify the problem's convexity. Once verified, the problem is converted into conic form using graph implementations and passed to a cone solver such as ECOS, MOSEK, or GUROBI. We demonstrate CVXR's modeling framework with several applications in statistics, engineering, and finance. More information, along with a link to our paper, is available on the official CVXR site at cvxr.rbind.io. Target audience: This tutorial should be of broad interest to useRs interested in optimization, statistics and machine learning. It should also be of interest to researchers who wish to develop new statistical methodology and prototype implementations with little effort. We will begin with a gentle introduction to Disciplined Convex Optimization using examples from ordinary least squares regression and penalized regression. This will be followed by a high level description of CVXR, how it differs from other packages, and a discussion of the domain specific language that CVXR implements. We will show how CVXR works with problem solvers for classes of problems, such as linear programs, quadratic programs, semi-definite programs etc., and demonstrate use of commercial solvers. Finally, we'll have a segment for potential developers where we go over the nuts and bolts of adding new atoms. Instructions: Participants should bring a laptop and install CVXR from CRAN. We also invite problems of interest from attendees and we will do our best to formulate them in CVXR. The instructions for the tutorial will be given here.
Colin Gillespie and Rhian Davies	AM7: Getting the most out of Git	Guillaumet 2
Abstract: Do you wonder what a pull request is? Do you browse GitHub and see lots of repositories with fancy looking badges, and think, I would like some of them? And what is a git-hook and why should you care? In this tutorial, we'll look at how we can leverage external tools, such as travis, covr, and Docker hub. Goals: By the end of the tutorial participants should: appreciate that a commit to Git can launch numerous other services; be able to use travis and appveyor to automatically check code and run unit tests; understand the importance of reproducible environments and how to automatically generate them via DockerHub; be aware of the different git-hooks that are available and how they can be used; be able to implement code coverage tools and understand how they can be used in their R packages. Detailed Outline: This tutorial will cover a variety of techniques that will increase the productivity of anyone using Git. The key idea is that a single commit can automatically launch a variety of other services. This can lead to massive savings in time, as well as reduce bugs. The topics covered are: continuous integration with travis and appveyor; automatically creating Docker images via DockerHub; generating static HTML documentation with pkgdown & bookdown; organisations vs users; using client and server-side git-hooks; code coverage with codecov.io; GitHub vs GitLab. Target audience: The course is suitable for R users of all levels, but participants should be familiar with basic git commands. Instructions: Please see the GitHub Gist for pre-course instructions.
Jim Hester, Hadley Wickham and Jenny Bryan	AM8: Package Development	Concorde 1
Abstract: The key to well-documented, well-tested and easily-distributed R code is the package. This half-day class will teach you how to make package development as easy as possible with devtools and usethis. You'll also learn roxygen2 to document your code so others (including future you!) can understand what's going on, and testthat to avoid future breakages. The class will consist of a series of demonstrations and hands-on exercises. Topics: Package basics: the basic structure of a package, where to put code, and the workflow that allows you to take your package code out for a test drive. Documentation and namespaces: for your code to be useful, it must be well-documented. We'll solve these problems with roxygen2 for documentation, and rmarkdown for vignettes and READMEs. Automated testing: maintaining R code requires advanced planning. You can simplify debugging, spot errors, and ensure that your package is stable by creating unit tests with the help of testthat. Releasing your package: we'll work through R CMD check, and the process of distributing your package on GitHub or CRAN. Target audience: This class will be a good fit for you if you've developed a number of R scripts, and now you want to learn: a more efficient workflow, iterating between writing code and checking that it works; how to document your code so others (including future you!) can understand what's going on; automated testing principles, so that if you accidentally break something in your code you find out right away; how to package and distribute your code to others, whether it's inside your research group, your company, or to the whole world. Instructions: Participants should bring a laptop set up to build R packages. Detailed instructions are available here.
Heather Turner	AM9: Generalized Nonlinear Models	Ariane 1+2
Abstract: The class of generalized linear models covers several methods commonly used in data analysis, including multiple linear regression, logistic regression, and log-linear models. But a linear predictor does not always capture the relationship we wish to model. Rather, a nonlinear predictor may provide a better description of the observed data, often with fewer and more interpretable parameters. This tutorial introduces the wider class of generalized nonlinear models (GNMs) and their implementation via the R package gnm. Part 1: Learn how to specify, fit and evaluate GNMs. This part will focus on multiplicative interaction models, such as Goodman's Row-Column association model for contingency tables. Part 2: Learn about further uses of gnm. This part will consider specific models, such as the diagonal reference model for studying social mobility, as well as bespoke GNMs. Exercises and case studies will use data from real-world applications in sociology and ecology. Target audience: This tutorial is for people who wish to find out what generalized nonlinear models are and whether such models might be useful in their field of application, as well as people who want an introduction to the gnm package. Participants should have some experience of statistical modelling in R. Familiarity with generalized linear models (e.g. logistic regression models, log-linear models, etc.) and/or nonlinear least squares models would be beneficial but not essential. Instructions: Participants should bring a laptop with R installed, along with the gnm and vcdExtra packages. Course materials are available to download from here.
Tuesday Afternoon 14:00-17:30 (with a break: 15:30-16:00)
Edzer Pebesma and Roger Bivand	PM1: Spatial and Spatiotemporal Data Analysis in R	Caravelle 2
Abstract: This tutorial dives into some of the modern spatial and spatiotemporal analysis packages available in R. It will show how support, the spatial size of the area to which a data value refers, plays a role in spatial analysis, and how this is handled in R. It will show how package stars complements package sf for handling spatial time series, raster data, raster time series, and more complex multidimensional data such as dynamic origin-destination matrices. It will also show how stars handles out-of-memory datasets, with an example that uses Sentinel-2 satellite time series. This will be connected to analysing the data with packages that assume spatial processes as their modelling framework, including gstat, spdep, and R-INLA. Familiarity with package sf and the tidyverse will be helpful for taking this tutorial. Target audience: R users interested in analysing spatial data. Instructions: All the instructions will be made available at this GitHub repository.
Maria Prokofieva	PM2: Watch me: introduction to social media analytics	Cassiopée
Abstract: The tutorial will look into social media analytics in R using text and non-text data. The tutorial will look at: the most recent developments and guidelines for working with the most popular social media platforms; common steps of working with APIs as well as R packages for social media data mining and analysis; the main ways to analyse social media data, including text and image analytics. Particular focus will be given to image analytics, including analysis of images, emojis and other media formats. The tutorial will also include a demonstration of the analysis of images using Google AI & Machine Learning Products, such as the Cloud Vision API. The tutorial will also cover details of the current application process (i.e. App Review) for Facebook, Instagram and Twitter. The tutorial will provide hands-on examples of using social media analytics for business applications and just for fun. The examples will include popular social and sporting events in 2019. The tutorial will be conducted in an engaging manner. Participants will complete a small project and will be provided with help and guidance along the way. The particular packages used for the session will include Rfacebook, instaR, twitteR / rtweet, tuber, tidytext. The tutorial is supported by R Ladies Melbourne, Australia. Target audience: R users interested in analysing social media data. Instructions: Materials will be made available via https://github.com/mariaprokofieva/useR2019_tutorial. Please note that the tutorial page is empty at the moment, but will be updated before the conference.
Emma Rand	PM3: Keeping an exotic pet in your home! Taming Python to live in RStudio because sometimes the best language is both!	Ariane 1+2
Abstract: Two of the most popular programming languages for doing data science are Python and R, and many data scientists use both, often alongside other languages. However, if you came to programming through an academic route to explore, analyse and visualise data then you probably learnt R as your first and only language. Since R was designed for doing statistical analyses and data visualisation it can be all we need. However, perhaps you are an R enthusiast with data tasks who has heard Python has advantages as a general-purpose language. Or perhaps you found a solution for one of your tasks, but it's written in Python. Or perhaps you collaborate with someone who only uses Python. In all these cases, the reticulate package (developed by JJ Allaire, Kevin Ushey, RStudio and Yuan Tang) can help! In this tutorial we will explore what tasks you might want to do with Python, how to embed Python in code chunks of your Rmarkdown document and how to use the resulting Python objects for continued analysis with R. We will cover the aspects of Python needed to call it from R and how to house this exotic pet within the comfort of your RStudio home. This tutorial is aimed at beginner to intermediate R users with experience mainly or entirely in R who may or may not have experience of Rmarkdown. Target audience: This tutorial is for beginner to intermediate R users with little or no experience of other programming languages who are interested in learning a little about Python and how to run Python code from RStudio. Some previous experience of Rmarkdown/notebooks would be helpful but is not required. Instructions: Participants should bring a laptop with RStudio 1.2 and the Anaconda distribution of Python 3.7 installed. They should also install the reticulate package, and I recommend the development version from GitHub. Please follow the pre-tutorial instructions here.
Marco Scutari	PM4: bnlearn: Practical Bayesian Networks in R	Concorde 1
Abstract: The tutorial aims to introduce the basics of Bayesian networks' learning and inference using real-world data to explore the issues commonly found in graphical modelling. Key points will include preprocessing the data; learning the structure and the parameters of a Bayesian network; using the network as a predictive model; using the network for inference; validating the network by contrasting it with external information. Target audience: R users interested in modelling complex data with networks, and in graphical models. Participants should have a basic understanding of probability theory (multinomial and normal distributions in particular), and some familiarity with Bayesian statistics is advantageous. Knowledge of Bayesian networks is advantageous but not required; relevant concepts and definitions will be introduced in the tutorial. Instructions: Participants should come with a laptop with bnlearn and its "Suggests" packages (parallel, graph, Rgraphviz, lattice, gRain, Rmpfr, gmp). Material will be made available on www.bnlearn.com.
Torsten Hothorn	PM5: Transformation Models	Guillaumet 2
Abstract: Regression models with a continuous response are commonly understood as models for the conditional mean of the response. This notion is simple, but information about the whole underlying conditional distribution is not available from these models. A more general understanding of regression models as models for conditional distributions allows much broader inference from such models. Transformation models describe conditional distributions in a simple yet powerful and extensible way. Well-known classics, such as the normal linear regression models, binary and polytomous logistic regression, or Weibull and Cox regression models can all be understood as special transformation models. The first part of this tutorial highlights the connections between these models. A general form of the likelihood allowing arbitrary forms of random censoring and truncation will be introduced. Finally, model estimation using the R add-on packages mlt and tram will be illustrated by regression models for binary, ordered, continuous, and potentially censored response variables. In the second part, novel conditional transformation models, especially distribution regression, transformation trees and transformation forests (trtf R add-on package), are covered. https://cran.r-project.org/web/packages/mlt.docreg/vignettes/mlt.pdf - https://arxiv.org/abs/1701.02110 Target audience: useRs interested in statistical modelling outside the box. Instructions: Install the CRAN packages mlt, tram, trtf, and mlt.docreg. Slides and R Code for this tutorial will be made available at http://ctm.R-forge.R-project.org.
Mark van der Loo and Edwin de Jonge	PM6: Statistical Data Cleaning using R	Concorde 2
Abstract: To trust the outcome of a statistical analysis, one must be able to trust the input data. In this workshop we demonstrate how data quality can be systematically defined and improved using R. The workshop focuses on data validation (data checking), locating errors, and imputing missing or erroneous values under restrictions. We present short introductions to the main principles, provide quizzes and discussions for the audience, and give short R-based exercises. We will demonstrate a number of our R packages including 'validate' (for data quality checks), 'errorlocate' (for error localization), 'simputation' (for imputation methods) and 'lumberjack' (for keeping track of changes in data). Special attention will be paid to how to combine the various data processing steps, and how to analyze and visualize the results. This tutorial is based on the book 'Statistical Data Cleaning with Applications in R' (John Wiley & Sons, 2018) by the authors of this tutorial. Materials will be made available via https://github.com/data-cleaning/useR2019_tutorial. Target audience: Participants are expected to have a basic understanding of R. Some experience with RStudio projects and git is preferable but not necessary. Instructions: See the instructions at the tutorial's GitHub page. Note: the tutorial's GitHub page is empty at the moment but we will fill it before the conference.
Davis Vaughan	PM7: Design For Humans! A Toolkit For Creating Intuitive Modeling Packages (joint work with M. Kuhn)	Guillaumet 1
Abstract: Workshop goals: Empower those new to R with skills required to create modeling packages. Provide guidelines so resulting packages follow best-practices and have predictable interfaces. The modeling ecosystem in R is strong, with hundreds of packages implementing state of the art procedures. That said, R has long suffered from a lack of standardization amongst these packages. Because of this, the cognitive load required to switch between packages is often high, with most of this forced onto the user. The caret, parsnip, mlr and other packages attempt to solve this by creating a unified interface to a number of modeling packages. This workshop is an attempt to go a step further. In it, a set of general but consistent modeling package development principles will be taught that provide guidance on creating packages with a user focus. These principles are aimed at R developers of all experience levels, from the graduate student looking for guidance on their first project, to the veteran developer desiring a framework to standardize their work. This standardization eases the burden of not only the user, but also the future developer who builds on these packages. Attendees will walk away with a clear view of how to build a package using these principles, and will be able to reference a best-practices guide in the future, found here. Target audience: This tutorial will be most useful for participants that are interested in developing R packages related to modeling. Instructions: A laptop with access to the internet is recommended. RStudio Server instances will be provided with all of the materials and packages preinstalled. Much of the material will be about the hardhat package, found here.
Tobias Verbeke	PM8: Docker for Data Science: R, ShinyProxy and more	Auditorium Saint-Exupéry
Abstract: Docker has revolutionized the software landscape in the past 5 years and completely transformed the way programs are packaged and shipped. It has also become a critical technology to achieve reproducible environments for data science, and its ease of deployment and inherent scalability (Docker clusters) make it possible to bring data science products from a data scientist's laptop to data center scale. The primary goal of this tutorial is to demonstrate the benefits of Docker's container technology for day-to-day data science work. In a first part, the Docker technology will be introduced by setting up R-based environments for computing (defining, building, running containers, rocker). This will include connecting to Docker-based R environments from an IDE (RConsoleProxy). In a second part, the use of Docker will be demonstrated for all typical data science artefacts: Shiny apps, notebooks, reports, data science APIs and R packages. For each of these, mature open source components will be introduced to bring Docker into the workflow. For Shiny apps and notebooks, a detailed introduction will be given to Docker-based deployment using ShinyProxy. For APIs, reports and other artefacts, we will peek under the hood of ShinyProxy and customize its ContainerProxy engine to serve APIs, schedule reports and more. In a third part, we will cover some more advanced topics, e.g. scaling these solutions with Kubernetes or applying the same technology beyond R (Python, Julia etc.). Target audience: R users with R programming experience who need to disseminate their work (reports, Shiny apps, APIs). Technical concepts will be explained in layman's terms, but experience with using the command line and/or Linux will be helpful. Instructions: The workshop is interactive and examples for all scenarios will be provided, such that users can replicate these on their laptops. Docker and associated tools need to be installed on participant laptops. Detailed instructions can be found here.
Achim Zeileis	PM9: R/exams: A One-For-All Exams Generator	Spot
Abstract: This tutorial demonstrates how the R/exams package can be used for generating various kinds of exams, quizzes, and assessments from the same pool of dynamic exercises in plain R/Markdown or R/LaTeX format. The supported output formats include: Online tests in various kinds of learning management systems such as Moodle, Canvas, Blackboard, etc. Written multiple-choice exams that can be automatically scanned and evaluated. Live quizzes with voting via smartphones or tablets (in ARSnova). Custom documents in PDF, HTML, Docx, ... The exercises may combine generation of some numbers, text, or full data sets, etc. with question and solution text which can embed verbatim R input/output, mathematical equations, images, or supplements such as .rda or .csv files among others. The answer types may be: Multiple choice. Single choice. Numeric. Text. Or combinations of these. The tutorial will give an overview of the R/exams package and its capabilities and guide all participants towards authoring their first exercises and corresponding exams. Target audience: Lecturers, teachers, teaching assistants, etc. who want to leverage R's flexibility to generate exams, tests, and quizzes. Instructions: Participants should bring their laptops with an installation of R (and Rtools), the exams package and its dependencies, LaTeX, and pandoc. See Steps 1-4 at http://www.R-exams.org/tutorials/installation/.

Our tutors:

Rasmus Bååth Senior Data Scientist at King Entertainment and Lund University Get up to speed with Bayesian data analysis in R	Roger Bivand Professor, Norwegian School of Economics Spatial and Spatiotemporal Data Analysis in R	Jenny Bryan Software Engineer at RStudio Package Development	Jo-Fai Chow Data Science Evangelist & Community Manager at H2O.ai Automatic and Explainable Machine Learning with H2O in R
Di Cook Professor, Monash University Visualising High-Dimensional Data	Rhian Davies Data Scientist at Jumping Rivers Getting the most out of Git	Dirk Eddelbuettel Quantitative Analyst & Adjunct Professor at the University of Illinois Extending R with C++	Colin Fay Data Scientist & R Hacker at ThinkR Hacking RStudio: Advanced Use of your Favorite IDE
Anqi Fu Ph.D Candidate at Stanford University CVXR: An R Package for Disciplined Convex Optimization	Colin Gillespie Data scientist at Jumping Rivers & Senior Statistics lecturer, Newcastle University Getting the most out of Git	Jim Hester Software Engineer at RStudio Package Development	Torsten Hothorn Professor at Universität Zürich Transformation Models
Balasubramanian Narasimhan Senior Research Scientist at Stanford University CVXR: An R Package for Disciplined Convex Optimization	Edzer Pebesma Professor, University of Muenster Spatial and Spatiotemporal Data Analysis in R	Maria Prokofieva Senior lecturer, Business School, Victoria University, Australia Watch me: introduction to social media analytics	Emma Rand Teaching and Scholarship lecturer in biological data science, University of York, UK Keeping an exotic pet in your home! Taming Python to live in RStudio because sometimes the best language is both!
Marco Scutari Senior Researcher, IDSIA, Lugano bnlearn: Practical Bayesian Networks in R	Heather Turner Statistical Consultant Generalized Nonlinear Models	Mark van der Loo and Edwin de Jonge Senior researchers at Statistics Netherlands Statistical Data Cleaning using R	Davis Vaughan Software Engineer at RStudio Design For Humans! A Toolkit For Creating Intuitive Modeling Packages
Tobias Verbeke Managing director at Open Analytics Docker for Data Science: R, ShinyProxy and more	Hadley Wickham Chief Scientist at RStudio Package Development	Achim Zeileis Professor, Universität Innsbruck R/exams: A One-For-All Exams Generator