All materials for tutorials are gathered in this repo and/or below on this page.
The tutorials took place on Tuesday, July 9. There were 18 tutorials, half in the morning and half in the afternoon. There were beginner and intermediate tracks, with a range of topics in each. The program of the day is provided below (clicking on a title provides details on the tutorial) and information on tutors is given below the program.
Tutor | Title | Venue | Link | ||
---|---|---|---|---|---|
Tuesday Morning 9:00-12:30 (with a break: 10:30-11:00) | |||||
Rasmus Bååth | AM1: Get up to speed with Bayesian data analysis in R | Concorde 2 | |||
Abstract: Bayesian data analysis is a powerful tool for inferential statistics and prediction, and this tutorial will get you up to speed with doing Bayesian data analysis using R. The goal of the tutorial is for you to get an understanding of what Bayesian data analysis is and why it is useful. After the tutorial you should be able to run simple Bayesian analyses in R. Part one of the tutorial will introduce Bayesian modelling from a simulation-based perspective. Part two will introduce three packages for doing Bayesian data analysis: Rstanarm, Google's CausalImpact, and Facebook's Prophet. Target audience: The target audience is useRs with little or no knowledge of Bayesian statistics, but with basic knowledge of R. Instructions: Make sure to bring a laptop running R with Rstanarm, CausalImpact, and Prophet installed. To make sure these packages work try running the following script in an R session. |
|||||
Jo-Fai Chow | AM2: Automatic and Explainable Machine Learning with H2O in R | Cassiopée | |||
Abstract: General Data Protection Regulation (GDPR) is now in place. Are you ready to explain your models? This is a hands-on tutorial for R beginners. I will demonstrate the use of H2O and other R packages for automatic and interpretable machine learning. Participants will be able to follow and build regression and classification models quickly with H2O's AutoML. They will then be able to explain the model outcomes with various methods. Target audience: It is a workshop for R beginners and anyone interested in machine learning. RMarkdown and the rendered HTML will be provided so everyone can follow (even if they don’t want to run the code). Instructions: The latest R version is recommended but any 3.5.x (or 3.4.x) will be fine. All the instructions will be made available at this GitHub repository. |
|||||
Di Cook | AM3: Visualising High-Dimensional Data | Auditorium Saint-Exupéry | |||
Abstract: This workshop has three, roughly equal, parts:
Target audience: Materials are designed for an intermediate audience, users who are familiar with R, basic visualisation and tidyverse tools, and who would like to improve their knowledge about data visualisation. Instructions: The workshop is interactive. Bring your laptop set up with the latest versions of R (>3.5) and RStudio, and these R packages: ggplot2 (and tidyverse), nullabor, GGally, plotly, tourr and spinifex, to work along with the instructor and do challenge exercises. |
|||||
Dirk Eddelbuettel | AM4: Extending R with C++ | Caravelle 2 | |||
Abstract: Rcpp has become the principal venue for extending R with compiled code. It makes it easy to extend R with C or C++ spanning the range from simple one-liners to larger routines and bindings of entire external libraries. We will motivate and introduce Rcpp as a natural extension to R that provides an easy-to-use and powerful interface. Helper functions and tools including RStudio will be used to ease the creation of R extensions. Several examples will introduce basic use cases including writing code with RcppArmadillo which is the most widely-used package on top of Rcpp. Target audience: R users wanting to go beyond R. Prior experience with compiled languages like C and C++ is helpful but not required. Instructions: A laptop with Rcpp and RcppArmadillo pre-installed and tested helps, but is not required. |
|||||
Colin Fay | AM5: Hacking RStudio: Advanced Use of your Favorite IDE | Guillaumet 1 | |||
Abstract: Have you ever wanted to become more productive with RStudio? Then this workshop is made for you! Target audience: We expect people to be familiar with RStudio, and have a little knowledge about programming with R. Knowing how to build a package will be better, but is not mandatory. Instructions: Please come with a recent version of RStudio installed. |
|||||
Anqi Fu and Balasubramanian Narasimhan | AM6: CVXR: An R Package for Disciplined Convex Optimization (joint work with S. Boyd) | Spot | |||
Abstract: Optimization plays an important role in fitting many statistical models. Some examples include least squares, ridge and lasso regression, isotonic regression, Huber regression, support vector machines, and sparse inverse covariance estimation. CVXR is an R package that provides an object-oriented modeling language for convex optimization. It allows the user to formulate convex optimization problems in a natural mathematical syntax rather than the standard form required by most solvers. The user specifies an objective and set of constraints by combining constants, variables, and parameters using a library of functions with known mathematical properties. CVXR then applies signed disciplined convex programming (DCP) to verify the problem's convexity. Once verified, the problem is converted into conic form using graph implementations and passed to a cone solver such as ECOS, MOSEK, or GUROBI. We demonstrate CVXR's modeling framework with several applications in statistics, engineering, and finance. More information, along with a link to our paper, is available on the official CVXR site at cvxr.rbind.io. Target audience: This tutorial should be of broad interest to useRs interested in optimization, statistics and machine learning. It should also be of interest to researchers who wish to develop new statistical methodology and prototype implementations with little effort. Instructions: Participants should bring a laptop with CVXR installed from CRAN. We also invite problems of interest from attendees and we will do our best to formulate them in CVXR. The instructions for the tutorial will be given here. |
|||||
Colin Gillespie and Rhian Davies | AM7: Getting the most out of Git | Guillaumet 2 | |||
Abstract: Do you wonder what a pull request is? Do you browse GitHub and see lots of repositories with fancy looking badges, and think, I would like some of them? And what is a git-hook and why should you care? In this tutorial, we'll look at how we can leverage external tools, such as travis, covr, and Docker hub.
Target audience: The course is suitable for R users of all levels, but, participants should be familiar with basic git commands. Instructions: Please see the GitHub Gist for pre-course instructions. |
|||||
Jim Hester, Hadley Wickham and Jenny Bryan | AM8: Package Development | Concorde 1 | |||
Abstract: The key to well-documented, well-tested and easily-distributed R code is the package. This half day class will teach you how to make package development as easy as possible with devtools and usethis. You'll also learn roxygen2 to document your code so others (including future you!) can understand what's going on, and testthat to avoid future breakages. The class will consist of a series of demonstrations and hands on exercises. Topics:
Target audience: This class will be a good fit for you if you've developed a number of R scripts, and now you want to learn:
Instructions: Participants should bring a laptop set up to build R packages. Detailed instructions are available here. |
|||||
Heather Turner | AM9: Generalized Nonlinear Models | Ariane 1+2 | |||
Abstract: The class of generalized linear models covers several methods commonly used in data analysis, including multiple linear regression, logistic regression, and log-linear models. But a linear predictor does not always capture the relationship we wish to model. Rather, a nonlinear predictor may provide a better description of the observed data, often with fewer and more interpretable parameters. Target audience: This tutorial is for people who wish to find out what generalized nonlinear models are and whether such models might be useful in their field of application, as well as people who want an introduction to the gnm package. Participants should have some experience of statistical modelling in R. Familiarity with generalized linear models (e.g. logistic regression models, log-linear models, etc) and/or nonlinear least squares models would be beneficial but not essential. Instructions: Participants should bring a laptop with R installed, along with the gnm and vcdExtra packages. Course materials are available to download from here. |
|||||
Tuesday Afternoon 14:00-17:30 (with a break: 15:30-16:00) | |||||
Edzer Pebesma and Roger Bivand | PM1: Spatial and Spatiotemporal Data Analysis in R | Caravelle 2 | |||
Abstract: This tutorial dives into some of the modern spatial and spatiotemporal analysis packages available in R. It will show how support, the spatial size of the area to which a data value refers, plays a role in spatial analysis, and how this is handled in R. It will show how package stars complements package sf for handling spatial time series, raster data, raster time series, and more complex multidimensional data such as dynamic origin-destination matrices. It will also show how stars handles out-of-memory datasets, with an example that uses Sentinel-2 satellite time series. This will be connected to analysing the data with packages that assume spatial processes as their modelling framework, including gstat, spdep, and R-INLA. Familiarity with package sf and the tidyverse will be helpful for taking this tutorial. Target audience: R users interested in analysing spatial data. Instructions: All the instructions will be made available at this github repository. |
|||||
Maria Prokofieva | PM2: Watch me: introduction to social media analytics | Cassiopée | |||
Abstract: The tutorial will look into social media analytics in R using text and non-text data. The tutorial will look at:
Target audience: R users interested in analysing social media data. Instructions: Materials will be made available via https://github.com/mariaprokofieva/useR2019_tutorial. Please note that the tutorial page is empty at the moment, but will be updated before the conference. |
|||||
Emma Rand | PM3: Keeping an exotic pet in your home! Taming Python to live in RStudio because sometimes the best language is both! | Ariane 1+2 | |||
Abstract: Two of the most popular programming languages for doing data science are Python and R and many data scientists use both, often alongside other languages. However, if you came to programming through an academic route to explore, analyse and visualise data then you probably learnt R as your first and only language. Since R was designed for doing statistical analyses and data visualisation it can be all we need. However, perhaps you are an R enthusiast with data tasks who has heard Python has advantages as a general-purpose language. Or perhaps you found a solution for one of your tasks, but it's written in Python. Or perhaps you collaborate with someone that only uses Python. In all these cases, the reticulate package (developed by JJ Allaire, Kevin Ushey, RStudio and Yuan Tang) can help! In this tutorial we will explore what tasks you might want to do with Python, how to embed Python in code chunks of your Rmarkdown document and how to use the resulting Python objects for continued analysis with R. We will cover the aspects of Python needed to call it from R and how to house this exotic pet within the comfort of your RStudio home. This tutorial is aimed at beginner to intermediate R users with experience mainly or entirely in R who may or may not have experience of Rmarkdown. Target audience: This tutorial is for beginner to intermediate R users with little or no experience of other programming languages who are interested in learning a little about Python and how to run Python code from RStudio. Some previous experience of Rmarkdown/notebooks would be helpful but not required. Instructions: Participants should bring a laptop with RStudio 1.2 and the Anaconda distribution of Python 3.7 installed. They should also install the reticulate package; I recommend the development version from GitHub. Please follow the pre-tutorial instructions here. |
|||||
Marco Scutari | PM4: bnlearn: Practical Bayesian Networks in R | Concorde 1 | |||
Abstract: The tutorial aims to introduce the basics of Bayesian networks' learning and inference using real-world data to explore the issues commonly found in graphical modelling. Key points will include
Target audience: R users interested in modelling complex data with networks, and in graphical models. Participants should have a basic understanding of probability theory (multinomial and normal distributions in particular), and some familiarity with Bayesian statistics is advantageous. Knowledge of Bayesian networks is advantageous but not required; relevant concepts and definitions will be introduced in the tutorial. Instructions: Participants should come with a laptop with bnlearn and its "Suggests" packages (parallel, graph, Rgraphviz, lattice, gRain, Rmpfr, gmp). Material will be made available on www.bnlearn.com. |
|||||
Torsten Hothorn | PM5: Transformation Models | Guillaumet 2 | |||
Abstract: Regression models with a continuous response are commonly understood as models for the conditional mean of the response. This notion is simple, but information about the whole underlying conditional distribution is not available from these models. A more general understanding of regression models as models for conditional distributions allows much broader inference from such models. Transformation models describe conditional distributions in a simple yet powerful and extensible way. Well-known classics, such as the normal linear regression models, binary and polytomous logistic regression, or Weibull and Cox regression models can all be understood as special transformation models. The first part of this tutorial highlights the connections between these models. A general form of the likelihood allowing arbitrary forms of random censoring and truncation will be introduced. Finally, model estimation using the R add-on packages mlt and tram will be illustrated by regression models for binary, ordered, continuous, and potentially censored response variables. In the second part novel conditional transformation models, especially distribution regression, transformation trees and transformation forests (trtf R add-on package) are covered. https://cran.r-project.org/web/packages/mlt.docreg/vignettes/mlt.pdf - https://arxiv.org/abs/1701.02110 Target audience: useRs interested in statistical modelling outside the box. Instructions: Install the CRAN packages mlt, tram, trtf, and mlt.docreg. Slides and R Code for this tutorial will be made available at http://ctm.R-forge.R-project.org. |
|||||
Mark Van Der Loo and Edwin De Jonge | PM6: Statistical Data Cleaning using R | Concorde 2 | |||
Abstract: To trust the outcome of a statistical analysis, one must be able to trust the input data. In this workshop we demonstrate how data quality can be systematically defined and improved using R. The workshop focuses on data validation (data checking), locating errors, and imputing missing or erroneous values under restrictions. We present short introductions to the main principles, provide quizzes and discussions for the audience, and give short R-based exercises. We will demonstrate a number of our R packages including 'validate' (for data quality checks), 'errorlocate' (for error localization), 'simputation' (for imputation methods) and 'lumberjack' (for keeping track of changes in data). Special attention will be paid to how to combine the various data processing steps, and how to analyze and visualize the results. This tutorial is based on the book 'Statistical Data Cleaning with Applications in R' (John Wiley & Sons, 2018) by the authors of this tutorial. Materials will be made available via https://github.com/data-cleaning/useR2019_tutorial. Target audience: Participants are expected to have a basic understanding of R. Some experience with RStudio projects and git is preferable but not necessary. Instructions: See the instructions at the tutorial's github page. Note: the tutorial GH is empty at the moment but we will fill it before the conference. |
|||||
Davis Vaughan | PM7: Design For Humans! A Toolkit For Creating Intuitive Modeling Packages (joint work with M. Kuhn) | Guillaumet 1 | |||
Abstract: Workshop goals:
Target audience: This tutorial will be most useful for participants that are interested in developing R packages related to modeling. Instructions: A laptop with access to the internet is recommended. RStudio Server instances will be provided with all of the materials and packages preinstalled. Much of the material will be about the hardhat package, found here. |
|||||
Tobias Verbeke | PM8: Docker for Data Science: R, ShinyProxy and more | Auditorium Saint-Exupéry | |||
Abstract: Docker has revolutionized the software landscape in the past 5 years and completely transformed the way programs are packaged and shipped. It has also become a critical technology to achieve reproducible environments for data science, and its ease of deployment and inherent scalability (Docker clusters) allow data science products to be brought from a data scientist's laptop to data center scale. The primary goal of this tutorial is to demonstrate the benefits of Docker's container technology for day to day data science work. In a first part, the Docker technology will be introduced by setting up R-based environments for computing (defining, building, running containers, rocker). This will include connecting to Docker-based R environments from an IDE (RConsoleProxy). In a second part, the use of Docker will be demonstrated for all typical data science artefacts: Shiny apps, notebooks, reports, data science APIs and R packages. For each of these, mature open source components will be introduced to bring Docker into the workflow. For Shiny apps and notebooks, a detailed introduction will be given to Docker-based deployment using ShinyProxy. For APIs, reports and other artefacts, we will peek under the hood of ShinyProxy and customize its ContainerProxy engine to serve APIs, schedule reports and more. In a third part, we will cover some more advanced topics e.g. scaling these solutions with Kubernetes or applying the same technology beyond R (Python, Julia etc.). Target audience: R users with R programming experience who need to disseminate their work (reports, Shiny apps, APIs). Technical concepts will be explained in layman's terms, but experience with using the command line and/or Linux will be helpful. Instructions: The workshop is interactive and examples for all scenarios will be provided, such that users can replicate these on their laptops. Docker and associated tools need to be installed on participant laptops. Detailed instructions can be found here. |
|||||
Achim Zeileis | PM9: R/exams: A One-For-All Exams Generator | Spot | |||
Abstract: This tutorial demonstrates how the R/exams package can be used for generating various kinds of exams, quizzes, and assessments from the same pool of dynamic exercises in plain R/Markdown or R/LaTeX format. The supported output formats include:
Target audience: Lecturers, teachers, teaching assistants, etc. who want to leverage R's flexibility to generate exams, tests, and quizzes. Instructions: Participants should bring their laptops with an installation of R (and Rtools), the exams package and its dependencies, LaTeX, and pandoc. See Steps 1-4 at http://www.R-exams.org/tutorials/installation/. |