FastR update

LLVM back-end and R compatibility

Stepan Sindelar
Oracle Labs

Safe Harbor Statement

The following is intended to provide some insight into a line of research in Oracle Labs. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Oracle reserves the right to alter its development plans and practices at any time, and the development, release, and timing of any features or functionality described in connection with any Oracle product or service remains at the sole discretion of Oracle. Any views expressed in this presentation are my own and do not necessarily reflect the views of Oracle.

Introduction

FastR

FastR is a GNU-R compatible implementation of R built on top of the GraalVM platform. It is currently based on R 3.5.1 and reuses the base packages of GNU-R.

Open source and licensed under GPLv3.

GNU-R == the reference implementation of R

FastR

  • Emphasis on compatibility with GNU-R
    • Including its C and Fortran interface
  • Additional R language level features
  • Based on GraalVM
    • Graal compiler: dynamic JIT compilation, which can execute R code significantly faster
    • Plenty of language-agnostic tooling: VisualVM, Chrome DevTools debugger, …
    • Zero-overhead interoperability with other GraalVM languages: JavaScript, Python, Ruby, any JVM language

GraalVM

GraalVM is a universal virtual machine for running applications written in JavaScript, Python, Ruby, R, and JVM-based languages like Java.

[Figure: GraalVM stack: HotSpot JVM, JVM compiler interface, Graal compiler, Truffle, and Truffle languages such as JavaScript]

Truffle framework

Language Implementation Framework based on self-optimizing abstract syntax trees and partial evaluation.

Abstract syntax tree (AST)


	2 + x

[Figure: AST of 2 + x: a "+" node with the children "2" and "x"]

Self optimizing AST


									foo <- function(x) 2 + x
									foo(0.2)
								
[Figure: after foo(0.2) runs, the AST of 2 + x rewrites itself: UninitializedAdd(Constant 2, UninitializedLookup x) becomes DoubleDoubleAdd(Constant 2, LocalVarLookup x)]
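To make the node rewriting concrete, here is a minimal C sketch of self-specializing nodes. Truffle is a Java framework that replaces real node objects; everything below (`Node`, `Env`, the `exec_*` functions, the two-type `Value`) is hypothetical and only mimics the idea: an uninitialized node observes what it actually sees at run time and swaps in a specialized behavior, falling back to a generic one if the assumption later breaks.

```c
#include <assert.h>
#include <string.h>

/* Toy runtime values: an integer or a double (payload kept as double for brevity). */
typedef enum { T_INT, T_DOUBLE } Type;
typedef struct { Type type; double num; } Value;

/* Toy environment: parallel arrays of variable names and values. */
typedef struct { const char **names; Value *values; int size; } Env;

typedef struct Node Node;
typedef Value (*ExecFn)(Node *self, Env *env);

/* An AST node whose behavior (exec) can be replaced at run time. */
struct Node {
    ExecFn exec;
    Node *left, *right;  /* children of the addition node */
    Value constant;      /* payload of a constant node */
    const char *name;    /* variable name of a lookup node */
    int slot;            /* cached slot of a specialized lookup */
};

static Value exec_constant(Node *self, Env *env) { (void)env; return self->constant; }

/* LocalVarLookup: the slot is already known, no search by name. */
static Value exec_local_lookup(Node *self, Env *env) { return env->values[self->slot]; }

/* UninitializedLookup: search by name once, cache the slot, rewrite itself. */
static Value exec_uninit_lookup(Node *self, Env *env) {
    for (int i = 0; i < env->size; i++) {
        if (strcmp(env->names[i], self->name) == 0) {
            self->slot = i;
            self->exec = exec_local_lookup;
            return env->values[i];
        }
    }
    assert(!"variable not found");
    return (Value){T_INT, 0};
}

/* Generic addition: handles any combination of int and double. */
static Value generic_sum(Value l, Value r) {
    Type t = (l.type == T_DOUBLE || r.type == T_DOUBLE) ? T_DOUBLE : T_INT;
    return (Value){t, l.num + r.num};
}

static Value exec_generic_add(Node *self, Env *env) {
    return generic_sum(self->left->exec(self->left, env),
                       self->right->exec(self->right, env));
}

/* DoubleDoubleAdd: fast path assuming both operands are doubles. */
static Value exec_double_add(Node *self, Env *env) {
    Value l = self->left->exec(self->left, env);
    Value r = self->right->exec(self->right, env);
    if (l.type == T_DOUBLE && r.type == T_DOUBLE)
        return (Value){T_DOUBLE, l.num + r.num};
    self->exec = exec_generic_add;  /* assumption broken: generalize */
    return generic_sum(l, r);
}

/* UninitializedAdd: observe the operand types, then specialize. */
static Value exec_uninit_add(Node *self, Env *env) {
    Value l = self->left->exec(self->left, env);
    Value r = self->right->exec(self->right, env);
    self->exec = (l.type == T_DOUBLE && r.type == T_DOUBLE)
                     ? exec_double_add : exec_generic_add;
    return generic_sum(l, r);
}
```

Building the tree for `2 + x` (in R the literal `2` is a double) and evaluating it with `x = 0.2` leaves the addition node running `exec_double_add` and the lookup node running `exec_local_lookup`, mirroring the figure.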

Partial evaluation


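	int f(int x, int y) {
		return x * 42 + y;
	}

	// Residual program for the fixed input x = 3: x * 42 is folded to 126
	// (named f_x3 so that both versions can live in the same C file)
	int f_x3(int y) {
		return 126 + y;
	}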
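The example above shows only the result of partial evaluation. The C sketch below shows the mechanism on a toy expression language (all names are hypothetical): given the known input x = 3, `pe` folds every subexpression that depends only on known values into a constant and keeps the rest, so `x*42 + y` becomes the residual expression `126 + y`.

```c
#include <assert.h>
#include <stdlib.h>

/* Tiny expression language: constants, the variables x and y, + and *. */
typedef enum { CONST, VAR_X, VAR_Y, ADD, MUL } Kind;
typedef struct Expr { Kind kind; int value; struct Expr *l, *r; } Expr;

static Expr *mk(Kind kind, int value, Expr *l, Expr *r) {
    Expr *e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = kind; e->value = value; e->l = l; e->r = r;
    return e;
}

/* Ordinary evaluation: both inputs are known. */
static int eval(const Expr *e, int x, int y) {
    switch (e->kind) {
    case CONST: return e->value;
    case VAR_X: return x;
    case VAR_Y: return y;
    case ADD:   return eval(e->l, x, y) + eval(e->r, x, y);
    case MUL:   return eval(e->l, x, y) * eval(e->r, x, y);
    }
    assert(!"unknown kind");
    return 0;
}

/* Partial evaluation: x is known, y is not. Subtrees depending only on
   known values are folded to constants; the rest is residualized.
   (Nodes are never freed in this sketch except folded temporaries.) */
static Expr *pe(const Expr *e, int x) {
    switch (e->kind) {
    case CONST: return mk(CONST, e->value, NULL, NULL);
    case VAR_X: return mk(CONST, x, NULL, NULL);
    case VAR_Y: return mk(VAR_Y, 0, NULL, NULL);
    case ADD:
    case MUL: {
        Expr *l = pe(e->l, x), *r = pe(e->r, x);
        if (l->kind == CONST && r->kind == CONST) {
            int v = e->kind == ADD ? l->value + r->value : l->value * r->value;
            free(l);
            free(r);
            return mk(CONST, v, NULL, NULL);
        }
        return mk(e->kind, 0, l, r);
    }
    }
    assert(!"unknown kind");
    return NULL;
}
```

Graal partially evaluates Truffle interpreters written in Java rather than a toy tree like this, but the principle of folding what is known and keeping what is not is the same.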
								

Partial evaluation of an AST interpreter

									
	double interpret(AST ast, Environment env) {
		return ast.execute(env);
	}

	// Partially evaluate the "interpreter"
	// given the fixed, self-optimized AST of the program "2 + x"
	double interpret(Environment env) {
		return 2 + env.get("x");
	}

	// ==> Compile this code with Graal to machine code
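The same idea can be sketched in C with hypothetical names: `interpret` is a generic tree-walking interpreter that dispatches on the node kind at every step, while `interpret_program` is written by hand to show the residual code that specializing `interpret` to the fixed AST of `2 + x` would yield. Graal derives such code automatically from Truffle nodes; here it is only shown, not derived.

```c
#include <assert.h>

/* Node kinds of a toy AST interpreter (hypothetical). */
typedef enum { N_CONST, N_LOCAL, N_ADD } NodeKind;
typedef struct Node {
    NodeKind kind;
    double value;              /* N_CONST */
    int slot;                  /* N_LOCAL: index into the frame */
    const struct Node *l, *r;  /* N_ADD */
} Node;

/* Generic interpreter: dispatches on the node kind at every step. */
static double interpret(const Node *n, const double *frame) {
    switch (n->kind) {
    case N_CONST: return n->value;
    case N_LOCAL: return frame[n->slot];
    case N_ADD:   return interpret(n->l, frame) + interpret(n->r, frame);
    }
    assert(!"unknown node kind");
    return 0.0;
}

/* The AST of "2 + x", where x lives in frame slot 0. */
static const Node two = {.kind = N_CONST, .value = 2.0};
static const Node x = {.kind = N_LOCAL, .slot = 0};
static const Node program = {.kind = N_ADD, .l = &two, .r = &x};

/* What remains after partially evaluating interpret(&program, frame)
   with the AST fixed: no switch, no recursion, no node loads. */
static double interpret_program(const double *frame) {
    return 2.0 + frame[0];
}
```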
									

Compatibility with GNU-R

Issue: stability of CRAN packages w.r.t. FastR

  • Some packages release new versions often
  • Pkg devs and CRAN check new versions against GNU-R, not FastR
  • Solution: use a fixed CRAN snapshot by default in FastR (MRAN)
  • We can make sure that important pkgs continue to work on FastR
  • Bump the snapshot every now and then

Automated testing of packages

  • Easier with testthat/testit/… tests
  • Other tests and examples:
    • Run the pkg's tests/examples in both engines
    • Compare the output
    • Issues: timestamps, FS paths, minor formatting differences
    • Custom sed-like script to filter them out

						# differences in paths
						assertthat => s/Path '.*' does not exist/Path 'path/to/somewhere' does not exist
						assertthat => s/Path '.*' is not a directory/Path 'path/to/dir' is not a directory
						
						# different output format in GNU-R and FastR
						assertthat => R/[1] "mean %has_args% \"y\" is not TRUE"/REPLACED has_args
						assertthat => R/[1] "`%has_args%`(f = mean, args = \"y\") is not TRUE"/REPLACED has_args							
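The filtering step could look roughly like the C sketch below. It assumes each rule is a POSIX extended regular expression plus a replacement string, applied to every output line of both engines before the lines are compared. The real script's rule syntax (the `s/…/…` and `R/…/…` forms above) and implementation are not reproduced here; `filter_line` and `same_after_filter` are hypothetical helpers.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Replace the first match of `pattern` (POSIX ERE) in `line` with
   `replacement`, writing the result to `out`. Returns 0 on success. */
static int filter_line(const char *line, const char *pattern,
                       const char *replacement, char *out, size_t outsz) {
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, line, 1, &m, 0);
    regfree(&re);
    int n;
    if (rc == 0)  /* keep the prefix, substitute the match, keep the suffix */
        n = snprintf(out, outsz, "%.*s%s%s", (int)m.rm_so, line,
                     replacement, line + m.rm_eo);
    else          /* no match: copy the line unchanged */
        n = snprintf(out, outsz, "%s", line);
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}

/* Compare one GNU-R output line with one FastR output line
   after applying the same rule to both. */
static int same_after_filter(const char *gnur, const char *fastr,
                             const char *pattern, const char *replacement) {
    char a[512], b[512];
    if (filter_line(gnur, pattern, replacement, a, sizeof a) != 0 ||
        filter_line(fastr, pattern, replacement, b, sizeof b) != 0)
        return 0;
    return strcmp(a, b) == 0;
}
```

With the first assertthat rule, two error messages that differ only in a temporary path compare as equal, while genuinely different results still fail.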
					

Interaction with pkg authors

  • Mostly positive attitude
  • Genuine bug in Rcpp reported

FastR in GNU-R

  • Run FastR as a PSOCK cluster node from GNU-R (package parallel)
  • XDR serialization is supported in FastR
  • useR! talk: FastRCluster: running FastR from GNU-R
  • Future work: FastR could "adopt" GNU-R vectors
    • Sharing memory between the two
    • FastR atomic vectors can be backed by native memory
    • Other complex objects would still be serialized (lists, envs, …)
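Both ends of a PSOCK connection must agree on the byte layout of the data they exchange, which is what XDR provides: it is big-endian regardless of the host. The C sketch below shows the encoding of the two most common R payloads, 32-bit integers and doubles. It illustrates XDR itself (RFC 4506), not R's full serialization format, and the helper names are hypothetical; it also assumes `double` is IEEE 754, as on all mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* XDR stores a 32-bit integer as 4 big-endian bytes. */
static void xdr_put_int(unsigned char *buf, int32_t v) {
    uint32_t u = (uint32_t)v;
    buf[0] = (unsigned char)(u >> 24);
    buf[1] = (unsigned char)(u >> 16);
    buf[2] = (unsigned char)(u >> 8);
    buf[3] = (unsigned char)u;
}

static int32_t xdr_get_int(const unsigned char *buf) {
    uint32_t u = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    return (int32_t)u;
}

/* A double is its 8-byte IEEE 754 bit pattern, also big-endian. */
static void xdr_put_double(unsigned char *buf, double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);  /* reinterpret the bits without aliasing issues */
    for (int i = 7; i >= 0; i--) {
        buf[i] = (unsigned char)(u & 0xFF);
        u >>= 8;
    }
}

static double xdr_get_double(const unsigned char *buf) {
    uint64_t u = 0;
    for (int i = 0; i < 8; i++)
        u = (u << 8) | buf[i];
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}
```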

LLVM back-end for native extensions

Work-in-progress

Previous approach (NFI)

  • Run the native code as-is
  • SEXPs: opaque pointers/handles for FastR objects
  • Down-calls (e.g., .Call):
    • Pass handles to the native code
  • R API up-call (e.g., Rf_length):
    • Translate the handles back to FastR objects
  • Issues
    • Native-to-Java transitions are expensive
    • Direct accesses to SEXPREC fields are problematic: a handle is not a real SEXPREC
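The handle scheme can be modeled in plain C as below. This is a simplified, single-threaded sketch with hypothetical names (`to_handle`, `from_handle`, `upcall_length`): in FastR the object table lives on the Java side and every up-call also crosses the native-to-Java boundary, which is where the cost comes from. Native code can only go through API calls, because there is no SEXPREC behind the pointer it holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an object living on the managed (Java) heap. */
typedef struct { int length; } ManagedVector;

/* What native code sees: an opaque handle, not a real SEXPREC. */
typedef struct OpaqueHandle *Handle;

#define MAX_HANDLES 16
static ManagedVector *handle_table[MAX_HANDLES];
static int n_handles;

/* Down-call side: register the object and pass an opaque handle. */
static Handle to_handle(ManagedVector *obj) {
    assert(n_handles < MAX_HANDLES);
    handle_table[n_handles] = obj;
    return (Handle)(uintptr_t)(++n_handles);  /* index + 1, never NULL */
}

/* Up-call side: translate the handle back to the managed object first. */
static ManagedVector *from_handle(Handle h) {
    uintptr_t idx = (uintptr_t)h - 1;
    assert(idx < (uintptr_t)n_handles);
    return handle_table[idx];
}

/* Hypothetical counterpart of an R API function such as Rf_length. */
static int upcall_length(Handle h) { return from_handle(h)->length; }

/* Native extension code: may only use the API on the handle. */
static int native_code(Handle x) { return upcall_length(x) * 2; }
```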

Previous approach (NFI): DATAPTR

  • Run the native code as-is
  • DATAPTR: transfer the data to native memory
    • Keep it there from then on
    • Use unsafe operations to access it
  • Issues
    • Transferring the data has its cost
    • The Java GC does not see the true size of the object
    • For lists, DATAPTR works reliably only for reading
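The transfer strategy amounts to a copy-once, switch-forever scheme, sketched below in C with hypothetical names. An ordinary C array stands in for the Java-side storage; once `vector_dataptr` has copied the data into `malloc`'ed memory, that copy is authoritative, and a garbage collector that only knows about the small `Vector` object would not account for the native block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a FastR integer vector: its data starts in a
   managed array and moves to native memory on the first DATAPTR. */
typedef struct {
    int length;
    int *managed;  /* stands in for the Java int[] */
    int *native;   /* NULL until the data has been transferred */
} Vector;

/* First call: copy the data to native memory (the transfer cost) and keep
   it there; later calls return the same native pointer. */
static int *vector_dataptr(Vector *v) {
    if (v->native == NULL) {
        v->native = malloc((size_t)v->length * sizeof *v->native);
        assert(v->native != NULL);
        memcpy(v->native, v->managed, (size_t)v->length * sizeof *v->native);
        v->managed = NULL;  /* from now on native memory is authoritative */
    }
    return v->native;
}

/* Managed-side element access must now check where the data lives. */
static int vector_get(const Vector *v, int i) {
    assert(i >= 0 && i < v->length);
    return v->native != NULL ? v->native[i] : v->managed[i];
}
```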

Graal LLVM

  • Interprets LLVM bitcode (another Truffle language)
  • Supports interoperability between Truffle languages
    • Can pass FastR objects to Graal LLVM
    • FastR can provide code snippets that Graal LLVM should invoke when, e.g., indexing into the object
  • Everything is in Truffle: the compiler can inline across the R/C (bitcode) boundary
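Roughly, the object carries code supplied by its owning language, and memory accesses on it turn into calls to that code. The C sketch below models this with a table of function pointers; all names are hypothetical. In Graal LLVM the dispatch happens inside the interpreter through Truffle interoperability, and because both sides are Truffle code the compiler can inline these callbacks instead of keeping real calls.

```c
#include <assert.h>

/* Hypothetical model of a foreign object exposed to native code: instead
   of raw memory it carries callbacks provided by the owning language. */
typedef struct {
    void *obj;  /* the FastR-side object */
    long (*size)(void *obj);
    int  (*read)(void *obj, long i);
    void (*write)(void *obj, long i, int value);
} ForeignArray;

/* "FastR side": a vector and the snippets it supplies. */
typedef struct { int data[4]; long length; } RIntVector;

static long r_size(void *o) { return ((RIntVector *)o)->length; }
static int r_read(void *o, long i) { return ((RIntVector *)o)->data[i]; }
static void r_write(void *o, long i, int v) {
    RIntVector *vec = o;
    assert(i >= 0 && i < vec->length);  /* the owner can check bounds, etc. */
    vec->data[i] = v;
}

/* "Native side": what `data[i] = v` and `data[i]` become when the
   pointer refers to a foreign object rather than native memory. */
static void store(ForeignArray *a, long i, int v) { a->write(a->obj, i, v); }
static int load(ForeignArray *a, long i) { return a->read(a->obj, i); }

/* Example native routine: doubles every element in place. */
static void double_all(ForeignArray *a) {
    for (long i = 0; i < a->size(a->obj); i++)
        store(a, i, 2 * load(a, i));
}
```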

Graal LLVM


	#include <R.h>
	#include <Rinternals.h>

	SEXP foo(SEXP x) {
	  // no translation to an opaque pointer/handle:
	  // x points directly to the FastR object

	  // the INTEGER function in FastR receives the FastR object directly,
	  // no translation from handle to FastR object
	  int *data = INTEGER(x);
	  // INTEGER returns a special Java object "VectorWrapper"
	  // that represents the data of x

	  data[1] = 42;
	  // Graal LLVM calls VectorWrapper.writeArrayElement(1, 42)

	  SEXP mylist = PROTECT(Rf_allocVector(VECSXP, 1));
	  ((SEXP *) DATAPTR(mylist))[0] = x;
	  // Graal LLVM calls VectorWrapper.writeArrayElement(0, x)
	  // where x is still the FastR object representing the vector,
	  // so this write is visible to the Java GC

	  UNPROTECT(1);
	  return mylist;
	}
					

Graal LLVM: technicalities

  • How to get the bitcode?
  • Oracle Labs LLVM Tools
    • Standard toolchain (compilers, linker, etc.)
    • Embeds bitcode into all the produced binaries
      • both intermediate and final
    • Using them is a matter of changing CC etc. in Makevars
    • The end result is a valid binary:
      it can be loaded via NFI or Graal LLVM

Graal LLVM: cross-language debugging

Graal LLVM: cross-language profiling

Graal LLVM: performance

Preliminary results


	#include <Rinternals.h>

	SEXP benchmark(SEXP x) {
	  SEXP result;
	  PROTECT(result = Rf_allocVector(VECSXP, Rf_length(x)));
	  // several R API up-calls per element:
	  // Rf_length, VECTOR_ELT, ScalarInteger, SET_VECTOR_ELT
	  for (int i = 0; i < Rf_length(x); ++i) {
	    SET_VECTOR_ELT(result, i, ScalarInteger(Rf_length(VECTOR_ELT(x, i))));
	  }
	  UNPROTECT(1);
	  return result;
	}
					

	a <- rep(list(1:5, rnorm(10), runif(10)), 1e6)
	.Call(C_benchmark, a)  # one measured run == 3 such calls
					

FastR vs. GNU-R: ~0.270 s vs. ~0.220 s

Intel Core i7-8750H CPU (6 cores, 2.20 GHz), 32 GB RAM

Conclusion

  • FastR is increasingly compatible with GNU-R
    • Additionally, running FastR from GNU-R can help with compatibility concerns/issues
  • FastR is getting an LLVM back-end
    • Cross-language tools: debugger, profiler, …
    • Compatibility benefits: native code can access the SEXPREC structure directly
    • Performance benefits
  • http://graalvm.org
    http://github.com/oracle/fastr