July 10, useR! 2019, Toulouse - France

Dose matters!

  • Liver is responsible for concentrating and metabolizing a majority of medications, so it is a prime target for medication-induced damage

  • (the next targets are kidney and heart).

  • Drug-induced liver injury (DILI) is also one of the main reasons for drug withdrawal from the market.

  • Detecting DILI as early as possible during drug development is crucial.

  • Low doses are ineffective and high doses are toxic.

  • Determining the toxic dose is an important aspect of this procedure.

A motivating project

Some examples

plotDoseResponseData(inputDataset = dataUse, dose = 2, response = 1, ID = 3, subjectID = 16)

logistic model

As it is defined in R package DoseFinding:

\[f(d) = E_0 + \frac{E_{\max}}{1 + \exp\{\frac{ED_{50} - d}{\delta}\}}\]

Fitting logistic model

Any other models?

Any other models?

Which model to use?

Model averaging

Model averaging

Model averaging

  • Uses SOME useful models instead of one!

  • The two sources of uncertainy which are usually considered: model-parameters, model error. But the model itself is a source of uncertainty as well!

  • The model averaged estimate: \[ \widehat{\theta}_{MA} = \sum_{r=1}^L w_i \widehat{\theta}_r.\]

  • With:
    \[w_i = \frac{\exp\{-0.5 (\mathrm{AIC}_i - \mathrm{AIC}_{\mathrm{min}})\}}{\sum_{j =1}^ L\exp\{-0.5 (\mathrm{AIC}_j - \mathrm{AIC}_{\mathrm{min}})\}},\]

  • More information can be found in: Anderson DR, Burnham K. Model selection and multi-model inference. NY: Springer-Verlag, 2002.

Average all models?

  • Of course not!

  • The set of models should be selected intelligently:

  • Rule 1: No model should be fitted to a flat dose-response pattern

  • Rule 2: The set of models for a monotone response should be different from the set of models with a response with possibility of an umbrella-shaped pattern.

  • That is what clustDRM designed to take care of.

Workflow of clustDRM

Identified patterns for our examples

idx <- which(dataUse$CompID == 5 | dataUse$CompID == 16 |dataUse$CompID == 17 |dataUse$CompID == 44)
toInput <- inputDataMaker(2, 1, 3, dataUse[idx,])
generalPatternClust <- generalPatternClustering(inputData = toInput$inputData, colsData = toInput$colsData, 
        colID = toInput$colID, doseLevels = toInput$doseLevels, numReplications = toInput$numReplicates, 
        na.rm = FALSE, imputationMethod = "mean", ORICC = "two", transform = "none", LRT = FALSE, MCT = TRUE, 
        adjustMethod = "BH", nPermute = 100, useSeed = NULL, theLeastNumberOfMethods = 2, alpha = 0.05, 
        nCores = 1)
generalPatternClust$clusteringORICC2Results$resultsMCT
##   clusteringResults ID testStat unadjustedPvalues adjustedPvalues
## 1              flat  5       NA                NA              NA
## 2              flat 16       NA                NA              NA
## 3  down up min at 2 17 0.436370      4.279877e-06    4.279877e-06
## 4  up down max at 2 44 3.716037      5.778762e-07    1.155752e-06
##   selectedContrast
## 1               NA
## 2               NA
## 3                1
## 4               33

Compute model-averaged ED50’s

fittedModel <- fitDRM (inputDataset = dataUse[idx,], dose = 2, response = 1, ID = 3, subsettingID = NULL, 
        transform = c("none"), addCovars = ~1, 
        patternClusters = generalPatternClust$clusteringORICC2Results$clusteringResultsORICC2, 
        EDp = 0.5, addCovarsVar = FALSE, alpha = 0.05, na.rm = FALSE, imputationMethod = c("mean"), 
        nCores = 1)
fittedModel$estEDpMonotone
## [1] CompID            linear            linlog            exponential      
## [5] emax              sigEmax           logistic          minAIC           
## [9] modelAveragingAIC
## <0 rows> (or 0-length row.names)
fittedModel$estEDpNonmonotone
##   CompID linear   linlog exponential      emax   sigEmax  logistic
## 1     17     25 4.524938    45.41529 0.4901961 0.3472222 28.250954
## 2     44     25 4.524938    31.00573 9.2956173 5.8693030  7.098101
##        betaMod quadratic       minAIC modelAveragingAIC
## 1 9.042245e-05  5.418892 9.042245e-05      0.0003590113
## 2 1.194241e+01 10.515112 5.869303e+00      8.2724232535

We have more to offer!

simRes <- simulEvalDRM (pilotData = dataUse[dataUse$CompID == idx[4], c(1,2)], doseLevels = c(0, 2, 4), 
        numReplications = c(6, 3, 3), numSim = 100, standardDeviation = 1, EDp = 0.5, 
        funcList = c("linlog", "emax", "sigEmax", "logistic", "betaMod"))
simRes$varEDpAveraged
##          linlog         emax      sigEmax     logistic      betaMod
## linlog        0 6.609280e-06 9.935835e-06 2.564004e-03 1.169315e-07
## emax          0 1.116934e-06 3.232857e-07 1.617163e-06 4.173184e-03
## sigEmax       0 1.201556e-06 4.410548e-07 4.155338e-06 2.187586e-06
## logistic      0 0.000000e+00 7.033344e-06 5.868206e-04 7.861744e-06
## betaMod       0 0.000000e+00 1.132671e-06 4.617211e-06 2.340998e-05
##                minAIC modelAveraging
## linlog   6.609280e-06   6.577676e-05
## emax     1.276258e-02   1.336768e-03
## sigEmax  1.201556e-06   7.247402e-07
## logistic 6.824368e-03   8.130960e-05
## betaMod  1.610476e-01   4.376284e-06

We have more to offer!

plotSimulDRM(simRes, quantity2Plot = "mse")

We have more to offer!

There are also two Shiny applications!

  • clustDRMapp()
  • clustDRMappSimple()

News!

  • We are working on a new R package currently called drcMA.
  • The clustering step of drcMA will cover unbalanced data.
  • drcMA will use the features available in R package drc as well.
  • Selection of candidate models is done more carefully (considering the precision of the estimates, etc.).
  • More options for weights will be available.
  • A wide range of variance and confidence interval estimators for model-averaged estimates are implemented (e.g., delta method, Buckland, Burnham & Anderson, tail area Wald method, Kang, etc.).
  • More extensive visualization tools are offered.