--- title: "Computing p-values for Fleming-Harrington weighted logrank tests and the MaxCombo test" author: "Keaven Anderson, Yujie Zhao" output: rmarkdown::html_vignette bibliography: simtrial.bib vignette: > %\VignetteIndexEntry{Computing p-values for Fleming-Harrington weighted logrank tests and the MaxCombo test} %\VignetteEngine{knitr::rmarkdown} --- ```{r, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) run <- requireNamespace("dplyr", quietly = TRUE) && requireNamespace("gt", quietly = TRUE) knitr::opts_chunk$set(eval = run) ``` ## Introduction This vignette demonstrates use of a simple routine to do simulations and testing using Fleming-Harrington weighted logrank tests and the MaxCombo test. In addition, we demonstrate how to perform these tests with a dataset not generated by simulation routines within the package. Note that all $p$-values computed here are one-sided with small values indicating that the experimental treatment is favored. ## Defining the test The MaxCombo test has been posed as the maximum of multiple Fleming-Harrington weighted logrank tests (@FH1982, @FH2011). Combination tests looking at a maximum of selected tests in this class have also been proposed; see @Lee2007, @NPHWGDesign, and @NPHWGSimulation. The Fleming-Harrington class is indexed by the parameters $\rho \geq 0$ and $\gamma \geq 0$. We will denote these as FH($\rho, \gamma$). This class includes the logrank test as FH(0, 0). Other tests of interest here include: - FH(0, 1): a test that down-weights early events - FH(1, 0): a test that down-weights late events - FH(1, 1): a test that down-weights events increasingly as their quantiles differ from the median ## Executing for a single dataset ### Generating test statistics with `sim_fixed_n()` We begin with a single trial simulation generated by the routine `sim_fixed_n()` using default arguments for that routine. `sim_fixed_n()` produces one record per test and data cutoff method per simulation. Here we choose 3 tests (logrank = FH(0, 0), FH(0, 1) and FH(1, 1)). When more than one test is chosen the correlation between tests is computed as shown by @Karrison2016, in this case in the columns `V1`, `V2`, `V3`. The columns `rho`, `gamma` indicate $\rho$ and $\gamma$ used to compute the test. `z` is the FH($\rho, \gamma$) normal test statistic with variance 1 with a negative value favoring experimental treatment. The variable `cut` indicates how the data were cut for analysis, in this case at the maximum of the targeted minimum follow-up after last enrollment and the date at which the targeted event count was reached. `Sim` is a sequential index of the simulations performed. ```{r, message=FALSE, warning=FALSE} library(simtrial) library(knitr) library(dplyr) library(gt) ``` ```{r} set.seed(123) x <- sim_fixed_n( n_sim = 1, timing_type = 5, rho_gamma = data.frame(rho = c(0, 0, 1), gamma = c(0, 1, 1)) ) x |> gt() |> fmt_number(columns = c("ln_hr", "z", "duration", "v1", "v2", "v3"), decimals = 2) ``` ### Generating data with `sim_pw_surv()` We begin with another simulation generated by `sim_pw_surv()`. Again, we use defaults for that routine. ```{r, message=FALSE, warning=FALSE, cache=FALSE} set.seed(123) s <- sim_pw_surv(n = 100) s |> head() |> gt() |> fmt_number(columns = c("enroll_time", "fail_time", "dropout_time", "cte"), decimals = 2) ``` Once generated, we need to cut the data for analysis. Here we cut after 75 events. 
### Generating data with `sim_pw_surv()`

Next we work with a dataset simulated by `sim_pw_surv()`, again using the defaults for that routine.

```{r, message=FALSE, warning=FALSE, cache=FALSE}
set.seed(123)

s <- sim_pw_surv(n = 100)

s |>
  head() |>
  gt() |>
  fmt_number(columns = c("enroll_time", "fail_time", "dropout_time", "cte"), decimals = 2)
```

Once generated, we need to cut the data for analysis. Here we cut after 75 events.

```{r, warning=FALSE, message=FALSE}
x <- s |> cut_data_by_event(75)

x |>
  head() |>
  gt() |>
  fmt_number(columns = "tte", decimals = 2)
```

Now we can analyze these data. We begin with `s` to show how this can be done in a single pipeline. In this case, we use the 4-test combination suggested in @NPHWGSimulation and @NPHWGDesign.

```{r, warning=FALSE, message=FALSE}
z <- s |>
  cut_data_by_event(75) |>
  maxcombo(rho = c(0, 0, 1, 1), gamma = c(0, 1, 0, 1))
z
```

Suppose we want the $p$-value based only on the logrank, FH(0, 1), and FH(1, 0), as suggested by @Lee2007. We simply pass the corresponding $\rho$ and $\gamma$ values to `maxcombo()`.

```{r, warning=FALSE, message=FALSE}
z <- s |>
  cut_data_by_event(75) |>
  maxcombo(rho = c(0, 0, 1), gamma = c(0, 1, 0))
z
```

### Using survival data in another format

For a trial not generated by the package's simulation routines, the process is slightly more involved. We consider survival data not in the simtrial format and show the transformation needed. In this case we use the small `aml` dataset from the survival package.

```{r, warning=FALSE, message=FALSE}
library(survival)

aml |>
  head() |>
  gt()
```

We rename variables and create a stratum variable as follows:

```{r, warning=FALSE, message=FALSE}
x <- aml |>
  transmute(
    tte = time,
    event = status,
    stratum = "All",
    treatment = case_when(
      x == "Maintained" ~ "experimental",
      x == "Nonmaintained" ~ "control"
    )
  )

x |>
  head() |>
  gt()
```

Now we analyze the data with a MaxCombo of the logrank and FH(0, 1) tests and compute a $p$-value.

```{r, warning=FALSE, message=FALSE}
x |>
  maxcombo(rho = c(0, 0), gamma = c(0, 1))
```
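As a quick cross-check of the reformatted data, the FH(0, 0) component of the test above is a standard logrank test, so it can be compared against `survival::survdiff()`. The sketch below converts the logrank statistic to a one-sided $p$-value; agreement is expected up to tie-handling conventions.

```{r, warning=FALSE, message=FALSE}
# Sketch: a one-sided logrank p-value for the reformatted `aml` data, for
# comparison with the FH(0, 0) component used above.
fit <- survdiff(Surv(tte, event) ~ treatment, data = x)

# Standardized observed-minus-expected event count in the experimental arm;
# negative values favor experimental treatment.
z_lr <- (fit$obs[2] - fit$exp[2]) / sqrt(fit$var[2, 2])
pnorm(z_lr)
```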
## Simulation

We now consider the example simulation from the `pvalue_maxcombo()` help file to demonstrate how to simulate power for the MaxCombo test. We use 100 simulations here to limit running time; a larger number (e.g., 1000) should be used for a better estimate of design properties. Here we test at the $\alpha = 0.001$ level.

```{r, cache=FALSE, warning=FALSE, message=FALSE}
set.seed(123)

# Only use cut events + min follow-up
x <- sim_fixed_n(
  n_sim = 100,
  timing_type = 5,
  rho_gamma = data.frame(rho = c(0, 0, 1), gamma = c(0, 1, 1))
)

# MaxCombo power estimate for cutoff at max of targeted events, minimum follow-up
x |>
  group_by(sim) |>
  filter(row_number() == 1) |>
  ungroup() |>
  summarize(power = mean(p_value < .001))
```

Note that the output of `sim_fixed_n()` contains one row per test per simulation, with the MaxCombo `p_value` repeated across the test rows of a given simulation; we therefore keep only the first record per simulation before summarizing. Multiple data cutoff methods can also be evaluated without re-running the simulations, as follows.

```{r, cache=FALSE, warning=FALSE, message=FALSE}
# Use cuts for both targeted events and targeted events + min follow-up
set.seed(123)

x <- sim_fixed_n(
  n_sim = 100,
  timing_type = c(2, 5),
  rho_gamma = data.frame(rho = 0, gamma = c(0, 1))
)
```

Now we estimate power separately for each cutoff method, this time testing at the $\alpha = 0.025$ level. First, we use the targeted event count cutoff.

```{r, warning=FALSE, message=FALSE}
# Subset to the targeted events cutoff
x |>
  filter(cut == "Targeted events") |>
  group_by(sim) |>
  filter(row_number() == 1) |>
  ungroup() |>
  summarize(power = mean(p_value < .025))
```

Now we use the later of the targeted events and minimum follow-up cutoffs.

```{r, warning=FALSE, message=FALSE}
# Subset to the later of targeted events and minimum follow-up cutoff
x |>
  filter(cut != "Targeted events") |>
  group_by(sim) |>
  filter(row_number() == 1) |>
  ungroup() |>
  summarize(power = mean(p_value < .025))
```

## References